dplyr Part 3

Introduction

In this module, we will explore a set of helper functions in order to:

  • extract unique rows
  • rename columns
  • sample data
  • extract columns
  • slice rows
  • arrange rows
  • compare tables
  • extract/mutate data using predicate functions
  • count observations for different levels of a variable

Case Study


Let us look at a case study (e-commerce data) and see how dplyr helper functions can be used to answer questions about the underlying data set and to modify or transform it. You can download the data from here or import it directly using read_csv() from the readr package.

Data


ecom

Data Dictionary


  • id: row id
  • referrer: referrer website/search engine
  • os: operating system
  • browser: browser
  • device: device used to visit the website
  • n_pages: number of pages visited
  • duration: time spent on the website (in seconds)
  • repeat: frequency of visits
  • country: country of origin
  • purchase: whether visitor purchased
  • order_value: order value of visitor (in dollars)

Data Sanitization


Traffic Sources


distinct(ecom, referrer)

Device Types


distinct(ecom, device)
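
The two distinct() calls above can be tried on a small stand-in table. This is a minimal sketch: ecom_demo and its values are made up for illustration and are not the real ecom data.

```r
library(dplyr)

# Hypothetical stand-in for ecom; values are invented for illustration
ecom_demo <- tibble(
  id       = 1:6,
  referrer = c("google", "yahoo", "google", "direct", "yahoo", "google"),
  device   = c("laptop", "tablet", "laptop", "mobile", "tablet", "mobile")
)

# One row per unique referrer; the other columns are dropped
unique_referrers <- distinct(ecom_demo, referrer)

# Unique combinations of two columns
unique_pairs <- distinct(ecom_demo, referrer, device)

# Keep all columns, taking the first row found for each referrer
first_rows <- distinct(ecom_demo, referrer, .keep_all = TRUE)

unique_referrers
unique_pairs
first_rows
```

By default distinct() returns only the columns it is given; .keep_all = TRUE is useful when the rest of the row matters.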

Rename Columns


rename(ecom, time_on_site = duration)
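
As a small sketch of how rename() behaves, the example below uses a made-up table (ecom_demo is hypothetical). The syntax is new_name = old_name, and the result is a new table; the input is left unchanged unless the result is assigned back.

```r
library(dplyr)

# Hypothetical stand-in for ecom; values are invented for illustration
ecom_demo <- tibble(id = 1:3, duration = c(693, 459, 996))

# new_name = old_name
renamed <- rename(ecom_demo, time_on_site = duration)

names(renamed)    # column names after renaming
names(ecom_demo)  # the original table still has the old name
```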

Sampling Data


sample_n(ecom, 700)

Sampling Data


sample_frac(ecom, size = 0.7)
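
The sketch below compares the two sampling helpers on a hypothetical 1000-row table (ecom_demo is made up). It also shows slice_sample(), which dplyr 1.0.0 and later offer as the successor to both functions. Setting a seed is an addition here, used only to make the draw reproducible.

```r
library(dplyr)

set.seed(42)  # fix the random draw so the example is reproducible

# Hypothetical stand-in for ecom with 1000 rows
ecom_demo <- tibble(id = 1:1000)

# Sample a fixed number of rows
by_count <- sample_n(ecom_demo, 700)

# Sample a fixed share of rows
by_share <- sample_frac(ecom_demo, size = 0.7)

# The same two ideas with slice_sample() (dplyr >= 1.0.0)
newer_count <- slice_sample(ecom_demo, n = 700)
newer_share <- slice_sample(ecom_demo, prop = 0.7)

nrow(by_count)
nrow(by_share)
```

All four calls sample without replacement by default, so no row appears twice.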

Extract Columns


pull(ecom, device)

Extract Columns


pull(ecom, 1)

Extract Columns


pull(ecom, -1)
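
The three pull() calls differ only in how the column is identified: by name, by position from the left, or by position from the right. A minimal sketch on a made-up table (ecom_demo is hypothetical) makes this concrete.

```r
library(dplyr)

# Hypothetical stand-in for ecom; values are invented for illustration
ecom_demo <- tibble(
  id          = 1:3,
  device      = c("laptop", "tablet", "mobile"),
  order_value = c(0, 120, 45)
)

by_name   <- pull(ecom_demo, device)  # column selected by name
first_col <- pull(ecom_demo, 1)       # first column from the left (id)
last_col  <- pull(ecom_demo, -1)      # first column from the right (order_value)

by_name
first_col
last_col
```

Unlike select(), which returns a table, pull() returns the column as a plain vector.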

Extract Rows


slice(ecom, 1:20)

Extract Row


slice(ecom, n())
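
slice() selects rows by position, and n() evaluates to the number of rows, so slice(ecom, n()) returns the last row. The sketch below illustrates this on a hypothetical 25-row table (ecom_demo is made up).

```r
library(dplyr)

# Hypothetical stand-in for ecom with 25 rows
ecom_demo <- tibble(id = 1:25)

first_twenty <- slice(ecom_demo, 1:20)             # rows 1 to 20
last_row     <- slice(ecom_demo, n())              # n() is the row count, so this is the last row
last_five    <- slice(ecom_demo, (n() - 4):n())    # positions can be computed from n()

first_twenty
last_row
last_five
```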

Tabulate Data


ecom %>%
  group_by(referrer) %>%
  tally()

Tabulate Data


ecom %>%
  group_by(referrer, bouncers) %>%
  tally()

Tabulate Data


ecom %>%
  group_by(referrer, purchase) %>%
  tally()

Tabulate Data


ecom %>%
  group_by(referrer, purchase) %>%
  tally() %>%
  filter(purchase == 'true')
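
Whether purchase == 'true' matches any rows depends on how the column is stored. The sketch below uses two made-up tables (both hypothetical) to show the difference: with a character column the comparison works as written, while with a logical column R coerces TRUE to the string "TRUE", which is not equal to 'true'.

```r
library(dplyr)

# Hypothetical tables: the same data with purchase stored two different ways
as_text <- tibble(
  referrer = c("google", "yahoo", "google"),
  purchase = c("true", "false", "true")
)
as_logical <- tibble(
  referrer = c("google", "yahoo", "google"),
  purchase = c(TRUE, FALSE, TRUE)
)

# Character column: the string comparison matches
text_match <- as_text %>% filter(purchase == 'true')

# Logical column: TRUE is coerced to "TRUE", so nothing matches 'true'
logical_mismatch <- as_logical %>% filter(purchase == 'true')

# Logical column: filter on the column itself
logical_match <- as_logical %>% filter(purchase)

nrow(text_match)
nrow(logical_mismatch)
nrow(logical_match)
```

It is worth checking the column type (for example with class(ecom$purchase)) before choosing the form of the filter.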

Count


count(ecom, referrer, purchase)
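
count() is shorthand for the group_by() and tally() pattern. The sketch below, on a made-up table (ecom_demo is hypothetical, and purchase is assumed to be logical here), checks that both produce the same counts. One difference is that count() returns an ungrouped table, while tally() drops only the last grouping level.

```r
library(dplyr)

# Hypothetical stand-in for ecom; values are invented for illustration
ecom_demo <- tibble(
  referrer = c("google", "google", "yahoo", "yahoo", "direct"),
  purchase = c(TRUE, FALSE, TRUE, TRUE, FALSE)
)

via_tally <- ecom_demo %>%
  group_by(referrer, purchase) %>%
  tally()

via_count <- count(ecom_demo, referrer, purchase)

# Same counts once the leftover grouping is removed
all.equal(as.data.frame(ungroup(via_tally)), as.data.frame(via_count))

is_grouped_df(via_tally)  # still grouped by referrer
is_grouped_df(via_count)  # not grouped
```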

Top Referrers


ecom %>%
  count(referrer, purchase) %>%
  filter(purchase == 'true') %>%
  arrange(desc(n)) %>%
  top_n(n = 2, wt = n)
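
In dplyr 1.0.0 and later, top_n() is superseded by slice_max(), which also returns its rows in descending order, so the separate arrange() step is not needed. The sketch below is one possible rewrite of the pipeline above on a made-up table (ecom_demo is hypothetical, and purchase is assumed to be logical).

```r
library(dplyr)

# Hypothetical stand-in for ecom; values are invented for illustration
ecom_demo <- tibble(
  referrer = c("google", "google", "google", "yahoo", "yahoo", "bing", "direct"),
  purchase = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE)
)

top_referrers <- ecom_demo %>%
  count(referrer, purchase) %>%   # adds the count column n
  filter(purchase) %>%            # keep purchasing visits only
  slice_max(n, n = 2)             # order by column n, keep the top 2 rows

top_referrers
```

Both top_n() and slice_max() keep ties by default, so more than two rows can come back when counts are equal; slice_max() accepts with_ties = FALSE to prevent that.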

Between


ecom %>%
  pull(n_pages) %>%
  between(5, 15) 
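
between(x, left, right) is a shortcut for x >= left & x <= right, so both endpoints are included. On its own it returns a logical vector; inside filter() it selects the matching rows. A minimal sketch on a made-up table (ecom_demo is hypothetical):

```r
library(dplyr)

# Hypothetical stand-in for ecom; values are invented for illustration
ecom_demo <- tibble(id = 1:5, n_pages = c(1, 5, 10, 15, 18))

# Logical vector: TRUE where 5 <= n_pages <= 15
in_range <- ecom_demo %>%
  pull(n_pages) %>%
  between(5, 15)

# The same test used to keep rows
mid_visits <- ecom_demo %>%
  filter(between(n_pages, 5, 15))

in_range
mid_visits
```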

Case When
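
case_when() evaluates a series of condition ~ value pairs in order and returns the value for the first condition that is TRUE. The following is only an illustrative sketch: the table ecom_demo, the column engagement, and the cut-offs are all assumptions made up for this example.

```r
library(dplyr)

# Hypothetical stand-in for ecom; values are invented for illustration
ecom_demo <- tibble(n_pages = c(1, 4, 9, 17))

labelled <- ecom_demo %>%
  mutate(
    engagement = case_when(
      n_pages <= 2  ~ "bounce",   # checked first
      n_pages <= 10 ~ "browse",   # reached only if the first test fails
      TRUE          ~ "deep"      # fallback for everything else
    )
  )

labelled
```

Because the conditions are checked in order, the more specific tests should come first.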


Select First Observation


ecom %>%
  pull(referrer) %>%
  nth(1)

ecom %>%
  pull(referrer) %>%
  first()

Select 1000th Observation


ecom %>%
  pull(referrer) %>%
  nth(1000)

Select Last Observation


ecom %>%
  pull(referrer) %>%
  last()
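
first(), last() and nth() work on any vector, not only on pulled columns. The sketch below uses a short made-up vector (referrer_demo is hypothetical) to show that a negative position in nth() counts from the end and that a position beyond the end yields a missing value.

```r
library(dplyr)

# Hypothetical vector; values are invented for illustration
referrer_demo <- c("google", "yahoo", "direct", "bing")

first_value   <- first(referrer_demo)     # same as nth(referrer_demo, 1)
second_value  <- nth(referrer_demo, 2)
last_value    <- last(referrer_demo)      # same as nth(referrer_demo, -1)
from_the_end  <- nth(referrer_demo, -1)   # negative positions count from the end
out_of_range  <- nth(referrer_demo, 10)   # beyond the end: a missing value

first_value
second_value
last_value
out_of_range
```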