Readable Code with Pipes

Introduction


When you chain several operations, R code can become cramped and hard to read. The magrittr package by Stefan Milton Bache provides pipe operators that make such code easier to read.

  • pipes are especially useful when function calls are nested
  • they are similar in spirit to JavaScript’s method chaining
  • nested calls to functions taking multiple arguments are confusing and messy to read
  • with magrittr, you write code that reads from left to right

Pipes


R is a functional language, so code contains a lot of parentheses. Complex code leads to nested function calls, which are hard to read and maintain. Tidyverse packages such as dplyr make %>% available, but %<>% and %$% are only available after loading magrittr itself with library(magrittr). We will look at 3 different types of pipes:

  • %>% : pipe a value forward into an expression or function call
  • %<>% : pipe a value forward and assign the result back to the left hand side object instead of returning it
  • %$% : expose names within left hand side objects to right hand side expressions
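
As a minimal sketch, the three operators can be tried on the built-in mtcars data set, which is used here only because it ships with R. The variable names first_rows, avg_mpg and x are illustrative and not part of the tutorial.

```r
library(magrittr)

# %>% : pass the left hand side as the first argument of the call on the right
first_rows <- mtcars %>% head(3)   # same as head(mtcars, 3)

# %$% : expose the columns of the left hand side to the expression on the right
avg_mpg <- mtcars %$% mean(mpg)    # same as mean(mtcars$mpg)

# %<>% : pipe the left hand side forward and assign the result back to it
x <- c(-2, 4, -6)
x %<>% abs()                       # same as x <- abs(x)

print(first_rows)
print(avg_mpg)
print(x)
```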

Data

View Data


ecom

Data Dictionary


  • id: row id
  • referrer: referrer website/search engine
  • os: operating system
  • browser: browser
  • device: device used to visit the website
  • n_pages: number of pages visited
  • duration: time spent on the website (in seconds)
  • repeat: frequency of visits
  • country: country of origin
  • purchase: whether visitor purchased
  • order_value: order value of visitor (in dollars)

%>%

%<>%

%$%

Instructions


  • use %>% and tail() to get the last 10 rows of mtcars
# without pipe
head(ecom, 10)

# using pipe
ecom %>% head(10)

# tail
mtcars %>% tail(10)

Square Root

Introduction


y <- ecom$n_pages
y <- sqrt(y)
y

Square Root - Using Pipe 1


Now let us learn how to compute square root using pipe operators. In the above example, we have done two things:

  • assign n_pages to y using $
  • compute the square root of y and assign the result to y itself

We can expose a column from a data set using the %$% operator. For example, y <- mtcars %$% mpg will assign mpg to y. Similarly, we can assign the result of an operation performed on a variable back to that variable using the %<>% operator. Suppose you want to replace a variable with its absolute value. This is how you would normally do it:

y <- abs(y)

Using the %<>% operator, this is how you would achieve it:

y %<>% abs()
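
The assignment pipe can also start a longer chain, in which case the final result of the whole chain is stored back in the left hand side object. The sketch below assumes mtcars in place of ecom (which does not ship with R), and the value 20 is an arbitrary illustration.

```r
library(magrittr)

# expose mpg from mtcars and store it in y
y <- mtcars %$% mpg

# reference result computed with nested calls
expected <- sqrt(abs(y - 20))

# %<>% starts the chain; the result of the last step is assigned to y
y %<>% subtract(20) %>% abs() %>% sqrt()

print(head(y))
```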

Instructions


  • use %$% to assign n_pages from ecom to y
  • use %<>% to compute square root of y and assign it to y
# starter code
# select n_pages variable and assign it to y
y <- ecom

# compute square root of y and assign it to y
y

# solution
# select n_pages variable and assign it to y
y <- ecom %$% n_pages

# compute square root of y and assign it to y
y %<>% sqrt()

Square Root - Using Pipe 2


In the first example, we computed the square root of y in two steps while we could have achieved it in a single step like this:

y <- sqrt(ecom$n_pages)

What we are doing above is:

  • select n_pages from ecom
  • pass it on to sqrt()
  • assign the result to y

Instructions


Let us try to do this using pipes:

  • expose n_pages from ecom using %$%
  • pass it on to sqrt() using %>%
  • assign the result to y

We have written the first part for you.

# starter code
y <- ecom %$%
  n_pages

# solution
y <- ecom %$%
  n_pages %>%
  sqrt()

Correlation

Introduction


From the ecommerce data, we want to explore the relationship between number of visits and time spent on the site for those who purchase/convert. We can achieve this in the following steps:

  • filter data for those who have purchased/converted
  • compute correlation by selecting n_visit and duration
ecom1 <- subset(ecom, purchase == 'true')
cor(ecom1$n_visit, ecom1$duration)

Correlation - Using pipe

We can chain functions using pipe operators. For example, using mtcars, to compute the average miles per gallon for cars with eight cylinders we will write:

mtcars %>% 
  subset(cyl == 8) %$% 
  mean(mpg)

This is how you can read the above code:

  • filter data from mtcars where cyl == 8 using subset()
  • from the filtered data set expose mpg using %$% and pass it into mean()
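
The reading above can be checked by comparing the piped code with its nested equivalent. This is a small sketch; the names nested and piped are illustrative.

```r
library(magrittr)

# nested call, read from the inside out
nested <- mean(subset(mtcars, cyl == 8)$mpg)

# piped version, read from top to bottom
piped <- mtcars %>%
  subset(cyl == 8) %$%
  mean(mpg)

print(nested)
print(piped)
```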

Instructions

Let us use pipe operators to compute the correlation between n_visit and duration:

  • filter data for those who have purchased (purchase == 'true') using subset() and %>%
  • expose n_visit and duration using %$% and pass them on to cor()
# with pipe (starter code)
ecom %>%
  subset(purchase == 'true')

# with pipe (solution)
ecom %>%
  subset(purchase == 'true') %$%
  cor(n_visit, duration)

Visualization

Introduction


Let us look at a visualization example. From the ecommerce data set, we want to plot the distribution of referrers for those who have purchased/converted. We can decompose the code into the following steps:

  • subset data for those who have purchased/converted
  • extract the referrer column using $
  • compute the frequency using table()
  • pass the data to barplot()
barplot(table(subset(ecom, purchase == 'true')$referrer))

Visualization - Using Pipe


Let us build a barplot from mtcars data.

# without pipe
barplot(table(subset(mtcars, cyl == 8)$am))

# with pipe
mtcars %>%
  subset(cyl == 8) %$%
  am %>%
  table() %>%
  barplot()

Visualization - Practice

Instructions


Let us now use pipes to build the same plot. We have written the partial code for you:

  • pass on the referrer data to table() using %>%
  • pass on the result from the previous step to barplot() using %>%
# starter code
ecom %>%
  subset(purchase == 'true') %$%
  referrer

# solution
ecom %>%
  subset(purchase == 'true') %$%
  referrer %>%
  table() %>%
  barplot()

Regression

Introduction


summary(lm(duration ~ n_pages, data = ecom))

Regression - Using Pipe


# without pipe
summary(lm(disp ~ mpg, data = mtcars))

# with pipe
mtcars %$%
  lm(disp ~ mpg) %>%
  summary()
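
The exposition pipe is used here because the first argument of lm() is the formula, not the data, so %>% on its own would put mtcars in the wrong position. The sketch below compares %$% with one possible alternative: magrittr's dot placeholder, which this tutorial does not otherwise cover. The names fit_expose and fit_dot are illustrative.

```r
library(magrittr)

# exposition pipe: lm() finds disp and mpg among the exposed columns
fit_expose <- mtcars %$% lm(disp ~ mpg)

# forward pipe: the dot marks the argument where mtcars is inserted
fit_dot <- mtcars %>% lm(disp ~ mpg, data = .)

# both fits estimate the same coefficients
print(coef(fit_expose))
print(coef(fit_dot))
```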

Regression - Practice

Instructions


  • expose duration and n_pages from ecom using %$%
  • pass them on to lm()
  • pass the result from lm() to summary() using %>%
# starter code
ecom %$%

# solution
ecom %$%
  lm(duration ~ n_pages) %>%
  summary()

String Manipulation

Introduction


email <- 'jovialcann@anymail.com'

# without pipe
toupper(strtrim(strsplit(email, '@')[[1]][1], 6))

String Manipulation - With Pipe

Instructions

email %>%
  strsplit(split = '@') %>%
  extract2(1) %>%
  extract(1) %>%
  strtrim(width = 6) %>%
  toupper()
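
To see what each step of the chain contributes, the same pipeline can be split into intermediate values. This is a sketch; the names parts, pieces, local_part and result are illustrative.

```r
library(magrittr)

email <- 'jovialcann@anymail.com'

# strsplit() returns a list holding one character vector
parts <- email %>% strsplit(split = '@')

# extract2(1) is parts[[1]]: the character vector inside the list
pieces <- parts %>% extract2(1)

# extract(1) is pieces[1]: the text before the @
local_part <- pieces %>% extract(1)

# keep the first 6 characters and convert them to upper case
result <- local_part %>%
  strtrim(width = 6) %>%
  toupper()

print(result)
```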

Data Extraction

Introduction


  • extract()
  • extract2()
  • use_series()
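
These three functions are aliases for the base R extraction operators `[`, `[[` and `$`. The sketch below uses mtcars to show what each one returns; the variable names are illustrative.

```r
library(magrittr)

# extract() is an alias for `[`: the result is still a data frame
col_df <- mtcars %>% extract('mpg')

# extract2() is an alias for `[[`: the result is the column itself
col_vec <- mtcars %>% extract2('mpg')

# use_series() is an alias for `$`
col_dollar <- mtcars %>% use_series(mpg)

print(class(col_df))
print(class(col_vec))
print(identical(col_vec, col_dollar))
```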

Extract Column By Name


head(ecom['n_pages'], 3)

ecom %>%
  extract('n_pages') %>%
  head(3)

Extract Column By Position


head(ecom[6], 3)

ecom %>%
  extract(6) %>%
  head(3)

Extract Column (as vector)


head(ecom$n_pages)

ecom %>%
  use_series('n_pages') %>%
  head()

Extract List Element By Name


mt <- as.list(mtcars)

mt[['mpg']]

mt %>%
  extract2('mpg')

Extract List Element By Position


mt <- as.list(mtcars)

mt[[1]]

mt %>%
  extract2(1)

Extract List Element (as vector)


mt <- as.list(mtcars)

mt$mpg

mt %>%
  use_series(mpg)

Arithmetic Operations

Introduction


  • add()
  • subtract()
  • multiply_by()
  • multiply_by_matrix()
  • divide_by()
  • divide_by_int()
  • mod()
  • raise_to_power()
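
The sections that follow demonstrate add(), multiply_by(), divide_by() and raise_to_power(). As a brief sketch, the remaining aliases map onto base R operators as shown below; the variable names are illustrative.

```r
library(magrittr)

x <- 1:10

diff_x  <- x %>% subtract(1)        # x - 1
int_div <- x %>% divide_by_int(3)   # x %/% 3
rem     <- x %>% mod(3)             # x %% 3

# multiply_by_matrix() is an alias for %*%
m <- matrix(1:4, nrow = 2)
prod_m <- m %>% multiply_by_matrix(diag(2))   # m %*% diag(2)

print(diff_x)
print(int_div)
print(rem)
print(prod_m)
```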

Addition


1:10 %>%
  `+`(1)

1:10 %>%
  add(1)

Multiplication


1:10 %>%
  `*`(3)

1:10 %>%
  multiply_by(3)

Division


1:10 %>%
  `/`(2)

1:10 %>%
  divide_by(2)

Power


1:10 %>%
  `^`(2)

1:10 %>%
  raise_to_power(2)

Logical Operators

Introduction


  • and()
  • or()
  • equals()
  • not()
  • is_greater_than()
  • is_weakly_greater_than()
  • is_less_than()
  • is_weakly_less_than()
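
The sections that follow demonstrate is_greater_than() and is_weakly_greater_than(). As a brief sketch, the other aliases correspond to base R operators as shown below; the variable names are illustrative.

```r
library(magrittr)

x <- 1:10

lt  <- x %>% is_less_than(5)          # x < 5
lte <- x %>% is_weakly_less_than(5)   # x <= 5
eq  <- x %>% equals(5)                # x == 5
gt  <- x %>% is_greater_than(2)       # x > 2

both   <- lt %>% and(gt)              # x < 5 & x > 2
either <- lt %>% or(eq)               # x < 5 | x == 5
neg    <- lt %>% not()                # !(x < 5)

print(both)
print(either)
print(neg)
```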

Greater Than


1:10 %>%
  `>`(5)

1:10 %>%
  is_greater_than(5)

Weakly Greater Than


1:10 %>%
  `>=`(5)

1:10 %>%
  is_weakly_greater_than(5)