Readable Code with Pipes

Introduction


R code contain a lot of parentheses in case of a sequence of multiple operations. When you are dealing with complex code, it results in nested function calls which are hard to read and maintain. The magrittr package by Stefan Milton Bache provides pipes enabling us to write R code that is readable.

Pipes allow us to clearly express a sequence of multiple operations by:

  • structuring operations from left to right
  • avoiding
    • nested function calls
    • intermediate steps
    • overwriting of original data
  • minimizing creation of local variables

Pipes


If you are using tidyverse, magrittr will be automatically loaded. We will look at 3 different types of pipes:

  • %>% : pipe a value forward into an expression or function call
  • %<>%: result assigned to left hand side object instead of returning it
  • %$% : expose names within left hand side objects to right hand side expressions

Data

We will create a smaller data set from the above data to be used in some examples:

Data Dictionary

  • referrer: referrer website/search engine
  • n_pages: number of pages visited
  • duration: time spent on the website (in seconds)
  • purchase: whether visitor purchased

First Example

Let us start with a simple example. You must be aware of head(). If not, do not worry. It returns the first few observations/rows of data. We can specify the number of observations it should return as well. Let us use it to view the first 10 rows of our data set.

Using Pipe

Now let us do the same but with %>%.

Instructions


  • use %>% and tail() to get the last 10 rows of mtcars
head(ecom, 10)

# using pipe
ecom %>% head(10)

# tail
mtcars %>% tail(10)

Square Root

Introduction


Time to try a slightly more challenging example. We want the square root of n_pages column from the data set. To ensure the output does not clutter the page, we will view the first few observations using head().

##  [1] 2.449490 3.464102 1.000000 2.236068 4.242641 4.242641 1.000000
##  [8] 1.000000 1.000000 1.000000

Let us break down the above computation into small steps:

  • select/expose the n_pages column from ecom data
  • compute the square root
  • assign the first few observations to y


Square Root - Using Pipe


Now let us learn how to compute square root using pipe operators. In the above example, we have done two things:

  • assign n_pages to y using $
  • compuate square root of y and assign the result to y itself

We can assign expose a column from a data set using the %$% operator. For example, y <- mtcars %$% mpg will assign mpg to y. Similarly, we can assign the result of an operation performed on a variable to itself using %<>% operator. Let us assume you want to assign the absolute value of a variable to itself. This is how you would do it normally:

y <- abs(y)

Using %<>% operator, this is how you will achieve it:

y %<>% abs()

Instructions


  • use %$% to assing n_pages from ecom to y
  • use %<>% to compute square root of y and assign it to y
# select n_pages variable and assign it to y


# compute square root of y and assign it to y 
# select n_pages variable and assign it to y
y <-
    ecom_mini %$%
    n_pages

# compute square root of y and assign it to y 
y %<>% sqrt

Square Root - Using Pipe


In the first example, we computed the square root of y in two steps while we could have achieved it in a single step like this:

y <- sqrt(econ$n_pages)

What we are doing above is:

  • select n_pages from ecom
  • pass it on to sqrt()
  • assign the result to y

Instructions


Let us try to do this using pipes:

  • expose n_pages from ecom using %$%
  • pass it on to sqrt() using %>%
  • assign the result to y

We have written the first part for you.

y <- ecom %$% 
  n_pages %>% 
  sqrt()

Correlation

Introduction


Correlation is a statistical measure that indicates the extent to which two or more variables fluctuate together. In R, correlation is computed using cor(). Let us look at the correlation between the number of pages browsed and time spent on the site for visitors who purchased some product. Below are the steps for computing correlation:

  • extract rows where purchase is TRUE
  • select/expose n_pages and duration columns
  • use cor() to compute the correlation


# without pipe
ecom1 <- subset(ecom, purchase)
cor(ecom1$n_pages, ecom1$duration)
## [1] 0.4290905

Correlation - Using pipe

We can chain functions using pipe operators. For example, using mtcars, to compute the average miles per gallon for cars with eight cylinders we will write:

mtcars %>% 
  subset(cyl == 8) %$% 
  mean(mpg)

This is how you can read the above code:

  • filter data from mtcars where cyl == 8 using subset()
  • from the filtered data set expose mpg using %$% and pass it into mean()

Instructions

Let us use pipe operators to compute the correlation between n_pages and duration:

  • filter data for those who have purchased using subset() and %>%
  • expose n_pages and duration using %$% and pass them onto cor()
# with pipe
ecom %>%
  subset(purchase) 
# with pipe
ecom %>%
  subset(purchase) %$% 
  cor(n_pages, duration)

# with pipe
ecom %>%
  filter(purchase) %$% 
  cor(n_pages, duration)

Visualization

Introduction


Let us look at a data visualization example. We will create a bar plot to visualize the frequency of different referrer types that drove purchasers to the website. Let us look at the steps involved in creating the bar plot:

  • extract rows where purchase is TRUE
  • select/expose referrer column
  • tabulate referrer data using table()
  • use the tabulated data to create bar plot using barplot()
barplot(table(subset(ecom, purchase)$referrer))

Visualization - Using Pipe



Let us build a barplot from mtcars data.

# without pipe
barplot(table(subset(mtcars, cyl == 8)$am))

# with pipe
mtcars %>%`
  subset(cyl == 8) %$%
  am %>%
  table() %>%
  barplot()

Visualization - Practice

Instructions


Let us now use pipes to build the same plot. We have written the partial code for you:

  • pass on the referrer data to table() using %>%
  • pass on the result from the previous step to barplot() using %>%
ecom %>%
  subset(purchase) %$%
  referrer %>%
  table() %>%
  barplot()

Regression

Let us look at a regression example. We regress time spent on the site on number of pages visited. Below are the steps involved in running the regression:

  • use duration and n_pages columns from econ data
  • pass the above data to lm()
  • pass the output from lm() to summary()
summary(lm(duration ~ n_pages, data = ecom))

Regression - Using Pipe


# without pipe
summary(lm(disp ~ mpg, data = mtcars))

# with pipe
mtcars %$%`
  lm(disp ~ mpg) %>%
  summary()

Regression - Practice

Instructions


  • expose duration and n_pages from ecom using %$%
  • pass them onto lm()
  • pass the result from lm() to summary() using %>%
ecom %$%
  lm(duration ~ n_pages) %>%
  summary()

String Manipulation


We want to extract the first name (jovial) from the below email id and convert it to upper case. Below are the steps to achieve this:

  • split the email id using the pattern @ using str_split()
  • extract the first element from the resulting list using extract2()
  • extract the first element from the character vector using extract()
  • extract the first six characters using str_sub()
  • convert to upper case using str_to_upper()
## [1] "JOVIAL"

String Manipulation - With Pipe

Instructions

email %>%
  str_split(pattern = '@') %>%
  extract2(1) %>%
  extract(1) %>%
  str_sub(start = 1, end = 6) %>%
  str_to_upper()

Data Extraction

Let us turn our attention towards data extraction. magrittr provides alternatives to $, [ and [[.

  • extract()
  • extract2()
  • use_series()

Extract Column By Name

To extract a specific column using the column name, we mention the name of the column in single/double quotes within [ or [[. In case of $, we do not use quotes.

Let us extract the first 3 observations of n_pages column.

# base 
ecom_mini['n_pages']

# magrittr
extract(ecom_mini, 'n_pages') 

Extract Column By Position

We can extract columns using their index position. Keep in mind that index position starts from 1 in R. In the below example, we show how to extract n_pages column but instead of using the column name, we use the column position.

# base 
ecom_mini[2]

# magrittr
extract(ecom_mini, 2) 

Extract Column

One important differentiator between [ and [[ is that [[ will return a atomic vector and not a data.frame. $ will also return a atomic vector. In magrittr, we can use use_series() in place of $.

# base 
ecom_mini$n_pages

# magrittr
use_series(ecom_mini, 'n_pages') 

Extract List Element

Let us convert ecom_mini into a list using as.list() as shown below:

ecom_list <- as.list(ecom_mini)

To extract elements of a list, we can use extract2(). It is an alternative for [[.

# base 
ecom_list[['n_pages']]

# magrittr
extract2(ecom_list, 'n_pages')

Extract List Element


# base 
ecom_list[[1]]

# magrittr
extract2(ecom_list, 1)

Extract List Element

We can extract the elements of a list using use_series() as well.

# base 
ecom_list$n_pages

# magrittr
use_series(ecom_list, n_pages)

Arithmetic Operations

Introduction


  • add()
  • subtract()
  • multiply_by()
  • multiply_by_matrix()
  • divide_by()
  • divide_by_int()
  • mod()
  • raise_to_power()

Addition


1:10 + 1

add(1:10, 1)

`+`(1:10, 1)

Multiplication


1:10 * 3

multiply_by(1:10, 3)

`*`(1:10, 3)

Division


1:10 / 2

divide_by(1:10, 2)

`/`(1:10, 2)

Power


1:10 ^ 2

raise_to_power(1:10, 2)

`^`(1:10, 2)

Logical Operators

Introduction


  • and()
  • or()
  • equals()
  • not()
  • is_greater_than()
  • is_weakly_greater_than()
  • is_less_than()
  • is_weakly_less_than()

Greater Than


1:10 > 5

is_greater_than(1:10, 5)

`>`(1:10, 5)

Weakly Greater Than


1:10 >= 5

is_weakly_greater_than(1:10, 5)

`>=`(1:10, 5)

References


  • https://magrittr.tidyverse.org/
  • http://r4ds.had.co.nz/pipes.html