R code can accumulate a lot of parentheses when several operations are applied in sequence. In complex code, this leads to nested function calls that are hard to read and maintain. The magrittr package by Stefan Milton Bache provides pipes that let us write more readable R code.

Pipes allow us to clearly express a sequence of multiple operations by:

- structuring operations from left to right
- avoiding
  - nested function calls
  - intermediate steps
  - overwriting of original data
- minimizing creation of local variables

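If you are using tidyverse, the `%>%` pipe is available automatically, because packages such as dplyr re-export it. The other pipes covered here require loading magrittr explicitly with `library(magrittr)`. We will look at 3 different types of pipes: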

- `%>%`: pipe a value forward into an expression or function call
- `%<>%`: result assigned to left hand side object instead of returning it
- `%$%`: expose names within left hand side objects to right hand side expressions
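As a minimal sketch of the forward pipe, the snippet below writes the same computation once as a nested call and once with `%>%`. It uses the built-in `mtcars` data, chosen here only for illustration. `x %>% f(y)` is evaluated as `f(x, y)`, so both forms should return the same value.

```r
library(magrittr)

# nested call: read from the inside out
nested <- head(sort(mtcars$mpg), 3)

# piped call: read from left to right
piped <- mtcars$mpg %>%
  sort() %>%
  head(3)

# TRUE when both forms give the same result
identical(nested, piped)
```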

We will create a smaller data set from the above data to be used in some examples:

- referrer: referrer website/search engine
- n_pages: number of pages visited
- duration: time spent on the website (in seconds)
- purchase: whether visitor purchased
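The data itself is not reproduced here, so to try the snippets that follow you could use a stand-in with the same four columns. The sketch below is an assumption: the values are invented purely for illustration and will not reproduce the outputs shown in this post.

```r
# hypothetical stand-in for the ecom data; all values are made up
ecom <- data.frame(
  referrer = c("google", "yahoo", "direct", "bing",
               "social", "google", "direct", "yahoo"),
  n_pages  = c(1, 4, 9, 16, 25, 4, 1, 9),
  duration = c(30, 120, 400, 610, 900, 95, 20, 350),
  purchase = c(FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# a smaller data set taken from the data above
ecom_mini <- head(ecom, 5)

str(ecom_mini)
```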

Let us start with a simple example. You must be aware of `head()`. If not, do not worry. It returns the first few observations/rows of data. We can specify the number of observations it should return as well. Let us use it to view the first 10 rows of our data set.

Now let us do the same but with `%>%`.

```
head(ecom, 10)
# using pipe
ecom %>% head(10)
```

- use `%>%` and `tail()` to get the last 10 rows of `mtcars`

```
# tail
mtcars %>% tail(10)
```

Time to try a slightly more challenging example. We want the square root of the `n_pages` column from the data set. To ensure the output does not clutter the page, we will view the first few observations using `head()`.

```
## [1] 2.449490 3.464102 1.000000 2.236068 4.242641 4.242641 1.000000
## [8] 1.000000 1.000000 1.000000
```

Let us break down the above computation into small steps:

- select/expose the `n_pages` column from `ecom` data
- compute the square root
- assign the first few observations to `y`

Now let us learn how to compute square root using pipe operators. In the above example, we have done two things:

- assign `n_pages` to `y` using `$`
- compute square root of `y` and assign the result to `y` itself

We can expose a column from a data set using the `%$%` operator. For example, `y <- mtcars %$% mpg` will assign `mpg` to `y`. Similarly, we can assign the result of an operation performed on a variable to itself using the `%<>%` operator. Let us assume you want to assign the absolute value of a variable to itself. This is how you would do it normally:

`y <- abs(y)`

Using the `%<>%` operator, this is how you will achieve it:

`y %<>% abs()`
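A small sketch on the built-in `mtcars` data (used here only as a stand-in) puts both operators side by side with their base R counterparts. `%<>%` pipes the left hand side into the function and writes the result back to the same name.

```r
library(magrittr)

# expose mpg from mtcars and assign it to y; same as y <- mtcars$mpg
y <- mtcars %$% mpg

# keep a copy so the result can be checked afterwards
y_before <- y

# overwrite y with its square root; same as y <- sqrt(y)
y %<>% sqrt()

# TRUE when the compound assignment matches the plain version
all.equal(y, sqrt(y_before))
```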

- use `%$%` to assign `n_pages` from `ecom` to `y`
- use `%<>%` to compute square root of `y` and assign it to `y`

```
# select n_pages variable and assign it to y
# compute square root of y and assign it to y
```

```
# select n_pages variable and assign it to y
y <-
  ecom_mini %$%
  n_pages
# compute square root of y and assign it to y
y %<>% sqrt
```

In the first example, we computed the square root of `y` in two steps while we could have achieved it in a single step like this:

`y <- sqrt(ecom$n_pages)`

What we are doing above is:

- select `n_pages` from `ecom`
- pass it on to `sqrt()`
- assign the result to `y`

Let us try to do this using pipes:

- expose `n_pages` from `ecom` using `%$%`
- pass it on to `sqrt()` using `%>%`
- assign the result to `y`

We have written the first part for you.

```
y <- ecom %$%
  n_pages %>%
  sqrt()
```
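The chain needs no extra parentheses because all `%...%` operators share the same precedence and are evaluated from left to right, while the assignment with `<-` happens last. A sketch with the built-in `mtcars` data (a stand-in for `ecom`) writes the implicit grouping out:

```r
library(magrittr)

# one chain: expose mpg, then take the square root, then assign
y <- mtcars %$%
  mpg %>%
  sqrt()

# the same chain with the implicit grouping made explicit
y_grouped <- (mtcars %$% mpg) %>% sqrt()

# both agree with each other and with the nested call
identical(y, y_grouped)
identical(y, sqrt(mtcars$mpg))
```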

Correlation is a statistical measure that indicates the extent to which two or more variables fluctuate together. In R, correlation is computed using `cor()`. Let us look at the correlation between the number of pages browsed and time spent on the site for visitors who purchased some product. Below are the steps for computing correlation:

- extract rows where purchase is TRUE
- select/expose `n_pages` and `duration` columns
- use `cor()` to compute the correlation

```
# without pipe
ecom1 <- subset(ecom, purchase)
cor(ecom1$n_pages, ecom1$duration)
```

`## [1] 0.4290905`

We can chain functions using pipe operators. For example, using `mtcars`, to compute the average miles per gallon for cars with eight cylinders we will write:

```
mtcars %>%
  subset(cyl == 8) %$%
  mean(mpg)
```

This is how you can read the above code:

- filter data from `mtcars` where `cyl == 8` using `subset()`
- from the filtered data set expose `mpg` using `%$%` and pass it into `mean()`

Let us use pipe operators to compute the correlation between `n_pages` and `duration`:

- filter data for those who have purchased using `subset()` and `%>%`
- expose `n_pages` and `duration` using `%$%` and pass them onto `cor()`

```
# with pipe
ecom %>%
  subset(purchase)
```

```
# with pipe
ecom %>%
  subset(purchase) %$%
  cor(n_pages, duration)
# with pipe, using dplyr::filter() instead of subset() (requires dplyr)
ecom %>%
  filter(purchase) %$%
  cor(n_pages, duration)
```
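For intuition, `%$%` plays a role similar to base R's `with()`: the column names of the left hand side become visible to the expression on the right. This helps with functions such as `cor()` that have no `data` argument. A hedged sketch on the built-in `mtcars` data, with `mpg` and `wt` standing in for `n_pages` and `duration`:

```r
library(magrittr)

# nested base R version using with()
r_base <- with(subset(mtcars, cyl == 8), cor(mpg, wt))

# piped version: filter rows, then expose columns to cor()
r_pipe <- mtcars %>%
  subset(cyl == 8) %$%
  cor(mpg, wt)

# TRUE when both routes give the same correlation
all.equal(r_base, r_pipe)
```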

Let us look at a data visualization example. We will create a bar plot to visualize the frequency of different referrer types that drove purchasers to the website. Let us look at the steps involved in creating the bar plot:

- extract rows where purchase is `TRUE`
- select/expose `referrer` column
- tabulate referrer data using `table()`
- use the tabulated data to create bar plot using `barplot()`

`barplot(table(subset(ecom, purchase)$referrer))`

Let us build a barplot from `mtcars` data.

```
# without pipe
barplot(table(subset(mtcars, cyl == 8)$am))
# with pipe
mtcars %>%
  subset(cyl == 8) %$%
  am %>%
  table() %>%
  barplot()
```

Let us now use pipes to build the same plot. We have written the partial code for you:

- pass on the referrer data to `table()` using `%>%`
- pass on the result from the previous step to `barplot()` using `%>%`

```
ecom %>%
  subset(purchase) %$%
  referrer %>%
  table() %>%
  barplot()
```

Let us look at a regression example. We regress time spent on the site on number of pages visited. Below are the steps involved in running the regression:

- use `duration` and `n_pages` columns from `ecom` data
- pass the above data to `lm()`
- pass the output from `lm()` to `summary()`

`summary(lm(duration ~ n_pages, data = ecom))`

```
# without pipe
summary(lm(disp ~ mpg, data = mtcars))
# with pipe
mtcars %$%
  lm(disp ~ mpg) %>%
  summary()
```

- expose `duration` and `n_pages` from `ecom` using `%$%`
- pass them onto `lm()`
- pass the result from `lm()` to `summary()` using `%>%`

```
ecom %$%
  lm(duration ~ n_pages) %>%
  summary()
```
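An alternative sketch, not used in the text above: `%>%` normally inserts the left hand side as the first argument, but `lm()` expects the formula first. magrittr's `.` placeholder lets you send the data to the `data` argument instead. The example uses `mtcars` as a stand-in and checks that both routes give the same coefficients.

```r
library(magrittr)

# expose the columns so lm() finds disp and mpg directly
fit_expose <- mtcars %$%
  lm(disp ~ mpg)

# pass the data frame to the data argument via the . placeholder
fit_dot <- mtcars %>%
  lm(disp ~ mpg, data = .)

# TRUE when both fits have the same coefficients
all.equal(coef(fit_expose), coef(fit_dot))
```

One practical difference is that the second form keeps a `data` argument in the stored call, which some downstream functions rely on.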

We want to extract the first name (jovial) from the below email id and convert it to upper case. Below are the steps to achieve this:

- split the email id using the pattern `@` using `str_split()`
- extract the first element from the resulting list using `extract2()`
- extract the first element from the character vector using `extract()`
- extract the first six characters using `str_sub()`
- convert to upper case using `str_to_upper()`

`## [1] "JOVIAL"`

```
email %>%
  str_split(pattern = '@') %>%
  extract2(1) %>%
  extract(1) %>%
  str_sub(start = 1, end = 6) %>%
  str_to_upper()
```
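Because the email id itself is not shown above, the sketch below runs the same chain on a hypothetical address, `jovial@example.com`, which is an assumption made only so the code is runnable. It loads magrittr and stringr explicitly. Note that `extract()` here is magrittr's alias for `[`, which can be masked by `tidyr::extract()` if tidyr is attached afterwards.

```r
library(magrittr)
library(stringr)

# hypothetical email id, used only for illustration
email <- "jovial@example.com"

first_name <- email %>%
  str_split(pattern = "@") %>%      # list holding one character vector
  extract2(1) %>%                   # the character vector: name, domain
  extract(1) %>%                    # the part before the @
  str_sub(start = 1, end = 6) %>%   # first six characters
  str_to_upper()                    # convert to upper case

first_name
```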

Let us turn our attention towards data extraction. magrittr provides alternatives to `$`, `[` and `[[`.

- `extract()`
- `extract2()`
- `use_series()`

To extract a specific column using the column name, we mention the name of the column in single/double quotes within `[` or `[[`. In case of `$`, we do not use quotes.

Let us extract the `n_pages` column.

```
# base
ecom_mini['n_pages']
# magrittr
extract(ecom_mini, 'n_pages')
```

We can extract columns using their index position. Keep in mind that index position starts from **1** in R. In the below example, we show how to extract the `n_pages` column but instead of using the column name, we use the column position.

```
# base
ecom_mini[2]
# magrittr
extract(ecom_mini, 2)
```

One important differentiator between `[` and `[[` is that `[[` will return an atomic vector and not a `data.frame`. `$` will also return an atomic vector. In magrittr, we can use `use_series()` in place of `$`.

```
# base
ecom_mini$n_pages
# magrittr
use_series(ecom_mini, n_pages)
```

Let us convert `ecom_mini` into a list using `as.list()` as shown below:

`ecom_list <- as.list(ecom_mini)`

To extract elements of a list, we can use `extract2()`. It is an alternative for `[[`.

```
# base
ecom_list[['n_pages']]
# magrittr
extract2(ecom_list, 'n_pages')
```

```
# base
ecom_list[[1]]
# magrittr
extract2(ecom_list, 1)
```

We can extract the elements of a list using `use_series()` as well.

```
# base
ecom_list$n_pages
# magrittr
use_series(ecom_list, n_pages)
```
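Because these aliases are ordinary functions, they slot into a pipeline where `[`, `[[` and `$` would be awkward to write. A minimal sketch using the built-in `mtcars` data (a stand-in for `ecom_mini`) also shows which ones return a `data.frame` and which return a vector:

```r
library(magrittr)

# extract() is `[`: the result is still a data.frame
col_df <- mtcars %>% extract("mpg")

# extract2() is `[[`: the result is a numeric vector
col_vec <- mtcars %>% extract2("mpg")

# use_series() is `$`: the column name is given without quotes
col_series <- mtcars %>% use_series(mpg)

is.data.frame(col_df)
identical(col_vec, col_series)
```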

magrittr also provides aliases for arithmetic operators:

- `add()`
- `subtract()`
- `multiply_by()`
- `multiply_by_matrix()`
- `divide_by()`
- `divide_by_int()`
- `mod()`
- `raise_to_power()`

```
1:10 + 1
add(1:10, 1)
`+`(1:10, 1)
```

```
1:10 * 3
multiply_by(1:10, 3)
`*`(1:10, 3)
```

```
1:10 / 2
divide_by(1:10, 2)
`/`(1:10, 2)
```

```
# parentheses are needed because ^ binds more tightly than :
(1:10) ^ 2
raise_to_power(1:10, 2)
`^`(1:10, 2)
```
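These aliases are most useful inside a pipeline, where an infix operator cannot be chained directly. The sketch below also illustrates a precedence detail worth keeping in mind: `^` binds more tightly than `:`, so a sequence needs parentheses when it is squared with the infix operator.

```r
library(magrittr)

# chain arithmetic steps: multiply by 3, then add 1
res <- 1:5 %>%
  multiply_by(3) %>%
  add(1)

# squares of 1 to 5, written three equivalent ways
sq_infix <- (1:5)^2
sq_alias <- raise_to_power(1:5, 2)
sq_pipe  <- 1:5 %>% raise_to_power(2)

# without parentheses, 1:5^2 is the sequence 1:25
length(1:5^2)
```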

Similar aliases exist for logical and comparison operators:

- `and()`
- `or()`
- `equals()`
- `not()`
- `is_greater_than()`
- `is_weakly_greater_than()`
- `is_less_than()`
- `is_weakly_less_than()`

```
1:10 > 5
is_greater_than(1:10, 5)
`>`(1:10, 5)
```

```
1:10 >= 5
is_weakly_greater_than(1:10, 5)
`>=`(1:10, 5)
```
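As a small sketch of how the comparison aliases read in a pipeline, the snippet below counts and selects the values of a vector that are at least 5. The vector `x` is made up for illustration.

```r
library(magrittr)

# made-up values, for illustration only
x <- c(3, 8, 5, 1, 9)

# logical vector marking the values that are >= 5
keep <- x %>% is_weakly_greater_than(5)

# count the matches, then pull them out with extract(), the alias for `[`
n_keep <- keep %>% sum()
x_keep <- x %>% extract(keep)
```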