ggplot2: Geoms

## Agenda

Use the geom_* functions to create some of the most commonly used plots for exploring data:

• Scatter Plot
• Bar Plot
• Box Plot
• Histogram
• Line Chart
• Regression Line

## Data

### Introduction

ecom

### Data Dictionary

• id: row id
• referrer: referrer website/search engine
• os: operating system
• browser: browser
• device: device used to visit the website
• n_pages: number of pages visited
• duration: time spent on the website (in seconds)
• repeat: frequency of visits
• country: country of origin
• purchase: whether visitor purchased
• order_value: order value of visitor (in dollars)

## Point

### Introduction

A scatter plot displays the relationship between two continuous variables. In ggplot2, we can build one using geom_point(). A scatter plot can reveal:

• the strength of the relationship between the variables
• the direction of the relationship between the variables
• whether outliers exist

### Example

ggplot(mtcars, aes(x = disp, y = mpg)) +
geom_point()
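
The strength and direction that a scatter plot shows visually can also be summarised with a correlation coefficient. The following is a minimal sketch, assuming the Pearson correlation is an appropriate summary for these two variables; the object names `r` and `p_scatter` are illustrative only.

```r
library(ggplot2)

# Pearson correlation: the sign gives the direction of the relationship,
# the magnitude (between 0 and 1) gives its strength
r <- cor(mtcars$disp, mtcars$mpg)
print(r)

# Same scatter plot as in the example, with the correlation in the title
p_scatter <- ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point() +
  ggtitle(paste("disp vs mpg, r =", round(r, 2)))
print(p_scatter)
```

A negative value of `r` corresponds to a downward-sloping cloud of points; values close to -1 or 1 indicate a strong linear relationship.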

### Instructions

• set x to wt
• set y to mpg
• create a scatter plot by representing the data using points

Template:

ggplot(mtcars, aes(x = ___, y = ___)) +
geom_point()

Solution:

ggplot(mtcars, aes(x = wt, y = mpg)) +
geom_point()

## Bar

### Introduction

Bar plots present grouped data with rectangular bars. The length of each bar represents either the frequency of a group or a value associated with it. Bar plots can be:

• horizontal
• vertical
• grouped
• stacked
• proportional

### Example

ggplot(mtcars, aes(x = cyl)) +
geom_bar()
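
The bar plot variants listed in the introduction can be sketched with mtcars as follows. This is an illustrative sketch rather than the only way to build them: the choice of `am` as the grouping variable and the `p_*` object names are assumptions made for the example.

```r
library(ggplot2)

# Vertical bars: number of cars per cylinder class
# (factor() makes the numeric cyl column discrete)
p_vertical <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar()

# Horizontal bars: the same plot with the axes flipped
p_horizontal <- p_vertical + coord_flip()

# Stacked bars (the default position): split each bar by transmission type
p_stacked <- ggplot(mtcars, aes(x = factor(cyl), fill = factor(am))) +
  geom_bar()

# Grouped bars: place the groups side by side
p_grouped <- ggplot(mtcars, aes(x = factor(cyl), fill = factor(am))) +
  geom_bar(position = "dodge")

# Proportional bars: rescale every stack to a total height of 1
p_proportional <- ggplot(mtcars, aes(x = factor(cyl), fill = factor(am))) +
  geom_bar(position = "fill")

print(p_grouped)
```

Only the `position` argument changes between the stacked, grouped and proportional versions, so switching between them is a one-word edit.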

### Instructions

• set x to device
• represent the data using bars

Template:

ggplot(ecom, aes(x = factor(___))) +
geom_bar()

Solution:

ggplot(ecom, aes(x = factor(device))) +
geom_bar()

## Boxplot

### Introduction

Boxplots are very handy when you want to:

• examine the distribution of a variable
• detect outliers

### Example

ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
geom_boxplot()
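
To see which observations a boxplot draws as outliers, the usual rule (points more than 1.5 times the interquartile range beyond the quartiles, which is the default in geom_boxplot()) can be applied by hand. This is a rough sketch; the helper `is_outlier()` is hypothetical and not part of ggplot2.

```r
library(ggplot2)

# Hypothetical helper: TRUE for values beyond 1.5 * IQR from the quartiles
is_outlier <- function(x) {
  q1 <- quantile(x, 0.25, names = FALSE)
  q3 <- quantile(x, 0.75, names = FALSE)
  iqr <- q3 - q1
  x < q1 - 1.5 * iqr | x > q3 + 1.5 * iqr
}

# Apply the rule within each cylinder group; ave() returns 0/1 here
# because the logical result is stored in a numeric vector
flags <- ave(mtcars$mpg, mtcars$cyl, FUN = is_outlier) == 1
outliers <- mtcars[flags, c("cyl", "mpg")]
print(outliers)

# Highlight the outlying points in the boxplot itself
p_box <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot(outlier.colour = "red")
print(p_box)
```

The rows printed in `outliers` should correspond to the red points in the plot.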

### Instructions

• set x to device
• set y to n_pages
• represent the data using a boxplot

Template:

ggplot(ecom, aes(x = factor(___), y = ___)) +
geom_boxplot()

Solution:

ggplot(ecom, aes(x = factor(device), y = n_pages)) +
geom_boxplot()

## Histogram - Part 1

### Introduction

Histograms are used to examine:

• the distribution of a continuous variable
• its skewness and kurtosis

### Example

ggplot(mtcars, aes(x = mpg)) +
geom_histogram()

Because no bin count is specified, geom_histogram() falls back to 30 bins and prints the message: stat_bin() using bins = 30. Pick better value with binwidth.
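
The skewness and kurtosis that a histogram hints at can also be computed directly. The sketch below uses the plain moment-based definitions in base R (under this definition a normal distribution has a kurtosis of 3); dedicated packages may apply slightly different bias corrections, so treat the numbers as indicative.

```r
x <- mtcars$mpg
m <- mean(x)

# Second central moment (population variance, divides by n)
s2 <- mean((x - m)^2)

# Skewness: third standardised moment; positive means a longer right tail
skewness <- mean((x - m)^3) / s2^1.5

# Kurtosis: fourth standardised moment; larger means heavier tails
kurtosis <- mean((x - m)^4) / s2^2

print(c(skewness = skewness, kurtosis = kurtosis))
```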

### Instructions

• set x to duration
• represent the data using a histogram

Template:

ggplot(ecom, aes(x = ___)) +
geom_histogram()

Solution:

ggplot(ecom, aes(x = duration)) +
geom_histogram()

## Histogram - Part 2

### Example

ggplot(mtcars, aes(x = mpg)) +
geom_histogram(bins = 5)
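
Instead of fixing the number of bins, the width of each bin can be set with the `binwidth` argument of geom_histogram(). A small sketch for comparison; the width of 5 (in mpg units) is an arbitrary choice for illustration.

```r
library(ggplot2)

# Fix the number of bins; ggplot2 derives the bin width from the data range
p_bins <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(bins = 5)

# Fix the width of each bin; the number of bins then follows from the range
p_binwidth <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 5)

print(p_binwidth)
```

Setting `binwidth` is often easier to interpret because the width is expressed in the units of the variable.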

### Instructions

• set x to duration
• represent the data using a histogram
• set the number of bins to 5

Template:

ggplot(ecom, aes(x = ___)) +
geom_histogram(bins = ___)

Solution:

ggplot(ecom, aes(x = duration)) +
geom_histogram(bins = 5)

## Line - Part 1

### Introduction

Line charts are used to examine trends over time.

### Data

gdp

### Example

ggplot(gdp, aes(year, china)) +
geom_line()
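
Since the gdp data is not bundled with ggplot2, the same idea can be tried with the `economics` dataset that ships with the package. This is a sketch using that dataset as a stand-in; the second plot assumes a wide layout with one column per series, which is how gdp appears to be organised (a `year` column plus one column per country).

```r
library(ggplot2)

# Single series: number of unemployed over time
p_line <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line()

# Two series stored in separate columns: add one geom_line() layer per column
# and map a constant label to colour so that a legend is drawn
p_two <- ggplot(economics, aes(x = date)) +
  geom_line(aes(y = psavert, colour = "psavert")) +
  geom_line(aes(y = uempmed, colour = "uempmed")) +
  labs(y = "value", colour = "series")

print(p_two)
```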

## Line - Part 2

### Instructions

• set x to year
• set y to india
• represent the data using a line

Template:

ggplot(gdp, aes(x = ___, y = ___)) +
geom_line()

Solution:

ggplot(gdp, aes(x = year, y = india)) +
geom_line()

## Label

### Example

ggplot(mtcars, aes(disp, mpg, label = rownames(mtcars))) +
geom_label()

## Text

### Example

ggplot(mtcars, aes(disp, mpg, label = rownames(mtcars))) +
geom_text()
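
Labels are usually easier to read when they are combined with the points they describe. The following is one possible sketch; the `vjust` and `size` values are arbitrary choices, and `check_overlap = TRUE` drops labels that would overlap earlier ones.

```r
library(ggplot2)

p_labelled <- ggplot(mtcars, aes(disp, mpg, label = rownames(mtcars))) +
  geom_point() +
  # Nudge the text above each point and skip labels that would collide
  geom_text(vjust = -0.8, size = 3, check_overlap = TRUE)

print(p_labelled)
```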