ggplot2: Geoms

Agenda

Create some of the most routinely used plots to explore data using the geom_* functions:

• Scatter Plot
• Bar Plot
• Box Plot
• Histogram
• Line Chart
• Regression Line

Data

Introduction

The exercises use the ecom data set, which records visits to an e-commerce website.

Data Dictionary

• id: row id
• referrer: referrer website/search engine
• os: operating system
• browser: web browser used to visit the website
• device: device used to visit the website
• n_pages: number of pages visited
• duration: time spent on the website (in seconds)
• repeat: frequency of visits
• country: country of origin
• purchase: whether the visitor made a purchase
• order_value: value of the visitor's order (in dollars)

Point

Introduction

A scatter plot displays the relationship between two continuous variables. In ggplot2, we build a scatter plot using geom_point(). Scatter plots can show you visually:

• the strength of the relationship between the variables
• the direction of the relationship between the variables
• whether outliers exist
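The strength and direction that a scatter plot shows visually can also be summarised with a single number. The following is a minimal base R sketch. It assumes the Pearson correlation coefficient from cor() as that summary; geom_point() itself computes nothing like this. It uses the wt and mpg columns of mtcars from the exercise in this section:

```r
# Pearson correlation between car weight (wt) and fuel economy (mpg)
r <- cor(mtcars$wt, mtcars$mpg)

# The sign gives the direction, the absolute value (0 to 1) the strength
direction <- if (r < 0) "negative" else "positive"
strength  <- abs(r)

cat(sprintf("r = %.2f: %s relationship, strength %.2f\n", r, direction, strength))
```

For mtcars, r is about -0.87, so heavier cars tend to have lower mpg. The correlation only captures linear relationships and is itself sensitive to outliers. It complements the plot rather than replacing it.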

Example

library(ggplot2)

ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point()

Instructions

• set x to wt
• set y to mpg
• create a scatter plot by representing the data using points
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

Bar

Introduction

Bar plots present grouped data with rectangular bars. The height of each bar may represent the frequency (count) of a group or a value taken from the data. Bar plots can be:

• horizontal
• vertical
• grouped
• stacked
• proportional
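When bars represent frequencies, geom_bar() counts the rows in each group (its default stat is "count"), so the bar heights are simply group counts. When the heights are values already stored in the data, geom_col() is the usual choice instead. The following is a base R sketch of the counts behind the cyl example in this section, using table():

```r
# Number of cars in each cylinder group: these are the bar heights
# that geom_bar() draws for aes(x = factor(cyl))
counts <- table(factor(mtcars$cyl))
print(counts)
```

For mtcars this gives 11 four-cylinder, 7 six-cylinder and 14 eight-cylinder cars.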

Example

ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar()

Instructions

• set x to device
• represent the data using bars
ggplot(ecom, aes(x = factor(device))) +
  geom_bar()

Boxplot

Introduction

Boxplots are handy for:

• examining the distribution of a variable
• detecting outliers
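A boxplot summarises each group by its quartiles and flags outliers. The following is a base R sketch of that summary. It assumes the default rule documented for geom_boxplot(): points more than 1.5 times the interquartile range (IQR) beyond the lower or upper quartile are drawn individually as outliers. The helper name box_summary() is our own:

```r
# Summarise one numeric vector the way a boxplot does
box_summary <- function(y) {
  q <- quantile(y, c(0.25, 0.5, 0.75))  # lower hinge, median, upper hinge
  iqr <- q[[3]] - q[[1]]
  lower_fence <- q[[1]] - 1.5 * iqr
  upper_fence <- q[[3]] + 1.5 * iqr
  list(
    quartiles = q,
    outliers  = y[y < lower_fence | y > upper_fence]
  )
}

# Apply it to mpg within each cylinder group, as in the mpg-by-cyl example
by_cyl <- lapply(split(mtcars$mpg, mtcars$cyl), box_summary)
by_cyl[["8"]]$outliers
```

Only the eight-cylinder group has outliers here: two cars at 10.4 mpg and one at 19.2 mpg.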

Example

ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot()

Instructions

• set x to device
• set y to n_pages
• represent the data using a boxplot
ggplot(ecom, aes(x = factor(device), y = n_pages)) +
  geom_boxplot()

Histogram - Part 1

Introduction

Histograms are used to examine:

• the distribution of a continuous variable
• its skewness and kurtosis
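Skewness and kurtosis can be read off the shape of a histogram, and they can also be computed directly. The following is a minimal base R sketch. It assumes the simple moment-based definitions; other definitions with small-sample corrections exist. The helper names skewness() and excess_kurtosis() are our own:

```r
# Moment-based skewness: third central moment scaled by sd^3
skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Moment-based excess kurtosis: 0 for a normal distribution
excess_kurtosis <- function(x) {
  m <- mean(x)
  mean((x - m)^4) / mean((x - m)^2)^2 - 3
}

skewness(mtcars$mpg)  # > 0: longer right tail
excess_kurtosis(mtcars$mpg)
```

A positive skewness matches a histogram whose bars trail off to the right.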

Example

ggplot(mtcars, aes(x = mpg)) +
  geom_histogram()
## stat_bin() using bins = 30. Pick better value with binwidth.

Instructions

• set x to duration
• represent the data using a histogram
ggplot(ecom, aes(x = duration)) +
  geom_histogram()

Histogram - Part 2

Example

ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(bins = 5)

Instructions

• set x to duration
• represent the data using a histogram
• set the number of bins to 5
ggplot(ecom, aes(x = duration)) +
  geom_histogram(bins = 5)
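Setting bins = 5 splits the range of x into five equal-width intervals and draws one bar per interval, with a height equal to the number of observations in it. The following base R sketch shows the idea with cut(). It is for intuition only: ggplot2 places its bin edges slightly differently, so these counts need not match the plotted bars exactly.

```r
# Split mpg into 5 equal-width intervals and count observations in each
binned <- cut(mtcars$mpg, breaks = 5)
bin_counts <- table(binned)
print(bin_counts)
```

Fewer bins give a smoother but coarser picture of the distribution. More bins show more detail but also more noise.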

Line - Part 1

Introduction

Line charts are used to examine trends over time.
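Because a line chart shows a trend over time, the points must be joined in time order. geom_line() does this by connecting observations in order of the x variable, whereas geom_path() follows the row order of the data. The following is a base R sketch of that ordering step. The gdp data set is not reproduced here, so it assumes the built-in LakeHuron series (annual lake levels) as stand-in data:

```r
# Build a data frame from the built-in LakeHuron time series
lake <- data.frame(year  = as.numeric(time(LakeHuron)),
                   level = as.numeric(LakeHuron))

# Shuffle the rows to mimic data that is not stored in time order
set.seed(1)
shuffled <- lake[sample(nrow(lake)), ]

# Sort by the x variable before joining points, as geom_line() does
ordered <- shuffled[order(shuffled$year), ]
head(ordered)
```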

Data

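The examples in this section use the gdp data set, which has a year column and one GDP column per country (for example, china and india).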

Example

ggplot(gdp, aes(x = year, y = china)) +
  geom_line()

Line - Part 2

Instructions

• set x to year
• set y to india
• represent the data using a line
ggplot(gdp, aes(x = year, y = india)) +
  geom_line()

Label

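Instructions

Label each point with the car's name, taken from rownames(mtcars), using geom_label(), which draws each label inside a rectangle.

ggplot(mtcars, aes(disp, mpg, label = rownames(mtcars))) +
  geom_label()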

Text

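Instructions

Label each point with the car's name using geom_text(), which draws plain text without a background rectangle.

ggplot(mtcars, aes(disp, mpg, label = rownames(mtcars))) +
  geom_text()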
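Many of the 32 cars in mtcars sit close together, so several text labels overlap. geom_text() has a check_overlap argument that skips any label overlapping one already drawn. The following is a sketch; enabling it here is our choice, not part of the original example:

```r
library(ggplot2)

# Same labelled scatter plot, but labels that would overlap
# previously drawn labels are dropped
p <- ggplot(mtcars, aes(disp, mpg, label = rownames(mtcars))) +
  geom_text(check_overlap = TRUE)

print(p)
```

Labels are drawn in data order, so which labels survive depends on the row order of mtcars.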