ggplot2: Geoms

Agenda


Create some of the most routinely used plots to explore data using the geom_* functions:

  • Scatter Plot
  • Bar Plot
  • Box Plot
  • Histogram
  • Line Chart
  • Regression Line

Introduction

Data


ecom

Data Dictionary


  • id: row id
  • referrer: referrer website/search engine
  • os: operating system
  • browser: browser
  • device: device used to visit the website
  • n_pages: number of pages visited
  • duration: time spent on the website (in seconds)
  • repeat: frequency of visits
  • country: country of origin
  • purchase: whether visitor purchased
  • order_value: order value of visitor (in dollars)

Point

Introduction


A scatter plot displays the relationship between two continuous variables. In ggplot2, we can build a scatter plot using geom_point(). A scatter plot can show:

  • the strength of the relationship between the variables
  • the direction of the relationship between the variables
  • and whether outliers exist

Example


ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point()
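
To put a number on the strength and direction seen in the plot, the correlation coefficient can be computed alongside it. This is a minimal sketch using the built-in mtcars data; the names r and p are illustrative only and not part of the original slides.

```r
library(ggplot2)

# Pearson correlation: the sign gives the direction, the magnitude the strength
r <- cor(mtcars$disp, mtcars$mpg)

# Same scatter plot as in the example, with the correlation shown in the title
p <- ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point() +
  ggtitle(paste("r =", round(r, 2)))
p
```

Here r is negative, matching the downward trend of the points.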

Instructions


  • set x to wt
  • set y to mpg
  • create a scatter plot by representing the data using points

# template: fill in the blanks
ggplot(mtcars, aes(x = ___, y = ___)) +
  ___()

# solution
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

Bar

Introduction


Bar plots present grouped data with rectangular bars. The height of each bar may represent the frequency of a group or a value associated with it. Bar plots can be:

  • horizontal
  • vertical
  • grouped
  • stacked
  • proportional

Example


ggplot(mtcars, aes(x = cyl)) +
  geom_bar()
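
The variants listed in the introduction can be sketched with the same mtcars data. This is an illustration under assumptions, not the deck's own code: the plot names and the use of gear as the fill variable are choices made here.

```r
library(ggplot2)

# Vertical bars of counts; factor() makes cyl a discrete grouping variable
p_vertical <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar()

# Horizontal: flip the coordinate system
p_horizontal <- p_vertical + coord_flip()

# A second variable mapped to fill splits each bar into segments
base <- ggplot(mtcars, aes(x = factor(cyl), fill = factor(gear)))

p_stacked      <- base + geom_bar()                    # stacked (the default)
p_grouped      <- base + geom_bar(position = "dodge")  # side by side
p_proportional <- base + geom_bar(position = "fill")   # each bar scaled to 1
```

Print any of the objects, for example p_grouped, to draw the plot.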

Instructions


  • set x to device
  • represent the data using bars

# template: fill in the blanks
ggplot(ecom, aes(x = factor(___))) +
  ___()

# solution
ggplot(ecom, aes(x = factor(device))) +
  geom_bar()

Boxplot

Introduction


Boxplots are very handy when you want to:

  • examine the distribution of a variable
  • detect outliers

Example


ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot()
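
The numbers behind each box can be inspected with layer_data(), which returns the statistics ggplot2 computed for a layer. By default, geom_boxplot() draws whiskers up to 1.5 times the interquartile range from the box edges and plots anything beyond them as outlier points. A small sketch; the names p and box_stats are illustrative only.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot()

# One row per box: whisker ends (ymin, ymax), box edges (lower, upper), median (middle)
box_stats <- layer_data(p)
box_stats[, c("x", "ymin", "lower", "middle", "upper", "ymax")]

# Points drawn individually as outliers, one list element per box
box_stats$outliers
```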

Instructions


  • set x to device
  • set y to n_pages
  • represent the data using a boxplot

# template: fill in the blanks
ggplot(ecom, aes(x = factor(___), y = ___)) +
  ___()

# solution
ggplot(ecom, aes(x = factor(device), y = n_pages)) +
  geom_boxplot()

Histogram - Part 1

Introduction


Histograms are used to examine:

  • distribution of a continuous variable
  • skewness and kurtosis

Example


ggplot(mtcars, aes(x = mpg)) +
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Instructions


  • set x to duration
  • represent the data using a histogram

# template: fill in the blanks
ggplot(ecom, aes(x = ___)) +
  ___()

# solution
ggplot(ecom, aes(x = duration)) +
  geom_histogram()

Histogram - Part 2

Bins


Example


ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(bins = 5)
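
The message printed in Part 1 suggests binwidth as an alternative to bins. The two arguments control the same thing from different ends: bins fixes how many bars there are, binwidth fixes how wide each bar is in the units of x. A minimal sketch; the width of 5 mpg is an arbitrary choice made here.

```r
library(ggplot2)

# Fix the number of bins; ggplot2 derives the width from the data range
p_bins <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(bins = 5)

# Fix the width of each bin (5 mpg); ggplot2 derives how many are needed
p_width <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 5)

# Either way, every observation lands in exactly one bin
sum(layer_data(p_bins)$count)
sum(layer_data(p_width)$count)
```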

Instructions


  • set x to duration
  • represent the data using a histogram
  • set the number of bins to 5

# template: fill in the blanks
ggplot(ecom, aes(x = ___)) +
  ___(bins = ___)

# solution
ggplot(ecom, aes(x = duration)) +
  geom_histogram(bins = 5)

Line - Part 1

Introduction


Line charts are used to examine trends over time.

Data


gdp

Example


ggplot(gdp, aes(year, china)) +
  geom_line()
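
The gdp data is not bundled with R, so the example above needs that data frame loaded first. For a version that runs as-is, the same pattern can be sketched with the economics data set that ships with ggplot2. This is a stand-in chosen here, not the deck's data.

```r
library(ggplot2)

# economics has one row per month: date on x, number of unemployed (thousands) on y
p <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line()
p
```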

Line - Part 2

Instructions


  • set x to year
  • set y to india
  • represent the data using a line

# template: fill in the blanks
ggplot(gdp, aes(x = ___, y = ___)) +
  ___()

# solution
ggplot(gdp, aes(x = year, y = india)) +
  geom_line()

Label


Instructions


ggplot(mtcars, aes(disp, mpg, label = rownames(mtcars))) +
  geom_label()

Text

Introduction


Instructions


ggplot(mtcars, aes(disp, mpg, label = rownames(mtcars))) +
  geom_text()
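
geom_label() and geom_text() both place the label aesthetic at the x and y position; geom_label() additionally draws a rectangle behind the text. With 32 car names the labels collide, so a possible refinement is sketched below. The combination with geom_point() and the vjust value are choices made here, not part of the original slides.

```r
library(ggplot2)

p_text <- ggplot(mtcars, aes(disp, mpg, label = rownames(mtcars))) +
  geom_point() +
  # check_overlap = TRUE skips labels that would overlap an earlier one;
  # vjust nudges each label above its point
  geom_text(check_overlap = TRUE, vjust = -0.5)
p_text
```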