How to work with dates in R?

Introduction


In this module, we will learn to work with date/time data in R using lubridate, an R package that makes it easy to work with dates and time. Let us begin by installing and loading the lubridate pacakge.

Origin


Let us look at the origin for the numbering system used for date and time calculations in R.

lubridate::origin

Current Date/Time


Next, let us check out the current date, time and whether it occurs in the am or pm. now() returns the date, time as well as the time zone whereas today() will return only the current date. am() and pm() return TRUE or FALSE.

now()
today()
am(now())
pm(now())

Data


transact

Data Dictionary


The data set has 3 columns. All the dates are in the format (yyyy-mm-dd).

  • Invoice: invoice date
  • Due: due date
  • Payment: payment date

Case Study


We will use the functions in the lubridate package to answer a few questions we have about the transact data.

  • extract date, month and year from Due
  • compute the number of days to settle invoice
  • compute days over due
  • check if due year is a leap year
  • check when due day in february is 29, whether it is a leap year
  • how many invoices were settled within due date
  • how many invoices are due in each quarter
  • what is the average duration between invoice date and payment date

Extract Date, Month & Year


Extract Date, Month & Year


The first thing we will learn is to extract the date, month and year.

this_day <- as_date('2017-03-23')
this_day <- as_date('2017-03-23')
day(this_day)
month(this_day)
year(this_day)

Case Study


Let us now extract the date, month and year from the Due column.

mutate(transact, 
  due_day   = ,
  due_month = ,
  due_year  = 
)
mutate(transact,
  due_day   = day(Due),
  due_month = month(Due),
  due_year  = year(Due)
)

Course Duration


Course Duration


Time to do some arithmetic with the dates. Let us calculate the duration of a course by subtracting the course start date from the course end date.

course_start    <- as_date('2017-04-12')
course_end      <- as_date('2017-04-21')
course_duration <- 
course_duration
course_start    <- as_date('2017-04-12')
course_end      <- as_date('2017-04-21')
course_duration <- course_end - course_start
course_duration

Case Study


Let us estimate the number of days to settle the invoice by subtracting the date of invoice from the date of payment.

mutate(transact,
  days_to_pay = 
)
mutate(transact,
  days_to_pay = Payment - Invoice
)

Case Study


How many of the invoices were settled post the due date? We can find this by:

  • subtracting the due date from the payment date
  • counting the number of rows where delay < 0
transact %>%
  mutate(
    delay = 
  ) %>%
  filter() %>%
  tally()
transact %>%
  mutate(
    delay = Due - Payment
  ) %>%
  filter(delay < 0) %>%
  tally()

Case Study

Just for fun, let us check if the due year happens to be a leap year.


mutate(transact
  is_leap  = 
)
mutate(transact
  is_leap  = leap_year(Due)
)

Case Study

Let us do some data sanitization. If the due day happens to be February 29, let us ensure that the due year is a leap year. Below are the steps to check if the due year is a leap year:

  • we will extract the following from the due date:
  • day
  • month
  • year
  • we will then create a new column is_leap which will have be set to TRUE if the year is a leap year else it will be set to FALSE
  • filter all the payments due on 29th Feb
  • select the following columns:
  • Due
  • is_leap

transact %>%
  mutate(
    due_day   = ,
    due_month = ,
    due_year  = ,
    is_leap   = 
  ) %>%
  select() %>%
  filter()
transact %>%
  mutate(
    due_day   = day(Due),
    due_month = month(Due),
    due_year  = year(Due),
    is_leap   = leap_year(due_year)
  ) %>%
  select(-(Invoice), -(Payment)) %>%
  filter(due_month == 2 & due_day == 29)

Shift Date


Shift Date


Time to shift some dates. We can shift a date by days, weeks or months. Let us shift the course start date by:

  • 2 days
  • 3 weeks
  • 1 year
course_start + 
course_start + 
course_start + 
course_start + days(2)
course_start + weeks(1)
course_start + years(1)

Interval


Interval


Let us calculate the duration of the course using interval. If you observe carefully, the result is not the duration in days but an object of class interval. Now let us learn how we can use intervals.

interval(course_start, course_end)

Shift Interval


Shift Interval


Intervals can be shifted too. In the below example, we shift the course interval by:

  • 1 day
  • 3 weeks
  • 1 year
course_interval <- interval(course_start, course_end)
int_shift(course_interval, by = )
int_shift(course_interval, by = )
int_shift(course_interval, by = )
course_interval <- interval(course_start, course_end)
int_shift(course_interval, by = days(1))
int_shift(course_interval, by = weeks(3))
int_shift(course_interval, by = years(1))

Intervals Overlap


Intervals Overlap


Let us say you are planning a vacation and want to check if the vacation dates overlap with the course dates. You can do this by:

  • creating vacation and course intervals
  • use int_overlaps() to check if two intervals overlap. It returns TRUE if the intervals overlap else FALSE.

Let us use the vacation start and end dates to create vacation_interval and then check if it overlaps with course_interval.

vacation_start    <- as_date('2017-04-19')
vacation_end      <- as_date('2017-04-25')
vacation_interval <- 
vacation_start    <- as_date('2017-04-19')
vacation_end      <- as_date('2017-04-25')
vacation_interval <- interval(vacation_start, vacation_end)
int_overlaps(course_interval, vacation_interval)

Case Study


Let us use intervals to count the number of invoices that were settled within the due date. To do this, we will:

  • create an interval for the invoice and due date
  • create a new column due_next by incrementing the due date by 1 day
  • another interval for due_next and the payment date
  • if the intervals overlap, the payment was made within the due date
transact %>%
  mutate(
    inv_due_interval = ,
    due_next         = ,
    due_pay_interval = ,
    overlaps         = 
  ) %>%
  select()
transact %>%
  mutate(
    inv_due_interval = interval(Invoice, Due),
    due_next         = Due + days(1),
    due_pay_interval = interval(due_next, Payment),
    overlaps         = int_overlaps(inv_due_interval, due_pay_interval)
  ) %>%
  select(Invoice, Due, Payment, overlaps)

Case Study


Below we show another method to count the number of invoices paid within the due date. Instead of using days to change the due date, we use int_shift to shift it by 1 day.

# using int_shift
transact %>%
  mutate(
    inv_due_interval = interval(Invoice, Due),
    due_pay_interval = interval(Due, Payment),  
    due_pay_next     = int_shift(due_pay_interval, by = days(1)),
    overlaps         = int_overlaps(inv_due_interval, due_pay_next)
  ) %>%
  select(Invoice, Due, Payment, overlaps)

Within


Within


Let us assume that we have to attend a conference in April 2017. Does it occur during the course duration? We can answer this using %within% which will return TRUE if a date falls within an interval.

conference <- as_date('2017-04-15')
conference <- as_date('2017-04-15')
conference %within% interval(course_start, course_end)

Case Study


Let us use %within% to count the number of invoices that were settled within the due date. We will do this by:

  • creating an interval for the invoice and due date
  • check if the payment date falls within the above interval
transact %>%
  mutate(
    inv_due_interval = ,
    overlaps = 
  ) %>%
  select()
transact %>%
  mutate(
    inv_due_interval = interval(Invoice, Due),
    overlaps = Payment %within% inv_due_interval
  ) %>%
  select(Due, Payment, overlaps)

Quarter


Quarter


Let us check the quarter and the semester in which the course starts.

course_start
course_start
quarter(course_start)
quarter(course_start, with_year = TRUE)
semester(course_start)  

Case Study


Let us count the invoices due for each quarter.

transact %>%
  mutate(
    quarter_due = 
  ) %>%
  count()
transact %>%
  mutate(
    quarter_due = quarter(Due)
  ) %>%
  count(quarter_due)

Convert


Case Study


Let us also get the course interval in different units.

course_interval / 
course_interval / 
course_interval / 
course_interval / 
course_interval / 
course_interval / dseconds()
course_interval / dminutes()
course_interval / dhours()
course_interval / dweeks()
course_interval / dyears()

Convert


Case Study


We can use time_length() to get the course interval in different units.

time_length(course_interval, )
time_length(course_interval, )
time_length(course_interval, )
time_length(course_interval, unit = "seconds")
time_length(course_interval, unit = "minutes")
time_length(course_interval, unit = "hours")

Convert


Case Study


as.period() is yet another way to get the course interval in different units.

as.period(course_interval, )
as.period(course_interval, )
as.period(course_interval, )
as.period(course_interval, unit = "seconds")
as.period(course_interval, unit = "minutes")
as.period(course_interval, unit = "hours")