Hacking strings with stringr

Introduction


In this module, we will learn to work with string data in R using stringr. As in the earlier modules, we will use a case study to explore the various features of the stringr package. You can download the data for the case study or import it directly using the readr package. Let us begin by installing and loading stringr and the other packages we will be using.
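Here is a minimal setup sketch. The text names stringr and readr. dplyr is an assumption based on the sample-data code, which uses %>%, slice() and select(). The install line is commented out so that it only runs when you choose to run it.

```r
# Install once (uncomment on first use):
# install.packages(c("stringr", "readr", "dplyr"))

library(stringr)  # str_*() string functions used throughout the module
library(readr)    # read_csv() and friends, for importing the case-study data
library(dplyr)    # %>%, slice() and select(), used to build the sample data
```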

Case Study

Introduction


  • extract the domain name from random email ids
  • extract the image type from a URL
  • extract the image dimensions from a URL
  • extract the extension from a domain name
  • extract the HTTP protocol from a URL
  • extract the domain name from a URL
  • extract the extension from a URL
  • extract the file type from a URL
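Several of these tasks can be handled with str_extract() and a regular expression. This sketch uses invented email ids and URLs. The patterns are one possible approach, they assume simple, well-formed inputs, and they are not necessarily the approach the case study itself takes.

```r
library(stringr)

# Hypothetical inputs, invented for illustration only
emails <- c("jdoe@example.com", "a.smith@mail.example.org")
urls   <- c("https://example.com/img/photo_640x480.png",
            "http://example.org/assets/logo_120x60.jpg")

# Domain name of an email id: everything after the @ (lookbehind)
domains <- str_extract(emails, "(?<=@).+")

# Protocol: the letters at the start of the URL, before "://"
protocols <- str_extract(urls, "^[a-z]+(?=://)")

# Image type: the run of letters at the very end of the URL
img_types <- str_extract(urls, "[a-z]+$")

# Image dimensions: a WIDTHxHEIGHT pattern such as 640x480
dims <- str_extract(urls, "[0-9]+x[0-9]+")
```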

Data


mockstring

Sample Data


# keep the first 10 rows and the four columns used in the examples below
mock_data <-
  mockstring %>%
  slice(1:10) %>%
  select(email, address, full_name, currency)

Detect @


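str_detect() checks whether each string contains a pattern and returns a logical vector with one TRUE or FALSE per element. Here we check whether each email id contains @.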


str_detect(mock_data$email, pattern = "@")
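The negate argument of str_detect() (available from stringr 1.4.0) inverts the test. That makes it easy to flag entries that do not match. The email ids below are hypothetical.

```r
library(stringr)

# Hypothetical email ids; the second one is missing its @
emails <- c("jdoe@example.com", "jdoe.example.com", "info@example.org")

has_at  <- str_detect(emails, pattern = "@")                 # TRUE, FALSE, TRUE

# negate = TRUE returns TRUE for strings WITHOUT a match
invalid <- str_detect(emails, pattern = "@", negate = TRUE)  # FALSE, TRUE, FALSE
emails[invalid]                                              # the malformed id
```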

Count @


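str_count() returns the number of matches of a pattern in each string. Here we count the @ characters in each email id.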


str_count(mock_data$email, pattern = "@")
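Counting is more informative than detecting when you need to know how many times a pattern occurs. In this sketch, a simple validity check assumes that a well-formed email id has exactly one @. The inputs are hypothetical.

```r
library(stringr)

# Hypothetical email ids: one valid, one with two @, one with none
emails <- c("jdoe@example.com", "jdoe@@example.com", "jdoe.example.com")

at_counts   <- str_count(emails, pattern = "@")   # 1, 2, 0

# Keep only ids with exactly one @
well_formed <- at_counts == 1                     # TRUE, FALSE, FALSE
```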

Concatenate


str_c() joins two or more character vectors element-wise. Here we prefix each email id with a label.


str_c("email id:", mock_data$email)
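By default, str_c() inserts nothing between the pieces, so the label and the email id run together. This sketch uses hypothetical email ids to show two arguments. sep controls the separator within each element, and collapse joins the whole result into a single string.

```r
library(stringr)

emails <- c("jdoe@example.com", "info@example.org")  # hypothetical values

# sep = " " puts a space between the label and each email id
labelled <- str_c("email id:", emails, sep = " ")

# collapse = ", " reduces the vector to one comma-separated string
one_line <- str_c(emails, collapse = ", ")
```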

Split


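str_split() splits each string at every match of the pattern and returns a list of character vectors. Splitting an email id at @ separates the user name from the domain.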


str_split(mock_data$email, pattern = "@")
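A list can be awkward to work with. With simplify = TRUE, str_split() returns a character matrix with one row per input string, so the user names and domains become plain columns. The email ids are hypothetical.

```r
library(stringr)

emails <- c("jdoe@example.com", "info@example.org")  # hypothetical values

# Default: a list, one character vector per email id
parts <- str_split(emails, pattern = "@")

# simplify = TRUE: a character matrix, one row per email id
parts_mat <- str_split(emails, pattern = "@", simplify = TRUE)
users   <- parts_mat[, 1]   # text before the @
domains <- parts_mat[, 2]   # text after the @
```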

Sort


str_sort() sorts a character vector in ascending order by default.


str_sort(mock_data$email)
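str_sort() compares strings character by character, so by default "file10" sorts before "file2". This sketch uses hypothetical file names and assumes stringr 1.4.0 or later. It shows numeric = TRUE, which compares runs of digits as numbers, and the related str_order(), which returns the sorting indices instead of the sorted values.

```r
library(stringr)

files <- c("file10", "file2", "file1")  # hypothetical values

alpha   <- str_sort(files)                  # "file1" "file10" "file2"
natural <- str_sort(files, numeric = TRUE)  # "file1" "file2" "file10"

# str_order() gives the positions that would sort the vector
ord <- str_order(files)                     # 3 1 2
files[ord]                                  # same as str_sort(files)
```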

Sort


Set decreasing = TRUE to sort in descending order instead.


str_sort(mock_data$email, decreasing = TRUE)
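As a quick check on the hypothetical email ids below, a descending sort gives the same result as an ascending sort followed by rev(), provided there are no ties.

```r
library(stringr)

emails <- c("b@example.com", "c@example.com", "a@example.com")  # hypothetical

desc_sorted <- str_sort(emails, decreasing = TRUE)
# "c@example.com" "b@example.com" "a@example.com"

same <- identical(desc_sorted, rev(str_sort(emails)))   # TRUE
```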

Case


str_to_upper() converts strings to upper case. Here we convert the full names.


str_to_upper(mock_data$full_name)
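stringr has companion functions for the other common case conversions. This sketch applies all three to a few hypothetical names with inconsistent capitalisation. str_to_title() is useful for normalising names.

```r
library(stringr)

full_names <- c("jane DOE", "JOHN smith")  # hypothetical full names

upper <- str_to_upper(full_names)   # "JANE DOE" "JOHN SMITH"
lower <- str_to_lower(full_names)   # "jane doe" "john smith"
title <- str_to_title(full_names)   # "Jane Doe" "John Smith"
```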

Replace


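str_replace() replaces the first match of a pattern in each string. Here we abbreviate Street to ST in the addresses.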


str_replace(mock_data$address, "Street", "ST")
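Because str_replace() only changes the first match, an address that contains the pattern twice is only partly abbreviated. str_replace_all() replaces every match. The addresses below are hypothetical.

```r
library(stringr)

addresses <- c("12 Mill Street, Old Street", "5 Main Street")  # hypothetical

first_only <- str_replace(addresses, "Street", "ST")
# "12 Mill ST, Old Street" "5 Main ST"

every <- str_replace_all(addresses, "Street", "ST")
# "12 Mill ST, Old ST" "5 Main ST"
```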

Extract


str_extract() returns the first match of the pattern in each string, or NA when there is no match.


str_extract(mock_data$email, pattern = "com")
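The literal pattern "com" matches anywhere in the string, including inside a domain such as "comcast". Anchoring the pattern is more precise. This sketch uses hypothetical email ids and compares three patterns. The last one extracts the top-level extension after the final dot.

```r
library(stringr)

emails <- c("jdoe@example.com", "info@comcast.net", "x@example.org")  # hypothetical

ext_com    <- str_extract(emails, pattern = "com")      # "com" "com" NA
ext_dotcom <- str_extract(emails, pattern = "\\.com$")  # ".com" NA NA
tld        <- str_extract(emails, pattern = "[^.]+$")   # "com" "net" "org"
```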

Match


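str_match() also extracts the first match, but returns a character matrix: the first column holds the complete match and any additional columns hold capture groups.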


str_match(mock_data$email, pattern = "com")
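str_match() is most useful with capture groups. This sketch uses hypothetical email ids and one possible pattern with two groups, so the user name and the domain each get their own column.

```r
library(stringr)

emails <- c("jdoe@example.com", "info@example.org")  # hypothetical

# Group 1: text before the @; group 2: text after it
m <- str_match(emails, pattern = "(.+)@(.+)")

full    <- m[, 1]   # complete match: the whole email id
users   <- m[, 2]   # "jdoe" "info"
domains <- m[, 3]   # "example.com" "example.org"
```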

Index


str_which() returns the indices of the elements that contain a match for the pattern.


str_which(mock_data$email, pattern = "com")
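str_which() gives the same indices as applying which() to the result of str_detect(). Indexing with those positions selects the same elements that str_subset() returns. The email ids are hypothetical.

```r
library(stringr)

emails <- c("jdoe@example.com", "info@example.org", "x@example.com")  # hypothetical

idx <- str_which(emails, pattern = "com")                  # 1 3
via_detect <- which(str_detect(emails, pattern = "com"))   # 1 3

# Subsetting by index gives the same elements as str_subset()
matched <- emails[idx]
```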

Locate


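str_locate() returns a matrix with the start and end positions of the first match in each string, or NA when there is no match.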


str_locate(mock_data$email, pattern = "com")
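str_locate() reports only the first match in each string. str_locate_all() returns a list with one start/end matrix per string, covering every match. This sketch uses hypothetical email ids chosen so that one of them contains "com" twice.

```r
library(stringr)

emails <- c("com@example.com", "info@example.org")  # hypothetical

loc <- str_locate(emails, pattern = "com")
# row 1: start 1, end 3 (first match only); row 2: NA NA

all_pos <- str_locate_all(emails, pattern = "com")
all_pos[[1]]   # two rows: 1-3 and 13-15
all_pos[[2]]   # zero rows: no match
```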

Length


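The Length step has no code in the original. This is a minimal sketch that assumes the step refers to stringr's str_length(), which counts the characters in each string and keeps NA as NA. The email ids are hypothetical.

```r
library(stringr)

emails <- c("jdoe@example.com", "info@example.org", NA)  # hypothetical

n_chars <- str_length(emails)   # 16 16 NA
```
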
Extract substrings


str_sub() extracts characters by position. Here we keep the first character of each currency value.


str_sub(mock_data$currency, start = 1, end = 1)
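str_sub() also accepts an open end and negative positions, which count back from the end of the string. The currency values below are hypothetical and assume a leading symbol.

```r
library(stringr)

amounts <- c("$12.50", "$7.99")  # hypothetical currency values

symbol <- str_sub(amounts, start = 1, end = 1)  # "$" "$"
number <- str_sub(amounts, start = 2)           # "12.50" "7.99" (end defaults to the last character)
cents  <- str_sub(amounts, start = -2)          # "50" "99" (last two characters)

as.numeric(number)                              # 12.50 7.99
```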

Word


word(mock_data$full_name, 1)
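word() extracts words from a sentence, splitting on single spaces by default. Negative positions count back from the last word, so the same call handles names with a different number of words. The names below are hypothetical.

```r
library(stringr)

full_names <- c("Jane Doe", "John Ronald Smith")  # hypothetical

first <- word(full_names, 1)       # "Jane" "John"
last  <- word(full_names, -1)      # "Doe" "Smith"
rest  <- word(full_names, 2, -1)   # "Doe" "Ronald Smith" (second word to the last)
```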