Time-series

Sara Mortara (re.green | ¡liibre!), Andrea Sánchez-Tapia https://andreasancheztapia.netlify.app (¡liibre!)
2022-07-21

In this exercise we will learn how to manipulate dates in R using the package lubridate. We will also use the package zoo to calculate the rolling mean of a variable.
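To make the idea of a rolling mean concrete, here is a minimal sketch using `rollmean()` from zoo on a small made-up vector. The values in `cases` and the window size `k = 3` are assumptions chosen for illustration. They are not taken from the Recife data.

```r
library(zoo)

# Hypothetical daily counts, not taken from the Recife data
cases <- c(2, 4, 6, 8, 10, 12, 14)

# 3-day rolling mean: align = "right" averages each day with the two days
# before it, and fill = NA pads the first two positions so that the result
# has the same length as the input
roll3 <- rollmean(cases, k = 3, fill = NA, align = "right")
roll3
```

Keeping the output the same length as the input makes it easy to store the rolling mean as a new column next to the original variable.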

In our example, we will use Covid-19 data from Recife in Pernambuco, Brazil.

These data were downloaded from the Brasil.IO portal.

covid <- read.csv("data/raw/covid19-dd7bc8e57412439098d9b25129ae6f35.csv")

Converting into date format

# First checking the class
class(covid$date)
[1] "character"
# Changing to date format (as_date() comes from lubridate)
library(lubridate)
covid$date <- as_date(covid$date)
# Checking the class
class(covid$date)
[1] "Date"
# Now we can perform numeric operations on the dates
range(covid$date)
[1] "2020-03-12" "2022-03-27"
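Once a column has class Date, lubridate offers helpers for common manipulations. The sketch below uses a hypothetical two-element vector `d` holding the two dates from the range above. It is meant only to illustrate the functions, not to reproduce any step of the original analysis.

```r
library(lubridate)

# Hypothetical vector holding the first and last dates of the series
d <- as_date(c("2020-03-12", "2022-03-27"))

# Extract components of each date
year(d)
month(d)

# Subtracting dates gives a time difference in days
span <- diff(range(d))
span

# Round each date down to the first day of its month
month_start <- floor_date(d, unit = "month")
month_start
```

The same calls work directly on `covid$date`, for example to group observations by month.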

Plotting a time-series with ggplot2

First, we will plot the number of new cases per day, which is stored in the column new_confirmed.

library(ggplot2)

ggplot(covid) +
  geom_line(aes(x = date, y = new_confirmed)) +
  theme_minimal()

Oops. Some days have negative case counts, so we will replace the negative values with zero.

covid$new_confirmed[covid$new_confirmed < 0] <- 0 
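Before overwriting values, it can help to count how many are affected. The sketch below does this on a made-up vector (the name `new_confirmed` and its values are hypothetical) and then shows `pmax()` as an alternative way to set negatives to zero. It is not the method used above, only an equivalent one in base R.

```r
# Hypothetical daily counts with two negative corrections
new_confirmed <- c(5, -3, 12, 0, -1, 8)

# How many values are negative before replacing them?
n_negative <- sum(new_confirmed < 0)
n_negative

# pmax() takes the element-wise maximum, so every negative value becomes 0
clean <- pmax(new_confirmed, 0)
clean
```

Writing the result to a new object, as done here with `clean`, leaves the original values available for comparison.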

Let’s try again.

ggplot(covid) +
  geom_line(aes(x = date, y = new_confirmed)) +
  theme_minimal() +
  labs(x = "Date", y = "New cases")