The cansim R package is really helpful 📦 📊

Statistics Canada has a wealth of data that are essential for good public policy. Often a good third of my analytical scripts is devoted to accessing and processing data from the Statistics Canada website, which always seems like a waste of effort and a good opportunity for making silly errors. So I was keen to test out the cansim package for R to see how it might help. The quick answer is “very much”.

The documentation for the cansim package is thorough and doesn’t need to be repeated here. I thought it might be useful to illustrate how helpful the package can be by refactoring some earlier work that explored consumer price inflation.

These scripts always start off with downloading and extracting the relevant data file:

cpi_url <- "https://www150.statcan.gc.ca/n1/tbl/csv/18100004-eng.zip" # (1)
cpi_zip <- "18100004-eng.zip"
if (!file.exists(cpi_zip)) { # (2) only download when the archive is missing
  # mode = "wb" keeps the zip archive intact on Windows
  download.file(cpi_url, destfile = cpi_zip, mode = "wb", quiet = TRUE)
  unzip(cpi_zip) # (3) extracts into the working directory
}
cpi <- readr::read_csv("18100004.csv") # (4)

A few things to note here:

  1. You need to know the URL for the data. Sometimes the pattern is clear and you can guess it, but often that doesn’t work and you need to spelunk through the Statistics Canada website.
  2. To avoid downloading the file every time I run the script, there’s a test to see if the file already exists.
  3. This approach yields lots of files and folders that you need to manage, including making sure they’re ignored by version control.
  4. The great readr package imports the final CSV file.
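For comparison, the manual steps above could be wrapped in a small helper that caches everything in a temporary directory, which keeps the project folder and version control clean. This is only a rough sketch: `statcan_zip_url` and `fetch_statcan_csv` are hypothetical names, and the URL pattern is an assumption generalized from the single URL used above, so it may not hold for every table.

```r
# Hypothetical helper: build the download URL from a table id.
# ASSUMPTION: every table follows the pattern of the CPI URL used above.
statcan_zip_url <- function(table_id) {
  paste0("https://www150.statcan.gc.ca/n1/tbl/csv/", table_id, "-eng.zip")
}

# Hypothetical helper: download and extract a table into cache_dir,
# skipping any step whose output already exists. Returns the CSV path.
fetch_statcan_csv <- function(table_id, cache_dir = tempdir()) {
  zip_path <- file.path(cache_dir, paste0(table_id, "-eng.zip"))
  csv_path <- file.path(cache_dir, paste0(table_id, ".csv"))

  # Download only when the archive is not already cached
  if (!file.exists(zip_path)) {
    download.file(statcan_zip_url(table_id), destfile = zip_path,
                  mode = "wb", quiet = TRUE)
  }

  # Extract only when the CSV is missing
  if (!file.exists(csv_path)) {
    unzip(zip_path, exdir = cache_dir)
  }

  csv_path
}
```

Under those assumptions, `readr::read_csv(fetch_statcan_csv("18100004"))` would replace the whole block above. It is still boilerplate that has to be copied into every project, though.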

With cansim, all I need to know is the table number:

cansim_table <- "18-10-0004"
cpi <- cansim::get_cansim(cansim_table)

get_cansim downloads the right file to a temporary directory, extracts the data, and imports it as a tidyverse-compatible data frame.

The get_cansim function has some other nice features. It automatically creates a Date column with the right type, inferred from the standard REF_DATE column. It also creates a val_norm column that intelligently converts the VALUE column, for example by converting percentage or thousand-dollar values into standard formats.
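To give a feel for what those two columns save you from writing, here is a rough by-hand version on a toy data frame. It is not cansim's actual implementation. The rows are invented for illustration, and the `multiplier` lookup and the percent-to-proportion rule are my own simplified assumptions about what "standard formats" means.

```r
# Toy rows shaped like a Statistics Canada CSV.
# The values are invented for illustration and are not real data.
toy <- data.frame(
  REF_DATE      = c("2019-01", "2019-02"),
  VALUE         = c(1.5, 250),
  SCALAR_FACTOR = c("units", "thousands"),
  UOM           = c("Percent", "Dollars"),
  stringsAsFactors = FALSE
)

# Monthly REF_DATE strings become real dates by pinning them
# to the first day of the month
toy$Date <- as.Date(paste0(toy$REF_DATE, "-01"))

# ASSUMPTION: a simplified lookup of scale factors
multiplier <- c(units = 1, thousands = 1e3, millions = 1e6)
toy$val_norm <- toy$VALUE * unname(multiplier[toy$SCALAR_FACTOR])

# ASSUMPTION: percentages are expressed as proportions
is_pct <- toy$UOM == "Percent"
toy$val_norm[is_pct] <- toy$val_norm[is_pct] / 100
```

Even this simplified version needs a lookup table and special cases, which is exactly the kind of fiddling I would rather leave to the package.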

The cansim package is a great example of a really helpful utility package that allows me to focus on analysis, rather than fiddling around with data. Definitely worth checking out if you deal with data from Statistics Canada.