Day 22: Having a rest with friends 📷
Day 21: An impressive space at the foot of a mountain 📷
Day 20: I'm looking forward to resuming winter sports 📷
Day 19: Mirror in a lake 📷
Day 18: Lucy is finished for the day 📷
Day 17: No need for a compass when hiking in the city, just follow the sound of traffic 📷
Day 16: Rotation 📷
Day 15: Ethereal 📷
Day 14: My favourite wheels as a kid 📷
Day 13: Lucy is a couch animal 📷
Day 12: Rock legends 📷
Day 11: Hygge 📷
The cansim R package is really helpful 📦
Statistics Canada has a wealth of data that are essential for good public policy. Often a good third of my analytical scripts are devoted to accessing and processing data from the Statistics Canada website, which always seems like a waste of effort and a good opportunity for making silly errors. So, I was keen to test out the cansim package for R to see how it might help. The quick answer is “very much”.
The documentation for the cansim package is thorough and doesn’t need to be repeated here. Instead, I thought it would be useful to illustrate how helpful the package can be by refactoring some earlier work that explored consumer price inflation.
These scripts always start off with downloading and extracting the relevant data file:
cpi_url <- "https://www150.statcan.gc.ca/n1/tbl/csv/18100004-eng.zip" # (1)
if (file.exists("18100004-eng.zip")) { # (2)
  # Already downloaded
} else {
  download.file(cpi_url,
                destfile = "18100004-eng.zip",
                quiet = TRUE)
  unzip("18100004-eng.zip") # (3)
}
cpi <- readr::read_csv("18100004.csv") # (4)
A few things to note here:
- (1) You need to know the URL for the data. Sometimes the logic is clear and you can guess, but often that doesn’t work and you need to spelunk through the Statistics Canada website
- (2) To avoid downloading the file every time I run the script, there’s a test to see if the file already exists
- (3) This approach yields lots of files and folders that you need to manage, including making sure they’re ignored by version control
- (4) The great readr package imports the final CSV file
With cansim, all I need to know is the table number:
cansim_table <- "18-10-0004"
cpi <- cansim::get_cansim(cansim_table)
get_cansim downloads the right file to a temporary directory, extracts the data, and imports it as a tidyverse-compatible data frame.
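To make those three steps concrete, here is a rough base R sketch of doing the same thing by hand. It is not how cansim is implemented. Both helper names are hypothetical, and the URL pattern is only assumed to generalize from the single CPI link used in the manual script.

```r
# Hypothetical helper: build the CSV download URL from a table number,
# assuming the pattern of the CPI link holds for other tables.
statcan_csv_url <- function(table_number) {
  id <- gsub("-", "", table_number, fixed = TRUE)
  paste0("https://www150.statcan.gc.ca/n1/tbl/csv/", id, "-eng.zip")
}

# Hypothetical helper: download, extract, and import a table,
# keeping every intermediate file in a temporary location.
fetch_statcan_table <- function(table_number) {
  id <- gsub("-", "", table_number, fixed = TRUE)
  zip_path <- tempfile(fileext = ".zip")
  out_dir <- tempfile("statcan_")
  # Clean up the temporary files when the function exits
  on.exit(unlink(c(zip_path, out_dir), recursive = TRUE), add = TRUE)

  download.file(statcan_csv_url(table_number),
                destfile = zip_path,
                quiet = TRUE,
                mode = "wb")
  unzip(zip_path, exdir = out_dir)

  # Base read.csv reads the whole file eagerly, so cleanup on exit is safe
  read.csv(file.path(out_dir, paste0(id, ".csv")), check.names = FALSE)
}
```

Calling fetch_statcan_table("18-10-0004") would need network access and returns a plain data frame, without the extra columns that get_cansim adds.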
The get_cansim function has some other nice features. It automatically creates a Date column with the right type, inferred from the standard REF_DATE column. It also creates a val_norm column that intelligently converts the VALUE column; for example, it converts percentages and values reported in thousands of dollars into standard formats.
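As a toy illustration of the kind of work these two columns save, the base R sketch below builds them by hand for a made-up data frame. The values are invented, the SCALAR_FACTOR labels and the multipliers lookup are assumptions for illustration, and none of this is cansim's actual logic.

```r
# Toy rows shaped like a Statistics Canada CSV; values are made up
toy <- data.frame(
  REF_DATE = c("2019-01", "2019-02"),
  VALUE = c(1.5, 2.5),
  SCALAR_FACTOR = c("thousands", "thousands"),
  stringsAsFactors = FALSE
)

# Monthly REF_DATE strings have no day, so append one before parsing
toy$Date <- as.Date(paste0(toy$REF_DATE, "-01"))

# Hypothetical lookup from scale label to multiplier
multipliers <- c(units = 1, thousands = 1e3, millions = 1e6)

# Rescale VALUE so every row is expressed in plain units
toy$val_norm <- toy$VALUE * unname(multipliers[toy$SCALAR_FACTOR])
```

With get_cansim, both columns arrive ready to use, so none of this bookkeeping needs to live in the analysis script.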
The cansim package is a great example of a really helpful utility package that allows me to focus on analysis, rather than fiddling around with data. Definitely worth checking out if you deal with data from Statistics Canada.
Day 10: The bridges of my morning run 📷 🏃‍♂️
Day 9: Swinging through the trees is safe with this gear on 📷
Day 8: A benefit of a twilight run is that the sidewalks are clear 📷
Day 7: Spice 📷
Day 6: Street 📷
Day 5: The toys are watching, always 📷