I’m keen to adopt an iPad first approach. This MacStories article is a good primer and motivator www.macstories.net/stories/t…
For awhile now I’ve been meaning to get serious about automation on iOS. The Siri Shortcuts Field Guide from @macsparky is a great, comprehensive introduction. While taking the course, I built a good dozen useful shortcuts and have many ideas for more.
Ed Harcourt’s album Beyond the End has been great headphone music for at work πΆ itunes.apple.com/ca/album/…
Good advice from a philosopher on how to disagree constructively:
Many of my best friends think that some of my deeply held beliefs about important issues are obviously false or even nonsense. Sometimes, they tell me so to my face. How can we still be friends? Part of the answer is that these friends and I are philosophers, and philosophers learn how to deal with positions on the edge of sanity.
Here’s some “behind the scenes” geospatial analysis I used in a recent @PsephoAnalytics post. This was a good excuse to experiment with the sf #rstats π¦ which makes this much easier than my old workflows
Spatial analysis of votes in Toronto
This is a βbehind the scenesβ elaboration of the geospatial analysis in our recent post on evaluating our predictions for the 2018 mayoral election in Toronto. This was my first, serious use of the new sf
package for geospatial analysis. I found the package much easier to use than some of my previous workflows for this sort of analysis, especially given its integration with the tidyverse.
We start by downloading the shapefile for voting locations from the City of Torontoβs Open Data portal and reading it with the read_sf
function. Then, we pipe it to st_transform
to set the appropriate projection for the data. In this case, this isnβt strictly necessary, since the shapefile is already in the right projection. But, I tend to do this for all shapefiles to avoid any oddities later.
download.file("http://opendata.toronto.ca/gcc/voting_location_2018_wgs84.zip",
destfile = "data-raw/voting_location_2018_wgs84.zip", quiet = TRUE)
unzip("data-raw/voting_location_2018_wgs84.zip", exdir="data-raw/voting_location_2018_wgs84")
toronto_locations <- sf::read_sf("data-raw/voting_location_2018_wgs84",
layer = "VOTING_LOCATION_2018_WGS84") %>%
sf::st_transform(crs = "+init=epsg:4326")
toronto_locations
## Simple feature collection with 1700 features and 13 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: -79.61937 ymin: 43.59062 xmax: -79.12531 ymax: 43.83052
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## # A tibble: 1,700 x 14
## POINT_ID FEAT_CD FEAT_C_DSC PT_SHRT_CD PT_LNG_CD POINT_NAME VOTER_CNT
## <dbl> <chr> <chr> <chr> <chr> <chr> <int>
## 1 10190 P Primary 056 10056 <NA> 37
## 2 10064 P Primary 060 10060 <NA> 532
## 3 10999 S Secondary 058 10058 Malibu 661
## 4 11342 P Primary 052 10052 <NA> 1914
## 5 10640 P Primary 047 10047 The Summit 956
## 6 10487 S Secondary 061 04061 White Eag⦠51
## 7 11004 P Primary 063 04063 Holy Fami⦠1510
## 8 11357 P Primary 024 11024 Rosedale β¦ 1697
## 9 12044 P Primary 018 05018 Weston Pu⦠1695
## 10 11402 S Secondary 066 04066 Elm Grove⦠93
## # ... with 1,690 more rows, and 7 more variables: OBJECTID <dbl>,
## # ADD_FULL <chr>, X <dbl>, Y <dbl>, LONGITUDE <dbl>, LATITUDE <dbl>,
## # geometry <POINT [Β°]>
The file has 1700 rows of data across 14 columns. The first 13 columns are data within the original shapefile. The last column is a list column that is added by sf
and contains the geometry of the location. This specific design feature is what makes an sf
object work really well with the rest of the tidyverse: the geographical details are just a column in the data frame. This makes the data much easier to work with than in other approaches, where the data is contained within an [@data](https://micro.blog/data)
slot of an object.
Plotting the data is straightforward, since sf
objects have a plot
function. Hereβs an example where we plot the number of voters (VOTER_CNT
) at each location. If you squint just right, you can see the general outline of Toronto in these points.
What we want to do next is use the voting location data to aggregate the votes cast at each location into census tracts. This then allows us to associate census characteristics (like age and income) with the pattern of votes and develop our statistical relationships for predicting voter behaviour.
Weβll split this into several steps. The first is downloading and reading the census tract shapefile.
download.file("http://www12.statcan.gc.ca/census-recensement/2011/geo/bound-limit/files-fichiers/gct_000b11a_e.zip",
destfile = "data-raw/gct_000b11a_e.zip", quiet = TRUE)
unzip("data-raw/gct_000b11a_e.zip", exdir="data-raw/gct")
census_tracts <- sf::read_sf("data-raw/gct", layer = "gct_000b11a_e") %>%
sf::st_transform(crs = "+init=epsg:4326")
Now that we have it, all we really want are the census tracts in Toronto (the shapefile includes census tracts across Canada). We achieve this by intersecting the Toronto voting locations with the census tracts using standard R subsetting notation.
to_census_tracts <- census_tracts[toronto_locations,]
And, we can plot it to see how well the intersection worked. This time weβll plot the CTUID, which is the unique identifier for each census tract. This doesnβt mean anything in this context, but adds some nice colour to the plot.
plot(to_census_tracts["CTUID"])
Now you can really see the shape of Toronto, as well as the size of each census tract.
Next we need to manipulate the voting data to get votes received by major candidates in the 2018 election. We take these data from the toVotes
package and arbitrarily set the threshold for major candidates to receiving at least 100,000 votes. This yields our two main candidates: John Tory and Jennifer Keesmaat.
major_candidates <- toVotes %>%
dplyr::filter(year == 2018) %>%
dplyr::group_by(candidate) %>%
dplyr::summarise(votes = sum(votes)) %>%
dplyr::filter(votes > 100000) %>%
dplyr::select(candidate)
major_candidates
## # A tibble: 2 x 1
## candidate
## <chr>
## 1 Keesmaat Jennifer
## 2 Tory John
Given our goal of aggregating the votes received by each candidate into census tracts, we need a data frame that has each candidate in a separate column. We start by joining the major candidates table to the votes table. In this case, we also filter the votes to 2018, since John Tory has been a candidate in more than one election. Then we use the tidyr
package to convert the table from long (with one candidate column) to wide (with a column for each candidate).
spread_votes <- toVotes %>%
tibble::as.tibble() %>%
dplyr::filter(year == 2018) %>%
dplyr::right_join(major_candidates) %>%
dplyr::select(-type, -year) %>%
tidyr::spread(candidate, votes)
spread_votes
## # A tibble: 1,800 x 4
## ward area `Keesmaat Jennifer` `Tory John`
## <int> <int> <dbl> <dbl>
## 1 1 1 66 301
## 2 1 2 72 376
## 3 1 3 40 255
## 4 1 4 24 161
## 5 1 5 51 211
## 6 1 6 74 372
## 7 1 7 70 378
## 8 1 8 10 77
## 9 1 9 11 82
## 10 1 10 12 76
## # ... with 1,790 more rows
Our last step before finally aggregating to census tracts is to join the spread_votes
table with the toronto_locations
data. This requires pulling the ward and area identifiers from the PT_LNG_CD
column of the toronto_locations
data frame which we do with some stringr
functions. While weβre at it, we also update the candidate names to just surnames.
to_geo_votes <- toronto_locations %>%
dplyr::mutate(ward = as.integer(stringr::str_sub(PT_LNG_CD, 1, 2)),
area = as.integer(stringr::str_sub(PT_LNG_CD, -3, -1))) %>%
dplyr::left_join(spread_votes) %>%
dplyr::select(`Keesmaat Jennifer`, `Tory John`) %>%
dplyr::rename(Keesmaat = `Keesmaat Jennifer`, Tory = `Tory John`)
to_geo_votes
## Simple feature collection with 1700 features and 2 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: -79.61937 ymin: 43.59062 xmax: -79.12531 ymax: 43.83052
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## # A tibble: 1,700 x 3
## Keesmaat Tory geometry
## <dbl> <dbl> <POINT [Β°]>
## 1 55 101 (-79.40225 43.63641)
## 2 43 53 (-79.3985 43.63736)
## 3 61 89 (-79.40028 43.63664)
## 4 296 348 (-79.39922 43.63928)
## 5 151 204 (-79.40401 43.64293)
## 6 2 11 (-79.43941 43.63712)
## 7 161 174 (-79.43511 43.63869)
## 8 221 605 (-79.38188 43.6776)
## 9 68 213 (-79.52078 43.70176)
## 10 4 14 (-79.43094 43.64027)
## # ... with 1,690 more rows
Okay, weβre finally there. We have our census tract data in to_census_tracts
and our voting data in to_geo_votes
. We want to aggregate the votes into each census tract by summing the votes at each voting location within each census tract. We use the aggregate
function for this.
ct_votes_wide <- aggregate(x = to_geo_votes,
by = to_census_tracts,
FUN = sum)
ct_votes_wide
## Simple feature collection with 522 features and 2 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -79.6393 ymin: 43.58107 xmax: -79.11547 ymax: 43.85539
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## First 10 features:
## Keesmaat Tory geometry
## 1 407 459 MULTIPOLYGON (((-79.39659 4...
## 2 1103 2400 MULTIPOLYGON (((-79.38782 4...
## 3 403 1117 MULTIPOLYGON (((-79.27941 4...
## 4 96 570 MULTIPOLYGON (((-79.28156 4...
## 5 169 682 MULTIPOLYGON (((-79.49104 4...
## 6 121 450 MULTIPOLYGON (((-79.54422 4...
## 7 42 95 MULTIPOLYGON (((-79.45379 4...
## 8 412 512 MULTIPOLYGON (((-79.33895 4...
## 9 83 132 MULTIPOLYGON (((-79.35427 4...
## 10 92 629 MULTIPOLYGON (((-79.44554 4...]
As a last step, to tidy up, we now convert the wide table with a column for each candidate into a long table that has just one candidate column containing the name of the candidate.
ct_votes <- tidyr::gather(ct_votes_wide, candidate, votes, -geometry)
ct_votes
## Simple feature collection with 1044 features and 2 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -79.6393 ymin: 43.58107 xmax: -79.11547 ymax: 43.85539
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## First 10 features:
## candidate votes geometry
## 1 Keesmaat 407 MULTIPOLYGON (((-79.39659 4...
## 2 Keesmaat 1103 MULTIPOLYGON (((-79.38782 4...
## 3 Keesmaat 403 MULTIPOLYGON (((-79.27941 4...
## 4 Keesmaat 96 MULTIPOLYGON (((-79.28156 4...
## 5 Keesmaat 169 MULTIPOLYGON (((-79.49104 4...
## 6 Keesmaat 121 MULTIPOLYGON (((-79.54422 4...
## 7 Keesmaat 42 MULTIPOLYGON (((-79.45379 4...
## 8 Keesmaat 412 MULTIPOLYGON (((-79.33895 4...
## 9 Keesmaat 83 MULTIPOLYGON (((-79.35427 4...
## 10 Keesmaat 92 MULTIPOLYGON (((-79.44554 4...
Now that we have votes aggregated by census tract, we can add in many other attributes from the census data. We wonβt do that here, since this post is already pretty long. But, weβll end with a plot to show how easily sf
integrates with ggplot2
. This is a nice improvement from past workflows, when several steps were required. In the actual code for the retrospective analysis, I added some other plotting techniques, like cutting the response variable (votes) into equally spaced pieces and adding some more refined labels. Here, weβll just produce a simple plot.
ggplot2::ggplot(data = ct_votes) +
ggplot2::geom_sf(ggplot2::aes(fill = votes)) +
ggplot2::facet_wrap(~candidate)