Fixing a hack finds a better solution

Matthew Routley · 2018/09/02 · 3 minute read

In my Elections Ontario official results post, I had to use an ugly hack to match Electoral District names and numbers by extracting data from a drop-down list on the Find My Electoral District website. Although it was mildly clever, it was still a hack, and I shouldn't have relied on it for long, as Elections Ontario proved by shutting down the website.

So, a more robust solution was required, which led me to one of Elections Ontario's shapefiles. The shapefile contains the data we need; it's just in a tricky format to deal with. But the sf package makes this mostly straightforward.

We start by downloading and importing the Elections Ontario shapefile. Then, since we're only interested in the City of Toronto, we download the City's 2014 voting location shapefile too and intersect it with the provincial one to get a subset:

library(magrittr)  # provides the %>% pipe used below
dir.create("data-raw", showWarnings = FALSE)

# Elections Ontario polling division boundaries (2014 general election)
download.file("https://www.elections.on.ca/content/dam/NGW/sitecontent/2016/preo/shapefiles/Polling%20Division%20Shapefile%20-%202014%20General%20Election.zip", 
              destfile = "data-raw/Polling%20Division%20Shapefile%20-%202014%20General%20Election.zip",
              mode = "wb")  # binary mode so the zip isn't corrupted on Windows
unzip("data-raw/Polling%20Division%20Shapefile%20-%202014%20General%20Election.zip", 
      exdir = "data-raw/Polling%20Division%20Shapefile%20-%202014%20General%20Election")

prov_geo <- sf::st_read("data-raw/Polling%20Division%20Shapefile%20-%202014%20General%20Election", 
                        layer = "PDs_Ontario") %>%
  sf::st_transform(crs = 4326)  # WGS84 longitude/latitude

# City of Toronto 2014 voting locations (already WGS84, per the file name)
download.file("http://opendata.toronto.ca/gcc/voting_location_2014_wgs84.zip",
              destfile = "data-raw/voting_location_2014_wgs84.zip",
              mode = "wb")
unzip("data-raw/voting_location_2014_wgs84.zip", exdir = "data-raw/voting_location_2014_wgs84")
toronto_wards <- sf::st_read("data-raw/voting_location_2014_wgs84", layer = "VOTING_LOCATION_2014_WGS84") %>%
  sf::st_transform(crs = 4326)

# Intersect the layers: each Toronto voting location picks up the attributes
# of the provincial polling division it falls in
to_prov_geo <- prov_geo %>%
  sf::st_intersection(toronto_wards)
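To make the intersection step less of a black box, here is a toy sketch with made-up geometries: two hypothetical unit-square "polling divisions" and three hypothetical voting locations, with no CRS so the example stays planar. None of these names or coordinates come from the real data. Intersecting polygons with points keeps only the points that fall inside a polygon, and each surviving point carries the attributes of both layers; a point outside every polygon is dropped.

```r
library(sf)

# Hypothetical unit square with its lower-left corner at (x0, 0)
unit_square <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0))))
}

# Two made-up "polling divisions" side by side (no CRS, so plain planar geometry)
divisions <- st_sf(district = c("001", "002"),
                   geometry = st_sfc(unit_square(0), unit_square(1)),
                   agr = "constant")

# Three made-up voting locations; the last one lies outside both squares
locations <- st_sf(location = c("a", "b", "c"),
                   geometry = st_sfc(st_point(c(0.5, 0.5)),
                                     st_point(c(1.5, 0.5)),
                                     st_point(c(5, 5))),
                   agr = "constant")

# Each location inside a division is kept and tagged with that division's district
tagged <- st_intersection(divisions, locations)
print(st_drop_geometry(tagged))
```

This also explains the MULTIPOINT geometry in the output further on: once the intersected points are grouped and summarised by district, sf unions each district's points into a single multipoint.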

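Now we just need to extract a couple of columns from the data frame associated with the shapefile: the district number (DATA_COMPI) and the district name (KPI04). Grouping by those two columns collapses the many intersected rows into one row per district. Then we process the names a bit so that they match the format of other data sets. This includes converting them to UTF-8, formatting as title case, and replacing the stray dash character ("\u0097", the Windows-1252 code for an em dash) with spaces: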

electoral_districts <- to_prov_geo %>%
  # Keep the district number and its name, title-cased
  dplyr::transmute(electoral_district = as.character(DATA_COMPI),
                   electoral_district_name = stringr::str_to_title(KPI04)) %>%
  # Collapse to one row per district (the point geometries are unioned into a MULTIPOINT)
  dplyr::group_by(electoral_district, electoral_district_name) %>%
  dplyr::count() %>%
  dplyr::ungroup() %>%
  # Convert to UTF-8 and replace the stray U+0097 dash with a space
  dplyr::mutate(electoral_district_name = stringr::str_replace_all(utf8::as_utf8(electoral_district_name),
                                                                   "\u0097", " ")) %>%
  dplyr::select(electoral_district, electoral_district_name)
electoral_districts
## Simple feature collection with 23 features and 2 fields
## geometry type:  MULTIPOINT
## dimension:      XY
## bbox:           xmin: -79.61919 ymin: 43.59068 xmax: -79.12511 ymax: 43.83057
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
## # A tibble: 23 x 3
##    electoral_distri… electoral_distric…                           geometry
##    <chr>             <chr>                                <MULTIPOINT [°]>
##  1 005               Beaches East York  (-79.32736 43.69452, -79.32495 43…
##  2 015               Davenport          (-79.4605 43.68283, -79.46003 43.…
##  3 016               Don Valley East    (-79.35985 43.78844, -79.3595 43.…
##  4 017               Don Valley West    (-79.40592 43.75026, -79.40524 43…
##  5 020               Eglinton Lawrence  (-79.46787 43.70595, -79.46376 43…
##  6 023               Etobicoke Centre   (-79.58697 43.6442, -79.58561 43.…
##  7 024               Etobicoke Lakesho… (-79.56213 43.61001, -79.5594 43.…
##  8 025               Etobicoke North    (-79.61919 43.72889, -79.61739 43…
##  9 068               Parkdale High Park (-79.49944 43.66285, -79.4988 43.…
## 10 072               Pickering Scarbor… (-79.18898 43.80374, -79.17927 43…
## # ... with 13 more rows
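If pulling in stringr and utf8 just for this cleanup feels heavy, the name-cleaning step can be approximated in base R. This is a minimal sketch: `clean_district_name()` is a hypothetical helper (not part of toVotes), and it assumes the raw KPI04 values look like `BEACHES\u0097EAST YORK`. Its title casing only capitalises the first letter of each space-separated word, so it will not match `stringr::str_to_title()` in every edge case.

```r
# Hypothetical base-R stand-in for the stringr/utf8 cleaning step above
clean_district_name <- function(x) {
  x <- enc2utf8(as.character(x))
  # Replace the stray U+0097 character (an em dash in Windows-1252) with a space
  x <- gsub("\u0097", " ", x, fixed = TRUE)
  # Trim and collapse any runs of whitespace to a single space
  x <- gsub("\\s+", " ", trimws(x))
  # Title case: lower-case everything, then upper-case the first letter of each word
  gsub("(^|\\s)([a-z])", "\\1\\U\\2", tolower(x), perl = TRUE)
}

clean_district_name(c("BEACHES\u0097EAST YORK", "ST. PAUL'S"))
# returns c("Beaches East York", "St. Paul's")
```

Capitalising only after the start of the string or a space, rather than after any word boundary, avoids turning "Paul's" into "Paul'S".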

In the end, this is a much more reliable solution, though it seems a bit extreme to use GIS techniques just to get a listing of Electoral District names and numbers.

The commit with most of these changes in toVotes is here.