Fascinating to think of creating cognitive scaffolding for elephants aeon.co/essays/if…

We’ve completely retooled our approach to predicting elections to use an agent-based model. Looking forward to comparing our predictions to the actual results tonight for the Toronto election!

A new approach to predicting elections: Agent-based modeling

It’s been a while since we last posted – largely for personal reasons, but also because we wanted to take some time to completely retool our approach to modeling elections. In the past, we’ve tried a number of statistical approaches. Because every election is quite different to its predecessors, this proved unsatisfactory – there are simply too many things that change which can’t be effectively measured in a top-down view. Top-down approaches ultimately treat people as averages. But candidates and voters do not behave like averages; they have different desires and expectations.

We know there are diverse behaviours that need to be modeled at the person-level. We also recognize that an election is a system of diverse agents, whose behaviours affect each other. For example, a candidate can gain or lose support by doing nothing, depending only on what other candidates do. Similarly, a candidate or voter will behave differently simply based on which candidates are in the race, even without changing any beliefs. In the academic world, the aggregated results of such behaviours are called “emergent properties”, and the ability to predict such outcomes is extremely difficult if looking at the system from the top down.

So we needed to move to a bottom-up approach that would allow us to model agents heterogeneously, and that led us to what is known as agent-based modeling.

Agent-based modeling and elections

Agent-based models employ individual heterogeneous “agents” that are interconnected and follow behavioural rules defined by the modeler. Due to their non-linear approach, agent-based models have been used extensively in military games, biology, transportation planning, operational research, ecology, and, more recently, in economics (where huge investments are being made).

While we’ll write more on this in the coming weeks, we define voters’ and candidates’ behaviour using parameters, and “train” them (i.e., setting those parameters) based on how they behaved in previous elections. For our first proof of concept model, we have candidate and voter agents with two-variable issues sets (call the issues “economic” and “social”) – each with a positional score of 0 to 100. Voters have political engagement scores (used to determine whether they cast a ballot), demographic characteristics based on census data, likability scores assigned to each candidate (which include anything that isn’t based on issues, from name recognition to racial or sexual bias), and a weight for how important likability is to that voter. Voters also track, via polls, the likelihood that a candidate can win. This is important for their “utility function” – that is, the calculation that defines which candidate a voter will choose, if they cast a ballot at all. For example, a candidate that a voter may really like, but who has no chance of winning, may not get the voter’s ultimate vote. Instead, the voter may vote strategically.

On the other hand, candidates simply seek votes. Each candidate looks at the polls and asks 1) am I a viable candidate?; and 2) how do I change my positions to attract more voters? (For now, we don’t provide them a way to change their likability.) Candidates that have a chance of winning move a random angle from their current position, based on how “flexible” they are on their positions. If that move works (i.e., moves them up in the polls), they move randomly in the same general direction. If the move hurt their standings in the polls, they turn around and go randomly in the opposite general direction. At some point, the election is held – that is, the ultimate poll – and we see who wins.

This approach allows us to run elections with different candidates, change a candidate’s likability, introduce shocks (e.g., candidates changing positions on an issue) and, eventually, see how different voting systems might impact who gets elected (foreshadowing future work.) We’re not the first to apply agent-based modeling in psephology by any stretch (there are many in the academic world using it to explain observed behaviours), but we haven’t found any attempting to do so to predict actual elections.

Applying this to the Toronto 2018 Mayoral Race

First, Toronto voters have, over the last few elections, voted somewhat more right-wing than might have been predicted. Looking at the average positions across the city for the 2003, 2006, 2010, and 2014 elections looks like the following:

This doesn’t mean that Toronto voters are themselves more right-wing than might be expected, just that they voted this way. This is in fact the first interesting outcome of our new approach. We find that about 30% of Toronto voters have been based on candidate likability, and that for the more right-wing candidates, likability has been a major reason for choosing them. For example, in 2010, Rob Ford’s likability score was significantly higher that his major competitors (George Smitherman and Joe Pantalone). This isn’t to say that everyone liked Rob Ford – but those that did vote for him cared more about something other than issues, at least relative to those who voted for his opponents.

For 2018, likability is less a differentiating factor, with both major candidates (John Tory and Jennifer Keesmaat scoring about the same on this factor). Nor are the issues – Ms. Keesmaat’s positions don’t seem to be hurting her standing in the polls as she’s staked out a strong position left of centre on both issues. What appears to be the bigger factor this time around is the early probabilities assigned by voters to Ms. Keesmaat’s chance of victory, a point that seems to have been a part of the actual Tory campaign’s strategy. Having not been seen as a major threat to John Tory by much of the city, that narrative become self-reinforcing. Further, John Tory’s positions are relatively more centrist in 2018 than they were in 2014, when he had a markedly viable right-wing opponent in Doug Ford. (To prove the point of this approach’s value, we could simply introduce a right-wing candidate and see what happens…)

Thus, our predictions don’t appear to be wildly different from current polls (with Tory winning nearly 2-to-1), and map as follows:

There will be much more to say on this, and much more we can do going forward, but for a proof of concept, we think this approach has enormous promise.

Found the exact spot we first met! Who knew it would lead to this

Cold medication

Life 3.0 by Max Tegmark is a fascinating book on an important topic. AI’s implications for society are profound and require significant debate. This book is a start. 📚

I appreciate innovation in beer, but this one went too far for me.

Great fun at the #fivepeaks Earth Run in the Rouge

Hiking Balsam Lake Provincial Park

Warming up at Balsam Lake Provincial Park

We had fun meeting the author Dav Pilkey and getting a signed copy of Dog Man. I really appreciated his patience with spending several hours talking to kids and signing books. 📚

Enjoying great sour beers at #sourpalooza

Ready for Pentatonix

Axe Pancreatic Cancer tickets are available now! This is a very fun event for a great cause. Early Bird tickets are selling quickly. So, if you’re interested, sign up now at pancreaticcancercanada.ca/axe-pancr…

Ready for school

Fixing a hack finds a better solution

In my Elections Ontario official results post, I had to use an ugly hack to match Electoral District names and numbers by extracting data from a drop down list on the Find My Electoral District website. Although it was mildly clever, like any hack, I shouldn’t have relied on this one for long, as proven by Elections Ontario shutting down the website.

So, a more robust solution was required, which led to using one of Election Ontario’s shapefiles. The shapefile contains the data we need, it’s just in a tricky format to deal with. But, the sf package makes this mostly straightforward.

We start by downloading and importing the Elections Ontario shape file. Then, since we’re only interested in the City of Toronto boundaries, we download the city’s shapefile too and intersect it with the provincial one to get a subset:

download.file("[www.elections.on.ca/content/d...](https://www.elections.on.ca/content/dam/NGW/sitecontent/2016/preo/shapefiles/Polling%20Division%20Shapefile%20-%202014%20General%20Election.zip)", 
              destfile = "data-raw/Polling%20Division%20Shapefile%20-%202014%20General%20Election.zip")
unzip("data-raw/Polling%20Division%20Shapefile%20-%202014%20General%20Election.zip", 
      exdir = "data-raw/Polling%20Division%20Shapefile%20-%202014%20General%20Election")

prov_geo <- sf::st_read(“data-raw/Polling%20Division%20Shapefile%20-%202014%20General%20Election”, layer = “PDs_Ontario”) %>% sf::st_transform(crs = “+init=epsg:4326”)

download.file("opendata.toronto.ca/gcc/votin…", destfile = “data-raw/voting_location_2014_wgs84.zip”) unzip(“data-raw/voting_location_2014_wgs84.zip”, exdir=“data-raw/voting_location_2014_wgs84”) toronto_wards <- sf::st_read(“data-raw/voting_location_2014_wgs84”, layer = “VOTING_LOCATION_2014_WGS84”) %>% sf::st_transform(crs = “+init=epsg:4326”)

to_prov_geo <- prov_geo %>% sf::st_intersection(toronto_wards)

Now we just need to extract a couple of columns from the data frame associated with the shapefile. Then we process the values a bit so that they match the format of other data sets. This includes converting them to UTF-8, formatting as title case, and replacing dashes with spaces:

electoral_districts <- to_prov_geo %>%
  dplyr::transmute(electoral_district = as.character(DATA_COMPI),
                   electoral_district_name = stringr::str_to_title(KPI04)) %>%
  dplyr::group_by(electoral_district, electoral_district_name) %>%
  dplyr::count() %>%
  dplyr::ungroup() %>%
  dplyr::mutate(electoral_district_name = stringr::str_replace_all(utf8::as_utf8(electoral_district_name), "\u0097", " ")) %>%
  dplyr::select(electoral_district, electoral_district_name)
electoral_districts
## Simple feature collection with 23 features and 2 fields
## geometry type:  MULTIPOINT
## dimension:      XY
## bbox:           xmin: -79.61919 ymin: 43.59068 xmax: -79.12511 ymax: 43.83057
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
## # A tibble: 23 x 3
##    electoral_distri… electoral_distric…                           geometry
##                                                 
##  1 005               Beaches East York  (-79.32736 43.69452, -79.32495 43…
##  2 015               Davenport          (-79.4605 43.68283, -79.46003 43.…
##  3 016               Don Valley East    (-79.35985 43.78844, -79.3595 43.…
##  4 017               Don Valley West    (-79.40592 43.75026, -79.40524 43…
##  5 020               Eglinton Lawrence  (-79.46787 43.70595, -79.46376 43…
##  6 023               Etobicoke Centre   (-79.58697 43.6442, -79.58561 43.…
##  7 024               Etobicoke Lakesho… (-79.56213 43.61001, -79.5594 43.…
##  8 025               Etobicoke North    (-79.61919 43.72889, -79.61739 43…
##  9 068               Parkdale High Park (-79.49944 43.66285, -79.4988 43.…
## 10 072               Pickering Scarbor… (-79.18898 43.80374, -79.17927 43…
## # ... with 13 more rows

In the end, this is a much more reliable solution, though it seems a bit extreme to use GIS techniques just to get a listing of Electoral District names and numbers.

The commit with most of these changes in toVotes is here.