Like any 12-year-old, my son is pretty keen on gaming. As an all-Apple house, his options were a bit constrained. So, we decided to build a PC from components.
I'd last built a PC about 30 years ago, when I wasn't much older than him. I remember thinking it was cool to be using a machine I'd built myself, plus as a parent it seemed like a good educational experience. I have to admit to being a bit nervous about the whole thing, as there was certainly a scenario in which we spent an entire weekend unsuccessfully trying to get a bunch of malfunctioning components to work.
After much deliberation and analysis, we ordered our parts from Newegg and everything arrived within a couple of weeks.
Given my initial anxiety, I was very relieved when we saw this screen. The BIOS booted up and showed that the RAM, SSD, and other components were all properly connected.
With that done, we then got to what ended up being the complicated part. Evidently, part of the point of a gaming PC is to have lots of fans and lights. None of this was true when I was a kid, and there was a daunting number of wires required to power the lights and fans. Sorting this out actually took a fair bit of time. But, eventually, success!
Then came our last challenge, which I likely should have anticipated much sooner. We didn't bother ordering a DVD drive, since everything is online these days. But our copy of Windows showed up as a DVD and we couldn't create any Windows install media on our Apple devices. Fortunately, we checked in with a slightly older kid down the street and he provided us with a USB drive with the right software. With that challenge solved, we finished the project!
Not counting choosing the components online, the whole project took about 5 hours from opening the boxes to booting into Windows for the first time. I’d definitely recommend it to anyone that’s tempted and technically inclined. My son is quite excited to be using a computer that he built from parts.
Staying in touch with my team is important. So, I schedule a skip-level meeting with someone on the team each week. These informal conversations are great for getting to know everyone, finding out about new ideas, and learning about recent achievements.
Getting these organized across a couple of dozen people is logistically challenging, and I've developed a Shortcut to automate most of the process.
Borrowing from Scotty Jackson, I have a base in Airtable with a record for each team member. I use this to store all sorts of useful information about everyone, including when we last had a skip-level meeting. The Shortcut uses this field to pull out team members that I haven't met with in the past four months and then randomizes the list of names. Then it passes each name over to Fantastical while also incrementing the date by a week. The end result is a recurring set of weekly meetings, randomized across team members.
The hardest part of the Shortcut development was figuring out how to get the names in a random order. A big thank you to sylumer in the Automators forum for pointing out that the Files action can randomly sort any list, not just lists of files.
I'm not sharing the Shortcut here, since the implementation is very specific to my needs. Rather, I'm sharing some of the thinking behind the code, since I think that it demonstrates the general utility of something like Shortcuts for managing routine tasks with just a small amount of upfront effort.
Inspired by Cortex, I'm declaring Tangible as my theme for 2021.
I've chosen this theme because I want to spend less time looking at a screen and more time with "tangible stuff". I'm sure that this is a common sentiment and declaring this theme will keep me focused on improvements.
Since working from home with an iPad, I'm averaging about 9 hours a day with an iOS device. This isn't just a vague estimate; Screen Time gives me to-the-minute tracking of every app I'm actively using.
I'm certainly not a Luddite! The ability of these rectangles of glass to take on so many functions and provide so much meaningful content is astounding. There's just something unsettling about the dominant role they play.
So, a few things I plan to try:
Although I'll continue reading ebooks, since the convenience is so great, I'll be rotating paper books into the queue regularly.
I've lost my outdoor running routine. Getting that back will be a nice addition to the Zoom classes and add some much needed fresh air.
As a family, we've been enjoying playing board games on weekends, just not routinely. Making sure we play at least one game a week will be good for all of us.
My son is keen on electronics. We're going to try assembling a gaming PC for him from components, as well as learn some basic electronics with a breadboard and an Arduino.
Our dog will be excited to get out for more regular walks with especially long ones on the weekend.
I'll be adding much of this to Streaks, an app that I've found really helpful for building habits. I'll also add a "tangible" tag to my time tracker to quantify the shift.
My hope is that I can find the right balance of screen time and tangible activities with intention.
MindNode is indispensable to my workflow. My main use for it is in tracking all of my projects and tasks, supported by MindNode's Reminders integration. I can see all of my projects, grouped by areas of focus, simultaneously, which is great for weekly reviews and for prioritizing my work.
I’ve also found it really helpful for sketching out project plans. I can get ideas out of my head easily with quick entry and then drag and drop nodes to explore connections. Seeing connections among items and rearranging them really brings out the critical elements.
MindNode’s design is fantastic and the app makes it really easy to apply styles across nodes. The relatively recent addition of tags has been great too. Overall, one of my most used apps.
I’m very excited to be recruiting for a Data Governance Sponsor to join my team and help enhance the use of good data analytics in our decisions at Metrolinx.
I'm looking for someone that enjoys telling compelling stories with data and has a passion for collaborating to build clean and reliable analytical processes. If you know someone that could fit (maybe you!), please pass along the job ad.
I’ve been negligent in supporting some of my favourite apps on the App Store. In many cases, I reviewed the app a few years ago and then never refreshed my ratings. So, I’m making a new commitment to updating my reviews for apps by picking at least one each month to refresh.
First up is Fantastical. This one took a real hit when they switched to a subscription pricing model. I get the controversy with subscriptions in general. For me, Fantastical has earned a spot on my short list of apps that I support with an ongoing subscription.
And here’s my App Store review:
Fantastical is a great app and is definitely one of my top three most-used apps. Well worth the subscription price.
A few favourite features:
Integration of events and tasks into the calendar view
Access to event attachments
Automatic link detection for Teams and Zoom meetings
With the release of iOS 14, I'm reconsidering my earlier approach to the Home Screen. So far I'm trying out a fully automated first screen that uses the Smart Stack, Siri Suggestions, and Shortcuts widgets. These are all automatically populated based on anticipated use and have been quite prescient.
My second screen is all widgets with views from apps that I want to have always available. Although the dynamic content on the first screen has been really good, I do want some certainty about accessing specific content. This second screen replaces how I was using the Today View. I’m not really sure what to do with that feature anymore.
I’ve hidden all of the other screens and rely on the App Library and search to find anything else.
I still like the simplicity behind my earlier approach to the Home Screen. We’ll see if that is just what I’m used to. This new approach is worth testing out for at least a few weeks.
Skipping past the unnecessarily dramatic title, The Broken Algorithm That Poisoned American Transportation does make some useful points. As seems typical these days, though, the good points are likely not the ones a quick reader would take away. My guess is most people see the headline and think that transportation demand models (TDMs) are inherently broken. Despite my biases, I don't think this is actually true.
For me, the most important point is about a third of the way through:
nearly everyone agreed the biggest question is not whether the models can yield better results, but why we rely on them so much in the first place. At the heart of the matter is not a debate about TDMs or modeling in general, but the process for how we decide what our cities should look like.
Models are just a tool for helping guide decisions. Ideally we would use them to compare alternatives and pick a favoured “vector” of change (rough direction and magnitude). Then with continuous monitoring and refinements throughout the project’s lifecycle, we can guide decisions towards favoured outcomes. This is why scenario planning, sensitivity tests, and clear presentation of uncertainty are so important. This point is emphasized later in the article:
civil engineers doing the modeling tend to downplay the relevance of the precise numbers and speak more broadly about trends over time. Ideally, they argue, policymakers would run the model with varying population forecasts, land use patterns, and employment scenarios to get a range of expectations. Then, they would consider what range of those expectations the project actually works for.
Although I'm not a civil engineer, this sounds right to me! I get that people want certainty and precise numbers; I just don't think anyone can provide these things. Major infrastructure projects have inherent risks and uncertainty. We need to acknowledge this and use judgement, along with a willingness to adjust over time. There is no magical crystal ball that can substitute for deliberation. [Me working from home: 🧙‍♂️🔮]
Fortunately for the modellers among us, the article does acknowledge that we’re getting better:
As problematic as they have been, the models have gotten smarter. Especially in the last decade or so, more states are working from dynamic travel models that more closely reflect how humans actually behave. They are better at taking into consideration alternate modes of transportation like biking, walking, and public transportation. And, unlike previous versions, they're able to model how widening one section of road might create bottlenecks in a different section.
But, wait:
Still, experts warn that unless we change the entire decision-making process behind these projects, a better model won't accomplish anything. The models are typically not even run (and the results presented to the public) until after a state department of transportation has all but settled on a preferred project.
Maybe it wasn't the model's fault after all.
This brings us back to the earlier point: we should be favouring more sophisticated decision-making processes, not just more sophisticated models.
I haven’t yet adopted the minimalist style of my iPhone for my iPad. Rather, I’ve found that setting up “task oriented” Shortcuts on my home screen is a good alternative to arranging lots of app icons.
The one I use the most is a “Reading” Shortcut, since this is my dominant use of the iPad. Nothing particularly fancy. Just a list of potential reading sources and each one starts up a Timery timer, since I like to track how much time I’m reading.
A nice feature of using a Shortcut for this is that I can add other actions, such as turning on Do Not Disturb or starting a specific playlist. I can also add and subtract reading sources over time, depending on my current habits. For example, the first one was Libby for a while, since I was reading lots of library books.
This is another example of how relatively simple Shortcuts can really help optimize how you use your iOS devices.
I’ve been keeping a “director’s commentary” of my experiences in Day One since August 2, 2012 (5,882 entries and counting). I’ve found this incredibly helpful and really enjoy the “On This Day” feature that shows all of my past entries on a particular day.
For the past few months, I’ve added in a routine based on the “5 minute PM” template which prompts me to add three things that happened that day and one thing I could have done to make the day better. This is a great point of reflection and will build up a nice library of what I’ve been doing over time.
My days seem like such a whirlwind sometimes that I actually have trouble remembering what I did that day. So, my new habit is to scroll through my Today view in Agenda. This shows me all of my notes from the day’s meetings. I’ve also created a Shortcut that creates a new note in Agenda with all of my completed tasks from Reminders. This is a useful reminder of any non-meeting based things I’ve done (not everything is a meeting, yet).
I’m finding this new routine to be a very helpful part of my daily shutdown routine: I often identify the most important thing to do tomorrow by reviewing what I did today. And starting tomorrow off with my top priority already identified really helps get the day going quickly.
watchOS 7 has some interesting new features for enhancing and sharing watch faces. After an initial explosion of developing many special purpose watch faces, I’ve settled on two: one for work and another for home.
Both watch faces use the Modular design with the date on the top left, time on the top right, and Messages on the bottom right. I like keeping the faces mostly the same for consistency and muscle memory.
My work watch face then adds the Fantastical complication right in the centre, since I often need to know which meeting I'm about to be late for. Reminders is on the bottom left and Mail in the bottom centre. I have this face set to white to not cause too much distraction.
My home watch face swaps in Now Playing in the centre, since I’m often listening to music or podcasts. And I have Activity in the bottom centre. This face is in orange, mostly to distinguish it from the work watch face.
Surprisingly, I’ve found this distinction between a work and home watch face even more important in quarantine. Switching from one face to another really helps enforce the transition between work and non-work when everything is all done at home.
The watch face that I’d really like to use is the Siri watch face. This one is supposed to intelligently expose information based on my habits. Sounds great, but almost never actually works.
I’m neither an epidemiologist nor a medical doctor. So, no one wants to see my amateur disease modelling.
That said, I’ve complained in the past about Ontario’s open data practices. So, I was very impressed with the usefulness of the data the Province is providing for COVID: a straightforward csv file that is regularly updated from a stable URL.
Using the data is easy. Here’s an example of creating a table of daily counts and cumulative totals:
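To give a flavour of how little wrangling is needed, here's a sketch in R. The provincial file is one row per confirmed case; a small inline tibble stands in for it here, and the column name is my placeholder (the real file's field names may differ):

```r
library(dplyr)

# Stand-in for the provincial csv; in practice this would be
# readr::read_csv() pointed at the Province's stable URL, one row per case.
cases <- tibble::tibble(
  episode_date = as.Date(c("2020-03-01", "2020-03-01",
                           "2020-03-02", "2020-03-04"))
)

# Count cases per day, then add a running cumulative total
daily <- cases %>%
  count(episode_date, name = "daily_cases") %>%
  arrange(episode_date) %>%
  mutate(cumulative_cases = cumsum(daily_cases))

daily
```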
Since I’m mostly stuck inside these days, I find I’m drinking more tea than usual. So, as a modification of my brew coffee shortcut, I’ve created a brew tea shortcut.
This one is slightly more complicated, since I want to do different things depending on whether the tea is caffeinated.
We start by making this choice:
Then, if we choose caffeine, we log this to the Health app:
Uncaffeinated tea counts as water (at least for me):
And, then, regardless of the type of tea, we set a timer for 7 minutes:
Running this one requires more interactions with Siri, since she’ll ask which type we want. We can either reply by voice or by pressing the option we want on the screen.
I’ve been tracking my time at work for a while now, with the help of Toggl and Timery. Now that I’m working from home, work and home life are blending together, making it even more useful to track what I’m doing.
Physical exercise is essential to my sanity. So, I wanted to integrate my Apple Watch workouts into my time tracking. I thought I’d be able to leverage integration with the Health app through Shortcuts to add in workout times. Turns out you can’t access this kind of information and I had to take a more indirect route using the Automation features in Shortcuts.
I've set up two automations: one for when I start an Apple Watch workout and the other for when I stop the workout:
The starting automation just starts an entry in Timery:
The stopping automation, unsurprisingly, stops the running entry:
As with most of my Shortcuts, this is a simple one. Developing a portfolio of these simple automations is really helpful for optimizing my processes and freeing up time for my priorities.
That said, it is often the smaller automations that add up over time to make a big difference. My most used one is also the simplest in my Shortcuts Library. I use it every morning when I make my coffee. All the shortcut does is set a timer for 60 seconds (my chosen brew time for the Aeropress) and logs 90mg of caffeine into the Health app.
All I need to do is groggily say “Hey Siri, brew coffee” and then patiently wait for a minute. Well, that plus boil the water and grind the beans.
Simple, right? But thatās the point. Even simple tasks can be automated and yield consistencies and productivity gains.
I’m delivering a seminar on estimating capital costs for large transit projects soon. One of the main concepts that seems to confuse people is inflation (including the non-intuitive terms nominal and real costs). To guide this discussion, I’ve pulled data from Statistics Canada on the Consumer Price Index (CPI) to make a few points.
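Since nominal versus real is the usual sticking point, a tiny worked example helps. The dollar figures and CPI levels below are made up for illustration; they are not the actual StatCan series:

```r
# Convert a nominal (then-year) cost into real (constant-2019) dollars by
# scaling with the ratio of CPI values. All numbers here are illustrative.
cost_2010_nominal <- 100    # $100M spent in 2010
cpi_2010 <- 116.5           # hypothetical CPI level in 2010
cpi_2019 <- 136.0           # hypothetical CPI level in 2019

cost_2019_real <- cost_2010_nominal * cpi_2019 / cpi_2010
round(cost_2019_real, 1)    # about 116.7, i.e. ~$117M in 2019 dollars
```

The same ratio works in reverse for deflating a future cost back into today's dollars.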
The first point is that, yes, things do cost more than they used to, since prices have consistently increased year over year (this is the whole point of monetary policy). I’m illustrating this with a long-term plot of CPI in Canada from 1914-01-01 to 2019-11-01.
I added in the images of candy bars to acknowledge my grandmother's observation that, when she was a kid, candy only cost a penny. I also want to make a point that although costs have increased, we also now have a much greater diversity of candy to choose from. There's an important analogy here for estimating the costs of projects, particularly those with a significant portion of machinery or technology assets.
The next point I want to make is that location matters, which I illustrate with a zoomed in look at CPI for Canada, Ontario, and Toronto.
This shows that over the last five years Toronto has seen higher price increases than the rest of the province and country. This has implications for project costing, since we may need to consider the source of materials and location of the project to choose the most appropriate CPI adjustment.
The last point I want to make is that the type of product also matters. To start, I illustrate this by comparing CPI for apples and alcoholic beverages (why not, there are 330 product types in the data and I have to pick a couple of examples to start).
In addition to showing how relative price inflation between products can change over time (the line for apples crosses the one for alcoholic beverages several times), this chart shows how short-term fluctuations in price can also differ. For example, the line for apples fluctuates dramatically within a year (these are monthly values), while alcoholic beverages is very smooth over time.
Once I’ve made the point with a simple example, I can then follow up with something more relevant to transit planners by showing how the price of transportation, public transportation, and parking have all changed over time, relative to each other and all-items (the standard indicator).
At least half of transit planning seems to actually be about parking, so that parking fees line is particularly relevant.
Making these charts is pretty straightforward; the only real challenge is that the data file is large and unwieldy. The code I used is here.
Podcasts are great. I really enjoy being able to pick and choose interesting conversations from such a broad swath of topics. Somewhere along the way though, I managed to subscribe to way more than I could ever listen to and the unlistened count was inducing anxiety (I know, a real first world problem).
So, time to start all over again and only subscribe to a chosen few:
Quirks & Quarks is the one I’ve been subscribed to the longest and is a reliable overview of interesting science stories. I’ve been listening to this one for so long that I used to rely on an AppleScript to get episodes into my original scroll wheel iPod, well before podcasts were embraced by Apple.
In Our Time is another veteran on my list. I really like the three-academics format and Melvyn Bragg is a great moderator. This show has a fascinating diversity of topics in science, history, and literature.
All Songs Considered has helped me keep up with the latest music and Bob Boilen is a very good interviewer.
The Talk Show has kept me up to date on the latest in Apple and related news since at least 2007.
Exponent has really helped me think more clearly about strategy with discussions of tech and society.
Focused has been a very helpful biweekly reminder to think more carefully about what I’m working on and how to optimize my systems.
Making Sense has had reliably interesting discussions from Sam Harris. It just recently went behind a paywall. But I’m happy to pay for it, which comes with access to the Waking Up app.
I admire what Jesse Brown has built with CANADALAND and happily support it.
Mindscape has had some of the most interesting episodes of any of my subscriptions in the last several months. There’s definitely a bias towards quantum mechanics and physics, but there’s nothing wrong with that.
When all together on a list like this, it looks like a lot. Many are biweekly though, so they don’t accumulate.
I use Overcast for listening to these. I’ve tried many other apps and this one has the right mix of features and simplicity for me. I also appreciate the freedom of the Apple Watch integration which allows me to leave my phone at home and still have podcasts to listen to.
For several years now, I’ve been a very happy Things user for all of my task management. However, recent reflections on the nature of my work have led to some changes. My role now mostly entails tracking a portfolio of projects and making sure that my team has the right resources and clarity of purpose required to deliver them. This means that I’m much less involved in daily project management and have a much shorter task list than in the past. Plus, the vast majority of my time in the office is spent in meetings to coordinate with other teams and identify new projects.
As a result, in order to optimize my systems, I've switched to using a combination of MindNode and Agenda for my task management.
MindNode is an excellent app for mind mapping. I’ve created a mind map that contains all of my work-related projects across my areas of focus. I find this perspective on my projects really helpful when conducting a weekly review, especially since it gives me a quick sense of how well my projects are balanced across areas. As an example, the screenshot below of my mind map makes it very clear that I’m currently very active with Process Improvement, while not at all engaged in Assurance. I know that this is okay for now, but certainly want to keep an eye on this imbalance over time. I also find the visual presentation really helpful for seeing connections across projects.
MindNode has many great features that make creating and maintaining mind maps really easy. They look good too, which helps when you spend lots of time looking at them.
Agenda is a time-based note taking app. MacStories has done a thorough series of reviews, so I won’t describe the app in any detail here. There is a bit of a learning curve to get used to the idea of a time-based note, though it fits in really well to my meeting-dominated days and I’ve really enjoyed using it.
One point to make about both apps is that they are integrated with the new iOS Reminders system. The new Reminders is dramatically better than the old one and I’ve found it really powerful to have other apps leverage Reminders as a shared task database. I’ve also found it to be more than sufficient for the residual tasks that I need to track that aren’t in MindNode or Agenda.
I implemented this new approach a month ago and have stuck with it. This is at least three weeks longer than any previous attempt to move away from Things. So, the experiment has been a success. If my circumstances change, I’ll happily return to Things. For now, this new approach has worked out very well.
RStudio Cloud is a great service that provides a feature-complete version of RStudio in a web browser. In previous versions of Safari on iPad, RStudio Cloud was close to unusable, since the keyboard shortcuts didn’t work and they’re essential for using RStudio. In iPadOS, all of the shortcuts work as expected and RStudio Cloud is completely functional.
Although most of my analytical work will still be on my desktop, having RStudio on my iPad adds a very convenient option. RStudio Cloud also allows you to set up a project with an environment that persists across any device. So, now I can do most of my work at home, then fix a few issues at work, and refine at a coffee shop. Three different devices all using the exact same RStudio project.
One complexity with an RStudio Cloud setup is GitHub access. The usual approach of putting your git credentials in a .Renviron file (or equivalent) is a bad idea on a web service like RStudio Cloud. So, you need to type your git credentials into the console. To avoid having to do this very frequently, follow this advice and type this into the console:
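The common version of that advice is to have git cache your credentials in memory for a while, typed into the terminal (the one-hour timeout is just an example):

```shell
# Keep git HTTPS credentials cached in memory (not written to disk) for an hour
git config --global credential.helper 'cache --timeout=3600'
```

After this, you enter your username and password once per session and git reuses them until the cache expires.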
My goal for the home screen is to stay focused on action by making it easy to quickly capture my intentions and to minimize distractions. With previous setups I often found that I'd unlock the phone, be confronted by a screen full of apps with notification badges, and promptly forget what I had intended to do. So, I've reduced my home screen to just two apps.
Drafts is on the right and is likely my most frequently used app. As the tag line for the app says, this is where text starts. Rather than searching for a specific app, launching it, and then typing, Drafts always opens up to a blank text field. Then I type whatever is on my mind and send it from Drafts to the appropriate app. So, text messages, emails, todos, meeting notes, and random ideas all start in Drafts. Unfortunately my corporate iPhone blocks iCloud Drive, so I can't use Drafts to share notes across my devices. Anything that I want to keep gets moved into Apple Notes.
Things is on the left and is currently my favoured todo app. All of my tasks, projects, and areas of focus are in there, tagged by context, and given due dates, if appropriate. If the Things app has a notification badge, then I've got work to do today. If you're keen, The Sweet Setup has a great course on Things.
A few more notes on my setup:
If Drafts isn't the right place to start, I just pull down from the home screen to activate search and find the right app. I've found that the Siri Suggestions are often what I'm looking for (based on time of day and other context).
Some apps are more important for their output than input. These include calendar, weather, and notes. I've set these up as widgets in the Today View. A quick slide to the right reveals these.
I interact with several other apps through notifications, particularly for communication with Messages and Mail. But, I've set up VIPs in Mail to reduce these notifications to just the really important people.
I’ve been using this setup for a few months now and it certainly works for me. Even if this isn’t quite right for you, I’d encourage you to take a few minutes to really think through how you interact with your phone. I see far too many people with the default settings spending too much time scrolling around on their phones looking for the right app.
Like many of us, my online presence had become scattered across many sites: Twitter, Instagram, LinkedIn, Tumblr, and a close-to-defunct personal blog. So much of my content has been locked into proprietary services, each of which seemed like a good idea to start with.
Looking back at it now, I'm not happy with this and wanted to gather everything back into something that I could control. Micro.blog seems like a great home for this, as well described in this post from Manton Reece (micro.blog's creator). So, I've consolidated almost everything here. All that's left out is Facebook, which I may just leave alone.
By starting with micro.blog, I can selectively send content to other sites, while everything is still available from one source. I think this is a much better approach and I'm happy to be part of the open indie web again.
I’m very keen on backups. So many important things are digital now and, as a result, ephemeral. Fortunately you can duplicate digital assets, which makes backups helpful for preservation.
Most of my backup strategy was aimed at recovering from catastrophic loss, like a broken hard drive or stolen computer. I wasn’t sufficiently prepared for more subtle, corrosive loss of files. As a result, many videos of my kids' early years were lost. This was really hard to take, especially given that I thought I was so prepared with backups.
Fortunately, I found an old Mac Mini in a closet that had most of the missing files! This certainly wasn’t part of my backup strategy, but I’ll take it.
So, just a friendly reminder to make sure your backups are actually working as you expect. We all know this. But, please check.
My favourite spin studio has put on a fitness challenge for 2019. It has many components, one of which is improving your performance by 3% over six weeks. I've taken on the challenge and am now worried that I don't know how reasonable this increase actually is. So, a perfect excuse to extract my metrics and perform some excessive analysis.
We start by importing a CSV file of my stats, taken from Torq's website. We use readr for this and set the col_types in advance to specify the type for each column, as well as skip some extra columns with col_skip. I find doing this at the same time as the import, rather than later in the process, more direct and efficient. I also like the flexibility of the col_date import, where I can convert the source data (e.g., "Mon 11/02/19 - 6:10 AM") into a format more useful for analysis (e.g., "2019-02-11").
One last bit of clean up is to specify the instructors that have led 5 or more sessions. Later, we'll try to identify an instructor effect on my performance and only 5 of the 10 instructors have sufficient data for such an analysis. The forcats::fct_other function is really handy for this: it collapses several factor levels together into one "Other" level.
Lastly, I set the challenge_start_date variable for use later on.
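Pulling those steps together, here's a self-contained sketch. A tiny inline extract stands in for the Torq export, the keep threshold is lowered to fit the sample (the real analysis keeps instructors with 5+ sessions), and the challenge start date is a placeholder:

```r
library(readr)
library(forcats)
library(dplyr)

# Inline stand-in for the Torq export; the real file has extra columns,
# which col_skip() drops at import time.
raw <- "Date,Instructor,avg_power,max_power,total_energy,Notes
Mon 11/02/19 - 6:10 AM,Justine,234,449,616,x
Fri 08/02/19 - 6:10 AM,George,221,707,577,x
Mon 04/02/19 - 6:10 AM,Justine,230,720,613,x"

data <- read_csv(I(raw), col_types = cols(
  Date = col_character(),   # parsed below; col_date(format = ...) works too
  Instructor = col_factor(),
  avg_power = col_double(),
  max_power = col_double(),
  total_energy = col_double(),
  Notes = col_skip()
)) %>%
  mutate(Date = as.Date(strptime(Date, "%a %d/%m/%y - %I:%M %p")))

# Collapse infrequent instructors into "Other" (threshold of 2 for this
# sample; 5 in the real analysis)
keep <- names(which(table(data$Instructor) >= 2))
data <- data %>% mutate(Instructor = fct_other(Instructor, keep = keep))

challenge_start_date <- as.Date("2019-01-01")  # placeholder date
```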
## # A time tibble: 6 x 5
## # Index: Date
## Date Instructor avg_power max_power total_energy
##   <date>     <fct>          <dbl>     <dbl>        <dbl>
## 1 2019-02-11 Justine 234 449 616
## 2 2019-02-08 George 221 707 577
## 3 2019-02-04 Justine 230 720 613
## 4 2019-02-01 George 220 609 566
## 5 2019-01-21 Justine 252 494 623
## 6 2019-01-18 George 227 808 590
To start, we just plot the change in average power over time. Given relatively high variability from class to class, we add a smoothed line to show the overall trend. We also mark the point where the challenge starts with a red line.
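In ggplot2 terms, that plot is roughly the following (a sketch with a stand-in data frame; in the real analysis `data` comes from the Torq export):

```r
library(ggplot2)

# Stand-in for the real ride data
data <- data.frame(
  Date = as.Date(c("2019-01-18", "2019-01-21", "2019-02-04", "2019-02-11")),
  avg_power = c(227, 252, 230, 234)
)
challenge_start_date <- as.Date("2019-02-01")

p <- ggplot(data, aes(x = Date, y = avg_power)) +
  geom_point() +
  geom_smooth(se = FALSE) +                        # smoothed trend line
  geom_vline(xintercept = as.numeric(challenge_start_date),
             colour = "red") +                     # challenge start marker
  labs(y = "Average power")
```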
Overall, looks like I made some steady progress when I started, plateaued in the Summer (when I didn't ride as often), and then started a slow, steady increase in the Fall. Unfortunately for me, it also looks like I started to plateau in my improvement just in time for the challenge.
We'll start a more formal analysis by just testing to see what my average improvement is over time.
time_model <- lm(avg_power ~ Date, data = data)
summary(time_model)
##
## Call:
## lm(formula = avg_power ~ Date, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -30.486 -11.356 -1.879 11.501 33.838
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.032e+03 3.285e+02 -6.186 4.85e-08 ***
## Date 1.264e-01 1.847e-02 6.844 3.49e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15.65 on 64 degrees of freedom
## Multiple R-squared: 0.4226, Adjusted R-squared: 0.4136
## F-statistic: 46.85 on 1 and 64 DF, p-value: 3.485e-09
Based on this, we can see that my performance is improving over time and the R2 is decent. But, interpreting the coefficient for time isn't entirely intuitive and isn't really the point of the challenge. The question is: how much higher is my average power during the challenge than before? For that, we'll set up a "dummy variable" based on the date of the class. Then we use dplyr to group by this variable and estimate the mean of the average power in both time periods.
data %<>%
dplyr::mutate(in_challenge = ifelse(Date > challenge_start_date, 1, 0))
(change_in_power <- data %>%
dplyr::group_by(in_challenge) %>%
dplyr::summarize(mean_avg_power = mean(avg_power)))
## # A tibble: 2 x 2
## in_challenge mean_avg_power
##
## 1 0 213.
## 2 1 231.
So, I've improved from an average power of 213 to 231, an improvement of 8%. A great relief: I've exceeded the target!
Of course, having gone to all of this (excessive?) trouble, I'm now interested in seeing whether the instructor leading the class has a significant impact on my results. I certainly feel as if some instructors are harder than others. But, is this supported by my actual results? As always, we start with a plot to get a visual sense of the data.
There's a mix of confirmation and rejection of my expectations here:

- Charlotte got me started and there's a clear trend of increasing power as I figure out the bikes and gain fitness
- George's class is always fun, but my progress isn't as high as I'd like (indicated by the shallower slope). This is likely my fault though: George's rides are almost always Friday morning and I'm often tired by then and don't push myself as hard as I could
- Justine hasn't led many of my classes, but my two best results are with her. That said, my last two rides with her haven't been great
- Marawan's classes always feel really tough. He's great at motivation and I always feel like I've pushed really hard in his classes. You can sort of see this in the relatively higher position of the best-fit line for his classes. But, I also had one really poor class with him. This seems to coincide with some poor classes with both Tanis and George, so I was likely a bit sick then. Also, I'm certain I've had more classes with Marawan than the data show (each one is memorably challenging), as he's substituted for other instructors several times. It looks like Torq's tracker doesn't update the name of the instructor to match these substitutions though
- Tanis's classes are usually tough too. She's relentless about increasing resistance, and it looks like I was improving well with her
- Other isn't really worth describing in any detail, as it's a mix of 5 other instructors, each with a different style
Having eyeballed the charts, let's now look at a statistical model that considers just instructors.
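The instructor-only model referred to here is presumably specified like the time model, with Instructor as the sole predictor:

```r
# Instructor as the only predictor, for comparison with time_model
instructor_model <- lm(avg_power ~ Instructor, data = data)
summary(instructor_model)
```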
Each of the named instructors has a significant, positive effect on my average power. However, the overall R2 is much less than the model that considered just time. So, our next step is to consider both together.
##
## Call:
## lm(formula = avg_power ~ Date + Instructor, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -31.689 -11.132 1.039 10.800 26.267
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2332.4255 464.3103 -5.023 5.00e-06 ***
## Date 0.1431 0.0263 5.442 1.07e-06 ***
## InstructorGeorge -2.3415 7.5945 -0.308 0.759
## InstructorJustine 10.7751 8.9667 1.202 0.234
## InstructorMarawan 12.5927 8.9057 1.414 0.163
## InstructorTanis 3.3827 8.4714 0.399 0.691
## InstructorOther 12.3270 7.1521 1.724 0.090 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15.12 on 59 degrees of freedom
## Multiple R-squared: 0.5034, Adjusted R-squared: 0.4529
## F-statistic: 9.969 on 6 and 59 DF, p-value: 1.396e-07
anova(time_instructor_model, time_model)
## Analysis of Variance Table
##
## Model 1: avg_power ~ Date + Instructor
## Model 2: avg_power ~ Date
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 59 13481
## 2 64 15675 -5 -2193.8 1.9202 0.1045
This model has the best explanatory power so far and increases the coefficient for time slightly. This suggests that controlling for instructor sharpens the time signal. However, once time is included, none of the individual instructors are significantly different from the baseline. My interpretation is that my overall improvement over time matters much more than which particular instructor is leading the class. Nonetheless, the earlier analysis of the instructors gave me some insights that I can use to maximize the contribution each of them makes to my overall progress. And, when comparing these models with an ANOVA, we find that there isn't a significant difference between them.
The last model to try looks for a time-by-instructor interaction.
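Matching Model 2 in the ANOVA table that follows, that interaction model would be specified as:

```r
# Date + Instructor + Date:Instructor, i.e. a separate slope per instructor
interaction_model <- lm(avg_power ~ Date + Instructor + Date:Instructor, data = data)
summary(interaction_model)

# Compare against the simple time-only model
anova(time_model, interaction_model)
```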
## Analysis of Variance Table
##
## Model 1: avg_power ~ Date
## Model 2: avg_power ~ Date + Instructor + Date:Instructor
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 64 15675
## 2 54 11920 10 3754.1 1.7006 0.1044
We can see that there are some significant interactions, meaning that the slope of improvement over time does differ by instructor. Before getting too excited though, the ANOVA shows that this model isn't any better than the simple model of just time. There's always a risk in trying to explain main effects (like time) with interactions. The story here is really that we need more data to tease out the instructor-by-time effect.
The last point to make is that we've focused so far on average power, since that's the metric for this fitness challenge. There could be interesting interactions among average power, maximum power, RPMs, and total energy, each of which is available in these data. I'll return to that analysis some other time.
In the interests of science, I'll keep going to these classes, just so we can figure out what impact instructors have on performance. In the meantime, it looks like I'll succeed with at least this component of the fitness challenge, and I have the stats to prove it.
This is a "behind the scenes" elaboration of the geospatial analysis in our recent post on evaluating our predictions for the 2018 mayoral election in Toronto. This was my first serious use of the new sf package for geospatial analysis. I found the package much easier to use than some of my previous workflows for this sort of analysis, especially given its integration with the tidyverse.
We start by downloading the shapefile for voting locations from the City of Toronto's Open Data portal and reading it with the read_sf function. Then, we pipe it to st_transform to set the appropriate projection for the data. In this case, this isn't strictly necessary, since the shapefile is already in the right projection. But, I tend to do this for all shapefiles to avoid any oddities later.
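Sketched out, that step might look like this (the file path and object name are assumptions; the EPSG 4326 projection matches the output below):

```r
library(sf)
library(magrittr)

# Read the voting-location shapefile and set the projection explicitly
voting_locations <- read_sf("data-raw/voting_locations") %>% # Hypothetical path
  st_transform(crs = 4326) # WGS84, the projection shown in the printed output
```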
## Simple feature collection with 1700 features and 13 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: -79.61937 ymin: 43.59062 xmax: -79.12531 ymax: 43.83052
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## # A tibble: 1,700 x 14
## POINT_ID FEAT_CD FEAT_C_DSC PT_SHRT_CD PT_LNG_CD POINT_NAME VOTER_CNT
## <dbl> <chr> <chr> <chr> <chr> <chr> <int>
## 1 10190 P Primary 056 10056 <NA> 37
## 2 10064 P Primary 060 10060 <NA> 532
## 3 10999 S Secondary 058 10058 Malibu 661
## 4 11342 P Primary 052 10052 <NA> 1914
## 5 10640 P Primary 047 10047 The Summit 956
## 6 10487 S Secondary 061 04061 White Eag… 51
## 7 11004 P Primary 063 04063 Holy Fami… 1510
## 8 11357 P Primary 024 11024 Rosedale … 1697
## 9 12044 P Primary 018 05018 Weston Pu… 1695
## 10 11402 S Secondary 066 04066 Elm Grove… 93
## # ... with 1,690 more rows, and 7 more variables: OBJECTID <dbl>,
## # ADD_FULL <chr>, X <dbl>, Y <dbl>, LONGITUDE <dbl>, LATITUDE <dbl>,
## # geometry <POINT [°]>
The file has 1,700 rows of data across 14 columns. The first 13 columns are data from the original shapefile. The last column is a list column added by sf that contains the geometry of each location. This specific design feature is what makes an sf object work really well with the rest of the tidyverse: the geographical details are just a column in the data frame. This makes the data much easier to work with than in other approaches, where the data is contained within an @data slot of an object.
Plotting the data is straightforward, since sf objects have a plot function. Here's an example where we plot the number of voters (VOTER_CNT) at each location. If you squint just right, you can see the general outline of Toronto in these points.
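Assuming the object is named voting_locations, that plot is a one-liner in the same style as the census-tract plot later on:

```r
# Plot voter counts at each voting location
plot(voting_locations["VOTER_CNT"])
```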
What we want to do next is use the voting location data to aggregate the votes cast at each location into census tracts. This then allows us to associate census characteristics (like age and income) with the pattern of votes and develop our statistical relationships for predicting voter behaviour.
Weāll split this into several steps. The first is downloading and reading the census tract shapefile.
Now that we have it, all we really want are the census tracts in Toronto (the shapefile includes census tracts across Canada). We achieve this by intersecting the Toronto voting locations with the census tracts using standard R subsetting notation.
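With sf, that intersection really is just subsetting. Assuming the objects are named census_tracts and voting_locations (both names are hypothetical):

```r
# Subsetting one sf object by another keeps the rows whose geometries
# intersect it (st_intersects is the default predicate)
to_census_tracts <- census_tracts[voting_locations, ]
```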
And, we can plot it to see how well the intersection worked. This time we'll plot the CTUID, which is the unique identifier for each census tract. This doesn't mean anything in this context, but adds some nice colour to the plot.
plot(to_census_tracts["CTUID"])
Now you can really see the shape of Toronto, as well as the size of each census tract.
Next we need to manipulate the voting data to get votes received by major candidates in the 2018 election. We take these data from the toVotes package and arbitrarily set the threshold for major candidates to receiving at least 100,000 votes. This yields our two main candidates: John Tory and Jennifer Keesmaat.
## # A tibble: 2 x 1
## candidate
## <chr>
## 1 Keesmaat Jennifer
## 2 Tory John
Given our goal of aggregating the votes received by each candidate into census tracts, we need a data frame that has each candidate in a separate column. We start by joining the major candidates table to the votes table. In this case, we also filter the votes to 2018, since John Tory has been a candidate in more than one election. Then we use the tidyr package to convert the table from long (with one candidate column) to wide (with a column for each candidate).
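A sketch of those two steps follows. The toVotes object and its column names (candidate, votes, year) are assumptions for illustration:

```r
library(dplyr)
library(tidyr)

# Major candidates: at least 100,000 votes in total
major_candidates <- toVotes %>%
  group_by(candidate) %>%
  summarise(total_votes = sum(votes)) %>%
  filter(total_votes >= 100000) %>%
  select(candidate)

# Filter to 2018 and convert from long to wide (one column per candidate)
spread_votes <- toVotes %>%
  filter(year == 2018) %>% # John Tory ran in more than one election
  semi_join(major_candidates, by = "candidate") %>%
  spread(key = candidate, value = votes)
```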
Our last step before finally aggregating to census tracts is to join the spread_votes table with the toronto_locations data. This requires pulling the ward and area identifiers from the PT_LNG_CD column of the toronto_locations data frame, which we do with some stringr functions. While we're at it, we also update the candidate names to just surnames.
Okay, weāre finally there. We have our census tract data in to_census_tracts and our voting data in to_geo_votes. We want to aggregate the votes into each census tract by summing the votes at each voting location within each census tract. We use the aggregate function for this.
ct_votes_wide <- aggregate(x = to_geo_votes,
                           by = to_census_tracts,
                           FUN = sum)
ct_votes_wide
As a last step, to tidy up, we now convert the wide table with a column for each candidate into a long table that has just one candidate column containing the name of the candidate.
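That wide-to-long conversion is a tidyr::gather call; the candidate column names here are assumed from the surnames used earlier:

```r
library(tidyr)

# One row per census tract per candidate
ct_votes <- ct_votes_wide %>%
  gather(key = "candidate", value = "votes", Keesmaat, Tory)
```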
Now that we have votes aggregated by census tract, we can add in many other attributes from the census data. We won't do that here, since this post is already pretty long. But, we'll end with a plot to show how easily sf integrates with ggplot2. This is a nice improvement over past workflows, where several steps were required. In the actual code for the retrospective analysis, I added some other plotting techniques, like cutting the response variable (votes) into equally spaced pieces and adding more refined labels. Here, we'll just produce a simple plot.
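A simple version of that plot, assuming the long table from the previous step is named ct_votes:

```r
library(ggplot2)

# geom_sf plots the geometry column of an sf object directly
ggplot(ct_votes) +
  geom_sf(aes(fill = votes)) +
  facet_wrap(~ candidate)
```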
Thanks to generous support, the 4th Axe Pancreatic Cancer fundraiser was a great success. We raised over $32K this year and all funds support the PancOne Network. So far, we’ve raised close to $120K in honour of my Mom. Thanks to everyone that has supported this important cause!
In my Elections Ontario official results post, I had to use an ugly hack to match Electoral District names and numbers by extracting data from a drop-down list on the Find My Electoral District website. Although it was mildly clever, like any hack, I shouldn't have relied on it for long, as proven by Elections Ontario shutting down the website.
So, a more robust solution was required, which led to using one of Elections Ontario's shapefiles. The shapefile contains the data we need; it's just in a tricky format to deal with. But, the sf package makes this mostly straightforward.
We start by downloading and importing the Elections Ontario shapefile. Then, since we're only interested in the City of Toronto boundaries, we download the city's shapefile too and intersect it with the provincial one to get a subset:
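A sketch of that download-and-intersect step (the file paths and object names are assumptions):

```r
library(sf)
library(magrittr)

# Provincial electoral district boundaries (hypothetical paths)
provincial_districts <- read_sf("data-raw/electoral_districts") %>%
  st_transform(crs = 4326)

# City of Toronto boundary
toronto_boundary <- read_sf("data-raw/toronto_boundary") %>%
  st_transform(crs = 4326)

# Keep only the portions of the provincial districts within Toronto
to_electoral_districts <- st_intersection(provincial_districts, toronto_boundary)
```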
Now we just need to extract a couple of columns from the data frame associated with the shapefile. Then we process the values a bit so that they match the format of other data sets. This includes converting them to UTF-8, formatting as title case, and replacing dashes with spaces:
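Roughly, with hypothetical source column names (ED_ID, ED_NAME):

```r
library(dplyr)
library(stringr)

to_electoral_districts <- to_electoral_districts %>%
  select(electoral_district = ED_ID, # Source column names are assumptions
         electoral_district_name = ED_NAME) %>%
  mutate(
    electoral_district_name = electoral_district_name %>%
      iconv(to = "UTF-8") %>%      # Convert encoding
      str_to_title() %>%           # Format as title case
      str_replace_all("-", " ")    # Replace dashes with spaces
  )
```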
## Simple feature collection with 23 features and 2 fields
## geometry type: MULTIPOINT
## dimension: XY
## bbox: xmin: -79.61919 ymin: 43.59068 xmax: -79.12511 ymax: 43.83057
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## # A tibble: 23 x 3
## electoral_distri… electoral_distric… geometry
## <chr> <chr> <MULTIPOINT [°]>
## 1 005 Beaches East York (-79.32736 43.69452, -79.32495 43…
## 2 015 Davenport (-79.4605 43.68283, -79.46003 43.…
## 3 016 Don Valley East (-79.35985 43.78844, -79.3595 43.…
## 4 017 Don Valley West (-79.40592 43.75026, -79.40524 43…
## 5 020 Eglinton Lawrence (-79.46787 43.70595, -79.46376 43…
## 6 023 Etobicoke Centre (-79.58697 43.6442, -79.58561 43.…
## 7 024 Etobicoke Lakesho… (-79.56213 43.61001, -79.5594 43.…
## 8 025 Etobicoke North (-79.61919 43.72889, -79.61739 43…
## 9 068 Parkdale High Park (-79.49944 43.66285, -79.4988 43.…
## 10 072 Pickering Scarbor… (-79.18898 43.80374, -79.17927 43…
## # ... with 13 more rows
In the end, this is a much more reliable solution, though it seems a bit extreme to use GIS techniques just to get a listing of Electoral District names and numbers.
The commit with most of these changes in toVotes is here.
In preparing for some PsephoAnalytics work on the upcoming provincial election, I've been wrangling the Elections Ontario data. As provided, the data is really difficult to work with, and we'll walk through some steps to tidy these data for later analysis.
Here's what the source data looks like:
Screenshot of raw Elections Ontario data
A few problems with this:
- The data is scattered across a hundred different Excel files
- Candidates are in columns, with their last name as the header
- Last names are not unique across all Electoral Districts, so they can't be used as a unique identifier
- Electoral District names are in a row, followed by a separate row for each poll within the district
- The party affiliation for each candidate isn't included in the data
So, we have a fair bit of work to do to get to something more useful. Ideally something like:
## # A tibble: 9 x 5
## electoral_district poll candidate party votes
## <chr> <chr> <chr> <chr> <int>
## 1 X 1 A Liberal 37
## 2 X 2 B NDP 45
## 3 X 3 C PC 33
## 4 Y 1 A Liberal 71
## 5 Y 2 B NDP 37
## 6 Y 3 C PC 69
## 7 Z 1 A Liberal 28
## 8 Z 2 B NDP 15
## 9 Z 3 C PC 34
This is much easier to work with: we have one row for the votes received by each candidate at each poll, along with the Electoral District name and their party affiliation.
Candidate parties
As a first step, we need the party affiliation for each candidate. I didn't see this information on the Elections Ontario site, so we'll pull the data from Wikipedia. The data on this webpage isn't too bad. We can just use the table selector to pull out all of the tables and then drop the ones we aren't interested in.
<pre class="r"><code>candidate_webpage <- "https://en.wikipedia.org/wiki/Ontario_general_election,_2014#Candidates_by_region"
candidate_tables <- "table" # Selector that matches every table on the page
candidates <- xml2::read_html(candidate_webpage) %>%
  rvest::html_nodes(candidate_tables) %>% # Pull tables from the wikipedia entry
  .[13:25] %>% # Drop unnecessary tables
  rvest::html_table(fill = TRUE)</code></pre>
<p>This gives us a list of 13 data frames, one for each table on the webpage. Now we cycle through each of these and stack them into one data frame. Unfortunately, the tables aren't consistent in the number of columns. So, the approach is a bit messy and we process each one in a loop.</p>
<pre class="r"><code># Setup an empty data frame to store results
candidate_parties <- tibble::tibble(
  electoral_district_name = character(),
  party = character(),
  candidate = character()
)
for(i in seq_along(candidates)) { # Messy, but works
this_table <- candidates[[i]]
# The header spans mess up the header row, so renaming
names(this_table) <- c(this_table[1,-c(3,4)], "NA", "Incumbent")
# Get rid of the blank spacer columns
this_table <- this_table[-1, ]
# Drop the NA columns by keeping only odd columns
this_table <- this_table[,seq(from = 1, to = dim(this_table)[2], by = 2)]
this_table %<>%
tidyr::gather(party, candidate, -`Electoral District`) %>%
dplyr::rename(electoral_district_name = `Electoral District`) %>%
dplyr::filter(party != "Incumbent")
candidate_parties <- dplyr::bind_rows(candidate_parties, this_table)
}
candidate_parties</code></pre>
<pre>
# A tibble: 649 x 3
electoral_district_name party candidate
1 Carleton—Mississippi Mills Liberal Rosalyn Stevens
</pre>
</div>
<div id="electoral-district-names" class="section level2">
<h2>Electoral district names</h2>
<p>One issue with pulling party affiliations from Wikipedia is that candidates are organized by Electoral District <em>names</em>. But the voting results are organized by Electoral District <em>number</em>. I couldn't find an appropriate resource on the Elections Ontario site. Rather, here we pull the names and numbers of the Electoral Districts from the <a href="https://www3.elections.on.ca/internetapp/FYED_Error.aspx?lang=en-ca">Find My Electoral District</a> website. The xpath selector is a bit tricky for this one. The <code>ed_xpath</code> object below actually pulls content from the drop-down list that appears when you choose an Electoral District. One nuisance with these data is that Elections Ontario uses <code>--</code> in the Electoral District names, instead of the "—" used on Wikipedia. We use <code>str_replace_all</code> to fix this below.</p>
<pre class="r"><code>ed_webpage <- "https://www3.elections.on.ca/internetapp/FYED_Error.aspx?lang=en-ca"
ed_xpath <- "//*[(@id = \"ddlElectoralDistricts\")]" # Use an xpath selector to get the drop down list by ID
electoral_districts <- xml2::read_html(ed_webpage) %>%
rvest::html_node(xpath = ed_xpath) %>%
rvest::html_nodes("option") %>%
rvest::html_text() %>%
.[-1] %>% # Drop the first item on the list ("Select...")
tibble::as_tibble() %>% # Convert to a data frame and split into ID number and name
tidyr::separate(value, c("electoral_district", "electoral_district_name"),
sep = " ",
extra = "merge") %>%
# Clean up district names for later matching and presentation
dplyr::mutate(electoral_district_name = stringr::str_to_title(
stringr::str_replace_all(electoral_district_name, "--", "—")))
electoral_districts</code></pre>
<pre>
# A tibble: 107 x 2
electoral_district electoral_district_name
1 001 Ajax—Pickering
2 002 Algoma—Manitoulin
3 003 Ancaster—Dundas—Flamborough—Westdale
4 004 Barrie
5 005 Beaches—East York
6 006 Bramalea—Gore—Malton
7 007 Brampton—Springdale
8 008 Brampton West
9 009 Brant
10 010 Bruce—Grey—Owen Sound
# … with 97 more rows
</pre>
<p>Next, we can join the party affiliations to the Electoral District names to join candidates to parties and district numbers.</p>
<pre class="r"><code>candidate_parties %<>%
# These three lines are cleaning up hyphens and dashes, seems overly complicated
dplyr::mutate(electoral_district_name = stringr::str_replace_all(electoral_district_name, "—\n", "—")) %>%
dplyr::mutate(electoral_district_name = stringr::str_replace_all(electoral_district_name,
"Chatham-Kent—Essex",
"Chatham—Kent—Essex")) %>%
dplyr::mutate(electoral_district_name = stringr::str_to_title(electoral_district_name)) %>%
dplyr::left_join(electoral_districts) %>%
dplyr::filter(!candidate == "") %>%
# Since the vote data are identified by last names, we split candidate's names into first and last
tidyr::separate(candidate, into = c("first","candidate"), extra = "merge", remove = TRUE) %>%
dplyr::select(-first)</code></pre>
<pre><code>## Joining, by = "electoral_district_name"</code></pre>
<pre class="r"><code>candidate_parties</code></pre>
<pre>
# A tibble: 578 x 4
electoral_district_name party candidate electoral_district
* <chr> <chr> <chr> <chr>
1 Carleton—Mississippi Mills Liberal Stevens 013
</pre>
<p>All that work just to get the name of each candidate for each Electoral District name and number, plus their party affiliation.</p>
</div>
<div id="votes" class="section level2">
<h2>Votes</h2>
<p>Now we can finally get to the actual voting data. These are made available as a collection of Excel files in a compressed folder. To avoid downloading it more than once, we wrap the call in an <code>if</code> statement that first checks to see if we already have the file. We also rename the file to something more manageable.</p>
<pre class="r"><code>raw_results_file <- "http://www.elections.on.ca/content/dam/NGW/sitecontent/2017/results/Poll%20by%20Poll%20Results%20-%20Excel.zip"
zip_file <- "data-raw/Poll%20by%20Poll%20Results%20-%20Excel.zip"
if(!file.exists(zip_file)) { # Only download the data once
  download.file(raw_results_file,
                destfile = zip_file)
  unzip(zip_file, exdir = "data-raw") # Extract the data into data-raw
  file.rename("data-raw/GE Results - 2014 (unconverted)", "data-raw/pollresults")
}</code></pre>
<pre><code>## NULL</code></pre>
<p>Now we need to extract the votes out of 107 Excel files. The combination of the <code>purrr</code> and <code>readxl</code> packages is great for this. In case we want to filter to just a few of the files (perhaps to target a range of Electoral Districts), we declare a <code>file_pattern</code>. For now, we just set it to any xls file that ends with three digits preceded by a "_".</p>
<p>As we read in the Excel files, we clean up lots of blank columns and headers. Then we convert to a long table and drop total and blank rows. Also, rather than try to align the Electoral District name rows with their polls, we use the name of the Excel file to pull out the Electoral District number. Then we join with the <code>electoral_districts</code> table to pull in the Electoral District names.</p>
<pre class="r"><code>file_pattern <- "*_[[:digit:]]{3}.xls" # Can use this to filter down to specific files
poll_data <- list.files(path = "data-raw/pollresults", pattern = file_pattern, full.names = TRUE) %>% # Find all files that match the pattern
  purrr::set_names() %>%
  purrr::map_df(readxl::read_excel, sheet = 1, col_types = "text", .id = "file") %>% # Import each file and merge into a dataframe
  # Specifying sheet = 1 just to be clear we're ignoring the rest of the sheets
  # Declare col_types since there are duplicate surnames and map_df can't recast column types in the rbind
  # For example, Bell is in both district 014 and 063
  dplyr::select(-starts_with("X__")) %>% # Drop all of the blank columns
  dplyr::select(1:2, 4:8, 15:dim(.)[2]) %>% # Reorganize a bit and drop unneeded columns
  dplyr::rename(poll_number = `POLL NO.`) %>%
  tidyr::gather(candidate, votes, -file, -poll_number) %>% # Convert to a long table
  dplyr::filter(!is.na(votes),
                poll_number != "Totals") %>%
  dplyr::mutate(electoral_district = stringr::str_extract(file, "[[:digit:]]{3}"),
                votes = as.numeric(votes)) %>%
  dplyr::select(-file) %>%
  dplyr::left_join(electoral_districts)
poll_data</code></pre>
<p>The only thing left to do is to join <code>poll_data</code> with <code>candidate_parties</code> to add a party affiliation to each candidate. Because the names don't always exactly match between these two tables, we use the <code>fuzzyjoin</code> package to join by closest spelling.</p>
<pre class="r"><code>poll_data_party_match_table <- poll_data %>%
group_by(candidate, electoral_district_name) %>%
summarise() %>%
fuzzyjoin::stringdist_left_join(candidate_parties,
ignore_case = TRUE) %>%
dplyr::select(candidate = candidate.x,
party = party,
electoral_district = electoral_district) %>%
dplyr::filter(!is.na(party))
poll_data %<>%
dplyr::left_join(poll_data_party_match_table) %>%
dplyr::group_by(electoral_district, party)
tibble::glimpse(poll_data)</code></pre>
<p>And, there we go. One table with a row for the votes received by each candidate at each poll. It would have been great if Elections Ontario released data in this format and we could have avoided all of this work.</p>
</div>
Analyzing these data was a great case study for the typical data management process. The data was structured for presentation, rather than analysis. So, there were several header rows, notes at the base of the table, and the data was spread across many worksheets.
Sometime recently, the ministry released an update that provides the data in a much better format: one sheet with rows for age and columns for years. Although this is a great improvement, I've had to update my case study, which actually makes it less useful as a lesson in data manipulation.
Although I’ve updated the main branch of the github repository, I’ve also created a branch that sources the archive.org version of the page from October 2016. Now, depending on the audience, I can choose the case study that has the right level of complexity.
Despite briefly causing me some trouble, I think it's great that these data are closer to a good analytical format. Now, if only the ministry could go one more step towards tidy data and make my case study completely unnecessary.