Longform

A workflow for leaving the office

Sometimes it’s the small things, accumulated over many days, that make a difference. As a simple example, every day when I leave the office, I message my family to let them know I’m leaving and how I’m travelling. Relatively easy: just open the Messages app, find the most recent conversation with them, and type in my message.

Using Workflow I can get this down to just a couple of taps on my watch. By choosing the “Leaving Work” workflow, I get a choice of travelling options:

Leaving work from the Apple Watch

Choosing one of them creates a text with the right emoticon that is pre-addressed to my family. I hit send and off goes the message.

The workflow itself is straightforward:

Leaving work workflow

Like I said, pretty simple. But it saves me close to a minute each and every day.

Charity donations by province

This tweet about the charitable donations by Albertans showed up in my timeline and caused a ruckus.

Many people took issue with the fact that these values weren’t adjusted for income. Seems to me that whether this is a good idea or not depends on what kind of question you’re trying to answer. Regardless, the CANSIM table includes this value. So, it is straightforward to calculate. Plus CANSIM tables have a pretty standard structure and showing how to manipulate this one serves as a good template for others.

library(tidyverse)
# Download and extract
url <- "[www20.statcan.gc.ca/tables-ta...](http://www20.statcan.gc.ca/tables-tableaux/cansim/csv/01110001-eng.zip)"
zip_file <- "01110001-eng.zip"
download.file(url,
              destfile = zip_file)
unzip(zip_file) 
# We only want two of the columns. Specifying them here.
keep_data <- c("Median donations (dollars)",
               "Median total income of donors (dollars)")
cansim <- read_csv("01110001-eng.csv") %>% 
  filter(DON %in% keep_data,
         is.na(`Geographical classification`)) %>% # This second filter removes anything that isn't a province or territory
  select(Ref_Date, DON, Value, GEO) %>%
  spread(DON, Value) %>% 
  rename(year = Ref_Date,
         donation = `Median donations (dollars)`,
         income = `Median total income of donors (dollars)`) %>% 
  mutate(donation_per_income = donation / income) %>% 
  filter(year == 2015) %>% 
  select(GEO, donation, donation_per_income)
cansim
## # A tibble: 16 x 3
##                                  GEO donation donation_per_income
##                                <chr>    <dbl>               <dbl>
##  1                           Alberta      450         0.006378455
##  2                  British Columbia      430         0.007412515
##  3                            Canada      300         0.005119454
##  4                          Manitoba      420         0.008032129
##  5                     New Brunswick      310         0.006187625
##  6         Newfoundland and Labrador      360         0.007001167
##  7 Non CMA-CA, Northwest Territories      480         0.004768528
##  8                 Non CMA-CA, Yukon      310         0.004643499
##  9             Northwest Territories      400         0.003940887
## 10                       Nova Scotia      340         0.006505932
## 11                           Nunavut      570         0.005651398
## 12                           Ontario      360         0.005856515
## 13              Prince Edward Island      400         0.008221994
## 14                            Quebec      130         0.002452830
## 15                      Saskatchewan      410         0.006910501
## 16                             Yukon      420         0.005695688

Curious that they dropped the territories from their chart, given that Nunavut has such a high donation amount.

Now we can plot the normalized data to see how the rank order changes. We’ll add the Canadian average as a blue line for comparison.
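
The plotting code isn’t included above, so here is a minimal sketch of one way to produce the figure described, using the cansim data frame from the previous block. The geoms, ordering, and axis labels are my assumptions, not the original styling.

canada_avg <- filter(cansim, GEO == "Canada")$donation_per_income

cansim %>% 
  filter(GEO != "Canada") %>% 
  ggplot(aes(x = reorder(GEO, donation_per_income), y = donation_per_income)) +
  geom_col() +                                            # one bar per region, ordered by normalized donations
  geom_hline(yintercept = canada_avg, colour = "blue") +  # Canadian average for comparison
  coord_flip() +
  labs(x = NULL, y = "Median donation / median donor income")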

I’m not comfortable with using median donations (adjusted for income or not) to say anything in particular about the residents of a province. But, I’m always happy to look more closely at data and provide some context for public debates.

One major gap with this type of analysis is that we’re only looking at the median donations of people that donated anything at all. In other words, we aren’t considering anyone who donates nothing. We should really compare these median donations to the total population or the size of the economy. This Stats Can study is a much more thorough look at the issue.

For me the interesting result here is the dramatic difference between Quebec and the rest of the provinces. But, I don’t interpret this to mean that Quebecers are less generous than the rest of Canada. Seems more likely that there are material differences in how the Quebec economy and social safety nets are structured.

TTC delay data and Friday the 13th

The TTC releasing their Subway Delay Data was great news. I’m always happy to see more data released to the public. In this case, it also helps us investigate one of the great, enduring mysteries: Is Friday the 13th actually an unlucky day?

As always, we start by downloading and manipulating the data. I’ve added in two steps that aren’t strictly necessary. One is combining the separate Date and Time columns into a single datetime column. The other is to drop most of the other columns, since we aren’t interested in them here.

url <- "http://www1.toronto.ca/City%20Of%20Toronto/Information%20&%20Technology/Open%20Data/Data%20Sets/Assets/Files/Subway%20&%20SRT%20Logs%20(Jan01_14%20to%20April30_17).xlsx"
filename <- basename(url)
download.file(url, destfile = filename, mode = "wb")
delays <- readxl::read_excel(filename, sheet = 2) %>% 
  dplyr::mutate(date = lubridate::ymd_hm(paste(`Date`, 
                                               `Time`, 
                                               sep = " ")),
                delay = `Min Delay`) %>% 
  dplyr::select(date, delay)
delays
## # A tibble: 69,043 x 2
##                   date delay
##                 <dttm> <dbl>
##  1 2014-01-01 02:06:00     3
##  2 2014-01-01 02:40:00     0
##  3 2014-01-01 03:10:00     3
##  4 2014-01-01 03:20:00     5
##  5 2014-01-01 03:29:00     0
##  6 2014-01-01 07:31:00     0
##  7 2014-01-01 07:32:00     0
##  8 2014-01-01 07:34:00     0
##  9 2014-01-01 07:34:00     0
## 10 2014-01-01 07:53:00     0
## # ... with 69,033 more rows

Now we have a delays dataframe with 69,043 incidents starting from 2014-01-01 00:21:00 and ending at 2017-04-30 22:13:00. Before we get too far, we’ll take a look at the data. A heatmap of delays by day and hour should give us some perspective.
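
As a quick check (not in the original post), the incident count and date range quoted above can be read straight off the data frame:

nrow(delays)                     # 69,043 incidents
range(delays$date, na.rm = TRUE) # first and last recorded delay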

delays %>% 
  dplyr::mutate(day = lubridate::day(date),
                hour = lubridate::hour(date)) %>% 
  dplyr::group_by(day, hour) %>% 
  dplyr::summarise(sum_delay = sum(delay)) %>% 
  ggplot2::ggplot(aes(x = hour, y = day, fill = sum_delay)) +
    ggplot2::geom_tile(alpha = 0.8, color = "white") +
    ggplot2::scale_fill_gradient2() + 
    ggplot2::theme(legend.position = "right") +
    ggplot2::labs(x = "Hour", y = "Day of the month", fill = "Sum of delays")

Other than a reliable band of calm very early in the morning, no obvious patterns here.

We need to identify any days that are a Friday the 13th. We also might want to compare weekends, regular Fridays, other weekdays, and Friday the 13ths, so we add a type column that provides these values. Here we use the case_when function:

delays <- delays %>% 
    dplyr::mutate(type = case_when( # Partition into Friday the 13ths, Fridays, weekends, and weekdays
      lubridate::wday(.$date) %in% c(1, 7) ~ "weekend",
      lubridate::wday(.$date) %in% c(6) & 
        lubridate::day(.$date) == 13 ~ "Friday 13th",
      lubridate::wday(.$date) %in% c(6) ~ "Friday",
      TRUE ~ "weekday" # Everything else is a weekday
  )) %>% 
  dplyr::mutate(type = factor(type)) %>% 
  dplyr::group_by(type)
delays
## # A tibble: 69,043 x 3
## # Groups:   type [4]
##                   date delay    type
##                 <dttm> <dbl>  <fctr>
##  1 2014-01-01 02:06:00     3 weekday
##  2 2014-01-01 02:40:00     0 weekday
##  3 2014-01-01 03:10:00     3 weekday
##  4 2014-01-01 03:20:00     5 weekday
##  5 2014-01-01 03:29:00     0 weekday
##  6 2014-01-01 07:31:00     0 weekday
##  7 2014-01-01 07:32:00     0 weekday
##  8 2014-01-01 07:34:00     0 weekday
##  9 2014-01-01 07:34:00     0 weekday
## 10 2014-01-01 07:53:00     0 weekday
## # ... with 69,033 more rows

With the data organized, we can start with just a simple box plot of the minutes of delay by type.

ggplot2::ggplot(delays, aes(type, delay)) +
  ggplot2::geom_boxplot() + 
  ggplot2::labs(x = "Type", y = "Minutes of delay")

Not very compelling. Basically most delays are short (as in zero minutes long) with many outliers.

How about if we summed up the total minutes in delays for each of the types of days?

delays %>% 
  dplyr::summarise(total_delay = sum(delay)) 
## # A tibble: 4 x 2
##          type total_delay
##        <fctr>       <dbl>
## 1      Friday       18036
## 2 Friday 13th         619
## 3     weekday       78865
## 4     weekend       28194

Clearly the total minutes of delays are much shorter for Friday the 13ths. But, there aren’t very many such days (only 6 in fact). So, this is a dubious analysis.
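
As an aside, the count of six such days can be confirmed directly. This is a quick sketch rather than part of the original analysis:

delays %>% 
  dplyr::ungroup() %>% 
  dplyr::filter(type == "Friday 13th") %>% 
  dplyr::summarise(n_days = dplyr::n_distinct(lubridate::date(date)))
## Should report 6 distinct Friday the 13ths between January 2014 and April 2017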

Let’s take a step back and calculate the average of the total delay across the entire day for each of the types of days. If Friday the 13ths really are unlucky, we would expect to see longer delays, at least relative to a regular Friday.

daily_delays <- delays %>% # Total delays in a day
  dplyr::mutate(year = lubridate::year(date),
                day = lubridate::yday(date)) %>% 
  dplyr::group_by(year, day, type) %>% 
  dplyr::summarise(total_delay = sum(delay))

mean_daily_delays <- daily_delays %>% # Average delays in each type of day
  dplyr::group_by(type) %>% 
  dplyr::summarise(avg_delay = mean(total_delay))
mean_daily_delays
## # A tibble: 4 x 2
##          type avg_delay
##        <fctr>     <dbl>
## 1      Friday 107.35714
## 2 Friday 13th 103.16667
## 3     weekday 113.63833
## 4     weekend  81.01724
ggplot2::ggplot(daily_delays, aes(type, total_delay)) +
  ggplot2::geom_boxplot() + 
  ggplot2::labs(x = "Type", y = "Total minutes of delay")

On average, Friday the 13ths have shorter total delays (103 minutes) than either regular Fridays (107 minutes) or other weekdays (114 minutes). Overall, weekend days have far shorter total delays (81 minutes).

If Friday the 13ths are unlucky, they certainly aren’t causing longer TTC delays.

For the statisticians among you who still aren’t convinced, we’ll run a basic linear model comparing Friday the 13ths with regular Fridays. Comparing only Fridays should control for many unmeasured variables that differ between days of the week.

model <- lm(total_delay ~ type, data = daily_delays, 
            subset = type %in% c("Friday", "Friday 13th"))
summary(model)
## 
## Call:
## lm(formula = total_delay ~ type, data = daily_delays, subset = type %in% 
##     c("Friday", "Friday 13th"))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -103.357  -30.357   -6.857   18.643  303.643 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      107.357      3.858  27.829   <2e-16 ***
## typeFriday 13th   -4.190     20.775  -0.202     0.84    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 50 on 172 degrees of freedom
## Multiple R-squared:  0.0002365,  Adjusted R-squared:  -0.005576 
## F-statistic: 0.04069 on 1 and 172 DF,  p-value: 0.8404

Definitely no statistical support for the idea that Friday the 13ths cause longer TTC delays.

How about time series tests, like anomaly detection? Seems like we’d just be getting carried away. Part of the art of statistics is knowing when to quit.

In conclusion, then, after likely far too much analysis, we find no evidence that Friday the 13ths cause an increase in the length of TTC delays. This certainly suggests that Friday the 13ths are not unlucky in any meaningful way, at least for TTC riders.

Glad we could put this superstition to rest!

Successful AxePC 2016 event

Thank you to all the participants, donors, and volunteers for making the third Axe Pancreatic Cancer event such a great success! Together we’re raising awareness and funding to support Pancreatic Cancer Canada.

Axe PC event photo

Axe PC 2016

We’re hosting our third-annual Axe Pancreatic Cancer event. Help us kick off Pancreatic Cancer Awareness Month by drinking beer and throwing axes!

Axe PC event poster

Public service vs. Academics

I recently participated in a panel discussion at the University of Toronto on the career transition from academic research to public service. I really enjoyed the discussion and there were many great questions from the audience. Here’s just a brief summary of some of the main points I tried to make about the differences between academic research and public service.

The major difference I’ve experienced involves a trade-off between control and influence.

As a grad student and post-doctoral researcher I had almost complete control over my work. I could decide what was interesting, how to pursue questions, who to talk to, and when to work on specific components of my research. I believe that I made some important contributions to my field of study. But, to be honest, this work had very little influence beyond a small group of colleagues who are also interested in the evolution of floral form.

Now I want to be clear about this: in no way should this be interpreted to mean that scientific research is not important. This is how scientific progress is made – many scientists working on particular, specific questions that are aggregated into general knowledge. This work is important and deserves support. Plus, it was incredibly interesting and rewarding.

However, the comparison of the influence of my academic research with my work on infrastructure policy is revealing. Roads, bridges, transit, hospitals, schools, courthouses, and jails all have significant impacts on the day-to-day experience of millions of people. Every day I am involved in decisions that determine where, when, and how the government will invest scarce resources into these important services.

Of course, this is where the control-influence trade-off kicks in. As an individual public servant, I have very little control over these decisions or how my work will be used. Almost everything I do involves medium-sized teams with members from many departments and ministries. This requires extensive collaboration, often under very tight time constraints with high profile outcomes.

For example, in my first week as a public servant I started a year-long process to integrate and enhance decision-making processes across 20 ministries and 2 agencies. The project team included engineers, policy analysts, accountants, lawyers, economists, and external consultants from all of the major government sectors. The (rather long) document produced by this process is now used to inform every infrastructure decision made by the province.

Governments contend with really interesting and complicated problems that no one else can or will consider. Businesses generally take on the easy and profitable issues, while NGOs are able to focus on specific aspects of issues. Consequently, working on government policy provides a seemingly endless supply of challenges and puzzles to solve, or at least mitigate. I find this very rewarding.

None of this is to suggest that either option is better than the other. I’ve been lucky to have had two very interesting careers so far, which have been at the opposite ends of this control-influence trade-off. Nonetheless, my experience suggests that an actual academic career is incredibly challenging to obtain and may require significant compromises. Public service can offer many of the same intellectual challenges with better job prospects and work-life balance. But, you need to be comfortable with the diminished control.

Thanks to my colleague Andrew Miller for creating the panel and inviting me to participate. The experience led me to think more clearly about my career choices and I think the panel was helpful to some University of Toronto grad students.

From brutal brooding to retrofit-chic

Our offices will be moving to this new space. I’m looking forward to actually working in a green building, in addition to developing green building policies.

The Jarvis Street project will set the benchmark for how the province manages its own building retrofits. The eight-month-old Green Energy Act requires Ontario government and broader public-sector buildings to meet a minimum LEED Silver standard – Leadership in Energy and Environmental Design. Jarvis Street will also be used to promote an internal culture of conservation, and to demonstrate the province’s commitment to technologically advanced workspaces that are accessible, flexible and that foster staff collaboration and creativity, Ms. Robinson explains.

From brutal brooding to retrofit-chic

Emacs Installation on Windows XP

I spend a fair bit of time with a locked-down Windows XP machine. Fortunately, I’m able to install Emacs, which provides capabilities that I find quite helpful. I’ve had to reinstall Emacs a few times now. So, for my own benefit (and perhaps yours) here are the steps I follow:

  1. Download EmacsW32 patched and install in my user directory under Apps

    Available from http://ourcomments.org/Emacs/EmacsW32.html

  2. Set the environment variable for HOME to my user directory

    Right click on My Computer, select the Advanced tab, and then Environment Variables.

    Add a new variable and set Variable name to HOME and Variable value to C:\Documents and Settings\my_user_directory

  3. Download technomancy’s Emacs Starter Kit

    Available from http://github.com/technomancy/emacs-starter-kit

    Extract the archive into .emacs.d in %HOME%

    Copy my specific emacs settings into .emacs.d\my_user_name.el

Canada LEED projects

The CaGBC maintains a list of all the registered LEED projects in Canada. This is a great resource, but rather awkward for analyses. I’ve copied these data into a DabbleDB application with some of the maps and tabulations that I frequently need to reference.

Here for example is a map of the density of LEED projects in each province. While here is a rather detailed view of the kinds of projects across provinces. There are several other views available. Are there any others that might be useful?

Every day is ‘science day’

I was given an opportunity to propose a measure to clarify how and on what basis the federal government allocates funds to STI - a measure that would strengthen relations between the federal government and the STI community by eliminating misunderstandings and suspicions on this point. In short, my proposal was that Ottawa direct its Science, Technology and Innovation Council to do three things:

To provide an up-to-date description of how these allocation decisions have been made in the past;

To identify the principles and sources of advice on which such decisions should be based;

To recommend the most appropriate structure and process - one characterized by transparency and openness - for making these decisions in the future.

These are reasonable suggestions from Preston Manning: be clear about why and how the Federal government funds science and technology.

Of course I may not agree with the actual decisions made through such a process, but at least I would know why the decisions were made. The current process is far too opaque and confused for such critical investment decisions.

Math and the City

judson.blogs.nytimes.com/2009/05/1…

A good read on the mathematics of scaling in urban patterns. I had looked into using the Bettencourt paper (cited in this article) for making allocation decisions. The trick is moving from the general patterns observed in urban scaling to specific recommendations for where to invest in new infrastructure. This is particularly challenging in the absence of good, detailed data on the current infrastructure stock. We’ve made good progress on gathering some of this data, and it might be worth revisiting this scaling relationship.

Mama Earth Organics

I’m certain that paying attention to where my food comes from is important. Food production influences my health, has environmental consequences, and affects both urban and rural design. Ideally, I would develop relationships with local farmers, carefully choose organic produce, and always consider broad environmental impacts. Except, I like to spend time with my young family, try to get some exercise, and have more than enough commitments through work to actually spend this much effort on food choices. So, I’ve outsourced this process to the excellent Mama Earth Organics.

Every week a basket of fresh organic and/or local fruit and vegetables arrives on our doorstep. Part of the fun of this service is that different items arrive each week, which diversifies our weekly food routine. But, we always know what’s coming several days in advance, so we can plan our meals well ahead of time. After over a year of service, we’ve only had a single complaint about quality and this was handled very quickly by Mama Earth with a full refund plus credit.

We’ve found the small basket is sufficient for two adults and a picky four-year-old. We’ve also added in some fresh bread from St. John’s Bakery, which has been consistently delicious and lasts through most of the week.

Goodyear's Religious Beliefs vs. Evolution

Our minister of science continues to argue that his unwillingness to endorse the theory of evolution is not relevant to science policy. As quoted by the Globe and Mail:

My view isn’t important. My personal beliefs are not important.

I find this amazing. How can the minister of science’s views on the fundamental unifying theory of biology not be important?

I don’t expect him to understand the details of evolutionary theory or to have all of his personal beliefs vetted and religious views muted. However, I do expect him – as minister – to champion and support Canadian science, especially basic research. When our minister refuses to acknowledge the fundamental discoveries of science, our reputation is diminished.

There is also a legitimate – though rather exaggerated – concern that the minister’s views on the truth can influence policy and funding decisions. The funding councils are more than sufficiently independent to prevent any undue ministerial influence here. The real problem is an apparent distrust or lack of interest in basic research from the federal government.

Death Sentences Review

Death Sentences by Don Watson is a wonderful book – simultaneously funny, scary, and inspiring – that describes how “clichés, weasel words, and management-speak” are infecting public language.

The humour comes from Watson’s acerbic commentary and fantastic scorn for phrases like:

Given the within year and budget time flexibility accorded to the science agencies in the determination of resource allocation from within their global budget, a multi-parameter approach to maintaining the agencies budgets in real terms is not appropriate.

The book is scary because it makes a strong argument for the dangers of this type of language. Citizens become confused and uninterested, customers become jaded, and people lose their love for language. Also, as a public servant I see this kind of language every day and often find myself struggling to avoid banality and clichés (not to mention bullet points). We need more forceful advocates like Don Watson to call out politicians and corporations for abusing our language. This book certainly makes me want to try harder. And what’s more inspiring than struggling for a good cause against long odds?

The book also has a great glossary of typical weasel words with possible synonyms. So, I’m keeping the book in my office for quick reference.

Omnivore

After seventeen years as a vegetarian, I recently switched back to being an omnivore. My motivation for not eating meat was environmental, since, on average, a vegetarian diet requires much less land, water, and energy. This is still the right motivation, but over the last year or so I’ve been rethinking my decision to not eat meat.

My concern was that I’d stopped paying attention to my food choices and a poorly considered vegetarian diet can easily yield a bad environmental outcome. In particular, modern agriculture now takes 10 calories of fossil fuel energy to produce a single calorie of food. This is clearly unsustainable. We cannot rely on non-renewable, polluting resources for our food, nor can we continue to transport food great distances – even if it is only vegetables. My unexamined commitment to a vegetarian diet was no longer consistent with environmental sustainability.

I think the solution is to eat local, organic food. This also requires eating seasonal food, but Canadian winters are horrible for local vegetables. This left me wanting to support local agriculture, but unable to restrict my diet. Returning to my original motivation to choose environmentally appropriate food convinced me it was time to return to being an omnivore. My new policy is to follow Michael Pollan’s advice: “Eat food. Not too much. Mostly plants.” In addition, I’ll favour locally grown, organic food and include small amounts of meat – which I hope will predominantly come from carefully considered and sustainable sources. I’ve also decided that when faced with a dilemma of choosing either local or organic, I’ll choose local. We need to support local agriculture and I’ll trade this for organic if necessary. Of course, in the majority of cases local and organic options are available, and I’ll choose them.

This is a big change and I look forward to exploring food again.

Instapaper Review

Instapaper is an integral part of my web-reading routine. Typically, I have a few minutes early in the morning and scattered throughout the day for quick scans of my favourite web sites and news feeds. I capture anything worth reading with Instapaper’s bookmarklet to create a reading queue of interesting articles. Then with a quick update to the iPhone app this queue is available whenever I find longer blocks of time for reading, particularly during the morning subway ride to work or late at night.

I also greatly appreciate Instapaper’s text view, which removes all the banners, ads, and link lists from the articles to present a nice and clean text view of the content only. I often find myself saving an article to Instapaper even when I have the time to read it, just so I can use this text-only view.

Instapaper is one of my favourite tools and the first iPhone application I purchased.

Election 2008

Like most Canadians, I’ll be at the polls today for the 2008 Federal Election.

In the past several elections, I’ve cast my vote for the party with the best climate change plan. The consensus among economists is that any credible plan must set a price on carbon emissions. My personal preference is for a predictable and transparent price to influence consumer spending, so I favour a carbon tax over a cap-and-trade. Enlightening discussions of these issues are available at Worthwhile Canadian Initiative, Jeffrey Simpson’s column at the Globe and Mail, or his book Hot Air.

Until now this voting principle has meant a vote for the Green Party, who support a tax shift from income to pollution. My expectation for this vote was not that the Green Party would gain any direct political power, but rather that their environmental plan would gain political profile and convince the Liberals and Conservatives to improve their plans. A carbon tax is now a central component of this year’s Liberal Platform with the Green Shift. Both the Conservative Party and the NDP support a limited cap-and-trade system on portions of the economy, with the Conservatives supporting dubious “intensity-based” targets.

Although I quite like the central components of the Green Shift, I’m not too keen on the distracting social engineering aspects of the plan. Furthermore, the Liberals have certainly failed to implement any of their previous climate change plans while in power. Nonetheless, I do think (hope?) they will follow through this time, and I would rather support a well-conceived plan that may not be implemented than a poor plan. Despite my support for this plan, I think the Liberals have done a rather poor job of explaining the Green Shift and have conducted a disappointing campaign.

In the end, my principle will hold. I’m voting for the Green Shift and, reluctantly, the Liberal Party of Canada.

A Map of the Limits of Statistics

In this article Nassim Nicholas Taleb applies his Black Swan idea to the current financial crisis and describes the strengths and weaknesses of econometrics.

For us the world is vastly simpler in some sense than the academy, vastly more complicated in another. So the central lesson from decision-making (as opposed to working with data on a computer or bickering about logical constructions) is the following: it is the exposure (or payoff) that creates the complexity —and the opportunities and dangers— not so much the knowledge ( i.e., statistical distribution, model representation, etc.). In some situations, you can be extremely wrong and be fine, in others you can be slightly wrong and explode. If you are leveraged, errors blow you up; if you are not, you can enjoy life.

Via Arts and Letters Daily

Globe and Mail: Incremental man

A detailed and fascinating portrait of Stephen Harper. As the article points out:

The core of any government reflects the personality of the prime minister, because everyone in the system responds to his or her ways of thinking, personality traits, political ambitions and policy preferences. Know the prime minister; know the government.

Harper has been an enigma and learning more about his personal policies and approach to governance is very useful while thinking about the upcoming election.

A general summary of the article comes from near the end:

And the long-distance runner – bright, intense, strategic, cautious and confident in every stride – has certainly got things done, from merging two parties, to winning a minority government, to fulfilling most of his campaign promises.

He also has pursued two broad changes in the nature of the federal government: giving the provinces more running room by keeping Ottawa out of some of their affairs and giving individuals a bit more money in the form of tax reductions, credits and child-care cheques.

And yet, despite these policies that he assumed would be popular, despite all the problems on the Liberal side, despite raising far more money, despite governing in mostly excellent economic times, despite stroking Quebec, despite gearing up for elections, his Conservatives have yet to break through decisively.

Patrick Watson

Reading up on the upcoming Polaris Music Prize reminded me of Patrick Watson, last year’s winner of the prize. His “Close to Paradise” album is inventive with intriguing lyrics, unique sounds, and an often driving piano track. Particular stand out tracks are Luscious Life, Drifters, and The Great Escape. The album is well worth considering and I’m looking forward to listening to the short-listed artists for this year’s prize.

Stuck in the middle

A recent press release from the federal government entitled “Making a Strong Canadian Economy Even Stronger” contains a sentence that struck me as odd.

As a result of actions taken in Budget 2007, Canada’s marginal effective tax rate (METR) on new business investment improved from third-highest in the G7 to third-lowest by 2011.

Fair enough, tax rates are projected to decline. But notice how they phrase the context of this reduction. Moving from third highest to third lowest is, in a list of seven countries, a change from third to fifth. Not a dramatic change – we were near the middle and we still are.

Creationists and their old tricks

TVO’s The Agenda had an interesting show on the debate between evolutionary biology and creationism. Jerry Coyne provided a great overview of evolution and a good defence during the debate.

The debate offered a great illustration of the intellectual vacuity that characterises creationism (aka intelligent design). Paul Nelson offers up an article by Doolittle and Bapteste as proof that Darwinism is unravelling. I suspect he hopes no one will read past the abstract to discover the reasonable debate scientists are having about the universality of a single tree of life. He certainly doesn’t want you to notice that the entire article is couched within evolutionary theory and not once does it claim that Darwinism has been falsified.

Here’s the hypothesis that Doolittle and Bapteste are evaluating:

“that there should be a universal TOL [tree of life], dichotomously branching all of the way down to a single root.” p2045

They then establish that gene transfer often occurs between lineages, particularly among prokaryotes, and consequently this universal tree of life does not exist. Certainly this complicates the construction of molecular trees and shows the importance for pluralism of mechanism in biology. But they write much more about the overall significance of this work.

“To be sure, much of evolution has been tree-like and is captured in hierarchical classifications.” p2048

“…it would be perverse to claim that Darwin’s TOL hypothesis has been falsified for animals (the taxon to which he primarily addressed himself) or that it is not an appropriate model for many taxa at many levels of analysis” p2048

And the crucial quote in this context:

“Holding onto this ladder of pattern […] should not be an essential element in our struggle against those who doubt the validity of evolutionary theory, who can take comfort from this challenge to the TOL only by a willful misunderstanding of its import.” p2048

Stikkit from the command line

Note – This post has been updated from 2007-03-20 to describe new installation instructions.

Overview

I’ve integrated Stikkit into most of my workflow and am quite happy with the results. However, one missing piece is quick access to Stikkit from the command line. In particular, a quick list of my undone todos is quite useful without having to load up a web browser. To this end, I’ve written a Ruby script for interacting with Stikkit. As I mentioned, my real interest is in listing undone todos. But I decided to make the script more general, so you can ask for specific types of stikkits and restrict the stikkits with specific parameters. Also, since the stikkit api is so easy to use, I added in a method for creating new stikkits.

Usage

The general use of the script is to list stikkits of a particular type, filtered by a parameter. For example,

ruby stikkit.rb --list calendar dates=today

will show all of today’s calendar events. While,

ruby stikkit.rb -l todos done=0

lists all undone todos. The use of -l instead of --list is simply a standard convenience. Furthermore, since this last example comprises almost all of my use for this script, I added a convenience method to get all undone todos

ruby stikkit.rb -t

A good way to understand stikkit types and parameters is to keep an eye on the url while you interact with Stikkit in your browser. To create a new stikkit, use the --create flag,

ruby stikkit.rb -c 'Remember me.'

The text you pass to stikkit.rb will be processed as usual by Stikkit.

Installation

Grab the script from the Google Code project and put it somewhere convenient. Making the file executable and adding it to your path will cut down on the typing. The script reads your username and password from a .stikkit file in your home directory. Modify this template and save it as ~/.stikkit


     ---
     username: me@domain.org 
     password: superSecret 

The script also requires the atom gem, which you can grab with

gem install atom

I’ve tried to include some flexibility in the processing of stikkits. So, if you don’t like using atom, you can switch to a different format provided by Stikkit. The text type requires no gems, but makes picking out pieces of the stikkits challenging.

Feedback

This script serves me well, but I’m interested in making it more useful. Feel free to pass along any comments or feature requests.

Yahoo Pipes and the Globe and Mail

Most of my updates arrive through feeds to NetNewsWire. Since my main source of national news and analysis is the Globe and Mail, I’m quite happy that they provide many feeds for accessing their content. The problem is that many news stories are duplicated across these feeds. Furthermore, tracking all of the feeds of interest is challenging.

The new Yahoo Pipes offer a solution to these problems. Without providing too much detail, pipes are a way to filter, connect, and generally mash-up the web with a straightforward interface. I’ve used this service to collect all of the Globe and Mail feeds of interest, filter out the duplicates, and produce a feed I can subscribe to. Nothing fancy, but quite useful. The pipe is publicly available and if you don’t agree with my choice of news feeds, you are free to clone mine and create your own. There are plenty of other pipes available, so take a look to see if anything looks useful to you. Even better, create your own.

If you really want those details, Tim O'Reilly has plenty.

Stikkit Todos in GMail

I find it useful to have a list of my unfinished tasks generally, but subtly, available. To this end, I’ve added my unfinished todos from Stikkit to my Gmail web clips. These are the small snippets of text that appear just above the message list in Gmail.

All you need is the subscribe link from your todo page with the ‘not done’ button toggled. The url should look something like:

http://stikkit.com/todos.atom?api_key={}&done=0

Paste this into the ‘Search by topic or URL:’ box of the Web Clips tab in Gmail settings.

DabbleDB

My experience helping people manage their data has repeatedly shown that databases are poorly understood. This is well illustrated by the rampant abuses of spreadsheets for recording, manipulating, and analysing data.

Most people realise that they should be using a database; the real issue is the difficulty of creating a proper database. This is a legitimate challenge. Typically, you need to carefully consider all of the categories of data and their relationships when creating the database, which makes the upfront costs quite significant. Why not just start throwing data into a spreadsheet and worry about it later?

I think that DabbleDB can solve this problem. A great strength of Dabble – and the source of its name – is that you can start with a simple spreadsheet of data and progressively convert it to a database as you begin to better understand the data and your requirements.

Dabble also has a host of great features for working with data. I’ll illustrate this with a database I created recently when we were looking for a new home. This is a daunting challenge. We looked at dozens of houses each with unique pros and cons in different neighbourhoods and with different price ranges. I certainly couldn’t keep track of them all.

I started with a simple list of addresses for consideration. This was easily imported into Dabble and immediately became useful. Dabble can export to Google Earth, so I could quickly have an overview of the properties and their proximity to amenities like transit stops and parks. Next, I added in a field for asking price and MLS url which were also exported to Google Earth. Including price gave a good sense of how costs varied with location, while the url meant I could quickly view the entire listing for a property.

Next, we started scheduling appointments to view properties. Adding this to Dabble immediately created a calendar view. Better yet, Dabble can export this view as an iCal file to add into a calendaring program.

Once we started viewing homes, we began to understand what we really were looking for in terms of features. So, add these to Dabble and then start grouping, searching, and sorting by these attributes.

All of this would have been incredibly challenging without Dabble. No doubt, I would have simply used a spreadsheet and missed out on the rich functionality of a database.

Dabble really is worth a look. The best way to start is to watch the seven minute demo and then review some of the great screencasts.

Stikkit – Out with the mental clutter

I like to believe that my brain is useful for analysis, synthesis, and creativity. Clearly it is not proficient at storing details like specific dates and looming reminders. Nonetheless, a great deal of my mental energy is devoted to trying to remember such details and fearing the consequences of the inevitable “it slipped my mind”. As counselled by GTD, I need a good and trustworthy system for removing these important, but distracting, details and having them reappear when needed. I’ve finally settled on the new product from values of n called Stikkit.

Stikkit appeals to me for two main reasons: easy data entry and smart text processing. Stikkit uses the metaphor of the yellow sticky note for capturing text. When you create a new note, you are presented with a simple text field — nothing more. However, Stikkit parses your note for some key words and extracts information to make the note more useful. For example, if you type:

Phone call with John Smith on Feb 1 at 1pm

Stikkit realises that you are describing an event scheduled for February 1st at one in the afternoon with a person (“peep” in Stikkit slang) named John Smith. A separate note will be created to track information about John Smith and will be linked to the phone call note. If you add the text “remind me” to the note, Stikkit will send you an email and SMS message prior to the event. You can also include tags to group notes together with the keywords “tag as”.

A recent update to peeps makes them even more useful. Stikkit now collects information about people as you create notes. So, for example, if I later post:

- Send documents to John Smith john@smith.net

Stikkit will recognise John Smith and update my peep for him with the email address provided. In this way, Stikkit becomes more useful as you continue to add information to notes. Also, the prefixed “-” causes Stikkit to recognise this note as a todo. I can then list all of my todos and check them off as they are completed.

This text processing greatly simplifies data entry, since I don’t need to click around to create todos or choose dates from a calendar picker. Just type in the text, hit save, and I’m done. Fortunately, Stikkit has been designed to be smart rather than clever. The distinction here is that Stikkit relies on some key words (such as at, for, to) to mark up notes consistently and reliably. Clever software is exemplified by Microsoft Word’s autocorrect or clipboard assistant. My first goal when encountering these “features” is to turn them off. I find they rarely do the right thing and end up being a hindrance.

Stikkit is well worth a look. For a great overview, check out the screencasts in the forum.

Mac vs. PC Remotes

An image of a remote from Apple and a PC

I grabbed this image while preparing a new Windows machine. This seems to be an interesting comparison of the difference in design approaches between Apple and PC remotes. Both provide essentially the same functions. Clearly, however, one is more complex than the other. Which would you rather use?

Plantae's continued development

Prior to general release, plantae is moving web hosts. This seems like a good time to point out that all of plantae’s code is hosted at Google Code. The project has great potential and deserves consistent attention. Unfortunately, I can’t continue to develop the code. So, if you have an interest in collaborative software, particularly in the scientific context, I encourage you to take a look.

Text processing with Unix

I recently helped someone process a text file using Unix command line tools. The job would have been quite challenging otherwise, and I think this represents a useful demonstration of why I choose to use Unix.

The basic structure of the datafile was:

; A general header file ;
1
sample: 0.183 0.874 0.226 0.214 0.921 0.272 0.117
2
sample: 0.411 0.186 0.956 0.492 0.150 0.278 0.110
3
...

In this case the only important information is the second number of each line that begins with “sample:”. Of course, one option is to manually process the file, but there are thousands of lines, and that’s just silly.

We begin by extracting only the lines that begin with “sample:”. grep will do this job easily:

grep "^sample" input.txt

grep searches through the input.txt file and outputs any matching lines to standard output.

Now, we need the second number. sed can strip out the initial text of each line with a find and replace, while tr compresses any strange use of whitespace:

sed 's/sample: //g' | tr -s ' '

Notice the use of the pipe (|) command here. This sends the output of one command to the input of the next. This allows commands to be strung together and is one of the truly powerful tools in Unix.

Now we have a matrix of numbers in rows and columns, which is easily processed with awk.

awk '{print $2;}'

Here we ask awk to print out the second number of each row.

So, if we string all this together with pipes, we can process this file as follows:

grep "^sample" input.txt | sed 's/sample: //g' | tr -s ' ' | awk '{print $2;}' > output.txt

Our numbers of interest are in output.txt.