code

Charity donations by province

This tweet about the charitable donations by Albertans showed up in my timeline and caused a ruckus.

"Albertans give the most to charity in Canada, 50% more than the national average, even in tough economic times. #CdnPoli pic.twitter.com/keKPzY8brO" (Oil Sands Action, @OilsandsAction, August 31, 2017)

Many people took issue with the fact that these values weren’t adjusted for income. Seems to me that whether this is a good idea or not depends on what kind of question you’re trying to answer.

Continue reading →

Canada LEED projects

The CaGBC maintains a list of all the registered LEED projects in Canada. This is a great resource, but rather awkward for analyses. I’ve copied these data into a DabbleDB application with some of the maps and tabulations that I frequently need to reference. Here, for example, is a map of the density of LEED projects in each province. And here is a rather detailed view of the kinds of projects across provinces.

Continue reading →

Instapaper Review

Instapaper is an integral part of my web-reading routine. Typically, I have a few minutes early in the morning and scattered throughout the day for quick scans of my favourite web sites and news feeds. I capture anything worth reading with Instapaper’s bookmarklet to create a reading queue of interesting articles. Then with a quick update to the iPhone app this queue is available whenever I find longer blocks of time for reading, particularly during the morning subway ride to work or late at night.

Continue reading →

Stikkit from the command line

Note – This post has been updated from 2007-03-20 to describe new installation instructions.

Overview

I’ve integrated Stikkit into most of my workflow and am quite happy with the results. However, one missing piece is quick access to Stikkit from the command line. In particular, a quick list of my undone todos is quite useful without having to load up a web browser. To this end, I’ve written a Ruby script for interacting with Stikkit.

Continue reading →

Yahoo Pipes and the Globe and Mail

Most of my updates arrive through feeds to NetNewsWire. Since my main source of national news and analysis is the Globe and Mail, I’m quite happy that they provide many feeds for accessing their content. The problem is that many news stories are duplicated across these feeds. Furthermore, tracking all of the feeds of interest is challenging. The new Yahoo Pipes offer a solution to these problems. Without providing too much detail, pipes are a way to filter, connect, and generally mash-up the web with a straightforward interface.

Continue reading →

Stikkit Todos in Gmail

I find it useful to have a list of my unfinished tasks generally, but subtly, available. To this end, I’ve added my unfinished todos from Stikkit to my Gmail web clips. These are the small snippets of text that appear just above the message list in Gmail. All you need is the subscribe link from your todo page with the ‘not done’ button toggled. The url should look something like:

Continue reading →

DabbleDB

My experience helping people manage their data has repeatedly shown that databases are poorly understood. This is well illustrated by the rampant abuse of spreadsheets for recording, manipulating, and analysing data. Most people realise that they should be using a database; the real issue is the difficulty of creating a proper database. This is a legitimate challenge. Typically, you need to carefully consider all of the categories of data and their relationships when creating the database, which makes the upfront costs quite significant.

Continue reading →

Stikkit-- Out with the mental clutter

I like to believe that my brain is useful for analysis, synthesis, and creativity. Clearly it is not proficient at storing details like specific dates and looming reminders. Nonetheless, a great deal of my mental energy is devoted to trying to remember such details and fearing the consequences of the inevitable “it slipped my mind”. As counselled by GTD, I need a good and trustworthy system for removing these important, but distracting, details and having them reappear when needed.

Continue reading →

Mac vs. PC Remotes

I grabbed this image while preparing a new Windows machine. This seems to be an interesting comparison of the difference in design approaches between Apple and PC remotes. Both provide essentially the same functions. Clearly, however, one is more complex than the other. Which would you rather use?

Plantae's continued development

Prior to general release, Plantae is moving web hosts. This seems like a good time to point out that all of Plantae’s code is hosted at Google Code. The project has great potential and deserves consistent attention. Unfortunately, I can’t continue to develop the code. So, if you have an interest in collaborative software, particularly in the scientific context, I encourage you to take a look.

Text processing with Unix

I recently helped someone process a text file with the help of Unix command line tools. The job would have been quite challenging otherwise, and I think this represents a useful demonstration of why I choose to use Unix. The basic structure of the data file was:

    ; A general header file ;
    1 sample: 0.183 0.874 0.226 0.214 0.921 0.272 0.117
    2 sample: 0.411 0.186 0.956 0.492 0.150 0.278 0.110
    3 .
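The excerpt stops before the actual task is described, so the following is only a minimal sketch of the kind of job these tools handle well. It assumes, hypothetically, that each record sits on one line (a sample number, the label sample:, then the values), that header lines start with a semicolon, and that the goal is a mean per sample. The name sample_means is illustrative and not from the original post.

```shell
# Hypothetical helper: read the data file on stdin and print
# "<sample number> <mean>" for every record line.
# Assumes one record per line: "<n> sample: v1 v2 ...".
sample_means() {
  awk '
    /^;/ { next }                      # skip header lines
    $2 == "sample:" && NF > 2 {        # record lines with at least one value
      sum = 0
      for (i = 3; i <= NF; i++) sum += $i
      printf "%s %.3f\n", $1, sum / (NF - 2)
    }
  '
}

# Example run on the two complete records shown above.
sample_means <<'EOF'
; A general header file ;
1 sample: 0.183 0.874 0.226 0.214 0.921 0.272 0.117
2 sample: 0.411 0.186 0.956 0.492 0.150 0.278 0.110
EOF
```

Because the function reads standard input, it can sit at the end of a longer pipeline (for example after grep or sed) without needing temporary files.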

Continue reading →

Principles of Technology Adoption

Choosing appropriate software tools can be challenging. Here are the principles I employ when making the decision:

Simple: This seems obvious, but many companies fail here. Typically, their downfall is focussing on a perpetual increase in feature quantity. I don’t evaluate software with feature counts. Rather, I value software that performs a few key operations well. Small, focussed tools result in much greater productivity than overly-complex, all-in-one tools. 37 Signals’ Writeboard is a great example of a simple, focussed tool for collaborative writing.

Continue reading →

RSiteSearch

I’m not sure how this escaped my notice until now, but `RSiteSearch` is a very useful command in R. Passing a string to this function loads up your web browser with search results from the R documentation and mailing list. So, for example, `RSiteSearch("glm")` will show you everything you need to know about using R for generalised linear models.

R module for ConTeXt

I generally write my documents in Sweave format. This approach allows me to embed the code for analyses directly in the report derived from the analyses, so that all results and figures are generated dynamically with the text of the report. This provides both great documentation of the analyses and the convenience of a single file to keep track of and work with. Now there is a new contender for integrating analysis code and documentation with the release of an R module for ConTeXt.

Continue reading →

expand.grid

Here’s a simple trick for creating experimental designs in R: use the function expand.grid. A simple example is stored in treatments, which produces:

       treatment level
    1          A     1
    2          B     1
    3          C     1
    4          D     1
    5          A     2
    6          B     2
    7          C     2
    8          D     2
    9          A     3
    10         B     3
    11         C     3
    12         D     3

Now, if you want to randomize your experimental treatments, try:

Continue reading →

Heart of the Matter

CBC’s Ideas has been running a series of shows on heart disease called “Heart of the Matter”. Episode 2 is particularly interesting from a statistical perspective, as the episode discusses several difficulties with the analysis of drug efficacy. Some highlights include:

Effect sizes: Some of the best-cited studies for the use of drugs to treat heart disease show a statistically significant effect of only a few percentage points improvement. Contrast this with a dramatic, vastly superior improvement from diet alone.

Continue reading →

Plantae resurrected

Some technical issues coupled with my road-trip-without-a-laptop conspired to keep Plantae from working correctly. I’ve repaired the damage and isolated Plantae from such problems in the future. My apologies for the downtime.

Analysis of Count Data

When response variables are composed of counts, the standard statistical methods that rely on the normal distribution are no longer applicable. Count data consist of non-negative integers, often with many zeros. For such data, we need statistics based on Poisson or binomial distributions. I’ve spent the past few weeks analysing counts from hundreds of transects and, as is typical, a particular challenge was determining the appropriate packages to use for R.

Continue reading →

Desktop Manager

I’m convinced that no computer display is large enough. What we need are strategies to better manage our computer workspace and application windows. Exposé and tabbed browsing are great features, but what I really want is the equivalent of a file folder. You put all of the relevant documents in a folder and then put it aside for when you need it. Once you’re ready, you open up the folder and are ready to go.

Continue reading →

Managing project files

As I accumulate projects (both new and completed), the maintenance and storage of the project files becomes increasingly important. There are two important goals for a file structure: find things quickly and don’t lose anything. My current strategy is as follows: Every project has a consistent folder structure:

    |-- analysis
    |-- data
    |-- db
    |-- doc
    |-- fig
    `-- utils

analysis holds the R source files of the analysis. These, typically, are experiments and snippets of code.
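A consistent layout like this is easy to script. As a minimal sketch, assuming a POSIX shell, the six folders named above could be created in one step. The helper name new_project and the example path are hypothetical and not from the original post.

```shell
# Hypothetical helper: create the six standard folders under the
# project directory given as the first argument.
new_project() {
  root=$1
  for dir in analysis data db doc fig utils; do
    mkdir -p "$root/$dir" || return 1
  done
}

# Example (hypothetical path): new_project "$HOME/projects/example"
```

Using mkdir -p means the helper can be re-run safely on an existing project, since folders that are already present are left untouched.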

Continue reading →

Sun grid

Sun’s new Grid Compute Utility could be a great resource. As I described in an earlier post, running simulations can be a challenge with limited computer resources. Rather than waiting hours for my computers to work through thousands of iterations, I could pay Sun $1, which would likely be sufficient for the scale of my work. This would be well worth the investment! I spend a single dollar and quickly get the results I need for my research or clients.

Continue reading →

Automator, Transmit, and Backup

The Strongspace weblog has a useful post about using Transmit and Automator to make backups. One challenge with this approach is backing up files scattered throughout your home folder. The solution is the “Follow symbolic links” option when mirroring. I created a backup folder and populated it with aliases to the files I’m interested in backing up. Mirroring this folder to Strongspace copies the files to the server. The other option is to use the “Skip files with specified names” feature, but this rapidly filled up for me.

Continue reading →

Remote data analysis

My six-year-old laptop is incredibly slow, particularly when analysing data. Unfortunately, analysing data is my job, so this represents a problem. We have a new and fast desktop at home, but I can’t monopolise its use and it would negate the benefits of mobility. Fortunately, with the help of Emacs and ESS there is a solution. I write my R code on the laptop and evaluate the code on the desktop, which sends the responses and plots back to the laptop.

Continue reading →

Rails, sqlite3, and id=0

I’ve spent the last few days struggling with a problem with Plantae’s Rails code. I was certain that code like this should work:

    class ContentController ...
        ... 'show', :id => @plant.id
        else
          ...
        end
      end

      def show
        @plant = Plant.find(params[:id])
      end
      ...
    end

These statements should create a new plant and then redirect to the show method, which sends the newly created plant to the view. The problem was that Rails insisted on asking sqlite3 for the plant with id=0, which cannot exist.

Continue reading →

Taxonomy release

Plantae now supports the addition and updating of species names and families. A rather important first step. Now onto adding character data to make the site actually useful.

Breakpoint regression

In my investigations of ovule fates, I’ve needed to estimate regression parameters from discontinuous functions. A general term for such estimates is breakpoint regression. OFStatisticalEstimates.pdf demonstrates an approach using R for such estimates in the context of seed-ovule ratios. The code includes a mechanism for generating seed-ovule data that illustrate the types of functions that need to be considered.

Managing Email

I recently lost control of my email. The combination of mailing lists, alerts, table-of-contents notifications, and actual email from friends and colleagues was reaching a few hundred emails a day. The insanity had to stop! Here’s how I regained control.

Goals

Before describing my solution, let’s consider what a good email system should provide:

- Notifications of relevant new messages.
- A process for keeping track of important and unanswered messages.

Continue reading →

iTunes remote control

The current setup at home is that I’ve added all of our music (several thousand songs) to our Mac Mini and then send it through AirTunes to the home stereo. The complication is that the stereo and computer are at opposite ends of the house. Ideally, I can use my iBook to control the Mac Mini without needing to walk down the hall, but how? One solution is to use VNC, which allows complete control of the Mac Mini.

Continue reading →

Combining pdf files

Recently, I needed to combine several pdf files into one. The Tao of Mac has a discussion of how to do this, and I’m posting the code I used here so that I can find it again later.

    gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=out.pdf -c save pop -f *.pdf

Running this from a directory containing only the pdfs to be combined produces out.pdf.

Vector assignment in R

As I use R for data analysis and simulations, I become more comfortable and proficient with the R/S syntax and style of programming. One important insight is the use of vector assignments in simulations. I have often read that using such assignments is the preferred method, but until recently I had not realised the importance of this statement. To illustrate the use of vector assignments and their advantages, consider two models of the style illustrated below:

Continue reading →