Longform

expand.grid

Here’s a simple trick for creating experimental designs in R: use the function expand.grid.

A simple example is:

  treatments <- LETTERS[1:4]
  levels <- 1:3
  experiment <- expand.grid(treatment=treatments, level=levels)

which produces:

   treatment level
1          A     1
2          B     1
3          C     1
4          D     1
5          A     2
6          B     2
7          C     2
8          D     2
9          A     3
10         B     3
11         C     3
12         D     3

Now, if you want to randomize your experimental treatments, try:

  experiment[sample(nrow(experiment)), ]

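Given a single number n, sample returns a random permutation of the integers 1 to n (that is, it samples without replacement). Here n is the number of rows in the experiment data frame. The square brackets then use this permutation to reorder the rows, which gives a randomized run order for the whole design.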
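If you need to reproduce the same run order later, one option is to fix the random seed first. The following is only a sketch: the seed value and the names run_order and randomized are my own choices for illustration.

```r
# Rebuild the 4 x 3 design from above.
treatments <- LETTERS[1:4]
levels <- 1:3
experiment <- expand.grid(treatment = treatments, level = levels)

set.seed(42)  # arbitrary seed, assumed here only so the order can be reproduced
run_order <- sample(nrow(experiment))   # random permutation of 1..12
randomized <- experiment[run_order, ]   # same rows, shuffled
rownames(randomized) <- NULL            # renumber rows in run order

randomized
```

Every treatment-level combination still appears exactly once; only the order changes.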

Burning your money

Burning our money by Mark Jaccard is a useful overview of some policy options for reducing greenhouse gas emissions. Unfortunately, the article is in the Globe’s subscribers-only section, but his paper, Burning Our Money to Warm the Planet, is available from the C.D. Howe Institute.

Heart of the Matter

CBC’s Ideas has been running a series of shows on heart disease called “Heart of the Matter”. Episode 2 is particularly interesting from a statistical perspective, as the episode discusses several difficulties with the analysis of drug efficacy. Some highlights include:

Effect sizes: Some of the best-cited studies for the use of drugs to treat heart disease show a statistically significant effect of only a few percentage points improvement. Contrast this with a dramatic, vastly superior improvement from diet alone.

Response variables: The focus of many drug studies has been on the reduction of cholesterol, rather than reductions in heart disease. Diet studies, for example, have shown dramatic improvements in reducing heart attacks while having no effect on cholesterol levels. Conversely, drug studies that show a reduction in cholesterol show no change in mortality rates.

Blocking of data: Separate analyses of drug efficacy on female or elderly patients tend to show that drug therapy increases overall mortality. Lumping these data in with the traditional middle-aged male patients removes this effect and, instead, shows a significant decrease in heart disease with drug use.

The point here isn’t to make a comment on the influence of drug companies on medical research. Rather, such statistical concerns are common to all research disciplines. The primary concern of such analyses should be: what is the magnitude of the effect of a specific treatment on my variable of interest? The studies discussed in the Ideas program suggest that much effort has been devoted to detecting significant effects of drugs on surrogate response variables regardless of the size of the effect.
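To make the distinction between significance and magnitude concrete, here is a small sketch in R. The counts are hypothetical numbers that I invented for illustration; they are not taken from the Ideas program or from any study.

```r
# Hypothetical counts, invented purely for illustration.
n_per_group <- 10000
events <- c(control = 1000, drug = 800)  # 10% vs 8% event rates

# Two-sample test for equality of proportions.
test <- prop.test(events, c(n_per_group, n_per_group))
rates <- events / n_per_group
risk_difference <- unname(rates["control"] - rates["drug"])

test$p.value          # far below 0.05, so "statistically significant"
risk_difference       # 0.02, i.e. two percentage points
test$conf.int         # confidence interval for the difference
1 / risk_difference   # about 50 patients treated to prevent one event
```

With large samples, even a two-point difference is easily detected, so the p-value alone says little about whether the effect matters.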

Plantae resurrected

Some technical issues coupled with my road-trip-without-a-laptop conspired to keep Plantae from working correctly. I’ve repaired the damage and isolated Plantae from such problems in the future. My apologies for the downtime.

Competitive Enterprise Institute

The Competitive Enterprise Institute has put out some ads that would be quite funny if they weren’t so misleading. I imagine that most viewers can see through the propaganda of the oil industry. Regardless, in the long term, industries that invest in efficient and low-polluting technology will win and the members of CEI will be out of business.

CO2: They call it pollution. We call it life.

Google Importer

Google Importer is a useful Spotlight plugin that includes Google searches in Spotlight searches. This helps integrate your search into one interface, which seems like an obvious progression of Apple’s Spotlight technology.

Google calendar

Google Calendar has been featured in the news recently, and for good reason. Many of us have wanted access to a good online calendar program. One of my favourite features of Google Calendar is its integration with Gmail. If Gmail detects an event in your email message, a link appears that sends the information to Google Calendar. This is incredibly convenient and, it seems to me, is one of the great promises of computers: reducing the tedious work that occupies much of our day.

An Inconvenient Truth

This looks like an incredibly important film. I hope it breaks all of the box office records.

Analysis of Count Data

When response variables are composed of counts, the standard statistical methods that rely on the normal distribution are no longer applicable. Count data consist of non-negative integers, often with many zeros. For such data, we need statistics based on Poisson or negative binomial distributions. I’ve spent the past few weeks analysing counts from hundreds of transects and, as is typical, a particular challenge was determining the appropriate packages to use for R. Here’s what I’ve found so far.

The first step is to get an idea of the dispersion of data points:

Means <- tapply(y, list(x1, x2), mean)
Vars <- tapply(y, list(x1, x2), var)
plot(Means, Vars, xlab="Means", ylab="Variances")
abline(a=0, b=1)

For the Poisson distribution, the mean is equal to the variance. So, we expect the points to lie along the solid line added to the plot. If the points lie consistently above the line, the data are overdispersed and a negative binomial distribution may be more appropriate. The pscl library provides a function to test this:

library(MASS)  # provides glm.nb
library(pscl)  # provides odTest
model.nb <- glm.nb(y ~ x, data=data)
odTest(model.nb)
summary(model.nb)

If the odTest function rejects the null model, then the data are overdispersed relative to a Poisson distribution. For grouped data, one particularly useful function is glmmPQL from the MASS library. It allows for random intercepts and, combined with the negative.binomial function of the same library, lets you fit a negative binomial GLMM:

model.glmm.nb <- glmmPQL(y ~ x1 + x2,
                         random= ~ 1|transect, data=data,
                         family=negative.binomial(model.nb$theta))

In this case, I use the Θ estimated from the glm.nb function in the negative.binomial call. Also useful are the zeroinfl function of the pscl library for fitting zero-inflated Poisson or negative binomial models and the geeglm function of geepack for fitting generalized estimating equations for repeated measures. Finally, fitdistr from MASS allows for estimating the parameters of different distributions from empirical data.
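The mean-variance check and fitdistr can be tried on simulated data before turning to real counts. This is only a sketch: the sample size, seed, and parameter values below are arbitrary assumptions of mine, not values from my transect data.

```r
library(MASS)  # fitdistr

set.seed(1)  # arbitrary seed so the simulation is repeatable
n <- 2000
y_pois <- rpois(n, lambda = 5)            # equidispersed counts
y_nb   <- rnbinom(n, mu = 5, size = 1.5)  # overdispersed counts

# Poisson: variance close to the mean.
# Negative binomial: variance well above the mean.
c(mean = mean(y_pois), var = var(y_pois))
c(mean = mean(y_nb),   var = var(y_nb))

# Maximum-likelihood estimates of the distribution parameters.
# "size" is the dispersion parameter that glm.nb reports as theta.
fit_pois <- fitdistr(y_pois, "Poisson")
fit_nb   <- fitdistr(y_nb, "negative binomial")
fit_pois$estimate
fit_nb$estimate
```

For the negative binomial with mean mu and dispersion size, the variance is mu + mu^2/size, so smaller values of size mean stronger overdispersion.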

Getting Evolution Up to Speed

There’s a common notion that our technology has slowed, or even stopped, human evolution. Evidently, this is not true as researchers have found many locations of strong positive selection in the human genome.

New evidence suggests humans are evolving more rapidly – and more recently – than most people thought possible. But for some radical evolutionists, Homo sapiens isn’t morphing quickly enough.

(Via Wired News.)

SSHRC and the theory of evolution

This is quite a surprise: McGill University’s Brian Alters had his proposal to study the effects of intelligent design on Canadian education rejected by the Canadian Social Sciences and Humanities Research Council. A stated reason for the rejection was that Alters did not provide “adequate justification for the assumption in the proposal that the theory of evolution, and not intelligent design theory, was correct.”

Granted, funding proposals can be rejected for a variety of reasons and the opinions of the reviewers do not necessarily reflect those of the funding body. However, the international media (Nature, The Guardian) are reporting on this and the suggestion is that the Canadian Government — or, at least, our funding agency for social science research — rejects evolution.

If SSHRC intends to pass judgement on scientific theories, they should review the evidence first. Biological evolution is a fact. Furthermore, the theory of evolution through natural selection has accumulated 150 years of empirical evidence and ranks as one of science’s greatest insights.

I hope that SSHRC clarifies their position on evolution soon.

(Via The Panda’s Thumb.)

Deschooling, Democratic Education, and Social Change

Matt Hern provides an interesting podcast available from Canadian Voices. He considers the 150-year history of compulsory state education and asks what benefits it has provided. The basic question is, Why do we send our kids to school? Although the answer seems obvious, he takes a different approach and argues for alternatives to public education. I’m always fascinated when someone argues against what I believe to be obvious. That’s when I learn the most about my biases.

Desktop Manager

I’m convinced that no computer display is large enough. What we need are strategies to better manage our computer workspace and application windows. Exposé and tabbed browsing are great features, but what I really want is the equivalent of a file folder. You put all of the relevant documents in a folder and then put it aside for when you need it. Once you’re ready, you open up the folder and are ready to go.

A feature that comes close to this is virtual desktops. I became enamoured with these while running KDE and have found them again for OS X with Desktop Manager. The idea is to create workspaces associated with specific tasks as a virtual desktop. You can then switch between these desktops as you move from one project to the next. So, for each of the projects I am currently working on, I’ve created a desktop with each application’s windows in the appropriate place. For a consulting project, I likely have Aquamacs running an R simulation with a terminal window open for updating my subversion repository. A project in the writing stage requires TeXShop and BibDesk, while a web-design project needs TextMate and Camino. Each of these workspaces is independent and I can quickly switch to them when needed. Since the applications are running with their windows in the appropriate place, I can quickly get back to work on the project at hand.

Application windows can be split across desktops and specific windows can be present across all desktops. I’ve also found it useful to have one desktop for communication (email, messaging, etc.) and another that has no windows open at all.

Managing project files

As I accumulate projects (both new and completed), the maintenance and storage of the project files becomes increasingly important. There are two important goals for a file structure: find things quickly and don’t lose anything. My current strategy is as follows:

Every project has a consistent folder structure:

|-- analysis
|-- data
|-- db
|-- doc
|-- fig
`-- utils

analysis holds the R source files of the analysis. These, typically, are experiments and snippets of code. The main analyses are in the doc directory.

data contains data files. Generally, these are csv files from clients.

db is for SQL dumps of databases and SQLite files. I prefer working with databases over flat text files or Excel spreadsheets. The original flat files are kept in the data folder and converted to SQL databases for analysis.

doc holds the analysis and writeup as a Sweave file. This combines R and LaTeX to create a complete document from one source file.

fig is for diagrams and plots. Many of these are generated when processing the Sweave file, but some are constructed from other sources.

utils holds scripts and binaries that are required to run the analysis.

This entire directory structure is maintained with Subversion, so I have a record of changes and can access the project files from any networked computer.
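Setting up this skeleton for a new project is easy to script. Here is a minimal sketch in R; the function name create_project and the example location are hypothetical and not part of my actual workflow.

```r
# Hypothetical helper: create the standard folders under a project root.
create_project <- function(root) {
  subdirs <- c("analysis", "data", "db", "doc", "fig", "utils")
  paths <- file.path(root, subdirs)
  for (p in paths) {
    dir.create(p, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(paths)
}

# Usage: build a skeleton in a temporary location and list what was created.
project_root <- file.path(tempdir(), "example-project")
create_project(project_root)
list.files(project_root)
```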

Finally, once a project is complete, I archive the project and construct a SHA-1 checksum of the zip file:

openssl dgst -sha1 -out checksums.txt archive.zip

This checksum allows me to verify that the archive remains stable over time. Coupled with a good backup routine, this should keep the project files safe.
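The later verification step amounts to recomputing the checksum and comparing it with the recorded value. Here is a rough sketch of that idea in R. Note that it uses MD5 via tools::md5sum rather than SHA-1, only because that function needs no extra package; the helper names and the stand-in file are hypothetical.

```r
# Hypothetical helpers: record a checksum at archive time, compare it later.
# tools::md5sum is used in place of SHA-1 so that no extra package is needed.
record_checksum <- function(archive) {
  unname(tools::md5sum(archive))
}

archive_unchanged <- function(archive, recorded) {
  identical(unname(tools::md5sum(archive)), recorded)
}

# Usage with a stand-in file playing the role of archive.zip.
archive <- tempfile(fileext = ".zip")
writeLines("project contents", archive)
recorded <- record_checksum(archive)

archive_unchanged(archive, recorded)   # TRUE while the file is untouched

cat("corruption", file = archive, append = TRUE)
archive_unchanged(archive, recorded)   # FALSE once the bytes change
```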

This may seem elaborate, but data and their analyses are too important to be left scattered around a laptop’s hard drive.

One other approach I’ve considered is using the R package structure to maintain projects. This is a useful guide, but the process seems too involved for my purposes.

Sun grid

Sun’s new Grid Compute Utility could be a great resource. As I described in an earlier post, running simulations can be a challenge with limited computer resources. Rather than waiting hours for my computers to work through thousands of iterations, I could pay Sun $1, which would likely be sufficient for the scale of my work. This would be well worth the investment! I spend a single dollar and quickly get the results I need for my research or clients.

Apparently, the American government has classified the Sun Grid as a weapon, so we can’t access it here in Canada, yet. I’m sure this will change shortly.

Automator, Transmit, and Backup

The Strongspace weblog has a useful post about using Transmit and Automator to make backups. One challenge with this approach is backing up files scattered throughout your home folder. The solution is the “Follow symbolic links” option when mirroring. I created a backup folder and populated it with aliases to the files I’m interested in backing up. Mirroring this folder to Strongspace copies the files to the server. The other option is to use the “Skip files with specified names” feature, but this rapidly filled up for me.

Remote data analysis

My six-year-old laptop is incredibly slow, particularly when analysing data. Unfortunately, analysing data is my job, so this represents a problem. We have a new and fast desktop at home, but I can’t monopolise its use and it would negate the benefits of mobility.

Fortunately, with the help of Emacs and ESS there is a solution. I write my R code on the laptop and evaluate the code on the desktop, which sends the responses and plots back to the laptop. I can do this anywhere I have network access for the laptop and the results are quite quick.

There are a few tricks to the setup, especially if you want the plots sent back to the laptop’s screen, so I’ll document the necessary steps here.

First, you need to enable X11 forwarding on both the desktop and laptop computers (see the Apple Technote). Then start up X11 on the laptop.

Now, on the laptop, start up Emacs, open a file with R code and ssh to the desktop:

opt-x ssh -Y desktopip

Now, run R in the ssh buffer and link it to the R-code buffer:

R
opt-x ess-remote

That should do it. Now I have the speed of the desktop and the benefits of a laptop.

Rails, sqlite3, and id=0

I’ve spent the last few days struggling with a problem with Plantae’s rails code. I was certain that code like this should work:

class ContentController < ApplicationController
  def new
    @plant = Plant.new(params[:plant])
    if request.post? and @plant.save
      redirect_to :action => 'show', :id => @plant.id
    else
       ...
    end
  end

  def show
    @plant = Plant.find(params[:id])
  end
  ...
end

These statements should create a new plant and then redirect to the show method which sends the newly created plant to the view.

The problem was that rails insisted on asking sqlite3 for the plant with id=0, which cannot exist.

After a post to the rails mailing list and some thorough diagnostics I discovered this and realized I needed swig.

So, if anyone else has this problem:

sudo port install swig
sudo gem install sqlite3-ruby

is the solution.

The Globe and Mail: Constitutional reform

How about a constitutional right to a healthy environment?

The Constitution of Canada guarantees its people important rights, such as freedom of religion, freedom of expression, fair trials, free elections, and language rights. But the effective exercise of these rights is impossible without safe water to drink, wholesome food to eat, and clean air to breathe. The right to a clean and healthy environment – arguably the most fundamental right of all – is conspicuously absent from our Charter of Rights and Freedoms, despite being a feature of the constitutions of many other nations.

Taxonomy release

Plantae now supports the addition and updating of species names and families. A rather important first step. Now onto adding character data to make the site actually useful.

plantae foundations

I’ve made a variety of important changes to plantae’s foundations. For the curious they are:

Now the plan is to start adding plant characters.

Sexual interference within flowers of Chamerion angustifolium

Hermaphroditism is prevalent in plants but may allow interference between male function (pollen removal and dispersal) and female function (pollen receipt and seed production) within a flower. Temporal or spatial segregation of gender within a hermaphroditic flower may evolve to reduce this interference and enhance male and female reproductive success. We tested this hypothesis using Chamerion angustifolium (Onagraceae), in which pollen removal (male) and pollen deposition (female) were measured directly on hermaphroditic and experimentally produced unisexual flowers. During a single flower visit in the field, bees deposited 159±24 (SE) pollen grains on a stigma and removed 1058±198 grains from each flower. Anther removal did not alter deposition rates. In the laboratory, bees removed 2669±273 pollen grains and deposited 209±72.3 cross-pollen and 120±28.4 facilitated self-pollen grains per visit. The presence of anthers significantly reduced cross-pollen deposition on the stigma. In contrast, pollen removal was not affected by presence of the pistil. These results suggest that within-flower interference affects female function and represents a fitness cost that can be reduced through temporal segregation of gender within the flower.

Citation ~ Download SegregationInterference.pdf

Plant breeding systems and pollen dispersal

This book of about 600 pages is written to provide practitioners of pollination biology with a broadly based source of methodologies as well as the basic conceptual background to aid in understanding. Thus, the book reflects the expertise of an assembled team of internationally acclaimed scientists. Pollination biology enjoys over 200 years of scientific tradition. In recent years, the interdisciplinarity of pollination biology has become a model for integrating physics, chemistry, and biology into natural history, evolutionary and applied ecology. Pollination biology has developed its own techniques and approaches while incorporating ideas, methods, and technology from many facets of pure and applied science.

Citation

The effect of protandry on siring success in Chamerion angustifolium (Onagraceae) with different inflorescence sizes

Protandry, a form of temporal separation of gender within hermaphroditic flowers, may reduce the magnitude of pollen lost to selfing (pollen discounting) and also serve to enhance pollen export and outcross siring success. Because pollen discounting is strongest when selfing occurs between flowers on the same plant, the advantage of protandry may be greatest in plants with large floral displays. We tested this hypothesis with enclosed, artificial populations of Chamerion angustifolium (Onagraceae) by experimentally manipulating protandry (producing uniformly adichogamous or mixed protandrous and adichogamous populations) and inflorescence size (two-, six-, or 10-flowered inflorescences) and measuring pollinator visitation, seed set, female outcrossing rate, and outcross siring success. Bees spent more time foraging on and visited more flowers of larger inflorescences than small. Female outcrossing rates did not vary among inflorescence size treatments. However, seed set per fruit decreased with increasing inflorescence size, likely as a result of increased abortion of selfed embryos, perhaps obscuring the magnitude of geitonogamous selfing. Protandrous plants had a marginally higher female outcrossing rate than adichogamous plants, but similar seed set. More importantly, protandrous plants had, on average, a twofold siring advantage relative to adichogamous plants. However, this siring advantage did not increase linearly with inflorescence size, suggesting that protandry acts to enhance siring success, but not exclusively by reducing between-flower interference.

Citation ~ Download ProtandryDiscounting.pdf

Effect of population size on the mating system in a self-compatible, autogamous plant, Aquilegia canadensis (Ranunculaceae)

In self-compatible plants, small populations may experience reduced outcrossing owing to decreased pollinator visitation and mate availability. We examined the relation between outcrossing and population size in eastern Ontario populations of Aquilegia canadensis. Experimental pollinations showed that the species is highly self-compatible, and can achieve full seed-set in the absence of pollinators via automatic self-pollination. We estimated levels of outcrossing (t) and parental inbreeding coefficients (F) from allozyme variation in naturally pollinated seed families for 10 populations ranging in size from 32 to 750 reproductive individuals. The proportion of seeds produced through outcrossing was generally low (mean = 0.29 +/- 0.02 SE) and varied widely among populations (range = 0.00-0.83). Accordingly, estimates of F were large (mean = 0.26 +/- 0.05) and significantly greater than zero in seven populations. As expected, four small populations (N < 40) outcrossed less (0.17 +/- 0.03) than six large populations (N > 90; 0.38 +/- 0.03). However, parental plants were not significantly more inbred in small than large populations (P = 0.18). There was no difference in the germination of seeds from hand self- and cross-pollinations. However, population genetic estimates of inbreeding depression for survival expressed from seed to reproductive maturity were very high (mean delta = 1 - relative fitness of selfed seed = 0.88 +/- 0.14). The combination of self-compatibility and automatic self-pollination makes the mating system of A. canadensis sensitive to variation in ecological factors that affect the likelihood of cross-pollination.

Citation ~ Download

Correlated evolution of dichogamy and self-incompatibility: a phylogenetic perspective

Historically, dichogamy (the temporal separation of gender in flowering plants) has been interpreted as a mechanism for avoiding inbreeding. However, a comparative survey found that many dichogamous species are self-incompatible (SI), suggesting dichogamy evolved for other reasons, particularly reducing interference between male and female function. Here we re-examined the association between dichogamy and SI in a phylogenetic framework, and tested the hypothesis that dichogamy evolved to reduce interference between male and female function. Using paired comparisons and maximum-likelihood correlation analyses, we find that protandry (male function first) is positively correlated with the presence of SI and protogyny (female function first) with self-compatibility (SC). In addition, estimates of transition-rate parameters suggest strong selection for the evolution of SC in protogynous taxa and a constraint against transitions from protandry to protogyny in SC taxa. We interpret these results as support for protandry evolving to reduce interference and protogyny to reduce inbreeding.

Citation ~ SIandDichogamy.pdf

The consequences of clone size for paternal and maternal success in domestic apple (Malus x domestica)

Clonal growth in plants can increase pollen and ovule production per genet. However, paternal and maternal reproductive success may not increase because within-clone pollination (geitonogamy) can reduce pollen export to adjacent clones (pollen discounting) and pollen import to the central ramets (pollen limitation). We investigated the relationship between clone size and mating success using clones of Malus x domestica at four orchards (blocks of 1–5 rows of trees). For each block, we measured maternal function as fruit and seed set in all rows and paternal function as siring rate in the first row of the adjacent block. Expected relations between reproductive success and clone size were generated from simulations and data on pollen dispersal in this species. Siring rate per clone averaged 70% and did not increase significantly with block size, consistent with simulations of pollen dispersal under pollen discounting. Simulations also indicated that the ratio of compatible to incompatible pollen received by a tree should decline with increased block size and from the periphery to the centre of blocks. However, no significant reductions in female function were detected among block sizes or within blocks. Our results suggest that paternal function may be more sensitive to the effects of clonality than female function.

Citation ~ Download AppleCloneSize.pdf

Responses to selection on male-phase duration in Chamerion angustifolium

Protandry (when male function precedes female) can enhance fitness by reducing selfing and increasing pollen export and outcrossed siring success. However, responses to selection on protandry may be constrained by genetic variation and correlations among floral traits. We examined these potential constraints in protandrous Chamerion angustifolium (Onagraceae) by estimating genetic variation in male-phase duration and associated floral traits using a paternal half-sib design and selection experiment. Narrow-sense heritability of male-phase duration was estimated as 0.23 (SE ± 0.04) and was positively correlated with floral display. The selection experiment shortened male-phase duration 0.8 SD from the parental average of 17.0 h and lengthened it by 2.0 SD. Furthermore, fixed floral longevity caused a negative association between male- and female-phase durations. These results suggest that selection on male-phase duration is not limited by genetic variation. However, changes in male-phase duration may influence pollinators through correlated changes in floral display and reduced opportunities for pollen receipt during female phase.

Citation ~ Download protandryHeritability.pdf

Beyond floricentrism: the pollination function of inflorescences

Mating by outcrossing plants depends on the frequency and quality of interaction between pollen vectors and individual flowers. However, the historical focus of pollination biology on individual flowers (floricentrism) cannot produce a complete understanding of the role of pollination in plant mating, because mating is an aggregate process, which depends on the reproductive outcomes of all of a plant’s flowers. Simultaneous display of multiple flowers in an inflorescence increases a plant’s attractiveness to pollinators, which should generally enhance mating opportunities. However, whenever pollinators visit multiple flowers on an inflorescence, self-pollination among flowers can reduce the pollen available for export to other plants (pollen discounting) and increase the incidence of inbreeding depression for embryos and offspring. Therefore, the size of floral displays that maximizes mating frequency and quality generally balances the benefits of attractiveness against the costs of self-pollination. This balance can shift considerably if different flowers serve female and male functions at one time (sexual segregation) and flowers are arranged in inflorescences so that pollinators visit female flowers before male flowers. However, the effectiveness of sexual segregation depends on the extent to which a particular inflorescence architecture induces consistent movement patterns by pollinators. In general, the consistency of pollinator movement patterns varies with inflorescence architecture and differs between pollinator types. Such variation creates many options for the evolution of the diverse inflorescence characteristics within angiosperms, which can be appreciated only by moving beyond a floricentric perspective of the role of pollination in plant mating.

Citation Download

Pollen and ovule fates and reproductive performance by flowering plants

Pollen and ovules experience diverse fates during pollination, pollen-tube growth, fertilization, and seed development, which govern the male and female potential of flowering plants. This chapter identifies these fates and many of their interactions, and considers their theoretical implications for the evolution of pollen export and the production of selfed and outcrossed seeds. This analysis clarifies the importance of pollen quantity and quality for seed production, including the opportunity for poor pollen quality to cause misidentification of pollen limitation. Our analysis emphasizes the asymmetry of pollen and ovule fates and considers its consequences for reproductive evolution. We also identify ovule limitation as a constraint on seed production, which has paradoxically not been recognized before, but is an implicit assumption of previous theoretical analysis of mating-system evolution. Ovule limitation increases the diversity of possible reproductive policies. In addition to ovule limitation, we consider the implications of pollen and resource limitation for the evolution of self- and cross-fertilization. Resource limitation occurs only if plants produce more ovules than they can mature into seeds, which allows a mixture of selfing and outcrossing to be an optimal mating system in some circumstances. The chance of mixed mating being optimal is eroded by trade-offs between self- and cross-pollination, but they do not alter the optimal combination of selfing and outcrossing, should mixed mating be favoured. Our analysis illustrates the key role played by interactions between genetic and ecological influences on reproductive performance in the evolution of plant reproduction.

Citation