The Ecological Detective by Ray Hilborn and Marc Mangel is an excellent source for learning how to analyse ecological data with sophistication. Traditionally, ecological data is analysed from the binary perspective of hypothesis testing. The goal of such testing is to either accept or reject a null hypothesis. Although it is well entrenched in ecological training and publication, this hypothesis testing has repeatedly been attacked by statisticians and many ecologists.
Without entering into this debate here, I will note that Hilborn and Mangel present an alternative: constructing models of the biological system of interest and then testing those models against collected data. This approach offers a much more nuanced and powerful way of understanding ecological processes. No longer is the ecology simply used to accept or reject a null hypothesis. Rather, a deeper understanding is required that leads to greater insights into the process.
As part of reading through The Ecological Detective, I worked through the pseudo-code examples provided and implemented them in Mathematica. I've decided to make these files available: EcologicalDetective.zip. My hope is that someone else will find them useful. I would be particularly interested in discussing these files and the book with anyone interested. I should point out that this was my first use of Mathematica, so I would appreciate any feedback on how to use it more effectively.
In some recent research (http://public.me.com/mroutley/SIandDichogamy.pdf) I had to make inferences about families based on character states of the species within the family. One approach is to use a simple majority rule. For example, if more than half of the species possess character state x rather than y, then the family can be described as x. However, this approach seemed rather liberal, which led to a 2/3 majority criterion: if more than 2/3 of the species are x, the family is x; if fewer than 1/3 are x, the family is y; otherwise the family is ambiguous.
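The 2/3 rule can be written as a short function. This is a minimal Python sketch rather than the code used in the research; the function name `classify_family` and the string labels "x" and "y" are illustrative assumptions. Exact fractions are used so that a family sitting right on a boundary is coded as ambiguous instead of being decided by floating-point rounding.

```python
from fractions import Fraction


def classify_family(states, threshold=Fraction(2, 3)):
    """Code a family as 'x', 'y', or 'ambiguous' from its species' states.

    `states` is a list of "x"/"y" labels, one per species (hypothetical
    encoding). `threshold` is the majority required to assign a state.
    """
    if not states:
        raise ValueError("a family needs at least one species")
    share_x = Fraction(sum(1 for s in states if s == "x"), len(states))
    if share_x > threshold:
        return "x"
    if share_x < 1 - threshold:
        return "y"
    return "ambiguous"


# A family of twenty species, fourteen of which show state x.
family = ["x"] * 14 + ["y"] * 6
print(classify_family(family))
```

Passing `threshold=Fraction(1, 2)` gives the simple majority rule, with an exact tie coded as ambiguous.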
A significant drawback to arbitrarily creating such criteria is that I had no idea how they affected the rates of Type I and Type II errors. Presumably, as the criterion becomes more stringent, Type I errors become less likely, but such uncertainty is not comforting. I decided a better approach would be to attempt a simulation study using different criteria coupled with a more sophisticated character state reconstruction algorithm.
The general approach was to create a family of twenty species and randomly assign one of two possible character states to a fixed proportion of the species. The fixed proportion chosen represented the decision criterion to be evaluated. For example, a 51% proportion is equivalent to the simple majority criterion. I then randomly generated 5,000 phylogenetic tree topologies for the family. To evaluate any given decision criterion, I compared the family characterization from the decision criterion to the ancestral-state reconstruction from Schluter et al.'s maximum-likelihood analysis. The idea is that the maximum-likelihood reconstruction should be reasonably accurate, since it incorporates tree topology into its calculations. If the decision criterion and maximum-likelihood approaches yield similar answers, the decision criterion may be a good choice for describing families in the absence of species-level phylogenetic resolution.
I tested three decision criteria. Their results are: an 80% criterion was 98.1% accurate, 65% was 91.6%, and 55% was 60.8%. Clearly, the more stringent the criterion, the more closely it agrees with the more sophisticated maximum-likelihood analysis. However, stringency does exclude more data from the analysis as more families become ambiguously coded. A proper trade-off between stringency and sample size is required to make the best use of data.
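The overall shape of such a simulation can be sketched in Python. This is not the analysis described above and will not reproduce its percentages. It swaps in Fitch parsimony for the maximum-likelihood reconstruction because parsimony fits in a few lines. The tree generator (repeatedly joining two random subtrees) and the definition of agreement (the root is reconstructed unambiguously as x) are assumptions, and all function names are hypothetical.

```python
import random


def random_topology(tips, rng):
    """Random rooted binary tree built by joining two random subtrees at a time."""
    nodes = list(tips)
    while len(nodes) > 1:
        a = nodes.pop(rng.randrange(len(nodes)))
        b = nodes.pop(rng.randrange(len(nodes)))
        nodes.append((a, b))
    return nodes[0]


def fitch_root_states(tree):
    """Fitch parsimony down-pass: the set of most parsimonious root states."""
    if isinstance(tree, str):  # a tip carries its observed state
        return {tree}
    left, right = (fitch_root_states(child) for child in tree)
    # Keep the shared states if there are any, otherwise take the union.
    return (left & right) or (left | right)


def agreement(proportion_x, n_species=20, n_trees=5000, seed=1):
    """Fraction of random trees whose root is reconstructed as exactly {x}."""
    rng = random.Random(seed)
    n_x = round(proportion_x * n_species)
    tips = ["x"] * n_x + ["y"] * (n_species - n_x)
    matches = 0
    for _ in range(n_trees):
        if fitch_root_states(random_topology(tips, rng)) == {"x"}:
            matches += 1
    return matches / n_trees


for criterion in (0.80, 0.65, 0.55):
    print(criterion, agreement(criterion))
```

Replacing `fitch_root_states` with a likelihood-based reconstruction would bring the sketch closer to the study design while leaving the rest of the loop unchanged.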
I have been working through my references and papers trying to regain some control over the literature. Being reintroduced to the tedium of reference management, it seems like there must be a better way to catalogue and organize this important component of research. Ideally, with the Internet and some good citation support from publishers, I would never have to type a citation; I would just automagically download whatever I need. Obviously this is not currently available.
I use BibDesk for my reference management and the author has some interesting ideas about sharing reference databases easily among colleagues. In this spirit of sharing, I've decided to make my reference database available here. It is in BibTeX format, which most useful reference management software should recognize. As BibDesk matures I hope to make this database accessible in a more useful format (e.g., with automatic synchronization). Until then I will update the publicly available database as often as possible. Ideally the database will become a group effort, maintained and expanded by whoever uses it. If you are interested in participating, let me know. There are some fields in the database that may not be useful. In particular, there is a link to the PDF location on my hard drive. I considered transferring these links and associated files to the Internet as well. However, there are copyright concerns with such a setup that need to be considered.
These data are the average seed set estimates for dichogamous and adichogamous Chamerion angustifolium at different inflorescence sizes.
Format:
maternalID: Identification code for the maternal plant (i.e., grandmother of the counted seeds).
individualID: Identification code of the plant.
array#: The array identification number.
dichogamyType: Indicates if the plant was dichogamous.
flowerPosition: Flowers were sampled from either the bottom or top of the inflorescence.
inflorescenceSize: The number of open flowers on each plant in the array.
seedCount: Number of full seeds.
notSeedCount: Number of aborted seeds.
Citation:
Routley, M.B. & B.C. Husband. 2003. The effect of protandry on siring success in Chamerion angustifolium (Onagraceae) with different inflorescence sizes. Evolution, 57: 240-248.
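As a rough illustration of how the fields above might be used, the following Python sketch computes mean seed set per dichogamy type. Several things here are assumptions: that the file is comma-separated with the field names as column headers, that seed set is taken as seedCount / (seedCount + notSeedCount), and that dichogamyType holds the labels shown. The rows are invented placeholders, not values from the real data set.

```python
import csv
import io
from collections import defaultdict

# Invented rows for illustration only; not values from the real data set.
SAMPLE = """maternalID,individualID,array#,dichogamyType,flowerPosition,inflorescenceSize,seedCount,notSeedCount
M1,P1,1,dichogamous,bottom,2,80,20
M1,P2,1,adichogamous,top,2,30,30
M2,P3,2,dichogamous,top,6,45,15
"""


def mean_seed_set(handle):
    """Mean of seedCount / (seedCount + notSeedCount) for each dichogamyType."""
    by_type = defaultdict(list)
    for row in csv.DictReader(handle):
        full = int(row["seedCount"])
        aborted = int(row["notSeedCount"])
        if full + aborted == 0:
            continue  # nothing counted for this flower; avoid dividing by zero
        by_type[row["dichogamyType"]].append(full / (full + aborted))
    return {kind: sum(vals) / len(vals) for kind, vals in by_type.items()}


result = mean_seed_set(io.StringIO(SAMPLE))
print(result)
```

To run it on a real file, pass an open file handle in place of the `io.StringIO` wrapper.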
These data are the average siring-success estimates for dichogamous and adichogamous Chamerion angustifolium. Siring success is estimated from the proportion of heterozygous progeny produced at the PGI locus. Dichogamy classes were homozygous for alternate PGI alleles, so that heterozygous progeny represent interclass pollen transfer.
Format:
Array: The array identification number.
DichogamyType: The dichogamy status of the plants in the array.
FlowerSize: The number of open flowers on each plant in the array.
ProportionHeterozygousProgeny: The ratio of heterozygous to homozygous progeny at the PGI locus.
Citation:
Routley, M.B. & B.C. Husband. 2003. The effect of protandry on siring success in Chamerion angustifolium (Onagraceae) with different inflorescence sizes. Evolution, 57: 240-248.
These data are pollen counts from stigmas after single bee visits in populations of Chamerion angustifolium from Montana. Pollen was quantified with a Beckman-Coulter Multisizer 3 particle counter.
Format:
Ploidy: The cytotype of sampled plant, either tetraploid or diploid.
AntherPresence: Some flowers had their anthers removed with forceps. Others were left intact.
PollenCount: The estimated amount of pollen deposited on the stigma.
These data are pollen counts from anthers before and after single bee visits in populations of Chamerion angustifolium from Montana. Pollen was quantified with a Beckman-Coulter Multisizer 3 particle counter.
Format:
Population: The population sampled, either tetraploid or diploid.
Sample: An identification code representing the plant and flower sampled.
StigmaPresence: Some flowers had their stigma and style removed with forceps. Others were left intact.
Visitation: Whether the anther was sampled before or after a single bee visit.
PollenCount: The estimated amount of pollen present in the flower.
Unrelated to my "official" thesis work, I have been thinking about floral form and its influence on plant fitness. As an excuse to start a discussion with anyone interested, I've posted this overview of what I hope to work on next.
Plant mating systems control the transmission of genes between generations and, therefore, are a fundamental characteristic of populations. Since flowers are the reproductive organs of plants, floral form fundamentally influences plant mating systems. However, research into floral evolution has traditionally "atomized" flowers into conspicuous traits that are then investigated independently. Despite the undeniable success of this reductionist approach, an alternate research strategy called phenotypic integration, found at the intersection of morphometrics, quantitative genetics, reproductive ecology, and plant evolution, offers a unique perspective. Floral integration, in particular, asserts that the variance-covariance structure of entire flowers, rather than mean values of individual traits, may be an important target for selection. This is especially relevant for animal-pollinated, hermaphroditic flowers (i.e., most angiosperms) in which the male and female sexual organs must be positioned precisely within the path of pollen movement. Consequently, I expect high integration for anther and stigma placement relative to, for example, vegetative characters. After a long period of neglect, floral integration is beginning to receive more attention. To date, most of this research has focussed on quantifying the magnitude of integration, whereas the evolutionary significance of variation in floral integration remains an open question.