## Analysis of Count Data

When response variables are composed of counts, the standard statistical methods that rely on the normal distribution are no longer applicable. Count data are comprised of positive integers and, often, many zeros. For such data, we need statistics based on Poisson or binomial distributions. I’ve spent the past few weeks analysing counts from hundreds of transects and, as is typical, a particular challenge was determining the appropriate packages to use for R. Here’s what I’ve found so far.

The first step is to get an idea of the dispersion of data points:

```
Means <- tapply(y, list(x1, x2), mean)
Vars <- tapply(y, list(x1, x2), var)
plot(Means, Vars, xlab="Means", ylab="Variances")
abline(a=0, b=1)
```

For the Poisson distribution, the mean is equal to the variance. So, we expect the points to lie along the solid line added to the plot. If the points are overdispersed, a negative binomial link function may be more appropriate. The `pscl`

library provides a function to test this:

```
library(pscl)
model.nb <- glm.nb(y ~ x, data=data)
odTest(model.nb)
summary(model.nb)
```

If the `odTest`

function rejects the null model, then the data are overdispersed relative to a Poisson distribution. One particularly useful function is `glmmPQL`

from the `MASS`

library. This function allows for random intercepts and combined with the negative.binomial function of the same library, you can fit a negative binomial GLMM:

```
model.glmm.nb <- glmmPQL(y ~ x1 + x2,
random= ~ 1|transect, data=data,
family=negative.binomial(model.nb$theta))
```

In this case, I use the Θ estimated from the `glm.nb`

function in the `negative.binomial`

call. Also useful are the `zeroinfl`

function of the `pscl`

library for fitting zero-inflated Poisson or negative binomial models and the `geeglm`

function of `geepack`

for fitting generalized estimating equations for repeated measures. Finally, `fitdistr`

from `MASS`

allows for estimating the parameters of different distributions from empirical data.