BIOL703 - James

Hey James, your experiment looks really neat. So here’s my interpretation to make sure I have it right. You’re testing the impact of nutrient-water, tap watre and rain water on sundew. You are looking at this through the degree of wilting at the end of your 28 day cycle and also by measuring the longest leaf once every week?

Example

Let’s walk through a simple example of the linear regression model. First let’s make some fake data, I’m only going to concentrate on the leaf growth here. For an example on how to incorporate your wilting data, again look at the examle I wrote for casey. We’re going to make a data frame with 45 rows (3*15 plants in each treatment group) and we’ll have four columns being plantID, treatment, time and leaf_size.

set.seed(420)
# generating a data.frame of fake growth data
df <- data.frame(plantID=rep(as.factor(1:45),4),
                 treat=rep(c(rep("rain",15),rep("tap",15),rep("nutr",15)),4),
                 time=as.numeric(c(rep(1,45),rep(2,45),rep(3,45),rep(4,45))),
                 leaf_size=c(rnorm(15, 6, 2),rnorm(15, 4, 2),rnorm(15, 2, 2),
                             rnorm(15, 12, 2),rnorm(15, 8, 2),rnorm(15, 4, 2),
                             rnorm(15, 18, 2),rnorm(15, 12, 2),rnorm(15, 6, 2),
                             rnorm(15, 24, 2),rnorm(15, 16, 2),rnorm(15, 8, 2)))

So in the above lines we’ve created a data frame with 180 rows and 4 columns. We’ve made 45 plant IDs (repeated 4 times for the 4 measurement intervals); three treatments; a time column indicating which measurement interval; and a leaf size column which is the dependent variable you’re measuring. There’s a few functions in there, rep() simply repeats whatever you want n times. rnorm() generates random data with a normal distribution (e.g. rnorm(15, 5, 2) means we are creating 15 data points with a mean around 5 and a standard deviation of 2). I’ve put a few in to simulate some growth data and made the results for each treatment type a little different so we can actually have a measurable effect in this example.

Okay so now we have that, let’s plot the data and see how it looks. We’ll need the library (ggplot2) to do this.

# loading ggplot2, in a actual script it's better practice to load all modules at the start
library(ggplot2)

# plot
ggplot(df, aes(x=time, y=leaf_size, colour=treat)) + 
  geom_point() +
  stat_smooth(data=subset(df, treat=="rain"), method = "lm") +
  stat_smooth(data=subset(df, treat=="tap"), method = "lm") +
  stat_smooth(data=subset(df, treat=="nutr"), method = "lm")

Anyway, the data looks good so let’s do an analysis. Because we have the leaf_size as a function of time we are going to build a linear regression model to analyse the data set and perform an analysis of variance (anova) on the three regression lines for treatment types. We fit the model with the following code..

# run a linear regression model
fit1 <- lm(formula = leaf_size ~ time*treat, data = df)

So we now have our model saved as fit1. Notice that the formula I’ve used is height ~ time*treat. In R formulas, a*b is actually short hand for a + b + a:b. This basically tells R that we are interested in: 1. the effect of time on leaf size 2. the effect of the treatment type on leaf size 3. the interaction effect of time:treat on leaf size (e.g. the effect of the treatments may change over time)

Cool so let’s run our ANOVA on our model. This is very easy.

anova(fit1)

## Analysis of Variance Table
## 
## Response: leaf_size
##             Df Sum Sq Mean Sq  F value    Pr(>F)    
## time         1 3793.1  3793.1 1043.864 < 2.2e-16 ***
## treat        2 2923.8  1461.9  402.315 < 2.2e-16 ***
## time:treat   2  561.0   280.5   77.189 < 2.2e-16 ***
## Residuals  174  632.3     3.6                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

And there’s your result. Pretty clear results in this one so I probably cooked the books a little too hard when I made the dummy data.

Good luck mate.

BIOL703 - James

Statistical Tests

Example