Hey mate, here’s a little tutorial on how to do the analysis with your data. First up we need to set up the workspace: load the libraries we need (in this case just ggplot2), then set the working directory to the folder where your data is.

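# load libraries
library(ggplot2)
# set your working directory (placeholder path: replace with the folder containing dataset.csv)
setwd("path/to/your/folder")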

Reading in the data

First, I saved the data as a CSV file from Excel: I pasted the primary data into a new sheet and saved that sheet as a CSV. Have a look at the file called dataset.csv that I sent you. The first thing we need to do is read it into R, which we do with the function read.csv().

# read in your dataset
raw.data <- read.csv('dataset.csv')
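
A quick sanity check right after reading the file is worth doing. By default, read.csv() passes the column headers through make.names() (check.names = TRUE). A header typed as "Week 2" in Excel therefore comes through as Week.2, which is probably where the Week.2, Week.3 and Week.4 names further down come from. So that the sketch runs on its own, it writes a tiny made-up CSV to a temporary file. With the real file you'd check raw.data the same way.

```r
# Write a tiny made-up CSV (headers with spaces, as typed in Excel) to a temp file
path <- tempfile(fileext = ".csv")
writeLines(c("Treatment,Week 2,Week 3,Week 4",
             "Rain,50,51,52",
             "Fertiliser,55,50,45"), path)

raw <- read.csv(path)   # check.names = TRUE turns "Week 2" into "Week.2"
str(raw)                # look at column names and types before going further

# Stop early if the headers are not what the rest of the analysis expects
expected <- c("Treatment", "Week.2", "Week.3", "Week.4")
stopifnot(all(expected %in% names(raw)))
```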

Okay, now we need to reorganise the data to make it easier to analyse in R. It pays to think about how to organise the data while you're collecting it, because with some thought beforehand you can skip this step entirely. That said, it’s pretty easy to transform data in R, so if it’s considerably easier to record the data one way, it’s probably still better to do that. First, let’s look at the data:

head(raw.data)
##     Treatment Week.2 Week.3 Week.4

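
The reshaping code itself isn't included here. One way to do it, as a sketch, is to use base R's reshape() to turn the wide table into the long df used from here on. The sketch assumes the columns are Treatment, Week.2, Week.3 and Week.4, as in the header above. It codes the weeks as time = 1, 2, 3, because the boxplot further down treats time == 3 as the final week. The helper name to_long and the toy rows are made up for illustration.

```r
# Reshape the wide table (one column per week) into long format:
# one row per plant per week, with columns plant, treat, time, leaf_size.
# Assumes the columns are Treatment, Week.2, Week.3 and Week.4.
to_long <- function(wide) {
  wide$plant <- seq_len(nrow(wide))          # one ID per row (plant)
  long <- reshape(wide,
                  direction = "long",
                  varying   = c("Week.2", "Week.3", "Week.4"),
                  v.names   = "leaf_size",
                  timevar   = "time",
                  times     = 1:3,           # assumed coding: Week.2 = 1 ... Week.4 = 3
                  idvar     = "plant")
  long$treat <- factor(long$Treatment)       # explicit grouping factor
  long[, c("plant", "treat", "time", "leaf_size")]
}

# Made-up rows, only to show the shape of the result
toy <- data.frame(Treatment = c("Rain", "Tap", "Fertiliser"),
                  Week.2 = c(50, 52, 55),
                  Week.3 = c(51, 53, 50),
                  Week.4 = c(52, 52, 45))
to_long(toy)
```

With the real data that would presumably be `df <- to_long(raw.data)`.
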
Plotting the data

Let’s make a few plots to see what’s going on. First, a scatter plot of leaf size over time with a fitted straight line for each treatment, to see how the groups have changed. We will do that using ggplot.

# scatter plot of leaf size over time with a linear trend line per treatment
# (mapping colour to treat makes geom_smooth fit one line per group)
ggplot(df, aes(x = time, y = leaf_size, colour = treat)) +
  geom_point() +
  geom_smooth(method = "lm")

Looks like Rain and Tap have changed very little, but Fertiliser has actually reduced in size!
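
To put numbers on that impression, you could fit a separate straight line per treatment and compare the slopes, i.e. the change in leaf size per time step. The sketch assumes a long data frame with treat, time and leaf_size columns, like df. The helper slope_by_treatment and the simulated data are hypothetical.

```r
# Slope of leaf_size against time, fitted separately within each treatment.
# Assumes columns treat, time and leaf_size (like df).
slope_by_treatment <- function(d) {
  sapply(split(d, d$treat), function(g)
    unname(coef(lm(leaf_size ~ time, data = g))["time"]))
}

# Simulated data: flat for Rain and Tap, shrinking for Fertiliser
set.seed(1)
sim <- expand.grid(time = 1:3, plant = 1:15,
                   treat = c("Fertiliser", "Rain", "Tap"))
sim$leaf_size <- 50 - 4 * sim$time * (sim$treat == "Fertiliser") + rnorm(nrow(sim))
slope_by_treatment(sim)
```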

Next up let’s make a boxplot of the final measurements.

# make a boxplot of final week
boxplot(leaf_size ~ treat, df[df$time == 3,])
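
You can also tabulate the numbers behind the boxplot. Like the boxplot() call, the sketch assumes the final week is coded time == 3. The name final_week_summary and the toy values are made up.

```r
# Count, median and mean of the final-week leaf size for each treatment.
# Assumes columns treat, time and leaf_size, with the last week coded as final_time.
final_week_summary <- function(d, final_time = 3) {
  fw <- d[d$time == final_time, ]
  stats <- lapply(split(fw$leaf_size, fw$treat), function(x)
    c(n = length(x), median = median(x), mean = mean(x)))
  do.call(rbind, stats)   # one row per treatment
}

# Made-up data: two time points per plant, three plants per treatment
toy <- data.frame(treat = rep(c("Fertiliser", "Rain", "Tap"), each = 6),
                  time = rep(c(1, 3), times = 9),
                  leaf_size = c(50, 40, 52, 42, 48, 44,   # Fertiliser
                                50, 51, 49, 52, 51, 50,   # Rain
                                50, 50, 52, 53, 49, 51))  # Tap
final_week_summary(toy)
```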


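Now let’s fit a linear model to test the differences between groups statistically. This is very easy in R. The formula time*treat expands to time + treat + time:treat, so each treatment gets its own intercept and its own slope over time.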

# linear regression model
fit1 <- lm(formula = leaf_size ~ time*treat, data = df)
# ANOVA table for the fitted model (sequential sums of squares)
anova(fit1)
## Analysis of Variance Table
## Response: leaf_size
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## time         1   27.8   27.78  0.4290 0.513653    
## treat        2 2033.5 1016.76 15.7023 7.88e-07 ***
## time:treat   2  817.6  408.81  6.3135 0.002422 ** 
## Residuals  129 8353.0   64.75                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

I’ll leave you to interpret these results.
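
A hint for reading the table: the time:treat row tests whether the treatments have different slopes over time. It is the last term in this sequential table, so its F test is the same as using anova() to compare a model without the interaction (one shared slope) against the full model. The sketch below shows this on simulated data, not your measurements. With the real data you'd compare lm(leaf_size ~ time + treat, data = df) against fit1.

```r
# Simulated long data with the same layout as df (made up)
set.seed(3)
sim <- expand.grid(time = 1:3, plant = 1:15,
                   treat = c("Fertiliser", "Rain", "Tap"))
sim$leaf_size <- 50 - 3 * sim$time * (sim$treat == "Fertiliser") +
  rnorm(nrow(sim), sd = 5)

fit_full     <- lm(leaf_size ~ time * treat, data = sim)  # separate slope per treatment
fit_additive <- lm(leaf_size ~ time + treat, data = sim)  # one shared slope

# F test for adding the interaction to the additive model
comparison <- anova(fit_additive, fit_full)
comparison

# Same F statistic as the time:treat row of the sequential table
anova(fit_full)["time:treat", "F value"]
```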

Post-hoc analysis

It can also be useful to look at which groups differ significantly from each other, rather than just whether a factor has an effect overall. We can do this with Tukey’s honest significant difference test, using TukeyHSD().

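# post hoc analysis (the warnings below just say the numeric time variable is skipped, since Tukey comparisons only apply to factors)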
Tukey <- TukeyHSD(x=aov(fit1))
## Warning in replications(paste("~", xx), data = mf): non-factors ignored:
## time
## Warning in replications(paste("~", xx), data = mf): non-factors ignored:
## time, treat
## Warning in TukeyHSD.aov(x = aov(fit1)): 'which' specified some non-factors
## which will be dropped
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## Fit: aov(formula = fit1)
## $treat
##                      diff       lwr       upr     p adj
## Rain-Fertiliser 7.9777778  3.955414 12.000142 0.0000193
## Tap-Fertiliser  8.4666667  4.444303 12.489030 0.0000057
## Tap-Rain        0.4888889 -3.533475  4.511253 0.9552671

This says Rain and Tap did not differ significantly from each other (p adj = 0.96), while Fertiliser differed from both (p adj < 0.001 in each case).
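
With more groups, picking out the significant pairs by eye gets tedious. The $treat element of the TukeyHSD result is a matrix with diff, lwr, upr and p adj columns, so you can filter it directly. The sketch uses simulated data and the usual 0.05 cutoff. On the real results the filter would be `Tukey$treat[Tukey$treat[, "p adj"] < 0.05, , drop = FALSE]`.

```r
# Simulated one-way data: Fertiliser lower than Rain and Tap (made up)
set.seed(4)
sim <- data.frame(treat = factor(rep(c("Fertiliser", "Rain", "Tap"), each = 15)))
sim$leaf_size <- rnorm(45, mean = ifelse(sim$treat == "Fertiliser", 40, 50), sd = 3)

tk  <- TukeyHSD(aov(leaf_size ~ treat, data = sim))
cmp <- tk$treat                              # matrix: diff, lwr, upr, p adj
significant <- cmp[cmp[, "p adj"] < 0.05, , drop = FALSE]
significant
```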

Bonus visualisation

Finally, it can be useful to visualise the post-hoc Tukey analysis. We’ll do this with a function I wrote previously. It lives in another R script, and we load it with the source() command.

# load the function

# use my fancy function to visualise the Tukey analysis
plot.tukeyGroup(Tukey, "leaf_size", "treat")
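
If you don't have the script to hand, base R has a plot() method for TukeyHSD objects. It draws each pairwise difference with its family-wise confidence interval. Intervals that cross the vertical line at zero are pairs that don't differ significantly. The sketch runs on simulated data. With the real results it's just plot(Tukey).

```r
# Simulated one-way data (made up), then base R's Tukey interval plot
set.seed(5)
sim <- data.frame(treat = factor(rep(c("Fertiliser", "Rain", "Tap"), each = 15)))
sim$leaf_size <- rnorm(45, mean = ifelse(sim$treat == "Fertiliser", 40, 50), sd = 3)
tk <- TukeyHSD(aov(leaf_size ~ treat, data = sim))

op <- par(mar = c(5, 9, 4, 2), las = 1)  # wider left margin, horizontal labels
plot(tk)
par(op)                                    # restore previous graphics settings
```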

So there ya have it. It may also be worth repeating some of these analyses and plots using the difference in growth between the first and last measurements, but I’ll leave that to you. Good luck James!
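
Here is a possible starting point for that, as a sketch. It works on the wide table: take each plant's last measurement minus its first, then compare treatments with a one-way ANOVA and a Tukey test. It assumes Week.2 and Week.4 are the first and last measurements. The toy data is simulated.

```r
# Simulated wide data (made up): Fertiliser shrinks, the others grow slightly
set.seed(6)
toy <- data.frame(Treatment = rep(c("Fertiliser", "Rain", "Tap"), each = 15))
toy$Week.2 <- rnorm(45, mean = 50, sd = 3)
toy$Week.4 <- toy$Week.2 + ifelse(toy$Treatment == "Fertiliser", -8, 1) + rnorm(45)

# Growth = last measurement minus first, one value per plant
toy$growth <- toy$Week.4 - toy$Week.2
toy$Treatment <- factor(toy$Treatment)

growth_fit <- aov(growth ~ Treatment, data = toy)
summary(growth_fit)
TukeyHSD(growth_fit)
boxplot(growth ~ Treatment, data = toy, ylab = "Change in leaf size")
```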