Hey mate, here’s a little tutorial on how to do the analysis with your data. First up we need to setup the work space. First up we’ll load in the libraries we need. In this case just ggplot2. Then we’ll set the working directory to where our folder is.
# load libraries
library(ggplot2)
# set your working directory
setwd("~/Development/other_people_help/BIOL703/James/analysis")
First up I saved the data as a csv
file using Excel. To do this I pasted the primary data to a new sheet and saved as
a csv
file. Look at the file called dataset.csv
I have sent you. Okay first thing we need to do is read it into R. We do this using the function read.csv()
.
# read in your dataset
raw.data <- read.csv('dataset.csv')
Okay, now we need to reorganise the data to make it easier to analyse in R. It pays to think about how you should organise the data during your collection as with some thought before hand you can set yourself up to skip this step. Regardless it’s pretty easy to transform data in R so if it’s easier considerably to record it in one way it’s probably still better to do that. First up let’s look at the data
raw.data
## Treatment Week.2 Week.3 Week.4
## 1 Fertiliser 52 47 35
## 2 Fertiliser 18 15 12
## 3 Fertiliser 30 23 28
## 4 Fertiliser 23 21 20
## 5 Fertiliser 26 26 27
## 6 Fertiliser 32 35 19
## 7 Fertiliser 28 26 24
## 8 Fertiliser 45 33 28
## 9 Fertiliser 41 45 35
## 10 Fertiliser 29 26 18
## 11 Fertiliser 22 18 10
## 12 Fertiliser 25 24 22
## 13 Fertiliser 30 30 25
## 14 Fertiliser 32 30 31
## 15 Fertiliser 28 27 18
## 16 Rain 56 60 64
## 17 Rain 40 43 43
## 18 Rain 29 26 30
## 19 Rain 42 44 46
## 20 Rain 30 36 37
## 21 Rain 31 32 34
## 22 Rain 29 35 41
## 23 Rain 28 28 32
## 24 Rain 30 36 39
## 25 Rain 25 29 36
## 26 Rain 27 26 34
## 27 Rain 30 35 43
## 28 Rain 29 31 34
## 29 Rain 31 39 42
## 30 Rain 28 28 30
## 31 Tap 30 32 32
## 32 Tap 32 41 40
## 33 Tap 36 40 41
## 34 Tap 28 28 32
## 35 Tap 41 41 45
## 36 Tap 34 37 28
## 37 Tap 38 39 40
## 38 Tap 36 35 35
## 39 Tap 37 38 43
## 40 Tap 21 27 30
## 41 Tap 33 34 35
## 42 Tap 46 49 55
## 43 Tap 24 30 34
## 44 Tap 47 45 50
## 45 Tap 27 25 29
Okay so we have the data in a format with 4 columns [treatment, Week.2, Week.3, Week.4]
. For doing our analysis it will be much easier if we can have time
as a column so we effectively want to stack the Week
columns on top of each other. Let’s do that now. Also it can be useful to have an id
column so we know which plant is which across the analysis.
# make a reorganised data frame
df <- data.frame(id = rep(c(1:45),3),
treat = rep(raw.data$Treatment,3),
time = c(rep(1,45),rep(2,45),rep(3,45)),
leaf_size = c(raw.data$Week.2, raw.data$Week.3, raw.data$Week.4))
Okay so we made an id column using rep(c(1:45),3)
which repeats the numbers 1 -> 45
3 times. We have then repeated the treatment column 3 times rep(raw.data$Treatment,3)
and repeated the time
values c(rep(1,45),rep(2,45),rep(3,45))
. Finally I have stacked the leaf_size
data using c(raw.data$Week.2, raw.data$Week.3, raw.data$Week.4)
.
Here’s what we have now..
df
## id treat time leaf_size
## 1 1 Fertiliser 1 52
## 2 2 Fertiliser 1 18
## 3 3 Fertiliser 1 30
## 4 4 Fertiliser 1 23
## 5 5 Fertiliser 1 26
## 6 6 Fertiliser 1 32
## 7 7 Fertiliser 1 28
## 8 8 Fertiliser 1 45
## 9 9 Fertiliser 1 41
## 10 10 Fertiliser 1 29
## 11 11 Fertiliser 1 22
## 12 12 Fertiliser 1 25
## 13 13 Fertiliser 1 30
## 14 14 Fertiliser 1 32
## 15 15 Fertiliser 1 28
## 16 16 Rain 1 56
## 17 17 Rain 1 40
## 18 18 Rain 1 29
## 19 19 Rain 1 42
## 20 20 Rain 1 30
## 21 21 Rain 1 31
## 22 22 Rain 1 29
## 23 23 Rain 1 28
## 24 24 Rain 1 30
## 25 25 Rain 1 25
## 26 26 Rain 1 27
## 27 27 Rain 1 30
## 28 28 Rain 1 29
## 29 29 Rain 1 31
## 30 30 Rain 1 28
## 31 31 Tap 1 30
## 32 32 Tap 1 32
## 33 33 Tap 1 36
## 34 34 Tap 1 28
## 35 35 Tap 1 41
## 36 36 Tap 1 34
## 37 37 Tap 1 38
## 38 38 Tap 1 36
## 39 39 Tap 1 37
## 40 40 Tap 1 21
## 41 41 Tap 1 33
## 42 42 Tap 1 46
## 43 43 Tap 1 24
## 44 44 Tap 1 47
## 45 45 Tap 1 27
## 46 1 Fertiliser 2 47
## 47 2 Fertiliser 2 15
## 48 3 Fertiliser 2 23
## 49 4 Fertiliser 2 21
## 50 5 Fertiliser 2 26
## 51 6 Fertiliser 2 35
## 52 7 Fertiliser 2 26
## 53 8 Fertiliser 2 33
## 54 9 Fertiliser 2 45
## 55 10 Fertiliser 2 26
## 56 11 Fertiliser 2 18
## 57 12 Fertiliser 2 24
## 58 13 Fertiliser 2 30
## 59 14 Fertiliser 2 30
## 60 15 Fertiliser 2 27
## 61 16 Rain 2 60
## 62 17 Rain 2 43
## 63 18 Rain 2 26
## 64 19 Rain 2 44
## 65 20 Rain 2 36
## 66 21 Rain 2 32
## 67 22 Rain 2 35
## 68 23 Rain 2 28
## 69 24 Rain 2 36
## 70 25 Rain 2 29
## 71 26 Rain 2 26
## 72 27 Rain 2 35
## 73 28 Rain 2 31
## 74 29 Rain 2 39
## 75 30 Rain 2 28
## 76 31 Tap 2 32
## 77 32 Tap 2 41
## 78 33 Tap 2 40
## 79 34 Tap 2 28
## 80 35 Tap 2 41
## 81 36 Tap 2 37
## 82 37 Tap 2 39
## 83 38 Tap 2 35
## 84 39 Tap 2 38
## 85 40 Tap 2 27
## 86 41 Tap 2 34
## 87 42 Tap 2 49
## 88 43 Tap 2 30
## 89 44 Tap 2 45
## 90 45 Tap 2 25
## 91 1 Fertiliser 3 35
## 92 2 Fertiliser 3 12
## 93 3 Fertiliser 3 28
## 94 4 Fertiliser 3 20
## 95 5 Fertiliser 3 27
## 96 6 Fertiliser 3 19
## 97 7 Fertiliser 3 24
## 98 8 Fertiliser 3 28
## 99 9 Fertiliser 3 35
## 100 10 Fertiliser 3 18
## 101 11 Fertiliser 3 10
## 102 12 Fertiliser 3 22
## 103 13 Fertiliser 3 25
## 104 14 Fertiliser 3 31
## 105 15 Fertiliser 3 18
## 106 16 Rain 3 64
## 107 17 Rain 3 43
## 108 18 Rain 3 30
## 109 19 Rain 3 46
## 110 20 Rain 3 37
## 111 21 Rain 3 34
## 112 22 Rain 3 41
## 113 23 Rain 3 32
## 114 24 Rain 3 39
## 115 25 Rain 3 36
## 116 26 Rain 3 34
## 117 27 Rain 3 43
## 118 28 Rain 3 34
## 119 29 Rain 3 42
## 120 30 Rain 3 30
## 121 31 Tap 3 32
## 122 32 Tap 3 40
## 123 33 Tap 3 41
## 124 34 Tap 3 32
## 125 35 Tap 3 45
## 126 36 Tap 3 28
## 127 37 Tap 3 40
## 128 38 Tap 3 35
## 129 39 Tap 3 43
## 130 40 Tap 3 30
## 131 41 Tap 3 35
## 132 42 Tap 3 55
## 133 43 Tap 3 34
## 134 44 Tap 3 50
## 135 45 Tap 3 29
Let’s make a few plots to see what’s going on. First up let’s plot a line plot to see how the groups have changed over time. We will do that using ggplot.
# make a line graph
ggplot(df, aes(x=time, y=leaf_size, colour=treat)) +
geom_point() +
stat_smooth(data=subset(df, treat == "Rain"), method = "lm") +
stat_smooth(data=subset(df, treat == "Tap"), method = "lm") +
stat_smooth(data=subset(df, treat == "Fertiliser"), method = "lm")
Looks like Rain
and Tap
have varied very little but Fertiliser
has changed actually reduced in size!
Next up let’s make a boxplot of the final measurements.
# make a boxplot of final week
boxplot(leaf_size ~ treat, df[df$time == 3,])
Now let’s make a linear regression model to test statistically the difference between groups. This is very easy in R.
# linear regression model
fit1 <- lm(formula = leaf_size ~ time*treat, data = df)
# run anova
anova(fit1)
## Analysis of Variance Table
##
## Response: leaf_size
## Df Sum Sq Mean Sq F value Pr(>F)
## time 1 27.8 27.78 0.4290 0.513653
## treat 2 2033.5 1016.76 15.7023 7.88e-07 ***
## time:treat 2 817.6 408.81 6.3135 0.002422 **
## Residuals 129 8353.0 64.75
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
I’ll leave you to interpret these results.
Also it can be useful to look at which of the groups significantly differ from each other (rather than just if certain factors do). We can do this by doing a Tukey analysis.
# post hoc analysis
Tukey <- TukeyHSD(x=aov(fit1))
## Warning in replications(paste("~", xx), data = mf): non-factors ignored:
## time
## Warning in replications(paste("~", xx), data = mf): non-factors ignored:
## time, treat
## Warning in TukeyHSD.aov(x = aov(fit1)): 'which' specified some non-factors
## which will be dropped
Tukey
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = fit1)
##
## $treat
## diff lwr upr p adj
## Rain-Fertiliser 7.9777778 3.955414 12.000142 0.0000193
## Tap-Fertiliser 8.4666667 4.444303 12.489030 0.0000057
## Tap-Rain 0.4888889 -3.533475 4.511253 0.9552671
This says Rain and Tap varied very little from each other while Fertiliser differed from both.
Finally it can be useful to visualise our post-hoc Tukey analysis. We will do this using a script I have previously written which we will load using the source()
command. This will pull a function I have defined from another R-script.
# load the function
source('tukey_plot.R')
# use my fancy function to visulise the Tukey anaylsis
plot.tukeyGroup(Tukey, "leaf_size", "treat")
So there ya have it. Also may be worth looking at some of these analyses and visualisations using difference in growth between the first and last measurements but I will leave that to you. Good luck James!