Analysis example

Hey mate, here’s a little tutorial on how to do the analysis with your data. First up we need to setup the work space. First up we’ll load in the libraries we need. In this case just ggplot2. Then we’ll set the working directory to where our folder is.

# load libraries
library(ggplot2)
# set your working directory
setwd("~/Development/other_people_help/BIOL703/James/analysis")

Reading in the data

First up I saved the data as a csv file using Excel. To do this I pasted the primary data to a new sheet and saved as a csv file. Look at the file called dataset.csv I have sent you. Okay first thing we need to do is read it into R. We do this using the function read.csv().

# read in your dataset
raw.data <- read.csv('dataset.csv')

Okay, now we need to reorganise the data to make it easier to analyse in R. It pays to think about how you should organise the data during your collection as with some thought before hand you can set yourself up to skip this step. Regardless it’s pretty easy to transform data in R so if it’s easier considerably to record it in one way it’s probably still better to do that. First up let’s look at the data

raw.data

##     Treatment Week.2 Week.3 Week.4
## 1  Fertiliser     52     47     35
## 2  Fertiliser     18     15     12
## 3  Fertiliser     30     23     28
## 4  Fertiliser     23     21     20
## 5  Fertiliser     26     26     27
## 6  Fertiliser     32     35     19
## 7  Fertiliser     28     26     24
## 8  Fertiliser     45     33     28
## 9  Fertiliser     41     45     35
## 10 Fertiliser     29     26     18
## 11 Fertiliser     22     18     10
## 12 Fertiliser     25     24     22
## 13 Fertiliser     30     30     25
## 14 Fertiliser     32     30     31
## 15 Fertiliser     28     27     18
## 16       Rain     56     60     64
## 17       Rain     40     43     43
## 18       Rain     29     26     30
## 19       Rain     42     44     46
## 20       Rain     30     36     37
## 21       Rain     31     32     34
## 22       Rain     29     35     41
## 23       Rain     28     28     32
## 24       Rain     30     36     39
## 25       Rain     25     29     36
## 26       Rain     27     26     34
## 27       Rain     30     35     43
## 28       Rain     29     31     34
## 29       Rain     31     39     42
## 30       Rain     28     28     30
## 31        Tap     30     32     32
## 32        Tap     32     41     40
## 33        Tap     36     40     41
## 34        Tap     28     28     32
## 35        Tap     41     41     45
## 36        Tap     34     37     28
## 37        Tap     38     39     40
## 38        Tap     36     35     35
## 39        Tap     37     38     43
## 40        Tap     21     27     30
## 41        Tap     33     34     35
## 42        Tap     46     49     55
## 43        Tap     24     30     34
## 44        Tap     47     45     50
## 45        Tap     27     25     29

Okay so we have the data in a format with 4 columns [treatment, Week.2, Week.3, Week.4]. For doing our analysis it will be much easier if we can have time as a column so we effectively want to stack the Week columns on top of each other. Let’s do that now. Also it can be useful to have an id column so we know which plant is which across the analysis.

# make a reorganised data frame
df <- data.frame(id = rep(c(1:45),3),
                 treat = rep(raw.data$Treatment,3), 
                 time = c(rep(1,45),rep(2,45),rep(3,45)), 
                 leaf_size = c(raw.data$Week.2, raw.data$Week.3, raw.data$Week.4))

Okay so we made an id column using rep(c(1:45),3) which repeats the numbers 1 -> 45 3 times. We have then repeated the treatment column 3 times rep(raw.data$Treatment,3) and repeated the time values c(rep(1,45),rep(2,45),rep(3,45)). Finally I have stacked the leaf_size data using c(raw.data$Week.2, raw.data$Week.3, raw.data$Week.4).

Here’s what we have now..

df

##     id      treat time leaf_size
## 1    1 Fertiliser    1        52
## 2    2 Fertiliser    1        18
## 3    3 Fertiliser    1        30
## 4    4 Fertiliser    1        23
## 5    5 Fertiliser    1        26
## 6    6 Fertiliser    1        32
## 7    7 Fertiliser    1        28
## 8    8 Fertiliser    1        45
## 9    9 Fertiliser    1        41
## 10  10 Fertiliser    1        29
## 11  11 Fertiliser    1        22
## 12  12 Fertiliser    1        25
## 13  13 Fertiliser    1        30
## 14  14 Fertiliser    1        32
## 15  15 Fertiliser    1        28
## 16  16       Rain    1        56
## 17  17       Rain    1        40
## 18  18       Rain    1        29
## 19  19       Rain    1        42
## 20  20       Rain    1        30
## 21  21       Rain    1        31
## 22  22       Rain    1        29
## 23  23       Rain    1        28
## 24  24       Rain    1        30
## 25  25       Rain    1        25
## 26  26       Rain    1        27
## 27  27       Rain    1        30
## 28  28       Rain    1        29
## 29  29       Rain    1        31
## 30  30       Rain    1        28
## 31  31        Tap    1        30
## 32  32        Tap    1        32
## 33  33        Tap    1        36
## 34  34        Tap    1        28
## 35  35        Tap    1        41
## 36  36        Tap    1        34
## 37  37        Tap    1        38
## 38  38        Tap    1        36
## 39  39        Tap    1        37
## 40  40        Tap    1        21
## 41  41        Tap    1        33
## 42  42        Tap    1        46
## 43  43        Tap    1        24
## 44  44        Tap    1        47
## 45  45        Tap    1        27
## 46   1 Fertiliser    2        47
## 47   2 Fertiliser    2        15
## 48   3 Fertiliser    2        23
## 49   4 Fertiliser    2        21
## 50   5 Fertiliser    2        26
## 51   6 Fertiliser    2        35
## 52   7 Fertiliser    2        26
## 53   8 Fertiliser    2        33
## 54   9 Fertiliser    2        45
## 55  10 Fertiliser    2        26
## 56  11 Fertiliser    2        18
## 57  12 Fertiliser    2        24
## 58  13 Fertiliser    2        30
## 59  14 Fertiliser    2        30
## 60  15 Fertiliser    2        27
## 61  16       Rain    2        60
## 62  17       Rain    2        43
## 63  18       Rain    2        26
## 64  19       Rain    2        44
## 65  20       Rain    2        36
## 66  21       Rain    2        32
## 67  22       Rain    2        35
## 68  23       Rain    2        28
## 69  24       Rain    2        36
## 70  25       Rain    2        29
## 71  26       Rain    2        26
## 72  27       Rain    2        35
## 73  28       Rain    2        31
## 74  29       Rain    2        39
## 75  30       Rain    2        28
## 76  31        Tap    2        32
## 77  32        Tap    2        41
## 78  33        Tap    2        40
## 79  34        Tap    2        28
## 80  35        Tap    2        41
## 81  36        Tap    2        37
## 82  37        Tap    2        39
## 83  38        Tap    2        35
## 84  39        Tap    2        38
## 85  40        Tap    2        27
## 86  41        Tap    2        34
## 87  42        Tap    2        49
## 88  43        Tap    2        30
## 89  44        Tap    2        45
## 90  45        Tap    2        25
## 91   1 Fertiliser    3        35
## 92   2 Fertiliser    3        12
## 93   3 Fertiliser    3        28
## 94   4 Fertiliser    3        20
## 95   5 Fertiliser    3        27
## 96   6 Fertiliser    3        19
## 97   7 Fertiliser    3        24
## 98   8 Fertiliser    3        28
## 99   9 Fertiliser    3        35
## 100 10 Fertiliser    3        18
## 101 11 Fertiliser    3        10
## 102 12 Fertiliser    3        22
## 103 13 Fertiliser    3        25
## 104 14 Fertiliser    3        31
## 105 15 Fertiliser    3        18
## 106 16       Rain    3        64
## 107 17       Rain    3        43
## 108 18       Rain    3        30
## 109 19       Rain    3        46
## 110 20       Rain    3        37
## 111 21       Rain    3        34
## 112 22       Rain    3        41
## 113 23       Rain    3        32
## 114 24       Rain    3        39
## 115 25       Rain    3        36
## 116 26       Rain    3        34
## 117 27       Rain    3        43
## 118 28       Rain    3        34
## 119 29       Rain    3        42
## 120 30       Rain    3        30
## 121 31        Tap    3        32
## 122 32        Tap    3        40
## 123 33        Tap    3        41
## 124 34        Tap    3        32
## 125 35        Tap    3        45
## 126 36        Tap    3        28
## 127 37        Tap    3        40
## 128 38        Tap    3        35
## 129 39        Tap    3        43
## 130 40        Tap    3        30
## 131 41        Tap    3        35
## 132 42        Tap    3        55
## 133 43        Tap    3        34
## 134 44        Tap    3        50
## 135 45        Tap    3        29

Plotting the data

Let’s make a few plots to see what’s going on. First up let’s plot a line plot to see how the groups have changed over time. We will do that using ggplot.

# make a line graph
ggplot(df, aes(x=time, y=leaf_size, colour=treat)) + 
  geom_point() +
  stat_smooth(data=subset(df, treat == "Rain"), method = "lm") +
  stat_smooth(data=subset(df, treat == "Tap"), method = "lm") +
  stat_smooth(data=subset(df, treat == "Fertiliser"), method = "lm")

Looks like Rain and Tap have varied very little but Fertiliser has changed actually reduced in size!

Next up let’s make a boxplot of the final measurements.

# make a boxplot of final week
boxplot(leaf_size ~ treat, df[df$time == 3,])

Analysis

Now let’s make a linear regression model to test statistically the difference between groups. This is very easy in R.

# linear regression model
fit1 <- lm(formula = leaf_size ~ time*treat, data = df)
# run anova
anova(fit1)

## Analysis of Variance Table
## 
## Response: leaf_size
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## time         1   27.8   27.78  0.4290 0.513653    
## treat        2 2033.5 1016.76 15.7023 7.88e-07 ***
## time:treat   2  817.6  408.81  6.3135 0.002422 ** 
## Residuals  129 8353.0   64.75                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

I’ll leave you to interpret these results.

Post-Hoc analysis

Also it can be useful to look at which of the groups significantly differ from each other (rather than just if certain factors do). We can do this by doing a Tukey analysis.

# post hoc analysis
Tukey <- TukeyHSD(x=aov(fit1))

## Warning in replications(paste("~", xx), data = mf): non-factors ignored:
## time

## Warning in replications(paste("~", xx), data = mf): non-factors ignored:
## time, treat

## Warning in TukeyHSD.aov(x = aov(fit1)): 'which' specified some non-factors
## which will be dropped

Tukey

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = fit1)
## 
## $treat
##                      diff       lwr       upr     p adj
## Rain-Fertiliser 7.9777778  3.955414 12.000142 0.0000193
## Tap-Fertiliser  8.4666667  4.444303 12.489030 0.0000057
## Tap-Rain        0.4888889 -3.533475  4.511253 0.9552671

This says Rain and Tap varied very little from each other while Fertiliser differed from both.

Bonus visulisation

Finally it can be useful to visualise our post-hoc Tukey analysis. We will do this using a script I have previously written which we will load using the source() command. This will pull a function I have defined from another R-script.

# load the function
source('tukey_plot.R')

# use my fancy function to visulise the Tukey anaylsis 
plot.tukeyGroup(Tukey, "leaf_size", "treat")

So there ya have it. Also may be worth looking at some of these analyses and visualisations using difference in growth between the first and last measurements but I will leave that to you. Good luck James!