Again this is easy as it’s already been implied by the variables chosen. Our hypotheses are..
Participants job level is related to leadership qualities
Participants gender is related to leadership qualities
And the respective nulls..
There is no relationship between job level and leadership qualities
There is no relationship between gender and leadership qualities
Because we’ve already gone through it twice now, we’ll skip over this part quite quickly. Let’s load in the data and process it as we did before.
# setwd
setwd('~/Documents/upwork/Tammy_Ferrante')
# load in data (df stands for "data frame")
df <- read.csv('LeadershipData.csv')
# pre processing
# count how many rows have zero variance
length(df[df$ZeroVar == 1,]$ResponseId)
## [1] 13
# drop these rows from the data frame
df <- df[!df$ZeroVar == 1,]
# how many rows are we removing?
length(df[!complete.cases(df), ]$ResponseId)
## [1] 71
# remove the rows
df <- df[complete.cases(df),]
# since we have removed a lot of rows it is sometimes useful to reset the index
row.names(df) <- NULL
# make collarcolor a factor
df$CollarColor <- as.factor(df$CollarColor)
df$Gender <- as.factor(df$Gender)
Let’s check the CollarColor groups and make sure there’s enough data in each
summary(df$Gender)
## 1 2
## 573 672
summary(df$CollarColor)
## 1 2 3 4 5
## 204 206 328 496 11
Like last time, CollarColor 5 is pretty low (only 11 samples) so let’s drop it as it will likely skew the results.
df <- df[!df$CollarColor == 5,]
df$CollarColor <- droplevels(df$CollarColor)
row.names(df) <- NULL
Now let’s check our groups with a quick plot.
# divde graphics into 1x3 panel
op <- par(mfrow = c(1, 3))
# make Collar Color plots
boxplot(ValuesDomain~CollarColor, data=df, main='Leadership Values', xlab='Collar Color', ylab='Score')
boxplot(MediatorDomain~CollarColor, data=df, main='Mediator Skills', xlab='Collar Color', ylab='Score')
boxplot(FulfillmentDomain~CollarColor, data=df, main='Leadership Fulfillment', xlab='Collar Color', ylab='Score')
# make Gender plots
boxplot(ValuesDomain~Gender, data=df, main='Leadership Values', xlab='Gender', ylab='Score')
boxplot(MediatorDomain~Gender, data=df, main='Mediator Skills', xlab='Gender', ylab='Score')
boxplot(FulfillmentDomain~Gender, data=df, main='Leadership Fulfillment', xlab='Gender', ylab='Score')
Now let’s build our model so we can check assumptions and remove outliers. Notice we use cbind()
to merge the three dependent variables together to build the MANOVA. The onyl difference to the last one is that we are now including a second dependent variable (gender) and also looking at the interaction effect of CollarColor:Gender.
fit <- manova(cbind(ValuesDomain, MediatorDomain, FulfillmentDomain) ~ CollarColor*Gender, data = df)
Okay now let’s use Cook’s distance again to remove the outliers just like last time.
# calculate cooksd
cooksd <- cooks.distance(fit)
sample_size <- nrow(df)
influential <- as.numeric(1:length(cooksd[,1]))[(cooksd>4/sample_size)[,1] | (cooksd>4/sample_size)[,2] | (cooksd>4/sample_size)[,3]]
# drop rows
df <- df[-influential,]
# don't forget to reset the index!
row.names(df) <- NULL
print(paste(as.character(length(influential)), 'outliers removed'))
## [1] "117 outliers removed"
Alright, time to check the model assumptions. As before, the assumptions for MANOVA are exactly the same as ANOVA:
And like last time, the only difference is, checking the assumptions for a MANOVA can be a real pain in the ass! Because if you remember from part 2..
Reason being R does not have good diagnostic support for mlm objects. Consequently we need to make a model to test for each dependent variable and category! Luckily we’re only checking one categorical independent variable so that’s just three plots. We’ll do this quickly this time using the
plot(lm)
to quickly produce our diagnostics.
However first let’s check the independence of observations. Again like in the last exercise we’ll just make sure there’s no duplicate participants.
# How many duplicate IDs?
length(df[duplicated(df$ResponseId),]$ResponseID)
## [1] 0
Now let’s quickly make our models to check assumptions and produce the diagnostics.
fitVal <- lm(formula = ValuesDomain ~ CollarColor*Gender, data = df)
fitMed <- lm(formula = MediatorDomain ~ CollarColor*Gender, data = df)
fitFul <- lm(formula = FulfillmentDomain ~ CollarColor*Gender, data = df)
And make the diagnostic plots.
op <- par(mfrow = c(2,2))
plot(fitVal)
plot(fitMed)
plot(fitFul)
par(op)
Everything looks fine and nothing is being violated so now we can continue.
Now we get to run the MANOVA. First let’s look at the response overall and then break it down to each variable. First we need to refit the model (as we removed the outliers before).
fit <- manova(cbind(ValuesDomain, MediatorDomain, FulfillmentDomain) ~ CollarColor*Gender, data = df)
summary(fit)
## Df Pillai approx F num Df den Df Pr(>F)
## CollarColor 3 0.035675 4.4489 9 3327 8.113e-06 ***
## Gender 1 0.031122 11.8528 3 1107 1.209e-07 ***
## CollarColor:Gender 3 0.024277 3.0159 9 3327 0.001367 **
## Residuals 1109
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Well it’s pretty convincing that CollarColor
is having an affect of some kind and so is Gender. Additionally there’s also a significant interaction effect taking place between CollarColor and Gender. Let’s break it down further and look at each DV in isolation.
summary(aov(fit))
## Response ValuesDomain :
## Df Sum Sq Mean Sq F value Pr(>F)
## CollarColor 3 24.10 8.0336 7.4820 5.837e-05 ***
## Gender 1 26.92 26.9193 25.0709 6.424e-07 ***
## CollarColor:Gender 3 15.49 5.1639 4.8093 0.002475 **
## Residuals 1109 1190.76 1.0737
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Response MediatorDomain :
## Df Sum Sq Mean Sq F value Pr(>F)
## CollarColor 3 12.52 4.1722 4.1085 0.0065343 **
## Gender 1 11.49 11.4857 11.3105 0.0007970 ***
## CollarColor:Gender 3 19.08 6.3617 6.2646 0.0003238 ***
## Residuals 1109 1126.19 1.0155
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Response FulfillmentDomain :
## Df Sum Sq Mean Sq F value Pr(>F)
## CollarColor 3 13.82 4.6059 4.3572 0.0046345 **
## Gender 1 25.93 25.9328 24.5322 8.444e-07 ***
## CollarColor:Gender 3 18.37 6.1242 5.7934 0.0006268 ***
## Residuals 1109 1172.32 1.0571
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Clearly there is a strong effect here for both CollarColor and Gender. Additionally there is also a significant interaction effect between CollarColor and Gender for the each response. Time to break it down further with a post-hoc analysis.
Again, Tukey doesn’t support multiple dependent variables but that’s okay. We can just break down the responses to their individual regression values.
TukeyVal <- TukeyHSD(x=aov(fitVal))
TukeyMed <- TukeyHSD(x=aov(fitMed))
TukeyFul <- TukeyHSD(x=aov(fitFul))
Again let’s produce some nice post-hoc analyses using the function we wrote in the previous report.
# usually load all libraries at the start but this one I'm loading now to show where it's used.
library(multcompView)
plot.tukeyGroup <- function(TUKEY, DV, variable){
# I need to group the treatments that are not different each other together.
generate_label_df <- function(TUKEY, variable){
# Extract labels and factor levels from Tukey post-hoc
Tukey.levels <- TUKEY[[variable]][,4]
Tukey.labels <- data.frame(multcompLetters(Tukey.levels)['Letters'])
#I need to put the labels in the same order as in the boxplot :
Tukey.labels$treatment=rownames(Tukey.labels)
Tukey.labels=Tukey.labels[order(Tukey.labels$treatment) , ]
return(Tukey.labels)
}
# Apply the function on my dataset
LABELS=generate_label_df(TUKEY, variable)
# A panel of colors to draw each group with the same color :
my_colors=c( rgb(143,199,74,maxColorValue = 255),rgb(242,104,34,maxColorValue = 255),
rgb(111,145,202,maxColorValue = 255),rgb(254,188,18,maxColorValue = 255),
rgb(74,132,54,maxColorValue = 255),rgb(236,33,39,maxColorValue = 255),
rgb(165,103,40,maxColorValue = 255))
# Draw the basic boxplot
a=boxplot(df[[DV]] ~ df[[variable]], ylim=c(min(df[[DV]]), 1.1*max(df[[DV]])),
col=my_colors[as.numeric(LABELS[,1])] , ylab="Score" , main=DV)
# I want to write the letter over each box. Over is how high I want to write it.
over=0.1*max( a$stats[nrow(a$stats),] )
#Add the labels
text(c(1:nlevels(df[[variable]])), a$stats[nrow(a$stats),]+over, LABELS[,1],
col=my_colors[as.numeric(LABELS[,1])])
}
Now let’s apply it to our tests. Note that we do not need to do this for Gender as it has only two levels. These post hoc analyese are only useful if you have more than two levels in your factors.
plot.tukeyGroup(TukeyVal, "ValuesDomain", "CollarColor")
plot.tukeyGroup(TukeyMed, "MediatorDomain", "CollarColor")
plot.tukeyGroup(TukeyFul, "FulfillmentDomain", "CollarColor")
Now because we have a positive interaction effect, we know the effects of CollarColor operate differently for each gender. Let’s illustrate this in a plot so we can really get an idea of what is going on.
# first seperate the dataframes by Gender
df2 <- split(df, df$Gender)
# divde graphics into 1x3 panel
op <- par(mfrow = c(1, 3))
# make Collar Color plots for gender 1
print('Male')
## [1] "Male"
color = 'blue'
boxplot(ValuesDomain~CollarColor, data=df2[[1]], main='Leadership Values', xlab='Collar Color', ylab='Score', col=color)
boxplot(MediatorDomain~CollarColor, data=df2[[1]], main='Mediator Skills', xlab='Collar Color', ylab='Score', col=color)
boxplot(FulfillmentDomain~CollarColor, data=df2[[1]], main='Leadership Fulfillment', xlab='Collar Color', ylab='Score', col=color)
# make Collar Color plots for gender 2
print('Female')
## [1] "Female"
color = 'orange'
boxplot(ValuesDomain~CollarColor, data=df2[[2]], main='Leadership Values', xlab='Collar Color', ylab='Score', col=color)
boxplot(MediatorDomain~CollarColor, data=df2[[2]], main='Mediator Skills', xlab='Collar Color', ylab='Score', col=color)
boxplot(FulfillmentDomain~CollarColor, data=df2[[2]], main='Leadership Fulfillment', xlab='Collar Color', ylab='Score', col=color)
We have performed a factorial MANOVA on multiple dependent variables and have shown that CollarColor
has a significant effect on leadership qualities and Gender also has a significant effect on leadership qualities. Additionally there is a significant interaction effect between Gender and CollarColor suggesting that the effect of CollarColor on leadership qualities is different bewteen each gender.
summary(fit)
## Df Pillai approx F num Df den Df Pr(>F)
## CollarColor 3 0.035675 4.4489 9 3327 8.113e-06 ***
## Gender 1 0.031122 11.8528 3 1107 1.209e-07 ***
## CollarColor:Gender 3 0.024277 3.0159 9 3327 0.001367 **
## Residuals 1109
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Because of the clear results we are able to reject our null hypotheses that “There is no relationship between job level and leadership qualities” and “There is no relationship between gender and leadership qualities”.
The data was cleaned and processed. We removed categorical groups with very few samples and we removed all rows that contained zero variance in their responses and rows containing missing values were removed. Additionally we checked for duplicated ResponseIDs and removed data outliers using 4/N threshold of Cook’s distance.
summary(df$ValuesDomain)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.750 7.250 8.125 7.993 8.750 10.000
summary(df$MediatorDomain)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.917 7.417 8.167 8.099 8.833 10.000
summary(df$FulfillmentDomain)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 5.000 7.364 8.091 8.037 8.818 10.000
summary(df$CollarColor)
## 1 2 3 4
## 154 171 302 490
summary(df$Gender)
## 1 2
## 509 608
# divde graphics into 1x3 panel
op <- par(mfrow = c(1, 3))
# make Collar plots
boxplot(ValuesDomain~CollarColor, data=df, main='Leadership Values', xlab='Collar Color', ylab='Score')
boxplot(MediatorDomain~CollarColor, data=df, main='Mediator Skills', xlab='Collar Color', ylab='Score')
boxplot(FulfillmentDomain~CollarColor, data=df, main='Leadership Fulfillment', xlab='Collar Color', ylab='Score')
# make Gender plots
boxplot(ValuesDomain~Gender, data=df, main='Leadership Values', xlab='Gender', ylab='Score')
boxplot(MediatorDomain~Gender, data=df, main='Mediator Skills', xlab='Gender', ylab='Score')
boxplot(FulfillmentDomain~Gender, data=df, main='Leadership Fulfillment', xlab='Gender', ylab='Score')
We used R to build diagnostic plots to test the assumptions of MANOVA:
We found no violation of these assumptions and were able to continue the analysis without transforming the data.
op <- par(mfrow = c(2,2))
plot(fitVal)
plot(fitMed)
plot(fitFul)
par(op)
MANOVA:
summary(fit)
## Df Pillai approx F num Df den Df Pr(>F)
## CollarColor 3 0.035675 4.4489 9 3327 8.113e-06 ***
## Gender 1 0.031122 11.8528 3 1107 1.209e-07 ***
## CollarColor:Gender 3 0.024277 3.0159 9 3327 0.001367 **
## Residuals 1109
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
for each DV:
summary(aov(fit))
## Response ValuesDomain :
## Df Sum Sq Mean Sq F value Pr(>F)
## CollarColor 3 24.10 8.0336 7.4820 5.837e-05 ***
## Gender 1 26.92 26.9193 25.0709 6.424e-07 ***
## CollarColor:Gender 3 15.49 5.1639 4.8093 0.002475 **
## Residuals 1109 1190.76 1.0737
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Response MediatorDomain :
## Df Sum Sq Mean Sq F value Pr(>F)
## CollarColor 3 12.52 4.1722 4.1085 0.0065343 **
## Gender 1 11.49 11.4857 11.3105 0.0007970 ***
## CollarColor:Gender 3 19.08 6.3617 6.2646 0.0003238 ***
## Residuals 1109 1126.19 1.0155
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Response FulfillmentDomain :
## Df Sum Sq Mean Sq F value Pr(>F)
## CollarColor 3 13.82 4.6059 4.3572 0.0046345 **
## Gender 1 25.93 25.9328 24.5322 8.444e-07 ***
## CollarColor:Gender 3 18.37 6.1242 5.7934 0.0006268 ***
## Residuals 1109 1172.32 1.0571
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
There was a clear significant relationship between CollarColor
and leadership qualities and Gender
and leadership qualities. Consequently we are able to reject noth our null hypotheses that “There is no relationship between job level and leadership qualities” and “There is no relationship between gender and leadership qualities”.
This analysis has not taken into account the possible interaction effects of other potential covariates such as country, age and employment. Further analysis could expand on this study by investigating the effect of additional covariates.