This page is a follow-on from the one about bar plots with single factors.
Take a look at the dataset below, which contains the results of a sleep experiment. It shows the number of extra hours of sleep, relative to a control, that each of 10 participants got after taking medicine “1” and after taking medicine “2”:
print(sleep)
## extra group ID
## 1 0.7 1 1
## 2 -1.6 1 2
## 3 -0.2 1 3
## 4 -1.2 1 4
## 5 -0.1 1 5
## 6 3.4 1 6
## 7 3.7 1 7
## 8 0.8 1 8
## 9 0.0 1 9
## 10 2.0 1 10
## 11 1.9 2 1
## 12 0.8 2 2
## 13 1.1 2 3
## 14 0.1 2 4
## 15 -0.1 2 5
## 16 4.4 2 6
## 17 5.5 2 7
## 18 1.6 2 8
## 19 4.6 2 9
## 20 3.4 2 10
The ‘results’ of the experiment are in the “extra” column, namely the number of extra hours of sleep for each participant for each medicine. The ‘factors’ are in the other two columns, “group” (i.e. which medicine was taken) and “ID” (i.e. the ID of the participant). If we try to plot this as a bar plot using the barplot(height) method, we do not get the full picture of the experiment:
barplot(sleep$extra)
This is unhelpful because the two results from each participant are not next to each other! What we need to do is use the barplot(y ~ x, data) format, but something like barplot(extra ~ group, data = sleep) would cause R to throw an error because “group” does not contain unique values. Instead, we need to use “group” AND “ID” as the factors, as together these columns uniquely define each row (there is only one row where group = 1 and ID = 1, and so on). This can be achieved simply by using the ‘+’ sign to indicate that these are being combined to create the x-data in our formula:
barplot(extra ~ group + ID, data = sleep)
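By default, this stacks the two bars for each participant on top of one another, which is hard to read. A better approach is to show the bars side-by-side: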
barplot(extra ~ group + ID, data = sleep, beside = TRUE)
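To see why the two-factor formula works, it can help to check the uniqueness condition directly and to build the table of bar heights by hand. The sketch below uses base R's anyDuplicated() and xtabs(); the name `heights` is our own, and this is only one way to arrive at an equivalent matrix, not necessarily what barplot() does internally.

```r
# 'group' on its own repeats, so it cannot identify a single bar
print(anyDuplicated(sleep$group) > 0)                  # TRUE

# 'group' and 'ID' together identify each row exactly once
print(anyDuplicated(sleep[c("group", "ID")]) > 0)      # FALSE

# Cross-tabulate: one row per medicine, one column per participant
heights <- xtabs(extra ~ group + ID, data = sleep)
print(dim(heights))                                    # 2 10

# Passing the matrix directly should give the same grouped bar plot
barplot(heights, beside = TRUE)
```

Each column of `heights` becomes one cluster of bars (a participant) and each row becomes one bar within the cluster (a medicine).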
Let’s improve how the plot looks:
- The keyword arguments of barplot() can be used to:
    - set the axis labels with xlab and ylab
    - set the graph title with main
    - set the colours of the bars with col
- legend() will add a legend with the contents thereof being set by its keyword arguments:
    - title sets the label for the legend
    - legend sets the text that appears for each series being labelled
    - pch controls what symbols are displayed
    - col gives the colours of the symbols
- box() gives the plot an outline
- abline(h=0) draws a horizontal line at y = 0, i.e. along the x-axis
barplot(
# Plot data
extra ~ group + ID, data = sleep, beside = TRUE,
# Axis labels
ylab = "Additional Sleep Time [hr]", xlab = "Participant ID",
# Graph title
main = "Student's Sleep Experiment",
# Add colour
col = c("lightsalmon", "lightskyblue2")
)
# Add a legend
legend(
"topleft", title = "Medicine", legend = c("1", "2"), pch = 15,
col = c("lightsalmon", "lightskyblue2")
)
# Draw a box outline
box()
# Include the x-axis
abline(h=0)
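In the code above the colours and legend labels are typed out twice, which makes it easy for the legend to drift out of step with the bars. As a small sketch (the variable names `med_cols` and `med_labels` are our own), they can be defined once and reused. This version also uses the `fill` argument of legend(), which draws a filled box per entry, as an alternative to `pch` and `col`.

```r
# Define the colours and labels once (one per medicine)
med_cols <- c("lightsalmon", "lightskyblue2")
med_labels <- levels(sleep$group)  # the factor levels of 'group'

barplot(
    extra ~ group + ID, data = sleep, beside = TRUE,
    ylab = "Additional Sleep Time [hr]", xlab = "Participant ID",
    main = "Student's Sleep Experiment",
    col = med_cols
)
# 'fill' draws a filled box in the matching colour for each legend entry
legend("topleft", title = "Medicine", legend = med_labels, fill = med_cols)
box()
abline(h = 0)
```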
To add error bars to a bar plot, two things are needed. Firstly, each bar needs to represent multiple data points that have been grouped together. Secondly, we need to know what the ‘error’ actually is. In this example, we will use the ‘standard error’ of the mean (the standard deviation of the sampling distribution of the mean), which can be calculated using the summarySE() function from the Rmisc library:
library(Rmisc)
sleep_summ <- summarySE(sleep, measurevar = "extra", groupvars = "group")
print(sleep_summ)
## group N extra sd se ci
## 1 1 10 0.75 1.789010 0.5657345 1.279780
## 2 2 10 2.33 2.002249 0.6331666 1.432322
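For reference, the standard error of a mean is the sample standard deviation divided by the square root of the sample size, so the numbers in the se column can be reproduced in base R without Rmisc. A minimal sketch (the names `n`, `means`, `sds`, `se_manual` and `ci_manual` are our own, and the confidence interval assumes the usual 95% level based on the t-distribution):

```r
# Group-wise sample size, mean and standard deviation of 'extra'
n <- tapply(sleep$extra, sleep$group, length)
means <- tapply(sleep$extra, sleep$group, mean)
sds <- tapply(sleep$extra, sleep$group, sd)

# Standard error of the mean: sd / sqrt(n)
se_manual <- sds / sqrt(n)
print(round(se_manual, 4))

# Half-width of a 95% confidence interval for each mean
ci_manual <- qt(0.975, df = n - 1) * se_manual
print(round(ci_manual, 4))
```

These values should agree with the se and ci columns in the summarySE() output shown above.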
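Creating a bar plot using the extra column of sleep_summ will result in two bars, each representing the mean of 10 individual data points. The standard error of each of these means (the se column of the sleep_summ data frame shown above) can be plotted as line segments using segments() where:
- the x-coordinates of the segments are the bar midpoints returned by barplot(), which we have called bp
- the y-coordinates are the means minus and plus the standard errors, both taken from the output of the summarySE() function, which we have called sleep_summ
- lwd sets the line width
bp <- barplot(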
# Plot data
sleep_summ$extra,
# Bar labels
names.arg = c("1", "2"),
# Axis labels
ylab = "Additional Sleep Time [hr]", xlab = "Medicine",
# Graph title
main = "Student's Sleep Experiment",
# Add colour
col = c("lightsalmon", "lightskyblue2"),
# y-Axis limits
ylim = c(0, max(sleep_summ$extra + sleep_summ$se) + 0.5)
)
# Draw a box outline
box()
# Add error bars
segments(bp, sleep_summ$extra - sleep_summ$se, bp, sleep_summ$extra + sleep_summ$se, lwd = 1.5)
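segments() draws plain vertical lines. If caps on the ends of the error bars are wanted, one common base-R idiom is arrows() with flat, 90-degree arrowheads at both ends. The sketch below is self-contained: it recomputes the means and standard errors with tapply() rather than relying on Rmisc, and the variable names `means` and `se` are our own.

```r
# Mean and standard error of 'extra' for each medicine
means <- tapply(sleep$extra, sleep$group, mean)
se <- tapply(sleep$extra, sleep$group, sd) /
    sqrt(tapply(sleep$extra, sleep$group, length))

# barplot() returns the x-coordinates of the bar midpoints
bp <- barplot(
    means,
    ylab = "Additional Sleep Time [hr]", xlab = "Medicine",
    main = "Student's Sleep Experiment",
    col = c("lightsalmon", "lightskyblue2"),
    ylim = c(0, max(means + se) + 0.5)
)
box()

# angle = 90 makes the arrowheads flat; code = 3 draws one at each end
arrows(bp, means - se, bp, means + se, angle = 90, code = 3, length = 0.05)
```

The `length` argument sets the size of each cap in inches, so it may need adjusting for the size of the plotting device.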
The Titanic dataset details the number of passengers that were on board the famous passenger ship that sank in 1912. It contains one ‘result’ (“Freq” - the number of each type of passenger) and four ‘factors’ (“Class”, “Sex”, “Age” and “Survived”). The first 15 rows are as follows:
# Convert to data frame
titanic <- as.data.frame(Titanic)
print(head(titanic, 15))
## Class Sex Age Survived Freq
## 1 1st Male Child No 0
## 2 2nd Male Child No 0
## 3 3rd Male Child No 35
## 4 Crew Male Child No 0
## 5 1st Female Child No 0
## 6 2nd Female Child No 0
## 7 3rd Female Child No 17
## 8 Crew Female Child No 0
## 9 1st Male Adult No 118
## 10 2nd Male Adult No 154
## 11 3rd Male Adult No 387
## 12 Crew Male Adult No 670
## 13 1st Female Adult No 4
## 14 2nd Female Adult No 13
## 15 3rd Female Adult No 89
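There are too many factors to plot all at once; on a 2D bar plot only two can be shown at a time. That’s no problem though, because we can use aggregate() to sum the counts over the other factors and make four graphs, each showing a different pair of factors: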
data <- aggregate(
Freq ~ Class + Age, data = titanic, sum
)
print(data)
## Class Age Freq
## 1 1st Child 6
## 2 2nd Child 24
## 3 3rd Child 79
## 4 Crew Child 0
## 5 1st Adult 319
## 6 2nd Adult 261
## 7 3rd Adult 627
## 8 Crew Adult 885
barplot(
# Plot data
Freq ~ Class + Age, data = data, beside = TRUE,
# Axis labels
ylab = "Count", xlab = "Age",
# Graph title
main = "Age of passengers on the Titanic",
# Add colour
col = factor(data$Class)
)
# Add a legend
legend(
"topleft", title = "Class", legend = c("1st", "2nd", "3rd", "Crew"),
pch = 15, col = factor(data$Class)
)
# Draw a box outline
box()
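The `col = factor(data$Class)` argument above works because a factor is stored as integer codes, and base graphics look integer colours up in the current palette(). A short sketch of what that amounts to is given below. The names `agg`, `codes` and `class_cols` are our own, and the actual colours depend on the palette in use, which differs between R versions.

```r
titanic <- as.data.frame(Titanic)
agg <- aggregate(Freq ~ Class + Age, data = titanic, sum)

# A factor is stored as integer codes, one code per level
codes <- as.integer(agg$Class)
print(codes)

# One palette colour per level of Class, in level order
class_cols <- palette()[seq_len(nlevels(agg$Class))]
print(class_cols)

barplot(
    Freq ~ Class + Age, data = agg, beside = TRUE,
    ylab = "Count", xlab = "Age",
    main = "Age of passengers on the Titanic",
    col = class_cols
)
# Taking the labels from the factor levels keeps them in step with the colours
legend("topleft", title = "Class", legend = levels(agg$Class), fill = class_cols)
box()
```

This should give the same colours as the plot above. Spelling the colours out as a vector also makes it straightforward to swap in named colours instead of the palette defaults.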
data <- aggregate(
Freq ~ Survived + Sex, data = titanic, sum
)
print(data)
## Survived Sex Freq
## 1 No Male 1364
## 2 Yes Male 367
## 3 No Female 126
## 4 Yes Female 344
barplot(
# Plot data
Freq ~ Survived + Sex, data = data, beside = TRUE,
# Axis labels
ylab = "Count", xlab = "Gender",
# Graph title
main = "Gender of passengers on the Titanic",
# Add colour
col = factor(data$Survived)
)
# Add a legend
legend(
"topright", title = "Survived", legend = c("No", "Yes"),
pch = 15, col = factor(data$Survived)
)
# Draw a box outline
box()
data <- aggregate(
Freq ~ Sex + Class, data = titanic, sum
)
print(data)
## Sex Class Freq
## 1 Male 1st 180
## 2 Female 1st 145
## 3 Male 2nd 179
## 4 Female 2nd 106
## 5 Male 3rd 510
## 6 Female 3rd 196
## 7 Male Crew 862
## 8 Female Crew 23
barplot(
# Plot data
Freq ~ Sex + Class, data = data, beside = TRUE,
# Axis labels
ylab = "Count", xlab = "Class",
# Graph title
main = "Class of passengers on the Titanic",
# Add colour
col = factor(data$Sex)
)
# Add a legend
legend(
"topleft", title = "Gender", legend = c("Male", "Female"),
pch = 15, col = factor(data$Sex)
)
# Draw a box outline
box()
data <- aggregate(
Freq ~ Age + Survived, data = titanic, sum
)
print(data)
## Age Survived Freq
## 1 Child No 52
## 2 Adult No 1438
## 3 Child Yes 57
## 4 Adult Yes 654
barplot(
# Plot data
Freq ~ Age + Survived, data = data, beside = TRUE,
# Axis labels
ylab = "Count", xlab = "Survival",
# Graph title
main = "Survival of passengers on the Titanic",
# Add colour
col = factor(data$Age)
)
# Add a legend
legend(
"topright", title = "Age", legend = c("Child", "Adult"),
pch = 15, col = factor(data$Age)
)
# Draw a box outline
box()
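The four plots above all follow the same pattern, so the steps can be wrapped in a small function. The following is a sketch only: `plot_titanic_pair()` is a hypothetical helper of our own, it uses xtabs() (which sums Freq over the factors left out of the formula) in place of aggregate(), and it uses barplot()'s own legend.text argument in place of a separate legend() call.

```r
titanic <- as.data.frame(Titanic)

# Hypothetical helper: grouped bar plot of Freq for two chosen factors.
# 'inner' gives the bars within each cluster, 'outer' gives the clusters.
plot_titanic_pair <- function(inner, outer) {
    # Build the formula Freq ~ inner + outer from the column names
    f <- reformulate(c(inner, outer), response = "Freq")
    # Sum Freq over every factor that is not in the formula
    counts <- xtabs(f, data = titanic)
    barplot(
        counts, beside = TRUE,
        ylab = "Count", xlab = outer,
        # legend.text = TRUE labels the legend with the row names of 'counts'
        legend.text = TRUE, args.legend = list(title = inner)
    )
    box()
    # Return the table of counts invisibly so it can be inspected
    invisible(counts)
}

counts <- plot_titanic_pair("Class", "Age")
print(counts)
plot_titanic_pair("Survived", "Sex")
```

Any of the other pairs of factors can be plotted by changing the two column names passed to the function.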