
This page follows on from the one about bar plots with single factors.

1 Data with Two Factors

Take a look at the dataset below, which contains the results of a sleep experiment. It shows the number of extra hours of sleep, relative to a control, that each of 10 participants experienced after taking medicine “1” and after taking medicine “2”:

print(sleep)
##    extra group ID
## 1    0.7     1  1
## 2   -1.6     1  2
## 3   -0.2     1  3
## 4   -1.2     1  4
## 5   -0.1     1  5
## 6    3.4     1  6
## 7    3.7     1  7
## 8    0.8     1  8
## 9    0.0     1  9
## 10   2.0     1 10
## 11   1.9     2  1
## 12   0.8     2  2
## 13   1.1     2  3
## 14   0.1     2  4
## 15  -0.1     2  5
## 16   4.4     2  6
## 17   5.5     2  7
## 18   1.6     2  8
## 19   4.6     2  9
## 20   3.4     2 10

The ‘results’ of the experiment are in the column “extra”, namely the number of extra hours of sleep for each participant for each medicine. The ‘factors’ are in the other two columns: “group” (i.e. which medicine was taken) and “ID” (i.e. the ID of the participant). If we try to plot this as a bar plot using the barplot(height) method, we do not get the full picture of the experiment:

barplot(sleep$extra)

This is unhelpful because the two results from each participant are not next to each other! What we need to do is use the barplot(y ~ x, data) format, but something like barplot(extra ~ group, data = sleep) would cause R to throw an error because “group” does not contain unique values. Instead, we need to use “group” AND “ID” as the factors, as together these columns uniquely define each row (there is only one row where group = 1 and ID = 1, and so on). This can be achieved simply by using the ‘+’ sign to indicate that these are being combined to create the x-data in our formula:

barplot(extra ~ group + ID, data = sleep)

By default, this stacks the two bars for each participant on top of each other. A better approach is to show the bars side-by-side:

barplot(extra ~ group + ID, data = sleep, beside = TRUE)
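As a quick check on the reasoning above, the following sketch uses base R's anyDuplicated() to confirm that “group” alone repeats while “group” and “ID” together do not. It then uses xtabs() to lay the results out as a table with one row per medicine and one column per participant, which mirrors how the grouped bars are arranged. The variable names are illustrative only.

```r
# Does "group" alone identify each row? (a non-zero value means duplicates exist)
dup_group <- anyDuplicated(sleep$group)
# Do "group" and "ID" together identify each row? (0 means no duplicates)
dup_both <- anyDuplicated(sleep[, c("group", "ID")])
print(c(group_only = dup_group, group_and_ID = dup_both))

# Cross-tabulate the results: one row per medicine, one column per participant
heights <- xtabs(extra ~ group + ID, data = sleep)
print(dim(heights))
```

Each column of this table corresponds to one participant's pair of bars in the side-by-side plot.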

1.1 Format

Let’s improve how the plot looks:

  • Keyword arguments in barplot() can be used to:
    • Set axis labels via xlab and ylab
    • Set the title with main
    • Add colour with col
  • legend() will add a legend with the contents thereof being set by its keyword arguments:
    • title sets the label for the legend
    • legend sets the text that appears for each series being labelled
    • pch controls what symbols are displayed
    • col gives the colours of the symbols
  • box() gives the plot an outline
  • abline(h=0) draws a horizontal line at y = 0, which marks the x-axis
barplot(
    # Plot data
    extra ~ group + ID, data = sleep, beside = TRUE,
    # Axis labels
    ylab = "Additional Sleep Time [hr]", xlab = "Participant ID",
    # Graph title
    main = "Student's Sleep Experiment",
    # Add colour
    col = c("lightsalmon", "lightskyblue2")
)
# Add a legend
legend(
    "topleft", title = "Medicine", legend = c("1", "2"), pch = 15,
    col = c("lightsalmon", "lightskyblue2")
)
# Draw a box outline
box()
# Include the x-axis
abline(h=0)
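The colours above are typed out twice, once for the bars and once for the legend. A possible variation, sketched here on the assumption that the legend.text and args.legend arguments of barplot() are acceptable for your purposes, stores the colours in a variable (medicine_cols is an illustrative name) and lets barplot() draw the legend itself:

```r
# Store the colours once so the bars and the legend cannot drift apart
medicine_cols <- c("lightsalmon", "lightskyblue2")

bp <- barplot(
    # Plot data
    extra ~ group + ID, data = sleep, beside = TRUE,
    # Axis labels and title
    ylab = "Additional Sleep Time [hr]", xlab = "Participant ID",
    main = "Student's Sleep Experiment",
    # Add colour
    col = medicine_cols,
    # Let barplot() build the legend from the levels of "group"
    legend.text = TRUE,
    args.legend = list(x = "topleft", title = "Medicine")
)
# Draw a box outline and the line at y = 0
box()
abline(h = 0)
```

The legend drawn this way uses filled boxes rather than the pch symbols used with legend(), so the two approaches look slightly different.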

1.2 Using Error Bars

Firstly, in order to use error bars, each bar needs to represent multiple data points that have been grouped together. Secondly, we need to know what the ‘error’ actually is. In this example, we will use the ‘standard error’ of the mean (the standard deviation of the sampling distribution of the mean, estimated as the sample standard deviation divided by the square root of the sample size), which can be calculated using the summarySE() function from the Rmisc library:

library(Rmisc)

sleep_summ <- summarySE(sleep, measurevar = "extra", groupvars = "group")
print(sleep_summ)
##   group  N extra       sd        se       ci
## 1     1 10  0.75 1.789010 0.5657345 1.279780
## 2     2 10  2.33 2.002249 0.6331666 1.432322
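If you would rather not install Rmisc, the se column can be reproduced in base R. The sketch below assumes the standard error is the sample standard deviation divided by the square root of the number of observations, which matches the values shown above; the variable names are illustrative.

```r
# Mean, standard deviation and count of "extra" for each medicine
means <- tapply(sleep$extra, sleep$group, mean)
sds <- tapply(sleep$extra, sleep$group, sd)
ns <- tapply(sleep$extra, sleep$group, length)

# Standard error of the mean for each medicine
ses <- sds / sqrt(ns)
print(ses)
```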

Creating a bar plot using the extra column from above will result in two bars, each representing the mean of 10 individual data points. The standard error of each mean (the se column of the sleep_summ data frame shown above) can be plotted as line segments using segments() where:

  • The x-values of the line segments are the same as those of the bars. We can get these from the bar plot itself by assigning the value returned by barplot() to a variable, bp
  • The y-values of the line segments are the heights of the two bars plus and minus the values of the standard error. Both of these can be retrieved from the result of the summarySE() function, which we have called sleep_summ.
  • lwd sets the line width
bp <- barplot(
    # Plot data
    sleep_summ$extra,
    # Bar labels
    names.arg = c("1", "2"),
    # Axis labels
    ylab = "Additional Sleep Time [hr]", xlab = "Medicine",
    # Graph title
    main = "Student's Sleep Experiment",
    # Add colour
    col = c("lightsalmon", "lightskyblue2"),
    # y-Axis limits
    ylim = c(0, max(sleep_summ$extra + sleep_summ$se) + 0.5)
)
# Draw a box outline
box()
# Add error bars
segments(bp, sleep_summ$extra - sleep_summ$se, bp, sleep_summ$extra + sleep_summ$se, lwd = 1.5)
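Error bars are often drawn with small horizontal caps at each end. One way to get these, shown here as a sketch rather than as a replacement for the approach above, is to use arrows() with flat arrowheads (angle = 90) at both ends (code = 3). To keep the example self-contained, the means and standard errors are computed in base R instead of with summarySE().

```r
# Means and standard errors of "extra" for each medicine
means <- tapply(sleep$extra, sleep$group, mean)
ses <- tapply(sleep$extra, sleep$group, function(x) sd(x) / sqrt(length(x)))

bp <- barplot(
    means,
    ylab = "Additional Sleep Time [hr]", xlab = "Medicine",
    main = "Student's Sleep Experiment",
    col = c("lightsalmon", "lightskyblue2"),
    # Leave room above the tallest error bar
    ylim = c(0, max(means + ses) + 0.5)
)
box()
# Capped error bars: flat "arrowheads" at both ends of each line
arrows(bp, means - ses, bp, means + ses, angle = 90, code = 3, length = 0.05, lwd = 1.5)
```

The length argument sets the width of the caps in inches.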

2 Data with More Than Two Factors

The Titanic dataset details the number of people (passengers and crew) that were on board the famous passenger ship that sank in 1912. It contains one ‘result’ (“Freq” - the number of each type of person) and four ‘factors’ (“Class”, “Sex”, “Age” and “Survived”). The first 15 rows are as follows:

# Convert to data frame
titanic <- as.data.frame(Titanic)
print(head(titanic, 15))
##    Class    Sex   Age Survived Freq
## 1    1st   Male Child       No    0
## 2    2nd   Male Child       No    0
## 3    3rd   Male Child       No   35
## 4   Crew   Male Child       No    0
## 5    1st Female Child       No    0
## 6    2nd Female Child       No    0
## 7    3rd Female Child       No   17
## 8   Crew Female Child       No    0
## 9    1st   Male Adult       No  118
## 10   2nd   Male Adult       No  154
## 11   3rd   Male Adult       No  387
## 12  Crew   Male Adult       No  670
## 13   1st Female Adult       No    4
## 14   2nd Female Adult       No   13
## 15   3rd Female Adult       No   89

There are too many factors to plot all at once; a 2D bar plot can only show two at a time. That’s no problem though, because we can just make four graphs, each time using aggregate() to sum “Freq” over the factors that are not being plotted:

data <- aggregate(
    Freq ~ Class + Age, data = titanic, sum
)
print(data)
##   Class   Age Freq
## 1   1st Child    6
## 2   2nd Child   24
## 3   3rd Child   79
## 4  Crew Child    0
## 5   1st Adult  319
## 6   2nd Adult  261
## 7   3rd Adult  627
## 8  Crew Adult  885
barplot(
    # Plot data
    Freq ~ Class + Age, data = data, beside = TRUE,
    # Axis labels
    ylab = "Count", xlab = "Age",
    # Graph title
    main = "Age of passengers on the Titanic",
    # Add colour
    col = factor(data$Class)
)
# Add a legend
legend(
    "topleft", title = "Class", legend = c("1st", "2nd", "3rd", "Crew"),
    pch = 15, col = factor(data$Class)
)
# Draw a box outline
box()
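Because aggregate() sums “Freq” over the factors left out of the formula, no one is dropped or counted twice. The following sketch checks this by comparing the aggregated total against the total of the full data frame (by_class_age is an illustrative name):

```r
titanic <- as.data.frame(Titanic)

# Sum Freq over Sex and Survived, keeping Class and Age
by_class_age <- aggregate(Freq ~ Class + Age, data = titanic, FUN = sum)

# The aggregated counts should add up to the same total as the raw data
total_raw <- sum(titanic$Freq)
total_agg <- sum(by_class_age$Freq)
print(c(raw = total_raw, aggregated = total_agg))
```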

data <- aggregate(
    Freq ~ Survived + Sex, data = titanic, sum
)
print(data)
##   Survived    Sex Freq
## 1       No   Male 1364
## 2      Yes   Male  367
## 3       No Female  126
## 4      Yes Female  344
barplot(
    # Plot data
    Freq ~ Survived + Sex, data = data, beside = TRUE,
    # Axis labels
    ylab = "Count", xlab = "Gender",
    # Graph title
    main = "Gender of passengers on the Titanic",
    # Add colour
    col = factor(data$Survived)
)
# Add a legend
legend(
    "topright", title = "Survived", legend = c("No", "Yes"),
    pch = 15, col = factor(data$Survived)
)
# Draw a box outline
box()

data <- aggregate(
    Freq ~ Sex + Class, data = titanic, sum
)
print(data)
##      Sex Class Freq
## 1   Male   1st  180
## 2 Female   1st  145
## 3   Male   2nd  179
## 4 Female   2nd  106
## 5   Male   3rd  510
## 6 Female   3rd  196
## 7   Male  Crew  862
## 8 Female  Crew   23
barplot(
    # Plot data
    Freq ~ Sex + Class, data = data, beside = TRUE,
    # Axis labels
    ylab = "Count", xlab = "Class",
    # Graph title
    main = "Class of passengers on the Titanic",
    # Add colour
    col = factor(data$Sex)
)
# Add a legend
legend(
    "topleft", title = "Gender", legend = c("Male", "Female"),
    pch = 15, col = factor(data$Sex)
)
# Draw a box outline
box()

data <- aggregate(
    Freq ~ Age + Survived, data = titanic, sum
)
print(data)
##     Age Survived Freq
## 1 Child       No   52
## 2 Adult       No 1438
## 3 Child      Yes   57
## 4 Adult      Yes  654
barplot(
    # Plot data
    Freq ~ Age + Survived, data = data, beside = TRUE,
    # Axis labels
    ylab = "Count", xlab = "Survival",
    # Graph title
    main = "Survival of passengers on the Titanic",
    # Add colour
    col = factor(data$Age)
)
# Add a legend
legend(
    "topright", title = "Age", legend = c("Child", "Adult"),
    pch = 15, col = factor(data$Age)
)
# Draw a box outline
box()
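The four blocks above differ only in which two factors are used, so they could be wrapped in a small function. The sketch below is one possible way to do that: plot_two_factors() is a hypothetical helper, not part of any package, and it uses xtabs() (which also sums “Freq” over the omitted factors) in place of aggregate(), and legend.text in place of a separate legend() call.

```r
titanic <- as.data.frame(Titanic)

# Hypothetical helper: bar plot of Freq for two chosen factors, summed over the rest
plot_two_factors <- function(inner, outer, data = titanic, legend_pos = "topleft") {
    # Build a formula such as Freq ~ Class + Age from the factor names
    f <- reformulate(c(inner, outer), response = "Freq")
    # Rows are levels of `inner`, columns are levels of `outer`
    counts <- xtabs(f, data = data)
    barplot(
        counts, beside = TRUE,
        ylab = "Count", xlab = outer,
        # One colour per level of `inner`, taken from the current palette
        col = seq_len(nrow(counts)),
        legend.text = TRUE,
        args.legend = list(x = legend_pos, title = inner)
    )
    box()
    invisible(counts)
}

counts <- plot_two_factors("Class", "Age")
plot_two_factors("Survived", "Sex", legend_pos = "topright")
```

The function returns the table of counts invisibly, which makes it easy to check the numbers behind each plot.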

3 Save Plot

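Finally, to save the plot to your computer, open a graphics device with png("File Name.png") before calling barplot() and close it with dev.off() once the plot is complete. Note that ggsave() only works with plots made using ggplot2, not with base graphics plots such as those made with barplot().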
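Plots made with barplot() are base graphics plots, so one way to write them to a file is to open a png() device, draw the plot, and then close the device with dev.off(). The sketch below saves into a temporary folder; the file name and the image size are arbitrary choices for illustration.

```r
# Illustrative output path; replace with wherever you want the image to go
out_file <- file.path(tempdir(), "sleep_barplot.png")

# Open a PNG graphics device (size in pixels)
png(out_file, width = 800, height = 600)
# Everything drawn from here on goes into the file
barplot(extra ~ group + ID, data = sleep, beside = TRUE)
box()
# Close the device to finish writing the file
dev.off()

print(file.exists(out_file))
```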
