⇦ Back

This page is a follow-on from the one about bar plots with single factors

1 Initial Step

If you’re going to be using ggplot2, the first thing you need to do is load the library:

library(ggplot2)

2 Data with Two Factors

Take a look at the dataset below which contains the results of a sleep experiment (it shows the number of extra hours of sleep - relative to a control group - that 10 participants experienced after taking medicine “1” compared to after taking medicine “2”):

print(sleep)
##    extra group ID
## 1    0.7     1  1
## 2   -1.6     1  2
## 3   -0.2     1  3
## 4   -1.2     1  4
## 5   -0.1     1  5
## 6    3.4     1  6
## 7    3.7     1  7
## 8    0.8     1  8
## 9    0.0     1  9
## 10   2.0     1 10
## 11   1.9     2  1
## 12   0.8     2  2
## 13   1.1     2  3
## 14   0.1     2  4
## 15  -0.1     2  5
## 16   4.4     2  6
## 17   5.5     2  7
## 18   1.6     2  8
## 19   4.6     2  9
## 20   3.4     2 10

The ‘results’ of the experiment are in column “extra”, namely the number of extra hours of sleep for each participant for each medicine. The ‘factors’ are in the other two columns, “group” (ie which medicine was taken) and “ID” (ie the ID of the participant). If we try to plot this as a bar plot as per normal we do not get the full picture of the experiment:

p <- ggplot(sleep, aes(x = ID, y = extra))
p <- p + geom_bar(stat = "identity")
print(p)

As you can see, there is only one bar for each participant! We’re expecting two bars; one for each of the two times they repeated the experiment. Using colour to differentiate the two experimental runs will help to see what’s going wrong:

p <- ggplot(sleep, aes(x = ID, y = extra, fill = factor(group)))
p <- p + geom_bar(stat = "identity")
print(p)

So, what’s going on here is that there are indeed two bars for each participant, but the first (reddish-pink) one is behind the second (blueish-green) one. What we need to do is have them be side-by-side, ie for them to ‘dodge’ each other:

p <- ggplot(sleep, aes(x = ID, y = extra, fill = factor(group)))
p <- p + geom_bar(stat = "identity", position = position_dodge())
print(p)

As you can see, this was achieved by using the position_dodge() function and the “position” keyword argument.

2.1 Format

Let’s improve how the plot looks:

p <- ggplot(sleep, aes(x = ID, y = extra, fill = factor(group)))
p <- p + geom_bar(stat = "identity", position = position_dodge())
p <- p + scale_fill_manual(values = c("#F4A582", "#92C5DE"))
p <- p + xlab("Participant ID")
p <- p + labs(
    title = "Student's Sleep Experiment", y = "Additional Sleep Time [hr]",
    fill = "Medicine"
)
print(p)

2.2 Using Errors Bars - Reducing a Factor

Notice that each bar is currently representing an exact number: the height of each corresponds to a single value, the number of extra hours the participant slept for. The concept of adding errors bars to this plot doesn’t work; you can’t calculate a standard error on one number!

If we calculate the summary statistics for each factor, however, that will reduce the number of factors by one and cause each bar to represent all of the data points corresponding to that group. This can be done with the summarySE() function from the “Rmisc” package:

library(Rmisc)

# "measurevar" is the variable being measured
# "groupvars" are the variables representing the groups
sleep_summ <- summarySE(sleep, measurevar = "extra", groupvars = "group")
print(sleep_summ)
##   group  N extra       sd        se       ci
## 1     1 10  0.75 1.789010 0.5657345 1.279780
## 2     2 10  2.33 2.002249 0.6331666 1.432322

As you can see from the above data frame, we now only have one factor (group) with 10 data points (N) in each. The mean value for each group has bee calculated (extra) along with the standard error (se). This can now all be plotted:

p <- ggplot(sleep_summ, aes(x = group, y = extra, fill = factor(group)))
p <- p + geom_bar(stat = "identity", position = position_dodge())
p <- p + geom_errorbar(aes(ymax = extra + se, ymin = extra - se), width = 0.2)
# Bar fill colours
p <- p + scale_fill_manual(values = c("#F4A582", "#92C5DE"))
# Titles and labels
p <- p + xlab("Medicine")
p <- p + labs(
    title = "Student's Sleep Experiment", y = "Additional Sleep Time [hr]"
)
# Remove legend
p <- p + theme(legend.position = "none")
print(p)

3 Data with More Than Two Factors

The Titanic dataset details the number of passengers that were on board the famous passenger ship that sunk in 1912. It contains one ‘result’ (“Freq” - the number of each type of passenger) and four ‘factors’ (“Class”, “Sex”, “Age” and “Survived”). The first 15 rows are as follows:

# Convert to data frame
titanic <- as.data.frame(Titanic)
print(head(titanic, 15))
##    Class    Sex   Age Survived Freq
## 1    1st   Male Child       No    0
## 2    2nd   Male Child       No    0
## 3    3rd   Male Child       No   35
## 4   Crew   Male Child       No    0
## 5    1st Female Child       No    0
## 6    2nd Female Child       No    0
## 7    3rd Female Child       No   17
## 8   Crew Female Child       No    0
## 9    1st   Male Adult       No  118
## 10   2nd   Male Adult       No  154
## 11   3rd   Male Adult       No  387
## 12  Crew   Male Adult       No  670
## 13   1st Female Adult       No    4
## 14   2nd Female Adult       No   13
## 15   3rd Female Adult       No   89

There are too many factors to plot all at once; on a 2D graph only two can be shown at once. That’s no problem though because we can just make four graphs:

p <- ggplot(
    titanic, aes(x = Age, y = Freq, fill = factor(Class))
)
p <- p + geom_bar(stat = "identity", position = position_dodge())
p <- p + scale_fill_manual(
    values = c("#D1E5F0", "#92C5DE", "#4393C3", "#2166AC")
)
p <- p + xlab("Age")
p <- p + labs(
    title = "Age of passengers on the Titanic",
    y = "Count", fill = "Class"
)
print(p)

p <- ggplot(
    titanic, aes(x = Class, y = Freq, fill = factor(Sex))
)
p <- p + geom_bar(stat = "identity", position = position_dodge())
p <- p + scale_fill_manual(
    values = c("#92C5DE", "#2166AC")
)
p <- p + xlab("Class")
p <- p + labs(
    title = "Class of passengers on the Titanic",
    y = "Count", fill = "Gender"
)
print(p)

p <- ggplot(
    titanic, aes(x = Sex, y=Freq, fill=factor(Survived))
)
p <- p + geom_bar(stat = "identity", position = position_dodge())
p <- p + scale_fill_manual(
    values = c("#92C5DE", "#2166AC")
)
p <- p + xlab("Gender")
p <- p + labs(
    title = "Gender of passengers on the Titanic",
    y = "Count",
    fill = "Survived"
)
print(p)

p <- ggplot(
    titanic, aes(x = Survived, y=Freq, fill=factor(Age))
)
p <- p + geom_bar(stat = "identity", position = position_dodge())
p <- p + scale_fill_manual(
    values = c("#92C5DE", "#2166AC")
)
p <- p + xlab("Survived")
p <- p + labs(
    title = "Survival of passengers on the Titanic",
    y = "Count", fill = "Age"
)
print(p)

4 Save Plot

Finally, use ggsave("File Name.png") to save the plot to your computer.

⇦ Back