
1 Initial Steps

If you’re going to be using ggplot2, the first thing you need to do is load the library:

library(ggplot2)

Next, remember that ggplot2 works best with data in long format, where each row holds a single observation. Take a look at the dataset below, which contains the results of a sleep experiment: it shows the number of extra hours of sleep, compared to a control group, that 10 participants experienced after taking medicine “1” versus after taking medicine “2”.

The data below is in wide format and needs to be reshaped before it can be used with ggplot2:

print(wide)
##    ID    1    2
## 1   1  0.7  1.9
## 2   2 -1.6  0.8
## 3   3 -0.2  1.1
## 4   4 -1.2  0.1
## 5   5 -0.1 -0.1
## 6   6  3.4  4.4
## 7   7  3.7  5.5
## 8   8  0.8  1.6
## 9   9  0.0  4.6
## 10 10  2.0  3.4
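
The text does not show how `wide` was created. Its values match R's built-in `sleep` dataset, so one way to reproduce it is the base-R sketch below. This is an assumption about where the data came from, not necessarily how it was originally built:

```r
# Assumption: `wide` was derived from R's built-in `sleep` dataset,
# whose values match the table printed above.
wide <- data.frame(
    ID = sleep$ID[sleep$group == "1"],
    `1` = sleep$extra[sleep$group == "1"],
    `2` = sleep$extra[sleep$group == "2"],
    check.names = FALSE  # keep "1" and "2" as the column names
)
print(wide)
```

Setting check.names = FALSE matters here: without it, data.frame() would rename the columns to syntactic names such as X1 and X2.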

Wide-format data can be converted to long format with the code below:

# Use gather() from tidyr to convert from wide to long format
# (gather() still works, but tidyr recommends pivot_longer() for new code)
library(tidyr)
long <- gather(wide, group, extra, c("1", "2"))


The data below is in long format and is suitable for plotting with ggplot2:

print(long)
##    ID group extra
## 1   1     1   0.7
## 2   2     1  -1.6
## 3   3     1  -0.2
## 4   4     1  -1.2
## 5   5     1  -0.1
## 6   6     1   3.4
## 7   7     1   3.7
## 8   8     1   0.8
## 9   9     1   0.0
## 10 10     1   2.0
## 11  1     2   1.9
## 12  2     2   0.8
## 13  3     2   1.1
## 14  4     2   0.1
## 15  5     2  -0.1
## 16  6     2   4.4
## 17  7     2   5.5
## 18  8     2   1.6
## 19  9     2   4.6
## 20 10     2   3.4

2 Plotting

2.1 Plotting the Frequencies of Values in a Data Frame

The following dataset contains the diameter, height and volume of 31 black cherry trees (only the first 6 of which are shown here):

print(head(trees))
##   Girth Height Volume
## 1   8.3     70   10.3
## 2   8.6     65   10.3
## 3   8.8     63   10.2
## 4  10.5     72   16.4
## 5  10.7     81   18.8
## 6  10.8     83   19.7

When a bar plot is created with ggplot2 using its default settings, it counts how many times each distinct value occurs in a column and uses those counts as the heights of the bars:

p <- ggplot(trees, aes(Height))
p <- p + geom_bar()
print(p)

Notice that there was no argument passed to the geom_bar() function; there was nothing between its brackets and so the default settings were used. Therefore, it counted the number of trees of each height (because “Height” was the column that was passed to the aes() function) and plotted those as the bars. We can see that there were 5 trees of height 80 ft.
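
The counts that geom_bar() draws by default can be checked without plotting. A minimal sketch using base R's table(), which is not what ggplot2 calls internally but should produce the same tallies:

```r
# Tally how many trees share each height (in feet)
height_counts <- table(trees$Height)
print(height_counts)

# Count the trees with one particular height
n_80ft <- sum(trees$Height == 80)
print(n_80ft)
```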

2.2 Plotting the Values in a Data Frame

The dataset below shows death rates (per 1,000 population per year) in Virginia in 1940:

print(VADeaths)
##       Rural Male Rural Female Urban Male Urban Female
## 50-54       11.7          8.7       15.4          8.4
## 55-59       18.1         11.7       24.3         13.6
## 60-64       26.9         20.3       37.0         19.3
## 65-69       41.0         30.9       54.6         35.1
## 70-74       66.0         54.3       71.1         50.0

Let’s only plot one of the columns - Urban Female - and have one bar for each age group:

  • The aes() function takes the x-variable and the y-variable of the plot in that order, so we pass it rownames(vadeaths) in the x-position: the row names of the data frame (ie the age groups) will form the x-labels of our bars
  • The column name that goes in the y-position of the aes() function is “Urban Female”, as that will give us the height of each bar
  • This time, in contrast to the first example, we want the height of each bar to be the actual value in the data frame (ie the actual death rate). This is done by passing the ‘identity’ option to the stat keyword argument of the geom_bar() function:

# Convert the matrix to a data frame
vadeaths <- as.data.frame(VADeaths)

p <- ggplot(vadeaths, aes(rownames(vadeaths), `Urban Female`))
p <- p + geom_bar(stat = "identity")
print(p)

Notice that the column we wanted to plot (“Urban Female”) has a space in its name, which makes it a non-syntactic name in R. We therefore had to wrap it in backticks (grave accents) when specifying it in the aes() function.
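
Two optional variations on the plot above, offered as a sketch rather than as this tutorial's own approach. Storing the age groups in an ordinary column (the name age_group is made up for this example) avoids calling rownames() inside aes(), and geom_col() is ggplot2's shorthand for geom_bar(stat = "identity"):

```r
library(ggplot2)

# Convert the matrix to a data frame and keep the row names as a column
vadeaths <- as.data.frame(VADeaths)
vadeaths$age_group <- rownames(vadeaths)  # hypothetical column name

# Backticks are still needed because `Urban Female` contains a space
p <- ggplot(vadeaths, aes(age_group, `Urban Female`))
p <- p + geom_col()  # same bars as geom_bar(stat = "identity")
print(p)
```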

2.3 Plotting the Mean of Values in a Data Frame

The next dataset contains the weight of 71 chicks, measured six weeks after hatching, that were each fed on one of six different diets (first 15 data points shown):

print(head(chickwts, 15))
##    weight      feed
## 1     179 horsebean
## 2     160 horsebean
## 3     136 horsebean
## 4     227 horsebean
## 5     217 horsebean
## 6     168 horsebean
## 7     108 horsebean
## 8     124 horsebean
## 9     143 horsebean
## 10    140 horsebean
## 11    309   linseed
## 12    229   linseed
## 13    181   linseed
## 14    141   linseed
## 15    260   linseed

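To plot the mean of each group, pass the “summary” option to the stat keyword argument. When no summary function is specified, ggplot2 falls back to mean_se() (and prints a message saying so), so the bar heights are the group means: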

p <- ggplot(chickwts, aes(feed, weight))
p <- p + geom_bar(stat = "summary")
print(p)
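
The summary function can also be named explicitly instead of relying on the default. The sketch below assumes ggplot2 3.3.0 or later, where the argument is called fun (older versions used fun.y), and cross-checks the bar heights with base R:

```r
library(ggplot2)

# State the summary function explicitly instead of relying on the default
p <- ggplot(chickwts, aes(feed, weight))
p <- p + geom_bar(stat = "summary", fun = mean)
print(p)

# Cross-check: the mean weight per feed type, computed with base R
feed_means <- tapply(chickwts$weight, chickwts$feed, mean)
print(round(feed_means, 1))
```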

3 Formatting

Let’s make the plot look a little better:

3.1 Titles and Labels

# Rename the factor levels to change the labels under the bars
levels(chickwts$feed)[levels(chickwts$feed) == "casein"] <- "Casein"
levels(chickwts$feed)[levels(chickwts$feed) == "horsebean"] <- "Horsebean"
levels(chickwts$feed)[levels(chickwts$feed) == "linseed"] <- "Linseed"
levels(chickwts$feed)[levels(chickwts$feed) == "meatmeal"] <- "Meatmeal"
levels(chickwts$feed)[levels(chickwts$feed) == "soybean"] <- "Soybean"
levels(chickwts$feed)[levels(chickwts$feed) == "sunflower"] <- "Sunflower"

p <- ggplot(chickwts, aes(feed, weight))
p <- p + geom_bar(stat = "summary")
# Remove the main x-axis label
p <- p + xlab("")
# Add main title and y-axis label
p <- p + labs(title = "Chicken Weights by Feed Type", y = "Weight [g]")
print(p)
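
The six assignments above all capitalise the first letter of a level, so they can be collapsed into one vectorised step. A base-R sketch, working on a copy (chicks is a name made up here) so the built-in dataset is left untouched:

```r
# Work on a copy so the original chickwts is not modified
chicks <- chickwts

# Upper-case the first letter of every level and keep the rest as-is
old_levels <- levels(chicks$feed)
levels(chicks$feed) <- paste0(
    toupper(substring(old_levels, 1, 1)), substring(old_levels, 2)
)
print(levels(chicks$feed))
```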

Remember that you can include Unicode in your axis titles using the \U Unicode indicator (eg "Pi: \U03C0" renders as “Pi: π”).
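
For instance, a label containing a Greek letter can be built as an ordinary string and then handed to labs(). A small sketch (the label text is made up for illustration):

```r
# \U03C0 is the Unicode code point for the Greek letter pi
axis_label <- "Pi: \U03C0"
print(axis_label)

# The escape becomes a single character, so the string has 5 characters
print(nchar(axis_label))

# It can then be used like any other label text, eg labs(y = axis_label)
```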

3.2 Distinguish Groups (Colour and Legend)

p <- ggplot(chickwts, aes(feed, weight, fill = as.factor(feed)))
p <- p + geom_bar(stat = "summary")
p <- p + xlab("")
p <- p + labs(title = "Chicken Weights by Feed Type", y = "Weight [g]")
print(p)

3.2.1 Edit the Legend

Change the legend’s title using the “fill” keyword argument in the labs() function:

p <- ggplot(chickwts, aes(feed, weight, fill = as.factor(feed)))
p <- p + geom_bar(stat = "summary")
p <- p + xlab("")
p <- p + labs(
    title = "Chicken Weights by Feed Type", y = "Weight [g]", fill = "Feed"
)
print(p)

3.2.2 Remove the Legend

The legend is placed automatically when you use the “fill” keyword argument to colour the bars. Remove it by changing its position to “none”:

p <- ggplot(chickwts, aes(feed, weight, fill = as.factor(feed)))
p <- p + geom_bar(stat = "summary")
p <- p + xlab("")
p <- p + labs(
    title = "Chicken Weights by Feed Type", y = "Weight [g]", fill = "Feed"
)
p <- p + theme(legend.position = "none")
print(p)

3.2.3 Edit the Colours

blue <- "#00B7EB"
pink <- "#EE2A7B"
yellow <- "#FFD100"
green <- "#5BBF21"

p <- ggplot(chickwts, aes(feed, weight, fill = as.factor(feed)))
p <- p + geom_bar(stat = "summary")
p <- p + scale_fill_manual(
    values = c(blue, yellow, pink, green, "red", "orange")
)
p <- p + xlab("")
p <- p + labs(
    title = "Chicken Weights by Feed Type", y = "Weight [g]", fill = "Feed"
)
print(p)
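
scale_fill_manual() matches an unnamed vector of colours to the factor levels in order. Naming the vector ties each colour to a specific feed, so the mapping survives if the levels are later reordered. A sketch of that idea (feed_colours is a name introduced here, and which colour goes with which feed is arbitrary):

```r
library(ggplot2)

# Pair each colour with a level name; setNames() picks up whatever the
# current level labels are (lower-case or capitalised)
feed_colours <- setNames(
    c("#00B7EB", "#FFD100", "#EE2A7B", "#5BBF21", "red", "orange"),
    levels(chickwts$feed)
)

p <- ggplot(chickwts, aes(feed, weight, fill = feed))
p <- p + geom_bar(stat = "summary")
p <- p + scale_fill_manual(values = feed_colours)
p <- p + labs(x = "", y = "Weight [g]", fill = "Feed")
print(p)
```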

3.2.4 Use a Colour Palette

Viridis:

# Import the library that contains the palette
library(viridis)

p <- ggplot(chickwts, aes(feed, weight, fill = as.factor(feed)))
p <- p + geom_bar(stat = "summary")
p <- p + scale_fill_viridis(discrete = TRUE)
p <- p + xlab("")
p <- p + labs(
    title = "Chicken Weights by Feed Type", y = "Weight [g]", fill = "Feed"
)
print(p)

Ggsci:

# Import the library that contains the palette
library(ggsci)

p <- ggplot(chickwts, aes(feed, weight, fill = as.factor(feed)))
p <- p + geom_bar(stat = "summary")
p <- p + scale_fill_lancet()
p <- p + xlab("")
p <- p + labs(
    title = "Chicken Weights by Feed Type", y = "Weight [g]", fill = "Feed"
)
print(p)

RColorBrewer:

# Import the library that contains the palette
library(RColorBrewer)

p <- ggplot(chickwts, aes(feed, weight, fill = as.factor(feed)))
p <- p + geom_bar(stat = "summary")
p <- p + scale_fill_brewer(palette = "Purples")
p <- p + xlab("")
p <- p + labs(
    title = "Chicken Weights by Feed Type", y = "Weight [g]", fill = "Feed"
)
print(p)

3.3 Show Individual Points

Add in the geom_point() function:

p <- ggplot(chickwts, aes(feed, weight, fill = as.factor(feed)))
p <- p + geom_bar(stat = "summary")
p <- p + geom_point()
p <- p + scale_fill_grey(start = 0.8, end = 0.2)
p <- p + xlab("")
p <- p + labs(
    title = "Chicken Weights by Feed Type", y = "Weight [g]", fill = "Feed"
)
print(p)
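
With geom_point() every chick in a group is drawn on the same vertical line, so points with similar weights overlap. One common remedy, shown here as an optional variation, is geom_jitter() with a small horizontal spread and no vertical jitter, so the plotted weights stay exact:

```r
library(ggplot2)

p <- ggplot(chickwts, aes(feed, weight, fill = feed))
p <- p + geom_bar(stat = "summary")
# Spread the points sideways only; height = 0 leaves the weights unchanged
p <- p + geom_jitter(width = 0.15, height = 0, show.legend = FALSE)
p <- p + scale_fill_grey(start = 0.8, end = 0.2)
p <- p + labs(x = "", y = "Weight [g]", fill = "Feed")
print(p)
```

The horizontal positions are random, so the plot looks slightly different on each run unless set.seed() is called first.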

3.4 Outline the Bars

Use the col keyword argument in the geom_bar() function:

p <- ggplot(chickwts, aes(feed, weight, fill = as.factor(feed)))
p <- p + geom_bar(stat = "summary", col = "black")
p <- p + scale_fill_grey(start = 0.8, end = 0.2)
p <- p + xlab("")
p <- p + labs(
    title = "Chicken Weights by Feed Type", y = "Weight [g]", fill = "Feed"
)
print(p)

3.5 Show Error Bars

Two changes need to happen here:

  • summarySE() from the “Rmisc” package can be used to generate the summary statistics for each group
  • The mean of each group is generated by the above function and so that can be plotted directly. The option being passed to stat thus needs to change from “summary” to “identity”.

library(Rmisc)
chickwts_summ <- summarySE(chickwts, measurevar = "weight", groupvars = "feed")
print(chickwts_summ)
##        feed  N   weight       sd       se       ci
## 1    Casein 12 323.5833 64.43384 18.60045 40.93931
## 2 Horsebean 10 160.2000 38.62584 12.21456 27.63126
## 3   Linseed 12 218.7500 52.23570 15.07915 33.18898
## 4  Meatmeal 11 276.9091 64.90062 19.56827 43.60083
## 5   Soybean 14 246.4286 54.12907 14.46660 31.25319
## 6 Sunflower 12 328.9167 48.83638 14.09785 31.02916

p <- ggplot(chickwts_summ, aes(feed, weight, fill = as.factor(feed)))
p <- p + geom_bar(stat = "identity")
p <- p + geom_errorbar(
    aes(ymin = weight - se, ymax = weight + se), width = 0.25
)
p <- p + scale_fill_grey(start = 0.8, end = 0.2)
p <- p + xlab("")
p <- p + labs(
    title = "Chicken Weights by Feed Type", y = "Weight [g]", fill = "Feed"
)
print(p)
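
The se column used for the error bars is the standard error of the mean, ie the standard deviation divided by the square root of the group size. If installing Rmisc is not an option, the same columns can be computed in base R. A sketch (chick_stats is a name made up for this example, and it omits the ci column that summarySE() also returns):

```r
# Per-group size, mean and standard deviation
n <- tapply(chickwts$weight, chickwts$feed, length)
m <- tapply(chickwts$weight, chickwts$feed, mean)
s <- tapply(chickwts$weight, chickwts$feed, sd)

chick_stats <- data.frame(
    feed = factor(names(m), levels = names(m)),
    N = as.vector(n),
    weight = as.vector(m),
    sd = as.vector(s),
    se = as.vector(s / sqrt(n))  # standard error of the mean
)
print(chick_stats)
```

A data frame built this way has the feed, weight and se columns that the plotting code above relies on, so it could be passed to ggplot() in the same way.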

4 Save Plot

Finally, use ggsave("File Name.png") to save the most recently displayed plot to your current working directory.
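
Passing the plot object and the dimensions explicitly, rather than relying on the last plot shown and the size of the graphics window, makes the saved file reproducible. A sketch (the file name, location and dimensions are placeholders):

```r
library(ggplot2)

p <- ggplot(chickwts, aes(feed, weight))
p <- p + geom_bar(stat = "summary")

# Placeholder file name; width and height are in inches by default
out_file <- file.path(tempdir(), "chick_weights.png")
ggsave(out_file, plot = p, width = 6, height = 4, dpi = 300)
```

ggsave() picks the output format from the file extension, so changing ".png" to ".pdf" would produce a PDF instead.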

5 Next: Multiple Factors

For bar plots that make use of multiple factors, see here.
