If you’re going to be using ggplot2, the first thing you need to do is load the library:
library(ggplot2)
Next, remember that ggplot2 works best with data in long format, where each row holds a single observation. Take a look at the dataset below, which contains the results of a sleep experiment: it shows the number of extra hours of sleep (compared to a control group) that 10 participants experienced after taking medicine “1” vs after taking medicine “2”. The data is in wide format and needs to be reshaped before it can be used with ggplot2:
print(wide)
## ID 1 2
## 1 1 0.7 1.9
## 2 2 -1.6 0.8
## 3 3 -0.2 1.1
## 4 4 -1.2 0.1
## 5 5 -0.1 -0.1
## 6 6 3.4 4.4
## 7 7 3.7 5.5
## 8 8 0.8 1.6
## 9 9 0.0 4.6
## 10 10 2.0 3.4
Wide-format data can be converted to long format with the code below:
# Use gather() from tidyr to convert
# from wide to long format
library(tidyr)
long <- gather(wide, group, extra, c("1", "2"))
The data below is in long format and is suitable for plotting with ggplot2:
print(long)
## ID group extra
## 1 1 1 0.7
## 2 2 1 -1.6
## 3 3 1 -0.2
## 4 4 1 -1.2
## 5 5 1 -0.1
## 6 6 1 3.4
## 7 7 1 3.7
## 8 8 1 0.8
## 9 9 1 0.0
## 10 10 1 2.0
## 11 1 2 1.9
## 12 2 2 0.8
## 13 3 2 1.1
## 14 4 2 0.1
## 15 5 2 -0.1
## 16 6 2 4.4
## 17 7 2 5.5
## 18 8 2 1.6
## 19 9 2 4.6
## 20 10 2 3.4
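In recent versions of tidyr (1.0.0 and later), gather() is superseded by pivot_longer(). The sketch below shows the same reshaping under two assumptions: the wide table is rebuilt here from R's built-in sleep dataset (the text above does not show how wide was created), and the rows are re-sorted afterwards because pivot_longer() orders them by participant rather than by group.

```r
library(tidyr)

# Assumption: rebuild the wide table from the built-in `sleep` dataset
wide <- data.frame(
  ID = 1:10,
  `1` = sleep$extra[sleep$group == 1],
  `2` = sleep$extra[sleep$group == 2],
  check.names = FALSE  # keep "1" and "2" as column names
)

# Stack columns "1" and "2": names go to `group`, values go to `extra`
long <- pivot_longer(
  wide,
  cols = c("1", "2"), names_to = "group", values_to = "extra"
)

# Sort by group, then ID, to match the row order that gather() produces
long <- long[order(long$group, long$ID), ]
print(long)
```

Note that pivot_longer() returns a tibble, so the printed output looks slightly different from the data frame shown above even though the values are the same.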
The following dataset contains the diameter, height and volume of 31 black cherry trees (only the first 6 of which are shown here):
print(head(trees))
## Girth Height Volume
## 1 8.3 70 10.3
## 2 8.6 65 10.3
## 3 8.8 63 10.2
## 4 10.5 72 16.4
## 5 10.7 81 18.8
## 6 10.8 83 19.7
When a bar plot is created with ggplot2 using its default settings, it counts the number of occurrences of each distinct value in a column and uses those counts as the heights of the bars:
p <- ggplot(trees, aes(Height))
p <- p + geom_bar()
print(p)
Notice that there was no argument passed to the geom_bar() function; there was nothing between its brackets and so the default settings were used. Therefore, it counted the number of trees of each height (because “Height” was the column that was passed to the aes() function) and plotted those as the bars. We can see that there were 5 trees of height 80 ft.
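The counting that geom_bar() does by default can be reproduced in base R with table(). This is only a cross-check of the bar heights, not part of the plotting code:

```r
# Count how many trees there are of each height
counts <- table(trees$Height)
print(counts)

# Number of trees that are 80 ft tall
print(counts[["80"]])
```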
The dataset below shows the death rates (per 1000 people) in Virginia in 1940, split by age group:
print(VADeaths)
## Rural Male Rural Female Urban Male Urban Female
## 50-54 11.7 8.7 15.4 8.4
## 55-59 18.1 11.7 24.3 13.6
## 60-64 26.9 20.3 37.0 19.3
## 65-69 41.0 30.9 54.6 35.1
## 70-74 66.0 54.3 71.1 50.0
Let’s plot only one of the columns, Urban Female, and have one bar for each age group:
The aes() function takes the x-variable and the y-variable of the plot in that order, so we want to pass it rownames(VADeaths) in the x-position because the row names of the dataset (ie the age groups) will form the x-labels of our bars.
The second argument passed to the aes() function is “Urban Female” as that will give us the height of each bar.
Because the values in the data should be used directly as the bar heights (rather than being counted), the “identity” option needs to be given to the stat keyword argument of the geom_bar() function:
# Convert the array to a data frame
vadeaths <- as.data.frame(VADeaths)
p <- ggplot(vadeaths, aes(rownames(vadeaths), `Urban Female`))
p <- p + geom_bar(stat = "identity")
print(p)
Notice that the column we wanted to plot (“Urban Female”) has a space in its name. We therefore had to wrap it in backticks (grave accents) when specifying it in the aes() function.
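A possible variation is sketched below: copy the row names into a column of their own so that aes() can refer to it by name, and use geom_col(), which ggplot2 provides as a shorthand for geom_bar(stat = "identity"). The column name age_group is made up for this example.

```r
library(ggplot2)

# Convert the array to a data frame
vadeaths <- as.data.frame(VADeaths)

# Hypothetical column name: store the age groups as a regular column
vadeaths$age_group <- rownames(vadeaths)

# geom_col() uses the y-values as the bar heights
p <- ggplot(vadeaths, aes(age_group, `Urban Female`))
p <- p + geom_col()
print(p)
```

Keeping the age groups in a column also makes it easier to relabel the x-axis or reshape the data later.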
The next dataset contains the weight of 71 chicks, measured six weeks after hatching, that were each fed on one of six different diets (first 15 data points shown):
print(head(chickwts, 15))
## weight feed
## 1 179 horsebean
## 2 160 horsebean
## 3 136 horsebean
## 4 227 horsebean
## 5 217 horsebean
## 6 168 horsebean
## 7 108 horsebean
## 8 124 horsebean
## 9 143 horsebean
## 10 140 horsebean
## 11 309 linseed
## 12 229 linseed
## 13 181 linseed
## 14 141 linseed
## 15 260 linseed
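To plot the mean result of each group, we need to use the “summary” option of the stat keyword argument. When no summary function is specified, ggplot2 falls back to mean_se() (and prints a message saying so), which means the bar heights are the group means: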
p <- ggplot(chickwts, aes(feed, weight))
p <- p + geom_bar(stat = "summary")
print(p)
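To check the bar heights, the group means can be computed directly with tapply(). The sketch below also names the summary function explicitly via fun = "mean"; this assumes ggplot2 3.3.0 or later, where the argument has that name.

```r
library(ggplot2)

# Mean weight per feed type, computed directly
group_means <- tapply(datasets::chickwts$weight, datasets::chickwts$feed, mean)
print(round(group_means, 1))

# The same plot, naming the summary function explicitly
# (assumes ggplot2 3.3.0 or later, where the argument is called `fun`)
p <- ggplot(datasets::chickwts, aes(feed, weight))
p <- p + geom_bar(stat = "summary", fun = "mean")
print(p)
```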
Let’s make the plot look a little better:
# Change the data frame's levels' names to edit the bars' labels
levels(chickwts$feed)[levels(chickwts$feed) == "casein"] <- "Casein"
levels(chickwts$feed)[levels(chickwts$feed) == "horsebean"] <- "Horsebean"
levels(chickwts$feed)[levels(chickwts$feed) == "linseed"] <- "Linseed"
levels(chickwts$feed)[levels(chickwts$feed) == "meatmeal"] <- "Meatmeal"
levels(chickwts$feed)[levels(chickwts$feed) == "soybean"] <- "Soybean"
levels(chickwts$feed)[levels(chickwts$feed) == "sunflower"] <- "Sunflower"
p <- ggplot(chickwts, aes(feed, weight))
p <- p + geom_bar(stat = "summary")
# Remove the main x-axis label
p <- p + xlab("")
# Add main title and y-axis label
p <- p + labs(title = "Chicken Weights by Feed Type", y = "Weight [g]")
print(p)
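If there are many levels to rename, the six assignments above can be collapsed into one. The sketch below capitalises the first letter of every level at once; it works on a copy called chicks (a name chosen for this example) so that the original dataset is left alone:

```r
# Work on a copy of the built-in dataset
chicks <- datasets::chickwts

# Upper-case the first character of each level and keep the rest as-is
old_levels <- levels(chicks$feed)
levels(chicks$feed) <- paste0(
  toupper(substring(old_levels, 1, 1)), substring(old_levels, 2)
)
print(levels(chicks$feed))
```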
Remember that you can include Unicode in your axis titles using the \U Unicode indicator (eg "Pi: \U03C0" renders as “Pi: π”).
Colour the bars by feed type using the “fill” keyword argument in the aes() function:
p <- ggplot(chickwts, aes(feed, weight, fill = as.factor(feed)))
p <- p + geom_bar(stat = "summary")
p <- p + xlab("")
p <- p + labs(title = "Chicken Weights by Feed Type", y = "Weight [g]")
print(p)
Change the legend’s title using the “fill” keyword argument in the labs() function:
p <- ggplot(chickwts, aes(feed, weight, fill = as.factor(feed)))
p <- p + geom_bar(stat = "summary")
p <- p + xlab("")
p <- p + labs(
title = "Chicken Weights by Feed Type", y = "Weight [g]", fill = "Feed"
)
print(p)
The legend is placed automatically when you use the “fill” keyword argument to colour the bars. Remove it by changing its position to “none”:
p <- ggplot(chickwts, aes(feed, weight, fill = as.factor(feed)))
p <- p + geom_bar(stat = "summary")
p <- p + xlab("")
p <- p + labs(
title = "Chicken Weights by Feed Type", y = "Weight [g]", fill = "Feed"
)
p <- p + theme(legend.position = "none")
print(p)
Use your own colours, given as hex codes or colour names, with the scale_fill_manual() function:
blue <- "#00B7EB"
pink <- "#EE2A7B"
yellow <- "#FFD100"
green <- "#5BBF21"
p <- ggplot(chickwts, aes(feed, weight, fill = as.factor(feed)))
p <- p + geom_bar(stat = "summary")
p <- p + scale_fill_manual(
values = c(blue, yellow, pink, green, "red", "orange")
)
p <- p + xlab("")
p <- p + labs(
title = "Chicken Weights by Feed Type", y = "Weight [g]", fill = "Feed"
)
print(p)
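By default, scale_fill_manual() assigns the colours to the factor levels in order. If you would rather pin each colour to a specific feed, the values can be given as a named vector, as in this sketch (the variable names chicks and feed_colours are illustrative, and the lowercase level names of the unmodified dataset are used):

```r
library(ggplot2)

# Work on a fresh copy of the built-in dataset (lowercase feed names)
chicks <- datasets::chickwts

# Names must match the factor levels exactly
feed_colours <- c(
  casein = "#00B7EB", horsebean = "#FFD100", linseed = "#EE2A7B",
  meatmeal = "#5BBF21", soybean = "red", sunflower = "orange"
)

p <- ggplot(chicks, aes(feed, weight, fill = feed))
p <- p + geom_bar(stat = "summary")
p <- p + scale_fill_manual(values = feed_colours)
print(p)
```

With a named vector, each feed keeps its colour even if the order of the levels changes.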
Viridis:
# Import the library that contains the palette
library(viridis)
p <- ggplot(chickwts, aes(feed, weight, fill = as.factor(feed)))
p <- p + geom_bar(stat = "summary")
p <- p + scale_fill_viridis(discrete = TRUE)
p <- p + xlab("")
p <- p + labs(
title = "Chicken Weights by Feed Type", y = "Weight [g]", fill = "Feed"
)
print(p)
Ggsci:
# Import the library that contains the palette
library(ggsci)
p <- ggplot(chickwts, aes(feed, weight, fill = as.factor(feed)))
p <- p + geom_bar(stat = "summary")
p <- p + scale_fill_lancet()
p <- p + xlab("")
p <- p + labs(
title = "Chicken Weights by Feed Type", y = "Weight [g]", fill = "Feed"
)
print(p)
RColorBrewer:
# Import the library that contains the palette
library(RColorBrewer)
p <- ggplot(chickwts, aes(feed, weight, fill = as.factor(feed)))
p <- p + geom_bar(stat = "summary")
p <- p + scale_fill_brewer(palette = "Purples")
p <- p + xlab("")
p <- p + labs(
title = "Chicken Weights by Feed Type", y = "Weight [g]", fill = "Feed"
)
print(p)
To show the individual data points on top of the bars, add in the geom_point() function:
p <- ggplot(chickwts, aes(feed, weight, fill = as.factor(feed)))
p <- p + geom_bar(stat = "summary")
p <- p + geom_point()
p <- p + scale_fill_grey(start = 0.8, end = 0.2)
p <- p + xlab("")
p <- p + labs(
title = "Chicken Weights by Feed Type", y = "Weight [g]", fill = "Feed"
)
print(p)
To give the bars an outline, use the col keyword argument in the geom_bar() function:
p <- ggplot(chickwts, aes(feed, weight, fill = as.factor(feed)))
p <- p + geom_bar(stat = "summary", col = "black")
p <- p + scale_fill_grey(start = 0.8, end = 0.2)
p <- p + xlab("")
p <- p + labs(
title = "Chicken Weights by Feed Type", y = "Weight [g]", fill = "Feed"
)
print(p)
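Two changes need to happen here in order to add error bars:
summarySE() from the “Rmisc” package can be used to generate the summary statistics for each group.
The data frame being plotted then already contains the group means, and stat thus needs to change from “summary” to “identity”.
library(Rmisc)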
chickwts_summ <- summarySE(chickwts, measurevar = "weight", groupvars = "feed")
print(chickwts_summ)
## feed N weight sd se ci
## 1 Casein 12 323.5833 64.43384 18.60045 40.93931
## 2 Horsebean 10 160.2000 38.62584 12.21456 27.63126
## 3 Linseed 12 218.7500 52.23570 15.07915 33.18898
## 4 Meatmeal 11 276.9091 64.90062 19.56827 43.60083
## 5 Soybean 14 246.4286 54.12907 14.46660 31.25319
## 6 Sunflower 12 328.9167 48.83638 14.09785 31.02916
p <- ggplot(chickwts_summ, aes(feed, weight, fill = as.factor(feed)))
p <- p + geom_bar(stat = "identity")
p <- p + geom_errorbar(
aes(ymin = weight - se, ymax = weight + se), width = 0.25
)
p <- p + scale_fill_grey(start = 0.8, end = 0.2)
p <- p + xlab("")
p <- p + labs(
title = "Chicken Weights by Feed Type", y = "Weight [g]", fill = "Feed"
)
print(p)
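The columns that summarySE() returns can also be worked out by hand, which makes it clear what the error bars show: the standard error is the standard deviation divided by the square root of the group size. Below is a base R sketch, using the unmodified dataset (so the feed names are lowercase) and a made-up data frame name, summ. The ci column assumes a 95% confidence level based on the t-distribution.

```r
chicks <- datasets::chickwts

# Group size, mean and standard deviation for each feed type
n    <- tapply(chicks$weight, chicks$feed, length)
avg  <- tapply(chicks$weight, chicks$feed, mean)
sdev <- tapply(chicks$weight, chicks$feed, sd)

summ <- data.frame(
  feed = names(avg), N = as.vector(n),
  weight = as.vector(avg), sd = as.vector(sdev)
)

# Standard error of the mean
summ$se <- summ$sd / sqrt(summ$N)
# Half-width of the 95% confidence interval (assumed confidence level)
summ$ci <- summ$se * qt(0.975, df = summ$N - 1)
print(summ)
```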
Finally, use ggsave("File Name.png") to save the plot to your computer.
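By default, ggsave() saves the last plot that was displayed and picks the file type from the file extension. A slightly more explicit sketch is shown below; the file name, location and dimensions are arbitrary choices for the example:

```r
library(ggplot2)

p <- ggplot(datasets::chickwts, aes(feed, weight))
p <- p + geom_bar(stat = "summary")

# Save a specific plot object at a fixed size (in inches) and resolution
out_file <- file.path(tempdir(), "chick_weights.png")
ggsave(out_file, plot = p, width = 6, height = 4, dpi = 300)
```

Passing the plot object explicitly avoids saving the wrong figure when several plots have been drawn in the same session.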