1 Plotting Values

To create a bar plot we can use the barplot() function in one of two ways:

  1. barplot(height) where height is a numeric vector (e.g. a standalone vector or a column of a data frame). This will create a plot where the height of each bar corresponds to its respective number in the vector.
  2. barplot(formula, data) where formula is a statement of the form y ~ x with y being numerical data and x being categorical data, both of which come from a data set data.

Here is an example of each:

1.1 Numerical Data

For this example, we’re going to use the built-in dataset “trees” which contains the diameter, height and volume of 31 black cherry trees (only the first 6 of which are shown here):

print(head(trees))
##   Girth Height Volume
## 1   8.3     70   10.3
## 2   8.6     65   10.3
## 3   8.8     63   10.2
## 4  10.5     72   16.4
## 5  10.7     81   18.8
## 6  10.8     83   19.7

We can use barplot(height) to create a bar plot where, appropriately, the height of each bar corresponds to the height of each tree:

height <- trees$Height
barplot(height)

1.2 Numerical and Categorical Data

This example uses the built-in dataset “swiss” which has economic data from different Swiss provinces in 1888. We’re going to use the ‘Agriculture’ column (the % of the male population working in agriculture) from six of those provinces:

swiss$Province <- rownames(swiss)
rownames(swiss) <- 1:nrow(swiss)
print(head(swiss[c("Province", "Agriculture")]))
##       Province Agriculture
## 1   Courtelary        17.0
## 2     Delemont        45.1
## 3 Franches-Mnt        39.7
## 4      Moutier        36.5
## 5   Neuveville        43.5
## 6   Porrentruy        35.3

We have numerical data (Agriculture) and categorical data (Province) so we can use the second format for the barplot() function, namely barplot(formula, data):

barplot(Agriculture ~ Province, data = head(swiss))
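The same plot can also be sketched with the first form, barplot(height), by passing the bar labels separately through names.arg. This is only an illustrative alternative; it reads from datasets::swiss (the untouched built-in copy) so that it does not depend on the Province column created above, and the name first_six is just a placeholder:

```r
# Untouched built-in copy: the province names are still the row names here
first_six <- head(datasets::swiss)
# Bar heights come from the Agriculture column, bar labels from the row names
barplot(first_six$Agriculture, names.arg = rownames(first_six))
```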

2 Plotting the Frequencies of Values (Histograms)

A bar plot where the heights are the frequencies at which values appear in the dataset (as opposed to the values themselves) is called a histogram. It can be plotted using the hist() function, which groups the values into bins and counts how many fall into each one. Here’s an example using the same Height data from the “trees” dataset that was used before:

hist(trees$Height)

This can be changed to a probability density plot (where the total area of the histogram’s bars is 1) by specifying freq = FALSE:

hist(trees$Height, freq = FALSE)
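As a quick check of both descriptions, the numbers behind a histogram can be inspected without drawing it by passing plot = FALSE. The sketch below (the variable names are only illustrative) adds up the counts, which should cover all 31 trees, and the area of the density bars, which should come to 1:

```r
# Compute the histogram without drawing it
h <- hist(trees$Height, plot = FALSE)
# Total number of observations across all bins
total_count <- sum(h$counts)
# Area of the density bars: bar height multiplied by bin width, summed
total_area <- sum(h$density * diff(h$breaks))
print(total_count)
print(total_area)
```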

3 Plotting the Mean Values of Groups of Data

If we use the “chickwts” dataset we get the weights of 71 chicks, measured six weeks after hatching, that were each fed one of six different diets (first 15 data points shown):

print(head(chickwts, 15))
##    weight      feed
## 1     179 horsebean
## 2     160 horsebean
## 3     136 horsebean
## 4     227 horsebean
## 5     217 horsebean
## 6     168 horsebean
## 7     108 horsebean
## 8     124 horsebean
## 9     143 horsebean
## 10    140 horsebean
## 11    309   linseed
## 12    229   linseed
## 13    181   linseed
## 14    141   linseed
## 15    260   linseed

Even though we have both categorical and numerical data, we can’t plot it directly with barplot(weight ~ feed, data = chickwts) because each group contains multiple data points. R would try to plot one bar for each row of the data frame and fail, because several rows share the same x-value (“horsebean”, “linseed”, etc). We first need to aggregate the data to get, for example, the mean value of each group:

data <- aggregate(weight ~ feed, data = chickwts, mean)
print(data)
##        feed   weight
## 1    casein 323.5833
## 2 horsebean 160.2000
## 3   linseed 218.7500
## 4  meatmeal 276.9091
## 5   soybean 246.4286
## 6 sunflower 328.9167

Now we can plot it:

barplot(weight ~ feed, data = data)
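If a data frame is not needed, the group means can also be computed with tapply(), which returns a named vector that fits the barplot(height) form from the start of this page. This is a sketch of an alternative route, not a replacement for the aggregate() approach, and the name means is only illustrative:

```r
# Mean weight per feed type, returned as a named numeric vector
means <- tapply(chickwts$weight, chickwts$feed, mean)
print(round(means, 1))
# The names of the vector are used as the bar labels
barplot(means)
```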

4 Formatting

Let’s make the plot look a little better:

4.1 Titles and Labels

  • Change the bar labels with names.arg
  • Set the x- and y-axis labels with xlab and ylab
  • Set the graph title with main

barplot(
    # Plot data
    weight ~ feed, data = data,
    # Bar labels
    names.arg = c("Casein", "Horsebean", "Linseed", "Meatmeal", "Soybean", "Sunflower"),
    # Axis labels
    ylab = "Weight [g]", xlab = "Feed Type",
    # Graph title
    main = "Chicken Weights By Feed Type"
)

Remember that you can include Unicode in your axis titles using the \U Unicode indicator (eg "Pi: \U03C0" renders as “Pi: π”).
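A minimal sketch of that escape in use is shown below. The label text and the variable name greek_label are made up for illustration, and whether the glyph is actually drawn depends on the fonts available to your graphics device:

```r
# \U03C0 is the Unicode code point for the Greek letter pi
greek_label <- "Pi: \U03C0"
print(greek_label)
# Use the string like any other axis label
barplot(c(1, 2, 3), xlab = greek_label)
```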

4.2 Distinguish Groups

Now that we have our labelling sorted out for the graph as a whole let’s improve how we label the different groups:

4.2.1 Use Colour

Colour can be added in barplot() with the col keyword argument, which accepts hex codes as well as R’s built-in colour names (listed by colors()):

# Custom Colours
pink <- "#FB4188"
green <- "#87C94A"
blue <- "#39C2F3"
yellow <- "#FADB39"
lgrey <- "#798287"
dgrey <- "#43454C"
colours <- c(pink, green, blue, yellow, lgrey, dgrey)

# Create bar plot
barplot(
    # Plot data
    weight ~ feed, data = data,
    # Bar labels
    names.arg = c("Casein", "Horsebean", "Linseed", "Meatmeal", "Soybean", "Sunflower"),
    # Axis labels
    ylab = "Weight [g]", xlab = "Feed Type",
    # Graph title
    main = "Chicken Weights By Feed Type",
    # Add colour
    col = colours
)
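If picking six hex codes by hand is not necessary, a palette function can generate the vector instead. This is an optional alternative rather than part of the example above: hcl.colors() ships with base R from version 3.6.0 onwards, "Dark 3" is one of its built-in qualitative palettes, and the name palette_colours is only illustrative:

```r
# Mean weight per feed type, computed from the untouched built-in dataset
data <- aggregate(weight ~ feed, data = datasets::chickwts, mean)
# One colour per bar, generated from a built-in qualitative palette
palette_colours <- hcl.colors(nrow(data), palette = "Dark 3")
barplot(weight ~ feed, data = data, col = palette_colours)
```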

4.2.2 Use Colour and a Legend

  • A legend can be added with the legend() function, specifying:
    • The location of the box as a positional argument
    • Its title with title
    • The text labels to display using legend
    • The type of markers to display in the legend with pch (the available symbols are listed in ?points)
    • The colours to attach to the labels using col
    • We want to place the legend outside of the plot area, so we need to create an ‘inset’ for it to sit in. This is done using the inset keyword argument.
  • In order for the legend to be placed outside of the plot area, we need to tweak the parameters of the graph with the par() function:
    • Create extra space on the right for the legend to sit in by changing the margins of the graph with the mar argument. The default for the margins is c(bottom, left, top, right) = c(5, 4, 4, 2) + 0.1 so we are adding 6 units of space between the right edge of the graph and the side of the figure (and we are also removing 4 units of space from beneath the plot).
    • xpd = TRUE allows elements (eg our legend) to be drawn outside of the plot area
  • Lastly, we want to turn off the x-axis labels (as they have been moved to the legend) with xaxt = 'n' and xlab = ""

# Custom Colours
pink <- "#FB4188"
green <- "#87C94A"
blue <- "#39C2F3"
yellow <- "#FADB39"
lgrey <- "#798287"
dgrey <- "#43454C"
colours <- c(pink, green, blue, yellow, lgrey, dgrey)
# Custom labels
labels <- c("Casein", "Horsebean", "Linseed", "Meatmeal", "Soybean", "Sunflower")

# Add extra space to the right of the plot and enable drawing outside of the plot area
par(mar = c(1.1, 4.1, 4.1, 8.1), xpd = TRUE)
# Create bar plot
barplot(
    # Plot data
    weight ~ feed, data = data,
    # Turn off the bar labels (they are shown in the legend instead)
    xaxt = 'n',
    # Axis labels
    ylab = "Weight [g]", xlab = "",
    # Graph title
    main = "Chicken Weights By Feed Type",
    # Add colour
    col = colours
)
# Add a legend
legend("right", title = "Feed Type", legend = labels, pch = 16, col = colours, inset = c(-0.3, 0))
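Settings changed with par() stay in effect for every later plot on the same graphics device, so the wide right margin would also apply to the plots that follow. One way to avoid this, sketched below, relies on the fact that par() returns the previous values of whatever it changes; the variable name old_par is only illustrative:

```r
# Mean weight per feed type, computed from the untouched built-in dataset
data <- aggregate(weight ~ feed, data = datasets::chickwts, mean)
# Change the settings and keep a copy of the values they had before
old_par <- par(mar = c(1.1, 4.1, 4.1, 8.1), xpd = TRUE)
barplot(weight ~ feed, data = data, xaxt = "n", xlab = "", ylab = "Weight [g]")
legend("right", title = "Feed Type", legend = levels(data$feed), pch = 16, inset = c(-0.3, 0))
# Put the margins and clipping back to how they were
par(old_par)
```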

4.3 Show Individual Points

This is a little tricky. In the first instance, see what happens if you simply add a scatter plot on top of the bar plot with points():

# Create bar plot
barplot(
    # Plot data
    weight ~ feed, data = data,
    # Bar labels
    names.arg = c("Casein", "Horsebean", "Linseed", "Meatmeal", "Soybean", "Sunflower"),
    # Axis labels
    ylab = "Weight [g]", xlab = "Feed Type",
    # Graph title
    main = "Chicken Weights By Feed Type"
)
# Add a scatter plot
points(chickwts$feed, chickwts$weight)

The data is correct but it’s offset from the bars! This happens because R doesn’t draw the bars at exactly x = 1, x = 2, etc. You can see this by looking up the exact x-values R uses for the bars: assign the result of barplot() to a variable and print it:

bp <- barplot(weight ~ feed, data = data)
print(bp)
##      [,1]
## [1,]  0.7
## [2,]  1.9
## [3,]  3.1
## [4,]  4.3
## [5,]  5.5
## [6,]  6.7

This has produced a matrix with six values which correspond to the x-values of each of the six bars. As you can see, they are not 1, 2, 3, etc! One way to solve this problem is to replace the x-data we want to use in our scatter plot (which, at the moment, is the categorical feed data: casein, horsebean, linseed, etc) with these x-values. The first step is to convert the chickwts$feed column to numerical data:

# Convert categorical data to numerical
chickwts$feed <- as.numeric(chickwts$feed)
print(chickwts$feed)
##  [1] 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 5 5 5 5 5 5 5 5 5 5 5 5 5 5 6 6
## [39] 6 6 6 6 6 6 6 6 6 6 4 4 4 4 4 4 4 4 4 4 4 1 1 1 1 1 1 1 1 1 1 1 1

These numbers indicate which of the six bars the data in each row belongs to. We can use them to look up the actual x-values of the bars from the matrix we got earlier:

# Replace ordinal data with the x-values of the bars
for (i in 1:nrow(chickwts)) {
    chickwts$feed[i] <- bp[chickwts$feed[i]]
}
print(chickwts$feed)
##  [1] 1.9 1.9 1.9 1.9 1.9 1.9 1.9 1.9 1.9 1.9 3.1 3.1 3.1 3.1 3.1 3.1 3.1 3.1 3.1
## [20] 3.1 3.1 3.1 5.5 5.5 5.5 5.5 5.5 5.5 5.5 5.5 5.5 5.5 5.5 5.5 5.5 5.5 6.7 6.7
## [39] 6.7 6.7 6.7 6.7 6.7 6.7 6.7 6.7 6.7 6.7 4.3 4.3 4.3 4.3 4.3 4.3 4.3 4.3 4.3
## [58] 4.3 4.3 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7

Now we can plot this as a normal x-y scatter plot on top of the bar plot using points(). It’s a good idea to increase the y-axis limits (using the ylim argument of barplot()) when doing this to ensure that all of the points can fit:

# Create bar plot
bp <- barplot(
    # Plot data
    weight ~ feed, data = data,
    # Bar labels
    names.arg = c("Casein", "Horsebean", "Linseed", "Meatmeal", "Soybean", "Sunflower"),
    # Axis labels
    ylab = "Weight [g]", xlab = "Feed Type",
    # Axis limits
    ylim = c(0, max(chickwts$weight) * 1.1),
    # Graph title
    main = "Chicken Weights By Feed Type"
)
# Look up the x-value of each chick's bar from its feed group.
# datasets::chickwts is the untouched built-in copy, so this still works
# if chickwts$feed has already been overwritten by the steps above
x <- bp[as.integer(datasets::chickwts$feed)]
# Add a scatter plot
points(x, chickwts$weight)
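The bar positions printed earlier (0.7, 1.9, 3.1, and so on) are not arbitrary. Assuming barplot()'s defaults for a plain vector of heights, namely a bar width of 1 and a gap of 0.2 bar widths before each bar, the midpoints can be reproduced by hand. The sketch below uses illustrative variable names and checks the result against what barplot() itself returns:

```r
# Assumed defaults: each bar is 1 unit wide with a gap of 0.2 units before it
width <- 1
space <- 0.2
n_bars <- 6
# Right-hand edge of each bar, then step back half a bar to reach the midpoint
right_edges <- cumsum(rep(space * width + width, n_bars))
mids <- right_edges - width / 2
print(mids)
# Compare with the midpoints returned for six bars of any height
bp_check <- barplot(rep(1, n_bars))
print(as.vector(bp_check))
```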

5 Save Plot

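Finally, to save the plot as a PNG file on your computer, call png("Name of Plot.png") before creating the plot and dev.off() once it has been drawn. The file is written to the current working directory.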
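A minimal sketch of the full sequence is shown below. The file name, the pixel dimensions and the use of tempdir() are assumptions made for the example; replace out_file with whatever path you want to save to:

```r
# Example output path inside R's temporary directory (illustrative only)
out_file <- file.path(tempdir(), "chick_weights.png")
# Mean weight per feed type, computed from the untouched built-in dataset
data <- aggregate(weight ~ feed, data = datasets::chickwts, mean)
# Open the PNG device: everything plotted from here on goes into the file
png(out_file, width = 800, height = 600)
barplot(
    weight ~ feed, data = data,
    ylab = "Weight [g]", xlab = "Feed Type",
    main = "Chicken Weights By Feed Type"
)
# Close the device so that the file is actually written to disk
dev.off()
```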
