
1 The Basics

A bar plot is created in Python by using the bar() function from Matplotlib’s pyplot module (which is conventionally imported under the shorthand name plt). The first argument passed to this function is the x-positions of the bars and the second argument is their heights. The function show() then displays the graph:

import matplotlib.pyplot as plt

x_positions = [0, 1, 2]
heights = [6, 12, 9]

plt.bar(x_positions, heights)
plt.show()
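
As a quick check on how these two arguments are used, the following sketch inspects the object that bar() returns. By default Matplotlib centres each bar on its x-position and uses a width of 0.8, so the left edge of each bar should sit 0.4 to the left of the position given. The Agg backend line is an assumption made only so that the sketch runs without a display; it is not needed in an interactive session.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt

x_positions = [0, 1, 2]
heights = [6, 12, 9]

# bar() returns a container holding one Rectangle per bar
bars = plt.bar(x_positions, heights)

# get_x() is the left edge of a bar, not its centre
left_edges = [bar.get_x() for bar in bars]
bar_widths = [bar.get_width() for bar in bars]
bar_heights = [bar.get_height() for bar in bars]

print(left_edges)
print(bar_widths)
print(bar_heights)
```

The printed left edges plus half the width give back the original x-positions, which confirms that the first argument marks the centre of each bar.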

1.1 Plot Title, Axis Labels, Bar Widths

Titles and labels are added using title(), xlabel() and ylabel() while the width of the bars is defined by the third argument in the bar() function:

x_positions = [0, 1, 2]
heights = [6, 12, 9]
width = 0.5

plt.bar(x_positions, heights, width)
plt.title('How bar plots work in Matplotlib')
plt.xlabel('The first argument of the bar() function')
plt.ylabel('The second argument of the bar() function')

plt.show()

1.2 Automatically Get the x-Positions

This is exactly the same example as the previous one, except this time NumPy’s arange() function is used to generate the [0, 1, 2] x-positions automatically from the length of the heights variable (which is the number of bars that are going to be plotted):

import numpy as np

heights = [6, 12, 9]
x_positions = np.arange(len(heights))
width = 0.5

plt.bar(x_positions, heights, width)
plt.title('How bar plots work in Matplotlib')
plt.xlabel('The first argument of the bar() function')
plt.ylabel('The second argument of the bar() function')

plt.show()
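
To see what arange() produces on its own, here is a minimal sketch using the same heights. Because the result is a NumPy array rather than a list, arithmetic such as adding a constant is applied to every position at once, which is handy when bars need to be shifted.

```python
import numpy as np

heights = [6, 12, 9]

# One x-position per bar, starting from 0
x_positions = np.arange(len(heights))
print(x_positions)        # [0 1 2]

# Arithmetic on the array applies to every element
print(x_positions + 0.5)  # [0.5 1.5 2.5]
```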

1.3 Offset the Bars

In this example, a fictitious exercise session is plotted. The x-axis represents time, so it makes sense to not have any gaps between the bars. It also makes sense to offset them: if something starts at minute 3, that means it runs from 3 to 4 so you would want the bar that relates to minute 3 to be centred on 3.5, not 3.

import numpy as np

speed = [
    20, 24, 28, 32, 36, 32, 28, 24, 20, 24, 28, 32, 36,
    32, 28, 24, 20, 24, 28, 32, 36, 32, 28, 24, 20
]
minute = np.arange(len(speed))
offset = 0.5
width = 1

plt.bar(minute + offset, speed, width)
plt.title('Speed Interval Training')
plt.ylabel('Speed')
plt.ylim(17, 39)
plt.yticks(sorted(set(speed)))
plt.xlabel('Minute')
plt.xlim(0, len(minute))

plt.show()
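
An alternative to adding an offset by hand is the align keyword argument of bar(). The sketch below, which uses a shortened version of the speed data and a headless backend purely as assumptions for illustration, draws the bars both ways and checks that they end up in the same place.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Shortened, illustrative version of the speed data
speed = [20, 24, 28, 32, 36, 32, 28, 24, 20]
minute = np.arange(len(speed))
width = 1

fig, (ax_offset, ax_edge) = plt.subplots(1, 2)

# Option 1: shift the bar centres by half a bar width
bars_offset = ax_offset.bar(minute + 0.5, speed, width)
# Option 2: tell Matplotlib that x marks the left edge of each bar
bars_edge = ax_edge.bar(minute, speed, width, align='edge')

left_offset = [bar.get_x() for bar in bars_offset]
left_edge = [bar.get_x() for bar in bars_edge]
same_placement = np.allclose(left_offset, left_edge)
print(same_placement)
```

Either approach works; align='edge' avoids having to keep the offset in step with the width if the width is later changed.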

2 Plotting a Dataset

It’s not often that you will have your data in the perfect format, with the x-positions and heights already in their own variables. Most of the time you will need to do some data manipulation before you can plot. Here’s an example using the iris dataset from the scikit-learn package (see the scikit-learn documentation for more info on its toy datasets):

from sklearn.datasets import load_iris

# Load the dataset
iris = load_iris()

This dataset contains data from three different species of the iris flower:

print(iris['target_names'])
## ['setosa' 'versicolor' 'virginica']

For this example we only want one of these three groups, so extract the data for the ‘versicolor’ species only:

# Get the number corresponding to the versicolor species
# (np.where returns a tuple of index arrays, so take the first element of the first array)
species = np.where(iris['target_names'] == 'versicolor')[0][0]
# Look up this number in the 'target' column to build a boolean mask
idx = iris['target'] == species
# Filter to get only the rows with this number
data = iris['data'][idx]
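
The filtering step can be easier to follow on a small example. The sketch below uses made-up stand-in arrays (target_names, target and data here are hypothetical and much smaller than the real iris dataset) to show what np.where() returns and how a boolean mask selects the matching rows.

```python
import numpy as np

# Hypothetical stand-ins for iris['target_names'], iris['target'] and iris['data']
target_names = np.array(['setosa', 'versicolor', 'virginica'])
target = np.array([0, 0, 1, 1, 2, 1])
data = np.array([
    [5.1, 1.4],
    [4.9, 1.4],
    [7.0, 4.7],
    [6.4, 4.5],
    [6.3, 6.0],
    [6.9, 4.9],
])

# np.where returns a tuple containing one index array per dimension
match = np.where(target_names == 'versicolor')
species = match[0][0]  # the integer code for versicolor

# Comparing the whole array at once gives a boolean mask
mask = target == species
versicolor = data[mask]

print(species)
print(mask)
print(versicolor)
```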

The information in this array is divided into four columns: sepal length, sepal width, petal length and petal width. The first 10 rows (of 50) are below:

print(data[:10])
## [[7.  3.2 4.7 1.4]
##  [6.4 3.2 4.5 1.5]
##  [6.9 3.1 4.9 1.5]
##  [5.5 2.3 4.  1.3]
##  [6.5 2.8 4.6 1.5]
##  [5.7 2.8 4.5 1.3]
##  [6.3 3.3 4.7 1.6]
##  [4.9 2.4 3.3 1. ]
##  [6.6 2.9 4.6 1.3]
##  [5.2 2.7 3.9 1.4]]

Extract only the petal lengths (the third column, i.e. the one at index 2):

# Extract the 'petal length (cm)' data
petal_length = data[:, 2]
print(petal_length)
## [4.7 4.5 4.9 4.  4.6 4.5 4.7 3.3 4.6 3.9 3.5 4.2 4.  4.7 3.6 4.4 4.5 4.1
##  4.5 3.9 4.8 4.  4.9 4.7 4.3 4.4 4.8 5.  4.5 3.5 3.8 3.7 3.9 5.1 4.5 4.5
##  4.7 4.4 4.1 4.  4.4 4.6 4.  3.3 4.2 4.2 4.2 4.3 3.  4.1]

Now that we have the data we want, there are two ways to plot it:

2.1 Plotting Values

The first way is to do what we’ve done before: plot the numbers as they are:

# Extract the data
heights = petal_length
x_positions = np.arange(len(heights))

# Plot
plt.bar(x_positions, heights)
plt.title('The lengths of the petals of 50 iris versicolor flowers')
plt.ylabel('Length (cm)')
plt.xlabel('')

plt.show()

This code has worked as expected, but it’s probably not the graph that we want: it simply shows one bar per flower, in the order the flowers appear in the dataset. It would be much more useful to have a bar plot of the number of flowers that have each petal length. For that we need to plot the counts, i.e. the number of occurrences of each petal length:

2.2 Plotting Counts (aka Histograms)

The number of times each value appears in the ‘petal length’ data can be counted with the unique() function from NumPy by setting its return_counts argument to True:

# Find the unique values in the dataset and count their occurrences
uniq, cnts = np.unique(petal_length, return_counts=True)

# Plotting counts (one bar per distinct value; a width of 0.1 matches the 0.1 cm resolution of the data)
x_positions = uniq
heights = cnts
width = 0.1

# Plot
plt.bar(x_positions, heights, width)
plt.title('The lengths of the petals of 50 iris versicolor flowers')
plt.xlabel('Length (cm)')
plt.ylabel('Count')

plt.show()
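
To make the counting step concrete, here is a minimal sketch on a short, made-up list of values (not taken from the iris dataset). unique() returns the distinct values in sorted order along with how many times each one occurs, so the two arrays line up as x-positions and heights.

```python
import numpy as np

# Illustrative values only
values = np.array([4.5, 4.7, 4.5, 4.0, 4.7, 4.5])

uniq, cnts = np.unique(values, return_counts=True)

print(uniq)        # [4.  4.5 4.7]
print(cnts)        # [1 3 2]
print(cnts.sum())  # 6
```

The counts always add up to the number of values that went in, which is a useful sanity check before plotting.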

3 Data Types

The first examples used lists while the last examples (the ones that used the iris dataset) used arrays. It makes no difference; the bar() function treats them both the same. It also treats Pandas data frames and series the same, as shown in this example where the iris dataset is first converted into a data frame before being plotted:

import pandas as pd

# Load the data set
iris = load_iris()
# Convert the array to a data frame
iris_df = pd.DataFrame(iris['data'], columns=iris['feature_names'])
# Add the species data as a column to the data frame
iris_df['Species'] = iris['target']
# Get the value that corresponds to the versicolor species
species = np.where(iris['target_names'] == 'versicolor')
# Convert tuple to integer
species = species[0][0]
# Filter to only have data from the versicolor plant
data = iris_df[iris_df['Species'] == species]
# Count the number of occurrences of each petal length
uniq, cnts = np.unique(data['petal length (cm)'], return_counts=True)

# Consolidate the data you want to plot
x_positions = uniq
heights = cnts
width = 0.1

# Plot
plt.bar(x_positions, heights, width)
plt.title('Bar plot using a Pandas data frame')
plt.xlabel('Petal Lengths (cm)')
plt.ylabel('Count')

plt.show()
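
When the data is already in a data frame, the counting can also be done with Pandas itself. The sketch below is one possible alternative rather than the method used above: it applies value_counts() to a small, made-up column and sorts the result by value so that it matches what np.unique() gives.

```python
import numpy as np
import pandas as pd

# Illustrative data frame with made-up values
df = pd.DataFrame({'petal length (cm)': [4.5, 4.7, 4.5, 4.0, 4.7, 4.5]})

# value_counts() orders by frequency, so sort by the values instead
counts = df['petal length (cm)'].value_counts().sort_index()
x_positions = counts.index.to_numpy()
heights = counts.to_numpy()

# The same result via NumPy, for comparison
uniq, cnts = np.unique(df['petal length (cm)'], return_counts=True)

print(np.array_equal(x_positions, uniq))
print(np.array_equal(heights, cnts))
```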

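4 Using LaTeX and Annotations

It’s possible to use LaTeX to render the text in the plot, which brings with it the ability to use Greek letters, equations, Unicode symbols and the like. This requires a working LaTeX installation on your computer. See the Matplotlib documentation on text rendering with LaTeX for more info.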

# Make figures A6 in size
A = 6
plt.rc('figure', figsize=[46.82 * .5**(.5 * A), 33.11 * .5**(.5 * A)])
# Use LaTeX (requires a working LaTeX installation)
plt.rc('text', usetex=True)
plt.rc('font', family='serif')
plt.rc('text.latex', preamble=r'\usepackage{textgreek}')

# Plot
plt.bar(x_positions, heights, width)
plt.title(r'Using \LaTeX')
plt.xlabel('Petal Lengths (cm)')
plt.ylabel(r'Count (\textSigma)')
plt.annotate(r'We can create equations if we want:', (3, 6))
plt.annotate(r'$\Sigma = \frac{x}{2}$', (3, 5.5))

plt.show()
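
If a full LaTeX installation is not available, Matplotlib’s built-in mathtext engine can render many of the same expressions when they are wrapped in dollar signs. The following is a minimal sketch under that assumption; it uses its own small dataset, turns usetex off explicitly and renders to an in-memory buffer (with a headless backend) simply to confirm that the text can be drawn.

```python
import io
import matplotlib
matplotlib.use('Agg')  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt

# Use the built-in mathtext engine instead of an external LaTeX installation
plt.rc('text', usetex=False)

fig, ax = plt.subplots()
ax.bar([0, 1, 2], [6, 12, 9], 0.5)
ax.set_title(r'Using mathtext: $\Sigma = \frac{x}{2}$')
ax.set_ylabel(r'Count ($\Sigma$)')
ax.annotate(r'$\alpha + \beta$', (0, 10))

# Rendering forces the math expressions to be parsed
buffer = io.BytesIO()
fig.savefig(buffer, format='png')
rendered_bytes = buffer.getbuffer().nbytes
print(rendered_bytes > 0)
```

Mathtext covers only a subset of LaTeX, so packages such as textgreek are not available this way.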

5 Using Colours

Colours are specified using the color keyword argument. A single colour is applied to every bar, while a list of colours is applied bar by bar. If the number of bars is larger than the number of provided colours, the list wraps around (for example, if there are five bars and only three colours are provided, the fourth and fifth bars will have the same colours as the first and second, respectively).
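
The wrap-around behaviour can be checked directly by looking at the face colour of each bar. This is a small sketch with made-up heights and a headless backend, both assumptions for illustration only.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

# Five bars but only three colours
colours = ['red', 'green', 'blue']
bars = plt.bar([0, 1, 2, 3, 4], [3, 5, 2, 4, 1], color=colours)

# Each colour is stored as an (R, G, B, A) tuple
face_colours = [tuple(bar.get_facecolor()) for bar in bars]

# The fourth and fifth bars reuse the first and second colours
print(face_colours[3] == face_colours[0])
print(face_colours[4] == face_colours[1])
```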

5.1 Defined Colours

# Defined colours
colours = ['red', 'green']

# Plot
plt.bar(x_positions, heights, width, color=colours)
plt.title('Customising the Colours')
plt.xlabel('Petal Lengths (cm)')
plt.ylabel('Count')

plt.show()

5.2 Colour Palette

# Colour Palette
plt.bar(x_positions, heights, width, color=['C0', 'C1', 'C2'])
plt.title('Customising the Colours')
plt.xlabel('Petal Lengths (cm)')
plt.ylabel('Count')

plt.show()
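
The names 'C0', 'C1', 'C2' and so on refer to the colours in Matplotlib’s current colour cycle, so what they look like depends on the active style. As a rough illustration, this sketch prints the hex code each name resolves to alongside the corresponding entry in the axes.prop_cycle setting.

```python
import matplotlib
from matplotlib.colors import to_hex

# The colours that the 'CN' names are drawn from
cycle_colours = matplotlib.rcParams['axes.prop_cycle'].by_key()['color']

# Resolve the first three 'CN' names to hex codes
resolved = [to_hex(f'C{i}') for i in range(3)]

for i, hex_code in enumerate(resolved):
    print(f'C{i}', hex_code, cycle_colours[i])
```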

5.3 Custom Colours

# Custom Colours
pink = '#FB4188'
green = '#87C94A'
blue = '#39C2F3'
yellow = '#FADB39'
lgrey = '#798287'
dgrey = '#43454C'
colours = [pink, green, blue, yellow, lgrey, dgrey]

# Plot
plt.bar(x_positions, heights, width, color=colours)
plt.title('Customising the Colours')
plt.xlabel('Petal Lengths (cm)')
plt.ylabel('Count')

plt.show()

6 Save Plot

Finally, use plt.savefig('name_of_plot.png') to save the plot to your computer.
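
In a script, it is generally safest to call savefig() before show(), because once the plot window has been closed the figure may no longer contain anything to save. The sketch below saves a simple plot into a temporary folder; the folder, the dpi value and the bbox_inches setting are illustrative choices, and the headless backend is an assumption so that it runs without a display.

```python
import os
import tempfile
import matplotlib
matplotlib.use('Agg')  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt

plt.bar([0, 1, 2], [6, 12, 9], 0.5)
plt.title('How bar plots work in Matplotlib')

# Illustrative output location; any writable path will do
output_dir = tempfile.mkdtemp()
output_path = os.path.join(output_dir, 'name_of_plot.png')

# dpi sets the resolution; bbox_inches='tight' trims surplus whitespace
plt.savefig(output_path, dpi=150, bbox_inches='tight')
plt.close()

print(os.path.exists(output_path))
```

The file format is taken from the extension, so changing the name to end in .pdf or .svg produces a vector file instead.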
