1 Packages

The code on this page uses the Statsmodels, Pandas, Seaborn and Matplotlib packages. These can be installed from the terminal with:

$ python3.11 -m pip install statsmodels
$ python3.11 -m pip install pandas
$ python3.11 -m pip install seaborn
$ python3.11 -m pip install matplotlib

Replace python3.11 with the version of Python you are using. Once installed, these packages can be imported into your Python script via the following:

from statsmodels import api as sm
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt

2 Example Data

For this page we will use the “American National Election Survey 1996” dataset from Statsmodels (see the documentation here):

# Load the data
dataset = sm.datasets.anes96.load_pandas()

The data within this data set has 11 columns but we will only use three of them:

PID : The political party with which the survey respondent identified
educ : Education level
selfLR : Self-reported political leanings from “Left” to “Right” on a scale of 1 to 7

All three of these columns are numerically encoded: they contain numbers which represent their categorical meaning. We will decode the PID and educ columns as follows:

pid = {
    0: 'Strong Democrat',
    1: 'Weak Democrat',
    2: 'Indep-Democrat',
    3: 'Indep-Indep',
    4: 'Indep-Republican',
    5: 'Weak Republican',
    6: 'Strong Republican',
}
educ = {
    1: '1-8 grades',
    2: 'Some high school',
    3: 'High school',
    4: 'Some college',
    5: 'College degree',
    6: "Master's",
    7: 'PhD',
}
cols = ['PID', 'educ', 'selfLR']
df = dataset['data'].loc[:, cols]
df['PID'] = df['PID'].replace(pid)
df['educ'] = df['educ'].replace(educ)

print(df.head(10))

##                  PID          educ  selfLR
## 0  Strong Republican   High school     7.0
## 1      Weak Democrat  Some college     3.0
## 2      Weak Democrat      Master's     2.0
## 3      Weak Democrat      Master's     3.0
## 4    Strong Democrat      Master's     5.0
## 5      Weak Democrat  Some college     3.0
## 6      Weak Democrat  Some college     5.0
## 7   Indep-Republican  Some college     5.0
## 8        Indep-Indep  Some college     4.0
## 9    Strong Democrat   High school     3.0

3 Pivot Tables

A pivot table takes the groups from within a data frame and re-organises the data according to those groups. Pandas has two methods that will do this: .pivot() and .pivot_table(). The difference is that the first needs the data frame to already be grouped while the second can do the grouping, aggregating and pivoting in one step.

3.1 `.pivot()`

The fact that this method requires a data frame that is already grouped is not a problem because we can use the .groupby() method for this (see this page for more):

.groupby() outputs data in long format
.pivot() outputs data in wide format

Let’s have a go with getting the mean value for respondents’ political leanings, grouped by political party identification and education level:

grouped = df.groupby(['PID', 'educ'])['selfLR'].mean()
grouped = grouped.reset_index()

print(grouped.head())

##               PID            educ    selfLR
## 0  Indep-Democrat      1-8 grades  4.000000
## 1  Indep-Democrat  College degree  3.384615
## 2  Indep-Democrat     High school  3.892857
## 3  Indep-Democrat        Master's  3.555556
## 4  Indep-Democrat             PhD  3.200000

And, now that the data is grouped, we can pivot it:

pivot_table = grouped.pivot(index='PID', columns='educ', values='selfLR')

print(pivot_table)

## educ               1-8 grades  College degree  High school  Master's       PhD  Some college  Some high school
## PID                                                                                                           
## Indep-Democrat            4.0        3.384615     3.892857  3.555556  3.200000      4.266667          4.250000
## Indep-Indep               NaN        4.333333     3.833333  4.333333  4.000000      3.888889          4.666667
## Indep-Republican          6.0        4.875000     5.000000  5.000000  4.812500      4.937500          4.857143
## Strong Democrat           4.2        3.117647     3.762712  2.925000  2.545455      3.000000          4.421053
## Strong Republican         6.0        5.882353     5.642857  5.735849  5.680000      5.515152          6.250000
## Weak Democrat             4.0        3.647059     4.102041  3.195122  2.782609      3.916667          3.600000
## Weak Republican           NaN        4.933333     5.000000  5.000000  4.823529      5.175000          5.400000

3.2 Heat Map

This data is perfect for plotting in a heat map, although before we do that it would be clever to re-order the columns into their logical order. The easiest way to do that in this example is just to start again, so here we go:

# Load the data
dataset = sm.datasets.anes96.load_pandas()
# Select the columns
cols = ['PID', 'educ', 'selfLR']
df = dataset['data'].loc[:, cols]
# Group the data
grouped = df.groupby(['PID', 'educ'])['selfLR'].mean()
grouped = grouped.reset_index()
# Pivot
pivot_table = grouped.pivot(index='PID', columns='educ', values='selfLR')
pivot_table = pivot_table.rename(index=pid)
pivot_table = pivot_table.rename(columns=educ)
# Plot
label = 'Self-Reported Political Leanings'
sns.heatmap(pivot_table, annot=True, cbar_kws={'label': label})
plt.ylabel('Political Party Identified With')
plt.xlabel('Education Level')
plt.xticks(rotation=60)
plt.tight_layout()
plt.show()

3.3 `.pivot_table()`

Instead of chaining .groupby(), .mean(), .reset_index() and .pivot() we can do it all with .pivot_table() as follows:

# Load the data
dataset = sm.datasets.anes96.load_pandas()
# Select the columns
cols = ['PID', 'educ', 'selfLR']
df = dataset['data'].loc[:, cols]
# Decode the values
pid = {
    0: 'Strong Democrat',
    1: 'Weak Democrat',
    2: 'Indep-Democrat',
    3: 'Indep-Indep',
    4: 'Indep-Republican',
    5: 'Weak Republican',
    6: 'Strong Republican',
}
educ = {
    1: '1-8 grades',
    2: 'Some high school',
    3: 'High school',
    4: 'Some college',
    5: 'College degree',
    6: "Master's",
    7: 'PhD',
}
df['PID'] = df['PID'].replace(pid)
df['educ'] = df['educ'].replace(educ)

# Create a pivot table in one step
pivot_table = df.pivot_table(
    index='PID', columns='educ', values='selfLR', aggfunc='mean'
)

print(pivot_table)

## educ               1-8 grades  College degree  High school  Master's       PhD  Some college  Some high school
## PID                                                                                                           
## Indep-Democrat            4.0        3.384615     3.892857  3.555556  3.200000      4.266667          4.250000
## Indep-Indep               NaN        4.333333     3.833333  4.333333  4.000000      3.888889          4.666667
## Indep-Republican          6.0        4.875000     5.000000  5.000000  4.812500      4.937500          4.857143
## Strong Democrat           4.2        3.117647     3.762712  2.925000  2.545455      3.000000          4.421053
## Strong Republican         6.0        5.882353     5.642857  5.735849  5.680000      5.515152          6.250000
## Weak Democrat             4.0        3.647059     4.102041  3.195122  2.782609      3.916667          3.600000
## Weak Republican           NaN        4.933333     5.000000  5.000000  4.823529      5.175000          5.400000

Note that df.pivot_table() is the same as pd.pivot_table(df):

pivot_table = pd.pivot_table(
    df, index='PID', columns='educ', values='selfLR', aggfunc='mean'
)

print(pivot_table)

## educ               1-8 grades  College degree  High school  Master's       PhD  Some college  Some high school
## PID                                                                                                           
## Indep-Democrat            4.0        3.384615     3.892857  3.555556  3.200000      4.266667          4.250000
## Indep-Indep               NaN        4.333333     3.833333  4.333333  4.000000      3.888889          4.666667
## Indep-Republican          6.0        4.875000     5.000000  5.000000  4.812500      4.937500          4.857143
## Strong Democrat           4.2        3.117647     3.762712  2.925000  2.545455      3.000000          4.421053
## Strong Republican         6.0        5.882353     5.642857  5.735849  5.680000      5.515152          6.250000
## Weak Democrat             4.0        3.647059     4.102041  3.195122  2.782609      3.916667          3.600000
## Weak Republican           NaN        4.933333     5.000000  5.000000  4.823529      5.175000          5.400000

Using .reset_index() is an optional extra step that will return the data frame to the more recognisable format with numbers as the indices:

pivot_table = pivot_table.reset_index()

print(pivot_table)

## educ                PID  1-8 grades  College degree  High school  Master's       PhD  Some college  Some high school
## 0        Indep-Democrat         4.0        3.384615     3.892857  3.555556  3.200000      4.266667          4.250000
## 1           Indep-Indep         NaN        4.333333     3.833333  4.333333  4.000000      3.888889          4.666667
## 2      Indep-Republican         6.0        4.875000     5.000000  5.000000  4.812500      4.937500          4.857143
## 3       Strong Democrat         4.2        3.117647     3.762712  2.925000  2.545455      3.000000          4.421053
## 4     Strong Republican         6.0        5.882353     5.642857  5.735849  5.680000      5.515152          6.250000
## 5         Weak Democrat         4.0        3.647059     4.102041  3.195122  2.782609      3.916667          3.600000
## 6       Weak Republican         NaN        4.933333     5.000000  5.000000  4.823529      5.175000          5.400000

⇦ Back

Data Handling in Python:Pivot Tables

1 Packages

2 Example Data

3 Pivot Tables

3.1 .pivot()

3.2 Heat Map

3.3 .pivot_table()

Data Handling in Python:
Pivot Tables

3.1 `.pivot()`

3.3 `.pivot_table()`