1 From Long to Wide
2 From Wide to Long
- 2.1 Using gather() from tidyr
3 From Long to Wide via Summarising

A “pivot table” is a way of summarising the data in a table. For example, if we have the following results of three tests that were taken by two students:

Student	Test	Score
A	Test 1	66
B	Test 1	76
A	Test 2	80
B	Test 2	77
A	Test 3	59
B	Test 3	61

Then a pivot table would look as follows:

Student	Test 1	Test 2	Test 3
A	66	80	59
B	76	77	61

As you can see, the first table was long while the second was wide. That’s another way of talking about this type of operation: going from long to wide or, if you want to go the opposite way, from wide to long.

1 From Long to Wide

For this example we will use the “Gapminder” dataset, which can be installed as follows:

install.packages("gapminder", repos = "http://cran.us.r-project.org")

This dataset contains the life expectancy, population and GDP per capita of 142 countries as recorded in various years:

# Import the gapminder as a tibble
library(gapminder)
# Convert tibble to data frame
gapminder <- as.data.frame(gapminder)
# Display
head(gapminder, n = 20)
cat("\n\t\t\t\t\t\t\t...\n\n")
tail(gapminder)

##        country continent year lifeExp      pop gdpPercap
## 1  Afghanistan      Asia 1952  28.801  8425333  779.4453
## 2  Afghanistan      Asia 1957  30.332  9240934  820.8530
## 3  Afghanistan      Asia 1962  31.997 10267083  853.1007
## 4  Afghanistan      Asia 1967  34.020 11537966  836.1971
## 5  Afghanistan      Asia 1972  36.088 13079460  739.9811
## 6  Afghanistan      Asia 1977  38.438 14880372  786.1134
## 7  Afghanistan      Asia 1982  39.854 12881816  978.0114
## 8  Afghanistan      Asia 1987  40.822 13867957  852.3959
## 9  Afghanistan      Asia 1992  41.674 16317921  649.3414
## 10 Afghanistan      Asia 1997  41.763 22227415  635.3414
## 11 Afghanistan      Asia 2002  42.129 25268405  726.7341
## 12 Afghanistan      Asia 2007  43.828 31889923  974.5803
## 13     Albania    Europe 1952  55.230  1282697 1601.0561
## 14     Albania    Europe 1957  59.280  1476505 1942.2842
## 15     Albania    Europe 1962  64.820  1728137 2312.8890
## 16     Albania    Europe 1967  66.220  1984060 2760.1969
## 17     Albania    Europe 1972  67.690  2263554 3313.4222
## 18     Albania    Europe 1977  68.930  2509048 3533.0039
## 19     Albania    Europe 1982  70.420  2780097 3630.8807
## 20     Albania    Europe 1987  72.000  3075321 3738.9327
## 
##                          ...
## 
##       country continent year lifeExp      pop gdpPercap
## 1699 Zimbabwe    Africa 1982  60.363  7636524  788.8550
## 1700 Zimbabwe    Africa 1987  62.351  9216418  706.1573
## 1701 Zimbabwe    Africa 1992  60.377 10704340  693.4208
## 1702 Zimbabwe    Africa 1997  46.809 11404948  792.4500
## 1703 Zimbabwe    Africa 2002  39.989 11926563  672.0386
## 1704 Zimbabwe    Africa 2007  43.487 12311143  469.7093

As you can see, this is a long dataset. To pivot it and make it wide, you have two options:

1.1 Using spread() from tidyr

The spread() function allows us to choose which variable to use as the column headings and which to use as the data in the table:

library(tidyr)
# Select the columns we are interested in
long <- gapminder[c("country", "continent", "year", "pop")]
# Pivot
wide <- spread(long, year, pop)

Show the population in each country in each year sampled:

head(wide[, 1:12])

##       country continent     1952     1957     1962     1967     1972     1977     1982     1987     1992     1997
## 1 Afghanistan      Asia  8425333  9240934 10267083 11537966 13079460 14880372 12881816 13867957 16317921 22227415
## 2     Albania    Europe  1282697  1476505  1728137  1984060  2263554  2509048  2780097  3075321  3326498  3428038
## 3     Algeria    Africa  9279525 10270856 11000948 12760499 14760787 17152804 20033753 23254956 26298373 29072015
## 4      Angola    Africa  4232095  4561361  4826015  5247469  5894858  6162675  7016384  7874230  8735988  9875024
## 5   Argentina  Americas 17876956 19610538 21283783 22934225 24779799 26983828 29341374 31620918 33958947 36203463
## 6   Australia   Oceania  8691212  9712569 10794968 11872264 13177000 14074100 15184200 16257249 17481977 18565243

1.2 Using cast() from reshape

The cast() function allows us to do the same thing without needing to first trim down to just the columns we want:

library(reshape)
wide <- cast(gapminder, country ~ year, value = "pop")
head(wide[, 1:12])

##       country     1952     1957     1962     1967     1972     1977     1982     1987     1992     1997     2002
## 1 Afghanistan  8425333  9240934 10267083 11537966 13079460 14880372 12881816 13867957 16317921 22227415 25268405
## 2     Albania  1282697  1476505  1728137  1984060  2263554  2509048  2780097  3075321  3326498  3428038  3508512
## 3     Algeria  9279525 10270856 11000948 12760499 14760787 17152804 20033753 23254956 26298373 29072015 31287142
## 4      Angola  4232095  4561361  4826015  5247469  5894858  6162675  7016384  7874230  8735988  9875024 10866106
## 5   Argentina 17876956 19610538 21283783 22934225 24779799 26983828 29341374 31620918 33958947 36203463 38331121
## 6   Australia  8691212  9712569 10794968 11872264 13177000 14074100 15184200 16257249 17481977 18565243 19546792

Of course, we do want to keep the “continent” column there:

wide <- cast(gapminder, country + continent ~ year, value = "pop")
head(wide[, 1:12])

##       country continent     1952     1957     1962     1967     1972     1977     1982     1987     1992     1997
## 1 Afghanistan      Asia  8425333  9240934 10267083 11537966 13079460 14880372 12881816 13867957 16317921 22227415
## 2     Albania    Europe  1282697  1476505  1728137  1984060  2263554  2509048  2780097  3075321  3326498  3428038
## 3     Algeria    Africa  9279525 10270856 11000948 12760499 14760787 17152804 20033753 23254956 26298373 29072015
## 4      Angola    Africa  4232095  4561361  4826015  5247469  5894858  6162675  7016384  7874230  8735988  9875024
## 5   Argentina  Americas 17876956 19610538 21283783 22934225 24779799 26983828 29341374 31620918 33958947 36203463
## 6   Australia   Oceania  8691212  9712569 10794968 11872264 13177000 14074100 15184200 16257249 17481977 18565243

1.3 Using dcast() from reshape2

The reshape2 package has the acast() and the dcast() functions: acast() returns its output as a vector, matrix or array while dcast() returns it as a data frame. They are otherwise identical to cast() from the reshape package, although they use value.var instead of value as the keyword for the value parameter:

library(reshape2)
wide <- dcast(gapminder, country + continent ~ year, value.var = "pop")
head(wide[, 1:12])

##       country continent     1952     1957     1962     1967     1972     1977     1982     1987     1992     1997
## 1 Afghanistan      Asia  8425333  9240934 10267083 11537966 13079460 14880372 12881816 13867957 16317921 22227415
## 2     Albania    Europe  1282697  1476505  1728137  1984060  2263554  2509048  2780097  3075321  3326498  3428038
## 3     Algeria    Africa  9279525 10270856 11000948 12760499 14760787 17152804 20033753 23254956 26298373 29072015
## 4      Angola    Africa  4232095  4561361  4826015  5247469  5894858  6162675  7016384  7874230  8735988  9875024
## 5   Argentina  Americas 17876956 19610538 21283783 22934225 24779799 26983828 29341374 31620918 33958947 36203463
## 6   Australia   Oceania  8691212  9712569 10794968 11872264 13177000 14074100 15184200 16257249 17481977 18565243

Something that’s useful to remember is that the default aggregation function for dcast() is length. In other words, it will return a count of the number of elements relevant to the row and column headings by default. In this example, it gives the number of countries for which we have data at each time point and from each continent:

wide <- dcast(gapminder, continent ~ year)
print(wide)

##   continent 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 2002 2007
## 1    Africa   52   52   52   52   52   52   52   52   52   52   52   52
## 2  Americas   25   25   25   25   25   25   25   25   25   25   25   25
## 3      Asia   33   33   33   33   33   33   33   33   33   33   33   33
## 4    Europe   30   30   30   30   30   30   30   30   30   30   30   30
## 5   Oceania    2    2    2    2    2    2    2    2    2    2    2    2

2 From Wide to Long

Unsurprisingly, this is the opposite operation: going from a wide table to a long one.

2.1 Using gather() from tidyr

# Either explicitly define which columns to use
gathercols <- c(
    "1952", "1957", "1962", "1967", "1972", "1977",
    "1982", "1987", "1992", "1997", "2002", "2007"
)
# Or programmatically extract them from the column names
gathercols <- colnames(wide)[3:length(colnames(wide))]

# Convert from wide format to long format
long <- gather(wide, "year", "pop", all_of(gathercols), factor_key = TRUE)
head(long)

##   continent 1952 year pop
## 1    Africa   52 1957  52
## 2  Americas   25 1957  25
## 3      Asia   33 1957  33
## 4    Europe   30 1957  30
## 5   Oceania    2 1957   2
## 6    Africa   52 1962  52

3 From Long to Wide via Summarising

Lastly, we can generate summary statistics for the data in the table at the same time as we are pivoting it. For this example, let’s use the ChickWeight pre-loaded dataset which details the weights of chicks on different diets at different ages:

head(ChickWeight)

##   weight Time Chick Diet
## 1     42    0     1    1
## 2     51    2     1    1
## 3     59    4     1    1
## 4     64    6     1    1
## 5     76    8     1    1
## 6     93   10     1    1

Here’s how we can pivot the table and end up with the median weight of each chick over the time points:

chick_weights <- aggregate(
    weight ~ Chick + Diet, data = ChickWeight, FUN = median
)
chick_weights$Chick <- as.numeric(as.character(chick_weights$Chick))
chick_weights <- chick_weights[order(chick_weights$Chick), ]
head(chick_weights)

##    Chick Diet weight
## 15     1    1   99.5
## 17     2    1  112.5
## 14     3    1  107.0
## 11     4    1   94.5
## 18     5    1  123.5
## 12     6    1  132.5

⇦ Back

Data Handling in R:Pivot Tables