As an example, let’s use the Motor Trend car road test dataset, ‘mtcars’, which is one of the datasets that come pre-loaded in R. The first six rows look like this:
| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Mazda RX4 | 21.0 | 6 | 160 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 |
| Mazda RX4 Wag | 21.0 | 6 | 160 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| Datsun 710 | 22.8 | 4 | 108 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 |
| Hornet 4 Drive | 21.4 | 6 | 258 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
| Hornet Sportabout | 18.7 | 8 | 360 | 175 | 3.15 | 3.440 | 17.02 | 0 | 0 | 3 | 2 |
| Valiant | 18.1 | 6 | 225 | 105 | 2.76 | 3.460 | 20.22 | 1 | 0 | 3 | 1 |
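The printed outputs in this section show only six values per column, whereas the full ‘mtcars’ dataset has 32 rows, so the examples appear to have been run on just the first six rows. A minimal sketch of how you might reproduce that setup; this step is an assumption and is not shown in the original:

```r
# Print the first six rows of the built-in dataset (the table above)
print(head(mtcars))

# Assumption: keep only those six rows so later outputs match this section
mtcars <- head(mtcars)
nrow(mtcars)
```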
If your data is stored in a vector, or in a data frame (which the above data is; each column of a data frame is itself a vector), you can perform an operation on an entire column in one go. This is known as vectorisation. For example, we can convert the weight of the cars (stored in the “wt” column) from ‘thousands of pounds’ into ‘thousands of kilograms’ by dividing by 2.205, since 1 kg is about 2.205 lb:

```r
# Divide every value in the "wt" column by 2.205 in a single step
wt_1000kg <- mtcars$wt / 2.205
print(wt_1000kg)
## [1] 1.188209 1.303855 1.052154 1.458050 1.560091 1.569161
```
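The same idea works when combining two columns: R pairs up the values element by element, so row i of one column is combined with row i of the other. A short sketch, where the power-to-weight column ‘hp_per_wt’ and the copy ‘cars’ are illustrative names, not part of the dataset:

```r
# Work on a copy of the first six rows so 'mtcars' itself is unchanged
cars <- head(mtcars)

# Element-wise division: hp[1] / wt[1], hp[2] / wt[2], ... with no loop
cars$hp_per_wt <- cars$hp / cars$wt

print(round(cars$hp_per_wt, 1))
```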
Notice that this operation was applied to every row of the column at once. That is handy for simple tasks such as dividing by 2.205, which fit on one line, but it gets much more complicated when you want to do several things, or to decide what to do based on each value. Thankfully, you can also iterate through the data: look at it one row at a time and decide what to do with each.
A ‘for loop’ repeats the same operations a given number of times: you tell the programme to perform the same action FOR each value in a sequence. The syntax is as follows:

```r
for (i in 1:3) {
  print(i)
}
## [1] 1
## [1] 2
## [1] 3
```
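Note that you start with the word `for`, followed by an expression in round brackets, then one or more statements between curly brackets. This tells the programme: for each value of `i` in the sequence `1:3` (that is, 1, 2 and 3), run the code between the curly brackets, which prints the current value of `i`. Hence the output of this script is the numbers ‘1’, ‘2’ and ‘3’, in order.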
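To connect this back to the weight conversion earlier, here is a sketch of the same calculation written as a for loop. It gives the same result as the one-line vectorised version, which is why the vectorised form is usually preferred for simple arithmetic. The names ‘cars’ and ‘wt_loop’ are illustrative:

```r
cars <- head(mtcars)

# Preallocate a numeric vector with one slot per car
wt_loop <- numeric(length(cars$wt))

# seq_along(x) gives the positions 1, 2, ..., length(x)
for (i in seq_along(cars$wt)) {
  # Convert one value at a time and store it in the matching slot
  wt_loop[i] <- cars$wt[i] / 2.205
}

print(wt_loop)
```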
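The sequence doesn’t have to be numbers: a for loop can step through the elements of any vector, such as this character vector:

```r
for (x in c("one", "two", "three")) {
  print(x)
}
## [1] "one"
## [1] "two"
## [1] "three"
```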
The above example iterated over the elements of the vector, but you can also iterate over its indices (positions) like this:

```r
words <- c("one", "two", "three")
for (idx in 1:length(words)) {
  print(idx)
}
## [1] 1
## [1] 2
## [1] 3
```

However, the above code is not good practice. If the vector were empty, `length(words)` would be 0, and `1:0` evaluates to `c(1, 0)`, so the loop would still run twice, with indices that don’t exist. Instead, use the `seq_along()` function, which returns the index of every element from the first to the last, and an empty sequence (so the loop body never runs) when the vector is empty:

```r
for (idx in seq_along(words)) {
  print(idx)
}
## [1] 1
## [1] 2
## [1] 3
```
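`1:length(x)` and `seq_along(x)` give the same indices for a non-empty vector; they only differ when the vector is empty. A quick check, where the name ‘empty’ is illustrative:

```r
empty <- character(0)

# 1:length(empty) is 1:0, which counts down from 1 to 0
print(1:length(empty))

# seq_along() returns a zero-length sequence, so a loop over it never runs
print(seq_along(empty))

# Count how many times each loop body actually runs
runs_colon <- 0
for (idx in 1:length(empty)) runs_colon <- runs_colon + 1
runs_seq <- 0
for (idx in seq_along(empty)) runs_seq <- runs_seq + 1
print(c(runs_colon, runs_seq))
```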
Given the same ‘mtcars’ data frame as above, we can iterate over its columns directly. This works because a data frame is a list of column vectors, and a for loop steps through a list one element at a time:

```r
for (column in mtcars) {
  print(column)
}
## [1] 21.0 21.0 22.8 21.4 18.7 18.1
## [1] 6 6 4 6 8 6
## [1] 160 160 108 258 360 225
## [1] 110 110  93 110 175 105
## [1] 3.90 3.90 3.85 3.08 3.15 2.76
## [1] 2.620 2.875 2.320 3.215 3.440 3.460
## [1] 16.46 17.02 18.61 19.44 17.02 20.22
## [1] 0 0 1 1 0 1
## [1] 1 1 1 0 0 0
## [1] 4 4 4 3 3 3
## [1] 4 4 1 1 2 1
```
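Because each iteration hands you a whole column, you can compute a summary of each one. A sketch that collects the mean of every column of the six-row data; the names ‘cars’ and ‘col_means’ are illustrative, and base R’s `colMeans()` does the same job in one call:

```r
cars <- head(mtcars)

# Start with an empty numeric vector and append one mean per column
col_means <- numeric(0)
for (column in cars) {
  col_means <- c(col_means, mean(column))
}

print(col_means)
```

Note that `column` holds only the values, not the column’s name, so the result has no names.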
Sometimes it makes more sense to iterate over the column indices instead. This allows us to use the index of each column to access the corresponding information in a different object. For example, to print each column’s heading and then its contents, we tell the script to print the ith column heading and then the ith column’s contents, for each i from 1 to the number of columns in the data frame:

```r
for (i in seq_along(mtcars)) {
  print(colnames(mtcars)[i])  # the ith column heading
  print(mtcars[[i]])          # the ith column's contents
}
## [1] "mpg"
## [1] 21.0 21.0 22.8 21.4 18.7 18.1
## [1] "cyl"
## [1] 6 6 4 6 8 6
## [1] "disp"
## [1] 160 160 108 258 360 225
## [1] "hp"
## [1] 110 110  93 110 175 105
## [1] "drat"
## [1] 3.90 3.90 3.85 3.08 3.15 2.76
## [1] "wt"
## [1] 2.620 2.875 2.320 3.215 3.440 3.460
## [1] "qsec"
## [1] 16.46 17.02 18.61 19.44 17.02 20.22
## [1] "vs"
## [1] 0 0 1 1 0 1
## [1] "am"
## [1] 1 1 1 0 0 0
## [1] "gear"
## [1] 4 4 4 3 3 3
## [1] "carb"
## [1] 4 4 1 1 2 1
```
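Iterating over indices also makes it easy to fill a named result, since the same `i` picks out both the name and the values. A sketch with illustrative names, preallocating the result rather than growing it:

```r
cars <- head(mtcars)

# Preallocate one slot per column and label each slot with its column name
col_means <- numeric(ncol(cars))
names(col_means) <- colnames(cars)

for (i in seq_along(cars)) {
  # The same index i selects the ith column and the ith slot of the result
  col_means[i] <- mean(cars[[i]])
}

print(round(col_means, 2))
```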
Instead of iterating over a data frame’s columns, we can look at each of its rows in turn, like this:

```r
for (i in 1:nrow(mtcars)) {
  print(mtcars[i, ])
}
##           mpg cyl disp  hp drat   wt  qsec vs am gear carb
## Mazda RX4  21   6  160 110  3.9 2.62 16.46  0  1    4    4
##               mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4 Wag  21   6  160 110  3.9 2.875 17.02  0  1    4    4
##             mpg cyl disp hp drat   wt  qsec vs am gear carb
## Datsun 710 22.8   4  108 93 3.85 2.32 18.61  1  1    4    1
##                 mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Hornet 4 Drive 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
##                    mpg cyl disp  hp drat   wt  qsec vs am gear carb
## Hornet Sportabout 18.7   8  360 175 3.15 3.44 17.02  0  0    3    2
##          mpg cyl disp  hp drat   wt  qsec vs am gear carb
## Valiant 18.1   6  225 105 2.76 3.46 20.22  1  0    3    1
```
However, the above code is again not good practice. If the data frame had no rows, `nrow()` would return 0, and `1:0` evaluates to `c(1, 0)`, so the loop would still run twice. Use `seq_len(nrow(mtcars))` instead, which returns an empty sequence when there are no rows:

```r
for (i in seq_len(nrow(mtcars))) {
  print(mtcars[i, ])
}
##           mpg cyl disp  hp drat   wt  qsec vs am gear carb
## Mazda RX4  21   6  160 110  3.9 2.62 16.46  0  1    4    4
##               mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4 Wag  21   6  160 110  3.9 2.875 17.02  0  1    4    4
##             mpg cyl disp hp drat   wt  qsec vs am gear carb
## Datsun 710 22.8   4  108 93 3.85 2.32 18.61  1  1    4    1
##                 mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Hornet 4 Drive 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
##                    mpg cyl disp  hp drat   wt  qsec vs am gear carb
## Hornet Sportabout 18.7   8  360 175 3.15 3.44 17.02  0  0    3    2
##          mpg cyl disp  hp drat   wt  qsec vs am gear carb
## Valiant 18.1   6  225 105 2.76 3.46 20.22  1  0    3    1
```
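Looping over rows is useful when you need to make a decision for each one. A sketch that labels each car as "heavy" or "light"; the cut-off of 3 (thousand lb) and the names ‘cars’ and ‘weight_class’ are illustrative assumptions:

```r
cars <- head(mtcars)

# Preallocate a character vector with one slot per row
weight_class <- character(nrow(cars))

for (i in seq_len(nrow(cars))) {
  # Hypothetical cut-off of 3 (thousand lb), chosen only for illustration
  if (cars$wt[i] > 3) {
    weight_class[i] <- "heavy"
  } else {
    weight_class[i] <- "light"
  }
}

print(weight_class)
```

For a simple two-way choice like this, the vectorised `ifelse(cars$wt > 3, "heavy", "light")` gives the same result; the loop form pays off when each row needs several steps.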
A ‘while loop’ keeps performing a set of operations for as long as a condition is true, and stops as soon as it becomes false. Let’s imagine you have a jug that holds 1 litre and you are pouring 150 ml glasses of water into it. You want to pour as many glasses as you can without letting the jug overflow. Let’s set up the scenario:

```r
water_in_jug <- 0       # ml of water currently in the jug
capacity <- 1000        # the jug holds 1000 ml (1 litre)
glass_size <- 150       # each glass holds 150 ml
number_of_pours <- 0    # glasses poured so far
```
You start with 0 ml of water in the jug. The jug’s capacity is 1000 ml, each glass holds 150 ml, and you have poured 0 glasses so far. Now start pouring:

```r
# Keep pouring while the jug is not yet full
while (water_in_jug < capacity) {
  water_in_jug <- water_in_jug + glass_size
  number_of_pours <- number_of_pours + 1
}
print(number_of_pours)
## [1] 7
print(water_in_jug)
## [1] 1050
```
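Whoopsie! You poured 7 glasses of water into the jug and caused it to overflow by 50 ml! The loop only checks whether the jug is already full before each pour, not whether the next glass will fit. Think about how you would modify the code to stop it from pouring too much.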
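One possible answer is to check whether the next glass would still fit before pouring it, rather than whether the jug is already full. A sketch using the same variables as above:

```r
water_in_jug <- 0
capacity <- 1000
glass_size <- 150
number_of_pours <- 0

# Only pour if adding another glass keeps the jug at or below capacity
while (water_in_jug + glass_size <= capacity) {
  water_in_jug <- water_in_jug + glass_size
  number_of_pours <- number_of_pours + 1
}

print(number_of_pours)
print(water_in_jug)
```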
Let’s continue using the above example. Now that we know that 6 pours is the most we can perform before the jug overflows, we could use a for loop and set it to run exactly six times:
```r
for (i in 1:6) {
  water_in_jug <- i * glass_size  # total poured after i glasses
  number_of_pours <- i
}
print(number_of_pours)
## [1] 6
print(water_in_jug)
## [1] 900
```
Hurray! No overflow!
Alternatively, we could use a while loop that counts pours instead of checking the water level, and set it to run exactly six times:

```r
water_in_jug <- 0
capacity <- 1000
glass_size <- 150
number_of_pours <- 0
# Stop after six pours, regardless of how much water is in the jug
while (number_of_pours < 6) {
  water_in_jug <- water_in_jug + glass_size
  number_of_pours <- number_of_pours + 1
}
print(number_of_pours)
## [1] 6
print(water_in_jug)
## [1] 900
```
The opposite (using a for loop in place of a while loop) isn’t really possible: how could you set up a for loop to run until a condition is met, when you don’t know beforehand how many times it needs to run?
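The closest a for loop can get is to guess a generous upper bound and leave early with `break` once the condition is met. A sketch, where the bound of 100 pours is an arbitrary assumption; if the guess were too small, the loop would stop before the jug was full:

```r
water_in_jug <- 0
capacity <- 1000
glass_size <- 150
number_of_pours <- 0

# Assumption: 100 is an arbitrary upper bound, not derived from the problem
for (i in 1:100) {
  # Leave the loop as soon as the next glass would overflow the jug
  if (water_in_jug + glass_size > capacity) {
    break
  }
  water_in_jug <- water_in_jug + glass_size
  number_of_pours <- number_of_pours + 1
}

print(number_of_pours)
print(water_in_jug)
```

This is really a while loop in disguise, which is why a while loop is the natural choice when the number of repetitions isn’t known in advance.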