Here we have a data set from a fictitious research study that is incomplete - it only specifies when the participants visited the study centre for the first time, and lists these in the ‘Visit Number’ column as ‘Visit 1’:
df <- data.frame(
participant_id = c(102, 105, 105, 111, 111, 111),
visit_date = c(
"2018-02-20", "2018-04-24", "2019-02-15",
"2018-01-27", "2013-11-19", "2018-11-28"
),
visit_no = c("Visit 1", "Visit 1", NA, "Visit 1", NA, NA)
)
print(df)
## participant_id visit_date visit_no
## 1 102 2018-02-20 Visit 1
## 2 105 2018-04-24 Visit 1
## 3 105 2019-02-15 <NA>
## 4 111 2018-01-27 Visit 1
## 5 111 2013-11-19 <NA>
## 6 111 2018-11-28 <NA>
What we want to do is fill in the gaps: calculate from the dates which visits were ’Visit 0’s and ’Visit 2’s. More specifically, we can see that:
Let’s see how to calculate that automatically:
For each participant, we want to filter to get only their visit information, get the date of their Visit 1 and then assess, for each other visit, whether it came before or after this date:
# Iterate over the participant IDs
for (id in unique(df$participant_id)) {
# Extract data for this participant ID
this_id <- subset(df, participant_id == id)
# Date of visit 1
visit1 <- subset(this_id, visit_no == "Visit 1")[["visit_date"]]
# The visit before visit 1 is visit 0
for (i in rownames(this_id)) {
date <- df[[i, "visit_date"]]
if (visit1 > date) {
df[[i, "visit_no"]] <- "Visit 0"
} else if (visit1 < date) {
df[[i, "visit_no"]] <- "Visit 2"
}
}
}
print(df)
## participant_id visit_date visit_no
## 1 102 2018-02-20 Visit 1
## 2 105 2018-04-24 Visit 1
## 3 105 2019-02-15 Visit 2
## 4 111 2018-01-27 Visit 1
## 5 111 2013-11-19 Visit 0
## 6 111 2018-11-28 Visit 2