If you are running an experiment with multiple timepoints and want to select only the paired samples (i.e. the ones that have been tested twice), you need to filter by the number of times a value appears. Take the following example data:
df <- data.frame(
  id = c(
    "101-0001", "101-0001", "101-0002", "101-0002", "101-0003", "101-0004",
    "101-0005"
  ),
  timepoint = c("A", "B", "A", "B", "A", "A", "A")
)
print(df)
## id timepoint
## 1 101-0001 A
## 2 101-0001 B
## 3 101-0002 A
## 4 101-0002 B
## 5 101-0003 A
## 6 101-0004 A
## 7 101-0005 A
We can see that participants “101-0001” and “101-0002” have each been tested at two timepoints. To select only them, you can group by id and then filter for the ids that appear twice using dplyr:
library(dplyr, warn.conflicts = FALSE)
df %>%
  group_by(id) %>%
  filter(n() == 2) -> subset
print(subset)
## # A tibble: 4 × 2
## # Groups: id [2]
## id timepoint
## <chr> <chr>
## 1 101-0001 A
## 2 101-0001 B
## 3 101-0002 A
## 4 101-0002 B
…or, using only base R, you can split the data frame by id and count the number of rows in each group using nrow. Then, find the ids that appear twice and keep only the rows with those ids:
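# Count the rows for each id
res <- by(df, df$id, nrow)
# TRUE for ids that appear exactly twice
res <- res == 2
# Keep only the rows whose id appears twice
subset <- df[df$id %in% names(which(res)), ]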
print(subset)
## id timepoint
## 1 101-0001 A
## 2 101-0001 B
## 3 101-0002 A
## 4 101-0002 B
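Both approaches above count rows, so they assume each id appears at most once per timepoint. If an id could be recorded twice at the same timepoint, a row count of 2 would no longer guarantee a true pair. The sketch below is one possible base R variation that checks for the presence of both timepoints instead. The data frame df2 and its duplicated "101-0003" row are hypothetical, added only to illustrate the case:

```r
# Hypothetical data: "101-0003" was recorded twice at timepoint A (not a true pair)
df2 <- data.frame(
  id = c(
    "101-0001", "101-0001", "101-0002", "101-0002", "101-0003", "101-0003",
    "101-0004"
  ),
  timepoint = c("A", "B", "A", "B", "A", "A", "A")
)

# For each id, check whether both timepoints "A" and "B" are present
has_both <- tapply(df2$timepoint, df2$id, function(tp) all(c("A", "B") %in% tp))

# Keep only the rows belonging to ids that have both timepoints
paired_ids <- names(has_both)[has_both]
paired <- df2[df2$id %in% paired_ids, ]
print(paired)
```

Here only “101-0001” and “101-0002” are kept, whereas a plain row count of 2 would also have kept “101-0003”.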