The Problem

Let’s say we have the following incomplete dataset:

main <- data.frame(
    Record.Id = c(100, 200, 300, 400, 500, 600),
    Name = c("Alpha", "Bravo", NA, NA, "Echo", "Foxtrot"),
    Age = c(NA, NA, NA, NA, 25, 26),
    Height = c(NA, NA, 173, 174, NA, NA)
)
print(main)

##   Record.Id    Name Age Height
## 1       100   Alpha  NA     NA
## 2       200   Bravo  NA     NA
## 3       300    <NA>  NA    173
## 4       400    <NA>  NA    174
## 5       500    Echo  25     NA
## 6       600 Foxtrot  26     NA

Every record has at least one NA where data is missing: of the 18 values outside the Record.Id column, 10 are NA.
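As a quick check, the gaps can be counted per column. This is a minimal sketch using base R only; the data frame is repeated so the snippet runs on its own, and the variable name na_counts is just an illustrative choice.

```r
# Same data frame as above, repeated so this snippet is self-contained
main <- data.frame(
    Record.Id = c(100, 200, 300, 400, 500, 600),
    Name = c("Alpha", "Bravo", NA, NA, "Echo", "Foxtrot"),
    Age = c(NA, NA, NA, NA, 25, 26),
    Height = c(NA, NA, 173, 174, NA, NA)
)

# is.na() gives a TRUE/FALSE matrix; colSums() counts the TRUEs per column
na_counts <- colSums(is.na(main))
print(na_counts)
# Record.Id: 0, Name: 2, Age: 4, Height: 4

# Total number of missing values
print(sum(na_counts))
# 10
```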

Now let’s say that we have a supplementary dataset that contains some data that the main dataset doesn’t:

supplementary <- data.frame(
    Record.Id = c(300, 400, 500, 600, 100),
    Name = c("Charlie", "Delta", NA, NA, "Alpha"),
    Age = c(23, 24, NA, NA, 21),
    Height = c(173, 174, NA, 176, 171)
)
print(supplementary)

##   Record.Id    Name Age Height
## 1       300 Charlie  23    173
## 2       400   Delta  24    174
## 3       500    <NA>  NA     NA
## 4       600    <NA>  NA    176
## 5       100   Alpha  21    171

We could use this supplementary data to fill in some of the gaps, except that we have two problems:

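The supplementary dataset is not in the same order as the main dataset: its records run 300, 400, 500, 600, 100.
It is not the same size as the main dataset: it has five rows rather than six, and record 200 is absent.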
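Both problems can be confirmed from the Record.Id columns alone. The sketch below copies those two columns into plain vectors so that it runs on its own; the names main_ids, supp_ids, missing_ids and row_lookup are hypothetical, chosen just for this illustration.

```r
# Record.Id columns copied from the two data frames above
main_ids <- c(100, 200, 300, 400, 500, 600)
supp_ids <- c(300, 400, 500, 600, 100)

# Problem 2: the datasets differ in size (6 records versus 5)
print(length(main_ids) == length(supp_ids))
# FALSE

# Records that are in main but absent from supplementary
missing_ids <- setdiff(main_ids, supp_ids)
print(missing_ids)
# 200

# Problem 1: the rows are in a different order.
# For each main record, find the supplementary row that holds it (NA if none)
row_lookup <- match(main_ids, supp_ids)
print(row_lookup)
# 5 NA 1 2 3 4
```

If the two datasets were aligned, row_lookup would simply be 1 to 6; instead it shows that record 100 sits in row 5 of the supplementary data and record 200 has no counterpart at all.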

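So we can’t simply lay one data frame on top of the other and combine them row by row. Essentially, what we want to do is merge the two data frames, but “merge” has a very specific meaning in R: merge() performs a database-style join, matching rows on key columns and placing the remaining columns from each data frame side by side. It does not fill an NA in one data frame with the corresponding value from the other, so it’s not what we want in this situation.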
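To see why merge() falls short here, this is a minimal sketch of what it returns when the two data frames are joined on Record.Id. The data frames are repeated so the snippet runs on its own, and the variable name joined is an illustrative choice.

```r
# Both data frames from above, repeated so this snippet is self-contained
main <- data.frame(
    Record.Id = c(100, 200, 300, 400, 500, 600),
    Name = c("Alpha", "Bravo", NA, NA, "Echo", "Foxtrot"),
    Age = c(NA, NA, NA, NA, 25, 26),
    Height = c(NA, NA, 173, 174, NA, NA)
)
supplementary <- data.frame(
    Record.Id = c(300, 400, 500, 600, 100),
    Name = c("Charlie", "Delta", NA, NA, "Alpha"),
    Age = c(23, 24, NA, NA, 21),
    Height = c(173, 174, NA, 176, 171)
)

# Join the two data frames on the shared key column
joined <- merge(main, supplementary, by = "Record.Id")

# The non-key columns are duplicated with .x/.y suffixes, not combined
print(names(joined))
# "Record.Id" "Name.x" "Age.x" "Height.x" "Name.y" "Age.y" "Height.y"

# The default is an inner join, so record 200 (main only) is dropped
print(nrow(joined))
# 5
```

The result keeps the NA values from main in the .x columns and puts the supplementary values next to them in the .y columns, so the gaps are still there and one record has been lost.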

Data Handling in R:
Filling in Missing Data
