This example uses the built-in ‘chickwts’ dataset, which records the weights of 71 chickens that were fed different diets.

An experiment was conducted to measure and compare the effectiveness of various feed supplements on the growth rate of chickens. Newly-hatched chicks were randomly allocated into six groups, and each group was given a different feed supplement. Their weights in grams after six weeks are given along with feed types.

Here’s an idea of what the data looks like:

print(head(chickwts, 15))

##    weight      feed
## 1     179 horsebean
## 2     160 horsebean
## 3     136 horsebean
## 4     227 horsebean
## 5     217 horsebean
## 6     168 horsebean
## 7     108 horsebean
## 8     124 horsebean
## 9     143 horsebean
## 10    140 horsebean
## 11    309   linseed
## 12    229   linseed
## 13    181   linseed
## 14    141   linseed
## 15    260   linseed

Let’s see what the six groups (type of feed given to the chickens) are:

for (feed_type in unique(chickwts$feed)) {
    print(feed_type)
}

## [1] "horsebean"
## [1] "linseed"
## [1] "soybean"
## [1] "sunflower"
## [1] "meatmeal"
## [1] "casein"
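It can also help to know how many chicks are in each group. A minimal sketch using base R's table() function is shown here; the variable name group_sizes is purely illustrative:

```r
# Count how many chicks received each feed supplement
group_sizes <- table(chickwts$feed)
print(group_sizes)

# The group sizes should add up to the number of rows in the data frame
print(sum(group_sizes) == nrow(chickwts))
```

The groups do not need to be the same size for the test used later on this page.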

Which Statistical Test Should Be Used?

The (alternative) hypothesis is that there is a difference in growth rate between chickens fed different diets, i.e. that the chickens will weigh a different amount depending on which diet they were fed
The variable being measured is weight (mass), which is continuous
Specifically, the hypothesis is that the average weight will differ depending on which diet was used (a rank-based test compares the typical value of each group rather than its arithmetic mean)
There are more than two groups (there are six)
We cannot assume that the data are Normally distributed, so a nonparametric test should be used
If one chicken grows particularly fast, it will not affect the growth rate of any of the other chickens. Thus the measurements are independent.
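The points above can be explored with a quick per-group summary before any test is run. This is only a sketch: the names group_medians, group_iqrs and summary_table are illustrative, and using the median and interquartile range is an assumption about which summaries suit a rank-based comparison.

```r
# Median weight (in grams) for each feed type
group_medians <- tapply(chickwts$weight, chickwts$feed, median)
# Interquartile range of weight for each feed type, as a measure of spread
group_iqrs <- tapply(chickwts$weight, chickwts$feed, IQR)

# Collect the summaries in one data frame, one row per feed type
summary_table <- data.frame(
    median_weight = as.vector(group_medians),
    iqr_weight = as.vector(group_iqrs),
    row.names = names(group_medians)
)

# Show the groups ordered from lightest to heaviest median weight
print(summary_table[order(summary_table$median_weight), ])
```

Large differences between the group medians relative to the spread within groups would be in line with the alternative hypothesis, although only the test itself quantifies the evidence.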

Working through the flowchart with these answers, we see that we should use the Kruskal-Wallis one-way ANOVA on ranks, also known as the Kruskal-Wallis rank sum test.

Performing the Kruskal-Wallis Test

The Kruskal-Wallis test is performed using the kruskal.test() function:

The independent variable is ‘feed’ and the dependent variable is ‘weight’
When an entire data frame is used as the input, the kruskal.test() function uses the ‘tilde’ (formula) notation. This format requires that the dependent variable be specified first, followed by a tilde (“~”, which in an R formula can be read as ‘is modelled by’), followed by the independent variable: weight ~ feed
The data frame being used (‘chickwts’) is indicated using the ‘data’ keyword argument

kruskal.test(weight ~ feed, data = chickwts)

## 
##  Kruskal-Wallis rank sum test
## 
## data:  weight by feed
## Kruskal-Wallis chi-squared = 37.343, df = 5, p-value = 5.113e-07
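The formula notation is not the only way to call the function. As a sketch, kruskal.test() also accepts the measurements and the grouping factor as two separate vectors, which should give the same result. The variable names k_formula and k_vectors are illustrative:

```r
# Formula interface: dependent variable ~ independent variable
k_formula <- kruskal.test(weight ~ feed, data = chickwts)

# Default interface: a vector of values and a factor giving each value's group
k_vectors <- kruskal.test(chickwts$weight, chickwts$feed)

# Both calls should produce the same test statistic and p-value
print(all.equal(unname(k_formula$statistic), unname(k_vectors$statistic)))
print(all.equal(k_formula$p.value, k_vectors$p.value))
```

The formula interface is usually more convenient when the data are already in a data frame, while the vector interface suits values that are stored separately.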

As we can see from the output, the p-value was 5.113e-07, which is strong evidence against the null hypothesis and in favour of the alternative hypothesis (which is that chicks grow at different rates when fed these different diets). The individual values that have been returned can be accessed by assigning the result of kruskal.test() to a variable and then extracting its elements with the $ operator:

k <- kruskal.test(weight ~ feed, data = chickwts)

# Get the Kruskal-Wallis rank sum statistic
chi_squared <- k$statistic
# Get the degrees of freedom of the approximate chi-squared distribution of
# the test statistic
df <- k$parameter
# Get the p-value of the test
p <- k$p.value
# Get the character string "Kruskal-Wallis rank sum test"
name_of_test <- k$method
# Get a character string giving the names of the data
comparison <- k$data.name

# Print them all
print(chi_squared)
print(df)
print(p)
print(name_of_test)
print(comparison)

## Kruskal-Wallis chi-squared 
##                   37.34272 
## df 
##  5 
## [1] 5.11283e-07
## [1] "Kruskal-Wallis rank sum test"
## [1] "weight by feed"
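To see where these numbers come from, the statistic can be recomputed from the ranks. The sketch below follows the usual textbook formula for the Kruskal-Wallis H statistic with a correction for tied values. It is an illustration rather than a replacement for kruskal.test(), and all variable names are illustrative.

```r
x <- chickwts$weight
g <- chickwts$feed
n <- length(x)

# Rank all weights together, ignoring group (ties receive average ranks)
r <- rank(x)

# Sum of ranks and number of chicks in each feed group
rank_sums <- tapply(r, g, sum)
group_sizes <- tapply(r, g, length)

# Uncorrected H statistic
h <- 12 / (n * (n + 1)) * sum(rank_sums^2 / group_sizes) - 3 * (n + 1)

# Correct for ties in the weights
ties <- table(x)
h <- h / (1 - sum(ties^3 - ties) / (n^3 - n))

# Degrees of freedom: number of groups minus one
dof <- nlevels(g) - 1

# p-value from the upper tail of the chi-squared distribution
p_manual <- pchisq(h, df = dof, lower.tail = FALSE)

print(h)
print(dof)
print(p_manual)
```

The three printed values should agree with the statistic, degrees of freedom and p-value returned by kruskal.test().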

Statistics in R: Kruskal-Wallis Rank Sum Test
