Solution to statistical test task

Compare the flipper length of penguins using statistical tests

First, create subsets of the flipper length of all species as vectors:

library(tidyverse)
adelie <- filter(penguins, species == "Adelie")$flipper_len
gentoo <- filter(penguins, species == "Gentoo")$flipper_len
chinstrap <- filter(penguins, species == "Chinstrap")$flipper_len

Step 1: Test all groups for normality

With Shapiro-Wilk

The Shapiro-Wilk test suggests that Adelie and Chinstrap are normally distributed, but Gentoo is not.

shapiro.test(adelie)


    Shapiro-Wilk normality test

data:  adelie
W = 0.99339, p-value = 0.72

shapiro.test(gentoo)


    Shapiro-Wilk normality test

data:  gentoo
W = 0.96219, p-value = 0.00162

shapiro.test(chinstrap)


    Shapiro-Wilk normality test

data:  chinstrap
W = 0.98891, p-value = 0.8106

With QQ-plots

library(patchwork) # using patchwork to combine multiple plots into one
ch <- ggpubr::ggqqplot(chinstrap) + labs(title = "chinstrap")
ad <- ggpubr::ggqqplot(adelie) + labs(title = "adelie")
ge <- ggpubr::ggqqplot(gentoo) + labs(title = "gentoo")

ch + ge + ad

Compare Chinstrap vs. Adelie

Both are normally distributed, so we can compare variances for Chinstrap and Adelie penguins:

var.test(chinstrap, adelie)


    F test to compare two variances

data:  chinstrap and adelie
F = 1.1894, num df = 67, denom df = 150, p-value = 0.3854
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.8022092 1.8217141
sample estimates:
ratio of variances 
          1.189396

This indicates that the variances do not differ between Chinstrap and Adelie penguins. Therefore, we can use a t-test to compare mean flipper length:

t.test(chinstrap, adelie)


    Welch Two Sample t-test

data:  chinstrap and adelie
t = 5.7804, df = 119.68, p-value = 6.049e-08
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 3.859244 7.880530
sample estimates:
mean of x mean of y 
 195.8235  189.9536

The t-test suggests, that the mean flipper length differs between Chinstrap and Adelie penguins.

Compare Chinstrap vs. Gentoo

Gentoo is not normally distributed, so we can use a Wilcoxon-rank-sum test to compare means:

wilcox.test(chinstrap, gentoo)


    Wilcoxon rank sum test with continuity correction

data:  chinstrap and gentoo
W = 92, p-value < 2.2e-16
alternative hypothesis: true location shift is not equal to 0

The Wilcoxon tests suggests that the flipper lengths differ between Chinstrap and Gentoo penguins.

Compare Adelie vs. Gentoo

Gentoo is not normally distributed, so we can use a Wilcoxon-rank-sum test to compare means:

wilcox.test(adelie, gentoo)


    Wilcoxon rank sum test with continuity correction

data:  adelie and gentoo
W = 26, p-value < 2.2e-16
alternative hypothesis: true location shift is not equal to 0

The Wilcoxon tests suggests that the flipper lengths differ between Adelie and Gentoo penguins.

Visual comparison

The boxplot supports the findings of the statistical tests. Notches do not overlap.

penguins |>
  ggplot(aes(x = species, y = flipper_len)) +
  geom_boxplot(notch = TRUE)

Similarly the standard errors of the means do not overlap between the species.

penguins |>
  ggplot(aes(x = species, y = flipper_len)) +
  stat_summary()

To add the significance levels of the tests performed, I used the geom_signif() layer provided by the ggsignif package.

Since I performed two different tests (t-test for the comparison of Chinstrap and Adelie penguins, Wilcoxon test for the other two comparisons), I added two different significance layers that distinguish the two tests.

For the boxplot:

library(ggsignif)
penguins_boxplot <- penguins |>
  ggplot(aes(x = species, y = flipper_len)) +
  geom_boxplot(notch = TRUE) +
  geom_signif(
    comparisons = list(
      c("Chinstrap", "Adelie")
    ),
    test = "t.test",
    test.args = list(var.equal = TRUE),
    map_signif_level = TRUE
  ) +
  geom_signif(
    comparisons = list(
      c("Chinstrap", "Gentoo"),
      c("Gentoo", "Adelie")
    ),
    test = "wilcox.test",
    y_position = c(235, 240),
    map_signif_level = TRUE
  )

# To print the plot, just type its name:
penguins_boxplot

For a plot with mean and errorbar:

penguins |>
  ggplot(aes(x = species, y = flipper_len)) +
  stat_summary() +
  geom_signif(
    comparisons = list(
      c("Chinstrap", "Adelie")
    ),
    test = "t.test",
    test.args = list(var.equal = TRUE),
    map_signif_level = TRUE,
    y_position = 200,
    tip_length = 0.01
  ) +
  geom_signif(
    comparisons = list(
      c("Chinstrap", "Gentoo"),
      c("Gentoo", "Adelie")
    ),
    test = "wilcox.test",
    y_position = c(216, 218),
    map_signif_level = TRUE,
    tip_length = 0.01
  )

You can also do a barplot with error bars. In this case, the error bars are barely visible, because the standard error of the mean is quite small compared to the mean.

ggplot(penguins, aes(x = species, y = flipper_len)) +
  stat_summary(
    fun.data = mean_se,
    geom = "errorbar",
    width = 0.2
  ) +
  stat_summary(
    fun = mean,
    geom = "bar",
    fill = "grey70"
  ) +
  geom_signif(
    comparisons = list(
      c("Chinstrap", "Adelie")
    ),
    test = "t.test",
    test.args = list(var.equal = TRUE),
    map_signif_level = TRUE,
    y_position = 205,
    tip_length = 0.01
  ) +
  geom_signif(
    comparisons = list(
      c("Chinstrap", "Gentoo"),
      c("Gentoo", "Adelie")
    ),
    test = "wilcox.test",
    y_position = c(221, 230),
    map_signif_level = TRUE,
    tip_length = 0.01
  )

For the fast ones

Of course you can further improve the appearance of you plots. Here is an example of a simple, yet elegant version of the boxplot from above:

penguins_boxplot +
  labs(
    y = "Flipper length (mm)"
  ) +
  theme_classic() +
  theme(
    text = element_text(size = 16),
    axis.title.x = element_blank(),
    panel.grid.major.y = element_line(color = "grey90")
  )