Solution to statistical test task

1 Compare the flipper length of penguins using statistical tests

First, create subsets of the flipper length of all species as vectors:

library(palmerpenguins)
library(tidyverse)
adelie <- filter(penguins, species == "Adelie")$flipper_length_mm
gentoo <- filter(penguins, species == "Gentoo")$flipper_length_mm
chinstrap <- filter(penguins, species == "Chinstrap")$flipper_length_mm

1.1 Step 1: Test all groups for normality

With Shapiro-Wilk

The Shapiro-Wilk test suggests that Adelie and Chinstrap are normally distributed, but Gentoo is not.

shapiro.test(adelie)

    Shapiro-Wilk normality test

data:  adelie
W = 0.99339, p-value = 0.72
shapiro.test(gentoo)

    Shapiro-Wilk normality test

data:  gentoo
W = 0.96219, p-value = 0.00162
shapiro.test(chinstrap)

    Shapiro-Wilk normality test

data:  chinstrap
W = 0.98891, p-value = 0.8106

With QQ-plots

library(patchwork) # using patchwork to combine multiple plots into one
ch <- ggpubr::ggqqplot(chinstrap) + labs(title = "chinstrap")
ad <- ggpubr::ggqqplot(adelie) + labs(title = "adelie")
ge <- ggpubr::ggqqplot(gentoo) + labs(title = "gentoo")

ch + ge + ad

1.2 Compare Chinstrap vs. Adelie

Both are normally distributed, so we can compare variances for Chinstrap and Adelie penguins:

var.test(chinstrap, adelie)

    F test to compare two variances

data:  chinstrap and adelie
F = 1.1894, num df = 67, denom df = 150, p-value = 0.3854
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.8022092 1.8217141
sample estimates:
ratio of variances 
          1.189396 

This indicates that the variances do not differ between Chinstrap and Adelie penguins. Therefore, we can use a t-test to compare mean flipper length:

t.test(chinstrap, adelie)

    Welch Two Sample t-test

data:  chinstrap and adelie
t = 5.7804, df = 119.68, p-value = 6.049e-08
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 3.859244 7.880530
sample estimates:
mean of x mean of y 
 195.8235  189.9536 

The t-test suggests, that the mean flipper length differs between Chinstrap and Adelie penguins.

1.3 Compare Chinstrap vs. Gentoo

Gentoo is not normally distributed, so we can use a Wilcoxon-rank-sum test to compare means:

wilcox.test(chinstrap, gentoo)

    Wilcoxon rank sum test with continuity correction

data:  chinstrap and gentoo
W = 92, p-value < 2.2e-16
alternative hypothesis: true location shift is not equal to 0

The Wilcoxon tests suggests that the flipper lengths differ between Chinstrap and Gentoo penguins.

1.4 Compare Adelie vs. Gentoo

Gentoo is not normally distributed, so we can use a Wilcoxon-rank-sum test to compare means:

wilcox.test(adelie, gentoo)

    Wilcoxon rank sum test with continuity correction

data:  adelie and gentoo
W = 26, p-value < 2.2e-16
alternative hypothesis: true location shift is not equal to 0

The Wilcoxon tests suggests that the flipper lengths differ between Adelie and Gentoo penguins.

1.5 Visual comparison

The boxplot supports the findings of the statistical tests. Notches do not overlap.

penguins |> 
  ggplot(aes(x = species, y = flipper_length_mm)) +
  geom_boxplot(notch = TRUE)

Similarly the standard errors of the means do not overlap between the species.

penguins |>
  ggplot(aes(x = species, y = flipper_length_mm)) +
  stat_summary() 

To add the significance levels of the tests performed, I used the geom_signif() layer provided by the ggsignif package.

Since I performed two different tests (t-test for the comparison of Chinstrap and Adelie penguins, Wilcoxon test for the other two comparisons), I added two different significance layers that distinguish the two tests.

For the boxplot:

library(ggsignif)
penguins |> 
  ggplot(aes(x = species, y = flipper_length_mm)) +
  geom_boxplot(notch = TRUE) +
  geom_signif(
    comparisons = list(
      c("Chinstrap", "Adelie")
    ),
    test = "t.test",
    test.args = list(var.equal = TRUE),
    map_signif_level = TRUE
  ) +
  geom_signif(
    comparisons = list(
      c("Chinstrap", "Gentoo"),
      c("Gentoo", "Adelie")
    ),
    test = "wilcox.test",
    y_position = c(235, 240),
    map_signif_level = TRUE
  )

For a plot with mean and errorbar:

penguins |> 
  ggplot(aes(x = species, y = flipper_length_mm)) +
  stat_summary() +
  geom_signif(
    comparisons = list(
      c("Chinstrap", "Adelie")
    ),
    test = "t.test",
    test.args = list(var.equal = TRUE),
    map_signif_level = TRUE,
    y_position = 200,
    tip_length = 0.01
  ) +
  geom_signif(
    comparisons = list(
      c("Chinstrap", "Gentoo"),
      c("Gentoo", "Adelie")
    ),
    test = "wilcox.test",
    y_position = c(216, 218),
    map_signif_level = TRUE,
    tip_length = 0.01
  )