Expand to reproduce the data
library(tibble)
set.seed(123)
mydata <- tibble(
  normal = rnorm(
    n = 200,
    mean = 50,
    sd = 5
  ),
  non_normal = runif(
    n = 200,
    min = 45,
    max = 55
  )
)
Day 3 - Introduction to Data Analysis with R
Freie Universität Berlin - Theoretical Ecology
March 14, 2025
There are various tests and the outcome might differ!
Shapiro-Wilk-Test
Visual tests: QQ-Plot
A tibble with two variables: normal and non_normal
\(H_0\): Data does not differ from a normal distribution
The data does not deviate significantly from a normal distribution (Shapiro-Wilk-Test, W = 0.991, p = 0.23).
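A minimal sketch of how such a result could be obtained with shapiro.test() from base R. It recreates the example data with the same seed and parameters as above, but uses a plain data.frame instead of a tibble (an assumption made here to avoid extra packages).

```r
# Recreate the example data (same seed and parameters as the data above)
set.seed(123)
mydata <- data.frame(
  normal = rnorm(n = 200, mean = 50, sd = 5),
  non_normal = runif(n = 200, min = 45, max = 55)
)

# Shapiro-Wilk test for each variable
# H0: the data do not differ from a normal distribution
res_normal <- shapiro.test(mydata$normal)
res_non_normal <- shapiro.test(mydata$non_normal)

res_normal
res_non_normal
```

A p-value above 0.05 means that H0 is not rejected; it does not prove that the data are normally distributed.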
Points should match the straight line. Small deviations are okay.
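One possible way to draw such a QQ-plot with ggplot2, sketched for the normal variable only. The object name qq_plot is chosen here for illustration.

```r
library(ggplot2)

# Recreate the normally distributed example variable
set.seed(123)
mydata <- data.frame(normal = rnorm(n = 200, mean = 50, sd = 5))

qq_plot <- ggplot(mydata, aes(sample = normal)) +
  stat_qq() +      # sample quantiles against theoretical normal quantiles
  stat_qq_line()   # reference line through the first and third quartiles

qq_plot
```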
Counts of insects in agricultural units treated with different insecticides.
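The vectors TreatA, TreatB and TreatC used in the tests are not defined in this section. A sketch of how they could be built from the InsectSprays data that ships with R. Which spray belongs to which vector is an assumption: the sample means printed in the t-test outputs of this section (14.5, 15.33 and 3.5) match sprays A, B and E of the built-in data, so TreatC is taken from spray E here.

```r
# InsectSprays ships with R (package datasets): columns count and spray
data(InsectSprays)

# Assumption: TreatA and TreatB are the counts for sprays A and B
TreatA <- InsectSprays$count[InsectSprays$spray == "A"]
TreatB <- InsectSprays$count[InsectSprays$spray == "B"]

# Assumption: TreatC corresponds to spray E (its mean of 3.5 matches the output)
TreatC <- InsectSprays$count[InsectSprays$spray == "E"]

# Group means as a quick check against the printed test outputs
sapply(list(TreatA = TreatA, TreatB = TreatB, TreatC = TreatC), mean)
```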
First, test for normal distribution!
F-Test
Levene test
First, test for normal distribution
shapiro.test(TreatA)
#>
#> Shapiro-Wilk normality test
#>
#> data: TreatA
#> W = 0.95757, p-value = 0.7487
shapiro.test(TreatB)
#>
#> Shapiro-Wilk normality test
#>
#> data: TreatB
#> W = 0.95031, p-value = 0.6415
shapiro.test(TreatC)
#>
#> Shapiro-Wilk normality test
#>
#> data: TreatC
#> W = 0.92128, p-value = 0.2967
Result: None of the 3 treatments deviates significantly from a normal distribution.
\(H_0\): Variances do not differ between groups
var.test(TreatA, TreatB)
#>
#> F test to compare two variances
#>
#> data: TreatA and TreatB
#> F = 1.2209, num df = 11, denom df = 11, p-value = 0.7464
#> alternative hypothesis: true ratio of variances is not equal to 1
#> 95 percent confidence interval:
#> 0.3514784 4.2411442
#> sample estimates:
#> ratio of variances
#> 1.22093
Variances of sprays A & B don’t differ significantly (F-Test, \(F_{11,11}\) = 1.22, p = 0.75)
\(H_0\): Variances do not differ between groups
var.test(TreatA, TreatC)
#>
#> F test to compare two variances
#>
#> data: TreatA and TreatC
#> F = 7.4242, num df = 11, denom df = 11, p-value = 0.002435
#> alternative hypothesis: true ratio of variances is not equal to 1
#> 95 percent confidence interval:
#> 2.137273 25.789584
#> sample estimates:
#> ratio of variances
#> 7.424242
Variances of sprays A & C differ significantly (F-Test, \(F_{11,11}\) = 7.42, p = 0.002)
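The Levene test is listed as an alternative to the F-Test but has no example here. A minimal sketch in base R, assuming the median-centered variant (Brown-Forsythe): a one-way ANOVA on the absolute deviations from the group medians. The names two_sprays, abs_dev and levene_fit are chosen for illustration. The car package offers leveneTest() as a packaged version.

```r
data(InsectSprays)

# Keep only the two groups to compare and drop unused factor levels
two_sprays <- droplevels(subset(InsectSprays, spray %in% c("A", "B")))

# Absolute deviation of each count from the median of its group
group_median <- ave(two_sprays$count, two_sprays$spray, FUN = median)
two_sprays$abs_dev <- abs(two_sprays$count - group_median)

# One-way ANOVA on the absolute deviations
# H0: variances do not differ between groups
levene_fit <- anova(lm(abs_dev ~ spray, data = two_sprays))
levene_fit
```

Unlike the F-Test, this approach also works for more than two groups and is less sensitive to deviations from normality.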
t-test
Welch-Test (corrected t-test)
Wilcoxon rank sum test
\(H_0\): The samples do not differ in their mean
Treatment A and B: normally distributed and equal variance
t.test(TreatA, TreatB, var.equal = TRUE)
#>
#> Two Sample t-test
#>
#> data: TreatA and TreatB
#> t = -0.45352, df = 22, p-value = 0.6546
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#> -4.643994 2.977327
#> sample estimates:
#> mean of x mean of y
#> 14.50000 15.33333
The means of spray A and B don’t differ significantly (t = -0.45, df = 22, p = 0.66)
\(H_0\): The samples do not differ in their mean
Treatment A and C: normally distributed and non-equal variance
t.test(TreatA, TreatC, var.equal = FALSE)
#>
#> Welch Two Sample t-test
#>
#> data: TreatA and TreatC
#> t = 7.5798, df = 13.91, p-value = 2.655e-06
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#> 7.885546 14.114454
#> sample estimates:
#> mean of x mean of y
#> 14.5 3.5
The means of spray A and C do differ significantly (t = 7.58, df = 13.9, p < 0.001)
\(H_0\): The samples come from the same distribution (no shift in location)
We don’t need the Wilcoxon test to compare treatment A and B, but for the sake of an example:
Spray A and B do not differ significantly in their location (Wilcoxon rank sum test, W = 62, p = 0.58)
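The call that produces this result is not shown. A sketch with wilcox.test() from base R, assuming that TreatA and TreatB are the counts for sprays A and B in the built-in InsectSprays data.

```r
data(InsectSprays)

# Assumption: the two treatments are sprays A and B of the built-in data
TreatA <- InsectSprays$count[InsectSprays$spray == "A"]
TreatB <- InsectSprays$count[InsectSprays$spray == "B"]

# The counts contain ties, so an exact p-value cannot be computed;
# exact = FALSE requests the normal approximation explicitly
wilcox_res <- wilcox.test(TreatA, TreatB, exact = FALSE)
wilcox_res
```

The test compares ranks rather than the raw values, which is why it does not require normally distributed data.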
Are there pairs of data points?
Example: samples of invertebrates across various rivers before and after sewage plants.
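A sketch of a paired t-test for such a design. The counts below are made up for illustration only, and the names before, after and paired_res are hypothetical.

```r
# Hypothetical invertebrate counts at eight rivers (made-up numbers)
before <- c(32, 41, 27, 35, 38, 29, 44, 31)  # before the sewage plant
after  <- c(25, 36, 28, 27, 30, 24, 39, 26)  # after the sewage plant

# Each river contributes one pair, so the test works on the
# within-river differences instead of two independent samples
paired_res <- t.test(before, after, paired = TRUE)
paired_res
```

For paired data that are not normally distributed, wilcox.test() accepts the same paired = TRUE argument.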
ggsignif
The ggsignif package offers a geom_signif() layer that can be added to a ggplot to annotate significance levels.
geom_signif()
test: run a specific test
test.args: pass additional arguments in a list
?geom_signif for more options
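A sketch of how the test and test.args arguments of geom_signif() could be used on a boxplot of the InsectSprays data. The chosen comparisons and the name signif_plot are assumptions for illustration; ggplot2 and ggsignif need to be installed.

```r
library(ggplot2)

signif_plot <- ggplot(InsectSprays, aes(x = spray, y = count)) +
  geom_boxplot() +
  ggsignif::geom_signif(
    comparisons = list(c("A", "B"), c("A", "C")),
    test = "wilcox.test",             # which test to run for each comparison
    test.args = list(exact = FALSE),  # extra arguments passed on to the test
    map_signif_level = TRUE,          # show stars instead of p-values
    step_increase = 0.1               # stack the brackets so they do not overlap
  )

signif_plot
```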
stat_summary
Another way to plot the results is to plot mean and standard error of the mean:
Just like before, you can also add a geom_signif to a barplot:
library(ggplot2)
ggplot(
  InsectSprays,
  aes(x = spray, y = count)
) +
  # draw the bars first so that the error bars are plotted on top of them
  stat_summary(
    fun = mean, # `fun` replaces `fun.y`, which is deprecated since ggplot2 3.3.0
    geom = "bar"
  ) +
  stat_summary(
    fun.data = mean_se,
    geom = "errorbar",
    width = 0.3
  ) +
  ggsignif::geom_signif(
    comparisons = list(
      c("A", "B"),
      c("B", "C"),
      c("A", "C")
    ),
    test = "t.test",
    map_signif_level = TRUE,
    y_position = c(17, 18, 19)
  )
Task 1 (45 min)
Statistical tests
Find the task description here
Selina Baldauf // Statistical tests