Solution to ggplot tasks
1 Get started
First, load the packages needed to complete this task:
# install.packages("tidyverse")
# install.packages("palmerpenguins")
library(tidyverse)
library(palmerpenguins)
- Have a look at the penguin data set
penguins
# A tibble: 344 × 8
  species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
  <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
1 Adelie  Torgersen           39.1          18.7               181        3750
2 Adelie  Torgersen           39.5          17.4               186        3800
3 Adelie  Torgersen           40.3          18                 195        3250
4 Adelie  Torgersen           NA            NA                  NA          NA
5 Adelie  Torgersen           36.7          19.3               193        3450
6 Adelie  Torgersen           39.3          20.6               190        3650
# ℹ 338 more rows
# ℹ 2 more variables: sex <fct>, year <int>
2 Exploratory plotting
2.1 Relationship between bill length and bill depth (scatterplot)
What is the relationship between bill length and bill depth?
First, I created a scatter plot and added a linear regression line. From the plot, it looks like bill depth decreases with increasing bill length.
# Bill length vs. bill depth scatterplot with regression line
ggplot(
  data = penguins,
  aes(
    x = bill_length_mm,
    y = bill_depth_mm
  )
) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
`geom_smooth()` using formula = 'y ~ x'
Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_smooth()`).
Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_point()`).
# or short
ggplot(penguins, aes(bill_length_mm, bill_depth_mm)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
If we add the color aesthetic locally to the point layer, only this layer will be affected by it. The regression line is not separated by species but still calculated and plotted for all data points together:
# Bill length vs. bill depth scatterplot with regression line
# color as aesthetic local to the point layer
ggplot(penguins, aes(bill_length_mm, bill_depth_mm)) +
  geom_point(aes(color = species)) +
  geom_smooth(method = "lm", se = FALSE)
This is an example of Simpson’s paradox. If you don’t consider species, it looks like bill depth decreases with bill length. But after separating the data by species, we see that the trend within each species is actually the opposite.
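The reversal described above can also be checked numerically. The following is a minimal sketch, not part of the original solution: it fits the same linear model once on all penguins and once per species, and compares the slopes. The object names overall_slope and species_slopes are chosen here for illustration.

```r
library(palmerpenguins)

# Slope of bill depth on bill length, ignoring species
# (lm() drops rows with missing values by default)
overall_fit <- lm(bill_depth_mm ~ bill_length_mm, data = penguins)
overall_slope <- coef(overall_fit)[["bill_length_mm"]]

# The same regression, fitted separately within each species
species_slopes <- sapply(
  split(penguins, penguins$species),
  function(d) coef(lm(bill_depth_mm ~ bill_length_mm, data = d))[["bill_length_mm"]]
)

overall_slope   # negative: depth appears to decrease with length
species_slopes  # positive within each of the three species
```

The sign flip between the pooled slope and the per-species slopes is what the two versions of the plot show visually.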
To draw separate regression lines for the species, we need to either add the color aesthetic to the smooth layer as well, or define the color aesthetic globally in the top-level ggplot() call:
# Define color aesthetic once globally
ggplot(penguins, aes(
  x = bill_length_mm,
  y = bill_depth_mm,
  color = species
)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
2.2 Difference in flipper length between species (boxplot)
Is there a difference in flipper length between the species?
First, I created a simple boxplot with notches:
# Basic boxplot of flipper length with notches
ggplot(penguins, aes(species, flipper_length_mm)) +
  geom_boxplot(notch = TRUE)
Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_boxplot()`).
A geom_point with position = position_jitter() will add the individual data points to the plot. It’s important to set a seed here to get the same point positions on the x-axis every time; otherwise your plot is not reproducible. I set width = 0.5 to control the horizontal spread of the points (the jitter is applied in both directions, so the total spread is twice this value):
ggplot(penguins, aes(species, flipper_length_mm)) +
  geom_boxplot() +
  geom_point(position = position_jitter(
    seed = 123,
    width = 0.5
  ))
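To see what the seed does, the jittered positions can be extracted with layer_data() and compared. This is a small sketch rather than part of the original solution; the helper jitter_plot() and the object peng are names made up for this illustration, and rows without a flipper measurement are dropped first to keep the comparison simple.

```r
library(ggplot2)
library(palmerpenguins)

# Drop rows without a flipper measurement
peng <- penguins[!is.na(penguins$flipper_length_mm), ]

# Hypothetical helper: a jittered point plot for a given seed
jitter_plot <- function(seed) {
  ggplot(peng, aes(species, flipper_length_mm)) +
    geom_point(position = position_jitter(seed = seed, width = 0.5))
}

# Computed x positions of the point layer
x_a <- layer_data(jitter_plot(123))$x
x_b <- layer_data(jitter_plot(123))$x
x_c <- layer_data(jitter_plot(456))$x

identical(x_a, x_b)  # same seed: identical positions
identical(x_a, x_c)  # different seed: positions differ
```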
2.3 Differences between body mass of male and female penguins (boxplot)
Are male penguins heavier than female penguins? And is this different between the 3 species?
First, a basic boxplot of the body mass by sex:
#|warning: false
# Basic boxplot of body mass for penguins of different sex
ggplot(penguins, aes(x = sex, y = body_mass_g)) +
  geom_boxplot()
Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_boxplot()`).
I added species as color aesthetic:
#|warning: false
ggplot(penguins, aes(x = sex, y = body_mass_g)) +
  geom_boxplot(aes(color = species))
Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_boxplot()`).
Alternatively, I could also specify the fill aesthetic:
#|warning: false
ggplot(penguins, aes(x = sex, y = body_mass_g)) +
  geom_boxplot(aes(fill = species))
Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_boxplot()`).
Species as facets:
ggplot(penguins, aes(x = sex, y = body_mass_g)) +
  geom_boxplot() +
  facet_wrap(~species)
Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_boxplot()`).
I added a violin plot in the background to show the distribution of the data points. To make the violins visible, I changed the width of the boxplot to 0.04:
ggplot(penguins, aes(x = sex, y = body_mass_g)) +
  geom_violin() +
  geom_boxplot(width = 0.04) +
  facet_wrap(~species)
2.4 Distribution of flipper length between species (histogram)
Make a histogram of the flipper length separated by species.
The default histogram is a histogram where the different groups are stacked:
ggplot(penguins, aes(
  x = flipper_length_mm,
  fill = species
)) +
  geom_histogram()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_bin()`).
To unstack the groups, you have to use position = "identity". Also, it’s a good idea to make the histogram slightly transparent (alpha = 0.5) to see the overlapping areas.
ggplot(penguins, aes(
  x = flipper_length_mm,
  fill = species
)) +
  geom_histogram(
    alpha = 0.5,
    position = "identity"
  )
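The `stat_bin()` message shown earlier suggests picking a bin width explicitly instead of relying on the default of 30 bins. A possible sketch is given below; the value binwidth = 2 (each bar covering 2 mm of flipper length) is an assumption made for illustration, not a recommendation from the original solution.

```r
library(ggplot2)
library(palmerpenguins)

# binwidth = 2 is an assumed value: every bar spans 2 mm of flipper length
hist_binwidth <- ggplot(penguins, aes(
  x = flipper_length_mm,
  fill = species
)) +
  geom_histogram(
    binwidth = 2,
    alpha = 0.5,
    position = "identity"
  )

hist_binwidth
```

With an explicit binwidth the `stat_bin()` message about `bins = 30` should no longer appear.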
Separated by facets (no need to specify the position here, because there is only one group per plot).
ggplot(penguins, aes(
  x = flipper_length_mm,
  fill = species
)) +
  geom_histogram() +
  facet_wrap(~species, ncol = 1)
2.5 Penguin flipper length by species and sex (heatmap)
For this data it does not make too much sense, but a heat map would look like this:
ggplot(penguins, aes(
  x = species,
  y = sex,
  fill = flipper_length_mm
)) +
  geom_tile()
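One reason the heat map above is hard to interpret is that every species/sex cell contains many penguins, so many tiles are drawn on top of each other and only the last one drawn is visible. A possible alternative, sketched here as an assumption rather than as part of the original solution, is to summarise the data first so that each cell holds a single value. The names flipper_means and mean_flipper_mm are made up for this example.

```r
library(dplyr)
library(ggplot2)
library(palmerpenguins)

# One row per species/sex combination with the mean flipper length
flipper_means <- penguins %>%
  filter(!is.na(sex), !is.na(flipper_length_mm)) %>%
  group_by(species, sex) %>%
  summarise(mean_flipper_mm = mean(flipper_length_mm), .groups = "drop")

# Each tile now shows exactly one summarised value
ggplot(flipper_means, aes(x = species, y = sex, fill = mean_flipper_mm)) +
  geom_tile()
```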
3 Beautify the plots
3.1 Beautify plots from Task 1
Here are just some examples of how to make the plots from before prettier. Of course, there are many other options as well.
Example one: Boxplot of flipper length and species
ggplot(penguins, aes(species, flipper_length_mm, color = species)) +
  geom_boxplot(width = 0.3) +
  geom_point(
    alpha = 0.5,
    position = position_jitter(width = 0.1, seed = 123)
  ) +
  ggsci::scale_color_uchicago() +
  labs(x = "Species", y = "Flipper length (mm)") +
  theme_minimal() +
  theme(legend.position = "none")
What was changed compared to the basic plot?
- Add color for each species by setting a global color aesthetic
- Make boxes and jitter points less wide by setting width for both layers
- Make jitter points slightly transparent by specifying alpha = 0.5 for the jitter layer
- Change the color to nicer colors from the ggsci package
- Change from default theme to theme_minimal()
- Remove the legend with legend.position = "none"
- Change the axis labels with labs()
Example two: Reproducing the plot from the presentation
The following code is adapted from the palmerpenguins package website.
ggplot(
  data = penguins,
  aes(
    x = bill_length_mm,
    y = bill_depth_mm,
    color = species,
    shape = species
  )
) +
  geom_point(size = 3, alpha = 0.8) +
  geom_smooth(method = "lm", se = FALSE) +
  scale_color_manual(values = c("darkorange", "purple", "cyan4")) +
  labs(
    title = "Penguin bill dimensions",
    subtitle = "Bill length and depth for Adelie, Chinstrap and
Gentoo Penguins at Palmer Station LTER",
    x = "Bill length (mm)",
    y = "Bill depth (mm)",
    color = "Penguin species",
    shape = "Penguin species"
  ) +
  theme_minimal() +
  theme(
    legend.position = c(0.85, 0.15),
    legend.background = element_rect(fill = "white", color = NA)
  )
What was changed compared to the basic plot?
- Make points larger and slightly transparent by setting size and alpha for the point layer
- Change to custom color scale
- Add title and subtitle with labs
- Change title of x-axis, y-axis and legend for color and shape aesthetic with labs
- Use theme_minimal() instead of default theme
- Change legend position to bottom right corner within the plot
  - Positions are relative to the bottom left corner of the plot
  - 0.85 (85% of plot width) to the right
  - 0.15 (15% of plot height) towards the top
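In ggplot2 3.5.0 and later, passing a numeric vector to legend.position is deprecated in favour of legend.position = "inside" together with legend.position.inside. The sketch below assumes such a version is installed and uses a simplified plot; the object name legend_inside is made up for this example.

```r
library(ggplot2)
library(palmerpenguins)

# Assumes ggplot2 >= 3.5.0, where legend.position.inside is available
legend_inside <- ggplot(
  penguins,
  aes(bill_length_mm, bill_depth_mm, color = species)
) +
  geom_point() +
  theme_minimal() +
  theme(
    legend.position = "inside",
    # Same relative coordinates as before: 85% to the right, 15% up
    legend.position.inside = c(0.85, 0.15)
  )
```

The coordinates keep the meaning described in the list above; only the way they are passed to theme() changes.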
4 Save one of the plots on your machine
Example with one of the plots from above:
# First save the plot in a variable
flipper_box <- ggplot(penguins, aes(species, flipper_length_mm, color = species)) +
  geom_boxplot(width = 0.3) +
  geom_jitter(alpha = 0.5, position = position_jitter(width = 0.2, seed = 123)) +
  ggsci::scale_color_uchicago() +
  labs(x = "Species", y = "Flipper length (mm)") +
  theme_minimal() +
  theme(legend.position = "none")
# save as png in /img directory of the project
ggsave(filename = "img/flipper_box.png", plot = flipper_box)
# save as pdf in /img directory of the project
ggsave(filename = "img/flipper_box.pdf", plot = flipper_box)
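If no size is given, ggsave() uses the size of the current graphics device, so the saved file can differ between sessions. A minimal sketch with an explicit size follows; the dimensions, the simplified plot flipper_plot, and the temporary output directory are assumptions made for this illustration, so replace the path with your own project directory.

```r
library(ggplot2)
library(palmerpenguins)

# Simplified stand-in for the plot saved in a variable
flipper_plot <- ggplot(penguins, aes(species, flipper_length_mm)) +
  geom_boxplot()

# Temporary directory used only for this sketch
out_dir <- file.path(tempdir(), "img")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
out_file <- file.path(out_dir, "flipper_box.pdf")

# Explicit width and height (assumed values) make the output reproducible
ggsave(
  filename = out_file,
  plot = flipper_plot,
  width = 16,
  height = 10,
  units = "cm"
)

file.exists(out_file)
```

Creating the target directory before saving avoids an error when the folder does not exist yet.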
5 Some more examples
Histogram
ggplot(penguins, aes(x = flipper_length_mm, fill = species)) +
  geom_histogram(alpha = 0.6) +
  ggsci::scale_fill_d3() +
  labs(
    y = "Frequency",
    x = "Flipper length [mm]",
    fill = "Penguin species"
  ) +
  theme_minimal()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_bin()`).
Heat map
For the penguin data set a heat map does not make that much sense. But an example would be:
ggplot(penguins, aes(
  x = species,
  y = sex,
  fill = flipper_length_mm
)) +
  geom_tile() +
  scale_fill_viridis_c() +
  theme_classic()
5.1 The patchwork package
With the patchwork package, you can combine multiple ggplots into one plot. The package allows you to add annotations to the plot and to control the layout and appearance.
Below you find a simple example of two different penguin scatterplots. For more explanation and an overview of what is possible with the package, please have a look at the package documentation.
library(patchwork)
# Two scatterplots to combine
plot_1 <- ggplot(penguins, aes(
  x = bill_length_mm, y = bill_depth_mm,
  color = species
)) +
  geom_point()
plot_2 <- ggplot(penguins, aes(
  x = bill_length_mm, y = body_mass_g,
  color = species
)) +
  geom_point()
# Simple combination of 2 plots in patchwork
plot_1 + plot_2
And a more complex example where shared layers are defined for both plots:
# more complex combination with annotation and definition of shared layers
final_plot <- plot_1 + plot_2 +
  plot_layout(guides = "collect") +
  plot_annotation(tag_levels = "a", tag_prefix = "(", tag_suffix = ")") &
  theme_minimal() &
  scale_color_manual(values = c("darkorange", "purple", "cyan4")) &
  labs(
    color = "Penguin species"
  ) &
  theme(
    plot.tag.position = c(0.1, 0.95),
    plot.tag = element_text(face = "bold")
  )
final_plot
6 References
Check out the website of the palmerpenguins package. It has more nice examples of data visualizations that you can do with ggplot.
Horst AM, Hill AP, Gorman KB (2020). palmerpenguins: Palmer Archipelago (Antarctica) penguin data. R package version 0.1.0. https://allisonhorst.github.io/palmerpenguins/. doi: 10.5281/zenodo.3960218.