Data visualization with ggplot2

Day 2 - Introduction to Data Analysis with R

Selina Baldauf

Freie Universität Berlin - Theoretical Ecology

March 8, 2026

A ggplot showcase

Example plots you can create with ggplot

A ggplot showcase

Visualization by Jake Kaupp, code available on Github

A ggplot showcase

Visualizations produced by the BBC News data team

A ggplot showcase

Visualization by Cédric Scherer, code available on Github

A ggplot showcase

Overview of different plot possibilities created by Cédric Scherer. Check out his tutorial to learn how to make all these plots with ggplot.

The ggplot2 package

  • Consistent grammar/structure
  • Flexible structure allows you to produce almost any type of plot
  • Highly customizable appearance (themes)
  • Many extension packages that provide additional plot types, themes, colors, animation, …
    • See here for a list of ggplot extensions
  • Active community that provides help and inspiration
  • Perfect package for exploratory data analysis and beautiful plots

Part of the tidyverse

library(tidyverse)
# Or load it individually:
# library(ggplot2)

The data

The gapminder dataset contains data on life expectancy, GDP per capita, and population for 142 countries from 1952 to 2007.

# install.packages("gapminder")
library(gapminder)
gapminder
#> # A tibble: 1,704 × 6
#>    country     continent  year lifeExp      pop gdpPercap
#>    <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
#>  1 Afghanistan Asia       1952    28.8  8425333      779.
#>  2 Afghanistan Asia       1957    30.3  9240934      821.
#>  3 Afghanistan Asia       1962    32.0 10267083      853.
#>  4 Afghanistan Asia       1967    34.0 11537966      836.
#>  5 Afghanistan Asia       1972    36.1 13079460      740.
#>  6 Afghanistan Asia       1977    38.4 14880372      786.
#>  7 Afghanistan Asia       1982    39.9 12881816      978.
#>  8 Afghanistan Asia       1987    40.8 13867957      852.
#>  9 Afghanistan Asia       1992    41.7 16317921      649.
#> 10 Afghanistan Asia       1997    41.8 22227415      635.
#> # ℹ 1,694 more rows

How is life expectancy related to GDP per capita?

Building a ggplot

Step by step

Step 1: ggplot(data)

The ggplot() function initializes a ggplot object. Every ggplot needs this function.

ggplot(gapminder)
  • Empty plot because we haven’t told ggplot what to put on the axes yet

Step 2: add aes(x, y)

The aesthetics define how data variables are mapped to plot properties.

ggplot(
  gapminder,
  aes(
    x = gdpPercap,
    y = lifeExp
  )
)
  • Axes are set up, but no data points yet
  • Scales adapt automatically to the range of the data

Step 3: add a geom

Geoms define how data points are represented. There are many different geoms to choose from.

Step 3: add a geom

ggplot(
  gapminder,
  aes(
    x = gdpPercap,
    y = lifeExp
  )
) +
  geom_point()
  • New layers are added with +
  • data and aes defined in the ggplot() call are inherited by all plot layers

Does this depend on the continent?

Local vs. global aesthetics

ggplot() +
  geom_point(
    data = gapminder,
    aes(
      x = gdpPercap,
      y = lifeExp
    )
  )
  • data and/or aes can also be local to a layer
  • In a geom, data is not the first argument, so you need to name it explicitly with data = ...

Here, it does not make a difference in the result.
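
Local data and aesthetics start to matter once layers should show different things. The following is a minimal sketch, not taken from the slides: the subset gapminder_2007 and the grey background points are illustrative choices. The first layer uses the global data, the second layer gets its own data and an additional local color mapping.

```r
library(ggplot2)
library(gapminder)

# Illustrative subset (an assumption for this example): only the year 2007
gapminder_2007 <- subset(gapminder, year == 2007)

p_layers <- ggplot(
  gapminder,
  aes(
    x = gdpPercap,
    y = lifeExp
  )
) +
  # Layer 1: inherits the global data and aes, drawn in a fixed grey
  geom_point(color = "grey70") +
  # Layer 2: local data and a local color mapping;
  # x and y are still inherited from ggplot()
  geom_point(
    data = gapminder_2007,
    aes(color = continent)
  )
p_layers
```

Only the 2007 observations are colored by continent, while all other years stay in the background.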

Add color: aes(color = ...)

Map a variable to the color aesthetic to distinguish groups:

ggplot(
  gapminder,
  aes(
    x = gdpPercap,
    y = lifeExp,
    color = continent
  )
) +
  geom_point()
  • color = continent maps the continent to point color
  • A legend is added automatically

Other aesthetics: size

Besides color, you can also map variables to size:

ggplot(
  gapminder,
  aes(
    x = gdpPercap,
    y = lifeExp,
    color = continent,
    size = pop
  )
) +
  geom_point()

Other aesthetics: shape

Besides color, you can also map variables to shape:

ggplot(
  gapminder,
  aes(
    x = gdpPercap,
    y = lifeExp,
    color = continent,
    shape = continent
  )
) +
  geom_point()

Changing the scales of the aesthetics

The scales onto which the aesthetic elements are mapped can be changed.

ggplot(
  gapminder,
  aes(
    x = gdpPercap,
    y = lifeExp,
    color = continent
  )
) +
  geom_point()
  • Most GDP values are squeezed together on the left because a few values are very large
  • A log scale would spread the data out

scale_x_log10

ggplot(
  gapminder,
  aes(
    x = gdpPercap,
    y = lifeExp,
    color = continent
  )
) +
  geom_point() +
  scale_x_log10()
  • Scales can be changed for all elements of aes:

scale_aes-name_scale-type

Here we scale the x aesthetic to log10.
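
The naming pattern can be read off a few more scale functions. This is a sketch for illustration only: mapping year to color and pop to size, as well as the y-axis limits, are assumptions chosen to have one scale per aesthetic.

```r
library(ggplot2)
library(gapminder)

p_scales <- ggplot(
  gapminder,
  aes(
    x = gdpPercap,
    y = lifeExp,
    color = year,
    size = pop
  )
) +
  geom_point(alpha = 0.5) +
  # x aesthetic, log10 scale
  scale_x_log10() +
  # y aesthetic, continuous scale with explicit limits
  scale_y_continuous(limits = c(20, 90)) +
  # color aesthetic, continuous viridis palette
  scale_color_viridis_c() +
  # size aesthetic, point area proportional to the value
  scale_size_area()
p_scales
```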

geom_smooth

Add a smoothed line that helps to reveal patterns in the data:

ggplot(
  gapminder,
  aes(
    x = gdpPercap,
    y = lifeExp,
    color = continent
  )
) +
  geom_point() +
  geom_smooth(method = "lm") +
  scale_x_log10()
  • With method = "lm", a linear regression line is added
  • Both geoms are drawn separately for each continent because color is mapped globally
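
If a single regression line across all continents is wanted instead, one option is to map color locally in geom_point() so that geom_smooth() does not inherit the grouping. A minimal sketch of this variant (the object name p_smooth is illustrative):

```r
library(ggplot2)
library(gapminder)

p_smooth <- ggplot(
  gapminder,
  aes(
    x = gdpPercap,
    y = lifeExp
  )
) +
  # color is local to the points only
  geom_point(aes(color = continent)) +
  # no color mapping here, so one line is fitted to all data
  geom_smooth(method = "lm") +
  scale_x_log10()
p_smooth
```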

geom_boxplot

Compare a numeric variable across groups:

ggplot(
  gapminder,
  aes(
    x = continent,
    y = lifeExp
  )
) +
  geom_boxplot()

geom_boxplot with color

ggplot(
  gapminder,
  aes(
    x = continent,
    y = lifeExp,
    color = continent
  )
) +
  geom_boxplot()

geom_boxplot with fill

ggplot(
  gapminder,
  aes(
    x = continent,
    y = lifeExp,
    fill = continent
  )
) +
  geom_boxplot()

geom_tile

You can create a simple heatmap with geom_tile

ggplot(
  gapminder,
  aes(
    x = year,
    y = continent,
    fill = lifeExp
  )
) +
  geom_tile()
  • Here we would have to choose a different color scheme to see differences
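
Note that geom_tile() draws one tile per row of the data, so with the raw data the tiles of all countries in a continent lie on top of each other. One possible way to get a readable heatmap, sketched here under the assumption that the mean life expectancy per continent and year is what should be shown, is to aggregate first and then switch to a viridis fill scale. The name life_exp_mean is illustrative.

```r
library(ggplot2)
library(gapminder)

# Mean life expectancy per continent and year
# (base R aggregate(); a dplyr group_by()/summarize() pipeline would work as well)
life_exp_mean <- aggregate(
  lifeExp ~ continent + year,
  data = gapminder,
  FUN = mean
)

p_tile <- ggplot(
  life_exp_mean,
  aes(
    x = year,
    y = continent,
    fill = lifeExp
  )
) +
  geom_tile() +
  # continuous viridis palette for the fill aesthetic
  scale_fill_viridis_c(option = "magma")
p_tile
```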

geom_histogram

See the distribution of a single numeric variable:

ggplot(
  gapminder,
  aes(x = lifeExp)
) +
  geom_histogram()
  • ggplot counts the observations for you (y-axis)
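
Without further arguments, geom_histogram() uses 30 bins and prints a message asking you to pick a better value. A short sketch of how the binning could be controlled; the bin width of 5 years and the white outline are arbitrary choices for illustration.

```r
library(ggplot2)
library(gapminder)

p_hist <- ggplot(
  gapminder,
  aes(x = lifeExp)
) +
  geom_histogram(
    binwidth = 5,    # each bar covers 5 years of life expectancy
    boundary = 0,    # bin edges fall on multiples of 5
    color = "white"  # outline to separate neighbouring bars
  )
p_hist
```

Alternatively, bins = ... sets the number of bins instead of their width.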

geom_histogram with groups

ggplot(
  gapminder,
  aes(
    x = lifeExp,
    fill = continent
  )
) +
  geom_histogram(
    position = "identity",
    alpha = 0.5
  )
  • By default histograms are stacked
  • Use position = "identity" to overlap them
  • alpha makes the bars semi-transparent so that overlapping areas stay visible
  • Here it’s a bit too crowded to see differences between continents

Small multiples with facet_wrap

Split your plots along one variable with facet_wrap

ggplot(
  gapminder,
  aes(
    x = lifeExp,
    fill = continent
  )
) +
  geom_histogram() +
  facet_wrap(vars(continent))
  • Especially useful when color/shape gets crowded

Small multiples with facet_grid

Split your plots along two variables with facet_grid

ggplot(
  gapminder,
  aes(
    x = gdpPercap,
    y = lifeExp,
    color = continent
  )
) +
  geom_point() +
  scale_x_log10() +
  facet_grid(
    rows = vars(continent),
    cols = vars(year)
  )
  • facet_grid(rows = vars(...), cols = vars(...))

So many other geoms

Check out:

Summary: The ggplot skeleton

Every ggplot has the same basic structure:

ggplot(
  data,
  aes(x = ..., y = ..., color = ...)
) +
  geom_xxx()
  • ggplot(): initialize the plot with data and aesthetic mappings
  • aes(): map variables to visual properties (x-axis, y-axis, color, shape, size, …)
  • geom_xxx(): define how data points are drawn
  • Layers are connected with +

Now you

Task 1.1 - 1.2 (45 min)

Exploratory data analysis with the penguin data set

Find the task description here

(Don’t do the “Beautify” task)

Beautifying plots

From default to publication-ready

Mapping vs. setting aesthetics

Inside aes(): map a variable to color

ggplot(
  gapminder,
  aes(
    x = gdpPercap,
    y = lifeExp,
    color = continent
  )
) +
  geom_point()

Inside geom: set color of all points

ggplot(
  gapminder,
  aes(
    x = gdpPercap,
    y = lifeExp
  )
) +
  geom_point(
    color = "steelblue"
  )
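
A common mistake is to put a fixed color inside aes(). The sketch below (object names are illustrative) contrasts both variants: inside aes(), "steelblue" is treated as a data value with a single level, so ggplot picks a color from its default palette and adds a legend entry "steelblue".

```r
library(ggplot2)
library(gapminder)

# Setting: all points are drawn in steelblue
p_set <- ggplot(
  gapminder,
  aes(x = gdpPercap, y = lifeExp)
) +
  geom_point(color = "steelblue")

# Mapping by accident: "steelblue" is interpreted as a group label,
# not as a color
p_mapped <- ggplot(
  gapminder,
  aes(x = gdpPercap, y = lifeExp)
) +
  geom_point(aes(color = "steelblue"))

p_set
p_mapped
```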

Mapping vs. setting aesthetics

You can change point shape, size, and transparency in the same way:

ggplot(
  gapminder,
  aes(
    x = gdpPercap,
    y = lifeExp
  )
) +
  geom_point(
    size = 3, # size
    shape = 17, # shape
    color = "steelblue", # color
    alpha = 0.5 # transparency (0-1)
  ) +
  scale_x_log10()

Shape and color codes

Mapping vs. setting aesthetics

Mapping and setting aesthetics can also be combined:

ggplot(
  gapminder,
  aes(
    x = gdpPercap,
    y = lifeExp,
    color = continent
  )
) +
  geom_point(
    size = 2,
    alpha = 0.6
  ) +
  scale_x_log10()

Starting point

Let’s take this plot and make it look better step by step:

g <- ggplot(
  gapminder,
  aes(
    x = gdpPercap,
    y = lifeExp,
    color = continent
  )
) +
  geom_point() +
  scale_x_log10()
g
  • Saving the plot in a variable g so we can build on it
  • Other plot layers can still be added to g

scale_color_manual(): choose your own colors

Change the colors of the color aesthetic:

g <- g +
  scale_color_manual(
    values = c(
      "darkorange",
      "steelblue",
      "forestgreen",
      "orchid",
      "grey70"
    )
  )
g
  • One color per group: you need at least as many colors as there are groups, and unnamed colors are assigned in the order of the factor levels
  • Colors can be names or hex codes like "#FF6B35"
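
To make the assignment of colors to groups explicit, the values can also be given as a named vector. A minimal sketch using the same colors as above; the name continent_colors is illustrative.

```r
library(ggplot2)
library(gapminder)

# Names must match the levels of the mapped variable
continent_colors <- c(
  Africa = "darkorange",
  Americas = "steelblue",
  Asia = "forestgreen",
  Europe = "orchid",
  Oceania = "grey70"
)

p_manual <- ggplot(
  gapminder,
  aes(
    x = gdpPercap,
    y = lifeExp,
    color = continent
  )
) +
  geom_point() +
  scale_x_log10() +
  scale_color_manual(values = continent_colors)
p_manual
```

With a named vector, the colors stay attached to the same continents even if the order of the factor levels changes.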

scale_color_viridis_d

Change the colors of the color aesthetic:

g +
  scale_color_viridis_d(
    option = "magma"
  )
  • The viridis color palettes are designed to remain distinguishable for viewers with common forms of color blindness
  • viridis_d for discrete colors, viridis_c for continuous colors
  • Different options of viridis color palettes: "magma", "inferno", "plasma", "viridis", "cividis"

scale_color_brewer(): use a color palette

g +
  scale_color_brewer(
    palette = "Set2"
  )
  • Pre-defined palettes, many are colorblind-friendly
  • Other options: "Dark2", "Paired", "Pastel1", …
  • Check ?scale_color_brewer for all options
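
To find out which palettes are flagged as colorblind-friendly, the RColorBrewer package can be queried directly. This sketch assumes RColorBrewer is installed (it is usually installed together with ggplot2 as a dependency); the variable names are illustrative.

```r
library(RColorBrewer)

# brewer.pal.info is a data frame with one row per palette
# and the columns maxcolors, category and colorblind
colorblind_palettes <- rownames(brewer.pal.info)[brewer.pal.info$colorblind]
colorblind_palettes

# Only qualitative palettes, which suit unordered groups such as continents
is_qual_cb <- brewer.pal.info$category == "qual" & brewer.pal.info$colorblind
qualitative_cb <- rownames(brewer.pal.info)[is_qual_cb]
qualitative_cb
```

display.brewer.all(colorblindFriendly = TRUE) shows the same selection as a plot.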

scale_fill_* vs. scale_color_*

ggplot(
  gapminder,
  aes(
    x = continent,
    y = lifeExp,
    color = continent
  )
) +
  geom_boxplot() +
  scale_color_brewer(
    palette = "Dark2"
  )

ggplot(
  gapminder,
  aes(
    x = continent,
    y = lifeExp,
    fill = continent
  )
) +
  geom_boxplot() +
  scale_fill_brewer(
    palette = "Dark2"
  )

labs: Change axis and legend titles and add plot title

g <- g +
  labs(
    x = "GDP per capita [US$]",
    y = "Life expectancy [years]",
    color = "Continent",
    title = "Wealth and life expectancy",
    subtitle = "Higher GDP per capita is associated with longer life expectancy",
    caption = "Data from the gapminder package"
  )
g

theme_*: change appearance

ggplot2 offers many pre-defined themes that we can apply to change the appearance of a plot.

g +
  theme_classic()

g +
  theme_bw()

g +
  theme_minimal()

g +
  theme_linedraw()

theme_*: change appearance

Since ggplot2 v. 4.0.0, you can change overall color choices of a pre-defined theme.

g +
  theme_bw(
    ink = "#BBBBBB",
    paper = "#333333"
  )
  • paper: affects background elements
  • ink: affects foreground elements (text, lines, points, …)
  • accent: affects elements that are used to highlight information (like geom_smooth() lines)

g +
  theme_minimal(
    paper = "cornsilk",
    ink = "navy"
  )

theme(): fine-tune individual elements

g <- g +
  theme_classic() +
  theme(
    # Move legend to the bottom
    legend.position = "bottom",
    # Make the title bold
    plot.title = element_text(face = "bold"),
    # Add the major grid lines
    panel.grid.major = element_line(color = "grey80")
  )
g

Theme elements are always modified following the same pattern:

theme(
  element_name = element_function()
)
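
Which element function to use depends on the type of the element. The following sketch shows one example for each of the four common element functions; the specific elements and values are arbitrary choices for illustration.

```r
library(ggplot2)
library(gapminder)

p_theme <- ggplot(
  gapminder,
  aes(
    x = gdpPercap,
    y = lifeExp,
    color = continent
  )
) +
  geom_point() +
  scale_x_log10() +
  theme(
    # Text elements (titles, labels) use element_text()
    axis.title = element_text(size = 14),
    # Line elements (grid lines, axis lines) use element_line()
    axis.line = element_line(color = "black"),
    # Rectangle elements (backgrounds, borders) use element_rect()
    panel.background = element_rect(fill = "grey95"),
    # Any element can be removed with element_blank()
    panel.grid.minor = element_blank()
  )
p_theme
```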

theme(): fine-tune individual elements

  • Check ?theme for all theme elements and options
  • Search: “ggplot theme …” (e.g. “ggplot theme remove legend”)

Theme reference sheet by Isabella Benabaye

theme_set(): set global theme

You can set a global theme that will be applied to all ggplot objects in the current R session.

# Globally set theme_minimal as the default theme
theme_set(theme_minimal())

Add this to the beginning of your script.

You can also specify some defaults, e.g. the text size:

theme_set(
  theme_minimal(
    base_size = 16,
    paper = "cornsilk",
    ink = "navy"
  )
)

This is very practical if you want to achieve a consistent look, e.g. for a scientific journal.
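
theme_set() invisibly returns the theme that was active before, which makes it possible to switch back later. A small sketch of this pattern; the name old_theme is illustrative.

```r
library(ggplot2)

# Set a new global theme and keep the previous one
old_theme <- theme_set(theme_minimal(base_size = 16))

# ... all plots created here use theme_minimal with larger text ...

# Restore the previous global theme
theme_set(old_theme)
```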

ggsave()

A ggplot object can be saved to disk in different formats.

Without specifications:

# save plot g in img as my_plot.pdf
ggsave(filename = "img/my_plot.pdf", plot = g)
# save plot g in img as my_plot.png
ggsave(filename = "img/my_plot.png", plot = g)

Or with specifications:

# save a plot named g in the img directory under the name my_plot.png
# with width 16 cm and height 9 cm
ggsave(
  filename = "img/my_plot.png",
  plot = g,
  width = 16,
  height = 9,
  units = "cm"
)

Have a look at ?ggsave to see all options.
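
The target directory has to exist before saving. The following self-contained sketch writes into a temporary folder so that it runs anywhere; out_dir, out_file and p_save are illustrative names, and in a real project the folder would rather be something like img/ inside the project.

```r
library(ggplot2)
library(gapminder)

p_save <- ggplot(
  gapminder,
  aes(x = gdpPercap, y = lifeExp)
) +
  geom_point() +
  scale_x_log10()

# Create the output folder if it does not exist yet
out_dir <- file.path(tempdir(), "img")
dir.create(out_dir, showWarnings = FALSE)

# The file format is chosen from the file extension
out_file <- file.path(out_dir, "my_plot.pdf")
ggsave(
  filename = out_file,
  plot = p_save,
  width = 16,
  height = 9,
  units = "cm"
)

# Check that the file was written
file.exists(out_file)
```

If the plot argument is left out, ggsave() saves the last plot that was displayed.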

Now you

Task 2 (30 min)

Make your penguin plots more beautiful

Find the task description here