Tidy data with tidyr


1 Task

Note

Find the solution here.

1.1 Get started

Before you start, make sure to load the tidyverse package.

library(tidyverse)

1.2 Let’s tidy some data sets

First, complete all three tasks before you move on to the extras.

1. relig_income

Have a look at the relig_income data set that is included in the tidyverse package. The data set contains the results of a survey asking people about their religion and income category.

What is not tidy about this data set? Can you fix it?
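If you are unsure where to start, the sketch below shows the general technique on a small made-up table rather than on relig_income itself. The tibble survey_wide and its column names are hypothetical. The point is only that a variable (here the income bracket) is hidden in the column headers, and that pivot_longer can move it into a column of its own.

```r
library(tidyverse)

# Hypothetical toy survey (not the real relig_income data):
# the income brackets are stored in the column headers.
survey_wide <- tibble(
  group      = c("A", "B"),
  `<$10k`    = c(12, 5),
  `$10-20k`  = c(30, 17),
  `>$20k`    = c(8, 21)
)

# Move the header values into an "income" column and the cell
# values into a "count" column; every column except group is pivoted.
survey_long <- survey_wide %>%
  pivot_longer(
    cols      = !group,
    names_to  = "income",
    values_to = "count"
  )

survey_long
```

After pivoting, each row holds one group, one income bracket, and one count.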

2. billboard

Have a look at the billboard data set that is included in the tidyverse package. The data set contains information about the chart rank of songs in the year 2000.

What is not tidy about this data set? Can you fix it?
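As a hint, here is a minimal sketch on made-up data that has the same kind of layout, with one column per week. The table charts_wide and its values are invented for illustration. A selection helper such as starts_with saves you from listing every week column by hand.

```r
library(tidyverse)

# Hypothetical toy chart data (not the real billboard data):
# one column per week, NA once a song has left the chart.
charts_wide <- tibble(
  artist = c("Band X", "Band Y"),
  track  = c("Song 1", "Song 2"),
  wk1    = c(87, 91),
  wk2    = c(82, 87),
  wk3    = c(72, NA)
)

# Pivot all columns whose name starts with "wk";
# artist and track are kept as identifying columns.
charts_long <- charts_wide %>%
  pivot_longer(
    cols      = starts_with("wk"),
    names_to  = "week",
    values_to = "rank"
  )

charts_long
```

Note that the week column still contains strings like "wk1" and that the missing rank is kept as a row. Both points come up again in the extras.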

3. fish_encounters

Have a look at the fish_encounters data set that is included in the tidyverse package. Each row indicates that a fish was detected (seen = 1) at a particular station.

Which format might be more convenient if you wanted to compare fish across stations at a glance?
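The opposite direction can be sketched on a small made-up table. The tibble sightings and its columns are hypothetical stand-ins, not the real fish_encounters data. pivot_wider spreads the values of one column into new column headers, so each animal ends up on a single row.

```r
library(tidyverse)

# Hypothetical toy detections (not the real fish_encounters data):
# one row per animal and site where it was seen.
sightings <- tibble(
  animal = c("a1", "a1", "a2"),
  site   = c("north", "south", "north"),
  seen   = c(1, 1, 1)
)

# One column per site, one row per animal.
# Combinations that never occurred become NA.
sightings_wide <- sightings %>%
  pivot_wider(
    names_from  = site,
    values_from = seen
  )

sightings_wide
```

In the wide form, animal a2 has NA for the south site because that combination has no row in the long table.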

1.3 For the fast ones

  • Check out the values_drop_na and names_prefix arguments of pivot_longer. What do they do and how can you use them with the billboard data?
  • Check out the values_fill argument of pivot_wider. How could you use it in the fish_encounters case to replace the missing values that are created?
  • This is a bit tricky: How would you have to change the penguins table if you wanted to make such a plot:

Hint: First use dplyr to select only the columns that you need for the plot. Then think about how to use tidyr to transform the data so it’s ready for ggplot.
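The extras can be tried out on small made-up tables before you apply them to the real data. Everything in the sketch below (charts_wide, sightings, measurements and their columns) is hypothetical. The last part only shows the general select-then-pivot workflow from the hint, since the right columns depend on the plot you want to reproduce.

```r
library(tidyverse)

# Hypothetical toy chart data with one column per week.
charts_wide <- tibble(
  track = c("Song 1", "Song 2"),
  wk1   = c(87, 91),
  wk2   = c(82, 87),
  wk3   = c(72, NA)
)

charts_long <- charts_wide %>%
  pivot_longer(
    cols           = starts_with("wk"),
    names_to       = "week",
    names_prefix   = "wk",   # strip "wk" so week holds "1", "2", "3"
    values_to      = "rank",
    values_drop_na = TRUE    # drop rows where rank is NA
  )

# Hypothetical toy detections, one row per animal and site.
sightings <- tibble(
  animal = c("a1", "a1", "a2"),
  site   = c("north", "south", "north"),
  seen   = c(1, 1, 1)
)

sightings_wide <- sightings %>%
  pivot_wider(
    names_from  = site,
    values_from = seen,
    values_fill = 0          # missing combinations become 0 instead of NA
  )

# Hypothetical measurement table: keep only the needed columns,
# then stack the measurement columns for plotting.
measurements <- tibble(
  id        = c(1, 2),
  species   = c("s1", "s2"),
  length_mm = c(39.1, 46.5),
  depth_mm  = c(18.7, 17.9),
  note      = c("x", "y")
)

measurements_long <- measurements %>%
  select(species, length_mm, depth_mm) %>%
  pivot_longer(
    cols      = !species,
    names_to  = "measure",
    values_to = "value"
  )
```

A long table like measurements_long can then be mapped in ggplot with one aesthetic or facet for measure and another for value.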