Tidy data with tidyr
1 Task
Find the solution here.
1.1 Get started
Before you start, make sure to load the tidyverse package.
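A minimal way to do this, assuming the tidyverse is already installed (if it is not, run install.packages("tidyverse") once beforehand):

```r
# Attach the core tidyverse packages, including tidyr, dplyr and ggplot2
library(tidyverse)
```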
1.2 Let’s tidy some data sets
First, complete the tasks below before you move on to the extras.
1. relig_income
Have a look at the relig_income data set that is included in the tidyverse package. The data set contains the results of a survey asking people about their religion and income category.
What is not tidy about this data set? Can you fix it?
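If you get stuck, here is one possible approach as a sketch. It assumes the issue is that the income brackets are stored in the column names rather than in a column of their own. The new column names income and count are arbitrary choices, not something the task prescribes.

```r
library(tidyverse)

# Every column except religion is an income bracket, so the column
# names hold values of a variable instead of variable names.
relig_tidy <- pivot_longer(
  relig_income,
  cols = -religion,      # reshape all columns except religion
  names_to = "income",   # former column names go into this column
  values_to = "count"    # former cell values go into this column
)

head(relig_tidy)
```

The result has one row per combination of religion and income bracket.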
2. billboard
Have a look at the billboard data set that is included in the tidyverse package. The data set contains information about the chart rank of songs in the year 2000.
What is not tidy about this data set? Can you fix it?
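One possible approach, again only as a sketch: it assumes the problem is that each week has its own column (wk1, wk2, and so on). The column names week and rank are chosen here for illustration.

```r
library(tidyverse)

# The wk* columns all hold the same kind of value (a chart rank),
# so they are stacked into a week column and a rank column.
billboard_tidy <- pivot_longer(
  billboard,
  cols = starts_with("wk"),  # all week columns
  names_to = "week",
  values_to = "rank"
)

head(billboard_tidy)
```

Note that week still contains text such as "wk1" and that many ranks are missing, which is what the extras below pick up on.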
3. fish_encounters
Have a look at the fish_encounters data set that is included in the tidyverse package. Each row indicates that a fish was detected (seen = 1) at a particular station.
Which format might be more convenient if you wanted to compare fish across stations at a glance?
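A sketch of one option, assuming a wide table with one row per fish and one column per station is what makes the comparison easier:

```r
library(tidyverse)

# Spread the stations across columns: one row per fish,
# one column per station, filled with the seen values.
fish_wide <- pivot_wider(
  fish_encounters,
  names_from = station,
  values_from = seen
)

fish_wide
```

Combinations of fish and station that do not occur in the original data appear as NA in the wide table.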
1.3 For the fast ones
- Check out the values_drop_na and names_prefix arguments of pivot_longer. What do they do, and how can you use them with the billboard data?
- Check out the values_fill argument of pivot_wider. How could you use it in the fish_encounters case to replace the missing values that are created?
- This is a bit tricky: How would you have to change the penguins table if you wanted to make such a plot:
Hint: First use dplyr and only select the columns that you need for the plot. Then think about how to use tidyr to transform the data so it’s ready for ggplot.
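For the first two items in this list, the arguments could be used roughly as follows. This is a sketch rather than the reference solution; the column names week and rank and the use of names_transform to convert the week to an integer are choices made here for illustration.

```r
library(tidyverse)

billboard_clean <- pivot_longer(
  billboard,
  cols = starts_with("wk"),
  names_to = "week",
  names_prefix = "wk",                        # strip "wk" so "wk1" becomes "1"
  names_transform = list(week = as.integer),  # then store the week as a number
  values_to = "rank",
  values_drop_na = TRUE                       # drop weeks in which a song did not chart
)

fish_filled <- pivot_wider(
  fish_encounters,
  names_from = station,
  values_from = seen,
  values_fill = 0                             # a fish not detected at a station gets 0 instead of NA
)
```

Dropping the missing ranks is reasonable here because they only exist as a side effect of the wide layout: a song that left the charts simply has no rank for the later weeks.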