Task 1: Filter, select, and mutate


The solution will be available after the session ends.

Getting started

A helpful resource for this task is the dplyr cheatsheet.

Before you start, make sure to load the tidyverse package.

library(tidyverse)

We will continue working with the penguins data set.

Filter penguins

Find all penguins that …

  1. … have a bill length between 40 and 45 mm.

  2. … are of the species Adelie or Gentoo.

  3. … lived on the island Dream in the year 2007.
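As a rough sketch of the filter() patterns these tasks call for, the example below uses a small made-up data frame named toy instead of penguins. The values are invented and the column names are only assumed to resemble those in penguins, so adapt names and thresholds to your data.

```r
library(dplyr)

# Made-up values for illustration only, not real penguin measurements
toy <- data.frame(
  species  = c("Adelie", "Gentoo", "Chinstrap", "Adelie"),
  island   = c("Torgersen", "Biscoe", "Dream", "Dream"),
  bill_len = c(38.5, 47.2, 49.0, 41.3),
  year     = c(2007, 2008, 2007, 2009)
)

# Range condition: both comparisons must be TRUE for a row to be kept
in_range <- filter(toy, bill_len >= 45, bill_len <= 50)

# Membership condition: %in% keeps rows matching any of the listed values
two_species <- filter(toy, species %in% c("Gentoo", "Chinstrap"))

# Conditions on two different columns, separated by a comma (logical AND)
dream_2009 <- filter(toy, island == "Dream", year == 2009)
```

Separating conditions with a comma is equivalent to joining them with &. For "either one or the other" on the same column, %in% is usually easier to read than chaining == with |.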

Remove missing values

  1. Remove all penguins with missing values for sex using drop_na().
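To illustrate how drop_na() behaves with and without a column name, here is a minimal sketch on a made-up data frame (toy and its values are invented for this example).

```r
library(tidyr)

# Made-up values; row 2 is missing sex, row 3 is missing body_mass
toy <- data.frame(
  species   = c("Adelie", "Gentoo", "Chinstrap"),
  sex       = c("female", NA, "male"),
  body_mass = c(3700, 5100, NA)
)

# Only the named column is checked, so the row with missing sex is kept
no_missing_mass <- drop_na(toy, body_mass)

# Without column names, rows with an NA in any column are dropped
complete_rows <- drop_na(toy)
```

Naming the column is the safer habit when you only care about one variable, because it avoids discarding rows that are incomplete elsewhere.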

Select columns

  1. Select only the variables species, sex, and year.

  2. Select only columns that start with "bill".
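The two ways of choosing columns used here, by name and by prefix, can be sketched as follows. The data frame toy is made up, and its column names are only assumed to be similar to those in penguins.

```r
library(dplyr)

# Made-up values for illustration only
toy <- data.frame(
  species   = c("Adelie", "Gentoo"),
  island    = c("Dream", "Biscoe"),
  bill_len  = c(39.1, 46.5),
  bill_dep  = c(18.7, 14.8),
  body_mass = c(3750, 5000)
)

# Select columns by listing their names; the output follows the order given
by_name <- select(toy, island, body_mass)

# Select columns by prefix with the selection helper starts_with()
by_prefix <- select(toy, starts_with("b"))
```

starts_with() matches on the beginning of the column name, so a longer prefix narrows the selection.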

Add new columns

  1. Add a column with the ratio of bill length to bill depth.

  2. Add a column with abbreviations for the species (Adelie = A, Gentoo = G, Chinstrap = C).
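As a hedged sketch of the two kinds of new column asked for here (a computed ratio and a recoded label), the example below works on a made-up data frame. The column names mass_per_mm and island_abbr are hypothetical and chosen only for this illustration.

```r
library(dplyr)

# Made-up values for illustration only
toy <- data.frame(
  island      = c("Torgersen", "Biscoe", "Dream"),
  flipper_len = c(180, 220, 200),
  body_mass   = c(3600, 5500, 4000)
)

# A computed column: arithmetic on existing columns, evaluated row by row
with_ratio <- mutate(toy, mass_per_mm = body_mass / flipper_len)

# A recoded column: case_when() maps each condition to a replacement value
with_abbr <- mutate(
  toy,
  island_abbr = case_when(
    island == "Torgersen" ~ "T",
    island == "Biscoe"    ~ "B",
    island == "Dream"     ~ "D"
  )
)
```

case_when() is one of several ways to recode values; rows that match none of the conditions get NA.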

Combine with the pipe

  1. Use the pipe to combine multiple steps: start with penguins, remove rows with missing sex, keep only Adelie penguins, and select only species, sex, and body_mass.
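A chain of this shape could look like the sketch below, shown on a made-up data frame with different conditions than the task. It assumes the native pipe |> (R 4.1 or later); the magrittr pipe %>% loaded by the tidyverse would work the same way here.

```r
library(dplyr)
library(tidyr)

# Made-up values; row 2 has a missing body mass
toy <- data.frame(
  species   = c("Adelie", "Gentoo", "Gentoo", "Chinstrap"),
  island    = c("Biscoe", "Biscoe", "Biscoe", "Dream"),
  body_mass = c(3500, NA, 5200, 3800),
  year      = c(2007, 2008, 2009, 2007)
)

# Each step receives the result of the previous one as its first argument
result <- toy |>
  drop_na(body_mass) |>          # remove rows with missing body mass
  filter(island == "Biscoe") |>  # keep only one island
  select(species, island, year)  # keep only three columns
```

Reading the chain top to bottom gives the order of operations, which is easier to follow than nesting the three function calls inside each other.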

For the fast ones

You can do these in any order, or skip them and just take a break.

  • Use filter_out() to exclude penguins from Torgersen island, then select only species, island, and flipper_len.

  • Create a size_category column with case_when() based on body mass (small: below 3500 g, medium: 3500 g to below 5000 g, large: 5000 g or more).

    • Extra: Do it in a pipe that also removes NAs and selects only species, body_mass, and size_category.
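For the binning task, it may help to see how case_when() evaluates its conditions in order. The sketch below bins a made-up flipper length column; the cut-offs 190 and 210 and the name flipper_category are hypothetical and unrelated to the body mass thresholds in the task.

```r
library(dplyr)

# Made-up values, including one missing measurement
toy <- data.frame(flipper_len = c(181, 195, 230, NA))

# Conditions are checked top to bottom and the first TRUE one wins,
# so the second condition only applies to values of 190 or more
toy <- mutate(
  toy,
  flipper_category = case_when(
    flipper_len < 190  ~ "short",
    flipper_len < 210  ~ "medium",
    flipper_len >= 210 ~ "long"
  )
)
```

A missing measurement matches none of the conditions and therefore stays NA in the new column, which is one reason to remove NAs before or after binning.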