library(tidyverse)Solution: Import and Export Data
Read a CSV file
Read the tree data:
trees <- read_csv("data/tree_growth.csv")
trees# A tibble: 15 × 6
tree_id species height_m diameter_cm age_years site
<chr> <chr> <dbl> <dbl> <dbl> <chr>
1 T001 Oak 12.3 35.2 45 Forest_A
2 T002 Beech 15.1 28.7 38 Forest_A
3 T003 Pine 18.5 22.1 30 Forest_B
4 T004 Oak 9.8 25.4 32 Forest_A
5 T005 Birch 11.2 18.9 25 Forest_B
6 T006 Pine 20.1 26.3 35 Forest_B
7 T007 Beech 13.7 31.5 42 Forest_A
8 T008 Oak 14.5 38.1 50 Forest_B
9 T009 Birch 8.9 15.2 20 Forest_A
10 T010 Pine 16.8 24.7 28 Forest_B
11 T011 Oak 11.1 29.8 36 Forest_A
12 T012 Beech 17.3 33.6 48 Forest_B
13 T013 Birch 10.5 17.4 22 Forest_A
14 T014 Pine 19.2 25.8 33 Forest_B
15 T015 Oak 13.9 36.5 47 Forest_A
Explore the data:
summary(trees) tree_id species height_m diameter_cm
Length:15 Length:15 Min. : 8.90 Min. :15.20
Class :character Class :character 1st Qu.:11.15 1st Qu.:23.40
Mode :character Mode :character Median :13.90 Median :26.30
Mean :14.19 Mean :27.28
3rd Qu.:17.05 3rd Qu.:32.55
Max. :20.10 Max. :38.10
age_years site
Min. :20.0 Length:15
1st Qu.:29.0 Class :character
Median :35.0 Mode :character
Mean :35.4
3rd Qu.:43.5
Max. :50.0
There are 4 tree species (Beech, Birch, Oak, Pine). The tallest tree is 20.1 m.
Calculate the mean height:
mean(trees$height_m)[1] 14.19333
Read an Excel file
library(readxl)birds <- read_excel("data/bird_observations.xlsx")
birds# A tibble: 12 × 6
observation_id species count habitat date temperature_c
<chr> <chr> <dbl> <chr> <chr> <dbl>
1 OBS001 Robin 5 Garden 2024-03-15 12.3
2 OBS002 Blackbird 3 Forest 2024-03-15 11.8
3 OBS003 Blue Tit 8 Garden 2024-03-16 13.1
4 OBS004 Great Tit 4 Forest 2024-03-16 12.5
5 OBS005 Robin 6 Meadow 2024-03-17 14.2
6 OBS006 Sparrow 12 Garden 2024-03-17 13.8
7 OBS007 Blackbird 2 Forest 2024-03-18 11
8 OBS008 Blue Tit 7 Garden 2024-03-18 12.9
9 OBS009 Sparrow 9 Meadow 2024-03-19 15.1
10 OBS010 Great Tit 3 Forest 2024-03-19 13.4
11 OBS011 Robin 4 Garden 2024-03-20 14.7
12 OBS012 Blackbird 5 Meadow 2024-03-20 15.3
summary(birds) observation_id species count habitat
Length:12 Length:12 Min. : 2.000 Length:12
Class :character Class :character 1st Qu.: 3.750 Class :character
Mode :character Mode :character Median : 5.000 Mode :character
Mean : 5.667
3rd Qu.: 7.250
Max. :12.000
date temperature_c
Length:12 Min. :11.00
Class :character 1st Qu.:12.45
Mode :character Median :13.25
Mean :13.34
3rd Qu.:14.32
Max. :15.30
Write data to a file
write_csv(trees, file = "data/trees_copy.csv")
Note
For this to work, the data/ directory must already exist in your project. Otherwise you will get an error.
Challenge: a slightly messy file
If you try to read the file directly, the metadata lines on top are treated as data:
read_csv("data/water_quality.csv")Open the file in a text editor — you can see 4 lines of metadata before the actual data starts. Use skip to skip them:
water <- read_csv("data/water_quality.csv", skip = 4)Rows: 12 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): sample_id, lake
dbl (4): depth_m, temperature_c, dissolved_oxygen_mg_l, ph
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
water# A tibble: 12 × 6
sample_id lake depth_m temperature_c dissolved_oxygen_mg_l ph
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 WQ001 Wannsee 1 8.2 10.5 7.8
2 WQ002 Wannsee 5 6.8 8.3 7.5
3 WQ003 Wannsee 10 5.1 6.1 7.2
4 WQ004 Mueggelsee 1 9 11.2 8.1
5 WQ005 Mueggelsee 5 7.3 9 7.7
6 WQ006 Mueggelsee 10 5.5 5.8 7.3
7 WQ007 Tegeler See 1 8.8 10.8 7.9
8 WQ008 Tegeler See 5 7 8.7 7.6
9 WQ009 Tegeler See 10 5.3 6.5 7.4
10 WQ010 Wannsee 1 9.5 10.1 7.7
11 WQ011 Mueggelsee 1 10.2 11.5 8
12 WQ012 Tegeler See 1 9.1 10.3 7.8
Optional tasks
Challenge: an even messier file
This file has two problems:
- Metadata lines on top (4 lines to skip)
- The delimiter is
;instead of,
So read_csv() won’t work even with skip. Use read_delim() instead:
soil <- read_delim(
"data/soil_nutrients_messy.csv",
delim = ";",
skip = 4
)Rows: 10 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ";"
chr (2): Plot ID, Soil Type
dbl (4): Nitrogen (mg/kg), Phosphorus (mg/kg), pH, Organic Matter (%)
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
soil# A tibble: 10 × 6
`Plot ID` `Soil Type` `Nitrogen (mg/kg)` `Phosphorus (mg/kg)` pH
<chr> <chr> <dbl> <dbl> <dbl>
1 P01 Sandy 12.3 4.5 6.2
2 P02 Clay 18.7 7.2 5.8
3 P03 Sandy 10.1 3.8 NA
4 P04 Loam 15.9 6.1 6.5
5 P05 Clay 20.3 NA 5.5
6 P06 Sandy 11.8 4.1 6
7 P07 Loam 16.2 5.9 6.3
8 P08 Clay 19.5 7.8 5.7
9 P09 Sandy 9.4 3.2 6.4
10 P10 Loam 14.7 5.5 6.1
# ℹ 1 more variable: `Organic Matter (%)` <dbl>
Clean messy column headers
Some column names have spaces and special characters. Use janitor::clean_names() to fix this:
# install.packages("janitor")
library(janitor)
soil <- clean_names(soil)
soilThe column names are now clean without spaces and special characters, all lowercase.