Day 1 - Introduction to Data Analysis with R
Freie Universität Berlin - Theoretical Ecology
March 14, 2025
The tidyverse is an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures.
(www.tidyverse.org)
These are the main packages from the tidyverse that we will use:
Install the tidyverse once with:
install.packages("tidyverse")
Then load and attach the packages at the beginning of your script:
library(tidyverse)
You can also install and load the tidyverse packages individually, but since we will use so many of them together, it’s easier to load and attach them together.
readr is a tidyverse package. To use it, you can load the tidyverse with library(tidyverse).
The most important functions are:
read_csv/write_csv to read/write comma-delimited files
read_tsv/write_tsv to read/write tab-delimited files
read_delim/write_delim to read/write files with any delimiter
read_*()
All read_* functions take a path to the data file as their first argument:
read_*(file = "path/to/your/file", ...)
Import files with the readr function that fits the delimiter of your file, e.g. read_csv for comma-delimited files.
Use read_delim with the delim argument for any other type of delimiter.
All read_* functions return a tibble.
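A minimal sketch of the three read functions. The files and their contents are hypothetical and are created in a temporary directory only so that the example runs on its own:

```r
library(readr)

# Hypothetical example files, written to temporary locations
csv_file <- tempfile(fileext = ".csv")
tsv_file <- tempfile(fileext = ".tsv")
txt_file <- tempfile(fileext = ".txt")
writeLines(c("species,count", "oak,12", "beech,7"), csv_file)
writeLines(c("species\tcount", "oak\t12", "beech\t7"), tsv_file)
writeLines(c("species;count", "oak;12", "beech;7"), txt_file)

dat_csv <- read_csv(csv_file)                 # comma-delimited file
dat_tsv <- read_tsv(tsv_file)                 # tab-delimited file
dat_txt <- read_delim(txt_file, delim = ";")  # any delimiter via delim

# Each call returns a tibble
dat_csv
```

In your own script you would pass the path to your data file instead of the temporary files.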
read_*()
The read functions provide several options to modify the reading of data.
Have a look at ?read_delim for all options.
These options are useful if your data is not a “perfect table”.
read_*()
Specify the number of lines to skip before reading with skip
#> # A tibble: 6 × 1
#> Metadata
#> <chr>
#> 1 Date: June, 12, 1989
#> 2 Author: Selina Baldauf
#> 3 Temperature, Rainfall
#> 4 1.5, 2
#> 5 1, 0
#> 6 0.5, 0.6
read_*()
Specify whether the first row of the data contains the column names with col_names
#> # A tibble: 2 × 2
#> `1.5` `2`
#> <dbl> <dbl>
#> 1 1 0
#> 2 0.5 0.6
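The following sketch shows how skip and col_names can be combined. The file is a hypothetical one, modelled on the output above, with three metadata lines before the actual table:

```r
library(readr)

# Hypothetical file: three metadata lines, then a header line and the data
messy_file <- tempfile(fileext = ".csv")
writeLines(
  c(
    "Metadata",
    "Date: June, 12, 1989",
    "Author: Selina Baldauf",
    "Temperature,Rainfall",
    "1.5,2",
    "1,0",
    "0.5,0.6"
  ),
  messy_file
)

# Skip the three metadata lines; the next line is used as the header
dat <- read_csv(messy_file, skip = 3)

# Skip the header line as well and supply the column names directly
dat_named <- read_csv(
  messy_file,
  skip = 4,
  col_names = c("temperature", "rainfall")
)

# col_names = FALSE treats all remaining lines as data (columns X1, X2)
dat_no_header <- read_csv(messy_file, skip = 4, col_names = FALSE)
```

If one line too many is skipped without setting col_names, the first data row ends up as the column names.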
write_*()
Every read_* function has a corresponding write_* function to export data from R.
Write data from R, e.g.
To share transformed or summarized data
To summarize complex raw data and continue working with the summarized data
…
write_*()
All write_* functions take the data to write as the first argument and the file to write to as the second argument:
write_*(x = dat, file = "path/to/save/file.*", ...)
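A minimal sketch of the write functions. The data frame dat and the file names are hypothetical, and the files are written to a temporary directory so that nothing in your project is overwritten:

```r
library(readr)

# Hypothetical summary table to export
dat <- data.frame(
  species = c("oak", "beech", "pine"),
  count = c(12, 7, 3)
)

out_dir <- tempdir()
csv_path <- file.path(out_dir, "summary.csv")
tsv_path <- file.path(out_dir, "summary.tsv")
txt_path <- file.path(out_dir, "summary.txt")

write_csv(x = dat, file = csv_path)                 # comma delimited
write_tsv(x = dat, file = tsv_path)                 # tab delimited
write_delim(x = dat, file = txt_path, delim = ";")  # custom delimiter

# Read one file back in to check the export
dat_back <- read_delim(txt_path, delim = ";")
```

Reading an exported file back in is a quick way to check that the export worked as intended.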
The readxl package is installed together with the tidyverse, but it is not attached by library(tidyverse), so you need to load it explicitly with library(readxl).
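A minimal sketch of reading an Excel file. It uses datasets.xlsx, an example workbook that ships with readxl, as a stand-in for your own file:

```r
library(readxl)

# Path to an example workbook bundled with readxl
xlsx_file <- readxl_example("datasets.xlsx")

# List the sheets contained in the workbook
excel_sheets(xlsx_file)

dat_first <- read_excel(xlsx_file)                      # first sheet
dat_mtcars <- read_excel(xlsx_file, sheet = "mtcars")   # sheet by name
dat_third <- read_excel(xlsx_file, sheet = 3)           # sheet by position
```

For your own data, replace xlsx_file with the path to your Excel file.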
By default, read_excel reads the first sheet. You can read other sheets with the sheet argument.
read_excel also has other functionality, like skipping rows.
Have a look at ?read_excel and the package documentation for more functionality.
A little warning:
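Check your data after import, e.g. with the summary function and by checking whether the number of rows etc. is correct.
An absolute path gives the full location of a file: C:/Users/Selina/folder1/folder2/data/file_to_read.csv
A relative path is resolved from the current working directory: data/file_to_read.csv
getwd() returns the current working directory.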
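As a small illustration of how a relative path relates to the working directory, the sketch below builds the corresponding absolute path by hand. The relative path is the hypothetical one from above and does not need to exist:

```r
# Hypothetical relative path (the file does not need to exist)
rel_path <- "data/file_to_read.csv"

# Current working directory of the R session
wd <- getwd()

# A relative path is interpreted starting from the working directory
abs_path <- file.path(wd, rel_path)
abs_path
```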
Working with R and RStudio, the best way is to:
Follow these guidelines to make data import to R easier and less frustrating:
Use plain text formats (.csv, .txt instead of .xlsx). An Excel spreadsheet can be saved as csv.
Avoid white space in column names (species_name instead of species name). Existing column names can be cleaned with janitor::clean_names() from the janitor package.
Use . as a decimal separator (not ,).
Avoid white space in file paths (data-raw/my_data.csv instead of data raw/my data.csv).
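A minimal sketch of cleaning column names with janitor::clean_names(). The data frame and its column names are hypothetical, and the janitor package has to be installed separately:

```r
library(janitor)

# Hypothetical data with white space and capital letters in column names
dat <- data.frame(
  `species name` = c("oak", "beech"),
  `Body Mass` = c(1.2, 0.8),
  check.names = FALSE
)

# clean_names() converts the column names to snake_case by default
dat_clean <- clean_names(dat)
names(dat_clean)
```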
Task (20 min)
Read and write data files
Find the task description here
Selina Baldauf // Import and export data