Scientific workflows: Tools and Tips 🛠️
2025-05-15
📅 Every 3rd Thursday 🕓 4-5 p.m. 📍 Webex
🔧 Maintainability Can I understand, update and fix my own code?
🔄 Reproducibility Can someone else reproduce my results?
🏋 Reliability Will my code work in the future?
⚙️ Reusability Can someone else actually use my code?
Learn how to keep the kitchen clean.
Will they be able to understand and use your project?
Practical reproducibility check
Send your project to a colleague and ask them to understand and run the code of your analysis.
README
filetemplate
R package)
Your collaborators and your future self will love you for this.
File names should be machine-readable, human-readable and work well with default ordering.
Names should allow for easy searching, grouping and extracting information from file names.
Avoid (spaces and special characters):
📄 2023-04-20 temperature göttingen.csv
📄 2023-04-20 rainfall göttingen.csv
Better:
📄 2023-04-20_temperature-goettingen.csv
📄 2023-04-20_rainfall-goettingen.csv
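Machine-readable names pay off as soon as you want to extract information from them. A minimal sketch in base R, assuming the date_variable-location pattern of the file names above (the object names are made up for illustration):

```r
# File names following the date_variable-location pattern
files <- c(
  "2023-04-20_temperature-goettingen.csv",
  "2023-04-20_rainfall-goettingen.csv"
)

# Drop the extension, then split on "_" to separate the date from the rest
stems <- tools::file_path_sans_ext(files)
parts <- strsplit(stems, "_", fixed = TRUE)
date_part <- vapply(parts, `[`, character(1), 1)
name_part <- vapply(parts, `[`, character(1), 2)

# Within the second part, "-" separates the variable from the location
file_info <- data.frame(
  file = files,
  date = as.Date(date_part),
  variable = sub("-.*$", "", name_part),
  location = sub("^[^-]*-", "", name_part)
)

# Searching and grouping become simple pattern matches
temperature_files <- files[grepl("_temperature-", files, fixed = TRUE)]
```

With spaces and umlauts in the names, the same steps would need extra special-case handling.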
Names should be informative and reveal the file content.
Avoid (hard to read or uninformative):
📄 01preparedataforanalysis.R
📄 01firstscript.R
Better:
📄 01_prepare-data-for-analysis.R
📄 01_lm-temperature-trend.R
If you order your files by name, the ordering should make sense:
Number scripts with zero-padded prefixes (01, 02, …)
Write dates in YYYY-MM-DD format
📄 2023-04-20_temperature-goettingen.csv
📄 2023-04-21_temperature-goettingen.csv
📄 01_prepare-data.R
📄 02_lm-temperature-trend.R
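Why zero-padding matters can be seen with a quick sort. A small sketch with made-up script names; method = "radix" is used so that the result does not depend on the locale of your computer:

```r
script_numbers <- c(1, 2, 10)

# Without padding: 1_script.R, 2_script.R, 10_script.R
unpadded <- paste0(script_numbers, "_script.R")

# With padding: 01_script.R, 02_script.R, 10_script.R
padded <- sprintf("%02d_script.R", script_numbers)

# Alphabetical sorting puts script 10 before script 2 ...
sorted_unpadded <- sort(unpadded, method = "radix")

# ... while the padded names keep their intended order
sorted_padded <- sort(padded, method = "radix")
```

The same logic applies to dates: YYYY-MM-DD sorts chronologically, while formats that start with the day or month do not.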
To read and write files, you need to tell R where to find them.
Common workflow:
Set the working directory with setwd(), then read files relative to it, e.g. read_csv("data/temperature.csv")
This is not reproducible! Your computer at exactly this time is the only one that has this working directory.
Option 1: Use RStudio projects
New project: File -> New Project -> New Directory -> New Project -> Create Project
Existing folder: File -> New Project -> Existing Directory -> Create Project
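Option 2: Use the here package
The here function builds file paths relative to the project root
The project root is found via marker files or folders (*.Rproj, .here, .git, …)
Use here together with an *.Rproj file OR with a .here file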
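A short sketch of how this looks in a script, assuming the here package is installed and the script lives inside a project that contains the data/temperature.csv file used in these slides:

```r
library(here)

# here() builds the path starting from the project root,
# no matter what the current working directory is
temperature_path <- here("data", "temperature.csv")
temperature_path

# The path can then be passed to any reading function, for example:
# temperature <- readr::read_csv(temperature_path)
```

Because the path is built from the project root, the same line works on every computer that has a copy of the project folder.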
Artwork by Allison Horst, CC BY 4.0
Keep all library() calls together near the top of the script
# Purpose: Create Figure 2 showing the relationship
# between body mass and bill length
# Authors: Selina Baldauf, Jane Doe, Jon Doe
# load libraries ---------------------------------
library(tidyverse)
library(vegan)
# Set global options -----------------------------
# Plot themes and colors
theme_set(theme_minimal())
custom_colors <- c("cyan4", "darkorange", "purple")
# Source additional code -------------------------
source("R/my_cool_function.R")
# Read data --------------------------------------
temperature <- read_csv("data/temperature.csv")
rainfall <- read_csv("data/rainfall.csv")
Mark sections with #### or ---- at the end of a comment line (RStudio shortcut: Ctrl/Cmd + Shift + R)
Have a naming convention for variables and functions and stick to it
Use snake_case for longer variable names
Put spaces around operators (<-, ==, +, etc.)
Put pipes (|>, %>%) and + in ggplot at the end of a line, followed by a new line
Do I really have to remember all of this?
Luckily, no! R and RStudio provide some nice helpers
{lintr}
The lintr package analyses your code files or entire project and tells you what to fix.
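A minimal sketch of a lintr check, assuming the package is installed. The badly styled two-line script is made up for illustration and written to a temporary file so the example is self-contained:

```r
library(lintr)

# Write a small, deliberately badly styled script to a temporary file
script <- tempfile(fileext = ".R")
writeLines(c("myVariable=1+2", "print(myVariable)"), script)

# lint() checks a single file; lint_dir() does the same for a whole folder
lints <- lint(script)
print(lints)
```

Each reported lint names the line, the problem and the linter that found it, so you can fix the issues one by one.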
IDEs offer auto-formatting tools.
Auto-format your scripts on save and let the IDE do the job
RStudio: Open the command palette (Tools -> Show command palette), search for “format” -> “Reformat documents on save”
One huge script is hard to maintain
01_prepare-data.R: Read raw data and prepare it for analysis
02_run-models.R: Run statistical analysis
03_make-figures.R: Create manuscript figures
Use source() to source R scripts in other scripts
Write a main workflow script that calls scripts in the right order.
Typical names: make.R, run.R or main.R
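Such a main script can be as short as three source() calls. A slightly more defensive sketch, assuming the three scripts listed above sit in one folder (the function name run_analysis is hypothetical):

```r
# main.R (sketch): run all analysis scripts in the right order
scripts <- c("01_prepare-data.R", "02_run-models.R", "03_make-figures.R")

run_analysis <- function(scripts, dir = ".") {
  paths <- file.path(dir, scripts)

  # Stop early with a clear message if a script is missing
  not_found <- paths[!file.exists(paths)]
  if (length(not_found) > 0) {
    stop("Script(s) not found: ", paste(not_found, collapse = ", "))
  }

  # source() evaluates in the global environment by default,
  # so objects created by one script are available to the next
  for (path in paths) {
    message("Running ", path)
    source(path)
  }
  invisible(paths)
}

# In the project folder: run_analysis(scripts)
```

Running the whole analysis then takes a single call, and the order of the steps is documented in one place.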
Example: Same data preparation code for multiple data sets
# Load dplyr for summarize()
library(dplyr)

# Read the data
my_data1 <- readr::read_csv("data/data1.csv")
my_data2 <- readr::read_csv("data/data2.csv")

# Clean and summarize data (same code repeated for each data set)
my_data1 <- my_data1 |>
  summarize(
    height = mean(height),
    biomass = mean(biomass),
    .by = c(country, species)
  )
my_data2 <- my_data2 |>
  summarize(
    height = mean(height),
    biomass = mean(biomass),
    .by = c(country, species)
  )
What’s the problem?
Solution: If you notice that you copy-paste code - write a function
Function in R/prepare_data.R:
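A minimal sketch of what such a function could look like, assuming both data sets contain the columns country, species, height and biomass used above. The function name prepare_data is an assumption taken from the file name, and the small example data are made up for illustration:

```r
library(dplyr)

# Hypothetical helper (name assumed from R/prepare_data.R):
# mean height and biomass for every country-species combination
prepare_data <- function(data) {
  data |>
    summarize(
      height = mean(height),
      biomass = mean(biomass),
      .by = c(country, species)
    )
}

# Made-up example data with the columns the function expects
example_data <- data.frame(
  country = c("DE", "DE", "FR"),
  species = c("oak", "oak", "oak"),
  height = c(10, 12, 8),
  biomass = c(100, 140, 90)
)

example_summary <- prepare_data(example_data)
```

In the analysis script, the function is loaded once with source("R/prepare_data.R") and then applied to each data set, e.g. my_data1 <- prepare_data(my_data1). A change to the summary then only has to be made in one place.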
Add the output of devtools::session_info() to your README
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.4.2 (2024-10-31 ucrt)
os Windows 11 x64 (build 26100)
system x86_64, mingw32
ui RTerm
language (EN)
collate English_Germany.utf8
ctype English_Germany.utf8
tz Europe/Berlin
date 2025-05-13
pandoc 3.6.3 @ C:\\Users\\Selina\\AppData\\Local\\Programs\\Quarto\\bin\\tools/ (via rmarkdown)
─ Packages ───────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
cachem 1.1.0 2024-05-16 [1] CRAN (R 4.4.2)
cli 3.6.3 2024-06-21 [1] CRAN (R 4.4.2)
devtools 2.4.5 2022-10-11 [1] CRAN (R 4.4.2)
digest 0.6.37 2024-08-19 [1] CRAN (R 4.4.2)
ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.4.2)
evaluate 1.0.3 2025-01-10 [1] CRAN (R 4.4.2)
fastmap 1.2.0 2024-05-15 [1] CRAN (R 4.4.2)
fs 1.6.5 2024-10-30 [1] CRAN (R 4.4.2)
glue 1.8.0 2024-09-30 [1] CRAN (R 4.4.2)
htmltools 0.5.8.1 2024-04-04 [1] CRAN (R 4.4.2)
htmlwidgets 1.6.4 2023-12-06 [1] CRAN (R 4.4.2)
httpuv 1.6.15 2024-03-26 [1] CRAN (R 4.4.2)
jsonlite 1.9.1 2025-03-03 [1] CRAN (R 4.4.3)
knitr 1.49 2024-11-08 [1] CRAN (R 4.4.2)
later 1.4.1 2024-11-27 [1] CRAN (R 4.4.2)
lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.4.2)
magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.4.2)
memoise 2.0.1 2021-11-26 [1] CRAN (R 4.4.2)
mime 0.12 2021-09-28 [1] CRAN (R 4.4.0)
miniUI 0.1.1.1 2018-05-18 [1] CRAN (R 4.4.2)
pkgbuild 1.4.6 2025-01-16 [1] CRAN (R 4.4.2)
pkgload 1.4.0 2024-06-28 [1] CRAN (R 4.4.2)
profvis 0.4.0 2024-09-20 [1] CRAN (R 4.4.2)
promises 1.3.2 2024-11-28 [1] CRAN (R 4.4.2)
purrr 1.0.2 2023-08-10 [1] CRAN (R 4.4.2)
R6 2.6.1 2025-02-15 [1] CRAN (R 4.4.3)
Rcpp 1.0.14 2025-01-12 [1] CRAN (R 4.4.2)
remotes 2.5.0 2024-03-17 [1] CRAN (R 4.4.2)
rlang 1.1.5 2025-01-17 [1] CRAN (R 4.4.2)
rmarkdown 2.29 2024-11-04 [1] CRAN (R 4.4.2)
sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.4.2)
shiny 1.10.0 2024-12-14 [1] CRAN (R 4.4.2)
urlchecker 1.0.1 2021-11-30 [1] CRAN (R 4.4.2)
usethis 3.1.0 2024-11-26 [1] CRAN (R 4.4.2)
vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.4.2)
xfun 0.52 2025-04-02 [1] CRAN (R 4.4.3)
xtable 1.8-4 2019-04-21 [1] CRAN (R 4.4.2)
yaml 2.3.10 2024-07-26 [1] CRAN (R 4.4.2)
[1] C:/Users/Selina/AppData/Local/Programs/R/R-4.4.2/library
──────────────────────────────────────────────────────────────────────────────
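The session information can also be written to a file from within a script. A minimal sketch that uses base R's sessionInfo() instead of devtools::session_info() (an assumption made here so that no extra package is needed); the file name session-info.txt is hypothetical:

```r
# Capture the printed session information as lines of text
session_lines <- capture.output(sessionInfo())

# Write the lines to a file; a temporary folder is used here,
# in a real project this would be a path inside the project folder
session_file <- file.path(tempdir(), "session-info.txt")
writeLines(session_lines, session_file)
```

The content of the file can then be copied into the README or kept next to the results.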
{renv}
Idea: Have a project-local environment with all packages needed by the project
Why is this useful?
Check out the renv website for more information
{renv}
Very simple to use and integrate into your project workflow:
If your collaborators install the renv package, then they can also call renv::restore()
… help you to write robust and reproducible code.
Artwork by Allison Horst, CC BY 4.0
Of course there is much that I left out:
But this is for another time
📅 19th June 🕓 4-5 p.m. 📍 Webex
For topic suggestions and/or feedback send me an email
Questions?
What they forgot to teach you about R book by Jenny Bryan and Jim Hester
Blogpost by Jenny Bryan on good project-oriented workflows
R best practice blogpost by Krista L. DeStasio
Book about coding style for R: The tidyverse style guide
The Turing Way book: General concepts and things to think about regarding reproducible research
Selina Baldauf // Reproducible data analysis