Write R code that lasts: Practical tips for reproducible data analysis

Description

Do your R projects sometimes get messy, do your scripts grow out of control, or are your results hard to reproduce? In this lecture, I’ll share practical tips for setting up clean, maintainable, and reproducible data analysis workflows. We’ll cover how to structure projects, organize scripts, write code that follows good practices, and manage dependencies effectively. The focus is on realistic practices that make your code easier for others, and for your future self, to understand and reuse. While the examples use R, many of the tips apply to data analysis in other languages as well. Whether you’re just starting out or already experienced with R, this session will help you build better habits for writing clean, reusable code.


Summary

Below is a summary of all the topics covered in this session. You can use it as a checklist for your own projects to see where you can still make improvements.

Project organization

  • Self-contained project structure: All files in one place, separated into sub-folders
  • Include a README file
  • Name files properly: Files should be machine-readable, human-readable and work with default file ordering
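As a rough illustration of these three points, the R sketch below creates a project skeleton in a temporary directory, writes a minimal README, and shows how zero-padded numeric prefixes keep scripts in their intended order. The folder names (`data/raw`, `data/processed`, `R`, `output`) and file names are assumptions for the example, not a required layout.

```r
# Sketch: a self-contained project skeleton in a temporary directory.
# Folder and file names are illustrative assumptions, not a fixed standard.
project <- file.path(tempdir(), "my-analysis")

# Sub-folders separating raw data, processed data, scripts, and output
subdirs <- c("data/raw", "data/processed", "R", "output")
for (d in subdirs) {
  dir.create(file.path(project, d), recursive = TRUE, showWarnings = FALSE)
}

# A short README describing what the project is and how to run it
writeLines(
  c("# my-analysis",
    "",
    "Purpose: <one-sentence description of the analysis>",
    "How to run: source the scripts in R/ in numeric order."),
  file.path(project, "README.md")
)

# File names: lowercase, no spaces, zero-padded numeric prefixes so that
# default (alphabetical) ordering matches the order the scripts should run in
scripts <- c("10_report.R", "02_clean-data.R", "01_read-data.R")
sort(scripts)

# Without zero-padding, names are compared as text, not as numbers,
# so "10_report.R" ends up before "2_clean-data.R"
unpadded <- c("2_clean-data.R", "10_report.R")
sort(unpadded)
```

The same reasoning applies to dates in file names: writing them as `YYYY-MM-DD` means alphabetical order is also chronological order.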

Coding

  • Use safe paths: Use RStudio projects and the here package
  • Structure your scripts: Initialize everything (packages, options) at the top, read all data in one place
  • Use a consistent coding style: Follow the tidyverse style guide, use the lintr package to analyze your code, use auto-formatting tools
  • Modularize long scripts: Break down long scripts into logical units
  • Don’t repeat yourself (DRY): Don’t copy and paste code, write functions instead
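The sketch below shows one way a script might be laid out following these tips: setup at the top, all data read in one place, and a small function instead of copy-pasted code. It uses only base R and the built-in `mtcars` data set as a stand-in for a raw data file, so it runs without extra packages; the helper `group_mean()` is hypothetical and exists only for this example.

```r
# ---- Setup: everything the script needs goes at the top ----
# library() calls for any packages would go here.
set.seed(2024)  # fix the random seed so any random steps are reproducible

# ---- Read all data in one place ----
# In an RStudio project, paths would typically be built with the here
# package, e.g. read.csv(here::here("data", "raw", "cars.csv")).
# The built-in mtcars data set stands in for a raw data file.
cars <- mtcars

# ---- Don't repeat yourself ----
# Copy-paste version (avoid): one near-identical line per group
# mean_mpg_4 <- round(mean(cars$mpg[cars$cyl == 4]), 1)
# mean_mpg_6 <- round(mean(cars$mpg[cars$cyl == 6]), 1)
# mean_mpg_8 <- round(mean(cars$mpg[cars$cyl == 8]), 1)

# Hypothetical helper: rounded mean of one column for each level of another
group_mean <- function(data, value, group, digits = 1) {
  means <- tapply(data[[value]], data[[group]], mean)
  round(means, digits)
}

# One function now covers every combination of columns
mpg_by_cyl <- group_mean(cars, "mpg", "cyl")
hp_by_gear <- group_mean(cars, "hp", "gear")
print(mpg_by_cyl)
```

In a longer project, each section could move into its own file under a scripts folder and be run from a main script with `source()`. To check such files against the tidyverse style guide, you could run `lintr::lint()` on them, and an auto-formatter such as `styler::style_file()` can fix many layout issues automatically.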

Managing dependencies

  • Low effort, manual: Use devtools::session_info() to list package and software versions
  • More effort, more reproducible: Use the renv package to manage dependencies
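A minimal sketch of both approaches follows. The runnable part saves the output of base R's `sessionInfo()` to a text file (the file name and temporary location are assumptions for the example); `devtools::session_info()` gives a more detailed listing if devtools is installed. The renv steps are shown only as comments, because they modify the project they are run in.

```r
# Low effort, manual: record R and package versions next to your results.
# sessionInfo() is part of base R, so this runs without extra packages.
info_file <- file.path(tempdir(), "session_info.txt")
writeLines(capture.output(sessionInfo()), info_file)

# More effort, more reproducible: a typical renv workflow, run
# interactively from inside the project (not executed in this sketch):
#   renv::init()      # set up a project-specific library and renv.lock
#   renv::snapshot()  # record the current package versions in renv.lock
#   renv::restore()   # reinstall the recorded versions, e.g. on another machine
```

Committing the resulting `renv.lock` file together with your scripts lets others restore the same package versions.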