Write R code that lasts: Practical tips for reproducible data analysis
Description
Do your R projects sometimes get messy, do your scripts grow out of control, or are your results hard to reproduce? In this lecture, I’ll share practical tips for setting up clean, maintainable, and reproducible data analysis workflows. We’ll cover how to structure projects, organize scripts, write code that follows good practices, and manage dependencies effectively. The focus is on realistic practices that make your code easier for others, and for your future self, to understand and reuse. While the examples use R, many of the tips apply to data analysis in other languages as well. Whether you’re just starting out or already experienced with R, this session will help you build better habits for writing clean, reusable code.
Summary
Below you will find a summary of all the topics covered in this session. You can use it as a checklist for your own projects to see where you can still improve.
Project organization
- Self-contained project structure: All files in one place separated into sub-folders
- Include a README file
- Name files properly: Files should be machine-readable, human-readable and work with default file ordering
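To make the project-structure and README points concrete, here is a minimal sketch that creates a self-contained project skeleton with base R. The helper name create_project_skeleton() and the sub-folder names are assumptions chosen for illustration, not a prescribed layout.

```r
# Hypothetical helper: create a self-contained project skeleton with base R.
# The sub-folder names are example choices, not a fixed standard.
create_project_skeleton <- function(root) {
  folders <- c("data/raw", "data/processed", "R", "output/figures", "output/tables")
  for (folder in folders) {
    # recursive = TRUE also creates missing parent folders such as "data"
    dir.create(file.path(root, folder), recursive = TRUE, showWarnings = FALSE)
  }
  # Add a README at the project root, without overwriting an existing one
  readme_path <- file.path(root, "README.md")
  if (!file.exists(readme_path)) {
    writeLines(
      c(
        "# Project title",
        "",
        "Purpose: TODO, describe the research question.",
        "How to run: TODO, list the scripts in the order they should run.",
        "Folders: data/ (raw and processed data), R/ (scripts), output/ (results)"
      ),
      readme_path
    )
  }
  invisible(root)
}

# Usage: build the skeleton in a temporary directory
project_root <- file.path(tempdir(), "my-analysis")
create_project_skeleton(project_root)
list.files(project_root, recursive = TRUE, include.dirs = TRUE)
```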
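The file-naming rule can also be applied in code. The sketch below uses a hypothetical helper, make_file_name(), that puts an ISO 8601 date first, zero-pads a running number, and turns a description into a lowercase slug without spaces. With this pattern, the default alphabetical sort also orders the files chronologically, whereas unpadded names with spaces, such as "plot 2.png" and "plot 10.png", can end up in a surprising order. The example dates and descriptions are made up.

```r
# Hypothetical helper: build file names that are machine-readable
# (no spaces, consistent delimiters), human-readable (descriptive slug)
# and sort well by default (ISO 8601 date first, zero-padded counter).
make_file_name <- function(date, number, description, ext) {
  sprintf(
    "%s_%02d_%s.%s",
    format(as.Date(date), "%Y-%m-%d"),              # ISO 8601: YYYY-MM-DD
    as.integer(number),                             # zero-padded to two digits
    gsub("[^a-z0-9]+", "-", tolower(description)),  # lowercase slug, no spaces
    ext
  )
}

file_names <- make_file_name(
  date = c("2024-03-15", "2023-11-02", "2024-03-15"),
  number = c(10, 1, 2),
  description = c("Model fit", "Raw survey", "Clean data"),
  ext = c("png", "csv", "csv")
)
file_names

# Default sorting now puts older files first, then orders by the counter
sort(file_names)
```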
Coding
- Use safe paths: Use RStudio projects and the here package
- Structure your scripts: Initialize at the top, read all data in one place
- Use a consistent coding style: Follow the tidyverse style guide, use the lintr package to analyze your code, use auto-formatting tools
- Modularize long scripts: Break down long scripts into logical units
- Don’t repeat yourself (DRY): Don’t copy and paste code, write functions instead
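The path and script-structure tips might look like this at the top of an analysis script. This is a sketch, assuming the here package is installed; the file name survey.csv is hypothetical, and the file.exists() guard is there only so the sketch runs outside a real project.

```r
# ==== Setup =================================================================
# Load all packages at the very top so dependencies are visible at a glance.
library(here)  # assumes the here package is installed

set.seed(2024)  # fix the random seed so any random steps are reproducible

# ==== Paths =================================================================
# here() builds paths from the project root, which it finds via markers such
# as an .Rproj file or a .git folder, so this works no matter which folder
# the script is run from. Avoid hard-coded absolute paths and setwd().
survey_path <- here("data", "raw", "survey.csv")  # hypothetical file
output_dir  <- here("output")

# ==== Read data =============================================================
# Read every input in this one section; the rest of the script only works
# with the objects created here. The file.exists() guard only lets this
# sketch run outside a real project.
survey <- if (file.exists(survey_path)) read.csv(survey_path) else NULL

# ==== Analysis ==============================================================
# Transformations, models, and plots would follow here.
```

Because every path is built with here(), the script runs unchanged on another computer or from a different working directory, as long as the project folder keeps the same internal structure.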
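To see what lintr reports, the sketch below writes a deliberately badly styled two-line script to a temporary file and lints it. It assumes the lintr package is installed; the messy code is made up for illustration. lintr's default linters largely follow the tidyverse style guide, so, for example, = used for assignment is flagged in favor of <-.

```r
# Sketch, assuming the lintr package is installed.
library(lintr)

# A deliberately badly styled snippet (made up for illustration)
messy_code <- c(
  "MyData=data.frame(x=1:3)",
  "mean( MyData$x )"
)

# Write it to a temporary .R file so lint() can check it like a real script
script_path <- tempfile(fileext = ".R")
writeLines(messy_code, script_path)

# Run lintr's default linters; each lint records the file, line, column,
# message, and the name of the linter that raised it
lints <- lint(script_path)
print(lints)

# Count how often each linter fired
table(as.data.frame(lints)$linter)
```

For the auto-formatting part, one option is the styler package, whose style_file() function rewrites a file in tidyverse style; RStudio also has a built-in Reformat Code command.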
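The DRY principle in a small example: instead of copying the standardization formula for each column, and risking a copy-paste bug like the one in the commented-out lines, the logic is written once as a function and applied to several columns. The data and the helper name standardize() are made up for illustration.

```r
# Example data (made up for illustration)
df <- data.frame(
  height_cm = c(170, 182, NA, 165),
  weight_kg = c(65, 80, 72, NA)
)

# Copy-paste version: the same logic twice, and the second copy still
# refers to height_cm in sd(), a typical copy-paste bug.
# df$height_z <- (df$height_cm - mean(df$height_cm, na.rm = TRUE)) /
#   sd(df$height_cm, na.rm = TRUE)
# df$weight_z <- (df$weight_kg - mean(df$weight_kg, na.rm = TRUE)) /
#   sd(df$height_cm, na.rm = TRUE)

# DRY version: write the logic once as a function ...
standardize <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

# ... and apply it to every column that needs it
cols <- c("height_cm", "weight_kg")
df[paste0(cols, "_z")] <- lapply(df[cols], standardize)
df
```

In a larger project, helpers like this could live in a separate file, for example a hypothetical R/functions.R that the main script loads with source(), which also helps to split a long script into logical units.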
Managing dependencies
- Low effort, manual: Use devtools::session_info() to list package and software versions
- More effort, more reproducible: Use the renv package to manage dependencies
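A low-effort way to document dependencies is to save the session information alongside your results at the end of a script. This sketch uses devtools::session_info() when devtools is installed and falls back to base R's sessionInfo() otherwise; the output file name session_info.txt is an assumption.

```r
# Record the software environment next to your results.
# devtools::session_info() gives a compact report of the R version and
# package versions; base R's sessionInfo() is a fallback that needs no
# extra packages.
info <- if (requireNamespace("devtools", quietly = TRUE)) {
  devtools::session_info()
} else {
  utils::sessionInfo()
}

# Write the report to a text file (a temporary one in this sketch; in a
# project it could go to an output folder)
info_path <- file.path(tempdir(), "session_info.txt")
writeLines(capture.output(print(info)), info_path)
```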
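With renv, the usual cycle is renv::init() to create a project-local library and a renv.lock lockfile, renv::snapshot() to record the package versions in use, and renv::restore() to reinstall exactly those versions later or on another machine. Because those calls change the project, the runnable part of this sketch only uses renv::dependencies(), which scans R code for the packages it uses without modifying anything; in its default implicit mode, a snapshot relies on this kind of scan to decide which packages to record. It assumes renv is installed; the demo folder and script are made up.

```r
# Sketch, assuming the renv package is installed.
# Typical renv workflow, run from the R console in the project folder:
#   renv::init()      # create a project-local library and renv.lock
#   renv::snapshot()  # record the package versions currently in use
#   renv::restore()   # later, or on another machine: reinstall them

# A made-up mini project with one script that uses two packages
project_dir <- file.path(tempdir(), "renv-demo")
dir.create(project_dir, showWarnings = FALSE)
writeLines(
  c("library(stats)", "x <- jsonlite::fromJSON('[1, 2, 3]')"),
  file.path(project_dir, "analysis.R")
)

# Scan the project's R code for package usage (read-only, no installs)
deps <- renv::dependencies(project_dir)
deps$Package
```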
Resources and links
Books, Papers, Blog Posts, etc.
- What they forgot to teach you about R: A free online book by Jenny Bryan et al. with many basic and practical tips on things to know about R besides data analysis.
- The Turing Way: An open, community‑driven handbook on reproducible, ethical, and collaborative research.
- “Workflow vs Script”: Jenny Bryan’s blog post on turning one‑off scripts into reusable, automated workflows.
- “R Best Practices”: Krista L. DeStasio’s overview of coding conventions and project hygiene in R.
- “How to Name Files” (slides): Jenny Bryan’s slide deck on systematic, machine‑friendly file naming.
- tidyverse Style Guide: A concise set of conventions for naming, spacing, and structuring R code.
Tools
- template R package: Create a template project structure.
- here R package: Build robust relative file paths that always start at your project root.
- lintr R package: Static code analysis for catching style issues and potential bugs in R scripts.
- renv R package: Create project‑local package libraries and lock dependency versions for reproducibility.