Reproducible data analysis - R packages as research compendia

Sebastian Hanß and Selina Baldauf

28 June 2021

Workshop overview

Motivation

This image was created by Scriberia for The Turing Way community and is used under a CC-BY licence (DOI 10.5281/zenodo.3332807).

Two main topics

Motivation for better project organization and workflow

How? Use a research compendium!

This image was created by Scriberia for The Turing Way community and is used under a CC-BY licence (DOI 10.5281/zenodo.3332807).

What is a research compendium?

A research compendium is a collection of all digital parts of a research project, including data, code and texts (protocols, reports, questionnaires, metadata). The collection is created in such a way that reproducing all results is straightforward.

(Definition from The Turing Way)

Excursion to the TTW website

Excursion to Karthik Ram’s talk at rstudio::conf 2019

Marwick et al. 2018

Advantages of using R Packages

Hands-on I

Create a research compendium with the R package structure
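A minimal sketch of this step, assuming the usethis package is installed. The compendium name `mycompendium`, the data script name `my_dataset`, the vignette name `analysis` and the licence holder `"Your Name"` are hypothetical placeholders; the compendium is created under `tempdir()` only so the script can be run without touching existing projects.

```r
# Sketch: scaffold a research compendium with the R package structure.
# Assumes usethis is installed; all names below are hypothetical placeholders.
library(usethis)

path <- file.path(tempdir(), "mycompendium")

# DESCRIPTION (metadata, dependencies), NAMESPACE and R/ (analysis functions)
create_package(path, open = FALSE, rstudio = FALSE)

# Run the remaining helpers with the new compendium as the active project
with_project(path, {
  use_mit_license("Your Name")               # licence files
  use_testthat()                             # tests/testthat/ for unit tests
  use_data_raw("my_dataset", open = FALSE)   # data-raw/ for raw data and cleaning scripts
  use_vignette("analysis")                   # vignettes/ for the analysis report
})
```

In an interactive RStudio session, the same helpers are typically called one by one from inside the newly opened project instead of through `with_project()`.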

Version control

Git as a distributed version control system (VCS)

Hands-on II

Add version control to the research compendium

Automated testing

What?

Why?

How?

Hands-on III

Unit tests
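A minimal sketch of a unit test with testthat, assuming testthat is installed. `standard_error()` is a hypothetical analysis helper that would normally live in `R/`, while the `test_that()` blocks would go into a file under `tests/testthat/`, where `devtools::test()` and `R CMD check` pick them up.

```r
library(testthat)

# Hypothetical analysis helper: standard error of the mean, ignoring NAs
standard_error <- function(x) {
  x <- x[!is.na(x)]
  sd(x) / sqrt(length(x))
}

test_that("standard_error matches the textbook formula", {
  x <- c(2, 4, 4, 4, 5, 5, 7, 9)
  expect_equal(standard_error(x), sd(x) / sqrt(length(x)))
})

test_that("standard_error ignores missing values", {
  expect_equal(standard_error(c(1, 2, 3, NA)), standard_error(c(1, 2, 3)))
})
```

Each `test_that()` block documents one expected behaviour, so a later change to the helper that breaks it fails loudly instead of silently altering results.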

Continuous integration (CI)

“If it hurts, do it more often” (Martin Fowler)

Continuous integration (CI)

Idea

GitHub Actions for CI (and other automation tasks)

This image was created by Scriberia for The Turing Way community and is used under a CC-BY licence (DOI 10.5281/zenodo.3332807).

Hands-on IV

GitHub Actions for R CMD check

Reproducibility and Accessibility

Another excursion to Karthik Ram’s talk at rstudio::conf 2019

Holepunch

Hands-on V

Publish your repository on Binder with a live RStudio Server instance