Before you start the how-to, please check that you have the following things set up and running:
devtools, usethis, testthat
usethis::browse_github_token()
in the console for a step by step guideThere are different ways to create a package in R. We based this How-To on the second method using the usethis
package.
Go to File | New Project | New Directory | R Package
.
Give your package a name that is accepted as an R-package name:
Make sure to check the Create a git repository
checkmark to initialize the package with a git repository and a .gitignore file.
If you accidentally created the package without a git repository you can either use usethis::use_git()
or you can initialize an empty git repository from the terminal git init
.
usethis::create_package()
Another possibility is to create a package by running usethis::create_package("path/to/package/packageName")
. If you leave all the other function parameters at their default value, this command will create a new directory (if it doesn’t already exist) and initialize an R package in that directory. Compared to the GUI method, this R package by default is already configured to using the roxygen2
package for package documentation (see later) and will not have any dummy files in it.
However, with this method you have to seperately initialize a git repository by running usethis::use_git()
or git init
from the terminal.
Checkout the options that you have in the Build
tab in RStudio. You can do everything described below via the GUI or by calling R functions.
During the entire package building process the devtools
package is your friend and you can (and should)
Check
button in the Build
panel or by calling devtools::check()
(Keyboard shortcut Ctrl/Cmd + Shift + E)devtools::load_all()
devtools::document()
(Ctrl/Cmd + Shift + D)
devtools::check()
will also build the documentation for youThe DESCRIPTION
file contains important metadata about your package and describes which other packages it needs to run correctly. If an R project contains a DESCRIPTION
file, RStudio and devtools will consider it an R package.
Fill out the fields of the description file, e.g.
Package: YoMos2020
Type: Package
Title: Analyse Temperature Data From Somewhere
Authors@R:
person(given = "Selina",
family = "Baldauf",
role = c("aut", "cre"),
email = "selina.baldauf@fu-berlin.de")
Description: Analysis of the temperature data collected in Berlin from our
research project xyz presented in paper abc.
License: `use_mit_license()`, `use_gpl3_license()` or friends to
pick a license
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.0
There are a few conventions on how to fill out the fields of the DESCRIPTION
file. Please have a look in the R packages book to find out more about this file. Also have a look at how usethis
created the template for you. This will already give you a hint on how to fill out the fields.
If you want to add a specific license to your project you can use different functions from the usethis
package, e.g. for the GNU General Public License 3 (gpl-3) license you can use
This adds LICENSE.md with your name (or the name(s) specified in the DESCRIPTION file) in it to the project and automatically updates the license field in DESCRIPTION
.
A common place for datasets that you want to make available in your research compendium is the /data
folder. The convention of R packages is that you should only story .RData
files in there. Everything that is not in .RData
format can go in the /data-raw
folder.
First create a /data-raw
folder by calling usethis::use_data_raw()
. This will also create a DATASET.R
file in which you can create your data and then add it to the /data
folder as an .RData
file. By default these files are compressed using bzip2
but you can also chose other compressions by changing the compress
parameter of the function. Just put the following lines in the DATASET.R
file and run them:
Save all the functions you need for your analysis in the R/
folder. You can group related functions together in the same R-script but unrelated functions should go in separate scripts. Here, we create a simple printer function that prints the mean of a vector. We save this function in a new R-Script called printMean.R. To create the script, you can conveniently use usethis::use_r("printMean")
which will create, save and open the script for you. Add the following simple function:
You can document your functions and data using the package roxygen2
. This allows you to describe your functions in the same place where you define them. Roxygen2 will automatically generate .Rd
files in the /man
folder, update the NAMESPACE
if you import or export functions and update the DESCRIPTION
file. The basic workflow is:
Code| Insert Roxygen Skeleton
or with the keyboard shortcut Ctrl + Alt + Shift + R
(on Windows)devtools::document()
to generate the documentation in the /man
folder and update the NAMESPACE
[If you created your package via the GUI and not with usethis::create_package()
you need to configure the build tools to use roxygen2. In order to do this navigate to Build
-> Configure Build Tools
and hit the checkmark at Generate documentation with Roxygen
. Click Configure
and make sure all the checkmarks are checked so that roxygen generates all help files and the documentation is rebuild everytime your install and restart the package. Before this works correctly, you have to delete all alld documenation files that were not generated by roxygen.If you forget to do this, you will get a warning when your rebuild or check your package, so be careful to read the warnings if there are any. ]
To create a documentation for our simple printMean
function, navigate to the respective R script and add a roxygen skeleton.
Now you can fill out the skeleton by adding all the information about the function that you want to appear in the help file for that function using tags. E.g. your roxygen description could look like the following:
#' @title Print mean values in a sentence
#' @description This function calculates the mean of its input and prints it into a sentence. The function
#' is called in the analysis.Rmd file
#' @export
#' @param x A vector of numeric values of which to calculate the mean
#'
#' @return The output of \code{\link{print}} after calculating the \code{\link{mean}} and pasting it
#' into a sentence
#'
#' @examples
#' printMean(c(1,2,3))
#'
To document datasets in your compendium open up a new R script with the name of your dataset in the /R
directory (in our case dummyTempData.R
) (remember usethis::use_r("dummyTempData")
?) and describe your data set before specifying the name of your dataset as a string. As a bare minimum this could look like the following:
#' @title Temperature data from place X
#' @description Temperature measurements (2m height) measured on March 21st, 2009 at place x
#' @format A \code{data.frame} with 2 columns, which are:
#' \describe{
#' \item{id}{id of the sensor}
#' \item{temp}{temperature in °C}
#'}
"dummyTempData"
Unfortunately, here you can’t insert a preset roxygen skeleton, because a data set is not a function. However, if you are too lazy to document your dataset by hand, you can use the sinew
package which is very convenient with package documentation using roxygen2
. To get a pre-made skeleton for the data set just run sinew::makeOxygen(obj = dummyTempData)
and you will get a perfect skeleton printed to the console, that you can copy paste to your dummyTempData.R
.
If you now run devtools::document()
or rebuild your package, you will find an automatically generated read only file printMean.Rd
and dummyTempData.Rd
in the /man
folder of your research compendium. If you call ?printMean
or ?dummyTempData
(after installing and restarting) you open the help file just like with any other R function you know.
Please also have a look at the conventions and guidelines on roxygen2-documentation the quick reference of the most common tags.
Add an /analysis
folder with an rmarkdown document report.Rmd
to your compendium. Depending on the type of report you want to share you have to edit the yaml header of your report. It can e.g. look something like this:
---
title: "Temperature Analyis"
author: "Selina Baldauf"
date: "28 June, 2021"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
devtools::load_all(".")
```
# This is the temperature analysis
```{r}
printMean(dummyTempData$temp)
```
In the Git
tab (next to Build
) you can first stage all created files and then commit them or you can do it in the terminal:
For this, you can conveniently use the usethis
package:
This will initialize a new GitHub repository from your compendium and will add the url to DESCRIPTION
.
If you do this for the first time it is possible that you run into some issues because you might not have set up everything correctly, yet. But the usethis
package will help you through all the steps with its error messages. One thing you definitely need for this step to work is a PAT (personal acces token) for GitHub added to your .Renviron file. Check if you have one or create one with usethis::browse_github_token()
.
Add a README.md to your package that will be rendered on the webpage of your GitHub repository.
In there, you can describe your package and add some information on how to download it and how to get started with the package. You can also do this on the GitHub webpage of your repository.
piggyback
If you use an R package as a research compendium, you might also want to share large datasets along with it. The problem is, that you cannot manage those large files anymore by tracking them with git and commiting directly to GitHub. However, GitHub allows you to attach up to 2 GB of files to each release of your project. The piggyback
package takes advantage of this and allows you to create GitHub releases, upload and download large datasets associated with your compendium.
First, we simulate a large datafile by creating a dataset in our data-raw
folder (if you have really large file you would not save them as .txt but rather compress them first).
largeDataSet <- data.frame(id = 1:1e6, temp = rnorm(1e6,15,2))
write.table(largeDataSet, "data-raw/largeDataSet.txt")
To use the piggyback
package, you first need to install it on your machine with install.packages("piggyback")
. To upload our dataset, we first need to create a new release on GitHub. The first release has to be created on GitHub, by clicking on Create a new release
, giving it a Tag (e.g. v0.0.1) and a title. You will see that the release contains your source code as zip and as a tarball (.tar.gz).
With piggyback you can always list your existing releases by running
You can also create a new release directly from R by calling
To track your dataset (which means that everytime the dataset changes you can create a new release), call
To piggyback your dataset to your package, you now simply have to run
Now you can find it attached to your latest release.
If you want to use the large dataset in your package for the analysis or to generate the report, you can write a function to download the data using piggyback::pb_download
. Just create a new file loadData.R
in your /R
directory and add the following function.
If this function is called, it will automatically download all the assets attached to the latest release of the repository and put them in a /data-raw
folder. Now the data can be accessed by all the package functions. Have a look at ?piggyback::pb_download
to see which parameters you can specify. You don’t have to download all the assets and you can also chose which version of the release should be used for the download.
You can also create new releases directly from piggyback
and you can track large datasets by creating new releases if the datasets change. This will not be covered here, but if you are interested have a look at the piggyback documentation.
First, create the infrastructure needed by the thestthat
package.
This creates a /tests
folder which contains a /testthat
folder and an R-script called testthat.R
. You do not need to change anything in the testthat.R script. This is called e.g. by rcmdcheck::rcmdcheck()
to run all the tests you created in your package. The /testthat
folder is the place to store the R-scripts with the actual tests. Also, the testthat
package ist now added to the Suggests field in DESCRIPTION
.
Now you can add a test script that tests your simple printer function. To create a test script for a specific function, open the R-script with the function and then initialize a basic test file for the script by calling
The testfile can be found in /tests/testthat
. Testfiles have to be named test-xyz.R in order to be found (usethis::use_test()
does the naming for you - so you don’t have to worry about it).
We can write a very simple test function for our printMean function that checks if it works correctly
If you run the test and you don’t get any errors in the console, then it means that your test was successful. If not you will get a helpful error message telling you what went wrong.
If you want to run all the tests from your package at once to see if they pass your can run
Before we start, it’s time to commit all the changes to our package. Use the Git
panel or run
Then create the basic file structure needed to set up a workflow by calling
This creates a .github
folder, in which you find a .gitignore
file and a /workflows
folder. In this folder you can store all your GitHub actions in .yaml
file format. The usethis::use_github_actions()
function automatically added a first GitHub action called R-CMD-check.yaml
. This action is triggered on every push or pull request on the master branch and checks your package to make sure that new code does not break the package.
If you want to add other actions that already are available (e.g. from the GitHub repository r-lib/actions) you can do the following:
This adds the respective yaml-file to ./github/workflows
. Of course you can also write your own actions and simply add them to the /workflows folder.
holepunch
There is a nice intro for holepunch: Getting started with holepunch. The code below is copied’n’pasted from the holepunch repo’s readme.
remotes::install_github("karthik/holepunch")
write_dockerfile(maintainer = "your_name")
# To write a Dockerfile. It will automatically pick the date of the last
# modified file, match it to that version of R and add it here. You can
# override this by passing r_date to some arbitrary date
# (but one for which a R version exists).
generate_badge() # This generates a badge for your readme.
# ----------------------------------------------
# At this time 🙌 push the code to GitHub 🙌
# ----------------------------------------------
# And click on the badge or use the function below to get the build
# ready ahead of time.
build_binder()
# 🤞🚀
[Be carful here: Since October 2020, GitHub has renamed its default branch for repositories from master to main. If you have a repository with main instead of master as default branch, holepunch will throw an error when trying to connect Binder to your GitHub repository. If this happens make sure that your replace master
with main
in the .binder/Dockerfile
URL and in the URL used in the Readme badge]
You can use devtools::spell_check()
to test your entire package for spelling errors. This is very useful if you have reports or other text intensive files in there. By default, the spell check will be for American English. However, you can change the langauage by adding a Language
field to your DESCRIPTION
file.
If you want to write a new function and generate the corresponding R script in the /R
directory, you can conveniently call usethis::use_r("functionName")
. This will create a file functionName.R
ind the /R
directory and open it for you.
Use the goodpractice
package to further test your package for parts which can be improved. Running goodpractice::gp()
.{R} will tell you where your comment lines are too long, where you should consider renaming variables, if something is wrong with your NAMESPACE
, …
https://the-turing-way.netlify.app/introduction/introduction.html
https://the-turing-way.netlify.app/research_compendia/research_compendia.html
https://nvie.com/posts/a-successful-git-branching-model/ https://guides.github.com/introduction/flow/
https://help.github.com/en/actions/building-and-testing-code-with-continuous-integration/about-continuous-integration https://www.youtube.com/watch?v=F3wZTDmHCFA
https://www.r-bloggers.com/rendering-your-readme-with-github-actions/ https://www.edwardthomson.com/blog/github_actions_advent_calendar.html
testthat
James, Justin. 2012. “10 Classic Mistakes That Plague Software Development Projects.” 2012. https://www.techrepublic.com/blog/10-things/10-classic-mistakes-that-plague-software-development-projects/.
Scheller, Robert M, Brian R Sturtevant, Eric J Gustafson, Brendan C Ward, and David J Mladenoff. 2010. “Increasing the Reliability of Ecological Models Using Modern Software Engineering Techniques.” Frontiers in Ecology and the Environment 8 (5): 253–60. https://doi.org/10.1890/080141.
Baxter, Susan M., Steven W. Day, Jacquelyn S. Fetrow, and Stephanie J. Reisinger. 2006. “Scientific Software Development Is Not an Oxymoron.” PLoS Computational Biology 2 (9): 0975–78. https://doi.org/10.1371/journal.pcbi.0020087.
Merali, Zeeya. 2010. “Computational Science: …Error.” Nature 467 (7317): 775–77. https://doi.org/10.1038/467775a.
Passig, Kathrin, and Johannes Jander. 2013. Weniger Schlecht Programmieren. O’Reilly.
Storer, Tim. 2017. “Bridging the Chasm: A Survey of Software Engineering Practice in Scientific Programming.” ACM Computing Surveys 50 (4). https://doi.org/10.1145/3084225.
Marwick, Ben, Carl Boettiger, and Lincoln Mullen. 2018. “Packaging Data Analytical Work Reproducibly Using R (and Friends).” American Statistician 72 (1): 80–88. https://doi.org/10.1080/00031305.2017.1375986.