R and Open Science

Karthik Ram



Most science is not reproducible or repeatable, even within the same lab group over time.


Data Life Cycle

Source: Michener, 2006, Ecoinformatics.

Open Science

Open data + code

Source: Wolkovich et al. GCB 2012.

Source: PLOS, 2007

R packages are increasingly showing up in domain journals.

Source: Molecular Ecology, 2012

R Open Science

Open Science needs open source tools

Sources: Revolution Analytics, 2010; Nature editorial, 2012

Why R?

The old way...

A better way

glm(y ~ -1 + a + c + z + a:z, data = mydata, maxit = 30)

This is reproducible and repeatable, and it can serve as an analytic workflow.
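The slide does not show `mydata`, so the sketch below simulates a hypothetical data set with columns `a`, `c`, `z` and `y` (the effect sizes are arbitrary assumptions) and wraps the same `glm()` call in a function. Because the whole analysis, including the random seed, lives in a script, re-running it returns exactly the same estimates.

```r
# Sketch only: simulated stand-in for `mydata`; the true coefficients are made up.
run_analysis <- function(seed = 42, n = 100) {
  set.seed(seed)  # fixed seed: the simulated data are identical on every run
  mydata <- data.frame(a = rnorm(n), c = rnorm(n), z = rnorm(n))
  # Response from a known linear model with an a:z interaction, plus Gaussian noise
  mydata$y <- with(mydata, 1.5 * a - 0.5 * c + 2 * z + 0.8 * a * z + rnorm(n, sd = 0.3))
  # Same call as on the slide; maxit is forwarded to glm.control()
  fit <- glm(y ~ -1 + a + c + z + a:z, data = mydata, maxit = 30)
  coef(fit)
}

est <- run_analysis()
print(round(est, 2))
```

Swapping the simulation for `read.csv()` on a real data file keeps the same property: the script, not a sequence of menu clicks, is the record of the analysis.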

Wrapping all science APIs

Development team

Carl Boettiger

Karthik Ram

Scott Chamberlain

Advisory team

Duncan Temple Lang

Hadley Wickham

JJ Allaire

Matt Jones

rOpenSci's packages

Data repositories






R and APIs

API keys can be stored in a user's .Rprofile:

	options(MendeleyKey = "uf5daib7wyil7ag5buc")
	options(MendeleyPrivateKey = "faj2os5dyd7jop2fok6")
	options(PlosApiKey = "ef3vip9yak7od3hud4g")
	options(SpringerMetdataKey = "ri9hi7woc6jax4vaf8w")

Note: These keys aren't real.
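Once the keys are set with `options()` in the user's profile (typically `~/.Rprofile`), any script can read them back with `getOption()`. The sketch below uses a hypothetical helper, `get_api_key()`, that is not part of any rOpenSci package; it fails loudly when a key is missing, so credentials never have to be pasted into shared analysis code.

```r
# Hypothetical helper (not an rOpenSci function): look up a key set via options()
get_api_key <- function(name) {
  key <- getOption(name)
  if (is.null(key) || !nzchar(key)) {
    stop(sprintf('No key found; add options(%s = "...") to your .Rprofile', name),
         call. = FALSE)
  }
  key
}

# At startup R would run the .Rprofile; here the (fake) key is set directly instead
options(PlosApiKey = "ef3vip9yak7od3hud4g")

plos_key <- get_api_key("PlosApiKey")
```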

Public Library of Science full text - rplos

library(rplos)
plot_throughtime(list("reproducible science"), 500)

Managing bibliography - RMendeley

Manage reference libraries and measure the impact of research

groupDocInfo(mc, 530031, 4344945792)
[1] "SUMMARY: Modern biological experiments create vast amounts of data which are geographically distributed. These datasets consist of petabytes of raw data and billions of documents. Yet to the best of our knowledge, a search engine technology that searches and cross-links all different data types in life sciences does not exist.....

      forename        surname
   "Dominic S"    "Lütjohann"
# ....

Accessing data behind papers - dryad

library(rdryad)

# Get the URL for a data file
dryaddat <- download_url("10255/dryad.1759")

# Get a file given the URL
file <- dryad_getfile(dryaddat)

Tracking altmetrics - raltmet

Tracks altmetrics across various sources such as GitHub, Total Impact, CitedIn, CiteULike, and Stack Overflow.

GitHub(userorg = "ropensci", repo = "rmendeley")
totimp(id = "10.5061/dryad.8671")
stackexchange(ids = 16632)

Mapping biodiversity data - rgbif

library(rgbif)
distribution <- occurrencelist(sciname = "Danaus plexippus", coordinatestatus = TRUE, maxresults = 1000, latlongdf = TRUE)

Also see CartoDB's powerful mapping capabilities and its R package.

Sharing unpublished data - figshare

Using Figshare's new API, it is now possible to share figures, data, and any other object generated in R directly to one's figshare account.

> figshare(data)
# code isn't ready yet; once it is, this call will return a persistent identifier

A multi-institution consortium to build infrastructure for open science


DataONE is building the components needed to support persistent, secure access to Earth observation data.

DataONE's upcoming R package will let users submit data to, and retrieve data from, member nodes directly from the R console.

Provenance is important for reproducibility

Source: Modified from original version by James Cheney, University of Edinburgh.

Making R provenance aware

DataONE provenance working group and R

We are taking an approach similar to knitr's, in which a user tracks workflow provenance through hooks.

Using XML to track metadata and maintain provenance traces across runs
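The hook idea can be sketched in base R: a hypothetical `track()` wrapper runs one workflow step and appends a record (step name, input file, MD5 checksum, start time, R version) to a log. This is an illustration under stated assumptions, not the DataONE working group's design, and it keeps records in a data frame rather than XML so it needs no extra packages.

```r
# Hypothetical provenance log (illustration only, not the DataONE implementation)
prov_log <- new.env()
prov_log$records <- list()

# Run one step on an input file and record where the result came from
track <- function(step, fun, input_file, ...) {
  started <- Sys.time()
  result <- fun(input_file, ...)
  prov_log$records[[length(prov_log$records) + 1]] <- data.frame(
    step      = step,
    input     = basename(input_file),
    md5       = unname(tools::md5sum(input_file)),  # detects silent changes to the input
    started   = format(started, "%Y-%m-%d %H:%M:%S"),
    r_version = R.version.string,
    stringsAsFactors = FALSE
  )
  result
}

# Collect all records into one table (a provenance trace for this run)
provenance <- function() do.call(rbind, prov_log$records)

# Usage: read a small, made-up CSV through the tracker
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:3, y = c(2.1, 3.9, 6.2)), tmp, row.names = FALSE)
dat <- track("read raw data", read.csv, tmp)
print(provenance())
```

Comparing the checksum column across runs is one simple way to tell whether a result changed because the code changed or because the data did.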


GitHub + Science

Rapid peer-to-peer sharing of code is great for science

R packages early in development can be installed directly from GitHub using devtools, tested rapidly, and revised before being submitted to a persistent repository such as CRAN.

library(devtools)
install_github("ropensci/RMendeley")

R + collaborative writing

knitr + Markdown

Xie Y (2012). knitr: A general-purpose package for dynamic report generation in R.

knitr + Markdown + GitHub

GitHub automatically renders Markdown and even provides syntax highlighting

knitr + Markdown + GitHub = executable paper

knitr + Markdown + GitHub = pre-publication review
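A minimal sketch of the knitr + Markdown step, assuming the knitr package is installed: it writes a small, made-up R Markdown file, knits it to plain Markdown with `knitr::knit()`, and checks that the computed value was woven into the text. The resulting `.md` file is the kind of document GitHub renders and reviewers can read, with every number traceable to the code that produced it.

```r
library(knitr)

# Build a tiny, hypothetical R Markdown "paper": one code chunk plus an inline result
ticks <- strrep("`", 3)
rmd_lines <- c(
  "# Results",
  "",
  paste0(ticks, "{r summary}"),
  "x <- c(2.1, 3.9, 6.2)",
  "mean(x)",
  ticks,
  "",
  "The mean response was `r round(mean(x), 2)`."
)

rmd_file <- file.path(tempdir(), "paper.Rmd")
md_file  <- file.path(tempdir(), "paper.md")
writeLines(rmd_lines, rmd_file)

# knit() evaluates every chunk and inline expression and writes plain Markdown
knit(rmd_file, output = md_file, quiet = TRUE)
md <- readLines(md_file)
```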

Incorporate citations with R + Markdown


library(knitcitations)

# cite a paper by its DOI, giving it a short key
citet(c(Halpern2006 = "10.1111/j.1461-0248.2005.00827.x"))
# then cite it in your Markdown file

# or read citations from a BibTeX file, which services like Mendeley can generate and keep up to date
bib <- read.bibtex("example.bib")
# then cite inline with citet(bib[["knitr"]])

- knitcitations on Carl Boettiger's GitHub
- tutorial
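knitcitations resolves DOIs over the network, so as an offline illustration of the same author-year idea, the sketch below uses base R's `utils::bibentry()` to describe the knitr reference cited earlier (using only the fields given in that citation) and a hypothetical `cite_inline()` helper to format it. This is not knitcitations' API, just the underlying idea of turning a structured reference into inline text and BibTeX.

```r
# Entry for the knitr reference cited in this talk (fields taken from that citation)
knitr_ref <- utils::bibentry(
  bibtype = "Manual",
  key     = "knitr",
  title   = "knitr: A general-purpose package for dynamic report generation in R",
  author  = utils::person(given = "Y", family = "Xie"),
  year    = "2012"
)

# Hypothetical helper mimicking an author-year, citet()-style inline citation
cite_inline <- function(entry) {
  authors <- format(entry$author, include = "family")
  sprintf("%s (%s)", paste(authors, collapse = " and "), entry$year)
}

print(cite_inline(knitr_ref))

# BibTeX form, as it would appear in a .bib file
print(toBibtex(knitr_ref))
```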

Open notebooks with R

R talks to Dropbox, Amazon S3, WordPress, Imgur, and other services in the cloud.

Various tools in R can drive data reuse, new collaborations, new tools, novel visualization, and keep the entire research process transparent through open notebooks.


Please contact us if you have feedback or ideas for collaborations.

All rOpenSci projects are on GitHub.