R and Open Science

Karthik Ram

karthik@ropensci.org




Most science is not reproducible or repeatable, even within the same lab group over time.


Science

Data Life Cycle

source: Michener, 2006 Ecoinformatics.



Open Science


Open data + code

Source: Wolkovich et al. GCB 2012.




Source: PLOS, 2007





R packages are increasingly showing up in domain journals.



Source: Molecular Ecology, 2012


R Open Science



Open Science needs open source tools



Source: Revolution Analytics, 2010, Nature editorial, 2012

Why R?

The old way...

Why R?

A better way



glm(y ~ -1 + a + c + z + a:z, data = mydata, maxit = 30)


This is reproducible, repeatable, and can serve as an analytic workflow.
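A minimal sketch of such a scripted workflow, using only base R. The simulated data, the coefficients, and the helper name `run_analysis` are illustrative assumptions, not part of the original analysis; only the model formula and `maxit = 30` come from the slide.

```r
# Hypothetical helper: simulate data and fit the model from the slide.
# Fixing the seed makes every run produce the same data and the same fit.
run_analysis <- function(seed = 42, n = 200) {
  set.seed(seed)
  mydata <- data.frame(a = rnorm(n), c = rnorm(n), z = rnorm(n))
  # Assumed "true" relationship, chosen only for illustration
  mydata$y <- with(mydata,
                   1.5 * a - 0.5 * c + 2 * z + 0.8 * a * z + rnorm(n, sd = 0.1))
  # maxit is passed through to glm.control(); the default family is gaussian
  fit <- glm(y ~ -1 + a + c + z + a:z, data = mydata, maxit = 30)
  coef(fit)
}

first_run  <- run_analysis()
second_run <- run_analysis()
print(first_run)
# Re-running the script gives identical estimates
print(identical(first_run, second_run))
```

Because every step from data to estimates lives in one script, anyone with the script can repeat the analysis without reconstructing a sequence of menu clicks.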




Wrapping all science APIs




Development team


Carl Boettiger

Karthik Ram

Scott Chamberlain




Advisory team


Duncan
Temple Lang
Hadley Wickham


JJ Allaire

Bertram
Ludascher
Matt Jones


rOpenSci's Packages

Data repositories


 
   

Literature


   

Metadata


 
     


R and APIs

API keys can be stored in a user's .Rprofile

 
	options(MendeleyKey = "uf5daib7wyil7ag5buc")
	options(MendeleyPrivateKey = "faj2os5dyd7jop2fok6")
	options(PlosApiKey = "ef3vip9yak7od3hud4g")
	options(SpringerMetdataKey = "ri9hi7woc6jax4vaf8w")
	





Note: These keys aren't real.
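A minimal sketch of how code might read such a key back at run time with base R's `getOption()`. The helper name `get_api_key` and the placeholder key value are assumptions made for illustration; individual packages may look up their keys differently.

```r
# Normally this line would live in ~/.Rprofile; the value is a placeholder
options(PlosApiKey = "not-a-real-key")

# Hypothetical helper: fetch a stored key, failing loudly if it is missing
get_api_key <- function(name) {
  key <- getOption(name)
  if (is.null(key)) {
    stop("No option named '", name, "' is set; add it to your .Rprofile")
  }
  key
}

plos_key <- get_api_key("PlosApiKey")
print(plos_key)
```

Keeping keys in .Rprofile rather than in analysis scripts means the scripts can be shared publicly without exposing credentials.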

Public Library of Science full text - rplos


library(rplos)
plot_throughtime(list("reproducible science"), 500)


Managing bibliography - RMendeley

Manage libraries and measure impact of research

groupDocInfo(mc, 530031, 4344945792)
$abstract
[1] "SUMMARY: Modern biological experiments create vast amounts of data which are geographically distributed. These datasets consist of petabytes of raw data and billions of documents. Yet to the best of our knowledge, a search engine technology that searches and cross-links all different data types in life sciences does not exist.....

$authors
$authors[[1]]
      forename        surname
   "Dominic S" 	"Lütjohann" 
# ....
	


Accessing data behind papers - dryad

# Get the URL for a data file
dryaddat <- download_url("10255/dryad.1759")

# Get a file given the URL
file <- dryad_getfile(dryaddat)


Tracking altmetrics - raltmet

Tracks altmetrics across various sources such as GitHub, Total Impact, CitedIn, CiteULike, and Stack Overflow.

GitHub(userorg = "ropensci", repo = "rmendeley")
totimp(id = "10.5061/dryad.8671")
stackexchange(ids = 16632)

Mapping biodiversity data - rgbif

distribution <- occurrencelist(sciname = "Danaus plexippus", coordinatestatus = TRUE, maxresults = 1000, latlongdf = TRUE)

Also see CartoDB's powerful mapping capabilities and R package.


Sharing unpublished data - (figshare)

Using Figshare's new API, it is now possible to share figures, data, and any other object generated in R directly to one's figshare account.


> figshare(data)
# code isn't ready yet but once it is, it will return a persistent identifier






A multi-institution consortium to build infrastructure for open science



DataONE

DataONE creates all the necessary components to support persistent and secure access to earth observation data.




DataONE's upcoming R package will allow users to submit and access data to/from member nodes directly from the console.



Provenance is important for reproducibility



Source: Modified from original version by James Cheney, University of Edinburgh.



Making R provenance aware



DataONE provenance working group and R

Taking an approach similar to knitr where a user can track workflow provenance using hooks.


Using XML to track metadata and maintain provenance traces across runs
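One way the hook idea could look, as a rough base R sketch rather than the working group's actual design. The names `prov`, `with_provenance`, and `prov_to_xml`, and the XML layout, are all hypothetical; a real implementation would likely use a proper XML library and record far richer metadata.

```r
# Environment holding the provenance trace for this session (hypothetical)
prov <- new.env()
prov$log <- list()

# Wrap a function so that each call appends a record to the trace
with_provenance <- function(f, name) {
  function(...) {
    result <- f(...)
    prov$log[[length(prov$log) + 1]] <- list(
      step   = name,
      time   = format(Sys.time(), "%Y-%m-%dT%H:%M:%S"),
      n_args = length(list(...)),
      output = class(result)[1]
    )
    result
  }
}

# Serialize the trace as a small XML document built from plain strings
prov_to_xml <- function(log) {
  steps <- vapply(log, function(e) {
    sprintf('  <step name="%s" time="%s" args="%d" output="%s"/>',
            e$step, e$time, e$n_args, e$output)
  }, character(1))
  paste(c("<provenance>", steps, "</provenance>"), collapse = "\n")
}

tracked_mean <- with_provenance(mean, "mean")
tracked_sqrt <- with_provenance(sqrt, "sqrt")

x <- tracked_sqrt(tracked_mean(c(4, 9, 16, 35)))
xml_trace <- prov_to_xml(prov$log)
cat(xml_trace, "\n")
```

Writing `xml_trace` to a file at the end of each run would give a trace that can be compared across runs.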


Ideas?





GitHub + Science

Rapid peer-to-peer sharing of code is great for science



R packages early in development can easily be tested, rapidly deployed from GitHub using devtools and revised before submitting to a persistent repository such as CRAN.


library(devtools)
# current devtools expects the repository as "username/repo"
install_github("ropensci/RMendeley")



R + collaborative writing


knitr + Markdown



Xie Y (2012). knitr: A general-purpose package for dynamic report generation in R.

knitr + Markdown + GitHub

GitHub automatically renders Markdown and even provides syntax highlighting




knitr + Markdown + GitHub = executable paper




knitr + Markdown + GitHub = pre-publication review



Incorporate citations with R + Markdown


knitcitations

citet(c(Halpern2006 = "10.1111/j.1461-0248.2005.00827.x"))
# then cite in your markdown file
citet("Halpern2006")

# or read citations from a BibTeX file, which can be generated and updated automatically from services like Mendeley
bib <- read.bibtex("example.bib")
# then cite inline
citet(bib[["knitr"]])

- knitcitations on Carl Boettiger's GitHub
- tutorial


Open notebooks with R

R talks to Dropbox, Amazon S3, WordPress, imgur, and elsewhere in the




Various tools in R can drive data reuse, new collaborations, new tools, novel visualization, and keep the entire research process transparent through open notebooks.

  bit.ly/ORqpuM

Please contact us if you have feedback or ideas for collaborations.

All rOpenSci projects are on GitHub.