My friend Jonah asked me to guest lecture in his R seminar aimed at grad students and postdocs in Integrative Biology. I gave Jonah a bunch of topic options ranging from reproducible research with R to data manipulation. The consensus was data visualization, so I put together a 2-hour talk and hands-on presentation for ggplot2 […] Read more – ‘A quick introduction to ggplot2’.
I’ve been thinking a lot about the importance of version control in science of late. This is not just because of my involvement with multiple collaborative efforts that would be a nightmare to move forward without a structured workflow. I fortuitously got involved in a collaboration between GitHub, BioMed Central, and a handful of bioinformatics scientists […] Read more – ‘Version control for science’.
Altmetrics is all the rage these days in the scientometrics world. One rationale for developing these metrics has been to quantify the entire range of academic output beyond publications to include everything from datasets and code to presentations. The idea is that these metrics would one day be used in tenure committees (and tenure track […] Read more – ‘Altmetrics as a discovery tool’.
After a winter break of working at half-speed, I’m finding it a little daunting to face the overwhelming number of projects that need my attention in the new year. As I sort through and prioritize the ones where the biggest fires are raging, I also like to use this time to reevaluate how I go […] Read more – ‘Make your research a little more open this year’.
Since someone asked about tables in markdown in the comments section of an earlier post, I thought I’d elaborate a little more. Since the appeal of markdown is its minimalism, options for formatting tables are also fairly limited. LaTeX is a much better tool if one needs to work with complicated tables (like cells that […] Read more – ‘Formatting tables in markdown’.
In my last post I sang the praises of markdown as a way to write and collaborate on manuscripts and other scientific documents. As easy as it is to use, the one command-line step is enough of a barrier for most academics. This brought back an old idea that I batted around with a few […] Read more – ‘Thoughts on a preprint server’.
I spent an hour this morning polishing up a proposal. This mostly involved running spell-checks, cleaning up tables, and making sure I added in all the right references. That’s when I realized something. I haven’t used Microsoft Word to write anything in over 6 months. How fantastic! Like everyone else I’ve been complaining about MS […] Read more – ‘How to ditch Word’.
I was fortunate enough to be invited to the PLOS altmetrics workshop held last week in Fort Mason as part of the rOpenSci team. For those of you who haven’t heard of the term altmetrics, it refers to alternative measures of scholarly impact beyond just citations, which can take a very long time before being […] Read more – ‘PLOS Altmetrics workshop’.
When I first started using markdown a couple of years ago, I expected its popularity to be somewhat short-lived and mostly in a blogging/note-taking context. The greatest appeal of markdown is that the learning curve is non-existent, unparsed documents are easily readable (LaTeX, on the other hand, is not), and content can […] Read more – ‘Markdown and the future of collaborative manuscript writing’.
I’ll freely admit that even as a postdoc I suffer from quite a bit of impostor syndrome, more so than when I was a grad student. Although this feeling is widespread among academics, it is not impossible to beat. Looks like everyone has decided to speak out about it this week on the academic blogosphere. […] Read more – ‘Imposter week’.
A few weeks back I gave a talk at the local Berkeley R meetup group. The idea was to help people not make the same mistakes I made when I first started out learning R. It was the first time I made an entire presentation with Deck.js and I generated the syntax-highlighted R code […] Read more – ‘An intro to R’.
I’ve neglected this blog for quite some time but I’m getting around to finishing up a bunch of draft posts. But here is a quick one. Listing objects in your global environment: a simple ls() doesn’t really tell you enough useful information at a glance. Most often I just want to know what I named […] Read more – ‘Two incredibly useful functions to throw into your .rprofile’.
I searched around to see if there was a blog post somewhere describing how to customize one’s .Rprofile but was surprised to find just one outdated post. So here is a quick intro on the topic. If you are a power R user, you already know what it does. For those of you who don’t, […] Read more – ‘Customizing your .rprofile’.
In early May I had the opportunity to attend a workshop on using high-performance computing in R hosted at NIMBioS. I’ve been meaning to write a summary of the meeting ever since but got sidetracked by various other projects. Since a collaborator recently asked for meeting notes, I finally took the time to write […] Read more – ‘HPC for biological research’.
I had a fantastic time at the DataCite summer 2011 meeting: Data and the Scholarly Record: The Changing Landscape [full schedule] that happened right here in Berkeley. In addition to great talks, I was pretty stoked to interact with a diverse group of people (from practicing scientists and data researchers to publishers and repository managers) and also connect with Twitter folks IRL. […] Read more – ‘DataCite 2011, recap’.
As an ecologist working on climate change questions, I’ve always found it rather tedious to acquire and process climate data, especially when dealing with large spatiotemporal scales. Although many agencies provide free access to climate data, there is often some overhead (typically one to two days) before the data are made available for download via […] Read more – ‘Climate datasets in R’.
Lately I’ve come to rely on a whole bunch of “2.0” tools that I now find indispensable. I have tried and given up on many products (e.g. Papers and its iOS app) but below is a list of tools that I find myself using several times each day. I’ve chosen to highlight a few that don’t […] Read more – ‘A roundup of academic workflow tools’.
I’ve been battling memory limits in R for over two years. Although R has numerous resources for high-performance computing, I still couldn’t get around hardware limitations. Things really got out of control last summer when I started analyzing data on how climate change influences population synchrony across large spatiotemporal gradients. My datasets were simply too […] Read more – ‘R + EC2 + RStudio Server’.
Unless you regularly use particular R packages, it becomes difficult to stay on top of updates and bug fixes. Updates usually also include significant improvements in performance. I wrote this short snippet of code which I run about once a month to keep up on updates. This short bit of code will give you a […] Read more – ‘Staying up to date on R packages’.
Thanks for stopping by Inundata. Over the last few years, my work has undergone a major transformation from small-scale single-investigator field projects to large-scale efforts with big heterogeneous datasets and multiple investigators. As part of the process, I’ve learned a variety of new skills to deal with the flood of data. My motivation behind […] Read more – ‘Welcome to Inundata’.