R + EC2 + RStudio Server
by Karthik Ram.
I’ve been battling memory limits in R for over two years. Although R has numerous resources for high-performance computing, I still couldn’t get around hardware limitations. Things really got out of control last summer when I started analyzing data on how climate change influences population synchrony across large spatiotemporal gradients. My datasets were simply too numerous and too large, and neither code finessing nor heavy use of Hadley’s approach helped much.
Initially I was turned off by the learning curve associated with the ins and outs of setting up R on EC2, but eventually I set up my own Ubuntu box with R, all of my packages and customizations, and saved that as a 64-bit AMI capable of running High-Memory Quadruple Extra Large instances. This setup has worked really well for me over the last few months.
With the recent release of RStudio and RStudio Server, I’ve been toying with the idea of running it on an EBS-backed instance. Inspired by JD’s tweet, I got around to setting mine up this weekend. Here is a quick walkthrough.
Assuming you’ve launched and used EC2 services before, start by launching an instance with a recent version of Ubuntu (I’m running 10.04 Lucid) and installing the current release of R (2.13).
Next, install RStudio Server by following the instructions here (be sure to follow the 64-bit instructions).
Once successfully installed, create a new user like so:
sudo adduser username
At this point, be sure to change your EC2 security group to allow inbound TCP traffic on port 8787, the default port for RStudio Server.
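For readers who prefer the command line to the EC2 Dashboard, the same rule can be sketched with the current AWS CLI. This is not the method used in this post, and the security group ID and IP range below are placeholders (assumptions) that you would replace with your own. The snippet only builds and prints the command so it can be checked before anything is changed.

```shell
# Placeholder values (assumptions): substitute your own group ID and IP range.
GROUP_ID="sg-0123456789abcdef0"
MY_CIDR="203.0.113.7/32"   # restrict access to a single address where possible
RSTUDIO_PORT=8787          # port named in the post

# Build the command as a string so it can be inspected before it is run.
cmd="aws ec2 authorize-security-group-ingress --group-id $GROUP_ID --protocol tcp --port $RSTUDIO_PORT --cidr $MY_CIDR"
echo "$cmd"

# Once the values are correct, run it with: eval "$cmd"
```

Limiting the rule to your own address rather than opening the port to everyone is a cautious default, since the login page is otherwise reachable from anywhere.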
If the instructions so far seem complicated or if you’d rather not start from scratch, you can follow the instructions here to launch an existing AMI with RStudio Server-compatible versions and take it from there.
Next, open RStudio in your browser using your instance’s public DNS, like so:
http://ec2-75-102-193-170.compute-1.amazonaws.com:8787
(be sure to replace the DNS above with your current DNS from the EC2 Dashboard)
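Since the DNS name changes each time an instance is launched, it can help to assemble the address from a variable. A minimal sketch, using the DNS name from this post as a stand-in; the variable names are made up for illustration.

```shell
# Public DNS copied from the EC2 Dashboard (stand-in value from the post).
EC2_DNS="ec2-75-102-193-170.compute-1.amazonaws.com"
RSTUDIO_PORT=8787

# RStudio Server is reached over plain HTTP on that port.
RSTUDIO_URL="http://${EC2_DNS}:${RSTUDIO_PORT}"
echo "$RSTUDIO_URL"
```

Paste the printed address into your browser. If the page does not load, the security group rule for port 8787 is the first thing to check.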
Next, log in with the username and password set earlier. If everything worked, you should see the RStudio interface load in your browser.
Next, install all the packages you would like. If you require Java-backed packages such as glmulti, go ahead and set up Java from the terminal.
After that, you can easily (using GUI menus) save this customized instance by following instructions here. Voilà. From now on, whenever you need to run a high-memory instance of R, just launch a new instance, choose My AMIs, and once it has launched, connect to it via the browser using the current DNS. Brilliant!
How much memory are you using on it?
For the most memory-intensive runs I use the m2.4xlarge instance, which has 8 cores (26 ECUs) and 68.4 GB of RAM. For most routine stuff I pick smaller instances.
Interesting. Does that mean that you do a lot of parallel processing? Do you use any special programs for that?
Yeah. R can natively take advantage of the extra RAM, but you will need to explicitly parallelize your code to take advantage of all the extra cores. The quickest way to do this is with the multicore package and Revolution Analytics’ foreach.
For more on high-performance computing with R, see some excellent presentations by Dirk Eddelbuettel here.
Thanks for the interesting post. I know it depends on a lot of factors, but can you give any kind of numbers on how much your typical analysis ends up costing you?
Sure. I tend to use spot instances, which allow me to bid at a lower price than the going rate. My last simulation lasted six hours and ran me $5.94 (I bid $0.99 for an instance that typically costs $2.80). So if you need to run intensive analyses on an infrequent basis, it works out to be fairly cheap. But if you end up using it a lot, these small day-to-day costs tend to add up.
What is the ID of your AMI, and how do I find it in the list?
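The arithmetic behind those numbers can be sketched as follows. It assumes the $0.99 and $2.80 figures are hourly rates and that the bill is simply rate times hours, which matches the $5.94 total quoted above. Actual spot billing may differ.

```shell
hours=6
spot_rate=0.99        # hourly bid quoted in the comment
on_demand_rate=2.80   # typical hourly price quoted in the comment

# awk handles the decimal arithmetic that plain shell cannot.
spot_cost=$(awk -v h="$hours" -v r="$spot_rate" 'BEGIN { printf "%.2f", h * r }')
on_demand_cost=$(awk -v h="$hours" -v r="$on_demand_rate" 'BEGIN { printf "%.2f", h * r }')

echo "spot: \$${spot_cost}  on-demand: \$${on_demand_cost}"
```

Under these assumptions the same six-hour run would cost $16.80 at the typical rate, so the spot bid saves a little under two thirds.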
My AMI is currently private, but I will post an update when I am able to make it public. For now, search for e42cdb8d, which is publicly available and compatible with RStudio Server.
Nice post. Quick question: how did you make an AMI of your Ubuntu install? I’ve been looking but there doesn’t seem to be an easy straightforward way to do this.
If you want to save it as an EBS-backed instance, the last link in the post (here) should do it for you.
To save a regular AMI, these instructions worked really well for me. One thing to be careful about is being clear about the bucket region (step #8) because that cannot be changed post hoc.
Hi Karthik,
Nice post. Can you guide me on how to do this on the Windows platform?
Unfortunately I have no experience running Windows in the cloud.
And what is the difference in computation speed when using R on the same computer a) remotely through Radmin or TeamViewer versus b) through RStudio Server?
Just stumbled across this page. I’ve been maintaining AMIs specifically for RStudio Server for a few months now. Details are at http://www.louisaslett.com/RStudio_AMI/
Hope that helps.
Thanks Louis! I’m sure others who stumble upon this post will find it useful.