Tips on using R with Ubuntu

Installing R

R is available in the Ubuntu repositories. You can install it by choosing r-base in Synaptic (System -> Administration -> Synaptic Package Manager). In practice, you will almost certainly want to compile add-on R packages locally, so install r-base-dev instead.

The version in Ubuntu generally goes out of date rather quickly, so you will want to add the CRAN repository to stay current. You can do this using the Ubuntu menus. Open System -> Administration -> Software sources and choose the Third-Party Software tab. Click the Add... button and enter:

deb http://cran.stat.ucla.edu/bin/linux/ubuntu hardy/

Note that you need to replace hardy with the name of your release (e.g., gutsy if you are using Gutsy Gibbon). Also, note that cran.stat.ucla.edu is simply one of many repository mirrors. You may find better performance if you replace that with a more local mirror. (Also, be sure to enable backports in the Software sources dialog. Otherwise, you may get unresolved dependencies. It is a check box under the Updates tab.)
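
If you prefer the command line, the repository line can be built from the release codename rather than typed by hand. The following is only a sketch: the function name cran_deb_line is made up for illustration, and the default codename assumes the lsb_release tool is installed.

```shell
#!/bin/sh
# Hypothetical helper: print the apt source line for a CRAN mirror.
# $1 = mirror host (default: the UCLA mirror used above)
# $2 = release codename (default: whatever lsb_release reports)
cran_deb_line() {
    mirror="${1:-cran.stat.ucla.edu}"
    codename="${2:-$(lsb_release -cs)}"
    printf 'deb http://%s/bin/linux/ubuntu %s/\n' "$mirror" "$codename"
}
```

For example, cran_deb_line cran.stat.ucla.edu hardy prints the line shown above. The output could then be appended to /etc/apt/sources.list (for instance with sudo tee -a) and followed by sudo apt-get update, which should have the same effect as the Software sources dialog.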

The packages on the CRAN repository are digitally signed by the packager to ensure that no one is feeding you a trojan horse. It will make installation smoother if you download and install the corresponding public key. Save the text of the key in a file on your desktop. Then run the Software sources application again and choose the Authentication tab. Click on the Import Key File... button and load in the public key. After that, Ubuntu will not prompt you for confirmation every time you try to install a CRAN package.

Installing R packages

One thing you may notice is that some R packages are already compiled and packaged for Ubuntu (their names all begin with r-cran-). You can install these using Synaptic. I recommend, however, that you do not install these packages; instead, install packages directly in R. To install packages system-wide, run (from the bash prompt):

sudo R

then type

install.packages('pkgname', dep = TRUE)

You need to have installed r-base-dev for this to work in Ubuntu. Note that 'pkgname' must be the name of a package on CRAN (and you must keep the quotes). If you want to update your packages, type:

update.packages(ask = FALSE)

Definitely update packages every few months.

If you want to install packages into your home account try (from the bash prompt):

mkdir -p ~/lib/R

and then run R. In R, type

install.packages('pkgname', lib = '~/lib/R', dep = TRUE)

Note that in R, you will need to specify the library location again when loading the package:

library('pkgname', lib.loc = '~/lib/R')

There is a way to add the local path permanently to your R package search path; type ?Renviron in R for more information.

If you have a downloaded package file (i.e., from a package not on CRAN), you can install it from the bash prompt with:

R CMD INSTALL somepackage.tgz

or

R CMD INSTALL -l ~/lib/R somepackage.tgz

if you want to install locally in your home account.
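
Returning to the permanent search path mentioned earlier: one way to do it, sketched here under assumptions, is to set R_LIBS_USER in ~/.Renviron so that R picks up ~/lib/R at startup. The function name add_r_lib and its two arguments are illustrative only; check ?Renviron and ?.libPaths on your R version before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: record a personal library directory in an Renviron file.
# $1 = library directory (default: ~/lib/R, as used above)
# $2 = Renviron file to edit (default: ~/.Renviron)
add_r_lib() {
    libdir="${1:-$HOME/lib/R}"
    envfile="${2:-$HOME/.Renviron}"
    mkdir -p "$libdir"    # R skips library paths that do not exist
    line="R_LIBS_USER=$libdir"
    # Append only if the exact line is not already present, so reruns are harmless.
    grep -qxF "$line" "$envfile" 2>/dev/null || printf '%s\n' "$line" >> "$envfile"
}
```

After running add_r_lib with no arguments and restarting R, .libPaths() should list ~/lib/R, and the lib argument should no longer be needed for install.packages or library.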

Performance issues

R requires a lot of RAM, so running it on a machine with lots of memory (e.g., papua) is recommended.

You can get better performance from R on Ubuntu by installing the ATLAS libraries. Use Synaptic to install libatlas3gf-sse2 (this name might change in later versions of Ubuntu; search for libatlas in Synaptic). First check that your machine supports SSE2 (if not, choose a different ATLAS library). From the bash prompt, try:

grep sse2 /proc/cpuinfo

If it returns some text, then your CPU supports SSE2.
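
The same check can be wrapped in a small function that prints a package name to install. This is a sketch only: the name suggest_atlas is invented here, and falling back to libatlas3gf-base when SSE2 is missing is an assumption, not something confirmed by the package documentation.

```shell
#!/bin/sh
# Hypothetical helper: suggest an ATLAS package based on CPU flags.
# $1 = cpuinfo file to inspect (default: /proc/cpuinfo)
suggest_atlas() {
    cpuinfo="${1:-/proc/cpuinfo}"
    # -w matches sse2 as a whole word; -q suppresses grep's own output.
    if grep -qw sse2 "$cpuinfo"; then
        echo libatlas3gf-sse2
    else
        echo libatlas3gf-base    # assumed fallback, see the note above
    fi
}
```

Taking the file as an argument makes it possible to try the function on a saved copy of /proc/cpuinfo from another machine.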

Which version of libatlas you should install depends on your system architecture. It is not clear which to install for more recent 64-bit CPUs (ours are a bit old, but watch this space...). It has been suggested that libatlas3gf-base is the one for 64-bit machines. Not much documentation is available; you might have a look in /usr/share/doc/libatlas3gf-<which one you installed>.

I did a quick test just now in R. This was run in Ubuntu 8.04 in a Parallels virtual machine on my MacBook (but the results should not be much different on other installs).

First install R without libatlas installed. Then run R and type

> mm <- matrix(rnorm(10^6), 10^3)

> system.time(crossprod(mm))

Repeat this after installing libatlas3gf-base and then libatlas3gf-sse2. I got a good speedup each time (you might need to uninstall libatlas3gf-base before trying libatlas3gf-sse2).
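
A single timing can be noisy, so it may help to repeat the measurement a few times per configuration. The sketch below writes a small R script that times crossprod five times and reports the median. The function name write_bench, the file name, and the choice of five runs are all arbitrary choices made for this example.

```shell
#!/bin/sh
# Hypothetical helper: write a small R timing script to the given path.
# $1 = output file (default: ./crossprod-bench.R)
write_bench() {
    out="${1:-crossprod-bench.R}"
    cat > "$out" <<'EOF'
# Time crossprod() on a 1000 x 1000 random matrix, five times.
mm <- matrix(rnorm(10^6), 10^3)
times <- replicate(5, system.time(crossprod(mm))[["elapsed"]])
cat("median elapsed seconds:", median(times), "\n")
EOF
}
```

Run write_bench once, then run Rscript crossprod-bench.R after each change of ATLAS package and compare the medians.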

It would be nice to have a comprehensive set of R benchmarks for testing!