R is designed as a true programming language, with control-flow constructs for iteration and alternation, and users can extend it by defining new functions. For computationally intensive tasks, C, C++ and Fortran code can be linked and called at run time.
NOTE: R itself is not a parallel program. It is single-threaded, so a single R process can use only one processor. Single serial jobs are best run on your desktop machine or on Helix. There are two situations in which it is an advantage to run R on Biowulf:
- if you have a large number of independent R jobs (e.g. processing many independent datasets), you can submit them as a 'swarm' of jobs which can all run simultaneously.
- the Rmpi and snow packages can be used to parallelize R computations.
For basic information about setting up an R job, see the R documentation listed at the end of this page. Also see the Batch Queuing System in the Biowulf user guide.
Create a script such as the following:
script file /home/username/runR
--------------------------------------------------------------------------
#!/bin/tcsh
# This file is runR
#
#PBS -N R
#PBS -m be
#PBS -k oe

date
/usr/local/bin/R --vanilla < /data/username/R/Rtest.r > /data/username/R/Rtest.out
--------------------------------------------------------------------------
Submit the script using the 'qsub' command, e.g.
qsub -l nodes=1 /home/username/runR
The swarm program is a convenient way to submit large numbers of jobs. Create a swarm command file containing a single job on each line, e.g.
swarm command file /home/username/Rjobs
--------------------------------------------------------------------------
/usr/local/bin/R --vanilla < /data/username/R/R1 > /data/username/R/R1.out
/usr/local/bin/R --vanilla < /data/username/R/R2 > /data/username/R/R2.out
/usr/local/bin/R --vanilla < /data/username/R/R3 > /data/username/R/R3.out
/usr/local/bin/R --vanilla < /data/username/R/R4 > /data/username/R/R4.out
/usr/local/bin/R --vanilla < /data/username/R/R5 > /data/username/R/R5.out
....
--------------------------------------------------------------------------
swarm -f /home/username/Rjobs

Swarm will create the PBS batch scripts and submit the jobs to the system. See the Swarm documentation for more information.
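When there are many input files, the swarm command file itself can be generated with a short shell loop rather than written by hand. The sketch below writes one R invocation per line; the R1..R20 input-file naming scheme and the paths are assumptions for illustration, so adjust them to match your own data layout (in real use you would write the file under your home directory, e.g. /home/username/Rjobs).

```shell
#!/bin/sh
# Sketch only: build a swarm command file with one R job per line.
# The dataset names R1..R20 and the /data/username/R paths are
# illustrative assumptions, not a fixed convention.
jobfile=Rjobs
: > "$jobfile"                  # truncate/create the command file
for i in $(seq 1 20); do
    echo "/usr/local/bin/R --vanilla < /data/username/R/R$i > /data/username/R/R$i.out" >> "$jobfile"
done
wc -l < "$jobfile"              # one job per line; here, 20 jobs
```

The resulting file can then be handed to swarm with `swarm -f Rjobs`.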
Rmpi is an R wrapper around the LAM implementation of MPI. [Rmpi documentation]
The snow package (Simple Network of Workstations) implements a simple mechanism for using a workstation cluster for "embarrassingly parallel" computations in R. [snow documentation]
Users who wish to use Rmpi and snow will need to add the LAM directory to the PATH in their .cshrc or .bashrc files, as below:
setenv PATH /usr/local/etc:/usr/local/lam/bin:$PATH    (for csh or tcsh)
PATH=/usr/local/lam/bin:$PATH; export PATH             (for bash)
To run Rmpi on multiple nodes, LAM must be started on those nodes with the lamboot command before Rmpi is loaded. Any spawned Rmpi slaves must be shut down with mpi.close.Rslaves() or mpi.quit() before exiting R, and lamhalt must be run to shut down LAM before exiting the batch job.
Sample Rmpi batch script:
------- this file is myscript.bat --------------------------
#!/bin/csh
#PBS -j oe

cd $PBS_O_WORKDIR
lamboot $PBS_NODEFILE
/usr/local/bin/R --vanilla > myrmpi.out <<EOF
library(Rmpi)
mpi.spawn.Rslaves(nslaves=$np)
mpi.remote.exec(mpi.get.processor.name())
n <- 3
mpi.remote.exec(double, n)
mpi.quit()
EOF
lamhalt
--------------------------------------------------------------
Sample batch script using snow:
------- this file is myscript.bat --------------------------
#!/bin/csh
#PBS -j oe

cd $PBS_O_WORKDIR
lamboot $PBS_NODEFILE
/usr/local/bin/R --vanilla > myrmpi.out <<EOF
library(snow)
cl <- makeCluster($np, type = "MPI")
clusterCall(cl, function() Sys.info()[c("nodename","machine")])
clusterCall(cl, runif, $np)
stopCluster(cl)
mpi.quit()
EOF
lamhalt
--------------------------------------------------------------
Either of the above scripts could be submitted with:
qsub -v np=4 -l nodes=2 myscript.bat

Note that it is entirely up to the user to run the appropriate number of processes for the nodes requested. In the example above, the np variable is set to 4 and exported via the qsub command's -v option, and the script uses it to start 4 worker processes on 2 dual-CPU nodes.
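Instead of hard-coding np on the qsub command line, the process count can be derived inside the batch script from the node file that PBS provides. This is a sketch under the assumption that $PBS_NODEFILE contains one line per allocated CPU (configurations vary, so verify this on your system); since the node file only exists inside a batch job, a fake node file stands in for it here.

```shell
#!/bin/sh
# Sketch: count node-file lines to size the MPI job, rather than
# passing np by hand.  "fake_nodefile" is a stand-in for $PBS_NODEFILE,
# which is only defined inside a running batch job.
nodefile=fake_nodefile
printf 'p227\np227\np228\np228\n' > "$nodefile"   # 2 dual-CPU nodes
np=$(wc -l < "$nodefile")
echo "$np"                                        # prints 4
```

Inside a real script you would use `np=$(wc -l < $PBS_NODEFILE)` and then pass $np to mpi.spawn.Rslaves() or makeCluster() as in the examples above.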
Production runs should be submitted as batch jobs, as above, but an occasional interactive run may be useful for testing.
Sample interactive session with Rmpi (user input follows the shell and R prompts):
[user@biowulf ~]$ qsub -I -l nodes=2
qsub: waiting for job 136623.biobos to start
qsub: job 136623.biobos ready

[user@p227 ~]$ lamboot $PBS_NODEFILE

LAM 7.1.2/MPI 2 C++/ROMIO - Indiana University

[user@p227 ~]$ lamexec C hostname    # just checking hostnames
p227
p228

[user@p227 ~]$ R

R : Copyright 2006, The R Foundation for Statistical Computing
Version 2.3.1 (2006-06-01)
ISBN 3-900051-07-0

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

[Previously saved workspace restored]

> library(Rmpi)
> mpi.spawn.Rslaves(nslaves=4)
        4 slaves are spawned successfully. 0 failed.
master (rank 0, comm 1) of size 5 is running on: p227
slave1 (rank 1, comm 1) of size 5 is running on: p227
slave2 (rank 2, comm 1) of size 5 is running on: p228
slave3 (rank 3, comm 1) of size 5 is running on: p227
slave4 (rank 4, comm 1) of size 5 is running on: p228
> demo("simplePI")

        demo(simplePI)
        ---- ~~~~~~~~

Type  <Return>  to start :

> simple.pi <- function(n, comm = 1) {
      mpi.bcast.cmd(n <- mpi.bcast(integer(1), type = 1, comm = .comm),
          comm = comm)
      mpi.bcast(as.integer(n), type = 1, comm = comm)
      mpi.bcast.cmd(id <- mpi.comm.rank(.comm), comm = comm)
      mpi.bc .... [TRUNCATED]

> simple.pi(100000)
[1] 3.141593
> mpi.quit()    # very important

[user@p227 ~]$ lamhalt

LAM 7.1.2/MPI 2 C++/ROMIO - Indiana University

[user@p227 ~]$ exit
logout

qsub: job 136623.biobos completed
Sample interactive session with snow (user input follows the shell and R prompts):
[user@biowulf ~]$ qsub -I -l nodes=2
qsub: waiting for job 136706.biobos to start
qsub: job 136706.biobos ready

[user@p227 ~]$ lamboot $PBS_NODEFILE

LAM 7.1.2/MPI 2 C++/ROMIO - Indiana University

[user@p227 ~]$ R

R : Copyright 2006, The R Foundation for Statistical Computing
Version 2.3.1 (2006-06-01)
ISBN 3-900051-07-0

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

[Previously saved workspace restored]

> library(snow)
> cl <- makeCluster(4, type = "MPI")
Loading required package: Rmpi
        4 slaves are spawned successfully. 0 failed.
> clusterCall(cl, function() Sys.info()[c("nodename","machine")])
[[1]]
nodename  machine
  "p227" "x86_64"

[[2]]
nodename  machine
  "p228" "x86_64"

[[3]]
nodename  machine
  "p227" "x86_64"

[[4]]
nodename  machine
  "p228" "x86_64"

> sum(parApply(cl, matrix(1:100,10), 1, sum))
[1] 5050
> system.time(unlist(clusterApply(cl, splitList(x, length(cl)),
+     qbeta, 3.5, 4.1)))
[1] 0.017 0.000 0.022 0.000 0.000
> clusterCall(cl, runif, 3)
[[1]]
[1] 0.01032138 0.62865716 0.62550058

[[2]]
[1] 0.01032138 0.62865716 0.62550058

[[3]]
[1] 0.01032138 0.62865716 0.62550058

[[4]]
[1] 0.01032138 0.62865716 0.62550058

> stopCluster(cl)
[1] 1
> mpi.quit()

[user@p227 ~]$ lamhalt

LAM 7.1.2/MPI 2 C++/ROMIO - Indiana University

[user@p227 ~]$ exit
logout
qsub: job 136706.biobos completed
- The R Homepage
- R manuals (web)
- PDF manuals including An Introduction to R.
- The R FAQ