7.1. Datarmor use

7.1.1. Package loading

To use the osmose and calibrar R packages on Datarmor, the first step is to load the R and nco modules as follows:

module load R
module load nco

The following modules will be loaded:

  Currently Loaded Modulefiles:
1) nco/4.7.1_conda            4) impi/2017.2.174
2) intel-cc-17/17.0.2.174     5) intel-cmkl-17/17.0.2.174
3) java/1.8.0                 6) R/3.4.3-intel-17.0.2.174

Warning

The nco module must be loaded in order to use the R ncdf4 library.
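As a quick sanity check, the module listing can be scanned for the two required entries. The sketch below is illustrative only: has_module is a hypothetical helper (not part of Datarmor or of the module command), and it is demonstrated on a copy of the listing shown above rather than on a live session.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if the module listing read on stdin
# contains an entry whose name starts with "<name>/".
has_module() {
    local name="$1"
    # Entries look like "1) nco/4.7.1_conda"; match the name right after ") ".
    grep -Eq "(^|[[:space:]])[0-9]+\) ${name}/"
}

# Copy of the listing shown above, used here as stand-in input.
listing='Currently Loaded Modulefiles:
1) nco/4.7.1_conda            4) impi/2017.2.174
2) intel-cc-17/17.0.2.174     5) intel-cmkl-17/17.0.2.174
3) java/1.8.0                 6) R/3.4.3-intel-17.0.2.174'

for m in R nco; do
    if printf '%s\n' "$listing" | has_module "$m"; then
        echo "$m: loaded"
    else
        echo "$m: MISSING"
    fi
done
```

In a real session the input would come from the module command itself, for instance module list 2>&1 | has_module nco (the redirection is needed because the listing is written to standard error).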

The second step is to tell R where the libraries are located. To avoid multiple copies, one option is to use the R libraries that have already been built in Nicolas Barrier’s home directory. This is done as follows:

# CSH users
setenv R_LIBS_USER /home1/datahome/nbarrier/libs/R/lib
# BASH/SH users
export R_LIBS_USER=/home1/datahome/nbarrier/libs/R/lib

To test whether the libraries are found, run R and type:

library("osmose")
library("calibrar")
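Before starting R, it can also be useful to confirm from the shell that the library directory actually contains the two packages. The following is a minimal sketch, assuming that an installed R package appears as a sub-directory holding a DESCRIPTION file; check_r_pkgs is a hypothetical helper introduced only for illustration.

```shell
#!/bin/bash
# Hypothetical helper: report whether each named package has a
# DESCRIPTION file under the given R library directory.
# Returns 0 if all packages are found, 1 otherwise.
check_r_pkgs() {
    local lib="$1"; shift
    local pkg status=0
    for pkg in "$@"; do
        if [ -f "$lib/$pkg/DESCRIPTION" ]; then
            echo "$pkg: found"
        else
            echo "$pkg: not found in $lib"
            status=1
        fi
    done
    return $status
}

# Usage on Datarmor, once R_LIBS_USER has been set as described above:
# check_r_pkgs "$R_LIBS_USER" osmose calibrar
```

This only verifies that the files are present; loading the packages with library() in R remains the definitive test.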

7.1.2. Running parallel R programs

Parallel R programs can be run on Datarmor in several ways (see the examples in /appli/services/exemples/R/).

7.1.2.1. Running on multiple nodes (MPI)

To run the calibration on multiple nodes, R must be launched through the RMPISNOW program. This is done with the following PBS file:

#!/bin/csh
#PBS -q mpi_2
#PBS -l select=2:ncpus=28:mpiprocs=14:mem=125g
#PBS -l walltime=24:00:00

cd $PBS_O_WORKDIR
echo $HOST
pbsnodes $HOST

# recovering the number of MPI processes (here, 2 * 14 = 28)
setenv mpiproc `cat $PBS_NODEFILE  |wc -l`

# load the R libraries
source /usr/share/Modules/3.2.10/init/csh
module load R
module load nco

# set the path of the osmose/calibrar libraries
setenv R_LIBS /home1/datahome/nbarrier/libs/R/lib

# Run R in parallel mode.
time mpiexec -np $mpiproc /appli/R/3.3.2-intel-cc-17.0.2.174/lib64/R/library/snow/RMPISNOW --no-save -q < calibrate_MPI.R >& ea.out

Hint

It is possible to use BASH instead of CSH. However, it is highly advised to use CSH, since it is the default Datarmor shell.
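For readers who nevertheless choose BASH, the step that recovers the number of MPI processes could be written as sketched below. This is an illustration under stated assumptions, not a tested Datarmor job script: count_mpi_procs is a hypothetical helper, and a temporary stand-in file with made-up node names replaces $PBS_NODEFILE so that the snippet can be run outside of a PBS job.

```shell
#!/bin/bash
# Hypothetical helper: count the MPI processes listed in a PBS node file
# (one line per process), stripping any padding that wc may add.
count_mpi_procs() {
    wc -l < "$1" | tr -d '[:space:]'
}

# Stand-in node file for illustration only:
# 2 made-up node names x 14 processes each, as in select=2:mpiprocs=14.
nodefile=$(mktemp)
for node in node001 node002; do
    for i in $(seq 1 14); do echo "$node"; done
done > "$nodefile"

# BASH counterpart of: setenv mpiproc `cat $PBS_NODEFILE | wc -l`
mpiproc=$(count_mpi_procs "$nodefile")
export mpiproc
echo "mpiproc=$mpiproc"
rm -f "$nodefile"
```

Inside a real job, the stand-in file would be replaced by "$PBS_NODEFILE"; the csh-specific lines of the PBS file (the setenv calls and the sourced Modules init file) would also need BASH equivalents.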

When using RMPISNOW, the parallel library that must be used is the doSNOW package. As a consequence, the calibration script must be modified as follows:

# Need to load the doSNOW package, which is 
# installed in Nicolas Barrier's home
require("doSNOW")

# call the RMPI/Snow make cluster (note here that there are no arguments!)
cl <- makeCluster()

# call the registerDoSNOW function instead of the registerDoParallel
registerDoSNOW(cl)

# send the variables and libraries defined above to the nodes
clusterExport(cl, c("objfn", "calibData", "calInfo", "observed", "minmaxt"))
clusterEvalQ(cl, library("osmose"))
clusterEvalQ(cl, library("calibrar"))

# run the calibration
cal1 = calibrate(calibData['paropt'], fn=objfn, method='default',
                 lower=calibData['parmin'], upper=calibData['parmax'], 
                 phases=calibData['parphase'], control=control, replicates=1)

# stop the cluster
stopCluster(cl)

The main differences between this R script and the one described in Section 6.5.8 are:

  • require("doSNOW") instead of require("parallel")

  • No arguments in the makeCluster function

  • registerDoSNOW instead of registerDoParallel

7.1.2.2. Running on a single node

To run the calibration in parallel on a single node (for instance on a Shared Memory machine), the doParallel library is used. In this case, the PBS file is as follows:

#!/bin/csh
#PBS -q omp
#PBS -l select=1:ncpus=28:mem=120g
#PBS -l walltime=24:00:00

cd $PBS_O_WORKDIR
echo $HOST
pbsnodes $HOST

# load the R libraries
source /usr/share/Modules/3.2.10/init/csh
module load R
module load nco

# set the path of the osmose/calibrar libraries
setenv R_LIBS /home1/datahome/nbarrier/libs/R/lib

# run R in parallel mode
time R --vanilla < calibrate_OMP.R >& ea.out

The calibration script (calibrate_OMP.R in the PBS file above) must be modified as follows:

# With OMP, you need to load the doParallel library
require("doParallel")

# Initialisation of the cluster.
# BE SURE THAT THE NUMBER OF CORES HERE IS
# CONSISTENT WITH THE ncpus VALUE IN YOUR PBS FILE
cl <- makeCluster(control$nCores)

# register the doParallel backend so that foreach runs in parallel
registerDoParallel(cl)

# send the variables and libraries defined above to the nodes
clusterExport(cl, c("objfn", "calibData", "calInfo", "observed", "minmaxt"))
clusterEvalQ(cl, library("osmose"))
clusterEvalQ(cl, library("calibrar"))

# run the calibration
cal1 = calibrate(calibData['paropt'], fn=objfn, method='default',
                 lower=calibData['parmin'], upper=calibData['parmax'], 
                 phases=calibData['parphase'], control=control, replicates=1)

# stop the cluster
stopCluster(cl)

The main difference between this R script and the one described in Section 6.5.8 is:

  • require("doParallel") instead of require("parallel")

Danger

The number of cores defined in the .pbs file (ncpus) must be consistent with the value in the control$nCores parameter.
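One way to guard against such a mismatch is to extract ncpus from the PBS file and compare it with the number of cores used in the R script. The sketch below is only an illustration: pbs_ncpus is a hypothetical helper, the PBS header is a copy of the single-node example above written to a temporary file, and the value 28 for control$nCores is assumed.

```shell
#!/bin/bash
# Hypothetical helper: print the ncpus value found on the first
# "#PBS -l select=...:ncpus=N" line of a PBS file.
pbs_ncpus() {
    sed -n 's/^#PBS -l select=.*:ncpus=\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# Stand-in PBS header, copied from the single-node example above.
pbsfile=$(mktemp)
cat > "$pbsfile" <<'EOF'
#!/bin/csh
#PBS -q omp
#PBS -l select=1:ncpus=28:mem=120g
#PBS -l walltime=24:00:00
EOF

# Assumed value of control$nCores in the R script.
ncores=28

if [ "$(pbs_ncpus "$pbsfile")" = "$ncores" ]; then
    echo "consistent: ncpus=$ncores"
else
    echo "mismatch: PBS ncpus=$(pbs_ncpus "$pbsfile"), control\$nCores=$ncores"
fi
rm -f "$pbsfile"
```

The same helper also works on the multi-node header, since it only looks at the ncpus field of the select line.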

7.1.2.3. Running on a single core (sequential)

To run R on a single core (i.e. sequentially), the PBS file must be as follows:

#!/bin/csh
#PBS -l walltime=24:00:00
#PBS -l mem=1g

cd $PBS_O_WORKDIR
echo $HOST
pbsnodes $HOST

# load the R libraries
source /usr/share/Modules/3.2.10/init/csh
module load R
module load nco

# set the path of the osmose/calibrar libraries
setenv R_LIBS /home1/datahome/nbarrier/libs/R/lib

# Run R in sequential mode.
time R --vanilla < calibrate_SEQ.R >& ea.out

In this case, no parallel libraries need to be loaded. Hence, the script described in Section 6.5 can be used without any modifications.