# 7.1. Datarmor use

In order to use the osmose and calibrar R packages on Datarmor, the first step is to load the R module as follows:

module load R


The following modules will be loaded:

  Currently Loaded Modulefiles:
1) nco/4.7.1_conda            4) impi/2017.2.174
2) intel-cc-17/17.0.2.174     5) intel-cmkl-17/17.0.2.174
3) java/1.8.0                 6) R/3.4.3-intel-17.0.2.174


Warning

The nco module must be loaded in order to use the R ncdf4 library.
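The warning above can be turned into a quick check before starting R. The following is a minimal bash sketch, assuming that Datarmor's Environment Modules installation maintains the usual colon-separated `LOADEDMODULES` variable; the `module_loaded` helper is hypothetical and not part of any Datarmor tooling.

```shell
# Hypothetical helper (not a Datarmor tool): succeed if a module named "$1"
# (with or without a version suffix such as "nco/4.7.1_conda") is listed in
# LOADEDMODULES. Assumption: Environment Modules keeps this variable as a
# colon-separated list of the currently loaded modules.
module_loaded() {
    local name="$1" entry
    local IFS=':'
    for entry in ${LOADEDMODULES:-}; do
        case "$entry" in
            "$name"|"$name"/*) return 0 ;;
        esac
    done
    return 1
}

# Example use:
#   module_loaded nco || echo "nco is not loaded: ncdf4 may fail to load" >&2
```

CSH users would need an equivalent test written in CSH syntax.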

The second step is to define where the libraries are located. To avoid multiple copies, one option is to use the R libraries that have already been built in Nicolas Barrier’s home directory. This is done as follows:

# CSH users
setenv R_LIBS_USER /home1/datahome/nbarrier/libs/R/lib

# BASH/SH users
export R_LIBS_USER=/home1/datahome/nbarrier/libs/R/lib
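For BASH users, a slightly more defensive variant is sketched below. It only sets the variable if the directory is actually readable from the current node, and it keeps any library path that is already defined. The `add_r_lib` function name is hypothetical; the only assumption about R is that `R_LIBS_USER` is a colon-separated list of directories.

```shell
# Hypothetical helper: prepend a directory to R_LIBS_USER, but only if the
# directory exists and is not already listed.
add_r_lib() {
    local dir="$1"
    if [ ! -d "$dir" ]; then
        echo "warning: R library directory not found: $dir" >&2
        return 1
    fi
    case ":${R_LIBS_USER:-}:" in
        *":$dir:"*) ;;  # already listed, nothing to do
        *) R_LIBS_USER="$dir${R_LIBS_USER:+:$R_LIBS_USER}" ;;
    esac
    export R_LIBS_USER
}

# Example use, with the path given above:
#   add_r_lib /home1/datahome/nbarrier/libs/R/lib
```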


To test whether the libraries are found, run R and type:

library("osmose")
library("calibrar")
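The same test can be run non-interactively, for instance at the top of a job script. The sketch below assumes that `Rscript` is on the `PATH` once the R module is loaded; `check_r_packages` is a hypothetical helper name.

```shell
# Hypothetical helper: try to load each named R package with Rscript and
# report whether it was found. Returns non-zero if any package is missing.
check_r_packages() {
    local pkg rc=0
    for pkg in "$@"; do
        if Rscript -e "library(\"$pkg\")" >/dev/null 2>&1; then
            echo "found: $pkg"
        else
            echo "missing: $pkg"
            rc=1
        fi
    done
    return $rc
}

# Example use:
#   check_r_packages osmose calibrar || exit 1
```

A package is also reported as missing when `Rscript` itself cannot be found, so a failure here may simply mean that the R module has not been loaded.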


## 7.1.2. Running parallel R programs

Parallel R programs can be run on Datarmor in several ways (see the examples in /appli/services/exemples/R/).

### 7.1.2.1. Running on multiple nodes (MPI)

To run the calibration on multiple nodes, R must be launched through the RMPISNOW program. This is done with the following PBS file:

#!/bin/csh
#PBS -q mpi_2
#PBS -l select=2:ncpus=28:mpiprocs=14:mem=125g
#PBS -l walltime=24:00:00

cd $PBS_O_WORKDIR
echo $HOST
pbsnodes $HOST

# recovering the number of MPI processes (here, 2 * 14 = 28)
setenv mpiproc `cat $PBS_NODEFILE | wc -l`

source /usr/share/Modules/3.2.10/init/csh

# set the path of the osmose/calibrar libraries
setenv R_LIBS /home1/datahome/nbarrier/libs/R/lib

# Run R in parallel mode.
time mpiexec -np $mpiproc /appli/R/3.3.2-intel-cc-17.0.2.174/lib64/R/library/snow/RMPISNOW --no-save -q < calibrate_MPI.R >& ea.out

Hint

It is possible to use BASH instead of CSH. However, CSH is strongly recommended, since it is the default Datarmor shell.

When using RMPISNOW, the parallel backend that must be used is the doSNOW package. As a consequence, the calibration script must be modified as follows:

# Need to load the doSNOW package, which is
# installed in Nicolas Barrier's home
require("doSNOW")

# call the RMPI/Snow make cluster (note here that there are no arguments!)
cl <- makeCluster()

# call the registerDoSNOW function instead of the registerDoParallel
registerDoSNOW(cl)

# send the variables and loaded libraries defined above to the nodes
clusterExport(cl, c("objfn", "calibData", "calInfo", "observed", "minmaxt"))
clusterEvalQ(cl, library("osmose"))
clusterEvalQ(cl, library("calibrar"))

# run the calibration
cal1 = calibrate(calibData['paropt'], fn=objfn, method='default',
                 lower=calibData['parmin'], upper=calibData['parmax'],
                 phases=calibData['parphase'], control=control, replicates=1)

# stop the cluster
stopCluster(cl)

The main differences between this R script and the one described in Section 6.5.8 are:

• require("doSNOW") instead of require("parallel")
• No arguments in the makeCluster function
• registerDoSNOW instead of registerDoParallel

### 7.1.2.2. Running on a single node

To run the calibration in parallel on a single node (for instance on a Shared Memory machine), the doParallel library is used. In this case, the PBS file is as follows:

#!/bin/csh
#PBS -q omp
#PBS -l select=1:ncpus=28:mem=120g
#PBS -l walltime=24:00:00

cd $PBS_O_WORKDIR
echo $HOST
pbsnodes $HOST

source /usr/share/Modules/3.2.10/init/csh

# set the path of the osmose/calibrar libraries
setenv R_LIBS /home1/datahome/nbarrier/libs/R/lib

# run R in parallel mode
time R --vanilla < calibrate_OMP.R >& ea.out


The calibrate.R script must be modified as follows:

# With OMP, you need to load the doParallel library
require("doParallel")

# Initialisation of the cluster.
# MAKE SURE THAT THE NUMBER OF CORES HERE IS
# CONSISTENT WITH THE NUMBER REQUESTED IN YOUR PBS FILE
cl <- makeCluster(control$nCores)

# register the doParallel backend so that foreach is activated
registerDoParallel(cl)

# send the variables and loaded libraries defined above to the nodes
clusterExport(cl, c("objfn", "calibData", "calInfo", "observed", "minmaxt"))
clusterEvalQ(cl, library("osmose"))
clusterEvalQ(cl, library("calibrar"))

# run the calibration
cal1 = calibrate(calibData['paropt'], fn=objfn, method='default',
                 lower=calibData['parmin'], upper=calibData['parmax'],
                 phases=calibData['parphase'], control=control, replicates=1)

# stop the cluster
stopCluster(cl)

The main difference between this R script and the one described in Section 6.5.8 is:

• require("doParallel") instead of require("parallel")

Danger

The number of cores defined in the .pbs file (ncpus) must be consistent with the value of the control$nCores parameter.
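The consistency requirement between ncpus and control$nCores can be checked before submitting the job. The following bash sketch assumes that the core count appears as `ncpus=<number>` on a `#PBS` line, as in the PBS file of this section, and that the value of control$nCores is supplied by hand. Both function names are hypothetical.

```shell
# Hypothetical helper: print the first ncpus value found on a "#PBS" line
# of the given PBS file (prints nothing if there is none).
pbs_ncpus() {
    sed -n 's/^#PBS.*ncpus=\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# Hypothetical helper: compare the ncpus value of a PBS file with the
# number of cores used in the R script (passed as the second argument).
check_cores() {
    local pbs_file="$1" ncores="$2" ncpus
    ncpus=$(pbs_ncpus "$pbs_file")
    if [ "$ncpus" = "$ncores" ]; then
        echo "ok: ncpus=$ncpus matches nCores=$ncores"
    else
        echo "mismatch: ncpus=${ncpus:-unset}, nCores=$ncores" >&2
        return 1
    fi
}

# Example use (file name is illustrative):
#   check_cores calibrate_OMP.pbs 28 && qsub calibrate_OMP.pbs
```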

### 7.1.2.3. Running on a single core (sequential)

To run R on a single core (i.e. sequentially), the PBS file must be as follows:

#!/bin/csh
#PBS -l walltime=24:00:00
#PBS -l mem=1g

cd $PBS_O_WORKDIR
echo $HOST
pbsnodes $HOST