6.5. Create the calibration script

The next step is to write the calibration script, which is the main script to execute to run the calibration.

The sample script used to calibrate the [HLS+16] configuration, provide in the osmose package. is detailed below.

Danger

All the files and script provided in the package are built for the calibration of the reference configuration. They should never be used to calibrate another configuration.

6.5.1. Library and function loading

Load the libraries and source the runModel.R file (which must be written by the user, see Section 6.6).

require("osmose")
require("calibrar")

# Recovering of the runModel function associated with the reference configuration
filename = system.file(package="osmose", "extdata", "calib", "runModel.R")
source(filename)

6.5.2. Define your likelyfood functions

The user has the possibility to define its own likelyhood function. For instance:

# creates a user defined likelyhood function
minmaxt = function(obs, sim) {
    output = -1e+6*sum(log(pmin((sim+1)/(obs[, 1]+1), 1)), na.rm=TRUE) + 
              1e+6*sum(log(pmax((sim+1)/(obs[, 2]+1), 1)), na.rm=TRUE)
    return(output)
}

Warning

The function name must match the name of the type column in the calibration settings (see Table 6.4)

6.5.3. Read calibration information

The next step is to read the calibration informations (see Table 6.4). This is done by using the calibrar::getCalibrationInfo function.

# reads calibration informations
calib_path = system.file(package="osmose", "extdata", "calib")
calib_file = "calibration_settings.csv"
calInfo = getCalibrationInfo(path=calib_path, file=calib_file)

6.5.4. Read observation datasets

The next step is to load the observation datasets by using the calibrar::getObservedData function.

# loads the observed data (path is the data directory path, 
# data.folder is the data directory name)
observed = getObservedData(calInfo, path=calib_path, data.folder="DATA")

6.5.5. Load the calibrated parameters

Load the calibrated parameters (see Table 6.5).

In this example, the reading of the parameters is done as follows:

# load calibration parameters
param_to_calib = system.file(package="osmose", "extdata", "calib", "parameters_to_calib.csv")
calibData = read.csv(file=param_to_calib, 
                     header=TRUE, 
                     sep=",", 
                     row.names=1)

Warning

Since the format of this file is free, the user must insure that it is properly read.

6.5.6. Create objective function

The next step is to create the objective function, by using the calibrar::createObjectiveFunction function. It’s arguments are:

  • runModel: the function that is used to run the model (which must be created by the user, see Section 6.6)
  • info: the calibration information (output of the calibrar::getCalibrationInfo function)
  • observed: the observed datasets (output of the calibrar::getObservedData function)
  • aggregate: if TRUE, a scalar value is returned by using the aggFn.
  • aggFn: the aggregation function. Default is a weighted sum (the weights being provided in Table 6.4).
  • the last arguments are the additional arguments of the runModel function (in this case, the names arguments)
# create an objective function
# additional arguments to the runModel function
# are provided here.
objfn = createObjectiveFunction(runModel=runModel, 
                                info=calInfo, 
                                observed=observed, 
                                aggFn=calibrar:::.weighted.sum,
                                aggregate=FALSE,
                                names=row.names(calibData))

6.5.7. Run the calibration

Finally, the calibration is run by using the calibrate function.

control = list()
# control$maxgen = c(2, 2, 2, 2)   # maximum number of generations (former gen.max parameter)
control$maxgen = 2   # maximum number of generations (former gen.max parameter)
control$master = system.file(package="osmose", "extdata", "master")   # directory that will be copied
control$run = "RUN"   # run directory
control$restart.file = "calib_restart"   # name of the restart file
control$REPORT = 1    # number of generations to run before saving a restart
# control$parallel = TRUE
# control$nCores = 2
control$popsize = 15   # population  size (former seed parameter)

# cl = makeCluster(control$nCores)
# registerDoParallel(cl)

# send the variables and loaded libraries defined in the above to the nodes
# clusterExport(cl, c("objfn", "calibData", "calInfo", "observed", "minmaxt"))
# clusterEvalQ(cl, library("osmose"))
# clusterEvalQ(cl, library("calibrar"))

cal1 = calibrate(calibData['paropt'], fn=objfn, method='default',
                 lower=calibData['parmin'], upper=calibData['parmax'], 
                 phases=calibData['parphase'], replicates=1, control=control)

# stopCluster(cl)

The arguments are:

  • the parameter initial guess
  • fn: the objective function
  • method: the optimiziation method. The default method is the Adaptative Hierarchical Recombination Evolutionary Strategy (AHR-ES)
  • lower: the lower parameter bounds
  • upper: the upper parameter bounds
  • phases: the calibration phase at which the parameters are estimated.
  • replicates: the number of replicates (i.e the number of times the runModel function will be called).
  • control: additional arguments stored in a list (see Table 6.6).
Table 6.6 Elements that can be put in the control argument list
Option Description
maxit Maximum number of executions of the objective function.
maxgen Maximum number of generations
popsize Population size
parallel Boolean, TRUE in order to activate the parallel execution of the optimization.
ncores The number of cores available in the parallel cluster for the active session. If parallel=TRUE, the default is to get the number of cores of the system.
run An optional folder path to create all the temporary folders needed to run the simulations for each parameter combination tested by the optimization algorithm. The folders are recycled every generation
master An optional folder path. All the contents of the designated folder will be copied to each temporary folder.
REPORT Number of iterations after saving a new restart object, which contains all the information necessary to restart the calibration at that point. The default is NULL, and no restart files are created.
restart.file Filename for the restart file to be created.

In the above example, calibration for each population will be performed in a RUN directory. For each population, a RUN/iX will be created, which will be the working directory in which the runModel function will be called.

The master directory (containing the calibration-parameters.csv and config.csv configuration files) will be copied in all the RUN/iX directories.

Hint

For OSMOSE calibration, you can control the number of replicates either through the replicates argument, or by setting the OSMOSE Java parameter simulation.nsimulation.

Note

Either maxit or maxgen must be provided. If maxit is provided, maxgen is computed as

\[\frac{maxit}{replicates \times popsize}\]

Warning

Since the OSMOSE model is stochastic, gradient base methods are innappropriate. The only method compatible with OSMOSE are AHR-ES.

6.5.8. Running calibration in parallel

Because the calibration process is time and resource consuming, it is highly advised to run the calibration in parallel. This is achieved by setting the control$parallel = TRUE and by initializing the cluster priori the call to the calibrate function.

Furthermore, the content that is used in the calibration (calibration data, likelyhood functions, packages) must be exported to the cluster. This is done by using the clusterExport and clusterEvalQ functions. The calibrate.R file must be edited as follows:

control$parallel = TRUE
control$nCores = 2   #  default value is the available number of cores

cl = makeCluster(control$nCores)
registerDoParallel(cl)

# send the variables and loaded libraries defined in the above to the nodes
clusterExport(cl, c("objfn", "calibData", "calInfo", "observed", "minmaxt"))
clusterEvalQ(cl, library("osmose"))
clusterEvalQ(cl, library("calibrar"))

cal1 = calibrate(calibData['paropt'], fn=objfn, method='default',
                 lower=calibData['parmin'], upper=calibData['parmax'],
                 phases=calibData['parphase'], control=control, replicates=2)

stopCluster(cl)

Danger

The way to run R in parallel depends on the calculation platform used. For Datarmor users, see dat.

Caution

The cluster must be closed after the calibrate function.