
Run ACCESS-OM

Prerequisites

General prerequisites

Before running ACCESS-OM, you need to fulfil general prerequisites outlined in the First Steps section.

Model-specific prerequisites

  • Join the hh5, qv56, ua8 and ik11 projects at NCI
    To join these projects, request membership on the respective hh5, qv56, ua8 and ik11 NCI project pages.
    For more information on how to join specific NCI projects, please refer to How to connect to a project.
  • Payu
    Payu on Gadi is available through the conda/analysis3 environment in the hh5 project.
    After obtaining hh5 project membership, load the conda/analysis3 environment to automatically retrieve payu as follows:
    module use /g/data/hh5/public/modules
    module load conda/analysis3
    To check that payu is available, run:
    payu --version
    The output shows the installed version, for example:
    1.0.19
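Once your membership requests have been approved, you can check which of the required projects you already belong to. This is a minimal sketch that assumes each NCI project you are a member of shows up on Gadi as a Unix group with the same name (as listed by `id -nG`). The name `missing_projects` is a hypothetical helper, not part of payu or NCI tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the required projects that are missing from a
# space-separated list of group names.
# Usage: missing_projects "<groups>" <project> [<project> ...]
missing_projects() {
  local groups=" $1 "   # pad with spaces so only whole names match
  shift
  local missing=()
  local proj
  for proj in "$@"; do
    # Record the project if " proj " does not occur in the padded group list
    if [[ "$groups" != *" $proj "* ]]; then
      missing+=("$proj")
    fi
  done
  echo "${missing[*]}"
}

# Assumption: on Gadi, `id -nG` lists your NCI projects as Unix group names.
result=$(missing_projects "$(id -nG)" hh5 qv56 ua8 ik11)
if [[ -n "$result" ]]; then
  echo "Not yet a member of: $result"
else
  echo "Member of all required projects."
fi
```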

Get ACCESS-OM configuration

A standard ACCESS-OM configuration is available on the COSIMA GitHub.
This is a 1° horizontal resolution configuration with interannual forcing from 1 Jan 1958 to 31 Dec 2018.
To get it on Gadi, create a directory to store the model configuration, navigate to this directory and clone the GitHub repo in it by running:

mkdir -p ~/access-om
cd ~/access-om
git clone https://github.com/COSIMA/1deg_jra55_iaf.git

The output should look similar to the following:

Cloning into '1deg_jra55_iaf'...
remote: Enumerating objects: 14715, done.
remote: Counting objects: 100% (3401/3401), done.
remote: Compressing objects: 100% (24/24), done.
remote: Total 14715 (delta 3383), reused 3379 (delta 3377), pack-reused 11314
Receiving objects: 100% (14715/14715), 35.68 MiB | 18.11 MiB/s, done.
Resolving deltas: 100% (10707/10707), done.

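Some modules may interfere with git commands (e.g., matlab/R2018a). If you have trouble cloning the repository, run the following command before trying again:
module purge
Note that module purge unloads all currently loaded modules, including conda/analysis3, so you will need to load it again before using payu.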

Edit ACCESS-OM configuration

It is good practice to create a new git branch to store all your modifications for a particular run, so as not to modify the reference configuration.

To create a local branch called "example_run", from within the cloned repo execute:

git checkout -b example_run

Payu

Payu is a workflow management tool for running numerical models in supercomputing environments.
The general layout of a payu-supported model run consists of two main directories:

  • The laboratory directory, where all the model components reside. For ACCESS-OM, it is typically /scratch/$PROJECT/$USER/access-om2.
  • The control directory, where the model configuration resides and from where the model is run (in this example, the cloned directory ~/access-om/1deg_jra55_iaf).

Keeping the two directories separate means the small configuration files are stored apart from the much larger binary inputs and outputs. This way, the configuration files can be placed in the $HOME directory (the only filesystem actively backed up on Gadi) without overloading it with too much data. This separation also allows multiple self-resubmitting experiments that share common executables and input data to run simultaneously.

To set up the laboratory directory, run the following command from the control directory:

payu init

This creates the laboratory directory, together with relevant subdirectories, depending on the configuration. The main subdirectories of interest are:

  • work → a temporary directory where the model is run. It gets cleaned after each run.
  • archive → the directory where output is stored after each run.

For example:

cd ~/access-om/1deg_jra55_iaf
payu init
laboratory path: /scratch/$PROJECT/$USER/access-om2
binary path: /scratch/$PROJECT/$USER/access-om2/bin
input path: /scratch/$PROJECT/$USER/access-om2/input
work path: /scratch/$PROJECT/$USER/access-om2/work
archive path: /scratch/$PROJECT/$USER/access-om2/archive

Edit the Master Configuration file

The config.yaml file, located in the control directory, is the Master Configuration file.
This file, which controls the general model configuration, contains several parts:

  • PBS resources
    queue: normal
    walltime: 3:00:00
    jobname: 1deg_jra55_iaf
    mem: 1000GB
    These lines can be edited to change the PBS directives for the PBS job.
    For example, to run ACCESS-OM under the tm70 project (ACCESS-NRI), add the following line:
    project: tm70
    To run ACCESS-OM, you need to be a member of a project with allocated Service Units (SU). For more information, check how to join relevant NCI projects.
  • Model configuration
    name: common
    model: access-om2
    input: /g/data/ik11/inputs/access-om2/input_20201102/common_1deg_jra55
    These lines let payu know which driver to use for the main model configuration (access-om2).
    The name field here is not actually used for the configuration run, so you can safely ignore it.
  • Submodels
    submodels:
        - name: atmosphere
          model: yatm
          exe: /g/data/access/payu/access-om2/bin/coe/um7.3x
          input:
                - /g/data/ik11/inputs/access-om2/input_20201102/yatm_1deg
                - /g/data/qv56/replicas/input4MIPs/CMIP6/OMIP/MRI/MRI-JRA55-do-1-4-0/atmos/3hr/rsds/gr/v20190429
                - /g/data/qv56/replicas/input4MIPs/CMIP6/OMIP/MRI/MRI-JRA55-do-1-4-0/atmos/3hr/rlds/gr/v20190429
                - /g/data/qv56/replicas/input4MIPs/CMIP6/OMIP/MRI/MRI-JRA55-do-1-4-0/atmos/3hr/prra/gr/v20190429
                - /g/data/qv56/replicas/input4MIPs/CMIP6/OMIP/MRI/MRI-JRA55-do-1-4-0/atmos/3hr/prsn/gr/v20190429
                - /g/data/qv56/replicas/input4MIPs/CMIP6/OMIP/MRI/MRI-JRA55-do-1-4-0/atmos/3hrPt/psl/gr/v20190429
                - /g/data/qv56/replicas/input4MIPs/CMIP6/OMIP/MRI/MRI-JRA55-do-1-4-0/land/day/friver/gr/v20190429
                - /g/data/qv56/replicas/input4MIPs/CMIP6/OMIP/MRI/MRI-JRA55-do-1-4-0/atmos/3hrPt/tas/gr/v20190429
                - /g/data/qv56/replicas/input4MIPs/CMIP6/OMIP/MRI/MRI-JRA55-do-1-4-0/atmos/3hrPt/huss/gr/v20190429
                - /g/data/qv56/replicas/input4MIPs/CMIP6/OMIP/MRI/MRI-JRA55-do-1-4-0/atmos/3hrPt/uas/gr/v20190429
                - /g/data/qv56/replicas/input4MIPs/CMIP6/OMIP/MRI/MRI-JRA55-do-1-4-0/atmos/3hrPt/vas/gr/v20190429
                - /g/data/qv56/replicas/input4MIPs/CMIP6/OMIP/MRI/MRI-JRA55-do-1-4-0/landIce/day/licalvf/gr/v20190429
          ncpus: 1
        - name: ocean
          model: mom
          exe: /g/data/ik11/inputs/access-om2/bin/fms_ACCESS-OM_730f0bf_libaccessom2_d750b4b.x
          input: /g/data/ik11/inputs/access-om2/input_20201102/mom_1deg
          ncpus: 216
        - name: ice
          model: cice
          exe: /g/data/ik11/inputs/access-om2/bin/cice_auscom_360x300_24p_edcfa6f_libaccessom2_d750b4b.exe
          input: /g/data/ik11/inputs/access-om2/input_20201102/cice_1deg
          ncpus: 24
    ACCESS-OM is a coupled model deploying multiple submodels (i.e. model components). This section specifies the submodels and configuration options required to execute the model correctly.
    Each submodel contains additional configuration options that are read in when the submodel is running. These options are specified in the subfolder of the control directory, whose name matches the submodel's name (e.g., configuration options for the atmosphere submodel are in the ~/access-om/1deg_jra55_iaf/atmosphere directory).
  • Collate
    collate:
        restart: true
        walltime: 1:00:00
        mem: 30GB
        ncpus: 4
        queue: normal
        exe: /g/data/ik11/inputs/access-om2/bin/mppnccombine
    The collate process combines a number of smaller files, which contain different parts of the model grid, into target output files. Restart files are typically tiled in the same way and will also be combined together if the restart option is set to true.
  • Runlog
    runlog: true
    When running a new configuration, payu automatically commits changes in git if runlog is set to true.
    This setting should not be changed, to avoid losing track of the current experiment.
  • Stack size
    stacksize: unlimited
    The stacksize is the maximum size (in KiB) of the stack allocated to each thread of a process. It is often set to unlimited because explicit stacksize values may not be correctly communicated across Gadi nodes.
  • Restart frequency
    restart_freq: 1
    The restart frequency specifies how often restart files are permanently saved.
    For example, to save restart files every fifth run (i.e. restart004, restart009, restart014, etc.), you need to set restart_freq: 5.
    Intermediate restarts are still temporarily saved and deleted only after a permanently archived restart has been produced.
  • mpirun arguments
    mpirun: --mca io ompio --mca io_ompio_num_aggregators 1
    This line appends the given arguments to the mpirun call used to launch the model.
  • qsub flags
    qsub_flags: -W umask=027
    This line passes any additional flags to qsub when the job is submitted (here, -W umask=027, which sets the umask the job runs with).
  • Environment variables
    env:
        UCX_LOG_LEVEL: 'error'
    These lines add the specified variables to the run environment.
  • Platform-specific defaults
    platform: 
        nodesize: 48
    These lines control platform-specific default parameters.
    nodesize: 48 sets the number of CPUs per node to 48, so that the CPU request is rounded up to fill whole nodes regardless of the requested number of CPUs.
  • User scripts
    userscripts:
        error: resub.sh
        run: rm -f resubmit.count
    This section runs separate user scripts or subcommands at various stages of a payu submission:
    error gets called if the model does not run correctly and returns an error code;
    run gets called after the model execution, but prior to model output archive.
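To see which restart folders are kept permanently for a given restart_freq, the sketch below reproduces the example above: with restart_freq: 5, the kept restarts are those whose index n satisfies (n + 1) mod 5 = 0, i.e. restart004, restart009, restart014. This rule is inferred from that example rather than taken from payu's source code, and `kept_restarts` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the restart folders kept permanently during the
# first <nruns> runs, given restart_freq. Rule inferred from the documented
# example (restart_freq 5 keeps restart004, restart009, restart014, ...).
kept_restarts() {
  local freq=$1 nruns=$2
  local n
  for (( n = 0; n < nruns; n++ )); do
    if (( (n + 1) % freq == 0 )); then
      printf 'restart%03d\n' "$n"   # three-digit, zero-padded run index
    fi
  done
}

kept_restarts 5 15
```

Intermediate restarts that are not in this list still exist temporarily, until a permanently archived restart has been produced.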


To find out more about other configuration settings for the config.yaml file, check out how to configure your experiment with payu.
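The qsub_flags: -W umask=027 line in config.yaml asks PBS to start the job with a umask of 027, which removes write permission for the group and all permissions for other users on newly created files. The short demo below (which does not need PBS or payu) shows the resulting permissions of a new file and directory; it assumes GNU `stat`, as found on Linux systems, and `umask_demo` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Show what umask 027 means for newly created files and directories.
# Runs in a subshell, so the caller's own umask is left unchanged.
umask_demo() {
  (
    umask 027
    tmp=$(mktemp -d)
    touch "$tmp/file"   # files start from mode 666: 666 & ~027 = 640
    mkdir "$tmp/dir"    # directories start from mode 777: 777 & ~027 = 750
    stat -c '%a %n' "$tmp/file" "$tmp/dir" | sed "s|$tmp/||"
    rm -rf "$tmp"
  )
}

umask_demo
```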

Change run length

To change the internal run length, edit the restart_period field in the &date_manager_nml section of the ~/access-om/1deg_jra55_iaf/accessom2.nml file:

&date_manager_nml
    forcing_start_date = '1958-01-01T00:00:00'
    forcing_end_date = '2019-01-01T00:00:00'
    ! Runtime for a single segment/job/submit, format is years, months, seconds,
    ! two of which must be zero.
    restart_period = 5, 0, 0
&end

The internal run length (controlled by restart_period) can differ from the total run length. Also, while restart_period can be reduced, it should not be increased beyond 5 years, to avoid errors. For more information about the difference between internal run and total run lengths, or how to run the model for more than 5 years, refer to the section Run configuration for multiple years.
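Before submitting, it can help to confirm which restart_period the run will use. The sketch below extracts the value from accessom2.nml with grep and sed. It assumes the setting sits on its own line in the `restart_period = ...` form shown above (optionally followed by a `!` comment); it is not a full Fortran namelist parser, and `get_restart_period` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the restart_period value (years, months, seconds)
# from an accessom2.nml-style file. Commented-out lines starting with "!" are
# ignored; a trailing "! comment" after the value is stripped.
get_restart_period() {
  local nml=$1
  grep -E '^[[:space:]]*restart_period[[:space:]]*=' "$nml" \
    | head -n 1 \
    | sed -E 's/^[^=]*=[[:space:]]*//; s/[[:space:]]*(!.*)?$//'
}

# Example usage:
# get_restart_period ~/access-om/1deg_jra55_iaf/accessom2.nml
```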

Run ACCESS-OM configuration

After editing the configuration, you are ready to run ACCESS-OM.
ACCESS-OM suites run on Gadi through a PBS job submission managed by payu.

Payu setup (optional)

As a first step, from within the control directory, it is good practice to run:

payu setup

This will prepare the model run, based on the experiment configuration. For example:

payu setup
laboratory path: /scratch/$PROJECT/$USER/access-om2
binary path: /scratch/$PROJECT/$USER/access-om2/bin
input path: /scratch/$PROJECT/$USER/access-om2/input
work path: /scratch/$PROJECT/$USER/access-om2/work
archive path: /scratch/$PROJECT/$USER/access-om2/archive
Loading input manifest: manifests/input.yaml
Loading restart manifest: manifests/restart.yaml
Loading exe manifest: manifests/exe.yaml
Setting up atmosphere
Setting up ocean
Setting up ice
Setting up access-om2
Checking exe and input manifests
Updating full hashes for 3 files in manifests/exe.yaml
Creating restart manifest
Writing manifests/restart.yaml
Writing manifests/exe.yaml

This step can be skipped as it is also included in the run command. However, running it explicitly helps to check for errors and make sure executable and restart directories are accessible.

Run configuration

To run ACCESS-OM configuration for one internal run length (controlled by restart_period in the ~/access-om/1deg_jra55_iaf/accessom2.nml file), execute:

payu run -f

This will submit a single job to the queue with a total run length of restart_period.

The -f option ensures that payu will run even if there is an existing non-empty work directory created from a previous failed run.

For example:

payu run -f
payu: warning: Job request includes 47 unused CPUs.
payu: warning: CPU request increased from 241 to 288
Loading input manifest: manifests/input.yaml
Loading restart manifest: manifests/restart.yaml
Loading exe manifest: manifests/exe.yaml
payu: Found modules in /opt/Modules/v4.3.0
qsub -q normal -P tm70 -l walltime=10800 -l ncpus=288 -l mem=1000GB -N 1deg_jra55_iaf -l wd -j n -v PYTHONPATH=/g/data3/tm70/dm5220/scripts/python_modules/,PAYU_PATH=/g/data/hh5/public/apps/miniconda3/envs/analysis3-23.01/bin,PAYU_FORCE=True,MODULESHOME=/opt/Modules/v4.3.0,MODULES_CMD=/opt/Modules/v4.3.0/libexec/modulecmd.tcl,MODULEPATH=/g/data3/hh5/public/modules:/etc/scl/modulefiles:/opt/Modules/modulefiles:/opt/Modules/v4.3.0/modulefiles:/apps/Modules/modulefiles -W umask=027 -l storage=gdata/hh5+gdata/ik11+gdata/qv56 -- /g/data/hh5/public/apps/miniconda3/envs/analysis3-23.01/bin/python3.9 /g/data/hh5/public/apps/miniconda3/envs/analysis3-23.01/bin/payu-run
<job-ID>.gadi-pbs
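The two CPU warnings in the output above follow from the platform nodesize: 48 setting in config.yaml. The submodels request 1 + 216 + 24 = 241 CPUs, which is rounded up to the next multiple of 48 (6 nodes, 288 CPUs), leaving 47 CPUs unused. The sketch below only reproduces that arithmetic; it is not payu's own code, and `round_to_nodes` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: round a CPU request up to whole nodes.
# Prints: <requested> <rounded> <unused>
round_to_nodes() {
  local ncpus=$1 nodesize=$2
  local nodes=$(( (ncpus + nodesize - 1) / nodesize ))   # ceiling division
  local rounded=$(( nodes * nodesize ))
  echo "$ncpus $rounded $(( rounded - ncpus ))"
}

# atmosphere (1) + ocean (216) + ice (24), with nodesize: 48
round_to_nodes $(( 1 + 216 + 24 )) 48   # prints "241 288 47"
```

Choosing submodel CPU counts whose sum is close to a multiple of 48 reduces the number of CPUs (and Service Units) requested but left idle.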

Run configuration for multiple years

If you want to run ACCESS-OM configuration for multiple internal run lengths (controlled by restart_period in the ~/access-om/1deg_jra55_iaf/accessom2.nml file), use the option -n:

payu run -f -n <number-of-runs>

This will run the configuration number-of-runs times with a total run length of restart_period * number-of-runs.
For example, to run the configuration for a total of 50 years with restart_period = 5, 0, 0 (5 years), the number-of-runs should be set to 10:

payu run -f -n 10
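If you know the total number of years you want to simulate, the value for -n is that total divided by the restart_period length, rounded up. The sketch below assumes restart_period is set in whole years (months and seconds both zero); `runs_needed` is a hypothetical helper name, and the script only prints the payu command rather than running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: number of runs (-n) needed to cover <total_years>
# when each run lasts <period_years> (restart_period = <period_years>, 0, 0).
runs_needed() {
  local total_years=$1 period_years=$2
  echo $(( (total_years + period_years - 1) / period_years ))   # ceiling division
}

n=$(runs_needed 50 5)   # 10, as in the example above
echo "payu run -f -n $n"
```

When the total is not an exact multiple of restart_period, rounding up means the last run extends past the requested total.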

Monitor ACCESS-OM runs

Currently, there is no specific tool to monitor ACCESS-OM runs.
You can execute the following command to show the status of all your submitted PBS jobs:

qstat -u $USER

For example:

qstat -u $USER
Job id                Name             User             Time Use S Queue
--------------------- ---------------- ---------------- -------- - -----
<job-ID>.gadi-pbs     1deg_jra55_iaf   <$USER>            <time> R normal-exec
<job-ID-2>.gadi-pbs   <other-job-name> <$USER>            <time> R normal-exec
<job-ID-3>.gadi-pbs   <other-job-name> <$USER>            <time> R normal-exec

If you changed the jobname in the PBS resources of the Master Configuration file, that name will appear as your job's Name instead of 1deg_jra55_iaf.
S indicates the status of your run, where:

  • Q → Job waiting in the queue to start
  • R → Job running
  • E → Job ending

If there are no jobs listed with your jobname (or if no job is listed), your run either successfully completed or was terminated due to an error.
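To check on a single run from a script, you can filter the qstat output by job name. The sketch below reads qstat output on standard input and prints the S (status) column of the matching job. It assumes the column layout of the example above (Name in the second field, S in the fifth) and that the job name is not truncated; if your qstat prints a different layout, pass different field numbers. `job_status` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the status letter (Q, R, E, ...) of the job(s)
# whose name matches <jobname>, reading qstat output on stdin.
# Usage: job_status <jobname> [<name_field>] [<status_field>]
job_status() {
  local jobname=$1 name_field=${2:-2} status_field=${3:-5}
  awk -v name="$jobname" -v nf="$name_field" -v sf="$status_field" \
    '$nf == name { print $sf }'
}

# Example usage on Gadi:
# status=$(qstat -u "$USER" | job_status 1deg_jra55_iaf)
# [ -z "$status" ] && echo "No job named 1deg_jra55_iaf is listed"
```

An empty result means the job is no longer listed, which (as noted above) can mean either a successful completion or a termination due to an error.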

Stop a run

If you want to manually terminate a run, you can do so by executing:

qdel <job-ID>

Error and output log files

While the model is running, payu saves the standard output and standard error in the respective access-om2.out and access-om2.err files in the control directory. You can examine the contents of these files to check on the status of a run as it progresses.
When the model completes its run, or if it crashes, the output and error log files are by default renamed as jobname.o<job-ID> and jobname.e<job-ID>, respectively.
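Because each run leaves a new jobname.o<job-ID> and jobname.e<job-ID> pair in the control directory, it can be handy to open the most recent one. The sketch below picks the newest file by modification time; it assumes the default log naming described above, and `latest_log` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the most recently modified PBS log for <jobname>.
# <kind> is "o" for standard output (default) or "e" for standard error.
latest_log() {
  local jobname=$1 kind=${2:-o}
  # ls -t sorts newest first; the error for "no matching file" is discarded
  ls -t -- "$jobname.$kind"* 2>/dev/null | head -n 1
}

# Example usage from the control directory:
# less "$(latest_log 1deg_jra55_iaf e)"
# While the job is still running, follow the live output instead:
# tail -f access-om2.out
```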

Model Live Diagnostics

ACCESS-NRI developed the Model Live Diagnostics framework to check, monitor, visualise, and evaluate model behaviour and progress of ACCESS models currently running on Gadi.
For complete documentation on how to use this framework, check the Model Diagnostics documentation.


ACCESS-OM outputs

While the configuration is running, output files (and restart files) are moved from the work directory to the archive directory /scratch/$PROJECT/$USER/access-om2/archive, which is also symlinked in the control directory as ~/access-om/1deg_jra55_iaf/archive.
Both outputs and restarts are stored in subfolders for each different configuration (in this case, 1deg_jra55_iaf). Inside the configuration folder, they are further subdivided for each internal run.
The naming format for a typical output folder is outputXXX and for a restart folder restartXXX, where XXX is the internal run number starting from 000.
Within each output and restart folder, files are further separated into subfolders for each model component.

For example:

cd /scratch/$PROJECT/$USER/access-om2/archive/1deg_jra55_iaf
ls
output000  pbs_logs  restart000
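Since restart folders are numbered sequentially (restart000, restart001, ...), the most recent one is the highest-numbered folder. The sketch below finds it in an archive directory; it assumes the three-digit naming shown above, and `latest_restart` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the highest-numbered restartXXX folder in <archive_dir>.
latest_restart() {
  local archive=$1
  local d latest=""
  for d in "$archive"/restart[0-9][0-9][0-9]; do
    # Glob matches expand in sorted order, so the last existing match wins
    [ -d "$d" ] && latest=$d
  done
  [ -n "$latest" ] && basename "$latest"
}

# Example usage:
# latest_restart /scratch/$PROJECT/$USER/access-om2/archive/1deg_jra55_iaf
```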



Last update: September 21, 2023