Skip to content

Run ACCESS-OM2

Prerequisites

General prerequisites

Before running ACCESS-OM2, you need to fulfil general prerequisites outlined in the First Steps section.

If you are unsure whether ACCESS-OM2 is the right choice for your experiment, take a look at the overview of ACCESS Models.

Note

In this documentation the same code is sometimes shown in a highlight code-block, and also in a simulated terminal. The code-block is useful because it is easy to copy the code example to your clipboard (mouse over the code block and click the icon on the far right of the code block). The simulated terminal is to illustrate what happens when commands are run on a terminal on Gadi.

Model-specific prerequisites

Join the vk83 and qv56 projects at NCI

To join these projects request membership on the respective vk83 and qv56 NCI project pages.

For more information on how to join specific NCI projects, please refer to How to connect to a project.

Payu

Payu is a workflow management tool for running numerical models in supercomputing environments for which there is extensive documentation.

Payu on Gadi is available through a dedicated conda environment in the vk83 project.

After joining the vk83 project load the payu environment:

module use /g/data/vk83/modules
module load payu

To check that payu is available, run:

payu --version

payu --version 1.1.3

Note: payu version >=1.1.3 is required


Get ACCESS-OM2 configuration

All released ACCESS-OM2 configurations are available from the ACCESS-OM2 configs GitHub repository. Released configurations are tested and supported by ACCESS-NRI. ACCESS-NRI has adapted these model configurations from those originally developed by COSIMA.

There are global configurations for three resolutions: 1°, 0.25°, 0.1°. For each resolution there are two options of atmospheric forcing: Repeat Year Forcing (RYF) and Interannual Forcing (IAF). Each configuration also has a biogeochemical (BGC) configuration if this is required. Note the BGC experiments are slower and so consume more resources, both compute time and generally also disk space.

Each configuration is stored as a separate specially named branch in the ACCESS-OM2 configs GitHub repository. Anyone using a configuration is advised to clone only a single branch and not attempt to keep this structure. The ACCESS-OM2 configs repo has more information about the available experiments and the naming scheme of the branches.

The first step is to choose a configuration from those available. For example, if the 1° horizontal resolution configuration with repeat-year JRA55 forcing (without bgc) is the required configuration then the release-1deg_jra55_ryf branch is the correct configuration.

The next step is to clone this branch to a location on Gadi:

mkdir -p ~/access-om2 cd ~/access-om2 payu clone -b expt -B release-1deg_jra55_ryf https://github.com/ACCESS-NRI/access-om2-configs.git 1deg_jra55_ryf Cloned repository from https://github.com/ACCESS-NRI/access-om2-configs.git to directory: .../access-om/1deg_jra55_ryf Created and checked out new branch: expt laboratory path: /scratch/.../access-om2 binary path: /scratch/.../access-om2/bin input path: /scratch/.../access-om2/input work path: /scratch/.../access-om2/work archive path: /scratch/.../access-om2/archive Updated metadata. Experiment UUID: daeee7ff-07e4-4f93-823b-cb7c6e4bdb6e Added archive symlink to /scratch/.../access-om2/archive/1deg_jra55_ryf-expt-daeee7ff To change directory to control directory run: cd 1deg_jra55_ryf

In the example above the payu clone command clones the 1° repeat-year JRA55 configuration (-B release-1deg_jra55_ryf) to a new experiment branch (-b expt) to a directory named 1deg_jra55_ryf:

mkdir -p ~/access-om2
cd ~/access-om2
payu clone -b expt -B release-1deg_jra55_ryf https://github.com/ACCESS-NRI/access-om2-configs.git 1deg_jra55_ryf

Note

payu uses branches to differentiate between different experiments the same git repository. So it is recommended to always change the branch name you clone into to something meaningful for the planned experiment. See the payu tutorial for more information.


Running an ACCESS-OM2 configuration

ACCESS-OM2 configurations run on Gadi through a PBS Job submission managed by payu.

The general layout of a payu-supported model run consists of two main directories:

  • The control directory contains the model configuration and serves as the execution directory for running the model (in this example, the cloned directory ~eaccess-om2/1deg_jra55_ryf).
  • The laboratory directory, where all the model components reside. For ACCESS-OM2, it is typically /scratch/$PROJECT/$USER/access-om2.

This separates the small text configuration files from the larger binary outputs and inputs. In this way, the control directory can be in the $HOME directory (as it is the only filesystem actively backed-up on Gadi). The quotas for $HOME are low and strict, which limits what can be stored there, so it is not suitable for larger files.

The laboratory directory is a shared space for all payu experiments using the same model.

Within the laboratory directory are two subdirectories within which payu automatically creates directories named uniquely for the experiment being run:

  • work → a temporary directory is created within here when the model is run. It gets created as part of a run and then removed after the run succeeds.
  • archive → the directory within which the output is stored following each successful run.

payu creates symbolic links in the control directory called archive and work that point to the corresponding directories in the laboratory directory.

This design allows multiple self-resubmitting experiments that share common executables and input data to be run simultaneously.

Run configuration

To run a configuration:

payu run

This will submit a single job to the queue with a run length of restart_period. restart_period is defined in the accessom2.nml file in the control directory.

Note

You can add the -f option to payu run and it will run even if there is an existing non-empty work directory created from a previous failed run or from running payu setup.

Run configuration multiple times

If you want to run a configuration multiple times automatically use the option -n:

payu run -n <number-of-runs>

This will run number-of-runs times with a total length of restart_period * number-of-runs, where restart_period is the length of each model run.

For example, to run a configuration for a total of 50 years with restart_period of 5 years the number-of-runs should be set to 10:

payu run -n 10

Note

restart_period is the configuration option that sets how long the model will run. See how to change run length for a description of where this is set and how to change it.

Monitor ACCESS-OM2 runs

payu run reports the PBS job-ID, e.g. 110020843.gadi-pbs, as the last line to the terminal. qstat can be used to query the status of the job, e.g.

qstat 110021035 qstat 110021035 Job id                Name             User             Time Use S Queue --------------------- ---------------- ---------------- -------- - ----- <110021035>.gadi-pbs  1deg_jra55_ryf   <$USER>            <time> R normal-exec

To show the status of all your submitted PBS jobs:

qstat -u $USER Job id                Name             User             Time Use S Queue --------------------- ---------------- ---------------- -------- - ----- <110021035>.gadi-pbs   1deg_jra55_ryf   <$USER>            <time> R normal-exec <000000000>.gadi-pbs   <other-job-name> <$USER>            <time> R normal-exec <000000000>.gadi-pbs   <other-job-name> <$USER>            <time> R normal-exec

The default name of the your job is the name of the payu control directory, in this example 1deg_jra55_ryf. This can be changed by altering the jobname in set in the config.yaml.

S indicates the status of your run, where:

  • Q → Job waiting in the queue to start
  • R → Job running
  • E → Job ending
  • H → Job on hold

If there are no jobs listed with your jobname (or if no job is listed), your run either successfully completed or was terminated due to an error.

A job can be on hold for a number of reasons, see the NCI documentation for more information.

PBS output files

When the model completes PBS writes the standard outout and error streams to two files into the control directory: jobname.o<job-ID> and jobname.e<job-ID> respectively. This is terminal output that isn't otherwise redirected into model log files.

You can archive these files using payu sweep which moves them to the archive directory.

Stop a run

If you want to manually terminate a run, you can do so by executing:

qdel job-ID

Which will kill the current job without waiting for it to complete. If you have used the -n option ( e.g., payu run -n), but subsequently decide not to keep running after the current process completes, you can create a file called stop_run in the control directory, and this will prevent payu from submitting another job.

Error and output log files

While the model is running, payu saves the model standard output and standard error in the access-om2.out and access-om2.err log files in the control directory. You can examine the contents of these files to check on the status of a run as it progresses.

At the end of a successful run these log files are archived to the archive directory and will not be found in the control directory. If they remain in the control directory after the PBS job for a run has completed this is an indication the run has failed.

Did the model run correctly?

To determine if a model has run correctly it must first be established that it has finished. The qstat commands above and the presence of PBS log files should be used to determine if the PBS job has ended.

If the model did not run to completion correctly the following will still be in the control directory:

work/
access-om2.err
access-om2.out

This is because payu will only run the archive step when the model runs without error.

ACCESS-OM2 outputs

At the end of a successful model run, output files, restart files and log files are moved from the work directory to the archive directory. A symbolic link in the control directory to a directory in the laboratory (/scratch/$PROJECT/$USER/access-om2/archive) is provided for convenience.

If a model run is unsuccessful the work directory is left untouched to facilitate "run forensics" to determine the cause of the model failure.

Outputs and restarts are stored in subfolders within the archive, subdivided for each run of the model.

Output folders are outputXXX and restart folders restartXXX, where XXX is the run number starting from 000.

Model components are separated into subdirectories within the output and restart directories.

cd ~/access-om2/1deg_jra55_ryf <terminal-line data="input" directory="~/access-om2/1deg_jra55_yaf<">ls output000 pbs_logs restart000

Model Live Diagnostics

ACCESS-NRI developed the Model Live Diagnostics framework to check, monitor, visualise, and evaluate model behaviour and progress of ACCESS models currently running on Gadi.

For a complete documentation on how to use this framework, check the Model Diagnostics documentation.

Trouble-shooting

If payu doesn't run correctly for some reason a good first step step, from within the control directory, is to run:

payu setup

This will prepare the model run: create the ephemeral work directory based on the experiment configuration, generate manifests and report some useful information to the user, such as the location of the laboratory where the work and archive directories are located. payu setup laboratory path: /scratch/$PROJECT/$USER/access-om2 binary path: /scratch/$PROJECT/$USER/access-om2/bin input path: /scratch/$PROJECT/$USER/access-om2/input work path: /scratch/$PROJECT/$USER/access-om2/work archive path: /scratch/$PROJECT/$USER/access-om2/archive Loading input manifest: manifests/input.yaml Loading restart manifest: manifests/restart.yaml Loading exe manifest: manifests/exe.yaml Setting up atmosphere Setting up ocean Setting up ice Setting up access-om2 Checking exe and input manifests Updating full hashes for 3 files in manifests/exe.yaml Creating restart manifest Writing manifests/restart.yaml Writing manifests/exe.yaml

This can help to isolate issues such as permissions problems accessing files and directories, missing files or malformed/incorrect paths.

Note

By default payu run will not proceed if there is an existing work directory. So after payu setup either payu sweep before attempting to run the configuration, or use payu run -f


Modifying an ACCESS-OM2 configuration

Once you are comfortable that you can clone and run an existing configuration it is often the case that you will want to modify the configuration depending on your science goals.

This section describes the model configuration and how to modify it.

Modifications can be to the way the model is run by payu, or can change the way specific model components are configured, or the coupling between them. Sometimes changes are required to both, if the model component changes require a change to the resources needed for the model to complete.

The config.yaml file located in the control directory, is the Master Configuration file, which controls the general model configuration. It contains several parts, some of which it is more likely will need modification, and others are rarely changed without significant understanding of how the model is configured.

Change run length

One of the most common changes is to adjust the duration of the model run. For example when debugging changes to a model, it is common to reduce the run length to minimise resource consumption and return faster feedback on changes.

To change the run length, edit the restart_period field in the &date_manager_nml section of the ~/access-om2/1deg_jra55_ryf/accessom2.nml file:

&date_manager_nml
    forcing_start_date = '1958-01-01T00:00:00'
    forcing_end_date = '2019-01-01T00:00:00'<br>
    ! Runtime for a single segment/job/submit, format is years, months, seconds,
    ! two of which must be zero.
    restart_period = 5, 0, 0
&end

For example, to make the model run for only 1 month change restart_period to

restart_period = 0, 1, 0

Note

While restart_period can be reduced, it should not be increased to more than 5 years to avoid errors. For more information about the difference between run length and total experiment length, or how to run the model for more than 5 years, refer to the section Run configuration for multiple years.

Modify PBS resources

If the model has been altered and needs longer to complete, more memory, or you want to change which queue it uses then this is the part of config.yaml you need to modify:

# If submitting to a different project to your default, uncomment line below
# and replace PROJECT_CODE with appropriate code. This may require setting shortpath
# project: PROJECT_CODE

# Force payu to always find, and save, files in this scratch project directory
# shortpath: /scratch/PROJECT_CODE

queue: normal
walltime: 3:00:00
jobname: 1deg_jra55_ryf
mem: 1000GB

These lines can be edited to change the PBS directives for the PBS job.

For example, to run ACCESS-OM2 under the ol01 project (COSIMA Working Group), uncomment the line beginning with # project by deleting the # symbol and replace PROJECT_CODE wih ol01:

project: ol01

If other compute projects will be used to run a configuration then the shortpath will also need to be uncommented and the path to the desired /scratch/PROJECT_CODE added. Doing this will make sure the same /scratch location is used for the laboratory regardless of which project is used to run the experiment.

Note

To run ACCESS-OM2, you need to be a member of a project with allocated Service Units (SU). For more information, check how to join relevant NCI projects

Syncing output data

As discussed above the laboratory directory is typically in a directory on ephemeral /scratch storage where files are regularly deleted once they have been unaccessed for a period of time. For this reason climate model outputs need to be moved to a location with longer term storage. On gadi this is typically under a project code on /g/data.

payu has in-built support to sync outputs, restarts and a copy of the control directory git history to another location. To do this modify this section of the config.yaml shown below: change enable to True, and set path to a location on /g/data.

# Sync options for automatically copying data from ephemeral scratch space to 
# longer term storage
sync:
    enable: False # set path below and change to true
    path: none # Set to location on /g/data or a remote server and path (rsync syntax)
    exclude:
      - '*.nc.*'
      - 'iceh.????-??-??.nc'

Saving model restarts

The model outputs restart files after every run so the model can then run again from the saved model state.

Restart files can occupy a significant amount of disk space and it isn't necessary to be able to restart the model at every point where the model was stopped during a run. The restart_freq specifies a strategy for what restart files are retained.

This can either be a number, which retains every nth numbered restart, or a pandas style date-time frequency alias. For example to preserve the ability to restart the model every 50 model run years:

restart_freq: 50Y
The most recent sequential restarts are retained, and only deleted only after a permanently archived restart has been produced.

See the documentation for more detail.

Other rarely modified configuration options

Model configuration

This tells payu which driver to use for the main model configuration (access-om2) and the location of all inputs that are common to all the component models, or submodels.

name: common
model: access-om2
input: /g/data/ik11/inputs/access-om2/input_20201102/common_1deg_jra55
The name field here is not actually used for the configuration run so you can safely ignore it.

Submodels

ACCESS-OM2 is a coupled model deploying multiple submodels (i.e. model components).

This section specifies the submodels and configuration options required to execute the model correctly.

Each submodel contains additional configuration options that are read in when the submodel is running. These options are specified in the subfolder of the control directory, whose name matches the submodel's name (e.g., configuration options for the ocean submodel are in the ~/access-om2/1deg_jra55_ryf/ocean directory).

Expand for detail
submodels:
    - name: atmosphere
    model: yatm
    exe: /g/data/vk83/apps/spack/0.20/release/linux-rocky8-x86_64/intel-19.0.5.281/libaccessom2-git.2023.10.26=2023.10.26-ieiy3e7hidn4dzaqly3ly2yu45mecgq4/bin/yatm.exe
    input:
            - /g/data/vk83/experiments/inputs/access-om2/remapping_weights/JRA55/global.1deg/2020.05.30/rmp_jrar_to_cict_CONSERV.nc
            - /g/data/vk83/experiments/inputs/JRA-55/RYF/v1-4/data
    ncpus: 1

    - name: ocean
    model: mom
    exe: /g/data/vk83/apps/spack/0.20/release/linux-rocky8-x86_64/intel-19.0.5.281/mom5-git.2023.11.09=2023.11.09-ewcdbrfukblyjxpkhd3mfkj4yxfolal4/bin/fms_ACCESS-OM.x
    input:
        - /g/data/vk83/experiments/inputs/access-om2/ocean/grids/mosaic/global.1deg/2020.05.30/grid_spec.nc
        - /g/data/vk83/experiments/inputs/access-om2/ocean/grids/mosaic/global.1deg/2020.05.30/ocean_hgrid.nc
        - /g/data/vk83/experiments/inputs/access-om2/ocean/grids/mosaic/global.1deg/2020.05.30/ocean_mosaic.nc
        - /g/data/vk83/experiments/inputs/access-om2/ocean/grids/bathymetry/global.1deg/2020.10.22/topog.nc
        - /g/data/vk83/experiments/inputs/access-om2/ocean/grids/bathymetry/global.1deg/2020.10.22/ocean_mask.nc
        - /g/data/vk83/experiments/inputs/access-om2/ocean/grids/vertical/global.1deg/2020.10.22/ocean_vgrid.nc
        - /g/data/vk83/experiments/inputs/access-om2/ocean/processor_masks/global.1deg/216.16x15/2020.05.30/ocean_mask_table
        - /g/data/vk83/experiments/inputs/access-om2/ocean/chlorophyll/global.1deg/2020.05.30/chl.nc
        - /g/data/vk83/experiments/inputs/access-om2/ocean/initial_conditions/global.1deg/2020.10.22/ocean_temp_salt.res.nc
        - /g/data/vk83/experiments/inputs/access-om2/ocean/tides/global.1deg/2020.05.30/tideamp.nc
        - /g/data/vk83/experiments/inputs/access-om2/ocean/tides/global.1deg/2020.05.30/roughness_amp.nc
        - /g/data/vk83/experiments/inputs/access-om2/ocean/tides/global.1deg/2020.05.30/roughness_cdbot.nc
        - /g/data/vk83/experiments/inputs/access-om2/ocean/surface_salt_restoring/global.1deg/2020.05.30/salt_sfc_restore.nc
    ncpus: 216

    - name: ice
    model: cice5
    exe: /g/data/vk83/apps/spack/0.20/release/linux-rocky8-x86_64/intel-19.0.5.281/cice5-git.2023.10.19=2023.10.19-rh3xfkrgajya3ghtliacuhlx3pgvrzqs/bin/cice_auscom_360x300_24x1_24p.exe
    input:
        - /g/data/vk83/experiments/inputs/access-om2/ice/grids/global.1deg/2020.05.30/grid.nc
        - /g/data/vk83/experiments/inputs/access-om2/ice/grids/global.1deg/2020.10.22/kmt.nc
        - /g/data/vk83/experiments/inputs/access-om2/ice/initial_conditions/global.1deg/2020.05.30/i2o.nc
        - /g/data/vk83/experiments/inputs/access-om2/ice/initial_conditions/global.1deg/2020.05.30/o2i.nc
        - /g/data/vk83/experiments/inputs/access-om2/ice/initial_conditions/global.1deg/2020.05.30/u_star.nc
        - /g/data/vk83/experiments/inputs/access-om2/ice/initial_conditions/global.1deg/2020.05.30/monthly_sstsss.nc
    ncpus: 24

Collation

The MOM model typically outputs model diagnostics as tiles: rather than output a single file it is saved as a number of smaller tiles each of which contain a part of the model grid.

The collate process combines a number of these smaller files into a single output file. Restart files are typically tiled in the same way and will also be combined together if the restart option is set to true.

collate:
    restart: true
    walltime: 1:00:00
    mem: 30GB
    ncpus: 4
    queue: normal
    exe: /g/data/ik11/inputs/access-om2/bin/mppnccombine</code></pre>
  • runlog

runlog: true
When running a new configuration, payu automatically commits changes with git if runlog is set to true.

This should not be changed as it is an essential part of the provenance of an experiment. payu updates the manifest files for every run, and relies on runlog to save this information in the git history so there is a record of all inputs, restarts and executables used in an experiment.

  • Platform-specific defaults
platform: 
nodesize: 48

Set platform-specific default parameters. In this case it sets the default number of cpus per node to 48. This might need changing if the configuration is run on hardware with different nodesize.

  • User scripts
userscripts:
    error: resub.sh
    run: rm -f resubmit.count

A dictionary to run scripts or subcommands at various stages of a payu submission.

error gets called if the model does not run correctly and returns an error code. run gets called after the model execution, but prior to model output archive

Miscellaneous

There rest of the configuration settings should never need changing: stacksize, mpirun, qsub_flags and env.

Show
stacksize: unlimited
mpirun: --mca io ompio --mca io_ompio_num_aggregators 1
qsub_flags: -W umask=027
env:
UCX_LOG_LEVEL: 'error'

To find out more about other configuration settings for the config.yaml file, refer to how to configure your experiment with payu.

Edit a single ACCESS-OM2 component configuration

Each of the model components contains configuration options specific to that model that are read in when the model component is running. These options are typically useful to modify the physics used in the model, the input data or the model variables saved in the output files. These configuration options are are specified in files in a subfolder of the control directory, named the same as the submodel's name in the config.yaml submodel section (e.g., configuration options for the ocean submodel are in the ~/access-om2/1deg_jra55_ryf/ocean directory). To modify these options please refer to the User Guide of each individual model component.


References


Last update: April 25, 2024