Run ACCESS-OM2

About

ACCESS-OM2 is an Ocean Sea-Ice model. More information is available in the ACCESS-OM2 overview.

The instructions below outline how to run ACCESS-OM2 using ACCESS-NRI's software deployment pipeline, specifically designed to run on the National Computational Infrastructure (NCI) supercomputer Gadi.

If you are unsure whether ACCESS-OM2 is the right choice for your experiment, take a look at the overview of ACCESS Models.

All model configurations are open source, licensed under CC BY 4.0 and available on ACCESS-NRI GitHub.

ACCESS-OM2 release notes are available on the ACCESS-Hive Forum and are updated when new releases are made available.

Prerequisites

General prerequisites

Before running ACCESS-OM2, you need to fulfil general prerequisites outlined in the First Steps section.

Model-specific prerequisites

Join the vk83 and qv56 projects at NCI
To join these projects, request membership on the respective vk83 and qv56 NCI project pages.
For more information on joining specific NCI projects, refer to How to connect to a project.
Payu
Payu is a workflow management tool for running numerical models in supercomputing environments, for which there is extensive documentation.
Payu on Gadi is available through a dedicated conda environment in the vk83 project.
After joining the vk83 project, load the payu module:
```
module use /g/data/vk83/modules
module load payu
```
To check that payu is available, run:
```
payu --version
```
```
payu --version
1.1.3
```

Warning

payu version >=1.1.3 is required
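You can confirm both model-specific prerequisites from a Gadi login shell. The sketch below is a minimal example under two assumptions: that each NCI project you have joined shows up as one of your Unix groups (as listed by the id command), and that payu --version prints only the version number, as in the output above. The helpers in_project and version_ge are hypothetical and not part of payu.

```shell
# Hypothetical helpers (sketch) for checking ACCESS-OM2 prerequisites.
# Assumption: NCI project membership appears as Unix group membership.

# Succeeds if the current user belongs to the given project/group.
in_project() {
    id -Gn | tr ' ' '\n' | grep -qx "$1"
}

# Succeeds if version $1 >= version $2 (uses GNU `sort -V` ordering).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage (assumes `payu --version` prints just the version number):
#   for p in vk83 qv56; do in_project "$p" || echo "Not a member of $p"; done
#   version_ge "$(payu --version)" 1.1.3 || echo "payu >= 1.1.3 is required"
```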

Get ACCESS-OM2 configuration

All released ACCESS-OM2 configurations are available from the ACCESS-OM2 configs GitHub repository.
Released configurations are adapted from those originally developed by COSIMA, and are tested and supported by ACCESS-NRI.

For more information on ACCESS-OM2 configurations, check the ACCESS-OM2 page.

More information about the available experiments and the naming scheme of the branches can also be found in the ACCESS-OM2 configs GitHub repository.

The first step is to choose a configuration from those available.
For example, if the required configuration is the 1° horizontal resolution with repeat-year JRA55 forcing (without BGC), then the branch to select is release-1deg_jra55_ryf.

To clone this branch to a location on Gadi, run:

```
mkdir -p ~/access-om2
cd ~/access-om2
payu clone -b expt -B release-1deg_jra55_ryf https://github.com/ACCESS-NRI/access-om2-configs 1deg_jra55_ryf
```

In the example above, the payu clone command clones the 1° repeat-year JRA55 configuration (-B release-1deg_jra55_ryf) to a new experiment branch (-b expt) in a directory named 1deg_jra55_ryf.

Admonition

Anyone using a configuration is advised to clone only a single branch (as shown in the example above) and not the entire repository.

```
mkdir -p ~/access-om2
cd ~/access-om2
payu clone -b expt -B release-1deg_jra55_ryf https://github.com/ACCESS-NRI/access-om2-configs.git 1deg_jra55_ryf
Cloned repository from https://github.com/ACCESS-NRI/access-om2-configs.git to directory: .../access-om/1deg_jra55_ryf
Created and checked out new branch: expt
laboratory path: /scratch/.../access-om2
binary path: /scratch/.../access-om2/bin
input path: /scratch/.../access-om2/input
work path: /scratch/.../access-om2/work
archive path: /scratch/.../access-om2/archive
Updated metadata. Experiment UUID: daeee7ff-07e4-4f93-823b-cb7c6e4bdb6e
Added archive symlink to /scratch/.../access-om2/archive/1deg_jra55_ryf-expt-daeee7ff
To change directory to control directory run:
  cd 1deg_jra55_ryf
```

Tip

payu uses branches to differentiate between different experiments in the same local git repository.
For this reason, it is recommended to always set the cloned branch name (expt in the example above) to something meaningful for the planned experiment.
For more information refer to this payu tutorial.

Run ACCESS-OM2 configuration

If you want to modify your configuration, refer to Edit ACCESS-OM2 configuration.

ACCESS-OM2 configurations run on Gadi through a PBS job submission managed by payu.

The general layout of a payu-supported model run consists of two main directories:

The control directory contains the model configuration and serves as the execution directory for running the model (in this example, the cloned directory ~/access-om2/1deg_jra55_ryf).
The laboratory directory is where all the model components reside. For ACCESS-OM2, it is typically /scratch/$PROJECT/$USER/access-om2.

This separates the small text configuration files from the larger binary outputs and inputs. In this way, the control directory can be in the $HOME directory, the only filesystem actively backed up on Gadi. The quotas for $HOME are low and strict, so it is not suitable for larger files.

The laboratory directory is a shared space for all payu experiments using the same model.
Inside the laboratory directory, payu uses two subdirectories to run the model and store its results:

work → a directory where payu automatically creates a temporary subdirectory while the model is run. The temporary subdirectory gets created as part of a run and then removed after the run succeeds.
archive → the directory where the output is stored following each successful run.

Within each of the above directories payu automatically creates subdirectories uniquely named according to the experiment being run.
Payu also creates symbolic links in the control directory pointing to the archive and work directories.

This design allows multiple self-resubmitting experiments that share common executables and input data to be run simultaneously.
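To see where payu will put things for your experiment, the sketch below prints the laboratory directory and the subdirectories payu reports for it (bin, input, work and archive), assuming the typical /scratch/$PROJECT/$USER/access-om2 location given above. lab_paths is a hypothetical helper; running payu setup from the control directory reports the actual paths.

```shell
# Hypothetical helper (sketch): prints the laboratory directory and the
# subdirectories payu reports for it. Assumes the typical ACCESS-OM2
# laboratory location /scratch/$PROJECT/$USER/access-om2.
lab_paths() {
    lab="/scratch/${PROJECT:?PROJECT is not set}/${USER:?USER is not set}/access-om2"
    for sub in "" /bin /input /work /archive; do
        printf '%s%s\n' "$lab" "$sub"
    done
}

# Usage: show which of these directories already exist
#   lab_paths | while read -r d; do [ -d "$d" ] && ls -d "$d"; done
```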

Admonition

Files on the /scratch drive, such as those in the laboratory directory, may be deleted if they have not been accessed for some time, and space on /scratch is limited. For these reasons, all model runs you want to keep should be moved to /g/data/ by enabling the sync step in payu. To know more, refer to Syncing output data.

Run configuration

To run an ACCESS-OM2 configuration, execute the following command from within the control directory:

```
payu run
```

This will submit a single job to the queue with a run length of restart_period.
For information about restart_period, refer to Change run length.

```
cd ~/access-om2/1deg_jra55_ryf
payu run
payu: warning: Job request includes 47 unused CPUs.
payu: warning: CPU request increased from 241 to 288
Loading input manifest: manifests/input.yaml
Loading restart manifest: manifests/restart.yaml
Loading exe manifest: manifests/exe.yaml
payu: Found modules in /opt/Modules/v4.3.0
qsub -q normal -P tm70 -l walltime=10800 -l ncpus=288 -l mem=1000GB -N 1deg_jra55_ryf -l wd -j n -v PAYU_PATH=/g/data/hh5/public/apps/miniconda3/envs/analysis3-23.10/bin,MODULESHOME=/opt/Modules/v4.3.0,MODULES_CMD=/opt/Modules/v4.3.0/libexec/modulecmd.tcl,MODULEPATH=/g/data/hr22/modulefiles:/g/data/hh5/public/modules:/etc/scl/modulefiles:/opt/Modules/modulefiles:/opt/Modules/v4.3.0/modulefiles:/apps/Modules/modulefiles -W umask=027 -l storage=gdata/hh5+gdata/vk83 -- /g/data/hh5/public/apps/miniconda3/envs/analysis3-23.10/bin/python3.10 /g/data/hh5/public/apps/miniconda3/envs/analysis3-23.10/bin/payu-run
<job-ID>.gadi-pbs
```

Tip

You can add the -f option to payu run to let the model run even if there is an existing non-empty work directory, created from a previous failed run or from running payu setup.

Run configuration for more than 5 years

As mentioned in the Change run length section, you cannot specify more than 5 years as restart_period.
If you want to run a configuration for more than 5 years, you need to use the -n option:

```
payu run -n <number-of-runs>
```

This will run ACCESS-OM2 number-of-runs consecutive times, each with a run length equal to restart_period.
This way, the total experiment length will be restart_period * number-of-runs.

For example, to run a configuration for a total of 50 years with a restart_period of 5 years, the number-of-runs should be set to 10:

```
payu run -n 10
```
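Since the total experiment length is restart_period multiplied by number-of-runs, the number of runs to request is the total length divided by restart_period, rounded up. A minimal sketch, assuming both lengths are whole numbers of years; runs_needed is a hypothetical helper:

```shell
# Hypothetical helper (sketch): number of runs to pass to `payu run -n` so that
# number-of-runs * restart_period covers the requested total length.
# Assumes both lengths are whole years; rounds up.
runs_needed() {
    total_years=$1
    period_years=$2
    echo $(( (total_years + period_years - 1) / period_years ))
}

# Usage, from the control directory:
#   payu run -n "$(runs_needed 50 5)"    # same as: payu run -n 10
```

Rounding up means a total that is not a multiple of restart_period is slightly overshot: a 12-year target with a 5-year restart_period gives 3 runs, i.e. 15 years.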

Monitor ACCESS-OM2 runs

The payu run command prints the PBS job-ID (formatted as <9-digit-number>.gadi-pbs) as the last line of its output.
To print out information on the status of a specific job, you can execute the following command:

```
qstat <job-ID>
```

```
qstat <job-ID>
Job id                 Name             User             Time Use S Queue
---------------------  ---------------- ---------------- -------- - -----
<job-ID>.gadi-pbs      1deg_jra55_ryf   <$USER>          <time>   R normal-exec
```

To show the status of all your submitted PBS jobs, you can execute the following command:

```
qstat -u $USER
```

```
qstat -u $USER
Job id                 Name             User             Time Use S Queue
---------------------  ---------------- ---------------- -------- - -----
<job-ID>.gadi-pbs      1deg_jra55_ryf   <$USER>          <time>   R normal-exec
<job-ID>.gadi-pbs      <other-job-name> <$USER>          <time>   R normal-exec
<job-ID>.gadi-pbs      <other-job-name> <$USER>          <time>   R normal-exec
```

The default name of your job is the name of the payu control directory (1deg_jra55_ryf in the example above).
This can be changed by altering the jobname in the PBS resources section of the config.yaml file.

The S column indicates the status of your job, where:

Q → Job waiting in the queue to start
R → Job running
E → Job ending
H → Job on hold

If there are no jobs listed with your jobname (or if no job is listed), your run either successfully completed or was terminated due to an error.
For more information, check the NCI documentation.
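If you check on runs from a script, you can extract the S column from the qstat output. The sketch below assumes the default qstat layout shown above, in which S is the fifth whitespace-separated field; other qstat options print wider layouts where this does not hold. job_state and describe_state are hypothetical helpers.

```shell
# Hypothetical helpers (sketch). Assumes the default `qstat` layout shown
# above: Job id, Name, User, Time Use, S, Queue, with S as the 5th field.

# Reads qstat output on stdin; prints the S value for the given job ID.
job_state() {
    awk -v id="$1" '$1 == id { print $5 }'
}

# Translates an S value into the descriptions listed above.
describe_state() {
    case "$1" in
        Q) echo "waiting in the queue to start" ;;
        R) echo "running" ;;
        E) echo "ending" ;;
        H) echo "on hold" ;;
        "") echo "not listed: finished or terminated" ;;
        *) echo "other state: $1" ;;
    esac
}

# Usage (jobid is the ID printed by `payu run`):
#   describe_state "$(qstat "$jobid" | job_state "$jobid")"
```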

Stop a run

If you want to manually terminate a run, you can do so by executing:

```
qdel <job-ID>
```

which kills the specified job without waiting for it to complete.

Tip

If you started an ACCESS-OM2 run using the -n option (e.g., to run the model for more than 5 years), but subsequently decide not to keep running after the current job completes, you can create a file called stop_run in the control directory.
This will prevent payu from submitting another job.
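Creating the stop_run file is a one-line operation; the sketch below wraps it in a hypothetical request_stop helper that takes the control directory as an argument (defaulting to the current directory).

```shell
# Hypothetical helper (sketch): creates stop_run in a payu control directory
# so that payu does not submit another job after the current one completes.
request_stop() {
    control_dir=${1:-.}
    touch "$control_dir/stop_run"
}

# Usage:
#   request_stop ~/access-om2/1deg_jra55_ryf
```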

Error and output log files

PBS output files

When the model completes a run, PBS writes the standard output and error streams to two files inside the control directory: <jobname>.o<job-ID> and <jobname>.e<job-ID>, respectively.

These files usually contain logs about payu tasks, and give an overview of the resources used by the job.
To move these files to the archive directory, use the following command:

```
payu sweep
```

Model log files

While the model is running, payu saves the model standard output and error streams in the access-om2.out and access-om2.err files inside the control directory, respectively.
You can examine the contents of these files to check on the status of a run as it progresses (or after a failed run has completed).

Warning

At the end of a successful run these log files are archived to the archive directory and will no longer be found in the control directory. If they remain in the control directory after the PBS job for a run has completed it means the run has failed.
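The rule in the warning above can be scripted. The sketch below, a hypothetical run_failed helper, succeeds when access-om2.out or access-om2.err is still present in the control directory. It is only meaningful once the PBS job no longer appears in qstat, because the files are expected to be there while the model is running.

```shell
# Hypothetical helper (sketch): after the PBS job has finished, succeeds if
# the model log files are still in the control directory, which indicates
# that the run failed (a successful run archives them).
run_failed() {
    control_dir=${1:-.}
    [ -e "$control_dir/access-om2.out" ] || [ -e "$control_dir/access-om2.err" ]
}

# Usage (once the job no longer appears in qstat):
#   cd ~/access-om2/1deg_jra55_ryf
#   run_failed && tail -n 20 access-om2.err
```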

Model Live Diagnostics

ACCESS-NRI developed the Model Live Diagnostics framework to check, monitor, visualise, and evaluate model behaviour and progress of ACCESS models currently running on Gadi.
For complete documentation on how to use this framework, check the Model Diagnostics documentation.

Troubleshooting

If payu doesn't run correctly for some reason, a good first step is to run the following command from within the control directory:

```
payu setup
```

This command will:

create the laboratory and work directories based on the experiment configuration
generate manifests
report useful information to the user, such as the location of the laboratory where the work and archive directories are located

```
payu setup
laboratory path: /scratch/$PROJECT/$USER/access-om2
binary path: /scratch/$PROJECT/$USER/access-om2/bin
input path: /scratch/$PROJECT/$USER/access-om2/input
work path: /scratch/$PROJECT/$USER/access-om2/work
archive path: /scratch/$PROJECT/$USER/access-om2/archive
Loading input manifest: manifests/input.yaml
Loading restart manifest: manifests/restart.yaml
Loading exe manifest: manifests/exe.yaml
Setting up atmosphere
Setting up ocean
Setting up ice
Setting up access-om2
Checking exe and input manifests
Updating full hashes for 3 files in manifests/exe.yaml
Creating restart manifest
Writing manifests/restart.yaml
Writing manifests/exe.yaml
```

This can help to isolate issues such as permissions problems accessing files and directories, missing files or malformed/incorrect paths.

ACCESS-OM2 outputs

At the end of a successful model run, output files, restart files and log files are moved from the work directory to the archive directory.
Symbolic links to these directories are also provided in the control directory for convenience.

If a model run is unsuccessful, the work directory is left untouched to facilitate the identification of the cause of the model failure.

Outputs and restarts are stored in subfolders within the archive directory, subdivided for each run of the model.
Output and restart folders are called outputXXX and restartXXX, respectively, where XXX is the run number starting from 000.

Model components are separated into subdirectories within the output and restart directories.

```
cd ~/access-om2/1deg_jra55_ryf/archive
ls
output000  pbs_logs  restart000
```
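To find the most recent run's output without listing everything, the sketch below picks the highest-numbered outputXXX directory. It assumes the three-digit, zero-padded numbering described above, so that lexical and numerical order agree; latest_output is a hypothetical helper.

```shell
# Hypothetical helper (sketch): prints the highest-numbered outputXXX
# directory under the given directory (default: current directory).
# Assumes three-digit zero-padded run numbers, so lexical order = run order.
latest_output() {
    ls -d "${1:-.}"/output[0-9][0-9][0-9] 2>/dev/null | sort | tail -n 1
}

# Usage:
#   latest_output ~/access-om2/1deg_jra55_ryf/archive
```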

Edit ACCESS-OM2 configuration

This section describes how to modify ACCESS-OM2 configuration.
The modifications discussed in this section can change the way ACCESS-OM2 is run by payu, or how its specific model components are configured and coupled together.

The config.yaml file located in the control directory is the master configuration file, which controls the general model configuration. It contains several sections: some are likely to need modification, while others are rarely changed without a deep understanding of how the model is configured.

To find out more about configuration settings for the config.yaml file, refer to how to configure your experiment with payu.

Change run length

One of the most common changes is to adjust the duration of the model run.
For example, when debugging changes to a model, it is common to reduce the run length to minimise resource consumption and return faster feedback on changes.

The run length is controlled by the restart_period field in the &date_manager_nml section of the ~/access-om2/1deg_jra55_ryf/accessom2.nml file:

```
&date_manager_nml
    forcing_start_date = '1958-01-01T00:00:00'
    forcing_end_date = '2019-01-01T00:00:00'
    ! Runtime for a single segment/job/submit, format is years, months, seconds,
    ! two of which must be zero.
    restart_period = 5, 0, 0
&end
```

The format is restart_period = <number_of_years>, <number_of_months>, <number_of_seconds>, and, as the comment in the namelist states, two of the three values must be zero.

For example, to make the model run for 6 months, change restart_period to:

```
restart_period = 0, 6, 0
```
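The namelist comment in accessom2.nml gives the three values as years, months and seconds, two of which must be zero, so a duration in days has to be expressed in seconds (1 day = 86400 s). The sketch below converts days to a restart_period value and checks a triple against that rule and the 5-year limit in the warning that follows. Both helpers are hypothetical, and the check is a simplification: it does not cap the seconds value.

```shell
# Hypothetical helpers (sketch). Follows the accessom2.nml comment:
# restart_period is "years, months, seconds", two of which must be zero.

# Prints a restart_period value for a whole number of days (1 day = 86400 s).
days_to_restart_period() {
    echo "0, 0, $(( $1 * 86400 ))"
}

# Succeeds if at most one of years/months/seconds is non-zero and the
# years/months value does not exceed 5 years. Seconds are not capped here.
valid_restart_period() {
    years=$1
    months=$2
    seconds=$3
    nonzero=0
    for v in "$years" "$months" "$seconds"; do
        if [ "$v" -ne 0 ]; then
            nonzero=$((nonzero + 1))
        fi
    done
    [ "$nonzero" -le 1 ] && [ "$years" -le 5 ] && [ "$months" -le 60 ]
}

# Usage:
#   days_to_restart_period 10      # prints: 0, 0, 864000
#   valid_restart_period 1 4 10 || echo "invalid: two values must be zero"
```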

Warning

While restart_period can be reduced, it should not be increased to more than 5 years to avoid errors.

It is also important to differentiate between run length and total experiment length.
For more information about their difference, or how to run the model for more than 5 years, refer to the section Run configuration for more than 5 years.

Modify PBS resources

If the model has been altered and needs more time to complete, more memory, or needs to be submitted under a different NCI project, you will need to modify the following section in the config.yaml:

```
# If submitting to a different project to your default, uncomment line below
# and replace PROJECT_CODE with appropriate code. This may require setting shortpath
# project: PROJECT_CODE

# Force payu to always find, and save, files in this scratch project directory
# shortpath: /scratch/PROJECT_CODE

queue: normal
walltime: 3:00:00
jobname: 1deg_jra55_ryf
mem: 1000GB
```

These lines can be edited to change the PBS directives for the PBS job.

For example, to run ACCESS-OM2 under the ol01 project (COSIMA Working Group), uncomment the line beginning with # project by deleting the # symbol, and replace PROJECT_CODE with ol01:

```
project: ol01
```

Warning

If projects other than ol01 are used to run the ACCESS-OM2 configuration, the shortpath field also needs to be uncommented and set to the desired /scratch/PROJECT_CODE path.
Doing this will make sure the same /scratch location is used for the laboratory, regardless of which project is used to run the experiment.

To run ACCESS-OM2, you need to be a member of a project with allocated Service Units (SU). For more information, check how to join relevant NCI projects.

Syncing output data

The laboratory directory is typically under the /scratch storage on Gadi, where files are regularly deleted once they have not been accessed for a period of time. For this reason, climate model outputs need to be moved to a location with longer-term storage.
On Gadi, this is typically in a folder under a project code on /g/data.

Payu has built-in support to sync outputs, restarts and a copy of the control directory git history to another location.
This feature is controlled by the following section in the config.yaml file:

```
# Sync options for automatically copying data from ephemeral scratch space to
# longer term storage
sync:
    enable: False # set path below and change to true
    path: none # Set to location on /g/data or a remote server and path (rsync syntax)
    exclude:
      - '*.nc.*'
      - 'iceh.????-??-??.nc'
```

To enable syncing, change enable to True, and set path to a location on /g/data, where payu will copy output and restart folders. A sensible path could be: /g/data/$PROJECT/$USER/experiment_name/.

Admonition

The ACCESS-OM2 default configurations include a userscript in the sync step that concatenates daily history/diagnostic output from the Sea-Ice model (CICE5) into monthly files. This speeds up access and saves storage space, but will only run if sync is enabled.

Saving model restarts

ACCESS-OM2 outputs restart files after every run to allow for subsequent runs to start from a previously saved model state.
Restart files can occupy a significant amount of disk space, and keeping a lot of them is often not necessary.

The restart_freq field in the config.yaml file specifies a strategy for retaining restart files.
This can either be a number (in which case every nth restart file is retained), or one of the following pandas-style datetime frequencies:

YS → start of the year
MS → start of the month
D → day
H → hour
T → minute
S → second

For example, to preserve the ability to restart ACCESS-OM2 every 50 model-years, set:

```
restart_freq: '50YS'
```

The most recent sequential restarts are retained, and only deleted after a permanently archived restart file has been produced.

For more information, check payu Configuration Settings documentation.
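To get a feel for how an integer restart_freq thins out restarts, the sketch below lists which restartXXX directories would be kept from a given number of runs. It assumes that "every nth restart" means run numbers that are multiples of n, counting from restart000, and it does not model the retention of the most recent restarts described above, so treat it as an illustration rather than payu's exact pruning logic. kept_restarts is a hypothetical helper.

```shell
# Hypothetical helper (sketch): lists restartXXX names kept for an integer
# restart_freq of n, ASSUMING kept run numbers are multiples of n.
# Does not model payu keeping the most recent restarts.
kept_restarts() {
    total_runs=$1
    n=$2
    i=0
    while [ "$i" -lt "$total_runs" ]; do
        if [ $(( i % n )) -eq 0 ]; then
            printf 'restart%03d\n' "$i"
        fi
        i=$(( i + 1 ))
    done
}

# Usage:
#   kept_restarts 25 10    # restart000, restart010, restart020
```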

Other configuration options

Warning

The following sections in the config.yaml file control configuration options that are rarely modified, and often require a deeper understanding of how ACCESS-OM2 is structured to be safely changed.

Model configuration

This section tells payu which driver to use for the main model configuration (access-om2) and the location of all inputs that are common to all its model components.

```
name: common
model: access-om2
input: /g/data/ik11/inputs/access-om2/input_20201102/common_1deg_jra55
```

The name field is not actually used for the configuration run, so it can be safely ignored.

Submodels

ACCESS-OM2 is a coupled model deploying multiple submodels (i.e. model components).

This section specifies the submodels and configuration options required to execute ACCESS-OM2 correctly.

Each submodel contains additional configuration options that are read in when the submodel is running. These options are specified in the subfolder of the control directory whose name matches the submodel's name (e.g., configuration options for the ocean submodel are in the ~/access-om2/1deg_jra55_ryf/ocean directory).


```
submodels:
    - name: atmosphere
      model: yatm
      exe: /g/data/vk83/apps/spack/0.20/release/linux-rocky8-x86_64/intel-19.0.5.281/libaccessom2-git.2023.10.26=2023.10.26-ieiy3e7hidn4dzaqly3ly2yu45mecgq4/bin/yatm.exe
      input:
        - /g/data/vk83/experiments/inputs/access-om2/remapping_weights/JRA55/global.1deg/2020.05.30/rmp_jrar_to_cict_CONSERV.nc
        - /g/data/vk83/experiments/inputs/JRA-55/RYF/v1-4/data
      ncpus: 1

    - name: ocean
      model: mom
      exe: /g/data/vk83/apps/spack/0.20/release/linux-rocky8-x86_64/intel-19.0.5.281/mom5-git.2023.11.09=2023.11.09-ewcdbrfukblyjxpkhd3mfkj4yxfolal4/bin/fms_ACCESS-OM.x
      input:
        - /g/data/vk83/experiments/inputs/access-om2/ocean/grids/mosaic/global.1deg/2020.05.30/grid_spec.nc
        - /g/data/vk83/experiments/inputs/access-om2/ocean/grids/mosaic/global.1deg/2020.05.30/ocean_hgrid.nc
        - /g/data/vk83/experiments/inputs/access-om2/ocean/grids/mosaic/global.1deg/2020.05.30/ocean_mosaic.nc
        - /g/data/vk83/experiments/inputs/access-om2/ocean/grids/bathymetry/global.1deg/2020.10.22/topog.nc
        - /g/data/vk83/experiments/inputs/access-om2/ocean/grids/bathymetry/global.1deg/2020.10.22/ocean_mask.nc
        - /g/data/vk83/experiments/inputs/access-om2/ocean/grids/vertical/global.1deg/2020.10.22/ocean_vgrid.nc
        - /g/data/vk83/experiments/inputs/access-om2/ocean/processor_masks/global.1deg/216.16x15/2020.05.30/ocean_mask_table
        - /g/data/vk83/experiments/inputs/access-om2/ocean/chlorophyll/global.1deg/2020.05.30/chl.nc
        - /g/data/vk83/experiments/inputs/access-om2/ocean/initial_conditions/global.1deg/2020.10.22/ocean_temp_salt.res.nc
        - /g/data/vk83/experiments/inputs/access-om2/ocean/tides/global.1deg/2020.05.30/tideamp.nc
        - /g/data/vk83/experiments/inputs/access-om2/ocean/tides/global.1deg/2020.05.30/roughness_amp.nc
        - /g/data/vk83/experiments/inputs/access-om2/ocean/tides/global.1deg/2020.05.30/roughness_cdbot.nc
        - /g/data/vk83/experiments/inputs/access-om2/ocean/surface_salt_restoring/global.1deg/2020.05.30/salt_sfc_restore.nc
      ncpus: 216

    - name: ice
      model: cice5
      exe: /g/data/vk83/apps/spack/0.20/release/linux-rocky8-x86_64/intel-19.0.5.281/cice5-git.2023.10.19=2023.10.19-rh3xfkrgajya3ghtliacuhlx3pgvrzqs/bin/cice_auscom_360x300_24x1_24p.exe
      input:
        - /g/data/vk83/experiments/inputs/access-om2/ice/grids/global.1deg/2020.05.30/grid.nc
        - /g/data/vk83/experiments/inputs/access-om2/ice/grids/global.1deg/2020.10.22/kmt.nc
        - /g/data/vk83/experiments/inputs/access-om2/ice/initial_conditions/global.1deg/2020.05.30/i2o.nc
        - /g/data/vk83/experiments/inputs/access-om2/ice/initial_conditions/global.1deg/2020.05.30/o2i.nc
        - /g/data/vk83/experiments/inputs/access-om2/ice/initial_conditions/global.1deg/2020.05.30/u_star.nc
        - /g/data/vk83/experiments/inputs/access-om2/ice/initial_conditions/global.1deg/2020.05.30/monthly_sstsss.nc
      ncpus: 24
```

Collate

Rather than outputting a single diagnostic file over the whole model horizontal grid, MOM typically generates diagnostic outputs as tiles, each of which spans over a portion of the model horizontal grid.

The collate section in the config.yaml file controls the process that combines these smaller files into a single output file.

```
collate:
    restart: true
    walltime: 1:00:00
    mem: 30GB
    ncpus: 4
    queue: normal
    exe: /g/data/ik11/inputs/access-om2/bin/mppnccombine
```

Restart files are typically tiled in the same way and will also be combined together if the restart field is set to true.

Runlog

```
runlog: true
```

When running a new configuration, payu automatically commits changes with git if runlog is set to true.

Warning

This should not be changed as it is an essential part of the provenance of an experiment.
payu updates the manifest files for every run, and relies on runlog to save this information in the git history, so there is a record of all inputs, restarts, and executables used in an experiment.

Platform

```
platform:
    nodesize: 48
```

This section sets platform-specific default parameters.
In the example above, the default number of CPUs per node is set to 48.

Warning

This might need changing if the configuration is run on hardware with different node structure.

Userscripts

```
userscripts:
    error: tools/resub.sh
    run: rm -f resubmit.count
    sync: /g/data/vk83/apps/om2-scripts/concatenate_ice/concat_ice_daily.sh
```

This section is a dictionary of scripts or subcommands to run at various stages of a payu submission:

error gets called if the model does not run correctly and returns an error code.
run gets called after the model execution, but prior to model output archive.
sync gets called at the start of the sync PBS job. For more information, refer to Syncing output data.

For more information about specific userscripts fields, check the relevant section of payu Configuration Settings documentation.

Miscellaneous

The following configuration settings should never require changing:

```
stacksize: unlimited
mpirun: --mca io ompio --mca io_ompio_num_aggregators 1
qsub_flags: -W umask=027
env:
    UCX_LOG_LEVEL: 'error'
```

Edit a single ACCESS-OM2 component configuration

Each ACCESS-OM2 component has additional configuration options that are read in when the model component is running.
These options are typically useful to modify the physics used in the model, the input data, or the model variables saved in the output files.

These configuration options are specified in files located inside a subfolder of the control directory, named according to the submodel's name specified in the config.yaml submodels section (e.g., configuration options for the ocean component are in the ~/access-om2/1deg_jra55_ryf/ocean directory).
To modify these options please refer to the User Guide of the respective model component.

Get Help

If you have questions or need help regarding ACCESS-OM2, consider creating a topic in the COSIMA category of the ACCESS Hive Forum.
For assistance on how to request help from ACCESS-NRI, follow the guidelines on how to get help.


Last update: July 24, 2024