Danger
ACCESS-OM3 is currently only in alpha release. No support is currently provided for this model, and the model configuration and model code will change before the full release. Using ACCESS-OM3 is only recommended for experienced (or brave!) users and collaborators developing ACCESS models. For a supported experience, see Run ACCESS-OM2.
Run ACCESS-OM3
About
ACCESS-OM3 is an Ocean Sea-Ice model. More information is available in the ACCESS-OM3 overview.
The instructions below outline how to run ACCESS-OM3 using ACCESS-NRI's software deployment pipeline, specifically designed to run on the National Computational Infrastructure (NCI) supercomputer Gadi.
If you are unsure whether ACCESS-OM3 is the right choice for your experiment, take a look at the overview of ACCESS Models.
All ACCESS-OM3 configurations are open source, licensed under CC BY 4.0 and available on ACCESS-NRI GitHub.
Prerequisites
- NCI Account
  Before running ACCESS-OM3, you need to Set Up your NCI Account.
- Join NCI projects
  Join the following projects by requesting membership on their respective NCI project pages:
  For more information on joining specific NCI projects, refer to How to connect to a project.
- Payu
  Payu is a workflow management tool for running numerical models in supercomputing environments, for which there is extensive documentation.
  Payu on Gadi is available through a dedicated conda environment in the vk83 project.
  After joining the vk83 project, load the payu module:
  module use /g/data/vk83/modules
  module load payu
  To check that payu is available, run:
  payu --version
  1.1.3
  Warning
  payu version >=1.1.6 is required
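Because a minimum payu version is required, the check can be scripted. This is a minimal sketch: version_ge and check_payu_version are hypothetical helper names (not payu commands), the comparison relies on GNU sort's -V option, and it assumes that payu --version prints only the version number.

```shell
# Hypothetical helper (not part of payu): succeed if version $1 >= version $2.
# Uses GNU sort -V, which orders version strings component by component.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: compare the loaded payu against the required version.
# ASSUMES `payu --version` prints just the version string (e.g. 1.1.3).
check_payu_version() {
    local required="1.1.6" found
    found=$(payu --version 2>/dev/null) || { echo "payu not found; load the payu module first" >&2; return 1; }
    if version_ge "$found" "$required"; then
        echo "payu $found is OK"
    else
        echo "payu $found is older than the required $required" >&2
        return 1
    fi
}
```

Using sort -V rather than a plain string comparison means that, for example, 1.1.10 is correctly treated as newer than 1.1.6.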
Get ACCESS-OM3 configuration
Danger
ACCESS-OM3 is currently only in alpha release. No support is currently provided for this model or its configurations, and the model code and configurations will change before the full release. When browsing the configuration repository on GitHub, any branch with the prefix dev- is still in development; eventually, the prefix release- will indicate supported configurations (see example).
All released ACCESS-OM3 configurations are available from the ACCESS-OM3 configs GitHub repository.
Released configurations are tested and supported by ACCESS-NRI.
For more information on ACCESS-OM3 configurations, check the ACCESS-OM3 page.
More information about the available experiments and the naming scheme of the branches can also be found in the ACCESS-OM3 configs GitHub repository.
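The dev-/release- naming convention can be applied mechanically when scanning the available branches. A minimal sketch: branch_status and list_branches are hypothetical helpers, and list_branches assumes network access to GitHub from where it is run.

```shell
# Hypothetical helper: classify an access-om3-configs branch by its prefix,
# following the dev- / release- convention described above.
branch_status() {
    case "$1" in
        release-*) echo "released" ;;
        dev-*)     echo "in development" ;;
        *)         echo "other" ;;
    esac
}

# Hypothetical helper: list remote branches with their status.
# `git ls-remote --heads` prints "<sha><TAB>refs/heads/<branch>" per branch.
list_branches() {
    git ls-remote --heads https://github.com/ACCESS-NRI/access-om3-configs |
        sed 's|.*refs/heads/||' |
        while read -r branch; do
            printf '%s\t%s\n' "$branch" "$(branch_status "$branch")"
        done
}
```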
The first step is to choose a configuration from those available.
For example, if the required configuration is the 1° horizontal resolution with repeat-year JRA55 forcing (without BGC), then the branch to select is dev-1deg_jra55do_ryf.
To clone this branch to a location on Gadi and navigate to that directory, run:
mkdir -p ~/access-om3
cd ~/access-om3
payu clone -b expt -B dev-1deg_jra55do_ryf https://github.com/ACCESS-NRI/access-om3-configs 1deg_jra55_ryf
cd 1deg_jra55_ryf
In the example above, the payu clone command clones the 1° repeat-year JRA55 configuration (-B dev-1deg_jra55do_ryf) as a new experiment branch (-b expt) to a directory named 1deg_jra55_ryf.
Admonition
Anyone using a configuration is advised to clone only a single branch (as shown in the example above) and not the entire repository.
Tip
payu uses branches to differentiate between different experiments in the same local git repository.
For this reason, it is recommended to always set the cloned branch name (expt in the example above) to something meaningful for the planned experiment.
For more information refer to this payu tutorial.
Run ACCESS-OM3 configuration
If you want to modify your configuration, refer to Edit ACCESS-OM3 configuration.
ACCESS-OM3 configurations run on Gadi through a PBS job submission managed by payu.
The general layout of a payu-supported model run consists of two main directories:
- The control directory contains the model configuration and serves as the execution directory for running the model (in this example, the cloned directory ~/access-om3/1deg_jra55_ryf).
- The laboratory directory is where all the model components reside. For ACCESS-OM3, it is typically /scratch/$PROJECT/$USER/access-om3.
This separates the small text configuration files from the larger binary inputs and outputs. In this way, the control directory can be in the $HOME directory, which is the only filesystem actively backed up on Gadi. The quotas for $HOME are low and strict, which limits what can be stored there, so it is not suitable for larger files.
The laboratory directory is a shared space for all payu experiments using the same model.
Inside the laboratory directory there are two subdirectories:
- work → a directory where payu automatically creates a temporary subdirectory while the model is run. The temporary subdirectory gets created as part of a run and then removed after the run succeeds.
- archive → the directory where the output is stored following each successful run.
Within each of the above directories payu automatically creates subdirectories uniquely named according to the experiment being run.
Payu also creates symbolic links in the control directory pointing to the archive and work directories.
This design allows multiple self-resubmitting experiments that share common executables and input data to be run simultaneously.
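To see where a given experiment's work and archive directories actually live in the laboratory, the symbolic links in the control directory can be inspected. A minimal sketch: show_experiment_dirs is a hypothetical helper, and it only reports links that payu has already created.

```shell
# Hypothetical helper: print where the archive and work symlinks in a payu
# control directory point to (defaults to the current directory).
show_experiment_dirs() {
    local control_dir="${1:-.}" link
    for link in archive work; do
        if [ -L "$control_dir/$link" ]; then
            printf '%s -> %s\n' "$link" "$(readlink "$control_dir/$link")"
        else
            printf '%s: no symlink (yet)\n' "$link"
        fi
    done
}
```

For example, running show_experiment_dirs ~/access-om3/1deg_jra55_ryf after a first run should show the archive link resolving to a subdirectory of the laboratory.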
Admonition
Files on /scratch, such as those in the laboratory directory, might get deleted if they are not accessed for an extended period, and /scratch space is limited. For these reasons, all model runs that are to be kept should be moved to /g/data/ by enabling the sync step in payu. For more information, refer to Syncing output data.
Run configuration
To run the ACCESS-OM3 configuration, execute the following command from within the control directory:
payu run
This will submit a single job to the queue with the default run length specified in the configuration.
For information about changing the run length, refer to Change run length.
Tip
You can add the -f option to payu run to let the model run even if there is an existing non-empty work directory, created from a previous failed run or from running payu setup.
Monitor ACCESS-OM3 runs
The payu run command prints out the PBS job-ID (formatted as <9-digit-number>.gadi-pbs) as the last line to the terminal.
To print out information on the status of a specific job, you can execute the following command:
qstat <job-ID>
To show the status of all your submitted PBS jobs, you can execute the following command:
qstat -u $USER
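Since the job-ID is the last line printed by payu run, it can be captured and passed straight to qstat. A minimal sketch: extract_job_id is a hypothetical helper, and it assumes the job-ID is the last line written to standard output in the <digits>.gadi-pbs format described above.

```shell
# Hypothetical helper: read payu run's output on stdin and print the PBS
# job-ID, assuming it is the last line and looks like <digits>.gadi-pbs.
# Returns non-zero if the last line does not look like a job-ID.
extract_job_id() {
    tail -n 1 | grep -E '^[0-9]+\.gadi-pbs$'
}

# Example usage on Gadi, from the control directory:
#   job_id=$(payu run | extract_job_id) && qstat "$job_id"
```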
The default name of your job is the name of the payu control directory (1deg_jra55_ryf in the example above).
This can be changed by altering the jobname in the PBS resources section of the config.yaml file.
In the qstat output, the S column indicates the status of your run, where:
- Q → Job waiting in the queue to start
- R → Job running
- E → Job ending
- H → Job on hold
If there are no jobs listed with your jobname (or if no job is listed), your run either successfully completed or was terminated due to an error.
For more information, check NCI documentation.
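The status codes above can be translated into readable text when checking several jobs at once. A minimal sketch: pbs_status_description and summarise_jobs are hypothetical helpers, and summarise_jobs assumes the default PBS Pro qstat -u layout, in which job rows start with the numeric job-ID and S is the 10th whitespace-separated column.

```shell
# Hypothetical helper: describe a single PBS status letter.
pbs_status_description() {
    case "$1" in
        Q) echo "waiting in the queue to start" ;;
        R) echo "running" ;;
        E) echo "ending" ;;
        H) echo "on hold" ;;
        *) echo "unknown status: $1" ;;
    esac
}

# Hypothetical helper: summarise `qstat -u $USER` output read on stdin.
# ASSUMES job rows begin with "<digits>." and S is column 10; header rows
# are skipped because they do not start with a numeric job-ID.
summarise_jobs() {
    awk '$1 ~ /^[0-9]+\./ { print $1, $10 }' |
        while read -r id state; do
            printf '%s: %s\n' "$id" "$(pbs_status_description "$state")"
        done
}

# Example usage on Gadi:
#   qstat -u "$USER" | summarise_jobs
```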
Tip
While the model is running, you can monitor its progress by running:
grep cur_exp-datetime work/atmosphere/log/matmxx.pe00000.log
Stop a run
If you want to manually terminate a run, you can do so by executing:
qdel <job-ID>
Tip
If you started an ACCESS-OM3 run using the -n option (e.g., to run the model for more than 5 years), but subsequently decide not to keep running after the current job completes, you can create a file called stop_run in the control directory.
This will prevent payu from submitting another job.
Error and output log files
PBS output files
When the model completes a run, PBS writes the standard output and error streams to two files inside the control directory: <jobname>.o<job-ID> and <jobname>.e<job-ID>, respectively.
These files usually contain logs about payu tasks, and give an overview of the resources used by the job.
To move these files to the archive directory, use the following command:
payu sweep
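When looking for a particular job's PBS logs in the control directory, the file names can be derived from the job name and job-ID. A minimal sketch: pbs_log_files is a hypothetical helper, and it assumes PBS uses only the numeric part of the job-ID (the text before .gadi-pbs) in the file names.

```shell
# Hypothetical helper: print the PBS stdout and stderr file names for a job.
# ASSUMES the files are named <jobname>.o<N> and <jobname>.e<N>, where N is
# the numeric part of the job-ID (everything before the first ".").
pbs_log_files() {
    local jobname="$1" job_id="$2"
    local seq="${job_id%%.*}"
    printf '%s.o%s\n%s.e%s\n' "$jobname" "$seq" "$jobname" "$seq"
}
```

For example, pbs_log_files 1deg_jra55_ryf 123456789.gadi-pbs prints 1deg_jra55_ryf.o123456789 and 1deg_jra55_ryf.e123456789.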
Model log files
While the model is running, payu saves the model standard output and error streams in the access-om3.out and access-om3.err files inside the control directory, respectively.
You can examine the contents of these files to check on the status of a run as it progresses (or after a failed run has completed).
Warning
At the end of a successful run, these log files are archived to the archive directory and will no longer be found in the control directory. If they remain in the control directory after the PBS job for a run has completed, the run has failed.
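The rule in the warning above can be turned into a quick check once the PBS job has finished. A minimal sketch: run_appears_failed is a hypothetical helper, and it is only meaningful after the job is no longer listed by qstat, because the log files are also present while the model is running.

```shell
# Hypothetical helper: succeed (exit 0) if access-om3.out or access-om3.err
# is still in the control directory, which after the PBS job has completed
# indicates a failed run. Do not use while the job is still running.
run_appears_failed() {
    local control_dir="${1:-.}"
    [ -e "$control_dir/access-om3.out" ] || [ -e "$control_dir/access-om3.err" ]
}

# Example usage:
#   if run_appears_failed ~/access-om3/1deg_jra55_ryf; then
#       tail -n 20 ~/access-om3/1deg_jra55_ryf/access-om3.err
#   fi
```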
Troubleshooting
If payu doesn't run correctly for some reason, a good first step is to run the following command from within the control directory:
payu setup
This command will:
- create the laboratory and work directories based on the experiment configuration
- generate manifests
- report useful information to the user, such as the location of the laboratory where the work and archive directories are located
This can help to isolate issues such as permissions problems accessing files and directories, missing files or malformed/incorrect paths.
ACCESS-OM3 outputs
At the end of a successful model run, output files, restart files and log files are moved from the work directory to the archive directory.
Symbolic links to these directories are also provided in the control directory for convenience.
If a model run is unsuccessful, the work directory is left untouched to facilitate the identification of the cause of the model failure.
Outputs and restarts are stored in subfolders within the archive directory, subdivided for each run of the model.
Output and restart folders are called outputXXX and restartXXX, respectively, where XXX is the run number starting from 000.
Model components are separated into subdirectories within the output and restart directories.
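The outputXXX/restartXXX naming scheme makes it easy to locate a given run's folder or the most recent output. A minimal sketch: run_folder and latest_output are hypothetical helpers; run numbers should be passed as plain integers (a leading zero such as 08 would be read as octal by printf), and sort -V is used so that output1000 sorts after output999.

```shell
# Hypothetical helper: print the folder name for a given run number,
# e.g. `run_folder restart 7` prints restart007.
run_folder() {
    printf '%s%03d\n' "$1" "$2"
}

# Hypothetical helper: print the most recent outputXXX folder in an archive
# directory (defaults to the archive symlink in the control directory).
latest_output() {
    local archive_dir="${1:-archive}" d
    for d in "$archive_dir"/output[0-9]*; do
        [ -d "$d" ] && printf '%s\n' "$d"
    done | sort -V | tail -n 1
}
```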
Edit ACCESS-OM3 configuration
This section describes how to modify the ACCESS-OM3 configuration.
The modifications discussed in this section can change the way ACCESS-OM3 is run by payu, or how its specific model components are configured and coupled together.
The config.yaml file located in the control directory is the Master Configuration file, which controls the general model configuration. It contains several sections: some are likely to need modification, while others are rarely changed without a deep understanding of how the model is configured.
To find out more about configuration settings for the config.yaml file, refer to how to configure your experiment with payu.
Change run length
One of the most common changes is to adjust the duration of the model run.
For example, when debugging changes to a model, it is common to reduce the run length to minimise resource consumption and return faster feedback on changes.
The run length and restart period are controlled by a set of parameters in the ~/access-om3/1deg_jra55_ryf/nuopc.runconfig file:
CLOCK_attributes::
...
restart_n = 1
restart_option = nyears
...
stop_n = 1
stop_option = nyears
...
stop_option and stop_n control how long the model will run.
Common options for stop_option are nseconds, nhours, ndays, nmonths and nyears. stop_n provides the numerical count for stop_option.
restart_option and restart_n control how often restarts are written.
In general, users will want to write restarts at the end of each run, so they should set the restart_* controls to match the stop_* controls.
For example, to run a configuration for 2 months, a user should set the following in the ~/access-om3/1deg_jra55_ryf/nuopc.runconfig file:
CLOCK_attributes::
...
restart_n = 2
restart_option = nmonths
...
stop_n = 2
stop_option = nmonths
...
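The same edit can be made from the command line, keeping the restart_* controls in step with the stop_* controls as recommended above. A minimal sketch: set_run_length is a hypothetical helper that uses GNU sed -i, assumes each of the four keys appears only once in nuopc.runconfig, and accepts only the common stop_option values listed above.

```shell
# Hypothetical helper: set stop_n/stop_option and restart_n/restart_option
# to the same values in a nuopc.runconfig file (GNU sed in-place edit).
set_run_length() {
    local n="$1" option="$2" file="${3:-nuopc.runconfig}"
    # Restrict to the common options listed in the text above.
    case "$option" in
        nseconds|nhours|ndays|nmonths|nyears) ;;
        *) echo "unsupported option: $option" >&2; return 1 ;;
    esac
    # Each pattern requires "=" right after the key name, so similarly named
    # keys (e.g. stop_ymd) are left untouched.
    sed -i -E \
        -e "s/^([[:space:]]*stop_n[[:space:]]*=).*/\1 $n/" \
        -e "s/^([[:space:]]*stop_option[[:space:]]*=).*/\1 $option/" \
        -e "s/^([[:space:]]*restart_n[[:space:]]*=).*/\1 $n/" \
        -e "s/^([[:space:]]*restart_option[[:space:]]*=).*/\1 $option/" \
        "$file"
}

# Example usage, equivalent to the 2-month edit shown above:
#   set_run_length 2 nmonths ~/access-om3/1deg_jra55_ryf/nuopc.runconfig
```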
Modify PBS resources
If the model has been altered and needs more time to complete, more memory, or needs to be submitted under a different NCI project, you will need to modify the following section in the config.yaml:
# If submitting to a different project to your default, uncomment line below
# and replace PROJECT_CODE with appropriate code. This may require setting shortpath
# project: PROJECT_CODE
# Force payu to always find, and save, files in this scratch project directory
# shortpath: /scratch/PROJECT_CODE
queue: normal
ncpus: 240
jobfs: 10GB
mem: 960GB
walltime: 02:00:00
These lines can be edited to change the PBS directives for the PBS job.
For example, to run ACCESS-OM3 under the ol01 project (COSIMA Working Group), uncomment the line beginning with # project by deleting the # symbol, and replace PROJECT_CODE with ol01:
project: ol01
Warning
If projects other than ol01 are used to run the ACCESS-OM3 configuration, then the shortpath field also needs to be uncommented and the path to the desired /scratch/PROJECT_CODE added.
Doing this will make sure the same /scratch location is used for the laboratory, regardless of which project is used to run the experiment.
To run ACCESS-OM3, you need to be a member of a project with allocated Service Units (SU). For more information, check how to join relevant NCI projects.
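Before submitting, it can help to estimate how many SUs a job with the resources above could consume. A minimal sketch: estimate_max_su is a hypothetical helper, and it ASSUMES NCI's charging scheme for the Gadi normal queue (2 SU per resource-hour, where the resource is the larger of ncpus and the memory request divided by 4 GB per CPU). Check NCI's documentation for current rates; actual charges are based on the walltime used, so this is an upper bound.

```shell
# Hypothetical helper: upper-bound SU estimate for a Gadi normal-queue job.
# ASSUMED formula: SU = 2 x max(ncpus, mem_GB / 4) x walltime_hours.
# Arguments: ncpus, memory in GB (number only), walltime as HH:MM:SS.
estimate_max_su() {
    local ncpus="$1" mem_gb="$2" walltime="$3"
    awk -v n="$ncpus" -v m="$mem_gb" -v w="$walltime" 'BEGIN {
        split(w, t, ":")
        hours = t[1] + t[2] / 60 + t[3] / 3600
        res = (m / 4 > n) ? m / 4 : n
        printf "%.1f\n", 2 * res * hours
    }'
}
```

With the values from the config.yaml excerpt above (ncpus: 240, mem: 960GB, walltime: 02:00:00), estimate_max_su 240 960 02:00:00 gives 960.0 under these assumptions.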
Syncing output data
The laboratory directory is typically under the /scratch storage on Gadi, where files are regularly deleted once they have been unaccessed for a period of time. For this reason, climate model outputs need to be moved to a location with longer-term storage.
On Gadi, this is typically in a folder under a project code on /g/data.
Payu has built-in support to sync outputs, restarts and a copy of the control directory git history to another location.
This feature is controlled by the following section in the config.yaml file:
# Sync options for automatically copying data from ephemeral scratch space to
# longer term storage
sync:
  enable: False # set path below and change to true
  path: none # Set to location on /g/data or a remote server and path (rsync syntax)
To enable syncing, set enable to True and set path to a location on /g/data, where payu will copy output and restart folders. A sensible path could be: /g/data/$PROJECT/$USER/ACCESS-OM3/experiment_name/.
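The two edits to the sync section can also be scripted. A minimal sketch: enable_sync is a hypothetical helper that uses GNU sed -i, only touches indented enable:/path: keys between the top-level sync: line and the next top-level key, and assumes the path contains no | or & characters (which sed would interpret).

```shell
# Hypothetical helper: set sync.enable to True and sync.path to $1 in a payu
# config.yaml (defaults to ./config.yaml). Trailing comments are preserved.
enable_sync() {
    local path="$1" file="${2:-config.yaml}"
    sed -i -E \
        -e "/^sync:/,/^[^[:space:]#]/ s|^([[:space:]]+enable:)[[:space:]]*[Ff]alse|\1 True|" \
        -e "/^sync:/,/^[^[:space:]#]/ s|^([[:space:]]+path:)[[:space:]]*none|\1 $path|" \
        "$file"
}

# Example usage from the control directory (experiment_name is a placeholder):
#   enable_sync "/g/data/$PROJECT/$USER/ACCESS-OM3/experiment_name/"
```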
Saving model restarts
ACCESS-OM3 outputs restart files after every run to allow for subsequent runs to start from a previously saved model state.
Restart files can occupy a significant amount of disk space, and keeping a lot of them is often not necessary.
The restart_freq field in the config.yaml file specifies a strategy for retaining restart files.
For example, when it is set to a number n, every nth restart file is retained.
The most recent sequential restarts are retained, and only deleted after a permanently archived restart file has been produced.
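To get a feel for how an integer restart_freq thins out restarts over a long experiment, the permanently kept folders can be listed. This is an illustration only: permanent_restarts is a hypothetical helper, it ASSUMES that payu keeps restartXXX permanently when XXX is a multiple of restart_freq, and it ignores the most recent sequential restarts that payu also retains.

```shell
# Illustration only (hypothetical helper): list the restart folders that
# would be kept permanently for an integer restart_freq, ASSUMING restarts
# whose run number is a multiple of restart_freq are the ones retained.
# Arguments: restart_freq, last run number.
permanent_restarts() {
    local freq="$1" last="$2" i=0
    while [ "$i" -le "$last" ]; do
        if [ $((i % freq)) -eq 0 ]; then
            printf 'restart%03d\n' "$i"
        fi
        i=$((i + 1))
    done
}
```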
For more information, check payu Configuration Settings documentation.
Other configuration options
Warning
The following sections in the config.yaml file control configuration options that are rarely modified, and often require a deeper understanding of how ACCESS-OM3 is structured to be safely changed.
Model configuration
This section tells payu which driver to use for the main model configuration (access-om3) and the location of all inputs that are common to all its model components.
model: access-om3
exe: access-om3-MOM6-CICE6
input:
- /g/data/vk83/configurations/inputs/access-om3/share/meshes/global.1deg/2024.01.25/access-om2-1deg-ESMFmesh.nc
- /g/data/vk83/configurations/inputs/access-om3/share/meshes/global.1deg/2024.01.25/access-om2-1deg-nomask-ESMFmesh.nc
- /g/data/vk83/configurations/inputs/access-om3/share/meshes/share/2024.09.16/JRA55do-datm-ESMFmesh.nc
- /g/data/vk83/configurations/inputs/access-om3/share/meshes/share/2024.09.16/JRA55do-drof-ESMFmesh.nc
- /g/data/vk83/configurations/inputs/access-om3/share/grids/global.1deg/2020.10.22/topog.nc
- /g/data/vk83/configurations/inputs/access-om3/mom/grids/mosaic/global.1deg/2020.05.30/ocean_hgrid.nc
- /g/data/vk83/configurations/inputs/access-om3/mom/grids/vertical/global.1deg/2023.07.28/ocean_vgrid.nc
- /g/data/vk83/configurations/inputs/access-om3/mom/initial_conditions/global.1deg/2020.10.22/ocean_temp_salt.res.nc
- /g/data/vk83/configurations/inputs/access-om3/mom/surface_salt_restoring/global.1deg/2020.05.30/salt_sfc_restore.nc
- /g/data/vk83/configurations/inputs/access-om3/cice/grids/global.1deg/2024.05.14/grid.nc
- /g/data/vk83/configurations/inputs/access-om3/cice/grids/global.1deg/2024.05.14/kmt.nc
- /g/data/vk83/configurations/inputs/access-om3/cice/initial_conditions/global.1deg/2023.07.28/iced.1900-01-01-10800.nc
- /g/data/vk83/configurations/inputs/JRA-55/RYF/v1-4/data
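Missing or mistyped input paths are a common cause of failed setups, so it can be useful to check that every entry in the input list exists and is readable from your account. A minimal sketch: list_inputs and check_inputs are hypothetical helpers, and list_inputs assumes every input is a list item starting with /g/data/ (other list items in config.yaml with such paths would also be picked up).

```shell
# Hypothetical helper: print list items in config.yaml that are /g/data paths.
list_inputs() {
    sed -n -E 's|^[[:space:]]*-[[:space:]]+(/g/data/[^[:space:]]+).*|\1|p' "${1:-config.yaml}"
}

# Hypothetical helper: report input paths that do not exist; succeeds only
# if all of them are present.
check_inputs() {
    local p missing=0
    for p in "$@"; do
        if [ ! -e "$p" ]; then
            echo "missing: $p" >&2
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}

# Example usage from the control directory (paths contain no spaces):
#   check_inputs $(list_inputs config.yaml)
```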
Runlog
runlog: true
When running a new configuration, payu automatically commits changes with git if runlog is set to true.
Warning
This should not be changed as it is an essential part of the provenance of an experiment.
payu updates the manifest files for every run, and relies on runlog to save this information in the git history, so there is a record of all inputs, restarts, and executables used in an experiment.
Platform
platform:
  nodesize: 48
In the example above, the default number of cpus per node is set to 48.
Warning
This might need changing if the configuration is run on hardware with different node structure.
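The node size determines how many whole nodes a given ncpus request spans. A minimal sketch: nodes_needed is a hypothetical helper that performs ceiling division using POSIX shell arithmetic.

```shell
# Hypothetical helper: number of whole nodes spanned by an ncpus request,
# i.e. ceil(ncpus / nodesize). nodesize defaults to 48, as in the example.
nodes_needed() {
    local ncpus="$1" nodesize="${2:-48}"
    echo $(( (ncpus + nodesize - 1) / nodesize ))
}
```

With the ncpus: 240 request shown in the PBS resources section, nodes_needed 240 gives 5 nodes of 48 cores; a request that does not divide evenly, such as 250, rounds up to 6.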
Userscripts
userscripts:
  setup: /usr/bin/bash /g/data/vk83/apps/om3-scripts/payu_config/setup.sh
  archive: /usr/bin/bash /g/data/vk83/apps/om3-scripts/payu_config/archive.sh
A dictionary of scripts or subcommands to run at various stages of a payu submission:
- setup gets called after model setup, but prior to model execution.
- archive gets called after model execution, but prior to archiving the model output.
For more information about specific userscripts fields, check the relevant section of payu Configuration Settings documentation.
Create a custom ACCESS-OM3 build
All the executables needed to run ACCESS-OM3 are pre-built into independent configurations using Spack.
To customise ACCESS-OM3's build (for example, to run ACCESS-OM3 with changes in the source code of one of its components), refer to Modify an ACCESS model's source code.
Get Help
If you have questions or need help regarding ACCESS-OM3, consider creating a topic in the COSIMA category of the ACCESS-Hive Forum.
For assistance on how to request help from ACCESS-NRI, follow the guidelines on how to get help.