Design¶
Requirements¶
- Use a common tool for all models supported by the Centre
- Support the whole publication pipeline, from model output to publication
- Add metadata at the earliest point possible
- Ensure CF compliance
Design 1¶
A CLI tool with multiple commands for different stages in the model’s lifetime
- dmpr post: Post-process immediately after a model run
- dmpr stage: Prepare output for publication, adding metadata from the data management plan and checking metadata compliance
- dmpr publish: Final publication steps, adding DOI and moving to final versioned location (admin only?)
Commands¶
post¶
The post command has one required argument, the model run directory, and can take an arbitray list of files to post-process
After running the post command the input files have been converted to CF-Compliant NetCDF files and moved to a run-specific output directory
A list of newly created files is printed to standard output
stage¶
The stage command has one required argument, the job identifier used by post-processing. If a data management plan identifier is not present in the processed files this identifier is also required
The command updates metadata from the DMP and runs a CF compliance check
After running the stage command if the data files pass a CF-Compliance check they are copied to ua8 under their DMP directory
Implementation¶
cli¶
The command-line interface is created using Click. Options are kept simple in order to make output consistent between users.
The model is identified either by specifying the name or by reading the files
in the run directory, using functions in dmpr.model
.
models¶
Each model has its own class, derived from dmpr.base.Model
. The model must
override two functions, read_configs() and post_impl(), and may optionally
override outfile() to customise the processed file’s name.
read_configs() is passed the run directory, and should read the configuration files held there to set up metadata from the run configuration.
post_impl() is passed the names of the input and output files, and should post-process the input files and write the processed data to the output file.
The base class uses these functions in it’s post() function, which generates the output path, processes the file and then adds DMP metadata
Linking with a DMP is optional, as it may not be created at the time of the model run. A DMP may be linked after post-processing using dmpr stage.
dmp¶
The dmpr.dmp.DMP
class holds data management plan related metadata, read from
the online database. It has an addmeta() function to add metadata it reads
from the database to a file, which gets automatically called by the model’s
post() function.