The CoeGSS HPC Job Submission component allows users to submit HPC jobs to the clusters of HPC centers (HPC as a Service, HPCaaS). HPC Job Submission uses the Cloudify orchestration toolchain as a backend, which assumes that the user prepares a Cloudify blueprint for every job that is to be submitted. The connection between Cloudify and the HPC clusters is established by means of the Cloudify HPC plugin.




Use cases


The system provides access to the job submission services only to authenticated users. Users authenticate to the HPC Job Submission service with the same credentials they use to access the portal.


Portal users

Authenticated users should use their general CoeGSS portal credentials to get access to the job submission service. In the CoeGSS portal, the job submission process includes the following steps, which are common for all Cloudify applications:

1. Upload the blueprint that describes the HPC job.
2. Create a deployment of the blueprint with the required inputs.
3. Run the jobs defined by the deployment and track their status.

As soon as the parallel job is finished, the user is notified by the system. Afterwards, the user can remove the deployment and uninstall the blueprint if there are no plans to run the HPC job again.

Note that an ordinary user is not permitted to see the jobs of other users.


Administrator

The administrator of the job submission services has access to the Cloudify Manager Web Interface. This is a standard Web UI that comes with Cloudify Manager out of the box. The Cloudify Manager Web UI provides the administrator with full control over the workflow of all submitted applications. In particular, the administrator can:

- view the blueprints, deployments, and executions of all portal users;
- cancel running executions and start new ones;
- inspect the logs and events produced by the deployments.

More details on the Cloudify Web UI can be found here.

Besides the Web UI, the administrator can also control the job submission process via the Cloudify CLI.
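
For illustration, the sketch below shows how the same lifecycle could be driven from the Cloudify CLI. It is only indicative: the blueprint, deployment, and file names are placeholders, it assumes a CLI profile that already points to the Cloudify Manager used by the portal, and run_jobs is the workflow contributed by the Cloudify HPC plugin.

    # upload the blueprint and create a deployment with the given inputs (names are placeholders)
    cfy blueprints upload -b my_blueprint blueprint.yaml
    cfy deployments create -b my_blueprint -i inputs.yaml my_deployment

    # prepare the HPC working directories, then launch the jobs defined in the blueprint
    cfy executions start -d my_deployment install
    cfy executions start -d my_deployment run_jobs

    # track the status of the executions and clean up afterwards
    cfy executions list -d my_deployment
    cfy uninstall my_deployment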


User interaction



Uploading the blueprint

In order to upload a blueprint, the user should switch to the "Blueprints" tab, which lists all blueprints available to this user, and push the "Create new blueprint" button. This redirects the user to the blueprint upload form.


Figure 1: List of blueprints


In this form, the user should specify the blueprint ID, the blueprint file name, as well as the path to the tar.gz archive which contains the blueprint file and any other files the blueprint uses, such as bootstrap and revert scripts.


Figure 2: Blueprint upload form


The upload process starts when the user pushes the "Upload" button.
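
The archive itself can be prepared with any standard tool. A minimal sketch using tar is shown below; the file names are placeholders and assume a blueprint file blueprint.yaml, an auxiliary inputs-def.yaml, and a scripts/ directory containing the bootstrap and revert scripts.

    # pack the blueprint and its auxiliary files into a tar.gz archive ready for upload
    tar czf my-blueprint.tar.gz blueprint.yaml inputs-def.yaml scripts/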


Deploying and running the blueprint

In order to deploy a blueprint, the user should switch to the "Deployments" tab, which lists the deployments and runs made by this user, and push the "Install" button. This redirects the user to the blueprint deployment form, where she/he should specify the deployment ID, the blueprint ID, and the input file for the deployment.


Figure 3: List of deployments


Afterwards, the deployment can be executed by pressing the "Run Jobs" button. The user can track the status of the running jobs by pushing the "Refresh" button.


Preparing blueprints


Blueprint files are YAML files, written in accordance with the OASIS TOSCA standard, that describe the execution plans for the lifecycle of the application, including the installing, starting, terminating, orchestrating, and monitoring steps.

This section presents the details of a blueprint which defines a simple HPC job. For a more sophisticated example, we refer to the well-documented blueprint for the CoeGSS network reconstruction tool, available here. Furthermore, this GitHub repository contains a number of Cloudify HPC blueprints of varying complexity.


Quick start

    tosca_definitions_version: cloudify_dsl_1_3

    imports:
        # basic Cloudify types; to speed things up, this file can also be downloaded and imported locally
        - http://raw.githubusercontent.com/mso4sc/cloudify-hpc-plugin/master/resources/types/cfy_types.yaml
        # HPC plugin
        - http://raw.githubusercontent.com/MSO4SC/cloudify-hpc-plugin/master/plugin.yaml
        - inputs-def.yaml
    inputs:
        ############################################
        # arguments of the script
        ############################################
        new_file_name:
            description: Name of the file created by the HPC job
            default: "test.txt"
            type: string

        ############################################
        # Details of the application lifecycle
        ############################################
        # First HPC configuration
        coegss_hlrs_hazelhen:
            description: Configuration for the primary HPC to be used
            default: {}

        # Job prefix name
        job_prefix:
            description: Job name prefix in HPCs
            default: 'coegss_sn4sp_'
            type: string

        ############################################
        # Data publishing
        ############################################
        coegss_datacatalogue_entrypoint:
            description: entrypoint of the data catalogue
            default: "https://coegss1.man.poznan.pl"

        coegss_datacatalogue_key:
            description: API Key to publish the outputs
            default: ""

        coegss_output_dataset:
            description: ID of the CKAN output dataset
            default: ""

Sample inputs for this example are presented here.

    node_templates:
        first_hpc:
            type: hpc.nodes.Compute
            properties:
                config: { get_input: coegss_hlrs_hazelhen }
                job_prefix: { get_input: job_prefix }
                base_dir: "$HOME"
                workdir_prefix: "cloudify_coegss_"
                skip_cleanup: True

        single_job:
            type: hpc.nodes.Job
            properties:
                job_options:
                    type: 'SBATCH'
                    command: "touch.script"
                deployment:
                    bootstrap: 'scripts/bootstrap_sbatch_example.sh'
                    revert: 'scripts/revert_sbatch_example.sh'
                    inputs:
                        - { get_input: new_file_name }
                skip_cleanup: True
                publish:
                    - type: "CKAN"
                      entrypoint: { get_input: coegss_datacatalogue_entrypoint }
                      api_key: { get_input: coegss_datacatalogue_key }
                      dataset: { get_input: coegss_output_dataset }
                      file_path: { get_input: new_file_name }
                      name: "Sampled synthetic population"
                      description: ""
            relationships:
                - type: job_contained_in_hpc
                  target: first_hpc
    outputs:
        single_job_name:
            description: single job name in the HPC
            value: { get_attribute: [single_job, job_name] }
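The single_job node above submits the batch script touch.script, while the deployment section points to bootstrap and revert scripts that are run in the job working directory before and after the execution; the deployment inputs (here, new_file_name) are passed to these scripts as command-line arguments. The sketch below only illustrates this mechanism and is not the actual scripts/bootstrap_sbatch_example.sh shipped with the plugin examples; the Slurm options are placeholders.

    #!/bin/bash
    # Bootstrap sketch: $1 receives the new_file_name deployment input and is
    # embedded into the batch script that the job later submits as touch.script.
    FILE_NAME="$1"

    {
        echo '#!/bin/bash'
        echo '#SBATCH --job-name=touch_example'
        echo '#SBATCH --ntasks=1'
        echo '#SBATCH --time=00:01:00'
        echo "touch \"$FILE_NAME\""
    } > touch.script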

Blueprint inputs

Blueprint inputs define the parameters with which the application should be deployed and executed. The inputs usually specify the parameters of the algorithm, as well as parameters defining the system configuration and the blueprint lifecycle. E.g., the above-mentioned blueprint requires the following parameters:

- new_file_name: the name of the file created by the HPC job;
- coegss_hlrs_hazelhen: the configuration of the primary HPC system (credentials, time zone, workload manager);
- job_prefix: the prefix used for job names on the HPC;
- coegss_datacatalogue_entrypoint, coegss_datacatalogue_key, coegss_output_dataset: the entrypoint of the data catalogue, the API key used to publish the outputs, and the ID of the CKAN output dataset.

Example of the input file:

    new_file_name: "unusual-test.csv"

    coegss_datacatalogue_key: "<some-ckan-api-key>"
    coegss_output_dataset:    "coegss-network-reconstruction-results"

    # HLRS Hazelhen cluster configuration
    coegss_hlrs_hazelhen:
        credentials:
            host: "hazelhen.hww.hlrs.de"
            user: "USERNAME"
            private_key: |
              -----BEGIN RSA PRIVATE KEY-----
              -----END RSA PRIVATE KEY-----
            private_key_password: "PRIVATE_KEY_PASSWORD"
            password: "PASSWORD"
            login_shell: true
        country_tz: "Europe/Stuttgart"
        workload_manager: "TORQUE"

Note that all the inputs must be initialized in the input file unless the blueprint specifies default values for them. For instance, new_file_name could be omitted from the input file above, since the blueprint declares the default value "test.txt" for it, whereas an input declared without a default must always be present.