March 24, 2021

REANA 0.7.3 is released

We are happy to announce REANA 0.7.3. This minor bug fix release adds new reana-client validation features for workflow input parameters and environment images, improves cluster resilience in case of job failures, and fixes HTCondor and Slurm integration for complex job commands.

What’s new?

Validating workflow parameters

Computational workflows often contain tens of input parameters. It may be hard to keep track of which parameters are being used in the multiple steps a workflow has. It would be a tedious task to verify manually that all the parameters referenced in the workflow steps are properly defined as part of the workflow input parameters.

With the release of reana-client 0.7.3, we are introducing a more advanced validation that performs the input parameter validation automatically. Let us illustrate how this new feature works with a simple example.

Given a simple Serial workflow reana.yaml:

version: 0.7.3
inputs:
  files:
    - code/myanalysis.py
  parameters:
    script_path: code/myanalysis.py
workflow:
  type: serial
  specification:
    steps:
      - name: run-script
        environment: 'python:3.9-slim'
        commands:
          - python "${script_path}"

Let us call the validator:

$ reana-client validate -f reana.yaml
==> Verifying REANA specification file... my-analysis/reana.yaml
  -> SUCCESS: Valid REANA specification file.
==> Verifying workflow parameters and commands...
  -> SUCCESS: Workflow parameters and commands appear valid.

The output displayed is successful, meaning that our REANA specification is properly built and there were no issues found in parameters and commands.

Imagine that myanalysis.py can receive a second argument. We define this as a new input parameter sleeptime, but we forget to pass it to the actual Python command:

     - code/myanalysis.py
   parameters:
     script_path: code/myanalysis.py
+    sleeptime: 10
 workflow:
   type: serial
   specification:

Let us verify our REANA specification again:

$ reana-client validate -f reana.yaml
==> Verifying REANA specification file... my-analysis/reana.yaml
  -> SUCCESS: Valid REANA specification file.
==> Verifying workflow parameters and commands...
  -> WARNING: REANA input parameter "sleeptime" does not seem to be used.

The specification remains valid, as it is well-formed. However, a warning is displayed indicating that the sleeptime parameter is defined but does not seem to be used in the workflow specification.

Let us fix this problem by adding this unused parameter to the command, but imagine that we make a typo when adding it and we write sleptime instead of sleeptime:

       - name: run-script
         environment: 'python:3.9-slim'
         commands:
-          - python "${script_path}"
+          - python "${script_path}" "${sleptime}"

Let us call the validator to see the output:

$ reana-client validate -f reana.yaml
==> Verifying REANA specification file... my-analysis/reana.yaml
  -> SUCCESS: Valid REANA specification file.
==> Verifying workflow parameters and commands...
  -> WARNING: REANA input parameter "sleeptime" does not seem to be used.
  -> WARNING: Serial parameter "sleptime" found on step "run-script" is not defined in input parameters.

Due to our typo, sleptime appears as a new parameter. It is present in the workflow commands of step run-script but it was not defined. Let us fix the typo and call the validator again:

     - code/myanalysis.py
   parameters:
     script_path: code/myanalysis.py
+    sleeptime: 10
 workflow:
   type: serial
   specification:
     steps:
       - name: run-script
         environment: 'python:3.9-slim'
         commands:
-          - python "${script_path}" "${sleptime}"
+          - python "${script_path}" "${sleeptime}"

$ reana-client validate -f reana.yaml
==> Verifying REANA specification file... my-analysis/reana.yaml
  -> SUCCESS: Valid REANA specification file.
==> Verifying workflow parameters and commands...
  -> SUCCESS: Workflow parameters and commands appear valid.

Now everything is settled: the warnings disappear because the new parameter is both defined and used properly.

Note that input workflow parameter validation is also implemented for CWL and Yadage workflows.
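The idea behind this check can be illustrated with a minimal Python sketch. It is not the reana-client implementation: the function name check_parameters and the regular expression are assumptions made for illustration, and the specification is given as a plain dictionary rather than loaded from reana.yaml. The sketch compares the parameters defined under inputs with the ${...} placeholders referenced in the step commands.

```python
import re

# Matches ${name} placeholders as used in the Serial commands above.
# (Assumption: only this placeholder syntax is considered in the sketch.)
PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def check_parameters(spec):
    """Return (unused, undefined) parameter names for a Serial spec dict.

    Hypothetical helper for illustration; not part of reana-client.
    """
    defined = set(spec.get("inputs", {}).get("parameters", {}))
    referenced = set()
    for step in spec["workflow"]["specification"]["steps"]:
        for command in step.get("commands", []):
            referenced.update(PLACEHOLDER.findall(command))
    # Defined but never referenced, and referenced but never defined.
    return defined - referenced, referenced - defined


# The specification from the example, including the "sleptime" typo.
spec = {
    "inputs": {
        "parameters": {"script_path": "code/myanalysis.py", "sleeptime": 10}
    },
    "workflow": {
        "specification": {
            "steps": [
                {
                    "name": "run-script",
                    "commands": ['python "${script_path}" "${sleptime}"'],
                }
            ]
        }
    },
}

unused, undefined = check_parameters(spec)
print("unused:", sorted(unused))        # unused: ['sleeptime']
print("undefined:", sorted(undefined))  # undefined: ['sleptime']
```

The two set differences correspond to the two warnings shown above: a parameter that is defined but not used, and a parameter that is used but not defined.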

Verifying potentially dangerous workflow commands

A workflow may execute certain commands or operations that potentially conflict with the way the REANA platform executes workflows. One such example is trying to run commands as superuser (sudo). REANA runs workflows under a regular user identity for security reasons. If a workflow used sudo in its commands, it often happened that the workflow failed only after many hours of execution. The new release of reana-client alerts us about these possible problems already at the time of workflow submission.

Let us modify our previous example to illustrate how this works:

       - name: run-script
         environment: 'python:3.9-slim'
         commands:
-          - python "${script_path}" "${sleeptime}"
+          - sudo python "${script_path}" "${sleeptime}"

Let us run the validator:

$ reana-client validate -f reana.yaml
==> Verifying REANA specification file... my-analysis/reana.yaml
  -> SUCCESS: Valid REANA specification file.
==> Verifying workflow parameters and commands...
  -> WARNING: Operation "sudo" found in step "run-script" might be dangerous.
  -> SUCCESS: Workflow parameters and commands appear valid.

The output shows a warning message indicating the dangerous operation and the step in which it was found.

The reana-client validate command may thus save you development time by providing early warnings about these possibly dangerous operations that would be encountered only during later runtime.
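A check of this kind could be sketched in Python as follows. This is only an illustration under stated assumptions: the helper name find_dangerous_operations is hypothetical, and the list of operations contains only sudo because that is the only one named in this post. The actual list checked by reana-client may differ.

```python
import shlex

# Only "sudo" is named in the text; a real checker may look for more.
DANGEROUS_OPERATIONS = ("sudo",)


def find_dangerous_operations(steps):
    """Yield (operation, step_name) pairs for risky tokens in step commands.

    Hypothetical helper for illustration; not part of reana-client.
    Note: shlex.split raises ValueError on unbalanced quotes.
    """
    for step in steps:
        for command in step.get("commands", []):
            # Compare whole shell tokens so that words such as
            # "pseudocode" do not trigger a false positive.
            tokens = shlex.split(command)
            for operation in DANGEROUS_OPERATIONS:
                if operation in tokens:
                    yield operation, step["name"]


steps = [
    {
        "name": "run-script",
        "commands": ['sudo python "${script_path}" "${sleeptime}"'],
    }
]

for operation, step_name in find_dangerous_operations(steps):
    print(f'Operation "{operation}" found in step "{step_name}" might be dangerous.')
```

Matching on tokens rather than substrings is a design choice of this sketch. It keeps the check simple while avoiding obvious false alarms.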

Validating workflow environment images

The REANA 0.7.3 command-line client introduces a new option for the reana-client validate command, called --environments. This option triggers the possibly lengthy validation of workflow environment images, which is why it is not done by default. The environment image validation allows you to check image existence, image tags, and the compatibility of image user and group ID settings for your workflow.

Workflows run in containerised environments that can be precisely captured for preservation by means of tagged images. We encourage users to tag their images when running analyses, as this helps to ensure reproducibility in the future, one of the pillars of the FAIR principles.

Validating environment image existence

It is also important to verify the existence of these environment images, both locally and remotely (Docker Hub and GitLab registry), to ensure that the REANA cluster can pull them without issues.

Let us use the previous workflow example to depict this functionality. In that workflow, the environment used was python:3.9-slim, which is properly tagged and exists in Docker Hub. We are going to modify it and use a non-existing image instead, for example foo:bar.

   specification:
     steps:
       - name: run-script
-        environment: 'python:3.9-slim'
+        environment: 'foo:bar'
         commands:
           - python "${script_path}" "${sleeptime}"

Let us call the validator, passing the environments option:

$ reana-client validate -f reana.yaml --environments
==> Verifying REANA specification file... my-analysis/reana.yaml
  -> SUCCESS: Valid REANA specification file.
==> Verifying workflow parameters and commands...
  -> SUCCESS: Workflow parameters and commands appear valid.
==> Verifying environments in REANA specification file...
  -> SUCCESS: Environment image foo:bar has the correct format.
  -> WARNING: Environment image foo:bar does not exist locally.
  -> WARNING: Environment image foo:bar does not exist in Docker Hub: "Resource not found"
  -> ERROR: Environment image foo:bar does not exist locally or remotely.

Let us analyse the environment validation output. First, we get a success message informing us that the image has the correct format. This is because the step specifies both the image name and the tag, so in that regard it is correct. Then we get two warnings telling us that the image was found neither locally nor in the Docker Hub registry. As a consequence, the validation ends with an error message telling us that the image does not exist, and the validation fails.

Validating environment image tag

Let us change the environment to use a valid one. We are going to revert back to the python image but using the latest tag instead:

   specification:
     steps:
       - name: run-script
-        environment: 'foo:bar'
+        environment: 'python:latest'
         commands:
           - python "${script_path}" "${sleeptime}"

The use of latest tags is usually discouraged because of their “moving target” nature: a latest image might be different today, tomorrow, next week or next year. We always recommend using images with explicit version tags in order to ensure the computational reproducibility of results.

Let us run the validator again:

$ reana-client validate -f reana.yaml --environments
==> Verifying REANA specification file... my-analysis/reana.yaml
  -> SUCCESS: Valid REANA specification file.
==> Verifying workflow parameters and commands...
  -> SUCCESS: Workflow parameters and commands appear valid.
==> Verifying environments in REANA specification file...
  -> WARNING: Using 'latest' tag is not recommended in python environment image.
  -> WARNING: Environment image python:latest does not exist locally.
  -> SUCCESS: Environment image python:latest exists in Docker Hub.
  -> WARNING: UID/GIDs validation skipped, specify `--pull` to enable it.

As part of the environment verification, we get a message warning us about the usage of the latest tag. Additionally, we see two checks that verify whether the image exists locally and remotely. In this particular case, since we do not have the python:latest image pulled locally, a warning is displayed. On the other hand, the python:latest image exists in the Docker Hub registry, so we get a success message.
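The tag check can be illustrated with a short Python sketch. The helper names split_image and tag_issue are hypothetical and the logic is a simplification, not the reana-client implementation. It relies on the fact that Docker resolves an image reference without a tag to latest, so both cases are flagged. Digest references (name@sha256:...) are out of scope for this sketch.

```python
def split_image(reference):
    """Split 'name[:tag]' into (name, tag); tag is None when absent.

    Hypothetical helper for illustration. Digest references
    (name@sha256:...) are not handled.
    """
    # Only a colon in the last path segment separates the tag; a colon
    # earlier on belongs to a registry port such as localhost:5000.
    last_segment = reference.rsplit("/", 1)[-1]
    if ":" in last_segment:
        name, tag = reference.rsplit(":", 1)
        return name, tag
    return reference, None


def tag_issue(reference):
    """Return 'missing', 'latest', or None for an image reference."""
    _, tag = split_image(reference)
    if tag is None:
        return "missing"  # Docker would silently fall back to 'latest'
    if tag == "latest":
        return "latest"
    return None


for image in ("python:3.9-slim", "python:latest", "gitlab-registry.cern.ch/johndoe/foo"):
    print(image, "->", tag_issue(image))
```

With these inputs, only python:3.9-slim passes without a finding. The other two references point to a moving target, either explicitly or implicitly.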

Validating environment image user and group ID

Reproducible workflows should ideally not depend on the particular user who executes them. A workflow should give the same result regardless of user identity, such as the UID (user ID) and GID (group ID) known from Unix systems.

For security reasons, REANA executes workflows as UID=1000, and expects GID=0 to be able to share filesystem write rights across multiple nodes. These technicalities usually don’t matter. However, in certain cases, you may need to execute your workflow under a different user identity, for example because the environment image expects certain user privileges. REANA allows this by setting the kubernetes_uid workflow hint. However, it can happen that the user ID expectations of the workflow image conflict with the declared UID, which leads to problems at workflow execution time.

The REANA 0.7.3 validation of environments allows you to catch these situations early by inspecting the workflow specification and the container images. In order to perform this inspection, you must have Docker running on the machine where you run reana-client. The environment image has to be pulled locally, which also consumes disk space. The validation of image UID and GID is therefore triggered by another command line option called --pull. Please use this option only if you are used to working with Docker images locally.

Let us illustrate how the environment image UID and GID validation works by rerunning our past example:

   specification:
     steps:
       - name: run-script
-        environment: 'python:latest'
+        environment: 'python:3.9-slim'
         commands:
           - python "${script_path}" "${sleeptime}"

$ reana-client validate -f reana.yaml --environments --pull
==> Verifying REANA specification file... my-analysis/reana.yaml
  -> SUCCESS: Valid REANA specification file.
==> Verifying environments in REANA specification file...
  -> SUCCESS: Environment image python:3.9-slim has the correct format.
  -> WARNING: Environment image python:3.9-slim does not exist locally.
  -> SUCCESS: Environment image python:3.9-slim exists in Docker Hub.
Unable to find image 'python:3.9-slim' locally
3.9-slim: Pulling from library/python
6f28985ad184: Pulling fs layer
...
Status: Downloaded newer image for python:3.9-slim
  -> INFO: Environment image uses UID 0 but will run as UID 1000.

We can see that the image was pulled locally and that there is a new message about the UID check. In this case, it is just an informational message that REANA uses UID=1000 by default, and that this image uses UID=0. This does not necessarily mean that the execution is going to fail, provided that the environment image does not make any assumptions about user identity. The python images don’t, so the execution will succeed. If you are using an image that does require running processes under a certain user identity, we recommend that you use UID 1000, which is the usual default Unix user.
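The comparison behind this message can be sketched in Python. This is a sketch under assumptions, not the way reana-client necessarily performs the check: the helper names image_uid and uid_message are hypothetical, and querying the UID via docker run with id -u assumes a local Docker installation and an image that has the id command and no entrypoint intercepting it.

```python
import subprocess

REANA_DEFAULT_UID = 1000  # default runtime UID stated in the text


def image_uid(image):
    """Ask a local Docker daemon which UID the image runs as by default.

    Assumption: the image provides the `id` command and has no entrypoint
    that would intercept it. This pulls the image if it is not present.
    """
    result = subprocess.run(
        ["docker", "run", "--rm", image, "id", "-u"],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


def uid_message(uid, runtime_uid=REANA_DEFAULT_UID):
    """Return an informational message when the UIDs differ, else None."""
    if uid == runtime_uid:
        return None
    return f"Environment image uses UID {uid} but will run as UID {runtime_uid}."


# The pure comparison can be tried without Docker:
print(uid_message(0))
# With Docker available, one could call: uid_message(image_uid("python:3.9-slim"))
```

Keeping the Docker call separate from the comparison makes the logic easy to try out on machines where Docker is not installed.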

The workflow environment image validation is also compatible with images hosted in the GitLab registry at CERN, for example gitlab-registry.cern.ch/johndoe/foo. Please note that if your image is protected, you have to authenticate via docker login first so that the validator is able to fetch it on the machine where you are executing the validation.

What’s improved?

The REANA 0.7.3 release brings two minor improvements to the cluster.

You may have seen a situation where a workflow failed, but was still reported as running in your workflow list. This happened because of miscommunication between internal REANA components about the workflow status. REANA 0.7.3 improves the cluster resilience in these situations by amending the job status consumer to capture such exceptional situations.

Finally, if you have been using HTCondor or Slurm backends for your workflows, the REANA 0.7.3 cluster improves the job dispatching in case of complex inline Yadage commands. Also, job scheduling errors from remote HTCondor platforms are now better caught and reported in the usual workflow logs.

Please give the new features a try and let us know what you think!

© CERN 2023