Known issues

This page collects current software and hardware problems and limitations on Levante. Where possible, the status and a workaround are given.

Current Issues

Slurm - Connection issues between slurmd and slurmctld leading to job loss

User jobs are sometimes cancelled or aborted by Slurm because of internal communication timeouts. A Slurm update planned for mid-February might fix this.
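Until the update is in place, it can help to check how a job actually ended before relying on its output. The following is a minimal sketch, not an official workaround: the helper names `classify_job_state` and `job_state` are hypothetical, the job ID in the usage comment is a placeholder, and which Slurm state an affected job ends up in is an assumption (the sketch simply flags `CANCELLED`, `NODE_FAIL`, and `FAILED` for manual inspection).

```shell
#!/bin/bash
# Hypothetical helper: map a Slurm job state string to a coarse category.
# Assumption: jobs hit by the communication problem end in one of the
# states flagged as "check"; this is not confirmed by the text above.
classify_job_state() {
    case "$1" in
        COMPLETED)                   echo "ok" ;;
        PENDING|RUNNING|COMPLETING)  echo "active" ;;
        CANCELLED*|NODE_FAIL|FAILED) echo "check" ;;
        *)                           echo "unknown" ;;
    esac
}

# Hypothetical helper: print the state of one job allocation via sacct.
# -X: allocation only, -n: no header, -P: parsable output, -o: field list.
job_state() {
    sacct -j "$1" -X -n -P -o State | head -n 1
}

# Example usage on a system with Slurm accounting (placeholder job ID):
#   classify_job_state "$(job_state 1234567)"
```

A job reported as `check` is not necessarily affected by this issue; the job log and the Slurm output file still need to be inspected to see why the job ended.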

Heterogeneous jobs do not work properly

Currently, the startup of heterogeneous jobs, so-called hetjobs (see the Slurm documentation), does not work reliably. We plan to update Slurm to resolve this.

Resolved Issues

The following blog entries describe resolved issues on Levante.

  • Bus error in jobs on February 11, 2022

    Update 2022-06-14: The problem was solved by an update of the Lustre client from our storage vendor. The workaround described below should no longer be necessary. If one of your jobs runs into a bus error, please let us know.

    Jobs running on Levante sometimes fail with a bus error, similar to the example below:

  • missing on April 22, 2022

    Update 2022-10-17: This problem should be fixed with our current software stack. The workaround is no longer required.

    When trying to start a GUI application like gvim, you get an error message:

