This page collects current software and hardware problems and limitations on Levante. Where possible, the status and workarounds are given.
Slurm - Connection issues between slurmd and slurmctld leading to job loss
Slurm sometimes cancels or aborts user jobs because of internal communication timeouts. A Slurm update planned for mid-February might fix this.
Heterogeneous jobs do not work properly
Currently, the startup of so-called hetjobs (see the Slurm documentation) does not work reliably. We plan to update Slurm to solve these issues.
The following blog entries describe resolved issues on Levante.
Bus error in jobs on February 11, 2022
Update 2022-06-14: The problem was solved by an update of the Lustre client provided by our storage vendor. The workaround described below should no longer be necessary. If one of your jobs runs into a bus error, please let us know.
Jobs running on Levante sometimes fail with a bus error similar to the example below:
libGL.so.1 missing on April 22, 2022
Update 2022-10-17: This problem should be fixed with our current software stack. The workaround is no longer required.
When trying to start a GUI application such as gvim, you get an error message: