Posted in 2017

Why does my job wait so long before being executed? or: Why is my job being overtaken by other jobs in the queue?

There are several possible reasons for to be queued for a long time and/or to be overtaken …

… later submitted jobs with a higher priority (usually these have used less of their share then your job).

Read more ...


When will my SLURM job start?

The SLURM squeue command with the options - -start and -j provides an estimate for the job start time:

Read more ...


How to use modules in batch scripts

The module environment is only available if the according module command was defined for the current shell. If you are using different shell as login shell and for job batch scripts (e.g. tcsh as login shell and your job scripts start with #!/bin/bash), you need to source one of the following files in your script before any invocation of the module command:

Read more ...


How to use SSHFS to mount remote lustre filesystem over SSH

In order to interact with directories and files located on the lustre filesystem, users can mount the remote filesystem via SSHFS (SSH Filesystem) over a normal ssh connection.

SSHFS is Linux based software that needs to be installed on your local computer. On Ubuntu and Debian based systems it can be installed through apt-get. On Mac OSX you can install SHFS - you will need to download FUSE and SSHFS from the osxfuse site. On Windows you will need to grab the latest win-sshfs package from the google code repository or use an alternative approach like WinSCP.

Read more ...


How to improve interactive performance of MATLAB

When using ssh X11-Forwarding (options -X or -Y), matlab can be slow to start and also have slow response to interactive use. This is because X11 sends many small packets over the network, often awaiting a response before continuing. This interacts unfavorably with medium or even higher latency connections, i.e. WiFi. When starting matlab on mistralpp nodes, another disturbing factor is overloading of these nodes.

But mistral has means to eliminate both of these issues: GPU nodes provide exclusive resources and allow for starting a remote desktop session that does not suffer from network latencies. Furthermore, the program execution can benefit from the hardware acceleration for 3D-plots or other graphics-intensive matlab sessions.

Read more ...


How to Write a shell alias or function for quick login to a node managed by SLURM

For tasks better run in a dedicated but interactive fashion, it might be advantageous to save the repeating pattern of reserving resources and starting a new associated shell in an alias or function, as explained below.

If you use bash as default shell you can place the following alias definition in your ~/.bashrc file and source this file in the ~/.bash_profile or in the ~/.profile file:

Read more ...


How to View detailed job information when the job is already running

Once your batch job started execution (i.e. is in RUNNING state) your job script is copied to the slurm admin nodes and kept until the jobs finalizes - this prevents problems that might occur if the job script gets modified while the job is running. As a side-effect you can delete the job script without interfering the execution of the job.

If you accidentally removed or modified the job script of a running job, you can use the following command to query for the script that is actually used for executing the job:

Read more ...


How to Set the default SLURM project account

On Mistral, specification of the project account (via option -A or –account) is necessary to submit a job or make a job allocation, otherwise your request will be rejected. To set the default project account you can use the following SLURM input environment variables

SLURM_ACCOUNT   - interpreted by srun command

Read more ...


How can I see on which nodes my job was running?

Yon can use the SLURM sacct command with the following options:

Read more ...


How can I run a short MPI job using up to 4 nodes?

You can use SLURM Quality of Service (QOS) express by inserting the following line into your job script:

or using the option –qos with the sbatch command:

Read more ...


How can I login to mistral, change my password and login shell?

Login to the system via:

Change your password and/or login shell via  DKRZ  online

Read more ...


How can I get a stack trace if my program crashes?

The classical approach to find the location where your program crashed is to run it in a debugger or inspect a core file with the debugger. A quick way to get the stack trace without the need for a debugger is to compile your program with the following options:

In case of segment violation during execution of the program, detailed information on the location of the problem (call stack trace with routine names and line numbers) will be provided:

Read more ...


How can I choose which account to use, if I am subscribed to more than one project?

Just insert the following line into your job script:

There is no default project account on mistral.

Read more ...


How can I check my disk space usage?

Your individual disk space usage in HOME and SCRATCH areas as well as project quota in the WORK data space can be checked in DKRZ online portal. The numbers are updated daily.

Read more ...


How can I access my Lustre data from outside DKRZ/ZMAW?

For data transfer you can use either sftp:

or  rsync command:

Read more ...


Can I run cron jobs on Mistral?

For system administration reasons users are not allowed to shedule and execute periodic jobs on Mistral using the cron utility. Our recommendation is to use the functionality provided by the workload manager SLURM for this purpose. With the option –begin of the sbatch command you can postpone the execution of your jobs until the specified time. For example, to run a job every day after 12 pm you can use the following job script re-submitting itself at the beginning of the execution:

A variety of different date and time specifications is possible with the –begin option, for example: now+1hour, midnight, noon, teatime, YYYY-MM-DD[Thh:mm:ss], 7AM, 6PM etc. For more details see manual pages of the sbatch command:

Read more ...


Is a FTP client available on mistral?

LFTP is installed on mistral for download and upload of files from/to an external server via File Transfer Protocol (FTP):

The user name for authentication can be provided via option ‘-u’ or ‘–user’, for example:

Read more ...