Using Python

Python is increasingly popular in scientific environments thanks to its wide selection of well-maintained modules and packages. DKRZ supports several approaches to using Python on the Mistral HPC system.

Python environments

We provide several precompiled environments on Mistral, but users can also create their own.

System-wide Python installations

The environment modules for our system-wide installations follow the naming convention python3/YYYY.MM-compilerversion. The latest one is python3/2021.01-gcc-9.1.0 and can be loaded as follows:

module load python3/2021.01-gcc-9.1.0
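The naming convention can be taken apart programmatically, for example to sort several installations by release date. The following is a minimal sketch; the names PythonModule and parse_module_name and the regular expression are illustrative assumptions, not part of any DKRZ tooling.

```python
import re
from typing import NamedTuple


class PythonModule(NamedTuple):
    year: int
    month: int
    compiler: str


# Pattern for python3/YYYY.MM-compilerversion (our reading of the convention).
_MODULE_RE = re.compile(r"^python3/(\d{4})\.(\d{2})-(.+)$")


def parse_module_name(name: str) -> PythonModule:
    """Split a module name into release year, release month and compiler string."""
    match = _MODULE_RE.match(name)
    if match is None:
        raise ValueError(f"not a dated python3 module name: {name!r}")
    year, month, compiler = match.groups()
    return PythonModule(int(year), int(month), compiler)


print(parse_module_name("python3/2021.01-gcc-9.1.0"))
```

Alias names such as python3/unstable do not match the pattern, so the function raises ValueError for them.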

Python environments are updated roughly every six months. The most recent installation is also available under the alias python3/unstable:

module load python3/unstable

New and updated Python packages are installed only into the most recent installation. Such updates can break existing scripts that rely on python3/unstable. To avoid this, use an older (stable) installation.

Individual miniconda installation

For full control over every aspect of your Python environment, miniconda is a powerful option. Unfortunately, recent conda versions no longer support the operating system environment on Mistral (RHEL6).

For this reason, we recommend creating your own conda environments with the conda command from the DKRZ-supplied miniforge installation (miniforge is a minimal conda installer driven by conda-forge). It is available after loading the python3/unstable module. You should then stick to older conda packages that still work with the older system glibc library on Mistral. If a package that needs a newer version of glibc is installed, an error message like the following results:

/lib64/libc.so.6: version `GLIBC_2.14' not found
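To see which glibc version the running Python interpreter is linked against, and hence whether a package requiring, say, GLIBC_2.14 can load, the standard library offers platform.libc_ver(). The following is a minimal sketch; the helper names are hypothetical, and the required version 2.14 is simply taken from the example error message.

```python
import platform
import re


def version_tuple(version: str) -> tuple:
    """Turn a dotted version string such as '2.14' into (2, 14)."""
    return tuple(int(part) for part in re.findall(r"\d+", version))


def glibc_is_sufficient(required: str, available: str) -> bool:
    """Return True if the available glibc version is at least the required one."""
    return version_tuple(available) >= version_tuple(required)


# libc_ver() returns ('glibc', '<version>') on glibc-based Linux systems
# and empty strings where it cannot determine the C library.
lib, version = platform.libc_ver()
if lib == "glibc" and version:
    ok = glibc_is_sufficient("2.14", version)
    print(f"system glibc {version}; requirement 2.14 satisfied: {ok}")
else:
    print("could not determine the glibc version on this system")
```

Running this before installing a package gives an early hint whether its glibc requirement can be met at all.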

The operating system of Levante, the successor system to Mistral, will be fully supported by conda.

Basic conda commands

Create an environment named my_env and install the jupyter package into it:

conda create --name my_env jupyter

Show available environments:

conda info --envs

Activate an environment:

conda activate my_env

Show installed conda packages:

conda list

Install additional package(s):

conda install numpy
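After activating an environment, it can be worth confirming from inside Python that the interpreter really comes from it. The following is a minimal sketch that relies on the CONDA_DEFAULT_ENV and CONDA_PREFIX variables set by conda activate; the function name describe_conda_env is made up for illustration.

```python
import os
import sys


def describe_conda_env(environ=None) -> dict:
    """Report the active conda environment as seen by this interpreter."""
    env = os.environ if environ is None else environ
    prefix = env.get("CONDA_PREFIX")
    return {
        # None if no conda environment is active
        "name": env.get("CONDA_DEFAULT_ENV"),
        "prefix": prefix,
        # True only if this interpreter lives inside the active environment
        "interpreter_matches": prefix is not None
        and os.path.realpath(sys.prefix) == os.path.realpath(prefix),
    }


print(describe_conda_env())
```

If interpreter_matches is False although an environment is active, the python on your PATH probably comes from somewhere else, for example a loaded environment module.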

More information on conda can be found in the user guide.

Working with Jupyter Notebook

Jupyter Notebook is a powerful web application that lets you work with live Python code and other content. Running a Jupyter Notebook on Mistral is even more useful, as it gives you access to all data on the file system and to the parallel processing power of a supercomputer.

There are several ways to use Jupyter Notebook on Mistral. You will have to decide which one best fits your requirements regarding supported clients and flexibility.

JupyterHub

JupyterHub is a multi-user server for Jupyter notebooks that lets you run notebooks directly on the DKRZ HPC system Mistral. It therefore also supports parallel computation.

JupyterHub is available at https://jupyterhub.dkrz.de for all DKRZ users who have access to Mistral and who are allowed to submit batch jobs.

You can use your mobile device or the workstation in your office; Windows, macOS, and Linux all work. For more information see our JupyterHub documentation.

Direct Jupyter Notebook

If you run Jupyter Notebook on your local machine it will launch a web browser and you can immediately start working. Doing the same thing on mistral is not so straightforward because your local browser cannot directly connect to Jupyter Notebook on mistral. Furthermore, you may want to run Jupyter Notebook on dedicated resources inside a job.

To relieve you from having to set up ssh tunnels and write job scripts for Jupyter Notebook, we packaged everything in a simple bash script which you have to run on your local computer. It should work in most Linux environments, on macOS, and even on Windows 10 (see below).

Download the script here and make it executable:

chmod a+x start-jupyter

Using jupyter on mistralpp

The simplest way to run it is without any options:

./start-jupyter

This assumes that your username on Mistral is the same as your local username. The environment module python3/unstable will be loaded.

The script then starts jupyter on a mistralpp node and connects your local browser. Since it is running on an interactive node, no compute time is charged to your project’s account but you have to share the node with many other users.

Suppose you have to specify your username on mistral and you need to load a different module before starting jupyter. Create a file to load the module. You can name it as you want and store it anywhere on mistral.

mlogin100:~$ cat jupyter_preload
module load python3/2021.01-gcc-9.1.0

You then reference the file with the -i option. Any relative path starts in your home directory. An absolute path is also possible.

./start-jupyter -u u202020 -i jupyter_preload

To use your own python environment you could put something like this into your include file:

source .bashrc
conda activate my_env

Jupyter can run not only Notebook but also Lab, provided that Lab is installed in your environment. To start Lab, use the option -c lab.

Running jupyter in a job

If you need dedicated resources like many compute cores and large amounts of memory, it is better to run jupyter in a job. You don’t have to write the job script yourself because start-jupyter will generate one on the fly.

./start-jupyter -u u202020 -i jupyter_preload -A bb1999

This will run jupyter on the shared partition with one task. You can request more tasks with the -n option or even use a different partition with the -p option.
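Inside the running notebook you may want to check which resources the job actually received. Slurm exports this information as environment variables; the sketch below reads a few standard ones (SLURM_JOB_ID, SLURM_JOB_PARTITION, SLURM_NTASKS). That these variables reach the notebook kernel unchanged when it is started through start-jupyter is an assumption, and the function name slurm_allocation is illustrative.

```python
import os


def slurm_allocation(environ=None) -> dict:
    """Collect basic facts about the current Slurm job, if there is one."""
    env = os.environ if environ is None else environ
    ntasks = env.get("SLURM_NTASKS")
    return {
        "job_id": env.get("SLURM_JOB_ID"),            # None outside a job
        "partition": env.get("SLURM_JOB_PARTITION"),
        "ntasks": int(ntasks) if ntasks is not None else None,
    }


print(slurm_allocation())
```

On an interactive node, outside of any job, all three values are expected to be None.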

Troubleshooting

Hanging script

The script may at times seem to hang because it is waiting for a job to start or for a Python environment to activate. This will resolve itself after a while. However, it may also hang because jupyter could not be found with your configuration. For performance reasons, this is usually not checked. If you are in doubt whether your setup is correct, start the script with the -d option:

./start-jupyter -i some_setup -d
Looking for Jupyter
which: no jupyter in (/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin)
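A similar check can be reproduced by hand on Mistral, for example after sourcing your include file. The sketch below uses shutil.which from the standard library, which searches PATH much like the shell's which command. The function name find_jupyter is illustrative, and the code is not taken from start-jupyter itself.

```python
import shutil
from typing import Optional


def find_jupyter(path: Optional[str] = None) -> Optional[str]:
    """Return the full path of the jupyter executable, or None if it is absent.

    `path` is an optional PATH-style search string; by default the
    current PATH environment variable is searched.
    """
    return shutil.which("jupyter", path=path)


location = find_jupyter()
if location is None:
    print("no jupyter on PATH; load a module or activate an environment first")
else:
    print(f"jupyter found at {location}")
```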

Another option is to look into jupyter’s output. This is redirected into a file ~/.jupyter/jupyter.XXXXX on mistral with random characters instead of Xs.

Orphaned jobs

When you terminate the script with Ctrl-C, it should kill all spawned processes and Slurm jobs. If the script did not get the chance to do that, because the ssh connection was broken or it was killed with kill -9, you may have to cancel the job yourself. Log into Mistral and run

squeue -u u202020

Replace u202020 with your own username. If you see any jobs named Jupyter, you can cancel them with the Slurm scancel command.
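Finding such leftover jobs can also be scripted. The sketch below asks squeue for job IDs and names only (-h suppresses the header, -o "%i %j" selects ID and name) and keeps the jobs named Jupyter. The function names are illustrative, and the sample output used in the demonstration is made up.

```python
import subprocess


def parse_jupyter_jobs(squeue_output: str) -> list:
    """Extract the IDs of jobs named 'Jupyter' from 'squeue -h -o "%i %j"' output."""
    job_ids = []
    for line in squeue_output.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2 and parts[1].strip() == "Jupyter":
            job_ids.append(parts[0])
    return job_ids


def find_jupyter_jobs(user: str) -> list:
    """Query Slurm for the user's jobs; only works where squeue is available."""
    result = subprocess.run(
        ["squeue", "-u", user, "-h", "-o", "%i %j"],
        capture_output=True, text=True, check=True,
    )
    return parse_jupyter_jobs(result.stdout)


# Demonstration on made-up output, so that it runs without Slurm:
sample = "1234567 Jupyter\n1234568 my_model_run\n"
print(parse_jupyter_jobs(sample))
```

Each job ID returned by find_jupyter_jobs can then be passed to scancel.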

I have to use Windows 10

Install WSL (Windows Subsystem for Linux). The script should work with any Linux distribution on WSL, but the last time we checked it had problems with Ubuntu on WSL1, while it worked with openSUSE. WSL2 worked with both distributions. Run the script as follows from PowerShell or cmd:

wsl bash ./start-jupyter

Security considerations

Anyone who has access to your Jupyter Notebook server has full control over your user account. It is therefore important to secure your server. Any version of Jupyter starting with 4.3 should be secure by default. For more information on this topic, see the documentation:

Security in the Jupyter notebook server