Machine Learning on Mistral¶
How to get started with ML on Mistral¶
Because most machine learning Python packages release new versions frequently, and because a large variety of packages is typically required, we recommend setting up a custom conda environment in which you can install the packages and specific versions you need.
In terms of conda packages, we recommend the following:
In general, our experience shows that pytorch is more flexible in the long term than keras/tensorflow, despite an initially steeper learning curve. That curve can be flattened considerably by using pytorch-lightning, which takes care of much of the boilerplate code required to set up a pytorch training loop, as sketched below.
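For illustration, a minimal pytorch-lightning module might look like this (the regression task, layer sizes, and learning rate are arbitrary); the Trainer then replaces the hand-written training loop:

import torch
from torch import nn
import pytorch_lightning as pl

class LitModel(pl.LightningModule):
    # pytorch-lightning supplies the training loop, device placement and logging
    def __init__(self, n_features=10, n_hidden=64, lr=1e-3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, n_hidden),
            nn.ReLU(),
            nn.Linear(n_hidden, 1),
        )
        self.lr = lr

    def forward(self, x):
        return self.net(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = nn.functional.mse_loss(self(x), y)
        return loss  # backward(), optimizer.step() and zero_grad() happen in Lightning

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.lr)

# trainer = pl.Trainer(gpus=1, max_epochs=10)  # 'gpus' in older versions;
# trainer.fit(LitModel(), train_dataloader)    # newer releases use accelerator="gpu", devices=1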
To make use of GPUs on Mistral with pytorch, it is often easiest to install the pytorch-gpu package instead of the separate pytorch, torchvision, cudatoolkit, etc. packages; this circumvents potential issues with outdated native dependencies in the Mistral default environment. This works well with Python 3.8.
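For example, such an environment could be created like this (the environment name is arbitrary; pytorch-gpu is available from the conda-forge channel):

conda create -n ml_gpu -c conda-forge python=3.8 pytorch-gpu
conda activate ml_gpu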
Tailoring ANN training for batch scripts¶
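A training script whose hyperparameters are exposed as command-line options can be driven conveniently both from batch scripts and from tuning tools. An example interface: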
usage: Model.py [-h] [-d DATA] [-e EPOCHS] [-p PATIENCE] [-bs BATCH_SIZE]
[-lr LEARNING_RATE] [-wd WEIGHT_DECAY] [-optim {Adam,SGD}]
[-nni] [--n-layers {1,2,3}]
[--units1 UNITS1] [--dropout1 DROPOUT1]
[--units2 UNITS2] [--dropout2 DROPOUT2]
[--units3 UNITS3] [--dropout3 DROPOUT3]
optional arguments:
-h Print help and exit
-d DATA Input data
-e EPOCHS Maximum number of epochs to train
-p PATIENCE Stop training prematurely if results do not improve for n epochs
-bs BATCH_SIZE Batch size
-lr LEARNING_RATE Learning rate
-wd WEIGHT_DECAY Weight decay
-optim {Adam,SGD} Optimizer
-nni Special flag to use if script is run via NNI
--n-layers {1,2,3} Number of hidden layers
--units1 UNITS1 Number of units in the first hidden layer
--dropout1 DROPOUT1 Dropout rate after the first hidden layer
--units2 UNITS2 Number of units in the second hidden layer
--dropout2 DROPOUT2 Dropout rate after the second hidden layer
--units3 UNITS3 Number of units in the third hidden layer
--dropout3 DROPOUT3 Dropout rate after the third hidden layer
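Such an interface can be implemented with argparse; a minimal sketch matching the usage above (the defaults are illustrative):

import argparse

def parse_args():
    p = argparse.ArgumentParser(description="Train the ANN")
    p.add_argument("-d", dest="data", help="Input data")
    p.add_argument("-e", dest="epochs", type=int, default=100,
                   help="Maximum number of epochs to train")
    p.add_argument("-p", dest="patience", type=int, default=10,
                   help="Stop training early if results do not improve for n epochs")
    p.add_argument("-bs", dest="batch_size", type=int, default=32, help="Batch size")
    p.add_argument("-lr", dest="learning_rate", type=float, default=1e-3, help="Learning rate")
    p.add_argument("-wd", dest="weight_decay", type=float, default=0.0, help="Weight decay")
    p.add_argument("-optim", choices=["Adam", "SGD"], default="Adam", help="Optimizer")
    p.add_argument("-nni", action="store_true",
                   help="Fetch tuned hyperparameters from NNI")
    p.add_argument("--n-layers", type=int, choices=[1, 2, 3], default=1)
    # one units/dropout pair per possible hidden layer
    for i in (1, 2, 3):
        p.add_argument(f"--units{i}", type=int, default=64)
        p.add_argument(f"--dropout{i}", type=float, default=0.0)
    return p.parse_args()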
This could easily be extended to cover, e.g., not only architectures with different numbers of dense and dropout layers, but also optional convolutional or LSTM layers together with their parameters.
Automating training and hyperparameter tuning using NNI¶
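NNI (Microsoft's Neural Network Intelligence toolkit) can be installed into the conda environment with pip: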
pip install nni
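Inside the training script, the -nni flag shown above can then switch on the NNI integration. A minimal sketch, reusing the illustrative parse_args from the previous section (train is a hypothetical function that builds the model and returns the validation loss):

import nni

args = parse_args()      # illustrative CLI parser sketched above
params = vars(args)
if args.nni:
    # overwrite the command-line defaults with the values suggested by the tuner
    params.update(nni.get_next_parameter())

val_loss = train(params)  # hypothetical training function

if args.nni:
    nni.report_final_result(val_loss)  # the metric the tuner optimizes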
On Mistral, you can access the web interface if you run NNI on a GPU node and are inside the DKRZ network (e.g., via VPN or from within a DKRZ building). The server address and port are displayed when an experiment is started.
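For reference, an experiment could be set up along these lines (file names, search ranges, tuner choice, and port are illustrative, and the exact config schema depends on the installed NNI version). A search space, e.g. in search_space.json:

{
  "learning_rate": {"_type": "loguniform", "_value": [1e-5, 1e-1]},
  "batch_size": {"_type": "choice", "_value": [32, 64, 128]},
  "n_layers": {"_type": "choice", "_value": [1, 2, 3]}
}

and a matching config.yml:

searchSpaceFile: search_space.json
trialCommand: python Model.py -nni
trialConcurrency: 1
maxTrialNumber: 50
tuner:
  name: TPE
  classArgs:
    optimize_mode: minimize
trainingService:
  platform: local

The experiment is then launched with

nnictl create --config config.yml --port 8080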