Configuration¶
The Mistral HPC system at DKRZ was installed in two stages. The Mistral phase 1 system was brought into operation in July 2015 and consists of about 1,570 nodes. The compute nodes are housed in bullx DLC (Direct Liquid Cooling) B700 blade systems, with two nodes forming one blade. Each node has two sockets, each equipped with a 12-core Intel Xeon E5-2680 v3 processor (Haswell) clocked at 2.5 GHz, whose cores share 30 MiB of L3 cache. The Mistral phase 2 system has been operational since July 2016 and adds another 1,770 nodes. The phase 2 nodes differ from those of phase 1 in the CPU type: they use two Intel Xeon E5-2695 v4 (Broadwell) CPUs running at 2.1 GHz, each with 18 cores and 45 MiB of L3 cache. Thus, 24 physical cores per node are available on phase 1 and 36 on phase 2. Since Hyper-Threading is enabled, the operating system recognizes two threads (logical CPUs) per physical core.
The aggregated main memory of the whole system is about 266 TB. The parallel file system Lustre provides 54 PB of usable disk space. The theoretical peak performance of the system is 3.59 PFLOPS; the LINPACK performance is about 3.01 PFLOPS.
The operating system on the Mistral cluster is Red Hat Enterprise Linux release 6.10 (Santiago). The batch system and workload manager is SLURM.
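Jobs are submitted to SLURM with `sbatch`. The following is a minimal batch-script sketch; the partition name `compute`, the executable name, and the project account are placeholders and have to be replaced with values valid for your project.

```bash
#!/bin/bash
#SBATCH --job-name=example        # job name shown in the queue
#SBATCH --partition=compute       # assumed partition name; adjust to your project setup
#SBATCH --nodes=2                 # number of nodes to allocate
#SBATCH --ntasks-per-node=24      # one MPI task per physical core on a Haswell node
#SBATCH --time=00:30:00           # wall-clock limit
#SBATCH --account=<project>       # project account (placeholder)

# Launch the MPI program through SLURM's process manager
srun ./my_model
```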
Node Types¶
Different kinds of nodes are available to users: 7 login nodes, 5 nodes for interactive data processing and analysis, approx. 3,300 compute nodes for running scientific models, 43 fat memory nodes for pre- and postprocessing of data, and 21 nodes for running advanced visualization or GPGPU applications. The following table lists the specifics of the different node types.
Node type | Number of nodes | Hostname | Processors | GPGPUs | Number of cores (logical CPUs) | Main Memory | Feature
---|---|---|---|---|---|---|---
login / interactive prepost | 7 / 5 | mlogin[100-105], mlogin108 / mistralpp[1-5] | 2x 12-core Intel Xeon E5-2680 v3 (Haswell) @ 2.5 GHz | none | 24 (48) | 256 GB |
shared | 36 | m[10000-10017], m[11296-11313] | 2x 12-core Intel Xeon E5-2680 v3 (Haswell) @ 2.5 GHz | none | 24 (48) | 64 GB | 64G
compute | 1368 | m[10018-11295, 11314-11367, 11404-11411, 11413-11420, 11544, 11553, 11560-11577] | 2x 12-core Intel Xeon E5-2680 v3 (Haswell) @ 2.5 GHz | none | 24 (48) | 64 GB | 64G
compute (large memory) | 110 | m[11368-11403], m[11440-11511], m[11545, 11554] | 2x 12-core Intel Xeon E5-2680 v3 (Haswell) @ 2.5 GHz | none | 24 (48) | 128 GB | 128G
compute (fat memory) | 38 | m[11412, 11512-11543, 11555-11559] | 2x 12-core Intel Xeon E5-2680 v3 (Haswell) @ 2.5 GHz | none | 24 (48) | 256 GB | 256G
compute2 | 1420 | m[20000-21115], m[21434-21577], m[21607-21766] | 2x 18-core Intel Xeon E5-2695 v4 (Broadwell) @ 2.1 GHz | none | 36 (72) | 64 GB | 64G
compute2 (large memory) | 270 | m[21116-21385] | 2x 18-core Intel Xeon E5-2695 v4 (Broadwell) @ 2.1 GHz | none | 36 (72) | 128 GB | 128G
compute2 (fat memory) | 74 | m[21386-21417, 21420, 21424-21433, 21578-21590, 21593-21606, 21767-21770] | 2x 18-core Intel Xeon E5-2695 v4 (Broadwell) @ 2.1 GHz | none | 36 (72) | 256 GB | 256G
prepost | 43 | m[11412, 11422, 11512-11543, 11546-11549, 11555-11559] | 2x 12-core Intel Xeon E5-2680 v3 (Haswell) @ 2.5 GHz | none | 24 (48) | 256 GB | 256G
vis / gpgpu | 12 | mg[100-111] | 2x 12-core Intel Xeon E5-2680 v3 (Haswell) @ 2.5 GHz | 2x Nvidia Tesla K80, each with 2x GK210GL | 24 (48) | 256 GB | k80 256G
vis / gpgpu | 4 | mg[200-203] | 2x 18-core Intel Xeon E5-2695 v4 (Broadwell) @ 2.1 GHz | 2x Nvidia Tesla M40 with GM200GL | 36 (72) | 512 GB | m40 512G
vis / gpgpu | 1 | mg204 | 2x 18-core Intel Xeon E5-2695 v4 (Broadwell) @ 2.1 GHz | 2x Nvidia Tesla M40 with GM200GL | 36 (72) | 1024 GB | m40 1024G
vis / gpgpu | 1 | mg205 | 2x 18-core Intel Xeon E5-2695 v4 (Broadwell) @ 2.1 GHz | 2x Nvidia Quadro M6000 with GM200GL | 36 (72) | 512 GB | m6000 512G
vis / gpgpu | 2 | mg[206-207] | 2x 18-core Intel Xeon E5-2695 v4 (Broadwell) @ 2.1 GHz | 1x Nvidia Tesla V100 | 36 (72) | 512 GB | v100 512G
vis / gpgpu | 1 | mg208 | 2x 18-core Intel Xeon E5-2695 v4 (Broadwell) @ 2.1 GHz | 2x Nvidia Quadro M6000 with GM200GL | 36 (72) | 1024 GB | m6000 1024G
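If the entries in the Feature column are exposed as SLURM node features (as the column suggests), a job can be restricted to nodes with a certain memory size or GPU type via `--constraint`. A sketch, assuming the feature names from the table above and a hypothetical partition name `gpu`:

```bash
# Request nodes with 256 GB of main memory
sbatch --constraint=256G job_script.sh

# Request a visualization/GPGPU node with Tesla K80 cards
# (the partition name "gpu" is an assumption; adjust to the actual partition)
sbatch --partition=gpu --constraint=k80 job_script.sh
```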
Interconnect¶
All compute, pre-/postprocessing, and visualization nodes are integrated in one FDR InfiniBand (IB) fabric with three Mellanox SX6536 director switches and a fat tree topology with a blocking factor of 1:2:2. The measured bandwidth between two arbitrary compute nodes is 5.9 GByte/s with a latency of 2.7 μs. A scheme of the InfiniBand topology is given in the picture below, illustrating the blocking factors depending on which nodes are used for a specific job.
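As a sketch of how such point-to-point figures can be reproduced, the OSU micro-benchmarks (assumed here to be built by the user; the binary paths are placeholders) can be run between two nodes through SLURM:

```bash
# Allocate two nodes and place exactly one MPI task on each of them
srun --nodes=2 --ntasks=2 --ntasks-per-node=1 ./osu_bw       # point-to-point bandwidth
srun --nodes=2 --ntasks=2 --ntasks-per-node=1 ./osu_latency  # point-to-point latency
```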
Energy Consumption¶
Mistral is one of the main contributors to DKRZ’s energy consumption. Below, estimates of some key parameters of the HLRE-3 infrastructure’s energy use are listed:
- 8.8 GWh per year for the compute nodes of Mistral, without cooling and line losses (average electrical power consumption: 1,000 kW)
- 1.2 GWh per year for the hard disk storage of Mistral, without cooling and line losses (average electrical power consumption: 150 kW)
- 0.3 GWh per year for the HSM system, without cooling and line losses (average electrical power consumption: 35 kW)
- PUE of Mistral (incl. hard disk storage): 1.09
- PUE of the total data centre: 1.17
- 3 GWh of waste heat utilisation per year (30%)
In total, the HPC system Mistral and the data tape archive consume about 11 GWh/year, of which 3 GWh/year are effectively used as heat. With 365 days and 24-hour operation, this corresponds to a power consumption of approx. 1.3 MW.
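For reference, the average power follows directly from the annual consumption:

$$ \frac{11\,\mathrm{GWh}}{365 \times 24\,\mathrm{h}} = \frac{11\,\mathrm{GWh}}{8760\,\mathrm{h}} \approx 1.26\,\mathrm{MW} \approx 1.3\,\mathrm{MW} $$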
For all the energy consumed at DKRZ, there are certificates of origin from plants producing renewable energy (in our case, mainly hydroelectric plants in Norway), so that no CO2 is released into the atmosphere by running the computers. Admittedly, this view is not uncontroversial.