Configuration

The Mistral HPC system at DKRZ was installed in two stages. The Mistral phase 1 system was brought into operation in July 2015 and consists of about 1,570 nodes. The compute nodes are housed in bullx DLC (Direct Liquid Cooling) B700 blade systems, with two nodes forming one blade. Each node has two sockets, each equipped with a 12-core Intel Xeon E5-2680 v3 processor (Haswell) with 30 MiB of shared L3 cache and a clock rate of 2.5 GHz. The Mistral phase 2 system has been operational since July 2016 and adds another 1,770 nodes. The phase 2 nodes differ from those of phase 1 in the CPU type: they use two Intel Xeon E5-2695 v4 (Broadwell) CPUs running at 2.1 GHz, each with 18 cores and 45 MiB of L3 cache. Thus, 24 physical cores per node are available on phase 1 and 36 on phase 2. Since Hyper-Threading is active, the operating system recognizes two threads (logical CPUs) per physical core. The aggregated main memory of the whole system is about 266 TB. The Lustre parallel file system provides 54 PB of usable disk space. The theoretical peak performance of the system is 3.59 PFLOPS; the LINPACK performance is about 3.01 PFLOPS.
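
As a plausibility check, the quoted peak performance can be reproduced from the per-node figures above. The sketch below is a back-of-envelope calculation that assumes 16 double-precision FLOPs per core per cycle (two AVX2 FMA units on Haswell and Broadwell) and the rounded node counts given above, so it only approximates the official 3.59 PFLOPS figure.

```python
# Back-of-envelope estimate of Mistral's theoretical peak performance.
# Assumption: 16 double-precision FLOPs per core per cycle (AVX2 FMA).
FLOPS_PER_CYCLE = 16

phases = {
    # name: (nodes, cores_per_node, clock_hz) -- approximate counts from the text
    "phase 1 (Haswell)":   (1570, 24, 2.5e9),
    "phase 2 (Broadwell)": (1770, 36, 2.1e9),
}

total = 0.0
for name, (nodes, cores, clock) in phases.items():
    peak = nodes * cores * clock * FLOPS_PER_CYCLE
    total += peak
    print(f"{name}: {peak / 1e15:.2f} PFLOPS")

print(f"total: {total / 1e15:.2f} PFLOPS")  # ~3.6 PFLOPS, close to the quoted 3.59
```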

The operating system on the Mistral cluster is Red Hat Enterprise Linux release 6.10 (Santiago). The batch system and workload manager is SLURM.

Node Types

Different kinds of nodes are available to users: 7 login nodes, 5 nodes for interactive data processing and analysis, approx. 3,300 compute nodes for running scientific models, 43 fat-memory nodes for pre- and postprocessing of data, and 21 nodes for running advanced visualization or GPGPU applications. The following table lists the specifics of the different node types.

| Node type | Number of nodes | Hostname | Processors | GPGPUs | Number of cores (logical CPUs) | Main Memory | Feature |
|---|---|---|---|---|---|---|---|
| login / interactive prepost | 7 / 5 | mlogin[100-105], mlogin108 / mistralpp[1-5] | 2x 12-core Intel Xeon E5-2680 v3 (Haswell) @ 2.5 GHz | none | 24 (48) | 256 GB | shared |
| compute | 36 | m[10000-10017], m[11296-11313] | 2x 12-core Intel Xeon E5-2680 v3 (Haswell) @ 2.5 GHz | none | 24 (48) | 64 GB | 64G |
| compute | 1368 | m[10018-11295, 11314-11367, 11404-11411, 11413-11420, 11544, 11553, 11560-11577] | 2x 12-core Intel Xeon E5-2680 v3 (Haswell) @ 2.5 GHz | none | 24 (48) | 64 GB | 64G |
| compute (large memory) | 110 | m[11368-11403], m[11440-11511], m[11545, 11554] | 2x 12-core Intel Xeon E5-2680 v3 (Haswell) @ 2.5 GHz | none | 24 (48) | 128 GB | 128G |
| compute (fat memory) | 38 | m[11412, 11512-11543, 11555-11559] | 2x 12-core Intel Xeon E5-2680 v3 (Haswell) @ 2.5 GHz | none | 24 (48) | 256 GB | 256G |
| compute2 | 1420 | m[20000-21115], m[21434-21577], m[21607-21766] | 2x 18-core Intel Xeon E5-2695 v4 (Broadwell) @ 2.1 GHz | none | 36 (72) | 64 GB | 64G |
| compute2 (large memory) | 270 | m[21116-21385] | 2x 18-core Intel Xeon E5-2695 v4 (Broadwell) @ 2.1 GHz | none | 36 (72) | 128 GB | 128G |
| compute2 (fat memory) | 74 | m[21386-21417, 21420, 21424-21433, 21578-21590, 21593-21606, 21767-21770] | 2x 18-core Intel Xeon E5-2695 v4 (Broadwell) @ 2.1 GHz | none | 36 (72) | 256 GB | 256G |
| prepost | 43 | m[11412, 11422, 11512-11543, 11546-11549, 11555-11559] | 2x 12-core Intel Xeon E5-2680 v3 (Haswell) @ 2.5 GHz | none | 24 (48) | 256 GB | 256G |
| vis / gpgpu | 12 | mg[100-111] | 2x 12-core Intel Xeon E5-2680 v3 (Haswell) @ 2.5 GHz | 2x Nvidia Tesla K80, each with 2x GK210GL | 24 (48) | 256 GB | k80 256G |
| vis / gpgpu | 4 | mg[200-203] | 2x 18-core Intel Xeon E5-2695 v4 (Broadwell) @ 2.1 GHz | 2x Nvidia Tesla M40 with GM200GL | 36 (72) | 512 GB | m40 512G |
| vis / gpgpu | 1 | mg204 | 2x 18-core Intel Xeon E5-2695 v4 (Broadwell) @ 2.1 GHz | 2x Nvidia Tesla M40 with GM200GL | 36 (72) | 1024 GB | m40 1024G |
| vis / gpgpu | 1 | mg205 | 2x 18-core Intel Xeon E5-2695 v4 (Broadwell) @ 2.1 GHz | 2x Nvidia Quadro M6000 with GM200GL | 36 (72) | 512 GB | m6000 512G |
| vis / gpgpu | 2 | mg[206-207] | 2x 18-core Intel Xeon E5-2695 v4 (Broadwell) @ 2.1 GHz | 1x Nvidia Tesla V100 | 36 (72) | 512 GB | v100 512G |
| vis / gpgpu | 1 | mg208 | 2x 18-core Intel Xeon E5-2695 v4 (Broadwell) @ 2.1 GHz | 2x Nvidia Quadro M6000 with GM200GL | 36 (72) | 1024 GB | m6000 1024G |
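
The node counts in the table can be cross-checked against the hostname ranges. The snippet below is a minimal sketch that only handles the comma-separated ID/range notation used in this table (it is not a general Slurm hostlist parser); the spot-checked rows are chosen purely for illustration.

```python
def count_range_list(ranges: str) -> int:
    """Count node IDs in a comma-separated list of IDs and ID ranges,
    e.g. '11412, 11512-11543, 11555-11559' -> 38."""
    total = 0
    for item in ranges.split(","):
        item = item.strip()
        if "-" in item:
            lo, hi = item.split("-")
            total += int(hi) - int(lo) + 1
        else:
            total += 1
    return total

# Spot checks against the "Number of nodes" column:
assert count_range_list("11412, 11512-11543, 11555-11559") == 38   # compute (fat memory)
assert count_range_list("11412, 11422, 11512-11543, 11546-11549, 11555-11559") == 43  # prepost
assert count_range_list("21386-21417, 21420, 21424-21433, 21578-21590, 21593-21606, 21767-21770") == 74  # compute2 (fat memory)
print("node counts match the table")
```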

Interconnect

All compute, pre-/postprocessing, and visualization nodes are integrated into one FDR InfiniBand (IB) fabric with three Mellanox SX6536 director switches and a fat tree topology with a blocking factor of 1:2:2. The measured bandwidth between two arbitrary compute nodes is 5.9 GByte/s, with a latency of 2.7 μs. A scheme of the InfiniBand topology is given in the figure below, illustrating the blocking factors depending on which nodes are used for a specific job.

[Figure: Mistral InfiniBand topology]
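
To get a feeling for what these figures mean for message passing, the sketch below applies the simple latency-plus-size-over-bandwidth (alpha-beta) cost model to the measured values quoted above. It assumes 1 GByte = 10^9 bytes and ignores blocking factors, congestion, and protocol overhead, so the estimates are optimistic lower bounds.

```python
# Alpha-beta cost model for point-to-point transfers on the FDR IB fabric,
# using the measured figures quoted above.
LATENCY_S = 2.7e-6          # measured node-to-node latency
BANDWIDTH_B_PER_S = 5.9e9   # measured bandwidth, 5.9 GByte/s (assumed 1e9 bytes)

def transfer_time(message_bytes: float) -> float:
    """Estimated time for one point-to-point message (seconds)."""
    return LATENCY_S + message_bytes / BANDWIDTH_B_PER_S

for size in (8, 64 * 1024, 1024**2, 1024**3):
    print(f"{size:>12} bytes: {transfer_time(size) * 1e6:10.1f} us")
# Small messages are latency-dominated (~2.7 us); transfers beyond a few
# hundred kilobytes approach the 5.9 GByte/s bandwidth limit.
```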

Energy Consumption

Mistral is one of the main contributors to DKRZ's energy consumption. Estimates of some key parameters of the HLRE-3 infrastructure's energy use are listed below:

  • 8.8 GWh per year for compute nodes of Mistral without cooling and line losses (average electrical power consumption 1000 kW)

  • 1.2 GWh per year for hard disk storage of Mistral without cooling and line losses (average electrical power consumption 150 kW)

  • 0.3 GWh per year for the HSM system without cooling and line losses (average electrical power consumption 35 kW)

  • PUE Mistral (incl. hard disk storage): 1.09

  • PUE total data centre: 1.17

  • 3 GWh waste heat utilisation per year (30%)

In total, the HPC system Mistral and the data tape archive consume about 11 GWh per year, of which 3 GWh per year are effectively utilised as waste heat. With 24-hour operation on 365 days a year, this corresponds to an average power consumption of approx. 1.3 MW.
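
The average power figure follows directly from the annual energy numbers; a minimal check, assuming continuous operation over 8,760 hours per year:

```python
# Convert the annual energy figures above into average electrical power.
HOURS_PER_YEAR = 365 * 24  # 8760 hours of continuous operation

def average_power_mw(gwh_per_year: float) -> float:
    """Average power in MW for a given annual energy consumption in GWh."""
    return gwh_per_year * 1e3 / HOURS_PER_YEAR  # GWh -> MWh, divided by hours

print(f"compute nodes: {average_power_mw(8.8):.2f} MW")          # ~1.00 MW (1000 kW)
print(f"Mistral + tape archive: {average_power_mw(11):.2f} MW")  # ~1.26 MW, i.e. approx. 1.3 MW
```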

All of the energy consumed at DKRZ is covered by certificates of origin from plants that produce renewable energy (in our case, mainly hydroelectric plants in Norway), so that no CO2 is released into the atmosphere by running the computers. Admittedly, this view is not uncontroversial.