Changes for page Slurm

Last modified by Thomas Coelho (local) on 2023/08/28 15:17

From version 5.1
edited by Thomas Coelho
on 2022/12/08 11:05
Change comment: There is no comment for this version
To version 3.1
edited by Thomas Coelho
on 2022/10/17 11:17
Change comment: There is no comment for this version

Summary

Details

Page properties
Syntax
... ... @@ -1,1 +1,1 @@
1 -XWiki 2.1
1 +MediaWiki 1.6
Content
... ... @@ -3,11 +3,8 @@
3 3  Slurm is fully integrated in our system. You do not need to set any environment variables.
4 4  
5 5  
6 +== Partitions ==
6 6  
7 -{{toc/}}
8 -
9 -== Partitions ==
10 -
11 11  A partition is a subset of the cluster, a bundle of compute nodes with the same characteristics.
12 12  
13 13  Based on access restrictions, our cluster is divided into different partitions. 'sinfo' will only show partitions you are allowed to use. Using 'sinfo -a' shows all partitions.
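
For example (a quick sketch; 'itp' is one of the partition names from the table below):

 sinfo        # partitions you are allowed to use
 sinfo -a     # all partitions
 sinfo -p itp # nodes and their state in the 'itp' partition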
... ... @@ -14,28 +14,114 @@
14 14  
15 15  A partition is selected by '-p PARTITIONNAME'.
16 16  
17 -|=**Partition** |=**No. Nodes** |=**Cores/M** |=**Tot. Cores**|=**RAM/GB** |=**CPU** |=**Remark/Restriction**
18 -|itp |10|20 |200|64 |Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz|Common Usage
19 -|dfg-big|3|32|96|128|8-Core AMD Opteron(tm) Processor 6128|Group Valenti
20 -|dfg-big|3|48|144|128/256|12-Core AMD Opteron(tm) Processor 6168|Group Valenti
21 -|dfg-big|4|64|256|128/256|16-Core AMD Opteron(tm) Processor 6272|Group Valenti
22 -|dfg-big|4|48|192|128/256|12-Core AMD Opteron(tm) Processor 6344|Group Valenti
23 -|fplo|2|12|24|256|Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz|Group Valenti
24 -|fplo|4|16|32|256|Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz|Group Valenti
25 -|dfg-xeon|5|16|32|128|Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz|Group Valenti
26 -|dfg-xeon|7|20|140|128|Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz|Group Valenti
27 -|iboga|34|20|880|64|Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz|Group Rezzolla
28 -|dreama|1|40|40|1024|Intel(R) Xeon(R) CPU E7-4820 v3 @ 1.90GHz|Group Rezzolla
29 -|barcelona|8|40|320|192|Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz|(((
30 -Group Valenti
31 -)))
32 -|barcelona|1|40|40|512| |Group Valenti
33 -|mallorca|4|48|192|256|AMD EPYC 7352 24-Core Processor|Group Valenti
34 -|calea|36|64|2304|256|Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz|(((
35 -Group Rezzolla
36 -)))
37 -|majortom|1|64|64|256|AMD EPYC 7513 32-Core Processor|Group Bleicher
14 +{| border="1" align="center"
15 +|-
16 +! scope="col" | '''Partition'''
17 +! scope="col" | '''No. Nodes'''
18 +! scope="col" | '''Cores/M'''
19 +! scope="col" | '''Tot. Cores'''
20 +! scope="col" | '''RAM/GB'''
21 +! scope="col" | '''CPU'''
22 +! scope="col" | '''Remark/Restriction'''
23 +|-
24 +| itp
25 +| 10
26 +| 20
27 +| 200
28 +| 64
29 +| Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz
30 +| Common Usage
31 +|-
32 +| dfg-big
33 +| 3
34 +| 32
35 +| 96
36 +| 128
37 +| 8-Core AMD Opteron(tm) Processor 6128
38 +| Group Valenti
39 +|-
40 +| dfg-big
41 +| 3
42 +| 48
43 +| 144
44 +| 128/256
45 +| 12-Core AMD Opteron(tm) Processor 6168
46 +| Group Valenti
47 +|-
48 +| dfg-big
49 +| 4
50 +| 64
51 +| 256
52 +| 128/256
53 +| 16-Core AMD Opteron(tm) Processor 6272
54 +| Group Valenti
55 +|-
56 +| dfg-big
57 +| 4
58 +| 48
59 +| 192
60 +| 128/256
61 +| 12-Core AMD Opteron(tm) Processor 6344
62 +| Group Valenti
63 +|-
64 +| fplo
65 +| 2
66 +| 12
67 +| 24
68 +| 256
69 +| Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz
70 +| Group Valenti
71 +|-
72 +| fplo
73 +| 4
74 +| 16
75 +| 32
76 +| 256
77 +| Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
78 +| Group Valenti
79 +|-
80 +| dfg-xeon
81 +| 5
82 +| 16
83 +| 32
84 +| 128
85 +| Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
86 +| Group Valenti
87 +|-
88 +| dfg-xeon
89 +| 7
90 +| 20
91 +| 140
92 +| 128
93 +| Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
94 +| Group Valenti
95 +|-
96 +| iboga
97 +| 44
98 +| 20
99 +| 880
100 +| 64
101 +| Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz
102 +| Group Rezzolla
103 +|-
104 +| dreama
105 +| 1
106 +| 40
107 +| 40
108 +| 1024
109 +| Intel(R) Xeon(R) CPU E7-4820 v3 @ 1.90GHz
110 +| Group Rezzolla
111 +|-
112 +| barcelona
113 +| 8
114 +| 40
115 +| 320
116 +| 192
117 +| Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
118 +| Group Valenti
38 38  
120 +|}
121 +
39 39  Most nodes are for exclusive use by their corresponding owners. The itp nodes are for common usage. Except for the 'fplo' and 'dfg-big' nodes, all machines are connected with Infiniband for all traffic (IP and internode communication - MPI).
40 40  
41 41  == Submitting Jobs ==
... ... @@ -44,36 +44,34 @@
44 44  
45 45  This is very simple for serial (1 CPU) jobs:
46 46  
47 -{{{ sbatch -p PARTITION jobscript.sh}}}
130 + sbatch -p PARTITION jobscript.sh
48 48  
49 49  where jobscript.sh is a shell script with your job commands.
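
A minimal jobscript for a serial job might look like this (a sketch; 'my_program' is a placeholder for your own executable):

<pre>
#! /bin/bash
# Serial job: a single task on a single CPU.
./my_program
</pre>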
50 50  
51 -Running **openMPI** jobs is not much more complicated:
134 +Running '''openMPI''' jobs is not much more complicated:
52 52  
53 -{{{ sbatch -p PARTITION -n X jobscript.sh}}}
136 + sbatch -p PARTITION -n X jobscript.sh
54 54  
55 55  where X is the number of desired MPI processes. Launch the job in the jobscript with:
56 56  
57 -{{{ mpirun YOUREXECUTABLE}}}
140 + mpirun YOUREXECUTABLE
58 58  
59 59  You don't have to worry about the number of processes or specific nodes. Both Slurm and openMPI know
60 60  about each other.
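
A corresponding MPI jobscript might look like this (a sketch; 'my_mpi_program' is a placeholder):

<pre>
#! /bin/bash
# Submitted with: sbatch -p PARTITION -n X jobscript.sh
# mpirun takes the number of processes (X) and the node list from Slurm.
mpirun ./my_mpi_program
</pre>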
61 61  
62 -If you want **infiniband** for your MPI job (which is usually a good idea, if not running on the same node), you have to request the feature infiniband:
145 +If you want '''infiniband''' for your MPI job (which is usually a good idea, if not running on the same node), you have to request the feature infiniband:
63 63  
64 -{{{ sbatch -p dfg -C infiniband -n X jobscript.sh}}}
147 + sbatch -p dfg -C infiniband -n X jobscript.sh
65 65  
66 66  Note: Infiniband is not available for 'fplo' and 'dfg-big'.
67 67  
68 -Running **SMP jobs** (multiple threads, not necessarily MPI): running MPI jobs on a single node is recommended for the
151 +Running '''SMP jobs''' (multiple threads, not necessarily MPI): running MPI jobs on a single node is recommended for the
69 69  dfg-big nodes. These are big hosts with up to 64 CPUs per node, but only a 'slow' gigabit network connection. Launch SMP jobs with
70 70  
71 -{{{ sbatch -p PARTITION -N 1 -n X jobscript.sh}}}
154 + sbatch -p PARTITION -N 1 -n X jobscript.sh
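
For a threaded (e.g. OpenMP) program, the jobscript can take the thread count from Slurm (a sketch; SLURM_NTASKS is set by Slurm to the value given with -n, and 'my_threaded_program' is a placeholder):

<pre>
#! /bin/bash
# Submitted with: sbatch -p PARTITION -N 1 -n X jobscript.sh
export OMP_NUM_THREADS=$SLURM_NTASKS
./my_threaded_program
</pre>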
72 72  
73 73  === Differences in the network connection ===
74 -
75 -
76 -
157 +
77 77  The new v3 dfg-xeon nodes are equipped with a 10 Gbit network. This is faster (higher throughput) and has lower latency than gigabit Ethernet, but it is not as fast as the DDR Infiniband network. The 10 Gbit network is used for MPI and I/O. Infiniband is only used for MPI.
78 78  
79 79  == Defining Resource limits ==
... ... @@ -80,13 +80,13 @@
80 80  
81 81  By default, each job is allocated 2 GB of memory and a run time of 3 days. More resources can be requested by
82 82  
83 -{{{ --mem-per-cpu=<MB>}}}
164 + --mem-per-cpu=<MB>
84 84  
85 85  where <MB> is the memory in megabytes. The virtual memory limit is 2.5 times the requested real memory limit.
86 86  
87 87  The memory limit is not a hard limit. When exceeding the limit, your memory will be swapped out. Only when using more than 150% of the limit will your job be killed. So be conservative, to keep enough room for other jobs. Requested memory is blocked from use by other jobs.
88 88  
89 -{{{ -t or --time=<time>}}}
170 + -t or --time=<time>
90 90  
91 91  where time can be set in the format "days-hours". See the man page for more formats.
92 92  
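
For example, a job that needs 8 GB per CPU and five days of run time could be submitted like this (a sketch; the values are only placeholders):

 sbatch -p PARTITION --mem-per-cpu=8000 --time=5-0 jobscript.sh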
... ... @@ -98,17 +98,16 @@
98 98  
99 99  sbatch arguments can be written in the jobfile:
100 100  
101 -{{{#! /bin/bash
182 +<pre>
183 +#! /bin/bash
102 102  #
103 103  # Choosing a partition:
104 104  #SBATCH -p housewives
105 105  
106 -YOUR JOB COMMANDS....}}}
188 +YOUR JOB COMMANDS....
189 +</pre>
107 107  
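
Combining the options from above, a more complete jobfile might look like this (a sketch; partition name and resource values are only placeholders):

<pre>
#! /bin/bash
#SBATCH -p PARTITION           # partition
#SBATCH -n 16                  # number of MPI processes
#SBATCH --mem-per-cpu=4000     # memory per CPU in MB
#SBATCH --time=2-12            # run time: 2 days and 12 hours

mpirun ./my_mpi_program
</pre>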
108 -
109 -
110 110  = Links =
111 111  
193 +* SLURM-Homepage [http://slurm.schedmd.com/slurm.html]
112 112  
113 -
114 -* SLURM-Homepage [[url:http://slurm.schedmd.com/slurm.html]]