Changes for page Slurm
Last modified by Thomas Coelho (local) on 2023/08/28 15:17
From version 5.1
edited by Thomas Coelho
on 2022/12/08 11:05
Change comment:
There is no comment for this version
To version 3.1
edited by Thomas Coelho
on 2022/10/17 11:17
Change comment:
There is no comment for this version
Summary
Page properties (2 modified, 0 added, 0 removed)
Details
- Page properties
- Syntax: changed from XWiki 2.11 to MediaWiki 1.6
- Content
Slurm is fully integrated in our system. You do not need to set any environment variables.

{{toc/}}

== Partitions ==

A partition is a subset of the cluster, a bundle of compute nodes with the same characteristics.

Based on access restrictions our cluster is divided into different partitions. 'sinfo' will only show the partitions you are allowed to use. 'sinfo -a' shows all partitions.

A partition is selected by '-p PARTITIONNAME' (a short example follows below the table).

|=**Partition** |=**No. Nodes** |=**Cores/Node** |=**Tot. Cores** |=**RAM/GB** |=**CPU** |=**Remark/Restriction**
|itp|10|20|200|64|Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz|Common Usage
|dfg-big|3|32|96|128|8-Core AMD Opteron(tm) Processor 6128|Group Valenti
|dfg-big|3|48|144|128/256|12-Core AMD Opteron(tm) Processor 6168|Group Valenti
|dfg-big|4|64|256|128/256|16-Core AMD Opteron(tm) Processor 6272|Group Valenti
|dfg-big|4|48|192|128/256|12-Core AMD Opteron(tm) Processor 6344|Group Valenti
|fplo|2|12|24|256|Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz|Group Valenti
|fplo|4|16|32|256|Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz|Group Valenti
|dfg-xeon|5|16|32|128|Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz|Group Valenti
|dfg-xeon|7|20|140|128|Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz|Group Valenti
|iboga|34|20|880|64|Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz|Group Rezzolla
|dreama|1|40|40|1024|Intel(R) Xeon(R) CPU E7-4820 v3 @ 1.90GHz|Group Rezzolla
|barcelona|8|40|320|192|Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz|Group Valenti
|barcelona|1|40|40|512| |Group Valenti
|mallorca|4|48|192|256|AMD EPYC 7352 24-Core Processor|Group Valenti
|calea|36|64|2304|256|Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz|Group Rezzolla
|majortom|1|64|64|256|AMD EPYC 7513 32-Core Processor|Group Bleicher

Most nodes are for exclusive use by their corresponding owners. The itp nodes are for common usage. Except for the 'fplo' and 'dfg-big' nodes, all machines are connected with Infiniband for all traffic (IP and internode communication - MPI).
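For illustration, the partition overview can be queried like this; 'itp' is used here only because it is the common usage partition from the table, and the exact output columns depend on the local configuration:

{{{# show the partitions you are allowed to use
sinfo

# show all partitions, including restricted ones
sinfo -a

# show the state of a single partition, e.g. the common usage partition
sinfo -p itp}}}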
scope="col" | '''Remark/Restriction''' 23 +|- 24 +| itp 25 +| 10 26 +| 20 27 +| 200 28 +| 64 29 +| Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz 30 +| Common Usage 31 +|- 32 +| dfg-big 33 +| 3 34 +| 32 35 +| 96 36 +| 128 37 +| 8-Core AMD Opteron(tm) Processor 6128 38 +| Group Valenti 39 +|- 40 +| dfg-big 41 +| 3 42 +| 48 43 +| 144 44 +| 128/256 45 +| 12-Core AMD Opteron(tm) Processor 6168 46 +| Group Valenti 47 +|- 48 +| dfg-big 49 +| 4 50 +| 64 51 +| 256 52 +| 128/256 53 +| 16-Core AMD Opteron(tm) Processor 6272 54 +| Group Valenti 55 +|- 56 +| dfg-big 57 +| 4 58 +| 48 59 +| 192 60 +| 128/256 61 +| 12-Core AMD Opteron(tm) Processor 6344 62 +| Group Valenti 63 +|- 64 +| fplo 65 +| 2 66 +| 12 67 +| 24 68 +| 256 69 +| Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz 70 +| Group Valenti 71 +|- 72 +| fplo 73 +| 4 74 +| 16 75 +| 32 76 +| 256 77 +| Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz 78 +| Group Valenti 79 +|- 80 +| dfg-xeon 81 +| 5 82 +| 16 83 +| 32 84 +| 128 85 +| Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz 86 +| Group Valenti 87 +|- 88 +| dfg-xeon 89 +| 7 90 +| 20 91 +| 140 92 +| 128 93 +| Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz 94 +| Group Valenti 95 +|- 96 +| iboga 97 +| 44 98 +| 20 99 +| 880 100 +| 64 101 +| Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz 102 +| Group Rezzolla 103 +|- 104 +| dreama 105 +| 1 106 +| 40 107 +| 40 108 +| 1024 109 +| Intel(R) Xeon(R) CPU E7-4820 v3 @ 1.90GHz 110 +| Group Rezzolla 111 +|- 112 +| barcelona 113 +| 8 114 +| 40 115 +| 320 116 +| 192 117 +| Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz 118 +| Group Valenti 38 38 120 +|} 121 + 39 39 Most nodes are for exclusive use by their corresponding owners. The itp nodes are for common usage. Except for 'fplo' and 'dfg-big' nodes, all machines are connected with Infiniband for all traffic (IP and internode communitcation - MPI) 40 40 41 41 == Submitting Jobs == ... ... @@ -44,36 +44,34 @@ 44 44 45 45 This is very simple for serial (1 CPU) jobs: 46 46 47 - {{{sbatch -p PARTITION jobscript.sh}}}130 + sbatch -p PARTITION jobscript.sh 48 48 49 49 where jobscript.sh is a shell script with your job commands. 50 50 51 -Running **openMPI**jobs is not much more complictated:134 +Running '''openMPI''' jobs is not much more complictated: 52 52 53 - {{{sbatch -p PARTITION -n X jobscript.sh}}}136 + sbatch -p PARTITION -n X jobscript.sh 54 54 55 55 where X is the number of desired MPI processes. Launch the job in the jobscript with: 56 56 57 - {{{mpirun YOUREXECUTABLE}}}140 + mpirun YOUREXECUTABLE 58 58 59 59 You don't have to worry about the number of processes or specific nodes. Both slurm and openmpi know 60 60 about each other. 61 61 62 -If you want **infiniband**for your MPI job (which is usually a good idea, if not running on the same node), you have to request the feature infiniband:145 +If you want '''infiniband''' for your MPI job (which is usually a good idea, if not running on the same node), you have to request the feature infiniband: 63 63 64 - {{{sbatch -p dfg -C infiniband -n X jobscript.sh}}}147 + sbatch -p dfg -C infiniband -n X jobscript.sh 65 65 66 66 Note: Infiniband is not available for 'fplo' and 'dfg-big'. 67 67 68 -Running **SMP jobs**(multiple threads, not necessary mpi). Running MPI jobs on a single node is recommended for the151 +Running '''SMP jobs''' (multiple threads, not necessary mpi). Running MPI jobs on a single node is recommended for the 69 69 dfg-big nodes. This are big host with up to 64 cpu's per node, but 'slow' gigabit network connection. 
If you want **infiniband** for your MPI job (which is usually a good idea if the job is not running on a single node), you have to request the feature infiniband:

{{{sbatch -p dfg -C infiniband -n X jobscript.sh}}}

Note: Infiniband is not available for 'fplo' and 'dfg-big'.

Running **SMP jobs** (multiple threads, not necessarily MPI): running MPI jobs on a single node is recommended for the dfg-big nodes. These are big hosts with up to 64 CPUs per node, but only a 'slow' Gigabit network connection. Launch SMP jobs with

{{{sbatch -p PARTITION -N 1 -n X jobscript.sh}}}

=== Differences in the network connection ===

The new v3 dfg-xeon nodes are equipped with a 10 Gbit network. This is faster (throughput) and has lower latency than Gigabit Ethernet, but it is not as fast as the DDR Infiniband network. The 10 Gbit network is used for MPI and I/O; Infiniband is only used for MPI.

== Defining Resource limits ==

By default each job allocates 2 GB of memory and a run time of 3 days. More resources can be requested with

{{{--mem-per-cpu=<MB>}}}

where <MB> is the memory in megabytes. The virtual memory limit is 2.5 times the requested real memory limit.

The memory limit is not a hard limit. When you exceed it, your memory will be swapped out. Only when using more than 150% of the limit will your job be killed. So be conservative, to keep enough room for other jobs. Requested memory is blocked from use by other jobs.

{{{-t or --time=<time>}}}

where time can be set in the format "days-hours". See the man page for more formats.

sbatch arguments can be written in the jobfile:

{{{#! /bin/bash
#
# Choosing a partition:
#SBATCH -p housewives

YOUR JOB COMMANDS....}}}

= Links =

* SLURM-Homepage [[url:http://slurm.schedmd.com/slurm.html]]