Slurm

Last modified by Thomas Coelho (local) on 2025/03/18 13:17

SLURM is the Simple Linux Utility for Resource Management and is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters.

Slurm is fully integrated in our system. You do not need to set any environment variables.

Links

Partitions

A partition is a subset of the cluster, a bundle of compute nodes with the same characteristics.

Based on access restrictions, our cluster is divided into different partitions. 'sinfo' only shows the partitions you are allowed to use; 'sinfo -a' shows all partitions.

A partition is selected by '-p PARTITIONNAME'.

Partition	No. Nodes	Cores/Node	Tot. Cores	RAM/Node (GB)	CPU	Remark/Restriction
itp	12	20	240	64	Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz	Common Usage
fplo	2	12	24	256	Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz	Group Valenti
fplo	4	16	32	256	Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz	Group Valenti
dfg-xeon	5	16	32	128	Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz	Group Valenti
dfg-xeon	7	20	140	128	Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz	Group Valenti
iboga	34	20	880	64	Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz	Group Rezzolla
dreama	1	40	40	1024	Intel(R) Xeon(R) CPU E7-4820 v3 @ 1.90GHz	Group Rezzolla
barcelona	8	40	320	192	Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz	Group Valenti
barcelona	1	40	40	512	Intel(R) Xeon(R) Silver 4316 CPU @ 2.30GHz	Group Valenti
mallorca	4	48	192	256	AMD EPYC 7352 24-Core Processor	Group Valenti
calea	36	64	2304	512	Intel(R) Xeon(R) Platinum 8358 CPU @ 2.10GHz	Group Rezzolla
bilbao	7	64	448	512	Intel Xeon(R) Gold 6540 @ 2.20GHz	Group Valenti
majortom	1	64	64	256	AMD EPYC 7513 32-Core Processor	Group Bleicher

Most nodes are reserved for exclusive use by their owning groups; the itp nodes are for common usage. Except for 'fplo' and 'majortom', all machines are connected via InfiniBand for all traffic (IP and inter-node MPI communication).
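To check which partitions you can use and how they are configured, a small shell function like the one below may help. This is a sketch: the name list_partitions is our own choice, not a Slurm command. The format string asks sinfo for the partition name (%P), node count (%D), CPUs per node (%c), memory per node in MB (%m) and maximum run time (%l):

```shell
# Possible addition to ~/.bashrc (list_partitions is a hypothetical name).
# -a: include partitions you are not allowed to use.
list_partitions() {
    if ! command -v sinfo >/dev/null 2>&1; then
        echo "sinfo not found: run this on the cluster" >&2
        return 1
    fi
    sinfo -a -o "%P %D %c %m %l"
}
```

Run it as list_partitions. Leave out -a to see only the partitions you may use; the default partition is marked with '*'.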

Submitting Jobs

In most cases you will want to submit a non-interactive job to be executed on our cluster.

This is very simple for serial (1 CPU) jobs:

  sbatch -p PARTITION jobscript.sh

where jobscript.sh is a shell script with your job commands.
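A minimal jobscript.sh for a serial job could look like the sketch below. It only uses standard shell commands plus the SLURM_JOB_ID variable that Slurm sets inside every job; my_program is a hypothetical placeholder for your own program:

```shell
#!/bin/bash
# Minimal serial jobscript.sh (sketch). Submit with:
#   sbatch -p PARTITION jobscript.sh

# Slurm sets SLURM_JOB_ID inside a job; the ":-local" fallback lets you
# try the script with plain "bash jobscript.sh" before submitting it.
job_id="${SLURM_JOB_ID:-local}"
node="$(hostname)"
echo "Job ${job_id} started on ${node} at $(date)"

# Replace with your own serial program (my_program is a hypothetical name):
# ./my_program input.dat > output.dat

echo "Job ${job_id} finished at $(date)"
```

Unless you choose another file, Slurm writes the output of the script to slurm-<jobid>.out in the directory you submitted from.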

Running Open MPI jobs is not much more complicated:

  sbatch -p PARTITION -n X jobscript.sh

where X is the number of desired MPI processes. Inside the jobscript, launch your program with:

  mpirun YOUREXECUTABLE

You don't have to specify the number of processes or the nodes to use: Slurm and Open MPI are integrated, so mpirun takes both from the job's allocation.
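A corresponding jobscript.sh for MPI might look like this sketch. SLURM_NTASKS and SLURM_JOB_NODELIST are set by Slurm inside the job; my_mpi_program is a hypothetical executable name, and the dry-run branch only exists so the script can be tried outside Slurm:

```shell
#!/bin/bash
# Sketch of an MPI jobscript.sh. Submit with, e.g.:
#   sbatch -p PARTITION -n 16 jobscript.sh

# Inside the job, SLURM_NTASKS holds the value of -n and
# SLURM_JOB_NODELIST the nodes assigned to the job.
ntasks="${SLURM_NTASKS:-1}"
nodes="${SLURM_JOB_NODELIST:-$(hostname)}"
echo "Starting ${ntasks} MPI processes on ${nodes}"

if [[ -n "${SLURM_JOB_ID:-}" ]]; then
    # No -np or hostfile: mpirun takes both from the Slurm allocation.
    mpirun ./my_mpi_program   # hypothetical executable name
else
    echo "Not inside a Slurm job; skipping mpirun (dry run)."
fi
```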

Running SMP jobs (multiple threads, not necessarily MPI): Running MPI jobs on a single node is recommended for the dfg-big nodes. These are big hosts with up to 64 CPUs per node, but only a 'slow' gigabit network connection. Launch SMP jobs with

  sbatch -p PARTITION -N 1 -n X jobscript.sh
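For a multi-threaded program, the jobscript mainly has to tell the program how many threads to start. The sketch below assumes an OpenMP program, which reads OMP_NUM_THREADS, and takes the thread count from SLURM_NTASKS (the X given with -n); my_threaded_program is a hypothetical name. For single-node MPI jobs, e.g. on the dfg-big nodes, keep -N 1 -n X but launch with mpirun as for ordinary MPI jobs:

```shell
#!/bin/bash
# Sketch of an SMP (multi-threaded) jobscript.sh. Submit with, e.g.:
#   sbatch -p PARTITION -N 1 -n 8 jobscript.sh

# -N 1 keeps all X cores on one node; SLURM_NTASKS holds X.
# Assumption: an OpenMP program, which reads OMP_NUM_THREADS.
export OMP_NUM_THREADS="${SLURM_NTASKS:-1}"
echo "Using ${OMP_NUM_THREADS} threads on $(hostname)"

if [[ -n "${SLURM_JOB_ID:-}" ]]; then
    ./my_threaded_program   # hypothetical executable name
else
    echo "Not inside a Slurm job; skipping the program (dry run)."
fi
```

A common alternative for purely threaded programs is -c X (--cpus-per-task) instead of -n X; Slurm then sets SLURM_CPUS_PER_TASK, which would be the variable to read.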

Defining Resource limits

By default, each job is allocated 2 GB of memory and a run time of 3 days. More resources can be requested with

  --mem-per-cpu=<MB>

where <MB> is the memory in megabytes. The virtual memory limit is 2.5 times the requested real memory limit.

The memory limit is not a hard limit. When you exceed it, part of your job's memory is swapped out; only when it uses more than 110% of the limit is your job killed. Requested memory is reserved and cannot be used by other jobs, so request conservatively to leave enough room for others.
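As a worked example of these numbers: --mem-per-cpu is counted per allocated CPU, so a job with -n 8 --mem-per-cpu=4000 reserves 8 x 4000 MB. The function below applies the 2.5x virtual memory factor and the 110% kill threshold stated above, under the assumption that both refer to the job's total request. mem_limits is a hypothetical helper, not a Slurm tool:

```shell
#!/bin/bash
# Assumption: the 2.5x virtual limit and the 110% kill threshold apply to
# the job's total request, i.e. mem-per-cpu times the number of CPUs.
mem_limits() {
    local per_cpu_mb=$1 ncpus=$2
    local reserved=$(( per_cpu_mb * ncpus ))
    local virtual=$(( reserved * 5 / 2 ))       # 2.5 x real request
    local kill_above=$(( reserved * 11 / 10 ))  # 110 % of the request
    echo "reserved=${reserved}MB virtual=${virtual}MB kill_above=${kill_above}MB"
}

# A job submitted with: sbatch -p itp -n 8 --mem-per-cpu=4000 jobscript.sh
mem_limits 4000 8
# -> reserved=32000MB virtual=80000MB kill_above=35200MB
```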

  -t <time>  or  --time=<time>

where <time> can be given in the format "days-hours". See the sbatch man page for more formats.
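To illustrate the "days-hours" format: 3-0 is the default of 3 days, and 2-12 means 2 days and 12 hours. The hypothetical helper below converts such a value to hours; it is only a sketch and handles just this one format:

```shell
#!/bin/bash
# Convert a Slurm "days-hours" time limit to hours (only this format).
days_hours_to_hours() {
    if [[ $1 =~ ^([0-9]+)-([0-9]+)$ ]]; then
        # 10# forces base 10, so values like "08" are not read as octal
        echo $(( 10#${BASH_REMATCH[1]} * 24 + 10#${BASH_REMATCH[2]} ))
    else
        echo "not in days-hours format: $1" >&2
        return 1
    fi
}

days_hours_to_hours 3-0    # default limit -> 72
days_hours_to_hours 2-12   # e.g. sbatch -t 2-12 ... -> 60
```

So sbatch -p itp -t 2-12 jobscript.sh requests 60 hours of run time.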

Memory Management

In Slurm you specify only one parameter: the limit for your real memory usage, which also determines where your job is started. The virtual memory of your job may be up to 2.5 times your requested memory. You can exceed your memory limit by 20%, but the excess will be swap space instead of real memory. This prevents crashes if your memory limit is a little too tight.
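To choose a limit that is not too tight, it helps to look at what a finished job actually used. Assuming job accounting is enabled on our cluster, sacct can report the requested memory (ReqMem) next to the peak resident memory (MaxRSS). job_mem_usage is a hypothetical wrapper name:

```shell
# Possible addition to ~/.bashrc (job_mem_usage is a hypothetical name).
# Usage: job_mem_usage JOBID
job_mem_usage() {
    if ! command -v sacct >/dev/null 2>&1; then
        echo "sacct not found: run this on the cluster" >&2
        return 1
    fi
    # ReqMem: requested memory; MaxRSS: peak resident memory per job step
    sacct -j "$1" --format=JobID,JobName,ReqMem,MaxRSS,Elapsed,State
}
```

Call it with the job ID that sbatch printed at submission.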

Inline Arguments

sbatch arguments can also be written into the job script, on lines starting with #SBATCH:

#!/bin/bash
#
# Choosing a partition:
#SBATCH -p itp

# Your job commands follow here, e.g.:
hostname
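A fuller sketch of a jobscript with all options given inline follows. sbatch only reads #SBATCH lines that come before the first executable command, so they must stay at the top. The job name, output file name and resource values are arbitrary examples to adapt to your job:

```shell
#!/bin/bash
#
# Partition:
#SBATCH -p itp
# Job name shown by squeue:
#SBATCH -J my_job
# Number of tasks (MPI processes):
#SBATCH -n 4
# Memory per CPU in MB:
#SBATCH --mem-per-cpu=3000
# Time limit in days-hours (here 1 day 12 hours):
#SBATCH -t 1-12
# Output file, %j is replaced by the job ID:
#SBATCH -o my_job-%j.out

# First executable command: no #SBATCH lines are read after this point.
node="$(hostname)"
echo "Job ${SLURM_JOB_ID:-local} with ${SLURM_NTASKS:-1} tasks on ${node}"
# mpirun ./my_mpi_program   # hypothetical program
```

Submit it with a plain sbatch jobscript.sh. Options given on the sbatch command line override the #SBATCH lines, e.g. sbatch -t 3-0 jobscript.sh.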
