SLURM
SLURM (Simple Linux Utility for Resource Management) is an open-source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters.
Slurm is fully integrated into our system. You do not need to set any environment variables.
Partitions
A partition is a subset of the cluster, a bundle of compute nodes with the same characteristics.
Based on access restrictions, our cluster is divided into different partitions. 'sinfo' only shows the partitions you are allowed to use. 'sinfo -a' shows all partitions.
A partition is selected by '-p PARTITIONNAME'.
Partition | Nodes | Cores per node | Total cores | RAM (GB) | CPU | Remark |
---|---|---|---|---|---|---|
housewives | 12 | 4 | 48 | 16 | Dual Core AMD Opteron(tm) Processor 270, 2.0 GHz | |
dfg | 9 | 8 | 72 | 32 | Quad-Core AMD Opteron(tm) Processor 2346 HE | Restricted access |
dfg | 8 | 8 | 64 | 32/64 | Quad-Core AMD Opteron(tm) Processor 2376 | Infiniband, Restricted access |
dfg | 8 | 12 | 96 | 32/64 | Six-Core AMD Opteron(tm) Processor 2427 | Infiniband, Restricted access |
quantum | 8 | 12 | 96 | 32/64 | Six-Core AMD Opteron(tm) Processor 2427 | Infiniband, Restricted access |
dfg-big | 3 | 32 | 96 | 128 | 8-Core AMD Opteron(tm) Processor 6128 | Restricted access |
dfg-big | 3 | 48 | 144 | 128/256 | 12-Core AMD Opteron(tm) Processor 6168 | Restricted access |
dfg-big | 4 | 64 | 256 | 128/256 | 16-Core AMD Opteron(tm) Processor 6272 | Restricted access |
Access to the DFG nodes (dfg and dfg-big) is restricted to members of the SFB/TR49. If you do not belong to that group but want to test and develop programs for the Infiniband network, please talk to the administrator. Access to the partition 'quantum' is restricted to the group of Prof. Hofstetter.
Submitting Jobs
In most cases you want to submit a non-interactive job to be executed on our cluster.
This is very simple for serial (1 CPU) jobs:
sbatch -p PARTITION jobscript.sh
where jobscript.sh is a shell script with your job commands.
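As a minimal sketch of what such a job script might contain: the banner function and the placeholder workload below are illustrative assumptions, not part of our setup. SLURM_JOB_ID is an environment variable that Slurm sets inside a running job.

```shell
#!/bin/bash
# jobscript.sh: hypothetical serial job script (name and contents are placeholders)

# SLURM_JOB_ID is set by Slurm inside a running job;
# fall back to "none" so the script can also be tried by hand.
job_banner() {
    echo "job=${SLURM_JOB_ID:-none} host=$(uname -n)"
}

job_banner

# Replace this line with your own job commands.
echo "serial workload would run here"
```

The banner line ends up in the job's output file and helps to see later on which node the job ran.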
Running Open MPI jobs is not much more complicated:
sbatch -p PARTITION -n X jobscript.sh
where X is the number of desired MPI processes. Launch the job in the jobscript with:
mpirun YOUREXECUTABLE
You don't have to worry about the number of processes or specific nodes. Slurm and Open MPI are aware of each other.
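A job script for an MPI run could look roughly like the following sketch. The script name mpi_job.sh and the executable ./my_mpi_program are placeholders, and the check around mpirun only exists so that the script can be tried outside the cluster. SLURM_NTASKS is set by Slurm when -n is given.

```shell
#!/bin/bash
# mpi_job.sh: hypothetical MPI job script.
# Submit with, for example: sbatch -p PARTITION -n 16 mpi_job.sh

# Slurm sets SLURM_NTASKS when -n is given; default to 1 outside a job.
describe_allocation() {
    echo "Slurm allocated ${SLURM_NTASKS:-1} task(s)"
}

describe_allocation

# ./my_mpi_program is a placeholder for your own executable.
# No process count is passed to mpirun: as described above, it is taken from Slurm.
if command -v mpirun >/dev/null 2>&1 && [ -x ./my_mpi_program ]; then
    mpirun ./my_mpi_program
else
    echo "mpirun or ./my_mpi_program not found, skipping launch"
fi
```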
If you want Infiniband for your MPI job (usually a good idea unless all processes run on the same node), you have to request the feature 'infiniband':
sbatch -p dfg -C infiniband -n X jobscript.sh
Note: Infiniband is only available for the partitions dfg and quantum.
Running SMP jobs (multiple threads, not necessarily MPI) or MPI jobs on a single node is recommended for the dfg-big nodes. These are big hosts with up to 64 cores per node (see the table above) but a slow network connection. Launch SMP jobs with:
sbatch -p PARTITION -N 1 -n X jobscript.sh
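For a threaded program, the job script usually has to tell the program how many threads to start. The sketch below assumes an OpenMP executable (the name ./my_threaded_program is a placeholder) and derives the thread count from SLURM_NTASKS, which reflects the -n value of the submit command.

```shell
#!/bin/bash
# smp_job.sh: hypothetical threaded job script.
# Submit with, for example: sbatch -p dfg-big -N 1 -n 8 smp_job.sh

# Use one thread per allocated CPU; SLURM_NTASKS reflects the -n value.
set_thread_count() {
    export OMP_NUM_THREADS="${SLURM_NTASKS:-1}"
}

set_thread_count
echo "running with ${OMP_NUM_THREADS} thread(s)"

# ./my_threaded_program is a placeholder for an OpenMP executable.
if [ -x ./my_threaded_program ]; then
    ./my_threaded_program
fi
```

Programs that do not use OpenMP need their own way of receiving the thread count, for example a command line argument.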
Defining Resource limits
By default each job is allocated 2 GB of memory and a run time of 3 days. More resources can be requested with
--mem-per-cpu=<MB>
where <MB> is the memory in megabytes. The virtual memory limit is 2.5 times the requested real memory limit.
The memory limit is not a hard limit. When you exceed it, your memory is swapped out. Your job is killed only when it uses more than 150% of the limit. Requested memory is blocked from use by other jobs, so be conservative and leave enough room for them.
-t <time> or --time=<time>
where <time> can be given as "days-hours". See the sbatch man page for more formats.
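As a worked example of the factors mentioned above, the following sketch computes the derived limits for a request of 4096 MB per CPU and prints a matching submit command. The helper mem_limits is hypothetical and only illustrates the arithmetic; PARTITION and jobscript.sh are placeholders.

```shell
#!/bin/bash
# Limits derived from a --mem-per-cpu request (in MB), using the factors stated above:
# virtual memory limit = 2.5 x request, job killed above 150% of the request.
mem_limits() {
    echo "requested=$1 virtual=$(( $1 * 5 / 2 )) kill_above=$(( $1 * 3 / 2 ))"
}

mem_limits 4096

# Matching submit command: 4096 MB per CPU, run time 1 day and 12 hours.
# PARTITION and jobscript.sh are placeholders.
echo "sbatch -p PARTITION --mem-per-cpu=4096 -t 1-12 jobscript.sh"
```

For 4096 MB this gives a virtual memory limit of 10240 MB and a kill threshold of 6144 MB.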
SLURM vs. SGE
This chapter compares the new batch system SLURM with the old SGE.
Partitions
Slurm has a slightly different view of the cluster. The nodes of a cluster are organized in partitions. To submit a job you have to choose one partition in which to run it.
Comparison of commands
The following table shows the most important Slurm commands compared to the corresponding Grid Engine commands.
SGE | Slurm | Description |
---|---|---|
qstat | squeue | Show running jobs |
qsub | sbatch | Submit a batch job |
qlogin | srun | Run interactive commands |
qdel | scancel | Delete a batch job |
qhost | sinfo | Get info about nodes |
qmon | sview | Graphical Frontend |
Parallel environments
Slurm has no concept of parallel environments, because it has been designed for parallel execution from the start. This makes things easier, but gives you more responsibility when allocating cluster resources.
Memory Management
In Grid Engine you requested a hard limit for your memory, which should be higher than the estimated real memory usage, and additionally a virtual_free value, which described your effective memory requirements.
In Slurm you specify only one parameter, which is the limit for your real memory usage and determines where your job is started. The virtual memory of your job may be up to 2.5 times your requested memory.
Inline Arguments
sbatch arguments can be written in the jobfile:
#! /bin/bash
#
# Choosing a partition:
#SBATCH -p housewives
YOUR JOB COMMANDS....
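Several options can be combined in the same way. The following job file is only a sketch: all values are examples, and the function job_main stands in for your own job commands. Because the #SBATCH lines are ordinary comments for the shell, the file can also be run by hand for a quick test.

```shell
#!/bin/bash
#
# Hypothetical job file; all values are examples only.
# Partition:
#SBATCH -p housewives
# Number of tasks:
#SBATCH -n 1
# Memory per CPU in MB:
#SBATCH --mem-per-cpu=4096
# Run time (1 day, 12 hours):
#SBATCH -t 1-12
# Output file (%j expands to the job ID):
#SBATCH -o slurm-%j.out

# The first command ends the #SBATCH section; directives after it are ignored.
job_main() {
    echo "job ${SLURM_JOB_ID:-none} started"
}

job_main
```

Options given on the sbatch command line take precedence over the values in the job file, so the same file can be reused with different partitions or limits.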