Revision as of 13:02, 8 February 2012

SLURM is the Simple Linux Utility for Resource Management and is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters.

Slurm is fully integrated in our system. You do not need to set any environment variables.

Partitions

A partition is a subset of the cluster, a bundle of compute nodes with the same characteristics.

Based on access restrictions, our cluster is divided into different partitions. 'sinfo' only shows the partitions you are allowed to use; 'sinfo -a' shows all partitions.

A partition is selected by '-p PARTITIONNAME'.

Partition	Nodes	Cores per node	Total slots	RAM (GB)	CPU	Remark
housewives	15	4	72	16	Dual Core AMD Opteron(tm) Processor 270, 2.0 GHz
dfg	9	8	72	32	Quad-Core AMD Opteron(tm) Processor 2346 HE	Restricted access
dfg	8	8	64	32/64	Quad-Core AMD Opteron(tm) Processor 2376	Infiniband, Restricted access
dfg	8	12	96	32/64	Six-Core AMD Opteron(tm) Processor 2427	Infiniband, Restricted access
quantum	8	12	96	32/64	Six-Core AMD Opteron(tm) Processor 2427	Infiniband, Restricted access
dfg-big	3	32	96	128	8-Core AMD Opteron(tm) Processor 6128	Restricted access
dfg-big	3	48	144	128/256	12-Core AMD Opteron(tm) Processor 6168	Restricted access

Access to the DFG nodes (dfg and dfg-big) is restricted to members of the SFB/TR49. If you do not belong to that group but want to test and develop programs for the Infiniband network, please talk to the administrator. Access to the partition 'quantum' is restricted to the group of Prof. Hofstetter.

Submitting Jobs

In most cases you will want to submit a non-interactive job to be executed on our cluster.

This is very simple for serial (1 CPU) jobs:

  sbatch jobscript.sh

where jobscript.sh is a shell script with your job commands.
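
As a minimal sketch of such a job script: the block below writes a jobscript.sh and submits it only where sbatch is installed. The partition name comes from the table above. The job body (printing the host name) is a placeholder assumption, not a prescribed workload.

```shell
#!/bin/bash
# Write a minimal serial job script (the job body is a placeholder).
cat > jobscript.sh <<'EOF'
#!/bin/bash
# Partition from the table above; replace it with one you may use.
#SBATCH -p housewives

echo "Serial job running on $(hostname)"
EOF

# Submit only where Slurm is installed; otherwise just report.
if command -v sbatch >/dev/null 2>&1; then
    sbatch jobscript.sh
else
    echo "sbatch not found: jobscript.sh written but not submitted"
fi
```

After submission, 'squeue' lists the job while it is pending or running.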

Running Open MPI jobs is not much more complicated:

  sbatch -n X jobscript.sh

where X is the number of desired MPI processes. Launch the job in the jobscript with:

  mpirun YOUREXECUTABLE

You don't have to specify the number of processes or the nodes to use: Slurm and Open MPI are aware of each other, so mpirun takes both from the job allocation.
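
A sketch of an MPI job script along these lines is shown below. The executable name ./my_mpi_program, the file name mpi_job.sh and the process count of 8 are hypothetical. SLURM_NTASKS is the environment variable Slurm sets from -n; it is used here only for a log message.

```shell
#!/bin/bash
# Write an MPI job script; ./my_mpi_program is a hypothetical executable.
cat > mpi_job.sh <<'EOF'
#!/bin/bash
#SBATCH -p housewives

# SLURM_NTASKS holds the value passed with -n (fallback text if unset).
echo "Starting ${SLURM_NTASKS:-unknown} MPI processes"

# No process count or host list: mpirun takes both from the allocation.
mpirun ./my_mpi_program
EOF

NPROCS=8   # example value for X
if command -v sbatch >/dev/null 2>&1; then
    sbatch -n "$NPROCS" mpi_job.sh
else
    echo "would run: sbatch -n $NPROCS mpi_job.sh"
fi
```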

SMP jobs (multiple threads, not necessarily MPI) run on a single node. Running MPI jobs on a single node is also recommended for the dfg-big nodes: these are large hosts with up to 48 cores per node, but a slow network connection. Launch SMP jobs with:

  sbatch -N 1 -n X jobscript.sh
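
For a threaded program, the job script has to pass the number of requested slots on to the program itself. The sketch below assumes an OpenMP executable (the hypothetical ./my_threaded_program) and sets OMP_NUM_THREADS from SLURM_NTASKS, which matches the -n convention used on this page. Other threading libraries need their own setting.

```shell
#!/bin/bash
# Write an SMP job script; ./my_threaded_program is a hypothetical
# OpenMP executable.
cat > smp_job.sh <<'EOF'
#!/bin/bash
# dfg-big has restricted access; pick a partition you may use.
#SBATCH -p dfg-big

# Use as many threads as slots were requested with -n (fallback: 1).
export OMP_NUM_THREADS="${SLURM_NTASKS:-1}"
./my_threaded_program
EOF

NSLOTS=32   # example value for X, at most the cores of one node
if command -v sbatch >/dev/null 2>&1; then
    sbatch -N 1 -n "$NSLOTS" smp_job.sh
else
    echo "would run: sbatch -N 1 -n $NSLOTS smp_job.sh"
fi
```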

SLURM vs. SGE

This chapter compares the new batch system SLURM with the old SGE.

Partitions

Slurm has a slightly different view of the cluster. The nodes of a cluster are organized in partitions. To submit a job you have to choose the partition in which the job should run.

Comparison of commands

The following table shows the most important Slurm commands next to their Grid Engine (SGE) counterparts.

Comparison of SGE and Slurm
SGE	Slurm	Description
qstat	squeue	Show running jobs
qsub	sbatch	Submit a batch job
qlogin	srun	Run interactive commands
qdel	scancel	Delete a batch job
qhost	sinfo	Get info about nodes
qmon	sview	Graphical Frontend
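
For users who still have the SGE names in mind, the table can be turned into a small lookup helper. This is only an illustrative sketch: the function name sge_to_slurm is hypothetical, and it covers exactly the six commands listed above.

```shell
#!/bin/bash
# Print the Slurm counterpart of an SGE command, following the table above.
sge_to_slurm() {
    case "$1" in
        qstat)  echo squeue ;;
        qsub)   echo sbatch ;;
        qlogin) echo srun ;;
        qdel)   echo scancel ;;
        qhost)  echo sinfo ;;
        qmon)   echo sview ;;
        *)      echo "no Slurm equivalent listed for '$1'" >&2
                return 1 ;;
    esac
}

sge_to_slurm qsub
```

Note that the options of the two systems differ as well, so the helper only maps command names, not full command lines.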

Parallel environments

Slurm has no concept of parallel environments; it has been designed for parallel execution. This makes things easier, but gives you more responsibility when allocating cluster resources.

Inline Arguments

sbatch arguments can be written in the jobfile:

#! /bin/bash
#
# Choosing a partition:
#SBATCH -p housewives

YOUR JOB COMMANDS....
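
Several options can be combined in the same way, one per #SBATCH line. The sketch below uses the standard sbatch short options -p, -n, -J, -o and -t. The job name, output file name, task count and time limit are illustrative values, not site defaults.

```shell
#!/bin/bash
# Write a job script that carries its sbatch options inline.
cat > inline_job.sh <<'EOF'
#!/bin/bash
# Partition to run in:
#SBATCH -p housewives
# Number of tasks:
#SBATCH -n 4
# Job name (hypothetical):
#SBATCH -J example
# Output file; %j is replaced by the job ID:
#SBATCH -o example-%j.out
# Time limit as hh:mm:ss:
#SBATCH -t 00:10:00

echo "Job ${SLURM_JOB_ID:-unknown} started"
EOF

# With all options inline, the submit command needs no arguments.
if command -v sbatch >/dev/null 2>&1; then
    sbatch inline_job.sh
else
    echo "would run: sbatch inline_job.sh"
fi
```

Options given on the sbatch command line take precedence over the #SBATCH lines, so 'sbatch -n 8 inline_job.sh' would request 8 tasks instead of 4.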

Links

Homepage [1]


