Wiki source code of GPU Server

Version 9.1 by Thomas Coelho (local) on 2024/04/25 10:49

{{box cssClass="floatinginfobox" title="Contents"}}
{{toc/}}
{{/box}}

= The GPU Server =

The GPU machine is a two-socket server with AMD EPYC 7313 processors. Each processor has 16 cores with SMT enabled (32 threads), giving 32 cores and 64 threads in total. The server has 512 GB of memory and 2 x 4 TB U.3 (NVMe) SSDs as fast storage. For computing, it is equipped with 8 AMD Instinct MI50 GPU cards.

Access is provided through Slurm via the separate partition "gpu".

AMD ROCm is installed as the software stack. It provides both the native ROCm (HIP) programming interface and OpenCL.

(% class="box infomessage" %)
(((
Because GPU computing is a new discipline, we can only provide limited information here. If you have something to share, please feel free to edit this page.
)))

== Submitting ==

In Slurm, GPUs are handled as generic resources (GRES).

Each GPU is an individually allocatable item, and you can allocate up to 8 GPUs. To request them, add "~-~-gres=gpu:N" to your srun or sbatch options, where N is the number of GPUs.

CPUs are requested as usual.
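
If you launch jobs from a Python workflow, the options above can also be assembled programmatically. The following is a minimal sketch: the helper name gpu_srun_args and its parameters are hypothetical, and the limit of 8 GPUs is taken from the hardware description of this server. It only builds the argument list and does not contact Slurm.

{{code language="python"}}
import shlex

# The GPU server has 8 MI50 cards, so at most 8 GPUs can be requested.
MAX_GPUS = 8


def gpu_srun_args(n_gpus, command=("bash",), interactive=True,
                  cpus_per_task=None, partition="gpu"):
    """Build an srun argument list requesting n_gpus GPUs (hypothetical helper)."""
    if not 1 <= n_gpus <= MAX_GPUS:
        raise ValueError(f"n_gpus must be between 1 and {MAX_GPUS}, got {n_gpus}")
    args = ["srun", "-p", partition, f"--gres=gpu:{n_gpus}"]
    # CPUs are requested with the usual Slurm options.
    if cpus_per_task is not None:
        args.append(f"--cpus-per-task={cpus_per_task}")
    # --pty attaches a terminal, as needed for an interactive shell.
    if interactive:
        args.append("--pty")
    args.extend(command)
    return args


if __name__ == "__main__":
    # Prints: srun -p gpu --gres=gpu:2 --pty bash
    print(shlex.join(gpu_srun_args(2)))
{{/code}}

To actually start the job from Python, the returned list can be passed to subprocess.run(). For two GPUs it yields the same command as the interactive example on this page.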

Example: interactive session with 2 GPUs:

{{code language="bash"}}
srun -p gpu --gres=gpu:2 --pty bash
{{/code}}
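
Inside the allocation, programs usually learn which GPUs they were given from environment variables. Slurm can set variables such as ROCR_VISIBLE_DEVICES, CUDA_VISIBLE_DEVICES and GPU_DEVICE_ORDINAL for GPU allocations, but which of them are set on this server depends on its Slurm configuration, so treat this sketch as an assumption to verify. The function name allocated_gpus is hypothetical.

{{code language="python"}}
import os

# Variables Slurm may use to advertise the allocated GPUs. Which ones are
# actually set depends on the cluster's Slurm configuration (assumption).
GPU_ENV_VARS = ("ROCR_VISIBLE_DEVICES", "CUDA_VISIBLE_DEVICES", "GPU_DEVICE_ORDINAL")


def allocated_gpus(environ=None):
    """Return the GPU identifiers advertised to this job, or [] if none are set."""
    env = os.environ if environ is None else environ
    for name in GPU_ENV_VARS:
        value = env.get(name, "").strip()
        if value:
            # Values are comma-separated identifiers, usually indices such as "0,1".
            return [item.strip() for item in value.split(",") if item.strip()]
    return []


if __name__ == "__main__":
    print("Allocated GPUs:", allocated_gpus())
{{/code}}

Run in a session started with "~-~-gres=gpu:2", it should report two entries if the variables are set.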

== PyTorch ==

PyTorch is a popular machine learning framework. An up-to-date version with ROCm support must be installed with pip3 into a Python virtual environment (venv). Create and activate one first:

{{code language="bash"}}
# Create a virtual environment in the directory "venv" and activate it
python3 -m venv venv
. venv/bin/activate
{{/code}}

Install PyTorch:

{{code language="bash"}}
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.0
{{/code}}
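
To check that pip picked up the ROCm build rather than a CPU or CUDA wheel, you can inspect the installed version string. This sketch assumes that wheels from the ROCm index carry a local version suffix of the form +rocmX.Y (for example a version such as 2.3.1+rocm6.0); the helper rocm_tag is hypothetical. Run it inside the activated venv.

{{code language="python"}}
from importlib.metadata import PackageNotFoundError, version


def rocm_tag(version_string):
    """Return the ROCm version from a wheel version like '2.3.1+rocm6.0', else None."""
    # The local version label follows the "+" sign (assumed format: rocmX.Y).
    _, sep, local = version_string.partition("+")
    if sep and local.startswith("rocm"):
        return local[len("rocm"):]
    return None


if __name__ == "__main__":
    try:
        installed = version("torch")
    except PackageNotFoundError:
        print("torch is not installed in this environment")
    else:
        tag = rocm_tag(installed)
        print(f"torch {installed}:", f"ROCm {tag} build" if tag else "not a ROCm build")
{{/code}}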
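
You can test the installation with the following snippet. The ROCm build of PyTorch exposes AMD GPUs through the torch.cuda API, so it prints True when a GPU is available:
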
{{code language="python"}}
import torch

# True if PyTorch can use at least one GPU
print(torch.cuda.is_available())
{{/code}}
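
For more detail, the sketch below also reports the HIP version PyTorch was built against and the names of the visible GPUs, using torch.version.hip, torch.cuda.device_count() and torch.cuda.get_device_name(). The function describe_gpus is a hypothetical helper; it returns an empty device list if PyTorch is missing or no GPU is visible, so it can also be run outside a GPU job.

{{code language="python"}}
def describe_gpus():
    """Return (hip_version, [(index, name), ...]) for the GPUs PyTorch can see.

    hip_version is None if PyTorch is missing or is not a ROCm build; the
    device list is empty if PyTorch is missing or no GPU is visible.
    """
    try:
        import torch
    except ImportError:
        return None, []
    # On ROCm builds torch.version.hip holds the HIP version string, else None.
    hip_version = getattr(torch.version, "hip", None)
    # ROCm builds expose AMD GPUs through the torch.cuda API.
    if not torch.cuda.is_available():
        return hip_version, []
    devices = [(i, torch.cuda.get_device_name(i))
               for i in range(torch.cuda.device_count())]
    return hip_version, devices


if __name__ == "__main__":
    hip_version, devices = describe_gpus()
    print("HIP version:", hip_version)
    for index, name in devices:
        print(f"GPU {index}: {name}")
{{/code}}

In a session allocated with "~-~-gres=gpu:2", it should list two devices.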

== Links ==

GPU cards: [[https:~~/~~/www.amd.com/en/products/professional-graphics/instinct-mi50>>https://www.amd.com/en/products/professional-graphics/instinct-mi50]]

ROCm documentation: [[https:~~/~~/rocm.docs.amd.com/en/latest/rocm.html>>https://rocm.docs.amd.com/en/latest/rocm.html]]

PyTorch: [[https:~~/~~/pytorch.org/>>https://pytorch.org/]]
