Wiki source code of GPU Server

Version 4.1 by Thomas Coelho (local) on 2023/11/06 10:58

version	line-number	content
1.1	1	{{box cssClass="floatinginfobox" title="Contents"}}
	2	{{toc/}}
	3	{{/box}}
	4
	5	= The GPU Server =
	6
2.2	7	The GPU machine is a two-socket server with AMD EPYC 7313 processors. Each processor has 16 cores; SMT is currently enabled (32 threads per processor). It comes with 512 GB of memory and 2 x 4 TB U.3 (NVMe) SSDs as fast storage. There are 8 AMD Instinct MI50 GPU cards for computing.
1.1	8
	9	Access is provided through Slurm via the separate partition "gpu".
	10
	11	AMD ROCm is installed as the software stack. It provides the ROCm and OpenCL interfaces.
	12
	13	(% class="box infomessage" %)
	14	(((
	15	Because GPU computing is a new discipline, we can only provide limited information here. If you have something to share, please feel free to edit this page.
	16	)))
	17
	18	== Submitting ==
	19
	20	GPUs are handled as generic resources in Slurm (gres).
	21
	22	Each GPU is handled as an allocatable item. You can allocate up to 8 GPUs by adding "~-~-gres=gpu:N", where N is the number of GPUs.
	23
	24	CPUs are handled as usual.
	25
	26	Example: interactive session with 2 GPUs:
	27
	28	{{code language="bash"}}
	29	srun -p gpu --gres=gpu:2 --pty bash
	30	{{/code}}
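
For non-interactive jobs, the same options can be given as #SBATCH directives in a batch script. The following is a minimal sketch, not a tested recipe for this machine. It assumes that sbatch accepts a Python script with a shebang line, and that Slurm exports the allocated GPUs in one or more of the listed environment variables; which of them are set depends on the Slurm version and configuration. The file name gpu_job.py and the helper allocated_gpu_env are hypothetical.

{{code language="python"}}
#!/usr/bin/env python3
#SBATCH --partition=gpu
#SBATCH --gres=gpu:2
"""Report which GPUs Slurm has assigned to this job."""
import os

# Assumption: Slurm may set any subset of these, depending on its configuration.
GPU_ENV_VARS = ("SLURM_JOB_GPUS", "ROCR_VISIBLE_DEVICES", "CUDA_VISIBLE_DEVICES")


def allocated_gpu_env(environ=os.environ):
    """Return the GPU-related variables that are present in the environment."""
    return {name: environ[name] for name in GPU_ENV_VARS if name in environ}


if __name__ == "__main__":
    found = allocated_gpu_env()
    if not found:
        print("No GPU variables set; is this running inside a GPU allocation?")
    for name, value in found.items():
        print(f"{name}={value}")
{{/code}}

Submit the script with "sbatch gpu_job.py". By default the output is written to slurm-<jobid>.out in the submission directory.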
	31
	32	== PyTorch ==
	33
	34	A popular framework for machine learning is PyTorch. An up-to-date version with ROCm support must be installed with pip3 in a venv.
	35
	36	{{code language="bash"}}
	37	python3 -m venv venv
	38	. venv/bin/activate
	39	{{/code}}
	40
	41	Install PyTorch:
	42
4.1	43	{{code language="bash"}}
	44	pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.6
	45	{{/code}}
1.1	46
4.1	47	(% class="wikigeneratedid" %)
	48	You can test the installation with:
	49
	50	{{code language="python"}}
	51	import torch
	52
	53	# ROCm builds of PyTorch expose the GPUs through the torch.cuda API.
	54	print(torch.cuda.is_available())  # True if a GPU is usable
	55	{{/code}}
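
A slightly more detailed check lists the devices that PyTorch can see, which is useful for confirming that the number of GPUs matches what was requested from Slurm. This is a sketch under the assumption that it runs inside the venv created above and within a GPU allocation. The function name describe_gpus is hypothetical; it returns an empty list when PyTorch is not installed or no GPU is visible.

{{code language="python"}}
import importlib.util


def describe_gpus():
    """Return one description line per GPU visible to PyTorch."""
    if importlib.util.find_spec("torch") is None:
        return []  # PyTorch is not installed in this environment
    import torch

    if not torch.cuda.is_available():
        return []  # no GPU visible, e.g. outside a Slurm GPU allocation
    return [
        f"GPU {i}: {torch.cuda.get_device_name(i)}"
        for i in range(torch.cuda.device_count())
    ]


if __name__ == "__main__":
    gpus = describe_gpus()
    print(f"{len(gpus)} GPU(s) visible")
    for line in gpus:
        print(line)
{{/code}}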
	56
1.1	59	== Links ==
	60
2.2	61	GPU Cards: [[https:~~/~~/www.amd.com/en/products/professional-graphics/instinct-mi50>>https://www.amd.com/en/products/professional-graphics/instinct-mi50]]
	62
1.1	63	ROCm documentation: [[https:~~/~~/rocm.docs.amd.com/en/latest/rocm.html>>https://rocm.docs.amd.com/en/latest/rocm.html]]
	64
	65
	66
	67