Wiki source code of GPU Server

Version 12.1 by Thomas Coelho (local) on 2024/06/07 10:12

{{box cssClass="floatinginfobox" title="**Contents**"}}
{{toc/}}
{{/box}}

= The GPU Server =

The GPU machine is a two-socket server with AMD EPYC 7313 processors. Each processor has 16 cores with SMT enabled (32 threads), giving 32 cores and 64 threads in total. It has 512 GB of memory and 2 x 4 TB U.3 NVMe SSDs as fast storage. There are **8 AMD Instinct MI50** GPU cards for computing.

Access is provided through Slurm via the separate partition "gpu".

AMD ROCm is installed as the software stack. It supports the HIP and OpenCL programming interfaces. The current ROCm stack is version 6.1. This is also packaged in Ubuntu 6.1.

(% class="box infomessage" %)
(((
Because GPU computing is a new discipline, we can only provide limited information here. If you have something to share, please feel free to edit this page.
)))

{{warning}}
To get built-in ROCm support in Slurm, this machine has already been upgraded to Ubuntu 24.04. The distribution itself ships parts of the ROCm stack, which are a mixture of versions 5.7 and 6.0. Official support from AMD for Ubuntu 24.04 is not yet available. PyTorch has been tested successfully with this setup.
{{/warning}}

== Submitting ==

GPUs are handled as generic resources (gres) in Slurm.

Each GPU is an allocatable item, and you can allocate up to 8 GPUs. Request them by adding "~-~-gres=gpu:N", where N is the number of GPUs.

CPUs are requested as usual, e.g. with "~-~-ntasks" or "~-~-cpus-per-task".

Example: interactive session with 2 GPUs:

{{code language="bash"}}
srun -p gpu --gres=gpu:2 --pty bash
{{/code}}
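
For non-interactive work, the same resources can be requested in a batch script. The sketch below is a Python job script: Slurm reads the #SBATCH lines at the top of any script that starts with a shebang, so the options mirror the srun example above. The file name gpu_job.py and the one-hour time limit are hypothetical, and the list of environment variables checked for the assigned GPUs is an assumption, because which of them Slurm exports depends on the Slurm version and site configuration.

{{code language="python"}}
#!/usr/bin/env python3
#SBATCH --partition=gpu
#SBATCH --gres=gpu:2
#SBATCH --time=01:00:00
# Minimal sketch of a Slurm batch job written in Python (hypothetical name:
# gpu_job.py). Slurm only parses #SBATCH lines that appear before the first
# executable statement, so they must stay directly below the shebang.
import os

# Assumption: one of these variables lists the GPUs assigned to the job;
# which one is set depends on the Slurm version and configuration.
GPU_ENV_VARS = ("ROCR_VISIBLE_DEVICES", "HIP_VISIBLE_DEVICES", "CUDA_VISIBLE_DEVICES")


def parse_device_list(value):
    """Turn a comma-separated list such as '0,1' into [0, 1]."""
    return [int(item) for item in value.split(",") if item.strip()]


def visible_gpus(env):
    """Return the GPU indices from the first GPU variable set in env."""
    for name in GPU_ENV_VARS:
        value = env.get(name)
        if value:
            return parse_device_list(value)
    return []


if __name__ == "__main__":
    job_id = os.environ.get("SLURM_JOB_ID", "<not running under Slurm>")
    print(f"Job {job_id} sees GPUs: {visible_gpus(os.environ)}")
{{/code}}

Submit it with "sbatch gpu_job.py"; by default Slurm writes the output to slurm-%j.out in the submission directory, where %j is the job ID.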

== PyTorch ==

PyTorch is a popular framework for machine learning. An up-to-date version with ROCm support must be installed with pip3 in a virtual environment (venv):

{{code language="bash"}}
# Create the virtual environment in ./venv and activate it
python3 -m venv venv
. venv/bin/activate
{{/code}}
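
Before installing packages, it can be worth confirming that python3 now runs from the venv, so that pip3 installs into it rather than into the system or user site-packages. A minimal check using only the standard sys module: inside a venv, sys.prefix points to the venv directory while sys.base_prefix still points to the base installation. The function name is illustrative.

{{code language="python"}}
import sys


def in_virtualenv():
    """True when the running interpreter belongs to a venv (Python 3.3+)."""
    # A venv sets sys.prefix to the venv directory, while sys.base_prefix
    # still points at the Python installation the venv was created from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("venv active:", in_virtualenv(), "prefix:", sys.prefix)
{{/code}}

Run it with python3 after activating the venv; it should report True and a prefix inside the venv directory.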

Install PyTorch:

{{code language="bash"}}
# Install the ROCm 6.0 builds of PyTorch from the PyTorch wheel index
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.0
{{/code}}

At the time of writing, PyTorch is not yet available for ROCm 6.1. Please check the PyTorch website for updates.
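
Because the wheel above targets ROCm 6.0 while the installed stack is 6.1, it can help to check which HIP/ROCm version the installed PyTorch was built against. PyTorch records this in torch.version.hip, which is None on CUDA-only or CPU-only builds. The sketch below only reads that attribute; the helper name is hypothetical.

{{code language="python"}}
import importlib.util


def torch_hip_version():
    """Return the HIP/ROCm version PyTorch was built with, or None.

    None means PyTorch is not installed, or it is a CUDA-only/CPU-only build.
    """
    if importlib.util.find_spec("torch") is None:
        return None
    import torch
    # torch.version.hip is a version string on ROCm builds and None otherwise.
    return getattr(torch.version, "hip", None)


if __name__ == "__main__":
    version = torch_hip_version()
    if version is None:
        print("No ROCm-enabled PyTorch found in this environment.")
    else:
        print("PyTorch was built against HIP/ROCm", version)
{{/code}}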

You can test the installation with:

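{{code language="python"}}
import torch

# ROCm builds of PyTorch expose AMD GPUs through the torch.cuda API.
# Run this inside a job on the gpu partition; it prints True if a GPU is usable.
print(torch.cuda.is_available())
{{/code}}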
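
torch.cuda.is_available() only reports that a GPU is visible. A slightly stronger check, sketched below under the assumption that it runs inside a job on the gpu partition, lists each allocated GPU with torch.cuda.device_count() and torch.cuda.get_device_name(), runs a small matrix multiplication on it, and compares the result with the CPU. The function name, the 256 x 256 matrix size and the tolerances are arbitrary choices.

{{code language="python"}}
import importlib.util


def gpu_smoke_test(size=256):
    """Multiply a random matrix on every visible GPU and compare with the CPU.

    Returns a list of (index, device name, matches CPU) tuples; the list is
    empty when PyTorch is missing or no GPU is visible.
    """
    if importlib.util.find_spec("torch") is None:
        return []
    import torch

    results = []
    if not torch.cuda.is_available():
        return results
    a_cpu = torch.randn(size, size)
    expected = a_cpu @ a_cpu.T
    for index in range(torch.cuda.device_count()):
        device = torch.device("cuda", index)  # "cuda" also addresses ROCm GPUs
        a = a_cpu.to(device)
        product = (a @ a.T).cpu()  # copying back to the CPU waits for the GPU
        # Loose tolerances: GPU and CPU may sum the products in a different order.
        ok = torch.allclose(product, expected, rtol=1e-3, atol=1e-3)
        results.append((index, torch.cuda.get_device_name(index), ok))
    return results


if __name__ == "__main__":
    for index, name, ok in gpu_smoke_test():
        print(f"GPU {index}: {name} -> {'OK' if ok else 'MISMATCH'}")
{{/code}}

With "~-~-gres=gpu:2" the loop should only see the two GPUs assigned to the job.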

== Links ==

GPU Cards: [[https:~~/~~/www.amd.com/en/products/professional-graphics/instinct-mi50>>https://www.amd.com/en/products/professional-graphics/instinct-mi50]]

ROCm documentation: [[https:~~/~~/rocm.docs.amd.com/en/latest/rocm.html>>https://rocm.docs.amd.com/en/latest/rocm.html]]

PyTorch: [[https:~~/~~/pytorch.org/>>https://pytorch.org/]]