Wiki source code of GPU Server

Version 9.1 by Thomas Coelho (local) on 2024/04/25 10:49

{{box cssClass="floatinginfobox" title="**Contents**"}}
{{toc/}}
{{/box}}

= The GPU Server =

The GPU machine is a two-socket server with AMD EPYC 7313 processors. Each processor has 16 cores with SMT enabled (32 threads per processor). The server comes with 512 GB of memory and 2 x 4 TB U.3 (NVMe) SSDs as fast storage. There are **8 AMD Instinct MI50** GPU cards for computing.

Access is provided through Slurm via the separate partition "gpu".

The installed software stack is AMD ROCm, which supports the ROCm and OpenCL interfaces.

(% class="box infomessage" %)
(((
Because GPU computing is a new discipline, we can only provide limited information here. If you have something to share, please feel free to edit this page.
)))

== Submitting ==

GPUs are handled as generic resources (gres) in Slurm.

Each GPU is an allocatable item, and you can allocate up to 8 GPUs by adding "~-~-gres=gpu:N" to your job, where N is the number of GPUs.

CPUs are handled as usual.

Example: interactive session with 2 GPUs:

{{code language="bash"}}
srun -p gpu --gres=gpu:2 --pty bash
{{/code}}
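
When jobs are launched from a script, the same srun line can be assembled programmatically. The following is a minimal sketch, not a tool provided on the server: the helper name build_srun_command and its parameters are hypothetical, while the partition name "gpu" and the limit of 8 GPUs are taken from this page. The helper only builds the command line and does not talk to Slurm.

```python
import shlex

MAX_GPUS = 8  # number of GPU cards in the server, as stated on this page


def build_srun_command(num_gpus, partition="gpu", program=("bash",), pty=True):
    """Return the srun argument list for a job with num_gpus GPUs.

    Hypothetical helper for illustration; it only builds the command line
    and does not contact Slurm.
    """
    if not 1 <= num_gpus <= MAX_GPUS:
        raise ValueError(f"num_gpus must be between 1 and {MAX_GPUS}, got {num_gpus}")
    command = ["srun", "-p", partition, f"--gres=gpu:{num_gpus}"]
    if pty:
        command.append("--pty")  # interactive pseudo-terminal, as in the example above
    command.extend(program)
    return command


if __name__ == "__main__":
    # Prints: srun -p gpu --gres=gpu:2 --pty bash
    print(shlex.join(build_srun_command(2)))
```

The returned list can be passed to subprocess.run unchanged. Rejecting a GPU count outside 1 to 8 before submission gives a clearer error than a job that Slurm cannot schedule.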

== PyTorch ==

A popular framework for machine learning is PyTorch. An up-to-date version with ROCm support must be installed with pip3 in a venv. First create and activate the venv:

{{code language="bash"}}
# Create a virtual environment in ./venv and activate it
python3 -m venv venv
. venv/bin/activate
{{/code}}

Install PyTorch:

{{code language="bash"}}
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.0
{{/code}}

You can test the installation with:

{{code language="python"}}
import torch

# ROCm builds of PyTorch expose the GPUs through the torch.cuda API,
# so this prints True when a GPU is usable.
print(torch.cuda.is_available())
{{/code}}
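
If the simple check is not enough, a slightly longer script can show which GPUs the job actually sees. This is a minimal sketch under stated assumptions: the function name gpu_report is hypothetical, and the script assumes it runs inside a job started with a gres allocation. It guards the import so that it also runs where torch is not installed.

```python
import importlib.util


def gpu_report():
    """Return a list of human-readable lines describing the visible GPUs."""
    # Guard the import so the check also runs outside the venv.
    if importlib.util.find_spec("torch") is None:
        return ["torch is not installed in this environment"]

    import torch

    lines = [f"torch {torch.__version__}"]
    # torch.version.hip is a version string on ROCm builds and None otherwise.
    lines.append(f"HIP runtime: {getattr(torch.version, 'hip', None)}")

    if not torch.cuda.is_available():
        lines.append("no GPU visible (was the job started with --gres=gpu:N?)")
        return lines

    # ROCm builds expose the GPUs through the torch.cuda namespace.
    for index in range(torch.cuda.device_count()):
        lines.append(f"GPU {index}: {torch.cuda.get_device_name(index)}")

    # Small matrix product on the first GPU as an end-to-end check.
    a = torch.rand(256, 256, device="cuda")
    b = torch.rand(256, 256, device="cuda")
    lines.append(f"matmul result shape: {tuple((a @ b).shape)}")
    return lines


if __name__ == "__main__":
    print("\n".join(gpu_report()))
```

Inside a session started with "~-~-gres=gpu:2" the report would be expected to list two devices. A report without any GPU lines usually means the script was run outside a GPU allocation.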

== Links ==

GPU Cards: [[https:~~/~~/www.amd.com/en/products/professional-graphics/instinct-mi50>>https://www.amd.com/en/products/professional-graphics/instinct-mi50]]

ROCm documentation: [[https:~~/~~/rocm.docs.amd.com/en/latest/rocm.html>>https://rocm.docs.amd.com/en/latest/rocm.html]]

PyTorch: [[https:~~/~~/pytorch.org/>>https://pytorch.org/]]