Changes for page GPU Server

Last modified by Thomas Coelho (local) on 2024/10/01 13:47

From version 3.1
edited by Thomas Coelho (local)
on 2023/11/06 10:51
Change comment: There is no comment for this version
To version 12.1
edited by Thomas Coelho (local)
on 2024/06/07 10:12
Change comment: There is no comment for this version

... ... @@ -4,11 +4,11 @@
4 4  
5 5  = The GPU Server =
6 6  
7 -The GPU machine is a two socket server with AMD EPYC 7313 processors. The processors have 16 Cores, actually with SMT enabled (32 Threads). It comes with 512 GB of memory and 2 x 4 TB U.3 (NVMe) SSDs as fast storage. There are** 8 AMD Instinct Mi 50** GPU cards for computing.
7 +The GPU machine is a two-socket server with AMD EPYC 7313 processors. Each processor has 16 cores, currently with SMT enabled (32 threads). It comes with 512 GB of memory and 2 x 4 TB U.3 (NVMe) SSDs as fast storage. There are **8 AMD Instinct MI50** GPU cards for computing.
8 8  
9 9  Access is given by SLURM and the separate partition "gpu".
10 10  
11 -As software stack AMD ROCm is installed. This supports the ROCm and openCL interface.
11 +AMD ROCm is installed as the software stack. It supports the ROCm and OpenCL interfaces. The current ROCm stack is version 6.1. This is also packaged in Ubuntu 6.1.
12 12  
13 13  (% class="box infomessage" %)
14 14  (((
... ... @@ -15,6 +15,10 @@
15 15  Because GPU computing is a new discipline, we can only provide limited information here. If you have something to share, please feel free to edit this page.
16 16  )))
17 17  
18 +{{warning}}
19 +To get built-in ROCm support in Slurm, this machine has already been updated to Ubuntu 24.04. The distribution includes some parts of the ROCm stack, which are a mixture of versions 5.7 and 6.0. Official support from AMD for Ubuntu 24.04 is not yet available. PyTorch has been successfully tested with this setup.
20 +{{/warning}}
21 +
18 18  == Submitting ==
19 19  
20 20  GPUs are handled as generic resources in Slurm (gres).
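
A minimal sketch of a batch job that requests one GPU as a generic resource. Only the partition name "gpu" comes from this page. The job name, the single-GPU request, and the helper `visible_gpus` are illustrative assumptions. Slurm reads `#SBATCH` lines from any script regardless of interpreter, so the job is written in Python here. Which environment variables Slurm exports for GPU gres depends on the site configuration, so the two names checked below are an assumption as well.

```python
#!/usr/bin/env python3
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --job-name=gpu-check
import os


def visible_gpus(env=None):
    """Return the GPU indices exposed to this job, or an empty list.

    Assumption: Slurm exports ROCR_VISIBLE_DEVICES and/or
    CUDA_VISIBLE_DEVICES for GPU gres; this depends on the site setup.
    """
    env = os.environ if env is None else env
    for name in ("ROCR_VISIBLE_DEVICES", "CUDA_VISIBLE_DEVICES"):
        value = env.get(name, "").strip()
        if value:
            # The variables hold a comma-separated list of device indices.
            return [item.strip() for item in value.split(",") if item.strip()]
    return []


if __name__ == "__main__":
    gpus = visible_gpus()
    print(f"GPUs visible to this job: {gpus if gpus else 'none'}")
```

Saved under a hypothetical name such as `gpu_check.py`, the script could be submitted with `sbatch gpu_check.py`.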
... ... @@ -40,7 +40,23 @@
40 40  
41 41  Install PyTorch:
42 42  
47 +{{code language="bash"}}
48 +pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.0
43 43  
50 +
51 +{{/code}}
52 +
53 +At the time of writing, PyTorch is not available for ROCm 6.1. Please check the PyTorch website for updates.
54 +
55 +You can test the installation with:
56 +
57 +{{code language="python"}}
58 +import torch
59 +
60 +print(torch.cuda.is_available())
61 +
62 +{{/code}}
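
A slightly fuller check can help when the one-line test prints `False`. The sketch below is an illustration only: the function name `describe_torch_gpu` and the report keys are hypothetical. It relies on ROCm builds of PyTorch reusing the `torch.cuda` namespace and on `torch.version.hip` being set on such builds. The attribute is read defensively in case it is absent.

```python
def describe_torch_gpu():
    """Collect basic facts about the PyTorch GPU setup without raising
    when PyTorch is not installed."""
    report = {
        "torch_installed": False,
        "gpu_available": False,
        "device_count": 0,
        "hip_version": None,
    }
    try:
        import torch
    except ImportError:
        # PyTorch is missing; return the defaults above.
        return report

    report["torch_installed"] = True
    report["gpu_available"] = bool(torch.cuda.is_available())
    report["device_count"] = int(torch.cuda.device_count())
    # Expected to be a version string on ROCm builds and None otherwise.
    report["hip_version"] = getattr(torch.version, "hip", None)
    return report


if __name__ == "__main__":
    for key, value in describe_torch_gpu().items():
        print(f"{key}: {value}")
```

If `torch_installed` is true but `hip_version` is empty, the installed wheel is probably not a ROCm build, and reinstalling from the ROCm index URL above may be needed.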
63 +
44 44  == Links ==
45 45  
46 46  GPU Cards: [[https:~~/~~/www.amd.com/en/products/professional-graphics/instinct-mi50>>https://www.amd.com/en/products/professional-graphics/instinct-mi50]]
... ... @@ -47,6 +47,7 @@
47 47  
48 48  ROCm documentation: [[https:~~/~~/rocm.docs.amd.com/en/latest/rocm.html>>https://rocm.docs.amd.com/en/latest/rocm.html]]
49 49  
70 +PyTorch: [[https:~~/~~/pytorch.org/>>https://pytorch.org/]]
50 50  
51 51  
52 52