Changes for page GPU Server
Last modified by Thomas Coelho (local) on 2024/10/01 13:47
From version 3.1
edited by Thomas Coelho (local)
on 2023/11/06 10:51
Change comment:
There is no comment for this version
To version 12.1
edited by Thomas Coelho (local)
on 2024/06/07 10:12
Change comment:
There is no comment for this version
Summary
Page properties (1 modified, 0 added, 0 removed)
Details
- Page properties
- Content
@@ -4,11 +4,11 @@

 = The GPU Server =

-The GPU machine is a two socket server with AMD EPYC 7313 processors. The processors have 16 Cores, actually with SMT enabled (32 Threads). It comes with 512 GB of memory and 2 x 4 TB U.3 (NVMe) SSDs as fast storage. There are** 8 AMD Instinct Mi 50** GPU cards for computing.
+The GPU machine is a two socket server with AMD EPYC 7313 processors. One processor has 16 Cores, actually with SMT enabled (32 Threads). It comes with 512 GB of memory and 2 x 4 TB U.3 (NVMe) SSDs as fast storage. There are** 8 AMD Instinct Mi 50** GPU cards for computing.

 Access is given by SLURM and the separate partition "gpu".

-As software stack AMD ROCm is installed. This supports the ROCm and openCL interface.
+As software stack AMD ROCm is installed. This supports the ROCm and openCL interface. Current ROCm Stack is version 6.1. This is also packaged in Ubuntu 6.1.

 (% class="box infomessage" %)
 (((
@@ -15,6 +15,10 @@
 Because GPU computing is a new discipline, we can only provide limited information here. If you have something to share, please feel free to edit this page.
 )))

+{{warning}}
+To have built in ROCm support in slurm, this machine has already been updated to Ubuntu 24.04. There are some parts of the ROCm stack included in the distribution which is a mixture of 5.7 and 6.0. Official Support from AMD for Ubuntu 24.04 is not yet available. Pytorch has been successfully tested with this setup.
+{{/warning}}
+
 == Submitting ==

 GPUs are handled as generic resources in Slurm (gres).
@@ -40,7 +40,23 @@

 Install Pytorch:

+{{code language="bash"}}
+pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.0

+
+{{/code}}
+
+At time of writing it's not available for 6.1. Please check the pytorch Website for updates.
+
+You can test the installation with
+
+{{code language="python"}}
+import torch
+
+print(torch.cuda.is_available())
+
+{{/code}}
+
 == Links ==

 GPU Cards: [[https:~~/~~/www.amd.com/en/products/professional-graphics/instinct-mi50>>https://www.amd.com/en/products/professional-graphics/instinct-mi50]]
@@ -47,6 +47,7 @@

 ROCm documentation: [[https:~~/~~/rocm.docs.amd.com/en/latest/rocm.html>>https://rocm.docs.amd.com/en/latest/rocm.html]]

+Pytorch: [[https:~~/~~/pytorch.org/>>https://pytorch.org/]]

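The installation test added in this revision prints only True or False. A slightly fuller check could look like the sketch below. It assumes a ROCm build of PyTorch, where the GPUs are reached through the torch.cuda namespace and torch.version.hip holds the HIP version string. The function name rocm_report and the dictionary keys are illustrative and not part of the page.

```python
import importlib.util


def rocm_report():
    """Collect basic facts about the local PyTorch/ROCm setup (illustrative helper)."""
    report = {
        "torch_installed": False,
        "hip_version": None,
        "gpu_available": False,
        "gpu_count": 0,
        "gpu_names": [],
    }
    # Return the empty report if PyTorch is not installed in this environment.
    if importlib.util.find_spec("torch") is None:
        return report

    import torch

    report["torch_installed"] = True
    # torch.version.hip is a version string on ROCm builds and None otherwise.
    report["hip_version"] = getattr(torch.version, "hip", None)
    # ROCm builds expose AMD GPUs through the torch.cuda namespace.
    report["gpu_available"] = bool(torch.cuda.is_available())
    if report["gpu_available"]:
        report["gpu_count"] = torch.cuda.device_count()
        report["gpu_names"] = [
            torch.cuda.get_device_name(i) for i in range(report["gpu_count"])
        ]
    return report


if __name__ == "__main__":
    for key, value in rocm_report().items():
        print(f"{key}: {value}")
```

Run inside a job on the "gpu" partition, this should list the GPUs that Slurm has made visible to the job. On a node without a GPU, or without PyTorch installed, it reports that instead of raising an error.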