Changes for page GPU Server
Last modified by Thomas Coelho (local) on 2024/10/01 13:47
From version 5.1
edited by Thomas Coelho (local)
on 2023/11/06 10:58
Change comment:
There is no comment for this version
To version 12.1
edited by Thomas Coelho (local)
on 2024/06/07 10:12
Change comment:
There is no comment for this version
Summary
Page properties (1 modified, 0 added, 0 removed)
Details
- Page properties
- Content
@@ -4,11 +4,11 @@
 
 = The GPU Server =
 
-The GPU machine is a two-socket server with AMD EPYC 7313 processors. The processors have 16 cores, with SMT enabled (32 threads). It comes with 512 GB of memory and 2 x 4 TB U.3 (NVMe) SSDs as fast storage. There are **8 AMD Instinct MI50** GPU cards for computing.
+The GPU machine is a two-socket server with AMD EPYC 7313 processors. Each processor has 16 cores, with SMT enabled (32 threads). It comes with 512 GB of memory and 2 x 4 TB U.3 (NVMe) SSDs as fast storage. There are **8 AMD Instinct MI50** GPU cards for computing.
 
 Access is provided through SLURM via the separate partition "gpu".
 
-AMD ROCm is installed as the software stack. It supports the ROCm and OpenCL interfaces.
+AMD ROCm is installed as the software stack. It supports the ROCm and OpenCL interfaces. The current ROCm stack is version 6.1. This is also packaged in Ubuntu 6.1.
 
 (% class="box infomessage" %)
 (((
@@ -15,6 +15,10 @@
 Because GPU computing is a new discipline, we can only provide limited information here. If you have something to share, please feel free to edit this page.
 )))
 
+{{warning}}
+To get built-in ROCm support in Slurm, this machine has already been updated to Ubuntu 24.04. The distribution includes some parts of the ROCm stack, which are a mixture of versions 5.7 and 6.0. Official support from AMD for Ubuntu 24.04 is not yet available. PyTorch has been successfully tested with this setup.
+{{/warning}}
+
 == Submitting ==
 
 GPUs are handled as generic resources in Slurm (gres).
@@ -41,10 +41,13 @@
 Install PyTorch:
 
 {{code language="bash"}}
-pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.6
+pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.0
+
+
 {{/code}}
 
-(% class="wikigeneratedid" %)
+At the time of writing, PyTorch is not available for ROCm 6.1. Please check the PyTorch website for updates.
+
 You can test the installation with
 
 {{code language="python"}}
@@ -54,8 +54,6 @@
 
 {{/code}}
 
-== ==
-
 == Links ==
 
 GPU Cards: [[https:~~/~~/www.amd.com/en/products/professional-graphics/instinct-mi50>>https://www.amd.com/en/products/professional-graphics/instinct-mi50]]
@@ -65,5 +65,4 @@
 PyTorch: [[https:~~/~~/pytorch.org/>>https://pytorch.org/]]
 
 
-
 
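The diff above shows only the opening line of the Python test snippet, because its body lies outside the changed hunks. The following is a hedged sketch, not the page's own test code. A check like this could be run on the GPU node, for example inside a job on the "gpu" partition, to confirm that a ROCm build of PyTorch can see the MI50 cards. It relies on two facts: ROCm builds of PyTorch expose AMD GPUs through the `torch.cuda` API, and they report the HIP version in `torch.version.hip`. The helper name `rocm_torch_report` is hypothetical.

```python
"""Sketch: check that a ROCm build of PyTorch can see the GPUs.

Intended to be run on the GPU node (e.g. inside a Slurm job on the
"gpu" partition). The helper name rocm_torch_report is hypothetical.
"""
import importlib.util


def rocm_torch_report() -> dict:
    """Collect basic facts about the installed PyTorch and the visible GPUs."""
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed in this Python environment.
        return {"torch_installed": False}

    import torch

    # torch.version.hip is a version string on ROCm builds and None otherwise.
    hip_version = getattr(torch.version, "hip", None)
    # ROCm builds of PyTorch reuse the torch.cuda API for AMD GPUs.
    gpu_count = torch.cuda.device_count()
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "hip_version": hip_version,
        "gpu_available": torch.cuda.is_available(),
        "gpu_count": gpu_count,
        "gpu_names": [torch.cuda.get_device_name(i) for i in range(gpu_count)],
    }


if __name__ == "__main__":
    for key, value in rocm_torch_report().items():
        print(f"{key}: {value}")
```

On a working ROCm setup, `hip_version` should be a version string and `gpu_count` should be non-zero. If `hip_version` is `None`, a CUDA or CPU-only wheel was probably installed instead of the ROCm one.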