Wiki source code of GPU Server
Version 4.1 by Thomas Coelho (local) on 2023/11/06 10:58
| author | version | line-number | content |
|---|---|---|---|
| | 1.1 | 1 | {{box cssClass="floatinginfobox" title="**Contents**"}} |
| 2 | {{toc/}} | ||
| 3 | {{/box}} | ||
| 4 | |||
| 5 | = The GPU Server = | ||
| 6 | |||
| | 2.2 | 7 | The GPU machine is a two-socket server with AMD EPYC 7313 processors. Each processor has 16 cores with SMT enabled (32 threads). The machine comes with 512 GB of memory and 2 x 4 TB U.3 (NVMe) SSDs as fast storage. There are **8 AMD Instinct MI50** GPU cards for computing. |
| |
1.1 | 8 | |
| 9 | Access is granted via Slurm through the separate partition "gpu". | ||
| 10 | |||
| 11 | The AMD ROCm software stack is installed. It provides the ROCm and OpenCL interfaces. | ||
| 12 | |||
| 13 | (% class="box infomessage" %) | ||
| 14 | ((( | ||
| 15 | Because GPU computing is a new discipline, we can only provide limited information here. If you have something to share, please feel free to edit this page. | ||
| 16 | ))) | ||
| 17 | |||
| 18 | == Submitting == | ||
| 19 | |||
| 20 | GPUs are handled as generic resources in Slurm (gres). | ||
| 21 | |||
| 22 | Each GPU is handled as an allocatable item. You can allocate up to 8 GPUs by adding "~-~-gres=gpu:N", where N is the number of GPUs. | ||
| 23 | |||
| 24 | CPUs are handled as usual. | ||
| 25 | |||
| 26 | Example: Interactive session with 2 GPUs: | ||
| 27 | |||
| 28 | {{code language="bash"}} | ||
| 29 | srun -p gpu --gres=gpu:2 --pty bash | ||
| 30 | {{/code}} | ||
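| | | ||
| | For non-interactive work, the same resources can be requested in a batch script. A minimal sketch (the job name and output file are placeholders; partition and gres syntax are as above; rocm-smi is the ROCm tool that lists the visible GPUs): | ||
| | | ||
| | {{code language="bash"}} | ||
| | #!/bin/bash | ||
| | #SBATCH --partition=gpu | ||
| | #SBATCH --gres=gpu:2 | ||
| | #SBATCH --job-name=gpu-test | ||
| | #SBATCH --output=gpu-test.out | ||
| | | ||
| | # list the GPUs that Slurm made visible to this job | ||
| | rocm-smi | ||
| | {{/code}} | ||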
| 31 | |||
| 32 | == PyTorch == | ||
| 33 | |||
| 34 | PyTorch is a popular framework for machine learning. An up-to-date version with ROCm support must be installed with pip3 in a venv: | ||
| 35 | |||
| 36 | {{code language="bash"}} | ||
| 37 | python3 -m venv venv | ||
| 38 | . venv/bin/activate | ||
| 39 | {{/code}} | ||
| 40 | |||
| 41 | Install PyTorch: | ||
| 42 | |||
| | 4.1 | 43 | {{code language="bash"}} |
| 44 | pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.6 | ||
| 45 | {{/code}} | ||
| | 1.1 | 46 | |
| | 4.1 | 47 | (% class="wikigeneratedid" %) |
| 48 | You can test the installation with | ||
| 49 | |||
| 50 | {{code language="python"}} | ||
| 51 | import torch | ||
| 52 | |||
| 53 | print(torch.cuda.is_available()) | ||
| 54 | |||
| 55 | {{/code}} | ||
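| | | ||
| | If this prints "True", tensors can be placed on a GPU. A small sketch (ROCm builds of PyTorch expose the AMD GPUs through the torch.cuda API, so no code changes are needed; the CPU fallback is only for illustration): | ||
| | | ||
| | {{code language="python"}} | ||
| | import torch | ||
| | | ||
| | # ROCm builds map the torch.cuda API to the AMD GPUs | ||
| | device = torch.device("cuda" if torch.cuda.is_available() else "cpu") | ||
| | | ||
| | x = torch.rand(1024, 1024, device=device) | ||
| | y = x @ x  # the matrix multiply runs on the GPU when one was allocated | ||
| | print(y.device) | ||
| | {{/code}} | ||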
| 56 | |||
| 58 | |||
| | 1.1 | 59 | == Links == |
| 60 | |||
| | 2.2 | 61 | GPU cards: [[https:~~/~~/www.amd.com/en/products/professional-graphics/instinct-mi50>>https://www.amd.com/en/products/professional-graphics/instinct-mi50]] |
| 62 | |||
| | 1.1 | 63 | ROCm documentation: [[https:~~/~~/rocm.docs.amd.com/en/latest/rocm.html>>https://rocm.docs.amd.com/en/latest/rocm.html]] |
| 64 | |||
| 65 | |||
| 66 | |||
| 67 | |||