## mail to Bjoern

- Boltz: sample data for testing the setup; Boltz apparently only needs CUDA installed
- Ollama: folder on Qumulo for Docker data

- [x] NFS Docker mount timing
- [x] explicit GPU
- [x] SearXNG + Open WebUI
- [x] collect all URLs

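
For the NFS/Docker mount timing item, one common approach is a systemd drop-in that makes Docker wait for the network mounts before it starts. This is a minimal sketch. It assumes the Qumulo share is mounted via `/etc/fstab` at the hypothetical path `/mnt/qumulo`, and the drop-in file name `10-wait-for-nfs.conf` is also illustrative. `RequiresMountsFor=` adds the required ordering and dependency on the mount units for that path.

```bash
# Hypothetical mount point of the Qumulo NFS share; adjust to the real path.
MOUNT_PATH=/mnt/qumulo

# Drop-in for docker.service: start Docker only after remote filesystems
# are up and the mount unit(s) for MOUNT_PATH are active.
sudo mkdir -p /etc/systemd/system/docker.service.d
sudo tee /etc/systemd/system/docker.service.d/10-wait-for-nfs.conf >/dev/null <<EOF
[Unit]
After=remote-fs.target
RequiresMountsFor=${MOUNT_PATH}
EOF

# Reload unit files and restart Docker so the new ordering takes effect.
sudo systemctl daemon-reload
sudo systemctl restart docker

# Show the merged unit (including drop-ins) to confirm it was picked up.
systemctl cat docker.service
```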
## Steps

1. [x] mounting and cabling
2. [x] check BIOS settings
3. [x] set up storage (or is there a hardware RAID 1?)
4. [x] set up iLO
5. [x] OS installation via USB stick (prepare beforehand)
6. [x] Ansible base install (security, packages, Docker)
7. [x] Ansible compose
8. [x] Ansible NFS: mount Qumulo share(s)
9. [x] manual 25 Gbit/s config -> use saved netplan file
10. [x] manual NVIDIA install (NVIDIA driver, CUDA driver and container toolkit)
11. [x] install Beszel agent
12. [x] spin up containers and test them
13. [x] install [boltz](https://github.com/jwohlwend/boltz) and test it

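
Step 9 (applying the saved netplan file) can be sketched as follows. The source file name `25g-nics.yaml` and the target name `60-25g-nics.yaml` are hypothetical. `netplan try` rolls the change back automatically unless it is confirmed, which helps when working over SSH.

```bash
# Hypothetical name of the saved netplan file for the 25 Gbit/s NICs.
SAVED_FILE=./25g-nics.yaml

# Install it with root-only permissions (netplan warns about files
# that are readable by other users).
sudo install -m 600 -o root -g root "$SAVED_FILE" /etc/netplan/60-25g-nics.yaml

# Validate the configuration, then apply it with automatic rollback
# unless confirmed by pressing Enter within the timeout.
sudo netplan generate
sudo netplan try

# Check that the 25 Gbit/s interfaces are up and have their addresses.
ip -br addr
```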
## TODO

- [ ] (optional) remove snap from the system
- [=] Beszel reverse proxying via the firewall; the Sophos firewall does not seem to be designed for this
- [=] install Beszel agent on all devices
- [ ] extend network diagram
- [x] write Ansible playbook?
- [x] test Ansible construct
- [x] prepare boot stick

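
For the optional snap cleanup, here is a minimal sketch. It assumes no installed snap is still needed on this host, so check `snap list` first. Purging `snapd` also removes the snaps it manages, and the hold keeps apt from reinstalling it as a dependency.

```bash
# Show which snaps are installed before removing anything.
snap list

# Purge snapd; its purge step also removes the installed snaps.
sudo apt-get purge -y snapd

# Prevent snapd from being pulled back in by later installs.
sudo apt-mark hold snapd
```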
## base

- Hostname: neo-srv-ai-01
- IP address: 192.168.60.203
- Floating IP: 192.168.60.213
- iLO IP: 192.168.50.213

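
A quick sanity check of the values above after the OS installation, using only standard tools. These notes do not describe how the floating IP is assigned, so its check is left conditional.

```bash
# Should print neo-srv-ai-01.
hostname

# IPv4 addresses per interface: 192.168.60.203 should appear (the floating
# IP 192.168.60.213 only if this host currently holds it).
ip -br -4 addr

# Run from a machine on the management network: the iLO should answer.
ping -c 3 192.168.50.213
```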
## ansible-roles

- [x] geerlingguy.security
- [x] geerlingguy.docker
- [x] nfs-client (mount Qumulo shares)

- [ ] users (separate)

- [x] nvidia (driver) -> do manually
- [x] interfaces (25 Gbit/s NICs) -> do manually

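
Installing the two Galaxy roles listed above and running a playbook against the host might look like this. `inventory.ini` and `site.yml` are hypothetical file names. `nfs-client` is assumed to be a local role in the playbook's `roles/` directory rather than a Galaxy role.

```bash
# Fetch the community roles from Ansible Galaxy.
ansible-galaxy role install geerlingguy.security geerlingguy.docker

# Dry run first: show what would change on the AI server.
ansible-playbook -i inventory.ini site.yml --limit neo-srv-ai-01 --check --diff --ask-become-pass

# Apply for real; --ask-become-pass prompts for the sudo password.
ansible-playbook -i inventory.ini site.yml --limit neo-srv-ai-01 --ask-become-pass
```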
## Manual NVIDIA driver, CUDA driver and container toolkit

### NVIDIA driver

Check if GPUs are recognized by the base OS:

```bash
sudo lspci | grep -i nvidia
```

This should print at least one line if NVIDIA devices are found.

List the recommended drivers for your GPUs:

```bash
sudo ubuntu-drivers devices
```

Automatically install the recommended drivers:

```bash
sudo ubuntu-drivers autoinstall
```

Reboot the system for changes to take effect:

```bash
sudo reboot
```

Show GPU stats with:

```bash
nvidia-smi
```

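
For a compact, scriptable overview instead of the full table, `nvidia-smi` can print selected fields as CSV:

```bash
# One line per GPU: index, model, driver version, total and used memory.
nvidia-smi --query-gpu=index,name,driver_version,memory.total,memory.used --format=csv

# Repeat every 5 seconds (Ctrl+C to stop), e.g. while a container is busy.
nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv -l 5
```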
### CUDA driver

**Disable Secure Boot in BIOS**

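
You can check from the running OS whether Secure Boot is actually off. Use `mokutil`, which is usually present on Ubuntu (otherwise install it with `sudo apt install mokutil`):

```bash
# Prints "SecureBoot disabled" once Secure Boot is turned off in the BIOS.
mokutil --sb-state
```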
Install the CUDA toolkit and drivers:

```bash
# Add NVIDIA's CUDA apt repository for Ubuntu 24.04 via its keyring package.
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt update

# Install the CUDA 12.8 toolkit and the driver packages from that repository.
sudo apt -y install cuda-toolkit-12-8
sudo apt install -y cuda-drivers
```

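
The toolkit installs under `/usr/local/cuda-12.8` and is not on `PATH` by default. NVIDIA's post-installation steps cover this. Below is a sketch for the current shell; append the two exports to `~/.bashrc` to make them permanent.

```bash
# Put the CUDA 12.8 compiler and libraries on the search paths.
export PATH=/usr/local/cuda-12.8/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-12.8/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

# Should report "release 12.8".
nvcc --version
```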
### Container toolkit

Install the NVIDIA Container Toolkit:

```bash
# Import NVIDIA's signing key and add the container toolkit apt repository.
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

# Install the toolkit from the new repository.
sudo apt update
sudo apt install -y nvidia-container-toolkit
```

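
After installing the package, NVIDIA's installation guide registers the NVIDIA runtime in Docker's configuration and restarts the daemon. That step is not part of the notes above, so treat it as a suggested addition:

```bash
# Add the "nvidia" runtime to /etc/docker/daemon.json.
sudo nvidia-ctk runtime configure --runtime=docker

# Restart Docker so it picks up the updated daemon.json.
sudo systemctl restart docker
```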
Test a simple CUDA container by running `nvidia-smi` inside it:

```bash
docker run --rm --gpus all nvidia/cuda:13.0.0-base-ubuntu24.04 nvidia-smi
```
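
This relates to the "explicit GPU" item. Besides `all`, `--gpus` accepts specific devices, so you can pin a container to particular GPUs. The indices `0` and `1` below are assumptions; take the real ones from `nvidia-smi`.

```bash
# Expose only GPU 0; nvidia-smi inside the container should list one GPU.
docker run --rm --gpus '"device=0"' nvidia/cuda:13.0.0-base-ubuntu24.04 nvidia-smi

# Several specific GPUs: the inner double quotes keep the comma-separated list intact.
docker run --rm --gpus '"device=0,1"' nvidia/cuda:13.0.0-base-ubuntu24.04 nvidia-smi -L
```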