Mail to Bjoern
- boltz: sample data for testing the setup; boltz apparently only needs CUDA installed
- ollama: folder on qumulo for docker data
- nfs docker mount timing
- explicit GPU
- SearXNG, Open WebUI
- collect all URLs
Steps
- mounting and cabling
- check BIOS settings
- set up storage (or do they have hardware RAID 1?)
- set up iLO
- OS installation via USB stick (prepare beforehand)
- ansible base install (sec, packages, docker)
- ansible compose
- ansible nfs: mount qumulo share(s)
- manual 25 Gbit/s config -> use saved netplan file
- manual NVIDIA install following the manual below (NVIDIA driver, CUDA driver and container toolkit)
- install beszel agent
- spin up containers and test them
- install boltz and test it
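For the NFS step and the "nfs docker mount timing" note, a minimal sketch of the two config fragments that are probably needed: an /etc/fstab entry for the qumulo share and a systemd drop-in so docker.service only starts once the share is mounted. The host name, export path, and mount point are hypothetical placeholders, not the real values.

```shell
# Hypothetical values: replace with the real qumulo host, export and mount point.
QUMULO_HOST="qumulo.example.internal"
QUMULO_EXPORT="/docker-data"
MOUNT_POINT="/mnt/qumulo/docker-data"

# Print an /etc/fstab entry; _netdev marks the mount as network-dependent,
# nofail keeps the boot going if the share is unreachable.
fstab_entry() {
    printf '%s:%s %s nfs defaults,_netdev,nofail 0 0\n' "$1" "$2" "$3"
}

# Print a systemd drop-in that orders docker.service after the mount.
docker_dropin() {
    printf '[Unit]\nRequiresMountsFor=%s\n' "$1"
}

fstab_entry "$QUMULO_HOST" "$QUMULO_EXPORT" "$MOUNT_POINT"
docker_dropin "$MOUNT_POINT"
```

The first output line would be appended to /etc/fstab; the second fragment would go into a drop-in file under /etc/systemd/system/docker.service.d/ (file name of your choice), followed by sudo systemctl daemon-reload.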
TODO
- (optional) clean from snap
- [=] beszel reverse proxying via firewall; sophos does not seem to be made for this
- [=] install beszel agent on all devices
- extend network diagram
- write ansible playbook?
- test ansible construct
- prepare boot stick
base
- Hostname: neo-srv-ai-01
- IP Address: 192.168.60.203
- Floating IP: 192.168.60.213
- iLO IP: 192.168.50.213
ansible-roles
- geerlingguy.security
- geerlingguy.docker
- nfs-client (mount qumulo shares)
- users (separate)
- nvidia (driver) -> do manually
- interfaces (25 Gbit/s NICs) -> do manually
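A small sketch for fetching the two roles in this list that are published on Ansible Galaxy. It assumes ansible is already installed on the control node and that nfs-client and users are local roles (an assumption, the notes do not say where they live).

```shell
# Roles from the list above that come from Ansible Galaxy.
GALAXY_ROLES="geerlingguy.security geerlingguy.docker"

# Install the Galaxy roles into the default roles path.
install_galaxy_roles() {
    # Intentionally unquoted so each role name becomes its own argument.
    ansible-galaxy role install $GALAXY_ROLES
}

# On the control node:
#   install_galaxy_roles
```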
Manual nvidia driver, cuda driver and container toolkit
NVIDIA driver
Check if GPUs are recognized by the base OS:
sudo lspci | grep -i nvidia
This should print some output if NVIDIA devices are found.
Search for required drivers for your GPUs:
sudo ubuntu-drivers devices
Automatically install all drivers:
sudo ubuntu-drivers autoinstall
Reboot the system for changes to take effect:
sudo reboot
Show GPU stats with:
nvidia-smi
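For a more script-friendly check than the full nvidia-smi table, the query mode prints one CSV line per GPU. A minimal sketch; the function name is just for illustration.

```shell
# Print one CSV line per GPU: index, name, driver version, total memory.
gpu_inventory() {
    nvidia-smi --query-gpu=index,name,driver_version,memory.total --format=csv,noheader
}

if command -v nvidia-smi >/dev/null 2>&1; then
    gpu_inventory
else
    echo "nvidia-smi not found: driver not installed or reboot pending" >&2
fi
```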
Cuda driver
Disable Secure Boot in BIOS
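Whether Secure Boot is really off can be checked from the running OS. A minimal sketch, assuming the mokutil package is available (it may not be installed by default):

```shell
# Report the Secure Boot state as seen by the running system.
secure_boot_state() {
    if command -v mokutil >/dev/null 2>&1; then
        mokutil --sb-state
    else
        echo "mokutil not installed"
    fi
}

# mokutil exits non-zero on systems without EFI variables, so do not abort on it.
secure_boot_state || true
```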
Install the CUDA toolkit and drivers:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt update
sudo apt -y install cuda-toolkit-12-8
sudo apt install -y cuda-drivers
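The toolkit package does not put nvcc on the PATH by itself. A sketch for verifying the install, assuming cuda-toolkit-12-8 went to its default prefix /usr/local/cuda-12.8:

```shell
# Assumption: default install prefix of the cuda-toolkit-12-8 package.
CUDA_HOME=/usr/local/cuda-12.8
export PATH="$CUDA_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Print the compiler version if the toolkit is present.
if command -v nvcc >/dev/null 2>&1; then
    nvcc --version
else
    echo "nvcc not found under $CUDA_HOME/bin" >&2
fi
```

To make this permanent, the two export lines could go into a file under /etc/profile.d/ or into the user's shell profile.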
Container toolkit
Install the NVIDIA Container Toolkit:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install -y nvidia-container-toolkit
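After the package is installed, Docker usually still has to be pointed at the NVIDIA runtime before --gpus works. A minimal sketch using nvidia-ctk, which ships with the toolkit; the function name is illustrative only.

```shell
# Register the NVIDIA runtime in /etc/docker/daemon.json and restart Docker.
configure_docker_runtime() {
    sudo nvidia-ctk runtime configure --runtime=docker
    sudo systemctl restart docker
}

if command -v nvidia-ctk >/dev/null 2>&1; then
    configure_docker_runtime
else
    echo "nvidia-ctk not found: install nvidia-container-toolkit first" >&2
fi
```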
Test by running nvidia-smi inside a simple CUDA container:
docker run --rm --gpus all nvidia/cuda:13.0.0-base-ubuntu24.04 nvidia-smi
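For the explicit GPU item in the notes at the top: instead of --gpus all, a container can be restricted to a single GPU index. A minimal sketch reusing the same test image; the function name is hypothetical.

```shell
# List the GPUs visible inside a CUDA container that only gets one GPU index.
run_on_gpu() {
    local gpu_index="$1"
    docker run --rm --gpus "device=${gpu_index}" \
        nvidia/cuda:13.0.0-base-ubuntu24.04 nvidia-smi -L
}

# On the server, e.g. for the first GPU:
#   run_on_gpu 0
```

nvidia-smi -L should then list only the selected device. For several GPUs at once, Docker expects the comma-separated list wrapped in an extra pair of quotes, e.g. --gpus '"device=0,1"'.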