Bare Metal, the GPU and the Box That Runs Atlas

Every local AI setup eventually comes down to one unglamorous physical fact: a graphics card has to be doing the work, and something has to be feeding it. In my homelab that something is a plain Ubuntu box running Ollama directly on the metal, and the single most consequential decision I made about it was the one I nearly got wrong — whether to virtualise the GPU or hand it the whole machine. Get the driver stack right and you have a fast, rebuildable box that runs models on hardware you own. Get it wrong and you have a host that boots to a black screen and a long evening ahead.

This is the story of that box, why it runs on bare metal rather than in a virtual machine, and the specific pieces of pain that stand between an idea and a working nvidia-smi on the host.

The model gets all the attention. The thing that actually decides whether you can run it locally is whether the operating system, the driver and the runtime all agree on how to talk to one PCI device — and that turns out to be the hard part.

Why bare metal, not a VM

I did not start here. My instinct was to virtualise the GPU box the way I virtualise everything else — put a hypervisor underneath, pass the card through to a guest, and get snapshots and rebuildability for free. So I tried it. I spent a couple of evenings on GPU passthrough, fighting the card into its own isolation group and stopping the host claiming it before the guest could, and I got it working.

And then I ran the models on it, and the honest result was that it was worse. The passthrough layer was fragile in a way that bit me on every kernel and driver update, the performance was measurably down on native, and the whole arrangement felt like a tower of configuration that existed to buy me snapshots I rarely used. So I wiped it, installed Ubuntu straight onto the metal, put the NVIDIA driver and Ollama on the host, and it was simply faster and more stable. Bare metal won, and it was not close.

The freedom I thought I needed a VM for — being able to blow the box away and rebuild it — I get a different way. The install is scripted and the working configuration is in my notes, so the host is a rebuildable artefact even without a hypervisor: a documented sequence of steps that takes a fresh Ubuntu install to a working inference box. Models live on NVMe and are re-pullable. The card does the work, the operating system gets out of its way, and nothing sits between Ollama and the silicon.

The driver stack that has to line up

Running a GPU on bare-metal Ubuntu sounds like it should be a one-line apt install. It is frustrating because four separate things all have to be correct at the same time, and getting most of them right looks much the same as getting none of them right: either nvidia-smi fails or the model quietly runs on the CPU, and nothing tells you which layer is wrong.

First, the open-source nouveau driver has to be out of the way. It grabs the card on boot, and the proprietary NVIDIA driver cannot bind to a card something else already owns, so nouveau has to be blacklisted. Second, if the machine has Secure Boot enabled, the kernel only loads modules signed with a key it trusts. An NVIDIA module that is unsigned, or signed with a key that was never enrolled, is refused at load time, which from the outside looks identical to the driver not being installed at all. Third, the driver has to be new enough for the CUDA runtime: a driver too old for the CUDA libraries Ollama ships against gives you a card that nvidia-smi can see but that the runtime cannot use. Fourth, Ollama itself has to be able to find the CUDA libraries, or it silently falls back to the CPU and you wonder why a 3090 is generating tokens at the speed of a laptop.

# Blacklist nouveau (and disable its modesetting) so the NVIDIA driver can claim the card
printf 'blacklist nouveau\noptions nouveau modeset=0\n' | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
sudo update-initramfs -u

# Install the recommended driver, then reboot so the new module is the one that loads
sudo ubuntu-drivers install
sudo reboot

# After the reboot, confirm the host can see the GPU
nvidia-smi          # must list the card before anything else will work

# Ollama runs as a native systemd service on the host, on the GPU
systemctl status ollama
journalctl -u ollama | grep -i cuda    # confirm it found the GPU, not the CPU

When all four are true, the host boots cleanly, nvidia-smi lists the card, and Ollama loads models straight onto it. When any one is false, you get a black screen, a module that will not load, or a runtime that runs but never touches the GPU.
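
The Secure Boot layer is the easiest one to overlook, so here is a minimal sketch of how it could be checked. It assumes mokutil and modinfo are installed and that the kernel module is named nvidia; secure_boot_verdict is a hypothetical helper, not part of any tool. A signer being present does not prove that its key is enrolled, so the sketch only narrows the question.

```shell
# secure_boot_verdict SB_STATE SIGNER
#   SB_STATE = output of `mokutil --sb-state`      (e.g. "SecureBoot enabled")
#   SIGNER   = output of `modinfo -F signer nvidia` (empty if unsigned)
# Illustrative helper: classifies the Secure Boot layer from those two facts.
secure_boot_verdict() {
    case "$1" in
        *"SecureBoot enabled"*)
            if [ -n "$2" ]; then
                # Signed, but the key must still be enrolled for the load to succeed.
                echo "check key: Secure Boot on, module signed by: $2"
            else
                echo "blocked: Secure Boot on, module unsigned"
            fi
            ;;
        *"SecureBoot disabled"*)
            echo "ok: Secure Boot off"
            ;;
        *)
            echo "unknown: could not read Secure Boot state"
            ;;
    esac
}

# Run against the live host only when both tools are present.
if command -v mokutil >/dev/null 2>&1 && command -v modinfo >/dev/null 2>&1; then
    secure_boot_verdict "$(mokutil --sb-state 2>/dev/null)" \
                        "$(modinfo -F signer nvidia 2>/dev/null)"
fi
```

If the verdict is "blocked", the usual choices are to enrol a signing key or to turn Secure Boot off in firmware. Which one is right depends on the machine.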

flowchart TD
    A[nouveau blacklisted] --> B[Secure Boot: module signed, or Secure Boot off]
    B --> C[NVIDIA driver loads, nvidia-smi works]
    C --> D[CUDA runtime version matches the driver]
    D --> E[Ollama finds CUDA on the host]
    E --> F[Models run on the GPU, natively]
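
The driver-versus-CUDA step can be checked mechanically. This is a rough sketch that assumes GNU sort (for -V) and uses a placeholder minimum version. MIN_DRIVER and version_ge are names I have made up for illustration, and the real minimum should come from NVIDIA's compatibility table for whichever CUDA runtime is in use.

```shell
# version_ge A B: succeed when dotted version A >= B (relies on `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Placeholder value: replace with the minimum driver your CUDA runtime needs.
MIN_DRIVER="${MIN_DRIVER:-525.60.13}"

# Only query the card when nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    installed="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
    if version_ge "$installed" "$MIN_DRIVER"; then
        echo "driver $installed is new enough (>= $MIN_DRIVER)"
    else
        echo "driver $installed is older than $MIN_DRIVER"
    fi
fi
```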

What bit me

The failure I remember most came after a routine kernel update. Everything had been working for weeks — and then, after apt upgrade and a reboot, nvidia-smi reported it could not communicate with the driver. Nothing I had touched on purpose had changed. The cause was that the new kernel had booted but the NVIDIA module had not been rebuilt against it, so the running kernel had no driver to load. The fix was making sure DKMS rebuilt the module for the new kernel and that the rebuild actually completed, rather than failing silently in the upgrade noise. Obvious in hindsight. Invisible at the time, because the symptom — “the GPU has vanished” — is identical regardless of which layer of the stack broke.
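
A check along these lines would have caught that failure before the reboot. It is a sketch, assuming the driver is managed by DKMS and that dkms status prints one line per module and kernel ending in "installed". dkms_built_for is a hypothetical helper name.

```shell
# dkms_built_for KERNEL: read `dkms status` output on stdin and succeed only if
# an nvidia module is reported as installed for that kernel release.
dkms_built_for() {
    grep -i 'nvidia' | grep -F -- "$1" | grep -q 'installed'
}

# Only consult DKMS when it is present on the host.
if command -v dkms >/dev/null 2>&1; then
    running="$(uname -r)"
    if dkms status | dkms_built_for "$running"; then
        echo "nvidia module is built for running kernel $running"
    else
        echo "no nvidia module for $running; try: sudo dkms autoinstall"
    fi
fi
```

To check a freshly installed kernel before rebooting into it, pass that kernel's release string instead of the output of uname -r.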

The general lesson is the one a bare-metal driver stack teaches everyone eventually: the diagnostic skill is not fixing the failure, it is localising it. Several things have to be true and the system will not tell you which one is false, so you learn to test each independently — is the module loaded, does nvidia-smi work, does the CUDA version match, does the runtime see the card — rather than changing three things at once and hoping. I keep the working configuration in my notes now, the same way I keep the Caddy reference config, because re-deriving it from scratch is exactly the wasted effort a knowledge base exists to prevent.
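
Testing each layer independently can be as small as the sketch below. The helper names (check, nouveau_absent and the rest) are hypothetical, and the probes are assumptions about what "working" looks like: ollama ps only reports a GPU while a model is loaded, so that last line means little on an idle box. The CUDA version comparison is left out to keep it short.

```shell
# check LABEL CMD...: run one probe quietly and print a pass/fail line for it.
check() {
    label="$1"; shift
    if "$@" >/dev/null 2>&1; then
        echo "PASS  $label"
    else
        echo "FAIL  $label"
    fi
}

# One probe per layer, each answering a single yes/no question.
nouveau_absent() { ! lsmod | grep -q '^nouveau '; }
module_loaded()  { lsmod | grep -q '^nvidia '; }
smi_works()      { nvidia-smi; }
ollama_on_gpu()  { ollama ps | grep -q 'GPU'; }   # needs a model loaded

check "nouveau not loaded"         nouveau_absent
check "nvidia module loaded"       module_loaded
check "nvidia-smi talks to driver" smi_works
check "ollama has a model on GPU"  ollama_on_gpu
```

Reading the output top to bottom, the first FAIL is the layer to look at. Everything below it is downstream noise.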

The payoff

On the other side of all that fiddliness is a genuinely good arrangement. The GPU box is a plain Ubuntu machine running Ollama on the bare metal, and it serves the local models behind my AI work — the writing, the embeddings, the retrieval — on hardware that I own and that never sends a token off my network. When a driver update goes wrong, I have the recovery steps written down and I am running again quickly. When I want to rebuild the host entirely, the install is scripted and the configuration is in notes, so the rebuild is mechanical rather than archaeological.

That combination — native GPU performance with a box I can rebuild from notes — is exactly the own-the-centre bargain I keep making. The card does the work. Bare metal keeps it fast and simple. And the evenings I lost — first to passthrough, then to a kernel update that ate the driver — bought me a box I have trusted ever since.

If you are about to attempt this

Do it on a host you can afford to have down for an evening, not on the day you need it working. Resist the urge to virtualise the GPU unless you have a concrete reason to — for a single-card inference box, bare metal is faster, simpler, and far less fragile across updates. Change one thing at a time. Verify each layer of the driver stack independently before assuming the next one is the problem. And write down the working configuration the moment it works, because the version of you that needs it again will have completely forgotten how you got here — and that future stranger deserves better than a black screen and a vague memory.
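
Writing the configuration down can be automated enough that it actually happens. A minimal sketch, assuming the versions worth recording are the kernel, the NVIDIA driver, Ollama and the DKMS state; snapshot_config is a hypothetical helper, and each probe is skipped if its tool is missing.

```shell
# snapshot_config: print a dated record of the versions that currently work.
snapshot_config() {
    echo "== working config recorded $(date -u +%Y-%m-%dT%H:%M:%SZ) =="
    echo "kernel: $(uname -r)"
    if command -v nvidia-smi >/dev/null 2>&1; then
        echo "driver: $(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
    fi
    if command -v ollama >/dev/null 2>&1; then
        echo "ollama: $(ollama --version 2>&1 | head -n 1)"
    fi
    if command -v dkms >/dev/null 2>&1; then
        echo "dkms: $(dkms status | tr '\n' ';')"
    fi
}

# Print to the terminal; redirect into a notes file to keep the record.
snapshot_config
```

Run it the moment the box works and append the output to wherever the notes live, for example with snapshot_config >> gpu-box-notes.txt, where the file name is only an illustration.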