Building an AI Infrastructure Lab at Home

I wanted to understand AI properly.

Not from a slide deck. Not from vendor messaging. Not from someone else’s curated demo environment where everything works because someone removed all the parts that don’t.

I wanted to build something, break it, rebuild it, and understand what was actually happening underneath. So I built a small AI lab at home, in Docker, on hardware I own, and I have been living with it ever since. This article is the origin story of almost everything else in this notebook. Most of the other projects here grew out of this one room of equipment and a fairly stubborn refusal to accept “it just works” as an explanation.

The thesis is simple, and I will keep coming back to it: AI is a workload, not a feature you switch on. It needs infrastructure, data, access control, governance, monitoring, storage and a sensible operating model — the same as any other platform that matters. Once you accept that, the whole conversation changes, and most of the marketing falls away.

Why I bothered

The day job is technical presales and solutions architecture. I spend my time helping organisations work out what to build and how to run it. And for the last couple of years, nearly every one of those conversations has eventually arrived at AI. Usually in the same shape: someone has been told AI will transform the business, a budget has appeared, and now somebody needs to make it real.

The problem is that most people talking about enterprise AI have never actually run any. They have run a chat window. That is not the same thing. Running a chat window tells you nothing about where the data goes, how access is controlled, what happens when the model is confidently wrong, or what it costs to keep the lights on when fifty people use it instead of one.

I did not want to be one of those people. If I am going to stand in front of a customer and talk about AI as part of their infrastructure, I want to have built it myself first — including the unglamorous parts. The bits the demos skip. The reverse proxy, the GPU scheduling, the moment you realise your knowledge base is a mess and the model is faithfully repeating the mess back to you.

So the lab is partly selfish. It is the fastest way I know to learn something real. A home lab teaches you things slideware never will, which is most of the argument I make elsewhere in this notebook about the home lab as a learning platform. You cannot fake having run the thing.

The problem: businesses treat AI as a feature, not a platform

Here is what I keep seeing. A business decides it wants AI. The conversation starts with a product:

What AI product should we buy?
Can we turn on Copilot?
Can we use ChatGPT for this process?
Can we bolt a chatbot onto the website?

None of those are stupid questions. But they are not the starting questions, and treating them as the starting questions is how you end up with an expensive disappointment. They all assume AI is a feature — a thing you procure, enable, and tick off. Switch it on and value falls out.

That is not how it works, and it is a large part of why so many AI projects fail. When you treat AI as a feature, you skip the questions that actually determine whether it works:

What data do we trust, and where does it live?
Who is allowed to see what, and does the AI respect that boundary?
Which decisions can AI assist with, and which must stay human-led?
How do we audit an outcome after the fact?
What happens when the model is wrong — and it will be wrong?
How do we stop this becoming another unsupported shadow IT project nobody owns?

Those are infrastructure and governance questions. They are boring, and they are the entire game. A business that can answer them will get value out of a mediocre model. A business that can’t will get nothing out of the best model on the market, because the model was never the bottleneck.

AI is not a feature you enable. It is a platform you operate.

I built the lab to test that belief in the only way that counts: by having to operate the platform myself, with no vendor to hide behind when something breaks at eleven at night.

Design decisions

A few principles shaped the build, and I made each of them deliberately. They are worth stating plainly, because the trade-offs are the interesting part.

Local-first. The default is that data and inference stay on hardware I control. This is not anti-cloud dogma. I use cloud models when they are the right tool. But starting local forces you to confront the questions that matter — where does this data live, who can reach it, what leaves the building — instead of waving them away because the API endpoint is somebody else’s problem. Local-first is a forcing function for good design. Once you have run a workload locally and understood it, choosing to put it in the cloud becomes an informed decision rather than a default.

Everything as code. Nearly every service runs in Docker, defined in Docker Compose files. Those compose files live in Git, and Git is the source of truth, not the running container and not a setting I clicked in a UI eighteen months ago and have since forgotten. I run Portainer for visibility, but it is a window onto the system, not the system of record. If a machine dies, I want to rebuild from text, not from memory. This discipline is the spine of my Docker homelab, and it is the single highest-leverage decision in the whole lab.

A consumer GPU, chosen for VRAM per pound. The inference box runs a single NVIDIA RTX 3090 with 24GB of VRAM. People are sometimes surprised it isn’t something newer or faster. The reasoning is straightforward: for local LLM work, the constraint that actually bites is VRAM, not raw throughput. 24GB is the practical line where useful models run at a sensible quantisation without constant out-of-memory juggling. A used 3090 gives more usable VRAM per pound than almost anything else, and a home lab does not need to serve a thousand concurrent users — it needs to teach. I sized it the same way I describe in designing infrastructure for AI workloads: start from the model you want to run, work back to the memory it needs, and buy that. The datacentre cards are wonderful and I am not paying datacentre prices to learn.
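To make that sizing rule concrete, here is a rough back-of-envelope sketch in Python. The figure of about 4.8 bits per weight for a 4-bit quantisation and the flat 2 GB allowance for KV cache and runtime buffers are assumptions for illustration, not measurements; real usage depends on the runtime, the context length and the model architecture.

```python
def estimate_vram_gb(params_billion: float, bits_per_weight: float,
                     overhead_gb: float = 2.0) -> float:
    """Rough VRAM estimate: quantised weights plus a flat allowance.

    overhead_gb is an assumed placeholder for KV cache and runtime
    buffers; it grows with context length in practice.
    """
    # billions of parameters * bits per weight / 8 bits per byte = GB of weights
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb + overhead_gb


def fits(params_billion: float, bits_per_weight: float,
         vram_gb: float = 24.0) -> bool:
    """True if the estimate fits inside the card's VRAM."""
    return estimate_vram_gb(params_billion, bits_per_weight) <= vram_gb


for size in (8, 14, 32, 70):
    need = estimate_vram_gb(size, 4.8)
    print(f"{size}B at ~4.8 bits/weight: ~{need:.1f} GB, fits in 24 GB: {fits(size, 4.8)}")
```

Under those assumptions a 32B model lands at roughly 21 GB and fits on the card, while a 70B model needs around 44 GB and does not. That is the 24GB line in practice.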

Bare metal underneath. The main server runs Ubuntu straight on the metal — AMD Ryzen, plenty of RAM — with the always-on services in Docker on top. I tried virtualising the base first, for the snapshots, but for a single-operator lab the hypervisor cost more in fragility than it returned, so disposability comes from elsewhere: the compose files live in Git and a broken host rebuilds from text, not from a VM snapshot. A small fleet of N100-class mini PCs handles the lightweight always-on bits, and a NAS holds bulk storage and backups, because everything important follows 3-2-1 thinking. The network is a flat home LAN being slowly carved into VLANs — trust, IoT, lab — which is one of the security boundaries I am still actively working on.

The thread running through all of it: build it the way I would tell a customer to build it. No special cases I would be embarrassed to defend.

Architecture: how it actually fits together

None of the components are exotic. That is deliberate, and it is the point. The interesting part is never one tool — it is how the tools connect.

Docker is the substrate; almost everything is a container — the one deliberate exception is the model runtime. Ollama runs natively on the bare-metal GPU box, serving local models over its API close to the hardware with nothing in the way. Open WebUI is the chat front end — the part that looks like the demo, and the least important part of the system. n8n is the automation spine: the thing that turns a chat window into a system by giving the model tools, triggers and a way to reach into other services. Home Assistant brings the real world in — sensors, energy, switches — which is how the lab connects to physical things rather than just text. Caddy sits in front, terminating TLS with automatic HTTPS; I moved to it from Nginx Proxy Manager once I wanted the routing defined as code in one file alongside everything else. Uptime Kuma watches up/down, and Prometheus with Grafana handles metrics, because a platform you can’t observe is a platform you don’t really run.

Here is how the pieces relate:

flowchart TD
    User[User] --> Caddy[Caddy reverse proxy]
    Caddy --> WebUI[Open WebUI]
    Caddy --> N8N[n8n automation]
    Caddy --> HA[Home Assistant]

    WebUI --> Ollama[Ollama native on GPU box]
    Ollama --> GPU[RTX 3090 24GB]
    Ollama --> Models[Local models]

    N8N --> Ollama
    N8N --> Graph[Microsoft Graph]
    N8N --> HA
    N8N --> KB[Knowledge base in Git]

    HA --> Sensors[Energy and sensors]

    subgraph Observability
        Kuma[Uptime Kuma]
        Prom[Prometheus and Grafana]
    end

    Ollama --> Prom
    N8N --> Kuma

The shape that matters: the user does not talk to a model. The user talks to a system, and the model is one component inside it — reachable both directly through the chat front end and programmatically through n8n. That second path is where the real value lives, because n8n is what lets the model do something other than produce text.

Here is a small but representative slice of the configuration. Ollama is not in it, because it runs natively on the GPU box, bound to the LAN as a host service so the containers can reach it:

# On the bare-metal GPU box (systemd, not Docker):
#   OLLAMA_HOST=0.0.0.0:11434
#   OLLAMA_MODELS=/mnt/nvme/models
#
# The container side is the front end, pointed at that box:
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main   # tracks the main branch; pin a release tag for repeatable rebuilds
    container_name: open-webui
    restart: unless-stopped
    environment:
      - OLLAMA_BASE_URL=http://gpu-box.lab.internal:11434
      - WEBUI_AUTH=true
    volumes:
      - openwebui_data:/app/backend/data
    networks:
      - web
      - ai_net

volumes:
  openwebui_data:

networks:
  web:
    external: true     # shared with Caddy; routing lives in the Caddyfile
  ai_net:
    driver: bridge

That is the front-of-house in a readable amount of YAML. The model runtime runs natively on the GPU box, the front end talks to it over the LAN, Caddy publishes the front end from a single Caddyfile, and the container side is a git pull and a docker compose up -d away from existing on another machine. That property — reproducibility — is worth more than any individual feature.
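The programmatic path to the model is the same HTTP API the front end uses. As a minimal sketch, assuming the GPU box is reachable at the hostname from the compose file, a few lines of standard-library Python can call Ollama's /api/generate endpoint with streaming turned off. The function names are mine and the model name in the example comment is a placeholder for whatever has been pulled on the box.

```python
import json
import urllib.request

OLLAMA_URL = "http://gpu-box.lab.internal:11434"  # same host as OLLAMA_BASE_URL


def build_generate_request(model: str, prompt: str,
                           base_url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming POST to Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt and return the text from the 'response' field."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read())["response"]


# Example (needs the GPU box to be reachable; the model name is a placeholder):
#   print(generate("llama3.1:8b", "Reply with the single word: ready"))
```

Keeping the request builder separate from the network call means the payload can be checked without a GPU in the room, which is handy when the same call later moves into an automation workflow.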

This stack is also the foundation of Project Atlas, my local AI assistant. Atlas is what happens when you take this base and add a proper Git-backed knowledge layer, retrieval, and tool-calling into n8n workflows. Everything in this article is the ground Atlas stands on.

What it changed about how I see enterprise AI

This is the part I did not expect, and the part that has been most useful in the day job.

Before the lab, if you had asked me what made an AI project succeed, I would have talked about the model. Capabilities, benchmarks, context windows. Having actually run the thing for a while, I think that focus is almost entirely wrong, and I would have argued the wrong case in front of customers who deserved better.

The model is not the product.

It is the easiest thing to obsess over. Llama, Qwen, Mistral, gpt-oss, the odd DeepSeek, whatever lands next week. I run several of them, usually at Q4_K_M, sometimes higher when there is VRAM to spare, and I pick the model for the job rather than crowning a favourite — the longer version of that journey is in my work with local LLMs. But swapping models, in the end, changes surprisingly little about whether the system is any good.

What changes everything is the context and the plumbing. A model on its own is just a text box. A useful AI system needs the right data in front of it, tools it can call, limits on what it can reach, memory in the right places and — just as importantly — no memory in the wrong places. Get the knowledge layer clean and a modest model performs well. Leave it a mess and the best model on earth will confidently recite your worst, most out-of-date document, because it has no way of knowing the document is wrong.

So the questions that actually determine enterprise success are not about the model at all. They are about data, governance and ownership:

Is the data we are feeding it trustworthy, current and well-structured?
Does the system respect who is allowed to see what?
Can we explain and audit what it did?
Who owns this platform when the person who built it is on holiday?

Those are infrastructure questions. They are the questions I have always cared about as an infrastructure person. The lab made me realise that AI did not introduce a new discipline — it just raised the stakes on the old one. This is most of what I mean when I say AI is becoming infrastructure: the model is the part everyone looks at, and the platform around it is the part that decides whether it works.
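The first two of those questions can be made concrete before any model is involved. The sketch below is illustrative only: the Document shape, the group names and the 365-day freshness cut-off are hypothetical, invented for the example. The point is that filtering happens before anything reaches the prompt, so the model never sees what the user could not.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Document:
    title: str
    allowed_groups: frozenset[str]  # hypothetical access metadata
    last_reviewed: date


def visible_context(docs: list[Document], user_groups: set[str],
                    today: date, max_age_days: int = 365) -> list[Document]:
    """Keep only documents the user may see and that were reviewed recently."""
    cutoff = today - timedelta(days=max_age_days)
    return [
        doc for doc in docs
        if doc.allowed_groups & user_groups   # at least one shared group
        and doc.last_reviewed >= cutoff       # drop stale material
    ]


docs = [
    Document("VPN runbook", frozenset({"it"}), date(2025, 3, 1)),
    Document("Salary bands", frozenset({"hr"}), date(2025, 2, 1)),
    Document("Old VPN runbook", frozenset({"it"}), date(2021, 6, 1)),
]
context = visible_context(docs, {"it"}, today=date(2025, 6, 1))
print([doc.title for doc in context])  # ['VPN runbook']
```

None of this is AI-specific. It is the same access and lifecycle metadata any document platform needs, applied one step earlier than people expect.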

Lessons learnt

Document earlier. This is the big one, and it is the lesson I learnt the hard way. When you are learning fast you change things constantly and you do not write down why. Six weeks later you find a setting you clearly chose on purpose and you have no memory of the purpose. That is fine for a weekend hack. It is not how a platform should be run, and it is exactly the failure mode I would criticise in a customer. This site exists partly to fix that in myself — capturing why, what, what worked and what failed, which is the whole argument behind building knowledge instead of documents.

The model is not the product. Worth repeating because it is the lesson that took longest to truly believe. I spent early effort chasing models when I should have spent it on the knowledge layer and the workflows. The model was never the bottleneck.

Context and tooling are the real work. Connecting the model to n8n, giving it clean data to draw on, and bounding what it can touch — that is where the value and the difficulty both live. It is unglamorous. It does not demo as well as a clever answer in a chat window. It is also the entire job.

Local-first surfaces the hard questions early. Running things myself forced me to confront access, storage and segmentation up front, instead of discovering them in production. That has been uncomfortable and completely worth it.

Things I would not do again. I left the network flat for too long, which meant my IoT devices and my lab sat on the same trust level — fine until you think about it for ten seconds. I also ran for months without proper monitoring, so I was guessing at GPU memory pressure instead of looking at a graph. Both were false economies. Both are now fixed, or being fixed.
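Looking at a graph starts with getting the number. A minimal sketch, assuming the NVIDIA driver's nvidia-smi tool is on the PATH of the GPU box: it asks for used and total memory in CSV form and turns that into a percentage that could be logged or exported. The function names are mine, and wiring the result into Prometheus is left out.

```python
import subprocess

QUERY = ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"]


def parse_memory(csv_text: str) -> list[tuple[int, int]]:
    """Parse lines like '18432, 24576' (MiB used, MiB total), one per GPU."""
    readings = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        readings.append((used, total))
    return readings


def memory_pressure(csv_text: str) -> list[float]:
    """Return used/total as a percentage for each GPU."""
    return [round(100 * used / total, 1) for used, total in parse_memory(csv_text)]


def read_gpu_memory() -> list[float]:
    """Run nvidia-smi and return memory pressure per GPU (needs an NVIDIA GPU)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return memory_pressure(result.stdout)


# Parsing a captured sample line, so this runs without a GPU present:
print(memory_pressure("18432, 24576\n"))  # [75.0]
```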

Where this goes next

The lab works, but it is still a collection of good tools rather than a properly engineered platform. The next phase is about turning it into the latter. Concretely:

A cleaner knowledge layer. This is the highest priority, because it is the thing that most limits quality right now. Structured, version-controlled, deduplicated source material that retrieval can draw on without dragging in noise — the spine of a proper second brain.

Git everywhere. Compose files already live in Git. I want the knowledge base, the n8n workflow exports and the Home Assistant config there too, so the entire lab is reproducible from a repository and nothing important exists only inside a running container.

Real security boundaries. Finishing the VLAN segmentation, tightening what the AI services can reach, and treating the lab as something with an attack surface rather than a trusted playground. Secrets stay in .env files and a vault, never in Git, and I want that discipline enforced rather than merely intended.
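One way to enforce that discipline, rather than intend it, is a check that runs before a commit. The sketch below is deliberately naive: the key-name pattern and the rule that sensitive values must be ${VAR} references are assumptions made for the example, and a real setup would more likely lean on a dedicated secret scanner.

```python
import re

# Assumed convention: sensitive keys in compose files must reference a
# variable such as ${DB_PASSWORD}, never carry a literal value.
SENSITIVE = re.compile(
    r"(?P<key>[A-Za-z0-9_]*(PASSWORD|SECRET|TOKEN|API_KEY)[A-Za-z0-9_]*)"
    r"\s*[=:]\s*(?P<value>\S+)",
    re.IGNORECASE,
)


def find_inline_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line number, key) for sensitive keys with literal values."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = SENSITIVE.search(line)
        if match and not match.group("value").startswith("${"):
            findings.append((number, match.group("key")))
    return findings


compose = """\
services:
  n8n:
    environment:
      - GENERIC_TIMEZONE=Europe/London
      - DB_PASSWORD=hunter2
      - API_TOKEN=${API_TOKEN}
"""
print(find_inline_secrets(compose))  # [(5, 'DB_PASSWORD')]
```

Run from a Git pre-commit hook, a non-empty result would be the signal to block the commit and move the value into the .env file.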

Evaluations. Right now I judge quality by feel, which does not scale and is not honest. I want a small, repeatable eval set — fixed prompts with known-good answers — so that when I change a model or a prompt I can measure whether it got better or just different. This is the same instinct behind repeatable customer health checks: if you cannot measure it the same way twice, you are guessing.
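A minimal sketch of what such an eval set could look like, assuming the simplest possible scoring: a case passes if every expected phrase appears in the answer. The cases, the ask callable and the stub model are placeholders; in practice ask would wrap a call to whichever model is being tested.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    prompt: str
    must_contain: tuple[str, ...]  # phrases a known-good answer includes


def run_evals(cases: list[EvalCase], ask: Callable[[str], str]) -> float:
    """Return the fraction of cases whose answer contains every expected phrase."""
    passed = 0
    for case in cases:
        answer = ask(case.prompt).lower()
        if all(phrase.lower() in answer for phrase in case.must_contain):
            passed += 1
    return passed / len(cases)


CASES = [
    EvalCase("Which port does Ollama listen on by default?", ("11434",)),
    EvalCase("What does 3-2-1 mean for backups?", ("three copies", "offsite")),
]


def stub_model(prompt: str) -> str:
    """Stand-in for a real model call so the harness runs anywhere."""
    return "Ollama listens on port 11434." if "Ollama" in prompt else "I am not sure."


print(f"pass rate: {run_evals(CASES, stub_model):.0%}")  # pass rate: 50%
```

Substring matching is crude, but it is repeatable, and a crude score measured the same way twice is better than a nuanced impression formed once.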

The goal is not to build the biggest AI lab. It is to build one that teaches the right lessons and reflects how I would tell a customer to do it.

A real closing thought

The thing I keep returning to is that the lab did not make AI feel more magical. It made it feel more ordinary — in the best possible way. It is another layer of technology. A powerful one, but a layer, with the same needs as every other layer: somewhere to run, data to trust, access to control, and someone to own it when it breaks.

The organisations that get the most from AI will not be the ones who shout about it loudest. They will be the ones who understand their processes, their data, their risks and their infrastructure well enough to put AI to work quietly and sensibly. That understanding does not come from a slide deck. For me it came from a noisy box with a 3090 in it and a stack of compose files, and from being on the hook when the thing fell over.

I did not build this to learn how to use AI. Plenty of people can use AI. I built it to learn how to design around it — and that, it turns out, is the part that actually matters.