AI is Becoming Infrastructure

There is a moment in every technology shift where the thing stops being interesting in its own right and starts being assumed. Nobody runs a “networking project” any more. Nobody pitches a board on the strategic value of having a database. These things became substrate — the floor you stand on, not the building you point at. You only notice them when they break.

I think we are at exactly that moment with AI, and most organisations have not realised it yet. They are still treating AI the way you treat a feature: a project with a start date, a sponsor, a demo, and a hoped-for outcome. A team gets some budget, stands up a proof of concept against a hosted API, wires it into one workflow, and calls it an AI initiative. Then another team does the same thing, with a different model, a different key, a different prompt convention, and no idea the first team exists. Multiply that by a dozen teams and you do not have an AI strategy. You have a sprawl.

This is the argument I keep coming back to, and it is the spine that several other things I have written hang off. AI has crossed the line from application to infrastructure. You do not “do an AI project”. You run an AI platform, the same way you run a network or an identity service or a virtualisation estate. And the organisations that internalise that early will spend the next few years compounding, while the ones still treating it as a feature will spend the same years re-solving the same problems in a dozen incompatible ways.

This is the constructive companion to the autopsy I wrote in why most AI projects fail. That piece is about how the application mindset kills projects one at a time. This one is about what you build instead.

The thesis: AI is now a layer, not a feature

Let me state it plainly, because the rest follows from it.

AI has become a dependency layer. It belongs next to networking, identity and the database — not in the application tier on top of them.

A feature is something one team owns and one workflow consumes. A layer is something everything depends on and nobody owns in isolation. The test is simple: if three different teams need the same capability and would each build it badly on their own, that capability has become infrastructure whether you have named it or not. Authentication crossed that line years ago — we stopped letting every app invent its own login and we built an identity platform. Storage crossed it. Messaging crossed it. AI is crossing it right now.

The signal is everywhere once you look. The marketing team wants summarisation. Support wants classification and drafting. The data team wants extraction. Engineering wants code assistance. Finance wants document parsing. These are not five AI projects. They are five consumers of one capability — access to language models, governed, observed and paid for centrally. When you see the same need arriving from five directions, you are not looking at a use case. You are looking at a utility.

And utilities have a particular shape. You meter them. You secure access to them. You plan capacity for them. You version them. You do not let every tenant build their own power station in the car park.

What treating AI as infrastructure actually means

This is where it gets concrete, because “platform” is one of those words that means everything and therefore nothing. So here is what an AI platform is, in components, stripped of the brochure language.

It starts with a platform team — a small group whose product is not an AI feature but the ability of every other team to safely ship AI features. They own the gateway, the policies, the golden paths and the bill. This is the single most important and most resisted idea, because it requires someone to stop thinking of AI as their project and start thinking of it as everyone’s substrate.

The technical heart of it is a model gateway: a shared access point that every application talks to instead of talking to model providers directly. This is the equivalent of putting a load balancer or an API gateway in front of a fleet of services, and it is the highest-leverage thing you can build. Route all model calls through one place and you suddenly get, for free, the things that are miserable to retrofit later: a single point for authentication, for rate limiting, for cost attribution, for logging, for content filtering, and for swapping one model for another without touching a single application. I run a small version of this pattern at home with n8n as the spine, and the principle scales: applications should depend on an abstraction, never on a specific model endpoint.
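To make the gateway idea less abstract, here is a minimal in-process sketch of it. Everything in it is hypothetical: the `Gateway` class, the key-to-team map, the call quota and the stub backends are stand-ins I have invented for illustration, and a real gateway would sit behind HTTP and delegate identity to a proper provider. The point is only the shape: one choke point that authenticates, limits, logs and routes.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Backend = Callable[[str], str]  # takes a prompt, returns a completion


@dataclass
class CallRecord:
    team: str
    capability: str
    backend: str
    latency_s: float


@dataclass
class Gateway:
    keys: Dict[str, str]                    # API key -> team (stand-in for real identity)
    routes: Dict[str, Tuple[str, Backend]]  # capability -> (backend name, callable)
    max_calls_per_team: int = 100           # crude quota, not a real rate limiter
    log: List[CallRecord] = field(default_factory=list)
    calls: Dict[str, int] = field(default_factory=dict)

    def complete(self, api_key: str, capability: str, prompt: str) -> str:
        team = self.keys.get(api_key)
        if team is None:
            raise PermissionError("unknown API key")
        if self.calls.get(team, 0) >= self.max_calls_per_team:
            raise RuntimeError(f"quota exhausted for team {team!r}")
        backend_name, backend = self.routes[capability]
        started = time.perf_counter()
        answer = backend(prompt)
        self.calls[team] = self.calls.get(team, 0) + 1
        self.log.append(
            CallRecord(team, capability, backend_name, time.perf_counter() - started)
        )
        return answer


def stub_model_a(prompt: str) -> str:
    return f"[model-a] summary of {len(prompt)} characters"


def stub_model_b(prompt: str) -> str:
    return f"[model-b] summary of {len(prompt)} characters"


gateway = Gateway(
    keys={"key-support": "support"},
    routes={"summarise": ("model-a", stub_model_a)},
)
first = gateway.complete("key-support", "summarise", "a long ticket thread")

# Swap the model behind the capability; the calling code above does not change.
gateway.routes["summarise"] = ("model-b", stub_model_b)
second = gateway.complete("key-support", "summarise", "a long ticket thread")
```

The application only ever names a capability ("summarise") and presents a key. Which model answered is recorded in `gateway.log`, not decided by the caller.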

Around the gateway you need identity and access control over models and data. Not every team should reach every model. Not every model should reach every data source. The same Conditional Access thinking I apply to a Microsoft tenant in the 365 health check applies here — who can call what, with which data, under what conditions. The blast radius of a leaked model key is not “a few wasted tokens”. It is potentially your entire prompt history and whatever the model was allowed to retrieve.
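A rough sketch of what "who can call what, with which data" looks like as a check the gateway could run before forwarding a request. The team names, model tiers and sensitivity labels are all invented for the example, and a real deployment would pull grants from its identity platform rather than a dictionary.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Ordered from least to most sensitive. The labels are illustrative.
SENSITIVITY = {"public": 0, "internal": 1, "restricted": 2}

# The most sensitive data each model is permitted to see (hypothetical tiers).
MODEL_MAX_DATA_CLASS = {
    "cheap-tier": "internal",
    "frontier-tier": "internal",
    "local": "restricted",  # runs on our own hardware
}


@dataclass(frozen=True)
class Grant:
    models: FrozenSet[str]  # models this team may call
    max_data_class: str     # most sensitive data this team may send


GRANTS = {
    "marketing": Grant(frozenset({"cheap-tier"}), "public"),
    "finance": Grant(frozenset({"cheap-tier", "local"}), "restricted"),
}


def is_allowed(team: str, model: str, data_class: str) -> bool:
    """Default deny: every condition must pass for the call to go through."""
    grant = GRANTS.get(team)
    if grant is None or model not in grant.models:
        return False
    level = SENSITIVITY[data_class]
    team_ok = level <= SENSITIVITY[grant.max_data_class]
    model_ok = level <= SENSITIVITY[MODEL_MAX_DATA_CLASS[model]]
    return team_ok and model_ok
```

Under these made-up grants, finance can send restricted data to the local model but not to a hosted tier, and an unknown team gets nothing at all.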

Then observability and cost control, which in AI are the same problem wearing two hats. You cannot manage what you cannot see, and AI spend is uniquely good at hiding. Tokens are cheap individually and ruinous in aggregate, and a single badly written retrieval loop can quintuple a bill overnight. You want per-team, per-application, per-model spend visible on a dashboard, with alerts, the same way I watch everything else through Prometheus and Grafana. Latency, error rates, token throughput, cost per request — these are platform SLOs now, not curiosities.
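The arithmetic behind "cheap individually, ruinous in aggregate" is simple enough to sketch. The prices below are invented placeholders, not any provider's real rates, and `CostMeter` is a hypothetical helper; in practice these numbers would be exported as metrics rather than held in memory.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical prices in USD per 1,000,000 tokens: (input, output).
PRICES: Dict[str, Tuple[float, float]] = {
    "cheap-tier": (0.50, 1.50),
    "frontier-tier": (5.00, 15.00),
}


class CostMeter:
    def __init__(self, team_budgets_usd: Dict[str, float]) -> None:
        self.budgets = dict(team_budgets_usd)
        # Spend keyed by (team, application, model) so it can be sliced any way.
        self.spend: Dict[Tuple[str, str, str], float] = defaultdict(float)

    def record(self, team: str, app: str, model: str,
               input_tokens: int, output_tokens: int) -> float:
        in_price, out_price = PRICES[model]
        cost = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
        self.spend[(team, app, model)] += cost
        return cost

    def team_total(self, team: str) -> float:
        return sum(c for (t, _, _), c in self.spend.items() if t == team)

    def over_budget(self) -> List[str]:
        return sorted(t for t, b in self.budgets.items() if self.team_total(t) > b)


meter = CostMeter({"support": 5.00})
single = meter.record("support", "triage", "frontier-tier", 200_000, 50_000)

# A retrieval loop that re-sends the same context four more times.
for _ in range(4):
    meter.record("support", "triage", "frontier-tier", 200_000, 50_000)

alerts = meter.over_budget()
```

One request is well inside the budget. The same request repeated by a careless loop is what trips the alert, which is why the metering has to be per call and continuous rather than a monthly surprise.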

You provide golden paths — paved, opinionated, well-documented ways for a team to go from “I want to add summarisation” to “it is in production and governed” without inventing anything. A template repository, a client library that already points at the gateway, a sanctioned retrieval pattern, an evaluation harness. Golden paths are how a platform team scales without becoming a bottleneck: you make the right way the easy way, and most teams take it gratefully.

You manage lifecycle and versioning of models and prompts. Models deprecate. Providers retire endpoints, sometimes with only weeks of notice. Prompts are code: they have versions, they regress, they need testing. Treating a prompt as a configuration string pasted into an application is the AI equivalent of hardcoding a connection string, and it ages just as badly.
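Here is one way "prompts are code" could look in practice: a sketch of an immutable, versioned prompt with a contract check that a pipeline could run. `PromptVersion`, `PromptRegistry` and `check_contract` are names I have made up for the example. The only library behaviour relied on is Python's standard `string.Formatter().parse()` for finding placeholders and `str.format()` for rendering.

```python
from dataclasses import dataclass
from string import Formatter
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class PromptVersion:
    name: str
    version: str
    template: str

    def placeholders(self) -> Set[str]:
        # Formatter().parse yields (literal, field_name, format_spec, conversion).
        return {f for _, f, _, _ in Formatter().parse(self.template) if f}

    def render(self, **values: object) -> str:
        return self.template.format(**values)


class PromptRegistry:
    def __init__(self) -> None:
        self._prompts: Dict[Tuple[str, str], PromptVersion] = {}

    def register(self, prompt: PromptVersion) -> None:
        key = (prompt.name, prompt.version)
        if key in self._prompts:
            raise ValueError(f"{key} already exists; publish a new version instead")
        self._prompts[key] = prompt

    def get(self, name: str, version: str) -> PromptVersion:
        return self._prompts[(name, version)]


def check_contract(prompt: PromptVersion, required: Set[str]) -> None:
    """Fail loudly if a new version drops an input the callers rely on."""
    missing = required - prompt.placeholders()
    if missing:
        raise ValueError(f"{prompt.name} {prompt.version} is missing {sorted(missing)}")


registry = PromptRegistry()
registry.register(PromptVersion(
    "summarise-ticket", "1.0.0",
    "Summarise this ticket in at most {max_words} words:\n{ticket}"))
registry.register(PromptVersion(
    "summarise-ticket", "1.1.0",
    "Summarise this ticket briefly:\n{ticket}"))  # silently dropped max_words
```

Version 1.1.0 still renders without error when callers pass `max_words`, because `str.format` ignores unused keyword arguments. That is exactly the kind of silent regression a contract check in the pipeline is there to catch.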

And you do capacity planning, especially the moment any local inference enters the picture. A GPU is a finite, schedulable resource. I sized mine — a single RTX 3090 with 24GB of VRAM — deliberately, for VRAM-per-pound rather than raw speed, because the constraint that actually bites is memory, not throughput. At enterprise scale the question becomes how many concurrent requests a model server holds, how you queue, where you burst to a hosted provider, and what it costs when you do.
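A back-of-envelope version of that memory constraint, as a sketch. The model shape (14 billion parameters, 48 layers, 8 key-value heads of dimension 128), the figure of roughly 4.8 bits per weight for a 4-bit quantisation, the 2 GiB of runtime overhead and the 16-bit cache are all assumptions chosen for illustration, not measurements from my card. Real inference servers manage memory more cleverly than this, so treat the result as a first-order estimate.

```python
GIB = 1024 ** 3


def weight_bytes(n_params: int, bits_per_weight: float) -> float:
    """Memory for the model weights at a given average quantisation width."""
    return n_params * bits_per_weight / 8


def kv_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                       bytes_per_value: int = 2) -> int:
    """KV cache per token: one key and one value vector per layer (the factor 2)."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value


def max_concurrent_sequences(vram_bytes: float, overhead_bytes: float,
                             weights: float, kv_per_token: int,
                             context_tokens: int) -> int:
    """How many full-length sequences fit once weights and overhead are paid for."""
    free = vram_bytes - overhead_bytes - weights
    if free <= 0:
        return 0
    return int(free // (kv_per_token * context_tokens))


# Assumed example: a 14B-parameter model on a card treated as 24 GiB.
weights = weight_bytes(14_000_000_000, 4.8)
kv_token = kv_bytes_per_token(n_layers=48, n_kv_heads=8, head_dim=128)
slots = max_concurrent_sequences(
    vram_bytes=24 * GIB,
    overhead_bytes=2 * GIB,
    weights=weights,
    kv_per_token=kv_token,
    context_tokens=8192,
)
```

With these assumed numbers the weights take a little under 8 GiB and every full 8,192-token sequence costs another 1.5 GiB of cache, so the ceiling on concurrency is set by memory long before raw compute becomes the issue.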

Here is the layered picture I have in my head when I say “AI platform”.

flowchart TD
  Teams[Product teams and apps] --> GW[Model gateway]
  GW --> IAM[Identity and access control]
  GW --> OBS[Observability and cost control]
  GW --> ROUTER[Model router]
  ROUTER --> HOSTED[Hosted model APIs]
  ROUTER --> LOCAL[Local inference on GPU]
  GW --> DATA[Governed data and retrieval]
  IAM --> POLICY[Policy and audit]
  OBS --> POLICY
  PLAT[Platform team] --> GW
  PLAT --> PATHS[Golden paths and templates]
  PATHS --> Teams

Notice what the applications at the top depend on. Not a model. A platform. The model is an implementation detail behind the router, where it belongs.

We have been here before

If this feels familiar, it should, because the industry has run this exact play three times in fifteen years and the shape never changes. Each time, a capability arrived as a novelty, got treated as a per-project curiosity, and then quietly became the substrate that everything sits on.

Virtualisation started as a clever trick for consolidating a few servers. For a while you “did a virtualisation project”. Then it became the floor — you stopped asking whether a workload was virtual because virtual was simply how servers worked. Cloud arrived as shadow IT on a corporate credit card, one team at a time, until the sprawl forced the creation of landing zones, cloud platform teams, guardrails and FinOps — in other words, until cloud stopped being a project and became a governed platform. Containers were a developer toy, then a deployment detail, then Kubernetes became the substrate and platform engineering became a discipline whose entire job is to give product teams paved roads onto shared infrastructure.

Every one of those followed the same curve: novelty, sprawl, governance, substrate. AI is somewhere around the sprawl-to-governance inflection right now, and the lesson from the previous three is unambiguous. The organisations that got ahead of the curve built the platform before the sprawl became unmanageable. The ones that waited spent a painful year clawing back control from a hundred unsanctioned deployments they could no longer even enumerate.

graph LR
  N[Novelty] --> S[Sprawl]
  S --> G[Governance]
  G --> Sub[Substrate]

The cloud taught us this most expensively. The “shadow cloud” of 2014 — untracked accounts, no tagging, no cost control, no security baseline — is being re-enacted right now as shadow AI, and for exactly the same reasons. We are simply substituting model endpoints for EC2 instances.

Why the application mindset fails

I have watched the application mindset play out and it fails in a consistent, predictable set of ways. These are the same failure modes I catalogued in why most AI projects fail, but seen through the platform lens they stop looking like a dozen separate problems and resolve into one: the absence of a platform.

When every team rolls its own, you get duplicated effort — five teams independently solving retrieval, five subtly different and all slightly wrong. You get duplicated spend, because nobody is negotiating a single committed-use rate and nobody can even see the aggregate bill until finance asks why the cloud invoice grew by a third. You get no governance — no consistent answer to “what data is this model allowed to see”, because the answer is whatever each team happened to decide. You get shadow AI, the most dangerous of all: employees pasting sensitive material into whatever consumer tool is open in a browser tab, because the sanctioned path is harder than the unsanctioned one. And you get no audit — when a regulator or a customer asks “what did your AI tell people and on what basis”, the honest answer is a shrug across a dozen systems that log nothing in common.

None of these are model problems. Every one of them is an organisational problem that a platform solves structurally rather than heroically. You do not fix shadow AI with a policy memo any more than you fixed shadow cloud with one. You fix it by making the governed path the path of least resistance.

Here is the contrast I keep in front of me, because it clarifies almost every decision.

Application thinking	Platform thinking
“We are doing an AI project”	“We run an AI platform”
Each team holds its own model keys	Access brokered through a shared gateway
Model chosen and hardcoded per app	Model selected behind a router, swappable
Cost discovered on the monthly invoice	Cost metered per team in real time
Prompts pasted into application code	Prompts versioned, tested, owned
Governance is a policy document	Governance is enforced at the gateway
Success is a working demo	Success is teams shipping safely on paved roads
Security reviewed once, at the end	Identity and audit built into the substrate

If your AI work lives entirely in the left column, you do not have an AI capability that will survive contact with scale. You have a collection of demos with a shared invoice.

Build versus buy at the platform layer

Treating AI as infrastructure does not commit you to running your own models, and this is where people get the decision backwards. The build-versus-buy question is not “hosted API or local GPU” answered once, globally. It is answered per workload, behind the router, and the router is the entire point.

Think of hosted APIs as a utility, like grid electricity. You pay per unit, you get capacity on demand that is effectively unlimited for most workloads, you get frontier capability you could never reproduce, and you accept that your data leaves your boundary and that your unit cost and rate limits are set by someone else. For most workloads, most of the time, that is exactly the right trade. You would not generate your own electricity to boil a kettle.

Think of local inference as the generator and the battery — the thing you run for control, privacy, predictable cost at volume, and independence from a provider’s roadmap and rate limits. I run local models for precisely these reasons: data that must not leave the building, and high-volume batch work where per-token pricing would be punishing. The model is not the product — it never was — so I pick the model for the job and the hosting for the constraint.

The platform’s job is to make that a routing decision, not an architecture decision. An application asks for a capability. The router decides — by policy, by data sensitivity, by cost, by latency — whether that request is served by a frontier hosted model, a cheaper hosted model, or a local one on the GPU. Change the policy, change the routing, and not one line of application code moves. That is the dividend of having built the gateway: build-versus-buy stops being a fork in the road you commit to once, and becomes a dial you turn per workload, forever.

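# The router mindset: applications ask for a capability, not a model.
# Illustrative only: local_model() and hosted_model() are placeholders for
# whatever backend clients the platform exposes.
def route(request):
    """Choose a backend by policy; the caller never names a model."""
    if request.data_classification == "restricted":
        return local_model("qwen2.5", quant="Q4_K_M")   # never leaves the building
    if request.needs_frontier_reasoning:
        return hosted_model("frontier-tier")             # pay the utility for capability
    return hosted_model("cheap-tier")                    # the boring default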

The application that called route() knows nothing about which model answered, and that ignorance is the feature.
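To show the "dial, not a fork" point concretely, here is a runnable variant of the same idea with the policy held as data rather than as `if` statements. The `Request` fields mirror the snippet above; the backend labels and the rule-list format are my own assumptions for the sketch, and a real router would return a client rather than a string.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Request:
    capability: str
    data_classification: str = "internal"
    needs_frontier_reasoning: bool = False


# A rule is (predicate, backend label). First match wins.
Rule = Tuple[Callable[[Request], bool], str]

DEFAULT_BACKEND = "hosted:cheap-tier"

POLICY: List[Rule] = [
    (lambda r: r.data_classification == "restricted", "local:qwen2.5"),
    (lambda r: r.needs_frontier_reasoning, "hosted:frontier-tier"),
]

# An alternative policy, for example during a provider outage: keep everything local.
LOCAL_ONLY_POLICY: List[Rule] = [
    (lambda r: True, "local:qwen2.5"),
]


def route(request: Request, policy: List[Rule] = POLICY,
          default: str = DEFAULT_BACKEND) -> str:
    """Return the backend label for a request under the given policy."""
    for matches, backend in policy:
        if matches(request):
            return backend
    return default


contract_review = Request("extract", data_classification="restricted")
hard_question = Request("analyse", needs_frontier_reasoning=True)
routine = Request("summarise")
```

Passing `LOCAL_ONLY_POLICY` instead of `POLICY` reroutes every request without any change to the code that builds the requests, which is the whole dividend.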

What infrastructure and platform engineers should do now

If you build infrastructure for a living, this shift is the best news you have had in years, because it is your discipline applied to a new substrate — not a new discipline you have to learn from scratch. Everything you already know about running a shared service is the thing that is missing from most AI efforts.

So start where the leverage is. Build the gateway before you need it. Even a thin proxy that every team agrees to route through is worth more than the most sophisticated single application, because it is the hook on which all future governance hangs. Get it in early, while there are three consumers and not thirty.

Get cost visible immediately. Tag and meter from day one. The cost conversation is the one that gives a platform team its mandate: the moment finance can see per-team AI spend, the value of central control becomes self-evident and the political argument wins itself.

Treat models and prompts as versioned artefacts in Git, tested in a pipeline, deprecated on a schedule — the same plain-text, version-controlled discipline I apply to everything I build. A prompt with no version history and no evaluation is a liability waiting to regress silently in production.

Bring identity and audit to the front. Decide who can call what, with which data, and log every call in a common format before you have a hundred consumers, not after. Retrofitting audit onto a sprawl is the most miserable project I can imagine, and shadow AI is how you end up needing to.
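A sketch of what "a common format" might mean: one record shape that every call through the gateway produces, serialised as a line of JSON. The field names are my own suggestion rather than any standard. Storing a hash of the prompt instead of the text is a choice I have made here to keep sensitive material out of the log; whether you need to retain the full text depends on your own obligations.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AuditRecord:
    timestamp: str            # ISO 8601, UTC
    team: str
    application: str
    model: str
    data_classification: str
    prompt_sha256: str        # fingerprint of the prompt, not the prompt itself
    input_tokens: int
    output_tokens: int


def make_record(team: str, application: str, model: str,
                data_classification: str, prompt: str,
                input_tokens: int, output_tokens: int,
                now: Optional[datetime] = None) -> AuditRecord:
    moment = now if now is not None else datetime.now(timezone.utc)
    return AuditRecord(
        timestamp=moment.isoformat(),
        team=team,
        application=application,
        model=model,
        data_classification=data_classification,
        prompt_sha256=hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        input_tokens=input_tokens,
        output_tokens=output_tokens,
    )


def to_json_line(record: AuditRecord) -> str:
    """One record per line, keys sorted so every system emits the same shape."""
    return json.dumps(asdict(record), sort_keys=True)


record = make_record(
    "support", "triage", "cheap-tier", "internal",
    prompt="Summarise this ticket", input_tokens=120, output_tokens=40,
    now=datetime(2025, 1, 1, tzinfo=timezone.utc),
)
line = to_json_line(record)
```

Because every consumer writes the same fields, the question "who sent what class of data to which model" becomes a query over one log rather than an investigation across a dozen.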

And learn enough of the AI stack to be dangerous — tokens, context windows, quantisation, retrieval, evaluation. You do not need to train models; you need to understand the workload well enough to run a platform for it, the same way I argue every infrastructure engineer should learn Python. This is also the centre of gravity in presales and consultancy now — the customers worth helping are not asking “can we do an AI project”, they are asking “how do we run AI as a platform”, and most do not yet know that is the question.

Where this leaves us

The framing matters more than any individual tool, which is why I keep returning to it and why so much else I write leans on it. If you take AI to be a feature, every decision that follows is shaped wrong: you optimise for the demo, you let teams sprawl, you discover the bill and the risk far too late, and you re-solve the same problems a dozen times in a dozen incompatible ways. If you take AI to be infrastructure, the decisions arrange themselves — gateway, identity, observability, golden paths, lifecycle, capacity — because they are the same decisions you have always made for any shared service, applied to a new and unusually expensive one.

We have run this play before. Virtualisation, cloud, containers — each went from novelty to sprawl to governance to substrate, and each time the organisations that built the platform ahead of the sprawl were the ones still standing and compounding when the dust settled. AI is on the same curve, only faster and with a worse failure mode, because it fails confidently and it fails in prose.

So stop doing AI projects. Start running an AI platform. The model is not the product, and it never was. The platform is.