The Architecture of Intelligent Systems

I Chapter One · Origins

Seven decades. Three waves. One architecture won.

Artificial intelligence is older than most software companies. It survived two winters and three reinventions. The story is not what was built. The story is what consolidated.

The field opened with a question. In 1950 Alan Turing asked whether machines could think.¹ Six years later a workshop at Dartmouth gave the work a name.² For thirty years the answer was the same: write the rules. The rules approach broke twice. Funding collapsed in 1974, and again in 1987. The lesson did not change. Hand-written intelligence does not scale.

Fig. 1 — Seventy-six years, one steep bend. Nothing about the early decades predicts the slope after 2012. Three forces — data, compute, and one architecture — turned a slow field into a fast one.

The three waves, each more powerful than the last

Wave 1

Symbolic AI

1950s – 1980s

Hand-coded rules and logic. Experts wrote what the machine should know. Brittle. Expensive. Narrow.

Intelligence cannot be fully captured in hand-written rules.

Wave 2

Machine Learning

1990s – 2010s

Statistical patterns from data. The machine finds what matters. Scalable. Flexible. Powerful.

The algorithm that learns from data beats the one coded by hand.

Wave 3

Deep Learning

2012 – present

Neural networks at scale. The machine learns its own representations. General. Creative. Transformative.

Each order of magnitude in scale unlocks abilities that did not exist below it.

The consolidation no one predicted

For most of the field's life, AI was six separate disciplines. Computer vision. Natural-language processing. Machine learning. Knowledge representation. Reasoning. Robotics. Each had its own data, its own tools, its own teams.³

Then, in roughly five years, the walls fell. Vision started using transformers. Reasoning was handed to language models.⁹ Robotics learned to speak in tokens. One architecture pulled six fields toward a shared center. This is the most important shift to understand. Modern AI is not many systems stitched together. It is one foundation, extended in many directions.

In plain English

What made deep learning win in 2012

Three forces arrived at once. Miss any one and the field stays slow. A fourth finding, scaling laws, came later and turned the win into a roadmap.

Data: The open internet had piled up oceans of text and images to learn from.
Compute: GPUs, built for graphics, turned out to be near-perfect for the parallel math that neural networks need.
Algorithms: AlexNet won the 2012 ImageNet contest by a margin so wide the field rerouted overnight.⁴ The Transformer (2017) then let language models train on GPUs at scale.⁵
Scaling laws: Researchers found capability rises with scale — predictably — and cost rises faster. A research finding became a roadmap.⁶
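The scaling-laws point can be sketched as a power law. The block below assumes loss falls as compute raised to a small negative exponent. The function names and the constants (a = 10.0, alpha = 0.05) are illustrative assumptions, not values taken from the cited research or from this report.

```python
# Illustrative power-law scaling sketch.
# The constants are assumptions chosen for demonstration, not measured values.


def loss(compute: float, a: float = 10.0, alpha: float = 0.05) -> float:
    """Hypothetical loss curve: loss = a * compute ** -alpha."""
    return a * compute ** -alpha


def compute_multiplier(loss_ratio: float, alpha: float = 0.05) -> float:
    """Factor by which compute must grow to scale loss by `loss_ratio` (< 1)."""
    return loss_ratio ** (-1.0 / alpha)


if __name__ == "__main__":
    # Every 10x step in compute cuts loss by the same fraction: predictable.
    for exponent in range(20, 26):
        c = 10.0 ** exponent
        print(f"compute=1e{exponent}  loss={loss(c):.3f}")

    # A modest gain in loss needs a much larger jump in compute.
    print(f"compute needed for 10% lower loss: {compute_multiplier(0.9):.1f}x")
```

Under these assumed numbers, each tenfold step in compute buys the same fractional improvement, while a 10% lower loss costs several times the compute. That is the sense in which capability is predictable and cost rises faster.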

900M

Weekly active ChatGPT users by February 2026, up from 800M four months earlier. Monthly active users passed a billion by mid-2026.⁷

~$78M

Estimated compute cost to train GPT-4. Frontier training keeps climbing.⁸

280×

Cheaper to run a GPT-3.5-level query in about 18 months. Inference costs fell as training costs rose.⁸
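To get a feel for the 280× figure, here is a back-of-envelope sketch. It assumes the decline was a smooth exponential over the 18 months, which is a simplification, and the function names are hypothetical.

```python
import math


def halving_time_months(fold_reduction: float, months: float) -> float:
    """Months for cost to halve, assuming a constant exponential decline."""
    return months / math.log2(fold_reduction)


def monthly_cost_factor(fold_reduction: float, months: float) -> float:
    """Fraction of the previous month's cost that remains each month."""
    return fold_reduction ** (-1.0 / months)


if __name__ == "__main__":
    fold, span = 280.0, 18.0  # the figures quoted in the card above
    print(f"halving time: {halving_time_months(fold, span):.1f} months")
    print(f"monthly cost factor: {monthly_cost_factor(fold, span):.2f}")
```

Under that assumption, the cost of a query halves roughly every two months. Real prices fall in steps as new models ship, so treat this as a rate of thumb, not a forecast.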

The consolidation made one architecture rule them all. What is in that architecture?

II Chapter Two · The Stack

No single thing is "the AI." The stack is.

A demo is not a deployment. A model is not a strategy. The companies that win will not just buy models. They will build governed systems around them.

Looks impressive in a meeting

Demo

Wins applause. Loses production.

No real data
No controls
No accountability

Predicts well in the lab

Model

Gets the answer right. Cannot ship it.

No retrieval
No tools
No workflow fit

Acts · measured · trusted

Governed system

Where the business value lives.

Grounded in trusted data
Tools, memory, approvals
Tied to a business KPI

The master map — one picture, before we walk it

People say "AI" as if it were one thing. It is not. It is a stack — layers built on layers, each one standing on the work below. The chip does not know what an agent is. The agent does not care which chip it runs on. Between them sit cloud, models, and the plumbing that ties them together. Down the side runs governance, watching all of it. Study this map for a minute. Then we will take it apart, layer by layer — what each is, who builds it, and the one decision a leader actually makes there.

In plain English

Six words to carry the whole map

Stack: Layers built on layers. Each hides the messy details below and offers something cleaner above.
Accelerator: A chip built to do AI math fast — a GPU or a custom AI chip. The raw muscle under everything.
Foundation model: A very large model trained once on a mountain of data, then reused for many tasks. The engine of meaning.
Inference: Running a trained model to get an answer. Training builds the model; inference uses it. You pay for both, separately.
Orchestration: The conductor. It routes work between models, tools, and steps so a request becomes a finished job.
Governance: The rules and the watching — permissions, safety checks, logs, evaluation — applied at every layer, not bolted on at the end.
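Two of these words, orchestration and governance, are easier to see in miniature. The sketch below is a toy, not a real framework: the `Orchestrator` class, the step functions, and the permission list are all hypothetical names invented for illustration. It shows one idea only: the conductor runs steps in order, and the check and the log happen at every step.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Orchestrator:
    """Hypothetical conductor: runs named steps in order, with checks and logs."""

    steps: Dict[str, Callable[[str], str]]
    allowed: Set[str]
    audit_log: List[str] = field(default_factory=list)

    def run(self, plan: List[str], request: str) -> str:
        result = request
        for name in plan:
            # Governance at every step: check permission first, log the outcome.
            if name not in self.allowed:
                self.audit_log.append(f"BLOCKED {name}")
                raise PermissionError(f"step not permitted: {name}")
            result = self.steps[name](result)
            self.audit_log.append(f"OK {name}")
        return result


# Stand-ins for real components; each one is an assumption for illustration.
def retrieve(text: str) -> str:
    return text + " | context: policy doc"


def model(text: str) -> str:
    return f"answer({text})"


def send_email(text: str) -> str:
    return f"sent({text})"


if __name__ == "__main__":
    orchestrator = Orchestrator(
        steps={"retrieve": retrieve, "model": model, "send_email": send_email},
        allowed={"retrieve", "model"},  # sending email is not approved
    )
    print(orchestrator.run(["retrieve", "model"], "What is our refund policy?"))
    print(orchestrator.audit_log)
```

The design choice worth noticing is that the permission check sits inside the loop, not after it. A step that is not approved never runs, and the attempt is still logged.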

Fig. 2 — The whole stack on one page. Five layers, bottom to top, each standing on the one below. Governance and observability run the full height on the right — applied everywhere, never an afterthought. The named vendors are current exemplars as of mid-2026, not an endorsement.

No layer is "the AI." The system is the AI. The map is how you see it whole.

Layer 1 · Silicon — chips that do the math

Every answer a model gives is arithmetic: billions of small multiplications, done at once. Ordinary processors work through them a few streams at a time. Accelerators do them in parallel, thousands at a stroke. This is the floor of the stack. Everything above borrows this muscle.
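Why this math suits parallel hardware can be shown in a few lines. The sketch below multiplies a small weight matrix by an input vector two ways. The function names are hypothetical. Python threads will not make this faster, so it demonstrates independence, not speed: no output value depends on any other, which is what lets an accelerator compute thousands of them at once.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

Vector = List[float]
Matrix = List[Vector]


def dot(row: Vector, x: Vector) -> float:
    """One output value: a sum of element-wise multiplications."""
    return sum(a * b for a, b in zip(row, x))


def matvec_sequential(weights: Matrix, x: Vector) -> Vector:
    """Compute each output one after another, as a single stream."""
    return [dot(row, x) for row in weights]


def matvec_parallel(weights: Matrix, x: Vector, workers: int = 4) -> Vector:
    """Hand each row to a worker; no row depends on any other row."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input rows.
        return list(pool.map(lambda row: dot(row, x), weights))


if __name__ == "__main__":
    weights = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    x = [10.0, 1.0]
    print(matvec_sequential(weights, x))
    print(matvec_parallel(weights, x))
```

Both functions return the same values. The only difference is who does the work and when, and that is the whole difference between a general-purpose processor and an accelerator at this level of detail.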

Two kinds of chips matter. GPUs (graphics processors, repurposed for AI) are the general-purpose workhorse. NVIDIA's current generation is Blackwell, shipping in volume now; its successor, Vera Rubin, is in production and ships from Q3 2026, with cloud instances to follow.²⁰ Then there are custom AI chips, designed by the cloud giants for their own data centers: Google's TPU, now in its Ironwood generation, and AWS Trainium, now in its third generation.²¹ ²² The numbers move every quarter; the shape of the problem does not: more parallel math, more fast memory beside it, more chips wired as one.