
A BlueAlly Field Guide

AI is a stack,
not a single tool.

Four chapters. Seven decades. One architecture won — silicon at the base, agents at the top, governance the full height. The companies that win next will not buy a model. They will build a governed system around one. This is the map.

Conquer Complexity

Four rings around one governed core: I · Origins, II · The Stack, III · The Patterns, IV · The Vessel. The whole modern stack.

The course · four chapters

Continue the story: ← How the Machine Reads  ·  The Agent Harness →

I Chapter One · Origins

Seven decades. Three waves. One architecture won.

Artificial intelligence is older than most software companies. It survived two winters and three reinventions. The story is not what was built. The story is what consolidated.

The field opened with a question. In 1950 Alan Turing asked whether machines could think.1 Six years later a workshop at Dartmouth gave the work a name.2 For thirty years the answer was the same: write the rules. The rules approach broke twice. Funding collapsed in 1974, and again in 1987. The lesson did not change. Hand-written intelligence does not scale.

A short history, in milestones, 1950 → 2026. The pace is not steady; it bends sharply upward at 2012. Turing test (1950) · Dartmouth, AI named (1956) · Winter I · Winter II · Deep Blue (1997) · AlexNet (2012) · Transformer (2017) · ChatGPT (2022) · Agents (2026).
Fig. 1 — Seventy-six years, one steep bend. Nothing about the early decades predicts the slope after 2012. Three forces — data, compute, and one architecture — turned a slow field into a fast one.

The three waves, each more powerful than the last

Wave 1

Symbolic AI

1950s – 1980s

Hand-coded rules and logic. Experts wrote what the machine should know. Brittle. Expensive. Narrow.

Intelligence cannot be fully captured in hand-written rules.
Wave 2

Machine Learning

1990s – 2010s

Statistical patterns from data. The machine finds what matters. Scalable. Flexible. Powerful.

The algorithm that learns from data beats the one coded by hand.
Wave 3

Deep Learning

2012 – present

Neural networks at scale. The machine learns its own representations. General. Creative. Transformative.

Each order of magnitude in scale unlocks abilities that did not exist below it.

The consolidation no one predicted

For most of the field's life, AI was six separate disciplines. Computer vision. Natural-language processing. Machine learning. Knowledge representation. Reasoning. Robotics. Each had its own data, its own tools, its own teams.3

Then, in roughly five years, the walls fell. Vision started using transformers. Reasoning was handed to language models.9 Robotics learned to speak in tokens. One architecture pulled six fields toward a shared center. This is the most important shift to understand. Modern AI is not many systems stitched together. It is one foundation, extended in many directions.

In plain English

What made deep learning win in 2012

Three forces arrived at once. Miss any one and the field stays slow. A fourth finding, scaling laws, came later and turned the breakthrough into a plan.

Data
The open internet had piled up oceans of text and images to learn from.
Compute
GPUs, built for graphics, turned out to be near-perfect for the parallel math that neural networks need.
Algorithms
AlexNet won the ImageNet contest by a margin so wide the field rerouted overnight.4 The Transformer (2017) then let language models train on GPUs at scale.5
Scaling laws
Researchers found that model loss falls predictably, as a power law, as parameters, data, and compute grow, so each further gain costs more compute than the last. A research finding became a roadmap.6
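To make "predictably" concrete: Kaplan et al. fit test loss to a power law in model size, L(N) = (N_c / N)^α_N. This minimal sketch uses the constants reported in that paper for non-embedding parameters (α_N ≈ 0.076, N_c ≈ 8.8 × 10^13); treat them as illustrative, since fitted values depend on the data and setup. It shows the key property: every tenfold increase in parameters multiplies loss by the same fixed factor, about 0.84.

```python
"""Power-law scaling of loss with model size, after Kaplan et al. (2020).

The constants are the paper's reported fit for non-embedding parameters.
They are illustrative here, not a prediction for any specific model.
"""

ALPHA_N = 0.076   # power-law exponent for model size
N_C = 8.8e13      # fitted scale constant, in non-embedding parameters


def loss_from_params(n_params: float) -> float:
    """Predicted test loss (nats per token) for a model with n_params parameters."""
    return (N_C / n_params) ** ALPHA_N


if __name__ == "__main__":
    for n in (1e8, 1e9, 1e10, 1e11):
        print(f"{n:>8.0e} params -> loss {loss_from_params(n):.3f}")
    # Each 10x step multiplies loss by the same factor, 10 ** -ALPHA_N.
    print(loss_from_params(1e10) / loss_from_params(1e9))
```

The fixed ratio per tenfold step is what made the finding a roadmap: the next gain can be priced before it is bought.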
900M
Weekly active ChatGPT users by February 2026 — up from 800M four months earlier, and past a billion monthly by mid-2026.7
~$78M
Estimated compute cost to train GPT-4. Frontier training keeps climbing.8
280×
Cheaper to run a GPT-3.5-level query in about two years, from November 2022 to October 2024. Inference costs fell as training costs rose.8
The consolidation made one architecture rule them all. What is in that architecture?

II Chapter Two · The Stack

No single thing is "the AI." The stack is.

A demo is not a deployment. A model is not a strategy. The companies that win will not buy models. They will build governed systems around them.

Looks impressive in a meeting

Demo

Wins applause. Loses production.
  • No real data
  • No controls
  • No accountability
Predicts well in the lab

Model

Gets the answer right. Cannot ship it.
  • No retrieval
  • No tools
  • No workflow fit
Acts · measured · trusted

Governed system

Where the business value lives.
  • Grounded in trusted data
  • Tools, memory, approvals
  • Tied to a business KPI

The master map — one picture, before we walk it

People say "AI" as if it were one thing. It is not. It is a stack — layers built on layers, each one standing on the work below. The chip does not know what an agent is. The agent does not care which chip it runs on. Between them sit cloud, models, and the plumbing that ties them together. Down the side runs governance, watching all of it. Study this map for a minute. Then we will take it apart, layer by layer — what each is, who builds it, and the one decision a leader actually makes there.

In plain English

Six words to carry the whole map

Stack
Layers built on layers. Each hides the messy details below and offers something cleaner above.
Accelerator
A chip built to do AI math fast — a GPU or a custom AI chip. The raw muscle under everything.
Foundation model
A very large model trained once on a mountain of data, then reused for many tasks. The engine of meaning.
Inference
Running a trained model to get an answer. Training builds the model; inference uses it. You pay for both, separately.
Orchestration
The conductor. It routes work between models, tools, and steps so a request becomes a finished job.
Governance
The rules and the watching — permissions, safety checks, logs, evaluation — applied at every layer, not bolted on at the end.
The modern AI stack, silicon to software, with governance down the side. Each layer stands on the one below and hides its detail; value to the business rises toward the top, and control runs the full height.
  • Layer 5 · Applications & agents: copilots, research assistants, automations, RAG and workflow automation; the screens people actually touch.
  • Layer 4 · Orchestration & serving: agent frameworks, routers, and the engines that serve models at speed (LangGraph, CrewAI, agent SDKs, vLLM, SGLang, TensorRT-LLM).
  • Layer 3 · Foundation models: language, multimodal, and embedding models, trained once and reused everywhere (Claude Opus 4.8, Claude Fable 5, GPT-5.5, Gemini 3, Llama 4).
  • Layer 2 · Cloud & infrastructure: data centers, networking, storage, and Kubernetes, where the chips actually live (AWS, Azure, Google Cloud, CoreWeave, Oracle, neoclouds).
  • Layer 1 · Silicon & accelerators: GPUs and custom AI chips, the raw muscle every layer above borrows (NVIDIA Blackwell, Google TPU, AWS Trainium, Rubin from Q3 2026).
  • Governance & observability, spanning every layer, not a bolt-on: identity & access, guardrails & safety, PII & data residency, cost & rate limits, traces & logs, evaluation suites, audit & lineage, human-in-the-loop. Trace every call; grade every change.
Fig. 2 — The whole stack on one page. Five layers, bottom to top, each standing on the one below. Governance and observability run the full height on the right — applied everywhere, never an afterthought. The named vendors are current exemplars as of mid-2026, not an endorsement.
No layer is "the AI." The system is the AI. The map is how you see it whole.

Layer 1 · Silicon — chips that do the math

Every answer a model gives is arithmetic — billions of small multiplications, done at once. Ordinary processors do them one stream at a time. Accelerators do them in parallel, thousands at a stroke. This is the floor of the stack. Everything above borrows this muscle.
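A minimal sketch of why this math parallelizes. At its core, one layer of a neural network is a matrix-vector product: each output number is the dot product of one weight row with the input, and no output depends on another. A CPU walks those rows a few at a time; an accelerator gives each its own lane. The plain-Python loop below is sequential on purpose, to show the independent units of work. The matrix is a toy, the 70-billion-parameter size is hypothetical, and "about two operations per parameter per generated token" (one multiply, one add) is the standard rule of thumb from the scaling-law literature, not a measurement.

```python
def matvec(weights: list[list[float]], x: list[float]) -> list[float]:
    """One layer's core arithmetic: y[i] = sum over j of weights[i][j] * x[j].

    Every y[i] is an independent dot product, so an accelerator can compute
    all of them at the same time. Here we loop over rows only for clarity.
    """
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def flops_per_token(n_params: float) -> float:
    """Rule of thumb: about 2 operations (multiply + add) per parameter per token."""
    return 2.0 * n_params


if __name__ == "__main__":
    W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 outputs = 3 independent dot products
    print(matvec(W, [1.0, 1.0]))
    # A hypothetical 70-billion-parameter model: roughly 1.4e11 operations per token.
    print(f"{flops_per_token(70e9):.1e}")
```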

Two kinds of chips matter. GPUs, graphics processors repurposed for AI, are the general-purpose workhorse. NVIDIA's current generation is Blackwell, shipping in volume now; its successor, Vera Rubin, is in production and ships from Q3 2026, with cloud instances to follow.20 Then there are custom AI chips, designed by the cloud giants for their own data centers: Google's TPU, now in its Ironwood generation, and AWS Trainium, now in its third generation.21,22 The numbers move every quarter; the shape of the problem does not: more parallel math, more fast memory beside it, more chips wired as one.

The one decision a leader makes here

Rent or own? Almost everyone should rent — buy compute by the hour from a cloud, and let someone else carry the power, cooling, and the next chip upgrade. You own silicon only when usage is huge, steady, and predictable enough that the math flips. For nearly every enterprise program, owning a data center is a distraction from the work that creates value.
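"The math flips" can be sketched as a break-even calculation. Every number below is a hypothetical placeholder, not a price quote: an hourly rental rate for one accelerator, a purchase price amortized over an assumed useful life, and an assumed monthly cost for power, cooling, space, and staff. Owning only wins when steady monthly usage pushes rented hours past the break-even line.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # average hours in a month (8,760 / 12)


@dataclass
class OwnCosts:
    """Hypothetical costs of owning one accelerator (all figures are placeholders)."""
    purchase_price: float      # up-front hardware cost
    useful_life_months: int    # amortization period
    monthly_operating: float   # power, cooling, space, staff share

    def monthly_total(self) -> float:
        return self.purchase_price / self.useful_life_months + self.monthly_operating


def breakeven_hours(rent_per_hour: float, own: OwnCosts) -> float:
    """Rented hours per month at which renting costs the same as owning."""
    return own.monthly_total() / rent_per_hour


def cheaper_option(hours_used: float, rent_per_hour: float, own: OwnCosts) -> str:
    """'own' only if renting those hours would cost more than owning outright."""
    return "own" if hours_used * rent_per_hour > own.monthly_total() else "rent"


if __name__ == "__main__":
    own = OwnCosts(purchase_price=36_000, useful_life_months=36, monthly_operating=500)
    rate = 4.0  # hypothetical rental price per accelerator-hour
    hours = breakeven_hours(rate, own)
    print(f"break-even: {hours:.0f} h/month ({hours / HOURS_PER_MONTH:.0%} utilization)")
    print(cheaper_option(200, rate, own), cheaper_option(600, rate, own))
```

With these placeholders the chip must stay busy about half of every month, every month, before owning pays; that is the "huge, steady, and predictable" condition in numbers.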

A CPU: a few lanes, one at a time. Fast on one stream, slow on a million. An accelerator: thousands of lanes at once, built for the million. This is why AI runs on these.
Fig. 3 — Why AI runs on accelerators. A CPU is a sports car on a single lane. An accelerator is a thousand-lane highway. AI math is a million cars going the same way, so the highway wins.

Layer 2 · Cloud — where the chips actually live

A chip alone is a paperweight. It needs power, cooling, fast networking, storage, and software to share it across many users. That housing is the infrastructure layer. It turns a warehouse of silicon into compute you can rent by the minute. Two camps supply it. The hyperscalers — AWS, Microsoft Azure, and Google Cloud, with Oracle now at the frontier — run global data centers and rent everything above the chip. Alongside them, neoclouds like CoreWeave build data centers tuned for AI alone; CoreWeave has passed a gigawatt of active power and signed multi-year deals with both Meta and Anthropic.23 Underneath sits the connective tissue every serious system needs: object storage, low-latency networking, and Kubernetes — the open-source tool that schedules software across thousands of machines without a human placing each one.
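To show what "schedules software across thousands of machines" means, here is a toy scheduler loosely in the spirit of Kubernetes, which places each workload in two phases: filter out nodes that cannot fit it, then score the rest. This is not Kubernetes code or its real scoring logic; the node names, resource fields, and the "most free GPUs wins" scoring rule are hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    free_gpus: int
    free_memory_gb: int


@dataclass
class Job:
    name: str
    gpus: int
    memory_gb: int


def schedule(job: Job, nodes: list[Node]) -> Optional[str]:
    """Place one job: filter nodes that fit, then score the survivors.

    Toy scoring rule (an assumption): prefer the node with the most free GPUs.
    Returns the chosen node's name, or None if nothing fits (the job stays pending).
    """
    feasible = [n for n in nodes
                if n.free_gpus >= job.gpus and n.free_memory_gb >= job.memory_gb]
    if not feasible:
        return None
    best = max(feasible, key=lambda n: n.free_gpus)
    best.free_gpus -= job.gpus            # reserve the resources on that node
    best.free_memory_gb -= job.memory_gb
    return best.name


if __name__ == "__main__":
    nodes = [Node("node-a", 2, 256), Node("node-b", 8, 1024)]
    for job in (Job("train", 8, 512), Job("serve", 1, 64), Job("big", 4, 128)):
        print(job.name, "->", schedule(job, nodes))
```

The third job waits rather than failing: no node has four free GPUs left, which is the "runs short of chips" situation a second provider can absorb.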

The one decision a leader makes here

One cloud, or more than one? A single provider is simpler to run and often cheaper to start. Many clouds — or a neocloud beside your hyperscaler — buy leverage on price, protection if one runs short of chips, and a path to keep regulated data where the law requires. Most enterprises land in the middle: a primary cloud for the bulk of the work, a second relationship kept warm. Choose with your eyes open, because moving later is real work.

Layer 3 · Foundation models — the engines of meaning

This is the layer most people mean when they say "AI." A foundation model is trained once, at great cost, on a vast sweep of text, images, and code. What comes out can read, write, reason, and translate across thousands of tasks it was never explicitly trained for. You do not train it. You rent it, and you build on top. Language models read and write text; the frontier ones now hold around a million tokens of context, roughly 750,000 English words, or several long novels.10 Multimodal models add images, audio, and video. Embedding models do something quieter but vital: they turn a passage into a list of numbers that captures its meaning, so software can search by idea rather than keyword. OpenAI's text-embedding-3-large, for example, emits 3,072 numbers per passage.24
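A minimal sketch of "search by idea": embed the query and each passage, then rank passages by cosine similarity. The three-number vectors below are hand-made stand-ins so the example runs offline; a real embedding model returns much longer vectors (3,072 numbers for text-embedding-3-large), but the ranking step is the same.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means the same direction (same meaning)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vec: list[float], passages: dict[str, list[float]], top_k: int = 2) -> list[str]:
    """Return the top_k passage names, most similar to the query first."""
    ranked = sorted(passages,
                    key=lambda name: cosine_similarity(query_vec, passages[name]),
                    reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    # Hypothetical toy embeddings; the axes loosely mean (money, travel, weather).
    passages = {
        "Q3 expense policy": [0.9, 0.3, 0.0],
        "Booking a flight to Berlin": [0.2, 0.9, 0.1],
        "Storm forecast for Friday": [0.0, 0.2, 0.95],
    }
    query = [0.7, 0.6, 0.0]  # stands in for "How do I get reimbursed for a trip?"
    print(search(query, passages))
```

Neither top result shares a keyword with the query; they rank high because their vectors point the same way.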

The frontier is a short list, and it moves every few months. As of mid-2026 it includes Anthropic's Claude Opus 4.8, OpenAI's GPT-5.5, and Google's Gemini 3, all near a million-token context, with Opus 4.8 leading the independent intelligence rankings.10,25 Anthropic's long-horizon Claude Fable 5 joined the frontier in June 2026.26 Meta's Llama 4 leads the open-weight pack, its Scout model advertising a ten-million-token window.27 Closed models you call over an API; open-weight models you can download and run yourself. That single fork shapes cost, control, and where your data goes.

The one decision a leader makes here

Closed frontier, or open weights? Reach for a closed API when you want the strongest model and the least to run yourself. Reach for open weights when control, cost at scale, or keeping data inside your walls matters more than the last few points of capability. The wise default is not loyalty to one model — it is staying model-agnostic, so you can swap the engine as the frontier moves, which it will.
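One common way to stay model-agnostic is a thin interface between your application and any vendor. This sketch uses a Python `Protocol`; the two adapter classes and their names are hypothetical stand-ins that return canned text instead of calling a real SDK, so no vendor API is implied. Swapping engines then means changing one argument or config value, not rewriting the application.

```python
from typing import Protocol


class ChatModel(Protocol):
    """The only surface the application is allowed to depend on."""
    name: str

    def complete(self, prompt: str) -> str: ...


class ClosedApiModel:
    """Hypothetical adapter for a hosted, closed model (real SDK call omitted)."""
    name = "closed-frontier"

    def complete(self, prompt: str) -> str:
        # In practice: call the vendor's SDK here and return its text output.
        return f"[{self.name}] answer to: {prompt}"


class OpenWeightsModel:
    """Hypothetical adapter for an open-weight model served inside your walls."""
    name = "open-weights"

    def complete(self, prompt: str) -> str:
        # In practice: call your own serving endpoint here.
        return f"[{self.name}] answer to: {prompt}"


REGISTRY: dict[str, ChatModel] = {m.name: m for m in (ClosedApiModel(), OpenWeightsModel())}


def summarize(doc: str, engine: str = "closed-frontier") -> str:
    """Application code: it knows the interface, never the vendor."""
    return REGISTRY[engine].complete(f"Summarize: {doc}")


if __name__ == "__main__":
    print(summarize("Q3 board memo"))
    print(summarize("Q3 board memo", engine="open-weights"))
```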

Three families of model, one shared idea: trained once, reused everywhere.
  • Language: reads and writes text; reasons, drafts, summarizes, translates. ~1M-token context.
  • Multimodal: adds images, audio, video; reads a chart, describes a photo. Text + pixels + sound.
  • Embeddings: turns meaning into numbers, so software can search by idea. 3,072 dimensions (text-embedding-3-large).
The fork that shapes everything: closed API vs. open weights.
  • Closed, called over an API: top capability, zero servers to run, priced per token. Your prompts leave your walls under a contract. Claude Opus 4.8, GPT-5.5, Gemini 3, Claude Fable 5.
  • Open weights, downloaded and run yourself: full control, runs in your walls, no per-token bill, but you own the servers, the tuning, and the upkeep. Llama 4 Scout, Mistral, DeepSeek.
Fig. 4 — The model layer, and its defining fork. Three families do different jobs; all are trained once and reused. The choice that follows you for years is closed-API versus open-weight — capability and ease on one side, control and ownership on the other.

Layer 4 · Orchestration & serving — where a request becomes a job

A model answers one prompt. A real task needs more: find the right documents, call a tool, check the result, try again, hand off to a second model. Something must direct that traffic. Serving is the engine room — it runs a model fast for many users at once, batching requests and reusing work. The leaders are open source: vLLM, whose paged-memory trick made it the common default, and SGLang, tuned for agent workloads that share long prompts; NVIDIA's TensorRT-LLM wrings the most from NVIDIA hardware.28 Orchestration is the conductor — it decides the order of steps, routes each to the right model or tool, retries failures, and holds the state of a long job. The frameworks have settled into a short list: LangGraph for stateful, auditable flows; CrewAI for role-based teams of agents; and the official agent SDKs from the labs.29 This is also where the Model Context Protocol lives.
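Stripped to its core, orchestration is a small loop: run steps in order, send each to the handler (model or tool) bound to it, retry transient failures, and carry state forward. This framework-free sketch is hypothetical throughout; the step names, handlers, and retry policy are illustrative, and real frameworks such as LangGraph layer persistence, branching, and tracing on top of the same idea.

```python
import time
from typing import Callable

Handler = Callable[[dict], dict]


class TransientError(Exception):
    """A failure worth retrying (timeout, rate limit, flaky tool)."""


def run_pipeline(steps: list[tuple[str, Handler]], state: dict,
                 max_retries: int = 2, backoff_s: float = 0.0) -> dict:
    """Run each named step in order, retrying transient failures.

    `state` carries results between steps; each handler returns updates to merge in.
    The step list doubles as the routing table: each step is bound to one model or tool.
    """
    for name, handler in steps:
        for attempt in range(max_retries + 1):
            try:
                state.update(handler(state))
                state.setdefault("log", []).append(f"{name}: ok (attempt {attempt + 1})")
                break
            except TransientError:
                if attempt == max_retries:
                    raise  # out of retries: surface the failure
                time.sleep(backoff_s * (2 ** attempt))  # exponential backoff
    return state


if __name__ == "__main__":
    calls = {"lookup": 0}

    def retrieve(state: dict) -> dict:
        return {"docs": ["pricing.pdf"]}

    def flaky_lookup(state: dict) -> dict:
        calls["lookup"] += 1
        if calls["lookup"] == 1:
            raise TransientError("timeout")  # fails once, then succeeds
        return {"price": 42}

    def draft(state: dict) -> dict:
        return {"answer": f"Plan B costs ${state['price']} (source: {state['docs'][0]})"}

    final = run_pipeline([("retrieve", retrieve), ("lookup", flaky_lookup), ("draft", draft)],
                         {"question": "What does plan B cost?"})
    print(final["answer"])
    print(final["log"])
```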

The one decision a leader makes here

How much to wire yourself? A heavier framework gives you control and an audit trail, at the cost of more code to maintain. A lighter one ships faster but hands you less to steer. The honest answer changes as models improve: each generation does more on its own, so the plumbing you need keeps getting thinner. Build the scaffolding today's model requires — and no more.

Layer 5 · Applications & agents — what people actually use

This is the only layer most of your colleagues will ever see. The chips, the cloud, the model, the plumbing — all of it exists to put one useful thing on a screen. A copilot sits beside a person and helps, while the human stays in charge. A RAG application answers questions from your own documents, with citations, so the answer can be checked. An agent goes further: it works in a loop, planning and calling tools until a job is done, then pausing for a human to approve anything that writes to a system of record. The honest line between them is autonomy. Most enterprise value today lives in the careful middle — agents that read, retrieve, and draft at machine speed, but wait for a person before they spend money, send a message, or change a record. Reading is leverage. Writing is risk. We open this loop in full in The Agent Harness.
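The "read freely, approve the writes" rule can be enforced in code rather than left to the model. In this minimal sketch, each tool is tagged as a read or a write; reads run immediately, and writes are held until an approver callback says yes. The tool names, the scripted plan standing in for the model's chosen actions, and the approver are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    name: str
    writes: bool                 # True if it changes a system of record
    run: Callable[[dict], str]


def run_agent(plan: list[tuple[str, dict]], tools: dict[str, Tool],
              approve: Callable[[str, dict], bool]) -> list[str]:
    """Execute a plan step by step; every write must pass the approval gate.

    `plan` stands in for the model's chosen actions (tool name + arguments).
    Returns an audit log of what ran and what was held for a human.
    """
    log = []
    for tool_name, args in plan:
        tool = tools[tool_name]
        if tool.writes and not approve(tool_name, args):
            log.append(f"BLOCKED {tool_name} {args}: awaiting human approval")
            continue
        log.append(f"RAN {tool_name}: {tool.run(args)}")
    return log


def no_approval_yet(tool_name: str, args: dict) -> bool:
    """Hypothetical approver: no human has signed off, so every write is held."""
    return False


if __name__ == "__main__":
    tools = {
        "read_contract": Tool("read_contract", False, lambda a: f"read {a['id']}"),
        "draft_summary": Tool("draft_summary", False, lambda a: "draft ready"),
        "email_customer": Tool("email_customer", True, lambda a: f"sent to {a['to']}"),
    }
    plan = [("read_contract", {"id": "C-17"}), ("draft_summary", {}),
            ("email_customer", {"to": "buyer@example.com"})]
    for line in run_agent(plan, tools, no_approval_yet):
        print(line)
```

The gate lives outside the model, so a persuasive prompt cannot talk its way past it; the log doubles as the audit trail.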

The one decision a leader makes here

How much autonomy, on which task? Set it task by task, not once for the whole system. Let an agent read every contract and draft the summary on its own. Make it stop and ask a human before it emails a customer or changes a price. The line is not technical timidity; it is where a mistake stops being cheap. Put the human exactly there, and skip approval steps that add no safety.

The autonomy spectrum: how much the software decides on its own, from more human oversight (left) to more autonomy (right). Assistant (answers when asked; you drive) · Copilot (suggests as you work; it asks before acting) · Supervised agent (acts, waits to write; most value lives here) · Autonomous agent (runs the whole loop; it reports after).
Fig. 5 — From assistant to autonomous agent. Autonomy rises left to right; human oversight falls. The sweet spot for enterprise work sits just left of full autonomy — agents that do the reading and drafting, and ask permission before they act on the world.

The leader's decision at each layer

Few leaders need to pick a chip. Almost all will choose a cloud posture, a model strategy, how much plumbing to own, how much autonomy to grant, and who holds the governance spine. Those choices, made well, are most of the distance between a demo and a system you can run.

Layer · The one decision
Silicon · Rent compute, or own it. Nearly always: rent.
Cloud & infrastructure · One cloud, or several, for price, resilience, and data residency.
Foundation models · Closed frontier or open weights, and stay model-agnostic.
Orchestration & serving · How much plumbing to wire yourself; keep it as thin as the model allows.
Applications & agents · How much autonomy, set task by task: read freely, approve the writes.
Governance (the spine) · Name an owner before the first pilot; trace, grade, and gate from day one.

So what: buy the model; build the system around it — and govern the full height. A model predicts. A system acts. What does it take for a system to act?

III Chapter Three · The Patterns

A chatbot answers. An agent acts.

Same model under the hood. Very different role in the business. The difference is governance — and that is where the value lives.

Chatbot: prompt → answer. The user asks; the model answers. A skilled author of language that stops there; the user takes the next step.
Agent: goal → plan → act → reflect. Goal (user input) → Plan (decompose) → Act (tools, retrieval) → Reflect (critique), looping until it hands off.
Fig. 6 — Two shapes of the same model. A chatbot is a line. An agent is a loop with tools and accountability. The loop is what turns a model into a work system.

Eleven patterns that compose every real agent

Each pattern solves one problem. Production systems combine three to seven. Anthropic, OpenAI, LangChain, and the academic literature converged on this list during 2024–2025.

PATTERN 01

Augmented LLM

[ LLM ] ↔ retrieval, tools, memory

A model extended with retrieval, tools, and memory. The atomic unit beneath every other pattern.

Anthropic 2024
PATTERN 02

Prompt Chaining

step 1 → step 2 → step 3

Decompose a task into a fixed sequence. The most reliable pattern when the logic can be written down.

Anthropic 2024
PATTERN 03

Routing

classify → dispatch

Inspect the input, send it to the right specialist. The base of every multi-domain assistant.

Anthropic 2024
PATTERN 04

RAG

query → retrieve → ground

Retrieval-Augmented Generation. Ground every claim in a verified source. The standard control for hallucination.

Lewis et al. 2020
PATTERN 05

Tool Use

LLM → call API → observe

The model dispatches deterministic tools — SQL, calculators, APIs. MCP is now the standard wire format.

Anthropic · OpenAI 2024
PATTERN 06

ReAct

thought → action → observation

Reason and act in turns. The model alternates thinking with tool calls until the job is done.

Yao et al. 2022
PATTERN 07

Plan-and-Execute

plan → execute → revise

Separate the planner from the executor. Big agentic systems split decomposition from doing.

LangChain 2024
PATTERN 08

Reflection

draft → critique → revise

Generate an answer, critique it, fix it. Standard now in coding agents, math, and document synthesis.

Shinn et al. 2023
PATTERN 09

Memory

session · user · agent

Carry state across steps and conversations. Three scopes. Temporal knowledge graphs are the 2026 frontier.

Letta · Mem0 · Zep 2025
PATTERN 10

Multi-Agent

supervisor → [a] [b] [c]

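Specialized agents under a supervisor. Anthropic's multi-agent research system (an Opus 4 lead with Sonnet 4 subagents) outperformed single-agent Claude Opus 4 by 90.2% on an internal research evaluation, with the largest gains on breadth-first queries.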

Anthropic 2025
PATTERN 11

Guardrails + HITL

policy → approve → log

Layered defense plus human escalation. The pattern that lets agents act on the business safely.

OWASP LLM Top 10 (2025)
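Most of these patterns are small control-flow shapes wrapped around model calls. As one example, here is Pattern 08, Reflection (draft, critique, revise), as a bounded loop. The `generate`, `critique`, and `revise` functions are hypothetical deterministic stubs so the sketch runs offline; in practice each would be a model call, and the critic is often a separate prompt or a second model. The round limit keeps a never-satisfied critic from looping forever.

```python
from typing import Callable


def reflect(task: str,
            generate: Callable[[str], str],
            critique: Callable[[str], list[str]],
            revise: Callable[[str, list[str]], str],
            max_rounds: int = 3) -> tuple[str, int]:
    """Draft, then critique and revise until the critic finds no issues or rounds run out.

    Returns the final draft and the number of revision rounds used.
    """
    draft = generate(task)
    for round_no in range(max_rounds):
        issues = critique(draft)
        if not issues:
            return draft, round_no
        draft = revise(draft, issues)
    return draft, max_rounds


# Hypothetical deterministic stubs standing in for model calls.
def generate(task: str) -> str:
    return f"Summary of {task}."


def critique(draft: str) -> list[str]:
    return [] if "(source:" in draft else ["missing citation"]


def revise(draft: str, issues: list[str]) -> str:
    if "missing citation" in issues:
        return draft.rstrip(".") + " (source: Q3 report)."
    return draft


if __name__ == "__main__":
    print(reflect("Q3 revenue", generate, critique, revise))
```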

The autonomy ladder — where the human stands

Autonomy is not a switch. It is a dial, and where you set it decides the blast radius. Most enterprise pilots start one rung up from the bottom: the agent reads everything and writes nothing.

Four rungs; risk rises as the human steps back. Read only (human reads output) → Read + suggest (agent drafts, human acts; most pilots start here) → Write with approval (human gate on every write) → Autonomous (narrow, audited, reversible). Blast radius runs from low at the first rung to high at the last.
Fig. 7 — The autonomy ladder. Reading at machine speed is a gift. Writing at machine speed is a risk. Climb the ladder only where the evidence — and the reversibility — says you can.

LLM for judgment. Tools for exactness.

The most expensive mistake in production AI is asking a model to do arithmetic. The second is asking a calculator to write a memo. Treat them as partners, each doing what it does best.

Probabilistic — the model's territory.

Best at: summarizing, drafting, classifying, reasoning under ambiguity. Weak at: exact arithmetic, deterministic policy, repeatability. Example: "Draft a tactful reply to this complaint."

Deterministic — the tool's territory.

Best at: math, lookups, calculations, policy enforcement. Strength: repeatable, auditable, precise to the digit. Example: SQL, Python, rules engines, solvers.

The rule we give every client: the model decides which calculation to run — never the answer to it. Totals come from SQL. Approvals come from code.
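Here is that rule in miniature: the model chooses which calculation to run and with what inputs; the database computes the number. The "model output" is a hard-coded JSON tool call standing in for a real model response, and the table, query names, and allow-list are hypothetical. Only pre-approved, parameterized queries can run, so the model never writes SQL and never states the total itself.

```python
import json
import sqlite3

# Allow-list of calculations the model may request. Parameterized SQL only.
QUERIES = {
    "total_by_region": "SELECT ROUND(SUM(amount), 2) FROM invoices WHERE region = ?",
    "count_by_region": "SELECT COUNT(*) FROM invoices WHERE region = ?",
}


def demo_db() -> sqlite3.Connection:
    """An in-memory table of hypothetical invoices."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE invoices (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO invoices VALUES (?, ?)",
                     [("EMEA", 1200.50), ("EMEA", 349.25), ("APAC", 980.00)])
    return conn


def run_tool_call(conn: sqlite3.Connection, model_output: str):
    """Execute the calculation the model chose; the answer comes from SQL, not the model."""
    call = json.loads(model_output)
    sql = QUERIES.get(call["query"])
    if sql is None:
        raise ValueError(f"query not allowed: {call['query']}")
    (value,) = conn.execute(sql, (call["region"],)).fetchone()
    return value


if __name__ == "__main__":
    conn = demo_db()
    # Stand-in for a model response: it picks the calculation, not the number.
    model_output = '{"query": "total_by_region", "region": "EMEA"}'
    print(run_tool_call(conn, model_output))
```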

Open standard
MCP — the Model Context Protocol — is now the common wire format for connecting models to tools.11
Nov 2024
Anthropic introduced MCP. Think USB-C: one port, every device.11
Linux Fdn
Anthropic donated MCP to the Agentic AI Foundation, a new Linux Foundation project, in December 2025, with Anthropic, Block, and OpenAI as founding members, so the standard stays vendor-neutral.12
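On the wire, MCP messages are JSON-RPC 2.0. When a client asks a server to run a tool, it sends a `tools/call` request naming the tool and its arguments, and the server replies with a result holding content blocks. This sketch only builds and reads those message shapes; the tool name `lookup_order`, its arguments, and the reply text are hypothetical, and a real client would also perform MCP's initialization handshake over a transport such as stdio or HTTP.

```python
import itertools
import json

_ids = itertools.count(1)  # JSON-RPC request ids must be unique within a session


def tools_call_request(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


def text_from_result(response_json: str) -> str:
    """Join the text blocks of a tools/call result; raise on protocol or tool errors."""
    response = json.loads(response_json)
    if "error" in response:  # JSON-RPC level (protocol) error
        raise RuntimeError(response["error"].get("message", "unknown error"))
    result = response["result"]
    text = "".join(b["text"] for b in result["content"] if b.get("type") == "text")
    if result.get("isError"):  # the tool itself reported a failure
        raise RuntimeError(f"tool reported an error: {text}")
    return text


if __name__ == "__main__":
    print(tools_call_request("lookup_order", {"order_id": "A-1001"}))
    reply = ('{"jsonrpc": "2.0", "id": 1, "result": '
             '{"content": [{"type": "text", "text": "Shipped"}], "isError": false}}')
    print(text_from_result(reply))
```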
The more an AI system can do, the more it must be governed.

IV Chapter Four · The Containment Vessel

Architect once. Scale securely.

AI is not deterministic. The old security playbook was. Build the containment vessel before you start the reaction.

Remember the spine down the side of the master map — governance and observability, running the full height. This chapter is that spine, drawn close. A normal program fails loudly; it crashes, and you know. An AI system fails politely: fluent, confident, and wrong. You cannot trust what you cannot see. So a holistic framework cannot rely on blocking. It must rely on constrained orchestration — six pillars that hold the reaction without smothering it.

Six pillars. One vessel. At the center sits the reasoning core, non-deterministic, held by: Data Sovereignty (tier the data first) · Compute Integrity (verify the silicon) · The Agentic Gap (least privilege) · Output Lineage (trace every claim) · Defensive ROI (value > the tax) · Truth Discipline (ground every output).
Fig. 8 — The containment vessel. The reasoning core is powerful and non-deterministic. Six pillars hold it: govern the data, verify the compute, close the agentic gap, trace every output, defend the ROI, and ground every claim in a source.

Three tiers. Three deployment paths.

Classify before you compute. The data tier defines the model — not the other way around.

Tier 01 · Critical IP

Private Sovereign

On-premises or VPC-isolated. Open-weight models under your control. No data leaves the perimeter.

Examples

Source code · M&A memos · patient records · board materials · customer PII

Tier 02 · General Ops

Enterprise Managed

Vendor APIs with binding contracts. Zero retention. SOC 2 attestation plus ISO/IEC 42001 certification.

Examples

Meeting notes · product specs · marketing drafts · project plans · HR documents

Tier 03 · Public

Open Access

Public data, public models. Standard managed APIs. Lightweight governance.

Examples

Press releases · SEC filings · public docs · blog posts · marketing assets
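"Classify before you compute" can be enforced as a routing rule: the data's tier picks the deployment path, and a request is refused if it asks for a less protected one. A minimal sketch; the tiers follow the three above, while the path identifiers and the rule that a more protected path is always acceptable are assumptions for illustration.

```python
from enum import IntEnum
from typing import Optional


class Tier(IntEnum):
    """Lower number = more sensitive data (Tier 01, 02, 03 above)."""
    CRITICAL_IP = 1
    GENERAL_OPS = 2
    PUBLIC = 3


# Deployment paths ranked by protection (1 = most protected). Names are hypothetical.
PATH_PROTECTION = {"private_sovereign": 1, "enterprise_managed": 2, "open_access": 3}
DEFAULT_PATH = {
    Tier.CRITICAL_IP: "private_sovereign",
    Tier.GENERAL_OPS: "enterprise_managed",
    Tier.PUBLIC: "open_access",
}


def choose_path(tier: Tier, requested: Optional[str] = None) -> str:
    """Return the deployment path for data of this tier.

    A requested path is allowed only if it is at least as protected as the tier requires.
    """
    if requested is None:
        return DEFAULT_PATH[tier]
    if PATH_PROTECTION[requested] > int(tier):
        raise PermissionError(f"{tier.name} data may not use {requested}")
    return requested


if __name__ == "__main__":
    print(choose_path(Tier.GENERAL_OPS))
    print(choose_path(Tier.PUBLIC, "private_sovereign"))  # over-protecting is allowed
```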

The agentic risk matrix

Privilege times blast radius. The amber and red cells need a human-in-the-loop gate before any write reaches a system of record.

Privilege × blast radius. Rows are privilege; columns are scope (single user · team · department · enterprise).
Read only: Low · Low · Low–Med · Low–Med
Read + suggest: Low–Med · Low–Med · Medium · Medium
Write w/ HITL: Low–Med · Medium · High · High
Autonomous write: Medium · High · Critical · Critical
Key: Low, Medium, High, Critical (Critical: HITL gate required).
Fig. 9 — The agentic risk matrix. Risk is the product of what an agent may touch and how widely. Autonomous writes at department or enterprise scope are the critical cells — they demand a human gate, a tight scope, and a full audit trail.13
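The matrix is small enough to encode directly, which lets a policy check run before an agent's requested action does. The cell values are transcribed from Fig. 9; the identifiers and the adjustable gate threshold (set to High by default, as an assumption) are illustrative, since the figure marks Critical as requiring a gate and the text asks for one on the amber and red cells.

```python
PRIVILEGES = ["read_only", "read_suggest", "write_hitl", "autonomous_write"]
SCOPES = ["single_user", "team", "department", "enterprise"]
LEVELS = ["Low", "Low-Med", "Medium", "High", "Critical"]  # ascending risk

# Rows follow PRIVILEGES, columns follow SCOPES (transcribed from Fig. 9).
MATRIX = [
    ["Low",     "Low",     "Low-Med",  "Low-Med"],
    ["Low-Med", "Low-Med", "Medium",   "Medium"],
    ["Low-Med", "Medium",  "High",     "High"],
    ["Medium",  "High",    "Critical", "Critical"],
]


def risk(privilege: str, scope: str) -> str:
    """Look up the risk cell for an agent's privilege and scope."""
    return MATRIX[PRIVILEGES.index(privilege)][SCOPES.index(scope)]


def requires_gate(privilege: str, scope: str, threshold: str = "High") -> bool:
    """True if the cell is at or above the threshold that demands a human gate."""
    return LEVELS.index(risk(privilege, scope)) >= LEVELS.index(threshold)


if __name__ == "__main__":
    for scope in SCOPES:
        print(scope, risk("autonomous_write", scope), requires_gate("autonomous_write", scope))
```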

The hallucination tax is measurable.

Most enterprise AI projects earn high-fives, not returns. If the GPU bill plus human review exceeds the manual cost, the project failed. IBM put hard numbers on the tax in its 2025 breach report.14

+$670K
Added to the average breach cost when unsanctioned "shadow AI" is involved — now a top-three cost factor.14
97%
Of organizations breached through an AI incident lacked proper AI access controls.14
~5%
Of enterprises capture substantial value from generative AI at scale. Most see no material return yet.15
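The pass-or-fail test in the paragraph above is arithmetic: compare the manual cost of the work with the AI cost, counting the compute bill and the human time spent reviewing outputs and redoing the ones that come back wrong. Every figure in this sketch is a hypothetical placeholder, and folding rework into the review tax is an assumption of the model.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Monthly figures for one task (all hypothetical placeholders)."""
    items: int                      # e.g. documents processed per month
    manual_minutes_per_item: float
    hourly_labor_cost: float
    ai_cost_per_item: float         # model, GPU, and platform spend
    review_minutes_per_item: float  # human checking of every AI output
    rework_rate: float              # share of items a human must redo by hand


def monthly_costs(w: Workload) -> tuple[float, float]:
    """Return (manual cost, AI cost including the review-and-rework tax)."""
    per_minute = w.hourly_labor_cost / 60
    manual = w.items * w.manual_minutes_per_item * per_minute
    review = w.items * w.review_minutes_per_item * per_minute
    rework = w.items * w.rework_rate * w.manual_minutes_per_item * per_minute
    ai = w.items * w.ai_cost_per_item + review + rework
    return manual, ai


def verdict(w: Workload) -> str:
    manual, ai = monthly_costs(w)
    return "keep" if ai < manual else "failed: the tax exceeds the manual cost"


if __name__ == "__main__":
    light_review = Workload(10_000, 20, 60.0, 0.50, 4, 0.10)
    heavy_review = Workload(10_000, 20, 60.0, 0.50, 15, 0.30)
    for w in (light_review, heavy_review):
        print(monthly_costs(w), verdict(w))
```

Note that the compute line is the smallest term in both cases; the review and rework minutes decide the verdict.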

The frameworks that already apply

These are not future regulations. As of mid-2026 they are live, phasing in, or named in enforcement.

ISO/IEC 42001

LIVE · 2023+

The first AI management-system standard. Becoming table stakes for AI vendors that handle enterprise data.16

NIST AI RMF

LIVE · +GenAI Profile

Govern, Map, Measure, Manage. The Generative AI Profile (2024) is the GenAI-specific companion. Referenced across U.S. agencies.17

EU AI Act

PHASING · high-risk → Dec 2027

General-purpose obligations live since 2025. High-risk (Annex III) rules were set for 2 August 2026, but the Digital Omnibus — provisionally agreed 7 May 2026 — defers them to 2 December 2027, pending formal adoption. Fines reach €15M or 3% of turnover for high-risk breaches, and €35M or 7% for prohibited practices.18

OWASP LLM Top 10 + MITRE ATLAS

LIVE · 2025

Prompt injection sits at #1. ATLAS gives a shared vocabulary for AI threat modeling.13

So what: the vessel is not optional and not future. Build it once, well, and every workload after it ships faster and safer.

The close

The model predicts. The system acts.

You have walked all four chapters. Origins showed why one architecture won. The Stack drew the whole map — silicon to agents, governance the full height. The Patterns showed how a model becomes an agent. The Vessel showed how to hold it safely. The enterprise wins when both the model and the system are governed.

What good looks like
  ✓ You rent the model's reasoning. Your data stays yours.
  ✓ Five layers and a spine: the governed system around it.
  ✓ Agents read freely, write with approval, and log everything.
  ✓ Every claim maps to a source. Every action to an owner.

The hard part was never naming the layers. It is the judgment inside each choice — what to rent, what to own, what to retrieve, what to govern, what to automate, and what to leave alone. That judgment is what BlueAlly brings to the table.

Three ways to begin the work

Each ships in weeks, not quarters. Each starts with a structured assessment and ends with deliverables you can act on.

Keep reading the series: ← How the Machine Reads traces one sentence through the stack · The Agent Harness → opens the loop that makes an agent work.

Sources

Where this comes from

Every factual claim above is drawn from a primary source — papers, model cards, standards bodies, vendor documentation, and the official reports named below. Figures in the diagrams are illustrative where noted; the mechanisms and the cited statistics are not. Product names are current as of mid-2026 — naming a product is description, not endorsement.

  1. Turing, A. M., "Computing Machinery and Intelligence," Mind, 1950. academic.oup.com/mind/LIX/236/433
  2. McCarthy, Minsky, Rochester & Shannon, "A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence," 1955. dartmouth.edu/ai-coined-at-dartmouth
  3. Russell & Norvig, Artificial Intelligence: A Modern Approach (4th ed.), Chapter 1, 2020. aima.cs.berkeley.edu
  4. Krizhevsky, Sutskever & Hinton, "ImageNet Classification with Deep Convolutional Neural Networks" (AlexNet), NeurIPS 2012. papers.nips.cc/paper/4824
  5. Vaswani et al., "Attention Is All You Need," arXiv:1706.03762, 2017. arxiv.org/abs/1706.03762
  6. Kaplan et al., "Scaling Laws for Neural Language Models," arXiv:2001.08361, 2020; Hoffmann et al., "Training Compute-Optimal Large Language Models" (Chinchilla), arXiv:2203.15556, 2022. arxiv.org/abs/2001.08361
  7. OpenAI / TechCrunch, ChatGPT weekly active users surpass 900M (27 Feb 2026), up from 800M (Oct 2025); >1B monthly by mid-2026. techcrunch.com/chatgpt-900m-weekly-active-users
  8. Stanford HAI, "2025 AI Index Report," Ch. 1 (GPT-4 training ~$78M; ~280× inference price drop at GPT-3.5 level). hai.stanford.edu/ai-index/2025
  9. OpenAI, "Learning to reason with LLMs" (o1 / inference-time compute), 2024. openai.com/index/learning-to-reason-with-llms
  10. Anthropic, "Claude — Models overview" (Claude Opus 4.8; ~1M-token context). platform.claude.com/docs/en/about-claude/models/overview
  11. Anthropic, "Introducing the Model Context Protocol" (Nov 2024). anthropic.com/news/model-context-protocol
  12. Linux Foundation, "Formation of the Agentic AI Foundation" (MCP donated; Anthropic, Block, OpenAI; 9 Dec 2025). linuxfoundation.org/press/agentic-ai-foundation
  13. OWASP, "Top 10 for LLM Applications" (2025) & MITRE, "ATLAS" — privilege and blast-radius framing for the risk matrix. genai.owasp.org/llm-top-10
  14. IBM, "Cost of a Data Breach Report 2025" (shadow AI adds ~$670K; 97% of AI-breached orgs lacked AI access controls). ibm.com/reports/data-breach
  15. BCG & McKinsey, 2025 enterprise-AI value studies (a small share of firms capture value at scale). bcg.com/publications/2025/ai-impact-gap
  16. ISO/IEC, "42001:2023 — Artificial intelligence management system." iso.org/standard/42001
  17. NIST, "AI Risk Management Framework" (1.0, 2023) & "Generative AI Profile" (NIST-AI-600-1, 2024). nist.gov/itl/ai-risk-management-framework
  18. European Union, "Artificial Intelligence Act," Art. 99 (penalties €15M/3% high-risk, €35M/7% prohibited). High-risk (Annex III) timeline deferred from 2 Aug 2026 to 2 Dec 2027 under the Digital Omnibus (provisional agreement, 7 May 2026). artificialintelligenceact.eu/article/99 · Gibson Dunn, Omnibus analysis
  19. Anthropic, "Building Effective Agents" (pattern catalogue: augmented LLM, chaining, routing, tool use, reflection, multi-agent). anthropic.com/research/building-effective-agents
  20. NVIDIA Newsroom, "NVIDIA Kicks Off the Next Generation of AI With Rubin" (Vera Rubin in production; ships H2 2026, cloud instances to follow) & "Blackwell Platform Arrives." nvidianews.nvidia.com/news/rubin-platform-ai-supercomputer
  21. Google Cloud, "Ironwood: the first Google TPU for the age of inference" (TPU v7; GA at Cloud Next 2026). blog.google/.../ironwood-tpu-age-of-inference
  22. Amazon Web Services, "AWS Trainium" — custom AI training and inference silicon (Trainium3, 3nm, Dec 2025). aws.amazon.com/ai/machine-learning/trainium
  23. CoreWeave, 2026 results & customer agreements (Meta, Anthropic; >1 GW active power). coreweave.com/news
  24. OpenAI, "New embedding models and API updates" (text-embedding-3-large, 3,072 dims). openai.com/index/new-embedding-models-and-api-updates
  25. Artificial Analysis & OpenAI / Google DeepMind, model documentation and Intelligence Index for GPT-5.5 and Gemini 3 (mid-2026). platform.openai.com/docs/models
  26. Anthropic, "Claude Fable 5 and Claude Mythos 5" (released June 2026). anthropic.com/news/claude-fable-5-mythos-5
  27. Meta AI, "The Llama 4 herd" (Scout 10M-token context, open weights). ai.meta.com/blog/llama-4-multimodal-intelligence
  28. vLLM Project, documentation (PagedAttention serving) & SGLang (RadixAttention). docs.vllm.ai
  29. LangChain, "The best AI agent frameworks" (LangGraph, CrewAI, agent SDKs). langchain.com/resources/ai-agent-frameworks