The AI Strategic Assessment — A BlueAlly Field Guide

01 The hook · The GenAI divide

Almost everyone is doing AI. Almost no one is winning at it.

In 2025, MIT studied 300 enterprise AI deployments. The finding was blunt: about 95% of generative-AI pilots delivered no measurable return. Only one in twenty crossed into real profit and loss.¹ The money was there — roughly $30–40 billion spent. The discipline was not.

This is not a model problem. The models are good and getting cheaper.⁴ It is a selection problem. Teams pick the wrong opportunities, build the demo, and stall at the gap between a clever prototype and a system the business can trust. Gartner expects at least 30% of generative-AI projects to be abandoned after the proof of concept by the end of 2025, and more than 40% of agentic-AI projects to be canceled by the end of 2027, for cost, weak value, and thin controls.²³

In plain English

Five words to carry the whole guide

Assessment: A short, structured study that turns "we should do AI" into a ranked list of opportunities with numbers behind each one.
Use case: One specific job AI could do — answer a claim, draft a quote, flag a risk. The unit you score and fund.
Feasibility: How hard it is to build and run: data, integration, change, and risk. The honest counterweight to value.
ROI: Return on investment. What the work earns or saves, set against what it costs — across a few honest scenarios, not one hopeful one.
Roadmap: The order you do the work in, grouped into waves — Now, Next, Later — so each wave funds and de-risks the next.

Fig. 1 — The funnel that swallows most AI budgets. Adoption is near-universal; impact is not — BCG finds roughly half of firms still stuck in proofs of concept and only ~4% running true value engines.⁷ The job of an assessment is to send only the right ideas down the funnel — and carry them to profit and loss. Figures from MIT, Gartner, Stanford HAI, and BCG.¹²⁵

95%

of enterprise gen-AI pilots showed no measurable P&L return in MIT's 2025 study of 300 deployments.¹

6%

of organizations are AI "high performers," those tracing 5%+ of EBIT to AI, per McKinsey's State of AI 2025.⁶

$252B

total corporate AI investment in 2024 (Stanford HAI). The spend is real. The discipline to aim it is the edge.⁵

The bottleneck was never the model. It was choosing what to build.

02 The method · Four moves

Four moves, six weeks, one decision.

An assessment answers a leader's three questions in order: Where do we start? What will it cost? How will we know it worked? We answer them with four moves. Each one narrows the field and sharpens the case, so what reaches the board is short, scored, and sequenced.

Fig. 2 — The assessment as a pipeline. Discover widens the field; Score, Model, and Sequence narrow it. By week six the output is not a slide deck of possibilities — it is a decision.

03 Move one · Discover

Discover: find the real work, not the shiny work.

The best opportunities are rarely the ones leaders name first. They hide in the seams — the report that takes three days, the inbox no one can keep up with, the quote that waits on a specialist. We find them two ways, and we use both.

We listen. Short interviews with the people who do the work and the people who feel its cost. And we measure. Where the data exists, process mining reads the system logs and shows where time and money actually go — not where anyone guesses they go. Then we cross-check the list against a library of patterns that have already paid off in your industry. The output of this move is deliberately long: a wide inventory of candidate use cases, each written the same way.

Workshops & interviews.

One room, the right people, two hours. We map a process end to end and mark every place a human waits, copies, or re-keys. Friction you can see is friction you can fix.

Process mining.

Where logs exist, software reconstructs how work really flows and times every step. It turns "this feels slow" into "this costs 9,000 hours a year." Evidence, not anecdote.
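To make that conversion concrete, here is a minimal sketch that totals the elapsed time between consecutive steps in a tiny, made-up event log. The log layout (case id, step, timestamp), the sample rows, and the function name are all assumptions for illustration. Real process-mining tools read far larger system logs, and elapsed calendar time is not the same thing as labor hours.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical event log: (case_id, step, timestamp). Invented rows, not client data.
EVENT_LOG = [
    ("Q-1", "request received", "2025-01-06 09:00"),
    ("Q-1", "specialist review", "2025-01-08 09:00"),
    ("Q-1", "quote sent", "2025-01-08 13:00"),
    ("Q-2", "request received", "2025-01-07 10:00"),
    ("Q-2", "specialist review", "2025-01-08 10:00"),
    ("Q-2", "quote sent", "2025-01-08 12:00"),
]


def hours_per_transition(log):
    """Sum elapsed hours between consecutive steps, keyed by (from_step, to_step)."""
    by_case = defaultdict(list)
    for case_id, step, ts in log:
        by_case[case_id].append((datetime.strptime(ts, "%Y-%m-%d %H:%M"), step))

    totals = defaultdict(float)
    for events in by_case.values():
        events.sort()  # order each case's events by timestamp
        for (t0, s0), (t1, s1) in zip(events, events[1:]):
            totals[(s0, s1)] += (t1 - t0).total_seconds() / 3600
    return dict(totals)


totals = hours_per_transition(EVENT_LOG)
slowest = max(totals, key=totals.get)
print(slowest, totals[slowest])
```

In this toy log the wait for specialist review dominates, which is the kind of seam the interviews alone might miss.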

The pattern library.

Most industries share the same high-value jobs — document extraction, triage, knowledge search, drafting. We check your list against what already works, so you skip the dead ends.

Every candidate is captured to one template: the job, who owns it, the data it needs, the systems it touches, and a first guess at value and effort. Same shape for all of them. That sameness is what makes the next move — scoring — fair.
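One way to hold that template is a small record type, sketched below. The field names and the example values are assumptions chosen to mirror the five items listed above, not a BlueAlly schema.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class UseCase:
    """One candidate, captured in the same shape as every other candidate."""
    job: str                    # the specific job AI would do
    owner: str                  # who owns the work today
    data_needed: List[str]      # data the use case depends on
    systems_touched: List[str]  # systems it reads from or writes to
    value_guess: str            # first rough guess, refined in scoring
    effort_guess: str           # first rough guess, refined in scoring


# Hypothetical example entry.
example = UseCase(
    job="Draft a quote from a customer request",
    owner="Sales operations",
    data_needed=["price book", "past quotes"],
    systems_touched=["CRM"],
    value_guess="medium",
    effort_guess="low",
)

template_columns = [f.name for f in fields(UseCase)]
print(template_columns)
```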

04 Move two · Score

Value on one axis, readiness on the other.

Now we choose. Every candidate gets two honest numbers: how much it is worth, and how ready you are to run it. Plot them, and the portfolio sorts itself into four quadrants. The top-right corner — high value, high readiness — is where you deploy now. The top-left holds the big prizes that need real work first.

Fig. 3 — The signature view: every opportunity, on one chart. Champions (top-right) are high value and ready — fund them first. Strategic (top-left) are big prizes with a readiness gap — sponsor a sprint. Quick Wins (bottom-right) are modest but fast — build the muscle. Foundation (bottom-left) wait until capability arrives. The matrix turns a debate into a decision.
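The quadrant logic can be sketched in a few lines. The readiness line of 6.0 is the threshold this guide uses; the value line is an assumption, since the guide does not say where the value axis splits, and treating a score of exactly six as "ready" is also a choice made here for illustration.

```python
READINESS_LINE = 6.0  # from the guide: six separates piloting from producing
VALUE_LINE = 6.0      # assumption: the guide does not state a value cut-off


def quadrant(value, readiness, value_line=VALUE_LINE, readiness_line=READINESS_LINE):
    """Place a use case in one of the four quadrants of the matrix."""
    high_value = value >= value_line
    ready = readiness >= readiness_line
    if high_value and ready:
        return "Champion"
    if high_value:
        return "Strategic"
    if ready:
        return "Quick Win"
    return "Foundation"


print(quadrant(value=8.0, readiness=7.0))
print(quadrant(value=8.0, readiness=4.0))
```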

What sets it apart

Priority = Value × Readiness × Confidence − Risk Drag

Most matrices score a guess. We score a return. Value is Expected Value divided by friction cost — the payoff measured against the friction it removes. Multiply by how ready you are and how sure we are, then subtract the drag of risk. What rises to the top has earned its place.
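As a hedged illustration, the formula can be written out directly. The guide does not fix the units, so this sketch assumes readiness on the 1–10 scale, confidence as a fraction between 0 and 1, and risk drag as a plain number in the same units as the product. The function name and the sample inputs are hypothetical.

```python
def priority(expected_value, friction_cost, readiness, confidence, risk_drag):
    """Priority = Value x Readiness x Confidence - Risk Drag.

    Value is expected value divided by friction cost, as described above.
    Units and scales are assumptions; see the surrounding text.
    """
    if friction_cost <= 0:
        raise ValueError("friction_cost must be positive")
    if not 0 <= confidence <= 1:
        raise ValueError("confidence is assumed to be a fraction between 0 and 1")
    value = expected_value / friction_cost
    return value * readiness * confidence - risk_drag


# Made-up inputs: value ratio 3.0, readiness 7.0, confidence 0.8, risk drag 2.0.
score = priority(
    expected_value=600_000,
    friction_cost=200_000,
    readiness=7.0,
    confidence=0.8,
    risk_drag=2.0,
)
print(round(score, 2))
```

With these inputs the score comes out near 14.8. The number only means something relative to other candidates scored the same way.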

Readiness, weighted

Organization 35%

Data 30%

Governance 20%

Technical 15%

Scored 1–10 on a behaviorally anchored scale. Six is the line between piloting and producing. Above six, deploy with confidence. Below six, invest in readiness first.

Recommended mix

60% Clear high-impact bets

30% Strategic initiatives

10% Experimental ideas

A funded portfolio is mostly sure things. A little room for the bets keeps it honest about the future.

Champions ship first — always.

High value, high readiness. They reach production in a quarter, prove the model, and earn the budget for everything after. Momentum is a strategy.

Strategic plays are sequenced, not skipped.

The biggest prizes often need clean data or a platform first. We sponsor a 90-day sprint to close the readiness gap — then promote them.

05 The rubric · How we score readiness

Readiness is built from four pillars.

Value is the easy axis to argue about. Readiness is the one that kills pilots. So we score it hard. Four pillars, each rated one to ten against written anchors, then weighted — organization and data carry the most, because that is where pilots actually fail. Two people score the same case the same way, and a sponsor can see why it lands where it does.

Readiness pillar	Weight	Scores low (1–3)	Scores high (8–10)
Organization	35%	No owner, no sponsor, no capacity to adopt	Named owner, executive sponsor, ready to change
Data	30%	Data is missing, messy, or locked away	Clean, accessible, already governed
Governance	20%	No policy, unclear risk, no human-in-the-loop	Clear controls, defined oversight, auditable
Technical	15%	No integration path, brittle systems	APIs in place, mostly off-the-shelf
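The weighting in the table can be worked through as a short sketch. The weights and the 6.0 threshold come from this guide; the function names and the example scores are illustrative assumptions, as is counting a score of exactly 6.0 as ready.

```python
# Pillar weights from the rubric table.
WEIGHTS = {"organization": 0.35, "data": 0.30, "governance": 0.20, "technical": 0.15}
THRESHOLD = 6.0  # the line between piloting and producing


def weighted_readiness(scores):
    """Combine four pillar scores (each 1-10) into one weighted readiness score."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing pillar scores: {sorted(missing)}")
    for pillar in WEIGHTS:
        if not 1 <= scores[pillar] <= 10:
            raise ValueError(f"{pillar} must be scored from 1 to 10")
    return sum(WEIGHTS[p] * scores[p] for p in WEIGHTS)


def is_ready(scores):
    """True when the weighted score reaches the threshold (assumed inclusive)."""
    return weighted_readiness(scores) >= THRESHOLD


# Hypothetical use case: strong sponsor, middling data.
# 0.35*8 + 0.30*5 + 0.20*6 + 0.15*7 = 2.8 + 1.5 + 1.2 + 1.05 = 6.55
strong_sponsor = {"organization": 8, "data": 5, "governance": 6, "technical": 7}

# Hypothetical use case: good technology, no owner.
# 0.35*3 + 0.30*4 + 0.20*8 + 0.15*9 = 1.05 + 1.2 + 1.6 + 1.35 = 5.2
no_owner = {"organization": 3, "data": 4, "governance": 8, "technical": 9}

print(round(weighted_readiness(strong_sponsor), 2), is_ready(strong_sponsor))
print(round(weighted_readiness(no_owner), 2), is_ready(no_owner))
```

The second case shows the effect of the weights: strong technical and governance scores do not rescue a use case that has no owner and weak data.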

In plain English

Six is the line between piloting and producing

A weighted score is a fair way to add up things that matter differently. We weight organization and data most because that is where pilots fail — not on the model. The weighted readiness lands on a 1–10 scale, and 6.0 is the threshold. Above six, deploy with confidence — the use case is a Champion or a Quick Win. Below six, the value may be real but the readiness is not, so we invest in readiness first rather than ship into a wall. Value, meanwhile, is measured across four lenses — revenue, cost, cash flow, and risk — covered next.

Score in the open, or the score means nothing.

06 Move three · Model the ROI

Model: the return, told honestly.

A ranked list earns attention. A credible number earns a budget. For each finalist we build the financial case the way a CFO would — conservative by default, with the assumptions on the table. Value rarely arrives on day one, so we never model it that way.

Two honesties matter most. First, benefit ramps: adoption climbs over quarters, and value lags behind it. Second, a range, not a point: we model three scenarios — conservative, expected, and upside — and we put the conservative case on the headline slide. McKinsey's data is clear that programs aiming only at cost capture a fraction of the prize; the leaders pursue growth as well.⁶ So we count value across four drivers, not one.

Fig. 4 — The ROI told two ways. Left: benefit trails adoption, so first-year value is a slice, not the whole. Right: cumulative value crosses the investment line at payback — shown for a conservative and an expected case. Numbers are illustrative; the method is not.

Value, measured four ways

Value is not a guess. Each lens has a formula, and we fill it with your numbers — so the value axis of the matrix is a return, not an opinion.

Revenue

New revenue, higher win rates, better cross-sell — the upside McKinsey ties to the leaders, not the laggards.⁶ Conversion lift × deal value × volume.

Cost

Hours returned, errors avoided, manual steps removed. The easiest to measure and the easiest to defend — the bedrock of most Quick Wins. Hours saved × loaded rate × frequency.

Cash flow

Faster collections, leaner inventory, shorter cycles. Value that never shows in a P&L line but that a CFO feels every quarter. DSO reduction (days ÷ 365) × annual revenue × cost of capital.

Risk

Fewer compliance misses, earlier safety and quality signals, reputation protected. Hard to price, expensive to ignore. Probability × severity × frequency avoided.
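The four lens formulas above can be sketched as plain functions. The parameter names and the sample figures are illustrative assumptions. The cash-flow function reads "DSO reduction" as days divided by 365, which is one common interpretation and is not spelled out in the guide.

```python
def revenue_value(conversion_lift, deal_value, volume):
    """Conversion lift x deal value x volume."""
    return conversion_lift * deal_value * volume


def cost_value(hours_saved, loaded_rate, frequency):
    """Hours saved per occurrence x loaded hourly rate x occurrences per year."""
    return hours_saved * loaded_rate * frequency


def cash_flow_value(dso_reduction_days, annual_revenue, cost_of_capital):
    """Cash released by faster collection, priced at the cost of capital.

    Assumption: DSO reduction is converted to a fraction of a year.
    """
    return dso_reduction_days / 365 * annual_revenue * cost_of_capital


def risk_value(probability, severity, frequency_avoided):
    """Probability x severity x frequency avoided."""
    return probability * severity * frequency_avoided


# Made-up inputs, for illustration only.
lenses = {
    "revenue": revenue_value(conversion_lift=0.02, deal_value=50_000, volume=400),
    "cost": cost_value(hours_saved=0.5, loaded_rate=60, frequency=20_000),
    "cash_flow": cash_flow_value(dso_reduction_days=5, annual_revenue=73_000_000,
                                 cost_of_capital=0.10),
    "risk": risk_value(probability=0.05, severity=2_000_000, frequency_avoided=1),
}
for name, amount in lenses.items():
    print(f"{name}: {amount:,.0f}")
```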

The assumptions, stated and not hidden:
Implementation timeline: vendor estimate + 25% buffer
Adoption curve: 30% in year 1, 60% in year 2, 85% in year 3
Benefit-realization lag: 3–6 months after go-live
Discount rate: your WACC, or 10% by default
Horizon: 5 years
Reported per case: NPV, IRR, payback
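A minimal sketch of how those assumptions turn into NPV, IRR, and payback follows. The adoption curve for years 1 to 3, the 10% default rate, and the 5-year horizon come from the list above. Holding adoption at 85% in years 4 and 5, treating cash flows as year-end amounts, and leaving out the benefit-realization lag are simplifying assumptions made here, and the investment and benefit figures are invented.

```python
# Years 1-3 from the guide; years 4-5 held at 85% (assumption).
ADOPTION = [0.30, 0.60, 0.85, 0.85, 0.85]
DISCOUNT_RATE = 0.10  # the guide's default when no WACC is supplied


def cash_flows(investment, full_annual_benefit, annual_run_cost=0.0):
    """Year 0 is the investment; each later year is benefit scaled by adoption."""
    flows = [-investment]
    for adoption in ADOPTION:
        flows.append(full_annual_benefit * adoption - annual_run_cost)
    return flows


def npv(rate, flows):
    """Net present value of year-end cash flows, with flows[0] at year 0."""
    return sum(cf / (1 + rate) ** year for year, cf in enumerate(flows))


def payback_year(flows):
    """First year in which cumulative cash flow is no longer negative."""
    cumulative = 0.0
    for year, cf in enumerate(flows):
        cumulative += cf
        if cumulative >= 0:
            return year
    return None  # never pays back within the horizon


def irr(flows, lo=-0.99, hi=10.0, tol=1e-9):
    """Internal rate of return by bisection; None if no sign change in [lo, hi]."""
    if npv(lo, flows) * npv(hi, flows) > 0:
        return None
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if npv(lo, flows) * npv(mid, flows) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


# Invented case: 500,000 invested, 400,000 per year at full adoption.
flows = cash_flows(investment=500_000, full_annual_benefit=400_000)
case_npv = npv(DISCOUNT_RATE, flows)
case_irr = irr(flows)
case_payback = payback_year(flows)
print(f"NPV: {case_npv:,.0f}  IRR: {case_irr:.1%}  payback: year {case_payback}")
```

Running the same functions three times with conservative, expected, and upside inputs gives the range the guide asks for. Payback here is reported in whole years because of the year-end assumption.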

When the headline number is the conservative one, and every assumption is visible, finance stops arguing with the model and starts planning around it. That is the whole goal of this move.

07 Move four · Sequence

Sequence: Now, Next, Later.

A ranked list is not a plan. Some work depends on other work; some shares a platform; some should wait for the technology to settle. The last move turns the scored portfolio into waves — and each wave is chosen so it funds and de-risks the one behind it.

Fig. 5 — Now, Next, Later — with a foundation underneath. Champions and Quick Wins run first and pay early. Strategic plays follow once the data and platform are ready. The dashed track is the unglamorous work that makes all of it possible.

Wave 1

Now

Champions and Quick Wins. High readiness, ready to ship. They reach production in a quarter and earn the program's credibility — and its budget.

Wave 2

Next

Strategic plays whose readiness gap the first wave closed. More integration, more coordination, more upside.

Wave 3

Later

Platform-scale plays — orchestration, prediction. They need the foundation in place and the organization warmed up.

Parked

Hold

Low priority or blocked. Not dead — watched. We revisit as data, technology, or strategy moves.
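The guide does not prescribe an algorithm for sequencing, so the following is only one possible sketch of the dependency part: each use case lands one wave after the latest thing it depends on, and anything deeper than three layers is folded into Later. The function name, the folding rule, and the example portfolio are assumptions. A real sequence would also weigh quadrant, shared platforms, and budget.

```python
WAVE_NAMES = {1: "Now", 2: "Next", 3: "Later"}


def assign_waves(depends_on):
    """Map each use case to a wave, given the use cases it must follow.

    depends_on: dict of use case name -> list of prerequisite use case names.
    """
    waves = {}

    def wave_of(name, seen=()):
        if name in waves:
            return waves[name]
        if name in seen:
            raise ValueError(f"circular dependency involving {name!r}")
        prerequisites = depends_on.get(name, [])
        deepest = max((wave_of(p, seen + (name,)) for p in prerequisites), default=0)
        waves[name] = min(deepest + 1, 3)  # assumption: cap at the third wave
        return waves[name]

    for name in depends_on:
        wave_of(name)
    return {name: WAVE_NAMES[w] for name, w in waves.items()}


# Hypothetical portfolio.
plan = assign_waves({
    "invoice extraction": [],
    "knowledge search": [],
    "quote drafting": ["knowledge search"],
    "demand prediction": ["quote drafting"],
})
print(plan)
```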

08 The output · One page

The whole portfolio, on a single page.

The deliverable an executive keeps is not the funnel or the matrix — it is this grid. Every funded use case, scored across the four value drivers, with its wave and its readiness in plain sight. Dark squares are where the value concentrates. It tells a CEO, in one glance, what the program will pay and where to look.

Fig. 6 — The one-page portfolio. Six funded use cases, four value drivers, three waves. The pattern of dark cells is the strategy made visible: early cost wins, then revenue, then cash-flow plays — each in its wave, each backed by a model.

09 The discipline · Staying honest

What keeps the numbers honest.

An assessment is only as good as its restraint. The failures in the research were not failures of ambition — they were failures of rigor.¹ So every BlueAlly assessment passes the same gates before it reaches a board, and we name the red flags we refuse to ship.

Conservative on the headline.

The number a leader repeats is the conservative one. Upside is shown, never promised. A case that only works in the best scenario is not a case.

Buffers, not vendor math.

Vendor timelines are a floor, so we add 25%. Vendor savings are a ceiling, so we discount them. Optimism is for the pitch, not the plan.

No data, no score.

Every use case carries a data-readiness rating. A brilliant idea with no usable data is a research project — and we say so, plainly.

A range, never a point.

Single-number forecasts are how credibility dies. Three scenarios show the shape of the risk and let a leader choose their comfort.

Baselines before benefits.

We do not claim a 30% gain without measuring today's number first. No baseline, no benefit. KPIs are set before the build, not after.

Validated, not asserted.

Every ranking is reviewed with the sponsors who own the work. A score no stakeholder believes will not survive the first hard quarter.

A demo is judged by its best moment. A program is judged by its worst assumption.

10 The close · A roadmap to fund

Many ideas went in. A roadmap came out.

We discovered the real work. We scored it on value and readiness. We modeled the return the way finance would. We sequenced it into waves that fund themselves. What a leader holds at the end is not a wish list — it is a decision, with numbers behind every line.

What the assessment delivers
✓ A ranked portfolio: every opportunity scored, in the open
✓ Three-scenario ROI per finalist: NPV, IRR, payback
✓ An 18-month roadmap: Now / Next / Later, with a foundation
✓ A board-ready case: conservative headline, assumptions shown

The hard part was never the technology. It was the judgment — what to fund, what to defer, what to leave alone, and how to prove it. That judgment is what BlueAlly brings. Pick the right few, sequence them well, and you spend your budget on the 5% that pays — not the 95% that does not.¹


11 Sources

Where this comes from

The market figures above are drawn from primary research published by the named institutions. The example use cases, scores, dollar figures, and curves are illustrative — chosen to show the method honestly, not to describe any one client.

1. MIT Project NANDA, "The GenAI Divide: State of AI in Business 2025" (≈95% of enterprise gen-AI pilots show no measurable P&L return; analysis of 300 deployments). nanda.media.mit.edu
2. Gartner, "Gartner Predicts 30% of Generative AI Projects Will Be Abandoned After Proof of Concept by End of 2025." gartner.com/en/newsroom
3. Gartner, "Gartner Predicts Over 40% of Agentic AI Projects Will Be Canceled by End of 2027" (Jun 25, 2025). gartner.com/en/newsroom
4. Stanford HAI, "AI Index Report 2025: Economy" (AI inference cost for GPT-3.5-level performance fell sharply, ~280× in ~18 months). hai.stanford.edu/ai-index/2025-ai-index-report/economy
5. Stanford HAI, "AI Index Report 2025" (78% of organizations used AI in ≥1 function in 2024, up from 55%; ~$252.3B total corporate AI investment in 2024). hai.stanford.edu/ai-index/2025-ai-index-report
6. McKinsey & Company, "The State of AI in 2025: Agents, innovation, and transformation" (≈6% of organizations are AI high performers tracing 5%+ of EBIT to AI; growth strategies beat cost-only). mckinsey.com/quantumblack
7. Boston Consulting Group, "AI Adoption in 2025: Build for the Future" (adoption funnel: ~25% doing little, ~49% in proofs of concept, ~22% scaling value, ~4% operating value engines). bcg.com/press

Where AI earns its keep.