Why 80% of AI programs fail — and what the Mechanical Turk teaches us about fixing them.

The Data Society Editorial · Apr 26, 2026 · 8 min read

Engraving of the Mechanical Turk chess automaton facing a human opponent

In 1770, a Hungarian inventor named Wolfgang von Kempelen unveiled a machine that could play chess.

It wore a turban. It sat behind a cabinet filled with gears and levers. It beat Napoleon Bonaparte. It beat Benjamin Franklin. For 84 years it toured the courts of Europe, and later America, and mostly won, appearing to most observers to think autonomously.

It was called the Mechanical Turk. And it was a fraud.

Behind the cabinet, hidden in a compartment too small to stand in, crouched a chess master. The gears were theater. The thinking was human.

We have been building Mechanical Turks ever since.

The demo problem

Walk into any large organization running an AI program today and you will find, somewhere, a demo that works beautifully.

A chatbot that answers questions about internal policy. A model that predicts equipment failure. A dashboard that surfaces anomalies in real time. It works in the room. It impresses the committee. The ExCom nods.

Then you ask how many people are using it six months later.

The answer, more often than not, is uncomfortable.

Gartner has tracked this for years. McKinsey has published its own version. Every major consulting firm has a slide about it. The number varies (70%, 80%, 85%), and the studies do not all measure the same thing, but the direction is consistent: most enterprise AI initiatives do not survive contact with the organization.

They pilot. They stall. They quietly die.

The question worth asking is not why this keeps happening. It is why we keep being surprised when it does.

What the Turk actually teaches us

The Mechanical Turk did not fail because the chess master was bad at chess. The chess master was world-class.

Its limit was architectural. The hidden expert could not scale. The cabinet could not be in two places at once. The machine finally ended in a museum fire in Philadelphia in 1854, but the design had capped it long before that: an illusion of autonomy that depends on one concealed human collapses the moment it is asked to operate at scale.

This is the exact failure mode of enterprise AI programs in 2026.

The demo works because a small group of highly skilled people — data scientists, ML engineers, a motivated product owner — have engineered a controlled environment where everything holds together. The data is clean. The use case is narrow. The edge cases have been handled manually.

Deploy that into a business unit of 3,000 people with inconsistent data, no change management, and a line manager who was never consulted? The chess master runs out of room.

The failure is not technical. It was never technical.

The three patterns we see repeatedly

Across the AI programs we have observed — at engineering firms, financial institutions, logistics companies, defense contractors — the failures cluster around three structural gaps. Not one. Three, almost always together.

Gap one: The data was not ready, and everyone knew it.

This is the most common gap and the most consistently underestimated. Organizations declare their data "good enough" to start an AI program, then spend the first six months discovering what good enough actually means when a model tries to train on it.

The problem is not that the data is bad. The problem is that no one owns it. Data domains exist on paper. "Data domain owner" is a VP-level title attached to people who have seventeen other priorities and no incentive to clean anything.

The AI program inherits the data debt of twenty years of system fragmentation. It cannot fix it. It was not designed to fix it.

Gap two: The business never asked for this.

The second gap is ownership. Somewhere between the CDO's office and the business unit, the use case lost its sponsor.

The data team identified the opportunity. The data team scoped the project. The data team built the model. The business unit was consulted at the requirements stage and shown a demo at the delivery stage. In between, it was heads-down engineering.

Then the model lands in the business unit and there is no one there who wanted it, understands it, or has any reason to change their workflow to accommodate it.

Adoption is not a launch problem. It is a co-ownership problem that starts at day one.

Gap three: There was no definition of done.

The third gap is measurement. Most AI programs do not have a clear, pre-agreed definition of what success looks like at ninety days post-deployment.

Not "the model is in production." Not "the dashboard is live." What business outcome changes, by how much, measured how, compared to what baseline?

Without this, the program cannot die. It becomes a zombie — technically alive, practically useless, consuming budget and goodwill in equal measure, impossible to kill because no one ever agreed on what killing it would look like.

What the programs that worked actually did

We have looked carefully at enterprise AI programs that did not follow this pattern — that moved from pilot to production to scale without the usual attrition.

They were not the ones with the best models. They were not the ones with the biggest data science teams or the most sophisticated infrastructure.

They shared three practices that the failing programs did not.

They started with the process, not the model.

Before any data was touched, before any model was specified, the team sat with the business and mapped the workflow in detail. Where does the decision happen? Who makes it? What information do they need? What do they do with it?

The AI was designed to fit into an existing human decision process — not to replace it, and not to exist alongside it as a separate system that people have to remember to check.

They named a business owner on day one, not day ninety.

Every successful program we observed had a named business owner — not a sponsor, not a stakeholder, an owner — who was accountable for adoption and ROI from the first week. This person attended every sprint review. This person approved the definition of done. This person's performance review was connected, however loosely, to whether the thing worked.

The data team built it. The business owner deployed it.

They killed things fast.

The programs that scaled had a culture of killing fast. If a use case showed no signal at the eight-week mark, it was stopped. The team was redeployed. The decision was documented and shared.

This sounds obvious. It is almost never done. The political cost of killing a program that a VP championed, that the data team spent four months on, that was announced in an all-hands, is enormous. Organizations that can absorb that cost and do it anyway are the ones that eventually have portfolios of things that work.

The uncomfortable implication

None of the three gaps above are data science problems.

They are governance problems. Culture problems. Organizational design problems. The technical work — the modeling, the engineering, the infrastructure — is the part that most organizations have gotten reasonably good at.

The part they have not gotten good at is everything that happens before the model is built and everything that happens after it is deployed.

This is what the Mechanical Turk teaches us, 256 years later.

The machine was impressive. The chess master was excellent. What was missing was a sustainable architecture for the relationship between the two.

That architecture — the governance, the ownership, the measurement, the culture — is not a constraint on AI transformation. It is the transformation.

What to do on Monday

If you are leading an AI program, or sitting on a data team wondering why things are not scaling, three questions worth asking this week:

On your current portfolio: For each use case in production, can you name the business owner — not sponsor, owner — and tell me what metric they are accountable for at ninety days?

On your data foundations: Which of your data domains have an owner who has actually touched the data catalog in the last thirty days? Not been briefed on it. Touched it.

On your culture: When was the last time a use case was stopped — not paused, stopped — because it was not working? How long did it take to make that call?

The answers will tell you more about your AI readiness than any model performance metric.

The Data Society publishes weekly field intelligence for data and AI leaders. No consultants. No hype. Subscribe at thedatasociety.co

Want to benchmark your organization's AI readiness across five dimensions? The ATOROK diagnostic takes 20 minutes and is free. atorok.ai
