How FinOps helps avoid failed AI implementations.

8 May 2026 · 9 min read · Blog Post

So many AI projects are failing...

And it’s generally not because the models are inadequate – it's because of poor financial governance, chronic cost overruns and a lack of operational insight. Organisations that do succeed have financial governance embedded from Day 1, which is where FinOps for AI comes in.

In this article, I look at how to embed FinOps into your organisation’s AI approach, so AI has the foundation to become a sustainable, value-generating capability at scale.

Let’s dive in...

Why AI projects fail: The financial traps no one plans for

The economics of AI infrastructure differ from conventional cloud workloads because token-based pricing introduces microscopic billing units that scale massively in aggregate. A single API call costs almost nothing, but 10 million API calls per month, averaging around 800 input tokens each, at a mid-tier input rate of $3.00 per million tokens become a $24,000 monthly line item before output tokens are even considered. Add an unoptimised context window stuffed with irrelevant data, and that number can multiply several times overnight.
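
The arithmetic behind that line item can be sketched in a few lines. This is a minimal illustration rather than a billing tool: the function name is hypothetical, and the 800 input tokens per call is an assumption, chosen because it is the prompt size that yields $24,000 at 10 million calls and $3.00 per million tokens.

```python
def monthly_input_cost(calls_per_month: int,
                       input_tokens_per_call: int,
                       usd_per_million_tokens: float) -> float:
    """Monthly spend on input tokens only, in USD."""
    total_tokens = calls_per_month * input_tokens_per_call
    return total_tokens / 1_000_000 * usd_per_million_tokens


# Assumption: 800 input tokens per call (the value that gives $24,000).
baseline = monthly_input_cost(10_000_000, 800, 3.00)

# Hypothetical bloated prompt: four times as many input tokens per call.
bloated = monthly_input_cost(10_000_000, 3_200, 3.00)

print(f"baseline: ${baseline:,.0f}/month")
print(f"bloated:  ${bloated:,.0f}/month")
```

Under these assumptions the baseline comes to $24,000 a month, and quadrupling the prompt size takes it to $96,000 with no change in traffic.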

Beyond raw pricing, there are several consistent failure patterns we see across organisations:

  • Underestimated inference costs: Teams budget for the build but not the run. Training costs often receive scrutiny but the ongoing inference bill that dwarfs them does not.
  • Oversized models in production: Teams prototype with a frontier model because it 'just works' and then carry that model choice all the way to production without ever asking whether a smaller, cheaper model would suffice.
  • Looping costs: An AI agent can get stuck in a loop, calling an LLM 20 times to solve one task and effectively burn through a month's budget in an hour.
  • Context window bloat: Engineering teams, wanting the AI to be 'as informed as possible', stuff full documents, long chat histories and broad knowledge base retrievals into every request. Every token is billed, yet most of that content is irrelevant to the request.
  • Shadow AI spend: Individual teams and developers procure AI API access directly, bypassing central governance entirely. By the time Finance notices the spike, the behaviour is already embedded in product architecture.
  • Premature fine-tuning: Organisations spend significant GPU compute budget on training when a well-designed prompt engineering strategy would have delivered the majority of the performance improvement at a fraction of the cost.

What you’re actually paying for

Before you can do any cost optimisation, you need a rigorous understanding of AI billing models.

Token-based pricing

Most leading providers (OpenAI, Anthropic, Google) price on token consumption, split into input tokens (the prompt you send) and output tokens (the model's response). Output tokens are typically 3 to 5 times more expensive than input ones, which makes verbose, unstructured responses a quality and cost problem.

Model tier matters enormously. Anthropic's Claude pricing illustrates the spread well – Claude Haiku sits at a fraction of the cost of Claude Opus for the same task. Using Opus is excessive for workloads where Haiku delivers acceptable quality.
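
To see how the input/output split and the model tier combine, here is a rough sketch. The tier names and every rate in it are hypothetical placeholders, not any provider's price list. The only thing carried over from the text is that output is priced at a multiple of input (4x here).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRate:
    input_usd_per_million: float
    output_usd_per_million: float


# Hypothetical rates for illustration only; substitute your provider's
# current published prices before relying on any result.
RATES = {
    "small": ModelRate(input_usd_per_million=0.50, output_usd_per_million=2.00),
    "frontier": ModelRate(input_usd_per_million=10.00, output_usd_per_million=40.00),
}


def request_cost(rate: ModelRate, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request, billing input and output separately."""
    return (input_tokens * rate.input_usd_per_million
            + output_tokens * rate.output_usd_per_million) / 1_000_000


small = request_cost(RATES["small"], input_tokens=2_000, output_tokens=500)
frontier = request_cost(RATES["frontier"], input_tokens=2_000, output_tokens=500)
print(f"small: ${small:.4f}  frontier: ${frontier:.4f}")
```

With these placeholder rates, the 500 output tokens cost as much as the 2,000 input tokens, and the same request is 20 times more expensive on the frontier tier.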

Subscription and hybrid models

Platforms like Microsoft Copilot often use seat-based (per-user/per-month) or capacity-based pricing. The fixed cost feels predictable, but it hides its own risks – over-commitment to capacity never consumed and (perhaps more dangerously) no cost feedback loop to discourage overuse.

You must run break-even analyses to understand when a subscription becomes cheaper than a usage-based approach – and model the threshold at which overage pricing kicks in.
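
A break-even analysis of this kind can start as simply as the sketch below. All names and figures are hypothetical: a £30 seat, £0.04 per usage-based request, and an allowance of 1,000 included requests with £0.05 overage are assumptions for illustration only.

```python
def break_even_requests(seat_price: float, usage_cost_per_request: float) -> float:
    """Requests per user per month at which a seat costs the same as usage."""
    if usage_cost_per_request <= 0:
        raise ValueError("usage_cost_per_request must be positive")
    return seat_price / usage_cost_per_request


def subscription_cost(requests: int, seat_price: float,
                      included_requests: int, overage_per_request: float) -> float:
    """Seat price plus overage once the included allowance is used up."""
    overage = max(0, requests - included_requests)
    return seat_price + overage * overage_per_request


def cheaper_option(requests: int, seat_price: float, usage_cost_per_request: float,
                   included_requests: int, overage_per_request: float) -> str:
    usage_total = requests * usage_cost_per_request
    seat_total = subscription_cost(requests, seat_price,
                                   included_requests, overage_per_request)
    return "subscription" if seat_total < usage_total else "usage-based"


# Hypothetical figures: £30 seat, £0.04 per request, 1,000 included, £0.05 overage.
threshold = break_even_requests(30.0, 0.04)
light_user = cheaper_option(500, 30.0, 0.04, 1_000, 0.05)
heavy_user = cheaper_option(900, 30.0, 0.04, 1_000, 0.05)
print(threshold, light_user, heavy_user)
```

With these numbers a seat pays for itself at roughly 750 requests per user per month, so a user making 500 requests is cheaper on usage-based pricing and one making 900 is cheaper on a seat.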

Third-party platforms

Many organisations access foundation models via cloud intermediaries like AWS Bedrock, Google Vertex AI or Azure AI Foundry rather than directly through a provider's API. This has FinOps implications because the spend lands on your cloud invoice rather than a discrete AI bill – making it easy for AI costs to disappear into general cloud spend and bypass the governance frameworks built specifically for AI.

A few things to watch for:

  • Marketplace pricing isn't always identical to direct API rates
  • Cost attribution by team or feature becomes harder when AI spend flows through a cloud billing layer
  • FinOps dashboards pulling from a provider's native usage API won't capture consumption routed through a third party

A key principle must be: wherever the invoice arrives, the governance framework needs to follow.

Prompt engineering is cost engineering

If there’s a single optimisation lever that delivers the highest return for the least effort in FinOps for AI, it’s prompt engineering. It’s tempting to treat this as a purely technical concern, but it isn’t – the way a prompt is written directly determines the cost of every call it makes, for as long as it’s in production.

Long-winded, poorly structured prompts consume unnecessary input tokens. Ambiguous prompts generate longer, less-focused output, compounding cost on the output side. And when a poor response forces a retry, each re-run is effectively a 100% cost penalty on that interaction.

Prompt caching

Most leading providers now offer prompt caching – storing and reusing previously processed static content (system instructions and document context) rather than reprocessing them with every request. Think of it as browser caching for AI where the expensive part is processed once and subsequent calls retrieve it at a fraction of the cost.

Anthropic's caching pricing is illustrative. Cache reads are charged at just 10% of standard input token rates. For a RAG-based system sending 10,000 tokens of context with every one of 10,000 daily requests, caching can reduce the input cost of that context by up to 90%. At scale, this is a structural cost transformation.

One thing to consider as part of this optimisation strategy is managing Cache TTL (Time-To-Live) and ensuring high hit rates across distributed teams.
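
The effect of hit rate on that saving can be estimated with a short sketch. It is a simplification under stated assumptions: the $3.00 per million input rate is borrowed from the earlier example, the function and parameter names are hypothetical, and any surcharge a provider applies when writing to the cache is ignored.

```python
def daily_context_cost(requests_per_day: int,
                       context_tokens: int,
                       usd_per_million_input: float,
                       cache_hit_rate: float = 0.0,
                       cache_read_multiplier: float = 0.10) -> float:
    """Daily USD cost of the repeated context, given a cache hit rate.

    Cache hits are billed at cache_read_multiplier times the normal input
    rate; misses are billed in full. Cache-write surcharges are ignored.
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    full_price = requests_per_day * context_tokens / 1_000_000 * usd_per_million_input
    blended = (1.0 - cache_hit_rate) + cache_hit_rate * cache_read_multiplier
    return full_price * blended


# 10,000 requests a day, each carrying 10,000 tokens of context.
uncached = daily_context_cost(10_000, 10_000, 3.00)
perfect = daily_context_cost(10_000, 10_000, 3.00, cache_hit_rate=1.0)
realistic = daily_context_cost(10_000, 10_000, 3.00, cache_hit_rate=0.8)
print(f"{uncached:.2f} {perfect:.2f} {realistic:.2f}")
```

Under these assumptions the context costs about $300 a day uncached and $30 with a perfect hit rate. At an 80% hit rate it is about $84, a 72% saving rather than 90%, which is why TTL and hit rate deserve attention.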

Model selection: Don't let it be a decision made by default

Model selection is an impactful cost decision but is almost always made by default rather than design. FinOps should be part of the conversation about model selection and routing thresholds. The right question isn't 'Which model performs best?' but 'Which model performs well enough for this specific task, at the lowest cost?'

The most cost-mature AI organisations implement a Mixture of Models architecture, routing query types to the model tier appropriate to their complexity:

  • A simple intent classification call goes to a small, cheap model.
  • A nuanced multi-step reasoning task escalates to a frontier model.

This typically captures 90-95% of the performance of an all-frontier approach, at 20-30% of the cost.
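
As a rough sketch of the routing idea, the example below sends queries to a tier using a deliberately naive rule. The keyword list, word limit and tier names are all hypothetical. A production router would more likely use a trained classifier or the small model itself to judge complexity.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    tier: str
    reason: str


# Hypothetical signals that a query needs multi-step reasoning.
REASONING_WORDS = {"why", "compare", "plan", "analyse", "analyze", "evaluate"}


def route_query(query: str, max_simple_words: int = 30) -> Route:
    """Pick a model tier from crude features of the query text."""
    words = re.findall(r"[a-z']+", query.lower())
    if REASONING_WORDS.intersection(words):
        return Route("frontier", "reasoning keyword found")
    if len(words) > max_simple_words:
        return Route("frontier", "long query")
    return Route("small", "short query with no reasoning keywords")


simple = route_query("Cancel my subscription")
complex_task = route_query("Compare these two contracts and plan a migration")
print(simple.tier, complex_task.tier)
```

The value of making the rule explicit is that the routing threshold becomes something FinOps and engineering can review together, rather than a default buried in code.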

Implementing key engineering controls

AI budgets often break when moving from test environment to production usage – $200/month in controlled testing becomes $20,000/month when real users arrive with real query volumes, longer inputs and edge cases nobody planned for.

You therefore need key cost controls at the engineering layer:

  • Token ceilings at application or API level: Hard limits that prevent any single request from exceeding a defined token budget.
  • Response length constraints: Instructing the model to be concise since output tokens cost 3-5 times more than input tokens.
  • Rate limiting and concurrency caps: Preventing burst usage patterns from creating unexpected monthly cost spikes.
  • Guardrails on context window growth: Especially critical in agentic AI systems where multi-turn memory accumulates with every turn and is billed again on each subsequent call.
  • Cache layers for repeated prompts: Both in-application caching and provider-level prompt caching, applied to every high-volume feature.
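
Two of the controls above, a per-request token ceiling and a cap on calls per task, can be sketched as a small guard that runs before each model call. The class, its limits and its method names are hypothetical, and the guard assumes token counts are estimated by the caller. It is an illustration, not a library API.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a request would break a configured limit."""


class TaskBudget:
    def __init__(self, max_tokens_per_request: int, max_calls_per_task: int):
        self.max_tokens_per_request = max_tokens_per_request
        self.max_calls_per_task = max_calls_per_task
        self.calls_made = 0

    def authorise(self, prompt_tokens: int, max_output_tokens: int) -> None:
        """Call before every model request; raises instead of overspending."""
        requested = prompt_tokens + max_output_tokens
        if requested > self.max_tokens_per_request:
            raise BudgetExceeded(
                f"request of {requested} tokens exceeds ceiling of "
                f"{self.max_tokens_per_request}")
        if self.calls_made >= self.max_calls_per_task:
            raise BudgetExceeded(
                f"task already made {self.calls_made} calls "
                f"(limit {self.max_calls_per_task})")
        self.calls_made += 1


# Hypothetical limits: 4,000 tokens per request, 3 calls per task.
budget = TaskBudget(max_tokens_per_request=4_000, max_calls_per_task=3)
for _ in range(3):
    budget.authorise(prompt_tokens=1_000, max_output_tokens=500)

try:
    budget.authorise(prompt_tokens=1_000, max_output_tokens=500)
    loop_stopped = False
except BudgetExceeded:
    loop_stopped = True
print(loop_stopped)
```

Failing closed like this turns a looping agent into a visible error rather than a silent bill.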

When you embed these controls into governance, you can more easily make the POC-to-production transition – because you can tell Finance and leadership exactly what each interaction costs, and hold to it.

Making AI spend visible: Showback and chargeback

Invisible usage is another consistent contributor to AI programme failure. This is when multiple teams consume AI services from a central API key, and costs accumulate with no attribution. By the time Finance notices the monthly bill, the behaviours causing it are deeply embedded in product architecture.

Showback – attributing costs to teams and business units – is the foundational step. Once teams can see what they’re consuming and what it costs, behaviour rapidly changes. A dashboard showing a team that their feature costs £4,000 per month in inference alone is far more motivating than a shared corporate invoice they have no visibility into.

As FinOps maturity grows, you can evolve to chargeback – where AI costs are formally allocated to the P&L of the consuming business unit. This creates genuine accountability. A useful hybrid is showback for R&D and innovation workloads where experimentation is the point, and chargeback for production AI services where cost predictability is essential.

For either to work well with AI, you need purpose-built tagging standards:

  • Cost-per-prompt and cost-per-output by application and team
  • Model-level cost attribution
  • Anomaly detection dashboards that surface unexpected spend spikes in near-real-time (rather than at month-end)
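
A minimal sketch of what those tags make possible: once every usage record carries team, application and model, showback becomes a grouping exercise and a basic spike check is a comparison against a recent average. The record fields, sample figures and the 2x threshold are assumptions for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    team: str
    application: str
    model: str
    cost_gbp: float


def showback(records: list[UsageRecord], tag: str) -> dict[str, float]:
    """Total cost grouped by one tag: 'team', 'application' or 'model'."""
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        totals[getattr(record, tag)] += record.cost_gbp
    return dict(totals)


def is_anomalous(today_cost: float, recent_daily_costs: list[float],
                 threshold: float = 2.0) -> bool:
    """Flag a day whose cost exceeds threshold x the recent daily average."""
    if not recent_daily_costs:
        return False  # no baseline yet, so nothing to compare against
    baseline = sum(recent_daily_costs) / len(recent_daily_costs)
    return today_cost > threshold * baseline


# Hypothetical monthly records.
records = [
    UsageRecord("support", "ticket-bot", "small", 1_500.0),
    UsageRecord("support", "ticket-bot", "frontier", 2_500.0),
    UsageRecord("search", "site-search", "small", 800.0),
]
by_team = showback(records, "team")
by_model = showback(records, "model")
spike = is_anomalous(450.0, [100.0, 120.0, 110.0, 90.0])
print(by_team, by_model, spike)
```

The same records answer both the team-level and the model-level question, which is the practical argument for agreeing the tagging standard before usage grows.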

Connecting cost to value

An AI programme might track total spend but have no visibility on cost per outcome – and that distinction separates financial governance from financial monitoring. Unit economics means anchoring your AI spend to the thing it actually produces. Instead of ‘We spent £40,000 on inference last month’, try establishing:

  • Cost per support ticket resolved: If your AI handles 20,000 tickets at £40,000 in inference, that's £2.00 per resolution. Is that cheaper than a human agent? By how much? What's the trend as volume scales?
  • Cost per code review completed: If a developer copilot runs 5,000 reviews at £500 in API calls, that's £0.10 per review. What's the productivity gain per developer per week?
  • Cost per lead qualified: If an AI sales assistant processes 10,000 inbound leads at £1,200/month, what's the cost per qualified opportunity versus the pipeline it generates?
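
The unit costs in these examples come from one division, sketched below. The function name is hypothetical, and the 15% lead qualification rate is an assumption added purely to show how cost per lead processed turns into cost per qualified opportunity.

```python
def cost_per_outcome(total_cost_gbp: float, outcomes: int) -> float:
    """Spend divided by the number of business outcomes it produced."""
    if outcomes <= 0:
        raise ValueError("outcomes must be a positive count")
    return total_cost_gbp / outcomes


per_ticket = cost_per_outcome(40_000, 20_000)   # support tickets resolved
per_review = cost_per_outcome(500, 5_000)       # code reviews completed
per_lead = cost_per_outcome(1_200, 10_000)      # inbound leads processed

# Assumption: 15% of processed leads become qualified opportunities.
qualified_leads = int(10_000 * 0.15)
per_qualified = cost_per_outcome(1_200, qualified_leads)

print(f"£{per_ticket:.2f} £{per_review:.2f} £{per_lead:.2f} £{per_qualified:.2f}")
```

Tracked monthly, the same calculation shows whether cost per outcome is drifting as volume grows.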

When you frame AI spend this way, you can determine whether it’s generating value. You can also set rational scaling thresholds so you know the volume at which unit cost becomes cheaper than the alternative.
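
One way to express a scaling threshold is to split AI cost into a fixed monthly part and a variable part per outcome, then solve for the volume at which the blended unit cost drops below the alternative. The sketch below assumes that simple cost model, and every figure in it is hypothetical.

```python
def ai_unit_cost(volume: int, fixed_monthly_gbp: float,
                 variable_per_outcome_gbp: float) -> float:
    """Blended cost per outcome: fixed costs spread over volume, plus variable."""
    if volume <= 0:
        raise ValueError("volume must be positive")
    return fixed_monthly_gbp / volume + variable_per_outcome_gbp


def break_even_volume(fixed_monthly_gbp: float, variable_per_outcome_gbp: float,
                      alternative_per_outcome_gbp: float) -> float:
    """Monthly volume above which AI is cheaper per outcome than the alternative."""
    margin = alternative_per_outcome_gbp - variable_per_outcome_gbp
    if margin <= 0:
        raise ValueError("AI variable cost must be below the alternative's unit cost")
    return fixed_monthly_gbp / margin


# Hypothetical: £9,000 fixed platform cost, £2.00 inference per ticket,
# £5.00 per ticket for the existing process.
threshold_volume = break_even_volume(9_000, 2.00, 5.00)
cost_at_threshold = ai_unit_cost(3_000, 9_000, 2.00)
cost_at_scale = ai_unit_cost(20_000, 9_000, 2.00)
print(threshold_volume, cost_at_threshold, cost_at_scale)
```

With these placeholder figures the break-even sits at 3,000 tickets a month. Below that volume the fixed cost makes AI the more expensive option per ticket.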

This is where FinOps changes the conversation – because AI costs are translated to the business context and better decisions can be made upfront based on the insights. You should establish unit cost baselines at the start of every AI initiative, track them through production and collaborate with teams when cost-per-outcome trends in the wrong direction.

The organisations winning with AI are spending intelligently

AI programmes stall for a combination of technical, financial, operational and strategic reasons – often born from poor visibility, unrealistic expectations and the absence of governance frameworks that distinguish a sustainable AI capability from an expensive experiment.

When you embed FinOps in AI programmes from proof-of-concept through to production, you bring:

  • Cost transparency that prevents surprise budget events
  • Prompt and architecture governance that can significantly reduce spend without sacrificing quality
  • Model selection rigour that captures appropriate performance at a fraction of the cost
  • Accountability mechanisms that change behaviour before problems become overwhelming

Without FinOps, AI projects are at high risk of uncontrolled spend, low-value output and eventual abandonment. With FinOps, AI becomes a value-generating capability that scales responsibly.

Contact us to discuss how we can help you achieve this.

Jake McCracken
Senior FinOps Advisor