OpenAI vs Anthropic vs Gemini: Real Token Costs Compared (2025)

If you're building AI agents in 2025, you're probably using more than one provider. OpenAI for GPT-4o, Anthropic for Claude, Google for Gemini — each has different pricing, different token counting, and different gotchas.

The result? Unpredictable costs that blow up your budget.

Current Token Pricing (March 2025)

OpenAI

| Model | Input (per 1M) | Output (per 1M) | Best For |
|---|---|---|---|
| GPT-4o | $2.50 | $10.00 | General purpose |
| GPT-4o mini | $0.15 | $0.60 | High-volume tasks |
| o1 | $15.00 | $60.00 | Complex reasoning |
| o3 | $10.00 | $40.00 | Advanced reasoning |
| o3-mini | $1.10 | $4.40 | Budget reasoning |

Anthropic Claude

| Model | Input (per 1M) | Output (per 1M) | Best For |
|---|---|---|---|
| Claude Opus 4 | $15.00 | $75.00 | Complex analysis |
| Claude Sonnet 4 | $3.00 | $15.00 | Balanced quality/cost |
| Claude 3.5 Haiku | $0.80 | $4.00 | Fast classification |
| Claude 3 Haiku | $0.25 | $1.25 | Cheapest Claude |

Google Gemini

| Model | Input (per 1M) | Output (per 1M) | Best For |
|---|---|---|---|
| Gemini 2.5 Pro | $1.25 | $10.00 | Long context, multimodal |
| Gemini 2.5 Flash | $0.15 | $0.60 | Speed + cost efficiency |
| Gemini 2.0 Flash | $0.10 | $0.40 | Cheapest fast option |

DeepSeek

| Model | Input (per 1M) | Output (per 1M) | Best For |
|---|---|---|---|
| DeepSeek Chat | $0.27 | $1.10 | Budget GPT-4o alternative |
| DeepSeek Reasoner | $0.55 | $2.19 | Cheap reasoning |

The Real-World Cost Comparison

1,000 requests/day, 500 input + 200 output tokens each:

| Model | Daily Cost | Monthly Cost |
|---|---|---|
| GPT-4o | $3.25 | $97.50 |
| GPT-4o mini | $0.20 | $5.85 |
| Claude Sonnet 4 | $4.50 | $135.00 |
| Claude 3 Haiku | $0.38 | $11.25 |
| Gemini 2.5 Flash | $0.20 | $5.85 |
| Gemini 2.0 Flash | $0.13 | $3.90 |

The spread is massive. Claude Opus 4 at 1,000 requests/day costs $577.50/month — while Gemini 2.0 Flash handles the same volume for $3.90/month. That's a 148x difference.

Why Multi-Agent Systems Make This Worse

Single-model apps have predictable costs. Multi-agent systems don't, because they chain several models together:

  1. Router agent (cheap model) classifies the request
  2. Worker agent (mid-tier) does the work
  3. Reviewer agent (premium) validates output
  4. Retry loops when quality is insufficient

Each step multiplies token usage. A 3-step chain with one retry averages 4x the tokens of a single call. And if your retry logic has no budget cap? You're one bad prompt away from a $500 surprise.
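The multiplier math is worth making explicit. Assuming each step and each retry is a full LLM call of similar size (a simplification — real retries often carry extra context), the chain cost scales linearly with the number of calls:

```python
def chain_cost(per_call_cost, steps=3, retries=1):
    """Each step and each retry is a full call at per_call_cost."""
    return per_call_cost * (steps + retries)

single = 0.00325            # one GPT-4o call at 500 in / 200 out tokens
print(chain_cost(single))   # 0.013 — 4x the single-call cost
```

At 1,000 requests/day, that 4x turns GPT-4o's $97.50/month into roughly $390/month before any runaway loops.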

The Solution: Per-Workflow Budget Caps

```python
import openai
import anthropic
from tokenfence import guard, async_guard

# Works with OpenAI
client = guard(
    openai.OpenAI(),
    budget="$5.00",
    fallback="gpt-4o-mini",
    on_limit="stop",
)

# Works with Anthropic
client = guard(
    anthropic.Anthropic(),
    budget="$2.00",
    fallback="claude-3-haiku-20240307",
    on_limit="raise",
)

# Works with async clients
client = async_guard(
    openai.AsyncOpenAI(),
    budget="$10.00",
    fallback="gpt-4o-mini",
)
```

Smart Cost Strategies

1. Tiered Model Selection

  • Tier 1 (< $0.01/call): GPT-4o mini, Gemini Flash, Claude Haiku
  • Tier 2 ($0.01-$0.05): GPT-4o, Gemini Pro, Claude Sonnet
  • Tier 3 ($0.05+): o1, Claude Opus — only for verified complex tasks
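Wiring the tiers up can be as simple as a lookup from a complexity label to a model name. A sketch with illustrative labels and one model per tier (in practice the classification step would itself be a Tier 1 call):

```python
# One representative model per tier; swap in your preferred picks.
TIERS = {
    "simple":   "gpt-4o-mini",  # Tier 1: < $0.01/call
    "standard": "gpt-4o",       # Tier 2: $0.01-$0.05
    "complex":  "o1",           # Tier 3: $0.05+, verified complex tasks only
}

def pick_model(complexity: str) -> str:
    # Unknown labels fall back to the cheapest tier, never the priciest.
    return TIERS.get(complexity, TIERS["simple"])

print(pick_model("simple"))   # gpt-4o-mini
print(pick_model("complex"))  # o1
```

Defaulting unknown labels to the cheapest tier is the safe failure mode: a misclassification costs you quality on one request, not budget on thousands.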

2. Automatic Downgrade Under Pressure

When budget is 80% consumed, automatically switch to cheaper models. Preserves functionality while capping costs.
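The mechanic is easy to sketch independently of any library — the 80% threshold and model names here are illustrative:

```python
def choose_model(spent: float, budget: float,
                 primary: str = "gpt-4o",
                 fallback: str = "gpt-4o-mini") -> str:
    """Downgrade to the cheaper model once 80% of the budget is consumed."""
    return fallback if spent >= 0.8 * budget else primary

print(choose_model(spent=3.00, budget=5.00))  # gpt-4o
print(choose_model(spent=4.10, budget=5.00))  # gpt-4o-mini
```

The point of downgrading early (at 80%, not 100%) is headroom: the remaining budget stretches across many cheap calls instead of two or three expensive ones.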

3. Per-Workflow Budgets

Don't limit individual API calls — limit the entire workflow. A pipeline might need 20 calls, but the total should stay under $5.

4. Kill Switch for Runaway Loops

Agent retry loops are the #1 cause of cost spikes. A hard budget cap with on_limit="stop" prevents infinite loops from draining your account.
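The kill-switch behavior can be sketched as a generator that refuses to yield another attempt once the next call would break the cap — per-call costs here are illustrative:

```python
class BudgetExceeded(Exception):
    pass

def guarded_retries(per_call_cost: float, cap: float):
    """Yield attempt numbers until the next call would exceed the cap."""
    spent = 0.0
    attempt = 0
    while spent + per_call_cost <= cap:
        spent += per_call_cost
        attempt += 1
        yield attempt
    raise BudgetExceeded(f"spent ${spent:.2f} of ${cap:.2f} cap")

attempts = []
try:
    for n in guarded_retries(per_call_cost=0.02, cap=0.05):
        attempts.append(n)  # a real loop would re-call the model here
except BudgetExceeded:
    pass
print(attempts)  # [1, 2] — the third attempt would blow the $0.05 cap
```

Note the check happens before the call, not after: the guard rejects a call that would overshoot, rather than discovering the overshoot once the money is spent.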

Key Takeaways

  1. Token costs vary 148x between cheapest and most expensive models
  2. Multi-agent chains multiply costs — budget per workflow, not per call
  3. Auto-downgrade beats hard stops — graceful degradation keeps apps running
  4. Framework-agnostic tooling lets you switch providers without rewriting budget logic

Get Started

```bash
pip install tokenfence
```

```python
import openai
from tokenfence import guard

client = guard(openai.OpenAI(), budget="$5.00")
```

Read the documentation or browse examples on GitHub.

Ready to protect your AI budget?

Two lines of code. Per-workflow budgets. Automatic model downgrade. Hard kill switch.