Multi-Agent Cost Tracking: The Observability Layer You're Missing
You’ve deployed a multi-agent system. A planner agent delegates to a researcher, a writer, and a reviewer. It works beautifully in dev. Then the bill arrives: $47 for a single workflow run. Which agent caused it? You have no idea.
This is the observability gap in multi-agent AI systems. We instrument latency, errors, and throughput — but per-agent cost attribution is almost always missing.
Why Multi-Agent Cost Tracking Is Different
Single-agent systems are simple: one model, one request, one cost. Multi-agent architectures break this model in three ways:
- Agent fan-out: A planner spawns 5 researcher agents. Each researcher makes 3–10 API calls. Cost grows multiplicatively with each layer of delegation, not linearly.
- Model mixing: Your planner uses GPT-4, researchers use GPT-3.5, the writer uses Claude. Each has different per-token pricing. Aggregating cost across providers is manual math.
- Retry cascades: When an agent fails, the orchestrator retries — sometimes with a more expensive model. A single retry loop can 10x the cost of a workflow.
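The retry problem is that every failed attempt still costs money, and fallbacks to pricier models compound it. Here is a minimal, provider-agnostic sketch of a retry loop that caps both attempts and cumulative spend. The `attempt_fn` callback shape and the cap values are illustrative assumptions, not part of any library:

```python
def call_with_caps(attempt_fn, max_retries=3, cost_cap=1.00):
    """Retry attempt_fn, but stop when either the attempt cap or the cost cap is hit.

    attempt_fn(attempt) returns (result_or_None, cost_of_that_attempt).
    """
    spent = 0.0
    for attempt in range(max_retries + 1):
        result, cost = attempt_fn(attempt)
        spent += cost
        if result is not None:
            return result, spent
        if spent >= cost_cap:
            raise RuntimeError(f"cost cap hit after {attempt + 1} attempts (${spent:.2f})")
    raise RuntimeError(f"retries exhausted (${spent:.2f} spent)")

# Simulate an agent that fails twice, then succeeds on a pricier fallback model
costs = [0.02, 0.02, 0.20]
def flaky(attempt):
    return ("ok" if attempt == 2 else None), costs[attempt]

result, total = call_with_caps(flaky)
```

Note that the failed attempts' $0.04 is still part of the $0.24 total: retries are never free, only bounded.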
The 3 Metrics Every Multi-Agent System Needs
At minimum, you should be tracking:
1. Cost per Agent Role
Not just total cost — cost by role. Which agent type is your biggest spender? In our testing, researcher agents typically account for 60–80% of total workflow cost because they make the most API calls.
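If you already log per-call costs, role attribution is a simple aggregation. A standalone sketch, assuming an illustrative log format of `(role, cost)` pairs:

```python
from collections import defaultdict

# Illustrative per-call cost log: (agent_role, cost_in_dollars)
call_log = [
    ("planner", 0.03),
    ("researcher", 0.08),
    ("researcher", 0.11),
    ("researcher", 0.09),
    ("writer", 0.06),
]

cost_by_role = defaultdict(float)
for role, cost in call_log:
    cost_by_role[role] += cost

total = sum(cost_by_role.values())
share = {role: cost / total for role, cost in cost_by_role.items()}
```

Even in this toy log, the researcher role dominates the spend, which is the pattern the numbers above describe.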
import openai
import anthropic
from tokenfence import guard
# Give each agent role its own budget
planner_client = guard(openai.OpenAI(), budget=1.00, label="planner")
researcher_client = guard(openai.OpenAI(), budget=2.00, label="researcher")
writer_client = guard(anthropic.Anthropic(), budget=1.50, label="writer")
# Each agent's spend is tracked independently
# If any agent exceeds its budget, only that agent is killed
2. Cost per Workflow Run
Individual API calls are noise. What matters is the total cost of a complete workflow execution. If your “summarize document” workflow costs $0.12 on average but spikes to $4.80 on edge cases, you need to know that.
# Wrap the entire workflow in a single budget
workflow_client = guard(openai.OpenAI(), budget=5.00, label="doc-summary-v2")
# All agents in this workflow share the $5 budget
# Prevents any single run from becoming a cost outlier
3. Cost Anomaly Detection
Set thresholds. If a workflow that normally costs $0.15 suddenly costs $3.00, that’s a bug — not a feature. TokenFence’s budget caps act as automatic anomaly detection: if a workflow tries to exceed its budget, it gets killed before the damage spreads.
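Budget caps aside, the threshold logic itself fits in a few lines: flag any run whose cost is a multiple of the recent baseline. A standalone sketch; the multiplier and the median baseline are tunable assumptions, not a prescribed default:

```python
import statistics

def is_cost_anomaly(run_cost, recent_costs, factor=5.0):
    """Flag a run whose cost exceeds `factor` times the recent median cost."""
    baseline = statistics.median(recent_costs)
    return run_cost > factor * baseline

# Recent "summarize document" runs hovering around $0.15
recent = [0.14, 0.15, 0.16, 0.15, 0.13]
```

A median baseline is deliberately robust here: one earlier outlier run won't inflate the threshold and mask the next one.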
The Architecture Pattern
Here’s the pattern we recommend for production multi-agent systems:
import openai
from tokenfence import guard
class AgentOrchestrator:
    def __init__(self, total_budget: float):
        self.base_client = openai.OpenAI()
        self.total_budget = total_budget

    def create_agent(self, role: str, budget_pct: float):
        """Each agent gets a percentage of the total workflow budget."""
        agent_budget = self.total_budget * budget_pct
        return guard(self.base_client, budget=agent_budget, label=role)

    def run_workflow(self, task: str):
        planner = self.create_agent("planner", budget_pct=0.15)
        researcher = self.create_agent("researcher", budget_pct=0.50)
        writer = self.create_agent("writer", budget_pct=0.25)
        reviewer = self.create_agent("reviewer", budget_pct=0.10)
        # Planner gets 15% of budget, researcher gets 50%, etc.
        # If the researcher hits its cap, the workflow degrades gracefully
        # instead of burning the entire budget on one runaway agent.
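The allocation arithmetic is worth sanity-checking on its own: the percentages must sum to exactly 1.0, or you either strand budget or overcommit it. A standalone sketch using the split and $5 total from the example above:

```python
TOTAL_BUDGET = 5.00
SPLITS = {"planner": 0.15, "researcher": 0.50, "writer": 0.25, "reviewer": 0.10}

# Percentages must cover the whole budget: no stranded slack, no overcommit
assert abs(sum(SPLITS.values()) - 1.0) < 1e-9

budgets = {role: round(TOTAL_BUDGET * pct, 2) for role, pct in SPLITS.items()}
```

Putting the check next to the split constants means a future edit that adds a fifth role without rebalancing fails loudly instead of silently overspending.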
What NOT to Do
Common mistakes we see in production multi-agent deployments:
- Don’t rely on monthly billing dashboards. By the time you see the OpenAI bill, the damage is done. You need real-time, per-workflow caps.
- Don’t give all agents the same budget. A planner that makes 2 API calls doesn’t need the same budget as a researcher making 20. Allocate proportionally.
- Don’t ignore retry costs. Retries are the #1 source of cost surprises in multi-agent systems. Cap retries OR cap budget — ideally both.
- Don’t build custom tracking infrastructure. You’ll spend weeks building what a library handles in 2 lines. Ship your product, not your cost tracker.
Real-World Numbers
Here’s what multi-agent cost looks like in practice (based on GPT-4o pricing, March 2026):
| Workflow Type | Agents | Avg API Calls | Avg Cost | P99 Cost |
|---|---|---|---|---|
| Document summary | 2 | 4–6 | $0.08 | $0.35 |
| Research + report | 4 | 15–30 | $0.45 | $3.20 |
| Code review pipeline | 3 | 8–12 | $0.22 | $1.80 |
| Customer support triage | 5 | 20–40 | $0.60 | $5.50 |
Notice the gap between average and P99. That’s the danger zone. Without per-workflow budget caps, those P99 runs will eat your margin.
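Measuring your own average-vs-P99 gap takes only a few lines over logged per-run costs. A standalone sketch using the nearest-rank percentile; the run costs are illustrative:

```python
import math
import statistics

def p99(samples):
    """Nearest-rank 99th percentile of a list of per-run costs."""
    s = sorted(samples)
    idx = min(len(s) - 1, math.ceil(0.99 * len(s)) - 1)
    return s[idx]

# Illustrative run costs: mostly cheap, with a few expensive outliers
run_costs = [0.08] * 97 + [0.35, 2.10, 4.80]
avg = statistics.mean(run_costs)
tail = p99(run_costs)
```

With a distribution like this, the mean stays low while the tail is more than an order of magnitude higher, which is exactly the pattern the table shows.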
Getting Started
Adding per-agent cost tracking to an existing multi-agent system takes about 5 minutes with TokenFence:
# Python
pip install tokenfence
# Node.js
npm install tokenfence
Wrap each agent’s LLM client in a guard() call with a budget. That’s it. No infrastructure changes, no dashboard setup, no config files.
Check out the documentation for async patterns, provider-specific examples, and the full API reference. Or browse real-world multi-agent examples on GitHub.
Your agents are smart. Make sure your cost controls are too.
Ready to protect your AI budget?
Two lines of code. Per-workflow budgets. Automatic model downgrade. Hard kill switch.