Multi-Agent Cost Tracking: The Observability Layer You're Missing
You’ve deployed a multi-agent system. A planner agent delegates to a researcher, a writer, and a reviewer. It works beautifully in dev. Then the bill arrives: $47 for a single workflow run. Which agent caused it? You have no idea.
This is the observability gap in multi-agent AI systems. We instrument latency, errors, and throughput — but per-agent cost attribution is almost always missing.
Why Multi-Agent Cost Tracking Is Different
Single-agent systems are simple: one model, one request, one cost. Multi-agent architectures break this model in three ways:
- Agent fan-out: A planner spawns 5 researcher agents. Each researcher makes 3–10 API calls. Cost grows multiplicatively with each layer of delegation, not linearly.
- Model mixing: Your planner uses GPT-4, researchers use GPT-3.5, the writer uses Claude. Each has different per-token pricing. Aggregating cost across providers is manual math.
- Retry cascades: When an agent fails, the orchestrator retries — sometimes with a more expensive model. A single retry loop can 10x the cost of a workflow.
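The retry problem is that every failed attempt still costs money, and fallbacks to pricier models compound it. Here is a minimal, provider-agnostic sketch of a retry loop that caps both attempts and cumulative spend. The `attempt_fn` callback shape and the cap values are illustrative assumptions, not part of any library:

```python
def call_with_caps(attempt_fn, max_retries=3, cost_cap=1.00):
    """Retry attempt_fn, but stop when either the attempt cap or the cost cap is hit.

    attempt_fn(attempt) returns (result_or_None, cost_of_that_attempt).
    """
    spent = 0.0
    for attempt in range(max_retries + 1):
        result, cost = attempt_fn(attempt)
        spent += cost
        if result is not None:
            return result, spent
        if spent >= cost_cap:
            raise RuntimeError(f"cost cap hit after {attempt + 1} attempts (${spent:.2f})")
    raise RuntimeError(f"retries exhausted (${spent:.2f} spent)")

# Simulate an agent that fails twice, then succeeds on a pricier fallback model
costs = [0.02, 0.02, 0.20]
def flaky(attempt):
    return ("ok" if attempt == 2 else None), costs[attempt]

result, total = call_with_caps(flaky)
```

Note that the failed attempts' $0.04 is still part of the $0.24 total: retries are never free, only bounded.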
The 3 Metrics Every Multi-Agent System Needs
At minimum, you should be tracking:
1. Cost per Agent Role
Not just total cost — cost by role. Which agent type is your biggest spender? In our testing, researcher agents typically account for 60–80% of total workflow cost because they make the most API calls.
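If you already log per-call costs, role attribution is a simple aggregation. A standalone sketch, assuming an illustrative log format of `(role, cost)` pairs:

```python
from collections import defaultdict

# Illustrative per-call cost log: (agent_role, cost_in_dollars)
call_log = [
    ("planner", 0.03),
    ("researcher", 0.08),
    ("researcher", 0.11),
    ("researcher", 0.09),
    ("writer", 0.06),
]

cost_by_role = defaultdict(float)
for role, cost in call_log:
    cost_by_role[role] += cost

total = sum(cost_by_role.values())
share = {role: cost / total for role, cost in cost_by_role.items()}
```

Even in this toy log, the researcher role dominates the spend, which is the pattern the numbers above describe.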
import openai
import anthropic
from tokenfence import guard
# Give each agent role its own budget
planner_client = guard(openai.OpenAI(), budget=1.00, label="planner")
researcher_client = guard(openai.OpenAI(), budget=2.00, label="researcher")
writer_client = guard(anthropic.Anthropic(), budget=1.50, label="writer")
# Each agent's spend is tracked independently
# If any agent exceeds its budget, only that agent is killed
2. Cost per Workflow Run
Individual API calls are noise. What matters is the total cost of a complete workflow execution. If your “summarize document” workflow costs $0.12 on average but spikes to $4.80 on edge cases, you need to know that.
# Wrap the entire workflow in a single budget
workflow_client = guard(openai.OpenAI(), budget=5.00, label="doc-summary-v2")
# All agents in this workflow share the $5 budget
# Prevents any single run from becoming a cost outlier
3. Cost Anomaly Detection
Set thresholds. If a workflow that normally costs $0.15 suddenly costs $3.00, that’s a bug — not a feature. TokenFence’s budget caps act as automatic anomaly detection: if a workflow tries to exceed its budget, it gets killed before the damage spreads.
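Budget caps aside, the threshold logic itself fits in a few lines: flag any run whose cost is a multiple of the recent baseline. A standalone sketch; the multiplier and the median baseline are tunable assumptions, not a prescribed default:

```python
import statistics

def is_cost_anomaly(run_cost, recent_costs, factor=5.0):
    """Flag a run whose cost exceeds `factor` times the recent median cost."""
    baseline = statistics.median(recent_costs)
    return run_cost > factor * baseline

# Recent "summarize document" runs hovering around $0.15
recent = [0.14, 0.15, 0.16, 0.15, 0.13]
```

A median baseline is deliberately robust here: one earlier outlier run won't inflate the threshold and mask the next one.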
The Architecture Pattern
Here’s the pattern we recommend for production multi-agent systems:
import openai
from tokenfence import guard
class AgentOrchestrator:
    def __init__(self, total_budget: float):
        self.base_client = openai.OpenAI()
        self.total_budget = total_budget

    def create_agent(self, role: str, budget_pct: float):
        """Each agent gets a percentage of the total workflow budget."""
        agent_budget = self.total_budget * budget_pct
        return guard(self.base_client, budget=agent_budget, label=role)

    def run_workflow(self, task: str):
        planner = self.create_agent("planner", budget_pct=0.15)
        researcher = self.create_agent("researcher", budget_pct=0.50)
        writer = self.create_agent("writer", budget_pct=0.25)
        reviewer = self.create_agent("reviewer", budget_pct=0.10)
        # Planner gets 15% of budget, researcher gets 50%, etc.
        # If the researcher hits its cap, the workflow degrades gracefully
        # instead of burning the entire budget on one runaway agent.
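The allocation arithmetic is worth sanity-checking on its own: the percentages must sum to exactly 1.0, or you either strand budget or overcommit it. A standalone sketch using the split and $5 total from the example above:

```python
TOTAL_BUDGET = 5.00
SPLITS = {"planner": 0.15, "researcher": 0.50, "writer": 0.25, "reviewer": 0.10}

# Percentages must cover the whole budget: no stranded slack, no overcommit
assert abs(sum(SPLITS.values()) - 1.0) < 1e-9

budgets = {role: round(TOTAL_BUDGET * pct, 2) for role, pct in SPLITS.items()}
```

Putting the check next to the split constants means a future edit that adds a fifth role without rebalancing fails loudly instead of silently overspending.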
What NOT to Do
Common mistakes we see in production multi-agent deployments:
- Don’t rely on monthly billing dashboards. By the time you see the OpenAI bill, the damage is done. You need real-time, per-workflow caps.
- Don’t give all agents the same budget. A planner that makes 2 API calls doesn’t need the same budget as a researcher making 20. Allocate proportionally.
- Don’t ignore retry costs. Retries are the #1 source of cost surprises in multi-agent systems. Cap retries OR cap budget — ideally both.
- Don’t build custom tracking infrastructure. You’ll spend weeks building what a library handles in 2 lines. Ship your product, not your cost tracker.
Real-World Numbers
Here’s what multi-agent cost looks like in practice (based on GPT-4o pricing, March 2026):
| Workflow Type | Agents | Avg API Calls | Avg Cost | P99 Cost |
|---|---|---|---|---|
| Document summary | 2 | 4–6 | $0.08 | $0.35 |
| Research + report | 4 | 15–30 | $0.45 | $3.20 |
| Code review pipeline | 3 | 8–12 | $0.22 | $1.80 |
| Customer support triage | 5 | 20–40 | $0.60 | $5.50 |
Notice the gap between average and P99. That’s the danger zone. Without per-workflow budget caps, those P99 runs will eat your margin.
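Measuring your own average-vs-P99 gap takes only a few lines over logged per-run costs. A standalone sketch using the nearest-rank percentile; the run costs are illustrative:

```python
import math
import statistics

def p99(samples):
    """Nearest-rank 99th percentile of a list of per-run costs."""
    s = sorted(samples)
    idx = min(len(s) - 1, math.ceil(0.99 * len(s)) - 1)
    return s[idx]

# Illustrative run costs: mostly cheap, with a few expensive outliers
run_costs = [0.08] * 97 + [0.35, 2.10, 4.80]
avg = statistics.mean(run_costs)
tail = p99(run_costs)
```

With a distribution like this, the mean stays low while the tail is more than an order of magnitude higher, which is exactly the pattern the table shows.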
Getting Started
Adding per-agent cost tracking to an existing multi-agent system takes about 5 minutes with TokenFence:
# Python
pip install tokenfence
# Node.js
npm install tokenfence
Wrap each agent’s LLM client in a guard() call with a budget. That’s it. No infrastructure changes, no dashboard setup, no config files.
Check out the documentation for async patterns, provider-specific examples, and the full API reference. Or browse real-world multi-agent examples on GitHub.
Your agents are smart. Make sure your cost controls are too.
Ready to protect your AI budget?
Two lines of code. Per-workflow budgets. Automatic model downgrade. Hard kill switch.