AI Agent Cost-Per-Task Tracking: The Metric That Saves Teams $50K/Year
Your AI agent fleet processed 47,000 tasks last month. Your API bill was $12,400. Quick: which task type is burning the most money? If you can't answer that in under 10 seconds, you're flying blind — and almost certainly overspending by 40-60%.
Why Total Spend Is a Vanity Metric
Most teams monitor their AI spend at the account level. They know their monthly OpenAI bill. They might even know their Anthropic bill separately. But account-level spending tells you almost nothing actionable.
Here's why: a single "summarize document" task might cost $0.03. A "research and write report" task might cost $4.80. If both run 1,000 times per month, the research task is 160x more expensive — but in aggregate dashboards, they're invisible.
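Concretely, a per-task-type breakdown makes that gap impossible to miss. The sketch below uses plain Python and the illustrative numbers from the example above (no tracking library required):

```python
# Per-task-type cost breakdown: the same two tasks that vanish
# in an aggregate bill are obvious side by side.
task_costs = {
    "summarize_document": {"unit_cost": 0.03, "runs": 1000},
    "research_report": {"unit_cost": 4.80, "runs": 1000},
}

monthly = {name: t["unit_cost"] * t["runs"] for name, t in task_costs.items()}
for name, total in sorted(monthly.items(), key=lambda kv: -kv[1]):
    print(f"{name}: ${total:,.2f}/month")
```

Both tasks run the same number of times, yet one accounts for over 99% of the combined spend.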
The Cost-Per-Task Framework
Cost-per-task tracking means attributing every API call, every token, every model invocation back to the specific task or workflow that triggered it. It's the difference between "we spent $12K on AI" and "customer onboarding costs $0.47 per user, but contract review costs $6.20 per document."
What You Need to Track
| Metric | What It Tells You | Alert Threshold |
|---|---|---|
| Cost per task type | Which workflows are expensive | >2x baseline |
| Cost per task instance | Variance within a workflow | >3x median |
| Token efficiency ratio | Output tokens / input tokens | <0.1 (wasting context) |
| Model utilization score | Are you using the right model? | GPT-4 on simple tasks |
| Retry cost overhead | How much retries add | >20% of base cost |
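The thresholds in the table are straightforward to encode as a standalone check. The metric names and dict shape below are illustrative, not part of any particular tracking library:

```python
# Encode the alert thresholds from the table above as a simple check.
# Metric names and thresholds are illustrative assumptions.
def check_alerts(m):
    alerts = []
    if m["cost_per_task"] > 2 * m["baseline_cost"]:
        alerts.append("task type running >2x baseline")
    if m["instance_cost"] > 3 * m["median_cost"]:
        alerts.append("instance cost >3x median for its workflow")
    if m["output_tokens"] / m["input_tokens"] < 0.1:
        alerts.append("token efficiency <0.1 -- likely wasting context")
    if m["retry_cost"] > 0.20 * m["base_cost"]:
        alerts.append("retry overhead >20% of base cost")
    return alerts

print(check_alerts({
    "cost_per_task": 0.90, "baseline_cost": 0.30,
    "instance_cost": 1.00, "median_cost": 0.45,
    "output_tokens": 150, "input_tokens": 4000,
    "retry_cost": 0.05, "base_cost": 1.00,
}))
```

Running a check like this per task type after each billing export is enough to catch the worst regressions before they compound.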
Implementation: 3 Patterns That Work
Pattern 1: Wrapper-Based Tracking
The simplest approach. Wrap your AI client with a budget-aware guard that tags every call with a task ID:
```python
from tokenfence import guard
import openai

# Each task gets its own budget and tracking
def process_document(doc):
    client = guard(openai.OpenAI(), {
        "budget": 2.00,
        "task_id": f"doc-review-{doc.id}",
        "auto_downgrade": True
    })
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Review: {doc.text}"}]
    )
    # Cost is automatically tracked per task_id
    return response
```
Pattern 2: Middleware Pipeline
For frameworks like LangChain or CrewAI, inject cost tracking as middleware:
```python
# Middleware that logs cost per workflow step
class CostTracker:
    def __init__(self):
        self.costs = {}

    def calculate_cost(self, tokens_used, model):
        # Example per-token rates -- substitute your provider's current pricing
        rates = {"gpt-4o": 5.00e-6, "gpt-4o-mini": 0.15e-6}
        return tokens_used * rates.get(model, 5.00e-6)

    def track(self, workflow_name, step_name, tokens_used, model):
        cost = self.calculate_cost(tokens_used, model)
        key = f"{workflow_name}.{step_name}"
        self.costs.setdefault(key, []).append(cost)

    def get_report(self):
        return {
            k: {
                "total": sum(v),
                "avg": sum(v) / len(v),
                "count": len(v),
                "p95": sorted(v)[int(len(v) * 0.95)]
            }
            for k, v in self.costs.items()
        }
```
Pattern 3: Budget Fencing Per Task Type
The most powerful approach. Set different budgets for different task types based on their expected cost:
```javascript
const OpenAI = require('openai');
const { guard } = require('tokenfence');

const TASK_BUDGETS = {
  'summarize': { budget: 0.10, model: 'gpt-4o-mini' },
  'analyze': { budget: 1.00, model: 'gpt-4o' },
  'research': { budget: 5.00, model: 'gpt-4o', autoDowngrade: true },
  'code-review': { budget: 2.00, model: 'claude-sonnet-4-20250514' },
};

function createTaskClient(taskType) {
  const config = TASK_BUDGETS[taskType];
  if (!config) throw new Error('Unknown task type: ' + taskType);
  return guard(new OpenAI(), config);
}
```
Real-World Savings: Before vs After
| Task Type | Before (Monthly) | After (Monthly) | Savings |
|---|---|---|---|
| Document summarization | $2,400 | $340 | 86% |
| Customer support triage | $1,800 | $420 | 77% |
| Code review automation | $3,200 | $1,100 | 66% |
| Research & report gen | $5,000 | $2,800 | 44% |
| Total | $12,400 | $4,660 | 62% |
Most of the $7,740/month in savings came from three discoveries that only cost-per-task tracking reveals:
- Model mismatch: 60% of summarization tasks were using GPT-4o when GPT-4o-mini produced identical quality. Switching saved $2,060/month.
- Retry waste: Code review had a 34% retry rate due to context window overflow. Truncating input to relevant files cut retries to 5% and saved $1,400/month.
- Task bloat: Research tasks were loading entire documents into context when only the abstract was needed for initial triage. Two-stage pipeline saved $2,200/month.
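The two-stage idea behind the third discovery can be sketched in a few lines. The helper names below are hypothetical stand-ins for real model calls, not the original pipeline:

```python
# Two-stage triage sketch: stage 1 looks only at the abstract with a cheap
# model; stage 2 pays for full-document context only for items that pass.
# `classify` and `deep_analyze` stand in for model calls (illustrative).
def two_stage_pipeline(docs, classify, deep_analyze):
    results, deep_calls = [], 0
    for doc in docs:
        # Stage 1: triage on the abstract only (small context, cheap model)
        if classify(doc["abstract"]) == "relevant":
            deep_calls += 1
            # Stage 2: full-document analysis only when triage passes
            results.append(deep_analyze(doc["full_text"]))
    return results, deep_calls

docs = [
    {"abstract": "pricing update", "full_text": "..."},
    {"abstract": "spam", "full_text": "..."},
    {"abstract": "contract change", "full_text": "..."},
]
results, deep_calls = two_stage_pipeline(
    docs,
    classify=lambda a: "relevant" if a != "spam" else "skip",
    deep_analyze=lambda t: "report",
)
print(deep_calls)
```

Only the documents that survive triage ever pay for the expensive full-context call, which is where the savings come from.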
The Dashboard You Need
A good cost-per-task dashboard answers four questions:
- What's my most expensive task type? — Sort by total monthly spend
- Where's the variance? — High P95/P50 ratio means unpredictable costs
- Am I using the right models? — Flag GPT-4 usage on simple classification tasks
- Are costs trending up? — Week-over-week comparison per task type
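Three of those four questions can be answered from nothing more than a list of (task_type, week, cost) records; model-mismatch flagging (question 3) only needs the model name added to each record. A minimal sketch with made-up numbers:

```python
# Answer dashboard questions 1, 2, and 4 from raw (task, week, cost) records.
# Plain Python with illustrative data; no tracking library required.
from collections import defaultdict

records = [
    ("research", 1, 4.10), ("research", 1, 4.90), ("research", 2, 5.60),
    ("summarize", 1, 0.03), ("summarize", 2, 0.03), ("summarize", 2, 0.04),
]

by_task = defaultdict(list)
by_task_week = defaultdict(float)
for task, week, cost in records:
    by_task[task].append(cost)
    by_task_week[(task, week)] += cost

def percentile(values, p):
    s = sorted(values)
    return s[min(int(len(s) * p), len(s) - 1)]

# Q1: most expensive task type by total spend
most_expensive = max(by_task, key=lambda t: sum(by_task[t]))
# Q2: high p95/p50 ratio flags unpredictable per-instance costs
variance = {t: percentile(v, 0.95) / percentile(v, 0.50) for t, v in by_task.items()}
# Q4: week-over-week spend change per task type
trend = {t: by_task_week[(t, 2)] - by_task_week[(t, 1)] for t in by_task}
print(most_expensive, variance, trend)
```

The same aggregation drops straight into whatever dashboard tool you already use; the point is that the raw records must carry the task-type tag.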
Getting Started in 5 Minutes
You don't need a complex observability stack to start. Here's the minimum viable approach:
```python
from tokenfence import guard
import openai

# Step 1: Wrap your client with per-task budgets
client = guard(openai.OpenAI(), {
    "budget": 5.00,
    "auto_downgrade": True,
    "kill_switch": True
})

# Step 2: Use it exactly like the normal client
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this contract..."}]
)

# That's it. TokenFence tracks cost automatically,
# downgrades models when budget runs low,
# and kills the task if it exceeds the cap.
```
Install now: `pip install tokenfence` or `npm install tokenfence`
The teams saving the most money aren't the ones with the biggest budgets — they're the ones who know exactly where every dollar goes.
Ready to protect your AI budget?
Two lines of code. Per-workflow budgets. Automatic model downgrade. Hard kill switch.