# How to Set Per-Workflow Budget Limits on OpenAI API Calls
OpenAI's spending limits are account-wide. When your rogue agent hits the limit, every customer on your platform gets a 429 error. That's not a guardrail — it's a single point of failure.
What you actually need is per-workflow budget control. Here's how to do it in Python and TypeScript with TokenFence.
## The Problem: Account-Level Limits Don't Scale
If you're running AI agents in production, you've probably already hit this:
- Agent A handles customer support queries ($0.02 each)
- Agent B does document analysis ($0.50 each)
- Agent C runs multi-step research workflows ($2–5 each)
With OpenAI's account spending limit set to $100/day, one malfunctioning Agent C can burn through the entire daily budget in minutes — taking Agents A and B offline with it.
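To put numbers on that failure mode, here is a back-of-the-envelope calculation using the per-run costs listed above (the figures are the article's examples, not measured data):

```python
# Back-of-the-envelope: how fast one runaway workflow exhausts an
# account-wide cap. Figures match the per-run costs listed above.
DAILY_CAP = 100.00            # account-wide daily limit, USD
COST_PER_RESEARCH_RUN = 5.00  # upper bound for one Agent C run

runs_to_exhaust = DAILY_CAP / COST_PER_RESEARCH_RUN
print(runs_to_exhaust)  # 20.0
```

Twenty runs is nothing for a retry loop firing every few seconds: the whole platform is dark within minutes.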
## The Solution: Per-Workflow Budget Caps
TokenFence wraps your existing OpenAI or Anthropic client with three layers of protection:
### Layer 1: Budget Cap

```python
from tokenfence import guard
import openai

client = guard(openai.OpenAI(), budget="$0.50")
```
Every API call is tracked against a per-workflow budget.
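The bookkeeping behind a cap like this is straightforward: price each response's token usage and accumulate the total against the workflow's budget. Here is a minimal sketch of the idea (the `BudgetTracker` class and the prices are illustrative, not TokenFence's actual internals or current OpenAI rates):

```python
# Illustrative per-workflow cost tracking.
# Prices are example values in USD per million tokens, not live rates.
PRICE_PER_1M = {  # model -> (input price, output price)
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
}

class BudgetTracker:
    def __init__(self, budget: float):
        self.budget = budget
        self.spent = 0.0

    def record(self, model: str, input_tokens: int, output_tokens: int) -> float:
        """Price one API call and accumulate it against the budget."""
        in_price, out_price = PRICE_PER_1M[model]
        cost = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
        self.spent += cost
        return cost

    @property
    def remaining(self) -> float:
        return self.budget - self.spent

tracker = BudgetTracker(budget=0.50)
tracker.record("gpt-4o", input_tokens=1000, output_tokens=200)
print(round(tracker.spent, 6))  # 0.0045
```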
### Layer 2: Auto-Downgrade

```python
client = guard(
    openai.OpenAI(),
    budget="$0.50",
    fallback="gpt-4o-mini",
)
```
Once a workflow has spent 80% of its budget, TokenFence automatically routes subsequent calls to the cheaper fallback model.
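The downgrade rule itself reduces to a threshold check on spend. A sketch, assuming the 80% trigger described above (the `pick_model` function and threshold constant are illustrative, not TokenFence's API):

```python
# Illustrative downgrade rule: once 80% of the budget is spent,
# route requests to the cheaper fallback model instead.
DOWNGRADE_AT = 0.8

def pick_model(requested: str, fallback: str, spent: float, budget: float) -> str:
    if spent >= DOWNGRADE_AT * budget:
        return fallback
    return requested

print(pick_model("gpt-4o", "gpt-4o-mini", spent=0.10, budget=0.50))  # gpt-4o
print(pick_model("gpt-4o", "gpt-4o-mini", spent=0.41, budget=0.50))  # gpt-4o-mini
```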
### Layer 3: Kill Switch

```python
client = guard(
    openai.OpenAI(),
    budget="$0.50",
    on_limit="stop",
)
```
At the budget cap, behavior depends on `on_limit`: `"stop"` returns a synthetic response, `"raise"` throws `BudgetExceeded`, and `"warn"` logs the overage but lets the call through.
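The three policies map to a small dispatch. A sketch of that behavior as described above (the `enforce` helper and the `BudgetExceeded` signature here are assumptions for illustration, not TokenFence's internals):

```python
# Illustrative dispatch for the three on_limit policies.
import logging

class BudgetExceeded(Exception):
    pass

def enforce(on_limit: str, spent: float, budget: float) -> bool:
    """Return True if the real API call may proceed."""
    if spent < budget:
        return True
    if on_limit == "raise":
        raise BudgetExceeded(f"spent ${spent:.2f} of ${budget:.2f}")
    if on_limit == "warn":
        logging.warning("budget exceeded: $%.2f of $%.2f", spent, budget)
        return True
    return False  # "stop": the wrapper substitutes a synthetic response
```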
## TypeScript Version

```typescript
import { guard } from "tokenfence";
import OpenAI from "openai";

const client = guard(new OpenAI(), {
  budget: "$0.50",
  fallback: "gpt-4o-mini",
  onLimit: "stop",
});

const res = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Analyze this data..." }],
});

console.log(client.tokenfence.spent);     // 0.0023
console.log(client.tokenfence.remaining); // 0.4977
```
## Works with Anthropic Too

```python
import anthropic
from tokenfence import guard

client = guard(
    anthropic.Anthropic(),
    budget="$1.00",
    fallback="claude-3-haiku-20240307",
    on_limit="stop",
)

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Summarize this document..."}],
)
```
## Real-World Example: Multi-Agent Orchestration

Each agent gets its own budget, so a runaway agent's blast radius is limited to its own cap:
```python
support_agent = guard(openai.OpenAI(), budget="$0.10", on_limit="stop")
analysis_agent = guard(openai.OpenAI(), budget="$1.00", fallback="gpt-4o-mini")
research_agent = guard(openai.OpenAI(), budget="$5.00", fallback="gpt-4o-mini")
```
If the research agent goes haywire, it burns through $5 max. The support and analysis agents keep working.
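To see why the failure stays contained, simulate a runaway retry loop against independent budgets (the dollar figures and loop are illustrative; this is the isolation property, not TokenFence code):

```python
# Illustrative: each workflow has its own cap, so a runaway loop
# exhausts only its own budget and leaves the others untouched.
budgets = {"support": 0.10, "analysis": 1.00, "research": 5.00}
spent = {name: 0.0 for name in budgets}

# The research agent retries $0.50 calls until its own cap stops it.
while spent["research"] + 0.50 <= budgets["research"]:
    spent["research"] += 0.50

print(spent["research"])                    # 5.0 -- capped
print(spent["support"], spent["analysis"])  # 0.0 0.0 -- unaffected
```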
## Comparison: TokenFence vs. Alternatives
| Feature | OpenAI Limits | LangSmith | TokenFence |
|---|---|---|---|
| Per-workflow budgets | ❌ | ❌ | ✅ |
| Auto model downgrade | ❌ | ❌ | ✅ |
| Kill switch | Account-wide | ❌ | Per-workflow |
| Setup time | N/A | Hours | 2 lines |
| Framework lock-in | N/A | LangChain | None |
## Getting Started

```shell
pip install tokenfence   # Python
npm install tokenfence   # Node.js / TypeScript
```
Two lines of code to protect your entire AI budget: per-workflow caps, automatic model downgrade, and a hard kill switch. No framework lock-in. No infrastructure to deploy.