
How to Set Per-Workflow Budget Limits on OpenAI API Calls


OpenAI's spending limits are account-wide. When one rogue agent hits the limit, every customer on your platform gets a 429 error. That's not a guardrail; it's a single point of failure.

What you actually need is per-workflow budget control. Here's how to do it in Python and TypeScript with TokenFence.

The Problem: Account-Level Limits Don't Scale

If you're running AI agents in production, you've probably already hit this:

  • Agent A handles customer support queries ($0.02 each)
  • Agent B does document analysis ($0.50 each)
  • Agent C runs multi-step research workflows ($2–5 each)

With OpenAI's account spending limit set to $100/day, one malfunctioning Agent C can burn through the entire daily budget in minutes — taking Agents A and B offline with it.
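The arithmetic is stark. Using the figures above, a quick back-of-the-envelope check shows how few runaway runs it takes:

```python
DAILY_LIMIT = 100.00   # account-wide spending cap ($/day)
COST_PER_RUN = 5.00    # worst-case cost of one Agent C workflow ($)

# A retry loop gone wrong fires runs back to back; this is how many
# complete before the whole account is cut off:
runs_until_exhausted = int(DAILY_LIMIT // COST_PER_RUN)
print(runs_until_exhausted)  # 20 runs, and Agents A and B go dark too
```

At a few seconds per run, twenty runs is minutes, not hours.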

The Solution: Per-Workflow Budget Caps

TokenFence wraps your existing OpenAI or Anthropic client with three layers of protection:

Layer 1: Budget Cap

from tokenfence import guard
import openai

client = guard(openai.OpenAI(), budget="$0.50")

Every API call is tracked against a per-workflow budget.
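TokenFence's internals aren't published here, but the accounting behind a budget cap can be sketched in a few lines. Everything below is illustrative: the `BudgetTracker` class is hypothetical, and the per-token prices are placeholder values, not a real price sheet.

```python
# Illustrative (input, output) $ per token -- check current pricing yourself.
PRICES = {"gpt-4o": (2.50e-6, 10.00e-6)}

class BudgetTracker:
    """Hypothetical sketch: accumulate estimated spend per workflow."""

    def __init__(self, budget: float):
        self.budget = budget
        self.spent = 0.0

    def record(self, model: str, input_tokens: int, output_tokens: int) -> None:
        # Convert the usage block of each API response into dollars.
        in_price, out_price = PRICES[model]
        self.spent += input_tokens * in_price + output_tokens * out_price

    @property
    def remaining(self) -> float:
        return self.budget - self.spent

tracker = BudgetTracker(budget=0.50)
tracker.record("gpt-4o", input_tokens=1000, output_tokens=200)
print(round(tracker.remaining, 6))  # 0.4955
```

The key design point: spend is estimated from the token usage each response already reports, so no extra API calls are needed.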

Layer 2: Auto-Downgrade

client = guard(
    openai.OpenAI(),
    budget="$0.50",
    fallback="gpt-4o-mini",
)

When your workflow has used 80% of its budget, TokenFence automatically downgrades to a cheaper model.
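The downgrade decision itself is simple to reason about. A minimal sketch, assuming the 80% threshold described above (`choose_model` is a hypothetical helper, not TokenFence's API):

```python
def choose_model(requested: str, fallback: str, spent: float, budget: float,
                 threshold: float = 0.8) -> str:
    """Swap in the cheaper fallback once spend crosses the threshold."""
    return fallback if spent >= threshold * budget else requested

# Under 80% of a $0.50 budget: keep the requested model.
print(choose_model("gpt-4o", "gpt-4o-mini", spent=0.30, budget=0.50))  # gpt-4o
# Past 80%: finish the workflow on the cheap model instead of failing.
print(choose_model("gpt-4o", "gpt-4o-mini", spent=0.42, budget=0.50))  # gpt-4o-mini
```

The payoff is graceful degradation: a long workflow finishes on a cheaper model rather than dying mid-run.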

Layer 3: Kill Switch

client = guard(
    openai.OpenAI(),
    budget="$0.50",
    on_limit="stop",
)

When the budget cap is reached, the on_limit mode decides what happens: "stop" returns a synthetic response instead of calling the API, "raise" throws BudgetExceeded, and "warn" logs a warning but lets the call through.
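The three modes can be sketched as a single dispatch. This is a hypothetical illustration of the behavior, not TokenFence source; the synthetic response text is made up:

```python
import logging

class BudgetExceeded(Exception):
    """Raised in "raise" mode when a workflow exceeds its budget."""

def enforce_limit(mode: str, spent: float, budget: float):
    if spent < budget:
        return None  # under budget: let the real API call proceed
    if mode == "stop":
        # Synthetic response: no API call is made, no more money is spent.
        return {"role": "assistant",
                "content": "Budget exhausted for this workflow."}
    if mode == "raise":
        raise BudgetExceeded(f"spent ${spent:.2f} of ${budget:.2f}")
    if mode == "warn":
        logging.warning("budget exceeded: $%.2f / $%.2f", spent, budget)
        return None  # log it, but allow the call anyway
```

"stop" suits user-facing agents that must always return something; "raise" suits batch jobs where a loud failure is better than a silent stub.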

TypeScript Version

import { guard } from "tokenfence";
import OpenAI from "openai";

const client = guard(new OpenAI(), {
  budget: "$0.50",
  fallback: "gpt-4o-mini",
  onLimit: "stop",
});

const res = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Analyze this data..." }],
});

console.log(client.tokenfence.spent);     // 0.0023
console.log(client.tokenfence.remaining); // 0.4977

Works with Anthropic Too

import anthropic
from tokenfence import guard

client = guard(
    anthropic.Anthropic(),
    budget="$1.00",
    fallback="claude-3-haiku-20240307",
    on_limit="stop",
)

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Summarize this document..."}],
)

Real-World Example: Multi-Agent Orchestration

Each agent gets its own budget — no blast radius:

support_agent = guard(openai.OpenAI(), budget="$0.10", on_limit="stop")
analysis_agent = guard(openai.OpenAI(), budget="$1.00", fallback="gpt-4o-mini")
research_agent = guard(openai.OpenAI(), budget="$5.00", fallback="gpt-4o-mini")

If the research agent goes haywire, it burns through $5 max. The support and analysis agents keep working.

Comparison: TokenFence vs. Alternatives

Feature                 OpenAI Limits    LangSmith    TokenFence
Per-workflow budgets    No               No           Yes
Auto model downgrade    No               No           Yes
Kill switch             Account-wide     No           Per-workflow
Setup time              N/A              Hours        2 lines
Framework lock-in       N/A              LangChain    None

Getting Started

pip install tokenfence        # Python
npm install tokenfence         # Node.js / TypeScript

Two lines to protect your entire AI budget. No framework lock-in. No infrastructure to deploy.

Ready to protect your AI budget?

Two lines of code. Per-workflow budgets. Automatic model downgrade. Hard kill switch.