Now in public beta

The cost circuit breaker for AI agents

Wrap your OpenAI/Anthropic clients in 2 lines of code. Set budget caps, automate model downgrades, and stop runaway loops instantly.

2.4k stars · MIT License
main.py
import openai
from tokenfence import guard

client = guard(
    openai.OpenAI(),
    budget="$0.50",
    fallback="gpt-4o-mini",
    on_limit="stop"
)
Budget: $0.12 / $0.50

Trusted by developers at

Acme Corp
Nebula AI
Quantum Labs
Synapse
Cortex
Apex
2,400+ stars
15K+ weekly installs
$2.1M+ saved

AI agents have unpredictable costs

A single agentic loop can burn through your entire monthly budget in minutes. One bad prompt, one recursive call, and you're looking at a four-figure surprise on your next invoice.

67%

of developers cite cost as the #1 barrier to deploying AI agents

$4.2K

average unexpected bill from a single runaway agentic workflow

[Live demo: "Estimated API Cost" counter climbing from $0.00]

Three layers of protection

TokenFence wraps your AI client locally. Zero latency overhead, full cost control.

Budget Caps

Set per-workflow spending limits. Never exceed your budget. Enforce hard limits at the SDK level before requests even fire.

Auto Downgrade

When 80% of budget is spent, automatically route to cheaper models. GPT-4o falls back to GPT-4o-mini seamlessly.

Kill Switch

Hard stop at budget limit. Graceful error handling, clean shutdown, no surprise bills. Your agents fail safely.
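The three layers above can be sketched in plain Python. This is an illustrative toy of the idea, not TokenFence's internals; the `BudgetGuard` and `BudgetExceeded` names are hypothetical, and real cost estimation is more involved.

```python
# Toy sketch of the three protection layers: cap, downgrade, kill switch.
# Names here are illustrative, not TokenFence's actual API.

class BudgetExceeded(RuntimeError):
    """Raised when a request would push spend past the cap."""

class BudgetGuard:
    def __init__(self, budget_usd, fallback=None, downgrade_at=0.8):
        self.budget = budget_usd      # hard spending cap in USD
        self.spent = 0.0              # running total, tracked locally
        self.fallback = fallback      # cheaper model to route to
        self.downgrade_at = downgrade_at  # fraction of budget that triggers fallback

    def route(self, model, est_cost):
        # Kill switch: hard stop before the request even fires.
        if self.spent + est_cost > self.budget:
            raise BudgetExceeded(f"${self.spent:.2f} of ${self.budget:.2f} spent")
        # Auto downgrade: past the threshold, use the cheaper model.
        if self.fallback and self.spent >= self.downgrade_at * self.budget:
            model = self.fallback
        # Budget cap: record spend against the limit.
        self.spent += est_cost
        return model

guard = BudgetGuard(budget_usd=0.50, fallback="gpt-4o-mini")
print(guard.route("gpt-4o", est_cost=0.10))  # gpt-4o
guard.spent = 0.45
print(guard.route("gpt-4o", est_cost=0.04))  # gpt-4o-mini (past 80% of budget)
```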

2 lines. That's it.

No config files, no dashboards to set up. Import, wrap, deploy.

Before
import openai

client = openai.OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}]
)
# No budget limit.
# No fallback.
# Good luck. 🎉
After
import openai
from tokenfence import guard

client = guard(
    openai.OpenAI(),
    budget="$0.50",
    fallback="gpt-4o-mini",
    on_limit="stop"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}]
)
# Protected. Capped. Safe.

How it works

Five steps from unprotected to fully guarded. All local, all fast.

01

Wrap your AI client

Import TokenFence and wrap your existing OpenAI or Anthropic client. Zero config required.

client = guard(openai.OpenAI())
02

Set your budget

Define spending limits per workflow, per user, or globally. Budgets reset on your schedule.

budget="$0.50"
03

Monitor in real-time

TokenFence tracks every token locally. No proxy, no external calls, no added latency.

# 0ms overhead
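Local tracking boils down to multiplying the token counts the API already reports by a per-model price table. A rough sketch, with placeholder rates (check your provider's current pricing; TokenFence's internal table may differ):

```python
# Sketch of local cost estimation from token counts.
# USD per 1M tokens as (input, output) -- placeholder figures, not live rates.
PRICE_PER_1M = {
    "gpt-4o":      (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
}

def estimate_cost(model, input_tokens, output_tokens):
    # Pure local arithmetic: no proxy, no external calls, no added latency.
    inp, out = PRICE_PER_1M[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# 1,000 tokens in, 500 out on gpt-4o-mini:
print(f"${estimate_cost('gpt-4o-mini', 1_000, 500):.6f}")  # $0.000450
```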
04

Auto-downgrade models

At 80% budget, requests automatically route to cheaper models. Quality degrades gracefully.

fallback="gpt-4o-mini"
05

Hard stop at limit

When budget is exhausted, TokenFence stops requests with a clean, catchable exception.

on_limit="stop"
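Failing safely means catching that exception in your agent loop. A minimal sketch; the exception name `BudgetExceeded` is hypothetical here, so check TokenFence's reference for the real class:

```python
# Sketch of handling the hard stop with a catchable exception.

class BudgetExceeded(RuntimeError):
    """Stand-in for the exception a guarded client raises at the cap."""

def guarded_call(spent, budget, est_cost):
    # What a guarded client checks before a request fires.
    if spent + est_cost > budget:
        raise BudgetExceeded(f"would exceed ${budget:.2f} cap")
    return "response"

def run_agent_step(spent, budget, est_cost):
    try:
        return guarded_call(spent, budget, est_cost)
    except BudgetExceeded:
        # Fail safely: shut the loop down cleanly, no surprise bill.
        return None

print(run_agent_step(spent=0.10, budget=0.50, est_cost=0.05))  # response
print(run_agent_step(spent=0.50, budget=0.50, est_cost=0.05))  # None
```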

Simple, transparent pricing

Start free. Upgrade when you need more protection.

Hobby

Free

For side projects and experimentation

  • 50K requests/month
  • Basic budget caps
  • Single project
  • Community support
  • Core SDK access
Recommended

Pro

$49/mo

For production AI applications

  • 500K requests/month
  • Auto model downgrades
  • Per-workflow budget caps
  • Unlimited projects
  • Email support
  • Usage dashboard
  • Team management

Enterprise

Custom

For teams with advanced needs

  • Unlimited requests
  • SLA guarantee
  • On-premise deployment
  • Dedicated support
  • Custom integrations
  • SSO / SAML
  • Audit logs

Frequently asked questions

Everything you need to know about TokenFence.

Stop burning money on runaway agents

Join thousands of developers who ship AI agents with confidence. Free to start, no credit card required.