API Reference

Complete API documentation for the TokenFence SDK.

guard(client, *, budget, fallback=None, on_limit="stop", threshold=0.8)

Wraps an OpenAI or Anthropic client with cost tracking and budget enforcement. Returns a drop-in replacement for the original client.

Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `client` | `openai.OpenAI` or `anthropic.Anthropic` | required | The AI client to wrap |
| `budget` | `str \| float \| int` | required | Max spend in USD. Accepts `"$0.50"` or `0.50` |
| `fallback` | `str \| None` | `None` | Model to auto-downgrade to when `threshold` is reached |
| `on_limit` | `"stop" \| "warn" \| "raise"` | `"stop"` | Behaviour when the budget is exhausted |
| `threshold` | `float` | `0.8` | Fraction (0.0–1.0) of budget at which to trigger downgrade |

Returns

A GuardedClient that proxies all calls to the original client. Use it exactly as you would the original.

Raises

  • TokenFenceError — invalid budget, threshold, or on_limit value
  • BudgetExceeded — (only when on_limit="raise") budget has been exhausted

on_limit Modes

"stop" (default)

Returns a synthetic response with zero tokens used. Your code keeps running — the response content will be "[TokenFence] Budget of $X.XX exceeded (spent $X.XXXX). Request blocked."
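Because the blocked response carries a fixed prefix, calling code can detect it with a simple string check. A minimal sketch, assuming the prefix shown above is stable; `is_blocked` is an illustrative helper, not an SDK function:

```python
def is_blocked(content: str) -> bool:
    # The synthetic response produced under on_limit="stop" starts with this marker.
    return content.startswith("[TokenFence]")

msg = "[TokenFence] Budget of $0.50 exceeded (spent $0.5012). Request blocked."
print(is_blocked(msg))  # True
```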

"warn"

Logs a warning via Python's logging module (logger name: "tokenfence"), then allows the API call through. Use when you want visibility without hard stops.
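To actually see those warnings, make sure the `"tokenfence"` logger is configured to emit at `WARNING` level or below, for example:

```python
import logging

# Show TokenFence warnings alongside your own log output.
logging.basicConfig(format="%(name)s [%(levelname)s] %(message)s")
logging.getLogger("tokenfence").setLevel(logging.WARNING)
```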

"raise"

Raises BudgetExceeded exception. Catch it for custom logic:

import openai
from tokenfence import guard, BudgetExceeded

client = guard(openai.OpenAI(), budget="$0.50", on_limit="raise")

try:
    response = client.chat.completions.create(model="gpt-4o", messages=[...])
except BudgetExceeded as e:
    print(f"Spent ${e.spent:.4f} of ${e.budget:.2f} budget")
    # Switch to manual fallback, cache, or abort

client.tokenfence — CostTracker

Every guarded client exposes a .tokenfence attribute with real-time spend data.

Properties

| Property | Type | Description |
|----------|------|-------------|
| `spent` | `float` | Total USD spent so far |
| `budget` | `float` | Total budget in USD |
| `remaining` | `float` | Budget minus spent |
| `usage_ratio` | `float` | Fraction of budget used |
| `should_downgrade` | `bool` | `True` when `usage_ratio` ≥ `threshold` |
| `budget_exceeded` | `bool` | `True` when `spent` ≥ `budget` |
| `call_count` | `int` | Number of API calls tracked |

Methods

| Method | Description |
|--------|-------------|
| `record(cost: float)` | Manually record a cost (called automatically by `guard`) |
| `reset()` | Reset `spent` to 0 and `call_count` to 0 |
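The property and method semantics above can be illustrated with a toy tracker. This is an illustrative stand-in, not the SDK's `CostTracker`:

```python
class ToyTracker:
    """Illustrative stand-in for CostTracker, showing the property semantics."""

    def __init__(self, budget: float, threshold: float = 0.8):
        self.budget = budget
        self.threshold = threshold
        self.spent = 0.0
        self.call_count = 0

    def record(self, cost: float) -> None:
        self.spent += cost
        self.call_count += 1

    def reset(self) -> None:
        self.spent = 0.0
        self.call_count = 0

    @property
    def remaining(self) -> float:
        return self.budget - self.spent

    @property
    def usage_ratio(self) -> float:
        return self.spent / self.budget

    @property
    def should_downgrade(self) -> bool:
        return self.usage_ratio >= self.threshold

    @property
    def budget_exceeded(self) -> bool:
        return self.spent >= self.budget

t = ToyTracker(budget=1.00)
t.record(0.85)
print(t.should_downgrade, t.budget_exceeded)  # True False
```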

Example

import openai
from tokenfence import guard

client = guard(openai.OpenAI(), budget="$1.00")

# After some API calls...
print(f"Spent: ${client.tokenfence.spent:.4f}")
print(f"Remaining: ${client.tokenfence.remaining:.4f}")
print(f"Calls: {client.tokenfence.call_count}")
print(f"Usage: {client.tokenfence.usage_ratio:.1%}")

Supported Models & Pricing

TokenFence includes built-in pricing for 40+ models across OpenAI, Anthropic, Google, and DeepSeek.

OpenAI

| Model | Input ($/1M) | Output ($/1M) |
|-------|--------------|---------------|
| `gpt-4o` | $2.50 | $10.00 |
| `gpt-4o-mini` | $0.15 | $0.60 |
| `gpt-4-turbo` | $10.00 | $30.00 |
| `o1` | $15.00 | $60.00 |
| `o3-mini` | $1.10 | $4.40 |
| `gpt-5.4` | $5.00 | $15.00 |
| `gpt-5.4-mini` | $0.30 | $1.20 |
| `gpt-5.4-nano` | $0.10 | $0.40 |

Anthropic

| Model | Input ($/1M) | Output ($/1M) |
|-------|--------------|---------------|
| `claude-opus-4` | $15.00 | $75.00 |
| `claude-sonnet-4` | $3.00 | $15.00 |
| `claude-3.5-haiku` | $0.80 | $4.00 |

Google

| Model | Input ($/1M) | Output ($/1M) |
|-------|--------------|---------------|
| `gemini-2.5-pro` | $1.25 | $10.00 |
| `gemini-2.5-flash` | $0.15 | $0.60 |
| `gemini-2.0-flash` | $0.10 | $0.40 |

DeepSeek

| Model | Input ($/1M) | Output ($/1M) |
|-------|--------------|---------------|
| `deepseek-chat` | $0.14 | $0.28 |
| `deepseek-reasoner` | $0.55 | $2.19 |
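As a worked example of how these per-million-token rates translate to per-request cost, using the `gpt-4o-mini` row above (`estimate_cost` is an illustrative helper, not an SDK function):

```python
# $/1M-token (input, output) rates taken from the tables above (illustrative subset).
PRICING = {
    "gpt-4o-mini": (0.15, 0.60),
    "deepseek-chat": (0.14, 0.28),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    inp_rate, out_rate = PRICING[model]
    return input_tokens / 1_000_000 * inp_rate + output_tokens / 1_000_000 * out_rate

# 10k prompt tokens + 2k completion tokens on gpt-4o-mini:
print(f"${estimate_cost('gpt-4o-mini', 10_000, 2_000):.4f}")  # $0.0027
```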

Exceptions

TokenFenceError

Base exception for all TokenFence errors.

BudgetExceeded(TokenFenceError)

Raised when on_limit="raise" and the budget is exhausted.

Attributes:

  • budget: float — the total budget
  • spent: float — the amount spent

Thread Safety

CostTracker uses threading.Lock internally. Multiple threads sharing the same guarded client will correctly accumulate costs without race conditions.
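The pattern, many threads recording into one lock-protected accumulator, can be sketched as follows (a toy accumulator in the spirit of `CostTracker`, not the SDK class itself):

```python
import threading

class LockedAccumulator:
    # Toy illustration of the locking pattern described above.
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.spent = 0.0
        self.call_count = 0

    def record(self, cost: float) -> None:
        with self._lock:  # the read-modify-write happens atomically
            self.spent += cost
            self.call_count += 1

acc = LockedAccumulator()

def worker() -> None:
    for _ in range(1_000):
        acc.record(0.001)

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(acc.call_count)  # 8000
```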


Framework Compatibility

TokenFence works with any framework that uses the standard OpenAI or Anthropic SDKs:

  • ✅ LangChain / LangGraph
  • ✅ CrewAI
  • ✅ AutoGen
  • ✅ Custom agent loops
  • ✅ FastAPI / Flask backends
  • ✅ Jupyter notebooks

No special adapters needed — just wrap the client before passing it to your framework.

Ready to protect your AI budget?

Two lines of code. Per-workflow budgets. Automatic model downgrade. Hard kill switch.