Async Guide
Use TokenFence with async OpenAI and Anthropic clients in production pipelines.
Why Async?
Production AI agent pipelines run async. Whether you're using asyncio for concurrent API calls,
FastAPI for serving, or running multi-step agent loops, you need budget protection that doesn't block your event loop.
async_guard() is the async counterpart to guard() — same API, same protections, fully async.
Install
# OpenAI extras (quoted so zsh doesn't expand the brackets)
pip install "tokenfence[openai]"
# or for Anthropic
pip install tokenfence anthropic
Async OpenAI
import asyncio

import openai

from tokenfence import async_guard


async def main():
    client = async_guard(
        openai.AsyncOpenAI(),
        budget="$2.00",
        fallback="gpt-4o-mini",
        on_limit="stop",
    )

    # Concurrent requests — all tracked against the same budget
    tasks = [
        client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": f"Question {i}"}],
        )
        for i in range(10)
    ]
    responses = await asyncio.gather(*tasks)

    print(f"Spent: ${client.tokenfence.spent:.4f}")
    print(f"Remaining: ${client.tokenfence.remaining:.4f}")


asyncio.run(main())
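When many concurrent requests share one budget, every task fired by asyncio.gather() may already be in flight before the first cost is recorded, so the total spend can overshoot the cap. A common mitigation is to bound concurrency with a semaphore so each call sees a recent budget reading before it is sent. This is a stdlib-only sketch of the pattern; fake_call stands in for a guarded client call and is not part of TokenFence:

```python
import asyncio


async def fake_call(i: int) -> str:
    # Placeholder for an awaited, budget-guarded API request.
    await asyncio.sleep(0)
    return f"answer {i}"


async def bounded_gather(n_tasks: int, limit: int) -> list[str]:
    sem = asyncio.Semaphore(limit)

    async def bounded(i: int) -> str:
        async with sem:  # at most `limit` requests in flight at once
            return await fake_call(i)

    return await asyncio.gather(*(bounded(i) for i in range(n_tasks)))


results = asyncio.run(bounded_gather(10, 3))
print(len(results))  # 10
```

With limit=3, at most three requests are outstanding at any moment, so an exhausted budget stops the remaining tasks before they spend anything.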
Async Anthropic
import asyncio

import anthropic

from tokenfence import async_guard


async def agent_loop():
    client = async_guard(
        anthropic.AsyncAnthropic(),
        budget="$1.00",
        fallback="claude-3-haiku-20240307",
        on_limit="raise",
        threshold=0.7,
    )

    messages = []
    for step in ["Write a function", "Add tests", "Add docs"]:
        messages.append({"role": "user", "content": step})
        try:
            response = await client.messages.create(
                model="claude-sonnet-4-20250514",
                max_tokens=1024,
                messages=messages,
            )
            messages.append({"role": "assistant", "content": response.content[0].text})
        except Exception as e:
            # on_limit="raise" surfaces budget exhaustion as an exception
            print(f"Budget hit: {e}")
            break


asyncio.run(agent_loop())
FastAPI Integration
Perfect for API backends where each request gets its own budget:
from fastapi import FastAPI

import openai

from tokenfence import async_guard

app = FastAPI()


@app.post("/chat")
async def chat(prompt: str):
    # Each request gets a fresh $0.10 budget
    client = async_guard(
        openai.AsyncOpenAI(),
        budget="$0.10",
        fallback="gpt-4o-mini",
        on_limit="stop",
    )
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return {
        "response": response.choices[0].message.content,
        "cost": client.tokenfence.spent,
    }
API Reference
async_guard(client, *, budget, fallback=None, on_limit="stop", threshold=0.8)
Identical to guard() but for async clients. Accepts openai.AsyncOpenAI and anthropic.AsyncAnthropic.
Thread Safety
The underlying CostTracker is thread-safe: requests launched concurrently via asyncio.gather() against the same guarded client accumulate costs correctly, with no race conditions.
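TokenFence's internals aren't shown in this guide, but the guarantee above amounts to making the read-modify-write on the running total atomic. A minimal, stdlib-only sketch of such a tracker (class and method names hypothetical, not TokenFence's API):

```python
import asyncio
import threading


class MiniCostTracker:
    """Hypothetical stand-in for a thread-safe cost tracker."""

    def __init__(self, budget: float) -> None:
        self.budget = budget
        self.spent = 0.0
        self._lock = threading.Lock()  # safe even if costs land off the event loop

    def add(self, cost: float) -> None:
        with self._lock:  # the += read-modify-write must be atomic
            self.spent += cost

    @property
    def remaining(self) -> float:
        return self.budget - self.spent


async def main() -> float:
    tracker = MiniCostTracker(budget=2.00)

    async def call(cost: float) -> None:
        await asyncio.sleep(0)  # placeholder for the API round-trip
        tracker.add(cost)

    # 100 concurrent "requests" at $0.01 each against one tracker
    await asyncio.gather(*(call(0.01) for _ in range(100)))
    return tracker.spent


spent = asyncio.run(main())
print(f"{spent:.2f}")  # 1.00
```

Within a single event loop, coroutines only interleave at await points, so the lock mainly matters when cost callbacks can fire from worker threads.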
Next Steps
Ready to protect your AI budget?
Two lines of code. Per-workflow budgets. Automatic model downgrade. Hard kill switch.