Async Guide

Use TokenFence with async OpenAI and Anthropic clients in production pipelines.

Why Async?

Production AI agent pipelines run async. Whether you're using asyncio for concurrent API calls, FastAPI for serving, or running multi-step agent loops, you need budget protection that doesn't block your event loop.

async_guard() is the async counterpart to guard() — same API, same protections, fully async.

Install

# with OpenAI support
pip install "tokenfence[openai]"
# or with Anthropic
pip install tokenfence anthropic

Async OpenAI

import asyncio
import openai
from tokenfence import async_guard

async def main():
    client = async_guard(
        openai.AsyncOpenAI(),
        budget="$2.00",
        fallback="gpt-4o-mini",
        on_limit="stop",
    )

    # Concurrent requests — all tracked against the same budget
    tasks = [
        client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": f"Question {i}"}],
        )
        for i in range(10)
    ]
    responses = await asyncio.gather(*tasks)

    print(f"Spent: ${client.tokenfence.spent:.4f}")
    print(f"Remaining: ${client.tokenfence.remaining:.4f}")

asyncio.run(main())

Async Anthropic

import asyncio
import anthropic
from tokenfence import async_guard

async def agent_loop():
    client = async_guard(
        anthropic.AsyncAnthropic(),
        budget="$1.00",
        fallback="claude-3-haiku-20240307",
        on_limit="raise",
        threshold=0.7,
    )

    messages = []
    for step in ["Write a function", "Add tests", "Add docs"]:
        messages.append({"role": "user", "content": step})
        try:
            response = await client.messages.create(
                model="claude-sonnet-4-20250514",
                max_tokens=1024,
                messages=messages,
            )
            messages.append({"role": "assistant", "content": response.content[0].text})
        except Exception as e:
            print(f"Budget hit: {e}")
            break

asyncio.run(agent_loop())

FastAPI Integration

Perfect for API backends where each request gets its own budget:

from fastapi import FastAPI
import openai
from tokenfence import async_guard

app = FastAPI()

@app.post("/chat")
async def chat(prompt: str):  # bare str param: FastAPI reads `prompt` from the query string
    # Each request gets a fresh $0.10 budget
    client = async_guard(
        openai.AsyncOpenAI(),
        budget="$0.10",
        fallback="gpt-4o-mini",
        on_limit="stop",
    )
    
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    
    return {
        "response": response.choices[0].message.content,
        "cost": client.tokenfence.spent,
    }

API Reference

async_guard(client, *, budget, fallback=None, on_limit="stop", threshold=0.8)

Identical to guard() but for async clients. Accepts openai.AsyncOpenAI and anthropic.AsyncAnthropic.

Thread Safety

The underlying CostTracker is thread-safe. Concurrent requests, whether launched from multiple coroutines via asyncio.gather() or from multiple threads, accumulate costs against the same guarded client correctly, with no race conditions.
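
Conceptually, race-free accumulation under asyncio.gather() looks like the toy tracker below. This is a simplified sketch, not TokenFence's CostTracker implementation: the lock makes each read-modify-write of the running total atomic, which is what keeps concurrent completions from losing updates.

```python
import asyncio

class ToyCostTracker:
    """Toy illustration of race-free cost accumulation (not TokenFence's code)."""

    def __init__(self) -> None:
        self.spent = 0.0
        self._lock = asyncio.Lock()

    async def add(self, cost: float) -> None:
        async with self._lock:  # make the read-modify-write atomic
            self.spent += cost

async def main() -> float:
    tracker = ToyCostTracker()
    # 100 concurrent "requests", each costing $0.01, tracked against one total
    await asyncio.gather(*(tracker.add(0.01) for _ in range(100)))
    return tracker.spent

print(f"Spent: ${asyncio.run(main()):.2f}")  # Spent: $1.00
```

A lock matters most when the tracker is also shared across threads; within a single event loop it additionally guards any update that spans an await.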

Next Steps

Ready to protect your AI budget?

Two lines of code. Per-workflow budgets. Automatic model downgrade. Hard kill switch.