Async Guide

Use TokenFence with async OpenAI and Anthropic clients in production pipelines.

Why Async?

Production AI agent pipelines run async. Whether you're using asyncio for concurrent API calls, FastAPI for serving, or running multi-step agent loops, you need budget protection that doesn't block your event loop.

async_guard() is the async counterpart to guard() — same API, same protections, fully async.

Install

# with OpenAI support
pip install "tokenfence[openai]"
# or with Anthropic
pip install tokenfence anthropic

Async OpenAI

import asyncio
import openai
from tokenfence import async_guard

async def main():
    client = async_guard(
        openai.AsyncOpenAI(),
        budget="$2.00",
        fallback="gpt-4o-mini",
        on_limit="stop",
    )

    # Concurrent requests — all tracked against the same budget
    tasks = [
        client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": f"Question {i}"}],
        )
        for i in range(10)
    ]
    responses = await asyncio.gather(*tasks)

    print(f"Spent: ${client.tokenfence.spent:.4f}")
    print(f"Remaining: ${client.tokenfence.remaining:.4f}")

asyncio.run(main())

Async Anthropic

import asyncio
import anthropic
from tokenfence import async_guard

async def agent_loop():
    client = async_guard(
        anthropic.AsyncAnthropic(),
        budget="$1.00",
        fallback="claude-3-haiku-20240307",
        on_limit="raise",
        threshold=0.7,
    )

    messages = []
    for step in ["Write a function", "Add tests", "Add docs"]:
        messages.append({"role": "user", "content": step})
        try:
            response = await client.messages.create(
                model="claude-sonnet-4-20250514",
                max_tokens=1024,
                messages=messages,
            )
            messages.append({"role": "assistant", "content": response.content[0].text})
        except Exception as e:
            print(f"Budget hit: {e}")
            break

asyncio.run(agent_loop())

FastAPI Integration

Perfect for API backends where each request gets its own budget:

from fastapi import FastAPI
import openai
from tokenfence import async_guard

app = FastAPI()

@app.post("/chat")
async def chat(prompt: str):  # bare str param: FastAPI reads `prompt` from the query string
    # Each request gets a fresh $0.10 budget
    client = async_guard(
        openai.AsyncOpenAI(),
        budget="$0.10",
        fallback="gpt-4o-mini",
        on_limit="stop",
    )
    
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    
    return {
        "response": response.choices[0].message.content,
        "cost": client.tokenfence.spent,
    }

API Reference

async_guard(client, *, budget, fallback=None, on_limit="stop", threshold=0.8)

Identical to guard() but for async clients. Accepts openai.AsyncOpenAI and anthropic.AsyncAnthropic.

Thread Safety

The underlying CostTracker is thread-safe. Concurrent requests, whether launched from multiple coroutines via asyncio.gather() or from multiple threads, accumulate costs against the same guarded client correctly, with no race conditions.
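
Conceptually, race-free accumulation under asyncio.gather() looks like the toy tracker below. This is a simplified sketch, not TokenFence's CostTracker implementation: the lock makes each read-modify-write of the running total atomic, which is what keeps concurrent completions from losing updates.

```python
import asyncio

class ToyCostTracker:
    """Toy illustration of race-free cost accumulation (not TokenFence's code)."""

    def __init__(self) -> None:
        self.spent = 0.0
        self._lock = asyncio.Lock()

    async def add(self, cost: float) -> None:
        async with self._lock:  # make the read-modify-write atomic
            self.spent += cost

async def main() -> float:
    tracker = ToyCostTracker()
    # 100 concurrent "requests", each costing $0.01, tracked against one total
    await asyncio.gather(*(tracker.add(0.01) for _ in range(100)))
    return tracker.spent

print(f"Spent: ${asyncio.run(main()):.2f}")  # Spent: $1.00
```

A lock matters most when the tracker is also shared across threads; within a single event loop it additionally guards any update that spans an await.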

Next Steps

Ready to protect your AI budget?

Two lines of code. Per-workflow budgets. Automatic model downgrade. Hard kill switch.