# How to Add Budget Limits to LangChain, CrewAI, and AutoGen Agents
LangChain, CrewAI, and AutoGen make it easy to build multi-agent systems. But none of them ship with per-workflow budget controls. Here's how to fix that.
## The Multi-Agent Cost Problem
Multi-agent frameworks are exploding in popularity. LangChain has 100K+ GitHub stars. CrewAI crossed 50K. AutoGen is Microsoft's flagship agent framework. They all share one critical gap:
No built-in way to set a dollar budget on a workflow.
When you run a CrewAI crew with 4 agents, each agent makes independent LLM calls. A "research agent" might call GPT-4o 30 times. A "writer agent" might call Claude 3.7 Sonnet 15 times. The orchestrator has no idea what the total cost is until after the fact.
The result? Teams discover their agent workflows cost 5-10x what they budgeted. A simple "write a blog post" crew can burn $3-8 per run. A complex research workflow can hit $50+.
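To make the burn rate concrete, here's a back-of-the-envelope estimate for a crew like the one above. The per-token prices, model names, and token counts below are illustrative assumptions, not current vendor pricing:

```python
# Rough cost estimate for one crew run. All prices and token counts
# are illustrative assumptions, not actual vendor pricing.
PRICE_PER_1K = {  # USD per 1K tokens as (input, output) -- assumed
    "gpt-4o": (0.0025, 0.010),
    "claude-3-7-sonnet": (0.003, 0.015),
}

def call_cost(model, input_tokens, output_tokens):
    """Dollar cost of a single LLM call under the assumed price table."""
    p_in, p_out = PRICE_PER_1K[model]
    return input_tokens / 1000 * p_in + output_tokens / 1000 * p_out

# Research agent: 30 GPT-4o calls, ~20K input / 1K output tokens each
research = 30 * call_cost("gpt-4o", 20000, 1000)
# Writer agent: 15 Claude calls, ~20K input / 2K output tokens each
writing = 15 * call_cost("claude-3-7-sonnet", 20000, 2000)

print(f"research: ${research:.2f}, writing: ${writing:.2f}, "
      f"total: ${research + writing:.2f}")
```

Two agents and 45 calls already land in the $3+ range, and none of that is visible until the invoice arrives.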
## What the Frameworks Offer (and Don't)
| Framework | Token Counting | Dollar Budgets | Auto-Downgrade | Kill Switch |
|---|---|---|---|---|
| LangChain | ✓ Callbacks | ✗ | ✗ | ✗ |
| CrewAI | ⚠ Basic logging | ✗ | ✗ | ✗ |
| AutoGen | ⚠ Per-message | ✗ | ✗ | ✗ |
| TokenFence | ✓ Automatic | ✓ Per-workflow | ✓ Automatic | ✓ Configurable |
LangChain has the most mature token tracking via callbacks, but converting tokens to dollars and enforcing budgets is left as an exercise for the reader. CrewAI and AutoGen have even less.
## Adding Budget Caps to LangChain

LangChain talks to the OpenAI and Anthropic SDK clients under the hood. TokenFence wraps those clients at the SDK level, so it works transparently:

```python
from langchain_openai import ChatOpenAI
from tokenfence import guard
import openai

# Create a guarded OpenAI client
guarded_client = guard(
    openai.OpenAI(),
    budget="$2.00",          # Max $2 for this workflow
    fallback="gpt-4o-mini",  # Downgrade when 80% spent
    on_limit="stop",         # Hard stop at budget
)

# Use with LangChain
llm = ChatOpenAI(
    model="gpt-4o",
    openai_api_key=guarded_client.api_key,
)

# All LangChain calls now go through TokenFence
chain = prompt | llm | parser
result = chain.invoke({"input": "Write a market analysis"})
```
Every LLM call LangChain makes now passes through TokenFence's budget tracking. When spend hits 80%, requests automatically downgrade to gpt-4o-mini. At $2.00, all calls stop cleanly.
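The downgrade-then-stop behavior boils down to a small amount of bookkeeping. Here's a minimal, framework-agnostic sketch of the idea; the `BudgetTracker` class is a hypothetical illustration, not TokenFence's actual implementation:

```python
class BudgetExceeded(Exception):
    """Raised when the workflow has spent its entire budget."""

class BudgetTracker:
    """Hypothetical sketch of per-workflow budget enforcement:
    downgrade the model at a spend threshold, hard-stop at the cap."""

    def __init__(self, budget, preferred, fallback, downgrade_at=0.8):
        self.budget = budget
        self.spent = 0.0
        self.preferred = preferred
        self.fallback = fallback
        self.downgrade_at = downgrade_at

    def pick_model(self):
        """Choose the model for the next call based on spend so far."""
        if self.spent >= self.budget:
            raise BudgetExceeded(f"spent ${self.spent:.2f} of ${self.budget:.2f}")
        if self.spent >= self.downgrade_at * self.budget:
            return self.fallback
        return self.preferred

    def record(self, cost):
        """Record the actual cost of a completed call."""
        self.spent += cost

tracker = BudgetTracker(budget=2.00, preferred="gpt-4o", fallback="gpt-4o-mini")
tracker.record(1.50)
print(tracker.pick_model())  # 75% spent -> still "gpt-4o"
tracker.record(0.20)
print(tracker.pick_model())  # 85% spent -> "gpt-4o-mini"
```

The real value of doing this at the SDK layer is that the framework above it never needs to know the check exists.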
## Adding Budget Caps to CrewAI

CrewAI's architecture makes cost control tricky — each agent runs independently. TokenFence handles this by tracking spend across all agents through a shared budget:

```python
from crewai import Agent, Task, Crew
from tokenfence import guard
import openai

# Shared guarded client for the entire crew
client = guard(
    openai.OpenAI(),
    budget="$5.00",
    fallback="gpt-4o-mini",
    on_limit="stop",
)

# All agents share the same budget
researcher = Agent(
    role="Senior Research Analyst",
    goal="Research the topic thoroughly",
    backstory="An experienced market analyst",
    llm="gpt-4o",
)
writer = Agent(
    role="Content Writer",
    goal="Turn the research into polished copy",
    backstory="A senior technical writer",
    llm="gpt-4o",
)

crew = Crew(agents=[researcher, writer], tasks=[...])
result = crew.kickoff()
```

No matter how many calls the researcher and writer make between them, the crew's combined spend cannot pass $5.00.
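The shared-budget pattern is worth understanding on its own: when several agents spend concurrently, the affordability check and the spend update must be atomic, or two agents can both "afford" the last dollar. A minimal sketch (the `SharedBudget` class is hypothetical, not TokenFence's API):

```python
import threading

class SharedBudget:
    """Hypothetical sketch of one budget shared by several agents.
    reserve() is atomic, so two agents can't both claim the last dollar."""

    def __init__(self, budget):
        self.budget = budget
        self.spent = 0.0
        self._lock = threading.Lock()

    def reserve(self, estimated_cost):
        """Atomically claim headroom for a call; False means over budget."""
        with self._lock:
            if self.spent + estimated_cost > self.budget:
                return False
            self.spent += estimated_cost
            return True

budget = SharedBudget(5.00)

assert budget.reserve(2.00)      # researcher call: ok, $2.00 committed
assert budget.reserve(2.50)      # writer call: ok, $4.50 committed
assert not budget.reserve(1.00)  # would exceed the $5.00 cap -> refused
```

Per-agent budgets can't give this guarantee; an idle agent's headroom is wasted while a busy one starves.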
## Adding Budget Caps to AutoGen

AutoGen agents can run in conversation loops that are particularly expensive. Budget caps are essential:

```python
from autogen import AssistantAgent, UserProxyAgent
from tokenfence import guard
import openai

# Guard the client with aggressive limits
client = guard(
    openai.OpenAI(),
    budget="$3.00",
    fallback="gpt-4o-mini",
    on_limit="stop",
)

assistant = AssistantAgent("assistant", llm_config={...})
user_proxy = UserProxyAgent("user_proxy")

# Conversation is now budget-capped
user_proxy.initiate_chat(assistant, message="Analyze Q1 sales data")
```
## Async Agents? Covered Too.

If you're running async agent pipelines (FastAPI, async CrewAI, etc.), TokenFence 0.2.0 ships with native async support:

```python
from tokenfence import async_guard
import openai

client = async_guard(
    openai.AsyncOpenAI(),
    budget="$1.00",
    fallback="gpt-4o-mini",
)

# Works with any async framework
response = await client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this document"}],
)
```
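The same accounting works under concurrency as long as spend updates are serialized. Here's a self-contained sketch using an `asyncio.Lock`; the `AsyncBudget` class is hypothetical, not part of TokenFence:

```python
import asyncio

class AsyncBudget:
    """Hypothetical sketch of budget accounting for concurrent async calls.
    An asyncio.Lock keeps spend updates consistent across tasks."""

    def __init__(self, budget):
        self.budget = budget
        self.spent = 0.0
        self._lock = asyncio.Lock()

    async def charge(self, cost):
        """Atomically charge one call; False means the cap is reached."""
        async with self._lock:
            if self.spent + cost > self.budget:
                return False
            self.spent += cost
            return True

budget = AsyncBudget(1.00)

async def main():
    # Ten concurrent "calls" at $0.15 each; only six fit under $1.00
    results = await asyncio.gather(*[budget.charge(0.15) for _ in range(10)])
    return sum(results)

allowed = asyncio.run(main())
print(allowed, "calls allowed, spent", round(budget.spent, 2))
```

Without the lock, interleaved tasks could read the same `spent` value and collectively overshoot the cap.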
## The Cost Math
Here's what typical multi-agent workflows cost without budget controls:
| Workflow | Agents | Avg LLM Calls | Uncontrolled Cost | With TokenFence ($2 cap) |
|---|---|---|---|---|
| Blog post generation | 3 | 45 | $3.20 - $8.50 | $2.00 max |
| Code review | 2 | 20 | $1.50 - $4.00 | $2.00 max |
| Market research | 4 | 80+ | $8.00 - $25.00 | $2.00 max |
| Customer support | 2 | 10 | $0.50 - $2.00 | $2.00 max |
The key insight: with budget caps, you can predict and control costs instead of hoping for the best.
## Best Practices for Multi-Agent Budgets
- Set budgets per workflow, not globally. A research task should have a different budget than a simple classification.
- Use auto-downgrade aggressively. Start with GPT-4o for quality, fall back to mini for cost. Most agent tasks don't need the best model for every step.
- Set the kill switch to "stop", not "raise". Graceful degradation beats crashing in production.
- Monitor actual spend vs. budget. If workflows consistently hit their cap, either the budget is too low or the workflow needs optimization.
- Share budgets across related agents. A CrewAI crew should share one budget, not have separate budgets per agent.
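For the "monitor actual spend vs. budget" practice, even a tiny report is enough to decide whether a cap is too tight or a workflow needs optimization. A hypothetical helper sketch:

```python
def budget_report(runs, budget):
    """Summarize actual spend vs. budget over recent runs
    (hypothetical helper, not part of TokenFence)."""
    hit_cap = sum(1 for spent in runs if spent >= budget)
    avg = sum(runs) / len(runs)
    return {
        "avg_spend": round(avg, 2),
        "avg_utilization": round(avg / budget, 2),
        "capped_runs": hit_cap,
        "capped_pct": round(100 * hit_cap / len(runs)),
    }

# Five recent runs of a $2.00-capped workflow
report = budget_report([2.00, 1.80, 2.00, 2.00, 1.80], budget=2.00)
print(report)
# 3 of 5 runs hit the cap: raise the budget or optimize the workflow
```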
## Get Started in 60 Seconds

```bash
pip install tokenfence
```

```python
from tokenfence import guard
import openai

client = guard(openai.OpenAI(), budget="$2.00", fallback="gpt-4o-mini", on_limit="stop")
# Drop into any framework. Done.
```
Read the full documentation, check out async support, or browse examples on GitHub.
Ready to protect your AI budget?
Two lines of code. Per-workflow budgets. Automatic model downgrade. Hard kill switch.