Introducing AgentGuard: Least-Privilege Policies for AI Agents — Because Prompts Are Not Permissions
You wouldn't give a new hire admin access to your production database on day one. So why are we giving AI agents unrestricted access to every tool in the system and crossing our fingers that a prompt instruction will keep them in line?
After the Meta AI agent SEV1 incident and Grigorev's production database wipe, the industry has a painfully obvious lesson staring it in the face: prompts are suggestions. Policies are enforcement.
Today, we're shipping the AgentGuard Policy Engine — a new module in TokenFence that brings least-privilege enforcement to AI agents. Define what your agents can do, deny what they can't, require human approval for dangerous operations, and audit every decision.
The Problem: Agents With Root Access
Most AI agent frameworks today give agents access to tools through function calling. The agent gets a list of available tools, picks which ones to call, and the framework executes them. The security model? A system prompt that says "be careful."
Here's what that looks like in practice:
# The "security model" most AI agents use today
system_prompt = """
You are a helpful database assistant.
You can read data and generate reports.
IMPORTANT: Never delete, drop, or modify any data.
"""
# But the agent has access to ALL tools...
tools = [
    read_database,
    write_database,
    delete_record,   # "But I told it not to!"
    drop_table,      # "The prompt said don't!"
    truncate_logs,   # "It shouldn't call this..."
]
This is the equivalent of giving someone the keys to your house and a sticky note that says "please don't go in the bedroom." It might work most of the time. But when it fails, it fails catastrophically.
What Went Wrong: The March 2026 Incidents
Two high-profile incidents in March 2026 proved this isn't theoretical:
- Meta's AI Agent SEV1 — An autonomous AI agent in Meta's internal systems triggered a cascade of unintended actions that required an emergency response. The agent was operating within its prompt instructions but exceeded its intended scope of operations.
- Grigorev Database Wipe — A developer's AI coding agent, given broad tool access for a refactoring task, executed destructive database commands on a production system. The agent was "told" not to touch production, but prompt-level instructions have zero enforcement power.
In both cases, the pattern is identical: the agent had the capability to do dangerous things, and a prompt-level instruction was the only "guardrail."
The Solution: Runtime Policy Enforcement
AgentGuard brings the principle of least privilege to AI agents. Instead of trusting prompt instructions, you declare a policy — a machine-enforced contract that defines what an agent can and cannot do.
from tokenfence import Policy
# Define what this agent is allowed to do
policy = Policy(
    allow=["read_database", "generate_report", "list_tables"],
    deny=["delete_*", "drop_*", "truncate_*", "alter_*"],
    require_approval=["write_database", "create_table"],
    name="database-readonly-agent",
)
# Check before executing any tool call
result = policy.check("read_database")
assert result.allowed # ✅ Explicitly permitted
result = policy.check("drop_table")
assert result.denied # 🚫 Blocked — no prompt can override this
result = policy.check("write_database")
assert result.needs_approval # ⏸️ Requires human confirmation
This isn't a suggestion. It's enforcement. The agent simply cannot call drop_table: the policy engine blocks the call before the tool is ever invoked.
How It Works: Three Layers of Protection
1. Allow/Deny Lists with Wildcards
Policies use pattern matching (fnmatch-style) to define broad or specific rules:
policy = Policy(
    allow=["read_*", "list_*", "search_*"],  # Read operations
    deny=["delete_*", "drop_*", "rm_*"],     # Destructive operations
    default="deny",  # Deny everything not explicitly allowed
)
The default="deny" setting means any tool not matching an allow pattern is automatically blocked. This is the secure default — if you forget to add a new tool to the allow list, it's denied rather than permitted.
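To make the matching semantics concrete, here's a minimal sketch of deny-wins, deny-by-default pattern matching using the standard library's fnmatch. This is an illustrative stand-in, not TokenFence's actual internals; the decide function is an assumed name for this sketch only.

```python
# Illustrative sketch (not TokenFence internals): fnmatch-style
# allow/deny matching with a deny-by-default fallback.
from fnmatch import fnmatch

def decide(tool, allow, deny, default="deny"):
    """Deny rules win, then allow rules; otherwise fall back to default."""
    if any(fnmatch(tool, pattern) for pattern in deny):
        return "deny"
    if any(fnmatch(tool, pattern) for pattern in allow):
        return "allow"
    return default

print(decide("read_users", ["read_*"], ["delete_*"]))    # allow
print(decide("delete_users", ["read_*"], ["delete_*"]))  # deny
print(decide("export_users", ["read_*"], ["delete_*"]))  # deny (default)
```

Note that deny patterns are checked first, which is why deny=["drop_*"] still blocks drop_table even when allow=["*"].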
2. Approval Gates for Sensitive Operations
Some operations aren't dangerous by themselves but need human oversight. Approval gates pause execution and wait for confirmation:
policy = Policy(
    allow=["*"],
    deny=["drop_*", "truncate_*"],
    require_approval=["send_email", "publish_post", "transfer_funds"],
    on_approval=lambda result: ask_human(f"Allow {result.tool}?"),
)
When the agent tries to call send_email, the policy engine invokes your approval callback. You can implement this as a Slack message, a webhook, a CLI prompt — whatever fits your workflow.
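Here's one way such a gate could work, sketched with self-contained stand-ins (run_tool, REQUIRE_APPROVAL, and the approvals dict are illustrative names for this sketch, not TokenFence APIs):

```python
# Illustrative approval gate: tools on the approval list are routed
# through a callback before they run.
REQUIRE_APPROVAL = {"send_email", "publish_post", "transfer_funds"}

def run_tool(tool, execute, approve):
    """Ask the approver before running any gated tool."""
    if tool in REQUIRE_APPROVAL and not approve(tool):
        return f"{tool}: blocked (approval denied)"
    return execute(tool)

# A stub approver standing in for a Slack message or CLI prompt;
# here a human has pre-approved send_email only.
approvals = {"send_email": True}
result = run_tool("transfer_funds", lambda t: f"{t}: done",
                  lambda t: approvals.get(t, False))
print(result)  # transfer_funds: blocked (approval denied)
```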
3. Full Audit Trail
Every policy decision is logged with timestamps, the tool name, the decision, the matched rule, and optional context:
policy = Policy(allow=["read_*"], deny=["delete_*"], audit=True)
policy.check("read_file", context={"agent_id": "data-bot", "session": "abc123"})
policy.check("delete_file", context={"agent_id": "data-bot", "session": "abc123"})
# Inspect the audit trail
for entry in policy.audit_log:
    print(f"{entry.tool} → {entry.decision} ({entry.reason})")
# read_file → allow (allowed by pattern: read_*)
# delete_file → deny (denied by pattern: delete_*)
This gives you a complete, queryable record of every tool call your agents attempted and whether it was permitted. Essential for compliance auditing, incident investigation, and debugging agent behavior.
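For instance, pulling every denied call out of a session's log is a one-liner. The AuditEntry dataclass below is an illustrative stand-in for the real entry type; only the tool/decision/reason fields, which mirror the output above, are taken from the post.

```python
# Illustrative query over audit entries shaped like the post's example.
from dataclasses import dataclass

@dataclass
class AuditEntry:
    tool: str
    decision: str
    reason: str

log = [
    AuditEntry("read_file", "allow", "allowed by pattern: read_*"),
    AuditEntry("delete_file", "deny", "denied by pattern: delete_*"),
]

# During an incident investigation: what did the agent try and fail to do?
denied = [e.tool for e in log if e.decision == "deny"]
print(denied)  # ['delete_file']
```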
Real-World Policy Templates
Database Agent (Post-Grigorev)
db_policy = Policy(
    allow=["read_database", "generate_report", "list_tables", "describe_schema"],
    deny=["delete_*", "drop_*", "truncate_*", "alter_*", "grant_*", "revoke_*"],
    require_approval=["write_database", "create_table", "create_index"],
    name="database-safety",
)
Email Agent
email_policy = Policy(
    allow=["read_email", "search_email", "draft_email", "list_folders"],
    deny=["delete_email_*", "forward_to_external", "create_rule"],
    require_approval=["send_email"],
    name="email-agent",
)
Code Review Agent
review_policy = Policy(
    allow=["read_file", "list_files", "git_diff", "git_log", "run_tests"],
    deny=["git_push", "git_force_*", "rm_*", "deploy_*"],
    require_approval=["git_commit", "create_pr"],
    name="code-reviewer",
)
Policy-as-Code: Version Control Your Security
Policies serialize to plain dictionaries, so you can store them as JSON or YAML in version control:
# Save policy to config
import json
config = policy.to_dict()
with open("policies/database-agent.json", "w") as f:
    json.dump(config, f, indent=2)
# Load from config
with open("policies/database-agent.json") as f:
    config = json.load(f)
policy = Policy.from_dict(config)
This means your agent policies are:
- Reviewable — PRs that change agent permissions are visible to the whole team
- Testable — Write unit tests for your policies (we do — 31 tests for the policy engine alone)
- Auditable — Git history shows who changed what permission and when
- Rollbackable — Bad policy change? Revert the commit
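A minimal CI check for the "testable" point might assert that a policy config survives a JSON round-trip. The dict below mirrors the Policy fields shown in this post; a real test would go through Policy.to_dict() and Policy.from_dict().

```python
# Illustrative round-trip check: the dict shape mirrors the post's
# Policy fields; it stands in for the output of Policy.to_dict().
import json

config = {
    "name": "database-safety",
    "allow": ["read_database", "generate_report"],
    "deny": ["delete_*", "drop_*"],
    "require_approval": ["write_database"],
}

# Serialize to disk format and back; nothing should be lost.
restored = json.loads(json.dumps(config, indent=2))
assert restored == config
print("round-trip ok")
```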
The enforce() Shortcut
If you want hard stops (exceptions) instead of checking return values, use enforce():
from tokenfence import Policy, ToolDenied
policy = Policy(allow=["read_*"])
try:
    policy.enforce("delete_users")
except ToolDenied as e:
    print(f"Blocked: {e.tool} — {e.reason}")
# Blocked: delete_users — not in allow list (default: deny)
This integrates naturally into agent middleware — wrap your tool execution in a policy.enforce() call, and denied tools raise exceptions that your framework can catch and handle gracefully.
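Here's what that middleware pattern looks like with self-contained stand-ins (the ToolDenied, enforce, and call_tool definitions below are illustrative re-implementations so the snippet runs without TokenFence installed; in real code you'd import them from tokenfence as shown above):

```python
# Illustrative middleware sketch with stand-in types.
from fnmatch import fnmatch

class ToolDenied(Exception):
    def __init__(self, tool, reason):
        super().__init__(f"{tool}: {reason}")
        self.tool, self.reason = tool, reason

ALLOW = ["read_*"]

def enforce(tool):
    """Raise instead of returning a result for denied tools."""
    if not any(fnmatch(tool, p) for p in ALLOW):
        raise ToolDenied(tool, "not in allow list (default: deny)")

def call_tool(tool, fn, *args):
    """Middleware: every tool call passes through enforce() first."""
    enforce(tool)
    return fn(*args)

try:
    call_tool("delete_users", print, "boom")
except ToolDenied as e:
    print(f"Blocked: {e.tool} ({e.reason})")
```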
Combined with Budget Caps: Defense in Depth
AgentGuard policies work alongside TokenFence's existing budget caps. Together, they form a defense-in-depth strategy:
from tokenfence import guard, Policy
# Layer 1: Budget cap — financial circuit breaker
client = guard(openai.OpenAI(), budget=5.00, fallback="gpt-4o-mini")
# Layer 2: Policy — operational circuit breaker
policy = Policy(
    allow=["read_database", "generate_report"],
    deny=["delete_*", "drop_*"],
    require_approval=["write_database"],
)
# Before every tool call:
result = policy.enforce(tool_name) # Blocks unauthorized tools
# Then use the guarded client for LLM calls:
response = client.chat.completions.create(...) # Caps spending
Two lines for budget control. A few more for tool control. No infrastructure. No config files. Just Python.
What's Next
The Policy engine ships today in TokenFence v0.3.0 (pip install tokenfence==0.3.0). Here's what's coming next:
- Node.js/TypeScript SDK parity — Same policy engine for the npm package
- Dashboard MVP — Visual audit log viewer and policy editor
- Webhook alerts — Get notified when agents hit denied tools
- RBAC support — Different policies for different agent roles
- Framework integrations — LangChain, CrewAI, AutoGen middleware
Get Started
pip install tokenfence
from tokenfence import Policy
policy = Policy(
    allow=["read_*", "list_*"],
    deny=["delete_*", "drop_*"],
)
# That's it. Your agent now has guardrails that actually enforce.
result = policy.check("read_database") # ✅ allowed
result = policy.check("drop_table") # 🚫 denied
96 tests passing. Zero dependencies. MIT licensed. Built for developers who learned from March 2026 that prompts are suggestions — but policies are law.
TokenFence is the cost circuit breaker and runtime guardrail for AI agents. Budget caps + least-privilege policies. Because "don't delete anything" in a system prompt is not a security strategy.
Ready to protect your AI budget?
Two lines of code. Per-workflow budgets. Automatic model downgrade. Hard kill switch.