
Introducing AgentGuard: Least-Privilege Policies for AI Agents — Because Prompts Are Not Permissions


You wouldn't give a new hire admin access to your production database on day one. So why are we giving AI agents unrestricted access to every tool in the system and crossing our fingers that a prompt instruction will keep them in line?

After the Meta AI agent SEV1 incident and Grigorev's production database wipe, the industry has a painfully obvious lesson staring it in the face: prompts are suggestions. Policies are enforcement.

Today, we're shipping the AgentGuard Policy Engine — a new module in TokenFence that brings least-privilege enforcement to AI agents. Define what your agents can do, deny what they can't, require human approval for dangerous operations, and audit every decision.

The Problem: Agents With Root Access

Most AI agent frameworks today give agents access to tools through function calling. The agent gets a list of available tools, picks which ones to call, and the framework executes them. The security model? A system prompt that says "be careful."

Here's what that looks like in practice:

# The "security model" most AI agents use today
system_prompt = """
You are a helpful database assistant.
You can read data and generate reports.
IMPORTANT: Never delete, drop, or modify any data.
"""

# But the agent has access to ALL tools...
tools = [
    read_database,
    write_database,
    delete_record,    # "But I told it not to!"
    drop_table,       # "The prompt said don't!"
    truncate_logs,    # "It shouldn't call this..."
]

This is the equivalent of giving someone the keys to your house and a sticky note that says "please don't go in the bedroom." It might work most of the time. But when it fails, it fails catastrophically.

What Went Wrong: The March 2026 Incidents

Two high-profile incidents in March 2026 proved this isn't theoretical:

  • Meta's AI Agent SEV1 — An autonomous AI agent in Meta's internal systems triggered a cascade of unintended actions that required an emergency response. The agent was operating within its prompt instructions but exceeded its intended scope of operations.
  • Grigorev Database Wipe — A developer's AI coding agent, given broad tool access for a refactoring task, executed destructive database commands on a production system. The agent was "told" not to touch production, but prompt-level instructions have zero enforcement power.

In both cases, the pattern is identical: the agent had the capability to do dangerous things, and a prompt-level instruction was the only "guardrail."

The Solution: Runtime Policy Enforcement

AgentGuard brings the principle of least privilege to AI agents. Instead of trusting prompt instructions, you declare a policy — a machine-enforced contract that defines what an agent can and cannot do.

from tokenfence import Policy

# Define what this agent is allowed to do
policy = Policy(
    allow=["read_database", "generate_report", "list_tables"],
    deny=["delete_*", "drop_*", "truncate_*", "alter_*"],
    require_approval=["write_database", "create_table"],
    name="database-readonly-agent",
)

# Check before executing any tool call
result = policy.check("read_database")
assert result.allowed  # ✅ Explicitly permitted

result = policy.check("drop_table")
assert result.denied   # 🚫 Blocked — no prompt can override this

result = policy.check("write_database")
assert result.needs_approval  # ⏸️ Requires human confirmation

This isn't a suggestion. It's enforcement. The agent simply cannot call drop_table: the policy engine blocks the call before the tool is ever invoked.

How It Works: Three Layers of Protection

1. Allow/Deny Lists with Wildcards

Policies use pattern matching (fnmatch-style) to define broad or specific rules:

policy = Policy(
    allow=["read_*", "list_*", "search_*"],  # Read operations
    deny=["delete_*", "drop_*", "rm_*"],      # Destructive operations
    default="deny",  # Deny everything not explicitly allowed
)

The default="deny" setting means any tool not matching an allow pattern is automatically blocked. This is the secure default — if you forget to add a new tool to the allow list, it's denied rather than permitted.
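To make the decision flow concrete, here is a minimal standalone sketch of how fnmatch-style matching can resolve allow/deny/default. This is illustrative logic using Python's stdlib fnmatch, not TokenFence's internal implementation, and the deny-beats-allow precedence is an assumption:

```python
from fnmatch import fnmatch

def decide(tool: str, allow: list, deny: list, default: str = "deny") -> str:
    """Illustrative decision logic: deny patterns win, then allow
    patterns, then the default. Not TokenFence's actual code."""
    if any(fnmatch(tool, pat) for pat in deny):
        return "deny"
    if any(fnmatch(tool, pat) for pat in allow):
        return "allow"
    return default

print(decide("read_users", ["read_*", "list_*"], ["delete_*"]))   # allow
print(decide("delete_users", ["read_*"], ["delete_*"]))           # deny
print(decide("send_email", ["read_*"], ["delete_*"]))             # deny (default)
```

The key property is the last line: a tool that matches nothing falls through to the default, which is why default="deny" is the secure choice.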

2. Approval Gates for Sensitive Operations

Some operations aren't dangerous by themselves but need human oversight. Approval gates pause execution and wait for confirmation:

policy = Policy(
    allow=["*"],
    deny=["drop_*", "truncate_*"],
    require_approval=["send_email", "publish_post", "transfer_funds"],
    on_approval=lambda result: ask_human(f"Allow {result.tool}?"),
)

When the agent tries to call send_email, the policy engine invokes your approval callback. You can implement this as a Slack message, a webhook, a CLI prompt — whatever fits your workflow.
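One possible shape for that callback is a small pending-decision store that a human resolves out of band, via Slack, a web UI, or a CLI. Everything below, including the ApprovalGate name and its methods, is a hypothetical sketch rather than TokenFence API:

```python
from dataclasses import dataclass, field

@dataclass
class ApprovalGate:
    """Hypothetical approval store: tool calls park here until a
    reviewer resolves them. In production, resolve() might be driven
    by a Slack action handler or an internal dashboard."""
    pending: dict = field(default_factory=dict)

    def request(self, call_id: str, tool: str) -> None:
        # Record the pending call; a real system would notify a human here.
        self.pending[call_id] = {"tool": tool, "approved": None}

    def resolve(self, call_id: str, approved: bool) -> None:
        self.pending[call_id]["approved"] = approved

    def is_approved(self, call_id: str) -> bool:
        # Unknown or still-pending calls count as not approved (fail closed).
        return self.pending.get(call_id, {}).get("approved") is True

gate = ApprovalGate()
gate.request("call-1", "send_email")
assert not gate.is_approved("call-1")   # still pending: fail closed
gate.resolve("call-1", approved=True)
assert gate.is_approved("call-1")
```

The design choice worth copying is the fail-closed default: an unresolved or missing approval is treated as a denial, never as a pass.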

3. Full Audit Trail

Every policy decision is logged with timestamps, the tool name, the decision, the matched rule, and optional context:

policy = Policy(allow=["read_*"], deny=["delete_*"], audit=True)

policy.check("read_file", context={"agent_id": "data-bot", "session": "abc123"})
policy.check("delete_file", context={"agent_id": "data-bot", "session": "abc123"})

# Inspect the audit trail
for entry in policy.audit_log:
    print(f"{entry.tool} → {entry.decision} ({entry.reason})")
    # read_file → allow (allowed by pattern: read_*)
    # delete_file → deny (denied by pattern: delete_*)

This gives you a complete, queryable record of every tool call your agents attempted and whether it was permitted. Essential for compliance auditing, incident investigation, and debugging agent behavior.

Real-World Policy Templates

Database Agent (Post-Grigorev)

db_policy = Policy(
    allow=["read_database", "generate_report", "list_tables", "describe_schema"],
    deny=["delete_*", "drop_*", "truncate_*", "alter_*", "grant_*", "revoke_*"],
    require_approval=["write_database", "create_table", "create_index"],
    name="database-safety",
)

Email Agent

email_policy = Policy(
    allow=["read_email", "search_email", "draft_email", "list_folders"],
    deny=["delete_email_*", "forward_to_external", "create_rule"],
    require_approval=["send_email"],
    name="email-agent",
)

Code Review Agent

review_policy = Policy(
    allow=["read_file", "list_files", "git_diff", "git_log", "run_tests"],
    deny=["git_push", "git_force_*", "rm_*", "deploy_*"],
    require_approval=["git_commit", "create_pr"],
    name="code-reviewer",
)

Policy-as-Code: Version Control Your Security

Policies serialize to plain dictionaries, so you can store them as JSON or YAML in version control:

# Save policy to config
import json

config = policy.to_dict()
with open("policies/database-agent.json", "w") as f:
    json.dump(config, f, indent=2)

# Load from config
with open("policies/database-agent.json") as f:
    config = json.load(f)
policy = Policy.from_dict(config)

This means your agent policies are:

  • Reviewable — PRs that change agent permissions are visible to the whole team
  • Testable — Write unit tests for your policies (we do — 31 tests for the policy engine alone)
  • Auditable — Git history shows who changed what permission and when
  • Reversible — Bad policy change? Revert the commit
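A policy unit test can assert the invariants your team cares about directly against the checked-in config. The inlined JSON below stands in for a file like policies/database-agent.json, and its keys are an assumption about the serialized shape, not a documented schema:

```python
import fnmatch
import json

# Stand-in for a policy file checked into version control.
config = json.loads("""{
  "allow": ["read_database", "generate_report", "list_tables"],
  "deny": ["delete_*", "drop_*", "truncate_*", "alter_*"]
}""")

def test_destructive_tools_are_denied():
    # Every known-dangerous tool must match at least one deny pattern.
    for tool in ["drop_table", "delete_record", "truncate_logs"]:
        assert any(fnmatch.fnmatch(tool, p) for p in config["deny"]), tool

def test_no_overlap_between_allow_and_deny():
    # An allowed tool should never also match a deny pattern.
    for tool in config["allow"]:
        assert not any(fnmatch.fnmatch(tool, p) for p in config["deny"]), tool

test_destructive_tools_are_denied()
test_no_overlap_between_allow_and_deny()
```

Tests like these turn a permissions change into something CI can reject before it ever reaches an agent.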

The enforce() Shortcut

If you want hard stops (exceptions) instead of checking return values, use enforce():

from tokenfence import Policy, ToolDenied

policy = Policy(allow=["read_*"])

try:
    policy.enforce("delete_users")
except ToolDenied as e:
    print(f"Blocked: {e.tool} — {e.reason}")
    # Blocked: delete_users — not in allow list (default: deny)

This integrates naturally into agent middleware: wrap tool execution in a policy.enforce() call, and denied tools raise exceptions your framework can catch and handle gracefully.
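As a sketch of that middleware pattern, here is a standalone wrapper around a tool dispatcher. The ToolDenied class and fnmatch-based checks below mimic, rather than import, the TokenFence engine, and the dispatch function is a toy stand-in:

```python
import fnmatch

class ToolDenied(Exception):
    """Stand-in for TokenFence's ToolDenied, for illustration only."""
    def __init__(self, tool, reason):
        super().__init__(f"{tool}: {reason}")
        self.tool, self.reason = tool, reason

def make_guarded_dispatch(dispatch, allow, deny):
    """Wrap a tool dispatcher so denied tools raise before execution."""
    def guarded(tool, *args, **kwargs):
        if any(fnmatch.fnmatch(tool, p) for p in deny):
            raise ToolDenied(tool, "matched deny pattern")
        if not any(fnmatch.fnmatch(tool, p) for p in allow):
            raise ToolDenied(tool, "not in allow list (default: deny)")
        return dispatch(tool, *args, **kwargs)
    return guarded

# Toy dispatcher standing in for a real tool-execution layer.
def dispatch(tool, *args, **kwargs):
    return f"executed {tool}"

safe = make_guarded_dispatch(dispatch, allow=["read_*"], deny=["delete_*"])
print(safe("read_file"))          # executed read_file
try:
    safe("delete_users")
except ToolDenied as e:
    print(f"Blocked: {e.tool}")   # Blocked: delete_users
```

The point is that the denied path raises before dispatch runs, so the destructive tool body is never reached.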

Combined with Budget Caps: Defense in Depth

AgentGuard policies work alongside TokenFence's existing budget caps. Together, they form a defense-in-depth strategy:

from tokenfence import guard, Policy

# Layer 1: Budget cap — financial circuit breaker
client = guard(openai.OpenAI(), budget=5.00, fallback="gpt-4o-mini")

# Layer 2: Policy — operational circuit breaker
policy = Policy(
    allow=["read_database", "generate_report"],
    deny=["delete_*", "drop_*"],
    require_approval=["write_database"],
)

# Before every tool call:
result = policy.enforce(tool_name)  # Blocks unauthorized tools
# Then use the guarded client for LLM calls:
response = client.chat.completions.create(...)  # Caps spending

Two lines for budget control. Three lines for tool control. No infrastructure. No config files. Just Python.

What's Next

The Policy engine ships today in TokenFence v0.3.0 (pip install tokenfence==0.3.0). Here's what's coming next:

  • Node.js/TypeScript SDK parity — Same policy engine for the npm package
  • Dashboard MVP — Visual audit log viewer and policy editor
  • Webhook alerts — Get notified when agents hit denied tools
  • RBAC support — Different policies for different agent roles
  • Framework integrations — LangChain, CrewAI, AutoGen middleware

Get Started

pip install tokenfence
from tokenfence import Policy

policy = Policy(
    allow=["read_*", "list_*"],
    deny=["delete_*", "drop_*"],
)

# That's it. Your agent now has guardrails that actually enforce.
result = policy.check("read_database")  # ✅ allowed
result = policy.check("drop_table")     # 🚫 denied

96 tests passing. Zero dependencies. MIT licensed. Built for developers who learned from March 2026 that prompts are suggestions — but policies are law.

TokenFence is the cost circuit breaker and runtime guardrail for AI agents. Budget caps + least-privilege policies. Because "don't delete anything" in a system prompt is not a security strategy.
