The Agent Firewall Checklist

What an agent firewall has to do.

"Agent firewall" is becoming a crowded label. Here is a vendor-neutral way to tell whether something earns it: six requirements, each with the reason behind it. They are the bar we hold Strathon to, and they are worth putting to any tool in this category, ours included.

It inspects the arguments before the tool runs

Decide on the actual call, meaning the tool name and its arguments, before the tool's body executes.

A firewall that only sees a request after the side effect has happened is a logger, not a firewall. The email is already sent; the row is already deleted. The decision has to land in the window between the model choosing a tool and that tool doing its work, with the real arguments in hand. Matching on the tool name alone is not enough: send_email to a teammate and send_email to an attacker's address are the same tool and opposite outcomes.
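A minimal Python sketch of where the check sits and what it sees. It assumes a hypothetical in-process decorator (`guarded`) and a toy policy (`allow`) that permits `send_email` only to one domain. Neither is Strathon's API, and the domain is a placeholder.

```python
import inspect
from functools import wraps

ALLOWED_DOMAIN = "example.com"  # hypothetical policy parameter


class ToolCallBlocked(Exception):
    """Raised when the policy denies a call before the tool body runs."""


def allow(tool_name: str, args: dict) -> bool:
    # Decide on the tool name AND its arguments, not the name alone.
    if tool_name == "send_email":
        return str(args.get("to", "")).endswith("@" + ALLOWED_DOMAIN)
    return False  # tools without a rule are denied


def guarded(func):
    sig = inspect.signature(func)

    @wraps(func)
    def wrapper(*args, **kwargs):
        # Bind positional and keyword arguments to parameter names so the
        # policy sees the call exactly as the tool would receive it.
        bound = sig.bind(*args, **kwargs)
        bound.apply_defaults()
        call_args = dict(bound.arguments)
        if not allow(func.__name__, call_args):
            raise ToolCallBlocked(f"{func.__name__} blocked: {call_args}")
        # Only reached on an "allow" verdict; the side effect happens here.
        return func(*args, **kwargs)

    return wrapper


sent = []


@guarded
def send_email(to: str, body: str) -> str:
    sent.append(to)  # stands in for actually sending mail
    return "sent"
```

`send_email(to="alice@example.com", body="hi")` returns `"sent"`, while `send_email(to="eve@attacker.net", body="hi")` raises `ToolCallBlocked` before anything is appended to `sent`. The tool is the same and the verdicts are opposite, because the rule reads the argument.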

It fails closed, not open

When the firewall cannot reach a verdict (the policy service is down, a hook can't run, a surface only supports observation), the unevaluated action is blocked, not waved through.

The dangerous failure is the silent one: a policy that matches but cannot be enforced, so the call proceeds as if no rule existed. Any honest tool has surfaces where it can enforce fully and surfaces where it can only observe. What matters is that it tells you which is which, and that in the can't-enforce case it denies rather than allows. A firewall that fails open under load is no firewall at all, exactly when you need it most.

This should not be confused with a narrower question: what happens to in-flight calls when the control plane itself is briefly unreachable. During a short outage, Strathon keeps enforcing against recently cached policy. For agents that should stop rather than coast on stale state, a configurable staleness window turns that into a halt once the cached policy grows too old.
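The two behaviors can be sketched together in Python. `FailClosedGate`, `fetch_policy`, and `max_staleness_s` are hypothetical names, not Strathon's configuration, and the sketch assumes a policy is a plain function that returns `True` to allow. Any failure to reach a verdict returns `"deny"`; during an outage the cached policy is used until it is older than the staleness window, and `None` means keep coasting on it.

```python
import time
from typing import Callable, Optional

Policy = Callable[[str, dict], bool]  # returns True to allow a call


class PolicyUnavailable(Exception):
    """No usable policy: the control plane is down and the cache is unusable."""


class FailClosedGate:
    def __init__(self, fetch_policy: Callable[[], Policy],
                 max_staleness_s: Optional[float] = None,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch_policy
        self._max_staleness = max_staleness_s  # None: keep using the last policy
        self._clock = clock
        self._cached: Optional[Policy] = None
        self._fetched_at = 0.0

    def _current_policy(self) -> Policy:
        # Fetched on every decision for simplicity; a real system would refresh
        # in the background.
        try:
            self._cached = self._fetch()
            self._fetched_at = self._clock()
        except Exception:
            # Control plane unreachable: fall back to the cached policy,
            # unless there is none or it is older than the staleness window.
            age = self._clock() - self._fetched_at
            if self._cached is None or (
                    self._max_staleness is not None and age > self._max_staleness):
                raise PolicyUnavailable("no usable policy")
        return self._cached

    def decide(self, tool_name: str, args: dict) -> str:
        try:
            allowed = self._current_policy()(tool_name, args)
        except Exception:
            # Fail closed: any error while reaching a verdict is a denial.
            return "deny"
        return "allow" if allowed is True else "deny"
```

With `max_staleness_s=30`, a call 10 seconds into an outage is still judged by the cached policy, a call at 45 seconds is denied, and a gate that has never fetched a policy denies everything.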

A human can stand in the loop

High-consequence actions can pause and wait for a person to approve or deny, with the call genuinely held until the decision arrives.

Not every risky action should be a hard block; some should be a question. The capability that separates a real control plane from a static rule set is the ability to suspend a specific tool call, route it to a human, and resume or cancel it based on the answer. The hard part is that the held call must actually block. A fake pause that lets the action through while pretending to wait is worse than no approval at all.
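A minimal Python sketch of a call that is genuinely held. It assumes the reviewer's answer arrives on another thread (from a UI, chat integration, or ticket hook, all hypothetical here); `ApprovalRequest` and `run_with_approval` are illustrative names, not Strathon's API. The tool body runs only after an explicit approval, and a denial or a timeout cancels it.

```python
import threading


class ApprovalRequest:
    """A tool call suspended until a human approves or denies it."""

    def __init__(self, tool_name: str, args: dict):
        self.tool_name = tool_name
        self.args = args
        self._decided = threading.Event()
        self._approved = False

    def resolve(self, approved: bool) -> None:
        # Called from the reviewer's side once a person has answered.
        self._approved = approved
        self._decided.set()

    def wait(self, timeout_s: float) -> bool:
        # Genuinely blocks the calling thread; a timeout counts as a denial.
        if not self._decided.wait(timeout_s):
            return False
        return self._approved


def run_with_approval(request: ApprovalRequest, tool, timeout_s: float = 300.0):
    if request.wait(timeout_s):
        return tool(**request.args)  # resumed: the body runs only now
    return None                      # cancelled: the body never runs
```

Treating the timeout as a denial applies the fail-closed rule to approvals: a reviewer who never answers cannot approve by accident.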

It produces evidence you can hand to an auditor

Every decision (allowed, blocked, redacted, approved) is recorded in a tamper-evident log you can export.

When something goes wrong, or a regulator asks, 'show me what the agent was allowed to do and what you stopped,' the answer has to be a record, not a recollection. That means an append-only trail with integrity protection (so an entry can't be quietly altered after the fact) and an export path in a form a compliance process can actually consume.
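One common way to get integrity protection is an HMAC chain, where each entry's MAC also covers the previous entry's MAC. The Python sketch below assumes JSON-serializable records and a single secret key; `ChainedAuditLog` is a hypothetical name, not Strathon's log format.

```python
import hashlib
import hmac
import json


class ChainedAuditLog:
    """Append-only log where each entry's MAC covers the previous entry's MAC."""

    def __init__(self, key: bytes):
        self._key = key
        self.entries = []  # export with json.dumps(self.entries)

    def _mac(self, prev_mac: str, record: dict) -> str:
        # Canonical JSON (sorted keys) so verification recomputes the same bytes.
        payload = prev_mac.encode() + b"|" + json.dumps(record, sort_keys=True).encode()
        return hmac.new(self._key, payload, hashlib.sha256).hexdigest()

    def append(self, record: dict) -> None:
        prev = self.entries[-1]["mac"] if self.entries else ""
        self.entries.append({"record": record, "mac": self._mac(prev, record)})

    def verify(self) -> bool:
        prev = ""
        for entry in self.entries:
            expected = self._mac(prev, entry["record"])
            if not hmac.compare_digest(expected, entry["mac"]):
                return False  # this entry, or one before it, was altered
            prev = entry["mac"]
        return True
```

Editing any earlier record makes `verify()` fail. Two limits of this sketch: whoever holds the key can recompute the whole chain, so the key belongs outside the agent's reach, and dropping entries from the end leaves a valid, shorter chain, so an exporter would typically also record the latest MAC somewhere independent.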

The boundary does not rely on the agent's good behavior

At least one enforcement point does not depend on the agent choosing to route through the firewall.

If the only enforcement point is inside the agent's own process, a compromised or buggy agent can step around it. A serious posture includes a layer that the deployment itself makes mandatory, one the agent cannot opt out of. For outbound traffic, that means the ability to make the network path itself mandatory, not merely a proxy variable the agent is asked to honor. Be skeptical of any tool whose entire defense lives in the same process it is trying to police.

It is honest about what it does not cover

The tool is explicit about the attacks it cannot stop at this boundary, instead of implying total coverage.

No single enforcement point catches everything. Data-flow exfiltration that hides sensitive data inside an otherwise-valid argument, reasoning-level manipulation, memory poisoning: several real agent risks are not solvable at the tool-call boundary by anyone. A coverage table with a checkmark in every box is a warning sign. The useful question is not 'does it claim everything' but 'does it tell you the truth about the gaps.'
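A small Python illustration of the first gap, with hypothetical names and a made-up secret, assuming the attacker can read some destination the policy permits. A boundary rule that checks the recipient domain and scans the body for a known secret label still approves a call whose body carries that secret base64-encoded. The call is well-formed and permitted; nothing at the boundary distinguishes it from a legitimate one.

```python
import base64

ALLOWED_DOMAIN = "example.com"  # hypothetical policy parameter


def allow_send_email(args: dict) -> bool:
    # A reasonable boundary rule: internal recipient, no obvious secret in the body.
    return (args["to"].endswith("@" + ALLOWED_DOMAIN)
            and "AWS_SECRET" not in args["body"])


secret = "AWS_SECRET=not-a-real-value"  # stand-in for data the agent read earlier
# Manipulated upstream, the agent smuggles the secret inside an ordinary field.
smuggled = base64.b64encode(secret.encode()).decode()
args = {"to": "shared-inbox@example.com", "body": "Weekly notes. Ref: " + smuggled}

print(allow_send_email(args))  # True: the rule cannot see the data flow
```

Tightening the scan to catch base64 only moves the problem to the next encoding; the reliable signal is where the data came from, which the tool-call boundary does not see.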

How Strathon measures up

We built Strathon against this list, so it is fair to ask how it scores. It inspects tool-call arguments in-process before the tool body runs, enforces the same CEL policy at three layers (in-process, the MCP gateway, and the egress proxy), holds calls for real human approval where a policy requires it, and writes every decision to an HMAC-chained audit log. Where a framework surface can only observe rather than enforce, it fails closed and says so.

And the gaps, plainly: the egress proxy ships in explicit-proxy mode today, meaning the agent has to be configured to send its traffic through it, so a truly un-bypassable boundary depends on you isolating the agent's network. We wrote a recipe for exactly that, and transparent interception is on the roadmap. Data-flow exfiltration and reasoning-level attacks are not boundary-solvable, and we don't claim otherwise. Our Scope & Limitations page is the long version of where the lines are.