๐Ÿ›ก๏ธ Interven

Fail-open vs Fail-closed

When Interven is unreachable, should your agent keep going or hard-stop? Strategy guide for choosing per integration.

If Interven is unreachable (network blip, gateway down, scan timeout), your agent faces a binary choice:

  • Fail-open: proceed with the action without policy enforcement
  • Fail-closed: block the action; treat unreachable as DENY

This page is the canonical guide on choosing between them. Defaults across SDKs and plugins are listed at the bottom.

The trade-off

|  | Fail-open | Fail-closed |
| --- | --- | --- |
| If Interven is down | Agent keeps working | Agent stops |
| Risk | A dangerous action could slip through | A safe action gets blocked unnecessarily |
| Right when | The agent has its own safety net, or downtime costs more than an occasional missed scan | The agent has tool access where one missed scan is unrecoverable (deletion, money movement) |

There's no universally right answer; it depends on the blast radius of the action you're protecting.

| Agent / context | Recommended | Why |
| --- | --- | --- |
| Internal dev / staging | Fail-open | Don't break developer flow |
| Production SaaS, customer-facing | Fail-closed | Protect customers; downtime is a known recoverable mode |
| Fintech / money movement | Fail-closed, always | One missed scan can move money irrecoverably |
| Healthcare / PHI | Fail-closed, always | One missed scan can leak PHI irrecoverably |
| SRE agents touching prod infra | Fail-closed | Destructive ops with no undo |
| Coding agents (Claude Code, Cursor) | Fail-open | Preserves UX during dev; the agent has its own undo (git) |
| Customer-support chat agents | Fail-open with alert | Don't break chat UX; raise a Sev-2 incident if Interven is down |
| Browser agents | Fail-closed for form submits, fail-open for reads | Differentiate by action class |

How to configure

SDKs (Python interven, JS @interven/sdk-js)

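Python:

from interven import Client  # import path assumed from the package name above

client = Client(
    api_key="...",
    fail_closed=True,   # default: False
    timeout=10.0,       # seconds; a timeout is handled per the fail_closed setting
)

JavaScript:

import { IntervenClient } from '@interven/sdk-js';  // import path assumed from the package name above

const client = new IntervenClient({
  apiKey: '...',
  failClosed: true,            // default: false
  timeoutMs: 10_000,           // milliseconds
});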

When fail_closed=True:

  • 5xx after retries → SDK raises GatewayError (don't catch silently)
  • Timeout → SDK raises TimeoutError
  • Network unreachable → SDK raises connection error

When fail_closed=False (default):

  • All of the above → SDK returns a synthesized ScanVerdict with decision="ALLOW", reason_codes=["GATEWAY_UNREACHABLE"]. Your code can still detect this case and choose to block locally.
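
The two modes above can be sketched as a small wrapper around a scan call. This is an illustration of the described behavior, not the SDK's implementation: GatewayError and ScanVerdict here are local stand-ins with only the fields named on this page, and the use of Python's built-in TimeoutError and ConnectionError is an assumption about what the SDK raises.

```python
from dataclasses import dataclass, field
from typing import Callable, List


class GatewayError(Exception):
    """Local stand-in for the SDK error raised on 5xx after retries."""


@dataclass
class ScanVerdict:
    """Local stand-in for the SDK verdict; only the two fields named above."""
    decision: str
    reason_codes: List[str] = field(default_factory=list)


def scan_with_failure_mode(
    do_scan: Callable[[], ScanVerdict], fail_closed: bool
) -> ScanVerdict:
    """Run do_scan and apply the failure mode if the gateway is unreachable."""
    try:
        return do_scan()
    except (GatewayError, TimeoutError, ConnectionError):
        if fail_closed:
            # Fail-closed: surface the error so the caller stops the action.
            raise
        # Fail-open: synthesize an ALLOW that stays detectable downstream.
        return ScanVerdict(decision="ALLOW", reason_codes=["GATEWAY_UNREACHABLE"])


def unreachable_gateway() -> ScanVerdict:
    """Hypothetical scan call that always times out."""
    raise TimeoutError("scan timed out")
```

With fail_closed=False the wrapper returns the synthesized verdict; with fail_closed=True the TimeoutError propagates to the caller, which is why that error should not be caught silently.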

Plugins / hooks

| Integration | Env var / setting | Default |
| --- | --- | --- |
| @interven/claude-code-hook | INTERVEN_FAIL_CLOSED=1 | fail-open (0) |
| @interven/copilot-hook | INTERVEN_FAIL_CLOSED=1 | fail-open (0) |
| @interven/mcp-guard | INTERVEN_FAIL_OPEN=1 | fail-closed (0) |
| @interven/gateway CLI | --fail-closed flag | fail-open |
| openclaw-interven-guard | failClosedOnTimeout: true in plugin config | fail-open |

Yes, the polarity differs across plugins, for historical reasons. Standardizing on INTERVEN_FAIL_CLOSED=1 is on the roadmap; for now, follow the table above.
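
Because the polarity differs, a deployment script may want a single place that answers "is this integration fail-closed right now?". The helper below is a hypothetical sketch covering only the three env-var-driven integrations from the table; it assumes that only the literal value "1" enables a flag, which the plugins themselves may parse differently.

```python
from typing import Dict, Mapping, Tuple

# (variable name, True if setting it to "1" means fail-closed), per the table above.
ENV_POLARITY: Dict[str, Tuple[str, bool]] = {
    "@interven/claude-code-hook": ("INTERVEN_FAIL_CLOSED", True),
    "@interven/copilot-hook": ("INTERVEN_FAIL_CLOSED", True),
    "@interven/mcp-guard": ("INTERVEN_FAIL_OPEN", False),
}


def is_fail_closed(integration: str, env: Mapping[str, str]) -> bool:
    """Return the effective failure mode for an integration given its environment."""
    var_name, one_means_closed = ENV_POLARITY[integration]
    flag_set = env.get(var_name, "0") == "1"  # assumption: only "1" counts as set
    # FAIL_CLOSED variables: set means closed. FAIL_OPEN variables: set means open.
    return flag_set if one_means_closed else not flag_set
```

Passing `os.environ` as `env` checks the current process; passing a plain dict makes the helper easy to use in CI checks.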

Inbound Routes (SaaS agent destination-side proxy)

Per-route setting at Console → Inbound Routes → Edit → Failure mode:

  • ALLOW (default): on Interven internal error, forward to upstream with NO scan. Logged as GATEWAY_INTERNAL_ERROR.
  • BLOCK: return HTTP 503 to the SaaS agent. Agent treats as transient + retries.

For regulated environments and any healthcare / fintech route, set BLOCK.
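
On the agent side, a BLOCK route surfaces as an HTTP 503 that should be retried as a transient failure. A minimal sketch of that retry loop follows; the `send` callable, the attempt count, and the exponential backoff schedule are all illustrative assumptions rather than documented Interven behavior.

```python
import time
from typing import Callable


def call_with_retry(
    send: Callable[[], int],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns a non-503 status or attempts run out.

    send is a hypothetical callable that performs the request and returns
    the HTTP status code. The last status seen is returned either way.
    """
    status = 503
    for attempt in range(max_attempts):
        status = send()
        if status != 503:
            return status
        if attempt < max_attempts - 1:
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
    return status
```

If every attempt returns 503, the caller still gets 503 back and should stop the action rather than proceed unscanned.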

Alerts on fail-* events

Whichever you choose, configure an alert for the failure mode itself:

Console → Alerts → Add channel → Events → "Gateway internal error"

Get a Sev-2 page when Interven is degraded. Even if your agents are fail-open, you want to know, and your SLO / on-call needs to as well.

For fail-closed setups, also alert on the agent side ("scan timed out from your-agent-name"). The SDK emits this message through the standard logging facility; route it to your alert pipeline.
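
One way to route that log message is a custom logging handler. The sketch below assumes the SDK logs under a logger named "interven" and that the message contains the text "scan timed out"; both are assumptions to verify against your SDK version, and `send_alert` is a hypothetical hook into your own pipeline.

```python
import logging
from typing import Callable, List


class ScanTimeoutAlertHandler(logging.Handler):
    """Forward scan-timeout log records to an alerting callback."""

    def __init__(self, send_alert: Callable[[str], None]) -> None:
        super().__init__(level=logging.WARNING)
        self._send_alert = send_alert

    def emit(self, record: logging.LogRecord) -> None:
        message = record.getMessage()
        if "scan timed out" in message:  # assumed message text
            self._send_alert(message)


# Collect alerts in a list here; in practice send_alert would page on-call.
alerts: List[str] = []
sdk_logger = logging.getLogger("interven")  # assumed logger name
sdk_logger.addHandler(ScanTimeoutAlertHandler(alerts.append))
```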

Detecting it from the verdict

When a fail-open client (fail_closed=False) returns the synthesized ALLOW, your code can still react:

verdict = client.scan(...)
if "GATEWAY_UNREACHABLE" in verdict.reason_codes:
    # Interven was down; the ALLOW is synthesized, not a real policy decision
    if action_is_irrecoverable(method, url, body):
        raise BlockedLocallyError("Refusing to proceed without policy enforcement")
    log.warning("Fail-open ALLOW; proceeding")
# Handle real non-ALLOW verdicts before this point
proceed_with_upstream(...)

This pattern (local default-deny on irrecoverable actions, default-allow elsewhere) gives you the best of both modes.

What we recommend

If you're starting fresh and undecided, default your agent's "dangerous tool" paths to fail-closed, and everything else to fail-open. The split is roughly:

  • DELETE, terminate, transfer, refund, share-external, force-push: fail-closed
  • READ, SELECT, list, get: fail-open

Tag actions in your code with their fail-mode and route the scan through one of two pre-configured Client instances.
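
A minimal sketch of that routing, assuming actions are tagged with a plain string name. StubClient is a local stand-in so the example runs without the SDK; in real code the two instances would be SDK Client objects built with fail_closed=True and fail_closed=False.

```python
from dataclasses import dataclass

# Action names from the fail-closed list above, lowercased for matching.
FAIL_CLOSED_ACTIONS = {
    "delete", "terminate", "transfer", "refund", "share-external", "force-push",
}


@dataclass(frozen=True)
class StubClient:
    """Stand-in for the SDK Client; only records its failure mode."""
    fail_closed: bool


strict_client = StubClient(fail_closed=True)    # for dangerous tool paths
lenient_client = StubClient(fail_closed=False)  # for everything else


def client_for(action: str) -> StubClient:
    """Pick the pre-configured client for an action tag."""
    if action.lower() in FAIL_CLOSED_ACTIONS:
        return strict_client
    return lenient_client
```

Unlisted actions fall through to the fail-open client, matching the "everything else" rule above; a stricter deployment could invert that default.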