๐Ÿ›ก๏ธ Interven
Reference

Risk Signals

The signals that feed the per-scan risk_score. What each engine looks at, what it contributes, and how to tune.

Every scan produces a risk_score in [0.0, 1.0] plus a risk_band (LOW / MED / HIGH / CRITICAL). The score is a weighted combination of signals from five engines, shifted by a sixth, the trust modifier.

This page documents each engine: what it inputs, what it outputs, its default weight, and how to tune it.

How the score is computed

risk_score = clamp(
  Σ (engine_weight[i] × engine_score[i]) + trust_modifier,
  0, 1
)

Where each engine_score is in [0, 1], engine_weight is the tenant-configured weight (sums to ~1.0), and trust_modifier shifts the result based on the agent's trust score (penalize low-trust agents, gently reward high-trust ones).
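A minimal TypeScript sketch of the formula above. The type names, the function names, and the equal weights are assumptions made for illustration; only the clamp-of-weighted-sum-plus-modifier shape comes from this page.

```typescript
// Hypothetical names; only the formula itself is taken from this page.
type EngineName = "classifier" | "baseline" | "correlation" | "threatIntel" | "semantic";
type EngineValues = Record<EngineName, number>;

// Clamp x into the closed interval [lo, hi].
function clamp(x: number, lo: number, hi: number): number {
  return Math.min(hi, Math.max(lo, x));
}

// risk_score = clamp(sum(weight[i] * score[i]) + trust_modifier, 0, 1)
function computeRiskScore(
  weights: EngineValues,
  scores: EngineValues,
  trustModifier: number,
): number {
  let weighted = 0;
  for (const name of Object.keys(weights) as EngineName[]) {
    weighted += weights[name] * scores[name];
  }
  return clamp(weighted + trustModifier, 0, 1);
}

// Illustrative equal weights (not the product defaults) and made-up scores.
const equalWeights: EngineValues = {
  classifier: 0.2, baseline: 0.2, correlation: 0.2, threatIntel: 0.2, semantic: 0.2,
};
const sampleScores: EngineValues = {
  classifier: 0.5, baseline: 0, correlation: 0, threatIntel: 0, semantic: 0,
};

console.log(computeRiskScore(equalWeights, sampleScores, 0.05).toFixed(2)); // "0.15"
```

Because the clamp is applied last, a large positive modifier on an already high weighted sum still yields exactly 1, and a discount on a near-zero sum yields exactly 0.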

risk_band mapping (default; tunable):

Band        Range
LOW         0.00 – 0.249
MED         0.25 – 0.499
HIGH        0.50 – 0.749
CRITICAL    0.75 – 1.00

(Authoritative source: riskBand() in packages/shared/src/helpers.ts.)
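For orientation, the default mapping could be sketched as follows. This is not the code of riskBand() in packages/shared/src/helpers.ts; the function name riskBandSketch is hypothetical, and treating each band as a half-open interval (so a score of 0.2495 still counts as LOW) is an assumption.

```typescript
type RiskBand = "LOW" | "MED" | "HIGH" | "CRITICAL";

// Default lower bounds taken from the band table; tenants can tune them.
const DEFAULT_BAND_FLOORS: ReadonlyArray<[number, RiskBand]> = [
  [0.75, "CRITICAL"],
  [0.5, "HIGH"],
  [0.25, "MED"],
];

// Illustrative only: walks the floors from highest to lowest and returns
// the first band whose floor the score reaches; anything lower is LOW.
function riskBandSketch(score: number): RiskBand {
  for (const [floor, band] of DEFAULT_BAND_FLOORS) {
    if (score >= floor) return band;
  }
  return "LOW";
}

console.log(riskBandSketch(0.6)); // "HIGH"
```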

The six engines

1. Classifier

What it inputs: the request body (JSON object, array, or string).
What it outputs: detected data classes (SECRETS, PII, PHI, INTERNAL).
Score: 0.0 if nothing is detected; up to 1.0 if multiple classes hit with high severity (e.g. a long private key and PHI in the same body).
Default weight: 0.30, the heaviest engine.

Tune at Settings → Risk Engines → Classifier:

  • Per-class weight (e.g. weight SECRETS more heavily than INTERNAL)
  • Enable / disable individual patterns within a class
  • Add custom secret patterns (regex, scoped to your tenant)

2. Baseline

What it inputs: this agent's recent traffic.
What it outputs: an anomaly score measuring how far this call departs from the agent's normal pattern.
Score: 0.0 if normal; up to ~0.8 for clear anomalies. Capped to leave room for other engines.
Default weight: 0.15.

Anomalies include:

  • New destination (URL host not seen before)
  • New operation (operation not seen for this tool)
  • Unusual time-of-day vs the agent's typical window
  • Unusual body size

Baseline takes ~50 successful scans to build a per-agent model. Before that, only the "new destination" component fires.

Tune at Settings → Risk Engines → Baseline:

  • Sensitivity (1–10; higher = more flagging)
  • Minimum-history threshold (default 50 scans)
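The minimum-history rule can be pictured as a simple gate. In the sketch below the component identifiers and the function name are hypothetical, and treating the threshold as inclusive (the full model is active at exactly 50 scans) is an assumption.

```typescript
// Hypothetical identifiers for the four anomaly components listed above.
type BaselineComponent = "new_destination" | "new_operation" | "time_of_day" | "body_size";

const ALL_BASELINE_COMPONENTS: BaselineComponent[] = [
  "new_destination", "new_operation", "time_of_day", "body_size",
];

// Before the per-agent model has enough history, only "new destination" fires.
function activeBaselineComponents(
  successfulScans: number,
  minHistory: number = 50, // default minimum-history threshold from this page
): BaselineComponent[] {
  if (successfulScans >= minHistory) return ALL_BASELINE_COMPONENTS.slice();
  return ["new_destination"];
}

console.log(activeBaselineComponents(12)); // only "new_destination"
```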

3. Correlation

What it inputs: recent scans across the tenant (not just this agent).
What it outputs: a pattern-match score for known dangerous sequences.
Score: 0.0 if no pattern matches; up to 1.0 for the most dangerous patterns.
Default weight: 0.15.

Detected patterns:

Pattern                  What it catches
Read-then-write exfil    Sensitive read → external destination write within 5 min
Privilege escalation     IAM-create followed by policy-attach within 2 min
Mass-action burst        ≥10 mutating calls to one tool in 1 min
Token harvesting         Multiple distinct credentials touched in one agent's session
Cross-agent collusion    Agent A reads → Agent B writes a correlated value

Tune at Settings → Risk Engines → Correlation: enable / disable per pattern.
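To make one of these patterns concrete, here is a rough sketch of how a mass-action burst check could work. The event shape, field names, and function name are assumptions; only the thresholds (10 mutating calls, one tool, 1 minute) come from the table, and counting calls exactly 60 seconds apart as inside the window is a further assumption.

```typescript
// Hypothetical event shape for this sketch.
interface ScanEvent {
  tool: string;
  mutating: boolean;
  timestampMs: number;
}

const BURST_THRESHOLD = 10;    // at least 10 mutating calls
const BURST_WINDOW_MS = 60000; // within 1 minute

// True if any single tool received BURST_THRESHOLD or more mutating calls
// inside some window of BURST_WINDOW_MS.
function hasMassActionBurst(events: ScanEvent[]): boolean {
  const timesByTool: Record<string, number[]> = {};
  for (const e of events) {
    if (!e.mutating) continue;
    if (!timesByTool[e.tool]) timesByTool[e.tool] = [];
    timesByTool[e.tool].push(e.timestampMs);
  }
  for (const tool of Object.keys(timesByTool)) {
    const times = timesByTool[tool].sort((a, b) => a - b);
    // Compare each call with the one BURST_THRESHOLD - 1 positions earlier.
    for (let i = BURST_THRESHOLD - 1; i < times.length; i++) {
      if (times[i] - times[i - BURST_THRESHOLD + 1] <= BURST_WINDOW_MS) return true;
    }
  }
  return false;
}
```

Sorting per tool and comparing entries a fixed distance apart keeps the check linear after the sort and avoids re-counting each window.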

4. Threat Intel

What it inputs: the destination URL host, IP, or any URL in the body.
What it outputs: a match score against six free feeds.
Score: 0.0 if no match; 1.0 on any feed hit.
Default weight: 0.20; a hit alone is enough to push past HIGH.

Feeds (default โ€” toggle per-tenant):

  • urlhaus: abuse.ch malware URL list
  • urlabuse: phishing URL list
  • openphish: community phishing list
  • phishtank: phishing URL DB
  • abuseipdb-free: IP reputation
  • alienvault-otx: OSINT threat exchange

Feeds refresh daily. Hits are written to the incident system; at default settings, a hit automatically opens a Threat-intel match incident.

Tune at Settings → Risk Engines → Threat Intel:

  • Enable / disable individual feeds
  • Add custom denylist URLs / IPs (your tenant's known-bad)
  • Add custom allowlist (suppress feed matches for known-clean URLs you control)
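A sketch of how the custom lists might interact with feed matches. The data shape and function name are hypothetical, and the precedence shown (custom denylist first, then allowlist suppressing feed hits) is an assumption; the page only states that the allowlist suppresses feed matches.

```typescript
// Hypothetical shape for a tenant's Threat Intel lists (lowercase host names).
interface ThreatIntelLists {
  feedHosts: string[];       // hosts loaded from the enabled feeds
  customDenylist: string[];  // tenant's known-bad hosts
  customAllowlist: string[]; // tenant's known-clean hosts
}

// Returns 1.0 on a hit and 0.0 otherwise, mirroring the engine's all-or-nothing score.
function threatIntelScore(host: string, lists: ThreatIntelLists): number {
  const h = host.toLowerCase();
  if (lists.customDenylist.indexOf(h) !== -1) return 1.0;  // assumed: denylist always wins
  if (lists.customAllowlist.indexOf(h) !== -1) return 0.0; // allowlist suppresses feed matches
  return lists.feedHosts.indexOf(h) !== -1 ? 1.0 : 0.0;
}

const exampleLists: ThreatIntelLists = {
  feedHosts: ["malware.example", "cdn.example"],
  customDenylist: ["blocked.example"],
  customAllowlist: ["cdn.example"],
};

console.log(threatIntelScore("cdn.example", exampleLists)); // 0 (feed hit suppressed)
```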

5. Semantic

What it inputs: the body text plus the optional context field on the scan request.
What it outputs: an intent classification: does this call attempt an action whose intent matches a flagged category?
Score: 0.0–1.0 from a small classifier model.
Default weight: 0.10.

Flagged categories:

Category                Trigger
EXFILTRATION            Body or context suggests pulling data to an external destination
PRIVILEGE_ESCALATION    Wording suggests granting elevated access
DESTRUCTIVE             Wording suggests irrecoverable deletion
PROMPT_INJECTION        Body contains common prompt-injection markers
LATERAL_MOVEMENT        Body references multiple internal systems in a way that suggests a pivot

Semantic is the least deterministic engine. It is most useful for catching new attacks that the rule-based engines miss; keep its weight low until you have tuned its thresholds against your real traffic.

Tune at Settings → Risk Engines → Semantic: per-category threshold, or disable entirely (drops weight to 0; other engines' weights re-normalize).

6. Trust modifier

What it inputs: the agent's current trust score (agent_trust_state.trust_score).
What it outputs: a shift to the combined score.
Range: −0.10 (high trust, modest discount) to +0.20 (low trust, significant penalty). Not capped at 0/1 at this stage; the final score is clamped afterwards.
Default weight: always applied; not a percentage.

Why it's not a normal engine: trust modifies how seriously we take the other signals, rather than being an independent signal itself.

See the Agents page for what raises and lowers trust.

Viewing per-scan breakdown

Every Activity trace's Engines tab shows:

Classifier        0.42  ✓ PII detected (email + SSN)
Baseline          0.18  ✓ Unusual time-of-day (agent normally idle 22:00–06:00)
Correlation       0.00
Threat Intel      0.00
Semantic          0.05  Mild EXFILTRATION lean
Trust modifier    0.08  Agent trust 0.55 (scrutiny)
────────────────────────────────────────────────────────
Final risk_score  0.73 → HIGH

Tuning weights

Default weights are designed for typical SaaS-agent traffic. Tune at Settings → Risk Engines → Weights:

  • The 5 engine weights must sum to 1.0 (UI enforces this on save)
  • Trust modifier is independent (not part of the sum)
  • Test on Activity replay before saving: Settings → Risk Engines → "Replay last 24h" shows what the new weights would have decided for recent traffic

Disabling an engine

Set its weight to 0. The other weights auto-re-normalize. The engine still runs (so you see its score in Activity for transparency) but its score doesn't affect the decision.

Common reason to disable: noisy semantic engine on a tenant with a lot of prompt-engineering traffic.
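One plausible reading of the re-normalization is a proportional rescale of the remaining weights, sketched below. The function name is hypothetical and the proportional rule is an assumption; this page does not specify how the freed weight is redistributed.

```typescript
// Sets one engine's weight to 0 and rescales the rest so they sum to 1.
// Assumption: the remaining engines keep their relative proportions.
function disableEngine(
  weights: Record<string, number>,
  engine: string,
): Record<string, number> {
  const remaining = Object.keys(weights)
    .filter((name) => name !== engine)
    .reduce((sum, name) => sum + weights[name], 0);
  if (remaining <= 0) {
    throw new Error("at least one other engine needs a positive weight");
  }
  const result: Record<string, number> = {};
  for (const name of Object.keys(weights)) {
    result[name] = name === engine ? 0 : weights[name] / remaining;
  }
  return result;
}

// Illustrative weights, not the product defaults.
const rescaled = disableEngine({ a: 0.5, b: 0.3, c: 0.2 }, "c");
console.log(rescaled.a.toFixed(3), rescaled.b.toFixed(3), rescaled.c); // 0.625 0.375 0
```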

Per-policy access

You can target risk-engine output in a policy rule:

{
  "condition": { "risk_score_gt": 0.6 },
  "action": "REQUIRE_APPROVAL"
}

Or target specific signals:

{
  "condition": { "threat_intel_match": true },
  "action": "DENY",
  "incident_severity": "CRITICAL"
}

See Policy DSL Reference for all risk-related condition keys.
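As a mental model, rule matching against a scan result might look like the sketch below. The TypeScript types and the function name are hypothetical; only the keys risk_score_gt and threat_intel_match appear on this page, and combining multiple keys with AND is an assumption.

```typescript
// Hypothetical types; only the two condition keys are taken from this page.
interface RiskCondition {
  risk_score_gt?: number;
  threat_intel_match?: boolean;
}

interface PolicyRule {
  condition: RiskCondition;
  action: string;
}

interface ScanResult {
  risk_score: number;
  threat_intel_match: boolean;
}

// A rule matches when every key present in its condition holds (assumed AND semantics).
function ruleMatches(rule: PolicyRule, scan: ScanResult): boolean {
  const c = rule.condition;
  if (c.risk_score_gt !== undefined && !(scan.risk_score > c.risk_score_gt)) return false;
  if (c.threat_intel_match !== undefined && scan.threat_intel_match !== c.threat_intel_match) {
    return false;
  }
  return true;
}

const approvalRule: PolicyRule = {
  condition: { risk_score_gt: 0.6 },
  action: "REQUIRE_APPROVAL",
};

console.log(ruleMatches(approvalRule, { risk_score: 0.73, threat_intel_match: false })); // true
```

Note that risk_score_gt is read here as a strict comparison, so a score of exactly 0.6 would not trigger the approval rule.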