Risk Signals
The signals that feed the per-scan risk_score: what each engine looks at, what it contributes, and how to tune it.
Every scan produces a risk_score in [0.0, 1.0] plus a risk_band (LOW / MED /
HIGH / CRITICAL). The score combines signals from six engines: five weighted engines plus a trust modifier.
This page documents each engine: what it inputs, what it outputs, its default weight, and how to tune it.
How the score is computed
```
risk_score = clamp(
  Σ (engine_weight[i] × engine_score[i]) + trust_modifier,
  0, 1
)
```
Where each engine_score is in [0, 1], engine_weight is the tenant-configured weight (sums to ~1.0), and trust_modifier shifts the result based on the agent's trust score (penalizing low-trust agents and gently rewarding high-trust ones).
risk_band mapping (default; tunable):
| Band | Range |
|---|---|
| LOW | 0.00 – 0.249 |
| MED | 0.25 – 0.499 |
| HIGH | 0.50 – 0.749 |
| CRITICAL | 0.75 – 1.00 |
(Authoritative source: riskBand() in packages/shared/src/helpers.ts.)
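The formula and band table above can be sketched in TypeScript as follows. This is an illustrative sketch, not the riskBand() in packages/shared/src/helpers.ts; the names computeRiskScore, EngineScores, ENGINES, and DEFAULT_WEIGHTS are hypothetical, and the band edges are read as half-open ranges to match the table's .249/.499/.749 upper bounds.

```typescript
// Illustrative sketch of the documented formula; not the shipped helpers.ts code.
type Engine = "classifier" | "baseline" | "correlation" | "threatIntel" | "semantic";
type EngineScores = Record<Engine, number>;
type RiskBand = "LOW" | "MED" | "HIGH" | "CRITICAL";

const ENGINES: Engine[] = ["classifier", "baseline", "correlation", "threatIntel", "semantic"];

// Default weights as listed on this page; tenants can change them.
const DEFAULT_WEIGHTS: EngineScores = {
  classifier: 0.3,
  baseline: 0.15,
  correlation: 0.15,
  threatIntel: 0.2,
  semantic: 0.1,
};

function clamp(x: number, lo: number, hi: number): number {
  return Math.min(hi, Math.max(lo, x));
}

// risk_score = clamp(Σ engine_weight[i] × engine_score[i] + trust_modifier, 0, 1)
function computeRiskScore(
  scores: EngineScores,
  trustModifier: number,
  weights: EngineScores = DEFAULT_WEIGHTS,
): number {
  const weighted = ENGINES.reduce((acc, e) => acc + weights[e] * scores[e], 0);
  return clamp(weighted + trustModifier, 0, 1);
}

// Default band edges: [0, 0.25) LOW, [0.25, 0.5) MED, [0.5, 0.75) HIGH, [0.75, 1] CRITICAL.
function riskBand(score: number): RiskBand {
  if (score >= 0.75) return "CRITICAL";
  if (score >= 0.5) return "HIGH";
  if (score >= 0.25) return "MED";
  return "LOW";
}
```

For example, a scan with a full Classifier hit and a Threat Intel hit, from an agent carrying the maximum +0.20 trust penalty, scores 0.30 + 0.20 + 0.20 = 0.70 under default weights, which falls in HIGH.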
The six engines
1. Classifier
What it inputs: the request body (JSON object, array, or string).
What it outputs: detected data classes (SECRETS, PII, PHI, INTERNAL).
Score: 0.0 if nothing is detected; up to 1.0 when multiple classes are detected at high severity (e.g. a long private key plus PHI in the same body).
Default weight: 0.30, the heaviest engine.
Tune at Settings → Risk Engines → Classifier:
- Per-class weight (should SECRETS weigh more than INTERNAL?)
- Enable / disable individual patterns within a class
- Add custom secret patterns (regex, scoped to your tenant)
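A minimal sketch of how pattern-based detection could be turned into a 0–1 Classifier score. The patterns, the per-class weights, and the capped-sum scoring rule are illustrative assumptions, not the engine's real detectors; severity is not modelled, and a tenant's custom secret pattern would simply be another entry in the hypothetical PATTERNS list.

```typescript
// Hypothetical sketch: regex detectors grouped by data class.
type DataClass = "SECRETS" | "PII" | "PHI" | "INTERNAL";

interface Pattern {
  dataClass: DataClass;
  name: string;
  regex: RegExp;
  enabled: boolean; // mirrors "Enable / disable individual patterns"
}

// Illustrative detectors only; custom tenant patterns would be appended here.
const PATTERNS: Pattern[] = [
  { dataClass: "SECRETS", name: "pem-private-key", regex: /-----BEGIN [A-Z ]*PRIVATE KEY-----/, enabled: true },
  { dataClass: "PII", name: "us-ssn", regex: /\b\d{3}-\d{2}-\d{4}\b/, enabled: true },
  { dataClass: "PII", name: "email", regex: /[\w.+-]+@[\w-]+\.[\w.-]+/, enabled: true },
  { dataClass: "INTERNAL", name: "internal-hostname", regex: /\b[\w-]+\.internal\b/, enabled: true },
];

// Assumed per-class weights (the "Per-class weight" setting).
const CLASS_WEIGHTS: Record<DataClass, number> = { SECRETS: 0.7, PHI: 0.6, PII: 0.4, INTERNAL: 0.2 };

function classify(body: unknown): { classes: DataClass[]; score: number } {
  // Bodies may be objects, arrays, or strings; scan non-strings as JSON text.
  const text = typeof body === "string" ? body : JSON.stringify(body);
  const hits = new Set<DataClass>();
  for (const p of PATTERNS) {
    if (p.enabled && p.regex.test(text)) hits.add(p.dataClass);
  }
  const classes = Array.from(hits);
  // Sum the weights of the distinct classes found, capped at 1.0.
  const score = Math.min(1, classes.reduce((acc, c) => acc + CLASS_WEIGHTS[c], 0));
  return { classes, score };
}
```

A capped sum is just one simple way to make "more classes, higher score" hold; the real engine's combination rule is not documented here.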
2. Baseline
What it inputs: this agent's recent traffic.
What it outputs: an anomaly score, i.e. how much this call differs from the agent's normal pattern.
Score: 0.0 if normal; up to ~0.8 for clear anomalies (capped to leave room for other engines).
Default weight: 0.15.
Anomalies include:
- New destination (URL host not seen before)
- New operation (operation not seen for this tool)
- Unusual time-of-day vs the agent's typical window
- Unusual body size
Baseline takes ~50 successful scans to build a per-agent model. Before that, only the "new destination" component fires.
Tune at Settings → Risk Engines → Baseline:
- Sensitivity (1–10; higher = more flagging)
- Minimum-history threshold (default 50 scans)
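The Baseline behaviour described above can be sketched as a few per-agent checks. This assumes a hypothetical in-memory history shape (AgentHistory), an equal 0.2 contribution per anomaly component, and that "unusual body size" means more than 3× the agent's mean; the Sensitivity setting and the real model are not represented.

```typescript
// Hypothetical sketch of Baseline scoring; shapes and thresholds are assumptions.
interface Call {
  host: string;
  tool: string;
  operation: string;
  hour: number; // 0-23, in the agent's reference timezone
  bodyBytes: number;
}

interface AgentHistory {
  successfulScans: number;
  hosts: Set<string>;
  operationsByTool: Map<string, Set<string>>;
  activeHours: Set<number>;
  meanBodyBytes: number;
}

const MIN_HISTORY = 50; // documented default minimum-history threshold
const MAX_SCORE = 0.8; // documented cap on Baseline output
const STEP = 0.2; // assumed contribution of each anomaly component
const SIZE_FACTOR = 3; // assumed: "unusual" means > 3x the agent's mean body size

function baselineScore(call: Call, h: AgentHistory): number {
  // New destination is the only component that fires before the model is built.
  let score = h.hosts.has(call.host) ? 0 : STEP;
  if (h.successfulScans < MIN_HISTORY) return score;

  const ops = h.operationsByTool.get(call.tool);
  if (!ops || !ops.has(call.operation)) score += STEP; // new operation for this tool
  if (!h.activeHours.has(call.hour)) score += STEP; // outside the agent's usual window
  if (h.meanBodyBytes > 0 && call.bodyBytes > SIZE_FACTOR * h.meanBodyBytes) score += STEP; // unusual size

  return Math.min(MAX_SCORE, score);
}
```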
3. Correlation
What it inputs: recent scans across the tenant (not just this agent).
What it outputs: a pattern-match score for known dangerous sequences.
Score: 0.0 if no pattern matches; up to 1.0 for the most dangerous patterns.
Default weight: 0.15.
Detected patterns:
| Pattern | What it catches |
|---|---|
| Read-then-write exfil | Sensitive read → external destination write within 5 min |
| Privilege escalation | IAM-create followed by policy-attach within 2 min |
| Mass-action burst | ≥10 mutating calls to one tool in 1 min |
| Token harvesting | Multiple distinct credentials touched in one agent's session |
| Cross-agent collusion | Agent A reads → Agent B writes a correlated value |
Tune at Settings → Risk Engines → Correlation: enable / disable per pattern.
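As an example of how one of these patterns can be evaluated, here is a sliding-window check for the Mass-action burst row (≥10 mutating calls to one tool within 1 minute). The ScanEvent shape and detectMassActionBurst name are hypothetical; the other patterns would follow the same windowed approach with different predicates and windows.

```typescript
// Hypothetical event shape; only the fields this check needs.
interface ScanEvent {
  tool: string;
  mutating: boolean;
  timestampMs: number;
}

const BURST_THRESHOLD = 10; // ≥10 mutating calls...
const BURST_WINDOW_MS = 60000; // ...to one tool within 1 minute

function detectMassActionBurst(events: ScanEvent[]): boolean {
  // Group mutating-call timestamps by tool.
  const byTool = new Map<string, number[]>();
  for (const e of events) {
    if (!e.mutating) continue;
    const ts = byTool.get(e.tool) || [];
    ts.push(e.timestampMs);
    byTool.set(e.tool, ts);
  }
  // Slide a window over each tool's sorted timestamps.
  for (const ts of Array.from(byTool.values())) {
    ts.sort((a, b) => a - b);
    let start = 0;
    for (let end = 0; end < ts.length; end++) {
      // Shrink the window until it spans less than one minute.
      while (ts[end] - ts[start] >= BURST_WINDOW_MS) start++;
      if (end - start + 1 >= BURST_THRESHOLD) return true;
    }
  }
  return false;
}
```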
4. Threat Intel
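What it inputs: the destination URL host, IP, or any URL in the body.
What it outputs: a match score against six free feeds.
Score: 0.0 if no match; 1.0 on any feed hit.
Default weight: 0.20. A hit contributes 0.20 to the score on its own, enough to move a scan from MED into HIGH; to act on every hit regardless of the total score, use the threat_intel_match policy condition.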
Feeds (defaults; toggle per tenant):
- urlhaus: abuse.ch malware URL list
- urlabuse: phishing URL list
- openphish: community phishing list
- phishtank: phishing URL DB
- abuseipdb-free: IP reputation
- alienvault-otx: OSINT threat exchange
Feeds refresh daily. Hits are written to the incident system (at default settings, a hit auto-opens a Threat-intel match incident).
Tune at Settings → Risk Engines → Threat Intel:
- Enable / disable individual feeds
- Add custom denylist URLs / IPs (your tenant's known-bad)
- Add custom allowlist (suppress feed matches for known-clean URLs you control)
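A minimal sketch of the Threat Intel lookup described above: collect the destination host plus the host of any URL in the body, then return 1.0 on a denylist hit or on a feed hit that is not allowlisted. It assumes the enabled feeds can be modelled as one set of hosts/IPs (real feeds also carry full URLs), and ThreatIntelConfig, hostsToCheck, and threatIntelScore are hypothetical names.

```typescript
// Hypothetical sketch; feeds are modelled as one set of known-bad hosts/IPs.
interface ThreatIntelConfig {
  feedHosts: Set<string>; // union of the enabled feeds
  denylist: Set<string>; // tenant's known-bad hosts/IPs
  allowlist: Set<string>; // tenant's known-clean hosts; suppresses feed matches only
}

const URL_IN_TEXT = /https?:\/\/[^\s"'<>]+/g;
const HOST_OF_URL = /^https?:\/\/([^\/:?#\s]+)/i;

// Collect the destination host plus the host of every URL found in the body.
function hostsToCheck(destinationUrl: string, bodyText: string): string[] {
  const candidates = [destinationUrl].concat(bodyText.match(URL_IN_TEXT) || []);
  const hosts = new Set<string>();
  for (const raw of candidates) {
    const m = HOST_OF_URL.exec(raw);
    if (m) hosts.add(m[1].toLowerCase());
  }
  return Array.from(hosts);
}

function threatIntelScore(destinationUrl: string, bodyText: string, cfg: ThreatIntelConfig): number {
  for (const host of hostsToCheck(destinationUrl, bodyText)) {
    if (cfg.denylist.has(host)) return 1.0; // tenant known-bad always counts
    if (cfg.feedHosts.has(host) && !cfg.allowlist.has(host)) return 1.0; // feed hit, not suppressed
  }
  return 0.0;
}
```

Checking the denylist before the allowlist reflects the page's wording that the allowlist suppresses feed matches; how the product resolves a host on both tenant lists is not specified here.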
5. Semantic
What it inputs: the body text plus the optional context field on the scan request.
What it outputs: an intent classification answering "is this call attempting an action whose intent matches a flagged category?"
Score: 0.0–1.0 from a small classifier model.
Default weight: 0.10.
Flagged categories:
| Category | Trigger |
|---|---|
| EXFILTRATION | Body or context suggests pulling data to an external destination |
| PRIVILEGE_ESCALATION | Wording suggests granting elevated access |
| DESTRUCTIVE | Wording suggests irrecoverable deletion |
| PROMPT_INJECTION | Body contains common prompt-injection markers |
| LATERAL_MOVEMENT | Body references multiple internal systems in a way that suggests pivot |
Semantic is the least-deterministic engine. It's most useful for catching new attacks the rule-based engines miss; weight it low until you've tuned thresholds against your real traffic.
Tune at Settings → Risk Engines → Semantic: set a per-category threshold, or disable the engine entirely (its weight drops to 0 and the other engines' weights re-normalize).
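Per-category thresholds can be pictured as a filter over the model's per-category outputs. This sketch assumes the model returns one probability per flagged category and that the engine score is the highest probability that clears its category's threshold (default 0.5 here); both are assumptions, since this page does not specify the engine's internals, and semanticScore is a hypothetical name.

```typescript
// The five flagged categories from the table above.
type Category =
  | "EXFILTRATION"
  | "PRIVILEGE_ESCALATION"
  | "DESTRUCTIVE"
  | "PROMPT_INJECTION"
  | "LATERAL_MOVEMENT";
type PerCategory = Partial<Record<Category, number>>;

// Hypothetical: `probs` are the intent model's per-category probabilities,
// `thresholds` are the tenant's per-category settings.
function semanticScore(probs: PerCategory, thresholds: PerCategory, defaultThreshold = 0.5): number {
  let score = 0;
  for (const cat of Object.keys(probs) as Category[]) {
    const p = probs[cat];
    if (p === undefined) continue;
    const t = thresholds[cat] ?? defaultThreshold;
    // Categories below their threshold are ignored; the strongest remaining one wins.
    if (p >= t) score = Math.max(score, p);
  }
  return score;
}
```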
6. Trust modifier
What it inputs: the agent's current trust score (agent_trust_state.trust_score).
What it outputs: a shift to the combined score.
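Range: −0.10 (high trust, modest discount) to +0.20 (low trust, significant penalty). Adding the modifier can push the sum below 0 or above 1; the final score is clamped to [0, 1] afterwards.
Default weight: none; the modifier is always applied as an additive shift, not as a weighted share.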
Why it's not a normal engine: trust modifies how seriously we take the other signals, rather than being an independent signal itself.
See the Agents page for what raises and lowers trust.
Viewing per-scan breakdown
Every Activity trace's Engines tab shows a breakdown like this:
```
Classifier        0.42   PII detected (email + SSN)
Baseline          0.18   Unusual time-of-day (agent normally idle 22:00–06:00)
Correlation       0.00
Threat Intel      0.00
Semantic          0.05   Mild EXFILTRATION lean
Trust modifier    0.08   Agent trust 0.55 (scrutiny)
────────────────────────────────────────────────────────
Final risk_score  0.73   HIGH
```
Tuning weights
Default weights are designed for typical SaaS-agent traffic. Tune at Settings โ Risk Engines โ Weights:
- The 5 engine weights must sum to 1.0 (UI enforces this on save)
- Trust modifier is independent (not part of the sum)
- Test with Activity replay before saving: Settings → Risk Engines → "Replay last 24h" shows what the new weights would have decided for recent traffic
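Replay amounts to re-scoring stored per-engine scores under candidate weights and comparing bands. A minimal sketch, assuming each stored scan keeps its raw engine scores and trust modifier (the StoredScan shape and replay function are hypothetical) and using the default band edges; the sum check mirrors the rule that the five weights must total 1.0.

```typescript
type Engine = "classifier" | "baseline" | "correlation" | "threatIntel" | "semantic";
type EngineScores = Record<Engine, number>;
type Weights = Record<Engine, number>;
const ENGINES: Engine[] = ["classifier", "baseline", "correlation", "threatIntel", "semantic"];

interface StoredScan {
  id: string;
  engineScores: EngineScores; // raw per-engine scores in [0, 1]
  trustModifier: number;
}

interface BandChange {
  id: string;
  before: string;
  after: string;
}

// Default band edges from the table at the top of this page.
function band(score: number): string {
  if (score >= 0.75) return "CRITICAL";
  if (score >= 0.5) return "HIGH";
  if (score >= 0.25) return "MED";
  return "LOW";
}

function scoreWith(w: Weights, s: StoredScan): number {
  const sum = ENGINES.reduce((acc, e) => acc + w[e] * s.engineScores[e], 0);
  return Math.min(1, Math.max(0, sum + s.trustModifier));
}

// Returns the scans whose band would change under the candidate weights.
function replay(scans: StoredScan[], current: Weights, candidate: Weights): BandChange[] {
  const total = ENGINES.reduce((acc, e) => acc + candidate[e], 0);
  if (Math.abs(total - 1) > 1e-6) throw new Error(`weights sum to ${total}, expected 1.0`);
  return scans
    .map((s) => ({ id: s.id, before: band(scoreWith(current, s)), after: band(scoreWith(candidate, s)) }))
    .filter((r) => r.before !== r.after);
}
```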
Disabling an engine
Set its weight to 0. The other weights automatically re-normalize. The engine still runs (so its score stays visible in Activity for transparency), but its score doesn't affect the decision.
A common reason to disable an engine: a noisy Semantic engine on a tenant with heavy prompt-engineering traffic.
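Re-normalization can be read as dividing each weight by the sum of all weights, so the enabled engines again total 1.0 while a disabled engine stays at 0. This is a minimal sketch under that assumption (renormalize is a hypothetical name); the product may apply re-normalization differently.

```typescript
// Hypothetical sketch: engine name -> weight.
type Weights = Record<string, number>;

// Divide each weight by the total so the enabled (non-zero) engines sum to 1.0.
// A disabled engine keeps weight 0: it can still be scored for display, but adds nothing.
function renormalize(weights: Weights): Weights {
  const names = Object.keys(weights);
  const total = names.reduce((acc, n) => acc + weights[n], 0);
  if (total <= 0) throw new Error("at least one engine needs a non-zero weight");
  const out: Weights = {};
  for (const n of names) out[n] = weights[n] / total;
  return out;
}
```

For example, setting Semantic to 0 on the defaults leaves 0.30 + 0.15 + 0.15 + 0.20 = 0.80, so Classifier becomes 0.30 / 0.80 = 0.375.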
Per-policy access
You can target risk-engine output in a policy rule:
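```json
{
  "condition": { "risk_score_gt": 0.6 },
  "action": "REQUIRE_APPROVAL"
}
```
Or target specific signals, for example denying any threat-intel match and opening a CRITICAL incident for it:
```json
[
  {
    "condition": { "threat_intel_match": true },
    "action": "DENY"
  },
  {
    "condition": { "threat_intel_match": true },
    "action": "OPEN_INCIDENT",
    "incident_severity": "CRITICAL"
  }
]
```
See the Policy DSL Reference for all risk-related condition keys.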
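To show how risk-related conditions relate to scan output, here is a toy evaluator for the two condition keys used above (risk_score_gt and threat_intel_match). The Rule and ScanResult shapes, and the choice to collect the action of every matching rule, are assumptions for illustration; the Policy DSL Reference is authoritative.

```typescript
// Hypothetical shapes for illustration only.
interface ScanResult {
  riskScore: number;
  threatIntelMatch: boolean;
}

interface Rule {
  condition: { risk_score_gt?: number; threat_intel_match?: boolean };
  action: string;
}

// A rule matches when every condition key it sets holds for the scan.
function matches(rule: Rule, scan: ScanResult): boolean {
  const c = rule.condition;
  if (c.risk_score_gt !== undefined && !(scan.riskScore > c.risk_score_gt)) return false;
  if (c.threat_intel_match !== undefined && scan.threatIntelMatch !== c.threat_intel_match) return false;
  return true;
}

// Collect the actions of every matching rule, in rule order.
function actionsFor(rules: Rule[], scan: ScanResult): string[] {
  return rules.filter((r) => matches(r, scan)).map((r) => r.action);
}
```

Note that risk_score_gt is a strict comparison in this sketch, so a score of exactly 0.6 does not trigger the first example rule.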