Individual events look benign. The sequence is what kills you. A probabilistic graph over runtime sessions — scored with Bayesian inference, updated in real time, flagged when the posterior crosses a threshold.
Wiz scans cloud topology. It finds "toxic combinations": an exposed VM + a critical CVE + an overprivileged IAM role + access to PII. The path through infrastructure is the risk, not any single misconfiguration.
Session Graph observes runtime sessions. It finds toxic chains: a postinstall hook + a credential read + an outbound curl + process delegation. The sequence through time is the risk, not any single event.
Each new event in the session updates the posterior probability of compromise. Events that are individually low-signal become high-signal in sequence. The posterior accumulates context. When it crosses the threshold, we flag.
Capture semantic actions from the runtime session: file reads, network calls, process spawns, package installs, secret accesses. Terminal-native, not process-tree snapshots.
Build a directed acyclic graph where each node is a semantic action and each edge carries a conditional probability — how much riskier is this event given what preceded it?
Apply Bayesian inference at each new event. Update the posterior. Compare against the learned baseline. Flag when the accumulated probability crosses the threat threshold.
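The per-event update in the last step can be sketched in log-odds form. A minimal sketch, assuming a rule-based likelihood model; `update_posterior` is a hypothetical helper and the numbers are illustrative, not calibrated values:

```python
import math

def update_posterior(prior: float, likelihood_ratios: list[float]) -> float:
    """Sequential Bayesian update in log-odds space. Each likelihood
    ratio is P(event | malicious) / P(event | benign) for one observed
    action, conditioned on what preceded it."""
    log_odds = math.log(prior / (1.0 - prior))
    for lr in likelihood_ratios:
        log_odds += math.log(lr)      # one O(1) update per event
    return 1.0 / (1.0 + math.exp(-log_odds))

# A 1% prior plus four moderately suspicious events crosses a 0.75 threshold:
posterior = update_posterior(0.01, [3.0, 5.0, 4.0, 6.0])
```

Working in log-odds keeps the update numerically stable and makes each event's contribution additive, which is what the explainability layer later wants to surface.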
The runtime is where the action is. VC money is pouring into adjacent spaces — but nobody owns the session chain. Here's who's nearby and where the gap is.
Package scanners stop at install. EDR sees process trees but not agent intent. Browser tools cover one surface. Runtime app security watches code paths inside the application. Cloud security graphs map infrastructure topology. Nobody stitches the developer/agent terminal session into a continuous chain and applies probabilistic inference to score it in real time. That's the opening — temporal graph + Bayesian scoring over semantic actions, sitting between package scanning and EDR.
An attacker compromised an npm publish token for Cline CLI, a popular AI coding assistant. They pushed version 2.3.0 with a modified postinstall script that silently installed OpenClaw, an autonomous AI agent, onto developer machines. The attack went undetected for 8 hours. Here's what Session Graph would have seen:
⚠ SESSION FLAGGED at Step 4 — posterior crossed 0.75 threshold
The critical chain: a postinstall hook from a package install globally installs an unknown package, which spawns an autonomous process. Each step's likelihood ratio is amplified by its predecessor. By step 4 the posterior has crossed the 0.75 threshold, so the session is flagged before any credential access or exfiltration occurs.
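The same chain can be walked numerically. All likelihood ratios below are invented for exposition; the point is the shape of the accumulation, not the specific values:

```python
# The Cline chain with illustrative likelihood ratios (invented for
# exposition, not calibrated values):
chain = [
    ("pkg_install: cline@2.3.0",          1.2),   # routine install, near-neutral
    ("code_exec: postinstall hook",       3.0),   # hooks fire in most npm attacks
    ("pkg_install: global, unknown pkg",  8.0),   # amplified by the preceding hook
    ("process_spawn: autonomous agent",  12.0),   # amplified again by the install
]

odds = 0.01 / 0.99        # prior: assume ~1% of sessions are compromised
for step, (event, lr) in enumerate(chain, start=1):
    odds *= lr
    posterior = odds / (1.0 + odds)
    print(f"step {step}: {event:<34} posterior={posterior:.2f}")

# Only step 4 pushes the posterior past the 0.75 threshold; steps 1-3
# stay well below it, which is why no single event alone would alert.
```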
Package scanner: Cline CLI was a known, trusted package. No CVE, no malware signature; the postinstall modification happened after the token was compromised. SCA tools saw nothing wrong.

EDR: node spawned npm spawned node. Normal developer activity. No individual process was flagged.

Session Graph: caught the chain at step 4 by accumulating evidence across the sequence. Postinstall hook → global install → autonomous agent spawn is a toxic chain regardless of whether each individual step is "known malicious."
What's a "node"? This defines everything downstream — the graph, the inference, the instrumentation. V1 proposal: 13 types (file_read, file_write, secret_access, process_spawn, pkg_install, code_exec, net_outbound, net_inbound, env_read, delegation, llm_call, llm_tool_use, unknown). Each needs a metadata schema and a normalization layer that maps from raw telemetry across different agents, shells, and runtimes into a shared vocabulary.
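The 13-type taxonomy and normalized-event schema could look like the sketch below. The field names on `SessionEvent` are assumptions for illustration, not a settled schema:

```python
from dataclasses import dataclass, field
from enum import Enum

class ActionType(Enum):
    FILE_READ = "file_read"
    FILE_WRITE = "file_write"
    SECRET_ACCESS = "secret_access"
    PROCESS_SPAWN = "process_spawn"
    PKG_INSTALL = "pkg_install"
    CODE_EXEC = "code_exec"
    NET_OUTBOUND = "net_outbound"
    NET_INBOUND = "net_inbound"
    ENV_READ = "env_read"
    DELEGATION = "delegation"
    LLM_CALL = "llm_call"
    LLM_TOOL_USE = "llm_tool_use"
    UNKNOWN = "unknown"

@dataclass
class SessionEvent:
    """One normalized semantic action. Raw telemetry from any shell,
    agent, or runtime is mapped into this shared vocabulary."""
    action: ActionType
    timestamp: float                  # epoch seconds
    session_id: str
    metadata: dict = field(default_factory=dict)  # e.g. path, domain, pkg name
```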
This is the engine — how P(event | benign) and P(event | malicious) are computed. Start with hand-crafted rules: base LR per action type × metadata modifier × sequential modifier (Markov context). Rules are debuggable, ship fast, SOC teams trust them, and they generate the labeled data you need to train a model later. Hybrid V2. Rules now.
Where do initial probabilities come from with zero labeled data? Ship a universal prior seeded from known kill chains (MITRE ATT&CK mapped to your taxonomy), let it be overridden by org-specific baselines as data accumulates. The universal prior is V1. Per-tenant adaptation is V2. Cold start is simpler once the likelihood model is rule-based.
Session events are deeply correlated — reading a config makes an API call more likely. Start with a Markov assumption: condition on the last 2–3 events, not full history. Cheap to compute, captures the most important sequential dependencies (credential read after postinstall hook). Most attack chains are 3–5 critical steps, so a small context window captures the signal.
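A small-window sequential modifier can be sketched as a longest-suffix lookup over the last 2–3 events. The context table entries are invented examples of risky n-grams:

```python
from collections import deque

K = 3  # condition on at most the last 3 events (current event included)

# Risky trigrams/bigrams seen in known kill chains (illustrative entries):
RISKY_CONTEXTS = {
    ("pkg_install", "code_exec", "secret_access"): 6.0,
    ("code_exec", "secret_access"): 3.0,
    ("secret_access", "net_outbound"): 5.0,
}

def sequential_modifier(history: deque, action: str) -> float:
    """Longest-suffix match: prefer the most specific context available."""
    ctx = list(history)[-(K - 1):] + [action]
    for start in range(len(ctx) - 1):      # try longer suffixes first
        mod = RISKY_CONTEXTS.get(tuple(ctx[start:]))
        if mod is not None:
            return mod
    return 1.0                              # no risky context matched
```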
The product differentiator. A SOC analyst who gets an alert with a score and no explanation will ignore it. Need per-event posterior deltas, a visual chain with critical edges highlighted, and a plain-English summary: "this credential read after this postinstall hook raised probability by 34%." The Wiz playbook: show the toxic path, not just the score. Build explainability into the architecture from day one.
Alert too much and you're dead on arrival. Too little and you're useless. Start with a sensible default (T = 0.75) plus a calibration step during onboarding: observe a week of sessions, compute the distribution, set threshold at the 99.5th percentile. Adaptive thresholds per team/role is V2.
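The calibration step can be sketched as a percentile cut over a week of per-session peak posteriors. The minimum-sample guard and its value are assumptions:

```python
def calibrate_threshold(session_scores: list[float], default: float = 0.75,
                        pct: float = 99.5, min_sessions: int = 200) -> float:
    """Set the alert threshold at the pct-th percentile of observed
    per-session peak posteriors; fall back to the default until there
    is enough calibration data."""
    if len(session_scores) < min_sessions:
        return default                      # not enough data: use T = 0.75
    scores = sorted(session_scores)
    idx = min(int(len(scores) * pct / 100), len(scores) - 1)
    return scores[idx]
```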
Must answer "what is a session" before shipping. V1: a session is a contiguous agent execution from invocation to exit. Idle >N minutes = new session. Agent spawns sub-agent = child session linked to parent. Don't try to solve the local/cloud handoff yet — document the limitation. Stitch across substrates in V2.
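The V1 boundary rule, minus the parent/child linking, can be sketched as idle-gap splitting over a sorted event stream. The 15-minute gap is an assumed value for N:

```python
IDLE_GAP = 15 * 60  # seconds of inactivity that starts a new session (N = 15 min)

def split_sessions(timestamps: list[float]) -> list[list[float]]:
    """Group a sorted stream of event timestamps into sessions:
    a gap longer than IDLE_GAP closes the current session."""
    sessions: list[list[float]] = []
    for t in timestamps:
        if sessions and t - sessions[-1][-1] <= IDLE_GAP:
            sessions[-1].append(t)          # continues the current session
        else:
            sessions.append([t])            # idle gap exceeded: new session
    return sessions
```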
Real sessions aren't linear — agents spawn subprocesses, tools fork, outputs merge. V1 can treat sessions as mostly linear chains. Most real attack chains are linear (install → hook → read → exfil). Branching and merging matters for sophisticated multi-agent orchestration — a later customer segment.
Bayesian update on a Markov chain is O(1) per event — fine for V1. Becomes real at thousands of concurrent sessions with full DAG inference. Start on-device (edge agent), add central aggregation later. The latency question — can you flag before exfiltration completes? — matters but is solvable with the Markov approach.
Adversarial evasion is important but premature. You need attackers to encounter and try to evade your system before you know what evasion looks like. Build logging and replay into V1 so you can study evasion attempts post-hoc. Decay functions to prevent "washing" the posterior and pacing anomaly detection come in V2 once you have real adversarial data.