Session-Level Security

The chain is the attack surface

Individual events look benign. The sequence is what kills you. A probabilistic graph over runtime sessions — scored with Bayesian inference, updated in real time, flagged when the posterior crosses a threshold.

Wiz

Spatial graph · Cloud infrastructure

Scans cloud topology. Finds "toxic combinations" — an exposed VM + a critical CVE + an overprivileged IAM role + access to PII. The path through infrastructure is the risk, not any single misconfiguration.

Session Graph

Temporal graph · Agent runtime

Observes runtime sessions. Finds toxic chains — a postinstall hook + a credential read + an outbound curl + process delegation. The sequence through time is the risk, not any single event.

P(threat | chain) ∝ P(chain | threat) × P(threat)

Each new event in the session updates the posterior probability of compromise. Events that are individually low-signal become high-signal in sequence. The posterior accumulates context. When it crosses the threshold, we flag.

session_monitor — agent:cursor-ai — pid:48291
P(threat | observed chain): 0.02 · threshold: 0.75

Instrument

Capture semantic actions from the runtime session: file reads, network calls, process spawns, package installs, secret accesses. Terminal-native, not process-tree snapshots.
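One possible shape for a captured semantic action — field names here are illustrative, not a fixed schema; the point is that every raw telemetry source normalizes into one record type:

```python
from dataclasses import dataclass, field

# Hypothetical normalized record for one semantic action. The parent_pid link
# is what lets the graph layer stitch events into chains.
@dataclass
class SemanticAction:
    session_id: str
    action_type: str          # e.g. "pkg_install", "secret_access", "net_outbound"
    timestamp: float          # epoch seconds
    actor_pid: int            # process that performed the action
    parent_pid: int           # process that spawned it
    metadata: dict = field(default_factory=dict)  # path, host, package name, ...

ev = SemanticAction("s1", "pkg_install", 1760000000.0, 48291, 48100,
                    {"package": "cline", "version": "2.3.0"})
```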

Graph

Build a directed acyclic graph where each node is a semantic action and each edge carries a conditional probability — how much riskier is this event given what preceded it?

Infer

Apply Bayesian inference at each new event. Update the posterior. Compare against the learned baseline. Flag when the accumulated probability crosses the threat threshold.
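The Graph and Infer steps can be sketched together: each edge (previous type, current type) carries a likelihood ratio, and the posterior accumulates in odds form until it crosses the threshold. The edge values below are hypothetical, not a shipped ruleset:

```python
# Illustrative edge likelihood ratios, keyed by (previous_event, current_event).
EDGE_LR = {
    (None, "pkg_install"): 1.2,
    ("pkg_install", "code_exec"): 6.0,
    ("code_exec", "pkg_install"): 8.0,     # install triggered from executed code
    ("pkg_install", "process_spawn"): 6.0,
    ("process_spawn", "secret_access"): 10.0,
    ("secret_access", "net_outbound"): 12.0,
}

def score_session(events, prior=0.01, threshold=0.75):
    """Walk the event sequence, updating posterior odds edge by edge."""
    odds = prior / (1 - prior)
    prev = None
    for ev in events:
        odds *= EDGE_LR.get((prev, ev), 1.0)   # unknown edges are neutral (LR = 1)
        posterior = odds / (1 + odds)
        if posterior >= threshold:
            return ("FLAGGED", ev, posterior)
        prev = ev
    return ("clean", None, odds / (1 + odds))

chain = ["pkg_install", "code_exec", "pkg_install",
         "process_spawn", "secret_access", "net_outbound"]
print(score_session(chain))   # flags at "process_spawn", posterior ≈ 0.78
```

Note that flagging happens mid-chain, before the later exfiltration events ever run.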

The runtime is where the action is. VC money is pouring into adjacent spaces — but nobody owns the session chain. Here's who's nearby and where the gap is.

$32B · Google's Wiz acquisition (2025) · pending close H1 2026
$40M · Onyx Security launch (March 2026) · AI agent security, Conviction + Cyberstarts
30% · Breaches involving supply chain · Verizon DBIR 2025, 2× YoY increase

Wiz

Spatial graph + runtime
Security Graph over cloud infrastructure. eBPF-based Runtime Sensor (now on Windows too, Feb 2026) detects threats in workloads. Maps toxic combinations across resources, identities, vulnerabilities. Being acquired by Google for $32B.
Gap: Graph is spatial (cloud posture), sensor is workload-level (containers/VMs). Doesn't observe developer terminal sessions, local agent behavior, or event chains on laptops. No Bayesian inference over temporal sequences. Our direct inspiration — but different axis.

Straiker

Agentic runtime
AI agent runtime security. Inspects every prompt, reasoning step, and tool call. Blocks prompt injection, data leaks, agent manipulation. Multiple 6- and 7-figure deals in financial services and healthcare (Feb 2026).
Gap: Watches the model↔tool interface (what the agent decides to do). Doesn't instrument the terminal/shell substrate — file reads, process spawns, network calls that happen outside the agent framework. Misses unmanaged tools and CLI composition.

CrowdStrike / EDR

Process telemetry
Monitors process trees, file changes, network connections. Detects known malicious binaries and suspicious process behavior. The incumbent runtime security layer.
Gap: Sees verbs (which processes ran) but not the story (what the chain was doing). No semantic understanding of agent sessions. Every step in a supply chain attack can look individually normal to EDR.

Zenity

Agent governance
Gartner-recognized AI TRiSM vendor. Inventories AI agents across platforms, monitors runtime activity, breaks interactions into granular steps. Detection & Response engine flags prompt injection, data leaks, over-permissioned actions.
Gap: Monitors managed agent platforms (Salesforce, Microsoft Copilot, etc.). Doesn't instrument the raw terminal session or unmanaged CLI tools. Policy-based, not probabilistic — no Bayesian accumulation of evidence across a chain.

Palo Alto Prisma AIRS

AI runtime platform
Enterprise AI runtime security. Built via acquisitions: Protect AI (~$500M, mid-2025), Koi (AI agent endpoint security, Feb 2026), Chronosphere ($3.35B, observability). Model scanning, red teaming, runtime anomaly detection, identity security for agents.
Gap: Focused on model-level and platform-level security for known, managed agents. Koi adds endpoint visibility but it's early. Doesn't reconstruct developer session chains or apply probabilistic inference to terminal-level event sequences.

StepSecurity

CI/CD runtime
Harden-Runner for GitHub Actions. Monitors runtime behavior in CI/CD pipelines — network calls, file access, process execution. Trusted by 10,000+ repos including Microsoft and Google.
Gap: Scoped to CI/CD pipelines, not local developer sessions. Policy-based (allow/deny lists), not probabilistic. Doesn't model the temporal chain or accumulate evidence across events.

Falco / Tetragon

eBPF runtime detection
CNCF-graduated open-source runtime security. Hooks syscalls via eBPF, applies YAML rules to detect anomalous behavior in containers and hosts. Tetragon adds Kubernetes-native enforcement. Used by thousands of production clusters.
Gap: Rule-based event detection, not probabilistic chain scoring. Each rule fires independently — no accumulated evidence across a sequence. Designed for container/server workloads, not developer terminal sessions. Could be a data source for Session Graph's instrumentation layer.

Snyk / Sonatype / Aikido

Package scanning (SCA)
Software composition analysis. Scans dependencies for known vulns, malicious packages, typosquats. Aikido's Intel Feed detects emerging npm attacks. Chainguard provides zero-CVE base images. Foundational supply chain layer.
Gap: Tells you a package is bad before or at install. Couldn't catch the Cline CLI attack — the package itself was legitimate, the npm token was compromised. SCA tools stop at the package boundary. Session Graph starts where they stop.

The gap: nobody scores the session chain

Package scanners stop at install. EDR sees process trees but not agent intent. Browser tools cover one surface. Runtime app security watches code paths inside the application. Cloud security graphs map infrastructure topology. Nobody stitches the developer/agent terminal session into a continuous chain and applies probabilistic inference to score it in real time. That's the opening — temporal graph + Bayesian scoring over semantic actions, sitting between package scanning and EDR.


Cline CLI Supply Chain Attack

February 17, 2026 — ~4,000 developer machines compromised

An attacker compromised an npm publish token for Cline CLI, a popular AI coding assistant. They pushed version 2.3.0 with a modified postinstall script that silently installed OpenClaw, an autonomous AI agent, onto developer machines. The attack went undetected for 8 hours. Here's what Session Graph would have seen:

| Step | Semantic Action | Type | Posterior |
|------|-----------------|------|-----------|
| 1 | npm install cline@2.3.0 | pkg_install | 0.04 |
| 2 | postinstall → npm install -g openclaw@latest | code_exec | 0.19 |
| 3 | global pkg install from postinstall hook | pkg_install | 0.48 |
| 4 | openclaw spawns autonomous agent process | process_spawn | 0.71 |
| 5 | agent reads local environment + credentials | secret_access | 0.89 |
| 6 | outbound connections to unknown endpoints | net_outbound | 0.96 |

⚠ SESSION FLAGGED at Step 5 — posterior 0.89 crossed the 0.75 threshold

The critical chain: a postinstall hook from a package install globally installs an unknown package, which spawns an autonomous process that reads credentials. Each step's likelihood ratio (LR) is amplified by its predecessor. By step 4 the Bayesian posterior is at 0.71 and rising fast; the credential read at step 5 pushes it past the 0.75 threshold, flagging the session before exfiltration begins.
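As a consistency check, each pair of consecutive posteriors in the table implies a per-step likelihood ratio via LR = odds(pₖ) / odds(pₖ₋₁). The 0.01 prior below is an assumption, not a value stated above:

```python
# Back out the implied per-step LR from the table's posteriors.
def odds(p):
    return p / (1 - p)

posteriors = [0.04, 0.19, 0.48, 0.71, 0.89, 0.96]
prev = 0.01   # assumed session prior
for step, p in enumerate(posteriors, start=1):
    lr = odds(p) / odds(prev)
    print(f"step {step}: posterior {p:.2f}, implied LR {lr:.1f}")
    prev = p
```

Every implied LR is modest (roughly 3-6×); no single step is damning, but their product is.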

What existing tools saw

Package scanner: Cline CLI was a known, trusted package. No CVE, no malware signature — the postinstall modification happened after the token was compromised. SCA tools saw nothing wrong.

EDR: node spawned npm spawned node. Normal developer activity. No individual process was flagged.

Session Graph: Caught the chain at step 5 by accumulating evidence across the sequence — postinstall hook → global install → autonomous agent spawn → credential read is a toxic chain regardless of whether each individual step is "known malicious."

Tier 1 — Can't build without these
Foundational decisions. Get these wrong and the whole architecture is wrong. Solve in build order.
01

Semantic Action Taxonomy

What's a "node"? This defines everything downstream — the graph, the inference, the instrumentation. V1 proposal: 13 types (file_read, file_write, secret_access, process_spawn, pkg_install, code_exec, net_outbound, net_inbound, env_read, delegation, llm_call, llm_tool_use, unknown). Each needs a metadata schema and a normalization layer that maps from raw telemetry across different agents, shells, and runtimes into a shared vocabulary.

02

Likelihood Model: Static Rules vs. Learned

This is the engine — how P(event | benign) and P(event | malicious) are computed. Start with hand-crafted rules: base LR per action type × metadata modifier × sequential modifier (Markov context). Rules are debuggable, ship fast, SOC teams trust them, and they generate the labeled data you need to train a model later. Hybrid V2. Rules now.
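The three-factor rule structure (base LR × metadata modifier × sequential modifier) might look like this — every number and condition below is an illustrative placeholder, not a real ruleset:

```python
# Hypothetical base likelihood ratios per action type.
BASE_LR = {"pkg_install": 1.5, "secret_access": 3.0, "net_outbound": 2.0}

def metadata_modifier(ev):
    # Example rule: a global install triggered from a lifecycle hook is far riskier.
    if ev.get("type") == "pkg_install" and ev.get("from_hook") and ev.get("global"):
        return 4.0
    return 1.0

def sequential_modifier(ev, prev_types):
    # Markov context: the same event is riskier after certain predecessors.
    if ev.get("type") == "secret_access" and "process_spawn" in prev_types[-2:]:
        return 3.0
    return 1.0

def likelihood_ratio(ev, prev_types):
    return (BASE_LR.get(ev["type"], 1.0)
            * metadata_modifier(ev)
            * sequential_modifier(ev, prev_types))

ev = {"type": "pkg_install", "from_hook": True, "global": True}
print(likelihood_ratio(ev, ["pkg_install", "code_exec"]))  # 1.5 * 4.0 * 1.0 = 6.0
```

Because each factor is a named rule, every alert decomposes into "which rule fired and why" — the debuggability argument above, made concrete.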

03

Prior Bootstrapping & Cold Start

Where do initial probabilities come from with zero labeled data? Ship a universal prior seeded from known kill chains (MITRE ATT&CK mapped to your taxonomy), let it be overridden by org-specific baselines as data accumulates. The universal prior is V1. Per-tenant adaptation is V2. Cold start is simpler once the likelihood model is rule-based.
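One way the universal-prior-with-override might work — the seed transitions and values here are hypothetical stand-ins for ATT&CK techniques mapped to the taxonomy:

```python
# Universal prior: seed likelihood ratios for known kill-chain transitions.
KILL_CHAIN_SEEDS = {
    ("pkg_install", "code_exec"): 4.0,      # lifecycle-hook execution
    ("code_exec", "secret_access"): 5.0,    # credential access after arbitrary exec
    ("secret_access", "net_outbound"): 6.0, # exfiltration
}

def edge_lr(prev, cur, org_overrides=None):
    """Universal seed, overridden per-tenant as baseline data accumulates (V2)."""
    overrides = org_overrides or {}
    return overrides.get((prev, cur), KILL_CHAIN_SEEDS.get((prev, cur), 1.0))

print(edge_lr("pkg_install", "code_exec"))            # universal seed applies
# A tenant whose build system legitimately execs from installs can down-weight it:
print(edge_lr("pkg_install", "code_exec",
              {("pkg_install", "code_exec"): 1.5}))
```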

04

Conditional Independence Assumptions

Session events are deeply correlated — reading a config makes an API call more likely. Start with a Markov assumption: condition on the last 2–3 events, not full history. Cheap to compute, captures the most important sequential dependencies (credential read after postinstall hook). Most attack chains are 3–5 critical steps, so a small context window captures the signal.
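The order-k truncation is mechanically simple: condition each event's LR on only the last k predecessors. A sketch with k = 2 and a hypothetical conditional LR table:

```python
from collections import deque

K = 2  # context window size; the text above suggests 2-3

# Hypothetical conditional LRs keyed by (truncated context, current event).
CONTEXT_LR = {
    (("pkg_install", "code_exec"), "pkg_install"): 8.0,   # hook → global install
    (("code_exec", "pkg_install"), "process_spawn"): 5.0, # install → agent spawn
}

def lr_for(event_type, history):
    """LR conditioned on at most the last K events, not the full session."""
    ctx = tuple(history)[-K:]
    return CONTEXT_LR.get((ctx, event_type), 1.0)

hist = deque(maxlen=K)   # memory stays O(K) no matter how long the session runs
for ev in ["pkg_install", "code_exec", "pkg_install", "process_spawn"]:
    print(ev, lr_for(ev, hist))
    hist.append(ev)
```

The bounded deque is the whole point: update cost and state are constant per event, yet the hook-then-install pattern is still visible.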

Tier 2 — Can't sell without these
Can build a demo without them, but no security team will adopt a product that doesn't address these.
05

Explainability & Alert UX

The product differentiator. A SOC analyst who gets an alert with a score and no explanation will ignore it. Need per-event posterior deltas, a visual chain with critical edges highlighted, and plain-English summary: "this credential read after this postinstall hook raised probability by 34%." The Wiz playbook — show the toxic path, not just the score. Build into the architecture from day one.

06

Threshold Calibration & Adaptive Alerting

Alert too much and you're dead on arrival. Too little and you're useless. Start with a sensible default (T = 0.75) plus a calibration step during onboarding: observe a week of sessions, compute the score distribution, set the threshold at the 99.5th percentile. Adaptive thresholds per team/role are V2.
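The calibration step, sketched with simulated baseline data. The Beta-distributed scores are synthetic stand-ins for a week of observed sessions, and keeping 0.75 as a floor under the percentile is one possible policy, not a stated requirement:

```python
import math
import random

def percentile(values, q):
    """Nearest-rank percentile: smallest value with at least q% of data at or below it."""
    s = sorted(values)
    idx = min(len(s) - 1, math.ceil(q / 100 * len(s)) - 1)
    return s[max(idx, 0)]

random.seed(7)
# Synthetic week of benign session posteriors: overwhelmingly near zero.
baseline_scores = [random.betavariate(1, 30) for _ in range(10_000)]

# Calibrated threshold: 99.5th percentile of the baseline, floored at the default.
threshold = max(0.75, percentile(baseline_scores, 99.5))
```

With a healthy baseline the percentile sits far below 0.75 and the default holds; a noisy org would push the threshold up instead of drowning analysts in alerts.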

07

Session Boundary Detection

Must answer "what is a session" before shipping. V1: a session is a contiguous agent execution from invocation to exit. Idle >N minutes = new session. Agent spawns sub-agent = child session linked to parent. Don't try to solve the local/cloud handoff yet — document the limitation. Stitch across substrates in V2.

Tier 3 — Can't scale without these
Real problems but optimization, hardening, or second-order concerns you address once you have traction.
08

Graph Topology: Branching & Merging

Real sessions aren't linear — agents spawn subprocesses, tools fork, outputs merge. V1 can treat sessions as mostly linear chains. Most real attack chains are linear (install → hook → read → exfil). Branching and merging matter for sophisticated multi-agent orchestration — a later customer segment.

09

Scale & Latency

Bayesian update on a Markov chain is O(1) per event — fine for V1. Becomes real at thousands of concurrent sessions with full DAG inference. Start on-device (edge agent), add central aggregation later. The latency question — can you flag before exfiltration completes? — matters but is solvable with the Markov approach.

10

Adversarial Evasion

Important but premature. You need attackers to encounter and try to evade your system before you know what evasion looks like. Build logging and replay into V1 so you can study evasion attempts post-hoc. Decay functions to prevent "washing" the posterior and pacing anomaly detection come in V2 once you have real adversarial data.