The three-capability threat model that enables AI-driven data exfiltration - private data access, untrusted content exposure, and external communication.
Edison Watch prevents data exfiltration by detecting and blocking the combination of capabilities required for an attack.
AI agents are vulnerable to prompt injection - malicious instructions hidden in external content (like a web page or file) that manipulate the AI into exfiltrating sensitive data.
Exfiltration requires three capabilities simultaneously. Edison Watch tracks these via per-session monotonic flags:
| Capability | Security Flag | Action |
|---|---|---|
| Private Data Access | `read_private_data` | AI reads internal files, DBs, or docs. |
| Untrusted Content | `read_untrusted_public_data` | AI fetches data from the internet. |
| External Communication | `write_operation` | AI sends data out (Slack, email, APIs). |
Enforcement Logic: If a session has accessed both Private Data AND Untrusted Content, any subsequent External Communication is paused for human approval.
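The enforcement rule above can be sketched in a few lines. The flag names come from the table; the data structure and function are illustrative, not Edison Watch's actual API:

```python
# Minimal sketch of the trifecta enforcement check. Flag names match the
# table above; SessionFlags and requires_approval are hypothetical.
from dataclasses import dataclass


@dataclass
class SessionFlags:
    read_private_data: bool = False
    read_untrusted_public_data: bool = False


def requires_approval(flags: SessionFlags, is_write_operation: bool) -> bool:
    """Pause a write for human approval only when the session has already
    touched both private data and untrusted content."""
    return (is_write_operation
            and flags.read_private_data
            and flags.read_untrusted_public_data)


# A session that has only read private data may still write freely:
safe = SessionFlags(read_private_data=True)
assert requires_approval(safe, is_write_operation=True) is False

# Once untrusted content is also in context, any write is paused:
risky = SessionFlags(read_private_data=True, read_untrusted_public_data=True)
assert requires_approval(risky, is_write_operation=True) is True
```

Note that reads are never blocked by this rule; only the outbound write at the end of the chain triggers the approval gate.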
An attacker needs all three to succeed. Remove any one capability and exfiltration becomes impossible:
- Without private data access → nothing valuable to steal
- Without untrusted content → no way to inject malicious instructions
- Without external comms → no way to send stolen data out
State is tracked in the Edison server and is monotonic: once a flag is set (e.g., Private Data accessed), it cannot be unset for that session. This prevents "reset" attacks where a malicious prompt tries to clear the session's threat state.
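The monotonic property can be captured by a flag store that only ever raises flags. This is an illustrative sketch, assuming a simple in-memory container rather than Edison Watch's real server-side state:

```python
# Hypothetical sketch of monotonic session flags: a flag can be set but
# never cleared, so a prompt-injected "reset" attempt is rejected.
class MonotonicFlags:
    def __init__(self) -> None:
        self._flags: set[str] = set()

    def set(self, name: str) -> None:
        # Raising a flag is always allowed.
        self._flags.add(name)

    def clear(self, name: str) -> None:
        # Deliberately refused: threat state never decreases in a session.
        raise PermissionError(f"flag {name!r} cannot be unset")

    def is_set(self, name: str) -> bool:
        return name in self._flags


flags = MonotonicFlags()
flags.set("read_private_data")
assert flags.is_set("read_private_data")
```

Because `clear` raises unconditionally, even code paths that an attacker can influence have no way to lower the session's threat state.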
Access control lists (ACLs) add an independent layer of protection: they prevent sensitive data from flowing to lower-sensitivity destinations regardless of the Trifecta state.
| Level | Rule |
|---|---|
| `PUBLIC` | Can flow anywhere. |
| `PRIVATE` | Cannot flow to `PUBLIC`. |
| `SECRET` | Cannot flow to `PRIVATE` or `PUBLIC`. |
Example: If an agent reads a database marked SECRET, it is immediately blocked from posting to a PUBLIC Slack channel - even if the Trifecta hasn't been fully triggered.
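These rules form a simple ordering: data may only flow to a destination at an equal or higher sensitivity level. A minimal sketch, where the level names come from the table but the numeric ranking and helper function are illustrative:

```python
# Hypothetical flow check over the sensitivity levels from the ACL table.
# Higher numbers mean more sensitive; data may never flow "downhill".
LEVELS = {"PUBLIC": 0, "PRIVATE": 1, "SECRET": 2}


def flow_allowed(source_level: str, dest_level: str) -> bool:
    """Allow a flow only if the destination is at least as sensitive."""
    return LEVELS[dest_level] >= LEVELS[source_level]


assert flow_allowed("PUBLIC", "SECRET")       # PUBLIC can flow anywhere
assert not flow_allowed("SECRET", "PUBLIC")   # SECRET -> PUBLIC blocked
assert not flow_allowed("SECRET", "PRIVATE")  # SECRET -> PRIVATE blocked
```

In the Slack example above, the agent's context carries `SECRET` data and the channel's destination level is `PUBLIC`, so the check fails immediately, with no trifecta evaluation needed.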
Traditional security tools operate at the network or identity layer. They verify who is making a request, not what data is in the AI's context window. The Lethal Trifecta is a context-level threat model - it tracks what the agent has seen and what it's about to do, then makes a real-time decision about whether that combination is dangerous.