Skip to content
The Lethal Trifecta

The Lethal Trifecta

The “Lethal Trifecta” is a term coined by Simon Willison to describe the three AI agent capabilities that, when combined, create a critical prompt injection attack surface. An agent that has access to private data, exposure to untrusted content, and the ability to exfiltrate data gives an attacker everything they need to steal information through a crafted prompt injection payload.

TeamWeb AI assistants are powerful — they can search knowledge bases, process messages from external users, send emails, execute code, and more. This means a single assistant can easily end up with all three capabilities unless you deliberately design against it.

The Three Components

1. Access to Private Data

Any tool that lets the assistant read information that should not be public. If an attacker can influence the assistant’s behaviour (via component 2), these tools become the source of stolen data.

TeamWeb AI FeatureWhat It Can Access
search_knowledgeProject knowledge base — documents, facts, website content, code, past conversations
read_code_fileFull source files from indexed code repositories
get_deliverable / list_deliverables / search_deliverablesPreviously generated content, which may contain sensitive information
get_notesPersistent notes stored by the assistant
get_project_stateShared key-value project state
get_conversation_summary / list_conversationsSummaries and metadata of past conversations
MCP server toolsArbitrary external data sources depending on the server configuration

2. Exposure to Untrusted Content

Any input the assistant processes that could have been crafted or manipulated by an attacker. This is the injection vector — the way malicious instructions reach the LLM.

TeamWeb AI FeatureSource of Untrusted Content
Channel messages (Slack, Telegram, email)Messages from external users, especially via public or shared channels
Public chatAnonymous visitors can send arbitrary messages
Knowledge base — websitesCrawled web pages could contain hidden prompt injection payloads
Knowledge base — documentsUploaded files (PDF, Word, Excel) could contain concealed instructions
web_searchSearch results contain attacker-controllable content
execute_code outputIf the code fetches external data, that data is untrusted
Delegated conversationsIf assistant B processes untrusted content and delegates to assistant A, the task description carries untrusted influence
MCP server tool resultsResults from external MCP servers are not under TeamWeb AI’s control

3. Ability to Exfiltrate Data

Any mechanism the assistant can use to send information outside the current conversation to a destination the attacker controls or can observe.

TeamWeb AI FeatureHow It Communicates Externally
send_emailSends email via the configured email provider
execute_codeSandbox has default Docker bridge networking — code can make outbound HTTP requests
save_contentSaves content with a shareable URL — private data could be embedded in a deliverable
delegate_taskPasses a task description to another assistant, which may have different (broader) tool access
MCP server toolsCan make arbitrary external API calls depending on the server
Channel responsesThe assistant’s reply itself goes back through the channel to the user who sent the message

How an Attack Works

Here is a concrete example of how prompt injection exploits the trifecta in TeamWeb AI:

  1. An assistant is configured with search_knowledge, send_email, and a Slack channel
  2. A knowledge base source — a crawled website or an uploaded document — contains a hidden prompt injection payload invisible to casual readers
  3. A user asks the assistant a question. The assistant calls search_knowledge and retrieves a result that includes the payload
  4. The LLM processes the payload as if it were part of the conversation. The injected instructions tell the assistant to search for sensitive data (API keys, customer details, internal documents) and email the results to an attacker-controlled address
  5. The assistant follows the injected instructions because LLMs cannot reliably distinguish between legitimate instructions and injected ones
This is not a hypothetical risk. Prompt injection is a well-documented, practical attack against LLM-based systems. There is currently no reliable way to make LLMs completely immune to it. The only dependable defence is to ensure your assistants never combine all three trifecta components without appropriate mitigations.

Mitigations

Apply the principle of least privilege to tools

Each assistant should have only the tools it actually needs. Disable everything else from the Tools tab. Specifically:

  • Disable send_email on assistants that do not need to send email
  • Disable execute_code on assistants that do not need code execution
  • Disable save_content on assistants that do not produce deliverables
  • Disable delegate_task on assistants that do not need to collaborate with others

Removing even one exfiltration tool can break the trifecta entirely.

Separate untrusted-content assistants from privileged ones

Do not build one assistant that does everything. Instead, create purpose-specific assistants:

  • A public-facing assistant that handles external channel messages or public chat, with minimal tools and limited knowledge access
  • An internal assistant with broader tool access and knowledge, but no public-facing channels

If an assistant processes untrusted input, it should not also have access to sensitive data and external communication tools.

Audit knowledge base sources

Every knowledge source is a potential injection vector:

  • Only crawl websites you control or trust completely — third-party pages can change at any time
  • Review uploaded documents before indexing, especially if they come from external parties
  • Be cautious with conversation indexing — past conversations with untrusted users could contain injection attempts that persist into the knowledge base

Restrict MCP server capabilities

MCP servers can provide arbitrary tools with arbitrary capabilities. Only enable servers from sources you trust, and audit what each server can do. See MCP Servers for configuration details, and Database Protection for guidance on MCP environment variable handling.

Understand sandbox limits

While code execution is sandboxed (see Code Sandboxing), the sandbox has outbound network access by default. Code executed in the sandbox can make HTTP requests to external services — meaning execute_code is both an untrusted content source (it can fetch external data) and a potential exfiltration vector (it can send data out). Disable execute_code on assistants that handle sensitive data unless code execution is essential to their purpose.

Use task-level tool restrictions

Task definitions support an allowed_tools field that restricts which tools are available during a task run. This is applied on top of the assistant’s own tool configuration. For example, a research task can be limited to only search_knowledge and web_search, preventing any exfiltration tools from being available even if the assistant normally has them.

Monitor conversations and logs

Review conversation logs and API Logs regularly to detect suspicious patterns — an assistant emailing unexpected recipients, searching knowledge for unusual terms, or making unexpected tool calls could indicate a prompt injection attack in progress.

Summary

Trifecta ComponentTeamWeb AI FeaturesRisk Level
Private data accessKnowledge base, deliverables, notes, project state, conversationsHigh — most assistants have this by default
Untrusted contentChannels, public chat, crawled websites, uploaded documents, web searchDepends on configuration — high for public-facing assistants
External communicationEmail, code execution (network), deliverable URLs, MCP tools, delegationCan be controlled by disabling specific tools
The most effective mitigation is to avoid combining all three capabilities in a single assistant. If an assistant must have all three, apply every other mitigation described above and monitor its activity closely.