# Provider Plugins
Provider plugins add LLM and embedding backends to TeamWeb AI. Only one provider of each type is active at a time, configured in the global settings under Administration > Settings.
## LLM Providers
LLM provider plugins power the assistant’s language model. They translate between TeamWeb AI’s internal message format and the provider’s API.
### Directory Layout

```
my_llm/
  manifest.py
  provider.py
```

### Example
`manifest.py`:

```python
from app.plugins.base import PluginManifest, PluginType

manifest = PluginManifest(
    name="my_llm",
    plugin_type=PluginType.LLM,
    version="1.0.0",
    description="My custom LLM provider",
    author="Your Name",
    config_schema={
        "api_key": {
            "type": "string",
            "required": True,
            "secret": True,
            "label": "API Key",
        },
        "model": {
            "type": "string",
            "required": False,
            "default": "my-model-v1",
            "label": "Model",
        },
    },
)
```

`provider.py`:
```python
from app.plugins.base import (
    BaseLLMProvider,
    LLMResponse,
    NeutralMessage,
    NeutralTool,
    NeutralToolChoice,
    ToolCall,
)


class MyLLMProvider(BaseLLMProvider):
    """Custom LLM provider.

    Args:
        api_key: API key for authentication.
        model: Model name to use.
    """

    def __init__(self, api_key: str, model: str = "my-model-v1") -> None:
        self.api_key = api_key
        self.model = model

    def create_message(
        self,
        *,
        system: str,
        messages: list[NeutralMessage],
        tools: list[NeutralTool] | None = None,
        max_tokens: int = 4096,
        thinking_budget: int | None = None,
        model: str | None = None,
        tool_choice: NeutralToolChoice | None = None,
    ) -> LLMResponse:
        """Send a request to your LLM and return a normalised response.

        Args:
            system: System prompt (plain string).
            messages: Conversation history as neutral messages. Each
                message has a role and a list of content blocks.
            tools: Neutral tool definitions, or None for no tools.
            max_tokens: Maximum tokens in the response.
            thinking_budget: Reasoning/thinking budget (optional;
                providers without reasoning support ignore it).
            model: Override the provider's default model (optional).
            tool_choice: Neutral tool-choice directive (optional).

        Returns:
            Normalised LLMResponse with neutral content blocks in
            `blocks` — ready to append to conversation history.
        """
        # Translate neutral → your vendor API, call it, translate
        # back. See app/plugins/anthropic_llm/translator.py,
        # app/plugins/openai_llm/translator.py, and
        # app/plugins/ollama_llm/translator.py for reference
        # implementations.
        return LLMResponse(
            text="Hello from my LLM!",
            tool_calls=[],
            stop_reason="end_turn",
            blocks=[{"type": "text", "text": "Hello from my LLM!"}],
            usage={"input_tokens": 10, "output_tokens": 5},
            model=self.model,
        )

    def calculate_cost(
        self,
        model: str,
        input_tokens: int,
        output_tokens: int,
        cache_creation_tokens: int = 0,
        cache_read_tokens: int = 0,
    ) -> tuple[float, float, float]:
        """Calculate the cost in USD for a given number of tokens.

        Args:
            model: Model identifier.
            input_tokens: Number of input tokens.
            output_tokens: Number of output tokens.
            cache_creation_tokens: Tokens written to cache.
            cache_read_tokens: Tokens read from cache.

        Returns:
            Tuple of (input_cost, output_cost, total_cost) in USD.
        """
        input_cost = input_tokens * 0.001 / 1000
        output_cost = output_tokens * 0.002 / 1000
        return input_cost, output_cost, input_cost + output_cost
```

### Message Format
TeamWeb AI uses a vendor-neutral content-block format internally. Your provider translates between neutral blocks and your LLM’s native API at the edges — conversation history, tool calls, thinking, and cache hints all flow through the neutral shape so the agent loop is identical regardless of which provider answers. Each built-in plugin has a dedicated `translator.py` that holds the neutral ↔ vendor conversion; see:
- `app/plugins/anthropic_llm/translator.py` — neutral ↔ Anthropic Messages API
- `app/plugins/openai_llm/translator.py` — neutral ↔ OpenAI Responses API
- `app/plugins/ollama_llm/translator.py` — neutral ↔ OpenAI Chat Completions (what Ollama exposes)
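To make the shape concrete, here is a minimal sketch of the request-side half of such a translator for a hypothetical vendor that accepts flat `role`/`content` strings. It assumes neutral messages behave like dicts with `role` and `content` keys; the built-in translators above are the authoritative reference:

```python
def neutral_to_vendor(messages: list[dict]) -> list[dict]:
    """Flatten neutral content blocks into a hypothetical vendor's
    plain role/content message format (illustrative sketch only)."""
    vendor_messages = []
    for message in messages:
        parts = []
        for block in message["content"]:
            if block["type"] == "text":
                parts.append(block["text"])
            elif block["type"] == "tool_result":
                # A real translator maps this to the vendor's own
                # tool-result primitive; flattening to text is a stand-in.
                parts.append(str(block.get("content", "")))
            # tool_use, thinking, and cache_breakpoint blocks all need
            # vendor-specific handling; see the built-in translators.
        vendor_messages.append(
            {"role": message["role"], "content": "\n".join(parts)}
        )
    return vendor_messages
```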
### Technical Details
**Format normalization** — Conversation history, tool calls, and tool results use neutral content blocks defined in `app/plugins/base.py`: `TextBlock`, `ToolUseBlock`, `ToolResultBlock`, `ThinkingBlock`, `CacheBreakpointBlock`. The string type names are deliberately identical to Anthropic’s vocabulary (`"text"`, `"tool_use"`, etc.) so any legacy conversation stored as Anthropic-shaped dicts round-trips unchanged. The `blocks` field on `LLMResponse` must contain neutral blocks — the agent loop appends it straight to the message history.
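For instance, one tool-use round trip in neutral form might look like the following (the `tool_result` field names here are assumed to mirror Anthropic’s shape, consistent with the shared vocabulary):

```python
history = [
    {"role": "user", "content": [{"type": "text", "text": "What is 2 + 2?"}]},
    {"role": "assistant", "content": [
        {"type": "tool_use", "id": "call_1", "name": "calculator",
         "input": {"expression": "2 + 2"}},
    ]},
    {"role": "user", "content": [
        # Tool results travel back in a user-role message, following
        # Anthropic's convention (assumed here).
        {"type": "tool_result", "tool_use_id": "call_1", "content": "4"},
    ]},
]
```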
**Prompt caching** — Providers that support prompt caching (currently the built-in Anthropic provider) consume a neutral `{"type": "cache_breakpoint"}` block as a hint and translate it into the vendor’s cache primitive — e.g., the Anthropic translator attaches `cache_control: {"type": "ephemeral"}` to the preceding content block and drops the breakpoint. Providers without caching drop the marker silently. The agent loop inserts a breakpoint between the conversation prefix and the current turn so the system prompt, tools, and earlier messages are cached across LLM calls within the same agent loop — only new messages are reprocessed. For multi-turn agent loops with many tool calls, this dramatically reduces input token costs.
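A minimal sketch of that translation step, assuming blocks arrive as plain dicts (the real logic lives in `app/plugins/anthropic_llm/translator.py`):

```python
def apply_cache_breakpoints(blocks: list[dict]) -> list[dict]:
    """Turn neutral cache_breakpoint markers into Anthropic-style
    cache_control hints on the preceding block (illustrative sketch)."""
    out: list[dict] = []
    for block in blocks:
        if block["type"] == "cache_breakpoint":
            if out:
                # Attach the vendor cache primitive to the previous
                # block, then drop the neutral marker itself.
                out[-1] = {**out[-1], "cache_control": {"type": "ephemeral"}}
            continue  # a provider without caching would skip unconditionally
        out.append(block)
    return out
```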
**Cost calculation** — Token pricing accounts for prompt caching multipliers: tokens written to the cache cost 1.25x the normal input price, while tokens read from the cache cost only 0.1x. This makes caching highly cost-effective for conversations with multiple back-and-forth iterations between the LLM and tools.
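As a sketch, a provider applying those multipliers inside `calculate_cost` might look like this (the per-token rates are the same placeholders as in the example above, and the sketch assumes the three input counts are disjoint, which can vary by vendor):

```python
def calculate_cost(
    self,
    model: str,
    input_tokens: int,
    output_tokens: int,
    cache_creation_tokens: int = 0,
    cache_read_tokens: int = 0,
) -> tuple[float, float, float]:
    input_rate = 0.001 / 1000   # placeholder: USD per input token
    output_rate = 0.002 / 1000  # placeholder: USD per output token
    input_cost = (
        input_tokens * input_rate
        + cache_creation_tokens * input_rate * 1.25  # cache writes: 1.25x
        + cache_read_tokens * input_rate * 0.1       # cache reads: 0.1x
    )
    output_cost = output_tokens * output_rate
    return input_cost, output_cost, input_cost + output_cost
```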
**Rate limiting** — The built-in Anthropic provider tracks token and request budgets from the LLM’s response headers, shared across all workers via Redis. When remaining capacity drops below a safety threshold (5% for tokens, 10% for requests), the system proactively waits until the budget resets rather than hitting rate limit errors. If a rate limit error does occur, the system uses exponential backoff (starting at 60 seconds, doubling up to a maximum of 5 minutes).
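The retry schedule described above is straightforward to express; a standalone sketch (not the actual implementation):

```python
def backoff_delays(base: float = 60.0, cap: float = 300.0):
    """Yield exponential backoff delays: 60s, 120s, 240s, then capped at 300s."""
    delay = base
    while True:
        yield delay
        delay = min(delay * 2, cap)

delays = backoff_delays()
print([next(delays) for _ in range(5)])  # [60.0, 120.0, 240.0, 300.0, 300.0]
```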
### `LLMResponse` Fields
| Field | Description |
|---|---|
| `text` | Concatenated text-block content (`None` if the response contains only tool calls or thinking) |
| `tool_calls` | List of `ToolCall` objects with `id`, `name`, and `input` — a convenience view over `blocks` |
| `stop_reason` | Why the model stopped: `"end_turn"`, `"tool_use"`, `"max_tokens"`, or `"refusal"` |
| `blocks` | Full assistant turn as neutral content blocks — appended to conversation history verbatim |
| `usage` | Token usage dict with at least `input_tokens` and `output_tokens` (plus any provider-specific extras like `reasoning_tokens` or cache fields) |
| `model` | Model identifier string |
| `thinking_text` | Concatenated thinking-block text (if the model emitted any) |
| `rate_limit_headers` | Optional dict of vendor rate-limit response headers |
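For example, a turn in which the model decides to call a tool might be constructed like this (keyword-argument construction and `ToolCall`’s field names are assumed from the table; optional fields such as `thinking_text` are presumed to default to `None`):

```python
response = LLMResponse(
    text=None,  # no text blocks in this turn
    tool_calls=[ToolCall(id="call_1", name="search", input={"query": "pricing"})],
    stop_reason="tool_use",
    blocks=[
        {"type": "tool_use", "id": "call_1", "name": "search",
         "input": {"query": "pricing"}},
    ],
    usage={"input_tokens": 230, "output_tokens": 18},
    model="my-model-v1",
)
```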
### `blocks` Format
The `blocks` field carries neutral content blocks, one per element. The agent loop appends it unchanged to the conversation history. Each element is a dict with a `type` discriminator:
- Text: `{"type": "text", "text": "..."}`
- Tool use: `{"type": "tool_use", "id": "...", "name": "...", "input": {...}}`
- Thinking: `{"type": "thinking", "text": "...", "signature": "..."}` — `signature` carries the provider-specific continuation token (empty string when the vendor doesn’t produce one)
### Constructor
The provider class is instantiated with configuration values unpacked as keyword arguments. Define your `__init__` parameters to match the keys in your `config_schema`.
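In other words, given the `config_schema` from the manifest above, the plugin system effectively does the following with the values saved in settings:

```python
# Values come from Administration > Settings for the active provider.
config = {"api_key": "sk-...", "model": "my-model-v1"}

# Keys are unpacked as keyword arguments, so they must match __init__.
provider = MyLLMProvider(**config)
```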
## Embedding Providers
Embedding provider plugins power the knowledge base’s semantic search. They generate vector embeddings for text.
### Directory Layout

```
my_embeddings/
  manifest.py
  provider.py
```

### Example
`manifest.py`:

```python
from app.plugins.base import PluginManifest, PluginType

manifest = PluginManifest(
    name="my_embeddings",
    plugin_type=PluginType.EMBEDDING,
    version="1.0.0",
    description="My custom embedding provider",
    author="Your Name",
    config_schema={
        "api_key": {
            "type": "string",
            "required": True,
            "secret": True,
            "label": "API Key",
        },
    },
)
```

`provider.py`:
```python
from app.plugins.base import BaseEmbeddingProvider


class MyEmbeddingProvider(BaseEmbeddingProvider):
    """Custom embedding provider.

    Args:
        api_key: API key for authentication.
    """

    def __init__(self, api_key: str = "") -> None:
        self.api_key = api_key

    def embed(self, text: str) -> list[float]:
        """Return the embedding vector for the given text.

        Args:
            text: Text to embed.

        Returns:
            Embedding vector as a list of floats.
        """
        # Call your embedding API here
        return [0.0] * 384

    def get_dimensions(self) -> int:
        """Return the dimensionality of the embedding vectors.

        Returns:
            Number of dimensions.
        """
        return 384
```

### Required Methods
| Method | Description |
|---|---|
| `embed(text)` | Return an embedding vector (list of floats) for the given text |
| `get_dimensions()` | Return the number of dimensions in the embedding vectors |
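As a usage illustration (not TeamWeb AI’s actual search code), the two methods combine naturally for a cosine-similarity lookup:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

provider = MyEmbeddingProvider(api_key="...")
query_vec = provider.embed("how do I reset my password?")

# Consistency check: every vector must match the declared dimensionality.
assert len(query_vec) == provider.get_dimensions()
```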