Provider Plugins
Provider plugins add LLM and embedding backends to TeamWeb AI. Only one provider of each type is active at a time, configured in the global settings under Administration > Settings.
LLM Providers
LLM provider plugins power the assistant’s language model. They translate between TeamWeb AI’s internal message format and the provider’s API.
Directory Layout
```
my_llm/
  manifest.py
  provider.py
```
Example
manifest.py:
```python
from app.plugins.base import PluginManifest, PluginType

manifest = PluginManifest(
    name="my_llm",
    plugin_type=PluginType.LLM,
    version="1.0.0",
    description="My custom LLM provider",
    author="Your Name",
    config_schema={
        "api_key": {
            "type": "string",
            "required": True,
            "secret": True,
            "label": "API Key",
        },
        "model": {
            "type": "string",
            "required": False,
            "default": "my-model-v1",
            "label": "Model",
        },
    },
)
```
provider.py:
```python
from app.plugins.base import BaseLLMProvider, LLMResponse, ToolCall


class MyLLMProvider(BaseLLMProvider):
    """Custom LLM provider.

    Args:
        api_key: API key for authentication.
        model: Model name to use.
    """

    def __init__(self, api_key: str, model: str = "my-model-v1") -> None:
        self.api_key = api_key
        self.model = model

    def create_message(
        self,
        *,
        system: str,
        messages: list[dict],
        tools: list[dict] | None = None,
        max_tokens: int = 16000,
        thinking_budget: int | None = None,
        enable_tool_search: bool = False,
    ) -> LLMResponse:
        """Send a chat completion and return a normalised response.

        Args:
            system: System prompt.
            messages: Conversation messages (Anthropic format).
            tools: Tool schemas (name/description/input_schema).
            max_tokens: Maximum tokens in the response.
            thinking_budget: Extended thinking budget (optional).
            enable_tool_search: Whether to enable tool search (optional).

        Returns:
            Normalised LLMResponse.
        """
        # Call your LLM API here, then normalise the response:
        return LLMResponse(
            text="Hello from my LLM!",
            tool_calls=[],
            stop_reason="end_turn",
            raw_content=[{"type": "text", "text": "Hello from my LLM!"}],
            usage={"input_tokens": 10, "output_tokens": 5},
            model=self.model,
        )

    def calculate_cost(
        self,
        model: str,
        input_tokens: int,
        output_tokens: int,
        cache_creation_tokens: int = 0,
        cache_read_tokens: int = 0,
    ) -> tuple[float, float, float]:
        """Calculate the cost in USD for a given number of tokens.

        Args:
            model: Model identifier.
            input_tokens: Number of input tokens.
            output_tokens: Number of output tokens.
            cache_creation_tokens: Tokens written to cache.
            cache_read_tokens: Tokens read from cache.

        Returns:
            Tuple of (input_cost, output_cost, total_cost) in USD.
        """
        input_cost = input_tokens * 0.001 / 1000
        output_cost = output_tokens * 0.002 / 1000
        return input_cost, output_cost, input_cost + output_cost
```
Message Format
TeamWeb AI uses the Anthropic message format internally. Your provider must translate between this format and your LLM’s native API. See the built-in Ollama provider (app/plugins/ollama_llm/provider.py) for a complete example of translating to and from the OpenAI chat completions format.
Technical Details
Format normalization — All conversation history, tool calls, and tool results use Anthropic’s content block structure internally ({"type": "text", ...}, {"type": "tool_use", ...}, {"type": "tool_result", ...}). Providers for other APIs must translate inbound messages from Anthropic format to their native format, and translate LLM responses back. The raw_content field on LLMResponse must be in Anthropic format because it is appended directly to the conversation message history.
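As a concrete illustration, a provider targeting an OpenAI-style chat API might translate inbound Anthropic blocks with a helper along these lines (the function name and exact field mapping here are hypothetical; the built-in Ollama provider is the authoritative example):

```python
import json


def anthropic_to_openai(messages: list[dict]) -> list[dict]:
    """Translate Anthropic-format messages into OpenAI-style chat messages.

    Illustrative sketch only: a real provider must also handle mixed
    content blocks, images, and provider-specific quirks.
    """
    out: list[dict] = []
    for msg in messages:
        content = msg["content"]
        if isinstance(content, str):
            out.append({"role": msg["role"], "content": content})
            continue
        for block in content:
            if block["type"] == "text":
                out.append({"role": msg["role"], "content": block["text"]})
            elif block["type"] == "tool_use":
                # Anthropic tool_use block -> OpenAI assistant tool call
                out.append({
                    "role": "assistant",
                    "tool_calls": [{
                        "id": block["id"],
                        "type": "function",
                        "function": {
                            "name": block["name"],
                            "arguments": json.dumps(block["input"]),
                        },
                    }],
                })
            elif block["type"] == "tool_result":
                # Anthropic tool_result block -> OpenAI "tool" role message
                out.append({
                    "role": "tool",
                    "tool_call_id": block["tool_use_id"],
                    "content": str(block["content"]),
                })
    return out
```

The reverse direction (native response back to Anthropic-format raw_content) is the provider's other half of the contract.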
Prompt caching — The built-in Anthropic provider implements prompt caching using ephemeral cache control markers. The system prompt and tool definitions are marked as cacheable, and a cache breakpoint is placed at the end of the conversation history prefix. This means the system prompt, tools, and earlier messages are cached across LLM calls within the same agent loop — only new messages are reprocessed. For multi-turn agent loops with many tool calls, this dramatically reduces input token costs.
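In Anthropic's Messages API, caching is driven by ephemeral cache_control markers on content blocks. A sketch of how a provider might mark the cacheable prefix (the field placement is illustrative, not a copy of the built-in provider):

```python
def build_cached_request(system_prompt: str, tools: list[dict]) -> dict:
    """Mark the system prompt and tool definitions as cacheable.

    A cache_control marker on a block caches the request prefix up to and
    including that block, so one marker on the last tool covers the whole
    tool list. Illustrative sketch; the real provider also places a cache
    breakpoint at the end of the conversation history prefix.
    """
    tools = [dict(t) for t in tools]  # avoid mutating caller's schemas
    if tools:
        tools[-1]["cache_control"] = {"type": "ephemeral"}
    return {
        "system": [{
            "type": "text",
            "text": system_prompt,
            "cache_control": {"type": "ephemeral"},
        }],
        "tools": tools,
    }
```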
Cost calculation — Token pricing accounts for prompt caching multipliers: tokens written to the cache cost 1.25x the normal input price, while tokens read from the cache cost only 0.1x. This makes caching highly cost-effective for conversations with multiple back-and-forth iterations between the LLM and tools.
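The input-side arithmetic can be sketched as follows (the per-million-token price is a placeholder; real rates depend on the model):

```python
def input_side_cost(input_tokens: int, cache_creation_tokens: int,
                    cache_read_tokens: int, price_per_mtok: float) -> float:
    """Input cost in USD applying the caching multipliers described above."""
    base = price_per_mtok / 1_000_000          # USD per token
    return (input_tokens * base                 # uncached input: 1x
            + cache_creation_tokens * base * 1.25   # cache writes: 1.25x
            + cache_read_tokens * base * 0.10)      # cache reads: 0.1x
```

For example, at a hypothetical $3/Mtok input price, re-reading a 100,000-token cached prefix costs $0.03 per call instead of $0.30.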
Rate limiting — The built-in Anthropic provider tracks token and request budgets from the LLM’s response headers, shared across all workers via Redis. When remaining capacity drops below a safety threshold (5% for tokens, 10% for requests), the system proactively waits until the budget resets rather than hitting rate limit errors. If a rate limit error does occur, the system uses exponential backoff (starting at 60 seconds, doubling up to a maximum of 5 minutes).
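The retry schedule described above works out to the following delays (a sketch of the arithmetic, not the actual implementation):

```python
def backoff_delays(retries: int, base: float = 60.0,
                   cap: float = 300.0) -> list[float]:
    """Exponential backoff: start at 60s, double each retry, cap at 5 min."""
    return [min(base * (2 ** i), cap) for i in range(retries)]
```

The first five retries would wait 60, 120, 240, 300, and 300 seconds respectively.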
LLMResponse Fields
| Field | Description |
|---|---|
| text | Final text content (None if the response contains only tool calls) |
| tool_calls | List of ToolCall objects with id, name, and input |
| stop_reason | Why the model stopped: "end_turn", "tool_use", or "max_tokens" |
| raw_content | Raw content blocks in Anthropic format for message history |
| usage | Token usage dict with input_tokens and output_tokens |
| model | Model identifier string |
| thinking_text | Extended thinking content (if supported by the model) |
raw_content Format
The raw_content field must be in Anthropic format because it is appended directly to the conversation message history. Each element is a dict:
- Text: `{"type": "text", "text": "..."}`
- Tool use: `{"type": "tool_use", "id": "...", "name": "...", "input": {...}}`
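For example, a response that explains itself and then invokes a hypothetical search_docs tool would carry a raw_content like:

```python
# Hypothetical raw_content for a text + tool-use response (tool name and
# id are illustrative, not part of TeamWeb AI itself).
raw_content = [
    {"type": "text", "text": "Let me look that up."},
    {"type": "tool_use", "id": "call_1", "name": "search_docs",
     "input": {"query": "provider plugins"}},
]
```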
Constructor
The provider class is instantiated with configuration values unpacked as keyword arguments. Define your __init__ parameters to match the keys in your config_schema.
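For instance, with the config_schema from the example above, the stored settings would be unpacked like this (a stand-in class is shown here so the snippet is self-contained):

```python
class MyLLMProvider:
    """Stand-in mirroring the example provider's constructor."""

    def __init__(self, api_key: str, model: str = "my-model-v1") -> None:
        self.api_key = api_key
        self.model = model


# Keys match the config_schema; values come from the admin settings UI.
config = {"api_key": "secret-key", "model": "my-model-v2"}
provider = MyLLMProvider(**config)  # unpacked as keyword arguments
```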
Embedding Providers
Embedding provider plugins power the knowledge base’s semantic search. They generate vector embeddings for text.
Directory Layout
```
my_embeddings/
  manifest.py
  provider.py
```
Example
manifest.py:
```python
from app.plugins.base import PluginManifest, PluginType

manifest = PluginManifest(
    name="my_embeddings",
    plugin_type=PluginType.EMBEDDING,
    version="1.0.0",
    description="My custom embedding provider",
    author="Your Name",
    config_schema={
        "api_key": {
            "type": "string",
            "required": True,
            "secret": True,
            "label": "API Key",
        },
    },
)
```
provider.py:
```python
from app.plugins.base import BaseEmbeddingProvider


class MyEmbeddingProvider(BaseEmbeddingProvider):
    """Custom embedding provider.

    Args:
        api_key: API key for authentication.
    """

    def __init__(self, api_key: str = "") -> None:
        self.api_key = api_key

    def embed(self, text: str) -> list[float]:
        """Return the embedding vector for the given text.

        Args:
            text: Text to embed.

        Returns:
            Embedding vector as a list of floats.
        """
        # Call your embedding API here
        return [0.0] * 384

    def get_dimensions(self) -> int:
        """Return the dimensionality of the embedding vectors.

        Returns:
            Number of dimensions.
        """
        return 384
```
Required Methods
| Method | Description |
|---|---|
| embed(text) | Return an embedding vector (list of floats) for the given text |
| get_dimensions() | Return the number of dimensions in the embedding vectors |
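A minimal usage sketch, using a stand-in class that mirrors the example above (the real provider would call its embedding API rather than return zeros):

```python
class MyEmbeddingProvider:
    """Stand-in mirroring the example: fixed 384-dimensional vectors."""

    def embed(self, text: str) -> list[float]:
        return [0.0] * 384

    def get_dimensions(self) -> int:
        return 384


provider = MyEmbeddingProvider()
vector = provider.embed("hello world")
# The knowledge base relies on the two methods agreeing:
assert len(vector) == provider.get_dimensions()
```

Note that get_dimensions() must match the length of the vectors embed() returns, since the knowledge base's vector index is sized from it.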