Provider Plugins

Provider plugins add LLM and embedding backends to TeamWeb AI. Only one provider of each type is active at a time, configured in the global settings under Administration > Settings.

LLM Providers

LLM provider plugins power the assistant’s language model. They translate between TeamWeb AI’s internal message format and the provider’s API.

Directory Layout

my_llm/
    manifest.py
    provider.py

Example

manifest.py:

from app.plugins.base import PluginManifest, PluginType

manifest = PluginManifest(
    name="my_llm",
    plugin_type=PluginType.LLM,
    version="1.0.0",
    description="My custom LLM provider",
    author="Your Name",
    config_schema={
        "api_key": {
            "type": "string",
            "required": True,
            "secret": True,
            "label": "API Key",
        },
        "model": {
            "type": "string",
            "required": False,
            "default": "my-model-v1",
            "label": "Model",
        },
    },
)

provider.py:

from app.plugins.base import (
    BaseLLMProvider,
    LLMResponse,
    NeutralMessage,
    NeutralTool,
    NeutralToolChoice,
    ToolCall,
)


class MyLLMProvider(BaseLLMProvider):
    """Custom LLM provider.

    Args:
        api_key: API key for authentication.
        model: Model name to use.
    """

    def __init__(self, api_key: str, model: str = "my-model-v1") -> None:
        self.api_key = api_key
        self.model = model

    def create_message(
        self,
        *,
        system: str,
        messages: list[NeutralMessage],
        tools: list[NeutralTool] | None = None,
        max_tokens: int = 4096,
        thinking_budget: int | None = None,
        model: str | None = None,
        tool_choice: NeutralToolChoice | None = None,
    ) -> LLMResponse:
        """Send a request to your LLM and return a normalised response.

        Args:
            system: System prompt (plain string).
            messages: Conversation history as neutral messages. Each
                message has a role and a list of content blocks.
            tools: Neutral tool definitions, or None for no tools.
            max_tokens: Maximum tokens in the response.
            thinking_budget: Reasoning/thinking budget (optional;
                providers without reasoning support ignore it).
            model: Override the provider's default model (optional).
            tool_choice: Neutral tool-choice directive (optional).

        Returns:
            Normalised LLMResponse with neutral content blocks in
            `blocks` — ready to append to conversation history.
        """
        # Translate neutral → your vendor API, call it, translate
        # back. See app/plugins/anthropic_llm/translator.py,
        # app/plugins/openai_llm/translator.py, and
        # app/plugins/ollama_llm/translator.py for reference
        # implementations.
        return LLMResponse(
            text="Hello from my LLM!",
            tool_calls=[],
            stop_reason="end_turn",
            blocks=[{"type": "text", "text": "Hello from my LLM!"}],
            usage={"input_tokens": 10, "output_tokens": 5},
            model=self.model,
        )

    def calculate_cost(
        self,
        model: str,
        input_tokens: int,
        output_tokens: int,
        cache_creation_tokens: int = 0,
        cache_read_tokens: int = 0,
    ) -> tuple[float, float, float]:
        """Calculate the cost in USD for a given number of tokens.

        Args:
            model: Model identifier.
            input_tokens: Number of input tokens.
            output_tokens: Number of output tokens.
            cache_creation_tokens: Tokens written to cache.
            cache_read_tokens: Tokens read from cache.

        Returns:
            Tuple of (input_cost, output_cost, total_cost) in USD.
        """
        # Example flat pricing; replace with your provider's real rates
        # (this stub ignores the cache token arguments).
        input_cost = input_tokens * 0.001 / 1000
        output_cost = output_tokens * 0.002 / 1000
        return input_cost, output_cost, input_cost + output_cost

Message Format

TeamWeb AI uses a vendor-neutral content-block format internally. Your provider translates between neutral blocks and your LLM’s native API at the edges — conversation history, tool calls, thinking, and cache hints all flow through the neutral shape so the agent loop is identical regardless of which provider answers. Each built-in plugin has a dedicated translator.py that holds the neutral ↔ vendor conversion; see:

  • app/plugins/anthropic_llm/translator.py — neutral ↔ Anthropic Messages API
  • app/plugins/openai_llm/translator.py — neutral ↔ OpenAI Responses API
  • app/plugins/ollama_llm/translator.py — neutral ↔ OpenAI Chat Completions (what Ollama exposes)
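
As a rough illustration of the neutral → vendor direction, the sketch below flattens neutral messages into a made-up chat-style payload. The vendor format, the dict access on messages, and the handling choices are all assumptions for the example; the translator.py files listed above are the authoritative references.

def to_vendor_messages(system: str, messages: list) -> list[dict]:
    """Flatten neutral messages into an invented chat-style payload."""
    vendor: list[dict] = [{"role": "system", "content": system}]
    for msg in messages:
        parts: list[str] = []
        for block in msg["content"]:
            if block["type"] == "text":
                parts.append(block["text"])
            elif block["type"] == "tool_result":
                # This imaginary vendor has no tool-result concept, so the
                # result is inlined as plain text.
                parts.append(str(block.get("content", "")))
            elif block["type"] == "cache_breakpoint":
                continue  # no caching support: drop the hint silently
        vendor.append({"role": msg["role"], "content": "\n".join(parts)})
    return vendor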

Technical Details

Format normalization — Conversation history, tool calls, and tool results use neutral content blocks defined in app/plugins/base.py: TextBlock, ToolUseBlock, ToolResultBlock, ThinkingBlock, CacheBreakpointBlock. The string type names are deliberately identical to Anthropic’s vocabulary ("text", "tool_use", etc.) so any legacy conversation stored as Anthropic-shaped dicts round-trips unchanged. The blocks field on LLMResponse must contain neutral blocks — the agent loop appends it straight to the message history.
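
For illustration, a short conversation with one tool round-trip might look like this as neutral blocks (shown as plain dicts; the tool name is invented, and the tool_result field names follow Anthropic's shape, so check the block definitions in app/plugins/base.py for the exact fields):

history = [
    {"role": "user", "content": [
        {"type": "text", "text": "What is 2 + 2?"},
    ]},
    {"role": "assistant", "content": [
        {"type": "tool_use", "id": "call_1", "name": "calculator",
         "input": {"expression": "2 + 2"}},
    ]},
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "call_1", "content": "4"},
    ]},
]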

Prompt caching — Providers that support prompt caching (currently the built-in Anthropic provider) consume a neutral {"type": "cache_breakpoint"} block as a hint and translate it into the vendor’s cache primitive — e.g., the Anthropic translator attaches cache_control: {"type": "ephemeral"} to the preceding content block and drops the breakpoint. Providers without caching drop the marker silently. The agent loop inserts a breakpoint between the conversation prefix and the current turn so the system prompt, tools, and earlier messages are cached across LLM calls within the same agent loop — only new messages are reprocessed. For multi-turn agent loops with many tool calls, this dramatically reduces input token costs.
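
A minimal sketch of that breakpoint handling, assuming blocks are plain dicts and the vendor uses an Anthropic-style cache_control field:

def apply_cache_breakpoints(blocks: list[dict]) -> list[dict]:
    """Attach the vendor's cache primitive to the block preceding each
    cache_breakpoint marker, then drop the marker itself."""
    out: list[dict] = []
    for block in blocks:
        if block["type"] == "cache_breakpoint":
            if out:
                out[-1] = {**out[-1], "cache_control": {"type": "ephemeral"}}
            continue
        out.append(dict(block))
    return out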

Cost calculation — Token pricing accounts for prompt caching multipliers: tokens written to the cache cost 1.25x the normal input price, while tokens read from the cache cost only 0.1x. This makes caching highly cost-effective for conversations with multiple back-and-forth iterations between the LLM and tools.
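
A hedged sketch of a calculate_cost implementation that applies those multipliers; the per-token prices are placeholders, not real rates:

INPUT_PRICE = 3.00 / 1_000_000    # USD per input token (placeholder rate)
OUTPUT_PRICE = 15.00 / 1_000_000  # USD per output token (placeholder rate)

def cost_with_cache(input_tokens: int, output_tokens: int,
                    cache_creation_tokens: int = 0,
                    cache_read_tokens: int = 0) -> tuple[float, float, float]:
    input_cost = (
        input_tokens * INPUT_PRICE
        + cache_creation_tokens * INPUT_PRICE * 1.25  # cache writes: 1.25x
        + cache_read_tokens * INPUT_PRICE * 0.1       # cache reads: 0.1x
    )
    output_cost = output_tokens * OUTPUT_PRICE
    return input_cost, output_cost, input_cost + output_cost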

Rate limiting — The built-in Anthropic provider tracks token and request budgets from the LLM’s response headers, shared across all workers via Redis. When remaining capacity drops below a safety threshold (5% for tokens, 10% for requests), the system proactively waits until the budget resets rather than hitting rate limit errors. If a rate limit error does occur, the system uses exponential backoff (starting at 60 seconds, doubling up to a maximum of 5 minutes).
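
The backoff schedule described above works out to the following, shown here as a standalone sketch rather than the actual retry code:

def backoff_seconds(attempt: int) -> int:
    # attempt 0 -> 60s, 1 -> 120s, 2 -> 240s, 3 and above -> 300s (5 min cap)
    return min(60 * (2 ** attempt), 300)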

LLMResponse Fields

  • text: Concatenated text-block content (None if response only contains tool calls or thinking)
  • tool_calls: List of ToolCall objects with id, name, and input — a convenience view over blocks
  • stop_reason: Why the model stopped: "end_turn", "tool_use", "max_tokens", or "refusal"
  • blocks: Full assistant turn as neutral content blocks — appended to conversation history verbatim
  • usage: Token usage dict with at least input_tokens and output_tokens (plus any provider-specific extras like reasoning_tokens or cache fields)
  • model: Model identifier string
  • thinking_text: Concatenated thinking-block text (if the model emitted any)
  • rate_limit_headers: Optional dict of vendor rate-limit response headers
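
For example, a turn in which the model calls a tool might produce a response like the one below (the tool name and token counts are invented, and the ToolCall keyword arguments are assumed from the field description above):

response = LLMResponse(
    text=None,
    tool_calls=[ToolCall(id="call_1", name="search_kb",
                         input={"query": "pricing"})],
    stop_reason="tool_use",
    blocks=[
        {"type": "tool_use", "id": "call_1", "name": "search_kb",
         "input": {"query": "pricing"}},
    ],
    usage={"input_tokens": 412, "output_tokens": 37},
    model="my-model-v1",
)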

blocks Format

The blocks field carries neutral content blocks, one per element. The agent loop appends it unchanged to the conversation history. Each element is a dict with a type discriminator:

  • Text: {"type": "text", "text": "..."}
  • Tool use: {"type": "tool_use", "id": "...", "name": "...", "input": {...}}
  • Thinking: {"type": "thinking", "text": "...", "signature": "..."} (signature carries the provider-specific continuation token; empty string when the vendor doesn’t produce one)

Constructor

The provider class is instantiated with configuration values unpacked as keyword arguments. Define your __init__ parameters to match the keys in your config_schema.
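
With the example manifest above, that means the plugin loader effectively does the equivalent of:

config = {"api_key": "sk-...", "model": "my-model-v1"}  # values from plugin settings
provider = MyLLMProvider(**config)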


Embedding Providers

Embedding provider plugins power the knowledge base’s semantic search. They generate vector embeddings for text.

Directory Layout

my_embeddings/
    manifest.py
    provider.py

Example

manifest.py:

from app.plugins.base import PluginManifest, PluginType

manifest = PluginManifest(
    name="my_embeddings",
    plugin_type=PluginType.EMBEDDING,
    version="1.0.0",
    description="My custom embedding provider",
    author="Your Name",
    config_schema={
        "api_key": {
            "type": "string",
            "required": True,
            "secret": True,
            "label": "API Key",
        },
    },
)

provider.py:

from app.plugins.base import BaseEmbeddingProvider


class MyEmbeddingProvider(BaseEmbeddingProvider):
    """Custom embedding provider.

    Args:
        api_key: API key for authentication.
    """

    def __init__(self, api_key: str = "") -> None:
        self.api_key = api_key

    def embed(self, text: str) -> list[float]:
        """Return the embedding vector for the given text.

        Args:
            text: Text to embed.

        Returns:
            Embedding vector as a list of floats.
        """
        # Call your embedding API here
        return [0.0] * 384

    def get_dimensions(self) -> int:
        """Return the dimensionality of the embedding vectors.

        Returns:
            Number of dimensions.
        """
        return 384

Required Methods

  • embed(text): Return an embedding vector (list of floats) for the given text
  • get_dimensions(): Return the number of dimensions in the embedding vectors

Changing the active embedding provider or its dimensions requires re-embedding all existing knowledge base content, since vectors from different models are not comparable.
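
A quick sanity check for a custom provider is to confirm that embed() and get_dimensions() agree:

provider = MyEmbeddingProvider(api_key="...")
vector = provider.embed("hello world")
assert len(vector) == provider.get_dimensions()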