Provider Plugins

Provider plugins add LLM and embedding backends to TeamWeb AI. Only one provider of each type is active at a time, configured in the global settings under Administration > Settings.

LLM Providers

LLM provider plugins power the assistant’s language model. They translate between TeamWeb AI’s internal message format and the provider’s API.

Directory Layout

my_llm/
    manifest.py
    provider.py

Example

manifest.py:

from app.plugins.base import PluginManifest, PluginType

manifest = PluginManifest(
    name="my_llm",
    plugin_type=PluginType.LLM,
    version="1.0.0",
    description="My custom LLM provider",
    author="Your Name",
    config_schema={
        "api_key": {
            "type": "string",
            "required": True,
            "secret": True,
            "label": "API Key",
        },
        "model": {
            "type": "string",
            "required": False,
            "default": "my-model-v1",
            "label": "Model",
        },
    },
)

provider.py:

from app.plugins.base import (
    BaseLLMProvider,
    LLMResponse,
    NeutralMessage,
    NeutralTool,
    NeutralToolChoice,
    ToolCall,
)


class MyLLMProvider(BaseLLMProvider):
    """Custom LLM provider.

    Args:
        api_key: API key for authentication.
        model: Model name to use.
    """

    def __init__(self, api_key: str, model: str = "my-model-v1") -> None:
        self.api_key = api_key
        self.model = model

    def create_message(
        self,
        *,
        system: str,
        messages: list[NeutralMessage],
        tools: list[NeutralTool] | None = None,
        max_tokens: int = 4096,
        thinking_budget: int | None = None,
        model: str | None = None,
        tool_choice: NeutralToolChoice | None = None,
    ) -> LLMResponse:
        """Send a request to your LLM and return a normalised response.

        Args:
            system: System prompt (plain string).
            messages: Conversation history as neutral messages. Each
                message has a role and a list of content blocks.
            tools: Neutral tool definitions, or None for no tools.
            max_tokens: Maximum tokens in the response.
            thinking_budget: Reasoning/thinking budget (optional;
                providers without reasoning support ignore it).
            model: Override the provider's default model (optional).
            tool_choice: Neutral tool-choice directive (optional).

        Returns:
            Normalised LLMResponse with neutral content blocks in
            `blocks` — ready to append to conversation history.
        """
        # Translate neutral → your vendor API, call it, translate
        # back. See app/plugins/anthropic_llm/translator.py,
        # app/plugins/openai_llm/translator.py, and
        # app/plugins/ollama_llm/translator.py for reference
        # implementations.
        return LLMResponse(
            text="Hello from my LLM!",
            tool_calls=[],
            stop_reason="end_turn",
            blocks=[{"type": "text", "text": "Hello from my LLM!"}],
            usage={"input_tokens": 10, "output_tokens": 5},
            model=self.model,
        )

    def calculate_cost(
        self,
        model: str,
        input_tokens: int,
        output_tokens: int,
        cache_creation_tokens: int = 0,
        cache_read_tokens: int = 0,
    ) -> tuple[float, float, float]:
        """Calculate the cost in USD for a given number of tokens.

        Args:
            model: Model identifier.
            input_tokens: Number of input tokens.
            output_tokens: Number of output tokens.
            cache_creation_tokens: Tokens written to cache.
            cache_read_tokens: Tokens read from cache.

        Returns:
            Tuple of (input_cost, output_cost, total_cost) in USD.
        """
        # Example flat pricing; replace with your provider's real rates
        # (this stub ignores the cache token arguments).
        input_cost = input_tokens * 0.001 / 1000
        output_cost = output_tokens * 0.002 / 1000
        return input_cost, output_cost, input_cost + output_cost

Message Format

TeamWeb AI uses a vendor-neutral content-block format internally. Your provider translates between neutral blocks and your LLM’s native API at the edges — conversation history, tool calls, thinking, and cache hints all flow through the neutral shape so the agent loop is identical regardless of which provider answers. Each built-in plugin has a dedicated translator.py that holds the neutral ↔ vendor conversion; see:

  • app/plugins/anthropic_llm/translator.py — neutral ↔ Anthropic Messages API
  • app/plugins/openai_llm/translator.py — neutral ↔ OpenAI Responses API
  • app/plugins/ollama_llm/translator.py — neutral ↔ OpenAI Chat Completions (what Ollama exposes)
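
As a rough illustration of the neutral → vendor direction, the sketch below flattens neutral messages into a made-up chat-style payload. The vendor format, the dict access on messages, and the handling choices are all assumptions for the example; the translator.py files listed above are the authoritative references.

def to_vendor_messages(system: str, messages: list) -> list[dict]:
    """Flatten neutral messages into an invented chat-style payload."""
    vendor: list[dict] = [{"role": "system", "content": system}]
    for msg in messages:
        parts: list[str] = []
        for block in msg["content"]:
            if block["type"] == "text":
                parts.append(block["text"])
            elif block["type"] == "tool_result":
                # This imaginary vendor has no tool-result concept, so the
                # result is inlined as plain text.
                parts.append(str(block.get("content", "")))
            elif block["type"] == "cache_breakpoint":
                continue  # no caching support: drop the hint silently
        vendor.append({"role": msg["role"], "content": "\n".join(parts)})
    return vendor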

Technical Details

Format normalization — Conversation history, tool calls, and tool results use neutral content blocks defined in app/plugins/base.py: TextBlock, ToolUseBlock, ToolResultBlock, ThinkingBlock, CacheBreakpointBlock. The string type names are deliberately identical to Anthropic’s vocabulary ("text", "tool_use", etc.) so any legacy conversation stored as Anthropic-shaped dicts round-trips unchanged. The blocks field on LLMResponse must contain neutral blocks — the agent loop appends it straight to the message history.
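
For illustration, a short conversation with one tool round-trip might look like this as neutral blocks (shown as plain dicts; the tool name is invented, and the tool_result field names follow Anthropic's shape, so check the block definitions in app/plugins/base.py for the exact fields):

history = [
    {"role": "user", "content": [
        {"type": "text", "text": "What is 2 + 2?"},
    ]},
    {"role": "assistant", "content": [
        {"type": "tool_use", "id": "call_1", "name": "calculator",
         "input": {"expression": "2 + 2"}},
    ]},
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "call_1", "content": "4"},
    ]},
]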

Prompt caching — Providers that support prompt caching (currently the built-in Anthropic provider) consume a neutral {"type": "cache_breakpoint"} block as a hint and translate it into the vendor’s cache primitive — e.g., the Anthropic translator attaches cache_control: {"type": "ephemeral"} to the preceding content block and drops the breakpoint. Providers without caching drop the marker silently. The agent loop inserts a breakpoint between the conversation prefix and the current turn so the system prompt, tools, and earlier messages are cached across LLM calls within the same agent loop — only new messages are reprocessed. For multi-turn agent loops with many tool calls, this dramatically reduces input token costs.
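
A minimal sketch of that breakpoint handling, assuming blocks are plain dicts and the vendor uses an Anthropic-style cache_control field:

def apply_cache_breakpoints(blocks: list[dict]) -> list[dict]:
    """Attach the vendor's cache primitive to the block preceding each
    cache_breakpoint marker, then drop the marker itself."""
    out: list[dict] = []
    for block in blocks:
        if block["type"] == "cache_breakpoint":
            if out:
                out[-1] = {**out[-1], "cache_control": {"type": "ephemeral"}}
            continue
        out.append(dict(block))
    return out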

Cost calculation — Token pricing accounts for prompt caching multipliers: tokens written to the cache cost 1.25x the normal input price, while tokens read from the cache cost only 0.1x. This makes caching highly cost-effective for conversations with multiple back-and-forth iterations between the LLM and tools.
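
A hedged sketch of a calculate_cost implementation that applies those multipliers; the per-token prices are placeholders, not real rates:

INPUT_PRICE = 3.00 / 1_000_000    # USD per input token (placeholder rate)
OUTPUT_PRICE = 15.00 / 1_000_000  # USD per output token (placeholder rate)

def cost_with_cache(input_tokens: int, output_tokens: int,
                    cache_creation_tokens: int = 0,
                    cache_read_tokens: int = 0) -> tuple[float, float, float]:
    input_cost = (
        input_tokens * INPUT_PRICE
        + cache_creation_tokens * INPUT_PRICE * 1.25  # cache writes: 1.25x
        + cache_read_tokens * INPUT_PRICE * 0.1       # cache reads: 0.1x
    )
    output_cost = output_tokens * OUTPUT_PRICE
    return input_cost, output_cost, input_cost + output_cost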

Rate limiting — The built-in Anthropic provider tracks token and request budgets from the LLM’s response headers, shared across all workers via Redis. When remaining capacity drops below a safety threshold (5% for tokens, 10% for requests), the system proactively waits until the budget resets rather than hitting rate limit errors. If a rate limit error does occur, the system uses exponential backoff (starting at 60 seconds, doubling up to a maximum of 5 minutes).
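
The backoff schedule described above works out to the following, shown here as a standalone sketch rather than the actual retry code:

def backoff_seconds(attempt: int) -> int:
    # attempt 0 -> 60s, 1 -> 120s, 2 -> 240s, 3 and above -> 300s (5 min cap)
    return min(60 * (2 ** attempt), 300)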

LLMResponse Fields

  • text: Concatenated text-block content (None if response only contains tool calls or thinking)
  • tool_calls: List of ToolCall objects with id, name, and input — a convenience view over blocks
  • stop_reason: Why the model stopped: "end_turn", "tool_use", "max_tokens", or "refusal"
  • blocks: Full assistant turn as neutral content blocks — appended to conversation history verbatim
  • usage: Token usage dict with at least input_tokens and output_tokens (plus any provider-specific extras like reasoning_tokens or cache fields)
  • model: Model identifier string
  • thinking_text: Concatenated thinking-block text (if the model emitted any)
  • rate_limit_headers: Optional dict of vendor rate-limit response headers
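
For example, a turn in which the model calls a tool might produce a response like the one below (the tool name and token counts are invented, and the ToolCall keyword arguments are assumed from the field description above):

response = LLMResponse(
    text=None,
    tool_calls=[ToolCall(id="call_1", name="search_kb",
                         input={"query": "pricing"})],
    stop_reason="tool_use",
    blocks=[
        {"type": "tool_use", "id": "call_1", "name": "search_kb",
         "input": {"query": "pricing"}},
    ],
    usage={"input_tokens": 412, "output_tokens": 37},
    model="my-model-v1",
)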

blocks Format

The blocks field carries neutral content blocks, one per element. The agent loop appends it unchanged to the conversation history. Each element is a dict with a type discriminator:

  • Text: {"type": "text", "text": "..."}
  • Tool use: {"type": "tool_use", "id": "...", "name": "...", "input": {...}}
  • Thinking: {"type": "thinking", "text": "...", "signature": "..."} (signature carries the provider-specific continuation token; empty string when the vendor doesn’t produce one)

Constructor

The provider class is instantiated with configuration values unpacked as keyword arguments. Define your __init__ parameters to match the keys in your config_schema.
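
With the example manifest above, that means the plugin loader effectively does the equivalent of:

config = {"api_key": "sk-...", "model": "my-model-v1"}  # values from plugin settings
provider = MyLLMProvider(**config)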


Embedding Providers

Embedding provider plugins power the knowledge base’s semantic search. They generate vector embeddings for text.

Directory Layout

my_embeddings/
    manifest.py
    provider.py

Example

manifest.py:

from app.plugins.base import PluginManifest, PluginType

manifest = PluginManifest(
    name="my_embeddings",
    plugin_type=PluginType.EMBEDDING,
    version="1.0.0",
    description="My custom embedding provider",
    author="Your Name",
    config_schema={
        "api_key": {
            "type": "string",
            "required": True,
            "secret": True,
            "label": "API Key",
        },
    },
)

provider.py:

from app.plugins.base import BaseEmbeddingProvider


class MyEmbeddingProvider(BaseEmbeddingProvider):
    """Custom embedding provider.

    Args:
        api_key: API key for authentication.
    """

    def __init__(self, api_key: str = "") -> None:
        self.api_key = api_key

    def embed(self, text: str) -> list[float]:
        """Return the embedding vector for the given text.

        Args:
            text: Text to embed.

        Returns:
            Embedding vector as a list of floats.
        """
        # Call your embedding API here
        return [0.0] * 384

    def get_dimensions(self) -> int:
        """Return the dimensionality of the embedding vectors.

        Returns:
            Number of dimensions.
        """
        return 384

Required Methods

  • embed(text): Return an embedding vector (list of floats) for the given text
  • get_dimensions(): Return the number of dimensions in the embedding vectors

Changing the active embedding provider or its dimensions requires re-embedding all existing knowledge base content, since vectors from different models are not comparable.
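
A quick sanity check for a custom provider is to confirm that embed() and get_dimensions() agree:

provider = MyEmbeddingProvider(api_key="...")
vector = provider.embed("hello world")
assert len(vector) == provider.get_dimensions()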