Provider Plugins

Provider plugins add LLM and embedding backends to TeamWeb AI. Only one provider of each type is active at a time, configured in the global settings under Administration > Settings.

LLM Providers

LLM provider plugins power the assistant’s language model. They translate between TeamWeb AI’s internal message format and the provider’s API.

Directory Layout

my_llm/
    manifest.py
    provider.py

Example

manifest.py:

from app.plugins.base import PluginManifest, PluginType

manifest = PluginManifest(
    name="my_llm",
    plugin_type=PluginType.LLM,
    version="1.0.0",
    description="My custom LLM provider",
    author="Your Name",
    config_schema={
        "api_key": {
            "type": "string",
            "required": True,
            "secret": True,
            "label": "API Key",
        },
        "model": {
            "type": "string",
            "required": False,
            "default": "my-model-v1",
            "label": "Model",
        },
    },
)

provider.py:

from app.plugins.base import BaseLLMProvider, LLMResponse, ToolCall


class MyLLMProvider(BaseLLMProvider):
    """Custom LLM provider.

    Args:
        api_key: API key for authentication.
        model: Model name to use.
    """

    def __init__(self, api_key: str, model: str = "my-model-v1") -> None:
        self.api_key = api_key
        self.model = model

    def create_message(
        self,
        *,
        system: str,
        messages: list[dict],
        tools: list[dict] | None = None,
        max_tokens: int = 16000,
        thinking_budget: int | None = None,
        enable_tool_search: bool = False,
    ) -> LLMResponse:
        """Send a chat completion and return a normalised response.

        Args:
            system: System prompt.
            messages: Conversation messages (Anthropic format).
            tools: Tool schemas (name/description/input_schema).
            max_tokens: Maximum tokens in the response.
            thinking_budget: Extended thinking budget (optional).
            enable_tool_search: Whether to enable tool search (optional).

        Returns:
            Normalised LLMResponse.
        """
        # Call your LLM API here, then normalise the response:
        return LLMResponse(
            text="Hello from my LLM!",
            tool_calls=[],
            stop_reason="end_turn",
            raw_content=[{"type": "text", "text": "Hello from my LLM!"}],
            usage={"input_tokens": 10, "output_tokens": 5},
            model=self.model,
        )

    def calculate_cost(
        self,
        model: str,
        input_tokens: int,
        output_tokens: int,
        cache_creation_tokens: int = 0,
        cache_read_tokens: int = 0,
    ) -> tuple[float, float, float]:
        """Calculate the cost in USD for a given number of tokens.

        Args:
            model: Model identifier.
            input_tokens: Number of input tokens.
            output_tokens: Number of output tokens.
            cache_creation_tokens: Tokens written to cache.
            cache_read_tokens: Tokens read from cache.

        Returns:
            Tuple of (input_cost, output_cost, total_cost) in USD.
        """
        # Illustrative flat pricing: $0.001 per 1K input tokens, $0.002 per 1K output tokens
        input_cost = input_tokens * 0.001 / 1000
        output_cost = output_tokens * 0.002 / 1000
        return input_cost, output_cost, input_cost + output_cost

Message Format

TeamWeb AI uses the Anthropic message format internally. Your provider must translate between this format and your LLM’s native API. See the built-in Ollama provider (app/plugins/ollama_llm/provider.py) for a complete example of translating to and from the OpenAI chat completions format.
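
As a rough sketch of the inbound half of that translation, the helper below flattens Anthropic-style messages into OpenAI chat-completions messages. The function name is hypothetical, and it deliberately ignores tool_use, tool_result, and image blocks, all of which a real provider must also handle:

```python
def anthropic_to_openai(system: str, messages: list[dict]) -> list[dict]:
    """Flatten Anthropic-format messages into OpenAI chat-completions format.

    Hypothetical, simplified helper: only text content is handled here.
    """
    out = [{"role": "system", "content": system}]
    for msg in messages:
        content = msg["content"]
        if isinstance(content, str):
            # Anthropic also allows plain-string content
            out.append({"role": msg["role"], "content": content})
        else:
            # Concatenate text blocks; tool blocks need separate handling
            text = "".join(b["text"] for b in content if b.get("type") == "text")
            out.append({"role": msg["role"], "content": text})
    return out
```

The reverse direction (normalising the provider's response into an LLMResponse) is symmetric; again, see the Ollama provider for the full treatment.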

Technical Details

Format normalization — All conversation history, tool calls, and tool results use Anthropic’s content block structure internally ({"type": "text", ...}, {"type": "tool_use", ...}, {"type": "tool_result", ...}). Providers for other APIs must translate inbound messages from Anthropic format to their native format, and translate LLM responses back. The raw_content field on LLMResponse must be in Anthropic format because it is appended directly to the conversation message history.
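
For illustration, a single tool-calling round trip looks like this in the internal history (the calculator tool and all identifiers are made up):

```python
# One tool-calling round trip in Anthropic content-block structure.
# Note that tool results come back as a *user* turn containing tool_result blocks.
history = [
    {"role": "user", "content": [{"type": "text", "text": "What's 2+2?"}]},
    {"role": "assistant", "content": [
        {"type": "tool_use", "id": "call_1", "name": "calculator",
         "input": {"expression": "2+2"}},
    ]},
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "call_1", "content": "4"},
    ]},
]
```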

Prompt caching — The built-in Anthropic provider implements prompt caching using ephemeral cache control markers. The system prompt and tool definitions are marked as cacheable, and a cache breakpoint is placed at the end of the conversation history prefix. This means the system prompt, tools, and earlier messages are cached across LLM calls within the same agent loop — only new messages are reprocessed. For multi-turn agent loops with many tool calls, this dramatically reduces input token costs.
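
A simplified sketch of that marker placement is shown below. The helper name and exact breakpoint policy are illustrative; consult the built-in Anthropic provider for the real logic:

```python
def add_cache_breakpoints(system: str, tools: list[dict], messages: list[dict]):
    """Mark the system prompt, tool list, and history prefix as cacheable.

    Illustrative sketch: Anthropic's API accepts an ephemeral
    {"cache_control": {"type": "ephemeral"}} marker on the last block
    of each cacheable region.
    """
    system_blocks = [{"type": "text", "text": system,
                      "cache_control": {"type": "ephemeral"}}]
    tools = [dict(t) for t in tools]
    if tools:
        tools[-1] = {**tools[-1], "cache_control": {"type": "ephemeral"}}
    messages = [dict(m) for m in messages]
    if len(messages) > 1:
        # Breakpoint at the end of the history prefix: everything up to and
        # including the second-to-last message is served from cache.
        prefix_end = messages[-2]
        blocks = prefix_end.get("content")
        if isinstance(blocks, list) and blocks:
            blocks = [dict(b) for b in blocks]
            blocks[-1] = {**blocks[-1], "cache_control": {"type": "ephemeral"}}
            prefix_end["content"] = blocks
    return system_blocks, tools, messages
```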

Cost calculation — Token pricing accounts for prompt caching multipliers: tokens written to the cache cost 1.25x the normal input price, while tokens read from the cache cost only 0.1x. This makes caching highly cost-effective for conversations with multiple back-and-forth iterations between the LLM and tools.
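
Applied to the calculate_cost hook, those multipliers might look like this. The $3/$15-per-million prices are placeholders for illustration, not TeamWeb AI's actual pricing:

```python
def calculate_cost_with_caching(
    input_tokens: int,
    output_tokens: int,
    cache_creation_tokens: int = 0,
    cache_read_tokens: int = 0,
    input_price: float = 3.0,    # illustrative USD per million input tokens
    output_price: float = 15.0,  # illustrative USD per million output tokens
) -> tuple[float, float, float]:
    """Return (input_cost, output_cost, total_cost) in USD with cache multipliers."""
    input_cost = (
        input_tokens * input_price
        + cache_creation_tokens * input_price * 1.25  # cache writes: 1.25x
        + cache_read_tokens * input_price * 0.1       # cache reads: 0.1x
    ) / 1_000_000
    output_cost = output_tokens * output_price / 1_000_000
    return input_cost, output_cost, input_cost + output_cost
```

With these placeholder prices, reading 9,000 tokens from cache plus 1,000 fresh input tokens costs $0.0057, versus $0.03 if all 10,000 were billed at the full input rate.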

Rate limiting — The built-in Anthropic provider tracks token and request budgets from the LLM’s response headers, shared across all workers via Redis. When remaining capacity drops below a safety threshold (5% for tokens, 10% for requests), the system proactively waits until the budget resets rather than hitting rate limit errors. If a rate limit error does occur, the system uses exponential backoff (starting at 60 seconds, doubling up to a maximum of 5 minutes).
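
The retry schedule described above can be sketched as follows (a minimal illustration of the delay sequence, not the built-in implementation):

```python
def backoff_delay(attempt: int, base: float = 60.0, cap: float = 300.0) -> float:
    """Delay in seconds before retry `attempt` (0-based).

    Doubles from 60s each attempt, capped at 5 minutes: 60, 120, 240, 300, 300...
    """
    return min(base * (2 ** attempt), cap)
```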

LLMResponse Fields

  • text: Final text content (None if the response contains only tool calls)
  • tool_calls: List of ToolCall objects with id, name, and input
  • stop_reason: Why the model stopped: "end_turn", "tool_use", or "max_tokens"
  • raw_content: Raw content blocks in Anthropic format for the message history
  • usage: Token usage dict with input_tokens and output_tokens
  • model: Model identifier string
  • thinking_text: Extended thinking content (if supported by the model)

raw_content Format

The raw_content field must be in Anthropic format because it is appended directly to the conversation message history. Each element is a dict:

  • Text: {"type": "text", "text": "..."}
  • Tool use: {"type": "tool_use", "id": "...", "name": "...", "input": {...}}
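
For example, a response that pairs text with a tool call would set raw_content as below (the tool name and id are illustrative):

```python
# Hypothetical raw_content for a response that mixes text and a tool call.
raw_content = [
    {"type": "text", "text": "Let me look that up."},
    {"type": "tool_use", "id": "toolu_01", "name": "search_kb",
     "input": {"query": "vacation policy"}},
]
# This is appended verbatim to the history as an assistant turn:
assistant_turn = {"role": "assistant", "content": raw_content}
```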

Constructor

The provider class is instantiated with configuration values unpacked as keyword arguments. Define your __init__ parameters to match the keys in your config_schema.
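
In other words, for the example config_schema earlier in this page, instantiation is roughly equivalent to the following (a minimal stand-in class is repeated here so the snippet is self-contained):

```python
class MyLLMProvider:
    # Minimal stand-in for the example provider class
    def __init__(self, api_key: str, model: str = "my-model-v1") -> None:
        self.api_key = api_key
        self.model = model


# Config values come from the saved settings; keys must match __init__ parameters
config = {"api_key": "sk-example", "model": "my-model-v1"}
provider = MyLLMProvider(**config)
```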


Embedding Providers

Embedding provider plugins power the knowledge base’s semantic search. They generate vector embeddings for text.

Directory Layout

my_embeddings/
    manifest.py
    provider.py

Example

manifest.py:

from app.plugins.base import PluginManifest, PluginType

manifest = PluginManifest(
    name="my_embeddings",
    plugin_type=PluginType.EMBEDDING,
    version="1.0.0",
    description="My custom embedding provider",
    author="Your Name",
    config_schema={
        "api_key": {
            "type": "string",
            "required": True,
            "secret": True,
            "label": "API Key",
        },
    },
)

provider.py:

from app.plugins.base import BaseEmbeddingProvider


class MyEmbeddingProvider(BaseEmbeddingProvider):
    """Custom embedding provider.

    Args:
        api_key: API key for authentication.
    """

    def __init__(self, api_key: str = "") -> None:
        self.api_key = api_key

    def embed(self, text: str) -> list[float]:
        """Return the embedding vector for the given text.

        Args:
            text: Text to embed.

        Returns:
            Embedding vector as a list of floats.
        """
        # Call your embedding API here
        return [0.0] * 384

    def get_dimensions(self) -> int:
        """Return the dimensionality of the embedding vectors.

        Returns:
            Number of dimensions.
        """
        return 384

Required Methods

  • embed(text): Return an embedding vector (list of floats) for the given text
  • get_dimensions(): Return the number of dimensions in the embedding vectors

Changing the active embedding provider or its dimensions requires re-embedding all existing knowledge base content, since vectors from different models are not comparable.