Provider Plugins
Provider plugins add LLM and embedding backends to TeamWeb AI. Only one provider of each type is active at a time, configured in the global settings under Administration > Settings.
LLM Providers
LLM provider plugins power the assistant’s language model. They translate between TeamWeb AI’s internal message format and the provider’s API.
Directory Layout
```
my_llm/
  manifest.py
  provider.py
```
Example
manifest.py:
```python
from app.plugins.base import PluginManifest, PluginType

manifest = PluginManifest(
    name="my_llm",
    plugin_type=PluginType.LLM,
    version="1.0.0",
    description="My custom LLM provider",
    author="Your Name",
    config_schema={
        "api_key": {
            "type": "string",
            "required": True,
            "secret": True,
            "label": "API Key",
        },
        "model": {
            "type": "string",
            "required": False,
            "default": "my-model-v1",
            "label": "Model",
        },
    },
)
```
provider.py:
```python
from app.plugins.base import BaseLLMProvider, LLMResponse, ToolCall


class MyLLMProvider(BaseLLMProvider):
    """Custom LLM provider.

    Args:
        api_key: API key for authentication.
        model: Model name to use.
    """

    def __init__(self, api_key: str, model: str = "my-model-v1") -> None:
        self.api_key = api_key
        self.model = model

    def create_message(
        self,
        *,
        system: str,
        messages: list[dict],
        tools: list[dict] | None = None,
        max_tokens: int = 16000,
        thinking_budget: int | None = None,
        enable_tool_search: bool = False,
    ) -> LLMResponse:
        """Send a chat completion and return a normalised response.

        Args:
            system: System prompt.
            messages: Conversation messages (Anthropic format).
            tools: Tool schemas (name/description/input_schema).
            max_tokens: Maximum tokens in the response.
            thinking_budget: Extended thinking budget (optional).
            enable_tool_search: Whether to enable tool search (optional).

        Returns:
            Normalised LLMResponse.
        """
        # Call your LLM API here, then normalise the response:
        return LLMResponse(
            text="Hello from my LLM!",
            tool_calls=[],
            stop_reason="end_turn",
            raw_content=[{"type": "text", "text": "Hello from my LLM!"}],
            usage={"input_tokens": 10, "output_tokens": 5},
            model=self.model,
        )

    def calculate_cost(
        self,
        model: str,
        input_tokens: int,
        output_tokens: int,
        cache_creation_tokens: int = 0,
        cache_read_tokens: int = 0,
    ) -> tuple[float, float, float]:
        """Calculate the cost in USD for a given number of tokens.

        Args:
            model: Model identifier.
            input_tokens: Number of input tokens.
            output_tokens: Number of output tokens.
            cache_creation_tokens: Tokens written to cache.
            cache_read_tokens: Tokens read from cache.

        Returns:
            Tuple of (input_cost, output_cost, total_cost) in USD.
        """
        input_cost = input_tokens * 0.001 / 1000
        output_cost = output_tokens * 0.002 / 1000
        return input_cost, output_cost, input_cost + output_cost
```
Message Format
TeamWeb AI uses the Anthropic message format internally. Your provider must translate between this format and your LLM’s native API. See the built-in Ollama provider (app/plugins/ollama_llm/provider.py) for a complete example of translating to and from the OpenAI chat completions format.
Technical Details
Format normalization — All conversation history, tool calls, and tool results use Anthropic’s content block structure internally ({"type": "text", ...}, {"type": "tool_use", ...}, {"type": "tool_result", ...}). Providers for other APIs must translate inbound messages from Anthropic format to their native format, and translate LLM responses back. The raw_content field on LLMResponse must be in Anthropic format because it is appended directly to the conversation message history.
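As a concrete illustration, a provider targeting an OpenAI-style chat API might translate inbound Anthropic blocks with a helper along these lines (the function name and exact field mapping here are hypothetical; the built-in Ollama provider is the authoritative example):

```python
import json


def anthropic_to_openai(messages: list[dict]) -> list[dict]:
    """Translate Anthropic-format messages into OpenAI-style chat messages.

    Illustrative sketch only: a real provider must also handle mixed
    content blocks, images, and provider-specific quirks.
    """
    out: list[dict] = []
    for msg in messages:
        content = msg["content"]
        if isinstance(content, str):
            out.append({"role": msg["role"], "content": content})
            continue
        for block in content:
            if block["type"] == "text":
                out.append({"role": msg["role"], "content": block["text"]})
            elif block["type"] == "tool_use":
                # Anthropic tool_use block -> OpenAI assistant tool call
                out.append({
                    "role": "assistant",
                    "tool_calls": [{
                        "id": block["id"],
                        "type": "function",
                        "function": {
                            "name": block["name"],
                            "arguments": json.dumps(block["input"]),
                        },
                    }],
                })
            elif block["type"] == "tool_result":
                # Anthropic tool_result block -> OpenAI "tool" role message
                out.append({
                    "role": "tool",
                    "tool_call_id": block["tool_use_id"],
                    "content": str(block["content"]),
                })
    return out
```

The reverse direction (native response back to Anthropic-format raw_content) is the provider's other half of the contract.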
Prompt caching — The built-in Anthropic provider implements prompt caching using ephemeral cache control markers. The system prompt and tool definitions are marked as cacheable, and a cache breakpoint is placed at the end of the conversation history prefix. This means the system prompt, tools, and earlier messages are cached across LLM calls within the same agent loop — only new messages are reprocessed. For multi-turn agent loops with many tool calls, this dramatically reduces input token costs.
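In Anthropic's Messages API, caching is driven by ephemeral cache_control markers on content blocks. A sketch of how a provider might mark the cacheable prefix (the field placement is illustrative, not a copy of the built-in provider):

```python
def build_cached_request(system_prompt: str, tools: list[dict]) -> dict:
    """Mark the system prompt and tool definitions as cacheable.

    A cache_control marker on a block caches the request prefix up to and
    including that block, so one marker on the last tool covers the whole
    tool list. Illustrative sketch; the real provider also places a cache
    breakpoint at the end of the conversation history prefix.
    """
    tools = [dict(t) for t in tools]  # avoid mutating caller's schemas
    if tools:
        tools[-1]["cache_control"] = {"type": "ephemeral"}
    return {
        "system": [{
            "type": "text",
            "text": system_prompt,
            "cache_control": {"type": "ephemeral"},
        }],
        "tools": tools,
    }
```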
Cost calculation — Token pricing accounts for prompt caching multipliers: tokens written to the cache cost 1.25x the normal input price, while tokens read from the cache cost only 0.1x. This makes caching highly cost-effective for conversations with multiple back-and-forth iterations between the LLM and tools.
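The input-side arithmetic can be sketched as follows (the per-million-token price is a placeholder; real rates depend on the model):

```python
def input_side_cost(input_tokens: int, cache_creation_tokens: int,
                    cache_read_tokens: int, price_per_mtok: float) -> float:
    """Input cost in USD applying the caching multipliers described above."""
    base = price_per_mtok / 1_000_000          # USD per token
    return (input_tokens * base                 # uncached input: 1x
            + cache_creation_tokens * base * 1.25   # cache writes: 1.25x
            + cache_read_tokens * base * 0.10)      # cache reads: 0.1x
```

For example, at a hypothetical $3/Mtok input price, re-reading a 100,000-token cached prefix costs $0.03 per call instead of $0.30.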
Rate limiting — The built-in Anthropic provider tracks token and request budgets from the LLM’s response headers, shared across all workers via Redis. When remaining capacity drops below a safety threshold (5% for tokens, 10% for requests), the system proactively waits until the budget resets rather than hitting rate limit errors. If a rate limit error does occur, the system uses exponential backoff (starting at 60 seconds, doubling up to a maximum of 5 minutes).
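The retry schedule described above works out to the following delays (a sketch of the arithmetic, not the actual implementation):

```python
def backoff_delays(retries: int, base: float = 60.0,
                   cap: float = 300.0) -> list[float]:
    """Exponential backoff: start at 60s, double each retry, cap at 5 min."""
    return [min(base * (2 ** i), cap) for i in range(retries)]
```

The first five retries would wait 60, 120, 240, 300, and 300 seconds respectively.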
LLMResponse Fields
| Field | Description |
|---|---|
| text | Final text content (None if the response contains only tool calls) |
| tool_calls | List of ToolCall objects with id, name, and input |
| stop_reason | Why the model stopped: "end_turn", "tool_use", or "max_tokens" |
| raw_content | Raw content blocks in Anthropic format for message history |
| usage | Token usage dict with input_tokens and output_tokens |
| model | Model identifier string |
| thinking_text | Extended thinking content (if supported by the model) |
raw_content Format
The raw_content field must be in Anthropic format because it is appended directly to the conversation message history. Each element is a dict:
- Text: `{"type": "text", "text": "..."}`
- Tool use: `{"type": "tool_use", "id": "...", "name": "...", "input": {...}}`
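For example, a response that explains itself and then invokes a hypothetical search_docs tool would carry a raw_content like:

```python
# Hypothetical raw_content for a text + tool-use response (tool name and
# id are illustrative, not part of TeamWeb AI itself).
raw_content = [
    {"type": "text", "text": "Let me look that up."},
    {"type": "tool_use", "id": "call_1", "name": "search_docs",
     "input": {"query": "provider plugins"}},
]
```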
Constructor
The provider class is instantiated with configuration values unpacked as keyword arguments. Define your __init__ parameters to match the keys in your config_schema.
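For instance, with the config_schema from the example above, the stored settings would be unpacked like this (a stand-in class is shown here so the snippet is self-contained):

```python
class MyLLMProvider:
    """Stand-in mirroring the example provider's constructor."""

    def __init__(self, api_key: str, model: str = "my-model-v1") -> None:
        self.api_key = api_key
        self.model = model


# Keys match the config_schema; values come from the admin settings UI.
config = {"api_key": "secret-key", "model": "my-model-v2"}
provider = MyLLMProvider(**config)  # unpacked as keyword arguments
```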
Embedding Providers
Embedding provider plugins power the knowledge base’s semantic search. They generate vector embeddings for text.
Directory Layout
```
my_embeddings/
  manifest.py
  provider.py
```
Example
manifest.py:
```python
from app.plugins.base import PluginManifest, PluginType

manifest = PluginManifest(
    name="my_embeddings",
    plugin_type=PluginType.EMBEDDING,
    version="1.0.0",
    description="My custom embedding provider",
    author="Your Name",
    config_schema={
        "api_key": {
            "type": "string",
            "required": True,
            "secret": True,
            "label": "API Key",
        },
    },
)
```
provider.py:
```python
from app.plugins.base import BaseEmbeddingProvider


class MyEmbeddingProvider(BaseEmbeddingProvider):
    """Custom embedding provider.

    Args:
        api_key: API key for authentication.
    """

    def __init__(self, api_key: str = "") -> None:
        self.api_key = api_key

    def embed(self, text: str) -> list[float]:
        """Return the embedding vector for the given text.

        Args:
            text: Text to embed.

        Returns:
            Embedding vector as a list of floats.
        """
        # Call your embedding API here
        return [0.0] * 384

    def get_dimensions(self) -> int:
        """Return the dimensionality of the embedding vectors.

        Returns:
            Number of dimensions.
        """
        return 384
```
Required Methods
| Method | Description |
|---|---|
| embed(text) | Return an embedding vector (list of floats) for the given text |
| get_dimensions() | Return the number of dimensions in the embedding vectors |
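A minimal usage sketch, using a stand-in class that mirrors the example above (the real provider would call its embedding API rather than return zeros):

```python
class MyEmbeddingProvider:
    """Stand-in mirroring the example: fixed 384-dimensional vectors."""

    def embed(self, text: str) -> list[float]:
        return [0.0] * 384

    def get_dimensions(self) -> int:
        return 384


provider = MyEmbeddingProvider()
vector = provider.embed("hello world")
# The knowledge base relies on the two methods agreeing:
assert len(vector) == provider.get_dimensions()
```

Note that get_dimensions() must match the length of the vectors embed() returns, since the knowledge base's vector index is sized from it.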