Knowledge

The knowledge base is a collection of information that your assistants can search during conversations. When an assistant uses the search_knowledge tool, it performs a semantic search across all knowledge sources in the project to find relevant information.

Knowledge Source Types

TeamWeb AI supports several types of knowledge sources, each suited to different kinds of information:

  • Facts – short pieces of information entered directly, with editable content and labels
  • Documents – uploaded files whose text is extracted and indexed
  • URLs – individual web pages
  • Websites – full site crawls
  • GitHub repositories – code repositories, either linked remotely or uploaded as ZIP archives
  • Conversations – completed conversations indexed as searchable summaries

Core Knowledge

Any knowledge source can be marked as core. Core knowledge is always included in the assistant’s system prompt, regardless of whether it matches the current conversation topic. This is useful for essential information that should always be available, like brand guidelines or key product facts.

Non-core knowledge is only included when the assistant actively searches for it using the search_knowledge tool.

Keep core knowledge focused and concise. Including too much core knowledge can reduce the assistant’s ability to have natural conversations, as it takes up context space.

How Search Works

Knowledge search uses vector embeddings for semantic matching. When an assistant searches, TeamWeb AI:

  1. Converts the search query into a vector embedding
  2. Finds the most similar knowledge chunks using cosine distance
  3. Returns the relevant chunks to the assistant as context

This means searches match by meaning, not just keywords. A search for “pricing” will find content about “costs”, “rates”, and “subscription plans” even if the word “pricing” doesn’t appear.

Technical Details

Embedding pipeline — When knowledge is ingested (a URL crawled, document uploaded, or fact saved), the system extracts plain text from the source, splits it into overlapping chunks (default 2000 characters with 200-character overlap to preserve context across boundaries), generates a vector embedding for each chunk using the configured embedding provider, and stores each chunk with its embedding in the database.
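The chunking step above can be sketched in a few lines. This is an illustrative implementation, not TeamWeb AI's actual code; only the defaults (2000-character chunks, 200-character overlap) come from the description above.

```python
def chunk_text(text: str, chunk_size: int = 2000, overlap: int = 200) -> list[str]:
    """Split text into overlapping chunks so context survives chunk boundaries.

    Each chunk starts (chunk_size - overlap) characters after the previous one,
    so consecutive chunks share their last/first `overlap` characters.
    """
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # the rest of the text is already covered by this chunk
    return chunks
```

Each chunk would then be passed to the configured embedding provider and stored alongside its vector.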

Cosine distance search — When the assistant calls search_knowledge, the query text is embedded using the same provider and compared against all stored chunk embeddings using cosine distance. Results are ordered by distance (closest first) and filtered by the configurable similarity threshold — lower threshold values are stricter, excluding results that aren’t closely related. The threshold and which source types to include are configurable per assistant through the search_knowledge tool configuration.
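The search described above can be sketched as follows. The plain-Python vector math and the 0.5 default threshold are illustrative assumptions; in practice this comparison typically runs inside the database rather than in application code.

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance: 0.0 for identical directions, up to 2.0 for opposite ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

def search(query_vec: list[float],
           chunks: list[tuple[str, list[float]]],
           threshold: float = 0.5) -> list[tuple[float, str]]:
    """Return (distance, text) pairs, closest first, filtered by threshold.

    A lower threshold is stricter: it excludes results whose distance
    from the query exceeds it.
    """
    scored = [(cosine_distance(query_vec, emb), text) for text, emb in chunks]
    scored.sort(key=lambda pair: pair[0])
    return [(d, t) for d, t in scored if d <= threshold]
```

With this ordering, a query vector close to a chunk's embedding surfaces that chunk first even when the two texts share no keywords.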

Core vs. search-based knowledge — Core knowledge chunks are loaded into the system prompt at the start of every turn, regardless of the conversation topic. This ensures essential information (like brand guidelines or key product facts) is always available but comes at the cost of context window space. Non-core knowledge is only retrieved when the assistant actively searches, keeping the base prompt lean.
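A minimal sketch of how core chunks might be prepended to the system prompt. The `core` and `chunks` field names and the "## Core knowledge" section header are hypothetical, chosen only to illustrate the mechanism.

```python
def build_system_prompt(base_prompt: str, sources: list[dict]) -> str:
    """Prepend every core chunk to the prompt; leave non-core chunks for search."""
    core_chunks = [chunk
                   for source in sources if source["core"]
                   for chunk in source["chunks"]]
    if not core_chunks:
        return base_prompt  # keep the base prompt lean when nothing is core
    return base_prompt + "\n\n## Core knowledge\n" + "\n".join(core_chunks)
```

Because every core chunk is included on every turn, the cost in context space grows linearly with the amount of core knowledge, which is why the section above recommends keeping it concise.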

Conversation indexing — Completed conversations can be indexed as knowledge. The system summarizes the conversation, chunks and embeds the summary, and stores it as searchable knowledge in the project. This allows future conversations to draw on insights from past interactions through the same semantic search mechanism.
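The indexing flow above can be sketched as a small pipeline. The `summarize`, `embed`, and `store` callables stand in for the LLM summarization call, the embedding provider, and the database write; their names and signatures are assumptions for illustration.

```python
def index_conversation(messages: list[dict], summarize, embed, store,
                       chunk_size: int = 2000, overlap: int = 200) -> None:
    """Summarize a completed conversation, then chunk, embed, and store the summary."""
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in messages)
    summary = summarize(transcript)  # e.g. an LLM call producing a text summary
    step = chunk_size - overlap
    for start in range(0, len(summary), step):
        chunk = summary[start:start + chunk_size]
        store(chunk, embed(chunk))  # persisted chunks join the same search index
        if start + chunk_size >= len(summary):
            break
```

Once stored, these summary chunks are retrieved through the same cosine-distance search as any other knowledge source.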

Managing Knowledge

On the project detail page, each knowledge source type has its own tab. From these tabs you can:

  • Add new knowledge sources of any type
  • Edit fact content and labels
  • Re-ingest URLs and re-crawl websites to refresh content
  • Re-upload documents to update their content
  • Re-index GitHub repositories to pick up code changes
  • Toggle core status on any source
  • Configure auto-sync to keep remote sources fresh automatically
  • Bulk re-ingest all sources in a project at once
  • Delete any knowledge source and its associated chunks

Deleting a knowledge source also removes all its embedded chunks from the search index.

Auto-Sync & Freshness

Knowledge sources that pull content from remote URLs can be configured to auto-sync on a schedule. This keeps your knowledge base up to date without manual re-ingestion.

Supported source types:

  • URLs (individual web pages)
  • Websites (full site crawls)
  • GitHub code repositories

Sync frequencies:

  • Daily – re-ingested every 24 hours
  • Weekly – re-ingested every 7 days

You can set the sync frequency when adding a new source, or change it later from the project detail page using the sync dropdown on each source card.

Change detection – When a URL is re-ingested during auto-sync, TeamWeb AI computes a hash of the extracted text content. If the content hasn’t changed since the last sync, the re-embedding step is skipped entirely, saving processing time while still updating the freshness timestamp.
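The change-detection step can be sketched with a content hash. The choice of SHA-256 here is an assumption; the documentation only says a hash of the extracted text is compared against the previous one.

```python
import hashlib

def content_hash(text: str) -> str:
    """Fingerprint the extracted text; identical text always yields the same hash."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def needs_reembedding(new_text: str, stored_hash: str) -> bool:
    """Skip the costly re-embedding step when the content is unchanged."""
    return content_hash(new_text) != stored_hash
```

When the hashes match, only the freshness timestamp is updated and the stored embeddings are left untouched.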

Freshness indicator – Each source card shows when it was last synced (e.g., “Synced 3h ago”). This helps you see at a glance how fresh your knowledge is.

Bulk re-ingest – Use the re-ingest button in the project header to re-process all knowledge sources at once. This dispatches background tasks for every ingested source in the project.

Facts, uploaded documents, and ZIP-uploaded code repositories do not support auto-sync because they have no remote source to re-fetch. Re-upload or edit these manually when they need updating.