Websites & URLs

URLs

Add individual web pages to the knowledge base. TeamWeb AI fetches the page content, extracts the text, and creates searchable embeddings.

  • URL – The web page address to ingest
  • Context Label – A human-readable label describing the source (e.g., “Product pricing page”, “Competitor - Acme Corp”)
  • Core – Whether to always include this source in the assistant’s context
  • Auto-Sync – Optionally set to Daily or Weekly to automatically re-ingest

After you add a URL, TeamWeb AI processes it in the background. The status shows as pending while the page is being processed and changes to ingested once processing completes.

You can re-ingest a URL to refresh the content if the page has been updated. When auto-sync is enabled, TeamWeb AI will re-fetch the URL on the configured schedule and only re-embed the content if it has changed (using content hashing for change detection).
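The change-detection step can be pictured with a short sketch. This is illustrative only: the documentation says content hashing is used but does not say which hash or normalization TeamWeb AI applies, so SHA-256, the whitespace normalization, and the function names `content_hash` and `needs_reembedding` are all assumptions.

```python
import hashlib
from typing import Optional


def content_hash(text: str) -> str:
    """Return a SHA-256 hex digest of the text (hash choice is an assumption)."""
    # Collapse runs of whitespace so purely cosmetic changes do not
    # register as new content. This normalization is also an assumption.
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def needs_reembedding(new_text: str, stored_hash: Optional[str]) -> bool:
    """True when the page was never ingested or its extracted text changed."""
    return stored_hash is None or content_hash(new_text) != stored_hash


stored = content_hash("Pricing: $10 per seat")
print(needs_reembedding("Pricing: $10 per seat", stored))  # unchanged page
print(needs_reembedding("Pricing: $12 per seat", stored))  # edited page
```

Comparing a digest rather than the full text means only a short string per source has to be stored between syncs.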

Website Crawls

Crawl an entire website starting from a root URL. TeamWeb AI discovers and processes pages automatically.

  • Root URL – The starting URL to begin crawling from
  • Context Label – A label for the entire site (e.g., “Product documentation”, “Company blog”)
  • Max Pages – The maximum number of pages to crawl (1–500, default 50)
  • Core – Whether to always include this source in the assistant’s context
  • Auto-Sync – Optionally set to Daily or Weekly to automatically re-crawl
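One way to picture the Max Pages setting is as a cap on a breadth-first traversal from the root URL. The sketch below is a simplified model, not TeamWeb AI's crawler: the traversal order, the `discover_pages` name, and the `get_links` callback (a stand-in for fetching a page and extracting its links) are assumptions. Only the 1–500 range and the default of 50 come from the settings above.

```python
from collections import deque
from typing import Callable, Iterable, List


def discover_pages(
    root_url: str,
    get_links: Callable[[str], Iterable[str]],
    max_pages: int = 50,
) -> List[str]:
    """Collect up to max_pages URLs, visiting pages closest to the root first."""
    if not 1 <= max_pages <= 500:
        raise ValueError("max_pages must be between 1 and 500")

    seen = {root_url}
    queue = deque([root_url])
    pages: List[str] = []

    while queue and len(pages) < max_pages:
        url = queue.popleft()
        pages.append(url)
        for link in get_links(url):
            if link not in seen:  # never queue the same URL twice
                seen.add(link)
                queue.append(link)
    return pages


# Hypothetical in-memory link graph standing in for real HTTP fetches.
site = {
    "https://example.com/docs": [
        "https://example.com/docs/setup",
        "https://example.com/docs/api",
    ],
    "https://example.com/docs/setup": ["https://example.com/docs/api"],
    "https://example.com/docs/api": [],
}
print(discover_pages("https://example.com/docs", lambda u: site.get(u, []), max_pages=2))
```

With a breadth-first order, a low Max Pages value keeps the pages nearest the root and drops deeper ones first.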

The crawler follows links within the same domain and stays under the root URL path. For example, crawling https://example.com/docs will only follow links under /docs, not /blog.
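The scoping rule can be expressed as a small predicate. This is a minimal sketch under stated assumptions: the function name `in_crawl_scope` is hypothetical, and the handling of edge cases (for instance, treating `/docs-old` as outside `/docs`) is a guess at reasonable behavior rather than documented TeamWeb AI behavior.

```python
from urllib.parse import urljoin, urlsplit


def in_crawl_scope(root_url: str, link: str) -> bool:
    """True when link is on the root's host and under the root's path."""
    root = urlsplit(root_url)
    # Resolve relative links such as "/docs/api" against the root URL.
    target = urlsplit(urljoin(root_url, link))

    if target.scheme not in ("http", "https"):
        return False  # skips mailto:, javascript:, and similar links
    if target.hostname != root.hostname:
        return False  # different domain

    root_path = root.path.rstrip("/")
    # Match the root itself or anything below it, comparing on a "/" boundary
    # so that "/docs-old" is not mistaken for a child of "/docs" (assumption).
    return target.path == root_path or target.path.startswith(root_path + "/")


print(in_crawl_scope("https://example.com/docs", "https://example.com/docs/setup"))
print(in_crawl_scope("https://example.com/docs", "https://example.com/blog/post"))
```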

Discovered pages appear as child sources under the main site entry. You can view all crawled pages and their content. Re-crawling a site deletes existing content and re-discovers pages from scratch.

The crawler skips non-HTML resources like PDFs, images, and media files. Only pages with text/html content are processed.
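A filter of this kind usually looks at the response's Content-Type header. The following sketch shows one plausible check; the function name `is_html_response` is hypothetical, and whether TeamWeb AI inspects the header, the file extension, or both is not stated in this documentation.

```python
from typing import Optional


def is_html_response(content_type_header: Optional[str]) -> bool:
    """True when the Content-Type header's media type is text/html."""
    if not content_type_header:
        return False  # a missing header is treated as non-HTML (assumption)
    # Drop parameters such as "; charset=utf-8" and compare case-insensitively.
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    return media_type == "text/html"


print(is_html_response("text/html; charset=utf-8"))
print(is_html_response("application/pdf"))
```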

Keeping Content Fresh

Each URL and website card shows a freshness indicator (for example, “Synced 3h ago”) so you can see when the content was last processed. You can:

  • Change sync frequency using the dropdown on each source card (No sync / Daily / Weekly)
  • Manually re-ingest any individual source at any time
  • Bulk re-ingest all project sources using the button in the project header
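For readers curious how a label like “Synced 3h ago” might be derived from a last-sync timestamp, here is a hedged sketch. Only the hour format appears in this documentation; the `freshness_label` name and the minute, day, and “just now” variants are assumptions added for illustration.

```python
from datetime import datetime, timedelta, timezone


def freshness_label(last_synced: datetime, now: datetime) -> str:
    """Format the time elapsed since the last sync as a short label."""
    seconds = max(0, int((now - last_synced).total_seconds()))
    if seconds < 60:
        return "Synced just now"
    if seconds < 3600:
        return f"Synced {seconds // 60}m ago"
    if seconds < 86400:
        return f"Synced {seconds // 3600}h ago"
    return f"Synced {seconds // 86400}d ago"


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(freshness_label(now - timedelta(hours=3, minutes=20), now))  # Synced 3h ago
```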