Rate Limiting
TeamWeb AI includes rate limiting on public-facing endpoints to prevent abuse. This primarily protects the embeddable widget API, which is accessible without authentication.
Widget API Limits
The widget API enforces three rate limits using a sliding window algorithm backed by Redis:
| Limit | Default | Window | Scope |
|---|---|---|---|
| Messages per session per minute | 10 | 60 seconds | Per visitor session |
| Requests per IP per minute | 30 | 60 seconds | Per IP address |
| New conversations per IP per hour | 5 | 1 hour | Per IP address |
When a limit is exceeded, the API returns a 429 Too Many Requests response with a Retry-After header indicating how many seconds the client should wait.
Configuration
Rate limits can be adjusted via environment variables:
| Variable | Default | Description |
|---|---|---|
| WIDGET_RATE_LIMIT_MESSAGES_PER_MIN | 10 | Maximum messages a single visitor session can send per minute |
| WIDGET_RATE_LIMIT_PER_IP_PER_MIN | 30 | Maximum requests from a single IP per minute |
| WIDGET_RATE_LIMIT_NEW_CONVOS_PER_HOUR | 5 | Maximum new conversations a single IP can create per hour |
Set these in your .env file or deployment environment to adjust the limits.
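For example, a .env file that loosens all three limits might look like this (the values are illustrative, not recommendations):

```shell
# Widget rate limit overrides
WIDGET_RATE_LIMIT_MESSAGES_PER_MIN=20
WIDGET_RATE_LIMIT_PER_IP_PER_MIN=60
WIDGET_RATE_LIMIT_NEW_CONVOS_PER_HOUR=10
```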
How It Works
The rate limiter uses Redis sorted sets to implement a true sliding window. Each request is recorded with a timestamp. When checking a limit, expired entries outside the window are removed and the remaining count is compared against the limit.
Rate limiting applies to:
- `POST .../messages` — Sending a message (checked against both per-session and per-IP limits)
- `POST .../conversations` — Creating a new conversation (checked against per-IP conversation limit)
Read-only endpoints (config, polling for messages) are not rate limited.
Graceful Degradation
If Redis is unavailable, rate limiting is silently skipped and requests are allowed through. This ensures the widget remains functional even if Redis is temporarily down.
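The fail-open behavior amounts to a small wrapper around the limit check. A sketch, where `check` stands in for any Redis-backed limit function (in the real service the exception would be redis.exceptions.ConnectionError; the builtin ConnectionError is used here so the sketch is self-contained):

```python
def check_limit_fail_open(check, *args) -> bool:
    """Run a rate-limit check, but allow the request if Redis is unreachable."""
    try:
        return check(*args)
    except ConnectionError:
        # Fail open: a temporarily unlimited widget beats a broken one.
        return True
```

The trade-off is deliberate: during a Redis outage the widget stays usable at the cost of briefly losing abuse protection.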
Anthropic API Rate Limiting
Separately from the widget rate limiter, TeamWeb AI also tracks rate limits from the Anthropic API. The system reads Anthropic’s response headers (anthropic-ratelimit-*) and proactively delays API calls when token or request budgets are running low. This shared state is stored in Redis so it works across multiple Celery workers.
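The "proactively delay" decision can be sketched as a pure function over the response headers. The header names follow Anthropic's `anthropic-ratelimit-{kind}-limit` / `-remaining` pattern; the 10% threshold and 5-second backoff are illustrative, not the values TeamWeb AI uses, and the real system would read and write this state through Redis so all Celery workers see it:

```python
def suggested_delay(headers: dict,
                    min_fraction: float = 0.1,
                    backoff_seconds: float = 5.0) -> float:
    """Return seconds to wait before the next API call, based on how much
    of the request and token budgets remains."""
    for kind in ("requests", "tokens"):
        limit = int(headers.get(f"anthropic-ratelimit-{kind}-limit", 0))
        remaining = int(headers.get(f"anthropic-ratelimit-{kind}-remaining", 0))
        if limit and remaining / limit < min_fraction:
            # Budget nearly exhausted: back off before calling again.
            return backoff_seconds
    return 0.0
```

A worker would call this after each Anthropic response and sleep for the returned duration before dispatching the next request.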
This is visible in the API Logs section, which shows remaining rate limit headroom on each log entry.