Rate Limiting
TeamWeb AI includes rate limiting on public-facing endpoints to prevent abuse. This primarily protects the embeddable widget API, which is accessible without authentication.
Widget API Limits
The widget API enforces three rate limits using a sliding window algorithm backed by Redis:
| Limit | Default | Window | Scope |
|---|---|---|---|
| Messages per session per minute | 10 | 60 seconds | Per visitor session |
| Requests per IP per minute | 30 | 60 seconds | Per IP address |
| New conversations per IP per hour | 5 | 1 hour | Per IP address |
When a limit is exceeded, the API returns a 429 Too Many Requests response with a Retry-After header indicating how many seconds the client should wait.
Configuration
Rate limits can be adjusted via environment variables:
| Variable | Default | Description |
|---|---|---|
| WIDGET_RATE_LIMIT_MESSAGES_PER_MIN | 10 | Maximum messages a single visitor session can send per minute |
| WIDGET_RATE_LIMIT_PER_IP_PER_MIN | 30 | Maximum requests from a single IP per minute |
| WIDGET_RATE_LIMIT_NEW_CONVOS_PER_HOUR | 5 | Maximum new conversations a single IP can create per hour |
Set these in your .env file or deployment environment to adjust the limits.
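For example, a .env file that loosens all three limits might look like this (the values are illustrative, not recommendations):

```shell
# Widget rate limit overrides
WIDGET_RATE_LIMIT_MESSAGES_PER_MIN=20
WIDGET_RATE_LIMIT_PER_IP_PER_MIN=60
WIDGET_RATE_LIMIT_NEW_CONVOS_PER_HOUR=10
```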
How It Works
The rate limiter uses Redis sorted sets to implement a true sliding window. Each request is recorded with a timestamp. When checking a limit, expired entries outside the window are removed and the remaining count is compared against the limit.
Rate limiting applies to:
- `POST .../messages` — Sending a message (checked against both per-session and per-IP limits)
- `POST .../conversations` — Creating a new conversation (checked against per-IP conversation limit)
Read-only endpoints (config, polling for messages) are not rate limited.
Graceful Degradation
If Redis is unavailable, rate limiting is silently skipped and requests are allowed through. This ensures the widget remains functional even if Redis is temporarily down.
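The fail-open behavior amounts to a small wrapper around the limit check. A sketch, where `check` stands in for any Redis-backed limit function (in the real service the exception would be redis.exceptions.ConnectionError; the builtin ConnectionError is used here so the sketch is self-contained):

```python
def check_limit_fail_open(check, *args) -> bool:
    """Run a rate-limit check, but allow the request if Redis is unreachable."""
    try:
        return check(*args)
    except ConnectionError:
        # Fail open: a temporarily unlimited widget beats a broken one.
        return True
```

The trade-off is deliberate: during a Redis outage the widget stays usable at the cost of briefly losing abuse protection.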
Anthropic API Rate Limiting
Separately from the widget rate limiter, TeamWeb AI also tracks rate limits from the Anthropic API. The system reads Anthropic’s response headers (anthropic-ratelimit-*) and proactively delays API calls when token or request budgets are running low. This shared state is stored in Redis so it works across multiple Celery workers.
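The "proactively delay" decision can be sketched as a pure function over the response headers. The header names follow Anthropic's `anthropic-ratelimit-{kind}-limit` / `-remaining` pattern; the 10% threshold and 5-second backoff are illustrative, not the values TeamWeb AI uses, and the real system would read and write this state through Redis so all Celery workers see it:

```python
def suggested_delay(headers: dict,
                    min_fraction: float = 0.1,
                    backoff_seconds: float = 5.0) -> float:
    """Return seconds to wait before the next API call, based on how much
    of the request and token budgets remains."""
    for kind in ("requests", "tokens"):
        limit = int(headers.get(f"anthropic-ratelimit-{kind}-limit", 0))
        remaining = int(headers.get(f"anthropic-ratelimit-{kind}-remaining", 0))
        if limit and remaining / limit < min_fraction:
            # Budget nearly exhausted: back off before calling again.
            return backoff_seconds
    return 0.0
```

A worker would call this after each Anthropic response and sleep for the returned duration before dispatching the next request.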
This is visible in the API Logs section, which shows remaining rate limit headroom on each log entry.