Rate Limiting

TeamWeb AI includes rate limiting on public-facing endpoints to prevent abuse. This primarily protects the embeddable widget API, which is accessible without authentication.

Widget API Limits

The widget API enforces three rate limits using a sliding window algorithm backed by Redis:

| Limit | Default | Window | Scope |
| --- | --- | --- | --- |
| Messages per session per minute | 10 | 60 seconds | Per visitor session |
| Requests per IP per minute | 30 | 60 seconds | Per IP address |
| New conversations per IP per hour | 5 | 1 hour | Per IP address |

When a limit is exceeded, the API returns a 429 Too Many Requests response with a Retry-After header indicating how many seconds the client should wait.
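A client should treat 429 as a transient condition and wait the indicated number of seconds. A minimal retry sketch, where `post` is a stand-in for your HTTP client (an assumption, not part of the widget API):

```python
import time

def send_with_backoff(post, payload, max_attempts=3):
    """Retry a widget API call when it is rate limited.

    `post` is any callable returning (status_code, headers, body);
    it stands in for a real HTTP client and is assumed here.
    """
    for attempt in range(max_attempts):
        status, headers, body = post(payload)
        if status != 429:
            return status, body
        # Honor the Retry-After header (in seconds) before retrying.
        delay = int(headers.get("Retry-After", 1))
        time.sleep(delay)
    return status, body
```

Capping attempts keeps a misbehaving client from looping forever against the limiter.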

Configuration

Rate limits can be adjusted via environment variables:

| Variable | Default | Description |
| --- | --- | --- |
| WIDGET_RATE_LIMIT_MESSAGES_PER_MIN | 10 | Maximum messages a single visitor session can send per minute |
| WIDGET_RATE_LIMIT_PER_IP_PER_MIN | 30 | Maximum requests from a single IP per minute |
| WIDGET_RATE_LIMIT_NEW_CONVOS_PER_HOUR | 5 | Maximum new conversations a single IP can create per hour |

Set these in your .env file or deployment environment to adjust the limits.
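For example, a more lenient profile might look like this in .env (the values below are illustrative, not recommendations):

```shell
# Illustrative overrides; the documented defaults apply when unset.
WIDGET_RATE_LIMIT_MESSAGES_PER_MIN=20
WIDGET_RATE_LIMIT_PER_IP_PER_MIN=60
WIDGET_RATE_LIMIT_NEW_CONVOS_PER_HOUR=10
```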

How It Works

The rate limiter uses Redis sorted sets to implement a true sliding window. Each request is recorded with a timestamp. When checking a limit, expired entries outside the window are removed and the remaining count is compared against the limit.
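The check above can be sketched in a few lines. This in-memory version mirrors the Redis operations (noted in comments) but is only an illustration, not the actual implementation:

```python
import time

class SlidingWindowLimiter:
    """In-memory sketch of a sliding-window limiter.

    The real limiter keeps timestamps in a Redis sorted set:
    ZREMRANGEBYSCORE drops expired entries, ZCARD counts the rest,
    and ZADD records the new request.
    """

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.hits = {}  # key -> list of request timestamps

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        entries = self.hits.setdefault(key, [])
        # Drop timestamps outside the window (ZREMRANGEBYSCORE equivalent).
        entries[:] = [t for t in entries if t > now - self.window]
        if len(entries) >= self.limit:  # ZCARD compared against the limit
            return False
        entries.append(now)  # ZADD equivalent: record this request
        return True
```

Because expired entries are pruned on every check, the window truly slides: a burst is forgiven exactly `window_seconds` after it happened, with none of the boundary spikes a fixed-window counter allows.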

Rate limiting applies to:

  • POST .../messages — Sending a message (checked against both per-session and per-IP limits)
  • POST .../conversations — Creating a new conversation (checked against per-IP conversation limit)

Read-only endpoints (config, polling for messages) are not rate limited.

Graceful Degradation

If Redis is unavailable, rate limiting is silently skipped and requests are allowed through. This ensures the widget remains functional even if Redis is temporarily down.
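This fail-open behavior amounts to wrapping the Redis check and allowing the request on connection errors. A minimal sketch, where `redis_call` stands in for the sorted-set check (an assumed callable, not the project's actual API):

```python
def check_limit(redis_call, key):
    """Fail-open sketch: if Redis errors, allow the request.

    `redis_call` stands in for the Redis-backed sliding-window check
    and is an assumption for illustration.
    """
    try:
        return redis_call(key)
    except ConnectionError:
        # Redis is down: skip rate limiting rather than block the widget.
        return True
```

The trade-off is deliberate: a brief window without rate limiting is preferable to the widget failing outright whenever Redis restarts.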

Anthropic API Rate Limiting

Separately from the widget rate limiter, TeamWeb AI also tracks rate limits from the Anthropic API. The system reads Anthropic’s response headers (anthropic-ratelimit-*) and proactively delays API calls when token or request budgets are running low. This shared state is stored in Redis so it works across multiple Celery workers.
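The decision to delay reduces to comparing the remaining budgets reported in those headers against a floor. A sketch of that check (the thresholds and function name are illustrative; the header names follow Anthropic's documented anthropic-ratelimit-* scheme):

```python
def has_headroom(headers, min_tokens=1000, min_requests=5):
    """Return True if the last Anthropic response reported enough
    remaining budget to proceed without delaying.

    Thresholds are illustrative assumptions, not the project's values.
    """
    tokens_left = int(headers.get("anthropic-ratelimit-tokens-remaining", min_tokens))
    requests_left = int(headers.get("anthropic-ratelimit-requests-remaining", min_requests))
    return tokens_left >= min_tokens and requests_left >= min_requests
```

In the real system the parsed values are written to Redis, so every Celery worker sees the same remaining budget regardless of which worker made the last call.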

This is visible in the API Logs section, which shows remaining rate limit headroom on each log entry.