
Rate Limiting

Capping how many requests a user, IP, or app can make in a time window.

Last updated: April 26, 2026

Definition

Rate limiting protects three things: your wallet (LLM API spend), the upstream provider (you don't want to hit their rate limits and start receiving errors), and your other users (a runaway client should not starve everyone else). A standard pattern layers three limits: per-IP limits (e.g., 20 chat messages per 15 minutes), per-user limits (e.g., 100 assessment runs per day), and a global daily cap (e.g., $50 max LLM spend per day). DynamoDB atomic counters are a clean implementation for serverless deployments; Redis is the classic choice for monoliths.
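A minimal in-memory sketch of the three layers described above, assuming fixed windows and a single process. `FixedWindowLimiter`, `allowAll`, and tracking spend as integer cents are hypothetical choices for illustration, not a specific library's API. In production these counters would live in a shared store (DynamoDB, Redis) so every instance sees the same counts.

```typescript
// Hypothetical in-memory fixed-window limiter. Single process only, no key eviction.
class FixedWindowLimiter {
  private windows = new Map<string, { start: number; used: number }>();
  readonly limit: number;
  readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Units already used by `key` in the fixed window containing `now`.
  private usedIn(key: string, now: number): number {
    const start = now - (now % this.windowMs); // day windows align to UTC midnight
    const w = this.windows.get(key);
    return w && w.start === start ? w.used : 0;
  }

  fits(key: string, amount: number, now: number): boolean {
    return this.usedIn(key, now) + amount <= this.limit;
  }

  consume(key: string, amount: number, now: number): void {
    const start = now - (now % this.windowMs);
    this.windows.set(key, { start, used: this.usedIn(key, now) + amount });
  }
}

type Check = { limiter: FixedWindowLimiter; key: string; amount: number };

// Allow only if every layer has room; consume from the layers only on success,
// so a request rejected by one layer does not eat quota in the others.
function allowAll(checks: Check[], now = Date.now()): boolean {
  if (!checks.every((c) => c.limiter.fits(c.key, c.amount, now))) return false;
  for (const c of checks) c.limiter.consume(c.key, c.amount, now);
  return true;
}

const MINUTE = 60_000;
const DAY = 24 * 60 * MINUTE;
const perIpChat = new FixedWindowLimiter(20, 15 * MINUTE); // 20 chat messages / 15 min
const perUserRuns = new FixedWindowLimiter(100, DAY);      // 100 assessment runs / day
const globalSpend = new FixedWindowLimiter(5_000, DAY);    // $50.00 / day, tracked in cents

// Usage: a chat message with a hypothetical estimated cost of 2 cents.
const ok = allowAll([
  { limiter: perIpChat, key: "203.0.113.7", amount: 1 },
  { limiter: globalSpend, key: "global", amount: 2 },
]);
```

An assessment run would pass `perUserRuns` plus `globalSpend` instead. Fixed windows are the simplest option and allow a burst at a window boundary; a sliding window smooths that out at the cost of more bookkeeping.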

Code Example

typescript

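// Daily cap on assessment runs (a proxy for LLM spend) via a DynamoDB atomic counter
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocument } from "@aws-sdk/lib-dynamodb";

const ddb = DynamoDBDocument.from(new DynamoDBClient({}));
const today = new Date().toISOString().slice(0, 10); // UTC day, e.g. "2026-04-26"

// ADD creates the counter on first use and increments it atomically;
// the condition rejects the write once the counter has reached :max.
await ddb.update({
  TableName: "RateLimits",
  Key: { id: `assessment#${today}` },
  UpdateExpression: "ADD #c :one",
  ConditionExpression: "attribute_not_exists(#c) OR #c < :max",
  ExpressionAttributeNames: { "#c": "count" }, // "count" is a DynamoDB reserved word
  ExpressionAttributeValues: { ":one": 1, ":max": 500 },
});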

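Atomic DynamoDB increment with a conditional maximum. The first request of the day creates the counter; once it reaches 500, the update fails with a ConditionalCheckFailedException, which the caller should catch and turn into a "limit reached" response (e.g., HTTP 429).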
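The counter above caps the number of runs. To cap actual dollar spend (the $50/day example), one option is to ADD an estimated cost instead of 1. DynamoDB condition expressions cannot do arithmetic, so the "would this push us over?" check has to compare against a ceiling computed client-side (`max - cost`). A minimal sketch, assuming integer cents and a hypothetical helper `spendCapUpdate` that only builds the parameters:

```typescript
// Hypothetical helper: build update params that add an estimated LLM cost
// (integer cents, to avoid floating-point drift) to today's spend counter.
interface SpendUpdateParams {
  TableName: string;
  Key: { id: string };
  UpdateExpression: string;
  ConditionExpression: string;
  ExpressionAttributeNames: Record<string, string>;
  ExpressionAttributeValues: Record<string, number>;
}

function spendCapUpdate(costCents: number, maxCents: number, now = new Date()): SpendUpdateParams {
  if (!Number.isInteger(costCents) || costCents < 0) {
    throw new Error("costCents must be a non-negative integer");
  }
  // attribute_not_exists would let an oversized first request through, so reject it here.
  if (costCents > maxCents) throw new Error("single request exceeds the daily cap");
  const day = now.toISOString().slice(0, 10); // UTC day, same key scheme as the run counter
  return {
    TableName: "RateLimits",
    Key: { id: `spend#${day}` },
    UpdateExpression: "ADD #s :cost",
    // Condition expressions have no arithmetic, so precompute max - cost.
    ConditionExpression: "attribute_not_exists(#s) OR #s <= :ceiling",
    ExpressionAttributeNames: { "#s": "spentCents" },
    ExpressionAttributeValues: { ":cost": costCents, ":ceiling": maxCents - costCents },
  };
}
```

With the document client from the example above, usage would look like `await ddb.update(spendCapUpdate(3, 5_000))`. Because the cost is an estimate made before the LLM call, the recorded spend is approximate; the `spentCents` attribute name and `spend#` key prefix are assumptions.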

When To Use

Required from launch day. Without rate limiting, one bug or one abuser can blow your monthly budget in hours.

Related Terms

Exponential Backoff

Retry strategy that doubles the wait between attempts to give a struggling syste…

Observability

Logging, metrics, and tracing for LLM calls so you can debug, audit, and optimiz…

Building with Rate Limiting?

I've shipped this pattern in real production systems. If you want a second pair of eyes on your architecture, that's what I do.
