Rate Limiting
Capping how many requests a user, IP, or app can make in a time window.
Last updated: April 26, 2026
Definition
Rate limiting protects three things: your wallet (LLM API spend), your standing with the upstream provider (so you stay under its rate limits), and your other users (a runaway client should not starve everyone else). The standard pattern combines per-IP limits (e.g., 20 chat messages per 15 minutes), per-user limits (e.g., 100 assessment runs per day), and global daily caps (e.g., $50 max LLM spend per day). DynamoDB atomic counters are a clean implementation for serverless deployments. Redis is the classic choice for monolithic deployments.
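As a rough illustration of the per-IP limit described above, here is a minimal fixed-window counter kept in process memory. The names `createFixedWindowLimiter` and `allowChat` are hypothetical, and the sketch assumes a single long-lived process; a serverless or multi-instance deployment would need shared storage such as the DynamoDB or Redis options mentioned above.

```javascript
// Minimal in-memory fixed-window limiter (illustrative sketch, single process only).
// `now` is injectable so the window logic can be exercised without waiting.
function createFixedWindowLimiter({ limit, windowMs, now = Date.now }) {
  const windows = new Map(); // key -> { windowStart, count }

  return function allow(key) {
    // Align the current time to the start of its window.
    const windowStart = Math.floor(now() / windowMs) * windowMs;
    const entry = windows.get(key);

    // First request from this key, or the previous window has expired: start fresh.
    if (!entry || entry.windowStart !== windowStart) {
      windows.set(key, { windowStart, count: 1 });
      return true;
    }

    // Window is full: reject without incrementing.
    if (entry.count >= limit) return false;

    entry.count += 1;
    return true;
  };
}

// 20 chat messages per 15 minutes per IP, using the example figures above.
const allowChat = createFixedWindowLimiter({ limit: 20, windowMs: 15 * 60 * 1000 });
```

A request handler would call `allowChat(ip)` and respond with HTTP 429 when it returns false. Note that this sketch never evicts old keys, so a real deployment would also need some form of cleanup or expiry.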
Code Example
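// Daily cap on assessment runs via a DynamoDB atomic counter
const today = new Date().toISOString().slice(0, 10); // e.g. "2026-04-26" (UTC)
const result = await ddb.update({
  TableName: "RateLimits",
  Key: { id: `assessment#${today}` },
  // ADD creates the attribute at 0 if it is missing, then increments it.
  UpdateExpression: "ADD #c :one",
  // Allow the write only while the counter is still below the cap.
  ConditionExpression: "attribute_not_exists(#c) OR #c < :max",
  // "count" is a DynamoDB reserved word, so it is aliased as #c.
  ExpressionAttributeNames: { "#c": "count" },
  ExpressionAttributeValues: { ":one": 1, ":max": 500 },
});
Atomic DynamoDB increment with a conditional maximum. The update throws a ConditionalCheckFailedException once the counter has reached the limit.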
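Because the update throws when the cap is reached, callers usually want to turn that one error into a plain yes/no answer and let every other failure surface. The following is a sketch of such a wrapper. `tryConsumeDailyRun` is a hypothetical helper, and `ddb` is assumed to be a promise-returning document client (for example `DynamoDBDocument` from `@aws-sdk/lib-dynamodb`) whose errors carry the exception name in `name` or `code`.

```javascript
// Hypothetical helper: resolves to true if the run was counted,
// false if the daily cap has already been reached.
async function tryConsumeDailyRun(ddb, max = 500, now = new Date()) {
  const today = now.toISOString().slice(0, 10); // UTC date, e.g. "2026-04-26"
  try {
    await ddb.update({
      TableName: "RateLimits",
      Key: { id: `assessment#${today}` },
      UpdateExpression: "ADD #c :one",
      ConditionExpression: "attribute_not_exists(#c) OR #c < :max",
      ExpressionAttributeNames: { "#c": "count" },
      ExpressionAttributeValues: { ":one": 1, ":max": max },
    });
    return true;
  } catch (err) {
    // Assumption: the failed condition is reported under this name
    // (checked on both `name` and `code` to cover different SDK versions).
    const id = err && (err.name || err.code);
    if (id === "ConditionalCheckFailedException") return false;
    throw err; // network, permission, and throttling errors are not "over limit"
  }
}
```

A handler could then reject the request (for example with HTTP 429) whenever `tryConsumeDailyRun(ddb)` resolves to false, while still surfacing real infrastructure errors.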
When To Use
Required from launch day. Without rate limiting, one bug or one abuser can blow your monthly budget in hours.