How Rate Limiting Works

Unkey’s rate limiting is designed for global, low-latency enforcement across distributed systems.

Architecture

When you call limiter.limit(identifier):

Request hits the nearest Unkey location
Counter is checked and updated
Decision returned in ~30ms globally

See real-time performance metrics at ratelimit.unkey.com.

Sliding window algorithm

Unkey uses a sliding window algorithm that provides smooth rate limiting without the “burst at window start” problem of fixed windows.

Fixed window problem:

Limit: 100/minute
User sends 100 requests at 0:59
Window resets at 1:00
User sends 100 more at 1:01
Result: 200 requests in 2 seconds ❌

Sliding window solution:

Limit: 100/minute
Considers requests from the past 60 seconds at any point
No burst exploitation possible
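
The difference can be sketched with two toy in-memory counters replaying the scenario above. This is an illustration of the general technique under simplifying assumptions, not Unkey's implementation: `FixedWindowCounter`, `SlidingWindowLog`, and `countAllowed` are hypothetical names, and the sliding variant here is a plain timestamp log.

```typescript
// Toy in-memory limiters for illustration only; not Unkey's implementation.

class FixedWindowCounter {
  private limit: number;
  private windowMs: number;
  private windowStart = 0;
  private count = 0;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  allow(nowMs: number): boolean {
    // Windows are aligned to fixed boundaries (0:00, 1:00, 2:00, ...).
    const start = Math.floor(nowMs / this.windowMs) * this.windowMs;
    if (start !== this.windowStart) {
      this.windowStart = start;
      this.count = 0; // the counter resets at every boundary
    }
    if (this.count >= this.limit) return false;
    this.count += 1;
    return true;
  }
}

class SlidingWindowLog {
  private limit: number;
  private windowMs: number;
  private timestamps: number[] = [];

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  allow(nowMs: number): boolean {
    // Keep only the requests made during the past windowMs.
    this.timestamps = this.timestamps.filter((t) => t > nowMs - this.windowMs);
    if (this.timestamps.length >= this.limit) return false;
    this.timestamps.push(nowMs);
    return true;
  }
}

// Sends `attempts` requests at the same instant and counts how many pass.
function countAllowed(
  limiter: { allow(nowMs: number): boolean },
  nowMs: number,
  attempts: number,
): number {
  let allowed = 0;
  for (let i = 0; i < attempts; i++) {
    if (limiter.allow(nowMs)) allowed += 1;
  }
  return allowed;
}

const fixed = new FixedWindowCounter(100, 60_000);
const sliding = new SlidingWindowLog(100, 60_000);

// 100 requests at 0:59, then 100 more at 1:01.
const fixedTotal = countAllowed(fixed, 59_000, 100) + countAllowed(fixed, 61_000, 100);
const slidingTotal = countAllowed(sliding, 59_000, 100) + countAllowed(sliding, 61_000, 100);
```

In this replay the fixed window admits both bursts because its counter resets at 1:00, while the sliding log still sees the first 100 requests inside the past 60 seconds and rejects the second burst.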

Global consistency

Rate limits are enforced consistently across all regions. A user can’t bypass limits by hitting different geographic endpoints.

Cross-region denial propagation

When an identifier crosses its limit in any region, every other region picks up the denial within a few seconds and starts rejecting the same identifier locally, even before that region sees any of the abusive traffic firsthand. The window is honored end to end: as the offending window decays, every region releases the identifier at the same time.

This means a single attacker hitting your API from multiple geographies can’t multiply their effective limit by the number of regions they hit. Once any region denies them, every region denies them. You don’t have to enable or configure anything; propagation runs automatically for every namespace.

Cross-region enforcement applies to limits with a window of at least 1 minute. Shorter windows (for example, per-second burst limits) are enforced per region only, because the propagation roundtrip takes longer than the window itself.
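
To reason about which of your limits get cross-region enforcement, the rule above can be written as a small helper. This is a hypothetical sketch, not part of the Unkey SDK: `enforcementScope` and `worstCaseAllowed` are illustrative names, and the worst-case figure is an idealized bound that ignores the few seconds of propagation delay.

```typescript
// Hypothetical helpers that restate the rule above; not part of the Unkey SDK.

type EnforcementScope = "cross-region" | "per-region";

// Threshold taken from the text: windows of at least 1 minute propagate.
const CROSS_REGION_MIN_WINDOW_MS = 60_000;

function enforcementScope(windowMs: number): EnforcementScope {
  return windowMs >= CROSS_REGION_MIN_WINDOW_MS ? "cross-region" : "per-region";
}

// Idealized upper bound on requests one identifier can get through per window
// when spreading traffic over `regions` regions. Propagation delay is ignored.
function worstCaseAllowed(limit: number, windowMs: number, regions: number): number {
  return enforcementScope(windowMs) === "cross-region" ? limit : limit * regions;
}

const perMinute = worstCaseAllowed(100, 60_000, 3); // 1-minute window
const perSecond = worstCaseAllowed(10, 1_000, 3); // per-second burst limit
```

Under these assumptions the per-minute limit stays at 100 regardless of how many regions are hit, whereas the per-second burst limit can be consumed once in each region.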

Response fields

Every rate limit check returns:

Field	Type	Description
`success`	`boolean`	`true` if request is allowed
`limit`	`number`	The configured limit
`remaining`	`number`	Requests left in current window
`reset`	`number`	Unix timestamp (ms) when window resets
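
As a rough sketch, the fields in the table map onto a TypeScript shape like the one below. The interface name `RatelimitResult` and the helpers `retryAfterSeconds` and `rateLimitHeaders` are hypothetical and only illustrate how the fields relate to each other; the header names follow common convention rather than anything the SDK mandates.

```typescript
// Shape of a rate limit result as described in the table above.
// The interface and helper names are hypothetical, for illustration only.
interface RatelimitResult {
  success: boolean; // true if the request is allowed
  limit: number; // the configured limit
  remaining: number; // requests left in the current window
  reset: number; // Unix timestamp in milliseconds when the window resets
}

// Whole seconds until the window resets, never negative.
function retryAfterSeconds(result: RatelimitResult, nowMs: number): number {
  return Math.max(0, Math.ceil((result.reset - nowMs) / 1000));
}

// Builds conventional rate limit headers; Retry-After is only set on denial.
function rateLimitHeaders(result: RatelimitResult, nowMs: number): Record<string, string> {
  const headers: Record<string, string> = {
    "X-RateLimit-Limit": result.limit.toString(),
    "X-RateLimit-Remaining": result.remaining.toString(),
    "X-RateLimit-Reset": result.reset.toString(),
  };
  if (!result.success) {
    headers["Retry-After"] = retryAfterSeconds(result, nowMs).toString();
  }
  return headers;
}
```

Passing the current time in as `nowMs` instead of calling `Date.now()` inside the helpers keeps them deterministic and easy to test.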

Handling the response

const { success, remaining, reset } = await limiter.limit(identifier);

if (!success) {
  // Seconds until the window resets; clamp at 0 in case `reset` has already passed
  const retryAfter = Math.max(0, Math.ceil((reset - Date.now()) / 1000));

  return new Response("Rate limit exceeded", {
    status: 429,
    headers: {
      "Retry-After": retryAfter.toString(),
      "X-RateLimit-Remaining": "0",
      "X-RateLimit-Reset": reset.toString(),
    },
  });
}

// Request allowed

Cost-based limiting

Not all requests are equal. Use cost to deduct more from the limit for expensive operations:

// Normal request: costs 1
await limiter.limit(userId);

// Expensive operation: costs 5
await limiter.limit(userId, { cost: 5 });

With a limit of 100/minute:

100 normal requests, OR
20 expensive requests, OR
Mix of both
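
The arithmetic behind a mixed workload can be checked with a small simulation of a single window. This is a simplified model, not the service's actual accounting: `simulateWindow` and `repeat` are hypothetical helpers, and the model assumes a request is denied whenever its cost exceeds what is left in the window.

```typescript
// Simplified single-window model of cost-based limiting; hypothetical helpers.

interface WindowOutcome {
  allowed: number; // requests that fit in the budget
  denied: number; // requests whose cost exceeded what was left
  remaining: number; // budget left at the end of the window
}

function simulateWindow(limit: number, costs: number[]): WindowOutcome {
  let remaining = limit;
  let allowed = 0;
  let denied = 0;
  for (const cost of costs) {
    if (cost <= remaining) {
      remaining -= cost; // deduct the request's cost from the budget
      allowed += 1;
    } else {
      denied += 1; // assumption: a denied request deducts nothing
    }
  }
  return { allowed, denied, remaining };
}

// Returns an array holding `cost` repeated `times` times.
function repeat(cost: number, times: number): number[] {
  return Array.from({ length: times }, () => cost);
}

// 10 expensive requests (cost 5) plus 50 normal ones (cost 1) use exactly 100.
const mixed = simulateWindow(100, [...repeat(5, 10), ...repeat(1, 50)]);

// A 21st expensive request no longer fits into a limit of 100.
const expensiveOnly = simulateWindow(100, repeat(5, 21));
```

With a limit of 100, the mixed workload fits exactly (10 × 5 + 50 × 1 = 100), and the 21st expensive request in the second run is the first one denied.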

Track token consumption in the dashboard

When you use cost-based limiting, the rate limit overview in your Unkey dashboard surfaces token usage per identifier alongside request counts. Each row in the namespace logs table includes:

Column	What it shows
`Passed Requests`	Number of requests that were allowed in the selected window
`Blocked Requests`	Number of requests that were denied in the selected window
`Passed Tokens`	Sum of `cost` deducted from the limit by allowed requests
`Blocked Tokens`	Sum of `cost` that would have been required by denied requests

This is useful when different operations have different costs — for example, when an LLM endpoint deducts tokens proportional to prompt size. Inspect the Passed Tokens and Blocked Tokens columns to see which identifiers are actually consuming the most capacity, not just making the most requests. If you don’t pass a cost value, every request counts as 1 token, so the token columns mirror the request columns.
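
To make the four columns concrete, the sketch below aggregates a list of rate limit decisions the way the table describes. It is only a local illustration: `LimitEvent`, `UsageRow`, and `summarize` are hypothetical names, and the dashboard computes these values for you.

```typescript
// Local illustration of the four dashboard columns; all names are hypothetical.

interface LimitEvent {
  identifier: string;
  success: boolean; // whether the request was allowed
  cost?: number; // defaults to 1 when omitted
}

interface UsageRow {
  passedRequests: number;
  blockedRequests: number;
  passedTokens: number;
  blockedTokens: number;
}

function summarize(events: LimitEvent[]): Map<string, UsageRow> {
  const rows = new Map<string, UsageRow>();
  for (const event of events) {
    const row = rows.get(event.identifier) ?? {
      passedRequests: 0,
      blockedRequests: 0,
      passedTokens: 0,
      blockedTokens: 0,
    };
    const cost = event.cost ?? 1; // no cost passed: the request counts as 1 token
    if (event.success) {
      row.passedRequests += 1;
      row.passedTokens += cost;
    } else {
      row.blockedRequests += 1;
      row.blockedTokens += cost;
    }
    rows.set(event.identifier, row);
  }
  return rows;
}

const sampleUsage = summarize([
  { identifier: "user_a", success: true, cost: 5 },
  { identifier: "user_a", success: true },
  { identifier: "user_a", success: false, cost: 10 },
  { identifier: "user_b", success: true },
]);
```

In the sample, `user_a` has only two passed requests but six passed tokens, which is the kind of gap between request count and consumed capacity the token columns are meant to expose.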

Timeout and fallback

Configure behavior when Unkey is unreachable:

import { Ratelimit } from "@unkey/ratelimit";

const limiter = new Ratelimit({
  rootKey: process.env.UNKEY_ROOT_KEY,
  namespace: "api",
  limit: 100,
  duration: "60s",
  timeout: {
    ms: 3000, // Wait max 3 seconds
    fallback: (identifier) => ({
      success: true, // Allow on timeout (or false to deny)
      limit: 0,
      remaining: 0,
      reset: Date.now(),
    }),
  },
  onError: (err, identifier) => {
    console.error(`Rate limit error for ${identifier}:`, err);
    return { success: true, limit: 0, remaining: 0, reset: Date.now() };
  },
});
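
Conceptually, the `timeout` and `onError` options describe a race between the remote check and a local fallback. The sketch below shows that general pattern with a hypothetical `withFallback` helper; it is not how the SDK is implemented internally, only a way to picture the behavior you are configuring.

```typescript
// General timeout-with-fallback pattern; `withFallback` is a hypothetical helper
// and does not reflect the SDK's internals.

function withFallback<T>(work: Promise<T>, timeoutMs: number, fallback: () => T): Promise<T> {
  return new Promise<T>((resolve) => {
    // If the work has not settled in time, resolve with the fallback value.
    const timer = setTimeout(() => resolve(fallback()), timeoutMs);
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value); // the remote answer arrived in time
      },
      () => {
        clearTimeout(timer);
        resolve(fallback()); // errors also resolve to the fallback
      },
    );
  });
}

// Fail-open decision, mirroring the `success: true` fallback configured above.
const failOpen = () => ({ success: true, limit: 0, remaining: 0, reset: Date.now() });

// Stand-in for a check that never answers, as if the service were unreachable.
const unreachable = new Promise<ReturnType<typeof failOpen>>(() => {});

const decision = withFallback(unreachable, 50, failOpen);
```

Returning `success: true` from the fallback keeps your API available during an outage at the price of temporarily unenforced limits; returning `success: false` makes the opposite trade.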

Next steps

Custom overrides

SDK reference
