Rate limiting is one of the most important defensive mechanisms in distributed systems. A public-facing API without it can be taken down by a single malicious user, or by one program stuck in an infinite loop. Netflix handles hundreds of millions of requests per day, and Stripe's API processes tens of thousands of payments per second; both stay stable thanks to carefully designed rate limiting.
Rate limiting solves three core problems:
Availability protection: keep a single user from exhausting service resources
Cost control: high-cost operations (LLM API calls, SMS) must be constrained
Fairness: ensure resources are shared fairly among users
Rate limiting is one of the most critical defensive mechanisms in distributed systems. Without it, a single misbehaving client — whether a DDoS attacker or a developer's buggy retry loop — can bring down your entire service. Netflix, Stripe, GitHub, and every major API platform relies on sophisticated rate limiting. This deep dive covers four algorithms with increasing sophistication, a production-grade Redis implementation, and the distributed systems challenges that make this problem genuinely hard.
Part 1: Theory / 理论基础 (5 min)
四大限流算法 / The Four Algorithms
中文:
限流有四种主流算法,每种都有不同的特性和适用场景。
English:
There are four main rate limiting algorithms, each with distinct trade-offs.
Sliding Window Log
Precisely solves the fixed-window boundary problem. Maintain a sorted log of request timestamps per user. On each request, evict timestamps that fall outside the window, then check the count.
Pro: exact. Con: memory-heavy — stores every timestamp for every active user.
User A: [12:00:01, 12:00:23, 12:00:45, 12:00:58] ← 4 requests, all within 1 min window
New request at 12:01:05:
Evict 12:00:01 (>60s ago) → [12:00:23, 12:00:45, 12:00:58, 12:01:05]
Count = 4 → OK (if limit is 5)
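The evict-then-count logic in the example above can be sketched as a small in-memory class. This is a hypothetical version using a deque; a production deployment would typically keep the log in a Redis sorted set instead.

```python
import time
from collections import deque
from typing import Optional

class SlidingWindowLog:
    """Exact rate limiter: keeps one timestamp per admitted request."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.log = deque()  # Timestamps, oldest first (we only ever append "now")

    def is_allowed(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Evict timestamps that have fallen out of the window
        while self.log and now - self.log[0] >= self.window:
            self.log.popleft()
        if len(self.log) < self.limit:
            self.log.append(now)   # Count this request
            return True
        return False

# Replaying the example from the text (limit 5, 60s window, times as seconds):
rl = SlidingWindowLog(limit=5, window_seconds=60)
for t in (1, 23, 45, 58):          # 12:00:01 ... 12:00:58
    rl.is_allowed(now=t)
print(rl.is_allowed(now=65))       # True: 12:00:01 evicted, 4 requests in window
```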
Leaky Bucket
The mirror image of the token bucket. Requests enter a queue and are processed at a fixed rate. Guarantees smooth output but offers no burst tolerance. A good fit for video streaming or payment processing, where you need steady throughput.
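A minimal sketch of the idea, treating the bucket as a drainable counter rather than a literal queue (an implementation choice of ours, sometimes called a leaky bucket "as a meter"):

```python
import time
from typing import Optional

class LeakyBucket:
    """Pending work 'leaks' out at a fixed rate; overflow is rejected."""

    def __init__(self, capacity: float, leak_rate: float, now: Optional[float] = None):
        self.capacity = capacity        # Max amount of queued work
        self.leak_rate = leak_rate      # Units drained per second
        self.water = 0.0
        self.last = time.monotonic() if now is None else now

    def allow(self, amount: float = 1.0, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Drain whatever leaked since the last call
        self.water = max(0.0, self.water - (now - self.last) * self.leak_rate)
        self.last = now
        if self.water + amount <= self.capacity:
            self.water += amount        # Accept and enqueue
            return True
        return False                    # Bucket full: reject

b = LeakyBucket(capacity=2, leak_rate=1.0, now=0.0)
print(b.allow(now=0.0))   # True
print(b.allow(now=0.0))   # True
print(b.allow(now=0.0))   # False: bucket full
print(b.allow(now=1.0))   # True: one unit leaked out
```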
算法选择指南 / Algorithm Selection Guide
| 场景 / Scenario | 推荐算法 / Algorithm |
| --- | --- |
| 简单 API 限流 / Simple API rate limit | Sliding Window Counter |
| 允许突发 / Burst traffic OK | Token Bucket |
| 需要平滑输出 / Smooth output required | Leaky Bucket |
| 精确计费 / Exact billing | Sliding Window Log |
Part 2: Step-by-Step Implementation / 一步一步实现 (8 min)
中文:
我们来实现两个版本:(1)单机内存版——理解算法核心;(2)分布式 Redis 版——生产可用。
English:
We'll build two versions: (1) in-memory single-node to understand the algorithm; (2) distributed Redis version that's production-ready.
Version 1: In-Memory Token Bucket / 单机令牌桶
import time
import threading
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class TokenBucket:
    capacity: float    # Max tokens (burst limit)
    rate: float        # Tokens added per second
    tokens: float = field(init=False)
    last_refill: float = field(init=False)
    lock: threading.Lock = field(default_factory=threading.Lock, init=False)

    def __post_init__(self):
        self.tokens = self.capacity  # Start full
        self.last_refill = time.monotonic()

    def _refill(self):
        """Add tokens based on elapsed time since last refill."""
        now = time.monotonic()
        elapsed = now - self.last_refill
        # Calculate how many tokens to add
        new_tokens = elapsed * self.rate
        # Cap at capacity (can't overflow the bucket)
        self.tokens = min(self.capacity, self.tokens + new_tokens)
        self.last_refill = now

    def consume(self, tokens: float = 1.0) -> bool:
        """Try to consume `tokens`. Returns True if allowed."""
        with self.lock:
            self._refill()
            if self.tokens >= tokens:
                self.tokens -= tokens
                return True   # Allow request
            return False      # Reject: bucket empty


class RateLimiter:
    """Per-user rate limiter using the token bucket algorithm."""

    def __init__(self, capacity: int = 10, rate: float = 2.0):
        self.capacity = capacity
        self.rate = rate
        self._buckets: Dict[str, TokenBucket] = {}
        self._lock = threading.Lock()

    def _get_bucket(self, user_id: str) -> TokenBucket:
        """Lazily create a bucket for each user."""
        with self._lock:
            if user_id not in self._buckets:
                self._buckets[user_id] = TokenBucket(
                    capacity=self.capacity,
                    rate=self.rate,
                )
            return self._buckets[user_id]

    def is_allowed(self, user_id: str) -> bool:
        return self._get_bucket(user_id).consume()


# Usage
limiter = RateLimiter(capacity=5, rate=1.0)  # 5 burst, 1/sec sustained
for i in range(8):
    allowed = limiter.is_allowed("user_123")
    print(f"Request {i+1}: {'✅ ALLOWED' if allowed else '❌ REJECTED'}")
# Output: first 5 allowed (burst), next 3 rejected (bucket empty)
Version 2: Distributed Sliding Window Counter with Redis / 分布式滑动窗口计数器
Single-node limiters have a fatal flaw: in multi-instance deployments, each instance has its own counters. A user hitting 3 servers with 10 req/server sees 30 effective requests per window. Redis atomic operations solve this.
import time
import redis
from typing import Tuple

class RedisRateLimiter:
    """
    Sliding Window Counter using Redis.
    Key insight: we store TWO counters per user — current window and previous.
    We use the weighted formula to approximate a true sliding window.
    This uses O(1) memory per user, vs O(n) for the log-based approach.
    """

    def __init__(
        self,
        redis_client: redis.Redis,
        limit: int = 100,
        window_seconds: int = 60,
        key_prefix: str = "rl",
    ):
        self.redis = redis_client
        self.limit = limit
        self.window = window_seconds
        self.prefix = key_prefix

    def _get_keys(self, user_id: str) -> Tuple[str, str]:
        """Get Redis keys for the current and previous windows."""
        # Integer division gives us the current window bucket
        current_window = int(time.time()) // self.window
        prev_window = current_window - 1
        curr_key = f"{self.prefix}:{user_id}:{current_window}"
        prev_key = f"{self.prefix}:{user_id}:{prev_window}"
        return curr_key, prev_key

    def is_allowed(self, user_id: str) -> Tuple[bool, int]:
        """
        Check if a request is allowed.
        Returns (allowed: bool, remaining: int).
        """
        curr_key, prev_key = self._get_keys(user_id)
        # Lua script for atomic read-increment-check.
        # Redis executes Lua atomically — no race conditions possible.
        lua_script = """
        local curr_key = KEYS[1]
        local prev_key = KEYS[2]
        local limit = tonumber(ARGV[1])
        local window = tonumber(ARGV[2])
        local now = tonumber(ARGV[3])

        -- Get current counts (default 0)
        local curr_count = tonumber(redis.call('GET', curr_key) or 0)
        local prev_count = tonumber(redis.call('GET', prev_key) or 0)

        -- Calculate what fraction of the current window has elapsed
        local elapsed_in_window = now % window
        local elapsed_ratio = elapsed_in_window / window

        -- Weighted estimate: prev window contributes less as the current window progresses
        local estimated = prev_count * (1 - elapsed_ratio) + curr_count

        if estimated >= limit then
            return {0, 0}  -- Reject: over limit
        end

        -- Increment the current window counter; TTL of 2x window keeps it
        -- alive long enough to serve as "previous" in the next window
        redis.call('INCR', curr_key)
        redis.call('EXPIRE', curr_key, window * 2)

        local remaining = math.floor(limit - estimated - 1)
        return {1, remaining}
        """
        script = self.redis.register_script(lua_script)
        now = time.time()
        result = script(
            keys=[curr_key, prev_key],
            args=[self.limit, self.window, now],
        )
        allowed = bool(result[0])
        remaining = int(result[1])
        return allowed, remaining

    def get_headers(self, user_id: str) -> dict:
        """Return rate limit headers (standard HTTP convention)."""
        curr_key, prev_key = self._get_keys(user_id)
        curr_count = int(self.redis.get(curr_key) or 0)
        return {
            "X-RateLimit-Limit": str(self.limit),
            "X-RateLimit-Remaining": str(max(0, self.limit - curr_count)),
            "X-RateLimit-Reset": str(
                (int(time.time()) // self.window + 1) * self.window
            ),
        }
# FastAPI middleware example
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()
redis_client = redis.Redis(host="localhost", port=6379, db=0)
limiter = RedisRateLimiter(redis_client, limit=100, window_seconds=60)

@app.middleware("http")
async def rate_limit_middleware(request: Request, call_next):
    # Extract user identifier (API key, JWT sub, or IP as fallback)
    user_id = request.headers.get("X-API-Key") or request.client.host
    allowed, remaining = limiter.is_allowed(user_id)
    headers = limiter.get_headers(user_id)
    if not allowed:
        # Retry-After expects delta-seconds (or an HTTP-date), not an epoch timestamp
        retry_after = max(0, int(headers["X-RateLimit-Reset"]) - int(time.time()))
        return JSONResponse(
            status_code=429,
            content={
                "error": "rate_limit_exceeded",
                "message": "Too many requests. Please slow down.",
                "retry_after": retry_after,
            },
            headers={**headers, "Retry-After": str(retry_after)},
        )
    response = await call_next(request)
    # Inject rate limit headers into every response (good practice)
    for key, value in headers.items():
        response.headers[key] = value
    return response
Tiered Rate Limiting / 分级限流
中文:
真实系统中通常有多层限流:不同 API 端点有不同限制,不同用户等级有不同配额。
English:
Production systems use tiered limits — different endpoints, different user tiers, different time windows:
# Tiered config — typically loaded from DB or feature flags
RATE_LIMIT_CONFIG = {
    "free": {
        "default": {"limit": 60, "window": 60},        # 60 req/min
        "/api/search": {"limit": 10, "window": 60},    # Search is expensive
        "/api/export": {"limit": 2, "window": 3600},   # 2 exports/hour
    },
    "pro": {
        "default": {"limit": 1000, "window": 60},
        "/api/search": {"limit": 100, "window": 60},
        "/api/export": {"limit": 50, "window": 3600},
    },
    "enterprise": {
        "default": {"limit": 10000, "window": 60},  # Effectively unlimited
    },
}

def get_limit_config(user_tier: str, endpoint: str) -> dict:
    tier_config = RATE_LIMIT_CONFIG.get(user_tier, RATE_LIMIT_CONFIG["free"])
    # Fall back to default if endpoint not specifically configured
    return tier_config.get(endpoint, tier_config["default"])
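A quick sanity check of the fallback chain; the config is restated here in trimmed form so the snippet stands alone:

```python
RATE_LIMIT_CONFIG = {
    "free": {
        "default": {"limit": 60, "window": 60},
        "/api/search": {"limit": 10, "window": 60},
    },
    "pro": {
        "default": {"limit": 1000, "window": 60},
    },
}

def get_limit_config(user_tier: str, endpoint: str) -> dict:
    # Unknown tier -> "free"; unknown endpoint -> tier's "default"
    tier_config = RATE_LIMIT_CONFIG.get(user_tier, RATE_LIMIT_CONFIG["free"])
    return tier_config.get(endpoint, tier_config["default"])

print(get_limit_config("free", "/api/search"))   # {'limit': 10, 'window': 60}
print(get_limit_config("pro", "/api/unknown"))   # {'limit': 1000, 'window': 60}
print(get_limit_config("trial", "/api/search"))  # {'limit': 10, 'window': 60}
```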
Part 3: Edge Cases & Gotchas / 边界情况 (2 min)
1. Race conditions
Using GET → business logic → SET instead of a Redis Lua script creates a race: two requests both read count=99, both decide they may take the 100th slot, and 101 requests end up admitted. **Always use atomic operations: INCR or a Lua script.**
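The interleaving can be made concrete in a few lines of plain Python (a simulation, with local variables standing in for Redis GET/SET):

```python
# Simulate two requests racing a GET -> check -> SET limiter
LIMIT = 100
count = 99                 # 99 requests already admitted; one slot left

# Both requests GET before either one SETs:
read_a = count             # Request A reads 99
read_b = count             # Request B reads 99
admitted = 0
if read_a < LIMIT:         # A: "99 < 100, I'm in"
    count = read_a + 1
    admitted += 1
if read_b < LIMIT:         # B: "99 < 100, I'm in" (stale read!)
    count = read_b + 1     # B's write clobbers A's increment
    admitted += 1

print(count, admitted)     # 100 2 — the counter reads 100, yet 101 total got through
```

Atomic INCR avoids this because each caller receives a distinct post-increment value, so only one of the two could ever see 100.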
2. What happens when Redis goes down?
Two strategies:
Fail open: allow all requests when Redis is unavailable; availability comes first.
Fail closed: reject all requests when Redis is unavailable; safety comes first.
Most API services choose fail open.
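The switch between the two policies is a small wrapper. This sketch assumes a backend exposing a boolean `is_allowed`; it catches `Exception` broadly for the demo, while production code would catch `redis.exceptions.RedisError` specifically:

```python
class FailOpenLimiter:
    """Wrap a limiter; when the backing store errors, apply the chosen policy."""

    def __init__(self, backend, fail_open: bool = True):
        self.backend = backend
        self.fail_open = fail_open

    def is_allowed(self, user_id: str) -> bool:
        try:
            return self.backend.is_allowed(user_id)
        except Exception:
            # Fail open keeps the API up; fail closed keeps it safe
            return self.fail_open

class DownBackend:
    """Stub that simulates a Redis outage."""
    def is_allowed(self, user_id):
        raise ConnectionError("redis unreachable")

print(FailOpenLimiter(DownBackend(), fail_open=True).is_allowed("u1"))   # True
print(FailOpenLimiter(DownBackend(), fail_open=False).is_allowed("u1"))  # False
```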
3. Distributed clock drift
System clocks on different servers can drift out of sync (by hundreds of milliseconds), producing inconsistent window boundaries. Solution: use Redis's TIME command as the single source of time.
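Deriving the window from Redis's clock instead of the local one might look like this; redis-py's `time()` returns a `(seconds, microseconds)` tuple, and the stub below mimics that shape so the snippet runs without a server:

```python
def current_window(redis_client, window_seconds: int) -> int:
    """Compute the window bucket from Redis's clock, not the local one."""
    seconds, _micros = redis_client.time()   # Redis TIME command
    return int(seconds) // window_seconds

class StubRedis:
    """Stand-in exposing the same .time() shape as redis-py."""
    def time(self):
        return (1_700_000_123, 456_789)

print(current_window(StubRedis(), 60))   # 28333335
```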
Part 4: Real-World Case Studies / 真实案例
Netflix: each API route has its own rate limit configuration; Hystrix (and its successor Resilience4j) supplies circuit breakers as a complement to rate limiting.
GitHub API: 60 unauthenticated / 5000 authenticated reqs/hour. Search is separately limited at 10/min — same backend, different resource cost profile.
Stripe: 100 reqs/sec baseline, but billing webhooks use exponential backoff rather than fixed rate limiting to handle payment processing spikes.
OpenAI: Limits on tokens per minute (TPM), not just requests — effectively a token bucket where the "token" unit is an LLM token. They also layer requests per minute (RPM) on top.
Cloudflare Workers: Rate limiting at the edge (CDN PoP level), before requests even reach your origin. Reduces attack surface at sub-millisecond latency with zero origin load.
DoorDash: Uses a combination of per-user limits AND global service limits with adaptive throttling — if downstream services degrade, they automatically tighten rate limits upstream.
Part 5: Interview Simulation / 面试模拟 (3 min)
中文:面试中,限流是系统设计面试的常见子问题。以下是面试官最常问的 5 个追问。
English: Rate limiting appears in nearly every system design interview, either as the main topic or as a component. Here are the 5 most common follow-ups:
Q1: 如果我们有 100 个 API 服务器,怎么保证全局限流准确?
If we have 100 API servers, how do we enforce a global rate limit accurately?
A: Centralized storage (a Redis cluster) is the standard approach. Every API server reads and writes the same Redis, and atomic operations guarantee consistency. If Redis latency is a problem, consider a "local tokens + sync" scheme: each server caches a small batch of tokens (say 10% of the quota) and periodically replenishes from Redis, cutting network round trips. This is a token bucket with a local cache; it trades a little precision for performance.
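A sketch of that local-batch idea. The names are ours, and a plain dict stands in for the shared Redis counter (in real life the claim would be an atomic DECRBY):

```python
class LocalBatchLimiter:
    """Claim tokens from a central store in batches to cut round trips."""

    def __init__(self, central: dict, key: str, batch_size: int = 10):
        self.central = central        # Shared quota (stand-in for Redis)
        self.key = key
        self.batch = batch_size
        self.local_tokens = 0

    def _refill_from_central(self) -> None:
        # One "network" round trip claims a whole batch at once
        available = self.central.get(self.key, 0)
        claim = min(self.batch, available)
        self.central[self.key] = available - claim
        self.local_tokens += claim

    def is_allowed(self) -> bool:
        if self.local_tokens == 0:
            self._refill_from_central()
        if self.local_tokens > 0:
            self.local_tokens -= 1    # Serve from the local cache: no round trip
            return True
        return False

central = {"user_123": 25}            # 25 tokens left this window, globally
lim = LocalBatchLimiter(central, "user_123", batch_size=10)
allowed = sum(lim.is_allowed() for _ in range(30))
print(allowed)                        # 25: batches of 10, 10, 5, then rejections
```

The precision loss shows up when several servers each hold an unspent local batch: the global count can briefly exceed what a single central counter would allow.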
Q2: 怎么防止用户伪造 IP 绕过限流?
How do you prevent IP spoofing to bypass rate limits?
A: IP-based limiting should be a last resort. Prefer authenticated identifiers (API keys, JWT user IDs). For unauthenticated endpoints, trust only the last hop of X-Forwarded-For that your own infrastructure controls (e.g., the IP your load balancer added). Never trust client-supplied IPs blindly. For high-security scenarios, layer fingerprinting (User-Agent + TLS fingerprint) on top.
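The "trust only your own hops" rule can be written as a small parser. The function name and signature are ours, for illustration:

```python
def client_ip(xff_header: str, trusted_proxies: int = 1) -> str:
    """
    Pick the client IP from X-Forwarded-For, trusting only the last
    `trusted_proxies` entries (the ones our own infrastructure appended).
    Everything further left is client-controlled and may be forged.
    """
    hops = [h.strip() for h in xff_header.split(",") if h.strip()]
    if len(hops) < trusted_proxies:
        raise ValueError("fewer hops than trusted proxies; header is suspect")
    # Each trusted proxy appended the IP of the peer that connected to it,
    # so counting from the right skips exactly the forgeable entries.
    return hops[-trusted_proxies]

# Attacker sends a forged prefix; our single LB appends the real peer IP:
print(client_ip("1.2.3.4, 10.0.0.9, 203.0.113.7", trusted_proxies=1))
# 203.0.113.7 — the forged 1.2.3.4 is ignored
```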
Q3: 限流和熔断器(Circuit Breaker)有什么区别?
What's the difference between rate limiting and a circuit breaker?
A: Rate limiting protects your service from excessive client requests (inbound). Circuit breakers protect your service from failing downstream dependencies (outbound). Rate limiter rejects requests at the door; circuit breaker stops you from calling a service that's already down. They're complementary: use both. Netflix Hystrix (and its successor Resilience4j) implements circuit breakers; Redis or API Gateway handles rate limiting.
Q4: 如何对 "代价不同" 的 API 操作做限流?比如写操作比读操作贵得多。
How do you rate limit operations with different "costs"? Writes vs reads, for example.
A: Use weighted token consumption. Instead of consume(1), writes call consume(5). This is exactly how OpenAI handles LLM tokens — a 4K-token completion costs 4000 units from the TPM bucket, while a 100-token prompt costs 100. Define a "cost unit" (e.g., 1 unit = 1 DB read), assign costs to each operation, and run one token bucket per user with a capacity in cost units rather than request counts.
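Weighted consumption is just a cost table in front of a token bucket. A compact sketch, with illustrative cost values of our choosing:

```python
import time
from typing import Optional

class WeightedLimiter:
    """Token bucket where each operation type burns a different number of units."""

    # Illustrative cost table: 1 unit = roughly one DB read
    COSTS = {"read": 1, "search": 5, "write": 5, "export": 50}

    def __init__(self, capacity: float, rate: float, now: Optional[float] = None):
        self.capacity = capacity              # Budget in cost units, not requests
        self.rate = rate                      # Units regenerated per second
        self.tokens = capacity                # Start full
        self.last = time.monotonic() if now is None else now

    def allow(self, op: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        cost = self.COSTS[op]
        if self.tokens >= cost:
            self.tokens -= cost               # Weighted consume
            return True
        return False

lim = WeightedLimiter(capacity=100, rate=10, now=0)
print(lim.allow("export", now=0))   # True: burns 50 of 100 units
print(lim.allow("export", now=0))   # True: burns the remaining 50
print(lim.allow("read", now=0))     # False: bucket is empty
print(lim.allow("read", now=1))     # True: 10 units regenerated in 1s
```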
Q5: 用户触发限流了,怎么给他们好的体验?
What's the user experience when a rate limit is hit?
A: Return HTTP 429 with three key pieces of info: (1) a Retry-After header with the exact seconds until the window resets; (2) X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset on every response so clients can self-throttle before hitting the limit; (3) a clear error body with a link to your rate limit docs. Pro tip: if the window has effectively reset by the time you respond, Retry-After: 0 signals that the client can retry immediately. Client libraries like the official Stripe SDK and OpenAI Python client parse these headers and auto-retry with backoff.
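The client side of that contract can be sketched with an injectable transport, so the snippet runs without a network. The function and the `send`/`sleep` parameters are ours, not from any SDK:

```python
import time

def call_with_retry(send, max_attempts: int = 3, sleep=time.sleep):
    """
    Call `send()` (returns (status_code, headers, body)); on HTTP 429,
    honor Retry-After before retrying, falling back to exponential backoff.
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        # Retry-After is delta-seconds; default to 2^attempt if absent
        delay = float(headers.get("Retry-After", 2 ** attempt))
        sleep(delay)
    return status, body   # Give up: surface the final 429 to the caller

# Simulated server: two 429s, then success
responses = iter([
    (429, {"Retry-After": "1"}, None),
    (429, {"Retry-After": "2"}, None),
    (200, {}, "ok"),
])
slept = []
status, body = call_with_retry(lambda: next(responses), sleep=slept.append)
print(status, body, slept)   # 200 ok [1.0, 2.0]
```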
🔗 References / 参考资料
📄 System Design Primer — Rate Limiting: https://github.com/donnemartin/system-design-primer
📄 Stripe Engineering Blog — Rate Limiters: https://stripe.com/blog/rate-limiters
📄 Cloudflare — How We Built Rate Limiting: https://blog.cloudflare.com/counting-things-a-lot-of-different-things/
📄 Redis Official Docs — INCR pattern: https://redis.io/docs/latest/commands/incr/
📄 RFC 6585 — HTTP 429 Too Many Requests: https://tools.ietf.org/html/rfc6585