<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>byte-by-byte — Daily Tech Learning</title>
    <link>https://github.com/YushengAuggie/byte-by-byte</link>
    <description>Daily bilingual (Chinese/English) tech learning: system design, algorithms, soft skills, frontend, and AI.</description>
    <language>en-us</language>
    <lastBuildDate>Sat, 04 Apr 2026 15:11:36 +0000</lastBuildDate>
    <atom:link href="https://github.com/YushengAuggie/byte-by-byte/raw/main/docs/feed.xml" rel="self" type="application/rss+xml"/>
    <item>
      <title>byte-by-byte — 2026-04-04</title>
      <link>https://github.com/YushengAuggie/byte-by-byte/tree/main/archive</link>
      <guid isPermaLink="false">https://github.com/YushengAuggie/byte-by-byte/archive/2026-04-04</guid>
      <pubDate>Sat, 04 Apr 2026 12:00:00 +0000</pubDate>
      <description><![CDATA[<h1>🏗️ System Design</h1>
<h2>Saturday Deep Dive Placeholder — system-design</h2>

<p><strong>Date:</strong> 2026-04-04</p>
<p><strong>Day:</strong> 17</p>
<p><strong>Note:</strong> This is a Saturday Deep Dive day. Individual section content is not generated.</p>
<p>Today&#x27;s full content is in: 2026-04-04-deepdive.md</p>

<p><strong>Topic covered today:</strong> Rate Limiting &amp; Throttling — a complete 18-minute bilingual deep dive covering Token Bucket, Sliding Window Counter, Fixed Window, and Leaky Bucket algorithms, with production-grade Redis implementation, distributed systems edge cases, real-world examples from GitHub/Stripe/OpenAI/Cloudflare, and a full interview simulation section.</p>

<p>See: archive/2026-04-04-deepdive.md</p>

<hr/>
<h1>💻 Algorithms</h1>
<h2>Saturday Deep Dive Placeholder — algorithms</h2>

<p><strong>Date:</strong> 2026-04-04</p>
<p><strong>Day:</strong> 17</p>
<p><strong>Note:</strong> This is a Saturday Deep Dive day. Individual section content is not generated.</p>
<p>Today&#x27;s full content is in: 2026-04-04-deepdive.md</p>

<p><strong>Topic covered today:</strong> Rate Limiting &amp; Throttling — a complete 18-minute bilingual deep dive covering Token Bucket, Sliding Window Counter, Fixed Window, and Leaky Bucket algorithms, with production-grade Redis implementation, distributed systems edge cases, real-world examples from GitHub/Stripe/OpenAI/Cloudflare, and a full interview simulation section.</p>

<p>See: archive/2026-04-04-deepdive.md</p>

<hr/>
<h1>🗣️ Soft Skills</h1>
<h2>Saturday Deep Dive Placeholder — soft-skills</h2>

<p><strong>Date:</strong> 2026-04-04</p>
<p><strong>Day:</strong> 17</p>
<p><strong>Note:</strong> This is a Saturday Deep Dive day. Individual section content is not generated.</p>
<p>Today&#x27;s full content is in: 2026-04-04-deepdive.md</p>

<p><strong>Topic covered today:</strong> Rate Limiting &amp; Throttling — a complete 18-minute bilingual deep dive covering Token Bucket, Sliding Window Counter, Fixed Window, and Leaky Bucket algorithms, with production-grade Redis implementation, distributed systems edge cases, real-world examples from GitHub/Stripe/OpenAI/Cloudflare, and a full interview simulation section.</p>

<p>See: archive/2026-04-04-deepdive.md</p>

<hr/>
<h1>🎨 Frontend</h1>
<h2>Saturday Deep Dive Placeholder — frontend</h2>

<p><strong>Date:</strong> 2026-04-04</p>
<p><strong>Day:</strong> 17</p>
<p><strong>Note:</strong> This is a Saturday Deep Dive day. Individual section content is not generated.</p>
<p>Today&#x27;s full content is in: 2026-04-04-deepdive.md</p>

<p><strong>Topic covered today:</strong> Rate Limiting &amp; Throttling — a complete 18-minute bilingual deep dive covering Token Bucket, Sliding Window Counter, Fixed Window, and Leaky Bucket algorithms, with production-grade Redis implementation, distributed systems edge cases, real-world examples from GitHub/Stripe/OpenAI/Cloudflare, and a full interview simulation section.</p>

<p>See: archive/2026-04-04-deepdive.md</p>

<hr/>
<h1>🤖 AI</h1>
<h2>Saturday Deep Dive Placeholder — ai</h2>

<p><strong>Date:</strong> 2026-04-04</p>
<p><strong>Day:</strong> 17</p>
<p><strong>Note:</strong> This is a Saturday Deep Dive day. Individual section content is not generated.</p>
<p>Today&#x27;s full content is in: 2026-04-04-deepdive.md</p>

<p><strong>Topic covered today:</strong> Rate Limiting &amp; Throttling — a complete 18-minute bilingual deep dive covering Token Bucket, Sliding Window Counter, Fixed Window, and Leaky Bucket algorithms, with production-grade Redis implementation, distributed systems edge cases, real-world examples from GitHub/Stripe/OpenAI/Cloudflare, and a full interview simulation section.</p>

<p>See: archive/2026-04-04-deepdive.md</p>

<hr/>
<h1>Deepdive</h1>
<h2>🔬 Saturday Deep Dive: Rate Limiting &amp; Throttling（限流与节流）(18 min read)</h2>

<p>📊 Day 17/150 · NeetCode: 16/150 · SysDesign: 15/40 · Behavioral: 15/40 · Frontend: 15/50 · AI: 7/30</p>
<p>🔥 5-day streak!</p>

<hr/>

<h2>Overview / 概述</h2>

<p><strong>中文：</strong></p>
<p>限流（Rate Limiting）是分布式系统中最重要的防御机制之一。当你构建一个面向公众的 API 时，如果不做限流，任何一个恶意用户或者写了死循环的程序都可以把你的服务打垮。Netflix 每天处理数亿请求，Stripe 的 API 每秒承接数万笔支付——他们都依赖精心设计的限流系统来保持稳定。</p>

<p>限流解决三个核心问题：</p>
<p>1. <strong>可用性保护</strong>：防止单个用户耗尽服务资源</p>
<p>2. <strong>成本控制</strong>：LLM API 调用、短信等高成本操作必须被约束</p>
<p>3. <strong>公平性</strong>：确保资源在用户之间公平分配</p>

<p><strong>English:</strong></p>
<p>Rate limiting is one of the most critical defensive mechanisms in distributed systems. Without it, a single misbehaving client — whether a DDoS attacker or a developer&#x27;s buggy retry loop — can bring down your entire service. Netflix, Stripe, GitHub, and every other major API platform rely on sophisticated rate limiting. This deep dive covers four algorithms of increasing sophistication, a production-grade Redis implementation, and the distributed systems challenges that make this problem genuinely hard.</p>

<hr/>

<h2>Part 1: Theory / 理论基础 (5 min)</h2>

<h3>四大限流算法 / The Four Algorithms</h3>

<p><strong>中文：</strong></p>
<p>限流有四种主流算法，每种都有不同的特性和适用场景。</p>

<p><strong>English:</strong></p>
<p>There are four main rate limiting algorithms, each with distinct trade-offs.</p>

<hr/>

<h3>算法 1：固定窗口计数器 / Fixed Window Counter</h3>

<p><strong>中文：</strong></p>
<p>最简单的算法。把时间切成固定窗口（比如每分钟），窗口内计数，超过阈值就拒绝。</p>

<p>致命缺陷：<strong>窗口边界攻击</strong>。如果限制是每分钟 100 次，用户可以在 0:59 发送 100 次，在 1:01 再发送 100 次——2 秒内发送了 200 次请求，是限制的两倍。</p>

<p><strong>English:</strong></p>
<p>Simplest algorithm. Divide time into fixed windows (e.g., per minute), count requests in each window, reject when threshold exceeded.</p>

<p>Fatal flaw: <strong>boundary burst attacks</strong>. A limit of 100/min can be abused to send 200 requests in 2 seconds by straddling window boundaries.</p>

<pre><code>
Timeline: |--window 1--|--window 2--|
Requests:    100 at :59   100 at :01
Result:      Both OK! But 200 requests in 2 seconds 🚨
</code></pre>
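<p>A minimal in-memory version of the fixed window counter (an illustrative sketch; the names and limits are made up, and it never evicts old window keys):</p>

<pre><code>
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Fixed window counter, single process (sketch; old windows are never evicted)."""

    def __init__(self, limit=100, window_seconds=60):
        self.limit = limit
        self.window = window_seconds
        self.counters = defaultdict(int)   # (user_id, window_id) -> request count

    def is_allowed(self, user_id):
        window_id = int(time.time()) // self.window   # which fixed window we are in
        key = (user_id, window_id)
        if self.counters[key] >= self.limit:
            return False                   # window budget exhausted
        self.counters[key] += 1
        return True

limiter = FixedWindowLimiter(limit=3, window_seconds=60)
results = [limiter.is_allowed("u1") for _ in range(5)]
# First 3 allowed, then rejected until the next window starts
</code></pre>

<p>The boundary flaw above applies directly: two bursts straddling a window edge both pass.</p>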

<hr/>

<h3>算法 2：滑动窗口日志 / Sliding Window Log</h3>

<p><strong>中文：</strong></p>
<p>精确解决边界问题。为每个用户维护一个请求时间戳的有序日志，每次请求时移除窗口外的旧时间戳，检查剩余数量是否超限。</p>

<p>优点：精确。缺点：内存开销大，每个用户每个时间窗口内都要存所有时间戳。</p>

<p><strong>English:</strong></p>
<p>Precisely solves the boundary problem. Maintain a sorted log of request timestamps per user. On each request, evict timestamps outside the window, then check count.</p>

<p>Pro: Exact. Con: Memory-heavy — stores every timestamp for every active user.</p>

<pre><code>
User A: [12:00:01, 12:00:23, 12:00:45, 12:00:58] ← 4 requests, all within 1 min window
New request at 12:01:05:
  Evict 12:00:01 (&gt;60s ago) → [12:00:23, 12:00:45, 12:00:58, 12:01:05]
  Count = 4 → OK (if limit is 5)
</code></pre>
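<p>The log variant is a few lines with a deque. This sketch takes explicit timestamps so the example above can be replayed deterministically:</p>

<pre><code>
from collections import deque

class SlidingWindowLog:
    """Exact sliding window via a timestamp log (sketch). Memory grows with traffic."""

    def __init__(self, limit=5, window_seconds=60):
        self.limit = limit
        self.window = window_seconds
        self.log = deque()   # timestamps of accepted requests, oldest first

    def is_allowed(self, now):
        # Evict timestamps that fell out of the trailing window
        while self.log and now - self.log[0] >= self.window:
            self.log.popleft()
        if len(self.log) >= self.limit:
            return False
        self.log.append(now)
        return True

rl = SlidingWindowLog(limit=5, window_seconds=60)
for t in [1, 23, 45, 58, 65]:   # seconds, mirroring the 12:00:01 to 12:01:05 example
    rl.is_allowed(now=t)
# After t=65 the t=1 entry is evicted, leaving 4 timestamps in the window
</code></pre>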

<hr/>

<h3>算法 3：滑动窗口计数器 / Sliding Window Counter</h3>

<p><strong>中文：</strong></p>
<p>固定窗口和滑动窗口日志的折中方案——工程上最常用的方案。</p>

<p>核心思想：利用当前窗口计数和上一个窗口计数的加权组合来近似&quot;真实&quot;的滑动窗口。</p>

<p>公式：<code>estimated_count = prev_window_count × (1 - elapsed_ratio) + curr_window_count</code></p>

<p><strong>English:</strong></p>
<p>The practical middle ground — most commonly used in production.</p>

<p>Core idea: approximate the sliding window using a weighted combination of the previous and current fixed window counts.</p>

<p>Formula: <code>estimated_count = prev_window_count × (1 - elapsed_ratio) + curr_window_count</code></p>

<pre><code>
Window size: 60s. Limit: 100.
prev_window (12:00:00-12:01:00): 80 requests
curr_window (12:01:00-12:02:00): 30 requests, 40s elapsed

elapsed_ratio = 40/60 ≈ 0.667
estimate = 80 × (1 - 0.667) + 30 ≈ 26.7 + 30 = 56.7 → OK
</code></pre>
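<p>The weighted estimate is one line of arithmetic; a quick sanity check of the numbers above:</p>

<pre><code>
def sliding_window_estimate(prev_count, curr_count, elapsed_in_window, window):
    """Approximate requests in the trailing window from two fixed counters."""
    elapsed_ratio = elapsed_in_window / window
    return prev_count * (1 - elapsed_ratio) + curr_count

est = sliding_window_estimate(prev_count=80, curr_count=30,
                              elapsed_in_window=40, window=60)
print(round(est, 1))   # 56.7, under the limit of 100
</code></pre>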

<hr/>

<h3>算法 4：令牌桶 / Token Bucket</h3>

<p><strong>中文：</strong></p>
<p>最灵活的算法，也是 AWS、Stripe 等使用的主流方案。</p>

<p>每个用户有一个容量为 <code>capacity</code> 的桶，以 <code>rate</code> 的速度持续向桶中添加令牌。每次请求消耗一个令牌，桶空则拒绝。</p>

<p>核心优势：天然支持突发流量（Burst）。桶满时积累的令牌允许短时间内的高频请求，只要平均速率不超限。</p>

<p><strong>English:</strong></p>
<p>The most flexible algorithm, used by AWS API Gateway, Stripe, and most cloud providers.</p>

<p>Each user has a bucket with capacity <code>C</code>. Tokens fill at rate <code>r</code> tokens/second. Each request consumes one token; empty bucket = reject.</p>

<p>Key advantage: <strong>burst tolerance</strong>. A full bucket lets users send <code>C</code> requests instantly, then sustain <code>r</code> requests/second long-term.</p>

<pre><code>
capacity = 10, rate = 2 tokens/sec

t=0s:  bucket=10, request→bucket=9  ✅
t=0s:  burst of 9 more → bucket=0  ✅ (all burst allowed!)
t=1s:  bucket=2 (refilled), request→bucket=1  ✅
t=1s:  1 more request → bucket=0  ✅
t=1s:  1 more request → bucket empty → REJECT  ❌
</code></pre>

<hr/>

<h3>漏桶算法 / Leaky Bucket</h3>

<p><strong>中文：</strong></p>
<p>令牌桶的&quot;镜像&quot;。请求进入队列，以固定速率处理（&quot;漏出&quot;）。保证输出速率绝对平滑，但不允许突发。适合需要精确控制输出速率的场景（如视频流）。</p>

<p><strong>English:</strong></p>
<p>The mirror of token bucket. Requests enter a queue, processed at fixed rate. Guarantees smooth output but no burst tolerance. Great for video streaming or payment processing where you need steady throughput.</p>
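<p>A queue-based sketch of the leaky bucket, under assumed parameters (a real one would live in a gateway or proxy, not a Python thread):</p>

<pre><code>
import queue
import threading
import time

class LeakyBucketQueue:
    """Leaky bucket, queue form (sketch): requests wait in a bounded queue and
    a worker drains them at a fixed rate, so output is perfectly smooth."""

    def __init__(self, rate=2.0, queue_size=100):
        self.interval = 1.0 / rate             # seconds between drained requests
        self.q = queue.Queue(maxsize=queue_size)
        threading.Thread(target=self._drain, daemon=True).start()

    def submit(self, task):
        """Enqueue a request; returns False (reject) when the queue is full."""
        try:
            self.q.put_nowait(task)
            return True
        except queue.Full:
            return False

    def _drain(self):
        while True:
            task = self.q.get()                # wait for a queued request
            task()                             # process it...
            time.sleep(self.interval)          # ...then hold the fixed output rate

bucket = LeakyBucketQueue(rate=50.0, queue_size=3)
accepted = [bucket.submit(lambda: None) for _ in range(5)]
# A burst of 5 against a size-3 queue: the tail of the burst is rejected
</code></pre>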

<hr/>

<h3>算法选择指南 / Algorithm Selection Guide</h3>

<table>
<tr><th>场景 / Scenario</th><th>推荐算法 / Algorithm</th></tr>
<tr><td>简单 API 限流 / Simple API rate limit</td><td>Sliding Window Counter</td></tr>
<tr><td>允许突发 / Burst traffic OK</td><td>Token Bucket</td></tr>
<tr><td>需要平滑输出 / Smooth output required</td><td>Leaky Bucket</td></tr>
<tr><td>精确计费 / Exact billing</td><td>Sliding Window Log</td></tr>
</table>

<hr/>

<h2>Part 2: Step-by-Step Implementation / 一步一步实现 (8 min)</h2>

<p><strong>中文：</strong></p>
<p>我们来实现两个版本：（1）单机内存版——理解算法核心；（2）分布式 Redis 版——生产可用。</p>

<p><strong>English:</strong></p>
<p>We&#x27;ll build two versions: (1) in-memory single-node to understand the algorithm; (2) distributed Redis version that&#x27;s production-ready.</p>

<hr/>

<h3>Version 1: In-Memory Token Bucket / 单机令牌桶</h3>

<pre><code>
import time
import threading
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class TokenBucket:
    capacity: float        # Max tokens (burst limit)
    rate: float            # Tokens added per second
    tokens: float = field(init=False)
    last_refill: float = field(init=False)
    lock: threading.Lock = field(default_factory=threading.Lock, init=False)

    def __post_init__(self):
        self.tokens = self.capacity  # Start full
        self.last_refill = time.monotonic()

    def _refill(self):
        &quot;&quot;&quot;Add tokens based on elapsed time since last refill.&quot;&quot;&quot;
        now = time.monotonic()
        elapsed = now - self.last_refill
        # Calculate how many tokens to add
        new_tokens = elapsed * self.rate
        # Cap at capacity (can&#x27;t overflow the bucket)
        self.tokens = min(self.capacity, self.tokens + new_tokens)
        self.last_refill = now

    def consume(self, tokens: float = 1.0) -&gt; bool:
        &quot;&quot;&quot;Try to consume `tokens`. Returns True if allowed.&quot;&quot;&quot;
        with self.lock:
            self._refill()
            if self.tokens &gt;= tokens:
                self.tokens -= tokens
                return True  # Allow request
            return False  # Reject: bucket empty


class RateLimiter:
    &quot;&quot;&quot;Per-user rate limiter using token bucket algorithm.&quot;&quot;&quot;
    
    def __init__(self, capacity: int = 10, rate: float = 2.0):
        self.capacity = capacity
        self.rate = rate
        self._buckets: Dict[str, TokenBucket] = {}
        self._lock = threading.Lock()

    def _get_bucket(self, user_id: str) -&gt; TokenBucket:
        &quot;&quot;&quot;Lazily create bucket for user.&quot;&quot;&quot;
        with self._lock:
            if user_id not in self._buckets:
                self._buckets[user_id] = TokenBucket(
                    capacity=self.capacity,
                    rate=self.rate
                )
            return self._buckets[user_id]

    def is_allowed(self, user_id: str) -&gt; bool:
        return self._get_bucket(user_id).consume()


# Usage
limiter = RateLimiter(capacity=5, rate=1.0)  # 5 burst, 1/sec sustained

for i in range(8):
    allowed = limiter.is_allowed(&quot;user_123&quot;)
    print(f&quot;Request {i+1}: {&#x27;✅ ALLOWED&#x27; if allowed else &#x27;❌ REJECTED&#x27;}&quot;)
    # Output: first 5 allowed (burst), next 3 rejected (bucket empty)
</code></pre>

<hr/>

<h3>Version 2: Distributed Sliding Window Counter with Redis / 分布式滑动窗口计数器</h3>

<p><strong>中文：</strong></p>
<p>单机版本有个致命问题：在多实例部署中，每个实例维护各自的计数器，用户可以通过轮询绕过限制。Redis 提供原子操作，是分布式限流的标准选择。</p>

<p><strong>English:</strong></p>
<p>Single-node limiters have a fatal flaw: in multi-instance deployments, each instance has its own counters. A user hitting 3 servers with 10 req/server sees 30 effective requests per window. Redis atomic operations solve this.</p>

<pre><code>
import time
import redis
from typing import Tuple

class RedisRateLimiter:
    &quot;&quot;&quot;
    Sliding Window Counter using Redis.
    
    Key insight: we store TWO counters per user — current window and previous.
    We use the weighted formula to approximate a true sliding window.
    This uses O(1) memory per user, vs O(n) for log-based approach.
    &quot;&quot;&quot;
    
    def __init__(
        self,
        redis_client: redis.Redis,
        limit: int = 100,
        window_seconds: int = 60,
        key_prefix: str = &quot;rl&quot;
    ):
        self.redis = redis_client
        self.limit = limit
        self.window = window_seconds
        self.prefix = key_prefix

    def _get_keys(self, user_id: str) -&gt; Tuple[str, str]:
        &quot;&quot;&quot;Get Redis keys for current and previous windows.&quot;&quot;&quot;
        # Integer division gives us the current window bucket
        current_window = int(time.time()) // self.window
        prev_window = current_window - 1
        
        curr_key = f&quot;{self.prefix}:{user_id}:{current_window}&quot;
        prev_key = f&quot;{self.prefix}:{user_id}:{prev_window}&quot;
        return curr_key, prev_key

    def is_allowed(self, user_id: str) -&gt; Tuple[bool, int]:
        &quot;&quot;&quot;
        Check if request is allowed.
        Returns (allowed: bool, remaining: int)
        &quot;&quot;&quot;
        curr_key, prev_key = self._get_keys(user_id)
        
        # Lua script for atomic read-increment-check
        # Redis executes Lua atomically — no race conditions possible
        lua_script = &quot;&quot;&quot;
        local curr_key = KEYS[1]
        local prev_key = KEYS[2]
        local limit = tonumber(ARGV[1])
        local window = tonumber(ARGV[2])
        local now = tonumber(ARGV[3])
        
        -- Get current counts (default 0)
        local curr_count = tonumber(redis.call(&#x27;GET&#x27;, curr_key) or 0)
        local prev_count = tonumber(redis.call(&#x27;GET&#x27;, prev_key) or 0)
        
        -- Calculate what fraction of current window has elapsed
        local elapsed_in_window = now % window
        local elapsed_ratio = elapsed_in_window / window
        
        -- Weighted estimate: prev window contributes less as current window progresses
        local estimated = prev_count * (1 - elapsed_ratio) + curr_count
        
        if estimated &gt;= limit then
            return {0, 0}  -- Reject: over limit
        end
        
        -- Increment current window counter, set TTL to 2x window
        redis.call(&#x27;INCR&#x27;, curr_key)
        redis.call(&#x27;EXPIRE&#x27;, curr_key, window * 2)
        
        local remaining = math.floor(limit - estimated - 1)
        return {1, remaining}
        &quot;&quot;&quot;
        
        script = self.redis.register_script(lua_script)
        now = time.time()
        
        result = script(
            keys=[curr_key, prev_key],
            args=[self.limit, self.window, now]
        )
        
        allowed = bool(result[0])
        remaining = int(result[1])
        return allowed, remaining

    def get_headers(self, user_id: str) -&gt; dict:
        &quot;&quot;&quot;Return X-RateLimit-* headers (a de facto convention, not a formal HTTP standard).&quot;&quot;&quot;
        curr_key, prev_key = self._get_keys(user_id)
        curr_count = int(self.redis.get(curr_key) or 0)
        
        return {
            &quot;X-RateLimit-Limit&quot;: str(self.limit),
            &quot;X-RateLimit-Remaining&quot;: str(max(0, self.limit - curr_count)),
            &quot;X-RateLimit-Reset&quot;: str(
                (int(time.time()) // self.window + 1) * self.window
            ),
        }


# FastAPI middleware example
from fastapi import FastAPI, Request, HTTPException
from fastapi.responses import JSONResponse

app = FastAPI()
redis_client = redis.Redis(host=&quot;localhost&quot;, port=6379, db=0)
limiter = RedisRateLimiter(redis_client, limit=100, window_seconds=60)

@app.middleware(&quot;http&quot;)
async def rate_limit_middleware(request: Request, call_next):
    # Extract user identifier (API key, JWT sub, or IP as fallback)
    user_id = request.headers.get(&quot;X-API-Key&quot;) or request.client.host
    
    allowed, remaining = limiter.is_allowed(user_id)
    headers = limiter.get_headers(user_id)
    
    if not allowed:
        retry_after = max(0, int(headers[&quot;X-RateLimit-Reset&quot;]) - int(time.time()))
        return JSONResponse(
            status_code=429,
            content={
                &quot;error&quot;: &quot;rate_limit_exceeded&quot;,
                &quot;message&quot;: &quot;Too many requests. Please slow down.&quot;,
                &quot;retry_after&quot;: retry_after
            },
            # Retry-After expects delta-seconds (or an HTTP-date), not a raw timestamp
            headers={**headers, &quot;Retry-After&quot;: str(retry_after)}
        )
    
    response = await call_next(request)
    # Inject rate limit headers into every response (good practice)
    for key, value in headers.items():
        response.headers[key] = value
    
    return response
</code></pre>

<hr/>

<h3>Tiered Rate Limiting / 分级限流</h3>

<p><strong>中文：</strong></p>
<p>真实系统中通常有多层限流：不同 API 端点有不同限制，不同用户等级有不同配额。</p>

<p><strong>English:</strong></p>
<p>Production systems use tiered limits — different endpoints, different user tiers, different time windows:</p>

<pre><code>
# Tiered config — typically loaded from DB or feature flags
RATE_LIMIT_CONFIG = {
    &quot;free&quot;: {
        &quot;default&quot;: {&quot;limit&quot;: 60, &quot;window&quot;: 60},         # 60 req/min
        &quot;/api/search&quot;: {&quot;limit&quot;: 10, &quot;window&quot;: 60},     # Search is expensive
        &quot;/api/export&quot;: {&quot;limit&quot;: 2, &quot;window&quot;: 3600},    # 2 exports/hour
    },
    &quot;pro&quot;: {
        &quot;default&quot;: {&quot;limit&quot;: 1000, &quot;window&quot;: 60},
        &quot;/api/search&quot;: {&quot;limit&quot;: 100, &quot;window&quot;: 60},
        &quot;/api/export&quot;: {&quot;limit&quot;: 50, &quot;window&quot;: 3600},
    },
    &quot;enterprise&quot;: {
        &quot;default&quot;: {&quot;limit&quot;: 10000, &quot;window&quot;: 60},     # Effectively unlimited
    }
}

def get_limit_config(user_tier: str, endpoint: str) -&gt; dict:
    tier_config = RATE_LIMIT_CONFIG.get(user_tier, RATE_LIMIT_CONFIG[&quot;free&quot;])
    # Fall back to default if endpoint not specifically configured
    return tier_config.get(endpoint, tier_config[&quot;default&quot;])
</code></pre>

<hr/>

<h2>Part 3: Edge Cases &amp; Gotchas / 边界情况 (2 min)</h2>

<p><strong>中文：</strong></p>

<p><strong>1. 竞态条件（Race Condition）</strong></p>
<p>不用 Redis Lua 脚本而用 GET → 业务逻辑 → SET 会导致竞态。两个请求同时读到 count=99，都认为可以递增到 100，结果实际到达 101。<strong>永远使用原子操作：INCR 或 Lua 脚本。</strong></p>

<p><strong>2. Redis 故障时怎么办？</strong></p>
<p>两种策略：</p>
<p>- <strong>Fail Open（故障放行）</strong>：Redis 挂了就放行所有请求，服务可用性优先</p>
<p>- <strong>Fail Closed（故障拒绝）</strong>：Redis 挂了就拒绝所有请求，安全性优先</p>
<p>大多数 API 服务选择 Fail Open。</p>

<p><strong>3. 分布式时钟漂移</strong></p>
<p>多台服务器的系统时钟可能不同步（差几百毫秒），导致窗口边界不一致。解决方案：使用 Redis 的 <code>TIME</code> 命令获取统一时间源。</p>

<p><strong>4. 代理和 CDN 后的真实 IP</strong></p>
<p><code>request.client.host</code> 会返回负载均衡器 IP，不是用户真实 IP。需要解析 <code>X-Forwarded-For</code> 或 <code>CF-Connecting-IP</code>（Cloudflare），但要注意这些 header 可伪造。</p>

<p><strong>5. 重试风暴（Retry Storm）</strong></p>
<p>客户端收到 429 后立即重试，会加剧问题。标准做法：返回 <code>Retry-After</code> header，客户端实现指数退避 + 抖动（Jitter）。</p>

<p><strong>English:</strong></p>

<p>1. <strong>Race condition</strong>: Never GET → check → SET. Always use <code>INCR</code> or Lua for atomicity.</p>
<p>2. <strong>Redis failure</strong>: Choose Fail Open (availability) or Fail Closed (strict safety). Most APIs choose Fail Open.</p>
<p>3. <strong>Clock skew</strong>: Server clocks drift. Use Redis <code>TIME</code> for a single source of truth.</p>
<p>4. <strong>IP behind proxy</strong>: <code>X-Forwarded-For</code> is user-controlled and can be spoofed. Validate carefully; trust only the last hop your own infrastructure added.</p>
<p>5. <strong>Retry storms</strong>: Always return <code>Retry-After</code>. Clients should use exponential backoff with jitter: <code>delay = base * 2^attempt + random(0, base)</code>.</p>
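<p>The backoff formula from point 5, as a tiny helper (the base and cap values here are arbitrary):</p>

<pre><code>
import random

def backoff_delay(attempt, base=0.5, cap=30.0):
    """Exponential backoff with jitter: base * 2^attempt + random(0, base), capped."""
    return min(cap, base * (2 ** attempt) + random.uniform(0, base))

# A client sleeps backoff_delay(n) before retry n after each 429
schedule = [round(backoff_delay(a), 2) for a in range(5)]
# Roughly doubling: ~0.5-1s, ~1-1.5s, ~2-2.5s, ~4-4.5s, ~8-8.5s
</code></pre>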

<hr/>

<h2>Part 4: Real-World Application / 实际应用 (2 min)</h2>

<p><strong>中文：</strong></p>

<p>- <strong>GitHub API</strong>：未认证请求 60次/小时，认证请求 5000次/小时，搜索 API 10次/分钟（独立限流！）</p>
<p>- <strong>Stripe</strong>：100次/秒，但 Webhook 重试使用指数退避而非固定速率</p>
<p>- <strong>OpenAI API</strong>：按 Token 计算（TPM: Tokens Per Minute），不只是按请求次数——这是令牌桶算法，只是&quot;令牌&quot;的单位是 LLM token 而非请求数</p>
<p>- <strong>Cloudflare</strong>：在边缘节点（CDN PoP）进行限流，在请求到达源站之前就拦截——这叫&quot;边缘限流&quot;，延迟极低但实现复杂</p>
<p>- <strong>Netflix</strong>：每个 API 路由有独立的限流配置，Hystrix/Resilience4j 提供熔断器（Circuit Breaker）作为限流的补充</p>

<p><strong>English:</strong></p>

<p>- <strong>GitHub API</strong>: 60 unauthenticated / 5000 authenticated reqs/hour. Search is separately limited at 10/min — same backend, different resource cost profile.</p>
<p>- <strong>Stripe</strong>: 100 reqs/sec baseline, but billing webhooks use exponential backoff rather than fixed rate limiting to handle payment processing spikes.</p>
<p>- <strong>OpenAI</strong>: Limits on <em>tokens</em> per minute (TPM), not just requests — effectively a token bucket where the &quot;token&quot; unit is an LLM token. They also layer <em>requests</em> per minute (RPM) on top.</p>
<p>- <strong>Cloudflare Workers</strong>: Rate limiting at the edge (CDN PoP level), before requests even reach your origin. Reduces attack surface at sub-millisecond latency with zero origin load.</p>
<p>- <strong>DoorDash</strong>: Uses a combination of per-user limits AND global service limits with adaptive throttling — if downstream services degrade, they automatically tighten rate limits upstream.</p>

<hr/>

<h2>Part 5: Interview Simulation / 面试模拟 (3 min)</h2>

<p><strong>中文：</strong>面试中，限流是系统设计面试的常见子问题。以下是面试官最常问的 5 个追问。</p>

<p><strong>English:</strong> Rate limiting appears in nearly every system design interview, either as the main topic or as a component. Here are the 5 most common follow-ups:</p>

<hr/>

<p><strong>Q1: 如果我们有 100 个 API 服务器，怎么保证全局限流准确？</strong></p>
<p><strong>If we have 100 API servers, how do we enforce a global rate limit accurately?</strong></p>

<p>&gt; <strong>A:</strong> 中心化存储（Redis 集群）是标准方案。每台 API 服务器都读写同一个 Redis，原子操作保证一致性。但如果 Redis 延迟高，可以考虑&quot;本地令牌 + 同步&quot;方案：每台服务器缓存一小批令牌（比如 10%），定期从 Redis 补充，减少网络往返。这叫 Token Bucket with Local Cache，牺牲少量精度换取性能。</p>

<p><strong>Q2: 怎么防止用户伪造 IP 绕过限流？</strong></p>
<p><strong>How do you prevent IP spoofing to bypass rate limits?</strong></p>

<p>&gt; <strong>A:</strong> IP-based limiting should be a last resort. Prefer authenticated identifiers (API keys, JWT user IDs). For unauthenticated endpoints, trust only the last hop of <code>X-Forwarded-For</code> that your own infrastructure controls (e.g., the IP your load balancer added). Never trust client-supplied IPs blindly. For high-security scenarios, layer fingerprinting (User-Agent + TLS fingerprint) on top.</p>
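<p>The &quot;trust only your own hops&quot; rule from this answer, sketched (the proxy and client IPs are hypothetical):</p>

<pre><code>
def client_ip(xff_header, remote_addr, trusted_proxies):
    """Resolve the real client IP from X-Forwarded-For (sketch).
    Only believe hops appended by our own infrastructure (trusted_proxies)."""
    if remote_addr not in trusted_proxies or not xff_header:
        return remote_addr   # direct connection, or an untrusted peer: ignore XFF
    # Each proxy appends the peer it accepted the connection from, so walk
    # right to left and stop at the first address we did not add ourselves.
    hops = [h.strip() for h in xff_header.split(",")]
    for hop in reversed(hops):
        if hop not in trusted_proxies:
            return hop
    return remote_addr

# Client spoofs "1.2.3.4"; our load balancer (10.0.0.1) appends the real peer
ip = client_ip("1.2.3.4, 203.0.113.9", "10.0.0.1", trusted_proxies={"10.0.0.1"})
print(ip)   # 203.0.113.9: the spoofed entry is never reached
</code></pre>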

<p><strong>Q3: 限流和熔断器（Circuit Breaker）有什么区别？</strong></p>
<p><strong>What&#x27;s the difference between rate limiting and a circuit breaker?</strong></p>

<p>&gt; <strong>A:</strong> Rate limiting protects <em>your service</em> from <em>excessive client requests</em> (inbound). Circuit breakers protect <em>your service</em> from <em>failing downstream dependencies</em> (outbound). Rate limiter rejects requests at the door; circuit breaker stops you from calling a service that&#x27;s already down. They&#x27;re complementary: use both. Netflix Hystrix (and its successor Resilience4j) implements circuit breakers; Redis or API Gateway handles rate limiting.</p>

<p><strong>Q4: 如何对 &quot;代价不同&quot; 的 API 操作做限流？比如写操作比读操作贵得多。</strong></p>
<p><strong>How do you rate limit operations with different &quot;costs&quot;? Writes vs reads, for example.</strong></p>

<p>&gt; <strong>A:</strong> Use weighted token consumption. Instead of <code>consume(1)</code>, writes call <code>consume(5)</code>. This is exactly how OpenAI handles LLM tokens — a 4K-token completion costs 4000 units from the TPM bucket, while a 100-token prompt costs 100. Define a &quot;cost unit&quot; (e.g., 1 unit = 1 DB read), assign costs to each operation, and run one token bucket per user with a capacity in cost units rather than request counts.</p>
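<p>Weighted consumption is a one-line change to a token bucket: consume the operation&#x27;s cost instead of 1. A self-contained sketch with a made-up cost table:</p>

<pre><code>
import time

class WeightedBucket:
    """Token bucket where each operation consumes its cost in units (sketch)."""

    COSTS = {"read": 1, "write": 5, "export": 25}   # hypothetical cost table

    def __init__(self, capacity=100, rate=10.0):
        self.capacity, self.rate = capacity, rate
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def consume(self, op):
        now = time.monotonic()
        # Refill, then charge the operation's full cost
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        cost = self.COSTS[op]
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

b = WeightedBucket(capacity=20, rate=0.0)   # rate=0 isolates the cost accounting
results = [b.consume(op) for op in ["write", "write", "write", "write", "read"]]
# Four 5-unit writes drain the 20-unit bucket; the 1-unit read is then rejected
</code></pre>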

<p><strong>Q5: 用户触发限流了，怎么给他们好的体验？</strong></p>
<p><strong>What&#x27;s the user experience when a rate limit is hit?</strong></p>

<p>&gt; <strong>A:</strong> Return HTTP 429 with three key pieces of info: (1) a <code>Retry-After</code> header with the seconds until the window resets; (2) <code>X-RateLimit-Limit</code>, <code>X-RateLimit-Remaining</code>, <code>X-RateLimit-Reset</code> on <em>every</em> response so clients can self-throttle before hitting the limit; (3) a clear error body with a link to your rate limit docs. Client libraries like the official Stripe SDK and OpenAI Python client parse these headers and auto-retry with backoff.</p>

<hr/>

<h2>🔗 References / 参考资料</h2>

<p>- 📄 <strong>System Design Primer — Rate Limiting</strong>: https://github.com/donnemartin/system-design-primer</p>
<p>- 📄 <strong>Stripe Engineering Blog — Rate Limiters</strong>: https://stripe.com/blog/rate-limiters</p>
<p>- 📄 <strong>Cloudflare — How We Built Rate Limiting</strong>: https://blog.cloudflare.com/counting-things-a-lot-of-different-things/</p>
<p>- 📄 <strong>Redis Official Docs — INCR pattern</strong>: https://redis.io/docs/latest/commands/incr/</p>
<p>- 📄 <strong>RFC 6585 — HTTP 429 Too Many Requests</strong>: https://tools.ietf.org/html/rfc6585</p>
<p>- 📹 <strong>NeetCode — Sliding Window Intro</strong> (relevant for SW counter): https://neetcode.io/courses/advanced-algorithms/1</p>

<hr/>

<p><em>Generated for byte-by-byte Saturday Deep Dive · 2026-04-04</em></p>
]]></description>
    </item>
    <item>
      <title>byte-by-byte — 2026-04-03</title>
      <link>https://github.com/YushengAuggie/byte-by-byte/tree/main/archive</link>
      <guid isPermaLink="false">https://github.com/YushengAuggie/byte-by-byte/archive/2026-04-03</guid>
      <pubDate>Fri, 03 Apr 2026 12:00:00 +0000</pubDate>
      <description><![CDATA[<h1>🏗️ System Design</h1>
<h2>🏗️ 系统设计 Day 15 / System Design Day 15</h2>

<h2>Rate Limiting &amp; Throttling — 保护你的 API / Protecting Your API</h2>

<hr/>

<h3>🌏 真实场景 / Real-World Scenario</h3>

<p>想象你在 Twitter（X）做工程师。某个下午，一个爬虫脚本突然对你的搜索 API 发起每秒 10,000 次请求，导致数据库崩溃，所有用户无法刷推。</p>

<p>你需要一个<strong>限流系统</strong>——在不影响正常用户的前提下，拒绝滥用流量。</p>

<p>Imagine you&#x27;re an engineer at Twitter (X). One afternoon, a scraper script hammers your search API at 10,000 requests/second, crashing the database for all users. You need a <strong>rate limiting system</strong> — one that blocks abuse without affecting normal users.</p>

<hr/>

<h3>📐 架构图 / Architecture Diagram</h3>

<pre><code>
                          ┌─────────────────────┐
   Client Request ───────►│   API Gateway /      │
                          │   Rate Limiter       │
                          │                      │
                          │  1. Identify client  │
                          │     (IP / User ID /  │
                          │      API Key)        │
                          │  2. Check counter    │
                          │     in Redis         │
                          │  3. Allow or Block   │
                          └──────────┬───────────┘
                                     │ Allowed
                          ┌──────────▼───────────┐
                          │   Backend Service     │
                          └──────────────────────┘

Redis Counter Example (Fixed Window, hourly):
  key: &quot;ratelimit:user123:2026-04-03T08&quot;
  value: 47  (requests this hour)
  TTL: 3600s
</code></pre>

<hr/>

<h3>⚙️ 主要算法 / Key Algorithms</h3>

<table>
<tr><th>算法</th><th>原理</th><th>优点</th><th>缺点</th></tr>
<tr><td><strong>Token Bucket</strong></td><td>令牌以固定速率补充，请求消耗令牌</td><td>允许突发流量</td><td>实现稍复杂</td></tr>
<tr><td><strong>Leaky Bucket</strong></td><td>请求以固定速率流出（队列）</td><td>输出极平滑</td><td>不允许突发</td></tr>
<tr><td><strong>Fixed Window</strong></td><td>每个时间窗重置计数器</td><td>最简单</td><td>边界突破问题</td></tr>
<tr><td><strong>Sliding Window</strong></td><td>精确追踪过去 N 秒</td><td>最精确</td><td>内存占用高</td></tr>
</table>

<p><strong>推荐：Token Bucket（令牌桶）</strong> — 生产中最常用，兼顾突发和平均速率。</p>

<hr/>

<h3>🔑 关键权衡 / Key Tradeoffs</h3>

<p><strong>为什么用 Redis 而不是本地内存？/ Why Redis, not local memory?</strong></p>

<p>- 多台服务器共享同一计数器 → 分布式限流</p>
<p>- 原子操作（INCR + EXPIRE）避免竞态条件</p>
<p>- Redis 单线程模型保证计数器一致性</p>
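<p>原子 INCR + EXPIRE 是上面这套计数器的经典写法 / The classic atomic pattern behind the counter above, sketched (here <code>r</code> is any Redis-like client, e.g. <code>redis.Redis</code>):</p>

<pre><code>
import time

def allow_request(r, user_id, limit=100, window=3600):
    """Fixed-window check with Redis INCR + EXPIRE (sketch).
    INCR is atomic, so concurrent app servers cannot double-count."""
    key = f"ratelimit:{user_id}:{int(time.time()) // window}"
    count = r.incr(key)          # atomically increment and read the counter
    if count == 1:
        r.expire(key, window)    # start the TTL on the window's first request
    return limit >= count
</code></pre>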

<p><strong>限流粒度选择 / Granularity choices:</strong></p>
<p>- <strong>Per IP</strong> — 防爬虫，但误伤 NAT 用户（办公室）</p>
<p>- <strong>Per User ID</strong> — 最精确，需要认证</p>
<p>- <strong>Per API Key</strong> — B2B 场景首选</p>
<p>- <strong>Per Endpoint</strong> — 写接口比读接口更严格</p>

<hr/>

<h3>❌ 常见错误 / Common Mistakes</h3>

<p><strong>坑 1：Fixed Window 边界问题</strong></p>
<pre><code>
Window 1 (0:00-1:00): 99 requests ← allowed
Window 2 (1:00-2:00): 99 requests ← allowed
But 99 at 0:59 + 99 at 1:01 = 198 requests in 2 seconds! ← spike!
</code></pre>
<p>→ 用 Sliding Window Log 或 Sliding Window Counter 解决</p>

<p><strong>坑 2：忘记 HTTP 响应头</strong></p>
<pre><code>
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 47
X-RateLimit-Reset: 1743685200
Retry-After: 3600  ← 429 时必须包含
</code></pre>
<p>客户端需要知道何时重试！</p>

<p><strong>坑 3：硬拒绝 vs 降级</strong></p>
<p>不要直接返回 <code>503</code>，试试 queue、degrade（返回缓存数据）或 soft-limit（超出后降速）。</p>

<hr/>

<h3>📚 References</h3>

<p>- [System Design Interview — Rate Limiting (ByteByteGo)](https://blog.bytebytego.com/p/rate-limiting-fundamentals)</p>
<p>- [Cloudflare Rate Limiting Docs](https://developers.cloudflare.com/waf/rate-limiting-rules/)</p>
<p>- [Redis INCR for Rate Limiting](https://redis.io/docs/manual/patterns/rate-limiting/)</p>

<hr/>

<h3>🧒 ELI5</h3>

<p>就像游乐场的入口检票员：每个小时只放 100 个人进去。人满了就让你在外面等，而不是把游乐场挤爆。</p>

<p>It&#x27;s like a bouncer at a club: &quot;100 people per hour max.&quot; When it&#x27;s full, you wait outside — the club doesn&#x27;t collapse.</p>

<hr/>
<h1>💻 Algorithms</h1>
<h2>💻 算法 Day 16 / Algorithms Day 16</h2>

<h2>🧩 滑动窗口模式 (2/6) — Building on the template from Day 15</h2>

<p>&gt; #3 Longest Substring Without Repeating Characters 🟡 Medium</p>

<p>🔗 [LeetCode #3](https://leetcode.com/problems/longest-substring-without-repeating-characters/)</p>
<p>📹 [NeetCode Video](https://neetcode.io/problems/longest-substring-without-repeating-characters)</p>

<hr/>

<p><strong>今天的问题是模式的第 2 题（共 6 题）</strong>，我们继续用滑动窗口模板。相比昨天的 #121（股票买卖，只追踪一个 min 值），今天窗口需要追踪「一个集合」——这是滑动窗口真正的威力所在。</p>

<p>Today is problem <strong>2/6</strong> in the Sliding Window block. Unlike #121 (tracking a single min value), today&#x27;s window tracks a <strong>set of characters</strong> — this is where the pattern gets powerful.</p>

<hr/>

<h3>🗺️ 模板回顾 / Template Recap</h3>

<pre><code>
left = 0
for right in range(len(arr)):
    window.add(arr[right])      # expand right
    while CONDITION_VIOLATED:
        window.remove(arr[left])  # shrink left
        left += 1
    result = max(result, right - left + 1)
</code></pre>

<p><strong>核心洞察 / Key Insight:</strong> 右指针扩张探索，左指针收缩维护约束。每个元素最多进出窗口各一次 → O(n)。</p>

<hr/>

<h3>🌍 真实类比 / Real-World Analogy</h3>

<p>想象你在刷 Spotify 播放历史，想找「连续播放、没有重复歌曲」的最长片段。当出现重复时，你把窗口起点往右移，直到重复歌曲被移出窗口为止。</p>

<p>Imagine scanning your Spotify history for the longest streak where no song repeats. The moment a duplicate appears, you slide your start forward until the duplicate is gone.</p>

<hr/>

<h3>📝 问题 / Problem</h3>

<p>给定字符串 <code>s</code>，找出不含重复字符的最长子串的长度。</p>

<p>Given string <code>s</code>, find the length of the longest substring without repeating characters.</p>

<pre><code>
Input:  s = &quot;abcabcbb&quot;
Output: 3  (&quot;abc&quot;)

Input:  s = &quot;pwwkew&quot;
Output: 3  (&quot;wke&quot;)
</code></pre>

<hr/>

<h3>🗂️ 映射到模板 / Mapping to Template</h3>

<p>| 模板元素 | 本题对应 |</p>
<p>|----------|----------|</p>
<p>| <code>window</code> | <code>set()</code> — 当前窗口中的字符 |</p>
<p>| <code>CONDITION_VIOLATED</code> | <code>arr[right] in window</code>（出现重复） |</p>
<p>| 扩张 | 把 <code>s[right]</code> 加入 set |</p>
<p>| 收缩 | 把 <code>s[left]</code> 从 set 移除，<code>left += 1</code> |</p>
<p>| <code>result</code> | <code>max(result, right - left + 1)</code> |</p>

<hr/>

<h3>🐍 Python 解法 + 追踪 / Python Solution + Trace</h3>

<pre><code>
def lengthOfLongestSubstring(s: str) -&gt; int:
    window = set()       # characters currently in window
    left = 0
    result = 0

    for right in range(len(s)):
        # Shrink window until no duplicate
        while s[right] in window:
            window.remove(s[left])
            left += 1

        # Expand: add new character
        window.add(s[right])
        result = max(result, right - left + 1)

    return result
</code></pre>

<p><strong>追踪 &quot;abcabcbb&quot; / Trace:</strong></p>
<pre><code>
right=0: window={&#x27;a&#x27;}, len=1
right=1: window={&#x27;a&#x27;,&#x27;b&#x27;}, len=2
right=2: window={&#x27;a&#x27;,&#x27;b&#x27;,&#x27;c&#x27;}, len=3  ← result=3
right=3: &#x27;a&#x27; dup! → remove &#x27;a&#x27;, left=1 → window={&#x27;b&#x27;,&#x27;c&#x27;,&#x27;a&#x27;}, len=3
right=4: &#x27;b&#x27; dup! → remove &#x27;b&#x27;, left=2 → window={&#x27;c&#x27;,&#x27;a&#x27;,&#x27;b&#x27;}, len=3
right=5: &#x27;c&#x27; dup! → remove &#x27;c&#x27;, left=3 → window={&#x27;a&#x27;,&#x27;b&#x27;,&#x27;c&#x27;}, len=3
right=6: &#x27;b&#x27; dup! → remove &#x27;a&#x27;,&#x27;b&#x27;, left=5 → window={&#x27;c&#x27;,&#x27;b&#x27;}, len=2
right=7: &#x27;b&#x27; dup! → remove &#x27;c&#x27;,&#x27;b&#x27;, left=7 → window={&#x27;b&#x27;}, len=1
Final: 3 ✅
</code></pre>

<p><strong>复杂度 / Complexity:</strong></p>
<p>- Time: O(n) — each char enters/exits window at most once</p>
<p>- Space: O(min(m, n)) where m = charset size (26 for lowercase)</p>

<p><strong>优化版 / Optimized (HashMap for O(1) jump):</strong></p>
<pre><code>
def lengthOfLongestSubstring(s: str) -&gt; int:
    char_index = {}  # char → last seen index
    left = 0
    result = 0

    for right, ch in enumerate(s):
        # Jump left past the duplicate directly
        if ch in char_index and char_index[ch] &gt;= left:
            left = char_index[ch] + 1
        char_index[ch] = right
        result = max(result, right - left + 1)

    return result
</code></pre>
<p>→ 避免 while 循环，直接跳跃 left，同样 O(n) 但常数更小。</p>

<hr/>

<h3>🔄 举一反三 / This Pattern Block</h3>

<p>| 题目 | 窗口内容 | 约束条件 |</p>
<p>|------|----------|----------|</p>
<p>| #121 (Day 15) | 单个 min 值 | 无，只追踪最小值 |</p>
<p>| <strong>#3 (今天)</strong> | <strong>字符集合</strong> | <strong>无重复</strong> |</p>
<p>| #424 (下期) | 字符频率 map | 替换次数 ≤ k |</p>
<p>| #567 | 字符频率 map | 频率完全匹配 |</p>
<p>| #76 (Hard) | 字符频率 map | 包含所有目标字符 |</p>

<hr/>

<h3>🧒 ELI5</h3>

<p>用两根手指夹住一段字符串，右手不断向右扩张。一旦右手摸到一个和窗口内重复的字母，左手就往右收缩，直到没有重复为止。全程记录最宽的窗口宽度。</p>

<p>Use two fingers on a string. Right finger keeps expanding right. The moment it hits a duplicate, the left finger moves right to shrink the window until no duplicates remain. Track the max width you ever saw.</p>

<hr/>

<h3>📚 References</h3>

<p>- [LeetCode #3 — Longest Substring Without Repeating Characters](https://leetcode.com/problems/longest-substring-without-repeating-characters/)</p>
<p>- [NeetCode Explanation](https://neetcode.io/problems/longest-substring-without-repeating-characters)</p>
<p>- [Sliding Window Pattern — GeeksForGeeks](https://www.geeksforgeeks.org/window-sliding-technique/)</p>

<hr/>
<h1>🗣️ Soft Skills</h1>
<h2>🗣️ 软技能 Day 15 / Soft Skills Day 15</h2>

<h2>适应性：快速学习新技术 / Adaptability: Learning New Tech Fast</h2>

<p><strong>面试题 / Question:</strong></p>
<p>&quot;Describe a time you had to learn a new technology quickly to solve a problem.&quot;</p>

<hr/>

<h3>🎯 为什么面试官问这个 / Why Interviewers Ask This</h3>

<p>技术栈变化极快。面试官想知道：</p>
<p>1. 你面对陌生技术时会恐慌还是系统地应对？</p>
<p>2. 你的学习方法是否高效？</p>
<p>3. 你能否在压力下交付？</p>

<p>Tech stacks evolve fast. Interviewers want to know: Do you panic or adapt? Do you have a <strong>system</strong> for learning under pressure? Can you still deliver?</p>

<hr/>

<h3>⭐ STAR 框架 / STAR Breakdown</h3>

<p><strong>Situation（情境）:</strong></p>
<p>描述具体的业务压力 — 不要说&quot;我需要学习新技术&quot;，要说&quot;我们有 2 周的 deadline，而现有的技术栈无法满足需求&quot;。</p>

<p><strong>Task（任务）:</strong></p>
<p>你的具体职责是什么？是主导学习还是支援？</p>

<p><strong>Action（行动）— 这是重点！</strong></p>
<p>面试官最想听的是你的<strong>学习策略</strong>：</p>
<p>- 你怎么快速搭建 mental model？（官方文档 → 官方示例 → 一个真实小项目）</p>
<p>- 你怎么判断&quot;够用了&quot;？（能解决当前问题即可，不追求精通）</p>
<p>- 你遇到障碍时怎么求助？（Stack Overflow → 同事 → 官方 issue）</p>

<p><strong>Result（结果）:</strong></p>
<p>量化交付：时间、质量、影响。</p>

<hr/>

<h3>❌ Bad Answer vs ✅ Good Answer</h3>

<p><strong>❌ 差劲的回答:</strong></p>
<p>&quot;我们需要用 Kubernetes，我就去学了 K8s，然后把服务迁移过去了，很顺利。&quot;</p>

<p>问题：没有细节，没有困难，没有学习过程——听起来是在背稿。</p>

<hr/>

<p><strong>✅ 优秀的回答 (示例):</strong></p>
<p>&gt; &quot;We were 3 weeks from launching a real-time feature, and our backend team decided mid-project to use WebSockets via Socket.io — something I&#x27;d never touched. I had 4 days before my frontend piece needed to integrate.</p>
<p>&gt;</p>
<p>&gt; I started with the official docs to get a mental model (30 min), then built a tiny chat demo locally to feel the API (2 hours). I identified the 3 patterns I&#x27;d actually need: <code>emit</code>, <code>on</code>, and room-based broadcasting. I skipped everything else.</p>
<p>&gt;</p>
<p>&gt; Day 2, I hit a race condition where events fired before the socket connected. I found the root cause via the Socket.io FAQ, added a connection guard, and documented it for the team.</p>
<p>&gt;</p>
<p>&gt; We shipped on time. The feature had zero WebSocket-related bugs in production. I also wrote an internal doc that helped 2 other engineers onboard faster.&quot;</p>

<hr/>

<h3>💡 Senior/Staff 级加分点 / Senior/Staff Tips</h3>

<p>1. <strong>展示元认知 / Show metacognition</strong> — 不只是&quot;我学了 X&quot;，而是&quot;我用了 Y 策略学 X，因为 Z&quot;</p>
<p>2. <strong>说明你如何判断边界 / Scope your learning</strong> — &quot;我在 40% 的时间里掌握了 80% 的需求场景——这是故意的选择&quot;</p>
<p>3. <strong>团队放大效应 / Multiply impact</strong> — &quot;我写了文档/做了分享，减少了团队学习成本&quot;</p>
<p>4. <strong>展示迁移能力 / Show transfer</strong> — &quot;这次学 Redis Streams 的经验，让我 3 个月后学 Kafka 快了 3 倍&quot;</p>

<hr/>

<h3>🔑 关键要点 / Key Takeaways</h3>

<p>- 学习要有<strong>系统</strong>：mental model → 最小可用知识 → 实战 → 总结</p>
<p>- 面试时强调<strong>trade-off</strong>：你选择了快速上手而不是全面掌握</p>
<p>- <strong>量化</strong>：时间节省、错误减少、团队效率提升</p>

<hr/>

<h3>📚 References</h3>

<p>- [STAR Method Explained — The Muse](https://www.themuse.com/advice/star-interview-method)</p>
<p>- [How to Learn Anything Fast (Josh Kaufman)](https://joshkaufman.net/the-first-20-hours/)</p>
<p>- [Meta Engineering: Learning Culture](https://engineering.fb.com/culture/)</p>

<hr/>

<h3>🧒 ELI5</h3>

<p>面试官在问：&quot;当你碰到从没见过的东西，你会怎么办？&quot; 正确答案不是&quot;我啥都会&quot;，而是&quot;我有一套靠谱的方法，让我快速从零变成够用。&quot;</p>

<p>The interviewer asks: &quot;What happens when you hit something you&#x27;ve never seen?&quot; The right answer isn&#x27;t &quot;I know everything.&quot; It&#x27;s &quot;I have a reliable method to go from zero to useful, fast.&quot;</p>

<hr/>
<h1>🎨 Frontend</h1>
<h2>🎨 前端 Day 15 / Frontend Day 15</h2>

<h2>React Composition — Children, Render Props, HOCs</h2>
<h2>React 组合模式 — 子组件、渲染属性、高阶组件</h2>

<hr/>

<h3>🌏 真实场景 / Real Scenario</h3>

<p>你在做一个 Dashboard，需要一个 <code>Card</code> 组件——有时里面是图表，有时是表格，有时是表单。你不想为每种情况写 <code>ChartCard</code>、<code>TableCard</code>、<code>FormCard</code>……</p>

<p>You&#x27;re building a Dashboard. You need a <code>Card</code> component — sometimes it holds a chart, sometimes a table, sometimes a form. You don&#x27;t want to write <code>ChartCard</code>, <code>TableCard</code>, <code>FormCard</code> separately.</p>

<p><strong>React Composition（组合）</strong> 是解决方案。</p>

<hr/>

<h3>📦 Pattern 1: Children Props（最常用）</h3>

<pre><code>
// Generic Card shell — accepts anything as children
interface CardProps {
  title: string;
  children: React.ReactNode; // &lt;-- the magic
}

function Card({ title, children }: CardProps) {
  return (
    &lt;div className=&quot;card&quot;&gt;
      &lt;h2&gt;{title}&lt;/h2&gt;
      &lt;div className=&quot;card-body&quot;&gt;{children}&lt;/div&gt;
    &lt;/div&gt;
  );
}

// Usage: inject any content
function Dashboard() {
  return (
    &lt;Card title=&quot;用户增长 / User Growth&quot;&gt;
      &lt;LineChart data={chartData} /&gt;
    &lt;/Card&gt;
  );
}
</code></pre>

<p><strong>何时用 / When to use:</strong> 容器/布局组件，内容不确定。</p>

<hr/>

<h3>🎯 Pattern 2: Render Props（渲染属性）</h3>

<p>当父组件需要<strong>向子内容传递数据</strong>时：</p>

<p>When the parent needs to <strong>pass data into the child content</strong>:</p>

<pre><code>
interface DataFetcherProps&lt;T&gt; {
  url: string;
  render: (data: T | null, loading: boolean) =&gt; React.ReactNode;
}

function DataFetcher&lt;T&gt;({ url, render }: DataFetcherProps&lt;T&gt;) {
  const [data, setData] = useState&lt;T | null&gt;(null);
  const [loading, setLoading] = useState(true);

  useEffect(() =&gt; {
    fetch(url).then(r =&gt; r.json()).then(d =&gt; {
      setData(d);
      setLoading(false);
    });
  }, [url]);

  return &lt;&gt;{render(data, loading)}&lt;/&gt;;
}

// Usage
&lt;DataFetcher&lt;User[]&gt;
  url=&quot;/api/users&quot;
  render={(users, loading) =&gt;
    loading ? &lt;Spinner /&gt; : &lt;UserTable data={users!} /&gt;
  }
/&gt;
</code></pre>

<p>&gt; 💡 现代 React 里，<strong>Custom Hooks 通常比 Render Props 更清晰</strong>。Render Props 更多见于老代码或需要 JSX 层控制的场景。</p>

<hr/>

<h3>🔧 Pattern 3: Higher-Order Components (HOC)</h3>

<pre><code>
// withAuth: wraps any component to require authentication
function withAuth&lt;P extends object&gt;(
  WrappedComponent: React.ComponentType&lt;P&gt;
) {
  return function AuthenticatedComponent(props: P) {
    const { isLoggedIn } = useAuth();

    if (!isLoggedIn) {
      return &lt;Navigate to=&quot;/login&quot; /&gt;;
    }

    return &lt;WrappedComponent {...props} /&gt;;
  };
}

// Usage
const ProtectedDashboard = withAuth(Dashboard);
</code></pre>

<hr/>

<h3>🎮 猜猜输出 / Quiz</h3>

<pre><code>
function Wrapper({ children }: { children: React.ReactNode }) {
  console.log(&quot;Wrapper rendered&quot;);
  return &lt;div&gt;{children}&lt;/div&gt;;
}

function App() {
  const [count, setCount] = useState(0);
  return (
    &lt;Wrapper&gt;
      &lt;button onClick={() =&gt; setCount(c =&gt; c + 1)}&gt;
        Count: {count}
      &lt;/button&gt;
    &lt;/Wrapper&gt;
  );
}
</code></pre>

<p>点击按钮时，&quot;Wrapper rendered&quot; 会打印吗？/ Does &quot;Wrapper rendered&quot; print on button click?</p>

<p><strong>A)</strong> 从不打印（<code>Wrapper</code> 没有自己的 state）</p>
<p><strong>B)</strong> 只在初始渲染打印</p>
<p><strong>C)</strong> 每次点击都打印（因为 <code>children</code> 是新 JSX 对象）</p>
<p><strong>D)</strong> 取决于 React 版本</p>

<p>&lt;details&gt;</p>
<p>&lt;summary&gt;显示答案 / Show Answer&lt;/summary&gt;</p>

<p><strong>答案：C</strong> — <code>Wrapper</code> 每次都会重新渲染。</p>

<p>虽然 <code>Wrapper</code> 本身没有 state，但每次 <code>App</code> 渲染时，<code>&lt;button&gt;Count: {count}&lt;/button&gt;</code> 是一个<strong>新的 JSX 元素</strong>（新的对象引用），所以 <code>Wrapper</code> 也会重新渲染。</p>

<p>要阻止这种行为，需要把 <code>Wrapper</code> 包裹在 <code>React.memo</code> 中 <strong>并且</strong> 确保 <code>children</code> 引用不变（实际上很难）。</p>

<p>In React, <code>children</code> is a new JSX object on every parent render — so <code>Wrapper</code> re-renders too. To prevent this, you&#x27;d need <code>React.memo</code> AND stable <code>children</code> refs.</p>

<p>&lt;/details&gt;</p>

<hr/>

<h3>❌ vs ✅ 常见错误 / Common Mistakes</h3>

<p><strong>❌ Prop Drilling Hell:</strong></p>
<pre><code>
// Passing title 5 levels deep just to show it in a card
&lt;Page title=&quot;Dashboard&quot;&gt;
  &lt;Layout title=&quot;Dashboard&quot;&gt;
    &lt;Section title=&quot;Dashboard&quot;&gt;
      &lt;Card title=&quot;Dashboard&quot;&gt; ... &lt;/Card&gt;
</code></pre>

<p><strong>✅ Composition with Children:</strong></p>
<pre><code>
&lt;Card&gt;
  &lt;CardHeader&gt;Dashboard&lt;/CardHeader&gt;
  &lt;CardBody&gt;...&lt;/CardBody&gt;
&lt;/Card&gt;
</code></pre>

<hr/>

<h3>⚖️ 何时用哪个 / When to Use Which</h3>

<p>| 场景 | 推荐模式 |</p>
<p>|------|----------|</p>
<p>| 容器/布局，内容不定 | <code>children</code> props |</p>
<p>| 共享逻辑，不共享 UI | <strong>Custom Hook</strong> (首选) |</p>
<p>| 需要向内容注入数据 | Render Props 或 Custom Hook |</p>
<p>| 跨组件横切关注点（认证、日志） | HOC |</p>
<p>| 老 class component 代码 | HOC（因为 hooks 不能用于 class） |</p>

<hr/>

<h3>📚 References</h3>

<p>- [React Composition vs Inheritance — Official Docs](https://react.dev/learn/passing-props-to-a-component#passing-jsx-as-children)</p>
<p>- [Render Props Pattern — Kent C. Dodds](https://kentcdodds.com/blog/react-hooks-whats-going-to-happen-to-render-props)</p>
<p>- [HOC Docs — React](https://legacy.reactjs.org/docs/higher-order-components.html)</p>

<hr/>

<h3>🧒 ELI5</h3>

<p>就像乐高积木：<code>children</code> props 让你把任意积木放进一个盒子；Render Props 让盒子告诉你&quot;我准备好了，你可以放这种积木&quot;；HOC 像是给积木套一个外壳（防水壳、认证壳）。</p>

<p>Like Lego: <code>children</code> props = drop any brick into a box. Render Props = box tells you what brick fits. HOC = wrap a brick in a protective shell.</p>

<hr/>
<h1>🤖 AI</h1>
<h2>🤖 AI Day 17 — 本周 AI 大事件 / AI News Roundup</h2>
<p><em>来源 / Sources: web search, April 2026</em></p>

<hr/>

<p><strong>📰 Story 1: OpenAI 收购科技脱口秀 TBPN，首次进军媒体 / OpenAI acquires tech talk show TBPN, its first move into media</strong></p>

<p>来源: https://openai.com/index/openai-acquires-tbpn/</p>

<p>OpenAI 宣布收购每日科技谈话节目 TBPN，并表示将保持其编辑独立性；此举被视为公司在“塑造 AI 叙事”和公共沟通上的战略加码。</p>
<p>OpenAI announced it acquired the daily tech talk show TBPN and said it will preserve editorial independence—signaling a strategic push to shape the public AI narrative.</p>

<p><strong>为什么你应该关心:</strong> 未来 AI 竞争不只在模型能力，也在“谁掌握公众理解与信任”；媒体渠道会影响监管、人才、客户与舆论风向。/ AI competition isn’t only about model quality—control over narrative and trust can influence regulation, hiring, customer adoption, and public sentiment.</p>

<hr/>

<p><strong>📰 Story 2: Google 推出 Gemini API 的 Flex / Priority Inference，帮助企业控成本与稳定性 / Google adds Flex &amp; Priority Inference tiers to the Gemini API</strong></p>

<p>来源: https://www.infoworld.com/article/4154145/google-gives-enterprises-new-controls-to-manage-ai-inference-costs-and-reliability.html</p>

<p>Google 为 Gemini API 增加新的推理服务分层，让企业在成本、延迟与可靠性之间做更细粒度的取舍，面向更复杂的多步骤“Agent 工作流”。</p>
<p>Google introduced new inference service tiers for the Gemini API, letting enterprises trade off cost, latency, and reliability—especially for complex, multi-step agent workflows.</p>

<p><strong>为什么你应该关心:</strong> “AI 变贵”是落地最大障碍之一；更灵活的推理定价与 QoS 会决定哪些产品能规模化、哪些只能停留在 demo。/ Inference economics often decide whether AI products scale or stay demos; pricing/QoS controls directly shape what’s viable in production.</p>

<hr/>

<p><strong>📰 Story 3: Google Research 发布 TurboQuant：大模型推理内存压缩 6 倍 / Google Research unveils TurboQuant memory compression for LLM inference</strong></p>

<p>来源: https://www.networkworld.com/article/4154034/google-research-talks-compression-technology-it-says-will-greatly-reduce-memory-needed-for-ai-processing.html</p>

<p>TurboQuant 据称可将大模型推理所需内存降低 6 倍，并在相同 GPU 数量下提升速度，同时尽量不牺牲精度；这类压缩技术有望推动更多“端侧 AI”能力。</p>
<p>TurboQuant reportedly cuts LLM inference memory by 6× and boosts speed with the same GPU count while preserving accuracy—potentially accelerating more capable on-device AI.</p>

<p><strong>为什么你应该关心:</strong> 算法层面的“省内存/省算力”会直接改变硬件需求与成本结构，决定 AI 是集中在云端，还是能更广泛地下沉到手机、PC 与边缘设备。/ Efficiency breakthroughs reshape hardware demand and unit economics, determining whether AI stays cloud-only or becomes truly ubiquitous on phones, PCs, and edge devices.</p>

<hr/>

<p><strong>📰 Story 4: 医疗 AI 新进展：Noah Labs 的 Vox 获 FDA 认定，可用 5 秒语音筛查心衰 / Healthcare AI: Noah Labs’ Vox gets FDA designation for detecting heart failure from a 5-second voice sample</strong></p>

<p>来源: https://www.buildez.ai/blog/ai-trending-april-2026-developments</p>

<p>报道指出 Noah Labs 的 Vox 获得 FDA 相关认定，可通过短短 5 秒语音信号进行心衰风险检测；这展示了 AI 在临床前筛查与远程健康管理的潜力。</p>
<p>Reports say Noah Labs’ Vox received an FDA-related designation and can detect heart failure risk from a 5-second voice sample, highlighting AI’s potential in pre-clinical screening and remote care.</p>

<p><strong>为什么你应该关心:</strong> 当 AI 开始进入受监管医疗体系，它的价值不再只是“更聪明”，而是能否真正改善结果、降低成本并通过合规审查。/ As AI enters regulated healthcare, the bar shifts from “smart” to clinically useful, cost-effective, and compliant—opening massive markets (and responsibilities).</p>

<hr/>

<p>📚 <strong>References</strong></p>
<p>1. https://openai.com/index/openai-acquires-tbpn/</p>
<p>2. https://www.infoworld.com/article/4154145/google-gives-enterprises-new-controls-to-manage-ai-inference-costs-and-reliability.html</p>
<p>3. https://www.networkworld.com/article/4154034/google-research-talks-compression-technology-it-says-will-greatly-reduce-memory-needed-for-ai-processing.html</p>
<p>4. https://www.buildez.ai/blog/ai-trending-april-2026-developments</p>

<p>🧒 <strong>ELI5:</strong> 这周 AI 的重点是：大公司在“更省钱更稳定地跑 AI”、以及把 AI 推进现实世界（媒体与医疗）上同时加速。/ This week’s theme: big players are making AI cheaper and more reliable to run—and pushing it deeper into the real world (media and healthcare).</p>
]]></description>
    </item>
    <item>
      <title>byte-by-byte — 2026-04-02</title>
      <link>https://github.com/YushengAuggie/byte-by-byte/tree/main/archive</link>
      <guid isPermaLink="false">https://github.com/YushengAuggie/byte-by-byte/archive/2026-04-02</guid>
      <pubDate>Thu, 02 Apr 2026 12:00:00 +0000</pubDate>
      <description><![CDATA[<h1>🏗️ System Design</h1>
<h2>🏗️ 系统设计 Day 14 / System Design Day 14</h2>
<h2>API Gateway &amp; Service Mesh</h2>

<hr/>

<h3>🌍 真实场景 / Real-World Scenario</h3>

<p>想象你在优步工作，系统里有 200 个微服务：乘客服务、司机服务、定价服务、路线服务、支付服务……</p>

<p>每个客户端（iOS App、Android App、Web 前端、第三方合作伙伴）直接调用每个服务？<strong>噩梦开始了。</strong></p>

<p>这就是为什么优步、Netflix、Amazon 都在用 <strong>API Gateway + Service Mesh</strong> 这两层抽象。</p>

<hr/>

<p>Imagine you&#x27;re at Uber with 200 microservices: rider, driver, pricing, routing, payments...</p>

<p>Every client (iOS, Android, web, partners) calling each service directly? <strong>Nightmare begins.</strong></p>

<p>This is why Uber, Netflix, and Amazon all use <strong>API Gateway + Service Mesh</strong> — two layers of abstraction.</p>

<hr/>

<h3>🏛️ ASCII 架构图</h3>

<pre><code>
外部流量 / External Traffic
         │
         ▼
┌─────────────────────┐
│    API Gateway      │  ← 统一入口 / Single Entry Point
│  (Kong/AWS API GW)  │    认证、限流、路由、日志
│                     │    Auth, Rate Limit, Route, Log
└────────┬────────────┘
         │ 内部流量 / Internal Traffic
         ▼
┌─────────────────────────────────────────┐
│           Service Mesh (Istio/Envoy)    │
│                                         │
│  ┌──────────┐    ┌──────────┐          │
│  │Service A │◄──►│Service B │          │
│  │[sidecar] │    │[sidecar] │          │
│  └──────────┘    └──────────┘          │
│         ↕               ↕              │
│  ┌──────────┐    ┌──────────┐          │
│  │Service C │◄──►│Service D │          │
│  │[sidecar] │    │[sidecar] │          │
│  └──────────┘    └──────────┘          │
│                                         │
│  自动 mTLS、熔断、重试、可观测性        │
│  Auto mTLS, Circuit Break, Retry, Obs  │
└─────────────────────────────────────────┘
</code></pre>

<hr/>

<h3>🔍 核心概念 / Core Concepts</h3>

<h4>API Gateway — 对外的门卫</h4>
<p><strong>做什么 / What it does:</strong></p>
<p>- ✅ 认证鉴权 (JWT/OAuth2)</p>
<p>- ✅ 速率限制 (Rate Limiting) — 防刷</p>
<p>- ✅ 请求路由 — <code>/api/v1/users</code> → User Service</p>
<p>- ✅ 协议转换 — REST → gRPC</p>
<p>- ✅ 请求聚合 — 一次请求，内部调 3 个服务</p>
<p>- ✅ SSL 终止 (TLS Termination)</p>

<p><strong>常见产品 / Products:</strong> Kong, AWS API Gateway, Nginx, Envoy, Traefik</p>
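<p>上面的「请求路由」本质上是最长前缀匹配。一个极简草图（服务名为假设示例）：/ The request-routing bullet above boils down to longest-prefix matching. A minimal sketch (service names are hypothetical):</p>

```python
def route(path: str, routes: dict) -> str:
    """Longest-prefix match — the core of gateway request routing (sketch)."""
    best = None
    for prefix, service in routes.items():
        if path.startswith(prefix) and (best is None or len(prefix) > len(best[0])):
            best = (prefix, service)
    return best[1] if best else None

# Hypothetical route table for the Uber-style example above
ROUTES = {
    "/api/v1/users": "user-service",
    "/api/v1/rides": "ride-service",
    "/api/v1/rides/pricing": "pricing-service",
}
```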

<h4>Service Mesh — 对内的神经系统</h4>
<p><strong>做什么 / What it does:</strong></p>
<p>- ✅ 服务间 mTLS 加密（零信任网络）</p>
<p>- ✅ 熔断器 (Circuit Breaker) — 防雪崩</p>
<p>- ✅ 自动重试 + 超时</p>
<p>- ✅ 流量管理 (Canary, A/B Test)</p>
<p>- ✅ 分布式追踪 (Tracing)</p>
<p>- ✅ 服务发现 (Service Discovery)</p>

<p><strong>实现方式:</strong> Sidecar 代理（每个服务旁边注入一个 Envoy 代理）</p>
<p><strong>常见产品 / Products:</strong> Istio, Linkerd, Consul Connect</p>

<hr/>

<h3>⚖️ 关键权衡 / Key Tradeoffs</h3>

<p>| 方案 | 优点 | 缺点 |</p>
<p>|------|------|------|</p>
<p>| API Gateway 独立 | 简单，运维成本低 | 服务间通信无管控 |</p>
<p>| Service Mesh 独立 | 内部流量全覆盖 | 复杂度高，sidecar 开销 |</p>
<p>| 两者结合 ✅ | 完整的流量控制 | 需要专门的平台团队维护 |</p>

<p><strong>为什么这样设计？/ Why this design?</strong></p>

<p>API Gateway 和 Service Mesh 解决<strong>不同层面</strong>的问题：</p>
<p>- Gateway = <strong>南北流量</strong>（外→内）</p>
<p>- Service Mesh = <strong>东西流量</strong>（内→内）</p>

<p>用一个工具同时管两种流量会导致职责不清、配置混乱。</p>

<hr/>

<h3>⚠️ 常见踩坑 / Common Mistakes</h3>

<pre><code>
❌ 把所有业务逻辑放在 API Gateway 里
   → Gateway 应该是&quot;哑路由&quot;，不应该懂业务

❌ 在没有可观测性的情况下上 Service Mesh
   → Mesh 的价值在于追踪和监控，没有这些等于白上

❌ 用 Service Mesh 替代 API Gateway
   → Mesh 不做外部认证和速率限制

❌ 每个团队各自搭 Gateway
   → 应该是全公司统一，否则安全策略碎片化
</code></pre>

<hr/>

<h3>📚 References</h3>

<p>- [Kong API Gateway Docs](https://docs.konghq.com/gateway/latest/) — 主流开源 API Gateway</p>
<p>- [Istio Architecture Overview](https://istio.io/latest/docs/ops/deployment/architecture/) — Service Mesh 权威文档</p>
<p>- [What is a Service Mesh? — CNCF](https://glossary.cncf.io/service-mesh/) — 清晰的概念解释</p>

<hr/>

<h3>🧒 ELI5 (像我5岁一样解释)</h3>

<p><strong>中文：</strong></p>
<p>想象一个大型游乐园。API Gateway 是大门口的保安，检查你的票、告诉你哪个游乐设施在哪里。Service Mesh 是园区内部的通信系统——每个游乐设施之间如何协调、出故障时怎么绕路。一个管进门，一个管园内。</p>

<p><strong>English:</strong></p>
<p>Imagine a big theme park. The API Gateway is the security guard at the main entrance — checks your ticket, tells you where things are. The Service Mesh is the internal walkie-talkie system between rides — how they coordinate, what happens when one breaks down. One manages getting IN, the other manages moving AROUND.</p>

<hr/>
<h1>💻 Algorithms</h1>
<h2>💻 算法 Day 15 / Algorithms Day 15</h2>
<h2>#121 Best Time to Buy and Sell Stock (Easy) — Sliding Window</h2>
<p>🔗 LeetCode: https://leetcode.com/problems/best-time-to-buy-and-sell-stock/  🟢</p>
<p>📹 NeetCode: https://www.youtube.com/watch?v=1pkOgXD63yU</p>

<hr/>

<h3>🧩 新模式 / New Pattern: 滑动窗口模式 (Sliding Window)</h3>
<p>📍 This block: 6 problems</p>

<p><strong>什么时候用 / When to use:</strong> 连续子数组/子串的最大值、最小值、满足条件的最短/最长</p>

<p><strong>识别信号 / Signals:</strong> subarray, substring, contiguous, window, maximum/minimum length</p>

<p><strong>通用模版 / Template:</strong></p>
<pre><code>
left = 0
for right in range(len(arr)):
    window.add(arr[right])  # expand
    while CONDITION_VIOLATED:
        window.remove(arr[left])  # shrink
        left += 1
    result = max(result, right - left + 1)
</code></pre>

<p><strong>核心洞察 / Key Insight:</strong> 右指针扩张探索，左指针收缩维护约束 — 每个元素最多进出窗口各一次</p>

<hr/>

<h3>🌍 现实类比 / Real-world Analogy</h3>
<p><strong>中文：</strong>把股价想成每天的“进货价”。你要做的是：先找到历史最低进货价（买入），然后在未来某天卖出（卖出价 - 买入价最大）。</p>

<p><strong>English:</strong> Think of prices as daily “cost to buy inventory.” You want the lowest cost so far (buy) and the best future selling day to maximize profit.</p>

<hr/>

<h3>🧠 题意拆解 / Problem Restatement</h3>
<p><strong>中文：</strong>给定数组 <code>prices[i]</code> 表示第 i 天价格。只能买一次、卖一次（卖在买之后）。求最大利润。</p>

<p><strong>English:</strong> Given <code>prices[i]</code> as day i price. Buy once, sell once (sell after buy). Return max profit.</p>

<hr/>

<h3>🗺️ 映射到滑动窗口 / Map to the Pattern Template</h3>
<p>这题看起来不像“窗口里有什么元素集合”，但本质仍是“在线扫描 + 维护一个约束状态”。</p>

<p>- <code>right</code> = 今天（卖出日）</p>
<p>- <code>left</code> 不需要显式移动；我们维护“到目前为止最低买入价” = <code>min_price_so_far</code></p>
<p>- <code>result</code> = 当前最大利润</p>

<p><strong>关键变化 / Key variation:</strong></p>
<p>- 不需要 <code>while</code> 收缩窗口，因为约束不是“窗口合法性”，而是“买入必须在卖出之前”。</p>

<hr/>

<h3>✅ Python 解法 / Python Solution</h3>
<pre><code>
from typing import List

class Solution:
    def maxProfit(self, prices: List[int]) -&gt; int:
        min_price = float(&#x27;inf&#x27;)
        best = 0

        for price in prices:  # price is the current &#x27;sell&#x27; candidate
            # Update the best profit if we sell today
            best = max(best, price - min_price)
            # Update the minimum price seen so far (best &#x27;buy&#x27;)
            min_price = min(min_price, price)

        return best
</code></pre>

<hr/>

<h3>🔎 手动 Trace / Walkthrough Trace</h3>
<p>以 <code>prices = [7,1,5,3,6,4]</code> 为例：</p>

<p>- Day1 price=7: best=max(0, 7-∞)=0, min=7</p>
<p>- Day2 price=1: best=max(0, 1-7)=0, min=1</p>
<p>- Day3 price=5: best=max(0, 5-1)=4, min=1</p>
<p>- Day4 price=3: best=max(4, 3-1)=4, min=1</p>
<p>- Day5 price=6: best=max(4, 6-1)=5, min=1</p>
<p>- Day6 price=4: best=max(5, 4-1)=5, min=1</p>

<p>答案 = 5</p>
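<p>把上面的解法写成独立函数，可以快速验证这条 trace：/ Rewriting the solution above as a standalone function makes the trace easy to verify:</p>

```python
def max_profit(prices: list) -> int:
    # Same one-pass scan as the Solution above, as a plain function
    min_price = float("inf")
    best = 0
    for price in prices:
        best = max(best, price - min_price)  # profit if we sell today
        min_price = min(min_price, price)    # cheapest buy so far
    return best
```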

<hr/>

<h3>⏱️ 复杂度 / Complexity</h3>
<p>- Time: <strong>O(n)</strong></p>
<p>- Space: <strong>O(1)</strong></p>

<hr/>

<h3>举一反三 / Connect to Other Problems in This Pattern Block</h3>
<p>同一个“右指针扫过去，维护一个状态”的思路，在这个 block 里会逐步升级：</p>

<p>1. <strong>#3 Longest Substring Without Repeating Characters</strong>：窗口里维护“无重复”的约束，需要 <code>while</code> 收缩。</p>
<p>2. <strong>#424 Longest Repeating Character Replacement</strong>：维护“窗口内最多字符频次”和允许替换次数。</p>
<p>3. <strong>#76 Minimum Window Substring</strong>：需要精确覆盖目标字符计数，窗口收缩更讲究。</p>
<p>4. <strong>#239 Sliding Window Maximum</strong>：窗口最大值维护通常用单调队列。</p>

<p><strong>今天这题是最简形态：</strong>窗口里只需要记住“历史最小值”。</p>

<hr/>

<h3>📚 References</h3>
<p>- LeetCode Problem: https://leetcode.com/problems/best-time-to-buy-and-sell-stock/</p>
<p>- NeetCode Explanation: https://www.youtube.com/watch?v=1pkOgXD63yU</p>
<p>- Sliding Window Technique (general): https://www.geeksforgeeks.org/window-sliding-technique/</p>

<hr/>

<h3>🧒 ELI5</h3>
<p><strong>中文：</strong>每天你看到一个价格，就问自己两件事：</p>
<p>1）如果我以前最便宜的时候买了，今天卖能赚多少？</p>
<p>2）今天会不会比以前更便宜，适合作为新的“最便宜买入日”？</p>
<p>一路走到最后，你就找到能赚最多的一次买卖。</p>

<p><strong>English:</strong> Each day, ask: (1) if I had bought at the cheapest earlier day, what profit do I get selling today? (2) is today the new cheapest day to buy? Keep the best profit.</p>

<hr/>
<h1>🗣️ Soft Skills</h1>
<h2>🗣️ 软技能 Day 14 / Soft Skills Day 14</h2>
<h2>How do you handle technical debt? Give me a specific example</h2>
<p><strong>类别 / Category:</strong> Technical Leadership · Senior/Staff Level</p>

<hr/>

<h3>🎯 为什么重要 / Why This Matters</h3>

<p>每个工程团队都有技术债。面试官不想听你说&quot;我们应该重构&quot;——他们想听你如何<strong>量化</strong>债务、<strong>说服</strong>利益相关者、<strong>执行</strong>还债计划，同时不停业务开发。</p>

<p>Every eng team has tech debt. Interviewers don&#x27;t want &quot;we should refactor&quot; — they want to hear how you <strong>quantified</strong> the debt, <strong>persuaded</strong> stakeholders, and <strong>executed</strong> payoff while keeping feature work moving.</p>

<hr/>

<h3>⭐ STAR 框架示范 / STAR Example</h3>

<p><strong>Situation 情境：</strong></p>
<p>我们的订单服务是 3 年前写的单体模块，每次改价格逻辑都要改 400 行 if-else，每月导致 2-3 次生产事故。</p>

<p>Our order service was a 3-year-old monolith module. Every pricing logic change touched 400 lines of if-else, causing 2-3 production incidents per month.</p>

<p><strong>Task 任务：</strong></p>
<p>作为 Tech Lead，我需要在 Q3 OKR 里推动重构，但产品经理有 12 个新功能排队。</p>

<p>As Tech Lead, I needed to push refactoring into Q3 OKRs while PM had 12 features queued.</p>

<p><strong>Action 行动：</strong></p>
<p>1. <strong>量化痛苦 / Quantify the pain:</strong> 统计过去 6 个月：incident 修复耗时 120 工程师小时，每次 pricing 功能开发耗时是预期的 3x</p>
<p>2. <strong>用数据说服 / Data-driven pitch:</strong> 向 VP Eng 展示&quot;如果不还债，Q4 每个 pricing 功能要 3 周而不是 1 周&quot;</p>
<p>3. <strong>渐进式重构 / Incremental approach:</strong> 不做 Big Bang，设计 Strangler Fig 模式——新功能走新架构，旧功能逐步迁移</p>
<p>4. <strong>20% 规则 / 20% rule:</strong> 每个 sprint 拿出 20% capacity 用于还债，写在 sprint contract 里</p>

<p><strong>Result 结果：</strong></p>
<p>3 个月后 incident rate 降了 70%，新 pricing 功能开发时间从 3 周降到 5 天。VP Eng 在季度 all-hands 上引用这个案例。</p>

<hr/>

<h3>❌ Bad vs ✅ Good</h3>

<pre><code>
❌ &quot;技术债很重要，我们应该分配时间去重构。&quot;
   → 太泛、没有 evidence、没有具体行动

✅ &quot;我追踪了6个月的 incident 数据，发现每月120小时
   浪费在补丁上。我提出 Strangler Fig 方案，用20%
   sprint capacity 渐进还债，3个月后 incident 降70%。&quot;
   → 有数据、有策略、有结果
</code></pre>

<hr/>

<h3>🏅 Senior/Staff Tips</h3>

<p>1. <strong>永远先量化 / Always quantify first</strong> — &quot;技术债导致了 X 小时浪费 / Y 次事故 / Z% 速度下降&quot;，不要用模糊感受</p>
<p>2. <strong>关联业务指标 / Tie to business metrics</strong> — &quot;如果我们不修，下季度功能交付速度慢 40%&quot;</p>
<p>3. <strong>Strangler Fig &gt; Big Bang</strong> — 渐进式替换比全部重写风险低 10 倍</p>
<p>4. <strong>建立持续机制 / Build ongoing mechanism</strong> — 20% rule、tech debt sprints、quality budget 都是好策略</p>
<p>5. <strong>展示 leadership / Show leadership</strong> — 你不是在&quot;要求时间做技术的事&quot;，而是在&quot;保护团队交付速度&quot;</p>

<hr/>

<h3>🔑 Key Takeaways</h3>
<p>- 技术债 = 利息在涨的贷款，不是可有可无的清洁工作</p>
<p>- 量化 + 数据 + 渐进执行 = 让所有人 buy-in 的方程式</p>
<p>- 最好的还债方式：和新功能开发并行，不是&quot;暂停一切来重构&quot;</p>

<hr/>

<h3>📚 References</h3>
<p>- [Martin Fowler — Technical Debt](https://martinfowler.com/bliki/TechnicalDebt.html) — 技术债经典定义</p>
<p>- [Strangler Fig Pattern — Microsoft](https://learn.microsoft.com/en-us/azure/architecture/patterns/strangler-fig) — 渐进式迁移模式</p>
<p>- [Managing Technical Debt — Software Engineering at Google](https://abseil.io/resources/swe-book/html/ch15.html) — Google 的技术债管理经验</p>

<hr/>

<h3>🧒 ELI5</h3>
<p><strong>中文：</strong>技术债就像你房间越来越乱。你可以继续往里塞东西（加功能），但找东西越来越难（bug 越来越多）。聪明的做法不是某天请假大扫除，而是每天花10分钟整理一点。</p>

<p><strong>English:</strong> Tech debt is like your room getting messier over time. You can keep stuffing things in (adding features), but finding anything gets harder (more bugs). The smart move isn&#x27;t taking a day off to deep-clean — it&#x27;s tidying up 10 minutes every day.</p>

<hr/>
<h1>🎨 Frontend</h1>
<h2>🎨 前端 Day 14 / Frontend Day 14</h2>
<h2>React Context — Global State Without Prop Drilling</h2>
<p><strong>类别 / Category:</strong> React Patterns · Week 3</p>

<hr/>

<h3>🌍 真实场景 / Real Scenario</h3>

<p>你在做一个 SaaS dashboard。用户登录后，<code>user</code> 对象需要在 Header、Sidebar、Settings、ProfileCard 都能访问。你总不能 <code>&lt;App user={user}&gt; → &lt;Layout user={user}&gt; → &lt;Sidebar user={user}&gt; → &lt;UserAvatar user={user}&gt;</code> 一路传下去吧？</p>

<p>You&#x27;re building a SaaS dashboard. After login, the <code>user</code> object needs to be accessible in Header, Sidebar, Settings, ProfileCard. You can&#x27;t keep passing <code>user</code> prop 4 levels deep.</p>

<p><strong>这就是 Prop Drilling 的痛。Context 来拯救你。</strong></p>

<hr/>

<h3>💻 核心代码 / Code Snippet</h3>

<pre><code>
import { createContext, useContext, useState, ReactNode } from &#x27;react&#x27;;

// 1. Create the context with a type
interface User { name: string; role: &#x27;admin&#x27; | &#x27;user&#x27;; }

interface AuthContextType {
  user: User | null;
  login: (user: User) =&gt; void;
  logout: () =&gt; void;
}

const AuthContext = createContext&lt;AuthContextType | null&gt;(null);

// 2. Create a provider component
function AuthProvider({ children }: { children: ReactNode }) {
  const [user, setUser] = useState&lt;User | null&gt;(null);

  const login = (u: User) =&gt; setUser(u);
  const logout = () =&gt; setUser(null);

  return (
    &lt;AuthContext.Provider value={{ user, login, logout }}&gt;
      {children}
    &lt;/AuthContext.Provider&gt;
  );
}

// 3. Create a custom hook (ALWAYS do this)
function useAuth() {
  const ctx = useContext(AuthContext);
  if (!ctx) throw new Error(&#x27;useAuth must be used within AuthProvider&#x27;);
  return ctx;
}

// 4. Usage — any nested component
function UserAvatar() {
  const { user } = useAuth();
  return &lt;span&gt;{user?.name ?? &#x27;Guest&#x27;}&lt;/span&gt;;
}
</code></pre>

<hr/>

<h3>🧠 猜猜这段代码输出什么？/ Quiz</h3>

<pre><code>
const ThemeContext = createContext(&#x27;light&#x27;);

function App() {
  return (
    &lt;ThemeContext.Provider value=&quot;dark&quot;&gt;
      &lt;Parent /&gt;
    &lt;/ThemeContext.Provider&gt;
  );
}

function Parent() {
  return &lt;Child /&gt;;
}

function Child() {
  const theme = useContext(ThemeContext);
  console.log(theme);
  return &lt;div&gt;{theme}&lt;/div&gt;;
}
</code></pre>

<p><strong>A)</strong> <code>undefined</code></p>
<p><strong>B)</strong> <code>&#x27;light&#x27;</code></p>
<p><strong>C)</strong> <code>&#x27;dark&#x27;</code></p>
<p><strong>D)</strong> Throws an error</p>

<p>&lt;details&gt;&lt;summary&gt;显示答案 / Show Answer&lt;/summary&gt;</p>

<p><strong>C) <code>&#x27;dark&#x27;</code></strong></p>

<p><code>Child</code> 在 <code>ThemeContext.Provider value=&quot;dark&quot;</code> 内部，所以 <code>useContext(ThemeContext)</code> 返回 <code>&#x27;dark&#x27;</code>。<code>&#x27;light&#x27;</code> 是 default，只有在<strong>没有 Provider 包裹时</strong>才生效。</p>

<p>&lt;/details&gt;</p>

<hr/>

<h3>❌ Bad vs ✅ Good</h3>

<pre><code>
// ❌ BAD: Putting everything in one giant Context
const AppContext = createContext({
  user: null, theme: &#x27;light&#x27;, locale: &#x27;en&#x27;,
  cart: [], notifications: [], settings: {}
});
// Problem: ANY change re-renders ALL consumers

// ✅ GOOD: Split into focused contexts
const AuthContext = createContext&lt;AuthContextType | null&gt;(null);
const ThemeContext = createContext&lt;ThemeContextType&gt;({ theme: &#x27;light&#x27; });
const CartContext = createContext&lt;CartContextType&gt;({ items: [] });
// Each context only re-renders its own consumers
</code></pre>

<pre><code>
// ❌ BAD: Using context for frequently changing values
&lt;PositionContext.Provider value={{ x: mouseX, y: mouseY }}&gt;
// Re-renders EVERY consumer 60 times/sec

// ✅ GOOD: Use useRef + subscription for high-frequency updates
// Or use a state manager like Zustand/Jotai for this case
</code></pre>

<hr/>

<h3>🧭 什么时候用 / When to Use</h3>

<p>| 用 Context ✅ | 不用 Context ❌ |</p>
<p>|-------------|---------------|</p>
<p>| 主题 (theme) | 频繁变化的值 (mouse position) |</p>
<p>| 认证状态 (auth) | 复杂的全局状态 (use Zustand) |</p>
<p>| 国际化 (locale/i18n) | 只传 1-2 层的 props |</p>
<p>| 功能开关 (feature flags) | 需要 selector 优化的场景 |</p>

<p><strong>经验法则 / Rule of thumb:</strong> Context 适合&quot;读多写少&quot;的全局数据。如果值频繁变化，考虑 Zustand 或 Jotai。</p>

<hr/>

<h3>📚 References</h3>
<p>- [React Docs — useContext](https://react.dev/reference/react/useContext) — 官方文档</p>
<p>- [React Docs — Passing Data Deeply with Context](https://react.dev/learn/passing-data-deeply-with-context) — 详细教程</p>
<p>- [Kent C. Dodds — How to use React Context effectively](https://kentcdodds.com/blog/how-to-use-react-context-effectively) — 最佳实践</p>

<hr/>

<h3>🧒 ELI5</h3>
<p><strong>中文：</strong>想象你家有一个&quot;公告板&quot;（Context Provider）。任何家庭成员（子组件）不管在哪个房间，都能直接看到公告板上的信息，不需要一个人传一个人地接力。</p>

<p><strong>English:</strong> Think of Context as a family bulletin board. Any family member (child component), no matter which room they&#x27;re in, can read the bulletin directly — no need for a game of telephone passing the message through each person.</p>

<hr/>
<h1>🤖 AI</h1>
<h2>🤖 AI Day 16 — CONCEPT</h2>
<h2>RAG — Retrieval Augmented Generation</h2>
<h2>检索增强生成 — 让 AI 说真话的关键技术</h2>

<hr/>

<h3>💡 直觉解释 / Intuitive Explanation</h3>

<p><strong>中文：</strong></p>
<p>LLM 有两个大问题：1) 知识有截止日期 2) 会&quot;一本正经地胡说八道&quot;（幻觉）。</p>

<p>RAG 的思路很简单：<strong>先查资料，再回答。</strong></p>

<p>就像一个学生考开卷考试：不用背所有知识，考试时翻书找到相关段落，然后用自己的话回答。LLM 就是那个学生，你的知识库就是那本书。</p>

<p><strong>English:</strong></p>
<p>LLMs have two big problems: 1) knowledge cutoff date 2) they &quot;hallucinate&quot; confidently.</p>

<p>RAG&#x27;s idea is simple: <strong>look it up first, then answer.</strong></p>

<p>Like a student in an open-book exam: instead of memorizing everything, find the relevant passages in the book, then answer in your own words. The LLM is the student, your knowledge base is the book.</p>

<hr/>

<h3>⚙️ 工作原理 / How It Works</h3>

<pre><code>
用户提问                     知识库 (文档/网页/DB)
&quot;Redis 和 Memcached           │
  有什么区别？&quot;                │ 预处理：切块 → 向量化
       │                      │ → 存入向量数据库
       ▼                      │
 ┌───────────┐                ▼
 │ 1. Embed  │──查询向量──► ┌──────────────┐
 │   Query   │              │ Vector DB    │
 └───────────┘              │ (Pinecone/   │
       │                    │  ChromaDB)   │
       │                    └──────┬───────┘
       │                           │
       │    Top-K 相关片段         │
       │  ◄────────────────────────┘
       ▼
 ┌───────────────────────────────┐
 │ 2. 构建 Prompt                │
 │ &quot;根据以下资料回答用户问题：    │
 │  [片段1] [片段2] [片段3]      │
 │  问题：Redis vs Memcached？&quot;  │
 └───────────────┬───────────────┘
                 │
                 ▼
 ┌───────────────────────────────┐
 │ 3. LLM 生成带引用的回答       │
 │ &quot;根据文档，Redis 支持持久化    │
 │  而 Memcached 是纯内存的...&quot;  │
 └───────────────────────────────┘
</code></pre>

<p><strong>三步流程 / Three Steps:</strong></p>
<p>1. <strong>Retrieve 检索</strong> — 把问题转成向量，在向量数据库中找最相关的文档片段</p>
<p>2. <strong>Augment 增强</strong> — 把检索到的片段塞进 prompt 作为上下文</p>
<p>3. <strong>Generate 生成</strong> — LLM 基于上下文生成回答（不靠猜，靠证据）</p>

<hr/>

<h3>🌍 实际应用 / Applications</h3>

<p>| 场景 | 怎么用 RAG |</p>
<p>|------|-----------|</p>
<p>| 企业知识库 | 内部文档 → 向量化 → 员工用自然语言提问 |</p>
<p>| 客服机器人 | FAQ + 产品手册 → 回答准确率从 60% → 95% |</p>
<p>| 代码助手 | 项目代码 + 文档 → 上下文感知的代码建议 |</p>
<p>| 法律/医疗 | 法规/病例库 → 有据可查的回答 |</p>

<p><strong>为什么不直接 Fine-tune？</strong></p>
<p>- Fine-tune：慢、贵、知识固化在模型权重里，更新需要重新训练</p>
<p>- RAG：快、便宜、换文档就换知识，实时更新</p>

<hr/>

<h3>🐍 可运行代码 / Runnable Python Snippet</h3>

<pre><code>
pip install chromadb openai
</code></pre>

<pre><code>
import chromadb

# 1. Create a local vector DB and add documents
client = chromadb.Client()
collection = client.create_collection(&quot;demo&quot;)

collection.add(
    documents=[
        &quot;Redis supports persistence (RDB/AOF), Memcached is in-memory only.&quot;,
        &quot;Redis has data structures: strings, hashes, lists, sets, sorted sets.&quot;,
        &quot;Memcached is multi-threaded; Redis is single-threaded with io_threads.&quot;,
    ],
    ids=[&quot;doc1&quot;, &quot;doc2&quot;, &quot;doc3&quot;],
)

# 2. Query — ChromaDB auto-embeds and finds relevant docs
results = collection.query(query_texts=[&quot;Redis vs Memcached?&quot;], n_results=2)
print(&quot;Retrieved:&quot;, results[&quot;documents&quot;])
# 3. Feed results[&quot;documents&quot;] into your LLM prompt as context
</code></pre>
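<p>第 2 步（Augment）本质上就是字符串拼接。下面是一个最小示意，<code>retrieved</code> 代表上面 <code>results[&quot;documents&quot;][0]</code> 返回的片段 / Step 2 (Augment) is just string assembly — a minimal sketch where <code>retrieved</code> stands in for the chunks retrieved above; the prompt wording is one reasonable choice, not a fixed template:</p>

```python
# Assemble the augmented prompt from retrieved chunks (step 2: Augment).
# `retrieved` stands in for results["documents"][0] from the query above.
retrieved = [
    "Redis supports persistence (RDB/AOF), Memcached is in-memory only.",
    "Redis has data structures: strings, hashes, lists, sets, sorted sets.",
]

# Number each chunk so the model can cite sources like [1], [2].
context = "\n".join(f"[{i + 1}] {chunk}" for i, chunk in enumerate(retrieved))

prompt = (
    "Answer the question using ONLY the sources below. Cite sources like [1].\n\n"
    f"Sources:\n{context}\n\n"
    "Question: Redis vs Memcached?"
)
print(prompt)
```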

<hr/>

<h3>📚 References</h3>
<p>- [LangChain RAG Tutorial](https://python.langchain.com/docs/tutorials/rag/) — 实战教程</p>
<p>- [ChromaDB Getting Started](https://docs.trychroma.com/docs/overview/getting-started) — 轻量向量数据库</p>
<p>- [Lewis et al. 2020 — RAG Original Paper](https://arxiv.org/abs/2005.11401) — RAG 论文原文</p>

<hr/>

<h3>🧒 ELI5</h3>
<p><strong>中文：</strong></p>
<p>想象你在答一场考试。普通 AI 全靠记忆答题（有时候记错了还很自信）。RAG AI 答题前先翻了一遍参考书，找到最相关的几页，然后根据书上的内容回答。所以它答得更准，还能告诉你&quot;我是从第 42 页看到的&quot;。</p>

<p><strong>English:</strong></p>
<p>Imagine taking a test. Regular AI answers purely from memory (sometimes confidently wrong). RAG AI first flips through a reference book, finds the most relevant pages, then answers based on what the book says. So it&#x27;s more accurate and can even say &quot;I found this on page 42.&quot;</p>
]]></description>
    </item>
    <item>
      <title>byte-by-byte — 2026-04-01</title>
      <link>https://github.com/YushengAuggie/byte-by-byte/tree/main/archive</link>
      <guid isPermaLink="false">https://github.com/YushengAuggie/byte-by-byte/archive/2026-04-01</guid>
      <pubDate>Wed, 01 Apr 2026 12:00:00 +0000</pubDate>
      <description><![CDATA[<h1>Review</h1>
<h2>🔄 复习日 Day 15 / Review Day 15</h2>

<p><strong>📊 进度 / Progress:</strong> Day 15/150 · NeetCode: 14/150 · SysDesign: 13/40 · Behavioral: 13/40 · Frontend: 13/50 · AI: 6/30</p>
<p><strong>🔥 4-day streak!</strong></p>

<hr/>

<p>今天是复习日！回顾过去4天的内容。</p>
<p>Today is a review day! Let&#x27;s revisit the past 4 days of content.</p>

<p><strong>回顾范围 / Review scope:</strong> Days 11–14</p>
<p>- 🏗️ Consistent Hashing, CAP Theorem, Message Queues, Microservices vs Monolith</p>
<p>- 💻 Two Sum II, 3Sum, Container With Most Water, Trapping Rain Water</p>
<p>- 🗣️ Proactive problem-solving, Prioritization, Delivering bad news, Cross-team initiatives</p>
<p>- 🎨 React useEffect, useRef, useMemo/useCallback, Custom Hooks</p>
<p>- 🤖 AI News, RLHF, AI News, LoRA &amp; QLoRA</p>

<hr/>

<h2>📝 Quick Quiz — 3 Mini-Reviews</h2>

<hr/>

<p><strong>Q1: [🏗️ System Design — Consistent Hashing &amp; CAP Theorem]</strong></p>

<p>你在设计一个分布式缓存系统（比如 Redis Cluster）。当一个节点崩溃时，使用普通哈希（<code>hash(key) % N</code>）和一致性哈希（Consistent Hashing）各会发生什么？为什么分布式数据库（如 Cassandra）选择 Eventual Consistency 而不是强一致性？</p>

<p>You&#x27;re designing a distributed cache (like Redis Cluster). When one node goes down, what happens with regular hashing (<code>hash(key) % N</code>) vs consistent hashing? And why do databases like Cassandra prefer eventual consistency over strong consistency?</p>

<p>&lt;details&gt;&lt;summary&gt;显示答案 / Show Answer&lt;/summary&gt;</p>

<p><strong>普通哈希的问题 / Regular hashing problem:</strong></p>
<p>当节点数 N 变化（比如从 5 变成 4），几乎所有的 key 都需要重新映射到不同节点 — 导致大规模缓存失效（cache stampede）。</p>
<p>When N changes (5→4), nearly all keys get remapped to different nodes — causing a massive cache miss storm.</p>

<p><strong>一致性哈希的优势 / Consistent hashing advantage:</strong></p>
<p>将节点和 key 都映射到一个&quot;哈希环&quot;上。某节点下线时，只有原本落在该节点上的 key（即它与前一个节点之间区间内的 key）需要迁移到顺时针方向的下一个节点，其他 key 完全不受影响。通常只影响约 1/N 的数据。</p>
<p>Both nodes and keys are mapped onto a ring. When a node goes down, only the keys <em>between</em> the failed node and its predecessor need to migrate to the next node clockwise — roughly 1/N of all keys.</p>

<p><strong>Virtual nodes (VNodes)</strong> 虚拟节点进一步让每个物理节点在哈希环上占据多个位置，使数据分布更均匀。</p>
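<p>哈希环 + 虚拟节点可以用几十行 Python 勾勒出来。下面是一个最小示意（用 MD5 只是为了演示，并非生产级选择）/ A minimal hash-ring sketch with virtual nodes; MD5 here is illustrative, not a production choice:</p>

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self.ring = []  # sorted list of (hash, node) pairs
        for node in nodes:
            self.add(node)

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node: str) -> None:
        # Each physical node occupies `vnodes` positions on the ring.
        for i in range(self.vnodes):
            self.ring.append((self._hash(f"{node}#{i}"), node))
        self.ring.sort()

    def remove(self, node: str) -> None:
        self.ring = [(h, n) for h, n in self.ring if n != node]

    def get(self, key: str) -> str:
        # First vnode clockwise of the key's hash (wrap around the ring).
        h = self._hash(key)
        idx = bisect.bisect(self.ring, (h, "")) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["node-a", "node-b", "node-c"])
before = {f"key{i}": ring.get(f"key{i}") for i in range(1000)}
ring.remove("node-b")
after = {k: ring.get(k) for k in before}
moved = sum(before[k] != after[k] for k in before)
print(f"keys remapped after removing 1 of 3 nodes: {moved}/1000")
# Only keys that lived on node-b move — roughly 1/N of the data.
```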

<p><strong>CAP &amp; Eventual Consistency:</strong></p>
<p>CAP 定理说在网络分区（P）发生时，你必须在一致性（C）和可用性（A）之间选一个。Cassandra 选择 AP（可用性 + 分区容忍），允许暂时不一致，通过后台的 gossip protocol 和 hinted handoff 最终达到一致。这对&quot;写多读多&quot;场景更实用 — 写入不会因为网络抖动而失败。</p>

<p>CAP says during a network partition, choose C or A. Cassandra picks AP (availability + partition tolerance), accepting that different replicas may briefly disagree. Background reconciliation (gossip, read repair, hinted handoff) eventually converges. For high-traffic apps, &quot;eventually consistent&quot; is a feature — writes don&#x27;t fail just because one replica is slow.</p>

<p><strong>核心记忆点 / Key insight:</strong> Consistent hashing = <strong>minimal redistribution</strong>. CAP = <strong>pick your tradeoff explicitly</strong>.</p>

<p>&lt;/details&gt;</p>

<hr/>

<p><strong>Q2: [💻 Algorithms — Two Pointers: Container With Most Water vs Trapping Rain Water]</strong></p>

<p>两道题都用双指针，都涉及&quot;水&quot;，但思路有微妙差别。<code>Container With Most Water</code>（#11）和 <code>Trapping Rain Water</code>（#42）的关键区别是什么？为什么前者一次遍历就够，后者需要追踪历史最大值？</p>

<p>Both problems use two pointers and involve &quot;water&quot;, but the logic is subtly different. What&#x27;s the key difference between Container With Most Water (#11) and Trapping Rain Water (#42)? Why does the first need only one pass while the second requires tracking historical maximums?</p>

<p>&lt;details&gt;&lt;summary&gt;显示答案 / Show Answer&lt;/summary&gt;</p>

<p><strong>Container With Most Water (#11) — 选最大容器:</strong></p>
<p>你在两个柱子之间装水，水量 = <code>min(left, right) * distance</code>。双指针从两端向中间移动，每次移动<strong>较短的那端</strong> — 因为较长端已经尽力了，继续缩小宽度只有移动短板才有可能增加面积。</p>

<p>You&#x27;re choosing two walls. Water volume = <code>min(left, right) * width</code>. Move the shorter pointer inward — the taller wall can&#x27;t improve the result unless the other wall gets taller. This greedy choice is provably correct.</p>
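<p>贪心移动短板的标准双指针写法如下（LeetCode #11，示意用途）/ The standard greedy two-pointer solution for #11, shown for illustration:</p>

```python
from typing import List

def max_area(height: List[int]) -> int:
    """LeetCode #11: start widest, then move the shorter wall inward."""
    l, r = 0, len(height) - 1
    best = 0
    while l < r:
        # Area is capped by the shorter of the two walls.
        best = max(best, min(height[l], height[r]) * (r - l))
        if height[l] < height[r]:
            l += 1  # the taller wall can't help until the short one improves
        else:
            r -= 1
    return best

print(max_area([1, 8, 6, 2, 5, 4, 8, 3, 7]))  # classic example → 49
```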

<p><strong>Trapping Rain Water (#42) — 计算每个格子的积水:</strong></p>
<p>每个位置能接的水 = <code>min(leftMax, rightMax) - height[i]</code>。关键是<strong>每个位置都受其左右两侧最高柱子的限制</strong>，需要知道历史最大值。</p>
<p>双指针做法：维护 <code>leftMax</code> 和 <code>rightMax</code>，哪侧较小就处理哪侧（因为较小侧的瓶颈已经确定）。</p>

<p>Every cell can hold water = <code>min(leftMax, rightMax) - height[i]</code>. Each cell is bounded by the <em>tallest</em> wall on BOTH sides — you must track running maximums. Two-pointer trick: whichever side has the smaller max, process it (its bottleneck is determined regardless of the other side&#x27;s future values).</p>

<p><strong>核心区别 / Core difference:</strong></p>
<p>- Container: <strong>两点之间</strong> 的最大矩形，贪心移动短板 ✓</p>
<p>- Trapping: <strong>每个点</strong> 上方的积水，需要双侧历史最大值 ✓</p>

<p><strong>记忆口诀:</strong> Container = &quot;短板决定上限，移短板求最大&quot;；Trapping = &quot;每格水位 = 两侧最高墙的最小值 − 自身高度&quot;</p>

<p>&lt;/details&gt;</p>

<hr/>

<p><strong>Q3: [🎨 Frontend — React Hooks: useEffect, useRef, useMemo/useCallback, Custom Hooks]</strong></p>

<p>你在做一个数据密集型 dashboard，需要：</p>
<p>1. 在组件挂载时 fetch 数据并在卸载时取消请求</p>
<p>2. 直接操作一个 DOM 元素（聚焦输入框）而不触发重渲染</p>
<p>3. 避免昂贵的排序函数在每次渲染时重复执行</p>
<p>4. 把上面的 fetch 逻辑复用到多个组件</p>

<p>请说明应该用哪个 Hook，以及最常见的错误写法。</p>

<p>You&#x27;re building a data-heavy dashboard and need to: (1) fetch data on mount and cancel on unmount, (2) focus a DOM element without triggering re-renders, (3) avoid re-running an expensive sort on every render, (4) reuse the fetch logic across components. Which hook for each, and what&#x27;s the most common mistake?</p>

<p>&lt;details&gt;&lt;summary&gt;显示答案 / Show Answer&lt;/summary&gt;</p>

<p><strong>1. Fetch + 取消请求 → <code>useEffect</code></strong></p>
<pre><code>
useEffect(() =&gt; {
  const controller = new AbortController();
  fetch(url, { signal: controller.signal }).then((res) =&gt; res.json()).then(setData);
  return () =&gt; controller.abort(); // cleanup!
}, [url]); // dependency array matters!
</code></pre>
<p>❌ 最常见错误：省略 cleanup，导致组件卸载后仍然 setState，产生内存泄漏和 &quot;Can&#x27;t perform a React state update on an unmounted component&quot; 警告。</p>
<p>❌ Most common mistake: forgetting the cleanup function, causing state updates on unmounted components.</p>

<p><strong>2. 操作 DOM → <code>useRef</code></strong></p>
<pre><code>
const inputRef = useRef(null);
// inputRef.current.focus() — doesn&#x27;t trigger re-render
</code></pre>
<p><code>useRef</code> 的值改变<strong>不会触发重渲染</strong>，适合存储 DOM 引用、定时器 ID、或任何&quot;不影响 UI&quot;的可变值。</p>
<p>Changing <code>.current</code> never triggers a re-render — perfect for DOM refs, timer IDs, or previous values.</p>

<p><strong>3. 避免重复计算 → <code>useMemo</code></strong></p>
<pre><code>
const sorted = useMemo(() =&gt; expensiveSort(data), [data]);
</code></pre>
<p>❌ 过度使用 useMemo 反而增加开销。只在<strong>真正昂贵</strong>的计算或<strong>引用稳定性</strong>（传给子组件）时使用。</p>
<p>❌ Over-memoizing adds overhead. Only use for genuinely expensive computations or referential stability.</p>

<p><strong>4. 复用逻辑 → Custom Hook</strong></p>
<pre><code>
function useDashboardData(url) {
  const [data, setData] = useState(null);
  useEffect(() =&gt; {
    const controller = new AbortController();
    fetch(url, { signal: controller.signal })
      .then((res) =&gt; res.json())
      .then(setData)
      .catch(() =&gt; {}); // ignore aborts in this sketch
    return () =&gt; controller.abort();
  }, [url]);
  return data;
}
</code></pre>
<p>Custom hooks = 把 hook 逻辑从组件里提取出来，不是 HOC，不是 render props，就是普通函数（必须以 <code>use</code> 开头）。</p>
<p>Custom hooks extract stateful logic — not a new React feature, just a naming convention (<code>use</code> prefix) that signals to React&#x27;s linter.</p>

<p><strong>记忆矩阵 / Memory matrix:</strong></p>
<p>| Goal | Hook |</p>
<p>|------|------|</p>
<p>| Side effects, fetch, subscriptions | <code>useEffect</code> |</p>
<p>| DOM access, mutable value (no re-render) | <code>useRef</code> |</p>
<p>| Expensive computation cache | <code>useMemo</code> |</p>
<p>| Stable function reference | <code>useCallback</code> |</p>
<p>| Reusable stateful logic | Custom Hook |</p>

<p>&lt;/details&gt;</p>

<hr/>

<p>💡 <em>复习巩固记忆，螺旋式上升。每次回顾不只是&quot;记住了吗&quot;，而是&quot;能用自己的话解释吗&quot;。</em></p>
<p><em>Review doesn&#x27;t just ask &quot;do you remember?&quot; — it asks &quot;can you explain it in your own words?&quot;</em></p>

<p>📅 明天继续新内容！Day 16 coming tomorrow!</p>

<hr/>

<p><em>Generated by byte-by-byte · Day 15 of 150 · 2026-04-01</em></p>
]]></description>
    </item>
    <item>
      <title>byte-by-byte — 2026-03-31</title>
      <link>https://github.com/YushengAuggie/byte-by-byte/tree/main/archive</link>
      <guid isPermaLink="false">https://github.com/YushengAuggie/byte-by-byte/archive/2026-03-31</guid>
      <pubDate>Tue, 31 Mar 2026 12:00:00 +0000</pubDate>
      <description><![CDATA[<h1>🏗️ System Design</h1>
<h2>🏗️ 系统设计 Day 14 / System Design Day 14</h2>
<h2>微服务 vs 单体架构 / Microservices vs Monolith</h2>

<p>&gt; <strong>难度 / Difficulty:</strong> Intermediate · <strong>阶段 / Phase:</strong> Growth · <strong>预计阅读 / Read time:</strong> 3 min</p>

<hr/>

<h2>🌍 真实场景 / Real-World Scenario</h2>

<p>想象你在一家初创公司工作，产品刚上线，代码都在一个仓库里。随着用户量增长到百万级别，你开始思考：要不要把代码拆成独立的服务？什么时候该拆？怎么拆？</p>

<p>Imagine you&#x27;re at a startup. Your entire product lives in one codebase. As you scale to millions of users, you face the classic question: should you break it apart into microservices? When? How?</p>

<hr/>

<h2>🏛️ 架构图 / Architecture Diagrams</h2>

<h3>单体架构 / Monolith</h3>
<pre><code>
┌─────────────────────────────────────────┐
│              Monolith App               │
│  ┌──────────┐ ┌──────────┐ ┌─────────┐ │
│  │  Users   │ │ Orders   │ │Payments │ │
│  │ Module   │ │ Module   │ │ Module  │ │
│  └────┬─────┘ └────┬─────┘ └────┬────┘ │
│       └─────────────┴─────────────┘     │
│                    │                    │
│          ┌─────────▼─────────┐         │
│          │   Single Database │         │
│          └───────────────────┘         │
└─────────────────────────────────────────┘
           (Deploy everything together)
</code></pre>

<h3>微服务架构 / Microservices</h3>
<pre><code>
Client ──► API Gateway
               │
      ┌────────┼────────┐
      ▼        ▼        ▼
 ┌────────┐ ┌──────┐ ┌──────────┐
 │ Users  │ │Orders│ │ Payments │
 │Service │ │Svc   │ │ Service  │
 └───┬────┘ └──┬───┘ └───┬──────┘
     │         │          │
  ┌──▼──┐  ┌──▼──┐   ┌───▼───┐
  │ DB  │  │ DB  │   │  DB   │
  └─────┘  └─────┘   └───────┘
   (独立部署, message queue 通信)
</code></pre>

<hr/>

<h2>⚖️ 核心权衡 / Key Tradeoffs</h2>

<h3>为什么选单体？/ Why Monolith?</h3>
<p>- <strong>简单</strong> — 一个代码库，一次部署，本地开发直接跑</p>
<p>- <strong>低延迟</strong> — 模块间函数调用，无网络开销</p>
<p>- <strong>事务一致性</strong> — 一个数据库，ACID 事务天然支持</p>
<p>- <strong>适合阶段</strong> — 团队 &lt; 20 人，产品 PMF 还没验证时</p>

<h3>为什么选微服务？/ Why Microservices?</h3>
<p>- <strong>独立扩展</strong> — Payment 服务流量暴增，只扩它，不动 Users 服务</p>
<p>- <strong>技术异构</strong> — 推荐系统用 Python/ML，API 层用 Go，各自最优</p>
<p>- <strong>故障隔离</strong> — 一个服务崩了，不影响整体</p>
<p>- <strong>团队自治</strong> — 不同团队独立发布，互不阻塞</p>
<p>- <strong>适合阶段</strong> — 团队 &gt; 50 人，有专门 DevOps/Platform 团队时</p>

<h3>对比表 / Comparison</h3>
<p>| 维度 | 单体 | 微服务 |</p>
<p>|------|------|--------|</p>
<p>| 部署复杂度 | 低 ✅ | 高 ❌ |</p>
<p>| 开发速度(早期) | 快 ✅ | 慢 ❌ |</p>
<p>| 独立扩展 | ❌ | ✅ |</p>
<p>| 故障隔离 | ❌ | ✅ |</p>
<p>| 数据一致性 | 容易 ✅ | 需要设计 ❌ |</p>
<p>| 运维成本 | 低 ✅ | 高 ❌ |</p>

<hr/>

<h2>🪤 别踩这个坑 / Common Mistakes</h2>

<p><strong>❌ 坑1: 过早微服务化 (Premature Microservices)</strong></p>
<p>刚起步就拆服务，结果团队只有3个人要维护10个服务+Kubernetes。</p>
<p>&gt; &quot;We went microservices on day one, and it almost killed us.&quot; — every startup that tried it too early</p>

<p><strong>✅ 正确做法:</strong> 先做&quot;模块化单体&quot;(Modular Monolith)，内部模块化，边界清晰，后期再物理拆分。</p>

<p><strong>❌ 坑2: 分布式单体 (Distributed Monolith)</strong></p>
<p>拆成多个服务，但服务之间强耦合，必须同步部署。既有微服务的复杂性，又没有微服务的好处。</p>

<p><strong>✅ 正确做法:</strong> 服务间通过 API 或消息队列解耦，不共享数据库。</p>

<p><strong>❌ 坑3: 忽视跨服务事务</strong></p>
<p>订单服务扣库存成功，支付服务失败了，数据不一致。</p>

<p><strong>✅ 正确做法:</strong> 使用 Saga 模式或最终一致性设计。</p>
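<p>Saga 的&quot;正向步骤 + 逆序补偿&quot;可以用几行代码勾勒出来。下面是一个编排式（orchestration）Saga 的最小示意，所有步骤函数都是假设的占位实现，不对应任何真实框架 / A minimal orchestration-style Saga sketch; every step function is a hypothetical placeholder, not a real API:</p>

```python
# Orchestration-style Saga: run each step in order; if one fails, run the
# compensations of all completed steps in reverse order.

def create_order():
    print("order created")

def cancel_order():
    print("order cancelled")

def reserve_stock():
    print("stock reserved")

def release_stock():
    print("stock released")

def charge_payment():
    raise RuntimeError("payment declined")

def run_saga(steps):
    done = []
    for action, compensate in steps:
        try:
            action()
            done.append(compensate)
        except Exception as exc:
            print(f"step failed: {exc}; rolling back")
            for comp in reversed(done):
                if comp is not None:
                    comp()
            return False
    return True

committed = run_saga([
    (create_order, cancel_order),
    (reserve_stock, release_stock),
    (charge_payment, None),  # nothing to undo: the charge never succeeded
])
print("saga committed:", committed)
```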

<hr/>

<h2>📚 References</h2>
<p>- [Martin Fowler — Microservices](https://martinfowler.com/articles/microservices.html)</p>
<p>- [Martin Fowler — Monolith First](https://martinfowler.com/bliki/MonolithFirst.html)</p>
<p>- [AWS — Microservices vs Monolithic Architecture](https://aws.amazon.com/microservices/)</p>

<h2>🧒 ELI5</h2>
<p>单体就像一家小餐厅，一个厨房做所有菜，简单高效。微服务像大型餐厅连锁，每家分店专做一类菜，可以独立扩张，但管理更复杂。刚开始开一家店，别一上来就开连锁。</p>

<p>Monolith = one kitchen that cooks everything. Simple, fast to start. Microservices = a food court where each stall specializes. Great at scale, but way more management. Start with one kitchen; split when it gets too crowded.</p>

<hr/>
<h1>💻 Algorithms</h1>
<h2>💻 算法 Day 14 / Algorithms Day 14</h2>
<h2>#42 Trapping Rain Water (Hard) — Two Pointers</h2>

<p>🧩 <strong>Two Pointers (5/5)</strong> — building on the template from earlier days in this block</p>

<p>- 🔗 LeetCode: https://leetcode.com/problems/trapping-rain-water/  🔴</p>
<p>- 📹 NeetCode: https://www.youtube.com/watch?v=ZI2z5pq0TqA</p>
<p>- <strong>Pattern / 模式:</strong> Two Pointers（双指针）</p>

<hr/>

<h2>🌧️ 现实类比 / Real-world analogy</h2>

<p>把城市的屋顶想成一排高度不同的墙。下雨后，低洼处会积水，但能积多少取决于它左边最高的墙和右边最高的墙：</p>

<p>&gt; <strong>water[i] = min(maxLeft, maxRight) - height[i]</strong>（如果为正）</p>

<p>Think of bars as walls. The water above a bar is limited by the shorter of the tallest wall on its left and the tallest wall on its right.</p>

<hr/>

<h2>🧠 问题重述 / Problem</h2>

<p>给定数组 <code>height</code> 表示柱子高度，每根柱子宽度为 1，计算下雨后能接多少雨水。</p>

<p>Given <code>height</code>, compute total trapped water.</p>

<hr/>

<h2>🧩 如何映射到双指针模板 / Map to the Two Pointers template</h2>

<p>之前的双指针块（#125 回文、#167 两数之和 II、#15 三数之和、#11 盛最多水的容器）里，左右指针“夹逼”的核心是：</p>

<p>- <strong>每一步都能确定一侧的最优/可行性</strong>，因此可以移动那一侧，整体 O(n)</p>

<p>这题的“变化点”是：</p>
<p>- 我们不再追求 pair/sum，而是维护 <strong>leftMax / rightMax</strong>，并在每一步“结算”一侧的水量。</p>

<p>Key twist vs earlier problems: instead of comparing sums/areas, we compare <code>leftMax</code> and <code>rightMax</code>. The side with the smaller max can be finalized because its limiting wall is known.</p>

<hr/>

<h2>✅ 双指针解法 / Two pointers solution</h2>

<h3>核心思路 / Key idea</h3>

<p>- <code>l, r</code> 从两端向中间走</p>
<p>- 维护 <code>leftMax = max(height[0..l])</code>，<code>rightMax = max(height[r..end])</code></p>
<p>- <strong>如果 <code>height[l] &lt; height[r]</code></strong>（与下方代码一致）：右边一定存在不低于 <code>height[r]</code> 的墙，左边的水位上限由 <code>leftMax</code> 确定，可以计算 <code>l</code> 位置的水并 <code>l += 1</code></p>
<p>- 否则：对称处理右边</p>

<hr/>

<h3>Python 代码 / Python code</h3>

<pre><code>
from typing import List

class Solution:
    def trap(self, height: List[int]) -&gt; int:
        l, r = 0, len(height) - 1
        left_max, right_max = 0, 0
        water = 0

        while l &lt; r:
            if height[l] &lt; height[r]:
                # left side is bounded by left_max
                if height[l] &gt;= left_max:
                    left_max = height[l]
                else:
                    water += left_max - height[l]
                l += 1
            else:
                # right side is bounded by right_max
                if height[r] &gt;= right_max:
                    right_max = height[r]
                else:
                    water += right_max - height[r]
                r -= 1

        return water
</code></pre>

<hr/>

<h2>🔍 手动走一遍 / Quick trace</h2>

<p>例子：<code>[0,1,0,2,1,0,1,3,2,1,2,1]</code></p>

<p>- 开始 <code>l=0, r=11, left_max=0, right_max=0, water=0</code></p>
<p>- <code>height[0]=0 &lt; height[11]=1</code>，走左分支：<code>height[0] &gt;= left_max</code>，<code>left_max=0</code>（不变），<code>l=1</code></p>
<p>- <code>height[1]=1 &lt; height[11]=1</code> 不成立，走 else：更新 <code>right_max=1</code>，<code>r=10</code></p>
<p>- <code>height[1]=1 &lt; height[10]=2</code>，走左分支：更新 <code>left_max=1</code>，<code>l=2</code></p>
<p>- <code>height[2]=0 &lt; height[10]=2</code>，且 <code>0 &lt; left_max=1</code>：水 += 1-0 = 1，<code>l=3</code></p>
<p>- ... 最终累计 <code>water=6</code></p>

<p>Why it works: the side with the smaller boundary max is the limiting factor, so we can safely finalize water there without knowing the exact interior structure.</p>

<hr/>

<h2>⏱️ 复杂度 / Complexity</h2>

<p>- Time: <strong>O(n)</strong> (each pointer moves at most n steps)</p>
<p>- Space: <strong>O(1)</strong></p>

<hr/>

<h2>举一反三 / Transfer within this pattern block</h2>

<p>- #11 Container With Most Water：移动“短板”来寻找更可能变大的面积</p>
<p>- #15 3Sum：固定一个数 + 双指针夹逼</p>
<p>- #125 Valid Palindrome：两端检查并向内收缩</p>

<p>共同点：</p>
<p>- <strong>每一步移动都基于一个可证明的单调性/界限</strong>，避免 O(n^2)</p>

<hr/>

<h2>📚 References</h2>
<p>- LeetCode editorial: https://leetcode.com/problems/trapping-rain-water/editorial/</p>
<p>- NeetCode explanation (video): https://www.youtube.com/watch?v=ZI2z5pq0TqA</p>
<p>- GeeksforGeeks (two-pointer approach): https://www.geeksforgeeks.org/trapping-rain-water/</p>

<h2>🧒 ELI5</h2>

<p>想象你在一排积木之间倒水。某一格能装多少水，只取决于它左边最高的积木和右边最高的积木里较矮的那个。双指针就是从两边往中间走，随时记住“目前看到的最高积木”，然后一格一格把水算出来。</p>

<p>Imagine filling water between blocks. A spot’s water level is capped by the shorter of the tallest block on its left and right. Two pointers walk inward, tracking those tallest blocks and adding water as you go.</p>

<hr/>
<h1>🗣️ Soft Skills</h1>
<h2>🗣️ 软技能 Day 14 / Soft Skills Day 14</h2>
<h2>Tell me about a time you drove a large cross-team initiative</h2>

<p>&gt; <strong>级别 / Level:</strong> Staff · <strong>主题 / Category:</strong> Leadership · <strong>Read time:</strong> 2 min</p>

<hr/>

<h2>为什么重要 / Why this matters</h2>

<p><strong>中文：</strong></p>
<p>跨团队项目（例如：统一身份认证、支付迁移、数据平台升级、全站性能治理）最大的风险往往不是技术，而是<strong>对齐、节奏、依赖、沟通成本</strong>。Staff 级别面试官想听到的是：你如何在没有“直接汇报关系”的情况下，把很多人带到同一条船上。</p>

<p><strong>English:</strong></p>
<p>For cross-team initiatives, the hardest part is rarely the technical design—it’s alignment, dependencies, cadence, and communication overhead. Interviewers want evidence you can lead without formal authority.</p>

<hr/>

<h2>⭐ STAR 结构（建议 90 秒回答）/ STAR structure (aim for 90 seconds)</h2>

<h3>S — Situation（背景）</h3>
<p>- 中文：项目是什么？影响范围多大？涉及哪些团队？</p>
<p>- English: What was the initiative? Scope? Which teams?</p>

<h3>T — Task（你的职责）</h3>
<p>- 中文：你具体负责什么？目标/成功标准是什么（SLO、迁移比例、成本、上线日期）？</p>
<p>- English: What did you own? What were the success metrics?</p>

<h3>A — Action（你做了什么）</h3>
<p>用“可复制的方法论”讲：</p>
<p>1) <strong>定义北极星指标 / Define a north-star metric</strong>：例如 p95 latency、error budget、migration completion。</p>
<p>2) <strong>把问题拆成工作流 / Break into a plan</strong>：里程碑、风险清单、依赖图、RACI（谁负责/批准/咨询/知会）。</p>
<p>3) <strong>建立节奏 / Create operating cadence</strong>：每周 cross-team sync、异步周报、决策记录（ADR）、升级通道。</p>
<p>4) <strong>提前拆雷 / De-risk early</strong>：先做 POC / pilot、灰度、回滚预案、观测（dashboards + alerts）。</p>
<p>5) <strong>对齐激励 / Align incentives</strong>：明确“对他们有什么好处”（减少 oncall、降低成本、提高转化）。</p>

<p>English (same content):</p>
<p>1) Define a measurable north-star metric.</p>
<p>2) Turn ambiguity into a concrete plan (milestones, dependency map, RACI).</p>
<p>3) Establish cadence (syncs, async updates, decision logs).</p>
<p>4) De-risk early (pilot, gradual rollout, rollback plan, observability).</p>
<p>5) Align incentives so partner teams want to participate.</p>

<h3>R — Result（结果）</h3>
<p>- 中文：用数字结尾：提前/按期上线、迁移比例、故障率下降、成本节省、开发效率提升。</p>
<p>- English: Close with numbers: completion %, latency improvement, incidents reduced, cost savings.</p>

<hr/>

<h2>❌ Bad vs ✅ Good（面试官一听就懂）/ Bad vs Good</h2>

<p><strong>❌ Bad（空泛）</strong></p>
<p>- “我组织了很多会议，大家最后达成一致，然后上线了。”</p>

<p><strong>✅ Good（可验证）</strong></p>
<p>- “我先把目标写成 p95 从 800ms 降到 400ms，并把依赖拆成 3 条迁移路径；每周一次跨团队同步 + 每两天异步进度；关键风险是 X 团队的 schema 变更，于是先做了两周 pilot 和双写；最终 6 周内迁移 92%，相关 oncall 事故从每周 5 起降到 1 起。”</p>

<hr/>

<h2>Senior/Staff 加分点 / Senior/Staff-level tips</h2>

<p>- <strong>把决策写下来</strong>：用 ADR 记录 tradeoffs，不靠“口口相传”。</p>
<p>- <strong>沟通要“分层”</strong>：IC 关注任务与风险，Manager 关注里程碑与资源，Exec 关注指标与 ROI。</p>
<p>- <strong>处理冲突的方式</strong>：先找共同目标，再用数据/实验说话；必要时明确升级路径。</p>
<p>- <strong>让系统自动运行</strong>：好的机制（dashboard、SLO、自动化迁移工具）比个人英雄更可靠。</p>

<hr/>

<h2>Key Takeaways</h2>
<p>- 中文：跨团队项目 = 目标清晰 + 依赖可视化 + 节奏稳定 + 风险前置 + 激励对齐。</p>
<p>- English: Cross-team success = clear metrics + dependency visibility + steady cadence + early de-risking + aligned incentives.</p>

<hr/>

<h2>📚 References</h2>
<p>- Google SRE Book — Service Level Objectives: https://sre.google/sre-book/service-level-objectives/</p>
<p>- RACI matrix overview (Atlassian): https://www.atlassian.com/team-playbook/plays/roles-and-responsibilities</p>
<p>- Amazon Working Backwards (concept): https://www.aboutamazon.com/news/company-news/working-backwards-how-amazon-starts-with-the-customer</p>

<h2>🧒 ELI5</h2>

<p>中文：你要做一件很多同学一起完成的大作业。你得先说清楚“最终要拿多少分”（指标），再把任务分好、规定每周检查一次进度、提前发现最难的部分先做小实验，最后大家才会真的按同一个计划走。</p>

<p>English: It’s like a big group project. First define what “success” means, split work and owners, check progress regularly, test the risky parts early, and keep everyone moving together.</p>

<hr/>
<h1>🎨 Frontend</h1>
<h2>🎨 前端 Day 14 / Frontend Day 14</h2>
<h2>React Custom Hooks — Extract &amp; Reuse Logic</h2>

<p>&gt; <strong>阶段 / Phase:</strong> Growth · <strong>Read time:</strong> 2 min</p>

<hr/>

<h2>🧩 真实场景 / Real scenario</h2>

<p>中文：你在做一个 dashboard，需要在多个页面复用“拉取数据 + loading/error + 取消请求 + 刷新”的逻辑。你不想每个组件都写一遍 <code>useEffect</code> + <code>AbortController</code> + 一堆状态。</p>

<p>English: You’re building a dashboard and need reusable “fetch + loading/error + cancellation + refresh” logic across multiple pages.</p>

<hr/>

<h2>✅ 生产可用的 Custom Hook 例子 / Production-ready custom hook</h2>

<pre><code>
import { useCallback, useEffect, useRef, useState } from &quot;react&quot;;

type AsyncState&lt;T&gt; = {
  data: T | null;
  error: string | null;
  loading: boolean;
};

// Code comments in English
export function useJsonFetch&lt;T&gt;(url: string, deps: unknown[] = []) {
  const [state, setState] = useState&lt;AsyncState&lt;T&gt;&gt;({
    data: null,
    error: null,
    loading: true,
  });

  // Keep AbortController in a ref so we can cancel in-flight requests.
  const abortRef = useRef&lt;AbortController | null&gt;(null);

  const run = useCallback(async () =&gt; {
    abortRef.current?.abort();
    const controller = new AbortController();
    abortRef.current = controller;

    setState((s) =&gt; ({ ...s, loading: true, error: null }));

    try {
      const res = await fetch(url, { signal: controller.signal });
      if (!res.ok) throw new Error(`HTTP ${res.status}`);
      const json = (await res.json()) as T;
      setState({ data: json, error: null, loading: false });
    } catch (e) {
      // Abort is not a real “error” the user should see
      if ((e as any)?.name === &quot;AbortError&quot;) return;
      setState({ data: null, error: (e as Error).message, loading: false });
    }
  }, [url, ...deps]);

  useEffect(() =&gt; {
    void run();
    return () =&gt; abortRef.current?.abort();
  }, [run]);

  return { ...state, refresh: run };
}
</code></pre>

<p><strong>怎么用 / How to use:</strong></p>
<pre><code>
type User = { id: string; name: string };

function UsersPanel() {
  const { data, loading, error, refresh } = useJsonFetch&lt;User[]&gt;(&quot;/api/users&quot;);

  if (loading) return &lt;div&gt;Loading…&lt;/div&gt;;
  if (error) return &lt;button onClick={refresh}&gt;Retry: {error}&lt;/button&gt;;
  return (
    &lt;div&gt;
      &lt;button onClick={refresh}&gt;Refresh&lt;/button&gt;
      &lt;ul&gt;{data?.map((u) =&gt; &lt;li key={u.id}&gt;{u.name}&lt;/li&gt;)}&lt;/ul&gt;
    &lt;/div&gt;
  );
}
</code></pre>

<hr/>

<h2>🧠 “猜猜这段代码输出什么？”/ Output quiz</h2>

<pre><code>
function Demo() {
  const [n, setN] = useState(0);

  const inc = useCallback(() =&gt; setN(n + 1), []);

  return &lt;button onClick={inc}&gt;{n}&lt;/button&gt;;
}
</code></pre>

<p>A) 每次点击都会正常 +1 / Increments correctly each click</p>
<p>B) 永远显示 0 / Always shows 0</p>
<p>C) 只会变成 1，然后卡住 / Becomes 1 then stuck</p>
<p>D) 组件会崩溃 / Component crashes</p>

<p><strong>正确答案 / Correct:</strong> C</p>

<p>中文解释：<code>useCallback(..., [])</code> 把 <code>inc</code> 固定住了，但它闭包里捕获的 <code>n</code> 永远是初始值 0，所以每次都是 <code>setN(0 + 1)</code>。</p>

<p>English: The callback is memoized with an empty deps array, so it captures <code>n=0</code> forever. Each click sets <code>n</code> to <code>1</code> again.</p>

<p>✅ 修复方式：用 functional update</p>
<pre><code>
const inc = useCallback(() =&gt; setN((x) =&gt; x + 1), []);
</code></pre>

<hr/>

<h2>❌ 常见错误 vs ✅ 正确方式 / Common mistake vs correct approach</h2>

<p><strong>❌ 错误：custom hook 里依赖不稳定，导致无限刷新 / unstable deps causing loops</strong></p>
<pre><code>
useEffect(() =&gt; {
  fetch(url).then(...)
}, [options]) // options is a new object every render
</code></pre>

<p><strong>✅ 正确：让依赖稳定（useMemo / useCallback / 把对象提升到外层）</strong></p>
<pre><code>
const options = useMemo(() =&gt; ({ headers: { &quot;x&quot;: &quot;1&quot; } }), []);
useEffect(() =&gt; {
  fetch(url, options).then(...)
}, [url, options])
</code></pre>

<hr/>

<h2>什么时候用 / When to use</h2>
<p>- ✅ 当你要复用“状态 + 副作用 + 取消/清理 + 触发刷新”的组合逻辑</p>
<p>- ✅ 当你希望组件变得更像“UI 视图层”，逻辑下沉到 hook</p>

<h2>什么时候不要用 / When NOT to use</h2>
<p>- ❌ 只是复用一个纯函数：直接写 utility function 就好</p>
<p>- ❌ hook 内部逻辑强耦合某个页面的 UI 结构：可能该用组件/组合而不是 hook</p>
<p>- ❌ 你还没搞清楚边界：先写在组件里，稳定后再抽取（避免过早抽象）</p>

<hr/>

<h2>📚 References</h2>
<p>- React Docs — Reusing Logic with Custom Hooks: https://react.dev/learn/reusing-logic-with-custom-hooks</p>
<p>- React Docs — useCallback: https://react.dev/reference/react/useCallback</p>
<p>- MDN — AbortController (cancel fetch): https://developer.mozilla.org/en-US/docs/Web/API/AbortController</p>

<h2>🧒 ELI5</h2>

<p>中文：Custom Hook 就像把“做菜步骤”写成一个固定食谱。以后每次要做同样的菜（同样的逻辑），你就直接用这份食谱，而不是每次都从头想一遍。</p>

<p>English: A custom hook is a reusable recipe for state + side effects. You call the recipe in different components instead of rewriting the steps each time.</p>

<hr/>
<h1>🤖 AI</h1>
<h2>🤖 AI Day 14</h2>
<h2>LoRA &amp; QLoRA — Efficient Fine-Tuning</h2>

<p>&gt; <strong>Mode:</strong> CONCEPT · <strong>Category:</strong> Training · <strong>Read time:</strong> 2 min</p>

<hr/>

<h2>🧠 直觉解释 / Intuition</h2>

<p><strong>中文：</strong></p>
<p>全量微调（full fine-tuning）像是把整本教科书都重写一遍：效果可能好，但成本极高（显存/时间/存储），而且不小心会“改坏”模型的通用能力。</p>

<p><strong>LoRA</strong>（Low-Rank Adaptation）更像是：</p>
<p>- <strong>原模型权重冻结不动</strong>（不重写教科书）</p>
<p>- 只加一层“薄薄的可训练适配器”来改变模型行为（像贴便签/补丁）</p>

<p><strong>QLoRA</strong> 则是在 LoRA 基础上再做一步：</p>
<p>- <strong>把基座模型量化</strong>（比如 4-bit）来极大降低显存占用</p>
<p>- 仍然用 LoRA 训练小适配器，从而让你在更小的 GPU 上也能做高质量微调</p>

<p><strong>English:</strong></p>
<p>Full fine-tuning is rewriting the whole book: powerful but expensive and risky. LoRA freezes the base model and learns small low-rank “adapter” matrices (patches). QLoRA further quantizes the base model (e.g., 4-bit) to cut VRAM dramatically while still training LoRA adapters.</p>

<hr/>

<h2>⚙️ 它是怎么工作的 / How it works</h2>

<p><strong>中文：</strong></p>
<p>在 Transformer 里，大量参数集中在注意力/FFN 的线性层（比如 <code>W</code>）。LoRA 把权重更新 ΔW 表示为两个小矩阵的乘积：</p>

<p>- <strong>ΔW = A · B</strong>，其中 A、B 的秩（rank）很小（r ≪ d）</p>
<p>- 训练时只更新 A、B（参数量从 O(d²) 变成 O(2·d·r)）</p>
<p>- 推理时可以把 ΔW 合并回 W（不增加太多推理开销）</p>

<p><strong>QLoRA：</strong></p>
<p>- 基座权重用 4-bit NF4 等量化方式存储</p>
<p>- 训练时用更高精度（比如 bfloat16）在计算路径中做补偿（常见做法是 double quantization 等技巧）</p>

<p><strong>English:</strong></p>
<p>LoRA factorizes the weight update as ΔW = A·B with small rank r. You train only A and B (far fewer parameters). QLoRA quantizes the base weights (often 4-bit) and trains LoRA adapters on top, using careful compute dtypes/quantization tricks to keep quality.</p>
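<p>下面用 NumPy 直观演示 ΔW = A·B 带来的参数量差距（纯示意代码，不是任何训练库的真实 API；d、r 的取值只是举例）/ A minimal NumPy sketch of the parameter savings from ΔW = A·B (illustrative only, not any library's real API; the values of d and r are just examples):</p>

```python
import numpy as np

d, r = 1024, 8                      # hidden size d, LoRA rank r (r << d)
rng = np.random.default_rng(0)

W = rng.normal(size=(d, d))         # frozen base weight — never updated
A = rng.normal(size=(d, r)) * 0.01  # trainable low-rank factor
B = np.zeros((r, d))                # trainable; starts at 0 so ΔW = 0 at init

delta_W = A @ B                     # ΔW = A·B, rank at most r
W_merged = W + delta_W              # can be merged back into W for inference

full_params = d * d                 # O(d^2) — full fine-tuning
lora_params = 2 * d * r             # O(2·d·r) — LoRA
print(full_params, lora_params, full_params // lora_params)
```

<p>对于 d=1024、r=8，可训练参数从约 105 万降到约 1.6 万（64 倍）。/ For d=1024, r=8 the trainable parameter count drops by a factor of 64.</p>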

<hr/>

<h2>✅ 什么时候用 / When to use</h2>

<p>- <strong>中文：</strong></p>
<p>- 你想针对“特定任务/风格/领域语料”提升效果，但预算有限</p>
<p>- 你希望可控地“加能力”，并且随时能切换不同 adapter（一个基座多个 LoRA）</p>
<p>- 你需要在单卡/小显存环境训练</p>

<p>- <strong>English:</strong></p>
<p>- You need task/domain/style adaptation on a budget</p>
<p>- You want modular adapters you can swap (one base, many LoRAs)</p>
<p>- You’re constrained by VRAM (single GPU / smaller GPUs)</p>

<hr/>

<h2>🧪 可运行示例（≤15 行）/ Runnable snippet (≤15 lines)</h2>

<p>&gt; 下面示例展示“加载 LoRA adapter”的最小思路（训练通常更长、代码更多）。</p>

<pre><code>
# pip install -U transformers peft torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = &quot;gpt2&quot;  # demo base model
lora_path = &quot;./my_lora_adapter&quot;  # your saved LoRA adapter folder

tok = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)
model = PeftModel.from_pretrained(model, lora_path)

prompt = &quot;Write a short product update:&quot;
print(tok.decode(model.generate(**tok(prompt, return_tensors=&quot;pt&quot;), max_new_tokens=40)[0]))
</code></pre>

<hr/>

<h2>📚 References</h2>
<p>- LoRA paper (arXiv): https://arxiv.org/abs/2106.09685</p>
<p>- QLoRA paper (arXiv): https://arxiv.org/abs/2305.14314</p>
<p>- Hugging Face PEFT docs: https://huggingface.co/docs/peft/index</p>

<h2>🧒 ELI5</h2>

<p><strong>中文：</strong></p>
<p>LoRA 就像给机器人加一副“可拆卸的小眼镜”，不改它原来的大脑，只训练这副眼镜让它更擅长某件事。QLoRA 则是把机器人的大脑“压缩存起来”，省空间省钱，但仍能换不同眼镜来学习新技能。</p>

<p><strong>English:</strong></p>
<p>LoRA is a small attachable add-on that changes behavior without rewriting the whole brain. QLoRA compresses the brain to save memory, then still learns those small add-ons.</p>
]]></description>
    </item>
    <item>
      <title>byte-by-byte — 2026-03-29</title>
      <link>https://github.com/YushengAuggie/byte-by-byte/tree/main/archive</link>
      <guid isPermaLink="false">https://github.com/YushengAuggie/byte-by-byte/archive/2026-03-29</guid>
      <pubDate>Sun, 29 Mar 2026 12:00:00 +0000</pubDate>
      <description><![CDATA[<h1>Week Review</h1>
<h2>📅 Week in Review — Week 3 (10 min read)</h2>
<p>📊 Day 13/150 · NeetCode: 13/150 · SysDesign: 12/40 · Behavioral: 12/40 · Frontend: 12/50 · AI: 5/30</p>
<p>🔥 13-day streak!</p>

<hr/>

<h2>🗓️ 本周回顾 / This Week&#x27;s Journey</h2>

<p>本周是第 3 周（第 8–13 天），阶段从 Foundation 迈入 Growth，难度明显上台阶。</p>

<p><em>Week 3 covered Days 8–13, bridging Foundation → Growth phase with a noticeable difficulty step-up.</em></p>

<p>| 日期 / Date | 亮点 / Highlight |</p>
<p>|---|---|</p>
<p>| <strong>Mon 3/23 · Day 8</strong> | 数据库索引 B-Tree + N+1 查询；CSS Animations &amp; Transitions；模糊需求处理 |</p>
<p>| <strong>Tue 3/24 · Day 9</strong> | 数据库复制 &amp; 分片（Replication / Sharding）；算法切换双指针模式（#125 Valid Palindrome）；AI 大新闻：Agentic AI、百万 Token 上下文 |</p>
<p>| <strong>Wed 3/25 · Day 10</strong> | 📝 复习日：回顾 Day 6-9 全部内容，巩固薄弱点 |</p>
<p>| <strong>Thu 3/26 · Day 11</strong> | 一致性哈希（Consistent Hashing）+ 虚拟节点；双指针 #167 Two Sum II；React useEffect；主动发现问题的 behavioral |</p>
<p>| <strong>Fri 3/27 · Day 12</strong> | CAP 定理 &amp; 最终一致性（Growth Phase 正式开始）；3Sum（双指针 3/5）；React useRef；RLHF 深度讲解；优先级排序 behavioral |</p>
<p>| <strong>Sat 3/28 · Day 13</strong> | 🔬 深挖：消息队列 &amp; 事件驱动架构（15 min read，含 Kafka 实战代码）|</p>

<hr/>

<h2>🧠 系统设计要点 / System Design: Key Takeaways</h2>

<p>本周系统设计从&quot;单机&quot;向&quot;分布式&quot;跃进，三个核心概念构成一套完整的分布式基础：</p>

<p><em>This week&#x27;s system design made the leap from single-machine to distributed, forming a complete distributed foundation:</em></p>

<p><strong>1. 数据库索引 → 复制 → 分片 / Indexing → Replication → Sharding</strong></p>
<p>- 索引解决<strong>单机查询速度</strong>（B-Tree O(log N) vs 全表扫 O(N)）</p>
<p>- 复制解决<strong>读扩展 + 高可用</strong>（多副本分担读流量）</p>
<p>- 分片解决<strong>写扩展 + 存储规模</strong>（每个 shard 只负责一部分数据）</p>
<p>- 三者是递进关系：先索引，再复制，实在撑不住再分片</p>

<p><strong>2. 一致性哈希 / Consistent Hashing</strong></p>
<p>- 普通哈希 <code>key % N</code>：增减节点触发 90% 数据重分配 ❌</p>
<p>- 一致性哈希：增减节点只影响 ~1/N 数据 ✅</p>
<p>- 核心武器：<strong>虚拟节点</strong>（每个物理节点 150-200 个虚拟位置），解决负载不均</p>
<p>- 真实应用：DynamoDB、Cassandra、Memcached</p>
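<p>一个带虚拟节点的最小哈希环示意（Python 草图，类名和 vnodes 数量都是示例设定）/ A minimal hash ring with virtual nodes (a Python sketch; the class name and vnode count are illustrative choices):</p>

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, nodes, vnodes=150):
        self.vnodes = vnodes
        self.ring = []  # sorted list of (hash, physical_node)
        for node in nodes:
            self.add(node)

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node: str):
        # Each physical node occupies `vnodes` positions on the ring,
        # which smooths out load imbalance (hot spots).
        for i in range(self.vnodes):
            bisect.insort(self.ring, (self._hash(f"{node}#vn{i}"), node))

    def get(self, key: str) -> str:
        # Walk clockwise to the first virtual node at or after the key's hash.
        idx = bisect.bisect(self.ring, (self._hash(key),)) % len(self.ring)
        return self.ring[idx][1]
```

<p>加入一个新节点后重新查询，只有约 1/N 的 key 会换归属，这正是一致性哈希相对 <code>key % N</code> 的核心优势。/ After adding a node, only ~1/N of keys change owners — the core win over <code>key % N</code>.</p>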

<p><strong>3. CAP 定理 / CAP Theorem</strong></p>
<p>- 分布式系统三选二：Consistency（一致）、Availability（可用）、Partition Tolerance（分区容错）</p>
<p>- 网络分区不可避免，实际是 CP vs AP 的取舍</p>
<p>- <strong>CP</strong>：金融、支付、计数器（HBase、ZooKeeper）</p>
<p>- <strong>AP + 最终一致性</strong>：社交 feed、购物车、点赞（Cassandra、DynamoDB）</p>
<p>- 面试技巧：先问业务对一致性的要求，再决定选型</p>
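<p>Dynamo 系数据库的"可调一致性"可以用法定人数（quorum）规则概括：N 个副本、写等 W 个确认、读查 R 个副本，只要 R + W &gt; N，读写法定人数必然相交。下面是这条规则的一个简化模型（示意函数，非任何数据库的 API）/ A simplified model of the quorum rule behind tunable consistency (an illustrative function, not any database's API):</p>

```python
def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Dynamo-style tunable consistency: with N replicas, a write waits for W
    acks and a read queries R replicas. If R + W > N, every read quorum
    intersects every write quorum, so a read sees the latest acknowledged write."""
    return r + w > n

# Typical settings for N = 3:
assert quorums_overlap(3, 2, 2)       # reads see the latest write
assert not quorums_overlap(3, 1, 1)   # fast, but only eventually consistent
```

<p>这也解释了同一个系统如何同时提供 CP 倾向（R=W=2）和 AP 倾向（R=W=1）的读写选项。/ This is how one system can offer both CP-leaning and AP-leaning options per request.</p>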

<p><strong>本周连接点 / The thread connecting them all:</strong></p>
<p>索引 → 复制 → 分片 → 一致性哈希 → CAP，是任何大规模数据库系统设计的完整脉络。消息队列（Day 13）则是在此基础上，解耦服务间的依赖，进一步提升系统韧性。</p>

<hr/>

<h2>💻 算法模式 / Algorithms: Patterns Mastered</h2>

<p>本周完成双指针模式 5 题中的 3 题（进度 3/5）：</p>

<p><em>Completed 3 of 5 Two Pointers problems this week:</em></p>

<p>| 题目 | 核心变化 | 关键洞察 |</p>
<p>|---|---|---|</p>
<p>| <strong>#125 Valid Palindrome</strong> | 跳过非字母数字 | 双指针&quot;对比&quot;而非求和；Space O(1) 优于 O(n) |</p>
<p>| <strong>#167 Two Sum II</strong> | 有序数组求和 | 排序后才能确定性地收缩：sum &lt; target → left++，sum &gt; target → right-- |</p>
<p>| <strong>#15 3Sum</strong> | 三数之和为零 | 外层固定一个数，内层双指针；排序后跳过重复解 |</p>

<p><strong>模式识别信号 / When to reach for Two Pointers:</strong></p>
<p>&gt; ✅ 有序数组 · ✅ 找配对/三元组 · ✅ 回文检测 · ✅ 原地操作 · ✅ 要求 O(1) 空间</p>
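<p>上表中 #15 3Sum 的去重逻辑最容易写错，下面是一份完整的参考实现（Python 示意），三个位置的重复值都要跳过 / A complete reference sketch of the 3Sum dedup logic from the table above — duplicates must be skipped at all three positions:</p>

```python
def three_sum(nums: list[int]) -> list[list[int]]:
    """#15 3Sum: sort, fix nums[i], then two pointers for the remaining pair."""
    nums = sorted(nums)
    res = []
    for i in range(len(nums) - 2):
        if i > 0 and nums[i] == nums[i - 1]:
            continue                      # skip duplicate anchor values
        left, right = i + 1, len(nums) - 1
        while left < right:
            s = nums[i] + nums[left] + nums[right]
            if s < 0:
                left += 1                 # need a larger sum
            elif s > 0:
                right -= 1                # need a smaller sum
            else:
                res.append([nums[i], nums[left], nums[right]])
                left += 1
                right -= 1
                while left < right and nums[left] == nums[left - 1]:
                    left += 1             # skip duplicate left values
                while left < right and nums[right] == nums[right + 1]:
                    right -= 1            # skip duplicate right values
    return res
```

<p>排序后双指针才能确定性收缩；去重必须在"找到解之后"再跳，否则会漏解。/ Sorting makes the pointer moves deterministic; dedup must happen after recording a triplet, or valid answers get skipped.</p>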

<p><strong>下两题预告 / Coming next:</strong></p>
<p>- <code>#11 Container With Most Water</code> — 移动较短边指针，最大化面积</p>
<p>- <code>#42 Trapping Rain Water</code> — 最复杂变体，维护左右最大高度</p>

<hr/>

<h2>🗣️ 软技能练习 / Soft Skills: What to Practice</h2>

<p>本周覆盖了 5 个核心行为面试主题（Day 8–12），全部 Senior/Staff 级别：</p>

<p><em>5 behavioral interview topics covered, all targeting Senior/Staff level:</em></p>

<p>| 主题 | STAR 核心动作 | 需要强化的点 |</p>
<p>|---|---|---|</p>
<p>| <strong>模糊需求</strong> | 写假设文档、区分可逆/不可逆决策 | 结果要量化（&quot;避免了 1.5 周返工&quot;） |</p>
<p>| <strong>向上推回</strong> | 先跑负载测试量化风险，再提替代方案 | 带数据 + 替代方案是关键；不要空手说&quot;不行&quot; |</p>
<p>| <strong>主动发现问题</strong> | 复现 bug → 量化风险 → 独立推进修复 | 系统性主动（建立监控体系）比偶发性主动更有说服力 |</p>
<p>| <strong>优先级排序</strong> | Impact × Urgency 矩阵；把技术连接到业务目标 | 明确说出&quot;选 A 意味着 B 推迟到 X，风险是 Y&quot; |</p>
<p>| <strong>处理交付坏消息</strong> | 提前同步，给选项，聚焦解决方案 | 不要等到最后一刻才上报风险 |</p>

<p><strong>综合练习建议 / Practice focus:</strong></p>
<p>用 STAR 框架为每个主题准备 1-2 个真实故事，重点量化结果。&quot;主动发现问题&quot;和&quot;优先级排序&quot;是最容易被考到但准备不足的两个主题。</p>

<hr/>

<h2>🎨 前端巩固 / Frontend: Concepts to Lock In</h2>

<p>本周 React Hooks 三件套全部覆盖：</p>

<p><em>All three core React Hooks covered this week:</em></p>

<p><strong>useState → useEffect → useRef — 递进关系：</strong></p>

<pre><code>
useState: 需要 UI 更新时存状态（触发重渲染）
useEffect: 需要同步副作用（数据获取、订阅、DOM 操作）
useRef: 需要持久值但不想触发重渲染（DOM 引用、timer ID）
</code></pre>

<p><strong>高频考点 / High-frequency interview traps:</strong></p>

<p>1. <code>useState</code> 批量更新：<code>setCount(count+1)</code> 三次 → count 只 +1；用 <code>prev =&gt; prev + 1</code> 才能累加 ✅</p>
<p>2. <code>useEffect</code> 清理：return 里关闭 WebSocket/清除 timer，否则内存泄漏 ✅</p>
<p>3. CSS <code>transition</code> vs <code>animation</code>：transition 需要触发（hover/JS），animation 自动运行 ✅</p>
<p>4. 动画性能：优先 <code>transform</code> + <code>opacity</code>（GPU），避免 <code>width/height/margin</code>（触发 reflow）✅</p>

<p><strong>自查问题 / Quick self-check:</strong></p>
<p>- <code>useEffect</code> 的依赖数组为空 <code>[]</code> 和不传有什么区别？</p>
<p>- <code>useRef</code> 的值改变了，组件会重渲染吗？</p>
<p>- CSS <code>display: none</code> 可以加 transition 吗？</p>

<hr/>

<h2>🤖 AI 知识点 / AI: What Stuck</h2>

<p>本周 AI 内容从&quot;行业动态&quot;到&quot;技术机制&quot;都有覆盖：</p>

<p><em>AI content ranged from industry trends to technical mechanisms:</em></p>

<p><strong>最重要的技术概念 / Most important technical concept:</strong></p>

<p><strong>RLHF — ChatGPT 是怎么学会&quot;有用&quot;的：</strong></p>
<p>1. <strong>SFT（监督微调）</strong>：在人类示范答案上微调基础模型，学格式和风格</p>
<p>2. <strong>奖励模型（RM）训练</strong>：收集人类对多个回答的排序偏好，训练打分器</p>
<p>3. <strong>PPO 强化学习</strong>：模型生成 → RM 打分 → 更新策略（同时用 KL 散度约束偏移）</p>

<p><strong>正在替代 PPO 的新方法</strong>：DPO（Direct Preference Optimization）— 更简单，无需显式奖励模型。</p>
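<p>第 2 步的奖励模型通常用成对偏好损失（Bradley–Terry 形式）训练：让 RM 给人类偏好的回答打更高的分。下面是这个损失的最小示意（标量版本，真实实现作用在批量 logits 上）/ A minimal scalar sketch of the pairwise preference loss used to train the reward model in Step 2 (real implementations operate on batched logits):</p>

```python
import math

def rm_pairwise_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise preference loss for reward-model training:
    minimize -log sigmoid(r_chosen - r_rejected), pushing the RM to score
    the human-preferred answer higher than the rejected one."""
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

<p>两个回答得分相同时损失为 ln 2 ≈ 0.693；偏好回答的分数领先越多，损失越小。DPO 的核心就是把这类偏好损失直接作用在策略模型上，省掉显式 RM。/ Loss is ln 2 at equal scores and shrinks as the margin grows; DPO applies a preference loss of this shape directly to the policy, removing the explicit RM.</p>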

<p><strong>本周行业信号 / Industry signals:</strong></p>
<p>- Agentic AI 从&quot;生成文本&quot;→&quot;自主完成任务&quot;，GPT-5.4 已实现原生电脑操作（GUI）</p>
<p>- 76% 企业未准备好支持 AI Agent → 未来 1-3 年最值钱的技能：<strong>设计与 AI Agent 协作的系统架构</strong>（清晰 API、幂等操作、可审计工作流）</p>
<p>- 上下文窗口突破 100 万 token，某些场景 long-context 比构建向量数据库更简单准确</p>

<hr/>

<h2>⚠️ 需要复习的内容 / What to Review</h2>

<p><strong>优先级排序（从弱到强）：</strong></p>

<p>| 🔴 最需要复习 | 具体建议 |</p>
<p>|---|---|</p>
<p>| <strong>CAP 定理 + PACELC</strong> | 能否在 3 句话内解释 CP vs AP 的区别，并各举一个真实系统？DynamoDB 如何在同一系统里提供可调一致性？ |</p>
<p>| <strong>一致性哈希虚拟节点</strong> | 为什么虚拟节点数少了会有热点？能在白板上画出完整的哈希环 + 节点故障时的数据迁移吗？ |</p>
<p>| <strong>3Sum 去重逻辑</strong> | 能手写完整解法吗？排序后如何跳过重复的 <code>nums[i]</code>、<code>nums[left]</code>、<code>nums[right]</code>？ |</p>

<p>| 🟡 值得巩固 | 具体建议 |</p>
<p>|---|---|</p>
<p>| <strong>消息队列：Kafka vs RabbitMQ</strong> | 什么时候选 Kafka（高吞吐、回放），什么时候选 RabbitMQ（复杂路由、短生命周期消息）？ |</p>
<p>| <strong>useEffect 依赖数组</strong> | 写 3 个不同 useEffect 示例，分别对应&quot;mount once&quot;、&quot;每次 render&quot;、&quot;deps 变化时&quot; |</p>
<p>| <strong>STAR 故事量化</strong> | 为&quot;主动发现问题&quot;和&quot;优先级排序&quot;各准备一个有具体数字的故事 |</p>

<hr/>

<h2>🏆 本周亮点 / Win of the Week</h2>

<p><strong>成功跨越 Foundation → Growth 阶段！</strong></p>

<p>第 11 天开始 Growth Phase，内容难度明显上升（一致性哈希、CAP、3Sum、RLHF），但节奏保持稳定。最值得庆祝的是：Saturday Deep Dive（消息队列 &amp; 事件驱动架构）内容深度和代码质量都达到了 senior 面试的实战水准——从 Kafka producer/consumer 配置，到幂等性设计，到死信队列，一篇顶得上市面上很多付费课程的一章。</p>

<p>🔥 13 天，从未间断。系统设计的脉络正在形成：网络 → HTTP → 负载均衡 → 缓存 → 数据库（索引→复制→分片→一致性哈希→CAP）→ 消息队列。这不是孤立的知识点，这是在构建一张完整的系统设计地图。</p>

<p><em>Bridged Foundation → Growth phase without breaking stride. The Saturday Deep Dive on message queues hit senior-interview depth. More importantly, a coherent system design map is forming — 13 days, zero breaks.</em></p>

<hr/>

<h2>🎯 下周预告 / Next Week Preview</h2>

<p>基于当前进度（SysDesign: 12/40 · Algorithms: 13/150 · Behavioral: 12/40 · Frontend: 12/50 · AI: 5/30）：</p>

<p>| 模块 | 下周内容 |</p>
<p>|---|---|</p>
<p>| 🏗️ <strong>系统设计</strong> | #13 CDN &amp; 边缘计算 · #14 速率限制（Rate Limiting）· #15 SQL vs NoSQL 深度对比 |</p>
<p>| 💻 <strong>算法</strong> | 双指针收官：#11 Container With Most Water · #42 Trapping Rain Water · 开始滑动窗口模式 |</p>
<p>| 🗣️ <strong>软技能</strong> | 持续 Growth Phase：从影响力、跨团队协作到最终 behavioral |</p>
<p>| 🎨 <strong>前端</strong> | React Context · 自定义 Hook · 性能优化（useMemo/useCallback 实战） |</p>
<p>| 🤖 <strong>AI</strong> | AI News Roundup · 更多 AI 概念深挖 |</p>

<p><strong>本周学到的，下周就会用到</strong>：CAP 定理 → 在设计 CDN/速率限制时你需要选择 CP 还是 AP；消息队列 → 在任何通知系统设计中都会出现；双指针 → Container With Most Water 是下周双指针收官题，比 3Sum 还更直接地考查&quot;移动哪个指针&quot;的决策逻辑。</p>

<p>加油，下周见！💪</p>
]]></description>
    </item>
    <item>
      <title>byte-by-byte — 2026-03-28</title>
      <link>https://github.com/YushengAuggie/byte-by-byte/tree/main/archive</link>
      <guid isPermaLink="false">https://github.com/YushengAuggie/byte-by-byte/archive/2026-03-28</guid>
      <pubDate>Sat, 28 Mar 2026 12:00:00 +0000</pubDate>
      <description><![CDATA[<h1>🏗️ System Design</h1>
<h2>Saturday Deep Dive</h2>

<p>Today is Saturday — content delivered in the deep dive issue.</p>

<p>xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx</p>

<hr/>
<h1>💻 Algorithms</h1>
<h2>Saturday Deep Dive</h2>

<p>Today is Saturday — content delivered in the deep dive issue.</p>

<p>xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx</p>

<hr/>
<h1>🗣️ Soft Skills</h1>
<h2>Saturday Deep Dive</h2>

<p>Today is Saturday — content delivered in the deep dive issue.</p>

<p>xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx</p>

<hr/>
<h1>🎨 Frontend</h1>
<h2>Saturday Deep Dive</h2>

<p>Today is Saturday — content delivered in the deep dive issue.</p>

<p>xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx</p>

<hr/>
<h1>🤖 AI</h1>
<h2>Saturday Deep Dive</h2>

<p>Today is Saturday — content delivered in the deep dive issue.</p>

<p>xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx</p>

<hr/>
<h1>Deepdive</h1>
<h2>🔬 Saturday Deep Dive: Message Queues &amp; Event-Driven Architecture (15 min read)</h2>

<p>📊 Day 13/150 · NeetCode: 12/150 · SysDesign: 12/40 · Behavioral: 12/40 · Frontend: 12/50 · AI: 5/30</p>
<p>🔥 Keep the streak alive!</p>

<hr/>

<h2>Overview / 概述</h2>

<p><strong>中文：</strong></p>
<p>消息队列是现代分布式系统的血管。当两个服务需要通信，但你不希望它们紧密耦合在一起时，消息队列就登场了。从 WhatsApp 的消息投递，到 Uber 的行程分配，再到你下单后收到的确认邮件——几乎每个大型系统背后都有消息队列在默默支撑。</p>

<p><strong>English:</strong></p>
<p>Message queues are the circulatory system of modern distributed architecture. They allow services to communicate <strong>asynchronously</strong> — decoupling producers from consumers so each can scale, fail, and recover independently. If you&#x27;ve ever placed an Amazon order and received a confirmation email seconds later without the checkout page hanging, you&#x27;ve experienced event-driven architecture in action.</p>

<p><strong>Why it matters in interviews:</strong> This topic shows up in most senior-level system design interviews. &quot;Design Twitter&quot;, &quot;Design Uber&quot;, &quot;Design a notification system&quot; — all roads lead to message queues.</p>

<hr/>

<h2>Part 1: Theory / 理论基础 (5 min)</h2>

<h3>核心问题：为什么需要消息队列？/ The Core Problem</h3>

<p><strong>中文：</strong></p>
<p>想象一个在线零售系统。用户下单时，系统需要：</p>
<p>1. 扣减库存</p>
<p>2. 向支付服务收款</p>
<p>3. 通知仓库备货</p>
<p>4. 发送确认邮件</p>
<p>5. 更新用户积分</p>

<p>如果全部同步完成，任何一步失败都会导致整个请求失败。支付服务宕机了？用户收到 500 错误。邮件服务慢了？用户等 10 秒。这就是<strong>同步耦合</strong>的代价。</p>

<p><strong>English:</strong></p>
<p>In a synchronous world, the order service would call inventory → payment → warehouse → email → loyalty service, all in sequence. This creates:</p>
<p>- <strong>Temporal coupling:</strong> All services must be up at the same time</p>
<p>- <strong>Performance coupling:</strong> The slowest service determines the total latency</p>
<p>- <strong>Failure coupling:</strong> One failure cascades through the whole chain</p>

<p>Message queues break all three couplings.</p>

<hr/>

<h3>核心概念 / Core Concepts</h3>

<p><strong>中文 → English glossary:</strong></p>

<p>| 概念 | 英文 | 说明 |</p>
<p>|------|------|------|</p>
<p>| 生产者 | Producer | 发消息的服务 |</p>
<p>| 消费者 | Consumer | 收消息的服务 |</p>
<p>| 消息 | Message / Event | 传递的数据单元 |</p>
<p>| 队列 | Queue | 点对点：一条消息只被一个消费者消费 |</p>
<p>| 主题 | Topic | 发布/订阅：一条消息被多个消费者消费 |</p>
<p>| 消费者组 | Consumer Group | 多个消费者实例共享消费同一个 topic |</p>
<p>| 偏移量 | Offset (Kafka) | 消息在分区中的位置，消费者自己维护 |</p>
<p>| 确认 | Acknowledgment (ACK) | 消费者告诉队列&quot;我已成功处理&quot; |</p>
<p>| 死信队列 | Dead Letter Queue (DLQ) | 处理失败的消息的&quot;最终归宿&quot; |</p>

<hr/>

<h3>两种核心模型 / Two Models</h3>

<p><strong>Model 1: Point-to-Point (Queue)</strong></p>

<pre><code>
Producer → [Queue] → Consumer A  (Consumer B never sees this message)
</code></pre>

<p>- 每条消息只被消费一次</p>
<p>- 适合：任务分发、工作队列</p>
<p>- 代表：AWS SQS, RabbitMQ (default)</p>

<p><strong>Model 2: Publish-Subscribe (Topic)</strong></p>

<pre><code>
Producer → [Topic] → Consumer A
                   → Consumer B  
                   → Consumer C  (all three get the same message)
</code></pre>

<p>- 每条消息被所有订阅者消费</p>
<p>- 适合：事件通知、数据管道</p>
<p>- 代表：Apache Kafka, AWS SNS, Google Pub/Sub</p>

<hr/>

<h3>主流技术对比 / Technology Comparison</h3>

<p><strong>中文：</strong> 面试中最常被问到的是 Kafka vs RabbitMQ。记住核心区别：</p>

<p>| 特性 | Apache Kafka | RabbitMQ / AWS SQS |</p>
<p>|------|-------------|-------------------|</p>
<p>| 模型 | Log-based (追加写日志) | Queue-based (消费后删除) |</p>
<p>| 消息保留 | 时间/大小限制（可重放） | 消费后删除（默认） |</p>
<p>| 吞吐量 | 极高 (百万级/秒) | 高 (万-十万级/秒) |</p>
<p>| 顺序保证 | 分区内有序 | 队列内有序 |</p>
<p>| 适合场景 | 流处理、日志、事件溯源 | 任务队列、RPC、复杂路由 |</p>
<p>| 消费模型 | Pull (消费者拉取) | Push (队列推送) |</p>

<p><strong>English:</strong></p>
<p>The key insight: <strong>Kafka is a distributed log, not a traditional queue.</strong> Messages stay on disk until a retention policy removes them. Consumers maintain their own offsets, enabling:</p>
<p>- <strong>Replay:</strong> Reprocess all events from the beginning</p>
<p>- <strong>Multiple independent consumers:</strong> Each consumer group gets full history</p>
<p>- <strong>Event sourcing:</strong> The log IS the source of truth</p>
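<p>可以用一个十几行的玩具模型体会"日志 vs 队列"的区别（仅为示意，不是 Kafka 的真实 API）：消息追加后一直保留，每个消费者组各自维护 offset，所以可以独立消费、也可以回放 / A toy model (illustrative only, not Kafka's real API) of log-based retention, independent consumer-group offsets, and replay:</p>

```python
class MiniLog:
    """Toy model of a Kafka-style log: messages are appended and retained;
    each consumer group owns its own read offset."""

    def __init__(self):
        self.log = []
        self.offsets = {}          # group_id -> next offset to read

    def append(self, msg):
        self.log.append(msg)       # append-only; nothing is deleted on consume

    def poll(self, group):
        start = self.offsets.get(group, 0)
        batch = self.log[start:]
        self.offsets[group] = len(self.log)
        return batch

    def replay(self, group):
        self.offsets[group] = 0    # rewind: the history is still on disk
```

<p>传统队列消费即删除，做不到第二个组拿到全量历史，更做不到回放——这正是 Kafka 作为分布式日志的结构性优势。/ A delete-on-consume queue cannot give a second group full history, let alone replay it.</p>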

<hr/>

<h2>Part 2: Step-by-Step Implementation / 一步一步实现 (8 min)</h2>

<h3>场景：设计一个通知系统 / Scenario: Notification System Design</h3>

<p><strong>中文：</strong> 我们设计一个类似于电商平台的通知系统，支持邮件、短信、推送通知。</p>

<p><strong>English:</strong> Design a notification system that can send email, SMS, and push notifications after key events (order placed, payment failed, shipment update).</p>

<hr/>

<h3>Architecture Diagram</h3>

<pre><code>
┌─────────────┐     ┌──────────────────────────────────┐
│  Order Svc  │────▶│         Kafka Topic              │
│  Payment Svc│────▶│    &quot;notification-events&quot;          │
│  Shipping   │────▶│                                  │
└─────────────┘     │  Partition 0: user_id % 3 == 0   │
                    │  Partition 1: user_id % 3 == 1   │
                    │  Partition 2: user_id % 3 == 2   │
                    └──────────┬───────────────────────┘
                               │
                    ┌──────────▼───────────────────────┐
                    │     Notification Consumer Group   │
                    │                                  │
                    │  Consumer 0 ──▶ Email Worker      │
                    │  Consumer 1 ──▶ SMS Worker        │
                    │  Consumer 2 ──▶ Push Worker       │
                    └──────────────────────────────────┘
                                        │
                               Failed messages
                                        │
                               ┌────────▼────────┐
                               │  Dead Letter     │
                               │  Queue (DLQ)     │
                               └─────────────────┘
</code></pre>

<hr/>

<h3>Step 1: Define the Event Schema</h3>

<pre><code>
# events.py — Define event contracts clearly
from dataclasses import dataclass
from typing import Optional
import json
import time

@dataclass
class NotificationEvent:
    event_id: str          # Unique ID for deduplication
    event_type: str        # &quot;order.placed&quot;, &quot;payment.failed&quot;, &quot;shipment.updated&quot;
    user_id: str           # Kafka partition key — ensures order for same user
    timestamp: float       # Unix timestamp
    payload: dict          # Event-specific data
    metadata: Optional[dict] = None  # Tracing info, retry count, etc.
    
    def to_json(self) -&gt; bytes:
        return json.dumps({
            &quot;event_id&quot;: self.event_id,
            &quot;event_type&quot;: self.event_type,
            &quot;user_id&quot;: self.user_id,
            &quot;timestamp&quot;: self.timestamp,
            &quot;payload&quot;: self.payload,
            &quot;metadata&quot;: self.metadata or {}
        }).encode(&quot;utf-8&quot;)
    
    @classmethod
    def from_json(cls, data: bytes) -&gt; &quot;NotificationEvent&quot;:
        d = json.loads(data)
        return cls(**d)
</code></pre>

<hr/>

<h3>Step 2: Producer — Publishing Events</h3>

<pre><code>
# producer.py
from kafka import KafkaProducer
from kafka.errors import KafkaError
import logging

from events import NotificationEvent  # event schema defined in Step 1

logger = logging.getLogger(__name__)

class NotificationProducer:
    def __init__(self, bootstrap_servers: list[str]):
        self.producer = KafkaProducer(
            bootstrap_servers=bootstrap_servers,
            # Durability: wait for all in-sync replicas to acknowledge
            acks=&quot;all&quot;,
            # Enable idempotent producer — prevents duplicate messages on retry
            enable_idempotence=True,
            # Retry up to 5 times on transient failures
            retries=5,
            # Batch messages for throughput (linger up to 10ms)
            linger_ms=10,
            batch_size=16384,  # 16KB batches
        )
        self.topic = &quot;notification-events&quot;
    
    def publish(self, event: NotificationEvent) -&gt; bool:
        &quot;&quot;&quot;Publish event; use user_id as partition key for ordering.&quot;&quot;&quot;
        try:
            future = self.producer.send(
                self.topic,
                key=event.user_id.encode(&quot;utf-8&quot;),  # Same user → same partition
                value=event.to_json(),
            )
            # Block until broker confirms receipt (with timeout)
            record_metadata = future.get(timeout=10)
            logger.info(
                f&quot;Published {event.event_type} to partition &quot;
                f&quot;{record_metadata.partition} offset {record_metadata.offset}&quot;
            )
            return True
        except KafkaError as e:
            logger.error(f&quot;Failed to publish event {event.event_id}: {e}&quot;)
            # In production: send to a fallback DB for retry
            return False
    
    def close(self):
        # Flush remaining buffered messages before shutdown
        self.producer.flush()
        self.producer.close()
</code></pre>

<hr/>

<h3>Step 3: Consumer — Processing with At-Least-Once Semantics</h3>

<pre><code>
# consumer.py
from kafka import KafkaConsumer
from kafka.structs import OffsetAndMetadata, TopicPartition
import json, logging, time

from events import NotificationEvent  # event schema defined in Step 1

logger = logging.getLogger(__name__)

class NotificationConsumer:
    def __init__(self, bootstrap_servers: list[str], group_id: str):
        self.consumer = KafkaConsumer(
            &quot;notification-events&quot;,
            bootstrap_servers=bootstrap_servers,
            group_id=group_id,
            # Disable auto-commit — we commit AFTER successful processing
            # This guarantees at-least-once delivery
            enable_auto_commit=False,
            # If no committed offset, start from the earliest message
            auto_offset_reset=&quot;earliest&quot;,
            # Deserialize from JSON bytes
            value_deserializer=lambda b: json.loads(b.decode(&quot;utf-8&quot;)),
            key_deserializer=lambda b: b.decode(&quot;utf-8&quot;) if b else None,
        )
        self.dlq_producer = DLQProducer()  # Dead letter queue producer (implementation omitted)
    
    def process(self):
        &quot;&quot;&quot;Main consume loop with manual offset commit.&quot;&quot;&quot;
        for message in self.consumer:
            event_data = message.value
            event = NotificationEvent(**event_data)
            
            success = self._handle_with_retry(event, max_retries=3)
            
            if success:
                # Only commit offset after successful processing
                # This is the key to at-least-once semantics
                tp = TopicPartition(message.topic, message.partition)
                self.consumer.commit({tp: OffsetAndMetadata(message.offset + 1, None)})
            else:
                # Send to Dead Letter Queue for manual inspection/replay
                self.dlq_producer.send(event)
                # Still commit — we don&#x27;t want to block the partition forever
                tp = TopicPartition(message.topic, message.partition)
                self.consumer.commit({tp: OffsetAndMetadata(message.offset + 1, None)})
    
    def _handle_with_retry(self, event: NotificationEvent, max_retries: int) -&gt; bool:
        &quot;&quot;&quot;Route event to appropriate handler with exponential backoff retry.&quot;&quot;&quot;
        handler_map = {
            &quot;order.placed&quot;: self._send_order_confirmation,
            &quot;payment.failed&quot;: self._send_payment_alert,
            &quot;shipment.updated&quot;: self._send_shipment_update,
        }
        
        handler = handler_map.get(event.event_type)
        if not handler:
            logger.warning(f&quot;No handler for event type: {event.event_type}&quot;)
            return True  # Don&#x27;t retry unknown events
        
        for attempt in range(max_retries):
            try:
                handler(event)
                return True
            except Exception as e:
                wait = (2 ** attempt)  # Exponential backoff: 1s, 2s, 4s
                logger.warning(f&quot;Attempt {attempt+1} failed for {event.event_id}: {e}. Retrying in {wait}s&quot;)
                time.sleep(wait)
        
        return False  # All retries exhausted
    
    def _send_order_confirmation(self, event: NotificationEvent):
        # Call email service, SMS gateway, push notification service
        user_id = event.user_id
        order_id = event.payload[&quot;order_id&quot;]
        # ... actual notification logic
        logger.info(f&quot;Sent order confirmation to user {user_id} for order {order_id}&quot;)
</code></pre>

<hr/>

<h3>Step 4: Idempotency — The Unsung Hero</h3>

<p><strong>中文：</strong> 消息队列保证 <strong>at-least-once</strong> 投递，这意味着同一条消息可能被处理两次。如果你的系统没有幂等性保证，用户可能收到两封确认邮件，或被扣款两次。</p>

<p><strong>English:</strong> At-least-once delivery means the same message can arrive twice (e.g., consumer crashed after processing but before committing offset). Always design consumers to be <strong>idempotent</strong>.</p>

<pre><code>
# idempotency.py — Using Redis to track processed events
import logging
import redis

logger = logging.getLogger(__name__)

class IdempotentNotificationHandler:
    def __init__(self):
        self.redis = redis.Redis(host=&quot;localhost&quot;, port=6379, db=0)
        self.TTL = 86400  # 24 hours — enough to catch duplicates
    
    def process_event(self, event: NotificationEvent) -&gt; bool:
        idempotency_key = f&quot;processed:{event.event_id}&quot;
        
        # SET NX = &quot;set if not exists&quot; — atomic check-and-set
        was_new = self.redis.set(
            idempotency_key,
            &quot;1&quot;,
            ex=self.TTL,
            nx=True  # Only set if key doesn&#x27;t exist
        )
        
        if not was_new:
            # This event was already processed — skip it
            logger.info(f&quot;Duplicate event {event.event_id} — skipping&quot;)
            return True
        
        # First time seeing this event — process it
        return self._do_send_notification(event)
</code></pre>

<hr/>

<h2>Part 3: Edge Cases &amp; Gotchas / 边界情况 (2 min)</h2>

<p><strong>中文 + English:</strong></p>

<p><strong>1. 消费者重平衡 / Consumer Rebalancing</strong></p>
<p>当消费者加入或离开消费者组时，Kafka 会触发重平衡，暂停所有消费。设计时要确保处理过程的原子性。</p>
<p><em>When consumers join/leave a group, Kafka triggers rebalancing — all consumption pauses. Use cooperative rebalancing (<code>partition.assignment.strategy=CooperativeStickyAssignor</code>) to minimize disruption.</em></p>

<p><strong>2. 消息顺序保证 / Ordering Guarantees</strong></p>
<p>Kafka 只在<strong>分区内</strong>保证顺序。跨分区无顺序。设计时用有意义的 key（如 user_id）分区，确保同一用户的事件落在同一分区。</p>
<p><em>Kafka only guarantees order <strong>within a partition</strong>. Use a meaningful partition key (user_id, order_id) so related events land on the same partition.</em></p>

<p><strong>3. 消费者落后 / Consumer Lag</strong></p>
<p>如果消费者处理速度跟不上生产速度，lag 会越来越大。监控 consumer lag 是运维必备。</p>
<p><em>Monitor <code>consumer_lag</code> metric. If lag grows unbounded, add more consumer instances (up to the number of partitions) or optimize the handler.</em></p>

<p><strong>4. 大消息问题 / Large Message Problem</strong></p>
<p>Kafka 默认最大消息 1MB。发大文件？把内容存 S3，消息里只传引用。</p>
<p><em>Default max message size is 1MB. For large payloads: store in S3/GCS, put the reference URL in the message. Never put binary blobs in queues.</em></p>
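<p>这个"存 S3、传引用"的做法叫 claim-check pattern。下面用一个 dict 代替对象存储做最小示意（函数名与阈值均为示例设定）/ This "store in S3, pass a reference" idea is the claim-check pattern; a minimal sketch with a dict standing in for object storage (function name and threshold are illustrative):</p>

```python
import json
import uuid

MAX_INLINE_BYTES = 1_000_000   # roughly Kafka's default message size cap

def make_message(payload: bytes, blob_store: dict) -> bytes:
    """Claim-check pattern sketch: small payloads ride inside the message;
    large ones go to object storage (a dict stands in for S3 here) and
    only the reference key travels through the queue."""
    if len(payload) <= MAX_INLINE_BYTES:
        return json.dumps({"inline": payload.decode("utf-8")}).encode()
    key = f"events/{uuid.uuid4()}"
    blob_store[key] = payload
    return json.dumps({"blob_ref": key}).encode()
```

<p>消费者拿到 <code>blob_ref</code> 后再去对象存储取内容；队列里永远只有轻量的 JSON。/ The consumer fetches the blob by reference; the queue only ever carries lightweight JSON.</p>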

<p><strong>5. 时钟偏移 / Clock Skew</strong></p>
<p>不要用消息里的 timestamp 做业务逻辑。生产者时钟可能偏移。</p>
<p><em>Don&#x27;t use event timestamps for business logic ordering — clocks drift. Use Kafka&#x27;s broker-assigned offset for true ordering.</em></p>

<hr/>

<h2>Part 4: Real-World Application / 实际应用 (2 min)</h2>

<p><strong>中文：</strong> 真实系统里的消息队列：</p>

<p><strong>English:</strong> How this looks in production systems:</p>

<p><strong>LinkedIn (Kafka 的诞生地 / Kafka&#x27;s birthplace)</strong></p>
<p>- LinkedIn 开源了 Kafka，最初用于处理 activity stream（用户点击、浏览等）</p>
<p>- 现在每天处理超过 <strong>7 万亿</strong>条消息</p>
<p>- 参考: https://engineering.linkedin.com/kafka/kafka-linkedin-current-and-future</p>

<p><strong>Uber — 行程派单 / Ride Dispatch</strong></p>
<p>- 司机位置每秒上报 → Kafka topic → 派单引擎消费</p>
<p>- 消息量：峰值每秒百万级</p>
<p>- 参考: https://www.uber.com/blog/reliable-reprocessing/</p>

<p><strong>Stripe — 支付事件 / Payment Events</strong></p>
<p>- 每笔支付产生 N 个事件（授权、捕获、退款等）</p>
<p>- 各个下游服务（风控、财务、通知）独立消费</p>
<p>- 保证每个事件至少被每个服务消费一次</p>
<p>- 参考: https://stripe.com/blog/message-queues</p>

<p><strong>Netflix — 实时数据管道 / Real-Time Pipeline</strong></p>
<p>- 视频观看数据 → Kafka → 推荐系统、A/B 测试分析、计费</p>
<p>- 参考: https://netflixtechblog.com/keystone-real-time-stream-processing-platform-a3ee651812a</p>

<p><strong>面试提示 / Interview Pattern:</strong></p>
<p>&gt; 每当你的设计里有一个服务需要触发多个下游服务，或者需要异步处理、削峰填谷、解耦，就应该加消息队列。</p>
<p>&gt;</p>
<p>&gt; <em>Whenever your design has one service that needs to trigger multiple downstream services, or you need async processing, buffering, or decoupling — reach for a message queue.</em></p>

<hr/>

<h2>Part 5: Interview Simulation / 面试模拟 (3 min)</h2>

<p><strong>中文：</strong> 以下是面试中最常见的追问，以及简洁回答思路。</p>

<p><strong>English:</strong> The 5 most common follow-up questions:</p>

<hr/>

<p><strong>Q1: 如何保证消息不丢失？/ How do you guarantee no message loss?</strong></p>

<p>&gt; <strong>Producer side:</strong> Set <code>acks=all</code> (wait for all in-sync replicas) + <code>enable_idempotence=True</code>.</p>
<p>&gt; <strong>Broker side:</strong> Set <code>min.insync.replicas=2</code> to require at least 2 replicas to acknowledge.</p>
<p>&gt; <strong>Consumer side:</strong> Disable auto-commit; manually commit only after successful processing.</p>
<p>&gt; <strong>Result:</strong> At-least-once delivery. Accept that duplicates can happen, design consumers to be idempotent.</p>

<hr/>

<p><strong>Q2: Kafka vs SQS，你会怎么选？/ Kafka vs SQS, how do you choose?</strong></p>

<p>&gt; - <strong>Choose Kafka</strong> when: you need message replay, multiple independent consumers, very high throughput (&gt;100K/s), stream processing, or event sourcing.</p>
<p>&gt; - <strong>Choose SQS</strong> when: you want fully managed simplicity, visibility timeout semantics, built-in DLQ, and don&#x27;t need replay or complex routing. Great for task queues.</p>
<p>&gt; - <strong>Rule of thumb:</strong> SQS for task queues, Kafka for data pipelines and event streams.</p>

<hr/>

<p><strong>Q3: 如何处理消费者崩溃？/ What happens when a consumer crashes?</strong></p>

<p>&gt; With manual offset commit disabled (<code>enable_auto_commit=False</code>):</p>
<p>&gt; 1. Consumer crashes after processing but before committing → same message redelivered to next consumer in group</p>
<p>&gt; 2. Consumer crashes mid-batch → entire batch redelivered</p>
<p>&gt; This is why idempotency is non-negotiable. Use Redis or DB to track <code>processed_event_ids</code>.</p>

<hr/>

<p><strong>Q4: 消息队列如何帮助削峰填谷？/ How does a queue help with traffic spikes?</strong></p>

<p>&gt; Queue acts as a <strong>buffer</strong>. During Black Friday, order service produces 100K orders/minute. Payment service can only process 10K/minute. Without a queue, payment service crashes under load. With a queue, orders buffer up; payment service consumes at its own pace. Users see &quot;order received&quot; immediately, payment confirms asynchronously. The queue absorbs the spike.</p>

<hr/>

<p><strong>Q5: 如果消费者处理一直失败怎么办？/ What if a message keeps failing?</strong></p>

<p>&gt; Dead Letter Queue (DLQ) pattern:</p>
<p>&gt; 1. Set <code>max_delivery_attempts = 3</code> (or retry in consumer code)</p>
<p>&gt; 2. After N failures, move message to DLQ topic</p>
<p>&gt; 3. DLQ messages trigger alerts to on-call engineer</p>
<p>&gt; 4. Engineer investigates, fixes the bug, then <strong>replays</strong> DLQ messages back to the original topic</p>
<p>&gt; Never silently drop messages. Always have a DLQ.</p>

<hr/>

<p><em>下周六继续深挖！Next Saturday: we&#x27;ll go deeper on Kafka internals — partitions, replication, and the ISR mechanism. 🚀</em></p>

<hr/>

<p><strong>References / 参考资料:</strong></p>
<p>- 📖 Confluent Kafka documentation: https://docs.confluent.io/platform/current/kafka/introduction.html</p>
<p>- 📖 Designing Data-Intensive Applications (Chapter 11 — Stream Processing) by Martin Kleppmann</p>
<p>- 🎥 Hussein Nasser — Kafka Deep Dive: https://www.youtube.com/watch?v=R873BlNVUqQ</p>
<p>- 📖 AWS SQS vs Kafka comparison: https://aws.amazon.com/compare/the-difference-between-sqs-and-kafka/</p>
<p>- 📖 Uber reliable reprocessing: https://www.uber.com/blog/reliable-reprocessing/</p>
]]></description>
    </item>
    <item>
      <title>byte-by-byte — 2026-03-27</title>
      <link>https://github.com/YushengAuggie/byte-by-byte/tree/main/archive</link>
      <guid isPermaLink="false">https://github.com/YushengAuggie/byte-by-byte/archive/2026-03-27</guid>
      <pubDate>Fri, 27 Mar 2026 12:00:00 +0000</pubDate>
      <description><![CDATA[<h1>🏗️ System Design</h1>
<h2>🏗️ 系统设计 Day 11 / System Design Day 11</h2>
<h2>CAP 定理与最终一致性 / CAP Theorem &amp; Eventual Consistency</h2>

<p>&gt; <strong>难度 / Difficulty:</strong> Intermediate | <strong>阶段 / Phase:</strong> Growth | <strong>预计阅读时间 / Read time:</strong> 3 min</p>

<hr/>

<h2>🌍 真实场景 / Real-World Scenario</h2>

<p>想象你在设计 Twitter（现 X）的点赞系统。用户遍布全球，分布在北美、欧洲、亚洲的数据中心。当网络故障发生时，你必须做一个选择：</p>

<p><strong>要么</strong>继续接受点赞写入（可能导致各地数据不一致）；</p>
<p><strong>要么</strong>拒绝所有写入（保证数据一致，但系统不可用）。</p>

<p>这就是 CAP 定理的核心困境。</p>

<p>Imagine you&#x27;re designing Twitter&#x27;s like system, with users spread across data centers in North America, Europe, and Asia. When a network partition occurs, you face a hard choice:</p>

<p><strong>Either</strong> keep accepting like writes (risking inconsistent counts across regions),</p>
<p><strong>or</strong> reject all writes (keeping data consistent, but making the system unavailable).</p>

<p>This is the core dilemma of the CAP theorem.</p>

<hr/>

<h2>📐 CAP 定理解释 / CAP Theorem Explained</h2>

<p>CAP 定理由 Eric Brewer 在 2000 年提出，它说：<strong>分布式系统最多只能同时满足以下三项中的两项：</strong></p>

<p>CAP theorem (Brewer, 2000) states: <strong>a distributed system can guarantee at most 2 of these 3 properties simultaneously:</strong></p>

<p>| 属性 | 英文 | 解释 |</p>
<p>|------|------|------|</p>
<p>| <strong>C</strong> | Consistency | 所有节点在同一时刻看到相同数据 / All nodes see same data at same time |</p>
<p>| <strong>A</strong> | Availability | 每个请求都收到响应（非错误）/ Every request gets a response (non-error) |</p>
<p>| <strong>P</strong> | Partition Tolerance | 网络分区时系统仍继续运行 / System works despite network partitions |</p>

<p>&gt; ⚠️ <strong>关键洞察：</strong> 在真实分布式系统中，网络分区（P）是不可避免的。所以你实际上是在 <strong>CA（一致性 vs 可用性）</strong> 之间做取舍。</p>

<hr/>

<h2>🏛️ ASCII 架构图 / Architecture Diagram</h2>

<pre><code>
正常状态 / Normal State:
┌──────────┐         ┌──────────┐         ┌──────────┐
│  Node A  │◄───────►│  Node B  │◄───────►│  Node C  │
│ likes=42 │         │ likes=42 │         │ likes=42 │
└──────────┘         └──────────┘         └──────────┘
      ▲
  user writes

网络分区 / Network Partition:
┌──────────┐    ✗    ┌──────────┐         ┌──────────┐
│  Node A  │ BROKEN  │  Node B  │◄───────►│  Node C  │
│ likes=45 │         │ likes=42 │         │ likes=42 │
└──────────┘         └──────────┘         └──────────┘
(user wrote 3 more)   (partition: can&#x27;t sync)

CP 选择 (e.g. HBase, Zookeeper):   AP 选择 (e.g. Cassandra, DynamoDB):
→ 拒绝 Node B/C 的写入               → 允许各自独立写入
→ 数据一致，但不可用                  → 可用，但数据暂时不一致
</code></pre>

<hr/>

<h2>⚖️ 关键权衡 / Key Tradeoffs</h2>

<h3>CP 系统（一致性 + 分区容错）</h3>

<p><strong>为什么这样设计？</strong> 需要强一致性的场景，例如金融交易、库存系统。</p>

<p>- ✅ 数据永远一致，不会有脏读</p>
<p>- ❌ 网络分区时部分节点不可用</p>
<p>- 📦 代表：HBase, MongoDB (default), ZooKeeper, etcd</p>

<h3>AP 系统（可用性 + 分区容错）</h3>

<p><strong>为什么这样设计？</strong> 高可用性更重要，短暂不一致可接受，例如社交 feed、购物车。</p>

<p>- ✅ 系统始终响应，用户体验好</p>
<p>- ❌ 不同节点可能返回不同结果（<strong>最终一致性</strong>）</p>
<p>- 📦 代表：Cassandra, DynamoDB, CouchDB, DNS</p>

<h3>最终一致性 / Eventual Consistency</h3>

<pre><code>
时间线 Timeline:

t=0  User writes likes=45 to Node A
t=1  Node A is isolated (partition)
t=2  User reads from Node B → gets 42 (stale!)
t=5  Partition heals, nodes sync
t=6  User reads from Node B → gets 45 ✅ (eventually consistent)
</code></pre>

<p><strong>最终一致性</strong> 并非&quot;随机错误&quot;，而是&quot;在网络恢复后，所有副本最终达到相同状态&quot;。</p>
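<p>The timeline can be simulated with two in-memory replicas and last-writer-wins reconciliation (a deliberate simplification; real systems often use vector clocks or CRDTs to merge):</p>

```python
# Each replica stores value as (data, timestamp); sync keeps the newer write.
node_a = {"likes": (42, 0)}
node_b = {"likes": (42, 0)}

# t=0: user writes likes=45 to Node A, then a partition isolates A
node_a["likes"] = (45, 1)

# t=2: read from Node B during the partition -> stale value
assert node_b["likes"][0] == 42        # stale!

# t=5: partition heals; last-writer-wins merge
def sync(a, b):
    for key in set(a) | set(b):
        newest = max(a.get(key, (None, -1)), b.get(key, (None, -1)),
                     key=lambda v: v[1])
        a[key] = b[key] = newest

sync(node_a, node_b)

# t=6: both replicas converge (eventually consistent)
assert node_a["likes"][0] == node_b["likes"][0] == 45
```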

<hr/>

<h2>🚫 常见错误 / Common Mistakes</h2>

<p><strong>别踩这个坑：</strong></p>

<p>1. <strong>误解 CAP 是固定选择</strong> — 现代系统允许你按操作级别调整一致性（Cassandra 的 <code>ConsistencyLevel: QUORUM</code> vs <code>ONE</code>，DynamoDB 的 <code>ConsistentRead</code> 开关），而不是全局二选一。</p>

<p>2. <strong>把 CA 作为选项</strong> — 不存在真正的&quot;CA 系统&quot;，因为没有网络分区容错的系统根本不是分布式系统。</p>

<p>3. <strong>忽视 PACELC 扩展</strong> — CAP 只描述分区时的行为，[PACELC 模型](https://en.wikipedia.org/wiki/PACELC_theorem)还考虑了<strong>正常运行时</strong>的延迟 vs 一致性权衡，更全面。</p>

<p>4. <strong>混淆最终一致性和弱一致性</strong> — 最终一致性保证&quot;最终会对齐&quot;；弱一致性不做任何保证。</p>

<hr/>

<h2>🔍 面试重点 / Interview Focus</h2>

<p>当面试官问&quot;你会如何设计 X 系统&quot;时，主动提出 CAP：</p>

<p>&gt; &quot;这取决于一致性需求。如果是金融交易，我会选 CP；如果是社交 feed，AP + 最终一致性更合适，因为用户短暂看到旧数据不是大问题。&quot;</p>

<hr/>

<h2>📚 参考资料 / References</h2>

<p>- 🔗 [CAP Theorem Explained — IBM](https://www.ibm.com/topics/cap-theorem)</p>
<p>- 🔗 [Brewer&#x27;s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services](https://users.ece.cmu.edu/~adrian/731-sp04/readings/GL-cap.pdf)</p>
<p>- 🔗 [PACELC theorem — Wikipedia](https://en.wikipedia.org/wiki/PACELC_theorem)</p>
<p>- 🔗 [Cassandra vs MongoDB — Consistency Model Comparison](https://cassandra.apache.org/doc/latest/cassandra/architecture/dynamo.html)</p>

<hr/>

<h2>🧒 ELI5（像我5岁一样解释）</h2>

<p>想象你和好朋友各有一本记事本，记录班里同学的生日。你们约定互相抄写更新。</p>

<p>- <strong>CP</strong>：如果你们之间的电话断了，就不写新内容，直到联系上为止（一致，但暂停工作）</p>
<p>- <strong>AP</strong>：各自继续记录，电话修好后再对比合并（继续工作，但暂时可能不一样）</p>

<p>大多数社交网站选择 AP：你的点赞数可能偶尔显示&quot;旧数据&quot;，但网站从不宕机。</p>

<hr/>
<h1>💻 Algorithms</h1>
<h2>💻 算法 Day 12 / Algorithms Day 12</h2>
<h2>#15 3Sum — Medium — Two Pointers (3/5)</h2>

<p>&gt; <strong>难度 / Difficulty:</strong> 🟡 Medium | <strong>阶段 / Phase:</strong> Growth | <strong>预计时间 / Read time:</strong> 4 min</p>

<hr/>

<h2>🧩 双指针模式 (3/5) — 继续 Day 9 引入的模板</h2>

<p>Building on the <strong>双指针模式 / Two Pointers</strong> template from Day 9 (Valid Palindrome).</p>

<p><strong>模式回顾 / Pattern Recap:</strong></p>
<pre><code>
left, right = 0, len(arr) - 1
while left &lt; right:
    total = arr[left] + arr[right]
    if total == target: return [left, right]
    elif total &lt; target: left += 1
    else: right -= 1
</code></pre>
<p><strong>本模块问题 / Block Problems:</strong></p>
<p>1. ✅ #125 Valid Palindrome (Easy) — Day 9</p>
<p>2. ✅ #167 Two Sum II (Medium) — Day 11</p>
<p>3. 👉 <strong>#15 3Sum (Medium) — TODAY</strong></p>
<p>4. 🔜 #11 Container With Most Water (Medium)</p>
<p>5. 🔜 #42 Trapping Rain Water (Hard)</p>

<p><strong>今天的变化 / Today&#x27;s Twist:</strong> 从找&quot;2个数之和&quot;升级到找&quot;3个数之和为0&quot;，需要先固定一个数，再用双指针扫余下部分。</p>

<hr/>

<h2>🔗 题目链接 / Links</h2>

<p>- 📝 [LeetCode #15 — 3Sum](https://leetcode.com/problems/3sum/)</p>
<p>- 📹 [NeetCode — 3Sum Solution](https://neetcode.io/problems/three-integer-sum)</p>

<hr/>

<h2>🌍 真实场景类比 / Real-World Analogy</h2>

<p>想象你在整理一箱重量不等的砝码，你想找到<strong>三个</strong>砝码，使它们的重量加起来恰好为零（一正一负加中间值）。</p>

<p>逐一穷举三个砝码的所有组合是 O(n³)，太慢了。如果先把砝码<strong>按重量排序</strong>，固定最左边的砝码，然后用左右两个指针扫剩余部分，就能降到 O(n²)。</p>

<p>Imagine sorting weights in a box and finding <strong>three</strong> that sum to zero. Brute force is O(n³). Sort them, fix the leftmost, and use two pointers for the rest → O(n²).</p>

<hr/>

<h2>📋 问题描述 / Problem</h2>

<p>Given an integer array <code>nums</code>, return all the triplets <code>[nums[i], nums[j], nums[k]]</code> such that:</p>
<p>- <code>i != j</code>, <code>i != k</code>, <code>j != k</code></p>
<p>- <code>nums[i] + nums[j] + nums[k] == 0</code></p>

<p>The solution set <strong>must not contain duplicate triplets.</strong></p>

<pre><code>
Input:  nums = [-1, 0, 1, 2, -1, -4]
Output: [[-1, -1, 2], [-1, 0, 1]]
</code></pre>

<hr/>

<h2>🗺️ 映射到模板 / Mapping to Template</h2>

<p><strong>核心思路：</strong> 排序后，外层遍历固定 <code>nums[i]</code>，内层用双指针找 <code>nums[left] + nums[right] == -nums[i]</code>。</p>

<pre><code>
Fixed:   nums[i] = -1   target for inner = 0 - (-1) = 1
Array:   [-4, -1, -1, 0, 1, 2]  (sorted)
              i  L        R
              
Step 1: left=-1, right=2 → sum=1 ✅ found! → skip duplicates
Step 2: left=0,  right=1 → sum=1 ✅ found!
Step 3: left &gt;= right → stop inner loop
</code></pre>

<hr/>

<h2>🐍 Python 解法 + 逐行追踪 / Solution + Trace</h2>

<pre><code>
def threeSum(nums: list[int]) -&gt; list[list[int]]:
    nums.sort()                           # [-4, -1, -1, 0, 1, 2]
    result = []
    
    for i in range(len(nums) - 2):        # fix the first element
        if nums[i] &gt; 0:                   # sorted: if first &gt; 0, no solution
            break
        if i &gt; 0 and nums[i] == nums[i-1]:  # skip duplicates for i
            continue
        
        left, right = i + 1, len(nums) - 1  # two pointers for the rest
        
        while left &lt; right:
            total = nums[i] + nums[left] + nums[right]
            
            if total == 0:
                result.append([nums[i], nums[left], nums[right]])
                # skip duplicates for left and right
                while left &lt; right and nums[left] == nums[left + 1]:
                    left += 1
                while left &lt; right and nums[right] == nums[right - 1]:
                    right -= 1
                left += 1
                right -= 1
            elif total &lt; 0:
                left += 1   # need larger sum
            else:
                right -= 1  # need smaller sum
    
    return result

# Trace with nums = [-1, 0, 1, 2, -1, -4]:
# After sort: [-4, -1, -1, 0, 1, 2]
# i=0: nums[i]=-4, left=1(-1), right=5(2) → sum=-3 → left++
#       left=2(-1), right=5(2) → sum=-3 → left++
#       left=3(0),  right=5(2) → sum=-2 → left++
#       left=4(1),  right=5(2) → sum=-1 → left++
#       left&gt;=right → stop
# i=1: nums[i]=-1, left=2(-1), right=5(2) → sum=0 ✅ append [-1,-1,2]
#       skip dups → left=3(0), right=4(1) → sum=0 ✅ append [-1,0,1]
#       left&gt;=right → stop
# i=2: nums[i]=-1 == nums[i-1]=-1 → skip (duplicate!)
# i=3: nums[i]=0, left=4(1), right=5(2) → sum=3 → right--
#       left&gt;=right → stop
# Result: [[-1,-1,2], [-1,0,1]] ✅
</code></pre>

<p><strong>时间复杂度 / Time Complexity:</strong> O(n log n) sort + O(n²) = <strong>O(n²)</strong></p>
<p><strong>空间复杂度 / Space Complexity:</strong> O(1) extra (excluding output)</p>

<hr/>

<h2>⚡ 与模板的关键差异 / Key Differences from Template</h2>

<p>| | Two Sum II (Day 11) | 3Sum (Today) |</p>
<p>|--|--|--|</p>
<p>| 目标 | 找2个数之和 = target | 找3个数之和 = 0 |</p>
<p>| 结构 | 单层双指针 | 外层 for + 内层双指针 |</p>
<p>| 去重 | 不需要 | 必须跳过重复元素 |</p>
<p>| 复杂度 | O(n) | O(n²) |</p>

<hr/>

<h2>🔁 举一反三 / Pattern Connections</h2>

<p>- <strong>#11 Container With Most Water (下一题):</strong> 同样外层遍历 + 内层双指针，但优化目标不同（最大面积 vs 零和）</p>
<p>- <strong>#42 Trapping Rain Water (最难题):</strong> 双指针 + 边界最大值，是本模式的终极形态</p>
<p>- <strong>变体：</strong> 4Sum (#18) = 再加一层 for 循环 → O(n³)，同样思路</p>

<hr/>

<h2>📚 参考资料 / References</h2>

<p>- 🔗 [LeetCode #15 — 3Sum](https://leetcode.com/problems/3sum/)</p>
<p>- 🔗 [NeetCode Video Solution](https://neetcode.io/problems/three-integer-sum)</p>
<p>- 🔗 [Two Pointers Pattern — LeetCode Explore](https://leetcode.com/explore/learn/card/array-and-string/205/array-two-pointer-technique/)</p>

<hr/>

<h2>🧒 ELI5</h2>

<p>想象你有一堆正数和负数的磁铁，你要找三块加起来刚好等于零。</p>

<p>先把它们从小到大排好，然后：拿起最左边的那块，再用两只手各从左右两边向中间夹。夹到了就记录下来，没夹到就根据总和太大还是太小来移动手。</p>

<p>这样就不用每三块都试一遍，快很多！</p>

<hr/>
<h1>🗣️ Soft Skills</h1>
<h2>🗣️ 软技能 Day 11 / Soft Skills Day 11</h2>
<h2>优先级排序 / Prioritization</h2>
<h3>当所有事情都&quot;同样重要&quot;时，你怎么决定做什么？</h3>
<h3>How do you decide what to work on when everything seems equally important?</h3>

<p>&gt; <strong>级别 / Level:</strong> Senior/Staff | <strong>类别 / Category:</strong> Prioritization | <strong>阶段 / Phase:</strong> Growth | <strong>预计时间 / Read time:</strong> 2 min</p>

<hr/>

<h2>💡 为什么这个问题很重要 / Why This Matters</h2>

<p>这是 Senior/Staff 工程师的核心能力之一。初级工程师执行任务；高级工程师<strong>决定做哪些任务</strong>。</p>

<p>面试官想知道：你是否会在信息不完整时做出理性决策，还是会陷入&quot;什么都想做&quot;或&quot;等别人告诉你&quot;的困境。</p>

<p>This is a core Senior/Staff engineer competency. Junior engineers execute tasks; senior engineers <strong>decide which tasks to execute</strong>.</p>

<p>The interviewer wants to know: can you make rational decisions with incomplete information, or do you get paralyzed?</p>

<hr/>

<h2>⭐ STAR 框架拆解 / STAR Breakdown</h2>

<p><strong>Situation（背景）:</strong></p>
<p>&gt; &quot;我加入新团队后第一个季度，我们同时有5个项目被标记为高优先级：两个客户承诺的功能、一个重要的性能优化、一个安全漏洞修复，还有一个技术债务重构。&quot;</p>

<p><strong>Task（任务）:</strong></p>
<p>&gt; &quot;我需要帮助团队决定顺序，但没有人能直接告诉我哪个&#x27;真正&#x27;最重要。&quot;</p>

<p><strong>Action（行动）:</strong></p>
<p>&gt; &quot;我用了一个简单框架：先问每项工作的<strong>影响范围</strong>（多少用户/多少收入受影响）和<strong>紧迫性</strong>（截止日期是硬性的吗），再评估<strong>依赖关系</strong>（哪项工作阻塞了其他事情）。安全漏洞虽然用户感知低，但是合规风险极高，我把它放第一位。客户承诺功能放第二，因为违约有商务影响。性能优化排第三，因为有量化的流失数据支撑。技术债务排最后，但我明确说明了不做的风险累积。&quot;</p>

<p><strong>Result（结果）:</strong></p>
<p>&gt; &quot;这个排序被 PM 和工程 Lead 接受了，我们按序交付，没有出现返工或紧急插队。&quot;</p>

<hr/>

<h2>❌ 糟糕 vs ✅ 优秀回答 / Bad vs Good Answer</h2>

<p><strong>❌ 不好的回答：</strong></p>
<p>&gt; &quot;我会先做最难的事情，这样其他事情就容易了。&quot;</p>

<p><strong>为什么不好？</strong> 没有考虑外部影响，把个人技术偏好凌驾于业务价值之上。</p>

<hr/>

<p><strong>✅ 好的回答结构：</strong></p>
<p>1. <strong>承认复杂性</strong> — &quot;当所有事情看起来都重要时，我的第一步是找出哪些是真正的约束条件（deadline、依赖、风险），而不是表面的紧迫感。&quot;</p>
<p>2. <strong>使用框架</strong> — 提到 Impact × Urgency 矩阵、ICE (Impact/Confidence/Ease)，或 RICE 框架</p>
<p>3. <strong>对齐业务目标</strong> — 把技术工作连接到公司/团队目标</p>
<p>4. <strong>明确沟通取舍</strong> — 说出&quot;如果我选择做 A，B 会推迟到 X 日期，风险是 Y&quot;</p>

<hr/>

<h2>🏆 Senior/Staff 级别加分项 / Senior/Staff Tips</h2>

<p>- <strong>Don&#x27;t just prioritize silently.</strong> 写下你的优先级排序并发给相关人员，这既是对齐，也是自我保护。</p>
<p>- <strong>&quot;No&quot; is a complete sentence, but explain the tradeoff.</strong> 当你说不做某事时，要量化&quot;不做&quot;的成本。</p>
<p>- <strong>Revisit priorities regularly.</strong> 每周或每冲刺开始时重新评估，因为情况会变。</p>
<p>- <strong>Separate urgency from importance.</strong> 紧急 ≠ 重要（艾森豪威尔矩阵）。很多&quot;紧急&quot;任务其实不重要。</p>

<hr/>

<h2>📋 关键要点 / Key Takeaways</h2>

<p>| 原则 | 说明 |</p>
<p>|------|------|</p>
<p>| 🎯 <strong>Impact First</strong> | 先问&quot;谁会受益，影响多大？&quot; |</p>
<p>| ⚡ <strong>Hard Deadlines</strong> | 区分&quot;有人希望早完成&quot;和&quot;晚一天就违约&quot; |</p>
<p>| 🔗 <strong>Unblock Others</strong> | 阻塞其他工程师的事情优先级隐性更高 |</p>
<p>| 🗣️ <strong>Communicate Tradeoffs</strong> | 说出你选择和放弃的理由 |</p>

<hr/>

<h2>📚 参考资料 / References</h2>

<p>- 🔗 [The Eisenhower Matrix — FarnamStreet](https://fs.blog/eisenhower-matrix/)</p>
<p>- 🔗 [RICE Scoring: A Better Way to Prioritize Your Product Roadmap — Intercom](https://www.intercom.com/blog/rice-simple-prioritization-for-product-managers/)</p>
<p>- 🔗 [Staff Engineer: Leadership beyond the management track — Will Larson](https://staffeng.com/book)</p>

<hr/>

<h2>🧒 ELI5</h2>

<p>想象你有一张写满作业的清单，数学、语文、体育都要交。</p>

<p>聪明的做法不是&quot;随便挑&quot;，而是先问：</p>
<p>1. 哪个明天就到期（硬截止日期）？</p>
<p>2. 哪个影响最多分数（重要性）？</p>
<p>3. 哪个不做会影响其他同学（依赖）？</p>

<p>然后按顺序做，同时告诉老师&quot;我今天做 A 和 B，C 推迟到周五，原因是...&quot;</p>

<hr/>
<h1>🎨 Frontend</h1>
<h2>🎨 前端 Day 11 / Frontend Day 11</h2>
<h2>React useRef — React 的&quot;逃生舱口&quot; / Escape Hatch from React</h2>

<p>&gt; <strong>类别 / Category:</strong> React Hooks | <strong>周 / Week:</strong> 3 | <strong>阶段 / Phase:</strong> Growth | <strong>预计时间 / Read time:</strong> 2 min</p>

<hr/>

<h2>🌍 真实场景 / Real Scenario</h2>

<p>你在做一个视频播放器 dashboard，需要：</p>
<p>1. 当用户点击&quot;开始录制&quot;按钮时，<strong>聚焦</strong>到视频元素</p>
<p>2. 追踪一个<strong>内部计时器 ID</strong>（不需要触发重渲染）</p>
<p>3. <strong>直接调用</strong> DOM 元素的 <code>.play()</code> 方法</p>

<p><code>useState</code> 不适合：每次更新都会触发重渲染，而且不能持有 DOM 引用。这时就需要 <code>useRef</code>。</p>

<p>You&#x27;re building a video player dashboard and need to: focus the video element on button click, track a timer ID without triggering re-renders, and call <code>.play()</code> directly. <code>useState</code> would cause unnecessary re-renders. Enter <code>useRef</code>.</p>

<hr/>

<h2>🧠 useRef 的两大用途 / Two Use Cases</h2>

<h3>用途 1：持有 DOM 引用 / Holding a DOM Reference</h3>

<pre><code>
import { useRef } from &#x27;react&#x27;

function VideoPlayer() {
  // Creates { current: null } — persists across renders
  const videoRef = useRef&lt;HTMLVideoElement&gt;(null)
  
  const handlePlay = () =&gt; {
    // Direct DOM access — no React state involved
    videoRef.current?.play()
  }
  
  const handleFocus = () =&gt; {
    videoRef.current?.focus()
  }
  
  return (
    &lt;div&gt;
      {/* Attach ref to DOM element via ref prop */}
      &lt;video ref={videoRef} src=&quot;/demo.mp4&quot; /&gt;
      &lt;button onClick={handlePlay}&gt;▶ Play&lt;/button&gt;
      &lt;button onClick={handleFocus}&gt;Focus Video&lt;/button&gt;
    &lt;/div&gt;
  )
}
</code></pre>

<h3>用途 2：持有可变值（不触发重渲染）/ Mutable Value (No Re-render)</h3>

<pre><code>
function RecordingTimer() {
  const [isRecording, setIsRecording] = useState(false)
  
  // Stores timer ID — changing it does NOT cause a re-render
  const timerIdRef = useRef&lt;ReturnType&lt;typeof setInterval&gt; | null&gt;(null)
  
  const startRecording = () =&gt; {
    setIsRecording(true)
    timerIdRef.current = setInterval(() =&gt; {
      console.log(&#x27;Recording...&#x27;)
    }, 1000)
  }
  
  const stopRecording = () =&gt; {
    setIsRecording(false)
    if (timerIdRef.current) {
      clearInterval(timerIdRef.current)
      timerIdRef.current = null
    }
  }
  
  return (
    &lt;button onClick={isRecording ? stopRecording : startRecording}&gt;
      {isRecording ? &#x27;⏹ Stop&#x27; : &#x27;⏺ Record&#x27;}
    &lt;/button&gt;
  )
}
</code></pre>

<hr/>

<h2>🤔 猜猜输出什么？/ What Does This Output?</h2>

<pre><code>
function Counter() {
  const [count, setCount] = useState(0)
  const renderCount = useRef(0)
  
  renderCount.current += 1  // increment on every render
  
  return (
    &lt;div&gt;
      &lt;p&gt;Count: {count}&lt;/p&gt;
      &lt;p&gt;Renders: {renderCount.current}&lt;/p&gt;
      &lt;button onClick={() =&gt; setCount(c =&gt; c + 1)}&gt;+1&lt;/button&gt;
    &lt;/div&gt;
  )
}
// After clicking +1 twice, what does &quot;Renders&quot; show?
</code></pre>

<p><strong>A)</strong> 1（从不更新）</p>
<p><strong>B)</strong> 2（只算点击）</p>
<p><strong>C)</strong> 3（初始渲染 + 2次点击）</p>
<p><strong>D)</strong> 0（ref 不触发重渲染）</p>

<p>&lt;details&gt;&lt;summary&gt;答案 / Answer&lt;/summary&gt;</p>

<p><strong>C) 3</strong> — <code>renderCount.current</code> 在每次渲染时递增。每次 <code>setCount</code> 触发重渲染，都会加 1。初始渲染时为 1，点击两次后为 3。注意：<code>renderCount.current</code> 的变化<strong>不会触发</strong>额外渲染，只是被动记录。</p>

<p>&lt;/details&gt;</p>

<hr/>

<h2>❌ 常见错误 vs ✅ 正确做法 / Common Mistakes vs Correct Approach</h2>

<pre><code>
// ❌ WRONG: Reading ref value during render to display in UI
function BadComponent() {
  const countRef = useRef(0)
  countRef.current += 1
  
  // BUG: mutating a ref during render is unsafe. React 18 Strict Mode
  // double-invokes render in dev, so this count drifts and is unreliable
  return &lt;p&gt;Rendered {countRef.current} times&lt;/p&gt;
}

// ✅ RIGHT: Mutate the ref inside an effect, read it in event handlers
// (an unconditional setState-in-effect here would re-render forever)
function GoodComponent() {
  const renderCount = useRef(0)
  
  useEffect(() =&gt; {
    renderCount.current += 1  // runs after each commit, not during render
  })
  
  const logRenders = () =&gt; console.log(`Rendered ${renderCount.current} times`)
  
  return &lt;button onClick={logRenders}&gt;Log render count&lt;/button&gt;
}

// ❌ WRONG: Accessing ref.current during render before it&#x27;s assigned
function BadRef() {
  const inputRef = useRef&lt;HTMLInputElement&gt;(null)
  console.log(inputRef.current?.value)  // null on first render!
  return &lt;input ref={inputRef} /&gt;
}

// ✅ RIGHT: Access ref.current inside effects or event handlers
function GoodRef() {
  const inputRef = useRef&lt;HTMLInputElement&gt;(null)
  
  useEffect(() =&gt; {
    // ref is assigned after DOM mounts
    console.log(inputRef.current?.value)
  }, [])
  
  return &lt;input ref={inputRef} /&gt;
}
</code></pre>

<hr/>

<h2>🔀 何时用 useRef vs useState / When to Use useRef vs useState</h2>

<p>| 场景 | 用哪个？ | 原因 |</p>
<p>|------|---------|------|</p>
<p>| 展示在 UI 里的值 | <code>useState</code> | 需要触发重渲染 |</p>
<p>| 计时器 ID / 请求 ID | <code>useRef</code> | 不需要显示，不需要重渲染 |</p>
<p>| DOM 元素访问 | <code>useRef</code> | 直接引用，React 管理 |</p>
<p>| 上一次渲染的值 | <code>useRef</code> | 跨渲染持久化，不触发渲染 |</p>
<p>| 追踪 isMounted | <code>useRef</code> | 副作用清理，不需要展示 |</p>

<hr/>

<h2>📚 参考资料 / References</h2>

<p>- 🔗 [useRef — React Official Docs](https://react.dev/reference/react/useRef)</p>
<p>- 🔗 [Referencing Values with Refs — React Learn](https://react.dev/learn/referencing-values-with-refs)</p>
<p>- 🔗 [Manipulating the DOM with Refs — React Learn](https://react.dev/learn/manipulating-the-dom-with-refs)</p>

<hr/>

<h2>🧒 ELI5</h2>

<p><code>useState</code> 就像写在白板上的数字——每次改变，班里所有人都重新看一遍（重渲染）。</p>

<p><code>useRef</code> 就像你口袋里的小纸条——你可以随时改上面的内容，但不会打扰到任何人（不触发重渲染）。</p>

<p>DOM ref 就更特别了：它就像一张&quot;通行证&quot;，让你可以直接敲响某个 DOM 元素的门，而不用通过 React 的&quot;前台&quot;。</p>

<hr/>
<h1>🤖 AI</h1>
<h2>🤖 AI Day 12</h2>
<h2>RLHF — ChatGPT 是怎么学会&quot;有用&quot;的 / How ChatGPT Learned to Be Helpful</h2>

<p>&gt; <strong>类别 / Category:</strong> Training | <strong>模式 / Mode:</strong> Concept | <strong>阶段 / Phase:</strong> Growth | <strong>预计时间 / Read time:</strong> 2 min</p>

<hr/>

<h2>💡 直觉解释 / Intuitive Explanation</h2>

<p>预训练后的语言模型像一个&quot;博学但任性的学生&quot;——它什么都会说，但不一定有帮助、安全或符合人类期望。</p>

<p><strong>RLHF（Reinforcement Learning from Human Feedback，基于人类反馈的强化学习）</strong> 就是让这个学生接受&quot;社会化教育&quot;的过程：通过收集人类对模型输出的偏好评分，训练模型生成更符合人类价值观的回答。</p>

<p>Pre-trained LLMs know a lot but aren&#x27;t inherently helpful or safe. RLHF is the &quot;socialization&quot; step: collect human preferences on model outputs, then train the model to generate responses humans prefer.</p>

<hr/>

<h2>⚙️ 工作原理 / How It Works</h2>

<p>RLHF 分三个阶段：</p>

<h3>阶段 1: 监督微调 (SFT) / Supervised Fine-Tuning</h3>

<pre><code>
人类写示范回答
Input:  &quot;解释黑洞&quot;
Output: [人类写的高质量示范答案]

→ 直接在这些数据上微调基础模型
→ 模型学会&quot;期望的格式和风格&quot;
</code></pre>

<h3>阶段 2: 训练奖励模型 (RM) / Train Reward Model</h3>

<pre><code>
给模型同一个问题的多个回答，让人类排序：

Q: &quot;如何减肥？&quot;
回答A: &quot;节食+运动&quot;          ← 人类排 #1
回答B: &quot;服用减肥药&quot;          ← 人类排 #2  
回答C: &quot;节食即可，不用运动&quot;   ← 人类排 #3

奖励模型学习：给任意回答打分
</code></pre>
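<p>奖励模型通常用成对偏好损失训练 / The reward model is typically trained with a pairwise (Bradley-Terry) loss, as in the InstructGPT paper: push the score of the human-preferred answer above the rejected one. A dependency-free sketch:</p>

```python
import math

def pairwise_loss(score_chosen: float, score_rejected: float) -> float:
    """-log sigmoid(r_chosen - r_rejected): small when chosen outranks rejected."""
    margin = score_chosen - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The loss shrinks as the reward model ranks the preferred answer higher
assert pairwise_loss(2.0, 0.0) < pairwise_loss(0.5, 0.0) < pairwise_loss(0.0, 0.5)
```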

<h3>阶段 3: PPO 强化学习 / RL with PPO</h3>

<pre><code>
# Simplified PPO loop concept:
for prompt in training_prompts:
    response = policy_model.generate(prompt)    # current LLM
    reward = reward_model.score(response)        # RM gives score
    
    # Update policy to maximize reward,
    # but stay close to original (KL divergence penalty)
    loss = -reward + kl_penalty * kl(policy, reference)
    policy_model.update(loss)
</code></pre>

<pre><code>
完整流程 Full Pipeline:
基础模型     SFT        RM训练       PPO强化学习
Pre-trained → 学格式 → 学人类偏好 → 最大化人类评分
Base LLM     (SFT)     (Reward Model)  (RLHF)
</code></pre>

<hr/>

<h2>🌍 应用 / Applications</h2>

<p>| 系统 | RLHF 的作用 |</p>
<p>|------|------------|</p>
<p>| <strong>ChatGPT</strong> | 从&quot;预测下一个词&quot;变成&quot;有帮助、无害、诚实&quot; |</p>
<p>| <strong>Claude</strong> | Anthropic 使用 Constitutional AI (CAI)，RLHF 的变体 |</p>
<p>| <strong>Llama 2 Chat</strong> | Meta 开源 RLHF 模型，可本地运行 |</p>
<p>| <strong>Gemini</strong> | Google 的对话模型，同样有 RLHF 阶段 |</p>

<hr/>

<h2>🐍 可运行代码片段 / Runnable Python Snippet</h2>

<p>这里用一个纯 Python 的启发式&quot;奖励模型&quot;演示 RLHF 的核心思路（真实训练会用 <code>trl</code> 等库在人类偏好数据上训练奖励模型；这里无需 GPU、无需依赖）：</p>

<pre><code>
# Pure-Python sketch, no external dependencies needed

# Simulate a reward model scoring responses
# (In real RLHF, this is trained on human preference data)
def simple_reward_model(response: str) -&gt; float:
    &quot;&quot;&quot;Score a response based on simple heuristics.&quot;&quot;&quot;
    score = 0.0
    
    # Reward helpfulness signals
    if len(response) &gt; 50: score += 0.3         # detailed enough
    if &quot;because&quot; in response.lower(): score += 0.2  # gives reasoning
    if &quot;?&quot; not in response: score += 0.1         # not evasive
    
    # Penalize harmful patterns  
    if &quot;i can&#x27;t help&quot; in response.lower(): score -= 0.5
    if &quot;kill&quot; in response.lower(): score -= 1.0
    
    return score

# Test with example responses
responses = [
    &quot;I can&#x27;t help with that.&quot;,
    &quot;Exercise regularly and maintain a balanced diet because consistent habits lead to sustainable weight loss.&quot;,
    &quot;Just eat less.&quot;
]

for r in responses:
    print(f&quot;Score: {simple_reward_model(r):.1f} | {r[:50]}...&quot;)
# Score: -0.4 | I can&#x27;t help with that....
# Score: 0.6 | Exercise regularly and maintain a balanced diet be...
# Score: 0.1 | Just eat less....
</code></pre>

<hr/>

<h2>⚠️ RLHF 的局限性 / Limitations</h2>

<p>1. <strong>奖励黑客 / Reward Hacking:</strong> 模型学会&quot;取悦&quot;奖励模型，而非真正有帮助（如生成冗长但空洞的回答）</p>
<p>2. <strong>人类偏好的偏差 / Human Bias:</strong> 训练数据中的人类标注员有自己的偏见</p>
<p>3. <strong>成本高 / Expensive:</strong> 需要大量人工标注，质量难以扩展</p>
<p>4. <strong>DPO 正在替代 PPO:</strong> Direct Preference Optimization 更简单，目前很多新模型已迁移</p>

<hr/>

<h2>📚 参考资料 / References</h2>

<p>- 🔗 [InstructGPT Paper (OpenAI) — Training language models to follow instructions with human feedback](https://arxiv.org/abs/2203.02155)</p>
<p>- 🔗 [Hugging Face RLHF Blog — Illustrating RLHF](https://huggingface.co/blog/rlhf)</p>
<p>- 🔗 [TRL Library — Transformer Reinforcement Learning](https://github.com/huggingface/trl)</p>
<p>- 🔗 [Direct Preference Optimization (DPO) Paper](https://arxiv.org/abs/2305.18290)</p>

<hr/>

<h2>🧒 ELI5</h2>

<p>想象你在教一只小狗（语言模型）坐下。</p>

<p>第一步：你示范给它看（SFT — 监督学习）。</p>
<p>第二步：你训练另一个&quot;评判员&quot;来区分好的坐姿和差的坐姿（奖励模型）。</p>
<p>第三步：小狗每次坐好了，评判员给它零食；坐歪了，没有零食（强化学习）。</p>

<p>经过很多次训练后，小狗学会了&quot;人类喜欢什么样的坐姿&quot;。ChatGPT 就是这样学会&quot;人类喜欢什么样的回答&quot;的。</p>
]]></description>
    </item>
    <item>
      <title>byte-by-byte — 2026-03-26</title>
      <link>https://github.com/YushengAuggie/byte-by-byte/tree/main/archive</link>
      <guid isPermaLink="false">https://github.com/YushengAuggie/byte-by-byte/archive/2026-03-26</guid>
      <pubDate>Thu, 26 Mar 2026 12:00:00 +0000</pubDate>
      <description><![CDATA[<h1>🏗️ System Design</h1>
<h2>🏗️ 系统设计 Day 10 / System Design Day 10</h2>
<p><strong>Topic: Consistent Hashing (一致性哈希)</strong></p>
<p><em>预计阅读时间 / Estimated reading time: 3 minutes</em></p>

<hr/>

<h2>场景 / Scenario</h2>

<p>想象你在设计一个分布式缓存系统（比如 Redis 集群），有 10 台缓存服务器存储着数百万用户的数据。</p>

<p><em>Imagine you&#x27;re designing a distributed cache (like a Redis cluster) with 10 servers storing millions of users&#x27; data.</em></p>

<p>一天，服务器 #3 宕机了。用简单的取模哈希 <code>key % 10</code>，节点数从 10 变 9 之后，约 <strong>90% 的数据</strong>要重新分配！</p>

<p><em>One day, server #3 goes down. With simple modulo hashing <code>key % 10</code>, you&#x27;d need to reassign <strong>90% of your data</strong>!</em></p>

<p><strong>一致性哈希只需要重新分配 ~1/N 的数据。这就是它的魔力。</strong></p>

<p><em>Consistent hashing only reassigns ~1/N of data. That&#x27;s the magic.</em></p>

<hr/>

<h2>架构图 / Architecture Diagram</h2>

<pre><code>
                    哈希环 / Hash Ring (0 to 360°)
                         0°
                         │
              Server A   │   Server B
              (90°)      │   (180°)
                    ┌────┴────┐
              ──────┤  RING   ├──────
                    └────┬────┘
              Server D   │   Server C
              (315°)     │   (270°)
                         │
                        360°

  Key &quot;user:123&quot; hashes to 210° → goes to Server C (next clockwise)
  Key &quot;user:456&quot; hashes to 95°  → goes to Server B (next clockwise)

  Virtual Nodes (虚拟节点):
  ┌─────────────────────────────────────────┐
  │  Physical: A  B  C  D                   │
  │  Virtual:  A1 B1 C1 D1 A2 B2 C2 D2 ... │
  │  (150 virtual nodes per physical node)  │
  └─────────────────────────────────────────┘
</code></pre>

<p><strong>数据流 / Data Flow:</strong></p>
<p>1. 计算 key 的哈希值，映射到环上某个角度 → <em>Hash key to a position on the ring</em></p>
<p>2. 顺时针找到第一个服务器节点 → <em>Find next server clockwise</em></p>
<p>3. 读写该服务器 → <em>Read/write from that server</em></p>
<p>4. 服务器宕机：只有它的数据转移到下一个节点 → <em>On failure: only its data migrates to the next node</em></p>
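<p><strong>代码示意 / Code Sketch:</strong> 上面的查找流程可以用一个极简的 Python 哈希环来演示（示意实现，非生产代码；节点名与 key 均为假设）。<em>A minimal Python sketch of the lookup flow above, with virtual nodes; illustrative only, node and key names are made up.</em></p>

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes=150):
        self.vnodes = vnodes
        self.ring = []  # sorted list of (hash, node) pairs = positions on the ring
        for node in nodes:
            self.add_node(node)

    def _hash(self, key: str) -> int:
        # MD5 distributes well enough for a demo; production code would
        # typically use a faster hash such as MurmurHash.
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node: str) -> None:
        for i in range((self.vnodes)):  # one ring position per virtual node
            bisect.insort(self.ring, (self._hash(f"{node}#vn{i}"), node))

    def remove_node(self, node: str) -> None:
        self.ring = [(h, n) for h, n in self.ring if n != node]

    def get_node(self, key: str) -> str:
        # Walk clockwise: first virtual node at or past the key's position.
        idx = bisect.bisect(self.ring, (self._hash(key), ""))
        return self.ring[idx % len(self.ring)][1]  # wrap around the ring

ring = ConsistentHashRing(["A", "B", "C", "D"])
before = {f"user:{i}": ring.get_node(f"user:{i}") for i in range(1000)}
ring.remove_node("C")
moved = sum(1 for k, n in before.items() if ring.get_node(k) != n)
print(f"keys moved after removing C: {moved}/1000")
```

<p>移除一个节点后，只有原本属于它的 key 需要迁移（约 1/N），其余 key 不动。<em>After removing one node, only the keys it owned migrate (~1/N); every other key stays put.</em></p>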

<hr/>

<h2>关键权衡 / Key Tradeoffs</h2>

<p><strong>为什么这样设计？/ Why this design?</strong></p>

<p>| 普通哈希 / Simple Hash | 一致性哈希 / Consistent Hash |</p>
<p>|---|---|</p>
<p>| <code>key % N</code> 简单但脆弱 | 环形映射，容错强 |</p>
<p>| 增减节点 → 大规模重分配 | 增减节点 → 仅影响 ~1/N 数据 |</p>
<p>| 热点不均匀难处理 | 虚拟节点解决负载均衡 |</p>

<p><strong>虚拟节点的作用 / Virtual Nodes:</strong></p>
<p>每个物理节点在环上有多个虚拟位置（通常 100-200 个），解决数据分布不均的问题。<em>Each physical node has many virtual positions on the ring, solving uneven data distribution.</em></p>

<p><strong>CAP 定理视角 / CAP Perspective:</strong></p>
<p>一致性哈希帮助在分区容错（P）下提升可用性（A），但一致性（C）需要额外机制（如 quorum reads）保证。</p>
<p><em>Consistent hashing boosts availability (A) under partition tolerance (P), but consistency (C) still needs extra mechanisms such as quorum reads.</em></p>

<hr/>

<h2>别踩这个坑 / Common Mistakes</h2>

<p>❌ <strong>虚拟节点数量太少</strong> — 数据分布会很不均匀，导致热点</p>
<p><em>Too few virtual nodes → uneven distribution → hot spots</em></p>

<p>❌ <strong>不考虑节点权重</strong> — 新服务器内存更大，应承担更多虚拟节点</p>
<p><em>Ignoring node weights → underutilizing powerful servers</em></p>

<p>❌ <strong>哈希函数选错</strong> — 用分布不均的哈希函数（如简单的加法哈希）导致聚集；MD5 分布尚可，但计算偏慢</p>
<p><em>Poorly distributed hash functions (e.g. naive additive hashes) cause clustering; MD5 distributes fine but is slow</em></p>

<p>✅ 用 MurmurHash 或 FNV1a，配合 150-200 个虚拟节点，是生产环境的黄金配置。</p>
<p><em>Use MurmurHash or FNV1a with 150-200 virtual nodes in production.</em></p>

<hr/>

<h2>实际使用 / Real-World Usage</h2>

<p>- <strong>Amazon DynamoDB</strong> — 内部分区路由</p>
<p>- <strong>Apache Cassandra</strong> — token-based consistent hashing</p>
<p>- <strong>Memcached / Twemproxy</strong> — 客户端一致性哈希</p>
<p>- <strong>Nginx upstream hash</strong> — <code>hash $request_uri consistent</code></p>

<hr/>

<h2>📚 References</h2>

<p>1. <a href="https://www.toptal.com/big-data/consistent-hashing">Consistent Hashing — Tom White&#x27;s original paper explanation</a></p>
<p>2. <a href="https://aws.amazon.com/blogs/database/amazon-dynamodb-under-the-hood-how-we-built-a-hyper-scale-database/">Amazon DynamoDB&#x27;s use of consistent hashing</a></p>
<p>3. <a href="https://cassandra.apache.org/doc/latest/cassandra/architecture/dynamo.html">Cassandra&#x27;s consistent hashing implementation</a></p>

<hr/>

<h2>🧒 ELI5 (解释给5岁小孩听)</h2>

<p>想象一圈小朋友站成一个圆，每人负责一段颜色。玩具来了，看看玩具是什么颜色，顺时针找到对应颜色的小朋友，就给他。少了一个小朋友，只有他那段颜色的玩具要重新分，其他小朋友不受影响！</p>

<p><em>Imagine kids standing in a circle, each responsible for a color range. A toy arrives — find the next kid clockwise with that color. If one kid leaves, only their toys need reassigning. Everyone else stays put!</em></p>

<hr/>
<h1>💻 Algorithms</h1>
<h2>💻 算法 Day 11 / Algorithms Day 11</h2>
<p><strong>#167 Two Sum II — Input Array Is Sorted · 🟡 Medium</strong></p>
<p><em>预计阅读时间 / Estimated reading time: 4 minutes</em></p>

<hr/>

<h2>🧩 双指针模式 (2/5) — 继承 Day 10 的模版</h2>

<p><em>Building on the Two Pointers template from Day 10</em></p>

<p>今天是双指针模式的<strong>第 2 题</strong>（共 5 题）。上一题 Valid Palindrome 用双指针判断回文；今天我们用<strong>同样的框架</strong>解决&quot;有序数组找配对&quot;问题。</p>

<p><em>This is the 2nd problem in our Two Pointers block (5 total). Yesterday we checked palindromes; today we use the same framework to find pairs in a sorted array.</em></p>

<p><strong>本 block 全部 5 题 / All 5 problems:</strong></p>
<p>1. ✅ #125 Valid Palindrome (Easy) — Day 10</p>
<p>2. 👈 <strong>#167 Two Sum II (Medium) — TODAY</strong></p>
<p>3. #15 3Sum (Medium)</p>
<p>4. #11 Container With Most Water (Medium)</p>
<p>5. #42 Trapping Rain Water (Hard)</p>

<p><strong>通用模版回顾 / Template Recap:</strong></p>
<pre><code>
left, right = 0, len(arr) - 1
while left &lt; right:
    total = arr[left] + arr[right]
    if total == target: return [left, right]
    elif total &lt; target: left += 1   # need bigger sum
    else: right -= 1                  # need smaller sum
</code></pre>

<p><strong>与 Valid Palindrome 的对比 / vs Yesterday:</strong></p>

<p>| | Valid Palindrome | Two Sum II |</p>
<p>|---|---|---|</p>
<p>| 移动条件 / Move when | chars don&#x27;t match | sum ≠ target |</p>
<p>| 收缩方向 / Shrink | both sides toward middle | whichever side adjusts sum |</p>
<p>| 核心逻辑 / Core | compare chars | adjust sum magnitude |</p>

<hr/>

<h2>题目 / Problem</h2>

<p>🔗 <a href="https://leetcode.com/problems/two-sum-ii-input-array-is-sorted/">LeetCode #167</a> · 🟡 Medium</p>
<p>📹 <a href="https://neetcode.io/problems/two-sum-ii">NeetCode Video</a></p>

<p><strong>现实类比 / Real-World Analogy:</strong></p>

<p>你有一张<strong>已排序</strong>的价目表，要找出恰好等于预算 <code>target</code> 的两件商品。</p>
<p><em>You have a sorted price list and want to find exactly two items that sum to your budget.</em></p>

<p><strong>题目 / Problem:</strong></p>
<p>给一个 1-indexed、<strong>非递减排序</strong>的数组，找两个数相加等于 <code>target</code>，返回它们的下标（1-indexed）。每个输入保证有唯一解。</p>
<p><em>Given a 1-indexed, non-decreasing sorted array, find two numbers that sum to target. Return 1-indexed positions. Exactly one solution exists.</em></p>

<pre><code>
Input:  numbers = [2, 7, 11, 15], target = 9
Output: [1, 2]  (numbers[0] + numbers[1] = 2 + 7 = 9)
</code></pre>

<hr/>

<h2>💡 套用模版 / Mapping to Template</h2>

<p>模版中 <code>arr[left] + arr[right]</code> 对应今天的 <code>numbers[left] + numbers[right]</code>。</p>

<p><strong>为什么有序数组可以用双指针？/ Why does sorting enable two pointers?</strong></p>

<p>关键洞察：数组排序后，如果 <code>sum &lt; target</code>，我们<strong>确定</strong>需要更大的值 → 移动左指针。如果 <code>sum &gt; target</code>，需要更小的值 → 移动右指针。无序数组无法这样推断！</p>

<p><em>Key insight: With a sorted array, if sum &lt; target, we KNOW we need a bigger value → move left. If sum &gt; target, we need smaller → move right. Unsorted arrays can&#x27;t support this reasoning!</em></p>

<hr/>

<h2>🐍 Python 解法 + 逐步追踪 / Solution + Trace</h2>

<pre><code>
def twoSum(numbers: list[int], target: int) -&gt; list[int]:
    left, right = 0, len(numbers) - 1  # 1
    
    while left &lt; right:                 # 2
        current_sum = numbers[left] + numbers[right]  # 3
        
        if current_sum == target:       # 4
            return [left + 1, right + 1]  # convert to 1-indexed
        elif current_sum &lt; target:      # 5
            left += 1   # need bigger number
        else:
            right -= 1  # need smaller number
    
    return []  # guaranteed to find answer, never reaches here
</code></pre>

<p><strong>追踪 / Trace</strong> with <code>numbers = [2, 7, 11, 15], target = 9</code>:</p>

<pre><code>
Step 1: left=0, right=3 → 2+15=17 &gt; 9  → right=2
Step 2: left=0, right=2 → 2+11=13 &gt; 9  → right=1
Step 3: left=0, right=1 → 2+7=9  == 9  → return [1, 2] ✅
</code></pre>

<p><strong>时间/空间复杂度 / Complexity:</strong></p>
<p>- ⏱ Time: <strong>O(n)</strong> — each pointer moves at most n steps total</p>
<p>- 💾 Space: <strong>O(1)</strong> — no extra data structures</p>

<p><strong>vs. Brute Force:</strong> O(n²) with nested loops. Two pointers cut this to O(n), so the speedup factor grows linearly with input size (roughly 1,000x at n = 10³).</p>

<hr/>

<h2>举一反三 / Pattern Connections</h2>

<p><strong>在本 block 中 / Within this pattern block:</strong></p>

<p>- <strong>#15 3Sum (下一题):</strong> Same two-pointer idea + outer loop. Fix one element, two-pointer the rest.</p>
<p>- <strong>#11 Container With Most Water:</strong> <code>left</code>, <code>right</code> move based on which height is smaller — same structure!</p>
<p>- <strong>#42 Trapping Rain Water:</strong> Two pointers + track running max from each side. Most complex variation.</p>
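<p><strong>预览下一题 / Preview of #15 3Sum:</strong> 固定一个元素，对剩余部分跑同一个双指针模版。下面是这一标准思路的示意草稿，仅用于展示模版如何扩展。<em>Fix one element, then run the same two-pointer scan on the rest; a sketch of the standard approach, shown here only to illustrate how the template extends.</em></p>

```python
def threeSum(nums: list[int]) -> list[list[int]]:
    nums.sort()                      # sorting enables the two-pointer scan
    res = []
    for i in range(len(nums) - 2):
        if i > 0 and nums[i] == nums[i - 1]:
            continue                 # skip duplicate anchor elements
        left, right = i + 1, len(nums) - 1
        while left < right:          # same loop shape as Two Sum II
            total = nums[i] + nums[left] + nums[right]
            if total < 0:
                left += 1            # need a bigger sum
            elif total > 0:
                right -= 1           # need a smaller sum
            else:
                res.append([nums[i], nums[left], nums[right]])
                left += 1
                while left < right and nums[left] == nums[left - 1]:
                    left += 1        # skip duplicate second elements
    return res

print(threeSum([-1, 0, 1, 2, -1, -4]))  # [[-1, -1, 2], [-1, 0, 1]]
```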

<p><strong>看到这些信号就想到双指针 / Recognize these signals:</strong></p>
<p>- ✅ Sorted array</p>
<p>- ✅ &quot;Find pair that sums to X&quot;</p>
<p>- ✅ O(1) space required</p>
<p>- ✅ Palindrome check</p>
<p>- ✅ &quot;Remove duplicates in-place&quot;</p>

<hr/>

<h2>📚 References</h2>

<p>1. <a href="https://leetcode.com/problems/two-sum-ii-input-array-is-sorted/">LeetCode #167 — Two Sum II</a></p>
<p>2. <a href="https://neetcode.io/problems/two-sum-ii">NeetCode Two Sum II explanation</a></p>
<p>3. <a href="https://leetcode.com/discuss/study-guide/1688903/Solved-all-two-pointers-problems-in-100-days">Two Pointers pattern guide — LeetCode Patterns</a></p>

<hr/>

<h2>🧒 ELI5</h2>

<p>你和朋友站在一排数字两端。你喊出你们俩数字的和。太小就让左边的人向右走一步（换更大的数）；太大就让右边的人向左走一步（换更小的数）；正好就赢了！</p>

<p><em>You and a friend stand at opposite ends of a number line. Call out your sum. Too small → left person steps right (bigger number). Too big → right person steps left (smaller). Exact match → win!</em></p>

<hr/>
<h1>🗣️ Soft Skills</h1>
<h2>🗣️ 软技能 Day 10 / Soft Skills Day 10</h2>
<p><strong>Topic: Proactiveness — 主动发现问题</strong></p>
<p><em>&quot;Tell me about a time you identified and solved a problem before others noticed&quot;</em></p>
<p><em>预计阅读时间 / Estimated reading time: 2 minutes</em></p>

<hr/>

<h2>为什么这道题很重要 / Why This Matters</h2>

<p>这是区分<strong>普通工程师和高级工程师</strong>的核心问题之一。</p>

<p><em>This is one of the core questions that distinguishes senior from junior engineers.</em></p>

<p>初级工程师：等待任务分配，发现问题后上报。</p>
<p><em>Junior: Waits for tasks, escalates problems when found.</em></p>

<p>高级工程师：主动监控系统健康，提前发现隐患，悄悄修好。</p>
<p><em>Senior: Proactively monitors system health, finds issues before they explode, quietly fixes them.</em></p>

<p>面试官想听到的信号：<strong>主动性、系统思维、影响力量化</strong>。</p>
<p><em>What interviewers want: proactivity, systems thinking, quantified impact.</em></p>

<hr/>

<h2>STAR 拆解 / STAR Breakdown</h2>

<h3>✅ 强回答结构 / Strong Answer Structure</h3>

<p><strong>Situation（情境）:</strong></p>
<p>&gt; &quot;在我们的支付服务中，我在做例行代码审查时注意到一个看起来没问题但实际上很危险的模式——一个在高并发场景下会导致重复扣款的竞态条件。&quot;</p>
<p>&gt;</p>
<p>&gt; <em>&quot;While doing a routine code review of our payment service, I noticed a pattern that looked fine but was actually dangerous — a race condition that would cause duplicate charges under high concurrency.&quot;</em></p>

<p><strong>Task（任务）:</strong></p>
<p>&gt; &quot;没人发现这个问题，线上也没有报警。但我知道如果不处理，在双十一这样的高峰期必然会触发。&quot;</p>
<p>&gt;</p>
<p>&gt; <em>&quot;No one had flagged it, and there were no alerts. But I knew it would absolutely trigger during peak traffic like Black Friday.&quot;</em></p>

<p><strong>Action（行动）:</strong></p>
<p>&gt; &quot;我先写了一个复现脚本，用 k6 模拟并发请求证明了问题存在；然后提出了三种修复方案，评估了各自的性能影响；和 PM 沟通了延迟一个小功能发布来优先修复；最后用数据库级别的幂等锁解决了问题。&quot;</p>
<p>&gt;</p>
<p>&gt; <em>&quot;I wrote a reproduction script using k6 to prove the bug. Then I proposed 3 fix options with their performance tradeoffs, aligned with the PM to delay a minor feature, and fixed it with database-level idempotency locks.&quot;</em></p>

<p><strong>Result（结果）:</strong></p>
<p>&gt; &quot;两周后的峰值流量中，有记录显示有 847 次请求命中了我们的幂等保护。估算避免了 $15K 的退款损失和潜在的支付合规问题。&quot;</p>
<p>&gt;</p>
<p>&gt; <em>&quot;Two weeks later during peak traffic, we recorded 847 requests hitting our idempotency guard. Estimated $15K in prevented chargebacks and potential compliance issues.&quot;</em></p>

<hr/>

<h2>❌ Bad vs ✅ Good</h2>

<p><strong>❌ 弱回答 / Weak:</strong></p>
<p>&gt; &quot;我发现了一个 bug，报告给了我的经理，他们修复了它。&quot;</p>
<p>&gt;</p>
<p>&gt; <em>&quot;I found a bug and reported it to my manager and they fixed it.&quot;</em></p>
<p>→ 没有主动性，没有影响，这是被动行为。</p>
<p><em>No ownership, no impact, this is reactive not proactive.</em></p>

<p><strong>✅ 强回答 / Strong:</strong></p>
<p>&gt; 展示：你如何<strong>主动发现</strong>（不是被告知）→ <strong>量化潜在风险</strong> → <strong>独立推进修复</strong> → <strong>数字化影响</strong></p>
<p>&gt;</p>
<p>&gt; <em>Show: how you proactively discovered (not told) → quantified potential risk → independently drove the fix → measured impact</em></p>

<hr/>

<h2>高级/Staff 的进阶 / Senior/Staff Level Tips</h2>

<p>🔥 <strong>系统性主动 vs 偶发性主动:</strong></p>

<p>普通的&quot;主动&quot;是偶然发现问题。Staff 级别会建立<strong>系统</strong>：</p>
<p>- 定期审查监控告警覆盖率</p>
<p>- 建立技术债 backlog 并推动季度 review</p>
<p>- 主导 Game Day / Chaos Engineering 主动暴露隐患</p>

<p><em>Average proactiveness is accidental. Staff-level proactiveness is systematic: quarterly tech debt reviews, monitoring coverage audits, deliberate chaos engineering.</em></p>

<p>🔥 <strong>提前沟通风险:</strong></p>

<p>找到问题后，不只是&quot;修了&quot;，而是<strong>向上同步风险评估</strong>和修复进度，让决策者知情。</p>

<p><em>Don&#x27;t just fix silently — sync up risk assessment and fix progress with stakeholders. Make decisions visible.</em></p>

<hr/>

<h2>Key Takeaways</h2>

<p>1. 🔍 <strong>主动发现</strong> — 描述你如何发现（代码审查、监控、读日志、直觉）</p>
<p>2. 📊 <strong>量化风险</strong> — &quot;如果不修，会有 X 影响&quot;比&quot;我觉得有问题&quot;有力 10 倍</p>
<p>3. ⚙️ <strong>独立推进</strong> — 展示你能端到端推动，不依赖他人催促</p>
<p>4. 📈 <strong>结果数字化</strong> — 预防的损失 &gt; 修复的技术细节</p>

<hr/>

<h2>📚 References</h2>

<p>1. <a href="https://staffeng.com/book">Staff Engineer: Leadership beyond the management track — Will Larson</a></p>
<p>2. <a href="https://sre.google/sre-book/monitoring-distributed-systems/">Google SRE Book — Chapter on Monitoring and Alerting</a></p>
<p>3. <a href="https://www.indeed.com/career-advice/interviewing/how-to-use-the-star-interview-response-technique">The STAR Method for behavioral interviews — Indeed</a></p>

<hr/>

<h2>🧒 ELI5</h2>

<p>就像你在玩游戏，别人都在打怪，但你提前发现了地图上有个陷阱，在队友掉坑之前就绕过去了，还告诉大家这里有坑。这就是主动性！</p>

<p><em>It&#x27;s like being in a game where everyone&#x27;s fighting monsters, but you spotted a hidden trap on the map. You avoided it before your teammates fell in — and told everyone it was there. That&#x27;s proactiveness!</em></p>

<hr/>
<h1>🎨 Frontend</h1>
<h2>🎨 前端 Day 10 / Frontend Day 10</h2>
<p><strong>Topic: React useEffect — Side Effects &amp; Cleanup</strong></p>
<p><em>预计阅读时间 / Estimated reading time: 2 minutes</em></p>

<hr/>

<h2>真实场景 / Real Scenario</h2>

<p>你在做一个 <strong>实时股票 dashboard</strong>，需要：</p>
<p>1. 组件加载时订阅 WebSocket 数据流</p>
<p>2. 组件卸载时取消订阅（否则内存泄漏！）</p>
<p>3. 当股票代码（ticker）改变时，切换订阅</p>

<p><em>You&#x27;re building a real-time stock dashboard. You need to:</em></p>
<p>1. <em>Subscribe to WebSocket stream when component mounts</em></p>
<p>2. <em>Unsubscribe when component unmounts (or: memory leak!)</em></p>
<p>3. <em>Switch subscriptions when the ticker changes</em></p>

<p>这就是 <code>useEffect</code> 的经典使用场景。</p>
<p><em>This is the classic use case for <code>useEffect</code>.</em></p>

<hr/>

<h2>代码示例 / Code Example</h2>

<pre><code>
import { useEffect, useState } from &#x27;react&#x27;;

interface StockData {
  price: number;
  change: number;
}

function StockTicker({ symbol }: { symbol: string }) {
  const [data, setData] = useState&lt;StockData | null&gt;(null);

  useEffect(() =&gt; {
    // 1. Setup: runs after render
    console.log(`Subscribing to ${symbol}`);
    const ws = new WebSocket(`wss://stocks.example.com/${symbol}`);
    
    ws.onmessage = (event) =&gt; {
      setData(JSON.parse(event.data));
    };

    // 2. Cleanup: runs before next effect OR on unmount
    return () =&gt; {
      console.log(`Unsubscribing from ${symbol}`);
      ws.close(); // ← THIS IS CRITICAL
    };
  }, [symbol]); // 3. Dependency array: re-run when symbol changes

  if (!data) return &lt;div&gt;Loading...&lt;/div&gt;;
  return &lt;div&gt;{symbol}: ${data.price} ({data.change}%)&lt;/div&gt;;
}
</code></pre>

<hr/>

<h2>猜猜输出？ / What&#x27;s the output order?</h2>

<p><strong>场景：</strong> symbol 从 <code>&quot;AAPL&quot;</code> 改为 <code>&quot;GOOG&quot;</code></p>

<pre><code>
A) &quot;Subscribing to GOOG&quot;
B) &quot;Unsubscribing from AAPL&quot; → &quot;Subscribing to GOOG&quot;
C) &quot;Subscribing to GOOG&quot; → &quot;Unsubscribing from AAPL&quot;
D) 什么都不打印
</code></pre>

<details>
<summary>显示答案 / Show Answer</summary>

<p><strong>答案是 B</strong> — <code>&quot;Unsubscribing from AAPL&quot;</code> 先打印，然后 <code>&quot;Subscribing to GOOG&quot;</code></p>

<p>React 的执行顺序：</p>
<p>1. <code>symbol</code> prop 变化 → re-render</p>
<p>2. React 运行<strong>上一个 effect 的 cleanup</strong>（关闭旧 WebSocket）</p>
<p>3. React 运行<strong>新的 effect</strong>（打开新 WebSocket）</p>

<p>这就是为什么 cleanup 在 return 里：React 会在正确时机调用它。</p>
<p><em>This is why cleanup lives in the return value: React calls it at exactly the right moment.</em></p>

</details>

<hr/>

<h2>❌ 常见错误 vs ✅ 正确做法</h2>

<p><strong>❌ 忘记清理 / Forgetting cleanup:</strong></p>
<pre><code>
useEffect(() =&gt; {
  const interval = setInterval(fetchData, 1000);
  // ❌ No cleanup! Interval runs forever after unmount
}, []);
</code></pre>

<p><strong>✅ 正确清理 / Always clean up:</strong></p>
<pre><code>
useEffect(() =&gt; {
  const interval = setInterval(fetchData, 1000);
  return () =&gt; clearInterval(interval); // ✅
}, []);
</code></pre>

<p><strong>❌ 依赖项缺失 / Missing dependencies:</strong></p>
<pre><code>
useEffect(() =&gt; {
  fetchUser(userId); // ❌ userId used but not in deps
}, []); // stale closure! always fetches original userId
</code></pre>

<p><strong>✅ 正确依赖 / Correct dependencies:</strong></p>
<pre><code>
useEffect(() =&gt; {
  fetchUser(userId); // ✅
}, [userId]); // re-runs whenever userId changes
</code></pre>

<hr/>

<h2>三种 useEffect 形态 / Three Patterns</h2>

<pre><code>
// 1. Run ONCE on mount (componentDidMount equivalent)
useEffect(() =&gt; {
  initAnalytics();
  return () =&gt; cleanup(); // runs on unmount
}, []); // empty deps = run once

// 2. Run on every render (rarely needed)
useEffect(() =&gt; {
  document.title = `Count: ${count}`;
}); // no deps array = every render

// 3. Run when specific values change
useEffect(() =&gt; {
  fetchUserProfile(userId);
}, [userId]); // run when userId changes
</code></pre>

<hr/>

<h2>何时用 / 何时不用 / When to Use / When NOT to</h2>

<p><strong>✅ 适合用 useEffect:</strong></p>
<p>- API 数据获取（推荐用 React Query/SWR 封装）</p>
<p>- WebSocket / 事件监听器</p>
<p>- 第三方库集成（地图、图表）</p>
<p>- 浏览器 API（localStorage, document.title）</p>

<p><strong>❌ 不要用 useEffect:</strong></p>
<p>- 派生状态（用 <code>useMemo</code> 代替）</p>
<p>- 事件处理（直接用 event handler）</p>
<p>- 在 render 期间的数据转换（直接在组件里算）</p>

<p><strong>React 团队的建议 / React Team&#x27;s Take:</strong></p>
<p>&gt; &quot;You might not need an Effect&quot; — 很多 useEffect 可以被消除。</p>
<p>&gt; <em>Many Effects can be eliminated. Think twice before reaching for it.</em></p>

<hr/>

<h2>📚 References</h2>

<p>1. <a href="https://react.dev/learn/synchronizing-with-effects">React Docs: Synchronizing with Effects</a></p>
<p>2. <a href="https://react.dev/learn/you-might-not-need-an-effect">React Docs: You Might Not Need an Effect</a></p>
<p>3. <a href="https://overreacted.io/a-complete-guide-to-useeffect/">Dan Abramov: A Complete Guide to useEffect</a></p>

<hr/>

<h2>🧒 ELI5</h2>

<p><code>useEffect</code> 就像给房间装了一个&quot;进房间就开灯、出房间就关灯&quot;的传感器。你进去（组件挂载），灯亮了；你出来（组件卸载），灯自动灭。如果你换了房间（依赖变了），它先把旧房间的灯关掉，再把新房间的灯打开。</p>

<p><em>useEffect is like a room sensor that turns the light on when you enter and off when you leave. When you switch rooms (deps change), it turns off the old light before turning on the new one.</em></p>

<hr/>
<h1>🤖 AI</h1>
<h2>🤖 AI Day 11 — News Roundup</h2>
<p><em>2026年3月26日 / March 26, 2026</em></p>
<p><em>预计阅读时间 / Estimated reading time: 2 minutes</em></p>

<hr/>

<h2>📰 本周 AI 大事件 / This Week in AI</h2>

<p><em>Sources: Web search results from March 2026</em></p>

<hr/>

<h3>1. 🚀 OpenAI 发布 GPT-5.4 — AI&quot;数字同事&quot;时代来临</h3>

<p><strong>来源 / Source:</strong> <a href="https://www.riskinfo.ai/post/ai-insights-key-global-developments-in-march-2026">riskinfo.ai</a> | <a href="https://juliangoldie.co.uk/ai-news-march-2026/">juliangoldie.co.uk</a></p>

<p>GPT-5.4 于 3 月 5 日正式发布，最大亮点是<strong>原生计算机使用能力（Computer Use）</strong>——AI 可以直接操作真实软件环境（Excel、文档、网页），而不只是生成文字。在法律文件 benchmark 上达到 91% 准确率。</p>

<p><em>GPT-5.4 launched March 5 with native computer-use capabilities — AI can now directly interact with real software environments like spreadsheets and documents, moving toward a &quot;digital co-worker&quot; role. Achieved 91% on a legal-document benchmark.</em></p>

<p><strong>为什么你应该关心 / Why you should care:</strong></p>
<p>作为工程师，这意味着 AI agent 正在从&quot;问答工具&quot;变成&quot;能自主操作 GUI 的同事&quot;。未来的 AI 代码助手可能直接在你的 IDE 里操作文件、跑测试、提 PR——而不只是给出建议。</p>
<p><em>For engineers, this means AI agents are shifting from &quot;Q&amp;A tools&quot; to &quot;co-workers that operate a GUI on their own.&quot; Future coding assistants may manipulate files, run tests, and open PRs right in your IDE, not just offer suggestions.</em></p>

<hr/>

<h3>2. 🛡️ Anthropic 拒绝军事合同，被列为&quot;供应链风险&quot;</h3>

<p><strong>来源 / Source:</strong> <a href="https://radicaldatascience.wordpress.com/2026/03/20/ai-news-briefs-bulletin-board-for-march-2026/">radicaldatascience.wordpress.com</a></p>

<p>美国国防部要求 Anthropic 移除 Claude 的安全护栏（禁止自主武器使用），Anthropic 拒绝后被 DoD 列为&quot;供应链风险&quot;。这是 AI 安全与国家安全之间最直接的冲突之一。</p>

<p><em>The US DoD designated Anthropic as a &quot;supply chain risk&quot; after the company refused to remove safety guardrails prohibiting Claude&#x27;s use in autonomous weaponry.</em></p>

<p><strong>为什么你应该关心 / Why you should care:</strong></p>
<p>AI 公司的价值观选择正在产生真实商业后果。这场博弈将塑造未来 AI 系统的&quot;红线&quot;在哪里划定。</p>
<p><em>AI companies&#x27; value choices now carry real commercial consequences, and this standoff will shape where the &quot;red lines&quot; for future AI systems get drawn.</em></p>

<hr/>

<h3>3. 🎮 NVIDIA GTC 2026：AI 工厂 + 边缘计算引领下一波</h3>

<p><strong>来源 / Source:</strong> <a href="https://nvidianews.nvidia.com/news/nvidia-and-emerald-ai-join-leading-energy-companies-to-pioneer-flexible-ai-factories-as-grid-assets">nvidianews.nvidia.com</a> | <a href="https://www.vtnetzwelt.com/ai-development/latest-ai-technology-news-roundup-march-2026/">vtnetzwelt.com</a></p>

<p>NVIDIA 在 3 月 16 日 GTC 大会上主推&quot;AI 工厂&quot;概念：AI 算力中心既能生产 AI tokens，又能作为<strong>灵活电网资产</strong>调节用电。同时发布 Nemotron 3 Super，专为复杂 agentic 系统设计。</p>

<p><em>NVIDIA&#x27;s GTC 2026 highlighted &quot;AI factories&quot; — compute centers that generate AI tokens AND act as flexible grid assets. Nemotron 3 Super targets complex agentic AI workflows.</em></p>

<p><strong>为什么你应该关心 / Why you should care:</strong></p>
<p>AI 基础设施成本是行业最大变量之一。AI 工厂与能源网格整合，可能显著降低运算成本，从而让更多 AI 功能变得可行。</p>
<p><em>AI infrastructure cost is one of the industry&#x27;s biggest variables; integrating AI factories with the energy grid could cut compute costs enough to make many more AI features economically viable.</em></p>

<hr/>

<h3>4. 📊 76% 的企业：准备好迎接 AI Agent 了吗？还没有</h3>

<p><strong>来源 / Source:</strong> <a href="https://www.ey.com/en_gl/newsroom/2026/03/ey-survey-autonomous-ai-is-no-longer-theoretical-as-adoption-grows-despite-ongoing-trust-concerns">ey.com</a></p>

<p>EY 2026 AI Sentiment Report：76% 的企业承认<strong>运营流程还没准备好</strong>支持 agentic AI。最大障碍是：缺少结构化工作流、上下文传递不清晰、以及信任问题。</p>

<p><em>EY&#x27;s 2026 AI Sentiment Report: 76% of enterprises admit their operations are not yet ready to support agentic AI. Main blockers: unstructured workflows, unclear context handoffs, and trust gaps.</em></p>

<p><strong>为什么你应该关心 / Why you should care:</strong></p>
<p>这直接影响你作为工程师的工作重点。未来 1-3 年最有价值的技能：<strong>设计能与 AI Agent 协作的系统架构</strong>——清晰的 API 接口、可审计的工作流、幂等操作。</p>
<p><em>This directly shapes your priorities as an engineer. The most valuable skill for the next 1-3 years: <strong>designing architectures that AI agents can work with</strong>, meaning clear API contracts, auditable workflows, and idempotent operations.</em></p>

<hr/>

<h2>🔗 本周延伸阅读 / Further Reading</h2>

<p>- <a href="https://www.riskinfo.ai/post/ai-insights-key-global-developments-in-march-2026">AI Insights: Key Global Developments March 2026</a></p>
<p>- <a href="https://nvidianews.nvidia.com/news/nvidia-and-emerald-ai-join-leading-energy-companies-to-pioneer-flexible-ai-factories-as-grid-assets">NVIDIA AI Factories press release</a></p>
<p>- <a href="https://www.ey.com/en_gl/newsroom/2026/03/ey-survey-autonomous-ai-is-no-longer-theoretical-as-adoption-grows-despite-ongoing-trust-concerns">EY 2026 AI Sentiment Report</a></p>

<hr/>

<h2>🧒 ELI5</h2>

<p>AI 这周的新闻就像：有人造了一个超级厉害的机器人助手，会直接帮你操电脑；有家公司不愿意把机器人改成&quot;可以打仗&quot;的，结果被政府不喜欢了；还有调查说大多数公司虽然想用 AI 助手，但家里还没收拾好迎接它。</p>

<p><em>This week in AI: a super robot that can actually USE your computer arrived; one company refused to let their AI be used as a weapon and got in trouble for it; and a survey found most companies want AI workers but haven&#x27;t cleaned their house yet.</em></p>
]]></description>
    </item>
    <item>
      <title>byte-by-byte — 2026-03-25</title>
      <link>https://github.com/YushengAuggie/byte-by-byte/tree/main/archive</link>
      <guid isPermaLink="false">https://github.com/YushengAuggie/byte-by-byte/archive/2026-03-25</guid>
      <pubDate>Wed, 25 Mar 2026 12:00:00 +0000</pubDate>
      <description><![CDATA[<h1>Review</h1>
<h2>🔄 复习日 Day 10 / Review Day 10</h2>

<p><strong>Date:</strong> 2026-03-25 | <strong>Phase:</strong> Foundation</p>

<p>今天是复习日！回顾第 6-9 天的内容。</p>
<p>Today is Review Day! Looking back at Days 6–9.</p>

<hr/>

<p>📊 <strong>回顾范围 / Review Scope (Days 6–9):</strong></p>
<p>- 🏗️ Caching Strategies · Database Types · DB Indexing · DB Replication &amp; Sharding</p>
<p>- 💻 Top K Frequent Elements · Product of Array Except Self · Valid Sudoku · Valid Palindrome</p>
<p>- 🗣️ Balancing priorities · Simplifying complex systems · Ambiguous requirements · Pushing back on features</p>
<p>- 🎨 CSS Specificity · Positioning · Animations &amp; Transitions · React useState</p>
<p>- 🤖 Embeddings · Training vs Fine-Tuning vs Prompting</p>

<hr/>

<h2>📝 Quick Quiz — 3 Mini-Reviews</h2>

<h3>Q1: [🏗️ System Design] Caching + Replication</h3>

<p>你在设计一个读多写少的社交平台（Read-heavy social feed）。你决定同时使用 <strong>缓存</strong> 和 <strong>数据库只读副本（Read Replicas）</strong>。</p>

<p>You&#x27;re designing a read-heavy social feed. You use both <strong>caching</strong> and <strong>read replicas</strong>.</p>

<p><strong>问题 / Question:</strong> 这两者的职责分工是什么？什么情况下读副本仍然不够，你必须依赖缓存？</p>

<p><em>What is the distinct role of each? When are read replicas still insufficient and you MUST rely on the cache?</em></p>

<details><summary>显示答案 / Show Answer</summary>

<p><strong>Read replicas</strong> 通过分散读请求到多个数据库副本来提升吞吐量，但每次请求仍然执行完整的 SQL 查询，延迟在毫秒级别。<strong>缓存</strong>（Redis/Memcached）则把热点数据存在内存里，延迟在微秒级别，且完全绕开了数据库。</p>

<p>Read replicas scale <em>throughput</em> by distributing SQL queries across copies, but every request still hits disk and parses SQL — latency stays in the millisecond range. A <strong>cache</strong> holds hot data in RAM (microsecond latency) and bypasses the database entirely.</p>

<p><strong>什么时候必须用缓存 / When you MUST cache:</strong></p>
<p>1. 热点数据（Hotspot / Celebrity problem）：单个用户/item 被疯狂读取，单个副本也扛不住</p>
<p>2. 计算昂贵的结果（Aggregations, ranked feeds）：不想每次都重新算</p>
<p>3. 外部 API 响应缓存：副本根本不存这些数据</p>
<p><em>1. Hotspots (the celebrity problem): a single user/item is read so heavily that no one replica can keep up. 2. Expensive computed results (aggregations, ranked feeds) you don&#x27;t want to recompute. 3. External API responses, which replicas don&#x27;t store at all.</em></p>

<p><strong>关键洞察:</strong> Read replica = 水平扩展数据库。Cache = 彻底逃离数据库。两者互补，不是替代关系。</p>
<p><em>Key insight: read replicas scale the database horizontally; a cache escapes the database entirely. They complement, rather than replace, each other.</em></p>

</details>
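<p><strong>代码示意 / Code Sketch:</strong> 上述分工可以用 cache-aside 读路径来演示（接口均为假设，<code>db_read</code>/<code>db_write</code> 代指真实的副本与主库客户端）。<em>A sketch of the division of labor above as a cache-aside read path; the interfaces are hypothetical stand-ins for real replica/primary clients.</em></p>

```python
import time

class CacheAside:
    """Cache-aside read path: try cache first, fall back to a read replica."""

    def __init__(self, db_read, db_write, ttl=60):
        self.db_read = db_read      # e.g. query a read replica
        self.db_write = db_write    # e.g. write through the primary
        self.ttl = ttl
        self.store = {}             # key -> (value, expires_at)

    def get(self, key):
        hit = self.store.get(key)
        if hit and hit[1] > time.time():
            return hit[0]                       # fast path: no database touched
        value = self.db_read(key)               # slow path: replica query
        self.store[key] = (value, time.time() + self.ttl)
        return value

    def set(self, key, value):
        self.db_write(key, value)
        self.store.pop(key, None)               # invalidate, don't update in place

# Tiny demo: a dict stands in for the database, and we count replica reads.
db = {"user:1": "Ada"}
reads = []
def db_read(k):
    reads.append(k)
    return db[k]

cache = CacheAside(db_read, db.__setitem__)
cache.get("user:1")
cache.get("user:1")
print(len(reads))  # 1: the second read is served from the cache
```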

<hr/>

<h3>Q2: [💻 Algorithms] Product of Array Except Self</h3>

<p>给定数组 <code>[1, 2, 3, 4]</code>，Product of Array Except Self 要求返回 <code>[24, 12, 8, 6]</code>——即每个位置是除自身以外所有元素的乘积。</p>

<p>Given <code>[1, 2, 3, 4]</code>, return <code>[24, 12, 8, 6]</code> — each position is the product of all other elements.</p>

<p><strong>问题 / Question:</strong> 不使用除法、O(n) 时间、O(1) 额外空间（输出数组不算）怎么做？解释 <strong>prefix product + suffix product</strong> 的思路。</p>

<p><em>Without division, O(n) time, O(1) extra space — explain the prefix × suffix approach.</em></p>

<details><summary>显示答案 / Show Answer</summary>

<p><strong>核心思路 / Core idea:</strong> 位置 <code>i</code> 的答案 = <code>i</code> 左边所有数的乘积 × <code>i</code> 右边所有数的乘积。</p>

<pre><code>
Array:   [1,  2,  3,  4]
Prefix:  [1,  1,  2,  6]   # prefix[i] = product of all elements to the LEFT of i
Suffix:  [24, 12, 4,  1]   # suffix[i] = product of all elements to the RIGHT of i
Result:  [24, 12, 8,  6]   # prefix[i] * suffix[i]
</code></pre>

<p><strong>两趟扫描 / Two-pass O(1) space:</strong></p>
<p>- Pass 1 (left→right): 用输出数组存 prefix product</p>
<p>- Pass 2 (right→left): 用一个变量 <code>suffix</code> 滚动累乘，直接乘进输出数组</p>

<pre><code>
def productExceptSelf(nums):
    n = len(nums)
    res = [1] * n
    # Pass 1: res[i] = product of everything to the LEFT
    for i in range(1, n):
        res[i] = res[i-1] * nums[i-1]
    # Pass 2: multiply in suffix product on the fly
    suffix = 1
    for i in range(n-1, -1, -1):
        res[i] *= suffix
        suffix *= nums[i]
    return res
</code></pre>

<p><strong>为什么不用除法？</strong> 题目禁止除法，而且数组可能含 0：总乘积为 0 时，无法通过除以 <code>nums[i]</code> 还原答案（除以 0 未定义）。</p>
<p><em>Why no division? The problem forbids it, and the array may contain zeros: with a zero total product you cannot recover answers by dividing by <code>nums[i]</code> (division by zero is undefined).</em></p>

<p><strong>举一反三:</strong> 这个 &quot;prefix + suffix scan&quot; 模式在 <strong>Trapping Rain Water</strong> 中也用到了——左右两侧最大值中较小的那个决定每格存水量。</p>
<p><em>Pattern transfer: the same prefix + suffix scan appears in Trapping Rain Water, where the smaller of the left and right running maxima determines the water above each cell.</em></p>

</details>

<hr/>

<h3>Q3: [🎨 Frontend] React useState</h3>

<p>看这段代码：</p>

<pre><code>
function Counter() {
  const [count, setCount] = React.useState(0);

  const handleClick = () =&gt; {
    setCount(count + 1);
    setCount(count + 1);
    setCount(count + 1);
  };

  return &lt;button onClick={handleClick}&gt;Count: {count}&lt;/button&gt;;
}
</code></pre>

<p><strong>问题 / Question:</strong> 点击一次按钮后，count 变成几？为什么？如果想让 count 每次点击增加 3，怎么修改？</p>

<p><em>After one click, what is count? Why? How do you fix it to actually add 3?</em></p>

<details><summary>显示答案 / Show Answer</summary>

<p><strong>答案是 1，不是 3。</strong></p>

<p><strong>原因 / Why:</strong> React 在同一个事件处理函数中会<strong>批量处理（batch）</strong> setState 调用。<code>count</code> 在整个 <code>handleClick</code> 执行期间都是 <strong>快照值（stale closure）</strong>，始终是 <code>0</code>。所以三次 <code>setCount(0 + 1)</code> 其实是三次设置同一个值 <code>1</code>。</p>
<p><em>React batches setState calls within one event handler, and <code>count</code> stays a stale snapshot (<code>0</code>) for the whole of <code>handleClick</code>, so all three calls set the same value: <code>1</code>.</em></p>

<p><strong>如何修复 / Fix — 使用函数式更新（functional update form）:</strong></p>

<pre><code>
const handleClick = () =&gt; {
  setCount(prev =&gt; prev + 1);  // prev = 0 → 1
  setCount(prev =&gt; prev + 1);  // prev = 1 → 2
  setCount(prev =&gt; prev + 1);  // prev = 2 → 3
};
</code></pre>

<p>传入函数时，React 会把<strong>最新的 state 值</strong>作为参数传入，而不是用闭包里的快照。</p>
<p><em>When you pass a function, React calls it with the latest state value instead of the closure&#x27;s snapshot.</em></p>

<p><strong>黄金法则 / Golden Rule:</strong> 当新 state 依赖旧 state 时，<strong>永远</strong>用 <code>setX(prev =&gt; ...)</code> 形式，而不是 <code>setX(x + 1)</code>。这在并发模式（React 18+ Concurrent Features）下尤其重要。</p>
<p><em>Whenever new state depends on old state, always use the <code>setX(prev =&gt; ...)</code> form rather than <code>setX(x + 1)</code>; this matters even more under React 18+ concurrent features.</em></p>

</details>

<hr/>

<p>💡 <em>复习巩固记忆，螺旋式上升。每次复习都是在加深神经连接。</em></p>
<p><em>Review strengthens memory through spaced repetition — you&#x27;re literally deepening neural pathways each time.</em></p>

<p>📅 <strong>明天继续新内容！Day 11 starts tomorrow!</strong></p>

<hr/>

<p><em>Generated: 2026-03-25 | byte-by-byte Day 10 Review</em></p>
]]></description>
    </item>
    <item>
      <title>byte-by-byte — 2026-03-24</title>
      <link>https://github.com/YushengAuggie/byte-by-byte/tree/main/archive</link>
      <guid isPermaLink="false">https://github.com/YushengAuggie/byte-by-byte/archive/2026-03-24</guid>
      <pubDate>Tue, 24 Mar 2026 12:00:00 +0000</pubDate>
      <description><![CDATA[<h1>🏗️ System Design</h1>
<h2>🏗️ 系统设计 Day 9 / System Design Day 9</h2>

<p><strong>主题 / Topic:</strong> 数据库复制与分片 / Database Replication &amp; Sharding</p>

<hr/>

<h2>🌏 真实场景 / Real-World Scenario</h2>

<p>想象你在设计一个像微信读书或 Goodreads 的阅读应用——用户突破 5000 万，每天产生几亿条阅读记录、笔记和评论。单一数据库服务器已经撑不住了：写操作堵住读操作，单点故障导致整个 App 不可用，数据量超出单机磁盘上限。</p>

<p>你需要两把利器：<strong>复制（Replication）</strong>解决可用性和读性能，<strong>分片（Sharding）</strong>解决写性能和存储规模。</p>

<p>Imagine you&#x27;re designing a reading app like Goodreads at 50M users, with hundreds of millions of reading records daily. A single database server buckles under load. You need <strong>Replication</strong> for availability &amp; read scale, and <strong>Sharding</strong> for write scale &amp; storage capacity.</p>

<hr/>

<h2>🏛️ 架构图 / Architecture Diagram</h2>

<pre><code>
┌─────────────────────────────────────────────────────────┐
│                    应用服务层 / App Layer                │
│         [API Server 1] [API Server 2] [API Server 3]    │
└─────────┬──────────────────────────┬────────────────────┘
          │ Writes                    │ Reads
          ▼                           ▼
┌─────────────────┐        ┌────────────────────────┐
│   Primary DB    │──────► │  Read Replica 1        │
│ (Leader/Master) │──────► │  Read Replica 2        │
│                 │──────► │  Read Replica 3        │
└────────┬────────┘        └────────────────────────┘
         │ Replication Log (WAL / Binlog)
         │
         ▼  [After Replication → Add Sharding]
┌────────────────────────────────────────────────────┐
│                  Shard Router / Proxy              │
│           (e.g. Vitess, ProxySQL, PgBouncer)       │
└────┬──────────────────┬───────────────────┬────────┘
     │                  │                   │
     ▼                  ▼                   ▼
┌─────────┐       ┌─────────┐        ┌─────────┐
│ Shard 0 │       │ Shard 1 │        │ Shard 2 │
│user 0-33M│      │user33-66M│       │user66M+ │
│+Replicas│       │+Replicas│        │+Replicas│
└─────────┘       └─────────┘        └─────────┘
</code></pre>

<hr/>

<h2>⚖️ 关键权衡 / Key Tradeoffs</h2>

<h3>复制 / Replication</h3>

<p>| 方案 | 优点 | 缺点 |</p>
<p>|------|------|------|</p>
<p>| <strong>同步复制</strong> | 强一致性，不丢数据 | 写延迟高（等所有副本确认） |</p>
<p>| <strong>异步复制</strong> | 写延迟低，吞吐高 | 副本可能有延迟（replication lag） |</p>
<p>| <strong>半同步</strong> | 折中：至少 1 个副本确认 | 稍高写延迟，部分一致性 |</p>

<p><strong>为什么这样设计？</strong></p>
<p>- 读多写少的业务（如阅读记录）：异步复制 + 多读副本，读吞吐可水平扩展</p>
<p>- 金融、支付场景：同步复制或 Raft/Paxos 保证强一致</p>
<p><em>Why: read-heavy workloads (like reading records) use async replication plus read replicas to scale read throughput horizontally; finance and payment flows use synchronous replication or Raft/Paxos for strong consistency.</em></p>

<h3>分片 / Sharding</h3>

<p>| 策略 | 原理 | 权衡 / Tradeoff |</p>
<p>|------|------|------|</p>
<p>| <strong>Range Sharding</strong> | 按 user_id 范围切分 | 范围查询友好，但热点风险高 |</p>
<p>| <strong>Hash Sharding</strong> | <code>shard = hash(user_id) % N</code> | 均匀分布，但范围查询跨 shard |</p>
<p>| <strong>Directory Sharding</strong> | 查表确定归属 shard | 灵活，但查表本身是瓶颈 |</p>
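<p>上面的 Hash Sharding 可以用几行 Python 粗略演示（<code>NUM_SHARDS</code> 等名称均为示例；生产中通常用一致性哈希以便扩容）：/ The Hash Sharding row above can be sketched in a few lines of Python (names like <code>NUM_SHARDS</code> are illustrative; production systems usually prefer consistent hashing to ease resharding):</p>

```python
import hashlib

NUM_SHARDS = 3  # illustrative shard count

def shard_for(user_id: int) -> int:
    # Use a stable hash (md5 here) rather than Python's built-in hash(),
    # which is randomized per process and therefore unusable for routing.
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Same user always routes to the same shard, so per-user queries stay single-shard.
assert shard_for(42) == shard_for(42)
```

<p>注意 <code>% N</code> 的弱点：N 一变，几乎所有 key 都要搬迁，这正是一致性哈希要缓解的问题。/ Note the weakness of <code>% N</code>: changing N remaps almost every key, which is exactly what consistent hashing mitigates.</p>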

<hr/>

<h2>🚫 常见坑 / Common Mistakes</h2>

<p><strong>坑 1：过早分片</strong></p>
<p>&gt; 分片大幅增加系统复杂度。复制 + 读副本能抗住大多数流量，先用它，真正撑不住再分片。</p>

<p><strong>坑 2：选错 Shard Key</strong></p>
<p>&gt; 按时间分片会导致最新 shard 永远是热点（写都打到最新月份）。按用户 ID hash 分片更均匀。</p>

<p><strong>坑 3：跨 Shard 事务</strong></p>
<p>&gt; 分布式事务极复杂。设计 schema 时尽量让同一用户的数据在同一 shard，避免跨 shard join。</p>

<p><strong>坑 4：忽略 Replication Lag</strong></p>
<p>&gt; 用户刚发评论，立刻刷新却看不到——因为读副本还没同步。对强一致性操作，读 Primary 或使用 read-your-writes 路由。</p>
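<p>&quot;read-your-writes 路由&quot;的一个极简示意（纯演示代码：<code>PIN_SECONDS</code> 是假设值，真实系统常用 GTID/LSN 判断副本是否追上）：/ A minimal sketch of read-your-writes routing (illustrative only: <code>PIN_SECONDS</code> is an assumed value; real systems often compare replica GTID/LSN instead):</p>

```python
import time

class ReadYourWritesRouter:
    """Pin a user's reads to the primary for a short window after they write."""
    PIN_SECONDS = 5.0  # assumed window, tune to your replication lag

    def __init__(self):
        self._last_write_at = {}  # user_id -> timestamp of last write

    def record_write(self, user_id):
        self._last_write_at[user_id] = time.monotonic()

    def choose(self, user_id):
        ts = self._last_write_at.get(user_id)
        if ts is not None and time.monotonic() - ts < self.PIN_SECONDS:
            return "primary"   # replica may still lag: read your own write from the leader
        return "replica"

router = ReadYourWritesRouter()
print(router.choose(1))   # "replica": no recent write, safe to read a follower
router.record_write(1)
print(router.choose(1))   # "primary": just wrote, pin to the leader
```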

<hr/>

<h2>📚 参考资料 / References</h2>

<p>1. [AWS Database Replication — RDS Read Replicas](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_ReadRepl.html)</p>
<p>2. [Vitess — MySQL Sharding at YouTube Scale](https://vitess.io/docs/concepts/sharding/)</p>
<p>3. [Designing Data-Intensive Applications — Chapter 5 &amp; 6 (Kleppmann)](https://dataintensive.net/)</p>

<hr/>

<h2>🧒 ELI5 / 用小孩能理解的话说</h2>

<p><strong>复制</strong>就像把书抄写多份，放在不同图书馆。每个图书馆都能借给你看（读副本），但只有总馆能修改（Primary）。</p>

<p><strong>分片</strong>就像把全班同学的作业按学号分给 3 个老师批改——不再一个老师批所有作业，每个老师只负责一段。</p>

<p><strong>Replication</strong> = Make copies of the book so more people can read at once.</p>
<p><strong>Sharding</strong> = Split the library into sections so no single librarian is overwhelmed.</p>

<hr/>
<h1>💻 Algorithms</h1>
<h2>💻 算法 Day 10 / Algorithms Day 10</h2>

<p><strong>#125 Valid Palindrome (Easy)</strong> — 双指针模式 / Two Pointers</p>

<hr/>

<h2>🧩 新模式 / New Pattern: 双指针模式 (Two Pointers)</h2>

<p>📍 这个模式块共 5 道题 / This block: 5 problems</p>

<p>| # | 题目 | 难度 |</p>
<p>|---|------|------|</p>
<p>| 1 | #125 Valid Palindrome ← <strong>今天 / TODAY</strong> | 🟢 Easy |</p>
<p>| 2 | #167 Two Sum II | 🟡 Medium |</p>
<p>| 3 | #15 3Sum | 🟡 Medium |</p>
<p>| 4 | #11 Container With Most Water | 🟡 Medium |</p>
<p>| 5 | #42 Trapping Rain Water | 🔴 Hard |</p>

<hr/>

<h3>什么时候用 / When to Use</h3>

<p>排序数组中找配对、回文检测、原地操作时，想到双指针。</p>

<p>Use Two Pointers when: sorted array + find a pair, palindrome detection, in-place removal, merging sorted arrays.</p>

<h3>识别信号 / Signals</h3>

<p>&gt; sorted array · find pair with sum · palindrome · remove in-place · merge sorted · container/water problems</p>

<h3>通用模版 / Template</h3>

<pre><code>
def two_pointer_template(arr, target):
    left, right = 0, len(arr) - 1
    
    while left &lt; right:
        current = arr[left] + arr[right]      # or some condition on left/right
        
        if current == target:
            return [left, right]              # found it
        elif current &lt; target:
            left += 1                         # need bigger value → move left pointer right
        else:
            right -= 1                        # need smaller value → move right pointer left
    
    return []                                 # not found
</code></pre>

<p><strong>核心洞察 / Key Insight:</strong> 排序 + 两端逼近，从 O(n²) 嵌套循环降到 O(n) 单次扫描。</p>
<p>Sorted order + converging from both ends → eliminates the need for nested loops.</p>
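<p>模版可以直接在一个有序数组上跑一遍验证（数值仅为示例）：/ The template can be exercised directly on a sorted array (values are just examples):</p>

```python
def two_pointer_template(arr, target):
    # Same template as above: converge from both ends of a sorted array.
    left, right = 0, len(arr) - 1
    while left < right:
        current = arr[left] + arr[right]
        if current == target:
            return [left, right]
        elif current < target:
            left += 1    # need a bigger sum
        else:
            right -= 1   # need a smaller sum
    return []

print(two_pointer_template([1, 3, 4, 7, 11], 14))  # [1, 4]  (3 + 11)
print(two_pointer_template([1, 3, 4, 7, 11], 2))   # []  (no pair sums to 2)
```

<p>每步至少移动一个指针，所以整体是单次 O(n) 扫描。/ Each step moves at least one pointer, so the whole search is a single O(n) pass.</p>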

<hr/>

<h2>📖 今日题目 / Today&#x27;s Problem</h2>

<p>🔗 [LeetCode #125 — Valid Palindrome](https://leetcode.com/problems/valid-palindrome/) 🟢 Easy</p>
<p>📹 [NeetCode 讲解](https://neetcode.io/problems/is-palindrome)</p>

<hr/>

<h3>🌍 现实类比 / Real-World Analogy</h3>

<p>想象你是一个质检员，要验证一条传送带上的字符串&quot;从两头读是否一样&quot;。你派两个检查员分别站在传送带两端，同时向中间走，每步对比字母（跳过非字母数字的字符）。两人相遇时没有发现不同，就通过！</p>

<p>Think of two inspectors walking from both ends of a conveyor belt toward the middle, each checking only alphanumeric items and skipping punctuation/spaces.</p>

<hr/>

<h3>🧩 如何映射到模版 / Mapping to Template</h3>

<p>经典双指针，但有两个变化：</p>
<p>1. <strong>不是排序数组</strong>——我们用双指针做&quot;对比&quot;而不是&quot;求和&quot;</p>
<p>2. <strong>需要跳过非字母数字字符</strong>——在移动指针前先跳过无效字符</p>

<p>Classic Two Pointers, with two modifications:</p>
<p>1. No sorted array → use pointers for <strong>comparison</strong>, not sum-seeking</p>
<p>2. Skip non-alphanumeric chars before comparing</p>

<pre><code>
def isPalindrome(s: str) -&gt; bool:
    left, right = 0, len(s) - 1
    
    while left &lt; right:
        # Skip non-alphanumeric from the left
        while left &lt; right and not s[left].isalnum():
            left += 1
        # Skip non-alphanumeric from the right
        while left &lt; right and not s[right].isalnum():
            right -= 1
        
        # Compare (case-insensitive)
        if s[left].lower() != s[right].lower():
            return False
        
        left += 1
        right -= 1
    
    return True
</code></pre>

<hr/>

<h3>🔍 代码追踪 / Code Trace</h3>

<p>Input: <code>&quot;A man, a plan, a canal: Panama&quot;</code></p>

<pre><code>
left=0  right=29  → &#x27;A&#x27; vs &#x27;a&#x27; → match  → left=1,  right=28
left=1  right=28  → skip &#x27; &#x27;   → left=2
left=2  right=28  → &#x27;m&#x27; vs &#x27;m&#x27; → match  → left=3,  right=27
left=3  right=27  → &#x27;a&#x27; vs &#x27;a&#x27; → match  → left=4,  right=26
left=4  right=26  → &#x27;n&#x27; vs &#x27;n&#x27; → match  → ...
...
→ All chars match → return True ✅
</code></pre>

<p>Input: <code>&quot;race a car&quot;</code></p>

<pre><code>
left=0  right=9   → &#x27;r&#x27; vs &#x27;r&#x27; → match
left=1  right=8   → &#x27;a&#x27; vs &#x27;a&#x27; → match
left=2  right=7   → &#x27;c&#x27; vs &#x27;c&#x27; → match
left=3  right=6   → skip &#x27; &#x27;   → right=5
left=3  right=5   → &#x27;e&#x27; vs &#x27;a&#x27; → ❌ MISMATCH → return False
</code></pre>

<hr/>

<h3>📊 复杂度 / Complexity</h3>

<p>| | Time | Space |</p>
<p>|---|------|-------|</p>
<p>| Two Pointer | <strong>O(n)</strong> | <strong>O(1)</strong> |</p>
<p>| Built-in reverse | O(n) | O(n) — creates new string |</p>

<p><strong>Space O(1) is the win here</strong> — we never create a cleaned copy of the string.</p>
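<p>作为对照，表中 &quot;Built-in reverse&quot; 那一行对应的 O(n) 空间写法（先构造清洗后的副本，再与反转比较）：/ For contrast, the O(n)-space approach from the table&#x27;s &quot;Built-in reverse&quot; row (build a cleaned copy, then compare with its reverse):</p>

```python
def is_palindrome_reverse(s: str) -> bool:
    # O(n) extra space: materializes a cleaned, lowercased copy of the string.
    cleaned = [c.lower() for c in s if c.isalnum()]
    return cleaned == cleaned[::-1]

print(is_palindrome_reverse("A man, a plan, a canal: Panama"))  # True
print(is_palindrome_reverse("race a car"))                      # False
```

<p>两种写法时间都是 O(n)；面试中先写这个再主动指出双指针省空间，也是加分点。/ Both are O(n) time; writing this first and then volunteering the O(1)-space two-pointer version is itself a good interview move.</p>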

<hr/>

<h3>🔄 举一反三 / Pattern Connections</h3>

<p>这道题是双指针的&quot;热身&quot;——纯粹的左右逼近。接下来的题目会在这个基础上加难度：</p>

<p>| 题目 | 变化 | 核心差异 |</p>
<p>|------|------|---------|</p>
<p>| <strong>#167 Two Sum II</strong> | 有序数组找和 | 移动指针基于 sum vs target |</p>
<p>| <strong>#15 3Sum</strong> | 三数之和 | 固定一个数 + 双指针找剩余两个 |</p>
<p>| <strong>#11 Container With Most Water</strong> | 面积最大化 | 移动较短的那边指针 |</p>
<p>| <strong>#42 Trapping Rain Water</strong> | 复杂水位计算 | 双指针维护左右最大高度 |</p>

<hr/>

<h2>📚 参考资料 / References</h2>

<p>1. [LeetCode #125 — Valid Palindrome](https://leetcode.com/problems/valid-palindrome/)</p>
<p>2. [NeetCode — Two Pointers Pattern](https://neetcode.io/roadmap)</p>
<p>3. [Python str.isalnum() docs](https://docs.python.org/3/library/stdtypes.html#str.isalnum)</p>

<hr/>

<h2>🧒 ELI5 / 用小孩能理解的话说</h2>

<p>回文就像照镜子——左边和右边要一样。我们用两只手，一只从左摸，一只从右摸，跳过空格和标点，对比每个字母。如果两只手中间相遇了都没发现不同，就是回文！</p>

<p>A palindrome is like a mirror — left side = right side. We use two fingers, one from each end, skip spaces/punctuation, compare each letter. If both fingers meet in the middle without finding a mismatch → palindrome!</p>

<hr/>
<h1>🗣️ Soft Skills</h1>
<h2>🗣️ 软技能 Day 9 / Soft Skills Day 9</h2>

<p><strong>主题 / Topic:</strong> 利益相关方管理 / Stakeholder Management</p>
<p><strong>问题 / Question:</strong> Describe a time you had to push back on a feature or requirement. Why?</p>

<hr/>

<h2>💡 为什么这道题很重要 / Why This Matters</h2>

<p>在高级工程师面试中，面试官不只想知道你&quot;会写代码&quot;——他们想知道你能不能独立判断、有没有勇气说出&quot;这个需求有问题&quot;。盲目执行坏需求是初级工程师的行为；能够用数据和逻辑推动正确方向，是 Senior/Staff 的核心能力。</p>

<p>Interviewers want to know you&#x27;re not just a &quot;feature factory.&quot; Senior engineers own outcomes, not just outputs. Pushing back constructively — with data, not attitude — is a core competency at L5+.</p>

<hr/>

<h2>⭐ STAR 拆解 / STAR Breakdown</h2>

<h3>Situation（情境）</h3>
<p>&gt; 设置背景：什么团队？什么项目阶段？紧迫程度？</p>

<p>&quot;我们的 PM 要求在发布前两周新增一个实时用户追踪功能，当时系统负载已经接近上限。&quot;</p>

<p>&quot;Our PM requested a real-time user tracking feature two weeks before a major launch, when our system load was already near capacity.&quot;</p>

<h3>Task（任务）</h3>
<p>&gt; 你的职责是什么？你为什么有发言权？</p>

<p>&quot;作为负责后端基础设施的 Senior Engineer，我需要评估这个需求的可行性和风险。&quot;</p>

<p>&quot;As the Senior Engineer owning backend infra, I needed to assess feasibility and surface the technical risk.&quot;</p>

<h3>Action（行动）</h3>
<p>&gt; 这是核心！展示你如何<strong>有据可查地</strong>推回，而不是情绪化地拒绝。</p>

<p>1. <strong>量化风险：</strong> 我跑了负载测试，展示新功能会把 P99 延迟从 120ms 推高到 650ms</p>
<p>2. <strong>提出替代方案：</strong> 建议将实时追踪改为批量日志，延迟 24h 但不影响核心体验</p>
<p>3. <strong>对齐业务目标：</strong> 确认 PM 真正想要的是&quot;数据分析能力&quot;而不是&quot;实时性&quot;——批量方案完全满足</p>
<p>4. <strong>共识达成：</strong> 带着数据找 PM + 工程总监开了 30 分钟会议，最终采用我的方案</p>

<p>&quot;I ran load tests showing P99 latency would spike from 120ms to 650ms. I proposed batch logging instead — same data, 24h delay, zero performance impact. I aligned with PM on the real goal (analytics, not real-time), and brought data to a 30-min meeting with PM and Eng Director. We shipped the batch solution.&quot;</p>

<h3>Result（结果）</h3>
<p>&gt; 用数字说话。</p>

<p>&quot;发布如期进行，零性能事故。批量数据方案在发布后三个月上线，PM 反馈数据质量超出预期。这次经验也推动团队建立了需求评审中的技术可行性评估流程。&quot;</p>

<p>&quot;Launch shipped on time with zero incidents. The batch analytics shipped 3 months post-launch and exceeded data quality expectations. The experience led to establishing a technical feasibility step in our requirements review process.&quot;</p>

<hr/>

<h2>❌ 别这么说 / Bad vs ✅ 这么说 / Good</h2>

<p>| ❌ 踩坑 | ✅ 正解 |</p>
<p>|--------|--------|</p>
<p>| &quot;我直接告诉 PM 这个要求太蠢了&quot; | &quot;我跑了测试，把风险用数据量化&quot; |</p>
<p>| &quot;我认为这不重要，所以不做&quot; | &quot;我先理解他们的真实目标，再提替代方案&quot; |</p>
<p>| &quot;最终我没能阻止，还是做了&quot; | &quot;我确保所有决策者都了解风险，决策有据可查&quot; |</p>
<p>| &quot;我们就这么做了&quot; → 没有结果 | 说清楚结果：上线情况、用户影响、后续改进 |</p>

<hr/>

<h2>🚀 Senior/Staff 加分点 / Senior+ Tips</h2>

<p>1. <strong>系统化推回，而非感情化拒绝。</strong> 数据 &gt; 直觉。Load test、cost model、用户影响分析——让数字说话。</p>
<p>2. <strong>先理解&quot;为什么&quot;，再评估&quot;怎么做&quot;。</strong> 很多&quot;坏需求&quot;背后有合理的业务原因，找到根本目标才能提出真正有价值的替代方案。</p>
<p>3. <strong>建立信任储备。</strong> 平时 deliver 靠谱，关键时刻的推回才会被认真对待。</p>
<p>4. <strong>把决策过程文档化。</strong> 即使你没能推回成功，确保风险已被知晓和记录，保护自己也保护团队。</p>

<hr/>

<h2>🎯 Key Takeaways</h2>

<p>- 推回 ≠ 拒绝。推回 = 用专业判断守护产品质量。</p>
<p>- Push back = professional judgment, not obstruction.</p>
<p>- 永远带着数据和替代方案去谈，而不是空手说&quot;不行&quot;。</p>
<p>- Always come with data + alternatives, never just &quot;no.&quot;</p>
<p>- 好的推回最终是双赢：工程质量 + 业务目标都得到保护。</p>

<hr/>

<h2>📚 参考资料 / References</h2>

<p>1. [The Engineering Manager&#x27;s Handbook — Pushing Back Effectively](https://www.engmanager.com/)</p>
<p>2. [Staff Engineer: Leadership Beyond the Management Track (Will Larson)](https://staffeng.com/book)</p>
<p>3. [How to Disagree Productively — First Round Review](https://review.firstround.com/how-to-disagree-productively-and-find-common-ground/)</p>

<hr/>

<h2>🧒 ELI5 / 用小孩能理解的话说</h2>

<p>如果你的朋友说&quot;我们现在去游泳吧&quot;，但你知道外面在下大雨，你不是直接说&quot;不去&quot;，而是说&quot;你想游泳吗？那我们去室内游泳池！&quot;——这就是有建设性的推回。</p>

<p>If a friend says &quot;let&#x27;s swim now!&quot; but it&#x27;s raining, you don&#x27;t just say &quot;no&quot; — you say &quot;want to swim? Let&#x27;s go to the indoor pool!&quot; That&#x27;s constructive pushback.</p>

<hr/>
<h1>🎨 Frontend</h1>
<h2>🎨 前端 Day 9 / Frontend Day 9</h2>

<p><strong>主题 / Topic:</strong> React <code>useState</code> — 触发重渲染的状态 / State That Triggers Re-renders</p>

<hr/>

<h2>🌏 真实场景 / Real Scenario</h2>

<p>你在做一个任务管理 dashboard，点击按钮需要切换&quot;显示已完成任务&quot;的筛选器。你需要一个变量来记住当前状态，而且每次改变时 UI 要自动更新。这就是 <code>useState</code> 的舞台。</p>

<p>You&#x27;re building a task dashboard. Clicking a button should toggle showing completed tasks. You need a variable that remembers its value AND automatically updates the UI when it changes. That&#x27;s <code>useState</code>.</p>

<hr/>

<h2>💻 代码示例 / Code Snippet</h2>

<pre><code>
import { useState } from &#x27;react&#x27;

interface Task {
  id: number
  title: string
  completed: boolean
}

function TaskDashboard() {
  // useState returns [current value, setter function]
  const [showCompleted, setShowCompleted] = useState(false)
  const [tasks] = useState&lt;Task[]&gt;([
    { id: 1, title: &#x27;Review PR&#x27;, completed: true },
    { id: 2, title: &#x27;Write tests&#x27;, completed: false },
    { id: 3, title: &#x27;Deploy staging&#x27;, completed: true },
  ])

  // Derived state — computed from existing state, no useState needed
  const visibleTasks = showCompleted
    ? tasks
    : tasks.filter(t =&gt; !t.completed)

  return (
    &lt;div&gt;
      &lt;button onClick={() =&gt; setShowCompleted(prev =&gt; !prev)}&gt;
        {showCompleted ? &#x27;Hide&#x27; : &#x27;Show&#x27;} Completed ({tasks.filter(t =&gt; t.completed).length})
      &lt;/button&gt;
      &lt;ul&gt;
        {visibleTasks.map(task =&gt; (
          &lt;li key={task.id} style={{ opacity: task.completed ? 0.5 : 1 }}&gt;
            {task.title}
          &lt;/li&gt;
        ))}
      &lt;/ul&gt;
    &lt;/div&gt;
  )
}
</code></pre>

<hr/>

<h2>🧠 猜猜输出 / What Does This Output?</h2>

<pre><code>
function Counter() {
  const [count, setCount] = useState(0)
  
  const handleClick = () =&gt; {
    setCount(count + 1)
    setCount(count + 1)
    setCount(count + 1)
  }
  
  return &lt;button onClick={handleClick}&gt;Count: {count}&lt;/button&gt;
}
</code></pre>

<p>点击一次后，count 是多少？/ After one click, what is <code>count</code>?</p>

<p><strong>A)</strong> 3 — 调用了三次 setCount</p>
<p><strong>B)</strong> 1 — React 批量处理，count 是快照</p>
<p><strong>C)</strong> 0 — setState 是异步的，还没更新</p>
<p><strong>D)</strong> 报错 — 不能在一个函数里多次调用 setCount</p>

<p>&lt;details&gt;&lt;summary&gt;显示答案 / Show Answer&lt;/summary&gt;</p>

<p><strong>答案是 B — count = 1</strong></p>

<p>为什么？因为在同一个事件处理函数中，<code>count</code> 是一个<strong>快照（snapshot）</strong>，值固定为 <code>0</code>。三次 <code>setCount(0 + 1)</code> 都是 <code>setCount(1)</code>，最后只更新一次。</p>

<p>React batches state updates within the same event handler. <code>count</code> is a <strong>snapshot</strong> — it&#x27;s <code>0</code> throughout the whole function. All three calls are <code>setCount(0 + 1)</code> = <code>setCount(1)</code>. Result: 1.</p>

<p><strong>✅ 如果你想累加，用函数式更新 / Use functional updates for increments:</strong></p>
<pre><code>
setCount(prev =&gt; prev + 1) // ✅ prev is always latest value
setCount(prev =&gt; prev + 1) // ✅ prev = 1
setCount(prev =&gt; prev + 1) // ✅ prev = 2 → final count = 3
</code></pre>

<p>&lt;/details&gt;</p>

<hr/>

<h2>❌ 常见错误 / Common Mistakes</h2>

<h3>错误 1：直接修改 state 对象</h3>

<pre><code>
// ❌ WRONG — mutating state directly, React won&#x27;t re-render!
const [user, setUser] = useState({ name: &#x27;Alice&#x27;, age: 25 })
user.age = 26  // ← This doesn&#x27;t trigger a re-render

// ✅ CORRECT — create a new object
setUser({ ...user, age: 26 })
</code></pre>

<h3>错误 2：把可以派生的值放进 state</h3>

<pre><code>
// ❌ WRONG — derived state causes sync issues
const [items, setItems] = useState([...])
const [filteredItems, setFilteredItems] = useState([...]) // ← redundant!

// ✅ CORRECT — compute it during render
const filteredItems = items.filter(item =&gt; item.active) // no useState needed
</code></pre>

<h3>错误 3：忘记函数式更新导致 stale closure</h3>

<pre><code>
// ❌ WRONG in async context or event batching
setCount(count + 1)

// ✅ CORRECT — always use functional update when new value depends on old
setCount(prev =&gt; prev + 1)
</code></pre>

<hr/>

<h2>📐 何时用 / 何时不用 / When to Use vs Not</h2>

<p>| ✅ 用 useState | ❌ 不用 useState |</p>
<p>|--------------|----------------|</p>
<p>| UI 交互状态（开/关、选中、展开） | 可从其他 state/props 计算的值 |</p>
<p>| 表单输入值 | 不需要触发渲染的变量（用 <code>useRef</code>） |</p>
<p>| 组件局部数据（列表项、分页） | 多组件共享状态（用 Context 或状态管理库） |</p>
<p>| 异步请求结果（loading/data/error） | 服务端状态（用 React Query / SWR） |</p>

<hr/>

<h2>📚 参考资料 / References</h2>

<p>1. [React Docs — useState](https://react.dev/reference/react/useState)</p>
<p>2. [React Docs — State as a Snapshot](https://react.dev/learn/state-as-a-snapshot)</p>
<p>3. [Common useState Mistakes (Kent C. Dodds)](https://kentcdodds.com/blog/dont-sync-state-derive-it)</p>

<hr/>

<h2>🧒 ELI5 / 用小孩能理解的话说</h2>

<p><code>useState</code> 就像一块小白板。你可以在上面写字（set state），每次改变内容，整个教室（组件）都会重新看一遍白板（重渲染）。普通变量就像便利贴——改了 React 不知道，不会重新看。</p>

<p><code>useState</code> is like a whiteboard. When you erase and rewrite it, the whole classroom (component) looks again and updates. A regular variable is like a sticky note only you can see — React doesn&#x27;t know it changed.</p>

<hr/>
<h1>🤖 AI</h1>
<h2>🤖 AI Day 9 — 本周 AI 大事件 / AI News Roundup</h2>

<p><em>来源：web_search，2026年3月24日 / Sources: web_search, March 24, 2026</em></p>

<hr/>

<h2>📰 Story 1: Agentic AI 成为新主流 / Agentic AI Goes Mainstream</h2>

<p><strong>来源 / Source:</strong> [switas.com — The AI Avalanche: 7 Agentic LLM Breakthroughs](https://www.switas.com/articles/the-ai-avalanche-7-agentic-llm-breakthroughs-reshaping-march-2026)</p>

<p>AI 从&quot;生成文本&quot;进化到&quot;自主完成任务&quot;。Gartner 预测 2026 年底 40% 的企业应用将内嵌任务型 AI Agent，作为真正的&quot;数字同事&quot;自动处理端到端业务流程。Oracle 也宣布了专为 Agentic AI 优化的数据库创新。</p>

<p>AI has evolved from &quot;generate text&quot; to &quot;autonomously complete multi-step tasks.&quot; Gartner predicts 40% of enterprise apps will embed task-specific AI agents by end of 2026. Oracle announced new AI Database innovations purpose-built for agentic workloads.</p>

<p><strong>为什么你应该关心 / Why you should care:</strong></p>
<p>作为工程师，你很快会被要求构建或集成 AI Agent。理解 Agent 的工具调用、状态管理、错误恢复机制，会是核心面试考点和工作技能。</p>

<p>As an engineer, you&#x27;ll soon be asked to build or integrate AI agents. Tool calling, state management, and error recovery for agents are becoming core interview topics.</p>

<hr/>

<h2>📰 Story 2: 模型&quot;认知密度&quot;时代——参数不是唯一指标 / Cognitive Density: Parameters Aren&#x27;t Everything</h2>

<p><strong>来源 / Source:</strong> [blog.mean.ceo — New AI Model Releases March 2026](https://blog.mean.ceo/new-ai-model-releases-news-march-2026/)</p>

<p>2026 年 3 月，AI 竞赛焦点从&quot;谁的参数最多&quot;转向&quot;谁的认知密度最高&quot;。Claude Opus 4.6（Anthropic）引入&quot;自适应思考&quot;——模型根据 prompt 复杂度动态决定是否深度推理，无需用户手动配置。OpenAI 的 GPT-5.4 系列专注于每字节更高的知识密度。</p>

<p>The AI race shifted from &quot;most parameters&quot; to &quot;highest cognitive density.&quot; Claude Opus 4.6 introduced &quot;adaptive thinking&quot; — the model dynamically decides when to engage deeper reasoning without user configuration. OpenAI&#x27;s GPT-5.4 focuses on knowledge density per byte.</p>

<p><strong>为什么你应该关心 / Why you should care:</strong></p>
<p>选模型时，benchmark 分数只是一方面。了解&quot;推理成本 vs 质量&quot;的权衡，帮你在实际项目中做出更聪明的模型选型决策。</p>

<p>When choosing models for production, benchmark scores aren&#x27;t everything. Understanding the reasoning-cost vs quality tradeoff helps you make smarter model selection decisions.</p>

<hr/>

<h2>📰 Story 3: 上下文窗口突破 100 万 Token / Context Windows Break 1M Tokens</h2>

<p><strong>来源 / Source:</strong> [alphacorp.ai — Top 5 LLMs for March 2026](https://www.alphacorp.ai/blog/top-5-llms-for-march-2026-benchmarks-pricing-picks)</p>

<p>多个领先模型的上下文窗口已突破 100 万 token，实验性模型甚至推向 1000 万。这意味着可以在单个 prompt 中塞入整个公司知识库、百万行代码库或多年财报数据。</p>

<p>Several leading models now boast 1M+ token context windows, with experimental models pushing toward 10M. You can now feed an entire company knowledge base, massive codebases, or years of financial records into a single prompt.</p>

<p><strong>为什么你应该关心 / Why you should care:</strong></p>
<p>超长上下文改变了 RAG（检索增强生成）的架构选择。某些场景下，直接 long-context 比构建向量数据库更简单、更准确——但成本和速度的权衡需要你来算。</p>

<p>Long contexts change RAG architecture decisions. Sometimes long-context beats building a vector database — but you need to reason about the cost/latency tradeoffs.</p>

<hr/>

<h2>📰 Story 4: LLM 安全新技术 &amp; &quot;能力校准&quot; / LLM Safety &amp; Capability Calibration</h2>

<p><strong>来源 / Source:</strong> [news.ncsu.edu — New Technique Addresses LLM Safety](https://news.ncsu.edu/2026/03/new-technique-addresses-llm-safety/) · [morningstar.com — Appier Capability Calibration](https://www.morningstar.com/news/pr-newswire/20260324cn17690/stop-ai-from-guessing-appier-enables-agents-to-assess-confidence-before-acting)</p>

<p>NC State 研究人员发明了新技术识别保证安全响应的关键组件，同时将&quot;对齐税&quot;（安全训练带来的性能损失）降到最低。Appier 推出&quot;能力校准&quot;框架，让 AI Agent 在行动前先评估自己是否有能力完成任务，降低幻觉和过度自信。</p>

<p>NC State researchers identified key model components that ensure safe responses while minimizing the &quot;alignment tax.&quot; Appier introduced &quot;Capability Calibration&quot; — AI agents assess their own confidence before taking action, reducing hallucinations and overconfidence in enterprise deployments.</p>

<p><strong>为什么你应该关心 / Why you should care:</strong></p>
<p>在企业 AI 部署中，让模型&quot;知道自己不知道什么&quot;比让它无限自信地输出错误答案更重要。Capability calibration 是 AI 工程中的新兴核心模式。</p>

<p>In enterprise AI, knowing what the model doesn&#x27;t know is more valuable than confident-but-wrong outputs. Capability calibration is an emerging core pattern in AI engineering.</p>

<hr/>

<h2>📰 Story 5: 模型发布速度危机——每 72 小时一个重磅发布 / Model Release Velocity Crisis</h2>

<p><strong>来源 / Source:</strong> [ai-weekly.ai — Newsletter 03-24-2026](https://ai-weekly.ai/newsletter-03-24-2026/)</p>

<p>行业分析师追踪到目前约每 72 小时就有一个重大 AI 模型发布。Gemini 3.1 Pro、Claude Opus 4.6、GPT-5.4、DeepSeek V3.2、Qwen 3.5……价格相比去年同期下降 40-80%，开源权重模型与闭源旗舰的差距正在快速收窄。</p>

<p>Analysts are tracking a major AI release approximately every 72 hours. Prices dropped 40-80% year-over-year. Open-weight models are closing the gap with closed-source flagships rapidly.</p>

<p><strong>为什么你应该关心 / Why you should care:</strong></p>
<p>AI 基础设施成本正在快速商品化。在系统设计中，&quot;用哪个 LLM API&quot;的成本计算将越来越重要，学会对比延迟、成本、质量的三角权衡是工程师的新必备技能。</p>

<p>AI infrastructure is rapidly commoditizing. Cost modeling for LLM API selection — balancing latency, cost, and quality — is becoming a core engineering skill.</p>

<hr/>

<h2>📚 参考资料 / References</h2>

<p>1. [AI Weekly Newsletter — March 24, 2026](https://ai-weekly.ai/newsletter-03-24-2026/)</p>
<p>2. [Top LLMs March 2026 — AlphaCorp](https://www.alphacorp.ai/blog/top-5-llms-for-march-2026-benchmarks-pricing-picks)</p>
<p>3. [NC State LLM Safety Research](https://news.ncsu.edu/2026/03/new-technique-addresses-llm-safety/)</p>

<hr/>

<h2>🧒 ELI5 / 用小孩能理解的话说</h2>

<p>AI 现在不只是&quot;会说话&quot;了，而是开始&quot;帮你做事&quot;（Agentic AI）。同时模型越来越聪明但越来越便宜，就像手机——几年前的旗舰价格，现在买到的性能翻了几倍。</p>

<p>AI isn&#x27;t just &quot;talking&quot; anymore — it&#x27;s &quot;doing things for you&quot; (Agentic AI). Meanwhile models keep getting smarter and cheaper, like smartphones — you get 10x more for the same price year after year.</p>
]]></description>
    </item>
    <item>
      <title>byte-by-byte — 2026-03-23</title>
      <link>https://github.com/YushengAuggie/byte-by-byte/tree/main/archive</link>
      <guid isPermaLink="false">https://github.com/YushengAuggie/byte-by-byte/archive/2026-03-23</guid>
      <pubDate>Mon, 23 Mar 2026 12:00:00 +0000</pubDate>
      <description><![CDATA[<h1>🏗️ System Design</h1>
<h2>🏗️ 系统设计 Day 8 / System Design Day 8</h2>
<p><strong>主题 / Topic:</strong> 数据库索引与查询优化 / Database Indexing &amp; Query Optimization</p>
<p><strong>分类 / Category:</strong> Fundamentals · Beginner · Foundation Phase</p>

<hr/>

<h2>🌍 真实场景 / Real-World Scenario</h2>

<p>想象你在设计 Twitter 的搜索功能。用户搜索某条推文，数据库里有 <strong>5 亿条记录</strong>——如果没有索引，数据库必须逐行扫描，花几分钟才能返回结果。有了索引，查询可以在 <strong>几毫秒内</strong> 完成。</p>

<p><em>Imagine you&#x27;re designing Twitter&#x27;s search feature. Users search for tweets, and there are 500 million records in the database — without indexing, the database must scan row-by-row, taking minutes. With indexes, queries return in milliseconds.</em></p>

<hr/>

<h2>🏛️ 架构图 / ASCII Architecture Diagram</h2>

<pre><code>
User Query: &quot;SELECT * FROM tweets WHERE user_id = 42 AND created_at &gt; &#x27;2026-01-01&#x27;&quot;

WITHOUT INDEX:                     WITH INDEX:
┌─────────────────────┐           ┌─────────────────────┐
│   Full Table Scan   │           │   B-Tree Index      │
│   Row 1: user_id=1  │           │   (user_id, date)   │
│   Row 2: user_id=15 │           │        Root         │
│   Row 3: user_id=42 │           │       /    \        │
│   ...               │           │    Node    Node     │
│   Row 500M: ???     │           │   /  \    /  \      │
│   ❌ 500M reads     │           │  L1  L2  L3  L4     │
└─────────────────────┘           │  ✅ ~log(N) reads   │
                                  └─────────────────────┘

Index Storage:
┌──────────┬──────────┬─────────────────┐
│ user_id  │   date   │  row_pointer →  │
│    42    │ 2026-01  │  page 1042, r3  │
│    42    │ 2026-02  │  page 2891, r7  │
└──────────┴──────────┴─────────────────┘
</code></pre>

<hr/>

<h2>⚖️ 关键权衡 / Key Tradeoffs (为什么这样设计？)</h2>

<h3>索引加速读，但拖慢写 / Indexes Speed Reads, Slow Writes</h3>

<p>| 指标 / Metric | 无索引 Without Index | 有索引 With Index |</p>
<p>|---|---|---|</p>
<p>| SELECT 查询 | O(N) 全表扫描 | O(log N) B-Tree 遍历 |</p>
<p>| INSERT / UPDATE | 快 ⚡ | 慢（需维护索引）|</p>
<p>| 存储空间 Storage | 小 | 更大（索引占空间）|</p>

<p><strong>为什么用 B-Tree？</strong> B-Tree 保持数据有序，支持范围查询（<code>BETWEEN</code>, <code>&gt;</code>），适合绝大多数业务场景。</p>
<p><em>Why B-Tree? It keeps data sorted, supports range queries (<code>BETWEEN</code>, <code>&gt;</code>), fitting most business use cases.</em></p>
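<p>O(log N) 查找的直觉可以用 Python 的 <code>bisect</code> 在一个排序的玩具索引上体会（简化示意：真正的 B-Tree 是面向磁盘的多叉树，不是扁平数组）：/ The O(log N) lookup intuition can be felt with Python&#x27;s <code>bisect</code> over a sorted toy index (a simplification: a real B-Tree is a disk-friendly multiway tree, not a flat array):</p>

```python
import bisect

# Toy "index": sorted (user_id, row_pointer) pairs, like the index storage table above.
# Only even user_ids exist; the page naming is purely illustrative.
index = [(uid, f"page_{uid % 97}") for uid in range(0, 1_000_000, 2)]
keys = [uid for uid, _ in index]

def lookup(user_id):
    """O(log N) binary search over the sorted keys, vs an O(N) full scan."""
    i = bisect.bisect_left(keys, user_id)
    if i < len(keys) and keys[i] == user_id:
        return index[i][1]   # follow the row pointer
    return None

print(lookup(424242))  # found: returns its row pointer
print(lookup(424243))  # None: odd ids were never inserted
```

<p>因为 key 有序，范围查询（<code>BETWEEN</code>）同样可以用两次二分定位边界。/ Because keys are sorted, a range query (<code>BETWEEN</code>) is just two binary searches for the boundaries.</p>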

<p><strong>复合索引的列顺序很重要 / Column order in composite indexes matters:</strong></p>
<pre><code>
-- Index on (user_id, created_at)
-- ✅ Can use: WHERE user_id = 42 AND created_at &gt; &#x27;2026-01-01&#x27;
-- ✅ Can use: WHERE user_id = 42
-- ❌ Cannot use: WHERE created_at &gt; &#x27;2026-01-01&#x27; (alone)
-- 最左前缀原则 / Leftmost prefix rule!
</code></pre>

<hr/>

<h2>⚠️ 常见错误 / Common Mistakes (别踩这个坑)</h2>

<p>1. <strong>过度索引 Over-indexing</strong> — 给每列都加索引？写操作会变得极慢。生产中见过 INSERT 耗时 10 秒的案例。</p>
<p><em>Adding an index to every column? Writes become painfully slow.</em></p>

<p>2. <strong>索引列上做函数运算 Function on indexed column</strong> — <code>WHERE YEAR(created_at) = 2026</code> 无法使用索引！改用 <code>WHERE created_at BETWEEN &#x27;2026-01-01&#x27; AND &#x27;2026-12-31&#x27;</code>。</p>
<p><em><code>WHERE YEAR(created_at) = 2026</code> can&#x27;t use the index! Use range instead.</em></p>

<p>3. <strong>忽视 EXPLAIN / Ignoring EXPLAIN</strong> — 不跑 <code>EXPLAIN SELECT ...</code> 怎么知道是否用到了索引？</p>
<p><em>Never running <code>EXPLAIN SELECT ...</code> — how do you even know if the index is used?</em></p>

<p>4. <strong>N+1 查询问题 / N+1 Query Problem</strong> — 循环里查询数据库，100 次查询 vs 1 次 JOIN。</p>
<p><em>Querying inside a loop: 100 queries vs 1 JOIN.</em></p>
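<p>N+1 问题用一个玩具例子就能看清（<code>fetch</code> 是假设的辅助函数，只用来数&quot;往返次数&quot;，不是真实 ORM API）：/ The N+1 problem in a toy example (<code>fetch</code> is a hypothetical helper that only counts round trips, not a real ORM API):</p>

```python
# Toy stand-in for a database: every call to fetch() is one simulated round trip.
TWEETS = {uid: [f"tweet-{uid}-{k}" for k in range(2)] for uid in range(1, 101)}
round_trips = 0

def fetch(uid=None):
    global round_trips
    round_trips += 1
    return TWEETS if uid is None else TWEETS[uid]

# ❌ N+1: one fetch per user inside the loop
round_trips = 0
per_user = {uid: fetch(uid) for uid in range(1, 101)}
print(round_trips)  # 100

# ✅ batched: one fetch for everything (in SQL: a single JOIN or WHERE user_id IN (...))
round_trips = 0
all_tweets = fetch()
print(round_trips)  # 1
```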

<hr/>

<h2>📚 References</h2>

<p>- [PostgreSQL Index Documentation](https://www.postgresql.org/docs/current/indexes.html)</p>
<p>- [Use The Index, Luke — Free SQL Indexing Guide](https://use-the-index-luke.com/)</p>
<p>- [MySQL EXPLAIN Output Format](https://dev.mysql.com/doc/refman/8.0/en/explain-output.html)</p>

<h2>🧒 ELI5 (小朋友也能懂)</h2>

<p>索引就像书的<strong>目录</strong>。没有目录，你要找&quot;索引&quot;这个词，就得从第1页翻到最后。有了目录，直接翻到第 283 页。数据库索引做的是同样的事情——不用&quot;翻遍所有数据&quot;，直接跳到你要找的地方。</p>

<p><em>An index is like a book&#x27;s table of contents. Without it, you&#x27;d flip through every page to find &quot;indexing.&quot; With a table of contents, you jump right to page 283. Database indexes do the same thing — skip straight to what you need.</em></p>

<hr/>
<h1>💻 Algorithms</h1>
<h2>💻 算法 Day 8 / Algorithms Day 8</h2>
<p><strong>题目 / Problem:</strong> #36 Valid Sudoku · 🟡 Medium</p>
<p><strong>模式 / Pattern:</strong> Arrays &amp; Hashing</p>

<p>🔗 [LeetCode #36](https://leetcode.com/problems/valid-sudoku/) · 📹 [NeetCode Video](https://www.youtube.com/watch?v=TjFXEUCMqI8)</p>

<hr/>

<h2>🌍 真实类比 / Real-World Analogy</h2>

<p>想象你是数独游戏的裁判。你不需要<strong>解开</strong>这个数独——只需要检查当前状态是否<strong>合法</strong>（没有行、列、3×3格子里有重复数字）。就像检查停车场：不是找空位，而是确认没有两辆车占同一个格子。</p>

<p><em>Imagine you&#x27;re a Sudoku referee. You don&#x27;t need to solve the puzzle — just verify the current state is valid (no duplicate numbers in any row, column, or 3×3 box). Like checking a parking lot: not finding empty spots, but confirming no two cars share the same space.</em></p>

<hr/>

<h2>🧩 题目 / Problem</h2>

<p>给定一个 9×9 的数独棋盘，判断是否有效。规则：</p>
<p>- 每行数字 1-9 不重复</p>
<p>- 每列数字 1-9 不重复</p>
<p>- 每个 3×3 子格数字 1-9 不重复</p>
<p>- 空格用 <code>&#x27;.&#x27;</code> 表示</p>

<p><em>Given a 9×9 Sudoku board, determine if it is valid. Rules: no duplicates in any row, column, or 3×3 box. Empty cells are <code>&#x27;.&#x27;</code>.</em></p>

<hr/>

<h2>💡 关键洞察 / Key Insight</h2>

<p><strong>用哈希集合跟踪所见数字。</strong> 同时遍历三种结构：行、列、3×3格。</p>
<p><strong>关键公式：</strong> 位于 <code>(r, c)</code> 的格子属于哪个 3×3 格？ → <code>box_id = (r // 3) * 3 + (c // 3)</code></p>

<p><em>Use hash sets to track seen numbers simultaneously across rows, columns, and 3×3 boxes.</em></p>
<p><em>Key formula: which box does cell <code>(r, c)</code> belong to? → <code>box_id = (r // 3) * 3 + (c // 3)</code></em></p>

<pre><code>
Box IDs:
┌───────┬───────┬───────┐
│ box 0 │ box 1 │ box 2 │
├───────┼───────┼───────┤
│ box 3 │ box 4 │ box 5 │
├───────┼───────┼───────┤
│ box 6 │ box 7 │ box 8 │
└───────┴───────┴───────┘
r=0,c=0: (0//3)*3+(0//3) = 0*3+0 = box 0 ✅
r=1,c=4: (1//3)*3+(4//3) = 0*3+1 = box 1 ✅
r=4,c=7: (4//3)*3+(7//3) = 1*3+2 = box 5 ✅
</code></pre>

<hr/>

<h2>🐍 Python 解法 / Python Solution</h2>

<pre><code>
from collections import defaultdict

def isValidSudoku(board):
    # Use sets for each row, col, box
    rows = defaultdict(set)   # rows[r] = set of digits seen in row r
    cols = defaultdict(set)   # cols[c] = set of digits seen in col c
    boxes = defaultdict(set)  # boxes[b] = set of digits seen in box b

    for r in range(9):
        for c in range(9):
            val = board[r][c]
            if val == &#x27;.&#x27;:
                continue  # Skip empty cells

            box_id = (r // 3) * 3 + (c // 3)

            # Check for duplicates
            if val in rows[r] or val in cols[c] or val in boxes[box_id]:
                return False

            # Record this value
            rows[r].add(val)
            cols[c].add(val)
            boxes[box_id].add(val)

    return True
</code></pre>

<hr/>

<h2>🔍 代码追踪 / Code Trace</h2>

<p>Using a small example focusing on the top-left 3×3 box (box 0):</p>

<pre><code>
board[0] = [&quot;5&quot;,&quot;3&quot;,&quot;.&quot;,&quot;.&quot;,&quot;7&quot;,&quot;.&quot;,&quot;.&quot;,&quot;.&quot;,&quot;.&quot;]
board[1] = [&quot;6&quot;,&quot;.&quot;,&quot;.&quot;,&quot;1&quot;,&quot;9&quot;,&quot;5&quot;,&quot;.&quot;,&quot;.&quot;,&quot;.&quot;]
board[2] = [&quot;.&quot;,&quot;9&quot;,&quot;8&quot;,&quot;.&quot;,&quot;.&quot;,&quot;.&quot;,&quot;.&quot;,&quot;6&quot;,&quot;.&quot;]

Processing r=0, c=0: val=&quot;5&quot;
  box_id = (0//3)*3 + (0//3) = 0
  &quot;5&quot; not in rows[0]={}, cols[0]={}, boxes[0]={} → OK
  rows[0]={&quot;5&quot;}, cols[0]={&quot;5&quot;}, boxes[0]={&quot;5&quot;}

Processing r=0, c=1: val=&quot;3&quot;
  box_id = (0//3)*3 + (1//3) = 0
  &quot;3&quot; not in rows[0]={&quot;5&quot;}, cols[1]={}, boxes[0]={&quot;5&quot;} → OK
  rows[0]={&quot;5&quot;,&quot;3&quot;}, cols[1]={&quot;3&quot;}, boxes[0]={&quot;5&quot;,&quot;3&quot;}

Processing r=1, c=0: val=&quot;6&quot;
  box_id = (1//3)*3 + (0//3) = 0
  &quot;6&quot; not in rows[1]={}, cols[0]={&quot;5&quot;}, boxes[0]={&quot;5&quot;,&quot;3&quot;} → OK
  boxes[0]={&quot;5&quot;,&quot;3&quot;,&quot;6&quot;}

Processing r=2, c=1: val=&quot;9&quot;
  box_id = (2//3)*3 + (1//3) = 0
  &quot;9&quot; not in boxes[0]={&quot;5&quot;,&quot;3&quot;,&quot;6&quot;} → OK

Processing r=2, c=2: val=&quot;8&quot;
  box_id = 0
  &quot;8&quot; not in boxes[0]={&quot;5&quot;,&quot;3&quot;,&quot;6&quot;,&quot;9&quot;} → OK
  boxes[0]={&quot;5&quot;,&quot;3&quot;,&quot;6&quot;,&quot;9&quot;,&quot;8&quot;}

Final result: True ✅ (valid board)
</code></pre>

<hr/>

<h2>⏱️ 复杂度 / Complexity</h2>

<p>| | 时间 Time | 空间 Space |</p>
<p>|---|---|---|</p>
<p>| 复杂度 | O(9²) = <strong>O(81) = O(1)</strong> | O(9²) = <strong>O(1)</strong> |</p>
<p>| 说明 | 固定 81 格，常数时间 | 最多存 81 个数字 |</p>

<p><em>Board is always 9×9 — technically O(1) since input size is fixed!</em></p>

<hr/>

<h2>🔄 举一反三 / Pattern Recognition</h2>

<p>掌握&quot;哈希集合去重&quot;模式后，还能解：</p>
<p><em>Once you master &quot;hash set deduplication,&quot; apply it to:</em></p>

<p>- [#48 Rotate Image](https://leetcode.com/problems/rotate-image/) — in-place matrix transformation</p>
<p>- [#54 Spiral Matrix](https://leetcode.com/problems/spiral-matrix/) — matrix traversal order</p>
<p>- [#289 Game of Life](https://leetcode.com/problems/game-of-life/) — state tracking in grids</p>
<p>- [#73 Set Matrix Zeroes](https://leetcode.com/problems/set-matrix-zeroes/) — flagging with sets</p>

<hr/>

<h2>📚 References</h2>

<p>- [LeetCode #36 Valid Sudoku](https://leetcode.com/problems/valid-sudoku/)</p>
<p>- [NeetCode Solution Video](https://www.youtube.com/watch?v=TjFXEUCMqI8)</p>
<p>- [Python defaultdict docs](https://docs.python.org/3/library/collections.html#collections.defaultdict)</p>

<h2>🧒 ELI5</h2>

<p>想象你有9张纸，每张代表一行。遇到数字就写到对应行的纸上。如果那张纸上<strong>已经有了</strong>这个数字，就说明不合法！同样对列和3×3格子也这样检查。</p>

<p><em>Imagine 9 sheets of paper, one per row. When you see a number, write it on that row&#x27;s sheet. If the sheet already has that number — invalid! Do the same for columns and 3×3 boxes.</em></p>

<hr/>
<h1>🗣️ Soft Skills</h1>
<h2>🗣️ 软技能 Day 8 / Soft Skills Day 8</h2>
<p><strong>题目 / Question:</strong> 你如何处理模糊不清的需求？ / How do you approach working with ambiguous requirements?</p>
<p><strong>分类 / Category:</strong> Ambiguity · Senior/Staff Level · Foundation Phase</p>

<hr/>

<h2>🎯 为什么这很重要 / Why This Matters</h2>

<p>在大厂，<strong>模糊是常态，而非例外</strong>。产品经理不可能把每个细节都想清楚，业务方也不总知道自己真正想要什么。能优雅处理模糊需求，是区分 senior 工程师和 staff 工程师的核心能力之一。</p>

<p><em>In big tech, ambiguity is the norm, not the exception. PMs can&#x27;t anticipate every detail, and stakeholders don&#x27;t always know what they really want. Navigating ambiguity gracefully separates senior engineers from staff engineers.</em></p>

<hr/>

<h2>⭐ STAR 拆解 / STAR Breakdown</h2>

<p><strong>情境 Situation:</strong> 描述一个需求不清晰的真实场景</p>
<p><em>Describe a real scenario where requirements were unclear</em></p>

<p><strong>任务 Task:</strong> 你被分配了什么？你需要负责什么？</p>
<p><em>What were you assigned? What were you accountable for?</em></p>

<p><strong>行动 Action:</strong> 你具体做了哪些事来厘清需求、推进工作？</p>
<p><em>What specific steps did you take to clarify and move forward?</em></p>

<p><strong>结果 Result:</strong> 量化影响——节省了多少时间？避免了什么返工？</p>
<p><em>Quantify impact — time saved, rework avoided, team unblocked?</em></p>

<hr/>

<h2>❌ 差的回答 vs ✅ 好的回答 / Bad vs Good Answer</h2>

<h3>❌ 差的回答</h3>
<p>&gt; &quot;需求不清楚的时候，我就等产品经理把需求整理清楚，再开始做。&quot;</p>
<p>&gt; &quot;When requirements are unclear, I wait for the PM to clarify everything before starting.&quot;</p>

<p><strong>问题：</strong> 被动等待，没有主动推进。这在面试中是红牌。</p>
<p><em>Problem: Passive waiting shows no ownership or initiative — a red flag in interviews.</em></p>

<hr/>

<h3>✅ 好的回答（结构化）</h3>

<p><strong>S:</strong> 我们在设计一个新的通知系统，PM 只说&quot;用户应该能收到重要通知&quot;，但没有定义什么是&quot;重要&quot;，也没有给出频率和渠道的要求。</p>

<p><em>S: We were designing a new notification system. The PM only said &quot;users should receive important notifications&quot; — no definition of &quot;important,&quot; no frequency or channel requirements.</em></p>

<p><strong>T:</strong> 我是 tech lead，需要在两周内给出技术方案，但需求太模糊无法开始。</p>

<p><em>T: I was the tech lead, needing to deliver a technical plan in two weeks, but the requirements were too vague to start.</em></p>

<p><strong>A:</strong> 我做了三件事：</p>
<p>1. <strong>先列假设清单</strong>：把我理解的&quot;默认行为&quot;写成文档，发给 PM 确认（&quot;我假设通知包括订单状态变更和系统告警，是否正确？&quot;）</p>
<p>2. <strong>识别 reversible vs irreversible 决策</strong>：渠道选择（邮件/推送）容易改，数据库 schema 难改——对难改的部分花更多时间对齐</p>
<p>3. <strong>用 spike + timebox</strong>：花一天做技术调研验证假设，而不是等两周后再发现方向错了</p>

<p><em>A: I did three things:</em></p>
<p><em>1. Made an assumption document — wrote down my &quot;default understanding,&quot; sent to PM for confirmation</em></p>
<p><em>2. Identified reversible vs irreversible decisions — channel choice is easy to change; DB schema is hard — spent more alignment time on hard decisions</em></p>
<p><em>3. Used a spike + timebox — one day of research to validate assumptions rather than discover wrong direction two weeks later</em></p>

<p><strong>R:</strong> 提前 4 天完成方案，避免了一次因误解&quot;重要通知&quot;而可能导致的数据库重新设计（估计 1.5 周返工）。</p>

<p><em>R: Delivered the plan 4 days early, avoiding a potential DB redesign from misunderstanding &quot;important notifications&quot; (estimated 1.5 weeks of rework).</em></p>

<hr/>

<h2>👑 Senior/Staff 进阶技巧 / Senior/Staff Tips</h2>

<p>1. <strong>区分&quot;紧急不可逆&quot;和&quot;可以先行&quot;</strong> — 并非所有模糊都需要澄清后才能动手。</p>
<p><em>Distinguish &quot;critical &amp; irreversible&quot; ambiguity from &quot;can start anyway.&quot; Not all ambiguity blocks progress.</em></p>

<p>2. <strong>用&quot;约束条件反问&quot;</strong> — 不要问&quot;你想要什么&quot;，而是&quot;有哪些约束条件？&quot;（deadline、预算、不能动哪些系统）</p>
<p><em>Ask about constraints rather than desires: &quot;What can&#x27;t change?&quot; uncovers real requirements faster.</em></p>

<p>3. <strong>两次沟通法则</strong> — 如果你向同一个人问了两次同样的问题还没答案，换个方式：写 RFC，开会对齐，或者向上升级。</p>
<p><em>Two-ask rule: If you&#x27;ve asked the same person twice with no answer, escalate the format — write an RFC, schedule alignment, or escalate.</em></p>

<p>4. <strong>让数据说话</strong> — &quot;我们先做一个小实验来验证假设&quot;比&quot;我不确定需求是什么&quot;更有说服力。</p>
<p><em>Let data clarify: &quot;Let&#x27;s run a small experiment&quot; is more powerful than &quot;I&#x27;m not sure what we need.&quot;</em></p>

<hr/>

<h2>🎯 关键要点 / Key Takeaways</h2>

<p>- 🔑 模糊是工程师的日常，主动澄清是职业素养</p>
<p>- 🔑 区分&quot;必须澄清&quot;和&quot;可以默认处理&quot;的需求</p>
<p>- 🔑 把假设写成文档，发出去确认，保留记录</p>
<p>- 🔑 先行动，后优化——不要等到 100% 清晰</p>

<p><em>Ambiguity is daily life in engineering. Proactive clarification is professional maturity. Document assumptions. Move forward on reversible decisions, pause on irreversible ones.</em></p>

<hr/>

<h2>📚 References</h2>

<p>- [Gergely Orosz: &quot;The Art of Asking Questions&quot;](https://newsletter.pragmaticengineer.com/)</p>
<p>- [StaffEng.com: Staff-level behaviors](https://staffeng.com/guides/engineering-strategy)</p>
<p>- [Amazon LP: &quot;Bias for Action&quot; principle](https://www.amazon.jobs/content/en/our-workplace/leadership-principles)</p>

<h2>🧒 ELI5</h2>

<p>如果老师说&quot;画一幅漂亮的画&quot;，你不知道要画什么。聪明的做法是先问：&quot;可以是动物吗？用什么颜色？多大？&quot; 然后开始画，完成后再调整。而不是坐在那里什么都不做，等老师来告诉你每一步。</p>

<p><em>If a teacher says &quot;draw something beautiful&quot; without details, the smart move is to ask: &quot;Can it be an animal? What colors? How big?&quot; Then start drawing and adjust. Not sit frozen waiting for the teacher to specify every brushstroke.</em></p>

<hr/>
<h1>🎨 Frontend</h1>
<h2>🎨 前端 Day 8 / Frontend Day 8</h2>
<p><strong>主题 / Topic:</strong> CSS 动画与过渡 / CSS Animations &amp; Transitions</p>
<p><strong>分类 / Category:</strong> CSS Fundamentals · Week 2 · Foundation Phase</p>

<hr/>

<h2>🤔 猜猜输出 / What&#x27;s the Output?</h2>

<pre><code>
.box {
  width: 100px;
  background: blue;
  transition: width 2s, background 0.5s;
}

.box:hover {
  width: 300px;
  background: red;
}
</code></pre>

<p>鼠标悬停时 (on hover)，先发生什么？</p>
<p><em>On hover, which change happens first (visually)?</em></p>

<p><strong>A.</strong> 宽度先变化，背景后变化 (width changes first, then background)</p>
<p><strong>B.</strong> 背景颜色先变化，然后宽度变化 (background changes first, then width)</p>
<p><strong>C.</strong> 两者同时开始，背景先完成 (both start together, background finishes first)</p>
<p><strong>D.</strong> 两者同时开始并同时完成 (both start and finish at the same time)</p>

<p><em>(答案在最后 / Answer at the end)</em></p>

<hr/>

<h2>📐 Transition vs Animation</h2>

<h3>Transition（过渡）— &quot;A 到 B&quot;</h3>

<p>触发式的，从一个状态到另一个状态。<em>Triggered change between two states.</em></p>

<pre><code>
/* 基础语法 / Basic syntax */
.button {
  background: blue;
  transform: scale(1);
  
  /* property | duration | timing-function | delay */
  transition: background 0.3s ease-in-out,
              transform 0.2s ease;
}

.button:hover {
  background: darkblue;
  transform: scale(1.05); /* 轻微放大 / slight scale-up */
}
</code></pre>

<h3>Animation（动画）— &quot;持续循环&quot;</h3>

<p>不需要触发，可以自动运行、重复。<em>Can run automatically, loop indefinitely.</em></p>

<pre><code>
/* Step 1: 定义关键帧 / Define keyframes */
@keyframes pulse {
  0%   { transform: scale(1);    opacity: 1; }
  50%  { transform: scale(1.1);  opacity: 0.7; }
  100% { transform: scale(1);    opacity: 1; }
}

/* Step 2: 应用动画 / Apply animation */
.loading-icon {
  /* name | duration | timing | delay | iteration | direction */
  animation: pulse 1.5s ease-in-out 0s infinite alternate;
}
</code></pre>

<hr/>

<h2>⏱️ Timing Functions — 让动画有灵魂 / Giving Motion Soul</h2>

<pre><code>
ease (default): 慢开始，加速，慢结束
ease-in:        慢开始，快结束 → 适合&quot;进入&quot;
ease-out:       快开始，慢结束 → 适合&quot;退出&quot;
ease-in-out:    两端慢，中间快 → 最自然
linear:         匀速 → 适合旋转加载图标
cubic-bezier(): 完全自定义！

Visual (progress over time):
ease:       __/‾‾‾
ease-in:    ___/‾‾   slow start, fast finish
ease-out:   /‾‾‾‾‾   fast start, slow finish
linear:     //////   constant speed
</code></pre>

<hr/>

<h2>🔥 实际代码对比 / Real Code Comparison</h2>

<h3>❌ 没有过渡 / No Transition (Jarring)</h3>
<pre><code>
.menu {
  display: none; /* 瞬间消失/出现 — 体验差 / Instant — bad UX */
}
</code></pre>

<h3>✅ 用 opacity + visibility 平滑过渡</h3>
<pre><code>
.menu {
  opacity: 0;
  visibility: hidden;
  transition: opacity 0.3s ease, visibility 0.3s ease;
}

.menu.active {
  opacity: 1;
  visibility: visible;
}
/* 注意：display:none 不能过渡! 用 opacity + visibility 代替 */
/* Note: display:none can&#x27;t transition! Use opacity + visibility instead */
</code></pre>

<h3>⚡ 性能陷阱 / Performance Gotcha</h3>

<pre><code>
/* ❌ 触发 Layout Reflow — 慢！*/
.bad { transition: width 0.3s, height 0.3s, margin 0.3s; }

/* ✅ 只触发 Composite — 快！*/
.good { transition: transform 0.3s, opacity 0.3s; }
/*     transform 和 opacity 在 GPU 上运行，性能最佳 */
/*     transform and opacity run on GPU — best performance */
</code></pre>

<p><strong>规则 / Rule:</strong> 动画时优先使用 <code>transform</code> 和 <code>opacity</code>，避免 <code>width/height/margin/top/left</code>（会触发 reflow）。</p>
<p><em>Prefer <code>transform</code> and <code>opacity</code> for animations; avoid <code>width/height/margin/top/left</code> (triggers layout reflow).</em></p>

<hr/>

<h2>🧩 迷你挑战 / Mini Challenge</h2>

<p>用 CSS 实现一个 loading spinner，只用 <code>@keyframes</code> 和 <code>border-radius</code>：</p>

<pre><code>
/* 试着完成这段代码 / Try completing this */
.spinner {
  width: 40px;
  height: 40px;
  border: 4px solid #f3f3f3;
  border-top: 4px solid #3498db;
  border-radius: 50%;
  /* 添加旋转动画 / Add rotation animation here */
  animation: ??? 1s linear infinite;
}

@keyframes ??? {
  /* 定义旋转 / Define rotation */
}

/* 答案 / Answer:
animation: spin 1s linear infinite;
@keyframes spin {
  0%   { transform: rotate(0deg); }
  100% { transform: rotate(360deg); }
}
*/
</code></pre>

<hr/>

<h2>📝 Quiz 答案解析 / Quiz Answer Explanation</h2>

<p><strong>正确答案：C</strong> — 两者同时开始，背景先完成</p>

<pre><code>
timeline (hover starts at t=0):
t=0ms:   ■ width transition starts (2000ms duration)
t=0ms:   ■ background transition starts (500ms duration)
t=500ms: ✅ background = red (DONE)
t=2000ms:✅ width = 300px (DONE)
</code></pre>

<p><code>transition: width 2s, background 0.5s</code> — 两个属性<strong>同时开始</strong>，但持续时间不同，所以背景 500ms 就完成了，而宽度还要等到 2000ms。</p>

<p><em>Both transitions start at the same moment on hover, but have different durations: background completes in 500ms while width takes 2000ms.</em></p>
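<p><em>The timeline above can be computed mechanically. A tiny Python sketch, with durations taken from the CSS example (the names transitions and finish_times are illustrative):</em></p>

```python
# Each CSS transition runs independently: finish time = delay + duration.
# Durations from the example: width 2s, background 0.5s, no delay.
transitions = {"width": 2000, "background": 500}  # milliseconds
delay = 0

finish_times = {prop: delay + dur for prop, dur in transitions.items()}
assert finish_times["background"] == 500   # background done at t=500ms
assert finish_times["width"] == 2000       # width still animating until t=2000ms

# Both start at t=0; only durations differ, so background finishes first (C).
first_done = min(finish_times, key=finish_times.get)
assert first_done == "background"
```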

<hr/>

<h2>📚 References</h2>

<p>- [MDN: CSS Transitions](https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_Transitions/Using_CSS_transitions)</p>
<p>- [MDN: CSS Animations](https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_Animations/Using_CSS_animations)</p>
<p>- [CSS Tricks: Transition vs Animation](https://css-tricks.com/css-transitions-101/)</p>

<h2>🧒 ELI5</h2>

<p>过渡就像灯光调光开关——你拨动开关，灯慢慢变亮或变暗。动画就像旋转的风车——不需要你动它，它自己一直转。</p>

<p><em>Transitions are like a dimmer switch — you flip it and the light slowly changes. Animations are like a spinning pinwheel — it just keeps spinning on its own without you touching it.</em></p>

<hr/>
<h1>🤖 AI</h1>
<h2>🤖 AI Day 8</h2>
<p><strong>主题 / Topic:</strong> 训练 vs 微调 vs 提示工程 / Training vs Fine-Tuning vs Prompting</p>
<p><strong>分类 / Category:</strong> Foundations · Foundation Phase</p>

<hr/>

<h2>🌍 直觉理解 / Intuitive Explanation</h2>

<p>把 LLM 想象成一个<strong>刚毕业的大学生</strong>：</p>

<p>- <strong>预训练 Pre-training</strong> = 完成大学本科，学习了世界上几乎所有的知识（读了整个互联网）</p>
<p>- <strong>微调 Fine-tuning</strong> = 参加专业培训项目（比如医学院），让他专门擅长某个领域</p>
<p>- <strong>提示工程 Prompting</strong> = 给这个毕业生发一份工作简报，告诉他今天该做什么、怎么做</p>

<p>三种方式的<strong>成本、灵活性、效果</strong>完全不同。</p>

<p><em>Think of an LLM as a fresh college graduate:</em></p>
<p><em>- Pre-training = completing a full degree (reading the entire internet)</em></p>
<p><em>- Fine-tuning = attending medical school (specializing in a domain)</em></p>
<p><em>- Prompting = giving them a daily briefing (telling them what and how to do today)</em></p>

<hr/>

<h2>⚙️ 工作原理 / How It Works</h2>

<h3>1. 预训练 Pre-Training (从零开始)</h3>

<pre><code>
大量文本数据                    → 基础模型
(互联网+书籍+代码)               (GPT-4, Llama, Claude)
Massive text corpus            → Foundation model

成本：数百万美元 + 数周 GPU 时间
Cost: Millions of dollars + weeks of GPU time

训练目标：预测下一个 token
Objective: Predict the next token
</code></pre>
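<p><em>To make &quot;predict the next token&quot; concrete, here is a toy Python sketch that stands in for the objective with simple bigram counts (real pre-training learns this with a neural network over trillions of tokens; the tiny corpus below is purely illustrative):</em></p>

```python
# Toy stand-in for the next-token objective: count which token follows which.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ran".split()

# next_counts[token] = how often each continuation was observed
next_counts = defaultdict(Counter)
for cur, nxt in zip(corpus, corpus[1:]):
    next_counts[cur][nxt] += 1

def predict_next(token):
    # Greedy prediction: the most frequent continuation seen in training data.
    return next_counts[token].most_common(1)[0][0]

# "the" is followed by "cat" twice and "mat" once, so "cat" wins:
assert predict_next("the") == "cat"
```

<p><em>A real LLM does the same thing in spirit, but with a learned probability distribution over a vocabulary of tens of thousands of tokens, conditioned on a long context rather than a single preceding word.</em></p>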

<h3>2. 微调 Fine-Tuning (在已有模型上继续训练)</h3>

<pre><code>
基础模型 + 专业数据集 → 专用模型
Foundation model + domain data → Specialized model

例子 / Examples:
• GPT-4 + 法律文书 → 法律助手
• Llama + 医疗记录 → 医疗诊断模型
• Code Llama = Llama + 代码数据集

成本：数百到数万美元
Cost: Hundreds to tens of thousands of dollars

关键参数：learning rate 要小 (1e-5 to 1e-4)，防止&quot;灾难性遗忘&quot;
Key: Low learning rate to prevent &quot;catastrophic forgetting&quot;
</code></pre>

<h3>3. 提示工程 Prompting (零成本，零训练)</h3>

<pre><code>
用户: &quot;你是一个专业的法律助手，...&quot;
基础模型通过上下文调整行为
No training — just clever input formatting

技术 / Techniques:
• Zero-shot: 直接问
• Few-shot: 给几个例子
• Chain-of-thought: &quot;一步一步想...&quot;
• RAG: 外挂知识库
</code></pre>

<hr/>

<h2>📊 三者对比 / Comparison Table</h2>

<p>| | 预训练 Pre-train | 微调 Fine-tune | 提示 Prompting |</p>
<p>|---|---|---|---|</p>
<p>| 成本 Cost | 💰💰💰💰 极高 | 💰💰 中等 | 💰 几乎免费 |</p>
<p>| 时间 Time | 数周 weeks | 数小时~天 hours-days | 即时 instant |</p>
<p>| 数据需求 Data | 万亿 tokens | 数百~数万样本 | 0~几个例子 |</p>
<p>| 效果 Performance | 通用基础 general | 专域最优 domain-best | 灵活但有上限 |</p>
<p>| 修改难度 Updates | 极难 very hard | 需重新微调 | 实时修改 instant |</p>
<p>| 使用场景 Use Case | 建大模型 | 专业领域 | 99%的应用场景 |</p>

<hr/>

<h2>💻 可运行代码 / Runnable Python Snippet</h2>

<pre><code>
# pip install openai

from openai import OpenAI

client = OpenAI()  # Reads OPENAI_API_KEY from env

# ============================================================
# Technique 1: Zero-shot prompting
# ============================================================
zero_shot = client.chat.completions.create(
    model=&quot;gpt-4o-mini&quot;,
    messages=[
        {&quot;role&quot;: &quot;user&quot;, &quot;content&quot;: &quot;Classify this tweet as positive/negative/neutral: &#x27;I love this new feature!&#x27;&quot;}
    ]
)
print(&quot;Zero-shot:&quot;, zero_shot.choices[0].message.content)

# ============================================================
# Technique 2: Few-shot prompting (3 examples teach the format)
# ============================================================
few_shot = client.chat.completions.create(
    model=&quot;gpt-4o-mini&quot;,
    messages=[
        {&quot;role&quot;: &quot;system&quot;, &quot;content&quot;: &quot;Classify tweets. Answer with just: positive, negative, or neutral.&quot;},
        {&quot;role&quot;: &quot;user&quot;,   &quot;content&quot;: &quot;The food was amazing!&quot;},
        {&quot;role&quot;: &quot;assistant&quot;, &quot;content&quot;: &quot;positive&quot;},
        {&quot;role&quot;: &quot;user&quot;,   &quot;content&quot;: &quot;Worst customer service ever.&quot;},
        {&quot;role&quot;: &quot;assistant&quot;, &quot;content&quot;: &quot;negative&quot;},
        {&quot;role&quot;: &quot;user&quot;,   &quot;content&quot;: &quot;Package arrived.&quot;},
        {&quot;role&quot;: &quot;assistant&quot;, &quot;content&quot;: &quot;neutral&quot;},
        # Now the real question:
        {&quot;role&quot;: &quot;user&quot;,   &quot;content&quot;: &quot;I love this new feature!&quot;},
    ]
)
print(&quot;Few-shot:&quot;, few_shot.choices[0].message.content)
# Output: &quot;positive&quot; — much more consistent format!
</code></pre>

<hr/>

<h2>🎯 什么时候用哪种？/ When to Use Which?</h2>

<pre><code>
你的问题                         推荐方案
Your problem                    Recommended approach

&quot;我要做一个通用聊天机器人&quot;          → Prompting (系统提示词)
General chatbot                 → Prompting (system prompt)

&quot;客服需要理解我们专有术语&quot;          → Fine-tuning (100-1000个示例)
Support bot with domain jargon  → Fine-tuning (100-1000 examples)

&quot;要让模型写代码更像我们团队风格&quot;     → Fine-tuning 或 RAG+Prompting
Code style consistency          → Fine-tuning or RAG+Prompting

&quot;从零做下一个 GPT-4&quot;              → Pre-training (别这么想)
Build next GPT-4 from scratch   → Pre-training (don&#x27;t)
</code></pre>

<p><strong>经验法则 / Rule of thumb:</strong> 先试 prompting → 不行试 fine-tuning → 绝不要自己预训练</p>
<p><em>Try prompting first → fine-tune if needed → never pre-train from scratch</em></p>

<hr/>

<h2>🧠 2026 前沿 / 2026 Cutting Edge</h2>

<p>- <strong>LoRA / QLoRA</strong> — 只训练极少量参数的高效微调，把硬件门槛从数据中心级 A100 降到一张消费级 RTX 4090</p>
<p>- <strong>RLHF</strong> — 用人类反馈强化训练，让模型更&quot;安全&quot;（ChatGPT 的秘诀之一）</p>
<p>- <strong>DPO</strong> — Direct Preference Optimization，比 RLHF 更简单，效果相当</p>

<hr/>

<h2>📚 References</h2>

<p>- [OpenAI Fine-tuning Guide](https://platform.openai.com/docs/guides/fine-tuning)</p>
<p>- [Hugging Face: LoRA &amp; PEFT Documentation](https://huggingface.co/docs/peft/index)</p>
<p>- [Andrej Karpathy: Let&#x27;s build GPT (YouTube)](https://www.youtube.com/watch?v=kCc8FmEb1nY)</p>

<h2>🧒 ELI5</h2>

<p>想象你要教一条狗学新把戏：</p>
<p>- 预训练 = 把狗从小养大，它学会了所有基本技能</p>
<p>- 微调 = 专门训练它学&quot;装死&quot;这一个把戏，练了一个月</p>
<p>- 提示工程 = 你每次说话前告诉它&quot;你是一只很聪明的狗，我想让你...&quot;</p>

<p>最简单的方法永远是先&quot;说话&quot;，实在不行才去&quot;专门训练&quot;。</p>

<p><em>To teach a dog tricks: pre-training = raising it from a puppy (learns everything). Fine-tuning = a month of &quot;play dead&quot; training. Prompting = just telling it what you want before each command. Always try talking first.</em></p>
]]></description>
    </item>
    <item>
      <title>byte-by-byte — 2026-03-21</title>
      <link>https://github.com/YushengAuggie/byte-by-byte/tree/main/archive</link>
      <guid isPermaLink="false">https://github.com/YushengAuggie/byte-by-byte/archive/2026-03-21</guid>
      <pubDate>Sat, 21 Mar 2026 12:00:00 +0000</pubDate>
      <description><![CDATA[<h1>Deepdive</h1>
<h2>🔬 Saturday Deep Dive: Valid Sudoku (15 min read)</h2>

<p>📊 Day 8/150 · NeetCode: 8/150 · SysDesign: 8/40 · Behavioral: 8/40 · Frontend: 8/50 · AI: 4/30</p>
<p>🔥 8-day streak!</p>

<p>🔗 [LeetCode #36: Valid Sudoku](https://leetcode.com/problems/valid-sudoku/) 🟡 Medium</p>
<p>📹 [NeetCode Solution](https://neetcode.io/problems/valid-sudoku)</p>

<hr/>

<h2>Overview / 概述</h2>

<p>今天我们深入研究一道经典的中等难度题：验证数独盘面的合法性。</p>
<p>Today we deep-dive into a classic Medium problem: checking whether a Sudoku board is valid.</p>

<p>这道题看起来简单，实则考察你对<strong>多维哈希</strong>、<strong>坐标映射</strong>和<strong>边界条件</strong>的掌握。</p>
<p>It looks approachable, but tests your grasp of <strong>multi-dimensional hashing</strong>, <strong>coordinate mapping</strong>, and <strong>edge conditions</strong>.</p>

<p>面试中它频繁出现，因为它考察的不是&quot;你能不能写循环&quot;，而是&quot;你能不能系统地对数据建模&quot;。</p>
<p>It appears often in interviews not to test loop-writing, but to test <strong>systematic data modeling</strong>.</p>

<hr/>

<h2>Part 1: Theory / 理论基础 (5 min)</h2>

<h3>什么是数独验证？/ What Are We Validating?</h3>

<p>一个合法的数独盘面满足三个约束：</p>
<p>A valid Sudoku board satisfies exactly three constraints:</p>

<pre><code>
┌─────────────────────────────────┐
│  CONSTRAINT 1: Each ROW         │
│  每一行：数字1-9不能重复        │
│  No digit 1-9 repeated in a row │
├─────────────────────────────────┤
│  CONSTRAINT 2: Each COLUMN      │
│  每一列：数字1-9不能重复        │
│  No digit 1-9 repeated in col   │
├─────────────────────────────────┤
│  CONSTRAINT 3: Each 3×3 BOX     │
│  每个3×3宫格：数字1-9不能重复   │
│  No digit 1-9 repeated in box   │
└─────────────────────────────────┘
</code></pre>

<p><strong>关键点：我们不需要验证数独能否被解出，只需要验证当前已填数字是否合法。</strong></p>
<p><strong>Key insight: We don&#x27;t need to verify the puzzle is solvable — only that existing digits don&#x27;t violate rules.</strong></p>

<p><code>&#x27;.&#x27;</code> 代表空格，忽略不计。/ <code>&#x27;.&#x27;</code> means empty cell — skip it.</p>

<h3>数据建模：核心思想 / Data Modeling: The Core Idea</h3>

<p>最直观的暴力解法是：分别对每行、每列、每个宫格做验证，嵌套循环九次。</p>
<p>The naive approach: check rows, columns, and boxes separately — nine nested loops.</p>

<p>更优雅的方式：<strong>一次遍历，同时维护三个集合字典</strong>。</p>
<p>Better: <strong>one pass, three sets of dictionaries updated simultaneously</strong>.</p>

<pre><code>
For cell (r, c) containing digit d:

  rows[r]    → must not contain d yet
  cols[c]    → must not contain d yet
  boxes[b]   → must not contain d yet

where b = (r // 3, c // 3)  ← box index!
</code></pre>

<p>这个 <code>(r // 3, c // 3)</code> 是本题最优雅的技巧：将坐标映射为0-8的盒子索引。</p>
<p>The <code>(r // 3, c // 3)</code> trick maps any cell to its 3×3 box index — this is the elegance of the problem.</p>

<pre><code>
Box indices visualized:
┌───┬───┬───┐
│0,0│0,1│0,2│
├───┼───┼───┤
│1,0│1,1│1,2│
├───┼───┼───┤
│2,0│2,1│2,2│
└───┴───┴───┘
row 0-2, col 0-2 → box (0,0)
row 0-2, col 3-5 → box (0,1)
row 3-5, col 6-8 → box (1,2)
... etc.
</code></pre>

<hr/>

<h2>Part 2: Step-by-Step Implementation / 一步一步实现 (8 min)</h2>

<h3>Approach 1: Naive (brute force) — O(9²) time but ugly</h3>

<p>验证每行、每列、每宫格需要9次独立循环。可读性差，容易出错。</p>
<p>Nine separate loops for rows, columns, boxes. Hard to read, error-prone.</p>

<pre><code>
# Approach 1: Naive — DO NOT use in interview (too verbose)
def isValidSudoku_naive(board):
    # Check rows
    for row in board:
        seen = set()
        for c in row:
            if c == &#x27;.&#x27;: continue
            if c in seen: return False
            seen.add(c)
    
    # Check cols (transpose and repeat)
    for col in range(9):
        seen = set()
        for row in range(9):
            c = board[row][col]
            if c == &#x27;.&#x27;: continue
            if c in seen: return False
            seen.add(c)
    
    # Check 3x3 boxes
    for box_row in range(3):
        for box_col in range(3):
            seen = set()
            for r in range(box_row*3, box_row*3+3):
                for c in range(box_col*3, box_col*3+3):
                    val = board[r][c]
                    if val == &#x27;.&#x27;: continue
                    if val in seen: return False
                    seen.add(val)
    return True
# Time: O(81) = O(1) technically since board is fixed 9x9
# Space: O(9) per iteration
# Problem: 3 separate passes, harder to extend, verbose
</code></pre>

<h3>Approach 2: Optimal — Single Pass, Three Dicts ✅</h3>

<p><strong>这是面试中应该给出的答案。</strong> / <strong>This is the answer you should give in an interview.</strong></p>

<pre><code>
from collections import defaultdict

def isValidSudoku(board):
    # Three dictionaries: rows, columns, boxes
    # Each maps an index to a set of seen digits
    rows = defaultdict(set)   # rows[r] = {digits seen in row r}
    cols = defaultdict(set)   # cols[c] = {digits seen in col c}
    boxes = defaultdict(set)  # boxes[(r//3, c//3)] = {digits seen in box}

    for r in range(9):
        for c in range(9):
            val = board[r][c]
            
            # Skip empty cells
            if val == &#x27;.&#x27;:
                continue
            
            # Calculate box index — THE KEY TRICK
            box_key = (r // 3, c // 3)
            
            # Check all three constraints simultaneously
            if (val in rows[r] or
                val in cols[c] or
                val in boxes[box_key]):
                return False  # Duplicate found — invalid!
            
            # Add to all three tracking sets
            rows[r].add(val)
            cols[c].add(val)
            boxes[box_key].add(val)
    
    return True  # No violations found
</code></pre>

<h3>Visual Trace / 逐步追踪</h3>

<p>Let&#x27;s trace through a small part of the board:</p>

<pre><code>
Board (first 3 rows shown):
[&quot;5&quot;,&quot;3&quot;,&quot;.&quot;,  &quot;.&quot;,&quot;7&quot;,&quot;.&quot;,  &quot;.&quot;,&quot;.&quot;,&quot;.&quot;]
[&quot;6&quot;,&quot;.&quot;,&quot;.&quot;,  &quot;1&quot;,&quot;9&quot;,&quot;5&quot;,  &quot;.&quot;,&quot;.&quot;,&quot;.&quot;]
[&quot;.&quot;,&quot;9&quot;,&quot;8&quot;,  &quot;.&quot;,&quot;.&quot;,&quot;.&quot;,  &quot;.&quot;,&quot;6&quot;,&quot;.&quot;]
</code></pre>

<p><strong>Step-by-step trace:</strong></p>

<pre><code>
(r=0, c=0) val=&#x27;5&#x27;:
  box_key = (0//3, 0//3) = (0,0)
  rows[0]={}, cols[0]={}, boxes[(0,0)]={}  → no conflict
  Add: rows[0]={&#x27;5&#x27;}, cols[0]={&#x27;5&#x27;}, boxes[(0,0)]={&#x27;5&#x27;}

(r=0, c=1) val=&#x27;3&#x27;:
  box_key = (0,0)
  rows[0]={&#x27;5&#x27;}, &#x27;3&#x27; not in it ✓
  cols[1]={},    &#x27;3&#x27; not in it ✓
  boxes[(0,0)]={&#x27;5&#x27;}, &#x27;3&#x27; not in it ✓
  Add: rows[0]={&#x27;5&#x27;,&#x27;3&#x27;}, cols[1]={&#x27;3&#x27;}, boxes[(0,0)]={&#x27;5&#x27;,&#x27;3&#x27;}

(r=0, c=2) val=&#x27;.&#x27;: SKIP

(r=0, c=4) val=&#x27;7&#x27;:
  box_key = (0//3, 4//3) = (0,1)  ← different box!
  Add to rows[0], cols[4], boxes[(0,1)]

(r=1, c=0) val=&#x27;6&#x27;:
  box_key = (1//3, 0//3) = (0,0)  ← same box as row 0!
  rows[1]={}, &#x27;6&#x27; not in it ✓
  cols[0]={&#x27;5&#x27;}, &#x27;6&#x27; not in it ✓
  boxes[(0,0)]={&#x27;5&#x27;,&#x27;3&#x27;}, &#x27;6&#x27; not in it ✓
  Add successfully.

Imagine if we had another &#x27;5&#x27; at (1,0):
  cols[0] already has &#x27;5&#x27; → return False immediately! ✅
</code></pre>

<hr/>

<h2>Part 3: Edge Cases &amp; Gotchas / 边界情况 (2 min)</h2>

<h3>Edge Case 1: Empty Board</h3>
<pre><code>
board = [[&#x27;.&#x27;]*9 for _ in range(9)]
# Result: True — empty board is valid
# Our code handles this: all &#x27;.&#x27; cells are skipped
</code></pre>

<h3>Edge Case 2: Same digit in same box, different row AND column</h3>
<pre><code>
# Trap: this is invalid even though row 0 and col 3 look different
# &#x27;5&#x27; at (0,0) and &#x27;5&#x27; at (2,2) — same 3×3 box!
# The box_key trick catches this: both map to (0,0)
</code></pre>

<h3>Edge Case 3: The &quot;it looks valid row-wise&quot; trap</h3>
<pre><code>
# Each row and column could be valid individually,
# but a 3×3 box might have duplicates
# Always check all THREE constraints — don&#x27;t shortcut!
</code></pre>

<h3>常见面试坑 / Common Interview Traps</h3>
<pre><code>
❌ Trap 1: Using (r // 3) * 3 + (c // 3) as box index
   → This creates 0-8 integer keys, works but harder to read
   → Tuple (r//3, c//3) is cleaner and equally correct

❌ Trap 2: Validating the full 1-9 range
   → We only need to check for duplicates among present digits
   → If a row has [1,2,3,...,8] with one empty, it&#x27;s still valid!

❌ Trap 3: Modifying the input board
   → Never do this. The problem says &quot;determine if valid&quot;, not &quot;solve it&quot;

✅ Best practice: defaultdict(set) avoids KeyError on first access
✅ Best practice: check all 3 constraints before adding — order matters!
</code></pre>

<hr/>

<h2>Part 4: Real-World Application / 实际应用 (2 min)</h2>

<p>数独验证器不只是一道面试题——它的底层思路在很多地方都有应用。</p>
<p>Valid Sudoku isn&#x27;t just an interview puzzle — its core pattern appears everywhere.</p>

<h3>Pattern: Multi-Key Set Membership Check</h3>

<p><strong>1. Database Uniqueness Constraints</strong></p>
<pre><code>
In databases, a UNIQUE constraint across multiple columns is the same idea:
  UNIQUE(user_id, date) — same as &quot;no two rows share both values&quot;
  
PostgreSQL internally uses hash indexes (like our sets) per constraint.
</code></pre>

<p><strong>2. Spreadsheet Validation</strong></p>
<pre><code>
Google Sheets &quot;no duplicates in range&quot; validation:
  Checks each row, column, named range simultaneously
  Exact same three-constraint structure
</code></pre>

<p><strong>3. Compiler Symbol Tables</strong></p>
<pre><code>
A compiler tracks variable names per scope:
  rows    → local scope
  cols    → class scope  
  boxes   → module scope
  
&quot;Variable already declared&quot; = duplicate in the same scope (set)
</code></pre>

<p><strong>4. Form Validation in UI</strong></p>
<pre><code>
// Same pattern: check uniqueness across multiple dimensions
const seen = { byEmail: new Set(), byUsername: new Set() };
users.forEach(user =&gt; {
  if (seen.byEmail.has(user.email)) throw Error(&quot;duplicate email&quot;);
  if (seen.byUsername.has(user.username)) throw Error(&quot;duplicate username&quot;);
  seen.byEmail.add(user.email);
  seen.byUsername.add(user.username);
});
</code></pre>

<hr/>

<h2>Part 5: Interview Simulation / 面试模拟 (3 min)</h2>

<p>假设你已经给出了最优解。面试官可能会问这些问题：</p>
<p>Assume you&#x27;ve presented the optimal solution. Here are follow-up questions interviewers typically ask:</p>

<hr/>

<p><strong>Q1: &quot;What&#x27;s the time and space complexity?&quot;</strong></p>

<pre><code>
Time: O(81) = O(1)
  → Fixed 9×9 board: exactly 81 cells, constant work per cell
  → In general: O(n²) for an n×n Sudoku variant

Space: O(81) = O(1) 
  → At most 81 entries across all sets (each cell appears once per set)
  → In general: O(n²)

⚠️ Interviewer follow-up: &quot;But you&#x27;re using three defaultdicts...&quot;
→ Still O(1) because board size is fixed. For variable n, it&#x27;s O(n²).
</code></pre>

<p><strong>Q2: &quot;Could you do this without any extra space?&quot;</strong></p>

<pre><code>
不完全能，但可以压缩。/ Not quite, but you can compress.

You could encode sets as bitmasks (bit 1 = digit 1 seen, bit 2 = digit 2, etc.):
  rows = [0] * 9   # 9 integers; bit d set means digit d seen in row r
  cols = [0] * 9
  boxes = [0] * 9

  For digit d (1-9), check: if rows[r] &amp; (1 &lt;&lt; d): return False
                            rows[r] |= (1 &lt;&lt; d)

Space: O(27 integers) = O(1). Faster in practice due to cache locality.
</code></pre>
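<p><em>The bitmask idea above can be written out as runnable Python. This sketch uses 2 ** d and integer division in place of the shift and mask operators, which behaves identically for these single-bit checks:</em></p>

```python
# Bitmask variant of the validator: bit d of rows[r] records that digit d
# has been seen in row r (2 ** d plays the role of the bit shift).
def is_valid_sudoku_bitmask(board):
    rows, cols, boxes = [0] * 9, [0] * 9, [0] * 9
    for r in range(9):
        for c in range(9):
            val = board[r][c]
            if val == ".":
                continue
            bit = 2 ** int(val)
            b = (r // 3) * 3 + (c // 3)
            seen = rows[r] | cols[c] | boxes[b]
            if (seen // bit) % 2 == 1:
                return False  # digit already present in row, col, or box
            rows[r] |= bit
            cols[c] |= bit
            boxes[b] |= bit
    return True

# An empty board is valid (all cells skipped):
empty = [["."] * 9 for _ in range(9)]
assert is_valid_sudoku_bitmask(empty)

# Two 5s in the same 3x3 box, different row and column, are invalid:
bad = [["."] * 9 for _ in range(9)]
bad[0][0] = "5"
bad[2][2] = "5"
assert not is_valid_sudoku_bitmask(bad)
```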

<p><strong>Q3: &quot;How would you extend this to validate a 16×16 Sudoku (Hexadoku)?&quot;</strong></p>

<pre><code>
# Same code, parameterized:
def isValidSudokuNxN(board, n=9, box_size=3):
    rows = defaultdict(set)
    cols = defaultdict(set)
    boxes = defaultdict(set)
    for r in range(n):
        for c in range(n):
            val = board[r][c]
            if val == &#x27;.&#x27;: continue
            box_key = (r // box_size, c // box_size)
            if val in rows[r] or val in cols[c] or val in boxes[box_key]:
                return False
            rows[r].add(val); cols[c].add(val); boxes[box_key].add(val)
    return True
# For 16×16: n=16, box_size=4 → same logic, bigger constants
</code></pre>

<p><strong>Q4: &quot;What if you needed to not just validate but also solve the Sudoku?&quot;</strong></p>

<pre><code>
验证是求解的子问题。/ Validation is a sub-problem of solving.

Solving requires backtracking:
1. Find an empty cell
2. Try digits 1-9
3. For each digit: call isValid (our function!) to check constraints
4. If valid, place digit, recurse
5. If recursion fails, backtrack (remove digit, try next)

Time: O(9^m) where m = number of empty cells — exponential worst case
Typical Sudoku (m~50 empty cells) solves in milliseconds due to pruning.
</code></pre>
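<p><em>The five steps above sketch out to a minimal backtracking solver. This is an illustrative version only: it rescans for the next empty cell and revalidates from scratch on every placement, whereas a production solver would track row/col/box state incrementally:</em></p>

```python
# Minimal backtracking solver following the five steps above.
def solve(board):
    def ok(r, c, d):
        # Placing d at (r, c) must not conflict with its row, column, or box.
        for i in range(9):
            if board[r][i] == d or board[i][c] == d:
                return False
        br, bc = 3 * (r // 3), 3 * (c // 3)
        for i in range(br, br + 3):
            for j in range(bc, bc + 3):
                if board[i][j] == d:
                    return False
        return True

    for r in range(9):
        for c in range(9):
            if board[r][c] == ".":
                for d in "123456789":
                    if ok(r, c, d):
                        board[r][c] = d          # place and recurse
                        if solve(board):
                            return True
                        board[r][c] = "."        # backtrack
                return False  # no digit fits this cell
    return True  # no empty cells left, board solved

# Demo: blank two cells of a known valid solved grid, then re-solve.
solved = ["534678912", "672195348", "198342567", "859761423", "426853791",
          "713924856", "961537284", "287419635", "345286179"]
puzzle = [list(row) for row in solved]
puzzle[0][0] = "."
puzzle[4][4] = "."
assert solve(puzzle)
assert ["".join(row) for row in puzzle] == solved
```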

<p><strong>Q5: &quot;If this were a production system checking millions of Sudoku boards per second, what would you optimize?&quot;</strong></p>

<pre><code>
1. Bitmask instead of sets → 9 ints vs 27 sets, better cache performance
2. Early termination → return False as soon as first conflict found (already done!)
3. SIMD/vectorization → process 4 rows simultaneously with 256-bit registers
4. GPU parallelism → each 81-cell check is independent; batch 10k boards on GPU
5. Pre-compute valid digit sets → use lookup tables for box → allowed digits

In practice: bitmask version on CPU is 3-5x faster than set version.
</code></pre>

<hr/>

<h2>Complexity Summary / 复杂度总结</h2>

<pre><code>
┌─────────────────┬──────────┬──────────────┐
│ Approach        │ Time     │ Space        │
├─────────────────┼──────────┼──────────────┤
│ Naive (3 loops) │ O(81)    │ O(27) sets   │
│ Single pass     │ O(81)    │ O(27) sets   │  ← Best readability
│ Bitmask         │ O(81)    │ O(27) ints   │  ← Best performance
└─────────────────┴──────────┴──────────────┘

All O(1) for fixed 9×9 board.
For general n×n Sudoku: O(n²) time and space.
</code></pre>

<hr/>

<h2>📚 References / 深入学习</h2>

<p>• 🔗 [LeetCode #36: Valid Sudoku](https://leetcode.com/problems/valid-sudoku/) 🟡 Medium</p>
<p>• 📹 [NeetCode Solution Video](https://neetcode.io/problems/valid-sudoku)</p>
<p>• 📖 [NeetCode Arrays &amp; Hashing Roadmap](https://neetcode.io/roadmap) — Visual learning path for this pattern</p>
<p>• 🔗 [LeetCode #37: Sudoku Solver](https://leetcode.com/problems/sudoku-solver/) — Natural follow-up (backtracking)</p>
<p>• 🔗 [LeetCode #187: Repeated DNA Sequences](https://leetcode.com/problems/repeated-dna-sequences/) — Same multi-key hashing pattern</p>

<hr/>

<p>🧒 <strong>ELI5:</strong> Checking if a Sudoku is valid is like making sure no two kids in the same row, column, or team have the same number on their shirt — you write down each number as you see it, and yell &quot;STOP!&quot; if you see it twice in the same group.</p>

<hr/>

<p><em>Saturday Deep Dive — Day 8 · Generated 2026-03-21</em></p>
]]></description>
    </item>
    <item>
      <title>byte-by-byte — 2026-03-20</title>
      <link>https://github.com/YushengAuggie/byte-by-byte/tree/main/archive</link>
      <guid isPermaLink="false">https://github.com/YushengAuggie/byte-by-byte/archive/2026-03-20</guid>
      <pubDate>Fri, 20 Mar 2026 12:00:00 +0000</pubDate>
      <description><![CDATA[<h1>🏗️ System Design</h1>
<h2>🏗️ 系统设计 Day 7 (3 min read) / System Design Day 7</h2>
<h2>Database Types: SQL vs NoSQL — 数据库类型：关系型 vs 非关系型</h2>

<hr/>

<p>想象你在设计一个新的社交媒体平台…</p>

<p>你的用户数据整齐划一：每人都有 ID、用户名、邮箱、注册时间。这很适合用 <strong>SQL 数据库</strong> — 就像一张结构清晰的 Excel 表格，行和列整整齐齐。</p>

<p>但是，用户发的帖子呢？有人只写文字，有人附图片，有人嵌入视频，有人加了位置标签… 每条帖子的结构都不同。这时候 <strong>NoSQL</strong> 就大放异彩 — 像一个灵活的 JSON 文档，想放什么字段就放什么。</p>

<hr/>

<h3>架构对比 / Architecture Comparison</h3>

<pre><code>
         SQL Database                    NoSQL Database
    ┌─────────────────────┐         ┌─────────────────────────┐
    │      USERS TABLE    │         │    users collection     │
    ├──────┬──────┬───────┤         │                         │
    │  id  │ name │ email │         │ { id: 1,                │
    ├──────┼──────┼───────┤         │   name: &quot;Alice&quot;,        │
    │  1   │Alice │a@x.com│         │   email: &quot;a@x.com&quot;,     │
    │  2   │ Bob  │b@y.com│         │   preferences: {...},   │
    └──────┴──────┴───────┘         │   badges: [&quot;🏆&quot;,&quot;⭐&quot;] } │
                                    └─────────────────────────┘
    Schema enforced upfront          Schema flexible / per doc
    JOIN across tables               Embed related data
    ACID transactions                Eventual consistency (often)
    Scale: vertical (bigger server)  Scale: horizontal (more servers)
</code></pre>

<hr/>

<h3>核心概念 / Key Concepts</h3>

<p><strong>SQL (关系型数据库) — MySQL, PostgreSQL, SQLite</strong></p>
<p>- <strong>结构化数据</strong>: 表、行、列，schema 固定</p>
<p>- <strong>ACID 事务</strong>: Atomicity（原子性）, Consistency（一致性）, Isolation（隔离性）, Durability（持久性）— 银行转账不能丢数据！</p>
<p>- <strong>JOIN 操作</strong>: 多表关联查询，数据不冗余</p>
<p>- <strong>强一致性</strong>: 写入后立即可读</p>

<p><strong>NoSQL — MongoDB (文档), Redis (键值), Cassandra (列族), Neo4j (图)</strong></p>
<p>- <strong>灵活 schema</strong>: 每条记录结构可以不同</p>
<p>- <strong>水平扩展</strong>: 分片（sharding）轻松加机器</p>
<p>- <strong>最终一致性</strong>: 写入后可能有短暂延迟才全局可见</p>
<p>- <strong>高吞吐量</strong>: 读写速度极快（尤其键值存储）</p>
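<p>A tiny runnable contrast of the two models (illustrative sketch: the stdlib SQLite stands in for a SQL database, and plain Python dicts stand in for a document store; no real NoSQL client is involved):</p>

```python
import sqlite3

# SQL side: schema enforced upfront.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT NOT NULL)"
)
conn.execute("INSERT INTO users VALUES (1, 'Alice', 'a@x.com')")

# A field outside the schema is rejected at write time:
try:
    conn.execute(
        "INSERT INTO users (id, name, email, badges) VALUES (2, 'Bob', 'b@y.com', 'x')"
    )
    schema_rejected = False
except sqlite3.OperationalError:   # no such column: badges
    schema_rejected = True

# Document-style side: every record may have its own shape, nothing is rejected.
docs = [
    {"id": 1, "name": "Alice", "email": "a@x.com", "badges": ["🏆", "⭐"]},
    {"id": 2, "name": "Bob", "email": "b@y.com"},  # different shape, still fine
]
```

<p>The rejection on the left is exactly the "schema enforced upfront" row of the comparison diagram; the uneven dicts on the right are the "schema flexible / per doc" row.</p>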

<hr/>

<h3>为什么这样设计？How to Choose?</h3>

<p>| 场景 | 推荐 | 原因 |</p>
<p>|------|------|------|</p>
<p>| 用户账户、订单、财务 | SQL | 需要 ACID，数据关系明确 |</p>
<p>| 用户会话、缓存、排行榜 | Redis (NoSQL) | 极速读写，TTL 支持 |</p>
<p>| 产品目录、内容管理 | MongoDB (NoSQL) | 结构多变，嵌套文档 |</p>
<p>| 社交图谱、推荐系统 | Neo4j (Graph) | 关系查询是核心需求 |</p>
<p>| 日志、时序数据 | Cassandra (NoSQL) | 海量写入，时间范围查询 |</p>

<p><strong>经验法则</strong>: 先问&quot;我的数据关系是否复杂？事务是否关键？&quot; → 是则 SQL。&quot;数据量是否巨大？结构是否多变？&quot; → 是则 NoSQL。</p>

<p><strong>现实中：两者共存！</strong></p>
<p>- Instagram: PostgreSQL（用户/帖子关系） + Cassandra（活动 feed） + Redis（缓存）</p>
<p>- 单一数据库解决所有问题是反模式</p>

<hr/>

<h3>别踩这个坑 / Common Mistakes</h3>

<p>❌ <strong>&quot;NoSQL 比 SQL 更快&quot;</strong> — 错！取决于使用场景。复杂 JOIN 查询 SQL 更高效；简单键值查找 Redis 秒杀一切。</p>

<p>❌ <strong>&quot;NoSQL 不支持事务&quot;</strong> — MongoDB 4.0+ 已支持多文档 ACID 事务，只是使用场景不同。</p>

<p>❌ <strong>&quot;选了就不能换&quot;</strong> — 实践中，随着业务演进经常需要引入第二种数据库。提前规划数据访问层（DAL）让切换更容易。</p>

<p>❌ <strong>过早优化</strong> — 99% 的创业公司用 PostgreSQL 就够了。等真正有 scale 问题再引入 NoSQL。</p>

<hr/>

<p>📚 <strong>深入学习 / Learn More:</strong></p>
<p>- [Uber Engineering: Postgres to MySQL Migration](https://www.uber.com/blog/postgres-to-mysql-migration/) — 真实案例：为什么 Uber 从 PostgreSQL 迁移到 MySQL</p>
<p>- [ByteByteGo: SQL vs NoSQL](https://www.youtube.com/watch?v=_Ss42Vb1SU4) — 可视化对比讲解（YouTube）</p>
<p>- [Designing Data-Intensive Applications, Ch. 2](https://dataintensive.net/) — Martin Kleppmann 的权威参考书</p>

<p>🧒 <strong>ELI5:</strong> A SQL database is like a super organized binder with divider tabs where everything goes in exactly the right slot; a NoSQL database is like a big backpack where you can throw in anything — a book, a lunchbox, a basketball — whatever shape it is.</p>

<hr/>
<h1>💻 Algorithms</h1>
<h2>💻 算法 Day 7 (4 min read) / Algorithms Day 7</h2>
<h2>#238 Product of Array Except Self (Medium) — Arrays &amp; Hashing / Prefix Products</h2>

<p>🔗 [LeetCode #238: Product of Array Except Self](https://leetcode.com/problems/product-of-array-except-self/) 🟡 Medium</p>
<p>📹 [NeetCode Solution](https://neetcode.io/problems/products-of-array-discluding-self)</p>

<hr/>

<h3>🌎 Real-World Analogy / 现实类比</h3>

<p>想象你是一家工厂的质检员，流水线上有 N 个零件，每个都有一个重量。你的任务是：对于每个零件，快速计算出<strong>其他所有零件的重量之积</strong>（不能把该零件自己算进去）。</p>

<p>最笨的方法？每次都把其他所有零件重新乘一遍 — O(n²)，太慢了。</p>
<p>聪明的方法？先从左到右算一遍&quot;前缀积&quot;，再从右到左算一遍&quot;后缀积&quot;，两个一乘就得到答案！</p>

<hr/>

<h3>📋 Problem Statement / 题目</h3>

<p>Given an integer array <code>nums</code>, return an array <code>answer</code> such that <code>answer[i]</code> is equal to the product of all elements of <code>nums</code> except <code>nums[i]</code>.</p>

<p>给定整数数组 <code>nums</code>，返回数组 <code>answer</code>，使得 <code>answer[i]</code> 等于 <code>nums</code> 中除 <code>nums[i]</code> 之外所有元素的乘积。</p>

<p><strong>Constraint</strong>: Must run in O(n) time, <strong>without using division</strong> (不能用除法).</p>

<p><strong>Example:</strong></p>
<pre><code>
Input:  nums = [1, 2, 3, 4]
Output:       [24, 12,  8,  6]

Check: 
  answer[0] = 2 * 3 * 4 = 24  ✓
  answer[1] = 1 * 3 * 4 = 12  ✓
  answer[2] = 1 * 2 * 4 =  8  ✓
  answer[3] = 1 * 2 * 3 =  6  ✓
</code></pre>

<hr/>

<h3>🔍 Step-by-Step Walkthrough / 逐步分析</h3>

<p><strong>Key Insight</strong>: For position <code>i</code>, the answer = (product of everything to the LEFT of i) × (product of everything to the RIGHT of i)</p>

<pre><code>
nums =   [ 1,   2,   3,   4 ]
index:     0    1    2    3

Left prefix products (prefix[i] = product of nums[0..i-1]):
prefix = [ 1,   1,   2,   6 ]
           │    │    │    └─ 1*2*3 (everything left of i=3)
           │    │    └────── 1*2   (everything left of i=2)
           │    └─────────── 1     (only nums[0] is left of i=1)
           └──────────────── 1     (nothing left of i=0)

Right suffix products (suffix[i] = product of nums[i+1..end]):
suffix = [24,  12,   4,   1 ]
           │    │    │    └─ 1     (nothing right of i=3)
           │    │    └────── 4     (only nums[3])
           │    └─────────── 3*4
           └──────────────── 2*3*4

answer[i] = prefix[i] * suffix[i]:
  [0]: 1 * 24 = 24
  [1]: 1 * 12 = 12
  [2]: 2 *  4 =  8
  [3]: 6 *  1 =  6
</code></pre>

<p><strong>Space Optimization</strong>: We can do this in O(1) extra space (output array doesn&#x27;t count) by computing prefix in the output array first, then multiplying suffix on-the-fly from right to left.</p>

<hr/>

<h3>🐍 Python Solution / Python 解法</h3>

<pre><code>
def productExceptSelf(nums: list[int]) -&gt; list[int]:
    n = len(nums)
    answer = [1] * n
    
    # Pass 1: Fill answer[i] with the PREFIX product (product of everything LEFT of i)
    # After this pass: answer = [1, 1, 2, 6] for input [1, 2, 3, 4]
    prefix = 1
    for i in range(n):
        answer[i] = prefix      # store product of everything before i
        prefix *= nums[i]       # update running prefix
    
    # Pass 2: Multiply answer[i] by the SUFFIX product (product of everything RIGHT of i)
    # We traverse right-to-left, tracking running suffix product
    suffix = 1
    for i in range(n - 1, -1, -1):
        answer[i] *= suffix     # multiply in the suffix product
        suffix *= nums[i]       # update running suffix

    return answer

# Test it:
print(productExceptSelf([1, 2, 3, 4]))    # → [24, 12, 8, 6]
print(productExceptSelf([-1, 1, 0, -3, 3]))  # → [0, 0, 9, 0, 0]
</code></pre>

<p><strong>Trace with [1, 2, 3, 4]:</strong></p>
<pre><code>
After Pass 1 (prefix):
  i=0: answer[0] = 1,  prefix = 1
  i=1: answer[1] = 1,  prefix = 2
  i=2: answer[2] = 2,  prefix = 6
  i=3: answer[3] = 6,  prefix = 24
  answer = [1, 1, 2, 6]

After Pass 2 (suffix, right to left):
  i=3: answer[3] = 6  * 1 = 6,   suffix = 4
  i=2: answer[2] = 2  * 4 = 8,   suffix = 12
  i=1: answer[1] = 1  * 12 = 12, suffix = 24
  i=0: answer[0] = 1  * 24 = 24, suffix = 24
  answer = [24, 12, 8, 6]  ✓
</code></pre>

<hr/>

<h3>⏱️ Complexity / 复杂度</h3>

<p>| | Complexity |</p>
<p>|---|---|</p>
<p>| <strong>Time</strong> | O(n) — two linear passes |</p>
<p>| <strong>Space</strong> | O(1) extra (output array doesn&#x27;t count per problem rules) |</p>

<hr/>

<h3>举一反三 / Pattern Recognition</h3>

<p><strong>The Prefix/Suffix Pattern</strong> unlocks many problems:</p>
<p>- Any time you need &quot;everything except me&quot; → think prefix × suffix</p>
<p>- <strong>Variant</strong>: [LeetCode #42: Trapping Rain Water](https://leetcode.com/problems/trapping-rain-water/) — uses max-prefix and max-suffix arrays</p>
<p>- <strong>Variant</strong>: [LeetCode #152: Maximum Product Subarray](https://leetcode.com/problems/maximum-product-subarray/) — track both max and min prefix (negatives flip signs!)</p>
<p>- <strong>Variant</strong>: [LeetCode #724: Find Pivot Index](https://leetcode.com/problems/find-pivot-index/) — prefix sum version of the same idea</p>

<p><strong>Follow-up interview questions:</strong></p>
<p>1. &quot;What if the array contains zeros?&quot; → The code already handles it correctly (the zero propagates into the suffix/prefix)</p>
<p>2. &quot;Can you solve it with O(n²) first, then optimize?&quot; → Always a good way to start</p>
<p>3. &quot;What about overflow?&quot; → Use Python (arbitrary precision) or modular arithmetic</p>
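<p>A hedged sketch of the modular-arithmetic answer to follow-up 3: the two passes are unchanged, every multiplication is just reduced modulo a prime. The modulus 10**9+7 is only the conventional choice, not part of the original problem, and with negative inputs Python's <code>%</code> maps results into [0, mod):</p>

```python
MOD = 10**9 + 7  # conventional prime; pick whatever the problem specifies

def product_except_self_mod(nums: list[int], mod: int = MOD) -> list[int]:
    n = len(nums)
    answer = [1] * n
    prefix = 1
    for i in range(n):                 # pass 1: prefix products, reduced mod `mod`
        answer[i] = prefix
        prefix = (prefix * nums[i]) % mod
    suffix = 1
    for i in range(n - 1, -1, -1):     # pass 2: fold in suffix products
        answer[i] = (answer[i] * suffix) % mod
        suffix = (suffix * nums[i]) % mod
    return answer
```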

<hr/>

<p>📚 <strong>深入学习 / Learn More:</strong></p>
<p>- 📹 [NeetCode Solution Video](https://neetcode.io/problems/products-of-array-discluding-self) — best visual explanation of the prefix/suffix approach</p>
<p>- [Arrays &amp; Hashing Pattern Guide — NeetCode Roadmap](https://neetcode.io/roadmap) — see the Arrays &amp; Hashing section for this pattern</p>
<p>- Related: [LeetCode #42: Trapping Rain Water](https://leetcode.com/problems/trapping-rain-water/) 🔴 Hard | [LeetCode #152: Maximum Product Subarray](https://leetcode.com/problems/maximum-product-subarray/) 🟡 Medium</p>

<p>🧒 <strong>ELI5:</strong> If you have 4 friends and you want to know how many handshakes happen when you&#x27;re NOT included, you count all the handshakes to your left, then all the handshakes to your right, and multiply them together — that&#x27;s your answer!</p>

<hr/>
<h1>🗣️ Soft Skills</h1>
<h2>🗣️ 软技能 Day 7 (2 min read) / Soft Skills Day 7</h2>
<h2>Technical Leadership: &quot;Tell me about a time you simplified a complex system&quot;</h2>
<h2>技术领导力：&quot;讲一个你简化复杂系统的经历&quot;</h2>

<hr/>

<h3>为什么这很重要 / Why This Matters</h3>

<p>这道题考查的不是&quot;你删了多少行代码&quot;，而是：</p>
<p>1. <strong>你能识别真正的复杂性来源</strong>（accidental vs essential complexity）</p>
<p>2. <strong>你有勇气说&quot;这个可以更简单&quot;</strong>，并推动改变</p>
<p>3. <strong>你理解简化的代价</strong> — 有时候&quot;复杂&quot;是有原因的</p>

<p>Senior/Staff 工程师最重要的技能之一：<strong>抵抗系统熵增</strong>，不让复杂性悄悄积累。</p>

<p><em>This question tests whether you can identify the root of complexity, have the courage to push for change, and understand the tradeoffs of simplification.</em></p>

<hr/>

<h3>STAR Framework Breakdown / STAR 框架拆解</h3>

<p><strong>Situation (情境):</strong></p>
<p>描述系统状态 + 为什么它变复杂了</p>
<p>- 关键信息：系统规模、团队背景、复杂性的历史原因</p>
<p>- 例：&quot;我们的支付服务经历了 3 年迭代，有 7 个微服务处理同一笔交易的不同阶段…&quot;</p>

<p><strong>Task (任务):</strong></p>
<p>你的角色 + 为什么这个简化很重要</p>
<p>- 不只是&quot;我负责这个&quot; — 说清楚商业影响</p>
<p>- 例：&quot;每次新支付方式上线需要 6 周，竞对只需 2 周。我主导了简化工作。&quot;</p>

<p><strong>Action (行动):</strong></p>
<p>这是最重要的部分！展示技术深度：</p>
<p>- 你如何诊断复杂性来源（画架构图？追踪请求链路？）</p>
<p>- 你如何区分哪些复杂性可以去掉，哪些必须保留</p>
<p>- 你如何获得团队 buy-in（技术评审、数据支撑、渐进迁移）</p>
<p>- 你如何降低风险（feature flags、灰度发布、监控）</p>

<p><strong>Result (结果):</strong></p>
<p>量化影响，不要含糊：</p>
<p>- ✅ &quot;上线时间从 6 周降至 1.5 周&quot;</p>
<p>- ✅ &quot;代码行数减少 40%，P99 延迟从 800ms 降到 200ms&quot;</p>
<p>- ✅ &quot;新工程师上手时间从 2 周缩短到 3 天&quot;</p>

<hr/>

<h3>❌ Bad Approach vs ✅ Good Approach</h3>

<p><strong>❌ Bad:</strong></p>
<p>&gt; &quot;我们的旧系统很乱，我重写了它。现在好多了，代码更干净，大家都很满意。&quot;</p>

<p>问题所在：</p>
<p>- 没说清楚复杂性的来源</p>
<p>- &quot;重写&quot;是危险词（没提风险管理）</p>
<p>- 结果模糊，没有数据</p>
<p>- 听起来像个人英雄主义，不像团队领导力</p>

<p><strong>✅ Good:</strong></p>
<p>&gt; &quot;我们有一个支付编排服务，最初设计是 2021 年给 3 种支付方式用的，到 2023 年已经支持 12 种，服务里充满了 if/else 分支和特殊 case 处理。每次新方式上线，QA 要测试所有 12 种，因为改动影响面不可预测。</p>
<p>&gt;</p>
<p>&gt; 我花了两周时间梳理请求流，发现核心问题：所有支付方式被平等对待，但实际上 80% 的代码只和 2 种高复杂度方式有关。我提出用策略模式（Strategy Pattern）重构，让每种支付方式封装自己的逻辑。</p>
<p>&gt;</p>
<p>&gt; 说服团队是最难的部分 — 大家怕改出 bug。我做了一个 spike，证明可以在不改变任何外部 API 的情况下完成重构，并用 feature flags 控制灰度。我们花了 6 周分批迁移，每批覆盖 2 种支付方式。</p>
<p>&gt;</p>
<p>&gt; 最终：新支付方式上线时间从 6 周降至 1.5 周，QA 测试范围减少 60%，事故率下降了 35%。&quot;</p>

<hr/>

<h3>Scenario Template to Adapt / 可复用场景模板</h3>

<pre><code>
Context: [系统名] had grown from [原始状态] to [当前状态] over [时间],
         resulting in [具体问题].

My Role: As [你的角色], I was responsible for [范围].
         The business impact was [影响] — [量化].

Diagnosis: I [诊断方式 — 画图/追链路/分析指标], and identified that
           the core source of complexity was [根本原因].

Solution: I proposed [方案], which addressed [核心问题] while
          preserving [必须保留的复杂性原因].

Risk Management: To validate, I [验证方法]. For rollout, I [迁移策略].

Result: [定量结果 1], [定量结果 2], [定量结果 3].
</code></pre>

<hr/>

<h3>Senior/Staff Level Tips / Senior/Staff 级别加分点</h3>

<p>🎯 <strong>区分 accidental vs essential complexity</strong></p>
<p>- Essential: 业务本身就是复杂的（监管要求、多租户架构）— 必须接受</p>
<p>- Accidental: 历史债务、过度工程、沟通问题导致的 — 可以消除</p>
<p>- 在回答中明确说&quot;这部分复杂性是必要的，我们保留了它&quot;</p>

<p>🎯 <strong>说清楚你是如何 sell 这个方案的</strong></p>
<p>Staff 工程师的简化工作往往需要跨团队协作。说说你如何：</p>
<p>- 用数据/可视化说服怀疑者</p>
<p>- 处理&quot;如果没坏为什么要修&quot;的反对声</p>
<p>- 建立渐进迁移计划让团队安心</p>

<p>🎯 <strong>提到你保留了什么</strong></p>
<p>最好的答案会说&quot;我们考虑过把 X 也简化掉，但决定保留，因为…&quot; — 这体现了成熟的判断力。</p>

<hr/>

<h3>关键要点 / Key Takeaways</h3>

<p>1. <strong>简化不是删代码，是降低认知负担</strong> — 衡量标准是新工程师理解系统需要多久</p>
<p>2. <strong>诊断先于方案</strong> — 先说&quot;我如何找到问题根源&quot;，再说方案</p>
<p>3. <strong>量化一切</strong> — 上线时间、延迟、事故率、代码规模</p>
<p>4. <strong>展示工程领导力</strong> — 技术判断 + 团队推动 + 风险管理</p>

<hr/>

<p>📚 <strong>深入学习 / Learn More:</strong></p>
<p>- [The Wrong Abstraction — Sandi Metz](https://sandimetz.com/blog/2016/1/20/the-wrong-abstraction) — classic post on when simplification actually makes things worse</p>
<p>- [A Philosophy of Software Design — John Ousterhout](https://web.stanford.edu/~ouster/cgi-bin/book.php) — the definitive book on managing complexity in software</p>
<p>- [Simple Made Easy — Rich Hickey (Strange Loop Talk)](https://www.youtube.com/watch?v=SxdOUGdseq4) — legendary talk distinguishing &quot;simple&quot; from &quot;easy&quot;</p>

<p>🧒 <strong>ELI5:</strong> Simplifying a complex system is like cleaning your messy backpack — you take everything out, throw away what you don&#x27;t need, and put the rest back in a way that makes it easy to find your pencil without dumping everything on the floor.</p>

<hr/>
<h1>🎨 Frontend</h1>
<h2>🎨 前端 Day 7 (2 min read) / Frontend Day 7</h2>
<h2>CSS Positioning: relative, absolute, fixed, sticky</h2>
<h2>CSS 定位：相对、绝对、固定、粘性</h2>

<hr/>

<h3>猜猜这段代码输出什么？/ What does this code output?</h3>

<pre><code>
&lt;style&gt;
  .container {
    position: relative;
    width: 200px;
    height: 200px;
    background: lightblue;
  }

  .box {
    position: absolute;
    top: 20px;
    left: 30px;
    width: 60px;
    height: 60px;
    background: coral;
  }
&lt;/style&gt;

&lt;div class=&quot;container&quot;&gt;
  &lt;div class=&quot;box&quot;&gt;&lt;/div&gt;
&lt;/div&gt;
</code></pre>

<p><strong>Where does the coral <code>.box</code> appear?</strong></p>
<p>A) 20px from the top of the <strong>page</strong>, 30px from the left of the <strong>page</strong></p>
<p>B) 20px from the top of <code>.container</code>, 30px from the left of <code>.container</code> ← ✅</p>
<p>C) 20px from the top of the <strong>viewport</strong>, 30px from the left of the <strong>viewport</strong></p>
<p>D) It won&#x27;t move — absolute positioning only works without a parent</p>

<p><em>Answer: B — because <code>.container</code> has <code>position: relative</code>, it becomes the <strong>containing block</strong> for <code>.box</code>.</em></p>

<hr/>

<h3>🗺️ Visual Map of All 4 Positioning Modes</h3>

<pre><code>
┌─────────────────────────────────────────────────────────────┐
│                         WEBPAGE                             │
│                                                             │
│  ┌─────────────────────────────────────┐                    │
│  │   position: relative                │                    │
│  │   └─ stays in flow                 │                    │
│  │   └─ offset from WHERE IT WOULD BE │                    │
│  └─────────────────────────────────────┘                    │
│                                                             │
│  ┌─────────────────────────────────────┐                    │
│  │   position: absolute                │                    │
│  │   ┌─────────────────────────────┐  │                    │
│  │   │ nearest positioned ancestor │  │                    │
│  │   │      ← offsets from HERE   │  │                    │
│  │   └─────────────────────────────┘  │                    │
│  └─────────────────────────────────────┘                    │
│                                                             │
│  ╔══════════════════════════════════════╗ ← VIEWPORT TOP    │
│  ║   position: fixed                    ║                   │
│  ║   └─ always relative to VIEWPORT    ║                   │
│  ║   └─ stays even when you scroll     ║                   │
│  ╚══════════════════════════════════════╝                   │
│                                                             │
│  ┌─────────────────────────────────────┐                    │
│  │   position: sticky                  │                    │
│  │   └─ relative UNTIL you scroll     │                    │
│  │      past threshold → then FIXED   │                    │
│  └─────────────────────────────────────┘                    │
│                                                             │
└─────────────────────────────────────────────────────────────┘
</code></pre>

<hr/>

<h3>Code Examples / 代码示例</h3>

<pre><code>
/* 1. RELATIVE — offset from its normal position, stays in document flow */
.badge {
  position: relative;
  top: -2px;   /* nudge up 2px from where it would normally sit */
  /* still takes up its original space in the layout! */
}

/* 2. ABSOLUTE — removed from flow, positioned relative to nearest 
   positioned ancestor (or &lt;html&gt; if none exists) */
.tooltip {
  position: absolute;
  top: 100%;     /* just below the parent */
  left: 50%;
  transform: translateX(-50%);  /* center it */
  /* KEY: the parent must have position: relative! */
}

/* 3. FIXED — always relative to the viewport, never scrolls away */
.navbar {
  position: fixed;
  top: 0;
  left: 0;
  right: 0;
  z-index: 1000;  /* stay above everything */
}

/* 4. STICKY — hybrid: relative until threshold, then fixed */
.section-header {
  position: sticky;
  top: 60px;   /* becomes &quot;fixed&quot; 60px from top once scrolled past */
  background: white;
  z-index: 10;
}
</code></pre>

<hr/>

<h3>你可能不知道 / You Might Not Know</h3>

<p>⚠️ <strong><code>absolute</code> 的&quot;陷阱&quot;：containing block 是谁？</strong></p>

<p><code>absolute</code> 定位是相对于&quot;最近的已定位祖先&quot;（nearest positioned ancestor = any element with <code>position</code> other than <code>static</code>）。</p>

<p>如果没有任何祖先设置了 <code>position</code>，那就相对于 <code>&lt;html&gt;</code> 根元素！</p>

<pre><code>
/* Common bug: forgot to add position: relative to parent */
.card {
  /* position: relative; ← forgot this! */
}
.card .badge {
  position: absolute;
  top: 10px;
  right: 10px;
  /* This badge will now position relative to the page, not .card! */
}
</code></pre>

<p>⚠️ <strong><code>sticky</code> 需要高度限制！</strong></p>

<p><code>sticky</code> 只在其<strong>父容器的范围内</strong>有效。如果父容器太短，或者 <code>overflow: hidden</code> 被设置，<code>sticky</code> 会&quot;不工作&quot;。这是最常见的 sticky bug！</p>

<pre><code>
/* sticky won&#x27;t work if parent has overflow: hidden/scroll/auto */
.parent {
  overflow: hidden;  /* this BREAKS sticky! */
}
.child {
  position: sticky;
  top: 0;  /* won&#x27;t stick — parent clips it */
}
</code></pre>

<hr/>

<h3>🎯 Mini Challenge</h3>

<p>Create a <strong>notification badge</strong> that sits in the top-right corner of an avatar image, like this:</p>

<pre><code>
┌──────────┐
│  👤      │🔴  ← red dot badge, top-right corner
│          │
└──────────┘
</code></pre>

<p>Write the HTML/CSS. (Hint: you need ONE element with <code>position: relative</code> and ONE with <code>position: absolute</code>.)</p>

<p>&lt;details&gt;</p>
<p>&lt;summary&gt;Solution / 答案&lt;/summary&gt;</p>

<pre><code>
&lt;div class=&quot;avatar-wrapper&quot;&gt;
  &lt;img src=&quot;avatar.png&quot; alt=&quot;User&quot; /&gt;
  &lt;span class=&quot;badge&quot;&gt;3&lt;/span&gt;
&lt;/div&gt;
</code></pre>

<pre><code>
.avatar-wrapper {
  position: relative;  /* containing block for the badge */
  display: inline-block;
  width: 60px;
  height: 60px;
}

.avatar-wrapper img {
  width: 100%;
  height: 100%;
  border-radius: 50%;
}

.badge {
  position: absolute;
  top: -4px;
  right: -4px;
  background: red;
  color: white;
  font-size: 11px;
  width: 20px;
  height: 20px;
  border-radius: 50%;
  display: flex;
  align-items: center;
  justify-content: center;
}
</code></pre>
<p>&lt;/details&gt;</p>

<hr/>

<p>📚 <strong>深入学习 / Learn More:</strong></p>
<p>- [MDN: CSS position property](https://developer.mozilla.org/en-US/docs/Web/CSS/position) — the authoritative reference with live examples</p>
<p>- [CSS-Tricks: Absolute, Relative, Fixed Positioning: How Do They Differ?](https://css-tricks.com/absolute-relative-fixed-positioining-how-do-they-differ/) — clear visual explainer</p>
<p>- [Sticky CSS Headers](https://css-tricks.com/position-sticky-2/) — CSS-Tricks deep dive on the sticky gotchas</p>

<p>🧒 <strong>ELI5:</strong> CSS positioning is like choosing where to put your toys — <code>relative</code> means &quot;move a little from where you already are,&quot; <code>absolute</code> means &quot;go to a specific spot inside your room,&quot; <code>fixed</code> means &quot;stay on the door no matter where I walk,&quot; and <code>sticky</code> means &quot;follow me around, but only once I&#x27;ve passed a certain point.&quot;</p>

<hr/>
<h1>🤖 AI</h1>
<h2>🤖 AI Day 7 (2 min read) — AI 新闻速递 / AI News Roundup</h2>

<p><em>Week of March 17–20, 2026</em></p>

<hr/>

<h3>📰 Story 1: Meta 用 AI 替代内容审核员</h3>

<p><strong>What happened:</strong> Meta 宣布大规模推出 AI 内容支持助手，将大幅减少对第三方内容审核承包商的依赖。Meta 表示 AI 系统将处理&quot;重复性的图形内容审核&quot;以及诈骗、毒品销售等对抗性内容的识别。</p>

<p><strong>为什么你应该关心 / Why you should care:</strong></p>
<p>这不只是 Meta 的内部决策 — 它标志着 AI 在<strong>高风险判断任务</strong>中正式进入规模化应用。争议点在于：AI 审核仍然会有偏差和漏判，而当人类被从循环中移除，谁来负责？对于工程师而言，这意味着 AI 安全和内容策略将成为未来几年的关键工程挑战。</p>

<hr/>

<h3>📰 Story 2: Samsung 宣布 730 亿美元 AI 芯片扩张计划</h3>

<p><strong>What happened:</strong> 三星宣布 2026 年将 AI 相关投入增加 22%，押注 Agentic AI 的算力需求激增。目标是超越 SK Hynix，成为 Nvidia 最大的 HBM（高带宽内存）供应商。</p>

<p><strong>为什么你应该关心 / Why you should care:</strong></p>
<p>GPU 之争的下一个战场是 <strong>HBM 内存</strong>。LLM 推理的瓶颈越来越不是算力本身，而是内存带宽 — 这也是为什么 Agentic AI（需要长上下文、多轮推理）会带来如此巨大的内存需求。关注 Samsung vs SK Hynix 的竞争，本质上是在关注 AI 基础设施的下一个瓶颈。</p>

<hr/>

<h3>📰 Story 3: Signal 创始人与 Meta 合作加密 AI</h3>

<p><strong>What happened:</strong> Signal 创始人 Moxie Marlinspike 宣布其加密 AI 聊天机器人 Confer 将与 Meta AI 集成，为 Meta 的 AI 提供端到端隐私保护技术。目标：让 AI 对话内容对 Meta 服务器本身也不可见。</p>

<p><strong>为什么你应该关心 / Why you should care:</strong></p>
<p><strong>隐私保护 AI（Privacy-Preserving AI）</strong> 是下一个重要技术方向。目前几乎所有 AI 推理都在云端完成，服务商可以看到你的所有对话。Moxie 提出的方向（结合 TEE/可信执行环境 + 同态加密）如果成功，将彻底改变 AI 应用的隐私模型。这对医疗 AI、法律 AI 等敏感场景意义巨大。</p>

<hr/>

<h3>📰 Story 4: Microsoft 发布 MAI-Image-2 图像生成模型</h3>

<p><strong>What happened:</strong> 微软发布第二代 AI 图像生成模型 MAI-Image-2，主打&quot;增强照片真实感&quot;和&quot;图像内文字生成更可靠&quot;。现已在 Copilot 和 Bing Image Creator 上线。</p>

<p><strong>为什么你应该关心 / Why you should care:</strong></p>
<p>文字渲染（text rendering in images）一直是图像 AI 的&quot;耻辱角落&quot; — GPT-4 画出来的招牌字往往是乱码。MAI-Image-2 将文字生成列为核心改进点，这对营销、设计、UI 素材生成有直接影响。同时，这也表明微软正在摆脱对 OpenAI DALL-E 的依赖，建立自己的图像模型能力。</p>

<hr/>

<h3>📰 Story 5: Amazon Alexa Plus 登陆英国 — Agentic AI 助手首个欧洲落地</h3>

<p><strong>What happened:</strong> 亚马逊宣布 Alexa Plus（搭载 agentic AI 能力的升级版）在英国上线，早期访问阶段免费，之后每月 £19.99（约 $26.50），Prime 会员免费。亚马逊特别强调其&quot;genuinely British&quot;——理解&quot;cuppa&quot;、&quot;knackered&quot;、&quot;nippy&quot;等英式表达。</p>

<p><strong>为什么你应该关心 / Why you should care:</strong></p>
<p>AI 助手本地化（localization）不只是翻译，还包括文化理解和方言处理。Alexa Plus 的&quot;Agentic&quot;定位意味着它不再只是问答，而是可以执行多步任务（预订、购物、控制智能家居）。这是 AI 助手从&quot;聊天机器人&quot;向&quot;数字代理人&quot;演进的关键节点，也是 OpenAI、Google、Apple 都在争夺的市场。</p>

<hr/>

<p>📚 <strong>深入学习 / Learn More:</strong></p>
<p>- [Meta AI Content Moderation Announcement](https://about.fb.com/news/2026/03/boosting-your-support-and-safety-on-metas-apps-with-ai/) — 官方博客原文</p>
<p>- [Moxie on Confer + Meta AI Integration](https://confer.to/blog/2026/03/encrypted-meta/) — 技术细节和隐私保护 AI 原理</p>
<p>- [Microsoft MAI-Image-2 Announcement](https://microsoft.ai/news/introducing-MAI-Image-2/) — 官方发布博客</p>

<p>🧒 <strong>ELI5:</strong> This week in AI: robots are learning to be internet safety guards, phone chips are getting a huge upgrade to handle smarter AI, someone figured out how to make AI chats private like a secret code, and AI assistants are learning to speak in different accents — even British English!</p>
]]></description>
    </item>
    <item>
      <title>byte-by-byte — 2026-03-19</title>
      <link>https://github.com/YushengAuggie/byte-by-byte/tree/main/archive</link>
      <guid isPermaLink="false">https://github.com/YushengAuggie/byte-by-byte/archive/2026-03-19</guid>
      <pubDate>Thu, 19 Mar 2026 12:00:00 +0000</pubDate>
      <description><![CDATA[<h1>🏗️ System Design</h1>
<p>🏗️ <strong>系统设计 Day 6 / System Design Day 6</strong></p>
<p><strong>CDN (Content Delivery Network) — 内容分发网络</strong></p>

<hr/>

<p><strong>想象你在设计... / Imagine You&#x27;re Building...</strong></p>

<p>你在上海开了一家咖啡豆网店，客户遍布全球。每次有人从纽约访问你的网站，请求要飞越太平洋到上海服务器取图片、CSS、JS，再飞回去。往返 200ms+，用户等得花都谢了 🌸。</p>

<p>解决方案？在纽约、伦敦、东京都放一份你网站的静态资源副本。用户访问时，就近取货。</p>

<p>这就是 <strong>CDN（内容分发网络）</strong>。</p>

<p>You run a coffee bean shop in Shanghai with global customers. Every request from NYC flies across the Pacific and back — 200ms+ round trip. Solution? Cache copies of your static assets in NYC, London, Tokyo. Users get served from the nearest copy. That&#x27;s a <strong>CDN</strong>.</p>

<hr/>

<p><strong>架构图 / Architecture Diagram</strong></p>

<pre><code>
                     用户请求 User Request
                           │
                     ┌──────┴──────┐
                     │  DNS 解析    │
                     │ (返回最近的  │
                     │  CDN 节点)   │
                     └──────┬──────┘
                            │
        ┌───────────────────┼───────────────────┐
        ▼                   ▼                   ▼
  ┌──────────┐       ┌──────────┐       ┌──────────┐
  │ CDN Edge │       │ CDN Edge │       │ CDN Edge │
  │  纽约    │       │  伦敦    │       │  东京    │
  │ ████████ │       │ ████     │       │ ██████   │
  │ (cached) │       │ (cached) │       │ (cached) │
  └────┬─────┘       └────┬─────┘       └────┬─────┘
       │                   │                   │
       └───────────────────┼───────────────────┘
                           │ cache miss 时回源
                           ▼
                    ┌──────────────┐
                    │ Origin Server│
                    │  源站 (上海)  │
                    └──────────────┘
</code></pre>

<hr/>

<p><strong>核心概念 / Key Concepts</strong></p>

<p><strong>1. Cache Hit vs Cache Miss</strong></p>
<p>- <strong>Hit（命中）：</strong> CDN 节点有缓存 → 直接返回，超快（&lt; 50ms）</p>
<p>- <strong>Miss（未命中）：</strong> CDN 没有 → 回源站取，存一份，下次就 Hit 了</p>

<p><strong>2. TTL (Time To Live)</strong></p>
<p>- 缓存过期时间。太短 → 频繁回源；太长 → 用户看到旧内容</p>
<p>- 静态资源（图片/CSS/JS）：TTL 长（1天-1年），文件名带 hash（<code>app.a3f2b1.js</code>）</p>
<p>- API 响应：TTL 短（几秒-几分钟）或不缓存</p>

<p><strong>3. Cache Invalidation（缓存失效）</strong></p>
<p>- 发布新版本时需要清除旧缓存</p>
<p>- 方法 1：Purge API（主动清除指定 URL）</p>
<p>- 方法 2：文件名 hash（新版本 = 新文件名 = 自动绕过旧缓存）✅ 推荐</p>

<p><strong>4. Push vs Pull CDN</strong></p>
<p>- <strong>Pull：</strong> CDN 节点在第一次请求时从源站拉取（大多数 CDN 默认行为）</p>
<p>- <strong>Push：</strong> 你主动上传内容到 CDN（适合大文件、已知内容）</p>
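<p>上面 4 个概念可以用一个极简的 pull-through 边缘缓存草图串起来 / The four concepts above can be tied together in a minimal pull-through edge-cache sketch. Illustrative only: <code>fetch_origin</code> and the injectable <code>clock</code> are stand-ins, not a real CDN API.</p>

```python
import time

class EdgeCache:
    """Sketch of a pull CDN edge node with TTL.

    fetch_origin(url) returns (body, status) and stands in for the
    real origin request; clock is injectable so expiry is testable."""

    def __init__(self, fetch_origin, ttl_seconds=60, clock=time.monotonic):
        self.fetch_origin = fetch_origin
        self.ttl = ttl_seconds
        self.clock = clock
        self.store = {}  # url -> (body, status, expires_at)

    def get(self, url):
        now = self.clock()
        entry = self.store.get(url)
        if entry is not None and entry[2] > now:
            return entry[0], entry[1], "HIT"    # served from the edge, no origin trip
        body, status = self.fetch_origin(url)   # miss (or expired TTL): go back to origin
        if status // 100 == 2:                  # only cache 2xx, so a 500 page is never stored
            self.store[url] = (body, status, now + self.ttl)
        return body, status, "MISS"
```

<p>第一次请求是 Miss 回源，之后在 TTL 内都是 Hit；错误响应永不入缓存，正是上面"别踩这个坑"说的规则。</p>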

<hr/>

<p><strong>别踩这个坑 / Don&#x27;t Fall Into This Trap</strong></p>

<p>❌ 把带用户个人信息的 API 响应（如 <code>/api/me</code>）也放 CDN 缓存</p>
<p>→ 用户 A 看到用户 B 的数据！</p>

<p>✅ 只缓存公开、不含个人信息的内容。私有内容设 <code>Cache-Control: private, no-store</code>。</p>

<p>⚠️ CDN 缓存了错误的 response（比如 500 错误页面）</p>
<p>→ 设置：只缓存 2xx 响应，或设置很短的负缓存 TTL</p>

<p>⚠️ 源站宕机时 CDN 还能服务（这是优点！）但如果 TTL 过了且源站还没恢复 → &quot;stale-while-revalidate&quot; 策略可以继续用旧缓存</p>

<hr/>

<p><strong>面试要点 / Interview Key Points</strong></p>

<p>1. CDN 降低延迟（地理距离）+ 减轻源站压力</p>
<p>2. 适合静态资源；动态内容需要谨慎（考虑 Edge Computing）</p>
<p>3. 常见 CDN：CloudFront (AWS), Cloudflare, Akamai, Fastly</p>
<p>4. CDN + Load Balancer + Cache = 高性能系统的三驾马车</p>

<hr/>

<p><em>Day 6 / 150 — 系统设计基础系列</em></p>

<hr/>
<h1>💻 Algorithms</h1>
<p>💻 <strong>算法 Day 6 / Algorithms Day 6 — #271 Encode and Decode Strings (Medium) — Arrays &amp; Hashing</strong></p>
<p>🔗 https://leetcode.com/problems/encode-and-decode-strings/</p>

<hr/>

<p><strong>现实类比 / Real-World Analogy</strong></p>

<p>你要把一堆购物清单放进一个信封里寄出去。问题是：收件人怎么知道哪里是一个清单的结束、另一个的开始？</p>

<p>笨办法：用逗号分隔 → 但如果清单内容本身有逗号呢？</p>
<p>聪明办法：每个清单前面写上它的长度 → <code>&quot;5#Hello3#Bye&quot;</code> → 无歧义！</p>

<hr/>

<p><strong>题目 / Problem Statement</strong></p>

<p>设计一个算法，将字符串列表编码为单个字符串，再解码回原始列表。</p>
<p>Design an algorithm to encode a list of strings into a single string, and decode it back.</p>

<pre><code>
Input:  [&quot;Hello&quot;, &quot;World&quot;]
Encode: &quot;5#Hello5#World&quot;
Decode: [&quot;Hello&quot;, &quot;World&quot;]
</code></pre>

<p>编码后的字符串可以包含任何字符（包括 <code>#</code>、空字符串等），必须无歧义。</p>

<hr/>

<p><strong>追踪过程 / Trace Through</strong></p>

<p>编码 <code>[&quot;Hi&quot;, &quot;&quot;, &quot;a#b&quot;]</code>：</p>

<pre><code>
&quot;Hi&quot;  → len=2  → &quot;2#Hi&quot;
&quot;&quot;    → len=0  → &quot;0#&quot;
&quot;a#b&quot; → len=3  → &quot;3#a#b&quot;

encoded = &quot;2#Hi0#3#a#b&quot;
</code></pre>

<p>解码 <code>&quot;2#Hi0#3#a#b&quot;</code>：</p>

<pre><code>
i=0: find &#x27;#&#x27; at index 1 → length=2 → read &quot;Hi&quot; → i=4
i=4: find &#x27;#&#x27; at index 5 → length=0 → read &quot;&quot; → i=6
i=6: find &#x27;#&#x27; at index 7 → length=3 → read &quot;a#b&quot; → i=11
Done! → [&quot;Hi&quot;, &quot;&quot;, &quot;a#b&quot;] ✅
</code></pre>

<p>注意 <code>&quot;a#b&quot;</code> 内部的 <code>#</code> 不会造成混淆，因为我们是按<strong>长度</strong>读取的，不是按分隔符！</p>

<hr/>

<p><strong>Python 解法 / Python Solution</strong></p>

<pre><code>
class Codec:
    def encode(self, strs: list[str]) -&gt; str:
        # Format: &quot;length#string&quot; for each string
        result = []
        for s in strs:
            result.append(f&quot;{len(s)}#{s}&quot;)
        return &quot;&quot;.join(result)

    def decode(self, s: str) -&gt; list[str]:
        result = []
        i = 0
        while i &lt; len(s):
            # Find the &#x27;#&#x27; delimiter
            j = s.index(&#x27;#&#x27;, i)
            # Characters before &#x27;#&#x27; are the length
            length = int(s[i:j])
            # Read exactly &#x27;length&#x27; characters after &#x27;#&#x27;
            result.append(s[j + 1 : j + 1 + length])
            # Move pointer past the string
            i = j + 1 + length
        return result
</code></pre>

<hr/>

<p><strong>复杂度 / Complexity</strong></p>

<p>- <strong>时间 Time:</strong> O(n) — n 是所有字符串总长度</p>
<p>- <strong>空间 Space:</strong> O(1) — 不算输出空间（编码和解码都是线性扫描）</p>

<hr/>

<p><strong>边界情况 / Edge Cases</strong></p>

<pre><code>
codec = Codec()

# Empty list
codec.decode(codec.encode([])) == []  # ✅

# List with empty string
codec.decode(codec.encode([&quot;&quot;])) == [&quot;&quot;]  # ✅ &quot;0#&quot; → [&quot;&quot;]

# Strings containing &#x27;#&#x27;
codec.decode(codec.encode([&quot;a#b&quot;, &quot;#&quot;])) == [&quot;a#b&quot;, &quot;#&quot;]  # ✅

# Very long string
codec.decode(codec.encode([&quot;a&quot; * 10000])) == [&quot;a&quot; * 10000]  # ✅
</code></pre>

<hr/>

<p><strong>举一反三 / Pattern Recognition</strong></p>

<p><strong>模式：长度前缀编码（Length-Prefixed Encoding）</strong></p>

<p>这和网络协议（TCP、HTTP/2）、序列化格式（Protocol Buffers）用的是同一个思路：先告诉你&quot;接下来有多少字节&quot;，然后精确读取。</p>

<p>为什么不用特殊分隔符（如 <code>|</code> 或 <code>\0</code>）？</p>
<p>→ 字符串可能包含任何字符！长度前缀永远不会歧义。</p>

<p><strong>相关题目：</strong></p>
<p>- #443 String Compression（类似的编码思维）</p>
<p>- 序列化/反序列化（#297 Serialize and Deserialize Binary Tree）</p>

<hr/>

<p><em>Day 6 / 150 — Arrays &amp; Hashing 系列</em></p>
<p><em>昨天 Day 4：Group Anagrams | 明天 Day 7：Product of Array Except Self</em></p>

<hr/>
<h1>🗣️ Soft Skills</h1>
<p>🗣️ <strong>软技能 Day 6 / Soft Skills Day 6</strong></p>
<p><strong>Leadership — 领导力（不靠头衔）</strong></p>

<p><strong>问题 / Question:</strong></p>
<p>&gt; &quot;Describe how you&#x27;ve mentored or grown other engineers.&quot;</p>
<p>&gt; &quot;描述你是如何指导或培养其他工程师的。&quot;</p>

<hr/>

<p><strong>为什么这很重要 / Why This Matters</strong></p>

<p>Senior/Staff 工程师的核心职责之一不是写更多代码，而是<strong>让团队里每个人都变得更强</strong>。面试官问这题是想看：</p>
<p>1. 你有没有&quot;利他&quot;意识（不只关心自己的 output）</p>
<p>2. 你的指导方式是否有结构、有结果</p>
<p>3. 你能不能识别他人的成长需求</p>

<p>At senior+ levels, your impact is measured by the engineers you&#x27;ve grown, not just the code you&#x27;ve shipped.</p>

<hr/>

<p><strong>❌ 糟糕的回答 / Bad Approach</strong></p>

<p>&gt; &quot;我经常帮初级工程师 review 代码，给他们指出问题，告诉他们怎么改。&quot;</p>

<p>问题：</p>
<p>- 这只是日常工作，不算 mentoring</p>
<p>- &quot;告诉他们怎么改&quot; = 给答案，不是教思考</p>
<p>- 没有具体故事、没有成长轨迹</p>

<hr/>

<p><strong>✅ 好的回答框架 / Good Approach (STAR)</strong></p>

<p><strong>Situation:</strong> &quot;团队来了一位从 bootcamp 毕业的新工程师 Alex。他代码能 work，但 PR 里经常缺乏边界处理和测试，review 来回很多轮。&quot;</p>

<p><strong>Task:</strong> &quot;作为他的 buddy engineer，我的目标不只是帮他过 PR，而是让他在 3 个月内能独立负责一个模块。&quot;</p>

<p><strong>Action:</strong></p>
<p>- &quot;我没有直接告诉他&#x27;加个 null check&#x27;，而是在 PR comment 里问引导性问题：&#x27;如果这个参数是 undefined 会发生什么？&#x27;让他自己发现问题&quot;</p>
<p>- &quot;每周 1:1 花 30 分钟，前 15 分钟他讲本周遇到的困难，后 15 分钟我分享一个设计决策的背景（为什么我们用 Redis 而不是 Memcached）&quot;</p>
<p>- &quot;第 6 周开始让他 lead 一个小 feature 的设计 doc，我只 review 不动手&quot;</p>

<p><strong>Result:</strong> &quot;3 个月后，他的 PR 首次 review 通过率从 20% 提升到 75%。第 4 个月他独立 ship 了用户通知系统。半年后他成了团队里 on-call 最可靠的人之一。&quot;</p>

<hr/>

<p><strong>Senior/Staff 级别技巧 / Senior/Staff Tips</strong></p>

<p>🎯 <strong>从&quot;给答案&quot;到&quot;问问题&quot;</strong></p>
<p>- 初级：直接告诉他怎么做（他需要 unblock）</p>
<p>- 中级：给方向，让他自己找路</p>
<p>- 高级：只问问题，让他自己发现问题和答案</p>

<p>🎯 <strong>Mentoring ≠ 只有 1:1</strong></p>
<p>- 写好的设计文档 = 一次写，教会所有人</p>
<p>- 做技术分享 / lunch &amp; learn = 批量 mentoring</p>
<p>- 建立团队 wiki / onboarding guide = 可扩展的知识传播</p>

<p>🎯 <strong>跟踪成长，不只是感觉</strong></p>
<p>- 用具体指标说话（PR 通过率、独立完成的 feature 数、on-call 表现）</p>
<p>- &quot;他成长了&quot;太模糊，&quot;他从需要 3 轮 review 到 1 轮&quot;才有说服力</p>

<hr/>

<p><strong>关键要点 / Key Takeaways</strong></p>

<p>1. Mentoring 的核心是<strong>授人以渔</strong> — 教思考方式，不只是给答案</p>
<p>2. 好的 mentor 会<strong>有意识地退后</strong> — 让 mentee 犯可控的错误并从中学习</p>
<p>3. 用<strong>具体的成长指标</strong>证明你的 mentoring 有效果</p>
<p>4. 最高级的 leadership：建立系统和文化，让 mentoring 自然发生（不依赖你个人）</p>

<hr/>

<p><em>Day 6 / 150 — 软技能系列</em></p>

<hr/>
<h1>🎨 Frontend</h1>
<p>🎨 <strong>前端 Day 6 / Frontend Day 6</strong></p>
<p><strong>CSS Variables &amp; Modern CSS Features — CSS 自定义属性</strong></p>

<hr/>

<p><strong>猜猜这段代码输出什么？/ What does this layout look like?</strong></p>

<pre><code>
:root {
  --primary: #3b82f6;
  --spacing: 16px;
  --radius: 8px;
}

.card {
  --primary: #ef4444;  /* 局部覆盖 */
  padding: var(--spacing);
  border: 2px solid var(--primary);
  border-radius: var(--radius);
}

.card .badge {
  background: var(--primary);
  color: white;
  padding: calc(var(--spacing) / 4) calc(var(--spacing) / 2);
  border-radius: var(--radius);
}
</code></pre>

<p>🤔 <code>.badge</code> 的 <code>background</code> 是蓝色 <code>#3b82f6</code> 还是红色 <code>#ef4444</code>？</p>

<p>✅ <strong>答案：红色 <code>#ef4444</code>！</strong></p>

<p>CSS Variables 遵循<strong>继承规则</strong> — <code>.card</code> 里重新定义了 <code>--primary</code>，它的所有后代元素（包括 <code>.badge</code>）都会继承这个新值。这和 Sass/Less 变量完全不同（它们是编译时替换，不支持继承）。</p>

<hr/>

<p><strong>CSS Variables 核心 / Key Concepts</strong></p>

<p><strong>1. 定义与使用</strong></p>
<pre><code>
/* 定义：用 -- 前缀 */
:root {
  --color-text: #1a1a1a;
  --font-size-base: 16px;
}

/* 使用：用 var() 函数 */
body {
  color: var(--color-text);
  font-size: var(--font-size-base);
}

/* 带 fallback 值 */
p {
  color: var(--color-accent, #666);  /* 如果 --color-accent 未定义，用 #666 */
}
</code></pre>

<p><strong>2. 运行时 vs 编译时</strong></p>
<pre><code>
Sass:   $primary: blue → 编译后变成 → color: blue（写死了）
CSS:    --primary: blue → 运行时读取 → 可以随时改！
</code></pre>

<p>这意味着你可以：</p>
<p>- 用 JS 动态改变 → <code>document.documentElement.style.setProperty(&#x27;--primary&#x27;, &#x27;red&#x27;)</code></p>
<p>- 根据 media query 改变 → 深色模式只需重新定义变量</p>
<p>- 组件内局部覆盖 → 不影响全局</p>

<p><strong>3. Dark Mode 只需几行</strong></p>
<pre><code>
:root {
  --bg: #ffffff;
  --text: #1a1a1a;
}

@media (prefers-color-scheme: dark) {
  :root {
    --bg: #1a1a1a;
    --text: #f0f0f0;
  }
}

body {
  background: var(--bg);
  color: var(--text);
}
</code></pre>

<hr/>

<p><strong>你可能不知道 / You Might Not Know</strong></p>

<p>CSS Variables 可以用在 <code>calc()</code> 里做数学运算：</p>

<pre><code>
:root {
  --base: 8px;
}

.component {
  padding: calc(var(--base) * 2);      /* 16px */
  margin: calc(var(--base) * 3);       /* 24px */
  border-radius: calc(var(--base) / 2); /* 4px */
}
</code></pre>

<p>这就是 <strong>Design Token</strong> 的基础 — 用一个基数推导出整个间距系统（4px, 8px, 12px, 16px, 24px, 32px...），改一个值就改全部。Tailwind CSS 等框架的主题系统也采用类似思路。</p>

<hr/>

<p><strong>⚠️ 常见陷阱 / Gotcha</strong></p>

<pre><code>
.box {
  --size: 100;
  width: var(--size)px;  /* ❌ 不行！不会变成 100px */
  width: calc(var(--size) * 1px);  /* ✅ 这样才行 */
}
</code></pre>

<p>CSS Variables 存的是<strong>字符串</strong>，<code>var(--size)px</code> 会变成 <code>100 px</code>（中间有空格），无效。必须用 <code>calc()</code> 做单位转换。</p>

<hr/>

<p><strong>Mini Challenge 🧩</strong></p>

<p>不看上面的内容，回答：</p>

<pre><code>
:root { --gap: 10px; }
.parent { --gap: 20px; }
.parent .child { margin: var(--gap); }
</code></pre>

<p><code>.child</code> 的 <code>margin</code> 是多少？</p>

<p>答案：<strong>20px</strong> — <code>.parent</code> 覆盖了 <code>--gap</code>，<code>.child</code> 作为后代继承 <code>.parent</code> 的值。</p>

<hr/>

<p><em>Day 6 / 150 — CSS 基础系列</em></p>
<p><em>昨天 Day 4：Responsive Design &amp; Media Queries | 明天 Day 7：JavaScript DOM 基础</em></p>

<hr/>
<h1>🤖 AI</h1>
<p>🤖 <strong>AI Day 6 — 新闻速递 / News Roundup</strong></p>
<p><em>2026年3月 | March 2026</em></p>

<p>&gt; ⚠️ 以下为基于公开报道的摘要，具体数字以原始来源为准。</p>
<p>&gt; Based on public reporting; verify specifics at original sources.</p>

<hr/>

<p><strong>📰 1. Claude 4 发布 — Anthropic 的新旗舰</strong></p>

<p>Anthropic 发布了 Claude 4 系列模型，包含 Claude 4 Opus（旗舰推理模型）和 Claude 4 Sonnet（平衡性能/速度）。据报道在编码、数学推理和长文档分析上有显著提升。</p>

<p><strong>为什么你应该关心：</strong></p>
<p>Claude 4 是 GPT-5 之后的又一个&quot;代际跳跃&quot;。如果你在做 AI 产品，现在有两个顶级推理模型可选。对开发者来说，更强的编码能力意味着 AI pair programming 又升了一个台阶。</p>

<hr/>

<p><strong>📰 2. 开源模型追赶闭源 — Llama 4 / Qwen 3</strong></p>

<p>Meta 的 Llama 4 和阿里的 Qwen 3 系列在基准测试上接近 GPT-4 级别。据报道，Llama 4 Scout（MoE 架构，约 17B 激活参数）在 MMLU 上接近 GPT-4o 水平。</p>

<p><strong>为什么你应该关心：</strong></p>
<p>开源模型的质量已经到了&quot;够用&quot;的临界点。对创业公司意味着：不再被 OpenAI/Anthropic 的定价绑定。对大公司意味着：可以在私有基础设施上部署，数据不出内网。如果你在选模型，先评估开源方案——可能省 90% 的 API 成本。</p>

<hr/>

<p><strong>📰 3. AI Agent 框架爆发</strong></p>

<p>2026年初，AI Agent（智能体）从概念变成了产品。OpenAI 的 Operator、Anthropic 的 Computer Use、以及开源的 browser-use 等项目让 AI 能直接操作浏览器、写代码、管理文件。</p>

<p><strong>为什么你应该关心：</strong></p>
<p>&quot;AI 能帮你查东西&quot;和&quot;AI 能帮你做事&quot;是两个完全不同的级别。Agent 意味着 AI 从&quot;回答问题&quot;升级到&quot;执行任务&quot;。工程师需要思考：你的产品/API 是否对 Agent 友好？是否有好的错误提示让 Agent 能自动恢复？</p>

<hr/>

<p><strong>📰 4. 欧盟 AI Act 正式执行</strong></p>

<p>欧盟的 AI 法案（AI Act）开始分阶段执行。高风险 AI 系统（医疗诊断、招聘筛选、信用评分）需要通过合规审查，包括数据来源透明度、偏见测试、人工审核机制。</p>

<p><strong>为什么你应该关心：</strong></p>
<p>如果你的产品面向欧洲用户，这不是&quot;以后再说&quot;——它现在就是法律了。即使你在美国，欧盟用户的 GDPR 合规 + AI Act 合规 = 你需要知道你的模型用了什么数据、做了什么决策。文档化不是可选的。</p>

<hr/>

<p><strong>📰 5. Vibe Coding 成为主流开发方式</strong></p>

<p>&quot;Vibe Coding&quot;——用自然语言描述需求、让 AI 生成代码、开发者做审查和架构决策——在 2026 年成为被广泛接受的开发方式。Cursor、Copilot、Claude Code 等工具的日活用户据报道已超百万。</p>

<p><strong>为什么你应该关心：</strong></p>
<p>不是&quot;AI 会不会取代工程师&quot;，而是&quot;用 AI 的工程师会不会取代不用的&quot;。核心变化：<strong>写代码的成本降低了，但设计正确系统的能力反而更值钱了</strong>。算法理解、系统设计、代码审查能力变得更重要，因为你要能审查 AI 写的代码。</p>

<hr/>

<p><em>Day 6 / 150 — AI 新闻系列</em></p>
]]></description>
    </item>
    <item>
      <title>byte-by-byte — 2026-03-18</title>
      <link>https://github.com/YushengAuggie/byte-by-byte/tree/main/archive</link>
      <guid isPermaLink="false">https://github.com/YushengAuggie/byte-by-byte/archive/2026-03-18</guid>
      <pubDate>Wed, 18 Mar 2026 12:00:00 +0000</pubDate>
      <description><![CDATA[<h1>Review</h1>
<h2>🔄 复习日 Day 5 / Review Day 5</h2>

<p>📊 Day 5/150 · NeetCode: 4/150 · SysDesign: 4/40 · Behavioral: 4/40 · Frontend: 4/50 · AI: 2/30</p>
<p>🔥 3-day streak!</p>

<p>今天是复习日！回顾过去4天的内容。</p>
<p>Today is a review day! Let&#x27;s revisit the past 4 days.</p>

<hr/>

<h3>📝 Quick Quiz — 3 Mini-Reviews</h3>

<p>从系统设计、算法、前端三个板块各出一题——先别看答案，想想你记得多少！</p>
<p>One question each from System Design, Algorithms, and Frontend — try to answer before peeking!</p>

<hr/>

<p><strong>Q1: [🏗️ System Design]</strong> From Day 4 — Load Balancing</p>

<p>你有一个 Load Balancer 后面跟着 3 台服务器。用户登录后，Session 存在 Server 1 的内存里。下一个请求被路由到 Server 2——用户发现自己被登出了。</p>

<p>You have a load balancer in front of 3 servers. User logs in, session stored in Server 1&#x27;s memory. Next request routes to Server 2 — user is logged out.</p>

<p><strong>问 / Question:</strong> 这个问题叫什么？列出两种解决方案，并说明各自的取舍。</p>
<p>What is this problem called? Name two solutions and explain the tradeoff of each.</p>

<p>&lt;details&gt;</p>
<p>&lt;summary&gt;显示答案 / Show Answer&lt;/summary&gt;</p>

<p><strong>这叫 Session Affinity（会话粘性）问题，或 Sticky Session 问题。</strong></p>

<p>服务器无状态（Stateless）是 REST 的核心原则，但当状态存在单台服务器内存时，负载均衡就破坏了这个假设。</p>

<p><strong>解决方案 1：Sticky Sessions（粘性会话）</strong></p>
<p>Load Balancer 记住&quot;用户 A → 永远去 Server 1&quot;。</p>
<p>- ✅ 简单，无需改动应用层</p>
<p>- ❌ 如果 Server 1 宕机，所有绑定它的用户 session 丢失；负载不均衡</p>

<p><strong>解决方案 2：Shared Session Store（共享 Session 存储）</strong></p>
<p>Session 不存在各服务器内存，而是存在 Redis 等共享缓存里。所有服务器读写同一个 Redis。</p>
<p>- ✅ 任何服务器都能处理任何请求，真正无状态</p>
<p>- ❌ 引入 Redis 作为额外依赖；Redis 本身需要高可用设计</p>

<p><strong>关键洞察（From Day 4）：</strong> 负载均衡解决了&quot;一台服务器扛不住&quot;的问题，但同时暴露了&quot;状态共享&quot;的问题。真正可扩展的系统需要无状态的应用层 + 独立的状态层（数据库/缓存）。</p>

<p>&lt;/details&gt;</p>

<hr/>

<p><strong>Q2: [💻 Algorithms]</strong> From Days 2–4 — Arrays &amp; Hashing Pattern</p>

<p>下面这段代码是什么算法的思路？时间复杂度是多少？它和 Day 2 的 Valid Anagram 有什么共同的核心思想？</p>

<p>What algorithm pattern does this code skeleton represent? What&#x27;s the time complexity? How does it share a core idea with Day 2&#x27;s Valid Anagram?</p>

<pre><code>
def mystery(nums, target):
    seen = {}
    for i, num in enumerate(nums):
        complement = target - num
        if complement in seen:
            return [seen[complement], i]
        seen[num] = i
</code></pre>

<p>&lt;details&gt;</p>
<p>&lt;summary&gt;显示答案 / Show Answer&lt;/summary&gt;</p>

<p><strong>这是 Day 3 的 Two Sum 解法。</strong></p>

<p><strong>时间复杂度：O(n)</strong> — 只遍历一次数组，<code>dict</code> 的查找是 O(1)。</p>
<p><strong>空间复杂度：O(n)</strong> — 最坏情况存储 n 个元素。</p>

<p><strong>和 Valid Anagram 的共同核心思想：</strong></p>

<p>Two Sum 和 Valid Anagram 都是 <strong>Arrays &amp; Hashing</strong> 模式的经典题。核心思想是：</p>
<p>&gt; <em>&quot;用哈希表把 O(n²) 的&quot;两层循环查找&quot;压缩成 O(n) 的&quot;一次遍历 + 字典查询&quot;。&quot;</em></p>
<p>&gt; <em>Use a hash map to trade O(n²) nested search for O(n) single-pass lookup.</em></p>

<p>- <strong>Valid Anagram</strong>：用哈希表统计字符频率，把&quot;逐个比对&quot;变成&quot;对比频率表&quot;</p>
<p>- <strong>Two Sum</strong>：用哈希表记录&quot;已见过的数&quot;，把&quot;两层循环找配对&quot;变成&quot;一次扫描找补集&quot;</p>

<p>这个思路是 NeetCode 150 中出现频率最高的模式之一——遇到&quot;在数组里找某种关系&quot;的题，先想哈希表！</p>

<p>&lt;/details&gt;</p>

<hr/>

<p><strong>Q3: [🎨 Frontend]</strong> From Day 4 — Responsive Design &amp; Media Queries</p>

<p>不看代码，回答：<code>@media (min-width: 768px)</code> 这条规则，在什么情况下<strong>生效</strong>，在什么情况下<strong>不生效</strong>？</p>

<p>Without looking at code: when does <code>@media (min-width: 768px)</code> apply, and when does it not?</p>

<p>然后：如果同时有 <code>min-width: 768px</code> 和 <code>min-width: 1200px</code> 两条媒体查询，都在 1400px 宽度下，会发生什么？</p>

<p>Then: if you have both <code>min-width: 768px</code> and <code>min-width: 1200px</code> rules, what happens at 1400px width?</p>

<p>&lt;details&gt;</p>
<p>&lt;summary&gt;显示答案 / Show Answer&lt;/summary&gt;</p>

<p><strong><code>@media (min-width: 768px)</code> 生效条件：</strong></p>
<p>- 视口宽度 <strong>≥ 768px</strong> → 规则生效（应用样式）</p>
<p>- 视口宽度 <strong>&lt; 768px</strong> → 规则不生效（忽略样式）</p>
<p>- 这是&quot;Mobile-First&quot;写法的基础——默认样式给手机，<code>min-width</code> 逐步增强给更大屏幕。</p>

<p><strong>两条规则同时存在时（1400px 宽度）：</strong></p>

<p>1400px ≥ 768px → 第一条生效 ✅</p>
<p>1400px ≥ 1200px → 第二条也生效 ✅</p>

<p><strong>两条都生效！CSS 的层叠规则决定最终样式：</strong></p>
<p>- 两条规则中相同的属性，<strong>后写的（声明顺序靠后的）优先</strong></p>
<p>- 所以应该总是把 <code>min-width</code> 从小到大排列写：先 768px，再 1200px</p>
<p>- 这样大屏幕的样式会自然覆盖中等屏幕的样式</p>

<p><strong>关键记忆（From Day 4）：</strong> Media queries don&#x27;t &quot;turn off&quot; — they cascade. The last rule wins. Order matters!</p>

<p>&lt;/details&gt;</p>

<hr/>

<p>💡 <em>复习巩固记忆，螺旋式上升。</em></p>
<p><em>Review reinforces memory — spiral upward.</em></p>

<p>📅 明天继续新内容！/ New content resumes tomorrow!</p>
]]></description>
    </item>
    <item>
      <title>byte-by-byte — 2026-03-17</title>
      <link>https://github.com/YushengAuggie/byte-by-byte/tree/main/archive</link>
      <guid isPermaLink="false">https://github.com/YushengAuggie/byte-by-byte/archive/2026-03-17</guid>
      <pubDate>Tue, 17 Mar 2026 12:00:00 +0000</pubDate>
      <description><![CDATA[<h1>🏗️ System Design</h1>
<h2>🏗️ 系统设计 Day 4 / System Design Day 4 — 负载均衡 / Load Balancing</h2>

<p>&gt; <strong>基础阶段 Foundation Phase</strong> | 预计阅读时间 ~3-4 分钟</p>

<hr/>

<h2>场景引入 / Scenario</h2>

<p>想象你开了一家超级火爆的奶茶店 🧋。第一天只有一个收银台，没问题。但第二天去大众点评上了热搜，突然排队排到门外一百米。怎么办？你开了第二个、第三个收银台，还派了一个引导员在门口，告诉每个客人该去哪个收银台排队。</p>

<p>这个&quot;引导员&quot;就是<strong>负载均衡器（Load Balancer）</strong>。</p>

<p>Imagine you open a super-popular boba tea shop. Day one: one cashier, no problem. Day two: you go viral, and there&#x27;s a 100-meter queue outside. Solution? You open more cashier lanes and station someone at the entrance directing each customer to the shortest line.</p>

<p>That person at the entrance is your <strong>Load Balancer</strong>.</p>

<hr/>

<h2>架构图 / Architecture Diagram</h2>

<pre><code>
                        ┌─────────────────────────────┐
                        │         用户请求             │
                        │     Incoming Requests        │
                        └──────────────┬──────────────┘
                                       │
                                       ▼
                        ┌─────────────────────────────┐
                        │       Load Balancer          │
                        │     负载均衡器               │
                        │  (Nginx / AWS ALB / HAProxy) │
                        └───────┬──────┬──────┬───────┘
                                │      │      │
                     ┌──────────┘      │      └──────────┐
                     ▼                 ▼                  ▼
              ┌────────────┐  ┌────────────┐  ┌────────────┐
              │  Server 1  │  │  Server 2  │  │  Server 3  │
              │  服务器 1  │  │  服务器 2  │  │  服务器 3  │
              │  ████████  │  │  ████      │  │  ██        │
              │ (80% load) │  │ (40% load) │  │ (20% load) │
              └────────────┘  └────────────┘  └────────────┘
                                       │
                        ┌──────────────┘
                        │
                        ▼
              ┌─────────────────────┐
              │   Shared Database   │
              │    共享数据库        │
              └─────────────────────┘
</code></pre>

<hr/>

<h2>核心概念 / Key Concepts</h2>

<h3>负载均衡算法 / Load Balancing Algorithms</h3>

<p><strong>1. Round Robin（轮询）</strong></p>
<p>- 依次把请求分给每台服务器，循环往复</p>
<p>- 类比：收银台依次叫号</p>
<p>- 适用：服务器性能相同、请求处理时间相近的场景</p>

<p><strong>2. Weighted Round Robin（加权轮询）</strong></p>
<p>- 性能强的服务器分配更多请求（权重更高）</p>
<p>- 类比：有个收银员超快，就多给她排队</p>
<p>- 适用：服务器配置不均匀的场景</p>

<p><strong>3. Least Connections（最少连接）</strong></p>
<p>- 把新请求分给当前连接数最少的服务器</p>
<p>- 类比：去排队最短的那个收银台</p>
<p>- 适用：请求处理时长差异大的场景（如文件上传 vs 简单查询）</p>

<p><strong>4. IP Hash（IP 哈希）</strong></p>
<p>- 根据客户端 IP 地址决定路由到哪台服务器</p>
<p>- 同一个用户总是被路由到同一台服务器</p>
<p>- 适用：需要会话粘性（Session Stickiness）的场景</p>
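<p>其中 Round Robin、Least Connections 和 IP Hash 可以用几行 Python 勾勒出核心逻辑（自拟的极简示意，并非任何负载均衡产品的真实实现）：</p>

```python
import hashlib
import itertools

servers = ["server-1", "server-2", "server-3"]

# 1. Round Robin：依次循环分发
_cycle = itertools.cycle(servers)
def round_robin() -> str:
    return next(_cycle)

# 2. Least Connections：选当前活跃连接数最少的服务器
connections = {s: 0 for s in servers}
def least_connections() -> str:
    target = min(connections, key=connections.get)
    connections[target] += 1
    return target

# 3. IP Hash：对客户端 IP 取哈希，同一用户总是落到同一台服务器
def ip_hash(client_ip: str) -> str:
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

assert round_robin() == "server-1" and round_robin() == "server-2"
assert ip_hash("1.2.3.4") == ip_hash("1.2.3.4")  # 会话粘性：同 IP 同服务器
```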

<h3>为什么这样设计？/ Why This Design?</h3>

<p>| 目标 Goal | 解决方案 Solution |</p>
<p>|----------|-----------------|</p>
<p>| 高可用 High Availability | 一台服务器挂了，流量自动转移 |</p>
<p>| 水平扩展 Horizontal Scaling | 加新服务器，不改代码 |</p>
<p>| 性能 Performance | 避免单点瓶颈，减少响应时间 |</p>
<p>| 健康检查 Health Checks | LB 自动剔除故障节点 |</p>

<h3>两种类型 / Two Types</h3>

<p><strong>Layer 4 (Transport Layer) LB</strong></p>
<p>- 基于 IP + TCP/UDP 端口路由</p>
<p>- 速度快，但&quot;看不懂&quot;请求内容</p>
<p>- 类比：只看信封地址，不看信的内容</p>

<p><strong>Layer 7 (Application Layer) LB</strong></p>
<p>- 基于 HTTP 头、URL、Cookie 等路由</p>
<p>- 更智能（可以把 <code>/api</code> 路由到 API 服务器，把 <code>/static</code> 路由到 CDN）</p>
<p>- 类比：根据信的内容决定投递给哪个部门</p>
<p>- 性能稍低，但灵活得多</p>

<hr/>

<h2>别踩这个坑 / Don&#x27;t Fall Into This Trap</h2>

<h3>坑 1：有状态的服务器（Stateful Servers）</h3>

<p>❌ <strong>错误做法：</strong> 把用户 Session 存在单台服务器的内存里</p>
<pre><code>
用户第1次请求 → Server 1 (Session 存在这里)
用户第2次请求 → Server 2 (找不到 Session！用户被登出)
</code></pre>

<p>✅ <strong>正确做法：</strong> Session 外置到共享存储</p>
<pre><code>
用户第1次请求 → Server 1 → 把 Session 写入 Redis
用户第2次请求 → Server 2 → 从 Redis 读 Session ✓
</code></pre>

<p><strong>关键原则：服务器要做到&quot;无状态&quot;(Stateless)，所有状态都存外部！</strong></p>
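<p>"Session 外置"的读写流程可以用一个 dict 代替 Redis 来示意（自拟示例；生产环境应换成 redis-py 等真实客户端）：</p>

```python
import uuid

# 用 dict 模拟共享 Session 存储（真实场景是 Redis 等外部存储）
session_store: dict[str, str] = {}

def login(server: str, user: str) -> str:
    """任意一台服务器处理登录：把 Session 写入共享存储。"""
    session_id = str(uuid.uuid4())
    session_store[session_id] = user
    return session_id

def handle_request(server: str, session_id: str) -> str:
    """任意一台服务器都能读到同一份 Session —— 应用层保持无状态。"""
    user = session_store.get(session_id)
    if user is None:
        return "401 Unauthorized"
    return f"200 OK ({user} served by {server})"

sid = login("Server 1", "alice")
assert handle_request("Server 2", sid).startswith("200")  # 换服务器也不会被登出
```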

<h3>坑 2：负载均衡器自身成为单点故障</h3>

<p>如果 LB 本身挂了怎么办？</p>

<p>✅ <strong>解决方案：</strong> 部署主备 LB（Active-Passive）或使用 DNS 轮询 + 多 LB</p>

<h3>坑 3：健康检查不够频繁</h3>

<p>LB 依赖健康检查（Health Check）来知道哪台服务器挂了。如果检查间隔太长（比如 60s），可能有 1 分钟的流量打到死服务器上。</p>

<p>✅ 生产环境通常设置：每 5-10 秒一次健康检查。</p>
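<p>健康检查通常还会配合"连续失败阈值"，避免单次网络抖动就误判摘除节点。下面是一个自拟的状态机草图（阈值数字仅作演示）：</p>

```python
UNHEALTHY_THRESHOLD = 3  # 连续失败 3 次才摘除，防止单次抖动误判

class HealthChecker:
    def __init__(self, servers: list[str]):
        self.failures = {s: 0 for s in servers}
        self.healthy = set(servers)

    def record(self, server: str, ok: bool) -> None:
        if ok:
            self.failures[server] = 0
            self.healthy.add(server)  # 探测成功：计数清零，节点恢复
        else:
            self.failures[server] += 1
            if self.failures[server] >= UNHEALTHY_THRESHOLD:
                self.healthy.discard(server)  # 连续失败达到阈值：摘除节点

hc = HealthChecker(["s1", "s2"])
for _ in range(3):
    hc.record("s2", ok=False)
assert hc.healthy == {"s1"}        # s2 连续失败 3 次被摘除
hc.record("s2", ok=True)
assert hc.healthy == {"s1", "s2"}  # 恢复后重新加入
```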

<hr/>

<h2>延伸阅读 / Going Deeper</h2>

<p>- 昨天（Day 3）我们聊了 HTTP/REST，现在你知道 LB 就工作在 HTTP 这一层之上</p>
<p>- 下周我们会聊<strong>数据库扩展</strong>，届时 LB 的概念还会出现（读写分离 + 连接池）</p>
<p>- 如果你听说过 <strong>Nginx、HAProxy、AWS ALB/NLB</strong>，它们都是负载均衡器的具体实现</p>

<hr/>

<p><em>Day 4 / 100 — 系统设计基础系列 System Design Foundations</em></p>

<hr/>
<h1>💻 Algorithms</h1>
<h2>💻 算法 Day 4 / Algorithms Day 4 — #49 Group Anagrams (Medium) — Arrays &amp; Hashing</h2>

<p>&gt; <strong>基础阶段 Foundation Phase</strong> | 预计阅读时间 ~3-4 分钟</p>

<hr/>

<h2>现实类比 / Real-World Analogy</h2>

<p>想象你在邮局分拣包裹。每个包裹上写的是打乱顺序的地址，比如 &quot;acts&quot;, &quot;cats&quot;, &quot;tacs&quot; 其实都是同一个地方（字母相同，顺序不同）。你的任务是把所有&quot;相同地址&quot;的包裹放到同一个箱子里。</p>

<p>怎么判断两个地址&quot;本质相同&quot;？把字母排个序，如果排序后一样，就是同一个地址！</p>

<p>At the post office, sorting packages with scrambled addresses: &quot;acts&quot;, &quot;cats&quot;, &quot;tacs&quot; all go to the same place. Your job: group them together. The trick? Sort the letters — if they match after sorting, they&#x27;re anagrams!</p>

<hr/>

<h2>题目 / Problem Statement</h2>

<p><strong>给定一个字符串数组，将所有字母异位词（anagram）分组在一起。</strong></p>

<p>Given an array of strings, group the anagrams together.</p>

<pre><code>
输入 Input:  [&quot;eat&quot;,&quot;tea&quot;,&quot;tan&quot;,&quot;ate&quot;,&quot;nat&quot;,&quot;bat&quot;]
输出 Output: [[&quot;bat&quot;],[&quot;nat&quot;,&quot;tan&quot;],[&quot;ate&quot;,&quot;eat&quot;,&quot;tea&quot;]]
</code></pre>

<p>字母异位词 = 由相同字母以不同顺序构成的单词</p>
<p>Anagram = words with the same letters in different order</p>

<hr/>

<h2>思路解析 / Step-by-Step Walkthrough</h2>

<p><strong>关键洞察 Key Insight：</strong></p>
<p>两个字符串互为 anagram，当且仅当它们排序后相同。</p>
<p>Two strings are anagrams if and only if their sorted versions are identical.</p>

<p>所以我们用排序后的字符串作为<strong>哈希表的 key</strong>，把所有 anagram 归到同一个桶（bucket）里。</p>

<h3>追踪过程 / Trace Through the Example</h3>

<p>输入：<code>[&quot;eat&quot;,&quot;tea&quot;,&quot;tan&quot;,&quot;ate&quot;,&quot;nat&quot;,&quot;bat&quot;]</code></p>

<p>| 当前单词 Word | 排序后 Sorted Key | 哈希表状态 HashMap State |</p>
<p>|------------|----------------|----------------------|</p>
<p>| &quot;eat&quot; | &quot;aet&quot; | {&quot;aet&quot;: [&quot;eat&quot;]} |</p>
<p>| &quot;tea&quot; | &quot;aet&quot; | {&quot;aet&quot;: [&quot;eat&quot;,&quot;tea&quot;]} |</p>
<p>| &quot;tan&quot; | &quot;ant&quot; | {&quot;aet&quot;: [&quot;eat&quot;,&quot;tea&quot;], &quot;ant&quot;: [&quot;tan&quot;]} |</p>
<p>| &quot;ate&quot; | &quot;aet&quot; | {&quot;aet&quot;: [&quot;eat&quot;,&quot;tea&quot;,&quot;ate&quot;], &quot;ant&quot;: [&quot;tan&quot;]} |</p>
<p>| &quot;nat&quot; | &quot;ant&quot; | {&quot;aet&quot;: [&quot;eat&quot;,&quot;tea&quot;,&quot;ate&quot;], &quot;ant&quot;: [&quot;tan&quot;,&quot;nat&quot;]} |</p>
<p>| &quot;bat&quot; | &quot;abt&quot; | {&quot;aet&quot;: [...], &quot;ant&quot;: [...], &quot;abt&quot;: [&quot;bat&quot;]} |</p>

<p>最终取哈希表的所有 values → <code>[[&quot;eat&quot;,&quot;tea&quot;,&quot;ate&quot;], [&quot;tan&quot;,&quot;nat&quot;], [&quot;bat&quot;]]</code></p>

<hr/>

<h2>Python 解法 / Python Solution</h2>

<pre><code>
from collections import defaultdict

def groupAnagrams(strs: list[str]) -&gt; list[list[str]]:
    # Use a defaultdict so we can append without checking if key exists
    # 用 defaultdict，省去手动判断 key 是否存在的麻烦
    anagram_map = defaultdict(list)
    
    for word in strs:
        # Sort the word&#x27;s characters to create the canonical key
        # 对字母排序，得到这组 anagram 的&quot;标准形式&quot;
        # e.g., &quot;eat&quot; -&gt; sorted(&quot;eat&quot;) -&gt; [&#x27;a&#x27;,&#x27;e&#x27;,&#x27;t&#x27;] -&gt; &quot;aet&quot;
        key = &quot;&quot;.join(sorted(word))
        
        # Append this word to the bucket for its key
        # 把当前单词放入对应的桶
        anagram_map[key].append(word)
    
    # Return all the groups (values of the map)
    # 返回所有分组
    return list(anagram_map.values())
</code></pre>

<h3>手动验证 / Manual Verification</h3>

<p>让我们用 <code>[&quot;eat&quot;,&quot;tea&quot;,&quot;tan&quot;,&quot;ate&quot;,&quot;nat&quot;,&quot;bat&quot;]</code> 逐步跑代码：</p>

<p>1. <code>word = &quot;eat&quot;</code> → <code>sorted(&quot;eat&quot;)</code> = <code>[&#x27;a&#x27;,&#x27;e&#x27;,&#x27;t&#x27;]</code> → <code>key = &quot;aet&quot;</code> → <code>anagram_map = {&quot;aet&quot;: [&quot;eat&quot;]}</code></p>
<p>2. <code>word = &quot;tea&quot;</code> → <code>sorted(&quot;tea&quot;)</code> = <code>[&#x27;a&#x27;,&#x27;e&#x27;,&#x27;t&#x27;]</code> → <code>key = &quot;aet&quot;</code> → <code>anagram_map = {&quot;aet&quot;: [&quot;eat&quot;,&quot;tea&quot;]}</code></p>
<p>3. <code>word = &quot;tan&quot;</code> → <code>sorted(&quot;tan&quot;)</code> = <code>[&#x27;a&#x27;,&#x27;n&#x27;,&#x27;t&#x27;]</code> → <code>key = &quot;ant&quot;</code> → <code>anagram_map = {&quot;aet&quot;: [...], &quot;ant&quot;: [&quot;tan&quot;]}</code></p>
<p>4. <code>word = &quot;ate&quot;</code> → <code>sorted(&quot;ate&quot;)</code> = <code>[&#x27;a&#x27;,&#x27;e&#x27;,&#x27;t&#x27;]</code> → <code>key = &quot;aet&quot;</code> → <code>anagram_map = {&quot;aet&quot;: [&quot;eat&quot;,&quot;tea&quot;,&quot;ate&quot;], &quot;ant&quot;: [&quot;tan&quot;]}</code></p>
<p>5. <code>word = &quot;nat&quot;</code> → <code>sorted(&quot;nat&quot;)</code> = <code>[&#x27;a&#x27;,&#x27;n&#x27;,&#x27;t&#x27;]</code> → <code>key = &quot;ant&quot;</code> → <code>anagram_map = {&quot;aet&quot;: [...], &quot;ant&quot;: [&quot;tan&quot;,&quot;nat&quot;]}</code></p>
<p>6. <code>word = &quot;bat&quot;</code> → <code>sorted(&quot;bat&quot;)</code> = <code>[&#x27;a&#x27;,&#x27;b&#x27;,&#x27;t&#x27;]</code> → <code>key = &quot;abt&quot;</code> → <code>anagram_map = {&quot;aet&quot;: [...], &quot;ant&quot;: [...], &quot;abt&quot;: [&quot;bat&quot;]}</code></p>

<p>最终返回 <code>[[&quot;eat&quot;,&quot;tea&quot;,&quot;ate&quot;], [&quot;tan&quot;,&quot;nat&quot;], [&quot;bat&quot;]]</code> ✅ 与题目期望输出一致！</p>

<hr/>

<h2>时间/空间复杂度 / Complexity Analysis</h2>

<p><strong>时间复杂度 Time Complexity: O(n × k log k)</strong></p>
<p>- n = 字符串数量（number of strings）</p>
<p>- k = 最长字符串的长度（max string length）</p>
<p>- 对每个字符串排序 = O(k log k)，总共 n 个字符串</p>

<p><strong>空间复杂度 Space Complexity: O(n × k)</strong></p>
<p>- 哈希表存储所有字符串的副本</p>
<p>- 最坏情况：所有字符串都不是 anagram，每个单独一个桶</p>

<hr/>

<h2>边界情况 / Edge Cases</h2>

<pre><code>
# 空数组 Empty input
groupAnagrams([])  # → []

# 单个字符串 Single string
groupAnagrams([&quot;a&quot;])  # → [[&quot;a&quot;]]

# 所有字符串都是 anagram All are anagrams
groupAnagrams([&quot;abc&quot;,&quot;bca&quot;,&quot;cab&quot;])  # → [[&quot;abc&quot;,&quot;bca&quot;,&quot;cab&quot;]]

# 没有 anagram 对 No anagram pairs
groupAnagrams([&quot;abc&quot;,&quot;def&quot;,&quot;ghi&quot;])  # → [[&quot;abc&quot;],[&quot;def&quot;],[&quot;ghi&quot;]]
</code></pre>

<hr/>

<h2>举一反三 / Pattern Recognition</h2>

<p><strong>这道题的模式：用排序/哈希创建&quot;规范形式&quot;（Canonical Form）</strong></p>

<p>当你需要&quot;把等价的东西归类&quot;时，找到一个好的 key 是关键。</p>

<p><strong>同类变体 / Follow-up Variations:</strong></p>

<p>1. <strong>#242 Valid Anagram（Day 2 做过！）</strong> — 判断两个字符串是否互为 anagram，现在你应该更理解为什么用哈希表了</p>

<p>2. <strong>用字符计数作 key（优化版）</strong> — 不排序，而是统计 26 个字母的频率，构成一个 tuple 作 key。时间复杂度降到 O(n × k)：</p>
<pre><code>
   from collections import Counter
   key = tuple(Counter(word).values())  # 不推荐：values() 顺序取决于字符首次出现顺序
   # 更好的方式：
   count = [0] * 26
   for c in word:
       count[ord(c) - ord(&#x27;a&#x27;)] += 1
   key = tuple(count)  # e.g., &quot;eat&quot; -&gt; (1,0,0,0,1,0,...,1,...) [a=1,e=1,t=1]
</code></pre>

<p>3. <strong>思考扩展：</strong> 如果字符串包含 Unicode 字符怎么办？用 <code>Counter</code> 而非固定 26 位数组</p>
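<p>第 3 点可以这样落地：用 <code>Counter</code> 的计数结果（转成 <code>frozenset</code>）作哈希 key，与字符顺序无关，对任意 Unicode 字符都成立（示意写法）：</p>

```python
from collections import Counter, defaultdict

def group_anagrams_unicode(strs: list[str]) -> list[list[str]]:
    # frozenset(Counter(word).items()) 与字符顺序无关、可哈希，
    # 不局限于 26 个小写英文字母
    buckets = defaultdict(list)
    for word in strs:
        buckets[frozenset(Counter(word).items())].append(word)
    return list(buckets.values())

assert group_anagrams_unicode(["猫狗", "狗猫", "abc"]) == [["猫狗", "狗猫"], ["abc"]]
```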

<hr/>

<p><em>Day 4 / 100 — Arrays &amp; Hashing 系列</em></p>
<p><em>昨天 Day 3：Two Sum | 明天 Day 5：Top K Frequent Elements</em></p>

<hr/>
<h1>🗣️ Soft Skills</h1>
<h2>🗣️ 软技能 Day 4 / Soft Skills Day 4 — 失败与成长 / Failure &amp; Growth</h2>

<p>&gt; <strong>基础阶段 Foundation Phase</strong> | 预计阅读时间 ~2-3 分钟</p>

<hr/>

<h2>今日问题 / Today&#x27;s Question</h2>

<p>&gt; &quot;Describe a project that failed or didn&#x27;t meet expectations. What did you learn?&quot;</p>
<p>&gt; 描述一个失败的或未达到预期的项目。你从中学到了什么？</p>

<hr/>

<h2>为什么这很重要 / Why This Matters</h2>

<p>这个问题是面试官用来区分<strong>普通候选人</strong>和<strong>优秀候选人</strong>的分水岭。</p>

<p>大多数人要么：</p>
<p>- 找借口（&quot;那个产品经理需求一直在变…&quot;）</p>
<p>- 给出假失败（&quot;我太追求完美了！&quot;）</p>

<p>真正优秀的工程师知道：失败是学习的压缩包。能清晰复盘失败，说明你有自我意识、有成长心态、有责任感。</p>

<p>This question separates candidates who are self-aware from those who aren&#x27;t. Great engineers treat failure as dense learning. If you can articulate a real failure with clarity, you signal maturity, accountability, and a growth mindset.</p>

<hr/>

<h2>STAR 框架拆解 / STAR Framework Breakdown</h2>

<p><strong>Situation（情境）:</strong> 设置背景 — 项目是什么？团队多大？时间线如何？</p>

<p><strong>Task（任务）:</strong> 你的角色 — 你负责什么？期望是什么？</p>

<p><strong>Action（行动）:</strong> 你做了什么 — 包括那些事后回想起来的&quot;错误决定&quot;</p>

<p><strong>Result（结果）:</strong> 真实的结果 — 项目延期？功能被砍？用户不买账？</p>

<p><strong>+Learning（学习）:</strong> ⭐ 这是整个回答的精华 — 你具体学到了什么？有什么改变？</p>

<hr/>

<h2>❌ 糟糕的回答 / Bad Approach</h2>

<p>&gt; &quot;我们做了一个推荐系统，但效果没有预期好。这让我意识到我应该更仔细地沟通需求。&quot;</p>

<p><strong>问题在哪？</strong></p>
<p>- 没有具体细节（什么推荐系统？多大的影响？）</p>
<p>- &quot;更仔细地沟通&quot;——太泛了，面试官听到这句话已经睡着了</p>
<p>- 没有说明<strong>你个人</strong>在失败中的角色</p>
<p>- 学到的教训没有被<strong>后续行动</strong>验证</p>

<hr/>

<h2>✅ 好的回答 / Good Approach</h2>

<p>&gt; &quot;In my second year at [Company], I led a migration of our user notification system from a monolithic service to an event-driven architecture. The business goal was to reduce notification latency from ~3 seconds to under 500ms and improve reliability.</p>
<p>&gt;</p>
<p>&gt; 我的任务是设计新架构并协调三个团队的迁移工作，预期 Q2 上线。</p>
<p>&gt;</p>
<p>&gt; 我犯的关键错误：我低估了消息幂等性（idempotency）的问题。我假设下游消费者已经处理好了重复消息，但实际上没有。上线后，部分用户在一次事件中收到了 3-5 条重复通知，引发了大量投诉，我们不得不回滚。</p>
<p>&gt;</p>
<p>&gt; We rolled back within 6 hours, which itself was a success — but the original go-live failed.</p>
<p>&gt;</p>
<p>&gt; 从这次失败中，我有两个具体改变：</p>
<p>&gt; 1. <strong>我开始在所有 event-driven 设计的 design doc 里加一节 &#x27;Idempotency Guarantees&#x27;</strong>，明确列出哪一层负责去重</p>
<p>&gt; 2. <strong>我们建立了 chaos testing 流程</strong>，在 staging 环境模拟消息重投递，在那以后我们再没有出现类似问题</p>
<p>&gt;</p>
<p>&gt; 三个月后，同一个团队完成了迁移，延迟确实降到了 420ms。我认为能拿到这个结果，部分原因就是第一次的失败让我们想清楚了真正的难点。&quot;</p>

<hr/>

<h2>为什么这个回答好？/ Why This Works</h2>

<p>✅ <strong>具体技术细节</strong> — 幂等性问题，不是模糊的&quot;沟通问题&quot;</p>
<p>✅ <strong>诚实承担责任</strong> — &quot;我犯的关键错误&quot;，没有推锅</p>
<p>✅ <strong>量化影响</strong> — 3-5 条重复通知，回滚 6 小时内完成</p>
<p>✅ <strong>学习有证据</strong> — 不是说&quot;我学会了要考虑幂等性&quot;，而是说&quot;我在所有后续 design doc 里加了这一节&quot;</p>
<p>✅ <strong>故事有结尾</strong> — 三个月后成功了，说明学习真的有效</p>

<hr/>

<h2>Senior/Staff 级别加分项 / Senior/Staff Level Tips</h2>

<p>如果你是 Senior 或 Staff 候选人，面试官想听到更多的是<strong>系统性改变</strong>，而非个人教训：</p>

<p>- &quot;我把这个 checklist 推广到了整个团队&quot; (team impact)</p>
<p>- &quot;我们更新了 runbook，现在新 engineer onboarding 时会学到这个&quot; (process change)</p>
<p>- &quot;这次失败推动了我们建立 incident review 文化&quot; (cultural change)</p>

<p><strong>层级越高，你的学习边界越大。</strong> Junior 学到的是&quot;我应该更仔细地测试&quot;；Staff 学到的是&quot;我们整个组织的测试文化需要改变&quot;。</p>

<hr/>

<h2>你可以改编的模板 / Scenario Template</h2>

<pre><code>
情境: 我在 [公司/项目] 负责 [技术项目]，目标是 [业务目标]。
错误: 我的关键判断失误是 [具体技术/流程错误]。
影响: 导致了 [具体后果，数字化]。
学习1: 从此以后，我 [具体新习惯/流程，可验证]。
学习2: 我把这个教训 [推广/文档化] 到了 [范围]。
后续: [N 个月后，最终的结果是...]。
</code></pre>

<hr/>

<h2>关键要点 / Key Takeaways</h2>

<p>1. <strong>选真实的失败</strong> — 面试官能辨别出假失败。真失败才有说服力</p>
<p>2. <strong>从个人行动出发</strong> — &quot;我们失败了&quot;比&quot;我做了错误决定&quot;弱 10 倍</p>
<p>3. <strong>学习要有后续行动</strong> — &quot;我意识到&quot;不够，&quot;我从那以后改变了&quot;才有力量</p>
<p>4. <strong>结局不一定非要成功</strong> — &quot;项目被取消，但我建立的系统依然在生产环境跑着&quot;也是好结尾</p>

<hr/>

<p><em>Day 4 / 100 — 行为面试系列 Behavioral Interview Series</em></p>
<p><em>昨天 Day 3：与领导意见相左 | 明天 Day 5：时间管理与优先级</em></p>

<hr/>
<h1>🎨 Frontend</h1>
<h2>🎨 前端 Day 4 / Frontend Day 4 — 响应式设计与媒体查询 / Responsive Design &amp; Media Queries</h2>

<p>&gt; <strong>基础阶段 Foundation Phase</strong> | 预计阅读时间 ~2-3 分钟</p>

<hr/>

<h2>猜猜这段代码的行为？/ What Does This Code Do?</h2>

<pre><code>
.container {
  width: 100%;
  padding: 0 20px;
  box-sizing: border-box;
}

@media (min-width: 768px) {
  .container {
    max-width: 960px;
    margin: 0 auto;
    padding: 0 40px;
  }
}

@media (min-width: 1200px) {
  .container {
    max-width: 1140px;
    padding: 0 60px;
  }
}
</code></pre>

<p><strong>问：当浏览器宽度是 800px 时，<code>.container</code> 的实际宽度和 padding 是多少？</strong></p>
<p><strong>Q: When the browser is 800px wide, what are the actual width and padding?</strong></p>

<p>&lt;details&gt;</p>
<p>&lt;summary&gt;点击查看答案 / Click for Answer&lt;/summary&gt;</p>

<p><strong>答案：宽度 = 800px（100% of viewport），padding = 0 40px</strong></p>

<p>理由：</p>
<p>- 800px &gt; 768px → 第一个 <code>@media</code> 规则生效 ✓</p>
<p>- 800px &lt; 1200px → 第二个 <code>@media</code> 规则不生效 ✗</p>
<p>- <code>max-width: 960px</code> 生效，但 800px &lt; 960px，所以实际宽度被 <code>width: 100%</code> 控制 = 800px</p>
<p>- <code>margin: 0 auto</code> 生效（container 会居中，但由于宽度是 100%，看不出来）</p>
<p>- <code>padding</code> 被第一个媒体查询覆盖为 <code>0 40px</code></p>

<p><strong>所以：width = 800px（border-box 总宽，含左右 padding；内容区 = 800 - 80 = 720px），padding = 0 40px</strong> ✅</p>
<p>&lt;/details&gt;</p>

<hr/>

<h2>核心概念 / Core Concepts</h2>

<h3>什么是响应式设计？/ What Is Responsive Design?</h3>

<p>同一套 HTML/CSS，在手机、平板、电脑上都好看好用。</p>
<p>不是做三个不同的页面，而是用<strong>弹性布局 + 媒体查询</strong>适配所有屏幕。</p>

<p>One codebase, all screen sizes. Not three separate pages — flexible layouts + media queries.</p>

<h3>媒体查询语法 / Media Query Syntax</h3>

<pre><code>
/* 基础语法 Basic syntax */
@media [media-type] [and/not/only] (condition) {
  /* CSS rules */
}

/* 常见断点 Common breakpoints */
/* Mobile first approach (推荐!) */
/* Base styles: mobile */
.element { font-size: 14px; }

@media (min-width: 576px)  { /* sm - Large phones */ }
@media (min-width: 768px)  { /* md - Tablets */ }
@media (min-width: 992px)  { /* lg - Desktops */ }
@media (min-width: 1200px) { /* xl - Large desktops */ }
</code></pre>

<hr/>

<h2>两种策略 / Two Approaches</h2>

<pre><code>
Mobile First (推荐 ✓)          Desktop First (常见但不推荐)
────────────────────           ──────────────────────────
先写手机样式，                   先写桌面样式，
用 min-width 往上覆盖            用 max-width 往下覆盖

Base → small screens            Base → large screens
↑ override for larger           ↓ override for smaller

好处:                            缺点:
✅ 性能更好 (mobile loads less)  ❌ 移动端加载冗余样式
✅ 优先考虑移动端体验            ❌ 思维反直觉
✅ Progressive enhancement       ❌ Graceful degradation
</code></pre>

<hr/>

<h2>实战代码 / Practical Example</h2>

<pre><code>
/* Mobile First: 先写最小屏幕的样式 */
.card-grid {
  display: grid;
  grid-template-columns: 1fr;  /* 1 column on mobile */
  gap: 16px;
  padding: 16px;
}

/* 平板: 2 列 */
@media (min-width: 768px) {
  .card-grid {
    grid-template-columns: repeat(2, 1fr);  /* 2 columns */
    gap: 24px;
    padding: 24px;
  }
}

/* 桌面: 3 列 */
@media (min-width: 1200px) {
  .card-grid {
    grid-template-columns: repeat(3, 1fr);  /* 3 columns */
    gap: 32px;
    padding: 32px;
  }
}
</code></pre>

<pre><code>
手机 (&lt; 768px)    平板 (768-1199px)    桌面 (≥ 1200px)
──────────────    ─────────────────    ───────────────
┌──────────┐      ┌─────┐ ┌─────┐     ┌───┐ ┌───┐ ┌───┐
│  Card 1  │      │Card1│ │Card2│     │ 1 │ │ 2 │ │ 3 │
├──────────┤      ├─────┤ ├─────┤     ├───┤ ├───┤ ├───┤
│  Card 2  │      │Card3│ │Card4│     │ 4 │ │ 5 │ │ 6 │
├──────────┤      └─────┘ └─────┘     └───┘ └───┘ └───┘
│  Card 3  │
└──────────┘
</code></pre>

<hr/>

<h2>你可能不知道 / You Might Not Know</h2>

<h3>Viewport Meta Tag — 必不可少！</h3>

<pre><code>
&lt;!-- 没有这个，媒体查询在手机上不会正常工作！ --&gt;
&lt;!-- Without this, media queries won&#x27;t work on mobile! --&gt;
&lt;meta name=&quot;viewport&quot; content=&quot;width=device-width, initial-scale=1.0&quot;&gt;
</code></pre>

<p><strong>为什么？</strong> 手机浏览器默认假装自己是 980px 宽的桌面（为了渲染老网站）。加了这个 meta tag，才告诉它&quot;就用设备真实宽度&quot;。</p>

<p>Why? Mobile browsers default to pretending they&#x27;re a ~980px desktop viewport (legacy web compatibility). This tag tells them: use the real device width.</p>

<h3>媒体查询也能针对其他特性 / Other Media Features</h3>

<pre><code>
/* 暗色模式 Dark mode */
@media (prefers-color-scheme: dark) {
  body { background: #1a1a1a; color: #f0f0f0; }
}

/* 减少动画（无障碍）Reduced motion (accessibility) */
@media (prefers-reduced-motion: reduce) {
  * { animation: none !important; transition: none !important; }
}

/* 横竖屏 Orientation */
@media (orientation: landscape) {
  .sidebar { display: block; }
}
</code></pre>

<hr/>

<h2>Mini Challenge 小挑战</h2>

<pre><code>
/* 这段代码中，如果屏幕宽度是 600px，
   .box 的背景色是什么？ */
/* What is the background color of .box at 600px? */

.box { background: red; }

@media (min-width: 500px) {
  .box { background: blue; }
}

@media (max-width: 700px) {
  .box { background: green; }
}

@media (min-width: 550px) and (max-width: 650px) {
  .box { background: yellow; }
}
</code></pre>

<p>&lt;details&gt;</p>
<p>&lt;summary&gt;答案 / Answer&lt;/summary&gt;</p>

<p><strong>答案：yellow（黄色）</strong></p>

<p>在 600px 时：</p>
<p>1. <code>red</code> — 基础样式 ✓</p>
<p>2. <code>min-width: 500px</code> → 600 ≥ 500 → 覆盖为 <code>blue</code> ✓</p>
<p>3. <code>max-width: 700px</code> → 600 ≤ 700 → 覆盖为 <code>green</code> ✓</p>
<p>4. <code>min-width: 550px and max-width: 650px</code> → 550 ≤ 600 ≤ 650 → 覆盖为 <code>yellow</code> ✓</p>

<p>CSS 层叠规则：后声明的规则（同优先级时）覆盖先声明的。最后生效的是 <code>yellow</code>。</p>
<p>&lt;/details&gt;</p>
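<p>上面的判定逻辑可以用一小段 Python 模拟（简化模型：同优先级下后声明的规则覆盖先声明的；规则条件取自上面的挑战题，函数名为演示自拟）。A minimal sketch of the cascade logic above:</p>

```python
# Simplified model of the CSS cascade for the challenge above: among
# rules whose media condition matches the viewport width, the
# last-declared one wins (all selectors here have equal specificity).

def winning_background(width_px):
    # (min_width, max_width, background); None means "no bound"
    rules = [
        (None, None, "red"),       # base style
        (500, None, "blue"),       # @media (min-width: 500px)
        (None, 700, "green"),      # @media (max-width: 700px)
        (550, 650, "yellow"),      # @media (min-width: 550px) and (max-width: 650px)
    ]
    result = None
    for min_w, max_w, color in rules:
        if (min_w is None or width_px >= min_w) and (max_w is None or width_px <= max_w):
            result = color  # later declarations override earlier ones
    return result

print(winning_background(600))  # yellow: all four rules match, last one wins
print(winning_background(400))  # green: only the base style and max-width:700 match
```

<p>把宽度换成 800px 再跑一次：三条媒体查询里只有 min-width: 500px 匹配，结果是 blue。</p>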

<hr/>

<p><em>Day 4 / 100 — CSS 基础系列 CSS Fundamentals</em></p>
<p><em>昨天 Day 3：CSS Grid | 明天 Day 5：CSS 动画与 Transitions</em></p>

<hr/>
<h1>🤖 AI</h1>
<h2>🤖 AI Day 4 — 分词（Tokenization）：LLM 眼中的文字 / Tokenization: How LLMs See Text</h2>

<p>&gt; <strong>基础阶段 Foundation Phase</strong> | 预计阅读时间 ~2-3 分钟</p>

<hr/>

<h2>直觉解释 / Intuitive Explanation</h2>

<p>你有没有想过，为什么 ChatGPT 有时候连字都数不清楚？比如问它&quot;&#x27;strawberry&#x27; 这个词里有几个 r？&quot;，它可能回答&quot;2 个&quot;而不是正确答案&quot;3 个&quot;？</p>

<p>这就是<strong>分词（Tokenization）</strong>在背后作怪。</p>

<p>LLM 看到的世界，不是一个个字母，也不是一个个单词——而是<strong>tokens（词元）</strong>。一个 token 可能是：</p>
<p>- 一个完整的英语单词（如 <code>hello</code>）</p>
<p>- 一个常见单词的一部分（如 <code>token</code> → <code>tok</code> + <code>en</code>）</p>
<p>- 一个标点符号或空格</p>
<p>- 一个汉字（通常 1 个汉字 = 1 个 token）</p>

<p>Have you ever wondered why ChatGPT struggles to count letters? Ask &quot;how many &#x27;r&#x27;s in strawberry?&quot; and it might say 2 instead of 3. That&#x27;s tokenization at work — the model never sees individual letters.</p>

<hr/>

<h2>Tokenization 是怎么工作的？/ How Does It Work?</h2>

<h3>主流算法：BPE（Byte Pair Encoding）</h3>

<p><strong>训练阶段（一次性，在模型训练前）:</strong></p>

<p>1. 从字母级别开始：把所有文字拆成单个字符</p>
<p>2. 统计最常出现的相邻字符对，合并成一个新 token</p>
<p>3. 重复 N 次（N 通常是几万次），直到词汇表大小达标</p>

<p><strong>效果：</strong>常见的词/词根保持完整，罕见的词被拆开。</p>

<pre><code>
&quot;tokenization&quot; 可能被拆成：
&quot;token&quot; + &quot;ization&quot;
或
&quot;token&quot; + &quot;iz&quot; + &quot;ation&quot;
（取决于训练数据中这些片段的频率）
</code></pre>
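<p>上面的 BPE 训练循环可以用几行 Python 写成玩具版（示意：真实的 tokenizer 如 tiktoken 在字节级别操作、用海量语料训练；这里的函数名和数据都是为演示假设的）。A toy sketch of the merge loop described above:</p>

```python
from collections import Counter

def bpe_train(words, num_merges):
    """Toy BPE: repeatedly merge the most frequent adjacent symbol pair."""
    # Step 1: start at character level; each word is a tuple of characters
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Step 2: count adjacent pairs across all words, weighted by frequency
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair becomes a new token
        merges.append(best)
        # Step 3: rewrite every word with the merged pair as a single symbol
        new_vocab = Counter()
        for word, freq in vocab.items():
            merged, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    merged.append(word[i] + word[i + 1])
                    i += 2
                else:
                    merged.append(word[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges

# "low" appears often, so ("l", "o") then ("lo", "w") are merged first
print(bpe_train(["low", "low", "low", "lower", "newest"], 2))
# [('l', 'o'), ('lo', 'w')]
```

<p>常见片段先被合并，罕见词留在更细的粒度上，这正是上文&quot;常见的词/词根保持完整，罕见的词被拆开&quot;的来源。</p>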

<hr/>

<h2>可视化追踪 / Visual Trace</h2>

<pre><code>
原始文本 Raw Text:
&quot;LLMs are transformers&quot;

          ↓ Tokenizer

Token IDs:   [  47,    2860,    527,   83386 ]
             &quot;LL&quot;    &quot;Ms&quot;   &quot; are&quot;  &quot; transformers&quot;
              ↑
           注意 &quot;LLMs&quot; 被拆成多个 token！

(以上是示意，实际 token 边界取决于具体模型的词汇表)
</code></pre>

<pre><code>
模型看到的不是:   L L M s   a r e   t r a n s f o r m e r s
而是:            [token1] [token2] [token3] [token4]

就像把句子看成 LEGO 块，而不是沙粒
Like seeing sentences as LEGO bricks, not individual grains of sand
</code></pre>

<hr/>

<h2>为什么这很重要？/ Why Does This Matter?</h2>

<h3>1. 解释了&quot;数字母&quot;的 bug 🔢</h3>

<p><code>&quot;strawberry&quot;</code> → 可能被 token 化为 <code>&quot;straw&quot;</code> + <code>&quot;berry&quot;</code></p>
<p>模型数 r 时是数 token 里的 r，不是逐字母数。</p>
<p>这是已知的 LLM 局限，不是 bug，而是架构的内在特性。</p>

<h3>2. 解释了 Token 计费 💰</h3>

<p>OpenAI 按 token 而非单词收费：</p>
<p>- 英文：1 token ≈ 0.75 个单词</p>
<p>- 中文：1 个汉字 ≈ 1 个 token（有时更多）</p>
<p>- 代码：注释和空格也算 token，很&quot;贵&quot;</p>

<pre><code>
&quot;Hello, world!&quot; = 4 tokens: [&quot;Hello&quot;, &quot;,&quot;, &quot; world&quot;, &quot;!&quot;]
&quot;你好世界&quot; = 4 tokens: [&quot;你&quot;,&quot;好&quot;,&quot;世&quot;,&quot;界&quot;]（每个汉字一个）
</code></pre>
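<p>基于上面的粗略比例，可以做一个信封背面的 token 估算器（注意：这些比例只是近似值，精确计数需要用模型真实的 tokenizer；函数名和单价均为演示假设）：</p>

```python
# Back-of-the-envelope token estimate using the rough ratios above.
# These ratios are approximations; real counts need the model's
# actual tokenizer (e.g. tiktoken).

def estimate_tokens(english_words=0, chinese_chars=0):
    # ~1 token per 0.75 English words; ~1 token per Chinese character
    return round(english_words / 0.75 + chinese_chars * 1.0)

def estimate_cost_usd(tokens, price_per_1k_tokens):
    # Providers typically bill per 1K tokens
    return tokens * price_per_1k_tokens / 1000

tokens = estimate_tokens(english_words=750)
print(tokens)                           # 1000
print(estimate_cost_usd(tokens, 0.01))  # 0.01 (at a hypothetical $0.01 / 1K tokens)
```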

<h3>3. 解释了 Context Window 的实际限制 📏</h3>

<p>GPT-4 的 128K context window 是 token 数，不是字数。</p>
<p>如果你的提示词里中文很多，等效&quot;容量&quot;就比英文少。</p>

<hr/>

<h2>代码片段 / Code Snippet</h2>

<pre><code>
# 用 tiktoken 查看 GPT 系列模型如何分词
# pip install tiktoken
import tiktoken

# Load GPT-4&#x27;s tokenizer
enc = tiktoken.encoding_for_model(&quot;gpt-4&quot;)

# Encode a string into token IDs
tokens = enc.encode(&quot;tokenization is fascinating&quot;)
print(tokens)
# Output: [4037, 2065, 374, 27387] (actual IDs may vary)

# Decode back
decoded = [enc.decode([t]) for t in tokens]
print(decoded)
# Output: [&#x27;token&#x27;, &#x27;ization&#x27;, &#x27; is&#x27;, &#x27; fascinating&#x27;]
# Notice: &quot;tokenization&quot; is split into 2 tokens!

# Count tokens in a prompt
text = &quot;strawberry&quot;
count = len(enc.encode(text))
print(f&quot;&#x27;{text}&#x27; = {count} token(s)&quot;)
# 输出取决于具体词汇表: 整个单词通常只占 1-3 个 token
# (The exact count depends on the vocabulary; often just 1-3 tokens)
# 无论被拆成几块，模型都看不到单个字母，
# 难怪数不出来里面有 3 个 r！
</code></pre>

<hr/>

<h2>实际应用 / Applications</h2>

<p>| 场景 Use Case | Tokenization 的影响 Impact |</p>
<p>|-------------|--------------------------|</p>
<p>| Prompt Engineering | 压缩 token 数 → 降低成本、扩大 context 空间 |</p>
<p>| RAG 系统 | Chunking 时要按 token 数切，不按字符数切 |</p>
<p>| 微调 Fine-tuning | 训练数据的 token 分布影响模型性能 |</p>
<p>| 多语言支持 | 不同语言的 token 效率差异很大（英文 &gt; 中文 &gt; 某些小语种）|</p>
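<p>表中&quot;按 token 数切分&quot;的做法可以这样示意（这里用空格分词代替真实 tokenizer，仅作演示；生产环境应换成模型自己的 tokenizer，如 tiktoken 的 encode）：</p>

```python
def chunk_by_tokens(text, max_tokens, tokenize=str.split):
    """Split text into chunks of at most max_tokens tokens each.
    `tokenize` is a whitespace-split stand-in; swap in a real
    tokenizer for production RAG chunking."""
    tokens = tokenize(text)
    return [
        " ".join(tokens[i:i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]

doc = "retrieval augmented generation splits documents into token sized chunks"
for chunk in chunk_by_tokens(doc, 4):
    print(chunk)
```

<p>按字符数切分会在不同语言/内容之间得到差异巨大的 token 数，按 token 切分才能保证每个 chunk 真正放得进模型的 context window。</p>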

<hr/>

<h2>有趣彩蛋 / Fun Fact</h2>

<p>GPT-4 的词汇表有 ~100,256 个 token（cl100k_base encoding）。</p>
<p>其中包括：整个英文单词、代码关键词，甚至 <code> Python</code>（带前置空格）和 <code>Python</code> 是<strong>不同的 token</strong>！</p>

<p>空格有时候是 token 的一部分，这就是为什么 prompt 里的格式细节可能影响输出——模型在 token 层面感知到了差异。</p>

<hr/>

<p><em>Day 4 / 100 — AI 基础系列 AI Foundations</em></p>
<p><em>昨天 Day 3：AI News Roundup | 明天 Day 5：Embeddings — 语义的数学表达</em></p>
]]></description>
    </item>
    <item>
      <title>byte-by-byte — 2026-03-16</title>
      <link>https://github.com/YushengAuggie/byte-by-byte/tree/main/archive</link>
      <guid isPermaLink="false">https://github.com/YushengAuggie/byte-by-byte/archive/2026-03-16</guid>
      <pubDate>Mon, 16 Mar 2026 12:00:00 +0000</pubDate>
      <description><![CDATA[<h1>🏗️ System Design</h1>
<h2>🏗️ 系统设计 Day 3 / System Design Day 3</h2>
<p><strong>HTTP/HTTPS &amp; REST APIs</strong></p>
<p><em>Category: Fundamentals | Difficulty: Beginner | Phase: Foundation</em></p>

<hr/>

<h2>想象你在设计... / Imagine You&#x27;re Building...</h2>

<p>想象你是一家餐厅的服务员。客人（浏览器）坐下来，告诉你他想要什么（HTTP请求），你去厨房（服务器）取回食物，然后送回来（HTTP响应）。这就是HTTP的本质——一种双方约定好的&quot;点餐协议&quot;。</p>

<p>Imagine you&#x27;re a waiter at a restaurant. The customer (browser) tells you what they want (HTTP request), you go to the kitchen (server), and bring back the food (HTTP response). That&#x27;s HTTP in a nutshell — a standardized &quot;ordering protocol&quot; both sides agree on.</p>

<hr/>

<h2>架构图 / Architecture Diagram</h2>

<pre><code>
  CLIENT (Browser/App)
        │
        │  HTTP Request
        │  GET /api/users/123
        │  Headers: {Authorization: &quot;Bearer token...&quot;}
        ▼
  ┌─────────────────────┐
  │    LOAD BALANCER     │
  │  (distributes load)  │
  └──────┬──────┬────────┘
         │      │
         ▼      ▼
  ┌──────────┐ ┌──────────┐
  │ Server 1 │ │ Server 2 │   ← Stateless REST servers
  └──────┬───┘ └──────┬───┘
         │             │
         └──────┬──────┘
                ▼
        ┌───────────────┐
        │   DATABASE    │
        │  (source of   │
        │    truth)     │
        └───────────────┘
                │
        HTTP Response
        200 OK
        {&quot;id&quot;: 123, &quot;name&quot;: &quot;Alice&quot;}
        │
        ▼
  CLIENT receives data
</code></pre>

<hr/>

<h2>HTTP vs HTTPS — 核心区别 / Core Difference</h2>

<p><strong>HTTP</strong> — HyperText Transfer Protocol</p>
<p>- 明文传输，数据可被中间人截获</p>
<p>- Plaintext transmission; data can be intercepted</p>

<p><strong>HTTPS</strong> — HTTP + TLS/SSL 加密</p>
<p>- 所有数据加密传输，第三方无法读取内容</p>
<p>- All data encrypted; third parties can&#x27;t read the content</p>
<p>- 通过证书验证服务器身份（你真的是在和 google.com 说话吗？）</p>
<p>- Certificate verifies server identity (are you really talking to google.com?)</p>

<pre><code>
HTTP:   你的密码 → [网络] → 服务器       ← 路由器能看到！
HTTPS:  你的密码 → [加密] → [网络] → [解密] → 服务器  ← 中间人只看到乱码
</code></pre>

<hr/>

<h2>REST API — 六个约束 / Six Constraints</h2>

<p>REST (Representational State Transfer) 不是技术，是一套设计风格：</p>

<p>1. <strong>Stateless（无状态）</strong> — 服务器不记住你。每个请求自带所有信息。</p>
<p>Server doesn&#x27;t remember you. Each request carries all needed info.</p>

<p>2. <strong>Client-Server（客户端-服务器分离）</strong> — 前端和后端独立演化。</p>
<p>Frontend and backend evolve independently.</p>

<p>3. <strong>Cacheable（可缓存）</strong> — 响应可以被缓存，减少重复请求。</p>
<p>Responses can be cached to reduce redundant requests.</p>

<p>4. <strong>Uniform Interface（统一接口）</strong> — 用标准HTTP方法操作资源。</p>
<p>Use standard HTTP verbs to manipulate resources.</p>

<p>5. <strong>Layered System（分层系统）</strong> — 客户端不关心中间有几层。</p>
<p>Client doesn&#x27;t care how many layers exist in between.</p>

<p>6. <strong>Code on Demand (Optional)</strong> — 服务器可返回可执行代码（如JS）。</p>
<p>Server can return executable code (e.g., JavaScript).</p>

<hr/>

<h2>HTTP 方法 CRUD 对应关系 / HTTP Methods → CRUD</h2>

<pre><code>
HTTP Method    CRUD Operation    Example
──────────────────────────────────────────────────
GET            Read              GET /users/123
POST           Create            POST /users  {body}
PUT            Replace (全量)    PUT /users/123  {full body}
PATCH          Update (部分)     PATCH /users/123 {partial}
DELETE         Delete            DELETE /users/123
</code></pre>
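<p>上表的动词到 CRUD 映射可以用一个极简的分发器示意（内存字典存储和 handler 逻辑纯属演示，不代表任何框架的真实 API）：</p>

```python
# Tiny sketch of the verb-to-CRUD mapping above, with an in-memory
# dict as the resource store. Purely illustrative.

users = {123: {"id": 123, "name": "Alice"}}

def handle(method, path, body=None):
    parts = path.strip("/").split("/")              # "/users/123" -> ["users", "123"]
    user_id = int(parts[1]) if len(parts) > 1 else None
    if method == "GET":                             # Read
        return (200, users[user_id]) if user_id in users else (404, None)
    if method == "POST":                            # Create
        new_id = max(users, default=0) + 1
        users[new_id] = {"id": new_id, **body}
        return (201, users[new_id])
    if method == "PUT":                             # Replace (full)
        if user_id not in users:
            return (404, None)
        users[user_id] = {"id": user_id, **body}
        return (200, users[user_id])
    if method == "DELETE":                          # Delete
        if user_id not in users:
            return (404, None)
        del users[user_id]
        return (204, None)
    return (405, None)                              # Method Not Allowed

print(handle("GET", "/users/123"))                  # (200, {'id': 123, 'name': 'Alice'})
print(handle("POST", "/users", {"name": "Bob"}))    # (201, {'id': 124, 'name': 'Bob'})
print(handle("DELETE", "/users/123"))               # (204, None)
print(handle("GET", "/users/123"))                  # (404, None)
```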

<p><strong>状态码速查 / Status Code Cheat Sheet:</strong></p>
<pre><code>
2xx  ✅  成功 / Success
  200 OK           — 请求成功
  201 Created      — 资源已创建
  204 No Content   — 成功但无返回体

3xx  ↩️  重定向 / Redirect
  301 Moved Permanently  — 永久跳转
  304 Not Modified       — 用缓存

4xx  ❌  客户端错误 / Client Error
  400 Bad Request    — 你的请求有问题
  401 Unauthorized   — 没登录
  403 Forbidden      — 登录了但没权限
  404 Not Found      — 资源不存在
  429 Too Many Reqs  — 限流了

5xx  💥  服务器错误 / Server Error
  500 Internal Server Error — 服务器崩了
  503 Service Unavailable   — 服务不可用
</code></pre>
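<p>状态码的类别完全由首位数字决定，可以直接写成一个小工具函数（示意）：</p>

```python
def status_class(code):
    # The first digit of an HTTP status code determines its class
    return {
        2: "success",
        3: "redirect",
        4: "client error",
        5: "server error",
    }.get(code // 100, "unknown")

print(status_class(201))  # success
print(status_class(404))  # client error
print(status_class(503))  # server error
```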

<hr/>

<h2>为什么这样设计？/ Why This Design?</h2>

<p><strong>为什么REST选择无状态（Stateless）？</strong></p>

<p>如果服务器记住每个用户的状态，那扩展到100台服务器时，你必须确保每次请求都打到同一台机器（叫做&quot;粘性会话&quot;sticky session）。这非常麻烦。</p>

<p>无状态的好处：任何服务器都能处理任何请求，水平扩展极其简单。</p>

<p>If servers remembered user state, scaling to 100 servers would require routing each user to the same server every time (&quot;sticky sessions&quot;) — a maintenance nightmare. Stateless = any server can handle any request = horizontal scaling is trivial.</p>

<hr/>

<h2>别踩这个坑 / Don&#x27;t Fall Into This Trap</h2>

<p><strong>坑 1: 在GET请求中修改数据</strong></p>
<pre><code>
❌  GET /deleteUser?id=123    # 语义错误，GET应该是只读的
✅  DELETE /users/123
</code></pre>

<p><strong>坑 2: 用动词命名资源（URL应该是名词）</strong></p>
<pre><code>
❌  POST /createUser
❌  GET  /getUserById?id=123
✅  POST /users
✅  GET  /users/123
</code></pre>

<p><strong>坑 3: 忘记区分401和403</strong></p>
<pre><code>
401 Unauthorized → &quot;你是谁？请先登录&quot; (Who are you? Please log in)
403 Forbidden    → &quot;我知道你是谁，但你没权限&quot; (I know who you are, but you can&#x27;t)
</code></pre>

<p><strong>坑 4: 滥用200 OK返回错误信息</strong></p>
<pre><code>
❌  200 OK  {&quot;error&quot;: &quot;User not found&quot;}   # 前端要额外解析body判断成功失败
✅  404 Not Found  {&quot;message&quot;: &quot;User not found&quot;}
</code></pre>

<hr/>

<h2>与昨天的联系 / Connection to Day 2</h2>

<p>昨天我们学了DNS和TCP/IP。现在你明白了完整链路：</p>

<p>1. 你在浏览器输入 <code>https://api.example.com/users</code></p>
<p>2. <strong>DNS</strong> 解析域名 → IP地址</p>
<p>3. <strong>TCP</strong> 建立连接（三次握手）</p>
<p>4. <strong>TLS</strong> 握手，建立加密通道（HTTPS）</p>
<p>5. <strong>HTTP</strong> 发送请求，服务器返回响应</p>
<p>6. 浏览器渲染数据</p>

<p>Yesterday we covered DNS and TCP/IP. Now you see the full picture: DNS → TCP → TLS → HTTP → response. Each layer builds on the one before it.</p>

<hr/>

<p><em>Day 3 | 系统设计基础系列 | 明天：数据库基础</em></p>

<hr/>
<h1>💻 Algorithms</h1>
<h2>💻 算法 Day 3 / Algorithms Day 3</h2>
<p><strong>#1 Two Sum (Easy) — Arrays &amp; Hashing</strong></p>
<p>🔗 https://leetcode.com/problems/two-sum/</p>

<hr/>

<h2>现实类比 / Real-World Analogy</h2>

<p>想象你在一家超市，口袋里有 $11，你想找到两件商品，价格加起来正好等于 $11。</p>

<p>笨方法：把每件商品和其他所有商品逐一配对比较 — 效率太低了 O(n²)。</p>

<p>聪明方法：每次拿起一件商品（价格 x），立刻检查你的&quot;已看过价格&quot;笔记本里有没有 <code>11 - x</code>。有就找到了！这就是哈希表的思路。</p>

<p>Imagine you&#x27;re in a supermarket with $11 and want two items that add up to exactly $11. Brute force: compare every pair. Smart way: for each item (price x), instantly check if <code>11 - x</code> is already in your &quot;seen prices&quot; notebook. That&#x27;s the hash map approach.</p>

<hr/>

<h2>题目 / Problem Statement</h2>

<p><strong>中文：</strong> 给定一个整数数组 <code>nums</code> 和一个目标值 <code>target</code>，找出数组中和为 <code>target</code> 的两个数的<strong>下标</strong>。假设每个输入恰好只有一个答案，且同一个元素不能使用两次。</p>

<p><strong>English:</strong> Given an array of integers <code>nums</code> and an integer <code>target</code>, return the indices of the two numbers that add up to <code>target</code>. You may assume exactly one solution exists, and you may not use the same element twice.</p>

<pre><code>
Input:  nums = [2, 7, 11, 15], target = 9
Output: [0, 1]   (because nums[0] + nums[1] = 2 + 7 = 9)
</code></pre>

<hr/>

<h2>逐步思路 / Step-by-Step Walkthrough</h2>

<h3>方法1：暴力 O(n²) — 先理解，别用它</h3>

<pre><code>
For i = 0 (num = 2):
  For j = 1 (num = 7):  2 + 7 = 9 ✅ → return [0, 1]
</code></pre>

<h3>方法2：哈希表 O(n) — 这才是正解</h3>

<p>核心思路：遍历时，对每个数 <code>num</code>，我们不是找&quot;谁加上 num 等于 target&quot;，而是找&quot;target - num 在不在我们之前见过的数里&quot;。</p>

<p><strong>具体示例追踪：</strong></p>
<pre><code>
nums = [2, 7, 11, 15], target = 9
seen = {}  (空字典)

i=0, num=2:
  complement = 9 - 2 = 7
  7 in seen? → No (seen is empty)
  → 把 2 存入 seen: seen = {2: 0}

i=1, num=7:
  complement = 9 - 7 = 2
  2 in seen? → Yes! seen[2] = 0
  → 返回 [seen[2], i] = [0, 1] ✅
</code></pre>

<p><strong>再来一个不那么直接的例子：</strong></p>
<pre><code>
nums = [3, 2, 4], target = 6
seen = {}

i=0, num=3:
  complement = 6 - 3 = 3
  3 in seen? → No
  seen = {3: 0}

i=1, num=2:
  complement = 6 - 2 = 4
  4 in seen? → No
  seen = {3: 0, 2: 1}

i=2, num=4:
  complement = 6 - 4 = 2
  2 in seen? → Yes! seen[2] = 1
  → 返回 [seen[2], i] = [1, 2] ✅
  (nums[1] + nums[2] = 2 + 4 = 6 ✓)
</code></pre>

<hr/>

<h2>Python 解法 / Python Solution</h2>

<pre><code>
def twoSum(nums: list[int], target: int) -&gt; list[int]:
    # Hash map: value → index
    # We store numbers we&#x27;ve already visited
    seen = {}
    
    for i, num in enumerate(nums):
        # What number do we NEED to complete the pair?
        complement = target - num
        
        # Check if that number is already in our map
        if complement in seen:
            # Found it! Return both indices
            return [seen[complement], i]
        
        # Haven&#x27;t found a pair yet; record this number and its index
        seen[num] = i
    
    # Problem guarantees a solution exists, so we won&#x27;t reach here
    return []


# Test cases
print(twoSum([2, 7, 11, 15], 9))   # [0, 1]
print(twoSum([3, 2, 4], 6))         # [1, 2]
print(twoSum([3, 3], 6))            # [0, 1]
</code></pre>

<hr/>

<h2>复杂度分析 / Complexity Analysis</h2>

<pre><code>
时间复杂度 Time:  O(n)
  → 只遍历数组一次。哈希表查找是 O(1) 平均。

空间复杂度 Space: O(n)
  → 最坏情况下，哈希表存储 n-1 个元素（最后一对才匹配）。

vs. 暴力 Brute Force:
  Time: O(n²)  — 双重循环
  Space: O(1)  — 不用额外空间
</code></pre>

<p><strong>权衡：</strong> 用空间换时间。这通常是对的 — 内存便宜，时间贵。</p>

<p><strong>Trade-off:</strong> We trade space for time. Usually the right call — memory is cheap, user time is not.</p>

<hr/>

<h2>边界情况 / Edge Cases</h2>

<pre><code>
# 两个相同的数字
twoSum([3, 3], 6)   # → [0, 1] ✅
# 关键：先检查 complement，再存入 seen
# 这样避免同一元素用两次

# 负数
twoSum([-1, -2, -3, -4, -5], -8)   # → [2, 4] (nums[2] + nums[4] = -3 + (-5) = -8)

# 只有两个元素
twoSum([1, 9], 10)   # → [0, 1]
</code></pre>

<hr/>

<h2>举一反三 / Pattern Recognition</h2>

<p>掌握了&quot;边遍历边建哈希表&quot;这个模式，你可以解决：</p>

<p>- <strong>Two Sum II</strong> (sorted array) — 用双指针，O(1) 空间</p>
<p>- <strong>Two Sum III</strong> (data structure) — 设计一个支持 add/find 的类</p>
<p>- <strong>3Sum</strong> (#15) — 固定一个数，对剩余用双指针</p>
<p>- <strong>4Sum</strong> (#18) — 嵌套一层再用双指针</p>
<p>- <strong>Subarray Sum Equals K</strong> (#560) — 前缀和 + 哈希表</p>

<p><strong>核心模式：</strong> &quot;我需要找 X，先问问我见过 X 吗？没见过，就记下现在这个。&quot;</p>

<p><strong>The pattern:</strong> &quot;I need X. Have I seen X? No? Then record what I have now.&quot;</p>
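<p>以列表中的 Subarray Sum Equals K (#560) 为例，它用的是同一个&quot;先查表再入表&quot;模式，只是把&quot;见过的数&quot;换成了前缀和（以下是一个示意解法）：</p>

```python
def subarraySum(nums, k):
    """Count subarrays summing to k: prefix sums + hash map,
    the same 'have I seen X before?' pattern as Two Sum."""
    count = 0
    prefix = 0
    seen = {0: 1}                      # the empty prefix (sum 0) seen once
    for num in nums:
        prefix += num
        # A subarray ending here sums to k iff some earlier
        # prefix sum equals (current prefix - k)
        count += seen.get(prefix - k, 0)
        seen[prefix] = seen.get(prefix, 0) + 1
    return count

print(subarraySum([1, 1, 1], 2))   # 2
print(subarraySum([1, 2, 3], 3))   # 2  ([1, 2] and [3])
```

<p>和 Two Sum 一样：每一步先查 complement（这里是 prefix - k），再把当前值记入哈希表，天然避免重复使用同一位置。</p>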

<hr/>

<p><em>Day 3 | 数组与哈希系列 | 昨天：Valid Anagram | 明天：Contains Duplicate</em></p>

<hr/>
<h1>🗣️ Soft Skills</h1>
<h2>🗣️ 软技能 Day 3 / Soft Skills Day 3</h2>
<p><strong>Conflict Resolution — 冲突解决</strong></p>

<p><strong>问题 / Question:</strong></p>
<p>&gt; &quot;Tell me about a time you disagreed with your manager or a senior engineer. How did you handle it?&quot;</p>
<p>&gt; &quot;讲一个你与你的经理或高级工程师意见相左的经历。你是怎么处理的？&quot;</p>

<hr/>

<h2>为什么这很重要 / Why This Matters</h2>

<p>这道题考察的不是&quot;谁对谁错&quot;，而是：</p>

<p>1. <strong>你有没有主见？</strong> 没想法的工程师不能独立工作</p>
<p>2. <strong>你有没有成熟度？</strong> 能否在坚持立场和尊重经验之间取得平衡</p>
<p>3. <strong>你能不能影响他人？</strong> 在没有直接权力的情况下推动决策</p>

<p>This question isn&#x27;t about who was right. It probes:</p>
<p>1. Do you have independent judgment?</p>
<p>2. Are you mature enough to push back respectfully?</p>
<p>3. Can you influence without authority?</p>

<p>At senior/staff level, this is a daily reality. The inability to navigate technical disagreements is a major signal that someone isn&#x27;t ready to operate at the next level.</p>

<hr/>

<h2>STAR 框架拆解 / STAR Framework Breakdown</h2>

<pre><code>
S - Situation  (情境)  ← 快速设置场景，30秒内
T - Task       (任务)  ← 你的目标/责任是什么
A - Action     (行动)  ← 这是重点，用70%的时间
R - Result     (结果)  ← 量化 + 反思
</code></pre>

<hr/>

<h2>❌ 糟糕的回答 / Bad Approach</h2>

<p>&gt; &quot;我们在选数据库。我觉得应该用 PostgreSQL，我的经理说用 MongoDB。我给他看了对比文章，最后他同意了我的观点。&quot;</p>

<p><strong>为什么不好：</strong></p>
<p>- 没有上下文（为什么要做这个选择？）</p>
<p>- 没有展示思考过程（你怎么决定提出来的？）</p>
<p>- &quot;给他看文章&quot;太被动 — 没有体现你如何主动建立共识</p>
<p>- 听起来像是&quot;我赢了，他输了&quot; — 没有合作感</p>

<hr/>

<h2>✅ 好的回答 / Good Approach</h2>

<p>&gt; &quot;我们要为用户分析平台选择数据存储方案。我的Tech Lead倾向于继续用我们熟悉的MySQL，因为团队对它最熟悉，迁移成本低。我的判断是，我们的查询模式——大量聚合、时间序列、不规则的事件结构——更适合列式存储，比如ClickHouse。</p>
<p>&gt;</p>
<p>&gt; 我没有直接开会说&#x27;你错了&#x27;。我先花了两天做了一个小型基准测试：用两套方案各跑了我们最慢的5个查询，把性能数据整理成表格。同时我也整理了迁移的风险点和成本估算。</p>
<p>&gt;</p>
<p>&gt; 然后我约了Tech Lead 1-on-1，先说：&#x27;我想和你讨论一下数据库选型，我做了一些数据，想听听你的想法。&#x27;我们发现他的核心顾虑是迁移风险，而不是性能。于是我们达成了折中方案：新的分析pipeline用ClickHouse，老的业务数据留在MySQL，不做迁移。</p>
<p>&gt;</p>
<p>&gt; 最终上线后，分析查询从平均 8 秒降到 0.3 秒，用户投诉减少了 80%。而且我和Tech Lead的关系没有受损——他后来还把我当成这个领域的go-to person。&quot;</p>

<p><strong>为什么好：</strong></p>
<p>- 清楚的商业背景</p>
<p>- 展示了&quot;先做数据再谈分歧&quot;的成熟判断</p>
<p>- 用1-on-1而不是公开会议处理分歧（情商高）</p>
<p>- 承认对方顾虑有道理，找到折中</p>
<p>- 量化结果</p>
<p>- 关系反而变好了</p>

<hr/>

<h2>场景模板 / Scenario Template</h2>

<p>用这个框架构建你自己的故事：</p>

<pre><code>
情境：我们在 [项目] 做 [技术决策]
分歧：我的 [经理/Tech Lead] 倾向于 [方案A]
      我的判断是 [方案B] 更合适，因为 [数据/理由]
行动：
  1. 我先 [做了什么验证工作]，而不是直接开会争论
  2. 我通过 [1-on-1/小型演示/数据报告] 分享我的发现
  3. 我先理解了对方的核心顾虑是 [XXX]
  4. 我们达成了 [折中方案/对方被说服/我更新了我的判断]
结果：[量化结果] + [关系/团队影响]
</code></pre>

<hr/>

<h2>Senior/Staff 级别的加分点 / Senior/Staff Level Tips</h2>

<p><strong>1. 展示你知道何时该放手 / Show you know when to let go</strong></p>
<p>&gt; &quot;我做了充分的论证，但最终决策权在他。我接受了这个决定，全力支持执行。六个月后，我们确实遇到了我预料的问题，但那时我们一起复盘，而不是我说&#x27;我早就说过了&#x27;。&quot;</p>

<p><strong>2. 主动建立共识而非赢得辩论 / Build alignment, don&#x27;t win debates</strong></p>
<p>Staff工程师知道：即使你是对的，如果你让别人&quot;输了&quot;，你长期付出的代价更大。</p>

<p>Staff engineers know: even if you&#x27;re right, making someone else &quot;lose&quot; costs you more in the long run.</p>

<p><strong>3. 把技术分歧和人际关系分开 / Separate technical disagreement from personal conflict</strong></p>
<p>&gt; &quot;我特别注意在表达不同意见时，聚焦在数据和影响上，而不是质疑他的判断能力。&quot;</p>

<hr/>

<h2>关键要点 / Key Takeaways</h2>

<pre><code>
✅ DO:
  - 先做功课（数据、原型、基准测试）再提出异议
  - 用1-on-1，不要在大会议上让人难堪
  - 理解对方顾虑，找折中点
  - 量化你的结果
  - 即使不同意，也要优雅地接受最终决策

❌ DON&#x27;T:
  - &quot;我最终说服了他&quot; 式的叙述（听起来自大）
  - 描述一个你完全错了的故事（除非重点是你从中学到了什么）
  - 没有结果（&quot;我们还在讨论…&quot;）
  - 批评你的前经理（面试官会想：他会不会也这么说我？）
</code></pre>

<hr/>

<p><em>Day 3 | 软技能系列 | 昨天：影响力（无直接权力）| 明天：处理模糊需求</em></p>

<hr/>
<h1>🎨 Frontend</h1>
<h2>🎨 前端 Day 3 / Frontend Day 3</h2>
<p><strong>CSS Grid — Two-Dimensional Layouts / 二维布局</strong></p>
<p><em>Week 1 | CSS Fundamentals</em></p>

<hr/>

<h2>猜猜这段代码输出什么？/ What Does This Layout Look Like?</h2>

<pre><code>
&lt;div class=&quot;grid&quot;&gt;
  &lt;div&gt;A&lt;/div&gt;
  &lt;div&gt;B&lt;/div&gt;
  &lt;div&gt;C&lt;/div&gt;
  &lt;div&gt;D&lt;/div&gt;
  &lt;div&gt;E&lt;/div&gt;
&lt;/div&gt;
</code></pre>

<pre><code>
.grid {
  display: grid;
  grid-template-columns: 1fr 2fr 1fr;
  grid-template-rows: 100px 100px;
  gap: 10px;
}
</code></pre>

<p><strong>猜猜结果 / Guess the output:</strong></p>

<pre><code>
┌────────┬───────────────┬────────┐
│   A    │       B       │   C    │
│ (1fr)  │     (2fr)     │ (1fr)  │
│        │               │        │  ← row 1: 100px tall
├────────┼───────────────┼────────┤
│   D    │       E       │(empty) │
│ (1fr)  │     (2fr)     │        │  ← row 2: 100px tall
└────────┴───────────────┴────────┘

Total 4fr per row → A=25%, B=50%, C=25% (不计 gap / excluding gaps)
</code></pre>

<p>✅ <strong>答案：</strong> 3列布局，比例 1:2:1。B 的宽度是 A 和 C 的两倍。第五个元素 E 占第二行第二格，第三格为空。</p>

<hr/>

<h2>Grid vs Flexbox — 一句话记住 / One-Liner to Remember</h2>

<pre><code>
Flexbox = 一维  (一行 or 一列)     Think: 导航栏, 按钮组
Grid    = 二维  (行 AND 列同时)    Think: 页面布局, 卡片网格
</code></pre>

<p>昨天（Day 2）我们学了 Flexbox。今天 Grid 是它的二维升级版。</p>

<hr/>

<h2>核心概念图解 / Core Concepts Illustrated</h2>

<pre><code>
display: grid;
grid-template-columns: 1fr 1fr 1fr;  ← 3 equal columns
grid-template-rows: auto auto;        ← 2 rows, height = content

Grid Container (父元素)
┌──────────┬──────────┬──────────┐
│  cell    │  cell    │  cell    │ ← row 1
│  [0,0]   │  [0,1]   │  [0,2]  │
├──────────┼──────────┼──────────┤
│  cell    │  cell    │  cell    │ ← row 2
│  [1,0]   │  [1,1]   │  [1,2]  │
└──────────┴──────────┴──────────┘
     ↑            ↑           ↑
   col 1        col 2       col 3
</code></pre>

<hr/>

<h2>让子元素跨格 / Spanning Multiple Cells</h2>

<pre><code>
/* 这个元素跨越2列 */
.item-a {
  grid-column: 1 / 3;   /* start at grid line 1, end at line 3 = span 2 columns */
  /* 另一种写法: grid-column: span 2; (从自动放置的起点跨 2 列) */
}

/* 这个元素跨越2行 */
.item-b {
  grid-row: 1 / 3;      /* start at line 1, end at line 3 */
}
</code></pre>

<pre><code>
Before span:          After .item-a spans 2 cols:
┌───┬───┬───┐         ┌───────┬───┐
│ A │ B │ C │         │   A   │ B │   ← A 占第 1+2 列，B 被推到第 3 列
├───┼───┼───┤         ├───┬───┼───┤
│ D │ E │ F │         │ C │ D │ E │   ← 其余元素自动重排
└───┴───┴───┘         ├───┼───┴───┘
                      │ F │           ← F 换到第三行
                      └───┘
</code></pre>

<hr/>

<h2>代码示例：经典页面布局 / Classic Page Layout</h2>

<pre><code>
.page {
  display: grid;
  grid-template-areas:
    &quot;header  header  header&quot;
    &quot;sidebar main    main  &quot;
    &quot;footer  footer  footer&quot;;
  grid-template-columns: 200px 1fr 1fr;
  grid-template-rows: 60px 1fr 40px;
  min-height: 100vh;
  gap: 8px;
}

.header  { grid-area: header; }
.sidebar { grid-area: sidebar; }
.main    { grid-area: main; }
.footer  { grid-area: footer; }
</code></pre>

<pre><code>
Visual Result:
┌──────────────────────────────────┐
│           HEADER (full width)    │  60px
├───────────┬──────────────────────┤
│           │                      │
│  SIDEBAR  │        MAIN          │  1fr (remaining height)
│  (200px)  │      (remaining)     │
│           │                      │
├───────────┴──────────────────────┤
│           FOOTER (full width)    │  40px
└──────────────────────────────────┘
</code></pre>

<p><strong><code>grid-template-areas</code> 的魔法：</strong> 用 ASCII 艺术描述布局，可读性极强。</p>

<hr/>

<h2>你可能不知道 / You Might Not Know</h2>

<h3>1. `fr` 单位只分配**剩余**空间</h3>

<pre><code>
grid-template-columns: 200px 1fr 1fr;
</code></pre>

<p>先分配固定的 200px，<strong>然后</strong> 把剩余宽度按 1:1 分给后两列。不是三等分！</p>

<p><code>fr</code> only distributes <em>remaining</em> space after fixed sizes are allocated.</p>
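<p>这个分配过程可以直接算出来（简化模型：忽略 gap、padding 和 min-content 约束；函数为演示自拟）：</p>

```python
def grid_track_widths(container_px, tracks):
    """Resolve a simplified grid-template-columns track list.
    Tracks are either a fixed pixel int or an fr float.
    Ignores gaps, padding, and min-content sizing."""
    fixed = sum(t for t in tracks if isinstance(t, int))
    total_fr = sum(t for t in tracks if isinstance(t, float))
    remaining = max(container_px - fixed, 0)   # fr shares only what is left
    return [
        t if isinstance(t, int) else remaining * t / total_fr
        for t in tracks
    ]

# grid-template-columns: 200px 1fr 1fr  in an 800px container
print(grid_track_widths(800, [200, 1.0, 1.0]))  # [200, 300.0, 300.0]
```

<p>注意 200px 1fr 1fr 在 800px 容器里得到 200 / 300 / 300，而不是三等分的约 266px。</p>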

<h3>2. `auto-fill` vs `auto-fit`</h3>

<pre><code>
/* auto-fill: 尽可能多地创建列（即使是空的） */
grid-template-columns: repeat(auto-fill, minmax(150px, 1fr));

/* auto-fit: 拉伸现有列填满空间（不创建空列） */
grid-template-columns: repeat(auto-fit, minmax(150px, 1fr));
</code></pre>

<p>这是响应式布局的神器——不写任何 media query 就能自适应！</p>

<p>This is the secret to responsive grid layouts without media queries.</p>

<h3>3. Grid 也能嵌套</h3>

<pre><code>
.card-grid {
  display: grid;
  grid-template-columns: repeat(3, 1fr);
}

.card {
  display: grid;  /* nested grid! */
  grid-template-rows: auto 1fr auto;  /* image | content | button */
}
</code></pre>

<hr/>

<h2>Mini Challenge / 小挑战</h2>

<p>用 CSS Grid 创建一个响应式卡片布局，要求：</p>
<p>- 大屏显示4列，中等屏幕3列，小屏2列</p>
<p>- <strong>只用一行 CSS，不用任何 media query</strong></p>

<p>&lt;details&gt;</p>
<p>&lt;summary&gt;提示 / Hint&lt;/summary&gt;</p>

<pre><code>
.grid {
  display: grid;
  grid-template-columns: repeat(auto-fill, minmax(200px, 1fr));
  gap: 16px;
}
</code></pre>

<p>当容器宽度变化时，<code>minmax(200px, 1fr)</code> 会自动计算可以放多少列！</p>

<p>&lt;/details&gt;</p>

<hr/>

<h2>Flexbox vs Grid 决策树 / Decision Tree</h2>

<pre><code>
你需要对齐内容吗？
│
├─ 只在一个方向（行 OR 列）→ 用 Flexbox
│   例：导航栏、按钮组、居中一个元素
│
└─ 同时在两个方向（行 AND 列）→ 用 Grid
    例：页面整体布局、图片画廊、仪表盘
</code></pre>

<hr/>

<p><em>Day 3 | CSS 布局系列 | 昨天：Flexbox | 明天：响应式设计 &amp; Media Queries</em></p>

<hr/>
<h1>🤖 AI</h1>
<h2>🤖 AI Day 3 — News Roundup / AI 新闻速递</h2>
<p><em>2026年3月第三周 | March 2026, Week 3</em></p>

<p>&gt; ⚠️ <strong>注意 / Note:</strong> AI 领域动态极快。以下新闻基于截至本文生成时的公开报道，具体数字与细节以原始来源为准。</p>

<hr/>

<h2>📰 Story 1: Meta 的 &quot;Avocado&quot; 模型推迟发布</h2>

<p>根据报道，Meta 的下一代 AI 大模型（内部代号 &quot;Avocado&quot;）被推迟至至少 2026 年 5 月，原因是性能尚未达到 Google 等竞争对手的水平。</p>

<p><strong>背景：</strong> Meta 此前招募了 Scale AI CEO Alexandr Wang，并投入数十亿美元试图追赶。Avocado 将是这一系列努力的首个重大发布。</p>

<p><strong>为什么你应该关心 / Why You Should Care:</strong></p>

<p>Meta 的 AI 布局直接影响整个生态——LLaMA 系列是很多开源项目的基础。如果 Avocado 继续落后于 GPT-4o 级别的模型，开源社区的&quot;平替方案&quot;质量也会受影响。对前端/全栈工程师来说：Meta AI 功能的集成（Facebook、Instagram、WhatsApp）会滞后，这可能影响你所在产品的 AI feature timeline。</p>

<hr/>

<h2>📰 Story 2: Palantir 的 Maven Smart System — AI 军事应用的争议</h2>

<p>Palantir 在其 AIPCon 会议上展示了 Maven Smart System，这是一个 AI 驱动的军事目标识别系统。演示视频显示，用户可以用&quot;左键点击、右键点击、左键点击&quot;的方式锁定打击目标，引发了广泛争议。</p>

<p><strong>为什么你应该关心 / Why You Should Care:</strong></p>

<p>这不只是道德话题——它是工程师职业生涯中越来越难以回避的现实问题。AI 的军事应用正在快速落地，而关于&quot;工程师是否应该为武器系统写代码&quot;的讨论也在硅谷持续升温（参考 Google 的 Project Maven 员工抗议事件）。面试中，你可能会被问到对这类项目的看法。答案没有对错，但你需要有自己的立场。</p>

<hr/>

<h2>📰 Story 3: BuzzFeed 的 AI 教训 — 内容农场的警示</h2>

<p>BuzzFeed 2025 年亏损达 $5730 万，股价跌至 $0.70，部分原因是大量使用 AI 生成文章和测验内容。尽管如此，CEO Jonah Peretti 仍表示将推出&quot;新的 AI 应用&quot;。</p>

<p><strong>为什么你应该关心 / Why You Should Care:</strong></p>

<p>这是一个&quot;AI 用错地方&quot;的典型案例。内容农场式的 AI 应用（用 AI 批量生产无差异化内容）正在被市场惩罚。读者不傻，算法（Google SEO、推荐算法）也越来越能识别低质量 AI 内容。</p>

<p><strong>工程师视角：</strong> 如果你在构建 AI 内容工具，问题不是&quot;能不能生成&quot;，而是&quot;用户为什么要看这个而不是其他的&quot;。差异化 &gt; 数量。</p>

<hr/>

<h2>📰 Story 4: Amazon Alexa Plus 新增&quot;Sassy&quot;个性模式</h2>

<p>Amazon 为 Alexa Plus 推出了新的&quot;Sassy&quot;个性风格，特点是&quot;不加滤镜的个性、机智的讽刺、以及偶尔被屏蔽的脏话&quot;。该功能仅限成人，需要额外身份验证。</p>

<p><strong>为什么你应该关心 / Why You Should Care:</strong></p>

<p>这其实是一个有趣的 UX / AI 产品设计信号——用户对 AI 助手的&quot;人格化&quot;需求越来越强。语音 AI 的个性化是下一个竞争维度（而不只是&quot;准确性&quot;）。</p>

<p><strong>技术角度：</strong> 实现&quot;可切换人格&quot;需要在 prompt 层面和 safety filter 层面同时设计。&quot;偶尔允许脏话&quot;这个边界的实现本身就是个有趣的工程问题。</p>

<hr/>

<h2>📰 Story 5: Meta 收购 Moltbook — AI 代理的&quot;社交网络&quot;</h2>

<p>Meta 近期收购了 Moltbook，一个&quot;AI 代理社交网络&quot;平台。收购后，平台更新了服务条款：用户（不是 Meta）对其 AI 代理的所有行为负责，无论行为是否有意为之、是否自主发生。</p>

<p><strong>为什么你应该关心 / Why You Should Care:</strong></p>

<p>这是 AI 代理（Agent）时代的法律雏形。谁对 AI Agent 的行为负责？目前答案是&quot;你，用户&quot;。这对于正在构建 AI Agent 产品的工程师来说是重要信号：你需要设计审计日志、权限边界、和&quot;人类在回路&quot;（human-in-the-loop）机制，不只是因为产品好，而是因为法律会要求你这么做。</p>

<hr/>

<h2>本周 AI 关键词 / This Week&#x27;s AI Keywords</h2>

<pre><code>
Agentic AI (代理式AI)    — AI 不只回答问题，而是自主行动
Human-in-the-loop (HITL) — 关键决策需要人类确认
AI Liability (AI责任归属) — 谁为 AI 的行为负责？
Model Delay (模型推迟)   — 发布时间表 ≠ 实际发布时间
AI + Military (AI军事化)  — 工程伦理的新战场
</code></pre>

<hr/>

<p><em>Day 3 | AI 新闻系列 | 昨天：Transformer 工作原理 | 明天：RAG（检索增强生成）</em></p>
]]></description>
    </item>
    <item>
      <title>byte-by-byte — 2026-03-15</title>
      <link>https://github.com/YushengAuggie/byte-by-byte/tree/main/archive</link>
      <guid isPermaLink="false">https://github.com/YushengAuggie/byte-by-byte/archive/2026-03-15</guid>
      <pubDate>Sun, 15 Mar 2026 12:00:00 +0000</pubDate>
      <description><![CDATA[<h1>🏗️ System Design</h1>
<h2>🏗️ 系统设计 Day 2 / System Design Day 2</h2>
<p><strong>Topic: DNS, IP, and TCP/UDP — 互联网的&quot;电话本&quot;与&quot;快递公司&quot;</strong></p>

<hr/>

<h2>场景引入 / Scenario</h2>

<p>想象你在设计一个全球用户访问的网站。你写了 <code>https://myblog.com</code>，浏览器是怎么找到你服务器的？从你按下 Enter 到页面出现，背后发生了什么魔法？</p>

<p><em>Imagine you&#x27;re building a website for global users. You type <code>https://myblog.com</code> — how does your browser find your server? What magic happens between pressing Enter and seeing the page?</em></p>

<hr/>

<h2>DNS：互联网的电话本 / DNS: The Internet&#x27;s Phone Book</h2>

<p>人类记得 <code>google.com</code>，机器只认识 <code>142.250.80.46</code>。DNS（Domain Name System）就是把&quot;人话&quot;翻译成&quot;机器话&quot;的翻译官。</p>

<p><em>Humans remember <code>google.com</code>, machines only understand <code>142.250.80.46</code>. DNS translates human-readable names into machine-readable IPs.</em></p>

<h3>DNS 查询流程 / DNS Resolution Flow</h3>

<pre><code>
你的浏览器
    │
    ▼
[1] 本地缓存 / Local Cache
    │  (找到了? 直接返回 / Found? Return immediately)
    │  (没找到? 继续 / Not found? Continue)
    ▼
[2] 操作系统 hosts 文件 / OS hosts file
    │  (/etc/hosts on Linux/Mac)
    ▼
[3] 递归解析器 / Recursive Resolver
    │  (通常是你的 ISP 或 8.8.8.8)
    │  (Usually your ISP or 8.8.8.8)
    ▼
[4] 根域名服务器 / Root Nameserver
    │  (&quot;我不知道 myblog.com，但 .com 服务器知道&quot;)
    │  (&quot;I don&#x27;t know myblog.com, but .com nameserver does&quot;)
    ▼
[5] TLD 服务器 / TLD Nameserver (.com)
    │  (&quot;myblog.com 的权威服务器在这里&quot;)
    │  (&quot;myblog.com&#x27;s authoritative server is here&quot;)
    ▼
[6] 权威 DNS 服务器 / Authoritative Nameserver
    │  &quot;myblog.com → 203.0.113.42&quot;
    ▼
IP 地址返回给浏览器 / IP returned to browser
</code></pre>

<p><strong>真实类比 / Real-world analogy:</strong></p>
<p>根服务器 = 全国电话总机 → TLD 服务器 = 城市区号本 → 权威服务器 = 某公司的直线</p>
<p><em>Root server = National operator → TLD = City directory → Authoritative = Company&#x27;s direct line</em></p>
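<p>在代码里，这整条查询链通常只是一行调用——操作系统的解析器替你完成 [1]-[6]。下面是一个最小示例（用 <code>localhost</code> 演示：它由 hosts 文件即第 [2] 步回答，不需要任何网络请求）：</p>
<p><em>In code, the whole chain is usually one call — the OS resolver does steps [1]-[6] for you. A minimal sketch, using <code>localhost</code> (answered by the hosts file, step [2], no network needed):</em></p>

```python
import socket

# gethostbyname() hands the entire resolution chain to the OS resolver.
# "localhost" is answered locally by the hosts file (step [2] above),
# so this works without any network round-trip.
ip = socket.gethostbyname("localhost")
print(ip)  # '127.0.0.1'
```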

<hr/>

<h2>IP：你在网络上的&quot;门牌号&quot; / IP: Your Network &quot;Address&quot;</h2>

<p>- <strong>IPv4</strong>: <code>203.0.113.42</code> — 32位，约43亿个地址，已经快用完了</p>
<p>- <strong>IPv6</strong>: <code>2001:0db8:85a3::8a2e:0370:7334</code> — 128位，几乎无限</p>

<p><em>IPv4 is 32-bit (~4.3 billion addresses, nearly exhausted). IPv6 is 128-bit, essentially unlimited.</em></p>

<p><strong>公网 vs 私网 / Public vs Private IP:</strong></p>
<pre><code>
家庭网络 / Home Network:
  你的电脑      → 192.168.1.100 (私网/Private)
  你的手机      → 192.168.1.101 (私网/Private)
  路由器对外    → 203.0.113.42  (公网/Public)  ← 互联网只看到这个
                                                  ← Internet only sees this
NAT（网络地址转换）帮你把私网地址映射到公网
NAT (Network Address Translation) maps private to public
</code></pre>
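<p>想验证一个地址是私网还是公网，可以用 Python 标准库的 <code>ipaddress</code> 模块（仅作演示）：</p>
<p><em>You can check private vs public with Python's stdlib <code>ipaddress</code> module (illustrative only):</em></p>

```python
import ipaddress

# RFC 1918 ranges (10/8, 172.16/12, 192.168/16) are private;
# NAT maps them to a public address at the router.
print(ipaddress.ip_address("192.168.1.100").is_private)  # True
print(ipaddress.ip_address("8.8.8.8").is_private)        # False
```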

<hr/>

<h2>TCP vs UDP：快递公司 vs 广播电台</h2>

<p>| 特性 / Feature | TCP | UDP |</p>
<p>|---|---|---|</p>
<p>| 连接方式 / Connection | 三次握手 / 3-way handshake | 无连接 / Connectionless |</p>
<p>| 可靠性 / Reliability | ✅ 保证送达 / Guaranteed | ❌ 尽力而为 / Best-effort |</p>
<p>| 顺序 / Order | ✅ 有序 / Ordered | ❌ 可能乱序 / May arrive out of order |</p>
<p>| 速度 / Speed | 较慢 / Slower | 更快 / Faster |</p>
<p>| 适用场景 / Use cases | HTTP, Email, File transfer | Video streaming, Gaming, DNS |</p>

<h3>TCP 三次握手 / TCP 3-Way Handshake</h3>

<pre><code>
客户端 / Client          服务器 / Server
     │                        │
     │──── SYN ──────────────&gt;│  &quot;我想连接你 / I want to connect&quot;
     │                        │
     │&lt;─── SYN-ACK ───────────│  &quot;好的，收到 / OK, received&quot;
     │                        │
     │──── ACK ──────────────&gt;│  &quot;我也确认了 / Confirmed&quot;
     │                        │
     │══════ 连接建立 / Connection Established ══════│
</code></pre>

<p><strong>为什么需要3次？/ Why 3 handshakes?</strong></p>
<p>2次不够——服务器无法确认客户端收到了回复。就像打电话：&quot;喂？&quot; &quot;喂，听到了吗？&quot; &quot;听到了，开始说吧。&quot;</p>
<p><em>2 isn&#x27;t enough — the server can&#x27;t confirm the client received its reply. Like a phone call: &quot;Hello?&quot; &quot;Hello, can you hear me?&quot; &quot;Yes, go ahead.&quot;</em></p>
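<p>下面用 Python 的 <code>socket</code> 在本机演示：<code>create_connection()</code> 返回时，内核已经替你完成了三次握手（示意代码，非生产用）：</p>
<p><em>A localhost sketch with Python's <code>socket</code>: by the time <code>create_connection()</code> returns, the kernel has completed the 3-way handshake (illustrative, not production code):</em></p>

```python
import socket
import threading

def echo_once(srv):
    # accept() returns only after the kernel has completed the
    # SYN / SYN-ACK / ACK handshake with the client.
    conn, _ = srv.accept()
    conn.sendall(conn.recv(1024))  # echo the payload back
    conn.close()

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
srv.listen(1)
port = srv.getsockname()[1]
t = threading.Thread(target=echo_once, args=(srv,))
t.start()

# create_connection() blocks until the handshake finishes
cli = socket.create_connection(("127.0.0.1", port))
cli.sendall(b"hello")
reply = cli.recv(1024)
cli.close()
t.join()
srv.close()
print(reply)  # b'hello'
```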

<hr/>

<h2>为什么这样设计？/ Why This Design?</h2>

<p><strong>DNS 分层设计的好处 / Benefits of hierarchical DNS:</strong></p>
<p>- <strong>可扩展性</strong>: 根服务器只有 13 组身份（通过 anycast 复制成上千个实例），却支撑全球数十亿个域名</p>
<p>- <strong>缓存</strong>: 每层都可以缓存，减少重复查询</p>
<p>- <strong>容错</strong>: 多个根服务器，一个挂了其他继续工作</p>

<p><em>Scalability (13 root server identities, replicated worldwide via anycast, handle billions of domains through delegation), caching at every layer, and fault tolerance through redundancy.</em></p>

<hr/>

<h2>别踩这个坑 / Don&#x27;t Fall Into This Trap</h2>

<p><strong>坑1: DNS 缓存污染面试题</strong></p>
<p>面试问：&quot;为什么我改了 DNS 记录，但用户还是访问旧服务器？&quot;</p>
<p>答：TTL（Time To Live）没过期。DNS 记录有缓存时间，改了之后要等 TTL 归零才会全面生效。<strong>上线前提前降低 TTL！</strong></p>

<p><em>DNS cache: after changing DNS records, users still hit old servers until TTL expires. Best practice: lower TTL hours before a migration.</em></p>
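<p>&quot;提前降低 TTL&quot; 要提前多久？一个简化的算式（示意性质，忽略各级解析器的实现差异）：缓存里的旧记录带着旧 TTL，所以至少要提前一个旧 TTL 窗口。</p>
<p><em>How early should you lower the TTL? A simplified calculation (ignoring resolver implementation differences): cached records carry the OLD TTL, so lower it at least one old-TTL window before the cutover.</em></p>

```python
def safe_ttl_lowering_time(old_ttl_s, migration_time_s):
    # Caches hold the OLD record (with the old TTL) until it expires,
    # so lower the TTL at least one old-TTL window before the migration:
    # by then every cache has refetched the record with the low TTL.
    return migration_time_s - old_ttl_s

# TTL is 3600s, migration at t=10000s: lower the TTL no later than t=6400s
print(safe_ttl_lowering_time(3600, 10000))  # 6400
```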

<p><strong>坑2: TCP 不等于安全</strong></p>
<p>TCP 保证送达，但不加密。<code>http://</code> 用 TCP，但数据是明文。需要加密要用 TLS（即 <code>https://</code>）。</p>

<p><em>TCP guarantees delivery, not security. HTTP over TCP is plaintext. TLS (HTTPS) is needed for encryption.</em></p>

<hr/>

<h2>关键要点 / Key Takeaways</h2>

<p>1. <strong>DNS</strong> = 域名 → IP 的翻译，分层设计，有缓存</p>
<p>2. <strong>IPv4</strong> 快用完了，<strong>IPv6</strong> 是未来</p>
<p>3. <strong>TCP</strong> = 可靠但慢（文件、网页）；<strong>UDP</strong> = 快但不可靠（直播、游戏）</p>
<p>4. 三次握手确保双向通信可靠建立</p>

<p><em>DNS translates domains to IPs with hierarchical caching. IPv4 is nearly exhausted, IPv6 is the future. TCP = reliable but slower; UDP = fast but lossy. 3-way handshake ensures both ends can send and receive.</em></p>

<hr/>
<p><em>Day 2 of 100 | #ByteByByte | 系统设计基础系列</em></p>

<hr/>
<h1>💻 Algorithms</h1>
<h2>💻 算法 Day 2 / Algorithms Day 2</h2>
<p><strong>#242 Valid Anagram（有效的字母异位词）— Easy | Pattern: Arrays &amp; Hashing</strong></p>

<hr/>

<h2>生活类比 / Real-World Analogy</h2>

<p>想象你有两袋相同字母的乐高积木。不管你怎么排列，只要袋子里的积木种类和数量完全一样，就是&quot;字母异位词&quot;。我们要做的，就是<strong>数清楚每个袋子里有什么积木</strong>。</p>

<p><em>Imagine two bags of Lego pieces. As long as both bags contain the exact same types and counts of pieces — no matter how they&#x27;re arranged — they&#x27;re anagrams. Our job: count the pieces in each bag and compare.</em></p>

<hr/>

<h2>题目 / Problem Statement</h2>

<p><strong>中文:</strong> 给定两个字符串 <code>s</code> 和 <code>t</code>，判断 <code>t</code> 是否是 <code>s</code> 的字母异位词（即用完全相同的字母，重新排列而成）。</p>

<p><strong>English:</strong> Given two strings <code>s</code> and <code>t</code>, return <code>true</code> if <code>t</code> is an anagram of <code>s</code>, and <code>false</code> otherwise. An anagram uses the same characters with the same frequencies.</p>

<pre><code>
Input: s = &quot;anagram&quot;, t = &quot;nagaram&quot;   → Output: True
Input: s = &quot;rat&quot;,     t = &quot;car&quot;       → Output: False
</code></pre>

<hr/>

<h2>解题思路 / Step-by-Step Walkthrough</h2>

<p><strong>核心想法 / Core Idea:</strong></p>
<p>字母异位词 = 每个字符出现的次数完全相同。</p>
<p><em>Anagrams have identical character frequency distributions.</em></p>

<p><strong>方法: 哈希表计数 / Method: Hash Map Counting</strong></p>

<p>用一个字典，遍历 <code>s</code> 时 +1，遍历 <code>t</code> 时 -1。最后所有值都是 0 → 是异位词。</p>
<p><em>Use one dictionary: +1 for each char in <code>s</code>, -1 for each char in <code>t</code>. If all values are 0 → anagram.</em></p>

<h3>具体追踪 / Concrete Trace</h3>

<p><code>s = &quot;anagram&quot;</code>, <code>t = &quot;nagaram&quot;</code></p>

<p><strong>遍历 s (+1):</strong></p>
<pre><code>
&#x27;a&#x27; → count[&#x27;a&#x27;] = 1
&#x27;n&#x27; → count[&#x27;n&#x27;] = 1
&#x27;a&#x27; → count[&#x27;a&#x27;] = 2
&#x27;g&#x27; → count[&#x27;g&#x27;] = 1
&#x27;r&#x27; → count[&#x27;r&#x27;] = 1
&#x27;a&#x27; → count[&#x27;a&#x27;] = 3
&#x27;m&#x27; → count[&#x27;m&#x27;] = 1
</code></pre>
<p>状态 / State: <code>{&#x27;a&#x27;:3, &#x27;n&#x27;:1, &#x27;g&#x27;:1, &#x27;r&#x27;:1, &#x27;m&#x27;:1}</code></p>

<p><strong>遍历 t (-1):</strong></p>
<pre><code>
&#x27;n&#x27; → count[&#x27;n&#x27;] = 0
&#x27;a&#x27; → count[&#x27;a&#x27;] = 2
&#x27;g&#x27; → count[&#x27;g&#x27;] = 0
&#x27;a&#x27; → count[&#x27;a&#x27;] = 1
&#x27;r&#x27; → count[&#x27;r&#x27;] = 0
&#x27;a&#x27; → count[&#x27;a&#x27;] = 0
&#x27;m&#x27; → count[&#x27;m&#x27;] = 0
</code></pre>
<p>状态 / State: <code>{&#x27;a&#x27;:0, &#x27;n&#x27;:0, &#x27;g&#x27;:0, &#x27;r&#x27;:0, &#x27;m&#x27;:0}</code></p>

<p><strong>所有值为 0 → True ✅</strong></p>

<hr/>

<h2>Python 解法 / Python Solution</h2>

<pre><code>
from collections import defaultdict

def isAnagram(s: str, t: str) -&gt; bool:
    # Quick check: different lengths can&#x27;t be anagrams
    # 长度不同直接排除
    if len(s) != len(t):
        return False
    
    # Count character frequencies
    # 统计每个字符出现的频率
    count = defaultdict(int)
    
    # +1 for every char in s
    for char in s:
        count[char] += 1
    
    # -1 for every char in t
    for char in t:
        count[char] -= 1
    
    # If all zeros, they have the same characters
    # 所有计数为零，说明字符完全匹配
    return all(v == 0 for v in count.values())


# 更 Pythonic 的写法 / More Pythonic version:
from collections import Counter

def isAnagram_v2(s: str, t: str) -&gt; bool:
    return Counter(s) == Counter(t)
</code></pre>

<hr/>

<h2>复杂度分析 / Complexity Analysis</h2>

<p>| | 复杂度 / Complexity | 说明 / Explanation |</p>
<p>|---|---|---|</p>
<p>| 时间 / Time | O(n) | n = len(s)，遍历两次 / two passes |</p>
<p>| 空间 / Space | O(k) | k = 字符集大小，最多26个字母 / at most 26 letters |</p>

<p><strong>为什么不排序？/ Why not sort?</strong></p>
<p>排序是 O(n log n)，哈希表是 O(n)，更快。面试时主动提出这个对比能加分。</p>
<p><em>Sorting is O(n log n) vs O(n) for hash map. Always worth mentioning this tradeoff in interviews.</em></p>

<hr/>

<h2>边界情况 / Edge Cases</h2>

<pre><code>
isAnagram(&quot;a&quot;, &quot;a&quot;)     # True  — single char match
isAnagram(&quot;a&quot;, &quot;b&quot;)     # False — single char mismatch
isAnagram(&quot;&quot;, &quot;&quot;)       # True  — both empty (Counter({}) == Counter({}))
isAnagram(&quot;ab&quot;, &quot;a&quot;)    # False — length check catches this early
isAnagram(&quot;aa&quot;, &quot;bb&quot;)   # False — same length, different chars
</code></pre>

<hr/>

<h2>举一反三 / Pattern Recognition</h2>

<p>这道题的核心模式：<strong>用哈希表统计频率，再比较</strong>。以下题目用同一个模式：</p>

<p><em>Core pattern: <strong>use a hash map to count frequencies, then compare</strong>. Same pattern appears in:</em></p>

<p>| 题目 / Problem | 变化 / Twist |</p>
<p>|---|---|</p>
<p>| #49 Group Anagrams | 把所有互为异位词的字符串分组 / group all anagrams together |</p>
<p>| #438 Find All Anagrams in a String | 滑动窗口找所有异位词位置 / sliding window to find positions |</p>
<p>| #383 Ransom Note | 一个字符串能否由另一个构成 / can s be built from t&#x27;s chars |</p>

<p><strong>进阶思考 / Follow-up:</strong></p>
<p>如果字符串包含 Unicode（中文、emoji）怎么办？用 <code>Counter</code> 依然可行，因为它对任何 hashable 的字符都有效。</p>
<p><em>What if strings contain Unicode (Chinese, emoji)? <code>Counter</code> still works — it handles any hashable character.</em></p>
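<p>这一点可以直接验证（小演示）：</p>
<p><em>Easy to verify directly (small demo):</em></p>

```python
from collections import Counter

# Counter compares per-character counts, and Python iterates strings
# by Unicode code point, so CJK text works out of the box.
print(Counter("silent") == Counter("listen"))  # True
print(Counter("你好吗") == Counter("吗好你"))    # True
```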

<hr/>
<p><em>Day 2 of 100 | #ByteByByte | Arrays &amp; Hashing 系列</em></p>

<hr/>
<h1>🗣️ Soft Skills</h1>
<h2>🗣️ 软技能 Day 2 / Soft Skills Day 2</h2>
<p><strong>Topic: 没有直接权力如何影响他人 / Influencing Without Authority</strong></p>

<hr/>

<h2>为什么这很重要 / Why This Matters</h2>

<p>在大公司里，真正的工作不是独自完成任务，而是让<strong>别人</strong>帮你完成任务——在你没有权力命令他们的情况下。这是 Senior 和 Staff 工程师的核心技能，也是最常被问到的行为面试题之一。</p>

<p><em>In big tech, the real job isn&#x27;t doing work alone — it&#x27;s getting <strong>others</strong> to do work without being able to order them. This is the core skill separating senior from staff engineers, and one of the most common behavioral interview questions.</em></p>

<hr/>

<h2>经典面试题 / Classic Question</h2>

<p>&gt; &quot;描述一次你需要在没有直接管理权限的情况下影响他人的经历。&quot;</p>
<p>&gt; <em>&quot;Describe a situation where you had to influence others without having direct authority.&quot;</em></p>

<hr/>

<h2>STAR 框架拆解 / STAR Framework Breakdown</h2>

<p><strong>S — Situation（情境）</strong></p>
<p>设定背景：你在哪个团队，影响的是谁（另一个团队、高级别同事、跨职能合作者），以及为什么他们没有义务听你的。</p>

<p><em>Set the scene: which team, who you needed to influence (cross-team, senior colleague, partner org), and crucially — why they had no obligation to listen to you.</em></p>

<p><strong>T — Task（任务）</strong></p>
<p>你的目标是什么？为什么这件事很重要？影响失败会有什么后果？</p>

<p><em>What was your goal? Why did it matter? What was at stake if influence failed?</em></p>

<p><strong>A — Action（行动）</strong> ← 这是重点 / This is where you shine</p>

<p>Senior 级别要展示的不是&quot;我很能说服人&quot;，而是<strong>系统性的影响力策略</strong>：</p>

<p>1. <strong>理解对方的激励机制</strong> — 他们关心什么？什么对他们有利？</p>
<p>2. <strong>用数据说话</strong> — 不是&quot;我觉得应该这样&quot;，而是&quot;数据显示这会影响 X% 用户&quot;</p>
<p>3. <strong>建立盟友</strong> — 先和愿意接受的人对齐，再扩大影响范围</p>
<p>4. <strong>给对方一个赢的理由</strong> — frame 成对他们也有好处，不是你求他们帮忙</p>

<p><em>Don&#x27;t just say &quot;I convinced them.&quot; Show a systematic influence strategy: understand their incentives, use data, build allies, frame it as their win too.</em></p>

<p><strong>R — Result（结果）</strong></p>
<p>量化结果。不只是&quot;成功了&quot;，而是&quot;减少了 X 毫秒延迟&quot;、&quot;帮助团队提前 2 周上线&quot;。</p>

<p><em>Quantify outcomes. Not &quot;it worked&quot; but &quot;reduced P99 latency by 40ms&quot; or &quot;helped the team ship 2 weeks early.&quot;</em></p>

<hr/>

<h2>❌ 坏回答 vs ✅ 好回答 / Bad vs Good Answer</h2>

<p><strong>❌ 坏回答：</strong></p>
<p>&gt; &quot;我们的设计文档需要安全团队 review，但他们很忙不想做。我就发了很多邮件催他们，最后他们 review 了。&quot;</p>

<p>问题：被动、无策略、显示影响力只是&quot;坚持催&quot;，没有展示任何高级技能。</p>

<p><em>Bad: &quot;The security team was busy, so I kept emailing until they reviewed it.&quot; — Passive, no strategy, just persistence.</em></p>

<hr/>

<p><strong>✅ 好回答：</strong></p>
<p>&gt; &quot;我需要安全团队 review 一个涉及用户数据的新功能，但他们的 Q4 排期已经满了，而我们的 launch date 是固定的。我没有直接汇报关系。</p>
<p>&gt;</p>
<p>&gt; 我先花时间了解了安全团队的 OKR——他们那季度的目标之一是&#x27;减少高风险数据暴露事件&#x27;。我把这个功能的 review 包装成他们达成 OKR 的机会，而不是额外负担。</p>
<p>&gt;</p>
<p>&gt; 我准备了一份 1 页的风险摘要，聚焦于如果不 review 可能的合规风险，并提议缩小 review 范围（只看数据流部分，而不是整个 PR）来降低他们的时间成本。</p>
<p>&gt;</p>
<p>&gt; 同时，我找到了一位之前和安全团队合作过的 Staff Engineer，请他帮我引荐，建立了初始信任。</p>
<p>&gt;</p>
<p>&gt; 最终安全团队在 3 天内完成了 review，我们按时上线，功能还因为 review 过程发现并修复了一个边界条件。&quot;</p>

<p><em>Good: Understood their OKRs, reframed as their win, reduced their cost, used a warm introduction to build trust. Showed systematic strategy.</em></p>

<hr/>

<h2>场景模板 / Scenario Template to Adapt</h2>

<pre><code>
情境: 我需要 [另一个团队/高级工程师/PM] 在 [时间节点] 前完成 [X]，
     但他们没有义务优先处理我的需求。

策略:
  1. 我研究了他们的 [优先级/OKR/痛点]
  2. 我将需求包装成对他们的 [利益/风险规避/认可机会]
  3. 我降低了他们的参与成本，通过 [缩小范围/提供草稿/async 方式]
  4. 我通过 [共同认识的人/过去的合作] 建立了信任基础

结果: [量化的结果]
</code></pre>

<hr/>

<h2>Senior/Staff 级别加分项 / Senior/Staff Level Tips</h2>

<p>1. <strong>展示系统性思维，而不是一次性技巧。</strong> Staff 工程师影响的不是一个人，而是建立了一套让他人自愿对齐的系统（写 RFC、建立 review 文化、设计 API 让正确做法成为默认）。</p>

<p>2. <strong>提到失败的尝试。</strong> &quot;我第一次直接发需求，他们无视了。然后我调整策略…&quot; — 这比一帆风顺更真实，也更展示学习能力。</p>

<p>3. <strong>区分说服和操纵。</strong> 好的影响力是基于真实利益对齐，不是包装欺骗。面试官会探究&quot;你是如何确保这对他们也真的有好处的？&quot;</p>

<p><em>Show systematic thinking not one-off tricks. Mention what didn&#x27;t work first — shows learning. Distinguish persuasion (real alignment) from manipulation (packaging deception).</em></p>

<hr/>

<h2>关键要点 / Key Takeaways</h2>

<p>1. <strong>理解对方的激励，而不是假设他们应该配合你</strong></p>
<p>2. <strong>用数据和风险框架，而不是个人请求</strong></p>
<p>3. <strong>降低对方的参与成本</strong> — 帮他们更容易说&quot;好&quot;</p>
<p>4. <strong>量化结果</strong> — &quot;成功了&quot;不够，需要具体数字</p>

<p><em>Understand their incentives. Use data/risk framing, not personal favors. Lower their cost to say yes. Always quantify results.</em></p>

<hr/>
<p><em>Day 2 of 100 | #ByteByByte | 行为面试系列</em></p>

<hr/>
<h1>🎨 Frontend</h1>
<h2>🎨 前端 Day 2 / Frontend Day 2</h2>
<p><strong>Topic: Flexbox — 一维布局的瑞士军刀 / One-Dimensional Layouts Made Easy</strong></p>

<hr/>

<h2>猜猜这段代码输出什么？/ What Does This Code Output?</h2>

<pre><code>
&lt;div class=&quot;container&quot;&gt;
  &lt;div class=&quot;box&quot;&gt;A&lt;/div&gt;
  &lt;div class=&quot;box&quot;&gt;B&lt;/div&gt;
  &lt;div class=&quot;box&quot;&gt;C&lt;/div&gt;
&lt;/div&gt;
</code></pre>

<pre><code>
.container {
  display: flex;
  justify-content: space-between;
  width: 300px;
}
.box {
  width: 80px;
  height: 80px;
  background: steelblue;
}
</code></pre>

<p><strong>你的猜测 / Your guess:</strong></p>
<p>A) 三个方块左对齐，紧靠在一起</p>
<p>B) 三个方块均匀分布，A 在最左，C 在最右，B 在中间</p>
<p>C) 三个方块居中显示</p>
<p>D) 报错，因为 80×3=240 &lt; 300</p>

<hr/>

<p><strong>答案: B ✅</strong></p>

<pre><code>
|←─────────── 容器 300px / container 300px ───────────→|
[   A   ]  30px  [   B   ]  30px  [   C   ]
 ←80px→    gap    ←80px→    gap    ←80px→
(A 靠最左，C 靠最右 / A flush left, C flush right；两个间距各 30px / two 30px gaps)
</code></pre>

<p><code>space-between</code> 的含义：第一个元素靠左边，最后一个元素靠右边，中间的间距<strong>平均分配</strong>。</p>
<p><em><code>space-between</code>: first item at start, last item at end, remaining space distributed equally between items.</em></p>

<p>剩余空间 = 300 - 80×3 = 60px，分成 2 份间距 = 每份 <strong>30px</strong>。</p>
<p><em>Remaining space = 300 - 240 = 60px, split into 2 gaps = <strong>30px</strong> each.</em></p>
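<p>上面的 30px 算式可以泛化成一个小函数（仅作演示）：</p>
<p><em>The 30px arithmetic above generalizes to a tiny helper (illustrative only):</em></p>

```python
def space_between_gap(container_width, item_widths):
    # space-between: first item flush left, last flush right,
    # remaining space split into (n - 1) equal gaps
    remaining = container_width - sum(item_widths)
    return remaining / (len(item_widths) - 1)

print(space_between_gap(300, [80, 80, 80]))  # 30.0
```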

<hr/>

<h2>Flexbox 心智模型 / Mental Model</h2>

<p>Flexbox 的核心：一个<strong>主轴（main axis）</strong>和一个<strong>交叉轴（cross axis）</strong>。</p>

<pre><code>
flex-direction: row (默认/default)

主轴 main axis →→→→→→→→→→→→→→→→→→→→→→→
                ┌──────┐  ┌──────┐  ┌──────┐
                │  A   │  │  B   │  │  C   │
                └──────┘  └──────┘  └──────┘
交叉轴 cross axis ↓ (垂直方向/vertical)

flex-direction: column

主轴 main axis ↓  ┌──────┐
               ↓  │  A   │
               ↓  ├──────┤
               ↓  │  B   │
               ↓  ├──────┤
               ↓  │  C   │
                  └──────┘
交叉轴 cross axis → (水平方向/horizontal)
</code></pre>

<hr/>

<h2>核心属性速查 / Key Properties Cheat Sheet</h2>

<h3>父容器属性 / Container Properties</h3>

<pre><code>
.container {
  display: flex;
  
  /* 主轴方向 / Main axis direction */
  flex-direction: row | row-reverse | column | column-reverse;
  
  /* 主轴对齐 / Main axis alignment */
  justify-content: flex-start | flex-end | center 
                 | space-between | space-around | space-evenly;
  
  /* 交叉轴对齐 / Cross axis alignment */
  align-items: stretch | flex-start | flex-end | center | baseline;
  
  /* 换行 / Wrapping */
  flex-wrap: nowrap | wrap | wrap-reverse;
  
  /* gap (现代写法/modern) */
  gap: 16px;  /* 比 margin 更优雅 / cleaner than margin hacks */
}
</code></pre>

<h3>子元素属性 / Item Properties</h3>

<pre><code>
.item {
  /* 伸长比例 / Grow ratio */
  flex-grow: 0;    /* 默认不伸长 / default: don&#x27;t grow */
  flex-grow: 1;    /* 占据剩余空间 / take remaining space */
  
  /* 收缩比例 / Shrink ratio */
  flex-shrink: 1;  /* 默认允许收缩 / default: can shrink */
  flex-shrink: 0;  /* 禁止收缩 / don&#x27;t shrink */
  
  /* 基准尺寸 / Base size */
  flex-basis: auto | 200px | 30%;
  
  /* 简写 / Shorthand */
  flex: 1;        /* = flex-grow: 1, flex-shrink: 1, flex-basis: 0% */
  flex: 0 0 200px; /* = 固定200px，不伸不缩 / fixed 200px */
}
</code></pre>

<hr/>

<h2>你可能不知道 / You Might Not Know (Gotcha!)</h2>

<p><strong><code>flex: 1</code> 和 <code>flex: 1 1 auto</code> 不一样！</strong></p>

<pre><code>
/* flex: 1 → flex-grow:1, flex-shrink:1, flex-basis: 0% */
/* 基准是 0，意思是从 0 开始按比例分配空间 */
/* base is 0: space is distributed purely by ratio */

/* flex: 1 1 auto → flex-grow:1, flex-shrink:1, flex-basis: auto */
/* 基准是内容大小，先按内容分，剩余的再按比例分 */
/* base is content size: content first, then distribute remaining */

.container { display: flex; width: 300px; }
.a { flex: 1; }          /* a 和 b 各得 150px */
.b { flex: 1; }          /* split evenly from 0 */

/* vs */
.a { flex: 1 1 auto; }  /* 内容为 &quot;longer text&quot; → a 会更宽！*/
.b { flex: 1 1 auto; }  /* 内容为 &quot;hi&quot; → a gets more space! */
/* 注：内容来自 HTML；CSS 的 content 属性只作用于伪元素 */
/* Note: content comes from the HTML; CSS `content` only applies to pseudo-elements */
</code></pre>

<p><em><code>flex: 1</code> splits space from zero (equal shares). <code>flex: 1 1 auto</code> splits from content size (content-biased). This trips up many senior devs!</em></p>
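<p>两种写法的最终宽度可以用一个简化模型算出来（忽略 min-content 下限、收缩等细节；100px/30px 是假设的内容宽度）：</p>
<p><em>A simplified model of the final widths (ignoring min-content floors and shrinking; the 100px/30px content widths are assumptions):</em></p>

```python
def resolve_flex_widths(container, items):
    # items: list of (flex_basis, flex_grow) pairs.
    # basis 0 models `flex: 1`; basis = content width models `flex: 1 1 auto`.
    used = sum(basis for basis, grow in items)
    free = container - used
    total_grow = sum(grow for basis, grow in items)
    return [basis + free * grow / total_grow for basis, grow in items]

# flex: 1 — both start from 0, split purely by ratio
print(resolve_flex_widths(300, [(0, 1), (0, 1)]))      # [150.0, 150.0]
# flex: 1 1 auto — assumed content widths 100px ("longer text") and 30px ("hi")
print(resolve_flex_widths(300, [(100, 1), (30, 1)]))   # [185.0, 115.0]
```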

<hr/>

<h2>经典布局示例 / Classic Layout Example</h2>

<p><strong>圣杯布局（Header + Sidebar + Main + Footer）</strong></p>

<pre><code>
/* 用 Flexbox 实现三列布局 / Three-column layout */
.page {
  display: flex;
  flex-direction: column;
  min-height: 100vh;
}

.content-area {
  display: flex;
  flex: 1;  /* 占据剩余高度 / fill remaining height */
}

.sidebar {
  flex: 0 0 240px;  /* 固定宽度，不伸不缩 / fixed width */
}

.main {
  flex: 1;  /* 占据剩余宽度 / take remaining width */
}
</code></pre>

<hr/>

<h2>Mini Challenge 🎯</h2>

<p>用纯 CSS Flexbox（不用 Grid），实现这个布局：</p>

<pre><code>
┌─────────────────────────────┐
│         Header              │
├────────┬────────────────────┤
│Sidebar │   Main Content     │
│(200px) │   (flexible)       │
├────────┴────────────────────┤
│         Footer              │
└─────────────────────────────┘
</code></pre>

<p>侧边栏固定 200px，主内容区自适应，整体高度 100vh。</p>
<p><em>Sidebar fixed at 200px, main content flexible, total height 100vh.</em></p>

<p>答案明天揭晓！/ Answer revealed tomorrow!</p>

<hr/>
<p><em>Day 2 of 100 | #ByteByByte | CSS Fundamentals 系列</em></p>

<hr/>
<h1>🤖 AI</h1>
<h2>🤖 AI Day 2</h2>
<p><strong>Topic: Transformer 是怎么工作的？— &quot;Attention Is All You Need&quot;</strong></p>

<hr/>

<h2>从&quot;翻译&quot;说起 / Start With Translation</h2>

<p>2017年之前，翻译系统用 RNN（循环神经网络）：<strong>逐字读取，逐字生成</strong>。就像一个翻译员，读完一个字才能记下来，再读下一个，记忆有限，长句子容易忘记开头。</p>

<p><em>Before 2017, translation used RNNs: process word by word, like a translator who reads one word at a time with limited working memory. Long sentences = forgotten beginnings.</em></p>

<p>2017年，Google 发了一篇论文：&quot;Attention Is All You Need&quot;。核心思想震惊了整个 AI 界：</p>

<p>&gt; <strong>&quot;你不需要按顺序读句子。你可以一次性看整个句子，然后决定每个词该&#x27;关注&#x27;哪些其他词。&quot;</strong></p>
<p>&gt; <em>&quot;You don&#x27;t need to read sequentially. Look at the whole sentence at once, and let each word &#x27;attend&#x27; to whichever other words are most relevant.&quot;</em></p>

<hr/>

<h2>直觉解释 / Intuitive Explanation</h2>

<p><strong>为什么需要 Attention（注意力机制）？</strong></p>

<p>翻译 &quot;The animal didn&#x27;t cross the street because <strong>it</strong> was too tired.&quot;</p>

<p>&quot;it&quot; 指的是什么？是 &quot;animal&quot; 还是 &quot;street&quot;？</p>

<p>人类一眼就知道是 &quot;animal&quot;（动物会累，街道不会累）。</p>
<p>RNN 在处理 &quot;it&quot; 的时候，离 &quot;animal&quot; 已经太远了，可能已经&quot;忘了&quot;。</p>

<p><strong>Attention 的解法：</strong> 处理 &quot;it&quot; 时，让模型自动&quot;回头看&quot;整个句子，计算 &quot;it&quot; 和每个其他词的相关性分数。</p>

<pre><code>
&quot;it&quot; 与各词的 attention 分数示意 / Attention scores for &quot;it&quot;:
The     → 0.05
animal  → 0.72  ← 高分！/ High score!
didn&#x27;t  → 0.03
cross   → 0.04
the     → 0.02
street  → 0.08
because → 0.03
it      → 0.03
</code></pre>

<p><em>Attention solves the &quot;it&quot; problem: when processing &quot;it&quot;, the model looks back at all words, assigns a relevance score to each, and &quot;pays attention&quot; to &quot;animal&quot; the most.</em></p>

<hr/>

<h2>Transformer 核心机制 / Core Mechanism</h2>

<h3>Self-Attention 三步走 / Three Steps</h3>

<p>每个词（token）会生成三个向量：</p>
<p><em>Each word generates three vectors:</em></p>

<pre><code>
                     ┌─────────────────────────────────────┐
输入词 / Input word  │  Q (Query)  K (Key)   V (Value)     │
&quot;animal&quot;             │  &quot;我想问什么&quot; &quot;我是什么&quot; &quot;我代表什么信息&quot;  │
                     │  &quot;what I ask&quot; &quot;what I am&quot; &quot;my info&quot; │
                     └─────────────────────────────────────┘
</code></pre>

<p><strong>计算过程 / Computation:</strong></p>

<pre><code>
步骤1: Score = Q · K^T / √d_k
       用&quot;查询&quot;和每个词的&quot;键&quot;做点积，得到相关性分数
       Dot product of Query with all Keys → relevance scores

步骤2: Softmax(Score)
       把分数转成概率分布（所有词的权重加起来 = 1）
       Convert scores to probability distribution (weights sum to 1)

步骤3: Output = Σ(weight × V)
       用权重加权所有词的&quot;值&quot;向量，得到最终表示
       Weighted sum of all Value vectors → final representation
</code></pre>

<hr/>

<h2>Transformer 整体架构 / Overall Architecture</h2>

<pre><code>
输入文本 / Input Text: &quot;I love cats&quot;
         │
         ▼
[词嵌入 + 位置编码]
[Token Embedding + Positional Encoding]
  (告诉模型每个词的位置，因为 Attention 本身没有位置感)
  (adds position info, since Attention has no inherent order)
         │
         ▼
┌─────────────────────────────┐
│   Encoder Block × N        │  ← N 层叠加 / N stacked layers
│  ┌──────────────────────┐   │
│  │   Multi-Head         │   │  ← 多个 Attention &quot;头&quot;并行
│  │   Self-Attention     │   │    multiple Attention heads in parallel
│  └──────────┬───────────┘   │
│             │ (+ residual)  │
│  ┌──────────▼───────────┐   │
│  │   Feed Forward       │   │  ← 每个位置独立的 MLP
│  │   Network (FFN)      │   │    per-position MLP
│  └──────────────────────┘   │
└─────────────────────────────┘
         │
         ▼
[丰富的上下文表示 / Rich contextual representation]
</code></pre>

<p><strong>多头注意力 / Multi-Head Attention:</strong> 同时用多个 Attention（比如 8 个），每个关注不同的语义关系（语法、指代、情感等），最后拼接。就像同时从 8 个角度看一张照片。</p>

<p><em>Multi-head: run 8 attention mechanisms in parallel, each learning different relationship types (syntax, coreference, sentiment). Concatenate results. Like viewing a photo from 8 angles simultaneously.</em></p>

<hr/>

<h2>为什么 Transformer 改变了一切？/ Why Did It Change Everything?</h2>

<p>| 特性 / Feature | RNN | Transformer |</p>
<p>|---|---|---|</p>
<p>| 并行计算 / Parallelizable | ❌ 必须顺序 / Must be sequential | ✅ 所有位置同时处理 / All positions at once |</p>
<p>| 长距离依赖 / Long-range | ❌ 容易遗忘 / Forgets | ✅ 直接 Attention / Direct connection |</p>
<p>| 可扩展性 / Scalability | 差 / Poor | 优秀 / Excellent |</p>
<p>| GPU 利用率 / GPU usage | 低 / Low | 高 / High |</p>

<p><strong>这就是 GPT、BERT、Claude、Gemini 的核心。</strong> 这篇 2017 年的论文，启动了整个现代 AI 时代。</p>

<p><em>This is the foundation of GPT, BERT, Claude, Gemini — every modern LLM. The 2017 paper that started the modern AI era.</em></p>

<hr/>

<h2>代码片段：简化版 Attention / Simplified Attention in Code</h2>

<pre><code>
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    &quot;&quot;&quot;
    Q: Query matrix  (seq_len, d_k)
    K: Key matrix    (seq_len, d_k)
    V: Value matrix  (seq_len, d_v)
    &quot;&quot;&quot;
    d_k = Q.shape[-1]
    
    # Step 1: Compute attention scores
    # scores[i][j] = how much position i should attend to position j
    scores = Q @ K.T / np.sqrt(d_k)
    
    # Step 2: Softmax — convert scores to probabilities
    # subtract the row max before exp — the standard numerical-stability trick
    exp_scores = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attention_weights = exp_scores / exp_scores.sum(axis=-1, keepdims=True)
    
    # Step 3: Weighted sum of values
    output = attention_weights @ V
    
    return output, attention_weights

# Example: 3 tokens, d_k=4
np.random.seed(42)
Q = np.random.randn(3, 4)  # 3 tokens asking questions
K = np.random.randn(3, 4)  # 3 tokens presenting keys
V = np.random.randn(3, 4)  # 3 tokens&#x27; actual info

output, weights = scaled_dot_product_attention(Q, K, V)
print(&quot;Attention weights shape:&quot;, weights.shape)  # (3, 3)
# Each row sums to 1.0 — how much each token attends to every other token
</code></pre>

<hr/>

<h2>一句话总结 / One-Liner</h2>

<p>&gt; Transformer = &quot;让每个词直接看所有其他词，用相关性加权求和，并行计算，堆叠多层&quot;</p>
<p>&gt; <em>Transformer = &quot;let every word directly look at all others, weight by relevance, compute in parallel, stack many layers&quot;</em></p>

<hr/>

<h2>延伸阅读 / Going Deeper</h2>

<p>- 原始论文 / Original paper: &quot;Attention Is All You Need&quot; (Vaswani et al., 2017)</p>
<p>- 可视化工具 / Visualization: [The Illustrated Transformer](https://jalammar.github.io/illustrated-transformer/)</p>
<p>- 明天 Day 3 AI 主题预告：BERT vs GPT — 为什么双向比单向更聪明（有时候）</p>

<p><em>Tomorrow Day 3 AI preview: BERT vs GPT — why bidirectional beats unidirectional (sometimes).</em></p>

<hr/>
<p><em>Day 2 of 100 | #ByteByByte | AI Foundations 系列</em></p>
]]></description>
    </item>
    <item>
      <title>byte-by-byte — 2026-03-14</title>
      <link>https://github.com/YushengAuggie/byte-by-byte/tree/main/archive</link>
      <guid isPermaLink="false">https://github.com/YushengAuggie/byte-by-byte/archive/2026-03-14</guid>
      <pubDate>Sat, 14 Mar 2026 12:00:00 +0000</pubDate>
      <description><![CDATA[<h1>🏗️ System Design</h1>
<h2>System Design Day 1 — Client-Server Model &amp; How the Internet Works</h2>
<p><em>Date: 2026-03-14 | Category: Fundamentals | Difficulty: Beginner</em></p>

<hr/>

<p>🏗️ <strong>系统设计 Day 1 / System Design Day 1</strong></p>
<p><strong>客户端-服务器模型 &amp; 互联网是怎么运转的</strong></p>
<p><strong>Client-Server Model &amp; How the Internet Works</strong></p>

<hr/>

<p>想象你在一家餐厅点餐。你（客户端）告诉服务员（网络）你想要什么，厨房（服务器）接到订单后准备好食物，再通过服务员把食物送到你面前。互联网的每一次请求，都是这个流程的数字版本。</p>

<p><em>Imagine you&#x27;re ordering food at a restaurant. You (the client) tell the waiter (the network) what you want, the kitchen (the server) prepares it, and the waiter brings it back. Every internet request follows this exact same flow — digitally.</em></p>

<hr/>

<p><strong>架构图 / Architecture Diagram</strong></p>

<pre><code>
你的浏览器 / Your Browser
        |
        | HTTP Request (GET /index.html)
        v
+-------+--------+
|   DNS Resolver  |   &quot;把 google.com 翻译成 IP 地址&quot;
|  (Phone book)   |   &quot;Translates domain → IP address&quot;
+-------+--------+
        |
        | IP: 142.250.80.46
        v
+-------+--------+
|    Internet     |   路由器、交换机、光缆
|  (The pipes)    |   Routers, switches, fiber cables
+-------+--------+
        |
        v
+-------+--------+
|  Web Server     |   nginx / Apache
|  (The waiter)   |   Receives your request
+-------+--------+
        |
        v
+-------+--------+
| App Server      |   Node.js / Django / Spring
| (The kitchen)   |   Runs your business logic
+-------+--------+
        |
        v
+-------+--------+
|   Database      |   PostgreSQL / MySQL / MongoDB
|  (The pantry)   |   Stores &amp; retrieves data
+-------+--------+
        |
        | HTTP Response (200 OK + HTML)
        v
你的浏览器渲染页面 / Browser renders the page
</code></pre>

<hr/>

<p><strong>关键概念 / Key Concepts</strong></p>

<p><strong>1. IP 地址 — 互联网的门牌号</strong></p>
<p>每台联网设备都有一个 IP 地址，就像你家的门牌号。</p>
<p><code>IPv4: 192.168.1.1</code> (4组数字，已快耗尽)</p>
<p><code>IPv6: 2001:0db8:85a3::8a2e:0370:7334</code> (新标准，几乎无限)</p>

<p><em>Every device on the internet has an IP address — like a postal address for packets.</em></p>

<p><strong>2. DNS — 互联网的电话簿</strong></p>
<p>你记 <code>google.com</code>，但计算机需要 <code>142.250.80.46</code>。DNS 负责翻译。</p>
<p>解析顺序：浏览器缓存 → 系统缓存 → 本地 DNS 服务器 → 根域名服务器</p>

<p><em>You type <code>google.com</code>, DNS translates it to an IP. Without DNS, you&#x27;d memorize numbers for every website.</em></p>

<p><strong>3. HTTP/HTTPS — 请求的语言</strong></p>
<pre><code>
GET  /api/users        → 获取资源 / Fetch resource
POST /api/users        → 创建资源 / Create resource
PUT  /api/users/1      → 更新资源 / Update resource
DELETE /api/users/1    → 删除资源 / Delete resource
</code></pre>
<p>HTTPS = HTTP + TLS 加密。没有 HTTPS，你的数据在网络上是明文。</p>
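<p>四个动词的语义可以用一个玩具式的内存 &quot;users&quot; 资源来体会（<code>handle</code> 等名称纯属示意，并非任何真实框架的 API）：</p>
<p><em>A toy in-memory &quot;users&quot; resource to illustrate the four verbs (names like <code>handle</code> are illustrative, not any real framework's API):</em></p>

```python
# Toy in-memory "users" resource: one function per request,
# returning (status_code, payload) in the spirit of the table above.
users = {}

def handle(method, path, body=None):
    if method == "POST":            # create
        uid = len(users) + 1
        users[uid] = body
        return 201, uid
    uid = int(path.rsplit("/", 1)[-1])
    if method == "GET":             # fetch
        return 200, users[uid]
    if method == "PUT":             # update (replace)
        users[uid] = body
        return 200, uid
    if method == "DELETE":          # delete
        users.pop(uid)
        return 204, None

print(handle("POST", "/api/users", {"name": "Ada"}))  # (201, 1)
print(handle("GET", "/api/users/1"))                  # (200, {'name': 'Ada'})
```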

<p><strong>4. TCP/IP — 可靠传输的保障</strong></p>
<p>TCP 保证数据包完整到达，就像注册邮件（有回执）。</p>
<p>UDP 不保证，但更快，适合视频流、游戏（偶尔丢帧没关系）。</p>

<hr/>

<p><strong>为什么这样设计？/ Why This Design?</strong></p>

<p>客户端-服务器分离的核心好处：</p>

<p>- <strong>可扩展性</strong>：可以独立扩展服务器（加机器），不影响客户端</p>
<p>- <strong>安全性</strong>：数据库不暴露给互联网，只有应用服务器能访问</p>
<p>- <strong>可维护性</strong>：前端、后端、数据库各自独立部署</p>

<p><em>Separation of concerns: clients handle presentation, servers handle logic and data. This lets you scale, secure, and maintain each layer independently.</em></p>

<hr/>

<p><strong>别踩这个坑 / Don&#x27;t Fall Into This Trap</strong></p>

<p>❌ <strong>面试时说「用户点击按钮，数据就存到数据库了」</strong></p>
<p>这跳过了太多层。面试官想听到：</p>
<p>DNS解析 → TCP握手 → HTTP请求 → 负载均衡 → 应用服务器 → 数据库</p>

<p>✅ <strong>学会分层描述系统</strong></p>
<p>每次系统设计，先画出这张图的骨架，再逐层深入。</p>

<p><em>In interviews, never skip layers. &quot;The user clicks a button and data gets saved&quot; misses: DNS, TCP handshake, load balancers, app servers, caching, and database transactions. Walk through every hop.</em></p>

<hr/>

<p><strong>明日预告 / Tomorrow</strong></p>
<p>Day 2 将深入 <strong>负载均衡</strong> — 当一台服务器不够用时，如何优雅地横向扩展。</p>
<p><em>Day 2 covers Load Balancing — what happens when one server isn&#x27;t enough.</em></p>

<hr/>
<h1>💻 Algorithms</h1>
<h2>Algorithms Day 1 — #217 Contains Duplicate</h2>
<p><em>Date: 2026-03-14 | Pattern: Arrays &amp; Hashing | Difficulty: Easy</em></p>

<hr/>

<p>💻 <strong>算法 Day 1 / Algorithms Day 1</strong> — #217 Contains Duplicate (Easy) — Arrays &amp; Hashing</p>

<hr/>

<p><strong>现实类比 / Real-World Analogy</strong></p>

<p>想象你在整理一箱名片。你从盒子里一张一张往外拿，每拿出一张，先看看桌上有没有一样的。如果有，说明你有重复的联系人。这就是「哈希集合」的工作方式——把「已见过的」放在一个快速查找的结构里。</p>

<p><em>Imagine going through a box of business cards. You pull each card out and check if you&#x27;ve already put one on the table. If you find a match, you have a duplicate. That&#x27;s exactly what a hash set does — it gives you O(1) lookup for &quot;have I seen this before?&quot;</em></p>

<hr/>

<p><strong>题目 / Problem Statement</strong></p>

<p>给你一个整数数组 <code>nums</code>，如果其中存在任何重复值，返回 <code>true</code>；否则返回 <code>false</code>。</p>

<p><em>Given an integer array <code>nums</code>, return <code>true</code> if any value appears at least twice, <code>false</code> if every element is distinct.</em></p>

<pre><code>
Input:  [1, 2, 3, 1]   → Output: True  (1 出现了两次)
Input:  [1, 2, 3, 4]   → Output: False (每个数都唯一)
Input:  [1, 1, 1, 3, 3, 4, 3, 2, 4, 2] → Output: True
</code></pre>

<hr/>

<p><strong>逐步分析 / Step-by-Step Walkthrough</strong></p>

<p><strong>方法一：暴力法（别用这个）</strong></p>
<p>双重循环，比较每对元素。Time: O(n²)，Space: O(1)</p>
<p>面试中绝对不要停在这里。</p>

<p><strong>方法二：排序法</strong></p>
<p>排序后相邻元素比较。Time: O(n log n)，Space: O(1)</p>
<p>稍好，但破坏了原数组顺序。</p>
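<p>排序法的示意实现 / A sketch of the sorting approach. Note that <code>sorted()</code> copies the list (O(n) extra space, but the caller's order survives); an in-place <code>nums.sort()</code> gives the O(1) space quoted above at the cost of reordering the input:</p>

```python
def contains_duplicate_sorted(nums: list[int]) -> bool:
    """After sorting, any duplicate values become adjacent. O(n log n) time."""
    ordered = sorted(nums)  # copy; nums itself is left untouched
    return any(a == b for a, b in zip(ordered, ordered[1:]))

print(contains_duplicate_sorted([1, 2, 3, 1]))  # True
print(contains_duplicate_sorted([1, 2, 3, 4]))  # False
```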

<p><strong>方法三：哈希集合（最优解）</strong></p>
<p>维护一个「已见过」的集合，遍历一次搞定。</p>

<pre><code>
nums = [1, 2, 3, 1]
seen = set()

Step 1: num=1  → seen={1}           (新元素，加入)
Step 2: num=2  → seen={1,2}         (新元素，加入)
Step 3: num=3  → seen={1,2,3}       (新元素，加入)
Step 4: num=1  → 1 在 seen 里！→ return True ✓
</code></pre>

<hr/>

<p><strong>Python 解法 / Python Solution</strong></p>

<pre><code>
def containsDuplicate(nums: list[int]) -&gt; bool:
    # Use a hash set for O(1) average-case lookup
    seen = set()
    
    for num in nums:
        if num in seen:
            # Found a duplicate — return immediately (early exit)
            return True
        seen.add(num)
    
    # No duplicates found after full traversal
    return False

# One-liner alternative (Pythonic, but reads the whole list)
# return len(nums) != len(set(nums))
</code></pre>

<p><strong>为什么用 <code>set</code> 而不是 <code>list</code>？</strong></p>
<p><code>x in list</code> → O(n)，要遍历找</p>
<p><code>x in set</code>  → O(1)，哈希直接定位</p>
<p><em><code>list</code> membership check is O(n); <code>set</code> uses hashing for O(1) average lookup. This is the key insight.</em></p>
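<p>A quick way to feel the difference on your own machine (absolute timings vary, but the gap is orders of magnitude):</p>

```python
import timeit

n = 100_000
as_list = list(range(n))
as_set = set(as_list)
missing = -1  # worst case for the list: every element gets scanned

t_list = timeit.timeit(lambda: missing in as_list, number=100)
t_set = timeit.timeit(lambda: missing in as_set, number=100)
print(f"list: {t_list:.4f}s  set: {t_set:.4f}s")
```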

<hr/>

<p><strong>复杂度 / Complexity</strong></p>

<table>
  <tr><th></th><th>暴力法</th><th>排序法</th><th>哈希集合</th></tr>
  <tr><td><strong>Time</strong></td><td>O(n²)</td><td>O(n log n)</td><td><strong>O(n)</strong></td></tr>
  <tr><td><strong>Space</strong></td><td>O(1)</td><td>O(1)</td><td><strong>O(n)</strong></td></tr>
</table>

<p>最优解是时间-空间的经典权衡：用 O(n) 额外空间换取 O(n) 时间。</p>
<p><em>Classic time-space tradeoff: we pay O(n) space to get O(n) time.</em></p>

<hr/>

<p><strong>举一反三 / Pattern Recognition</strong></p>

<p>这道题是「Arrays &amp; Hashing」模式的入口。掌握它，你能解：</p>

<p>1. <strong>#1 Two Sum</strong> — 同样的「已见过」思路，存的是值→索引的映射</p>
<p>2. <strong>#128 Longest Consecutive Sequence</strong> — 先用 set 存所有数，再按规律遍历</p>
<p>3. <strong>#49 Group Anagrams</strong> — 用排序后的字符串作 key，用 dict 分组</p>
<p>4. <strong>#36 Valid Sudoku</strong> — 三个方向各维护一个 set</p>

<p><strong>核心模式</strong>：每次遇到「需要快速判断某元素是否出现过」，第一反应是 <code>set</code>；需要记录「出现次数或位置」，用 <code>dict</code>。</p>

<p><em>The pattern: whenever you need &quot;have I seen this before?&quot;, think <code>set</code>. When you need &quot;how many times / where did I see it?&quot;, think <code>dict</code>. This pattern appears in ~30% of easy/medium array problems.</em></p>
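<p>For instance, the <code>dict</code> variant of the pattern solves #1 Two Sum in a single pass (a sketch):</p>

```python
def two_sum(nums: list[int], target: int) -> list[int]:
    """Map value -> index as we scan; check for the complement first."""
    seen: dict[int, int] = {}
    for i, num in enumerate(nums):
        if target - num in seen:
            return [seen[target - num], i]
        seen[num] = i
    return []  # unreachable when a solution is guaranteed

print(two_sum([2, 7, 11, 15], 9))  # [0, 1]
```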

<hr/>

<p><strong>Mini Challenge 🎯</strong></p>

<p>如果题目改成：找到数组中出现超过 n/2 次的元素（保证存在），怎么做？</p>
<p><em>What if you need to find the element that appears more than n/2 times? (Boyer-Moore Voting Algorithm — hint for tomorrow&#x27;s pattern thinking)</em></p>
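<p>To check your answer later — a sketch of the hinted Boyer-Moore voting algorithm (O(n) time, O(1) space; it relies on the guarantee that a majority element exists):</p>

```python
def majority_element(nums: list[int]) -> int:
    """Pair off each non-candidate vote against the candidate.

    A true majority (> n/2 occurrences) can never be fully cancelled out,
    so whatever candidate survives the scan is the answer.
    """
    candidate, count = nums[0], 0
    for num in nums:
        if count == 0:
            candidate = num
        count += 1 if num == candidate else -1
    return candidate

print(majority_element([2, 2, 1, 1, 1, 2, 2]))  # 2
```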

<hr/>
<h1>🗣️ Soft Skills</h1>
<h2>Soft Skills Day 1 — Decision Making Under Uncertainty</h2>
<p><em>Date: 2026-03-14 | Category: Decision Making | Level: Senior/Staff</em></p>

<hr/>

<p>🗣️ <strong>软技能 Day 1 / Soft Skills Day 1</strong></p>
<p><strong>在信息不完整时做出关键技术决策</strong></p>
<p><strong>Making Critical Technical Decisions with Incomplete Information</strong></p>

<hr/>

<p><strong>为什么这很重要 / Why This Matters</strong></p>

<p>初级工程师等信息齐全再行动。高级工程师知道：信息永远不会完全齐全。</p>

<p>系统随时会挂，竞争对手随时会发布，产品上线时间表不会等你做完全量分析。Senior/Staff 工程师和 L3 工程师最大的差距，不是编码能力，而是在模糊中决断的能力。</p>

<p><em>Junior engineers wait for complete information. Senior engineers know it never arrives. The gap between L3 and Staff isn&#x27;t coding — it&#x27;s the ability to make good decisions under uncertainty and own the outcome.</em></p>

<hr/>

<p><strong>STAR 框架拆解 / STAR Framework</strong></p>

<p><strong>Situation（情境）</strong></p>
<p>描述背景，但要聚焦：有什么压力？为什么信息不完整？</p>
<p>⚠️ 不要花超过 20% 的时间在这里</p>

<p><strong>Task（任务）</strong></p>
<p>你需要做什么决定？有什么约束？时间线？</p>
<p>清楚说明为什么这个决定很难。</p>

<p><strong>Action（行动）</strong> ← 这是重点，占 60-70%</p>
<p>- 你如何快速收集最关键的信息？</p>
<p>- 你评估了哪些方案？</p>
<p>- 你如何在时间压力下做出判断？</p>
<p>- 谁参与了决策，如何达成共识？</p>
<p>- 你如何记录决定和理由（ADR）？</p>

<p><strong>Result（结果）</strong></p>
<p>具体指标。但如果结果不完美，更要说清楚你学到了什么。</p>

<hr/>

<p><strong>❌ 糟糕的回答 / Bad Approach</strong></p>

<p>&gt; &quot;我们的数据库响应变慢了，我研究了一下，最后升级了实例类型，问题解决了。&quot;</p>

<p>问题出在哪里：</p>
<p>- 没有体现「信息不完整」的挑战</p>
<p>- 没有说明评估过的其他方案</p>
<p>- 没有数字</p>
<p>- 听起来是一个人默默解决，没有体现协作</p>
<p>- 面试官不知道你的思维过程</p>

<hr/>

<p><strong>✅ 好的回答结构 / Good Approach</strong></p>

<p>&gt; &quot;2024年Q3，我们的支付服务在高峰期 p99 延迟从 80ms 跳到了 800ms，但我们不知道根因——可能是代码、数据库、还是下游 API。问题是周五下午5点发生的，我们有个重要的 launch 在下周一。&quot;</p>
<p>&gt;</p>
<p>&gt; &quot;我需要在没有完整 tracing 数据的情况下（我们当时监控覆盖率只有60%）决定：是回滚最近的部署、扩容数据库、还是限流？&quot;</p>
<p>&gt;</p>
<p>&gt; &quot;我做了三件事：第一，让团队15分钟内各自排查一个方向，并行收集证据。第二，设定了一个阈值——如果30分钟内找不到根因，就先限流保护系统，再继续排查。第三，在 Slack 里实时记录我们的假设和证据，方便团队同步。&quot;</p>
<p>&gt;</p>
<p>&gt; &quot;结果是我们在22分钟内发现是一个 N+1 查询问题被一次数据迁移触发了。我们加了一个临时索引，延迟降到了 95ms，顺利支撑了周一 launch。事后我们补了完整的 APM tracing。&quot;</p>

<hr/>

<p><strong>场景模板 / Scenario Template</strong></p>

<pre><code>
背景: [系统 X] 在 [时间点] 出现了 [问题/机会]
信息缺口: 我们不知道 [关键未知项]，因为 [原因]
约束: [时间/资源/风险约束]
我的决策框架:
  - 快速信息收集: [做了什么]
  - 方案评估: [A vs B vs C，为什么选A]
  - 风险缓解: [如何降低决策风险]
  - 沟通对齐: [如何同步团队/stakeholders]
结果: [具体数字] + [事后学到的]
</code></pre>

<hr/>

<p><strong>Senior/Staff 加分项 / Level-Up Tips</strong></p>

<p>1. <strong>提到 ADR（架构决策记录）</strong></p>
<p>&quot;我们写了一个 ADR 记录了这个决定和我们当时的信息状态，方便3个月后的人理解为什么这么做。&quot;</p>

<p>2. <strong>主动承认决定的局限性</strong></p>
<p>Staff 级别的工程师不假装自己的决定完美，他们说：&quot;这是基于当时信息的最优解，我们设置了一个检查点在30天后重新评估。&quot;</p>

<p>3. <strong>体现系统性思维</strong></p>
<p>不只解决这次的问题，还要防止下次同类问题发生。</p>

<hr/>

<p><strong>关键要点 / Key Takeaways</strong></p>

<p>- 面试官想看的是你的<strong>思维过程</strong>，不只是结果</p>
<p>- 信息不完整≠瘫痪，要展示你如何<strong>快速收集关键信息</strong></p>
<p>- 好的决定有<strong>明确的理由</strong>，坏的结果有<strong>清晰的复盘</strong></p>
<p>- 量化一切：延迟数字、时间窗口、影响用户数</p>

<p><em>The interviewer wants to see: structured thinking under pressure, ability to make good-enough decisions fast, and ownership of outcomes regardless of result.</em></p>

<hr/>
<h1>🎨 Frontend</h1>
<h2>Frontend Day 1 — CSS Box Model: The Foundation of Layout</h2>
<p><em>Date: 2026-03-14 | Category: CSS Fundamentals | Week: 1</em></p>

<hr/>

<p>🎨 <strong>前端 Day 1 / Frontend Day 1</strong></p>
<p><strong>CSS 盒模型 — 所有布局的起点</strong></p>
<p><strong>CSS Box Model — The Foundation of Layout</strong></p>

<hr/>

<p><strong>猜猜这段代码输出什么？/ What does this code output?</strong></p>

<pre><code>
.box {
  width: 100px;
  padding: 20px;
  border: 5px solid black;
  margin: 10px;
}
</code></pre>

<pre><code>
&lt;div class=&quot;box&quot;&gt;Hello&lt;/div&gt;
</code></pre>

<p>问题：<code>.box</code> 在页面上占多少宽度？</p>
<p><em>Question: How wide does <code>.box</code> actually appear on screen?</em></p>

<p>A) 100px</p>
<p>B) 150px</p>
<p>C) 160px</p>
<p>D) 170px</p>

<p>答案是 <strong>B) 150px</strong> — 但等等，很多人会猜 A！</p>
<p><em>Most people guess A. The answer is B — here&#x27;s why.</em></p>

<hr/>

<p><strong>盒模型可视化 / Box Model Visualization</strong></p>

<pre><code>
+------------------------------------------+
|               margin: 10px               |
|  +------------------------------------+  |
|  |            border: 5px             |  |
|  |  +------------------------------+  |  |
|  |  |        padding: 20px         |  |  |
|  |  |  +------------------------+  |  |  |
|  |  |  |  content: 100px wide   |  |  |  |
|  |  |  |        &quot;Hello&quot;         |  |  |  |
|  |  |  +------------------------+  |  |  |
|  |  +------------------------------+  |  |
|  +------------------------------------+  |
+------------------------------------------+

实际渲染宽度 / Rendered width:
100 (content) + 20*2 (padding) + 5*2 (border) = 150px
注意：margin 不计入元素宽度，但影响占位空间
</code></pre>

<p>默认的 <code>box-sizing: content-box</code> 意味着 <code>width</code> 只是内容区域的宽度。</p>
<p>Padding 和 border 会叠加在外面，让元素比你想的更大。</p>

<hr/>

<p><strong>解决方案：box-sizing: border-box</strong></p>

<pre><code>
/* 现代 CSS 的最佳实践 / Modern CSS best practice */
*, *::before, *::after {
  box-sizing: border-box;
}

.box {
  width: 100px;   /* Now this IS the final rendered width */
  padding: 20px;
  border: 5px solid black;
  /* Content area auto-shrinks to: 100 - 40 - 10 = 50px */
}
</code></pre>

<p>用了 <code>border-box</code> 后，<code>width: 100px</code> 就真的是 100px，padding 和 border 都&quot;向内压缩&quot;。这是几乎所有现代 CSS 框架（Bootstrap、Tailwind）默认使用的设置。</p>

<p><em>With <code>border-box</code>, <code>width</code> means what you think it means. Padding and border carve inward. This is why every modern CSS framework resets box-sizing globally.</em></p>
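<p>两种盒模型的区别可以归结为简单算术 / The two sizing models reduce to simple arithmetic. A sketch (the helper names are illustrative, not a real CSS API):</p>

```python
def rendered_width(width: int, padding: int, border: int,
                   box_sizing: str = "content-box") -> int:
    """Horizontal space the element's border box occupies (margin excluded)."""
    if box_sizing == "content-box":
        return width + 2 * padding + 2 * border  # padding/border add outward
    return width  # border-box: width already includes padding and border

def content_width(width: int, padding: int, border: int,
                  box_sizing: str = "content-box") -> int:
    """Width of the content area alone."""
    if box_sizing == "border-box":
        return width - 2 * padding - 2 * border  # padding/border carve inward
    return width

print(rendered_width(100, 20, 5))               # 150
print(content_width(100, 20, 5, "border-box"))  # 50
```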

<hr/>

<p><strong>你可能不知道 / You Might Not Know</strong></p>

<p><strong>Gotcha #1: margin 不是元素的一部分</strong></p>
<p><code>margin</code> 是元素与其他元素之间的空白，不影响元素本身的宽度，但影响布局空间。用 <code>background-color</code> 你会发现 background 不延伸到 margin 里。</p>

<p><strong>Gotcha #2: Margin Collapse（外边距折叠）</strong></p>
<pre><code>
.top    { margin-bottom: 20px; }
.bottom { margin-top: 30px; }
</code></pre>
<p>两个块级元素垂直相邻，你以为间距是 50px，实际是 <strong>30px</strong>（取较大值）。</p>
<p>水平方向的 margin 不会折叠，只有垂直方向才有这个&quot;惊喜&quot;。</p>

<p><em>Vertical margins between block elements collapse to the larger value. Horizontal margins never collapse. This trips up every developer at least once.</em></p>

<p><strong>Gotcha #3: inline 元素的 padding/margin 行为不同</strong></p>
<p><code>&lt;span&gt;</code> 这类行内元素，设置 <code>padding-top/bottom</code> 和 <code>margin-top/bottom</code> 不会影响行高，效果跟你想的不一样。要控制高度，先把它变成 <code>inline-block</code>。</p>

<hr/>

<p><strong>Mini Challenge 🎯</strong></p>

<p>不用打开浏览器，算出这个元素的实际渲染宽度：</p>

<pre><code>
.card {
  box-sizing: border-box;
  width: 300px;
  padding: 16px;
  border: 2px solid #eee;
  margin: 24px auto;
}
</code></pre>

<p>渲染宽度是多少？内容区域宽度是多少？</p>
<p><em>What&#x27;s the rendered width? What&#x27;s the content area width?</em></p>

<p>答案下方揭晓：</p>
<p>- 渲染宽度：<strong>300px</strong>（因为 <code>border-box</code>）</p>
<p>- 内容区域：300 - 32 - 4 = <strong>264px</strong></p>

<hr/>
<h1>🤖 AI</h1>
<h2>AI Day 1 — AI News Roundup</h2>
<p><em>Date: 2026-03-14 | Mode: NEWS</em></p>

<hr/>

<p>🤖 <strong>AI Day 1 — 本周 AI 大事件 / This Week in AI</strong></p>

<hr/>

<p><strong>📰 Story 1: Claude 1M Context 全面开放</strong></p>
<p><strong>Claude&#x27;s 1M Token Context Window Goes Generally Available</strong></p>

<p>Anthropic 昨天（3月13日）宣布，Claude Opus 4.6 和 Sonnet 4.6 的 100万 token 上下文窗口正式 GA，并且<strong>不收长上下文溢价</strong>——无论你发送 9K 还是 900K token，每 token 定价相同（Opus: $5/$25/M，Sonnet: $3/$15/M）。同时单次请求可以包含最多 600 张图片或 PDF 页面（之前是 100）。</p>

<p><em>Anthropic made the 1M context window for Claude Opus 4.6 &amp; Sonnet 4.6 generally available on March 13 with no long-context premium. Same per-token price whether you send 9K or 900K tokens. Media limits expanded 6x to 600 images/PDFs per request.</em></p>

<p><strong>为什么你应该关心 / Why You Should Care:</strong></p>
<p>对于需要处理整个代码库、大型合同文档、或长时间 agent 运行的工程师来说，这是实质性突破。之前 200K 以上需要特殊 beta header，现在自动生效。Claude Code 的 Max/Team/Enterprise 用户也自动获得 1M 上下文，减少 compaction 中断——Anthropic 表示这让 compaction 事件减少了 15%。</p>

<p><em>For engineers working with large codebases, legal documents, or long-running agents: this eliminates forced context compression. A 1M context means you can load an entire enterprise codebase and reason across it without losing track of earlier decisions.</em></p>

<hr/>

<p><strong>📰 Story 2: 「会思考的 AI」正在改变代码审查</strong></p>
<p><strong>AI Agents Are Changing Code Review</strong></p>

<p>越来越多的团队开始用 AI agent（Claude Code、Devin、Copilot Workspace）做第一轮代码审查。一个真实案例：某公司把整个 diff 喂给 Opus 4.6 的 1M 上下文，拿到比分块处理高质量得多的跨文件依赖分析。</p>

<p><em>Teams are deploying AI agents as first-pass code reviewers. With 1M context, agents can ingest full diffs and reason about cross-file dependencies that chunking strategies miss.</em></p>

<p><strong>为什么你应该关心 / Why You Should Care:</strong></p>
<p>这不是说 AI 要取代 code review，而是说 AI 能处理「检查你有没有更新所有调用者」、「这个改动和3个文件之外的逻辑一致吗」这类枯燥但重要的检查，让人类 reviewer 聚焦在架构和意图层面。</p>

<p><em>AI handles the mechanical review (did you update all callers? is this consistent with the contract 3 files away?). Humans focus on intent and architecture. Your job as a reviewer is evolving.</em></p>

<hr/>

<p><strong>📰 Story 3: 多模态 AI 的「看懂图纸」能力</strong></p>
<p><strong>Multimodal AI Learning to Read Engineering Diagrams</strong></p>

<p>前沿模型处理架构图、电路图、数学公式的能力在过去一年显著提升。工程师们开始用它来：解读遗留系统的手绘架构图、分析竞争对手产品的硬件拆解照片、把 Figma 截图直接转成 React 组件。</p>

<p><em>Frontier models have dramatically improved at reading architecture diagrams, circuit schematics, and hand-drawn flowcharts. Engineers are using this to digitize legacy documentation and convert design screenshots to code.</em></p>

<p><strong>为什么你应该关心 / Why You Should Care:</strong></p>
<p>如果你的团队还有一堆「只有某个老员工看得懂」的架构图、Confluence 里的白板照片，现在是时候让 AI 把这些知识结构化了。技术债不只是代码债，文档债也是。</p>

<p><em>The &quot;tribal knowledge&quot; locked in whiteboard photos and napkin sketches can now be extracted. Teams that act on this will onboard engineers faster and reduce key-person risk.</em></p>

<hr/>

<p><strong>📰 Story 4: Vibe Coding 的隐藏成本</strong></p>
<p><strong>The Hidden Costs of Vibe Coding</strong></p>

<p>「Vibe coding」（完全让 AI 写代码，自己不看细节）在 Twitter/X 上很流行，但越来越多的工程师报告了真实代价：安全漏洞（AI 生成了但没人审查）、架构债（快速生成的代码结构混乱）、以及最麻烦的——你不理解自己系统的工作原理，无法 debug。</p>

<p><em>&quot;Vibe coding&quot; — prompting AI to build entire features without reviewing the output — is creating a new class of tech debt: security vulnerabilities nobody audited, architectural chaos from unreviewed code, and engineers who can&#x27;t debug systems they nominally wrote.</em></p>

<p><strong>为什么你应该关心 / Why You Should Care:</strong></p>
<p>AI 是加速器，不是替代品。最有效的工程师是「AI 辅助」而不是「AI 依赖」。理解你代码中每一个关键决策，即使是 AI 建议的，是职业生涯的护城河。</p>

<p><em>The engineers who thrive long-term use AI as an accelerator, not a replacement for understanding. Owning your code means being able to explain every key decision — even if AI suggested it.</em></p>

<hr/>

<p><strong>本周一句话总结 / One-Line Summary</strong></p>

<p>上下文窗口越来越大，AI 能「记住」的越来越多——但你需要理解的也越来越多。工具在进化，思维方式也得跟上。</p>

<p><em>Context windows are expanding, AI can remember more — but so does your responsibility to understand what it&#x27;s doing. The tools are evolving; so must the mindset.</em></p>
]]></description>
    </item>
  </channel>
</rss>
