🏗️ 系统设计 Day 9 / System Design Day 9

主题 / Topic: 数据库复制与分片 / Database Replication & Sharding


🌏 真实场景 / Real-World Scenario

想象你在设计一个像微信读书或 Goodreads 的阅读应用——用户突破 5000 万,每天产生几亿条阅读记录、笔记和评论。单一数据库服务器已经撑不住了:写操作堵住读操作,单点故障导致整个 App 不可用,数据量超出单机磁盘上限。

你需要两把利器:复制(Replication)解决可用性和读性能,分片(Sharding)解决写性能和存储规模。

Imagine you're designing a reading app like Goodreads at 50M users, with hundreds of millions of reading records daily. A single database server buckles under load. You need Replication for availability & read scale, and Sharding for write scale & storage capacity.


🏛️ 架构图 / Architecture Diagram

┌─────────────────────────────────────────────────────────┐
│                 应用服务层 / App Layer                   │
│   [API Server 1]   [API Server 2]   [API Server 3]      │
└─────────┬──────────────────────────┬────────────────────┘
          │ Writes                   │ Reads
          ▼                          ▼
┌─────────────────┐        ┌────────────────────────┐
│   Primary DB    │──────► │    Read Replica 1      │
│ (Leader/Master) │──────► │    Read Replica 2      │
│                 │──────► │    Read Replica 3      │
└────────┬────────┘        └────────────────────────┘
         │ Replication Log (WAL / Binlog)
         ▼
[After Replication → Add Sharding]

┌────────────────────────────────────────────────────┐
│              Shard Router / Proxy                  │
│        (e.g. Vitess, ProxySQL, PgBouncer)          │
└────┬──────────────────┬───────────────────┬────────┘
     │                  │                   │
     ▼                  ▼                   ▼
┌──────────┐      ┌──────────┐      ┌──────────┐
│ Shard 0  │      │ Shard 1  │      │ Shard 2  │
│user 0-33M│      │user33-66M│      │user 66M+ │
│+Replicas │      │+Replicas │      │+Replicas │
└──────────┘      └──────────┘      └──────────┘

⚖️ 关键权衡 / Key Tradeoffs

复制 / Replication

方案 | 优点 | 缺点
同步复制 | 强一致性,不丢数据 | 写延迟高(等所有副本确认)
异步复制 | 写延迟低,吞吐高 | 副本可能有延迟(replication lag)
半同步 | 折中:至少 1 个副本确认 | 稍高写延迟,部分一致性

为什么这样设计?

  • 读多写少的业务(如阅读记录):异步复制 + 多读副本,读吞吐可水平扩展
  • 金融、支付场景:同步复制或 Raft/Paxos 保证强一致

分片 / Sharding

策略 | 原理 | 适合场景
Range Sharding | 按 user_id 范围切分 | 范围查询友好,但热点风险高
Hash Sharding | shard = hash(user_id) % N | 均匀分布,但范围查询跨 shard
Directory Sharding | 查表确定归属 shard | 灵活,但查表本身是瓶颈
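上表中 Hash Sharding 的路由规则可以用几行代码示意(仅为最小 sketch:`shard_for` 是示意用的函数名,3 个 shard 的设定对应上图的 Shard 0-2;生产路由通常由 Vitess 这类中间件完成)。/ The Hash Sharding rule from the table, as a minimal sketch (`shard_for` and the 3-shard setup are illustrative assumptions):

```python
import hashlib

NUM_SHARDS = 3  # assumption: 3 shards, matching the diagram above

def shard_for(user_id: int) -> int:
    """Hash Sharding: map a user_id to a shard index.

    Uses a stable digest instead of Python's built-in hash(),
    whose output for str is randomized per process.
    """
    digest = hashlib.md5(str(user_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

# Same user always routes to the same shard; different users spread out.
assert shard_for(12345) == shard_for(12345)
```

注意:朴素的 `% N` 意味着改变 N(扩容)会让绝大多数 key 重新洗牌,这正是生产中需要一致性哈希或专门的 resharding 工具的原因。/ Note: plain modulo reshuffles most keys when N changes, which is why production systems lean on consistent hashing or dedicated resharding tooling.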

🚫 常见坑 / Common Mistakes

坑 1:过早分片

分片大幅增加系统复杂度。复制 + 读副本能抗住大多数流量,先用它,真正撑不住再分片。

坑 2:选错 Shard Key

按时间分片会导致最新 shard 永远是热点(写都打到最新月份)。按用户 ID hash 分片更均匀。

坑 3:跨 Shard 事务

分布式事务极复杂。设计 schema 时尽量让同一用户的数据在同一 shard,避免跨 shard join。

坑 4:忽略 Replication Lag

用户刚发评论,立刻刷新却看不到——因为读副本还没同步。对强一致性操作,读 Primary 或使用 read-your-writes 路由。
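这里说的 read-your-writes 路由,可以用"最近写过的用户读主库"这个思路示意(纯 sketch:类名、`REPLICA_LAG_WINDOW` 均为假设;生产实现常改为跟踪复制位点,如 MySQL GTID / PostgreSQL LSN)。/ A minimal sketch of read-your-writes routing — names and the lag window are assumptions:

```python
import time

REPLICA_LAG_WINDOW = 5.0  # assumed upper bound on replication lag (seconds)

class ReadYourWritesRouter:
    """Route reads to the primary for users who wrote recently,
    so they always see their own writes despite replica lag."""

    def __init__(self) -> None:
        self._last_write: dict[int, float] = {}  # user_id -> last write time

    def record_write(self, user_id: int) -> None:
        self._last_write[user_id] = time.monotonic()

    def route_read(self, user_id: int) -> str:
        last = self._last_write.get(user_id)
        if last is not None and time.monotonic() - last < REPLICA_LAG_WINDOW:
            return "primary"   # recent writer: must see own writes
        return "replica"       # everyone else: scale out on replicas
```

更稳妥的变体不是按时间窗口,而是记录用户上次写入的复制位点,只有副本追上该位点后才允许读副本。/ A sturdier variant tracks the replication position of the user's last write and reads a replica only once it has caught up.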

📚 参考资料 / References

  1. AWS Database Replication — RDS Read Replicas
  2. Vitess — MySQL Sharding at YouTube Scale
  3. Designing Data-Intensive Applications — Chapter 5 & 6 (Kleppmann)

🧒 ELI5 / 用小孩能理解的话说

复制就像把书抄写多份,放在不同图书馆。每个图书馆都能借给你看(读副本),但只有总馆能修改(Primary)。

分片就像把全班同学的作业按学号分给 3 个老师批改——不再一个老师批所有作业,每个老师只负责一段。

Replication = Make copies of the book so more people can read at once.

Sharding = Split the library into sections so no single librarian is overwhelmed.

💻 算法 Day 10 / Algorithms Day 10

#125 Valid Palindrome (Easy) — 双指针模式 / Two Pointers


🧩 新模式 / New Pattern: 双指针模式 (Two Pointers)

📍 这个模式块共 5 道题 / This block: 5 problems

# | 题目 | 难度
1 | #125 Valid Palindrome ← 今天 / TODAY | 🟢 Easy
2 | #167 Two Sum II | 🟡 Medium
3 | #15 3Sum | 🟡 Medium
4 | #11 Container With Most Water | 🟡 Medium
5 | #42 Trapping Rain Water | 🔴 Hard

什么时候用 / When to Use

排序数组中找配对、回文检测、原地操作时,想到双指针。

Use Two Pointers when: sorted array + find a pair, palindrome detection, in-place removal, merging sorted arrays.

识别信号 / Signals

sorted array · find pair with sum · palindrome · remove in-place · merge sorted · container/water problems

通用模版 / Template

def two_pointer_template(arr, target):
    left, right = 0, len(arr) - 1
    
    while left < right:
        current = arr[left] + arr[right]      # or some condition on left/right
        
        if current == target:
            return [left, right]              # found it
        elif current < target:
            left += 1                         # need bigger value → move left pointer right
        else:
            right -= 1                        # need smaller value → move right pointer left
    
    return []                                 # not found

核心洞察 / Key Insight: 排序 + 两端逼近,从 O(n²) 嵌套循环降到 O(n) 单次扫描。

Sorted order + converging from both ends → eliminates the need for nested loops.


📖 今日题目 / Today's Problem

🔗 LeetCode #125 — Valid Palindrome 🟢 Easy

📹 NeetCode 讲解


🌍 现实类比 / Real-World Analogy

想象你是一个质检员,要验证一条传送带上的字符串"从两头读是否一样"。你派两个检查员分别站在传送带两端,同时向中间走,每步对比字母(跳过非字母数字的字符)。两人相遇时没有发现不同,就通过!

Think of two inspectors walking from both ends of a conveyor belt toward the middle, each checking only alphanumeric items and skipping punctuation/spaces.


🧩 如何映射到模版 / Mapping to Template

经典双指针,但有两个变化:

  1. 不是排序数组——我们用双指针做"对比"而不是"求和"
  2. 需要跳过非字母数字字符——在移动指针前先跳过无效字符

Classic Two Pointers, with two modifications:

  1. No sorted array → use pointers for comparison, not sum-seeking
  2. Skip non-alphanumeric chars before comparing

def isPalindrome(s: str) -> bool:
    left, right = 0, len(s) - 1
    
    while left < right:
        # Skip non-alphanumeric from the left
        while left < right and not s[left].isalnum():
            left += 1
        # Skip non-alphanumeric from the right
        while left < right and not s[right].isalnum():
            right -= 1
        
        # Compare (case-insensitive)
        if s[left].lower() != s[right].lower():
            return False
        
        left += 1
        right -= 1
    
    return True

🔍 代码追踪 / Code Trace

Input: "A man, a plan, a canal: Panama"

left=0  right=29 → 'A' vs 'a' → match → left=1, right=28
left=1  right=28 → skip ' ' → left=2
left=2  right=28 → 'm' vs 'm' → match → left=3, right=27
left=3  right=27 → 'a' vs 'a' → match → left=4, right=26
left=4  right=26 → 'n' vs 'n' → match → ...
... → All chars match → return True ✅

Input: "race a car"

left=0  right=9 → 'r' vs 'r' → match → left=1, right=8
left=1  right=8 → 'a' vs 'a' → match → left=2, right=7
left=2  right=7 → 'c' vs 'c' → match → left=3, right=6
left=3  right=6 → skip ' ' → right=5
left=3  right=5 → 'e' vs 'a' → ❌ MISMATCH → return False

📊 复杂度 / Complexity

方法 / Approach | Time | Space
Two Pointer | O(n) | O(1)
Built-in reverse | O(n) | O(n) — creates new string

Space O(1) is the win here — we never create a cleaned copy of the string.
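作为对比,表中 O(n) 空间的 built-in reverse 写法大致如下(函数名是示意):/ For contrast, the O(n)-space built-in-reverse approach from the table, as a sketch (function name is illustrative):

```python
def is_palindrome_builtin(s: str) -> bool:
    # Builds a cleaned copy plus a reversed copy: O(n) extra space
    cleaned = [c.lower() for c in s if c.isalnum()]
    return cleaned == cleaned[::-1]

print(is_palindrome_builtin("A man, a plan, a canal: Panama"))  # True
print(is_palindrome_builtin("race a car"))                      # False
```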


🔄 举一反三 / Pattern Connections

这道题是双指针的"热身"——纯粹的左右逼近。接下来的题目会在这个基础上加难度:

题目 | 变化 | 核心差异
#167 Two Sum II | 有序数组找和 | 移动指针基于 sum vs target
#15 3Sum | 三数之和 | 固定一个数 + 双指针找剩余两个
#11 Container With Most Water | 面积最大化 | 移动较短的那边指针
#42 Trapping Rain Water | 复杂水位计算 | 双指针维护左右最大高度

📚 参考资料 / References

  1. LeetCode #125 — Valid Palindrome
  2. NeetCode — Two Pointers Pattern
  3. Python str.isalnum() docs

🧒 ELI5 / 用小孩能理解的话说

回文就像照镜子——左边和右边要一样。我们用两只手,一只从左摸,一只从右摸,跳过空格和标点,对比每个字母。如果两只手中间相遇了都没发现不同,就是回文!

A palindrome is like a mirror — left side = right side. We use two fingers, one from each end, skip spaces/punctuation, compare each letter. If both fingers meet in the middle without finding a mismatch → palindrome!

🗣️ 软技能 Day 9 / Soft Skills Day 9

主题 / Topic: 利益相关方管理 / Stakeholder Management

问题 / Question: Describe a time you had to push back on a feature or requirement. Why?


💡 为什么这道题很重要 / Why This Matters

在高级工程师面试中,面试官不只想知道你"会写代码"——他们想知道你能不能独立判断、有没有勇气说出"这个需求有问题"。盲目执行坏需求是初级工程师的行为;能够用数据和逻辑推动正确方向,是 Senior/Staff 的核心能力。

Interviewers want to know you're not just a "feature factory." Senior engineers own outcomes, not just outputs. Pushing back constructively — with data, not attitude — is a core competency at L5+.


⭐ STAR 拆解 / STAR Breakdown

Situation(情境)

设置背景:什么团队?什么项目阶段?紧迫程度?

"我们的 PM 要求在发布前两周新增一个实时用户追踪功能,当时系统负载已经接近上限。"

"Our PM requested a real-time user tracking feature two weeks before a major launch, when our system load was already near capacity."

Task(任务)

你的职责是什么?你为什么有发言权?

"作为负责后端基础设施的 Senior Engineer,我需要评估这个需求的可行性和风险。"

"As the Senior Engineer owning backend infra, I needed to assess feasibility and surface the technical risk."

Action(行动)

这是核心!展示你如何有据可查地推回,而不是情绪化地拒绝。
  1. 量化风险: 我跑了负载测试,展示新功能会把 P99 延迟从 120ms 推高到 650ms
  2. 提出替代方案: 建议将实时追踪改为批量日志,延迟 24h 但不影响核心体验
  3. 对齐业务目标: 确认 PM 真正想要的是"数据分析能力"而不是"实时性"——批量方案完全满足
  4. 共识达成: 带着数据找 PM + 工程总监开了 30 分钟会议,最终采用我的方案

"I ran load tests showing P99 latency would spike from 120ms to 650ms. I proposed batch logging instead — same data, 24h delay, zero performance impact. I aligned with PM on the real goal (analytics, not real-time), and brought data to a 30-min meeting with PM and Eng Director. We shipped the batch solution."

Result(结果)

用数字说话。

"发布如期进行,零性能事故。批量数据方案在发布后三个月上线,PM 反馈数据质量超出预期。这次经验也推动团队建立了需求评审中的技术可行性评估流程。"

"Launch shipped on time with zero incidents. The batch analytics shipped 3 months post-launch and exceeded data quality expectations. The experience led to establishing a technical feasibility step in our requirements review process."


❌ 别这么说 / Bad vs ✅ 这么说 / Good

❌ 踩坑 | ✅ 正解
"我直接告诉 PM 这个要求太蠢了" | "我跑了测试,把风险用数据量化"
"我认为这不重要,所以不做" | "我先理解他们的真实目标,再提替代方案"
"最终我没能阻止,还是做了" | "我确保所有决策者都了解风险,决策有据可查"
"我们就这么做了"(没有结果) | 说清楚结果:上线情况、用户影响、后续改进

🚀 Senior/Staff 加分点 / Senior+ Tips

  1. 系统化推回,而非感情化拒绝。 数据 > 直觉。Load test、cost model、用户影响分析——让数字说话。
  2. 先理解"为什么",再评估"怎么做"。 很多"坏需求"背后有合理的业务原因,找到根本目标才能提出真正有价值的替代方案。
  3. 建立信任储备。 平时 deliver 靠谱,关键时刻的推回才会被认真对待。
  4. 把决策过程文档化。 即使你没能推回成功,确保风险已被知晓和记录,保护自己也保护团队。

🎯 Key Takeaways

  • 推回 ≠ 拒绝。推回 = 用专业判断守护产品质量。
  • Push back = professional judgment, not obstruction.
  • 永远带着数据和替代方案去谈,而不是空手说"不行"。
  • Always come with data + alternatives, never just "no."
  • 好的推回最终是双赢:工程质量 + 业务目标都得到保护。

📚 参考资料 / References

  1. The Engineering Manager's Handbook — Pushing Back Effectively
  2. Staff Engineer: Leadership Beyond the Management Track (Will Larson)
  3. How to Disagree Productively — First Round Review

🧒 ELI5 / 用小孩能理解的话说

如果你的朋友说"我们现在去游泳吧",但你知道外面在下大雨,你不是直接说"不去",而是说"你想游泳吗?那我们去室内游泳池!"——这就是有建设性的推回。

If a friend says "let's swim now!" but it's raining, you don't just say "no" — you say "want to swim? Let's go to the indoor pool!" That's constructive pushback.

🎨 前端 Day 9 / Frontend Day 9

主题 / Topic: React useState — 触发重渲染的状态 / State That Triggers Re-renders


🌏 真实场景 / Real Scenario

你在做一个任务管理 dashboard,点击按钮需要切换"显示已完成任务"的筛选器。你需要一个变量来记住当前状态,而且每次改变时 UI 要自动更新。这就是 useState 的舞台。

You're building a task dashboard. Clicking a button should toggle showing completed tasks. You need a variable that remembers its value AND automatically updates the UI when it changes. That's useState.


💻 代码示例 / Code Snippet

import { useState } from 'react'

interface Task {
  id: number
  title: string
  completed: boolean
}

function TaskDashboard() {
  // useState returns [current value, setter function]
  const [showCompleted, setShowCompleted] = useState(false)
  const [tasks] = useState<Task[]>([
    { id: 1, title: 'Review PR', completed: true },
    { id: 2, title: 'Write tests', completed: false },
    { id: 3, title: 'Deploy staging', completed: true },
  ])

  // Derived state — computed from existing state, no useState needed
  const visibleTasks = showCompleted
    ? tasks
    : tasks.filter(t => !t.completed)

  return (
    <div>
      <button onClick={() => setShowCompleted(prev => !prev)}>
        {showCompleted ? 'Hide' : 'Show'} Completed ({tasks.filter(t => t.completed).length})
      </button>
      <ul>
        {visibleTasks.map(task => (
          <li key={task.id} style={{ opacity: task.completed ? 0.5 : 1 }}>
            {task.title}
          </li>
        ))}
      </ul>
    </div>
  )
}

🧠 猜猜输出 / What Does This Output?

function Counter() {
  const [count, setCount] = useState(0)
  
  const handleClick = () => {
    setCount(count + 1)
    setCount(count + 1)
    setCount(count + 1)
  }
  
  return <button onClick={handleClick}>Count: {count}</button>
}

点击一次后,count 是多少?/ After one click, what is count?

A) 3 — 调用了三次 setCount

B) 1 — React 批量处理,count 是快照

C) 0 — setState 是异步的,还没更新

D) 报错 — 不能在一个函数里多次调用 setCount

<details><summary>显示答案 / Show Answer</summary>

答案是 B — count = 1

为什么?因为在同一个事件处理函数中,count 是一个快照(snapshot),值固定为 0。三次 setCount(0 + 1) 都是 setCount(1),最后只更新一次。

React batches state updates within the same event handler. count is a snapshot — it's 0 throughout the whole function. All three calls are setCount(0 + 1) = setCount(1). Result: 1.

✅ 如果你想累加,用函数式更新 / Use functional updates for increments:

setCount(prev => prev + 1) // ✅ prev is always latest value
setCount(prev => prev + 1) // ✅ prev = 1
setCount(prev => prev + 1) // ✅ prev = 2 → final count = 3

</details>


❌ 常见错误 / Common Mistakes

错误 1:直接修改 state 对象

// ❌ WRONG — mutating state directly, React won't re-render!
const [user, setUser] = useState({ name: 'Alice', age: 25 })
user.age = 26  // ← This doesn't trigger a re-render

// ✅ CORRECT — create a new object
setUser({ ...user, age: 26 })

错误 2:把可以派生的值放进 state

// ❌ WRONG — derived state causes sync issues
const [items, setItems] = useState([...])
const [filteredItems, setFilteredItems] = useState([...]) // ← redundant!

// ✅ CORRECT — compute it during render
const filteredItems = items.filter(item => item.active) // no useState needed

错误 3:忘记函数式更新导致 stale closure

// ❌ WRONG in async context or event batching
setCount(count + 1)

// ✅ CORRECT — always use functional update when new value depends on old
setCount(prev => prev + 1)

📐 何时用 / 何时不用 / When to Use vs Not

✅ 用 useState | ❌ 不用 useState
UI 交互状态(开/关、选中、展开) | 可从其他 state/props 计算的值
表单输入值 | 不需要触发渲染的变量(用 useRef)
组件局部数据(列表项、分页) | 多组件共享状态(用 Context 或状态管理库)
异步请求结果(loading/data/error) | 服务端状态(用 React Query / SWR)

📚 参考资料 / References

  1. React Docs — useState
  2. React Docs — State as a Snapshot
  3. Common useState Mistakes (Kent C. Dodds)

🧒 ELI5 / 用小孩能理解的话说

useState 就像一块小白板。你可以在上面写字(set state),每次改变内容,整个教室(组件)都会重新看一遍白板(重渲染)。普通变量就像便利贴——改了 React 不知道,不会重新看。

useState is like a whiteboard. When you erase and rewrite it, the whole classroom (component) looks again and updates. A regular variable is like a sticky note only you can see — React doesn't know it changed.

🤖 AI Day 9 — 本周 AI 大事件 / AI News Roundup

来源:web_search,2026年3月24日 / Sources: web_search, March 24, 2026


📰 Story 1: Agentic AI 成为新主流 / Agentic AI Goes Mainstream

来源 / Source: switas.com — The AI Avalanche: 7 Agentic LLM Breakthroughs

AI 从"生成文本"进化到"自主完成任务"。Gartner 预测 2026 年底 40% 的企业应用将内嵌任务型 AI Agent,作为真正的"数字同事"自动处理端到端业务流程。Oracle 也宣布了专为 Agentic AI 优化的数据库创新。

AI has evolved from "generate text" to "autonomously complete multi-step tasks." Gartner predicts 40% of enterprise apps will embed task-specific AI agents by end of 2026. Oracle announced new AI Database innovations purpose-built for agentic workloads.

为什么你应该关心 / Why you should care:

作为工程师,你很快会被要求构建或集成 AI Agent。理解 Agent 的工具调用、状态管理、错误恢复机制,会是核心面试考点和工作技能。

As an engineer, you'll soon be asked to build or integrate AI agents. Tool calling, state management, and error recovery for agents are becoming core interview topics.


📰 Story 2: 模型"认知密度"时代——参数不是唯一指标 / Cognitive Density: Parameters Aren't Everything

来源 / Source: blog.mean.ceo — New AI Model Releases March 2026

2026 年 3 月,AI 竞赛焦点从"谁的参数最多"转向"谁的认知密度最高"。Claude Opus 4.6(Anthropic)引入"自适应思考"——模型根据 prompt 复杂度动态决定是否深度推理,无需用户手动配置。OpenAI 的 GPT-5.4 系列专注于每字节更高的知识密度。

The AI race shifted from "most parameters" to "highest cognitive density." Claude Opus 4.6 introduced "adaptive thinking" — the model dynamically decides when to engage deeper reasoning without user configuration. OpenAI's GPT-5.4 focuses on knowledge density per byte.

为什么你应该关心 / Why you should care:

选模型时,benchmark 分数只是一方面。了解"推理成本 vs 质量"的权衡,帮你在实际项目中做出更聪明的模型选型决策。

When choosing models for production, benchmark scores aren't everything. Understanding the reasoning-cost vs quality tradeoff helps you make smarter model selection decisions.


📰 Story 3: 上下文窗口突破 100 万 Token / Context Windows Break 1M Tokens

来源 / Source: alphacorp.ai — Top 5 LLMs for March 2026

多个领先模型的上下文窗口已突破 100 万 token,实验性模型甚至推向 1000 万。这意味着可以在单个 prompt 中塞入整个公司知识库、百万行代码库或多年财报数据。

Several leading models now boast 1M+ token context windows, with experimental models pushing toward 10M. You can now feed an entire company knowledge base, massive codebases, or years of financial records into a single prompt.

为什么你应该关心 / Why you should care:

超长上下文改变了 RAG(检索增强生成)的架构选择。某些场景下,直接 long-context 比构建向量数据库更简单、更准确——但成本和速度的权衡需要你来算。

Long contexts change RAG architecture decisions. Sometimes long-context beats building a vector database — but you need to reason about the cost/latency tradeoffs.


📰 Story 4: LLM 安全新技术 & "能力校准" / LLM Safety & Capability Calibration

来源 / Source: news.ncsu.edu — New Technique Addresses LLM Safety · morningstar.com — Appier Capability Calibration

NC State 研究人员发明了新技术识别保证安全响应的关键组件,同时将"对齐税"(安全训练带来的性能损失)降到最低。Appier 推出"能力校准"框架,让 AI Agent 在行动前先评估自己是否有能力完成任务,降低幻觉和过度自信。

NC State researchers identified key model components that ensure safe responses while minimizing the "alignment tax." Appier introduced "Capability Calibration" — AI agents assess their own confidence before taking action, reducing hallucinations and overconfidence in enterprise deployments.

为什么你应该关心 / Why you should care:

在企业 AI 部署中,让模型"知道自己不知道什么"比让它无限自信地输出错误答案更重要。Capability calibration 是 AI 工程中的新兴核心模式。

In enterprise AI, knowing what the model doesn't know is more valuable than confident-but-wrong outputs. Capability calibration is an emerging core pattern in AI engineering.


📰 Story 5: 模型发布速度危机——每 72 小时一个重磅发布 / Model Release Velocity Crisis

来源 / Source: ai-weekly.ai — Newsletter 03-24-2026

行业分析师追踪到目前约每 72 小时就有一个重大 AI 模型发布。Gemini 3.1 Pro、Claude Opus 4.6、GPT-5.4、DeepSeek V3.2、Qwen 3.5……价格相比去年同期下降 40-80%,开源权重模型与闭源旗舰的差距正在快速收窄。

Analysts are tracking a major AI release approximately every 72 hours. Prices dropped 40-80% year-over-year. Open-weight models are closing the gap with closed-source flagships rapidly.

为什么你应该关心 / Why you should care:

AI 基础设施成本正在快速商品化。在系统设计中,"用哪个 LLM API"的成本计算将越来越重要,学会对比延迟、成本、质量的三角权衡是工程师的新必备技能。

AI infrastructure is rapidly commoditizing. Cost modeling for LLM API selection — balancing latency, cost, and quality — is becoming a core engineering skill.


📚 参考资料 / References

  1. AI Weekly Newsletter — March 24, 2026
  2. Top LLMs March 2026 — AlphaCorp
  3. NC State LLM Safety Research

🧒 ELI5 / 用小孩能理解的话说

AI 现在不只是"会说话"了,而是开始"帮你做事"(Agentic AI)。同时模型越来越聪明但越来越便宜,就像手机——几年前的旗舰价格,现在买到的性能翻了几倍。

AI isn't just "talking" anymore — it's "doing things for you" (Agentic AI). Meanwhile models keep getting smarter and cheaper, like smartphones — you get 10x more for the same price year after year.