🏗️ System Design Day 14

Microservices vs Monolith

Difficulty: Intermediate · Phase: Growth · Read time: 3 min

🌍 Real-World Scenario

Imagine you're at a startup. Your product just launched and the entire codebase lives in one repository. As you scale to millions of users, you face the classic questions: should you break it apart into independent services? When? And how?


🏛️ Architecture Diagrams

Monolith

┌─────────────────────────────────────────┐
│              Monolith App               │
│  ┌──────────┐ ┌──────────┐ ┌─────────┐  │
│  │  Users   │ │  Orders  │ │Payments │  │
│  │  Module  │ │  Module  │ │ Module  │  │
│  └────┬─────┘ └────┬─────┘ └────┬────┘  │
│       └────────────┴────────────┘       │
│                    │                    │
│          ┌─────────▼─────────┐          │
│          │  Single Database  │          │
│          └───────────────────┘          │
└─────────────────────────────────────────┘
          Deploy everything together

Microservices

Client ──► API Gateway
               │
      ┌────────┼────────┐
      ▼        ▼        ▼
 ┌────────┐ ┌──────┐ ┌──────────┐
 │ Users  │ │Orders│ │ Payments │
 │Service │ │ Svc  │ │ Service  │
 └───┬────┘ └──┬───┘ └───┬──────┘
     │         │         │
  ┌──▼──┐   ┌──▼──┐  ┌───▼───┐
  │ DB  │   │ DB  │  │  DB   │
  └─────┘   └─────┘  └───────┘
 (independent deploys, communication via message queue)

⚖️ Key Tradeoffs

Why Monolith?

  • Simple — one codebase, one deployment, runs locally out of the box
  • Low latency — modules call each other as in-process functions, no network overhead
  • Transactional consistency — one database, ACID transactions come for free
  • Right stage — team of < 20 people, product-market fit not yet validated

Why Microservices?

  • Independent scaling — when Payments traffic spikes, scale only that service and leave Users alone
  • Technology heterogeneity — Python/ML for recommendations, Go for the API layer, the best tool for each job
  • Fault isolation — one crashing service doesn't take down the whole system
  • Team autonomy — teams release independently without blocking each other
  • Right stage — team of > 50 people, with a dedicated DevOps/Platform team

Comparison

Dimension               Monolith      Microservices
Deployment complexity   Low ✅        High ❌
Dev speed (early)       Fast ✅       Slow ❌
Independent scaling     ❌            ✅
Fault isolation         ❌            ✅
Data consistency        Easy ✅       Needs design ❌
Ops cost                Low ✅        High ❌

🪤 Common Mistakes

❌ Mistake 1: Premature Microservices

Splitting into services on day one, leaving a 3-person team to maintain 10 services plus Kubernetes.

"We went microservices on day one, and it almost killed us." — every startup that tried it too early

✅ Instead: start with a "modular monolith" — modular internals with clear boundaries — and split it into physical services later.

❌ Mistake 2: Distributed Monolith

Services are split but tightly coupled, so they must be deployed in lockstep. You get all the complexity of microservices with none of the benefits.

✅ Instead: decouple services through APIs or message queues, and never share a database.

❌ Mistake 3: Ignoring Cross-Service Transactions

The order service decrements inventory successfully, but the payment service fails, leaving the data inconsistent.

✅ Instead: use the Saga pattern or design for eventual consistency.
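To make the Saga idea concrete, here is a minimal orchestration sketch in Python. The service calls and their compensations are hypothetical stand-ins; a real system would make network calls and persist saga state durably:

```python
def run_saga(steps):
    """Run each (action, compensate) step in order; if one fails,
    run the compensations of completed steps in reverse order."""
    completed = []
    for action, compensate in steps:
        try:
            action()
            completed.append(compensate)
        except Exception:
            for undo in reversed(completed):
                undo()  # best-effort compensation
            return False
    return True

log = []

def reserve_inventory():
    log.append("inventory reserved")

def release_inventory():
    log.append("inventory released")

def charge_payment():
    raise RuntimeError("payment declined")  # simulated downstream failure

ok = run_saga([
    (reserve_inventory, release_inventory),
    (charge_payment, lambda: log.append("payment refunded")),
])
print(ok, log)  # False ['inventory reserved', 'inventory released']
```

When the payment step fails, only the already-completed inventory step is compensated, which is exactly the "undo instead of rollback" behavior a saga trades ACID for.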



🧒 ELI5

Monolith = one kitchen that cooks everything: simple and fast to start. Microservices = a food court where each stall specializes: great at scale, but far more management. Open one restaurant first; don't launch a chain on day one.

💻 Algorithms Day 14

#42 Trapping Rain Water (Hard) — Two Pointers

🧩 Two Pointers (5/5) — building on the template from earlier days in this block

  • 🔗 LeetCode: https://leetcode.com/problems/trapping-rain-water/ 🔴
  • 📹 NeetCode: https://www.youtube.com/watch?v=ZI2z5pq0TqA
  • Pattern: Two Pointers

🌧️ Real-world analogy

Think of the bars as a row of walls of different heights. After rain, water collects in the low spots, but the amount above any bar is capped by the shorter of the tallest wall on its left and the tallest wall on its right:

water[i] = min(maxLeft, maxRight) - height[i]  (if positive)


🧠 Problem

Given an array height of bar heights, where each bar has width 1, compute how much rain water can be trapped after it rains.


🧩 Mapping to the Two Pointers template

In the earlier two-pointer problems in this block (#125 Valid Palindrome, #167 Two Sum II, #15 3Sum, #11 Container With Most Water), the squeeze from both ends works because:

  • at every step you can prove which side is safe to move, so each pointer advances at most n times and the whole scan is O(n)

The twist in this problem:

  • we are no longer chasing a pair/sum; instead we maintain leftMax / rightMax and "settle" the water above one side at each step.

Key twist vs earlier problems: instead of comparing sums or areas, we compare the two boundary maxima. The side with the smaller max can be finalized because its limiting wall is already known.


✅ Two-pointers solution

Key idea

  • l and r walk inward from the two ends
  • maintain leftMax = max(height[0..l]) and rightMax = max(height[r..end])
  • if height[l] < height[r]: a taller wall is guaranteed on the right, so the water at l is capped by leftMax alone; settle position l and do l += 1
  • otherwise: handle the right side symmetrically

Python code

from typing import List

class Solution:
    def trap(self, height: List[int]) -> int:
        l, r = 0, len(height) - 1
        left_max, right_max = 0, 0
        water = 0

        while l < r:
            if height[l] < height[r]:
                # left side is bounded by left_max
                if height[l] >= left_max:
                    left_max = height[l]
                else:
                    water += left_max - height[l]
                l += 1
            else:
                # right side is bounded by right_max
                if height[r] >= right_max:
                    right_max = height[r]
                else:
                    water += right_max - height[r]
                r -= 1

        return water

🔍 Quick trace

Example: [0,1,0,2,1,0,1,3,2,1,2,1]

  • Start: l=0, r=11, left_max=0, right_max=0, water=0
  • height[0]=0 < height[11]=1 → left branch: left_max stays 0, l=1
  • height[1]=1 is not < height[11]=1 → right branch: right_max=1, r=10
  • height[1]=1 < height[10]=2 → left branch: left_max=1, l=2
  • height[2]=0 < height[10]=2 → left branch: water += 1 - 0 = 1, l=3
  • ... continuing the same way, the total reaches water=6

Why it works: the side with the smaller boundary max is the limiting factor, so we can safely finalize water there without knowing the exact interior structure.


⏱️ Complexity

  • Time: O(n) (each pointer moves at most n steps)
  • Space: O(1)
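To see what the O(n) scan buys, here is the direct O(n²) translation of the water[i] = min(maxLeft, maxRight) - height[i] formula from the analogy section. It is a sketch meant as a cross-check for the two-pointer version, not a solution you'd submit:

```python
from typing import List

def trap_bruteforce(height: List[int]) -> int:
    # O(n^2): for each bar, rescan for the tallest wall on each side.
    # Both maxima include the bar itself, so each term is never negative.
    water = 0
    for i in range(len(height)):
        max_left = max(height[: i + 1])
        max_right = max(height[i:])
        water += min(max_left, max_right) - height[i]
    return water

print(trap_bruteforce([0, 1, 0, 2, 1, 0, 1, 3, 2, 1, 2, 1]))  # 6
print(trap_bruteforce([4, 2, 0, 3, 2, 5]))                    # 9
```

Running both implementations over random arrays and asserting equal results is a quick way to gain confidence in the pointer-movement argument.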

Transfer within this pattern block

  • #11 Container With Most Water: move the shorter side to search for a possibly larger area
  • #15 3Sum: fix one number, then squeeze with two pointers
  • #125 Valid Palindrome: check both ends and shrink inward

What they share:

  • every pointer move is justified by a provable monotonicity or bound, which is what avoids O(n^2)

📚 References

  • LeetCode editorial: https://leetcode.com/problems/trapping-rain-water/editorial/
  • NeetCode explanation (video): https://www.youtube.com/watch?v=ZI2z5pq0TqA
  • GeeksforGeeks (two-pointer approach): https://www.geeksforgeeks.org/trapping-rain-water/

🧒 ELI5

Imagine pouring water between a row of blocks. How much a slot holds depends only on the shorter of the tallest block to its left and to its right. The two pointers walk inward from both ends, remembering the tallest block seen so far on each side, and tally the water slot by slot.

🗣️ Soft Skills Day 14

Tell me about a time you drove a large cross-team initiative

Level: Staff · Category: Leadership · Read time: 2 min

Why this matters

Cross-team initiatives (unified authentication, a payments migration, a data-platform upgrade, site-wide performance work) rarely fail on the technical design; they fail on alignment, dependencies, cadence, and communication overhead. At the Staff level, interviewers want evidence that you can pull many people into the same boat without a direct reporting line, i.e., lead without formal authority.


⭐ STAR structure (aim for ~90 seconds)

S — Situation

  • What was the initiative? How big was the scope? Which teams were involved?

T — Task

  • What did you own? What were the success metrics (SLOs, migration %, cost, launch date)?

A — Action

Tell it as a repeatable playbook:

1) Define a north-star metric: e.g. p95 latency, error budget, migration completion.

2) Turn ambiguity into a concrete plan: milestones, a risk list, a dependency map, RACI (responsible / accountable / consulted / informed).

3) Establish an operating cadence: weekly cross-team syncs, async status updates, decision records (ADRs), a clear escalation channel.

4) De-risk early: POC / pilot first, gradual rollout, rollback plan, observability (dashboards + alerts).

5) Align incentives: spell out what's in it for each partner team (less oncall, lower cost, higher conversion).

R — Result

  • Close with numbers: on-time or early launch, migration %, latency improvement, incidents reduced, cost savings, developer-productivity gains.

❌ Bad vs ✅ Good

❌ Bad (vague)

  • "I organized a lot of meetings, everyone eventually agreed, and we shipped."

✅ Good (verifiable)

  • "I wrote the goal down as cutting p95 from 800ms to 400ms and split the dependencies into 3 migration tracks; we ran a weekly cross-team sync plus async updates every two days. The key risk was team X's schema change, so we did a two-week pilot with dual writes first. We migrated 92% within 6 weeks, and related oncall incidents dropped from 5 per week to 1."

Senior/Staff-level tips

  • Write decisions down: record tradeoffs in ADRs instead of relying on word of mouth.
  • Layer your communication: ICs care about tasks and risks, managers about milestones and resources, execs about metrics and ROI.
  • Handle conflict by finding the shared goal first, then letting data and experiments decide; define a clear escalation path when needed.
  • Make the system run itself: good mechanisms (dashboards, SLOs, automated migration tooling) are more reliable than individual heroics.

Key Takeaways

  • Cross-team success = clear metrics + dependency visibility + steady cadence + early de-risking + aligned incentives.

📚 References

  • Google SRE Book — Service Level Objectives: https://sre.google/sre-book/service-level-objectives/
  • RACI matrix overview (Atlassian): https://www.atlassian.com/team-playbook/plays/roles-and-responsibilities
  • Amazon Working Backwards (concept): https://www.aboutamazon.com/news/company-news/working-backwards-how-amazon-starts-with-the-customer

🧒 ELI5

It's like a big group project at school. First agree on what grade you're aiming for (the metric), then split the work with clear owners, check progress every week, and prototype the hardest part early. Only then will everyone actually follow the same plan.

🎨 Frontend Day 14

React Custom Hooks — Extract & Reuse Logic

Phase: Growth · Read time: 2 min

🧩 Real scenario

You're building a dashboard and need to reuse "fetch + loading/error + request cancellation + refresh" logic across multiple pages, without rewriting useEffect + AbortController + a pile of state in every component.


✅ Production-ready custom hook

import { useCallback, useEffect, useRef, useState } from "react";

type AsyncState<T> = {
  data: T | null;
  error: string | null;
  loading: boolean;
};

// Generic JSON-fetching hook with cancellation and manual refresh.
export function useJsonFetch<T>(url: string, deps: unknown[] = []) {
  const [state, setState] = useState<AsyncState<T>>({
    data: null,
    error: null,
    loading: true,
  });

  // Keep AbortController in a ref so we can cancel in-flight requests.
  const abortRef = useRef<AbortController | null>(null);

  const run = useCallback(async () => {
    abortRef.current?.abort();
    const controller = new AbortController();
    abortRef.current = controller;

    setState((s) => ({ ...s, loading: true, error: null }));

    try {
      const res = await fetch(url, { signal: controller.signal });
      if (!res.ok) throw new Error(`HTTP ${res.status}`);
      const json = (await res.json()) as T;
      setState({ data: json, error: null, loading: false });
    } catch (e) {
      // Abort is not a real “error” the user should see
      if ((e as any)?.name === "AbortError") return;
      setState({ data: null, error: (e as Error).message, loading: false });
    }
  }, [url, ...deps]);

  useEffect(() => {
    void run();
    return () => abortRef.current?.abort();
  }, [run]);

  return { ...state, refresh: run };
}

How to use:

type User = { id: string; name: string };

function UsersPanel() {
  const { data, loading, error, refresh } = useJsonFetch<User[]>("/api/users");

  if (loading) return <div>Loading…</div>;
  if (error) return <button onClick={refresh}>Retry: {error}</button>;
  return (
    <div>
      <button onClick={refresh}>Refresh</button>
      <ul>{data?.map((u) => <li key={u.id}>{u.name}</li>)}</ul>
    </div>
  );
}

🧠 Output quiz

function Demo() {
  const [n, setN] = useState(0);

  const inc = useCallback(() => setN(n + 1), []);

  return <button onClick={inc}>{n}</button>;
}

A) Increments correctly on every click

B) Always shows 0

C) Goes to 1, then gets stuck

D) The component crashes

Correct answer: C

The callback is memoized with an empty deps array, so inc is frozen and the n captured in its closure is forever the initial 0. Every click runs setN(0 + 1), so n becomes 1 and stays there.

✅ 修复方式:用 functional update

const inc = useCallback(() => setN((x) => x + 1), []);

❌ Common mistake vs ✅ correct approach

❌ Mistake: unstable deps inside the hook cause an infinite refetch loop

useEffect(() => {
  fetch(url).then(...)
}, [options]) // options is a new object every render

✅ Fix: stabilize the deps (useMemo / useCallback / hoist the object out of the component)

const options = useMemo(() => ({ headers: { "x": "1" } }), []);
useEffect(() => {
  fetch(url, options).then(...)
}, [url, options])

When to use

  • ✅ When you're reusing a bundle of state + side effects + cancellation/cleanup + manual refresh
  • ✅ When you want components to stay a thin "UI view layer," with the logic pushed down into hooks

When NOT to use

  • ❌ You're only reusing a pure function: write a plain utility function instead
  • ❌ The hook's logic is tightly coupled to one page's UI structure: a component or composition probably fits better
  • ❌ The boundary isn't clear yet: keep the logic in the component first and extract once it stabilizes (avoid premature abstraction)

📚 References

  • React Docs — Reusing Logic with Custom Hooks: https://react.dev/learn/reusing-logic-with-custom-hooks
  • React Docs — useCallback: https://react.dev/reference/react/useCallback
  • MDN — AbortController (cancel fetch): https://developer.mozilla.org/en-US/docs/Web/API/AbortController

🧒 ELI5

A custom hook is like writing a cooking procedure down as a fixed recipe for state + side effects. Whenever you need the same dish (the same logic) in another component, you reuse the recipe instead of reinventing the steps.

🤖 AI Day 14

LoRA & QLoRA — Efficient Fine-Tuning

Mode: CONCEPT · Category: Training · Read time: 2 min

🧠 Intuition

Full fine-tuning is like rewriting the whole textbook: it can work well, but it is extremely expensive (VRAM / time / storage), and a careless run can damage the model's general abilities.

LoRA (Low-Rank Adaptation) is more like this:

  • the original model weights stay frozen (no rewriting the textbook)
  • you only add a thin trainable "adapter" that shifts the model's behavior (like sticky notes or patches)

QLoRA takes one more step on top of LoRA:

  • quantize the base model (e.g., to 4-bit) to drastically cut VRAM usage
  • still train a small LoRA adapter, so high-quality fine-tuning becomes possible on much smaller GPUs


⚙️ How it works

In a Transformer, most parameters live in the linear layers of the attention and FFN blocks (some weight matrix W). LoRA writes the weight update ΔW as a product of two small matrices:

  • ΔW = A · B, where A and B have a small rank r (r ≪ d)
  • training updates only A and B (parameter count drops from O(d²) to O(2·d·r))
  • at inference, ΔW can be merged back into W (little extra inference cost)

QLoRA:

  • stores the base weights quantized, e.g. with 4-bit NF4
  • computes in higher precision (e.g. bfloat16) along the forward path, with tricks such as double quantization to preserve quality
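A tiny NumPy sketch of the ΔW = A·B bookkeeping, with toy sizes and random data. This is purely illustrative, not the PEFT implementation (real LoRA also zero-initializes one factor so training starts from ΔW = 0, and applies a scaling factor):

```python
import numpy as np

d, r = 1024, 8                      # hidden size and LoRA rank (toy values)
rng = np.random.default_rng(0)

W = rng.normal(size=(d, d))         # frozen base weight
A = rng.normal(size=(d, r)) * 0.01  # trainable low-rank factor
B = rng.normal(size=(r, d)) * 0.01  # trainable low-rank factor

full_params = d * d                 # what full fine-tuning would train
lora_params = A.size + B.size       # what LoRA trains instead
print(f"{lora_params:,} / {full_params:,} = {lora_params / full_params:.2%}")

x = rng.normal(size=(d,))
y_adapter = W @ x + A @ (B @ x)     # adapter kept separate (training-time view)
y_merged = (W + A @ B) @ x          # delta merged into W (inference-time view)
assert np.allclose(y_adapter, y_merged)  # merging does not change the output
```

With d=1024 and r=8, LoRA trains 16,384 parameters where full fine-tuning would train 1,048,576 — about 1.6% — and the merge check shows why serving a merged model adds no latency.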


✅ When to use

  • You need task/domain/style adaptation on a budget
  • You want modular adapters you can swap (one base model, many LoRAs)
  • You're constrained by VRAM (a single or smaller GPU)


🧪 Runnable snippet (≤15 lines)

The snippet below shows the minimal shape of loading a LoRA adapter (a full training run takes more code).

# pip install -U transformers peft torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = "gpt2"  # demo base model
lora_path = "./my_lora_adapter"  # your saved LoRA adapter folder

tok = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)
model = PeftModel.from_pretrained(model, lora_path)

prompt = "Write a short product update:"
print(tok.decode(model.generate(**tok(prompt, return_tensors="pt"), max_new_tokens=40)[0]))

📚 References

  • LoRA paper (arXiv): https://arxiv.org/abs/2106.09685
  • QLoRA paper (arXiv): https://arxiv.org/abs/2305.14314
  • Hugging Face PEFT docs: https://huggingface.co/docs/peft/index

🧒 ELI5

LoRA is like giving a robot a small detachable pair of glasses: you don't change its brain, you just train the glasses so it gets better at one specific thing. QLoRA compresses the robot's brain to save space and money, and you can still swap in different glasses to learn new skills.
