ursb.me / notes
FIELD NOTE / 06 Network Protocols 2026

一次请求
一生

A Request,
end to end.

一个 GET 请求要在 UDP 之上跑完 13 道协议工序、跨 4 个加密级、穿过 3 类流,才能让你看到一个 200 OK——然后连接还要走完关闭、排空、复活三种结局。
这是 HTTP/3 与 QUIC 的全景手册,每一步都标出对应的 RFC 条款。

A single GET has to walk thirteen protocol stages on top of UDP, four cryptographic levels and three stream classes before it can land a single 200 OK — then the connection still has to walk close, drain, or revive.
This is a field map of HTTP/3 and QUIC, with every step pinned to the relevant RFC clause.

Protocol pipeline · 24 chapters · 4 acts
I · Why H3 | II · Transport · QUIC | III · HTTP/3 & lifecycle | IV · State of the world
CHAPTER 01

三个公式 — HTTP 到底是什么

Three formulas — what is HTTP, really?

三个公式,一具协议骨骼

three formulas, one protocol skeleton

"HTTP" 在大多数人嘴里是一种东西——一个能让浏览器去网站取页面的协议。但工程师如果还把它当成一种东西,就永远理解不了为什么会有 HTTP/3。HTTP 从来不是一个协议,它是三个正交协议的乘积

To most people, "HTTP" is a thing — the protocol your browser uses to fetch a page. Engineers who keep thinking of it as one thing will never understand why HTTP/3 exists. HTTP has never been one protocol; it has always been the product of three orthogonal layers.

FORMULA 1
HTTP = Semantics + Framing + Transport
HTTP/1.1 = RFC 9110 + RFC 9112 (ASCII) + TCP
HTTP/2 = RFC 9110 + RFC 9113 (binary) + TCP + TLS
HTTP/3 = RFC 9110 + RFC 9114 (binary) + QUIC (UDP)
Implication: HTTP/1→2→3 only swap the bottom two layers. The semantics never moved.
FORMULA 2
QUIC = UDP + TLS 1.3 + Loss recovery + Congestion control + Streams
Implication: QUIC is not "TCP over UDP". It is TCP rewritten in user space, with TLS welded into the protocol itself.
FORMULA 3
HTTP/3 = QUIC streams + QPACK + H3 framing
Implication: HTTP/3 strips out what HTTP/2 only had to do because of TCP: priority trees, its own per-stream flow control (WINDOW_UPDATE), HPACK's strict ordering. What's left is thin. (PUSH_PROMISE is not among the casualties; it is still defined in RFC 9114 §7.2.5.)

三层「骨骼」对照

HTTP anatomy at a glance

Version | Semantics | Framing | Transport
HTTP/0.9 (1991) | GET only | - | TCP
HTTP/1.0 (1996, RFC 1945) | headers, methods | ASCII, 1 req / conn | TCP
HTTP/1.1 (1997-2022, RFC 9112) | same + chunked, keepalive | ASCII, keepalive, pipelining | TCP (+ TLS)
HTTP/2 (2015-2022, RFC 9113) | RFC 9110 | binary · mux · HPACK | TCP + TLS 1.2/1.3
HTTP/3 (2022, RFC 9114) | RFC 9110 | binary · simpler · QPACK | QUIC (UDP + TLS 1.3)
FIELD NOTE · Formula 1 is the real subject of this essay.
Every "why does HTTP/3 do it this way" question collapses into "because the semantics stayed the same, while Framing and Transport both got replaced".
CHAPTER 02

家谱 — 三十年 HTTP 演进

Family tree — 30 years of HTTP

从 Tim Berners-Lee 的一行 GET 到 Cloudflare 的 50% 全网流量

from Tim Berners-Lee's first GET to Cloudflare's 50% global traffic

HTTP/3 不是凭空出现的。它是 30 年技术堆栈一次次试错的产物:从 HTTP/0.9 的一行 GET /,到 SPDY 的实验,到 HTTP/2 的"二进制化",再到 QUIC 把 TCP 整个搬进用户态。每一步都少做了一个假设

HTTP/3 didn't appear from nowhere. It's the product of thirty years of trial and error: HTTP/0.9's one-line GET /, SPDY's experiments, HTTP/2's binary framing, finally QUIC dragging TCP into user space. Each step drops one assumption.

1991 1996 1999 2009 2015 2018 2021 2022 → HTTP/0.91991 · TBL HTTP/1.0RFC 1945 HTTP/1.1RFC 2616 → 9112 SPDYGoogle, 2009 HTTP/2RFC 7540 · 9113 gQUIC2012 · Roskind IETF QUIC2016 WG QUIC v1RFC 9000 · 2021-05 HTTP/3RFC 9114 · 2022-06 TLS 1.2RFC 5246 · 2008 TLS 1.3RFC 8446 · 2018 HTTP semantics line HTTP framing release QUIC transport TLS milestone retired / merged
FIG 02·1 HTTP 协议家谱 · 1991 → 2026 · 三条主线(semantics / framing / transport)的交错演进。 Fig 02·1 · HTTP family tree, 1991 → 2026 · three lines (semantics / framing / transport) braiding through 30 years.

关键节点

Key milestones

Year | Event | Person / Doc
1991 | HTTP/0.9 · single-line GET / | Tim Berners-Lee · CERN
1996 | HTTP/1.0 · RFC 1945 | Henrik Frystyk Nielsen · W3C
1997 | HTTP/1.1 · RFC 2068 → 2616 (1999) → 7230 (2014) → 9112 (2022) | Roy Fielding · UCI
2008 | TLS 1.2 · RFC 5246 | Tim Dierks · Eric Rescorla
2009 | SPDY · experimental in Chrome | Mike Belshe · Roberto Peon · Google
2012 | gQUIC · at Google | Jim Roskind
2015 | HTTP/2 · RFC 7540 | Mark Nottingham · Martin Thomson
2016 | IETF QUIC WG chartered | Mark Nottingham · Lars Eggert
2018 | TLS 1.3 · RFC 8446 | Eric Rescorla · Mozilla
2018-11 | "HTTP/3" name finalised | Mark Nottingham · IETF 103
2021-05 | RFC 9000/9001/9002 · QUIC v1 | Iyengar · Thomson · Bishop · Pardue
2022-06 | RFC 9114 · HTTP/3 | Mike Bishop · Akamai
2022-06 | RFC 9204 · QPACK | Charles 'Buck' Krasic · Mike Bishop · Alan Frindell
2023 | RFC 9460 · HTTPS RR (SVCB) | Ben Schwartz · Mike Bishop · Erik Nygren
2023 | QUIC v2 · RFC 9369 · new version number, salt and long-header type codes, anti-ossification | Martin Duke
TRIVIA HTTP/3 命名差点叫 "HTTP over QUIC"。2018 年 11 月在 IETF 103 Bangkok 会上,Mark Nottingham 一句"为什么不直接叫 HTTP/3"被点头通过——这意味着 IETF 第一次公开承认 transport 选择是 HTTP 版本号的一部分。 HTTP/3 was almost called "HTTP over QUIC". At IETF 103 in Bangkok (Nov 2018), Mark Nottingham casually asked "why not just HTTP/3" — the room nodded. That was IETF's first public admission that transport choice is part of HTTP's version number.
Chapter references
RFC 9000 · QUIC
RFC 8446 · TLS 1.3
RFC 8449 · Record size limits
CHAPTER 03

HTTP/2 的死结 — TCP 的三宗罪

HTTP/2's deadlock — TCP's three sins

为什么花了七年发现 HTTP/2 还不够

why it took seven years to find out HTTP/2 wasn't enough

2015 年 HTTP/2 发布的时候,大家以为 HTTP 终于"完工"了。它把 ASCII 换成了二进制,把 6 条 TCP 连接压成 1 条,把头部用 HPACK 压扩到 95%。结果跑了三年实战,工程师们发现 HTTP/2 留下了三个根本治不好的问题——而且都不是 HTTP/2 的错。是 TCP 的错。

When HTTP/2 shipped in 2015, everyone thought HTTP was finally "done". It swapped ASCII for binary, collapsed 6 TCP connections into 1, compressed headers ~95% with HPACK. Three years of production later, engineers found that HTTP/2 left three diseases that couldn't be cured — and none of them were HTTP/2's fault. They were TCP's fault.

三宗罪 · The three sins

The three sins

罪一 · TCP HOL
SIN 1 · TCP HOL
"一个包卡死全场""One packet stalls everyone"

HTTP/2 在应用层多路复用 100 个流,但 TCP 在传输层仍然要求按序交付。一个数据包丢了,整条 TCP 连接停下来等重传——即使另外 99 个流毫无关系。这叫 TCP head-of-line blocking

HTTP/2 multiplexes 100 streams at the application layer, but TCP at the transport layer still demands in-order delivery. Drop one packet, the entire TCP connection halts — even if the other 99 streams are unrelated. This is TCP head-of-line blocking.

实测:3% 丢包率下 HTTP/2 经常比 HTTP/1.1 多连接还慢。

Measured: at 3% loss, HTTP/2 often loses to HTTP/1.1 multi-connection.

罪二 · 握手 RTT
SIN 2 · Handshake RTT
"三步走才能开口""Three steps before you speak"

HTTP/2 must run over TLS (in practice). A fresh connection needs TCP SYN/SYN-ACK/ACK (1 RTT) plus the TLS 1.2 handshake (2 RTT) = 3 RTT. TLS 1.3 trims the TLS part to 1 RTT, which still leaves 2 RTT in total; TCP Fast Open could in principle remove one more, but it is rarely usable in practice (see its deployment numbers later in this chapter). At a 200 ms intercontinental RTT, you spend 400-600 ms before saying a word.

实测:手机 4G/5G 上,握手时间常常超过整个页面的 LCP 预算。

Measured: on 4G/5G, handshake alone often eats the page's entire LCP budget.
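The round-trip arithmetic behind Sin 2 can be written down in a few lines. This is only a sketch: the RTT counts are the textbook values for a fresh, loss-free connection, and the table and function names are illustrative rather than taken from any library.

```python
# Round trips spent before the first HTTP request byte can leave the client.
# Textbook values for a loss-free path; names here are illustrative only.
SETUP_RTTS = {
    "TCP + TLS 1.2": 1 + 2,   # TCP handshake + 2-RTT TLS 1.2 handshake
    "TCP + TLS 1.3": 1 + 1,   # TCP handshake + 1-RTT TLS 1.3 handshake
    "QUIC 1-RTT": 1,          # transport and TLS 1.3 handshake combined
    "QUIC 0-RTT": 0,          # resumed session: request rides the first flight
}


def setup_delay_ms(stack: str, rtt_ms: float) -> float:
    """Time spent on handshakes before the request can be sent."""
    return SETUP_RTTS[stack] * rtt_ms


if __name__ == "__main__":
    for stack in SETUP_RTTS:
        print(f"{stack:14s} {setup_delay_ms(stack, 200):5.0f} ms at 200 ms RTT")
```

At the 200 ms RTT used above, the first two rows reproduce the 400-600 ms range quoted in the text.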

罪三 · 连接绑死
SIN 3 · IP-pinned
"Wi-Fi 切 5G 就断""Wi-Fi to 5G ⇒ disconnect"

A TCP connection is identified by the 4-tuple (src_ip, src_port, dst_ip, dst_port). When a phone switches from Wi-Fi to 5G, src_ip changes, so the TCP connection is dead on the spot and the TLS session has to be rebuilt along with it. That long-lived WebSocket inside your SPA? Gone.

实测:Meta 测算 5% 的视频流断流是因为切网。

Measured: Meta attributes ~5% of video stalls to network switches.

罪四(隐藏) · 协议僵化
SIN 4 (hidden) · Ossification
"想加新字段都加不了""You can't add a new field"

中间盒(运营商 NAT、企业防火墙、CDN)对 TCP/TLS 字段有路径上的判断逻辑。RFC 允许的扩展字段到中间盒手里就被丢包。TLS 1.3 当初为此用了"中间盒兼容模式"伪装成 TLS 1.2。HTTP/3 干脆躲到 UDP 里。

Middleboxes — ISP NATs, enterprise firewalls, CDNs — inspect TCP/TLS fields and silently drop anything new. RFC-permitted extensions get blackholed in flight. TLS 1.3 ended up disguising itself as TLS 1.2. HTTP/3 just hides inside UDP.

实测:TLS 1.3 早期遭遇 ~3% 中间盒丢包。

Measured: early TLS 1.3 saw ~3% middlebox drops.

罪一可视化 · TCP HOL vs QUIC 流独立

Visualising Sin 1 · TCP HOL vs QUIC stream independence

[Figure 03·1: HTTP/2 over TCP (single byte stream, in-order) vs HTTP/3 over QUIC (independent streams, per-stream order). Five streams A-E send one packet each and C·1 is lost. Left panel: the app sees A1 and B1, then every stream is blocked until C·1 is retransmitted; five streams suffer for one lost packet. Right panel: the app sees A1, B1, D1, E1 while only stream C waits; 4/5 streams are unaffected.]
Fig 03·1 · Five streams, one packet (PN3) lost. Left: TCP, one ordered pipe, every stream stalls. Right: QUIC, five independent streams, only C blocks.
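The scenario in Fig 03·1 can be mimicked with a toy model. This is a simplified sketch, not how a real stack works: it treats each packet as carrying data for exactly one stream, and the helper names are hypothetical.

```python
# Toy model of Fig 03·1: five streams, one packet each, the third packet lost.
# Hypothetical helper names; real stacks track byte offsets, not whole packets.
PACKETS = [("A", 1), ("B", 2), ("C", 3), ("D", 4), ("E", 5)]  # (stream, seq)
LOST = {3}


def deliverable_over_tcp(packets, lost):
    """TCP hands data to the app strictly in sequence order:
    everything behind the first gap waits for the retransmission."""
    out = []
    for stream, seq in packets:
        if seq in lost:
            break
        out.append(stream)
    return out


def deliverable_over_quic(packets, lost):
    """QUIC orders data per stream, so a lost packet only holds back
    the stream whose data it carried."""
    return [stream for stream, seq in packets if seq not in lost]


if __name__ == "__main__":
    print("TCP :", deliverable_over_tcp(PACKETS, LOST))
    print("QUIC:", deliverable_over_quic(PACKETS, LOST))
```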
「HTTP/2 把 HTTP 治好了,
但 HTTP/2 自己被 TCP 治残了。」
"HTTP/2 cured HTTP,
and then TCP crippled HTTP/2."
Daniel Stenberg · curl · 2018

为什么不直接改 TCP?

Why not just fix TCP?

这是 IETF 在 2015-2016 年最先想到的方案。但 TCP 是内核态协议——任何字段改动都要等 Linux / Windows / iOS / Android / 每一台路由器升级一遍。看看 TCP Fast Open(RFC 7413, 2014)现状:发布十年了,实际部署率仍然 < 5%,因为中间盒会丢掉它的 cookie。

结论:在 TCP 上演进 = 在十年这个时间尺度上演进。

That was IETF's first instinct in 2015-2016. But TCP lives in the kernel — any field change waits for Linux / Windows / iOS / Android / every router to ship a new version. Look at TCP Fast Open (RFC 7413, 2014): ten years on, deployment is still < 5%, because middleboxes drop its cookie.

Conclusion: evolving on top of TCP means evolving on a decade timescale.

FIELD NOTE · NUMBERS · Google's 2016 SIGCOMM paper dropped one number that shut the room up: end-to-end latency of Google.com search was 8% faster on gQUIC than on TCP+TLS at the median, and 16% faster at the median on slow links. Those two figures were the direct trigger for the IETF QUIC WG.
CHAPTER 04

为什么是 UDP — 中间盒,僵化,与可部署性

Why UDP — middleboxes, ossification, and deployability

不是因为 UDP 好,是因为 UDP 不被人管

not because UDP is good, but because nobody touches UDP

"为什么 QUIC 跑在 UDP 上?" 这是任何讲 HTTP/3 的人都要回答的第一个问题。直觉答案"UDP 没有可靠传输、所以 QUIC 自己实现可靠"是错的——这是结果,不是原因。真正的原因只有一个:UDP 是当今互联网上仅剩的、中间盒不会乱碰的协议号。

"Why does QUIC run on UDP?" is the first question every HTTP/3 talk has to answer. The intuitive answer — "UDP isn't reliable, so QUIC has to add its own reliability" — is wrong. That's a consequence, not a cause. The real reason is one sentence: UDP is the only protocol number left on the modern internet that middleboxes don't mess with.

候选清单 · The shortlist

The shortlist

Option | Pros | Why not
SCTP | native multi-stream, message-based | IP protocol 132: most NATs drop it, ~50% loss
DCCP | unordered, with congestion control | IP protocol 33: same problem, < 0.1% deployed
New IP protocol | theoretically cleanest | needs every router/NAT/firewall on Earth to upgrade, which is impossible
TCP option | reuses the existing connection | middleboxes strip unknown TCP options
UDP | UDP/443 traverses everywhere | you have to rebuild TCP in user space, but that is exactly what QUIC wants
FIELD NOTE · Ossification · "Protocol ossification" became IETF's main concern after 2015. A protocol becomes more rigid the more successful it gets, because more middleboxes start assuming what its fields mean. TCP is so ossified that any RFC change takes a decade to propagate. QUIC's strategy is active anti-ossification: encrypt packet numbers from day one, encrypt header flags, GREASE fake parameters, ship periodic version negotiation, so that middleboxes see nothing but the UDP header and source port.

用户态的代价

The user-space cost

把 TCP 的所有功能(重传、拥塞控制、流控、多路复用、连接管理)搬到用户态,意味着每个 QUIC 数据包都要:进内核 → recvfrom() 拷贝到用户态 → 解密 → 处理 → 加密 → sendto() 拷贝回内核 → 网卡。Fastly 2020 年的实测:QUIC 的 CPU 成本是 TCP+TLS 的 ~2 倍。这是 HTTP/3 真正的负面成本,我们会在第 22 章详细讲。

Moving everything TCP did (retransmit, cc, flow control, mux, connection management) into user space means every QUIC packet has to: enter kernel → recvfrom() copy to user space → decrypt → handle → encrypt → sendto() copy back → NIC. Fastly's 2020 measurement: QUIC costs ~2x the CPU of TCP+TLS. That is HTTP/3's real downside, and we will revisit it in chapter 22.

FIELD NOTE · Irony · UDP was designed in 1980 as "the simplest unreliable protocol", a thin port demultiplexer. Forty years later, it has become the host of half the world's web traffic, reliably. "Simplest" turns out to mean "hardest to ossify".
CHAPTER 05

QUIC 全景 — 4 加密级 · 3 PN 空间 · 2 类 Header

QUIC at a glance — 4 levels · 3 PN spaces · 2 headers

在钻进每章细节之前,先把骨架记牢

memorise the skeleton before diving into each chapter

QUIC 的设计可以用三个小数字描述:4 个加密级(Initial / 0-RTT / Handshake / 1-RTT)、3 个 Packet Number 空间(Initial / Handshake / Application)、2 类 Header(Long / Short)。这三个数字之间的关系,是后面所有章节的预读骨架。

QUIC's design fits into three small numbers: 4 encryption levels (Initial / 0-RTT / Handshake / 1-RTT), 3 Packet Number spaces (Initial / Handshake / Application), 2 Header types (Long / Short). The relationship between these three numbers is the pre-read skeleton for every later chapter.

协议栈 · The stack

The stack

应用
App
HTTP/3 (RFC 9114)
+ QPACK
QPACK (RFC 9204)
传输
Transport
QUIC (RFC 9000-2)
加密
Crypto
TLS 1.3 (RFC 8446)

↓ UDP/443 · IPv4 / IPv6 · 链路层link layer

↓ UDP/443 · IPv4 / IPv6 · link layer

STACK · Note that TLS 1.3 is not below QUIC but inside QUIC. QUIC carries the TLS 1.3 handshake messages inside CRYPTO frames and does not use the TLS record layer at all; packet protection is done by QUIC itself with keys exported from the handshake. That is why RFC 9001 is titled "Using TLS to Secure QUIC", not "QUIC over TLS".

4 个加密级 · 4 levels

The four encryption levels

Initial
公开 salt + DCID 派生密钥。任何人都能解密——这层的"加密"只是为了反僵化、防止中间盒乱碰。
Keys derived from public salt + DCID. Anyone can decrypt — this "encryption" only exists to fight ossification, to keep middleboxes from poking inside.
0-RTT (Early Data)
用前一次会话恢复的 PSK 派生。只在恢复连接时存在。承担重放风险(见 Ch08)。
Keys derived from a resumed session's PSK. Only exists on connection resumption. Carries replay risk (see Ch08).
Handshake
Keys derived as soon as ServerHello completes the key exchange. They protect the rest of the TLS 1.3 handshake (EE/CERT/CV/FIN). Real encryption kicks in here, still inside the handshake.
1-RTT (Application)
握手完成后用的主密钥。承担 99% 的数据传输。可以做 key update(密钥滚动)。
The main key after handshake completes. Carries 99% of all data. Supports key update (rotating keys mid-connection).
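To make the "anyone can decrypt" property of the Initial level concrete, here is a minimal sketch of the client-side Initial key derivation using only the standard library. The salt and the labels are the ones published for QUIC v1 in RFC 9001 §5.2; the function names and the dictionary layout are illustrative choices, not any library's API.

```python
import hashlib
import hmac
import struct

# Initial salt for QUIC v1, published in RFC 9001 §5.2.
INITIAL_SALT_V1 = bytes.fromhex("38762cf7f55934b34d179ae6a4c80cadccbb7f0a")


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    return hmac.new(salt, ikm, hashlib.sha256).digest()


def hkdf_expand_label(secret: bytes, label: str, length: int) -> bytes:
    """TLS 1.3 HKDF-Expand-Label with an empty context (RFC 8446 §7.1)."""
    full_label = b"tls13 " + label.encode()
    info = struct.pack("!H", length) + bytes([len(full_label)]) + full_label + b"\x00"
    out, block, counter = b"", b"", 1
    while len(out) < length:
        block = hmac.new(secret, block + info + bytes([counter]), hashlib.sha256).digest()
        out += block
        counter += 1
    return out[:length]


def client_initial_keys(dcid: bytes) -> dict:
    """Everything here depends only on public values: the salt and the DCID."""
    initial_secret = hkdf_extract(INITIAL_SALT_V1, dcid)
    client_secret = hkdf_expand_label(initial_secret, "client in", 32)
    return {
        "key": hkdf_expand_label(client_secret, "quic key", 16),  # AES-128-GCM key
        "iv": hkdf_expand_label(client_secret, "quic iv", 12),
        "hp": hkdf_expand_label(client_secret, "quic hp", 16),    # header protection key
    }


if __name__ == "__main__":
    keys = client_initial_keys(bytes.fromhex("8394c8f03e515708"))
    print({name: value.hex() for name, value in keys.items()})
```

The DCID in the usage line is the one used by the worked example in RFC 9001 Appendix A, so the printed values can be compared against that appendix. Any on-path observer can run the same derivation, which is why this level protects against ossification rather than eavesdropping.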

3 个 PN 空间

The three PN spaces

空间 1 · Initial
Space 1 · Initial
PN_0..N
独立编号,从 0 起
independent, starts at 0
  • CRYPTO (ClientHello)
  • ACK (initial)
  • PADDING (anti-amp)
空间 2 · Handshake
Space 2 · Handshake
PN_0..M
独立编号,从 0 起
independent, starts at 0
  • CRYPTO (EE/CERT/FIN)
  • ACK (handshake)
空间 3 · Application
Space 3 · Application
PN_0..∞
独立编号,从 0 起
independent, starts at 0
  • STREAM, ACK, MAX_DATA …
  • HANDSHAKE_DONE
  • NEW_CONNECTION_ID
Why three? If Initial / Handshake / 1-RTT shared one PN sequence, loss detection would "guess wrong": a gap in the numbering could be an Initial packet or a 1-RTT packet, and packets from different levels can arrive reordered or sit undecryptable until the right keys exist. Three independent spaces = three independent sets of ACK state = no cross-level head-of-line blocking. This is one of the reasons QUIC is structurally cleaner than TCP+TLS.

2 类 Header

The two header types

Long Header · 4 种
Long Header · 4 forms
握手期使用
used during handshake

字段:Version(32) · DCID Len(8) · DCID · SCID Len(8) · SCID · Type-specific...

Fields: Version(32) · DCID Len(8) · DCID · SCID Len(8) · SCID · Type-specific...

Initial · 0-RTT · Handshake · Retry

Short Header · 1 种
Short Header · 1 form
握手后使用(99% 流量)
post-handshake (99% of traffic)

字段:Flags(8) · DCID · PN(8/16/24/32)

Fields: Flags(8) · DCID · PN(8/16/24/32)

1-RTT only

FIELD NOTE · Field alignment · QUIC v2 (RFC 9369) deliberately changes wire details that carry no functional meaning: a new version number, a new Initial salt and key-derivation labels, and re-mapped long-header packet type codes. The point is to catch middleboxes doing things they shouldn't with hard-coded QUIC v1 parsing. A middlebox that reads a v2 packet with v1 assumptions misclassifies the packet or fails to decrypt the Initial, and breaks immediately. This is active anti-ossification realised at the byte level.
本章引用Chapter references
RFC
RFC 9114 · §3 HTTP request lifecycle
MAIN LINE · THE REQUEST

一次 GET ursb.me 的一生 — 字节级生命周期

The life of one GET ursb.me — a byte-level lifecycle

从 DNS 查询到 200 OK 到连接关闭 · 每一步都标 RFC §

from DNS query to 200 OK to connection close · every step pinned to its RFC §

接下来 14 章流水线都用同一条请求把它们串起来——在 Chrome 地址栏输入 https://ursb.me,按回车。我们跟着这次请求的字节流走完它的一生:DNS 解析、初次握手、传输请求、收到响应、连接闲置、网络切换、最后优雅关闭——一共 10 个阶段。每章都有一个 "◇ 在我们的 GET 请求里" 卡片,告诉你这一章的输入、变换、输出分别是什么。

这条主线的角色清单是:

The next 14 pipeline chapters all hang off one request: type https://ursb.me in Chrome, press Enter. We follow this request's byte stream through its full life: DNS query, first handshake, request payload, response, idle, network switch, graceful close — 10 phases. Every chapter below carries a "◇ In our GET request" card showing input, transform, output at that stage.

The cast on this main line:

角色清单 · The setup

The setup

// what the user typed
URL        = "https://ursb.me/"
Method     = GET                      // idempotent → 0-RTT eligible

// client
Browser    = "Chrome 134 on macOS"
Library    = "google/quiche (C++)"
src_ip     = 192.168.1.42             // Wi-Fi at T+0
src_port   = 52341                    // ephemeral

// server
Origin     = "ursb.me"
Stack      = "Cloudflare quiche + nginx 1.26"
dst_ip     = 39.105.102.252
dst_port   = 443                      // UDP, not TCP

// network
RTT        = 40 ms                    // home Wi-Fi → BJ aliyun
Loss       = ~1.5%                    // peak hour
Path MTU   = 1500

// stored state from prior visit
PSK ticket = "valid · age=2h"         // 0-RTT eligible

10 个阶段全景

All 10 phases at a glance

[Figure cmain·1: client ↔ server timeline of the main line]
T+0      Phase 0 · DNS · HTTPS RR · DoH/DoQ · RFC 9460 + 8484/9250
T+5ms    Phase 1 · Initial[CH + 0-RTT[GET /]] · 1228 B · padded ≥ 1200 · RFC 9000 §17.2, §14.1
T+25ms   Phase 2 · Initial[SH] + Handshake[EE,Cert,CV,FIN] · ~2900 B (cert chain + TP) · RFC 9001 §4
T+45ms   Phase 3 · 1-RTT[STREAM 0: 200 OK + HTML] · 3200 B body · QPACK 5 B header · RFC 9114 §7
T+65ms   Phase 4 · 1-RTT[FIN + ACK + HANDSHAKE_DONE_ACK] · stream 0 closed · RFC 9000 §19.8
         Phase 5 · idle · keep-alive PING every ~25s · RFC 9000 §10.1.2
T+8min   Phase 6 · PATH_CHALLENGE / RESPONSE (Wi-Fi → 5G) · new src_ip · same CID · RFC 9000 §9
T+15min  Phase 7-8 · GOAWAY → CONNECTION_CLOSE → drain (3 PTO) · RFC 9114 §5.2 · RFC 9000 §10
Fig cmain·1 · Full main-line timeline · colour code: blue = client, purple = server crypto, green = data delivery, copper = client response, amber = close.

阶段 0 · DNS 解析(pre-flight)

Phase 0 · DNS resolution (pre-flight)

Chrome can't fire a QUIC packet yet. It needs DNS first: where is ursb.me, and which ALPNs does it speak? Chrome queries 1.1.1.1 over DoH (DNS over HTTPS, RFC 8484), asking simultaneously for A (IPv4) and the HTTPS RR (RFC 9460). The latter returns the ALPN list plus an IP hint in one record, saving an RTT.

INPUT
URL bar string"https://ursb.me/"
OUTPUT (DNS)
DNS RR setA → 39.105.102.252
HTTPS 1 . alpn="h3,h2"
DoH wire · POST /dns-query · application/dns-message · RFC 8484 + 9460
; Question section: ursb.me HTTPS
QNAME  = ursb.me.
QTYPE  = 65                           ; HTTPS (SVCB-compatible)
QCLASS = 1                            ; IN

; Answer: HTTPS RR
ursb.me. 300 IN HTTPS 1 . (
    alpn="h3,h2"                      ; ← tells the browser H3 is OK
    ipv4hint="39.105.102.252"         ; ← skip a separate A query
    port=443 )
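The question section above maps onto DNS wire format quite directly. The following sketch builds the bytes a DoH client could POST as `application/dns-message`; it uses only `struct`, and the function name is an illustrative choice. It does not perform the HTTP request itself.

```python
import struct


def build_https_query(name: str) -> bytes:
    """DNS wire-format query for the HTTPS RR (type 65), suitable as the
    body of an RFC 8484 POST with Content-Type: application/dns-message."""
    header = struct.pack(
        "!HHHHHH",
        0,            # ID: RFC 8484 suggests 0 so that responses cache well
        0x0100,       # flags: standard query, recursion desired
        1, 0, 0, 0,   # QDCOUNT=1, no answer/authority/additional records
    )
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 65, 1)  # QTYPE=HTTPS, QCLASS=IN
    return header + question


if __name__ == "__main__":
    print(build_https_query("ursb.me").hex())
```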

阶段 1 · 第一个 UDP 包出发

Phase 1 · The first UDP datagram leaves

DNS 回包 5 ms 后,Chrome 拼出第一个真正的 QUIC 包。因为有上次会话的 PSK ticket,这次走 0-RTT:ClientHello 和 GET 请求一起放进同一个 UDP 数据报

5 ms after the DNS response, Chrome assembles the first actual QUIC packet. Because we have a PSK ticket from last visit, this is a 0-RTT send: ClientHello and the GET request ride in the same UDP datagram.

INPUT
empty TLS state + PSK"send a ClientHello and a GET"
OUTPUT (wire)
1228-byte UDP datagramInitial[CRYPTO:CH+PSK]
+ 0-RTT[STREAM 0: GET]
+ PADDING
UDP/443 · the first packet on the wire · tcpdump · 1228 B total
; IP/UDP headers (28 B)
IPv4: src=192.168.1.42 dst=39.105.102.252 proto=17
UDP:  src=52341 dst=443 len=1200

; Coalesced QUIC datagram (1200 B): TWO packets, ONE UDP send
[Initial packet, PN_Initial=0] (~520 B)
  CRYPTO[0..516]: TLS 1.3 ClientHello {
    SNI            = "ursb.me"                ; (or ECH-encrypted)
    ALPN           = ["h3"]
    key_share      = x25519:0x9c4f…
    pre_shared_key = [ticket from last visit]
    early_data     = ext_42                   ; "I will send 0-RTT"
    quic_transport_parameters = {             ; RFC 9000 §18
      initial_max_data         = 10_485_760   ; 10 MB
      initial_max_streams_bidi = 100
      initial_max_streams_uni  = 100
      max_idle_timeout         = 30_000       ; 30 s
      disable_active_migration = false
    }
  }
[0-RTT packet, PN_Application=0] (~680 B)
  STREAM sid=0, off=0, fin=true:
    HEADERS frame (QPACK static-indexed, see Ch15)
      :method=GET  :scheme=https
      :authority=ursb.me  :path=/
      accept=text/html,*/*
      user-agent=Mozilla/5.0…
      priority=u=0,i                          ; RFC 9218
  PADDING × N                                 ; pad to 1200 anti-amp floor

阶段 2 · 服务器握手响应

Phase 2 · Server's handshake response

20 ms later the first server reply arrives. This is a classic coalesced case: the server packs Initial, Handshake and 1-RTT packets back to back into the same UDP datagrams, carrying CRYPTO frames for different handshake stages plus the first batch of response data. At ~2900 B the flight cannot fit into a single datagram on a 1500-byte path MTU (QUIC never relies on IP fragmentation), so on the wire it is split across two or three datagrams, each holding one or more whole QUIC packets.

INPUT (server received)
our 1228-byte datagramverifies PSK · accepts 0-RTT
OUTPUT (server → client)
~2900-byte coalesced replyInitial[SH]
+ Handshake[EE,Cert,CV,FIN]
+ 1-RTT[200 OK headers]
server → client · coalesced UDP datagram · RFC 9001 §4 · RFC 9000 §12.2
[Initial packet, PN_Initial=0] (~80 B)
  ACK [0]                                     ; ack the client's Initial
  CRYPTO[0..]: TLS ServerHello {
    selected_psk = 0
    key_share    = x25519:0xa1b2…
  }
[Handshake packet, PN_Handshake=0] (~2600 B)
  CRYPTO[0..]:
    EncryptedExtensions { early_data=accept, alpn=h3, TP=… }
    Certificate         { cert + intermediates · ~1700 B }
    CertificateVerify   { sig over transcript }
    Finished            { mac }
[1-RTT packet, PN_Application=0] (~250 B)
  HANDSHAKE_DONE                              ; RFC 9000 §19.20: keys are committed
                                              ; (strictly, sent only once the client's Finished
                                              ;  has arrived, RFC 9001 §4.1.2; shown here for compactness)
  NEW_CONNECTION_ID seq=1                     ; pre-stock 1 spare CID for migration
  STREAM sid=0, off=0:                        ; can answer 0-RTT GET already
    HEADERS (QPACK · 7 B) → :status=200, content-type=text/html
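Whether a first flight of this size is allowed depends on the anti-amplification limit: before the client's address is validated, a server may send at most three times the bytes it has received (RFC 9000 §8.1). The sketch below checks that budget and splits the reply into datagram-sized pieces. The function name and the `max_datagram` ceiling are assumptions made for illustration.

```python
AMPLIFICATION_FACTOR = 3  # RFC 9000 §8.1: limit before the client address is validated


def plan_first_flight(reply_bytes: int, received_bytes: int, max_datagram: int):
    """Split a server reply into datagram sizes and report whether it fits the
    pre-validation budget. max_datagram is an assumed UDP payload ceiling."""
    budget = AMPLIFICATION_FACTOR * received_bytes
    sizes = []
    remaining = reply_bytes
    while remaining > 0:
        size = min(remaining, max_datagram)
        sizes.append(size)
        remaining -= size
    return sizes, reply_bytes <= budget


if __name__ == "__main__":
    # ~2900 B reply to a 1200 B client datagram, with a 1200 B ceiling per datagram.
    print(plan_first_flight(2900, 1200, 1200))
```

With the main-line numbers the reply fits: 2900 B is below the 3 × 1200 = 3600 B budget, which is one reason the client pads its first datagram.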

阶段 3 · 200 OK + 正文到达

Phase 3 · 200 OK + body arrives

前面那个 Handshake 包确认完后,Chrome 在 ~45 ms 收到完整正文。3200 字节的 HTML 通过同一个 Stream 0 的 DATA 帧分两个 1-RTT 包送到——这就是 0-RTT 的胜利:用户看到 200 OK 时握手还没完全结束

A few packets later, the complete body lands by ~45 ms. The 3200-byte HTML rides Stream 0 in two DATA frames spread across 1-RTT packets. The 0-RTT win is concrete here: the user sees 200 OK before the handshake is fully closed.

INPUT
HEADERS only:status=200 · content-type · content-length=3200
OUTPUT
full HTML to renderer3200 bytes body → into Chromium Loading stage
(see Field Note 02 Ch00 Loading)
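The DATA frames that carry the body have a very small envelope: a type, a length, and the payload, with both numbers encoded as QUIC variable-length integers. A minimal sketch of that framing follows; the frame type values come from RFC 9114 §7.2, while the function names are illustrative.

```python
def encode_varint(value: int) -> bytes:
    """QUIC variable-length integer (RFC 9000 §16)."""
    if value < 1 << 6:
        return value.to_bytes(1, "big")
    if value < 1 << 14:
        return (value | 0x4000).to_bytes(2, "big")
    if value < 1 << 30:
        return (value | 0x8000_0000).to_bytes(4, "big")
    if value < 1 << 62:
        return (value | 0xC000_0000_0000_0000).to_bytes(8, "big")
    raise ValueError("value does not fit in 62 bits")


H3_DATA, H3_HEADERS = 0x00, 0x01  # frame types from RFC 9114 §7.2


def encode_h3_frame(frame_type: int, payload: bytes) -> bytes:
    """HTTP/3 frame = type (varint) + length (varint) + payload."""
    return encode_varint(frame_type) + encode_varint(len(payload)) + payload


if __name__ == "__main__":
    frame = encode_h3_frame(H3_DATA, b"x" * 3200)
    print(frame[:3].hex(), len(frame))
```

For a single DATA frame holding the whole 3200-byte body, the envelope costs three bytes: one for the type and two for the length.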

阶段 4 · FIN + 收尾 ACK

Phase 4 · FIN + final ACK

Once Chrome has received the 3200 bytes, the last STREAM frame carries FIN=1: no more data in this direction. The client's own direction was already closed by the fin=true on its GET in Phase 1, so stream 0 is now closed both ways (the half-close semantics of bidirectional streams) and all that is left is to ACK. The client also ACKs HANDSHAKE_DONE. Receiving HANDSHAKE_DONE is what confirms the handshake for the client and lets it discard its Handshake keys (RFC 9001 §4.9.2); the ACK merely tells the server it can stop retransmitting the frame.

INPUT (server sent)
STREAM[0] body + FIN
OUTPUT (client sent)
ACK (client FIN already sent in Phase 1) · stream 0 fully closed · resources freed
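Why stream 0 is a client-initiated bidirectional stream is encoded in the stream ID itself: the two low bits carry the initiator and the direction (RFC 9000 §2.1). A small sketch, with an illustrative function name:

```python
def classify_stream(stream_id: int) -> tuple:
    """Read the two low bits of a QUIC stream ID (RFC 9000 §2.1)."""
    initiator = "client" if stream_id & 0x1 == 0 else "server"
    direction = "bidirectional" if stream_id & 0x2 == 0 else "unidirectional"
    return initiator, direction


if __name__ == "__main__":
    for sid in (0, 2, 3, 4, 8, 12):
        print(sid, classify_stream(sid))
```

Request streams opened by the browser therefore count 0, 4, 8, 12 and so on.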

阶段 5 · 闲置与 PING 保活

Phase 5 · Idle & PING keep-alive

连接没有立刻关——Chrome 默认会保留它 30 秒,等下一个请求(CSS、图片、API 调用)复用。期间双方按需发 PING 帧(RFC 9000 §19.2)防 NAT 表项过期。max_idle_timeout 在 TP 里协商出来——min(client 30s, server 30s) = 30s。

The connection doesn't close immediately — Chrome holds it for 30 s, hoping the next request (CSS, images, an API call) reuses it. Either side may send PING frames (RFC 9000 §19.2) to keep NAT mappings alive. max_idle_timeout was negotiated in TP — min(client 30s, server 30s) = 30s.

INPUT
no app datastream 0 closed, others ready
OUTPUT
PING every ~25scwnd/RTT stat kept warm
NAT mapping refreshed
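The timeout negotiation is a one-liner, sketched below. The min rule follows RFC 9000 §10.1; the keep-alive policy is purely an assumption chosen to reproduce the ~25 s figure for a 30 s timeout, and both function names are illustrative.

```python
def effective_idle_timeout_ms(local_ms: int, peer_ms: int) -> int:
    """RFC 9000 §10.1: each side advertises max_idle_timeout and the connection
    uses the smaller value. 0 (or absent) means that side sets no limit."""
    advertised = [t for t in (local_ms, peer_ms) if t > 0]
    return min(advertised) if advertised else 0  # 0 means idle timeout disabled


def keepalive_interval_ms(idle_timeout_ms: int, num: int = 5, den: int = 6) -> int:
    """Hypothetical policy: PING a little before the timeout would fire.
    The 5/6 ratio is an assumption, not something any RFC prescribes."""
    return idle_timeout_ms * num // den


if __name__ == "__main__":
    timeout = effective_idle_timeout_ms(30_000, 30_000)
    print(timeout, keepalive_interval_ms(timeout))
```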

阶段 6 · 切网迁移

Phase 6 · Network migration

8 分钟后用户走出咖啡馆,手机切到 5G——src_ip192.168.1.42 变成 10.220.5.13。Chrome 启用预存的备用 CID,服务器看到陌生 IP + 合法 CID 立刻发 PATH_CHALLENGE。一次 RTT 内完成路径验证,连接没断

8 minutes later the user walks out of the café and the phone switches to 5G — src_ip flips from 192.168.1.42 to 10.220.5.13. Chrome activates the pre-stocked spare CID; the server sees a new IP with a valid CID and fires PATH_CHALLENGE. Path validated in one RTT; the connection survives.

INPUT
old path = Wi-Fisrc=192.168.1.42:52341
OUTPUT
new path = 5Gsrc=10.220.5.13:34188
DCID rotated · same crypto state
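The reason the connection survives is that the server looks connections up by connection ID rather than by address. The following is a deliberately simplified sketch of that idea plus the PATH_CHALLENGE / PATH_RESPONSE echo; the class and function names are hypothetical, and real implementations track much more state (anti-amplification on the new path, congestion state reset, CID retirement).

```python
import os


class Connection:
    def __init__(self, cids, peer_addr):
        self.cids = set(cids)          # connection IDs this endpoint issued
        self.peer_addr = peer_addr     # last validated (ip, port)
        self.pending_challenge = None


connections = {}                       # CID -> Connection (not 4-tuple -> connection)


def register(conn):
    for cid in conn.cids:
        connections[cid] = conn


def on_packet(dcid: bytes, src_addr):
    """Look the connection up by DCID; a new source address triggers
    path validation instead of a lookup miss."""
    conn = connections.get(dcid)
    if conn is None:
        return None, None
    if src_addr != conn.peer_addr and conn.pending_challenge is None:
        conn.pending_challenge = (src_addr, os.urandom(8))  # PATH_CHALLENGE data
        return conn, ("PATH_CHALLENGE", conn.pending_challenge[1])
    return conn, None


def on_path_response(conn, src_addr, data: bytes) -> bool:
    """The path is validated once the peer echoes the 8 random bytes."""
    if conn.pending_challenge == (src_addr, data):
        conn.peer_addr, conn.pending_challenge = src_addr, None
        return True
    return False
```

A TCP-style table keyed by (src_ip, src_port, dst_ip, dst_port) would have returned a miss for the first packet from the 5G address.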

阶段 7-8 · 优雅关闭

Phase 7-8 · Graceful close

15 minutes in, the server decides to retire this connection (rolling deploy, load-balancing, quota expiry). It sends GOAWAY (H3 frame 0x07): "I'll finish in-flight streams but accept no new ones." After the last stream is done, it sends CONNECTION_CLOSE (QUIC frame 0x1c). The server, as the sender, then sits in the closing state, while the client enters the draining state on receipt (RFC 9000 §10.2). Both states last 3 PTO, during which late packets are not processed, so they cannot be confused with a "new" connection. See Ch19.

INPUT
connection still alivestreams 4, 8, 12 running
OUTPUT
GOAWAY → CONNECTION_CLOSE → drain · streams complete
3 PTO drain · then closed
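How long "3 PTO" is depends on the RTT estimate. The sketch below uses the probe timeout formula from RFC 9002 §6.2.1; the sample values in the usage line (a 5 ms RTT variance and the default 25 ms max_ack_delay) are assumptions layered on the 40 ms main-line RTT, and the function names are illustrative.

```python
K_GRANULARITY_MS = 1  # timer granularity constant, RFC 9002 §6.1.2


def pto_ms(smoothed_rtt: float, rttvar: float, max_ack_delay: float) -> float:
    """Probe timeout per RFC 9002 §6.2.1."""
    return smoothed_rtt + max(4 * rttvar, K_GRANULARITY_MS) + max_ack_delay


def close_period_ms(smoothed_rtt: float, rttvar: float, max_ack_delay: float) -> float:
    """Closing/draining states last at least three PTOs (RFC 9000 §10.2)."""
    return 3 * pto_ms(smoothed_rtt, rttvar, max_ack_delay)


if __name__ == "__main__":
    # Assumed sample values: 40 ms smoothed RTT, 5 ms rttvar, 25 ms max_ack_delay.
    print(pto_ms(40, 5, 25), close_period_ms(40, 5, 25))
```

Under those assumptions the connection lingers for roughly a quarter of a second after CONNECTION_CLOSE.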

阶段 9 · Stateless Reset(备选结局)

Phase 9 · Stateless Reset (alternate ending)

如果服务器进程意外重启(OOM、crash、容器升级),客户端发的下一个 1-RTT 包会让新进程找不到对应的连接上下文。新进程不能用 CONNECTION_CLOSE(没密钥也没握手状态),只能发一个 Stateless Reset——一段看起来像随机 UDP 数据但末尾带 16 字节 reset token(在阶段 2 的 NEW_CONNECTION_ID 里预发过)的包。客户端识别 token 后才能安全地说"对方真的丢状态了",然后销毁本地连接。这是 RFC 9000 §10.3 给出的无状态恢复路径

If the server process unexpectedly restarts (OOM, crash, container upgrade), the next 1-RTT packet from the client finds the new process without any matching connection state. The new process can't send CONNECTION_CLOSE (no keys, no state). Instead it emits a Stateless Reset: a packet that looks like random UDP bytes but ends in the 16-byte reset token the original server pre-distributed via NEW_CONNECTION_ID in Phase 2. Only the client can recognise the token — and only then can it safely conclude "peer really lost state" and tear down locally. This is the stateless-recovery path of RFC 9000 §10.3.

INPUT
server restartedno conn state · cannot decrypt
OUTPUT
Stateless Resetrandom-looking 21+ B
tail = reset_token from §18.2
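On the client side, detecting a Stateless Reset amounts to comparing the last 16 bytes of an undecryptable datagram against the tokens the peer handed out earlier. A minimal sketch, assuming the client keeps those tokens in a list; the function name is illustrative, and the constant-time comparison follows the advice in RFC 9000 §10.3.1.

```python
import hmac

MIN_STATELESS_RESET_LEN = 21   # RFC 9000 §10.3: 5 bytes minimum + 16-byte token
TOKEN_LEN = 16


def is_stateless_reset(datagram: bytes, known_tokens) -> bool:
    """Check whether an undecryptable datagram ends in a reset token
    that the peer previously gave us for one of its connection IDs."""
    if len(datagram) < MIN_STATELESS_RESET_LEN:
        return False
    tail = datagram[-TOKEN_LEN:]
    matched = False
    for token in known_tokens:
        # compare_digest avoids leaking how many leading bytes matched
        matched |= hmac.compare_digest(tail, token)
    return matched
```

Only a `True` result justifies tearing the connection down locally; anything else is treated as garbage and dropped.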

阶段 → 章节 路线图

Phase → chapter roadmap

每一章下面都会有一个 "◇ 在我们的 GET 请求里" 卡片,把这一章的输入/动作/输出对应到上面 10 个阶段。下面这张表先把对应关系列清楚——按这个顺序读:

Each chapter below carries a "◇ In our GET request" card that anchors its input / action / output to the 10 phases above. Use this table as the reading map:

Main-line phase | Drill-down chapter | RFC §
Phase 0 · DNS | Ch22 Field work · Ch04 UDP | 9460 · 8484 · 9250
Phase 1 · Initial out | Ch06 UDP Datagram · Ch08 0-RTT | 9000 §17.2 · 9001 §4
Phase 2 · Server crypto | Ch07 Handshake · Ch09 Crypto layers | 9001 §4-§5 · 8446 §4
Phase 3 · 200 OK | Ch14 H3 frames · Ch15 QPACK | 9114 §7 · 9204
Phase 4 · FIN | Ch11 Streams · Ch12 Loss | 9000 §3 (states) · §19.8
Phase 5 · idle | Ch13 Congestion | 9000 §10.1 · §19.2 PING
Phase 6 · migration | Ch17 Migration | 9000 §9
Phase 7-8 · close | Ch19 Lifecycle (new) | 9114 §5.2 · 9000 §10
Phase 9 · stateless reset | Ch19 Lifecycle (new) | 9000 §10.3 · §18.2
"DNS 解析 5 ms · 握手 + 0-RTT GET 25 ms · 收到 200 OK 45 ms ·
15 分钟后优雅关闭。
整个过程 50% 的时间花在加密,30% 在等光速。"
"DNS in 5 ms · handshake + 0-RTT GET in 25 ms · 200 OK at 45 ms ·
gracefully closed 15 minutes later.
Half the time was in crypto, a third in waiting on the speed of light."
主线 · 阶段总览 main-line · phase summary
CHAPTER 06

UDP Datagram — 包在线上长什么样

UDP Datagram — what the packet looks like

字节级的 QUIC 包结构

QUIC packet structure, byte by byte

在主线里
In our request
T+5ms
线程 / 层
Layer
QUIC / UDP
RFC
9000 §17
输入 → 输出
In → Out
bytes → UDP payload

Main-line time T+5ms (Phase 1 on the main-line timeline): Chrome wraps a TLS 1.3 ClientHello into one UDP datagram, sent from source port 52341 to destination 39.105.102.252:443. This chapter takes that packet apart byte by byte.

◇ In our GET request · Main-line phase 1

INPUT
TLS 1.3 ClientHello · ~500 B encrypted handshake payload
OUTPUT
1228-byte UDP datagram · IP (20 B) + UDP (8 B) + 1200 B QUIC payload: Initial packet (PN=0) + 0-RTT (STREAM 0: GET) + PADDING

两类 Header

The two header types

FIG 06·1 · QUIC Long Header (handshake)
Byte layout: Flags (1 byte) · Version (32 bit, 0x00000001 = v1) · DCID len (1 byte) · Destination CID (0-20 bytes) · SCID len (1 byte) · Source CID · Type-specific (token, length, packet number, payload)
Flags byte:
  Header form: 1 bit · 1 = Long
  Fixed bit: 1 bit · always 1 (used for QUIC bit-mux)
  Type: 2 bits · 00=Initial · 01=0-RTT · 10=Handshake · 11=Retry
  Type-specific: 2 bits
  PN length: 2 bits · encodes 1/2/3/4 bytes
  (the last 4 bits are header-protected)
SHORT HEADER (1-RTT, post-handshake): Flags (1 byte) · Destination CID (implicit length, negotiated) · Packet number (1-4 bytes, header-protected) · Encrypted payload (AEAD over frames)
Fig 06·1 · QUIC packet structure · long vs short header · note that the PN and the low flag bits are both header-protected.

UDP 之外没有别人

Nothing outside the UDP

UDP/443 · main-line T+5ms · first packet on the wire · mock · constructed to spec
; Constructed by hand to match RFC 9000 §17.2 byte-for-byte; NOT a real
; tcpdump capture. The encrypted regions and the UDP checksum are abbreviated.
; This mock shows a lone Initial filling the datagram; the main-line packet
; coalesces a 0-RTT packet after it.
; To capture a real one: `SSLKEYLOGFILE=keys.log ./quiche-client https://ursb.me`
; then `tcpdump -i lo0 -w cap.pcap udp port 443` → open in Wireshark.
0x0000: 4500 04cc 0001 0000 4011 25e9      ; IP header: total len=1228, proto 17 = 0x11
0x000c: c0a8 012a 2769 66fc                ; src=192.168.1.42, dst=39.105.102.252
0x0014: cc75 01bb                          ; UDP: src=52341 (0xcc75), dst=443 (0x1bb)
0x0018: 04b8 0000                          ; UDP len=1208 (8 + 1200), checksum omitted
0x001c: c0                                 ; Flags = 0b11000000 → Long header, Initial
0x001d: 00 00 00 01                        ; Version = QUIC v1 (RFC 9000)
0x0021: 08 7b 0f 23 e4 a1 c4 12 5b         ; DCID len=8, DCID=...
0x002a: 08 3a 51 02 d8 ef 99 11 7c         ; SCID len=8, SCID=...
0x0033: 00                                 ; Token Length = 0 (no Retry token)
0x0034: 44 96                              ; Length = 1174 (var-int): PN + payload + AEAD tag
0x0036: [ ENCRYPTED ]                      ; Packet number + payload (AEAD)
; payload (~1157 bytes after decryption, with a 1-byte PN and 16-byte tag):
;   CRYPTO[0..len] = TLS 1.3 ClientHello
;   PADDING          ; pad the datagram to ≥ 1200 bytes (anti-amplification)
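The Length field in a long header is a QUIC variable-length integer: the top two bits of the first byte say how many bytes follow. A minimal decoder sketch (the function name is an illustrative choice) makes the encoding easy to check by hand; for example, the two bytes 44 b0 decode to 1200.

```python
def decode_varint(buf: bytes, offset: int = 0) -> tuple:
    """Decode one QUIC variable-length integer (RFC 9000 §16).
    Returns (value, number_of_bytes_consumed)."""
    first = buf[offset]
    length = 1 << (first >> 6)          # top two bits: 1, 2, 4 or 8 bytes
    if offset + length > len(buf):
        raise ValueError("truncated varint")
    value = first & 0x3F                # strip the two length bits
    for byte in buf[offset + 1 : offset + length]:
        value = (value << 8) | byte
    return value, length


if __name__ == "__main__":
    for sample in ("25", "44b0", "7bbd", "9d7f3e7d"):
        print(sample, decode_varint(bytes.fromhex(sample)))
```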
Why 1200 bytes? RFC 9000 §14.1 requires a client to pad datagrams that carry Initial packets to ≥ 1200 bytes. Two reasons: (1) anti-amplification: the client must "send enough first" so the server gets a 3× budget (see Ch18) for its reply; (2) PMTU floor: if the path cannot carry 1200 bytes the packet is dropped and the client falls back fast. RFC 9000 states 1200 directly as the floor; it is not simply IPv6's 1280-byte minimum MTU minus headers. IPv6 header 40 + UDP header 8 = 48, so 1280 − 48 = 1232, which is 32 bytes more than 1200. That 32-byte cushion leaves room for IPv4 tunnels, 6to4, IPsec and other encapsulations.
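The size arithmetic in this note is small enough to check mechanically. A sketch, with illustrative constant and function names:

```python
IPV6_MIN_MTU = 1280          # every IPv6 link must carry this much
IPV6_HEADER, IPV4_HEADER, UDP_HEADER = 40, 20, 8
QUIC_MIN_DATAGRAM = 1200     # RFC 9000 §14: smallest allowed maximum datagram size


def max_udp_payload(ip_packet_size: int, ip_header: int) -> int:
    """UDP payload that fits in an IP packet of the given size."""
    return ip_packet_size - ip_header - UDP_HEADER


def padding_needed(quic_bytes: int) -> int:
    """PADDING bytes a client must add so an Initial-carrying datagram
    reaches the 1200-byte floor."""
    return max(0, QUIC_MIN_DATAGRAM - quic_bytes)


if __name__ == "__main__":
    v6 = max_udp_payload(IPV6_MIN_MTU, IPV6_HEADER)
    print(v6, v6 - QUIC_MIN_DATAGRAM, padding_needed(520))
```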
FIG 06·2 · QUIC packet · bit-level layout · long header vs short header · per RFC 9000 §17.2 and §17.3

LONG HEADER · first byte · used by Initial / 0-RTT / Handshake / Retry
  • bit 7 · Header Form: 1 = long
  • bit 6 · Fixed Bit: must be 1
  • bits 5-4 · Long Packet Type: 00 = Initial · 01 = 0-RTT · 10 = Handshake · 11 = Retry
  • bits 3-2 · Reserved (HP): must be 00 once header protection is removed
  • bits 1-0 · PN Length (HP): 00 = 1 B · 01 = 2 B · 10 = 3 B · 11 = 4 B
  • (Retry carries no packet number; its low 4 bits are unused)

LONG HEADER · fields after the first byte
Version (32 bit · v1 = 0x00000001) · DCID Len (8 bit) · DCID (0-20 B · routing ID) · SCID Len (8 bit) · SCID (0-20 B · sender ID) · type-specific fields (Token in Initial, Retry token, etc.) · Length (var-int · 1-8 B) · PN (1-4 B, HP) · Payload (AEAD-encrypted)

SHORT HEADER · first byte · used by 1-RTT (post-handshake)
  • bit 7 · Header Form: 0 = short
  • bit 6 · Fixed Bit: must be 1
  • bit 5 · Spin Bit: latency observation · flips once per RTT
  • bits 4-3 · Reserved (HP)
  • bit 2 · Key Phase (HP): flips on key update
  • bits 1-0 · PN Length (HP)

SHORT HEADER · fields
DCID (no length byte · 0-20 B · length known to the receiver in advance) · PN (1-4 B, HP) · Payload (AEAD-encrypted: CRYPTO + STREAM + ACK + ... frames)

↑ The short header is ~18 bytes per packet smaller than the long header (no Version, DCID Len, SCID Len + SCID, or Length). That is the small saving of the 1-RTT phase.

HEADER PROTECTION (HP) · the bits marked (HP) are encrypted
The protected bits of the first byte and the PN field are XOR-ed with a 5-byte mask. For the AES suites the mask is AES-ECB(hp_key, sample), where sample is 16 bytes of ciphertext starting 4 bytes after the start of the Packet Number field. Of the first byte, an on-path observer sees only Header Form, Fixed Bit, and either the Long Packet Type (long header) or the Spin Bit (short header): enough to route, not enough to read the PN or the Key Phase for traffic analysis.

The first byte of every QUIC packet is all structure: Header Form distinguishes long from short headers; the Fixed Bit (must be 1) lets an endpoint tell QUIC apart from other protocols sharing the same UDP port; in a long header the 2 Long Packet Type bits distinguish Initial / 0-RTT / Handshake / Retry, while the short header uses one bit as the Spin Bit (RTT observability for operators) and another as Key Phase (key-rotation marker). The low bits (Reserved + PN Length in a long header, plus Key Phase in a short header) are encrypted by Header Protection, so middleboxes cannot read packet numbers for traffic analysis. Most of RFC 9000 §17 comes down to these two first-byte layouts.
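The bit layout described above can be sketched as a small decoder. This is a minimal illustration, assuming header protection has already been removed from the byte; the function name `parse_first_byte` and the dictionary keys are hypothetical, chosen only for this example.

```python
LONG_TYPES = {0: "Initial", 1: "0-RTT", 2: "Handshake", 3: "Retry"}


def parse_first_byte(b: int) -> dict:
    """Decode the first byte of a QUIC v1 packet (header protection removed)."""
    if not 0 <= b <= 0xFF:
        raise ValueError("first byte must fit in 8 bits")
    fields = {
        "form": "long" if b & 0x80 else "short",  # bit 7
        "fixed_bit": bool(b & 0x40),              # bit 6, must be 1 in v1
    }
    if b & 0x80:
        fields["packet_type"] = LONG_TYPES[(b >> 4) & 0x03]  # bits 5-4
        fields["reserved"] = (b >> 2) & 0x03                 # bits 3-2
    else:
        fields["spin_bit"] = bool(b & 0x20)   # bit 5
        fields["reserved"] = (b >> 3) & 0x03  # bits 4-3
        fields["key_phase"] = (b >> 2) & 0x01 # bit 2
    # Bits 1-0 encode the packet number length minus one.
    # (Not meaningful for Retry, which carries no packet number.)
    fields["pn_length"] = (b & 0x03) + 1
    return fields


print(parse_first_byte(0xC0))  # the Initial packet from the mock capture
print(parse_first_byte(0x44))  # a short-header packet with Key Phase set
```

On a real packet these low bits only become meaningful after the header-protection mask has been removed, which is why a decoder has to do that step first.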

Header Protection

QUIC encrypts not only the payload but also the packet number and the low bits of the first byte (4 bits in a long header, 5 in a short header). The recipe: take a 16-byte sample of the already-encrypted payload, run it through the cipher keyed with that encryption level's hp key (AES-ECB for the AES suites) to derive a 5-byte mask, then XOR the mask onto the first byte and the PN. This "header protection" exists specifically to stop middleboxes from reading packet numbers for traffic analysis.
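The XOR step of that recipe can be sketched as follows. This is only an illustration: the mask is passed in ready-made, because computing it needs AES, which is not in Python's standard library. The sample header and mask are intended to follow the client Initial worked example in RFC 9001 Appendix A.2; the function name `apply_header_protection` is hypothetical.

```python
def apply_header_protection(header: bytes, pn_offset: int,
                            pn_length: int, mask: bytes) -> bytes:
    """XOR a 5-byte mask onto the first byte and the packet number.

    The operation is its own inverse, so the same call removes protection
    (given the true pn_length).
    """
    if len(mask) < 5:
        raise ValueError("mask must be at least 5 bytes")
    out = bytearray(header)
    # Long header: low 4 bits are protected; short header: low 5 bits.
    out[0] ^= mask[0] & (0x0F if out[0] & 0x80 else 0x1F)
    for i in range(pn_length):
        out[pn_offset + i] ^= mask[1 + i]
    return bytes(out)


# Unprotected long header: flags, version, DCID, empty SCID, empty token,
# Length, then a 4-byte packet number starting at offset 18.
unprotected = bytes.fromhex("c300000001088394c8f03e5157080000449e00000002")
mask = bytes.fromhex("437b9aec36")  # assumed output of AES-ECB(hp_key, sample)[0..4]

protected = apply_header_protection(unprotected, pn_offset=18, pn_length=4, mask=mask)
print(protected.hex())
```

Note the ordering problem a receiver faces: it has to unmask the first byte before it knows the PN length, which is why the sample position is defined as if the PN were always 4 bytes long.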

DEVTOOLS · The quickest way to see H3 in Chrome: DevTools → Network → enable the Protocol column. A row marked h3 is HTTP/3. If it says h2, the request went over TCP: either something on the path blocked QUIC and the browser fell back, or the browser had not yet learned that the origin offers h3. To diagnose, record a log at chrome://net-export/ and load it into netlog-viewer.appspot.com.
CHAPTER 07

Handshake: 1-RTT and the TLS 1.3 merger

QUIC isn't TLS over UDP; QUIC carries TLS

In our request · T+20..100ms
Layer · QUIC + TLS 1.3
RFC · 9001 · 8446
Output · 1-RTT keys

Killing the "2-RTT minimum" left over from HTTP/2 is HTTP/3's biggest selling point. To see why HTTP/3 gets down to 1 RTT (and 0 RTT on resumption), look at how QUIC and TLS 1.3 merge: not as stacked layers, but with QUIC carrying the TLS 1.3 handshake messages inside CRYPTO frames, so that transport setup and the cryptographic handshake share a single round trip and the first request can leave with the client's Finished.

◇ In our GET request · Phase 1 + 2

INPUT · empty TLS state · client sends ClientHello (key_share, ALPN=h3, TP)
OUTPUT · key sets + HANDSHAKE_DONE · Initial / Handshake / 1-RTT keys derived (0-RTT keys exist only on resumption)

Full 1-RTT timeline

Client → Server · Initial[CRYPTO: ClientHello, ALPN=h3] + PADDING to 1200 bytes · PN_Initial = 0
Server → Client · Initial[CRYPTO: ServerHello] + Handshake[CRYPTO: EE, Cert, CertVerify, Finished]
Client → Server · Handshake[CRYPTO: Finished] + 1-RTT[STREAM 0: GET /] · data piggybacks on the handshake
Server → Client · 1-RTT[STREAM 0: 200 OK + body]
(handshake cost: 1 RTT)
Fig 07·1 · QUIC 1-RTT handshake · three colours = three PN spaces · note that the client packs the GET into the same datagram as its Finished.
FIG 07·2 · full 1-RTT handshake · QUIC packets × TLS messages × key sets

① Client → Server · Initial[CRYPTO: ClientHello] · padded to 1200 B
  • TLS msg: ClientHello · CRYPTO offset 0..516 · KeyShare, ALPN=h3, SNI, TP, signature_algorithms
  • Protected by: Initial keys (salt-derived, public)
  • + PADDING to ≥ 1200 B · anti-amplification
② Server → Client · coalesced UDP datagram · 3 QUIC packets concatenated
  • Initial[CRYPTO: ServerHello · 0..120] → derives handshake_secret · keys 0..3
  • Handshake[CRYPTO: EE + Cert + CertVerify · 0..1100] · EncryptedExtensions (server TP) · certificate chain · CertVerify signature
  • Handshake[CRYPTO: Finished · 1100..1132 + ACK] · verify_data = HMAC(finished_key, transcript_hash) · 32 B, where finished_key is derived from the handshake traffic secret
  • Handshake keys live
③ Client → Server · coalesced · client confirms handshake AND opens 1-RTT
  • Handshake[CRYPTO: Finished · 0..32] · having verified the server's Finished, the client derives master_secret → 1-RTT keys
  • 1-RTT[STREAM 0: GET / HTTP/3] · encrypted with 1-RTT keys · client may send before the server confirms
  • ③a 1-RTT keys live on CLIENT
④ Server → Client · server confirms handshake done, returns HTTP/3 response
  • 1-RTT[HANDSHAKE_DONE + NEW_CONNECTION_ID×3 + ACK] · Handshake keys are now safe to discard (Initial keys were dropped earlier, once Handshake packets started flowing)
  • 1-RTT[STREAM 0: HTTP/3 HEADERS + DATA (200 OK + body)] · first 3 KB of response, on the same UDP packet pair

PACKETS × KEYS × TLS MESSAGES · summary table
QUIC packet type | key set | TLS 1.3 messages inside (CRYPTO frames) | when
Initial | Initial | ClientHello (CH); ServerHello (SH) | round 1 + 2a
Handshake | Handshake | EE, Certificate, CertVerify, Finished (both sides) | round 2b/c + 3a
0-RTT | 0-RTT | (none: carries STREAM frames, not TLS) | resumption only
1-RTT | 1-RTT | NewSessionTicket (later); app data; HANDSHAKE_DONE | round 3b onwards
Retry | (unencrypted) | (none: stateless cookie response) | anti-DoS only

→ Handshake cost 1 RTT: round 1 leaves the client at T₀, round 2 leaves the server at T+½ RTT, round 3 leaves the client at T+1 RTT, round 4 leaves the server at T+1½ RTT and reaches the client at T+2 RTT. Application data is ready to send at T+1 RTT (the client's first GET).

A full 1-RTT handshake takes 4 UDP datagrams carrying about 6 QUIC packets (several packets are coalesced into one datagram). Packets at different stages are protected by different key sets: Initial keys (derivable by anyone) → Handshake keys (available once the ServerHello has been processed) → 1-RTT keys (available once the server's Finished has been processed). TLS 1.3's ClientHello / ServerHello / EncryptedExtensions / Certificate / CertVerify / Finished travel inside CRYPTO frames, but are never wrapped in TLS records; QUIC handles ordering and retransmission itself. The handshake costs 1 RTT: the client may send its first HTTP request in the same flight as its own Finished (round 3b), without waiting for the server's confirmation.

Where does TLS 1.3 live?

TLS handshake messages · ClientHello / SH / EE …
QUIC frame · CRYPTO
QUIC packet · Initial / Handshake
Physical · UDP / IP
FIELD NOTE · NOT "TLS over QUIC" · RFC 9001 is titled "Using TLS to Secure QUIC" on purpose. Inside QUIC, TLS 1.3 plays only two roles: (1) key-agreement engine: it produces the secrets behind the 0-RTT, Handshake and 1-RTT key sets (the Initial keys come from a fixed public salt, not from TLS); (2) identity authentication: certificate chain, CertVerify, Finished. TLS 1.3's record layer is dropped entirely; QUIC does its own packet protection. This is why a TLS library needs QUIC-specific hooks, which stock OpenSSL lacked for years: QUIC stacks have typically been built on BoringSSL, the quictls fork of OpenSSL, or s2n.

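Why 1-RTT?

All rows use the same yardstick: round trips before the request can be sent, plus one round trip for the request and the first response byte.

Protocol | handshake (before the request can leave) | + first data | total
TCP + TLS 1.2 | 1 RTT (SYN) + 2 RTT (TLS) | + 1 RTT | 4 RTT
TCP + TLS 1.3 | 1 RTT (SYN) + 1 RTT (TLS) | + 1 RTT | 3 RTT
TCP Fast Open + TLS 1.3 | 0 RTT (ClientHello rides in the SYN) + 1 RTT (TLS) | + 1 RTT | 2 RTT*
QUIC + TLS 1.3 (1-RTT) | 1 RTT (transport + TLS combined) | + 1 RTT | 2 RTT
QUIC + TLS 1.3 (0-RTT) | 0 RTT (request rides in the first flight) | + 1 RTT | 1 RTT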

📖 RFC 9000 §18 · Transport Parameters dissected

During the handshake each side declares a set of transport parameters (TP), carried in the quic_transport_parameters extension of the TLS ClientHello (client) and EncryptedExtensions (server). They are the source of every window, timeout and limit for the lifetime of the connection. RFC 9000 §18.2 defines seventeen; the table lists the twelve that matter most, plus max_datagram_frame_size from RFC 9221:

id · name | Meaning | Chrome default
0x01 max_idle_timeout | idle timeout (min of both sides) | 30 s
0x02 stateless_reset_token | used by §10.3 stateless reset | 16 B random
0x03 max_udp_payload_size | max UDP payload accepted | 1452
0x04 initial_max_data | connection-level flow window | 10 MB
0x05 initial_max_stream_data_bidi_local | initial stream window for bidi streams we open | 6 MB
0x06 initial_max_stream_data_bidi_remote | initial stream window for bidi streams the peer opens | 6 MB
0x07 initial_max_stream_data_uni | initial stream window for unidirectional streams | 6 MB
0x08 initial_max_streams_bidi | concurrent bidi stream cap | 100
0x09 initial_max_streams_uni | uni stream cap | 100
0x0b max_ack_delay | max ACK delay (drives PTO) | 25 ms
0x0c disable_active_migration | opt-out of active migration (mobile clients choose false) | false
0x0e active_connection_id_limit | number of CIDs the peer may store | 8
0x20 max_datagram_frame_size | DATAGRAM frame max (default 0 = disabled) | 0 / 1200
§18 · Key constraint · TP is not a negotiation, it is a pair of declarations. Each side independently states what it will accept, and the peer has to stay within those limits. max_idle_timeout is the one value the two sides combine: the effective timeout is the smaller of the two declarations. Both say max_idle_timeout=30s ⇒ 30s wins; client says 30s, server says 10s ⇒ 10s wins. Some parameters (such as stateless_reset_token and preferred_address) are server-only; a client that sends them triggers a TRANSPORT_PARAMETER_ERROR.
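The "declaration, not negotiation" rule can be sketched in a few lines. This is an illustrative sketch only: it assumes timeouts are expressed in milliseconds with 0 meaning "no timeout advertised" (as in RFC 9000 §10.1), and the function names are hypothetical. The server-only list is the one given in RFC 9000 §18.2.

```python
from typing import Iterable, Optional

# Parameters a client must never send (RFC 9000 §18.2).
SERVER_ONLY = {
    "original_destination_connection_id",
    "preferred_address",
    "retry_source_connection_id",
    "stateless_reset_token",
}


def effective_idle_timeout(local_ms: int, peer_ms: int) -> Optional[int]:
    """Smaller of the two advertised timeouts; 0 means 'not advertised'."""
    advertised = [v for v in (local_ms, peer_ms) if v > 0]
    return min(advertised) if advertised else None


def check_client_params(names: Iterable[str]) -> None:
    """Reject a client's parameter set if it contains server-only entries."""
    bad = SERVER_ONLY.intersection(names)
    if bad:
        raise ValueError(f"TRANSPORT_PARAMETER_ERROR: {sorted(bad)}")


print(effective_idle_timeout(30_000, 10_000))  # client 30 s, server 10 s
print(effective_idle_timeout(30_000, 0))       # only one side advertises
```

Every other limit in the table works one-way: the value a side declares constrains what its peer may send, and nothing is averaged or merged.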

* TFO cookies are frequently stripped by middleboxes along the path, so in practice engineers do not count TFO as "really usable".

Chapter references
RFC 9001 · Using TLS to Secure QUIC
RFC 8446 · TLS 1.3
CHAPTER 08

0-RTT: stuffing the request inside the handshake

a free lunch, with a replay-attack tail

In our request · 2nd visit · T+0ms
Trigger · session resumption
RFC · 9001 §4.6 · 8470
Saves · 1 RTT

After the first visit to ursb.me, the server sends a NewSessionTicket at the tail of the 1-RTT handshake: an opaque blob encrypted under the server's own key, carrying a PSK. Chrome stores it. On the next visit Chrome sends the ticket back in its ClientHello and, in the same flight, encrypts the GET with the PSK-derived 0-RTT key and sends it as Early Data, so application bytes are on the wire before the first round trip has completed.

◇ In our GET request · Phase 1 (resumption)

INPUT · PSK ticket (age 2h) · session ticket left over from the previous visit
OUTPUT · 0-RTT key + Early Data permission · CH + GET in the same UDP datagram

0-RTT timeline

Client → Server · Initial[CRYPTO: ClientHello + PSK] + 0-RTT[STREAM 0: GET /] · one UDP datagram, two coalesced QUIC packets, different keys
Server → Client · Initial[CRYPTO: ServerHello] + Handshake[CRYPTO: EE, Finished] + 1-RTT[STREAM 0: 200 OK + body] · handshake completion piggybacks the response body
(request reaches the server after ~0.5 RTT)
Fig 08·1 · 0-RTT timeline: handshake and GET share one UDP datagram; the whole request reaches the server after half an RTT.

The replay risk

A 0-RTT request carries no proof of freshness. An attacker can record your first UDP datagram and replay it as often as they like; on its own, the server cannot tell "you" from "tape rewind". That is fine for an idempotent GET (same answer every time) and catastrophic for POST /transfer/100USD: replay it a hundred times and that is a hundred transfers.

The three defences

FIG 08 · 0-RTT replay · attack window vs three defences
Timeline · T₀: server issues the session ticket · T₀ + 5s: legitimate 0-RTT GET / to ursb.me, the request arrives in 0 RTT · T₀ + 15s: attacker has captured the UDP datagram (MITM or Wi-Fi sniffing) · T₀ + 60s: replay #1 · T₀ + 30min: replay #N · same bytes, no freshness
① CLIENT · METHOD WHITELIST · Chrome / Firefox only enable 0-RTT for GET / HEAD. POST / PUT / DELETE never enter this timeline, so the replay window does not open for state-changing methods.
② SERVER · EARLY-DATA HEADER (RFC 8470) · The edge adds Early-Data: 1 to 0-RTT requests it forwards upstream. The application may reject risky endpoints with 425 Too Early. This is a per-request decision, whether or not the request is a replay.
③ TLS · TIME WINDOW + ANTI-REPLAY CACHE · The server accepts 0-RTT only if the ClientHello looks fresh: the ticket age reported by the client must agree with the server's own clock to within a narrow window (≤ 10s). It also records (PSK_ID, ClientHello hash) in a Redis-style dedup cache for the length of that window. A replay arriving more than 10s after the original is rejected as stale and falls back to a full handshake; a replay inside the window is caught by the dedup cache.

0-RTT's lack of freshness means an attacker who captures one UDP datagram can replay it indefinitely. The three layers each close a different window: the client method whitelist keeps write methods off the 0-RTT path altogether; the server's Early-Data: 1 header lets the application decide per request whether to accept; the TLS-layer 10s freshness window plus anti-replay dedup cache stops both replays older than 10s and repeats inside the 10s window. Stacked together, they push residual replay risk close to zero according to Cloudflare's measurements.
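The second layer, the per-request decision at the application, can be sketched as below. This is a hypothetical origin-side handler, not Cloudflare's or any framework's API: the function name, the `/transfer` path rule and the plain-dict headers are assumptions made for the example. Only the Early-Data: 1 header and the 425 status come from RFC 8470.

```python
SAFE_METHODS = {"GET", "HEAD"}


def status_for(method: str, path: str, headers: dict) -> int:
    """Return the status a cautious origin might choose for this request."""
    # Header names are case-insensitive, so normalise before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    arrived_early = lowered.get("early-data", "").strip() == "1"

    # Hypothetical policy: anything that changes state must not be served
    # from early data, because the request may be a replay.
    risky = method.upper() not in SAFE_METHODS or path.startswith("/transfer")
    if arrived_early and risky:
        return 425  # Too Early: the client retries after the handshake
    return 200


print(status_for("GET", "/", {"Early-Data": "1"}))
print(status_for("POST", "/transfer/100USD", {"early-data": "1"}))
```

A 425 is cheap for the client: it simply repeats the request once the handshake has completed, at which point the request is no longer replayable.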

Client · method whitelist
Chrome / Firefox only enable 0-RTT on idempotent methods (GET, HEAD). POST / PUT / DELETE fall back to 1-RTT.
Server · Early-Data header
RFC 8470: when a server forwards a request that arrived as 0-RTT to an upstream, it adds Early-Data: 1. The application layer (e.g. a Cloudflare Worker) can then choose not to process it, or return 425 Too Early so that the client retries after the handshake.
TLS · time window + anti-replay cache
The server accepts 0-RTT only when the ClientHello is fresh, that is, when the ticket age reported by the client agrees with the server's clock to within a narrow window (typically ≤ 10s), backed by a Redis-style cache that records the PSK IDs already seen inside that window. Cloudflare uses BoringSSL's SSL_CTX_set_early_data_enabled plus a cluster-wide deduper.
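The third layer can be sketched as an in-memory guard. This is a simplified single-process illustration, not how BoringSSL or Cloudflare implement it: the class name `EarlyDataGuard` is hypothetical, a dict stands in for the Redis-style cache, and `clock_skew_s` is assumed to be the difference between when the server expected the ClientHello (from the reported ticket age) and its own clock, along the lines of the freshness check in RFC 8446 §8.

```python
import time


class EarlyDataGuard:
    """Freshness window plus dedup cache for 0-RTT ClientHellos (sketch)."""

    def __init__(self, window_s: float = 10.0, clock=time.monotonic):
        self.window_s = window_s
        self.clock = clock
        self.seen = {}  # (psk_id, client_hello_hash) -> expiry time

    def accept(self, psk_id: bytes, client_hello_hash: bytes,
               clock_skew_s: float) -> bool:
        now = self.clock()
        # Forget entries whose window has passed; stale replays are
        # rejected by the freshness check instead.
        self.seen = {k: exp for k, exp in self.seen.items() if exp > now}

        if abs(clock_skew_s) > self.window_s:
            return False  # too old (or too new): fall back to 1-RTT
        key = (psk_id, client_hello_hash)
        if key in self.seen:
            return False  # duplicate inside the window: a replay
        self.seen[key] = now + self.window_s
        return True


guard = EarlyDataGuard()
print(guard.accept(b"psk-1", b"hello-hash", clock_skew_s=0.4))  # first sight
print(guard.accept(b"psk-1", b"hello-hash", clock_skew_s=0.4))  # replay
```

The two checks cover each other: the cache only has to remember ClientHellos for as long as the freshness window, so its size stays bounded.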
CASE · CLOUDFLARE
Cloudflare's 0-RTT policy

Cloudflare enables 0-RTT for all customers by default, but only for GET/HEAD requests without a query string (queries are often state-changing actions). If the origin returns Cache-Control: private or Set-Cookie, the Cloudflare edge auto-promotes the request to 1-RTT before forwarding it to the origin. Per their engineering blog post "Even faster connection establishment with QUIC 0-RTT resumption", 0-RTT lowers median time to first byte (TTFB) for returning users by ~50ms.

CHAPTER 09

Crypto layers: the exact boundaries of 4 key sets

why Initial packets are "encrypted" yet anyone can decrypt them

In our request · T+0..140ms
Layer · QUIC · key schedule
RFC · 9001 §5 · §7
Key idea · salt → HKDF → keys

◇ In our GET request · Phase 1-2 (key schedule)

INPUT · TLS handshake secrets · master_secret · handshake_secret · early_secret
OUTPUT · 4 levels × 3 = 12 values · (key 16B, iv 12B, hp 16B) for Initial / 0-RTT / Handshake / 1-RTT
FIG 09 · QUIC key schedule · 4 secrets → 4 key sets · per RFC 9001 §5

SECRETS (the last three come from the TLS 1.3 key schedule, RFC 8446 §7.1)
  • initial_secret · HKDF(salt, client DCID) · public salt, no secrecy
  • early_secret · HKDF(PSK, "") · from resumption ticket
  • handshake_secret · HKDF(early, ECDHE) · post-ServerHello
  • master_secret · HKDF(handshake, "") · post-Finished

FOUR QUIC KEY SETS · each set = { key 16B, iv 12B, hp_key 16B }
  • ① Initial keys · protect Initial packets: ClientHello, ServerHello, anti-amp ACKs · "encrypted", but the salt is public (Retry packets are not encrypted, only integrity-tagged)
  • ② 0-RTT keys · protect 0-RTT packets: early data (GET / HEAD) on resumed sessions · optional · ticket-derived
  • ③ Handshake keys · protect Handshake packets: EE, Cert, CertVerify, Finished, mid-handshake ACKs · truly encrypted
  • ④ 1-RTT keys · protect 1-RTT packets: all application data post-handshake · forward-secret

LIFETIME · when each set is alive
T₀ ClientHello → ServerHello (~1 RTT) → Finished (handshake done) → application data → close
  • ① Initial keys · discarded once Handshake packets start
  • ② 0-RTT keys · optional · early data only
  • ③ Handshake keys · mid-handshake → discarded once the handshake is confirmed
  • ④ 1-RTT keys · until close or key update

KEY UPDATE · rotating 1-RTT keys without re-handshake
QUIC's Key Phase bit (in the short header) is the only signal that 1-RTT keys have been rotated. Either side may initiate a rotation:
① the initiator derives the next secret, application_traffic_secret_{N+1} = HKDF-Expand-Label(application_traffic_secret_N, "quic ku"), and flips the Key Phase bit on its outgoing packets; ② the receiver sees the flipped bit and derives the same next key set to decrypt; ③ it acknowledges with packets protected by the new keys.
Rules: a new update may only start after a packet sent with the current keys has been acknowledged, so at most one update per RTT (anti-flap); the receiver tolerates packets from either phase during the transition; old keys must be dropped after a few RTTs to bound exposure. Used to defend against long-term key compromise.

QUIC uses four independent key sets, not one. Each set contains key 16B + iv 12B + hp_key 16B. The four correspond to four secrets derived via HKDF: QUIC's own initial secret, followed by TLS 1.3's early / handshake / master secrets. Initial keys use a public salt, so Initial packets are decryptable by anyone; real secrecy starts at Handshake keys. The design gives each packet class exactly the security level it needs, while allowing packets of different levels to be sent and received side by side during the handshake. Key update rotates the 1-RTT keys without a re-handshake, signalled by the short header's Key Phase bit.

When each key set is derived

① INITIAL · public salt · PN space: Initial
  • salt = 0x38762cf7…
  • HKDF(salt, DCID)
  • middlebox-proof only, not eavesdropper-proof
② EARLY DATA (0-RTT) · PSK-derived · PN space: Application*
  • derived from last session's PSK
  • client → server only
③ HANDSHAKE · post-DH derived · PN space: Handshake
  • derived right after the TLS DH exchange
  • protects EE / Cert / Finished
④ 1-RTT (APPLICATION) · primary key · PN space: Application
  • carries 99% of data
  • supports key update (rotation signalled by the Key Phase bit)
* 0-RTT and 1-RTT packets share one packet number space.
WHY "INITIAL" IS PUBLIC · Initial packets derive their keys from a public salt (RFC 9001 §5.2 spells out 0x38762cf7…7f0a) plus the client-chosen DCID. Anyone can compute them. So Initial-packet "encryption" does not protect confidentiality; it protects against "middleboxes peeking at the ClientHello and then acting on what they saw". This is anti-ossification applied at the key-schedule layer.

Full key schedule

QUIC v1 key derivation · RFC 9001 §5 · HKDF · TLS_AES_128_GCM_SHA256
; Step 1 · Initial keys (public)
initial_salt = 0x38762cf7f55934b34d179ae6a4c80cadccbb7f0a ; (QUIC v1)
initial_secret = HKDF-Extract(initial_salt, DCID)
client_initial_secret = HKDF-Expand-Label(initial_secret, "client in")
server_initial_secret = HKDF-Expand-Label(initial_secret, "server in")
; Step 2 · TLS 1.3 handshake secrets (after the handshake DH)
handshake_secret = TLS-derive(... DHE ...)
client_hs_secret = HKDF-Expand-Label(handshake_secret, "c hs traffic")
server_hs_secret = HKDF-Expand-Label(handshake_secret, "s hs traffic")
; Step 3 · 1-RTT (application) secrets
client_app_secret = HKDF-Expand-Label(master_secret, "c ap traffic")
server_app_secret = HKDF-Expand-Label(master_secret, "s ap traffic")
; Each secret then derives:
key = HKDF-Expand-Label(secret, "quic key", 16 bytes) ; AEAD key
iv = HKDF-Expand-Label(secret, "quic iv", 12 bytes) ; AEAD nonce base
hp = HKDF-Expand-Label(secret, "quic hp", 16 bytes) ; header-protect key
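Step 1 of the listing can be run with nothing but Python's standard library, which makes the "anyone can compute them" point concrete. The sketch below implements HKDF-Extract and HKDF-Expand-Label (the TLS 1.3 label format from RFC 8446 §7.1) over SHA-256; the helper names are hypothetical, and the DCID is the sample value used in RFC 9001 Appendix A, against which the printed output can be compared.

```python
import hashlib
import hmac
import struct

INITIAL_SALT_V1 = bytes.fromhex("38762cf7f55934b34d179ae6a4c80cadccbb7f0a")


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    return hmac.new(salt, ikm, hashlib.sha256).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    out, block, counter = b"", b"", 1
    while len(out) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        out += block
        counter += 1
    return out[:length]


def hkdf_expand_label(secret: bytes, label: str, length: int,
                      context: bytes = b"") -> bytes:
    # HkdfLabel = uint16 length || opaque "tls13 " + label || opaque context
    full_label = b"tls13 " + label.encode("ascii")
    info = (struct.pack("!H", length) + bytes([len(full_label)]) + full_label
            + bytes([len(context)]) + context)
    return hkdf_expand(secret, info, length)


def initial_keys(dcid: bytes, side: str) -> dict:
    """Derive key / iv / hp for one direction of the Initial level."""
    initial_secret = hkdf_extract(INITIAL_SALT_V1, dcid)
    label = "client in" if side == "client" else "server in"
    secret = hkdf_expand_label(initial_secret, label, 32)
    return {
        "key": hkdf_expand_label(secret, "quic key", 16),
        "iv": hkdf_expand_label(secret, "quic iv", 12),
        "hp": hkdf_expand_label(secret, "quic hp", 16),
    }


keys = initial_keys(bytes.fromhex("8394c8f03e515708"), "client")
for name, value in keys.items():
    print(name, value.hex())
```

The only inputs are the published salt and a DCID that travels in cleartext in the packet header, so any on-path observer can run exactly this derivation.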
FIELD NOTE · Decoding a capture · Wireshark needs SSLKEYLOGFILE to decrypt QUIC: the browser writes each level's secret to that file, and Wireshark can then decode all four levels. On macOS, launch Chrome with SSLKEYLOGFILE=~/keys.log /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome. Where Encrypted Client Hello (ECH, draft-ietf-tls-esni) is in use (Cloudflare turned it on in 2023; Chrome 117+ ships it on by default), this trick only yields the outer ClientHello: the real SNI lives inside an HPKE-encrypted inner ClientHello.
Chapter references
RFC 9001 · §5 Key schedule
CHAPTER 10

Frame catalog: 20 QUIC frame types, 31 type codes

payload isn't a byte stream, it's a chain of frames

Layer · QUIC
RFC · 9000 §19 · 9221
Total · 20 + DATAGRAM
Form · Type (var-int) · type-specific fields

Decrypt a QUIC packet's payload and you don't get "a chunk of data"; you get a chain of frames. Each frame starts with its type, which determines how the rest of the frame is laid out, and both ends process the frames in order. Below is the full RFC 9000 §19 catalogue (plus RFC 9221 DATAGRAM), sorted into four families so the skeleton is visible.

◇ In our GET request · Phases 1, 3, 4

INPUT · HTTP semantics + control intents · GET + flow-control updates + ACKs + probes
OUTPUT · a chain of frames in the payload · STREAM · ACK · MAX_S_D · CRYPTO · PADDING · …

The four families

Family 1 · Control · connection lifecycle · 9 types
  • PADDING (0x00)
  • PING (0x01)
  • CONNECTION_CLOSE (0x1c-0x1d)
  • HANDSHAKE_DONE (0x1e)
  • NEW_TOKEN (0x07)
  • NEW_CONNECTION_ID (0x18)
  • RETIRE_CONNECTION_ID (0x19)
  • PATH_CHALLENGE / PATH_RESPONSE (0x1a / 0x1b)
Family 2 · Reliability · acks & loss · 1 type, 2 codes
  • ACK (0x02)
  • ACK with ECN counts (0x03)
Family 3 · Streams & flow control · application data carrier · 9 types
  • STREAM (0x08-0x0f · 8 variants)
  • RESET_STREAM (0x04)
  • STOP_SENDING (0x05)
  • MAX_DATA / DATA_BLOCKED (0x10 / 0x14)
  • MAX_STREAM_DATA / STREAM_DATA_BLOCKED (0x11 / 0x15)
  • MAX_STREAMS / STREAMS_BLOCKED (0x12-0x13 / 0x16-0x17)
Family 4 · Crypto + ext. · handshake / datagram · 2 types
  • CRYPTO (0x06)
  • DATAGRAM (0x30-0x31, RFC 9221)
DATAGRAM · RFC 9221 §5 · The DATAGRAM frame is QUIC's only unreliable payload: no retransmission, no flow control, no ordering. Its maximum size is capped by the max_datagram_frame_size transport parameter (not an HTTP/3 SETTING); support is off by default and each side has to advertise it during the handshake. Its whole reason to exist is real-time use cases that would rather drop a frame than wait for it: WebTransport, MASQUE, Media over QUIC. Plain HTTP/3 traffic has no reason to touch it.

The eight STREAM variants

The low 3 bits of a STREAM frame's type encode three independent flags: OFF (0x04, an Offset field is present), LEN (0x02, a Length field is present), FIN (0x01, this frame carries the end of the stream). 2³ = 8 type codes, 0x08-0x0f.

0x08 STREAM
0x09 + FIN
0x0a + LEN
0x0b + LEN + FIN
0x0c + OFF
0x0d + OFF + FIN
0x0e + OFF + LEN
0x0f + OFF + LEN + FIN
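The eight codes in the list above fall straight out of three bit tests. A minimal sketch, with a hypothetical function name:

```python
def stream_frame_flags(frame_type: int) -> dict:
    """Split a STREAM frame type (0x08-0x0f) into its OFF / LEN / FIN flags."""
    if not 0x08 <= frame_type <= 0x0F:
        raise ValueError("not a STREAM frame type")
    return {
        "off": bool(frame_type & 0x04),  # Offset field present
        "len": bool(frame_type & 0x02),  # Length field present
        "fin": bool(frame_type & 0x01),  # last frame of the stream
    }


for code in range(0x08, 0x10):
    flags = stream_frame_flags(code)
    names = [name.upper() for name, on in flags.items() if on]
    print(hex(code), "+".join(names) or "(no flags)")
```

A frame without LEN runs to the end of the packet, and a frame without OFF starts at stream offset 0, which is why the shortest variants save a few bytes on small requests.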

What's inside one UDP datagram

UDP DATAGRAM · src=52341 · dst=443 · len=1224 · cksum=… → payload (1216 bytes) goes into a QUIC packet
QUIC PACKET · 1-RTT · Flags 1 B (HP) · DCID 8 B · PN 1-4 B (HP) · ENCRYPTED PAYLOAD (AEAD-sealed) → decrypts to a chain of frames
FRAME CHAIN · concatenated, processed left → right
  • ACK · type=0x02 · acks PN 100-105, 90
  • STREAM (+OFF+LEN) · type=0x0e · sid=0 · off=512 · len=200 · data: "<html>...</html>" (stream A body bytes)
  • STREAM (+LEN+FIN) · type=0x0b · sid=4 · len=42 · data: response trailer · FIN ✓
  • MAX_STREAM_DATA · type=0x11 · sid=0 · max=1MB
  • PADDING · type=0x00 × N · pad to MTU · ~280 B
▸ the same packet can carry frames for many streams
▸ a stream's bytes can be split across many packets
Fig 10·1 · UDP datagram → QUIC packet → frame chain · one packet can carry many frame types across many streams.
VARINT · Almost every length field in QUIC is encoded as a variable-length integer (var-int): the top 2 bits of the first byte select a 1-, 2-, 4- or 8-byte encoding, leaving 6, 14, 30 or 62 bits for the value. 0x37 = 55; 0x40 0x40 = 64; 0x80 0x00 0x40 0x00 = 16384. Small values stay small on the wire, which makes small frames such as ACKs roughly 30% smaller on average and is a quiet source of QUIC's efficiency.
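The encoding is compact enough to write out in full. The sketch below follows the var-int scheme of RFC 9000 §16 and reproduces the three examples from the note; the function names are hypothetical.

```python
def encode_varint(value: int) -> bytes:
    """Encode an integer in the shortest QUIC var-int form."""
    if value < 0:
        raise ValueError("var-ints are unsigned")
    for prefix, size in enumerate((1, 2, 4, 8)):
        usable_bits = 8 * size - 2
        if value < (1 << usable_bits):
            # The 2-bit prefix goes into the top bits of the first byte.
            return ((prefix << usable_bits) | value).to_bytes(size, "big")
    raise ValueError("value does not fit in 62 bits")


def decode_varint(buf: bytes, pos: int = 0) -> tuple:
    """Return (value, next_position) for the var-int starting at pos."""
    size = 1 << (buf[pos] >> 6)  # 1, 2, 4 or 8 bytes
    if pos + size > len(buf):
        raise ValueError("truncated var-int")
    value = buf[pos] & 0x3F
    for byte in buf[pos + 1:pos + size]:
        value = (value << 8) | byte
    return value, pos + size


for number in (55, 64, 16384):
    print(number, encode_varint(number).hex())
```

The same routine decodes the Length field of a long header: the two bytes 0x44 0xb0 carry the value 1200.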

The elegance of ACK

QUIC's ACK frame is far more expressive than TCP SACK. One ACK frame carries {largest_ack, first_range, [gap, ack_range]*}: "I have PN 100, 90-95, 80-85, ..." A single ACK frame can describe as many received ranges as will fit in the packet. TCP SACK lives in the 40-byte TCP options area and is therefore capped at 4 ranges; QUIC has no such fixed cap.
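The gap / range encoding is easier to follow with the arithmetic written out. The sketch below expands an ACK frame's fields into explicit packet-number ranges using the rule from RFC 9000 §19.3.1 (each following range ends at previous_smallest − gap − 2). The function name is hypothetical, and the input values are chosen to reproduce the "100, 90-95, 80-85" example.

```python
def decode_ack_ranges(largest_ack: int, first_range: int, ranges: list) -> list:
    """Expand ACK frame fields into (smallest, largest) packet-number pairs."""
    smallest = largest_ack - first_range
    if smallest < 0:
        raise ValueError("first range runs below packet number 0")
    result = [(smallest, largest_ack)]
    for gap, length in ranges:
        # 'gap' counts unacknowledged packets between ranges, minus one.
        largest = smallest - gap - 2
        smallest = largest - length
        if smallest < 0:
            raise ValueError("range runs below packet number 0")
        result.append((smallest, largest))
    return result


# largest_ack=100 alone, then 90-95, then 80-85.
print(decode_ack_ranges(100, 0, [(3, 5), (3, 5)]))
```

Because every number is a small delta rather than an absolute packet number, each extra range usually costs only two bytes.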

CHAPTER 11

Stream multiplexing: the 2-bit StreamID dictionary

HTTP/2 muxes at the app layer, HTTP/3 muxes at transport

Layer · QUIC streams
RFC · 9000 §2-§4
Encoding · low 2 bits
Max concurrent · var-int (≥ 100 default)

◇ In our GET request · Phase 3

INPUT · 4 logical streams · request bidi + control uni + QPACK enc/dec
OUTPUT · allocated StreamIDs · 0 (req) · 2 (ctrl) · 6 (QPACK enc) · 10 (QPACK dec)

StreamID encoding

Every stream has a var-int ID. The low 2 bits encode two things at once: the initiator (bit 0: client = 0, server = 1) and the direction (bit 1: bidi = 0, uni = 1).

low 2 bits | stream IDs | Meaning | HTTP/3 use
0x00 | 0, 4, 8, 12, … | Client-initiated bidi | request streams
0x01 | 1, 5, 9, 13, … | Server-initiated bidi | unused in H3
0x02 | 2, 6, 10, … | Client-initiated uni | control · QPACK enc/dec
0x03 | 3, 7, 11, … | Server-initiated uni | control · QPACK · Push
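The table above reduces to two bit tests plus a multiplication by four. A minimal sketch, with hypothetical function names:

```python
def classify_stream(stream_id: int) -> tuple:
    """Return (initiator, direction) for a QUIC stream ID."""
    initiator = "server" if stream_id & 0x01 else "client"
    direction = "uni" if stream_id & 0x02 else "bidi"
    return initiator, direction


def nth_stream_id(n: int, initiator: str, direction: str) -> int:
    """ID of the n-th stream (counting from 0) of the given kind."""
    type_bits = (0x01 if initiator == "server" else 0) | (0x02 if direction == "uni" else 0)
    return 4 * n + type_bits


for sid in (0, 2, 6, 10, 3):
    print(sid, classify_stream(sid))
```

This is why the three client-initiated unidirectional streams in the main-line request land on IDs 2, 6 and 10: they are the 0th, 1st and 2nd streams of type 0x02.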
In our main-line · Our GET to ursb.me/ uses StreamID = 0 (the first client-initiated bidi stream). Chrome also opens three uni streams at the same time: StreamID=2 (H3 control), StreamID=6 (QPACK encoder), StreamID=10 (QPACK decoder). This is why the next chapter, on HTTP/3 frames, says "the control stream must open first".

Four streams in parallel · everything inside one connection

ONE QUIC CONNECTION · time → · T+100ms → T+250ms
  • Stream 0 · client-initiated bidi (QUIC type bits 0x00) · request/response · HEADERS (req) + FIN → HEADERS (rsp) → DATA · body
  • Stream 2 · client uni · H3 control stream (H3 stream type 0x00) · SETTINGS, GOAWAY · shown: SETTINGS, PRIORITY_UPDATE
  • Stream 6 · client uni · QPACK encoder (H3 stream type 0x02) · inserts → server · INSERT, INSERT, INSERT
  • Stream 10 · client uni · QPACK decoder (H3 stream type 0x03) · acks → server · ACK, ACK
Packets · PN=12 | PN=13 | PN=14 | PN=15 | PN=16
Each UDP packet (PN) may carry frames from multiple streams. Each stream is independently flow-controlled.
Fig 11·1 · Four parallel streams inside one QUIC connection · request bidi + control + QPACK encoder/decoder · frames from different streams can share the same packet.
[Figure 11 · QUIC stream lifecycle, two half-machines per RFC 9000 §3. Sending half: Ready (app creates stream, no data yet) → Send (STREAM frames flow) → Data Sent (FIN written, awaiting ACK) → Data Recvd (all bytes ACKed, terminal); abort path: RESET_STREAM sent → Reset Sent → Reset Recvd (RESET_STREAM ACKed, terminal). Receiving half: Recv (created on first STREAM frame) → Size Known (peer FIN received) → Data Recvd (all bytes up to FIN have arrived) → Data Read (app consumed, terminal); abort path: RESET_STREAM received → Reset Recvd → Reset Read (app saw the reset, terminal). STOP_SENDING: the receiving side tells the sender "stop, I don't want this", and the sender transitions to Reset Sent.]
FRAMES THAT DRIVE TRANSITIONS · 6 key frames
STREAM (0x08-0x0f): carries data; FIN bit ends sending half. 8 variants encode {offset present?, length present?, FIN?}.
RESET_STREAM (0x04): sender aborts; recv goes to Reset Recvd. Final size required.
STOP_SENDING (0x05): recv asks sender to abort. Triggers sender's RESET_STREAM.
MAX_STREAM_DATA (0x11): grants more flow control credit per stream.
STREAM_DATA_BLOCKED (0x15): sender complains "I'm flow-control blocked here".
MAX_STREAMS (0x12-0x13): grants more concurrent stream slots.

Each direction of a QUIC stream has its own state machine. For a bidirectional stream, every endpoint holds a sending half and a receiving half, and the two transition independently (a unidirectional stream has only one half per endpoint). Normal lifecycle: Ready → Send → Data Sent → Data Recvd (sender) and Recv → Size Known → Data Recvd → Data Read (receiver). Sending RESET_STREAM moves the sending half into Reset Sent; STOP_SENDING lets the receiver ask the sender to reset. Because every half of every stream advances on its own, one stream can finish, stall, or abort without touching any other, and that independence is what HTTP/3 multiplexing relies on.
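The sending half described above can be written down as a transition table. This is a simplified sketch of the RFC 9000 §3.1 states; the event names (`send_stream`, `send_fin`, and so on) are labels invented here, and the optional STOP_SENDING handling in the Data Sent state is left out.

```python
from enum import Enum, auto


class SendState(Enum):
    READY = auto()
    SEND = auto()
    DATA_SENT = auto()
    DATA_RECVD = auto()   # terminal
    RESET_SENT = auto()
    RESET_RECVD = auto()  # terminal


# (current state, event) -> next state
TRANSITIONS = {
    (SendState.READY, "send_stream"): SendState.SEND,
    (SendState.SEND, "send_stream"): SendState.SEND,
    (SendState.SEND, "send_fin"): SendState.DATA_SENT,
    (SendState.DATA_SENT, "all_acked"): SendState.DATA_RECVD,
    (SendState.READY, "send_reset"): SendState.RESET_SENT,
    (SendState.SEND, "send_reset"): SendState.RESET_SENT,
    (SendState.DATA_SENT, "send_reset"): SendState.RESET_SENT,
    # STOP_SENDING from the peer makes us answer with RESET_STREAM.
    (SendState.READY, "recv_stop_sending"): SendState.RESET_SENT,
    (SendState.SEND, "recv_stop_sending"): SendState.RESET_SENT,
    (SendState.RESET_SENT, "reset_acked"): SendState.RESET_RECVD,
}


class SendingHalf:
    def __init__(self) -> None:
        self.state = SendState.READY

    def on(self, event: str) -> SendState:
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"{event!r} is not valid in state {self.state.name}")
        self.state = TRANSITIONS[key]
        return self.state


if __name__ == "__main__":
    half = SendingHalf()
    for event in ("send_stream", "send_fin", "all_acked"):
        print(event, "->", half.on(event).name)
```

A receiving half would be a second, separate table with its own state variable, which is the point of the two-half design: nothing in this table mentions what the other direction is doing.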

Flow control · two levels

STREAM LEVEL
Per-stream

Each stream tracks its own MAX_STREAM_DATA limit. The sender stops once the cumulative bytes sent on that stream reach the limit. The receiver raises the limit by sending MAX_STREAM_DATA frames.

CONNECTION LEVEL
Connection-wide

The sum of all streams' bytes is capped by MAX_DATA. This stops a single connection from eating all of the receiver's memory. Chrome defaults to 6 MB (OkHttp 25 MB · curl 1 MB).

📖 RFC 9000 §4 · The flow-control formulas

Flow control runs in two independent dimensions, each with the same set of state variables:

QUIC flow control · sender state · RFC 9000 §4.1, §4.2
    ; Connection level (across ALL streams)
    conn.max_data            ; advertised by peer via MAX_DATA frame
    conn.data_sent           ; sum of all stream offsets sent (acked or not)
    INVARIANT: conn.data_sent ≤ conn.max_data        ; otherwise DATA_BLOCKED

    ; Stream level (per stream)
    stream.max_data          ; peer's MAX_STREAM_DATA(sid, N)
    stream.offset            ; highest byte sent so far
    INVARIANT: stream.offset ≤ stream.max_data       ; else STREAM_DATA_BLOCKED

    ; Window update strategy (receiver side)
    WHEN consumed(stream) ≥ stream.window_threshold:
        stream.max_data += stream_window_size
        SEND MAX_STREAM_DATA(sid, stream.max_data)

    ; Chrome's strategy: bump window when receiver consumes half of current
    stream_window_size      = 6 MB                   ; doubles on bandwidth detection
    stream_window_threshold = stream.max_data / 2
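The two sender-side invariants above can be exercised with a small model. This is a sketch under simplifying assumptions (no retransmission accounting, no BLOCKED frames); the class and field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class StreamSendState:
    max_stream_data: int  # latest MAX_STREAM_DATA limit from the peer
    offset: int = 0       # bytes sent so far on this stream


@dataclass
class ConnectionSendState:
    max_data: int                 # latest MAX_DATA limit from the peer
    data_sent: int = 0            # bytes sent across all streams
    streams: dict = field(default_factory=dict)

    def sendable(self, stream_id: int, want: int) -> int:
        """How many of `want` bytes both windows allow right now."""
        stream = self.streams[stream_id]
        stream_credit = stream.max_stream_data - stream.offset
        conn_credit = self.max_data - self.data_sent
        return max(0, min(want, stream_credit, conn_credit))

    def send(self, stream_id: int, want: int) -> int:
        """Consume credit at both levels and return the bytes actually sent."""
        n = self.sendable(stream_id, want)
        self.streams[stream_id].offset += n
        self.data_sent += n
        return n


if __name__ == "__main__":
    conn = ConnectionSendState(max_data=100)
    conn.streams[0] = StreamSendState(max_stream_data=80)
    conn.streams[4] = StreamSendState(max_stream_data=80)
    print(conn.send(0, 80))  # limited by nothing: stream 0 has 80 bytes of credit
    print(conn.send(4, 80))  # limited by the connection window
```

In the example, stream 4 still has 80 bytes of its own credit, but only 20 bytes leave because the connection-level window is the tighter of the two.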
§4 · Why two levels: the stream-level window stops one stream from eating the peer's memory (a client downloads a huge file and the peer's buffer blows up). The connection-level window stops the sum of all streams from doing the same (100 streams × 60 KB each still adds up to 6 MB). When either window is exhausted, that stream (or the whole connection) stops sending STREAM data, but PING, ACK and control frames keep flowing, so the connection survives. Compared with TCP's single receive window, the per-stream window is the extra layer QUIC adds.
"HTTP/2 stuffs 100 streams into one TCP connection.
HTTP/3 stuffs 100 actually independent streams into one QUIC connection."
RFC 9000 §2 paraphrased
CHAPTER 12

Loss recovery: strictly monotonic Packet Numbers

Why QUIC measures RTT more accurately than TCP

Layer: QUIC recovery
RFC: 9002
Key: PN never reused
Triggers: ACK · PTO · time

◇ In our GET request · Phase 3 (if lost)

INPUT
PN 14 (body part 2) outstanding past the loss threshold · threshold: 3 packets or 9/8 × max_RTT
OUTPUT
Resend the same bytes with fresh PN 17 · PN 14 abandoned forever

TCP's ambiguity vs QUIC's clarity

[Figure 12·1, two panels, sender side vs receiver side. TCP (seq number = byte offset, reusable): seq=1000 len=200 is delivered; seq=1200 len=200 is lost; the retransmit carries the SAME seq=1200; the ACK for 1400 cannot say which transmission it answers, so Karn's algorithm throws the RTT sample away. Result: retransmission ambiguity, RTT estimation breaks on retransmits. QUIC (packet number strictly monotonic): PN=100 carries STREAM offset 0..199; PN=101 carries offset 200..399 and is lost; PN=102 carries the same stream data under a NEW PN; ACK [102] is unambiguous and RTT = T_ack − T_send(102). Result: monotonic and clear, a usable RTT sample on every packet, even retransmits.]
Fig 12·1 · TCP retransmits reuse seq, which makes RTT samples ambiguous (the problem Karn's algorithm has worked around for 30 years) · QUIC gives every retransmit a fresh PN, so each sample is unambiguous.
TCP
retransmission ambiguity

TCP seq numbers identify byte offsets. A retransmission has the same seq as the original. When an ACK arrives, you can't tell whether it answers the original or the retransmit. This is the well-known retransmission ambiguity; it forces TCP to use "Karn's algorithm" and discard RTT samples from retransmitted segments.

QUIC
PN strictly monotonic

QUIC packet numbers are never reused. A retransmit carries a new PN and the old PN is dead forever. An ACK names exactly the PNs it acknowledges, so every RTT sample maps to one known send time. This is one reason advanced congestion controllers like BBR can run better on QUIC than on TCP.
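Because each PN maps to exactly one transmission, an RTT estimator can be keyed by PN without any Karn-style filtering. The sketch below follows the smoothing formulas of RFC 9002 §5.3 but leaves out details such as sampling only on the largest newly acknowledged packet; `RttEstimator` and its method names are assumptions for illustration.

```python
class RttEstimator:
    """RTT estimation keyed by packet number (times in milliseconds)."""

    def __init__(self) -> None:
        self.sent_time_ms = {}   # PN -> send time
        self.smoothed_rtt = None
        self.rttvar = None
        self.min_rtt = None

    def on_sent(self, pn: int, now_ms: float) -> None:
        if pn in self.sent_time_ms:
            raise ValueError("packet numbers are never reused")
        self.sent_time_ms[pn] = now_ms

    def on_ack(self, pn: int, now_ms: float, ack_delay_ms: float = 0) -> float:
        latest = now_ms - self.sent_time_ms.pop(pn)
        if self.smoothed_rtt is None:          # first sample
            self.min_rtt = latest
            self.smoothed_rtt = latest
            self.rttvar = latest / 2
            return latest
        self.min_rtt = min(self.min_rtt, latest)
        adjusted = latest
        if latest >= self.min_rtt + ack_delay_ms:
            adjusted = latest - ack_delay_ms   # remove the peer's reported ACK delay
        self.rttvar = 0.75 * self.rttvar + 0.25 * abs(self.smoothed_rtt - adjusted)
        self.smoothed_rtt = 0.875 * self.smoothed_rtt + 0.125 * adjusted
        return latest


if __name__ == "__main__":
    est = RttEstimator()
    est.on_sent(100, 0)    # STREAM offset 0..199
    est.on_sent(101, 1)    # STREAM offset 200..399, lost on the wire
    est.on_sent(102, 50)   # same stream data, fresh PN
    print(est.on_ack(100, 40), est.on_ack(102, 92), est.smoothed_rtt)
```

The ACK for PN 102 yields a sample measured from the retransmission's own send time; there is no question of whether it belongs to PN 101.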

Three classes of loss detection

ACK-based · packet outpaced
If PN X+3 is ACKed but PN X isn't, X is likely lost. RFC 9002 default thresholds: a packet is declared lost once a packet sent at least 3 PNs later has been ACKed, or once it has been outstanding longer than 9/8 × max(smoothed_RTT, latest_RTT).
Probe Timeout (PTO)
If no ACK arrives within smoothed_RTT + max(4·RTTVAR, 1 ms) + max_ack_delay after the last ack-eliciting packet was sent, PTO fires: the sender transmits one or two ack-eliciting probe packets (new data, a retransmission, or a bare PING) to "wake up" the peer. PTO by itself declares nothing lost. It replaces TCP's RTO, whose RFC 6298 minimum is a full second.
Anti-amplification adjust
Until the client's address is validated, the server may send at most 3× the bytes it has received, so it cannot probe freely. RFC 9002 §6.2.2 adjusts how PTO behaves during the handshake to account for this.
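The PTO period and its backoff are simple arithmetic. A minimal sketch, assuming times in milliseconds and the 1 ms granularity from RFC 9002; the function name is made up here.

```python
K_GRANULARITY_MS = 1.0  # kGranularity in RFC 9002


def probe_timeout_ms(smoothed_rtt_ms: float, rttvar_ms: float,
                     max_ack_delay_ms: float, pto_count: int = 0) -> float:
    """PTO period, doubled for every consecutive timeout without an ACK."""
    base = smoothed_rtt_ms + max(4 * rttvar_ms, K_GRANULARITY_MS) + max_ack_delay_ms
    return base * (2 ** pto_count)


if __name__ == "__main__":
    # Illustrative inputs: sRTT 40 ms, RTTVAR 10 ms, max_ack_delay 25 ms.
    for attempt in range(3):
        print(attempt, probe_timeout_ms(40, 10, 25, attempt))
```

With these illustrative inputs the first period is 105 ms and each further probe doubles it, which is the same backoff shape as the 280 → 560 → 1120 ms sequence shown in the loss-detection figure.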

Connection-level vs stream-level loss

KILLER FEATURE · Suppose PN 5 is lost and it carried Stream A bytes 0-1200. Stream A must wait for the retransmission. But PN 6 carried Stream B, and its decryption and application-layer processing don't have to wait. This is the key to HTTP/3 killing TCP head-of-line blocking: loss blocks only the streams whose data was in the lost packet, never the others.
[Figure 12 · loss detection timeline: three mechanisms, whichever fires first wins. Time axis: T₀, T₀ + 30 ms, T₀ + 50 ms, T₀ + 90 ms (9/8 RTT), T₀ + 280 ms (PTO). Sent: PN 13, 14 (lost), 15, 16, 17. ACKed: 13, 15, 16, 17. ① Packet threshold (kPacketThreshold = 3): PN 14 + 3 = PN 17 is ACKed, so PN 14 is declared lost at T₀ + 65 ms. ② Time threshold (9/8 × max(SRTT, latest_RTT)): PN 14 outstanding > 90 ms → declared lost; backup for when ① doesn't fire (no later ACKs); here it would fire at 90 ms, later than ①. ③ PTO (smoothed_RTT + 4·rttvar + max_ack_delay, doubled per attempt): if no ACKs arrive at all (tail loss), send probe packets to elicit any ACK; first PTO ≈ 280 ms here, then 560, 1120 ms … (bounded exponential backoff); last resort. Outcome: ① fires first, loss is declared, and the data is retransmitted with fresh PN = 18.]

QUIC runs three independent loss-detection mechanisms in parallel; whichever fires first wins: ① packet threshold (kPacketThreshold = 3): an ACK arrives for a PN at least 3 above the suspect; ② time threshold (9/8 × max RTT): backs up ①; ③ PTO: covers the "nothing was acked at all" deadlock, with bounded exponential backoff. In this example ① fires first at 65 ms; PTO is the safety net you hope never trips. RFC 9002 §6 encodes all three behind one timer plus one rule set, unfolded in the pseudocode below.

📖 RFC 9002 §6 · Loss detection in pseudocode

RFC 9002 Appendix A · OnAckReceived · QUIC loss recovery
    ; State per PN space (Initial / Handshake / Application)
    largest_acked_packet           ; highest PN we've heard ACKed
    time_of_last_ack_eliciting     ; when did we last send something needing ACK
    loss_detection_timer           ; the one timer that drives PTO + early loss
    pto_count                      ; resets on each new ACK

    ; Constants (§6.1.1, §6.1.2)
    kPacketThreshold = 3           ; ACKed past N pkts → loss
    kTimeThreshold   = 9 / 8       ; × max(srtt, latest_rtt)
    kGranularity     = 1 ms

    OnAckReceived(ack_frame):
        FOR each newly_acked in ack_frame.ranges:
            sent_packets.remove(newly_acked.pn)
            cc.on_packet_acked(newly_acked)          ; cc grows cwnd
        IF ack_frame.largest_acked is newly acked AND something ack-eliciting was newly acked:
            UpdateRtt(now − send_time(ack_frame.largest_acked), ack_frame.ack_delay)
        DetectAndRemoveLostPackets()
        pto_count = 0
        SetLossDetectionTimer()

    DetectAndRemoveLostPackets():                    ; §6.1
        loss_delay = kTimeThreshold × max(srtt, latest_rtt)
        loss_delay = max(loss_delay, kGranularity)
        lost_send_time = now − loss_delay
        FOR each unacked in sent_packets:
            IF unacked.pn > largest_acked: CONTINUE  ; only packets older than an ACKed one qualify
            IF unacked.send_time ≤ lost_send_time:   ; time-based
                MARK unacked as LOST
            ELSE IF largest_acked − unacked.pn ≥ kPacketThreshold:   ; reordering-based
                MARK unacked as LOST
            ELSE:
                loss_time = min(loss_time, unacked.send_time + loss_delay)   ; loss candidate
        cc.on_packets_lost(lost_packets)
        retransmit_data(lost_packets)                ; assign NEW PNs (Ch12 reason)

    SetLossDetectionTimer():                         ; §6.2
        IF there are loss candidates:
            timer = loss_time                        ; earliest send_time + loss_delay
        ELSE:                                        ; PTO mode
            pto = (srtt + max(4 × rttvar, kGranularity) + max_ack_delay) × 2^pto_count
            timer = time_of_last_ack_eliciting + pto
§6 · Two triggers: RFC 9002 gives two independent loss triggers: (1) packet threshold: a packet sent at least 3 PNs later has been ACKed but this one hasn't; (2) time threshold: this packet has been outstanding longer than 9/8 × max(sRTT, latestRTT) while a later packet has been ACKed. Either fires → declared lost. If neither can fire because no ACK arrives while ack-eliciting packets are still outstanding, PTO kicks in and doubles on each consecutive timeout (×2^pto_count). Together these replace TCP's RTO + Fast Retransmit.
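The two ACK-driven triggers can be run as real code. This is a simplified sketch of the detection step only (no timers, no congestion-controller callbacks); `SentPacket` and `detect_lost` are illustrative names, and times are in milliseconds.

```python
from dataclasses import dataclass

K_PACKET_THRESHOLD = 3
K_TIME_THRESHOLD = 9 / 8
K_GRANULARITY_MS = 1.0


@dataclass
class SentPacket:
    pn: int
    send_time_ms: float


def detect_lost(unacked, largest_acked, now_ms, smoothed_rtt_ms, latest_rtt_ms):
    """Return the PNs that the packet or time threshold declares lost."""
    loss_delay = max(K_TIME_THRESHOLD * max(smoothed_rtt_ms, latest_rtt_ms),
                     K_GRANULARITY_MS)
    lost = []
    for packet in unacked:
        if packet.pn > largest_acked:
            continue  # nothing sent later has been ACKed yet, so no verdict
        too_old = packet.send_time_ms <= now_ms - loss_delay
        outpaced = largest_acked - packet.pn >= K_PACKET_THRESHOLD
        if too_old or outpaced:
            lost.append(packet.pn)
    return lost


if __name__ == "__main__":
    outstanding = [SentPacket(14, 10.0), SentPacket(18, 60.0)]
    # ACK for PN 17 arrives at 65 ms with an 80 ms RTT estimate.
    print(detect_lost(outstanding, largest_acked=17, now_ms=65.0,
                      smoothed_rtt_ms=80.0, latest_rtt_ms=80.0))
```

In the example PN 14 is declared lost by the packet threshold alone, because 17 − 14 ≥ 3, well before its 90 ms time threshold; PN 18 is skipped because nothing newer has been ACKed.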
Chapter references
RFC
RFC 9002 · QUIC Loss Detection
paper
draft-ietf-tcpm-rack · RACK (published as RFC 8985)
CHAPTER 13

Congestion control: BBR's user-space playground

QUIC turns congestion control into an app setting

Layer: QUIC cc
RFC: 9002 §7 (NewReno)
In practice: BBR v2/v3 / CUBIC
Property: pluggable

TCP congestion control lives in the kernel, so upgrading it can take years. QUIC moved it to user space. Cloudflare wants BBR v3? Roughly a one-line change in Rust. Google YouTube wants its own cc algorithm? Same, in C++. This is QUIC's real "R&D accelerator" value: it turns congestion control into an application concern, not a decade-long kernel queue.

◇ In our GET request · Phase 3 + 5

INPUT
RTT samples + ACK pacing · sRTT 40ms · bw 50 Mbps · loss 1.5%
OUTPUT
cwnd target ≈ BDP · BBR ProbeBW dominant

Three algorithms side by side

cc | signal | throughput | fairness | deployed at
NewReno (RFC 9002 default) | loss | baseline | good | smaller libs default
CUBIC (RFC 8312, since obsoleted by RFC 9438) | loss | 1.5x baseline | good | Linux TCP default · ngtcp2
BBR v2/v3 | bandwidth + RTT | 2-3x baseline | warn: can starve CUBIC flows | Google · Cloudflare · Meta

Why BBR wins

CUBIC / NewReno treat loss as the congestion signal, but on wireless links much of the loss comes from channel errors rather than congestion. BBR directly measures bottleneck bandwidth (max bw) and minimum RTT, then uses BDP (bandwidth-delay product) as its target for bytes in flight. Result: BBR saturates bandwidth on lossy but uncongested links (4G/5G/Wi-Fi). CUBIC on those same links misreads random loss as congestion: it cuts cwnd, climbs back along its cubic curve, and cuts again, so cwnd oscillates as a sawtooth below the real ceiling and average utilisation stays visibly lower than the link could carry.
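For the main-line numbers (50 Mbps bottleneck, 40 ms sRTT) the BDP target works out as follows. A minimal sketch; the 1200-byte packet size is an assumption chosen for illustration, not a value from the trace.

```python
import math


def bdp_bytes(bandwidth_bps: int, rtt_ms: int) -> int:
    """Bandwidth-delay product: bytes that fit 'in the pipe' during one RTT."""
    return bandwidth_bps * rtt_ms // (8 * 1000)


def bdp_packets(bandwidth_bps: int, rtt_ms: int, packet_bytes: int = 1200) -> int:
    """The same target expressed in packets of an assumed size."""
    return math.ceil(bdp_bytes(bandwidth_bps, rtt_ms) / packet_bytes)


if __name__ == "__main__":
    print(bdp_bytes(50_000_000, 40))    # bytes in flight needed to fill the link
    print(bdp_packets(50_000_000, 40))  # roughly how many 1200-byte packets that is
```

So "cwnd target ≈ BDP" for this request means about 250,000 bytes in flight; a loss-based controller that has cut its window well below that figure leaves part of the link idle.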

[Figure 13·1: congestion window (y axis, 0 to 100k) over time (x axis, 0 s to 4 s) on a 4G link with 1.5% random loss; four loss events are marked. Curves: NewReno (additive), CUBIC (concave), BBR v2 (bw-based). Annotation: BBR ignores random loss · CUBIC carves a sawtooth · NewReno crawls.]
Fig 13·1 · cwnd behaviour on a 1.5%-random-loss link · BBR saturates the link (it measures bandwidth, not loss); CUBIC sawtooths; NewReno crawls.
CASE · GOOGLE YOUTUBE
BBR in YouTube production

Google's BBR paper «BBR: Congestion-Based Congestion Control» (Cardwell et al., ACM Queue; reprinted in CACM, 2017) reported: on US cross-state links, BBR reduced YouTube's video rebuffer rate by 53% and startup time by 8%. BBR v3 (2024) tightened throughput stability another ~15%. Google deploys BBR on both TCP (Linux kernel 4.9+) and QUIC (QUICHE), but the QUIC variant runs more stably thanks to monotonic PNs (see Ch12).

Spin Bit · throwing operators a bone

QUIC encrypts packet numbers, so operators can no longer measure RTT the way they did with TCP. That hurt operators whose SLA monitoring and traffic engineering depend on RTT data. The QUIC WG's compromise is the Spin Bit (RFC 9000 §17.4): 1 bit in the short header that flips once per RTT, so middleboxes can passively estimate RTT without decrypting anything. It is optional: either endpoint may disable it for privacy, and even endpoints that support it must disable it on a random fraction of connections (at least 1 in 16), so how often it is visible in production depends on the implementations at both ends.
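What a passive observer does with the spin bit can be sketched in a few lines: watch one direction of the flow and time the gaps between bit flips. This is an idealised model that ignores reordering and loss; the function name and input format are assumptions.

```python
def spin_bit_rtts(observations):
    """Estimate RTTs from (timestamp_ms, spin_bit) pairs seen in one direction.

    Each flip of the bit marks the start of a new round trip, so the time
    between two consecutive flips approximates one RTT.
    """
    rtts = []
    last_edge_ms = None
    previous_bit = None
    for timestamp_ms, bit in observations:
        if previous_bit is not None and bit != previous_bit:
            if last_edge_ms is not None:
                rtts.append(timestamp_ms - last_edge_ms)
            last_edge_ms = timestamp_ms
        previous_bit = bit
    return rtts


if __name__ == "__main__":
    seen = [(0, 0), (10, 0), (40, 1), (55, 1), (80, 0), (121, 1)]
    print(spin_bit_rtts(seen))
```

The observer needs no keys and never reads a packet number; it only compares one header bit between consecutive packets.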

📖 RFC 9002 §7 · NewReno state machine pseudocode

RFC 9002 §7 · NewReno (default cc) · Appendix B
    ; State
    congestion_window                  (bytes)
    bytes_in_flight                    (bytes)
    ssthresh                           ; slow start threshold
    congestion_recovery_start_time     ; for filtering duplicate triggers

    ; Constants
    kInitialWindow           = 10 × max_datagram_size    ; ~14 KB
    kMinimumWindow           = 2 × max_datagram_size     ; ~2.9 KB
    kLossReductionFactor     = 0.5
    kPersistentCongestionThr = 3                         ; PTOs of no progress

    OnPacketAcked(acked):                                ; §7.3.1
        IF acked.in_congestion_recovery: RETURN          ; ignore old retransmits
        IF cwnd ≥ ssthresh:                              ; congestion avoidance
            cwnd += (max_datagram_size × acked.size) / cwnd
        ELSE:                                            ; slow start
            cwnd += acked.size

    OnPacketsLost(lost):                                 ; §7.3.2
        IF any(p.send_time > congestion_recovery_start_time for p in lost):
            congestion_recovery_start_time = now()
            ssthresh = cwnd × kLossReductionFactor       ; halve
            cwnd = max(ssthresh, kMinimumWindow)
        IF in_persistent_congestion(lost):               ; §7.6
            cwnd = kMinimumWindow
            congestion_recovery_start_time = 0           ; restart from scratch

    ; Sending constraint (everywhere)
    INVARIANT: bytes_in_flight ≤ cwnd
§7 · Three congestion-control concepts: this pseudocode simplifies RFC 9002 §7 by omitting the PTO state machine's back-off, ECN handling, and the exact persistent-congestion thresholds, but the three core states are intact. Mechanisms: (1) slow start: cwnd grows by the number of bytes each ACK newly acknowledges, so it roughly doubles per RTT; (2) congestion avoidance: cwnd grows by max_datagram_size × acked bytes / cwnd per ACK (about 1 packet per RTT); (3) persistent congestion: when everything sent across a span of about 3 PTO periods is lost, the path is treated as broken and cwnd resets to the minimum. BBR ditches this whole loop and directly measures bottleneck bandwidth, giving roughly 1.5-3× the throughput of NewReno on cellular / lossy links (Cardwell et al., ACM Queue 2017 · BBR v1); see Ch21 for production numbers.
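The slow-start, congestion-avoidance and recovery rules can be tried out in a small model. This is a sketch in the spirit of RFC 9002 Appendix B, with persistent congestion, pacing and ECN left out; the class name and the 1200-byte datagram size are assumptions.

```python
class NewReno:
    """Byte-counting NewReno window, simplified from RFC 9002 section 7."""

    def __init__(self, max_datagram_size: int = 1200) -> None:
        self.mds = max_datagram_size
        self.cwnd = 10 * max_datagram_size        # simplified kInitialWindow
        self.min_window = 2 * max_datagram_size   # kMinimumWindow
        self.ssthresh = float("inf")
        self.recovery_start = None                # time the current recovery began

    def in_recovery(self, sent_time: float) -> bool:
        return self.recovery_start is not None and sent_time <= self.recovery_start

    def on_packet_acked(self, size: int, sent_time: float) -> None:
        if self.in_recovery(sent_time):
            return                                # sent before recovery began: no growth
        if self.cwnd < self.ssthresh:
            self.cwnd += size                     # slow start
        else:
            self.cwnd += self.mds * size // self.cwnd   # congestion avoidance

    def on_packets_lost(self, newest_lost_sent_time: float, now: float) -> None:
        if self.in_recovery(newest_lost_sent_time):
            return                                # same loss episode: react only once
        self.recovery_start = now
        self.cwnd = max(self.cwnd // 2, self.min_window)
        self.ssthresh = self.cwnd


if __name__ == "__main__":
    cc = NewReno()
    cc.on_packet_acked(1200, sent_time=0)         # slow start
    cc.on_packets_lost(newest_lost_sent_time=1, now=5)
    cc.on_packet_acked(1200, sent_time=7)         # congestion avoidance
    print(cc.cwnd, cc.ssthresh)
```

The `recovery_start` check is what keeps a burst of losses from one round trip from halving the window more than once.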
CHAPTER 14

HTTP/3 frames: HTTP/2 with half the surface chopped off

Whatever QUIC already did, HTTP/3 doesn't redo

Layer: HTTP/3
RFC: 9114 §7
Frame types: 7
vs H2: 10 → 7 (5 removed, 2 added)

HTTP/2 has 10 frame types (DATA / HEADERS / PRIORITY / RST_STREAM / SETTINGS / PUSH_PROMISE / PING / GOAWAY / WINDOW_UPDATE / CONTINUATION). HTTP/3 has 7, because QUIC already handles flow control (WINDOW_UPDATE), stream reset (RST_STREAM) and ping (PING); CONTINUATION is unnecessary because an H3 frame's var-int length can cover a whole header block, and PRIORITY is replaced by RFC 9218. CANCEL_PUSH and MAX_PUSH_ID are the two additions. HTTP/3 only carries "HTTP's own business" now.

◇ In our GET request · Phase 3

INPUT
HTTP method/path/headers + 3200 B body · GET over HTTP/3 · 12 header fields + HTML
OUTPUT
HEADERS frame + DATA frame · HEADERS ≈ 7 B (QPACK) · DATA = 3200 B

Frame list

Type | Hex | Purpose | In HTTP/2
DATA | 0x00 | HTTP body | same
HEADERS | 0x01 | QPACK-encoded headers | same
CANCEL_PUSH | 0x03 | cancel push (dead) | n/a
SETTINGS | 0x04 | connection params | same
PUSH_PROMISE | 0x05 | server push (dead) | deprecated
GOAWAY | 0x07 | graceful close | same
MAX_PUSH_ID | 0x0d | push limit | n/a
removed in H3 | n/a | PRIORITY · RST_STREAM · PING · WINDOW_UPDATE · CONTINUATION | handled by QUIC or no longer needed

Three streams open the connection

Control · uni stream type 0x00
SETTINGS · GOAWAY
each side must open one
QPACK encoder · uni stream type 0x02
dynamic table updates
→ peer's decoder
QPACK decoder · uni stream type 0x03
insert count ack
→ peer's encoder
FIELD NOTE · Type-byte trick: a uni stream starts with its stream type (a var-int, a single byte for the values used here), not with a frame type. 0x00 = control, 0x01 = push, 0x02 = QPACK encoder, 0x03 = QPACK decoder. GREASE types (reserved values of the form 0x1f · N + 0x21; RFC 9114 §6.2.3 for stream types, §7.2.8 for frame types; RFC 9287 applies the same idea to the QUIC Bit) can be sent by either side. This is RFC 9114's anti-ossification trick: deliberately send streams the peer doesn't recognise, forcing implementations to "ignore unknown", so that a new type such as 0x04 can still be deployed in the future.
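The stream-type prefix and the GREASE formula are easy to test in isolation. The sketch below encodes and decodes QUIC variable-length integers (RFC 9000 §16) and classifies the first bytes of a unidirectional stream; the helper names are made up for illustration.

```python
def encode_varint(value: int) -> bytes:
    """Encode a QUIC variable-length integer (1, 2, 4 or 8 bytes)."""
    if value < 0:
        raise ValueError("var-ints are unsigned")
    if value < 2**6:
        return value.to_bytes(1, "big")
    if value < 2**14:
        return (value | 0x4000).to_bytes(2, "big")
    if value < 2**30:
        return (value | 0x8000_0000).to_bytes(4, "big")
    if value < 2**62:
        return (value | 0xC000_0000_0000_0000).to_bytes(8, "big")
    raise ValueError("var-ints hold at most 62 bits")


def decode_varint(buf: bytes):
    """Return (value, bytes consumed); the top 2 bits of byte 0 give the length."""
    length = 1 << (buf[0] >> 6)
    value = buf[0] & 0x3F
    for byte in buf[1:length]:
        value = (value << 8) | byte
    return value, length


def is_grease(type_id: int) -> bool:
    """True for reserved values of the form 0x1f * N + 0x21."""
    return type_id >= 0x21 and (type_id - 0x21) % 0x1F == 0


STREAM_TYPES = {0x00: "control", 0x01: "push",
                0x02: "qpack_encoder", 0x03: "qpack_decoder"}


def classify_uni_stream(first_bytes: bytes) -> str:
    stream_type, _ = decode_varint(first_bytes)
    if stream_type in STREAM_TYPES:
        return STREAM_TYPES[stream_type]
    # Unknown types, GREASE or not, must be tolerated rather than treated as errors.
    return "grease" if is_grease(stream_type) else "unknown"


if __name__ == "__main__":
    for prefix in (b"\x00", b"\x02", encode_varint(0x1F * 3 + 0x21), b"\x04"):
        print(prefix.hex(), classify_uni_stream(prefix))
```

A receiver that handles the "grease" and "unknown" branches identically (by ignoring the stream) is exactly the behaviour the GREASE values are meant to keep honest.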

Priority · what replaced PRIORITY

HTTP/2's priority was a famous joke: RFC 7540 §5.3 designed a weighted dependency tree for clients to tell servers "send these first". Firefox shipped one. Chrome shipped a different one. Safari shipped none. The three implementations behaved nothing alike, and RFC 9113 finally deprecated the whole scheme.

HTTP/3 went a different route entirely: RFC 9218 · Extensible Priorities for HTTP (2022-06, published alongside RFC 9114):

priority header
declare priority
u=0..7 · i
  • priority: u=3 · urgency 0 (highest) … 7 (lowest), default 3
  • i · incremental (the response can be processed as its bytes arrive)
PRIORITY_UPDATE
re-prioritise mid-flight
frame 0xF0700 (H3)
  • sent by the client on the control stream
  • Structured Fields Dictionary syntax
Server policy
advisory, not mandatory
scheduler-defined
  • server may ignore
  • RFC doesn't pick a scheduler
RFC 9218 example · what Chrome sends · curl -v --http3 ursb.me
    ; HTML: top urgency, render incrementally as bytes arrive
    GET / HTTP/3
    priority: u=0, i

    ; CSS: high urgency, blocking but not incremental
    GET /app.css HTTP/3
    priority: u=2

    ; Image: low priority, can wait
    GET /hero.webp HTTP/3
    priority: u=5, i

    ; PRIORITY_UPDATE on the control stream: Chrome bumps an image into view
    [PRIORITY_UPDATE frame 0x0F0700]
    prioritized_id = 0x14        ; stream 20
    priority_field = "u=1, i"    ; now urgent, scrolled into viewport
DEVTOOLS · Chrome DevTools → Network → "Priority" column on the right. Chrome maps main resource / CSS / JS / image / font internally to u=0..5. You can override with the Fetch API's priority option, fetch(url, { priority: 'high' }), or with the matching fetchpriority attribute in markup. These hints are the browser-facing surface for RFC 9218.
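On the server side, the `priority` field value has to be turned into an (urgency, incremental) pair before any scheduler can use it. The sketch below handles only the two parameters shown above and is deliberately not a full Structured Fields parser; `parse_priority` is a hypothetical helper.

```python
def parse_priority(value: str):
    """Parse an RFC 9218 priority field value into (urgency, incremental).

    Unknown or malformed members are ignored and the defaults (u=3, not
    incremental) are kept, mirroring the 'advisory' nature of the signal.
    """
    urgency, incremental = 3, False
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        key, _, val = item.partition("=")
        if key == "u" and val.isdigit() and 0 <= int(val) <= 7:
            urgency = int(val)
        elif key == "i":
            # A bare key means boolean true; "?1" / "?0" are explicit booleans.
            incremental = val in ("", "?1")
    return urgency, incremental


if __name__ == "__main__":
    requests = {"/": "u=0, i", "/app.css": "u=2", "/hero.webp": "u=5, i"}
    # One possible policy: serve strictly by urgency, lowest number first.
    for path in sorted(requests, key=lambda p: parse_priority(requests[p])[0]):
        print(path, parse_priority(requests[path]))
```

Sorting by urgency is only one possible policy; as the cards above note, RFC 9218 leaves the scheduler to the server.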
Chapter references
RFC
RFC 9114 · §7 H3 frames
CHAPTER 15

QPACK: unshackling HPACK from strict order

Why we couldn't just keep HPACK

Layer: HTTP/3 header compression
RFC: 9204
Static table: 99 entries
Dynamic table: tunable · RFC 9204 default capacity is 0 · 4096 B is a common library setting

◇ In our GET request · Phase 3

INPUT
12 HTTP header KV pairs · ~600 B raw
OUTPUT
5-7 bytes · static hits · dynamic hits · literals

Why HPACK can't live on QUIC

HPACK (HTTP/2) depends on a strictly synchronised dynamic table. The server sends ":status: 200" on Stream A and says "insert this, index 62". The next stream can now refer to index 62, provided that Stream A arrives before Stream B. HTTP/2 over TCP is naturally ordered, so this works.

QUIC streams are independent and arrive concurrently. If Stream A's update hasn't landed yet but Stream B already references index 62, Stream B cannot be decoded. The head-of-line blocking that the transport layer worked so hard to kill comes roaring back at the application layer.

[Figure 15 · QPACK · 3-stream architecture · no cross-stream head-of-line block. Client (Chrome): dynamic table, receive side, with entries [62] :status: 200, [63] hsts: max..., [64] (pending); its decoder emits Section Ack (Stream 0), Stream Cancel and Insert Count Increment; it decompresses the headers it receives. Server (ursb.me): dynamic table, send side, with entries [62] :status: 200, [63] hsts: max..., [64] content-type; its encoder emits insert [64], duplicate [62] and set table capacity; it compresses response headers. Streams: Stream 0 · request (bidi): GET / with header refs [62][63][64]. StreamID=7 · encoder (server uni): insert [64] content-type: text/html, the server pushes dynamic-table updates. StreamID=10 · decoder (client uni): Insert Count Increment · Section Ack · Stream Cancel. Callout: REQUIRED INSERT COUNT (RIC) is the only blocking point.]

Every request stream's header block starts with a RIC: "I reference the dynamic-table inserts up to entry [64]; you can decode me only once your dynamic table's insert_count ≥ 64." If the encoder stream hasn't delivered [64] yet, this request stream pauses header decoding in place, but other request streams are unaffected. That is the core of what QPACK does better than HPACK.

STREAM ID ASSIGNMENTS · per RFC 9204 §5 · Client-initiated: 0/4/8 (bidi) · 6/10 (uni for QPACK enc/dec) · Server-initiated: 7/11 (uni for QPACK enc/dec) · control streams 2/3

QPACK splits header compression across three independent QUIC streams: bidi request streams (purple) carry references to dynamic-table entries; the encoder stream (green, server→client) ships dynamic-table inserts; the decoder stream (copper, client→server) reports insert-count + section-ack + stream-cancel. The detail to remember is the Required Insert Count (RIC) gate: a request stream stalls only when the inserts it references haven't arrived, never on a sibling stream. This is the structural answer to HPACK's "cross-stream HOL block resurrected on QUIC" failure mode.

QPACK's three moves

Bigger, modernised static table
HPACK static table 61 entries → QPACK 99. It adds fields HPACK lacked, such as alt-svc and content-security-policy, plus pre-filled values for modern-web staples like strict-transport-security.
A pair of unidirectional sync streams
The encoder stream (uni 0x02) sends "insert this into the dynamic table". The decoder stream (uni 0x03) reports back "I've received N insertions so far". The two counters involved are Insert Count and Known Received Count.
Required Insert Count + tolerable blocking
Each encoded field section (the payload of a HEADERS frame) starts with a Required Insert Count. If the receiver's Insert Count isn't there yet, only that stream's HEADERS waits; other streams run on. SETTINGS_QPACK_BLOCKED_STREAMS tells the encoder how many streams it may leave blocked at once (the RFC 9204 default is 0; deployments raise it, for example to 100). An encoder whose compression would exceed the cap has to fall back to encodings that don't reference not-yet-acknowledged dynamic-table entries, such as literals.
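The Required Insert Count gate can be modelled without any actual header coding. The sketch below tracks only counters; `QpackDecoderGate` and its methods are illustrative names, and the blocked-streams limit is passed in explicitly rather than taken from any library default.

```python
class QpackDecoderGate:
    """Tracks which field sections must wait for dynamic-table inserts."""

    def __init__(self, max_blocked_streams: int) -> None:
        self.insert_count = 0            # inserts received on the encoder stream
        self.blocked = {}                # stream_id -> Required Insert Count
        self.max_blocked = max_blocked_streams

    def on_field_section(self, stream_id: int, required_insert_count: int) -> str:
        """Called when HEADERS arrives on a request stream."""
        if required_insert_count <= self.insert_count:
            return "decode"
        if len(self.blocked) >= self.max_blocked:
            raise RuntimeError("peer exceeded the advertised blocked-streams limit")
        self.blocked[stream_id] = required_insert_count
        return "blocked"

    def on_insert(self):
        """Called per encoder-stream insert; returns the streams that unblock."""
        self.insert_count += 1
        ready = sorted(sid for sid, ric in self.blocked.items()
                       if ric <= self.insert_count)
        for sid in ready:
            del self.blocked[sid]
        return ready


if __name__ == "__main__":
    gate = QpackDecoderGate(max_blocked_streams=16)
    print(gate.on_field_section(0, 2))   # needs two inserts that have not arrived
    print(gate.on_field_section(4, 0))   # references nothing dynamic
    print(gate.on_insert(), gate.on_insert())
```

Stream 4 decodes immediately while stream 0 waits, which is the per-stream blocking the chapter describes: the stall never spreads to a sibling stream.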

Tolerable blocking, visualised

[Figure 15·1 · Required Insert Count · why QPACK doesn't HOL. Encoder stream (server → client): INSERT idx=62 :status: 200, INSERT idx=63 cache-control: …. Stream 0 (request A): HEADERS · RIC=63, references idx=62; its packet arrives before the encoder INSERTs, so this one stream is blocked, waiting for INSERT 62-63. Stream 4 (request B): HEADERS · RIC=63, references idx=62, 63; its packet arrives after the INSERTs are buffered and decodes immediately. QPACK counters shown: Insert Count (server) = 64 · Known Received Count = 61 · Required Insert Count = 63.]
Fig 15·1 · Stream 0 references idx=62 before the INSERT lands → only Stream 0 blocks; Stream 4 arrives after the INSERTs and decodes immediately. HOL stays per-stream.

Measured compression

scenario | raw bytes | HPACK (H2) | QPACK (H3)
first request | ~600 | ~50 | ~52
repeated request, same conn | ~600 | ~5 | ~6
weak link (lossy) | ~600 | ~5 + HOL | ~6 (no HOL)

Compression ratios are nearly identical. QPACK's win is in resistance to loss.

PRACTICAL · RFC 9204's default dynamic-table capacity is 0; QUIC libraries that enable the table commonly use 4 KB, far below the 64 KB that some HTTP/2 stacks advertise for HPACK (HPACK's own protocol default is 4,096 bytes). Reason: the more the encoder leans on the dynamic table, the more "blocked streams" can pile up. Bump it up on intra-datacenter / low-latency paths; don't on public / mobile networks. If you see http3_max_field_size while configuring H3 in nginx, that is the setting in question.
Chapter references
RFC
RFC 9204 · QPACK
RFC
RFC 7541 · HPACK (for comparison)
CHAPTER 16

The death of Server Push

A feature that lived in the RFC and died in production

STATUS: disabled by default
Chrome: v106+ removed
Firefox: off by default
Replacement: 103 Early Hints

In 2015, HTTP/2 wrote Server Push into RFC 7540 as a killer feature: the server knows the client will need app.css, so why not push it ahead of time? In 2022, Chrome 106 disabled Server Push by default. In 2024, it was deleted from the Chromium tree. HTTP/3 (RFC 9114) kept the PUSH_PROMISE frame for "protocol completeness", but no browser accepts it anymore.

◇ In our GET request · Phase N/A · no-op

INPUT
none · our GET doesn't trigger push · Chrome 106+ default-disables push
OUTPUT
none · the browser CANCEL_PUSHes immediately · pushes from the server get cancelled

Three causes of death

Cause 1 · cache ignorance
Server doesn't know what the client has

The server blindly pushes app.css, but what if the client already has it cached? The bandwidth is wasted. Chrome's telemetry: 70%+ of pushes get immediately CANCEL_PUSHed.

Cause 2 · priority chaos
Push steals real-request bandwidth

The server-pushed app.css competes with the client-requested app.js for the same congestion window. BBR can't tell which is more urgent, so both end up slower.

Cause 3 · better alternative
103 Early Hints + preload

The server first sends an HTTP 103 Early Hints response (RFC 8297) telling the client "you'll probably need app.css". The client decides whether to preload. Simple, observable, no bandwidth war.

What survived
Push patterns that actually shipped

CDNs (Cloudflare et al.) still quietly use Push between their edge and origin for prefetch optimisation. That traffic never reaches the client browser, so Chrome 106 doesn't affect it. "Intra-network Push" lives on.

"Server Push was flawless in the RFC,
and almost no stable use case ever showed up in production."
Patrick Meenan · Chrome Web Performance · 2022
CHAPTER 17

连接迁移 — Wi-Fi 到 5G 不断线

Connection migration — Wi-Fi to 5G without dropping

CID 是 QUIC 的身份证

the Connection ID is QUIC's passport

在主线里
In our request
mid-flight switch
Layer
QUIC · CID layer
RFC
9000 §9
关键帧
Key frames
PATH_CHALLENGE / RESPONSE

Main-line time T+200ms (mid-request): you walk out of the café and the phone hops to 5G. src_ip: 192.168.1.42 → 10.220.5.13. A TCP connection dies here, because TCP identifies a connection by its 4-tuple. The HTTP/3 connection survives, because QUIC identifies a connection by its Connection ID, not by the IP/port 4-tuple.

◇ In our GET request · Phase 6

INPUT
old path Wi-Fi 192.168.1.42:52341 · NSURLSession reports Wi-Fi loss
OUTPUT
new path 5G 10.220.5.13:34188 · same connection, fresh CID from the pool · PATH_CHALLENGE OK

CID 池 · 提前准备好

CID pool · prepared in advance

连接建立后,服务器和客户端不停发 NEW_CONNECTION_ID 帧,互相给对方备好"未来可以用的 CID 列表"。每个 CID 还附带一个 Stateless Reset Token——用于无状态重置。

Once the connection is up, both sides keep emitting NEW_CONNECTION_ID frames, populating each other's "list of CIDs you may use in future". Each CID carries a Stateless Reset Token too — for stateless reset.
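The bookkeeping behind that pool can be sketched in a few lines. This is a minimal illustration, not any QUIC library's API: `IssuedCid`, `CidPool` and `switch_path` are hypothetical names, and real stacks also enforce `active_connection_id_limit` and the Retire Prior To field, which are left out here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class IssuedCid:
    seq: int            # sequence number from the NEW_CONNECTION_ID frame
    cid: bytes          # the connection ID itself
    reset_token: bytes  # 16-byte Stateless Reset Token bound to this CID


@dataclass
class CidPool:
    """CIDs the peer has handed us (hypothetical helper, not a library API)."""
    spare: List[IssuedCid] = field(default_factory=list)
    active: Optional[IssuedCid] = None

    def on_new_connection_id(self, seq: int, cid: bytes, reset_token: bytes) -> None:
        # Every NEW_CONNECTION_ID frame carries a CID plus its reset token.
        if len(reset_token) != 16:
            raise ValueError("Stateless Reset Token must be 16 bytes")
        self.spare.append(IssuedCid(seq, cid, reset_token))

    def switch_path(self) -> Optional[int]:
        """Activate a fresh CID; return the sequence number to retire, if any."""
        if not self.spare:
            raise RuntimeError("no spare CID: migration would have to wait")
        retired = self.active.seq if self.active else None
        self.active = self.spare.pop(0)
        return retired
```

The value returned by `switch_path` is what would go into a RETIRE_CONNECTION_ID frame once the new path is in use.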

FIG 17 · Connection migration, Wi-Fi → 5G, in five steps (about 200 ms). [Sequence diagram between the phone and ursb.me, both with a CID pool ready: ⓪ normal H3 request on Wi-Fi, then Wi-Fi drops and the OS assigns a new local address; ① PATH_CHALLENGE with 8 random bytes sent from the new address; ② PATH_RESPONSE echoing the same 8 bytes; ③ path validated, STREAM frames continue with a new DCID; ④ RETIRE_CONNECTION_ID retires the old CID.]
Three invariants that make this work:
CID pool prepared in advance — both sides have issued ≥ 3 spare CIDs via NEW_CONNECTION_ID during the connection's life. No mid-migration "negotiate new CID" round-trip.
Crypto state survives — all 1-RTT keys remain valid; encryption / decryption keeps working. Only routing changes.
Anti-amplification 3× cap: until the new path is validated, the server cannot send more than 3× the bytes it has received on it. This defeats attackers who spoof a victim's source address to redirect traffic.

Connection migration takes five steps and about 200 ms in total, and the H3 request is not interrupted: ① the OS hands the phone a new IP after the hop to 5G; ② the client sends PATH_CHALLENGE (8 random bytes) from the new address, using a fresh DCID from the pool; ③ the server echoes those 8 bytes in PATH_RESPONSE, which validates the path for the client, and the server typically sends its own PATH_CHALLENGE in the other direction to validate the client's new address (RFC 9000 §8.2, §9.3). No RTT comparison is involved: the 3× figure is the anti-amplification byte cap that applies until validation finishes; ④ the client sends subsequent STREAM frames with the new DCID; ⑤ both sides send RETIRE_CONNECTION_ID for the old CIDs. The key is the CID pool: both sides have pre-issued spare CIDs (≥ 3 in this walkthrough) during normal operation, so migration needs no negotiation round trip. Plain TCP cannot do this, because its connection identity is the 4-tuple itself.

Path Validation 流程

Path Validation flow

Client (5G now) Server 1-RTT[ANY: src=10.220.5.13] DCID = next from pool PATH_CHALLENGE[random=0xAFBE…] server: "prove you're on this path" PATH_RESPONSE[0xAFBE…] echo back the same nonce 1-RTT[STREAM 0: body chunk N+1] migration complete, stream resumes
Fig 17·1 · Path Validation timeline · the server initiates the challenge after seeing a new src_ip · validated within one RTT
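The challenge/response exchange in the figure boils down to "send 8 unpredictable bytes, accept only an exact echo". A minimal sketch under that assumption follows; `PathValidator` and `answer_challenge` are illustrative names, and timers, retransmission of the challenge and the 3× byte cap are deliberately left out.

```python
import hmac
import os


class PathValidator:
    """Tracks PATH_CHALLENGE payloads we sent and checks PATH_RESPONSE echoes."""

    def __init__(self) -> None:
        self.pending = set()

    def new_challenge(self) -> bytes:
        data = os.urandom(8)          # PATH_CHALLENGE carries 8 unpredictable bytes
        self.pending.add(data)
        return data

    def on_path_response(self, data: bytes) -> bool:
        """Return True only if `data` echoes a challenge that is still pending."""
        matched = None
        for sent in self.pending:
            if hmac.compare_digest(sent, data):
                matched = sent
        if matched is None:
            return False
        self.pending.discard(matched)  # each challenge validates at most once
        return True


def answer_challenge(data: bytes) -> bytes:
    """The responder's whole job: echo the 8 bytes back in PATH_RESPONSE."""
    if len(data) != 8:
        raise ValueError("PATH_CHALLENGE data must be 8 bytes")
    return data
```

An off-path attacker who spoofs the source address never sees the challenge, so it cannot produce the matching response.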
FIELD NOTE · Apple iCloud Private Relay. Apple's iCloud Private Relay (launched 2021) is by far the largest connection-migration testbed. Phones flip between Wi-Fi and 5G constantly, and each flip requires Path Validation to finish within 100-300 ms. Apple's data: median RTT shows no detectable jitter at the moment of the switch, because the new path is validated before the old path is torn down. That validate-first behaviour is the path probing described in RFC 9000 §9.1; NAT rebinding, covered next, is the case where the address changes without the client knowing.

NAT Rebinding · 隐式迁移

NAT Rebinding · implicit migration

Home router NAT entries usually expire (after 30 s to 2 min). If the client stays silent, the NAT recycles the mapping, and the next packet may leave with a different src_port: effectively a migration the client doesn't know about. RFC 9000 §9.3 handles this NAT rebinding the same way as a deliberate migration: the server sees packets from a new address and sends PATH_CHALLENGE.

CHAPTER 18

放大攻击与多路径 — UDP 的代价与红利

Amplification & Multipath — UDP's tax and bonus

3x 限制 · Retry · MPQUIC

3x limit · Retry · MPQUIC

◇ In our GET request · Phase 1-2

INPUT
1228 B from the client Initial · handshake pending · address unverified
OUTPUT
server budget 3684 B · 3 × client bytes

为什么 UDP 容易被攻击放大

Why UDP invites amplification

UDP 无连接 ⇒ 服务器不知道"请求人是不是真的在这个 src_ip"。攻击者可以伪造 victim 的 src_ip 给 QUIC 服务器发 1 字节小包,让服务器回复 10000 字节大包到 victim ——典型的 DNS amp 攻击套路。QUIC 必须从协议层防住。

UDP is connectionless ⇒ the server doesn't know "is the requester really at this src_ip?" An attacker can spoof a victim's src_ip, send 1-byte QUIC packets to the server, and trick it into firing 10 000-byte responses at the victim — the classic DNS amp pattern. QUIC has to defend at the protocol level.

3 倍限制

The 3x limit

RFC 9000 §8.1: until the handshake completes (client address still unverified), the total bytes the server sends to the client must not exceed 3× the bytes it has received from the client. This is exactly why the client's first Initial must be padded to ≥ 1200 bytes: it gives the server a 3600-byte budget for shipping the cert chain.
[Figure: bytes received from the client vs. the server's send budget (≤ 3× the received bytes until the address is verified). T+0, first Initial only: 1200 B received, 3600 B budget; a typical cert chain of ~4800 B (cert + intermediates) exceeds it. T+RTT, the client ACKs or repeats: 1200 B + 1200 B received, budget now 7200 B, the chain fits. After the handshake completes the address is verified, the cap is lifted and congestion control governs. The same 3× rule limits UDP reflection: a forged-IP attacker only gets 3× back.]
Fig 18·1 · How the 3× budget grows with client bytes · the cert chain is split across two RTTs when oversized · the cap is removed once the address is verified.
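The arithmetic in the figure can be replayed with a small counter. This is a sketch of the accounting only, assuming a hypothetical `AmplificationLimiter` class; a real server would send as much of the chain as the budget allows instead of refusing the whole flight.

```python
class AmplificationLimiter:
    """Byte accounting for the 3x anti-amplification limit (illustrative only)."""

    FACTOR = 3

    def __init__(self) -> None:
        self.received = 0       # bytes received from the unvalidated address
        self.sent = 0           # bytes already sent to it
        self.validated = False  # flips to True once the address is validated

    def on_receive(self, n: int) -> None:
        self.received += n

    def budget(self):
        """Bytes we may still send, or None when the cap no longer applies."""
        if self.validated:
            return None
        return self.FACTOR * self.received - self.sent

    def try_send(self, n: int) -> bool:
        remaining = self.budget()
        if remaining is not None and n > remaining:
            return False        # would exceed 3x: wait for more client bytes
        self.sent += n
        return True


limiter = AmplificationLimiter()
limiter.on_receive(1200)              # first padded Initial
first_budget = limiter.budget()       # 3 * 1200
fits_at_t0 = limiter.try_send(4800)   # ~4800 B cert chain
limiter.on_receive(1200)              # client ACKs or repeats
fits_after_rtt = limiter.try_send(4800)
```

With the figure's numbers, the chain is refused against the 3600-byte budget and accepted once the second 1200 bytes raise the budget to 7200.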

Retry · 服务器忙的时候

Retry · for busy servers

If an incoming Initial looks suspicious (traffic spikes, resource crunch), the server can answer with a Retry packet carrying an encrypted token instead of continuing the handshake. The client must re-send its Initial, ClientHello included, with that token attached. Echoing the token proves the client can receive packets at the claimed IP, so the retried attempt passes address validation at the cost of one extra RTT. Skipping the check on later visits is a separate mechanism: the server hands out a token in a NEW_TOKEN frame for the client to present on a future connection (RFC 9000 §8.1.2 and §8.1.3). Cloudflare leans heavily on Retry during DDoS storms.
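RFC 9000 leaves the token format to the server, so the following is only one plausible shape: a timestamp plus an HMAC over the timestamp and the client IP. It is authenticated rather than encrypted, it omits the original DCID that a real Retry token also has to bind, and `make_retry_token`, `check_retry_token` and the 10-second lifetime are assumptions made for illustration.

```python
import hashlib
import hmac
import struct


def make_retry_token(secret: bytes, client_ip: str, now: float) -> bytes:
    """Token = 8-byte issue time || HMAC-SHA256(secret, issue time || client IP)."""
    issued = struct.pack("!Q", int(now))
    mac = hmac.new(secret, issued + client_ip.encode(), hashlib.sha256).digest()
    return issued + mac


def check_retry_token(secret: bytes, client_ip: str, token: bytes,
                      now: float, max_age: int = 10) -> bool:
    """Accept the token only from the same IP and only while it is fresh."""
    if len(token) != 8 + 32:
        return False
    issued, mac = token[:8], token[8:]
    expected = hmac.new(secret, issued + client_ip.encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(mac, expected):
        return False
    age = now - struct.unpack("!Q", issued)[0]
    return 0 <= age <= max_age
```

Because the server can recompute the MAC from its secret, it keeps no per-client state between the Retry and the second Initial, which is the point of using Retry under load.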

Multipath QUIC · 真正的红利

Multipath QUIC · the real bonus

draft-ietf-quic-multipath (mature by 2026) lets one QUIC connection use Wi-Fi and 5G simultaneously. Each path gets its own packet number space, while stream data is scheduled freely across paths. Apple iCloud Private Relay is the earliest large-scale MPQUIC deployment.

vs MPTCP: MPTCP is kernel-only, < 1% deployed. MPQUIC lives entirely in user space, so any QUIC library can implement it independently.

CASE · APPLE
iCloud Private Relay 的多路径
iCloud Private Relay multipath

Apple 使用 MASQUE(CONNECT-UDP)把 QUIC 隧道分发给两个独立的中继节点。手机端的 NSURLSession + MPQUIC 自动在 Wi-Fi/5G 两条物理路径上做透明聚合——当 Wi-Fi 抖动时,5G 直接接管,应用层零感知。这是第一次在消费级设备上规模化跑 MPQUIC。

Apple uses MASQUE (CONNECT-UDP) to distribute QUIC tunnels across two independent relay nodes. NSURLSession + MPQUIC on the phone transparently aggregates across Wi-Fi/5G physical paths — when Wi-Fi jitters, 5G takes over instantly, with zero app awareness. The first consumer-scale MPQUIC deployment.

CHAPTER 19

连接的生命周期 — 关闭、排空、复活

Connection lifecycle — close, drain, revive

GOAWAY · CONNECTION_CLOSE · draining · idle · stateless reset

GOAWAY · CONNECTION_CLOSE · draining · idle · stateless reset

主线阶段
Phase
5 / 7-8 / 9
Layer
QUIC + HTTP/3 lifecycle
RFC
9000 §10 · 9114 §5.2
关键帧
Key frames
GOAWAY · CC · PING · NEW_TOKEN

之前 18 章都讲请求的事——但一个真实的 QUIC 连接还要走完关闭、排空、复活三种结局。生产环境里大部分 bug、半小时一次的"无原因连接重置"、CDN 滚动重启时的瞬时错误,全藏在这一章。

The previous 18 chapters covered request arrival. A real QUIC connection still has to walk through close, drain, revive. Most production bugs, the "mysterious connection resets" every 30 minutes, the transient errors during CDN rolling restarts — they all hide in this chapter.

◇ In our GET request · Phase 5 / 7-8 / 9

INPUT
live connection · idle 15 min · CDN recycles · server upgrade · user goes offline
OUTPUT
closed · drained · or reset · truly dead 3 PTO later · state erased from client memory

四种结局 · The four endings

The four endings

Graceful close
The server sends GOAWAY first (RFC 9114 §5.2, H3 frame 0x07): "no new streams, but I'll finish in-flight ones". Once every stream completes, it sends CONNECTION_CLOSE (RFC 9000 §19.19, QUIC frame 0x1c) for real.
Immediate close
Skip GOAWAY entirely and send CONNECTION_CLOSE(error=N) at once. Every open stream is implicitly closed along with the connection (RFC 9000 §10.2); no per-stream RESET_STREAM is sent. Common when an endpoint detects a protocol or crypto-layer violation, e.g. a frame that is not allowed in the current state (PROTOCOL_VIOLATION).
Idle timeout
RFC 9000 §10.1: each end advertises max_idle_timeout in its transport parameters, and the smaller value wins. If that value is, say, 30 s and no packet arrives for 30 s, the connection is silently destroyed: no CC, no peer notification. NAT entries expire the same way. To keep a connection alive, send PING (§19.2) before the timer fires.
Stateless reset
The server process crashes and restarts, and can't match the client's 1-RTT packet to any connection state. It has no keys to send CC, only a packet that looks like random noise: a Stateless Reset (§10.3) ending in the 16-byte reset_token (issued earlier via NEW_CONNECTION_ID or the stateless_reset_token transport parameter). Only the client can recognise that token, and it then tears the connection down locally.
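For the idle-timeout ending, the effective value follows a small rule in RFC 9000 §10.1: take the minimum of the two advertised values, where 0 means "not advertised". The sketch below encodes just that rule; `keepalive_interval_ms` and its "half the timeout" policy are assumptions for illustration, not something the RFC prescribes.

```python
from typing import Optional


def effective_idle_timeout_ms(local_ms: int, peer_ms: int) -> Optional[int]:
    """Combine both max_idle_timeout values; 0 means the side advertised none."""
    advertised = [v for v in (local_ms, peer_ms) if v > 0]
    if not advertised:
        return None              # neither side set a timeout: idle timeout disabled
    return min(advertised)


def keepalive_interval_ms(local_ms: int, peer_ms: int) -> Optional[int]:
    """Hypothetical policy: send PING at half the effective timeout."""
    timeout = effective_idle_timeout_ms(local_ms, peer_ms)
    return None if timeout is None else timeout // 2
```

With 30 s on one side and 60 s on the other, the connection dies after 30 s of silence, so a PING every 15 s would keep it (and any NAT binding with a longer lifetime) alive.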

三态机 · Closing / Draining / Closed

Three states · Closing / Draining / Closed

[State diagram, RFC 9000 §10.2. ACTIVE: streams flowing, PING keeps the connection alive. Sending CC moves the endpoint to CLOSING, where it answers incoming packets with CC again. Receiving CC, or receiving a stateless reset, moves it to DRAINING, where it discards incoming packets and sends nothing. CLOSING and DRAINING share one 3 × PTO period, after which the state is CLOSED. An idle timeout goes straight to CLOSED, with no CC and no DRAINING.]
Fig 19·1 · RFC 9000 §10.2 state machine · ACTIVE → CLOSING / DRAINING → CLOSED · three short-cuts: receiving CC, receiving stateless reset, idle timeout.

为什么需要 Draining

Why draining exists

Close cannot complete instantly, because the peer might still have packets in flight. If an endpoint frees its state immediately and opens a fresh connection, the fresh one might receive the old connection's packets and mistake them for its own handshake, which is potentially catastrophic.

RFC 9000 §10.2's fix: an endpoint that sends CONNECTION_CLOSE enters closing and, for 3 PTO, answers incoming packets with the same CC again (rate-limited, so the peer's retries cannot turn it into a packet generator). An endpoint that receives CONNECTION_CLOSE enters draining instead and silently drops everything for 3 PTO. A closing endpoint that receives the peer's CC may switch to draining, but the deadline stays where it was. Only when those 3 PTO have passed does the endpoint enter closed and free memory. The two states are alternatives, not consecutive stages, so the wait is 3 PTO rather than 3 + 3; on typical paths that is on the order of 100-300 ms, which is the real cost of closing a QUIC connection, not the "instant" you see.
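The closing/draining rules are easy to get subtly wrong, so here is a small model of them. It is a sketch under simplifying assumptions: times are integer milliseconds, PTO is a fixed number supplied by the caller, and `CloseTracker` is a hypothetical name rather than part of any QUIC stack.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    CLOSING = "closing"    # we sent CONNECTION_CLOSE
    DRAINING = "draining"  # we received CONNECTION_CLOSE (or a stateless reset)
    CLOSED = "closed"


class CloseTracker:
    def __init__(self, pto_ms: int) -> None:
        self.pto_ms = pto_ms
        self.state = State.ACTIVE
        self.deadline = None

    def send_close(self, now_ms: int) -> None:
        if self.state is State.ACTIVE:
            self.state = State.CLOSING
            self.deadline = now_ms + 3 * self.pto_ms

    def on_receive_close(self, now_ms: int) -> None:
        if self.state is State.ACTIVE:
            self.state = State.DRAINING
            self.deadline = now_ms + 3 * self.pto_ms
        elif self.state is State.CLOSING:
            self.state = State.DRAINING   # same deadline, we just stop sending

    def tick(self, now_ms: int) -> None:
        if self.deadline is not None and now_ms >= self.deadline:
            self.state = State.CLOSED

    def on_packet(self, now_ms: int) -> bool:
        """Return True if this packet should be answered with another CC."""
        self.tick(now_ms)
        return self.state is State.CLOSING
```

Note that `on_receive_close` never extends the deadline: an endpoint that moves from closing to draining finishes at the time the closing state would have ended.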

GOAWAY · HTTP/3 层的优雅

GOAWAY · HTTP/3's grace

HTTP/3 GOAWAY frame · sent on control stream · RFC 9114 §5.2
; H3 frame format: Type=0x07, Length, StreamID
[GOAWAY frame]
  Type     = 0x07   ; goaway
  Length   = 4      ; bytes after length
  StreamID = 0x14   ; "I won't process any stream >= 20"
; Semantics:
;  - Streams with id <  0x14: server WILL complete them
;  - Streams with id >= 0x14: server WILL reject (H3_REQUEST_REJECTED)
;  - Client MAY safely retry rejected requests on a NEW connection
; Servers can send GOAWAY multiple times, each one LOWERING the StreamID.
; Final GOAWAY can be StreamID=0 ("no more streams period"), then CC.
PRACTICAL · Rolling restart. CDNs (Cloudflare, Fastly, Akamai) must implement GOAWAY correctly during edge-node rolling restarts, or millions of long-lived connections get reset at once and every client reconnects simultaneously: a thundering herd. Correct sequence: send a first GOAWAY with the largest possible stream ID (2^62 − 4, which RFC 9114 §5.2 suggests for announcing shutdown without rejecting anything yet), wait ~30 s for in-flight requests to drain, then send a final GOAWAY with the real cut-off, followed by CONNECTION_CLOSE. Cloudflare's Pingora framework has a dedicated state machine for this.
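On the wire, a GOAWAY frame is three QUIC variable-length integers: type, length and stream ID. The sketch below encodes them with minimal-length varints (RFC 9000 §16), so the StreamID 0x14 example comes out with Length = 1, whereas a frame that spends four bytes on the same ID, as in the listing above, is also legal. `encode_varint` and `encode_goaway` are illustrative helper names.

```python
def encode_varint(value: int) -> bytes:
    """QUIC variable-length integer, shortest encoding (RFC 9000 section 16)."""
    if value < 0:
        raise ValueError("varints are unsigned")
    if value < 2**6:
        return value.to_bytes(1, "big")                         # prefix 00
    if value < 2**14:
        return (value | 0x4000).to_bytes(2, "big")              # prefix 01
    if value < 2**30:
        return (value | 0x80000000).to_bytes(4, "big")          # prefix 10
    if value < 2**62:
        return (value | 0xC000000000000000).to_bytes(8, "big")  # prefix 11
    raise ValueError("value does not fit in 62 bits")


GOAWAY_TYPE = 0x07
MAX_CLIENT_BIDI_STREAM_ID = 2**62 - 4   # "shutting down soon, nothing rejected yet"


def encode_goaway(stream_id: int) -> bytes:
    """Server-side GOAWAY: Type, Length, then the stream ID as a varint."""
    if stream_id % 4 != 0:
        # A server's GOAWAY must name a client-initiated bidirectional stream.
        raise ValueError("stream ID must be client-initiated bidirectional")
    payload = encode_varint(stream_id)
    return encode_varint(GOAWAY_TYPE) + encode_varint(len(payload)) + payload


first_notice = encode_goaway(MAX_CLIENT_BIDI_STREAM_ID)
final_notice = encode_goaway(0x14)
```

A rolling restart would write `first_notice` on the control stream, wait for in-flight requests, then write a frame like `final_notice` with the real cut-off.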

CONNECTION_CLOSE 错误码

CONNECTION_CLOSE error codes

CONNECTION_CLOSE 帧带一个错误码——按"是 QUIC 层错还是 H3 层错"分两种:

CONNECTION_CLOSE carries an error code — split into "QUIC-layer" vs "H3-layer":

| QUIC-layer (frame 0x1c) | code | H3-layer, passed through (frame 0x1d) | code |
|---|---|---|---|
| NO_ERROR | 0x00 | H3_NO_ERROR | 0x0100 |
| INTERNAL_ERROR | 0x01 | H3_GENERAL_PROTOCOL_ERROR | 0x0101 |
| CONNECTION_REFUSED | 0x02 | H3_INTERNAL_ERROR | 0x0102 |
| FLOW_CONTROL_ERROR | 0x03 | H3_STREAM_CREATION_ERROR | 0x0103 |
| STREAM_LIMIT_ERROR | 0x04 | H3_CLOSED_CRITICAL_STREAM | 0x0104 |
| STREAM_STATE_ERROR | 0x05 | H3_FRAME_UNEXPECTED | 0x0105 |
| PROTOCOL_VIOLATION | 0x0a | H3_REQUEST_REJECTED | 0x010b |
| CRYPTO_ERROR(N) | 0x0100+N | H3_VERSION_FALLBACK | 0x0110 |

Full lists: RFC 9000 §20 defines 18 QUIC error codes; RFC 9114 §8.1 defines 17 H3 error codes. CRYPTO_ERROR(N) tunnels any TLS alert by adding its one-byte alert number to 0x0100: for example, 0x0132 = 0x0100 + 50 is TLS decode_error, and bad_record_mac (alert 20) would be 0x0114.
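Decoding a transport-level CONNECTION_CLOSE code is mostly a table lookup plus the CRYPTO_ERROR offset. The sketch below covers only the codes listed in the table above and a handful of TLS alert numbers from RFC 8446; `describe_transport_error` is a hypothetical helper, and it applies to frame 0x1c only, because frame 0x1d reuses the same numeric range for H3 codes.

```python
QUIC_ERRORS = {
    0x00: "NO_ERROR",
    0x01: "INTERNAL_ERROR",
    0x02: "CONNECTION_REFUSED",
    0x03: "FLOW_CONTROL_ERROR",
    0x04: "STREAM_LIMIT_ERROR",
    0x05: "STREAM_STATE_ERROR",
    0x0A: "PROTOCOL_VIOLATION",
}

# A few TLS alert numbers (RFC 8446); the full registry is longer.
TLS_ALERTS = {
    20: "bad_record_mac",
    40: "handshake_failure",
    42: "bad_certificate",
    50: "decode_error",
    80: "internal_error",
    120: "no_application_protocol",
}


def describe_transport_error(code: int) -> str:
    """Name a code carried in a transport-level CONNECTION_CLOSE (frame 0x1c)."""
    if 0x0100 <= code <= 0x01FF:
        alert = code - 0x0100          # CRYPTO_ERROR wraps a one-byte TLS alert
        name = TLS_ALERTS.get(alert, "unknown alert")
        return f"CRYPTO_ERROR(alert {alert}: {name})"
    return QUIC_ERRORS.get(code, f"unlisted code 0x{code:x}")
```

For instance, `describe_transport_error(0x0178)` reports alert 120, no_application_protocol, which is what a server would send if the client offered no ALPN it supports.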

Stateless Reset · 服务器丢状态后的最后一招

Stateless Reset · the last resort after state loss

stateless reset · looks like noise · ends with reset_token · RFC 9000 §10.3 + §21.11
; arbitrary-length packet (≥ 21 B), last 16 B = pre-issued reset_token
+--------+-----------------------------+----------------------+
| fixed  | unpredictable random        | reset_token (16 B)   |
| bits   | (≥ 38 bits, looks like PN)  |                      |
+--------+-----------------------------+----------------------+
; The fixed bits and the random bits together fill at least 5 bytes.
; The receiver finds a CID in its table whose stateless_reset_token matches
; the last 16 bytes: that proves the peer "really lost state".
; Without the token, this would be indistinguishable from random injection.
FIELD NOTE · how the token gets there. The server attaches a stateless_reset_token every time it sends NEW_CONNECTION_ID (RFC 9000 §19.15), and can supply one for its handshake CID in the stateless_reset_token transport parameter (§18.2). A common derivation is HMAC(reset_secret, CID), as §10.3.2 suggests. The client stores every token it has seen. When it later receives a "looks-random" packet whose last 16 bytes match a stored token, it triggers the stateless-reset teardown path. Keyless state recovery: one of the most elegant designs in QUIC engineering.
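Both halves of that mechanism fit in a short sketch: deriving a token from a static secret, and recognising a reset by its last 16 bytes. It assumes HMAC-SHA256 truncated to 16 bytes, which is one reasonable choice rather than a mandated one, and `reset_token`, `build_stateless_reset` and `is_stateless_reset` are illustrative names.

```python
import hashlib
import hmac
import os


def reset_token(reset_secret: bytes, cid: bytes) -> bytes:
    """Derive the 16-byte token for a CID from a secret that survives restarts."""
    return hmac.new(reset_secret, cid, hashlib.sha256).digest()[:16]


def build_stateless_reset(token: bytes, size: int = 40) -> bytes:
    """Random-looking bytes, short-header fixed bits, token in the last 16 bytes."""
    if len(token) != 16 or size < 21:
        raise ValueError("need a 16-byte token and at least 21 bytes in total")
    head = bytearray(os.urandom(size - 16))
    head[0] = (head[0] & 0x3F) | 0x40    # first two bits are 0b01
    return bytes(head) + token


def is_stateless_reset(datagram: bytes, known_tokens) -> bool:
    """Compare the datagram's tail against every stored token in constant time."""
    if len(datagram) < 21:
        return False
    tail = datagram[-16:]
    matched = False
    for token in known_tokens:
        matched |= hmac.compare_digest(tail, token)
    return matched
```

Because the token depends only on the secret and the CID, a freshly restarted server can recompute it from the DCID of the packet it could not decrypt.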

KEY_UPDATE · 长连接的密钥滚动

KEY_UPDATE · key rotation on long-lived connections

如果一条连接活了几小时(比如 WebSocket 替代品),用同一把 1-RTT 密钥发太多包会增加分析攻击面。RFC 9001 §6 给出了原地滚动密钥的机制:发送方把 short header 的 Key Phase 位(1 bit)翻转,并用派生的下一代密钥加密。接收方看到 Key Phase 变了,跑一次 HKDF 派生新密钥解密。这一切不需要新一轮握手

If a connection lives for hours (e.g. as a WebSocket replacement), using the same 1-RTT key for too many packets opens analysis attack surface. RFC 9001 §6 defines in-place key rotation: the sender flips the short-header's Key Phase bit (1 bit) and encrypts with the next-generation derived key. The receiver notices Key Phase changed, runs an HKDF step to derive the new key, decrypts. All this without a new handshake.
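The "HKDF step" is HKDF-Expand-Label with the label "quic ku" (RFC 9001 §6.1), applied to the current traffic secret. Below is a standard-library sketch of that derivation; it stops at the next-generation secret and does not derive the packet protection key and IV from it, and the function names are illustrative.

```python
import hashlib
import hmac
import struct


def hkdf_expand_label(secret: bytes, label: bytes, length: int,
                      hash_fn=hashlib.sha256) -> bytes:
    """TLS 1.3 HKDF-Expand-Label with an empty context (RFC 8446 section 7.1)."""
    full_label = b"tls13 " + label
    info = (struct.pack("!H", length)
            + bytes([len(full_label)]) + full_label
            + b"\x00")                               # zero-length context
    output, block, counter = b"", b"", 1
    while len(output) < length:
        block = hmac.new(secret, block + info + bytes([counter]), hash_fn).digest()
        output += block
        counter += 1
    return output[:length]


def next_generation_secret(current_secret: bytes, hash_fn=hashlib.sha256) -> bytes:
    """secret_<n+1> = HKDF-Expand-Label(secret_<n>, "quic ku", "", Hash.length)."""
    return hkdf_expand_label(current_secret, b"quic ku", len(current_secret), hash_fn)


def flip_key_phase(key_phase: int) -> int:
    """The short-header Key Phase bit simply toggles on every update."""
    return key_phase ^ 1
```

Both endpoints run the same derivation from the same starting secret, which is why seeing the flipped bit is enough for the receiver to follow along without another handshake.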

"关闭不是事件,是过程。" "Close isn't an event, it's a process." Martin Thomson · QUIC WG · RFC 9000 design note
CHAPTER 20

实现现状 — 谁在跑 HTTP/3

Implementations — who runs HTTP/3 today

2026 年的版图

the 2026 landscape

MAIN-LINE · ZOOM-OUT 1/5 · WHO RUNS IT 跳出"一次 GET" 主线: 前面 14 章我们追的是 Chrome 调 ursb.me 这一次请求。这一章往后退一步——同一段 10 阶段的旅程,在 2026 年实际有谁的实现在跑(Chrome / Firefox / Safari / Caddy / quiche / IIS / GFE)、各占多少市场份额。从这里开始,主线进入zoom-out 模式,4 章看完后再回到结尾。 Stepping out of the "one GET" main-line: the previous 14 chapters traced Chrome calling ursb.me one request. This chapter zooms out — who actually runs the same 10-stage journey in 2026 (Chrome / Firefox / Safari / Caddy / quiche / IIS / GFE), and what market share they hold. The main-line is now in zoom-out mode for 4 chapters, then we end the story.

浏览器

Browsers

Chrome / Edge
QUICHE C++
Firefox
neqo · Rust
Safari
URLSession
Brave / Opera
= Chromium

服务器 / 反向代理

Servers / reverse proxies

Cloudflare edge
quiche
Caddy
quic-go
nginx 1.26+
quictls
LiteSpeed
lsquic
Apache mod_http3
实验性experimental
Google GFE
internal quiche
IIS / Win Server 2025
msquic (kernel-mode)

库与产品 · 星座图

Libraries & products · constellation map

google/quiche C++ Chrome / Edge Envoy gRPC · GFE cf/quiche Rust · C-API CF edge nginx 1.26 curl --http3 quinn Rust · async Hyper · Tonic IPFS Cloudflare WT msquic C · kernel-mode IIS · WinServer .NET HttpClient ngtcp2+nghttp3 C · lean Node.js QUIC curl alt IETF interop quic-go Go Caddy s2n-quic AWS · Rust aioquic Python · research lsquic LiteSpeed · C size of circle ≈ install base · arrows = "powers"
Fig 20·1 · The 2026 QUIC library constellation · circle size ≈ install base · lines mean "powers" · same colour = same ecosystem.

关键库

Key libraries

| Library | Lang | Used by | Strength |
|---|---|---|---|
| Google quiche | C++ | Chrome · gRPC · Envoy | most complete |
| Cloudflare quiche | Rust | CF edge · nginx-quic | fastest C-API |
| msquic | C | Windows Server · .NET | kernel-mode boost |
| quic-go | Go | Caddy · IPFS | Go-ecosystem standard |
| quinn | Rust | Hyper · Tonic · IPFS | async-native |
| ngtcp2 + nghttp3 | C | curl · Node.js | lean & rock-stable |
| aioquic | Python | research · CTF | readable source |
| s2n-quic | Rust | AWS | security-first |
| picoquic | C | academic reference | IETF interop workhorse |
| lsquic | C | LiteSpeed | embeddable |

部署份额

Deployment share

45%
Top 1M 站点已开 H3
of Top 1M sites support H3
~35%
全网 web 请求走 H3
of all web requests use H3
~8%
UDP/443 被中间盒阻断
UDP/443 blocked by middlebox

来源:Web Almanac 2025、Cloudflare Radar、W3Techs。CDN 默认开启(Cloudflare / Fastly / Akamai / AWS CloudFront / Google Cloud LB)是普及主因。

Source: Web Almanac 2025, Cloudflare Radar, W3Techs. CDN default-on (Cloudflare / Fastly / Akamai / AWS CloudFront / Google Cloud LB) drove the bulk of adoption.

CHAPTER 21

性能数据 — 真实生产里赢了多少

Performance — what HTTP/3 actually wins in production

在哪儿赢、赢多少

where it wins, and by how much

MAIN-LINE · ZOOM-OUT 2/5 · BY HOW MUCH 仍在主线之外: 上一章问"谁部署了 H3",这一章问"部署完之后真的赢了吗?赢多少?"。把 ursb.me 那一次 GET 放大到百万次 GET,看 Google / Meta / Cloudflare / Fastly / Akamai 的生产数据。2026 年的共识不是"H3 总赢",而是"在某些场景下显著"——具体哪些,本章列。 Still off the main-line: last chapter asked "who deployed H3"; this one asks "does it actually win, and by how much?". We scale the single ursb.me GET up to millions of GETs and look at Google / Meta / Cloudflare / Fastly / Akamai production numbers. The 2026 consensus is not "H3 always wins" but "significantly under specific conditions" — this chapter pins down which.

大厂的实测

Real production numbers

| Company · scenario | Metric | Improvement | Source |
|---|---|---|---|
| Google · YouTube India (4G) | video rebuffer median | −20% ~ −40% | Chrome blog · 2020-10 |
| Google · Search | tail latency | −16% | SIGCOMM 2017 · Langley |
| Meta · Facebook App | request error rate | −5% | Meta Engineering · 2020 |
| Meta · video stream | video stall rate | −20%+ | Meta Engineering · 2020 |
| Cloudflare · returning users | 0-RTT median TTFB | −50ms | CF blog · 0-RTT resumption |
| Cloudflare · global | poor-link TTFB | −10% ~ −15% | CF Radar · 2024 |
| Fastly · GA launch | cold connect | −40% | Fastly blog · RFC 9000 GA |
| Apple · iCloud Private Relay | network-switch RTT jitter | ~ zero (imperceptible) | WWDC 2022 · session 110337 |

数字来自厂商公开 blog / SIGCOMM 论文。原文如有更新请以最新版本为准;上表数字保留首次公开值。

Numbers cite each vendor's first public disclosure on blog or SIGCOMM. If the post has been updated since, the original disclosure value is kept here.

何时 HTTP/3 不如 HTTP/2

When H3 loses to H2

场景 1 · 内网零丢包
Case 1 · lossless internal net
机房内部微服务
intra-DC microservices

数据中心内部丢包 < 0.01%,TCP HOL 几乎不发生。但 HTTP/3 用户态 UDP 处理带来 2× CPU 成本。结果是纯吞吐 H2 over TCP 完胜。gRPC 至今主流仍是 HTTP/2。

Intra-DC loss < 0.01%, TCP HOL almost never fires. But HTTP/3's user-space UDP carries a 2× CPU tax. On pure throughput, H2 over TCP crushes. gRPC still defaults to HTTP/2.

场景 2 · UDP 被封
Case 2 · UDP blocked
企业网 / 金融网络
enterprise / fin networks

~8% 连接尝试因 UDP/443 被防火墙阻断。浏览器 Happy Eyeballs 会自动 fallback 到 H2 over TCP——但用户先付了"试错"的延迟。

~8% of connection attempts get UDP/443 blocked by firewalls. Browser Happy Eyeballs auto-falls back to H2 over TCP — but the user has already paid the "tried it and failed" latency.

场景 3 · 后端瓶颈
Case 3 · backend-bound
SSR + 重 React
SSR + heavy React

如果你的 LCP 主要花在服务端渲染或 JS 主线程上,省下来的 RTT 在水池里游泳,看不见。Patrick Meenan:H3 提升下限,不抬上限。

If your LCP is dominated by SSR or JS main-thread work, the saved RTTs swim in a pond — invisible. Patrick Meenan: H3 raises the floor, not the ceiling.

场景 4 · H3 大杀器
Case 4 · H3 dominates
移动 + 弱网 + 多请求
mobile + weak net + many requests

手机用户、4G/5G、丢包 1-3%、页面有 50+ 子请求——这是 H3 设计场景。0-RTT、连接迁移、流独立丢包恢复全用上。

Mobile users, 4G/5G, 1-3% loss, page has 50+ subresources — H3's home turf. 0-RTT, migration, per-stream loss recovery all fire.

"如果你不知道你的用户在哪里,
HTTP/3 就是合理的默认选择。"
"If you don't know where your users are,
HTTP/3 is the sensible default."
Lucas Pardue · Cloudflare · IETF 116
桥接 · 这本书的立场 BRIDGE · this article's stance 上面两句话——Meenan 的"抬下限不抬上限"和 Pardue 的"不知道用户在哪就用 H3"——并不矛盾:抬下限本身就是合理默认值的依据。规则是:① 你能确定用户在低 RTT + 低丢包的环境(企业内网、有线宽带桌面):H2 完全够,H3 的好处接近 0,反而吃 CPU(下一章 Ch23 详述)。② 你的用户在移动、跨大陆、高峰拥塞场景中混合分布:H3 是 sensible default。③ 下一章 Ch23 列了 H3 的代价(CPU / 调试 / firewall 不友好)——读完两章再决定。本文不写"H3 永远赢",写"什么场景赢、什么场景不赢、代价是什么"。 The two quotes above — Meenan's "raises the floor, not the ceiling" and Pardue's "if you don't know where your users are, use H3" — do not contradict each other: raising the floor is the rationale for a sensible default. The rule: ① If you know your users sit on low-RTT, low-loss links (enterprise LAN, wired broadband desktop): H2 is enough; H3's gains approach zero and the CPU bill rises (Ch23 details). ② If your users are mixed across mobile, cross-continent, and congested-peak conditions: H3 is the sensible default. ③ Ch23 lists H3's costs (CPU / debug / firewall hostility) — read both chapters before deciding. This article does not claim "H3 always wins" — it asks "where it wins, where it doesn't, and what the bill is".
CHAPTER 22

工程实战 — 部署与调试

Field work — deploying and debugging

DNS · curl · Wireshark · qlog · sysctl

DNS · curl · Wireshark · qlog · sysctl

MAIN-LINE · ZOOM-OUT 3/5 · YOUR HANDS ON IT 主线视角下移: 不再看 Chrome 默认完成的那次 GET,而是你自己怎么调出来看每一个阶段——curl --http3 触发 Alt-Svc 协商、Wireshark 看 Initial 包、qlog 记录每个 PN 与 ACK、内核 sysctl 调整 UDP receive buffer。每一项都对应主线某个阶段的调试入口 Main-line, hands-on view: not Chrome's default-completed GET, but your own way to surface every stage — curl --http3 triggers Alt-Svc negotiation, Wireshark catches Initial packets, qlog records every PN and ACK, kernel sysctl tunes the UDP receive buffer. Each is a debug entry-point for one phase of the main-line.

协议发现 · DNS 路径

Discovery · the DNS path

老路 · Alt-Svc 头
Old way · Alt-Svc
第一次访问必走 TCP
first visit always TCP

服务器在响应头里加一行:alt-svc: h3=":443"; ma=86400。客户端记 24 小时,第二次访问才走 H3。意味着新用户的首屏永远拿不到 0-RTT

Server appends a response header: alt-svc: h3=":443"; ma=86400. Client caches it 24h, uses H3 from the next visit. Meaning new users never get 0-RTT on the first paint.

新路 · HTTPS RR (RFC 9460)
New way · HTTPS RR (RFC 9460)
DNS 里直接告诉你
DNS knows up-front

在 DNS 区文件加一行:
ursb.me. 300 IN HTTPS 1 . alpn="h3,h2" ipv4hint="39.105.102.252"
浏览器解析 DNS 就拿到了——第一次访问直接走 H3。配合 RFC 8484 DoHRFC 9250 DoQ,连 DNS 查询本身都加密。

Add one line to the DNS zone:
ursb.me. 300 IN HTTPS 1 . alpn="h3,h2" ipv4hint="39.105.102.252"
The browser gets it at DNS resolution time — first visit goes straight to H3. Combined with RFC 8484 DoH or RFC 9250 DoQ, the DNS query itself is encrypted.
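To make the two discovery paths concrete, the sketch below parses the exact Alt-Svc value and HTTPS record shown above. Both parsers are deliberately naive assumptions: they handle these simple forms only (no commas or semicolons inside quoted strings, presentation format rather than DNS wire format), and `parse_alt_svc` and `parse_https_rr` are hypothetical helpers.

```python
import shlex


def parse_alt_svc(value: str) -> list:
    """Parse a simple Alt-Svc header value into a list of dicts."""
    entries = []
    for entry in value.split(","):
        parts = [p.strip() for p in entry.split(";") if p.strip()]
        if not parts:
            continue
        alpn, _, authority = parts[0].partition("=")
        item = {"alpn": alpn.strip(),
                "authority": authority.strip().strip('"'),
                "ma": 86400}                 # RFC 7838 default: 24 hours
        for param in parts[1:]:
            key, _, val = param.partition("=")
            if key.strip() == "ma":
                item["ma"] = int(val)
        entries.append(item)
    return entries


def parse_https_rr(line: str) -> dict:
    """Parse one HTTPS record in zone-file presentation format."""
    tokens = shlex.split(line)               # shlex strips the double quotes
    name, ttl, _rr_class, rr_type, priority, target = tokens[:6]
    if rr_type != "HTTPS":
        raise ValueError("not an HTTPS record")
    params = {}
    for token in tokens[6:]:
        key, _, val = token.partition("=")
        params[key] = val.split(",")
    return {"name": name, "ttl": int(ttl), "priority": int(priority),
            "target": target, "params": params}


alt = parse_alt_svc('h3=":443"; ma=86400')
rr = parse_https_rr('ursb.me. 300 IN HTTPS 1 . alpn="h3,h2" ipv4hint="39.105.102.252"')
```

The difference in timing is visible in the inputs: `alt` can only exist after an HTTP response has arrived, while `rr` is available before the first packet is sent to the server.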

DoQ · RFC 9250 DNS over QUIC(DoQ)是 QUIC 的第二大应用——不是 HTTP/3,是直接把 DNS 查询塞进 QUIC 流。AdGuard、NextDNS、Cloudflare 1.1.1.1 都支持。相比 DoT(DNS over TLS)省 1 RTT,相比 DoH 省 HTTP/3 那一层开销。ALPN 编号是 doq,默认端口 853。 DNS over QUIC (DoQ) is QUIC's second-biggest application — not HTTP/3, just plain DNS queries stuffed into a QUIC stream. AdGuard, NextDNS, Cloudflare 1.1.1.1 all support it. vs DoT it saves 1 RTT; vs DoH it skips the HTTP/3 overhead. ALPN doq, default port 853.

客户端工具

Client tools

verify a server speaks HTTP/3 · field commands
# curl with H3 (needs a curl built --with-quiche or --with-ngtcp2)
$ curl --http3 -I https://ursb.me/
HTTP/3 200
alt-svc: h3=":443"; ma=86400

# quiche-client (Cloudflare reference client)
$ quiche-client --no-verify https://ursb.me/

# Test a server with custom ALPN
$ nghttp3 https://ursb.me/

# Capture and decrypt in Wireshark
$ SSLKEYLOGFILE=~/keys.log /Applications/Google\ Chrome.app/...
$ tcpdump -i en0 -w cap.pcap udp port 443
# Wireshark → Preferences → Protocols → TLS → (Pre)-Master-Secret log filename → ~/keys.log

qlog + qvis · QUIC 的标准日志

qlog + qvis · the standard QUIC log

因为 QUIC 加密了一切,光抓包看不出连接内部发生了什么。IETF 用 qlogdraft-ietf-quic-qlog-main-schema,2024 已多版)定义了一份结构化 JSON 日志格式——服务端/客户端用任何 QUIC 库都可以输出 qlog,把它扔到 qvis.quictools.info 就能看到拥塞窗口曲线、PN 单调性、ACK 时序、loss event、stream 优先级。这是 H3 调试的唯一正解。

Because QUIC encrypts everything, raw pcap shows nothing about what's happening inside. IETF defined qlog (draft-ietf-quic-qlog-main-schema, several revisions by 2024) — a structured JSON log format any QUIC library can emit. Drop it into qvis.quictools.info and you get the congestion-window curve, PN monotonicity, ACK timeline, loss events, stream priorities. The only sane debug path for H3.
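When a full qvis session is overkill, a few lines of scripting can already answer "which packets were lost and what did cwnd do". The sketch below runs on a tiny inline sample whose shape (a list of events with `time`, `name` and `data`) and event names are simplified assumptions modelled on the qlog drafts; real files wrap events in traces, and field names vary between draft revisions and libraries.

```python
import json

# Hypothetical, simplified events; real qlog files carry more structure.
SAMPLE_QLOG_EVENTS = """
[
  {"time": 0.0,    "name": "recovery:metrics_updated", "data": {"congestion_window": 12000}},
  {"time": 812.4,  "name": "recovery:packet_lost",     "data": {"header": {"packet_number": 421}}},
  {"time": 813.0,  "name": "recovery:metrics_updated", "data": {"congestion_window": 6000}},
  {"time": 2104.9, "name": "recovery:packet_lost",     "data": {"header": {"packet_number": 1043}}}
]
"""


def summarise(events: list) -> dict:
    """Collect lost packet numbers and the (time, cwnd) series from the events."""
    lost = [e["data"]["header"]["packet_number"]
            for e in events if e["name"] == "recovery:packet_lost"]
    cwnd = [(e["time"], e["data"]["congestion_window"])
            for e in events
            if e["name"] == "recovery:metrics_updated"
            and "congestion_window" in e["data"]]
    return {"lost_packet_numbers": lost, "cwnd_series": cwnd}


summary = summarise(json.loads(SAMPLE_QLOG_EVENTS))
```

The same two lists are what the congestion panel plots: loss markers on the time axis and the cwnd curve stepping down after each one.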

QVIS · CONGESTION SAMPLE · loaded from upload.qlog conn 7b0f23e4… · 4.2s · 1842 events bytes 120k 90k 60k 30k 0 0s 1s 2s 3s 4s time since first packet (s) cwnd (bytes) bytes_in_flight ▼ loss · PN 421 ▼ loss · PN 1043 smoothed RTT stream 0 (HTML) stream 4 (CSS) stream 8 (JS) streams 12-40 (images)
Fig 22·1 · A qvis congestion panel (mock) · cwnd solid blue · bytes_in_flight dashed · sRTT purple · red ▼ marks loss events.

服务器端调优

Server-side tuning

| Knob | Default | Recommended | Why |
|---|---|---|---|
| net.core.rmem_max | 208 KB | ≥ 2.5 MB | single-socket buffer to absorb bursts |
| net.core.wmem_max | 208 KB | ≥ 2.5 MB | same · send side |
| GSO/GRO | off | on | NIC segmentation = halve CPU |
| SO_REUSEPORT | | on · per-core | eBPF-route CID → CPU |
| io_uring | | experimental | async I/O · fewer syscalls |
| QPACK dynamic table | 4 KB | 4-16 KB | larger = better compression, more HOL risk |
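A quick way to see whether the rmem_max ceiling is biting is to ask for a large receive buffer on a UDP socket and read back what the kernel granted. The sketch below does that with the standard socket module; `granted_rcvbuf` and `looks_capped` are illustrative names, the behaviour is OS-specific (Linux reports double the stored value, some systems raise an error instead of clamping), and the 2.5 MB figure is simply the recommendation from the table.

```python
import socket


def granted_rcvbuf(requested: int) -> int:
    """Request SO_RCVBUF on a throwaway UDP socket and return what was granted."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        except OSError:
            return 0      # some systems reject oversized requests outright
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


def looks_capped(requested: int) -> bool:
    """True when the kernel granted less than we asked for."""
    return granted_rcvbuf(requested) < requested


if __name__ == "__main__":
    wanted = 2_500_000    # 2.5 MB, the table's recommended floor
    print("granted:", granted_rcvbuf(wanted), "capped:", looks_capped(wanted))
```

If the script reports a cap, raising net.core.rmem_max (and wmem_max for the send side) is the knob from the first two rows of the table.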
DEVOPS · The compulsory checklist for first-time HTTP/3 on nginx 1.26: (1) build against a TLS library with QUIC support, such as the quictls fork (BoringSSL and LibreSSL also work); with stock OpenSSL nginx falls back to a compatibility layer that lacks 0-RTT; (2) configure listen 443 quic reuseport; since without reuseport one CPU core pegs at 100%; (3) keep listen 443 ssl; in the same config for TCP fallback; (4) add add_header alt-svc 'h3=":443"; ma=86400'; because without it the browser never learns that H3 is available. I once forgot this and the browser never upgraded.
本章引用Chapter references
draft
qlog · QUIC debug event stream
tool
qvis · qlog visualisation
CHAPTER 23

批评与争议 — HTTP/3 的负面成本

Critique — HTTP/3's downside ledger

没有免费的午餐

no free lunches

MAIN-LINE · ZOOM-OUT 4/5 · THE BILL 主线之外,代价那一面: 我们刚追过的 10 阶段每一步都有代价——服务器多烧 1.5-2× CPU(Fastly 实测)、QUIC 没有内核加速、debug 难、防火墙不友好、ossification 风险。本章把"为什么有公司还坚持 H2 不上 H3"这件事拆开。它是对前 21 章的反向校验 Off-main-line, the bill side: every one of the 10 stages we just traced has a cost — servers burn 1.5–2× more CPU (Fastly's measurement), QUIC lacks kernel offload, debugging is harder, firewalls are unfriendly, ossification looms. This chapter unpacks "why some companies stick to H2". A reality check on the first 21 chapters.

争议 1 · 用户态 CPU 成本

Critique 1 · user-space CPU tax

Fastly 在 2020 年公开的实测:在相同吞吐下,HTTP/3 的 CPU 消耗是 HTTP/2 over TLS 的 1.5x ~ 2x。原因:每个 UDP 包都要进出用户态、做独立 AEAD 加解密、维护用户态拥塞控制状态。这是 CDN 厂商真正头疼的事——同样的服务器,H3 流量上限只有 H2 的一半。

Fastly's 2020 disclosure: at equal throughput, HTTP/3 burns 1.5x – 2x the CPU of HTTP/2 over TLS. Reason: every UDP packet crosses user/kernel boundary, does its own AEAD encrypt/decrypt, and maintains user-space cc state. The real CDN pain — the same box can carry half the H3 traffic of H2.

解药正在路上

Cures in progress

io_uring + Generic Segmentation Offload
Linux 5.x+ 的 io_uring 把"批量发包"变成可能。配合 GSO,让网卡分片 UDP——CPU 成本立刻降一半。
Linux 5.x+ io_uring enables "batch send". Combined with GSO, the NIC segments UDP — halving CPU cost overnight.
In-kernel QUIC (Linux)
A 2024 proposal discussed on LWN: push QUIC encryption and congestion control back into the kernel behind a new socket protocol (the posted patches use IPPROTO_QUIC rather than a new AF_QUIC address family). Still under discussion, far from merged.
msquic kernel-mode (Windows)
Microsoft 已经把 msquic 跑在 Windows 内核态,IIS 直接享受。比纯用户态实现快 30-40%。
Microsoft already runs msquic in Windows kernel mode; IIS reaps the benefit directly. 30-40% faster than pure user-space.
DPDK / XDP offload
Cloudflare 在边缘节点用 XDP 在内核网络栈早段过滤 UDP/443,绕开 socket 处理,CPU 又省 20%。
Cloudflare's edge nodes use XDP to filter UDP/443 in the kernel networking fastpath, bypassing socket processing — another 20% off.

争议 2 · 可观测性"瞎了"

Critique 2 · observability "goes blind"

过去运营商靠 TCP 序列号、SACK、SNI 明文做带宽统计、QoS 调度、DPI 拦截。QUIC 把这些全加密了。运营商失去了路径上的"抓手"——这是有意的,但也是一些行业(金融监管、合规审计、家长控制)真正头疼的事。Spin Bit 是部分妥协,但远远不够。

Carriers used to measure bandwidth, do QoS, run DPI based on TCP seq/SACK/cleartext SNI. QUIC encrypted all of that. Operators lost their "handles" on the path — this was intentional, but it's a real pain for industries like financial regulation, compliance auditing, parental control. Spin Bit is a partial compromise; nowhere close to enough.

争议 3 · HTTP/2 其实够用?

Critique 3 · was HTTP/2 enough?

Patrick Meenan, Steve Souders and other web-performance veterans keep pointing out: if your bottleneck is JS execution, SSR wait time, or third-party scripts, HTTP/3 helps you very little. That is true. HTTP/3 lifts the floor of the distribution, that is, the P95/P99 experience of users on weak links. If your product has no weak-link users at P95 (for example, you only serve US/EU cities on fiber), the ROI of moving to H3 is near zero.

FAIRNESS 公平地说,对开发者来说 H3 的实际损益取决于场景:服务全球移动用户 ⇒ 显著正收益;服务局域网/桌面办公 ⇒ 几乎中性;内部微服务 ⇒ 负收益。"所有人都要上 H3"不是技术决策——是 CDN 厂商希望你这么做。 Honestly: H3's payoff depends on the workload. Serving global mobile users ⇒ clear net win. Serving LAN / desktop office ⇒ roughly neutral. Internal microservices ⇒ net loss. "Everyone should be on H3" is not a technical conclusion — it's what CDNs want you to believe.
CHAPTER 24

HTTP/3 之后 — QUIC 上长出来的下一代

After HTTP/3 — what's growing on top of QUIC

WebTransport · MASQUE · MoQ · HTTP/4?

WebTransport · MASQUE · MoQ · HTTP/4?

MAIN-LINE · ZOOM-OUT 5/5 · WHAT GROWS NEXT 主线视角最终一跳: ursb.me 那次 GET 走完了。但 QUIC 这条底座不只是为 HTTP/3 准备的——它已经长出 WebTransport(替代 WebSocket)、MASQUE(下一代 VPN)、MoQ(实时媒体)、DoQ(DNS-over-QUIC)。下次你按回车,字节走的可能不再是"HTTP over QUIC",而是"另一种 over QUIC"。这是主线讲完后才能讲的一章。 Main-line, the last hop: the ursb.me GET is done. But QUIC as a substrate wasn't only built for HTTP/3 — it has grown WebTransport (WebSocket successor), MASQUE (next-gen VPN), MoQ (real-time media), DoQ (DNS-over-QUIC). Next time you press Enter, the bytes may no longer travel as "HTTP over QUIC" but as "something else over QUIC". The closing chapter only makes sense after the main story has ended.

HTTP/3 不是终点——它是 QUIC 这个"通用安全传输"找到的第一个杀手应用。QUIC 上正在长出一片新协议生态。下面是 2026 年的四个方向。

HTTP/3 isn't the finish line — it's the first killer app of QUIC as a "generic secure transport". A whole protocol ecosystem is growing on top. Here are 2026's four directions.

① WebTransport · 取代 WebSocket

① WebTransport · the WebSocket successor

WebSocket 跑在 HTTP/1.1 Upgrade 上,有 TCP head-of-line,无可靠/不可靠混合、不适合 RTC。WebTransport over HTTP/3W3C WebTransport API + draft-ietf-webtrans-http3)给浏览器开放:(a) 可靠双向流;(b) 不可靠 datagram(RFC 9221)。Chrome 自 97 起原生支持,ALPN 复用 h3。云游戏 / 在线协作 / 实时翻译已经开始迁。

WebSocket runs on HTTP/1.1 Upgrade, inherits TCP HOL, lacks a mixed reliable/unreliable channel, and is awful for RTC. WebTransport over HTTP/3 (W3C WebTransport API + draft-ietf-webtrans-http3) exposes to browsers: (a) reliable bidi streams; (b) unreliable datagrams (RFC 9221). Chrome shipped support in 97; ALPN reuses h3. Cloud gaming, collaboration, live translation are migrating.

② MASQUE · 下一代 VPN 隧道

② MASQUE · the next-gen VPN tunnel

CONNECT-UDP
用 H3 转发 UDP
tunnel UDP via H3
CONNECT-IP
用 H3 转发整个 IP 包
tunnel IP via H3
Capsule
通用 H3 容器协议
generic H3 container
CAPSULE · RFC 9297 RFC 9297 · HTTP Datagrams and the Capsule Protocol 是 CONNECT-UDP / CONNECT-IP 的容器规范——定义了如何在 HTTP/3 流里携带"类似数据报"的载荷。是整个 MASQUE 栈的第三块拼图,常被简介材料省略。 RFC 9297 · HTTP Datagrams and the Capsule Protocol is the container spec behind CONNECT-UDP / CONNECT-IP — it defines how "datagram-like" payloads ride inside HTTP/3 streams. The third piece of the MASQUE puzzle, frequently omitted from MASQUE summaries.
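To make CONNECT-UDP concrete, the sketch below builds the extended-CONNECT header list that RFC 9298 describes for an HTTP/3 proxy. It assumes the proxy uses the default URI template `/.well-known/masque/udp/{target_host}/{target_port}/`; the function name `connect_udp_headers` and the proxy hostname in the usage note are hypothetical, and a real client would take the template from its proxy configuration.

```rust
/// Builds the request header fields for a CONNECT-UDP tunnel (RFC 9298).
/// Assumption: the proxy uses the default URI template. This helper is
/// illustrative only and not part of any library.
fn connect_udp_headers(
    proxy_authority: &str,
    target_host: &str,
    target_port: u16,
) -> Vec<(String, String)> {
    // IPv6 literals contain ':' which has to be percent-encoded in the path.
    let host = target_host.replace(':', "%3A");
    let path = format!("/.well-known/masque/udp/{}/{}/", host, target_port);
    vec![
        (":method".into(), "CONNECT".into()),
        (":protocol".into(), "connect-udp".into()), // extended CONNECT
        (":scheme".into(), "https".into()),
        (":authority".into(), proxy_authority.into()),
        (":path".into(), path),
        ("capsule-protocol".into(), "?1".into()), // opt in to RFC 9297 capsules
    ]
}
```

Calling `connect_udp_headers("proxy.example", "2001:db8::1", 443)` yields a `:path` of `/.well-known/masque/udp/2001%3Adb8%3A%3A1/443/`. Once the proxy answers with a 2xx status, UDP payloads travel as HTTP Datagrams associated with that request stream.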
CASE · ICLOUD PRIVATE RELAY
两跳隔离:MASQUE 的杀手案例
Two-hop relay: the MASQUE killer demo

Apple iCloud Private Relay(iOS 15+ 的 iCloud+ 功能)是目前最大量产的 MASQUE 实战。它的核心架构不是"用 H3 加密一下",而是故意把信任切两半

iCloud Private Relay (iOS 15+ as part of iCloud+) is by far the largest production MASQUE deployment. Its core trick isn't "tunnel things in H3" — it's deliberately splitting trust into two halves:

[Figure 24·1 diagram: MASQUE · CONNECT-UDP two-hop relay (iCloud Private Relay). Path: CLIENT (iPhone, Safari) → INGRESS (Apple, mask.icloud.com) → EGRESS (CDN partner, CF / Akamai) → ORIGIN (ursb.me). Client → ingress: H3 · CONNECT-UDP, encrypted to ingress. Ingress → egress: capsule · blinded ID, ingress strips client IP. Egress → origin: TLS 1.3 / H3, egress IP only. Ingress knows the client IP but not the target; egress knows the target but not the client IP. Trust split: neither hop sees both {client_ip, target}; only collusion between the two breaks privacy. Apple operates the ingress; the egress is run by an independent CDN partner. References: RFC 9298 · CONNECT-UDP; RFC 9297 · Capsule; WWDC 2022 · session 10009.]
FIG 24·1 两跳架构 · Apple 看得到客户端 IP 但看不到目标域名 · CDN 伙伴看得到目标但拿不到客户端 IP · 单方泄露都不够 Fig 24·1 · Two-hop split · Apple sees client IP but not the destination; CDN partner sees the destination but not the client IP · neither side alone can deanonymise.

三件关键事实:

  • 客户端→入口 用 CONNECT-UDP(RFC 9298)建 H3 隧道;隧道内载荷再用 capsule(RFC 9297)打包传给出口。
  • 入口(Apple 自营)和出口(CDN 合作伙伴,目前为 Cloudflare / Akamai / Fastly)由不同主体运营——除非两家串通,没人能同时知道" 访问 哪里"。
  • 每个连接的客户端 IP 在入口处被替换成同一地理区域内的盲化 IP——服务器看到的 IP 仍能做粗粒度地理路由(CDN POP 选择、本地化),但精度只到城市级。

Three things to know:

  • Client → ingress uses CONNECT-UDP (RFC 9298) to set up the H3 tunnel; payloads inside are wrapped in capsules (RFC 9297) and forwarded to the egress.
  • Ingress (Apple-operated) and egress (CDN partners — currently Cloudflare, Akamai, Fastly) are run by different entities. Absent collusion, no single side knows "who visited where".
  • The client IP is rewritten at the ingress to a region-blinded IP — origins can still do coarse geo-routing (POP selection, localisation), but only at city granularity.

这是 MASQUE 至今最大、唯一商用规模的部署。它没有用 CONNECT-IP(更激进的整 IP 包封装),只用 CONNECT-UDP——Apple 不需要 VPN 全包代理的语义,只需要让 Web 流量"看起来都是同一个 IP 发出的"。剩下的 CONNECT-IP 用例(VPN 替代)还在等下一波。

This is the largest — and so far only commercial-scale — MASQUE deployment. Notably it uses only CONNECT-UDP, not CONNECT-IP (the more aggressive whole-IP-packet tunnel). Apple doesn't need full VPN semantics; it just needs web traffic to "look like it comes from one IP". The CONNECT-IP use case (full VPN replacement) is still waiting for the next wave.

③ Media over QUIC(MoQ)

③ Media over QUIC (MoQ)

HLS / DASH 延迟 5-30 秒;WebRTC 延迟 100ms 但太重、不好缓存。Media over QUIC(IETF MoQ WG 推进中)目标是亚秒级延迟 + CDN 可缓存,发布/订阅模式。预期取代体育直播、低延迟视频、合作直播的传输层。Cloudflare 已经把它内置进 Workers。

HLS / DASH have 5-30s latency; WebRTC is 100ms but heavy and uncacheable. Media over QUIC (IETF MoQ WG in progress) targets sub-second latency + CDN-cacheable, with a pub/sub model. Slated to replace transport for sports streaming, low-latency video, collaborative live. Cloudflare already ships it inside Workers.

④ HTTP/4?

④ HTTP/4?

IETF 当前的共识:未来 5-10 年内不会有 HTTP/4。理由很务实:HTTP/3 + QUIC 的扩展机制(Datagram、KEY_UPDATE、ALPN、可插拔 cc、TLS extension)已经足够柔软。要加东西(抗量子加密、FEC 前向纠错、新拥塞算法),都可以作为扩展挂在 QUIC 上,不需要新的主版本号。所以 H3 大概率会像 IPv4 那样长寿。

Current IETF consensus: no HTTP/4 in the next 5-10 years. Practical reason: HTTP/3 + QUIC's extension mechanisms (Datagram, KEY_UPDATE, ALPN, pluggable cc, TLS extensions) are flexible enough. Adding things — post-quantum crypto, FEC, new cc algorithms — all fit as QUIC extensions; no new major version needed. So H3 will likely live like IPv4 — for decades.

"我们花三十年做了一个能装下未来三十年的传输层。" "Thirty years of work for a transport layer that can hold the next thirty." Lars Eggert · IETF QUIC WG · 2022
CHEATSHEET

如果你只记 10 件事

If you only remember 10 things

这一节就是给你撕下来贴墙的

the page you'd print, pin, and screenshot

读完 24 章是一回事,下一次给同事讲清楚是另一回事。下面 10 条是这篇文章里最反直觉、最值得带走的事实——每一条都标了对应章节和最关键的 RFC 锚点。

Reading 24 chapters is one thing; explaining it cleanly to a colleague is another. The ten facts below are the most counter-intuitive takeaways — each pinned to its chapter and the single most important RFC anchor.

  1. 01
    0-RTT ≠ 免费0-RTT isn't free
    Safe, idempotent requests only (GET/HEAD). A POST/PUT sent in 0-RTT can be replayed. An intermediary that forwards early data marks the request with Early-Data: 1; a server unwilling to take the risk answers 425 Too Early, and the client retries once the handshake has completed. Ch 08 · RFC 8470
  2. 02
    Initial 必须 padding 到 ≥ 1200 字节Initial packets must pad to ≥ 1200 bytes
    反放大攻击的 3× 预算从这里来——客户端"先付够字节",服务器才能合法回大包。Ch 06 · RFC 9000 §14.1
    The 3× anti-amplification budget starts here — the client has to "pay enough bytes first" before the server may return a large response. Ch 06 · RFC 9000 §14.1
  3. 03
    TLS 1.3 在 QUIC 里被腰斩TLS 1.3 inside QUIC is stripped down
    Two roles only: (1) key-agreement engine, (2) identity authentication. The TLS record layer is gone; QUIC protects packets itself with keys exported from the handshake. That is why a TLS library needs a dedicated QUIC API: stock OpenSSL lacked one for years, so QUIC stacks were built on BoringSSL or the quictls fork instead. Ch 07 · RFC 9001
  4. 04
    4 加密级 / 3 PN 空间4 encryption levels / 3 PN spaces
    Initial / 0-RTT / Handshake / 1-RTT 四套密钥;PN 空间 Initial、Handshake、Application 互相独立,各自严格单调。0-RTT 复用 Application PN 空间。Ch 09 · RFC 9001 §4
    Four key sets: Initial / 0-RTT / Handshake / 1-RTT. PN spaces (Initial, Handshake, Application) are independent and each strictly monotonic; 0-RTT shares the Application space. Ch 09 · RFC 9001 §4
  5. 05
    28 种 QUIC 帧分四族28 QUIC frames in four families
    控制(8) · 可靠性(2) · 流与流控(12) · 加密+扩展(2)。STREAM 一种类型有 8 个变体——OFF/LEN/FIN 三 bit 编进 type 字节里。Ch 10 · RFC 9000 §19
    Control (8) · reliability (2) · streams & flow ctrl (12) · crypto + ext (2). The single STREAM type carries 8 variants — OFF/LEN/FIN as three bits inside the type byte. Ch 10 · RFC 9000 §19
  6. 06
    QPACK 用双向同步流杀 HOLQPACK kills HOL with two sync streams
    An encoder stream (unidirectional stream type 0x02) plus a decoder stream (type 0x03); each encoded field section carries a Required Insert Count. Missing inserts block only that request stream; the others run on. The dynamic table defaults to 0 bytes (SETTINGS_QPACK_MAX_TABLE_CAPACITY), versus HPACK's 4,096-byte default, so an endpoint that never opts in can never be blocked. Ch 15 · RFC 9204
  7. 07
    迁移靠 CID,不靠 IPMigration uses CID, not IP
    Wi-Fi → 5G 时 IP/port 全换,连接照样活——服务器看的是 Destination CID。PATH_CHALLENGE / PATH_RESPONSE 做新路径验证,防伪迁。Ch 17 · RFC 9000 §9
    When Wi-Fi → 5G swaps the IP/port pair, the connection survives because the server keys on Destination CID. PATH_CHALLENGE / PATH_RESPONSE validate the new path against spoofing. Ch 17 · RFC 9000 §9
  8. 08
    Alt-Svc 永远拿不到首屏 0-RTTAlt-Svc can't deliver first-hit 0-RTT
    客户端必须先走一次 TCP 才能拿到 H3 提示。HTTPS RR(RFC 9460)在 DNS 解析时就告诉浏览器走 H3——首次访问直接 H3。要做就做 HTTPS RR。Ch 22 · RFC 9460
    Client has to make one TCP visit first before learning H3 is on. HTTPS RR (RFC 9460) carries the hint in the DNS response itself — first visit goes straight to H3. If you're serious, ship HTTPS RR. Ch 22 · RFC 9460
  9. 09
    H3 比 H2 多 1.5–2× CPU 成本H3 burns 1.5–2× the CPU of H2
    Fastly 2020 实测。原因是用户态 UDP + 每包独立 AEAD + 用户态拥塞控制。GSO + SO_REUSEPORT + (Linux) io_uring + XDP 能砍掉一半。Ch 23
    Fastly 2020 measurement. Caused by user-space UDP, per-packet AEAD, and user-space congestion control. GSO + SO_REUSEPORT + Linux io_uring + XDP can halve it. Ch 23
  10. 10
    H3 抬下限,不抬上限H3 lifts the floor, not the ceiling
    P95 / P99 弱网用户受益巨大;光纤桌面用户感觉不出。如果你的用户没有 P95 弱网那一段,迁移 ROI 接近零。这是 Patrick Meenan 的公论。Ch 21
    P95 / P99 weak-link users gain hugely; fiber-grade desktop users feel nothing. If your audience has no P95 weak-link segment, the migration ROI is near zero (Patrick Meenan's call). Ch 21
DEBUG · the bonus one qlog + qvis is the primary debug path. QUIC encrypts almost everything on the wire, and a raw pcap without key material shows nothing about loss, ACK timing or stream priorities. Even a decrypted capture cannot show the congestion window, because cwnd never leaves the endpoint. Any serious H3 deployment should emit qlog and load it into qvis; without qlog you are flying blind. Ch 22
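The STREAM detail in item 05 is easy to check in code. The sketch below assumes the RFC 9000 §19.8 layout, where STREAM frames occupy types 0x08 to 0x0f and the three low bits are OFF (0x04), LEN (0x02) and FIN (0x01). The constant and function names are illustrative, not taken from any library.

```rust
const STREAM_BASE: u8 = 0x08;
const OFF_BIT: u8 = 0x04; // an Offset field is present
const LEN_BIT: u8 = 0x02; // a Length field is present
const FIN_BIT: u8 = 0x01; // this frame ends the stream

/// Computes the frame-type byte for one of the eight STREAM variants.
fn stream_frame_type(has_offset: bool, has_length: bool, fin: bool) -> u8 {
    let mut ty = STREAM_BASE;
    if has_offset {
        ty |= OFF_BIT;
    }
    if has_length {
        ty |= LEN_BIT;
    }
    if fin {
        ty |= FIN_BIT;
    }
    ty
}

/// True if `ty` is any of the eight STREAM frame types.
fn is_stream_frame(ty: u8) -> bool {
    (0x08..=0x0f).contains(&ty)
}
```

A request with no body that fits in one frame at offset 0 would typically go out as `stream_frame_type(false, true, true)`, that is 0x0b.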
APPENDIX · 01 · HANDS-ON

从零写一个最小 QUIC 客户端 — 200 行 Rust + quiche

From scratch — a minimal QUIC client in 200 lines of Rust + quiche

读完能写,不只是读懂

readable code you could rewrite from memory

Goal
GET https://cloudflare-quic.com
Library
cloudflare/quiche
Lines
~ 200 Rust
Verifiable
SSLKEYLOGFILE + Wireshark

前 24 章读 wire 格式,这一章反过来:用 Cloudflare 的 quiche 库一步步实现一个能跑的 QUIC 客户端,发一个 GET 请求。完整代码 ~ 200 行 Rust,每段配 RFC 9000 / 9001 / 9114 章节引用。读完这一章你应该能从空 main.rs 起步,凭记忆复刻整个客户端。

The previous 24 chapters read the wire format. This one inverts: build a working QUIC client step by step using Cloudflare's quiche library, sending one GET request. The complete code is ~ 200 lines of Rust, each section cross-referenced to RFC 9000 / 9001 / 9114. By the end you should be able to start with an empty main.rs and reproduce the whole client from memory.

A · 项目骨架

A · Project skeleton

# Cargo.toml
[package]
name = "minquic"
version = "0.1.0"
edition = "2021"

[dependencies]
quiche = "0.21"   # Cloudflare's QUIC implementation, BoringSSL inside
mio    = "0.8"    # non-blocking UDP socket
ring   = "0.17"   # random bytes for the source connection ID
url    = "2.5"
log    = "0.4"
env_logger = "0.10"

Three core dependencies: quiche for QUIC + TLS, mio for the non-blocking socket event loop, ring for cryptographic odd jobs (here, random connection-ID bytes). url, log and env_logger are small helpers for URL parsing and logging. The whole thing compiles to under 1 MB.

B · 创建 socket 与配置

B · Socket and config

use mio::net::UdpSocket;
use std::net::{SocketAddr, ToSocketAddrs};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    env_logger::init();

    // ---- (1) Parse target URL and resolve it ------------------
    let url = url::Url::parse("https://cloudflare-quic.com/")?;
    let host = url.host_str().ok_or("URL has no host")?;
    let port = url.port_or_known_default().ok_or("URL has no port")?;
    let peer_addr: SocketAddr = (host, port)
        .to_socket_addrs()?
        .next()
        .ok_or("DNS returned no addresses")?;

    // ---- (2) Bind a UDP socket (local) -----------------------
    // Bind in the peer's address family; port 0 lets the OS choose.
    let bind_addr: SocketAddr = match peer_addr {
        SocketAddr::V4(_) => "0.0.0.0:0",
        SocketAddr::V6(_) => "[::]:0",
    }.parse()?;
    let mut socket = UdpSocket::bind(bind_addr)?;
    let local_addr = socket.local_addr()?;             // the address actually bound
    let mut poll = mio::Poll::new()?;
    let mut events = mio::Events::with_capacity(1024);
    poll.registry().register(&mut socket, mio::Token(0),
                              mio::Interest::READABLE)?;

    // ---- (3) Build the quiche Config -------------------------
    let mut config = quiche::Config::new(quiche::PROTOCOL_VERSION)?;  // = 1 (RFC 9000)
    config.set_application_protos(quiche::h3::APPLICATION_PROTOCOL)?; // ALPN: h3
    config.set_max_idle_timeout(5_000);                // ms
    config.set_max_recv_udp_payload_size(1350);         // stay under common path MTUs
    config.set_max_send_udp_payload_size(1350);
    config.set_initial_max_data(10_000_000);            // flow control · conn-level
    config.set_initial_max_stream_data_bidi_local(1_000_000);   // per-stream
    config.set_initial_max_stream_data_bidi_remote(1_000_000);
    config.set_initial_max_stream_data_uni(1_000_000);  // server's control + QPACK streams
    config.set_initial_max_streams_bidi(100);
    config.set_initial_max_streams_uni(100);
    config.verify_peer(true);                          // TLS cert validation
    config.load_verify_locations_from_directory("/etc/ssl/certs")?;

Three steps: ① parse the URL and resolve (host, port) to a socket address; ② bind a UDP socket in the peer's address family (port 0, so the OS chooses) and register it with mio; ③ build the quiche Config: ALPN = h3 is mandatory, 5 s idle timeout, 1350-byte UDP payload cap (1350 + 8 UDP + 40 IPv6 = 1398 bytes on the wire, about 100 bytes of headroom under a 1500-byte MTU for tunnels and extension headers), initial flow-control limits (10 MB per connection, 1 MB per stream, including the unidirectional streams HTTP/3 uses for control and QPACK), peer verification on.

C · 生成 SCID 并发起 connect

C · Generate SCID and connect

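    // ---- (4) Generate a random source CID --------------------
    let mut scid = [0u8; quiche::MAX_CONN_ID_LEN];   // 20
    // Fully qualified call, so no extra `use ring::rand::SecureRandom` is needed.
    ring::rand::SecureRandom::fill(&ring::rand::SystemRandom::new(), &mut scid)
        .map_err(|_| "system RNG failed")?;
    let scid = quiche::ConnectionId::from_ref(&scid);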

    // ---- (5) Initiate the connection -------------------------
    // `quiche::connect` creates Initial keys (RFC 9001 §5.2),
    // constructs ClientHello, and prepares to emit Initial packet.
    let mut conn = quiche::connect(
        Some(host),               // SNI
        &scid,
        local_addr,
        peer_addr,
        &mut config,
    )?;

    log::info!("connecting to {} from {}", peer_addr, local_addr);

关键的第一次时刻:quiche::connect 内部做了 RFC 9001 §5.2 全套——用 client DCID 做 HKDF 输入派生 Initial keys(salt 是公开常量)、构造 ClientHello(含 ALPN/SNI/transport parameters/keyshare),把它放进 CRYPTO 帧准备发送。此刻还没字节上线。

The crucial first moment: quiche::connect internally runs RFC 9001 §5.2 — derives Initial keys via HKDF on client DCID (salt is a public constant), constructs ClientHello (with ALPN/SNI/transport parameters/keyshare), and queues it into a CRYPTO frame. No bytes have hit the wire yet.

D · 事件循环

D · The event loop

    use quiche::h3::NameValue;                  // .name() / .value() on headers
    use std::io::Write;                         // write_all on stdout

    const MAX_DATAGRAM_SIZE: usize = 1350;      // same cap as in the Config
    const H3_NO_ERROR: u64 = 0x100;             // RFC 9114 §8.1

    let mut buf = [0; 65535];
    let mut out = [0; MAX_DATAGRAM_SIZE];
    let mut h3: Option<quiche::h3::Connection> = None;

    loop {
        // (6) Send anything quiche has queued ------------------
        loop {
            let (write, send_info) = match conn.send(&mut out) {
                Ok(v) => v,
                Err(quiche::Error::Done) => break,
                Err(e) => return Err(e.into()),
            };
            socket.send_to(&out[..write], send_info.to)?;
            log::debug!("sent {} B to {}", write, send_info.to);
        }

        // (7) Wait for either an incoming packet or the timeout
        poll.poll(&mut events, conn.timeout())?;

        if events.is_empty() {
            // timer fired (PTO, idle, draining, etc.)
            conn.on_timeout();
        } else {
            // (8) Drain UDP socket ------------------------------
            loop {
                let (read, from) = match socket.recv_from(&mut buf) {
                    Ok(v) => v,
                    Err(e) if e.kind() == std::io::ErrorKind::WouldBlock => break,
                    Err(e) => return Err(e.into()),
                };
                let recv_info = quiche::RecvInfo { to: local_addr, from };
                if let Err(e) = conn.recv(&mut buf[..read], recv_info) {
                    log::warn!("recv: {}", e);
                }
            }
        }

        // Checked on every iteration, including after a timeout, so the
        // loop cannot block forever on a connection that is already closed.
        if conn.is_closed() { break; }

        // (9) Once handshake is done, send the GET --------------
        if conn.is_established() && h3.is_none() {
            h3 = Some(send_http3_request(&mut conn, host, "/")?);
        }

        // (10) Poll HTTP/3 events: headers, body, end of stream --
        if let Some(h3) = h3.as_mut() {
            loop {
                match h3.poll(&mut conn) {
                    Ok((_id, quiche::h3::Event::Headers { list, .. })) => {
                        for h in &list {
                            println!("{}: {}",
                                     String::from_utf8_lossy(h.name()),
                                     String::from_utf8_lossy(h.value()));
                        }
                    }
                    Ok((id, quiche::h3::Event::Data)) => {
                        while let Ok(read) = h3.recv_body(&mut conn, id, &mut buf) {
                            std::io::stdout().write_all(&buf[..read])?;
                        }
                    }
                    Ok((id, quiche::h3::Event::Finished)) => {
                        log::info!("stream {} closed", id);
                        conn.close(true, H3_NO_ERROR, b"bye").ok();
                    }
                    Ok(_) => {}                         // Reset, GoAway, ...
                    Err(quiche::h3::Error::Done) => break,
                    Err(e) => return Err(e.into()),
                }
            }
        }
    }

    Ok(())
}

The event loop is the common skeleton of every QUIC client: (6) drain quiche's pending send queue → (7) poll until a packet arrives or the next timer fires (conn.timeout() gives the deadline for the PTO, idle and draining timers) → (8) feed every UDP datagram through conn.recv → (9) once the handshake completes, create the HTTP/3 layer and send the request → (10) poll HTTP/3 events and print the decoded headers and body. Reading the raw stream with conn.stream_recv would only yield undecoded HTTP/3 frames, which is why the h3 connection has to stay alive for the whole loop. quiche does no socket I/O or timer handling for you. It is a state machine; you drive it.

E · 发 HTTP/3 请求

E · Send the HTTP/3 request

fn send_http3_request(
    conn: &mut quiche::Connection,
    host: &str,
    path: &str,
) -> Result<quiche::h3::Connection, quiche::h3::Error> {
    // (E.1) Create H3 client on top of the QUIC conn ----------
    // This opens StreamID=2 (control) and StreamID=6/10 (QPACK enc/dec)
    let h3_config = quiche::h3::Config::new()?;
    let mut h3 = quiche::h3::Connection::with_transport(conn, &h3_config)?;

    // (E.2) Build the request headers (RFC 9114 §4) -----------
    let req_headers = vec![
        quiche::h3::Header::new(b":method", b"GET"),
        quiche::h3::Header::new(b":scheme", b"https"),
        quiche::h3::Header::new(b":authority", host.as_bytes()),
        quiche::h3::Header::new(b":path", path.as_bytes()),
        quiche::h3::Header::new(b"user-agent", b"minquic/0.1"),
    ];

    // (E.3) Send · QPACK encodes headers, writes HEADERS frame
    //       onto StreamID=0 (first client-initiated bidi stream)
    let stream_id = h3.send_request(conn, &req_headers, true)?;
    log::info!("sent request on stream {}", stream_id);

    // Hand the H3 state back: the event loop needs it to decode the response.
    Ok(h3)
}

Five things happen inside send_request: ① QPACK encodes the five fields (four pseudo-headers plus user-agent) into a few dozen bytes: :method, :scheme and :path hit the static table outright, while :authority and user-agent reuse a static name and carry a literal value; ② the result is wrapped in a HEADERS frame (type = 0x01); ③ the frame is written to StreamID=0 (the first client-initiated bidi stream); ④ FIN is set because the request has no body; ⑤ quiche queues the STREAM frame for the next 1-RTT packet. Event-loop step (6) actually pushes the bytes out.
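The HEADERS frame header itself is just two QUIC variable-length integers, type and length, followed by the QPACK-encoded field section (RFC 9114 §7.1, RFC 9000 §16). The sketch below shows that framing under the assumption that the field section has already been encoded elsewhere; `encode_varint` and `headers_frame` are illustrative names, not quiche API.

```rust
/// Encodes `v` as a QUIC variable-length integer (RFC 9000 §16).
/// The two most significant bits of the first byte select the length
/// (1, 2, 4 or 8 bytes). Returns None if `v` does not fit in 62 bits.
fn encode_varint(v: u64) -> Option<Vec<u8>> {
    let (len, tag): (usize, u8) = match v {
        0..=63 => (1, 0b00),
        64..=16_383 => (2, 0b01),
        16_384..=1_073_741_823 => (4, 0b10),
        1_073_741_824..=4_611_686_018_427_387_903 => (8, 0b11),
        _ => return None,
    };
    // Keep the low `len` bytes in network order, then stamp the length tag.
    let mut out = v.to_be_bytes()[8 - len..].to_vec();
    out[0] |= tag << 6;
    Some(out)
}

/// Prefixes an already QPACK-encoded field section with the HTTP/3
/// frame header: type 0x01 (HEADERS), then the payload length.
fn headers_frame(encoded_field_section: &[u8]) -> Option<Vec<u8>> {
    let mut frame = encode_varint(0x01)?;
    frame.extend(encode_varint(encoded_field_section.len() as u64)?);
    frame.extend_from_slice(encoded_field_section);
    Some(frame)
}
```

For a 12-byte field section the frame starts with the two bytes 0x01 0x0c. The varint test vectors come from RFC 9000 Appendix A.1.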

F · 验证 — Wireshark + SSLKEYLOGFILE

F · Verify — Wireshark + SSLKEYLOGFILE

$ # shell 1: start the capture first
$ sudo tcpdump -i en0 -w cap.pcap udp port 443

$ # shell 2: run the client with key logging enabled
$ SSLKEYLOGFILE=keys.log cargo run --release 2>quic.log
:status: 200
content-type: text/html
content-length: 12345
...

$ # Open cap.pcap in Wireshark:
$ # Preferences → Protocols → TLS → (Pre)-Master-Secret log filename = keys.log
$ # Wireshark now decrypts QUIC and shows individual frames.
SSLKEYLOGFILE: the single most useful env var for QUIC debugging. The quiche library does not read this variable by itself. The application has to call config.log_keys() and hand the connection a writer with conn.set_keylog(...), which is what quiche's own example apps do when SSLKEYLOGFILE is set; the client above needs those two calls before keys.log is written. The TLS stack then writes the handshake and 1-RTT traffic secrets, both directions, to the file in NSS Key Log Format. Initial keys are not logged because anyone can derive them from the Destination Connection ID. Wireshark uses the file to decrypt the capture after the fact: handshake contents, frame contents and payload all expand. Without it you see only UDP bytes and no protocol-level information. Every serious QUIC deployment leaves a switch for this on standby.

G · 200 行做不到什么 — 边界

G · What 200 lines doesn't do — boundaries

×
不处理 0-RTT(会话恢复)
No 0-RTT (session resumption)
需要持久化 NewSessionTicket、跨进程存储 PSK、客户端启动时尝试 0-RTT 但失败时优雅回退。多 ~ 100 行。
Requires persisting NewSessionTicket, cross-process PSK storage, attempt 0-RTT on startup with graceful fallback. ~ 100 more lines.
×
不处理连接迁移
No connection migration
需要 OS-level 网络变化通知(NetworkPath API on macOS, RouteChange on Win),发起 PATH_CHALLENGE 序列。生产代码 ~ 300 行。
Needs OS-level network-change notifications (NetworkPath API on macOS, RouteChange on Win) and a PATH_CHALLENGE sequence. ~ 300 lines in production.
×
不处理 MTU discovery
No PMTU discovery
DPLPMTUD(RFC 8899)需要发探测包 + 二分搜索路径 MTU。生产代码必备,我们这里用 1350 写死。
DPLPMTUD (RFC 8899) needs probe packets + binary search over path MTU. Production-grade requirement; we hard-code 1350.
×
不处理 GOAWAY / graceful shutdown
No GOAWAY / graceful shutdown
server 重启时发 GOAWAY,客户端要识别并避免开新 stream。RFC 9114 §5.2
Server emits GOAWAY on restart; client should recognise and stop opening new streams. RFC 9114 §5.2.
读完会写,不只是读懂
这是这一章的不可逆能力。 Field Note · 06 · Hands-on
You can now write it,
not just read it.
That's the irreversible ability this chapter gives. Field Note · 06 · Hands-on
APPENDIX · 02 · SECURITY

QUIC 安全攻击面 — 0-RTT replay 之外的 7 类

QUIC attack surface — seven classes beyond 0-RTT replay

RFC 9000 §21 摊开

RFC 9000 §21 unpacked

Ch08 covered 0-RTT replay, but QUIC's attack surface is much wider. RFC 9000 §21 walks through more than a dozen attack classes; this chapter unpacks the seven that matter most to operators: ① amplification (off-path), ② version downgrade, ③ stateless reset injection (off-path), ④ slow read DoS, ⑤ linkability via CID, ⑥ Optimistic ACK, ⑦ handshake DoS. Each comes with its mitigation and the RFC section that governs it.

① Amplification attack(off-path)

① Amplification attack (off-path)

威胁:攻击者伪造受害者源 IP,发一个小 ClientHello UDP 包给 QUIC 服务器;服务器回一个的 Initial response(包含证书链,常见 3-5 KB)给"受害者"。放大比 ~ 50×-100×,这是 UDP-based reflection DoS 的标准模式。

Threat: attacker forges victim's source IP, sends a small ClientHello UDP datagram to a QUIC server; server sends a large Initial response (with certificate chain, usually 3-5 KB) to the "victim". Amplification ratio ~ 50–100×, the classic UDP reflection DoS pattern.

缓解(RFC 9000 §8.1):

Mitigation (RFC 9000 §8.1):

a
3× 反放大上限
3× anti-amplification cap
服务器在未验证路径上发送的字节总数不能超过客户端来包的 3 倍。这是为什么 ClientHello 必须 padding 到 ≥ 1200 字节——给服务器 3600 字节预算够发证书链。
Server can't send more than 3× the bytes the client sent on an unvalidated path. That's why ClientHello must pad to ≥ 1200 bytes — gives the server a 3600-byte budget for the certificate chain.
b
Retry(stateless cookie)
Retry (stateless cookie)
服务器疑惑时回一个 ~ 70 byte Retry 包 + 加密 token。客户端必须把 token 在第二次 ClientHello 里 echo——这证明了它能在那个 IP 上收包。Off-path 攻击者拿不到 token,游戏结束。
When suspicious, server sends a ~ 70-byte Retry packet with an encrypted token. Client must echo the token in its second ClientHello — proving it can receive at that IP. Off-path attacker can't get the token; game over.
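The 3× cap is plain bookkeeping per unvalidated path. A minimal sketch of that accounting follows, assuming byte counts are taken from whole UDP datagrams; the `AmplificationBudget` type and its method names are hypothetical and do not mirror any particular stack.

```rust
/// Tracks the 3x anti-amplification limit (RFC 9000 §8.1) for one path.
struct AmplificationBudget {
    received: u64,   // bytes received from the peer on this path
    sent: u64,       // bytes sent to the peer on this path
    validated: bool, // true once the peer's address has been validated
}

impl AmplificationBudget {
    fn new() -> Self {
        Self { received: 0, sent: 0, validated: false }
    }

    fn on_datagram_received(&mut self, len: u64) {
        self.received = self.received.saturating_add(len);
    }

    /// Called after the handshake or a Retry token proves the address.
    fn on_address_validated(&mut self) {
        self.validated = true;
    }

    /// Bytes that may still be sent right now.
    fn available(&self) -> u64 {
        if self.validated {
            u64::MAX
        } else {
            self.received.saturating_mul(3).saturating_sub(self.sent)
        }
    }

    /// Records the send and returns true only if it fits the budget.
    fn try_send(&mut self, len: u64) -> bool {
        if len > self.available() {
            return false;
        }
        self.sent = self.sent.saturating_add(len);
        true
    }
}
```

After one 1200-byte client Initial, `available()` returns 3600: three full-size server datagrams fit and a fourth has to wait until the client sends more or the address is validated.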

② Version downgrade

② Version downgrade

威胁:active MITM 拦截 ClientHello,把 QUIC v1 改成更弱的 v0 / 不存在的实验版本,或者剥掉 ECH 扩展。

Threat: active MITM intercepts ClientHello, downgrades QUIC v1 to weaker v0 / a fake experimental version, or strips ECH extension.

缓解:Version Negotiation 包(RFC 9000 §6)+ transport parameters 的 version_information(RFC 9368)。客户端在 TLS handshake 完成后把自己看到的版本列表server 在 VN 包里宣布的版本列表对比,不一致 → handshake fail。这是密码学绑定下层版本与上层 TLS,downgrade 攻击必被发现。

Mitigation: Version Negotiation packet (RFC 9000 §6) + version_information transport parameter (RFC 9368). After TLS handshake completes, client compares its observed version list against the server's announced VN list; mismatch → handshake fails. This cryptographically binds the transport version to the TLS layer; downgrade is detected.

③ Stateless reset injection

③ Stateless reset injection

威胁:off-path 攻击者尝试构造一个看起来像 Stateless Reset 的包给受害者,期望受害者关闭连接

Threat: off-path attacker constructs something that looks like a Stateless Reset for the victim, hoping the victim closes the connection.

缓解:Stateless Reset 包尾部的 16 字节 stateless_reset_token 必须匹配已经派发给该 CID 的 token(通过 NEW_CONNECTION_ID 帧,RFC 9000 §10.3)。token 由 HMAC(reset_secret, CID) 派生——攻击者不知道 reset_secret(服务器内部)就无法伪造。

Mitigation: the 16-byte trailing stateless_reset_token must match a token already distributed for that CID (via NEW_CONNECTION_ID, RFC 9000 §10.3). Tokens are derived from HMAC(reset_secret, CID) — without the server's reset_secret, attackers can't forge.
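On the receiving side the check is a comparison of the datagram's last 16 bytes against the tokens the peer has issued. The sketch below assumes the caller already failed to decrypt the datagram as a normal packet and keeps a list of known tokens; `is_stateless_reset` and `ct_eq` are illustrative names. The comparison avoids early exits as a best-effort defence against timing leaks, although the language itself gives no constant-time guarantee.

```rust
const TOKEN_LEN: usize = 16;
// Smallest possible Stateless Reset: 5 header bytes + 16-byte token.
const MIN_RESET_LEN: usize = 21;

/// Compares two tokens without branching on the first mismatch.
fn ct_eq(a: &[u8; TOKEN_LEN], b: &[u8; TOKEN_LEN]) -> bool {
    a.iter().zip(b).fold(0u8, |diff, (x, y)| diff | (x ^ y)) == 0
}

/// Returns true if `datagram` ends in one of the tokens previously
/// received from the peer (NEW_CONNECTION_ID or transport parameter).
fn is_stateless_reset(datagram: &[u8], known_tokens: &[[u8; TOKEN_LEN]]) -> bool {
    if datagram.len() < MIN_RESET_LEN {
        return false;
    }
    let mut tail = [0u8; TOKEN_LEN];
    tail.copy_from_slice(&datagram[datagram.len() - TOKEN_LEN..]);
    // Check every token, again without an early exit.
    let mut hit = false;
    for token in known_tokens {
        hit |= ct_eq(&tail, token);
    }
    hit
}
```

An off-path attacker who does not know any issued token has to guess 128 bits, so a forged reset is simply dropped as an undecryptable packet.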

④ Slow Read DoS

④ Slow Read DoS

威胁:攻击者建合法连接,但故意缓慢读 response——服务器在 stream flow-control 限额内必须一直缓存数据,占内存。10 万个慢客户端 = 服务器内存 OOM。这是 TCP "Slowloris" 在 QUIC 上的复制版。

Threat: attacker opens a legitimate connection but reads the response very slowly — server must keep data buffered within stream flow-control limits, occupying memory. 100 K slow clients → server OOM. The QUIC equivalent of TCP "Slowloris".

缓解:三层防御。① stream flow-control 限额本身就限制了每流 buffer 大小;② idle timeout(默认 5-30s)断开长时间无活动连接;③ 应用层主动监控 bytes/RTT 速率,异常低就 close。Cloudflare 实测把 idle timeout 调到 10s 后,慢客户端攻击成本上升 100×。

Mitigation: three layers. ① stream flow-control limits buffer per stream; ② idle timeout (default 5-30 s) closes long-idle connections; ③ app-layer actively monitors bytes/RTT rate and closes anomalously slow streams. Cloudflare measured 100× attack-cost increase after dropping idle timeout to 10 s.
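The third layer, application-level rate monitoring, can be as small as a counter per stream. The sketch below is one possible shape, not a description of any vendor's implementation: `SlowReadMonitor`, the grace period and the rate threshold are all hypothetical, and "drained" means bytes by which the peer advanced its flow-control credit.

```rust
use std::time::Duration;

/// Flags streams whose reader drains data suspiciously slowly.
struct SlowReadMonitor {
    min_bytes_per_sec: u64, // illustrative threshold, tune per workload
    grace: Duration,        // no verdict before this much time has passed
    drained: u64,           // bytes the peer has consumed so far
    elapsed: Duration,      // observation time so far
}

impl SlowReadMonitor {
    fn new(min_bytes_per_sec: u64, grace: Duration) -> Self {
        Self { min_bytes_per_sec, grace, drained: 0, elapsed: Duration::ZERO }
    }

    /// Feed one observation window: bytes consumed during `dt`.
    fn record(&mut self, bytes_drained: u64, dt: Duration) {
        self.drained += bytes_drained;
        self.elapsed += dt;
    }

    /// True once the average drain rate stays below the threshold
    /// after the grace period.
    fn should_close(&self) -> bool {
        if self.elapsed < self.grace {
            return false;
        }
        let rate = self.drained as f64 / self.elapsed.as_secs_f64();
        rate < self.min_bytes_per_sec as f64
    }
}
```

A server would call `record` from its timer tick and reset or close the stream when `should_close` turns true, which bounds how long a buffer can be pinned by one client.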

⑤ Linkability via Connection ID

⑤ Linkability via Connection ID

威胁:CID 是 明文(routing 需要)。如果客户端跨 WiFi → 5G 时没换 CID,运营商通过 CID 把两个 IP 关联到同一个用户——隐私泄漏。

Threat: CIDs are cleartext (required for routing). If a client doesn't rotate CIDs when switching Wi-Fi → 5G, the operator can link both IPs to the same user via CID — privacy leak.

缓解(RFC 9000 §5.1):客户端在 NEW_CONNECTION_ID 池里要预备多个 CID,在每次 path 切换时必须用新 CID,旧 CID 用 RETIRE_CONNECTION_ID 注销。这是 connection migration 设计里隐藏的隐私保护

Mitigation (RFC 9000 §5.1): client maintains a NEW_CONNECTION_ID pool with multiple spare CIDs, must use a new CID on every path change, and retires the old one via RETIRE_CONNECTION_ID. This is the privacy protection hidden inside connection migration's design.
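The rotation rule can be sketched as a small pool of spare connection IDs. This is a simplified model under stated assumptions: it tracks only the CIDs the peer issued via NEW_CONNECTION_ID, ignores retire_prior_to and reset tokens, and `CidPool` with its methods is a hypothetical type.

```rust
use std::collections::VecDeque;

/// Connection IDs issued by the peer, stored as (sequence number, bytes).
struct CidPool {
    active: (u64, Vec<u8>),
    spare: VecDeque<(u64, Vec<u8>)>,
}

impl CidPool {
    /// The handshake CID has sequence number 0.
    fn new(initial: Vec<u8>) -> Self {
        Self { active: (0, initial), spare: VecDeque::new() }
    }

    /// Called for every NEW_CONNECTION_ID frame received.
    fn on_new_connection_id(&mut self, seq: u64, cid: Vec<u8>) {
        self.spare.push_back((seq, cid));
    }

    /// Called when the local address changes. Switches to a fresh CID and
    /// returns the sequence number to send in RETIRE_CONNECTION_ID.
    /// Returns None when no spare is left, in which case migrating would
    /// link the old and new paths.
    fn rotate_for_new_path(&mut self) -> Option<u64> {
        let next = self.spare.pop_front()?;
        let retired = std::mem::replace(&mut self.active, next);
        Some(retired.0)
    }

    fn active_cid(&self) -> &[u8] {
        &self.active.1
    }
}
```

The `None` case is the one that matters for privacy: with an empty pool the client has to either wait for new CIDs or accept that both paths carry the same identifier.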

⑥ Optimistic ACK · 让发送端跑得太快

⑥ Optimistic ACK · trick sender into accelerating

威胁:作为接收方,我可以发提前的 ACK(ACK 还没收到的 PN),骗发送端以为带宽很大、congestion window 应该涨。congestion control 失效 → 发送端发太多 → 拥塞

Threat: as receiver, send premature ACKs (ACKing PNs not yet received) to trick the sender into thinking bandwidth is high and congestion window should grow. Congestion control breaks → sender over-sends → real congestion.

Mitigation: header protection hides packet numbers from observers on the path, but it does not stop the attacker here, because the attacker is the receiver and holds the keys. PNs are strictly monotonic (Ch12), so future values are easy to guess. The actual defence sits on the sender's side: it may skip packet numbers at random, and an ACK that covers a packet number the sender never sent is proof of cheating. The sender then closes the connection with PROTOCOL_VIOLATION (RFC 9000 §13.1 and §21.4). Optimistic ACKing was a real problem in the TCP era.
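RFC 9000 §21.4 suggests one concrete countermeasure: the sender skips a packet number now and then and treats an ACK for a skipped number as a violation. A minimal sketch of that bookkeeping follows. `AckValidator` is a hypothetical type, the decision when to skip is left to the caller (a real sender would randomise it), and ACK ranges are given as inclusive (smallest, largest) pairs.

```rust
use std::collections::BTreeSet;

/// Sender-side check against optimistic ACKs in one packet number space.
struct AckValidator {
    next_pn: u64,           // next packet number to hand out
    skipped: BTreeSet<u64>, // numbers deliberately never sent
}

impl AckValidator {
    fn new() -> Self {
        Self { next_pn: 0, skipped: BTreeSet::new() }
    }

    /// Allocates the number for the next outgoing packet. With `skip`
    /// set, one number is burned first and remembered as never sent.
    fn next_packet_number(&mut self, skip: bool) -> u64 {
        if skip {
            self.skipped.insert(self.next_pn);
            self.next_pn += 1;
        }
        let pn = self.next_pn;
        self.next_pn += 1;
        pn
    }

    /// False means the peer acknowledged something that was never sent,
    /// and the connection should be closed with PROTOCOL_VIOLATION.
    fn ack_range_is_valid(&self, smallest: u64, largest: u64) -> bool {
        if smallest > largest || largest >= self.next_pn {
            return false;
        }
        self.skipped.range(smallest..=largest).next().is_none()
    }
}
```

A receiver that blindly acknowledges a contiguous range ahead of what it has seen will sooner or later cover a skipped number and expose itself.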

⑦ Handshake DoS · 占着茅坑不拉屎

⑦ Handshake DoS · half-open hoarding

威胁:攻击者发 N 万个 ClientHello,但永远不回 Finished——服务器在每个半连接上消耗内存。TCP 时代的 SYN flood 在 QUIC 上的对应物。

Threat: attacker fires N thousand ClientHellos but never sends Finished — server consumes memory on every half-open. The QUIC equivalent of TCP SYN flood.

缓解:Retry 包(同 ① 那个 stateless cookie)+ address validation token 让服务器在确认客户端"能真正收包" 之前不分配 connection state。Cloudflare 的实现:正常情况下不发 Retry(省一个 RTT),只在负载升高时进入 Retry-required 模式。

Mitigation: Retry packet (same stateless cookie from ①) + address validation token. Server allocates no connection state until the client proves "actually receives" the cookie. Cloudflare's implementation: don't issue Retry by default (save one RTT); enter Retry-required mode only when under load.

汇总 · 7 类威胁 + 3 个反复出现的模式

Summary · 7 threats + 3 recurring patterns

读完这 7 类,你会发现 3 个反复出现的设计模式: "未验证之前不投入资源"(amplification cap / Retry / address validation); "用加密绑定状态"(reset_token / PN 加密 / version_info); "放心丢东西"(stateless reset 不需要状态、Retry token 服务器无状态)。QUIC 的安全设计哲学是"能不维护状态就不维护"——比 TCP+TLS 时代的精神进步了一代。

After reading the seven, you'll notice three recurring design patterns: "no resource commitment before validation" (amplification cap / Retry / address validation); "bind state with cryptography" (reset_token / PN encryption / version_info); "be happy to forget" (stateless reset needs no state; Retry token is server-stateless). QUIC's security philosophy is "don't keep state if you don't have to" — a generational improvement over the TCP+TLS era.

安全设计的最高境界:
把"什么都不记" 变成可证明的防御。 Field Note · 06 · Security
The summit of security design:
turning "forgetting everything" into provable defence. Field Note · 06 · Security
APPENDIX · 03 · MoQ

MoQ — Media over QUIC,QUIC 上的实时媒体

MoQ — Media over QUIC, real-time media on QUIC

替代 RTMP / WebRTC 的候选

a candidate to replace RTMP / WebRTC

实时媒体在 web 上长期被两种协议瓜分:RTMP(2002 年 Macromedia 推出,后归 Adobe,TCP-based,推流走世界但延迟 ~ 3-5s)和 WebRTC(2011,UDP + SRTP,延迟 100-200ms 但传输不可靠且 NAT 穿越复杂)。两者都不同时满足"低延迟 + 可靠 + 大规模分发"。IETF MoQ WG(2023 成立,Twitch / Meta / Cisco 主推)用 QUIC 重新设计了这件事:Media over QUIC Transport(MOQT)。

Real-time media on the web has long been split between two protocols: RTMP (Macromedia, 2002, later Adobe; TCP-based, used for ingest worldwide but with ~ 3-5 s latency) and WebRTC (2011, UDP + SRTP, 100-200 ms latency but unreliable media transport and complex NAT traversal). Neither delivers "low latency + reliable + large-scale distribution" at the same time. The IETF MoQ WG (chartered 2023, driven by Twitch / Meta / Cisco) redesigned this on QUIC as Media over QUIC Transport (MOQT).

MOQT 的核心抽象 · Track + Group + Object

MOQT's core abstraction · Track + Group + Object

Object · 最小可独立消费的字节单元
Object · smallest independently-consumable byte unit
一个 H.264 NAL unit、一个 Opus frame、一个 metadata blob:这是 MOQT 的"包"。带 (track_id, group_id, object_id) 三元组定位。
One H.264 NAL unit, one Opus frame, one metadata blob: this is MOQT's "packet". Identified by the (track_id, group_id, object_id) triple.
Group · 可独立解码的边界
Group · independently-decodable boundary
视频的 GoP(Group of Pictures)、音频的若干 frame 集合。group 是 join 点——观众可以从任何 group 起播,无需历史。MOQT 把 GoP 概念上升为传输层概念
A video GoP (Group of Pictures), an audio frame batch. Group is the join point — a viewer can start playback at any group, no history needed. MOQT lifts the GoP concept to a transport-layer concept.
Track · 一条 group 序列
Track · a sequence of groups
一条"视频流"=一条 Track。直播同一个内容的多个比特率=多条 Track(adaptive bitrate)。订阅是track 粒度
One "video stream" = one Track. Multiple bitrates of the same content = multiple Tracks (adaptive bitrate). Subscriptions happen at track granularity.
Namespace · 组织 tracks
Namespace · organising tracks
类似 URL path:twitch.tv/airing/video/1080p。relay 用 namespace 做路由——subscribe 一个 namespace 等同订阅其下所有 track。
Like a URL path: twitch.tv/airing/video/1080p. Relays route by namespace — subscribing to a namespace is equivalent to subscribing to all its tracks.
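The naming scheme above can be modelled in a few lines to show why a Group works as a join point. This is an illustrative in-memory model only: the field names follow the prose, not the wire format of draft-ietf-moq-transport, and `objectKey` and `joinPoint` are hypothetical helpers.

```javascript
// Address of one object: the (track, group, object) triple.
function objectKey(trackId, groupId, objectId) {
  return `${trackId}/${groupId}/${objectId}`;
}

// A late joiner starts at the newest group of the track and needs nothing older.
function joinPoint(objects, trackId) {
  let latestGroup = -1;
  for (const o of objects) {
    if (o.trackId === trackId && o.groupId > latestGroup) latestGroup = o.groupId;
  }
  return objects
    .filter((o) => o.trackId === trackId && o.groupId === latestGroup)
    .sort((a, b) => a.objectId - b.objectId);
}

// A relay cache holding two video groups (objects arrived out of order) and one audio object.
const cache = [
  { trackId: "video/1080p", groupId: 7, objectId: 0 },
  { trackId: "video/1080p", groupId: 7, objectId: 1 },
  { trackId: "video/1080p", groupId: 8, objectId: 1 },
  { trackId: "video/1080p", groupId: 8, objectId: 0 },
  { trackId: "audio/opus", groupId: 8, objectId: 0 },
];
const startFrom = joinPoint(cache, "video/1080p")
  .map((o) => objectKey(o.trackId, o.groupId, o.objectId));
```

The new viewer receives only group 8 of the video track, in object order; group 7 is never needed.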

五个控制平面消息

Five control-plane messages

;; MOQT control messages (simplified · draft-ietf-moq-transport)
ANNOUNCE       namespace                ;; publisher: "I have content at this NS"
SUBSCRIBE      namespace track          ;; subscriber: "send me this track"
SUBSCRIBE_OK   subscription_id          ;; ack of subscribe
UNSUBSCRIBE    subscription_id          ;; "I'm done"
FETCH          namespace track group obj  ;; "give me a specific object" (catch-up)

这 5 个消息走在双向 QUIC stream上。OBJECT 消息(承载 media payload)独立走 unidirectional QUIC streams 或 DATAGRAM 帧——per-object 选哪种取决于这一帧是不是允许丢

These 5 messages run on a bidirectional QUIC stream. OBJECT messages (carrying media payload) flow on separate unidirectional QUIC streams or DATAGRAM frames — the choice per object depends on whether that frame is droppable.
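The per-object choice can be expressed as a small policy function. This is a sketch under stated assumptions: the `droppable` flag and the size check are hypothetical application-level inputs, not fields defined by the MOQT draft.

```javascript
// Decide how to ship one object: a datagram only if it may be lost AND fits in one.
function chooseDelivery(object, maxDatagramSize) {
  const fitsDatagram = object.payload.byteLength <= maxDatagramSize;
  if (object.droppable && fitsDatagram) return "datagram";
  return "uni-stream"; // reliable, or simply too large for a single datagram
}

const audio = chooseDelivery({ droppable: false, payload: new Uint8Array(160) }, 1200);
const pFrame = chooseDelivery({ droppable: true, payload: new Uint8Array(900) }, 1200);
const bigPFrame = chooseDelivery({ droppable: true, payload: new Uint8Array(5000) }, 1200);
```

Note the third case: a droppable object that does not fit in a datagram still has to travel on a stream.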

为什么 QUIC 是合适的下层

Why QUIC is the right substrate

1
streams + datagrams 双通道
streams + datagrams · dual channels

音频 frame 不能丢 → 走 stream(可靠);视频 P-frame 可以丢 → 走 datagram(快但不可靠)。每个 object 自己选

Audio frames must arrive → stream (reliable); a video P-frame may be dropped → datagram (fast but unreliable). The choice is made per object.

2
stream 间无 HOL block
no cross-stream HOL block

一条音频 stream 丢包不阻塞另一条视频 stream——RTMP 在 TCP 上做不到。

A lost packet on an audio stream doesn't block a video stream — RTMP on TCP cannot do this.

3
连接迁移
connection migration

直播观众坐地铁,Wi-Fi → 5G 不断流。WebRTC 通过 ICE 重启来"恢复",几秒卡顿。

A live viewer on the subway, Wi-Fi → 5G, no stream drop. WebRTC "recovers" by restarting ICE — several seconds of stutter.

4
原生加密 + 路由信息可见
native encryption + visible routing

payload AEAD,但 CID/namespace 可被 relay 看到 → CDN 可以路由但看不见内容。HLS over HTTPS 的两难局面被解开。

AEAD payload, but CIDs / namespaces stay visible to relays → CDNs can route without seeing content. HLS-over-HTTPS's old dilemma dissolves.

relay tree · 大规模分发的关键

Relay tree · key to large-scale distribution

直播一个明星 Twitch 频道 = 1 个 publisher + 10 万 subscriber。绝对不可能直接连——必须有 relay 网络。MOQT 的关键设计:relay 是协议级 first-class concept(不像 WebRTC 的 SFU 是后接的)。

A popular Twitch channel = 1 publisher + 100 K subscribers. Direct connect is impossible — a relay network is required. MOQT's key design choice: relays are first-class protocol concepts (unlike WebRTC's SFU bolted on after the fact).

;; relay tree topology · text-mode diagram

         publisher (Twitch ingest)
              |
              v
   +----------+----------+
   |                     |
 relay A (US-East)    relay B (EU)
   |                     |
   +--+----+----+      +-+---+----+
   |  |    |    |      |     |    |
 v1 v2   v3   v4     v5    v6   v7      ; viewers

Each edge: 1 QUIC connection · namespace = twitch.tv/airing/
Each viewer connects only to its nearest relay (geographically and cost optimised)
The publisher never learns who the final viewers are · the CDN proxies that entirely
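The property that makes the tree scale is that a relay sends one upstream SUBSCRIBE per track no matter how many viewers sit below it. A minimal sketch of that bookkeeping, with a hypothetical `Relay` class that is not taken from any MoQ implementation:

```javascript
// One upstream subscription per track, many downstream viewers.
class Relay {
  constructor() {
    this.viewersByTrack = new Map(); // track name -> Set of viewer ids
    this.upstreamSubscribes = 0;     // SUBSCRIBE messages sent toward the publisher
  }
  subscribe(viewerId, track) {
    let viewers = this.viewersByTrack.get(track);
    if (!viewers) {
      viewers = new Set();
      this.viewersByTrack.set(track, viewers);
      this.upstreamSubscribes += 1;  // only the first viewer triggers an upstream SUBSCRIBE
    }
    viewers.add(viewerId);
  }
  // Viewers that should receive a copy of an object arriving on this track.
  fanOut(track) {
    return [...(this.viewersByTrack.get(track) ?? [])];
  }
}

const relayA = new Relay();
for (const v of ["v1", "v2", "v3", "v4"]) {
  relayA.subscribe(v, "twitch.tv/airing/video/1080p");
}
const copies = relayA.fanOut("twitch.tv/airing/video/1080p").length;
```

Four viewers produce four downstream copies but a single upstream subscription, so the publisher's load depends on the number of relays, not the number of viewers.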

实战时间表(2026 年视角)

Reality timeline (as of 2026)

Year         Milestone                                  Status
2022         Twitch's "Warp" prototype (Luke Curley)    internal demo
2023         IETF MoQ WG chartered                      charter approved
2024         draft-ietf-moq-transport-04                first multi-vendor interop
2025         draft-ietf-moq-transport-10                WGLC entered
2026 H1      Twitch beta with select streamers          now
2026 H2      Meta · Instagram Live trial                announced
2027 (est.)  RFC publication

MoQ 替代 WebRTC 还是 RTMP?

Will MoQ replace WebRTC or RTMP?

两者都不完全RTMP 在 ingest 侧(主播 → 平台)2026 已经被 WHIP/WHEP / SRT 和 MoQ 实验性替代;WebRTC 在低延迟 P2P 通话(Discord / FaceTime)不会被 MoQ 替代,因为 P2P + SFU 那套机制 MoQ 没有对等替代。MoQ 真正的领地是"大规模单向分发 + 低延迟"——这正是 Twitch / YouTube Live / 体育赛事直播 / 在线教育那个市场。

Neither completely. RTMP on the ingest side (creator → platform) is already being supplanted by WHIP/WHEP / SRT and experimental MoQ in 2026. WebRTC in low-latency P2P calls (Discord / FaceTime) won't be replaced by MoQ — its P2P + SFU mechanisms have no MoQ equivalent. MoQ's real territory is "large-scale one-to-many distribution + low latency" — exactly the Twitch / YouTube Live / sports streaming / online-education market.

直播平台 2026 年这场迁移,
是过去 20 年最大的一次。 Field Note · 06 · MoQ
The live-streaming migration of 2026
is the biggest move in the field in twenty years. Field Note · 06 · MoQ
本章引用Chapter references
WG
IETF MoQ WG
draft
draft-ietf-moq-transport
APPENDIX · 04 · WebTransport

WebTransport API · 浏览器侧的全部

WebTransport API · the complete browser surface

替代 WebSocket 的 H3 接口

the H3 interface that replaces WebSocket

Ch24 提了 WebTransport 一句话。这一章把整个 W3C WebTransport API 完整走查:浏览器开发者从客户端发起 H3 长连接、收发可靠流 + 不可靠数据报 全过程。Chrome 97+(2022 Q1)和 Firefox 114+(2023)已实现;Safari 16.4 并未发布该 API,Safari 的支持情况请以最新兼容性表为准。

Ch24 mentioned WebTransport in passing. This chapter walks through the entire W3C WebTransport API: how a browser developer opens an H3-based long-lived connection and exchanges reliable streams + unreliable datagrams. Implemented in Chrome 97+ (2022 Q1) and Firefox 114+ (2023). Safari 16.4 did not ship the API, so check current compatibility data before relying on Safari support.

最小可跑 · 5 行 JS

Minimum viable · 5 lines of JS

// 1. open the transport (HTTPS only · ALPN = h3 + WebTransport)
const transport = new WebTransport("https://example.com:443/wt");
await transport.ready;            // resolves after H3 handshake + WebTransport SETTINGS exchange

// 2. open a bidirectional reliable stream
const stream = await transport.createBidirectionalStream();
const writer = stream.writable.getWriter();
await writer.write(new TextEncoder().encode("hello"));

// 3. send an unreliable datagram
const dgramWriter = transport.datagrams.writable.getWriter();
await dgramWriter.write(new Uint8Array([42, 7, 3]));

这 5 行 JS 在底层对应:① 用 ALPN=h3 建 QUIC + TLS 1.3 连接 → 发 HTTP/3 CONNECT with :protocol = webtransport(RFC 9220 Extended CONNECT);② 服务器 200 响应表示接受;③ 后续所有数据关联 QUIC stream 而非 H3 帧。WebTransport 跑在 H3 之上(共用连接),但逃出了 HTTP/3 的请求-响应模式。

These 5 lines map to: ① open a QUIC + TLS 1.3 connection with ALPN=h3 → send an HTTP/3 CONNECT with :protocol = webtransport (RFC 9220 Extended CONNECT); ② server replies 200 to accept; ③ all subsequent data flows on associated QUIC streams, not via HTTP/3 frames. WebTransport rides on top of H3 (shared connection) but escapes HTTP/3's request/response model.
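To make step ① concrete, the request the browser sends can be written out as a header map. This is a sketch based on the description above: `webTransportConnectHeaders` is a hypothetical helper, the `:protocol` value is the one named in the text, and the exact header set a given browser emits may differ.

```javascript
// Build the Extended CONNECT pseudo-headers for a WebTransport session.
function webTransportConnectHeaders(urlString, origin) {
  const url = new URL(urlString);
  if (url.protocol !== "https:") {
    throw new Error("WebTransport requires an https URL");
  }
  return {
    ":method": "CONNECT",
    ":protocol": "webtransport",   // upgrade token, as described in the text
    ":scheme": "https",
    ":authority": url.host,        // default port 443 is dropped by URL parsing
    ":path": url.pathname + url.search,
    origin,                        // lets the server apply an origin policy
  };
}

const headers = webTransportConnectHeaders("https://example.com:443/wt", "https://app.example");
```

A 2xx response to this request establishes the session; everything after that travels on QUIC streams and datagrams associated with it.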

API 全表

Full API surface

API                                      type            purpose
new WebTransport(url, options)           constructor     open transport · options.serverCertificateHashes can pin a self-signed certificate
transport.ready                          Promise         resolves once H3 handshake + WT SETTINGS done
transport.closed                         Promise         resolves on graceful close · rejects on abort
transport.close({closeCode, reason})     method          actively shut down
transport.createBidirectionalStream()    method          → Promise<WebTransportBidirectionalStream>
transport.createUnidirectionalStream()   method          → Promise<WritableStream>
transport.incomingBidirectionalStreams   ReadableStream  server-initiated bidi streams arrive here
transport.incomingUnidirectionalStreams  ReadableStream  server-initiated uni streams arrive here
transport.datagrams.readable             ReadableStream  incoming unreliable datagrams · max 1200 B each
transport.datagrams.writable             WritableStream  outgoing unreliable datagrams
transport.datagrams.maxDatagramSize      number          negotiated max payload (PMTU-aware)
transport.getStats()                     Promise         connection stats · smoothedRtt / minRtt / packetsLost / ...

三种流的区别

Three stream flavours

1
Bidirectional · 像 WebSocket
Bidirectional · WebSocket-like

双向可靠 byte stream。客户端发起用 createBidirectionalStream;服务端发起从 incomingBidirectionalStreams 读。每条 stream 独立丢包不阻塞别的——WebSocket 在 TCP 上做不到。

Bidi reliable byte stream. Client-initiated via createBidirectionalStream; server-initiated arrive on incomingBidirectionalStreams. Each stream's loss doesn't block others — WebSocket on TCP cannot do this.

2
Unidirectional · 单向
Unidirectional

单向可靠 byte stream。常用于事件广播(server 向 client 单向推) 或大文件上传(client 向 server 单向推)——节省一个方向的 stream-id 槽。

Reliable byte stream in one direction. Used for event broadcast (server → client) or file upload (client → server) — saves one direction's stream-id slot.

3
Datagram · 不可靠
Datagram · unreliable

单个 UDP 包级别的数据报,无可靠传输 · 无顺序保证 · 无拥塞控制(只受 conn 级 cwnd 限)。每条 ≤ 1200 字节(PMTU 决定)。这就是 RFC 9221 QUIC DATAGRAM 在浏览器侧的暴露。低延迟实时场景的金子

UDP-packet-level datagrams, unreliable · no ordering · no per-flow CC (just conn-level cwnd). ≤ 1200 B each (PMTU-bound). This is the browser-side exposure of RFC 9221 QUIC DATAGRAM. Gold for low-latency real-time use cases.
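Because datagrams carry no ordering guarantee, an application that sends state updates usually adds its own sequence number and discards anything stale on arrival. The sketch below assumes a made-up 9-byte input packet (u32 sequence, i16 dx, i16 dy, u8 buttons); the layout and helper names are illustrative, and sequence wrap-around is ignored for brevity.

```javascript
// Encode one input sample into a fixed 9-byte datagram payload.
function encodeInput(seq, dx, dy, buttons) {
  const view = new DataView(new ArrayBuffer(9));
  view.setUint32(0, seq);     // big-endian by default
  view.setInt16(4, dx);
  view.setInt16(6, dy);
  view.setUint8(8, buttons);
  return new Uint8Array(view.buffer);
}

function decodeInput(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return {
    seq: view.getUint32(0),
    dx: view.getInt16(4),
    dy: view.getInt16(6),
    buttons: view.getUint8(8),
  };
}

// Receiver side: accept only inputs newer than the last one seen.
function makeLatestFilter() {
  let lastSeq = -1;
  return (bytes) => {
    const input = decodeInput(bytes);
    if (input.seq <= lastSeq) return null; // late or duplicated datagram
    lastSeq = input.seq;
    return input;
  };
}

const accept = makeLatestFilter();
const a = accept(encodeInput(1, 5, -3, 0));
const b = accept(encodeInput(3, 1, 1, 1));   // seq 2 was lost or delayed
const late = accept(encodeInput(2, 9, 9, 0)); // arrives after seq 3
```

The receiver never waits for the missing datagram; it simply acts on the newest input it has.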

真实代码 · 云游戏控制流

Real code · cloud-gaming input loop

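// CLIENT · send 60 Hz input via datagrams, receive frame deltas via streams
// pollInput() and decodeAndPaintFrame() are application hooks defined elsewhere.
const wt = new WebTransport("https://cloud-gaming.example/wt");
await wt.ready;

// 1. ~60 Hz unreliable datagram loop · keyboard / mouse / gamepad
const dgramW = wt.datagrams.writable.getWriter();
setInterval(() => {
  const input = pollInput();         // Uint8Array · keys + mouse delta · ~ 20 B
  // fire and forget: a failed or dropped datagram is superseded by the next one
  dgramW.write(input).catch(() => {});
}, 16);

// 2. server pushes frame deltas via uni streams (one per delta)
async function readFrame(stream) {
  const reader = stream.getReader();
  const chunks = [];
  let total = 0;
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    chunks.push(value);
    total += value.byteLength;
  }
  const bytes = new Uint8Array(total);   // one copy instead of re-concatenating per chunk
  let offset = 0;
  for (const chunk of chunks) {
    bytes.set(chunk, offset);
    offset += chunk.byteLength;
  }
  decodeAndPaintFrame(bytes);    // reliable per-delta · independent loss
}

for await (const stream of wt.incomingUnidirectionalStreams) {
  // do not await here: draining streams one at a time would let a stalled stream
  // hold up every later frame. Frames may therefore finish out of order, so
  // decodeAndPaintFrame is assumed to reorder or skip as needed.
  readFrame(stream).catch(console.error);
}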

这段代码体现了 WebTransport 的杀手锏:输入用 datagram(可丢,反正下一帧又来一遍),视频 frame 用 stream(必须按顺序解码,但每帧自己一条 stream,丢包只影响那一帧)。WebRTC 也能做这个,但需要 ICE / STUN / TURN / SDP 一大套基建;WebTransport 直接 5 行 JS 起步。

This shows WebTransport's killer pattern: input via datagrams (droppable; the next frame's input is fresh), video frames via streams (must decode in order, but each frame has its own stream so loss is isolated). WebRTC can do this too, but needs ICE / STUN / TURN / SDP machinery; WebTransport is 5 lines of JS to start.

WebTransport vs WebSocket vs WebRTC DataChannel


                         WebTransport                          WebSocket                                    WebRTC DataChannel
Underlying protocol      HTTP/3 + QUIC                         HTTP/1.1 Upgrade · TCP                       SCTP over DTLS over UDP
Low latency              ✓ 1 RTT startup                       ✗ 1 RTT TCP + 1–2 RTT TLS + 1 RTT Upgrade    ✓ but ICE + STUN/TURN cost several RTTs
HOL block                ✓ streams isolated                    ✗ TCP HOL block                              ✓ SCTP streams isolated
Unreliable channel       ✓ datagrams                           ✗ fully reliable                             ✓ unordered/unreliable mode
P2P                      ✗ client-server only                  ✗                                            ✓ true P2P
Multiplexing in browser  ✓ shares the H3 connection            one TCP connection per ws                    separate DTLS · not multiplexed
Use cases                cloud gaming / live · client-server   chat / notifications                         P2P video / file transfer

遇到的实际问题

Real problems people hit

证书必须公网 CA(否则要 hash pin)
Cert must be public CA (otherwise hash-pin)
浏览器默认拒绝自签证书。开发期可以用 serverCertificateHashes 显式 pin,但每次证书过期都要更新。生产用 Let's Encrypt 简单。
Browsers reject self-signed by default. In dev you can pin via serverCertificateHashes, but every cert renewal needs an update. In prod, use Let's Encrypt.
企业 firewall 阻塞 UDP/443
Enterprise firewalls block UDP/443
~ 10-15% 企业网仍然只放 TCP/443。客户端需要 fallback 到 WebSocket 或显示"网络不支持" 提示。Ch20 case 4 里 Cloudflare 自己的数据。
~ 10–15% of enterprise networks still pass TCP/443 only. The client needs a WebSocket fallback or a "network unsupported" notice. Cloudflare's own data is in Ch20 case 4.
backpressure 需要手动
backpressure is manual
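Streams API 不会替你自动限速:不等待就连续调用 writer.write,数据会在队列里不断堆积(也没有 WebSocket 那样的 bufferedAmount 可查)。要做流控,每次写之前 await writer.ready,或检查 writer.desiredSize。
The Streams API does not throttle for you: calling writer.write() repeatedly without awaiting lets chunks pile up in the queue, and there is no WebSocket-style bufferedAmount to inspect. For flow control, await writer.ready before each write or check writer.desiredSize.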
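The UDP/443 problem above means transport selection has to be a runtime decision, not just feature detection. A minimal sketch: `pickTransport` is a hypothetical helper, `env` stands in for `globalThis` so the logic can run outside a browser, the URLs are placeholders, and `quicFailed` is assumed to be set by the caller after `transport.ready` rejects or times out.

```javascript
// Choose WebTransport when available and working, otherwise fall back to WebSocket.
function pickTransport(env, urls, { quicFailed = false } = {}) {
  if (!quicFailed && typeof env.WebTransport === "function") {
    return { kind: "webtransport", url: urls.wt };
  }
  if (typeof env.WebSocket === "function") {
    return { kind: "websocket", url: urls.ws };
  }
  return { kind: "unsupported", url: null }; // show the "network unsupported" notice
}

const urls = { wt: "https://example.com/wt", ws: "wss://example.com/ws" };
const both = { WebTransport: class {}, WebSocket: class {} };
const modern = pickTransport(both, urls);
const udpBlocked = pickTransport(both, urls, { quicFailed: true });
const legacy = pickTransport({ WebSocket: class {} }, urls);
const nothing = pickTransport({}, urls);
```

In a page, the first call would pass `globalThis`; if the WebTransport handshake then fails, the same function is called again with `quicFailed: true` to obtain the WebSocket endpoint.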
APPENDIX · 05 · ECOLOGY

部署生态学 — 为什么 QUIC 没有 100% 部署

Deployment ecology — why QUIC isn't at 100%

技术以外的力

forces beyond the protocol

HTTP/3 spec 2022 年成 RFC,2026 年 Cloudflare Radar 显示全球 ~ 32% 流量走 H3——但 web 平台的关键 25% 部署在哪里没做?为什么有些国家 QUIC 占比 < 5%?这不是技术问题,是政治经济问题。这一章把 QUIC 部署生态摊开,从谁阻止谁的视角看。

HTTP/3 became an RFC in 2022; by 2026, Cloudflare Radar shows ~ 32% of global traffic on H3. Where are the missing 25% of deployments hiding? Why is QUIC adoption < 5% in some countries? This isn't a technical problem; it's political economy. This chapter unpacks the QUIC deployment ecology from the angle of who blocks whom.

QUIC 部署的 5 个阻力

Five frictions against QUIC deployment

ISP QoS 政策
ISP QoS policies
一些 ISP 给 UDP 流量打低优先级(因为传统上 UDP = 游戏 / VoIP / DNS,带宽配额历史性偏少)。结果 QUIC 在这些 ISP 上 throughput 反而低于 TCP。用户察觉到 H3 慢,关掉 → 浏览器统计到 QUIC 反向退化 → 进入"降级模式" 不再用 H3。这是 QUIC 在某些国家部署率低于 5% 的最大原因。
Some ISPs assign low priority to UDP traffic (historically UDP = games / VoIP / DNS, with smaller bandwidth allocations). Result: QUIC throughput on those ISPs is worse than TCP. Users feel H3 is slow and disable it; browsers detect QUIC regression and enter "fallback mode", never re-enabling H3. This is the biggest reason H3 adoption in some countries is < 5%.
企业 firewall 默认丢 UDP/443
Enterprise firewalls drop UDP/443 by default
企业 IT 历史上习惯"放 TCP/443 出去就 OK"。UDP/443 在传统 firewall ruleset 里被默认丢。Palo Alto / Fortinet / Cisco ASA 2023+ 版本才默认不再阻塞 UDP/443,但实际部署滞后多年。10-15% 企业网仍然只通 TCP,这就是为什么生产 H3 部署必须有 TCP fallback。
Enterprise IT historically "opens TCP/443, calls it done". UDP/443 was dropped by default in legacy firewall rule-sets. Palo Alto / Fortinet / Cisco ASA versions from 2023+ no longer block by default, but real-world deployments lag years. 10–15% of enterprise networks still pass TCP only — which is why production H3 must always provide TCP fallback.
CDN 替代率 + cost
CDN substitution + cost
Cloudflare 100%、Fastly 100%、Akamai ~ 80%、CloudFront 2022 才上 H3。但很多 small site 仍跑自建 nginx 1.18(没 quictls),H3 上不了。升级 nginx + 维护 quictls + 调 sysctl 在 small site 那边没人有动机做,反正 H2 也能跑。"够用"是 H3 部署的天敌。
Cloudflare 100%, Fastly 100%, Akamai ~ 80%; CloudFront added H3 in 2022. But many small sites still run self-hosted nginx 1.18 (no quictls), so H3 is unavailable. Upgrading nginx, maintaining quictls and tuning sysctl has no payoff at small scale when H2 already works. "Good enough" is H3 deployment's natural enemy.
服务端 CPU 成本
Server CPU cost
Ch23 说过:H3 服务器 CPU 是 H2 的 1.5-2×(每个包 user/kernel boundary + 独立 AEAD + 用户态拥塞控制)。Fastly 实测同样的服务器,H3 流量上限只有 H2 一半。对 CPU 敏感型业务(比如 ad-tech)宁可少上 H3。kernel TLS + AF_XDP 在 2026 才开始让这个数字回归。
Per Ch23: H3 servers burn 1.5–2× the CPU of H2 (each packet crosses user/kernel boundary, does its own AEAD, runs user-space CC). Fastly measured: same box, half the H3 throughput. CPU-sensitive workloads (ad-tech) prefer to defer H3. Kernel TLS + AF_XDP started narrowing this in 2026.
Ossification 风险
Ossification risk
QUIC 之所以加密 PN / Reserved bits 就是为了防 middlebox 学坏。但如果每家 middlebox 都开始识别某种特定 QUIC 实现的指纹,新版本/新提案 ship 时就不能改那个字节,回到 TCP ossification 的老路。Cloudflare 的 QUIC v2 trial 就是主动验证 network 没有 ossify。
QUIC encrypts PN / Reserved bits precisely to keep middleboxes from learning bad habits. But if every middlebox starts recognising a specific QUIC implementation's fingerprint, future version / proposal ships can't move that byte — back to TCP ossification. Cloudflare's QUIC v2 trial proactively tests whether the network has ossified.
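The "fallback mode" described under the ISP QoS friction can be pictured as a per-network marker that keeps the client off QUIC for a while after a failure. The sketch below is an assumption about how such a marker might work, not a description of any browser's actual logic: the class name, the 5-minute base delay, the doubling and the 24-hour cap are all hypothetical.

```javascript
// "QUIC looks broken on this network" marker with exponential backoff.
class QuicBrokenness {
  constructor(baseMs = 5 * 60 * 1000, maxMs = 24 * 60 * 60 * 1000) {
    this.baseMs = baseMs;
    this.maxMs = maxMs;
    this.failures = 0;
    this.retryAt = 0;   // timestamp before which only TCP is attempted
  }
  recordFailure(nowMs) {
    const delay = Math.min(this.maxMs, this.baseMs * 2 ** this.failures);
    this.failures += 1;
    this.retryAt = nowMs + delay;
  }
  recordSuccess() {
    this.failures = 0;
    this.retryAt = 0;
  }
  shouldTryQuic(nowMs) {
    return nowMs >= this.retryAt;
  }
}

const state = new QuicBrokenness(1000, 8000);   // short delays to keep the example readable
state.recordFailure(0);                         // first failure: back off for 1000 ms
const duringBackoff = state.shouldTryQuic(500);
const afterBackoff = state.shouldTryQuic(1000);
state.recordFailure(1000);                      // second failure: back off for 2000 ms
const secondRetryAt = state.retryAt;
```

On a network that degrades UDP persistently, each retry fails again and the delay keeps growing, which from the user's side looks like H3 never coming back.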

各国部署率(2026-Q1 · Cloudflare Radar)

National adoption (2026-Q1 · Cloudflare Radar)

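Country / Region            H3 share   Notes
USA                         ~ 45%      Chrome-dominated + Cloudflare/Fastly fully enabled
Western Europe              ~ 40%      similar to USA · slightly lower
Japan                       ~ 35%      NTT QoS is friendly
India                       ~ 30%      high mobile share → larger H3 gains → browsers tend to pick H3
South Korea                 ~ 25%      SK Telecom rate-limits some UDP
Russia                      ~ 8%       ISPs commonly rate-limit UDP
China                       ~ 5%       GFW has historically treated UDP as "suspicious" + domestic CDNs (Alibaba / Tencent / Baidu) only began enabling H3 in 2024
SSA (Sub-Saharan Africa)    ~ 3-15%    wide spread · old ISP equipment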

2026 年的关键变量

Key 2026 variables

三件事可能在 2026-2027 把 H3 占比推过 50%:

Three things could push H3 share past 50% in 2026–2027:

kernel TLS for QUIC
Kernel TLS for QUIC

Linux 6.6+ 的 ktls + AF_XDP 让 QUIC 部分进入内核。CPU 成本会降回 H2 水平 → 阻力 ④ 消失。

Linux 6.6+'s ktls + AF_XDP partially moves QUIC into the kernel. CPU cost falls to H2 levels → friction ④ vanishes.

大陆 CDN 上量
China-mainland CDNs ramping

阿里云 2024 默认开 H3,腾讯云 2025 跟上。中国占全球流量 ~ 20%,这一拨直接推高总体 ~ 5-7%。

Alibaba Cloud enabled H3 by default in 2024; Tencent Cloud followed 2025. China's ~ 20% of global traffic — this single move adds ~ 5-7% to global totals.

Caddy / nginx 默认开 H3
Caddy / nginx default H3

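Caddy 自 2.6 起默认开 H3。nginx 自主线 1.25(2023)起支持 H3,1.26(2024)进入稳定分支,但仍需显式配置 quic 监听。用 Caddy 的 small site 不再需要主动配置,自动 H3。

Caddy has had H3 on by default since 2.6. nginx has shipped HTTP/3 since mainline 1.25 (2023) and in the stable branch since 1.26 (2024), though it still needs an explicit quic listener. Small sites on Caddy no longer need manual configuration: H3 just works.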

反向阻力 · 监管
Counter-force · regulation

部分国家要求 ISP-level DPI(深包检测)做内容审计。QUIC 加密一切让 DPI 失效——这些国家可能反向 限制 H3。

Some countries mandate ISP-level DPI for content audit. QUIC's encrypt-everything defeats DPI — those countries may actively throttle H3.
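The China-mainland CDN estimate above is a weighted-average calculation, and it is worth doing explicitly. The global H3 share is the traffic-weighted mean of regional shares, so a region carrying ~ 20% of traffic lifts the global figure by 0.20 times its own gain. A 5-7 point global lift therefore implies that region's H3 share rising by roughly 25-35 points. The helper names and the example figures in the code are illustrative, not measured data.

```javascript
// Global share = traffic-weighted mean of regional shares.
function globalShare(regions) {
  const totalWeight = regions.reduce((sum, r) => sum + r.weight, 0);
  const weighted = regions.reduce((sum, r) => sum + r.weight * r.share, 0);
  return weighted / totalWeight;
}

// Lift in the global figure when one region's share rises by `delta`.
function globalLift(weight, delta) {
  return weight * delta;
}

// Regional gain needed to produce a given global lift.
function requiredRegionalGain(weight, lift) {
  return lift / weight;
}

const lift = globalLift(0.20, 0.30);               // about 0.06, i.e. ~ 6 points
const gainLow = requiredRegionalGain(0.20, 0.05);  // about 0.25
const gainHigh = requiredRegionalGain(0.20, 0.07); // about 0.35

// Illustrative two-region world: 20% of traffic at 5% H3, the rest at 40%.
const before = globalShare([
  { weight: 0.20, share: 0.05 },
  { weight: 0.80, share: 0.40 },
]);
```

Seen this way, the ~ 5-7% claim only holds if the regional share climbs from today's ~ 5% to somewhere around 30-40%.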

协议的命运不只在 RFC 里,
更在 firewall 规则和 ISP QoS 表里。 Field Note · 06 · Ecology
A protocol's fate is not only in the RFC.
It's also in firewall rules and ISP QoS tables. Field Note · 06 · Ecology
APPENDIX · STANDARDS

References & Standards — 文章每个论断的出处

References & Standards — sources for every claim

RFC · IETF Drafts · 论文 · 引擎源码

RFCs · IETF Drafts · papers · engine source

这一节把全文用到的 外部规范、RFC、论文、源码 归档。每条引用都带 状态徽章(STD = 正式 RFC / PS = Proposed Standard / DRAFT = IETF Internet-Draft)+ 链接 + 你在哪一章会用到它。所有 URL 在 2026 年 5 月有效;QUIC 生态演化很快,IETF Working Group(quic / masque / moq / httpbis)持续在出新草案。

This section archives every external spec, RFC, paper, or source-code reference the article touches. Each carries a status pill (STD = published RFC / PS = Proposed Standard / DRAFT = IETF Internet-Draft) + link + the chapter that needs it. All URLs valid as of May 2026; the QUIC ecosystem moves quickly — IETF Working Groups (quic / masque / moq / httpbis) keep issuing new drafts.

A · 核心 RFC 七件套

A · The seven core RFCs

QUIC core
RFC 9000 QUIC: A UDP-Based Multiplexed and Secure Transport · Iyengar & Thomson, May 2021. 全文最重要的 RFC,Ch04-Ch11 全用。The most-cited RFC in this article; underpins Ch04–Ch11.
QUIC TLS
RFC 9001 Using TLS to Secure QUIC · Thomson & Turner, May 2021. Ch07 握手 / Ch08 0-RTT。Ch07 handshake / Ch08 0-RTT.
QUIC recovery
RFC 9002 QUIC Loss Detection and Congestion Control · Iyengar & Swett, May 2021. Ch12 丢包恢复 / Ch13 拥塞控制。NewReno 伪代码出自 §7,本文 Ch13 的伪代码是该规范的简化版。Ch12 loss recovery / Ch13 congestion control. The NewReno pseudocode is from §7; the Ch13 listing is a simplified version of that spec.
QUIC version negotiation
RFC 9368 Compatible Version Negotiation for QUIC · QUIC v2 与 v1 共存的协商机制。Ch04。Mechanism for QUIC v1 and v2 to coexist. Ch04.
HTTP/3
RFC 9114 HTTP/3 · Bishop, June 2022. Ch14 帧 / Ch15 QPACK / Ch16 stream priorities。Ch14 frames / Ch15 QPACK / Ch16 stream priorities.
QPACK
RFC 9204 QPACK: Field Compression for HTTP/3 · Krasic, Bishop, Frindell, June 2022. Ch15 全章的 spec。RIC(Required Insert Count)定义在 §4.5.1.1。The spec behind all of Ch15. Required Insert Count (RIC) is defined in §4.5.1.1.
HTTP semantics (shared)
RFC 9110 HTTP Semantics · HTTP/1.1 / 2 / 3 共享的语义层。Ch01 / Ch14。Shared HTTP semantics across versions 1.1 / 2 / 3. Ch01 / Ch14.

B · 扩展协议族(在 QUIC 上长出来的)

B · Extension family (built on QUIC)

HTTP/3 Datagram
RFC 9297 HTTP Datagrams and the Capsule Protocol · MASQUE 栈的容器规范。Ch24 §2。Container spec for the MASQUE stack. Ch24 §2.
CONNECT-UDP
RFC 9298 Proxying UDP in HTTP · 用 H3 转发 UDP。iCloud Private Relay 用到。Ch24。Tunnel UDP via H3. Used by iCloud Private Relay. Ch24.
CONNECT-IP
RFC 9484 Proxying IP in HTTP · 用 H3 转发整个 IP 包。下一代 VPN。Ch24。Tunnel IP packets via H3. Next-gen VPN. Ch24.
QUIC Datagram
RFC 9221 Unreliable Datagram Extension to QUIC · QUIC 上的不可靠数据报通道。WebTransport 用。Ch24。Unreliable datagram channel on QUIC. Used by WebTransport. Ch24.
WebTransport (W3C)
WD W3C · WebTransport API · 浏览器侧 API,WebSocket 的继任者。Ch24 §1。Browser-side API; WebSocket successor. Ch24 §1.
WebTransport over H3
DRAFT draft-ietf-webtrans-http3 · 底层协议定义。Chrome 97+ 实现。Ch24。Wire-protocol definition. Implemented in Chrome 97+. Ch24.
MoQ Transport
DRAFT IETF MoQ WG · Media over QUIC · Twitch / Meta 主推,QUIC 上的实时媒体协议。phase 3,2025 H1 进入实测。Ch24 §3。Driven by Twitch / Meta; real-time media protocol on QUIC. Reached phase 3 / interop in H1 2025. Ch24 §3.
DoH (DNS-over-HTTPS)
RFC 8484 DNS Queries over HTTPS (DoH) · 为什么 Ch07 0-RTT 能配 SNI 加密的 ECH。Ch20 case 4。Why Ch07 0-RTT plays well with SNI-encrypted ECH. Ch20 case 4.

C · 历史与失败案例(无 HTTP/3 不可能)

C · The history (why HTTP/3 was even necessary)

HTTP/2
RFC 9113 HTTP/2 · 2022 重发版,替代 RFC 7540。Ch01 / Ch23(为什么不够)。2022 reissue, supersedes RFC 7540. Ch01 / Ch23 (why it wasn't enough).
TLS 1.3
RFC 8446 TLS 1.3 · QUIC 内嵌的 TLS 来自这里。Ch07 / Ch08。The TLS embedded inside QUIC. Ch07 / Ch08.
QUIC at Google · SIGCOMM 2017
Langley et al. "The QUIC Transport Protocol: Design and Internet-Scale Deployment" · 原始 QUIC 论文,记录 Google QUIC(gQUIC)在 2015-2017 的部署经验。本文 Ch20 case 1 引用的 "YouTube India −20% to −40%" 实际来自Chrome blog 2020-10 "Chrome is deploying HTTP/3 and IETF QUIC",而非这篇 SIGCOMM——之前引文不准,已修正 The original QUIC paper, documenting Google QUIC (gQUIC) deployment 2015–2017. Ch20 case 1's "YouTube India −20% to −40%" number actually comes from Chrome blog 2020-10 "Chrome is deploying HTTP/3 and IETF QUIC", not this SIGCOMM paper. Earlier citation was off; corrected.
BBR · ACM Queue 2016
Cardwell, Cheng, Gunn, Yeganeh, Jacobson. "BBR: Congestion-Based Congestion Control" · ACM Queue · Sep 2016 · BBR v1 原始论文。Ch13 "BBR ≈ 1.5–3× of NewReno on cellular" 出处。USENIX SRECon 2017 talk 给了更多 YouTube/Akamai 实测数据。 BBR v1 paper. Source for the Ch13 "BBR ≈ 1.5–3× of NewReno on cellular" claim. USENIX SRECon 2017 talk has fuller YouTube/Akamai numbers.
BBRv2 / v3 drafts
DRAFT draft-cardwell-iccrg-bbr-congestion-control · BBRv2 修正了 v1 在高度 ECN 标记下吃带宽的问题。BBRv3 在 Google 内部 2024 起 ramp。Ch13 末尾。 BBRv2 fixes v1's bandwidth-hogging behaviour under heavy ECN. BBRv3 has been ramping inside Google since 2024. Ch13 end.

D · 实施 / 部署 / 调试

D · Implementations / deployments / debugging

qlog
DRAFT draft-ietf-quic-qlog-main-schema · QUIC 调试事件流的事实标准。Ch22。 De facto standard event stream for QUIC debugging. Ch22.
qvis
qvis.quictools.info · qlog 的可视化前端,KU Leuven Robin Marx 团队主维护。 qlog visualisation front-end, maintained by Robin Marx's team at KU Leuven.
Cloudflare · quiche
cloudflare/quiche · Rust QUIC 实现,Cloudflare 生产环境用。Ch04 帧解码逻辑参考自这里。 Cloudflare's production Rust QUIC implementation. Ch04 frame-decoding logic was cross-checked against this.
LiteSpeed · lsquic
litespeedtech/lsquic · C 实现,LiteSpeed 主推。Ch20 case 3 引用。 C implementation maintained by LiteSpeed. Cited in Ch20 case 3.
Google · QUICHE (gQUIC + IETF QUIC)
quiche.googlesource.com · Chrome / YouTube 的 QUIC 实现源码。 Source for Chrome's / YouTube's QUIC implementation.
Apple · network.framework QUIC
Apple Developer · Network framework · iOS 15+ / macOS 12+ 的 QUIC 实现,iCloud Private Relay 客户端用。Ch24 case。 iOS 15+ / macOS 12+ QUIC implementation; what iCloud Private Relay's client uses. Ch24 case.
microsoft · MsQuic
microsoft/msquic · Windows 内核 + .NET QUIC 实现。HTTP/3 在 IIS 上跑用的是这个。 Windows kernel + .NET QUIC implementation. IIS HTTP/3 rides on this.

E · 厂商部署数据(本文每条 X% 数字的出处)

E · Vendor deployment data (where every X% number in this article came from)

Cloudflare Radar
radar.cloudflare.com · QUIC / HTTP/3 全球部署占比的实时面板。Ch20 数字来源。 Live dashboard for global QUIC / HTTP/3 share. Source for Ch20 numbers.
Chrome blog · H3 ramp
"Chrome is deploying HTTP/3 and IETF QUIC" · Oct 2020 · YouTube India −20% to −40% search latency 的真实出处(此前 Ch20 case 1 错引到 SIGCOMM 2017)。 The real source for "YouTube India −20% to −40% search latency" (Ch20 case 1 previously mis-cited SIGCOMM 2017).
Meta · HTTP/3 at Facebook
Engineering Blog · 2020-10 · Meta 把 Facebook / Instagram 切到 H3 的过程,包括 BBR 比 CUBIC 多 1-3% throughput 的数据。Ch20 / Ch13。 Meta's migration of Facebook / Instagram to H3, including BBR-vs-CUBIC throughput numbers. Ch20 / Ch13.
Fastly · H3 status
Fastly Blog · 2022
Robin Marx
"HTTP/3 From A To Z: Core Concepts" · Smashing Magazine 2021 · 提供 Ch23 critique 中 "raises the floor, not the ceiling" 的原始措辞。 Source of the Ch23 critique phrasing "raises the floor, not the ceiling".

F · 安全 / 加密 / 学术

F · Security / cryptography / academic

QUIC formal verification
Goel et al. "QUICTester" · CCS 2021 · 用 model checking 找 QUIC 实现里的安全 bug。Ch08 提到的 0-RTT replay。 Model-checking QUIC implementations for security bugs. Ch08 0-RTT replay.
0-RTT replay analysis
Fischlin & Günther. "Replay Attacks on 0-RTT" · IACR ePrint 2018-082 · Ch08 三道防线的理论框架。 Theoretical framing of Ch08's three lines of defence.
Robin Marx · QUIC HOL paper
"Same Standards Different Decisions: A Study of QUIC and HTTP/3 Implementation Diversity" · EPIQ 2020 · 不同 QUIC 实现在 HOL / 调度上的差异度量。Ch11 / Ch16。 Measuring HOL / scheduling differences across QUIC implementations. Ch11 / Ch16.

G · IETF 工作组 · 在哪里看下一份草案

G · IETF working groups · where the next draft lives

quic
datatracker.ietf.org/wg/quic · QUIC 协议本体维护组。版本协商、long-header retirement 等仍在演进。Maintains the QUIC protocol itself. Version negotiation, long-header retirement, etc. still in motion.
httpbis
datatracker.ietf.org/wg/httpbis · HTTP/3 + QPACK + 语义,以及未来的 HTTP/4。HTTP/3 + QPACK + semantics, plus a future HTTP/4.
masque
datatracker.ietf.org/wg/masque · CONNECT-UDP / CONNECT-IP / Capsule。CONNECT-UDP / CONNECT-IP / Capsule.
moq
datatracker.ietf.org/wg/moq · Media over QUIC,2023 成组,Twitch / Meta 主推。Media over QUIC, chartered 2023, driven by Twitch / Meta.
tls
datatracker.ietf.org/wg/tls · TLS 1.3 + ECH + Encrypted SNI 持续在演进。TLS 1.3 + ECH + Encrypted SNI still evolving.
RFC 不是终点。
它只是"这一刻全世界同意了"的快照。 Field Note · 06 · Fin
An RFC is not the end.
It's a snapshot of "what the world agreed on, at this moment". Field Note · 06 · Fin

从你按下回车,
到屏幕上跳出 200 OK
HTTP/3 用 13 步把一次请求
封装成一个 UDP 包,
跨过四层加密,
在一个 RTT 里完成。

From the moment you press Enter,
to the moment 200 OK appears,
HTTP/3 wraps a request in a UDP datagram,
crosses four encryption layers,
and finishes in a single RTT —
in thirteen movements.

FIN // END OF FIELD NOTE 06
✦ ✦ ✦