A single GET has to pass through thirteen protocol stages on top of UDP, cross four encryption levels, and traverse three classes of stream before it can return a single 200 OK. After that, the connection still ends in one of three ways: close, drain, or revive.
This is a field map of HTTP/3 and QUIC, with every step pinned to the relevant RFC clause.
three formulas, one protocol skeleton
To most people, "HTTP" is a thing — the protocol your browser uses to fetch a page. Engineers who keep thinking of it as one thing will never understand why HTTP/3 exists. HTTP has never been one protocol; it has always been the product of three orthogonal layers.
| Version | Semantics | Framing | Transport |
|---|---|---|---|
| HTTP/0.9 (1991) | GET only | — | TCP |
| HTTP/1.0 (1996, RFC 1945) | headers, methods | ASCII, 1 req / conn | TCP |
| HTTP/1.1 (1997-2022, RFC 9112) | same + chunked, keepalive | ASCII, keepalive, pipelining | TCP (+ TLS) |
| HTTP/2 (2015-2022, RFC 9113) | RFC 9110 | binary · mux · HPACK | TCP + TLS 1.2/1.3 |
| HTTP/3 (2022, RFC 9114) | RFC 9110 | binary · simpler · QPACK | QUIC (UDP + TLS 1.3) |
from Tim Berners-Lee's first GET to Cloudflare's 50% global traffic
HTTP/3 didn't appear from nowhere. It's the product of thirty years of trial and error: HTTP/0.9's one-line GET /, SPDY's experiments, HTTP/2's binary framing, finally QUIC dragging TCP into user space. Each step drops one assumption.
| Year | Event | Person / Doc |
|---|---|---|
| 1991 | HTTP/0.9: single-line GET / | Tim Berners-Lee · CERN |
| 1996 | HTTP/1.0 · RFC 1945 | Henrik Frystyk Nielsen · W3C |
| 1997 | HTTP/1.1 · RFC 2068 → 2616 (1999) → 7230 (2014) → 9112 (2022) | Roy Fielding · UCI |
| 2008 | TLS 1.2 · RFC 5246 | Tim Dierks · Eric Rescorla |
| 2009 | SPDY experimental in Chrome | Mike Belshe · Roberto Peon · Google |
| 2012 | gQUIC at Google | Jim Roskind |
| 2015 | HTTP/2 · RFC 7540 | Mark Nottingham · Martin Thomson |
| 2016 | IETF QUIC WG chartered | Mark Nottingham · Lars Eggert |
| 2018 | TLS 1.3 · RFC 8446 | Eric Rescorla · Mozilla |
| 2018-11 | "HTTP/3" name finalised | Mark Nottingham · IETF 103 |
| 2021-05 | RFC 9000/9001/9002 · QUIC v1 | Iyengar · Thomson · Turner · Swett |
| 2022-06 | RFC 9114 · HTTP/3 | Mike Bishop · Akamai |
| 2022-06 | RFC 9204 · QPACK | Charles 'Buck' Krasic · Mike Bishop · Alan Frindell |
| 2023 | RFC 9460 · HTTPS RR (SVCB) | Ben Schwartz · Mike Bishop · Erik Nygren |
| 2023 | QUIC v2 · RFC 9369 · field re-shuffle, anti-ossification | Martin Duke |
why it took seven years to find out HTTP/2 wasn't enough
When HTTP/2 shipped in 2015, everyone thought HTTP was finally "done". It swapped ASCII for binary framing, collapsed 6 TCP connections into 1, and shrank headers by roughly 95% with HPACK. Three years of production later, engineers found that HTTP/2 had left four problems it could not cure, and none of them were HTTP/2's fault. They were TCP's.
HTTP/2 multiplexes 100 streams at the application layer, but TCP at the transport layer still demands in-order delivery. Drop one packet, the entire TCP connection halts — even if the other 99 streams are unrelated. This is TCP head-of-line blocking.
Measured: at 3% loss, HTTP/2 often loses to HTTP/1.1 multi-connection.
HTTP/2 must run over TLS (in practice). A fresh connection needs TCP SYN/SYN-ACK/ACK (1 RTT) plus the TLS 1.2 handshake (2 RTT), 3 RTT in total; with TLS 1.3 it is still 2 RTT (1 for TCP, 1 for TLS), and TCP Fast Open rarely survives the path well enough to shave that further. At a 200 ms intercontinental RTT, you spend 400-600 ms before saying a word.
Measured: on 4G/5G, handshake alone often eats the page's entire LCP budget.
A TCP connection is identified by the 4-tuple (src_ip, src_port, dst_ip, dst_port). When a phone switches from Wi-Fi to 5G, src_ip changes: the TCP connection dies on the spot, and the TLS session goes with it. That long-lived WebSocket inside your SPA? Gone.
Measured: Meta attributes ~5% of video stalls to network switches.
Middleboxes — ISP NATs, enterprise firewalls, CDNs — inspect TCP/TLS fields and silently drop anything new. RFC-permitted extensions get blackholed in flight. TLS 1.3 ended up disguising itself as TLS 1.2. HTTP/3 just hides inside UDP.
Measured: early TLS 1.3 saw ~3% middlebox drops.
"HTTP/2 cured HTTP, and then TCP crippled HTTP/2." Daniel Stenberg · curl · 2018
That was IETF's first instinct in 2015-2016. But TCP lives in the kernel — any field change waits for Linux / Windows / iOS / Android / every router to ship a new version. Look at TCP Fast Open (RFC 7413, 2014): ten years on, deployment is still < 5%, because middleboxes drop its cookie.
Conclusion: evolving on top of TCP means evolving on a decade timescale.
not because UDP is good, but because nobody touches UDP
"Why does QUIC run on UDP?" is the first question every HTTP/3 talk has to answer. The intuitive answer — "UDP isn't reliable, so QUIC has to add its own reliability" — is wrong. That's a consequence, not a cause. The real reason is one sentence: UDP is the only protocol number left on the modern internet that middleboxes don't mess with.
| Option | Pros | Why not |
|---|---|---|
| SCTP | native multi-stream, message-based | IP protocol 132: most NATs drop it, ~50% loss |
| DCCP | unordered with cc | IP protocol 33: same, < 0.1% deployed |
| New IP protocol | theoretically cleanest | needs every router/NAT/firewall on Earth to upgrade, which is impossible |
| TCP option | reuse existing conn | middleboxes strip unknown options |
| UDP | UDP/443 traverses everywhere | have to rebuild TCP in user space, but that's exactly what QUIC wants |
Moving everything TCP did (retransmit, cc, flow control, mux, connection management) into user space means every QUIC packet has to: enter kernel → recvfrom() copy to user space → decrypt → handle → encrypt → sendto() copy back → NIC. Fastly's 2020 measurement: QUIC costs ~2x the CPU of TCP+TLS. That is HTTP/3's real downside, and we will revisit it in chapter 22.
memorise the skeleton before diving into each chapter
QUIC's design fits into three small numbers: 4 encryption levels (Initial / 0-RTT / Handshake / 1-RTT), 3 Packet Number spaces (Initial / Handshake / Application), 2 Header types (Long / Short). The relationship between these three numbers is the pre-read skeleton for every later chapter.
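The relationship between the three numbers can be written down as two small lookup tables. This is only a sketch: the `Level` enum and the dictionary names are illustrative, not taken from any QUIC library. The one non-obvious fact it encodes is that 0-RTT and 1-RTT packets share the Application packet number space, which is why four levels collapse into three spaces.

```python
from enum import Enum


class Level(Enum):
    """The four encryption levels named in the text."""
    INITIAL = "Initial"
    ZERO_RTT = "0-RTT"
    HANDSHAKE = "Handshake"
    ONE_RTT = "1-RTT"


# Packet number space used by each level.
# 0-RTT and 1-RTT share the Application space, so 4 levels map to 3 spaces.
PN_SPACE = {
    Level.INITIAL: "Initial",
    Level.ZERO_RTT: "Application",
    Level.HANDSHAKE: "Handshake",
    Level.ONE_RTT: "Application",
}

# Header form used by each level: only 1-RTT packets use the short header.
HEADER = {
    Level.INITIAL: "long",
    Level.ZERO_RTT: "long",
    Level.HANDSHAKE: "long",
    Level.ONE_RTT: "short",
}


def describe(level: Level) -> str:
    """One-line summary of where a level sits in the skeleton."""
    return f"{level.value}: {PN_SPACE[level]} PN space, {HEADER[level]} header"


for level in Level:
    print(describe(level))
```

Counting the distinct values in each table gives back the 4 / 3 / 2 from the paragraph above.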
↓ UDP/443 · IPv4 / IPv6 · link layer
Note that TLS 1.3 is not below QUIC but inside QUIC. QUIC carries TLS 1.3 handshake messages inside CRYPTO frames, not the other way around. That is why RFC 9001 is titled "Using TLS to Secure QUIC" rather than "QUIC over TLS".
Long header. Fields: Flags(8) · Version(32) · DCID Len(8) · DCID · SCID Len(8) · SCID · Type-specific...
Used by: Initial · 0-RTT · Handshake · Retry
Short header. Fields: Flags(8) · DCID · PN(8/16/24/32)
Used by: 1-RTT only
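As a rough illustration of the long-header field list above, the version-independent part can be read with plain slicing. This is a minimal sketch, not a full parser: `parse_long_header` is a hypothetical helper, the sample bytes are made up for the example, and everything after the SCID (token, length, packet number) is left untouched because it is type-specific.

```python
def parse_long_header(datagram: bytes) -> dict:
    """Parse Flags, Version, DCID and SCID from a QUIC long header."""
    first = datagram[0]
    if not first & 0x80:
        raise ValueError("not a long header (Header Form bit is 0)")

    version = int.from_bytes(datagram[1:5], "big")

    pos = 5
    dcid_len = datagram[pos]
    dcid = datagram[pos + 1 : pos + 1 + dcid_len]
    pos += 1 + dcid_len

    scid_len = datagram[pos]
    scid = datagram[pos + 1 : pos + 1 + scid_len]
    pos += 1 + scid_len

    return {
        "first_byte": first,
        "version": version,
        "dcid": dcid,
        "scid": scid,
        "rest_offset": pos,  # where the type-specific fields begin
    }


# Illustrative bytes: flags, version 1, an 8-byte DCID, an empty SCID.
sample = bytes.fromhex("c3" "00000001" "08" "8394c8f03e515708" "00")
header = parse_long_header(sample)
print(header["version"], header["dcid"].hex(), header["rest_offset"])
```

A short header cannot be parsed this way: it carries no DCID length, so the receiver has to know the length of the connection IDs it issued.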
from DNS query to 200 OK to connection close · every step pinned to its RFC §
The next 14 pipeline chapters all hang off one request: type https://ursb.me in Chrome, press Enter. We follow this request's byte stream through its full life: DNS query, first handshake, request payload, response, idle, network switch, graceful close — 10 phases. Every chapter below carries a "◇ In our GET request" card showing input, transform, output at that stage.
The cast on this main line:
Chrome can't fire a QUIC packet yet — it needs DNS first: where's ursb.me? Which ALPNs does it speak? Chrome queries 1.1.1.1 over DoH (RFC 8484), asking simultaneously for A (IPv4) and the new HTTPS RR (RFC 9460). The latter returns ALPN + IP hint in one record, saving an RTT.
5 ms after the DNS response, Chrome assembles the first actual QUIC packet. Because we have a PSK ticket from last visit, this is a 0-RTT send: ClientHello and the GET request ride in the same UDP datagram.
20 ms later the first server datagram arrives. This is a classic coalesced case: the server packs Initial, Handshake and 1-RTT packets all into one UDP datagram, carrying CRYPTO frames for different handshake stages plus the first batch of response data.
A few packets later, the complete body lands by ~45 ms. The 3200-byte HTML rides Stream 0 in two DATA frames spread across 1-RTT packets. The 0-RTT win is concrete here: the user sees 200 OK before the handshake is fully closed.
Once Chrome receives the 3200 bytes, the STREAM frame carries FIN=1 — no more data this direction. The client replies with an empty STREAM(FIN) to close its direction — bidirectional half-close semantics. It also ACKs HANDSHAKE_DONE, allowing the server to drop the Handshake keys.
The connection doesn't close immediately — Chrome holds it for 30 s, hoping the next request (CSS, images, an API call) reuses it. Either side may send PING frames (RFC 9000 §19.2) to keep NAT mappings alive. max_idle_timeout was negotiated in TP — min(client 30s, server 30s) = 30s.
8 minutes later the user walks out of the café and the phone switches to 5G — src_ip flips from 192.168.1.42 to 10.220.5.13. Chrome activates the pre-stocked spare CID; the server sees a new IP with a valid CID and fires PATH_CHALLENGE. Path validated in one RTT; the connection survives.
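The path-validation step in this phase boils down to an 8-byte echo: PATH_CHALLENGE carries unpredictable bytes and PATH_RESPONSE must return them unchanged. The sketch below models only that bookkeeping; `PathValidator` is a hypothetical class, and real stacks also handle timers, several outstanding challenges, and anti-amplification limits.

```python
import hmac
import os
from typing import Optional


class PathValidator:
    """Tracks one outstanding PATH_CHALLENGE for a new network path."""

    def __init__(self) -> None:
        self.pending: Optional[bytes] = None

    def challenge(self) -> bytes:
        """Generate the 8 unpredictable bytes carried by PATH_CHALLENGE."""
        self.pending = os.urandom(8)
        return self.pending

    def on_response(self, data: bytes) -> bool:
        """Return True only if PATH_RESPONSE echoes the pending challenge."""
        if self.pending is None:
            return False
        ok = hmac.compare_digest(data, self.pending)
        if ok:
            self.pending = None  # path is validated; challenge is consumed
        return ok


validator = PathValidator()
payload = validator.challenge()
print(validator.on_response(payload))  # the peer echoed our bytes
```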
15 minutes in, the server decides to retire this connection (rolling deploy, load balancing, quota expiry). It sends GOAWAY (H3 frame 0x07): "I'll finish in-flight streams but accept no new ones." After the last stream is done, it sends CONNECTION_CLOSE (QUIC frame 0x1c). The server then sits in the closing state, and the client that receives the frame sits in the draining state, each for 3 PTO (RFC 9000 §10.2), so that late packets are not mistaken for a "new" connection. See Ch19.
If the server process unexpectedly restarts (OOM, crash, container upgrade), the next 1-RTT packet from the client finds the new process without any matching connection state. The new process can't send CONNECTION_CLOSE (no keys, no state). Instead it emits a Stateless Reset: a packet that looks like random UDP bytes but ends in the 16-byte reset token the original server pre-distributed via NEW_CONNECTION_ID in Phase 2. Only the client can recognise the token — and only then can it safely conclude "peer really lost state" and tear down locally. This is the stateless-recovery path of RFC 9000 §10.3.
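On the client side, recognising a Stateless Reset comes down to comparing the last 16 bytes of an undecryptable datagram against the reset tokens the peer handed out earlier. A minimal sketch, assuming the tokens are kept in a plain list; `is_stateless_reset` and the 21-byte floor (5 bytes of header-like noise plus the token) are written here for illustration.

```python
import hmac

TOKEN_LEN = 16
MIN_RESET_LEN = 21  # 5 bytes that look like a short header + 16-byte token


def is_stateless_reset(datagram: bytes, known_tokens: list[bytes]) -> bool:
    """Check whether a datagram that failed decryption is a Stateless Reset."""
    if len(datagram) < MIN_RESET_LEN:
        return False
    tail = datagram[-TOKEN_LEN:]
    found = False
    for token in known_tokens:
        # Compare against every token in constant time so that timing
        # does not reveal which token (if any) matched.
        found |= hmac.compare_digest(tail, token)
    return found


token = bytes(range(16))
reset = b"\x40" + b"\xaa" * 20 + token
print(is_stateless_reset(reset, [token]))
```

Only after this check succeeds should the client tear the connection down; any other undecryptable datagram is simply dropped.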
Each chapter below carries a "◇ In our GET request" card that anchors its input / action / output to the 10 phases above. Use this table as the reading map:
| Main-line phase | Drill-down chapter | RFC § |
|---|---|---|
| Phase 0 · DNS | Ch22 Field work · Ch04 UDP | 9460 · 8484 · 9250 |
| Phase 1 · Initial out | Ch06 UDP Datagram · Ch08 0-RTT | 9000 §17.2 · 9001 §4 |
| Phase 2 · Server crypto | Ch07 Handshake · Ch09 Crypto layers | 9001 §4-§5 · 8446 §4 |
| Phase 3 · 200 OK | Ch14 H3 frames · Ch15 QPACK | 9114 §7 · 9204 |
| Phase 4 · FIN | Ch11 Streams · Ch12 Loss | 9000 §3 (states) · §19.8 |
| Phase 5 · idle | Ch13 Congestion | 9000 §10.1 · §19.2 PING |
| Phase 6 · migration | Ch17 Migration | 9000 §9 |
| Phase 7-8 · close | Ch19 Lifecycle (new) | 9114 §5.2 · 9000 §10 |
| Phase 9 · stateless reset | Ch19 Lifecycle (new) | 9000 §10.3 · §18.2 |
"DNS in 5 ms · handshake + 0-RTT GET in 25 ms · 200 OK at 45 ms · gracefully closed 15 minutes later. Half the time was in crypto, a third in waiting on the speed of light." main-line · phase summary
QUIC packet structure, byte by byte
Main-line time T+20ms: Chrome wraps a still-mostly-empty TLS 1.3 ClientHello into one UDP datagram, sent from source port 52341 to destination 39.105.102.252:443. This chapter takes that packet apart byte by byte.
The first byte of every QUIC packet is pure structure. Header Form distinguishes long from short headers. Fixed Bit must be 1, which keeps QUIC distinguishable from other protocols sharing the same port. In a long header, the 2 Long Packet Type bits select Initial / 0-RTT / Handshake / Retry; in a short header, one bit is the Spin Bit (operator RTT observability) and another is Key Phase (key-rotation marker). The low bits are encrypted by Header Protection, the low 4 (Reserved + PN Length) in a long header and the low 5 (Reserved + Key Phase + PN Length) in a short header, so middleboxes cannot read packet numbers for traffic analysis. The whole first-byte layout of RFC 9000 §17 fits in this single byte.
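The bit layout described above can be decoded with a handful of masks. This sketch assumes header protection has already been removed (otherwise the low bits are still scrambled) and uses the QUIC v1 type codes; `decode_first_byte` is an illustrative helper, not a library call.

```python
LONG_TYPES = {0: "Initial", 1: "0-RTT", 2: "Handshake", 3: "Retry"}  # QUIC v1


def decode_first_byte(b: int) -> dict:
    """Decode the first byte of a QUIC v1 packet (header protection removed)."""
    info = {"long": bool(b & 0x80), "fixed_bit": bool(b & 0x40)}
    if info["long"]:
        info["type"] = LONG_TYPES[(b >> 4) & 0x03]
        if info["type"] != "Retry":  # Retry carries no packet number
            info["reserved"] = (b >> 2) & 0x03
            info["pn_length"] = (b & 0x03) + 1
    else:
        info["spin"] = bool(b & 0x20)
        info["reserved"] = (b >> 3) & 0x03
        info["key_phase"] = bool(b & 0x04)
        info["pn_length"] = (b & 0x03) + 1
    return info


print(decode_first_byte(0xC3))  # long header, Initial, 4-byte packet number
print(decode_first_byte(0x40))  # short header, 1-byte packet number
```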
QUIC encrypts not only the payload but also the packet number and the low bits of the flags. The recipe: take a 16-byte sample of the ciphertext payload, run AES-ECB with the level's HP key to derive a mask, XOR the mask onto PN and flags. This "header protection" specifically defeats middleboxes that would otherwise read PN for traffic analysis.
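The XOR step of that recipe is small enough to show in full. The sketch below takes the 5-byte mask as an input, because deriving it needs AES, which the Python standard library does not provide; the header bytes and mask are sample values chosen for the example, and `apply_mask` / `remove_mask` are hypothetical helper names.

```python
def apply_mask(packet: bytearray, pn_offset: int, mask: bytes) -> None:
    """Sender side: XOR the mask onto the first byte and the packet number."""
    pn_length = (packet[0] & 0x03) + 1            # read before it is masked
    low_bits = 0x0F if packet[0] & 0x80 else 0x1F  # long: 4 bits, short: 5 bits
    packet[0] ^= mask[0] & low_bits
    for i in range(pn_length):
        packet[pn_offset + i] ^= mask[1 + i]


def remove_mask(packet: bytearray, pn_offset: int, mask: bytes) -> None:
    """Receiver side: unmask the first byte first, then the packet number."""
    low_bits = 0x0F if packet[0] & 0x80 else 0x1F
    packet[0] ^= mask[0] & low_bits               # now PN length is readable
    pn_length = (packet[0] & 0x03) + 1
    for i in range(pn_length):
        packet[pn_offset + i] ^= mask[1 + i]


# Sample long header: flags, version, DCID, empty SCID, empty token,
# length, and a 4-byte packet number starting at offset 18.
header = bytearray.fromhex("c300000001088394c8f03e5157080000449e00000002")
mask = bytes.fromhex("437b9aec36")

apply_mask(header, 18, mask)
print(header.hex())
remove_mask(header, 18, mask)
print(header.hex())
```

The asymmetry between the two functions is the point: the receiver cannot know how many packet-number bytes to unmask until it has unmasked the first byte.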
The quickest way to see H3 in Chrome: DevTools → Network → enable the Protocol column. A row marked h3 = HTTP/3. If it says h2, something blocked you and the browser fell back to TCP. To diagnose, dump chrome://net-export/ and load it into netlog-viewer.appspot.com.
QUIC isn't TLS over UDP, QUIC carries TLS
Killing the "2-RTT minimum" left over from HTTP/2 is HTTP/3's biggest selling point. To see why HTTP/3 hits 1-RTT (and 0-RTT on resumption), look at how QUIC and TLS 1.3 merge: not as stacked layers, but with QUIC carrying TLS 1.3 handshake messages inside CRYPTO frames, so that the handshake and the first application data share a single RTT.
A full 1-RTT handshake involves about 6 QUIC packets coalesced into 4 UDP datagrams, protected by successive key sets: Initial keys (publicly derivable) → Handshake keys (after ServerHello) → 1-RTT keys (after both sides' Finished). TLS 1.3's ClientHello / ServerHello / EncryptedExtensions / Certificate / CertificateVerify / Finished are carried inside CRYPTO frames but are never wrapped in TLS records; QUIC handles ordering and retransmission itself. The full handshake completes in 1 RTT: the client may send its first HTTP request the moment it has sent its own Finished (round 3b), without waiting for the server's confirmation.
| Protocol | Handshake | + first data | Total |
|---|---|---|---|
| TCP + TLS 1.2 | 1 RTT (SYN) + 2 RTT (TLS) | + 1 RTT | 4 RTT |
| TCP + TLS 1.3 | 1 RTT (SYN) + 1 RTT (TLS) | + 1 RTT | 3 RTT |
| TCP Fast Open + TLS 1.3 | 0.5 RTT (TFO) + 1 RTT | + 1 RTT | 2 RTT* |
| QUIC + TLS 1.3 (1-RTT) | 1 RTT (handshake + data) | - | 1 RTT |
| QUIC + TLS 1.3 (0-RTT) | 0 RTT (data on first packet) | - | 0.5 RTT |
During the handshake both sides declare a set of transport parameters (TP), carried in an extension of the TLS ClientHello / EncryptedExtensions. They are the single source of truth for every window, timeout, and limit in the connection's lifetime. The table lists the thirteen that matter most out of the eighteen standard parameters:
| id · name | Meaning | Chrome default |
|---|---|---|
| 0x01 max_idle_timeout | idle timeout (min of both) | 30 s |
| 0x02 stateless_reset_token | used by §10.3 stateless reset | 16 B random |
| 0x03 max_udp_payload_size | max UDP payload accepted | 1452 |
| 0x04 initial_max_data | connection-level flow window | 10 MB |
| 0x05 initial_max_stream_data_bidi_local | stream window for bidi streams we open | 6 MB |
| 0x06 initial_max_stream_data_bidi_remote | stream window for bidi streams the peer opens | 6 MB |
| 0x07 initial_max_stream_data_uni | stream window for unidirectional streams | 6 MB |
| 0x08 initial_max_streams_bidi | concurrent bidi stream cap | 100 |
| 0x09 initial_max_streams_uni | uni stream cap | 100 |
| 0x0b max_ack_delay | max ACK delay (drives PTO) | 25 ms |
| 0x0c disable_active_migration | opt-out of active migration (phones keep it false) | false |
| 0x0e active_connection_id_limit | how many peer-issued CIDs we are willing to store | 8 |
| 0x20 max_datagram_frame_size | DATAGRAM frame max (0 = disabled) | 0 / 1200 |
TP is not a negotiation; it is a pair of declarations. Each side independently states what it will accept, and the peer must stay within it. For max_idle_timeout the effective value is the smaller of the two: both say 30s ⇒ 30s; client says 30s, server says 10s ⇒ 10s. Some parameters (such as stateless_reset_token and preferred_address) are server-only; a client that sends one commits a protocol violation.
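The max_idle_timeout rule is simple enough to state as a function. A minimal sketch, assuming timeouts are given in milliseconds and that an absent or zero value means "this side imposes no idle timeout"; `effective_idle_timeout` is an illustrative name.

```python
from typing import Optional


def effective_idle_timeout(local_ms: Optional[int],
                           peer_ms: Optional[int]) -> Optional[int]:
    """Smaller of the two advertised timeouts; None if neither side sets one."""
    advertised = [t for t in (local_ms, peer_ms) if t]  # drop None and 0
    return min(advertised) if advertised else None


print(effective_idle_timeout(30_000, 30_000))  # both say 30 s
print(effective_idle_timeout(30_000, 10_000))  # the tighter value applies
```

Nothing is exchanged to arrive at this number: each endpoint computes it locally from its own declaration and the peer's.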
* TFO cookies frequently get stripped by middleboxes; in practice not considered "really usable".
a free lunch, with a replay-attack tail
After the first visit to ursb.me, the server appends a NewSessionTicket at the tail of the 1-RTT handshake — an opaque blob encrypted by the server's own key, containing a PSK. Chrome stores it. On the next visit, Chrome ships the ticket back, and simultaneously encrypts the GET request with the PSK-derived 0-RTT key and sends it as Early Data — application bytes are flying before RTT 1.
The 0-RTT PSK carries no freshness. An attacker can record your first UDP datagram and replay it forever — the server can't tell "you" from "tape rewind". Fine for an idempotent GET (same answer). Catastrophic for POST /transfer/100USD — that's a hundred transfers.
0-RTT's lack of freshness means an attacker who captures one UDP datagram can replay it indefinitely. Three layers cut different windows: ① client method whitelist prevents write methods from entering the 0-RTT path at all; ② the server Early-Data: 1 header lets the app layer decide per request whether to accept; ③ TLS-layer 10s time window + anti-replay dedup cache structurally kills both > 10s replays and re-sends inside 10s. Stacked, Cloudflare's measurements push residual replay risk close to zero.
② Server side: requests that arrived in early data are marked with Early-Data: 1. The application layer (e.g. a Cloudflare Worker) can then choose "don't process" or "return 425 Too Early".
③ TLS layer: SSL_CTX_set_early_data_enabled plus a cluster-wide deduper.
Cloudflare enables 0-RTT for all customers by default, but only for GET/HEAD requests without a query string (queries are often state-changing). If the origin returns Cache-Control: private or Set-Cookie, the Cloudflare edge auto-promotes the request to 1-RTT before forwarding upstream. Per their blog «Even faster connection establishment with QUIC 0-RTT resumption», 0-RTT lowers median TTFB for returning users by ~50ms.
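The policy described above (safe methods only, no query string, otherwise 425 Too Early) can be sketched as a small gate in front of the application. This is an illustration of the rule as the text states it, not Cloudflare's code; `reject_as_too_early` and the `arrived_in_early_data` flag are hypothetical names standing in for whatever the TLS stack or an Early-Data: 1 header reports.

```python
from urllib.parse import urlsplit

SAFE_METHODS = {"GET", "HEAD"}
TOO_EARLY = 425


def reject_as_too_early(method: str, url: str,
                        arrived_in_early_data: bool) -> bool:
    """Return True if the request should be answered with 425 Too Early."""
    if not arrived_in_early_data:
        return False                      # normal 1-RTT data: always fine
    if method.upper() not in SAFE_METHODS:
        return True                       # writes never ride 0-RTT
    if urlsplit(url).query:
        return True                       # query strings may change state
    return False


for method, url in [("GET", "https://ursb.me/"),
                    ("GET", "https://ursb.me/search?q=quic"),
                    ("POST", "https://ursb.me/transfer/100USD")]:
    verdict = TOO_EARLY if reject_as_too_early(method, url, True) else "process"
    print(method, url, "->", verdict)
```

A client that receives 425 retries the same request once the handshake has completed, at which point the replay window is closed.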
why Initial packets are "encrypted" yet anyone can decrypt them
QUIC uses four independent key sets, not one — each set contains key 16B + iv 12B + hp_key 16B. The four mirror TLS 1.3's secret hierarchy (initial / early / handshake / master), derived via HKDF. Initial keys use a public salt — so Initial packets are decryptable by anyone; real secrecy starts at Handshake keys. The design grants each packet class exactly the security level it needs while allowing cross-level packets to flow during handshake. Key update rotates 1-RTT keys without a re-handshake via the short header's Key Phase bit.
Initial packets derive their keys from a public salt (RFC 9001 §5.2 spells out 0x38762cf7…7f0a) plus the client-chosen DCID. Anyone can compute them. So Initial-packet "encryption" does not protect confidentiality; it keeps middleboxes from casually reading the ClientHello and then acting on what they saw. This is anti-ossification at the key-schedule layer.
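To make "anyone can compute them" concrete, the derivation can be reproduced with nothing but the standard library. The sketch assumes QUIC v1 (its salt and the `client in` / `quic key` / `quic iv` / `quic hp` labels from RFC 9001) and AES-128-GCM key sizes; the helper names and the sample DCID are illustrative.

```python
import hashlib
import hmac

# QUIC v1 Initial salt (public, fixed per version).
INITIAL_SALT_V1 = bytes.fromhex("38762cf7f55934b34d179ae6a4c80cadccbb7f0a")


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    return hmac.new(salt, ikm, hashlib.sha256).digest()


def hkdf_expand_label(secret: bytes, label: str, length: int) -> bytes:
    """TLS 1.3 HKDF-Expand-Label with an empty context."""
    full_label = b"tls13 " + label.encode()
    info = (length.to_bytes(2, "big") + bytes([len(full_label)])
            + full_label + b"\x00")
    out, block, counter = b"", b"", 1
    while len(out) < length:
        block = hmac.new(secret, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        out += block
        counter += 1
    return out[:length]


def client_initial_keys(dcid: bytes) -> dict:
    """Derive the client's Initial key set from the DCID alone."""
    initial_secret = hkdf_extract(INITIAL_SALT_V1, dcid)
    client_secret = hkdf_expand_label(initial_secret, "client in", 32)
    return {
        "key": hkdf_expand_label(client_secret, "quic key", 16),
        "iv": hkdf_expand_label(client_secret, "quic iv", 12),
        "hp": hkdf_expand_label(client_secret, "quic hp", 16),
    }


keys = client_initial_keys(bytes.fromhex("8394c8f03e515708"))  # sample DCID
print({name: value.hex() for name, value in keys.items()})
```

Every input here is visible on the wire or fixed by the spec, which is exactly why the Initial level offers no confidentiality.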
Wireshark needs SSLKEYLOGFILE to decrypt QUIC: the browser writes each level's secret to that file, and Wireshark can decode all four. On macOS, launch Chrome with SSLKEYLOGFILE=~/keys.log /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome. Once Encrypted Client Hello (ECH, draft-ietf-tls-esni) stabilises (Cloudflare turned it on in 2023; Chrome 117+ ships it on by default), this trick only yields the outer ClientHello — the real SNI lives inside an HPKE-encrypted inner ClientHello.
payload isn't a byte stream, it's a chain of frames
Decrypt a QUIC packet's payload and you don't get "a chunk of data" — you get a chain of frames. Each carries its own type and length; both ends process them in order. Below is the full RFC 9000 §19 catalogue (plus RFC 9221 DATAGRAM), sorted into four families.
The DATAGRAM frame is QUIC's only unreliable payload: no retransmission, no flow control, no ordering. Its maximum size is capped by the max_datagram_frame_size transport parameter (absent by default, so each side must advertise it before the peer may send DATAGRAM frames). Its sole purpose is to enable WebTransport / MASQUE / Media-over-QUIC, the "better drop than wait" real-time use cases. Plain HTTP/3 traffic should never touch it.
The low 3 bits of a STREAM frame encode three independent flags: OFF (carries an offset?), LEN (carries a length?), FIN (is this stream's last byte?). 2³ = 8 type codes 0x08-0x0f.
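Decoding those three flags is a matter of three bit tests. A small sketch; `decode_stream_type` is an illustrative helper name.

```python
def decode_stream_type(frame_type: int) -> dict:
    """Split a STREAM frame type (0x08-0x0f) into its OFF / LEN / FIN flags."""
    if not 0x08 <= frame_type <= 0x0F:
        raise ValueError(f"0x{frame_type:02x} is not a STREAM frame type")
    return {
        "off": bool(frame_type & 0x04),  # an explicit Offset field follows
        "len": bool(frame_type & 0x02),  # an explicit Length field follows
        "fin": bool(frame_type & 0x01),  # this frame ends the stream
    }


for t in range(0x08, 0x10):
    print(f"0x{t:02x}", decode_stream_type(t))
```

A frame without LEN runs to the end of the packet, and a frame without OFF starts at offset 0, which is how the smallest requests avoid both fields.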
Almost every length field in QUIC uses variable-length integers (var-ints): the top 2 bits decide 1/2/4/8-byte encoding. 0x37 = 55; 0x40 0x40 = 64; 0x80 0x00 0x40 0x00 = 16384. This "small = small" encoding makes small frames like ACKs ~30% smaller on average — an invisible source of QUIC's throughput edge.
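The three examples in the paragraph can be reproduced with a short encoder and decoder. This is a minimal sketch that always picks the shortest encoding; the function names are illustrative.

```python
def encode_varint(value: int) -> bytes:
    """Encode an integer using the shortest QUIC var-int form."""
    if value < 0:
        raise ValueError("var-ints are unsigned")
    for prefix, size in ((0, 1), (1, 2), (2, 4), (3, 8)):
        usable_bits = 8 * size - 2          # top 2 bits hold the length prefix
        if value < (1 << usable_bits):
            return (value | (prefix << usable_bits)).to_bytes(size, "big")
    raise ValueError("value needs more than 62 bits")


def decode_varint(buf: bytes) -> tuple[int, int]:
    """Decode one var-int from the start of buf; return (value, bytes used)."""
    size = 1 << (buf[0] >> 6)               # 00 -> 1, 01 -> 2, 10 -> 4, 11 -> 8
    raw = int.from_bytes(buf[:size], "big")
    return raw & ((1 << (8 * size - 2)) - 1), size


for n in (55, 64, 16384):
    encoded = encode_varint(n)
    print(n, encoded.hex(), decode_varint(encoded))
```

The jump from one byte to two happens at 64 and from two to four at 16384, which is why those values appear as the examples.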
QUIC's ACK frame is far more expressive than TCP SACK. One ACK frame can pack multiple ranges, {largest_ack, [gap, ack_range]*}: "I have PN 100, 90-95, 80-85, ..." A single ACK frame can describe as many received ranges as fit in the packet. TCP SACK lives in the 40 bytes of TCP options and is capped at 4 ranges (3 when timestamps are in use); QUIC has no such fixed cap.
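The {largest_ack, [gap, ack_range]*} layout is easier to follow with the arithmetic written out. The sketch below turns a list of received ranges into the fields of an ACK frame, following the encoding in RFC 9000 §19.3.1 where each gap is stored as "missing packets minus one"; `encode_ack_ranges` is a hypothetical helper and it assumes the input ranges neither overlap nor touch.

```python
def encode_ack_ranges(ranges: list[tuple[int, int]]) -> dict:
    """Encode inclusive (smallest, largest) PN ranges as ACK frame fields."""
    ordered = sorted(ranges, key=lambda r: r[1], reverse=True)
    smallest, largest = ordered[0]
    frame = {
        "largest_acknowledged": largest,
        "first_ack_range": largest - smallest,
        "ranges": [],                      # list of (gap, ack_range_length)
    }
    prev_smallest = smallest
    for small, large in ordered[1:]:
        gap = prev_smallest - large - 2    # unacked packets in between, minus 1
        frame["ranges"].append((gap, large - small))
        prev_smallest = small
    return frame


# "I have PN 100, 90-95, 80-85"
print(encode_ack_ranges([(100, 100), (90, 95), (80, 85)]))
```

Each field is then written as a var-int, so a long run of small gaps costs one byte per number.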
HTTP/2 mux at the app layer, HTTP/3 mux at transport
Every stream has a var-int ID. The low 2 bits encode two things at once: direction (bidi / uni) and originator (client / server).
| Low 2 bits | Stream IDs | Meaning | HTTP/3 use |
|---|---|---|---|
| 0x00 | 0, 4, 8, 12, … | Client-initiated bidi | request streams |
| 0x01 | 1, 5, 9, 13, … | Server-initiated bidi | unused in H3 |
| 0x02 | 2, 6, 10, … | Client-initiated uni | control · QPACK enc/dec |
| 0x03 | 3, 7, 11, … | Server-initiated uni | control · QPACK · Push |
Our GET uses StreamID = 0 (the first client-initiated bidi stream). Chrome simultaneously opens three uni streams: StreamID=2 (H3 control), StreamID=6 (QPACK encoder), StreamID=10 (QPACK decoder). This is why the next chapter says "the control stream must open first".
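The IDs quoted above follow directly from the two low bits. A short sketch with illustrative helper names: one function reads the bits back, the other builds the n-th stream ID of a given kind.

```python
def classify_stream(stream_id: int) -> tuple[str, str]:
    """Return (initiator, direction) encoded in the low 2 bits of a stream ID."""
    initiator = "server" if stream_id & 0x01 else "client"
    direction = "uni" if stream_id & 0x02 else "bidi"
    return initiator, direction


def nth_stream_id(n: int, initiator: str, direction: str) -> int:
    """ID of the n-th stream (counting from 0) of the given kind."""
    low_bits = (0x01 if initiator == "server" else 0) | \
               (0x02 if direction == "uni" else 0)
    return (n << 2) | low_bits


for sid in (0, 2, 6, 10, 3):
    print(sid, classify_stream(sid))
```

The three client-initiated unidirectional streams Chrome opens are therefore `nth_stream_id(0..2, "client", "uni")`, that is 2, 6 and 10.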
A QUIC stream is bidirectional = two independent half state machines (sending half + receiving half). Each endpoint sees its own sending and receiving halves of a given stream — they transition independently. Normal lifecycle: Ready → Send → Data Sent → Data Recvd (sender) and Recv → Size Known → Data Recvd → Data Read (receiver). RESET_STREAM sends the sender into Reset Sent; STOP_SENDING lets the receiver ask the sender to reset. This bidirectional decoupling is the formal guarantee of HTTP/3 stream multiplexing.
Each stream tracks MAX_STREAM_DATA. Sender stops when cumulative sent bytes hit the limit. Receiver grows the window with MAX_STREAM_DATA frames.
Sum of all streams' bytes capped by MAX_DATA. Stops one connection from eating all memory. Chrome defaults to 6 MB (OkHttp 25 MB · curl 1 MB).
Flow control runs in two independent dimensions, each with the same set of state variables:
"HTTP/2 stuffs 100 streams into one TCP connection. HTTP/3 stuffs 100 actually independent streams into one QUIC connection." RFC 9000 §2 paraphrased
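The two flow-control dimensions described in this chapter, per-stream MAX_STREAM_DATA and connection-wide MAX_DATA, can be modelled from the sender's side in a few lines. This is a simplified sketch: `FlowController` is a hypothetical class, limits are plain byte counts, and it ignores retransmissions (which do not consume new credit).

```python
class FlowController:
    """Sender-side view of stream-level and connection-level credit."""

    def __init__(self, max_data: int, max_stream_data: int) -> None:
        self.max_data = max_data                    # connection-level limit
        self.initial_stream_limit = max_stream_data
        self.sent_total = 0
        self.stream_limit: dict[int, int] = {}
        self.stream_sent: dict[int, int] = {}

    def sendable(self, stream_id: int) -> int:
        """Bytes that may be sent on this stream right now."""
        limit = self.stream_limit.get(stream_id, self.initial_stream_limit)
        stream_credit = limit - self.stream_sent.get(stream_id, 0)
        conn_credit = self.max_data - self.sent_total
        return max(0, min(stream_credit, conn_credit))

    def on_send(self, stream_id: int, n: int) -> None:
        if n > self.sendable(stream_id):
            raise ValueError("would exceed flow-control credit")
        self.stream_sent[stream_id] = self.stream_sent.get(stream_id, 0) + n
        self.sent_total += n

    def on_max_stream_data(self, stream_id: int, limit: int) -> None:
        """Peer raised one stream's window; limits never move backwards."""
        current = self.stream_limit.get(stream_id, self.initial_stream_limit)
        self.stream_limit[stream_id] = max(current, limit)

    def on_max_data(self, limit: int) -> None:
        """Peer raised the connection window."""
        self.max_data = max(self.max_data, limit)


fc = FlowController(max_data=10, max_stream_data=6)
fc.on_send(0, 6)
print(fc.sendable(0), fc.sendable(4))  # stream 0 is blocked, stream 4 is not
```

The tighter of the two credits always wins, so a single greedy stream can stall itself without stalling its neighbours until the connection-level window is also exhausted.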
why QUIC measures RTT more accurately than TCP
TCP seq numbers identify byte offsets. A retransmission has the same seq as the original. When an ACK arrives, you can't tell whether it's for the original or the retransmit. This is the infamous retransmission ambiguity; it forces TCP to use "Karn's algorithm" and discard retransmit RTT samples.
QUIC 的 packet number 永不复用。重传时新 PN,旧 PN 永远废弃。ACK 回的是哪个 PN,就是哪个 PN——RTT 测量绝对精确。这是 BBR 等高级拥塞控制能在 QUIC 上"开挂"的根源。
QUIC packet numbers are never reused. A retransmit carries a new PN, the old PN is dead forever. An ACK names exactly the PN it acknowledges — RTT samples are exact. This is why advanced cc like BBR runs better on QUIC than TCP.
If no ACK arrives within smoothed_RTT + 4·RTTVAR + max_ack_delay, PTO fires: the sender emits a PING probe to "wake up" the peer. This replaces TCP's RTO and its 1-second hammer.
QUIC runs three independent loss-detection mechanisms in parallel; whichever fires first wins: ① packet threshold (kPacketThreshold = 3): a packet is declared lost once a packet numbered at least 3 higher has been acknowledged; ② time threshold (9/8 × max(smoothed_rtt, latest_rtt)): the backstop for ①; ③ PTO: covers the "nothing in the tail was acked at all" deadlock, with exponential backoff. In this example ① fires first at 65 ms; PTO is the safety net you hope never trips. RFC 9002 §6 encodes all three behind one timer plus one rule set, which the pseudocode later unfolds.
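The three triggers reduce to three small formulas. The sketch below follows the constants and expressions in RFC 9002 §6.1 and §6.2 but leaves out everything around them (per-space timers, ACK processing, RTT estimation); the function names are chosen for this example only.

```rust
use std::time::Duration;

const K_PACKET_THRESHOLD: u64 = 3; // RFC 9002 §6.1.1
const K_GRANULARITY: Duration = Duration::from_millis(1); // timer granularity

/// Packet threshold: `pn` is lost once a packet at least 3 higher was acked.
fn lost_by_packet_threshold(pn: u64, largest_acked: u64) -> bool {
    largest_acked >= pn + K_PACKET_THRESHOLD
}

/// Time threshold: 9/8 of the larger RTT estimate, never below the granularity.
/// An unacked packet older than this (and below largest_acked) is declared lost.
fn loss_delay(smoothed_rtt: Duration, latest_rtt: Duration) -> Duration {
    (smoothed_rtt.max(latest_rtt) * 9 / 8).max(K_GRANULARITY)
}

/// Probe timeout, doubled for each consecutive PTO that has already fired.
fn pto(
    smoothed_rtt: Duration,
    rttvar: Duration,
    max_ack_delay: Duration,
    pto_count: u32,
) -> Duration {
    let base = smoothed_rtt + (rttvar * 4).max(K_GRANULARITY) + max_ack_delay;
    // Cap the shift so the multiplier cannot overflow in this sketch.
    base * (1u32 << pto_count.min(20))
}
```

With a 40 ms smoothed RTT, 5 ms RTTVAR and 25 ms max_ack_delay, the first PTO is 85 ms, well above the 54 ms time threshold you get when the latest RTT sample is 48 ms, which is why PTO is normally the last of the three to fire.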
QUIC 让拥塞控制变成应用配置
QUIC turns congestion control into an app setting
TCP 的拥塞控制写在内核里——升级一次要等几年。QUIC 把它搬到了用户态。Cloudflare 想换 BBR v3?改一行 Rust。Google YouTube 想用自家的 cc 算法?同样改一行 C++。这是 QUIC 真正的"研发加速器"价值——它让网络拥塞控制变成应用层关切,而不是十年内核排队等升级的事。
TCP cc lives in the kernel — upgrading takes years. QUIC moved it to user space. Cloudflare wants BBR v3? Change one Rust line. Google YouTube wants its own cc algorithm? Same — one C++ line. This is QUIC's real "R&D accelerator" value: it turns congestion control into an application concern, not a decade-long kernel queue.
| cc | signal | throughput | fairness | deployed at |
|---|---|---|---|---|
| NewReno (RFC 9002 default) | loss | baseline | good | default in smaller libraries |
| CUBIC (RFC 8312, obsoleted by RFC 9438) | loss | 1.5x baseline | good | Linux TCP default · ngtcp2 |
| BBR v2/v3 | bandwidth + RTT | 2-3x baseline | warning: can starve CUBIC | Google · Cloudflare · Meta |
CUBIC / NewReno 用丢包当拥塞信号——但现代网络的丢包大多来自无线信道错误,不是拥塞。BBR 直接测量瓶颈带宽(max bandwidth)和最小 RTT,用 BDP(带宽时延积)当目标在途字节数。结果:BBR 在有损但不拥塞的链路(4G/5G/Wi-Fi)上吃满带宽;CUBIC 在那种链路上把随机丢包误判为拥塞,反复半切 cwnd → 三次方爬升 → 再切——cwnd 在带宽线下方呈锯齿震荡,平均利用率明显低于实际可用带宽。
CUBIC / NewReno treat loss as the congestion signal — but most modern packet loss comes from wireless channel errors, not congestion. BBR directly measures bottleneck bandwidth (max bw) and minimum RTT, then uses BDP (bandwidth-delay product) as its target in-flight. Result: BBR saturates bandwidth on lossy but uncongested links (4G/5G/Wi-Fi). CUBIC on those same links misreads random loss as congestion, halving cwnd repeatedly, climbing back cubically, halving again — cwnd oscillates as a sawtooth well below the actual ceiling, with average utilisation visibly lower than the link could carry.
Google's BBR paper «BBR: Congestion-Based Congestion Control» (Cardwell et al., ACM Queue, 2016; reprinted in CACM, 2017) reported: on US cross-state links, BBR reduced YouTube's video rebuffer rate by 53% and startup time by 8%. BBR v3 (2024) tightened throughput stability another ~15%. Google deploys BBR on both TCP (Linux kernel 4.9+) and QUIC (QUICHE), but the QUIC variant runs more stably thanks to monotonic PN (see Ch12).
QUIC hides packet numbers behind header protection, so operators can no longer measure RTT the way they did with TCP. This drove operators wild (their SLA monitoring and traffic engineering all depend on RTT). The QUIC WG's compromise is the Spin Bit: 1 bit in the short header that flips once per RTT, so middleboxes can passively measure RTT without decrypting. Either endpoint may disable it for privacy, and RFC 9000 §17.4 requires every endpoint to disable it on at least one in every 16 paths or connection IDs, so an on-path observer should expect coverage to be partial.
This pseudocode simplifies RFC 9002 §7 by omitting the PTO state machine's back-off, ECN handling, and exact persistent-congestion thresholds, but the three core states are intact. Mechanisms: (1) slow start: cwnd grows by the number of bytes each ACK acknowledges (about one packet per ACK); (2) congestion avoidance: cwnd grows by 1/cwnd per ACK (i.e. 1 packet per RTT); (3) persistent congestion: a loss span longer than 3 PTOs with no ACK is treated as a path break, and cwnd resets to the minimum. BBR ditches this whole loop and directly measures bottleneck bandwidth, giving roughly 1.5-3× the throughput of NewReno on cellular / lossy links (Cardwell et al., ACM Queue · BBR v1). See Ch21 for production numbers.
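The three states fit in a few lines. What follows is a simplified sketch of the NewReno controller in RFC 9002 §7, assuming a fixed 1200-byte datagram size and ignoring the recovery period, ECN and pacing; the `NewReno` type and its method names are illustrative, not an existing library's API.

```rust
const MAX_DATAGRAM_SIZE: u64 = 1200;
// RFC 9002 §7.2: min(10 * mds, max(2 * mds, 14720)), which is 12000 for mds = 1200.
const K_INITIAL_WINDOW: u64 = 10 * MAX_DATAGRAM_SIZE;
const K_MINIMUM_WINDOW: u64 = 2 * MAX_DATAGRAM_SIZE;

struct NewReno {
    cwnd: u64,     // congestion window, in bytes
    ssthresh: u64, // slow-start threshold, in bytes
}

impl NewReno {
    fn new() -> Self {
        NewReno { cwnd: K_INITIAL_WINDOW, ssthresh: u64::MAX }
    }

    /// Called for each newly acknowledged packet of `acked_bytes`.
    fn on_ack(&mut self, acked_bytes: u64) {
        if self.cwnd < self.ssthresh {
            // Slow start: grow by the bytes acknowledged.
            self.cwnd += acked_bytes;
        } else {
            // Congestion avoidance: about one datagram per RTT.
            self.cwnd += MAX_DATAGRAM_SIZE * acked_bytes / self.cwnd;
        }
    }

    /// Called once per loss event (this sketch does not track the recovery period).
    fn on_loss(&mut self) {
        self.ssthresh = (self.cwnd / 2).max(K_MINIMUM_WINDOW);
        self.cwnd = self.ssthresh;
    }

    /// Called when losses span more than the persistent-congestion duration.
    fn on_persistent_congestion(&mut self) {
        self.cwnd = K_MINIMUM_WINDOW;
    }
}
```

Because this lives in user space, swapping it for CUBIC or BBR means replacing one type behind the same three callbacks, which is the point the chapter makes about congestion control becoming an application concern.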
QUIC 已经做完的事,HTTP/3 就不再重复
whatever QUIC already did, HTTP/3 doesn't redo
HTTP/2 has 10 frame types (DATA / HEADERS / PRIORITY / RST_STREAM / SETTINGS / PUSH_PROMISE / PING / GOAWAY / WINDOW_UPDATE / CONTINUATION). HTTP/3 has 7, because QUIC already handles flow control, stream reset and ping, a HEADERS frame no longer needs CONTINUATION, and priority moved out of the core spec into RFC 9218. HTTP/3 only carries "HTTP's own business" now.
| Type | Hex | Purpose | In HTTP/2 |
|---|---|---|---|
| DATA | 0x00 | HTTP body | same |
| HEADERS | 0x01 | QPACK-encoded headers | same (HPACK-encoded) |
| CANCEL_PUSH | 0x03 | cancel push (dead) | - |
| SETTINGS | 0x04 | connection params | same |
| PUSH_PROMISE | 0x05 | server push (dead) | deprecated |
| GOAWAY | 0x07 | graceful close | same |
| MAX_PUSH_ID | 0x0d | upper bound on push IDs | - |
| (removed) | - | PRIORITY · RST_STREAM · PING · WINDOW_UPDATE · CONTINUATION | handled by QUIC or dropped |
A unidirectional stream starts with its stream type (a variable-length integer, one byte for every type defined so far), not a frame type. 0x00 = control, 0x01 = push, 0x02 = QPACK encoder, 0x03 = QPACK decoder. GREASE values of the form 0x1f·N + 0x21 are reserved for both stream types (RFC 9114 §6.2.3) and frame types (§7.2.8), and either side can send them. This is RFC 9114's anti-ossification trick: deliberately send streams the peer doesn't recognise, forcing implementations to "ignore unknown", so that a new type such as 0x04 can land in the future. RFC 9287 applies the same idea to the QUIC fixed bit.
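The "ignore unknown" rule is easy to get wrong when a parser only matches the four known values. A minimal sketch of a classifier that keeps unknown and GREASE types as a normal, non-error outcome; the enum and function names are made up for this example, and the input is assumed to be the already-decoded variable-length integer.

```rust
/// What the type at the start of a unidirectional stream means.
#[derive(Debug, PartialEq, Eq)]
enum UniStreamType {
    Control,
    Push,
    QpackEncoder,
    QpackDecoder,
    /// Unknown or GREASE: the receiver stops reading the stream,
    /// but must not treat it as a connection error.
    Unknown(u64),
}

fn classify_uni_stream(stream_type: u64) -> UniStreamType {
    match stream_type {
        0x00 => UniStreamType::Control,
        0x01 => UniStreamType::Push,
        0x02 => UniStreamType::QpackEncoder,
        0x03 => UniStreamType::QpackDecoder,
        other => UniStreamType::Unknown(other),
    }
}

/// The N-th reserved (GREASE) value: 0x1f * N + 0x21.
fn reserved_h3_type(n: u64) -> u64 {
    0x1f * n + 0x21
}

/// True when `value` falls on the reserved 0x1f * N + 0x21 grid.
fn is_reserved_h3_type(value: u64) -> bool {
    value >= 0x21 && (value - 0x21) % 0x1f == 0
}
```

A sender that wants to exercise its peer can open a stream with `reserved_h3_type(n)` for a random `n`; a receiver built around `Unknown(_)` handles that and any future type through the same path.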
HTTP/2 的优先级是个有名的笑话——RFC 7540 §5.3 设计了一棵weighted dependency tree,让客户端"告诉服务器谁先发"。Firefox 写过、Chrome 写过、Safari 没写。三家行为完全不一致,最后 RFC 9113 把它整段废弃了。
HTTP/3 选择了完全不同的路线 ——RFC 9218 · Extensible Priorities for HTTP(2022-06,和 RFC 9114 同期发):
HTTP/2's priority was a famous joke: RFC 7540 §5.3 designed a weighted dependency tree for clients to tell servers "send these first". Firefox shipped one. Chrome shipped a different one. Safari shipped none. The three implementations behaved nothing alike. RFC 9113 finally deprecated the whole scheme.
HTTP/3 went a different route entirely — RFC 9218 · Extensible Priorities for HTTP (2022-06, shipped with RFC 9114):
The signal is a single header, for example priority: u=3, i. u is the urgency, from 0 (highest) to 7 (lowest), with 3 as the default; i marks the response as incremental (it can be rendered progressively as bytes arrive).
To see it in practice: Chrome DevTools → Network → "Priority" column on the right. Chrome maps main resource / CSS / JS / image / font internally to u=0..5. You can override with the Fetch API's priority option: fetch(url, { priority: 'high' }). That's the only browser-facing surface for RFC 9218.
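On the server side, reading the signal is a small parsing job. The sketch below handles only the two parameters RFC 9218 defines (`u` and `i`) and falls back to the defaults (urgency 3, non-incremental) on anything it does not understand; it is not a full Structured Fields parser, and the `Priority` struct and `parse_priority` function are names invented for this example.

```rust
#[derive(Debug, PartialEq, Eq)]
struct Priority {
    urgency: u8,       // 0 (highest) ..= 7 (lowest)
    incremental: bool, // true when the response can be consumed progressively
}

/// Parses a value such as "u=1, i". Unknown or malformed members are ignored.
fn parse_priority(value: &str) -> Priority {
    let mut p = Priority { urgency: 3, incremental: false };
    for item in value.split(',') {
        let item = item.trim();
        match item.split_once('=') {
            Some(("u", v)) => {
                if let Ok(u) = v.parse::<u8>() {
                    if u <= 7 {
                        p.urgency = u;
                    }
                }
            }
            Some(("i", "?1")) => p.incremental = true,
            Some(("i", "?0")) => p.incremental = false,
            // A bare "i" is shorthand for the boolean true.
            None if item == "i" => p.incremental = true,
            _ => {}
        }
    }
    p
}
```

A scheduler would then serve lower urgency numbers first and interleave only the responses marked incremental within the same urgency level.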
为什么不能直接用 HPACK
why we couldn't just keep HPACK
HPACK(HTTP/2)依赖一个严格同步的动态表。服务器在 Stream A 发了 ":status: 200",告诉客户端"把这条加进表,索引 62"。下一个流可以用索引 62 来引用——前提是 Stream A 在 Stream B 之前到达。HTTP/2 over TCP 天然按序,所以没问题。
QUIC 各流相互独立、并发到达。Stream A 的 update 还没来,Stream B 已经用了索引 62——无法解压。这就把 transport 层好不容易消灭的 head-of-line blocking 又拽回了应用层。
HPACK (HTTP/2) depends on a strictly synchronised dynamic table. Server sends ":status: 200" on Stream A and says "insert this, index 62". The next stream can now refer to index 62 — assuming Stream A arrives before Stream B. HTTP/2 over TCP is naturally ordered, so this works.
QUIC streams are independent and arrive concurrently. If Stream A's update hasn't landed yet but Stream B already references index 62 — cannot decode. The head-of-line blocking the transport layer worked so hard to kill comes roaring back at the app layer.
QPACK 把头部压缩拆到 3 条独立 QUIC 流上:bidi 请求流(紫)携带对动态表项的引用,encoder 流(绿,server→client)推送动态表插入,decoder 流(橙,client→server)回报 insert-count + section-ack + stream-cancel。读者最该记的细节是RIC(Required Insert Count)闸门:每条请求流只在自己引用的插入未到达时暂停,不影响并发请求——这是 HPACK 在 QUIC 上"把跨流 HOL 阻塞拽回来"的根本性解。
QPACK splits header compression across three independent QUIC streams: bidi request streams (purple) carry references to dynamic-table entries; the encoder stream (green, server→client) ships dynamic-table inserts; the decoder stream (copper, client→server) reports insert-count + section-ack + stream-cancel. The detail to remember is the Required Insert Count (RIC) gate: a request stream stalls only when the inserts it references haven't arrived, never on a sibling stream. This is the structural answer to HPACK's "cross-stream HOL block resurrected on QUIC" failure mode.
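The RIC gate itself is a comparison plus a waiting list. A minimal sketch of the decoder side, assuming each field section arrives with its Required Insert Count already decoded; `QpackGate` and its methods are hypothetical names, and the real protocol additionally bounds the number of blocked streams via SETTINGS_QPACK_BLOCKED_STREAMS.

```rust
/// Decoder-side bookkeeping for the Required Insert Count gate (illustrative).
struct QpackGate {
    insert_count: u64,        // dynamic-table inserts received so far
    blocked: Vec<(u64, u64)>, // (stream id, required insert count) still waiting
}

impl QpackGate {
    /// Returns true when the field section can be decoded immediately.
    /// Otherwise only this stream is parked; others are unaffected.
    fn on_field_section(&mut self, stream_id: u64, required_insert_count: u64) -> bool {
        if required_insert_count <= self.insert_count {
            true
        } else {
            self.blocked.push((stream_id, required_insert_count));
            false
        }
    }

    /// Called for each insert arriving on the encoder stream.
    /// Returns the streams that have just become decodable.
    fn on_encoder_insert(&mut self) -> Vec<u64> {
        self.insert_count += 1;
        let ic = self.insert_count;
        let (ready, waiting): (Vec<(u64, u64)>, Vec<(u64, u64)>) =
            self.blocked.drain(..).partition(|&(_, ric)| ric <= ic);
        self.blocked = waiting;
        ready.into_iter().map(|(stream_id, _)| stream_id).collect()
    }
}
```

A stream whose section needs no dynamic-table entry has a Required Insert Count of 0 and always passes the gate, which is why static-table-only requests never stall on a lost encoder-stream packet.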
… alt-svc, content-security-policy, strict-transport-security, :scheme: https and other modern-web staples.

| scenario | raw bytes | HPACK (H2) | QPACK (H3) |
|---|---|---|---|
| first request | ~600 | ~50 | ~52 |
| repeated request, same conn | ~600 | ~5 | ~6 |
| weak link (lossy) | ~600 | ~5 + HOL | ~6 (no HOL) |
压缩率本身差不多。QPACK 的赢面在抗丢包。
Compression ratios are nearly identical. QPACK's win is in resistance to loss.
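Most QUIC libraries default the dynamic table to 4 KB, well below the 64 KB that some HTTP/2 stacks advertise for HPACK (the protocol defaults are 4 KB for HPACK and 0 for QPACK). Reason: the bigger the table, the more "blocked streams" pile up. Bump it up on intra-datacenter / low-latency paths; don't on public / mobile networks. On nginx, note that http3_max_field_size caps the size of a single compressed field; the nginx-quic preview branch exposed the table size as http3_max_table_capacity, so check which directives your build actually provides.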
一个写在 RFC 里、被市场否决的功能
a feature that lived in the RFC and died in production
2015 年 HTTP/2 把 Server Push 当成杀手特性写进了 RFC 7540——服务器知道客户端马上要 app.css,那为什么不提前推给它?2022 年 Chrome 106 默认禁用了 Server Push。2024 年彻底从 Chromium 代码里移除。HTTP/3 RFC 9114 出于"协议完整性"保留了 PUSH_PROMISE 帧——但浏览器都不接。
In 2015, HTTP/2 wrote Server Push into RFC 7540 as a killer feature — the server knows the client will need app.css, so why not push it ahead of time? In 2022, Chrome 106 disabled Server Push by default. In 2024, it was deleted from the Chromium tree. HTTP/3 RFC 9114 kept the PUSH_PROMISE frame for "protocol completeness" — but no browser accepts it anymore.
服务器盲目 push app.css——但如果客户端缓存里已经有了呢?带宽白浪费。Chrome 实测发现 70%+ 的 push 被客户端立即 CANCEL_PUSH 掉。
The server blindly pushes app.css — but what if the client already has it cached? Bandwidth wasted. Chrome's telemetry: 70%+ of pushes get immediately CANCEL_PUSHed.
服务器推的 app.css 在线上跟客户端发起的 app.js 抢拥塞窗。BBR 不知道哪个更急——结果两个都慢。
The server-pushed app.css competes with the client-issued app.js on the congestion window. BBR can't tell which is more urgent — both end up slower.
服务器先发一个 HTTP 103 Early Hints 响应(RFC 8297),告诉客户端"你可能会需要 app.css"。客户端自己决定要不要 preload。简单、可观察、不抢带宽。
The server sends an HTTP 103 Early Hints response (RFC 8297) telling the client "you'll probably need app.css". The client decides whether to preload. Simple, observable, no bandwidth war.
CDN(Cloudflare 等)依然在边缘到 origin 之间偷偷用 Push 做 prefetch 优化——这不进客户端浏览器,所以不受 Chrome 106 影响。这种"内网 Push"还活着。
CDNs (Cloudflare et al.) still quietly use Push between their edge and origin for prefetch optimisation — that traffic never reaches the client browser, so Chrome 106 doesn't affect it. "Intra-network Push" lives on.
"Server Push 在 RFC 里完美无瑕,
在生产里几乎没找到一个稳定的用例。" "Server Push was flawless in the RFC,
and almost no stable use case ever showed up in production." Patrick Meenan · Chrome Web Performance · 2022
CID 是 QUIC 的身份证
the Connection ID is QUIC's passport
主线时刻 T+200ms(请求中途):你走出咖啡馆,手机自动切到 5G。src_ip: 192.168.1.42 → 10.220.5.13。TCP 在这里必死,因为连接由四元组定义。HTTP/3 不死——因为 QUIC 连接由 Connection ID 定义,而不是四元组。
Main-line time T+200ms (mid-request): you walk out of the café, the phone hops to 5G. src_ip: 192.168.1.42 → 10.220.5.13. TCP dies here, because TCP identifies a connection by the 4-tuple. HTTP/3 doesn't die — because QUIC identifies a connection by the Connection ID, not by IP-port.
连接建立后,服务器和客户端不停发 NEW_CONNECTION_ID 帧,互相给对方备好"未来可以用的 CID 列表"。每个 CID 还附带一个 Stateless Reset Token——用于无状态重置。
Once the connection is up, both sides keep emitting NEW_CONNECTION_ID frames, populating each other's "list of CIDs you may use in future". Each CID carries a Stateless Reset Token too — for stateless reset.
Connection migration takes 5 steps, ~200 ms total, during which the H3 request does not break: ① the OS hands the phone a new IP after the 5G hop; ② the client sends PATH_CHALLENGE (8 random bytes) from the new address, using a fresh DCID from the pool; ③ the server echoes those 8 bytes in PATH_RESPONSE and, because the client address is new to it, sends its own PATH_CHALLENGE; until that validates, it sends no more than 3× the bytes it has received from the new address (RFC 9000 §8.2, §9.3); ④ the client routes subsequent STREAM frames over the new DCID; ⑤ both sides issue RETIRE_CONNECTION_ID for the old CIDs. The key is the CID pool: both sides pre-issue spare CIDs during normal operation (≥ 3 in this walkthrough), so no negotiation round-trip is needed at migration time. TCP cannot do this at all, because a TCP connection is its 4-tuple.
Home router NAT entries usually expire (30 s to 2 min). If the client stays silent, NAT recycles the mapping; the next packet may carry a different src_port, which is effectively a "migration the client doesn't know about". RFC 9000 §9.3 handles this case, which it calls NAT rebinding (a "passive migration" in this guide's terms), the same way as a deliberate one: the server sees a new 4-tuple and sends PATH_CHALLENGE.
3x 限制 · Retry · MPQUIC
3x limit · Retry · MPQUIC
UDP 无连接 ⇒ 服务器不知道"请求人是不是真的在这个 src_ip"。攻击者可以伪造 victim 的 src_ip 给 QUIC 服务器发 1 字节小包,让服务器回复 10000 字节大包到 victim ——典型的 DNS amp 攻击套路。QUIC 必须从协议层防住。
UDP is connectionless ⇒ the server doesn't know "is the requester really at this src_ip?" An attacker can spoof a victim's src_ip, send 1-byte QUIC packets to the server, and trick it into firing 10 000-byte responses at the victim — the classic DNS amp pattern. QUIC has to defend at the protocol level.
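The protocol-level defence is the 3× limit named in this chapter's title: until a server has validated the client's address, RFC 9000 §8.1 forbids it from sending more than three times the bytes it has received from that address. A minimal sketch of that budget; `UnvalidatedPath` and its methods are illustrative names, not a library API.

```rust
/// Byte accounting for a peer address that has not been validated yet.
struct UnvalidatedPath {
    bytes_received: u64,
    bytes_sent: u64,
}

impl UnvalidatedPath {
    /// Every datagram received from the address earns 3x its size in credit.
    fn on_receive(&mut self, datagram_len: u64) {
        self.bytes_received += datagram_len;
    }

    /// Bytes that may still be sent before the limit is hit.
    fn send_budget(&self) -> u64 {
        (3 * self.bytes_received).saturating_sub(self.bytes_sent)
    }

    /// Accounts for a datagram if it fits the budget; otherwise the caller must wait.
    fn try_send(&mut self, datagram_len: u64) -> bool {
        if datagram_len <= self.send_budget() {
            self.bytes_sent += datagram_len;
            true
        } else {
            false
        }
    }
}
```

This is also why a client's first Initial datagram is padded to at least 1200 bytes: it buys the server 3600 bytes of budget, and a spoofed 1-byte packet would buy almost nothing.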
如果服务器收到的 ClientHello 看起来可疑(流量异常、资源紧张),可以回一个 Retry 包——里面装一个加密的 token。客户端必须重发 ClientHello 并带上 token。token 等于"我证明你在这个 IP"——下次再来直接信任。Cloudflare 在 DDoS 攻击期间会大量使用 Retry。
If a ClientHello looks suspicious (traffic spikes, resource crunch), the server can return a Retry packet carrying an encrypted token. The client must re-send ClientHello with that token. The token attests "I've proven you're at this IP" — next visits skip the check. Cloudflare hammers Retry during DDoS storms.
draft-ietf-quic-multipath (mature by 2026) lets one QUIC connection use Wi-Fi and 5G simultaneously. Each path gets its own packet number space, while stream data schedules freely across paths. Apple iCloud Private Relay is the earliest large-scale MPQUIC deployment.
vs MPTCP: MPTCP is kernel-only, < 1% deployed. MPQUIC lives entirely in user space, so any QUIC library can implement it independently.
Apple 使用 MASQUE(CONNECT-UDP)把 QUIC 隧道分发给两个独立的中继节点。手机端的 NSURLSession + MPQUIC 自动在 Wi-Fi/5G 两条物理路径上做透明聚合——当 Wi-Fi 抖动时,5G 直接接管,应用层零感知。这是第一次在消费级设备上规模化跑 MPQUIC。
Apple uses MASQUE (CONNECT-UDP) to distribute QUIC tunnels across two independent relay nodes. NSURLSession + MPQUIC on the phone transparently aggregates across Wi-Fi/5G physical paths — when Wi-Fi jitters, 5G takes over instantly, with zero app awareness. The first consumer-scale MPQUIC deployment.
GOAWAY · CONNECTION_CLOSE · draining · idle · stateless reset
GOAWAY · CONNECTION_CLOSE · draining · idle · stateless reset
之前 18 章都讲请求来的事——但一个真实的 QUIC 连接还要走完关闭、排空、复活三种结局。生产环境里大部分 bug、半小时一次的"无原因连接重置"、CDN 滚动重启时的瞬时错误,全藏在这一章。
The previous 18 chapters covered request arrival. A real QUIC connection still has to walk through close, drain, revive. Most production bugs, the "mysterious connection resets" every 30 minutes, the transient errors during CDN rolling restarts — they all hide in this chapter.
Graceful close: the endpoint sends GOAWAY first (RFC 9114 §5.2, H3 frame 0x07): "no new streams, but I'll finish in-flight ones". Once every stream completes, it sends CONNECTION_CLOSE (RFC 9000 §19.19, QUIC frame 0x1c) for real.
Immediate close: the endpoint sends CONNECTION_CLOSE(error=N) at once, which terminates every in-flight stream. Common when the client detects a crypto-layer violation, e.g. PN monotonicity broken (§13.2.3).
Idle timeout: both sides advertise max_idle_timeout in their transport parameters and the smaller value wins. After 30 s (in this example) with no packets, the connection is silently destroyed: no CC, no peer notification. This is also how NAT entries die. To prevent it, send PING (§19.2) to reset the timer.
Close cannot complete instantly, because the peer might still have packets in flight. If an endpoint frees state immediately and opens a fresh connection, the fresh one might receive the old connection's packets and mistake them for a new-connection handshake, with potentially serious consequences.
RFC 9000 §10.2's fix is two holding states, each kept for at least 3 PTO. The endpoint that sends CONNECTION_CLOSE enters closing: it answers incoming packets with another copy of the CC (rate-limited, so the peer learns the connection is gone without triggering a storm). The endpoint that receives a CONNECTION_CLOSE enters draining: it sends nothing and silently drops whatever arrives. A closing endpoint may switch to draining as soon as the peer's CC shows up. Only after the 3 PTO hold does either side move to closed and free memory. On typical Internet paths that hold is on the order of 100 ms or more: that is the real cost of closing a QUIC connection, not the "instant" you see.
CDNs (Cloudflare, Fastly, Akamai) must implement GOAWAY correctly during edge node rolling restarts, or millions of long-lived connections get reset at once and every client reconnects simultaneously: a thundering herd. Correct sequence: send GOAWAY with the maximum ID (2^62 − 4, the "∞" value for a server) to mark "no new requests", wait ~30 s for in-flight requests to drain, then send a final GOAWAY carrying the last request stream ID that was actually accepted, followed by CONNECTION_CLOSE. Cloudflare's Pingora framework has a dedicated state machine for this.
CONNECTION_CLOSE 帧带一个错误码——按"是 QUIC 层错还是 H3 层错"分两种:
CONNECTION_CLOSE carries an error code — split into "QUIC-layer" vs "H3-layer":
| QUIC layer · frame 0x1c | code | H3 layer (passed through) · frame 0x1d | code |
|---|---|---|---|
| NO_ERROR | 0x00 | H3_NO_ERROR | 0x0100 |
| INTERNAL_ERROR | 0x01 | H3_GENERAL_PROTOCOL_ERROR | 0x0101 |
| CONNECTION_REFUSED | 0x02 | H3_INTERNAL_ERROR | 0x0102 |
| FLOW_CONTROL_ERROR | 0x03 | H3_STREAM_CREATION_ERROR | 0x0103 |
| STREAM_LIMIT_ERROR | 0x04 | H3_CLOSED_CRITICAL_STREAM | 0x0104 |
| STREAM_STATE_ERROR | 0x05 | H3_FRAME_UNEXPECTED | 0x0105 |
| PROTOCOL_VIOLATION | 0x0a | H3_REQUEST_REJECTED | 0x010b |
| CRYPTO_ERROR(N) | 0x0100+N | H3_VERSION_FALLBACK | 0x0110 |
Full lists: RFC 9000 §20 defines 18 QUIC error codes; RFC 9114 §8.1 defines 17 H3 error codes. CRYPTO_ERROR(N) tunnels any TLS alert: the code is 0x0100 plus the alert number, so TLS bad_record_mac (alert 20 = 0x14) surfaces as CRYPTO_ERROR 0x0114.
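When triaging a closed connection, the first step is sorting the code into one of three buckets. A minimal sketch, assuming the frame type and error code have already been parsed; `CloseReason` and `classify_close` are names invented for this example.

```rust
#[derive(Debug, PartialEq, Eq)]
enum CloseReason {
    /// Frame 0x1c with an ordinary transport error code (RFC 9000 §20.1).
    Transport(u64),
    /// Frame 0x1c with a code in 0x0100..=0x01ff: the low byte is the TLS alert number.
    TlsAlert(u8),
    /// Frame 0x1d: the code belongs to the application protocol, e.g. HTTP/3.
    Application(u64),
}

/// Returns None when `frame_type` is not a CONNECTION_CLOSE frame.
fn classify_close(frame_type: u64, error_code: u64) -> Option<CloseReason> {
    match frame_type {
        0x1c if (0x0100..=0x01ff).contains(&error_code) => {
            Some(CloseReason::TlsAlert((error_code - 0x0100) as u8))
        }
        0x1c => Some(CloseReason::Transport(error_code)),
        0x1d => Some(CloseReason::Application(error_code)),
        _ => None,
    }
}
```

Note that the H3 codes start at 0x0100 as well; they never collide with CRYPTO_ERROR because they travel in frame 0x1d, not 0x1c.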
The server attaches a stateless_reset_token every time it sends NEW_CONNECTION_ID (RFC 9000 §19.15); §10.3.2 suggests deriving it from a static key and the CID, e.g. HMAC(reset_secret, CID). The client stores every token it has seen. Next time it receives a "looks-random" packet whose last 16 bytes match a stored token, it triggers the stateless-reset teardown path. Keyless state recovery: one of the most elegant designs in QUIC engineering.
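On the receiving side the check is a tail comparison against the stored tokens. A minimal sketch: the 21-byte floor and the 16-byte token length come from RFC 9000 §10.3, while the function names and the hand-rolled comparison are illustrative only; a production stack would use a vetted constant-time comparison from its crypto library.

```rust
const TOKEN_LEN: usize = 16;
// RFC 9000 §10.3: anything shorter than 21 bytes cannot be a stateless reset.
const MIN_RESET_LEN: usize = 21;

/// Compares without an early exit, so timing does not reveal the first differing byte.
fn tokens_equal(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

/// True when the datagram's last 16 bytes match any token the peer issued.
/// Intended to run only after normal packet processing failed to decrypt it.
fn is_stateless_reset(datagram: &[u8], known_tokens: &[[u8; TOKEN_LEN]]) -> bool {
    if datagram.len() < MIN_RESET_LEN {
        return false;
    }
    let tail = &datagram[datagram.len() - TOKEN_LEN..];
    let mut hit = false;
    for token in known_tokens {
        // Check every token; do not stop at the first match.
        hit |= tokens_equal(tail, token);
    }
    hit
}
```

Everything before the final 16 bytes is unpredictable filler, which is what makes a reset indistinguishable from an ordinary short-header packet to anyone who does not hold the token.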
如果一条连接活了几小时(比如 WebSocket 替代品),用同一把 1-RTT 密钥发太多包会增加分析攻击面。RFC 9001 §6 给出了原地滚动密钥的机制:发送方把 short header 的 Key Phase 位(1 bit)翻转,并用派生的下一代密钥加密。接收方看到 Key Phase 变了,跑一次 HKDF 派生新密钥解密。这一切不需要新一轮握手。
If a connection lives for hours (e.g. as a WebSocket replacement), using the same 1-RTT key for too many packets opens analysis attack surface. RFC 9001 §6 defines in-place key rotation: the sender flips the short-header's Key Phase bit (1 bit) and encrypts with the next-generation derived key. The receiver notices Key Phase changed, runs an HKDF step to derive the new key, decrypts. All this without a new handshake.
"关闭不是事件,是过程。" "Close isn't an event, it's a process." Martin Thomson · QUIC WG · RFC 9000 design note
2026 年的版图
the 2026 landscape
| library | lang | Used by | Strength |
|---|---|---|---|
| Google quiche | C++ | Chrome · gRPC · Envoy | earliest, most complete |
| Cloudflare quiche | Rust | CF edge · nginx-quic | fastest C API |
| msquic | C | Windows Server · .NET | kernel-mode acceleration |
| quic-go | Go | Caddy · IPFS | Go-ecosystem standard |
| quinn | Rust | Hyper · Tonic · IPFS | async-native |
| ngtcp2 + nghttp3 | C | curl · Node.js | lean and rock-stable |
| aioquic | Python | research · CTF | readable source |
| s2n-quic | Rust | AWS | security-first, strictly audited |
| picoquic | C | academic reference implementation | IETF interop workhorse |
| lsquic | C | LiteSpeed | embeddable |
来源:Web Almanac 2025、Cloudflare Radar、W3Techs。CDN 默认开启(Cloudflare / Fastly / Akamai / AWS CloudFront / Google Cloud LB)是普及主因。
Source: Web Almanac 2025, Cloudflare Radar, W3Techs. CDN default-on (Cloudflare / Fastly / Akamai / AWS CloudFront / Google Cloud LB) drove the bulk of adoption.
在哪儿赢、赢多少
where it wins, and by how much
| Company · scenario | Metric | Improvement | Source |
|---|---|---|---|
| Google · YouTube India (4G) | video rebuffer rate (median) | −20% ~ −40% | Chrome blog · 2020-10 |
| Google · Search | tail latency | −16% | SIGCOMM 2017 · Langley |
| Meta · Facebook App | request error rate | −5% | Meta Engineering · 2020 |
| Meta · video stream | video stall rate | −20%+ | Meta Engineering · 2020 |
| Cloudflare · returning users | 0-RTT median TTFB | −50 ms | CF blog · 0-RTT resumption |
| Cloudflare · global | poor-link TTFB | −10% ~ −15% | CF Radar · 2024 |
| Fastly · GA launch | cold connect | −40% | Fastly blog · RFC 9000 GA |
| Apple · iCloud Private Relay | network-switch RTT jitter | ~ zero (imperceptible) | WWDC 2022 · session 110337 |
数字来自厂商公开 blog / SIGCOMM 论文。原文如有更新请以最新版本为准;上表数字保留首次公开值。
Numbers cite each vendor's first public disclosure on blog or SIGCOMM. If the post has been updated since, the original disclosure value is kept here.
数据中心内部丢包 < 0.01%,TCP HOL 几乎不发生。但 HTTP/3 用户态 UDP 处理带来 2× CPU 成本。结果是纯吞吐 H2 over TCP 完胜。gRPC 至今主流仍是 HTTP/2。
Intra-DC loss < 0.01%, TCP HOL almost never fires. But HTTP/3's user-space UDP carries a 2× CPU tax. On pure throughput, H2 over TCP crushes. gRPC still defaults to HTTP/2.
~8% 连接尝试因 UDP/443 被防火墙阻断。浏览器 Happy Eyeballs 会自动 fallback 到 H2 over TCP——但用户先付了"试错"的延迟。
~8% of connection attempts get UDP/443 blocked by firewalls. Browser Happy Eyeballs auto-falls back to H2 over TCP — but the user has already paid the "tried it and failed" latency.
如果你的 LCP 主要花在服务端渲染或 JS 主线程上,省下来的 RTT 在水池里游泳,看不见。Patrick Meenan:H3 提升下限,不抬上限。
If your LCP is dominated by SSR or JS main-thread work, the saved RTTs swim in a pond — invisible. Patrick Meenan: H3 raises the floor, not the ceiling.
手机用户、4G/5G、丢包 1-3%、页面有 50+ 子请求——这是 H3 设计场景。0-RTT、连接迁移、流独立丢包恢复全用上。
Mobile users, 4G/5G, 1-3% loss, page has 50+ subresources — H3's home turf. 0-RTT, migration, per-stream loss recovery all fire.
"如果你不知道你的用户在哪里,
HTTP/3 就是合理的默认选择。" "If you don't know where your users are,
HTTP/3 is the sensible default." Lucas Pardue · Cloudflare · IETF 116
DNS · curl · Wireshark · qlog · sysctl
DNS · curl · Wireshark · qlog · sysctl
Main-line, hands-on view: not Chrome's default-completed GET, but your own way to surface every stage. curl --http3 triggers Alt-Svc negotiation, Wireshark catches Initial packets, qlog records every PN and ACK, and kernel sysctl tunes the UDP receive buffer. Each is a debug entry-point for one phase of the main-line.
The server appends a response header: alt-svc: h3=":443"; ma=86400. The client caches it for 24 h and uses H3 from the next visit, which means a new user's first page load never runs over H3, let alone 0-RTT.
Add one HTTPS record (RFC 9460) to the DNS zone:
ursb.me. 300 IN HTTPS 1 . alpn="h3,h2" ipv4hint="39.105.102.252"
The browser gets it at DNS resolution time, so the first visit goes straight to H3. Combined with RFC 8484 DoH or RFC 9250 DoQ, the DNS query itself is encrypted.
DNS over QUIC (DoQ) is QUIC's second-biggest application: not HTTP/3, just plain DNS queries stuffed into a QUIC stream. AdGuard, NextDNS, Cloudflare 1.1.1.1 all support it. Compared with DoT (DNS over TLS) it saves 1 RTT; compared with DoH it skips the HTTP/3 layer's overhead. The ALPN token is doq and the default port is 853.
因为 QUIC 加密了一切,光抓包看不出连接内部发生了什么。IETF 用 qlog(draft-ietf-quic-qlog-main-schema,2024 已多版)定义了一份结构化 JSON 日志格式——服务端/客户端用任何 QUIC 库都可以输出 qlog,把它扔到 qvis.quictools.info 就能看到拥塞窗口曲线、PN 单调性、ACK 时序、loss event、stream 优先级。这是 H3 调试的唯一正解。
Because QUIC encrypts everything, raw pcap shows nothing about what's happening inside. IETF defined qlog (draft-ietf-quic-qlog-main-schema, several revisions by 2024) — a structured JSON log format any QUIC library can emit. Drop it into qvis.quictools.info and you get the congestion-window curve, PN monotonicity, ACK timeline, loss events, stream priorities. The only sane debug path for H3.
| Knob | Default | Recommended | Why |
|---|---|---|---|
| net.core.rmem_max | 208 KB | ≥ 2.5 MB | cap on a single UDP socket's receive buffer; absorbs bursts |
| net.core.wmem_max | 208 KB | ≥ 2.5 MB | same, send side |
| GSO/GRO | off | on | segmentation offload; roughly halves CPU |
| SO_REUSEPORT | - | on · per-core | eBPF-route CID → CPU |
| io_uring | - | experimental | async I/O · fewer syscalls |
| QPACK dynamic table | 4 KB | 4-16 KB | larger = better compression, more HOL risk |
The compulsory checklist for first-time HTTP/3 on nginx 1.26: (1) build against a QUIC-capable TLS library (quictls, BoringSSL, or LibreSSL); stock OpenSSL only works through nginx's compatibility layer, which does not support early data; (2) configure listen 443 quic reuseport; without reuseport one CPU core pegs at 100%; (3) keep listen 443 ssl; in the same config for TCP fallback; (4) add add_header alt-svc 'h3=":443"; ma=86400'; I once forgot this and the browser never upgraded.
没有免费的午餐
no free lunches
Fastly 在 2020 年公开的实测:在相同吞吐下,HTTP/3 的 CPU 消耗是 HTTP/2 over TLS 的 1.5x ~ 2x。原因:每个 UDP 包都要进出用户态、做独立 AEAD 加解密、维护用户态拥塞控制状态。这是 CDN 厂商真正头疼的事——同样的服务器,H3 流量上限只有 H2 的一半。
Fastly's 2020 disclosure: at equal throughput, HTTP/3 burns 1.5x – 2x the CPU of HTTP/2 over TLS. Reason: every UDP packet crosses user/kernel boundary, does its own AEAD encrypt/decrypt, and maintains user-space cc state. The real CDN pain — the same box can carry half the H3 traffic of H2.
… AF_QUIC. Still under discussion, far from merged.
Carriers used to measure bandwidth, do QoS, and run DPI based on TCP seq/SACK/cleartext SNI. QUIC encrypted all of that. Operators lost their "handles" on the path. This was intentional, but it's a real pain for industries like financial regulation, compliance auditing, and parental control. Spin Bit is a partial compromise; nowhere close to enough.
Patrick Meenan、Steve Souders 等 Web 性能老兵不停指出:如果你的网站性能瓶颈是 JS 执行、SSR 等待、第三方脚本,HTTP/3 帮你的部分微乎其微。这是真的。HTTP/3 抬升的是分布的下限——P95、P99 的弱网用户体验。如果你的产品根本没有 P95 弱网用户(比如你只服务美国/欧洲城市光纤),花精力上 H3 的 ROI 接近零。
Patrick Meenan, Steve Souders and other web-perf veterans keep pointing out: if your bottleneck is JS execution, SSR wait, or third-party scripts, HTTP/3 helps you very little. True. HTTP/3 lifts the floor of the distribution — P95/P99 weak-link users. If your product has no P95 weak-link users (e.g. you only serve fiber-grade US/EU cities), the ROI of switching to H3 is near zero.
WebTransport · MASQUE · MoQ · HTTP/4?
WebTransport · MASQUE · MoQ · HTTP/4?
HTTP/3 不是终点——它是 QUIC 这个"通用安全传输"找到的第一个杀手应用。QUIC 上正在长出一片新协议生态。下面是 2026 年的四个方向。
HTTP/3 isn't the finish line — it's the first killer app of QUIC as a "generic secure transport". A whole protocol ecosystem is growing on top. Here are 2026's four directions.
WebSocket 跑在 HTTP/1.1 Upgrade 上,有 TCP head-of-line,无可靠/不可靠混合、不适合 RTC。WebTransport over HTTP/3(W3C WebTransport API + draft-ietf-webtrans-http3)给浏览器开放:(a) 可靠双向流;(b) 不可靠 datagram(RFC 9221)。Chrome 自 97 起原生支持,ALPN 复用 h3。云游戏 / 在线协作 / 实时翻译已经开始迁。
WebSocket runs on HTTP/1.1 Upgrade, inherits TCP HOL, lacks a mixed reliable/unreliable channel, and is awful for RTC. WebTransport over HTTP/3 (W3C WebTransport API + draft-ietf-webtrans-http3) exposes to browsers: (a) reliable bidi streams; (b) unreliable datagrams (RFC 9221). Chrome shipped support in 97; ALPN reuses h3. Cloud gaming, collaboration, live translation are migrating.
Apple iCloud Private Relay(iOS 15+ 的 iCloud+ 功能)是目前最大量产的 MASQUE 实战。它的核心架构不是"用 H3 加密一下",而是故意把信任切两半:
iCloud Private Relay (iOS 15+ as part of iCloud+) is by far the largest production MASQUE deployment. Its core trick isn't "tunnel things in H3" — it's deliberately splitting trust into two halves:
Three things to know:
… CONNECT-UDP (RFC 9298) to set up the H3 tunnel; payloads inside are wrapped in capsules (RFC 9297) and forwarded to the egress.
This is the largest, and so far only commercial-scale, MASQUE deployment. Notably it uses only CONNECT-UDP, not CONNECT-IP (the more aggressive whole-IP-packet tunnel). Apple doesn't need full VPN semantics; it just needs web traffic to "look like it comes from one IP". The CONNECT-IP use case (full VPN replacement) is still waiting for the next wave.
HLS / DASH 延迟 5-30 秒;WebRTC 延迟 100ms 但太重、不好缓存。Media over QUIC(IETF MoQ WG 推进中)目标是亚秒级延迟 + CDN 可缓存,发布/订阅模式。预期取代体育直播、低延迟视频、合作直播的传输层。Cloudflare 已经把它内置进 Workers。
HLS / DASH have 5-30s latency; WebRTC is 100ms but heavy and uncacheable. Media over QUIC (IETF MoQ WG in progress) targets sub-second latency + CDN-cacheable, with a pub/sub model. Slated to replace transport for sports streaming, low-latency video, collaborative live. Cloudflare already ships it inside Workers.
IETF 当前的共识:未来 5-10 年内不会有 HTTP/4。理由很务实:HTTP/3 + QUIC 的扩展机制(Datagram、KEY_UPDATE、ALPN、可插拔 cc、TLS extension)已经足够柔软。要加东西(抗量子加密、FEC 前向纠错、新拥塞算法),都可以作为扩展挂在 QUIC 上,不需要新的主版本号。所以 H3 大概率会像 IPv4 那样长寿。
Current IETF consensus: no HTTP/4 in the next 5-10 years. Practical reason: HTTP/3 + QUIC's extension mechanisms (Datagram, KEY_UPDATE, ALPN, pluggable cc, TLS extensions) are flexible enough. Adding things — post-quantum crypto, FEC, new cc algorithms — all fit as QUIC extensions; no new major version needed. So H3 will likely live like IPv4 — for decades.
"我们花三十年做了一个能装下未来三十年的传输层。" "Thirty years of work for a transport layer that can hold the next thirty." Lars Eggert · IETF QUIC WG · 2022
这一节就是给你撕下来贴墙的
the page you'd print, pin, and screenshot
读完 24 章是一回事,下一次给同事讲清楚是另一回事。下面 10 条是这篇文章里最反直觉、最值得带走的事实——每一条都标了对应章节和最关键的 RFC 锚点。
Reading 24 chapters is one thing; explaining it cleanly to a colleague is another. The ten facts below are the most counter-intuitive takeaways — each pinned to its chapter and the single most important RFC anchor.
读完能写,不只是读懂
readable code you could rewrite from memory
前 24 章读 wire 格式,这一章反过来:用 Cloudflare 的 quiche 库一步步实现一个能跑的 QUIC 客户端,发一个 GET 请求。完整代码 ~ 200 行 Rust,每段配 RFC 9000 / 9001 / 9114 章节引用。读完这一章你应该能从空 main.rs 起步,凭记忆复刻整个客户端。
The previous 24 chapters read the wire format. This one inverts: build a working QUIC client step by step using Cloudflare's quiche library, sending one GET request. The complete code is ~ 200 lines of Rust, each section cross-referenced to RFC 9000 / 9001 / 9114. By the end you should be able to start with an empty main.rs and reproduce the whole client from memory.
```toml
# Cargo.toml
[package]
name = "minquic"
version = "0.1.0"
edition = "2021"

[dependencies]
quiche = "0.21"      # Cloudflare's QUIC implementation, BoringSSL inside
# mio::net and mio::Poll are feature-gated in mio 0.8
mio = { version = "0.8", features = ["net", "os-poll"] }  # non-blocking UDP socket
ring = "0.17"        # random bytes for the source connection ID
url = "2.5"
log = "0.4"
env_logger = "0.10"
```
三个核心依赖:quiche 做 QUIC + TLS;mio 做非阻塞 socket 事件循环;ring 给我们一些密码学杂活。url、log、env_logger 只是辅助工具。整套不到 1 MB 编译产物。
Three core dependencies: quiche for QUIC + TLS, mio for the non-blocking socket event loop, ring for cryptographic odd jobs. url, log and env_logger are small helpers. The whole thing compiles to under 1 MB.
```rust
use mio::net::UdpSocket;
use std::net::{SocketAddr, ToSocketAddrs};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    env_logger::init();

    // ---- (1) Parse target URL --------------------------------
    let url = url::Url::parse("https://cloudflare-quic.com/")?;
    let host = url.host_str().ok_or("URL has no host")?;
    let port = url.port_or_known_default().ok_or("URL has no port")?;
    // The socket below is bound to an IPv4 address, so pick an IPv4 peer.
    let peer_addr: SocketAddr = (host, port)
        .to_socket_addrs()?
        .find(|a| a.is_ipv4())
        .ok_or("no IPv4 address for host")?;

    // ---- (2) Bind a UDP socket (local) -----------------------
    let bind_addr: SocketAddr = "0.0.0.0:0".parse()?;
    let mut socket = UdpSocket::bind(bind_addr)?;
    // Read back the port the OS actually assigned.
    let local_addr = socket.local_addr()?;
    let mut poll = mio::Poll::new()?;
    let mut events = mio::Events::with_capacity(1024);
    poll.registry().register(&mut socket, mio::Token(0), mio::Interest::READABLE)?;

    // ---- (3) Build the quiche Config -------------------------
    let mut config = quiche::Config::new(quiche::PROTOCOL_VERSION)?; // = 1 (RFC 9000)
    config.set_application_protos(quiche::h3::APPLICATION_PROTOCOL)?; // ALPN: "h3"
    config.set_max_idle_timeout(5_000);                       // ms
    config.set_max_recv_udp_payload_size(1350);               // avoid IPv6 PMTU issues
    config.set_max_send_udp_payload_size(1350);
    config.set_initial_max_data(10_000_000);                  // flow control · conn-level
    config.set_initial_max_stream_data_bidi_local(1_000_000); // per-stream
    // Without a uni-stream limit the server cannot write to its control / QPACK streams.
    config.set_initial_max_stream_data_uni(1_000_000);
    config.set_initial_max_streams_bidi(100);
    config.set_initial_max_streams_uni(100);
    config.verify_peer(true);                                 // TLS cert validation
    config.load_verify_locations_from_directory("/etc/ssl/certs")?;
```
三件事:① 解析 URL 拿到 (host, port);② 起 UDP socket(local 0:0 让 OS 分配端口)+ mio poll;③ 构造 quiche Config——ALPN=h3 必填,5 秒空闲超时,1350 字节 UDP payload 上限(留 50 字节给 IPv6 头扩展),初始流量控制限额(10 MB conn + 1 MB stream),证书验证打开。
Three steps: ① parse URL to get (host, port); ② bind UDP socket (local 0:0, let OS choose port) + mio poll; ③ build quiche Config — ALPN=h3 is mandatory, 5 s idle timeout, 1350-byte UDP payload cap (leaves 50 B for IPv6 header extensions), initial flow-control limits (10 MB connection + 1 MB stream), peer verification on.
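The two flow-control limits in step ③ are what the client advertises to the server: the server may have at most 10 MB outstanding on the connection and 1 MB on each stream. A minimal sketch of the bookkeeping a sender does against those limits follows; the `FlowControl` type and its field names are illustrative assumptions, not quiche's internals.

```rust
/// Send-side flow-control credit for one stream on one connection.
/// Hypothetical type for illustration; the limits are the values the peer advertised.
pub struct FlowControl {
    pub max_data: u64,        // connection-level limit (MAX_DATA)
    pub max_stream_data: u64, // per-stream limit (MAX_STREAM_DATA)
    pub conn_sent: u64,       // bytes sent on all streams so far
    pub stream_sent: u64,     // bytes sent on this stream so far
}

impl FlowControl {
    /// Bytes that may be sent right now: the tighter of the two windows.
    pub fn sendable(&self) -> u64 {
        let conn_left = self.max_data.saturating_sub(self.conn_sent);
        let stream_left = self.max_stream_data.saturating_sub(self.stream_sent);
        conn_left.min(stream_left)
    }

    /// Records a send of up to `want` bytes and returns how many were allowed.
    pub fn consume(&mut self, want: u64) -> u64 {
        let n = want.min(self.sendable());
        self.conn_sent += n;
        self.stream_sent += n;
        n
    }
}
```

With the limits above, a 1.5 MB write on a fresh stream is cut to 1 MB until the receiver raises the per-stream limit, even though the connection window still has room.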
```rust
    // ---- (4) Generate a random source CID --------------------
    use ring::rand::SecureRandom; // trait that provides `fill`
    let mut scid = [0u8; quiche::MAX_CONN_ID_LEN]; // 20
    ring::rand::SystemRandom::new()
        .fill(&mut scid)
        .map_err(|_| "failed to generate a random connection ID")?;
    let scid = quiche::ConnectionId::from_ref(&scid);

    // ---- (5) Initiate the connection -------------------------
    // `quiche::connect` creates Initial keys (RFC 9001 §5.2),
    // constructs ClientHello, and prepares to emit Initial packet.
    let mut conn = quiche::connect(
        Some(host), // SNI
        &scid,
        local_addr,
        peer_addr,
        &mut config,
    )?;
    log::info!("connecting to {} from {}", peer_addr, local_addr);
```
关键的第一次时刻:quiche::connect 内部做了 RFC 9001 §5.2 全套——用 client DCID 做 HKDF 输入派生 Initial keys(salt 是公开常量)、构造 ClientHello(含 ALPN/SNI/transport parameters/keyshare),把它放进 CRYPTO 帧准备发送。此刻还没字节上线。
The crucial first moment: quiche::connect internally runs RFC 9001 §5.2 — derives Initial keys via HKDF on client DCID (salt is a public constant), constructs ClientHello (with ALPN/SNI/transport parameters/keyshare), and queues it into a CRYPTO frame. No bytes have hit the wire yet.
```rust
    use std::io::Write; // brings stdout().write_all into scope

    // Defined locally, as quiche's own example clients do.
    const MAX_DATAGRAM_SIZE: usize = 1350;

    let mut buf = [0u8; 65535];
    let mut out = [0u8; MAX_DATAGRAM_SIZE];
    let mut req_sent = false;

    loop {
        // (6) Send anything quiche has queued ------------------
        loop {
            let (write, send_info) = match conn.send(&mut out) {
                Ok(v) => v,
                Err(quiche::Error::Done) => break,
                Err(e) => return Err(e.into()),
            };
            // mio's send_to takes the SocketAddr by value.
            socket.send_to(&out[..write], send_info.to)?;
            log::debug!("sent {} B to {}", write, send_info.to);
        }

        // (7) Wait for either an incoming packet or the timeout
        let timeout = conn.timeout();
        poll.poll(&mut events, timeout)?;

        // timer fired (PTO, idle, etc.)
        if events.is_empty() {
            conn.on_timeout();
            // An idle timeout closes the connection; stop instead of polling forever.
            if conn.is_closed() { break; }
            continue;
        }

        // (8) Drain UDP socket ----------------------------------
        while let Ok((read, from)) = socket.recv_from(&mut buf) {
            let recv_info = quiche::RecvInfo { to: local_addr, from };
            match conn.recv(&mut buf[..read], recv_info) {
                Ok(_) => {},
                Err(e) => { log::warn!("recv: {}", e); break; }
            }
        }

        if conn.is_closed() { break; }

        // (9) Once handshake is done, send the GET --------------
        if conn.is_established() && !req_sent {
            send_http3_request(&mut conn, host, "/")?;
            req_sent = true;
        }

        // (10) Drain readable streams ---------------------------
        // NOTE: stream_recv yields raw HTTP/3 frames (QPACK-encoded headers
        // included), not a decoded response. A complete client keeps the
        // h3::Connection alive and reads through h3.poll() / h3.recv_body().
        for stream_id in conn.readable() {
            while let Ok((read, fin)) = conn.stream_recv(stream_id, &mut buf) {
                std::io::stdout().write_all(&buf[..read])?;
                if fin {
                    log::info!("stream {} closed", stream_id);
                    conn.close(true, 0x100, b"bye")?; // 0x100 = H3_NO_ERROR (RFC 9114 §8.1)
                }
            }
        }
    }
    Ok(())
}
```
事件循环的 5 个步骤是所有 QUIC 客户端的共同骨架:(6) 把 quiche 排队待发的字节全部送出 → (7) poll 直到收到包或 PTO/idle 超时(conn.timeout() 给出下次 deadline)→ (8) 把 socket 里所有 UDP 包灌进 conn.recv → (9) 握手完成后发 HTTP/3 request → (10) 读 readable streams 把 response 字节 dump 出来。quiche 不替你做 socket I/O 与 timer——它只是状态机,你必须自己驱动。
The five-step event loop is the common skeleton for every QUIC client: (6) drain quiche's pending send queue → (7) poll until a packet arrives or the PTO/idle timer fires (conn.timeout() tells you the next deadline) → (8) feed every UDP packet through conn.recv → (9) once handshake completes, send the HTTP/3 request → (10) read readable streams and dump response bytes. quiche does no socket I/O or timer handling for you — it's a state machine; you drive it.
```rust
fn send_http3_request(
    conn: &mut quiche::Connection,
    host: &str,
    path: &str,
) -> Result<(), quiche::h3::Error> {
    // (E.1) Create H3 client on top of the QUIC conn ----------
    // This opens StreamID=2 (control) and StreamID=6/10 (QPACK enc/dec)
    let h3_config = quiche::h3::Config::new()?;
    let mut h3 = quiche::h3::Connection::with_transport(conn, &h3_config)?;

    // (E.2) Build the request headers (RFC 9114 §4) -----------
    let req_headers = vec![
        quiche::h3::Header::new(b":method", b"GET"),
        quiche::h3::Header::new(b":scheme", b"https"),
        quiche::h3::Header::new(b":authority", host.as_bytes()),
        quiche::h3::Header::new(b":path", path.as_bytes()),
        quiche::h3::Header::new(b"user-agent", b"minquic/0.1"),
    ];

    // (E.3) Send · QPACK encodes headers, writes HEADERS frame
    //       onto StreamID=0 (first client-initiated bidi stream)
    let stream_id = h3.send_request(conn, &req_headers, true)?;
    log::info!("sent request on stream {}", stream_id);

    // NOTE: `h3` is dropped on return, so its QPACK and stream state is lost.
    // That is enough to emit one request; to decode the response, return `h3`
    // to the caller and drive it with h3.poll().
    Ok(())
}
```
五件事在 send_request 里发生:① QPACK 压缩 5 个头部字段(4 个 pseudo-header 加 user-agent),结果大约几十字节,其中 :method、:scheme、:path 直接命中 static table,:authority 和 user-agent 还要带字面值;② 包成 HEADERS frame (type=0x01);③ 写到 StreamID=0(第一条 client-initiated bidi);④ FIN flag set 因为我们没有 body;⑤ quiche 内部把 STREAM 帧排进 1-RTT 包发送队列。最后由事件循环步骤 (6) 真正送上线。
Five things happen inside send_request: ① QPACK compresses the five header fields (four pseudo-headers plus user-agent) into a few dozen bytes: :method, :scheme and :path are pure static-table hits, while :authority and user-agent still carry literal values; ② wraps them in a HEADERS frame (type=0x01); ③ writes them to StreamID=0 (first client-initiated bidi); ④ sets FIN since we have no body; ⑤ quiche queues a STREAM frame into the 1-RTT send queue. Event-loop step (6) actually pushes the bytes out.
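Step ② is simple enough to write by hand. Every HTTP/3 frame is a variable-length integer type, a variable-length integer length, then the payload (RFC 9114 §7.1), and the integers use the QUIC varint encoding from RFC 9000 §16. The sketch below assumes the QPACK-encoded field section already exists as a byte slice; the function names are illustrative, not part of quiche.

```rust
/// Encodes `v` as a QUIC variable-length integer (RFC 9000 §16).
/// The two high bits of the first byte select a 1, 2, 4 or 8 byte encoding.
pub fn encode_varint(v: u64, out: &mut Vec<u8>) {
    if v < (1 << 6) {
        out.push(v as u8);
    } else if v < (1 << 14) {
        out.extend_from_slice(&((v as u16) | 0x4000).to_be_bytes());
    } else if v < (1 << 30) {
        out.extend_from_slice(&((v as u32) | 0x8000_0000).to_be_bytes());
    } else {
        assert!(v < (1 << 62), "value does not fit in a QUIC varint");
        out.extend_from_slice(&(v | 0xC000_0000_0000_0000).to_be_bytes());
    }
}

/// Wraps `payload` in an HTTP/3 frame: varint type, varint length, payload.
pub fn h3_frame(frame_type: u64, payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(payload.len() + 2);
    encode_varint(frame_type, &mut out);
    encode_varint(payload.len() as u64, &mut out);
    out.extend_from_slice(payload);
    out
}
```

Calling `h3_frame(0x01, &field_section)` yields the HEADERS frame bytes that would then be written to stream 0. The varint branches can be checked against the worked examples in RFC 9000 Appendix A.1 (37, 15293 and 494878333).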
```shell
$ SSLKEYLOGFILE=keys.log cargo run --release 2>quic.log
HTTP/3 200
content-type: text/html
content-length: 12345
...

$ # in another shell, capture the UDP traffic:
$ sudo tcpdump -i en0 -w cap.pcap udp port 443

$ # Open cap.pcap in Wireshark:
$ #   Preferences → Protocols → TLS → (Pre)-Master-Secret log file = keys.log
$ # Wireshark now decrypts QUIC and shows individual frames.
```
读完会写,不只是读懂。
这是这一章的不可逆能力。 Field Note · 06 · Hands-on
You can now write it,
not just read it.
That's the irreversible ability this chapter gives. Field Note · 06 · Hands-on
RFC 9000 §21 摊开
RFC 9000 §21 unpacked
Ch08 讲了 0-RTT replay。但 QUIC 的安全表面远不止这一条。RFC 9000 §21 列了十多类已知威胁,这一章挑 7 类对部署者最重要的展开:① amplification(off-path)、② version downgrade、③ stateless reset(off-path 注入)、④ slow read DoS、⑤ linkability via CID、⑥ Optimistic ACK、⑦ handshake DoS。每一类配缓解方案与对应的 RFC 9000 章节。
Ch08 covered 0-RTT replay. But QUIC's attack surface is much wider. RFC 9000 §21 catalogues more than a dozen known threat classes; this chapter unpacks the seven most operationally important: ① amplification (off-path), ② version downgrade, ③ stateless reset injection (off-path), ④ slow read DoS, ⑤ linkability via CID, ⑥ Optimistic ACK, ⑦ handshake DoS. Each comes with mitigations and the RFC 9000 section that governs it.
威胁:攻击者伪造受害者源 IP,发一个小 ClientHello UDP 包给 QUIC 服务器;服务器回一个大的 Initial response(包含证书链,常见 3-5 KB)给"受害者"。放大比 ~ 50×-100×,这是 UDP-based reflection DoS 的标准模式。
Threat: attacker forges victim's source IP, sends a small ClientHello UDP datagram to a QUIC server; server sends a large Initial response (with certificate chain, usually 3-5 KB) to the "victim". Amplification ratio ~ 50–100×, the classic UDP reflection DoS pattern.
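缓解(RFC 9000 §8.1):地址验证完成之前,服务器发出的字节数不得超过已收字节数的 3 倍;客户端的 Initial 包必须填充到至少 1200 字节;服务器还可以用 Retry 包(带无状态 token)强制客户端多走一个 RTT,证明自己真的在那个地址上。
Mitigation (RFC 9000 §8.1): until the client's address is validated, the server may send at most 3× the bytes it has received from that address; clients must pad Initial datagrams to at least 1200 bytes; and the server can issue a Retry packet carrying a stateless token, forcing the client to spend one more round trip proving it really sits at that address.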
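RFC 9000 §8.1 caps what a server may send to an unvalidated address at three times the bytes it has received from it. A minimal sketch of that accounting, with a hypothetical `AmplificationBudget` type (real stacks track this per path):

```rust
/// Anti-amplification accounting for one unvalidated peer address.
/// Illustrative type; the 3x factor is the limit from RFC 9000 §8.1.
pub struct AmplificationBudget {
    received: u64,
    sent: u64,
    validated: bool,
}

impl AmplificationBudget {
    pub fn new() -> Self {
        AmplificationBudget { received: 0, sent: 0, validated: false }
    }

    /// Every datagram from the peer earns three times its size in send credit.
    pub fn on_datagram_received(&mut self, len: u64) {
        self.received += len;
    }

    /// Called once the handshake or a Retry token proves the address.
    pub fn on_address_validated(&mut self) {
        self.validated = true;
    }

    /// True if sending `len` more bytes stays within the limit.
    pub fn can_send(&self, len: u64) -> bool {
        self.validated || self.sent + len <= 3 * self.received
    }

    pub fn on_datagram_sent(&mut self, len: u64) {
        self.sent += len;
    }
}
```

After a single 1200-byte Initial the server has 3600 bytes of credit, which is why a long certificate chain can stall the handshake until the client sends something more.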
威胁:active MITM 拦截 ClientHello,把 QUIC v1 改成更弱的 v0 / 不存在的实验版本,或者剥掉 ECH 扩展。
Threat: active MITM intercepts ClientHello, downgrades QUIC v1 to weaker v0 / a fake experimental version, or strips ECH extension.
缓解:Version Negotiation 包(RFC 9000 §6)+ transport parameters 的 version_information(RFC 9368)。客户端在 TLS handshake 完成后把自己看到的版本列表和server 在 VN 包里宣布的版本列表对比,不一致 → handshake fail。这是密码学绑定下层版本与上层 TLS,downgrade 攻击必被发现。
Mitigation: Version Negotiation packet (RFC 9000 §6) + version_information transport parameter (RFC 9368). After TLS handshake completes, client compares its observed version list against the server's announced VN list; mismatch → handshake fails. This cryptographically binds the transport version to the TLS layer; downgrade is detected.
威胁:off-path 攻击者尝试构造一个看起来像 Stateless Reset 的包给受害者,期望受害者关闭连接。
Threat: off-path attacker constructs something that looks like a Stateless Reset for the victim, hoping the victim closes the connection.
缓解:Stateless Reset 包尾部的 16 字节 stateless_reset_token 必须匹配已经派发给该 CID 的 token(通过 NEW_CONNECTION_ID 帧,RFC 9000 §10.3)。token 由 HMAC(reset_secret, CID) 派生——攻击者不知道 reset_secret(服务器内部)就无法伪造。
Mitigation: the 16-byte trailing stateless_reset_token must match a token already distributed for that CID (via NEW_CONNECTION_ID, RFC 9000 §10.3). Tokens are derived from HMAC(reset_secret, CID) — without the server's reset_secret, attackers can't forge.
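On the receiving side the check is a comparison of the datagram's last 16 bytes against the tokens the peer has handed out. The sketch below assumes the caller already holds that token list; the function names are illustrative, and the comparison deliberately avoids an early exit so timing does not reveal how many bytes matched.

```rust
pub const RESET_TOKEN_LEN: usize = 16;
/// Shorter datagrams cannot be a valid short-header packet (RFC 9000 §10.3).
const MIN_RESET_LEN: usize = 21;

/// Compares two tokens without stopping at the first differing byte.
fn tokens_equal(a: &[u8; RESET_TOKEN_LEN], b: &[u8; RESET_TOKEN_LEN]) -> bool {
    a.iter().zip(b.iter()).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

/// Returns true if `datagram` ends in one of the tokens the peer issued.
pub fn is_stateless_reset(datagram: &[u8], known_tokens: &[[u8; RESET_TOKEN_LEN]]) -> bool {
    if datagram.len() < MIN_RESET_LEN {
        return false;
    }
    let mut tail = [0u8; RESET_TOKEN_LEN];
    tail.copy_from_slice(&datagram[datagram.len() - RESET_TOKEN_LEN..]);
    // Check every token so the loop length does not depend on which one matched.
    let mut hit = false;
    for token in known_tokens {
        hit |= tokens_equal(&tail, token);
    }
    hit
}
```

An off-path attacker would have to guess a 128-bit value to make this return true, which is the whole defence.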
威胁:攻击者建合法连接,但故意缓慢读 response——服务器在 stream flow-control 限额内必须一直缓存数据,占内存。10 万个慢客户端 = 服务器内存 OOM。这是 TCP "Slowloris" 在 QUIC 上的复制版。
Threat: attacker opens a legitimate connection but reads the response very slowly — server must keep data buffered within stream flow-control limits, occupying memory. 100 K slow clients → server OOM. The QUIC equivalent of TCP "Slowloris".
缓解:三层防御。① stream flow-control 限额本身就限制了每流 buffer 大小;② idle timeout(默认 5-30s)断开长时间无活动连接;③ 应用层主动监控 bytes/RTT 速率,异常低就 close。Cloudflare 实测把 idle timeout 调到 10s 后,慢客户端攻击成本上升 100×。
Mitigation: three layers. ① stream flow-control limits buffer per stream; ② idle timeout (default 5-30 s) closes long-idle connections; ③ app-layer actively monitors bytes/RTT rate and closes anomalously slow streams. Cloudflare measured 100× attack-cost increase after dropping idle timeout to 10 s.
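Layer ③ is an application policy rather than anything the RFC defines. One possible shape, as a sketch: give each response a grace period, then close it if the delivery rate stays below a floor. The `SlowReaderGuard` type and both thresholds are assumptions for illustration.

```rust
use std::time::Duration;

/// Hypothetical application-level policy for spotting slow readers.
pub struct SlowReaderGuard {
    pub min_bytes_per_sec: u64, // floor below which a reader counts as stalled
    pub grace: Duration,        // do not judge a stream younger than this
}

impl SlowReaderGuard {
    /// `bytes_delivered` is what the peer has consumed since the response started.
    pub fn should_close(&self, bytes_delivered: u64, elapsed: Duration) -> bool {
        if elapsed < self.grace {
            return false;
        }
        let rate = bytes_delivered as f64 / elapsed.as_secs_f64();
        rate < self.min_bytes_per_sec as f64
    }
}
```

The floor should be chosen well below what a genuinely poor mobile link achieves, otherwise the guard punishes real users along with attackers.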
威胁:CID 是 明文(routing 需要)。如果客户端跨 WiFi → 5G 时没换 CID,运营商通过 CID 把两个 IP 关联到同一个用户——隐私泄漏。
Threat: CIDs are cleartext (required for routing). If a client doesn't rotate CIDs when switching Wi-Fi → 5G, the operator can link both IPs to the same user via CID — privacy leak.
缓解(RFC 9000 §5.1):客户端在 NEW_CONNECTION_ID 池里要预备多个 CID,在每次 path 切换时必须用新 CID,旧 CID 用 RETIRE_CONNECTION_ID 注销。这是 connection migration 设计里隐藏的隐私保护。
Mitigation (RFC 9000 §5.1): client maintains a NEW_CONNECTION_ID pool with multiple spare CIDs, must use a new CID on every path change, and retires the old one via RETIRE_CONNECTION_ID. This is the privacy protection hidden inside connection migration's design.
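The pool logic can be sketched in a few lines. The CIDs here are the ones the peer issued through NEW_CONNECTION_ID; the `CidPool` and `IssuedCid` types are illustrative assumptions, and a real stack would also enforce active_connection_id_limit.

```rust
use std::collections::VecDeque;

/// A connection ID issued by the peer, with its sequence number.
#[derive(Debug, Clone, PartialEq)]
pub struct IssuedCid {
    pub seq: u64,
    pub cid: Vec<u8>,
}

/// Hypothetical pool of spare destination CIDs for one connection.
pub struct CidPool {
    active: IssuedCid,
    spare: VecDeque<IssuedCid>,
}

impl CidPool {
    pub fn new(initial: IssuedCid) -> Self {
        CidPool { active: initial, spare: VecDeque::new() }
    }

    /// Store a CID received in a NEW_CONNECTION_ID frame.
    pub fn on_new_connection_id(&mut self, cid: IssuedCid) {
        self.spare.push_back(cid);
    }

    /// Switch to a fresh CID for the new path. Returns the sequence number to
    /// put in RETIRE_CONNECTION_ID, or None if no spare CID is available, in
    /// which case the caller should not migrate.
    pub fn on_path_change(&mut self) -> Option<u64> {
        let next = self.spare.pop_front()?;
        let old = std::mem::replace(&mut self.active, next);
        Some(old.seq)
    }

    pub fn active(&self) -> &IssuedCid {
        &self.active
    }
}
```

Returning `None` instead of reusing the old CID is the point: a reused CID on a new path is exactly the linkability leak described above.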
威胁:作为接收方,我可以发提前的 ACK(ACK 还没收到的 PN),骗发送端以为带宽很大、congestion window 应该涨。congestion control 失效 → 发送端发太多 → 拥塞。
Threat: as receiver, send premature ACKs (ACKing PNs not yet received) to trick the sender into thinking bandwidth is high and congestion window should grow. Congestion control breaks → sender over-sends → real congestion.
缓解:发送端可以故意跳过一些 PN(RFC 9000 §21.4)。接收方如果 ACK 了从未发出的 PN(被跳过的,或者大于最大已发 PN 的),发送端就能判定对方在作弊,并以 PROTOCOL_VIOLATION 关闭连接(RFC 9000 §13.1)。PN 严格单调(Ch12)且受头部保护,路径上的第三方看不到 PN,也就无法替别人伪造 ACK。TCP 也有同样的问题,却没有协议层面的检测手段。
Mitigation: the sender may deliberately skip packet numbers (RFC 9000 §21.4). A receiver that ACKs a PN that was never sent, whether a skipped one or one beyond the largest sent, has exposed itself, and the sender can close the connection with PROTOCOL_VIOLATION (RFC 9000 §13.1). PNs are strictly monotonic (Ch12) and header-protected, so an on-path third party cannot see them and cannot forge ACKs on someone else's behalf. TCP had the same problem and no protocol-level way to detect it.
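The detection mechanism RFC 9000 §21.4 suggests is for the sender to skip a packet number now and then and treat an ACK covering a never-sent number as proof of cheating. A minimal sketch follows; the `PnAllocator` type is hypothetical, and the fixed skip interval is an assumption made only to keep the example deterministic (a real sender would skip at unpredictable points).

```rust
use std::collections::BTreeSet;

/// Hypothetical packet-number allocator that leaves deliberate gaps.
pub struct PnAllocator {
    next: u64,
    skipped: BTreeSet<u64>,
    skip_every: u64, // assumption: fixed interval; use randomness in practice
}

impl PnAllocator {
    pub fn new(skip_every: u64) -> Self {
        PnAllocator { next: 0, skipped: BTreeSet::new(), skip_every }
    }

    /// Returns the packet number for the next packet, occasionally skipping one.
    pub fn next_pn(&mut self) -> u64 {
        if self.skip_every > 0 && self.next > 0 && self.next % self.skip_every == 0 {
            self.skipped.insert(self.next);
            self.next += 1;
        }
        let pn = self.next;
        self.next += 1;
        pn
    }

    /// True if the ACK range covers a packet number that was never sent.
    pub fn ack_is_bogus(&self, smallest: u64, largest: u64) -> bool {
        if smallest > largest || largest >= self.next {
            return true;
        }
        self.skipped.range(smallest..=largest).next().is_some()
    }
}
```

An honest receiver never trips `ack_is_bogus`, because it only acknowledges what arrived; a receiver that acknowledges ahead of time eventually covers a gap.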
威胁:攻击者发 N 万个 ClientHello,但永远不回 Finished——服务器在每个半连接上消耗内存。TCP 时代的 SYN flood 在 QUIC 上的对应物。
Threat: attacker fires N thousand ClientHellos but never sends Finished — server consumes memory on every half-open. The QUIC equivalent of TCP SYN flood.
缓解:Retry 包(同 ① 那个 stateless cookie)+ address validation token 让服务器在确认客户端"能真正收包" 之前不分配 connection state。Cloudflare 的实现:正常情况下不发 Retry(省一个 RTT),只在负载升高时进入 Retry-required 模式。
Mitigation: Retry packet (same stateless cookie from ①) + address validation token. Server allocates no connection state until the client proves "actually receives" the cookie. Cloudflare's implementation: don't issue Retry by default (save one RTT); enter Retry-required mode only when under load.
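The "Retry only under load" policy amounts to a small decision function. The sketch below is an assumption about how such a switch might look, not Cloudflare's code: the half-open threshold is hypothetical, and token validation itself is taken as an input.

```rust
/// What the server does with an incoming Initial packet.
#[derive(Debug, PartialEq)]
pub enum InitialAction {
    Accept,    // allocate connection state and continue the handshake
    SendRetry, // reply with a stateless Retry token, allocate nothing
}

/// Hypothetical load-based policy; the limit is an operator-chosen number.
pub struct RetryPolicy {
    pub half_open_limit: usize,
}

impl RetryPolicy {
    /// `half_open` is the current number of unfinished handshakes;
    /// `has_valid_token` is true if the Initial carried a token we issued.
    pub fn on_initial(&self, half_open: usize, has_valid_token: bool) -> InitialAction {
        if has_valid_token || half_open < self.half_open_limit {
            InitialAction::Accept
        } else {
            InitialAction::SendRetry
        }
    }
}
```

In the common case the extra round trip is never paid; under a flood, only clients that can echo the token back get any server memory.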
读完这 7 类,你会发现 3 个反复出现的设计模式:① "未验证之前不投入资源"(amplification cap / Retry / address validation);② "用加密绑定状态"(reset_token / PN 加密 / version_info);③ "放心丢东西"(stateless reset 不需要状态、Retry token 服务器无状态)。QUIC 的安全设计哲学是"能不维护状态就不维护"——比 TCP+TLS 时代的精神进步了一代。
After reading the seven, you'll notice three recurring design patterns: ① "no resource commitment before validation" (amplification cap / Retry / address validation); ② "bind state with cryptography" (reset_token / PN encryption / version_info); ③ "be happy to forget" (stateless reset needs no state; Retry token is server-stateless). QUIC's security philosophy is "don't keep state if you don't have to" — a generational improvement over the TCP+TLS era.
安全设计的最高境界:
把"什么都不记" 变成可证明的防御。 Field Note · 06 · Security
The summit of security design:
turning "forgetting everything" into provable defence. Field Note · 06 · Security
替代 RTMP / WebRTC 的候选
a candidate to replace RTMP / WebRTC
实时媒体在 web 上长期被两种协议瓜分:RTMP(2002 Adobe 出,TCP-based,推流走世界但延迟 ~ 3-5s)和 WebRTC(2011,UDP + SRTP,延迟 100-200ms 但传输不可靠且 NAT 穿越复杂)。两者都不同时满足"低延迟 + 可靠 + 大规模分发"。IETF MoQ WG(2023 成立,Twitch / Meta / Cisco 主推)用 QUIC 重新设计了这件事——Media over QUIC Transport(MOQT)。
Real-time media on the web has long split between two protocols: RTMP (Adobe, 2002, TCP-based, ingests globally but with ~ 3-5 s latency) and WebRTC (2011, UDP + SRTP, 100-200 ms latency but unreliable and NAT-traversal-heavy). Neither simultaneously delivers "low latency + reliable + large-scale distribution". The IETF MoQ WG (chartered 2023, driven by Twitch / Meta / Cisco) redesigned it on QUIC — Media over QUIC Transport (MOQT).
twitch.tv/airing/video/1080p。relay 用 namespace 做路由:subscribe 一个 namespace 等同订阅其下所有 track。
twitch.tv/airing/video/1080p. Relays route by namespace: subscribing to a namespace is equivalent to subscribing to all its tracks.

```
;; MOQT control messages (simplified · draft-ietf-moq-transport)
ANNOUNCE      namespace                  ;; publisher: "I have content at this NS"
SUBSCRIBE     namespace track            ;; subscriber: "send me this track"
SUBSCRIBE_OK  subscription_id            ;; ack of subscribe
UNSUBSCRIBE   subscription_id            ;; "I'm done"
FETCH         namespace track group obj  ;; "give me a specific object" (catch-up)
```
这 5 个消息走在双向 QUIC stream上。OBJECT 消息(承载 media payload)独立走 unidirectional QUIC streams 或 DATAGRAM 帧——per-object 选哪种取决于这一帧是不是允许丢。
These 5 messages run on a bidirectional QUIC stream. OBJECT messages (carrying media payload) flow on separate unidirectional QUIC streams or DATAGRAM frames — the choice per object depends on whether that frame is droppable.
音频 frame 不能丢 → 走 stream(可靠);视频 P-frame 可以丢 → 走 datagram(快但不可靠)。每个 object 自己选。
Audio frame must arrive → stream (reliable); video P-frame can drop → datagram (fast). Each object chooses its own.
一条音频 stream 丢包不阻塞另一条视频 stream——RTMP 在 TCP 上做不到。
A lost packet on an audio stream doesn't block a video stream — RTMP on TCP cannot do this.
直播观众坐地铁,Wi-Fi → 5G 不断流。WebRTC 通过 ICE 重启来"恢复",几秒卡顿。
A live viewer on the subway, Wi-Fi → 5G, no stream drop. WebRTC "recovers" by restarting ICE — several seconds of stutter.
payload AEAD,但 CID/namespace 可被 relay 看到 → CDN 可以路由但看不见内容。HLS over HTTPS 的两难局面被解开。
AEAD payload, but CIDs / namespaces stay visible to relays → CDNs can route without seeing content. HLS-over-HTTPS's old dilemma dissolves.
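The per-object choice between a stream and a datagram described above reduces to one rule: an object goes out as a datagram only if losing it is acceptable and it fits in a single datagram. A minimal sketch, with names chosen for illustration rather than taken from draft-ietf-moq-transport:

```rust
/// How a single media object is put on the wire.
#[derive(Debug, PartialEq)]
pub enum Delivery {
    Stream,   // reliable, retransmitted on loss
    Datagram, // unreliable, dropped on loss
}

/// Picks the delivery mode for one object.
/// `max_datagram_len` is whatever the connection currently allows per datagram.
pub fn choose_delivery(droppable: bool, payload_len: usize, max_datagram_len: usize) -> Delivery {
    if droppable && payload_len <= max_datagram_len {
        Delivery::Datagram
    } else {
        Delivery::Stream
    }
}
```

A droppable object that is too large for one datagram falls back to a stream in this sketch; a publisher could equally choose to drop it.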
直播一个明星 Twitch 频道 = 1 个 publisher + 10 万 subscriber。绝对不可能直接连——必须有 relay 网络。MOQT 的关键设计:relay 是协议级 first-class concept(不像 WebRTC 的 SFU 是后接的)。
A popular Twitch channel = 1 publisher + 100 K subscribers. Direct connect is impossible — a relay network is required. MOQT's key design choice: relays are first-class protocol concepts (unlike WebRTC's SFU bolted on after the fact).
```
;; relay tree topology · text-mode diagram

                publisher (Twitch ingest)
                          |
                          v
               +----------+----------+
               |                     |
       relay A (US-East)        relay B (EU)
               |                     |
       +----+--+-+----+        +-----+-----+
       |    |    |    |        |     |     |
       v1   v2   v3   v4       v5    v6    v7     ; viewers

Each edge: 1 QUIC connection · namespace = twitch.tv/airing/
Each viewer connects only to its nearest relay (geographic + cost optimised)
The publisher does not know who the final viewers are · the CDN proxies that entirely
```
| Year | Milestone | Status |
|---|---|---|
| 2022 | Twitch's "Warp" prototype (Luke Curley) | internal demo |
| 2023 | IETF MoQ WG chartered | charter approved |
| 2024 | draft-ietf-moq-transport-04 | first multi-vendor interop |
| 2025 | draft-ietf-moq-transport-10 | WGLC entered |
| 2026 H1 | Twitch beta with select streamers | now |
| 2026 H2 | Meta · Instagram Live trial | announced |
| 2027 (est.) | RFC publication | — |
两者都不完全。RTMP 在 ingest 侧(主播 → 平台)2026 已经被 WHIP/WHEP / SRT 和 MoQ 实验性替代;WebRTC 在低延迟 P2P 通话(Discord / FaceTime)不会被 MoQ 替代,因为 P2P + SFU 那套机制 MoQ 没有对等替代。MoQ 真正的领地是"大规模单向分发 + 低延迟"——这正是 Twitch / YouTube Live / 体育赛事直播 / 在线教育那个市场。
Neither completely. RTMP on the ingest side (creator → platform) is already being supplanted by WHIP/WHEP / SRT and experimental MoQ in 2026. WebRTC in low-latency P2P calls (Discord / FaceTime) won't be replaced by MoQ — its P2P + SFU mechanisms have no MoQ equivalent. MoQ's real territory is "large-scale one-to-many distribution + low latency" — exactly the Twitch / YouTube Live / sports streaming / online-education market.
直播平台 2026 年这场迁移,
是过去 20 年最大的一次。 Field Note · 06 · MoQ
The live-streaming migration of 2026
is the biggest move in the field in twenty years. Field Note · 06 · MoQ
替代 WebSocket 的 H3 接口
the H3 interface that replaces WebSocket
Ch24 提了 WebTransport 一句话。这一章把整个 W3C WebTransport API 完整走查——浏览器开发者从客户端发起 H3 长连接、收发可靠流 + 不可靠数据报 全过程。Chrome 97+(2022 Q1)、Firefox 114+(2023)、Safari 16.4+(2023-03)都已实现。
Ch24 mentioned WebTransport in passing. This chapter walks through the entire W3C WebTransport API — how a browser developer opens an H3-based long-lived connection and exchanges reliable streams + unreliable datagrams. Implemented in Chrome 97+ (2022 Q1), Firefox 114+ (2023), Safari 16.4+ (2023-03).
```js
// 1. open the transport (HTTPS only · ALPN = h3 + WebTransport)
const transport = new WebTransport("https://example.com:443/wt");
await transport.ready; // resolves after H3 handshake + WebTransport SETTINGS exchange

// 2. open a bidirectional reliable stream
const stream = await transport.createBidirectionalStream();
const writer = stream.writable.getWriter();
await writer.write(new TextEncoder().encode("hello"));

// 3. send an unreliable datagram
const dgramWriter = transport.datagrams.writable.getWriter();
await dgramWriter.write(new Uint8Array([42, 7, 3]));
```
这 5 行 JS 在底层对应:① 用 ALPN=h3 建 QUIC + TLS 1.3 连接 → 发 HTTP/3 CONNECT with :protocol = webtransport(RFC 9220 Extended CONNECT);② 服务器 200 响应表示接受;③ 后续所有数据 走关联 QUIC stream 而非 H3 帧。WebTransport 跑在 H3 之上(共用连接),但逃出了 HTTP/3 的请求-响应模式。
These 5 lines map to: ① open a QUIC + TLS 1.3 connection with ALPN=h3 → send an HTTP/3 CONNECT with :protocol = webtransport (RFC 9220 Extended CONNECT); ② server replies 200 to accept; ③ all subsequent data flows on associated QUIC streams, not via HTTP/3 frames. WebTransport rides on top of H3 (shared connection) but escapes HTTP/3's request/response model.
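Step ① can be made concrete by listing the header fields of that Extended CONNECT request. The sketch below only builds the list; the function name is illustrative, and a real client must also have seen the server's SETTINGS advertise WebTransport support before sending it.

```rust
/// Header fields of the Extended CONNECT request that opens a WebTransport
/// session over HTTP/3. Illustrative helper, not a library API.
pub fn webtransport_connect_headers(authority: &str, path: &str) -> Vec<(String, String)> {
    [
        (":method", "CONNECT"),
        (":protocol", "webtransport"),
        (":scheme", "https"),
        (":authority", authority),
        (":path", path),
    ]
    .iter()
    .map(|(name, value)| (name.to_string(), value.to_string()))
    .collect()
}
```

For the URL in the example above this would be called as `webtransport_connect_headers("example.com:443", "/wt")`. The bidirectional stream carrying this request stays open for the life of the session.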
| API | type | purpose |
|---|---|---|
| new WebTransport(url, options) | constructor | open transport · options.serverCertificateHashes can pin a self-signed certificate |
| transport.ready | Promise | resolves once H3 handshake + WT SETTINGS done |
| transport.closed | Promise | resolves on graceful close · rejects on abort |
| transport.close({closeCode, reason}) | method | actively shutdown |
| transport.createBidirectionalStream() | method | → Promise<WebTransportBidirectionalStream> |
| transport.createUnidirectionalStream() | method | → Promise<WritableStream> |
| transport.incomingBidirectionalStreams | ReadableStream | server-initiated bidi streams arrive here |
| transport.incomingUnidirectionalStreams | ReadableStream | server-initiated uni |
| transport.datagrams.readable | ReadableStream | incoming unreliable datagrams · max 1200 B each |
| transport.datagrams.writable | WritableStream | outgoing unreliable datagrams |
| transport.datagrams.maxDatagramSize | number | negotiated max payload (PMTU-aware) |
| transport.getStats() | Promise | congestion stats · cwnd / smoothedRTT / packetsLost / ... |
双向可靠 byte stream。客户端发起用 createBidirectionalStream;服务端发起从 incomingBidirectionalStreams 读。每条 stream 独立丢包不阻塞别的——WebSocket 在 TCP 上做不到。
Bidi reliable byte stream. Client-initiated via createBidirectionalStream; server-initiated arrive on incomingBidirectionalStreams. Each stream's loss doesn't block others — WebSocket on TCP cannot do this.
单向可靠 byte stream。常用于事件广播(server 向 client 单向推) 或大文件上传(client 向 server 单向推)——节省一个方向的 stream-id 槽。
Reliable byte stream in one direction. Used for event broadcast (server → client) or file upload (client → server) — saves one direction's stream-id slot.
单个 UDP 包级别的数据报,无可靠传输 · 无顺序保证 · 无拥塞控制(只受 conn 级 cwnd 限)。每条 ≤ 1200 字节(PMTU 决定)。这就是 RFC 9221 QUIC DATAGRAM 在浏览器侧的暴露。低延迟实时场景的金子。
UDP-packet-level datagrams, unreliable · no ordering · no per-flow CC (just conn-level cwnd). ≤ 1200 B each (PMTU-bound). This is the browser-side exposure of RFC 9221 QUIC DATAGRAM. Gold for low-latency real-time use cases.
```js
// CLIENT · send 60 Hz input via datagrams, receive frame deltas via streams
// pollInput() and decodeAndPaintFrame() are application-provided.
const wt = new WebTransport("https://cloud-gaming.example/wt");
await wt.ready;

// Append one chunk to the bytes accumulated so far.
function concat(a, b) {
  const out = new Uint8Array(a.length + b.length);
  out.set(a, 0);
  out.set(b, a.length);
  return out;
}

// 1. 60 Hz unreliable datagram loop (keyboard / mouse / gamepad)
const dgramW = wt.datagrams.writable.getWriter();
setInterval(async () => {
  const input = pollInput(); // keys + mouse delta · ~ 20 B
  try {
    await dgramW.write(input);
  } catch (e) {
    // dgram dropped under congestion · just send the next one
  }
}, 16);

// 2. server pushes frame deltas via uni streams (one per delta)
for await (const stream of wt.incomingUnidirectionalStreams) {
  const reader = stream.getReader();
  let bytes = new Uint8Array();
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    bytes = concat(bytes, value);
  }
  decodeAndPaintFrame(bytes); // reliable per-delta · independent loss
}
```
这段代码体现了 WebTransport 的杀手锏:输入用 datagram(可丢,反正下一帧又来一遍),视频 frame 用 stream(必须按顺序解码,但每帧自己一条 stream,丢包只影响那一帧)。WebRTC 也能做这个,但需要 ICE / STUN / TURN / SDP 一大套基建;WebTransport 直接 5 行 JS 起步。
This shows WebTransport's killer pattern: input via datagrams (droppable; the next frame's input is fresh), video frames via streams (must decode in order, but each frame has its own stream so loss is isolated). WebRTC can do this too, but needs ICE / STUN / TURN / SDP machinery; WebTransport is 5 lines of JS to start.
| | WebTransport | WebSocket | WebRTC DataChannel |
|---|---|---|---|
| Underlying protocol | HTTP/3 + QUIC | HTTP/1.1 Upgrade · TCP | SCTP over DTLS over UDP |
| Low latency | ✓ 1 RTT startup | ✗ 1 RTT TCP + 1 RTT Upgrade + 2 RTT TLS | ✓ but ICE + STUN/TURN cost several RTTs |
| HOL blocking avoided | ✓ streams isolated from each other | ✗ TCP HOL blocking | ✓ SCTP streams isolated |
| Unreliable channel | ✓ datagrams | ✗ fully reliable | ✓ unordered/unreliable mode |
| P2P | ✗ client-server only | ✗ | ✓ true P2P |
| Multiplexing in browser | ✓ shares the H3 connection | one TCP connection per ws | separate DTLS · not shared |
| Use cases | cloud gaming / live · client-server | chat / notifications | P2P video / file transfer |
serverCertificateHashes 显式 pin,但每次证书过期都要更新。生产用 Let's Encrypt 简单。
serverCertificateHashes, but every cert renewal needs an update. In prod, use Let's Encrypt.
writer.ready 或自己看 transport.getStats() 的 cwnd。
writer.write always resolves immediately (no WebSocket-style bufferedAmount). For flow-control you must await writer.ready or poll transport.getStats()'s cwnd.
技术以外的力
forces beyond the protocol
HTTP/3 spec 2022 年成 RFC,2026 年 Cloudflare Radar 显示全球 ~ 32% 流量走 H3——但 web 平台的关键 25% 部署在哪里没做?为什么有些国家 QUIC 占比 < 5%?这不是技术问题,是政治经济问题。这一章把 QUIC 部署生态摊开,从谁阻止谁的视角看。
HTTP/3 RFC published 2022; by 2026, Cloudflare Radar shows ~ 32% global traffic on H3. Where are the missing 25% deployments hiding? Why is QUIC adoption < 5% in some countries? This isn't a technical problem — it's political economy. This chapter unpacks the QUIC deployment ecology from the angle of who blocks whom.
| Country / Region | H3 share | Notes |
|---|---|---|
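| USA | ~ 45% | Chrome-dominated; Cloudflare/Fastly fully enabled |
| Western Europe | ~ 40% | Similar to the USA, slightly lower |
| Japan | ~ 35% | NTT's QoS is UDP-friendly |
| India | ~ 30% | High mobile share → larger H3 gains → browsers tend to pick H3 |
| South Korea | ~ 25% | SK Telecom rate-limits some UDP |
| Russia | ~ 8% | ISPs widely rate-limit UDP |
| China | ~ 5% | The GFW has historically treated UDP as "suspicious"; domestic CDNs (Alibaba/Tencent/Baidu) only began rolling out H3 in 2024 |
| Sub-Saharan Africa (SSA) | ~ 3-15% | Wide variance; aging ISP equipment |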
三件事可能在 2026-2027 把 H3 占比推过 50%:
Three things could push H3 share past 50% in 2026–2027:
Linux 6.6+ 的 ktls + AF_XDP 让 QUIC 部分进入内核。CPU 成本会降回 H2 水平 → 阻力 ④ 消失。
Linux 6.6+'s ktls + AF_XDP partially moves QUIC into the kernel. CPU cost falls to H2 levels → friction ④ vanishes.
阿里云 2024 默认开 H3,腾讯云 2025 跟上。中国占全球流量 ~ 20%,这一拨直接推高总体 ~ 5-7%。
Alibaba Cloud enabled H3 by default in 2024; Tencent Cloud followed 2025. China's ~ 20% of global traffic — this single move adds ~ 5-7% to global totals.
Caddy 2.7+ 默认开 H3。nginx 1.27 (2024) 起 H3 进入稳定。Small site 不再需要主动配置——自动 H3。
Caddy 2.7+ has H3 on by default. nginx 1.27 (2024) marks H3 stable. Small sites stop needing manual configuration — H3 just works.
部分国家要求 ISP-level DPI(深包检测)做内容审计。QUIC 加密一切让 DPI 失效——这些国家可能反向 限制 H3。
Some countries mandate ISP-level DPI for content audit. QUIC's encrypt-everything defeats DPI — those countries may actively throttle H3.
协议的命运不只在 RFC 里,
更在 firewall 规则和 ISP QoS 表里。 Field Note · 06 · Ecology
A protocol's fate is not only in the RFC.
It's also in firewall rules and ISP QoS tables. Field Note · 06 · Ecology
RFC · IETF Drafts · 论文 · 引擎源码
RFCs · IETF Drafts · papers · engine source
这一节把全文用到的 外部规范、RFC、论文、源码 归档。每条引用都带 状态徽章(STD = 正式 RFC / PS = Proposed Standard / DRAFT = IETF Internet-Draft)+ 链接 + 你在哪一章会用到它。所有 URL 在 2026 年 5 月有效;QUIC 生态演化很快,IETF Working Group(quic / masque / moq / httpbis)持续在出新草案。
This section archives every external spec, RFC, paper, or source-code reference the article touches. Each carries a status pill (STD = published RFC / PS = Proposed Standard / DRAFT = IETF Internet-Draft) + link + the chapter that needs it. All URLs valid as of May 2026; the QUIC ecosystem moves quickly — IETF Working Groups (quic / masque / moq / httpbis) keep issuing new drafts.
RFC 不是终点。
它只是"这一刻全世界同意了"的快照。 Field Note · 06 · Fin
An RFC is not the end.
It's a snapshot of "what the world agreed on, at this moment". Field Note · 06 · Fin
从你按下回车,
到屏幕上跳出 200 OK,
HTTP/3 用 13 步把一次请求
封装成一个 UDP 包,
跨过四层加密,
在一个 RTT 里完成。
From the moment you press Enter,
to the moment 200 OK appears,
HTTP/3 wraps a request in a UDP datagram,
crosses four encryption layers,
and finishes in a single RTT —
in thirteen movements.