A single GET has to clear thirteen protocol stages on top of UDP, four cryptographic levels and three stream classes before it can land a 200 OK — and even then the connection still has to close, drain, or revive.
This is a field map of HTTP/3 and QUIC, with every step pinned to the relevant RFC clause.
three formulas, one protocol skeleton
To most people, "HTTP" is a thing — the protocol your browser uses to fetch a page. Engineers who keep thinking of it as one thing will never understand why HTTP/3 exists. HTTP has never been one protocol; it has always been the product of three orthogonal layers.
| Version | Semantics | Framing | Transport |
|---|---|---|---|
| HTTP/0.9 (1991) | GET only | — | TCP |
| HTTP/1.0 (1996, RFC 1945) | headers, methods | ASCII, 1 req / conn | TCP |
| HTTP/1.1 (1997-2022, RFC 9112) | same + chunked, keepalive | ASCII, keepalive, pipelining | TCP (+ TLS) |
| HTTP/2 (2015-2022, RFC 9113) | RFC 9110 | binary · mux · HPACK | TCP + TLS 1.2/1.3 |
| HTTP/3 (2022, RFC 9114) | RFC 9110 | binary · simpler · QPACK | QUIC (UDP + TLS 1.3) |
from Tim Berners-Lee's first GET to 50% of Cloudflare's traffic
HTTP/3 didn't appear from nowhere. It's the product of thirty years of trial and error: HTTP/0.9's one-line GET /, SPDY's experiments, HTTP/2's binary framing, and finally QUIC dragging TCP into user space. Each step drops one more assumption.
| Year | Event | Person / Doc |
|---|---|---|
| 1991 | HTTP/0.9 — single-line GET / | Tim Berners-Lee · CERN |
| 1996 | HTTP/1.0 · RFC 1945 | Henrik Frystyk Nielsen · W3C |
| 1997 | HTTP/1.1 · RFC 2068 → 2616 (1999) → 7230 (2014) → 9112 (2022) | Roy Fielding · UCI |
| 2008 | TLS 1.2 · RFC 5246 | Tim Dierks · Eric Rescorla |
| 2009 | SPDY experimental in Chrome | Mike Belshe · Roberto Peon · Google |
| 2012 | gQUIC internal at Google | Jim Roskind |
| 2015 | HTTP/2 · RFC 7540 | Mark Nottingham · Martin Thomson |
| 2016 | IETF QUIC WG chartered | Mark Nottingham · Lars Eggert |
| 2018 | TLS 1.3 · RFC 8446 | Eric Rescorla · Mozilla |
| 2018-11 | "HTTP/3" name finalised | Mark Nottingham · IETF 103 |
| 2021-05 | RFC 9000/9001/9002 · QUIC v1 | Iyengar · Thomson · Bishop · Pardue |
| 2022-06 | RFC 9114 · HTTP/3 | Mike Bishop · Akamai |
| 2022-06 | RFC 9204 · QPACK | Charles 'Buck' Krasic · Mike Bishop · Alan Frindell |
| 2023 | RFC 9460 · HTTPS RR (SVCB) | Ben Schwartz · Mike Bishop · Erik Nygren |
| 2023 | QUIC v2 · RFC 9369 · field re-shuffle, anti-ossification | Martin Duke |
why it took seven years to find out HTTP/2 wasn't enough
When HTTP/2 shipped in 2015, everyone thought HTTP was finally "done". It swapped ASCII for binary, collapsed 6 TCP connections into 1, compressed headers ~95% with HPACK. Three years of production later, engineers found three problems HTTP/2 could never cure — and none of them were HTTP/2's fault. They were TCP's fault.
HTTP/2 multiplexes 100 streams at the application layer, but TCP at the transport layer still demands in-order delivery. Drop one packet and the entire TCP connection halts waiting for the retransmit — even if the other 99 streams are unrelated. This is TCP head-of-line blocking.
Measured: at 3% packet loss, HTTP/2 often loses to HTTP/1.1 with multiple connections.
HTTP/2 must run over TLS (in practice). A fresh connection needs: TCP SYN/SYN-ACK/ACK (1 RTT) + TLS 1.2 ClientHello/ServerHello (2 RTT) = 3 RTT; TLS 1.3 + TCP Fast Open is still 2 RTT. At a 200 ms intercontinental RTT, you spend 400-600 ms before saying a word.
Measured: on 4G/5G, the handshake alone often eats the page's entire LCP budget.
A TCP connection is identified by the 4-tuple (src_ip, src_port, dst_ip, dst_port). When a phone switches from Wi-Fi to 5G, src_ip changes — the TCP connection dies instantly, and the TLS session with it. That long-lived WebSocket inside your SPA? Gone.
Measured: Meta attributes ~5% of video stalls to network switches.
Middleboxes — ISP NATs, enterprise firewalls, CDNs — inspect TCP/TLS fields in flight and silently drop anything new. RFC-permitted extensions get blackholed. TLS 1.3 ended up disguising itself as TLS 1.2 ("middlebox compatibility mode"). HTTP/3 just hides inside UDP.
Measured: early TLS 1.3 saw ~3% middlebox drops.
"HTTP/2 cured HTTP,
and then TCP crippled HTTP/2." Daniel Stenberg · curl · 2018
That was the IETF's first instinct in 2015-2016. But TCP lives in the kernel — any field change waits for Linux / Windows / iOS / Android / every router to ship a new version. Look at TCP Fast Open (RFC 7413, 2014): ten years on, deployment is still < 5%, because middleboxes drop its cookie.
Conclusion: evolving on top of TCP means evolving on a decade timescale.
not because UDP is good, but because nobody touches UDP
"Why does QUIC run on UDP?" is the first question every HTTP/3 talk has to answer. The intuitive answer — "UDP isn't reliable, so QUIC has to add its own reliability" — is wrong. That's a consequence, not a cause. The real reason fits in one sentence: UDP is the only protocol number left on the modern internet that middleboxes don't mess with.
| Option | Pros | Why not |
|---|---|---|
| SCTP | native multi-stream, message-based | IP protocol 132 — most NATs drop it, ~50% loss |
| DCCP | unordered, with congestion control | IP protocol 33 — same story, < 0.1% deployed |
| New IP protocol | theoretically cleanest | needs every router/NAT/firewall on Earth to upgrade — impossible |
| TCP option | reuses existing connections | middleboxes strip unknown TCP options |
| UDP | UDP/443 traverses everywhere | have to rebuild TCP in user space — but that's exactly what QUIC wants |
Moving everything TCP did (retransmission, congestion control, flow control, multiplexing, connection management) into user space means every QUIC packet has to: enter the kernel → recvfrom() copy to user space → decrypt → handle → encrypt → sendto() copy back → NIC. Fastly's 2020 measurement: QUIC costs ~2x the CPU of TCP+TLS. That is HTTP/3's real downside, and we will revisit it in chapter 22.
memorise the skeleton before diving into each chapter
QUIC's design fits into three small numbers: 4 encryption levels (Initial / 0-RTT / Handshake / 1-RTT), 3 Packet Number spaces (Initial / Handshake / Application), 2 header types (Long / Short). The relationship between these three numbers is the pre-read skeleton for every later chapter.
↓ UDP/443 · IPv4 / IPv6 · link layer
Note that TLS 1.3 is not below QUIC but inside QUIC: QUIC carries TLS 1.3 records inside CRYPTO frames, not the other way around. That is why RFC 9001 is titled "Using TLS to Secure QUIC" — not "QUIC over TLS".
Fields: Version(32) · DCID Len(8) · DCID · SCID Len(8) · SCID · Type-specific...
Initial · 0-RTT · Handshake · Retry
Fields: Flags(8) · DCID · PN(8/16/24/32)
1-RTT only
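The two header forms can be told apart from the first byte alone. A minimal sketch (function name mine; bit layout per RFC 9000 §17):

```python
def classify_first_byte(b: int) -> str:
    """Classify a QUIC v1 packet by its first byte (RFC 9000 §17).

    Bit 0x80 (Header Form): 1 = long header, 0 = short header.
    Bit 0x40 (Fixed Bit) must be 1 in QUIC v1.
    For long headers, bits 0x30 select the packet type.
    """
    if b & 0x40 == 0:
        return "not QUIC v1 (fixed bit clear)"
    if b & 0x80:  # long header: four types
        long_types = {0: "Initial", 1: "0-RTT", 2: "Handshake", 3: "Retry"}
        return long_types[(b & 0x30) >> 4]
    return "1-RTT"  # the only short-header packet type

print(classify_first_byte(0xC3))  # long header, type 0 → Initial
print(classify_first_byte(0x41))  # short header → 1-RTT
```

Note that after header protection (later in this chapter) the type bits of a captured packet are masked, so this classification only works on unprotected or already-unmasked bytes.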
from DNS query to 200 OK to connection close · every step pinned to its RFC §
The next 14 pipeline chapters hang off one request: type https://ursb.me in Chrome, press Enter. We follow this request's byte stream through its full life — DNS query, first handshake, request payload, response, idle, network switch, graceful close, stateless reset — 10 phases in all. Every chapter below carries a "◇ In our GET request" card showing input, transform, output at that stage.
The cast on this main line:
Chrome can't fire a QUIC packet yet — it needs DNS first: where's ursb.me? Which ALPNs does it speak? Chrome queries 1.1.1.1 over DoH (RFC 8484), asking simultaneously for A (IPv4) and the new HTTPS RR (RFC 9460). The latter returns the ALPN list + an IP hint in one record, saving an RTT.
5 ms after the DNS response, Chrome assembles the first actual QUIC packet. Because there's a PSK ticket from the last visit, this is a 0-RTT send: the ClientHello and the GET request ride in the same UDP datagram.
20 ms later the first server datagram arrives. This is a classic coalesced case: the server packs Initial, Handshake and 1-RTT packets all into one UDP datagram, carrying CRYPTO frames for different handshake stages plus the first batch of response data.
Once the earlier Handshake packet is acknowledged, the complete body lands by ~45 ms. The 3200-byte HTML rides Stream 0 in DATA frames spread across two 1-RTT packets. The 0-RTT win is concrete here: the user sees 200 OK before the handshake has fully finished.
Once Chrome receives the 3200 bytes, the STREAM frame carries FIN=1 — no more data in this direction. The client replies with an empty STREAM(FIN) to close its own direction — bidirectional half-close semantics. It also ACKs HANDSHAKE_DONE, letting the server drop the Handshake keys.
The connection doesn't close immediately — Chrome holds it for 30 s, hoping the next request (CSS, images, an API call) reuses it. Either side may send PING frames (RFC 9000 §19.2) to keep NAT mappings alive. max_idle_timeout was negotiated in the TP exchange — min(client 30s, server 30s) = 30s.
8 minutes later the user walks out of the café and the phone switches to 5G — src_ip flips from 192.168.1.42 to 10.220.5.13. Chrome activates a pre-stocked spare CID; the server sees a new IP with a valid CID and fires PATH_CHALLENGE. The path is validated in one RTT; the connection survives.
15 minutes in, the server decides to retire this connection (rolling deploy, load balancing, quota expiry). It sends GOAWAY (H3 frame 0x07): "I'll finish in-flight streams but accept no new ones." After the last stream is done, it sends CONNECTION_CLOSE (QUIC frame 0x1c). The server then enters the draining state for 3 PTO, ignoring any late packets to avoid confusion with a "new" connection. See Ch19.
If the server process unexpectedly restarts (OOM, crash, container upgrade), the next 1-RTT packet from the client finds the new process without any matching connection state. The new process can't send CONNECTION_CLOSE (no keys, no state). Instead it emits a Stateless Reset: a packet that looks like random UDP bytes but ends in the 16-byte reset token the original server pre-distributed via NEW_CONNECTION_ID in Phase 2. Only the client can recognise the token — and only then can it safely conclude "the peer really lost state" and tear down locally. This is the stateless-recovery path of RFC 9000 §10.3.
Each chapter below carries a "◇ In our GET request" card that anchors its input / action / output to the 10 phases above. Use this table as the reading map:
| Main-line phase | Drill-down chapter | RFC § |
|---|---|---|
| Phase 0 · DNS | Ch22 Field work · Ch04 UDP | 9460 · 8484 · 9250 |
| Phase 1 · Initial out | Ch06 UDP Datagram · Ch08 0-RTT | 9000 §17.2 · 9001 §4 |
| Phase 2 · Server crypto | Ch07 Handshake · Ch09 Crypto layers | 9001 §4-§5 · 8446 §4 |
| Phase 3 · 200 OK | Ch14 H3 frames · Ch15 QPACK | 9114 §7 · 9204 |
| Phase 4 · FIN | Ch11 Streams · Ch12 Loss | 9000 §3 (states) · §19.8 |
| Phase 5 · idle | Ch13 Congestion | 9000 §10.1 · §19.2 PING |
| Phase 6 · migration | Ch17 Migration | 9000 §9 |
| Phase 7-8 · close | Ch19 Lifecycle (new) | 9114 §5.2 · 9000 §10 |
| Phase 9 · stateless reset | Ch19 Lifecycle (new) | 9000 §10.3 · §18.2 |
"DNS in 5 ms · handshake + 0-RTT GET in 25 ms · 200 OK at 45 ms ·
gracefully closed 15 minutes later.
Half the time went to crypto, a third to waiting on the speed of light." main-line · phase summary
QUIC packet structure, byte by byte
Main-line time T+20ms: Chrome wraps a still-mostly-empty TLS 1.3 ClientHello into one UDP datagram, sent from source port 52341 to destination 39.105.102.252:443. This chapter takes that packet apart byte by byte.
QUIC encrypts not only the payload but also the packet number and the low bits of the flags byte. The recipe: take a 16-byte sample of the encrypted payload, run AES-ECB with that level's HP key to derive a mask, then XOR the mask onto the PN and the flags. This "header protection" layer specifically defeats middleboxes that would otherwise read PNs for traffic analysis.
The quickest way to see H3 in Chrome: DevTools → Network → enable the Protocol column. A row marked h3 = HTTP/3. If it says h2, something blocked you and the browser fell back to TCP. To diagnose, dump a NetLog from chrome://net-export/ and load it into netlog-viewer.appspot.com.
QUIC isn't TLS over UDP; QUIC carries TLS
Killing the "2-RTT minimum" left over from HTTP/2 is HTTP/3's biggest selling point. To really see why HTTP/3 hits 1-RTT (and 0-RTT on resumption), you need to look at how QUIC and TLS 1.3 merge: not as stacked layers, but as QUIC carrying TLS 1.3 handshake records inside CRYPTO frames, letting the handshake and application data share a single RTT.
| Protocol | Handshake | + first data | Total |
|---|---|---|---|
| TCP + TLS 1.2 | 1 RTT (SYN) + 2 RTT (TLS) | + 1 RTT | 4 RTT |
| TCP + TLS 1.3 | 1 RTT (SYN) + 1 RTT (TLS) | + 1 RTT | 3 RTT |
| TCP Fast Open + TLS 1.3 | 0.5 RTT (TFO) + 1 RTT | + 1 RTT | 2 RTT* |
| QUIC + TLS 1.3 (1-RTT) | 1 RTT (handshake + data) | — | 1 RTT |
| QUIC + TLS 1.3 (0-RTT) | 0 RTT (data on first packet) | — | 0.5 RTT |
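The table's arithmetic as a throwaway helper (function name and the 200 ms figure are illustrative, matching the intercontinental example from the previous section):

```python
def setup_cost_ms(rtt_ms: float, transport_rtts: float, tls_rtts: float,
                  first_data_rtts: float = 1.0) -> float:
    """Milliseconds before the first response byte can arrive."""
    return (transport_rtts + tls_rtts + first_data_rtts) * rtt_ms

RTT = 200  # ms, intercontinental
stacks = {
    "TCP + TLS 1.2":           setup_cost_ms(RTT, 1, 2),        # 4 RTT
    "TCP + TLS 1.3":           setup_cost_ms(RTT, 1, 1),        # 3 RTT
    "QUIC + TLS 1.3 (1-RTT)":  setup_cost_ms(RTT, 0, 1, 0),     # data shares the handshake RTT
    "QUIC + TLS 1.3 (0-RTT)":  setup_cost_ms(RTT, 0, 0, 0.5),   # data on the first flight
}
for name, ms in stacks.items():
    print(f"{name:26s} {ms:5.0f} ms")
```

At 200 ms RTT the spread runs from 800 ms down to 100 ms, which is the whole argument of this chapter in one loop.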
During the handshake both sides declare a set of transport parameters (TP), wrapped inside TLS ClientHello / EncryptedExtensions extensions. This is the single source of truth for every window, timeout, and limit in the connection's lifetime. The ones that matter most in practice, out of the standard set:
| id · name | Meaning | Chrome default |
|---|---|---|
| 0x01 max_idle_timeout | idle timeout (min of both sides) | 30 s |
| 0x02 stateless_reset_token | used by §10.3 stateless reset | 16 B random |
| 0x03 max_udp_payload_size | max UDP payload accepted | 1452 |
| 0x04 initial_max_data | connection-level flow window | 10 MB |
| 0x05 init_max_stream_data_bidi_local | window for streams we open | 6 MB |
| 0x06 init_max_stream_data_bidi_remote | for streams the peer opens | 6 MB |
| 0x07 init_max_stream_data_uni | for unidirectional streams | 6 MB |
| 0x08 initial_max_streams_bidi | concurrent bidi stream cap | 100 |
| 0x09 initial_max_streams_uni | uni stream cap | 100 |
| 0x0b max_ack_delay | max ACK delay (drives PTO) | 25 ms |
| 0x0c disable_active_migration | opt-out of active migration (phones want false) | false |
| 0x0e active_connection_id_limit | peer's CID pool size | 8 |
| 0x20 max_datagram_frame_size | DATAGRAM frame max (0 = disabled) | 0 / 1200 |
TP is not a negotiation — it's a pair of declarations. Each side independently states what it will accept, and the effective value is the tighter of the two. Both say max_idle_timeout=30s ⇒ 30s wins; client says 30s, server says 10s ⇒ 10s wins. Some parameters (such as stateless_reset_token) may only come from the server; a client sending them is a protocol violation.
* TFO cookies are frequently stripped by middleboxes in transit; in practice TFO isn't considered "really usable".
a free lunch, with a replay-attack tail
After the first visit to ursb.me, the server appends a NewSessionTicket at the tail of the 1-RTT handshake — an opaque blob encrypted under the server's own key, containing a PSK. Chrome stores it. On the next visit, Chrome ships the ticket back and simultaneously encrypts the GET request with the PSK-derived 0-RTT key, sending it as Early Data — application bytes are flying before the first round trip completes.
The 0-RTT PSK carries no freshness. An attacker can record your first UDP datagram and replay it forever — the server can't tell "you" from a tape rewind. Fine for an idempotent GET (same answer every time). Catastrophic for POST /transfer/100USD — replayed a hundred times, that's a hundred transfers.
The deployed defences stack in layers: a proxy that forwards early data marks it with the Early-Data: 1 header (RFC 8470), so the application layer (e.g. a Cloudflare Worker) can decline to process it or answer 425 Too Early; the TLS stack gates it with a switch such as SSL_CTX_set_early_data_enabled, backed by a cluster-wide deduper.
Cloudflare enables 0-RTT for all customers by default, but only for GET/HEAD requests without a query string (queries are often state-changing). If the origin returns Cache-Control: private or Set-Cookie, the Cloudflare edge auto-promotes the request to 1-RTT before forwarding upstream. Per their blog «Even faster connection establishment with QUIC 0-RTT resumption», 0-RTT lowers median TTFB for returning visitors by ~50ms.
why Initial packets are "encrypted" yet anyone can decrypt them
Initial packets derive their keys from a public salt (RFC 9001 §5.2 spells out 0x38762cf7…f5b8) + the client-chosen DCID. Anyone can compute them. So Initial-packet "encryption" does not protect confidentiality — it prevents middleboxes from peeking at the ClientHello and acting on what they saw. This is anti-ossification applied at the key-schedule layer.
Wireshark needs SSLKEYLOGFILE to decrypt QUIC: the browser writes each level's secret to that file, and Wireshark can then decode all four levels. On macOS, launch Chrome with SSLKEYLOGFILE=~/keys.log /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome. Once Encrypted Client Hello (ECH, draft-ietf-tls-esni) stabilises (Cloudflare turned it on in 2023; Chrome 117+ ships it on by default), this trick only yields the outer ClientHello — the real SNI lives inside an HPKE-encrypted inner ClientHello.
the payload isn't a byte stream; it's a chain of frames
Decrypt a QUIC packet's payload and you don't get "a chunk of data" — you get a chain of frames. Each carries its own type and length; both ends process them in order. Below is the full RFC 9000 §19 catalogue (plus RFC 9221's DATAGRAM), sorted into four families to make the skeleton visible.
The DATAGRAM frame (RFC 9221) is QUIC's only unreliable payload — no retransmit, no flow control, no ordering. Its size is capped by the max_datagram_frame_size transport parameter (0 by default, i.e. disabled until both sides opt in). Its sole purpose is to enable WebTransport / MASQUE / Media-over-QUIC — the "better-drop-than-wait" real-time use cases. Plain HTTP/3 traffic should never touch it.
The low 3 bits of a STREAM frame type encode three independent flags: OFF (carries an offset?), LEN (carries a length?), FIN (is this the stream's last byte?). 2³ = 8 type codes, 0x08-0x0f.
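Decoding those bits is one test per flag. A sketch (function name mine):

```python
def parse_stream_frame_type(t: int) -> dict:
    """Decode a STREAM frame type 0x08-0x0f into its flags (RFC 9000 §19.8)."""
    assert 0x08 <= t <= 0x0F, "not a STREAM frame type"
    return {
        "OFF": bool(t & 0x04),  # an explicit Offset field follows
        "LEN": bool(t & 0x02),  # an explicit Length field follows
        "FIN": bool(t & 0x01),  # this frame carries the stream's final byte
    }

# 0x0F = offset + length + final byte, e.g. the last chunk of a response body
print(parse_stream_frame_type(0x0F))
```

The absent-flag cases are the compression: no LEN means "the stream data runs to the end of the packet", no OFF means "offset 0", so the most common frames spend zero bytes on either field.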
Almost every length field in QUIC uses variable-length integers (var-ints): the top 2 bits of the first byte select a 1/2/4/8-byte encoding. 0x37 = 55; 0x40 0x40 = 64; 0x80 0x00 0x40 0x00 = 16384. This "small value = small encoding" design makes small frames like ACKs ~30% smaller on average — an invisible source of QUIC's throughput edge.
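A decoder that reproduces the three examples above, assuming the buffer starts at the var-int (RFC 9000 §16):

```python
def varint_decode(buf: bytes) -> tuple:
    """Decode one QUIC variable-length integer (RFC 9000 §16).

    The top 2 bits of the first byte select a 1/2/4/8-byte encoding;
    the remaining 6/14/30/62 bits hold the value, big-endian.
    Returns (value, bytes_consumed).
    """
    length = 1 << (buf[0] >> 6)          # 00→1, 01→2, 10→4, 11→8 bytes
    value = buf[0] & 0x3F                # strip the 2 length bits
    for b in buf[1:length]:
        value = (value << 8) | b
    return value, length

# The three examples from the text:
print(varint_decode(bytes([0x37])))                    # (55, 1)
print(varint_decode(bytes([0x40, 0x40])))              # (64, 2)
print(varint_decode(bytes([0x80, 0x00, 0x40, 0x00])))  # (16384, 4)
```

Note that 64 fits in one byte numerically but not in the 6-bit payload of the 1-byte form, which is why it spills into the 2-byte encoding — the length bits eat into the value space.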
QUIC's ACK frame is 10× more capable than TCP SACK. One ACK frame can pack multiple ranges: {largest_ack, [gap, ack_range]*} — "I have PN 100, 90-95, 80-85, ..." A single ACK frame can describe every received PN of the whole connection. TCP's SACK lives in the TCP options space and is capped at 4 ranges; QUIC has no such cap.
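Turning a set of received PNs into that wire shape is mechanical. A sketch using the example from the text; gap and range counts are encoded as "length minus 1", matching RFC 9000 §19.3.1 (function name mine):

```python
def build_ack_ranges(received) -> tuple:
    """Summarise received packet numbers as an ACK frame would (RFC 9000 §19.3.1).

    Returns (largest_ack, first_ack_range, [(gap, ack_range), ...]).
    first_ack_range counts packets contiguous below largest_ack;
    gap = unacked packets between ranges, minus 1;
    ack_range = acked packets in the range, minus 1.
    """
    pns = sorted(set(received), reverse=True)
    runs = [[pns[0], pns[0]]]                 # contiguous runs: [high, low]
    for pn in pns[1:]:
        if pn == runs[-1][1] - 1:
            runs[-1][1] = pn                  # extend the current run down
        else:
            runs.append([pn, pn])             # start a new run
    largest, first_range = runs[0][0], runs[0][0] - runs[0][1]
    ranges, prev_low = [], runs[0][1]
    for high, low in runs[1:]:
        ranges.append((prev_low - high - 2, high - low))
        prev_low = low
    return largest, first_range, ranges

# "I have PN 100, 90-95, 80-85":
print(build_ack_ranges([100] + list(range(90, 96)) + list(range(80, 86))))
```

Each (gap, ack_range) pair then gets var-int encoded, so a sparse reception pattern still costs only a handful of bytes.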
HTTP/2 muxes at the app layer, HTTP/3 muxes at the transport layer
Every stream has a var-int ID. The low 2 bits encode two things at once: direction (bidi / uni) and originator (client / server).
| bits | Encoded IDs | Meaning | HTTP/3 use |
|---|---|---|---|
| 0x00 | 0, 4, 8, 12, … | client-initiated bidi | request streams |
| 0x01 | 1, 5, 9, 13, … | server-initiated bidi | unused in H3 |
| 0x02 | 2, 6, 10, … | client-initiated uni | control · QPACK encoder/decoder |
| 0x03 | 3, 7, 11, … | server-initiated uni | control · QPACK · Push |
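The whole table collapses to two bit tests. A sketch (function name mine):

```python
def describe_stream(stream_id: int) -> str:
    """Read initiator and direction off the low 2 bits (RFC 9000 §2.1).

    Bit 0: 0 = client-initiated, 1 = server-initiated.
    Bit 1: 0 = bidirectional,    1 = unidirectional.
    """
    initiator = "client" if stream_id & 0x01 == 0 else "server"
    direction = "bidi" if stream_id & 0x02 == 0 else "uni"
    return f"{initiator}-initiated {direction}"

print(describe_stream(0))   # client-initiated bidi (a request stream)
print(describe_stream(3))   # server-initiated uni  (e.g. H3 control)
```

Within each of the four classes the IDs simply count up by 4, so "the Nth stream of this kind" is 4·N + bits — no registry, no negotiation.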
Our GET uses StreamID = 0 (the first client-initiated bidi stream). Chrome simultaneously opens three uni streams: StreamID=2 (H3 control), StreamID=6 (QPACK encoder), StreamID=10 (QPACK decoder). This is why the next chapter says "the control stream must open first".
Each stream tracks MAX_STREAM_DATA: the sender stops once its cumulative bytes sent hit the limit; the receiver raises the window by sending MAX_STREAM_DATA frames.
The sum of all streams' bytes is capped by MAX_DATA, stopping one connection from eating all memory. Chrome defaults to 6 MB (OkHttp 25 MB · curl 1 MB).
Flow control runs in two independent dimensions, each with the same set of state variables:
"HTTP/2 stuffs 100 streams into one TCP connection.
HTTP/3 stuffs 100 actually independent streams into one QUIC connection." RFC 9000 §2, paraphrased
why QUIC measures RTT more accurately than TCP
TCP sequence numbers identify byte offsets, so a retransmission carries the same seq as the original. When an ACK arrives, you can't tell whether it answers the original or the retransmit. This is the infamous retransmission ambiguity; it forces TCP to use Karn's algorithm and discard RTT samples from retransmitted segments.
QUIC packet numbers are never reused. A retransmit carries a new PN, and the old PN is retired forever. An ACK names exactly the PN it acknowledges, so RTT samples are exact. This is why advanced congestion control like BBR runs better on QUIC than on TCP.
When nothing is ACKed within smoothed_RTT + 4 × RTTVAR + max_ack_delay, the PTO fires: send a PING probe to "wake up" the peer. This replaces TCP's RTO and its 1-second hammer.
QUIC turns congestion control into an app setting
TCP congestion control lives in the kernel — upgrading it takes years. QUIC moved it to user space. Cloudflare wants BBR v3? Change one line of Rust. YouTube wants Google's own cc algorithm? Same — one line of C++. This is QUIC's real "R&D accelerator" value: it turns congestion control into an application concern, not a decade-long kernel upgrade queue.
| cc | signal | throughput | fairness | deployed at |
|---|---|---|---|---|
| NewReno (RFC 9002 default) | loss | baseline | good | default in smaller libs |
| CUBIC (RFC 8312) | loss | 1.5× baseline | good | Linux TCP default · ngtcp2 |
| BBR v2/v3 | bandwidth + RTT | 2-3× baseline | can starve CUBIC | Google · Cloudflare · Meta |
CUBIC / NewReno treat loss as the congestion signal — but most modern packet loss comes from wireless channel errors, not congestion. BBR directly measures bottleneck bandwidth (max bw) and minimum RTT, then uses the BDP (bandwidth-delay product) as its target in-flight bytes. Result: BBR saturates bandwidth on lossy but uncongested links (4G/5G/Wi-Fi), where CUBIC stutters between brake and accelerator.
Google's 2017 SIGCOMM paper «BBR: Congestion-Based Congestion Control» reported: on US cross-state links, BBR reduced YouTube's video rebuffer rate by 53% and startup time by 8%. BBR v3 (2024) tightened throughput stability by another ~15%. Google deploys BBR on both TCP (Linux kernel 4.9+) and QUIC (QUICHE) — but the QUIC variant runs more stably thanks to monotonic PNs (see Ch12).
QUIC encrypts packet numbers — operators can no longer measure RTT the way they did with TCP. This drove operators wild (their SLA monitoring and traffic engineering all depend on RTT data). The QUIC WG's compromise: the Spin Bit — 1 bit in the short header that flips once per RTT, letting middleboxes passively estimate RTT without decrypting anything. Clients may disable it for privacy, but in production it's almost always on.
Three concepts drive this pseudocode: (1) slow start — cwnd grows by one segment per ACK; (2) congestion avoidance — cwnd grows by 1/cwnd per ACK (i.e. 1 packet per RTT); (3) persistent congestion — 3 PTOs with no ACK at all is treated as a path break, and cwnd resets to the minimum. BBR ditches this whole loop and directly measures bottleneck bandwidth, hence 1.5-3× the throughput of NewReno — see Ch20 for production numbers.
whatever QUIC already did, HTTP/3 doesn't redo
HTTP/2 has 10 frame types (DATA / HEADERS / PRIORITY / RST_STREAM / SETTINGS / PUSH_PROMISE / PING / GOAWAY / WINDOW_UPDATE / CONTINUATION); HTTP/3 has 7 — because QUIC already handles flow control, stream reset, ping, and priority. HTTP/3 only carries "HTTP's own business" now.
| Type | Hex | 用途Purpose | HTTP/2 里In HTTP/2 |
|---|---|---|---|
| DATA | 0x00 | HTTP bodyHTTP body | same |
| HEADERS | 0x01 | QPACK 压缩头部QPACK-encoded headers | same |
| CANCEL_PUSH | 0x03 | 取消 Push(已死)cancel push (dead) | — |
| SETTINGS | 0x04 | 连接参数connection params | same |
| PUSH_PROMISE | 0x05 | 服务器 Push(已死)server push (dead) | deprecated |
| GOAWAY | 0x07 | 优雅关闭graceful close | same |
| MAX_PUSH_ID | 0x0d | 允许的 Push ID 上限push limit | — |
| — 砍掉 —— removed — | — | PRIORITY · RST_STREAM · PING · WINDOW_UPDATE · CONTINUATION | QUIC 处理handled by QUIC |
单向流的第一个字节是流类型,不是帧类型。0x00 = control, 0x01 = push, 0x02 = QPACK encoder, 0x03 = QPACK decoder。GREASE 类型(用预留范围 0x1f * N + 0x21,RFC 9114 §7.2.8 + RFC 9287)任何端都可以发——这就是 RFC 9114 的反僵化策略:故意送一些对方不认识的流,强迫实现"遇到不认识就忽略",否则永远不会有 0x04 出现。
A uni stream's first byte is the stream type, not a frame type. 0x00 = control, 0x01 = push, 0x02 = QPACK encoder, 0x03 = QPACK decoder. GREASE types (reserved range 0x1f·N + 0x21, RFC 9114 §7.2.8 + RFC 9287) can be sent by either side — RFC 9114's anti-ossification trick: deliberately send streams the peer doesn't recognise, forcing implementations to "ignore unknown", so 0x04 can land in the future.
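The GREASE arithmetic is trivial to check in code. A sketch (the stream-type names come from RFC 9114; the classifier merely illustrates the "ignore unknown" rule, it is not a real demultiplexer):

```python
# Reserved/GREASE stream & frame types: 0x1f * N + 0x21 (RFC 9114 §7.2.8)
def grease_type(n: int) -> int:
    return 0x1f * n + 0x21

KNOWN_STREAM_TYPES = {
    0x00: "control",
    0x01: "push",
    0x02: "qpack encoder",
    0x03: "qpack decoder",
}

def classify(stream_type: int) -> str:
    # anti-ossification rule: anything unrecognised is skipped, never fatal
    return KNOWN_STREAM_TYPES.get(stream_type, "unknown -> ignore")
```

Every value of the form 0x1f·N + 0x21 (0x21, 0x40, 0x5f, …) lands in the "ignore" branch, which is the behaviour GREASE exists to keep alive.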
HTTP/2 的优先级是个有名的笑话——RFC 7540 §5.3 设计了一棵 weighted dependency tree,让客户端"告诉服务器谁先发"。Firefox 写过、Chrome 写过、Safari 没写。三家行为完全不一致,最后 RFC 9113 把它整段废弃了。
HTTP/3 选择了完全不同的路线 ——RFC 9218 · Extensible Priorities for HTTP(2022-06,和 RFC 9114 同期发):
HTTP/2's priority was a famous joke — RFC 7540 §5.3 designed a weighted dependency tree for clients to tell servers "send these first". Firefox shipped one. Chrome shipped a different one. Safari shipped none. The three implementations behaved nothing alike. RFC 9113 finally obsoleted the whole thing.
HTTP/3 went a different route entirely — RFC 9218 · Extensible Priorities for HTTP (2022-06, shipped with RFC 9114):
priority: u=3, i——u = urgency 0(高)…7(低),i = incremental(流式内容可逐字节渲染)。
Chrome DevTools → Network → 右侧 "Priority" 列。Chrome 内部把主资源 / CSS / JS / 图片 / 字体映射到 u=0..5。可以用 Fetch API 的 priority 选项手动覆盖:fetch(url, { priority: 'high' })。这是 RFC 9218 在浏览器侧的唯一对外接口。
Chrome DevTools → Network → "Priority" column on the right. Chrome maps main resource / CSS / JS / image / font internally to u=0..5. You can override with the Fetch API's priority option: fetch(url, { priority: 'high' }). That's the only browser-facing surface for RFC 9218.
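The header itself is simple enough to parse by hand. A minimal, non-validating sketch for the two RFC 9218 keys (real code should use a proper Structured Fields parser per RFC 8941; defaults u=3, i=false are from the RFC):

```python
def parse_priority(value: str):
    """Parse an RFC 9218 `priority` header value like "u=0, i"."""
    urgency, incremental = 3, False  # RFC 9218 defaults
    for item in value.split(","):
        item = item.strip()
        if item == "i":
            incremental = True       # bare key = boolean true (RFC 8941)
        elif item.startswith("u="):
            urgency = max(0, min(7, int(item[2:])))  # clamp to 0..7
    return urgency, incremental
```

So `priority: u=0, i` parses as the most urgent, incrementally renderable resource, and an empty or absent header falls back to (3, False).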
为什么不能直接用 HPACK
why we couldn't just keep HPACK
HPACK(HTTP/2)依赖一个严格同步的动态表。服务器在 Stream A 发了 ":status: 200",告诉客户端"把这条加进表,索引 62"。下一个流可以用索引 62 来引用——前提是 Stream A 在 Stream B 之前到达。HTTP/2 over TCP 天然按序,所以没问题。
QUIC 各流相互独立、并发到达。Stream A 的 update 还没来,Stream B 已经用了索引 62——无法解压。这就把 transport 层好不容易消灭的 head-of-line blocking 又拽回了应用层。
HPACK (HTTP/2) depends on a strictly synchronised dynamic table. Server sends ":status: 200" on Stream A and says "insert this, index 62". The next stream can now refer to index 62 — assuming Stream A arrives before Stream B. HTTP/2 over TCP is naturally ordered, so this works.
QUIC streams are independent and arrive concurrently. If Stream A's update hasn't landed yet but Stream B already references index 62 — cannot decode. The head-of-line blocking the transport layer worked so hard to kill comes roaring back at the app layer.
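The desync is easy to show with a toy table. The sketch below models only the failure mode; QPACK's actual fix (RFC 9204) adds Required Insert Count accounting on separate encoder/decoder streams, which this deliberately omits:

```python
class ToyDynamicTable:
    """Toy header table: inserts travel on one stream, references on another."""
    def __init__(self):
        self.entries = {}            # index -> header line

    def insert(self, index, line):
        # carried by encoder instructions (Stream A in the example)
        self.entries[index] = line

    def decode(self, index):
        if index not in self.entries:
            return None              # blocked: the required insert hasn't arrived
        return self.entries[index]
```

If Stream B's header block (`decode(62)`) is processed before Stream A's insert lands, decoding must block — head-of-line blocking reborn at the app layer, which is what QPACK's bookkeeping exists to bound.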
alt-svc、content-security-policy、strict-transport-security、:scheme: https 等现代 web 必备字段。
alt-svc, content-security-policy, strict-transport-security, :scheme: https and other modern-web staples.
| scenario | raw bytes | HPACK (H2) | QPACK (H3) |
|---|---|---|---|
| 首次请求first request | ~600 | ~50 | ~52 |
| 同连接重复请求repeated request, same conn | ~600 | ~5 | ~6 |
| 弱网(丢包)weak link (lossy) | ~600 | ~5 + HOL | ~6 (no HOL) |
压缩率本身差不多。QPACK 的赢面在抗丢包。
Compression ratios are nearly identical. QPACK's win is in resistance to loss.
大多数 QUIC 库把动态表默认设为 4 KB——远小于 HPACK 的 64 KB。原因:表越大,堆积的 "blocked streams" 越多。数据中心内 / 低延迟路径可以调大;公网 / 移动网络不要动。nginx 的开关是 http3_max_field_size。
Most QUIC libraries default the dynamic table to 4 KB — much smaller than HPACK's 64 KB. Reason: the bigger the table, the more "blocked streams" pile up. Bump it up on intra-datacenter / low-latency paths; don't on public / mobile networks. The nginx knob is http3_max_field_size.
一个写在 RFC 里、被市场否决的功能
a feature that lived in the RFC and died in production
2015 年 HTTP/2 把 Server Push 当成杀手特性写进了 RFC 7540——服务器知道客户端马上要 app.css,那为什么不提前推给它?2022 年 Chrome 106 默认禁用了 Server Push。2024 年彻底从 Chromium 代码里移除。HTTP/3 RFC 9114 出于"协议完整性"保留了 PUSH_PROMISE 帧——但浏览器都不接。
In 2015, HTTP/2 wrote Server Push into RFC 7540 as a killer feature — the server knows the client will need app.css, so why not push it ahead of time? In 2022, Chrome 106 disabled Server Push by default. In 2024, it was deleted from the Chromium tree. HTTP/3 RFC 9114 kept the PUSH_PROMISE frame for "protocol completeness" — but no browser accepts it anymore.
服务器盲目 push app.css——但如果客户端缓存里已经有了呢?带宽白浪费。Chrome 实测发现 70%+ 的 push 被客户端立即 CANCEL_PUSH 掉。
The server blindly pushes app.css — but what if the client already has it cached? Bandwidth wasted. Chrome's telemetry: 70%+ of pushes get immediately CANCEL_PUSHed.
服务器推的 app.css 在线上跟客户端发起的 app.js 抢拥塞窗。BBR 不知道哪个更急——结果两个都慢。
The server-pushed app.css competes with the client-issued app.js on the congestion window. BBR can't tell which is more urgent — both end up slower.
服务器先发一个 HTTP 103 Early Hints 响应(RFC 8297),告诉客户端"你可能会需要 app.css"。客户端自己决定要不要 preload。简单、可观察、不抢带宽。
Server sends a HTTP 103 Early Hints response (RFC 8297) telling the client "you'll probably need app.css". The client decides whether to preload. Simple, observable, no bandwidth war.
CDN(Cloudflare 等)依然在边缘到 origin 之间偷偷用 Push 做 prefetch 优化——这不进客户端浏览器,所以不受 Chrome 106 影响。这种"内网 Push"还活着。
CDNs (Cloudflare et al.) still quietly use Push between their edge and origin for prefetch optimisation — that traffic never reaches the client browser, so Chrome 106 doesn't affect it. "Intra-network Push" lives on.
"Server Push 在 RFC 里完美无瑕,
在生产里几乎没找到一个稳定的用例。" "Server Push was flawless in the RFC,
and almost no stable use case ever showed up in production." Patrick Meenan · Chrome Web Performance · 2022
CID 是 QUIC 的身份证
the Connection ID is QUIC's passport
主线时刻 T+200ms(请求中途):你走出咖啡馆,手机自动切到 5G。src_ip: 192.168.1.42 → 10.220.5.13。TCP 在这里必死,因为连接由四元组定义。HTTP/3 不死——因为 QUIC 连接由 Connection ID 定义,而不是四元组。
Main-line time T+200ms (mid-request): you walk out of the café, the phone hops to 5G. src_ip: 192.168.1.42 → 10.220.5.13. TCP dies here, because TCP identifies a connection by the 4-tuple. HTTP/3 doesn't die — because QUIC identifies a connection by the Connection ID, not by IP-port.
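A sketch of the lookup that makes this survivable: the demultiplexing key is the Destination Connection ID, and a changed source address merely flags the path for validation. Names and dict layout here are illustrative, not any library's API:

```python
connections = {}  # dcid (bytes) -> connection state

def route(dcid: bytes, src_addr):
    """Find the connection for an incoming packet by CID, not 4-tuple."""
    conn = connections.get(dcid)
    if conn is not None and src_addr != conn["addr"]:
        # RFC 9000 §9: same CID from a new address = migration, not a new
        # connection; mark the path for PATH_CHALLENGE validation
        conn["needs_path_validation"] = True
        conn["addr"] = src_addr
    return conn
```

The Wi-Fi → 5G hop changes `src_addr` but not the CID, so `route` returns the same connection object — exactly where TCP's 4-tuple lookup would have returned nothing.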
连接建立后,服务器和客户端不停发 NEW_CONNECTION_ID 帧,互相给对方备好"未来可以用的 CID 列表"。每个 CID 还附带一个 Stateless Reset Token——用于无状态重置。
Once the connection is up, both sides keep emitting NEW_CONNECTION_ID frames, populating each other's "list of CIDs you may use in future". Each CID carries a Stateless Reset Token too — for stateless reset.
家用路由器的 NAT 表项一般有过期时间(30 秒~2 分钟)。如果客户端短时间没发包,NAT 会回收映射;下次再发包时,src_port 可能变了——这等于一次"客户端不知情的迁移"。RFC 9000 §9 把这种情况归到 "passive migration",处理逻辑和主动迁移一致:服务器看到新 4-tuple 就发 PATH_CHALLENGE。
Home router NAT entries usually have an expiration (30s-2min). If the client stays silent, NAT recycles the mapping; the next packet may have a different src_port — effectively a "migration the client doesn't know about". RFC 9000 §9 calls this "passive migration", handled identically: the server sees a new 4-tuple and sends PATH_CHALLENGE.
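Path validation itself is tiny. A sketch of the RFC 9000 §8.2 exchange: 8 random bytes go out in PATH_CHALLENGE, and the peer must echo exactly those bytes back in PATH_RESPONSE before the new path is trusted:

```python
import os

def make_challenge() -> bytes:
    # PATH_CHALLENGE carries 8 bytes of unpredictable data (RFC 9000 §19.17)
    return os.urandom(8)

def validate(challenge: bytes, response: bytes) -> bool:
    # PATH_RESPONSE must echo the challenge verbatim (RFC 9000 §19.18)
    return len(response) == 8 and response == challenge
```

An off-path attacker who spoofed the new 4-tuple never saw the challenge bytes, so it cannot produce a matching response — which is the whole point.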
3x 限制 · Retry · MPQUIC
3x limit · Retry · MPQUIC
UDP 无连接 ⇒ 服务器不知道"请求人是不是真的在这个 src_ip"。攻击者可以伪造 victim 的 src_ip 给 QUIC 服务器发 1 字节小包,让服务器回复 10000 字节大包到 victim ——典型的 DNS amp 攻击套路。QUIC 必须从协议层防住。
UDP is connectionless ⇒ the server doesn't know "is the requester really at this src_ip?" An attacker can spoof a victim's src_ip, send 1-byte QUIC packets to the server, and trick it into firing 10 000-byte responses at the victim — the classic DNS amp pattern. QUIC has to defend at the protocol level.
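The defence is RFC 9000 §8.1's anti-amplification limit: before the client's address is validated, the server may send at most 3× the bytes it has received from that address. A sketch (class and method names are illustrative):

```python
class AntiAmplification:
    LIMIT = 3  # RFC 9000 §8.1: pre-validation send budget = 3x bytes received

    def __init__(self):
        self.received = 0
        self.validated = False   # flips once the address is proven (e.g. Retry)

    def on_receive(self, nbytes: int):
        self.received += nbytes  # every byte from the address grows the budget

    def can_send(self, nbytes: int, sent_so_far: int) -> bool:
        if self.validated:
            return True          # after validation the limit disappears
        return sent_so_far + nbytes <= self.LIMIT * self.received
```

A spoofed 1-byte probe therefore buys the attacker at most 3 bytes of reflected traffic, killing the amplification economics outright.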
如果服务器收到的 ClientHello 看起来可疑(流量异常、资源紧张),可以回一个 Retry 包——里面装一个加密的 token。客户端必须重发 ClientHello 并带上 token。token 等于"我证明你在这个 IP"——下次再来直接信任。Cloudflare 在 DDoS 攻击期间会大量使用 Retry。
If a ClientHello looks suspicious (traffic spikes, resource crunch), the server can return a Retry packet carrying an encrypted token. The client must re-send ClientHello with that token. The token attests "I've proven you're at this IP" — next visits skip the check. Cloudflare hammers Retry during DDoS storms.
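A Retry token can be as simple as a timestamped HMAC over the client address. A sketch under stated assumptions: the secret, TTL, and token layout are all invented for illustration, and real deployments additionally bind the client's original DCID into the token:

```python
import hashlib
import hmac

SECRET = b"rotate-me-hourly"  # hypothetical server-side key, rotated regularly

def mint_token(client_ip: str, now: float) -> bytes:
    ts = int(now).to_bytes(8, "big")
    mac = hmac.new(SECRET, ts + client_ip.encode(), hashlib.sha256).digest()
    return ts + mac            # timestamp || MAC over (timestamp, ip)

def check_token(token: bytes, client_ip: str, now: float, ttl: int = 60) -> bool:
    ts, mac = token[:8], token[8:]
    if int(now) - int.from_bytes(ts, "big") > ttl:
        return False           # stale token: make the client redo Retry
    good = hmac.new(SECRET, ts + client_ip.encode(), hashlib.sha256).digest()
    return hmac.compare_digest(mac, good)
```

Because only the server holds the secret, a valid token proves the bearer really received a packet at that IP — exactly the return-routability property Retry exists to establish.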
draft-ietf-quic-multipath(截至 2026 已成熟)允许一个 QUIC 连接同时跑 Wi-Fi 和 5G 两条路径。包号空间共享,stream 数据在两条路径上自由调度。Apple iCloud Private Relay 是最早的大规模生产 MPQUIC 部署。
与 MPTCP 对比:MPTCP 只能在内核做,部署率 < 1%;MPQUIC 完全在用户态,每个 QUIC 库都可以独立实现。
draft-ietf-quic-multipath (mature by 2026) lets one QUIC connection simultaneously use Wi-Fi and 5G. Packet number spaces are shared; stream data schedules freely across paths. Apple iCloud Private Relay is the earliest large-scale MPQUIC deployment.
vs MPTCP: MPTCP is kernel-only, < 1% deployed. MPQUIC lives entirely in user space — any QUIC library can implement it independently.
Apple 使用 MASQUE(CONNECT-UDP)把 QUIC 隧道分发给两个独立的中继节点。手机端的 NSURLSession + MPQUIC 自动在 Wi-Fi/5G 两条物理路径上做透明聚合——当 Wi-Fi 抖动时,5G 直接接管,应用层零感知。这是第一次在消费级设备上规模化跑 MPQUIC。
Apple uses MASQUE (CONNECT-UDP) to distribute QUIC tunnels across two independent relay nodes. NSURLSession + MPQUIC on the phone transparently aggregates across Wi-Fi/5G physical paths — when Wi-Fi jitters, 5G takes over instantly, with zero app awareness. The first consumer-scale MPQUIC deployment.
GOAWAY · CONNECTION_CLOSE · draining · idle · stateless reset
GOAWAY · CONNECTION_CLOSE · draining · idle · stateless reset
之前 18 章都讲请求来的事——但一个真实的 QUIC 连接还要走完关闭、排空、复活三种结局。生产环境里大部分 bug、半小时一次的"无原因连接重置"、CDN 滚动重启时的瞬时错误,全藏在这一章。
The previous 18 chapters covered request arrival. A real QUIC connection still has to walk through close, drain, revive. Most production bugs, the "mysterious connection resets" every 30 minutes, the transient errors during CDN rolling restarts — they all hide in this chapter.
优雅关闭:GOAWAY(RFC 9114 §5.2,H3 帧 0x07)告诉客户端"新流我不接,已开的流我处理完"。等所有 stream 跑完,发 CONNECTION_CLOSE(RFC 9000 §19.19,QUIC 帧 0x1c)正式结束。
Graceful close: GOAWAY first (RFC 9114 §5.2, H3 frame 0x07): "no new streams, but I'll finish in-flight ones". Once every stream completes, it sends CONNECTION_CLOSE (RFC 9000 §19.19, QUIC frame 0x1c) for real.
立即关闭:任一端随时可发 CONNECTION_CLOSE(error=N)。所有进行中的流立即收到 RESET_STREAM。常见于客户端检测到加密协议错误时——比如 PN 单调性被破坏(§13.2.3)。
Immediate close: either endpoint can send CONNECTION_CLOSE(error=N) at once. All in-flight streams receive RESET_STREAM. Common when the client detects a crypto-layer violation — e.g. PN monotonicity broken (§13.2.3).
空闲超时:双方在传输参数(TP)里各声明 max_idle_timeout,取较小值。30 秒没收到任何包,连接静默销毁——不发 CC,不通知对端。这是 NAT 表项过期的常态。要保活:发 PING 帧(§19.2)刷新计时器。
Idle timeout: both sides declare max_idle_timeout in TP, take the smaller. After 30 s with no packets, the connection is silently destroyed — no CC, no peer notification. This is also how NAT entries die. To prevent: send PING (§19.2) to reset the timer.
关闭不能立刻完成——因为对端可能还在 in-flight 中送包过来。如果端点立刻销毁连接状态、再开一个新连接,新连接可能收到旧连接的包,把它误当成新连接的握手包处理——后果可能很严重。
RFC 9000 §10.2 的解法是:发完 CONNECTION_CLOSE 后进入 closing 状态,3 PTO 之内每收到一个包就回一次 CC(用 idempotent CC 避免对端不断重试);然后进入 draining,纯丢包 3 PTO;最后才进入 closed 销毁内存。这 3+3=6 PTO 大约 100-300ms——是 QUIC 连接关闭的真实耗时,不是你看到的"立刻"。
Close cannot complete instantly — the peer might still be sending packets in-flight. If an endpoint frees state immediately and opens a fresh connection, the fresh one might receive the old connection's packets and confuse them with new-connection handshake — potentially catastrophic.
RFC 9000 §10.2's fix: after sending CONNECTION_CLOSE, enter closing; for 3 PTO reply with another CC to every incoming packet (idempotent CC prevents the peer's retries). Then enter draining: silently drop everything for another 3 PTO. Only then enter closed and free memory. 3 + 3 = 6 PTO ≈ 100-300 ms — that's the real cost of closing a QUIC connection, not the "instant" you see.
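The 3 + 3 PTO timeline can be written as a tiny state function, following the article's closing → draining → closed model (a sketch: real endpoints arm these as timers rather than polling a clock):

```python
class CloseTimeline:
    """Which termination state a connection is in, t seconds after sending CC."""
    def __init__(self, pto: float):
        self.draining_at = 3 * pto   # until here: reply CC to anything received
        self.closed_at = 6 * pto     # until here: receive and drop silently

    def state(self, t: float) -> str:
        if t < self.draining_at:
            return "closing"         # idempotent CONNECTION_CLOSE on each packet
        if t < self.closed_at:
            return "draining"        # send nothing, absorb stragglers
        return "closed"              # state finally freed
```

With a 50 ms PTO the connection lingers for ~300 ms after "closing" — the invisible tail cost the chapter describes.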
CDN(Cloudflare、Fastly、Akamai)在边缘节点滚动重启时必须正确实现 GOAWAY,否则数百万长连接同时被重置——所有客户端同时重连 = 惊群。正确顺序:先发 GOAWAY(stream_id=∞) 标记"不接新请求",等 ~30 秒让 in-flight 完成,再发 GOAWAY(0) + CONNECTION_CLOSE。Cloudflare 的 Pingora 框架专门为这套逻辑做了状态机。
CDNs (Cloudflare, Fastly, Akamai) must implement GOAWAY correctly during edge node rolling restarts, or millions of long-lived connections get reset at once — every client reconnects simultaneously = thundering herd. Correct sequence: send GOAWAY(stream_id=∞) marking "no new requests", wait ~30 s for in-flight to drain, then GOAWAY(0) + CONNECTION_CLOSE. Cloudflare's Pingora framework has a dedicated state machine for this.
CONNECTION_CLOSE 帧带一个错误码——按"是 QUIC 层错还是 H3 层错"分两种:
CONNECTION_CLOSE carries an error code — split into "QUIC-layer" vs "H3-layer":
| frame 0x1c · QUIC 层QUIC-layer | code | frame 0x1d · H3 层(透传)H3-layer (passthrough) | code |
|---|---|---|---|
| NO_ERROR | 0x00 | H3_NO_ERROR | 0x0100 |
| INTERNAL_ERROR | 0x01 | H3_GENERAL_PROTOCOL_ERROR | 0x0101 |
| CONNECTION_REFUSED | 0x02 | H3_INTERNAL_ERROR | 0x0102 |
| FLOW_CONTROL_ERROR | 0x03 | H3_STREAM_CREATION_ERROR | 0x0103 |
| STREAM_LIMIT_ERROR | 0x04 | H3_CLOSED_CRITICAL_STREAM | 0x0104 |
| STREAM_STATE_ERROR | 0x05 | H3_FRAME_UNEXPECTED | 0x0105 |
| PROTOCOL_VIOLATION | 0x0a | H3_REQUEST_REJECTED | 0x010b |
| CRYPTO_ERROR(N) | 0x0100+N | H3_VERSION_FALLBACK | 0x0110 |
完整清单:RFC 9000 §20 列 18 个 QUIC 错误码;RFC 9114 §8.1 列 17 个 H3 错误码。CRYPTO_ERROR(N) 把所有 TLS Alert 透传出来——比如 bad_record_mac(alert 20)变成 CRYPTO_ERROR(0x0114)。
Full lists: RFC 9000 §20 defines 18 QUIC error codes; RFC 9114 §8.1 defines 17 H3 error codes. CRYPTO_ERROR(N) tunnels any TLS Alert — e.g. bad_record_mac (alert 20) surfaces as CRYPTO_ERROR(0x0114).
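The CRYPTO_ERROR mapping is pure arithmetic. Per RFC 9001 §4.8, a TLS alert is surfaced as QUIC error code 0x0100 plus the alert number; a sketch:

```python
def crypto_error(alert: int) -> int:
    # RFC 9001 §4.8: TLS alert N becomes QUIC error 0x0100 + N
    return 0x0100 + alert

def tls_alert(code: int) -> int:
    # inverse: recover the original alert from a CRYPTO_ERROR code
    return code - 0x0100
```

So the TLS alert bad_record_mac (20) surfaces on the wire as 0x0114, and any code in 0x0100–0x01ff can be folded straight back to its alert.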
服务器每次发 NEW_CONNECTION_ID 时(RFC 9000 §18.2)都会附上一个 stateless_reset_token,由 HMAC(reset_secret, CID) 派生。客户端把所有看到的 token 存起来;下次如果收到一个"看起来像随机包"且末尾 16 字节命中其中一个 token,就触发 stateless reset 销毁路径。无密钥下的状态恢复——这是 QUIC 工程最优雅的设计之一。
The server attaches a stateless_reset_token every time it sends NEW_CONNECTION_ID (RFC 9000 §18.2), derived as HMAC(reset_secret, CID). The client stores every token it's ever seen. Next time it receives a "looks-random" packet whose last 16 bytes match a stored token, it triggers the stateless-reset teardown path. Keyless state recovery — one of the most elegant designs in QUIC engineering.
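A sketch of the derivation and the match test. HMAC-SHA-256 truncated to 16 bytes is one common choice; RFC 9000 only requires the function be deterministic per CID and unguessable without the secret:

```python
import hashlib
import hmac

def reset_token(reset_secret: bytes, cid: bytes) -> bytes:
    # stateless_reset_token = first 16 bytes of HMAC(reset_secret, CID)
    return hmac.new(reset_secret, cid, hashlib.sha256).digest()[:16]

def looks_like_stateless_reset(packet: bytes, known_tokens: set) -> bool:
    # RFC 9000 §10.3: a stateless reset is ≥ 21 bytes and *ends* with a token
    return len(packet) >= 21 and packet[-16:] in known_tokens
```

The elegance: a rebooted server that lost all connection state can still recompute the token from the incoming CID and its long-lived secret, and the client recognises the reset purely by suffix match.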
如果一条连接活了几小时(比如 WebSocket 替代品),用同一把 1-RTT 密钥发太多包会增加分析攻击面。RFC 9001 §6 给出了原地滚动密钥的机制:发送方把 short header 的 Key Phase 位(1 bit)翻转,并用派生的下一代密钥加密。接收方看到 Key Phase 变了,跑一次 HKDF 派生新密钥解密。这一切不需要新一轮握手。
If a connection lives for hours (e.g. as a WebSocket replacement), using the same 1-RTT key for too many packets opens analysis attack surface. RFC 9001 §6 defines in-place key rotation: the sender flips the short-header's Key Phase bit (1 bit) and encrypts with the next-generation derived key. The receiver notices Key Phase changed, runs an HKDF step to derive the new key, decrypts. All this without a new handshake.
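The derivation step is standard TLS 1.3 machinery. A self-contained sketch of HKDF-Expand-Label (RFC 8446 §7.1) applied with RFC 9001 §6's "quic ku" label, SHA-256 flavour (a minimal reimplementation for illustration, not a vetted crypto library):

```python
import hashlib
import hmac

def hkdf_expand_label(secret: bytes, label: bytes, length: int) -> bytes:
    """HKDF-Expand-Label from RFC 8446 §7.1 with an empty context."""
    full = b"tls13 " + label
    info = length.to_bytes(2, "big") + bytes([len(full)]) + full + b"\x00"
    out, block, counter = b"", b"", 1
    while len(out) < length:               # HKDF-Expand (RFC 5869)
        block = hmac.new(secret, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        out += block
        counter += 1
    return out[:length]

def next_generation(secret: bytes) -> bytes:
    # RFC 9001 §6: on a Key Phase flip, the next secret is derived in place
    return hkdf_expand_label(secret, b"quic ku", 32)
```

Each Key Phase flip ratchets the secret forward one HKDF step — old traffic keys cannot be recovered from new ones, and no handshake round-trip is spent.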
"关闭不是事件,是过程。" "Close isn't an event, it's a process." Martin Thomson · QUIC WG · RFC 9000 design note
2026 年的版图
the 2026 landscape
| library | lang | 谁用Used by | 特点Strength |
|---|---|---|---|
| Google quiche | C++ | Chrome · gRPC · Envoy | 最早最完整most complete |
| Cloudflare quiche | Rust | CF edge · nginx-quic | 最快 C-APIfastest C-API |
| msquic | C | Windows Server · .NET | 内核态加速kernel-mode boost |
| quic-go | Go | Caddy · IPFS | Go 生态唯一Go-ecosystem standard |
| quinn | Rust | Hyper · Tonic · IPFS | 异步原生async-native |
| ngtcp2 + nghttp3 | C | curl · Node.js | 最克制最稳lean & rock-stable |
| aioquic | Python | 学术研究 · CTFresearch · CTF | 易读源码readable source |
| s2n-quic | Rust | AWS | 安全审计严格security-first |
| picoquic | C | 学术参考实现academic reference | IETF interop 主力IETF interop workhorse |
| lsquic | C | LiteSpeed | 嵌入式部署embeddable |
来源:Web Almanac 2025、Cloudflare Radar、W3Techs。CDN 默认开启(Cloudflare / Fastly / Akamai / AWS CloudFront / Google Cloud LB)是普及主因。
Source: Web Almanac 2025, Cloudflare Radar, W3Techs. CDN default-on (Cloudflare / Fastly / Akamai / AWS CloudFront / Google Cloud LB) drove the bulk of adoption.
不是包治百病的灵药
not a panacea
| 公司 · 场景Company · scenario | 指标Metric | 提升Improvement | 来源Source |
|---|---|---|---|
| Google · YouTube India (4G) | 视频卡顿率中位video rebuffer median | −20% ~ −40% | SIGCOMM 2017 · Langley et al. |
| Google · Search | tail latency | −16% | SIGCOMM 2017 |
| Meta · Facebook App | 请求错误率request error rate | −5% | Meta Engineering · 2020 |
| Meta · video stream | video stall rate | −20%+ | Meta Engineering · 2020 |
| Cloudflare · returning users | 0-RTT median TTFB | −50ms | CF blog · 0-RTT resumption |
| Cloudflare · global | 弱网 TTFBpoor-link TTFB | −10% ~ −15% | CF Radar · 2024 |
| Fastly · GA launch | cold connect | −40% | Fastly blog · RFC 9000 GA |
| Apple · iCloud Private Relay | 切网 RTT 抖动network-switch RTT jitter | ~ 0(看不出)~ zero (imperceptible) | WWDC 2022 · session 110337 |
数字来自厂商公开 blog / SIGCOMM 论文。原文如有更新请以最新版本为准;上表数字保留首次公开值。
Numbers cite each vendor's first public disclosure on blog or SIGCOMM. If the post has been updated since, the original disclosure value is kept here.
数据中心内部丢包 < 0.01%,TCP HOL 几乎不发生。但 HTTP/3 用户态 UDP 处理带来 2× CPU 成本。结果是纯吞吐 H2 over TCP 完胜。gRPC 至今主流仍是 HTTP/2。
Intra-DC loss < 0.01%, TCP HOL almost never fires. But HTTP/3's user-space UDP carries a 2× CPU tax. On pure throughput, H2 over TCP crushes. gRPC still defaults to HTTP/2.
~8% 连接尝试因 UDP/443 被防火墙阻断。浏览器 Happy Eyeballs 会自动 fallback 到 H2 over TCP——但用户先付了"试错"的延迟。
~8% of connection attempts get UDP/443 blocked by firewalls. Browser Happy Eyeballs auto-falls back to H2 over TCP — but the user has already paid the "tried it and failed" latency.
如果你的 LCP 主要花在服务端渲染或 JS 主线程上,省下来的 RTT 在水池里游泳,看不见。Patrick Meenan:H3 提升下限,不抬上限。
If your LCP is dominated by SSR or JS main-thread work, the saved RTTs swim in a pond — invisible. Patrick Meenan: H3 raises the floor, not the ceiling.
手机用户、4G/5G、丢包 1-3%、页面有 50+ 子请求——这是 H3 设计场景。0-RTT、连接迁移、流独立丢包恢复全用上。
Mobile users, 4G/5G, 1-3% loss, page has 50+ subresources — H3's home turf. 0-RTT, migration, per-stream loss recovery all fire.
"如果你不知道你的用户在哪里,
HTTP/3 就是合理的默认选择。" "If you don't know where your users are,
HTTP/3 is the sensible default." Lucas Pardue · Cloudflare · IETF 116
DNS · curl · Wireshark · qlog · sysctl
DNS · curl · Wireshark · qlog · sysctl
服务器在响应头里加一行:alt-svc: h3=":443"; ma=86400。客户端记 24 小时,第二次访问才走 H3。意味着新用户首次访问根本用不上 H3,更别提 0-RTT。
Server appends a response header: alt-svc: h3=":443"; ma=86400. The client caches it for 24 h and only uses H3 from the next visit onward. Meaning new users never get H3 at all on their first visit — let alone 0-RTT.
在 DNS 区文件加一行:ursb.me. 300 IN HTTPS 1 . alpn="h3,h2" ipv4hint="39.105.102.252"
浏览器解析 DNS 就拿到了——第一次访问直接走 H3。配合 RFC 8484 DoH 或 RFC 9250 DoQ,连 DNS 查询本身都加密。
Add one line to the DNS zone:ursb.me. 300 IN HTTPS 1 . alpn="h3,h2" ipv4hint="39.105.102.252"
The browser gets it at DNS resolution time — first visit goes straight to H3. Combined with RFC 8484 DoH or RFC 9250 DoQ, the DNS query itself is encrypted.
DoQ(DNS over QUIC,RFC 9250)是 QUIC 的第二大应用——不走 HTTP/3,而是把普通 DNS 查询直接装进 QUIC 流。AdGuard、NextDNS、Cloudflare 1.1.1.1 都支持。相比 DoT(DNS over TLS)省 1 RTT,相比 DoH 省 HTTP/3 那一层开销。ALPN 编号是 doq,默认端口 853。
DNS over QUIC (DoQ) is QUIC's second-biggest application — not HTTP/3, just plain DNS queries stuffed into a QUIC stream. AdGuard, NextDNS, Cloudflare 1.1.1.1 all support it. vs DoT it saves 1 RTT; vs DoH it skips the HTTP/3 overhead. ALPN doq, default port 853.
因为 QUIC 加密了一切,光抓包看不出连接内部发生了什么。IETF 用 qlog(draft-ietf-quic-qlog-main-schema,2024 已多版)定义了一份结构化 JSON 日志格式——服务端/客户端用任何 QUIC 库都可以输出 qlog,把它扔到 qvis.quictools.info 就能看到拥塞窗口曲线、PN 单调性、ACK 时序、loss event、stream 优先级。这是 H3 调试的唯一正解。
Because QUIC encrypts everything, raw pcap shows nothing about what's happening inside. IETF defined qlog (draft-ietf-quic-qlog-main-schema, several revisions by 2024) — a structured JSON log format any QUIC library can emit. Drop it into qvis.quictools.info and you get the congestion-window curve, PN monotonicity, ACK timeline, loss events, stream priorities. The only sane debug path for H3.
| 设置Knob | 默认Default | 推荐Recommended | 为什么Why |
|---|---|---|---|
| net.core.rmem_max | 208 KB | ≥ 2.5 MB | 单个 UDP socket 缓冲,避免突发丢包single-socket buffer to absorb bursts |
| net.core.wmem_max | 208 KB | ≥ 2.5 MB | 同上 · 发送方same · send side |
| GSO/GRO | off | on | 让网卡分片 = CPU 降一半NIC segmentation = halve CPU |
| SO_REUSEPORT | — | on · per-core | 用 eBPF 把 CID 路由到 CPUeBPF-route CID → CPU |
| io_uring | — | experimental | 异步 IO · 减少系统调用async I/O · fewer syscalls |
| QPACK dynamic table | 4 KB | 4-16 KB | 大 = 压缩好但 HOL 风险larger = better compression, more HOL risk |
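The first two rows of the table translate directly into a sysctl drop-in. A sketch (the filename is hypothetical; GSO/GRO and SO_REUSEPORT are per-NIC and per-socket settings, not sysctls, so they don't belong in this file):

```conf
# /etc/sysctl.d/90-quic.conf (hypothetical path)
# >= 2.5 MB socket buffers so bursty QUIC traffic isn't dropped in the kernel
net.core.rmem_max = 2621440
net.core.wmem_max = 2621440
```

Apply with `sysctl --system` and confirm the running values with `sysctl net.core.rmem_max`.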
第一次在 nginx 1.26 上开 HTTP/3 的必做清单:(1) 把 OpenSSL 换成 quictls 分支,否则编译不过;(2) 配置 listen 443 quic reuseport;——少了 reuseport 单核 CPU 直接吃满;(3) 在同一份配置里保留 listen 443 ssl; 走 TCP fallback;(4) 加 add_header alt-svc 'h3=":443"; ma=86400';——一开始我就忘了这条,浏览器永远走不到 H3。
The compulsory checklist for first-time HTTP/3 on nginx 1.26: (1) replace OpenSSL with the quictls fork or it won't build; (2) configure listen 443 quic reuseport; — without reuseport one CPU core pegs at 100%; (3) keep listen 443 ssl; in the same config for TCP fallback; (4) add add_header alt-svc 'h3=":443"; ma=86400'; — I once forgot this and the browser never upgraded.
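Checklist items (2)–(4) fit in one server block. A sketch (hostname and certificate paths are placeholders; item (1) happens at build time, not in the config):

```nginx
server {
    listen 443 quic reuseport;   # (2) UDP/QUIC listener; reuseport avoids one hot core
    listen 443 ssl;              # (3) TCP fallback on the same port
    server_name example.com;

    ssl_certificate     /etc/ssl/example.com/fullchain.pem;
    ssl_certificate_key /etc/ssl/example.com/privkey.pem;

    # (4) advertise H3, otherwise browsers stay on TCP forever
    add_header alt-svc 'h3=":443"; ma=86400' always;
}
```

Verify with `curl --http3 -sv https://example.com/` once DNS and the firewall's UDP/443 rule are in place.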
没有免费的午餐
no free lunches
Fastly 在 2020 年公开的实测:在相同吞吐下,HTTP/3 的 CPU 消耗是 HTTP/2 over TLS 的 1.5x ~ 2x。原因:每个 UDP 包都要进出用户态、做独立 AEAD 加解密、维护用户态拥塞控制状态。这是 CDN 厂商真正头疼的事——同样的服务器,H3 流量上限只有 H2 的一半。
Fastly's 2020 disclosure: at equal throughput, HTTP/3 burns 1.5x – 2x the CPU of HTTP/2 over TLS. Reason: every UDP packet crosses user/kernel boundary, does its own AEAD encrypt/decrypt, and maintains user-space cc state. The real CDN pain — the same box can carry half the H3 traffic of H2.
AF_QUIC。还在讨论,远未合入。
AF_QUIC. Still discussion, far from merge.
过去运营商靠 TCP 序列号、SACK、SNI 明文做带宽统计、QoS 调度、DPI 拦截。QUIC 把这些全加密了。运营商失去了路径上的"抓手"——这是有意的,但也是一些行业(金融监管、合规审计、家长控制)真正头疼的事。Spin Bit 是部分妥协,但远远不够。
Carriers used to measure bandwidth, do QoS, run DPI based on TCP seq/SACK/cleartext SNI. QUIC encrypted all of that. Operators lost their "handles" on the path — this was intentional, but it's a real pain for industries like financial regulation, compliance auditing, parental control. Spin Bit is a partial compromise; nowhere close to enough.
Patrick Meenan、Steve Souders 等 Web 性能老兵不停指出:如果你的网站性能瓶颈是 JS 执行、SSR 等待、第三方脚本,HTTP/3 帮你的部分微乎其微。这是真的。HTTP/3 抬升的是分布的下限——P95、P99 的弱网用户体验。如果你的产品根本没有 P95 弱网用户(比如你只服务美国/欧洲城市光纤),花精力上 H3 的 ROI 接近零。
Patrick Meenan, Steve Souders and other web-perf veterans keep pointing out: if your bottleneck is JS execution, SSR wait, or third-party scripts, HTTP/3 helps you very little. True. HTTP/3 lifts the floor of the distribution — P95/P99 weak-link users. If your product has no P95 weak-link users (e.g. you only serve fiber-grade US/EU cities), the ROI of switching to H3 is near zero.
WebTransport · MASQUE · MoQ · HTTP/4?
WebTransport · MASQUE · MoQ · HTTP/4?
HTTP/3 不是终点——它是 QUIC 这个"通用安全传输"找到的第一个杀手应用。QUIC 上正在长出一片新协议生态。下面是 2026 年的四个方向。
HTTP/3 isn't the finish line — it's the first killer app of QUIC as a "generic secure transport". A whole protocol ecosystem is growing on top. Here are 2026's four directions.
WebSocket 跑在 HTTP/1.1 Upgrade 上,有 TCP head-of-line,无可靠/不可靠混合、不适合 RTC。WebTransport over HTTP/3(W3C WebTransport API + draft-ietf-webtrans-http3)给浏览器开放:(a) 可靠双向流;(b) 不可靠 datagram(RFC 9221)。Chrome 自 97 起原生支持,ALPN 复用 h3。云游戏 / 在线协作 / 实时翻译已经开始迁。
WebSocket runs on HTTP/1.1 Upgrade, inherits TCP HOL, lacks a mixed reliable/unreliable channel, and is awful for RTC. WebTransport over HTTP/3 (W3C WebTransport API + draft-ietf-webtrans-http3) exposes to browsers: (a) reliable bidi streams; (b) unreliable datagrams (RFC 9221). Chrome shipped support in 97; ALPN reuses h3. Cloud gaming, collaboration, live translation are migrating.
Apple iCloud Private Relay(iOS 15+ 的 iCloud+ 功能)是目前最大量产的 MASQUE 实战。它的核心架构不是"用 H3 加密一下",而是故意把信任切两半:
iCloud Private Relay (iOS 15+ as part of iCloud+) is by far the largest production MASQUE deployment. Its core trick isn't "tunnel things in H3" — it's deliberately splitting trust into two halves:
三件关键事实:
CONNECT-UDP(RFC 9298)建 H3 隧道;隧道内载荷再用 capsule(RFC 9297)打包传给出口。
Three things to know:
CONNECT-UDP (RFC 9298) to set up the H3 tunnel; payloads inside are wrapped in capsules (RFC 9297) and forwarded to the egress.
这是 MASQUE 至今最大、唯一商用规模的部署。它没有用 CONNECT-IP(更激进的整 IP 包封装),只用 CONNECT-UDP——Apple 不需要 VPN 全包代理的语义,只需要让 Web 流量"看起来都是同一个 IP 发出的"。剩下的 CONNECT-IP 用例(VPN 替代)还在等下一波。
CONNECT-UDP (RFC 9298) to set up the H3 tunnel; payloads inside are wrapped in capsules (RFC 9297) and forwarded to the egress.这是 MASQUE 至今最大、唯一商用规模的部署。它没有用 CONNECT-IP(更激进的整 IP 包封装),只用 CONNECT-UDP——Apple 不需要 VPN 全包代理的语义,只需要让 Web 流量"看起来都是同一个 IP 发出的"。剩下的 CONNECT-IP 用例(VPN 替代)还在等下一波。
This is the largest — and so far only commercial-scale — MASQUE deployment. Notably it uses only CONNECT-UDP, not CONNECT-IP (the more aggressive whole-IP-packet tunnel). Apple doesn't need full VPN semantics; it just needs web traffic to "look like it comes from one IP". The CONNECT-IP use case (full VPN replacement) is still waiting for the next wave.
HLS / DASH 延迟 5-30 秒;WebRTC 延迟 100ms 但太重、不好缓存。Media over QUIC(IETF MoQ WG 推进中)目标是亚秒级延迟 + CDN 可缓存,发布/订阅模式。预期取代体育直播、低延迟视频、合作直播的传输层。Cloudflare 已经把它内置进 Workers。
HLS / DASH have 5-30s latency; WebRTC is 100ms but heavy and uncacheable. Media over QUIC (IETF MoQ WG in progress) targets sub-second latency + CDN-cacheable, with a pub/sub model. Slated to replace transport for sports streaming, low-latency video, collaborative live. Cloudflare already ships it inside Workers.
IETF 当前的共识:未来 5-10 年内不会有 HTTP/4。理由很务实:HTTP/3 + QUIC 的扩展机制(Datagram、KEY_UPDATE、ALPN、可插拔 cc、TLS extension)已经足够柔软。要加东西(抗量子加密、FEC 前向纠错、新拥塞算法),都可以作为扩展挂在 QUIC 上,不需要新的主版本号。所以 H3 大概率会像 IPv4 那样长寿。
Current IETF consensus: no HTTP/4 in the next 5-10 years. Practical reason: HTTP/3 + QUIC's extension mechanisms (Datagram, KEY_UPDATE, ALPN, pluggable cc, TLS extensions) are flexible enough. Adding things — post-quantum crypto, FEC, new cc algorithms — all fit as QUIC extensions; no new major version needed. So H3 will likely live like IPv4 — for decades.
"我们花三十年做了一个能装下未来三十年的传输层。" "Thirty years of work for a transport layer that can hold the next thirty." Lars Eggert · IETF QUIC WG · 2022
这一节就是给你撕下来贴墙的
the page you'd print, pin, and screenshot
读完 24 章是一回事,下一次给同事讲清楚是另一回事。下面 10 条是这篇文章里最反直觉、最值得带走的事实——每一条都标了对应章节和最关键的 RFC 锚点。
Reading 24 chapters is one thing; explaining it cleanly to a colleague is another. The ten facts below are the most counter-intuitive takeaways — each pinned to its chapter and the single most important RFC anchor.
从你按下回车,
到屏幕上跳出 200 OK,
HTTP/3 用 13 步把一次请求
封装成一个 UDP 包,
跨过四层加密,
在一个 RTT 里完成。
From the moment you press Enter,
to the moment 200 OK appears,
HTTP/3 wraps a request in a UDP datagram,
crosses four encryption layers,
and finishes in a single RTT —
in thirteen movements.