A stream of bytes from the network has to cross thirteen stages, three processes and four property trees before it can light a single pixel. This is a field map of Chromium's rendering pipeline.
Author: Airing · Topic: Blink · cc · Viz · GPU · Format: Long Read
Rendering pipeline · 13 stages
To users, a "browser" is a single product. Open the chest cavity and you see a set of replaceable parts. Before walking through Chromium's pipeline, fix two formulas in your head — they are the skeleton the rest of this story hangs on.
A quiet fact climbs out of the table: apart from Firefox and the long-buried IE, the mainstream browser world has converged on either Blink + V8 or WebKit + JavaScriptCore. It was a silent annexation.
The famed «browser war»
ended with only two engines still on the track.
Field Note · 02
WHAT IS A RENDERING ENGINE
It parses HTML / CSS / JS and draws the page. Firefox's Gecko alone bundles at least a dozen workgroups: a document parser, layout engine, style system, JS runtime, image library, networking (Necko), platform graphics adapters, font library, security library (NSS) and more. A rendering engine is never one thing; it is a factory.
In 2001 Apple lifted WebKit out of KDE's KHTML. Seven years later Google lifted its first Chromium engine out of WebKit. Five years after that, Google forked again — this time into Blink. The line is still alive today.
Plot the 22 years as a sequence of forks and you get the family tree below — a single picture of an engine being split, inherited and renamed.
FIG 02 · The WebKit family tree: solid lines for inheritance, dashed lines for forks. Blink today still carries large amounts of code inherited from Apple and WebKit.
Three moments of divergence
01
2001 · Apple lifts KHTML
A browser for Mac OS X. Safari 1.0 ships with KHTML rewritten into WebKit.
Chromium ships with WebKit, but is born inside a multi-process architecture — the body type that would determine the fork to come.
03
2013 · Google forks Blink
Not about adding features but about losing weight. Blink's first big change deletes 8 000 files and 450 000 lines of code.
COMPATIBILITY
In the Interop reports built on Web Platform Tests, Blink has long sat in the leading tier. Read it like this: the browser war split into two camps, but «the web» itself remains a shared standard. This is Blink's way of saluting WebKit while quietly overtaking it.
The rendering engine decides what a page looks like; the JS engine decides what a page does. The two are neighbours — the JS engine usually runs as a module inside the rendering engine, yet stays independent enough to be lifted into Node.js, embedded firmware or IoT.
The names worth knowing:
GOOGLE
V8
C++ · with JIT on, leaves the rest behind · Chromium / Node.js / Android WebView
APPLE
JavaScriptCore
A system-level API exposed to iOS apps (JIT is disabled inside the app sandbox)
MOZILLA
SpiderMonkey
One of the oldest JS engines · the heart of Firefox
FACEBOOK
Hermes
Built for RN · loads bytecode directly · no JIT, but excellent cold-start TTI
Place the two side by side and you see the trade-off everyone is balancing on: V8 trades double-digit megabytes of runtime for top performance; QuickJS trades performance for 210 KB and embeddability. Hermes walks the middle line — «fast cold-start, no JIT».
WHY MOBILE TURNS JIT OFF
JIT warm-up is long, so cold start regresses; JIT also bloats binary size and memory. When the system sandbox forbids generating executable memory at runtime (as it does for iOS apps), JIT is simply unavailable. That is the root reason JSCore inside iOS apps runs without JIT.
"JIT" inside V8 is not one thing — it's four tiers. As a function heats up, V8 promotes it through progressively more expensive but faster implementations:
FIG 03 · V8's four-tier JIT pipeline. A function is promoted as it heats up: Ignition for the first few runs, then Sparkplug, then Maglev, then TurboFan; once a type assumption breaks, the function is deoptimized back to Ignition. Maglev, new in Chrome 117 (2023), is the middle tier that extends the "cold to hot" staircase from 3 steps to 4. The old Sparkplug → TurboFan jump was too steep.
Key design: optimizing compilation runs on background threads (--concurrent-recompilation, default-on). The main thread keeps running the code it already has; background threads compile the hot functions and then atomically swap a pointer in the dispatch table to the new version, so the next call hits Sparkplug/Maglev/TurboFan with no stop-the-world pause. This is how V8 "gets faster while running" without stalling.
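The promotion ladder can be modelled in a few lines. This is a toy sketch, not V8's heuristics: `FunctionState`, `PROMOTE_AT` and the call-count thresholds are invented for illustration (real tiering decisions use richer signals than a plain call counter), and only the tier names come from the text above.

```python
TIERS = ["Ignition", "Sparkplug", "Maglev", "TurboFan"]
# Hypothetical call counts at which a function moves up one tier.
PROMOTE_AT = {"Sparkplug": 10, "Maglev": 100, "TurboFan": 1000}
OPTIMIZING = {"Maglev", "TurboFan"}  # tiers that speculate on types


class FunctionState:
    def __init__(self):
        self.calls = 0
        self.tier = "Ignition"

    def call(self, assumption_holds=True):
        """Record one call and return the tier the function ends up in."""
        if not assumption_holds and self.tier in OPTIMIZING:
            # Deopt: speculative code is discarded, start over in Ignition.
            self.calls = 0
            self.tier = "Ignition"
            return self.tier
        self.calls += 1
        nxt = TIERS.index(self.tier) + 1
        while nxt < len(TIERS) and self.calls >= PROMOTE_AT[TIERS[nxt]]:
            self.tier = TIERS[nxt]
            nxt += 1
        return self.tier


def run(n_calls):
    """Call a fresh function n_calls times with stable types."""
    f = FunctionState()
    for _ in range(n_calls):
        f.call()
    return f
```

Calling `run(1000)` and then `call(assumption_holds=False)` once shows the asymmetry the figure describes: climbing takes many calls, falling back takes one.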
Hidden classes + Inline caches · making a dynamic language "nearly static"
JS is dynamic — an object's shape can change at any moment. Yet in memory, V8 quietly assigns each "property set" a HiddenClass (Map), so property access can fetch by offset like a C++ struct. Combined with Inline Caches (IC), a single obj.x access can skip any dictionary lookup:
HIDDEN CLASS · A WORKED EXAMPLE · v8/src/objects/map.h
This is why "initialise properties in consistent order" is V8's golden rule. React/Vue's createElement, class field initialisation, the order of assignments in a constructor — all of these directly affect IC hit rate. Each step down the IC ladder costs an order of magnitude: Monomorphic ≫ Polymorphic ≫ Megamorphic.
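Why property order matters can be seen in a small model. This is a sketch under loud assumptions: `Shape`, `Obj` and `InlineCache` are hypothetical stand-ins for V8's Map, JSObject and IC machinery, and the cut-off of 4 shapes between polymorphic and megamorphic is assumed here, not read from V8's source.

```python
class Shape:
    """Toy hidden class: maps property name -> slot offset."""

    def __init__(self, offsets=None):
        self.offsets = offsets or {}
        self.transitions = {}  # property name -> next Shape

    def add(self, name):
        # Objects that add the same property from the same shape share the result.
        if name not in self.transitions:
            offsets = dict(self.offsets)
            offsets[name] = len(offsets)
            self.transitions[name] = Shape(offsets)
        return self.transitions[name]


ROOT = Shape()


class Obj:
    def __init__(self):
        self.shape = ROOT
        self.slots = []

    def set(self, name, value):
        if name in self.shape.offsets:
            self.slots[self.shape.offsets[name]] = value
        else:
            self.shape = self.shape.add(name)
            self.slots.append(value)


class InlineCache:
    """One cache per access site, e.g. a single `obj.x` in a hot loop."""

    def __init__(self, name):
        self.name = name
        self.seen = {}  # Shape -> offset

    def load(self, obj):
        offset = self.seen.get(obj.shape)
        if offset is None:  # miss: slow lookup, then remember the shape
            offset = obj.shape.offsets[self.name]
            self.seen[obj.shape] = offset
        return obj.slots[offset]

    def state(self):
        n = len(self.seen)
        if n <= 1:
            return "monomorphic" if n else "uninitialized"
        return "polymorphic" if n <= 4 else "megamorphic"


def make_obj(order):
    """Create an object whose properties are initialised in the given order."""
    o = Obj()
    for name in order:
        o.set(name, 0)
    return o
```

Two objects built with `make_obj("xy")` end up with the same shape, so one access site stays monomorphic; mixing in `make_obj("yx")` gives the site a second shape to track.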
Orinoco · V8's generational GC
V8's heap is split by age: Young Gen (new objects, ~1-8 MB) and Old Gen (long-lived objects, tens to hundreds of MB). The vast majority of objects die young (nothing references them within milliseconds of allocation), so they are not worth an expensive mark-sweep. Young Gen therefore uses the Scavenger, a semi-space copying GC that visits only live objects; dead objects never need to be "collected" because their space is simply reused.
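The claim that a copying collector pays only for survivors can be checked with a toy model. This is a simplified sketch in the spirit of semi-space copying, not V8's Scavenger: the heap is a plain dict from a made-up object id to the ids it references, and `scavenge` is a hypothetical name.

```python
def scavenge(from_space, roots):
    """Copy objects reachable from `roots` into a fresh to-space.

    `from_space` maps an object id to the list of ids it references.
    Returns (to_space, number_of_objects_visited).
    """
    to_space = {}
    worklist = list(roots)
    visited = 0
    while worklist:
        oid = worklist.pop()
        if oid in to_space:
            continue  # already evacuated
        visited += 1
        to_space[oid] = list(from_space[oid])  # "copy" the survivor
        worklist.extend(from_space[oid])
    return to_space, visited


# 1000 freshly allocated objects; only a chain of 3 is still referenced.
young = {i: [] for i in range(1000)}
young[0] = [1]
young[1] = [2]
survivors, work = scavenge(young, roots=[0])
```

The work counter ends at 3 although the young generation held 1000 objects: the 997 dead ones are never touched, their space is simply abandoned with the old semi-space.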
What this means for web developers: creating lots of short-lived objects (function return values, Array.map results, JSX re-renders) is fine, because the Scavenger clears them in about 1 ms. The real killer is objects that should have died but accidentally stay referenced: they get promoted to Old Gen, and when a Major GC fires it marks the whole heap and can stall the Main thread for tens of milliseconds. In V8 a memory leak usually shows up first not as OOM but as "jank every few seconds", which is Major GC competing for the Main thread.
V8 and the rendering pipeline · the "neighbour" relationship
V8 is not a "stage" of the rendering pipeline, but it squats inside every stage — JS handlers fill the gaps left by Style/Layout/Paint on the Main thread. A typical script for one 16.7ms frame:
MAIN THREAD · 16.7 MS · WHO RUNS WHEN · v8 ↔ blink ↔ cc
 0 ms ─▶ vsync · Compositor sends BeginMainFrame to Main
 0 ms ─▶ V8: event callbacks (click / keypress / setTimeout)
 2 ms ─▶ V8: microtask queue (Promise.then)
 3 ms ─▶ Blink: Style + Layout + Pre-paint + Paint
 6 ms ─▶ cc: Commit · Main blocked ~1 ms
 7 ms ─▶ V8: requestAnimationFrame callbacks (animation / last DOM change before rendering)
 9 ms ─▶ V8: requestIdleCallback / scheduler.postTask low-priority work
14 ms ─▶ V8: idle · waiting for the next vsync
// V8 runs in three windows: [0,3) [7,9) [9,14)
// one 50 ms JS long task → blocks 3 frames → INP > 200 ms
A 16.7ms frame budget already reserves ~6ms for Style/Layout/Paint, leaving ~10ms for JS. A single JS task over 50ms spans 3 frames: the browser's PerformanceObserver reports it as a "Long Task", and because Web Vitals measures INP as the real time from input to next paint, an interaction stuck behind such tasks can quickly exceed 200ms. scheduler.postTask(callback, { priority }) (Chrome 94+) and scheduler.yield() (Chrome 129+) exist precisely for this: to actively slice long tasks.
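The arithmetic behind "50 ms is 3 frames" and the idea of slicing work to fit the ~10 ms JS budget can be written down directly. This is only an illustration: `frames_spanned` and `slice_work` are hypothetical helpers, the per-item costs are assumed to be known in advance, and in a browser the boundary between two chunks is where a yield to the scheduler would go.

```python
import math

REFRESH_HZ = 60  # 1000 / 60 is about 16.7 ms per frame


def frames_spanned(task_ms):
    """Number of vsync intervals one uninterrupted task occupies."""
    return math.ceil(task_ms * REFRESH_HZ / 1000)


def slice_work(costs_ms, budget_ms):
    """Greedily group work items into chunks that each fit the budget.

    An item larger than the budget still gets a chunk of its own.
    """
    chunks, current, used = [], [], 0.0
    for cost in costs_ms:
        if current and used + cost > budget_ms:
            chunks.append(current)
            current, used = [], 0.0
        current.append(cost)
        used += cost
    if current:
        chunks.append(current)
    return chunks


long_task = [4, 4, 4, 4, 4]  # five 4 ms steps run back to back: 20 ms
chunks = slice_work(long_task, budget_ms=10)
```

Run as one task, the 20 ms of work spans two frames; sliced into chunks of at most 10 ms, each chunk fits inside the JS share of a single frame.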
V8 vs JSCore vs Hermes · three design philosophies
Three engines take three roads: V8 pushes "peak speed in long sessions" to the limit; JSCore uses an LLInt interpreter (executable in the sandbox) to compensate for iOS's no-JIT rule; Hermes moves parse + bytecode-gen to build time — the APK ships bytecode directly, the app skips parsing on launch. There is no "best" engine, only the "best fit for this scenario".
2024+ · Maglev default-on + eager Sparkplug compilation: Chrome 117+ ships Maglev by default, lifting "warm function" performance from 5× to 30×. Chrome 121+ added Compile Hints (Magic-Bytecode annotations): sites can tell V8 via an HTTP header "compile these scripts straight to Sparkplug, don't wait", trimming cold-start JS by another ~30%.
V8 doesn't just run JS — it also runs WebAssembly (Wasm). These two pipelines share the V8 process but virtually nothing else: different bytecode formats, different compilers, different heaps, different optimisation philosophies. Wasm carries types (i32/i64/f32/f64); no IC feedback needed. No GC (linear memory is manually managed), so the entire Orinoco machinery is absent on the Wasm side.
Liftoff's "streaming" is its sharpest trick: as Wasm bytes download from the network, V8 compiles concurrently — decoder fires on the first byte, Liftoff compiles each function the moment its boundary arrives, and the main entry-point can be ready before the byte stream finishes. A 10MB Wasm bundle compiles end-to-end in ~50ms on a GHz CPU; JS would need V8 to walk bytecode → optimization, easily hundreds of ms.
Wasm and the rendering pipeline · who owns Main?
Wasm runs on V8 → V8 runs on the Render process's Main thread → so by default Wasm, JS and Style/Layout/Paint all share that single Main thread and block each other. But Wasm can do what JS cannot:
True multi-threading · via SharedArrayBuffer + Atomics + Web Workers, Wasm allows "Main triggers → Worker runs Wasm compute" in true parallel. Worker threads aren't the Main thread, so Wasm compute can run truly parallel with rendering on Main — something JS cannot achieve (JS Workers can't touch the DOM directly; Wasm Workers do pure compute, never needed the DOM anyway).
SIMD · Wasm's 128-bit SIMD (v128) is explicit vectorisation. One SIMD add handles 4 float32 or 2 float64 — perfect for image processing, ML inference, crypto. JS has no SIMD (the SIMD.js proposal died years ago).
Predictable performance · no GC, no deopt → Wasm functions take nearly the same time on every call. Decisive for real-time audio/video (WebRTC codecs, AudioWorklet) — JS occasionally stalls for 50ms, Wasm never.
Figma compiles its rendering engine (written in C++) into Wasm; the DOM has one canvas; all graphics, layout, fonts are computed by Wasm. Photoshop Web (Adobe + Chrome team collaboration) does the same. Google Earth runs 3D terrain in Wasm. For these apps, "most CPU work is not on the Main thread" — Wasm runs on Worker threads, Main only pastes the result into a canvas (the cc::TextureLayer path).
2024+ · Wasm GC + JSPI: Wasm now has native GC (--experimental-wasm-gc, default-on from Chrome 119), letting Java/Kotlin/Dart compile to Wasm without bundling their own GC. JSPI (JavaScript Promise Integration) lets Wasm "suspend, await a Promise, resume", so code that looks synchronous runs on JS's async machinery, fully bridging Wasm with the Web's async ecosystem.
Boot Chromium and you don't get a process — you get a city. A capital (Browser), a few suburbs (Render), an airport (Viz), some factories (Utility / Plugin). The map is what makes "one crashing tab won't take down the browser" possible.
Three districts are relevant to rendering: Browser · Render · Viz. The thread roster of each:
Imagine three tabs: foo.com / bar.com / baz.com. Inside foo.com, two iframes point at foo.com/other-url and bar.com. The cross-site iframe spawns an additional Render Process.
Note that the bar.com iframe in Tab 1 and Tab 2's bar.com share the same render process (same-site reuse), but live in a different process from Tab 1's foo.com because they're cross-site. Site Isolation moved the "render island" boundary from per-tab to per-site.
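The three-tab example can be replayed as a small allocation model. It is a deliberately naive sketch: `site_of` takes the last two host labels as the site, whereas a real browser consults the Public Suffix List and also considers the scheme, and `assign_processes` simply follows the sharing rule described above (one render process per site).

```python
from urllib.parse import urlsplit


def site_of(url):
    """Naive eTLD+1: the last two labels of the host name (assumption)."""
    host = urlsplit(url).hostname
    return ".".join(host.split(".")[-2:])


def assign_processes(tabs):
    """Give every frame the render process that belongs to its site.

    `tabs` is a list of tabs; each tab is a list of frame URLs
    (main frame first, then its iframes).
    """
    processes = {}  # site -> process number
    placement = []  # (tab number, frame URL, process number)
    for tab_no, frames in enumerate(tabs, start=1):
        for url in frames:
            site = site_of(url)
            pid = processes.setdefault(site, len(processes) + 1)
            placement.append((tab_no, url, pid))
    return processes, placement


tabs = [
    ["https://foo.com/", "https://foo.com/other-url", "https://bar.com/"],
    ["https://bar.com/"],
    ["https://baz.com/"],
]
processes, placement = assign_processes(tabs)
```

Five frames across three tabs land in three render processes, and the bar.com iframe of Tab 1 gets the same process number as Tab 2's main frame.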
WHAT VIZ ACTUALLY DOES
Viz is the compositing and display service hosted in the GPU process (in current Chromium the "Viz process" and the "GPU process" are the same process; Viz is a service it hosts). It accepts viz::CompositorFrame (CF) submissions from every Render process and from the Browser process, merges them with SurfaceAggregator, and uses the GPU to put the result in the window. Whatever you see on screen, Viz wrote it.
A short history of Site Isolation · how Spectre rewrote the process model
"One Render process per site" sounds like a founding design choice — it isn't. Before 2018, Chromium's process model was per-tab (one process per tab; cross-origin iframes shared their parent's process). It was a three-way compromise between performance / memory / security — per-tab gave enough "islands", iframe sharing saved a process per embed. Then January 2018 happened.
FIG 04 · A short history of Site Isolation. The 2018 Spectre disclosure was the watershed: attackers could leak memory at any address in the same process via branch-prediction side channels driven from JS, meaning a cross-origin iframe in the same process was no longer safe. The Chromium team spent 4 months and hundreds of bug fixes redrawing the renderer-process boundary from tab to site (eTLD+1); Chrome 67 turned it on by default at a cost of +10-13% memory.
Why Spectre made "same-process iframe" unsafe
Modern CPUs use branch prediction to speculatively execute instructions they will probably need; wrong predictions get rolled back. Spectre's insight: even rolled-back speculation leaves cache traces that can be probed. Carefully constructed JS branches can trick the CPU into speculatively reading any memory address in the same process, then recover the byte's value from cache-hit timing. The Same-Origin Policy says "you can't read a cross-origin iframe's DOM"; Spectre says "I'll just read the bytes where they sit in memory".
Site Isolation's fix is brutally simple: put cross-origin iframes in different processes. You can't read cross-origin content from your process — the bytes aren't in your address space. The cost: every embedded cross-origin iframe (ads, social buttons, third-party widgets) adds a Render Process; per-tab memory grew 10-13%. This is why "embed Baidu Analytics" code post-2018 spawns an extra Render process for your page.
SharedArrayBuffer (SAB) is the linchpin of JS multi-threading, but after Spectre every major browser disabled it almost overnight: a SAB plus a counting worker makes a high-precision timer, exactly what Spectre's side channel needs. Two years later, the COOP/COEP response headers brought it back for pages that opt in to cross-origin isolation:
The practical consequence: Figma / Photoshop Web / any app that uses Wasm threads must ship the full header set. Without it SAB is unavailable and Wasm multi-threading is an empty shell. This is why crossOriginIsolated is the 2024 admission ticket for high-performance Web apps.
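As a concrete illustration of "the full header set", here is a minimal sketch of a development server that opts a page in to cross-origin isolation. The two header names and values are the standard ones; everything else (`IsolatedHandler`, `serve`, the port, the placeholder page) is made up for the example, and a real deployment would set the same headers in its own server or CDN configuration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# The pair of response headers that makes `crossOriginIsolated` true.
ISOLATION_HEADERS = {
    "Cross-Origin-Opener-Policy": "same-origin",
    "Cross-Origin-Embedder-Policy": "require-corp",
}


class IsolatedHandler(BaseHTTPRequestHandler):
    """Serves one placeholder page with the isolation headers attached."""

    def do_GET(self):
        body = b"<!doctype html><title>isolated</title>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        for name, value in ISOLATION_HEADERS.items():
            self.send_header(name, value)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Blocks forever; call it manually to try the page in a browser."""
    HTTPServer(("127.0.0.1", port), IsolatedHandler).serve_forever()
```

With `require-corp`, every cross-origin sub-resource the page embeds must itself opt in (via CORS or a Cross-Origin-Resource-Policy header), which is why retrofitting these headers onto a page full of third-party embeds tends to be the hard part.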
2024+ · Origin-Agent-Cluster · finer-grained isolation: Since Chrome 88, a page can send the Origin-Agent-Cluster: ?1 header to request "isolate me even from other origins on the same site (eTLD+1)". By default a.example.com and b.example.com share a process (same site); with this header they can be split. Site Isolation is the default; origin isolation is the opt-in for high-security scenarios.
FURTHER READING
Official design docs · "Site Isolation"
If you want to watch a major refactor land in close-up, the Chromium team published the full Site Isolation design and retrospective at chromium.org/Home/chromium-security/site-isolation: threat model, every cross-process boundary (document.domain, window.opener, clipboard events), performance numbers, all in the open. Pair it with Charlie Reis' USENIX Security 2019 paper «Site Isolation: Process Separation for Web Sites within the Browser» for a textbook view of how this kind of "major surgery" actually lands.
Bottom colour band: six thread segments: Network → Main → Compositor → Raster → Compositor → Skia
The rendering pipeline is a chain that turns network bytes into pixels. Chromium cuts the chain into thirteen stages — sliced across three processes, owned by three modules, run by six thread segments.
The master diagram below is the map for every chapter that follows. Four layers, all at once:
The master map answers "who works where", but not a deeper question: how long does each artifact live? what's reusable across frames? The figure below charts the lifelines of 11 core data structures across the 14 stages, colour-coded by cacheability — green = cached across frames (cheap), yellow = partially cached (fragile), red = born fresh every frame (expensive). After reading it you can work backward: why is mutating a transform cheap? Because only red-zone LayerImpl properties refresh — everything else stays green.
FIG 05B · Data-structure lifelines. X-axis: 14 stages; Y-axis: 11 core artifacts. Artifacts toward the left are mostly green (DOM / ComputedStyle / Property Trees / DisplayItemList / SharedImage all persist across frames); artifacts at the right tail are mostly red (CF / Quad / Aggregated CF are born fresh each frame). This is the central design idea of the pipeline: push "what need not be recomputed" as far left and as green as possible.
A keyword for each stage
# | Stage | In → Out (keyword)
01 | Parsing | bytes → DOM Tree
02 | Style | DOM Tree → Render Tree (with ComputedStyle)
03 | Layout | Render Tree → Layout Tree (with geometry)
04 | Pre-paint | Layout Tree → Property Trees
Each stage exists not for elegance
but to shrink the surface of what has to be recomputed when something changes.
Field Note · 02
MAIN-LINE EXAMPLE
The running example: one business card's journey
the card we'll watch through every stage
An abstract pipeline always slips out of memory: 3 processes, 6 thread segments, 13 stages, and an hour later you can't recall a single number. So from the next chapter onward, each chapter opens with a "Main-line · The Card after this stage" block, tracking what happens to one business card at every step.
Here is the card, Airing's business card:
FIG 05.5 · The running example. Twenty lines of HTML+CSS, yet every one of the 13 stages, 4 property trees and 3 processes can be told through this single card. Hover over the Follow button: the transform part of the hover animation runs without touching the Main thread.
:hover mutating both transform and background → the Display finale: transform stays on the Compositor, background drags Main back in. Real code rarely produces 100% Compositor-pure animation.
Stage-by-stage transformation map
# | Stage | The card after this stage
00 | Network | HTML bytes arrive + airing.png fired early by the PreloadScanner
01 | Parsing | 11 tokens → DOM stack 4 deep → a 6-node DOM tree
02 | Style | 5 rules split across 3 RuleMaps; ComputedStyle attached to each node
03 | Layout | LayoutNGFlexibleBox(card 340×88) finishes its two passes
04 | Pre-paint | .follow gets a Transform tree node; Effect tree gains 2 nodes (shadow + gradient)
05 | Paint | ~12-entry DisplayItemList; Save/ClipRRect/Restore wrap the avatar
06 | Commit | Two cc::Layers: the main one + a dedicated one for .follow
07 | Compositing | .follow is promoted to its own GraphicsLayer thanks to will-change
 |  | each tile plays back its slice of the DisplayItemList; avatar goes via ImageDecodeCache
10 | Activate | Pending Tree → Active Tree; all tiles ready
11 | Draw | main layer emits 2 TileDrawQuads; .follow emits 1; the shadow triggers a separate RenderPass
12 | Aggregate | (variant) if embedded as OOPIF, parent references via SurfaceDrawQuad
13 | Display | SwapBuffers to screen. On hover, transform stays on Compositor while background drags Main back in
HOW TO READ THE TABLE
This table is an index. After finishing the article you should be able to use it in reverse: see any card and predict what happens to it at every stage. If a row stops making sense, jump back to that chapter's "Main-line · The Card after this stage" block.
STAGE 00 · NETWORK
Loading: what happens before the first byte
network thread, Mojo IPC, the preload scanner head-start
HTML bytes stream through Mojo IPC from NetworkService into blink::DocumentLoader. At the same moment, HTMLPreloadScanner spots airing.png ahead of the main Parser — the avatar request is already on the wire. The card isn't a "card" yet, but its makeup is half-downloaded.
Module: network_service · Process: Browser · Thread: Network ×N · Output: bytes → Renderer
What it does
Browser-process NetworkService streams HTML bytes to Render-process blink::DocumentLoader via Mojo IPC. Meanwhile the HTMLPreloadScanner, running inside the Render process, races ahead of the main parser, spots <img> / <link rel="stylesheet"> URLs and asks the Browser process for that next batch of resources, so sub-resources are on the wire before the main HTML is even fully parsed.
Why count it as a stage
The original counts 13 stages starting from Parsing and folds Loading into the Browser-process box. But 80% of the pipeline's P50 latency lives here: until the first byte arrives, the 13 stages that follow cannot even start. This chapter promotes it to a first-class stage.
From DNS lookup to the first byte arriving at the Renderer, the chain crosses Browser-process NetworkService → Mojo IPC → Render-process ResourceFetcher. The figure below unpacks it:
FIG 00 · The real topology of Loading. Browser-process NetworkService owns all the lower-level connections, cookies and cache; the Render process receives bytes via a Mojo DataPipe and, in the opposite direction, fires new sub-resource requests. While the main parser blocks, the PreloadScanner races ahead. This is the secret of Chromium's cold-start speed.
Main-line · The Card after this stage
STAGE 00 · Network stage
Two URLLoaders in flight, side by side
The home page's HTML byte stream (a few KB) flows from URLLoader · main into blink::DocumentLoader::DataReceived. The moment HTMLPreloadScanner spots <img class="avatar" src="airing.png"> in an unblocked window, it calls ResourceFetcher::PreloadStarted, which signals back to the Browser process to fetch the avatar PNG. The avatar's GET request leaves before the main HTML even finishes downloading.
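The head-start idea can be sketched with Python's standard `html.parser`: feed the document chunk by chunk and collect fetchable URLs as soon as their tags are complete. This is only an analogy for what HTMLPreloadScanner does; `PreloadScanner` and `scan` are hypothetical names, the `rel` check is simplified to an exact match, and nothing is actually fetched.

```python
from html.parser import HTMLParser


class PreloadScanner(HTMLParser):
    """Collects sub-resource URLs without building any tree."""

    def __init__(self):
        super().__init__()
        self.urls = []  # (kind, url) in discovery order

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "img" and a.get("src"):
            self.urls.append(("image", a["src"]))
        elif tag == "link" and a.get("rel") == "stylesheet" and a.get("href"):
            self.urls.append(("style", a["href"]))
        elif tag == "script" and a.get("src"):
            self.urls.append(("script", a["src"]))


def scan(chunks):
    """Feed network chunks as they arrive and return what was discovered."""
    scanner = PreloadScanner()
    for chunk in chunks:
        scanner.feed(chunk)  # incomplete tags are buffered until the next chunk
    return scanner.urls


# The avatar tag is split across two network chunks.
chunks = ['<article class="card"><img class="avatar" ', 'src="airing.png"><div class="info">']
found = scan(chunks)
```

The avatar URL is reported as soon as the second chunk completes the tag, long before any closing `</article>` has arrived.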
Early Chromium baked the network stack into the Browser process. Chrome 73 split it out as NetworkService — either in-process or as a standalone utility process. The split is not just engineering hygiene: a separate process means cookies and credentials live in their own sandbox. A pwned Render process can never read raw cookies — it can only ask Mojo for "the bytes of this URL", and NetworkService attaches the cookies on its behalf.
2024+ · Pre-connect + speculative prefetch: Chromium now uses chrome.predictors heuristics on link hover to pre-warm DNS / TCP / TLS, sometimes prefetching the HTML itself. All of this happens before Stage 0, in what is effectively "Stage -1". Combined with <link rel="modulepreload"> and the Speculation Rules API, perceived cold-start time keeps shrinking.
DEVTOOLS
Network panel · check whether Initiator is (preload)
3 things to diagnose:
① Look at the Initiator column for (preload) tags. Every critical resource should have one (the PreloadScanner head-start succeeded); if it shows (parser), the resource was discovered by the main parser, one beat late.
② A queued segment that is too long (here analytics.js spent 55% of its time queued) means the browser's per-origin concurrency cap (6 connections) blocked it and critical resources are stuck behind low-priority JS. Fix it with fetchpriority="high".
③ The TLS segment shows whether HTTP/2 multiplexing is in use (same-origin requests should share one connection); if it is not, every request redoes the handshake.
INPUT: URL (from address bar / link / API) → OUTPUT: bytes (pushed to Render process)
STAGE 01 · DOC PHASE
Parsing: bytes to a DOM tree
bytes → characters → tokens → DOM
Main-line · The Card after this stage
DOM tree is up · 6 nodes
11 tokens, a max stack depth of 4, become a 6-node DOM tree: article → img + div → h2 + p + button. This is the card's skeleton: no color, no size, but the topology is fixed.
Module: blink · Process: Render · Thread: Main · Output: DOM Tree
What it does
Take the bytes coming out of the network thread and turn them, step by step, into a DOM tree hanging off blink::TreeScope.
Why five sub-stages
Every input has a distinct "shape": bytes are a network stream, characters depend on the encoding, tokens are defined by the W3C standard, and Element is a Blink data type. Splitting them lets each layer stream incrementally and be reused; for example, the same tokenizer can feed the Preload Scanner so that it fires requests early.
Parsing is the Main thread's opening act: take the bytes the Browser process's network thread hands over, and turn them into a living DOM tree. The data flow splits into five hand-offs:
Loading (bytes) → Conversion (characters) → Tokenizing (W3C tokens) → Lexing (Element) → DOM Build (DOM Tree)
STAGE 01 · Main-line · The Card after Parsing
11 tokens, a stack 4 deep, a 6-node DOM tree
Through the Tokenizer the card emits 11 tokens, with the construction stack peaking at 4 deep (article → div.info → one of h2 / p / button). When the stack empties, this DOM tree is left behind:
↑ The card in the middle is the real HTML. Each box's colour matches a highlight on the card. After Parsing the DOM tree is up: the skeleton stands but isn't dressed yet. No element has a ComputedStyle or a LayoutObject yet, and the img is still un-decoded.
Notice: ComputedStyle is not yet attached (that's Style's job), the img isn't decoded (Raster's job), and the button has no idea it's about to be promoted into its own layer (Compositing's job). This bare DOM is the seed for every downstream stage.
PARSE ↔ FETCH ↔ EXEC
When the tokenizer hits <link> / <script> / <img>, the parser fires new network requests; when it hits a <script>, it must also finish executing the JavaScript before resuming, because document.write() may rewrite what comes next. This «parse-and-wait» tax is one of the steepest costs of HTML parsing.
Read bottom-up: the network thread hands AppendBytes(char*) in, DecodedDataDocumentParser decodes by encoding into a String, and the result lands at the tokenizer's Append(String&) entry. Decoding follows the page's encoding (UTF-8 / GBK / ISO-8859-1) — get that wrong and everything downstream is wrong.
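Both halves of that sentence, streaming decode and the cost of a wrong encoding, can be demonstrated with Python's incremental decoders. This is an analogy rather than Blink's code: `decode_stream` is a hypothetical helper, and the chunk boundary is chosen on purpose to fall in the middle of a multi-byte character.

```python
import codecs


def decode_stream(chunks, encoding):
    """Decode network chunks incrementally, as a streaming parser must."""
    decoder = codecs.getincrementaldecoder(encoding)(errors="replace")
    text = "".join(decoder.decode(chunk) for chunk in chunks)
    return text + decoder.decode(b"", final=True)  # flush any pending bytes


raw = "名片 Airing".encode("utf-8")
chunks = [raw[:2], raw[2:]]  # the first chunk ends in the middle of "名"

right = decode_stream(chunks, "utf-8")
wrong = decode_stream(chunks, "gbk")  # same bytes, wrong declared encoding
```

With the correct encoding the split character is reassembled across chunks; with the wrong one the same bytes decode to different characters, and every later stage would faithfully process that garbage.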
The key entrypoint: HTMLConstructionSite::CreateElement. Internally, a stack tracks currently-open Elements — HTML5's implicit close rules (a <div> appearing inside <p> auto-closes the <p>) are implemented through this stack:
HTML CONSTRUCTION SITE · STACK OPS · html_construction_site.h
HTML5 rule: a <p> may only contain phrasing content, so encountering a block element like <div> forces the open <p> to close first. The stack is silently mutated: when StartTag-div fires, the constructor pops <p> before pushing <div>. The stray </p> that follows then finds no open <p> and is treated as an empty one. The result: what you wrote as <p><div>...</div></p> is in fact three sibling nodes in the DOM: <p></p> + <div></div> + an implicit empty <p></p>.
A stack is what builds the tree
Lexing turns tokens into Element instances. "DOM construction" then walks a stack: start-tags push, end-tags pop. When the stack empties, the tree is finished.
Stack · live | DOM tree · incremental
Input: <div><p><div></div></p><span></span></div>
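The push/pop walk can be sketched in a few lines of Python. This is a toy model, not HTMLConstructionSite: it ignores attributes and text, `BLOCK_TAGS` is an illustrative subset, and it implements only two recovery rules (a block-level start tag closes an open <p>, and a stray </p> becomes an empty <p>). All names are hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str
    children: list = field(default_factory=list)


TOKEN = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)>")
BLOCK_TAGS = {"div", "p", "ul", "table"}  # toy subset that closes an open <p>


def build_tree(html: str) -> Node:
    root = Node("#document")
    stack = [root]  # the "stack of open elements"
    for closing, tag in TOKEN.findall(html):
        tag = tag.lower()
        if not closing:
            # Implicit close: a block-level start tag pops an open <p> first.
            if tag in BLOCK_TAGS and stack[-1].tag == "p":
                stack.pop()
            node = Node(tag)
            stack[-1].children.append(node)
            stack.append(node)
        elif any(n.tag == tag for n in stack[1:]):
            # Pop up to and including the most recent matching open element.
            while stack.pop().tag != tag:
                pass
        elif tag == "p":
            # Stray </p>: insert an empty <p> and close it immediately.
            stack[-1].children.append(Node("p"))
        # Any other stray end tag is ignored in this sketch.
    return root


def outline(node: Node) -> str:
    inner = ",".join(outline(c) for c in node.children)
    return f"{node.tag}[{inner}]" if inner else node.tag
```

Fed the input above, `outline(build_tree(...))` gives `#document[div[p,div,p,span]]`: the inner <div> ends up as a sibling of the <p>, followed by the implicit empty <p>.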
Why a stack, not a linked list? · Nesting is naturally LIFO; the stack is the spec-required data structure
A great question. Four reasons, from the most immediate to the deepest:
HTML nesting is naturally LIFO. When the parser sees </div>, it needs the "most recent unclosed open tag with that name", which in well-formed markup is exactly the top of the stack. Popping the top is O(1); a container that does not keep its entries in most-recent-first order would have to search for that element, O(n). A 100KB page does thousands of push/pop pairs, so the constant-time pop matters.
The HTML5 spec itself describes it as a stack. The WHATWG/W3C HTML parsing algorithm explicitly uses two data structures named "stack of open elements" and "list of active formatting elements". The hairiest piece, the adoption agency algorithm (error recovery for crossed nestings like <b><p>X</b>Y</p>), is written directly in stack terminology. Switching to a linked list would force you to re-map every spec step onto a different structure, and the logic would no longer read naturally.
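Cache-friendly. Array-backed stacks live in contiguous memory: one 64-byte cache line holds 8 pointers on a 64-bit build, so push/pop mostly stays in L1. Linked-list nodes scatter across the heap, so each next-pointer chase risks a cache miss. Note that Blink's own HTMLElementStack has historically been a chain of ElementRecord entries linked through a next pointer, with the depth tracked separately, so this argument is about stacks in general rather than a description of Blink's storage.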
Stack depth = nesting depth, a free semantic index. Many HTML5 spec rules dispatch on "nesting depth": foster parenting inside <table>, banning <p> from nesting block elements, the implicit close of <option> inside <select>… Stack .size() answers in O(1). A linked list would need a separately maintained depth counter, with its own sync cost.
Flip the question: "When would a linked list be a better fit?" Answer: scenarios needing arbitrary mid-list insert/delete. HTML parsing rarely needs that (only the adoption agency algorithm's fragment-tree rearrangement, and even that's cheap on top of a stack). So the real answer to "why a stack, not a linked list?" is: the question is inverted — HTML nesting is a stack; a linked list would be the counter-intuitive choice.
Chrome DevTools Performance can show Parsing as a flame graph — but only the JS-side call stack. The C++ internals are a black box to it.
To see the full C++ stack HTMLDocumentParser::AppendBytes → ... → HTMLConstructionSite::CreateElement, you need Perfetto traces: they expose C++ stacks, tag each call with its thread, and even draw cross-process IPC as "sender → receiver" arcs.
"Tokenizing" sounds simple but is in fact a sprawling state machine spelled out in the W3C HTML5 spec — 80+ states, hundreds of transitions. HTMLTokenizer::NextToken is one giant switch that reads a character based on the current state and either emits a token or switches state. The most common edges:
HTML_TOKENIZER · STATE TRANSITIONSthird_party/blink/renderer/core/html/parser/html_tokenizer.cc
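A toy version of such a state machine, as a hedged Python sketch. It keeps only four states out of the 80+ in the spec, ignores attributes, comments and character references, and the `State` names and `tokenize` function are illustrative, not Blink's.

```python
from enum import Enum, auto


class State(Enum):
    DATA = auto()
    TAG_OPEN = auto()
    END_TAG_OPEN = auto()
    TAG_NAME = auto()


def tokenize(text: str) -> list:
    state, name, is_end, tokens = State.DATA, "", False, []
    for ch in text:
        if state is State.DATA:
            if ch == "<":
                state = State.TAG_OPEN
            else:
                tokens.append(("Character", ch))
        elif state is State.TAG_OPEN:
            if ch == "/":
                state = State.END_TAG_OPEN
            elif ch.isalpha():
                name, is_end, state = ch.lower(), False, State.TAG_NAME
            else:
                # Not a tag after all: emit the '<' and this character as text.
                tokens += [("Character", "<"), ("Character", ch)]
                state = State.DATA
        elif state is State.END_TAG_OPEN:
            if ch.isalpha():
                name, is_end, state = ch.lower(), True, State.TAG_NAME
            else:
                state = State.DATA  # sketch: drop the malformed end tag
        elif state is State.TAG_NAME:
            if ch == ">":
                tokens.append(("EndTag" if is_end else "StartTag", name))
                state = State.DATA
            else:
                name += ch.lower()
    return tokens
```

Each loop iteration does exactly what the prose describes: look at one character, then either emit a token or switch state.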
The hard problem the parser as a whole solves is error recovery, and most of that lives one step downstream, in tree construction. The HTML spec describes "how to fix errors" with roughly two dozen "insertion modes", plus a saved "original insertion mode" to return to and a stack of template insertion modes. For instance, a <span> appearing directly inside <table> is mandated to be "foster-parented" out of the table. That's why every browser parses bad HTML identically: they all follow this same spec.
Parsing's real "go-faster" trick lives in HTMLPreloadScanner. When the main parser is blocked on a <script> (waiting for JS to run), a second lightweight tokenizer continues scanning ahead on a side thread. The moment it sees <link rel="stylesheet"> / <img src> / <script src> it fires the network request early. By the time the main parser unblocks, the bytes are on the wire — sometimes already arrived.
This is what makes "HTML parsing" and "resource download" effectively parallel — and the real reason Chromium's cold-start is 30-50% faster than a "naive single-threaded parser". Those (Preload)-tagged requests you see in DevTools' Network panel? All fired by PreloadScanner ahead of time.
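The scan-ahead idea can be sketched with Python's standard `html.parser`. This is an illustration of the technique only: `PreloadScanner` and `scan_ahead` here are hypothetical names, the file names are made up, and a real scanner also handles srcset, media queries, <base> and more.

```python
from html.parser import HTMLParser


class PreloadScanner(HTMLParser):
    """Collects URLs worth fetching early; builds no DOM at all."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and a.get("rel") == "stylesheet" and a.get("href"):
            self.urls.append(a["href"])
        elif tag in ("img", "script") and a.get("src"):
            self.urls.append(a["src"])


def scan_ahead(remaining_html: str) -> list:
    """Run over the markup the blocked main parser has not reached yet."""
    scanner = PreloadScanner()
    scanner.feed(remaining_html)
    return scanner.urls
```

The scanner never touches the stack of open elements, which is what keeps it cheap enough to run while the main parser is stalled.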
To see the Tokenizer state machine flip in real-time, the fastest path is Chromium's tracing: in chrome://tracing, enable the blink.parser category and reload — you'll see a time-aligned "state trace" with a colour block for every tag open/close. Here's roughly how it looks:
Watch the Main · Tokenizer lane: each block is one token, and blue / orange / green map to different tag types. During the red <script> the Tokenizer freezes (the main parser stalls 6 ms while V8 runs), but at the same time the PreloadScanner lane below keeps scanning and fires app.js / avatar.png early. The two brown bars on the Network lane are those head-start requests. The real cost of parse-and-wait is right here in one picture.
DEVTOOLS · Performance > "Parse HTML" segment; Memory > Heap snapshot for DOM node count
bytes (from the Browser process network thread) → OUTPUT · DOM Tree (blink::TreeScope)
DEMO K · LIVE · The full parsing pipeline
Bytes → tokens → stack → DOM tree, step by step
Below is a miniature HTML parser. Click step to advance one token at a time and watch the cursor crawl through the source, the token list grow, the stack push/pop, and the DOM tree assemble one node at a time. Or hit play to run it end-to-end.
This demo maps all 5 sub-stages to visible state: (1) Loading · bytes flow in, represented by the source tag above; (2) Conversion · bytes → characters, skipped (assume ASCII); (3) Tokenizing · characters → tokens, see the cursor crawl and the token list on the left; (4) Lexing · tokens → Element instances, implicit in each StartTag push; (5) DOM construction · stack-based tree building, the middle stack and the right-side tree. Every stage maps to a Blink function: HTMLDocumentParser::Append → HTMLTokenizer::NextToken → HTMLConstructionSite::CreateElement → HTMLConstructionSite::AttachAtIncludingNonDistributedNodes.
STAGE 02 · DOC PHASE
Style: CSS is read right-to-left
CSSOM and right-to-left selectors
Main-line · The Card after this stage
Every node gets dressed
5 CSS rules get split across 3 RuleMaps (class / id / tag). Every DOM node now carries a ComputedStyle: colors, fonts, spacing, display modes all resolved. The skeleton is dressed but hasn't stood up yet.
Module: blink · Process: Render · Thread: Main · Output: Render Tree
What it does
Walk the DOM tree. For each node, find which CSS rules match, then merge them with inherited values and UA defaults. Attach a ComputedStyle to the node: that's the Render Tree.
Why not skip
CSS is render-blocking. Drawing an unstyled DOM and then re-laying out the whole page the moment the CSS arrives is more expensive than waiting, so a blank screen is the cheaper option. The browser sits and waits for the CSSOM.
The Style Engine walks the DOM, matches against the CSSOM and attaches a ComputedStyle to every node. The output: a Render Tree. Core: Document::UpdateStyleAndLayout.
Three sub-stages: CSS load → CSS parse → CSS compute. Two counter-intuitive facts here decide the entire performance shape — selectors are read right-to-left and RuleMap shards by selector type.
STAGE 02 · Main-line · The Card after Style
5 rules into 3 RuleMaps, ComputedStyle on every node
↑ The card in the middle is the real HTML. After Style, every element carries a ComputedStyle. The skeleton is dressed, but (x, y, w, h) is still unknown because Layout hasn't run yet.
Every DOM node runs the right-to-left match. For instance, article.card asks "does .card match me?" and gets its answer from a single hash lookup. The matched properties are then merged with inherited values and UA defaults, and a ComputedStyle is attached. After all that, article.card's ComputedStyle reads roughly:
// ComputedStyle · article.card
display         : flex                    // from .card
flex-direction  : row                     // flex default
align-items     : center                  // from .card
gap             : 14px
width           : 340px
padding         : 18px 20px
border-radius   : 14px
background      : linear-gradient(...)    // feeds the Effect tree
box-shadow      : 0 6px 20px rgba(...)    // feeds the Effect tree
font-family     : -apple-system, ...      // inherited from body
color           : rgb(21,23,28)           // inherited from body
The output: the 6-node DOM tree, each node carrying a ComputedStyle. From this moment on style is no longer strings: it's RGBA32, Length, EFlexDirection, all compact C++ types. Every downstream stage operates on this structured data; no one needs to look at the CSS source again.
CSS load · the real log
Instrument Blink and print as it parses, and you can see HTML and CSS parsing interleave. Only after HTML reaches readystatechange = Interactive does CSSParserImpl start turning the external stylesheet into StyleRules:
First stop in CSS parsing: characters → tokens. The tokenizer emits flavours like the ones below — FunctionToken (blue), HashToken (copper) and DelimToken (purple) being the hot ones:
Blink stores colours as RGBA32 — one 32-bit int — via CSSColor::Create. #hex goes through HashToken's direct path: bitwise pack straight into RGBA32. rgb() is a FunctionToken: parse arg list, range-check, then pack. Same white, more hops.
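The two paths can be contrasted in a small Python sketch. The 0xAARRGGBB packing order is an assumption made for illustration, and `parse_hex` / `parse_rgb` are hypothetical helpers, not Blink functions; the point is only that the hex path is a couple of integer operations while the function path has to split, convert and clamp arguments first.

```python
def pack(r: int, g: int, b: int, a: int = 255) -> int:
    """Pack four 8-bit channels into one 32-bit int (assumed 0xAARRGGBB)."""
    return (a << 24) | (r << 16) | (g << 8) | b


def parse_hex(token: str) -> int:
    """Fast path: '#fff' or '#ffffff' goes almost straight to an integer."""
    digits = token.lstrip("#")
    if len(digits) == 3:
        digits = "".join(c * 2 for c in digits)
    if len(digits) != 6:
        raise ValueError(f"unsupported hex colour: {token!r}")
    return 0xFF000000 | int(digits, 16)


def parse_rgb(text: str) -> int:
    """Slow path: parse the argument list, range-check, then pack."""
    text = text.strip()
    if not (text.startswith("rgb(") and text.endswith(")")):
        raise ValueError(f"not an rgb() function: {text!r}")
    parts = [int(p) for p in text[4:-1].split(",")]
    if len(parts) != 3:
        raise ValueError("rgb() takes three arguments in this sketch")
    r, g, b = (min(255, max(0, p)) for p in parts)
    return pack(r, g, b)
```

Both `parse_hex("#ffffff")` and `parse_rgb("rgb(255, 255, 255)")` land on the same 32-bit value; only the number of steps differs.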
Is the 15% really meaningful? · Micro-bench vs real-page payoff
Bottom line: for nearly every business page, the 15% is invisible. CSS parsing runs once at first load; a stylesheet of thousands of rules takes 5-15ms total, so 15% means ~2ms — noise inside cold-start (which starts at hundreds of ms). "Convert all rgb() to hex" is textbook over-optimisation.
Then why is the number worth knowing? Because it's a window: through that 15% you see the performance philosophy of C++ systems like V8/Blink:
"fast path + slow path" is a Blink/V8 staple. Hex is the fast path (pure bitwise); rgb() is the slow path (function-parsing subsystem). The CSS parser, JS engine, and layout engine all use this structure — optimise the common case to nanoseconds, let the rare case take its microseconds.
"function calls are themselves performance events" — one C++ virtual call costs ~5-10ns, plus arg validation ~50ns. A row of color: rgb(...) adds 50ns; thousands of rows add tens of microseconds. The "drop in the bucket" calculus only matters on per-frame hot paths — CSS parsing isn't one, so the math doesn't move the needle.
The colour path that actually runs every frame is in paint/raster: every DisplayItem's fill color, every tile's pixel sampling. These are the real beneficiaries of RGBA32. The CSS parsing 15% is just the same data structure showing up on the cold path.
So: don't rewrite existing CSS for that 15%; do understand that "fast/slow path bifurcation" is a pattern the entire browser engine uses. The real optimisation targets are the "1000 times per frame" paths in later pipeline stages (C9/C10/C14 are the actual battleground).
Parsing .text .hello and #world, Blink emits the structure below. The relation field — that's the pointer right-to-left matching follows: start at .hello, walk Descendant edges up to .text.
selector text = ".text .hello"
  value = "hello"  matchType = "Class"  relation = "Descendant"
  tag history selector text = ".text"
    value = "text"  matchType = "Class"  relation = "SubSelector"
selector text = "#world"
  value = "world"  matchType = "Id"  relation = "SubSelector"
Full list in blink/public/blink_resources.grd. This is why "your button style overrides the UA's without !important": author styles outrank the UA origin in the cascade, whatever the specificity or source order.
The browser blocks rendering until it has both
the DOM and the CSSOM.
Render-blocking CSS · MDN
Why? Because rendering a bare DOM is meaningless. Drawing it to the screen and re-laying it out the moment CSS arrives is more expensive than waiting. That is the root reason CSS is called "render-blocking".
StyleRules are not piled into one big array. They're sharded into five maps by the type of their rightmost (key) selector, the part that has to match the element itself. Matching only consults the buckets relevant to the element, collapsing O(N) into roughly O(N / k).
Map · #id · id_rules_ · #world #header
Map · .class · class_rules_ · .text .btn-primary
Map · [attr] · attr_rules_ · [data-state="open"]
Map · tag · tag_rules_ · div, span, p…
Map · ::pseudo · ua_shadow_… · ::before ::placeholder
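A hedged Python sketch of the bucketing idea, covering three of the five maps. It assumes every compound selector is a single simple selector (`.a`, `#b` or `tag`), and `key_of`, `build_rule_set` and `candidates` are illustrative names rather than Blink's RuleSet API.

```python
from collections import defaultdict


def key_of(selector: str) -> tuple:
    """Bucket key: the rightmost simple selector of the rule."""
    rightmost = selector.split()[-1]
    if rightmost.startswith("#"):
        return ("id", rightmost[1:])
    if rightmost.startswith("."):
        return ("class", rightmost[1:])
    return ("tag", rightmost)


def build_rule_set(selectors) -> dict:
    buckets = defaultdict(list)
    for selector in selectors:
        buckets[key_of(selector)].append(selector)
    return buckets


def candidates(buckets, tag, el_id=None, classes=()) -> list:
    """Only rules filed under this element's own tag, id or classes are tried."""
    found = list(buckets.get(("tag", tag), []))
    if el_id:
        found += buckets.get(("id", el_id), [])
    for cls in classes:
        found += buckets.get(("class", cls), [])
    return found
```

A `<div class="hello">` pulls out only the rules keyed on `div` and `.hello`; every other rule in the stylesheet is never even looked at for that element.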
Right-to-left · why selectors read backwards
You write .text .hello. The fastest way to decide "does this div match?" is to start from the right: does the node itself have .hello? If yes, walk ancestors looking for .text. Right-to-left rejects the vast majority of nodes on the very first check.
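The right-to-left walk can be sketched as follows. This is a minimal model under stated assumptions: only descendant combinators, only class and tag selectors, and a hypothetical `Element` type with a parent pointer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Element:
    tag: str
    classes: frozenset = frozenset()
    parent: Optional["Element"] = None


def matches_simple(el: Element, simple: str) -> bool:
    if simple.startswith("."):
        return simple[1:] in el.classes
    return simple == el.tag


def matches(el: Element, selector: str) -> bool:
    *ancestors, subject = selector.split()
    if not matches_simple(el, subject):
        return False  # most elements are rejected here, after one check
    node = el.parent
    for simple in reversed(ancestors):
        # Climb until some ancestor satisfies this part of the selector.
        while node is not None and not matches_simple(node, simple):
            node = node.parent
        if node is None:
            return False
        node = node.parent
    return True
```

The ancestor climb only happens for elements that already passed the rightmost check, which is where the savings come from.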
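Which declaration wins is decided by ranked criteria. Origin and importance come first (for normal declarations, author styles beat user styles, which beat UA styles). Within one origin, the four criteria below apply in order; only when one ties does the next kick in, and declaration order is the final tiebreaker.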
01 · Cascade layers: order of @layer declarations; the earliest-declared layer is weakest.
02 · Selector specificity: the triple (ids, classes / attributes / pseudo-classes, tags), compared left to right. The familiar 100 / 10 / 1 weights are only a mnemonic, since no number of classes outranks a single id.
03 · Proximity ordering: introduced in CSS Cascading Level 6 with @scope; the rule whose scope root is closest to the element wins.
04 · Declaration order: last-write-wins. This is why main-heading2 wins simply by being declared later.
Suppose <h1 class="main-heading main-heading2"> with two rules: .main-heading { color: red; } and .main-heading2 { color: blue; }.
Specificity is identical at 0,1,0, and the order of class names is irrelevant, so declaration order decides. .main-heading2 is declared later, so the heading is blue, no matter what order you write the classes in HTML. Class order in HTML never affects CSS.
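The ranked comparison maps naturally onto tuple ordering. Below is a sketch under simplifying assumptions: one origin only, no !important, no pseudo-classes or pseudo-elements in the specificity counter, and a hypothetical `Declaration` record whose `layer` and `scope_hops` fields stand in for tiers 01 and 03.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Declaration:
    selector: str
    value: str
    order: int           # position in the stylesheet, later = larger
    layer: int = 0       # later-declared layers get larger numbers
    scope_hops: int = 0  # distance to the scope root, fewer = closer


def specificity(selector: str) -> tuple:
    """(ids, classes + attributes, tags), compared left to right."""
    ids = len(re.findall(r"#[\w-]+", selector))
    classes = len(re.findall(r"\.[\w-]+|\[[^\]]*\]", selector))
    bare = re.sub(r"#[\w-]+|\.[\w-]+|\[[^\]]*\]", " ", selector)
    tags = len(re.findall(r"[A-Za-z][\w-]*", bare))
    return (ids, classes, tags)


def cascade_key(d: Declaration) -> tuple:
    # Tuples compare element by element: a later tier matters only on a tie.
    return (d.layer, specificity(d.selector), -d.scope_hops, d.order)


def winner(declarations) -> str:
    return max(declarations, key=cascade_key).value
```

For the heading example, `.main-heading` and `.main-heading2` tie on every tier except `order`, so the later rule's value is returned regardless of the order in which the list is passed.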
WHY UA STYLES LOSE TO YOURS · Blink's UA stylesheet (html.css and friends) is always registered into the RuleSet first, but registration order is not what decides the outcome. UA rules belong to the user-agent origin, and for normal declarations the cascade ranks the author origin above it before layers, specificity or declaration order are even compared. That is the actual mechanism that lets you override UA styles without !important.
DEVTOOLS · Performance > "Recalculate Style"; Elements > Computed for ComputedStyle
[Figure: Performance panel · Main thread · "Recalculate Style" selected · 4.2 ms · 1842 elements]
Main track: JS → Recalculate Style 4.2 ms → Layout 3.1 ms → Paint 1.8 ms → Commit → idle
Recalc Style expanded: CollectMatching → CompareRules · cascade → ApplyMatched · ComputedStyle
Selector match stats · this frame · RuleSet hit-rate: 96.4%
id_rules_ = 12 hits · class_rules_ = 1432 hits · tag_rules_ = 387 hits · attr_rules_ = 9 hits ⚠ slow
3 things to watch:
① Recalc Style vs Layout vs Paint width ratios: the fattest one is the bottleneck. Style is usually ½ of Layout; if reversed, you almost certainly have over-complex selectors (per-node right-to-left match cost explodes)
② RuleSet hit-rate < 80% means many nodes are running futile matches
③ attr_rules_ hits in red: attribute selectors ([data-state="open"]) are the slowest bucket, particularly costly with document-scale selectors.
Render Tree · + ComputedStyle per node
DEMO H · LIVE · DOM × CSSOM → Render Tree
Two trees, one merge
Click any DOM node and watch which CSSOM rules match (yellow highlight). Below is the Render Tree they produce, which never equals the DOM Tree. Toggle display: none to watch a node vanish from the Render Tree (while staying in the DOM).
This is the core data flow of the Style stage: DOM tree + CSSOM → Render Tree. <head> and its children (<title>, <script>) have no visual output and are all dropped. Nodes with display: none never enter the Render Tree, skip Layout and skip Paint, making this the cheapest "hide". But beware: visibility: hidden nodes stay in the Render Tree and keep taking up space; they're just not painted, which is a large perf gap from display: none.
DEMO M · LIVE · Selector matching A/B
The counter-evidence to "more specific selectors are faster"
Both sides have 800 .target elements. A uses a 6-level descendant selector (.x1 .x2 .x3 .x4 .x5 .target); B uses a flat class (.target-flat). Click ▸ to flip the same theme on both and watch the measurement gap: descendant matching walks right-to-left, climbing 5 ancestors per .target.
This is the real reason BEM / Tailwind / CSS-in-JS hash classes exist. What they do, compressing every rule into "one class decides everything", isn't aesthetic preference; it lets RuleMap's class_rules_ bucket match in O(1) and cuts out the 5-level walk from A above. The gap scales with DOM depth and element count; deep component trees + global CSS-in-CSS is the classic anti-pattern.
LayoutNGFlexibleBox runs two passes: first measures main-axis min/max, second distributes remaining space. The card lands at (40, 40), size 340 × 88. Avatar, name, follow-button each receive their (x, y, w, h) — the card finally stands up.
Module: blink · Process: Render · Thread: Main · Output: Layout Tree
What it does
Walk the Render Tree and compute x · y · width · height for every LayoutObject. LayoutTree = Render Tree + geometry.
Why not skip
No geometry → no painting. "Draw a red rectangle" needs four numbers at minimum. Layout also has to resolve the tangled interactions between inline ↔ block ↔ float ↔ flex ↔ grid, and it is the Main thread's most long-tail-prone stage.
Layout is about geometry — position and size. Each LayoutObject carries a LayoutRect that stores x / y / width / height.
The catch: LayoutObject does not map 1 : 1 to a DOM node. A display: list-item becomes two LayoutObjects (item box + marker box). An anonymous block "appears from nowhere" to keep layout rules consistent.
STAGE 03 · Main-line · The Card after Layout
LayoutNGFlexibleBox's two passes, 8 LayoutObjects in place
The card is a display: flex container, so the root article uses LayoutNGFlexibleBox, not LayoutNGBlockFlow. The flex algorithm runs twice: first the main axis (horizontal) — avatar 56 + gap 14 + button 53 = 123, leaving 217 for .info; then the cross axis (vertical) — center via align-items. The final LayoutTree:
↑ The card in the middle is the real HTML. Each box's colour matches a highlight on the card. After Layout, every LayoutObject has its (x, y, w, h).
Five things to notice:
① img uses LayoutImage (a LayoutReplaced subclass). It has no children and never will; img is a replaced element, so Layout gives it a single box for the external resource
② a uses LayoutInline, not BlockFlow: it's phrasing content and occupies a line-box
③ button brings its own LayoutNGBlockFlow with default padding / border-radius from the UA stylesheet
④ DOM 6 nodes → 8 LayoutObjects (the three LayoutText appear from nowhere)
⑤ the layout algorithm ran twice. That's the flex tax; a plain block flow only runs once.
What stages does mutating each CSS property trigger? CSS Triggers answers it. The most-cited rows below — moving a property from the reflow path to the composite path is the lowest-hanging perf win:
CSS property | Layout | Paint | Composite
width / height / padding / margin | ● | ● | ●
top / left / right / bottom | ● | ● | ●
font-size / line-height / display | ● | ● | ●
color / background-color / box-shadow | - | ● | ●
border-radius / outline | - | ● | ●
opacity | - | - | ●
transform | - | - | ●
filter | - | - | ●
HOW TO USE · Want a position animation? Use transform: translate(x, y), not top / left. Cross-fade? Use opacity, not toggling display. Animating border-radius repaints the whole layer, so avoid it where you can. Different engines vary slightly; CSS Triggers is a lookup table, not a law.
"LayoutObject" is not a single class — it's an inheritance tree. Blink uses subclassing to encode different box-model rules: block / inline / table / svg / mathml each walk their own algorithm. Below is a condensed map of the third_party/blink/renderer/core/layout/ tree:
LAYOUT OBJECT · CLASS HIERARCHYthird_party/blink/renderer/core/layout/layout_object.h
Two details worth remembering:
① LayoutText doesn't inherit from LayoutBox. It has no box of its own; its geometry is decided by the parent LayoutInline / LayoutBlockFlow inside an inline line-box
② LayoutView is the single root. It owns the viewport's size + the root ScrollableArea. Removing document.body doesn't kill it; LayoutView is a permanent member of Document.
From 2017 Chromium rewrote the layout engine as LayoutNG (Next-Generation Layout). The headline change: introduce a "Fragment" as a read-only geometry snapshot — LayoutObjects remain the input "recipe", but layout output is no longer written back into them. Instead we get an immutable fragment tree (NGPhysicalFragment tree). This split lets layout be cached, parallelised, and short-circuited at subtree boundaries.
Operationally, LayoutBlockFlow::UpdateLayout() constructs an NGBlockLayoutAlgorithm, feeds it an NGConstraintSpace, runs it and emits an NGLayoutResult — at its centre an NGPhysicalBoxFragment. "Constraint + Algorithm → Fragment" is LayoutNG's three-act form, each act purely functional, each act cacheable. This is what reduces many O(whole tree) reflows to O(dirty subtree) under NG.
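The "Constraint + Algorithm → Fragment" shape can be sketched with frozen dataclasses and a memoised pure function. This is a toy model of the idea, not LayoutNG: `BlockNode`, `ConstraintSpace`, `Fragment` and `layout` are hypothetical stand-ins, blocks simply stack vertically, and the cache is Python's `functools.lru_cache`.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class ConstraintSpace:
    available_width: int


@dataclass(frozen=True)
class BlockNode:  # the immutable input "recipe"
    name: str
    own_height: int = 0
    children: tuple = ()


@dataclass(frozen=True)
class Fragment:  # the read-only geometry snapshot
    name: str
    width: int
    height: int
    children: tuple = ()


@lru_cache(maxsize=None)
def layout(node: BlockNode, space: ConstraintSpace) -> Fragment:
    """Pure function: same node + same constraints gives the same fragment."""
    kids = tuple(layout(child, space) for child in node.children)
    height = node.own_height + sum(k.height for k in kids)
    return Fragment(node.name, space.available_width, height, kids)
```

Because nothing is written back into the input nodes, replacing one leaf and laying out again recomputes only that leaf and its ancestors; the untouched sibling subtree is served from the cache.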
CSS says: adjacent vertical margins between block siblings collapse to the larger one. The catch: whether the current block's margin-top participates in collapse can only be decided after its first child block has been laid out — a cross-subtree backward dependency. LayoutNG models this explicitly with NGUnpositionedFloat and NGMarginStrut — keeping the algorithm pure functional.
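The collapse rule itself is small enough to sketch. The class below is an illustrative Python model of a margin accumulator in the spirit of a margin strut (the field and method names are assumptions): it remembers the largest positive and the most negative margin seen, and the collapsed result is their sum.

```python
from dataclasses import dataclass


@dataclass
class MarginStrut:
    positive: int = 0  # largest positive margin seen so far
    negative: int = 0  # most negative margin seen so far

    def append(self, margin: int) -> None:
        if margin >= 0:
            self.positive = max(self.positive, margin)
        else:
            self.negative = min(self.negative, margin)

    def collapsed(self) -> int:
        # Positive margins collapse to the largest, negative ones to the
        # most negative; the final gap is the sum of the two.
        return self.positive + self.negative
```

Carrying this small value forward instead of a resolved offset is what lets the algorithm postpone the decision until the first child has been laid out, without mutating anything it already produced.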
Why does Flex run layout twice? · Main-axis must finish before cross-axis can begin
Flex's two layout passes really come from "distribute size → place along axis" being a strict sequence:
Pass 1 · main axis (horizontal): solve flex-grow / flex-shrink / flex-basis arithmetic — given the container's available width, distribute it across children per flex: 1 0 200px style rules. The output: each child's main-axis size (i.e. width for a horizontal flexbox).
Pass 2 · cross axis (vertical): now that every child's width is known, we can measure their heights (height often depends on width — e.g. auto-wrapping text: 1 line if the container is wide, 3 lines if narrow). align-items / align-self alignment + "the tallest item in a flex-line sets the line's height" rule both need this second pass.
Why can't this run in parallel? Because a child's height depends on the width it received in pass 1 — a one-way dependency. If you forced height-first, you'd get "height computed against the original container width" — once the container width changes (because flex-grow expanded a child), all heights are stale and need recomputing. So the "two passes" are a mathematical constraint, not an engineering whim.
Grid is even worse: CSS Grid runs three or more passes in some scenarios (the min-content / max-content track-sizing algorithm is iterative until convergence). Flex's two passes are actually frugal in comparison.
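The width-then-height dependency of the two flex passes described above can be made concrete with a toy model. Everything here is an assumption for illustration: items are plain text of a given length, every character is `char_width` pixels wide, text wraps at the item's width, and `layout_flex_row` is a hypothetical function.

```python
import math


def layout_flex_row(container_width, items, char_width=8, line_height=20):
    """items: list of (text_length, flex_grow, flex_basis), basis > 0."""
    # Pass 1, main axis: distribute free space according to flex-grow.
    free = container_width - sum(basis for _, _, basis in items)
    total_grow = sum(grow for _, grow, _ in items)
    widths = [
        basis + (free * grow / total_grow if total_grow and free > 0 else 0)
        for _, grow, basis in items
    ]
    # Pass 2, cross axis: heights can only be measured once widths are known,
    # because the number of wrapped lines depends on the width received.
    heights = [
        max(1, math.ceil(length * char_width / width)) * line_height
        for (length, _, _), width in zip(items, widths)
    ]
    line_height_px = max(heights)  # the tallest item sets the line's height
    return widths, heights, line_height_px
```

Swapping the order of the two passes would measure heights against widths that pass 1 is about to change, which is exactly the stale-height problem described above.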
Layout Tree · + LayoutRect + Fragment tree
DEMO B · LIVE · Forced Sync Layout
Forced Sync Layout you can actually see
Below is a 5,000-iteration write → read loop. Click BAD first: every write triggers a layout. Then click GOOD: all writes first, then read in one batch. The gap is real.
This demo isn't cheating: both code paths do the exact same work. The only difference is interleaved read+write vs batched. Open DevTools Performance and you'll see that wall of purple Layout bars. Every ResizeObserver callback you wrote, every el.style.x = …; el.offsetX, they all replay this same bug.
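Why interleaving is so much worse can be shown with a toy model of the dirty flag. This is not the DOM API: `ToyElement`, `set_width` and `offset_width` are hypothetical, and "layout" here is just a counter, but the control flow mirrors what a geometry read does when style has been written since the last layout.

```python
class ToyElement:
    def __init__(self):
        self._style_width = 0
        self._laid_out_width = 0
        self._dirty = False
        self.layout_count = 0

    def set_width(self, px: int) -> None:
        """A style write: cheap, it only marks layout as dirty."""
        self._style_width = px
        self._dirty = True

    @property
    def offset_width(self) -> int:
        """A geometry read: forces a synchronous layout if anything is dirty."""
        if self._dirty:
            self.layout_count += 1
            self._laid_out_width = self._style_width
            self._dirty = False
        return self._laid_out_width


def interleaved(el: ToyElement, n: int) -> None:
    for i in range(n):
        el.set_width(i)
        _ = el.offset_width  # read right after each write


def batched(el: ToyElement, n: int) -> None:
    for i in range(n):
        el.set_width(i)
    _ = el.offset_width  # one read after all writes
```

With 5,000 iterations the interleaved loop performs 5,000 layouts and the batched loop performs one, for the same final width.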
DEMO I · LIVE · CSS Triggers cheat-sheet
Which pipeline stages does each CSS property trigger?
Click any property to see its cost. Red = Layout, orange = Paint, green = Composite. Use the filters above to see the "animation-safe set" (green-only): that's your whitelist for 60fps animations.
Three contrasts to remember during the talk: (1) left/top runs the full pipeline vs transform, which is Composite-only; that's the root of Demo A. (2) border triggers Layout but border-radius doesn't: the former changes geometry, the latter is just a clip at paint time. (3) box-shadow skips Layout but redraws the whole shadow per frame, which is pricier than you'd guess. Tear this off as your CSS-perf cheat-sheet.
DEMO L · LIVE · Flexbox layout algorithm
LayoutNG distributing space, live
Three flex items, every property is draggable. The "algorithm trace" on the right prints what Chromium actually does this frame: main-axis available length, sum of flex-basis, how free space is computed, how grow distributes it, how justify/align position items. flex-grow and flex-shrink don't use the same formula: shrink is weighted by shrink × basis. Lower the container width and you'll see it kick in.
Counterintuitive points worth calling out: (1) flex-basis is not the final width; it's just the starting point before free-space distribution. (2) flex: 1 is not just grow=1; it's 1 1 0%, so all items split the whole container equally because basis is 0. (3) flex: 1 1 auto computes differently from flex: 1 because auto uses content size as basis. (4) shrink is weighted by basis, so bigger items shrink more. Drag the sliders and watch the trace prove each of these.
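The two distribution formulas can be written out as a short sketch. It deliberately leaves out min/max clamping and the freeze-and-repeat loop of the real algorithm, and `resolve_flex` is a hypothetical name; the point is only the difference between the grow and shrink weights.

```python
def resolve_flex(container: float, items: list) -> list:
    """items: list of (flex_grow, flex_shrink, flex_basis); returns widths."""
    free = container - sum(basis for _, _, basis in items)
    if free >= 0:
        # Growing: free space is shared in proportion to flex-grow alone.
        total = sum(grow for grow, _, _ in items)
        return [
            basis + (free * grow / total if total else 0)
            for grow, _, basis in items
        ]
    # Shrinking: the deficit is shared in proportion to shrink x basis.
    total = sum(shrink * basis for _, shrink, basis in items)
    return [
        basis + (free * shrink * basis / total if total else 0)
        for _, shrink, basis in items
    ]
```

Three `flex: 1` items (1 1 0) in a 600px container come out at 200px each, while two non-growing items with bases 300 and 100 squeezed into 300px end at 225 and 75: the larger basis absorbs three quarters of the deficit.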
STAGE 04 · DOC PHASE
Pre-paint: birth of the four property trees
The "grammar" of partial updates: isolation contracts for transform · clip · effect · scroll
Transform / Clip / Effect / Scroll — four trees grow in parallel. .follow gets its own node in the Transform tree thanks to will-change: transform; Effect tree gains 2 nodes (button shadow + avatar gradient). Each tree is a contract — "if this element changes, who else needs to repaint?"
Module: blink · Process: Render · Thread: Main · Output: Property Trees ×4
What it does
Extract 4 property trees (Transform / Clip / Effect / Scroll) from the LayoutTree. Each tree uses parent-child inheritance to express "what transforms / clips / effects / scrolls are stacked above this node". Splitting these axes from the layer structure is what lets the Compositor "mutate one property without repainting".
Why not skip
Without the four trees, every transform / opacity / scroll change has to recompute inheritance up the layer tree and ship the whole tree cross-thread. Property trees are the isolation contract that lets "mutate one property of one node" stay local.
CAP · COMPOSITE AFTER PAINT: Recent Chromium has rewritten the Pre-paint and Paint stages around CAP (Composite After Paint). Layerization, the decision about which content gets its own composited layer, moves from before Paint to after Paint, and the intermediate GraphicsLayer tree goes away. The result is fewer intermediate artifacts and more precise invalidation. The description in this article is based on the pre-CAP world, but the overall shape carries over to the new one.
STAGE 04 · Main-line · The Card after Pre-paint
Four trees each take a piece: Effect gains 2 nodes, Transform gains 1
This time the card exercises the property trees, but each tree gains a different number of nodes. NeedsEffect() sees .card's box-shadow and linear-gradient and creates one Effect node; .follow's background colour change creates another. NeedsTransform() sees .follow's will-change and creates a Transform node. NeedsClip() sees .avatar's border-radius: 50% and adds an RRect node to the Clip tree. The Scroll tree remains untouched, because nothing on the card scrolls.
Clip tree · +1 · .avatar · radius: 50% · SkRRect 4× 28px · NeedsClip() = true
Transform tree · +1 · .follow · will-change: transform · layer-promotion switch
Scroll tree · 0 · card doesn't scroll · cache hit · reused wholesale next frame
RenderPass contract · .card subtree → its own RenderPass · cashed at C16 (Draw)
↑ The card in the middle is the real HTML. Each annotation box's colour matches a highlight on the card. After Pre-paint, each of the four trees only grows the nodes that actually changed, and the next frame only pushes those changed nodes.
// The four property trees after The Card
Transform tree
─ root
   └─ .follow [will-change]
Clip tree
─ root
   └─ .avatar (RRect 50%) [border-radius]
Effect tree
─ root
   ├─ .card [shadow + gradient], render_surface = YES
   └─ .follow [will-change]
Scroll tree
─ root   // no change: nothing scrolls in the example
Key decision: .card's render_surface_reason_ flips to non-null (the box-shadow needs off-screen compositing to render correctly) — meaning .card's whole subtree will land in its own RenderPass during Draw. The decision is made in Pre-paint but only cashes in at C16. The four trees are the contract that powers every "local update" later — hover-changing .follow's transform mutates one node in the Transform tree; the other three trees are cache hits and Layout doesn't run at all.
The four property trees
TRANSFORM · Transform tree
Per-node translation / rotation / scale / 3D. The hot path of animations.
CLIP · Clip tree
Clip rectangles inherited from overflow / clip-path.
Backed by these trees, Chromium can mutate one node's transform / clip / effect / scroll without disturbing its descendants. This is why CSS animations stay smooth — the entire animation never goes back to the Main thread.
Worked case · one div, four trees each take a piece
CASE · DECOMPOSITION
How CSS properties get routed to different trees
<div style="
transform: rotate(10deg); /* → Transform tree */
overflow: hidden; /* → Clip tree */
opacity: 0.5; /* → Effect tree */
filter: blur(4px); /* → Effect tree */
overflow-y: scroll; /* → Scroll tree */
"></div>
The same LayoutObject contributes one node to each of Transform / Clip / Effect / Scroll during Pre-paint. Animating opacity later mutates only the Effect tree — Transform / Clip / Scroll are all cache hits.
The key is step ①: each object does not rebuild the whole tree, it only updates the 4 property-node pointers tied to itself. Pre-paint runs in O(changed nodes), not O(whole tree) — that's what makes it cheap enough to run every frame.
"Property tree" sounds abstract; in code it's a handful of small structs that inherit from cc::PropertyTreeNode. Each tree is an array (Vector<Node>); nodes link via integer parent_id — not pointers, because the whole tree gets memcpy'd cross-thread at Commit time. The four flavours of node, with their key fields:
Three details unlock the design's power:
① all four trees share one node-id space (1-based, 0 is the root), so a LayerImpl needs only 4 ints to know which transform / clip / effect / scroll chain it belongs to
② every node carries a changed_flag, so the next frame only pushes the nodes that actually changed; Commit bytes scale with the delta
③ the render_surface_reason_ on an Effect node is the real switch for "do we need an off-screen RenderPass?"; filter: blur(), mix-blend-mode and mask-image all flip it on.
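The flat-array layout and the "four ints per layer" idea can be sketched as follows. The field names (`parentId`, `translateX`) and the helper `accumulatedTranslation` are simplified stand-ins rather than the actual cc structs, real transform nodes carry full matrices, and using -1 for "no parent" is an assumption of this sketch.

```typescript
// Simplified stand-in for a transform node: translation only.
interface SketchTransformNode {
  id: number;
  parentId: number; // index into the same array, -1 for the root
  translateX: number;
  translateY: number;
}

// The "tree" is just a flat array; a node's id is its index.
const transformTree: SketchTransformNode[] = [
  { id: 0, parentId: -1, translateX: 0, translateY: 0 },  // root
  { id: 1, parentId: 0, translateX: 20, translateY: 10 }, // e.g. .card
  { id: 2, parentId: 1, translateX: 5, translateY: 0 },   // e.g. .follow
];

// Walk the parent chain by integer id and sum up the offsets.
function accumulatedTranslation(
  tree: SketchTransformNode[],
  nodeId: number
): { x: number; y: number } {
  let x = 0;
  let y = 0;
  let id = nodeId;
  while (id !== -1) {
    const node = tree[id];
    if (node === undefined) break; // invalid id: stop walking
    x += node.translateX;
    y += node.translateY;
    id = node.parentId;
  }
  return { x: x, y: y };
}

// A layer only stores four integers, one per tree.
interface SketchLayer {
  transformId: number;
  clipId: number;
  effectId: number;
  scrollId: number;
}

const followLayer: SketchLayer = { transformId: 2, clipId: 0, effectId: 0, scrollId: 0 };
const followOffset = accumulatedTranslation(transformTree, followLayer.transformId);
```

Because the links are integers rather than pointers, copying the array copies the whole tree with nothing to fix up afterwards.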
When PrePaintTreeWalk reaches a LayoutObject, PaintPropertyTreeBuilder::UpdateForSelf() follows a "build-on-demand" rule — it creates a node only if a CSS property needs one. For example:
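A hedged sketch of what such a build-on-demand rule can look like: a node is created only when a predicate over the element's style says so. The predicates below are deliberately simplified, and the `SketchStyle` type and function names are hypothetical; Blink's real conditions cover many more properties.

```typescript
// Hypothetical subset of computed style; only what this sketch needs.
interface SketchStyle {
  transform?: string;
  willChange?: string[];
  opacity?: number;
  filter?: string;
  overflow?: "visible" | "hidden" | "scroll" | "auto";
}

type TreeName = "transform" | "clip" | "effect" | "scroll";

// Simplified predicates, loosely modelled on the NeedsTransform / NeedsEffect /
// NeedsClip checks named in the text. Not Blink's actual conditions.
function needsTransform(s: SketchStyle): boolean {
  const hinted = s.willChange !== undefined && s.willChange.indexOf("transform") !== -1;
  return s.transform !== undefined || hinted;
}
function needsEffect(s: SketchStyle): boolean {
  return (s.opacity !== undefined && s.opacity < 1) || s.filter !== undefined;
}
function needsClip(s: SketchStyle): boolean {
  return s.overflow !== undefined && s.overflow !== "visible";
}
function needsScroll(s: SketchStyle): boolean {
  return s.overflow === "scroll" || s.overflow === "auto";
}

// Returns the trees that would receive a new node for this element.
function treesTouchedBy(s: SketchStyle): TreeName[] {
  const trees: TreeName[] = [];
  if (needsTransform(s)) trees.push("transform");
  if (needsClip(s)) trees.push("clip");
  if (needsEffect(s)) trees.push("effect");
  if (needsScroll(s)) trees.push("scroll");
  return trees;
}

const plainDiv = treesTouchedBy({});                             // no nodes at all
const followButton = treesTouchedBy({ willChange: ["transform"] });
const decomposed = treesTouchedBy({
  transform: "rotate(10deg)",
  overflow: "scroll",
  opacity: 0.5,
  filter: "blur(4px)",
});
```

A plain element touches no tree, the will-change hint alone is enough for a Transform node, and the fully decorated div lands in all four.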
This is the real answer to "which properties trigger GPU acceleration?": anything that makes Pre-paint emit a new Transform / Effect node with a non-empty render_surface_reason_ promotes that subtree to its own composited layer. will-change: transform "creates a layer" precisely because it bypasses the NeedsTransform guard and forces a Transform node into existence.
Why four trees, not one big tree? · different properties have fundamentally different invalidation granularity
The most direct answer: these four kinds of properties differ completely in "change frequency" and "impact radius". Stuff them into one tree and every property change ships the whole tree to the Compositor thread, which degenerates into "redo the whole page".
Each tree's "personality", specifically:
Transform tree · changes most often (every animation frame), but doesn't affect other elements' geometry. Give it its own tree and each change pushes a single node.
Clip tree · driven by layout, changes less often; but clipping is cumulative (parent clip ∩ child clip), needing its own ancestor chain. Mixed with Transform, every transform change would re-evaluate clip — pure waste.
Effect tree · decides RenderPass boundaries (box-shadow / opacity / blend-mode trigger off-screen composition). This tree's nodes are the buckets of GPU work — they decide "which quads go to a dedicated RenderPass". Transform / Clip don't affect RenderPass partitioning, so this must stay separate.
Scroll tree · scrolling is triggered by user input, on a cadence completely independent of vsync. The Compositor thread has to mutate the scroll offset without involving Main. If Scroll lived inside the Transform tree, mutating one offset would "wake the entire Transform tree", defeating the purpose of running scrolling on the Compositor.
So the 4 trees are an "orthogonal decomposition": each property axis has independent "change frequency × impact radius"; cramming them into one tree disables 4 independent optimisations. Storing orthogonal axes separately and updating locally per axis — the same trick used by database engines, graphics engines, and OS schedulers: "shard by mutation pattern".
Counter-example: early Blink kept all of this state bundled in one hierarchy (the PaintLayer tree and the GraphicsLayer tree derived from it), so every transform change meant walking that tree. The long refactoring that ended with CAP (Composite After Paint) pulled the state out into these four independent property trees. The performance win came from splitting things that should not change together, a rule that holds for any system.
DEVTOOLS
Performance > "Update Layer Tree"; Layers panel for the layer tree
Three spots to inspect:
① ★ promoted entries in the tree view are elements for which the four property trees created dedicated nodes; Pre-paint's output is visible here
② the canvas on the right is the top-down layer view; solid borders = composited layers (.card and .follow), dashed = ordinary child elements
③ hover any layer and DevTools shows its Compositing Reasons (kActiveTransformAnimation / kBackdropFilter etc.), which map directly to the cc::CompositingReason enum. An explosion of composited layers is strong evidence of will-change abuse.
DEMO E · LIVE · the 4 property trees grow
Which tree does each line of CSS grow?
Click next or auto-play to watch the four trees (Transform / Clip / Effect / Scroll) grow nodes as CSS rules are added one by one. This is how CAP (Composite After Paint) decouples "what changed" from "who needs to repaint".
Notice step 3: will-change: transform grows a .follow node in the Transform tree. This is where Layer promotion begins. Step 4's box-shadow grows a shadow node in the Effect tree; when .follow moves on the next frame, only that Effect subtree needs to repaint and the main layer is untouched. That is why transform / opacity animations can run independently on the Compositor.
DEMO F · LIVE · Layer Explosion slider
See what will-change actually costs
Drag the slider to add more will-change: transform elements. Watch how the Layer count, the estimated GPU memory and the actual FPS change. More layers ≠ smoother; this is the exhibit for Layer Explosion.
FPS starts dropping at around 150 layers and collapses at around 300. The VRAM estimate uses 64×64 RGBA = 16 KB per layer as a lower bound; real layers are often much bigger. That is why the Chrome team's guidance is to add will-change on hover and remove it on leave, rather than sprinkling it everywhere in advance and hoping one of them animates.
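The lower-bound estimate quoted above is plain arithmetic: width × height × 4 bytes per RGBA pixel, times the layer count. A small sketch (the function name is invented for this example, and the 512×512 case is only a hypothetical comparison point):

```typescript
const BYTES_PER_RGBA_PIXEL = 4;

// Lower-bound texture memory for `layerCount` layers of the given size.
// Ignores tiling, padding and any extra buffers the compositor may keep.
function estimateLayerBytes(layerCount: number, width: number, height: number): number {
  return layerCount * width * height * BYTES_PER_RGBA_PIXEL;
}

const KIB = 1024;
const MIB = 1024 * 1024;

const oneLayerKiB = estimateLayerBytes(1, 64, 64) / KIB;      // 64 × 64 × 4 bytes
const at150LayersMiB = estimateLayerBytes(150, 64, 64) / MIB;
const at300LayersMiB = estimateLayerBytes(300, 64, 64) / MIB;

// The same 300 layers at 512×512 instead of 64×64.
const at300LargeMiB = estimateLayerBytes(300, 512, 512) / MIB;
```

At 64×64 the total stays under 5 MiB even at 300 layers, while the same count at 512×512 comes to 300 MiB. That gap is why the estimate should be read strictly as a floor.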
STAGE 05 · DOC PHASE
Paint: not pixels, but a display list
layout objects → display item list, in CSS painting order
A ~12-entry DisplayItemList: Save / ClipRRect(avatar) / DrawRect bg / DrawImage(avatar) / DrawText("Airing") / DrawText(bio) / DrawRRect(button) / Restore. A pure command "score" — and not one pixel has been played yet.
Module: blink → cc · Process: Render · Thread: Main · Output: cc::PictureLayer + DisplayItemList
What it does
Walk the LayoutTree in CSS painting order, translate each LayoutObject into a batch of DisplayItems, assemble them into a DisplayItemList, and attach it to a cc::PictureLayer. Not one pixel is painted here; Paint only writes a replayable script of "what to draw".
Why not skip
Instructions rather than pixels means the output can cross threads. Main only produces the DisplayItemList; Raster threads play it back. The same list is reused across scales, tiles and devices, which is the root of Chromium's "paint cheaply, paint many times".
STAGE 05 · Main-line · The Card after Paint
14 DisplayItems in line, avatar wrapped by ClipRRect
Paint paints zero pixels; it only transcribes the decisions of Layout and the Effect tree into a cc::DisplayItemList. The card's final list is ordered by the 7-phase painting sequence of CSS Appendix E:
↑ The card in the middle is the real HTML. Each annotation box's colour matches a highlight on the card. After Paint, ~14 DisplayItems are lined up, but not one pixel has been drawn. The actual drawing is Raster's job.
Three things to notice:
① the avatar's ClipRRect is a real circle: SkRRect carries four independent corner radii, all four set to 28px, so the GPU treats it as a true circle
② SaveLayer nests twice: the outer one carves an off-screen texture for .card's shadow effect; the inner one is .follow's own composite layer. To cc these are two independent PaintChunks, each bound to its own 4-tree state
③ the whole list is reused incrementally via PaintController's double buffer: next frame's unchanged chunks are moved over as an O(1) range move, and only mutated chunks enter Raster.
The CSS spec mandates a fixed 7-phase order for every block — background at the bottom, text at the top. The order is "CSS 2.1 Appendix E painting order", strictly followed by Blink:
The Paint stage doesn't paint pixels — it records "what to paint". Each entry is a DisplayItem, ultimately fed into a cc::PictureLayer. Given this HTML:
Like DOM construction, Paint walks a stack: on reaching a LayoutObject it pushes a SaveLayer, paints the children, then pops with Restore. This pattern lets clip, transform and opacity nest in place without contaminating siblings.
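The push / paint children / pop pattern can be sketched as a recursive walk that records commands instead of drawing. Everything here is illustrative: `SketchLayoutNode` and `paintToDisplayList` are made-up names, and a generic Save / Restore pair stands in for whatever state push the real code performs.

```typescript
// Hypothetical, minimal layout node: a name, an optional clip, and children.
interface SketchLayoutNode {
  name: string;
  clip?: string;
  children: SketchLayoutNode[];
}

// Records drawing commands instead of drawing; returns the display list.
function paintToDisplayList(root: SketchLayoutNode): string[] {
  const list: string[] = [];

  function paint(node: SketchLayoutNode): void {
    list.push("Save"); // push state
    if (node.clip !== undefined) {
      list.push("Clip(" + node.clip + ")");
    }
    list.push("Draw(" + node.name + ")"); // the node's own content
    node.children.forEach((child) => paint(child)); // children inherit the clip
    list.push("Restore"); // pop state: siblings are unaffected
  }

  paint(root);
  return list;
}

const cardList = paintToDisplayList({
  name: "card",
  children: [
    { name: "avatar", clip: "rrect 28px", children: [] },
    { name: "name-text", children: [] },
  ],
});
```

The avatar's clip sits between its own Save and Restore, so the text drawn afterwards is not clipped by it.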
"DisplayItemList" inside Blink is not a flat array — it nests three layers: DisplayItem → PaintChunk → PaintArtifact. The design lets invalidation granularity and property-tree binding both target sub-ranges precisely:
PAINT_CONTROLLER · DATA MODELthird_party/blink/renderer/platform/graphics/paint/
Why a "chunk layer"? Because cc's Layerization step doesn't operate on individual DisplayItems — it operates on "contiguous DisplayItems sharing one property-tree state". Only a chunk's worth can land on the same cc::Layer. The chunk's properties_ field carries the current pointer into the four property trees Pre-paint just produced — it switches at every SaveLayer/Clip/Transform boundary. This is the seam that stitches "property trees" and "display lists" together.
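To make the chunking rule concrete, here is a minimal sketch that groups contiguous display items sharing one property-tree state. The types are invented for the example and the state is reduced to four integer ids; Blink's actual PaintChunk carries more than this.

```typescript
// Hypothetical property-tree state: one node id per tree.
interface SketchPropertyState {
  transform: number;
  clip: number;
  effect: number;
  scroll: number;
}

interface SketchDisplayItem {
  label: string;
  state: SketchPropertyState;
}

// A chunk covers display items [begin, end) that share one state.
interface SketchPaintChunk {
  state: SketchPropertyState;
  begin: number;
  end: number;
}

function sameState(a: SketchPropertyState, b: SketchPropertyState): boolean {
  return a.transform === b.transform && a.clip === b.clip &&
    a.effect === b.effect && a.scroll === b.scroll;
}

function chunkByState(items: SketchDisplayItem[]): SketchPaintChunk[] {
  const chunks: SketchPaintChunk[] = [];
  items.forEach((item, index) => {
    const last = chunks.length > 0 ? chunks[chunks.length - 1] : undefined;
    if (last !== undefined && sameState(last.state, item.state)) {
      last.end = index + 1; // same state: extend the current chunk
    } else {
      chunks.push({ state: item.state, begin: index, end: index + 1 });
    }
  });
  return chunks;
}

const rootState: SketchPropertyState = { transform: 0, clip: 0, effect: 0, scroll: 0 };
const avatarState: SketchPropertyState = { transform: 0, clip: 1, effect: 0, scroll: 0 };
const followState: SketchPropertyState = { transform: 1, clip: 0, effect: 1, scroll: 0 };

const cardChunks = chunkByState([
  { label: "DrawRect(bg)", state: rootState },
  { label: "DrawImage(avatar)", state: avatarState },
  { label: "DrawText(name)", state: rootState },
  { label: "DrawRRect(button)", state: followState },
  { label: "DrawText(Follow)", state: followState },
]);
```

Note that the two root-state items end up in separate chunks because the avatar sits between them: chunks are contiguous runs, not global groupings.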
Every LayoutObject's DisplayItems carry an identity (DisplayItemClient). On the next Paint, unchanged LayoutObjects reuse last frame's DisplayItems verbatim. Paint is incremental — its cost scales with the dirty region only.
Mechanically the reuse runs on a double-buffer inside PaintController: current_paint_artifact_ is last frame's output; new_paint_artifact_ is this frame's accumulator. When PaintController::UseCachedItemIfPossible(client, type) hits (the client is clean and last frame painted this type), the whole contiguous slice of DisplayItems is bulk-moved into new_artifact — O(1) range move, no repaint. The mechanism is called "Paint cache", and it's why a typical frame in Blink only paints 200~300 new items even though the document has thousands.
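A toy version of that double buffer, under loud assumptions: the class and method names below are invented, the cache lookup is a linear scan for brevity (the text describes an O(1) range move), and display items are plain strings.

```typescript
interface CachedClientItems {
  clientId: string;
  items: string[];
}

class SketchPaintController {
  private current: CachedClientItems[] = []; // last frame's output
  private next: CachedClientItems[] = [];    // this frame's accumulator
  repaintCount = 0;

  // Reuse last frame's items when the client is clean; otherwise repaint.
  paintClient(clientId: string, dirty: boolean, paint: () => string[]): void {
    const cached = this.current.filter((e) => e.clientId === clientId)[0];
    if (!dirty && cached !== undefined) {
      this.next.push(cached); // moved over as one unit, paint() never runs
      return;
    }
    this.repaintCount += 1;
    this.next.push({ clientId: clientId, items: paint() });
  }

  // End of frame: the accumulator becomes the cache for the next frame.
  finishFrame(): string[] {
    this.current = this.next;
    this.next = [];
    const flat: string[] = [];
    this.current.forEach((entry) => entry.items.forEach((i) => flat.push(i)));
    return flat;
  }
}

const controller = new SketchPaintController();

// Frame 1: nothing is cached yet, so both clients paint.
controller.paintClient("card", true, () => ["DrawRect(bg)", "DrawText(name)"]);
controller.paintClient("follow", true, () => ["DrawRRect(button)"]);
const frame1 = controller.finishFrame();

// Frame 2: only .follow is dirty (hover); .card is reused from the cache.
controller.paintClient("card", false, () => ["DrawRect(bg)", "DrawText(name)"]);
controller.paintClient("follow", true, () => ["DrawRRect(button:hover)"]);
const frame2 = controller.finishFrame();
```

Across the two frames `paint()` runs three times instead of four, and the saving grows with the share of the document that stays clean.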
The cc::Layer family: five subtypes
What Paint hands off is not an image but a cc::Layer tree, living on the Main thread, exactly one per Render process. The subclass decides how the content will reach the screen:
Embeds a CompositorFrame from another process. Used by iframes, OffscreenCanvas, video.
cc::UIResourceLayer / NinePatch
Software-rendering bitmap layer, the fallback cousin of TextureLayer.
cc::VideoLayer (deprecated)
Deprecated; replaced by SurfaceLayer.
WHAT 'CC' STANDS FOR: cc is historically read as "Chrome Compositor", but the Chromium source itself calls that name inaccurate and suggests "content collator" instead. The cc module's job is to assemble, inside the Render process, what Viz should draw, so collator fits better than compositor.
The card now lives on both sides — LayerTreeHost (Main) and LayerTreeImpl (Compositor). PushPropertiesTo atomically pushes 2 cc::Layers (main + .follow standalone) across. Main can let go now — the card's fate belongs to Compositor.
Module: cc · Process: Render · Thread: Main → Compositor · Output: LayerImpl Tree
What it does
Synchronise the Main-thread cc::Layer tree (plus the 4 property trees and the DisplayItemList) onto the Compositor-thread LayerImpl tree. This is the pipeline's only explicit cross-thread moment; Main is briefly frozen while it happens.
Why not skip
The two threads cannot share pointers directly: JS may mutate a cc::Layer on Main at any moment while the Compositor thread is rasterising it. Commit is a "snapshot + ownership transfer" sync, so the two sides never trip over each other at the boundary.
Paint produced 2 cc::Layers (main + .follow) + 4 property trees + 2 PaintChunks (each bound to a property-tree state). Commit pushes this whole bundle atomically from Main to Compositor:
The mechanism: TreeSynchronizer walks both sides with a double pointer. The left one follows Main's cc::Layer tree, the right one follows the Compositor's LayerImpl tree. Each pair calls PushPropertiesTo() to push deltas. If only .follow's transform changed, this push touches one LayerImpl plus one Transform-tree node (flagged by changed_flag). The bytes copied per Commit scale with the number of properties that actually changed, which is why mutating a single transform doesn't blow up the Commit cost.
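The "push only what changed" idea can be illustrated with a dirty flag per node. This is a sketch, not TreeSynchronizer: the types and `pushChangedNodes` are hypothetical, the trees are flat arrays, and both sides are assumed to be index-aligned.

```typescript
// Main-thread node: carries a dirty flag (called changed_flag in the text).
interface MainSideNode {
  id: number;
  translateX: number;
  changed: boolean;
}

// Compositor-side copy: same data, no flag needed in this sketch.
interface ImplSideNode {
  id: number;
  translateX: number;
}

// Copies only flagged nodes and returns how many were pushed.
// Assumes both arrays are index-aligned (same tree shape on both sides).
function pushChangedNodes(mainTree: MainSideNode[], implTree: ImplSideNode[]): number {
  let pushed = 0;
  mainTree.forEach((node, index) => {
    if (!node.changed) return;
    implTree[index] = { id: node.id, translateX: node.translateX };
    node.changed = false; // clean until the next mutation
    pushed += 1;
  });
  return pushed;
}

const mainNodes: MainSideNode[] = [
  { id: 0, translateX: 0, changed: true },
  { id: 1, translateX: 20, changed: true },
  { id: 2, translateX: 5, changed: true },
];
const implNodes: ImplSideNode[] = [];

// First commit: everything is new, so everything is pushed.
const firstCommit = pushChangedNodes(mainNodes, implNodes);

// Hover moves .follow (node 2); nothing else changes.
mainNodes
  .filter((n) => n.id === 2)
  .forEach((n) => { n.translateX = 8; n.changed = true; });

const secondCommit = pushChangedNodes(mainNodes, implNodes);
```

The second commit copies one node out of three, which is the delta-proportional cost the paragraph above describes.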
Frame lifecycle · from BeginMainFrame to Commit
When vsync ticks, the Scheduler on the Compositor thread sends a BeginMainFrame to the Main thread. Main runs Style → Layout → Pre-paint → Paint to prepare the cc::Layer tree, then triggers Commit. Commit blocks Main; once it returns, Main is immediately free to do other things (execute JS, run microtasks, and so on).
FIG 11.A · A complete frame lifecycle: vsync fires BeginMainFrame; Main runs the first five steps; Commit unblocks Main, and the Compositor takes over the remaining seven.
Commit is essentially every cc::Layer pushing its latest properties onto its matching cc::LayerImpl. TreeSynchronizer walks both trees in lockstep, calling each Layer's PushPropertiesTo. Textures aren't copied (they live in SharedImage), only properties.
SingleThreadProxy vs ProxyMain · two compositing modes
Chromium actually has two Commit implementations:
SingleThreadProxy
The Compositor runs on the Main thread itself (atypical). Used by Android WebView and headless mode. Commit degenerates into a function call.
ProxyMain ↔ ProxyImpl
The default. Main and Compositor each run on their own thread and communicate via message pumps. Commit is a cross-thread sync: Main blocks while the Compositor copies properties.
Slow commits · why
PERF DIAGNOSIS
"Why is my commit taking 30ms"
Three common causes:
① The layer tree is too large: calling PushPropertiesTo on ten thousand cc::Layers one by one isn't free. Fix: merge adjacent layers and drop pointless will-change.
② Property-tree node explosion: every transform / clip / effect creates a new property-tree node, and a 1000-node tree syncs slowly.
③ Large image resources: TransferableResource references that churn force SharedImage reference updates, which increases IPC.
Why must Commit block the Main thread? · a hard constraint on cross-thread data consistency
No blocking = torn reads. A cc::Layer holds a dozen fields — transform / opacity / bounds / display_list, etc. If Main is mid-mutation on transform (x updated, y not yet) when Compositor reads the layer, it gets an illegal "new x, old y" state and the next frame jitters. Either lock (expensive + deadlock risk) or stop-the-world — Chromium picks the latter, because Commit is usually short (<1ms), cheaper than the lock overhead.
Commit is a "transaction boundary". In one frame, Main ran Style → Layout → Paint → mutate cc::Layer. This whole train of changes must appear atomically on the Compositor side; otherwise the Compositor sees "new Layout but old Paint" and geometry doesn't match pixels. The moment Commit blocks Main is literally a cross-thread transaction commit: same name, same meaning as a database COMMIT.
Commit really is short (typically 0.5-2ms), because it moves pointers, not data: LayerImpl is a "shadow copy" of cc::Layer, sharing the underlying SkPicture / TransferableResource (refcount); property trees are shallow vector copies. So even with Main blocked, the impact is small — most pages commit in under 1ms and you can't feel it. What actually slows Commit is cc::Layer count exploding into the thousands, or property tree node count going wild (covered in the "slow commits" section above).
Are there non-blocking alternatives? Yes — "impl-side painting" (experimented in 2014, abandoned) and "composite without commit" (a side-channel for the very few Compositor-only animations). The first was killed for complexity; the second only handles "pure transform / opacity" animations. For most business pages, accepting 1ms of Commit blocking in exchange for a clean pipeline is an excellent trade.
DEVTOOLS
Performance > Frames row + "Commit" event; align the Main and Compositor lanes
TRACING
cc, blink.commit, viz.frame_production, scheduler
FLAG
--single-process / inspect SingleThreadProxy vs ProxyMain modes
▸ Performance · the full cycle from BeginMainFrame to Commit (vsync N · 16.7 ms budget; time axis 0 to 16.7 ms)
Compositor lane: BMF → wait for Main → Commit → Tile + Activate → wait for Raster → Draw + Submit → idle
Main lane: wake → JS → Style → Layout → Pre-paint → Paint → Commit (blocked) → microtask + rAF → idle (until next BMF)
Raster ×4 lane: idle → parallel raster (4 threads) → idle
Three reads:
① Compositor and Main both pause at the red "Commit" line: the one moment of explicit cross-thread sync, a block of ~1ms
② Main unlocks immediately after Commit and can run microtasks and rAF; it is not waiting for rendering to finish
③ the Compositor isn't idle after Commit either: it keeps pushing Tile / Activate / Draw forward, in parallel with the JS work on Main. That is cc's "asynchronous pipeline". A Commit longer than 5ms almost always means an oversized cc::Layer tree or a property-node explosion.
cc::Layer Tree + Property Trees + DisplayItemList → OUTPUT · LayerImpl Tree, on the Compositor thread
DEMO D · LIVE · Freeze the main thread
Watch INP and Compositor independence happen
Click the red button to freeze the main thread for 800 ms. The counter on the left stops, but the spinner on the right keeps spinning. Click the button on the right during the freeze and the real click delay shows up; that is INP.
This demo uses while (performance.now() - t < 800) {} to actually burn the main thread for 800 ms. Every long stretch of JS, every un-split Long Task and every synchronous XHR you write does the same thing. CSS animations on the Compositor keep running, which is why transform feels smoother than left. But user clicks stall, which is why INP is the health metric for the main thread.
STAGE 07 · CC PHASE
Compositing: slicing the page into independent layers
Because of will-change: transform, .follow gets promoted to a standalone GraphicsLayer at Compositing — owning its own cc::PictureLayerImpl. From here on, the button's hover animation no longer disturbs the main card — main can be drowning in React; the button still moves.
Module: cc · Process: Render · Thread: Compositor · Output: GraphicsLayer Tree
What it does
On the Compositor thread, group the LayerImpl tree into independent GraphicsLayers. Each one owns its own texture and transform matrix, and can be animated, invalidated and composited on its own.
Why not skip
Without layer separation, even an ordinary scroll forces every pixel through Paint + Raster again. Caches at every prior stage can't save you, because the invalidation granularity collapses to "the whole screen".
Pre-paint already reserved a Transform-tree node for .follow; Compositing simply cashes in: extract .follow from the main layer into its own GraphicsLayer. The cost is one extra GPU texture (53×32 ≈ 6KB); the payoff is "only this layer moves on hover".
The promotion criteria: consult the cc::CompositingReason enum — 30+ triggers, any one matches and you're promoted. .follow matches kActiveTransformAnimation (because of will-change: transform); video / canvas / iframe / 3D transform / fixed scroll / overflow:scroll-snap would also trigger. Every layer costs memory, so cc has the reverse heuristic too — "if a layer is too small and isn't animating, fold it back into the parent".
Imagine the stage cut: Paint goes straight to Raster, straight to screen. The moment Raster's data isn't ready when vsync arrives — a frame drops. Even with caches at every prior stage, a single scroll would force every pixel re-Painted and re-Rastered.
Without will-change: .wobble shares a layer with its surroundings; every frame retriggers Paint + Raster of the whole layer. With will-change: transform: the Compositor promotes it to its own GraphicsLayer ahead of time, animation reduces to a matrix multiply on the Compositor thread — zero Main-thread work, zero Raster re-runs.
Animation runs on the Compositor thread · zero Main-thread involvement.
What gets promoted to its own layer · the full list
The Compositor doesn't blindly give every element its own layer; it has explicit promotion criteria. The list below covers the common triggers (CAP simplifies the rules, but the skeleton is the same):
Every promoted layer costs memory: tile cache, textures, property-tree entries. Sprinkle will-change everywhere and you will watch VRAM fill up with motionless layers. The rule: use it only on elements that actually move, and remove it once the animation ends.
DevTools · inspecting your own layers
CHROME DEVTOOLS · LAYERS
Why each layer exists, how much memory it takes, how many times it repainted
DevTools' Layers panel (enable it under experiments) lists every GraphicsLayer on the current page. Click one and it tells you why the layer exists (will-change / 3D transform / video / iframe / mix-blend-mode…), its texture memory, and its paint count so far. A layer that exists for no obvious reason is often the entry point to a performance hole.
Input · the side door
The Compositor thread also has a side job: input event handling. The Browser process delivers mousewheel / scroll / touch events to the Compositor thread, which can handle them directly without bothering the Main thread, as long as no JS is listening for them. The moment you call addEventListener, the Compositor has to forward the event back to Main.
By default the browser assumes you might call preventDefault, so it routes touch events back to Main, and any slow JS on Main makes scrolling drop frames. Fix: declare a passive listener:
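A minimal sketch of such a declaration. The `{ passive: true }` option on `addEventListener` is standard DOM API; the `ListenerTarget` type, the wrapper function and the fake target are assumptions added only so the example also runs outside a browser.

```typescript
// Minimal structural type so the sketch runs outside a browser too.
// In a page, `window`, `document` or any element would be passed instead.
interface ListenerTarget {
  addEventListener(
    type: string,
    listener: (event: unknown) => void,
    options?: { passive?: boolean }
  ): void;
}

// Registers a touch listener that promises never to call preventDefault(),
// so scrolling does not have to wait for the handler to finish.
function onTouchStartPassive(
  target: ListenerTarget,
  handler: (event: unknown) => void
): void {
  target.addEventListener("touchstart", handler, { passive: true });
}

// Stand-in target that records registrations instead of dispatching events.
const recorded: { type: string; passive: boolean }[] = [];
const fakeTarget: ListenerTarget = {
  addEventListener(type, _listener, options) {
    recorded.push({
      type: type,
      passive: options !== undefined && options.passive === true,
    });
  },
};

onTouchStartPassive(fakeTarget, () => {
  // read-only work only, e.g. analytics; no preventDefault() here
});
```

Chrome already treats touch and wheel listeners registered on window, document and body as passive by default, so the explicit option matters most for listeners attached to other elements.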
Now the Compositor thread handles the event directly, and scrolling stops paying the Main-thread tax.
The Compositor is like a quiet living room.
Keep the windows shut and animations roll themselves;
open an onScroll and the Main thread is awakened.
Field Note · 02
Both orange boxes play the same "slide from left to right" animation. But the left one uses left: and the right one uses transform:, and the Chromium pipeline behind each frame is completely different.
Click Show Layer borders: only the right box gets a blue dashed border, meaning it has been promoted to a standalone cc::PictureLayerImpl whose raster result lives in a GPU texture. Each frame the Compositor just places that texture at a new position. The left box has to run the full Layout → Paint → Raster sequence every frame, so when the main thread is busy running React, the left one janks and the right one doesn't. Record a Performance trace in DevTools to see the gap.
DEMO J · LIVE · Layer Promotion playground
Which CSS actually creates an independent Layer?
Toggle different properties to see whether the element below gets promoted. Blue dashed border = standalone cc::PictureLayerImpl. The "Compositor reasons" list on the right enumerates the factors actually driving the decision.
A few counterintuitive points worth calling out in a talk: (1) a lone 2D translateX often does NOT promote; modern Chrome promotes on demand, so to force promotion use will-change or translateZ(0); (2) static opacity does not promote, but animated opacity does; (3) filter: blur not only promotes, it also creates its own RenderPass, and the bigger the blur radius, the bigger the bill. That is why backdrop-filter: blur is a performance killer on low-end devices.
The 340 × 88 main card splits into two tiles on the 256 px grid: a main tile covering the left 256 px and an edge tile covering the remaining 84 px on the right. TileManager prioritizes tiles by distance from the viewport; the card is in the viewport, so both get NOW priority and enter the Raster queue immediately.
Module: cc · Process: Render · Thread: Compositor · Output: cc::TileTask[]
What it does
Cut every cc::PictureLayerImpl into 256×256 or 512×512 cc::Tiles, order them by distance from the viewport, and wrap each one in a cc::TileTask for the TaskGraph. This stage only schedules; it does no painting.
Why not skip
Two physical limits: ① GPUs can't support arbitrarily large textures, so a huge layer must be split; ② all tabs in Chromium share one unified buffer pool, and the tile is its smallest allocation unit. Without Tiling, opening a few extra tabs would exhaust VRAM.
The main layer is 340×88 — sliced into 256-aligned tiles, that's 1 full tile + 1 right-edge tile (84×88). The standalone .follow layer is only 53×32 — smaller than a tile but still owns one (because it's bound to its own property-tree state). In-viewport, all 3 tiles are priority_bin = NOW:
↑ The card in the middle is the real HTML. After Tiling, the card is sliced into 3 tiles: two for the main layer, one for .follow. The thin edge tile and the tiny .follow tile each occupy a full 256×256 texture, which is the real memory cost of "layer promotion".
Three subtleties: ① edge tiles still allocate a full 256×256 texture (GPUs reject arbitrary sizes, so #r2 is really 256×256 with only 84×88 painted: wasted memory, but simpler pool management); ② .follow occupies a tile of its own, which makes the memory cost of "promotion to a composited layer" concrete: a 53×32 button eating a 256×256 texture slot; ③ 3 cc::TileTasks enter the TaskGraph alongside 1 ImageDecodeTask (for airing.png), for a total of 4 tasks for the Raster thread pool to chew through.
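The slicing arithmetic above can be reproduced with a small sketch. It assumes a fixed 256 px grid and uses a hypothetical TileSketch struct and TileLayer helper; neither is the real cc::Tile or cc::PictureLayerTiling API, they only illustrate how edge tiles keep a full-size texture slot.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for a tile; not the real cc::Tile.
struct TileSketch {
  int x, y;                  // top-left corner in layer space
  int content_w, content_h;  // pixels that actually get painted
  int texture_w, texture_h;  // texture slot reserved for the tile
};

// Cuts a layer into a fixed-size grid. Edge tiles paint fewer pixels
// but, as the text describes, still reserve a full-size slot.
std::vector<TileSketch> TileLayer(int layer_w, int layer_h, int tile_size = 256) {
  std::vector<TileSketch> tiles;
  for (int y = 0; y < layer_h; y += tile_size) {
    for (int x = 0; x < layer_w; x += tile_size) {
      tiles.push_back({x, y,
                       std::min(tile_size, layer_w - x),
                       std::min(tile_size, layer_h - y),
                       tile_size, tile_size});
    }
  }
  return tiles;
}
```

Calling TileLayer(340, 88) yields two tiles, the second painting only 84×88 pixels inside a 256×256 slot, and TileLayer(53, 32) yields one tile for the .follow layer, which matches the three tiles counted in the text.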
cc::Tile · what's inside one mosaic piece
A Tile is not a plain bitmap; it carries identity and state:
The figure below shows TileManager re-prioritising tiles during inertial scroll — deeper blue = higher priority; the chequerboard zone is the "not yet ready" placeholder:
FIG 13 · As the viewport jumps across the page (simulating inertial scroll), cc::TileManager re-orders raster priority for every tile in real time.
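A minimal sketch of distance-based prioritisation, under stated assumptions: the distance metric (sum of horizontal and vertical gaps), the soon_margin_px threshold, and the SOON and EVENTUALLY bin names are illustrative choices that go beyond the NOW bin the text mentions. The real TileManager weighs more inputs than this.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Rect { int x, y, w, h; };

// Only NOW is named in the text; the other two bins are assumptions.
enum class PriorityBin { NOW, SOON, EVENTUALLY };

// Gap between a tile and the viewport; 0 when they overlap or touch.
int DistanceToViewport(const Rect& tile, const Rect& viewport) {
  int dx = std::max({0, viewport.x - (tile.x + tile.w),
                     tile.x - (viewport.x + viewport.w)});
  int dy = std::max({0, viewport.y - (tile.y + tile.h),
                     tile.y - (viewport.y + viewport.h)});
  return dx + dy;
}

PriorityBin BinFor(const Rect& tile, const Rect& viewport, int soon_margin_px) {
  int d = DistanceToViewport(tile, viewport);
  if (d == 0) return PriorityBin::NOW;
  return d <= soon_margin_px ? PriorityBin::SOON : PriorityBin::EVENTUALLY;
}

// Orders tiles in place, nearest to the viewport first.
void SortByPriority(std::vector<Rect>& tiles, const Rect& viewport) {
  std::stable_sort(tiles.begin(), tiles.end(),
                   [&](const Rect& a, const Rect& b) {
                     return DistanceToViewport(a, viewport) <
                            DistanceToViewport(b, viewport);
                   });
}
```

Re-running SortByPriority whenever the viewport moves is the sketch's equivalent of the re-prioritisation shown in the figure.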
Before a Tile task lands on a Raster thread, it travels this full chain — from the vsync-triggered BeginMainFrame down to SingleThreadTaskGraphRunner::ScheduleTasks:
256×256 is the empirical sweet spot. Mobile uses 256 (narrow screens, small animation range); desktop uses 512 (large viewport amortises the overhead). Tile size is not a constant — it is chosen per device.
Intuition says "small layer = small tile, big layer = big tile" should be most efficient. cc doesn't do this, because tiles aren't just "chunks" — they're the entire GPU memory system's "currency unit":
Uniform size → reusable texture pool. GPU texture allocation is extremely slow (tens of µs to ms). cc maintains a ResourcePool caching "allocated but idle" texture slots — when a tile needs one, grab from the pool, O(1). But that requires all tiles to be the same size: with mixed sizes, the pool buckets by size, large slots sit unused while small ones are starved — heavy fragmentation. Uniform 256/512 = every slot in the pool is fully interchangeable — the foundation of ResourcePool's hit rate.
Uniform size → cross-tab memory sharing. Chrome's tabs share one GPU-process texture pool. If Tab A uses 256, Tab B uses 333 and Tab C uses 512, the pool is sliced three ways and sharing becomes nearly impossible. A standard size lets 30 tabs share one pool, just as memory paging uses a standard 4KB page rather than sizing pages dynamically per file.
Uniform size → simpler algorithm. TileManager's priority calc uses grid-coord distance fields — different grid pitches per layer would force normalisation before comparison. Uniform 256 makes "row N, col M" equivalent across all layers; priority compares directly.
So "per-device" 256 vs 512 is fine (same device → all layers use the same size → pool stays uniform); but "per-layer" dynamic isn't — it would gut the entire cache-reuse mechanism of ResourcePool. The same trade-off as an OS picking 4KB vs 2MB Huge Pages: the reuse dividend of one standard far outweighs the savings of "fitting each case".
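The reuse argument can be illustrated with a toy pool. This is a sketch under the assumption that every slot has the same size, so any idle slot satisfies any request; UniformPoolSketch and TextureSlot are hypothetical names, not the real cc::ResourcePool interface.

```cpp
#include <cassert>
#include <vector>

// Hypothetical texture slot; `size` is the uniform tile edge length.
struct TextureSlot { int id; int size; };

class UniformPoolSketch {
 public:
  explicit UniformPoolSketch(int tile_size) : tile_size_(tile_size) {}

  // Reuses an idle slot when one exists; otherwise "allocates" a new one.
  // Because all slots share one size, the lookup is a plain pop_back.
  TextureSlot Acquire() {
    if (!idle_.empty()) {
      TextureSlot slot = idle_.back();
      idle_.pop_back();
      ++reuse_count_;
      return slot;
    }
    ++alloc_count_;
    return TextureSlot{next_id_++, tile_size_};
  }

  // Returns a slot to the idle list instead of freeing it.
  void Release(TextureSlot slot) { idle_.push_back(slot); }

  int alloc_count() const { return alloc_count_; }
  int reuse_count() const { return reuse_count_; }

 private:
  int tile_size_;
  int next_id_ = 0;
  int alloc_count_ = 0;
  int reuse_count_ = 0;
  std::vector<TextureSlot> idle_;
};
```

With mixed sizes, Acquire would need a per-size bucket lookup and could miss even when idle slots exist, which is the fragmentation cost the text describes.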
Predictive raster · low first, high later
Chromium also does "coarse first, fine later". The first composite of a tile is rendered at lower resolution ("LOW resolution tiling"); the high-res version follows once the tile is bumped to NOW. The screen shows "something" immediately, but you'll see the content sharpen a frame or two later, which is common when loading a long page over 4G.
TileManager & ImageDecodeCache · sharing the TaskGraph
Tile tasks aren't islands — cc::ImageDecodeCache drops JPEG/PNG/WebP decode tasks into the same TaskGraph, consumed by the same Raster threads. Decode and rasterisation share one CPU pool. That's why image-heavy pages stutter more on low-end devices — Raster threads get monopolised by decoding.
cc::TileTask[] · prioritised, posted into the TaskGraph
DEMO C · LIVE · Tiling visualization
Watch TileManager queue up
Below is a virtual "long page" of 60 tiles. Scroll the viewport (or click auto-scroll) and watch each tile move through pending → rastering → active, exactly what the "Tile borders" tool in chrome://flags shows you.
Note: tiles outside the viewport never turn active. They are either waiting (pending) or being rasterized by Skia (rastering). If you scroll too fast and pending tiles can't catch up, you'll see checkerboarding. That is exactly why content-visibility: auto is so effective: it tells TileManager "this off-screen chunk doesn't even need to enter the pending queue", cutting up to 70% of Raster work.
DEMO G · LIVE · content-visibility: auto
The truth behind "one CSS line cuts 70% TTI"
Both columns render 800 items. The left is a normal render; the right adds content-visibility: auto to each item, so off-viewport content skips Layout/Paint/Raster. Click ▸ to measure the gap.
The gap varies with screen size and content complexity: typically 3-10× on desktop, and long lists with rich DOM often start at 20×. Pair it with contain-intrinsic-size so the browser can estimate heights and avoid scroll jumps. This is the most underrated frontend performance weapon of 2024-2025; go add one line to your longest list tonight.
Each tile plays back its slice of the DisplayItemList — Skia translates commands into GL/Vulkan instructions; tiles become GL Textures in VRAM. The avatar airing.png flows through ImageDecodeCache — it finished downloading way back at Loading thanks to the Preload Scanner, so here it just hits cache, no re-decoding.
Module: cc · Process: Render · Thread: Raster ×N · Output: Tile texture / bitmap
What it does
The Raster threads run cc::TileTasks one by one, "playing back" the DisplayItemList draw commands that fall inside each tile onto a texture (GPU SharedImage) or a bitmap (shared memory). This is the first of the 13 stages that actually paints pixels.
Why not skip
DrawQuads require "image resources" that already exist. Raster is the sole bridge from cc's "instructions" to "textures" the GPU can sample. Without Raster, the Display stage has nothing to sample.
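To make the playback idea concrete, here is a minimal sketch in which a display list is just an ordered vector of flat-colour fills. DisplayItemSketch and PlaybackTile are hypothetical names for illustration; real playback goes through Skia and handles far richer commands.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Rect { int x, y, w, h; };

// Hypothetical stand-in for one recorded paint command: fill `bounds`
// (in layer coordinates) with a flat colour.
struct DisplayItemSketch { Rect bounds; unsigned color; };

// Replays the whole list, in order, into the pixel buffer of one tile.
// Only the part of each item that overlaps the tile is written, and
// later items overdraw earlier ones.
std::vector<unsigned> PlaybackTile(const std::vector<DisplayItemSketch>& list,
                                   const Rect& tile) {
  std::vector<unsigned> pixels(static_cast<std::size_t>(tile.w) * tile.h, 0u);
  for (const DisplayItemSketch& item : list) {
    const int x0 = std::max(item.bounds.x, tile.x);
    const int y0 = std::max(item.bounds.y, tile.y);
    const int x1 = std::min(item.bounds.x + item.bounds.w, tile.x + tile.w);
    const int y1 = std::min(item.bounds.y + item.bounds.h, tile.y + tile.h);
    for (int y = y0; y < y1; ++y) {
      for (int x = x0; x < x1; ++x) {
        pixels[static_cast<std::size_t>(y - tile.y) * tile.w + (x - tile.x)] =
            item.color;
      }
    }
  }
  return pixels;
}
```

The same list can be passed to PlaybackTile once per tile rect, so each Raster thread produces its own buffer from shared, read-only instructions.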
Three tiles dispatch to 3 Raster threads in parallel, each calling Skia to replay the relevant slice of the DisplayItemList onto its own SharedImage. Simultaneously, ImageDecodeCache hands airing.png to a 4th Raster thread for PNG decoding; the result lands directly into yet another SharedImage:
Notice #r4 is separate from #r1: the avatar doesn't belong to any tile — it's a standalone resource. When Tile #t1 plays back the avatar's DisplayItem, it actually writes a reference (DrawImageRect(SharedImage_id=#r4, dst_rect=avatar)); the GPU samples #r4 only at final composite time. This is the heart of "resource independent + reference assembled" — the same avatar can be shared across many tiles and layers without duplication. If on this frame at 9ms #r4 hasn't finished decoding, Tile #t1's playback still proceeds — the avatar slot just paints transparent; once #r4 is ready next frame, the avatar appears. This is the physical source of "avatar appears grey first, then loads".
"Playback" is not a metaphor — given the Tile's DisplayItemList, the Raster thread re-executes every DisplayItem in order, outputting onto the target buffer. This is exactly why Paint doesn't paint and Raster does — the same instruction list can be played back at different scales, on different tiles, on different devices. It's the heart of Chromium's performance model.
Sync vs async rasterisation
Browsers run async tiled raster; mobile OSes and Flutter run sync raster. Each route has its strong suit. The boundary, side-by-side:
SYNCHRONOUS · Synchronous raster
Android / iOS / Flutter · indirect pixel buffer
Memory footprint: A+ · Cold-start TTI: B · Dynamic content: B · Layer animation: C · Low-end devices: C
ASYNCHRONOUS · TILED · Async tiled raster
Chromium / WebView · Raster thread
Memory footprint: D · Cold-start TTI: C · Dynamic content: C · Layer animation: A+ · Inertial scroll: A
In one line: browser-engine performance is mostly bought with memory. Async raster gives inertial scroll and CSS animations a decisive advantage, at the cost of massive memory use, white screens during fast scroll, and DOM updates that may desync mid-scroll.
The figure below runs both strategies live — left: sync freezes the screen at each "raster" moment (yellow RASTER flash), scroll proceeds in discrete steps. Right: async scrolls continuously, but the viewport edges show chequer placeholders (raster hasn't caught up); tiles fill in one by one. The thread strips below explain who is moving and who is idle.
SYNC · sync raster · serial · raster→composite→display
Thread lanes: Main · GPU · Display
Look: the Main thread is always rastering (full copper bars), and the screen only "jumps" at the end of each raster. 8 frames in 4 s is a visual rhythm of 2 fps. Every extra pixel of scroll re-rasters the entire screen.
Look: the Main thread idles throughout (grey strip), the Compositor and GPU run continuously (full bars), and Raster 1/2/3 process tiles in parallel. The chequer cells in the viewport are tiles that raster hasn't caught up to yet, but the screen moves every frame and never freezes. This is the cost of 60fps: three extra Raster threads plus a pile of SharedImage memory.
FIG 14·anim · The two raster strategies, live. Sync freezes the entire screen and scrolls in discrete steps; Async scrolls continuously and shows chequer placeholders. Easter egg: this very animation uses only transform and opacity, which makes it an instance of the "pure Compositor animation" it describes. While you read this caption, neither Main nor Raster runs a single frame of this animation.
Different hardware / accel capability / config map to different RasterBufferProvider subtypes. Their difference is really about "how the raster output reaches GPU memory" — fewer copies the better:
SharedImage is Chromium's abstraction over GPU data storage — it replaced the older Mailbox mechanism. The architecture is a classic Client / Service split:
FIG 14 · SharedImage: many Clients write directly to GPU memory, coordinated by the single Service in the GPU process. This is the substrate that lets textures travel from cc Raster to Viz with zero copies.
Got <img> on the page? JPEG / PNG / WebP decoding also happens here — cc::ImageDecodeCache orchestrates the Raster threads to decode asynchronously: decode tasks and tile tasks share the same TaskGraph.
Raster thread count is fixed (typically ≤ 4, tied to CPU cores). A large image decode takes 10–80ms; while it holds a thread, tile tasks queue behind it. Critical viewport content gets delayed — the root cause of "image-heavy pages scroll-jank + render-slow".
The difference between all RasterBufferProvider subclasses really hides in two virtuals — AcquireBufferForRaster() decides where the texture is borrowed from, RasterBuffer::Playback() decides how the DisplayItemList lands on it:
The string of "depends_on_*" booleans hides a core secret of cc. It asks: does this tile's output depend on any "not-yet-decoded image"? If yes, cc cannot reuse the previous tile's pixels for partial raster, and the whole tile re-queues. This is why "scrolling onto a WebP-heavy region often shows the chequer first": the precondition fails, and the tile gets re-scheduled from scratch.
The GPU raster path on Skia rides on DDL (Deferred Display List). Skia physically splits "building the drawing commands" from "submitting them to the GPU" into two threads: the Raster thread records the DDL (no GL context), the GPU thread replays it (owns the GL context). Four steps:
This is why Chromium's desktop rendering doesn't need to "funnel every GL call back to the main thread" — Raster threads only build command buffers, while the GPU context stays exclusive to the Viz process from beginning to end. Many tabs rasterising in parallel = dozens of Raster threads each writing their own DDL, queued for the single GPU thread to replay. "Record / Replay" is the real hinge of Chromium's multi-threaded rendering.
Legend: NOW · ready / NOW · raster pending / LOW-res placeholder / checker · not rasterised / viewport
RASTER THREADS · 6 ms (time axis: 0, 3 ms, 6 ms)
Raster 1: tile #t1, tile #t5, tile #t9
Raster 2: tile #t2, tile #t6
Raster 3: tile #t3, tile #t7
Raster 4: decode avatar.png, tile #t8
4 things to look at: ① the viewport (yellow border) must be all green, otherwise you get chequer; ② LOW-res tiles (diagonal hatching) outside the viewport are pre-rendered and gradually upgraded to NOW; ③ the 4 Raster lanes on the right run fully in parallel, and on Raster 4 image decode shares the TaskGraph with tile raster (the C14 footgun); ④ dense chequer means TileManager mis-ordered priorities or raster can't keep up; use cc.debug.scheduling to find the bottleneck.
pending → active → recycle, the buffering you didn't see
Main-line · The Card after this stage
PENDING → ACTIVE
All of the card's tiles are ready; the Pending Tree atomically swaps into Active. The old Active retires to the Recycle Pool for reuse. From this moment, the new card officially takes the will-be-composited stage, but it isn't on the screen yet.
Module: cc · Process: Render · Thread: Compositor · Output: Active LayerImpl Tree
What it does
Promote the now-rasterised Pending Tree into a draw-ready Active Tree. The switch is atomic on the Compositor thread: the screen always sees a valid frame, before and after.
Why not skip
Without Activate, raster and display would serialise: either wait for every tile and then display (stalls), or display while painting (tearing). The middle layer of the three-tree design is what unties this knot.
All 3 tiles successfully rastered — IsReadyToActivate() returns true. The Compositor thread swaps the active_tree_ and pending_tree_ pointers: the old pending (freshly rastered) becomes active; the old active (last frame) becomes recycle. From this instant, Draw emits quads from the new active.
Counter-example: if avatar #r4 hasn't decoded yet, #t1 is a "partial raster with placeholder"; IsReadyToActivate() may still return true (partial raster counts as ready). But num_missing_tiles > 0, and this number rides the CompositorFrame metadata back to Viz — Viz knows another frame is likely needed. "Activated ≠ complete" is a key fact about this stage — it only guarantees "drawable", never "complete".
Three trees · each with one job
The Compositor thread holds three LayerImpl trees at the same time:
class LayerTreeHostImpl {
  // Tree currently being drawn.
  std::unique_ptr<LayerTreeImpl> active_tree_;

  // In impl-side painting mode, tree with possibly
  // incomplete rasterized content.
  // May be promoted to active by ActivateSyncTree().
  std::unique_ptr<LayerTreeImpl> pending_tree_;

  // Inert tree with layers that can be recycled
  // by the next sync from the main thread.
  std::unique_ptr<LayerTreeImpl> recycle_tree_;
};
Two trees are not enough: with just Pending + Active, every Commit would have to wait for Active to be finished before it could be recycled, so the Main thread would wait out a Compositor draw cycle and could not commit continuously.
Activation has a precondition: IsReadyToActivate() must return true — meaning every viewport tile on the Pending tree is rasterised. If a far-away tile or an image decode hasn't finished, activation is deferred until NotifyReadyToActivate fires. That's the physical source of "checkerboard tiles during fast scroll" — Pending isn't ready, so Active is still last frame.
Why three LayerImpl trees, not two? · three concurrent phases need three independent slots
Two trees (Pending + Active) seem enough, but in practice every frame would stall briefly. The real reason for three is that "preparing the next frame ‖ displaying the current frame ‖ recycling the previous frame" all coexist in time:
// Timeline · the three trees, time-multiplexed
t = 0ms   Active  ─▶ presenting frame N   // GPU is reading it
          Pending ─▶ receiving the Commit for frame N+1
          Recycle ─▶ idle (reserved for N+2)
t = 4ms   Pending ─▶ Raster finished, ready to Activate
          Active  ─▶ still presenting frame N (GPU not done)
          Recycle ─▶ idle
t = 8ms   vsync · ActivateSyncTree() swaps the pointers
          old Active  ──▶ Recycle (waits for the next Commit to reuse it)
          old Pending ─▶ Active (starts presenting frame N+1)
          new Pending ◀─ the Main thread starts committing frame N+2
Three trees, three roles:
Active · the current frame; the GPU is sampling its tile textures. This tree is read-only: mutate it and the GPU reads torn data.
Pending · the next frame's worksite, Raster threads pour new tiles in, Main's Commit also writes here. Must be independent of Active, or the rule above is violated.
Recycle · the "resting position" for the just-retired Active tree. When the next Commit arrives, simply rename Recycle to Pending (pointer swap) and reuse its memory slot. Without Recycle, every Commit would allocate a fresh LayerImpl tree — GC + fragmentation cost would be brutal.
Only 2 trees (Active + Pending): when Activate promotes Pending to Active, the old Active is immediately discarded — next Commit must build a fresh one. One alloc + one free per frame, 60 times/sec at 60Hz. Each LayerImpl carries Tile references + property-tree copies; allocating multi-MB object trees that often is expensive.
So "three" is forced by the pipeline's three concurrent states: "being drawn" / "being built" / "waiting to be reused". The same pattern as OS process scheduling (running / ready / blocked), and database WAL (active / clean / recycled). Any "consume + produce + recycle" system's minimum viable config is three slots.
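The rotation can be modelled in a few lines. This sketch follows the article's description of the pointer moves and is not the actual Chromium implementation; TreeSketch and ThreeTreeSketch are hypothetical names, and the real ActivateSyncTree() does considerably more work.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for LayerTreeImpl; it only remembers which
// frame it holds.
struct TreeSketch { int frame = -1; };

class ThreeTreeSketch {
 public:
  // Commit: fill the pending slot, reusing the recycled tree if any.
  void Commit(int frame) {
    if (!pending_) {
      if (recycle_) {
        pending_ = std::move(recycle_);  // reuse, no allocation
      } else {
        pending_ = std::make_unique<TreeSketch>();
        ++allocations_;
      }
    }
    pending_->frame = frame;
  }

  // Activate: pending becomes active, old active retires to recycle.
  // Only pointers move; no tree is copied or freed.
  void Activate() {
    if (!pending_) return;
    recycle_ = std::move(active_);
    active_ = std::move(pending_);
  }

  int active_frame() const { return active_ ? active_->frame : -1; }
  int allocations() const { return allocations_; }

 private:
  std::unique_ptr<TreeSketch> active_;
  std::unique_ptr<TreeSketch> pending_;
  std::unique_ptr<TreeSketch> recycle_;
  int allocations_ = 0;
};
```

After the first two frames warm the slots up, a Commit followed by an Activate performs no further allocations in this model, which is the steady-state saving the text attributes to the Recycle slot.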
DEVTOOLS
Performance > "Activate Layer Tree" event; Layers panel for pending vs active
3 watch points: ① the Active pointer flips at every vsync, which is a healthy rhythm (60fps); ② Pending has to finish its build ~12ms early to keep up; when it can't (frame N+4 here), Activate is delayed, Active stays on the old frame, and the user perceives "this frame didn't move"; ③ Recycle is always full: it always holds the just-retired Active, ready for the next Commit to reuse. This is the source of "no per-frame alloc/free of LayerImpl".
Main layer emits 2 TileDrawQuads (one per tile); .follow emits 1 TileDrawQuad; the button shadow triggers a separate RenderPass (needs offscreen blur), referenced via RenderPassDrawQuad. The whole CompositorFrame ships to the viz process via IPC — the card is a complete script, waiting for the director to call "rolling".
Module: cc · Process: Render · Thread: Compositor · Output: viz::CompositorFrame
What it does
Walk the Active tree's LayerImpls, call AppendQuads on each to produce a batch of viz::DrawQuads, wrap them in a viz::CompositorFrame and ship it to the Viz process. The GPU is not touched here; what this stage produces is an "instruction script".
Why not skip
A Render process cannot draw to the screen directly: the OS hands the GPU context to the GPU/Viz process. So Render must serialise "what to draw" into a CF that Viz executes. This is the price of multi-process isolation.
On the Active tree, AppendQuads runs: the main PictureLayerImpl emits 2 TileDrawQuads (one per tile); the standalone .follow layer emits 1. Because box-shadow flipped render_surface_reason_ on in Pre-paint, the shadow becomes its own RenderPass. The final viz::CompositorFrame:
Four meaningful details: ① the shadow gets its own RenderPass plus the temporary texture #r5; this is box-shadow's real GPU cost, and the larger the blur radius, the bigger the temp texture; ② the two main-layer TileDrawQuads share one SharedQuadState (same transform/clip/opacity), so Viz computes the matrix once; ③ the avatar doesn't appear in the quad list: its reference is baked into #r1's tile texture (Raster painted it in), so to Viz #r1 is just one solid 256×88 chunk of pixels; ④ the whole CF travels via LayerTreeFrameSink::SubmitCompositorFrame over Mojo IPC to the Viz process, and at that instant the Render process is done with this frame.
The hairiest one is PictureLayerImpl::AppendQuads — it walks all "currently visible tiles" of the layer and emits Quads based on each tile's state: rasterised → TileDrawQuad; in-viewport-but-unrasterised → SolidColorDrawQuad placeholder (background colour + chequer); missing but low-res available → fall back to the low-res tier. The skeleton:
Two details form the contract behind smooth scrolling: ① a shared SharedQuadState: the few hundred TileDrawQuads from a single layer share one transform / clip / opacity, so Viz computes the matrix once; ② num_missing_tiles bubbles into the frame metadata: Viz reads it to know "how chequered is this frame?" and, if needed, can defer Activate until the next Raster cycle catches up. That's the actual shape of the "feedback loop" between cc and viz.
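The per-tile decision and the two details above can be sketched as follows. All type and function names here are hypothetical, and the choice to count only placeholder quads toward num_missing_tiles is an assumption of this sketch, not a statement about the real PictureLayerImpl::AppendQuads.

```cpp
#include <cassert>
#include <vector>

enum class TileState { kRasterised, kLowResOnly, kMissing };
enum class QuadKind { kTile, kLowResTile, kSolidColorPlaceholder };

// One per layer; every quad of the layer points at the same instance.
struct SharedStateSketch { float opacity; };

struct QuadSketch {
  QuadKind kind;
  const SharedStateSketch* shared;
};

struct AppendResultSketch {
  std::vector<QuadSketch> quads;
  int num_missing_tiles = 0;  // would travel in the frame metadata
};

// Emits one quad per visible tile, picking the quad kind from the
// tile's raster state.
AppendResultSketch AppendQuadsSketch(const std::vector<TileState>& visible_tiles,
                                     const SharedStateSketch& shared) {
  AppendResultSketch out;
  for (TileState state : visible_tiles) {
    switch (state) {
      case TileState::kRasterised:
        out.quads.push_back({QuadKind::kTile, &shared});
        break;
      case TileState::kLowResOnly:
        out.quads.push_back({QuadKind::kLowResTile, &shared});
        break;
      case TileState::kMissing:
        out.quads.push_back({QuadKind::kSolidColorPlaceholder, &shared});
        ++out.num_missing_tiles;  // assumption: only placeholders count
        break;
    }
  }
  return out;
}
```

The caller must keep the SharedStateSketch alive for as long as the returned quads are used, since the quads store a pointer to it rather than a copy.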
By default, all DrawQuads land in the same RenderPass — "draw onto the root surface". But when a layer has an effect that requires off-screen compositing (e.g. filter: blur(8px), mix-blend-mode, mask-image), cc creates a dedicated RenderPass: render the subtree to a temporary texture, then reference it back into the main pass as a RenderPassDrawQuad.
backdrop-filter reads the background pixels and blurs them. cc spins up a RenderPass: render every layer it covers to a temporary texture, run the blur shader, then composite back into the main pass. Every frame pays for an off-screen pass + a GPU blur — double cost in VRAM and bandwidth.
Six flavours of DrawQuad
TileDrawQuad
The default: one tile. DisplayItemLists become these after cc rasterises them.
TextureDrawQuad
References a GPU resource. Canvas / WebGL / Video all take this path.
SolidColorDrawQuad
A solid-coloured rectangle. The cheapest quad on the menu.
RenderPassDrawQuad
References another RenderPass by ID, for nested effects.
SurfaceDrawQuad
Embeds a Surface from another process. The linchpin of OOPIF and OffscreenCanvas.
PictureDrawQuad
Carries a DisplayItemList directly. Only Android WebView uses this today.
LayerTreeFrameSink · the parcel office
Once the CF is packed, cc calls LayerTreeFrameSink::SubmitCompositorFrame(local_surface_id, frame, hit_test_data) and ships it to Viz over Mojo IPC. The Render process's rendering work ends here; everything that follows belongs to Viz.
WHY "SUBMIT", NOT "DRAW"
cc emphasises "submit" here: it doesn't paint pixels, it ships "what to paint" to Viz. If Viz is busy (GPU saturated, vsync missed), the CF is queued or even dropped. A successful Submit ≠ on-screen. The Render process learns whether its last submission reached the screen via BeginFrameAck.
Submit vs Draw: what's the precise difference in Chromium-speak? · "placing the order" vs "cooking it"
In Chromium's vocabulary, these two words are never interchangeable, but they look so similar they're easy to confuse. The simplest analogy: Submit is "placing the order", Draw is "cooking it".
Who: Submit = cc (Render Process · Compositor thread); Draw = Viz (GPU Process · GPU thread)
Doing what: Submit = ships the CompositorFrame via Mojo IPC; Draw = actually calls GL/Vulkan and paints pixels to the framebuffer
"Submit success" only tells you "cc packaged its work". Web Vitals' LCP/CLS cannot use Submit time — they must use actual on-screen time. That's exactly why Chrome internals have FrameMetrics, which after the Display stage uses BeginFrameAck to feed "did this frame reach the screen?" back to the Render process.
The "queue period" between Submit and Draw is Viz's load buffer. The GPU is occasionally busy (other tabs running heavy animations); Viz can queue 2-3 frames of CF. Once it exceeds 3, old ones drop (kSkipped) — cc, via BeginFrameAck, learns "my last frame was wasted work" and decides whether to degrade the next (lower resolution, skip animation frame). "Submit is push, Draw is pull, queue in between" is the essence of this architecture.
Analogy: very much like Git's git push vs git merge — push only "uploads commits to the remote" (which may reject); merge is "actually integrating into trunk". Chromium pushes this "submit ≠ land" semantic to the limit, and any perf monitor that conflates Submit with Draw timestamps gives wrong numbers.
viz stitches our CompositorFrame, the OOPIF CompositorFrame, and the Browser UI CompositorFrame into a single surface at the Aggregate stage. In the embed variant, the parent references our frame via SurfaceDrawQuad — the card is now part of someone else's page, but the composition cost is virtually unchanged.
Module: viz · Process: GPU (hosts viz) · Thread: Skia / Display Compositor · Output: Aggregated CF
What it does
Viz takes every live CF (from the Browser UI, every Render process, every OOPIF, every OffscreenCanvas) and flattens them, following SurfaceId references, into a single Aggregated CF.
Why not skip
There's only one screen. Multiple processes produce multiple CFs, and someone has to decide their stacking and clipping. Aggregate is where Site Isolation meets smooth rendering.
The variant for this stage: drop the card into an imaginary "third-party praise wall" page. The card is rendered by Render B (ursb.me origin), the parent page by Render A. Both processes submit their CFs to Viz; SurfaceAggregator flattens them into one:
3 things to marvel at: ① Render A can never see the card's real pixels: it only holds a SurfaceId, while the actual #r1~#r5 live in the Viz process. This is Site Isolation's graphics-side implementation; the cross-origin iframe security boundary is carved here; ② transform matrices multiply across the boundary: the parent's T_card applied to the card's Surface, times the card's own internal T_inside, equals "the card painted directly into the parent's coordinate system"; ③ clip intersection can make the whole card render in vain: if the parent clips the card to 0×0 (e.g. scrolled out of view under overflow:hidden), Viz skips every quad of the card and the GPU does nothing, but Render B's cc keeps silently rastering in the background. This is "invisible OOPIFs cost CPU but not GPU".
Aggregate walks depth-first: start from the root surface (typically Browser UI), every SurfaceDrawQuad hit triggers a jump to the referenced Surface, copy its RenderPasses and DrawQuads (with proper transform + clip), then continue.
FIG 17 · SurfaceAggregator merges CFs scattered across processes into a single frame. OOPIFs stay isolated yet seamless because of this stage.
These IDs are the core of cross-process references: the embedding page knows the OOPIF's SurfaceId but cannot reach its actual GPU textures; the dereference happens only inside Viz during aggregation.
Damage tracking · not every frame is fully aggregated
Aggregate has built-in diffing. Each frame, SurfaceAggregator computes damage_rect — the actual area changed since the previous frame. The GPU only redraws that area; the rest is reused from last frame's Front Buffer. A static page + one spinning badge can mean only a few hundred pixels of GPU work per frame.
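The damage idea can be sketched in a few lines. This is a simplified illustration, not Chromium's code: `Rect`, `unionRect`, `frameDamage` and `area` are hypothetical names, and the sketch assumes damage is tracked as one bounding rectangle per frame.

```typescript
// Hypothetical model: each surface reports the rect it changed this frame,
// and the aggregator keeps one bounding rect as the frame's damage.
type Rect = { x: number; y: number; w: number; h: number };

function unionRect(a: Rect, b: Rect): Rect {
  const x = Math.min(a.x, b.x);
  const y = Math.min(a.y, b.y);
  const right = Math.max(a.x + a.w, b.x + b.w);
  const bottom = Math.max(a.y + a.h, b.y + b.h);
  return { x, y, w: right - x, h: bottom - y };
}

// Returns null when nothing changed, so the previous frame can be reused as-is.
function frameDamage(perSurfaceDamage: Rect[]): Rect | null {
  if (perSurfaceDamage.length === 0) return null;
  return perSurfaceDamage.reduce(unionRect);
}

function area(r: Rect | null): number {
  return r ? r.w * r.h : 0;
}

// A static 1920×1080 page with one spinning 20×20 badge:
const badgeOnly = frameDamage([{ x: 100, y: 100, w: 20, h: 20 }]);
console.log(area(badgeOnly), "of", 1920 * 1080, "pixels need redrawing"); // 400 of 2073600
```

A single bounding rect is cheap to compute and to pass around, at the price of over-drawing when two small changes sit far apart.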
CASE · OOPIF
Why cross-origin iframes still composite seamlessly
The parent page's CF (from Render A) contains a SurfaceDrawQuad that references the SurfaceId of the OOPIF in Render B. Both processes submit CFs to Viz independently; SurfaceAggregator stitches them together inside Viz using a transform matrix and a clip rect. The parent never sees the OOPIF's pixels, yet the screen looks seamless. This is the graphics side of Site Isolation.
HandleSurfaceQuad · expanding the cross-process pointer
The flatten algorithm's key hook is SurfaceAggregator::HandleSurfaceQuad. Each DFS visit to a SurfaceDrawQuad expands the referenced surface's whole subtree inline into the main RenderPass, threading the "child coordinate system → parent" transforms and clips through the whole way:
Two pieces of math deliver the "perfectly stitched" look:
① Transform multiplication: the parent frame's transform × the child RenderPass's transform = the child quad's final on-screen pose.
② Clip-rect intersection: the parent's clip ∩ the child surface's clip = the region that is actually visible. If the intersection is empty, none of the child surface's quads are copied into the aggregated frame, so the GPU issues no draw for it. (The child's own renderer may still have rastered its tiles.) That's why "invisible OOPIFs cost no GPU draw time".
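The two rules above can be sketched as a small recursive flatten. This is a toy model under loud assumptions: transforms are reduced to 2D translations (`Offset`) where the real code uses full matrices, all type and function names (`Quad`, `Surface`, `aggregate`) are hypothetical, and the surface graph is assumed to contain no cycles.

```typescript
type Rect = { x: number; y: number; w: number; h: number };
type Offset = { dx: number; dy: number }; // translation-only stand-in for a transform matrix

type Quad =
  | { kind: "tile"; rect: Rect }
  | { kind: "surface"; surfaceId: string; offset: Offset; clip: Rect };
type Surface = { quads: Quad[] };

function intersect(a: Rect, b: Rect): Rect | null {
  const x = Math.max(a.x, b.x);
  const y = Math.max(a.y, b.y);
  const right = Math.min(a.x + a.w, b.x + b.w);
  const bottom = Math.min(a.y + a.h, b.y + b.h);
  return right > x && bottom > y ? { x, y, w: right - x, h: bottom - y } : null;
}

function translate(r: Rect, o: Offset): Rect {
  return { x: r.x + o.dx, y: r.y + o.dy, w: r.w, h: r.h };
}

// Depth-first flatten: returns the visible rects of all tile quads in root coordinates.
function aggregate(
  surfaces: Map<string, Surface>,
  id: string,
  toRoot: Offset,
  clip: Rect,
  out: Rect[] = [],
): Rect[] {
  const surface = surfaces.get(id);
  if (!surface) return out; // unknown surface: nothing to draw
  for (const quad of surface.quads) {
    if (quad.kind === "tile") {
      const visible = intersect(translate(quad.rect, toRoot), clip);
      if (visible) out.push(visible);
    } else {
      // Rule ①: compose child→parent with parent→root.
      const childToRoot = { dx: toRoot.dx + quad.offset.dx, dy: toRoot.dy + quad.offset.dy };
      // Rule ②: intersect the clips; an empty result prunes the whole subtree.
      const childClip = intersect(translate(quad.clip, toRoot), clip);
      if (childClip) aggregate(surfaces, quad.surfaceId, childToRoot, childClip, out);
    }
  }
  return out;
}

const demoSurfaces = new Map<string, Surface>([
  ["parent", { quads: [
    { kind: "tile", rect: { x: 0, y: 0, w: 100, h: 100 } },
    { kind: "surface", surfaceId: "card", offset: { dx: 10, dy: 20 }, clip: { x: 10, y: 20, w: 50, h: 50 } },
  ] }],
  ["card", { quads: [{ kind: "tile", rect: { x: 0, y: 0, w: 80, h: 80 } }] }],
]);
console.log(aggregate(demoSurfaces, "parent", { dx: 0, dy: 0 }, { x: 0, y: 0, w: 100, h: 100 }));
```

In the demo, the card's 80×80 tile comes out clipped to the 50×50 window the parent granted it; shrinking that clip to 0×0 would drop the card from the output without visiting its quads.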
Why does SurfaceAggregator use DFS, not BFS? · rendering order IS depth order
"Wouldn't BFS be faster? Shallower levels, cache-friendly." — a fair question. But Aggregate is a step in a rendering pipeline, and DFS is forced by three hard constraints:
z-order is depth-order, not breadth-order. "Child surface paints on top of parent surface" is the HTML/CSS stacking rule — picture a surface tree with Browser UI at the root and the deepest OOPIF at the leaves. The correct paint order: start from the root, draw self, then immediately recurse into the first child's entire subtree, then into the second child's, etc. — that's exactly DFS pre-order. Under BFS you'd paint all level-1 surfaces, then all level-2, but two level-N surfaces have no relation to each other and get interleaved across subtrees — z-order breaks.
RenderPass dependency = child before parent. In SurfaceAggregator's output RenderPass list, when a parent references a child Pass via RenderPassDrawQuad, the child Pass must appear earlier in the list (GPU executes in list order). DFS post-order naturally produces a "leaves first, root last" list — a topological sort. BFS gives no such guarantee — you'd need a separate topo-sort pass, doubling the cost.
Earliest possible pruning. When DFS recurses into a child surface, it can immediately compute "parent transform × child transform" and "parent clip ∩ child clip" — if the intersection is empty, the entire subtree is skipped, not even one quad is copied. BFS finishes level 1 before level 2 — either compute every transform/clip up front (waste), or discover the empty clip late (waste). DFS's "back off on empty clip" is natural pruning.
Bottom line: DFS naturally fits SurfaceAggregator on three axes — z-order = depth-first, RenderPass dependency = topological order, clip pruning = early backoff. BFS looks friendlier but each property costs extra code. "Pick the right traversal and half the algorithm is free" is a common phenomenon in graph engineering.
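The "child before parent" property of DFS post-order is easy to check on a toy tree. The `PassNode` type and `postOrder` function below are illustrative names only, not Chromium classes.

```typescript
// Hypothetical pass tree: a parent pass references its children
// (the role RenderPassDrawQuad plays in the text above).
type PassNode = { id: string; children: PassNode[] };

// DFS post-order: emit every child subtree first, then the node itself.
function postOrder(node: PassNode, out: string[] = []): string[] {
  for (const child of node.children) postOrder(child, out);
  out.push(node.id);
  return out;
}

const passTree: PassNode = {
  id: "root",
  children: [
    { id: "a", children: [{ id: "a1", children: [] }] },
    { id: "b", children: [] },
  ],
};

console.log(postOrder(passTree)); // [ 'a1', 'a', 'b', 'root' ]
```

Every pass lands in the list after all the passes it depends on, so the list is already a valid execution order and no separate topological sort is needed.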
DEVTOOLS
chrome://compositor-thread-rendering-stats · Performance > "Frame submitted to display"
SkiaRenderer translates DrawQuads into GL/Vulkan/Metal commands; SwapBuffers hits the screen — a pixel inside the avatar physically emits its first photon. On a 120 Hz display, every 8.3 ms. On hover: transform stays on Compositor and never disturbs Main, but background drags Main back in — real code rarely produces 100% Compositor-pure animations.
Module: viz
Process: GPU (hosts viz)
Thread: Skia / GPU main
Output: on-screen pixels
What it does
Translate the Aggregated CF's DrawQuads into real GPU calls and paint them into the Back Buffer; on vsync, Display::DrawAndSwap swaps Back and Front, and the new frame appears on screen.
Why it can't be skipped
This is the only one of the 13 stages that actually drives the GPU. The previous 12 stages all "prepare" (classify, organise, serialise, schedule); Display is the moment the instructions actually execute.
After all 12 stages, the Aggregated CF lands in the SkiaRenderer in Viz. SkiaRenderer records every quad into one SkDeferredDisplayList, hands it to the GPU thread → SkSurface::draw replays it → OutputSurface::SwapBuffers() → the card lights up before your eyes.
Try this now: hover over the Follow button on the real Airing card at the top of this article (the interactive one in the Main-line example chapter). The entire execution path of the hover animation:
This is what 13 stages and 20 years of engineering bought you — a single transform animation, from input to on-screen, crosses 3 processes and 5 thread segments, yet every segment only moves the bare minimum it must. One Transform-tree node mutates; one transform matrix on .follow LayerImpl mutates; one 53×32 tile re-rasters; one TileDrawQuad re-emits; the GPU repaints 1696 pixels; SwapBuffers. "What's not recomputed — that is performance itself" — this is the true meaning of C5's epigraph.
CAVEAT · The same hover also mutates background. That path goes through Paint, so the Main thread does wake up. A "pure Compositor animation" is therefore rarely 100% pure in real code. The card uses a hybrid animation, which is what real product code looks like. To stay 100% on the Compositor, mutate only transform / opacity / filter.
SkiaRenderer's core is the SkDeferredDisplayListRecorder (DDL) — it doesn't paint immediately, but records every RenderPass's draw operations into a DDL. When all RenderPasses are recorded, SkiaOutputSurfaceImpl::SubmitPaint ships the whole batch to SkiaOutputSurfaceImplOnGpu for one execution on the GPU thread.
FIG 18 · SkiaRenderer's deferred-draw flow: DrawQuads are recorded into a DDL on the Compositor thread, then the GPU thread runs SkSurface::draw in one shot.
GLRenderer (deprecated) tunnels through the CommandBuffer: GL calls on the Compositor thread don't really execute — they're serialised into a command byte stream, posted via InProcessCommandBuffer to the GPU process's CrGpuMain thread, where the real OpenGL ES happens. The split decouples GL caller ↔ real driver — and is what makes the security sandbox possible.
Every modern graphics stack double-buffers. The Front Buffer is what the screen reads; the Back Buffer is where you paint. At vsync, Display::DrawAndSwap swaps the pointers and the new frame is on display at the next refresh, so the screen never sees a half-painted frame.
FIG 18.V · Front Buffer (A) and Back Buffer (B) swap at every vsync tick ("▼"). The screen always reads from Front, so it never flickers.
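The pointer swap can be modelled in a few lines. `SwapChain` and its methods are hypothetical names for illustration; a real swap chain holds GPU buffers, not values.

```typescript
// Minimal double-buffer model: painting only ever touches `back`,
// scan-out only ever reads `front`, and vsync exchanges the two references.
class SwapChain<Frame> {
  private front: Frame;
  private back: Frame;

  constructor(initialFront: Frame, initialBack: Frame) {
    this.front = initialFront;
    this.back = initialBack;
  }

  // The renderer paints into the back buffer while the screen keeps reading front.
  paint(frame: Frame): void {
    this.back = frame;
  }

  // What the display reads on this refresh.
  scanout(): Frame {
    return this.front;
  }

  // At vsync, only the references are exchanged; no pixels are copied.
  swapOnVsync(): void {
    [this.front, this.back] = [this.back, this.front];
  }
}

const chain = new SwapChain<string>("frame 0", "blank");
chain.paint("frame 1");
console.log(chain.scanout()); // frame 0 (the half-finished frame stays hidden)
chain.swapOnVsync();
console.log(chain.scanout()); // frame 1
```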
Triple buffer · trading a touch of latency for smoothness
Mobile OSes (Android's SurfaceFlinger, iOS) default to triple buffering: while the GPU is still painting frame N and the screen is still displaying N-1, the CPU can already prepare frame N+1. The cost is one extra frame of input latency; the reward is far less stutter, because no party has to wait for another. Chromium employs a similar strategy on most desktop and mobile platforms.
"Why not just translate quads to glDrawArrays directly?" That was GLRenderer's path (now deprecated). SkiaRenderer chose DDL (SkDeferredDisplayList) for three reasons:
GL contexts can't be shared across threads at the same moment. OpenGL requires that a context be current on at most one thread at a time (make_current semantics). If SkiaRenderer called GL directly on the Compositor thread, the context would have to be made current on that thread, but the Compositor also handles input, scroll and animation, so the context's exclusivity would become a serial bottleneck. DDL decouples "building commands" from "submitting commands": the Compositor thread builds the DDL (no GL context, pure memory operations), while the GPU thread exclusively owns the GL context and replays the DDL. The two threads run truly in parallel.
Batched submission = minimum state changes. GL "state changes" (swap shader, swap texture, swap blend mode) are extremely slow on GPUs. Quad-by-quad GL means a state switch per quad — most of the GPU's time goes to swapping state, not painting pixels. Skia, when recording the DDL, reorders commands to batch same-state draws together (like a database batching queries) — setShader once, draw 100 quads, then switch state. 3-5× faster than quad-by-quad GL in practice.
Pluggable backends = Skia abstraction dividend. The same DDL can feed Skia's GL backend, Vulkan backend, Metal backend (macOS), Dawn (WebGPU), even a software backend. GLRenderer hard-coded GL; supporting Vulkan, Metal or Graphite there would have required a full renderer rewrite. SkiaRenderer covers all of them in one codebase, and switching backends is just swapping the SkSurface implementation. This is why Chrome 122+'s "SkiaGraphite" experiment (moving Skia rendering to a modern Dawn-based backend) only touches SkiaRenderer, not cc: the dividend of a clean architecture.
In essence, DDL is Skia's intermediate representation for the "multi-threaded graphics API era" — same kind of thing as LLVM IR for compilers, or Mojo IDL for IPC. "Record + replay" is the path of any system that needs to decouple "construction" from "execution". Chromium uses it at the very end of the rendering pipeline, locking the GL context's serial bottleneck inside one GPU-process thread once and for all.
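A toy record-and-replay list shows both the decoupling and the batching win. Everything here is hypothetical (`DrawOp`, `DisplayListRecorder`, `Backend`, `replay`); it is not Skia's API. The sketch also assumes the recorded quads do not overlap, since reordering overlapping draws would change what ends up on top.

```typescript
type DrawOp = { shader: string; quad: number };

// Recording is pure memory work: no graphics context is touched here.
class DisplayListRecorder {
  private ops: DrawOp[] = [];

  record(op: DrawOp): void {
    this.ops.push(op);
  }

  // Freeze the recording, grouping equal-state draws next to each other.
  // Array.prototype.sort is stable, so quads keep their order within a group.
  finish(): DrawOp[] {
    return [...this.ops].sort((a, b) => (a.shader < b.shader ? -1 : a.shader > b.shader ? 1 : 0));
  }
}

interface Backend {
  bindShader(name: string): void;
  drawQuad(quad: number): void;
}

// Replay happens later, on whichever thread owns the backend.
// Returns the number of state changes that were needed.
function replay(list: DrawOp[], backend: Backend): number {
  let bound: string | null = null;
  let stateChanges = 0;
  for (const op of list) {
    if (op.shader !== bound) {
      backend.bindShader(op.shader);
      bound = op.shader;
      stateChanges++;
    }
    backend.drawQuad(op.quad);
  }
  return stateChanges;
}

const recorder = new DisplayListRecorder();
["A", "B", "A", "B", "A"].forEach((shader, quad) => recorder.record({ shader, quad }));

const log: string[] = [];
const loggingBackend: Backend = {
  bindShader: (name) => { log.push(`bind ${name}`); },
  drawQuad: (quad) => { log.push(`draw ${quad}`); },
};
console.log(replay(recorder.finish(), loggingBackend)); // 2 state changes instead of 5
```

Because `Backend` is an interface, the same recorded list could be replayed against a different implementation, which mirrors the pluggable-backend point above.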
DEVTOOLS
chrome://gpu · Performance > GPU lane · Rendering > FPS meter
Bytes complete the thirteen steps,
and Chromium hands you the entire 16.7 ms of magic —
what you saw was one frame.
Field Note · 02
DEMO O · LIVE · rAF vs setTimeout frame timing
The real cost of "I'll just use setTimeout(fn, 16) as a rAF"
Runs both setTimeout(fn, 16) and requestAnimationFrame for 3 seconds as animation drivers, recording the actual interval between callbacks. Watch the std-dev, the average drift, and the two waveforms below: setTimeout always drifts away from vsync, with a std-dev several times worse than rAF.
Counterintuitive points worth calling out: (1) setTimeout(fn, 16) treats 16 ms as a minimum delay, not "precisely 16 ms". Browser scheduling adds 1-5 ms of clamping plus main-thread jitter, so the average often lands at 17-22 ms. (2) requestAnimationFrame is bound to the vsync signal: stable at 16.7 ms on 60 Hz and 8.3 ms on 120 Hz. It is literally the promise "call me before the next vsync". (3) Run this on a 120 Hz screen and setTimeout still ticks every 16-22 ms while rAF drops to 8.3 ms. That's why a hard-coded 16 ms assumption breaks the moment you hit a high-refresh display.
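The measurement behind a demo like this is small. The helper below is an assumed reconstruction, not the demo's actual source: `intervalStats` is a hypothetical name, and it expects callback timestamps in milliseconds (for example, values collected from performance.now() inside each callback).

```typescript
// Mean and population standard deviation of the gaps between callback timestamps.
function intervalStats(timestampsMs: number[]): { mean: number; stdDev: number } {
  if (timestampsMs.length < 2) {
    throw new Error("need at least two timestamps to form an interval");
  }
  const intervals: number[] = [];
  for (let i = 1; i < timestampsMs.length; i++) {
    intervals.push(timestampsMs[i] - timestampsMs[i - 1]);
  }
  const mean = intervals.reduce((sum, v) => sum + v, 0) / intervals.length;
  const variance = intervals.reduce((sum, v) => sum + (v - mean) ** 2, 0) / intervals.length;
  return { mean, stdDev: Math.sqrt(variance) };
}

// Perfectly regular callbacks have zero spread:
console.log(intervalStats([0, 16, 32, 48])); // { mean: 16, stdDev: 0 }
// Irregular callbacks show up directly in stdDev:
console.log(intervalStats([0, 10, 30])); // { mean: 15, stdDev: 5 }
```

Feeding it one array of timestamps from a setTimeout loop and one from a requestAnimationFrame loop gives the two std-dev figures the demo compares.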
The previous 18 chapters sliced the pipeline open, one chapter per stage. But in real life all 13 stages are running at the same time — some have hard serial constraints (must wait), many can run in parallel (everybody works at once). The figure below puts the card's one-frame composition back onto the time axis: who moves, and when.
FIG 19 · The card's real timeline within 16.7 ms. Three things worth remembering:
① The Main thread spends only ~6 ms on rendering; the remaining ~10 ms is idle, which leaves room for JS / microtasks / rAF.
② Three Raster threads run in parallel and overlap with the Compositor's Tiling/Activate phase.
③ Viz/GPU wakes up only in the last 4 ms to do Aggregate + DrawAndSwap + Swap; it sleeps through the first 12 ms.
A typical second on the Main thread
Repeat that frame timeline 60 times and you have one second. But the Main thread does more than rendering: JS execution, event handlers, microtasks, setTimeout callbacks. A typical one-second budget on the Main thread of a moderately complex SPA looks roughly like this:
JS 35% · Style 12% · Layout 8% · Paint 5% · idle 40% (shares of a 0-1000 ms timeline)
JS is the chief competitor: React re-renders, state-library reducers and IntersectionObserver callbacks all fight for the Main thread. When a JS long task exceeds 50 ms, the entire pipeline queues up behind it for that time: Style/Layout/Paint cannot proceed, and when vsync arrives the frame can only be dropped. That's why Web Vitals leads with INP (Interaction to Next Paint) and TBT (Total Blocking Time); both measure "how long does JS hold the Main thread".
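The 50 ms threshold makes TBT simple arithmetic: each long task contributes only the part of its duration beyond 50 ms. A minimal sketch, with `totalBlockingTime` as a hypothetical helper name and task durations supplied by the caller (the real metric also restricts the measurement window, which is ignored here):

```typescript
const LONG_TASK_THRESHOLD_MS = 50;

// Sum of the "blocking" portion of every task: max(0, duration - 50 ms).
function totalBlockingTime(taskDurationsMs: number[]): number {
  return taskDurationsMs.reduce(
    (sum, duration) => sum + Math.max(0, duration - LONG_TASK_THRESHOLD_MS),
    0,
  );
}

// A 30 ms task blocks nothing; 80 ms blocks 30 ms; 120 ms blocks 70 ms.
console.log(totalBlockingTime([30, 80, 120])); // 100
```

Splitting the 120 ms task into three 40 ms slices would bring its contribution to zero even though the total work is unchanged.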
2024+ · scheduler.yield() and isInputPending(): modern Chromium ships scheduler.yield() so that JS can voluntarily yield the Main thread, plus navigator.scheduling.isInputPending() so that long tasks can step aside early for incoming input. These two APIs turn "don't let JS block rendering" from a slogan into a measurable engineering practice.
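A common way to apply scheduler.yield() is to process a long list in slices. The sketch below is one possible pattern, not an official recipe: `yieldToMain` and `processInChunks` are hypothetical helpers, the 8 ms slice budget is an arbitrary assumption, and the code falls back to setTimeout(0) wherever scheduler.yield() is unavailable.

```typescript
// Yield control back to the event loop, preferring scheduler.yield() when present.
async function yieldToMain(): Promise<void> {
  const s = (globalThis as unknown as { scheduler?: { yield?: () => Promise<void> } }).scheduler;
  if (s && typeof s.yield === "function") {
    await s.yield();
    return;
  }
  // Fallback for environments without scheduler.yield().
  await new Promise<void>((resolve) => setTimeout(resolve, 0));
}

// Run `work` over all items, yielding whenever the current slice exceeds the budget.
// Returns how many times the function yielded.
async function processInChunks<T>(
  items: T[],
  work: (item: T) => void,
  budgetMs = 8, // assumed slice budget, well under the 50 ms long-task threshold
): Promise<number> {
  let yields = 0;
  let sliceStart = Date.now();
  for (const item of items) {
    work(item);
    if (Date.now() - sliceStart >= budgetMs) {
      await yieldToMain();
      yields++;
      sliceStart = Date.now();
    }
  }
  return yields;
}

// Usage: the same total work, but no single task hogs the thread.
let checksum = 0;
processInChunks([1, 2, 3, 4], (n) => { checksum += n; }).then((yields) => {
  console.log(`done, checksum=${checksum}, yielded ${yields} time(s)`);
});
```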
Three physical sources of "jank"
# | Physical event | What you see
1 | Main long task | JS ran 80 ms and 5 frames were missed; jank arrives in "chunks"
2 | Raster can't keep up | The screen keeps moving while scrolling, but the viewport edges show checkerboard
3 | GPU queueing | The animation "hitches" on its very first frame, then runs smoothly (the GPU had to upload textures)
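The "80 ms, 5 frames" figure in row 1 follows from the frame interval. A rough worst-case estimate, with `worstCaseFramesMissed` as a hypothetical helper that assumes the task starts right at a vsync boundary:

```typescript
// Number of vsync deadlines that can pass while one uninterrupted task runs.
function worstCaseFramesMissed(taskMs: number, refreshHz: number): number {
  // taskMs / (1000 / refreshHz), written to avoid a lossy intermediate division
  return Math.ceil((taskMs * refreshHz) / 1000);
}

console.log(worstCaseFramesMissed(80, 60)); // 5
console.log(worstCaseFramesMissed(80, 120)); // 10
```

The same 80 ms task costs twice as many frames on a 120 Hz display, which is why long tasks feel worse on high-refresh screens.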
The card's 14 stations · at a glance
From bytestream to SwapBuffers: the full journey, recapped
The previous 18 chapters described the forward pipeline: bytes in, pixels out. But half of a browser's complexity hides in the reverse pipeline: a click, from a hardware interrupt, crossing 3 processes and 5 thread segments, eventually firing the next 13-stage round. This is the real topology behind RAIL's R (Response).
Input pipeline · one click's journey
OS Event (hardware IRQ) → Browser · IO (Browser process) → Browser · UI (routing & hit-test) → Render · Compositor (try handler) ↘ Render · Main (JS handler · setState) → Style + Layout + Paint (render pipeline) → Viz · GPU (SwapBuffers)
Five key checkpoints:
OS → Browser IO thread: the OS translates the hardware interrupt into an InputEvent via evdev / WindowProc / NSEvent and enqueues it on the Browser process's IO thread.
Browser UI routes + hit-tests: the Browser uses the cc-supplied hit-test regions (a list of hit-test rectangles) to decide which frame in which Render process owns the event. This is how OOPIF event routing works.
Render Compositor takes a first look: input goes to the Render process's Compositor thread first. If it is a scroll / pinch / non-blocking touch, the Compositor handles it alone (it simply adjusts the scroll offset / transform) and never wakes Main. This is the physical implementation of "scrolling runs on the Compositor".
Bounce to Main · run the JS handler: if it is a click, a keypress, or an event with a non-passive touch listener registered, the Compositor forwards it to Main, where the JS handler finally runs. passive: true is the key flag: it tells the Compositor "this listener will not call preventDefault", so the Compositor can keep handling subsequent scroll events at its own cadence while Main works.
Trigger a new frame: the JS handler calls setState / changes className / mutates DOM → invalidates Style → on the next BeginMainFrame, the forward pipeline runs again. If the mutation only touches Compositor-only properties, Main doesn't even need to wake.
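The Compositor's "first look" in checkpoints 3 and 4 boils down to a small decision table. The function below is a deliberately simplified model of that decision; the names (`InputKind`, `ListenerMode`, `Route`, `routeInput`) are hypothetical and the real logic in Chromium handles many more cases.

```typescript
type InputKind = "wheel" | "touchmove" | "pinch" | "click" | "keypress";
type ListenerMode = "none" | "passive" | "blocking";
type Route =
  | "compositor-only"      // handled without waking Main
  | "compositor-and-main"  // Compositor proceeds; Main is notified in parallel
  | "wait-for-main"        // Compositor must wait: the handler may call preventDefault
  | "main";                // always needs a JS handler on Main

function routeInput(kind: InputKind, listener: ListenerMode): Route {
  // Clicks and key presses always bounce to Main.
  if (kind === "click" || kind === "keypress") return "main";
  // Scroll-like input: the listener's passivity decides who is in charge.
  if (listener === "none") return "compositor-only";
  if (listener === "passive") return "compositor-and-main";
  return "wait-for-main";
}

console.log(routeInput("wheel", "none"));         // compositor-only
console.log(routeInput("touchmove", "passive"));  // compositor-and-main
console.log(routeInput("touchmove", "blocking")); // wait-for-main
console.log(routeInput("click", "none"));         // main
```

Only the "wait-for-main" row puts the Main thread on the scrolling critical path, which is the case that passive: true is meant to avoid.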
Google's RAIL model classifies user-perceptible work by time budget: Response 100ms · Animation 16ms · Idle 50ms · Load 1000ms. The 100ms in R is not "click to pixel" — it's "click to feedback-on-screen" (could be a spinner, a pressed-button state, a ripple). This gives the Compositor a precious "react-fast-then-process-properly" window — every modern UI library leans on it (:active pseudo, focus ring, ripple animations).
On touch screens, click historically waited out a "maybe double-tap?" window (~300 ms; modern browsers shorten or drop the delay on mobile-optimised pages). pointerdown fires immediately. Material Design's ripple animation appears the moment you press down precisely because it is bound to pointerdown, not click, exploiting RAIL's 100 ms "react-first window".
The original was written in 2022; the Chromium pipeline has moved on by another three years. This chapter pins the most meaningful recent changes back to their stages — none reshapes the pipeline's skeleton, but each adds a new "hook" somewhere on it.
PRE-PAINT · Anchor Positioning · CSS Anchor: New properties anchor-name / position-anchor / inset-area let an element "fly with another". This introduces a new Transform-node subclass in Pre-paint: anchor position changes sync through cc to the Compositor without going back to Main. For the first time, "tooltip / popover tracks its trigger" runs entirely on the Compositor.
COMPOSITING · Scroll-Driven Animations · CSS animation-timeline: animation-timeline: scroll() / view() drives a CSS animation by scroll position rather than by time. The entire animation runs on the Compositor thread: cc feeds the scroll offset straight into the animation interpolator, with no Main-thread involvement. Overnight, the "parallax / progress-bar" pattern that used to need IntersectionObserver + JS becomes a few lines of CSS.
AGGREGATE · View Transitions API: document.startViewTransition() brings native, smooth cross-state transitions to SPA route changes. Under the hood it is SurfaceAggregator's snapshot + cross-state composition: the old state is captured into a SharedImage, the new state is rendered normally, and Viz cross-fades / slides / scales the two surfaces during aggregation. From C17's perspective, this is the first time web developers can invoke SurfaceAggregator directly.
STYLE · @scope / @container / @starting-style: Three new at-rules add new sharding dimensions to Style's RuleSet. @scope introduces a scoped_rules_ bucket; @container makes a rule's "match condition" depend on an ancestor container's layout. That breaks the old "Style strictly before Layout" rule, so Chromium implemented a "two-pass layout-style-layout" for Container Queries, the biggest style-system rework since LayoutNG.
RASTER · RasterInducingScroll · default-on: Early Chromium scrolled on the Compositor only and never re-rastered, so fast scrolls showed checkerboard. The new RasterInducingScroll strategy proactively triggers raster during inertial scroll, trading some CPU for "no checkerboard". Default-on since Chrome 122.
PROCESS · NetworkService · separate process by default: In 2024 Chrome flipped NetworkService's default to a separate process (it used to be optionally in-process). That makes Stage 0's Mojo IPC genuinely cross-process rather than in-process message passing. The sandbox runs deeper as a result: even a compromised Render process cannot read raw cookies.
SYNTHESIS 04 · DEBUG GUIDE
Symptom reverse lookup: from jank back to a stage
By now you know what every stage does. But the question engineers actually ask in PRs is the reverse: page is janky / scroll is sluggish / animation drops frames / cold-start is white — which stage do I look at first? The table below maps common symptoms back to a stage, what to capture, and which tool to reach for.
Symptom | Suspect stage | First capture
Cold-start blank (LCP > 2.5s) | Stage 00 + 02 | Network panel for render-blocking resources · check whether any critical CSS / font missed PreloadScanner
Slow click (INP > 200ms) | C20 input + Main | Performance trace for the click handler's long task · split it with scheduler.yield()
Scroll jank, Compositor thread saturated | | DevTools Layers panel · confirm the element is on its own composite layer · check whether will-change takes effect
backdrop-filter is heavy | C16 Draw + C18 Display | Rendering panel: turn on "Layer borders" · check for a separate RenderPass · measure GPU usage
DOM thrashing on bulk mutation | C8 Layout | Performance trace · look for forced reflow warnings · merge writes via batched DOM APIs / requestAnimationFrame
Page suddenly smooth ~5s after open | C7 Style + V8 | V8 JIT has finished (bytecode → optimized code) · either wait for warm-up, or pre-warm critical paths
UNIVERSAL WORKFLOW
1. Record: capture 5-10 seconds in the Performance panel, including the moment the symptom appears.
2. Diagnose: see which thread is red. Main red = JS / Style / Layout / Paint; Compositor red = Tiling / Activate / Draw; Raster red = raster can't keep up; GPU red = pixel-throughput ceiling.
3. Capture: use the table above to land on a specific stage, then dive in with that stage's tool.
SYNTHESIS 05 · INDEX
Glossary: 64 key terms
a quick lookup for class names & concepts
Blink
Chromium's rendering engine, forked from WebKit in 2013. → CH 02
This piece took you to a depth where you can start working with the pipeline. If you want to dig further, the documents below are Chromium / V8's primary sources. After reading them you'll be ready to contribute code or fix bugs.
OFFICIAL DESIGN DOCS · chromium.org · v8.dev · web.dev
«WebKit 技术内幕» (Zhu Yongsheng): a Chinese-language deep dive on rendering engines; based on early WebKit/Blink, but the pipeline skeleton matches this article. «Inside Chromium» (Tom Dale, online): module-level diagrams. «High Performance Browser Networking» (Ilya Grigorik): the networking layer's bible and a perfect companion to Stage 0. Together with this piece, they close the loop.
From bytes to pixels,
Chromium translates "a web page" into "light" in thirteen movements.
Every frame you see is this pipeline rehearsing the score in 16.7 ms.