ursb.me / notes
FIELD NOTE / 01 Performance Engineering 2026

On «Smoothness»,
an engineering act.

From FrameTime to Stutter — a discriminator language for the experience of lag.
Why high frame rates can still feel choppy, and what the machine sees the moment your eyes say it's stuttering.

CHAPTER 01

FrameTime — the smallest unit of a frame

Every conversation about smoothness eventually collapses into a single number: the time interval between two visible frames. That is FrameTime — simply, the time it takes for one frame to be rendered and ultimately reach your eyes.

But there is a distinction that gets quietly ignored — GPU render completion is not the same event as display refresh. The GPU finishing a frame (eglSwapBuffers) does not mean the frame has been pushed in front of you. What the player ultimately sees is the rhythm of the display, not the rhythm of the GPU.

PERFDOG  //  ACQUISITION Every FrameTime is measured as Display-FrameTime — the actual interval between visible frame changes. Which means: read FrameTime directly to know whether the screen really updated at that instant.
[Figure: two refresh timelines on a 0–133ms axis. Ideal refresh: uniform 16.7ms per frame. Actual refresh: one dropped frame, with frame B occupying two refreshes.]
FIG 01 FrameTime view: frame B's GPU render time exceeds one refresh interval, so the next vsync passes with no new image: a dropped frame. The eye registers that moment as a stutter.
CHAPTER 02

FPS — the two faces of frame rate

FPS is usually read as "frame rate": the average number of refreshes per second. But that's only one face. It has a quieter twin — instantaneous frame rate — computed back from a single frame's FrameTime.

The gap between these two numbers is exactly where the truth of "high FPS that still feels choppy" hides.
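The gap between the two readings can be sketched in a few lines. This is a minimal illustration, assuming FrameTimes arrive as an array of milliseconds; the function names and the sample data are made up for the example.

```javascript
// Sketch: average FPS over a window vs instantaneous FPS per frame.
// frameTimes are Display FrameTimes in ms; the sample data below is made up.
function averageFps(frameTimes) {
  const totalMs = frameTimes.reduce((sum, ft) => sum + ft, 0);
  return frameTimes.length / (totalMs / 1000);
}

function instantaneousFps(frameTimes) {
  return frameTimes.map(ft => 1000 / ft);
}

// 39 quick frames and one 350ms freeze, one second in total.
const frameTimes = [...Array(39).fill(650 / 39), 350];
console.log(Math.round(averageFps(frameTimes)));                    // 40
console.log(Math.round(Math.min(...instantaneousFps(frameTimes)))); // 3
```

The average still reads a respectable 40 fps, while the worst single frame corresponds to roughly 3 fps, which is the moment the eye remembers.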

CASE — APPLE WWDC18

Apple, at WWDC18, offered a now-classic comparison:

Game A · High FPS
40 fps

Aiming for 60 fps but landing at 40. FrameTime jumps violently, peaking at 117ms.
Feel: stop-and-go (micro stuttering).

Game B · Low FPS
30 fps

Locked at 30 fps. FrameTime stays uniform at 33ms.
Feel: unmistakably smooth.

A higher frame rate is no guarantee of smoothness.
The key to smooth isn't more frames — it's steadier ones. WWDC 18 · Apple

CASE — ANDROID PROJECT BUTTER

At Google I/O 2012, Android 4.1 (Jelly Bean) shipped under a sweet codename — Project Butter. Before it, UI stutter was Android's signature flavour; after it, "smoothness" was inscribed into the OS design itself.

Project Butter turned "smoothness" into an engineering metric: anchored to the hardware vsync, every vsync that passes without a new image counts as one Jank. It was the industry's first admission that smoothness isn't a matter of averages; it's a matter of distribution.

To meet that metric, Google deployed four weapons at once:

01
VSync Synchronization

All UI rendering aligns to the display's hardware refresh. The GPU no longer paints whenever it pleases; it commits one frame per vsync beat, eliminating tearing at the source.

02
Triple Buffering

A spare buffer behind the front/back pair. When the GPU lags by a beat, the display still has a fresh frame ready — no empty vsync.

03
Choreographer

An OS-level metronome. Input, animation, layout, draw — all pinned to vsync moments and run in a fixed beat: input → animate → measure → draw → display, every frame.

04
CPU Boost on Touch

The instant a finger touches the screen, the CPU briefly boosts clock speed so the first response frame doesn't miss the vsync window because of power throttling — there must be no perceptible gap between press and reaction.

[Figure: GPU output vs display over six vsync ticks.
BEFORE · UNBOUND GPU (output not tied to vsync): the GPU emits A, B (slow), C, D, E, F, G; the display shows A, A (stale), B, then D/E and E/F ambiguously, then G. The slow frame leaves an empty vsync slot, so the same image lingers for two beats.
AFTER · VSYNC-ALIGNED (production locked to vsync): the GPU emits A–F and the display shows A–F. Every vsync brings a new image: steady rhythm, no Jank.]
FIG 02 VSync alignment: binding "when the GPU emits a frame" to the display's hardware beat, so tearing and duplicate frames disappear at the source. Once frame production is locked to vsync, the display always has exactly one fresh frame per refresh, with no stale repeats.
[Figure: three buffer strategies.
SINGLE BUFFER → tearing: the GPU writes the buffer while the display reads it; simultaneous read and write produces visible tearing.
DOUBLE BUFFER → stutter when the GPU lags: Front is on screen, Back is being drawn, and the two swap on vsync. If the GPU lags a beat, Front is reused → Jank (empty vsync).
TRIPLE BUFFER → smooth even with hiccups: Front is on screen, Back-1 is queued and ready, Back-2 is being drawn, so one frame is always ready. The spare frame absorbs the lag → no empty vsync.
Trade-off: one more buffer of memory (about 8 MB for a 1080p RGBA buffer, 1920 × 1080 × 4 bytes) in exchange for not missing a vsync. This later became the de-facto standard of mobile smoothness engineering.]
FIG 03 Buffer strategies: single buffer tears, double buffer drops a frame whenever the GPU lags, triple buffer absorbs the jitter with a spare. Triple buffering is the cheapest way to make occasional render lag imperceptible, at the cost of one extra framebuffer.
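The memory cost of the extra buffer is easy to check by hand. The sketch below assumes an uncompressed RGBA8888 buffer (4 bytes per pixel) with no row padding; real drivers may align or compress buffers, so treat the result as a lower-bound estimate.

```javascript
// Sketch: memory cost of one uncompressed framebuffer.
// Assumes 4 bytes per pixel (RGBA8888) and no row padding or compression.
function framebufferBytes(width, height, bytesPerPixel = 4) {
  return width * height * bytesPerPixel;
}

const bytes1080p = framebufferBytes(1920, 1080);
console.log(bytes1080p);                        // 8294400
console.log((bytes1080p / 1e6).toFixed(1));     // "8.3" (MB)
console.log((bytes1080p / 2 ** 20).toFixed(1)); // "7.9" (MiB)
```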

WHAT TEARING ACTUALLY LOOKS LIKE

"Tearing" sounds abstract, but it is very concrete. The display draws line by line, top to bottom, taking about 16.67ms per refresh at 60Hz. If the buffer is swapped to the next frame mid-scan, the top of the screen still shows the old frame while the bottom already shows the new one, and your eye sees a crack.

VSYNC-ALIGNED: one frame at a time
FRAME A

The scanline sweeps top → bottom; the buffer can only be swapped after a full pass.

SINGLE BUFFER · mid-scan swap → tearing
FRAME A FRAME B

At 38% of the scan, the buffer is overwritten. The rest of the screen jumps to a different frame; the yellow line is the physical "tear".

FIG 04 Why vsync is the sync signal: it forbids the buffer swap from happening during a scan; a swap can only happen between scans. That's why single buffering must tear: the only way to prevent it is to forbid mid-scan swaps.

TRIPLE BUFFERING ISN'T FREE

Triple buffering kills tearing and dropped frames — but it also slips an invisible latency between the user's finger and the screen. Each frame waits its turn through three buffers. Here's the latency cost of each strategy:

Single buffer: ≈ 8 ms
Double buffer: ≈ 16–33 ms
Triple buffer: ≈ 33–50 ms
FIG 05 Buffer count vs latency (axis: 0–66ms): each extra buffer adds another vsync's worth of delay. It's why FPS shooters and fighting games take an occasional hitch over a 50ms response delay: stability vs. responsiveness is the eternal trade-off.

After that combination, PerfDog Jank, Apple Smoothness, the Web Performance Budget — all of them stand on its shoulders. Decomposing "smoothness" into a measurable, comparable, optimizable engineering language is Project Butter's real legacy to the mobile industry.

MODERN HEIRS — FRAME PACING & SWAPPY

"Align to vsync" still gets patches in 2024. In 2019 Google released Swappy (the Android Frame Pacing library), purpose-built for the "render rate ≠ display rate" beat mismatch. Say a game renders at 90fps on a 60Hz panel: Swappy paces presentation so every frame stays on screen for a whole number of refresh periods (60fps here, or 30fps if the game cannot sustain 60), preventing the micro-stutter of a misaligned cadence. Same 2012 idea, new tool, same goal.

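The pacing idea can be sketched as a small calculation: choose how many display refreshes each frame should occupy so the cadence stays even. This is only an illustration of the arithmetic; `pickSwapInterval` is a hypothetical helper and not part of Swappy's API.

```javascript
// Sketch: pick a swap interval so each frame is shown for a whole number of
// display refreshes. `pickSwapInterval` is a hypothetical helper, not Swappy's API.
function pickSwapInterval(refreshHz, sustainableFps) {
  // Smallest whole number of refreshes per frame that the renderer can keep up with.
  const interval = Math.max(1, Math.ceil(refreshHz / sustainableFps - 1e-9));
  return { interval, pacedFps: refreshHz / interval };
}

console.log(pickSwapInterval(60, 90));  // { interval: 1, pacedFps: 60 }
console.log(pickSwapInterval(60, 40));  // { interval: 2, pacedFps: 30 }
console.log(pickSwapInterval(120, 60)); // { interval: 2, pacedFps: 60 }
```

A renderer that can only sustain 40fps on a 60Hz panel is paced down to an even 30fps, which matches the Game A / Game B comparison earlier: fewer frames, steadier rhythm.
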
CHAPTER 03

Smoothness — visual inertia & cinematic frames

Where does the feeling of lag actually come from? The answer hides in two seemingly unrelated ideas:

1 ─ Visual Inertia

The brain unconsciously uses last frame's rhythm to predict the next. Steady at 60 fps, it expects 60. The moment the rhythm drops to 25, prediction breaks — and that is where lag is born.
Same 25 fps, two different worlds: a steady 25 doesn't feel like lag; a sudden fall from 60 to 25 does.

2 ─ Cinematic Frames

Films typically run at 24 fps, with each frame around 41.67ms. This is a physiological threshold: below it, the eye begins to detect discontinuity.

24 fps: cinematic baseline
41.67ms: per-frame budget
2 frames: eye-tolerable delay

Put these two together and you arrive at PerfDog's full threshold system: measure rhythmic stability via the three-frame moving average; measure absolute-duration tolerance via multiples of the cinematic frame.

DEMO — what stutter actually looks like

Three balls below run a loop: left → right → left. Watch the rhythm, not the speed:

A · steady 60 fps · no jank · 16.67ms × N
smooth · perfectly even rhythm

B · 60 fps + 3 janks · avg fps higher · feels worse
faster on average, yet every pause is felt by the body

C · locked at 30 fps · half the speed · same rhythm
half as fast, but uniform, and feels far smoother than B

FIG 06 Three tracks show that "smoothness is distribution, not average": B is faster on average, yet your eye instinctively calls it laggy. If track B feels worse than track C despite running at a higher average frame rate, you've just experienced WWDC18's lesson with your own eyes.

MODERN DISPLAYS — 30 / 60 / 120Hz

In 2012 every conversation about smoothness assumed 60Hz. Today's flagship phones start at 120Hz, with iPhone Pro using LTPO panels to vary refresh rate from 1 to 120Hz — dropping to 1Hz when static to save battery, climbing to 120Hz when scrolling. The same image looks different at each rate:

30 Hz · 33.3ms per frame · budget · low-end / power saver
readable, but a faint smear during fast flicks

60 Hz · 16.67ms per frame · mainstream baseline · 2012–today
the industry standard: "smooth" was originally defined around this number

120 Hz · 8.3ms per frame · flagship · ProMotion / LTPO
the faster you flick, the more obvious the difference, but the per-frame budget is halved, so the same work must finish in half the time

FIG 07 Three refresh rates side by side: 120Hz isn't just a "faster image". It cuts the per-frame budget to 8.3ms, raising the difficulty for the renderer, the JS engine, and every piece of middleware in between. LTPO's variable refresh rate makes "FPS" itself fluid for the first time: one second the panel is at 24Hz, the next at 120Hz.
CHAPTER 04

Jank — a discriminator function

To turn those two intuitions into a function the tooling can actually run, PerfDog uses a paired condition: a frame must violate both "rhythmic stability" and "duration tolerance" to count as one Jank.

DEFINITION / Jank (one stutter)
Display FrameTime > 2 × avg(prev 3 frames)
Display FrameTime > 1000ms / 24 × 2 ≈ 83.33ms
⇒ both conditions met → one Jank
DEFINITION / BigJank (a severe stutter)
Display FrameTime > 2 × avg(prev 3 frames)
Display FrameTime > 1000ms / 24 × 3 = 125ms
⇒ both conditions met → one BigJank
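The paired condition translates almost directly into code. The sketch below is an illustration of the two definitions, not PerfDog's implementation: the function and constant names are made up, the first three frames are assumed to be unjudgeable, and a frame that qualifies as BigJank is reported only as BigJank. The sample frame times (90ms, 140ms, 50ms among 16.67ms frames) are approximations chosen to resemble the 12-frame example used in the figures.

```javascript
// Sketch of the paired condition; names are illustrative, not PerfDog's API.
const MOVIE_FRAME_MS = 1000 / 24;             // ≈ 41.67ms
const JANK_FLOOR_MS = 2 * MOVIE_FRAME_MS;     // ≈ 83.33ms
const BIG_JANK_FLOOR_MS = 3 * MOVIE_FRAME_MS; // 125ms

// frameTimes: Display FrameTime per frame, in ms. Returns one label per frame.
function classifyFrames(frameTimes) {
  return frameTimes.map((ft, i) => {
    if (i < 3) return 'ok'; // assumption: three previous frames are needed to judge rhythm
    const prev3Avg = (frameTimes[i - 1] + frameTimes[i - 2] + frameTimes[i - 3]) / 3;
    const breaksRhythm = ft > 2 * prev3Avg;
    if (breaksRhythm && ft > BIG_JANK_FLOOR_MS) return 'bigjank';
    if (breaksRhythm && ft > JANK_FLOOR_MS) return 'jank';
    return 'ok';
  });
}

// Twelve frames: a 90ms, a 140ms and a 50ms frame among 16.67ms ones.
const frames = [16.67, 16.67, 16.67, 90, 16.67, 16.67, 16.67, 16.67, 140, 50, 16.67, 16.67];
const labels = classifyFrames(frames);
console.log(labels[3], labels[8], labels[9]); // jank bigjank ok
```

Note that this sketch recomputes the relative threshold for every frame, so the 50ms frame is compared against an average that includes the 140ms frame just before it. It is rejected either way, because it never reaches the 83.33ms floor.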

JANK vs BIGJANK — IN ONE PICTURE

The two formulas differ by a single number — swap "2 × cinematic frame (83ms)" for "3 × cinematic frame (125ms)". Jank is the threshold where the eye starts to feel lag; BigJank is where it definitely registers a hitch. A frame is counted only if it crosses both lines (the relative "2 × prev-3-avg" and the absolute "N × cinematic frame").

[Figure: FrameTime over 12 frames, F1–F12. Bar height = FrameTime in ms, on a 0–160ms axis; crossing a threshold line trips the discriminator. Lines: BigJank at 125ms, Jank at 83ms, and 2 × prev-3 avg at about 33ms. F4 is labelled JANK and F9 BIGJANK. F4 and F9 cross both the relative and the absolute line, so they are counted; F10 crosses only the relative line and is not counted.]
FIG 08 Jank vs BigJank: F4 crosses 83ms, one Jank. F9 crosses 125ms, one BigJank. F10 looks tall (50ms) but never crosses 83ms, so it is not counted. The "paired condition" is the point: relative threshold (rhythm break) plus absolute threshold (eye floor). Miss either, and it doesn't count.

BEHIND THE MAGIC NUMBERS

"2 × avg of prev 3" and "1000/24 × 2" look like back-of-the-napkin numbers, but they aren't. Each has a clear engineering rationale: the first measures a break in rhythm against the pace the eye has just adapted to (visual inertia); the second is the absolute floor, two cinematic frames of tolerable delay.

THREE DISCRIMINATORS — GOOGLE / PERFDOG / APPLE

PerfDog is one discriminator among several. Two other mainstream definitions exist — and given the same frame data, the three may return entirely different answers:

Google · Jank
System-level · Absolute anchor
FrameTime > 1 vsync interval

Every missed vsync is one Jank. The simplest definition — but it ignores the felt difference between brief jitter and a long freeze.

PerfDog · Jank
Experience-level · Relative + Floor
FrameTime > 2 × avg(prev 3)
and FrameTime > 83 ms (Jank) / 125 ms (BigJank)

A paired condition: the rhythm breaks and the frame is absolutely too long. Also graded into Jank / BigJank.

Apple · Hitch
Continuous · ms-per-second
hitchTime = ∑(FrameTime − vsync)
ratio = hitchTime / totalTime

Doesn't count events; it sums the excess time over vsync for each frame. Apple's guidance rates a hitch ratio under 5 ms/s as good and above 10 ms/s as critical.

SAME DATA · THREE VERDICTS

Feed the same 12-frame data from FIG 08 into all three discriminators — and the results diverge significantly:

Google
F4 misses 5 vsyncs · F9 misses 8 · F10 misses 2. Total: 15 Janks.

PerfDog
F4 trips Jank · F9 trips BigJank · F10 crossed 33ms but not 83ms, so it is not counted. Total: 1 Jank + 1 BigJank.

Apple
Sum of "excess over 16.67ms": 73 + 123 + 33 ≈ 230 ms of hitch time, reported as a ratio.

FIG 09 Same frame data, three completely different "incident reports": Google counts events, PerfDog grades severity, Apple sums impact. None is wrong; each tracks a different dimension of experience. If your product ships on both Android and iOS, this means you can't report "smoothness" with a single number across the two platforms.
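The Google-style and Apple-style verdicts can be reproduced with a short sketch. It follows the formulas as stated in this section rather than any vendor's code; the function names are illustrative, the vsync interval is rounded to 16.67ms as in the figures, and the frame times are approximations of the 12-frame example.

```javascript
// Sketch of the two other discriminators as described above; not vendor code.
const VSYNC_MS = 16.67; // 60Hz refresh interval, rounded as in the figures

// Google-style: one Jank for every vsync a frame misses.
function missedVsyncs(frameTimes, vsyncMs = VSYNC_MS) {
  return frameTimes.reduce(
    (total, ft) => total + Math.max(0, Math.ceil(ft / vsyncMs) - 1), 0);
}

// Apple-style: sum each frame's time beyond one vsync, then normalise by duration.
function hitchStats(frameTimes, vsyncMs = VSYNC_MS) {
  const hitchMs = frameTimes.reduce((sum, ft) => sum + Math.max(0, ft - vsyncMs), 0);
  const totalMs = frameTimes.reduce((sum, ft) => sum + ft, 0);
  return { hitchMs, msPerSecond: hitchMs / (totalMs / 1000) };
}

const frames = [16.67, 16.67, 16.67, 90, 16.67, 16.67, 16.67, 16.67, 140, 50, 16.67, 16.67];
console.log(missedVsyncs(frames));                   // 15
console.log(Math.round(hitchStats(frames).hitchMs)); // 230
```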
CHAPTER 05

Stutter — ratio over count

Jank is a count. Stutter is a ratio. The latter exists to answer a question Jank cannot: how severe is each stutter?

DEFINITION / Stutter
Stutter = ∑Jank Time  /  Total Time
⇒  the share of stutter time within total test time

"Three Janks" can mean two very different stories:

A 3 × 90ms
B 3 × 600ms

A and B share the same Jank count, but B's Stutter is nearly seven times A's. That is why Jank and Stutter must be read together, never as substitutes.

FROM INTUITION TO NUMBERS

Assume both runs are 10-second tests, with 270ms and 1800ms of stutter time respectively. Drop the numbers into the definition and the gap is undeniable:

EXAMPLE / A · 3 × 90ms
Stutter = 270ms / 10 000ms = 2.7%
⇒  3 mild janks — feels like an "occasional hiccup"
EXAMPLE / B · 3 × 600ms
Stutter = 1800ms / 10 000ms = 18%
⇒  3 long freezes — feels "frozen, possibly hung"
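The two examples above reduce to a single division. A minimal sketch, assuming the Jank durations have already been collected in milliseconds; `stutterRatio` is an illustrative name.

```javascript
// Sketch: Stutter as the share of test time spent inside Jank frames.
function stutterRatio(jankDurationsMs, totalTestMs) {
  const jankMs = jankDurationsMs.reduce((sum, ms) => sum + ms, 0);
  return jankMs / totalTestMs;
}

const runA = stutterRatio([90, 90, 90], 10_000);
const runB = stutterRatio([600, 600, 600], 10_000);
console.log((runA * 100).toFixed(1) + '%'); // 2.7%
console.log((runB * 100).toFixed(1) + '%'); // 18.0%
```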

RULE-OF-THUMB BANDS

Each segment of the industry has its own acceptable Stutter band. Treat them as a quick diagnostic ruler:

[Figure: Stutter scale from 0% to 20%. Bands: < 1% excellent · 1–3% good · 3–10% acceptable, depending on context · > 10% noticeable lag, must fix. Markers: A at 2.7%, B at 18%.]
FIG 10 Stutter ruler: A sits near the "good / acceptable" boundary; B is firmly in "must fix". Same Jank count, two utterly different products. Games tolerate less Stutter (<3%); social / video can stretch to 5–10%. Past 10% it almost always shows up in user reviews.

ONE LEVEL DEEPER — LOOK AT THE DISTRIBUTION

Jank counts events, Stutter measures share. But even both miss one more layer: the shape of the FrameTime distribution. Frame times are inherently long-tailed — most are fast, a handful drag the whole experience down. The mean lies because of that tail.

[Figure: FrameTime distribution, 1000 samples. x = FrameTime (ms), 0–200ms; y = frequency. Markers: P50 at 17ms, mean at 22ms, P95 at 38ms, P99 at 85ms. The mean sits mid-pack, but P99 lives far out on the tail.]
FIG 11 Long tail: P50 is 17ms (looks smooth), but P99 hits 85ms (1 in 100 frames reaches the Jank threshold). The mean of 22ms sits between them and misleads anyone who only watches averages. This is why "P99 FrameTime" deserves the top of every performance dashboard: it directly tracks "the worst experience your user occasionally hits".
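Percentiles are cheap to compute once the FrameTimes are collected. The sketch below uses the nearest-rank convention, which is only one of several; monitoring systems may interpolate instead. The sample data is made up to show how a few slow frames barely move the mean but dominate P99.

```javascript
// Sketch: nearest-rank percentiles over FrameTime samples (one of several conventions).
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based nearest rank
  return sorted[Math.max(0, rank - 1)];
}

function summarize(samples) {
  const mean = samples.reduce((sum, v) => sum + v, 0) / samples.length;
  return {
    mean,
    p50: percentile(samples, 50),
    p95: percentile(samples, 95),
    p99: percentile(samples, 99),
  };
}

// Made-up data: 97 fast frames plus three slow ones.
const samples = [...Array(97).fill(16.67), 40, 90, 200];
console.log(summarize(samples));
```

With this data the mean stays under 20ms and P50 and P95 both read 16.67ms, while P99 lands on the 90ms frame.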

The conclusion is simple: no Jank → Stutter is necessarily 0; with Jank → the two trend together but never linearly. Read both, or you'll miss the shape of the experience. Add a distributional view (P50 / P95 / P99) on top, and only then have you really pinned down "smoothness".

CHAPTER 06

Impact — where to look, by scenario

Should every app and every game watch FPS, Jank, and Stutter all at once? The honest answer: it depends on the scenario. The same number means different things — and points to different fixes — in different contexts.

GAMES — All three, always

Games are the one place all three matter at once. The player's finger, the image, and their expectations are aligned on a single frame. FPS shapes response feel; Jank decides whether key moments collapse; Stutter sets the baseline quality of the entire session.

Two practical notes:

APPS — Divide by scenario

Apps don't get a single metric set for everything. Within one app, the login page, the feed, and the video player each speak a totally different metric language:

SCENARIO · 01
Static page (settings / login)
FPS Jank Stutter

Theoretical FPS should be 0. With no interaction, no redraws should fire. Anything above zero means an animation or timer is working in secret — heat and battery drain hidden in plain sight.

SCENARIO · 02
Scrolling animation / feed
FPS Jank Stutter

Lock FPS at a reasonable value (30 / 60) — don't chase 120 for its own sake. With a finger gliding at constant speed, 60fps is enough; the extra frames just burn GPU and battery without being noticed.

SCENARIO · 03
Fast flick / spring animation
FPS Jank Stutter

The wellspring of mobile UI responsiveness — and the very scenario Project Butter was born to fix. The faster the finger, the more sensitive the image — Jank becomes lethal here. A single 100ms hitch is enough for a user to suspect a broken screen.
Target: FPS > 55, Stutter < 1%.

SCENARIO · 04
Video playback
FPS Jank Stutter

Source video typically runs 18–24 fps; FPS must not drop and Jank must be 0. Any stutter desyncs lips from sound — the viewer notices instantly.
An unusually high FPS here is suspicious — is the decoder repainting the same frame?

CHEAT SHEET — WHICH METRIC, WHICH SCENE

Scene | FPS | Jank | Stutter | Target
Game · combat | ≥ 55 | < 3 / min | < 1% | All three, bucketed by scene
Game · menu | 30 OK | < 1 / min | < 0.5% | UI smoothness first
App · static page | = 0 | - | - | Anything > 0 is a leak
App · scroll / feed | ≥ 30 | < 5 / min | < 3% | Stability over peak fps
App · flick / spring | ≥ 55 | < 2 / min | < 1% | Touch response first
Video playback | source fps | = 0 | < 0.5% | An unusually high FPS is also a bug
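A cheat sheet like this is easy to turn into an automated check. The sketch below encodes three of the rows as data; the scene keys, field names and the `failedTargets` helper are all hypothetical, chosen for illustration rather than taken from any tool.

```javascript
// Sketch: part of the cheat sheet as data, with a checker. Keys and field names are illustrative.
const TARGETS = {
  'game-combat': { minFps: 55, maxJankPerMin: 3, maxStutter: 0.01 },
  'app-feed':    { minFps: 30, maxJankPerMin: 5, maxStutter: 0.03 },
  'app-flick':   { minFps: 55, maxJankPerMin: 2, maxStutter: 0.01 },
};

// Returns the names of the metrics that miss their target for the given scene.
function failedTargets(scene, { fps, jankPerMin, stutter }) {
  const t = TARGETS[scene];
  if (!t) throw new Error(`unknown scene: ${scene}`);
  const failures = [];
  if (fps < t.minFps) failures.push('fps');
  if (jankPerMin >= t.maxJankPerMin) failures.push('jank');
  if (stutter >= t.maxStutter) failures.push('stutter');
  return failures;
}

console.log(failedTargets('app-flick', { fps: 58, jankPerMin: 4, stutter: 0.005 })); // [ 'jank' ]
```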
CHAPTER 07

Root cause — knowing it lagged isn't enough

The first six chapters defined what a stutter is. The harder question follows immediately: why did this frame stutter? A long FrameTime can come from CPU, GPU, JS, layout, IO, hardware throttling — even the carrier's network. Bar height alone won't tell you which one made it tall.

From engineering experience, the most common root causes fall into 6 families — each with its own "fingerprint" and a matching diagnostic tool:

01 · CPU
Long main-thread task

JS / layout / complex business logic monopolizing the main thread. Fingerprint: FrameTime spikes; one CPU core pegs at 100% for that frame.

Tool: Perfetto · Instruments Time Profiler
02 · GPU
Render pipeline bottleneck

Overdraw, heavy shaders, large texture uploads. Fingerprint: the CPU has flushed its commands but the frame waits for GPU completion; the profiler shows GPU busy time longer than one vsync interval.

Tool: GAPID · Xcode GPU Frame Capture · RenderDoc
03 · GC
GC pause

Garbage collection in managed runtimes (JS, Java) pausing the main thread. Swift uses reference counting rather than a tracing collector, so this fingerprint mainly applies to JS and JVM / ART code. Fingerprint: periodic 50–200ms hitches every few seconds, with a sharp memory dip.

Tool: Chrome DevTools Memory · Android Profiler
04 · IO
Main-thread IO / network

Synchronous file IO or sync RPC reaching the main thread. Fingerprint: FrameTime variance is enormous (10ms to seconds), correlated with disk / network load.

Tool: systrace / strace · network capture
05 · COMPILE
Shader / JIT first compile

Shaders compile on first use; V8 / JSCore promote hot JS to JIT. Fingerprint: hitches that appear only on first encounter; they reproduce after a clean start and vanish after warm-up.

Tool: Pre-warm · PSO cache · JS bytecode cache
06 · THERMAL
Thermal throttle / big.LITTLE

The kernel down-clocks once the device gets hot; tasks land on the LITTLE cores by mistake. Fingerprint: a game holds 60fps for 5 minutes, then settles at 40fps from minute 6 onward; the FrameTime baseline lifts uniformly.

Tool: Battery Historian · CPU temperature logs

FLAME-GRAPH INTUITION — WHERE 16.67ms GOES

When you actually debug, the profiler slices that 16.67ms into stacked time blocks (Perfetto, systrace and Instruments all present it in much the same way). The block that is abnormally long is the root cause for that frame. A simplified illustration:

[Figure: profiler view of one frame against the 16.67ms vsync budget.
HEALTHY ≈ 12ms: input 1ms · animate 2ms · layout 3ms · draw 4ms · composite 2ms.
JANKED ≈ 90ms: input 1ms · animate 1ms · layout 75ms (blown up) · draw 8ms. The layout block is the root cause: a layout recursion that went too deep.
The profiler doesn't say "it lagged"; it says "this segment got long".]
FIG 12 Every profiler does the same thing at heart: slice a frame, find the abnormally long block. Jank / Stutter tells you something happened; the profiler tells you where. Learning to read these "stacked time blocks" is the step from knowing it lagged to knowing why.

A FULL CASE — HOW ONE STUTTER GETS CAUGHT

Stitch all the concepts above into a real investigation. Scenario: an e-commerce product list page, where fast scrolling occasionally produces a clear ~200ms hitch. The dashboard shows a P99 FrameTime spike; about 5% of users report "scroll feels off". Five steps to catch it:

01
ALERT · DASHBOARD
The dashboard tells you "something happened"

P99 FrameTime is normally ~25ms; after today's deploy it spikes to 213ms, concentrated in the "product list · fast scroll" bucket. Stutter rose from 0.4% to 8%.

[Chart: P99 FrameTime over the last 24h. Baseline ~25ms, jumping to 213ms at the v3.42 release.]
02
CAPTURE · TRACE
Record a session, locate the offending frame

Record 5 seconds of list scrolling in Perfetto / Instruments. The frame timeline shows almost every frame near 16ms, except one red 213ms bar in the middle that jumps out instantly.

[Chart: frame timeline of the 5s capture, with the single 213ms frame marked.]
03
DECOMPOSE · DRILL DOWN
Expand the frame, look at the time blocks

Open the 213ms frame; the profiler splits it into "input · animate · layout · draw · composite". Layout alone takes 180ms (normal: < 5ms). The culprit is layout.

[Chart: the 213ms frame split into input, animate, layout (180ms, the culprit), draw and composite.]
04
PINPOINT · CALL STACK
Walk into layout's call stack, find the line

Expand the layout flame graph and see 100 identical stacks: scroll handler → checkVisible() → getBoundingClientRect(). Each call forces a synchronous reflow, because the class change made for the previous item has just invalidated layout: 100 × 1.8ms = 180ms. The line is pinpointed.

// in scroll handler: runs 100× per scroll event, once per item
items.forEach(item => {
  const rect = item.getBoundingClientRect();  // ← forced reflow
  if (rect.top < window.innerHeight) item.classList.add('visible');
});
05
修复 + 验证 · FIX & VERIFY
FIX & VERIFY
改成异步 + 监控验收
Switch to async + verify on the dashboard

把同步 getBoundingClientRect 换成 IntersectionObserver:交叉状态由浏览器自己计算,再通过异步回调通知,滚动回调里不再强制触发布局。重新发版后 P99 从 213ms 回到 24ms,Stutter 从 8% 回到 0.3%,闭环。

Replace the synchronous getBoundingClientRect with IntersectionObserver: the browser computes intersections itself and delivers them in an asynchronous callback, so the scroll handler no longer forces layout. After redeploy, P99 drops from 213ms to 24ms and Stutter from 8% to 0.3%. Loop closed.

// async: the browser reports visibility changes; no forced reflow in the scroll handler
const io = new IntersectionObserver(entries => {
  entries.forEach(e => e.target.classList.toggle('visible', e.isIntersecting));
});
items.forEach(item => io.observe(item));
[Figure: P99 FrameTime before / after the fix, 213ms before and 24ms after]
FIG 13 一次完整的卡顿排查 case:监控 → 抓帧 → 拆段 → 调用栈 → 修复 + 验证。每一步都对应前面章节里讲过的概念。
A complete stutter investigation: monitor → capture → decompose → call stack → fix & verify. Each step lands on a concept from earlier chapters.
流程在所有平台都通用,只是工具不同:Web 用 Chrome DevTools / Lighthouse,iOS 用 Instruments,Android 用 Perfetto。
The flow is universal; only the tools change. Web uses Chrome DevTools / Lighthouse, iOS uses Instruments, Android uses Perfetto.
CHAPTER 08

优化 — 让卡顿不发生

Fix — keeping Jank from happening

第七章告诉你"卡在哪",这一章告诉你"怎么不卡"。优化手段虽然繁多,但底层只有 三种思路:要么让每帧的工作变少,要么把工作搬走,要么干脆接受妥协。

Chapter 7 told you where it lags. This one tells you how to keep it from lagging. The toolbox is large, but underneath there are really only three strategies: do less work per frame, move work elsewhere, or accept compromise.

三种思路

THREE STRATEGIES

A
缩短关键路径
Shorten the critical path

让每一帧主线程上的工作变少。能不算就不算、能少算就少算。最朴素的优化方向,往往也最有效。

Make each frame's main-thread work smaller. If you can skip it, skip it. The plainest direction — and usually the most effective.

虚拟列表 · 缓存 · 算法降复杂度
virtual lists · caching · lower-O algorithms
B
让出主线程
Yield the main thread

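同样的工作量,换个地方或换个时机做。Web Worker 把活儿真正搬离主线程;异步队列、空闲回调仍在主线程上执行,但把"必须算"的活儿挪出当前帧的关键时段。

Same work, run somewhere else or at some other time. Web Workers move necessary work off the main thread entirely; async queues and idle callbacks keep it on the main thread but push it out of the frame's critical window.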

Web Worker · requestIdleCallback · 任务拆分
Web Worker · requestIdleCallback · chunking
C
降级 / 跳过
Degrade / skip

承认一帧塞不下,主动选择"哪一帧不画"或"画得糙一点"。滚动时降帧、视口外剔除、低端机降画质——是工程上的"诚实"。

Admit a frame can't fit, then choose what to drop or simplify: lower framerate while scrolling, cull off-screen, downgrade quality on low-end devices. Engineering honesty.

视口剔除 · 动态降级 · 滚动降频
viewport culling · dynamic downgrade · scroll-throttle

六个最常用的具体手段

SIX MOST-USED TACTICS

A · TASK
任务拆分
Chunking

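把一个长任务切成多段,每段之间让出主线程,让 vsync 能插进一帧。50ms 的活拆成 5 × 10ms,每一段都短于一次 16.7ms 的刷新间隔,主线程就能在段与段之间照常出帧。

Slice a long task into pieces and yield between them so a frame can slip in at each vsync. A 50ms job split into 5 × 10ms keeps every piece shorter than one 16.7ms refresh interval, so the main thread can keep painting between pieces.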

A · LIST
虚拟列表
Virtual list

10 万条数据只渲染屏幕里那 20 条。React-window / RecyclerView / Compose LazyList 都是这一类。

Render only the 20 items currently visible out of 100,000. React-window / RecyclerView / Compose LazyList all belong here.
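
The core of a virtual list is a small piece of arithmetic: turn the scroll offset into the index range worth rendering. The sketch below assumes fixed-height rows; `visibleRange` and its parameter names are hypothetical and are not the API of react-window, RecyclerView, or Compose.

```javascript
// Compute the inclusive index range of rows that should be rendered.
// Assumes every row has the same height (variable heights need a prefix-sum lookup).
function visibleRange({ scrollTop, viewportHeight, itemHeight, itemCount, overscan = 2 }) {
  const first = Math.floor(scrollTop / itemHeight);
  const last = Math.ceil((scrollTop + viewportHeight) / itemHeight) - 1;
  return {
    // Overscan renders a few extra rows on each side to hide pop-in during fast scrolls.
    start: Math.max(0, first - overscan),
    end: Math.min(itemCount - 1, last + overscan),
  };
}

// 100,000 rows of 40px in an 800px viewport: 20 rows are visible, plus overscan.
const range = visibleRange({
  scrollTop: 40000,
  viewportHeight: 800,
  itemHeight: 40,
  itemCount: 100000,
});
console.log(range); // rows 998..1021 with the default overscan of 2
```

The render cost then depends on the viewport size, not on the length of the data set.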

B · WORKER
Web Worker / 后台线程
Web Worker / off-main-thread

JSON 解析、图像编码、复杂计算搬到 Worker 线程。主线程拿结果不参与计算,自然不会卡。

JSON parsing, image encoding, heavy compute — move to a Worker. The main thread takes the result instead of doing the work.

B · SCHEDULE
异步调度
Async scheduling

requestAnimationFrame 控制画面、requestIdleCallback 处理低优先级任务。React 18 的 startTransition 也是同源思想。

requestAnimationFrame for visuals, requestIdleCallback for low-priority work. React 18's startTransition is the same school.

A · CACHE
缓存与预热
Caching & pre-warm

Shader 预编译(PSO cache)、图片预解码、JS bytecode 缓存——让"第一次"的成本提前付掉。

Pre-compile shaders (PSO cache), pre-decode images, cache JS bytecode — pay the "first time" cost up front.

C · DEGRADE
动态降级
Dynamic downgrade

检测到帧时拉长就主动降画质:粒子数减半、阴影关闭、动画跳帧。诚实的妥协,比"卡到死"好得多。

When FrameTime starts to stretch, downgrade proactively: halve particles, kill shadows, drop animation frames. Honest compromise beats freezing.
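
One way dynamic downgrade could be wired up, as a sketch: average FrameTime over a short window, step quality down when the average exceeds the budget, and step it back up only when there is clear headroom. The class name, the 30-frame window, and the 1.2× / 0.8× thresholds are all assumptions chosen for illustration, not values from any engine.

```javascript
// Hypothetical controller that maps recent FrameTime to a quality level.
class QualityController {
  constructor({ budgetMs = 16.7, window = 30, levels = ['high', 'medium', 'low'] } = {}) {
    this.budgetMs = budgetMs;
    this.window = window;   // number of frames per decision
    this.levels = levels;   // ordered from best to cheapest
    this.level = 0;
    this.samples = [];
  }

  get quality() {
    return this.levels[this.level];
  }

  // Feed one frame time; returns the quality level to use for upcoming frames.
  record(frameTimeMs) {
    this.samples.push(frameTimeMs);
    if (this.samples.length < this.window) return this.quality;

    const avg = this.samples.reduce((sum, x) => sum + x, 0) / this.samples.length;
    this.samples = []; // start a fresh window after each decision

    // The gap between the two thresholds is hysteresis: it avoids flip-flopping.
    if (avg > this.budgetMs * 1.2 && this.level < this.levels.length - 1) {
      this.level++;   // over budget: downgrade one step
    } else if (avg < this.budgetMs * 0.8 && this.level > 0) {
      this.level--;   // clear headroom: restore one step
    }
    return this.quality;
  }
}
```

A render loop would call `record()` once per frame and map the returned level to concrete switches such as particle count or shadow quality.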

最常引用的招式 — 任务拆分

THE MOST-CITED MOVE — CHUNKING

举一个最直观的例子:一段 50ms 的 JSON 解析挂在 click handler 上。60Hz 下 50ms 正好跨过 3 个刷新间隔,同步跑会让接下来 3 个 vsync 没有新画面(即 3 次 Jank)。把它切成 5 段、每段之间 yield,主线程就能在中间画出帧。总工作量没变,体感差一个数量级:

The simplest example: a 50ms JSON parse hanging off a click handler. At 60Hz, 50ms spans three refresh intervals, so run synchronously it leaves the next 3 vsyncs with no new image: 3 Janks. Slice it into 5 pieces and yield between them, and the main thread paints frames in the gaps. Same total work; the felt difference is an order of magnitude:

[Figure: same work, 50ms, two schedules, against vsyncs at 0 / 16 / 33 / 50 ms. BEFORE, one long task: parseJSON blocks the main thread for 50ms → 错过 3 次 vsync · 3 次 Jank / 3 missed vsyncs · 3 Janks. AFTER, chunked with a yield between chunks: 5 × 10ms → 每个 vsync 都画到 · 0 次 Jank / every vsync paints · 0 Janks.]
FIG 14 同样 50ms 的工作,整块跑会塞死 3 次 vsync;切成 5 段每段 10ms,每个 vsync 都能挤出一帧。这是"让出主线程"思想最直观的例子。
Same 50ms of work: run as a single block and it suffocates 3 vsyncs; chunked into 5 × 10ms, every vsync still paints. The cleanest illustration of "yield the main thread".
这条思路对应 React 18 的 startTransition、Web 的 scheduler.yield()、Cocos 的协程拆帧。底层全部是同一个动作。
This pattern shows up as React 18's startTransition, Web's scheduler.yield(), and Cocos coroutine yielding. All the same underlying move.
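
A sketch of the chunking move in plain JavaScript. One caveat: a single JSON.parse call cannot be paused midway, so this assumes the work is already split into independent records (for example, an array of small JSON strings). `processInChunks` and `yieldToMain` are hypothetical helper names; scheduler.yield() is used when the runtime provides it, with a setTimeout fallback elsewhere.

```javascript
// Give control back to the event loop so pending work
// (in a browser: input handling and painting) can run.
function yieldToMain() {
  if (globalThis.scheduler && typeof globalThis.scheduler.yield === 'function') {
    return globalThis.scheduler.yield();
  }
  return new Promise(resolve => setTimeout(resolve, 0));
}

// Run handle() over every item, yielding whenever a slice has used up its budget.
async function processInChunks(items, handle, budgetMs = 10) {
  const results = [];
  let sliceStart = performance.now();
  for (const item of items) {
    results.push(handle(item));
    if (performance.now() - sliceStart >= budgetMs) {
      await yieldToMain();               // let a frame through
      sliceStart = performance.now();    // start timing the next slice
    }
  }
  return results;
}
```

Usage might look like `const rows = await processInChunks(jsonLines, line => JSON.parse(line))`, where `jsonLines` is a hypothetical array of per-record JSON strings. Slicing by elapsed time rather than by item count keeps each slice near the 10ms budget even when records vary in size.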

速查 — 根因 → 手段

CHEAT SHEET — CAUSE → FIX

把第七章的 6 类根因和上面的手段拼一张表,下次找到根因后能直接在表里查处方:

Map Chapter 7's six root causes onto the tactics above in one table. Once you've named the cause, look up the prescription:

根因 | 主要手段 | 典型工具 / API
CPU · 主线程长任务 | 任务拆分 · 异步调度 · Web Worker | scheduler.yield · requestIdleCallback · postMessage
GPU · 渲染管线瓶颈 | 降 overdraw · 减纹理 · 简化 shader | RenderDoc · Texture Atlas · Mip-mapping
GC · 垃圾回收停顿 | 对象池 · 减少临时对象 · 关闭装箱 | Object Pool · TypedArray · 避免闭包临时对象
IO · 主线程 IO / 网络 | 异步化 · 预加载 · 缓存 | async/await · Service Worker cache · IndexedDB
COMPILE · Shader / JIT 首次编译 | 预热 · PSO cache · bytecode cache | Vulkan PSO Cache · V8 Snapshot · pre-warm
THERMAL · 热降频 | 动态降级 · 限帧 · 推送高负载到峰前 | FPS auto-throttle · QoS class · pre-compute

Root cause | Primary fix | Typical tool / API
CPU · long main-thread task | Chunking · async scheduling · Web Worker | scheduler.yield · requestIdleCallback · postMessage
GPU · pipeline bottleneck | Reduce overdraw · smaller textures · simpler shaders | RenderDoc · texture atlas · mip-mapping
GC · pause | Object pools · fewer temporaries · avoid boxing | Object Pool · TypedArray · avoid closure temps
IO · main-thread IO / network | Async · preload · cache | async/await · Service Worker cache · IndexedDB
COMPILE · first-time shader / JIT | Pre-warm · PSO cache · bytecode cache | Vulkan PSO Cache · V8 snapshot · pre-warm
THERMAL · throttling | Dynamic downgrade · cap fps · pre-compute peaks | FPS auto-throttle · QoS class · pre-compute
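
As one concrete instance of the GC row, here is a minimal object pool sketch: instead of allocating a fresh temporary every frame, objects are handed out and returned, so steady-state frames allocate nothing. `Vec2Pool` is a hypothetical name, and the pool is deliberately unbounded to keep the example short.

```javascript
// Hypothetical pool of {x, y} objects, reused across frames to avoid GC churn.
class Vec2Pool {
  constructor() {
    this.free = [];     // objects available for reuse
    this.created = 0;   // how many objects were ever allocated
  }

  acquire(x = 0, y = 0) {
    let v = this.free.pop();
    if (v === undefined) {
      v = { x: 0, y: 0 };   // allocate only when the pool is empty
      this.created++;
    }
    v.x = x;
    v.y = y;
    return v;
  }

  // The caller must not touch v after releasing it.
  release(v) {
    this.free.push(v);
  }
}
```

The trade-off is manual lifetime management: a released object that is still referenced elsewhere becomes a subtle bug, so pools pay off mainly in hot per-frame paths.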

六大类、三种思路、一张矩阵——优化的全图基本就这些。剩下的工作不是"知道有什么手段",而是"具体场景下挑哪一个"。这件事没有银弹,只有反复地:测、改、再测。

Six causes, three strategies, one matrix — that's the optimization map in full. The remaining job isn't "knowing what tools exist", it's "picking the right one for this scene". No silver bullet — just measure, fix, measure again.

CHAPTER 09

通用语 — 流畅是一种跨平台方言

A common tongue — smoothness as a cross-platform dialect

前面所有的概念听起来像移动端专属,其实它们在 Web 和 iOS 上都有同形异姓的孪生兄弟。一旦认出"这是同一种语言",跨端跳起来就不再陌生:

Most of the concepts above sound mobile-specific. They aren't — they have same-shape, different-name twins on the Web and on iOS. Once you recognize "this is the same language", crossing platforms stops feeling foreign:

Web · INP
Interaction to Next Paint

2024 年取代 FID 成为 Core Web Vitals 三件套之一。测量"用户交互到下一次画面更新"的耗时,目标 < 200ms。本质就是 Web 版的 Jank Time。

Replaced FID in 2024 as a Core Web Vitals metric. Measures the time from user input to the next paint, with a target < 200ms. Fundamentally Web's "Jank time".

iOS · MetricKit Hitch
scrollHitchTimeRatio

iOS 14+ 系统级指标。每帧错过 vsync 的"超出时间"被累加,作为 ms/s 的占比上报。按 Apple 的指引,< 5 ms/s 属于良好,> 10 ms/s 属于严重。

A system-level metric since iOS 14. Excess time per missed-vsync frame is summed and reported as a ms-per-second ratio. Apple's guidance treats < 5 ms/s as good and > 10 ms/s as critical.
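
The ratio described above can be approximated from a list of frame durations. This is a simplified model for intuition only, not Apple's implementation: it treats any time beyond one refresh interval as hitch time, and `hitchTimeRatio` is a hypothetical function name.

```javascript
// Approximate hitch time ratio: ms of "late" time per second of animation.
function hitchTimeRatio(frameDurationsMs, refreshIntervalMs = 1000 / 60) {
  let hitchMs = 0;
  let totalMs = 0;
  for (const d of frameDurationsMs) {
    totalMs += d;
    hitchMs += Math.max(0, d - refreshIntervalMs); // only the excess counts
  }
  return totalMs === 0 ? 0 : hitchMs / (totalMs / 1000);
}

// Hypothetical 100Hz capture: 98 on-time frames and one frame that ran 10ms late.
const durations = [...Array(98).fill(10), 20];
console.log(hitchTimeRatio(durations, 10)); // 10 ms of hitch over 1 s of scrolling
```

Because the result is normalised per second, captures of different lengths can be compared directly.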

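UX · Miller / Nielsen
100 / 1000 / 10000ms 法则

源自 1968 年 IBM 的 Robert Miller 的响应时间研究,后由 Jakob Nielsen 推广:响应在 100ms 内 = "瞬时",< 1s = "连续",> 10s = "中断"。(常被一并提起的 Doherty 阈值是 1982 年提出的 400ms,是另一个数字。)它和帧时无关,但和"流畅感"同源:都是用户对延迟的容忍曲线。

From Robert Miller's 1968 response-time research at IBM, later popularised by Jakob Nielsen: response < 100ms feels instant, < 1s feels continuous, > 10s feels broken. (The Doherty threshold often cited alongside it is a different number: 400ms, from 1982.) Independent of FrameTime, but rooted in the same human-tolerance curve for latency.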

从触摸到上屏 — 流畅之外还有"快"

FROM TOUCH TO PHOTON — "FAST" IS NOT ONLY "SMOOTH"

流畅(FrameTime 稳定)和响应(输入延迟低)是两件事,常常被混作一谈。一次完整的"按下到看见反馈"经过 5 个阶段,每一段都可能成为瓶颈:

Smoothness (steady FrameTime) and responsiveness (low input latency) are different things, frequently conflated. A complete "press → see-it-react" path crosses 5 stages — any one can become the bottleneck:

阶段 · Stage | 耗时 · Time | 归属 · Layer
触控扫描 · touch scan | ~5ms | 硬件采样 · hardware sample
系统派发 · dispatch | ~3ms | OS 派发 · OS dispatch
应用响应(JS / 业务) · app response (JS / logic) | ~12ms | 应用层(最大) · app layer (largest)
渲染 + 合成 · render + composite | ~8ms | 渲染管线 · render pipeline
扫描上屏 · scanout | ~4ms | 硬件扫描 · hardware scanout
总延迟 ≈ 32 ms · total ≈ 32 ms
FIG 15 一次按下,5 个阶段的触控延迟分解。"应用响应"(JS / 业务逻辑)通常是其中最大、也最容易优化的一段。
A single press, decomposed into five latency stages. "App response" (JS / business logic) is usually the largest segment, and the most tractable.
总延迟 32ms 听起来很快,但 100ms 的"瞬时"阈值,只有把这条链路压到 30ms 左右的产品才能稳稳守住。
32ms total sounds fast, but the 100ms threshold for "instant" is only comfortably met by products that hold this path to around 30ms.
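
The budget in the figure is simple addition, and keeping it as data makes it easy to see which stage to attack first. The stage values are the approximate figures from the breakdown above; `summarize` is a hypothetical helper.

```javascript
// Touch-to-screen latency stages, using the approximate values from the figure.
const stages = [
  { name: 'touch scan', ms: 5 },
  { name: 'dispatch', ms: 3 },
  { name: 'app response', ms: 12 },
  { name: 'render + composite', ms: 8 },
  { name: 'scanout', ms: 4 },
];

// Total latency, plus the single largest stage and its share of the total.
function summarize(list) {
  const total = list.reduce((sum, s) => sum + s.ms, 0);
  const largest = list.reduce((a, b) => (b.ms > a.ms ? b : a));
  return { total, largest: largest.name, share: largest.ms / total };
}

console.log(summarize(stages)); // total 32ms; "app response" is the largest stage
```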

所以下一次别人说"我们 60fps 流畅"时,可以追问一句:P99 帧时是多少?输入延迟是多少?hitch 占比多少?真正流畅的产品,三个问题都答得出,且数字相互呼应。

So the next time someone says "we're at 60fps, smooth" — push back: What's your P99 FrameTime? Your input latency? Your hitch ratio? Truly smooth products answer all three — and their answers reinforce each other.

"流畅"听起来是一种感受,
但它在工程上能被切成 FrameTime、FPS、视觉惯性、电影帧、判别阈值与占比。
每一个数字,都对应一种被你身体记住、却说不出名字的不适。

"Smoothness" sounds like a feeling.
In engineering it splits into FrameTime, FPS, visual inertia, cinematic frames, discriminator thresholds, and ratios.
Each number is the name of a discomfort your body has memorised but cannot pronounce.

FIN // END OF FIELD NOTE 01
✦ ✦ ✦