为什么不能继续用 WebView Why Not WebView Anymore
WebView 是 Web 跨平台最常见的载体。但当业务从渲染好的页面变成实时渲染的游戏,它开始显出结构性的短板。这一章先把「为什么需要一个新容器」讲清楚。
The WebView is the default carrier for cross-platform Web. But once the workload shifts from rendered pages to real-time games, structural limits start to show. This chapter is about why a new container was needed in the first place.
1.1 先看一眼业务的「长相」 1.1 What the Business Actually Looks Like
我们做的事情其实挺具体的:在我们的多款音乐 App 里,给用户提供「小游戏」。绝大部分是消除类的休闲互动游戏,承担拉新、留存、活动转化等一系列业务目标。
What we build is concrete: mini-games shipped inside our suite of music apps. Most are casual interactive titles in the match-3 family, driving acquisition, retention, and event conversion.
整体业务 DAU 在十万量级,其中绝大多数跑在 Helio 容器上。游戏数量超过 10 款,每款都是独立的 Cocos 工程,技术栈是 JS + WebGL,发布形态接近业界主流小游戏。本文你看到的所有数据,都来自于这个真实的生产环境。
The business runs in the hundreds-of-thousands DAU range, the majority of which sits on Helio. The portfolio includes 10+ titles, each its own Cocos project, written in JS + WebGL and packaged in a way close to industry-standard mini-games. All numbers in this article come from this real production environment.
1.2 这件事不是我们一家在做 1.2 We’re Not the Only Ones Doing This
把镜头拉远一些,整个小游戏行业正在经历一次明显的「重度化」演进。从 2017 年「跳一跳」式的轻度休闲,到近几年 SLG、卡牌乃至 FPS / MMO 入场,平台和容器需要承载的负载,每隔两三年就抬一档。
Zoom out and the whole mini-game industry is shifting toward heavier genres. From 2017’s “Tap to Jump” era of light casuals, to recent waves of SLG, card battlers and even FPS / MMO titles, the workload that platforms and containers must absorb has stepped up every two or three years.
- 跳一跳类 Light casuals / the Tap-to-Jump era
- 棋牌 / 模拟经营 Mid-weight / card games & sims
- SLG / 卡牌进入头部榜单 SLG / card battlers top the charts
- FPS / MMO 入场 Heavy genres / FPS / MMO arrive
这条潮流意味着一件事:容器侧的性能投入是必须的,不是过度设计。业界几家主流平台(包括跨端引擎商、社交平台、游戏厂商等)都在持续投入容器层的优化 —— 后面几章你会看到我们在某些点上跟他们殊途同归,在另一些点上选了不同的路。
The implication is clear: investing in container-side performance is necessary, not over-engineering. Major platforms across the industry have all been pouring effort into the container layer. Later chapters will show that on some points we converged with them, and on others we deliberately took a different road.
1.3 WebView 跑小游戏的三个原罪 1.3 The Three Original Sins of Running Mini-Games on WebView
回到自己的选择。最早的方案就是 WebView —— 它在 Web 时代被验证过几乎所有能力。但把它用来跑实时渲染的游戏,很快就暴露出三个绕不开的结构性问题:
Back to our own decisions. The first version used a plain WebView, proven over years of the Web era. But once you put a real-time rendered game on top, three structural problems become unavoidable:
这三条单独看都不算致命,但凑在一起,每一条都会放大其他两条 —— 卡顿触发降频,降频又让卡顿更明显,问题反馈进来又找不到根因。要打破这个闭环,得换一套底座。
Any one of these in isolation is survivable. Combined, they amplify each other — stutter triggers throttling; throttling worsens stutter; user reports flood in but nothing can be traced. Breaking the loop required a new foundation.
1.4 三方案横评:为什么是 Brush Renderer 1.4 Three Candidates: Why Brush Renderer Won
放弃 WebView 之后,我们筛了三类候选方案:
With WebView ruled out, we narrowed the field to three categories:
- 游戏框架特有引擎 —— 比如 Cocos 自带的 Native 渲染管线,绑定单一引擎
- Engine-specific native pipelines — e.g. Cocos’s built-in native renderer, bound to one engine
- 小游戏供应商 SDK —— 平台方打包好的容器,开箱即用但深度可控性差
- Vendor mini-game SDKs — pre-packaged containers, drop-in but with limited depth control
- Brush Renderer —— 一套基于 OpenGL 的轻量渲染引擎,设计上对标 WebGL,可以无缝接入 Cocos 等多种引擎
- Brush Renderer — a lightweight OpenGL-based renderer designed to mirror WebGL semantics, integrating cleanly with Cocos and other engines
实测下来,Brush Renderer 与 CocosNative 在原始性能上几乎打平:FPS 59.3 vs 60,CPU 24% vs 24.8%,内存 103MB vs 110MB(同一个 demo,Oppo chp1969)。但在五个综合维度上,Brush Renderer 在「扩展性」和「通用性」上甩开一截 —— 这两点直接决定了我们能否复用它跨多款游戏、跨多个宿主 App。
In raw performance, Brush Renderer and CocosNative ran essentially neck-and-neck: 59.3 vs 60 fps, 24% vs 24.8% CPU, 103MB vs 110MB memory (same demo, Oppo chp1969). But across five composite dimensions, Brush Renderer pulled ahead on extensibility and universality — the two dimensions that determine whether one container can serve many games across many host apps.
1.5 但 Brush Renderer 也不是终点 1.5 But Brush Renderer Wasn’t the End Either
选定 Brush Renderer 不等于问题全部解决。落地之后,我们立刻又发现了新的三个问题:
Picking Brush Renderer didn’t close the case. Once it shipped, three new problems showed up almost immediately:
- 复杂度与一致性问题 —— 不同游戏接入时各自踩坑,缺乏统一抽象
- Complexity and inconsistency — every new game hit its own integration pitfalls; no unified abstraction
- 生态缺失 —— 缺本地开发、缺断点调试、缺 CI/CD,研发体验大幅倒退
- Missing ecosystem — no local dev workflow, no breakpoint debugger, no CI/CD; developer experience regressed
- 性能仍有提升空间 —— 渲染和执行效率都没有触到天花板
- Performance still capped — neither render nor execution had reached its ceiling
每一条都对应一个解。三个解叠在一起,就是 Helio 的初始架构 —— 适配层吸收差异,C++ 模块吸收性能瓶颈,外加完整的本地开发链路。
Each problem mapped to one solution. Stacked together, they form Helio’s initial architecture: an adapter layer to absorb diff, C++ modules to absorb perf bottlenecks, and a full local-dev pipeline that closes the loop.
1.6 Helio 在互动游戏生态中的位置 1.6 Helio’s Place in Our Game Ecosystem
单独看 Helio,容易把它当成「又一个轮子」。把镜头拉到整个互动游戏生态里,它的定位会清楚很多 —— 它是底座,不是工具:
Looking at Helio alone risks treating it as just another wheel. The picture sharpens once you put it inside the broader interactive-game ecosystem — it’s a foundation, not a tool:
1.7 生态建设:开发 → 调试 → 发布 → 运营 1.7 Ecosystem Building: Develop → Debug → Release → Operate
工程哲学的另一面是:性能优化只是 Helio 的一半,另一半是研发体验。一个引擎能跑得快还不够,团队还要能开发得快。我们沿着游戏研发的全生命周期,做了四个象限的工作 ——
The other half of the engineering philosophy: performance is only half of Helio. The other half is developer experience. An engine that runs fast isn’t enough if the team can’t ship fast. We worked the four quadrants along the full game-development lifecycle:
开发 · Develop
- 对齐 Web 标准,提升研效
- Aligned with Web standards
- 支持完整的业界主流小游戏协议
- Full industry-standard mini-game protocol
- 支持必要的浏览器能力
- Necessary browser capabilities included
调试 · Debug
- 支持完备的 CI/CD
- Full CI/CD pipeline
- 完整的断点调试能力
- Complete breakpoint debugging
- 支持全部 JSBridge 接口
- Full JSBridge surface
发布 · Release
- 业务侧 + 框架侧 + 配置发布系统
- Business + framework + release-config system
- 支持自定义容器能力
- Custom container capabilities
- 移除浏览器黑盒差异
- Remove browser black-box variance
运营 · Operate
- 提供完备的上报接口与监控看板
- Full telemetry and dashboards
- 提供 JS 侧 / OC 侧的日志代理
- JS-side and OC-side log proxies
- Android Crash 率压到 0.0015%
- Android crash rate held at 0.0015%
本图展示 10,000 个点中的 1 个红点 —— 实际还要稀疏 6.7×
The grid shows 10,000 dots with 1 red — production is 6.7× sparser still
这四个象限不是独立的工作清单,而是一条闭环 —— 任何一个没做好,链路都会被卡住。比如 Crash 率 0.0015% 看起来是运营层面的指标,但如果开发体验跟不上、断点调不到、CI 发不动,这个数字根本拿不下来。
These four quadrants aren’t a checklist of independent items — they form a loop. If any one breaks, the chain breaks. The 0.0015% crash rate looks like an operations number, but it’s unreachable without strong dev experience, working breakpoints, and a reliable CI.
骨架讲完了。接下来三章,我们逐个看 Helio 的「快」是怎么炼成的:加载、渲染、业务接入。
That covers the foundation. The next three chapters dig into how Helio got fast: loading, rendering, and business integration.
加载优化:从 31 秒到 5 秒 Load Optimization: From 31 Seconds to 5
一款消除类小游戏,安装后第一次冷启动到首屏可交互,能慢成什么样?最早的数字是 31 秒。一年之后,这个数字降到了 5 秒。这一章讲讲是怎么一步步压下来的。
How long can the first cold-start of a match-3 game take, all the way from install to interactive? Our earliest number was 31 seconds. One year later, it’s 5. This chapter is how we got there.
2.1 启动器:把混乱的回调拍成状态机 2.1 The Startup Pipeline: Replacing Callback Chaos with a State Machine
加载流程比想象的复杂。从容器初始化、资源下载、包解压、JS 加载、引擎启动到游戏首屏,中间经过 9 个关键节点。任意一个节点出问题,整条链路就会卡住。
The loading sequence is more intricate than it looks. Container init, resource download, package unzip, JS bootstrap, engine startup, all the way to the first interactive frame — nine critical events along the way. Stall any one of them and the entire chain stalls.
最早我们用一堆回调和标志位串起来。结果是:每加一个错误处理就乱一次,每加一个埋点要改 5 处代码,每个游戏接入都要重新写一份。后来重写成了状态机:明确的状态定义、明确的事件触发、明确的状态迁移。9 个流程事件(JS 侧 4 个 + NA 侧 5 个)严格对应状态机的边。出问题时只要看是卡在哪个状态,就知道该看哪段日志。
The first version was a tangle of callbacks and flags. Adding error handling broke things; adding telemetry meant editing five places at once; every new game forced us to rewrite the integration. We rebuilt it as a state machine: explicit states, explicit triggers, explicit transitions. The nine events (4 on the JS side, 5 on the NA side) map precisely to its edges. When something stalls, you read which state you’re in — that tells you exactly which log to look at.
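The rebuilt pipeline can be sketched as follows. This is a minimal illustration of the idea, not Helio's actual code: the state and event names here are invented, and the real machine has nine events across JS and NA sides.

```typescript
// Illustrative startup state machine: explicit states, a transition
// table, and nothing else — no callback chains, no flag soup.
type State =
  | "init" | "downloading" | "unzipping" | "jsLoading"
  | "engineStarting" | "interactive" | "failed";

// Every legal (state, event) pair maps to exactly one next state.
const transitions: Record<string, State> = {
  "init:containerReady": "downloading",
  "downloading:packageDownloaded": "unzipping",
  "unzipping:packageUnzipped": "jsLoading",
  "jsLoading:jsLoaded": "engineStarting",
  "engineStarting:firstFrameRendered": "interactive",
};

class StartupMachine {
  state: State = "init";
  dispatch(event: string): State {
    const next = transitions[`${this.state}:${event}`];
    // An unexpected event pins the machine in "failed"; the stuck
    // state immediately names the log segment worth reading.
    this.state = next ?? "failed";
    return this.state;
  }
}
```

Adding telemetry now means instrumenting `dispatch` once, instead of editing five call sites.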
2.2 分包加载:首屏依赖从 78MB 砍到 25.5MB 2.2 Subpackage Loading: First-Paint Bundle 78MB → 25.5MB
第一波优化复用了业界主流小游戏的「分包加载」协议:把整包拆成主包 + 若干子包,按各自的「最早需要时机」延迟加载。框架侧提供 loadSubpackage 接口,业务侧把游戏拆成 6 个包:
The first wave borrowed the industry-standard mini-game subpackage pattern: split one big bundle into a main package plus several subpackages, then defer-load each one to the earliest moment it’s actually needed. The framework exposed loadSubpackage; the product team carved each game into six packages:
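The deferred-loading flow looks roughly like this. The package names and the in-memory registry below are made up for illustration; the callback shape follows the common mini-game protocol rather than Helio's exact signature.

```typescript
// Stand-in registry; production would stream package bytes from a CDN.
const available = new Set(["main", "lobby", "level1", "audio", "skins", "events"]);

function loadSubpackage(opts: {
  name: string;
  success: () => void;
  fail: (err: Error) => void;
}): void {
  // Only the deferred-loading control flow matters here.
  if (available.has(opts.name)) opts.success();
  else opts.fail(new Error(`unknown subpackage: ${opts.name}`));
}

// Pull the level package only when the player actually enters a level —
// the "earliest needed moment" rule from the text.
function enterLevel(onReady: () => void): void {
  loadSubpackage({ name: "level1", success: onReady, fail: e => { throw e; } });
}
```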
2.3 分包加载的天花板 2.3 Where Subpackaging Hits the Ceiling
分包帮我们把首屏砍到 10 秒,但很快遇到了 4 个结构性问题:
Subpackaging cut first-paint to 10 seconds, then four structural problems surfaced:
- 框架侧无法控制业务拆包效果。拆得对不对全靠业务方的判断,框架只能干看。
- The framework can’t control how packages get split. Whether the cuts are right depends on each team’s judgment; the framework can only watch.
- 拆包成本会逐步劣化。业务每次迭代都可能让原来的分包失效,需要反复重新评估。
- Splits decay over time. Every business iteration risks invalidating prior cuts, forcing repeated re-evaluation.
- 加载链路设计上滞后。必须先 download → 再 unzip → 才能 loadFile,多次跨语言调用,包处理完才能加载其中的资源。
- Load chain is structurally lagged. You must download → unzip → loadFile, with multiple cross-language hops; resources can’t be touched until the package is fully prepped.
- 用户更新感知明显。每次版本升级都要重新下载,迭代频繁的游戏体验更差。
- Updates are user-visible. Every version forces a re-download; the more frequently a game iterates, the worse this gets.
要继续往下挤性能,得换思路。
To squeeze more, we needed a different idea.
2.4 流式加载:把每个资源都变成独立流 2.4 Streaming: Every Resource Is Its Own Stream
流式加载的核心思路:不再下整包,而是把每一个静态资源(图片、字体、音频、JSON、JS)当成一个可单独流式获取的单元。
The streaming idea: stop downloading whole packages. Instead, treat every static resource — image, font, audio, JSON, JS — as an independently streamable unit.
实现路径有三个层面:
Three implementation layers:
- 拦截 download:在 JS 适配层抹掉「下载」概念,把它改成「按需流式获取」。
- Intercept download: in the JS adapter layer, erase the “download” concept entirely and replace it with on-demand fetch.
- 针对静态资源实现不同的 load:Image 重写 setter;Font 修改 loadFont binding;Audio 用 Proxy 实现同步逻辑;Spine 用 remoteBundle。
- Per-type load implementations: Image overrides its setter; Font patches the loadFont binding; Audio uses a Proxy for synchronous semantics; Spine goes through remoteBundle.
- 原生层重做 loadFile:保留同步 require 接口,新增异步 loadFile,并在适配层为查询结果做缓存。
- Rebuild loadFile natively: keep a synchronous require API, add an asynchronous loadFile, and cache lookups in the adapter layer.
最后这一条是关键 —— 业务方的代码完全不知道资源是按需流来的还是预下载好的。这种「无感」是流式加载能成立的前提。
That last point is the linchpin — game code has no idea whether a resource was streamed or pre-downloaded. Invisibility is the precondition for streaming to work at all.
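For images, the interception amounts to redefining the `src` setter in the adapter layer. The sketch below is simplified: `FakeImage` and `streamFetch` are stand-ins, not Helio's real classes, and the real fetch resolves through the native cache asynchronously.

```typescript
// Synchronous stand-in for the native on-demand fetch.
function streamFetch(url: string, done: (bytes: string) => void): void {
  done(`bytes-of:${url}`);
}

class FakeImage {
  onload: (() => void) | null = null;
  private _src = "";
  get src(): string { return this._src; }
  set src(url: string) {
    // Assigning .src triggers an on-demand fetch; game code cannot tell
    // whether the bytes were streamed or pre-downloaded.
    this._src = url;
    streamFetch(url, () => { if (this.onload) this.onload(); });
  }
}
```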
2.5 音频加载:5 个方案的艰难抉择 2.5 Audio: A Hard Choice Between Five Designs
流式加载里最难的细节是音频。原生音频接口是同步的 —— 业务调 audio.play(),立刻就期望响声。流式加载意味着资源可能还在路上,怎么处理?
The hardest detail in streaming is audio. Native audio APIs are synchronous — when game code calls audio.play(), it expects sound immediately. Streaming means the file may still be in flight. How do you reconcile that?
我们评估了 5 个方案:
We evaluated five options:
| 方案Option | 业务感知Visible to Game | 协议破坏Breaks API | 体积/复杂度Cost | 采用Picked |
|---|---|---|---|---|
| ① AudioEngine 自带下载能力 Bake download into AudioEngine(客户端来不及改,C++ 加网络模块结构上不合理 Client team had no bandwidth; bolting a network module into C++ was structurally wrong) | — | — | ↑↑ | ✕ |
| ② 无缓存时首次同步调用 First call hits sync I/O(影响首次卡顿率,会卡顿 First-paint stutter regression; janks under sync I/O) | ✓ | ✓ | — | ✕ |
| ③ 确保音频资源在本地(preload / 子包打入)Always have audio local (preload / bundle in subpackage)(耦合,存在隐患;影响首屏 Tight coupling, hidden risks, hurts first-paint) | ✓ | ✓ | — | ✕ |
| ④ 改异步调用 Refactor calls to async(改动太大,破坏接口协议,业务侧需要全改 Massive surface change; breaks the API contract; rewrites all callers) | ✕ | ✕ | ↑↑ | ✕ |
| ⑤ 同步返回可异步赋值的代理对象 Sync return of an async-fillable proxy(ID 仅 stop / setVolume 用,异步赋值无隐患;不修改接口,业务完全无感 IDs are only used by stop/setVolume; async fill is safe; API contract unchanged; business is fully unaware) | ✓ | ✓ | — | ✓ |
最终选了方案⑤。核心是一个「先返回代理,等音频真正下载完再回填 ID」的小状态机:
We picked option ⑤. At its core is a small state machine that returns a proxy synchronously, then back-fills the real ID once the audio file lands:
loadAudio() 立刻拿到代理对象(idle);代理状态置 pending 同时触发下载;下载完成置 loaded、回填真实 ID。如果 pending 阶段调了 stop(),直接把这个 pending 任务丢弃 —— 业务对整个过程 100% 透明。
A loadAudio() call returns the proxy immediately (idle). The proxy flips to pending while the download fires; on completion it goes to loaded with the real ID filled in. If stop() is called during pending, that task is silently dropped. The whole flow is transparent to game code.
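The proxy's lifecycle can be sketched as below. All names here are illustrative; the real implementation forwards `stop`/`setVolume` through the back-filled native ID.

```typescript
type AudioState = "idle" | "pending" | "loaded" | "dropped";

class AudioProxy {
  state: AudioState = "idle";
  nativeId: number | null = null;

  // Returns synchronously — the API contract is unchanged. `download`
  // stands in for the streaming fetch and calls back with the real ID.
  play(download: (done: (id: number) => void) => void): this {
    this.state = "pending";
    download(id => {
      if (this.state === "pending") { // a back-fill after stop() is ignored
        this.nativeId = id;
        this.state = "loaded";
      }
    });
    return this;
  }

  stop(): void {
    // stop() during "pending" just discards the task; if already loaded,
    // we would forward stop(nativeId) to the native engine here.
    if (this.state === "pending") this.state = "dropped";
  }
}
```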
2.6 深度优化:JSON 合并 / 公共缓存 / 调用优化 2.6 Deeper Cuts: JSON Merging, Shared Cache, Call Tuning
流式加载落地后,新的瓶颈出现在网络层 —— NA 侧的网络库并发上限是 10,资源多的时候很容易触发瓶颈。我们沿着这条链路又做了三层优化:
Once streaming was live, a new bottleneck appeared at the network layer — the NA-side network stack caps concurrency at 10, easy to saturate. Three more optimizations followed:
JSON 合并:减少 52% 文件数
JSON Merging: 52% Fewer Files
构建产物里把各模块的 JSON 提前合并。文件数从 107 砍到 51。一次请求拿走多个 JSON,避开了并发瓶颈。
At build time, merge per-module JSON files. File count dropped from 107 to 51. One request pulls back multiple JSONs, sidestepping the concurrency cap.
公共缓存:跨版本提高命中率
Shared Cache: Higher Cross-Version Hit Rate
用 LRU + 调用优化划分公共资源,跨版本提高缓存命中率。和 WebView 的资源缓存策略对齐。
An LRU + call-tuning policy carves out shared resources, lifting cross-version cache hit rates. Aligned with how the WebView caches resources.
调用优化:三级缓存 + 反序列化加速
Call Tuning: Three-Tier Cache + Faster Deserialization
缓存 loadFile 的查询结果(三级缓存:内存 → 持久化 → 网络)。OC 侧通过 JSExport 实现了反序列化方法导出,提升约 95% 的反序列化性能。
Cache loadFile lookups in three tiers: memory → persistent → network. The OC side exposes a deserialization method via JSExport, improving cache deserialization throughput by ~95%.
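The lookup-and-promote flow can be sketched like this. The two outer tiers are injected stand-ins (a `Map` for disk, a function for network); only the promotion logic is the point.

```typescript
// Three-tier loadFile lookup: memory → persistent → network.
function makeLoader(
  persistent: Map<string, string>,
  network: (path: string) => string
): (path: string) => string {
  const memory = new Map<string, string>();
  return function loadFile(path: string): string {
    const hit = memory.get(path);
    if (hit !== undefined) return hit;            // tier 1: in-memory
    let data = persistent.get(path);              // tier 2: on disk
    if (data === undefined) data = network(path); // tier 3: network
    memory.set(path, data);                       // promote for next call
    return data;
  };
}
```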
三层叠加之后 ——With all three optimizations stacked ——
2.7 横向对照:业界做加载优化的三条路 2.7 Industry Side-by-Side: Three Paths to Faster Loading
把视野拉开看,业界做容器加载优化的思路大致有三条 —— 我们走了第一条,另外两条是其他团队在不同约束下选的路。三条路的分工很有意思:
Stepping back across the industry, container-side load optimization splits roughly into three families. We picked the first; other teams under different constraints picked the others. The division of labor is illuminating:
资源层:流式加载Resource layer: streaming
把每个静态资源当独立单元,按需流式加载,对业务零侵入。
Treat each static resource as a streamable unit. Fetch on demand. Zero business-side change.
优势:适用面广,任意 JS + WebGL 游戏都能用。
Strength: broadly applicable — any JS + WebGL game can adopt it.
限制:无法优化 wasm 这类二进制 IL 的加载链路。
Limit: can’t reach into wasm-style binary IL loading.
wasm 函数级分包Function-level wasm splitting
通过 PGO(Profile-Guided Optimization)把 wasm 包按函数粒度切成首包 + 子包。首包能小到原始包的 30–40%。
Use PGO (Profile-Guided Optimization) to slice a wasm bundle at function granularity into a first-package + sub-packages. The first package can shrink to 30–40% of the original.
优势:重度 wasm 场景下瘦身能力极强。
Strength: unmatched for slimming heavy wasm targets.
限制:需要业务方在测试阶段打桩收集,研发体验有妥协。
Limit: teams must instrument and run a profiling pass — some dev-experience cost.
分级编译 + 压缩升级Tiered compile + better compression
从 JS 引擎本身入手:先用 LiftOff 做冷启动,运行中用 TurboFan 后台 re-compile 并缓存到下次。Brotli 替 Gzip 拿额外 20% 压缩率。
Act on the JS engine itself: use LiftOff for cold start, let TurboFan re-compile in the background and cache for next launch. Brotli over Gzip adds another 20% compression.
优势:不依赖业务方配合,普适性强。
Strength: no business-side cooperation required; broadly applicable.
限制:天花板由引擎实现决定,自己能动的范围有限。
Limit: ceiling set by the engine vendor — your room to maneuver is small.
2.8 vs Cocos 远程包模式 2.8 vs Cocos’s Remote-Package Mode
在我们之前,Cocos 提供了一套自己的「远程包」模式 —— 类似流式加载,但实现方式不同。两者在 5 个维度上的对比:
Before us, Cocos shipped its own “remote-package” mode — similar in spirit to streaming, but different in execution. A side-by-side across five dimensions:
| 维度Dimension | Helio 流式加载Helio Streaming | Cocos 远程包Cocos Remote-Package |
|---|---|---|
| 包处理Package handling | 可仅配置 index.js,也可任意配置缓存文件Configure just index.js, or any subset of cache files | 只能内置Built-in only |
| 远程脚本Remote scripts | ✓ 支持,且支持预解析(基于 require 原文件做扩展)✓ Supported, with pre-parse (extends require natively) | ✕ |
| 缓存方案Cache strategy | 扩展性强,支持公共缓存Extensible, supports shared cache | JS 实现,较简单JS-based, simpler |
| 业务感知Business visibility | ✓ 完全无感✓ Fully invisible | 需修改包名 + CIRequires renaming + CI changes |
| 引擎支持Engine support | 底层接口支持,适配多游戏框架Low-level API support, multi-engine | 仅支持 Cocos 游戏Cocos-only |
渲染优化:从落后 6.8×到反超 4.8× Render Optimization: From 6.8× Behind to 4.8× Ahead
加载完成只是开始 —— 真正决定用户体验的是渲染。Helio 在渲染上做的事情,比加载层多一倍。这一章会比前面长一些,因为它涉及音频、渲染管线、JS 执行、内存管理、析构、调试六条线。
Loading is just the start. What actually decides user experience is rendering. Helio did roughly twice as much work on rendering as on loading — this chapter runs longer because it spans six tracks: audio, render pipeline, JS execution, memory, teardown, and debugging.
3.0 渲染优化的方法论:三种姿态 3.0 The Methodology: Three Stances
在跳到具体优化之前,先建立一个共同的方法论。渲染性能优化的本质是定位瓶颈,再决定用三种姿态中的哪一种 ——
Before jumping into specifics, here’s the common methodology. Render performance optimization is fundamentally about locating the bottleneck, then choosing one of three stances:
减少 · Reduce
少做一些事 —— 减帧、减绘制、减纹理、减状态切换。Do less — fewer draws, fewer textures, fewer state switches.
转移 · Shift
把瓶颈从一边搬到另一边 —— CPU 转 GPU、JS 转 C++、运行时转构建时。Move the bottleneck — CPU to GPU, JS to C++, runtime to build time.
砍掉 · Cut
实现层已到天花板,那就调整效果本身。When the implementation hits a ceiling, adjust the effect itself.
要决定走哪条姿态,得先识别瓶颈是 CPU、GPU 还是带宽。识别方法、分析工具、各种 trick,业界已经积累了大量经验,本文不重复造轮子。我们关注的是 Helio 在这三种姿态下做的具体技术决策 —— 后面的小节,每一段都对应其中一种姿态。
Choosing the right stance requires identifying whether the bottleneck is CPU, GPU, or bandwidth. The industry has long-established methods, tools, and tricks for that — we won’t rehash them. What follows are Helio’s specific decisions within those three stances. Each subsection maps to one of them.
3.1 音频卡顿:一边一个坑 3.1 Audio Stutter: One Pit on Each Platform
音频不在主渲染路径上,但它能拉低帧率 —— 一次音频 IO 阻塞主线程,整帧就丢了。两端的情况完全不一样:
Audio isn’t on the main render path, but it drags frame rate — one blocking I/O on the main thread and a frame is gone. The two platforms looked very different:
Android:1 个炸弹音效 = 18 个 MediaPlayer
Android: One Bomb SFX = 18 MediaPlayer Instances
原方案是基于队列的丢弃策略,对并发处理有瑕疵。一次炸弹音效需要创建 18 个 MediaPlayer 实例(耗内存)。我们融合了 SoundPool 方案 —— 接入 OpenSL ES 接口,通过 Web 适配层对应改 AudioEngine 模块,接入成本低。
The original used a queue-based drop policy, fragile under concurrency. A single bomb sound effect spawned 18 MediaPlayer instances (memory hog). We fused SoundPool — integrating OpenSL ES through the Web adapter layer’s AudioEngine module, with low integration cost.
数据:Android 卡顿率 0.999% → 0.638%。同期 WebView 反而从 0.797% 退化到 0.883% —— 我们的优化在 WebView 路线上没有同步发生。
Result: Android stutter rate 0.999% → 0.638%. In the same period, WebView regressed from 0.797% to 0.883% — that optimization didn’t happen on the WebView path.
iOS:替换 OpenAL,再修一个 OpenAL 的坑
iOS: Switch to OpenAL, Then Fix an OpenAL Pitfall
iOS 上是线程调度导致主线程被 block。先把方案换成 OpenAL,但 OpenAL 自己有个坑 —— 切后台偶现不播放、报 alSourcePlay error code:a003。原因是系统在某些打断场景下不会触发 AVAudioSessionInterruptionTypeEnded 事件。我们补上了打断回调的处理逻辑。
On iOS, thread scheduling was blocking the main thread. We swapped to OpenAL — which has its own pitfall: silent failures on background swap with alSourcePlay error code:a003, because under certain interruption flows iOS doesn’t fire AVAudioSessionInterruptionTypeEnded. We patched the interruption callback path.
3.2 GFX 的灵感来源:500 vs 37 3.2 The Spark Behind GFX: 500 vs 37
从渲染层往上一级看,最大的一个发现是这个:同样一帧某款消除游戏的主界面,WebView / Helio 走 500 条 GL 指令,CocosNative 只用 37 条。差了 13.5 倍。
One layer up from rendering, here’s the biggest finding: for the same single frame of a match-3 game’s main screen, WebView / Helio issues 500 GL commands; CocosNative issues just 37. A 13.5× ratio.
为什么差这么多?因为 Cocos 的 Native 实现把多个常用 GL 调用做了合批和封装。WebView 走 JS 标准接口逐条调用,每一条都是 JS → Binding → C++ 的一次跨语言开销。在解释执行的 JSCore 上(iOS 不能开 JIT),这个开销极其可观。
Why so different? Because Cocos’s native implementation batches and packages common GL calls. The WebView path goes through standard JS APIs one call at a time — each paying a JS → Binding → C++ cross-language hop. On interpreter-only JSCore (no JIT on iOS), that cost adds up brutally.
思路就出来了:能不能把 GL 调用从解释执行的 JS 里搬到 C++ 里,并且尽量合批?这就是 GFX C++ 扩展的起点。
The idea writes itself: can we lift GL calls out of interpreted JS into C++, and batch them while we’re at it? That’s GFX’s starting point.
3.3 GFX 的架构与接入 3.3 GFX Architecture & Integration
GFX 的架构有两个关键决定:
Two key architectural decisions:
- 独立建库:从 Cocos Engine 的 GFX 部分抽离出来,提供完整的交叉编译能力,支持 Android + iOS 双平台
- Stand-alone library: extract the GFX bits from Cocos Engine, with full cross-compile support for Android + iOS
- 复用 Helio 已有上下文:复用 Helio 的 JSContext(共享 Binding)、复用 EAGLView 的 glContext(避免双份开销)
- Reuse Helio’s existing context: reuse the JSContext (shared bindings), reuse EAGLView’s glContext (no double overhead)
接入流程上,SE 定义新接口复用 Helio 的 JSContext,EAGLView 复用已有的 glContext,适配层在前向渲染初始化阶段调整 _flow 的实现 (initWebGL),绘制阶段把原本 JS 的 gfx 接口改为 C++ 的 Binding。
Integration: SE defines new APIs reusing Helio’s JSContext, EAGLView reuses Helio’s glContext, the adapter layer adjusts _flow at forward-render init (initWebGL), and turns JS-side gfx calls into C++ Bindings during draw.
```cpp
// Objective-C++ bootstrap: register bindings, then start the script
// engine on Helio's existing JSContext (no second context is created)
se::ScriptEngine *se = se::ScriptEngine::getInstance();
jsb_register_all_modules();
se->start(self.gameEJView.jsGlobalContext);
```
最关键的设计是可插拔:编译参数可选编译 GFX 模块,Helio 可选接 lib 库 —— 业务方按需引入。这件事对 Helio 的可推广性至关重要 —— 不强制升级,业务方按自己的节奏迁移。
The most important design choice is being pluggable: a compile flag selects whether GFX is included; Helio can optionally link the library — products opt in. This matters enormously for adoption: no forced upgrades, teams migrate at their own pace.
3.4 横向对照:渲染优化的两条哲学路 3.4 Two Philosophies of Render Optimization
在 GFX 这件事上,业界其实分成了两条哲学路。它们的目标都是「让游戏渲染更快」,但实现的姿态截然不同。
On the GFX question, the industry actually splits into two philosophical paths. Both aim at “make game rendering faster,” but the stance is fundamentally different.
保留 WebGL 语义,下沉 C++ 实现Preserve WebGL semantics, lower implementation into C++
WebGL 数百个接口保持不变,把热路径的 GL 调用从 JS 沉到 C++。业务侧零改动 —— Cocos 引擎层吸收。Keep all hundreds of WebGL APIs unchanged. Move hot-path GL calls from JS into C++. Zero business-side change — absorbed at the Cocos engine layer.
代价:天花板有限,毕竟仍是 WebGL 语义。Cost: the ceiling is bounded — it’s still WebGL semantics underneath.
重新定义渲染接口Redefine the rendering API itself
抛弃 WebGL,新设计一套精简接口(约十几个)。抽象出 Pipeline / RenderPass 等更现代的概念。在引擎侧重新生成 shader / 重写绑定。Abandon WebGL; design a slimmed API (about a dozen calls). Abstract more modern concepts like Pipeline / RenderPass. Re-generate shaders / rewrite bindings at the engine layer.
代价:需要业务方/引擎层配合改造,落地周期长。Cost: business teams and engines must come along; rollout takes much longer.
3.5 GFX 落地的 6 个难点 Bug 3.5 Six Hard Bugs From the GFX Rollout
落地不是平的。GFX 接入过程中遇到了 6 个有代表性的难点 Bug,每一个都需要对 GL 管线、JS 引擎、适配层都有深入理解才能定位。
Production wasn’t flat. Six representative bugs surfaced during integration. Each required deep understanding of the GL pipeline, the JS engine, and the adapter layer to root-cause.
- 黑屏 Black screen
- 渲染错乱 Rendering mangled
- 纹理本身错乱 Wrong textures —— glTexImage2D 时 glType 与 glFormat 出错 Adapter-layer Image presets were off: glType and glFormat wrong at glTexImage2D
- 圆角不生效 Rounded corners broken
- 点击事件丢失 Click events lost
- GL_VALIDATE_STATUS 校验失败 GL_VALIDATE_STATUS fails —— 需显式调用 glValidateProgram 验证 Required calling glValidateProgram explicitly

3.6 6 个 Bug 能高效解决,靠的是全链路调试体系 3.6 Solving Those Bugs Required a Full-Chain Debug System
这 6 个 Bug 能在合理的时间内定位掉,不是因为我们运气好,而是因为 Helio 同时在做调试体系建设。我们沉淀了 5 维调试能力:
These bugs got fixed in reasonable time not because we got lucky, but because Helio had built a debug system in parallel. Five dimensions of debug capability:
- 源码:支持 C++ Debug 模式源码集成
- Source: C++ Debug-mode source integration
- 日志:支持 JS 日志代理 + C++ 日志代理
- Logs: JS-side proxy + C++-side proxy
- 断点:支持全链路断点 —— 一次断点穿透 5 层
- Breakpoints: full-chain — a single break can pause across 5 layers
- 渲染:支持抓帧分析 GL 命令
- Render: frame capture for GL command analysis
- 请求:支持 JS 侧与 NA 侧的请求抓包
- Network: request inspection on both JS and NA sides
最关键的是「全链路断点穿透」 —— 一次单步可以从游戏业务的 JS 代码,逐步调到底层 C++ 的 GL 实现。
The most important piece is full-chain breakpoint traversal — a single step-through can walk from game JS code all the way down to the C++ GL implementation.
3.7 体积、内存、析构、Command Buffer 3.7 Size, Memory, Teardown, Command Buffer
接下来是几个体积、内存、析构相关的优化点。一并放在一个小节里,避免散乱。
A handful of size, memory, and teardown improvements — grouped here to avoid scattering.
体积优化:-35.4%
Binary Size: -35.4%
移除了 Network 模块(libwebsockets / libjson / libuv 三个库),充分利用原生组件优化纹理加载链路(减少 2 次跨语言调用),加上 -O3 编译优化。最终从 11532 KB 降到 7448 KB。
Removed the Network module (libwebsockets / libjson / libuv), shortened the texture-loading call chain by two cross-language hops, and added -O3 compile flags. From 11,532 KB down to 7,448 KB.
CanvasBuffer 内存:从 2 份到 1 份
CanvasBuffer Memory: Two Copies → One
优化前所有 Canvas 操作存在 2 份 Texture2D 的内存开销,通过注册的回调函数跨语言发送 Buffer 数据。优化后改成 C++ 直接操纵 JS 引擎修改 Buffer 数据,仅留 1 份 Texture2D。
Previously every Canvas operation kept two copies of a Texture2D, shuttling buffer data across languages via callback. Now C++ writes directly into the JS engine’s buffer, leaving just one copy.
析构优化:4 步 + 2 个真实崩溃
Teardown: Four Steps + Two Real Crashes
析构是个被低估的复杂问题。Cocos 的设计上没考虑过销毁(毕竟 Cocos 本身就是个应用),GFX 只是容器中的一个类库。我们补充了 4 步析构流程:
Teardown is underestimated. Cocos isn’t designed to be destroyed (it expects to *be* the application); GFX is just one library within our container. We added a four-step teardown:
实战中有两个有代表性的崩溃:
Two representative crashes hit production:
案例 1:异步任务持有的 JSContext 在析构后变 null。某个网络请求回调时容器已经销毁,回调里还在用 JSContext。解决:对异步任务里持有的 JSContext 等对象增加保护,避免析构后异步任务出错。
Case 1: async tasks hold a now-null JSContext after teardown. A network callback fired after the container had been destroyed, still trying to use JSContext. Fix: guard JSContext (and friends) in async-held closures so post-teardown invocations fail safely.
案例 2:渲染循环仍在触发 JS 执行。析构期间渲染循环还没真正停下,触发了一次 JS tick,撞上正在销毁中的 JS 引擎。解决:先销毁渲染循环、再销毁 JS 引擎。注意要用标记位/事件等方式确认循环已销毁,不是简单的延时执行。
Case 2: render loop still firing JS during teardown. The loop hadn’t fully stopped, fired one more JS tick, and hit a half-destroyed engine. Fix: destroy the render loop before the JS engine, and confirm it has actually stopped via a flag/event — not a naive setTimeout.
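The ordering fix for Case 2 can be sketched as below. The classes are illustrative: in production the confirmation is an event the loop's thread signals, not a synchronous callback, but the invariant is the same — the engine is only destroyed after the loop has provably stopped.

```typescript
class Container {
  teardownLog: string[] = [];
  private loopStopped = false;

  stopRenderLoop(onStopped: () => void): void {
    // Production code signals the loop thread and waits on an event;
    // here the confirmation is immediate but still explicit — never a
    // naive timed delay.
    this.loopStopped = true;
    this.teardownLog.push("loop-stopped");
    onStopped();
  }

  destroyJsEngine(): void {
    if (!this.loopStopped) {
      throw new Error("render loop still live — a tick would hit a dead engine");
    }
    this.teardownLog.push("engine-destroyed");
  }

  teardown(): void {
    this.stopRenderLoop(() => this.destroyJsEngine());
  }
}
```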
Command Buffer:双 Buffer 设计减少 43.6% 内存
Command Buffer: Dual-Buffer Cuts 43.6% Memory
GFX C++ 化之后,JS 侧到 C++ 侧的通信开销成了新瓶颈。原生对象导出 JSObject 给 JS 侧使用,JS 调用时因为 JSCore 的线程安全机制会进行上锁,但目前业务是单线程,上锁和解锁属于冗余开销。
Once GFX moved into C++, JS↔C++ chatter became the new bottleneck. Native objects exported as JSObjects acquire a JSCore lock on every JS call for thread safety — but our workload is single-threaded, so that lock is pure overhead.
基于 TypedArray 实现的 Command Buffer 直接操作 JSCore 内存,无需调用访问,规避了 JSLock 开销。两个命令 Buffer (Float64Array) 占 56 字节。业务场景中每帧 GL 命令在 80~600 区间,最大内存占用约 14 KB。
A TypedArray-backed Command Buffer writes directly into JSCore memory, sidestepping the JSLock entirely. Two Float64Array command buffers occupy 56 bytes. In real workloads, GL commands per frame range 80–600, peaking around 14 KB.
优化前 Before:gl.drawArrays(0, 6) → acquire JSLock → C++ 执行 C++ executes → release JSLock
优化后 After:cmdBuf[i] = 23 · paramBuf[j..j+1] = [0, 6] → C++ 直接读 ArrayBuffer C++ reads the ArrayBuffer directly → 无 JSObject · 无 JSLock no JSObject · no JSLock
3.8 渲染时钟与 rAF 标准化 3.8 Render Clock & rAF Standardization
GFX 引入之后出现了一个意外问题:双渲染时钟。原本 JS 侧有一个 rAF 时钟,GFX 引入后又有了一个 C++ 侧渲染时钟,两个时钟同时跑:意外切主线程导致 Jank,意外切子线程导致 Crash。
GFX introduced a surprise: dual render clocks. The JS side had its own rAF clock; GFX brought a C++-side render clock; both ran concurrently. Accidental main-thread switches caused Jank; accidental worker-thread switches caused Crash.
解决方法是合并渲染时钟 —— 移除多余时钟,把移除的时钟逻辑通过新实现的 C++ 接口,把函数指针给到 C++ 侧代理执行。
The fix: merge the clocks — remove the redundant one, hand its logic to the C++ side via a new binding that takes function pointers and proxies execution.
顺手把 rAF 也标准化了:原本 Ejecta 的 rAF 用 setTimeout 0 模拟(不合规范),由 Native 维护 Timer 队列等待 vsync 信号才回调(链路太长)。改成:vsync 后直接触发 JS tick 调用,JS 侧自己维护 Timers 队列,移除了 JSBinding 中转开销。
We also standardized rAF along the way. Ejecta’s rAF was simulated with setTimeout 0 (out of spec), with Native maintaining a Timer queue and waiting for vsync to call back (too long a chain). Now: vsync triggers JS tick directly; the JS side keeps its own Timers queue; the JSBinding round-trip is gone.
setTimeout(0) 模拟 rAF,不是浏览器标准实现,行为不可预期。
Simulating rAF with setTimeout(0) isn’t the spec; behaviour isn’t reliable.
数据:FPS +10%,Small Jank ↓ 78.5%。
Data: FPS +10%, Small Jank −78.5%.
3.9 JS 执行效率:四招 3.9 JS Execution: Four Tactics
GFX 解决了「重渲染」场景,但「重逻辑、轻渲染」场景里,JS 解释执行仍然是瓶颈。我们做了 4 招:
GFX solved render-heavy scenarios, but in logic-heavy / render-light scenarios, interpreted JS was still the bottleneck. Four tactics:
避免多余 JSLock 调用Avoid redundant JSLock calls
详见 Command Buffer 一节 —— 把 GL 调用从 JSObject 调用改成直接读写 TypedArray。Covered in the Command Buffer section — turn GL calls from JSObject methods into direct TypedArray reads/writes.
JSON 序列化原生化Native JSON serialization
每次切 Tab 都要序列化 148 KB JSON,JSCore 同步阻塞约 100ms。改原生实现导出 JSExport 后,性能提升约 95%,单次 Tab 切换耗时降到 8ms 以内。Every Tab switch serialized 148 KB of JSON, blocking JSCore for ~100ms. Moving the implementation to native via JSExport sped this up by ~95%, bringing each Tab switch down to under 8 ms.
主动 + 被动 GC 调用Active + passive GC
被动:Helio.onMemoryWarning 协议,容器侧收到内存告警时清理主进程资源。主动:jsb.garbageCollect,业务在合适时机调用,清理长时间未使用的纹理缓存。Passive: Helio.onMemoryWarning — when the container gets a memory warning, clear main-process resources. Active: jsb.garbageCollect — game code calls at safe moments to drop long-unused texture caches.
Separate JS Thread
把 JS 执行放到独立线程上,让 JSVM 与 UI 线程解耦。落地中 —— 难点是 UI 相关接口需要切主线程、JSCallback 切回子线程,整体 binding 协议要重写。Move JS execution onto its own thread to decouple the JSVM from the UI thread. In progress — the hard part is that UI-touching APIs must switch to main and JSCallbacks must switch back, so the entire binding protocol needs a rewrite.
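The passive GC path (the third tactic) can be sketched as below. `Helio.onMemoryWarning` is the protocol named above; the texture cache and idle-time eviction policy here are invented for illustration.

```typescript
interface CachedTexture { lastUsed: number; }

// On a memory warning, drop textures that have sat idle too long.
function evictIdleTextures(
  cache: Map<string, CachedTexture>,
  now: number,
  maxIdleMs: number
): number {
  let dropped = 0;
  for (const [key, tex] of cache) {
    if (now - tex.lastUsed > maxIdleMs) {
      cache.delete(key); // Map iteration tolerates delete-during-iterate
      dropped++;
    }
  }
  return dropped;
}
```

The active path is the mirror image: game code calls the collector itself (`jsb.garbageCollect` in the text) at safe moments such as scene transitions, rather than waiting for the container's warning.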
3.10 粒子与骨骼:同样的「下沉 C++」 3.10 Particles & Spine: The Same “Lower into C++”
同样的「下沉 C++」思路也用在了两个具体的 JS 模块上:粒子(Particle)和骨骼(Spine)。
The same “lower into C++” idea applied to two concrete JS modules: Particles and Spine.
Particle:原 JS 实现里 render 是每帧调用的热路径。我们把 render 封装成中间件,移除 JS 模块,render 移到渲染时钟中每帧调用。原 JS 对象改 C++ 实现,调用改为 JSBinding。
Particle: in the original JS implementation, render was a per-frame hot path. We wrapped render as middleware, removed the JS module, and called render every frame in the render-clock loop. Original JS objects became C++ implementations; calls became JSBindings.
Spine(骨骼动画):基于开源 spine-runtimes 项目,把 TypeScript 模块迁到 C++。集成到渲染时钟中间件,替换 TS 模块为 C++ 模块并提供 JSBinding。优势是可以自主控制 Spine 版本并做二次优化(比如对不活动骨骼做屏蔽)。
Spine: built on the open-source spine-runtimes project, migrating the TypeScript module to C++. Integrated into the render-clock middleware; TS swapped for C++ + JSBinding. The bonus is autonomy over the Spine version and the freedom to do further optimizations (e.g. cull inactive bones).
3.11 灰度数据:最终战绩 3.11 Field Data: The Final Tally
把所有渲染层优化叠在一起,看灰度数据 —— 最直接的对比是同帧率下的卡顿表现:
Stacking every render-layer optimization, the field data tells a clean story — the most direct comparison is stutter behavior at equal frame rates:
最终的战绩对比 ——
The final tally ——
3.12 iOS JIT:那个绕不开的天花板 3.12 iOS JIT: The Ceiling You Can’t Avoid
但故事还没完。iOS 上 JSCore 不支持 JIT 是绕不开的天花板 —— 在 V8 benchmark 里,WKWebView(带 JIT)的 JS 性能比 JSCore 高出 4–16 倍:
The story isn’t over. On iOS, JSCore’s no-JIT ceiling is unavoidable. In V8 benchmarks, WKWebView (with JIT) runs JS 4–16× faster than JSCore:
| Benchmark | WKWebView (JIT) | JSCore (No JIT) | Ratio |
|---|---|---|---|
| Richards | 11,095 | 2,090 | 5.3× |
| Crypto | 37,000 | 2,313 | 16.0× |
| RayTrace | 16,748 | 2,835 | 5.9× |
| NavierStokes | 26,753 | 2,984 | 9.0× |
| DeltaBlue | 7,167 | 1,719 | 4.2× |
| Score (v7) | 16,750 | 2,314 | 7.2× |
解决思路上,业界目前有两条主流路径,我们都在探索:
There are two mainstream paths in the industry. We’re exploring both:
Separate JS Thread
在子线程创建 JSContext,让 JSVM 固定在子线程执行;UI 相关接口里切主线程;JSCallback 切回子线程。Create the JSContext on a worker thread so the JSVM is pinned there. UI-touching APIs switch to main; JSCallbacks switch back.
难点:整个 binding 协议要重写,UI 切换的边界要逐一审视。Hard part: entire binding protocol needs a rewrite; every UI-switching boundary needs review.
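The thread-switch dance can be modeled with two queues. This is a deliberately synchronous sketch of the protocol shape, not Helio's binding code; every name here is illustrative:

```javascript
// Minimal model of the "Separate JS Thread" protocol: two queues stand in
// for the two threads. UI-touching calls hop to the main queue; results
// hop back to the game queue as JSCallbacks.
const mainQueue = [];   // UI work posted to the main thread
const gameQueue = [];   // callbacks posted back to the JS (game) thread

function callUIBinding(api, args, jsCallback) {
  // Game thread never touches UI directly: enqueue for the main thread.
  mainQueue.push({ api, args, jsCallback });
}

function drainMainThread(uiImpl) {
  while (mainQueue.length) {
    const { api, args, jsCallback } = mainQueue.shift();
    const result = uiImpl[api](...args);        // runs on the UI thread
    gameQueue.push(() => jsCallback(result));   // switch back to game thread
  }
}

function drainGameThread() {
  while (gameQueue.length) gameQueue.shift()();
}

// Usage: game code asks for the screen size; the UI thread answers.
let seen = null;
callUIBinding('getScreenSize', [], (size) => { seen = size; });
drainMainThread({ getScreenSize: () => ({ w: 390, h: 844 }) });
drainGameThread();
console.log(seen); // → { w: 390, h: 844 }
```

The "hard part" in the card above is exactly this shape multiplied across the whole binding surface: every UI-touching API needs the first hop, every callback needs the second.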
用 WKWebView 提供 JIT,Helio 只做渲染WKWebView for JIT, Helio for rendering
利用 WKWebView 中 JSCore 的 JIT 能力。改 xhr 拦截方式处理原本的 JSBinding 逻辑。业界其他团队走的「计算渲染分离」也是相似思路 —— 把 JIT 交给系统浏览器内核,渲染交给客户端。Use the JIT-enabled JSCore inside WKWebView. Rework the original JSBinding logic as a JSBridge carried over intercepted xhr. Other industry teams’ “split compute and render” approach follows the same idea — give JIT to the system browser kernel, keep rendering on the client.
难点:JSBinding 改 JSBridge 工作量大;JSBridge 通道吞吐量需要保障。Hard part: JSBinding → JSBridge is significant work; JSBridge throughput must be guaranteed.
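A sketch of what the xhr-carried JSBridge could look like on the JS side. In the container the transport would be an XMLHttpRequest to a custom scheme that the native layer intercepts; here the transport is injectable so the shape is testable. All names are assumptions, not Helio's real API:

```javascript
// JSBridge over an interceptable transport. Each call gets a sequence id
// so the native side can route the response back to the right callback.
function makeBridge(transport) {
  let seq = 0;
  const pending = new Map();
  const bridge = {
    callNative(method, params, callback) {
      const id = ++seq;
      pending.set(id, callback);
      // e.g. xhr.open('POST', 'helio-bridge://call'); xhr.send(payload)
      transport(JSON.stringify({ id, method, params }));
    },
    // Native side resolves the call by invoking this with the response.
    onNativeResponse(json) {
      const { id, result } = JSON.parse(json);
      const cb = pending.get(id);
      pending.delete(id);
      cb(result);
    },
  };
  return bridge;
}

// Fake transport: echoes the method name back, standing in for the
// native URL-scheme handler.
const bridge = makeBridge((payload) => {
  const { id, method } = JSON.parse(payload);
  bridge.onNativeResponse(JSON.stringify({ id, result: `ok:${method}` }));
});
let out = null;
bridge.callNative('vibrate', { ms: 20 }, (r) => { out = r; });
console.log(out); // → ok:vibrate
```

The throughput concern in the card above lives in `transport`: every call serializes to JSON and crosses the xhr interception boundary, which is far more expensive per call than a direct JSBinding.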
渲染这一章到此为止。下面两章短一些 —— 第 4 章是业务视角,第 5 章是收获与展望。
That ends the rendering chapter. The next two are shorter — chapter 4 is the business view, chapter 5 is takeaways and what’s next.
业务视角:怎么用 Helio From the Business Side: Using Helio
前面三章是「Helio 怎么炼成的」 —— 引擎视角。这一章换到业务方视角:拿到 Helio 之后怎么接入、怎么调试、出问题之后怎么定位。短一些,但是日常用得最多的那部分。
The first three chapters were “how Helio was built” — the engine view. This chapter switches to the business view: how to integrate it, how to debug it, how to root-cause production issues. Shorter — but the part teams use every day.
4.1 构建模版与加载流程 4.1 Build Template & Load Sequence
业务方接入 Helio 的成本被压到很低。一份标准的构建模版,几个固定的入口文件,剩下都交给 CI 完成。结构如下:
Integration cost was pushed down hard. One standard build template, a few fixed entry files, the rest handled by CI. The shape:
game.js 加载 adapter + cocos sdk + 业务 SDK + cocos 游戏入口 main.js,全程异步、业务无感。
Use whatever mainstream mini-game CI your game framework already ships. Under streaming, game.js loads adapter + cocos sdk + business SDK + cocos entry main.js — all async, all invisible to the game code.
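The load sequence above can be sketched as a simple async chain. `loadFile`, `evalScript`, and the file names are stand-ins, not the container's real entry points:

```javascript
// Hypothetical sketch of the boot sequence game.js drives: each stage is
// pulled through the streaming loader and evaluated in dependency order,
// with no involvement from game code.
async function boot(loadFile, evalScript) {
  const stages = ['adapter.js', 'cocos-sdk.js', 'business-sdk.js', 'main.js'];
  const loaded = [];
  for (const file of stages) {
    const source = await loadFile(file); // streamed from cache or network
    evalScript(source);                  // evaluate before the next stage
    loaded.push(file);
  }
  return loaded;
}

// Usage with a fake loader and evaluator:
boot(async (f) => `/* bytes of ${f} */`, () => {})
  .then((files) => console.log(files.length)); // → 4
```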
4.2 JSContext 上的对象设计 4.2 What Lives on the JSContext
JSContext 上的对象做了明确的分层:业务侧只需要关注 Helio 协议对象,其他对象按需访问。
Objects on the JSContext are explicitly layered. Business code only needs the Helio protocol objects; everything else is reachable on demand.
- wx.* —— 所有业界主流小游戏协议方法,业务侧主要使用
- wx.* — the full industry-standard mini-game protocol surface, the main business-side touchpoint
- env —— 配置信息(设备、版本、环境变量)
- env — config info (device, version, environment variables)
- loading —— 控制 Loading 模版
- loading — controls the Loading template
- event —— 内建事件机制,支持自定义事件
- event — built-in event system, custom events supported
- callNative —— 对宿主 JSBridge 的桥接方法
- callNative — bridge into the host’s JSBridge
- native —— 原生方法的 JS 导出
- native — JS exports of native methods
- loadFile —— 流式加载入口,业务无感
- loadFile — streaming-load entry, invisible to business code
GFX C++ 扩展绑定的对象(仅在 cocos 初始化创建前向渲染管线时用到,之后转交给 window.cc 维护)和 Ejecta 渲染引擎绑定的内置类,业务侧基本不需要关注。
The GFX C++ extension bindings (only used when Cocos initializes the forward render pipeline, then handed off to window.cc) and Ejecta’s built-in classes are not meant for business consumption.
引擎内部对象 Engine-internal objects:
- window.gfx —— GFX C++ 扩展绑定 / GFX C++ extension bindings
- window.cc —— cocos 引擎对象 / cocos engine objects
- Ejecta.* —— Ejecta 内置类 / Ejecta built-in classes
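As one concrete example of this surface, the built-in `event` object described above behaves roughly like a plain event bus. This sketch assumes the API shape rather than quoting Helio's actual signatures:

```javascript
// Minimal event bus of the kind the `event` protocol object exposes:
// built-in events and custom events share one on/emit surface.
class EventBus {
  constructor() { this.handlers = new Map(); }
  on(name, fn) {
    if (!this.handlers.has(name)) this.handlers.set(name, []);
    this.handlers.get(name).push(fn);
  }
  emit(name, payload) {
    for (const fn of this.handlers.get(name) || []) fn(payload);
  }
}

const event = new EventBus();
let warned = 0;
event.on('memoryWarning', () => { warned++; });  // container built-in
event.on('game:levelUp', (lv) => { /* custom business event */ });
event.emit('memoryWarning');
console.log(warned); // → 1
```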
4.3 三类问题的归因路径 4.3 Three Categories of Problems, Three Triage Paths
线上问题分三类:本地能复现的、用户反馈的、隐藏的没有反馈的。三类的处理路径完全不同 ——
Production issues come in three flavors: locally reproducible, user-reported, and silent / unreported. Each has a different triage path:
本地可复现Reproducible locally
最幸运的一类 —— 直接走断点流程The lucky case — go straight to breakpoint flow
- 本地开发服务(npm run dev)
- Local dev server (npm run dev)
- Android: adb forward + Chrome devtools
- iOS: Safari Inspector + Xcode 混编调试
- iOS: Safari Inspector + Xcode mixed-build debugging
- 真机连接 → 启动端口转发
- Connect the device → start port forwarding
- 输入游戏 ID → 点击启动
- Enter the game ID → tap launch
- 在 5 层栈任意位置打断点
- Set breakpoints anywhere in the 5-layer stack
用户反馈的User-reported
用户报上来但本地复现不出来 —— 走日志流程User reports come in but local repro fails — go to the log flow
- 被动获取:用户附件日志(火眼 / 树洞)
- Passive: logs attached by the user (via the Huoyan / Shudong feedback tools)
- 主动获取:根据 uin 在 wns 平台捞
- Active: pull logs by uin from the WNS platform
- 日志分类:Helio 容器日志 / C++ 侧 / JS 侧
- Log categories: container / C++ / JS
- 搜索游戏关键词向下检索
- Search for the game keyword, then read downward from the first hit
- 核心看「framework」tag 的日志
- Focus on the “framework”-tagged lines
- 交叉对比 JS Console.warn 与 GL 错误
- Cross-reference Console.warn with GL errors
没有反馈的Silent / unreported
最难的一类 —— 用户已经流失但没说话。靠数据归因。The hardest — users churned without saying anything. Attribution via data.
- 上报数据 + 数据描述(mean / 75% / std / 50%)
- Telemetry + descriptive stats (mean / 75% / std / 50%)
- 散点图、饼图、聚类
- Scatter plots, pie charts, clustering
- 分维度:平台 / 系统 / 机型天梯分
- Dimensions: platform / OS / device tier
- 定量随机取样 3-5 万条
- Quantitative random sample, 30K–50K rows
- 先看数据描述常有意外收获
- Start from the descriptive stats — they often yield surprises
- 锁定特定维度后再做对照
- Lock a dimension, then run comparisons
4.4 案例 1:卡顿率分析(3 步定位) 4.4 Case 1: Stutter-Rate Analysis (3 Steps)
第三类「没有反馈的问题」最考验功夫。讲两个真实案例。第一个案例:iOS 外网平均卡顿率 Helio 高于 WebView,与实验室数据不符 —— 怎么办?
The third category — silent issues — is where the hard work lives. Two real cases. Case one: iOS field stutter rate for Helio was higher than WebView, contradicting lab data — what now?
预备:定量且随机取样Prep: quantitative random sampling
取 3 万条随机数据上报。如果数量不够,凑多天。先看各指标的数据描述 —— 帧率、卡顿率、DrawCall、机型、系统。Pull 30K random telemetry rows. If volume is insufficient, span more days. Start from descriptive stats — FPS, stutter rate, DrawCall, device, OS.
聚类:锁 FPS × Stuck 维度绘图Cluster: lock FPS × Stuck and plot
绘制散点图(结构跟第 3 章 FIG 34 一样)。结果一眼看到:Helio 表现更合理、更集中;WebView 散乱。Build a scatter plot (same shape as FIG 34 in chapter 3). The eye sees it instantly: Helio clusters tightly; WebView scatters.
结论:WebView 遇到卡顿更容易降到 30 帧Conclusion: WebView is more likely to drop to 30fps under stress
WebView 在卡顿时大幅降帧到 30,从而规避了很多卡顿场景的样本(变成「掉帧」而非「卡顿」)。Helio 始终保持 60,反而把所有卡顿都暴露在统计里。这不是 Helio 更卡 —— 是 Helio 更诚实。When stressed, WebView drops to 30fps — which dodges many stutter samples (they become “dropped frames” instead of “stutters”). Helio sticks at 60, which means every stutter shows up in the stats. Helio isn’t more janky — Helio is more honest.
这个案例的价值不只是结论本身,更是结论的「反直觉」 —— 监控数据看似变差,但用户体验实际更好。如果不做归因分析,很容易被表面数据带偏,做出错误的优化决策。
The value here isn’t just the conclusion — it’s the counterintuitive shape of it. The metric got worse on paper, but actual user experience improved. Without doing the attribution work, the surface number could push you toward a wrong optimization decision.
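The "start from descriptive stats" prep step in this case can be sketched in a few lines; the metric values below are made up for illustration:

```javascript
// Descriptive stats over one telemetry column: mean / std / p50 / p75.
function describe(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const n = sorted.length;
  const mean = sorted.reduce((s, v) => s + v, 0) / n;
  const std = Math.sqrt(sorted.reduce((s, v) => s + (v - mean) ** 2, 0) / n);
  const pct = (p) => sorted[Math.min(n - 1, Math.floor((p / 100) * n))];
  return { mean, std, p50: pct(50), p75: pct(75) };
}

// Illustrative FPS samples: mostly 60, with a cluster dropped to ~30.
const fps = [60, 60, 58, 30, 60, 59, 60, 31];
const d = describe(fps);
// A mean far below the median is the first hint of a bimodal 30-vs-60 split.
console.log(d.mean, d.p50); // → 52.25 60
```

In the case above, exactly this kind of gap between summary stats is what motivated the scatter-plot clustering in step two.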
4.5 案例 2:转化率分析(猜想 → 验证) 4.5 Case 2: Conversion Analysis (Hypothesis → Verification)
第二个案例更复杂。背景:iOS 早期转化率比 Android 低 10% 左右,但 iOS 加载耗时比 Android 快 30%。「更快但是转化更差」 —— 这中间一定有问题。
Case two is messier. Background: early iOS conversion was ~10% below Android, even though iOS load time was 30% faster. “Faster but worse conversion” — something didn’t add up.
猜想:加载耗时只有进入之后才上报。如果用户在加载过程中遇到问题主动退出,加载耗时这个指标看不到。会不会就是这种情况?
Hypothesis: load-time telemetry only fires after entry. If users hit problems mid-load and bail, that drop-off doesn’t show up in load-time numbers. Could that be it?
三轮聚类分析 ——
Three rounds of clustering ——
聚类 1(uin 个例分析):捞了几个个例,发现 iOS 用户出现这个问题之后,会高概率复现,难以进入游戏。
Cluster 1 (per-uin): sampled individual users — those who hit the issue on iOS reproduced it consistently, with no path forward into the game.
聚类 2(加载失败 vs 实时上报维度):Android 加载失败率 4%(约等于实时上报的 3.6%);iOS 加载失败率只有 0.5%(远低于实时上报的 12.6%)。结论:用户在 iOS 下会遇到卡住,且不会有 JS 报错。
Cluster 2 (failure rate vs telemetry): Android failure rate 4% (matching the 3.6% real-time drop-off). iOS failure rate just 0.5% (far below the 12.6% drop-off). Conclusion: iOS users were getting stuck, with no JS error firing.
聚类 3(按加载进度分布):iOS 大部分卡在了 87% 进度。
Cluster 3 (by load progress): the bulk of iOS stuck users sat at exactly 87% progress.
结论:业务侧 87% 函数有问题,转业务侧定位修复。这个分析价值在于:从一个看起来抽象的「转化率差距」,一路追到具体到哪个加载进度的哪个函数。Helio 提供完备的上报和工具,让这种归因成为可能。
Conclusion: a function called at 87% load progress on the business side was broken — handed off for product-side fix. The value here is the trace path: from an abstract “conversion gap” all the way down to a specific function at a specific load percentage. Helio’s telemetry and tooling are what make that possible.
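Clustering round 3 is essentially a group-by on load progress over sessions that never entered the game. The field names below are illustrative, not the real telemetry schema:

```javascript
// Bucket stuck sessions by reported load progress; the dominant bucket
// marks the stall point (87% in the case above).
function bucketByProgress(sessions) {
  const buckets = new Map();
  for (const s of sessions) {
    if (!s.entered) {   // only sessions that never made it into the game
      buckets.set(s.progress, (buckets.get(s.progress) || 0) + 1);
    }
  }
  // Sort buckets by count, descending.
  return [...buckets.entries()].sort((a, b) => b[1] - a[1]);
}

const sessions = [
  { progress: 87, entered: false },
  { progress: 87, entered: false },
  { progress: 42, entered: false },
  { progress: 100, entered: true },
  { progress: 87, entered: false },
];
console.log(bucketByProgress(sessions)[0]); // → [ 87, 3 ]
```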
4.6 Helio 在 GameSDK 中的位置(呼应第 1 章) 4.6 Helio’s Place in GameSDK (echoing Chapter 1)
回到第 1 章那张生态图。业务方接的是 GameSDK,不是 Helio。GameSDK 已经覆盖 10+ 款游戏,业务方接入耗时从 10 天降到 3 天。这个数字怎么来的?因为 Helio 把脏活全包了:
Back to that ecosystem diagram in Chapter 1. Games integrate with GameSDK, not Helio. GameSDK already covers 10+ titles; integration time per game dropped from 10 days to 3. How? Because Helio absorbed the dirty work:
- 业务方不用考虑端能力差异(几个不同宿主 App 各家 bridge 协议不同)
- Business teams don’t have to handle host-app capability differences (several host apps each ship their own bridge protocol)
- 业务方不用关心运行时优化(粒子 / 骨骼 / GL 调用)
- Business teams don’t have to worry about runtime tuning (particles / bones / GL calls)
- 业务方不用维护本地开发链路(adb / Safari / Xcode 调试栈打通了)
- Business teams don’t have to maintain a local-dev pipeline (adb / Safari / Xcode debug stacks already stitched together)
- 业务方不用搭建上报与监控(Helio 框架自带)
- Business teams don’t have to build telemetry & monitoring (it’s in the framework)
业务方写完游戏,挂上 GameSDK 的几个钩子,剩下的事 Helio 管。从他们的视角看,Helio 是「不用知道存在的那一层」 —— 这是底座的最高赞誉。
Game teams ship the game, wire up a few GameSDK hooks, and let Helio handle the rest. From their seat, Helio is “the layer you don’t need to know exists” — about the highest compliment a foundation layer can earn.
收获与展望 Takeaways & Future
前面四章讲了 Helio 是怎么炼成的。最后这一章讲讲它带给我们什么样的复利、以及还能往哪里走。
The previous four chapters covered how Helio was built. This last one is about what it compounds into — and where it goes next.
5.1 类库封装与推广 5.1 Library Extraction & Adoption
Helio 在被验证可行之后,开始把核心能力沉淀成可复用的类库,让别的容器也能受益:
Once Helio proved viable, we began extracting its core capabilities into reusable libraries so that other containers could benefit too:
- GFX 类库:支持各类容器接入,把「500 → 37 GL 指令」这套优化做成开箱即用的可选模块
- GFX library: drop-in for any container — packaging the “500 → 37 GL commands” optimization as an opt-in module
- JSAdapter:支持容器自定义适配层,把「适配层抽象」这件事开放出去
- JSAdapter: container-specific adapter layer support, opening up the “adapter abstraction” pattern itself
- GameSDK:建设游戏统一容器,让多 App、多游戏、多端的接入成本进一步降低 [DOING]
- GameSDK: a unified game container across host apps, games, and platforms — driving integration cost still lower [in progress]
- 流式加载推广:把第 2 章那条优化路径做成可被其他容器借用的标准能力
- Streaming-load adoption: package the chapter 2 path as a standard capability other containers can pick up
5.2 游戏引擎架构抽象(远期) 5.2 Game Engine Architecture (Long-Term)
远一点看,Helio 容器还有一个更大的方向:把所有模块下沉 C++ 实现,交叉编译出双端动态库。
Looking further out, Helio has a bigger ambition: lower every module into C++, cross-compile into dynamic libraries for both platforms.
指导原则:可以 C++ 实现的模块只实现一次(双端共享),不能 C++ 实现的能力则按 C++ 统一协议在双端各自实现。
Guiding principle: anything that can be C++ is implemented once (shared across both platforms); anything that can’t goes through a unified C++-style protocol with native implementations on each side.
这件事如果能做成,会带来三个收益:
If we land this, three returns follow:
- 解决双端协议对齐问题(很多线上 Bug 来自双端不一致)
- Cross-platform protocol parity (a lot of production bugs come from drift between the two)
- 可维护性提升,减少重复实现
- Better maintainability — less duplicated implementation
- 对接其他业务/容器更友好
- Easier integration into other businesses and containers
这是一个 18 个月以上的工程,但方向已经定下来了。
An 18-month-plus project, but the direction is fixed.
5.3 性能优化方法论:平衡的艺术 5.3 The Methodology: It’s a Balancing Act
一年的工程下来,最大的收获不是某一个具体的优化,而是对「权衡」的体感。性能优化本质上不是「做加法」 —— 加越多优化代价越大;它是一门取舍的艺术。我们碰到的取舍主要有三组:
A year in, the biggest takeaway isn’t any single optimization — it’s the visceral sense of trade-off. Performance optimization isn’t additive — every added optimization comes with a cost. It’s the art of choosing what to give up. We hit three pairs of trade-offs repeatedly:
硬件瓶颈是天然的。任何一种资源被压满,必须把负载转移到另一种资源上 —— 没有「全都更省」的方案。Hardware bottlenecks are inherent. When one resource saturates, the load has to move elsewhere — there’s no “cheaper across the board” option.
严守 Web 规范保证了通用性,但同时关上了某些性能优化的门。我们多次在「合规」和「快」之间做选择 —— 比如标准 rAF。Strict Web compliance keeps things universal, but also closes certain optimization doors. We picked between “to spec” and “to fast” many times — standardized rAF was one.
下沉 C++ 性能更好,但 C++ 的开发与调试体验远不如 JS。我们要的是两者都好 —— 这就是 Helio 在调试体系上花大量精力的原因。C++ runs faster; but C++ dev/debug feels nowhere as good as JS. We wanted both — which is why Helio invested so heavily in the debug stack.
三组权衡里,没有一组有「绝对正确」的答案。所谓工程哲学,就是在每个具体场景下,明确地知道自己在为什么取舍、为什么放弃 ——「哦,我这次选性能放弃了规范,是因为业务方就这一个引擎,不需要通用」。把「为什么」写下来,比「做什么」更重要。
None of these have a single right answer. Engineering philosophy is, at every concrete decision, knowing exactly what you’re trading for and why — “here I picked performance over the spec, because this team uses just one engine and doesn’t need universality.” Writing the why down matters more than the what.
5.4 未来方向 5.4 What’s Next
最后是 Helio 接下来要做的事。每一项都对应前面文章里某条没拉满的线 ——
Finally, Helio’s near-term roadmap. Each item corresponds to a thread from earlier in this article that isn’t yet fully pulled:
iOS JIT 模式iOS JIT mode在做in flight
第 3.12 节展开过的两条路 —— Separate JS Thread + WKWebView 借位。最终形态可能是两者的组合。The two paths from §3.12 — Separate JS Thread plus WKWebView-borrowed JIT. The final shape will likely combine both.
WASM 能力WASM support规划中planned
如果未来引入重度 wasm 品类游戏(不是当前的 JS 游戏),需要补齐 wasm 运行时与函数级分包能力 —— 第 2.7 节提到的那条路就要走起来。If we onboard heavy wasm titles (different from today’s JS games), we’ll need a wasm runtime plus function-level splitting — the second path described in §2.7.
HotReload 能力HotReload提开发体验DX upgrade
第 1.7 节说「研发体验是另一半」。HotReload 是把开发体验再往前推一档 —— 改完代码不用重启容器,秒级看到效果。§1.7 said “dev experience is the other half.” HotReload pushes that experience another notch — change code, no container restart, see the result in seconds.
容器全面 C++ 化Full C++-ification of the container远期long-term
5.2 节描述的远期方向。一旦做成,Helio 就不只是一个 iOS / Android 容器,而是一套能被其他业务/团队接入的标准化游戏运行时。The long-term direction from §5.2. Once landed, Helio stops being “an iOS/Android container” and becomes a standardized game runtime that other businesses and teams can integrate.
2023 到 2024,一年的工程,1.4 万字的复盘可以收尾了。如果这篇文章让你对「容器层做什么」「为什么这么做」「能做到什么程度」有了一些更具体的概念,那它的目的就达到了。
2023 to 2024, one year of work, a ~14,000-character retrospective — time to wrap. If this article made “what a container layer actually does,” “why these choices,” and “how far it can go” feel a little more concrete to you, it’s done its job.
技术没有银弹,工程是一连串取舍的艺术。能做的,是把每一次取舍的「为什么」都讲清楚。
There’s no silver bullet — engineering is a chain of trade-offs. The most we can do is say why each one was made.