ursb.me / notes
Case Study · Mini-Game Container

How to Build a
High-Performance Mini-Game Container

— The Evolution & Tech Behind Helio

From 2023 to 2024, in a single year, a cross-platform mini-game container walked one path: first prove it runs, then prove it runs faster. This is the engineering retrospective in full.

First Paint: 5.2s (↓ 83.8%)
Big Jank vs WebView: 4.8× smoother
Conversion: +4% · D1 retention: +3%
Airing · Interactive Games · 2024 · ~28 min read
Chapter 01

Why Not WebView Anymore

The WebView is the default carrier for cross-platform Web. But once the workload shifts from rendered pages to real-time games, structural limits start to show. This chapter is about why a new container was needed in the first place.

1.1 What the Business Actually Looks Like

What we build is concrete: mini-games shipped inside our suite of music apps. Most are casual interactive titles in the match-3 family, driving acquisition, retention, and event conversion.

The business runs on the order of 100k DAU, the majority of it on Helio. The portfolio spans 10+ titles, each its own Cocos project, written in JS + WebGL and packaged much like industry-standard mini-games. All numbers in this article come from this real production environment.

1.2 We’re Not the Only Ones Doing This

Zoom out and the whole mini-game industry is shifting toward heavier genres. From 2017’s “Tap to Jump” era of light casuals, to recent waves of SLG, card battlers and even FPS / MMO titles, the workload that platforms and containers must absorb has stepped up every two or three years.

FIG 01 The Mini-Game Industry’s Heavy-Genre Timeline
2017 — Light casuals (the Tap-to-Jump era)
2019–2020 — Mid-weight genres (card games / sims)
2022 — SLG / card battlers top the charts
2023+ — Heavy genres (FPS / MMO arrive)

The implication is clear: investing in container-side performance is necessary, not over-engineering. Major platforms across the industry — cross-platform engine vendors, social platforms, game studios — have all been pouring effort into the container layer. Later chapters will show that on some points we converged with them, and on others we deliberately took a different road.

1.3 The Three Original Sins of Running Mini-Games on WebView

Back to our own decisions. The first version used a plain WebView, proven over years of the Web era. But once you put a real-time rendered game on top, three structural problems become unavoidable:

FIG 02 The Three WebView Pains
01 Poor Experience — A 60fps target routinely dropped to 30fps; textures looked blurry; container cold-start to first paint often took 30+ seconds.
02 High Resource Cost — CPU and memory both ran hot. Devices entered thermal throttling quickly, which dragged the frame rate further down — a negative feedback loop.
03 Hard to Debug — The WebView is a black box; versions are fragmented across devices; source maps don’t survive in the wild; production crashes and stutter are hard to root-cause.

Any one of these in isolation is survivable. Combined, they amplify each other — stutter triggers throttling; throttling worsens stutter; user reports flood in but nothing can be traced. Breaking the loop required a new foundation.

1.4 Three Candidates: Why Brush Renderer Won

With WebView ruled out, we narrowed the field to three categories:

  • Engine-specific native pipelines — e.g. Cocos’s built-in native renderer, bound to a single engine
  • Vendor mini-game SDKs — pre-packaged containers, drop-in but with little deep control
  • Brush Renderer — a lightweight OpenGL-based renderer designed to mirror WebGL semantics, integrating cleanly with Cocos and other engines

In raw performance, Brush Renderer and CocosNative ran essentially neck-and-neck: 59.3 vs 60 fps, 24% vs 24.8% CPU, 103MB vs 110MB memory (same demo, Oppo chp1969). But across five composite dimensions, Brush Renderer pulled ahead on extensibility and universality — the two dimensions that determine whether one container can serve many games across many host apps.

FIG 03 Five-Dimension Radar of the Three Candidates
Dimensions: industry adoption · performance · control · extensibility · universality
Series: Brush Renderer (chosen) · CocosNative · Vendor SDK

1.5 But Brush Renderer Wasn’t the End Either

Picking Brush Renderer didn’t close the case. Once it shipped, three new problems showed up almost immediately:

  1. Complexity and inconsistency — every new game hit its own integration pitfalls; no unified abstraction
  2. Missing ecosystem — no local dev workflow, no breakpoint debugger, no CI/CD; developer experience regressed
  3. Performance still capped — neither rendering nor execution had reached its ceiling

Each problem mapped to a solution. Stacked together, the three form Helio’s initial architecture: an adapter layer to absorb differences, C++ modules to absorb performance bottlenecks, and a full local-dev pipeline to close the loop.

FIG 04 Helio’s Layered Architecture
Game Layer — Cocos / Babylon / Pixi games
Service Layer — Foundation: FileManager, AudioEngine, Network, BundleLoader · Platform: UserInfo, Ads, Pay, Share
Engine Layer — Binding: JSBinding, WXBinding, GFXBinding · ScriptEngine: V8, JSCore · Renderer: Brush Renderer, GFX, OpenGL ES3
Platform Layer — Android, iOS

1.6 Helio’s Place in the Interactive-Game Ecosystem

Looking at Helio alone risks treating it as just another reinvented wheel. The picture sharpens once you put it inside the broader interactive-game ecosystem — it’s a foundation, not a tool:

FIG 05 Our Interactive-Game Ecosystem at a Glance
Games — Match-3 · Party · Levels · Lottery · 6+ more
Unified SDK — GameSDK: lobby / settlement / UI / mic / host shims
Container — Helio
Host Apps — five music apps
Games integrate with GameSDK, not Helio. The SDK orchestrates business flows; Helio owns the runtime and performance. One game codebase runs across many host apps.

1.7 Ecosystem Building: Develop → Debug → Release → Operate

The other half of the engineering philosophy: performance is only half of Helio. The other half is developer experience. An engine that runs fast isn’t enough if the team can’t ship fast. We worked the four quadrants along the full game-development lifecycle:

01 Develop — a dev experience on par with the Web
  • Aligned with Web standards to lift developer efficiency
  • Full industry-standard mini-game protocol support
  • Necessary browser capabilities included

02 Debug — build efficiency matching the Web
  • Full CI/CD pipeline
  • Complete breakpoint debugging
  • Full JSBridge surface

03 Release — custom experiments and fast rollout
  • Business-side + framework-side + release-config system
  • Custom container capabilities
  • Browser black-box variance removed

04 Operate — high-quality production operations
  • Full telemetry and dashboards
  • JS-side and OC-side log proxies
  • Android crash rate held at 0.0015%
FIG 06 Android Crash Rate Visualized: 1 Crash per 67,000 Sessions
0.0015% — Android production crash rate, ≈ 1 crash per 67,000 sessions
The grid shows 10,000 dots with 1 red — production is 6.7× sparser still.
This number wasn’t produced in isolation — it sits on top of every optimization described in this write-up: architecture, loading, rendering, and the debug system.

These four quadrants aren’t a checklist of independent items — they form a loop. If any one breaks, the chain breaks. The 0.0015% crash rate looks like an operations number, but it’s unreachable without strong dev experience, working breakpoints, and a reliable CI.

That covers the foundation. The next three chapters dig into how Helio got fast: loading, rendering, and business integration.

Chapter 02

Load Optimization: From 31 Seconds to 5

How long can the first cold start of a match-3 game take, all the way from install to interactive? Our earliest number was 31 seconds. One year later, it was 5. This chapter walks through how we got there.

2.1 The Startup Pipeline: Replacing Callback Chaos with a State Machine

The loading sequence is more intricate than it looks. Container init, resource download, package unzip, JS bootstrap, engine startup, all the way to the first interactive frame — nine critical events along the way. Stall any one of them and the entire chain stalls.

The first version was a tangle of callbacks and flags. Adding error handling broke things; adding telemetry meant editing five places at once; every new game forced us to rewrite the integration. We rebuilt it as a state machine: explicit states, explicit triggers, explicit transitions. The nine events (4 on the JS side, 5 on the NA side) map precisely to its edges. When something stalls, you read which state you’re in — that tells you exactly which log to look at.

FIG 07 The Game Startup State Machine (9 events)
Phases: INIT → PROCESSING → TERMINAL
01 Init (CONTAINER_START, NA) → 02 Env Ready (ENV_READY, NA) → 03 Download (DOWNLOAD, NA) → 04 Unzip (UNZIP, NA) → 05 JS Load (JS_LOADED, JS) → 06 Engine Up (ENGINE_READY, NA → JS) → 07 First Paint (INTERACTIVE, JS)
ERROR STATE: any step fails → telemetry + fallback + retry
9 events (NA: 5, JS: 4) · each edge = 1 telemetry event + 1 log marker · a stall = look at the stuck state and debug from there

2.2 Subpackage Loading: First-Paint Bundle 78MB → 25.5MB

The first wave borrowed the industry-standard mini-game subpackage pattern: split one big bundle into a main package plus several subpackages, then defer-load each one to the earliest moment it’s actually needed. The framework exposed loadSubpackage; the product team carved each game into six packages:
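The “earliest needed moment” idea can be made concrete with a small sketch. The trigger names here are illustrative (not a real Helio API), and the sizes are the rounded per-package figures from FIG 08, so the first-paint total lands near the reported 25.5 MB:

```typescript
// Map each subpackage to its earliest-needed trigger; only "first-paint"
// packages gate the critical path. Everything else goes through the
// framework's loadSubpackage lazily when its trigger fires.
type Trigger = "first-paint" | "post-first-paint" | "on-tab" | "on-level";

const packages: Array<{ name: string; mb: number; when: Trigger }> = [
  { name: "main",        mb: 10.1, when: "first-paint" },
  { name: "res-shared",  mb: 3.0,  when: "first-paint" },
  { name: "lobby-main",  mb: 12.1, when: "first-paint" },
  { name: "lobby-extra", mb: 34.9, when: "post-first-paint" },
  { name: "tab-res",     mb: 0.3,  when: "on-tab" },
  { name: "game-bundle", mb: 17.4, when: "on-level" },
];

// Total megabytes that must arrive before the first interactive frame.
function firstPaintMb(): number {
  return packages
    .filter(p => p.when === "first-paint")
    .reduce((sum, p) => sum + p.mb, 0);
}
```

The split itself is the optimization: the download schedule falls out of the trigger column, and anything not tagged "first-paint" stops blocking startup.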

FIG 08 Subpackage Structure & First-Paint Dependencies
Original monolith: all resources loaded together (77.8 MB)
  main 10.1 · res-shared 3.0 · lobby-main 12.1 · lobby-extra 34.9 · tab-res 0.3 · game-bundle 17.4 (MB)
After subpackaging: only the first three are needed before first paint (25.5 MB)
  Required before first paint: main 10.1 · res-shared 3.0 · lobby-main 12.1
  Deferred: lobby-extra 34.9 (post first paint) · tab-res 0.3 (on tab) · game-bundle 17.4 (on level)
Cutting first-paint dependencies from 77.8 MB to 25.5 MB is the root cause behind the 67.7% improvement.

FIG 09 First Paint: Monolith vs Subpackages (Android, weak network)
Monolith: 31.0s · Subpackages: 10.0s — 67.7% faster

2.3 Where Subpackaging Hits the Ceiling

Subpackaging cut first paint to 10 seconds, then four structural problems surfaced:

  1. The framework can’t control how packages get split. Whether the cuts are right depends on each team’s judgment; the framework can only watch.
  2. Splits decay over time. Every business iteration risks invalidating prior cuts, forcing repeated re-evaluation.
  3. The load chain is structurally lagged. You must download → unzip → loadFile, with multiple cross-language hops; resources can’t be touched until the package is fully prepped.
  4. Updates are user-visible. Every version forces a re-download; the more frequently a game iterates, the worse it gets.

Squeezing out more required a different idea.

FIG 10 The Subpackage Ceiling: A Wall of Four Bricks
↑ Direction of further optimization
BRICK 01 — Framework can’t control how teams split packages
BRICK 02 — Splits decay over time as iterations pile up
BRICK 03 — Load chain is structurally lagged
BRICK 04 — Updates are user-visible every time
CEILING — Subpackaging’s performance curve tops out here; marginal gains approach zero.
↓ Time to switch approach: streaming load

2.4 Streaming: Every Resource Is Its Own Stream

The streaming idea: stop downloading whole packages. Instead, treat every static resource — image, font, audio, JSON, JS — as an independently streamable unit.

Three implementation layers:

  1. Intercept download: in the JS adapter layer, erase the “download” concept entirely and replace it with on-demand streaming fetch.
  2. Per-type load implementations: Image overrides its src setter; Font patches the loadFont binding; Audio uses a Proxy for synchronous semantics; Spine goes through remoteBundle.
  3. Rebuild loadFile natively: keep the synchronous require API, add an asynchronous loadFile, and cache lookups in the adapter layer.

That last point is the linchpin — game code has no idea whether a resource was streamed or pre-downloaded. Invisibility is the precondition for streaming to work at all.
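The “setter override” tactic for images can be sketched as follows. This is a simplified illustration, not Helio’s real adapter code: the streaming fetcher and class shape are assumptions, and real code would also handle decode and error paths.

```typescript
// A DOM-Image-like object whose `src` setter keeps the synchronous surface
// the game expects, while the body kicks off an async streamed fetch.
type FetchStream = (url: string) => Promise<Uint8Array>;

class StreamedImage {
  onload: (() => void) | null = null;
  decoded: Uint8Array | null = null;
  private _src = "";

  constructor(private fetchStream: FetchStream) {}

  // Setter override: assigning img.src looks synchronous to game code...
  set src(url: string) {
    this._src = url;
    // ...but the bytes arrive later and are back-filled, then the
    // standard onload callback fires, exactly as on the Web.
    this.fetchStream(url).then(bytes => {
      this.decoded = bytes;
      this.onload?.();
    });
  }
  get src() { return this._src; }
}
```

The same shape repeats for each resource type: keep the Web-standard surface, hide the streaming behind it.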

FIG 11 Streaming Load: Three Layers × Five Resource Types
L1 — Intercept download: in the JS adapter layer, erase the “download” concept entirely and replace it with on-demand fetch.
L2 — Per-type load: each resource type uses a different synchronization tactic — Image: setter override · Font: binding patch · Audio: async-proxy object · Spine: remoteBundle · JS: sync require
L3 — Rebuild loadFile: the native side keeps a sync require, adds an async loadFile, and the adapter layer caches lookups — business code never notices.
Stacked together, every resource API call returns “synchronously” while the bottom path does async streaming.

2.5 Audio: A Hard Choice Between Five Designs

The hardest detail in streaming is audio. Native audio APIs are synchronous — when game code calls audio.play(), it expects sound immediately. Streaming means the file may still be in flight. How do you reconcile that?

We evaluated five options:

FIG 12 Audio Streaming: Five-Option Decision Matrix
① Bake download into AudioEngine — the client team had no bandwidth, and bolting a network module into C++ was structurally wrong. Rejected.
② Let the first uncached call hit sync I/O — regresses first-play stutter; janks under sync I/O. Rejected.
③ Guarantee audio is local (preload / bundle into a subpackage) — tight coupling, hidden risks, hurts first paint. Rejected.
④ Refactor callers to async — a massive surface change that breaks the API contract and rewrites every caller. Rejected.
⑤ Synchronously return an async-fillable proxy — IDs are only used by stop/setVolume, so async back-fill is safe; the API contract is unchanged and game code is fully unaware. Picked.

We picked option ⑤. At its core is a small state machine that returns a proxy synchronously, then back-fills the real ID once the audio file lands:

FIG 13 The Audio Proxy’s Three-State Machine
idle (default) → pending (downloading) → loaded (real ID filled in)
A loadAudio() call returns the proxy immediately (idle). The proxy flips to pending while the download fires; on completion it goes to loaded with the real ID filled in. If stop() is called during pending, that task is silently dropped. The whole flow is transparent to game code.

2.6 Deeper Cuts: JSON Merging, Shared Cache, Call Tuning

Once streaming was live, a new bottleneck appeared at the network layer — the NA-side network stack caps concurrency at 10, easy to saturate. Three more optimizations followed:

JSON Merging: 52% Fewer Files

At build time, merge each module’s JSON files into larger ones. File count dropped from 107 to 51; one request now pulls back multiple JSONs, sidestepping the concurrency cap.

Shared Cache: Higher Cross-Version Hit Rate

An LRU policy plus call analysis carves out shared resources, lifting cross-version cache hit rates — aligned with how the WebView caches resources.

Call Tuning: Three-Tier Cache + Faster Deserialization

Cache loadFile lookups in three tiers: memory → persistent → network. The OC side exposes a deserialization method via JSExport, improving deserialization throughput by ~95%.

FIG 14 Three-Tier Cache Hit Flow
loadFile() request → L1 memory (in-memory hit, ≈1 ms, fastest) → miss → L2 persistent (disk hit, ≈5 ms) → miss → L3 network (fetch, ≈50 ms+, slowest) → return result to JS
L1 → L2 → L3 are probed in order, fast to slow; any hit returns immediately. The key trick is exporting the OC-side deserializer via JSExport — turning cached bytes back into JS objects no longer pays the interpreted-JSCore cost.
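The probe order and tier-warming behavior can be sketched like this. The tier interface and helper are illustrative assumptions (the real tiers are native, and L3 is an async network fetch); the sketch only shows the lookup discipline:

```typescript
// Probe L1 → L2 → L3 fast-to-slow; any hit returns immediately,
// and lower-tier hits warm the tiers above them for the next call.
interface Tier {
  get(key: string): Uint8Array | null;
  put?(key: string, value: Uint8Array): void;
}

function loadFileCached(key: string, l1: Tier, l2: Tier, l3: Tier): Uint8Array | null {
  const m1 = l1.get(key);
  if (m1) return m1;                                  // ≈1 ms: in-memory

  const m2 = l2.get(key);
  if (m2) { l1.put?.(key, m2); return m2; }           // ≈5 ms: disk, warm L1

  const m3 = l3.get(key);                             // ≈50 ms+: network
  if (m3) { l2.put?.(key, m3); l1.put?.(key, m3); }   // warm both tiers
  return m3;
}
```

The warming step is what makes the ≈1 ms path the common case: a resource pays the slow tier at most once per process (L1) or once per install (L2).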

With all three tiers stacked —

FIG 15 Full Load-Time Evolution (same weak-network test)
Monolith: 31.0s → Subpackages: 10.0s → Streaming: 5.5s → + Call tuning: 5.0s
From 31s to 5s — an overall 83.8% reduction. The final configuration beats WebView by 12.5%.

2.7 Industry Side-by-Side: Three Paths to Faster Loading

Stepping back across the industry, container-side load optimization splits roughly into three families. We picked the first; other teams under different constraints picked the others. The division of labor is illuminating:

FIG 16 Three Industry Paths to Load Optimization

01 · Resource layer: streaming (Helio’s pick)
Treat each static resource as a streamable unit and fetch on demand, with zero business-side change.
Strength: broadly applicable — any JS + WebGL game can adopt it.
Limit: can’t reach into wasm-style binary IL loading.

02 · Artifact layer: function-level wasm splitting
Use PGO (Profile-Guided Optimization) to slice a wasm bundle at function granularity into a first package + subpackages. The first package can shrink to 30–40% of the original.
Strength: unmatched for slimming heavy wasm targets.
Limit: teams must instrument and run a profiling pass — a dev-experience cost.

03 · Compiler layer: tiered compilation + better compression
Act on the JS engine itself: use LiftOff for cold start, let TurboFan re-compile in the background and cache the result for the next launch. Brotli over Gzip adds another ~20% compression.
Strength: no business-side cooperation required; broadly applicable.
Limit: the ceiling is set by the engine implementation — your room to maneuver is small.

The three paths don’t conflict — a theoretically “full-stack optimal” container would pursue all three. Helio is mainly on the first because our games are JS, not wasm. If we ever onboard heavy wasm titles, the second path becomes required reading.

2.8 vs Cocos’s Remote-Package Mode

Before us, Cocos shipped its own “remote-package” mode — similar in spirit to streaming, but different in execution. A side-by-side across five dimensions:

FIG 17 Streaming vs Cocos Remote-Package
• Package handling — Helio: no package step, per-resource streaming · Cocos: built-in packages only
• Remote scripts — supported by Helio streaming (JS is just another streamable unit)
• Cache strategy — Helio: native three-tier cache · Cocos: JS-based, simpler
• Business visibility — Helio: zero business-side change · Cocos: requires package renaming + CI changes
• Engine support — Helio: engine-agnostic (any JS + WebGL game) · Cocos: Cocos-only
Streaming is more general and more decoupled across the board. End of this chapter — most of what could be squeezed from loading has been squeezed. Next chapter: rendering.
Chapter 03

Render Optimization: From 6.8× Behind to 4.8× Ahead

Loading is just the start. What actually decides user experience is rendering. Helio did roughly twice as much work on rendering as on loading — this chapter runs longer because it spans six tracks: audio, render pipeline, JS execution, memory, teardown, and debugging.

3.0 The Methodology: Three Stances

Before jumping into specifics, here’s the common methodology. Render performance optimization is fundamentally about locating the bottleneck, then choosing one of three stances:

FIG 18 Three Stances of Render Optimization
Reduce — do less: fewer frames, fewer draws, fewer textures, fewer state switches.
Shift — move the bottleneck: CPU to GPU, JS to C++, runtime to build time.
Cut — when the implementation hits its ceiling, adjust the effect itself.

Choosing the right stance requires identifying whether the bottleneck is CPU, GPU, or bandwidth. The industry has long-established methods, tools, and tricks for that — we won’t rehash them. What follows are Helio’s specific decisions within those three stances. Each subsection maps to one of them.

3.1 Audio Stutter: One Pit on Each Platform

Audio isn’t on the main render path, but it drags frame rate — one blocking I/O on the main thread and a frame is gone. The two platforms looked very different:

Android: One Bomb SFX = 18 MediaPlayer Instances

The original design used a queue-based drop policy that handled concurrency poorly: a single bomb sound effect spawned 18 MediaPlayer instances, a real memory hog. We fused in a SoundPool-style approach — wiring up the OpenSL ES interface through the Web adapter layer’s AudioEngine module, at low integration cost.

Result: Android stutter rate fell from 0.999% to 0.638%. Over the same period, WebView regressed from 0.797% to 0.883% — that optimization never happened on the WebView path.

FIG 19 Android Audio Fusion: 18 MediaPlayer Instances → 1 SoundPool
Before (queue-drop policy): 18 × MediaPlayer — one instance per concurrent SFX; memory + init costs pile up.
After (SoundPool fusion): 1 × SoundPool — a single pooled instance with unified scheduling, OpenSL ES backend.
After fusion, the Android stutter rate dropped 0.999% → 0.638%. The WebView path didn’t get the same optimization and regressed 0.797% → 0.883% over the same window.

iOS: Switch to OpenAL, Then Fix an OpenAL Pitfall

On iOS, thread scheduling was blocking the main thread. We swapped in OpenAL — which has a pitfall of its own: after backgrounding, audio occasionally failed silently with alSourcePlay error code:a003, because under certain interruption flows iOS never fires AVAudioSessionInterruptionTypeEnded. We patched the interruption-callback handling.

3.2 The Spark Behind GFX: 500 vs 37

One layer up from rendering, here’s the biggest finding: for the same single frame of a match-3 game’s main screen, WebView / Helio issues 500 GL commands; CocosNative issues just 37. A 13.5× gap.

FIG 20 Same Frame, GL Command Density (Match-3 Game’s Main Screen)
WebView / Helio (JS): 500 · Helio + GFX C++: 37
Each square is one GL command. The 13.5× gap is the spark for GFX’s C++ migration.

Why so different? Because Cocos’s native implementation batches and packages common GL calls. The WebView path goes through standard JS APIs one call at a time — each paying a JS → Binding → C++ cross-language hop. On interpreter-only JSCore (no JIT on iOS), that cost adds up brutally.

The idea writes itself: can we lift GL calls out of interpreted JS into C++, and batch them while we’re at it? That’s GFX’s starting point.

3.3 GFX Architecture & Integration

Two key architectural decisions:

  1. Stand-alone library: extract the GFX pieces from Cocos Engine into a separate library, with full cross-compilation support for both Android and iOS.
  2. Reuse Helio’s existing contexts: share Helio’s JSContext (and its bindings) and reuse EAGLView’s glContext, avoiding duplicated overhead.

Integration: SE defines new APIs reusing Helio’s JSContext, EAGLView reuses Helio’s glContext, the adapter layer adjusts the _flow implementation at forward-render init (initWebGL), and JS-side gfx calls become C++ Bindings during the draw phase.

// Boot the shared script engine against Helio's existing JS context:
se::ScriptEngine *se = se::ScriptEngine::getInstance();
jsb_register_all_modules();                   // register the C++ bindings
se->start(self.gameEJView.jsGlobalContext);   // reuse the host view's JSContext

The most important design choice is being pluggable: a compile flag selects whether GFX is included; Helio can optionally link the library — products opt in. This matters enormously for adoption: no forced upgrades, teams migrate at their own pace.

FIG 21 How GFX Plugs into the Helio Container
Inside the Helio container, the JSContext (JS engine + bindings) and EAGLView’s glContext both already exist and stay untouched. The GFX C++ extension is pluggable and opt-in: SE exposes new APIs that reuse the JSContext; C++ bindings replace the JS gfx calls; the render pipeline is intercepted at initWebGL; the whole thing ships as cross-compiled dual-platform libraries. Net effect: 500 → 37 GL commands per frame.
GFX didn’t reinvent any wheels. It reuses the existing JSContext and glContext, only adding C++ bindings and a pipeline intercept — that’s what “pluggable” means in concrete terms.

3.4 Two Philosophies of Render Optimization

On the GFX question, the industry actually splits into two philosophical paths. Both aim at “make game rendering faster,” but the stance is fundamentally different.

FIG 22 两条渲染优化哲学路 Two Philosophies of Render Optimization
A
渐进式 · Helio 选择Incremental · Helio’s pick

保留 WebGL 语义,下沉 C++ 实现Preserve WebGL semantics, lower implementation into C++

WebGL 数百个接口保持不变,把热路径的 GL 调用从 JS 沉到 C++。业务侧零改动 —— Cocos 引擎层吸收。Keep all hundreds of WebGL APIs unchanged. Move hot-path GL calls from JS into C++. Zero business-side change — absorbed at the Cocos engine layer.

代价:天花板有限,毕竟仍是 WebGL 语义。Cost: the ceiling is bounded — it’s still WebGL semantics underneath.

▸ 500 → 37 GL 指令/帧 ▸ 500 → 37 GL commands/frame
B
革命式Revolutionary

重新定义渲染接口Redefine the rendering API itself

抛弃 WebGL,新设计一套精简接口(约十几个)。抽象出 Pipeline / RenderPass 等更现代的概念。在引擎侧重新生成 shader / 重写绑定。Abandon WebGL; design a slimmed API (about a dozen calls). Abstract more modern concepts like Pipeline / RenderPass. Re-generate shaders / rewrite bindings at the engine layer.

代价:需要业务方/引擎层配合改造,落地周期长。Cost: business teams and engines must come along; rollout takes much longer.

▸ 单帧 17ms → 3ms ▸ 17ms → 3ms per frame
两条路的 trade-off 完全相反。我们选了 A 是因为 我们的业务方接的引擎五花八门,要他们改 shader 和接口的落地周期会拖很长。如果是单一引擎主导的平台,B 路可能反而是更优解。 The trade-offs are diametrically opposite. We picked A because our partner teams use a wide variety of engines — asking each of them to rewrite shaders and bindings would stretch the rollout out far too long. For a single-engine-dominant platform, B might be the better answer.

3.5 GFX 落地的 6 个难点 Bug 3.5 Six Hard Bugs From the GFX Rollout

落地不是平的。GFX 接入过程中遇到了 6 个有代表性的难点 Bug,每一个都需要对 GL 管线、JS 引擎、适配层都有深入理解才能定位。

Production wasn’t flat. Six representative bugs surfaced during integration. Each required deep understanding of the GL pipeline, the JS engine, and the adapter layer to root-cause.

FIG 23 GFX 落地的 6 个有代表性 Bug Six Representative Bugs From GFX Integration
BUG #01 · 黑屏 Black screen
原因 Cause:GL 管线问题,交换 Buffer 时其中一个 FrameBuffer 是空的(降采样 Bug) GL pipeline issue — one of the FrameBuffers was empty during swap (downsampling bug)
修复 Fix:修正降采样初始化顺序 Fix downsampling init order
BUG #02 · 渲染错乱 Rendering mangled
原因 Cause:JSBinding 回调线程异常 —— 异步任务回调时没切主线程,导致 GL 线程不安全 JSBinding callbacks didn’t switch back to the main thread — GL thread-unsafe
修复 Fix:所有 GL 相关回调强制切主线程 Force main-thread switch for all GL-touching callbacks
BUG #03 · 纹理本身错乱 Wrong textures
原因 Cause:适配层 Image 预设出错,glTexImage2D 时 glType 与 glFormat 出错 Adapter-layer Image presets were off — glType and glFormat wrong at glTexImage2D
修复 Fix:canvas2d 元素类型判断 + 枚举值改常量 Fix canvas2d type detection; replace missing enums with constants
BUG #04 · 圆角不生效 Rounded corners broken
原因 Cause:适配层 cc.Mask(clearGraphics)逻辑未接入到更新循环 cc.Mask (clearGraphics) logic wasn’t hooked into the update loop
修复 Fix:补全 mask 更新调用 Wire mask updates into the loop
BUG #05 · 点击事件丢失 Click events lost
原因 Cause:GFX 初始化时尺寸出错,导致注册点位偏移 Wrong size at GFX init shifted registration coordinates
修复 Fix:强制指定手势支持,补 polyfill Force gesture support, add the missing polyfill
BUG #06 · GL_VALIDATE_STATUS 失败 GL_VALIDATE_STATUS fails
原因 Cause:需要主动调用 glValidateProgram 验证 Required calling glValidateProgram explicitly
修复 Fix:在 program link 后补一次 validate Add a validate pass after program link
FIG 24 6 个 Bug 在 5 层栈上的分布 Where the Six Bugs Lived Across the Five-Layer Stack
5 层栈 Five layers:业务游戏 Business Game (JS) → Cocos 引擎 Engine (JS) → JS 适配层 Adapter (JS) → 容器层 Container (OC / Java) → GFX 扩展 Extension (C++)。Bug 分布 Bug placement:#03 纹理错乱 wrong tex、#04 圆角不生效 mask broken 落在适配层;#05 点击丢失 click lost 跨注册链路 registration;#02 渲染错乱 render mangled、#01 黑屏 black screen、#06 validate 落在 C++ 侧。
3 个 Bug 落在 JS 适配层(包含 Cocos 那部分),3 个落在 C++ 扩展(其中 #02 跨 OC/C++ 边界、#05 跨 JS/OC 边界)。这就是为什么调试系统必须 5 层全打通 —— 任何一层断了,这些 Bug 都没法在合理时间内定位。 Three bugs lived in the JS adapter layer (including the Cocos parts); three in the C++ extension (with #02 straddling the OC/C++ boundary and #05 straddling JS/OC). This is why the debug system has to cover all five layers — break any layer and these bugs become unsolvable in reasonable time.

3.6 6 个 Bug 能高效解决,靠的是全链路调试体系 3.6 Solving Those Bugs Required a Full-Chain Debug System

这 6 个 Bug 能在合理的时间内定位掉,不是因为我们运气好,而是因为 Helio 同时在做调试体系建设。我们沉淀了 5 维调试能力:

These bugs got fixed in reasonable time not because we got lucky, but because Helio had built a debug system in parallel. Five dimensions of debug capability:

  • 源码:支持 C++ Debug 模式源码集成
  • Source: C++ Debug-mode source integration
  • 日志:支持 JS 日志代理 + C++ 日志代理
  • Logs: JS-side proxy + C++-side proxy
  • 断点:支持全链路断点 —— 一次断点穿透 5 层
  • Breakpoints: full-chain — a single break can pause across 5 layers
  • 渲染:支持抓帧分析 GL 命令
  • Render: frame capture for GL command analysis
  • 请求:支持 JS 侧与 NA 侧的请求抓包
  • Network: request inspection on both JS and NA sides

最关键的是「全链路断点穿透」 —— 一次单步可以从游戏业务的 JS 代码,逐步调到底层 C++ 的 GL 实现。

The most important piece is full-chain breakpoint traversal — a single step-through can walk from game JS code all the way down to the C++ GL implementation.

FIG 25 一次断点,穿透 5 层语言栈 One Breakpoint, Five Languages Deep
业务游戏代码Business game code JS
Cocos 引擎Cocos engine JS
Helio JS 适配层Helio JS adapter JS
Helio 容器层Helio container layer OC / Java
GFX C++ 扩展GFX C++ extension C++
JS / OC / C++ 三种语言栈打通,一次单步从最上面的业务代码可以走到最下面的 GL 命令。这不是工具集成,是底层调试器协议的串联。 JS / OC / C++ stitched together. A single step-through walks from top-level business code down to the underlying GL command. Not just a tooling integration — a stitching of debugger protocols at the lowest level.

3.7 体积、内存、析构、Command Buffer 3.7 Size, Memory, Teardown, Command Buffer

接下来是几个体积、内存、析构相关的优化点。一并放在一个小节里,避免散乱。

A handful of size, memory, and teardown improvements — grouped here to avoid scattering.

体积优化:-35.4%

Binary Size: -35.4%

移除了 Network 模块(libwebsockets / libjson / libuv 三个库),充分利用原生组件优化纹理加载链路(减少 2 次跨语言调用),加上 -O3 编译优化。最终从 11532 KB 降到 7448 KB。

Removed the Network module (libwebsockets / libjson / libuv), shortened the texture-loading call chain by two cross-language hops, and added -O3 compile flags. From 11,532 KB down to 7,448 KB.

FIG 26 体积优化构成:从 11,532 KB 到 7,448 KB(−35.4%) Binary-Size Breakdown: 11,532 KB → 7,448 KB (−35.4%)
优化前 Before:11,532 KB(核心 Core 7,448 KB + Network 3,500 KB + −O3 / Tex 584 KB)
优化后 After:7,448 KB(核心 Core)· −35.4%
两块红色/橙色区域是被砍掉的部分:Network 模块(libwebsockets + libjson + libuv 三个库)占大头,−O3 编译 + 纹理加载链路精简贡献剩余。 The red and orange bands show what was cut: the Network module (libwebsockets + libjson + libuv) was the bulk; −O3 plus the leaner texture path contributed the rest.

CanvasBuffer 内存:从 2 份到 1 份

CanvasBuffer Memory: Two Copies → One

优化前所有 Canvas 操作存在 2 份 Texture2D 的内存开销,通过注册的回调函数跨语言发送 Buffer 数据。优化后改成 C++ 直接操纵 JS 引擎修改 Buffer 数据,仅留 1 份 Texture2D。

Previously every Canvas operation kept two copies of a Texture2D, shuttling buffer data across languages via callback. Now C++ writes directly into the JS engine’s buffer, leaving just one copy.

FIG 27 CanvasBuffer 内存:2 份 Texture2D → 1 份共享 Buffer CanvasBuffer Memory: Two Texture2D Copies → One Shared Buffer
优化前 · 跨语言回调Before · Cross-Language Callback
JS · Texture2D
↕ callback
C++ · Texture2D
2 × Texture2D JS 与 C++ 各持一份 · 跨语言搬运 buffer 数据 one copy on each side · buffer shuttled across languages
优化后 · 直写 JS BufferAfter · C++ Writes JS Buffer Directly
Shared Buffer
(JS & C++)
1 × Texture2D C++ 操纵 JS 引擎内存 · 没有第二份拷贝 C++ writes the JS engine memory · no second copy
这是 GFX 化思路的延伸:既然 C++ 已经持有 JS 引擎,那就让它直接写引擎内存,不必再通过回调把数据复制一份。Texture2D 内存对半砍。 An extension of the GFX idea: now that C++ holds a JS engine reference, let it write the engine’s memory directly — no callback, no second copy. Texture2D memory cut in half.

析构优化:4 步 + 2 个真实崩溃

Teardown: Four Steps + Two Real Crashes

析构是个被低估的复杂问题。Cocos 的设计上没考虑过销毁(毕竟 Cocos 本身就是个应用),GFX 只是容器中的一个类库。我们补充了 4 步析构流程:

Teardown is underestimated. Cocos isn’t designed to be destroyed (it expects to *be* the application); GFX is just one library within our container. We added a four-step teardown:

FIG 28 Helio 4 步析构时序图 Helio’s Four-Step Teardown Sequence
参与方 Parties:JS(window.* / cocos)→ 容器侧 Container(OC / Java)→ C++ native engine
STEP 1:stopMainLoop() → destroy() → 销毁 window.gfx / .jsb / .cc clear window.gfx / .jsb / .cc → exit()
STEP 2:取消监听器 cancel listeners → 取消渲染循环代理 cancel render-loop proxy → end()
STEP 3:销毁 AudioEngine destroy AudioEngine → 停渲染循环 stop render loop → 销毁事件分发器 destroy event dispatcher → 销毁 JS 引擎 destroy JS engine → 销毁 EAGLView destroy EAGLView → return
STEP 4:销毁 View destroy View · 触发 end 事件 fire end · 后处理 post-process

实战中有两个有代表性的崩溃:

Two representative crashes hit production:

案例 1:异步任务持有的 JSContext 在析构后变 null。某个网络请求回调时容器已经销毁,回调里还在用 JSContext。解决:对异步任务里持有的 JSContext 等对象增加保护,避免析构后异步任务出错。

Case 1: async tasks hold a now-null JSContext after teardown. A network callback fired after the container had been destroyed, still trying to use JSContext. Fix: guard JSContext (and friends) in async-held closures so post-teardown invocations fail safely.

案例 2:渲染循环仍在触发 JS 执行。析构期间渲染循环还没真正停下,触发了一次 JS tick,撞上正在销毁中的 JS 引擎。解决:先销毁渲染循环、再销毁 JS 引擎。注意要用标记位/事件等方式确认循环已销毁,不是简单的延时执行。

Case 2: render loop still firing JS during teardown. The loop hadn’t fully stopped, fired one more JS tick, and hit a half-destroyed engine. Fix: destroy the render loop before the JS engine, and confirm it has actually stopped via a flag/event — not a naive setTimeout.
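The guard in Case 1 and the ordering rule in Case 2 come down to the same pattern: a teardown flag that every piece of deferred work checks before touching the engine. A minimal sketch in TypeScript, where the class and field names are hypothetical and a synchronous queue stands in for real async dispatch:

```typescript
// Sketch of the teardown guard (names illustrative, not Helio's source).
class Container {
  private destroyed = false;
  private jsContext: string | null = "jsctx"; // stand-in for the real JSContext
  private queue: Array<() => void> = [];

  // Deferred work wraps every callback in a liveness check.
  runAsync(task: (ctx: string) => void): void {
    this.queue.push(() => {
      // Guard: the container may have been torn down while we waited.
      if (this.destroyed || this.jsContext === null) return;
      task(this.jsContext);
    });
  }

  // Stand-in for the event loop draining pending callbacks.
  drain(): void {
    for (const f of this.queue.splice(0)) f();
  }

  teardown(): void {
    this.destroyed = true;  // 1. flag first: stops new ticks and tasks
    this.jsContext = null;  // 2. only then release the engine handle
  }
}
```

The same flag is what confirms the render loop has actually stopped before the JS engine goes away, rather than relying on a naive delay.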

Command Buffer:双 Buffer 设计减少 43.6% 内存

Command Buffer: Dual-Buffer Cuts 43.6% Memory

GFX C++ 化之后,JS 侧到 C++ 侧的通信开销成了新瓶颈。原生对象导出 JSObject 给 JS 侧使用,JS 调用时因为 JSCore 的线程安全机制会进行上锁,但目前业务是单线程,上锁和解锁属于冗余开销。

Once GFX moved into C++, JS↔C++ chatter became the new bottleneck. Native objects exported as JSObjects acquire a JSCore lock on every JS call for thread safety — but our workload is single-threaded, so that lock is pure overhead.

基于 TypedArray 实现的 Command Buffer 直接操作 JSCore 内存,无需调用访问,规避了 JSLock 开销。两个命令 Buffer (Float64Array) 占 56 字节。业务场景中每帧 GL 命令在 80~600 区间,最大内存占用约 14 KB。

A TypedArray-backed Command Buffer writes directly into JSCore memory, sidestepping the JSLock entirely. Two Float64Array command buffers occupy 56 bytes. In real workloads, GL commands per frame range 80–600, peaking around 14 KB.

FIG 29 双 Buffer 设计:规避 JSLock + 命令/参数分离 Dual-Buffer Design: Bypass JSLock + Cmd/Param Split
传统:JSObject + JSLock Before — JSObject + JSLock
gl.drawArrays(0, 6) acquire JSLock
C++ 执行 release JSLock × 80~600 次/帧 = 锁开销巨大
gl.drawArrays(0, 6) acquire JSLock
C++ executes release JSLock × 80~600 calls/frame = huge lock cost
双 Buffer:直写 JSCore 内存 After — Direct JSCore Memory
cmdBuf[i] = 23 · paramBuf[j..j+1] = [0, 6]
C++ 直接读 ArrayBuffer 无 JSObject · 无 JSLock 每帧 0 锁开销
cmdBuf[i] = 23 · paramBuf[j..j+1] = [0, 6]
C++ reads ArrayBuffer directly no JSObject · no JSLock 0 lock cost per frame
命令 Buffer · Float64ArrayCommand Buffer · Float64Array 每格 = 1 个 opcode(8 字节)each cell = 1 opcode (8 B)
5clear
23drawArrays
12bindTexture
17uniform
14viewport
参数 Buffer · Float64ArrayParameter Buffer · Float64Array 连续存放,按 opcode 切片(虚线 = 新命令起点)contiguous, sliced per opcode (dashed = new cmd start)
16384→ clear
0→ drawArr
6
3553→ bindTex
5
11→ uniform
0→ viewport
0
720
1280
示例:5 命令 + 10 参数 = 120 字节Example: 5 commands + 10 params = 120 B 单帧峰值:约 14 KBPeak per-frame: ~14 KB 命令/参数分离 → Cmd/param split → −43.6% 内存 memory
命令与参数分离 + 直接写 JSCore 原生内存 + 规避 JSLock,三者叠加把 JS↔C++ 通信开销压到最低。 Splitting commands from params + writing JSCore memory directly + bypassing JSLock — three layers stacked to minimize JS↔C++ overhead.
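The command/parameter split can be sketched in a few lines of TypeScript. The opcode values and buffer sizes below are illustrative, not Helio's real ones; the point is the wire format — one Float64Array of opcodes, one of packed parameters — which the C++ side can read straight out of JSCore memory without a single JSObject call:

```typescript
// Hypothetical opcodes; real values are internal to Helio's GFX layer.
const OP_CLEAR = 5;
const OP_DRAW_ARRAYS = 23;

// Two flat Float64Array buffers: opcodes and packed parameters.
const cmdBuf = new Float64Array(1024);
const paramBuf = new Float64Array(4096);
let cmdLen = 0;
let paramLen = 0;

function encodeClear(mask: number): void {
  cmdBuf[cmdLen++] = OP_CLEAR;
  paramBuf[paramLen++] = mask;
}

function encodeDrawArrays(first: number, count: number): void {
  cmdBuf[cmdLen++] = OP_DRAW_ARRAYS;
  paramBuf[paramLen++] = first;
  paramBuf[paramLen++] = count;
}

// The real consumer is C++ reading the same ArrayBuffer memory directly;
// here we decode in JS just to show the format, then reset both buffers.
function flush(): string[] {
  const out: string[] = [];
  let p = 0;
  for (let i = 0; i < cmdLen; i++) {
    switch (cmdBuf[i]) {
      case OP_CLEAR:
        out.push(`clear(${paramBuf[p]})`);
        p += 1;
        break;
      case OP_DRAW_ARRAYS:
        out.push(`drawArrays(${paramBuf[p]},${paramBuf[p + 1]})`);
        p += 2;
        break;
    }
  }
  cmdLen = 0;
  paramLen = 0;
  return out;
}
```

Because each slot is a plain Float64Array write, the JS side never crosses into a JSObject method, which is exactly what keeps the JSLock out of the hot path.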

3.8 渲染时钟与 rAF 标准化 3.8 Render Clock & rAF Standardization

GFX 引入之后出现了一个意外问题:双渲染时钟。原本 JS 侧有一个 rAF 时钟,GFX 引入后又有了一个 C++ 侧渲染时钟,两个时钟同时跑:意外切主线程导致 Jank,意外切子线程导致 Crash。

GFX introduced a surprise: dual render clocks. The JS side had its own rAF clock; GFX brought a C++-side render clock; both ran concurrently. Accidental main-thread switches caused Jank; accidental worker-thread switches caused Crash.

解决方法是合并渲染时钟 —— 移除多余时钟,把移除的时钟逻辑通过新实现的 C++ 接口,把函数指针给到 C++ 侧代理执行。

The fix: merge the clocks — remove the redundant one, hand its logic to the C++ side via a new binding that takes function pointers and proxies execution.

顺手把 rAF 也标准化了:原本 Ejecta 的 rAF 用 setTimeout 0 模拟(不合规范),由 Native 维护 Timer 队列等待 vsync 信号才回调(链路太长)。改成:vsync 后直接触发 JS tick 调用,JS 侧自己维护 Timers 队列,移除了 JSBinding 中转开销。

We also standardized rAF along the way. Ejecta’s rAF was simulated with setTimeout 0 (out of spec), with Native maintaining a Timer queue and waiting for vsync to call back (too long a chain). Now: vsync triggers JS tick directly; the JS side keeps its own Timers queue; the JSBinding round-trip is gone.
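The new rAF contract is small enough to sketch. This is an illustrative model, not Helio's source, and the function is named rAF to avoid clashing with the browser global: the JS side owns the callback queue, and a single C++-driven tick (simulated here by calling onVsyncTick directly) drains it right after vsync. Note the queue swap, which makes callbacks scheduled during a tick run in the next frame, as the spec requires:

```typescript
type FrameCallback = (ts: number) => void;

// JS-side queue: no Native Timer queue, no JSBinding round-trip.
let pending: FrameCallback[] = [];

function rAF(cb: FrameCallback): void {
  pending.push(cb);
}

// Called by the C++ side immediately after the vsync signal.
function onVsyncTick(timestamp: number): void {
  const batch = pending;
  pending = []; // callbacks queued *during* this tick run next frame
  for (const cb of batch) cb(timestamp);
}
```

The two hops are visible in the code: vsync fires, C++ calls onVsyncTick, and everything else stays inside JS.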

FIG 30 双时钟两线程:意外跨线程 → Jank / Crash Two Clocks & Two Threads: Stray Crossings → Jank / Crash
BEFORE · 双时钟双线程 TWO CLOCKS · TWO THREADS:主线程 Main thread 应只跑 JS rAF tick JS rAF lives here,子线程 Worker thread 应只跑 GFX C++ tick GFX C++ lives here;误切线程的 tick stray-thread ticks 在主线程造成 JANK、在子线程造成 CRASH。
AFTER · 单时钟单线程 ONE CLOCK · ONE THREAD:vsync 直接驱动 JS + render vsync drives JS + render · 每 16.67ms 一次 tick · 主线程独占 · 无线程争抢 one tick every 16.67ms · main thread only · no contention
FIG 31 rAF 标准化:5 跳长链路 → 2 跳直达 rAF Standardization: 5-Hop Chain → 2-Hop Direct Path
BEFORE · Ejecta 的 rAF:setTimeout(0) + Native Timer Q BEFORE · Ejecta’s rAF: setTimeout(0) + Native Timer Q 5 跳 · 不合规范5 hops · non-spec
JS rAF 调用链 JS rAF call chain
JS rAF(cb)[main thread] → setTimeout(0)[不合规范 non-spec] → JSBinding[跨语言 cross-lang] → Native Timer Q[native] → 等 vsync wait vsync[native] → JSBinding cb[main thread]
不合规范Non-spec setTimeout(0) 模拟 rAF,不是浏览器标准实现,行为不可预期。 Simulating rAF with setTimeout(0) isn’t the spec; behaviour isn’t reliable.
链路太长Chain too long Native 维护 Timer 队列、等待 vsync 信号消费完才回调 JS,多次 binding 跨语言开销叠加。 Native maintains the Timer queue and waits for vsync to drain it before calling JS back — multiple binding round-trips stacked up.
AFTER · vsync 直接触发 JS tick · JS 自维护 Timers Q AFTER · vsync directly fires JS tick · Timers Q on JS side 2 跳 · 标准化2 hops · spec-aligned
新 rAF 调用链 New rAF call chain
vsync[system] → C++ 直接触发 C++ direct fire[main thread] → JS tick + Timers Q[main thread]
规范化Spec-aligned vsync 之后 C++ 直接调用 JS tick,行为与浏览器 rAF 一致。 C++ fires JS tick right after vsync — behaviour matches browser rAF.
无中转No middleman Timers 队列移到 JS 侧自维护,移除了原本走 Native 中转的 JSBinding 开销。 The Timers queue moves to the JS side; the JSBinding round-trip through Native is gone.
链路 5 跳 → 2 跳 · 与合并时钟一起得到 FPS +10% / Small Jank −78.5% 5 hops → 2 hops · combined with clock-merge: FPS +10% / Small Jank −78.5%

数据:FPS +10%,Small Jank ↓ 78.5%。

Data: FPS +10%, Small Jank −78.5%.

3.9 JS 执行效率:四招 3.9 JS Execution: Four Tactics

GFX 解决了「重渲染」场景,但「重逻辑、轻渲染」场景里,JS 解释执行仍然是瓶颈。我们做了 4 招:

GFX solved render-heavy scenarios, but in logic-heavy / render-light scenarios, interpreted JS was still the bottleneck. Four tactics:

FIG 32 JS 执行效率四招 JS Execution: Four Tactics
01
阻多余 JSLock 调用Avoid redundant JSLock calls

详见 Command Buffer 一节 —— 把 GL 调用从 JSObject 调用改成直接读写 TypedArray。Covered in the Command Buffer section — turn GL calls from JSObject methods into direct TypedArray reads/writes.

02
JSON 序列化原生化Native JSON serialization

每次切 Tab 都要序列化 148 KB JSON,JSCore 同步阻塞约 100ms。改原生实现导出 JSExport 后,性能提升约 95%,单次 Tab 切换耗时降到 8ms 以内。Every Tab switch serialized 148 KB of JSON, blocking JSCore for ~100ms. Moving the impl to native via JSExport sped this up by ~95% — each Tab switch now completes in under 8ms.

03
主动 + 被动 GC 调用Active + passive GC

被动:Helio.onMemoryWarning 协议,容器侧收到内存告警时清理主进程资源。主动:jsb.garbageCollect,业务在合适时机调用,清理长时间未使用的纹理缓存。Passive: Helio.onMemoryWarning — when the container gets a memory warning, clear main-process resources. Active: jsb.garbageCollect — game code calls at safe moments to drop long-unused texture caches.

04
Separate JS ThreadSeparate JS thread

把 JS 执行放到独立线程上,让 JSVM 与 UI 线程解耦。落地中 —— 难点是 UI 相关接口需要切主线程、JSCallback 切回子线程,整体 binding 协议要重写。Move JS execution onto its own thread to decouple the JSVM from the UI thread. In progress — the hard part is that UI-touching APIs must switch to main and JSCallbacks must switch back, so the entire binding protocol needs a rewrite.
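Tactic 03's "drop long-unused texture caches" can be illustrated with a simple eviction pass. This is not Helio's jsb.garbageCollect implementation, just a sketch of the kind of cleanup a game would run when the Helio.onMemoryWarning callback fires or at a safe moment of its own choosing; the cache shape is hypothetical:

```typescript
interface CachedTexture {
  lastUsed: number; // ms timestamp of last use
  bytes: number;    // approximate GPU/CPU memory held
}

// Evict cache entries untouched for longer than maxAgeMs; returns bytes freed.
function evictStale(
  cache: Map<string, CachedTexture>,
  now: number,
  maxAgeMs: number,
): number {
  let freed = 0;
  for (const [key, tex] of cache) {
    if (now - tex.lastUsed > maxAgeMs) {
      freed += tex.bytes;
      cache.delete(key); // safe: Map iteration tolerates deleting the current entry
    }
  }
  return freed;
}
```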

3.10 粒子与骨骼:同样的「下沉 C++」 3.10 Particles & Spine: The Same “Lower into C++”

同样的「下沉 C++」思路也用在了两个具体的 JS 模块上:粒子(Particle)和骨骼(Spine)。

The same “lower into C++” idea applied to two concrete JS modules: Particles and Spine.

Particle:原 JS 实现里 render 是每帧调用的热路径。我们把 render 封装成中间件,移除 JS 模块,render 移到渲染时钟中每帧调用。原 JS 对象改 C++ 实现,调用改为 JSBinding。

Particle: in the original JS implementation, render was a per-frame hot path. We wrapped render as middleware, removed the JS module, and called render every frame in the render-clock loop. Original JS objects became C++ implementations; calls became JSBindings.

Spine(骨骼动画):基于开源 spine-runtimes 项目,把 TypeScript 模块迁到 C++。集成到渲染时钟中间件,替换 TS 模块为 C++ 模块并提供 JSBinding。优势是可以自主控制 Spine 版本并做二次优化(比如对不活动骨骼做屏蔽)。

Spine: built on the open-source spine-runtimes project, migrating the TypeScript module to C++. Integrated into the render-clock middleware; TS swapped for C++ + JSBinding. The bonus is autonomy over the Spine version and the freedom to do further optimizations (e.g. cull inactive bones).
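Structurally, both migrations plug into the render clock the same way. A sketch of that middleware shape, with hypothetical names: per-frame hooks registered on the clock, so the render work runs inside the tick loop rather than inside a standalone JS module (in Helio the particle and spine hooks would be C++ implementations reached through JSBindings):

```typescript
type RenderMiddleware = (dt: number) => void;

const middlewares: RenderMiddleware[] = [];

// Register a per-frame hook on the render clock.
function use(mw: RenderMiddleware): void {
  middlewares.push(mw);
}

// Called by the render clock once per frame, in registration order.
function tick(dt: number): void {
  for (const mw of middlewares) mw(dt);
}
```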

FIG 33 粒子 / 骨骼 C++ 化前后性能对比 Particles / Spine: JS Module vs C++ Module
SCENE 01 · 16K 粒子 particles · iPhone XS:JS module 50 FPS → C++ module 60 FPS(+10 FPS,打满 60fps target)
SCENE 02 · 100 骨骼 bones · iPhone 15 Pro:JS module 27.2 FPS → C++ module 60 FPS(+33 FPS)· 10 分钟 Jank 10-MIN JANK:44.7 → 0(完全消除 eliminated)
两个场景里 C++ 化都把帧率打到 60 上限。骨骼场景的 10 分钟 Jank 数从 44.7 直接归零 —— 这是 C++ 化最戏剧化的一处。 In both scenes, C++-ification pushed frame rate to the 60fps cap. In the bones scene, 10-min Jank count dropped from 44.7 straight to 0 — the most dramatic single result of the C++ migration.

3.11 灰度数据:最终战绩 3.11 Field Data: The Final Tally

把所有渲染层优化叠在一起,看灰度数据 —— 最直接的对比是同帧率下的卡顿表现:

Stacking every render-layer optimization, the field data tells a clean story — the most direct comparison is stutter behavior at equal frame rates:

FIG 34 同帧率下的卡顿率分布(Helio vs WebView 灰度数据) Stutter Distribution at Equal Frame Rate (Helio vs WebView, field)
散点图 Scatter:横轴 FPS 30–60,纵轴卡顿率 Stuck % 0–50%;低帧高卡顿为 ⚠ STUCK ZONE,60fps 低卡顿为 SMOOTH ZONE ✓。均值 Means:WebView ~42 fps · 26% 卡顿;Helio ~58 fps · 7% 卡顿(各 n≈50)。
WebView 的点散落在低帧率 + 高卡顿区;Helio 的点紧密聚集在 60fps 附近、低卡顿区。这是两种系统行为的本质差异。 WebView points scatter into low-FPS / high-stutter territory; Helio points cluster near 60fps with low stutter. Two systems with fundamentally different behavior.

最终的战绩对比 ——

The final tally ——

FPS
+43%
Small Jank
−61.5%
Jank
−78.9%
Big Jank
−86%
CPU
−35%
GPU
−50%
至此「重渲染」和「重逻辑」两大场景问题都得到解决。 Both render-heavy and logic-heavy scenarios are now in good shape.

3.12 iOS JIT:那个绕不开的天花板 3.12 iOS JIT: The Ceiling You Can’t Avoid

但故事还没完。iOS 上 JSCore 不支持 JIT 是绕不开的天花板 —— 在 V8 benchmark 里,WKWebView(带 JIT)的 JS 性能比 JSCore 高出 5-15 倍:

The story isn’t over. On iOS, JSCore’s no-JIT ceiling is unavoidable. In V8 benchmarks, WKWebView (with JIT) is 5–15× faster than JSCore:

FIG 35 V8 benchmark · WKWebView vs JavaScriptCore (iOS) V8 Benchmark · WKWebView vs JavaScriptCore (iOS)
Benchmark · WKWebView (JIT) · JSCore (No JIT) · Ratio
Richards · 11,095 · 2,090 · 5.3×
Crypto · 37,000 · 2,313 · 16.0×
RayTrace · 16,748 · 2,835 · 5.9×
NavierStokes · 26,753 · 2,984 · 9.0×
DeltaBlue · 7,167 · 1,719 · 4.2×
Score (v7) · 16,750 · 2,314 · 7.2×
FIG 36 JIT 性能差距:每个 benchmark 的倍率 JIT Performance Gap: Ratio Per Benchmark
Crypto
16.0×
NavierStokes
9.0×
Score (v7)
7.2×
RayTrace
5.9×
Richards
5.3×
DeltaBlue
4.2×
每根柱子是「带 JIT / 不带 JIT」的性能倍率。最坏情况差 4.2×,最好情况差 16×。这是一道无法绕过的天花板 —— 也是为什么 iOS JIT 是 Helio 下一阶段的必修课。 Each bar is the “JIT vs no-JIT” ratio. Worst case 4.2×; best case 16×. An unavoidable ceiling — and why iOS JIT is the next required item on Helio’s roadmap.

解决思路上,业界目前有两条主流路径,我们都在探索:

There are two mainstream paths in the industry. We’re exploring both:

FIG 37 两条 JIT 突破路径 Two Paths Past the JIT Ceiling
1
线程隔离Thread isolation

Separate JS ThreadSeparate JS Thread

在子线程创建 JSContext,让 JSVM 固定在子线程执行;UI 相关接口里切主线程;JSCallback 切回子线程。Create the JSContext on a worker thread so the JSVM is pinned there. UI-touching APIs switch to main; JSCallbacks switch back.

难点:整个 binding 协议要重写,UI 切换的边界要逐一审视。Hard part: entire binding protocol needs a rewrite; every UI-switching boundary needs review.

▸ 落地中▸ In progress
2
JIT 借位Borrow JIT

用 WKWebView 提供 JIT,Helio 只做渲染WKWebView for JIT, Helio for rendering

利用 WKWebView 中 JSCore 的 JIT 能力。改 xhr 拦截方式处理原本的 JSBinding 逻辑。业界其他团队走的「计算渲染分离」也是相似思路 —— 把 JIT 交给系统浏览器内核,渲染交给客户端。Use the JIT-enabled JSCore inside WKWebView. Replace JSBinding with xhr-intercepted JSBridge. Other industry teams have followed a similar “split compute and render” pattern — give JIT to the system browser kernel, keep rendering on the client.

难点:JSBinding 改 JSBridge 工作量大;JSBridge 通道吞吐量需要保障。Hard part: JSBinding → JSBridge is significant work; JSBridge throughput must be guaranteed.

▸ 探索中▸ Exploring
两条路解的是同一个问题:iOS 的 JIT 限制。最终选哪条,可能是路径 1 + 2 的某种组合 —— 这是 Helio 下一阶段的工作。 Both paths attack the same problem: iOS’s JIT restriction. The final shape may be a combination of the two — that’s on Helio’s next-phase roadmap.

渲染这一章到此为止。下面两章短一些 —— 第 4 章是业务视角,第 5 章是收获与展望。

That ends the rendering chapter. The next two are shorter — chapter 4 is the business view, chapter 5 is takeaways and what’s next.

第 04 章Chapter 04

业务视角:怎么用 Helio From the Business Side: Using Helio

前面三章是「Helio 怎么炼成的」 —— 引擎视角。这一章换到业务方视角:拿到 Helio 之后怎么接入、怎么调试、出问题之后怎么定位。短一些,但是日常用得最多的那部分。

The first three chapters were “how Helio was built” — the engine view. This chapter switches to the business view: how to integrate it, how to debug it, how to root-cause production issues. Shorter — but the part teams use every day.

4.1 构建模版与加载流程 4.1 Build Template & Load Sequence

业务方接入 Helio 的成本被压到很低。一份标准的构建模版,几个固定的入口文件,剩下都交给 CI 完成。结构如下:

Integration cost was pushed down hard. One standard build template, a few fixed entry files, the rest handled by CI. The shape:

FIG 38 Helio 构建产物的标准结构 Helio Build Output: Standard Structure
build-template/framework/
├── adapter/                  // 适配层:抹平差异 Adapter layer: smooths over differences
│   ├── gfx-builtin-min.js
│   ├── gfx-engine-min.js
│   └── gfx-extensions-min.js
├── cocos/
│   └── cocos2d-gfx-min.js
├── core-js.min.js            // es polyfill
├── game.js                   // 主入口 Main entry
├── gamePrepare.js            // 业务 SDK 钩子 Business SDK hook
├── main.js                   // cocos 游戏入口 Cocos game entry
├── polyfill_preload.js
├── preload.js                // 预加载文件 Preloaded
└── host-bridge.js            // 业务 SDK Business SDK
直接使用各游戏框架提供的主流小游戏 CI 即可。流式加载下,game.js 加载 adapter + cocos sdk + 业务 SDK + cocos 游戏入口 main.js,全程异步、业务无感。 Use whatever mainstream mini-game CI your game framework already ships. Under streaming, game.js loads adapter + cocos sdk + business SDK + cocos entry main.js — all async, all invisible to the game code.

4.2 JSContext 上的对象设计 4.2 What Lives on the JSContext

JSContext 上的对象做了明确的分层:业务侧只需要关注 Helio 协议对象,其他对象按需访问。

Objects on the JSContext are explicitly layered. Business code only needs the Helio protocol objects; everything else is reachable on demand.

  • wx.* —— 所有业界主流小游戏协议方法,业务侧主要使用
  • wx.* — full industry-standard mini-game protocol surface, the main business-side touchpoint
  • env —— 配置信息(设备、版本、环境变量)
  • env — config info (device, version, environment variables)
  • loading —— 控制 Loading 模版
  • loading — controls the Loading template
  • event —— 内建事件机制,支持自定义事件
  • event — built-in event system, custom events allowed
  • callNative —— 对宿主 JSBridge 的桥接方法
  • callNative — bridge into the host’s JSBridge
  • native —— 原生方法的 JS 导出
  • native — JS exports of native methods
  • loadFile —— 流式加载入口,业务无感
  • loadFile — streaming-load entry, invisible to business

GFX C++ 扩展绑定的对象(仅在 cocos 初始化创建前向渲染管线时用到,之后转交给 window.cc 维护)和 Ejecta 渲染引擎绑定的内置类,业务侧基本不需要关注。

The GFX C++ extension bindings (only used when Cocos initializes the forward render pipeline, then handed off to window.cc) and Ejecta’s built-in classes are not meant for business consumption.

FIG 39 JSContext 上的对象分层 JSContext Object Layering
window.* · JSContext
业务必看 · Business-facingBusiness-facing
  • wx.*所有小游戏协议方法all mini-game protocol methods
  • env配置信息(设备 / 版本 / 环境)config (device / version / env)
  • loading控制 Loading 模版controls the Loading template
  • event内建事件机制(支持自定义)built-in event system (custom OK)
  • callNative桥接宿主 JSBridgebridge into host JSBridge
  • native原生方法的 JS 导出JS exports of native methods
  • loadFile流式加载入口(业务无感)streaming entry (invisible)
容器内部 · InternalInternal
  • window.gfxGFX C++ 扩展绑定GFX C++ extension bindings
  • window.cccocos 引擎对象cocos engine objects
  • Ejecta.*Ejecta 内置类Ejecta built-in classes
业务侧无需关注,不建议直接使用Not for business use; avoid direct access
左栏 7 个对象是日常接入会用到的;右栏 3 个对象由容器初始化使用,业务侧基本不需要碰。 The 7 objects on the left are what you’ll touch in everyday integration; the 3 on the right are container-internal and rarely need attention.

4.3 三类问题的归因路径 4.3 Three Categories of Problems, Three Triage Paths

线上问题分三类:本地能复现的、用户反馈的、隐藏的没有反馈的。三类的处理路径完全不同 ——

Production issues come in three flavors: locally reproducible, user-reported, and silent / unreported. Each has a different triage path:

FIG 40 问题分诊:从问题类型到工具选择 Issue Triage: From Type to Toolchain
问题类型Type
主要工具Tools
关键操作Key Steps
本地可复现Reproducible locally

最幸运的一类 —— 直接走断点流程The lucky case — go straight to breakpoint flow

  • 本地开发服务(npm run dev
  • Local dev server (npm run dev)
  • Android: adb forward + Chrome devtools
  • Android: adb forward + Chrome devtools
  • iOS: Safari Inspector + Xcode 混编调试
  • iOS: Safari Inspector + Xcode mixed debug
  • 真机连接 → 启动端口转发
  • Connect device → forward ports
  • 输入游戏 ID → 点击启动
  • Enter game ID → launch
  • 在 5 层栈任意位置打断点
  • Set breakpoints at any of the 5 layers
用户反馈的User-reported

用户报上来但本地复现不出来 —— 走日志流程User reports come in but local repro fails — go to the log flow

  • 被动获取:用户附件日志(火眼 / 树洞)
  • Passive: user-attached logs (火眼 / 树洞 tools)
  • 主动获取:根据 uin 在 wns 平台捞
  • Active: pull logs by uin from the WNS platform
  • 日志分类:Helio 容器日志 / C++ 侧 / JS 侧
  • Log categories: container / C++ / JS
  • 搜索游戏关键词向下检索
  • Search by game keyword and scroll forward
  • 核心看「framework」tag 的日志
  • Focus on the “framework”-tagged lines
  • 交叉对比 JS Console.warn 与 GL 错误
  • Cross-reference Console.warn with GL errors
没有反馈的Silent / unreported

最难的一类 —— 用户已经流失但没说话。靠数据归因。The hardest — users churned without saying anything. Attribution via data.

  • 上报数据 + 数据描述(mean / 75% / std / 50%)
  • Telemetry + descriptive stats (mean / 75% / std / 50%)
  • 散点图、饼图、聚类
  • Scatter plots, pie charts, clustering
  • 分维度:平台 / 系统 / 机型天梯分
  • Dimensions: platform / OS / device tier
  • 定量随机取样 3-5 万条
  • Quantitative random sample, 30K–50K rows
  • 先看数据描述常有意外收获
  • Start from descriptive stats; they often yield unexpected finds
  • 锁定特定维度后再做对照
  • Lock a dimension, then run comparisons

4.4 案例 1:卡顿率分析(3 步定位) 4.4 Case 1: Stutter-Rate Analysis (3 Steps)

第三类「没有反馈的问题」最考验功夫。讲两个真实案例。第一个案例:iOS 外网平均卡顿率 Helio 高于 WebView,与实验室数据不符 —— 怎么办?

The third category — silent issues — is where the hard work lives. Two real cases. Case one: iOS field stutter rate for Helio was higher than WebView, contradicting lab data — what now?

FIG 41 归因 3 步法:取样 → 聚类 → 结论 Three-Step Attribution: Sample → Cluster → Conclude
取样Sample
3 万条随机数据上报,先看 mean / 75% / std / 50% 分布30K random telemetry rows · start from mean / 75% / std / 50%
聚类Cluster
锁 FPS × Stuck 维度绘散点 · 看分布形状而不是平均值Lock FPS × Stuck and plot · look at the shape, not the average
结论Conclude
WebView 易掉 30 帧而隐藏卡顿样本 · Helio 反而更诚实WebView drops to 30fps and hides stutter samples · Helio is more honest
每一步都有具体的工具与产物 —— 这是把「拍脑袋」变成「可复用归因路径」的关键。详细操作见下文。 Each step has concrete tools and outputs — this is how “intuition” becomes a “reusable attribution path.” Full detail below.
01
预备:定量且随机取样Prep: quantitative random sampling

取 3 万条随机数据上报。如果数量不够,凑多天。先看各指标的数据描述 —— 帧率、卡顿率、DrawCall、机型、系统。Pull 30K random telemetry rows. If volume is insufficient, span more days. Start from descriptive stats — FPS, stutter rate, DrawCall, device, OS.

02
聚类:锁 FPS × Stuck 维度绘图Cluster: lock FPS × Stuck and plot

绘制散点图(结构跟第 3 章 FIG 34 一样)。结果一眼看到:Helio 表现更合理、更集中;WebView 散乱。Build a scatter plot (same shape as FIG 34 in chapter 3). The eye sees it instantly: Helio clusters tightly; WebView scatters.

03
结论:WebView 遇到卡顿更容易降到 30 帧Conclusion: WebView is more likely to drop to 30fps under stress

WebView 在卡顿时大幅降帧到 30,从而规避了很多卡顿场景的样本(变成「掉帧」而非「卡顿」)。Helio 始终保持 60,反而把所有卡顿都暴露在统计里。这不是 Helio 更卡 —— 是 Helio 更诚实。When stressed, WebView drops to 30fps — which dodges many stutter samples (they become “dropped frames” instead of “stutters”). Helio sticks at 60, which means every stutter shows up in the stats. Helio isn’t more janky — Helio is more honest.

这个案例的价值不只是结论本身,更是结论的「反直觉」 —— 监控数据看似变差,但用户体验实际更好。如果不做归因分析,很容易被表面数据带偏,做出错误的优化决策。

The value here isn’t just the conclusion — it’s the counterintuitive shape of it. The metric got worse on paper, but actual user experience improved. Without doing the attribution work, the surface number could push you toward a wrong optimization decision.
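Step 01's "start from descriptive stats" is cheap to automate. A small helper for the mean / 50% / 75% / std readout; the percentile convention here is simple nearest-rank, an assumption on my part, since any consistent convention works for eyeballing a distribution:

```typescript
interface Summary {
  mean: number;
  std: number; // population standard deviation
  p50: number;
  p75: number;
}

function describe(xs: number[]): Summary {
  const sorted = [...xs].sort((a, b) => a - b);
  const mean = xs.reduce((s, x) => s + x, 0) / xs.length;
  const variance = xs.reduce((s, x) => s + (x - mean) ** 2, 0) / xs.length;
  // Nearest-rank percentile over the sorted sample.
  const pct = (p: number) =>
    sorted[Math.min(sorted.length - 1, Math.ceil(p * sorted.length) - 1)];
  return { mean, std: Math.sqrt(variance), p50: pct(0.5), p75: pct(0.75) };
}
```

Run this per metric (FPS, stutter rate, DrawCall) over the 30K-row sample before plotting anything; a large mean/median gap or an outsized std is usually the first hint of where to cluster.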

4.5 案例 2:转化率分析(猜想 → 验证) 4.5 Case 2: Conversion Analysis (Hypothesis → Verification)

第二个案例更复杂。背景:iOS 早期转化率比 Android 低 10% 左右,但 iOS 加载耗时比 Android 快 30%。「更快但是转化更差」 —— 这中间一定有问题。

Case two is messier. Background: early iOS conversion was ~10% below Android, even though iOS load time was 30% faster. “Faster but worse conversion” — something didn’t add up.

猜想:加载耗时只有进入之后才上报。如果用户在加载过程中遇到问题主动退出,加载耗时这个指标看不到。会不会就是这种情况?

Hypothesis: load-time telemetry only fires after entry. If users hit problems mid-load and bail, that drop-off doesn’t show up in load-time numbers. Could that be it?

三轮聚类分析 ——

Three rounds of clustering ——

FIG 42 转化率漏斗:iOS vs Android(同期数据) Conversion Funnel: iOS vs Android (Same Period)
iOSAndroid
启动加载Launch & load
100%
100%
实时上报丢失率Real-time drop-off
12.6%
3.6%
显式加载失败率Explicit load fail
0.5%
4.0%
卡在 87% 进度Stuck at 87% progress
~10%
关键观察:iOS 显式加载失败率只有 0.5%,远低于实时上报丢失率的 12.6%。中间有 12.1% 的用户「无声地走了」 —— 没报错也没完成加载。 Key observation: iOS’s explicit load-failure rate is just 0.5%, far below the 12.6% real-time drop-off. That leaves 12.1% of users who silently disappeared — no error, no completed load.

聚类 1(uin 个例分析):捞了几个个例,发现 iOS 用户出现这个问题之后,会高概率复现,难以进入游戏。

Cluster 1 (per-uin): sampled individual users — those who hit the issue on iOS reproduced it consistently, with no path forward into the game.

聚类 2(加载失败 vs 实时上报维度):Android 加载失败率 4%(约等于实时上报的 3.6%);iOS 加载失败率只有 0.5%(远低于实时上报的 12.6%)。结论:用户在 iOS 下会遇到卡住,且不会有 JS 报错。

Cluster 2 (failure rate vs telemetry): Android failure rate 4% (matching the 3.6% real-time drop-off). iOS failure rate just 0.5% (far below the 12.6% drop-off). Conclusion: iOS users were getting stuck, with no JS error firing.

聚类 3(按加载进度分布):iOS 大部分卡在了 87% 进度。

Cluster 3 (by load progress): the bulk of iOS stuck users sat at exactly 87% progress.

结论:业务侧 87% 函数有问题,转业务侧定位修复。这个分析价值在于:从一个看起来抽象的「转化率差距」,一路追到具体到哪个加载进度的哪个函数。Helio 提供完备的上报和工具,让这种归因成为可能。

Conclusion: a function called at 87% load progress on the business side was broken — handed off for product-side fix. The value here is the trace path: from an abstract “conversion gap” all the way down to a specific function at a specific load percentage. Helio’s telemetry and tooling are what make that possible.
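Mechanically, Cluster 3 is just a group-by over the stuck sessions' last reported progress. A sketch with hypothetical session records; the real telemetry schema is richer, but the shape of the query is this:

```typescript
interface Session {
  entered: boolean;  // did the user ever reach the game?
  progress: number;  // last reported load progress, 0–100
}

// Count sessions that never entered the game, bucketed by load progress.
function stuckByProgress(sessions: Session[]): Map<number, number> {
  const buckets = new Map<number, number>();
  for (const s of sessions) {
    if (s.entered) continue; // only the silent drop-offs
    const bucket = Math.round(s.progress);
    buckets.set(bucket, (buckets.get(bucket) ?? 0) + 1);
  }
  return buckets;
}
```

A spike in one bucket (here, 87%) is what turns an abstract conversion gap into a pointer at a specific function in the load sequence.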

4.6 Helio 在 GameSDK 中的位置(呼应第 1 章) 4.6 Helio’s Place in GameSDK (echoing Chapter 1)

回到第 1 章那张生态图。业务方接的是 GameSDK,不是 Helio。GameSDK 已经覆盖 10+ 款游戏,业务方接入耗时从 10 天降到 3 天。这个数字怎么来的?因为 Helio 把脏活全包了:

Back to that ecosystem diagram in Chapter 1. Games integrate with GameSDK, not Helio. GameSDK already covers 10+ titles; integration time per game dropped from 10 days to 3. How? Because Helio absorbed the dirty work:

  • 业务方不用考虑端能力差异(几个不同宿主 App 各家 bridge 协议不同)
  • Business teams don’t have to handle host-app capability differences (several host apps each ship their own bridge protocol)
  • 业务方不用关心运行时优化(粒子 / 骨骼 / GL 调用)
  • Business teams don’t have to worry about runtime tuning (particles / bones / GL calls)
  • 业务方不用维护本地开发链路(adb / Safari / Xcode 调试栈打通了)
  • Business teams don’t have to maintain a local-dev pipeline (adb / Safari / Xcode debug stacks already stitched together)
  • 业务方不用搭建上报与监控(Helio 框架自带)
  • Business teams don’t have to build telemetry & monitoring (it’s in the framework)

业务方写完游戏,挂上 GameSDK 的几个钩子,剩下的事 Helio 管。从他们的视角看,Helio 是「不用知道存在的那一层」 —— 这是底座的最高赞誉。

Game teams ship the game, wire up a few GameSDK hooks, and let Helio handle the rest. From their seat, Helio is “the layer you don’t need to know exists” — about the highest compliment a foundation layer can earn.

第 05 章Chapter 05

收获与展望 Takeaways & Future

前面四章讲了 Helio 是怎么炼成的。最后这一章讲讲它带给我们什么样的复利、以及还能往哪里走。

The previous four chapters covered how Helio was built. This last one is about what it compounds into — and where it goes next.

5.1 类库封装与推广 5.1 Library Extraction & Adoption

Helio 在被验证可行之后,开始把核心能力沉淀成可复用的类库,让别的容器也能受益:

Once validated, Helio’s core capabilities started being extracted into reusable libraries so that other containers could benefit too:

  • GFX 类库:支持各类容器接入,把「500 → 37 GL 指令」这套优化做成开箱即用的可选模块
  • GFX library: drop-in for any container — packaging the “500 → 37 GL commands” optimization as an opt-in module
  • JSAdapter:支持容器自定义适配层,把「适配层抽象」这件事开放出去
  • JSAdapter: container-specific adapter layer support, opening up the “adapter abstraction” pattern itself
  • GameSDK:建设游戏统一容器,让多 App、多游戏、多端的接入成本进一步降低 [DOING]
  • GameSDK: a unified game container across host apps, games, and platforms — driving integration cost still lower [in progress]
  • 流式加载推广:把第 2 章那条优化路径做成可被其他容器借用的标准能力
  • Streaming-load adoption: package the chapter 2 path as a standard capability other containers can pick up
FIG 43 Helio 沉淀的能力向外扩散 Capabilities Spreading Outward From Helio
Helio 已验证VALIDATED SPOKE 01 GFX 类库GFX Library 支持各容器接入 · 500→37 GL 指令优化Plug into any container · 500→37 GL commands SPOKE 02 JSAdapter 容器自定义适配层 · 抽象层开放Custom adapter layer · pattern open SPOKE 03 · DOING GameSDK 统一容器Unified GameSDK 跨多 App / 多游戏 / 多端Across apps / games / platforms SPOKE 04 流式加载推广Streaming Load 推广为标准能力 · 业务零感知Standardized · zero biz change
Helio 验证可行后,核心能力被解耦成 4 条独立的能力线,每一条都可以被其他容器或团队单独借用 —— 这是底座的复利。 Once Helio proved itself, its core capabilities were unbundled into four standalone tracks, each of which other containers or teams can borrow independently — that’s the compounding return of a foundation layer.
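One ingredient behind command-count reductions of the "500 → 37" kind is filtering out GL state changes that do not alter the current state. The sketch below shows that single ingredient in isolation; it is a generic illustration, not the GFX library's actual algorithm, and the command shapes are assumptions.

```typescript
// Generic sketch: drop redundant GL state-setting commands before submission.
type GLCommand =
  | { kind: "useProgram"; id: number }
  | { kind: "bindTexture"; unit: number; id: number }
  | { kind: "draw"; count: number };

function dedupeState(commands: GLCommand[]): GLCommand[] {
  let program = -1;                         // currently bound program
  const boundTex = new Map<number, number>(); // texture unit -> texture id
  const out: GLCommand[] = [];
  for (const c of commands) {
    if (c.kind === "useProgram") {
      if (c.id === program) continue;       // same program already bound: redundant
      program = c.id;
    } else if (c.kind === "bindTexture") {
      if (boundTex.get(c.unit) === c.id) continue; // same binding: redundant
      boundTex.set(c.unit, c.id);
    }
    out.push(c);
  }
  return out;
}
```

In practice this is combined with reordering and draw-call batching, which is where the bulk of the reduction comes from; redundant-state filtering is just the cheapest first pass.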

5.2 游戏引擎架构抽象(远期) 5.2 Game Engine Architecture (Long-Term)

远一点看,Helio 容器还有一个更大的方向:把所有模块下沉 C++ 实现,交叉编译出双端动态库。

Looking further out, Helio has a bigger ambition: lower every module into C++, cross-compile into dynamic libraries for both platforms.

指导原则:可以 C++ 实现的模块只实现一次(双端共享),不能 C++ 实现的能力则按 C++ 统一协议在双端各自实现。

Guiding principle: anything that can be C++ is implemented once (shared across both platforms); anything that can’t goes through a unified C++-style protocol with native implementations on each side.
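The guiding principle can be sketched as follows. It is written in TypeScript for readability, while the actual target is C++ headers with native implementations per platform; all names here are illustrative.

```typescript
// Capabilities that cannot be shared get one protocol and two implementations.
interface AudioBackend {
  play(clip: string): string; // returns a handle; signature is illustrative
}

class IOSAudio implements AudioBackend {
  play(clip: string): string { return `avaudio:${clip}`; } // would call AVAudioEngine
}

class AndroidAudio implements AudioBackend {
  play(clip: string): string { return `oboe:${clip}`; }    // would call Oboe/AAudio
}

// Everything that *can* be shared is written once, against the protocol only.
class SharedRuntime {
  constructor(private audio: AudioBackend) {}
  playClick(): string { return this.audio.play("click.wav"); }
}
```

The payoff is that shared code like `SharedRuntime` cannot drift between platforms, which is exactly the class of production bug the first benefit below targets.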

这件事如果能做成,会带来三个收益:

If we land this, three returns follow:

  1. 解决双端协议对齐问题(很多线上 Bug 来自双端不一致)
  1. Cross-platform protocol parity (a lot of production bugs come from drift between the two platforms)
  2. 可维护性提升,减少重复实现
  2. Better maintainability, with less duplicated implementation
  3. 对接其他业务/容器更友好
  3. Easier integration into other businesses and containers

这是一个 18 个月以上的工程,但方向已经定下来了。

An 18-month-plus project, but the direction is fixed.

5.3 性能优化方法论:平衡的艺术 5.3 The Methodology: It’s a Balancing Act

一年的工程下来,最大的收获不是某一个具体的优化,而是对「权衡」的体感。性能优化本质上不是「做加法」 —— 加越多优化代价越大;它是一门取舍的艺术。我们碰到的取舍主要有三组:

A year in, the biggest takeaway isn’t any single optimization — it’s the visceral sense of trade-off. Performance optimization isn’t additive — every added optimization comes with a cost. It’s the art of choosing what to give up. We hit three pairs of trade-offs repeatedly:

FIG 44 三组权衡 Three Trade-offs
对资源的权衡Resources
CPUCPU
GPU / 内存GPU / Memory

硬件瓶颈是天然的。任何一种资源被压满,必须把负载转移到另一种资源上 —— 没有「全都更省」的方案。Hardware bottlenecks are inherent. When one resource saturates, the load has to move elsewhere — there’s no “cheaper across the board” option.

对规范的权衡Standards
规范Standards
便利性Convenience

严守 Web 规范保证了通用性,但同时关上了某些性能优化的门。我们多次在「合规」和「快」之间做选择 —— 比如标准 rAF。Strict Web compliance keeps things universal, but it also closes certain optimization doors. We repeatedly had to choose between staying on spec and going fast; the standard rAF behavior was one such case.

对技术的权衡Engineering
性能Performance
研效Dev Velocity

下沉 C++ 性能更好,但 C++ 的开发与调试体验远不如 JS。我们要的是两者都好 —— 这就是 Helio 在调试体系上花大量精力的原因。C++ runs faster, but the C++ development and debugging experience is nowhere near as good as JS's. We wanted both, which is why Helio invested so heavily in the debug stack.

三组权衡里,没有一组有「绝对正确」的答案。所谓工程哲学,就是在每个具体场景下,明确地知道自己在为什么取舍、为什么放弃 ——「哦,我这次选性能放弃了规范,是因为业务方就这一个引擎,不需要通用」。把「为什么」写下来,比「做什么」更重要。

None of these have a single right answer. Engineering philosophy is, at every concrete decision, knowing exactly what you’re trading for and why — “here I picked performance over the spec, because this team uses just one engine and doesn’t need universality.” Writing the why down matters more than the what.

5.4 未来方向 5.4 What’s Next

最后是 Helio 接下来要做的事。每一项都对应前面文章里某条没拉满的线 ——

Finally, Helio’s near-term roadmap. Each item corresponds to a thread from earlier in this article that isn’t yet fully pulled:

01
iOS JIT 模式iOS JIT mode在做in flight

第 3.12 节展开过的两条路 —— Separate JS Thread + WKWebView 借位。最终形态可能是两者的组合。The two paths from §3.12: Separate JS Thread and WKWebView-borrowed JIT. The final shape will likely combine both.

02
WASM 能力WASM support规划中planned

如果未来引入重度 wasm 品类游戏(不是当前的 JS 游戏),需要补齐 wasm 运行时与函数级分包能力 —— 第 2.7 节提到的那条路就要走起来。If we onboard heavy wasm titles (different from today’s JS games), we’ll need a wasm runtime plus function-level splitting — the second path described in §2.7.

03
HotReload 能力HotReload提开发体验DX upgrade

第 1.7 节说「研发体验是另一半」。HotReload 是把开发体验再往前推一档 —— 改完代码不用重启容器,秒级看到效果。§1.7 said “dev experience is the other half.” HotReload pushes that experience another notch — change code, no container restart, see the result in seconds.

04
容器全面 C++ 化Full C++-ification of the container远期long-term

5.2 节描述的远期方向。一旦做成,Helio 就不只是一个 iOS / Android 容器,而是一套能被其他业务/团队接入的标准化游戏运行时。The long-term direction from §5.2. Once landed, Helio stops being “an iOS/Android container” and becomes a standardized game runtime that other businesses and teams can integrate.
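The HotReload item above (03) has a small core that can be sketched: track a content hash per module and re-evaluate only the modules whose source actually changed. This is a minimal illustration under assumed names, not Helio's design; the transport (file watcher, websocket push into the container) is omitted.

```typescript
// Minimal hot-reload core: detect which modules changed since the last push.
type ModuleSource = { name: string; source: string };

// Cheap non-cryptographic string hash; collisions are acceptable for dev tooling.
function hashSource(s: string): number {
  let h = 0;
  for (let i = 0; i < s.length; i++) h = (h * 31 + s.charCodeAt(i)) | 0;
  return h;
}

class HotReloader {
  private seen = new Map<string, number>();

  // Returns the modules that changed and should be re-evaluated in place,
  // without restarting the container.
  update(modules: ModuleSource[]): string[] {
    const dirty: string[] = [];
    for (const m of modules) {
      const h = hashSource(m.source);
      if (this.seen.get(m.name) !== h) {
        this.seen.set(m.name, h);
        dirty.push(m.name);
      }
    }
    return dirty;
  }
}
```

The "seconds to see the result" experience comes from the delta being small: only dirty modules cross the bridge and re-execute, while the container process and GL context stay alive.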

FIG 45 Helio 一年里程碑(2023 春 → 2024 中) Helio Milestones Across One Year (2023 Spring → 2024 Mid)
2023 · Q2
立项与选型Kickoff & selection
三方案横评 → M 框架3-way eval → M framework
2023 · Q3
分包加载上线Subpackages live
首屏 31s → 10sfirst paint 31s → 10s
2023 · Q4
流式加载 + 调用优化Streaming + call tuning
首屏反超 WebView 12.5%surpasses WebView by 12.5%
2024 · Q1
GFX C++ 化GFX in C++
500 → 37 GL 指令/帧500 → 37 GL cmds/frame
2024 · Q2
全链路调试 · 灰度上量Full-chain debug · ramp-up
Big Jank 反超 WebView 4.8×Big Jank 4.8× smoother than WebView
2024 · MID
DAU 上量 · 质量稳定DAU ramp · quality steady
容器 DAU 十万级 · Crash 0.0015%100K+ DAU · Crash 0.0015%
一年 6 个里程碑。每个节点对应文章里的一个章节 —— 这就是「先证明能跑,再证明跑得更快,最后证明可以一直跑下去」的全流程。 Six milestones across one year. Each node maps to a chapter — the arc of “first prove it works, then prove it’s faster, finally prove it stays that way.”

2023 到 2024,一年的工程,1.4 万字的复盘可以收尾了。如果这篇文章让你对「容器层做什么」「为什么这么做」「能做到什么程度」有了一些更具体的概念,那它的目的就达到了。

2023 to 2024, one year of work, ~14K words of retrospective — time to wrap. If this article made “what a container layer actually does,” “why these choices,” and “how far it can go” feel a little more concrete to you, it’s done its job.

技术没有银弹,工程是一连串取舍的艺术。能做的,是把每一次取舍的「为什么」都讲清楚。

There’s no silver bullet — engineering is a chain of trade-offs. The most we can do is say why each one was made.

✦ ✦ ✦