BMP — A Childhood Without Compression
Just write the pixels straight to disk; that's enough.
In the late 1980s Windows needed a bitmap container that depended on no compression library, could be dumped straight from video memory, and loaded straight back in. The design goal was not file size; it was zero dependencies, zero decoding, zero thinking. BMP froze the scan direction, byte order, and row alignment of the era's video memory into the file header. Three decades later, those 1980s VRAM ghosts still live inside every .bmp.
Technical core
A BMP file opens with two fixed-size headers: a 14-byte BITMAPFILEHEADER (magic "BM", file size, pixel-data offset) and a 40-byte BITMAPINFOHEADER (width, height, bit depth, compression, palette size); the pixel array follows. The array is stored bottom-up: origin in the lower-left corner, mirroring the scan direction of 1980s CRT VRAM. Channel order is BGR, not RGB — again copied from how Windows laid out video memory. Every row is padded with zeros to a multiple of 4 bytes, so a 32-bit CPU can read one pixel per fetch with no alignment math. Later BMPv4 / BMPv5 added RLE-4 / RLE-8 run-length encoding, bitfield channel masks, ICC profiles, and a real alpha channel — but the ecosystem never caught up; most decoders still only recognise the original 40-byte info header.
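The whole layout above (two headers, bottom-up rows, BGR order, 4-byte row padding) fits in a short pure-Python encoder. A minimal sketch for the uncompressed 24-bit case; the 2835 pixels-per-metre resolution is just a conventional 72-dpi default, not a requirement:

```python
import struct

def encode_bmp24(rows):
    """Encode pixel rows (top-down, each a list of (r, g, b)) as a 24-bit BMP."""
    height, width = len(rows), len(rows[0])
    row_bytes = width * 3
    pad = (4 - row_bytes % 4) % 4                  # rows padded to 4-byte multiples
    image_size = (row_bytes + pad) * height
    offset = 14 + 40                               # BITMAPFILEHEADER + BITMAPINFOHEADER
    file_header = struct.pack("<2sIHHI", b"BM", offset + image_size, 0, 0, offset)
    # 40-byte BITMAPINFOHEADER: size, width, height, planes, bpp,
    # compression (0 = BI_RGB), image size, x/y px-per-metre, palette counts
    info_header = struct.pack("<IiiHHIIiiII", 40, width, height, 1, 24, 0,
                              image_size, 2835, 2835, 0, 0)
    body = bytearray()
    for row in reversed(rows):                     # bottom-up: last row stored first
        for r, g, b in row:
            body += bytes((b, g, r))               # BGR, not RGB
        body += b"\x00" * pad
    return bytes(file_header + info_header + body)
```

Writing `encode_bmp24(rows)` to a file with a `.bmp` extension yields an image any viewer opens; note how a 2-pixel-wide image already needs 2 padding bytes per row.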
USE FOR
- Windows system resources (.cur cursors, .ico inner bitmaps)
- Embedded / RTOS targets with no decoder library
- Codec teaching: the most stripped-down bitmap sample
- Throw-away framebuffer dumps when debugging
AVOID
- Anything on the web: 5–20× larger than PNG
- Mobile / bandwidth-sensitive delivery
- Engineering imagery that needs metadata, color profiles, or HDR
- UI assets needing reliable alpha (BMPv5 support is patchy)
| scope | browsers | tools | CLI |
|---|---|---|---|
| BMP | ✓ universal (though no one ships it on the web) | ✓✓✓ Photoshop · GIMP · Paint · Preview | convert in.png out.bmp (ImageMagick) |
GIF — 1987 and the LZW Patent Saga
Held the line with 256 colors for 39 years.
1987 was the dial-up era — a 100 KB image took a full minute to download. CompuServe needed something far smaller than BMP, cross-platform, and capable of stitching frames into a loop. Wilhite combined freshly published LZW dictionary compression with a 256-colour palette and shipped GIF87a. Two years later 89a added transparency and animation extensions, locking the format in. No one expected it to outlive GeoCities, broadband, and Flash — only to be re-ignited by the Twitter-era reaction-meme.
Technical core
GIF is four things stitched together. ① Palette — one global colour table (GCT) of up to 256 RGB888 entries, optionally overridden per frame by a local table (LCT). ② LZW compression — a variable-width 9-to-12-bit dictionary that grows with the pixel-index stream and resets when full. ③ Frames + disposal — each frame carries a Graphic Control Extension whose disposal method tells the decoder how to wipe the previous frame (keep / restore-background / restore-previous). ④ 89a extensions — a transparent-colour index (one palette slot becomes "transparent", which is why alpha is forever 1-bit), Comment and Plain-Text extensions, and the all-important NETSCAPE2.0 Application Extension that carries a 16-bit loop counter. That last one isn't part of any standard — Netscape just added it in 1995 — yet every looping GIF on Earth still owes Netscape a credit.
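That NETSCAPE2.0 block is tiny, and its byte layout follows directly from the 89a Application Extension grammar. A sketch that builds the exact 19 bytes (loop count 0 conventionally means "loop forever"):

```python
import struct

def netscape_loop_extension(loop_count=0):
    """Build the NETSCAPE2.0 Application Extension block (loop_count=0 = infinite)."""
    return (bytes([0x21, 0xFF, 11])      # Extension Introducer, Application label, block size
            + b"NETSCAPE2.0"             # 8-byte app identifier + 3-byte auth code
            + bytes([3, 1])              # data sub-block size, sub-block id 1
            + struct.pack("<H", loop_count)  # 16-bit little-endian loop count
            + b"\x00")                   # block terminator
```

Splicing these bytes in before the first image descriptor is all it takes to turn a one-shot GIF into a looping one.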
USE FOR
- Reaction memes (most platforms still pipe them as image/gif)
- Minimal motion loops, pixel art, spinners
- Low-colour line art, monochrome banners, low-fi previews
AVOID
- Photographs: 256 colours simply isn't enough, even with dither
- Gradients or long clips: file size explodes; mp4 / WebM win by 10–50×
- Anything needing real (non-binary) alpha
| scope | browsers | tools | CLI |
|---|---|---|---|
| GIF | ✓✓✓ universal since Mosaic 1993 | ✓✓✓ Photoshop · GIMP · ezgif · Figma | gifsicle -O3 in.gif -o out.gif · ffmpeg -i in.mp4 out.gif |
PNG — DEFLATE, Scanline Filters, and a Patricide
The day GIF started charging, free engineers wrote its successor.
In early 1995 Unisys began enforcing its LZW patent: every GIF encoder, including CompuServe's own, now owed money. Within two weeks a thirty-person volunteer crew on Usenet's comp.graphics rallied around Thomas Boutell with four goals: (a) wholly patent-free; (b) smaller than GIF; (c) real alpha, not a single transparent palette slot; (d) 16 bit / channel plus ICC profiles and gamma, ready for the next decade of hardware. Nine months later PNG 1.0, zlib, and DEFLATE shipped as three simultaneous RFCs — possibly the cleanest, fastest patricide in internet history.
Technical core
Five pillars hold PNG up. ① DEFLATE = LZ77 + Huffman — the exact stack used by zip and gzip, RFC 1951, patent-free by construction. ② Five scanline filters (None / Sub / Up / Average / Paeth): each row picks its best filter independently — the filter doesn't compress, it predicts residuals so DEFLATE can spot repetitions. Paeth, which predicts from the left, upper and upper-left neighbours, almost always wins on natural images. ③ The chunk system: IHDR / PLTE / IDAT / IEND are mandatory; everything else (tRNS for palette transparency, gAMA / cHRM / iCCP for colour management, tEXt / iTXt for metadata, acTL / fcTL for APNG, …) is optional and CRC-checked, and decoders must safely skip ancillary chunks they don't recognise — so PNG can grow forever without breaking old readers. ④ Real alpha: an independent 8- or 16-bit alpha channel, no longer disguised as a palette slot. PNG-32 is plain RGBA 8-bit. ⑤ Adam7 interlace: a 7-pass progressive scan — invaluable on 56 K modems twenty years ago, mostly obsolete today.
Fig 3 · Full pipeline · raw RGBA → per-row filter (None / Sub / Up / Average / Paeth) → DEFLATE (LZ77 + Huffman, zlib level 0–9) → pack into chunks (IHDR + IDAT × N + IEND, each CRC-checked) → emit .png. Optional: pre-shuffle rows with Adam7 before filtering.
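The Paeth predictor itself is only a few lines; a sketch following the spec's pseudocode (a = left, b = above, c = upper-left), plus filtering one row of single-byte samples as a worked step of the pipeline:

```python
def paeth(a, b, c):
    """PNG Paeth predictor: return the neighbour closest to the estimate a + b - c."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    if pb <= pc:
        return b
    return c

def filter_paeth_row(row, prev):
    """Filter type 4 on one row of bytes; prev is the reconstructed row above.
    Missing neighbours at the left edge are treated as 0, per the spec."""
    return [(row[i] - paeth(row[i - 1] if i else 0, prev[i], prev[i - 1] if i else 0)) % 256
            for i in range(len(row))]
```

On a flat region the residuals collapse to zero, which is exactly the kind of input DEFLATE compresses best.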
| format | year | lossless | palette | alpha | animation | typical size vs JPEG-Q85 |
|---|---|---|---|---|---|---|
| BMP | 1990 | ✓ | ✓ | partial (v5) | — | ≈ 8–20 × |
| GIF | 1987 | partial | ✓ (256) | 1-bit | ✓ | ≈ 0.4 × (low colour) |
| PNG-8 | 1996 | ✓ | ✓ (256) | 8-bit | — | ≈ 0.3 × |
| PNG-24/32 | 1996 | ✓ | — | 8 / 16-bit | — | ≈ 1.5–5 × |
| JPEG (Q85) | 1992 | — | — | — | — | 1.0 × (baseline) |
$ oxipng -o6 in.png # brute-force re-pack, 5–30% smaller
$ pngcrush -reduce -brute in.png out.png # classic, slower but still useful
$ convert in.png -strip out.png # ImageMagick — drop metadata chunks
$ pngquant --quality=70-90 in.png # lossy palette quantisation → PNG-8
$ zopflipng -m in.png out.png # Google's zopfli, max DEFLATE compression
USE FOR
- Screenshots, recorded frames, UI design exports (lossless + sharp edges)
- Transparent logos, PWA icons, Material icons
- Anything needing reliable alpha and cross-platform consistency
- A lightweight 16-bit/channel transit format before EXR enters the picture
AVOID
- Photographs: 5–10 × larger than JPEG / WebP / AVIF
- Video frame sequences: use H.264 / AV1 / WebM, not APNG
- Above-the-fold hero images where bytes matter most
| scope | browsers | tools | CLI |
|---|---|---|---|
| PNG | ✓✓✓ universal since IE 4 / Mozilla 1.0 | ✓✓✓ Photoshop · Figma · Sketch · GIMP · Preview | oxipng -o6 · pngquant · zopflipng |
APNG — PNG Secretly Grew Frames
PNG WG said no. Mozilla shipped it anyway.
In 2004 Mozilla wanted lightweight loading animations in Firefox: GIF's 256 colours looked ugly and the PNG working group's official MNG was so vast that almost no one implemented it. Two Mozilla engineers, Stuart Parmenter and Vladimir Vukićević, simply added three new chunks to PNG — acTL (animation control), fcTL (per-frame control), fdAT (frame data). They sent the proposal to the PNG mailing list; the working group flatly refused, citing damage to "PNG's simplicity". Mozilla shipped it anyway in Firefox 3 (2008). A decade later Apple and Google followed, and in 2017 the W3C finally adopted APNG as a standard. A rejected extension, ratified later by the market and the spec.
acTL sits right after IHDR; the first IDAT is still a perfectly legal static PNG — old decoders stop at the dashed red line. From there on, fcTL + fdAT alternate per frame, with each fcTL describing the frame's position, delay and disposal mode.
Technical core
APNG only adds three things to PNG. ① Three new chunks: acTL carries frame count and loop count; fcTL is a per-frame control block describing offset, width/height, delay (a 16-bit numerator and denominator), blend mode and disposal mode; fdAT is essentially an IDAT prefixed with a 4-byte sequence number — its data payload format is identical. ② The first frame is still a valid PNG: keeping frame 0 as an IDAT means a decoder that doesn't understand APNG (old Safari, older ImageMagick) just sees a static image. This backward-compatibility trick is the biggest reason APNG won where MNG failed. ③ Blend / disposal modes: blend modes are SOURCE (overwrite) and OVER (alpha composite); disposal modes are NONE (keep), BACKGROUND (clear), PREVIOUS (restore prior frame) — exact same semantics as GIF 89a. For everything else (colour space, filters, DEFLATE), APNG inherits PNG byte for byte.
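The static-fallback trick in ② can be checked mechanically: a file is APNG exactly when an acTL chunk appears before the first IDAT. A sketch of a chunk walker over the universal length / type / payload / CRC32 chunk shape:

```python
import struct, zlib

PNG_SIG = b"\x89PNG\r\n\x1a\n"

def make_chunk(ctype, payload):
    """length + type + payload + CRC32 over (type + payload) — every PNG chunk's shape."""
    return (struct.pack(">I", len(payload)) + ctype + payload
            + struct.pack(">I", zlib.crc32(ctype + payload)))

def png_chunks(data):
    """Yield (type, payload) for every chunk in a PNG / APNG byte string."""
    assert data[:8] == PNG_SIG
    pos = 8
    while pos < len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        yield ctype, data[pos + 8:pos + 8 + length]
        pos += 12 + length            # 4 length + 4 type + payload + 4 CRC

def is_animated(data):
    """APNG iff acTL precedes the first IDAT (the backward-compatibility rule)."""
    for ctype, _ in png_chunks(data):
        if ctype == b"acTL":
            return True
        if ctype == b"IDAT":
            return False
    return False
```

A decoder that has never heard of acTL simply skips it (it is an ancillary chunk) and renders the IDAT frame as a normal still PNG.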
USE FOR
- High-quality animated stickers (Twitter / Telegram / WeChat)
- Animations needing real alpha (GIF gives you only 1-bit)
- Anywhere a static fallback matters: non-APNG decoders still see frame 0
AVOID
- Bandwidth-tight contexts: 2–5 × larger than WebP / AVIF at equal quality
- Long sequences or video clips: use H.264 / AV1 / WebM
| scope | browsers | tools | CLI |
|---|---|---|---|
| APNG | ✓✓✓ Firefox 3+ · Safari 8+ · Chrome 59+ · Edge 18+ | ✓✓ GIMP · Photoshop (plugin) · ezgif | apngasm out.apng in_*.png · ffmpeg -plays 0 ... out.apng |
animated WebP — WebP's Multi-Frame Twin
WebP slipped multiple frames into one container — neater than PNG, prettier than GIF.
Static WebP already crushed GIF on bytes — the same sticker is typically a third the size in WebP. The next step was obvious: extend the RIFF container from one frame to many. Google added a VP8X extended header to declare feature flags (alpha / animation / ICC), an ANIM global animation block, and a stream of ANMF per-frame blocks. The extension landed around 2012 with libwebp 0.2, and overnight WebP went from a still-image format to one that beats GIF by ~30 % in size, an order of magnitude in quality, and finally adds real alpha. Today's "premium" stickers on Telegram and WhatsApp are almost all animated WebP.
VP8X declares which feature flags are active (animation / alpha / ICC); ANIM gives the background colour and loop count once; each ANMF then carries one frame, internally wrapping a VP8 (lossy) or VP8L (lossless) bitstream.
Technical core
Three sentences cover the entire mechanism. ① RIFF + VP8X header: RIFF is Microsoft's 1991 chunk container (anyone who's opened a .wav or .avi has met it). WebP reuses it verbatim and adds an 8-byte VP8X header — the first byte is a bitfield of feature flags (ICC profile / alpha / EXIF / XMP / animation), and the remainder encodes a 24-bit canvas width and height. ② ANIM + ANMF: ANIM sits at file scope and declares background colour plus loop count; each ANMF then carries per-frame offset, dimensions, duration, blend mode and disposal mode — exact same semantics as APNG and GIF. ③ Per-frame codec choice: WebP ships two encoders, VP8 (lossy, motion-compensation + DCT) and VP8L (lossless, LZ77 + colour transform + Huffman). An animated WebP can switch encoders frame by frame — a sticker's flat background uses lossless VP8L, the character animation uses lossy VP8, all in one file.
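A sketch of reading the VP8X flags and canvas size described in ①. Bit positions follow the public WebP container layout (ICC = 0x20, alpha = 0x10, animation = 0x02; canvas stored as 24-bit little-endian values minus one); this is a minimal walker over synthetic input, not a full decoder:

```python
import struct

def webp_features(data):
    """Return the VP8X feature flags and canvas size of a WebP file, or None if no VP8X."""
    assert data[:4] == b"RIFF" and data[8:12] == b"WEBP"
    pos = 12
    while pos + 8 <= len(data):
        fourcc = data[pos:pos + 4]
        size = struct.unpack("<I", data[pos + 4:pos + 8])[0]
        if fourcc == b"VP8X":
            flags = data[pos + 8]                 # 1 flag byte, then 3 reserved bytes
            w = int.from_bytes(data[pos + 12:pos + 15], "little") + 1
            h = int.from_bytes(data[pos + 15:pos + 18], "little") + 1
            return {"icc": bool(flags & 0x20), "alpha": bool(flags & 0x10),
                    "animation": bool(flags & 0x02), "canvas": (w, h)}
        pos += 8 + size + (size & 1)              # RIFF chunks are 2-byte aligned
    return None
```

The same loop extended with `b"ANMF"` handling would enumerate frames; each ANMF payload embeds its own VP8 or VP8L bitstream.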
USE FOR
- The default modern animated-image format · stickers · short loops
- Looped animation that needs alpha and small bytes
- Complex stickers mixing lossless and lossy frames in one file
AVOID
- Safari < 14 (older iOS): you'll need a GIF/APNG fallback
- Latency-critical hardware-decoded video — stick with H.264 / AV1
| scope | browsers | tools | CLI |
|---|---|---|---|
| animated WebP | ✓✓✓ Chrome 32+ · Firefox 65+ · Safari 14+ | ✓✓ Photoshop (plugin) · ezgif · GIMP 2.10+ | cwebp · webpmux -frame f1.webp +100 ... -o anim.webp |
JPEG — Three Decades of the 8×8 DCT
An 8×8 grid that held three decades of human vision.
By the late 1980s scanners, digital cameras and fax machines were arriving in parallel — everyone needed a way to crunch "natural images" to a tenth of their size while the human eye barely noticed. The JPEG committee built a pipeline around three facts: the eye is more sensitive to luma than to chroma, more sensitive to low frequencies than to high, and natural images carry enormous redundancy in their energy distribution. Translated into code, that becomes YCbCr + 4:2:0 + 8×8 DCT + quantisation — and lets JPEG turn a 5 MB photo into 250 KB at Q85 with practically no visible loss.
Technical core
Six stages make up the JPEG pipeline. ① RGB → YCbCr: split luma from chroma so the rest of the pipeline can treat them differently. ② 4:2:0 chroma subsampling: halve Cb and Cr horizontally and vertically — instantly drops 50 % of the data with virtually no perceptual cost. ③ Split into 8×8 blocks; run DCT-II per block: spatial → frequency domain. Natural-image energy clusters in the top-left (low frequency); the bottom-right is mostly near-zero. ④ Quantisation tables (luma + chroma, chroma being more aggressive): each coefficient is divided by the matching integer and rounded, killing huge swathes of high-frequency information. This is the only lossy step — every visible artefact JPEG ever produces comes from here. ⑤ Zig-zag scan + RLE + Huffman: unroll the 64 coefficients into a 1-D stream so the long zero-tail compresses cleanly under RLE, then Huffman-encode the remaining literals. Lossless. ⑥ JFIF / Exif container: the JPEG spec only defines the codec stream (SOI / APPn / DQT / DHT / SOF / SOS / EOI markers); the file format is a separate layer. JFIF 1.02 (1992) standardised an APP0 metadata segment, Exif (1995) tucked camera metadata into APP1. Almost every .jpg you've ever seen is "JFIF + Exif wrapping a JPEG codec stream".
Fig 6 · The full JPEG pipeline · RGB → YCbCr split → 4:2:0 subsample (−50 %) → 8×8 blocks → DCT-II → Quantise (★ the one and only lossy step, Q controls how brutal) → zig-zag scan → RLE → Huffman → JFIF / Exif wrapper → .jpg. The encoder really only has four knobs: Q, quant tables, subsample ratio, and baseline-vs-progressive scan.
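The zig-zag order in step ⑤ can be generated rather than hard-coded: walk the anti-diagonals of the 8×8 block, alternating direction so low-frequency coefficients come first and the zero tail clusters at the end. A sketch:

```python
def zigzag_order(n=8):
    """(row, col) visiting order of JPEG's zig-zag scan on an n×n block."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],                              # anti-diagonal index
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))   # alternate direction

def scan(block):
    """Unroll an 8×8 coefficient block into the 64-entry zig-zag sequence."""
    return [block[r][c] for r, c in zigzag_order()]
```

After quantisation, the high-frequency corner of the block is mostly zeros, so `scan` produces a sequence ending in a long zero run that RLE then collapses.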
| format | year | typical 1080p photo | quality at same size |
|---|---|---|---|
| JPEG Q85 | 1992 | ≈ 250 KB | baseline |
| WebP Q75 | 2010 | ≈ 165 KB | ≈ JPEG Q85 |
| HEIC Q60 | 2015 | ≈ 125 KB | ≈ JPEG Q85 |
| AVIF Q60 | 2019 | ≈ 95 KB | ≈ JPEG Q85 |
| JXL Q90 | 2021 | ≈ 85 KB | ≈ JPEG Q85 |
$ cjpeg -quality 85 -optimize -progressive in.ppm > out.jpg # reference libjpeg encoder, progressive scan
$ jpegoptim --max=85 --strip-all in.jpg # cap quality at 85, drop all metadata in place
$ cjpeg -quality 85 in.png > out.jpg # mozjpeg's cjpeg build — 5–10% smaller at same Q
$ exiftool -all= -overwrite_original in.jpg # nuke all Exif / GPS / thumbnail metadata
USE FOR
- Real photographs (nature, portraits, landscapes)
- Truecolor gradients, soft backgrounds, art photography
- Anything with rich, continuously varying colour
- Maximum compatibility — every device on Earth decodes JPEG
AVOID
- Text / screenshots / UI: visible 8×8 block artefacts
- Line art / cartoons / pixel art: ringing near sharp edges
- Anything needing an alpha channel
- Engineering images where every pixel must survive intact
| scope | browsers | tools | CLI |
|---|---|---|---|
| JPEG / JFIF / Exif | ✓✓✓ universal — every browser, every OS, every camera | ✓✓✓ Photoshop · Lightroom · Figma · Preview · everything | cjpeg · jpegoptim · mozjpeg · exiftool |
JPEG-LS — The Lossless JPEG You Never Heard Of
3× faster than PNG, but no container, no colour management — and no one remembered.
By the mid-1990s the medical-imaging world had a clear ask: CT and MRI frames were 12-bit greyscale, hundreds of slices per scan, and they had to be lossless — but simpler than PNG and more usable than JPEG's lossless mode. Marcelo Weinberger and team at HP Labs produced LOCO-I (LOw COmplexity LOssless COmpression for Images): a median-of-three predictor (MED) estimates each pixel, the residual goes into Golomb-Rice coding, and long flat runs switch to RLE. ISO/IEC 14495-1 shipped in 1997, beating PNG slightly on ratio and decoding 3–5× faster — but JPEG-LS arrived as a bare bitstream with minimal markers: no container, no ICC profile, no metadata, no alpha. Browsers ignored it entirely. Only medical imaging kept it alive — DICOM still embeds JPEG-LS as one of its standard encodings.
x is estimated from its neighbours b (above), a (left) and c (top-left) — three branches pick the best fit, essentially guessing whether the local context is a horizontal edge, a vertical edge, or a flat surface. The residual (actual minus predicted) is then Golomb-Rice coded. The arithmetic is so simple that even the weak CPUs in 1990s hospital CT scanners could keep up.
Technical core
JPEG-LS beats PNG with three tiny ideas. ① MED predictor: just three neighbouring pixels (left, above, top-left) decide whether the current pixel sits on a horizontal edge, vertical edge or smooth surface. c ≥ max(a,b) picks min(a,b); c ≤ min(a,b) picks max(a,b); otherwise a + b − c (planar extrapolation). When the predictor is right, the residual is near zero. ② Golomb-Rice entropy coding: residuals roughly follow a Laplacian / geometric distribution, and Golomb-Rice is the optimal prefix code for it — divide the residual by 2^k, encode the quotient in unary (that many ones plus a terminating zero) and the remainder in k bits flat. The parameter k adapts per context during encoding, so there's no Huffman table to construct and no extra pass over the data. ③ Run-length mode: when the codec sees consecutive pixels predicted by the same context with zero residuals, it switches to RLE and encodes the run length directly — the move that destroys PNG on medical greyscales (mostly black background) and document scans. The whole codec has no DCT, no transform, no quantisation (in lossless mode); it's almost pure arithmetic replacing transforms.
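Ideas ① and ② fit in a dozen lines. A sketch: the real codec also maps signed residuals to non-negative integers and adapts k from context statistics, both elided here:

```python
def med_predict(a, b, c):
    """LOCO-I median (MED) predictor: a = left, b = above, c = upper-left."""
    if c >= max(a, b):
        return min(a, b)        # c above both neighbours: edge, predict the smaller
    if c <= min(a, b):
        return max(a, b)        # c below both: edge the other way, predict the larger
    return a + b - c            # smooth region: planar extrapolation

def golomb_rice(value, k):
    """Golomb-Rice code for a non-negative integer: unary quotient + k-bit remainder."""
    q = value >> k
    rem = format(value & ((1 << k) - 1), "b").zfill(k) if k else ""
    return "1" * q + "0" + rem
```

With k = 2, the residual 9 splits into quotient 2 and remainder 1, giving the bit string "11001" — small residuals get short codes, which is exactly what the MED predictor sets up.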
USE FOR
- DICOM medical imaging (a standard lossless encoding for CT / MRI)
- High-throughput lossless archival — 3–5× faster decoding than PNG
- Embedded lossless cameras with very tight CPU budgets
- Document scans — flat regions plus sharp edges
AVOID
- The web — zero native browser support
- Anything requiring an alpha channel
- Anything requiring embedded ICC profiles / EXIF / metadata
| scope | browsers | tools | CLI |
|---|---|---|---|
| JPEG-LS | ✗ none | ✓ DCMTK · CharLS · MATLAB / Python (pylibjpeg) | charls -e in.pgm out.jls · dcmcjpls in.dcm out.dcm |
JPEG 2000 — The Tragic Defeat of the Wavelet
Technically beats JPEG. Patents tied its feet.
JPEG's 1990s pain points were obvious: visible 8×8 block boundaries, no alpha, only one compression curve, dated metadata. The JPEG WG tried to fix all of it with a clean-sheet algorithm — JPEG 2000. Replace the 8×8 DCT with a whole-image discrete wavelet transform (no block edges, naturally multi-resolution). Replace the entropy coder with EBCOT (Embedded Block Coding with Optimised Truncation), which lets a decoder grab any subset of quality / resolution / component / region from the same .jp2 file — pull just a thumbnail, or just one ROI. Technically it crushes JPEG. Two things broke it. ① Decoding cost is 10× JPEG or more — mobile silicon could not keep up. ② The standard sits on dozens of patents (most RAND-free, but the legal cloud was real), and browser vendors refused to implement it. Mozilla and Google both said no on record. JPEG 2000 survived only in three latency-insensitive, compute-rich worlds: digital cinema (DCI mandates it), satellite imagery, and medical imaging. Safari is the only browser that ships native support — and even that came along for free with Apple's ImageIO framework. Apple never promoted it.
LL₃ is a free 1/8-scale thumbnail — no re-decoding required. This is the physical basis of "decode any resolution you want": grab just LL₃ for 1/8, add the level-2 subbands for 1/4, and so on.
Technical core
Four pieces are worth remembering. ① Discrete wavelet transform replaces the DCT. Lossless mode uses the reversible 5/3 integer wavelet; lossy mode uses the 9/7 floating-point wavelet (higher efficiency). Whole-image transform = no 8×8 block edges = no JPEG-style tiling artefacts. The wavelet is also naturally multi-resolution (see figure above). ② tile + code-block + EBCOT three-level partitioning. Large images are first split into tiles (typically 256×256 or 1024×1024), each tile is wavelet-transformed, each subband is split into code-blocks (typically 64×64), and EBCOT bit-plane codes each block with arithmetic coding before R-D optimisation decides which bit-planes to truncate. ③ Quality / resolution / component / position progression: a single .jp2 can order its codestream four different ways, and any prefix the decoder receives yields either a "low-quality but complete" or "high-quality but single-resolution" or "single-region" image. This is the core capability behind IIIF (the library / museum high-resolution scan protocol). ④ One algorithm, both lossless and lossy — switching is just a matter of the quantisation step, not a separate standard like JPEG vs JPEG-LS.
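The reversible 5/3 lifting behind ① is short enough to show in full. A 1-D sketch for even-length integer signals: `>>` performs the spec's floor division, and boundaries are handled by clamping indices, a simplified stand-in for the standard's symmetric extension:

```python
def dwt53(x):
    """One level of the forward 5/3 integer wavelet (lifting), even-length 1-D signal."""
    n = len(x) // 2
    even, odd = x[0::2], x[1::2]
    # predict: detail = odd sample minus floor of the mean of its even neighbours
    d = [odd[i] - ((even[i] + even[min(i + 1, n - 1)]) >> 1) for i in range(n)]
    # update: approximation = even sample plus a rounded quarter of adjacent details
    s = [even[i] + ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(n)]
    return s, d

def idwt53(s, d):
    """Exact inverse: undo the update, then the predict step."""
    n = len(s)
    even = [s[i] - ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(n)]
    odd = [d[i] + ((even[i] + even[min(i + 1, n - 1)]) >> 1) for i in range(n)]
    x = [0] * (2 * n)
    x[0::2], x[1::2] = even, odd
    return x
```

Because both steps use the same integer arithmetic forwards and backwards, the round trip is bit-exact, which is what makes lossless mode possible; on 2-D images the same pass runs over rows, then columns, and s becomes the next level's input (LL₁ → LL₂ → LL₃).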
USE FOR
- DCI digital cinema (mandatory — every frame in your theatre is .j2k)
- Satellite / remote-sensing / aerial gigapixel imagery (decode-on-demand)
- DICOM medical imaging when "high-fidelity lossless" is required
- Cultural-heritage high-resolution scans (IIIF image servers)
AVOID
- The web — every browser except Safari refuses to ship it
- Mobile / any low-compute decoding context
- Desktop apps that need snappy thumbnails — decoding is slow
| scope | browsers | tools | CLI |
|---|---|---|---|
| JPEG 2000 | ✓ Safari only · ✗ Chrome / Firefox / Edge | ✓✓ Photoshop · GIMP · Preview · ImageMagick | opj_compress -i in.png -o out.jp2 · kdu_compress (commercial) |
JPEG XR — Microsoft's Last Attempt
Microsoft's first 32-bit float HDR web format. Chrome said no.
By 2006 Microsoft surveyed the web, saw JPEG / GIF / PNG still ruling the field, and spotted a gap: ship a next-generation format that beats JPEG, adds alpha, supports HDR floats, and decodes faster than JPEG 2000. Originally HD Photo / Windows Media Photo, it was standardised in 2009 as JPEG XR ("eXtended Range") under ISO/IEC 29199. The technology was genuinely good: a 16×16 photo core transform (PCT) replaces JPEG's 8×8 DCT, with much less visible blocking; native support for RGBE and scRGB 32-bit float HDR; lossless and lossy sharing one algorithm. Microsoft baked native support into Internet Explorer 9 and Edge Legacy. But Chromium refused. Mozilla refused. The reasoning was blunt: "we're already betting on WebP / AVIF; we don't want extra attack surface for a Microsoft-pushed format." When Edge gave up its own rendering engine and switched to Chromium in 2018, the last browser with native JPEG XR support vanished. The painful irony: the "Microsoft pushes a format → Chrome refuses → format dies" playbook was later inverted by Google for WebP — what you push, I'll accept; what I push, you'd better accept.
技术内核
Technical core
JPEG XR 的技术设计有三个亮点。① 整数 16×16 PCT(Photo Core Transform)——本质上是一个类 DCT 的整数变换,但块更大、内部还有一层 4×4 子变换做"重叠"(lapped transform),让块与块之间不再有硬边界。同等质量下,JPEG XR 的 blocking artifact 比 JPEG 弱得多,但解码复杂度只比 JPEG 高一点点(远低于 JPEG 2000 的 10×)。② 原生 HDR float 支持——这是 JPEG XR 最超前的部分。它直接编码 RGBE(共享指数 32-bit)和 scRGB 浮点,不需要色调映射就能存高动态范围内容。这比 HEIC / AVIF 推广 HDR 早了将近十年——但当时显示器和操作系统都没准备好,没人用得上。③ 共享熵编码思路——熵编码部分仍然用类 JPEG 的"块+扫描+游程+熵"路径,所以软件实现成本低,微软自己的参考实现一千多行 C 就够了。这跟 JPEG 2000 几万行的复杂度相比,工程上确实"够轻"——但终究敌不过浏览器厂商的政治意愿。
JPEG XR has three technical strengths. ① Integer 16×16 PCT (Photo Core Transform) — essentially a DCT-like integer transform with a larger block, plus an inner 4×4 sub-transform that does a lapped overlap, killing hard block edges between adjacent macro-blocks. At equal quality JPEG XR shows much weaker blocking than JPEG, while costing only marginally more to decode (nowhere near JPEG 2000's 10×). ② Native HDR float support — the most forward-looking piece. It encodes RGBE (shared-exponent 32-bit) and scRGB floating-point directly, storing high-dynamic-range content without tone-mapping. This predated HEIC's and AVIF's HDR push by nearly a decade — but in 2006 neither displays nor operating systems were ready, and nobody had a workflow for it. ③ Shared entropy-coding lineage — the entropy back end is still a JPEG-style "block + scan + run-length + entropy" pipeline, so implementations are small. Microsoft's own reference implementation is barely a thousand lines of C — far lighter than JPEG 2000's tens of thousands. Engineering cost wasn't the problem. Browser-vendor politics was.
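The shared-exponent idea behind RGBE is compact enough to show. This is a hedged sketch of Greg Ward's classic RGBE packing (the encoding JPEG XR can carry), not JPEG XR's actual bitstream layout: three 8-bit mantissas share one exponent byte, so 32 bits cover a huge dynamic range at ~1/256 relative precision.

```python
import math

def float_to_rgbe(r, g, b):
    """Pack three non-negative floats into 4 bytes: 3 mantissas + shared exponent."""
    v = max(r, g, b)
    if v < 1e-32:
        return (0, 0, 0, 0)
    m, e = math.frexp(v)               # v == m * 2**e with 0.5 <= m < 1
    scale = m * 256.0 / v              # maps the largest channel into [128, 256)
    return (int(r * scale), int(g * scale), int(b * scale), e + 128)

def rgbe_to_float(r8, g8, b8, e8):
    """Unpack: every channel reuses the shared exponent byte."""
    if e8 == 0:
        return (0.0, 0.0, 0.0)
    f = math.ldexp(1.0, e8 - (128 + 8))  # 2**(e - 128) / 256
    return (r8 * f, g8 * f, b8 * f)
```

Precision is relative to the *largest* channel — which is why RGBE stores bright HDR values effortlessly but can posterise a dim channel sitting next to a bright one.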
适用
USE FOR
- (历史) Windows 7 Photo Viewer 默认支持的高质量缩略图
- (历史) Office 2010+ 内置 HD Photo 编辑
- 研究 / 兼容老 Windows 资源时
- (historical) High-quality thumbnails in Windows 7 Photo Viewer
- (historical) HD Photo editing built into Office 2010+
- Research / interoperating with legacy Windows assets
反适用
AVOID
- 2026 任何现代场景:HEIC / AVIF / JPEG XL 全面替代
- Web — 没有任何主流浏览器原生支持
- Any modern 2026 scenario — HEIC / AVIF / JPEG XL fully replace it
- The web — no major browser ships native support
| scope | browsers | tools | CLI |
|---|---|---|---|
| JPEG XR | ✗ none (Edge Legacy only · removed in Chromium Edge) | ✓ Photoshop (plugin) · Windows Photos (legacy) | JxrEncApp -i in.tif -o out.jxr · JxrDecApp |
WebP — Google 把 VP8 帧内拿来做图
WebP — Google Carved an Image Format Out of a Video Frame
把 VP8 视频的一帧抠出来当图片,体积砍掉 30%。
Took one frame out of a VP8 video, shaved 30% off image size.
2010 年的 Google 看着 web 图片世界,觉得三件套(JPEG / PNG / GIF)中间还有一道明显的"裂缝":没有一种格式能同时满足"比 JPEG 小 30%、比 PNG 小 26%、还能动图 + alpha"。Google 当时刚刚在 2009 年用 1.246 亿美元收购了视频编码公司 On2 Technologies,手里握着一颗刚开源的 VP8 视频 codec——VP8 的 intra-frame(I 帧) 已经具备完整的图像帧内编码能力。Google 工程师的算盘很直接:与其重新发明轮子,不如直接把 VP8 的一帧拿出来,套一层 RIFF 容器,就是一种新的图片格式。WebP 由此诞生——它是历史上第一个"视频 codec 直接派生为图片格式"的工业级例子,后来 HEIC / AVIF 都走了完全相同的路线。
In 2010 Google looked at the web's image landscape and saw a clear gap in the JPEG / PNG / GIF triumvirate: nothing was simultaneously "30% smaller than JPEG, 26% smaller than PNG, and capable of both animation and alpha." Having just paid $124.6 million in 2009 to acquire the video-codec company On2 Technologies, Google now owned the VP8 video codec — and a VP8 intra-frame (I-frame) is already a complete still-image encoding pipeline. The Googlers did the obvious thing: pull out a single VP8 frame, wrap it in a RIFF container, ship it as a new image format. WebP was born — historically the first industrial-scale example of "video codec directly repurposed into still-image format". HEIC and AVIF later took the exact same playbook.
技术内核
Technical core
WebP 内部其实是两个完全独立的格式,共用一个 .webp 后缀和一个 RIFF 外壳。① VP8 intra-frame(有损):4×4 / 16×16 块预测(共 10 + 4 种 intra mode)→ 类 DCT 整数变换 → 量化 → boolean arithmetic coding(算术编码)。预测让"猜得准的部分不用传",算术编码比 Huffman 多挤出 5-15% 体积——这是 WebP 比 JPEG 小 30% 的两大功臣。② VP8L(无损):跟 VP8 一点关系都没有,是 Google 自己写的一套独立无损算法——14 种 spatial predictor + color cache(用 hash table 缓存最近用过的颜色)+ LZ77 + Huffman。在自然图像上比 PNG 小 26%,但编码慢 5-10×。③ RIFF 容器:借用微软 Wave 音频用过的 RIFF 格式——文件头是 RIFF<size>WEBP,后面跟 chunk 序列:VP8X(全局信息)/ VP8(有损主帧)/ VP8L(无损主帧)/ ALPH(独立 alpha 通道)/ ANIM + ANMF(动图)/ ICCP(色彩配置)/ EXIF / XMP。④ 独立 alpha:lossy 主帧不带 alpha,alpha 走单独的 ALPH chunk,可以选择无损 lossless 或有损 lossy 编 alpha——这是 WebP 比 JPEG + PNG 拼凑方案精巧的地方。⑤ animated WebP:ANIM 设全局参数(背景色 / 循环次数), ANMF 每帧带 disposal / blend / xy offset,逻辑跟 GIF 完全同源,但每帧用 VP8 / VP8L 编。
WebP is, in fact, two unrelated formats sharing a .webp extension and a RIFF wrapper. ① VP8 intra-frame (lossy): 4×4 / 16×16 block prediction (10 + 4 intra modes) → DCT-like integer transform → quantise → boolean arithmetic coding. Prediction means "the easy-to-guess parts don't need to ship" and arithmetic coding squeezes out another 5–15 % over Huffman — together those are why WebP runs ~30 % smaller than JPEG. ② VP8L (lossless): unrelated to VP8 — a separate lossless codec Google wrote from scratch — 14 spatial predictors + a color cache (hash-tabling recently-used colours) + LZ77 + Huffman. ~26 % smaller than PNG on natural images but 5–10 × slower to encode. ③ RIFF container: borrowed from Microsoft's Wave audio — the file starts with RIFF<size>WEBP, then a sequence of chunks: VP8X (global info) / VP8 (lossy main frame) / VP8L (lossless main frame) / ALPH (separate alpha channel) / ANIM + ANMF (animation) / ICCP (color profile) / EXIF / XMP. ④ Separate alpha: lossy main frames don't carry alpha; alpha lives in a dedicated ALPH chunk that can itself be encoded losslessly or lossily — much cleaner than JPEG + PNG patchwork. ⑤ animated WebP: ANIM sets the globals (background colour, loop count), each ANMF frame carries disposal / blend / xy-offset just like GIF, but each frame is itself VP8 or VP8L.
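The RIFF layout is simple enough to walk by hand. Below is a minimal chunk walker — a sketch only (real WebP parsing would also interpret the VP8X flag bits), exercised here against a synthetic container rather than a real VP8 payload:

```python
import struct

def walk_webp_chunks(buf):
    """Yield (fourcc, payload) for each top-level chunk in a RIFF/WEBP buffer."""
    assert buf[0:4] == b"RIFF" and buf[8:12] == b"WEBP", "not a WebP RIFF file"
    (riff_size,) = struct.unpack("<I", buf[4:8])   # RIFF sizes are little-endian
    assert riff_size == len(buf) - 8               # size excludes 'RIFF' + size field
    pos = 12
    while pos < len(buf):
        fourcc = buf[pos:pos + 4].decode("ascii")
        (size,) = struct.unpack("<I", buf[pos + 4:pos + 8])
        yield fourcc, buf[pos + 8:pos + 8 + size]
        pos += 8 + size + (size & 1)               # chunks are padded to even length

def make_chunk(fourcc, payload):
    """Build one RIFF chunk, including the odd-size pad byte."""
    return fourcc + struct.pack("<I", len(payload)) + payload + b"\x00" * (len(payload) & 1)
```

Note the even-length padding rule: an odd-sized chunk (EXIF, XMP) carries one trailing pad byte that is not counted in its size field — a RIFF convention WebP inherited from Wave audio.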
图 10 · WebP 全流程(lossy 主路径) · RGB → YUV 4:2:0 → 16×16/4×4 切块 → intra 预测(10 种)→ DCT-like 整数变换 → 量化(★ 唯一有损步骤,Q 0-100)→ boolean arithmetic 编码 → RIFF 包外壳(VP8X + VP8 + 可选 ALPH/ICCP/EXIF/ANIM)→ .webp。无损路径走另一条线:VP8L 的 14 predictor + color cache + LZ77 + Huffman。
Fig 10 · The full WebP pipeline (lossy main path) · RGB → YUV 4:2:0 → split into 16×16 / 4×4 blocks → intra prediction (10 modes) → DCT-like integer transform → quantise (★ the only lossy step, Q 0–100) → boolean arithmetic coding → RIFF wrap (VP8X + VP8 + optional ALPH / ICCP / EXIF / ANIM) → .webp. The lossless path goes elsewhere: VP8L's 14 predictors + color cache + LZ77 + Huffman.
RIFF<size>WEBP 是 12 字节文件头;再里头 VP8X 描述全局 flag + 画布尺寸;然后 VP8(lossy)和 VP8L(lossless)二选一;ALPH 单独装 alpha(可独立选有损或无损);ANIM + 多个 ANMF 用于动图;ICCP / EXIF / XMP 是可选 metadata。
The file opens with the 12-byte RIFF<size>WEBP header; VP8X holds global flags + canvas size; then either VP8 (lossy) or VP8L (lossless); ALPH carries alpha independently (itself lossy or lossless); ANIM + multiple ANMF chunks make up animation; ICCP / EXIF / XMP are optional metadata.
| codec | encode time | decode time | typical Q | 1080p photo |
|---|---|---|---|---|
| JPEG (mozjpeg) | 1.0 × | 1.0 × | 85 | ≈ 250 KB |
| WebP (cwebp) | ≈ 3 × | ≈ 1.5 × | 75 | ≈ 165 KB |
| AVIF (avifenc) | ≈ 50 × | ≈ 3 × | 60 | ≈ 95 KB |
$ cwebp -q 75 in.png -o out.webp # lossy default · Q 75 ≈ JPEG Q85 quality
$ cwebp -lossless in.png -o out.webp # VP8L lossless path · 5–10× slower
$ cwebp -near_lossless 60 in.png -o out.webp # lossy preprocessing then lossless encode
$ cwebp -q 80 -alpha_q 100 in.png -o out.webp # keep alpha lossless even with lossy RGB
$ webpmux -frame f1.webp +100 -frame f2.webp +100 \
-loop 0 -o anim.webp # build animated WebP from frames
$ dwebp out.webp -o decoded.png # decode back to PNG for inspection
适用
USE FOR
- 2026 web 图片首选——所有现代浏览器都支持(Chrome 32+ / Firefox 65+ / Safari 14+ / Edge 18+)
- 需要 alpha 的产品图、电商主图(替代 PNG-24)
- 需要 animation 的 UGC、表情、loading(替代 GIF,体积只有 1/4)
- CDN 自动转换 pipeline(Cloudinary、Fastly、Imgix 都支持)
- The default web image format in 2026 — every modern browser ships it (Chrome 32+, Firefox 65+, Safari 14+, Edge 18+)
- Product photos and e-commerce hero shots that need alpha (replaces PNG-24)
- UGC stickers, reactions, loading anims (replaces GIF at ¼ the size)
- CDN auto-conversion pipelines (Cloudinary, Fastly, Imgix all support it)
反适用
AVOID
- iOS < 14 设备(无法升级 iOS 14 的老机型,iPhone 6 及更早)
- 邮件附件(很多邮件客户端、Outlook 老版本不渲染)
- 设计交付 / 印刷输出(用 PNG / TIFF / PSD)
- 需要更高压缩率的现代场景——直接用 AVIF / JXL
- iOS < 14 devices (older hardware that can't run iOS 14 — iPhone 6 and earlier)
- Email attachments — many clients and older Outlook versions still won't render WebP
- Design hand-off / print output — use PNG / TIFF / PSD instead
- Modern scenarios that need maximum compression — go straight to AVIF / JXL
| scope | browsers | tools | CLI |
|---|---|---|---|
| WebP (lossy + lossless + alpha + anim) | ✓ Chrome 32+ · Firefox 65+ · Safari 14+ · Edge 18+ · Opera 19+ | ✓ Photoshop (24+ native) · Sketch · Figma · Squoosh · ImageMagick · GIMP · Affinity | cwebp / dwebp / webpmux / gif2webp (libwebp by Google) |
HEIC / HEIF — 苹果与专利墙
HEIC / HEIF — Apple and the Patent Wall
技术上是 AVIF 的爸爸,专利上是 AVIF 的反例。
Technically the parent of AVIF; legally the cautionary tale.
2015 年 MPEG 把 HEVC(H.265 视频)的帧内编码能力封装成一个图像容器规范,叫 HEIF(High Efficiency Image File Format),标准号 ISO/IEC 23008-12。思路与 WebP 完全同源:用现代视频 codec 的 intra-frame 做静态图像编码,用 ISOBMFF(MP4 同根的容器)装。HEIF 是个"容器规范",真正的像素 codec 由 payload 决定——用 HEVC 装就叫 HEIC(.heic),用 AVC/H.264 装就叫 HEIF AVCI;Apple 选了前者。2017 年 9 月 iOS 11 把相机默认存储格式从 JPEG 改成 HEIC——一夜之间,全球数亿台 iPhone 开始产生 HEIC 文件。比 JPEG 体积小一半、支持 10-bit HDR、支持 alpha、支持多对象嵌套——技术上没毛病,问题全在专利。
In 2015 MPEG wrapped HEVC's (H.265 video) intra-frame coding into an image-container spec called HEIF — High Efficiency Image File Format, ISO/IEC 23008-12. Same thinking as WebP: take a modern video codec's intra-frame, use it as a still-image codec, package it in ISOBMFF (the same container family as MP4). HEIF itself is just a container spec; the actual pixel codec depends on the payload — HEVC-payloaded HEIF is HEIC (.heic), AVC/H.264-payloaded HEIF is HEIF AVCI. Apple picked HEVC. In September 2017, iOS 11 switched the camera's default capture format from JPEG to HEIC — overnight, hundreds of millions of iPhones started producing HEIC files. Half the size of JPEG, 10-bit HDR support, alpha, nested multi-image objects — technically flawless. All the problems are in the patents.
ftyp 声明 brand(heic = HEVC payload);meta 是元数据容器,里头 hdlr 标"图像句柄"、pitm 指定主图 item id、iinf 列所有 item、iloc 给 byte 偏移、iprp 装属性(HEVC config / color / 尺寸);mdat 装真正的 HEVC bitstream——主图、缩略图、派生项、alpha 都是独立 item,通过 iloc 查表找位置。
ftyp declares the brand (heic = HEVC payload). meta is the metadata container — hdlr tags it as an image handler, pitm names the primary-item id, iinf lists every item, iloc gives their byte offsets, iprp carries item properties (HEVC config, colour, dimensions). mdat holds the actual HEVC bitstreams — main image, thumbnails, derived items, alpha all live as independent items, each addressed via iloc.
技术内核
Technical core
HEIF / HEIC 的技术构造分四层。① HEIF 容器 = ISOBMFF box 系——跟 MP4 / MOV / 3GP 同根的"box-in-box"二进制格式,每个 box 4-byte size + 4-byte FourCC type + payload。这套格式过去 20 年被全球视频行业打磨得极其成熟,标准库一抓一大把,Apple 自然顺手。② HEVC intra-frame payload——CTU(Coding Tree Unit)最大可达 64×64,远大于 JPEG 的 8×8 / WebP 的 16×16,同样质量下 macroblock artifact 几乎肉眼不可见;intra prediction 有 35 种方向(DC + Planar + 33 angular),比 VP8 的 10 种细得多;后处理还有 SAO(Sample Adaptive Offset)和 deblocking filter,把块边界进一步抹平。这是 HEIC 能比 JPEG 小 50% 的核心。③ 多对象 / 派生项 / 网格——HEIF 不止能存"一张图",它能存"主图 + 缩略图 + 多视角图 + 派生编辑(裁剪 / 旋转 / 网格拼接)",每个对象一个 item,iloc 表查偏移。Apple 利用这个特性做"突发拍照"(把一个 burst session 的 10 张图打包成 1 个 .heic)。④ Live Photo 混合容器——iPhone 的 Live Photo 不是单文件,它是 1 张 .heic 静图(主关键帧)+ 1 段 .mov 视频(前后 1.5 秒 + 音频)的组合,iCloud 同步时把它们绑在一起作为"一个资产"管理——这是 HEIF 最被低估的工程贡献。
HEIF / HEIC has four technical layers. ① HEIF container = ISOBMFF box family — the same "box-in-box" binary format as MP4 / MOV / 3GP, every box is 4-byte size + 4-byte FourCC type + payload. Twenty years of video-industry tooling makes the spec battle-tested and trivial for Apple to adopt. ② HEVC intra-frame payload — the Coding Tree Unit can reach 64×64, much larger than JPEG's 8×8 or WebP's 16×16, so macroblock artefacts are practically invisible at the same quality; intra prediction has 35 directions (DC + Planar + 33 angular) versus VP8's 10; post-processing adds SAO (Sample Adaptive Offset) and a deblocking filter that further smooth block boundaries. That's the core reason HEIC weighs ~50 % less than JPEG. ③ Multi-item, derived items, grids — HEIF doesn't store "one image"; it stores "main image + thumbnails + multi-view images + derived edits (crop / rotate / grid-tile composition)". Each object is its own item, addressed via the iloc table. Apple uses this to pack a burst-photo session of ten images into a single .heic file. ④ Live Photo as a hybrid container — iPhone's Live Photo isn't a single file; it's a .heic still (the keyframe) + a .mov video (1.5 s before + 1.5 s after, with audio). iCloud syncs them as a bound pair, treating the combo as a single asset — HEIF's most underappreciated engineering contribution.
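The box-in-box structure is easy to see with a top-level walker — and it makes a nice contrast with WebP's RIFF: ISOBMFF sizes are big-endian and *include* the 8-byte box header itself. A sketch only (it stays at the top level and does not descend into meta, which is a FullBox carrying 4 extra version/flags bytes):

```python
import struct

def walk_boxes(buf):
    """Yield (fourcc, payload) for each top-level ISOBMFF box."""
    pos = 0
    while pos < len(buf):
        (size,) = struct.unpack(">I", buf[pos:pos + 4])  # big-endian; counts the header too
        fourcc = buf[pos + 4:pos + 8].decode("ascii")
        if size == 1:                                    # 64-bit largesize follows the type
            (size,) = struct.unpack(">Q", buf[pos + 8:pos + 16])
            payload = buf[pos + 16:pos + size]
        elif size == 0:                                  # box extends to end of file
            payload, size = buf[pos + 8:], len(buf) - pos
        else:
            payload = buf[pos + 8:pos + size]
        yield fourcc, payload
        pos += size

def make_box(fourcc, payload):
    """Build one plain ISOBMFF box (synthetic, for illustration)."""
    return struct.pack(">I", 8 + len(payload)) + fourcc + payload
```

Walking ftyp first is how a reader tells .heic from .avif from plain .mp4 — the brand string inside that first box is the real switch, not the file extension.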
适用
USE FOR
- iPhone / iPad 拍照默认存储(2017 iOS 11+ 至今)
- iCloud 相册同步 / Apple Photos 编辑链
- 10-bit HDR 静态照片(P3 色域 + Dolby Vision Stills)
- Live Photo 双文件混合资产
- Apple 生态闭环内的高效存储与传输
- iPhone / iPad default photo storage (2017 iOS 11+ onward)
- iCloud Photos sync · Apple Photos edit chain
- 10-bit HDR stills (P3 gamut + Dolby Vision Stills)
- Live Photo's two-file hybrid asset
- High-efficiency storage and transfer inside Apple's walled garden
反适用
AVOID
- 任何需要 Web 通用兼容的场景:Chrome / Firefox 至今不支持原生 HEIC
- 跨平台分享 / 邮件附件:Windows、Android 默认看不了
- 商业项目的 Web 主图——用 WebP / AVIF
- 开源 / 自由软件管线——HEVC 专利费让大多数 FOSS 项目不愿意 ship 解码器
- Anything that needs broad web compatibility — Chrome and Firefox still won't ship native HEIC
- Cross-platform sharing / email attachments — Windows and Android won't render it by default
- Web hero images for commercial projects — use WebP / AVIF instead
- Open-source / libre pipelines — HEVC patent fees keep most FOSS projects from shipping a decoder
| scope | browsers | tools | CLI |
|---|---|---|---|
| HEIC / HEIF | ✓ Safari 17+ (macOS 14+ / iOS 17+) · ✗ Chrome · ✗ Firefox | ✓ macOS Preview · Apple Photos · Windows 10+ (HEIF Image Extension paid) · Photoshop 2023+ | heif-enc -q 60 in.png -o out.heic · heif-dec out.heic out.png (libheif) |
AVIF — AV1 的副产品成了王
AVIF — A Video Codec's Side-Effect Became King
为视频生的 codec,顺手把图片格式革命了一遍。
A video codec by birth — and it casually rewrote image formats.
2018 年 3 月 AOMedia 发布 AV1 视频编码 1.0,目标是做"完全免专利费的 HEVC 替代品"——背后是 Google / Mozilla / Cisco / Apple / Netflix / Microsoft / Intel / Amazon / Nvidia / Samsung 三十多家公司组成的联盟,带着各自的专利池交叉许可。AV1 走的是同一条"视频帧内 → 静态图片"路径(WebP / HEIC 都是这条路),把 intra-frame 编码能力套进 HEIF 容器(ISOBMFF),就拿到了 AVIF (AV1 Image File Format)——体积比 HEIC 略小、专利免费、跨厂商共识、Chrome 与 Firefox 与 Safari 三大引擎都点头。AVIF 2019 年 2 月发布标准,Chrome 85 (2020 年 8 月) 落地,Firefox 93 (2021 年 10 月) 跟进,Safari 16.4 (2023 年 3 月) 收尾——HEIC 阵营在 Web 上正式退场。
In March 2018 AOMedia shipped AV1 1.0 — the goal was a "completely royalty-free HEVC alternative". The alliance behind it is 30+ companies (Google, Mozilla, Cisco, Apple, Netflix, Microsoft, Intel, Amazon, Nvidia, Samsung…) cross-licensing their patent pools to make it stick. AV1 took the same "video intra-frame → still image" route as WebP and HEIC, wrapped its intra-frame encoder in HEIF (ISOBMFF), and out came AVIF (AV1 Image File Format) — smaller than HEIC, patent-free, cross-vendor, with all three big browser engines on board. The spec landed in February 2019, Chrome 85 shipped it in August 2020, Firefox 93 in October 2021, Safari 16.4 in March 2023. On the open web HEIC was officially out.
CfL(Chroma from Luma)把色度建模为亮度的线性函数 C = α·Y + β,只需要 signal 一个 α(每块 4 bit 左右);β 是块内均值。色度残差因此大大缩小——这是 AV1 在低 bitrate 下色彩还能保真的关键之一。
CfL (Chroma from Luma) models chroma as a linear function of luma: C = α·Y + β. Only α needs to be signalled (≈4 bits per block); β is the chroma mean. Chroma residuals shrink dramatically — a major reason AV1 keeps colour fidelity at low bitrates.
技术内核
Technical core
AVIF 的技术深度全在 AV1 这一侧——容器只是 HEIF 的复用。① AV1 intra prediction:56 种角度方向(粗扇 9° 步 + 细扇 3° 步)+ 4 种特殊模式(DC / Planar / Smooth / Paeth)+ CfL(Chroma from Luma,色度从亮度推导)+ Palette mode(每块独立小调色板,适合 UI 截图)+ Intra Block Copy(块内自指,跟视频的"运动补偿"对偶)——比 HEVC 的 35 方向、VP8 的 10 方向都细很多。② Superblock 128×128 + 多种切分:递归切到最小 4×4,还允许 2:1 / 1:2 / 4:1 矩形,平坦区整块保留、纹理区切细。③ 变换块 16 种组合:DCT-2 / ADST(非对称离散正弦)/ WHT(Walsh-Hadamard)/ IDTX(恒等)四种变换在 H/V 两个方向独立选择,共 4×4 = 16 种组合——纹理方向不同,选不同变换效率最优。④ HEIF 容器:跟 HEIC 完全同根的 ISOBMFF box 树(ftyp 'avif' · meta · iloc · iprp · mdat),thumbnail / alpha / depth map 都是独立 item。⑤ 专利策略是它最大的非技术杀招:AOMedia 的核心承诺是"会员单位互相 royalty-free 交叉许可,且对所有人 patent non-assert"——Google / Cisco 把已有专利池贡献进来,把"做一个免费 codec"从技术问题变成了行业政治问题,并赢了。
AVIF's technical depth lives on the AV1 side; the container is just HEIF reused. ① AV1 intra prediction: 56 angular directions (coarse 9° + fine 3° steps) + 4 special modes (DC / Planar / Smooth / Paeth) + CfL (Chroma-from-Luma) + Palette mode (per-block tiny palette, great for UI screenshots) + Intra Block Copy (intra-frame self-reference, the still-image dual of motion compensation). HEVC has 35 directions; VP8 had 10. ② Superblocks at 128×128 recursively split down to 4×4, with 2:1 / 1:2 / 4:1 rectangular partitions — flat regions stay whole, textured regions split. ③ Sixteen transform-block combinations: DCT-2 / ADST (asymmetric discrete sine) / WHT (Walsh-Hadamard) / IDTX (identity) chosen independently for H and V — 4×4 = 16 combos, different texture orientations get different transforms. ④ HEIF container: the same ISOBMFF box tree as HEIC (ftyp 'avif', meta · iloc · iprp, mdat); thumbnails, alpha and depth maps live as independent items. ⑤ Patent strategy is the real masterstroke: AOMedia's binding promise is "members cross-license royalty-free; every patent the alliance touches is non-asserted against the world". Google and Cisco committed their pools, and the question of "can a free codec exist?" turned from a technical one into an industry-politics one — which they won.
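The CfL idea fits in a few lines. Below is a sketch of the prediction plus a least-squares fit for α — roughly what an encoder searches for, though real AV1 signals α from a small quantised set rather than as a free float (a simplification assumed here):

```python
def cfl_fit_alpha(luma, chroma):
    """Least-squares alpha for the model C ≈ alpha * (L - mean(L)) + mean(C)."""
    ml = sum(luma) / len(luma)
    mc = sum(chroma) / len(chroma)
    num = sum((l - ml) * (c - mc) for l, c in zip(luma, chroma))
    den = sum((l - ml) ** 2 for l in luma)
    return num / den if den else 0.0

def cfl_predict(luma, alpha, chroma_dc):
    """Predict chroma: chroma DC plus alpha times the AC (mean-removed) luma."""
    ml = sum(luma) / len(luma)
    return [chroma_dc + alpha * (l - ml) for l in luma]
```

When chroma really does track luma linearly (skin, sky gradients, tinted surfaces), the residual after CfL prediction collapses to near zero — only α and the usual DC need to be coded.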
图 12 · AVIF 全流程 · YUV4:2:0 → 128×128 superblock 多级切分 → 56 方向 intra 预测(+ CfL/Palette/IBC) → 16 种变换组合 → 量化(★ 唯一有损步骤,cq-level 控制狠度) → CDF 自适应算术编码 → HEIF box 包外壳 → .avif。cq-level、speed、subsample 比、bit depth 是编码器主要旋钮。
Fig 12 · The full AVIF pipeline · YUV 4:2:0 → 128×128 superblock recursive split → intra prediction with 56 angular modes (+ CfL / Palette / IBC) → one of 16 transform combinations → quantise (★ the only lossy step, governed by cq-level) → CDF adaptive arithmetic coding → HEIF box wrapper → .avif. The main knobs are cq-level, encoder speed, chroma subsample and bit depth.
| codec | year | patent | 1080p photo @ JPEG-Q85 quality | encode time | browser support |
|---|---|---|---|---|---|
| JPEG | 1992 | free (post-2007) | ≈ 250 KB | 1 × | ✓✓✓ universal |
| WebP | 2010 | free | ≈ 165 KB | ≈ 3 × | ✓✓✓ since 2020 |
| HEIC | 2015 | $$$ (3 pools) | ≈ 125 KB | ≈ 20 × | only Safari |
| AVIF | 2019 | free (AOMedia) | ≈ 95 KB | ≈ 50 × | ✓✓✓ all modern |
| JXL | 2021 | free | ≈ 85 KB | ≈ 5 × | partial (Safari · Firefox flag) |
$ avifenc --min 0 --max 63 -a end-usage=q -a cq-level=23 in.png out.avif # typical Q23 — visually near-lossless
$ avifenc -j 8 -s 6 in.png out.avif # speed 0–10 (lower=better/slower); -j threads
$ avifenc -d 10 --yuv 444 in.png out.avif # 10-bit + 4:4:4 chroma — for HDR / design assets
$ avifdec out.avif decoded.png # reference decode via libavif
$ cavif --quality 80 in.png -o out.avif # Rust CLI built on rav1e; faster preset
适用
USE FOR
- 现代 Web 首屏主图 / Hero 图(预编码后 CDN 分发)
- 体积敏感 + 质量要求高的内容图(电商、媒体、博客)
- 透明 PNG 替代 — 体积可省 80–95%,肉眼几乎无损
- 10-bit HDR 图像分发(P3 / Rec.2020 色域)
- 响应式 <picture> 中作为优先 source(配 WebP / JPEG fallback)
- Modern web hero / above-the-fold images (pre-encoded, CDN-served)
- Bandwidth-sensitive content images — e-commerce, media, blogs
- Transparent-PNG replacement — 80–95 % smaller, visually identical
- 10-bit HDR image delivery (P3 / Rec.2020 gamut)
- Top source in <picture> with WebP / JPEG fallback
反适用
AVOID
- 需要 IE / 老 Android(< 5.0) / 老 Safari(< 16) 兼容的场景
- 编码时间敏感:CI 实时构建 / 服务器实时转码 / 浏览器端用户上传
- 用户头像 / 缩略图等"用一次就丢"的小图(编码成本不划算)
- 需要无损归档的工程影像(改用 PNG / EXR / TIFF)
- Anything that must run on IE, old Android (< 5.0), or old Safari (< 16)
- Encode-time-sensitive paths: CI builds, on-the-fly server transcoding, browser-side user uploads
- Avatars / throwaway thumbnails — encode cost outweighs the savings
- Lossless engineering archives — use PNG / EXR / TIFF instead
| scope | browsers | tools | CLI |
|---|---|---|---|
| AVIF · AVIF Sequence (anim) | ✓✓✓ Chrome 85+ (2020-08) · Firefox 93+ (2021-10) · Safari 16.4+ (2023-03) · Edge 121+ | ✓✓ Photoshop 24.2+ · Figma (export only) · GIMP 2.10+ · Squoosh · Cloudflare Images · imgix | avifenc (libavif) · cavif (rav1e) · sharp (Node) · ffmpeg -c:v libaom-av1 |
JPEG XL — 被 Chrome 砍掉的"完美"格式
JPEG XL — The "Perfect" Format Chrome Killed
技术上吊打所有人,被 Chrome 团队以"兴趣不足"砍掉。
Technically beats everyone. Chrome killed it citing "insufficient interest".
2017 年 AOMedia 已经在猛推 AVIF,但有一群人不满足:HDR 摄影师、印刷出版业、漫画 / 插画家、需要无损归档的博物馆、还有手里握着几十亿张 JPEG 资产没法迁移的所有人——AVIF 解决不了他们的问题。Cloudinary 与 Google Research 把两个独立项目(Cloudinary 的 FUIF + Google 的 PIK)合并,推出 JPEG XL,目标是做"一个能同时干完所有事的下一代格式":(a) 把现存 JPEG 文件 无损 transcode 成 JXL,体积省 ~20%,任何时候可逆向恢复原 byte-exact JPEG;(b) 现代 VarDCT lossy 编码,质量比 AVIF 略好;(c) Modular 模式做无损,比 PNG / WebP-LL 都小;(d) 真正的渐进式解码——第一段 ~1/64 数据就能显示完整的"像素化粗略图",随后几段越来越清晰;(e) 8–32 bit + float、HDR、宽色域、CMYK、高位深 alpha 全套原生。技术上几乎是"现代格式应该有的样子"的完整集成,2021 年 2 月以 ISO/IEC 18181 标准化通过——但落地之路比技术艰难得多。
By 2017 AOMedia was already pushing AVIF hard, but a constituency wasn't satisfied: HDR photographers, the print and publishing industry, comic/manga artists, archival museums, and anyone holding billions of legacy JPEGs they couldn't migrate — AVIF solved none of their problems. Cloudinary and Google Research merged two independent projects (Cloudinary's FUIF and Google's PIK) into JPEG XL with the explicit ambition of "doing all of it at once": (a) losslessly transcode existing JPEGs into JXL, ~20 % smaller, reversible to byte-exact original JPEG; (b) modern VarDCT lossy with quality slightly above AVIF; (c) Modular mode for lossless, smaller than both PNG and WebP-LL; (d) real progressive decoding — the first ≈1/64 of the bitstream already displays a complete coarse image, with subsequent segments adding detail; (e) native 8–32 bit + float, HDR, wide gamut, CMYK and high-bit-depth alpha. Technically it's the complete integration of "what a modern format should look like". ISO/IEC 18181 was published in February 2021. The path to adoption proved much harder than the engineering.
用 djxl 反向恢复时,bit-by-bit 还原原始 .jpg。这是其它现代 codec 都做不到的事。
Run djxl to recover the original .jpg byte for byte. No other modern codec offers this.
技术内核
Technical core
JXL 的技术广度是当代图像格式里最大的——它把"现代图像格式应该有的所有能力"打包进同一个容器,六个核心点:① VarDCT(可变块 DCT)——块大小可在 2×2 到 256×256 之间自由变化,远比 AVIF 的 4×4–128×128 灵活;搭配 XYB(感知分离的色彩空间,JXL 自创)+ 自适应量化矩阵(可按图像内容定制),lossy 模式直接对标 AVIF。② Modular 模式——meta-adaptive 预测器(WP / Gradient / Self-correcting,可学习权重)+ 通道变换链(Squeeze / RCT / 自定义 transform),做无损或 near-lossless,小于 PNG / WebP-LL 30–50%。③ JPEG 无损 transcode(最革命性):任意 JPEG 文件解码到 DCT 系数,不再变换、不再量化,直接用 JXL 的熵编码重新打包,体积省 ~20%;djxl 反向时 byte-exact 恢复原 JPEG——这是其它 codec 全都做不到的事。④ 真渐进式解码——比特流头部就是低分辨率版本,解码器收到前 ~1/64 字节就能渲染一张完整的低分辨率图(不像 progressive JPEG 是按频率扫描,中途看起来糊);非常适合慢网。⑤ HDR / 32-bit float / wide gamut / CMYK 全原生——无需 ICC profile hack,XYB 色空间内部就支持 HDR;打印行业的高位深 + CMYK 也是一等公民。⑥ Patch 系统——对图片中重复出现的 pattern(同一个表情、漫画里反复出现的角色脸)单独编码一次,在出现位置插入引用,极大压缩漫画 / 表情包 / 截图。技术上几乎是"现代图像格式应该有的样子"的完整集成。
JXL has the broadest technical surface area of any current image format — it bundles every capability "a modern image format ought to have" into one container. Six pillars: ① VarDCT — block sizes range freely from 2×2 to 256×256, far more flexible than AVIF's 4×4–128×128. Combined with XYB (a perceptually separated colour space JXL invented) and content-adaptive quantisation matrices, lossy mode trades blow-for-blow with AVIF. ② Modular mode — meta-adaptive predictors (WP / Gradient / Self-correcting, weights learnable) plus channel-transform chains (Squeeze / RCT / custom) deliver lossless or near-lossless that's 30–50 % smaller than PNG and WebP-lossless. ③ JPEG lossless transcode (the revolutionary one): decode any JPEG into its DCT coefficients, skip requantising and re-transforming, and just re-encode with JXL's entropy coder — about 20 % smaller. djxl recovers the original JPEG byte for byte. No other codec offers this. ④ True progressive decoding — the bitstream's head is the low-resolution version. Receive the first ~1/64 of bytes and the decoder renders a complete coarse image (unlike progressive JPEG, which scans by frequency and stays blurry mid-load). Excellent for slow networks. ⑤ HDR, 32-bit float, wide gamut, CMYK all native — no ICC-profile hacks; XYB supports HDR internally; high-bit-depth + CMYK are first-class for print. ⑥ Patch system — encode a repeating pattern (an emoji, a recurring character face in a comic) once, then place references at every occurrence. Comics, sticker sheets and screenshots compress dramatically.
图 13 · JXL 三条编码路径并存:lossy 走 VarDCT + 自适应量化(★ 唯一有损步骤)、lossless 走 Modular + 预测器、JPEG transcode 直接打包 DCT 系数;三条路径都汇入 ANS(asymmetric numeral system)熵编码,最后包进 .jxl 容器。djxl 可把 transcode 路径反向恢复为 byte-exact 的原 JPEG。
Fig 13 · JXL fans out into three coding paths: lossy via VarDCT + adaptive quantisation (★ the only lossy step), lossless via Modular + predictors, JPEG transcode by repacking DCT coefficients. All three converge on ANS (asymmetric numeral system) entropy coding before being wrapped in the .jxl container. djxl reverses the transcode path back to a byte-exact original JPEG.
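The "first ~1/64 of bytes shows a complete image" claim has a simple intuition: the DC coefficient of each 8×8 block is just that block's average, and the grid of block averages is a complete picture at 1/64 the pixel count. A toy illustration of that intuition — not JXL's actual bitstream ordering:

```python
def dc_preview(img, bs=8):
    """Downscale a 2-D integer image by averaging bs x bs blocks --
    roughly the complete coarse picture a DC-first decoder can show
    from just the head of the bitstream."""
    h, w = len(img), len(img[0])
    out = []
    for by in range(0, h, bs):
        row = []
        for bx in range(0, w, bs):
            block = [img[y][x] for y in range(by, by + bs)
                                for x in range(bx, bx + bs)]
            row.append(sum(block) // len(block))
        out.append(row)
    return out
```

Contrast with progressive JPEG, which interleaves *frequency* scans: mid-load you have every block's low frequencies and nothing sharp, so the whole image looks uniformly blurry rather than cleanly pixelated.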
| feature | JPEG | WebP | HEIC | AVIF | JXL |
|---|---|---|---|---|---|
| HDR / wide gamut | ✗ | ✗ | ✓ | ✓ | ✓ (XYB native) |
| 16+ bit depth | ✗ | ✗ | partial (10/12) | ✓ (10/12) | ✓ (up to 32 + float) |
| lossless mode | nominal (in base 10918-1 · rarely implemented) | ✓ | ✓ | ✓ | ✓ (best in class) |
| JPEG recompress | — | ✗ | ✗ | ✗ | ✓ (lossless · ~20 % smaller · reversible) |
| progressive | by frequency (blurry) | ✗ | ✗ | ✗ | ✓ true spatial / DC-first |
| CMYK | ✓ | ✗ | ✗ | ✗ | ✓ first-class |
| Chrome support | ✓ | ✓ | ✗ | ✓ | ✗ (removed 2022-10) |
| Safari 17+ | ✓ | ✓ | ✓ | ✓ | ✓ (since 2023-09) |
$ cjxl in.png out.jxl --quality 90 # quality 0–100 (≈90 visually lossless)
$ cjxl in.png out.jxl --distance 1.0 # distance: 0=lossless, ~1=Q90, ~3=Q75
$ cjxl in.jpg out.jxl --lossless_jpeg 1 # JPEG → JXL lossless transcode (~20% smaller)
$ djxl out.jxl roundtrip.jpg # reverse transcode — byte-exact original .jpg
$ cjxl in.png out.jxl -d 0 -e 9 # lossless, max effort (smallest, slowest)
适用
USE FOR
- macOS / iOS 17+ 内部存储链路(Apple Photos 后端)
- 摄影 / RAW 后期管线(Lightroom · Capture One 已 native)
- 印刷出版业(CMYK + 高位深 first-class)
- HDR / wide-gamut / Dolby Vision Stills 长期归档
- 把现存 JPEG 资产无损迁移省 ~20% 体积(可逆)
- 漫画 / 表情包 / 截图(patch 系统压缩极优)
- macOS / iOS 17+ internal storage pipeline (Apple Photos back-end)
- Photography / RAW post pipelines (Lightroom · Capture One ship JXL natively)
- Print and publishing (CMYK + high bit depth as first-class)
- HDR / wide-gamut / Dolby Vision Stills long-term archives
- Migrating existing JPEG libraries — ~20 % smaller, fully reversible
- Comics / sticker sheets / screenshots (patch system compresses superbly)
反适用
AVOID
- 桌面 Chrome / Edge 主流量场景(2022-10 已移除支持)
- Android 主流浏览器(WebView / Chrome 同样不支持)
- 实时性能敏感的服务端 / 客户端 transcoding(库还在快速演进)
- 需要"全 Web 兼容"的公共图床 / CDN 默认输出
- Desktop Chrome / Edge mainstream traffic (support removed Oct 2022)
- Android's main browsers (WebView / Chrome don't support it either)
- Latency-sensitive server/client transcoding (libraries still maturing)
- "Universal web compatibility" as the default CDN output
| scope | browsers | tools | CLI |
|---|---|---|---|
| JPEG XL | ✓ Safari 17+ (2023-09) · flag Firefox image.jxl.enabled · ✗ Chrome (removed 2022-10) · ✗ Edge | ✓✓ Photoshop 24.2+ · Camera Raw · Lightroom · Capture One · Krita · GIMP 2.10.30+ · Affinity Photo 2 · macOS Preview / iOS Photos | cjxl · djxl (libjxl) · sharp (Node, libjxl-bind) |
KTX / KTX2 — 容器与 payload 的分离
KTX / KTX2 — separating container from payload
"它本身不是格式,是装格式的盒子。"
"Not a format itself — a box that holds formats."
GPU 块压缩格式(BCn / ETC2 / ASTC)的规范只规定了"4×4 像素块怎么编成几个字节",但没规定一个完整的纹理资产文件要怎么组织——mipmap 链怎么排?cubemap 的六个面怎么放?array layer 怎么索引?ICC color profile 放哪?Khronos 看不下去,做了 KTX(Khronos TeXture)当通用容器:头部 + key-value metadata + level/layer/face 的 byte-offset 索引表 + 真正的像素 payload。KTX 不关心 payload 是 BC7 还是 ASTC,只负责"把它装好、运行时一次性 upload 到 GPU"。2019 年 KTX2 加上 supercompression(用 Zstd 或 Basis Universal 把已经 GPU 压过的 payload 再压一遍),并把 mip 顺序改成 smallest-first 便于流式加载——成了 glTF 2.0 / WebGPU / Babylon / three.js 的资产事实标准。
GPU block-compression specs (BCn / ETC2 / ASTC) only define "how a 4×4 pixel block is encoded into a few bytes" — they say nothing about how a complete texture asset is laid out: how the mip chain is ordered, how the six faces of a cubemap sit together, how array layers are indexed, where the ICC colour profile lives. Khronos picked up the slack with KTX (Khronos TeXture): header + key-value metadata + a byte-offset index table for every level/layer/face + the actual pixel payload. KTX is payload-agnostic — it doesn't care whether the payload is BC7 or ASTC, it just packs the asset and lets the runtime upload it to the GPU in one go. KTX2 (2019) added supercompression — running the already-GPU-compressed payload through Zstd or Basis Universal a second time — and reversed the mip order to smallest-first so streaming loaders can swap in a low-res placeholder immediately. It is now the de-facto asset format for glTF 2.0, WebGPU, Babylon.js and three.js.
技术内核
Technical core
KTX 的设计有四个支点。① header + index 表——文件头 80 字节,描述纹理的逻辑维度(width / height / depth / mip levels / array layers / faces);后面跟一张 level index 表,告诉 loader 第 N 级 mip 在文件内的 byte offset 和 byte length。这种"先索引后数据"的布局让 loader 不用扫整个文件就能跳读任意 level。② 每 mip level 内有 padding——GPU 上传时纹理需要按硬件对齐(通常 4 字节或 8 字节边界),KTX 直接在 file format 层面加 padding,运行时 memcpy 一行就能直接交给 glCompressedTexImage2D。③ KTX2 supercompression——这是 KTX2 相对 KTX1 最大的进化。GPU 块压缩(BC7 / ASTC)在 GPU 端是不能再压的——它们必须保持"硬件能直接 sample"的格式。但传输时(网络下载、磁盘存储)可以再用 Zstd 把字节流压一遍,运行时解压回原样再 upload。Basis Universal 更激进:它在 KTX2 里存的是一种"中间表示",运行时按目标设备转码成 BC7(桌面 D3D12 / Vulkan)、ETC2(老移动)或 ASTC(现代移动)——一个文件,所有平台。④ 多对象类型——同一份 KTX2 可以装单 2D 纹理、cubemap(6 face)、texture array(N layer)、3D 体积纹理,甚至带 mipmap 的 cubemap array(常用于 IBL 反射探针)。glTF 2.0 用 KHR_texture_basisu 扩展把 KTX2 + Basis 钉成 PBR 资产的官方携带格式。
KTX rests on four pillars. ① Header + level index — an 80-byte header describes the texture's logical dimensions (width / height / depth / mip levels / array layers / faces); then a level index lists the byte offset and byte length of every mip level. With "index first, data later" a loader can seek straight to any level without scanning the whole file. ② Padding inside each mip level — GPUs require texture rows to land on hardware-aligned boundaries (typically 4- or 8-byte). KTX bakes the padding into the file so the runtime can memcpy a row straight into glCompressedTexImage2D. ③ KTX2 supercompression — the headline upgrade over KTX1. GPU block compression (BC7 / ASTC) cannot be re-compressed on the GPU — the format has to stay "hardware-sampleable". But for transit (download, disk) the byte stream can be Zstd'd once and decompressed at load time before upload. Basis Universal goes further: KTX2 stores an intermediate representation that the runtime transcodes per-device into BC7 (desktop D3D12 / Vulkan), ETC2 (older mobile) or ASTC (modern mobile). One file, every platform. ④ Multi-object payload — a single KTX2 can carry a 2D texture, a cubemap (6 faces), a texture array (N layers), a 3D volume texture, even a mipmapped cubemap array for IBL reflection probes. glTF 2.0's KHR_texture_basisu extension nails KTX2 + Basis as the official carrier for PBR assets.
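The 80-byte header + index can be unpacked with a single pass of struct. A sketch assuming the KTX2 layout described above (12-byte identifier, nine u32 fields, then the dfd / kvd / sgd index, then one 24-byte entry per mip level) — real loaders should use Khronos's KTX-Software (libktx):

```python
import struct

KTX2_ID = b"\xabKTX 20\xbb\r\n\x1a\n"   # 12-byte KTX2 file identifier

def parse_ktx2_header(buf):
    """Parse the fixed 80-byte KTX2 header + index, then levelCount level entries."""
    assert buf[:12] == KTX2_ID, "not a KTX2 file"
    (vk_format, type_size, width, height, depth,
     layer_count, face_count, level_count, scheme) = struct.unpack("<9I", buf[12:48])
    (dfd_off, dfd_len, kvd_off, kvd_len,
     sgd_off, sgd_len) = struct.unpack("<4I2Q", buf[48:80])
    # one (byteOffset, byteLength, uncompressedByteLength) triple per mip level
    levels = [struct.unpack("<3Q", buf[80 + 24 * i:80 + 24 * (i + 1)])
              for i in range(level_count)]
    return {"vkFormat": vk_format, "width": width, "height": height,
            "levels": level_count, "supercompression": scheme,
            "levelIndex": levels}
```

This is the "index first, data later" pillar in action: from these 80 + 24·levels bytes a streaming loader can seek straight to the smallest mip and upload a placeholder before the rest of the file has even arrived.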
适用
USE FOR
- glTF 2.0 模型纹理(KHR_texture_basisu)
- WebGPU / WebGL 2 资产管线
- 跨平台游戏纹理(一个 .ktx2 + Basis,运行时转目标格式)
- cubemap / texture array / 3D volume 纹理打包
- 需要流式加载的大尺寸纹理(smallest-first mip 顺序)
- glTF 2.0 model textures (KHR_texture_basisu)
- WebGPU / WebGL 2 asset pipelines
- Cross-platform game textures (one .ktx2 + Basis, transcoded at runtime)
- Cubemaps, texture arrays, 3D volumes packed into one file
- Large textures that need streaming (smallest-first mip order)
反适用
AVOID
- Web 主图 / 普通照片——KTX2 不是"图片格式",浏览器 <img> 不解
- 编辑链(Photoshop / Affinity)——这是终端纹理资产,不是工作格式
- 不需要 GPU 直接 sample 的场景(用 PNG / WebP)
- Web hero images / regular photos — KTX2 is not an image format; <img> won't decode it
- Editing chains (Photoshop / Affinity) — this is a final-asset format, not a working format
- Anything that doesn't need direct GPU sampling — use PNG / WebP
| scope | browsers / engines | tools | CLI |
|---|---|---|---|
| KTX2 / Basis | ✗ 浏览器原生 · ✓ WebGL/WebGPU 通过 loader · ✓ Babylon.js · ✓ three.js KTX2Loader | ✓ Khronos KTX-Software · NVIDIA Texture Tools Exporter · AMD Compressonator | toktx --bcmp --t2 out.ktx2 in.png · basisu in.png -ktx2 -uastc |
DDS — DirectDraw Surface 容器
DDS — the DirectDraw Surface container
"D3D 时代的 KTX,只是没人记得它先来。"
"The KTX of the D3D era — except few remember it came first."
1999 年 Direct3D 7.0 推出的时候,游戏行业急需一个"硬件能直接 sample 的纹理容器"——你不能用 BMP / TGA,因为它们是 CPU 端 RGBA,显卡读到要先解压再上传,带宽吃不住。微软干脆把 DirectDraw Surface(.dds)定义成纹理资产的标准磁盘格式:头部 124 字节描述维度 / mip 数 / pixel format / cubemap 标记,后面直接是 DXT(后来的 BCn)块或未压缩 RGBA8 字节流。Khronos 的 KTX 要 6 年后(2005)才出来。所以严格讲,"GPU 纹理容器"这个范式是微软先做的——KTX 是开放生态对它的回应。Bethesda 时代的 PC 游戏 mod 圈,几乎所有纹理替换包都是 .dds——这就是它的护城河。
When Direct3D 7.0 shipped in 1999, the games industry urgently needed "a texture container the hardware could sample directly". BMP and TGA were CPU-side RGBA — the GPU would have to decompress and re-upload before sampling, and the bus simply couldn't take it. Microsoft defined DirectDraw Surface (.dds) as the standard on-disk texture asset: a 124-byte header describing dimensions / mip count / pixel format / cubemap flags, followed by raw DXT (later BCn) blocks or uncompressed RGBA8. Khronos's KTX wouldn't appear for another six years. Strictly speaking, the "GPU texture container" idea was Microsoft's first — KTX is the open-ecosystem reply. The Bethesda-era PC modding scene (Skyrim / Fallout) shipped texture replacements almost exclusively as .dds — that's the moat that keeps DDS relevant.
技术内核
Technical core
DDS 的结构简单到几乎没什么可讲的——这是它的优点。① 头部 124 字节 DDS_HEADER:固定字段描述 width / height / depth(volume 纹理用)/ mipMapCount / pitch(每行 byte 数)/ PixelFormat(老的 FourCC 字段:DXT1/DXT3/DXT5/...);加上 dwCaps / dwCaps2 标记(cubemap / volume / mip)。② DX10 扩展头 20 字节(可选):DirectX 10+ 引入的现代头,用 DXGI_FORMAT 枚举(DXGI_FORMAT_BC7_UNORM / DXGI_FORMAT_BC6H_UF16 / ...)替代 FourCC——因为新的块压缩格式(BC6H、BC7)的 FourCC 名字位不够用了。③ payload 直接是块压缩字节流——没有 padding 设计、没有 supercompression、没有 key-value metadata,只有最直接的 mip + face + layer 字节拼接。这是它跟 KTX2 最大的差距:DDS 是"足够好"的工程容器,KTX2 是"考虑到 Web / 跨平台 / Basis 转码"的现代容器。但对于 Windows / D3D 闭环,DDS 已经够用 25 年。
DDS's structure is almost embarrassingly simple — and that's its strength. ① The 124-byte DDS_HEADER: fixed fields for width / height / depth (for volume textures) / mipMapCount / pitch (bytes per row) / PixelFormat (the old FourCC field — DXT1 / DXT3 / DXT5 / …); plus dwCaps / dwCaps2 flags (cubemap / volume / mip). ② The optional 20-byte DX10 extension header: a modern header introduced in DirectX 10+ that swaps FourCC for the DXGI_FORMAT enum (DXGI_FORMAT_BC7_UNORM / DXGI_FORMAT_BC6H_UF16 / …) — necessary because newer block formats (BC6H, BC7) ran out of FourCC bits. ③ The payload is just block-compressed bytes — no padding scheme, no supercompression, no key-value metadata, just the most direct possible concatenation of mip × face × layer bytes. That's the gap with KTX2: DDS is a "good enough" engineering container, KTX2 is a modern container that thinks about Web, cross-platform delivery and Basis transcoding. For a Windows / D3D walled garden, though, DDS has been sufficient for 25 years.
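头部布局可以照着上面的字段描述直接读出来——下面是一个只取核心字段的最小 DDS 头解析草图(省略 pitch 校验与 DX10 扩展头的完整解析)。
The header layout reads off exactly as described above: a minimal sketch that pulls just the core DDS_HEADER fields (pitch validation and full DX10 extension-header parsing omitted).

```python
import struct

def parse_dds_header(data: bytes):
    """Minimal DDS_HEADER reader: magic, core dimensions, mip count, FourCC."""
    if data[:4] != b"DDS ":
        raise ValueError("missing 'DDS ' magic")
    # Seven little-endian u32s open the header; note height precedes width.
    size, flags, height, width, pitch, depth, mip_count = \
        struct.unpack_from("<7I", data, 4)
    if size != 124:
        raise ValueError("bad dwSize (expected 124)")
    # DDS_PIXELFORMAT sits 72 bytes into the header (file offset 76);
    # its FourCC field is 8 bytes further in (file offset 84).
    fourcc = data[84:88]
    # When FourCC reads "DX10", the 20-byte extension header follows immediately.
    has_dx10 = fourcc == b"DX10"
    return dict(width=width, height=height, depth=depth,
                mips=mip_count, fourcc=fourcc, dx10=has_dx10)
```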
适用
USE FOR
- Windows PC 游戏纹理资产
- D3D9 / D3D11 / D3D12 引擎(原生支持)
- Bethesda / Valve / id Tech 等老牌游戏 mod 包
- Unreal Engine 4 / 5 中间纹理(导入前)
- Windows PC game texture assets
- D3D9 / D3D11 / D3D12 engines (native support)
- Bethesda / Valve / id Tech mod packs
- Unreal Engine 4 / 5 intermediate textures (pre-import)
反适用
AVOID
- 跨平台 / Web / 移动端——用 KTX2
- 需要 supercompression(Zstd / Basis)的资产管线
- 需要 ICC color profile / 丰富 metadata 的工程影像
- Cross-platform / Web / mobile — use KTX2
- Pipelines that need supercompression (Zstd / Basis)
- Engineering imagery that needs ICC profiles or rich metadata
| scope | engines | tools | CLI |
|---|---|---|---|
| DDS | ✓ D3D9–12 原生 · ✓ Unreal · ✓ Unity · ✓ Source / id Tech | ✓ NVIDIA Texture Tools · DirectXTex · GIMP DDS 插件 · Photoshop NVIDIA Plug-in | texconv -f BC7_UNORM in.png · nvtt_export -f bc7 -o out.dds in.png |
BC1 (DXT1) — 4×4 块、4 bpp 的祖宗
BC1 (DXT1) — the 4×4-block, 4-bpp ancestor
"4 个像素压成 8 字节,显存砍掉 8 倍,从此再也回不去。"
"Four pixels squeezed into eight bytes — VRAM cut 8×, no going back."
1998 年的 GPU 显存极其稀缺——NVIDIA Riva TNT 旗舰 16 MB,普通卡 8 MB,而一张 256×256 的 RGBA 纹理就要 256 KB。一个游戏关卡要几十张纹理,显存装不下,带宽更扛不住(显存带宽要支撑帧缓冲、Z-buffer、纹理 sample 三路并发)。S3 Graphics 提出 S3TC(S3 Texture Compression):把 4×4 = 16 个像素打包成 8 字节,体积压到 1/8(原 64 字节),GPU 纹理单元在 sample 时硬件解块——不需要 CPU 全图解压上传,显存里存的就是块数据。一夜之间,同样显存能装 8 倍的纹理,带宽吃掉 1/8。这是 GPU 块压缩的开山之作,定义了往后 25 年所有 BCn / ETC / ASTC 的基础范式:固定大小块 + 端点 + 内插 + 索引。
In 1998, GPU VRAM was scarce — NVIDIA's flagship Riva TNT had 16 MB, mid-range cards 8 MB. A single 256×256 RGBA texture cost 256 KB. A game level needed dozens; the VRAM couldn't hold them and the bus couldn't feed them (memory bandwidth had to serve framebuffer, Z-buffer and texture sampling at the same time). S3 Graphics proposed S3TC (S3 Texture Compression): pack 4×4 = 16 pixels into 8 bytes, an 8× shrink from the original 64 bytes; the texture unit decodes a block on the fly during sampling, so VRAM stores the compressed blocks directly without any CPU-side full-image decompression. Overnight, the same VRAM could hold 8× as many textures and the bus had to move 1⁄8 the bytes. This is the founding act of GPU block compression and it set the template every later BCn / ETC / ASTC variant follows: fixed-size block + endpoints + interpolation + per-pixel index.
技术内核
Technical core
BC1 的"4×4 块 + 端点 + 内插 + 索引"四件套是它的全部技术内核,也是后面所有 BCn / ETC / ASTC 都在改进的同一个范式。① 固定大小块——4×4,绝不可变。这是为了让 GPU 纹理单元能直接通过坐标计算定位到块,不需要扫表;sample 一个像素只需要"算块号 → 加载 8 字节 → 解端点 → 查 index → 输出颜色"四步,完全硬件实现。② 端点 + 内插——只存两个端点 c0/c1(RGB565,各 16-bit),内插出 c2/c3 让块能表达 4 种颜色。这是个赌博:它假设一个 4×4 块内的颜色变化是"沿着色空间一条直线"的,适用于大多数自然纹理(草地、石头、皮肤)但对锯齿状颜色边缘会糊。③ 2-bit/像素 index——每像素只需要 2 bit 选 4 选 1,16 像素共 32 bit = 4 byte,跟 endpoints 的 4 byte 加一起正好 8 byte 一块。④ 1-bit alpha 隐藏档——如果 c0 ≤ c1(数值上),BC1 进入"alpha 模式":c3 变成"完全透明",c2 = (c0 + c1)/2 只有一种内插;每像素的 index = 3 表示透明。这就是 BC1 的"穷人 alpha"——只有透/不透,但不占额外字节。需要平滑 alpha 必须升级 BC2 / BC3。
BC1's "4×4 block + endpoints + interpolation + index" combo is the entire technical core — every later BCn / ETC / ASTC just iterates on this same template. ① Fixed-size blocks — 4×4, immutable. This lets the GPU's texture unit address a block directly via coordinate arithmetic, no lookup needed; sampling one pixel reduces to "compute block id → load 8 bytes → decode endpoints → read index → emit colour", four steps, all hardware. ② Endpoints + interpolation — only two endpoints c0/c1 (RGB565, 16 bits each) are stored; c2/c3 are interpolated so the block expresses four colours. It's a bet: BC1 assumes the colour variation in any 4×4 block lies along a straight line in colour space. True enough for most natural textures (grass, stone, skin), but jagged colour edges blur. ③ 2 bits per pixel of index — each pixel just needs 2 bits to choose one of four colours; 16 pixels × 2 bits = 32 bits = 4 bytes, which combined with the 4 bytes of endpoints lands exactly at 8 bytes per block. ④ 1-bit alpha hidden mode — if c0 ≤ c1 numerically, BC1 enters "alpha mode": c3 becomes fully transparent, c2 = (c0 + c1)/2 is the only interpolated colour, and an index of 3 means transparent. That's BC1's "poor man's alpha" — opaque/transparent only, no extra bytes. For smooth alpha you have to step up to BC2 / BC3.
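四件套落到代码里就是十几行——下面是一个 BC1 单块解码草图(整数内插的取整方式与具体硬件可能有细微差异)。
The four-piece kit is about a dozen lines of code: a sketch of decoding one BC1 block (integer-interpolation rounding may differ slightly from real hardware).

```python
import struct

def _rgb565(v):
    # Expand a packed RGB565 endpoint to 8-bit channels via bit replication.
    r = (v >> 11) & 0x1F; g = (v >> 5) & 0x3F; b = v & 0x1F
    return ((r << 3) | (r >> 2), (g << 2) | (g >> 4), (b << 3) | (b >> 2))

def decode_bc1_block(block: bytes):
    """Decode one 8-byte BC1 block into 16 RGBA pixels (row-major)."""
    c0, c1, bits = struct.unpack("<HHI", block)
    e0, e1 = _rgb565(c0), _rgb565(c1)
    if c0 > c1:                      # 4-colour mode: two interpolated colours
        palette = [e0 + (255,), e1 + (255,),
                   tuple((2 * a + b) // 3 for a, b in zip(e0, e1)) + (255,),
                   tuple((a + 2 * b) // 3 for a, b in zip(e0, e1)) + (255,)]
    else:                            # c0 <= c1: 1-bit-alpha mode, c3 transparent
        mid = tuple((a + b) // 2 for a, b in zip(e0, e1))
        palette = [e0 + (255,), e1 + (255,), mid + (255,), (0, 0, 0, 0)]
    # 32 bits of indices, 2 per pixel, pixel 0 in the lowest bits.
    return [palette[(bits >> (2 * i)) & 0b11] for i in range(16)]
```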
适用
USE FOR
- 不带 alpha 或仅需 1-bit alpha 的 RGB 纹理
- 老游戏 / 移动端低端设备(对带宽极度敏感)
- 显存预算极紧的 lightmap / 大尺寸地形纹理
- BC7 不可用的旧 D3D9 / OpenGL ES 2.0 平台
- RGB textures with no alpha (or 1-bit alpha at most)
- Older games / low-end mobile devices (extreme bandwidth sensitivity)
- Lightmaps and large terrain textures with tight VRAM budgets
- Legacy D3D9 / OpenGL ES 2.0 platforms where BC7 isn't available
反适用
AVOID
- 需要平滑 alpha 渐变(粒子、烟雾、UI 圆角)——用 BC3 / BC7
- 颜色梯度细致的高质量纹理——块伪影明显
- 法线贴图——4-bpp 端点精度不够,用 BC5
- HDR——用 BC6H
- Smooth alpha gradients (particles, smoke, UI rounded corners) — use BC3 / BC7
- Fine colour gradients in high-quality textures — block artefacts show
- Normal maps — 4-bpp endpoint precision is too coarse, use BC5
- HDR — use BC6H
| scope | APIs | tools | CLI |
|---|---|---|---|
| BC1 / DXT1 / S3TC | ✓ D3D 全版本 · ✓ Vulkan · ✓ Metal · ✓ OpenGL 4.2+ (ARB) / OpenGL ES 3.0+ (extension) | ✓ NVIDIA Texture Tools · AMD Compressonator · texconv · Crunch (cross-platform) | nvtt_export -f bc1 -o out.dds in.png · toktx --t2 --bcmp out.ktx2 in.png |
BC2 / BC3 (DXT3 / DXT5) — alpha 处理两条路
BC2 / BC3 (DXT3 / DXT5) — two ways to handle alpha
"BC2 给你显式 4-bit alpha,BC3 让 alpha 也学 BC1 的内插。"
"BC2 gives explicit 4-bit alpha; BC3 lets alpha use the BC1 trick too."
BC1 的 1-bit alpha(透 / 不透)对游戏 UI 圆角、粒子边缘、烟雾、玻璃、毛发都不够——这些都需要平滑的 alpha 渐变(0 到 255 中间的值)。S3 / Microsoft 在 1998-1999 同时提出 DXT3 和 DXT5 两条路:DXT3(BC2)粗暴,每像素直接给 4-bit alpha,16 像素共 64 bit = 8 byte;再加 BC1 的 8 byte 颜色块,共 16 byte/块,8 bpp。DXT5(BC3)聪明,把 alpha 也当成"端点 + 内插"块——存 2 个 8-bit alpha 端点 + 6 个内插值(共 8 种 alpha) + 每像素 3-bit index;颜色块仍用 BC1 那套。两者体积一样(16 byte/块),但 BC3 在平滑 alpha 渐变(粒子、烟雾)上明显好,BC2 在锐利 alpha 边缘(UI 图标的 1-bit-like alpha)上略好——但实践中 BC3 几乎全胜。所以游戏圈 BC3 / DXT5 才是事实主流。
BC1's 1-bit alpha (opaque or transparent, nothing in between) wasn't enough for game UI rounded corners, particle edges, smoke, glass or hair — all of those need smooth alpha gradients (values between 0 and 255). S3 / Microsoft proposed DXT3 and DXT5 in 1998-1999, two roads. DXT3 (BC2) is brute force: store an explicit 4-bit alpha per pixel; 16 pixels × 4 bits = 64 bits = 8 bytes; plus the 8-byte BC1 colour block, total 16 bytes per block at 8 bpp. DXT5 (BC3) is clever: treat alpha as an "endpoints + interpolation" block too — 2 × 8-bit alpha endpoints + 6 interpolated values (8 alpha levels in total) + a 3-bit index per pixel; the colour block still uses BC1. Both occupy the same 16 bytes per block, but BC3 clearly wins on smooth alpha gradients (particles, smoke); BC2 has a slight edge on razor-sharp alpha edges (UI icons that are basically 1-bit alpha). In practice BC3 wins almost everywhere — so the games industry treats BC3 / DXT5 as the de-facto default.
技术内核
Technical core
两个格式的核心差异全在 alpha 块。① 都是 16 byte/块,8 bpp——BC1 的颜色块 8 byte 不变,各加 8 byte 的 alpha 块。颜色端点和 BC1 一样:c0/c1 RGB565 + 内插 c2/c3 + 2-bit index——没区别。② BC2 的 alpha 块 = 16 个 4-bit 直接值——每像素 0-15 表示 alpha 量化到 16 阶。优点:对锐利 alpha 边界(UI 图标、纹理掩码)无量化误差;缺点:平滑渐变只有 16 阶,会出 banding。BC2 在 1999 年被一些早期 UI 系统用过,后来逐渐让位给 BC3。③ BC3 的 alpha 块 = BC1 alpha 化——存 2 个 8-bit alpha 端点 a0/a1(各 1 byte = 2 byte),如果 a0 > a1 用 6 个 1/7 步长内插值(共 8 阶),如果 a0 ≤ a1 用 4 个内插值 + 2 个保留(0 和 255 的硬端点)= 8 阶里有 2 个固定;每像素 3-bit index(16 px × 3 bit = 48 bit = 6 byte)。共 2+6 = 8 byte。BC3 在平滑 alpha(粒子、烟雾、毛发)上明显优于 BC2,代价是锐利 alpha 边缘会有轻微模糊。④ 命名混乱:游戏圈一般叫 DXT3 / DXT5(D3D 老命名),Khronos / Vulkan / Metal 一般叫 BC2 / BC3——同一个东西两套名字,是 OpenGL 和 D3D 命名分歧的活化石。
The whole difference between the two lives in the alpha block. ① Both are 16 bytes per block, 8 bpp — the BC1 colour block (8 bytes) is unchanged; each format adds an 8-byte alpha block. Colour endpoints, c2/c3 interpolation and 2-bit indices are identical to BC1 — no surprises there. ② BC2's alpha block = 16 explicit 4-bit values — each pixel quantises alpha to one of 16 levels. Pro: zero quantisation error on sharp alpha edges (UI icons, masks). Con: only 16 levels, so smooth gradients band. BC2 saw use in some early-2000s UI systems and then quietly handed the baton to BC3. ③ BC3's alpha block = BC1, applied to alpha — store 2 × 8-bit alpha endpoints a0/a1 (1 byte each = 2 bytes); if a0 > a1, interpolate 6 values at 1/7 steps (8 levels total); if a0 ≤ a1, interpolate 4 values + reserve two slots for hard 0 and 255 (2 of the 8 levels are fixed); 3-bit index per pixel (16 × 3 = 48 bits = 6 bytes). Total 2 + 6 = 8 bytes. BC3 clearly beats BC2 on smooth alpha (particles, smoke, hair), at the cost of slightly fuzzier sharp alpha edges. ④ Naming chaos: the games industry says DXT3 / DXT5 (D3D legacy); Khronos / Vulkan / Metal say BC2 / BC3 — same thing, two name systems, a living fossil of the OpenGL-vs-D3D naming split.
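BC3 alpha 块的两条内插规则可以写成一个小函数——下面的草图按上文描述生成 8 档 alpha 调色板(取整方式为示意,硬件实现的舍入可能略有不同)。
BC3's two alpha-interpolation rules fit in one small function: this sketch builds the 8-level alpha palette exactly as described above (rounding is illustrative; hardware may round slightly differently).

```python
def bc3_alpha_palette(a0: int, a1: int):
    """Build the 8-entry alpha palette of a BC3 alpha block."""
    if a0 > a1:
        # Six interpolated values at 1/7 steps between the endpoints.
        return [a0, a1] + [((7 - i) * a0 + i * a1) // 7 for i in range(1, 7)]
    # Four interpolated values at 1/5 steps, plus hard 0 and 255.
    return [a0, a1] + [((5 - i) * a0 + i * a1) // 5 for i in range(1, 5)] + [0, 255]
```

Each pixel's 3-bit index then picks one of these eight values, which is why smooth gradients survive BC3 so much better than BC2's 16 fixed steps.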
适用
USE FOR
- BC2 → 锐利 alpha 边缘的 UI 图标、纹理掩码
- BC3 → 平滑 alpha 渐变的粒子、烟雾、毛发、玻璃、UI 圆角
- 需要 alpha 但 BC7 不可用的旧平台(D3D9 / GL ES 2.0)
- BC2 → sharp alpha edges (UI icons, texture masks)
- BC3 → smooth alpha gradients (particles, smoke, hair, glass, UI rounded corners)
- Anything that needs alpha on legacy platforms where BC7 isn't available (D3D9 / GL ES 2.0)
反适用
AVOID
- 现代项目(2015+)——BC7 在质量上完全替代,体积一样
- 法线贴图——用 BC5
- HDR——用 BC6H
- Modern projects (2015+) — BC7 fully replaces both at the same size with better quality
- Normal maps — use BC5
- HDR — use BC6H
| scope | APIs | tools | CLI |
|---|---|---|---|
| BC2 / BC3 | ✓ D3D 全版本 · ✓ Vulkan · ✓ Metal · ✓ OpenGL 4.2+ | ✓ NVIDIA Texture Tools · AMD Compressonator · texconv · Crunch | nvtt_export -f bc3 -o out.dds in.png · texconv -f BC3_UNORM in.png |
BC4 / BC5 — 单/双通道,法线贴图省一通道
BC4 / BC5 — single / dual channel, dropping a channel from normal maps
"法线贴图省一通道,显存再砍一半。"
"Drop a channel from normal maps; halve the VRAM again."
游戏图形里,法线贴图是仅次于 albedo 的第二大显存消耗——每个像素一个法线向量(X, Y, Z)。直觉上要 RGB 三通道,但法线是单位向量(长度 = 1),所以 Z 可以由 X / Y 推导出来:Z = sqrt(1 - X² - Y²)。这意味着实际只需要存 X / Y 两个通道,Z 在 fragment shader 里现算。BC5 就是为这个场景设计的——只存 R / G 两通道,每个通道用 BC3 的 alpha 块法(端点 + 内插 + 3-bit index),共 16 byte/块、8 bpp。BC4 是 BC5 的"半个版本",只存一个通道,用于灰度纹理:高度图、roughness 图、AO 遮罩、metallic 通道。BC4 / BC5 的本质是"把 BC3 的 alpha 块单独拎出来当颜色通道用"——这种"通道拆分 + 几何内插"的思路让法线贴图维持与 BC3 RGB 相同的 8 bpp,但质量提升 3-5×(因为不浪费 bits 在不需要的通道上)。
In game graphics, normal maps are the second-largest VRAM hog after albedo — every pixel stores a normal vector (X, Y, Z). Intuitively that means three RGB channels, but a normal is a unit vector (length 1), so Z can be derived: Z = sqrt(1 − X² − Y²). You really only need to store X / Y; the fragment shader recomputes Z. BC5 is built for exactly that — store just R / G, each compressed with the BC3 alpha-block trick (endpoints + interpolation + 3-bit index), 16 bytes per block at 8 bpp. BC4 is the "half-version" of BC5: just one channel, for greyscale textures — height maps, roughness maps, AO masks, the metallic channel. BC4 / BC5 are essentially "BC3's alpha block lifted out and used as a colour channel". This "channel split + geometric interpolation" trick keeps normal maps at 8 bpp (same as BC3 RGB) but bumps quality 3-5× because no bits are wasted on a channel you don't need.
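shader 里重建 Z 的那一步,写出来就是几行——注意要 clamp,因为块压缩误差可能让 x² + y² 略超 1。
The shader-side Z reconstruction is a few lines in any language; note the clamp, because block-compression error can push x² + y² slightly past 1.

```python
import math

def reconstruct_normal(x_unorm: int, y_unorm: int):
    """Recover a unit normal from the two stored BC5 channels.

    Mirrors the shader math above: map 0..255 back to [-1, 1], then
    z = sqrt(max(0, 1 - x^2 - y^2)), clamped against compression error.
    """
    x = x_unorm / 255.0 * 2.0 - 1.0
    y = y_unorm / 255.0 * 2.0 - 1.0
    z = math.sqrt(max(0.0, 1.0 - x * x - y * y))
    return (x, y, z)
```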
技术内核
Technical core
BC4 / BC5 的设计思路简洁到一句话:把 BC3 的 alpha 块当成"通用的单通道压缩块"用。① BC4 = BC3 的 alpha 块独立——4×4 块,8 byte;存 2 个 8-bit 端点 r0 / r1(2 byte)+ 6 个内插值(隐含,不占字节,运行时算)+ 每像素 3-bit index(16 × 3 = 48 bit = 6 byte);共 8 byte / 16 像素 = 4 bpp。每像素只有一个 8-bit 通道(原数据的 R)。② BC5 = 两个 BC4 块叠加——一个 BC4 块存 R(法线 X),一个 BC4 块存 G(法线 Y);共 16 byte / 块 = 8 bpp。Z 不存,fragment shader 里算 z = sqrt(1 - x*x - y*y)——单 sqrt + 2 mul + 1 sub,在 GPU 上只是几条廉价的 ALU 指令。③ BC4 的"unsigned"和"signed"两种模式:BC4_UNORM(0-255)和 BC4_SNORM(-128 到 127),后者专门给法线分量这种"中心对称"信号用,避免 0.5 偏置。BC5 同理。④ 命名又分裂:Khronos 叫 BC4 / BC5,DDS / D3D 的 FourCC 老命名叫 ATI1 / ATI2(由 ATI 提出,后并入 AMD),OpenGL ARB 扩展叫 RGTC1 / RGTC2(Red-Green Texture Compression)——三套名,一个东西。游戏引擎源码里三种叫法都能见到。
BC4 / BC5's design boils down to one sentence: take BC3's alpha block and reuse it as a generic single-channel compression block. ① BC4 = BC3's alpha block, standalone — 4×4 block, 8 bytes; 2 × 8-bit endpoints r0 / r1 (2 bytes) + 6 implicit interpolated values (computed at runtime, no bytes spent) + 3-bit per-pixel index (16 × 3 = 48 bits = 6 bytes); total 8 bytes / 16 pixels = 4 bpp. Each pixel carries one 8-bit channel (the input's R). ② BC5 = two BC4 blocks stacked — one BC4 block for R (normal X), one for G (normal Y); 16 bytes per block = 8 bpp. Z isn't stored — the fragment shader computes z = sqrt(1 − x*x − y*y), one sqrt + two muls + one sub, a few cheap ALU instructions on any GPU. ③ BC4 has UNORM and SNORM modes — BC4_UNORM (0-255) and BC4_SNORM (−128 to 127); the signed variant is specifically for centre-symmetric signals like normal components, avoiding a 0.5 bias. BC5 mirrors this. ④ Naming forks again: Khronos says BC4 / BC5; the legacy DDS / D3D FourCC names are ATI1 / ATI2 (coined by ATI, later part of AMD); the OpenGL ARB extension calls them RGTC1 / RGTC2 (Red-Green Texture Compression). Three names, one thing — and you'll see all three in any sufficiently old engine source tree.
适用
USE FOR
- BC5 → 法线贴图(行业标准,Unreal / Unity / id Tech 默认)
- BC4 → roughness / metallic / AO / 高度图 等单通道
- SDF(Signed Distance Field)字体纹理(BC4)
- 需要 R / G 双通道但不需要 B 的任何场景
- BC5 → normal maps (industry standard — Unreal / Unity / id Tech default)
- BC4 → single-channel data: roughness / metallic / AO / height maps
- SDF (Signed Distance Field) font textures (BC4)
- Anything that needs R / G but not B
反适用
AVOID
- 需要 RGB 三通道的彩色纹理(用 BC1 / BC7)
- HDR(用 BC6H)
- 3-channel colour textures (use BC1 / BC7)
- HDR (use BC6H)
| scope | APIs | tools | CLI |
|---|---|---|---|
| BC4 / BC5 | ✓ D3D10+ · ✓ Vulkan · ✓ Metal · ✓ OpenGL 4.0+ (RGTC) | ✓ NVIDIA Texture Tools · AMD Compressonator · texconv · Unreal / Unity 自动用 | nvtt_export -f bc5 -o normal.dds normal.png · texconv -f BC5_UNORM normal.png |
BC6H — HDR 块压缩
BC6H — HDR block compression
"显存里的 HDR — 反射探针、cubemap 全靠它。"
"HDR in VRAM — reflection probes and cubemaps depend on it."
PBR(基于物理的渲染)需要 HDR 环境贴图——天空、室内 IBL 反射探针、自发光场景全是。问题是 BC1-5 都基于 8-bit/通道 端点 + 内插,根本无法表达 float16 的 [-65504, 65504] 范围。如果用未压缩 RGBA16F,一张 1024×1024 的 cubemap(6 面)要 1024×1024×6×8 = 48 MB。一个室外场景几张 cubemap 几百 MB 就没了。BC6H 是 D3D11 时代专门为 HDR 设计的块压缩:4×4 块、16 byte/块、8 bpp(跟 BC7 同尺寸),但 payload 直接是 float16 RGB(无 alpha)。它用 14 种块模式来权衡精度——根据这块的颜色分布选最合适的模式。BC6H 让 HDR cubemap 体积从 RGBA16F 的 64 bpp 砍到 8 bpp(8× 压缩),同时保持 float16 的动态范围——这是 PBR 渲染管线得以普及的硬件基础。Unreal Engine 4 / 5、Unity HDRP 默认对 cubemap 的 HDR 资产用 BC6H。
PBR (physically based rendering) needs HDR environment maps — skies, indoor IBL reflection probes, emissive scenes all live in HDR. The trouble is that BC1-5 all rely on 8-bit-per-channel endpoints + interpolation, so they simply cannot express float16's [−65504, 65504] range. Uncompressed RGBA16F would cost 1024 × 1024 × 6 × 8 = 48 MB for a single 1024² cubemap (six faces); an outdoor scene with a handful of cubemaps blows past hundreds of MB. BC6H is the D3D11-era block format built specifically for HDR: 4×4 block, 16 bytes per block, 8 bpp (same size as BC7), but the payload is float16 RGB (no alpha). Its trick is 14 block modes that trade off precision differently — the encoder picks the mode best suited to that block's colour distribution. BC6H takes HDR cubemaps from RGBA16F's 64 bpp down to 8 bpp (8× compression) while keeping float16's dynamic range. That's the hardware foundation that lets PBR pipelines exist at scale today. Unreal Engine 4 / 5 and Unity HDRP default to BC6H for HDR cubemap assets.
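上面的体积账可以直接算一遍——下面的小函数(假想的辅助函数,非任何引擎 API)对比 RGBA16F 与 BC6H 的 cubemap 字节数。
The byte math above checks out directly: this small helper (a hypothetical function, not any engine's API) compares RGBA16F vs BC6H cubemap sizes.

```python
def cubemap_bytes(size: int, bpp: float, mips: bool = False) -> int:
    """Bytes for a 6-face cubemap at `bpp` bits per pixel.

    Hypothetical helper for illustration. With `mips=True` a full mip chain
    adds roughly one third on top of the base level.
    """
    base = size * size * 6 * bpp / 8
    total = base * (4 / 3) if mips else base
    return int(total)

rgba16f = cubemap_bytes(1024, 64)   # uncompressed half-float RGBA: 64 bpp
bc6h    = cubemap_bytes(1024, 8)    # BC6H: 8 bpp, same dynamic range
```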
技术内核
Technical core
BC6H 跟 BC1-5 不是同一类设计——它没有"统一 4 端点 + 内插"的简洁结构,而是 14 种块模式让编码器按块的颜色分布挑最优解。① 14 种块模式——每种模式给端点不同 bit 数(如 10-10-10、7-6-6-6、11-5-4-4 等三/四个分量)、是否启用 2 分区(把 4×4 块拆成两组,每组独立端点 + 内插,适用于块内有明显颜色边界的情况)、index 用 3-bit 还是 4-bit。编码器对每个块尝试多种模式,挑 PSNR 最高那个塞进 16 byte。② 端点用 float16 表示——这是 BC6H 区别于所有其他 BC 的核心。BC1-5 的端点是定点整数(RGB565 或 8-bit),只能表示 0-1;BC6H 的端点是浮点,可以表示 [-65504, 65504]——HDR 高光、太阳直射、自发光物体的真实数值都能装进去。③ UF16 (unsigned) vs SF16 (signed)——UF16 范围 [0, 65504],适合不会有负值的 HDR 颜色;SF16 范围 [-65504, 65504],适合可能有负值的 HDR 法线或其他工程数据。④ 4×4 块仍只 16 byte——这是工程上最重要的一点:BC6H 跟 BC7 一样是 8 bpp,HDR 的体积成本只比 LDR 多 1×(BC1 是 4 bpp,BC7 / BC6H 都是 8 bpp)。这个"HDR 不贵"的承诺让 IBL 反射探针 / cubemap 的大规模使用成为可能——Unreal Engine 默认每个室外场景烘焙几十张 BC6H cubemap。
BC6H isn't built like BC1-5 — there's no clean "two endpoints + interpolation" template. Instead, 14 block modes let the encoder pick the best fit for that block's colour distribution. ① 14 block modes — each mode allocates different bit counts to the endpoints (e.g. 10-10-10, 7-6-6-6, 11-5-4-4, three or four components), optionally enables 2-partition mode (split the 4×4 block into two regions, each with its own endpoints + interpolation, which helps when a block has a sharp colour boundary), and uses 3- or 4-bit indices. The encoder tries multiple modes per block and packs whichever maximises PSNR into the 16-byte block. ② Endpoints expressed as float16 — this is the one thing that sets BC6H apart from every other BCn. BC1-5 endpoints are fixed-point integers (RGB565 or 8-bit) capped at 0-1; BC6H endpoints are floating point and can express [−65504, 65504] — the actual numerical range of HDR highlights, direct sun, emissive surfaces. ③ UF16 (unsigned) vs SF16 (signed) — UF16's range is [0, 65504], suitable for non-negative HDR colour; SF16's is [−65504, 65504], suitable for HDR normals or other engineering data that may go negative. ④ 4×4 block, still just 16 bytes — and this is the most important engineering fact: BC6H is 8 bpp, the same as BC7. HDR costs only 1× more bytes than LDR (BC1 is 4 bpp, BC7 / BC6H are 8 bpp). That "HDR isn't expensive" promise is what makes large-scale IBL reflection probes and HDR cubemaps practical — Unreal Engine routinely bakes dozens of BC6H cubemaps per outdoor scene.
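65504 这个数不是随便写的——它是 IEEE 754 binary16 的最大有限值,用 Python 的 struct 模块就能验证。
65504 is not an arbitrary number: it is the largest finite IEEE 754 binary16 value, verifiable with Python's struct module.

```python
import struct

# 65504 = (2 - 2**-10) * 2**15 is the largest finite binary16 value,
# the ceiling quoted for BC6H's UF16 / SF16 endpoint ranges.
FLOAT16_MAX = (2 - 2**-10) * 2**15

# struct's "e" format packs IEEE 754 binary16; a round-trip at the limit survives.
packed = struct.pack("<e", FLOAT16_MAX)
assert struct.unpack("<e", packed)[0] == 65504.0

# Anything past the limit cannot be represented in half precision.
try:
    struct.pack("<e", 131072.0)      # 2**17: out of binary16 range
    overflowed = False
except (OverflowError, struct.error):
    overflowed = True
```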
适用
USE FOR
- HDR cubemap(天空盒、IBL 反射探针)
- 烘焙的 lightmap HDR 部分
- HDR 自发光纹理(霓虹灯、屏幕、火焰)
- volumetric 体积纹理(雾 / 云,需要 HDR 强度)
- HDR cubemaps (skyboxes, IBL reflection probes)
- HDR portions of baked lightmaps
- HDR emissive textures (neon, screens, flames)
- Volumetric textures (fog / clouds — need HDR intensity)
反适用
AVOID
- LDR 纹理(用 BC7,质量更好且支持 alpha)
- 需要 alpha 的 HDR(BC6H 不支持 alpha)
- D3D10 及以下平台(BC6H 是 D3D11+)
- 移动 GPU 早期型号(看 BPTC / ASTC HDR 支持情况)
- LDR textures (use BC7 — better quality and supports alpha)
- HDR that needs alpha (BC6H has no alpha channel)
- D3D10 and earlier (BC6H requires D3D11+)
- Older mobile GPUs (check BPTC / ASTC HDR support)
| scope | APIs | tools | CLI |
|---|---|---|---|
| BC6H | ✓ D3D11+ · ✓ Vulkan · ✓ Metal (macOS / iOS Apple Silicon) · ✓ OpenGL 4.2+ (BPTC) | ✓ NVIDIA Texture Tools · AMD Compressonator · texconv · ISPC bc6h_enc | nvtt_export -f bc6h -o sky.dds sky.exr · texconv -f BC6H_UF16 sky.exr |
BC7 — 现代 BCn 的集大成
BC7 — the synthesis of modern BCn
"一种格式,八种块模式,自动挑最合适那种。"
"One format, eight block modes — pick whichever fits best."
BC1-5 各自只擅长一种场景:BC1 是 RGB 无 alpha、BC2 是 RGB + 锐利 alpha、BC3 是 RGB + 平滑 alpha、BC4 是单通道、BC5 是双通道。游戏纹理混合场景多——一张角色贴图可能同时有平滑 RGB 渐变 + 锐利 alpha 边缘 + 高频金属反光,任何单一 BCn 都解释不了整张。美术希望"一种格式覆盖所有"——不用每张图手动挑 BCn。BC7 的解法是 8 种内部块模式 + 编码器为每个 4×4 块自动挑最合适那种:同样 8 bpp(跟 BC2 / BC3 一样),但同图视觉质量比它们好 5-10×,几乎追上未压缩。BC7 因此成为 D3D11 时代之后桌面游戏纹理的事实唯一选择——AAA 游戏 90% 桌面贴图都用 BC7。
BC1-5 each excel at exactly one scenario: BC1 is RGB without alpha, BC2 is RGB + sharp alpha, BC3 is RGB + smooth alpha, BC4 is single-channel, BC5 is dual-channel. Real game textures mix scenarios — a single character map can carry smooth RGB gradients, sharp alpha edges and high-frequency metallic specular all at once, and no single BCn explains the whole thing. Artists want "one format that covers everything" without per-texture format picking. BC7's answer: 8 internal block modes plus an encoder that picks the best mode per 4×4 block. At the same 8 bpp as BC2 / BC3, BC7 looks 5-10× better visually — close to uncompressed. That's why, post-D3D11, BC7 became the de-facto only choice for desktop game textures: 90 % of AAA desktop textures are BC7.
技术内核
Technical core
BC7 的设计哲学跟 BC1-5 完全相反——BC1-5 是"一种结构覆盖一类场景",BC7 是"八种结构都做出来,让编码器临时挑"。① 8 种 mode (mode 0-7):每种 mode 内部不同的 (a) 区块切分(1 / 2 / 3 个子区,subsets——把 4×4 块拆成多组,每组独立端点 + 内插,适用于块内有明显颜色边界);(b) endpoint bit 分配(如 mode 1 给端点 6·6·6 高精度,mode 2 给 5·5·5 留更多 bit 给 index);(c) index bit width(2 或 3 或 4 bit,索引位越多越能精细内插);(d) 可选 p-bit(端点末位补一位精度)与 rotation(把 alpha 跟某个颜色通道交换,提升 alpha 精度)。② mode 0-3 偏 RGB 高质量,mode 4-7 偏 RGBA——RGB 模式给颜色更多 bit 但不要 alpha;RGBA 模式拨一些 bit 给 alpha 通道。这种"分工"让 BC7 既能当 BC1 的 RGB 升级,又能当 BC3 的 RGBA 升级,完全覆盖。③ 编码器枚举所有 mode 选最优——每个 4×4 块要对 8 mode × 几十种分区组合 × 端点优化跑一遍,计算 SSE(平方误差和),选 SSE 最低那个塞进 16 byte。这就是 BC7 编码慢的根本原因——典型 8K 纹理用 naive brute-force 要 40 分钟,Intel ISPC SIMD 后降到几秒。④ 8 bpp(同 BC2 / BC3,但视觉质量好 5-10×)——BC1 / BC4 是 4 bpp,BC7 / BC2 / BC3 / BC5 / BC6H 都是 8 bpp。BC7 跟 BC2 / BC3 同 bpp,胜在 mode 选择灵活,典型纹理 PSNR 高 +8-12 dB。⑤ 解码硬件原生——D3D11+ / GL 4.2+ / Vulkan / Metal 全平台支持,GPU sample 一个 BC7 texel 跟 sample 一个 RGBA8 一样快。这是 BC7 比"软件解码 + 上传"格式(如 KTX 装 zlib)的根本优势。
BC7's design philosophy inverts BC1-5: BC1-5 use one structure per scenario, BC7 ships eight structures and lets the encoder pick at runtime. ① 8 modes (mode 0-7), each varying along (a) partitioning (1, 2 or 3 subsets — splitting the 4×4 block into independent regions, useful when there's a sharp colour boundary inside the block); (b) endpoint bit allocation (mode 1 gives endpoints 6·6·6 high precision; mode 2 gives 5·5·5 and donates the saved bits to the index); (c) index bit width (2, 3 or 4 bits — more bits means finer interpolation); (d) optional p-bits (one extra LSB on the endpoints) and rotation (swap alpha with one of the colour channels to boost alpha precision when warranted). ② Mode 0-3 lean toward high-quality RGB; mode 4-7 lean toward RGBA — RGB modes give colour more bits with no alpha, RGBA modes shave bits off colour to fund alpha. That division of labour is what lets BC7 simultaneously upgrade BC1 (RGB) and BC3 (RGBA). ③ The encoder enumerates all modes and picks the optimum — for every 4×4 block it tries 8 modes × tens of partition combinations × endpoint optimisations, scores them by SSE (sum of squared error), and writes the best one into 16 bytes. This is the core reason BC7 encoding is slow: a typical 8K texture needs ~40 minutes with naive brute-force, dropping to seconds with Intel's ISPC SIMD encoder. ④ 8 bpp (the same as BC2 / BC3) with 5-10× better visual quality — BC1 / BC4 are 4 bpp; BC7 / BC2 / BC3 / BC5 / BC6H are all 8 bpp. At equal bpp BC7's mode-selection flexibility wins +8-12 dB PSNR over BC2 / BC3 on typical textures. ⑤ Hardware-native decoding — D3D11+ / GL 4.2+ / Vulkan / Metal all decode BC7 in silicon; sampling a BC7 texel costs the same as sampling RGBA8. That hardware-native sampling is BC7's fundamental advantage over "software-decode + upload" formats like KTX-with-zlib payloads.
图 20 · BC7 完整编码流程:输入一个 4×4 RGBA 块,编码器并行尝试 8 种 mode(每种 mode 内部还要枚举分区方案 / 端点优化),为每种 mode 算出 SSE(squared error sum),取最低那个,把"哪种 mode + 端点 + index"打包成 16 byte block。整张纹理重复几十万次——这就是 BC7 编码慢的根源,也是 ISPC / CUDA 加速器存在的理由。
Fig 20 · BC7's full encode pipeline: take a 4×4 RGBA block, run trial encodes through all 8 modes (each one in turn enumerates partition layouts and endpoint optimisations), score them by SSE (sum of squared error), pick the lowest, and pack "chosen mode + endpoints + indices" into a 16-byte block. A whole texture repeats this hundreds of thousands of times — exactly why BC7 encoding is slow, and exactly why ISPC / CUDA-accelerated encoders exist.
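这套"枚举候选 → 算 SSE → 取最小"的循环可以用一个玩具版示意——下面不是真 BC7 编码器,只演示同样的搜索骨架(把块内每对像素当候选端点)。
The enumerate-candidates / score-by-SSE / keep-the-argmin loop can be shown with a toy version: this is not a real BC7 encoder, just the same search skeleton (every pixel pair in the block tried as candidate endpoints).

```python
def sse(a, b):
    """Sum of squared error between two equal-length lists of RGB tuples."""
    return sum((x - y) ** 2 for pa, pb in zip(a, b) for x, y in zip(pa, pb))

def best_two_colour_fit(block):
    """Toy 'mode search': try every pixel pair as endpoints, snap each pixel
    to the nearer endpoint, keep the candidate with the lowest SSE.

    A real BC7 encoder runs this same try / score / argmin loop, but over
    8 modes, partition layouts and optimised (not sampled) endpoints.
    """
    best = None
    for e0 in block:
        for e1 in block:
            recon = [min((e0, e1), key=lambda e: sse([p], [e])) for p in block]
            err = sse(block, recon)
            if best is None or err < best[0]:
                best = (err, e0, e1, recon)
    return best
```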
| format | bpp | RGBA | quality | encode time |
|---|---|---|---|---|
| BC1 | 4 | RGB + 1-bit α | low | 1× (baseline) |
| BC3 | 8 | RGBA | medium | 1× |
| BC7 | 8 | RGBA | high | ~50-200× of BC1 |
| ASTC 4×4 | 8 | RGBA | high+ | similar to BC7 |
| ASTC 6×6 | 3.56 | RGBA | medium+ | similar to BC7 |
$ nvtt_export --bc7 in.png -o out.dds # NVIDIA Texture Tools, GPU-accelerated
$ ispc_texcomp -bc7 in.png out.dds # Intel SIMD encoder, ~10× faster than naive
$ toktx --encode bc7 out.ktx2 in.png # wrap into KTX2 (web / WebGPU friendly)
$ texconv -f BC7_UNORM in.png # Microsoft DirectXTex CLI
$ Compressonator.exe -fd BC7 in.png out.dds # AMD Compressonator
适用
USE FOR
- 桌面 AAA 游戏纹理(角色 / 场景 / UI / 道具,99% 默认)
- WebGPU 高质量纹理(KTX2 容器封装)
- 同时需要 RGB 高保真 + alpha 的混合贴图
- 升级现存 BC1 / BC3 资产以提升画质(同 / 双倍体积)
- 金属反光 / 高频细节贴图(mode 6 RGBA 单分区 + 4-bit index 表现极佳)
- Desktop AAA game textures (characters / environments / UI / props — 99 % default)
- High-quality WebGPU textures (wrapped in KTX2 containers)
- Mixed maps that need both fidelity-grade RGB and alpha
- Upgrading existing BC1 / BC3 assets for better quality (same or 2× the bytes)
- Metallic specular / high-frequency detail (mode 6 — RGBA single subset + 4-bit index — excels)
反适用
AVOID
- 移动端(用 ASTC,块尺寸更灵活、bpp 可调)
- HDR 纹理(用 BC6H,BC7 仍是 LDR 0-1)
- D3D10 及以下的老硬件(BC7 是 D3D11+)
- 实时编码场景(即便 SIMD 仍比 BC1 慢 5-10×,服务端实时压缩慎用)
- 单 / 双通道贴图(用 BC4 / BC5 更省空间)
- Mobile (use ASTC — flexible block sizes, tunable bpp)
- HDR textures (use BC6H — BC7 is still LDR 0-1)
- D3D10 or older hardware (BC7 requires D3D11+)
- Real-time encoding (even SIMD is 5-10× slower than BC1; server-side live compression is risky)
- Single / dual-channel maps (BC4 / BC5 are more space-efficient)
| scope | APIs | tools | CLI |
|---|---|---|---|
| BC7 | ✓ D3D11+ · ✓ Vulkan · ✓ Metal · ✓ OpenGL 4.2+ (BPTC) · ✓ WebGPU (texture-compression-bc) | ✓✓ NVIDIA Texture Tools (CUDA) · Intel ISPC ispc_texcomp · AMD Compressonator · Microsoft texconv · KTX-Software toktx | nvtt_export --bc7 · ispc_texcomp -bc7 · toktx --encode bc7 |
ETC1 — Android 早期标准
ETC1 — the early Android standard
"OpenGL ES 时代第一个免专利的块压缩。"
"The first patent-free block codec of the OpenGL ES era."
2005 年 Khronos 在为 OpenGL ES 标准化纹理压缩时遇到一个棘手问题——S3TC(BC1-3)效果好但被 S3 Graphics 申请了一堆专利,Khronos 不可能把"必须授权才能用"的格式塞进开放标准。Ericsson Research 提了 ETC1(Ericsson Texture Compression),声明免专利,正好填上空缺,跟着 OpenGL ES 2.0(2007)一起进入 Android 强制基线。Android 从此在游戏纹理上有了统一格式——美术不必为不同 GPU 厂商分别打包,Mali / Adreno / PowerVR / Tegra 全都能解 ETC1。代价是 ETC1 没有 alpha 通道,任何带透明度的资产(UI 图标、粒子、角色边缘)都要拆成"RGB 用 ETC1 + alpha 用 8-bit 灰度图"两份纹理上传——显存和带宽都要付双份钱。这是 ETC2 在 2013 年出生的根本原因。但回到 2005,免专利 + GLES 2.0 强制 = ETC1 一夜之间成了 Android 游戏纹理事实标准。Angry Birds(2009)、Cut the Rope(2010)这一代手机游戏的纹理资产几乎全是 ETC1。
In 2005, while Khronos was standardising texture compression for OpenGL ES, it ran into a thorny problem — S3TC (BC1-3) worked beautifully but was wrapped in patents owned by S3 Graphics, and an open standard couldn't mandate "must license to use" formats. Ericsson Research proposed ETC1 (Ericsson Texture Compression), declared it patent-free, and it slotted neatly into the gap, riding alongside OpenGL ES 2.0 (2007) into the Android mandatory baseline. Suddenly Android had a single texture format every artist could ship — no need to repackage per vendor, since Mali, Adreno, PowerVR and Tegra all decoded ETC1. The price was that ETC1 had no alpha channel, so anything translucent (UI icons, particles, character edges) had to be split into "RGB as ETC1 + alpha as an 8-bit greyscale map" — two texture uploads, double the VRAM and bandwidth. That is exactly why ETC2 was born in 2013. But back in 2005, patent-free + GLES 2.0 mandatory equals ETC1 becoming the de-facto Android texture standard overnight. Angry Birds (2009) and Cut the Rope (2010) — that generation of mobile games — shipped almost their entire texture base in ETC1.
技术内核
Technical core
ETC1 的设计是"把 BC1 的思路换一种几何切分,绕开专利"。① 4×4 块切两半——不像 BC1 把 4×4 当整体处理,ETC1 把块切成上下 2×4 或左右 4×2 两半(块头有 1 bit 标记 flip 方向),每半独立有自己的颜色 base + modifier。这是 ETC1 跟 BCn 最大的几何差异——BCn 块是统一的 16 像素插值,ETC1 是两组 8 像素插值。② RGB444 base + 8 行 modifier 表——每半的 base color 只有 12 bit(RGB444,individual 模式;块头另有 1 bit diff 标记,可切换成 RGB555 base + 3-bit delta 的 differential 模式),精度比 BC1 的 RGB565 还低;但靠 modifier 表补救——3 bit 选 8 行预设里的一行,每行给出 4 个亮度偏移值(如 ±2 / ±8 这种"小幅"行,或 ±47 / ±183 这种"大幅"行),覆盖从平滑渐变到硬边缘的不同需求。③ 2-bit/像素 index——每像素再用 2 bit 选 modifier 行里的 4 个偏移值之一,加到 base color 上得到最终颜色。换言之 ETC1 的颜色计算是"base ± modifier",只在亮度方向上调,色相不变——这意味着 ETC1 处理彩色高频细节(花布、彩色噪点)很差,但处理"单色平滑+亮度变化"(皮肤、墙面、地形)很好。④ 没有 alpha——这是 ETC1 最致命的局限。Android 游戏的解决方案是"双纹理上传":RGB 用 ETC1,alpha 用单通道 8-bit 灰度图(或 ETC1 的另一个块当 alpha 用,叫 ETC1+A 的 hack)。⑤ 每块 8 byte / 16 像素 = 4 bpp——跟 BC1 同体积。质量略差于 BC1(因为色相方向死板),但免专利 = 能强制进 GLES 标准,这是 BC1 做不到的。
ETC1's design is "use a different geometric split from BC1 to dodge the patents". ① 4×4 block split into two halves — unlike BC1, which treats the 4×4 as one unit, ETC1 splits the block into two 2×4 halves (or two 4×2 halves; the block header carries a single flip bit). Each half independently owns its colour base + modifier. That's the biggest geometric difference from BCn: BCn's block is one 16-pixel interpolation; ETC1's is two 8-pixel interpolations. ② RGB444 base + an 8-row modifier table — in individual mode each half's base colour is only 12 bits (RGB444), even less precise than BC1's RGB565 (differential mode gives the first half RGB555 and the second a 3-bit-per-channel delta); the modifier table makes up the difference. Three bits pick one of 8 preset rows, each row carrying four brightness offsets (a "fine" row like ±2 / ±8, a "coarse" row like ±47 / ±183), covering everything from smooth gradients to hard edges. ③ 2 bits per pixel for the index — each pixel picks one of the four offsets in the chosen modifier row and adds it to the base colour, producing the final value. ETC1's colour math is therefore "base ± modifier" — adjustment only along brightness, never along hue. That makes ETC1 poor on coloured high-frequency detail (patterned cloth, coloured noise) and excellent on monochrome-plus-brightness signals (skin, walls, terrain). ④ No alpha — ETC1's most fatal limitation. The Android workaround was the "two-texture upload": RGB as ETC1, alpha as a single-channel 8-bit greyscale map (or a second ETC1 block reused as alpha — the "ETC1+A" hack). ⑤ 8 bytes per 16-pixel block = 4 bpp, the same footprint as BC1. Quality lags BC1 slightly (because hue can't move) but ETC1 is patent-free, which lets it become a mandatory part of the GLES standard — something BC1 could never be.
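The base-plus-modifier arithmetic above can be sketched in a few lines of Python. This is a toy reconstruction, not a full bitstream decoder — in particular, the mapping from the raw 2-bit pixel index to a signed offset is simplified here (the real spec encodes sign and magnitude in separate bits):

```python
# Sketch of ETC1 sub-block colour reconstruction: each 8-pixel half stores
# a base colour, a 3-bit codeword choosing one of 8 preset modifier rows,
# and a 2-bit per-pixel index picking one of four offsets in that row.
# All three channels get the same offset, so only brightness moves.

# The 8 modifier rows from the ETC1 spec; each row (a, b) expands to the
# four offsets (-b, -a, +a, +b).
ETC1_MODIFIER_ROWS = [
    (2, 8), (5, 17), (9, 29), (13, 42),
    (18, 60), (24, 80), (33, 106), (47, 183),
]

def expand444(c4):
    """Expand a 4-bit channel to 8 bits by bit replication."""
    return (c4 << 4) | c4

def clamp(v):
    return max(0, min(255, v))

def decode_pixel(base444, codeword, offset_index):
    """base444: (r, g, b) each 0..15; codeword: 0..7; offset_index: 0..3
    mapping to (-b, -a, +a, +b) — a simplification of the spec's layout."""
    a, b = ETC1_MODIFIER_ROWS[codeword]
    offset = (-b, -a, a, b)[offset_index]
    return tuple(clamp(expand444(c) + offset) for c in base444)
```

For example, a mid-grey base (8, 8, 8) with the finest row and a small positive offset lands at 138 per channel, while the coarse ±183 row saturates against the clamp at either end — which is exactly why that row exists for hard edges.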
适用
USE FOR
- (历史) OpenGL ES 2.0 时代 Android 游戏纹理
- (历史) Android 4.x / 5.x 时代不带 alpha 的资产(地形、天空盒、道具背景)
- 极少数仍需要兼容 OpenGL ES 2.0 设备的旧游戏维护
- (historical) OpenGL ES 2.0-era Android game textures
- (historical) Android 4.x / 5.x assets without alpha (terrain, skyboxes, prop backgrounds)
- The rare modern case of maintaining a legacy game that still ships to GLES 2.0 devices
反适用
AVOID
| scope | APIs | tools | CLI |
|---|---|---|---|
| ETC1 | ✓ OpenGL ES 2.0+(强制) · ✓ OpenGL 4.3+(ARB_ES3_compatibility) · ~ Vulkan(扩展) · ✗ D3D / Metal | ✓ Khronos etc1tool · Mali Texture Compression Tool · ImageMagick · Unity / Unreal 早期内置 | etc1tool in.png --encode -o out.pkm · etc2comp -format ETC1 in.png -o out.ktx |
ETC2 / EAC — alpha 加成
ETC2 / EAC — adding alpha
"ETC1 加上 alpha 通道,正好赶上 OpenGL ES 3.0。"
"ETC1 with alpha — just in time for OpenGL ES 3.0."
ETC1 在 Android 上跑了 6 年(2007-2013),但"没有 alpha"这个缺陷越用越疼。任何带透明度的资产——UI 图标、HUD、粒子系统、抠图角色——都要拆成两份纹理上传:RGB 用 ETC1(4 bpp),alpha 用 8-bit 灰度图(8 bpp),合计 12 bpp,显存和带宽是单纹理的 3 倍。手机游戏的 UI 又特别多透明元素,这个负担实打实地让中低端 Android 设备跑不动。2012 年 Khronos 随 OpenGL ES 3.0 正式推出 ETC2 / EAC(2013 年随 Android 4.3 真正落地)——保持向下兼容(老的 ETC1 块在 ETC2 解码器里能直接用),同时加入 RGBA 模式(ETC2 RGB 块 + EAC alpha 块,共 16 byte = 8 bpp)。ETC2 还顺手补齐了 R11 / RG11 单/双通道格式(对应桌面的 BC4 / BC5,用于法线贴图、roughness 等),让移动端也有了完整的"通道拆分"工具箱。最重要的政治决定:Khronos 把 ETC2 定成 OpenGL ES 3.0 的强制基线——任何宣称支持 GLES 3.0 的 GPU 都必须解码 ETC2。这意味着 2014 年之后的 Android 游戏可以放心地"全资产 ETC2",不再需要为"老设备没 ETC2"留 fallback。Unity / Unreal 在 2014 年都把 Android 默认纹理改成了 ETC2。
ETC1 ran on Android for six years (2007-2013), but "no alpha" hurt more every year. Anything translucent — UI icons, HUDs, particle systems, alpha-masked characters — had to upload two textures: RGB as ETC1 (4 bpp) plus alpha as an 8-bit greyscale map (8 bpp), 12 bpp combined and roughly 3× the bandwidth of a single texture. Mobile UI is unusually heavy on translucent elements, and that overhead measurably dragged down mid- and low-end Android devices. In 2012 Khronos shipped ETC2 / EAC as part of OpenGL ES 3.0 (reaching Android in 2013 with Android 4.3): keep ETC1 backward compatibility (legacy ETC1 blocks decode unchanged in an ETC2 decoder) and add an RGBA mode (an ETC2 RGB block + an EAC alpha block, 16 bytes = 8 bpp total). ETC2 also rounded out single- and dual-channel formats with R11 / RG11 (the mobile counterparts to desktop BC4 / BC5 — normals, roughness, etc.), giving mobile its own full "channel-split" toolbox. The crucial political decision: Khronos made ETC2 a mandatory baseline for OpenGL ES 3.0. Any GPU that claims GLES 3.0 support must decode ETC2. Post-2014 Android games could finally ship all-ETC2 with no "device might not have ETC2" fallback. Unity and Unreal both flipped their Android default to ETC2 in 2014.
技术内核
Technical core
ETC2 的设计哲学是"在 ETC1 上做加法,不做减法"——所有 ETC1 块在 ETC2 解码器里都能正常工作(向下兼容),新增的能力通过块头里特殊的位组合切换。① RGB 块 4 种模式:(a) ETC1 兼容模式(老的"两半 base + modifier" 结构,8 byte);(b) T-mode(把 4×4 块按 T 形分成两个颜色区,适合块内有 L 形 / T 形硬边);(c) H-mode(把块按 H 形分两区,适合垂直硬边);(d) Planar mode(用三个角点的颜色定义平面,块内每像素从平面采样,适合平滑渐变如皮肤、天空)。模式不是靠显式模式位,而是借差分编码中本来非法的溢出位组合来标记——解码器遇到这些"不可能"的组合就切到 T / H / Planar;编码器为每块挑最优。② RGBA8 = ETC2 RGB block + EAC alpha block——一块 16 byte,前 8 byte 是 ETC2 RGB,后 8 byte 是 EAC(Ericsson Alpha Compression)alpha 块。EAC alpha 块存一个 8-bit 基准值 + 4-bit 乘数 + 4-bit 修饰表选择,每像素 3 bit 从 8 个偏移值里挑一个,结构上与 BC3 的 alpha 块同级;真正的 11-bit 精度属于下面的 R11 / RG11。③ R11 / RG11——独立的单/双 11-bit 通道格式,对应桌面的 BC4 / BC5,用于法线贴图(RG11)、高度图 / roughness(R11)等。R11 是 8 byte/块 = 4 bpp,RG11 是 16 byte/块 = 8 bpp。④ punch-through alpha——一种特殊模式叫 ETC2 RGBA1(RGB8_PUNCHTHROUGH_ALPHA1_ETC2),只允许 alpha = 0 或 255 的硬切边(像 BC1 的 1-bit alpha),用于树叶、栅栏这种"完全透明 / 完全不透"的资产,体积仍是 4 bpp。⑤ OpenGL ES 3.0 强制 = 不需要 fallback——这是 ETC2 最大的工程优势。BC1-7 在桌面是"硬件支持但要查 capability",ETC2 在 Android GLES 3.0+ 是"必然存在"。Unity / Unreal 因此在 2014 年果断把 Android 默认纹理改成 ETC2。
ETC2's design philosophy is "add to ETC1, never subtract" — every ETC1 block decodes correctly in an ETC2 decoder (backward compatibility), and new capabilities are gated behind special bit patterns in the block header. ① The RGB block has four modes: (a) ETC1-compatible (the legacy "two halves, base + modifier" structure, 8 bytes); (b) T-mode (the 4×4 block split into two colour regions in a T shape — handy for blocks with L- or T-shaped hard edges); (c) H-mode (split into two regions in an H shape — for vertical hard edges); (d) Planar mode (three corner colours define a plane, every pixel is sampled from that plane — for smooth gradients like skin and sky). The modes are signalled not by explicit mode bits but by bit combinations that would be illegal overflows in the differential encoding — a decoder hitting one of those "impossible" patterns switches to T / H / Planar; the encoder picks the best mode per block. ② RGBA8 = ETC2 RGB block + EAC alpha block — 16 bytes per block: the first 8 are ETC2 RGB, the next 8 are EAC (Ericsson Alpha Compression). The EAC alpha block stores an 8-bit base value + a 4-bit multiplier + a 4-bit modifier-table selector, with a 3-bit per-pixel index picking one of 8 offsets — structurally a peer of BC3's alpha block; the genuine 11-bit precision belongs to R11 / RG11 below. ③ R11 / RG11 — standalone single- and dual-channel 11-bit formats, the mobile counterparts to desktop BC4 / BC5, used for normal maps (RG11), height / roughness maps (R11), etc. R11 is 8 bytes per block = 4 bpp; RG11 is 16 bytes = 8 bpp. ④ Punch-through alpha — a special mode called ETC2 RGBA1 (RGB8_PUNCHTHROUGH_ALPHA1_ETC2) only allows alpha = 0 or 255 hard cut-outs (like BC1's 1-bit alpha), targeted at foliage / fences / "fully on or fully off" assets at 4 bpp. ⑤ OpenGL ES 3.0 mandatory = no fallback needed — and that is ETC2's biggest engineering advantage. On desktop, BC1-7 are "hardware-supported but capability-checked"; on Android with GLES 3.0+, ETC2 is guaranteed to exist. That is exactly why Unity and Unreal flipped the Android default to ETC2 in 2014.
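The memory arithmetic behind ETC2's RGBA8 mode — the whole reason the format exists — can be made concrete with a small helper. This is pure arithmetic from the block sizes stated above (ETC1 = 8 bytes per 16 pixels, ETC2 RGBA8 = 16 bytes per 16 pixels, raw A8 = 1 byte per pixel), ignoring mipmaps:

```python
# Back-of-envelope texture memory for a W×H texture, comparing the
# pre-ETC2 "split alpha" workaround against ETC2's native RGBA8 mode.

def texture_bytes(width, height, bpp):
    return width * height * bpp // 8

def etc1_plus_alpha_bytes(width, height):
    # the ETC1-era workaround: ETC1 RGB (4 bpp) + uncompressed A8 (8 bpp)
    return texture_bytes(width, height, 4) + texture_bytes(width, height, 8)

def etc2_rgba8_bytes(width, height):
    # one ETC2 RGB block + one EAC alpha block = 16 bytes per 4×4 = 8 bpp
    return texture_bytes(width, height, 8)
```

For a 1024×1024 texture the split costs 1.5 MiB against ETC2's 1 MiB — the 12-vs-8 bpp gap from the intro, and 3× the footprint of a plain 4 bpp ETC1 texture without alpha.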
适用
USE FOR
- Android 游戏纹理 / OpenGL ES 3.0+ 全部资产(默认选择)
- 需要兼容老 Android 设备但又想要 alpha 的项目
- Vulkan 移动端纹理(广泛支持)
- R11 / RG11 用于移动端法线贴图、roughness、高度图
- punch-through alpha 用于树叶 / 栅栏 / UI 硬边切边资产
- Android game textures / OpenGL ES 3.0+ assets (the default choice)
- Projects that need to support older Android devices yet still ship alpha
- Vulkan mobile textures (broadly supported)
- R11 / RG11 for mobile normal maps, roughness, height maps
- Punch-through alpha for foliage / fences / hard-edged UI cut-out assets
| scope | APIs | tools | CLI |
|---|---|---|---|
| ETC2 / EAC | ✓ OpenGL ES 3.0+(强制) · ✓ OpenGL 4.3+(ARB_ES3_compatibility) · ✓ Vulkan(VK_FORMAT_ETC2_*) · ~ Metal(iOS 13 之前) · ✗ D3D 原生 | ✓✓ Google etc2comp(开源,SIMD 加速) · Mali Texture Compression Tool · Compressonator · Unity / Unreal 内置 | etc2comp -format RGBA8 in.png -o out.ktx · toktx --encode etc2 out.ktx2 in.png |
PVRTC — Apple 早期独占
PVRTC — Apple's early proprietary lock-in
"PowerVR 的私有方案,iPhone 一代到 7 代的纹理本命。"
"PowerVR's proprietary scheme — texture-of-life for iPhone 1 through 7."
PVRTC 的诞生跟一个非常特定的硬件架构绑定:Imagination Technologies 的 PowerVR GPU 用的是 TBDR(Tile Based Deferred Rendering,基于瓦片的延迟渲染)——把屏幕切成小瓦片(典型 32×32 像素),每个瓦片独立渲染、合成,显著省功耗(手机的核心需求)。问题是 TBDR 处理瓦片时,纹理 sample 经常跨瓦片边界,如果纹理压缩格式是"块独立"的(像 BC1 / ETC1 那种,每个 4×4 块独立解码),瓦片边界处会出现明显的"块状不连续"(blocky artifact)。Imagination 在 2003 年提出 PVRTC 解决这个问题:不存"每块独立的颜色",而是存两层"低分辨率的颜色信号" + 一个"调制信号"——运行时 GPU 在采样点对两层信号做双线性插值,然后用调制信号在两个插值结果之间混合。这样块之间天然连续,没有边界 artifact——完美适配 TBDR。代价是 PVRTC 是私有格式,只有 PowerVR GPU 能解。但 Apple 初代 iPhone(2007)到 iPhone 7(2016)全部用 PowerVR GPU,所以 PVRTC 是 iOS 游戏的唯一标准纹理格式近十年。Infinity Blade、Real Racing、Monument Valley 一代游戏的纹理资产基本全是 PVRTC。iPhone 8 / X 搭载的 A11(2017)改用 Apple 自研 GPU,默认 ASTC,PVRTC 进入历史。
PVRTC's birth is tied to one very specific hardware architecture: Imagination Technologies' PowerVR GPUs use TBDR (Tile Based Deferred Rendering), which slices the screen into small tiles (typically 32×32 pixels), renders and composites each tile independently, and saves significant power — the core mobile requirement. The trouble is that during tile processing, texture samples regularly cross tile boundaries; if the texture format is "block independent" (like BC1 / ETC1, each 4×4 block decoded in isolation), tile boundaries grow visible "blocky" artifacts. In 2003 Imagination proposed PVRTC to solve this. Instead of storing "independent colour per block", PVRTC stores two layers of low-resolution colour signals plus a modulation signal — at sample time the GPU bilinearly interpolates both colour layers, then blends the two interpolated results using the modulation signal. Blocks are naturally continuous across boundaries — no block artifacts, a perfect TBDR fit. The price is that PVRTC is proprietary, decodable only on PowerVR GPUs. But every iPhone from the original iPhone (2007) through the iPhone 7 (2016) shipped with a PowerVR GPU, so PVRTC was the de facto sole texture standard on iOS for nearly a decade. Infinity Blade, Real Racing and Monument Valley — that generation of iOS games — basically shipped their entire texture base as PVRTC. The iPhone 8 / X's A11 (2017) switched to Apple's first in-house GPU, defaulting to ASTC, and PVRTC slid into history.
技术内核
Technical core
PVRTC 的技术结构跟 BCn / ETCn 完全是另一条思路——它不做"每块独立解码",而是用"全图低分辨率信号 + 调制图"的方案。① 两个低分辨率 RGB 层 + 一个调制层——记原图分辨率 W×H,PVRTC 把它编码为:(a) 信号 A,分辨率 (W/4)×(H/4)(4 bpp 模式)或 (W/8)×(H/4)(2 bpp 模式),每个采样点存 RGB 端点;(b) 信号 B,跟 A 同分辨率,存另一组 RGB 端点;(c) 调制信号 mod,跟原图同分辨率,每像素 1 bit(2 bpp 模式)或 2 bit(4 bpp 模式)指明 A 和 B 的混合比例。② 采样时的实际运算:GPU 对 A、B 各自做双线性插值得到 colorA、colorB,再用 mod 在两者之间混合。这不是块独立——同一个像素的颜色受周围 4 个 A 端点 / 4 个 B 端点的影响,块边界因此天然平滑过渡。③ 块尺寸 8×4(2 bpp)或 4×4(4 bpp)——两种码率档:2 bpp 是 8×4 块 / 8 byte = 2 bpp(注意是 8 byte/块,跟 BC1 同 byte 数但块更大,所以 bpp 减半),4 bpp 是 4×4 块 / 8 byte。④ 原生 alpha——比 ETC1 强,能直接装 RGBA 数据(虽然质量略差于 BC3 / BC7)。⑤ "分辨率必须是 2 的幂 + 正方形 + ≥8×8"——PVRTC v1 的硬限制。这个限制源于"信号 A、B 必须能均匀采样到原图所有像素"。PVRTC2(2009)放宽了这个限制(支持任意宽高 + punch-through alpha),但 PVRTC2 的硬件支持远不如 v1 普及。⑥ PowerVR 独占解码硬件——这同时是 PVRTC 的优势和坟墓。优势:iPhone 1-7 全部 PowerVR,PVRTC 在 iOS 游戏里是"必然支持";坟墓:其他 GPU 不解 PVRTC,Android 设备完全用不了,跨平台游戏要分别打包 PVRTC(iOS)+ ETC2(Android)两份纹理。Apple 在 A11(2017)改用自研 GPU,iOS 11+ 推荐 ASTC 后,PVRTC 就停止发展了。Imagination Technologies 也在 2017 年因 Apple 流失被收购,PVRTC 实际上跟着公司一起进入历史。
PVRTC's technical structure is on a completely different track from BCn / ETCn — it doesn't do "decode each block in isolation"; it uses "global low-resolution signals + a modulation map." ① Two low-resolution RGB layers plus one modulation layer — given source resolution W×H, PVRTC encodes: (a) signal A at (W/4)×(H/4) (4 bpp mode) or (W/8)×(H/4) (2 bpp mode), each sample storing an RGB endpoint; (b) signal B, same resolution as A, holding another RGB endpoint set; (c) modulation signal mod, at the source's full resolution, with 1 bit/pixel (2 bpp mode) or 2 bits/pixel (4 bpp mode) specifying the blend ratio between A and B. ② Sample-time arithmetic: the GPU bilinearly interpolates A and B independently to produce colourA and colourB, then uses mod to blend them. This is not block-independent — a single pixel's colour depends on the surrounding 4 A endpoints + 4 B endpoints, so block boundaries transition smoothly by construction. ③ Block size 8×4 (2 bpp) or 4×4 (4 bpp) — two bitrate tiers. The 2 bpp variant uses 8×4 blocks at 8 bytes per block (note: same bytes-per-block as BC1, but the block is larger, so bpp halves); the 4 bpp variant is 4×4 blocks at 8 bytes. ④ Native alpha — stronger than ETC1, can carry RGBA directly (though with somewhat lower quality than BC3 / BC7). ⑤ "Power-of-two, square, ≥ 8×8" — PVRTC v1's hard requirement, rooted in the need for signals A and B to sample uniformly onto every source pixel. PVRTC2 (2009) relaxed this (arbitrary aspect ratios + punch-through alpha), but PVRTC2 hardware support never reached v1's ubiquity. ⑥ PowerVR-exclusive decode hardware — both PVRTC's strength and its tomb. The strength: every iPhone 1-7 had a PowerVR GPU, so PVRTC was guaranteed-supported on iOS. The tomb: no other GPU decodes PVRTC, so Android couldn't use it at all, and cross-platform games had to ship two texture builds — PVRTC (iOS) + ETC2 (Android).
When Apple's A11 (2017) moved to an in-house GPU and iOS 11+ recommended ASTC, PVRTC stopped evolving. Imagination Technologies itself was acquired in 2017 after losing the Apple business; PVRTC effectively went into the history books with the company.
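The "two upscaled signals blended by modulation" idea can be sketched as follows. This is a single-channel toy assuming clamp-at-edge bilinear sampling — real PVRTC packs endpoints at reduced precision and has its own wrap rules — but it shows why block boundaries come out continuous: every output pixel interpolates across neighbouring low-res samples rather than decoding one block in isolation.

```python
# Sketch of PVRTC's sample-time reconstruction: bilinearly upscale the
# two low-resolution colour signals, then blend them per pixel using the
# full-resolution modulation values.

def lerp(a, b, t):
    return a + (b - a) * t

def bilinear_sample(grid, x, y):
    """Sample a low-res grid (list of rows of floats) at fractional
    (x, y), clamping at the edges."""
    h, w = len(grid), len(grid[0])
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = lerp(grid[y0][x0], grid[y0][x1], fx)
    bot = lerp(grid[y1][x0], grid[y1][x1], fx)
    return lerp(top, bot, fy)

def pvrtc_reconstruct(sig_a, sig_b, mod, scale):
    """Upscale signals A and B by `scale` and blend with per-pixel
    modulation values in [0, 1]; single-channel for brevity."""
    out = []
    for py in range(len(mod)):
        row = []
        for px in range(len(mod[0])):
            a = bilinear_sample(sig_a, px / scale, py / scale)
            b = bilinear_sample(sig_b, px / scale, py / scale)
            row.append(lerp(a, b, mod[py][px]))
        out.append(row)
    return out
```

With signal A all dark, signal B all bright, and modulation fixed at 0.5, every reconstructed pixel lands exactly midway — the modulation map is where the per-pixel detail lives.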
适用
USE FOR
- (历史) iPhone 1-7 / iPad 第一代到 Pro A9 的 iOS 游戏
- (历史) 老 Android PowerVR 设备(MX 系列芯片)
- 仍需要兼容到 iOS 9-10 的旧游戏维护
- 需要"块边界天然连续"的特殊场景(罕见)
- (historical) iOS games on iPhone 1-7 / first-gen iPad through iPad Pro A9
- (historical) Older Android PowerVR devices (MX-series SoCs)
- Maintaining a legacy game that still ships to iOS 9-10
- The rare niche that genuinely needs "natively continuous block boundaries"
反适用
AVOID
- 现代 iOS 项目(用 ASTC,Apple Silicon 默认)
- 任何非 PowerVR GPU 设备(Android Mali / Adreno / Tegra,完全不解)
- 跨平台游戏(双轨打包成本高,统一用 ASTC + ETC2 fallback)
- 非 2 的幂 / 非正方形纹理(PVRTC v1 硬限制)
- Modern iOS projects (use ASTC, Apple Silicon's default)
- Any non-PowerVR GPU device (Android Mali / Adreno / Tegra simply can't decode it)
- Cross-platform games (the dual-track packaging cost is high — unify on ASTC + ETC2 fallback)
- Non-power-of-two or non-square textures (a hard PVRTC v1 limitation)
| scope | APIs | tools | CLI |
|---|---|---|---|
| PVRTC v1 / v2 | ✓ PowerVR GPU(iOS A4-A9, 部分老 Android) · ~ Vulkan iOS 兼容层 · ✗ 其他 GPU | ✓ Imagination PVRTexTool(GUI + CLI) · texconv 不支持 · Unity / Unreal 早期 iOS 默认 | PVRTexToolCLI -i in.png -o out.pvr -f PVRTC1_4_RGB · PVRTexToolCLI -f PVRTC1_2_RGBA -i in.png -o out.pvr |
ASTC — 可变块大小的现代之王
ASTC — the modern king of variable block size
"4×4 还是 12×12?同一个格式,你自己挑。"
"4×4 or 12×12? Same format — you choose."
BCn / ETC2 都是固定 4×4 块,bpp 永远 4 或 8——只能"全图统一档位"。但游戏里的纹理质量需求从来不是一档的:UI 图标、角色脸需要高质量(密集块、高 bpp);远处地形、天空盒可以低质量(稀疏块、低 bpp)。美术希望一种格式同时支持"质量/体积"光谱滑块——同样一个文件结构,从 8 bpp 一路滑到 1 bpp。ARM 主导设计 ASTC(Adaptive Scalable Texture Compression),提供 14 种块大小(4×4 至 12×12),bpp 从 8 降到 0.89——同一格式覆盖近 9× 体积范围。Khronos 在 2012 年通过标准化(GLES 3.2 强制 + Vulkan 默认 + Apple A8 起原生支持),ASTC 成为现代移动 + WebGPU 的事实之王。BC7 守桌面、ASTC 守移动——这是 GPU 纹理压缩 2010 年代后的两强格局。
BCn / ETC2 are fixed at 4×4 blocks; bpp is locked at 4 or 8 — every texture in a project must pick one tier for the whole image. But real game textures need a spectrum: UI icons and character faces want high quality (dense blocks, high bpp), while distant terrain and skyboxes can run low quality (sparse blocks, low bpp). Artists want one format that exposes a quality / size dial — the same file structure sliding from 8 bpp down to 1 bpp. ARM led the design of ASTC (Adaptive Scalable Texture Compression), shipping 14 block sizes from 4×4 to 12×12, with bpp dropping from 8 to 0.89 — one format spanning nearly a 9× size range. Khronos standardised it in 2012 (mandatory in GLES 3.2, default in Vulkan, native on Apple from A8 onward), and ASTC became the de-facto king of modern mobile and WebGPU. BC7 owns desktop, ASTC owns mobile — that's the post-2010s duopoly of GPU texture compression.
技术内核
Technical core
ASTC 的设计哲学是"一个框架,无限档位"——所有块共用 16 byte 容器,但内部组件按块大小重新分配比例,让格式从 8 bpp 一路滑到 0.89 bpp。① 14 种块大小:4×4 / 5×4 / 5×5 / 6×5 / 6×6 / 8×5 / 8×6 / 8×8 / 10×5 / 10×6 / 10×8 / 10×10 / 12×10 / 12×12——LDR 和 HDR profile 都覆盖全部 14 种 2D 块。还有 3D 体素扩展(3×3×3 到 6×6×6 共 10 种 3D 块)。② 每块固定 16 byte——这是 ASTC "档位光谱"的根本机制:容器不变,块越大(像素更多)→ 每像素分到的 bit 越少 → bpp 越低。比如 4×4 块 = 16 px / 16 byte = 8 bpp;12×12 块 = 144 px / 16 byte = 0.89 bpp。同样的解码硬件、同样的文件结构,档位却覆盖近 9× 体积差。③ 16 种 endpoint 编码格式 (CEM):覆盖 LDR 亮度 / 亮度+alpha / RGB / RGBA 的直接与 base+offset 变体,以及对应的 HDR 变体;每块还可以切成 1-4 个 partition,每个 partition 有自己的一组端点(适合块内混有多种颜色分布)。每块在 CEM 字段内挑一种,精确匹配局部像素分布。④ 权重平面 + 双权重平面:基本 ASTC 用一张权重图控制所有通道的内插;双权重平面(dual-plane)模式让 alpha 或某一颜色通道走独立权重——类比 BC7 的 "rotation",但更通用,在彩色 + 高频 alpha 混合贴图上质量明显更好。⑤ HDR + 3D 双扩展——LDR profile(主流硬件全支持)给颜色 0-1 范围;HDR profile(部分硬件)给 float 范围,直接当移动版 BC6H 用;3D profile(更小众)给体素纹理(医疗影像、烟雾模拟、地形 3D 噪声)。⑥ 权重网格大小可独立于块大小——一个 12×12 块的权重网格可以是 4×4(更稀疏,更节省 bit 给 endpoint),也可以是 8×8(更密,牺牲 endpoint 精度换插值精度)。这是 ASTC 比 BC7 更灵活的核心,编码器要在"块大小 × endpoint mode × 权重网格"三维空间搜索最优。⑦ 编码极慢——astcenc 参考编码器是 brute-force 搜全部组合,单图 6×6 thorough 模式可能要几分钟。但解码硬件原生,sample 一个 ASTC texel 跟 sample BC7 一样快。
ASTC's design philosophy is "one frame, infinite tiers" — every block shares a 16-byte container, but the internal allocation re-balances by block size, sliding the format from 8 bpp all the way to 0.89 bpp. ① 14 block sizes: 4×4 / 5×4 / 5×5 / 6×5 / 6×6 / 8×5 / 8×6 / 8×8 / 10×5 / 10×6 / 10×8 / 10×10 / 12×10 / 12×12 — both the LDR and HDR profiles cover all 14 2D footprints. A 3D extension adds 10 voxel block sizes, from 3×3×3 to 6×6×6. ② Every block is exactly 16 bytes — the mechanism behind ASTC's tier spectrum. The container stays constant; the bigger the block (more pixels packed in), the fewer bits per pixel, and the lower the bpp. 4×4 = 16 px / 16 bytes = 8 bpp; 12×12 = 144 px / 16 bytes = 0.89 bpp. Same decode hardware, same file layout, nearly 9× size difference between extremes. ③ 16 endpoint encodings (CEM): LDR luminance / luminance+alpha / RGB / RGBA in direct and base+offset variants, plus the corresponding HDR variants; a block can also be split into 1-4 partitions, each with its own endpoint pair (for blocks mixing several colour distributions). Each block picks one CEM in its endpoint-mode field to match the local pixel distribution. ④ Weight plane + dual weight plane: basic ASTC uses one weight grid controlling all channels' interpolation; dual-plane mode lets alpha or one colour channel travel on an independent weight grid — analogous to BC7's "rotation" but more general, and visibly better on mixed colour + high-frequency-alpha maps. ⑤ HDR + 3D extensions — the LDR profile (universally supported) covers colour in [0, 1]; the HDR profile (partial hardware support) gives float range and effectively serves as mobile BC6H; the 3D profile (more niche) targets voxel textures (medical imaging, smoke simulation, 3D terrain noise). ⑥ Weight grid size independent of block size — a 12×12 block can use a 4×4 weight grid (sparser, donating bits to endpoints) or an 8×8 grid (denser, trading endpoint precision for interpolation precision).
This is what makes ASTC more flexible than BC7: the encoder searches a three-dimensional space of "block size × endpoint mode × weight grid." ⑦ Encoding is brutally slow — astcenc, the reference encoder, brute-forces the whole combination space; a single image at 6×6 with the thorough preset can take minutes. But decoding is hardware-native — sampling an ASTC texel costs the same as sampling BC7.
图 24 · ASTC 完整编码 + 采样流程:输入一个 RGBA 块,编码器对 14 种块大小逐一试压(每种内部还要枚举 endpoint mode 和权重网格组合),按 SSIM 评分,在用户给定的"bpp 预算"约束下选最优块大小,把结果塞进 16 byte。GPU 在 sample 时硬件原生解码——Vulkan / OpenGL ES 3.2 / Metal / WebGPU 全部一次 cycle 取出像素,跟 sample BC7 同样快。
Fig 24 · ASTC's full encode + sample pipeline: take an RGBA tile, run trial encodes against all 14 block sizes (each enumerating endpoint modes and weight-grid configurations), score by SSIM, and pick the best block size that fits the project's bpp budget. The result is always 16 bytes. GPUs decode it natively at sample time — Vulkan / OpenGL ES 3.2 / Metal / WebGPU all fetch a pixel in a single cycle, exactly as fast as sampling BC7.
| block | bpp | typical use | vs BCn at same bpp |
|---|---|---|---|
| 4×4 | 8.00 | UI icons, important textures | ≈ BC7 (slightly better) |
| 5×5 | 5.12 | mid-quality, skin / cloth | BCn no equivalent tier |
| 6×6 | 3.56 | environment, mobile default | ≫ BC1 (4 bpp) by ~6 dB |
| 8×8 | 2.00 | terrain, distant | BCn no tier here |
| 10×10 | 1.28 | far LOD, skybox | BCn no tier here |
| 12×12 | 0.89 | extreme low, ambient | BCn no tier here |
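Since every block is 16 bytes, the bpp column of the table above is pure arithmetic — a two-line check:

```python
# Every ASTC block is 16 bytes (128 bits) regardless of its footprint,
# so bits per pixel is just 128 / (block_w × block_h).

ASTC_BLOCKS = [(4, 4), (5, 5), (6, 6), (8, 8), (10, 10), (12, 12)]

def astc_bpp(w, h):
    return 128 / (w * h)

for w, h in ASTC_BLOCKS:
    print(f"{w}x{h}: {astc_bpp(w, h):.2f} bpp")  # reproduces the table column
```

Running it prints 8.00, 5.12, 3.56, 2.00, 1.28 and 0.89 — matching the table row for row.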
$ astcenc -cl in.png out.astc 6x6 -medium # LDR profile, 6×6 block, medium preset
$ astcenc -cs in.png out.astc 6x6 -thorough # thorough = brute-force search, much slower / better
$ astcenc -ch in.exr out.astc 6x6 -medium # HDR profile (input EXR float)
$ toktx --astc 6x6 out.ktx2 in.png # wrap into KTX2 (web / WebGPU friendly)
$ toktx --encode astc --astc_blk_d 6x6 out.ktx2 in.png # explicit block-size flag
适用
USE FOR
- 移动游戏(iOS A8+ / Android GLES 3.2+,99% 默认)
- WebGPU(macOS 默认 / 移动 Web,配合 KTX2)
- VR 头显纹理(Quest / Vision Pro 全部 ASTC)
- 跨平台游戏纹理打包(配合 Basis Universal 转码)
- 需要"质量/体积"档位灵活调节的项目(同 image 不同 mip 用不同档)
- Mobile games (iOS A8+ / Android GLES 3.2+ — 99 % default)
- WebGPU (macOS default / mobile web, paired with KTX2)
- VR headset textures (Quest / Vision Pro all use ASTC)
- Cross-platform game texture packaging (with Basis Universal transcoding)
- Projects that need a flexible quality / size dial (same image, different mip tiers)
反适用
AVOID
- 桌面 Windows / Linux 含 Intel HD 集显的目标(用 BC7)
- D3D11 / D3D12 桌面游戏(BC7 是事实标准)
- HDR 纹理在多数桌面硬件(用 BC6H;ASTC HDR Profile 桌面支持差)
- 实时编码场景(astcenc thorough 单图几分钟,极不适合服务端实时)
- 极老的 Android 设备(GLES 3.0 及以下,改用 ETC2 fallback)
- Desktop Windows / Linux targets that include Intel HD iGPUs (use BC7)
- D3D11 / D3D12 desktop games (BC7 is the de-facto standard)
- HDR textures on most desktop hardware (use BC6H; desktop ASTC HDR support is poor)
- Real-time encoding (astcenc thorough takes minutes per image — never use server-side live)
- Very old Android devices (GLES 3.0 and below — fall back to ETC2)
| scope | APIs | tools | CLI |
|---|---|---|---|
| ASTC LDR | ✓✓ Vulkan · ✓✓ OpenGL ES 3.2(强制)· ✓ Metal (Apple A8+) · ~ WebGPU (可选 feature) · ✗ Intel HD Graphics 桌面 | ✓✓ ARM astcenc(参考编码器,开源)· KTX-Software toktx · NVIDIA Texture Tools · Mali Texture Compression Tool · Unity / Unreal 内置 | astcenc -cl in.png out.astc 6x6 -medium · toktx --astc 6x6 out.ktx2 in.png |
| ASTC HDR | ~ Vulkan(部分硬件)· ~ Apple Metal (A11+ 部分支持) · ✗ 多数移动 GPU | ✓ astcenc -ch | astcenc -ch in.exr out.astc 6x6 -medium |
Basis Universal — 一次编码、多平台转码
Basis Universal — encode once, transcode anywhere
"一份 .basis,运行时按设备转 BC7、ETC2 或 ASTC。"
"One .basis file — transcoded at runtime to BC7, ETC2 or ASTC depending on the device."
Web 和跨平台游戏一直有个尴尬:桌面要 BC7、Android 要 ETC2、现代移动要 ASTC、老 iOS 要 PVRTC——同一张纹理要打四份,资产包体积爆炸,CDN 流量翻倍,管理痛苦不堪。Rich Geldreich(前 Valve、Crunch 作者、桌面纹理压缩领域的活字典)在 2018 年提出"中间格式"思路:编码时存为 Basis Universal(一种紧凑的 IR——intermediate representation),运行时用 JS 或 Wasm 解码到目标设备的块格式。一份资产 → 任意设备。Khronos 接受捐赠后,Basis 成为 KTX2 supercompression scheme 的事实标准,glTF 2.0 把 KTX2 + Basis 列为推荐的纹理 payload。three.js / Babylon.js / Unity WebGL / godot Web 全都内置 Basis transcoder。Web 端从此告别"打四份纹理"的时代。
Web and cross-platform games long suffered an awkward problem: desktop needs BC7, Android needs ETC2, modern mobile wants ASTC, legacy iOS wants PVRTC — the same texture has to be packed four ways, asset bundles balloon, CDN traffic doubles, and asset management becomes a nightmare. In 2018 Rich Geldreich (ex-Valve, author of Crunch, a living encyclopedia of desktop texture compression) proposed an "intermediate format" approach: encode once into Basis Universal (a compact IR — intermediate representation), then at runtime use JS or Wasm to transcode to whatever block format the target device wants. One asset → any device. After Geldreich donated the project to Khronos, Basis became the de-facto KTX2 supercompression scheme; glTF 2.0 lists KTX2 + Basis as the recommended texture payload. three.js / Babylon.js / Unity WebGL / godot Web all ship the Basis transcoder. The "pack four textures" era of the Web ended here.
basisu 把 PNG 压成 ETC1S(超小,~2 bpp,适合普通贴图)或 UASTC(高质,~8 bpp,适合法线/UI/重要纹理)中间格式,装进 .basis 独立容器或 KTX2 supercompression payload。运行时 JS/Wasm transcoder 检测设备能力——桌面转 BC7、Android 转 ETC2、现代移动转 ASTC、老 iOS 转 PVRTC——一份资产打通所有平台,GPU 拿到的是原生块格式可以直接 sample。
basisu compresses a PNG into either ETC1S (tiny, ~2 bpp, for general textures) or UASTC (high quality, ~8 bpp, for normals / UI / important textures), packed into a standalone .basis container or a KTX2 supercompression payload. At runtime a JS/Wasm transcoder probes the device — desktops get BC7, Android gets ETC2, modern mobile gets ASTC, legacy iOS gets PVRTC — one asset covers every platform, and the GPU receives a native block format it can sample directly.
技术内核
Technical core
Basis 的核心是"中间表示 + 运行时转码"——既不像 BCn/ASTC 那样直接是 GPU 块格式,也不像 PNG/JPEG 那样是 CPU 像素流,而是一种专门设计来"几乎零成本转码到任何块格式"的紧凑中间形态。① 两个 profile:ETC1S 基于 ETC1 的色彩端点结构,每块 ~2 bpp,体积极小,质量约相当于 JPEG 中等;UASTC("Universal ASTC")基于 ASTC 4×4 的子集,每块 8 bpp,质量约等于 ASTC 4×4 / BC7。两档对应"小贴图随便堆"和"重要纹理用高质量"。② 编码后再用 supercompression 压一遍——ETC1S 用 LZ-style + RDO(rate-distortion optimisation)再压缩 30-50%,UASTC 用 Zstd 压缩 ~30%。最终 .basis / KTX2 文件比裸 BCn 还小,多一步 supercompression 解压,但总成本仍远低于解码 PNG,且结果能直接送给 GPU。③ 运行时转码极快——transcoder 是设计成 O(blocks) 的简单查表 + 位重排,Wasm 实现单核能跑几百 MB/s,比 PNG 解码快一个数量级。这是 Basis 跟传统"在线解码 PNG → CPU RGBA → uploadTexture"路径的根本区别——后者占 CPU + 占带宽 + 占显存,前者一步到位送 GPU 块格式。④ 支持目标:BC1 / BC3 / BC4 / BC5 / BC7 / ETC1 / ETC2 / ASTC 4×4 / PVRTC1 / PVRTC2 / RGBA32(无块格式硬件兜底)——基本覆盖现役所有 GPU。
Basis's core idea is "intermediate representation + runtime transcode" — it is neither a direct GPU block format like BCn / ASTC, nor a CPU pixel stream like PNG / JPEG, but a compact intermediate form deliberately designed to transcode to any block format at almost zero cost. ① Two profiles: ETC1S is built on ETC1's colour-endpoint structure, ~2 bpp per block, extremely small, with quality roughly on par with mid-quality JPEG; UASTC ("Universal ASTC") is built on a subset of ASTC 4×4, 8 bpp per block, with quality close to ASTC 4×4 / BC7. The two tiers map to "stack lots of small textures" vs "use high quality on important textures." ② The encode is then run through supercompression — ETC1S uses an LZ-style codec plus RDO (rate-distortion optimisation) and shrinks another 30–50 %; UASTC uses Zstd for about 30 %. The resulting .basis / KTX2 file is smaller than raw BCn; it adds one supercompression-decompress step, but the total cost stays far below decoding a PNG, and the result ships straight to the GPU. ③ Runtime transcoding is blazing fast — the transcoder is engineered as O(blocks) with simple table lookups and bit re-shuffling; the Wasm build hits hundreds of MB/s on a single core, an order of magnitude faster than PNG decoding. That's the fundamental difference between Basis and the traditional "decode PNG → CPU RGBA → uploadTexture" path: the latter eats CPU + bandwidth + VRAM, while the former hands the GPU a block format in one step. ④ Supported targets: BC1 / BC3 / BC4 / BC5 / BC7 / ETC1 / ETC2 / ASTC 4×4 / PVRTC1 / PVRTC2 / RGBA32 (an uncompressed fallback for hardware without block formats) — effectively every GPU in service.
适用
USE FOR
- glTF 2.0 模型纹理(KTX2 + Basis 是官方推荐)
- WebGPU / WebGL 资产(配合 KTX2 容器)
- three.js / Babylon.js / PlayCanvas / godot Web 项目
- 跨平台游戏纹理打包(一份资产覆盖桌面/移动/Web)
- CDN 流量敏感的场景(ETC1S ~2 bpp 体积比 PNG 小很多)
- glTF 2.0 model textures (KTX2 + Basis is the official recommendation)
- WebGPU / WebGL assets (paired with the KTX2 container)
- three.js / Babylon.js / PlayCanvas / godot Web projects
- Cross-platform game texture packaging (one asset for desktop / mobile / Web)
- CDN-bandwidth-sensitive scenarios (ETC1S at ~2 bpp is much smaller than PNG)
| scope | runtimes | tools | CLI |
|---|---|---|---|
| Basis Universal | ✓✓ three.js / Babylon.js / PlayCanvas 内置 transcoder · ✓✓ Unity WebGL / godot Web · ✓ 任意 WebGL/WebGPU + Wasm transcoder | ✓✓ Khronos basisu(参考编码器,开源) · toktx(打 KTX2 + Basis payload) · KTX-Software 套件 | basisu in.png -uastc -output_file out.basis · toktx --encode uastc out.ktx2 in.png |
Crunch — 在 BC 体积上再砍一半
Crunch — halving BC's size with a second pass
"先 cluster 再 BC1 — 在 BC 体积上再砍一半。"
"Cluster first, then BC1 — halve the BC size."
2010s 初期移动 + 主机游戏的纹理资产包动辄几百 MB,主要是 BC1/BC3 块的累积——iOS App Store 限制单包 < 2 GB,主机光盘也是有限介质。Rich Geldreich 在 2012 年观察到一个事实:大量 4×4 块其实彼此相似——同一张草地纹理里成千上万个块都是"绿色为主、轻微噪点变化",同一面墙的砖块色调几乎一致。如果这些块共用一个码本(codebook),只存"指向码本的索引 + 微小偏移",体积可以再砍一半。Crunch 把这个想法落地:对所有块做 k-means 聚类,然后用 RC(range coder)+ Huffman 二次熵编码 BC 字典——磁盘体积比裸 BCn 再小 30-50%。运行时只需在 CPU 上花几十毫秒解回普通 BCn,再上传给 GPU。这是"BC 之上还能再压"的第一个工程化实践。后来同作者 6 年后用同样思路做了 Basis Universal,覆盖更广 GPU 块格式 + 加上 transcode 维度——Crunch 进入历史。
By the early 2010s, mobile and console game texture bundles had ballooned to hundreds of MB, mostly accumulated BC1/BC3 blocks — the iOS App Store capped single bundles at 2 GB, and console discs are finite media. In 2012 Rich Geldreich noticed an obvious truth: most 4×4 blocks are similar to each other — a grass texture has thousands of blocks that are all "mostly green with mild noise"; the bricks on a wall share an almost identical palette. If those blocks shared a single codebook and we only stored "codebook index + small offset," size could be halved again. Crunch put that idea into practice: run k-means clustering across all blocks, then run RC (range coder) + Huffman as a second-pass entropy code over the BC dictionary — on disk the result is 30–50 % smaller than raw BCn. At runtime the CPU spends tens of milliseconds decoding back to ordinary BCn, then uploads it to the GPU. It was the first engineering-grade demonstration of "compressing on top of BC." Six years later the same author took the idea further, covering more GPU block formats plus an extra transcode dimension — that became Basis Universal, and Crunch quietly walked off into history.
技术内核
Technical core
Crunch 的工程实现只有两步,但每步都精妙。① k-means cluster:把所有 4×4 块当成一个 64-bit 高维向量样本(BC1 块结构 = 2 个 16-bit 端点 + 32-bit 4-color 索引),用 k-means 在 BC 块空间内聚类成 N 个代表块(典型 N = 1024);每块只存"代表块索引 + 局部偏移量"。这一步把"每块 64 bit 独立"变成"每块 ~10 bit 索引 + 小残差",体积压缩比通常 4-6 倍,但因为 BCn 本身已经是有损,残差很小,质量损失可控。② RC + Huffman 二次熵编码:对 codebook 自身(1024 × 64 bit = 8 KB)和 index 流(~10 bit × 块数)再用 range coder + Huffman 树压缩——index 流通常有强自相关(同一区域的相邻块很可能落在同一 cluster),熵很低,Huffman 能再砍 30-50%。最终 .crn 文件平均比裸 BCn 小 50%,跟 PNG 体积差不多但能直接送 GPU(还要先在 CPU 上 swizzle 回 BCn,有几十 ms 解码延迟)。运行时解码是 streaming 的——可以一边读文件一边解块,不需要一次加载整张图——这是 Crunch 设计上对 mmap 友好的一个细节。
Crunch's engineering implementation has just two steps, but each is delicately tuned. ① k-means clustering: treat every 4×4 block as a 64-bit high-dimensional vector (a BC1 block = two 16-bit endpoints + a 32-bit 4-colour index), then run k-means in BC-block space to find N representative blocks (typically N = 1024); each block stores "representative-block index + local offset." This step turns "64 independent bits per block" into "about 10 bits of index + a tiny residual," giving a 4–6× size compression — and because BCn is already lossy, the residual is small and quality loss stays controlled. ② Second-pass RC + Huffman entropy coding: the codebook itself (1024 × 64 bits = 8 KB) and the index stream (~10 bits × block count) go through a range coder plus Huffman tree — the index stream is strongly auto-correlated (adjacent blocks in the same region almost always fall in the same cluster), entropy is low, and Huffman shaves another 30–50 %. The resulting .crn file averages 50 % smaller than raw BCn, comparable in size to PNG but ready to ship to the GPU (you do still need a CPU swizzle back to BCn first, costing tens of ms of decode latency). Runtime decode is streaming — blocks can be decoded as the file streams in, no need to load the whole image at once — a deliberate design choice that makes Crunch friendly to mmap.
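The clustering step can be illustrated with a toy vector-quantiser. Real Crunch runs k-means over 64-bit BC1 blocks and then entropy-codes the resulting index stream; this sketch shows only the nearest-representative assignment that turns "one full block per block" into "one small index per block":

```python
# Toy vector quantisation in the spirit of Crunch's first stage: each
# "block" (here just a tuple of numbers) is replaced by the index of its
# nearest codebook entry, measured by squared Euclidean distance.

def squared_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign_to_codebook(blocks, codebook):
    """Return one codebook index per block (nearest representative)."""
    return [min(range(len(codebook)),
                key=lambda i: squared_dist(block, codebook[i]))
            for block in blocks]
```

With a 1024-entry codebook, each 64-bit BC1 block collapses to a ~10-bit index — and because neighbouring blocks in the same region tend to hit the same entry, the index stream is highly compressible, which is exactly what the second-pass entropy coder exploits.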
适用
USE FOR
- 2010s 移动游戏纹理资产包压缩(BC1/BC3 体积敏感场景)
- 需要"BC 之上再省一半"的资产管线
- 研究/学习 cluster + 熵编码思路的参考
- 2010s mobile-game texture bundle compression (BC1 / BC3 size-sensitive cases)
- Asset pipelines needing another 50 % off on top of BC
- Reference for studying cluster + entropy-coding designs
反适用
AVOID
- 2018 之后任何新项目——直接用 Basis Universal
- 需要支持 BC7 / ASTC / ETC2 等多种块格式(Crunch 只覆盖 BC1/BC3)
- 需要运行时直接送 GPU,不接受 CPU 解码延迟
- 跨平台部署(无 ETC/ASTC 转码,移动端覆盖差)
- Any new project after 2018 — use Basis Universal instead
- Pipelines needing multiple block formats (Crunch only covers BC1 / BC3)
- Cases that demand zero CPU decode at runtime
- Cross-platform deployment (no ETC / ASTC transcode, weak mobile coverage)
| scope | engines | tools | CLI |
|---|---|---|---|
| Crunch (.crn) | ✓ Unity 内置 importer(早期版本) · ✓ Unreal Engine 4 移动端(可选) · ✗ 现代主流引擎已弃用 | ✓ crnlib(开源 C++ 库) · ✓ crunch CLI · ~ 命令仍可用但维护停滞 | crunch -file in.png -fileformat crn -dxt1 |
Mipmap — 一张纹理八张分辨率
Mipmap — one texture, eight resolutions
"一张纹理八张分辨率,采样时按距离自动选。"
"One texture, eight resolutions — auto-picked by distance at sample time."
1983 年 Lance Williams(NYIT,纽约理工学院计算机图形实验室)在 SIGGRAPH 发表 'Pyramidal Parametrics',第一次系统提出 mipmap 概念。它解决的问题是 3D 场景里最古老也最折磨人的视觉 bug:aliasing(走样/摩尔纹/闪烁)。当一个有纹理的多边形(墙、地面、远处地形)远离相机,屏幕上一个像素就会覆盖纹理上多个 texel——简单的"取最近 texel"采样会随机丢掉大部分信息,产生移动时的闪烁、网格图案上的摩尔纹、远处瓦片纹理的"沸腾"效果。Williams 的洞察:预先存好 N 个降采样层(每层是上一层的 2× 缩放,带 box filter 平均),采样时按"屏幕像素覆盖纹理多大"(LOD,Level of Detail)自动选合适那一层。代价是显存 +33%(几何级数 1+1/4+1/16+...→4/3),收益是无 aliasing + 缓存命中率提升(远处 mip 是小图,容易留在 GPU L2)。所有现代 GPU 纹理默认都带 mipmap,几乎所有引擎都强制开启——这是 GPU 时代的"一旦学会就回不去"的基础设施。
In 1983 Lance Williams (NYIT — the New York Institute of Technology Computer Graphics Lab) published "Pyramidal Parametrics" at SIGGRAPH, the first systematic proposal of the mipmap concept. It solved one of the oldest and most maddening visual bugs in 3D rendering: aliasing (moiré patterns, shimmering, sparkle). When a textured polygon (a wall, the ground, distant terrain) recedes from the camera, a single screen pixel covers many texels — naive "nearest-texel" sampling randomly throws away most of the information, producing shimmering as the camera moves, moiré on grid textures, and "boiling" on distant tiled surfaces. Williams's insight: pre-store N down-sampled layers (each is the previous one box-filtered to 2× smaller), and at sample time pick the right layer based on how much texture area each screen pixel covers (LOD — Level of Detail). The cost is +33 % VRAM (a geometric series 1 + 1/4 + 1/16 + … → 4/3); the payoff is zero aliasing plus better cache hits (distant mips are small and fit in GPU L2). Every modern GPU texture defaults to having mipmaps, almost every engine enforces them — once you know how it feels, you never go back. It's the infrastructure of the GPU era.
技术内核
Technical core
Mipmap 的工程实现非常直接,但每个细节都暗含思想。① 每张 base 图额外存 log₂(N) 个降采样层:mip 0 = base 原始图;mip 1 = box-filtered 2× 缩放;mip 2 = 又 2×;直到 1×1。一张 1024×1024 base 共 11 个 mip(0~10)。降采样可以用 box filter(简单平均)、Lanczos(更锐利但更贵)、或在 sRGB 空间需要先 gamma decode 再 filter 再 encode 回去——很多老引擎在 sRGB 纹理上没做 gamma-correct mip 生成,导致远处纹理看起来"灰蒙蒙"。② 显存额外 +33%:几何级数 1 + 1/4 + 1/16 + ... = 4/3,极限是基础体积的 4/3。这是个固定开销,大概率值得——除非你的纹理永远只在近距离用(UI、屏幕特效)。③ GPU 采样时按 LOD 自动选 mip level:GPU 在像素着色器里能算出当前像素的 dPdx/dPdy(纹理坐标在屏幕水平/垂直方向的偏导数),据此估出"一个屏幕像素覆盖纹理多大",对数运算后得到 LOD 浮点数。整数部分选 mip level,小数部分用于 trilinear filtering——在两个 mip 之间双线性插值,避免 mip 跳变可见的"接缝"。④ 各向异性过滤(anisotropic filtering)是 mipmap 的延伸——当视角倾斜时(比如往远处看的地面),屏幕像素在纹理上覆盖的不是正方形而是细长的矩形,简单 trilinear 会过模糊。aniso filtering 沿主轴方向多采样几次再加权,质量更好但带宽更大,通常给"开 16x aniso"档位。⑤ 容器内置 mip chain:KTX/KTX2/DDS 都把 mip 0 → mip N 顺序拼接进 payload,加载时一次 mmap 全部入显存——这就是为什么 GPU 容器格式天生跟 mipmap 绑定的设计。
The engineering of mipmap is straightforward, but every detail hides a small lesson. ① Each base texture stores log₂(N) extra down-sampled layers: mip 0 = the original base; mip 1 = box-filtered 2× smaller; mip 2 = another 2×; … down to 1×1. A 1024×1024 base has 11 total mips (0–10). The down-sample filter can be box (simple averaging), Lanczos (sharper but more expensive), or — for sRGB textures — must gamma-decode, filter in linear, then re-encode; many older engines skipped gamma-correct mip generation, which is why distant textures looked "washed out" in their games. ② +33 % VRAM: the geometric series 1 + 1/4 + 1/16 + … = 4/3, fixed extra cost. Almost always worth it, unless the texture is only ever used up close (UI, screen FX). ③ GPU picks the mip level by LOD at sample time: in a pixel shader the GPU can compute dPdx / dPdy (the partial derivatives of the texture coordinate in screen X / Y), use that to estimate "how much texture one screen pixel covers," and take the log to get a floating-point LOD. The integer part chooses the mip; the fractional part feeds trilinear filtering — bilinear blending between two adjacent mips to mask the visible "seams" of a mip transition. ④ Anisotropic filtering is an extension of mipmap — at oblique viewing angles (looking down at distant ground, say), one screen pixel covers a long thin rectangle on the texture, not a square, and plain trilinear over-blurs. Aniso filtering takes multiple samples along the major axis and weights them, giving better quality at the cost of bandwidth — usually exposed as a "16× aniso" toggle. ⑤ Containers embed the mip chain: KTX / KTX2 / DDS all concatenate mip 0 → mip N into the payload so a load can mmap the whole pyramid into VRAM at once — which is why GPU container formats have always been designed hand-in-glove with mipmaps.
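The arithmetic in ① and ③ fits in a few lines. A sketch with illustrative helper names; the LOD formula follows the usual GL-style footprint estimate (max of the two gradient lengths), ignoring per-API clamping and bias details:

```python
import math

def mip_chain(width, height):
    """Full mip chain dimensions: halve (rounding down, min 1) until 1x1."""
    levels = [(width, height)]
    while width > 1 or height > 1:
        width, height = max(1, width // 2), max(1, height // 2)
        levels.append((width, height))
    return levels

def lod(dudx, dvdx, dudy, dvdy):
    """Base-level LOD from screen-space derivatives of the texel coordinate:
    rho estimates how many texels one screen pixel covers, log2 picks the mip."""
    rho = max(math.hypot(dudx, dvdx), math.hypot(dudy, dvdy))
    return max(0.0, math.log2(rho)) if rho > 0 else 0.0
```

A 1024×1024 base yields 11 levels (mip 0–10), their total texel count stays under the 4/3 bound, and a pixel whose footprint spans two texels lands exactly on mip 1.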
适用
USE FOR
- 任何 3D 场景纹理(地形、建筑、角色、远景)——必须开启
- 视差贴图、normal map、AO、roughness 等 PBR 纹理
- 大场景中需要保留远处细节但避免 aliasing 的所有用途
- 纹理 cache 性能优化(远处用小 mip,提升 L2 命中)
- Any 3D scene texture (terrain, architecture, characters, vistas) — must be enabled
- Parallax / normal / AO / roughness and other PBR maps
- Any case in a large scene that wants distant detail without aliasing
- Texture cache optimisation (distant geometry uses small mips, boosting L2 hit-rate)
反适用
AVOID
- 2D UI 元素、屏幕后处理 LUT(永远 1:1 采样,mip 浪费 33% 显存)
- 动态生成的纹理(每帧重新生成 mip 太贵)
- Render target / framebuffer attachment(通常不需要 mip)
- 极小图(< 32×32,mip 层只有几层,收益小)
- 2D UI elements, screen post-process LUTs (always 1:1 sampling — mip wastes 33 % VRAM)
- Dynamically generated textures (regenerating mips every frame is expensive)
- Render target / framebuffer attachments (usually don't need mips)
- Very small textures (< 32×32 — only a couple of mip levels, minimal benefit)
| scope | APIs | tools | CLI |
|---|---|---|---|
| Mipmap | ✓✓ 所有 GPU 硬件原生 · OpenGL / D3D / Vulkan / Metal / WebGL / WebGPU 全部内置 | ✓✓ glGenerateMipmap(GPU 端生成) · texconv -m · NVIDIA Texture Tools · KTX-Software toktx · ImageMagick | texconv -m 0 -f BC7_UNORM in.png (0 = full chain) · toktx --genmipmap out.ktx2 in.png |
OpenEXR — 影视行业标准
OpenEXR — the film industry standard
"星战幕后用了 20 多年的格式,你做合成第一个学的就是它。"
"Two decades of Star Wars VFX runs on this. The first format you learn in compositing."
1999 年 ILM 在做《珍珠港》等片的前期合成时,发现手上没有一个合用的中间格式:16-bit TIFF 不够动态范围(镜头闪光、火焰、HDRI 环境贴图很容易超过 1.0),Radiance HDR(C29)只有 RGBE 三通道、不能装 Z-depth / motion vector / object ID。VFX 合成流程的真实需求是:(a) 真 HDR float——亮度无上限,可正可负;(b) 任意 channel——一张文件能塞 RGBA + Z + Normal + Motion + Object ID + UV pass + 几十层灯光分层;(c) tile-based 部分加载——一个 8K EXR 可能 2 GB,Nuke / Houdini 经常只读视口看得到的那一小块;(d) 多分辨率 mip——给 IBL 环境贴图直接拿不同 LOD 采样。OpenEXR 就是为这四件事设计的,二十多年没出过第二个对手。
In 1999 ILM was deep in pre-production on Pearl Harbor and other shows, and discovered that no existing intermediate format fit their pipeline: 16-bit TIFF lacked dynamic range (lens flares, explosions and HDRI environment maps easily exceed 1.0), and Radiance HDR (C29) only carried three RGBE channels — no Z-depth, motion vector or object ID. The real VFX-compositing requirements were: (a) true HDR float — unbounded brightness, possibly negative; (b) arbitrary channels — one file holding RGBA + Z + Normal + Motion + Object ID + UV pass + dozens of light groups; (c) tile-based partial loading — an 8K EXR can be 2 GB, and Nuke / Houdini routinely read only the viewport tile; (d) multi-resolution mips — sampling IBL environment maps at the right LOD. OpenEXR was designed for those four needs, and in more than two decades no rival has emerged.
技术内核
Technical core
OpenEXR 的设计跟 PNG / JPEG / TIFF 走的不是一条路——它不是"把一张 RGBA 图存好",而是"为合成流水线提供一个可流式部分加载、可任意配 channel、可分通道选 codec 的容器"。① 任意 channel:不止 RGBA,可以是 R / G / B / A / Z / object_id / motion.x / motion.y / normal.x / normal.y / normal.z / UV.x / UV.y 以及任意自定义命名。channel list 在 header 里,每个 channel 自带 pixelType(half / float / uint)和 sampling rate(支持次采样)。② 半精度 float16(half)是默认 pixelType,16 bit 表示 [−65504, +65504] + ±Inf + NaN——这是 ILM 跟 NVIDIA 在 1999 年一起定义的格式,后来被 IEEE 754-2008 收编(binary16),并成为 GPU 显存里 HDR 纹理的事实标准。③ 多压缩 codec:NONE(纯字节流)/ RLE(整数离散最佳)/ ZIP(zlib 通用)/ ZIPS(逐 scanline ZIP)/ PIZ(wavelet 无损,行业默认)/ PXR24(把 float32 截到 24-bit,Pixar 贡献,几乎无损)/ B44 / B44A(老 lossy)/ DWAA / DWAB(基于 DCT 的现代 lossy,DreamWorks 贡献,体积砍 5-10×,常用于 dailies)。可以 per-part 选不同 codec——RGBA 用 DWAA、Z 用 ZIP、Object ID 用 RLE,各取所长。④ Tile-based 部分加载:文件可选 scanline 或 tile 模式,tile 模式下 header 里有 offset table,Nuke / Houdini / Mari 加载 8K EXR 时只读视口看得到的几个 tile(可能 64 KB 而不是 2 GB)——这个能力是非线性合成软件 / 数字绘景的命脉。⑤ 多分辨率 mip:tiled EXR 可存 mipmaps(rip-maps 也行),IBL 环境贴图按 LOD 直接采样,不必外部生成 mip chain。⑥ 多帧 / multi-part:OpenEXR 1.x 用 .0001.exr / .0002.exr 帧序列(每帧一文件,管线友好);2.0(2013)引入单文件多 part,可在一个 .exr 里塞多个 layer / 多个 view(立体渲染左右眼)/ 多个分辨率,每个 part 独立 codec。这种"容器化"路线让 EXR 跟 USD / OCIO / ACES 这些现代 VFX 中间件无缝衔接。
OpenEXR's design takes a different road from PNG / JPEG / TIFF — it is not "store one RGBA image well" but "provide a streamable, partially-loadable container with arbitrary channels and per-channel codec choice for the compositing pipeline." ① Arbitrary channels: not just RGBA but R / G / B / A / Z / object_id / motion.x / motion.y / normal.x / normal.y / normal.z / UV.x / UV.y plus any custom name. The channel list lives in the header, each channel carrying its own pixelType (half / float / uint) and sampling rate (subsampling supported). ② Half-precision float16 is the default pixelType — 16 bits representing [−65504, +65504] plus ±Inf and NaN — a format ILM and NVIDIA jointly defined in 1999, later folded into IEEE 754-2008 (binary16) and now the de-facto standard for HDR textures in GPU VRAM. ③ Multiple compression codecs: NONE (raw bytes) / RLE (best for integer discrete data) / ZIP (general-purpose zlib) / ZIPS (per-scanline ZIP) / PIZ (wavelet lossless, industry default) / PXR24 (truncates float32 to 24 bits — Pixar's contribution, near-lossless) / B44 / B44A (legacy lossy) / DWAA / DWAB (modern DCT-based lossy, DreamWorks' contribution, 5-10× smaller — the dailies workhorse). Codecs can be picked per part — DWAA on RGBA, ZIP on Z, RLE on Object ID; each plays to strength. ④ Tile-based partial loading: files can be scanline or tile mode; in tile mode the header carries an offset table, so Nuke / Houdini / Mari loading an 8K EXR read only the visible tiles (possibly 64 KB out of 2 GB) — that capability is the lifeblood of node-based compositors and digital matte painters. ⑤ Multi-resolution mips: tiled EXR can carry mipmaps (or rip-maps), so IBL environment maps sample at the right LOD without an external mip chain. 
⑥ Multi-frame / multi-part: OpenEXR 1.x used per-frame .0001.exr / .0002.exr sequences (one file per frame, pipeline-friendly); 2.0 (2013) added single-file multi-part, packing multiple layers, multiple views (stereo left/right), or multiple resolutions into one .exr with per-part codec choice. That "container-like" direction lets EXR plug seamlessly into modern VFX middleware — USD, OCIO, ACES.
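Point ② is easy to poke at from Python, whose struct format `'e'` is the same IEEE 754 binary16 layout as EXR's half. This is a convenient stand-in for experimentation; real EXR I/O would go through the OpenEXR library:

```python
import struct

def as_half(x):
    """Round-trip a Python float through IEEE 754 binary16 ('e' format),
    the same bit layout OpenEXR uses for its half pixelType."""
    return struct.unpack('<e', struct.pack('<e', x))[0]
```

The round-trip shows the advertised properties directly: 65504 is the largest finite half, infinities survive, ordinary values keep roughly three decimal digits, and even the smallest subnormal (2⁻²⁴) is representable.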
图 28 · OpenEXR 完整编码流程。渲染输出是一个多通道浮点 buffer——RGBA、Z(深度,float32)、Normal.xyz、Motion.xy、Object ID(整数)、UV、若干灯光分层。EXR 编码器先按用途分组(beauty / data / int / light / UV),再 per-part 选 codec:RGBA 和灯光分层用 DWAA(视觉无损,体积砍 5-10×);Z / Normal / Motion / UV 必须 ZIP 无损(几何数据 1 bit 错就出 artifact);Object ID 是整数用 RLE 最优。然后把每个 part 切成 64×64 tile 并建 offset table(Nuke 加载时可只读视口 tile),最后封装进 multi-part 容器并附 ACES 色彩空间属性。整张 8K EXR 可能从 RAW 800 MB 压到 80 MB,而且任何子集可独立读出——这是二十多年没人能替代它的原因。
Fig 28 · OpenEXR's full encode pipeline. The render output is a multi-channel floating-point buffer — RGBA, Z (depth, float32), Normal.xyz, Motion.xy, Object ID (integer), UV, and several light-group passes. The EXR encoder first groups channels by purpose (beauty / data / int / light / UV) and then picks a codec per part: RGBA and light groups go through DWAA (visually lossless, 5-10× smaller); Z / Normal / Motion / UV must stay lossless ZIP (a single bit-flip in geometry data is a visible artifact); integer Object ID is best as RLE. Then each part is split into 64×64 tiles with an offset table (so Nuke can read just the viewport tiles), and finally everything is packed into a multi-part container with the ACES colour-space attribute attached. An entire 8K EXR can compress from raw 800 MB down to about 80 MB, and any subset can still be read independently — which is why no one has displaced it in more than two decades.
| format | bit depth | float | channels | typical use |
|---|---|---|---|---|
| 8-bit JPEG | 8 | ✗ | RGB(YCbCr 内部) | screen photo / web |
| 16-bit TIFF | 16 int | ✗ | RGBA | print photo / scan |
| Radiance HDR | RGBE 32-bit | ✓ (shared exp) | RGB | early CG / IBL |
| OpenEXR | 16 / 32 float | ✓ (true) | unlimited (任意 named) | VFX / film / render output |
| HDR10 / HLG | 10-bit PQ | ✗ (perceptual) | YCbCr | TV broadcast / streaming |
$ exrheader scene.exr # 看 channel / codec / displayWindow / attrs
$ exrinfo scene.exr # 简洁版 header 摘要 (OpenEXR 3.x)
$ oiiotool scene.exr -ch R,G,B,A -o rgb.exr # OpenImageIO 抽 channel
$ oiiotool scene.exr -ch Z -o depth.exr        # 单独抽 depth pass
$ exrenvmap input.exr cubemap.exr # latlong → cube · IBL 预处理
$ exrmaketiled in.exr tiled.exr # scanline → tiled (启用部分加载)
$ exrmultipart -combine -i a.exr b.exr -o m.exr  # 多 part 合并到一个文件
$ exr2aces in.exr out.exr # 转 ACES2065-1 色彩空间
适用
USE FOR
- VFX 合成中间格式(Nuke / After Effects / Fusion 必备)
- 渲染器输出(Arnold / V-Ray / RenderMan / Cycles 默认 EXR)
- IBL 环境贴图(latlong / cube,带 mip)
- ACES 工作流(2015+ 几乎所有好莱坞片)的全流程交换格式
- 数字绘景 / matte painting(Mari / Photoshop 32-bit 模式)
- 需要保留 Z / Normal / Motion / Object ID 等 AOV pass 的渲染管线
- 立体 / 多视图渲染(单文件 multi-part 装左右眼)
- VFX compositing intermediate (mandatory in Nuke / After Effects / Fusion)
- Renderer output (Arnold / V-Ray / RenderMan / Cycles default to EXR)
- IBL environment maps (latlong / cubemap, with mip chain)
- End-to-end exchange format for any ACES workflow (essentially every Hollywood release post-2015)
- Digital matte painting (Mari, Photoshop 32-bit mode)
- Render pipelines that must preserve AOV passes — Z / Normal / Motion / Object ID
- Stereo / multi-view renders (single-file multi-part packs left/right eyes)
反适用
AVOID
- Web 显示(浏览器不解 EXR · 文件巨大)
- 移动端 / app 资源(没有 GPU 硬件解码 · 体积不友好)
- 消费级照片分发(用 JPEG / AVIF / HEIF)
- 需要 8-bit / 整数像素的最终交付(用 TIFF / PNG)
- 对体积极敏感的传输场景(EXR 即便 DWAA 也比 JPEG 大几倍)
- Web display (browsers don't decode EXR; files are huge)
- Mobile / app assets (no GPU hardware decode; size unfriendly)
- Consumer photo distribution (use JPEG / AVIF / HEIF)
- Final-delivery 8-bit / integer pixels (use TIFF / PNG)
- Bandwidth-critical transmission (even DWAA EXR is several times larger than JPEG)
| scope | APIs / DCC | tools | CLI |
|---|---|---|---|
| OpenEXR | ✓✓ Nuke · Houdini · Mari · Maya · Blender · Cinema 4D · DaVinci Resolve · After Effects · Fusion · Photoshop(32-bit) · Arnold / V-Ray / RenderMan / Cycles 渲染器全部原生 | ✓✓ OpenEXR 官方 lib(C++) · OpenImageIO(oiiotool) · ImageMagick · ffmpeg(EXR sequence) · DJV / mrViewer 看片器 | exrheader · exrinfo · oiiotool · exrmaketiled · exrenvmap · exrmultipart |
Radiance HDR — 光照贴图老兵
Radiance HDR — the lightmap veteran
"用 8-bit 共享指数装 32-bit float —— 1989 的 hack。"
"32-bit float packed via shared 8-bit exponent — a 1989 hack."
1989 年 Greg Ward 在 Lawrence Berkeley National Lab 写 Radiance —— 一套物理光照模拟渲染器,要算光在场景里的真实辐射度,输出值会从 1e-6(月光)横跨到 1e6(太阳直射)。当时的难题不是算法,而是把这些数装到磁盘里:浮点 IEEE 754 32-bit/通道的话,一张 1024×768 的图就要 12 MB,而 1989 的硬盘是几十 MB 起跳的奢侈品。Greg Ward 的 hack:RGB 三个通道共享一个 8-bit 指数—— 把 R/G/B 三个 float 归一化到同一个 2^E 之下,只存归一化后的 8-bit 尾数 + 一个 8-bit 指数,合计 32 bit/像素(跟 RGBA8 一样大)。范围理论上 1e-38 到 1e38,精度 ~1%(对光照足够,对色彩管理就显粗糙)。再配一个极简的 RLE 行内压缩,这就是 .hdr 格式。靠着这个 hack,IBL 环境贴图、PSPI(panoramic stereo painted images)、HDR 全景照片在 90 年代到 2000 年代撑了 20 年,直到 OpenEXR 把它替换掉。
In 1989 Greg Ward at Lawrence Berkeley National Lab was writing Radiance — a physically-based lighting-simulation renderer — and needed to store radiance values that spanned 1e-6 (moonlight) to 1e6 (direct sun). The challenge wasn't the math; it was fitting that range on disk: IEEE 754 32-bit per channel meant a 1024×768 image cost 12 MB, and 1989 hard drives were luxuries measured in tens of megabytes. Ward's hack: have R/G/B share a single 8-bit exponent — normalise the three floats to the same 2^E, store the normalised 8-bit mantissas plus one 8-bit exponent, totalling 32 bit/pixel (the same as RGBA8). Range nominally 1e-38 to 1e38, precision ~1 % (good enough for lighting, coarse for colour management). Add a minimal scanline RLE on top, and you have the .hdr format. The hack carried IBL environment maps, PSPI panoramas and HDR photography through the 1990s and 2000s for two decades, until OpenEXR finally retired it.
value = (RGB / 256) × 2^(E − 128)——共享指数让三个通道一起放缩,代价是亮度差异极大的颜色(比如蓝色通道很弱、红色很强)精度退化。范围理论上 1e-38 ~ 1e38,精度 ~1%——对光照模拟够用,对色彩管理就嫌粗糙。这是 1989 年的工程取舍:用跟 RGBA8 一样的 32 bit/pixel 装下 6 个数量级的动态范围。
value = (RGB / 256) × 2^(E − 128) — the shared exponent scales the three channels together, the cost being precision loss when channel intensities differ wildly (a strong red beside a weak blue). Range is nominally 1e−38 to 1e38 at ~1 % precision — fine for lighting simulation, coarse for colour management. The 1989 trade-off: pack six orders of magnitude into the same 32 bit/pixel as RGBA8.
技术内核
Technical core
Radiance HDR 的内核小到只有三件事。① RGBE 编码——三个通道共用一个指数。编码时找 max(R, G, B),归一化到 [0, 1],尾数乘 256 取整,指数加 128 偏移存为一字节;解码时反向。共享指数让"亮度差异极大的颜色"(蓝色通道极弱、红色极强)精度退化——这是它跟 float16 / float32 在色彩管理意义上的本质差距。② 极简 RLE——文件里每一行像素分开压缩:旧格式整行 RGBE 一起 RLE,1991 之后改成"先把 R / G / B / E 四个字节流分别拆开,再各自 RLE",压缩率显著提升(因为 E 经常大段重复,RLE 在它上面收益最大)。压缩开销小到 90 年代的 SGI 能软件实时解码。③ 文本头——文件开头是 ASCII 头,几行 #?RADIANCE / 标识 / 曝光值 / EXPOSURE= / FORMAT= 32-bit_rle_rgbe,然后一个空行,然后是分辨率字符串(-Y 480 +X 640),再之后才是 RLE 二进制流。这种"文本头 + 二进制 payload"的设计后来被 PFM / NetPBM 继承。三件事加起来就是整个 .hdr 格式——简单、自包含、跨平台。代价:① 不支持 alpha;② 没有 metadata(没有 ICC profile、白点、色彩空间);③ 只有 RGB,不能存 Z / Normal / Motion;④ 共享指数精度天生粗糙。这些缺点直接催生了 OpenEXR 在 1999 年的设计目标——"做 Radiance HDR 做不到的所有事"。
The Radiance HDR core is just three things. ① RGBE encoding — three channels share one exponent. Encode by finding max(R, G, B), normalising to [0, 1], scaling mantissas by 256 and rounding, and storing the exponent biased by 128 in one byte; decode is the inverse. The shared exponent loses precision for "channels of wildly different magnitudes" (a strong red beside a weak blue) — that's the format's fundamental colour-management weakness compared to float16 / float32. ② Minimal RLE — pixels are compressed per scanline: the old format ran RLE over the interleaved RGBE bytes; the post-1991 format de-interleaves into four byte streams (R / G / B / E) and RLEs each separately, dramatically improving compression (E often has long runs, where RLE wins biggest). Compression overhead is light enough that 1990s SGI workstations decoded in software in real time. ③ Text header — the file begins with an ASCII header: a few lines of #?RADIANCE / identifier / EXPOSURE= / FORMAT=32-bit_rle_rgbe, then a blank line, then a resolution string (-Y 480 +X 640), and only then the RLE binary stream. The "text header + binary payload" pattern was later inherited by PFM and the NetPBM family. Those three things are the entire .hdr format — simple, self-contained, portable. The costs: ① no alpha; ② no metadata (no ICC profile, no white-point, no colour-space tag); ③ RGB only — no Z, normal or motion channels; ④ inherent precision floor from the shared exponent. Those very gaps drove OpenEXR's 1999 design brief: "do everything Radiance HDR can't."
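The RGBE round-trip in ① fits in a dozen lines. A simplified sketch of Ward's scheme (function names are mine; real Radiance code handles mantissa rounding slightly differently):

```python
import math

def rgbe_encode(r, g, b):
    """Pack non-negative floats into one RGBE pixel: three 8-bit mantissas
    sharing one 8-bit exponent (bias 128). Simplified from Ward's routine."""
    m = max(r, g, b)
    if m < 1e-38:
        return (0, 0, 0, 0)              # all-zero pixel means black
    e = math.frexp(m)[1]                 # m = f * 2**e with 0.5 <= f < 1
    scale = 256.0 / 2.0 ** e             # mantissas land in [0, 256)
    return (int(r * scale), int(g * scale), int(b * scale), e + 128)

def rgbe_decode(ri, gi, bi, ei):
    if ei == 0:
        return (0.0, 0.0, 0.0)
    f = 2.0 ** (ei - 128) / 256.0
    return (ri * f, gi * f, bi * f)
```

The shared exponent is also where the ~1 % precision figure, and the weak-channel failure mode, come from: a dim channel sitting under a bright one gets quantised down to zero.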
适用
USE FOR
- IBL 环境贴图老资产 · 兼容旧渲染器(1990s-2000s 的 .hdr 库)
- 全景 latlong HDR 照片(Bracketed exposure stitch 工作流末端)
- HDR 光照模拟的最终输出 · 论文 demo / 教学
- 需要跨多个 DCC 但又不愿付 EXR 复杂度的场景
- Legacy IBL environment maps · old-renderer compatibility (1990s-2000s .hdr libraries)
- Panoramic latlong HDR photographs (the tail end of bracketed-exposure stitch workflows)
- Final output of HDR lighting simulations · paper demos / teaching
- Cross-DCC interchange when EXR's complexity isn't worth paying for
反适用
AVOID
- 现代 VFX 合成场景(用 OpenEXR · 多通道 / 高精度)
- 需要 alpha 的任何场景(RGBE 没有 A)
- 色彩管理严格的工作流(共享指数精度太粗 · 没有 ICC)
- 负数 / 复杂数学中间值(RGBE 只能存非负 RGB)
- Modern VFX compositing (use OpenEXR — multi-channel, higher precision)
- Anything needing alpha (RGBE has no A)
- Strict colour-managed pipelines (the shared exponent is too coarse, no ICC)
- Negative or complex intermediate maths (RGBE stores only non-negative RGB)
| scope | renderers / tools | editors | CLI |
|---|---|---|---|
| Radiance HDR (.hdr / .pic) | ✓✓ Radiance · pbrt · Mitsuba · Arnold · V-Ray · Blender Cycles · 几乎所有 IBL 输入支持 | ✓ Photoshop(32-bit) · GIMP · Affinity Photo · HDRShop · Picturenaut | ra_ppm · ra_tiff · ra_xyze(Radiance 自带 ra_* 套件)· oiiotool in.hdr -o out.exr |
PFM — Portable FloatMap
PFM — Portable FloatMap
"NetPBM 的 HDR 表亲 —— ASCII 头加裸 float。"
"NetPBM's HDR cousin — ASCII header plus raw float."
学术研究和渲染器中间格式有一种长期需求,主流格式都满足不了:"最简单的 HDR 容器"——不要任何压缩(读写都是 mmap,瞬间)、不要任何 metadata(纯净,bit 级 reproducibility)、能直接当 float* 数组操作(C 代码 fopen + fseek 过头部就能用,不需要任何库)。OpenEXR 太复杂(几百种 attribute、wavelet codec、tile / scanline 切换),Radiance HDR 精度太粗(RGBE shared exponent),float TIFF 的 IFD 解析又是一坨。Paul Debevec 等学术圈的人在 NetPBM(PPM / PGM / PBM)风格基础上,做了 PFM:三行 ASCII 头(magic / 宽高 / scale 字段)紧跟 raw float32 像素流。论文 supplementary、渲染器中间盘、调试图像 dump,这些场景里 PFM 是最舒服的——别的格式都嫌"太聪明"。
Academic research and renderer-internal storage share a recurring need that no mainstream format satisfies: the simplest possible HDR container — no compression (read / write is just mmap), no metadata (pure, bit-exact reproducibility), and direct use as a float* array (C code can fopen, fseek past the header, and operate on the bytes without any library). OpenEXR is too complex (hundreds of attributes, wavelet codecs, tile / scanline modes), Radiance HDR is too coarse (RGBE shared exponent), float TIFF's IFD parsing is its own mess. Paul Debevec and colleagues in academia took the NetPBM lineage (PPM / PGM / PBM) and produced PFM: three ASCII header lines (magic / width-height / scale) followed by a raw float32 pixel stream. For paper supplementaries, renderer internal dumps, and debugging-image scratch storage, PFM is the most comfortable choice — every other format feels "too clever."
第一行是 magic(PF = RGB · Pf = 灰度);第二行宽高(空格分隔);第三行 scale 字段(浮点数,符号位决定字节序——负数小端,正数大端,数值本身用作曝光缩放)。三行加起来约 20 字节。紧跟着就是 raw float32 像素流,RGB 顺序排列,12 字节/像素,自下而上(跟 BMP 同向)。整个文件可以 mmap 直接当 float* 用,跳过头部就行——这是 PFM 唯一的设计目标。
Line 1 is the magic (PF = RGB, Pf = grayscale); line 2 is width and height separated by a space; line 3 is the scale field (a float whose sign bit encodes endianness — negative for little-endian, positive for big-endian — and whose magnitude doubles as an exposure factor). Total header ~20 bytes. The raw float32 pixel stream follows, RGB-interleaved, 12 bytes per pixel, stored bottom-up (same orientation as BMP). The entire file can be mmap'd as a float* after skipping the header — that simplicity is PFM's single design goal.
技术内核
Technical core
PFM 内核三件事,合起来不到 30 行 C 代码就能写完读写器。① NetPBM 风格的文本头:像 PPM 一样,前几行是 ASCII。第一行是 magic 标识——PF 表示 float32 RGB,Pf 表示 float32 单通道灰度。第二行是宽高(空格分隔的整数)。第三行是 scale 字段——一个浮点数,绝对值是曝光 / 缩放因子(读取时通常忽略,渲染器自己处理),符号编码字节序:负数小端,正数大端。三行,十几个字符。② raw float32 RGB 像素流:头部紧跟二进制 float32 数据,RGB 交错(R₀ G₀ B₀ R₁ G₁ B₁ …),12 字节/像素;灰度模式 4 字节/像素。自下而上(像 BMP,但跟 OpenGL 纹理坐标天然吻合)——这是最常见的踩坑点,新人写 reader 很容易上下颠倒。③ 无任何压缩 / 无任何 metadata:这是故意的。没有 ICC profile,没有色彩空间,没有曝光记录,没有作者注释——纯粹"一张数字"。这恰好是论文实验、渲染器调试、参考实现里最重要的属性:你要 reproduce 别人的结果,任何额外 metadata 都是干扰。代价是它没法用于生产:文件大(4K RGB float32 ≈ 95 MB,无压缩),没法做色彩管理,工具支持 niche。但在它的位置——学术调试、bit-exact 中间盘——没人能替代它。
PFM has three core elements; a complete reader/writer fits in under 30 lines of C. ① NetPBM-style text header: like PPM, the first few lines are ASCII. Line 1 is the magic — PF for float32 RGB, Pf for float32 grayscale. Line 2 is the width and height (space-separated integers). Line 3 is the scale field — a float whose absolute value is an exposure / scale factor (typically ignored at read time; the renderer handles tone mapping itself), and whose sign encodes endianness: negative is little-endian, positive is big-endian. Three lines, a dozen characters. ② Raw float32 RGB pixel stream: binary float32 data follows the header, RGB-interleaved (R₀ G₀ B₀ R₁ G₁ B₁ …) at 12 bytes per pixel; grayscale is 4 bytes per pixel. Bottom-up (like BMP, but conveniently aligned with OpenGL's texture coordinate origin) — the most common pitfall when writing a reader is flipping the rows. ③ No compression, no metadata: intentional. No ICC profile, no colour-space tag, no exposure record, no author comment — just "the numbers." That happens to be the single most important property for paper experiments, renderer debugging, and reference implementations: when you reproduce someone else's result, any extra metadata is noise. The cost is that PFM is unsuitable for production: files are huge (a 4K RGB float32 image is ~95 MB uncompressed), there's no colour management, and tooling support is niche. But in its niche — academic debugging, bit-exact intermediate scratch — nothing else replaces it.
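A minimal PFM writer/reader matching the three-line header plus raw float32 stream described above. A sketch for little-endian colour (`PF`) files only; `pixels` is assumed to already be a flat list of RGB floats in bottom-up row order:

```python
import struct

def write_pfm(path, width, height, pixels):
    """Write a little-endian colour PFM: 'PF' magic, dimensions, negative
    scale (= little-endian marker), then raw float32 RGB, bottom row first."""
    assert len(pixels) == width * height * 3
    with open(path, 'wb') as f:
        f.write(b'PF\n%d %d\n-1.0\n' % (width, height))
        f.write(struct.pack('<%df' % len(pixels), *pixels))

def read_pfm(path):
    with open(path, 'rb') as f:
        assert f.readline().strip() == b'PF'      # b'Pf' would be grayscale
        width, height = map(int, f.readline().split())
        scale = float(f.readline().decode('ascii'))
        endian = '<' if scale < 0 else '>'        # sign bit encodes byte order
        count = width * height * 3
        data = struct.unpack(endian + '%df' % count, f.read(count * 4))
    return width, height, list(data)
```

Values round-trip at float32 precision, byte-exact on re-write, which is exactly the reproducibility property the section describes.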
适用
USE FOR
- 学术论文 supplementary / reproducibility 数据集
- 渲染器内部中间盘(每帧 dump,要求 mmap 速度)
- 算法 bit-exact 对比(diff 两个 .pfm 必须完全一致)
- 调试 / 可视化 float buffer(GPU readback dump)
- Academic-paper supplementaries / reproducibility datasets
- Renderer internal scratch storage (per-frame dumps that need mmap speed)
- Bit-exact algorithm comparison (diff'ing two .pfm files must match byte for byte)
- Debugging / visualising float buffers (GPU readback dumps)
反适用
AVOID
- 任何生产场景(无压缩 · 文件巨大)
- 需要色彩管理的工作流(无 ICC / 无色彩空间)
- 跨工具协作(支持 niche)
- Web / 移动端(浏览器不解码)
- Any production scenario (uncompressed → enormous files)
- Colour-managed pipelines (no ICC, no colour-space tag)
- Cross-tool collaboration (niche support)
- Web / mobile (no browser decode)
| scope | tools | libraries | CLI |
|---|---|---|---|
| PFM (.pfm) | ✓ ImageMagick · OpenImageIO · pfstools · MATLAB / OpenCV(自定义 reader 常见) | libpfs · OIIO · pbrt 自带 reader · 论文配套源码常自带 30 行 C 实现 | pfsin / pfsout(pfstools)· oiiotool in.pfm -o out.exr |
16/32-bit TIFF — 被忽视的扛把子
16/32-bit TIFF — the unsung workhorse
"40 年了,印刷厂还在用它,因为没有更好的替代。"
"40 years on, print shops still use it because nothing better replaced it."
1986 年 Aldus(后来被 Adobe 在 1994 收购)推 PageMaker —— 桌面排版革命的开端,问题是当时图像格式碎成一地:各家位图格式普遍只有 256 色调色板,EPS 是 PostScript 矢量,Mac PICT 跨不了平台,扫描仪厂商各自用私有格式。Aldus 跟扫描仪厂商一起设计了 TIFF —— Tag Image File Format —— 目标是"任何位深、任何 codec、任何 metadata、跨平台无损"。它的解法是 tag 系统:不像 BMP 那样固定字段,而是用 IFD(Image File Directory)装一个"几百种 tag 都可选"的描述表,payload 可换 codec,可多页,可跨设备元信息。从此所有需要"高保真 + 灵活元数据"的领域都默认 TIFF:印刷出版、扫描仪、显微镜、医学影像、卫星遥感、文物档案。40 年了,没人替代得了——不是因为它优秀,是因为它什么都能装:DICOM 内嵌它,GeoTIFF 是它的子集,DNG 是它的子集,Photoshop 16-bit 工作流默认它。被忽视的扛把子。
In 1986 Aldus (acquired by Adobe in 1994) launched PageMaker, the start of the desktop publishing revolution. The problem: image formats were a Tower of Babel — the bitmap formats of the day topped out at 256-colour palettes, EPS was vector PostScript, Mac PICT didn't cross platforms, scanner vendors each shipped a proprietary format. Aldus partnered with scanner vendors to design TIFF — Tag Image File Format — aiming for "any bit depth, any codec, any metadata, cross-platform, lossless." The solution was a tag system: rather than fixed fields like BMP, an IFD (Image File Directory) carries a descriptive table of "hundreds of optional tags," the payload swaps codecs, files can be multi-page, and device metadata travels with the image. From then on every domain needing "high fidelity + flexible metadata" defaulted to TIFF: print publishing, scanners, microscopes, medical imaging, satellite remote-sensing, museum archives. Forty years later nothing has replaced it — not because it's elegant, but because it can hold anything: DICOM embeds it, GeoTIFF is its subset, DNG is its subset, Photoshop's 16-bit workflow defaults to it. The unsung workhorse.
技术内核
Technical core
TIFF 的设计核心可以总结成四条规则,40 年没变。① 基于 IFD 的 tag 系统:文件不是"按字段顺序"装数据,而是"我有什么属性,就在 tag 表里加一行"。tag id 是 16-bit 无符号整数(0~65535),Adobe 保留 32768 以下,32768~65535 是private tags(GeoTIFF / DNG / OME-TIFF 等子集格式都在这个区域)。每个 tag 自带数据类型(BYTE / SHORT / LONG / RATIONAL / ASCII / FLOAT / DOUBLE 等 12 种),解析器只要"我认识的 tag 处理,不认识的跳过"。这种设计直接借鉴了 IBM 的 EBCDIC 数据描述传统,后来又被 ISOBMFF / Matroska 等容器借鉴。② 多页(IFD chain):每个 IFD 末尾有一个指向"下一个 IFD"的 offset,多页 TIFF 就是把 IFD 串成链表。最经典用例是传真组 4(Group 4 Fax)——黑白文档扫描多页存一个 .tif;现在扩展到扫描仪批量扫描、显微镜 z-stack、卫星多光谱波段,每页一个 IFD。③ 多种 codec 可选:NONE(原始)/ PackBits(早期 Mac RLE)/ LZW(默认无损,90 年代有专利争议)/ DEFLATE(zlib,无损,现在最常用)/ JPEG-in-TIFF(把 JPEG bitstream 当 strip 数据装,1992 加,但 spec 模糊导致实现不一致)/ Group 3 / Group 4 Fax(双值黑白图像专用)/ LERC(地理空间近无损)。每个 strip 或 tile 独立 codec。④ 任意位深:1 bit(黑白扫描)/ 4 / 8(普通照片)/ 16(高保真扫描、医学影像)/ 32-bit float(IEEE 754,科研、HDR)。BitsPerSample tag 是个数组——可以是 (16, 16, 16) 表示 RGB 各 16 bit,可以是 (8, 8, 8, 8) 表示 RGBA8,甚至 (16, 16, 16, 16, 16, 16) 表示 6 通道高光谱。SampleFormat tag 进一步指定每个通道是 unsigned int / signed int / IEEE float / void(自定义)——这就是 TIFF 能存 16-bit 摄影、32-bit float HDR、整数 ID buffer 的根源。
TIFF's design boils down to four rules, unchanged in 40 years. ① IFD-based tag system: the file isn't laid out as "fields in fixed order," it's "whatever properties exist, add a row to the tag table." Tag IDs are 16-bit unsigned integers (0–65535); Adobe reserves 0–32767 and the 32768–65535 range is private tags (where GeoTIFF, DNG, OME-TIFF and other subset formats live). Each tag carries its own data type (BYTE / SHORT / LONG / RATIONAL / ASCII / FLOAT / DOUBLE — 12 in total), and a parser simply handles tags it knows and skips the rest. The design borrows directly from IBM's EBCDIC data-description tradition and was later borrowed by ISOBMFF, Matroska and other modern containers. ② Multi-page (IFD chain): each IFD ends with an offset to the next IFD, so multi-page TIFFs are linked lists of IFDs. The classic use case is Group 4 fax — multi-page black-and-white document scans in a single .tif; today this extends to flatbed batch scans, microscope z-stacks, and satellite multi-spectral bands, one IFD per page. ③ Multiple codec options: NONE (raw) / PackBits (early Mac RLE) / LZW (default lossless, embroiled in 1990s patent disputes) / DEFLATE (zlib, lossless, today's most common choice) / JPEG-in-TIFF (a JPEG bitstream stuffed into strip data, added in 1992 but with vague enough spec language that implementations still disagree) / Group 3 and Group 4 fax (bilevel black-and-white only) / LERC (near-lossless geospatial). Each strip or tile picks its codec independently. ④ Arbitrary bit depth: 1-bit (B&W scans) / 4 / 8 (regular photos) / 16 (high-fidelity scans, medical imaging) / 32-bit float (IEEE 754, science, HDR). The BitsPerSample tag is an array — it can be (16, 16, 16) for RGB at 16 bpp, (8, 8, 8, 8) for RGBA8, or even (16, 16, 16, 16, 16, 16) for six-channel hyperspectral. 
The SampleFormat tag further specifies whether each channel is unsigned int / signed int / IEEE float / void (custom) — that combination is exactly why TIFF can hold 16-bit photography, 32-bit float HDR, and integer ID buffers in the same container.
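Rules ① and ② can be demonstrated by hand-building a little-endian TIFF skeleton and walking its IFD chain. A sketch only: the 12-byte entry layout and next-IFD links, with no codec or strip handling:

```python
import struct

def ifd_entries(buf):
    """Yield (tag, type, count, value_or_offset) from every IFD in the chain.
    For short inline values (e.g. type 3 SHORT, count 1) the value sits in the
    low bytes of the 4-byte field, so reading it as a LONG still works on a
    little-endian file with zeroed padding."""
    assert buf[:4] == b'II*\x00'                   # 'II' + magic 42: LE TIFF
    off = struct.unpack_from('<I', buf, 4)[0]      # offset of the first IFD
    while off:                                     # offset 0 terminates the chain
        n = struct.unpack_from('<H', buf, off)[0]  # entry count
        for i in range(n):
            yield struct.unpack_from('<HHII', buf, off + 2 + 12 * i)
        off = struct.unpack_from('<I', buf, off + 2 + 12 * n)[0]

# Hand-built two-page skeleton: page 1 declares 640x480, page 2 declares 320 wide.
sample = (b'II' + struct.pack('<HI', 42, 8)        # header, first IFD at byte 8
          + struct.pack('<H', 2)                   # IFD 1: two entries
          + struct.pack('<HHII', 256, 3, 1, 640)   # tag 256 ImageWidth, SHORT
          + struct.pack('<HHII', 257, 3, 1, 480)   # tag 257 ImageLength, SHORT
          + struct.pack('<I', 38)                  # next IFD -> byte 38
          + struct.pack('<H', 1)                   # IFD 2: one entry
          + struct.pack('<HHII', 256, 3, 1, 320)
          + struct.pack('<I', 0))                  # end of chain
```

`list(ifd_entries(sample))` yields three tag rows across the two linked IFDs — the same linked-list mechanism that carries fax pages, microscope z-stacks and satellite bands, and the same skip-what-you-don't-know parsing a real reader does.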
适用
USE FOR
- 印刷出版 · 高保真扫描 · CMYK 精确印前流程
- 扫描仪 / 复印机 / 传真机默认输出(Group 4 多页)
- 卫星 / 航测 GeoTIFF · 医学 DICOM · 显微镜 OME-TIFF · 文物数字化
- 16-bit 摄影 / Photoshop 高位深工作流中间格式
- Print publishing · high-fidelity scanning · CMYK pre-press pipelines
- Default output of scanners / copiers / fax machines (multi-page Group 4)
- Satellite / aerial GeoTIFF · medical DICOM · microscopy OME-TIFF · cultural-heritage digitisation
- 16-bit photography / intermediate format in Photoshop's high-bit-depth workflow
反适用
AVOID
- Web 网页内嵌图(浏览器不解码,得转 JPEG / PNG / WebP / AVIF)
- 移动端 / app 内分发(体积大、解码慢)
- 消费级照片分享(用 JPEG / HEIC)
- 对"必须 100% 兼容"敏感的场景(各家 reader 支持 tag 子集不同)
- Web pages (browsers don't decode TIFF; convert to JPEG / PNG / WebP / AVIF)
- Mobile / in-app distribution (large, slow to decode)
- Consumer photo sharing (use JPEG / HEIC)
- Anywhere "must be 100 % compatible" matters (different readers support different tag subsets)
| scope | editors / DCC | libraries | CLI |
|---|---|---|---|
| TIFF (.tif / .tiff) | ✓✓✓ Photoshop · Lightroom · Capture One · Affinity · GIMP · DaVinci Resolve · ArcGIS / QGIS · Fiji / ImageJ · DICOM viewers · 几乎所有图像工具 | libtiff(40 年事实标准 reference 实现)· OpenImageIO · GDAL · scikit-image · libgeotiff · OME Bio-Formats | tiffinfo · tiffcp · tiffsplit · tiff2pdf · oiiotool · gdalinfo |
RAW — 厂商林立的原始数据
RAW — the manufacturer-fragmented origin data
"所谓 RAW,不是一个格式,是几十个互不兼容的格式族。"
"'RAW' is not one format — it's a zoo of several dozen incompatible formats."
数码相机 sensor 的原始输出是 12-14 bit Bayer pattern raw 数据——每个像素位置上只有一个颜色样本(R 或 G 或 B),需要 demosaic 算法才能算出完整 RGB。如果在相机里直接转 JPEG,会立即丢掉四样东西:(a) 高位深(14 bit → 8 bit,动态范围砍 64 倍);(b) demosaic 之前的灵活性(JPEG 已经是固定算法插值过的结果,不能换);(c) 白平衡可调性(JPEG 已经把 WB 烘进像素,后期改容易出色偏);(d) 曝光宽容度(过曝 / 欠曝在 14 bit RAW 里能拉回来,JPEG clip 后无法恢复)。摄影师需要"把决定留到后期再做"的格式 = RAW。但每家相机厂商都自己定义,互不兼容,这是后期工作流 30 年的最大头疼——也是 LibRaw / Lightroom 这些工具存在的全部理由。
A digital camera sensor's raw output is 12-14 bit Bayer-pattern data — each pixel position carries only one colour sample (R or G or B), and a demosaic algorithm has to interpolate the full RGB. Convert to JPEG inside the camera and you immediately lose four things: (a) high bit depth (14 bit → 8 bit, dynamic range cut by 64×); (b) flexibility before demosaic (JPEG is already a fixed-algorithm interpolation, you can't swap it); (c) white-balance malleability (JPEG bakes WB into pixels; later changes risk colour casts); (d) exposure latitude (over- and under-exposure can be recovered in 14 bit RAW; JPEG clips and the data is gone). The format that lets photographers "defer decisions to post" is RAW. But every camera maker defined its own, none compatible with the others — that has been the post-production headache of the past 30 years, and the entire reason LibRaw / Lightroom / Capture One exist.
技术内核
Technical core
RAW 不是一个格式,是一种思路的几十种实现。技术上有五条共同线索。① Bayer mosaic CFA:sensor 上每个物理像素只盖一种颜色滤镜(R/G/B 中的一种),按 2×2 重复排列。每个 2×2 块里有 2 绿 + 1 红 + 1 蓝(模拟人眼对绿色亮度更敏感)。读 RAW 必须先知道是 RGGB / BGGR / GRBG / GBRG 哪种,再用demosaic 算法(AHD / VNG / PPG / AMaZE / DCB / Igv …十多种)插出每个像素完整的 RGB。Fuji X-Trans 是个异类——6×6 X 形排列,普通 demosaic 算法对它效果差,得用专门的 X-Trans demosaic。② 12-14 bit/channel:不是 8 bit。这意味着比 JPEG 多 4-6 stop 动态范围(高光 / 暗部都能拉)。CMOS sensor 物理 ADC 通常 14 bit,Phase One 等中画幅可达 16 bit。RAW 把这些位深原样保留,后期"曝光 +2 / -2"才不会出 banding。③ 白平衡 / 色彩矩阵 / tone curve 全部未应用:相机只在 EXIF / MakerNotes 里"记录"拍摄时的 WB 是 5500K 还是 Auto,但不烘进像素。色彩矩阵(把 sensor 厂商特定的 R/G/B 响应曲线映射到标准 XYZ 色彩空间的 3×3 矩阵)也是同样:存为 metadata,由后期解码器应用。这是 RAW 跟 JPEG 的根本不同——后者是"决定都做完了的最终结果",前者是"原料 + 配方,但还没开火"。④ 容器基本都基于 TIFF/IFD:Canon CR2 / Nikon NEF / Sony ARW / Fuji RAF / Olympus ORF / Pentax PEF / Panasonic RW2 几乎全是 TIFF base 加私有 tag 区(0x8769 EXIF + 0x927C MakerNote + 厂商私有 tag id)。例外是 Canon CR3(2018 起,改用 ISOBMFF / HEIF 同源容器)和 Sigma X3F(自家完全独立)。这种"TIFF + 私有 tag"的设计意味着标准 TIFF reader 能看到大致结构,但解不出像素——必须靠厂商 SDK 或 LibRaw 的逆向工程。⑤ 解码必须靠厂商 SDK 或 LibRaw:Adobe Camera Raw / Lightroom 的 RAW 解码引擎是闭源商业;开源世界里 LibRaw(Dave Coffin 单文件 C 程序 dcraw 的继承者)通过逆向工程支持几乎所有相机 RAW,是 darktable / RawTherapee / digiKam / Fiji 的共同底层。dcraw 本身是工程史奇迹——Coffin 一个人 20 年维护一份单文件 C,支持上千款相机。LibRaw 接手后变成了正式 lib + 持续更新。
RAW is not one format but one idea realised dozens of times. Five common technical threads. ① Bayer mosaic CFA: every physical sensor pixel sits behind a single colour filter (one of R/G/B), arranged in a repeating 2×2. Each 2×2 has 2 green + 1 red + 1 blue (mirroring the eye's stronger luminance response to green). Reading a RAW requires first knowing the arrangement (RGGB / BGGR / GRBG / GBRG) and then running a demosaic algorithm (AHD / VNG / PPG / AMaZE / DCB / Igv … more than ten exist) to interpolate the full RGB at every pixel. Fuji X-Trans is the oddball — a 6×6 X-shaped pattern, on which generic Bayer demosaicers do poorly; it needs a dedicated X-Trans demosaic. ② 12-14 bit/channel, not 8. That means 4-6 stops more dynamic range than JPEG (highlights and shadows both recoverable). CMOS sensor ADCs are usually physically 14 bit; Phase One and similar medium-format gear reach 16 bit. RAW keeps every bit, so post-exposure "+2 / −2" doesn't band. ③ White balance, colour matrix, and tone curve are not applied. The camera only records in EXIF / MakerNotes that WB was set to 5500K or Auto — it does not bake it into the pixels. The colour matrix (a 3×3 mapping from the sensor's vendor-specific R/G/B response into standard XYZ) is likewise stored as metadata for the decoder to apply later. That is the deep difference from JPEG: JPEG is "all decisions, finalised"; RAW is "ingredients plus recipe, but the burner is off." ④ The container is almost always TIFF/IFD: Canon CR2 / Nikon NEF / Sony ARW / Fuji RAF / Olympus ORF / Pentax PEF / Panasonic RW2 are all TIFF-based with private tag regions (0x8769 EXIF + 0x927C MakerNote + vendor-private tag ids). Exceptions: Canon CR3 (since 2018, ISOBMFF — the HEIF / MP4 family) and Sigma X3F (entirely independent). The "TIFF + private tags" design means a generic TIFF reader can see the gross structure but can't decode the pixels — that requires the vendor SDK or LibRaw's reverse-engineering. 
⑤ Decoding leans on the vendor SDK or LibRaw: Adobe Camera Raw / Lightroom's RAW decoder is a closed-source commercial engine; in open source, LibRaw (the successor to Dave Coffin's single-file dcraw) supports nearly every camera RAW through reverse engineering and is the shared backend of darktable / RawTherapee / digiKam / Fiji. dcraw itself is an engineering miracle — Coffin maintained a single-file C program for 20 years that supported thousands of cameras solo. LibRaw took over and turned it into a proper library with continuous updates.
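To make the CFA geometry in ① concrete, here is the simplest demosaic that exists: the half-resolution "superpixel" scheme, which collapses each RGGB 2×2 cell into one RGB pixel. Production converters use far better algorithms (AHD / AMaZE and friends, as listed above); this sketch only illustrates where the samples sit in the mosaic:

```python
def superpixel_demosaic(bayer, width, height):
    """Half-resolution 'superpixel' demosaic of an RGGB mosaic:
    each 2x2 cell {R, G, G, B} collapses into one RGB pixel.
    `bayer` is a flat row-major list of sensor samples (one value per photosite)."""
    out = []
    for y in range(0, height, 2):
        row = []
        for x in range(0, width, 2):
            r  = bayer[y * width + x]            # top-left photosite: red filter
            g1 = bayer[y * width + x + 1]        # top-right: green
            g2 = bayer[(y + 1) * width + x]      # bottom-left: green (2 greens per cell)
            b  = bayer[(y + 1) * width + x + 1]  # bottom-right: blue
            row.append((r, (g1 + g2) // 2, b))   # average the two greens
        out.append(row)
    return out

# A synthetic 4x4 mosaic: the same flat RGGB cell repeated four times.
mosaic = [100, 50, 100, 50,
          50,  20, 50,  20,
          100, 50, 100, 50,
          50,  20, 50,  20]
print(superpixel_demosaic(mosaic, 4, 4))  # two rows of identical (100, 50, 20) pixels
```

Full-resolution demosaicers interpolate the two missing channels at every photosite instead of binning, which is where the algorithm zoo (and X-Trans's special treatment) comes from.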
图 32 · RAW 端到端处理流水线。左:相机内,sensor 出 14-bit Bayer raw,经厂商私有压缩(基本无损或可选有损)写到 .CR3 / .NEF / .ARW 文件,带 EXIF + MakerNotes 元数据。中:导入电脑后由 LibRaw / Adobe Camera Raw 用 DCP color profile + 厂商 SDK 解码,跑 demosaic → 白平衡 → 曝光 → tone curve → 色彩空间。右:输出多种最终格式——印刷归档用 16-bit TIFF,跨厂商归档用 DNG,网页用 JPEG,手机端用 HEIC / AVIF,新选项 JPEG XL。原 RAW 文件保留——这是 RAW 的全部价值:5 年后有更好的 demosaic 算法或调色风格,你可以重新出图。
Fig 32 · The end-to-end RAW pipeline. Left: in camera, the sensor produces 14-bit Bayer raw, vendor-private compression writes it to .CR3 / .NEF / .ARW with EXIF + MakerNotes metadata. Middle: imported to a host computer where LibRaw / Adobe Camera Raw decode it with the DCP colour profile and the vendor SDK, running demosaic → white balance → exposure → tone curve → colour space. Right: multiple final outputs — 16-bit TIFF for print masters and archive, DNG for vendor-neutral archive, JPEG for the web, HEIC / AVIF for mobile, and JPEG XL as the newer high-quality / low-size option. The original RAW is kept — this is the whole point of RAW: in five years, better demosaic algorithms or a new grade let you re-render the same shot.
| brand | format | year | bit depth | container |
|---|---|---|---|---|
| Canon | CR2 / CR3 | 2004 / 2018 | 14 | TIFF base · CR3 改 ISOBMFF |
| Nikon | NEF | 1999 | 12-14 | TIFF base |
| Sony | ARW | 2005 | 14 | TIFF base |
| Fujifilm | RAF | 2000 | 14 | TIFF base · X-Trans CFA |
| Olympus | ORF | 2003 | 12 | TIFF base |
| Adobe | DNG | 2004 | 12-32 | TIFF base · 公开 spec |
$ dcraw -v -w in.NEF # dcraw: 用相机 WB 解码 NEF 输出 PPM
$ dcraw -i -v in.CR2 # 只读 metadata 不解码
$ rawtherapee-cli -o out.tif -t -c in.CR2 # RawTherapee 命令行 RAW → 16-bit TIFF
$ darktable-cli in.ARW out.jpg # darktable 命令行 RAW → JPEG
$ exiv2 -p a in.RAF # 查 EXIF + MakerNotes
$ exiftool -a -G1 -s in.NEF # 万能元数据查看 · 厂商私有 tag 都列出来
$ libraw_unpack in.ARW # LibRaw 命令行: 输出未处理 Bayer raw
$ Adobe\ DNG\ Converter --convert in.CR3 out.dng # 转 DNG 归档
适用
USE FOR
- 商业摄影 / 婚礼 / 时尚 / 风光 / 影楼后期(必需 RAW)
- 专业新闻 / 体育摄影(后期裁剪 / 曝光宽容度)
- HDR 包围曝光合成源(三张 RAW 比三张 JPEG 信息多得多)
- 天文摄影 / 长时间曝光(暗部噪点处理依赖 14 bit)
- 需要"5 年后用新工具重出"的归档(DNG 推荐)
- Commercial / wedding / fashion / landscape / studio post-production (RAW required)
- Professional news / sports (post-crop, exposure latitude)
- HDR bracketed merging (three RAWs carry vastly more information than three JPEGs)
- Astrophotography / long exposures (shadow noise-handling needs 14-bit headroom)
- Archives expected to be re-rendered with future tools (DNG recommended)
反适用
AVOID
- 终端用户分享(没人想看 .NEF · 给 JPEG / HEIC)
- 实时预览 / 直播(解码慢)
- 移动端 / Web(浏览器不解 · 工具链没接)
- 手机日常拍照(ProRAW 例外,但 99% 场景普通 JPEG / HEIC 够用)
- 极小存储 / 极小内存设备(RAW 文件 20-100 MB / 张)
- Sharing with end users (nobody wants a .NEF — give them JPEG / HEIC)
- Live preview / streaming (decode is slow)
- Mobile / web (browsers don't decode; toolchains aren't wired)
- Everyday phone photography (ProRAW excepted; JPEG / HEIC suffices for 99 % of cases)
- Very-tight-storage / tight-memory devices (a RAW file is 20-100 MB)
| scope | commercial | open source | CLI / lib |
|---|---|---|---|
| vendor RAW (CR3 / NEF / ARW / RAF / ORF / RW2 / DNG …) | ✓✓✓ Adobe Lightroom · Camera Raw · Capture One · DxO PhotoLab · Phase One Capture · ON1 Photo RAW · Luminar | ✓✓ RawTherapee · darktable · ART · digiKam · UFRaw · Krita(导入)· GIMP(via plug-in)· Fiji | dcraw · libraw · exiftool · exiv2 · rawtherapee-cli · darktable-cli · Adobe DNG Converter |
DNG — Adobe 想统一 RAW
DNG — Adobe's attempt to unify RAW
"想做 RAW 的 PNG,部分成功。"
"Tried to be the PNG of RAW. Partial success."
2004 年 Adobe 看到 RAW 生态彻底碎掉:Canon CR2、Nikon NEF、Sony ARW、Fuji RAF、Olympus ORF、Pentax PEF、Panasonic RW2…几十种格式互不兼容,每出一款新相机 Adobe Camera Raw / Lightroom 就得加一个 decoder profile,工作量惊人;摄影师归档时也心慌——5 年后还能不能开一张今天的 .ARW?Adobe 推出 DNG(Digital Negative),基于开放的 TIFF/EP(TIFF Electronic Photography)扩展,目标只有一个:"一个公开 spec 的 RAW 格式装所有厂商的数据"。结果一半成功:Pentax / Leica / Hasselblad 选择原生输出 DNG,Apple 2020 年的 iPhone ProRAW 也用 DNG 包装;但 Canon / Nikon / Sony 三巨头坚持自家专有,从未给 DNG 让路。Adobe DNG Converter 工具可以把任意厂商 RAW 离线转 DNG 做归档,但转换过程可能有损 metadata——某些 MakerNotes 字段在 DNG 里没有标准对应,只能丢弃。
By 2004 Adobe saw the RAW ecosystem fully fragmented: Canon CR2, Nikon NEF, Sony ARW, Fuji RAF, Olympus ORF, Pentax PEF, Panasonic RW2 — dozens of mutually incompatible formats. Every new camera body forced Adobe Camera Raw / Lightroom to add another decoder profile, the workload was extraordinary, and photographers were nervous about archiving — would today's .ARW still open in five years? Adobe introduced DNG (Digital Negative), built on the open TIFF/EP (TIFF Electronic Photography) extension, with one goal: "one publicly specified RAW format that holds every vendor's data". The result was half a success: Pentax / Leica / Hasselblad chose to output DNG natively, and Apple's 2020 iPhone ProRAW wraps DNG too — but Canon / Nikon / Sony stuck with their proprietary formats and have never made room for DNG. The Adobe DNG Converter can offline-convert any vendor's RAW to DNG for archive, but conversion may lose some metadata — certain MakerNotes fields have no standard DNG equivalent and are simply dropped.
技术内核
Technical core
DNG 三件事撑起整个设计。① 基于 TIFF/EP 扩展:DNG 不是从零设计的容器,而是在 TIFF 6.0 + TIFF/EP(TIFF Electronic Photography,1998 ISO 12234-2)上加了一组规范化的私有 tag。这意味着已有 TIFF reader 能看到大致结构(虽然不能正确出图),也意味着 DNG spec 公开后,任何人能写 DNG 解码器——Adobe 故意降低门槛。② 厂商私有 metadata 透传:DNG 在容器里专门留一块 MakerNotes 区,把原厂的私有元数据(比如 Sony ARW 里的某个加密曝光块)原样塞进去,DNG 解码器看不懂也不会丢。这是 Adobe 跟厂商的"和解":你转 DNG 不会丢你的相机特定信息,某天厂商 SDK 想读还能读回去。③ 包含 demosaic 后的可选预览 + 完整原始 sensor 数据:DNG 文件里通常嵌一张 JPEG preview(给 Lightroom 缩略图秒开)+ 完整的 Bayer raw payload(给后期重新解码)。比起原厂 RAW 多 5-10% 体积,但换来"打开就有缩略图"的体验。某些 DNG 还可选 lossy compressed 模式(Adobe Lossy DNG,基于 JPEG 在 raw 域上做有损,体积砍 50% 但有损 RAW 的灵活度——主要给 iPhone ProRAW 用)。
DNG rests on three pillars. ① Built on TIFF/EP: DNG is not a from-scratch container; it sits on TIFF 6.0 + TIFF/EP (TIFF Electronic Photography, ISO 12234-2 from 1998) with a standardised set of private tags. Existing TIFF readers can see the gross structure (without rendering correctly), and once the DNG spec was public anyone could write a DNG decoder — Adobe deliberately lowered the barrier. ② Vendor metadata passthrough: DNG reserves a MakerNotes region in the container and stores the original vendor's private metadata (e.g. some encrypted exposure block from Sony ARW) verbatim; the DNG decoder needn't understand it, but it isn't dropped. This is Adobe's reconciliation gesture to vendors: converting to DNG doesn't lose your camera-specific information, and a vendor SDK could in principle read it back later. ③ Optional demosaiced preview + full original sensor data: a DNG file usually carries an embedded JPEG preview (so Lightroom thumbnails appear instantly) plus the complete Bayer raw payload (for re-decoding in post). The cost is 5-10 % more bytes than the original vendor RAW, in exchange for "opens with a thumbnail" UX. Some DNGs also enable a lossy mode (Adobe Lossy DNG — JPEG-style lossy in the raw domain, 50 % smaller, at the cost of some RAW flexibility — primarily targeted at iPhone ProRAW).
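Because DNG is just TIFF plus standardised tags, "is this file a DNG?" reduces to scanning IFD0 for the DNGVersion tag (id 50706 in Adobe's public DNG spec). A minimal sketch against hand-built fake files — a structural illustration, not a substitute for LibRaw or the Adobe DNG SDK:

```python
import struct

DNG_VERSION_TAG = 50706  # "DNGVersion" per Adobe's public DNG specification

def is_dng(buf):
    """Heuristic: a DNG is a TIFF whose IFD0 carries the DNGVersion tag."""
    if buf[:2] not in (b"II", b"MM"):
        return False
    order = "<" if buf[:2] == b"II" else ">"
    magic, offset = struct.unpack(order + "HI", buf[2:8])
    if magic != 42:
        return False
    (count,) = struct.unpack(order + "H", buf[offset:offset + 2])
    for i in range(count):                       # each IFD entry is 12 bytes, tag first
        (tag,) = struct.unpack(order + "H", buf[offset + 2 + 12 * i: offset + 4 + 12 * i])
        if tag == DNG_VERSION_TAG:
            return True
    return False

def tiny_tiff(tags):
    """Fake minimal little-endian TIFF with the given tag ids (values are dummies)."""
    head = struct.pack("<2sHI", b"II", 42, 8)
    ifd = struct.pack("<H", len(tags)) + b"".join(
        struct.pack("<HHII", t, 1, 4, 0) for t in tags) + struct.pack("<I", 0)
    return head + ifd

print(is_dng(tiny_tiff([256, DNG_VERSION_TAG])))  # True
print(is_dng(tiny_tiff([256, 257])))              # False — plain TIFF, not a DNG
```

This is the payoff of "not a from-scratch container": ten lines of generic TIFF walking is enough to classify the file.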
适用
USE FOR
- 厂商无关的 RAW 长期归档(摄影师整理 5-10 年素材)
- iPhone ProRAW(Apple 2020 起的官方 RAW 选项)
- Pentax / Leica / Hasselblad 原生输出
- Lightroom 默认导入选项("转 DNG 后导入")
- 需要可移植 RAW 的科研 / 文物数字化场景
- Vendor-neutral long-term RAW archive (5-10 years of photographer footage)
- iPhone ProRAW (Apple's official RAW option since 2020)
- Native output from Pentax / Leica / Hasselblad
- Lightroom's default import option ("convert to DNG on import")
- Research / cultural-heritage digitisation needing portable RAW
反适用
AVOID
- Canon / Nikon / Sony 主流相机原生输出(没有,只能事后转)
- 当前流水线已绑定厂商 SDK 的工作流(转换增加风险)
- 体积极敏感场景(DNG 通常比原厂 RAW 大 5-10%)
- 普通终端用户分享(用 JPEG / HEIC)
- Native output from mainstream Canon / Nikon / Sony bodies (none — can only convert)
- Workflows already bound to vendor SDKs (conversion adds risk)
- Strictly size-sensitive scenarios (DNG is typically 5-10 % larger than the original RAW)
- Sharing with regular end users (use JPEG / HEIC)
| scope | tools | libraries | CLI |
|---|---|---|---|
| DNG (.dng) | ✓✓ Adobe Camera Raw · Lightroom · Capture One · darktable · RawTherapee · Apple 系统(iPhone ProRAW 原生) | LibRaw(读)· Adobe DNG SDK(读 / 写)· libtiff(读基础结构) | Adobe DNG Converter(GUI + CLI)· dnglab(开源 RAW → DNG)· exiftool |
CR3 / NEF / ARW — 主流厂商的 RAW
CR3 / NEF / ARW — the big-three vendor RAWs
"三家相机巨头各做一套,都不兼容,都活得很好。"
"Three camera giants, three formats, none compatible — and all thriving."
Canon / Nikon / Sony 三家占数码相机市场 80% 以上,各家拥有完整的 DSLR / 无反 + 镜头生态(EF / RF / F / Z / E / FE 卡口等),RAW 格式是其专有生态的最后一环——锁定到自家 RAW 意味着用户后期工作流也跟着锁定(用 Canon DPP / Nikon NX Studio / Sony Imaging Edge 时体验最完整,跨家就得依赖 LibRaw 或商业第三方)。Canon 2018 把 CR2 升级 CR3,容器从 TIFF 换成 ISOBMFF(同 HEIF / MP4 spec 族)——为的是跟 HEIF 工具链共享 box 解析器,顺便能在 RAW 文件里塞 HEIF 缩略图、HEVC 视频片段、AAC 音频(给"双重曝光"和短视频功能用)。Nikon NEF 一直是 TIFF base,从 1999 年 D1 到现在 Z 系列没换。Sony ARW 也是 TIFF base,但有臭名昭著的"有损 RAW"模式——早期 α 系列默认输出"压缩 RAW"实际上是有损,被摄影社区批评后才允许选"未压缩"。三家都不公开 RAW spec,LibRaw / dcraw 全靠逆向工程支持。
Canon / Nikon / Sony together hold over 80 % of the digital-camera market, each with a complete DSLR / mirrorless + lens ecosystem (EF / RF / F / Z / E / FE mounts and so on), and the RAW format is the final piece of that proprietary stack — being locked into a vendor's RAW means your post workflow follows (the experience is most complete in Canon DPP / Nikon NX Studio / Sony Imaging Edge; cross-vendor work depends on LibRaw or commercial third parties). Canon upgraded CR2 to CR3 in 2018, swapping the container from TIFF to ISOBMFF (same family as HEIF / MP4) — to share box parsers with the HEIF toolchain and incidentally to embed HEIF thumbnails, HEVC video clips, and AAC audio in the RAW file (for "double-exposure" and short-video features). Nikon NEF has been TIFF-based since the 1999 D1 and the Z series has not changed it. Sony ARW is also TIFF-based, but with the notorious "lossy RAW" mode — early α bodies defaulted to "compressed RAW" that was actually lossy, and only after sustained criticism from the photography community was an "uncompressed" option allowed. None of the three publish RAW specs; LibRaw / dcraw support them entirely through reverse engineering.
技术内核
Technical core
三巨头 RAW 共三条线索。① 容器:CR3 是 ISOBMFF,CR2 / NEF / ARW 是 TIFF 系。Canon 2018 把 CR2 升级 CR3 时换了容器,目的就是跟现代 ISOBMFF 生态(HEIF / MP4 / AVIF / JPEG XL)对齐,顺便能在一个 .CR3 里塞 RAW + JPEG preview + HEVC 视频片段 + AAC 音频(给"双重曝光"和短视频功能用)。Nikon NEF 和 Sony ARW 还是传统 TIFF base——文件开头 TIFF header,接 IFD chain,每个 IFD 装一张图(thumbnail / preview JPEG / 真正 RAW),Sony 还在 IFD 里加私有 SR2 sub-IFD 装额外 metadata。② 各家私有有损 RAW 压缩。Canon 有 CRaw(visually lossless,体积砍 30-40%);Nikon 有 NEF Compressed(实际是把 14-bit raw 用一个查找表压成 12-bit 等价精度,有损但视觉无损);Sony 早期默认就是有损"压缩 RAW"(被批评后允许选"未压缩")。这些有损模式都是闭源算法,LibRaw 逆向支持但有时跟厂商官方解码结果略有偏差。③ "有损 RAW"概念的兴起。原本 RAW 的精神就是"无损保留 sensor 数据",但 14-bit 有损压缩(类似 Lossy DNG)能砍体积 50-70%、视觉几乎无损,对存储敏感的场景(连拍 / 4K 视频拍摄间隙拍照)很有吸引力。Canon CRaw / Sony Compressed RAW / Nikon NEF Compressed 都属于这类——长远看 RAW 文件正在向 "有损但视觉无损" 滑动,这跟 JPEG XL / HEIC 的设计哲学不谋而合。
Three threads connect the big-three RAWs. ① Container: CR3 is ISOBMFF, CR2 / NEF / ARW are TIFF-family. When Canon upgraded CR2 to CR3 in 2018 it swapped the container — the goal was to align with the modern ISOBMFF ecosystem (HEIF / MP4 / AVIF / JPEG XL) and incidentally to pack RAW + JPEG preview + HEVC video clips + AAC audio into one .CR3 (used by "double-exposure" and short-video features). Nikon NEF and Sony ARW remain traditional TIFF-based — the file opens with a TIFF header, then an IFD chain, each IFD holding one image (thumbnail / preview JPEG / actual RAW); Sony additionally puts a private SR2 sub-IFD inside the IFD to carry extra metadata. ② Each vendor's private lossy RAW compression. Canon offers CRaw (visually lossless, 30-40 % size reduction); Nikon offers NEF Compressed (effectively a lookup-table that compresses 14-bit raw to a 12-bit-equivalent precision, lossy but visually lossless); Sony's early default was a lossy "compressed RAW" (after criticism, an "uncompressed" option was added). These lossy modes are closed-source algorithms — LibRaw supports them via reverse engineering but its decoder occasionally diverges slightly from the vendor's. ③ The rise of "lossy RAW". RAW's original spirit is "preserve sensor data losslessly," but 14-bit lossy compression (similar to Lossy DNG) cuts size by 50-70 % with virtually no visible loss — attractive in storage-sensitive scenarios (burst shooting, stills between 4K video clips). Canon CRaw / Sony Compressed RAW / Nikon NEF Compressed all belong here. Long-term, RAW files are sliding toward "lossy but visually lossless" — coincidentally the same design philosophy as JPEG XL / HEIC.
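Thread ① can be demonstrated with a handful of bytes: TIFF-family RAWs open with the II/MM byte-order mark and magic 42, while CR3 opens with an ISOBMFF ftyp box whose major brand is reported as 'crx ' in published reverse-engineering notes (treat that brand string as an assumption). A small container sniffer, exercised on synthetic byte strings:

```python
import struct

def sniff_container(buf):
    """Classify a camera-raw file by container family (not by vendor decoding)."""
    if len(buf) >= 4 and buf[:2] in (b"II", b"MM"):
        order = "<" if buf[:2] == b"II" else ">"
        if struct.unpack(order + "H", buf[2:4])[0] == 42:   # TIFF magic
            return "TIFF-family (NEF / ARW / CR2 / DNG ...)"
    if len(buf) >= 12 and buf[4:8] == b"ftyp":              # ISOBMFF: size + 'ftyp' + brand
        brand = buf[8:12].decode("ascii", "replace")
        return f"ISOBMFF (ftyp brand {brand!r})"            # CR3 is said to use 'crx '
    return "unknown"

print(sniff_container(b"II" + struct.pack("<HI", 42, 8)))
print(sniff_container(struct.pack(">I", 24) + b"ftypcrx " + b"\0" * 8))
```

Sniffing tells you which parser family to hand the file to; actually decoding the pixels still needs LibRaw or the vendor SDK.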
适用
USE FOR
- 各家相机的原生输出(谁拍用谁的 RAW · 没第二选择)
- 用厂商官方软件做后期(Canon DPP / Nikon NX Studio / Sony Imaging Edge)
- 需要厂商完整 metadata 的场景(镜头校正 / 自动 WB 微调)
- 跟 HEIF / MP4 工具链协同(CR3 ISOBMFF 容器友好)
- Native output from each vendor's cameras (whoever you shoot with, that's your RAW — no choice)
- Post in the vendor's official software (Canon DPP / Nikon NX Studio / Sony Imaging Edge)
- Scenarios needing the vendor's complete metadata (lens correction, auto-WB fine-tuning)
- Pipelines aligned with the HEIF / MP4 toolchain (CR3's ISOBMFF container fits naturally)
反适用
AVOID
- 跨厂商 / 跨工具长期归档(转 DNG 更稳)
- 需要公开 spec 的科研归档(三家都不公开)
- 极度敏感的 bit-exact 比对(LibRaw 解出的结果跟厂商 SDK 可能略有偏差)
- 移动端 / Web 直接显示
- Cross-vendor / cross-tool long-term archiving (DNG is more reliable)
- Scientific archives needing public specs (none of the three publish)
- Highly bit-exact comparisons (LibRaw decodes can deviate slightly from vendor SDKs)
- Direct display on mobile / web
| scope | vendor | libraries | CLI |
|---|---|---|---|
| CR3 / NEF / ARW + 各家私有压缩 | Canon DPP · Nikon NX Studio · Sony Imaging Edge · Adobe Camera Raw / Lightroom · Capture One | LibRaw(逆向)· Canon EDSDK / Nikon SDK / Sony SDK(闭源 / 需申请) | libraw_unprocessed_raw · dcraw -D(输出原始 sensor 数据)· exiftool |
DICOM — 医学影像的封闭城堡 · 扛把子
DICOM — the walled city of medical imaging · heavy hitter
"它不是图片格式,是带 4000 个字段的医疗记录。"
"Not just an image format — a medical record with 4,000 attributes."
1980 年代,医院里 CT、MRI、X-ray、超声各家厂商各做一套协议:GE 的 CT 出不来 Siemens MRI 能读的文件,科室之间没法交换数据,医生想做一次跨设备影像会诊基本不可能。ACR(美国放射学院)与 NEMA(美国电气制造商协会)1985 年合作发布 ACR-NEMA 1.0,1993 年改名 DICOM 3.0 并加入网络协议。DICOM 同时定义了三件事:(a) 文件格式——一个 .dcm 既是图像也是患者完整病历;(b) 网络协议 DIMSE——医院里 CT 跟 PACS 之间的传输怎么走;(c) 元数据字典——4000+ 标准 tag 涵盖患者姓名、研究日期、modality、像素数据、窗宽窗位等任何医疗影像可能需要的字段。这套体系后来成了医院 IT 的事实标准——全球任何 CT / MRI / 超声 / 病理切片设备出厂时都说 DICOM,任何 PACS / EHR / 工作站默认输入也是 DICOM。30 年没人能挑战,因为它解决的不是"压像素",而是整个医疗影像的协议栈。
In the 1980s, hospital CT / MRI / X-ray / ultrasound vendors each defined their own protocols: a GE CT image couldn't be opened by a Siemens MRI station, departments couldn't exchange data, and a multi-modality consult was effectively impossible. ACR (American College of Radiology) and NEMA (National Electrical Manufacturers Association) jointly released ACR-NEMA 1.0 in 1985, then renamed it DICOM 3.0 in 1993 and added a network protocol. DICOM defines three things at once: (a) a file format — one .dcm is simultaneously an image and a complete patient record; (b) a network protocol, DIMSE — how images move between a CT scanner and a PACS server inside a hospital; (c) a metadata dictionary — 4,000+ standard tags covering patient name, study date, modality, pixel data, window width / level, and every medical-imaging attribute imaginable. The whole stack became the de-facto standard of hospital IT: every CT / MRI / ultrasound / pathology-slide device on Earth speaks DICOM out of the box, every PACS / EHR / workstation reads DICOM by default. 30 years later it remains unchallenged — because what it solved isn't "compressing pixels" but the entire medical-imaging protocol stack.
一个 .dcm 文件以 128 字节 preamble 开头,紧接 4 字节 magic 'DICM'。然后是 File Meta Information(group 0x0002 的 tag,核心是 transfer syntax UID,告诉解码器像素是 JPEG / JPEG-LS / 无压缩还是别的)。最后是 DataSet body——一长串 DataElement,每个是一个 (group, element) tag + 数据,既包含医疗 metadata(患者名 / modality / 窗宽窗位)也包含 (7FE0, 0010) PixelData——真正的图像。一份 .dcm 同时是图像 + 病历 + 设备信息 + 工作流上下文。
A .dcm file opens with a 128-byte preamble followed by the four-byte magic 'DICM'. Next is the File Meta Information (tags in group 0x0002 — most importantly the transfer-syntax UID, telling the decoder whether pixels are JPEG / JPEG-LS / uncompressed or something else). Last is the DataSet body — a long sequence of DataElements, each a (group, element) tag plus data, mixing medical metadata (patient name / modality / window width / level) and the (7FE0, 0010) PixelData — the actual image. One .dcm is image + record + device info + workflow context all at once.
技术内核
Technical core
DICOM 体系庞大但有六根支柱。① DataSet = 一组 DataElement,每个 DataElement 由 4 字段组成:Tag(group, element)+ VR(Value Representation,数据类型,如 PN=PersonName / DA=Date / US=UnsignedShort / OB=OtherByte)+ Length + Value。DataSet 可以嵌套(SQ 类型 = Sequence)。② 4000+ 标准 tag 由 DICOM 数据字典维护:(0010, 0010)=PatientName、(0008, 0060)=Modality、(0008, 0020)=StudyDate、(0028, 0010)=Rows、(0028, 1050)=WindowCenter、(7FE0, 0010)=PixelData…奇数 group 留给私有扩展(各厂商私有 tag 的栖息地)。③ Transfer Syntax UID 决定像素压缩方式——这是 DICOM 最关键的"开关":1.2.840.10008.1.2(无压缩, implicit VR)、.1.2.1(无压缩 explicit VR,最常见)、.1.2.4.50(JPEG baseline)、.1.2.4.80(JPEG-LS lossless,CT/MRI 默认)、.1.2.4.91(JPEG 2000 lossy)、.1.2.5(RLE)、.1.2.4.107(HEVC main profile,新)等几十种。同一份 .dcm 可以"换 transfer syntax"重新压缩,但 metadata 完全保留。④ Multi-frame:CT 和 MRI 一次扫描会出几十到几百张切片,DICOM 既支持每张切片一个 .dcm 文件(典型用法),也支持一个文件多帧(类似 GIF 多帧)——后者方便长 cine 序列。⑤ Window / Level metadata:CT 是 12-bit 灰度数据(范围 -1024~3071 Hounsfield Units),但屏幕只能显示 8-bit。DICOM 在 metadata 里存窗宽(WW)+ 窗位(WL)——告诉显示器"把哪段 12-bit 范围映射到 8-bit 灰度"。同一张 CT,医生可以切到"骨窗"(WW=2000, WL=300)看骨折,切到"软组织窗"(WW=400, WL=40)看肿瘤,切到"肺窗"(WW=1500, WL=-600)看肺纹理——一张图三种用途。⑥ DICOMweb(WADO-RS / STOW-RS / QIDO-RS):2010s 后基于 HTTP REST 的现代接口,正在逐步替代 1980s 设计的 DIMSE TCP 协议——本质上是把 DICOM 网络层从 OSI 7 层改造成 HTTP 友好版,方便跟现代云原生 PACS / Web 浏览器集成。
DICOM is sprawling but rests on six pillars. ① DataSet = a list of DataElements; each DataElement has four fields: Tag (group, element) + VR (Value Representation — the type, e.g. PN=PersonName / DA=Date / US=UnsignedShort / OB=OtherByte) + Length + Value. DataSets can nest (the SQ / Sequence type). ② 4,000+ standard tags, maintained by the DICOM Data Dictionary: (0010, 0010)=PatientName, (0008, 0060)=Modality, (0008, 0020)=StudyDate, (0028, 0010)=Rows, (0028, 1050)=WindowCenter, (7FE0, 0010)=PixelData… Odd-numbered groups are reserved for private extensions (where vendor-private tags live). ③ Transfer Syntax UID decides pixel compression — DICOM's most important switch: 1.2.840.10008.1.2 (uncompressed, implicit VR), .1.2.1 (uncompressed, explicit VR, most common), .1.2.4.50 (JPEG baseline), .1.2.4.80 (JPEG-LS lossless, the CT/MRI default), .1.2.4.91 (JPEG 2000 lossy), .1.2.5 (RLE), .1.2.4.107 (HEVC main profile, new), and dozens more. The same .dcm can be "transcoded to a new transfer syntax" — the metadata survives untouched. ④ Multi-frame: a CT or MRI scan produces tens to hundreds of slices; DICOM supports either one file per slice (the typical layout) or one file holding many frames (like a multi-frame GIF) — the latter is handy for long cine sequences. ⑤ Window / Level metadata: CT data is 12-bit greyscale (range −1024 to 3071 Hounsfield Units) but a display only shows 8 bits. DICOM stores window width (WW) + window level (WL) in metadata — telling the viewer "map this slice of the 12-bit range to 8-bit greys." A single CT can be re-windowed: "bone window" (WW=2000, WL=300) for fractures, "soft-tissue window" (WW=400, WL=40) for tumours, "lung window" (WW=1500, WL=−600) for lung markings — one image, three purposes. 
⑥ DICOMweb (WADO-RS / STOW-RS / QIDO-RS): a post-2010 HTTP-REST modernisation that is steadily replacing the 1980s-era DIMSE TCP protocol — fundamentally re-architecting DICOM's network layer from OSI-7 into something HTTP-friendly, so cloud-native PACS and web browsers can integrate cleanly.
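Pillar ① is small enough to parse by hand. A sketch for Explicit VR Little Endian, restricted to short-form VRs (long-form VRs such as OB / OW / SQ use 2 reserved bytes plus a 4-byte length and are out of scope here); real code should use pydicom or DCMTK:

```python
import struct

def parse_elements(buf):
    """Parse DICOM DataElements (Explicit VR Little Endian, short-form VRs only).
    Each element: tag = (group, element) as two LE uint16, then a 2-char VR,
    then a uint16 length, then the value bytes."""
    pos, out = 0, {}
    while pos < len(buf):
        group, elem = struct.unpack("<HH", buf[pos:pos + 4])
        vr = buf[pos + 4:pos + 6].decode("ascii")
        (length,) = struct.unpack("<H", buf[pos + 6:pos + 8])
        value = buf[pos + 8:pos + 8 + length]
        out[(group, elem)] = (vr, value)
        pos += 8 + length
    return out

# Hand-built two-element dataset: Modality (0008,0060) CS + PatientName (0010,0010) PN.
data = (struct.pack("<HH", 0x0008, 0x0060) + b"CS" + struct.pack("<H", 2) + b"CT"
      + struct.pack("<HH", 0x0010, 0x0010) + b"PN" + struct.pack("<H", 8) + b"DOE^JOHN")
elems = parse_elements(data)
print(elems[(0x0008, 0x0060)])  # ('CS', b'CT')
print(elems[(0x0010, 0x0010)])  # ('PN', b'DOE^JOHN')
```

The same (group, element) + VR + length + value framing carries everything from patient names to the multi-megabyte PixelData — which is why one dictionary and one parser cover the whole format.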
1.2.840.10008 是 DICOM 的 OID 根命名空间,后缀决定算法。CT / MRI 实际部署里 JPEG-LS lossless(.4.80) 是默认——因为医疗影像必须无损,而 JPEG-LS 对 12-16 bit 灰度高效。JPEG 2000 lossy 主要在科研和非诊断场景。HEVC(.4.107)是 2017 年加入的新选项,主要给超声 cine 和 4D 数据用。同一份 .dcm 可以离线 transcode 换 transfer syntax,metadata 不动,只换像素压缩——这是 DICOM 灵活性的关键。
1.2.840.10008 is DICOM's OID root namespace; the suffix selects an algorithm. In real CT / MRI deployments JPEG-LS lossless (.4.80) is the default — medical imagery must be lossless and JPEG-LS handles 12-16 bit greyscale efficiently. Lossy JPEG 2000 is mostly for research and non-diagnostic uses. HEVC (.4.107) was added in 2017 mainly for ultrasound cine loops and 4D data. The same .dcm can be transcoded offline to a different transfer syntax — metadata stays untouched, only the pixel compression changes — which is the heart of DICOM's flexibility.
图 33 · DICOM 端到端医疗 IT 流水线。左:CT 设备扫描出 12-bit Hounsfield,经 reconstruction 后写成 N 张 .dcm 切片(每张带 4000 tag,默认 JPEG-LS lossless),通过 DIMSE 的 C-STORE 命令推到医院 PACS。中:PACS 服务器(Orthanc / dcm4che)按"Patient → Study → Series → Instance"四级层次索引,对外提供 DICOMweb HTTP REST API——QIDO-RS 按 tag 搜、WADO-RS 取像素、STOW-RS 上传。右:三条下游路径——① 放射科医生工作站(OsiriX / Horos / RadiAnt)应用窗宽窗位 + MPR + 3D 重建,出诊断报告;② AI 模型(CheXpert / nnU-Net / MONAI)读 .dcm pixel + metadata 出 segmentation;③ EHR(Epic / Cerner)用 FHIR ImagingStudy 资源把影像挂到患者档案上。整个医院 IT 体系的"像素 + 协议 + 字典"层全是 DICOM——30 年没人能挑战。
Fig 33 · The end-to-end DICOM medical-IT pipeline. Left: a CT scanner produces 12-bit Hounsfield data, runs reconstruction, writes N .dcm slices (each carrying 4,000 tags, default JPEG-LS lossless), and pushes them to the hospital PACS via the DIMSE C-STORE command. Middle: the PACS server (Orthanc / dcm4che) indexes data along the four-level "Patient → Study → Series → Instance" hierarchy and exposes a DICOMweb HTTP REST API — QIDO-RS searches by tag, WADO-RS retrieves pixels, STOW-RS uploads. Right: three downstream consumers — ① the radiologist workstation (OsiriX / Horos / RadiAnt) applies window/level + MPR + 3D rendering and produces a written report; ② AI models (CheXpert / nnU-Net / MONAI) read .dcm pixel + metadata to output segmentations; ③ the EHR (Epic / Cerner) uses the FHIR ImagingStudy resource to attach the imaging to a patient record. The entire hospital-IT stack — pixel layer, protocol layer, dictionary layer — runs on DICOM. 30 years and no one has displaced it.
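The window/level remapping the workstation applies (pillar ⑤) is plain linear arithmetic. A simplified sketch — DICOM's official VOI LUT linear function adds half-pixel edge conditions omitted here:

```python
def apply_window(hu, ww, wl):
    """Map a Hounsfield value to a 0-255 display grey with a linear window.
    Simplified form of DICOM's linear VOI LUT: values below the window floor
    clamp to black, above the ceiling to white, linear in between."""
    lo = wl - ww / 2.0                       # window floor in HU
    x = (hu - lo) / ww * 255.0               # linear ramp across the window
    return int(min(255, max(0, round(x))))   # clamp to displayable range

# The same voxel (+60 HU, typical soft tissue) under three standard windows:
for name, ww, wl in [("bone", 2000, 300), ("soft-tissue", 400, 40), ("lung", 1500, -600)]:
    print(name, apply_window(60, ww, wl))    # bone 97 · soft-tissue 140 · lung 240
```

One stored slice, three renderings — the pixel data never changes, only the (WW, WL) pair in metadata.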
| transfer syntax UID | codec | lossy? | typical use |
|---|---|---|---|
| 1.2.840.10008.1.2 | uncompressed (implicit VR) | — | small images / legacy |
| 1.2.840.10008.1.2.1 | uncompressed (explicit VR) | — | most common · default |
| 1.2.840.10008.1.2.4.50 | JPEG baseline | lossy | low-priority / preview |
| 1.2.840.10008.1.2.4.80 | JPEG-LS lossless | lossless | CT / MRI default |
| 1.2.840.10008.1.2.4.91 | JPEG 2000 lossy | lossy | research / non-diagnostic |
| 1.2.840.10008.1.2.5 | RLE | lossless | simple / integer |
| 1.2.840.10008.1.2.4.107 | HEVC main profile | lossy | ultrasound cine / 4D (new) |
$ dcmdump in.dcm # DCMTK: 看所有 tag + value
$ dcm2pnm in.dcm out.pnm # DICOM → PNM (应用 W/L)
$ dcmconv -ti in.dcm out.dcm # 改 transfer syntax(转码)
$ dcmodify -ea "(0010,0010)" in.dcm # 删除 PatientName tag(匿名化)
$ python -c "import pydicom; print(pydicom.dcmread('in.dcm'))" # pydicom 读 .dcm
$ orthanc-cli upload http://pacs:8042/ in.dcm # 上传到 Orthanc PACS
$ curl http://pacs/dicom-web/studies # DICOMweb QIDO-RS 查 studies
$ TotalSegmentator -i ct.dcm -o seg/ # AI 分割: 100+ 解剖结构
适用
USE FOR
- 医学影像所有 modality(CT / MRI / X-ray / 超声 / 病理切片 / 核医学 / 心电)
- 医院 PACS / EHR 集成(没第二选择)
- 医学 AI 模型训练 / 推理(DICOM 是事实输入格式)
- 跨设备 / 跨医院影像交换(必须遵循)
- 公开医学影像数据集发布(MIMIC-CXR / BraTS / RSNA / NIH)
- 临床 GxP / HIPAA / GDPR 合规归档
- Every medical-imaging modality (CT / MRI / X-ray / ultrasound / pathology slides / nuclear medicine / ECG)
- Hospital PACS / EHR integration (no alternative)
- Medical-AI training / inference (DICOM is the de-facto input format)
- Cross-device / cross-hospital image exchange (mandatory)
- Releasing public medical-imaging datasets (MIMIC-CXR / BraTS / RSNA / NIH)
- Clinical GxP / HIPAA / GDPR compliant archival
反适用
AVOID
- 任何非医疗场景(几乎是定义)
- 消费级 / Web / 手机端图片(没浏览器支持)
- 纯科研非临床(NIfTI / NRRD / Zarr 更轻)
- 艺术 / 摄影 / 设计(用错赛道)
- 需要小文件 / 低复杂度的场景(DICOM 元数据开销大)
- Any non-medical scenario (almost by definition)
- Consumer / web / mobile imagery (no browser support)
- Pure-research non-clinical work (NIfTI / NRRD / Zarr are lighter)
- Art / photography / design (wrong lane entirely)
- Scenarios needing tiny files / low complexity (DICOM metadata overhead is heavy)
| scope | commercial | open source | CLI / lib |
|---|---|---|---|
| DICOM 文件 + DIMSE + DICOMweb | ✓✓✓ GE Centricity · Siemens syngo · Philips IntelliSpace · Epic Radiant · Sectra · Agfa IMPAX · Carestream | ✓✓ Orthanc · dcm4che · DCMTK · OHIF Viewer · Cornerstone.js · OsiriX Lite · Horos · Weasis · 3D Slicer · pydicom · MONAI | dcmdump · dcm2pnm · dcmconv · dcmodify · pydicom · SimpleITK · orthanc-cli · dcm4che-tools |
SVG — 不是位图,但 web 里就是图
SVG — not a bitmap, but on the web it just is the image
"不存像素,存数学。屏幕多大,它就多清晰。"
"Stores math, not pixels — sharp at any size."
1990 年代末,W3C 想要一个"web 上的矢量"——能在浏览器里直接渲染、能跟 HTML / CSS / JS 共存的开放格式。当时的对手是 Macromedia 的私有矢量动画 Flash(2005 年被 Adobe 收购),以及微软推的 VML(Vector Markup Language)。1999 年 W3C 启动 SVG WG,2001 年发布 SVG 1.0 Recommendation。SVG 的核心是 XML + 矢量数学:一份 .svg 文件就是一棵 DOM 树,根 <svg> 下挂着 <rect> / <circle> / <path> / <text> 等几何元素,辅以 <linearGradient> / <filter> 等装饰。整张图被嵌入到 HTML 的 DOM 里,可被 CSS 染色、被 JS 操控、被屏幕阅读器朗读。最关键的:它不是被栅格化后才渲染——浏览器在屏幕分辨率上重新计算每条 path,所以它在 1×、2×、3× DPR 上都同等清晰。这是位图永远做不到的事。最终,SVG 战胜 VML(微软 2010 起放弃),又熬过了 Flash——2010 年 Apple 以安全和性能为由把 Flash 挡在 iOS 之外,Flash 自此走向衰落,2020 年底 Adobe 正式停止支持,SVG 成为 web 矢量的唯一标准。
In the late 1990s, W3C wanted a "web-native vector" — an open format that could be rendered in the browser and live alongside HTML / CSS / JS. The contenders of the day were Flash (Macromedia's proprietary vector-animation format, acquired by Adobe in 2005) and Microsoft's VML (Vector Markup Language). W3C started the SVG WG in 1999 and shipped SVG 1.0 Recommendation in 2001. SVG's core is XML + vector math: an .svg file is a DOM tree — a root <svg> with <rect> / <circle> / <path> / <text> geometry inside, decorated by <linearGradient> / <filter> and friends. The whole image lives inside HTML's DOM — colourable by CSS, scriptable by JS, readable by screen readers. Most crucially: SVG is not rasterised first and rendered second — the browser recomputes every path at the screen's true resolution, so it stays equally sharp at 1×, 2×, 3× DPR. That is something a bitmap can never do. SVG eventually defeated VML (Microsoft abandoned it after 2010) and outlived Flash (its decline began in 2010 when Apple barred it from iOS over security and performance concerns; Adobe formally retired it at the end of 2020), leaving SVG as the web's sole vector standard.
d 属性是一段命令字符串:M 移笔(move,起点)/ L 直线(line)/ C 三次贝塞尔(cubic)/ Q 二次贝塞尔(quadratic)/ A 椭圆弧(arc)/ Z 闭合(close)。一条 path 就是一串这种命令拼起来的轨迹,引擎按命令顺序"画"一遍。所有矢量字体、Adobe Illustrator 输出、绝大多数图标 SVG 都是这种 path —— 圆和矩形只是它的语法糖。
d attribute is a command string: M move (start point) / L line / C cubic Bézier / Q quadratic Bézier / A elliptical arc / Z close. A path is just that command string strung together; the engine "draws" it in order. All vector fonts, Adobe Illustrator output and the vast majority of icon SVGs are paths like this — <rect> and <circle> are merely sugar.
同一份 SVG(viewBox="0 0 40 40")在 CSS 上分别按 40 / 60 / 80 px 渲染,内部的圆不被预栅格化再放大,而是浏览器在当前屏幕分辨率下重新解一遍 path/circle 的几何方程。这就是矢量"无限清晰"的本质 —— 不是"图变大",而是"图被重新画了一遍"。
The same SVG (viewBox="0 0 40 40") is rendered at 40 / 60 / 80 px in CSS; the inner circle is not pre-rasterised then scaled — the browser re-solves the geometric equation of path / circle at the current screen resolution. That is the essence of vector "infinite sharpness" — not "the picture got bigger," but "the picture was redrawn."
feGaussianBlur 高斯模糊;feColorMatrix 颜色矩阵(等价于 LUT,可做去色 / 偏色 / 反相);feOffset 像素偏移(常用作 drop shadow 第一步);feMerge 把若干层合并(把 offset+blur 跟原图叠成投影)。SVG filter 是一条链,跟 Photoshop 的滤镜栈同源 —— 实际上 Photoshop / Sketch / Figma 的"投影 / 内阴影 / 模糊"导出 SVG 时就是翻译成这几个 primitive。
feGaussianBlur for blur; feColorMatrix (an LUT — desaturate, tint, invert); feOffset for pixel translation (the first step of a drop shadow); feMerge to stack outputs (combine offset+blur with the source for a shadow). An SVG filter is a chain — the same lineage as Photoshop's filter stack — and indeed Photoshop / Sketch / Figma translate "drop shadow / inner shadow / blur" into exactly these primitives when exporting SVG.
技术内核
Technical core
SVG 的工程内核可分六块。① XML 文档——不是二进制,是文本,因此可被 grep / diff / git blame / sed / 任何文本工具处理,这一点跟 PNG / JPEG 完全相反。优点是版本管理友好、可程序生成、可手写;缺点是大体积场景(几十万节点的复杂可视化)解析慢、内存大。② shapes + path——基本几何元素 <rect> / <circle> / <ellipse> / <line> / <polyline> / <polygon>,加最强的 <path>(命令字符串拼出任意曲线 — M/L/H/V 直线类、C/S/Q/T 贝塞尔、A 椭圆弧、Z 闭合)。所有矢量字体、所有 Illustrator 输出本质都是 path。③ 装饰 = gradient + pattern + filter——<linearGradient> / <radialGradient> 渐变;<pattern> 平铺纹理;<filter> 是滤镜链,提供 feGaussianBlur(模糊)/ feColorMatrix(LUT)/ feOffset(偏移)/ feMerge(合并)/ feComposite(合成)/ feTurbulence(柏林噪声)/ feMorphology(膨胀腐蚀)等 20+ primitive,串联起来等价于 Photoshop 滤镜栈,Sketch / Figma 的"投影"导出 SVG 就是 feOffset+feGaussianBlur+feMerge 三件套。④ CSS 染色 + class——SVG 元素接受 fill / stroke / opacity / transform 等表现属性,也接受 CSS。一个图标 SVG 在不同 dark/light theme 下只需切换 CSS 变量,不必重新导出;currentColor 关键字让 fill 跟随父元素文字颜色,这是图标库(Heroicons / Lucide / Phosphor)的核心机制。⑤ JS 操控——每个 SVG 元素都是 DOM Node,document.querySelector('circle').setAttribute('cx', 100) 直接生效。这是 D3.js / Observable Plot / Chart.js / Recharts 这一整代数据可视化库的根基 —— 它们的真正能力不是"画 SVG",而是"把数据 join 到 SVG DOM 元素上,让 SVG DOM 跟随数据更新"。⑥ 动画三条路:(a) SMIL(Synchronized Multimedia Integration Language)在 SVG 1.0 时定的 <animate> / <animateTransform> / <animateMotion>,声明式但被 Chrome 一度想废弃,现在保留但不推荐;(b) CSS animation + transform / opacity,现代主流,跟 HTML 一致;(c) JS / requestAnimationFrame,最灵活,D3 / GSAP / anime.js 都用。在它们之上,Lottie(2017,Airbnb 的 Bodymovin AE 插件 → JSON,JS lib 渲染)是矢量动画的现代补充 —— 设计师在 After Effects 里做动画,导成 JSON,Lottie lib 在浏览器 / iOS / Android 上以 SVG 或 Canvas 渲染。底层渲染路径仍然是 SVG / Canvas 的几何指令。
SVG's engineering core breaks into six pieces. ① XML document — text, not binary, so it works with grep / diff / git blame / sed / any text tool — the polar opposite of PNG / JPEG. The upside is version-control friendliness, scriptability, hand-authorability; the downside is that giant scenes (a viz with 100k nodes) parse slowly and bloat memory. ② Shapes + path — primitive geometry <rect> / <circle> / <ellipse> / <line> / <polyline> / <polygon>, plus the killer <path> (a command string composing arbitrary curves — M/L/H/V for straight, C/S/Q/T for Béziers, A for elliptical arcs, Z to close). Every vector font and every Illustrator export is essentially a path. ③ Decoration = gradient + pattern + filter — <linearGradient> / <radialGradient>; <pattern> for tiles; <filter> is a filter chain with 20+ primitives — feGaussianBlur, feColorMatrix (LUT), feOffset, feMerge, feComposite, feTurbulence (Perlin noise), feMorphology — which, strung together, are equivalent to Photoshop's filter stack. Sketch / Figma's "drop shadow" export is exactly feOffset + feGaussianBlur + feMerge. ④ CSS styling + class — SVG elements accept presentation attributes (fill / stroke / opacity / transform) and also CSS. An icon SVG can switch dark/light theme via a single CSS variable; currentColor lets fill inherit the parent's text colour — the central mechanism behind icon libraries like Heroicons / Lucide / Phosphor. ⑤ JS manipulation — every SVG element is a DOM Node, so document.querySelector('circle').setAttribute('cx', 100) just works. That is the foundation of an entire generation of dataviz libraries — D3.js, Observable Plot, Chart.js, Recharts — whose true superpower isn't "drawing SVG" but "joining data to SVG DOM nodes so the DOM updates with the data."
⑥ Three animation paths: (a) SMIL (Synchronized Multimedia Integration Language) defined the original <animate> / <animateTransform> / <animateMotion> — declarative, briefly threatened with deprecation by Chrome, now retained but not recommended; (b) CSS animation + transform / opacity, the modern mainstream — same as HTML; (c) JS / requestAnimationFrame, the most flexible — D3 / GSAP / anime.js. Layered on top, Lottie (2017, Airbnb's Bodymovin After-Effects plugin → JSON + JS runtime) is the modern vector-animation supplement: designers animate in AE, export to JSON, and Lottie renders in browsers / iOS / Android via SVG or Canvas. The underlying render path is still SVG / Canvas geometric drawing.
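Pieces ①, ②, and ⑤ can be sketched without a browser. A minimal Python illustration using the stdlib's ElementTree (the element and attribute names follow the SVG spec; the build itself is illustrative, not how any real renderer works): construct the DOM tree, then mutate one attribute the way JS would.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # serialise with a default xmlns

# Root <svg> with a viewBox: all child geometry lives in a 40x40 user
# space, re-rasterised by the renderer at whatever CSS size / DPR applies.
svg = ET.Element(f"{{{SVG_NS}}}svg", viewBox="0 0 40 40")

# <circle> is sugar; <path> is the real primitive: M move, L line, Z close.
ET.SubElement(svg, f"{{{SVG_NS}}}circle",
              cx="20", cy="20", r="10", fill="currentColor")
ET.SubElement(svg, f"{{{SVG_NS}}}path",
              d="M 5 5 L 35 5 L 20 30 Z", fill="none", stroke="black")

# "JS manipulation": every element is a tree node, so mutating one
# attribute is all it takes to change the picture - no re-export needed.
svg.find(f"{{{SVG_NS}}}circle").set("cx", "25")

doc = ET.tostring(svg, encoding="unicode")
```

Because the file is plain XML, `doc` can be diffed, grepped, and versioned like any other source file — the property piece ① describes.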
图 36 · SVG 完整处理流程。XML 源被浏览器解析为 SVG DOM 树,CSS 样式与 JS 操控直接作用于 DOM 节点(可热更新),布局阶段按 viewBox + transform 计算几何,可选的 filter chain(由 feOffset / feBlur / feMerge 等 primitive 串联)在栅格化前应用,最后才在当前屏幕 DPR 上栅格化为像素。这条流水线跟 PNG / JPEG 走"解码 → 完整位图 → 缩放采样"完全不同 — SVG 永远在最后一刻、按设备分辨率重画一次,所以无论 1× / 2× / 3× 屏都同等清晰。CSS 染色 / JS 数据可视化 / filter 投影都发生在 DOM 上,不需要重新导出文件 — 这是 D3 / Heroicons / Figma 设计交付能跑起来的工程基础。
Fig 36 · SVG's full processing pipeline. The XML source is parsed by the browser into a live SVG DOM tree; CSS and JS act directly on the DOM (hot-reloadable); layout computes geometry from viewBox + transform; an optional filter chain (composed of feOffset / feBlur / feMerge primitives) runs before rasterisation; only at the very end is the result rasterised at the current screen DPR. This differs fundamentally from PNG / JPEG's "decode → full bitmap → resample on resize" — SVG is redrawn once, at the device's true resolution, in the last moment, so it stays equally sharp on 1× / 2× / 3× displays. CSS theming, JS-driven dataviz, filter shadows — all on the DOM, no re-export needed — that's the engineering foundation under D3, Heroicons and Figma's "design hand-off" workflow.
| feature | SVG | PNG | PDF | Lottie |
|---|---|---|---|---|
| 缩放无损 | ✓ | ✗ | ✓ | ✓ |
| Web 嵌入 | ✓ inline / img | ✓ img | ✗(大多数) | ✓ JS lib |
| 动画 | SMIL / CSS / JS | APNG | 无 | JSON timeline |
| 文本可搜索 | ✓ XML | ✗ | ✓ | partial |
| 体积(图标) | ~1 KB | ~3-10 KB | ~10 KB | ~30 KB |
$ svgo in.svg -o out.svg # 优化 SVG · 删冗余、合并 path
$ inkscape --export-png=out.png in.svg # SVG → PNG · CLI 友好
$ resvg in.svg out.png # Rust SVG 渲染器 · 服务端常用
$ lottie2html in.json out.html # Lottie JSON → HTML/SVG 静态化
$ npx @figma/code-connect svg in.fig # Figma → SVG 导出
适用
USE FOR
- 图标 / logo / UI 装饰(Heroicons / Lucide / Phosphor)
- 数据可视化(D3.js / Observable Plot / Chart.js / Recharts)
- 需要 Retina / 4K 屏天生免疫的任何图
- 需要 CSS 变量切 dark/light theme 的图
- 需要 currentColor 跟随父元素文字颜色的图标
- 简单动画 / loader / 微交互(CSS animate)
- 设计交付 / 跨 DCC(Figma / Sketch / Illustrator 都原生输出)
- Icons / logos / UI decoration (Heroicons / Lucide / Phosphor)
- Data visualization (D3.js / Observable Plot / Chart.js / Recharts)
- Anything that must stay sharp on Retina / 4K displays
- Anything theme-able via CSS variables (dark / light)
- Icons that follow parent text colour via currentColor
- Simple loaders / micro-interactions (CSS animation)
- Design hand-off across DCCs (Figma / Sketch / Illustrator all export SVG)
反适用
AVOID
- 照片(没有压缩比优势 · 体积爆炸)
- 复杂栅格内容(渐变噪点 / 真实纹理 / 模糊)
- 百万节点级 dataviz(DOM 解析 + 重排极慢 · 改用 Canvas / WebGL)
- 外部嵌入需执行 JS 的场景(<img src> 模式 JS 被浏览器禁)
- Photos (no compression edge — files explode)
- Complex raster content (noisy gradients, real textures, blur)
- Million-node dataviz (DOM parse + reflow are slow — use Canvas / WebGL)
- Embeds that need to run JS (browsers disable JS in <img src> mode)
| scope | browsers / runtimes | editors / DCC | CLI |
|---|---|---|---|
| SVG (W3C) | ✓✓ 所有现代浏览器原生 · React / Vue / Svelte 原生 JSX 支持 · iOS / Android (WebView · React Native SVG) · Skia / Cairo / Core Graphics 引擎 | ✓✓ Figma · Sketch · Illustrator · Inkscape · Affinity Designer · Boxy SVG · 所有现代设计工具均原生导出 | svgo · inkscape · resvg · rsvg-convert · imagemagick convert · cairosvg |
PDF — 容器之王
PDF — king of containers
"你以为它是文档,其实是个能装一切的容器 —— 矢量、位图、字体、JS。"
"You think it's a document. It's a container holding everything — vectors, bitmaps, fonts, JavaScript."
1993 年 Adobe Acrobat 1.0 推出 PDF(Portable Document Format),目标是"任何打印机、任何屏幕看到的内容一致"——这件事在 1993 年其实没解决:你在 Mac 上排好的版到 Windows 打印机上字体丢失、布局错位是日常,LaTeX / TeX 那种把布局序列化进文件的思路在工业界没普及。Adobe 创始人 John Warnock 决定把自家的 PostScript(打印机用的页面描述语言)简化、加上随机访问索引,做成一个面向查看与归档的格式 — 这就是 PDF。基于 PostScript 简化,固定页面布局,可嵌字体 + 位图 + 矢量 + JS + 表单。30 年后成为合同、表单、印刷、归档的事实标准,2008 年 PDF 1.7 成为开放 ISO 32000 标准,Adobe 对自家格式失去专有控制权 —— 这个让步反而是 PDF 真正普及的关键。
In 1993 Adobe Acrobat 1.0 launched PDF (Portable Document Format) with a single ambition: "any printer, any screen, the same page." That problem was genuinely unsolved at the time — setting a layout on a Mac and printing it from Windows routinely produced missing fonts and broken pages, while LaTeX / TeX's idea of serialising the layout into the file hadn't reached industry. Adobe co-founder John Warnock chose to simplify his own PostScript (the page-description language inside printers), add a random-access index, and ship it as a viewer / archival format — that is PDF: built on simplified PostScript, with a fixed page layout, able to embed fonts + bitmaps + vectors + JS + forms. Thirty years later it is the de-facto standard for contracts, forms, print, and archival. In 2008 PDF 1.7 became the open ISO 32000 standard and Adobe lost proprietary control of its own format — the concession that finally made PDF universal.
技术内核
Technical core
PDF 的工程内核四件事。① 基于 PostScript 改良的页面描述语言—— 矢量绘图原语(m moveto / l lineto / c curveto / S stroke / f fill),跟 SVG path 命令同源思想。但 PDF 把 PostScript 的"图灵完备 + 解释执行"裁掉了,只保留可渲染的子集,加上 xref 随机访问索引,让 1000 页文件能任意翻页。② 可嵌入字体 + 位图 + 矢量—— 字体支持 Type1 / TrueType / OpenType / CID(CJK 大字符集);图像支持 JPEG / JBIG2(C40·黑白扫描)/ CCITT G4(传真)/ JPEG 2000(C8)/ Flate(zlib)等多种 codec — 整个 PDF 文件本质上是一个容器,实际像素由内嵌的 codec 解码。③ 分页 + 表单 + JS + 数字签名—— Page Tree 支持长文档;AcroForm / XFA 表单(可填写、可提交);Action 对象可绑定 JavaScript(报税表 / 计算字段);Signature 字段配合 PKI 数字签名让 PDF 在合同 / 法律文件场景立足。④ 归档子集 PDF/A —— ISO 19005,2005 起定义,禁用透明 / JS / 外部依赖 / 加密,要求嵌入所有字体 — 是 PDF 的 strict 子集,目的是"30 年后还能打开"。法律 / 政府 / 科研论文归档是 PDF/A 的主战场。
PDF's engineering core is four things. ① Page-description language descended from PostScript — vector primitives (m moveto / l lineto / c curveto / S stroke / f fill), kindred to SVG's path commands. But PDF removed PostScript's "Turing-complete interpreted execution," keeping only the renderable subset and adding the xref random-access table, so jumping around a 1000-page file is fast. ② Embeddable fonts + bitmaps + vectors — fonts: Type1 / TrueType / OpenType / CID (large CJK glyph sets); images: JPEG / JBIG2 (C40 · black-and-white scans) / CCITT G4 (fax) / JPEG 2000 (C8) / Flate (zlib). PDF is fundamentally a container; actual pixels are decoded by the inner codecs. ③ Pagination + forms + JS + digital signatures — Page Tree scales to long docs; AcroForm / XFA forms (fillable, submittable); Action objects bind JavaScript (tax forms with computed cells); Signature fields use PKI to put PDF on solid legal ground for contracts. ④ PDF/A archival subset — ISO 19005, defined from 2005, bans transparency / JS / external dependencies / encryption and mandates embedded fonts — a strict subset designed to "still open in 30 years." Legal, government and scientific-paper archival lives on PDF/A.
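The xref random-access table in ① is easy to see by hand-assembling a minimal PDF skeleton — a sketch of the mechanism, not a production writer (the objects carry no page content, only the structural dictionaries). Each object's byte offset is recorded in a fixed-width table at the end of the file, so a reader seeks straight to object N instead of parsing front-to-back the PostScript way:

```python
# Three structural objects of a one-page PDF: catalog -> pages -> page.
objects = [
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n",
    b"2 0 obj\n<< /Type /Pages /Kids [3 0 R] /Count 1 >>\nendobj\n",
    b"3 0 obj\n<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >>\nendobj\n",
]

header = b"%PDF-1.4\n"
body = b""
offsets = []                       # byte offset of each object in the file
for obj in objects:
    offsets.append(len(header) + len(body))
    body += obj

xref_pos = len(header) + len(body)
xref = b"xref\n0 4\n0000000000 65535 f \n"
for off in offsets:                # one fixed-width 20-byte entry per object
    xref += b"%010d 00000 n \n" % off

trailer = (b"trailer\n<< /Size 4 /Root 1 0 R >>\nstartxref\n"
           + str(xref_pos).encode() + b"\n%%EOF\n")

pdf = header + body + xref + trailer
```

`startxref` points at the table, and the table points at every object — that indirection is the whole difference between "seekable document" and "program to be executed."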
适用
USE FOR
- 合同 / 法律文件(数字签名 + 跨平台一致)
- 印刷 / 排版交付(InDesign 导出 PDF/X 印刷标准)
- 表单(税表 / 申请表 / 可填可提交)
- 长文档归档(PDF/A · 30 年后仍可打开)
- 科研论文 / 学术出版(LaTeX → pdflatex 输出)
- 电子书(固定布局 · 不重排)
- Contracts / legal documents (digital signature + cross-platform consistency)
- Print / typography delivery (InDesign → PDF/X print standard)
- Forms (tax forms, applications — fillable, submittable)
- Long-term archival (PDF/A — still openable in 30 years)
- Scientific papers / academic publishing (LaTeX → pdflatex)
- E-books (fixed layout, no reflow)
反适用
AVOID
- Web 主图(浏览器有原生 viewer 但加载慢 · 用 SVG / image)
- 响应式 / 重排内容(PDF 是固定布局 · 用 EPUB / HTML)
- 移动端阅读体验(放大缩小笨重 · 用 EPUB)
- 需要修改 / 协作的活文档(用 Google Doc / Notion / Office 365)
- Web hero images (browsers have viewers but loading is slow — use SVG / image)
- Responsive / reflowable content (PDF is fixed-layout — use EPUB / HTML)
- Mobile reading (zoom is clumsy — use EPUB)
- Live collaborative documents (use Google Docs / Notion / Office 365)
| scope | viewers | tools | CLI |
|---|---|---|---|
| PDF (ISO 32000) | ✓✓ Adobe Acrobat / Reader · macOS Preview · pdf.js (Mozilla, 浏览器内置) · Foxit · SumatraPDF · Skim | ✓✓ Adobe Acrobat Pro · Affinity Publisher · LibreOffice · LaTeX (pdflatex) · Word / Pages 导出 · InDesign 导出 PDF/X | qpdf · pdftk · pdftoppm · pdfinfo · mutool · ghostscript · pandoc |
EPS — PostScript 的图片化身
EPS — PostScript dressed as an image
"PostScript 加上 BoundingBox,就成了'图片'。"
"PostScript plus a BoundingBox = an 'image'."
1987 年 Adobe 为印刷出版定义 EPS(Encapsulated PostScript)—— 解决一个具体的工程问题:PostScript 1985 起作为打印机页面描述语言,文件本身是整页描述,没有"这张图占多大区域"的概念。但当时的 DTP(桌面出版)排版软件(Aldus PageMaker · QuarkXPress · 后来的 InDesign)需要把插图嵌入文档,要知道图边界做版心 / 文字绕排 / 缩放。Adobe 的解法非常简洁:一个普通 PostScript 文档,加上一行 %%BoundingBox: x1 y1 x2 y2 注释声明图像边框 —— 排版软件读这一行就能知道该图占多少空间,不必真去解 PS。再加 %%BeginPreview / %%EndPreview 嵌入位图预览(给不能渲染 PS 的程序看)。这就是 EPS。一个超低成本的"约定":不修改 PS 语法本身,只用注释扩展。这个格式撑起 90 年代到 2000 年代的全部印刷设计与 LaTeX 论文图表,2010 年代后被 PDF 完全替代 —— 因为 PDF 同样能做这件事,而且不需要"约定",直接是标准。
In 1987 Adobe defined EPS (Encapsulated PostScript) for the print-publishing industry — solving a concrete engineering problem. PostScript, from 1985, was a printer page-description language; a file described an entire page, with no notion of "how much space this illustration takes." But DTP applications (Aldus PageMaker / QuarkXPress / later InDesign) needed to embed illustrations inside documents, with a known bounding box for layout, text wrap, and scaling. Adobe's fix was elegantly minimal: a regular PostScript document plus one comment line — %%BoundingBox: x1 y1 x2 y2 — declaring the image's frame. The DTP app reads that line to know the size, without ever interpreting the PS itself. Add %%BeginPreview / %%EndPreview to embed a bitmap preview (for apps that can't render PS), and you have EPS — a near-zero-cost "convention" that extends PS via comments rather than syntax. The format carried virtually all print design and LaTeX paper figures through the 1990s and 2000s, and was wholly replaced by PDF after the 2010s — because PDF does the same thing without needing a convention; it's just the standard.
EPS 文件结构:① 头部(%!PS-Adobe-3.0 EPSF-3.0 标识 + 一些 DSC structuring comments);② %%BoundingBox: x1 y1 x2 y2(图像边框,DTP 软件读这一行决定版心);③ 可选的位图预览(给不能渲染 PS 的旧程序看);④ 真正的 PostScript 绘图代码(m / l / c / S / f 等命令)。整个文件是合法的 PostScript,可被 GhostScript 直接解释 —— EPS 的"图片化"完全靠 BoundingBox 这一行注释,不修改 PS 语法本身。
EPS file structure: ① header (%!PS-Adobe-3.0 EPSF-3.0 marker + DSC structuring comments); ② %%BoundingBox: x1 y1 x2 y2 (the bounding box DTP apps read for layout); ③ optional bitmap preview (for legacy apps that can't render PS); ④ the actual PostScript drawing code (m / l / c / S / f commands). The whole file is valid PostScript, interpretable directly by GhostScript — the "image-ness" of EPS rests entirely on the BoundingBox comment, with no change to PS syntax itself.
技术内核
Technical core
EPS 的内核只有两件事。① 普通 PostScript 文档 + 必须包含 %%BoundingBox 注释 —— BoundingBox 用 4 个数字声明图框(左下 x / 左下 y / 右上 x / 右上 y,单位 PostScript point = 1/72 inch);DTP 软件读这一行做版心,完全不需要解释 PS 本体。这是"约定式扩展"的工程经典 —— 0 成本扩展旧标准。② 可选 %%BeginPreview / %%EndPreview 嵌入位图缩略图(TIFF / WMF / PICT 三种主流格式)。1990 年代很多 DTP 软件不能在屏幕上渲染 PS(GhostScript 普及前 PS 解释开销大),所以排版时屏幕看到的是 preview 位图,打印时打印机解释真正的 PS 输出矢量。这种"屏幕用预览 / 打印用矢量"的双轨工作流是 EPS 的实际使用模式。EPS 的限制也很清楚:不支持透明(PS 没有 alpha 通道概念)、不支持多页(单页才叫"图片")、不支持表单 / JS / 加密(那是 PDF 的事)。这些限制在 1987 年是合理的,但到 2000 年代设计需求复杂化后,PDF(同样基于 PostScript 但加了透明、压缩、随机访问、多页、嵌入字体)就成了天然替代。LaTeX \includegraphics{fig.eps} 是 90 年代-2010 年代论文标配 —— pdflatex 流行后,EPS 几乎被 PDF 替代,因为 pdflatex 不能直接吃 EPS,需要 epstopdf 转换。
EPS's core is only two things. ① A regular PostScript document that must contain a %%BoundingBox comment — BoundingBox declares the figure frame using four numbers (lower-left x / lower-left y / upper-right x / upper-right y, in PostScript points = 1/72 inch). DTP apps read that single line for layout without interpreting the PS body — a textbook example of "convention-based extension," which extends a legacy standard at zero cost. ② Optional %%BeginPreview / %%EndPreview embeds a bitmap thumbnail (TIFF / WMF / PICT being the main formats). In the 1990s many DTP apps couldn't render PS on screen (PS interpretation was expensive before GhostScript matured), so on screen they showed the preview bitmap and at print time the printer interpreted the real PS as vectors. This "preview on screen / vector at print" two-track workflow was how EPS was actually used. EPS's limitations are equally clear: no transparency (PS has no alpha concept), no multi-page (a single page is what "image" meant), no forms / JS / encryption (those came in PDF). Reasonable in 1987, but as design needs grew through the 2000s, PDF — also PostScript-derived but with transparency, compression, random access, multi-page, embedded fonts — became the natural successor. LaTeX's \includegraphics{fig.eps} was the standard for academic figures from the 90s through the 2010s; once pdflatex became dominant, EPS was almost entirely replaced by PDF, since pdflatex doesn't ingest EPS directly and requires epstopdf.
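The "read one comment line, never interpret the PS" trick in ① is a few lines of code — a sketch with a made-up sample figure (the regex and helper are mine; only the %%BoundingBox syntax comes from the DSC convention):

```python
import re

# A tiny hypothetical EPS: header comment, BoundingBox, then real PostScript.
EPS_SAMPLE = b"""%!PS-Adobe-3.0 EPSF-3.0
%%BoundingBox: 0 0 144 72
%%EndComments
newpath 0 0 moveto 144 0 lineto stroke
showpage
"""

def bounding_box(eps: bytes):
    """Return (x1, y1, x2, y2) in PostScript points (1 pt = 1/72 inch)."""
    m = re.search(rb"^%%BoundingBox:\s*(-?\d+)\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)",
                  eps, re.MULTILINE)
    if not m:
        raise ValueError("no %%BoundingBox comment - not a usable EPS")
    return tuple(int(v) for v in m.groups())

x1, y1, x2, y2 = bounding_box(EPS_SAMPLE)
width_in = (x2 - x1) / 72    # 144 pt = 2 inches
height_in = (y2 - y1) / 72   # 72 pt = 1 inch
```

This is exactly what a 1990s DTP app did at layout time: one regex-sized scan for the frame, while the PostScript body stayed opaque until the printer's interpreter saw it.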
适用
USE FOR
- (历史)90 年代-2000 年代印刷设计交付
- (历史)老 LaTeX 论文图表(latex+dvips 工作流)
- 跟老印刷机 / 老 RIP 兼容的图形交付
- 需要纯矢量 PostScript 输出的科学绘图(老版 gnuplot / xfig)
- (legacy) 1990s-2000s print design hand-off
- (legacy) old LaTeX figures (latex + dvips workflow)
- Compatibility with vintage presses / RIPs
- Pure-PostScript scientific plots (old gnuplot / xfig)
反适用
AVOID
- Web(浏览器不支持 EPS)
- 现代设计交付(用 PDF / SVG)
- 需要透明的设计(EPS 不支持透明)
- 多页文档(用 PDF)
- pdflatex 工作流(需
epstopdf转换 · 不如直接 PDF)
- Web (no browser support for EPS)
- Modern design hand-off (use PDF / SVG)
- Anything needing transparency (EPS has none)
- Multi-page documents (use PDF)
- pdflatex workflows (needs
epstopdf— go straight to PDF)
| scope | editors | renderers | CLI |
|---|---|---|---|
| EPS (Adobe) | ✓ Adobe Illustrator · Inkscape · Affinity Designer · CorelDRAW(都可读老资产) | ✓ GhostScript · old QuarkXPress · old PageMaker · 老印刷机 RIP | ps2pdf · epstopdf · gs(GhostScript)· pstoedit |
AI — Illustrator 的私有格式
AI — Illustrator's proprietary file
"实质是 PDF + Adobe 私有 metadata。"
"Actually a PDF with Adobe-private metadata."
1987 年 Adobe Illustrator 1.0 推出,自定义 .ai 格式存放矢量插画 —— 跟 EPS 同年诞生(PDF 要到 1993 年才出现),是 Adobe 80 年代末 PostScript 生态里专门给设计师用的源文件容器。早期(Illustrator 1.0 - CS1)的 .ai 是简化 PostScript,跟 EPS 几乎同源(都是 PS 子集),但加了 Illustrator 专有的图层、画板、笔刷等 metadata。CS2(2005)后 Adobe 做了一个有趣的工程决定:把 .ai 底层切到 PDF —— 因为 PDF 已经能装下 PostScript 矢量 + 字体 + 透明 + 嵌入位图(Adobe 内部的 PGF 私有 codec),再加上 Illustrator 私有的 PrivateData section 存放图层 / artboard / brush / 实时效果等 Illustrator 专有信息,就是完整的 .ai。结果:.ai 文件用 Adobe Reader 打开能看到栅格化预览(因为底层就是 PDF,Reader 直接渲染了内嵌的栅格化版本),但只有 Illustrator 才能完整编辑图层结构。这是设计师交付的"源文件"标准 —— 你在视觉行业接到的 brand kit、logo 源文件、海报源文件,90% 是 .ai。
In 1987 Adobe Illustrator 1.0 launched with the proprietary .ai format for vector illustrations — born the same year as EPS (PDF would not arrive until 1993), the designer-facing source-file container of Adobe's late-1980s PostScript ecosystem. Early .ai (Illustrator 1.0 - CS1) was a simplified PostScript, kindred to EPS (both PS subsets), with Illustrator-specific metadata layered on top — layers, artboards, brushes. From CS2 (2005), Adobe made an interesting engineering decision: switch the .ai underbelly to PDF — because PDF already carried PostScript vectors + fonts + transparency + embedded raster (via Adobe's private PGF codec), plus an Illustrator-private PrivateData section for layers, artboards, brushes, live effects. So a .ai opens in Adobe Reader and shows a rasterised preview (because the file is fundamentally a PDF, and Reader renders the embedded raster version) — but only Illustrator can edit the layer structure. That is the "source file" standard for design delivery: 90 % of the brand kits, logo sources, and poster sources you'll receive in the visual industry are .ai files.
技术内核
Technical core
.ai 的内核两件事。① CS2(2005)后 .ai 格式底层就是 PDF —— 严格说是带 Adobe PGF(私有位图 codec,内嵌栅格化预览)+ 完整矢量绘图指令的 PDF 文档。这个工程决定的副作用极其有趣:.ai 保存时会内嵌一份 PDF 兼容预览(默认勾选 "Create PDF Compatible File"),所以 Adobe Reader / Preview / 浏览器 PDF viewer 都能直接打开 .ai 看到栅格化效果 — 但拿不到图层。② 私有 PrivateData section 存 Illustrator 特有的图层(Layers,可命名 / 锁定 / 隐藏 / 嵌套)/ 画板(Artboards,一份 .ai 可有多张画板,做 brand kit 一次性交付 logo + favicon + 名片)/ 笔刷(自定义 brush)/ 实时效果(Live Effects:阴影 / 模糊 / 3D 等可编辑非破坏性效果)/ 符号(Symbol,可重用元件)。这部分私有 chunk 是 Adobe 的护城河 —— 没有公开规范,Inkscape / Affinity Designer 只能部分解析(读到矢量 path 和填色,但图层结构 / 实时效果常丢)。设计师交付源文件时圈内默认就是 .ai —— 因为它是唯一能保留全部"可编辑性"的格式;导出 SVG / PDF 都会损失部分 Illustrator-only 信息。
.ai's core, two pieces. ① From CS2 (2005), the .ai underbelly is PDF — strictly, a PDF document with Adobe's PGF (private bitmap codec for the embedded raster preview) plus full vector drawing commands. The side effect is amusing: when you save a .ai, Illustrator embeds a PDF-compatible preview by default (the "Create PDF Compatible File" checkbox), so Adobe Reader / Preview / browser PDF viewers all open it and show the rasterised view — but never the layer structure. ② Private PrivateData section for Illustrator-specific layers (named / lockable / hidable / nestable), artboards (a .ai can hold many artboards — deliver logo + favicon + business card in one brand-kit file), brushes (custom), live effects (non-destructive shadow / blur / 3D), and symbols (reusable components). That private section is Adobe's moat — undocumented; Inkscape and Affinity Designer can only partially parse it (reading vectors and fills but commonly losing layer hierarchy and live effects). The designer's industry default is to hand off .ai because it is the only format that preserves full "editability" — exporting to SVG / PDF discards Illustrator-only information.
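The "PDF underneath" claim in ① can be checked with a magic-byte sniff — a hedged sketch (the function is mine; real .ai variants differ in detail, and a missing %PDF- prefix may simply mean "Create PDF Compatible File" was unticked):

```python
def sniff_ai(data: bytes) -> str:
    """Classify an .ai payload by its leading magic bytes (sketch only).

    Post-CS2 .ai files saved with "Create PDF Compatible File" are PDF
    underneath, so they start with %PDF-. Pre-CS2 .ai files were a
    PostScript dialect like EPS and start with %!PS-Adobe.
    """
    if data.startswith(b"%PDF-"):
        return "pdf-based .ai (CS2+): any PDF viewer shows the preview"
    if data.startswith(b"%!PS-Adobe"):
        return "postscript-based .ai (pre-CS2): EPS-era dialect"
    return "unknown"
```

Either way, the Illustrator-only layer and live-effect data sits in the private section — the sniff tells you what can *view* the file, not what can *edit* it.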
适用
USE FOR
- 设计师交付源文件(brand kit / logo / 海报源)
- 多画板项目(一文件装多张交付)
- 需要保留图层 / 实时效果的可编辑设计
- 跟其他 Adobe CC 软件协作(InDesign / After Effects / Photoshop 智能对象)
- Designer source-file delivery (brand kit / logo / poster source)
- Multi-artboard projects (one file for many deliveries)
- Editable designs preserving layers / live effects
- Adobe-CC interop (InDesign / After Effects / Photoshop smart objects)
反适用
AVOID
- 任何不装 Illustrator 的场景(Inkscape / Affinity 只能部分读)
- Web 嵌入(用 SVG 导出)
- 跨工具协作(用 SVG / PDF 中间格式)
- 开源工作流(私有格式 · 锁定 Adobe 生态)
- Anywhere without Illustrator (Inkscape / Affinity only partially parse)
- Web embedding (export to SVG)
- Cross-tool collaboration (use SVG / PDF as the lingua franca)
- Open-source workflows (proprietary — locks you into Adobe)
| scope | full editor | partial readers | CLI |
|---|---|---|---|
| .ai (Adobe 私有) | ✓✓ Adobe Illustrator(唯一完整支持) | ~ Inkscape(读矢量 + path)· Affinity Designer · CorelDRAW · Adobe Reader(只读 PDF 预览) | 几乎无 · uconv / pdftocairo 把 PDF 部分提取 |
JBIG2 — PDF 里的黑白压缩
JBIG2 — the black-and-white compressor inside PDF
"扫描黑白合同的瘦身高手,但 2013 出过事故。"
"The B&W scan slimming wizard — with a 2013 incident on its record."
2000 年 ITU-T 标准化 JBIG2(T.88)替代上一代 G3 / G4 传真编码,专门压扫描黑白文档(合同、票据、账单、医学胶片黑白扫描)。它解决一个具体的工程问题:CCITT G4(1980 年代传真标准)是逐行 RLE,体积砍 10× 已经是上限,但 1990 年代末扫描分辨率从 200 DPI 升到 600 DPI,文件再次膨胀。JBIG2 的关键创新是把页面切成 symbol(连通域) —— 把扫描页面里所有连通的像素块识别出来,相似 symbol 共享一个 dictionary 模板,实际像素流变成"在 (x, y) 引用 dictionary 第 N 个 symbol"。一页扫描合同里所有的 'e' 在视觉上可能 90% 像,JBIG2 只存一个 'e' 模板,其余位置都是引用 —— 体积比 CCITT G4 砍一半到三分之二。Acrobat 9(2008)起,JBIG2 成为 PDF 默认黑白扫描压缩。但 2013 年 Xerox 复印机用有损 JBIG2(允许"用相似 symbol 替代")导致扫描合同里的数字 6 被替换成 8,工程图纸尺寸出错,Xerox 召回固件 —— 此后法律 / 工程行业默认关 JBIG2 选无损 CCITT G4。
In 2000 ITU-T standardised JBIG2 (T.88) to replace the previous-generation G3 / G4 fax encodings, targeting scanned black-and-white documents (contracts, invoices, statements, B&W medical scans). It solved a specific engineering problem: CCITT G4 (1980s fax) was per-row RLE, capped near 10× compression, but late-1990s scanners climbed from 200 DPI to 600 DPI and files swelled again. JBIG2's key innovation is cutting pages into symbols (connected components) — every connected pixel cluster on a page is identified, similar symbols share one dictionary template, and the actual stream becomes "at (x, y) reference symbol #N." On a scanned contract, all of the 'e' glyphs are ~90 % visually identical, so JBIG2 stores one 'e' template and turns the rest into references — file size drops to half or a third of CCITT G4. From Acrobat 9 (2008), JBIG2 became PDF's default for black-and-white scans. But in 2013 a Xerox copier using lossy JBIG2 (which permits "substitute a similar symbol") caused 6s in scanned contracts to be replaced by 8s, producing wrong dimensions in engineering blueprints. Xerox recalled the firmware. Legal and engineering industries have since defaulted to disabling JBIG2 and using lossless CCITT G4.
相似 symbol 共享一个 dictionary 模板,其余位置只存 (x, y, refid) 引用 —— 体积从 462 px 降到 154 px + 3 个坐标对。整个扫描页里所有重复字形都这样处理,实际能砍掉 50-70% 的 G4 体积。有损模式更激进:允许"非常相似的 symbol 共享一个模板",这就是 2013 年 Xerox 把数字 6 错配成 8 的原因 — 6 和 8 在扫描噪点下视觉相似度极高。
Similar symbols share one dictionary template; the remaining occurrences are stored as (x, y, refid) references — 462 px collapses to 154 px + three coordinate triples. Across an entire scanned page, all repeated glyphs go through this dictionary, knocking 50-70 % off CCITT G4. Lossy mode is more aggressive: it allows "very similar symbols to share one template," which is exactly how the 2013 Xerox firmware mis-substituted '6' for '8' — the two digits are visually indistinguishable to the heuristic under scan noise.
技术内核
Technical core
JBIG2 三件事。① 把扫描页面切成 symbol(连通域)—— 编码器扫描整页,识别出所有连通像素块(每个字符 / 每个标点 / 每段线条),把视觉相似的 symbol 共享一个 dictionary 模板。位流变成"位置 + dictionary 引用",而不是"逐像素栅格"。② 三种 region 编码:(a) generic region 用 MQ 算术编码逐像素压缩,处理不规则内容(图标、签名、印章);(b) text region 用上面的 symbol 字典,处理文本(占扫描合同的 90%);(c) halftone region 用 grayscale 模板字典,处理半调网点(扫描照片的二值化)。三种 region 在同一页里可混用 —— 编码器自动分割。③ 有损模式 vs 无损模式:无损模式严格匹配 symbol(只共享 bit-exact 相同的连通域);有损模式允许"用相似 symbol 替代",阈值由编码器决定 —— 体积更小,但可能静默修改字符。这就是 2013 年 Xerox 事故的根因:数字 6 和 8 在扫描噪点下连通域形状相似,有损 JBIG2 把同一份模板用在两个不同字符上,导致扫描出的合同跟原件数字不一样。Xerox 召回固件,法律 / 工程 / 医疗行业从此默认关 JBIG2 选无损 CCITT G4 —— 即便牺牲一倍体积也要保证 bit-exact。Acrobat 提供"无损 JBIG2"选项,但 default 是有损,所以你扫描合同前要手动关掉。
JBIG2 in three pieces. ① Slice the scanned page into symbols (connected components) — the encoder scans the whole page, identifies every connected pixel cluster (every character, every punctuation mark, every line stroke), and lets visually similar symbols share one dictionary template. The bitstream becomes "position + dictionary reference," not "pixel-by-pixel raster." ② Three region encodings: (a) generic region uses the MQ arithmetic coder per-pixel for irregular content (icons, signatures, stamps); (b) text region uses the symbol dictionary above, for text (about 90 % of a contract scan); (c) halftone region uses grayscale-template dictionaries, for halftone screens (the binarisation of scanned photos). All three coexist on a single page, with the encoder choosing per-area. ③ Lossy vs lossless mode: lossless matches symbols strictly (sharing only bit-exact identical components); lossy permits "substitute a similar symbol" by an encoder-side threshold — smaller, but can silently rewrite characters. That is exactly the 2013 Xerox bug's root cause: the digits 6 and 8 have visually similar connected components under scan noise, and lossy JBIG2 reused one template across two different characters — so the scanned contract's digits no longer matched the original. Xerox recalled the firmware, and legal / engineering / medical industries have since disabled JBIG2 in favour of lossless CCITT G4 — willing to pay 2× the size to guarantee bit-exactness. Acrobat does offer a "lossless JBIG2" option, but the default is lossy, so you must turn it off explicitly before scanning a contract.
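The symbol-dictionary mechanism in ① / ③ — and its lossy failure mode — can be mimicked on toy glyph bitmaps. This is nothing like the real T.88 coder (glyphs here are tuples of bit-strings, similarity is plain Hamming distance); it only illustrates why a similarity threshold can silently swap glyphs:

```python
# Toy glyphs as 4x5 bitmaps. Under scan noise, real 6s and 8s differ by
# only a few pixels - just like these two templates (Hamming distance 2).
GLYPH_6 = ("0110", "1000", "1110", "1001", "0110")
GLYPH_8 = ("0110", "1001", "0110", "1001", "0110")

def hamming(a, b):
    return sum(x != y for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def encode(page, threshold=0):
    """page: list of (x, y, glyph). Return (dictionary, references).

    threshold=0 is "lossless": only bit-exact glyphs share a template.
    threshold>0 is "lossy": near-matches share one - the Xerox trap.
    """
    dictionary, refs = [], []
    for x, y, g in page:
        for i, tmpl in enumerate(dictionary):
            if hamming(g, tmpl) <= threshold:
                refs.append((x, y, i))       # reuse template i
                break
        else:
            dictionary.append(g)             # new template
            refs.append((x, y, len(dictionary) - 1))
    return dictionary, refs

page = [(0, 0, GLYPH_6), (10, 0, GLYPH_8), (20, 0, GLYPH_6)]
lossless_dict, _ = encode(page, threshold=0)        # 6 and 8 stay distinct
lossy_dict, lossy_refs = encode(page, threshold=3)  # 8 merged into 6's slot
```

In the lossy run every position now references template 0 (the first-seen '6'), so the decoded page renders the '8' as a '6' — the 2013 substitution bug in miniature.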
适用
USE FOR
- 非关键扫描黑白文档(图书馆藏书 / 报纸归档 / 普通账单)
- 已开启无损模式的合同 / 票据扫描
- 需要 PDF 体积砍 5-10× 的纯文本扫描场景
- 医学胶片黑白图像归档(无损模式)
- Non-critical B&W scans (library books, newspaper archives, casual statements)
- Contract / receipt scans only when lossless mode is enabled
- Pure-text scan PDFs needing 5-10× shrink
- B&W medical-image archival (lossless mode)
反适用
AVOID
- 灰度 / 彩色扫描(JBIG2 只有 1-bit · 用 JPEG 2000)
- 法律合同(默认有损模式可能改字符 · 强烈建议无损或关闭)
- 工程图纸 / 数字尺寸(2013 Xerox 事故先例)
- 医学诊断报告(任何字符替换都不可接受)
- Grayscale / colour scans (JBIG2 is 1-bit only — use JPEG 2000)
- Legal contracts (default lossy can rewrite characters — force lossless or off)
- Engineering blueprints with numeric dimensions (the 2013 Xerox precedent)
- Medical diagnostic reports (no character substitution acceptable)
| scope | encoders | decoders | CLI |
|---|---|---|---|
| JBIG2 (ITU-T T.88) | ✓ Adobe Acrobat Pro · LuraTech · CVision · ABBYY · Foxit Phantom | ✓ 所有 PDF viewer(Adobe Reader · Preview · pdf.js)· 独立 .jb2 解码罕见 | jbig2enc(开源 · Lepton)· jbig2dec(GhostScript)· Acrobat 命令行 |
TGA — Truevision 时代的纹理王
TGA — the texture king from the Truevision era
"3D 游戏行业用了 20 年的纹理格式 —— 因为 alpha 简单。"
"Twenty years of 3D game textures — chosen for its simple alpha."
1984 年 Truevision 公司推出 Targa 系列显卡 —— 这是早期 PC 真彩(24-bit)显卡的代表作,而 TGA(Truevision TARGA,全称 Truevision Advanced Raster Graphics Adapter)正是该卡的"出厂格式":一种结构极简的位图容器,用来把显卡里 24-bit RGB / 32-bit RGBA 像素数据原样存到磁盘。规范一句话能讲完:18 byte 文件头 + 可选 image ID + 可选调色板 + 像素数据 + 可选 RLE,解析比 BMP 还快(BMP 还得分 V3 / V4 / V5 几代)。Truevision 公司在 90 年代后期被收购、品牌消失,但 TGA 因为另一个生态延续了生命 —— 1990 年代中期 id Software 的 Quake 引擎、Epic 的 Unreal 引擎、Valve 的 Source 引擎都把 TGA 当作纹理标准,理由极简单:① 32-bit RGBA 透明用一个独立 alpha 通道,不像 BMP 那样要靠 magic 像素;② 18 byte 头部解析 30 行 C 代码搞定,引擎启动时一次性吃掉成百上千张纹理零负担;③ 跨平台 (DOS / Windows / IRIX / Mac),老纹理工具链全部支持。所以"格式厂商死了,格式靠用户死撑"是 TGA 的故事 —— 你今天打开 Quake 1 的 mod 包,里面 90% 的纹理仍是 .tga,Photoshop 也仍原生支持。
In 1984 Truevision launched the Targa line of graphics cards — early flagship 24-bit colour cards for the PC — and TGA (Truevision TARGA, for Truevision Advanced Raster Graphics Adapter) was the card's "factory" format: a minimal bitmap container for dumping 24-bit RGB / 32-bit RGBA pixel data straight from VRAM to disk. The whole spec fits in a sentence: 18-byte header + optional image ID + optional palette + pixel data + optional RLE, parseable faster than BMP (which has the V3 / V4 / V5 generation soup). Truevision was acquired and its brand vanished in the late 1990s, but TGA lived on inside another ecosystem — id Software's Quake, Epic's Unreal, and Valve's Source engine all adopted TGA as their texture standard for embarrassingly simple reasons: (1) 32-bit RGBA carries transparency in a real alpha channel, not BMP's magic-pixel hack; (2) the 18-byte header parses in 30 lines of C, so an engine can wolf down hundreds of textures at startup; (3) it's cross-platform (DOS / Windows / IRIX / Mac) and every legacy texture tool already supported it. So TGA's story is "the vendor died, but the users carried the format" — open a Quake 1 mod pack today and 90 % of the textures are still .tga, with Photoshop still supporting it natively.
TGA 文件结构:① 18 byte 文件头(image type / 调色板属性 / 宽高 / 每像素位数 / origin / descriptor);② 可选 image ID 字段(通常空);③ 可选调色板(8 / 16-bit 模式才有);④ 真正的像素数据,顺序是 BGR(A) 而非 RGB(A)—— 这跟 BMP 一致,反映 80 年代显存按字节小端读出的事实;⑤ 可选的 v2.0 footer(20 byte),里面带 "TRUEVISION-XFILE.\0" 签名。整张图可选 RLE 压缩(每段 1 byte 头 + 像素),压缩率不高但解码 50 行 C。规范极简,所以 Quake/Unreal 系引擎用了 20 年。
TGA file structure: ① 18-byte header (image type / palette attrs / width / height / bits-per-pixel / origin / descriptor); ② optional image ID (usually empty); ③ optional colormap (only for 8 / 16-bit modes); ④ the real pixel payload, ordered BGR(A) rather than RGB(A) — same as BMP, a reflection of 1980s little-endian byte-by-byte VRAM reads; ⑤ an optional v2.0 footer (20 bytes) carrying the "TRUEVISION-XFILE.\0" signature. The whole pixel block can be RLE-compressed (1-byte header per run + pixels) — compression isn't great but the decoder is 50 lines of C. That minimalism is exactly why Quake / Unreal engines used it for two decades.
技术内核
Technical core
TGA 内核三件事。① 18 byte 文件头是规范的全部 —— 包含 image type ID(决定 colormap / RGB / B&W,有无 RLE)、colormap 属性、image origin / 宽 / 高、bits-per-pixel(8 / 16 / 24 / 32)、image descriptor(里面有 alpha 位数 + 行扫描方向);整个解析 50 行 C 全搞定。这是极简的工程胜利:对比 BMP 三代 header(BITMAPINFOHEADER → V4 → V5),TGA 一份 header 写到死。② 像素深度 8 / 16 / 24 / 32 四档:8-bit indexed 走调色板(老游戏精灵图);16-bit 是 5:5:5 + 1-bit alpha(老 3D 卡硬件最爱);24-bit 真彩 BGR;32-bit BGRA 带完整 alpha —— 后两个是 1990 年代游戏纹理的事实标准。注意像素是 BGR 顺序(同 BMP),这是 80 年代 x86 小端 + 显存按 byte 读取的共识,移植到 OpenGL / Direct3D 时引擎要逐像素 swap。③ 可选 RLE 压缩:每段 1 byte run header(高位 1 表示 RLE,7 位长度) + 像素值,简陋但解码极快。Quake 1 / 2 / 3 / Unreal Tournament / Half-Life 1 时代的纹理基本都是 24-bit TGA + RLE off(磁盘大,但 mmap 进显存零拷贝),这个工作流一直延续到 2005 年前后 DDS / KTX 把 TGA 替代 —— 因为 GPU 直接支持的压缩纹理(DXT / BCn)能在 VRAM 里压成 1/4 体积,TGA 只是 raw RGBA。
TGA's core, three pieces. ① The 18-byte header is the entire spec — image type ID (colormap / RGB / B&W, RLE or not), colormap attrs, image origin / width / height, bits-per-pixel (8 / 16 / 24 / 32), image descriptor (alpha bit count + scan direction). The whole parser is 50 lines of C. A win for minimalism: compare BMP's three-generation header soup (BITMAPINFOHEADER → V4 → V5); TGA wrote one header and never changed it. ② Four pixel depths: 8 / 16 / 24 / 32 — 8-bit indexed via colormap (old game sprites); 16-bit as 5:5:5 + 1-bit alpha (favourite of early 3D cards); 24-bit truecolour BGR; 32-bit BGRA with full alpha — the last two are the 1990s game-texture standards. Note pixels are BGR-ordered (same as BMP), the consensus of 1980s x86 little-endian + byte-by-byte VRAM reads; engines must swap per pixel when porting to OpenGL / Direct3D. ③ Optional RLE: each run is 1 byte (top bit = RLE flag, 7 bits = length) + pixel value — crude, but very fast to decode. Quake 1 / 2 / 3 / Unreal Tournament / Half-Life 1 era textures were almost all 24-bit TGA, RLE off (bigger on disk, but mmap straight into VRAM with zero copies). That workflow lasted until ~2005 when DDS / KTX replaced TGA — because GPU-native compressed textures (DXT / BCn) shrink to 1/4 in VRAM, while TGA is just raw RGBA.
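To make the "a few dozen lines of C" claim concrete, here is a minimal sketch in Python of unpacking the 18-byte header (field layout per the TGA spec; the dictionary keys are my own naming, not part of any API):

```python
import struct

# The 18-byte TGA header, all multi-byte fields little-endian:
# id_len, colormap_type, image_type, colormap spec (first, len, bits),
# x/y origin, width, height, bits-per-pixel, image descriptor.
TGA_HEADER = struct.Struct("<BBBHHBHHHHBB")

def parse_tga_header(buf: bytes) -> dict:
    (id_len, cmap_type, image_type,
     cmap_first, cmap_len, cmap_bits,
     x0, y0, width, height, bpp, descriptor) = TGA_HEADER.unpack(buf[:18])
    return {
        "image_type": image_type,             # 2 = raw truecolour, 10 = RLE truecolour
        "width": width, "height": height,
        "bpp": bpp,                           # 8 / 16 / 24 / 32
        "alpha_bits": descriptor & 0x0F,      # descriptor low nibble
        "top_down": bool(descriptor & 0x20),  # bit 5 flips the scan direction
        "rle": image_type in (9, 10, 11),     # RLE variants of types 1 / 2 / 3
    }

# A hypothetical 4x2 uncompressed 24-bit header, built then parsed back:
hdr = TGA_HEADER.pack(0, 0, 2, 0, 0, 0, 0, 0, 4, 2, 24, 0)
info = parse_tga_header(hdr)
```

Everything an engine needs to start streaming pixels fits in one `struct` call; that is the whole point of the format.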
适用
USE FOR
- 老 3D 引擎纹理(Quake / Unreal / Source 系 mod)
- 需要带 alpha + 极简头的中转格式
- 工具链中间格式(渲染输出 → TGA → 压缩成 DDS / KTX)
- 视频后期合成中转(Nuke / After Effects 序列帧)
- Old 3D-engine textures (Quake / Unreal / Source mods)
- Intermediate format that wants alpha + a tiny header
- Toolchain bridges (renderer output → TGA → DDS / KTX)
- VFX compositing intermediates (Nuke / After Effects sequences)
| scope | editors | engines / readers | CLI |
|---|---|---|---|
| TGA (Truevision) | ✓ Photoshop 原生 · GIMP · Krita · Affinity Photo | ✓✓ Quake / Unreal / Source / Cryengine 系老引擎 · Nuke · After Effects | convert in.png out.tga(ImageMagick)· tga2png · stb_image(50 行 C) |
ICO / CUR — 浏览器标签上的小图
ICO / CUR — the tiny image on your browser tab
"它装着多分辨率的同一个图标,16 / 32 / 48 / 256 一锅端。"
"It packs the same icon at 16 / 32 / 48 / 256 — a multi-res bundle."
1985 年 Windows 1.0 出现时,Microsoft 面临一个具体的工程问题:同一个应用图标在 16×16(任务栏 / 标题栏)、32×32(桌面 / 文件管理器)、48×48(start menu)甚至更大尺寸下都要好看 —— 但单纯把 32×32 缩到 16×16 会糊掉,小尺寸需要手工像素绘制(每个像素都要算)。Microsoft 的解法是设计一个"多分辨率包" —— 一个 ICO 文件存 N 个 image entry,每个 entry 是一张完整图(不同尺寸 / 不同色深),操作系统按显示场景挑最合适的那个。1999 年 IE 5 把它推上 web 一等公民:<link rel="icon" href="favicon.ico"> 让浏览器标签也能展示网站图标 —— 这是 favicon 的诞生。Vista(2007)给 ICO 加了"内嵌 PNG"支持,256×256 大尺寸图标终于可用(BMP 256×256 太大,PNG 压缩后只剩 1/10)。CUR 是 ICO 的鼠标指针变种,几乎一样的容器结构,只多两个字段:hotspot x / y(指针的"实际点击位置",比如箭头尖在哪个像素)。今天每个浏览器仍优先认 favicon.ico,即便你已经声明了 SVG / PNG favicon —— 因为 IE 时代的事实标准实在太顽强了。
When Windows 1.0 shipped in 1985, Microsoft faced a concrete engineering problem: the same application icon had to look right at 16×16 (taskbar / titlebar), 32×32 (desktop / Explorer), 48×48 (start menu), and beyond — but naively scaling 32×32 down to 16×16 looks blurry, because at small sizes every pixel must be hand-painted. Microsoft's fix was a "multi-res package": one ICO file holding N image entries, each a complete image at a different resolution / colour depth, with the OS picking the best match for the display scenario. In 1999, IE 5 promoted it to a first-class citizen of the web: <link rel="icon" href="favicon.ico"> let browser tabs show site icons — that's the birth of favicon. Vista (2007) added "embedded PNG" support to ICO, finally making 256×256 icons practical (a 256×256 BMP is huge; PNG-compressed it's a tenth the size). CUR is the cursor variant — same container, plus two extra fields: hotspot x / y (which pixel of the cursor counts as "the click point," e.g. the arrow tip). Every browser today still favours favicon.ico even after you declare SVG / PNG favicons — the IE-era de-facto standard is just that hard to dislodge.
技术内核
Technical core
ICO 内核三件事。① 容器结构 + N 个 image entry —— 6 byte ICONDIR 头(reserved + type=1 是 ICO / 2 是 CUR + image_count) + N 个 16 byte 的 ICONDIRENTRY(每条描述 width / height / colour count / planes / bit count / size / offset)+ N 段真正的 image data。entry 在文件末尾按偏移堆放。这种"目录 + payload"是 80 年代 PE / OLE 时期 Microsoft 的标准设计语言。② 早期 entry 内嵌 BMP,Vista 后允许内嵌 PNG —— 1985-2007 年所有 ICO 内嵌的都是 BMP(去掉 BITMAPFILEHEADER,只留 BITMAPINFOHEADER + 像素 + AND mask),32×32 32-bit alpha 一个 4096 byte 起步。Vista(2007)给 ICO 加 PNG 内嵌支持(magic 字节判断:开头 89 50 4E 47 是 PNG,否则当 BMP),终于让 256×256 大尺寸图标可用 —— BMP 256×256 32-bit 是 256 KB,PNG 压完通常 20-50 KB。这是 ICO 唯一一次重大演进。③ CUR 与 ICO 几乎一样,多 hotspot 字段 —— ICONDIRENTRY 里 reserved 的两个 byte,在 CUR 文件里被重新解释为 hotspot_x / hotspot_y(指针图像里的"实际点击点"坐标,比如箭头尖在 (0, 0) 像素位置)。这就是鼠标指针的全部技术差异。今天 favicon.ico 仍是浏览器最优先识别的图标格式 —— 即便你声明了 <link rel="icon" href="favicon.svg">,浏览器仍会先 GET /favicon.ico 再走 link 标签。
ICO's core, three pieces. ① Directory + N image entries — a 6-byte ICONDIR header (reserved + type=1 for ICO / 2 for CUR + image_count) + N 16-byte ICONDIRENTRYs (each describing width / height / colour count / planes / bit count / size / offset) + N image-data blobs piled at the end by offset. The "directory + payload" idiom is pure 1980s Microsoft (PE / OLE house style). ② Early entries embed BMP, Vista+ embeds PNG — from 1985 to 2007 every ICO entry was a BMP (BITMAPFILEHEADER stripped; just BITMAPINFOHEADER + pixels + AND mask), with a 32×32 32-bit alpha icon starting around 4 KB. Vista (2007) added PNG embedding (magic-byte sniff: 89 50 4E 47 means PNG, otherwise BMP), finally making 256×256 icons practical — a 256×256 32-bit BMP is 256 KB, PNG-compressed usually 20-50 KB. That was ICO's one and only major evolution. ③ CUR is ICO with hotspot fields — the two "reserved" bytes in ICONDIRENTRY are reinterpreted as hotspot_x / hotspot_y in CUR (the cursor's "actual click point," e.g. an arrow's tip lives at pixel (0, 0)). That's the whole cursor difference. Today favicon.ico is still the highest-priority icon for browsers — even with <link rel="icon" href="favicon.svg"> declared, browsers still GET /favicon.ico first, then check the link tags.
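A hedged sketch of the directory walk plus the Vista-era magic-byte sniff (helper name and dictionary keys are mine; the field layout follows the ICONDIR / ICONDIRENTRY description above):

```python
import struct

PNG_MAGIC = b"\x89PNG"

def read_ico_directory(data: bytes):
    """Parse ICONDIR (6 bytes) + N ICONDIRENTRYs (16 bytes each)."""
    reserved, res_type, count = struct.unpack_from("<HHH", data, 0)
    entries = []
    for i in range(count):
        w, h, colors, _rsv, planes, bitcount, size, img_off = \
            struct.unpack_from("<BBBBHHII", data, 6 + 16 * i)
        payload = data[img_off:img_off + size]
        entries.append({
            "width": w or 256,    # a stored 0 means 256 (the field is one byte)
            "height": h or 256,
            # Vista-era sniff: PNG magic first, otherwise a headerless BMP
            "kind": "png" if payload[:4] == PNG_MAGIC else "bmp",
        })
    return res_type, entries      # res_type: 1 = ICO, 2 = CUR

# A hypothetical one-entry ICO whose payload is an embedded PNG signature:
ico = (struct.pack("<HHH", 0, 1, 1)
       + struct.pack("<BBBBHHII", 0, 0, 0, 0, 1, 32, 8, 22)
       + b"\x89PNG\r\n\x1a\n")
res_type, entries = read_ico_directory(ico)
```

For a CUR file the same code applies, except `planes` / `bitcount` would be read as hotspot_x / hotspot_y.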
适用
USE FOR
- Web favicon(浏览器最高优先级)
- Windows 桌面 / 文件管理器 / 任务栏图标
- 需要多分辨率打包的应用图标
- CUR · 自定义鼠标指针(游戏 / 老 Windows 主题)
- Web favicon (highest browser priority)
- Windows desktop / Explorer / taskbar icons
- Application icons that need multi-res packaging
- CUR · custom mouse cursors (games / old Windows themes)
反适用
AVOID
- 任何非 favicon / 非桌面图标场景
- 需要矢量缩放的图标(用 favicon.svg)
- 跨平台(macOS 用 .icns · Linux 用 PNG)
- 动画图标(用 SVG / GIF / animated PNG)
- Anything that isn't a favicon or desktop icon
- Vector-scalable icons (use favicon.svg)
- Cross-platform (macOS uses .icns; Linux uses PNG)
- Animated icons (use SVG / GIF / animated PNG)
| scope | browsers / OS | editors | CLI |
|---|---|---|---|
| ICO / CUR (Microsoft) | ✓✓✓ 所有浏览器(自 IE 5)· Windows native · macOS / Linux 也能读 | ✓ Photoshop(插件)· GIMP · IcoFX · Greenfish Icon Editor | convert in.png -resize 256 out.ico(ImageMagick)· icotool · png2ico |
NetPBM (PPM / PGM / PBM) — 教科书最爱的 ASCII 三件套
NetPBM (PPM / PGM / PBM) — the textbook ASCII trio
"你用文本编辑器就能写一张图。"
"You can write an image in a text editor."
1988 年 Jef Poskanzer 写 NetPBM 工具集时,需要一个"最简单的图像格式" —— 简单到 vim 能直接编辑、cat 能看出大致内容、Unix 管道能 convert in.png pgm:- | sharpen | convert pgm:- out.png 串联。三件套从极简递增:PBM(Portable Bitmap,1-bit 黑白)、PGM(Portable Graymap,灰度)、PPM(Portable Pixmap,RGB)。每个有两套编码:ASCII 模式(P1 / P2 / P3 magic)用空格分隔的十进制数字写像素值;binary 模式(P4 / P5 / P6 magic)用 raw 字节。ASCII 模式可以 vim 直接编辑像素 —— 这就是计算机视觉教学最常见的"hello world":学生第一次自己 fwrite 出一张图,几乎都是 PPM。NetPBM 工具集本身有 200+ 个小命令(pamflip / pamcat / pnmtopng / pamscale / ...),每个只做一件事 —— 这是 80s Unix 哲学的活化石,跟 grep / sed / awk 是同一种生物。今天 NetPBM 在生产几乎不用,但学术研究、算法测试、工具链中转格式至今仍用 —— 因为它太简单,任何人 1 小时能写完整的 PPM reader / writer。
In 1988, while writing the NetPBM toolkit, Jef Poskanzer needed "the simplest possible image format" — simple enough that vim could edit it directly, cat could roughly read it, and Unix pipes could chain it as convert in.png pgm:- | sharpen | convert pgm:- out.png. The trio steps up in capability: PBM (Portable Bitmap, 1-bit black & white), PGM (Portable Graymap, grayscale), PPM (Portable Pixmap, RGB). Each has two encodings: ASCII mode (magic P1 / P2 / P3) writing pixel values as space-separated decimal numbers; binary mode (P4 / P5 / P6) using raw bytes. The ASCII mode is editable in vim — which is why PPM is the canonical "hello world" of computer-vision teaching: a student's first fwrite-an-image is almost always a PPM. The NetPBM toolkit itself ships 200+ small commands (pamflip / pamcat / pnmtopng / pamscale / ...), each doing exactly one thing — a living fossil of 1980s Unix philosophy, kin to grep / sed / awk. Today NetPBM is almost never used in production, but academic work, algorithm tests, and toolchain bridges still rely on it — because it is so simple that anyone can write a complete PPM reader / writer in an hour.
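As a concrete taste of that "hello world", a few lines of Python can serialize a P3 file (the function name is invented for illustration):

```python
def ppm_ascii(pixels, width, height, maxval=255):
    """Serialize RGB triples as a P3 (ASCII) PPM string:
    magic, dimensions, maxval, then one decimal triple per pixel."""
    header = f"P3\n{width} {height}\n{maxval}\n"
    body = "\n".join(f"{r} {g} {b}" for (r, g, b) in pixels)
    return header + body + "\n"

# A 4x4 all-red image; the first lines of the result read:
# P3
# 4 4
# 255
# 255 0 0
red = ppm_ascii([(255, 0, 0)] * 16, 4, 4)
```

The output opens in GIMP or ImageMagick as-is, and is exactly the kind of file you can hand-edit in vim.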
① P3(P1=PBM ASCII / P2=PGM / P3=PPM,P4 / 5 / 6 是对应 binary);② 4 4 宽高;③ 255 maxval(色深上限,8-bit 就是 255);④ 后面是 4×4=16 个像素的 RGB 三元组。整个文件可用 vim 编辑、cat 阅读。binary 模式(P6)只把第 ④ 段换成 raw 字节,前三行仍是 ASCII —— 所以 PPM 文件 head -3 永远是文本头,parser 一行解析一个字段即可。
① P3 (P1 = PBM ASCII / P2 = PGM / P3 = PPM; P4 / 5 / 6 are the binary counterparts); ② 4 4 width and height; ③ 255 maxval (colour depth ceiling — 255 for 8-bit); ④ then 16 RGB triples for the 4×4 image. Edit it in vim, read it with cat. Binary mode (P6) only replaces section ④ with raw bytes — the first three lines stay ASCII — so head -3 on any PPM is always a text header, and a parser can lex one field per line.
技术内核
Technical core
NetPBM 三件套的内核三件事。① 三档色深 = 三个格式:PBM(1-bit 黑白,每像素 0 / 1)、PGM(灰度,每像素一个 0..maxval 的整数)、PPM(RGB,每像素三个 0..maxval 的整数)。再加一个伞名 PNM(Portable Anymap)代表"上面三个之一",NetPBM 工具命令 pnmtopng 表示"任何 PNM 都能转 PNG"。② ASCII 模式 + binary 模式两套 magic:P1(PBM ASCII)/ P2(PGM ASCII)/ P3(PPM ASCII)的像素值用空格 / 换行分隔的十进制数字写;P4 / P5 / P6 是对应的 binary 模式,像素是 raw 字节(PBM packed bits,PGM / PPM 是 1 byte 或 2 byte per channel 取决于 maxval)。ASCII 体积大但能 vim 编辑、binary 紧凑但仍头部 ASCII。③ 头部 = magic + 宽 + 高 + maxval(PBM 无 maxval)+ 像素值,字段之间用任意空白(空格 / 制表 / 换行)分隔,允许 # 开头的注释行。整个 spec 一页能写完。NetPBM 工具集设计哲学:200+ 个小命令(pamflip 翻转 / pamcat 拼接 / pnmtopng 转 PNG / pamscale 缩放 / pamcomp 合成 / ...),每个 source 几百行 C,只做一件事,可 Unix 管道串联 —— cat input.ppm | pamflip -lr | pnmtopng > output.png 是合法工作流。这套哲学今天活在 ImageMagick / FFmpeg 里,但 NetPBM 是更纯的"原版"。
NetPBM trio's core, three pieces. ① Three depths = three formats: PBM (1-bit B&W, pixel is 0 / 1), PGM (grayscale, pixel is one 0..maxval integer), PPM (RGB, pixel is three 0..maxval integers). Plus an umbrella name PNM (Portable Anymap) meaning "any of the three"; NetPBM's pnmtopng command means "any PNM convertible to PNG." ② ASCII + binary modes, two magics each: P1 (PBM ASCII) / P2 (PGM ASCII) / P3 (PPM ASCII) write pixel values as decimal numbers separated by whitespace; P4 / P5 / P6 are the binary counterparts (PBM packed bits, PGM / PPM 1 byte or 2 bytes per channel depending on maxval). ASCII is bulky but vim-editable; binary is compact but still has an ASCII header. ③ Header = magic + width + height + maxval (PBM has no maxval) + pixel values, fields separated by any whitespace, with # comments allowed. The whole spec fits on one page. NetPBM's toolkit philosophy: 200+ small commands (pamflip, pamcat, pnmtopng, pamscale, pamcomp, ...), each a few hundred lines of C, each doing one thing, all pipeable — cat input.ppm | pamflip -lr | pnmtopng > output.png is a legitimate workflow. That spirit lives on inside ImageMagick / FFmpeg today, but NetPBM is the purer original.
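A sketch of the header lexer implied above: fields split on any whitespace, # comments skipped, and PBM (P1 / P4) carrying no maxval field (function names are mine):

```python
def _pnm_tokens(data: bytes):
    """Yield whitespace-separated header tokens, skipping # comments."""
    i = 0
    while i < len(data):
        c = data[i:i + 1]
        if c == b"#":
            i = data.index(b"\n", i) + 1   # comment runs to end of line
        elif c.isspace():
            i += 1
        else:
            j = i
            while j < len(data) and not data[j:j + 1].isspace():
                j += 1
            yield data[i:j]
            i = j

def parse_pnm_header(data: bytes):
    t = _pnm_tokens(data)
    magic = next(t).decode("ascii")
    width, height = int(next(t)), int(next(t))
    maxval = 1 if magic in ("P1", "P4") else int(next(t))  # PBM has no maxval
    return magic, width, height, maxval
```

The same lexer covers all six magics; for P4 / P5 / P6 the pixel payload simply begins after the single whitespace byte that ends the header.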
适用
USE FOR
- 学术研究 / 计算机视觉教学(算法 hello world)
- 算法测试(写 50 行 C / Python 就能 fwrite 出可视化)
- 工具链中间格式(很多 Unix 命令直接读写 PPM)
- 需要"vim 能改"的极端调试场景
- Academic research / CV teaching (the algorithm "hello world")
- Algorithm testing (50 lines of C / Python fwrites a visualisation)
- Toolchain bridges (many Unix tools read / write PPM natively)
- Extreme debugging where you need vim-editable pixels
反适用
AVOID
- 任何生产 / web 场景(无压缩 · 体积比 BMP 还大)
- 需要 alpha 通道(PPM 本身无 alpha · PAM 才有)
- 需要色彩管理 / EXIF / 嵌入 ICC 的场景
- 跟现代 web / GPU 工具链对接(用 PNG)
- Any production / web context (no compression — bigger than BMP)
- Anything needing alpha (PPM has none — only PAM does)
- Workflows needing colour management / EXIF / embedded ICC
- Modern web / GPU toolchains (use PNG)
| scope | readers | editors | CLI |
|---|---|---|---|
| NetPBM (PPM / PGM / PBM / PNM) | ✓ GIMP · ImageMagick · ffmpeg · Python (PIL / OpenCV) · 几乎所有 Unix 图像工具 | ✓ 任意文本编辑器(ASCII 模式)· GIMP · ImageMagick | NetPBM 200+ 命令套件:pamcat · pnmtopng · pnmtotiff · pamflip · pamscale |
XBM / XPM — X Window 的 ASCII 图
XBM / XPM — X Window's ASCII images
"图片就是 C 数组,#include 进 X 程序就能用。"
"The image is a C array — #include it into your X program."
1985 年 X Window System 在 MIT Athena 项目里诞生时,所有 X 应用都是 C 写的;开发者需要把图标(光标 / 应用 logo / 工具栏 button)直接嵌入程序 —— 当时没有"资源文件"的标准做法(Windows 的 .rc 资源系统 1985 年也才刚出来)。X Consortium 想出一个绝妙的偷懒解法:既然程序是 C,那图标就写成C 字节数组,编译时 #include "icon.xbm" 直接进 binary。XBM(X Bitmap)是 1-bit 黑白:static char name_bits[] = { 0xff, 0x80, 0x40, ... };,每个 byte 8 个像素。1989 年法国 Bull 公司扩展成 XPM(X PixMap)加调色板:文件顶部声明每个 ASCII 字符对应一种颜色(" c None" 透明,". c #ffffff" 白,"+ c #000000" 黑),下面是字符矩阵图,每个像素用一个或多个 ASCII 字符表示 —— 整个 .xpm 文件本身就是合法 C 字符串数组。Web 早期 Mozilla / Netscape 也支持过 XBM / XPM(因为 Unix 上的浏览器开发者太熟了),但 1990 年代后 PNG / GIF 成为主流,XBM / XPM 退到 X 老应用领域。今天 GIMP 安装目录里仍有 .xpm 图标,fvwm / twm 老 X 主题也仍用 XPM —— 这是 80 年代 Unix-C 共生关系的活化石。
When X Window System was born at MIT's Project Athena in 1985, every X app was written in C, and developers needed to embed icons (cursors / app logos / toolbar buttons) directly inside binaries — at the time there was no standard "resource file" idiom (Windows' .rc system also only emerged in 1985). The X Consortium's clever lazy fix: since the program is C, write the icon as a C byte array, then #include "icon.xbm" at compile time. XBM (X Bitmap) was 1-bit B&W: static char name_bits[] = { 0xff, 0x80, 0x40, ... };, eight pixels per byte. In 1989 France's Bull Research extended it to XPM (X PixMap) with palettes: the file header declares one ASCII character per colour (" c None" transparent, ". c #ffffff" white, "+ c #000000" black), followed by a character matrix where each pixel is one or more ASCII characters — the whole .xpm file is itself a valid C string array. Early web Mozilla / Netscape supported XBM / XPM (Unix-side browser devs knew them intimately), but after the 1990s PNG / GIF took over and XBM / XPM retreated to legacy X apps. Today GIMP's install directory still ships .xpm icons; fvwm / twm legacy X themes still use XPM — a living fossil of the 1980s Unix-C symbiosis.
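A hypothetical miniature XPM, transcribed from its C string-array form into a Python list of the same string literals, with a toy decoder (the name smile_xpm and the colours are invented for illustration):

```python
# In "smile.xpm" these literals would be wrapped in:
#   static char *smile_xpm[] = { ... };
SMILE_XPM = [
    "3 3 2 1",       # width height ncolors chars_per_pixel
    ". c #ffff00",   # '.' -> yellow
    "+ c #000000",   # '+' -> black
    ".+.",           # character matrix: one string per row,
    "...",           # one character per pixel
    "+.+",
]

def decode_xpm(lines):
    w, h, ncolors, cpp = map(int, lines[0].split())
    # colormap lines have the shape "<chars> c <colour>"
    cmap = {l[:cpp]: l.split()[-1] for l in lines[1:1 + ncolors]}
    rows = lines[1 + ncolors:1 + ncolors + h]
    return [[cmap[row[x * cpp:(x + 1) * cpp]] for x in range(w)]
            for row in rows]

grid = decode_xpm(SMILE_XPM)   # 3x3 grid of hex colour strings
```

The same list is simultaneously valid C source and trivially parseable data, which is the whole trick of the format.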
" " = 透明,"." = 黄,"+" = 黑),声明每个 ASCII 字符对应的颜色;② 字符矩阵图,每行一个字符串字面量,每个字符就是一个像素。整文件是合法的 C string 数组,#include "smile.xpm" 直接编译进 binary。这就是为什么早期 X 程序能"内嵌图标" —— 不需要加载器,编译时就嵌进去了。" " = transparent, "." = yellow, "+" = black) declaring each ASCII character's colour; ② a character matrix — one string literal per row, each character is one pixel. The whole file is a valid C string array; #include "smile.xpm" compiles it straight into the binary. That's how early X programs "embedded icons" — no loader needed; the icon enters the binary at compile time.技术内核
Technical core
XBM / XPM 内核两件事。① XBM = 1-bit 黑白 + C 字节数组 —— 文件就是 #define name_width 16 / #define name_height 16 + static char name_bits[] = { 0xff, 0x80, ... };;每个 byte 装 8 个像素(LSB-first,跟 X server 的 bitmap 图像对齐),解析 = C 编译器读取 = 0 解码代价。这种"图片即源码"的设计,只有在开发者就是用户(X Window 程序员)的语境下才合理 —— 没有非开发者会写 XBM。② XPM = 多色 + 调色板 + 字符矩阵 —— 1989 年 Bull 公司扩展:头部声明 "width height ncolors chars_per_pixel",然后是 ncolors 行 colormap("X c #rrggbb" 或 "X c colorname"),最后是字符矩阵(每行一个字符串字面量)。chars_per_pixel 通常是 1,但调色板超过 ASCII 印刷字符数(~94)时可以用 2 个字符代表一个像素(支持上千色)。整文件仍是 C string 数组语法,所以可 #include 进程序也可以从磁盘 fopen 解析(libXpm 提供解析器 —— 但其实就是个 C 源码 lexer)。这种"格式即源码"的设计后来还有几个变种:Apple PICT 早期可序列化为 PostScript,Lisp Machine 直接把图片存成 sexp,但只有 XBM / XPM 真正广泛使用过。今天 XBM / XPM 几乎被 PNG / SVG 完全替代,但你打开 GIMP 安装目录(/usr/share/gimp/2.10/)里的 themes / icons,仍能看到大量 .xpm —— 老 X 应用的代码资产惯性。
XBM / XPM's core, two pieces. ① XBM = 1-bit B&W + C byte array — the file is #define name_width 16 / #define name_height 16 + static char name_bits[] = { 0xff, 0x80, ... };; each byte holds 8 pixels (LSB-first to align with X server bitmaps), and "parsing" means letting the C compiler read it — zero decoding cost. The "image is source code" design only makes sense when developers are the users (X programmers). Non-developers don't author XBMs. ② XPM = multi-colour + palette + character matrix — Bull's 1989 extension declares "width height ncolors chars_per_pixel" at the top, then ncolors colormap lines ("X c #rrggbb" or "X c colorname"), then a character matrix (one string literal per row). chars_per_pixel is usually 1, but if the palette exceeds the ~94 printable ASCII range you can use 2 chars per pixel (supporting thousands of colours). The file remains valid C string-array syntax, so it's both #include-able into a program and fopen-parseable from disk (libXpm provides a parser — really just a C-source lexer). Several variants attempted similar tricks later (Apple PICT serialised to PostScript; Lisp Machines stored images as sexps), but XBM / XPM are the only ones that saw real adoption. They are essentially obsolete today, but the GIMP install directory (/usr/share/gimp/2.10/) still ships piles of .xpm icons in themes / icons — pure code-asset inertia from legacy X apps.
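The LSB-first packing is easy to demonstrate mechanically; a one-function sketch:

```python
def xbm_pixels(row_bytes):
    """Expand XBM bytes to pixels: within each byte, the leftmost pixel
    is the least significant bit (X server bit order)."""
    return [(b >> bit) & 1 for b in row_bytes for bit in range(8)]

# 0x01 lights the leftmost pixel of its byte, 0x80 the rightmost.
```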
适用
USE FOR
- (历史)X Window 系应用图标
- (历史)fvwm / twm / IceWM 等老 X 主题
- (历史)GIMP / xterm / xfig 内嵌图标资产
- 极简调试场景(直接 cat 看图,因为是 ASCII)
- (legacy) icons in X Window applications
- (legacy) fvwm / twm / IceWM old X themes
- (legacy) embedded icons in GIMP / xterm / xfig
- Extreme debugging — cat the file and "see" the image (it's ASCII)
反适用
AVOID
- 任何现代场景(用 PNG / SVG)
- 非 X 平台(Windows / macOS 几乎不用)
- 高色深 / 大尺寸(XPM 字符矩阵在 256×256 24-bit 下文件巨大)
- 需要压缩 / alpha 通道 / 色彩管理
- Anything modern (use PNG / SVG)
- Non-X platforms (almost unused on Windows / macOS)
- High-depth / large images (XPM's char matrix bloats at 256×256 24-bit)
- Anything needing compression / alpha / colour management
| scope | readers | editors | CLI |
|---|---|---|---|
| XBM / XPM (X Consortium) | ✓ ImageMagick · GIMP · libXpm · 老 X11 应用 · 早期 Mozilla / Netscape | ✓ 任意文本编辑器(本质是 C 源码)· GIMP 导出 · pixmap-tools | xbmtopbm · pamtoxpm · convert in.png out.xpm(ImageMagick) |
PCX — DOS 时代的痕迹
PCX — a fingerprint of the DOS era
"DOS Paintbrush 的产物,有过 BBS 时代的辉煌。"
"From DOS Paintbrush — once king of the BBS era."
1985 年 ZSoft 公司推出 PC Paintbrush —— 这是 DOS 时代最流行的画图程序;Microsoft 1990 年代收购 ZSoft 部分技术后,把 PC Paintbrush 改名 Windows Paintbrush 内置进 Windows 3.0(后来又改名 Paint)。配套的 PCX 格式有两个核心特性:简单 RLE 压缩(让 PCX 比 BMP 小一半)+ 可选 256-color 调色板放在文件末尾(这是个怪设计,也是 PCX 最大的工程特色)。1980 年代后期 BBS 时代是 PCX 的高光时刻 —— 那个年代上传 / 下载图片靠 9600 / 14400 baud 拨号调制解调器(几 KB/s),体积每减一半,下载时间就少一半。BBS 上传图片实际标准就是 PCX,跟 ZIP 套着发是常见组合。1990 年代初 GIF(1987)凭 LZW 压缩(更高压缩比 + 调色板更优)+ AOL / CompuServe 的推广迅速取代 PCX,JPEG(1992)再砍掉照片场景,PCX 跌出舞台。今天 PCX 几乎只在游戏考古(老 DOS 游戏的图形资产)和 fax 系统(早期 fax 软件用 PCX 做缓存)里出现;但 ImageMagick / GIMP 仍能读 .pcx 文件,这是格式生态学里"读得了但没人写"的典型案例。
In 1985 ZSoft Corporation launched PC Paintbrush — the most popular drawing program of the DOS era. Microsoft acquired some of ZSoft's tech in the early 1990s, renamed PC Paintbrush as Windows Paintbrush, and bundled it into Windows 3.0 (later renamed Paint). The accompanying PCX format had two core traits: simple RLE compression (making PCX about half the size of BMP) plus an optional 256-colour palette placed at the end of the file (a quirky design, and PCX's signature engineering trait). The late-1980s BBS era was PCX's golden age — that period's image transfers ran over 9600 / 14400 baud modems (a few KB/s), where halving file size meant halving download time. PCX was effectively the BBS image standard, often bundled inside ZIP archives. In the early 1990s GIF (1987) overtook PCX through LZW compression (better ratio + better palette handling) and AOL / CompuServe distribution; JPEG (1992) then claimed the photo niche, and PCX fell off the stage. Today PCX really only shows up in game archaeology (DOS-era game graphics) and fax systems (early fax software used PCX as an internal cache); ImageMagick / GIMP still read .pcx, a textbook case of "readable but no one writes" in format ecology.
技术内核
Technical core
PCX 内核两件事。① 简单 RLE 压缩 —— 像素数据每段读 1 byte:如果高 2 位是 11,低 6 位(1-63)就是 run length,下一 byte 是要重复的像素值;否则这 byte 本身就是单像素。这是极简 RLE,压缩率不高(漫画 / 大色块图能砍 50%,真彩照片几乎没效果),但解码代价低,DOS 上 8086 CPU 也能实时解。规范一句话写完。② 头部 128 byte + 像素数据 + 可选 256-color 调色板放在文件末尾 —— 这是 PCX 最大的工程特色,也是它怪的根源。原因:DOS 早期生成图像时是顺序写文件的(8086 上文件 seek 慢且不可靠),编码器边读屏幕像素边写 RLE,但还不知道会用到哪些颜色 —— 索引扩展时(偶尔出现新颜色)只能累计调色板,直到写完所有像素才能在文件末尾追加 769 byte 调色板(1 byte signature 0x0C 标记 + 256 × 3 byte RGB)。这是"流式编码 + 只能追加"硬约束下的产物,跟今天的 PNG / WebP 必须先决定调色板是完全相反的工程权衡。1980 年代 BBS 时代下载 PCX 的人其实经常解码到一半就显示出像素 —— 但调色板还没下载完,所以图像颜色是错的,直到下载完整个文件(包括末尾调色板),才能用正确颜色重绘。这就是 BBS 时代特有的"图像渐进显示但颜色慢慢校准"体验。今天的 progressive JPEG 是有意为之,PCX 的渐进显示其实是意外的副作用。
PCX's core, two pieces. ① Simple RLE — read 1 byte from the pixel stream: if its top two bits are 11, the lower six bits (1-63) give the run length and the next byte is the pixel value to repeat; otherwise the byte is itself a single pixel. Minimalist RLE — modest ratios (50% for cartoons / flat-colour images, near-nothing for photos), but cheap to decode (real-time on a DOS 8086 CPU). The whole spec fits in one sentence. ② 128-byte header + pixel data + optional 256-colour palette at the end of the file — PCX's signature engineering trait, and the source of all its quirkiness. Reason: in early DOS, file generation was sequential (8086 file seek was slow and unreliable). The encoder streamed screen pixels into RLE while writing the file, but didn't yet know the full set of colours used — palette accumulation could pick up a new colour at any time, and the palette could only be appended (769 bytes: 1-byte signature 0x0C + 256 × 3 bytes RGB) once all pixels had been written. It's the product of a "streaming encode, append-only" hard constraint — the opposite of today's PNG / WebP, which must commit a palette up front. BBS-era PCX downloaders frequently saw the image render mid-download — but with wrong colours, until the trailing palette finally arrived and a redraw fixed them. That's the BBS-specific experience: progressive image display with gradually-correcting colours. Modern progressive JPEG is intentional; PCX's progressive display was an accidental side effect.
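That one-sentence RLE spec really does translate to about ten lines; a sketch:

```python
def pcx_rle_decode(data: bytes, expected: int) -> bytes:
    """PCX RLE: a byte whose top two bits are 11 holds a run length
    (1-63) in its low six bits, and the next byte is the value to
    repeat; any other byte is a literal pixel."""
    out = bytearray()
    i = 0
    while len(out) < expected:
        b = data[i]
        i += 1
        if b & 0xC0 == 0xC0:                   # run marker
            out += bytes([data[i]]) * (b & 0x3F)
            i += 1
        else:                                  # literal pixel (values >= 0xC0
            out.append(b)                      # must be written as length-1 runs)
    return bytes(out)

# 0xC3 0xAA decodes to three 0xAA bytes, then 0x10 is a literal:
decoded = pcx_rle_decode(b"\xc3\xaa\x10", 4)
```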
适用
USE FOR
- (历史)DOS 时代图像 / PC Paintbrush 输出
- (历史)BBS 上传 / 下载图片
- (历史)早期 Windows 3.0 / 3.1 应用
- 游戏考古 / 老 DOS 游戏图形资产解码
- (legacy) DOS-era images / PC Paintbrush output
- (legacy) BBS image upload / download
- (legacy) early Windows 3.0 / 3.1 applications
- Game archaeology — DOS-era graphics asset decoding
反适用
AVOID
- 任何现代场景(用 PNG / WebP)
- 真彩照片(RLE 几乎无压缩 · 用 JPEG)
- 需要 alpha 通道(PCX 不支持透明)
- 需要色彩管理 / 嵌入 ICC profile
- Anything modern (use PNG / WebP)
- True-colour photos (RLE buys ~nothing — use JPEG)
- Anything needing alpha (PCX has no transparency)
- Workflows needing colour management or embedded ICC
| scope | readers | editors | CLI |
|---|---|---|---|
| PCX (ZSoft) | ✓ ImageMagick · GIMP · IrfanView · XnView · 老 DOS 软件 | ~ GIMP 导出 · ImageMagick · 现代编辑器很少原生写 PCX | convert in.pcx out.png(ImageMagick)· pcxtoppm(NetPBM) |
Sun Raster — 工作站老照片
Sun Raster — workstation snapshots
"SunOS 屏幕截图的格式 —— 现在只在博物馆里。"
"SunOS screenshot format — found only in museums."
1988 年 Sun Microsystems 推出 SPARCstation —— 那个年代 Unix 工作站的代名词,"the network is the computer" 的实体。SunOS 桌面环境(OpenWindows,基于 NeWS / X11 混合)需要一个标准截图格式,Sun 顺手定义了 Sun Raster:32-byte header(magic / width / height / depth / length / type / colormap type / colormap length)+ 可选 colormap + raw 或 byte-RLE 像素流。极简到一页规格能写完。当时 X11 + xv(著名的图像查看器)是 Sun 工作站圈子的实际标准 —— 写论文 / 投幻灯片 / 文档配图,大家都用 .ras。但走出 Sun 生态就没人认了:PC 那边是 BMP / GIF / PCX,Mac 那边是 PICT / TIFF,Sun Raster 是 Sun 圈内独有方言。1990 年代后期 SGI / HP / IBM 各家工作站逐渐被 Linux / PC 替代,Sun 自己 2009 年被 Oracle 收购,Sun Raster 就跟 SunOS 一起进入历史。今天 ImageMagick 仍能读 .ras,但写它的人几乎绝迹 —— 跟 PCX 一样的"读得了但没人写"格式 fossil。
In 1988 Sun Microsystems shipped the SPARCstation — the very face of Unix workstations in that era, the physical embodiment of "the network is the computer". SunOS's desktop environment (OpenWindows, a NeWS / X11 hybrid) needed a standard screenshot format, and Sun casually defined Sun Raster: a 32-byte header (magic / width / height / depth / length / type / colormap type / colormap length), an optional colormap, and a raw-or-byte-RLE pixel stream. Minimal enough to fit on a single page of spec. At the time X11 + xv (the legendary image viewer) was the workstation circle's de-facto stack — papers, slides, documentation figures, all shipped as .ras. But step outside the Sun ecosystem and no one knew the format: PC land had BMP / GIF / PCX, Mac land had PICT / TIFF, Sun Raster was a Sun-only dialect. In the late 1990s SGI / HP / IBM workstations gave way to Linux / PCs; Sun itself was acquired by Oracle in 2009, and Sun Raster slipped into history alongside SunOS. ImageMagick still reads .ras today, but writers have all but vanished — another "readable but no one writes" fossil, just like PCX.
技术内核
Technical core
Sun Raster 内核就一件事:32-byte header + 可选 colormap + raw 或 byte-RLE 像素。头部 8 个 32-bit big-endian 字段:magic(0x59A66A95)、width、height、depth(1/8/24/32)、length(像素数据字节数)、type(0=old / 1=standard / 2=byte-encoded RLE / 3=RGB / 4=TIFF / 5=IFF)、colormap type(0=none / 1=RGB / 2=raw)、colormap length。type 字段决定是 raw 还是 RLE:byte-encoded RLE 简单粗暴,跟 PCX RLE 一脉相承 —— 看到 0x80 byte 就开始 run-length 编码(0x80 = escape · 0x80 0x00 = 单个 0x80 字面量 · 0x80 N V = 重复 N+1 次 V)。整个格式没有 chunk 系统、没有 metadata、没有 EXIF、没有 ICC profile、没有 alpha(depth=32 时第 4 个 byte 是保留位通常不渲染)。这就是 1988 年工作站语境的设计:系统截图工具需要的是简单 + 快 + 跟 X server 内存布局对齐,其它都是包袱。Sun Raster 跟同期(也已死)Silicon Graphics 的 SGI RGB(.rgb / .sgi)、HP 的 PCL Raster 是同一类东西 —— 工作站厂商各自定义的"系统级图片格式",随着工作站本身退场而消亡。
Sun Raster's core is one thing: a 32-byte header + optional colormap + raw or byte-RLE pixels. The header has eight 32-bit big-endian fields: magic (0x59A66A95), width, height, depth (1 / 8 / 24 / 32), length (pixel byte count), type (0 = old / 1 = standard / 2 = byte-encoded RLE / 3 = RGB / 4 = TIFF / 5 = IFF), colormap type (0 = none / 1 = RGB / 2 = raw), colormap length. The type field decides raw versus RLE: byte-encoded RLE is brute-simple and cut from the same cloth as PCX RLE — when a 0x80 byte is seen, run-length kicks in (0x80 = escape · 0x80 0x00 = a literal 0x80 · 0x80 N V = N+1 copies of V). The format has no chunk system, no metadata, no EXIF, no ICC profile, no alpha (depth = 32 reserves the 4th byte but typically doesn't render it). That's the 1988 workstation mindset: a system screenshot tool wants simple + fast + aligned with the X server's framebuffer; everything else is overhead. Sun Raster sits next to Silicon Graphics' SGI RGB (.rgb / .sgi) and HP's PCL Raster as a class of "system-level image formats" defined by individual workstation vendors — and they all died with the workstations themselves.
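The whole container check is eight big-endian words; a sketch (dictionary keys are my own naming):

```python
import struct

RAS_MAGIC = 0x59A66A95

def parse_sun_raster(buf: bytes) -> dict:
    """Unpack the 32-byte Sun Raster header: eight 32-bit big-endian fields."""
    (magic, width, height, depth,
     length, ras_type, cmap_type, cmap_len) = struct.unpack(">8I", buf[:32])
    if magic != RAS_MAGIC:
        raise ValueError("not a Sun Raster file")
    return {
        "width": width, "height": height,
        "depth": depth,               # 1 / 8 / 24 / 32
        "rle": ras_type == 2,         # 2 = byte-encoded RLE
        "colormap_bytes": cmap_len,   # colormap bytes following the header
    }

# A hypothetical 640x480 8-bit RLE image with a 256-entry RGB colormap:
hdr = struct.pack(">8I", RAS_MAGIC, 640, 480, 8, 640 * 480, 2, 1, 768)
info = parse_sun_raster(hdr)
```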
适用
USE FOR
- (已死)SunOS / OpenWindows 屏幕截图
- (已死)1990s X11 + xv 工作站论文配图
- 计算机历史考古 · 老 Sun 实验室资产解码
- 博物馆数字化项目
- (dead) SunOS / OpenWindows screenshots
- (dead) 1990s X11 + xv workstation paper figures
- Computing-history archaeology — decoding old Sun lab assets
- Museum digitisation projects
反适用
AVOID
- 任何现代场景(用 PNG / WebP)
- 需要 alpha / metadata / 色彩管理
- 非 Unix 工作站平台
- 需要现代浏览器或编辑器原生支持
- Anything modern (use PNG / WebP)
- Anything needing alpha / metadata / colour management
- Non-Unix-workstation platforms
- Anything needing native browser or editor support
| scope | readers | editors | CLI |
|---|---|---|---|
| Sun Raster (Sun Microsystems) | ✓ ImageMagick(legacy)· NetPBM · 老 xv / xli · libsun-raster | ~ ImageMagick 还能写 · GIMP 早期版本支持读 · 几乎无现代写入器 | convert in.ras out.png(ImageMagick)· rasttopnm(NetPBM) |
IFF / ILBM — Amiga 的传家宝
IFF / ILBM — Amiga's family heirloom
"chunk 容器思想的祖宗 —— PNG 都受它影响。"
"The grandfather of chunk-based containers — even PNG owes it credit."
1985 年 Commodore 推出 Amiga 1000 —— 那台领先时代 5 年的多媒体个人电脑(custom 芯片组 Agnus / Denise / Paula 同时跑图形 / 音频 / DMA,自定义协处理器堆出 4096 色 HAM 模式 / 4 路立体声 8-bit PCM,1985 年的硬件配置直到 1990s 中期 PC 才追上)。EA(Electronic Arts)的工程师 Jerry Morrison 为 Amiga 设计了 IFF(Interchange File Format)—— 一个"通用的多媒体容器":每个 chunk 是 4-byte ASCII ID + 4-byte big-endian length + payload,FORM 是顶层 chunk(描述具体类型如 ILBM = Interleaved BitMap),内嵌 BMHD(图像头)、CMAP(调色板)、BODY(像素数据)等子 chunk。主流 image type 是 ILBM,按 bitplane 而非 packed pixels 存储 —— 6 张 1-bit bitmap 叠加表达 6-bit 色,正好对应 Amiga 显存的 bitplane 硬件布局。这套 chunk 容器思想后来直接影响了 PNG / WebP / RIFF / AIFF / ISOBMFF / MP4 —— 你今天用过的几乎所有 chunk-based 文件格式都欠 IFF 一个 credit。Amiga 1994 年破产被 Commodore 拖死,IFF 跟着退场到复古圈子,但它的设计 DNA仍活在你每天用的格式里。
In 1985 Commodore launched the Amiga 1000 — a multimedia PC five years ahead of its time (custom chipset Agnus / Denise / Paula running graphics / audio / DMA in parallel; coprocessors stacking up to 4096-colour HAM mode and four-channel 8-bit stereo PCM; 1985 hardware specs PCs only caught up to in the mid-1990s). EA's Jerry Morrison designed IFF (Interchange File Format) for Amiga — a "universal multimedia container": every chunk is a 4-byte ASCII ID + 4-byte big-endian length + payload; FORM is the top-level chunk (declaring the concrete type, e.g. ILBM = Interleaved BitMap), with sub-chunks like BMHD (image header), CMAP (palette), BODY (pixel data). The dominant image type is ILBM, stored as bitplanes rather than packed pixels — six 1-bit bitmaps stacked to represent 6-bit colour, matching Amiga's bitplane framebuffer layout exactly. This chunk-container idiom went on to shape PNG / WebP / RIFF / AIFF / ISOBMFF / MP4 — almost every chunk-based file format you touch today owes IFF a credit. Amiga went down with Commodore in 1994 and IFF retreated to the retro scene, but its design DNA still lives in formats you use every day.
技术内核
Technical core
IFF / ILBM 内核两件事。① chunk = 4-byte ASCII ID + 4-byte big-endian length + payload(payload 末尾 padding 到偶数对齐)。这套语法极其简洁:解码器读 8 byte 就知道这个 chunk 是什么、有多大、跳到哪;不认识的 chunk 直接 skip 不报错 —— 前向兼容靠这个机制实现。FORM / LIST / CAT 是几个特殊容器 chunk(payload 内嵌其它 chunk),其它的 BMHD / CMAP / BODY / GRAB / CRNG / CCRT / ANNO / AUTH 等都是 leaf chunk。整套机制后来被 PNG 抄了:PNG 的 IHDR / PLTE / IDAT / IEND chunk 系统(4-byte length + 4-byte type + data + 4-byte CRC)就是 IFF chunk 加了一个 CRC 校验字段。WebP 用 RIFF(IFF 的 little-endian 变种)更是直接继承。② ILBM 按 bitplane 而非 packed pixels 存储 —— 一行 320 像素 6-bit 颜色,packed pixels 存法是 320 × 6 bit / 8 = 240 byte,一行像素值连续;ILBM 存法是 6 张 320-bit(40-byte)bitmap 交错,每张 bitmap 上一个像素位置只放该像素值的某一位。这种"诡异"布局不是为了压缩,是为了跟 Amiga Denise 芯片的 bitplane DMA 硬件直接对齐 —— Denise 在每个像素时钟从 6 张 bitmap 同时取一位组成 6-bit 索引,然后查 CMAP 调色板得到 RGB。这是 1985 年定制硬件 + 文件格式协同设计的范例,跟 GIF 的 LZW(为通用 CPU 设计)完全不同的工程方向。今天 ILBM 只在 Amiga 模拟器(WinUAE / FS-UAE)和老游戏(Lemmings / Defender of the Crown / Shadow of the Beast)资产解码里用得到 —— 但 chunk-container 思想已经统治了一切。
IFF / ILBM's core, two pieces. ① chunk = 4-byte ASCII ID + 4-byte big-endian length + payload (payload padded to even-byte alignment). Brutally simple: read 8 bytes and the decoder knows what the chunk is, how big, and where to jump; unknown chunks are silently skipped — forward compatibility falls out of this mechanism. FORM / LIST / CAT are the special container chunks (their payload nests other chunks); everything else (BMHD / CMAP / BODY / GRAB / CRNG / CCRT / ANNO / AUTH ...) is a leaf chunk. PNG copied the idiom wholesale: PNG's IHDR / PLTE / IDAT / IEND system (4-byte length + 4-byte type + data + 4-byte CRC) is essentially IFF chunks plus a CRC field. WebP, built on RIFF (the little-endian variant of IFF), inherits even more directly. ② ILBM stores bitplanes rather than packed pixels — a 320-pixel row at 6-bit colour, packed-pixel storage is 320 × 6 bit / 8 = 240 bytes (pixel values laid out contiguously); ILBM stores it as six interleaved 320-bit (40-byte) bitmaps, each bitmap holding one bit of the pixel index. The "weird" layout isn't for compression — it's aligned with the Amiga Denise chip's bitplane DMA hardware: every pixel clock Denise reads one bit from each of the six bitmaps in parallel to assemble a 6-bit index, then a CMAP lookup gives RGB. A perfect 1985 example of custom hardware co-designed with the file format — the opposite engineering direction from GIF's LZW (designed for general-purpose CPUs). Today ILBM lives only in Amiga emulators (WinUAE / FS-UAE) and legacy game asset decoding (Lemmings / Defender of the Crown / Shadow of the Beast) — but the chunk-container idea has gone on to dominate everything.
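An eight-byte read per chunk is genuinely all a walker needs; a sketch (the chunk IDs in the demo are real IFF IDs, the payload bytes are invented):

```python
import struct

def iff_chunks(data: bytes):
    """Yield (id, payload) pairs: 4-byte ASCII ID + 4-byte big-endian
    length + payload, with payloads padded to even alignment."""
    offset = 0
    while offset + 8 <= len(data):
        cid = data[offset:offset + 4].decode("ascii")
        (size,) = struct.unpack(">I", data[offset + 4:offset + 8])
        yield cid, data[offset + 8:offset + 8 + size]
        offset += 8 + size + (size & 1)    # skip the pad byte on odd sizes

def make_chunk(cid: bytes, payload: bytes) -> bytes:
    pad = b"\0" if len(payload) & 1 else b""
    return cid + struct.pack(">I", len(payload)) + payload + pad

# A toy FORM ILBM: an (invented) BMHD payload and a 1-entry CMAP.
# The FORM payload starts with its 4-byte type, then nested chunks.
body = (b"ILBM"
        + make_chunk(b"BMHD", b"\x01\x40\x00\xc8")
        + make_chunk(b"CMAP", b"\xff\x00\x00"))
data = make_chunk(b"FORM", body)
(form_id, form_payload), = iff_chunks(data)
inner = [cid for cid, _ in iff_chunks(form_payload[4:])]
```

Unknown IDs fall out of the loop untouched, which is exactly the skip-what-you-don't-know forward compatibility described above.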
适用
USE FOR
- (已死)Amiga 应用程序图像 / 游戏资产
- Amiga 模拟器(WinUAE / FS-UAE)
- 计算机历史 / 复古游戏考古
- 研究 chunk-container 设计的"原型样本"
- (dead) Amiga application images / game assets
- Amiga emulators (WinUAE / FS-UAE)
- Computing history / retro game archaeology
- Studying chunk-container design from its prototype
反适用
AVOID
- 任何现代场景(用 PNG / WebP)
- 非 Amiga 平台原生显示
- 需要现代浏览器或社交媒体支持
- 大色深 / 真彩照片(bitplane 布局对 24-bit 不友好)
- Anything modern (use PNG / WebP)
- Native display outside Amiga platforms
- Anything needing modern browser / social-media support
- High-depth / true-colour photos (bitplane layout fights 24-bit)
| scope | readers | editors | CLI |
|---|---|---|---|
| IFF / ILBM (EA · Jerry Morrison) | ✓ ImageMagick · 部分 GIMP 版本 · WinUAE / FS-UAE · libilbm | ~ GIMP(部分版本)· DPaint(原生 Amiga)· ImageMagick 转换 | iffinfo(legacy)· ilbmtoppm / ppmtoilbm(NetPBM) |
QOI — 现代极简主义
QOI — modern minimalism
"一个人,一个周末,写了一个比 PNG 简单 100 倍的格式。"
"One person, one weekend, made a format 100× simpler than PNG."
2021 年 Dominic Szablewski(phoboslab,知名 JS 游戏引擎 Impact / Q1K3 系列作者)做了一个反思:"为什么 PNG 这么复杂?LZ77 + Huffman + 5 种 filter + zlib 包装 + chunks 系统 + CRC32?能不能用一个周末写一个'够用'的无损图片格式?"答案是 QOI(Quite OK Image format):6 个简单 op(RGB / RGBA / INDEX / DIFF / LUMA / RUN),编码器和解码器各 ~300 行 C 代码,速度比 PNG 快 50× 编码 / 3-4× 解码,体积大 5-10%。spec 一页 PDF 印得下。Dominic 把项目发到 Hacker News 后排第一,Reddit / Twitter 疯转,一周内 ffmpeg / ImageMagick / Rust crates / Go libraries 就接入了 QOI,phoboslab/qoi 单 repo 拿到 7000+ star。这是格式生态学里少见的"个人作品 vs 工业标准"现象 —— QOI 不会替代 PNG(浏览器零支持 + 体积更大),但它证明了"PNG 的复杂度其实很多是历史包袱,90% 用例不需要"。同时代 Farbfeld(2014 suckless)是同类哲学的另一个尝试,两者并存形成"现代极简主义"小流派。
In 2021 Dominic Szablewski (phoboslab, well-known author of the Impact JS engine and the Q1K3 game series) asked a sharp question: "Why is PNG so complex? LZ77 + Huffman + five filter types + zlib wrapping + chunks + CRC32 — could you write a 'good enough' lossless image format in a weekend?" The answer was QOI (Quite OK Image format): six simple ops (RGB / RGBA / INDEX / DIFF / LUMA / RUN), encoder and decoder ~300 lines of C each, encoding ~50× faster and decoding 3-4× faster than PNG, files 5-10% larger. The spec fits on a one-page PDF. Dominic posted it on Hacker News and hit #1; Reddit / Twitter blew up; within a week ffmpeg / ImageMagick / Rust crates / Go libraries had QOI support, and phoboslab/qoi crossed 7000+ stars. A rare format-ecology event: a personal project versus an industrial standard — QOI won't displace PNG (zero browser support + larger files), but it proved "much of PNG's complexity is historical baggage; 90% of use cases don't need it." Farbfeld (2014, suckless) is a contemporary attempt in the same vein; the two coexist as the "modern minimalism" mini-school.
(r·3 + g·5 + b·7 + a·11) % 64,极简但实测冲突率合理。
(r·3 + g·5 + b·7 + a·11) % 64 is dead simple but has a reasonable collision rate in practice.
技术内核
Technical core
QOI 内核四件事。① 极简 6 个 op:QOI_OP_RGB(prefix 8-bit + 3 byte 像素)、QOI_OP_RGBA(prefix 8-bit + 4 byte 像素)、QOI_OP_INDEX(prefix 2-bit + 6-bit 引用最近 64 像素 hash 表)、QOI_OP_DIFF(prefix 2-bit + 每通道 2-bit、范围 -2..+1 的 RGB 颜色差)、QOI_OP_LUMA(prefix 2-bit + 6-bit 绿色通道差,再跟 1 byte 装 dr−dg / db−dg 各 4-bit)、QOI_OP_RUN(prefix 2-bit + 6-bit run 长度 1-62)。整个码本就 6 个 op,跟 PNG 的 LZ77 + Huffman 比起来连"压缩"都谈不上 —— QOI 是用结构化预测(相邻像素相同 → RUN;相邻像素差落在小范围 → DIFF / LUMA;最近 64 像素曾出现 → INDEX)避免重复传输,LZ77 才是真正的字典压缩。② 编码 / 解码各只 ~300 行 C 代码 —— phoboslab 把整个 reference 实现做成单文件 header-only library `qoi.h`,500 行不到包括两套 API。对比之下 libpng + zlib 加起来 100k+ 行。③ 速度比 PNG 快 ~50×(编码)/ ~3-4×(解码)—— 因为没有 LZ77 字典查找、没有 Huffman 自适应、没有 5 种 filter 自适应预测;每个像素就是 1-5 byte 的 op + payload,编码器一遍扫,解码器也一遍扫,cache friendly 到极致。④ 体积比 PNG 大 ~5-10%—— 这是 QOI 的代价。在嵌入式 / 教学 / 不能依赖大型 codec lib 的场景里这点体积差完全可以接受,但 web 上下行带宽贵,所以 QOI 永远不会替代 PNG。Hacker News 上很多人不理解这一点 —— "为什么浏览器不集成?" 因为 web 是体积敏感的,QOI 是 CPU / 复杂度敏感的,目标用户群完全不同。
QOI's core, four pieces. ① Six minimal ops: QOI_OP_RGB (8-bit prefix + 3-byte pixel), QOI_OP_RGBA (8-bit prefix + 4-byte pixel), QOI_OP_INDEX (2-bit prefix + 6-bit reference into a hash table of the last 64 pixels), QOI_OP_DIFF (2-bit prefix + three 2-bit per-channel deltas in -2..+1), QOI_OP_LUMA (2-bit prefix + 6-bit green delta, followed by one byte holding dr−dg / db−dg as 4-bit fields), QOI_OP_RUN (2-bit prefix + 6-bit run length 1-62). The whole codebook is six ops — compared to PNG's LZ77 + Huffman, you can barely call QOI "compression": it uses structured prediction (adjacent pixel identical → RUN; delta within a small range → DIFF / LUMA; one of the last 64 → INDEX) to avoid retransmitting redundant data; LZ77 is true dictionary compression. ② Encoder / decoder each ~300 lines of C — phoboslab ships the whole reference implementation as a header-only single file `qoi.h`, under 500 lines including both APIs. Compare libpng + zlib together: 100k+ lines. ③ ~50× faster encoding, ~3-4× faster decoding than PNG — no LZ77 dictionary lookup, no adaptive Huffman, no five-way filter prediction; each pixel becomes a 1-5 byte op + payload; the encoder scans once, the decoder scans once, cache-friendly to the extreme. ④ Files ~5-10% larger than PNG — QOI's cost. Acceptable in embedded / teaching / no-big-codec-lib scenarios; unacceptable on the web, where downlink bandwidth is expensive — which is why QOI will never displace PNG. Many on Hacker News didn't get this — "why don't browsers integrate it?" Because the web is size-sensitive while QOI is CPU- and complexity-sensitive; their target audiences barely overlap.
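To make the hash table and run detection concrete, here is a toy encoder covering only the RUN / INDEX / RGB subset of the six ops; it emits symbolic op tuples rather than real bytes, so the reference `qoi.h` remains the authoritative implementation:

```python
def qoi_hash(r, g, b, a=255):
    # The spec's index-table position for a pixel.
    return (r * 3 + g * 5 + b * 7 + a * 11) % 64

def encode_ops(pixels):
    """Emit a symbolic op list for a sequence of (r, g, b) pixels,
    using only the RUN / INDEX / RGB subset of QOI's six ops."""
    index = [None] * 64
    prev, run, ops = (0, 0, 0), 0, []
    for px in pixels:
        if px == prev:
            run += 1
            if run == 62:              # RUN length saturates at 62
                ops.append(("RUN", 62)); run = 0
            continue
        if run:
            ops.append(("RUN", run)); run = 0
        h = qoi_hash(*px)
        if index[h] == px:
            ops.append(("INDEX", h))   # seen recently: 1 byte instead of 4
        else:
            index[h] = px
            ops.append(("RGB", px))
        prev = px
    if run:
        ops.append(("RUN", run))
    return ops

red = (255, 0, 0)
ops = encode_ops([red, red, red, (0, 0, 255), red])
print(ops)  # [('RGB', (255, 0, 0)), ('RUN', 2), ('RGB', (0, 0, 255)), ('INDEX', 50)]
```

Note the op priority (RUN before INDEX before a literal) matches the reference encoder's order of checks; adding DIFF / LUMA would slot in between INDEX and the literal.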
适用
USE FOR
- 嵌入式 / IoT(SRAM 紧张,装不下 libpng)
- 教学场景 · 让学生 1 周内写完一个无损图片格式
- 游戏内部资产(载入速度比体积重要)
- 命令行工具临时缓存(qoi 比 ppm / bmp 都好)
- Embedded / IoT (tight SRAM, no room for libpng)
- Teaching — students can write a complete lossless format in a week
- Game internal assets (load speed beats file size)
- CLI tool intermediate caches (better than ppm / bmp)
反适用
AVOID
- Web(浏览器零支持 + 体积比 PNG 大)
- 需要 progressive / interlace / 多帧动画
- 需要色彩管理 / EXIF / ICC profile
- 对体积敏感的存档场景(用 PNG / WebP)
- Web (zero browser support + larger than PNG)
- Anything needing progressive / interlaced / multi-frame
- Anything needing colour management / EXIF / ICC profiles
- Size-sensitive archival (use PNG / WebP)
| scope | readers | editors | CLI |
|---|---|---|---|
| QOI (phoboslab) | ✓ ImageMagick · ffmpeg · Rust qoi crate · Go qoi · Python qoi-py · phoboslab/qoi.h | ~ ImageMagick / GIMP(插件)/ ffmpeg 转换 · 浏览器零支持 | qoiconv in.png out.qoi · convert in.qoi out.png |
Farbfeld — suckless 的 16 字节头
Farbfeld — suckless's 16-byte header
"16 byte 头 + 16-bit BE RGBA,可被 gzip 替你压缩。"
"16-byte header + 16-bit BE RGBA — let gzip do the compression for you."
2014 年 suckless.org —— 那个推崇 dwm / dmenu / st / surf 的极简主义社区(口号"software that sucks less",拒绝任何"非必要"的功能)—— 做了 Farbfeld:一个完全无压缩的图像格式,16 byte 头部 + raw 16-bit big-endian RGBA。整个 spec 文档总共 11 行,比 PNG spec(200+ 页)短了好几个数量级。设计哲学就一句话:"压缩是 gzip 的事,不是图像格式的事。"于是 Unix 管道用得很好:png2ff in.png | gzip > out.ff.gz 就能存档,cat in.ff.gz | gunzip | ff2png > out.png 就能恢复;每个工具只做一件事(do one thing well),完全是 Unix 哲学的复刻。学术 / suckless 圈子里这种极简哲学很受欢迎,但生产几乎无人用 —— 没有压缩(gzip 替代品)、没有 alpha 行为标准、没有 metadata、没有色彩管理、没有 progressive。Farbfeld 跟同时代的 QOI(2021)是"现代极简主义"双子星:QOI 是"六个 op + ~300 行解码器",Farbfeld 是"16 byte 头 + 0 行解码器(就是 raw bytes)"—— 走得比 QOI 更远,但也更不实用。
In 2014 suckless.org — the minimalism community behind dwm / dmenu / st / surf, motto "software that sucks less", famous for refusing anything "non-essential" — released Farbfeld: a fully uncompressed image format, 16-byte header + raw 16-bit big-endian RGBA. The whole spec document is 11 lines — orders of magnitude shorter than PNG's 200+ page spec. The design philosophy is one sentence: "compression is gzip's job, not the image format's." So Unix pipes work beautifully: png2ff in.png | gzip > out.ff.gz for archive, cat in.ff.gz | gunzip | ff2png > out.png to restore; each tool does one thing well — pure Unix philosophy reimplemented. The academic / suckless circle adores this minimalism, but production usage is essentially zero — no compression (gzip stands in), no defined alpha semantics, no metadata, no colour management, no progressive. Farbfeld and the contemporaneous QOI (2021) are the twin stars of "modern minimalism": QOI is "six ops + ~300-line decoder", Farbfeld is "16-byte header + zero-line decoder (literally raw bytes)" — going further than QOI, and even less practical.
技术内核
Technical core
Farbfeld 内核两件事。① 头部固定 16 byte:8 byte ASCII magic "farbfeld"(注意是小写,跟 PNG 0x89PNG 不同 —— suckless 觉得 magic byte 里塞 high-bit 是过度工程)+ 4 byte big-endian uint32 width + 4 byte big-endian uint32 height。头部再无其它字段 —— 任何扩展应该靠 gzip 包装外面的 metadata 文件,而不是格式内部。② 像素流 16-bit big-endian RGBA × (width × height),完全无压缩 —— 设计哲学是"压缩是 gzip 的事,不是图像格式的事"。所以一张 1920×1080 的 Farbfeld 文件原始大小就是 16 + 1920 × 1080 × 8 ≈ 16.59 MB,gzip 之后大概 2-5 MB(取决于内容);PNG 同图像大概 200 KB-2 MB,WebP 大概 100 KB-1 MB。Farbfeld 在体积上完全打不过 —— 但它的 spec 短(11 行)、解码器短(0 行,因为就是 raw bytes,memcpy 就完事)、跟 Unix 管道完美兼容(每个 pipeline stage 只做一件事:png2ff 转入,gzip 压缩,网络传输,gunzip 解压,ff2png 转出)。这是哲学上的图像格式,不是工程上的图像格式。生产用不了,但教 Unix 哲学课的时候完美样本。
Farbfeld's core, two pieces. ① Fixed 16-byte header: 8 bytes of ASCII magic "farbfeld" (note lowercase — different from PNG's 0x89PNG; suckless considers high-bit-in-magic over-engineering) + 4-byte big-endian uint32 width + 4-byte big-endian uint32 height. Nothing else — any extension should be a separate metadata file wrapped in the same gzip, not inside the format. ② Pixel stream: 16-bit big-endian RGBA × (width × height), completely uncompressed — the design philosophy is "compression is gzip's job, not the image format's." So a 1920×1080 Farbfeld file is literally 16 + 1920 × 1080 × 8 ≈ 16.59 MB raw; gzip takes it down to 2-5 MB depending on content; PNG of the same image is ~200 KB-2 MB; WebP ~100 KB-1 MB. Farbfeld loses on size every time — but its spec is short (11 lines), its decoder is zero lines (raw bytes, memcpy and you're done), and it's perfect with Unix pipelines (each stage does one thing: png2ff converts in, gzip compresses, network transfers, gunzip decompresses, ff2png converts out). It's a philosophical image format, not an engineering one. Useless in production, perfect for teaching Unix philosophy.
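The format is simple enough that a complete round-trip codec fits in a few lines of Python; this sketch follows the layout described above (16-byte header, 16-bit BE RGBA):

```python
import struct

def ff_encode(width, height, pixels):
    """pixels: list of (r, g, b, a) with 16-bit values. Returns farbfeld bytes:
    8-byte magic + BE uint32 width + BE uint32 height + 16-bit BE RGBA stream."""
    out = b"farbfeld" + struct.pack(">II", width, height)
    for px in pixels:
        out += struct.pack(">4H", *px)
    return out

def ff_decode(buf):
    assert buf[:8] == b"farbfeld", "bad magic"
    w, h = struct.unpack(">II", buf[8:16])
    pixels = [struct.unpack(">4H", buf[16 + i * 8:24 + i * 8]) for i in range(w * h)]
    return w, h, pixels

pix = [(65535, 0, 0, 65535), (0, 65535, 0, 65535)]  # red, green at full 16-bit
data = ff_encode(2, 1, pix)
print(len(data))  # 16-byte header + 2 pixels × 8 bytes = 32
assert ff_decode(data) == (2, 1, pix)
```

Piping `data` through gzip is the format's entire compression story, exactly as the spec intends.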
适用
USE FOR
- Unix 管道临时缓存(`png2ff | gzip`)
- suckless 哲学的练习 / 教学样本
- 极简图像处理工具链(每段 pipeline 各司其职)
- 科研里需要完全确定字节布局的场景
- Unix pipeline intermediate caches (`png2ff | gzip`)
- suckless-philosophy exercises / teaching samples
- Minimalist image-processing toolchains (one job per stage)
- Research scenarios needing fully deterministic byte layout
反适用
AVOID
- 几乎任何实际场景(用 PNG / WebP / AVIF)
- Web(浏览器零支持 + 体积巨大)
- 需要 metadata / 色彩管理 / EXIF
- 需要 8-bit RGB(Farbfeld 强制 16-bit RGBA,常见 8-bit 输入要 expand 一倍体积)
- Almost any real-world scenario (use PNG / WebP / AVIF)
- Web (zero browser support + massive size)
- Anything needing metadata / colour management / EXIF
- Common 8-bit RGB input (Farbfeld forces 16-bit RGBA — doubles the size)
| scope | readers | editors | CLI |
|---|---|---|---|
| Farbfeld (suckless.org) | ✓ suckless farbfeld utils · NetPBM(部分版本)· ImageMagick | ~ 任意能写 16-bit BE RGBA 的工具(几行 C 就能写) | png2ff · ff2png · jpg2ff · pamtoff(NetPBM) |
GeoTIFF — 卫星图像的字面"地球图"
GeoTIFF — TIFF that literally maps Earth
"TIFF 加 6 个 tag,就成了卫星图像的字面意思的'地球图'。"
"Six extra tags turn TIFF into a literal 'image of Earth'."
1990 年代初,卫星遥感(Landsat、SPOT)生成的图像数量爆炸式增长,带来一个共性问题:像素本身只是亮度,但图像必须能告诉下游"第 (1024, 768) 个像素在地球上是哪一点经纬度、用什么投影、用什么 datum"——否则它就只是一张漂亮的灰阶,做不了 GIS 分析。USGS 联合一批遥感机构在 TIFF 6.0 之上加了 6 个核心 GeoKey,把"像素 → 大地坐标"的映射元数据标准化,并在 1995 年发布 GeoTIFF 1.0 规范。OGC 在 2019 年把它升级为 GeoTIFF 1.1 国际标准。结果是:卫星图像、航空摄影、数字高程模型(DEM)、土地利用图全部默认 GeoTIFF;GDAL(几乎所有 GIS 软件的底层 I/O 库)、QGIS、ArcGIS 是工具链命脉;NASA Worldview、Sentinel Hub、Google Earth Engine 内部也走 GeoTIFF。
By the early 1990s, satellite remote sensing (Landsat, SPOT) was producing images at industrial scale, all sharing one problem: a pixel is just a brightness value, but the image must tell downstream "where on Earth is pixel (1024, 768)? in what projection? on what datum?" — otherwise it's just a pretty greyscale, useless for GIS analysis. USGS and a coalition of remote-sensing agencies layered six core GeoKeys on top of TIFF 6.0 to standardise the "pixel → geographic coordinate" mapping metadata, and shipped the GeoTIFF 1.0 specification in 1995. OGC promoted it to international standard GeoTIFF 1.1 in 2019. The result: satellite imagery, aerial photography, digital elevation models (DEMs) and land-use maps all default to GeoTIFF; GDAL (the I/O backbone of nearly every GIS app), QGIS and ArcGIS form the toolchain spine; NASA Worldview, Sentinel Hub and Google Earth Engine all consume GeoTIFF internally.
ModelPixelScale + ModelTiepoint 把任意像素 (col, row) 映射到 (lat, lon) 在指定 datum / projection 上的真实地理坐标;右:地球经纬网格代表"像素终点"。整套机制对 TIFF 阅读器完全向后兼容 —— 不认识 GeoKey 的工具仍能把 .tif 当普通 TIFF 打开。
ModelPixelScale + ModelTiepoint map any pixel (col, row) to (lat, lon) on a chosen datum / projection; right: a lat/lon globe grid stands for the "pixel destination". The whole scheme is fully backward-compatible — a TIFF reader that doesn't know GeoKey can still open the .tif as a plain TIFF.
技术内核
Technical core
GeoTIFF 内核三件事。① 6 个核心 GeoKey tag:ModelTransformationTag(4×4 仿射矩阵,完整描述像素到地理坐标的线性变换)/ ModelTiepointTag(若干"控制点对",每对是像素位置 ↔ 地理坐标,适合非线性场景)/ ModelPixelScaleTag(每像素代表多少经纬度 / 米)/ GeoKeyDirectoryTag(主索引,记录所有 GeoKey 的 ID + value 偏移)/ GeoDoubleParamsTag(浮点参数表)/ GeoAsciiParamsTag(ASCII 字符串表,存投影名 / datum 名)。它们寄生在 TIFF 私有 tag 域:Model 系列占 33550 / 33922 / 34264,GeoKey 三件套占 34735-34737,因此对不识别这些 tag 的 TIFF 阅读器完全向后兼容 —— 这是 GeoTIFF 设计最聪明的地方。② 像素 → 地理坐标映射:典型组合是 ModelTiepoint(标定原点)+ ModelPixelScale(每像素单位),引擎按 lon = tiepoint.lon + col × scale.x / lat = tiepoint.lat - row × scale.y 反算;复杂场景上 ModelTransformationTag 直接给 4×4 矩阵。Datum / 投影通过 EPSG 代码引用(EPSG:4326 = WGS84,EPSG:3857 = Web Mercator)。③ 多波段(multi-band):一张 GeoTIFF 通常不止 RGB,Landsat 8 有 11 个 band(可见光 + 近红外 NIR + 短波红外 SWIR + 热红外 + 全色 panchromatic),Sentinel-2 有 13 个,科学家用 NIR - Red 算 NDVI(归一化植被指数)、SWIR 看含水量、Thermal 看地表温度。这些 band 全部走 TIFF 标准的 SamplesPerPixel + BitsPerSample 机制存,互不打扰。
GeoTIFF's core, three pieces. ① Six core GeoKey tags: ModelTransformationTag (a 4×4 affine matrix giving the full linear pixel → coordinate transform) / ModelTiepointTag (control-point pairs, each pixel position ↔ geographic coordinate, for non-linear cases) / ModelPixelScaleTag (lat/lon or metres per pixel) / GeoKeyDirectoryTag (the master index — every GeoKey's ID + value offset) / GeoDoubleParamsTag (floating-point parameter table) / GeoAsciiParamsTag (ASCII string table for projection / datum names). They live in TIFF private tag slots (the Model tags at 33550 / 33922 / 34264, the GeoKey trio at 34735–34737), so any TIFF reader that doesn't recognise them is fully backward-compatible — the cleverest part of the design. ② Pixel → coordinate mapping: the typical combination is ModelTiepoint (anchor) + ModelPixelScale (per-pixel unit) — the engine inverts lon = tiepoint.lon + col × scale.x / lat = tiepoint.lat - row × scale.y; for complex cases ModelTransformationTag carries a 4×4 matrix directly. Datum / projection are referenced via EPSG codes (EPSG:4326 = WGS84, EPSG:3857 = Web Mercator). ③ Multi-band: a GeoTIFF rarely stops at RGB — Landsat 8 has 11 bands (visible + NIR + SWIR + thermal IR + panchromatic), Sentinel-2 has 13; scientists compute NDVI (vegetation index) from NIR - Red, see water content with SWIR, surface temperature with thermal IR. All bands ride the standard TIFF SamplesPerPixel + BitsPerSample mechanism, no extra plumbing.
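The tiepoint + scale inversion in ② is one line of arithmetic. A sketch with hypothetical values (a scene anchored at 10°E 50°N, 0.01° per pixel; in real files these numbers come from tags 33922 / 33550, and in practice you would read them with rasterio or GDAL rather than hand-rolling):

```python
def pixel_to_lonlat(col, row, tiepoint, scale):
    """Invert the common ModelTiepoint + ModelPixelScale combination:
    tiepoint anchors pixel (0, 0) at (lon0, lat0); scale is degrees per pixel.
    Latitude decreases as row grows because rasters are stored top-down."""
    lon0, lat0 = tiepoint
    sx, sy = scale
    return lon0 + col * sx, lat0 - row * sy

# Hypothetical scene: upper-left corner at 10°E 50°N, 0.01 degrees per pixel.
lon, lat = pixel_to_lonlat(100, 200, tiepoint=(10.0, 50.0), scale=(0.01, 0.01))
print(lon, lat)  # 11.0 48.0
```

ModelTransformationTag generalises this to a full affine matrix, but the anchor-plus-scale case above covers the overwhelming majority of north-up imagery.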
适用
USE FOR
- 卫星图像(Landsat / Sentinel / SPOT / 高分系列)
- 航空摄影 / 无人机正射影像
- 数字高程模型(DEM / DSM / DTM)
- 土地利用图 / 植被覆盖图 / 气象格点
- Cloud Optimized GeoTIFF(COG)做云端流式读
- Satellite imagery (Landsat / Sentinel / SPOT / Gaofen)
- Aerial photography / UAV orthophotos
- Digital elevation models (DEM / DSM / DTM)
- Land-use / vegetation / weather grid maps
- Cloud Optimized GeoTIFF (COG) for streaming from object storage
反适用
AVOID
- 任何非地理场景(普通照片用 JPEG / WebP)
- Web 直接展示(浏览器不解 GeoTIFF · 需服务端切瓦片)
- 对 metadata 不敏感的临时图像处理
- Anything non-geographic (use JPEG / WebP for ordinary photos)
- Direct browser display (no native GeoTIFF support — serve tiles via WMS / XYZ instead)
- Throwaway image work that doesn't care about metadata
| scope | readers | editors | CLI |
|---|---|---|---|
| GeoTIFF (OGC) | ✓✓ GDAL · QGIS · ArcGIS · ENVI · ERDAS · rasterio (Python) · sf / terra (R) · OpenLayers / Leaflet (经服务端切瓦片) | ✓ QGIS · ArcGIS · GlobalMapper · 任意基于 GDAL 的 GIS 软件 | gdalinfo in.tif · gdal_translate · gdalwarp · rio info |
NITF — 军用情报图像
NITF — military intelligence imagery
"军方版 GeoTIFF,加上一堆 'security classification' 标记。"
"Military GeoTIFF plus a pile of security classification tags."
1980 年代,美国国防部需要一个"统一的情报图像格式" —— 卫星侦察、航空侦察、地面侦察、目标识别、地图、火控影像都要互通,且要带军方专用的元数据。1987 年发布 NITF 1.0(National Imagery Transmission Format),1998 年升 NITF 2.0,2005 年定 NITF 2.1 / MIL-STD-2500C,即今天的现役版本(同步对外发布的 ISO/IEC 12087-5 国际标准基本与之等价)。设计目标包括:(a) 多 segment 文件 —— 一个 .ntf 可以同时装多张图、多张文本注释、多张图形覆盖物(graphic / overlay)、已知地标(LUT 类);(b) 强制 security classification —— 每个 segment 有 1 字节的密级标记(U=Unclassified / C=Confidential / S=Secret / T=Top Secret),还有控制释放范围的 NOFORN / REL TO 等代号;(c) 支持 JPEG / JPEG 2000 / 无压缩的 payload。MIL-STD-2500C 全文 800 多页,涵盖从"怎么标记机密"到"怎么嵌入手绘图标"的全套军用流程。商业领域几乎没人用 —— 它是国防 + 北约 + 部分国家测绘局的内部语言。
In the 1980s the US Department of Defense needed a single intelligence-imagery container — satellite recon, aerial recon, ground recon, target identification, maps and fire-control imagery had to interoperate and carry military-specific metadata. NITF 1.0 (National Imagery Transmission Format) shipped in 1987, NITF 2.0 in 1998, and NITF 2.1 / MIL-STD-2500C in 2005 — the version still in service (ISO/IEC 12087-5, published in parallel, is essentially equivalent). Design goals: (a) multi-segment file — one .ntf can hold multiple images, text annotations, graphic / overlay layers and known-landmark tables (LUT-class); (b) mandatory security classification — every segment carries a one-byte clearance marker (U = Unclassified / C = Confidential / S = Secret / T = Top Secret) plus distribution caveats like NOFORN or REL TO; (c) JPEG / JPEG 2000 / uncompressed payload. MIL-STD-2500C runs over 800 pages, covering everything from "how to tag classification" to "how to embed hand-drawn icons". Commercial use is essentially zero — NITF is the internal language of the US DoD, NATO and a few national mapping agencies.
CLAS(security class)字段:U / C / S / T。文件总密级取所有 segment 中的最高值;读者只看到自己有权限的段。
CLAS (security class) field: U / C / S / T. The file's overall classification equals the max across segments; readers see only the segments their clearance permits.
技术内核
Technical core
NITF 内核三件事。① 多 segment 容器:一个 .ntf 文件 = 一个 file header(388 byte 起,固定字段)+ N 个 image segment + N 个 graphic segment + N 个 text segment + N 个 reserved extension segment(给 NGA 或第三方扩展用)。每个 segment 是独立单元 —— 自己的 header、自己的 payload、自己的密级。这跟 GeoTIFF 一个文件一张图的设计思路完全不同 —— NITF 一个文件就是一份"情报包"。② 强制 security classification:每个 segment 的 header 里都有 1 byte CLAS 字段(U=Unclassified / C=Confidential / S=Secret / T=Top Secret);此外还有 control caveats(NOFORN=No Foreign Nationals / REL TO XXX=Releasable To 列表 / ORCON=Originator Controlled 等)。文件总密级 = 所有 segment 密级的最高值;读取系统按读者权限逐段 redact —— 你看到的可能是同一份 .ntf 但里面只有 2 个 segment,其它的被空白替代。③ 支持多种 payload:image segment 的实际像素数据可以是无压缩 raw、JPEG(legacy)、JPEG 2000(主流 · 因为 JP2 的 progressive + ROI + 任意分辨率层级正好契合"先看缩略图再放大看局部"的情报场景)、Vector Quantization(VQ,1990s 老格式)。NITF 也支持嵌入 GeoTIFF 风格的地理元数据(通过专门的 ICHIPB / RPC00B 之类 TRE,Tagged Record Extensions),所以一张 NITF 同时是图、是地理参考、是密级文档。
NITF's core, three pieces. ① Multi-segment container: a .ntf is one file header (≥ 388 bytes, fixed fields) + N image segments + N graphic segments + N text segments + N reserved-extension segments (for NGA or third-party extensions). Each segment is independent — its own header, its own payload, its own classification. The opposite of GeoTIFF's "one file, one image" model — NITF is "one file, one intelligence package". ② Mandatory security classification: each segment header carries a one-byte CLAS field (U = Unclassified / C = Confidential / S = Secret / T = Top Secret); plus control caveats (NOFORN = No Foreign Nationals / REL TO XXX = Releasable To list / ORCON = Originator Controlled, etc.). File-level classification = max across segments; reading systems redact per segment by clearance — your copy of the .ntf may show only two segments, the rest blanked. ③ Multiple payload types: image segments can carry uncompressed raw, JPEG (legacy), JPEG 2000 (mainstream — JP2's progressive + ROI + arbitrary resolution layers fit the "thumbnail first, zoom into a region" intelligence workflow perfectly), or Vector Quantization (VQ, an older 1990s codec). NITF also embeds GeoTIFF-style geo metadata via dedicated TREs (Tagged Record Extensions) such as ICHIPB / RPC00B — so a single .ntf is at once an image, a geo-reference and a classified document.
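The classification roll-up in ② can be sketched as pure logic. This models only the max/redact rule; the real standard also defines R = Restricted plus the caveat system, and no attempt is made here at MIL-STD-2500C's actual field layout (the dict keys below are illustrative, not NITF field names):

```python
# Clearance order, low to high; NITF encodes each level as a single character.
LEVELS = "UCST"  # Unclassified < Confidential < Secret < Top Secret

def file_classification(segment_classes):
    """File-level classification = highest level among all segments."""
    return max(segment_classes, key=LEVELS.index)

def redact(segments, clearance):
    """Keep only the segments the reader's clearance permits."""
    limit = LEVELS.index(clearance)
    return [s for s in segments if LEVELS.index(s["clas"]) <= limit]

segments = [{"id": "IM1", "clas": "U"}, {"id": "IM2", "clas": "S"},
            {"id": "TX1", "clas": "C"}]
print(file_classification(s["clas"] for s in segments))  # S
print([s["id"] for s in redact(segments, "C")])          # ['IM1', 'TX1']
```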
适用
USE FOR
- 美国国防部 / NGA / 北约相关情报图像
- 需要文件内多 segment 不同密级混存的场景
- 侦察图像 + 分析员注记 + 覆盖物一体化交付
- US DoD / NGA / NATO intelligence imagery
- Mixed-classification segments inside a single file
- Recon image + analyst notes + overlays as one package
| scope | readers | editors | CLI |
|---|---|---|---|
| NITF (US DoD / NGA / NATO) | ~ 限国防工具链 · GDAL 部分支持 · ESRI ArcGIS for Defense · Hexagon ERDAS · BAE GXP | ~ ArcGIS for Defense · ENVI · Hexagon ERDAS · 其它需许可证 | nitfutils(NGA 官方) · gdalinfo --formats \| grep NITF · gdal_translate -of NITF |
FITS — 天文学的"什么都装"
FITS — astronomy's "one format to hold the universe"
"它是图,是表格,是光谱,是任何天文数据 —— '一种格式装下宇宙'。"
"It's an image, a table, a spectrum — anything astronomical. 'One format to hold the universe'."
1981 年,Don Wells、Eric Greisen、Ronald Harten 等几位射电 / 光学天文学家在《Astronomy & Astrophysics Supplement Series》上发表论文,提议一种"统一的天文数据格式":FITS = Flexible Image Transport System。痛点是当时美国国家光学天文台(NOAO)、国家射电天文台(NRAO)、欧南台(ESO)、各高校观测站都在用各自不兼容的二进制格式,数据交换得反复转换,且磁带寄送是常态(那个年代没有互联网,数据靠 9 轨磁带跨大洲邮寄)。设计目标:(a) 高位深图像(8-64 bit int / 32-64 bit float),因为 CCD 输出动辄 16-bit、长曝叠加后是 32-bit;(b) 多波段 / 多维数据立方体(空间 X/Y + 波长 Z 三维,甚至时间 T 第四维);(c) 表格存观测元数据(曝光时间、滤镜、坐标、温度、读出噪声等数百个字段);(d) 跨望远镜兼容,且自描述(读它的人不需要望远镜手册也能解读)。1988 年 IAU(国际天文联合会)正式认可 FITS 为天文数据交换标准。至今 40 余年未被替代 —— 因为天文需要严格的可读性、自描述性、跨工具一致性、长期归档能力,这些目标 HDF5 / NetCDF 等更现代格式都做不到比 FITS 更好的平衡。astropy 是 Python 天文社区的事实标准,from astropy.io import fits 是天文程序员的"hello world"。
In 1981, Don Wells, Eric Greisen and Ronald Harten — radio and optical astronomers — published a paper in Astronomy & Astrophysics Supplement Series proposing a unified astronomical data format: FITS = Flexible Image Transport System. The pain: the US NOAO (optical), NRAO (radio), ESO and university observatories all used incompatible binary formats; exchange meant constant conversion, and magnetic-tape shipping was the norm (no Internet — data travelled the world on 9-track tapes). Design goals: (a) high bit-depth images (8-64 bit int, 32-64 bit float — CCDs produce 16-bit, long-exposure stacks reach 32-bit); (b) multi-band / multi-dimensional data cubes (spatial X/Y + wavelength Z, or even a time axis T); (c) tabular metadata (exposure, filter, coordinates, temperature, read-noise — hundreds of fields); (d) cross-telescope, self-describing (a reader needs no instrument handbook). The IAU formally adopted FITS in 1988. Forty-plus years on, no replacement has stuck — modern formats like HDF5 / NetCDF can't beat FITS's balance of human-readability, self-description, cross-tool consistency and long-term archival. astropy is the de-facto Python astronomy library, and from astropy.io import fits is the astronomer-programmer's "hello world".
END 标记 header 结束。最神奇的是:你可以直接用 head -c 2880 image.fits 看到一段可读的元数据 —— 几十年的天文文件都能用文本编辑器"瞄一眼"。
END to mark header termination. The magic: you can head -c 2880 image.fits and read the metadata in plain text — decades-old astronomy files are still text-editor inspectable.
CRPIX1 / CRPIX2(参考像素位置)、CRVAL1 / CRVAL2(参考像素的 RA / Dec)、CDELT1 / CDELT2 或 CDi_j 矩阵(每像素的角秒)、CTYPE1 = 'RA---TAN'(投影类型,常见 TAN / SIN / ARC / ZEA)共同定义了像素到天球的可逆函数。这套机制由 Greisen & Calabretta 在 2002 年的两篇里程碑论文里完整化,所有现代天文软件(ds9 · astropy.wcs · IDL)都按这套读 WCS。
CRPIX1 / CRPIX2 (reference-pixel position), CRVAL1 / CRVAL2 (RA / Dec at the reference pixel), CDELT1 / CDELT2 or the CDi_j matrix (arcsec per pixel) and CTYPE1 = 'RA---TAN' (projection — typically TAN / SIN / ARC / ZEA) together define an invertible pixel-to-sky function. The framework was completed in two landmark Greisen & Calabretta papers (2002), and every modern astronomy tool (ds9 · astropy.wcs · IDL) reads WCS the same way.
技术内核
Technical core
FITS 内核六块。① HDU(Header Data Unit)链:一个 .fits 文件 = Primary HDU(必含)+ N 个 Extension HDU(可选),线性串联。Primary 装主图像;Extension 可以是 IMAGE(2D / 3D / N 维数组)、BINTABLE(二进制表)、TABLE(ASCII 表)。一个观测出来的 .fits 通常就是"主图 + mask + error + 源星表"四件套。② 80 byte ASCII header card:格式 KEYWORD = value / comment,全大写、固定 byte 1-8 是关键字、byte 9 是 '='、byte 11-30 是值、byte 31 是 '/'、byte 32-80 是注释。36 张卡 = 1 个 2880 byte 块(磁带遗产);最后一张是 END。你能用文本编辑器看见 FITS 的元数据 —— 这是 FITS 跨越 40 年的根本原因之一。③ 多维数据数组:NAXIS = N(维度数)/ NAXIS1, NAXIS2, …, NAXISN(各维大小)/ BITPIX(每像素位深 8/16/32/-32/-64,负数表示 IEEE 754 浮点)。NAXIS=2 是图,NAXIS=3 是数据立方体(X/Y/λ),NAXIS=4 加时间维。④ BINTABLE / TABLE:这是 FITS 真正的"杀手级"扩展 —— 二进制表能存源星表(N 行 × 几十列,从 ID / RA / Dec 到亮度 / 颜色 / 形态参数)、光谱(N 行波长 × 流量)、时序数据(N 行时间 × 通量),全部走标准 TFORM / TTYPE / TUNIT 描述,任何 FITS 工具都能读。这点让 FITS 既是"图像格式"又是"科学数据库",HDF5 / NetCDF 都没有这种"老牌 + 跨工具一致"的优势。⑤ WCS(World Coordinate System):由 Greisen & Calabretta 在 2002 年的两篇里程碑论文里完整化,通过 CRPIX(参考像素)/ CRVAL(参考点天球坐标)/ CDELT 或 CDi_j 矩阵(每像素角秒)/ CTYPE = 'RA---TAN'(投影类型,常见 TAN / SIN / ARC / ZEA)定义可逆函数。GeoTIFF 学的就是这套思路,只是把"天球"换成了"地球"。⑥ 多种 tile compression:Rice(整数、低噪图像最优 · 2-3×)/ GZIP(通用 · 2-3×)/ PLIO(mask 类整数图 · 4-8×)/ HCOMPRESS(有损 · 巡天图像 · 4-10×,JWST / Pan-STARRS 等大型项目用它)。压缩存在专门的 BINTABLE 扩展里,不破坏任何 FITS 兼容性 —— 不认识压缩的工具仍能识别 BINTABLE,只是看不懂里面是图。
FITS's core, six pieces. ① HDU (Header Data Unit) chain: a .fits file = Primary HDU (mandatory) + N Extension HDUs (optional), in a linear chain. Primary holds the main image; Extensions can be IMAGE (2D / 3D / N-D array), BINTABLE (binary table) or TABLE (ASCII table). A typical observation produces "main image + mask + error map + source catalogue" in a single file. ② 80-byte ASCII header card: format KEYWORD = value / comment, all uppercase, bytes 1–8 are the keyword, byte 9 is '=', bytes 11–30 are the value, byte 31 is '/', bytes 32–80 are the comment. Thirty-six cards = one 2880-byte block (magnetic-tape heritage); the last card is END. You can read FITS metadata in a text editor — one of the deep reasons FITS has lasted forty years. ③ Multidimensional data arrays: NAXIS = N (dimensions) / NAXIS1, NAXIS2, …, NAXISN (sizes) / BITPIX (per-pixel bit depth 8/16/32/-32/-64; negative = IEEE 754 float). NAXIS=2 is an image, NAXIS=3 is a data cube (X/Y/λ), NAXIS=4 adds time. ④ BINTABLE / TABLE — FITS's killer extension: binary tables hold source catalogues (N rows × tens of columns from ID / RA / Dec to magnitudes / colours / shape parameters), spectra (N rows of wavelength × flux), time series (N rows of time × flux) — all described via standard TFORM / TTYPE / TUNIT, readable by any FITS tool. This is what makes FITS simultaneously "image format" and "scientific database" — an edge HDF5 / NetCDF can't match. ⑤ WCS (World Coordinate System): completed in Greisen & Calabretta's two landmark 2002 papers — CRPIX (reference pixel) / CRVAL (sky coordinate at the reference pixel) / CDELT or CDi_j matrix (arcsec/pixel) / CTYPE = 'RA---TAN' (projection — typically TAN / SIN / ARC / ZEA) together define an invertible function. GeoTIFF borrowed exactly this design, just swapping "sky" for "Earth". 
⑥ Tile compressions: Rice (best for low-noise integer images, 2-3×) / GZIP (general, 2-3×) / PLIO (mask-style integer images, 4-8×) / HCOMPRESS (lossy, survey imagery, 4-10× — used by JWST, Pan-STARRS and other large surveys). Compressed data lives inside a dedicated BINTABLE extension, so FITS compatibility is preserved — tools that don't understand the compression still see a BINTABLE, just can't decode the image inside.
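The fixed-format card layout in ② makes a header block parseable with string slicing alone. A toy parser over a hand-built 2880-byte block, handling only simple fixed-format cards (real code should use astropy.io.fits, which also covers strings, continuation cards and non-standard values):

```python
def parse_header_block(block):
    """Split one 2880-byte FITS header block into its 36 80-byte cards and
    return {keyword: raw value string} up to the END card."""
    assert len(block) == 2880
    header = {}
    for i in range(36):
        card_text = block[i * 80:(i + 1) * 80].decode("ascii")
        keyword = card_text[:8].strip()
        if keyword == "END":
            break
        if card_text[8:10] == "= ":                      # value indicator
            header[keyword] = card_text[10:].split("/")[0].strip()
    return header

def card(text):
    """Pad a card to the mandatory 80 bytes."""
    return text.ljust(80).encode("ascii")

block = (card("SIMPLE  =                    T") +
         card("BITPIX  =                   16") +
         card("NAXIS   =                    2") +
         card("NAXIS1  =                 1024 / columns") +
         card("END")).ljust(2880, b" ")
hdr = parse_header_block(block)
print(hdr["BITPIX"], hdr["NAXIS1"])  # 16 1024
```

Because every card is plain ASCII in fixed columns, this is also exactly what `head -c 2880` shows you in a terminal.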
图 52 · FITS 完整天文工作流。望远镜 CCD 输出 16/32-bit raw,pipeline 写成多 HDU FITS(主图 + mask + error + 源星表),ds9 / astropy 读入后通过 header 里的 WCS 把像素转成 (RA, Dec) 天球坐标 —— 这一步让"今天 JWST 的红外图"和"30 年前 Hubble 的可见光图"可以叠在同一片天空上做联合分析。下游再走 SourceExtractor 测光、specutils 拟合光谱、matplotlib 出图,最后进入论文。这条流水线 1981 年至今没有结构上的变化,只是工具变了 —— 磁带 → CD → FTP → AWS S3。MAST(STScI)、IRSA(IPAC)、NED 等天文数据中心至今全部按 FITS 提供下载。
Fig 52 · The full FITS astronomical workflow. The telescope CCD emits 16/32-bit raw, the pipeline writes a multi-HDU FITS (main image + mask + error + source catalogue), ds9 / astropy reads it and uses the WCS in the header to convert pixels to (RA, Dec) sky coordinates — that single step lets "today's JWST infrared image" and "Hubble's visible-light image from thirty years ago" be stacked on the same patch of sky for joint analysis. Downstream: SourceExtractor for photometry, specutils for spectral fitting, matplotlib for publication figures, then into the paper. This pipeline has been structurally unchanged since 1981; only the tools changed — magnetic tape → CD → FTP → AWS S3. MAST (STScI), IRSA (IPAC), NED and other archives still distribute everything as FITS.
| compression | typical ratio | lossy? | typical use |
|---|---|---|---|
| Rice | 2-3× | 无损 | 低噪整数图(CCD raw) |
| GZIP | 2-3× | 无损 | 通用 / 文本类数据 |
| PLIO | 4-8× | 无损 | mask 图(整数 / 稀疏) |
| HCOMPRESS | 4-10× | 有损 | 巡天图像(JWST · Pan-STARRS) |
$ ds9 image.fits # SAOImage 查看 · 天文标配 GUI
$ python -c "from astropy.io import fits; \
    fits.open('img.fits').info()" # Python · 列出所有 HDU
$ funpack img.fits.fz # CFITSIO 解 Rice / GZIP / HCOMPRESS
$ wcsinfo img.fits # 看 WCS · CRPIX / CRVAL / CTYPE
$ fitsverify img.fits # 校验是否合规 FITS · NASA 出品
适用
USE FOR
- 所有天文数据(从太阳到深空)
- 多维科学数据(N-D 数组、数据立方体)
- 需要长期归档(40+ 年向后兼容)
- 需要 ASCII 可读 metadata 的科学场景
- 需要 BINTABLE 二进制表 + 图像同文件的工作流
- 跨望远镜 / 跨时代数据叠加(WCS 一致性)
- All astronomical data (Sun to deep sky)
- Multidimensional scientific data (N-D arrays, data cubes)
- Long-term archival (40-plus-year backward compatibility)
- Scientific work needing ASCII-readable metadata
- Workflows mixing BINTABLE and image in one file
- Cross-telescope / cross-era data stacking (WCS consistency)
反适用
AVOID
- 任何非科学场景(用 PNG / JPEG / TIFF)
- 需要浏览器原生展示(零浏览器支持)
- 对压缩比极度敏感的存档(用 HEIF / AVIF)
- 不需要 WCS / metadata 的工作流(开销浪费)
- Anything non-scientific (use PNG / JPEG / TIFF)
- Native browser display (zero browser support)
- Archives extremely size-sensitive (use HEIF / AVIF)
- Workflows that don't need WCS / metadata (overhead waste)
| scope | readers | editors / pipelines | CLI |
|---|---|---|---|
| FITS (IAU) | ✓✓ ds9 · astropy.io.fits · CFITSIO · IDL · IRAF · CASA · Aladin · ESA Datalabs · MAST / IRSA / NED 数据中心 | ✓✓ astropy 全家桶(specutils / photutils / lightkurve)· STScI Hubble pipeline · NASA JWST pipeline · ESO 镜像 | fitsinfo · fitsdump · fitscopy · fitsverify · funpack · wcsinfo |
JP2 / JPX — JPEG 2000 在科学领域的活路
JP2 / JPX — JPEG 2000's afterlife in science
"JPEG 2000 在 web 死了,在医学和卫星领域活得很好。"
"JPEG 2000 died on the web; it lives well in medicine and satellites."
2000 年 ISO/IEC 15444-1 发布,JPEG 2000 标准里附带 JP2 文件结构 —— 一个 ISOBMFF 风格的 box 容器(跟 MP4 同源,2001 年才被定为 ISOBMFF;但 JP2 box 框架是更早成型的同套思路),内部装 JPEG 2000 codestream payload + 色彩管理 + 分辨率信息 + ICC profile 等元数据。2003 年 ISO/IEC 15444-2 定义 JPX 扩展,允许多 codestream(类似多页)、复杂 metadata、跟 XML 的元数据集成、富互动结构。设计本意是替代 JPEG 成为下一代主流 —— 任意分辨率层级解码、无损 + 有损切换、ROI(region of interest)优先解码、progressive 流式播放。但是 web 浏览器拒绝实现:Chromium / Firefox 都说 "JPEG 2000 解码 CPU 开销太大,而 web 流量是体积敏感不是质量敏感",只有 Safari 至今支持(macOS / iOS 走系统 ImageIO 框架)。结果 JPEG 2000 在主流场景死亡,但在不被浏览器决定的场景里活得很好:DICOM transfer syntax(医学影像标准内嵌 JP2 codestream)、卫星图像归档(ESA / NASA 部分管线)、文化遗产高保真扫描(Library of Congress 古籍数字化)—— 这些场景需要"任意分辨率层级解码"和"同一文件无损 + 有损切换"。
In 2000 ISO/IEC 15444-1 shipped, with the JPEG 2000 standard also defining the JP2 file structure — an ISOBMFF-style box container (cousin of MP4 — formally ISOBMFF only in 2001, but JP2's box framework is an earlier instance of the same philosophy) wrapping a JPEG 2000 codestream plus colour-management, resolution and ICC-profile metadata. In 2003 ISO/IEC 15444-2 defined the JPX extension, allowing multiple codestreams (page-like), richer metadata, XML metadata integration and interactive structures. The original ambition was to replace JPEG as the mainstream — any-resolution-layer decoding, lossless / lossy switch, region-of-interest priority decoding, progressive streaming. Browsers refused: Chromium and Firefox both said "JPEG 2000 decode is too CPU-heavy, and web traffic is size-sensitive not quality-sensitive"; only Safari supports it today (via macOS / iOS ImageIO). So JPEG 2000 died on the mainstream — and lives well in scenes browsers don't gatekeep: DICOM transfer syntaxes (medical-imaging standards embedding JP2 codestreams), satellite-image archiving (ESA / NASA pipelines), cultural-heritage high-fidelity scans (Library of Congress book digitisation) — places that need "arbitrary resolution layers" and "lossless / lossy in one file".
jP(12 byte 签名)/ ftyp(文件类型 'jp2 ')/ jp2h(image header super-box,内含 ihdr 宽高位深 / colr 色彩空间)/ jp2c(实际的 JPEG 2000 codestream payload)。JPX(ISO/IEC 15444-2)扩展可加多个 jp2c(类似多页)+ uuid 自定义 box + XML metadata。这套 box 哲学跟 MP4 / HEIF 同源 —— 都把"容器和编码解耦"当成第一原则。
jP (12-byte signature) / ftyp (file type 'jp2 ') / jp2h (image-header super-box containing ihdr for width/height/depth and colr for colour space) / jp2c (the actual JPEG 2000 codestream payload). JPX (ISO/IEC 15444-2) extends this with multiple jp2c boxes (page-like), uuid custom boxes and XML metadata. The same box philosophy as MP4 / HEIF — "container decoupled from codec" as first principle.
技术内核
Technical core
JP2 / JPX 内核两件事。① JP2 = ISOBMFF box 容器 + JPEG 2000 codestream payload:容器层负责文件组织(签名 / 文件类型 / 图像头 / 色彩管理 / ICC profile / 分辨率信息 / metadata),payload 层是 JPEG 2000 的 wavelet codestream(EBCOT 码块 + 分辨率层 + 质量层),两者解耦。这套 box 哲学跟 MP4 / HEIF 同源,工业实现都共享 ISOBMFF parser。② JPX(ISO/IEC 15444-2)是 JP2 的扩展,加多 codestream(可装多张图,类似 PDF 的多页)、复杂 metadata(XML / RDF 集成,适合文化遗产场景描述古籍册次 / 著录信息)、富互动结构(超链接、分层标注)。JP2 在医学和卫星归档活下来的真正原因不是容器多复杂,而是 JPEG 2000 codestream 的两个核心特性:(a) 同一文件可无损或有损切换 —— 用 reversible 5/3 wavelet 是无损,irreversible 9/7 wavelet 是有损,客户端按 quality layer 选;DICOM 1.2.840.10008.1.2.4.91 transfer syntax 就是有损 9/7,部分医院 CT 用它做长期归档;(b) 任意分辨率层级解码 —— wavelet 多分辨率天然支持"先看 1/8 缩略图,再按需解 1/4、1/2、原始",对超大幅图像(古籍数字化 50K×50K 像素 / 卫星 10000×10000 多波段)做"渐进 + ROI 优先"流式查看是杀手锏。这种"同一文件多种用法"的能力 JPEG / WebP / AVIF 都没有(它们要么必须有损要么必须无损,无法切换)。
JP2 / JPX core, two pieces. ① JP2 = ISOBMFF box container + JPEG 2000 codestream payload: the container layer handles file organisation (signature / file type / image header / colour management / ICC profile / resolution / metadata); the payload is the JPEG 2000 wavelet codestream (EBCOT code blocks + resolution layers + quality layers); the two are decoupled. The same box philosophy as MP4 / HEIF — industrial implementations share the same ISOBMFF parser. ② JPX (ISO/IEC 15444-2) extends JP2 with multiple codestreams (page-like, à la PDF), richer metadata (XML / RDF integration — perfect for cultural-heritage cataloguing of book volumes and bibliographic records) and interactive structures (hyperlinks, layered annotations). The real reason JP2 lives on in medicine and satellite archiving isn't container sophistication — it's two core properties of the JPEG 2000 codestream itself: (a) lossless / lossy in one file — the reversible 5/3 wavelet is lossless, the irreversible 9/7 is lossy, and the client picks via quality layer; DICOM transfer syntax 1.2.840.10008.1.2.4.91 is lossy 9/7, used by some hospital CT archives; (b) arbitrary-resolution-layer decoding — wavelet multi-resolution naturally lets you "see a 1/8 thumbnail first, then decode 1/4, 1/2, original on demand". For huge imagery (50K×50K-pixel book scans, 10000×10000 multi-band satellite scenes), "progressive + ROI-priority" streaming is the killer feature. JPEG / WebP / AVIF have no equivalent — they're forced lossy or forced lossless, no switch.
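这套 box 结构用十几行代码就能走一遍。下面是一个最小示意(假设:walk_jp2_boxes 是自造函数名;演示数据是手工拼的 jP 签名 + ftyp 两个 box,不是完整 JP2 文件),按 ISOBMFF 的"4 字节大小 + 4 字节类型"规则顺序列出顶层 box:
The box layout above can be walked in a dozen lines. A minimal sketch (assumptions: walk_jp2_boxes is our own name; the demo bytes are a hand-built jP signature + ftyp pair, not a full JP2 file), following ISOBMFF's "4-byte size + 4-byte type" rule:

```python
import struct

def walk_jp2_boxes(data: bytes):
    """顺序遍历顶层 box,返回 (type, offset, size) 列表。"""
    boxes, pos = [], 0
    while pos + 8 <= len(data):
        size, btype = struct.unpack(">I4s", data[pos:pos + 8])
        if size == 1:                      # 64-bit 扩展长度
            size = struct.unpack(">Q", data[pos + 8:pos + 16])[0]
        elif size == 0:                    # box 延伸到数据末尾
            size = len(data) - pos
        boxes.append((btype.decode("latin-1"), pos, size))
        pos += size
    return boxes

# 手工拼一个最小文件头:12 字节 jP 签名 box(内容固定 0x0D0A870A)+ ftyp box(brand 'jp2 ')
sig  = struct.pack(">I4s", 12, b"jP  ") + bytes([0x0D, 0x0A, 0x87, 0x0A])
ftyp = struct.pack(">I4s", 20, b"ftyp") + b"jp2 " + b"\x00\x00\x00\x00" + b"jp2 "
print(walk_jp2_boxes(sig + ftyp))  # [('jP  ', 0, 12), ('ftyp', 12, 20)]
```

真正的 JP2 解析还要递归进 jp2h super-box 找 ihdr / colr;MP4 / HEIF 的 parser 用的是同一套循环。
A real JP2 parser would additionally recurse into the jp2h super-box for ihdr / colr; MP4 / HEIF parsers use the same loop.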
适用
USE FOR
- DICOM transfer syntax(医学影像归档)
- 卫星图像归档(ESA / NASA / 部分商业卫星)
- 文化遗产高保真扫描(LoC · 大英图书馆古籍)
- 电影 DCP(Digital Cinema Package · DCI 强制 JP2)
- 需要同一文件无损 / 有损切换的存档场景
- 需要任意分辨率层级 + ROI 优先解码的超大图
- DICOM transfer syntax (medical-image archives)
- Satellite-image archiving (ESA / NASA / commercial sats)
- Cultural-heritage high-fidelity scans (LoC, British Library)
- Digital Cinema Package (DCI mandates JP2)
- Archives needing in-file lossless / lossy switch
- Huge images needing arbitrary-resolution + ROI-priority decode
反适用
AVOID
- Web(只 Safari 原生,Chromium / Firefox 拒绝)
- 移动端(解码 CPU / 内存开销大)
- 追求极致压缩比的现代场景(用 AVIF / HEIF)
- 不需要分辨率层级或 ROI 的普通图像
- The web (Safari only — Chromium / Firefox refused)
- Mobile (heavy decode CPU / memory)
- Modern scenes chasing peak compression (use AVIF / HEIF)
- Plain images without resolution-layer / ROI needs
| scope | readers | editors | CLI |
|---|---|---|---|
| JP2 / JPX | ~ Safari(macOS / iOS · 原生)· OpenJPEG(开源)· Kakadu(商业 · 性能基准)· DICOM 阅读器(MicroDicom · OsiriX · Horos)· GDAL · ImageMagick | ~ Photoshop(JP2 / JPX 插件)· GIMP(部分版本)· IrfanView · 文化遗产专用扫描软件 | opj_decompress(OpenJPEG)· kdu_expand(Kakadu)· gdal_translate -of JP2OpenJPEG |
WebP2 — Google 的实验后代
WebP2 — Google's experimental heir
"WebP 的 mark II,但 AVIF 已经赢在了起跑线。"
"WebP Mark II — but AVIF already crossed the finish line."
2021 年 Google 启动 WebP2 项目,目标是"做 WebP 的下一代"—— 不再像 WebP v1 那样基于 VP8 帧内编码,而是吸收 AV1(2018 年发布的下一代视频 codec)的思想自研一套全新 codec,同时保留 WebP 的 web 优先哲学(简单容器、轻量解码、Chrome 直接支持)。但设计窗口已经关闭:WebP2 启动时,AVIF(直接基于 AV1 的图片格式)2019 年已被 Netflix / Google 推动落地,2020 年 Chrome 加入支持,2022 年 Firefox / Safari 全部跟进 —— Google 自己的浏览器都已经先支持了竞品。WebP2 在 libwebp2 仓库慢慢迭代,但从未推到 Chrome 主流支持;Google 自己也没有公开宣布要替代 WebP 或 AVIF。结果今天 WebP2 处于一种尴尬状态:技术上是真的在写,数据上压缩率确实跟 AVIF / JXL 在同一档,但商业上没有任何动机推它落地 —— 因为 AVIF 已经占据了"下一代 web 图片格式"的生态位。WebP2 项目自己的 README 第一句话就承认:"WebP 2 is an experimental successor of WebP. WebP 2 is not WebP, neither v2 of WebP."(WebP 2 是 WebP 的实验后继者,既不是 WebP,也不是 WebP 的 v2)—— 一个少见的、官方亲自打的"这是研究项目,不是产品"标签。
In 2021 Google started the WebP2 project, aiming to build "WebP's next generation" — no longer based on VP8 intra-frame coding like WebP v1, but absorbing ideas from AV1 (the 2018 next-gen video codec) into a freshly engineered codec, while keeping WebP's web-first philosophy (simple container, light decoder, native Chrome support). But the design window had already closed: by the time WebP2 launched, AVIF (the image format directly built on AV1) had been driven into production by Netflix and Google in 2019, picked up by Chrome in 2020, and joined by Firefox and Safari by 2022 — Google's own browser already supported the competitor. WebP2 keeps iterating in the libwebp2 repo, but has never been promoted to mainstream Chrome support; Google has never publicly committed to it replacing WebP or AVIF. Today WebP2 sits in an awkward limbo: technically real, with compression on par with AVIF / JXL, but with zero commercial pressure to ship — AVIF already owns the "next-gen web image" ecological niche. The project's own README opens with a rare self-aware disclaimer: "WebP 2 is an experimental successor of WebP. WebP 2 is not WebP, neither v2 of WebP." A research project, officially labelled as such.
技术内核
Technical core
WebP2 内核两件事。① 基于 AV1 思想自研 codec(不直接用 AV1) —— Google 没有像 AVIF 那样直接抄 AV1 的帧内编码,而是从 AV1 借鉴几个思路(更大的 transform block 64×64、更聪明的 intra prediction、entropy coding 改进)然后自研一套独立 codec。原因有政治也有技术:技术上,Google 想做更轻量的解码器,AVIF 的解码器其实是 AV1 的子集,代码量大,移动设备 CPU 紧;政治上,WebP / VP9 / AV1 都是 Google 系的开放视频 codec 生态,WebP2 是想做"web 图片专用、不背 video codec 包袱"的小而美。但代价是 —— 没有现成的 AV1 解码器可借,得自己写。② 仍在 Google libwebp2 开发,未推到 Chrome 主流 —— libwebp2 是 Google 自己的开源库,在 GitHub 持续提交,但 Chrome 至今没有 webp2 的 image decoder 注册(对比 WebP 是 2010 年原生,AVIF 是 2020 年原生)。Google 自己也没公开 commit 推它落地 —— 一种"我们继续研究,但不答应商业化"的姿态。这种姿态在大公司开源项目里很少见,通常要么开发要么砍,WebP2 罕见地处于"长期实验状态"。
WebP2 core, two pieces. ① AV1-inspired but home-grown codec (not AV1 itself) — Google did not, like AVIF, just adopt AV1's intra-frame coding directly. Instead it borrowed ideas from AV1 (larger 64×64 transform blocks, smarter intra prediction, improved entropy coding) and engineered its own independent codec. The reason is partly political, partly technical: technically, Google wanted a lighter decoder — AVIF's decoder is essentially a subset of AV1, code-heavy and tight on mobile CPU; politically, WebP / VP9 / AV1 are all Google-aligned open video codecs, and WebP2 was meant to be a small purpose-built web-image codec without the video-codec baggage. The cost: no off-the-shelf AV1 decoder to borrow — everything written from scratch. ② Still in Google's libwebp2, never promoted to mainstream Chrome — libwebp2 is Google's own open-source library, with continuing GitHub commits, but Chrome has no webp2 image decoder registered (compare: WebP native since 2010, AVIF native since 2020). Google has never publicly committed to shipping it — a "we keep researching, but won't promise productisation" posture. Rare for big-company open source: usually it's either ship or kill — WebP2 sits in unusual long-term experimental limbo.
适用
USE FOR
- (研究)Codec 对比基准
- libwebp2 开发者社区实验
- 关注下一代图像 codec 的从业者跟踪样本
- (Research) codec comparison benchmarks
- libwebp2 developer-community experiments
- Tracking sample for next-gen image-codec watchers
反适用
AVOID
- 任何生产环境(浏览器原生支持为零)
- 任何对兼容性有要求的场景
- 替代 AVIF / WebP / JXL —— 没有理由
- Any production setting (zero native browser support)
- Any compatibility-sensitive scenario
- Replacing AVIF / WebP / JXL — no reason to
| scope | readers | editors | CLI |
|---|---|---|---|
| WebP2 | ✗ 无浏览器原生 · ~ Google libwebp2 库自带的参考解码器 | ✗ 无主流编辑器支持 | cwp2 / dwp2(libwebp2 仓库自带 · 仅参考实现) |
AVIF Sequence — 视频帧序列
AVIF Sequence — when stills become a video track
"AVIF 的'多帧'就是把 AV1 的 video 模式接回来。"
"AVIF's multi-frame mode just dials AV1's video back in."
AVIF 单图模式只用 AV1 的 intra-frame(关键帧)编码 —— 因为 web 图片不需要"前一帧后一帧"。但 AVIF 用的容器是 HEIF(基于 ISOBMFF,跟 MP4 / JP2 同源),HEIF 容器原本就是为视频设计的,有完整的 video track 概念。所以 AVIF Sequence 做的事情非常简单:把 AV1 的 video 模式装回去 —— 让一个 AVIF 文件可以装多帧、有 timeline、可循环、可带帧间预测。结果是一种"高质量短动图替代品":代替 GIF 的 8 bit 256 色 + LZW 暴体积、代替 animated WebP 的 VP8 老 codec。实测体积比 animated WebP 小 30-50%,因为 AV1 的帧间预测远比 VP8 高效。但代价是 —— AVIF Sequence 是真正的视频压缩,带 motion estimation / motion compensation,编码时间是 animated WebP 的 10-30×。这意味着服务器侧预编码可行,用户实时上传不行:Twitter / Reddit / Imgur 这种用户上传场景你不能让用户等 30 秒;但 Cloudinary / imgix 这种 CDN 中间层服务器预编码 OK。AVIF Sequence 现在的实际用法:替代 GIF 表情包(质量 + 体积都赢)、替代 web 短动画(.mp4 太重 / GIF 太丑的中间地带)、替代某些 Live Photo 场景(iOS HEIF 走的是相邻路线)。
AVIF's still-image mode only uses AV1's intra-frame (keyframe) coding — web images don't need "previous-frame / next-frame". But AVIF's container is HEIF (built on ISOBMFF, sharing roots with MP4 / JP2), and HEIF was designed for video in the first place, with a full video-track concept. So AVIF Sequence does something extremely simple: dial AV1's video mode back in — let a single AVIF file hold multiple frames, with a timeline, looping, and inter-frame prediction. The result is a "high-quality short-animation substitute": replacing GIF's 8-bit 256-colour LZW bloat and animated WebP's older VP8 codec. Measured sizes are 30–50% smaller than animated WebP, because AV1's inter-frame prediction is far more efficient than VP8. The cost: AVIF Sequence is real video compression, with motion estimation / motion compensation — encoding takes 10–30× longer than animated WebP. So server-side pre-encoding is fine, real-time user uploads are not: Twitter / Reddit / Imgur, where users upload live, can't make the user wait 30 seconds; Cloudinary / imgix as a CDN middle layer can. Today's actual uses: replacing GIF stickers (better quality and smaller); replacing web short animations (the middle ground between heavy .mp4 and ugly GIF); replacing some Live-Photo flows (iOS HEIF takes a parallel path).
ftyp 用 brand avis(sequence)区分单图 avif;meta 装 image items(沿用单图模式的描述方式);moov 是真正的视频 track,装 AV1 codec 的 I 帧(intra,关键帧)+ P 帧(inter,帧间预测)的时间线。"I-P-P-P-P-I-..."就是 AVIF Sequence 比 animated WebP 小 30-50% 的原因 —— P 帧只编码"和上一帧的差",而 animated WebP 每帧都是独立的。
ftyp uses brand avis (sequence) to distinguish from still avif; meta holds image items (reusing still-mode description); moov is the actual video track with an AV1-codec timeline of I-frames (intra / keyframe) and P-frames (inter / predicted from previous). The "I-P-P-P-P-I-..." pattern is exactly why AVIF Sequence is 30–50% smaller than animated WebP — P-frames encode only the delta, while animated WebP encodes every frame independently.
技术内核
Technical core
AVIF Sequence 内核两件事。① HEIF 容器内多 image item 或 video track —— HEIF(High Efficiency Image Format)容器是 ISOBMFF 风格的 box 结构,跟 MP4 / JP2 同源。AVIF 单图模式用 meta box 装一个 image item(只一帧 intra);AVIF Sequence 有两种装法:(a)多 image item(每帧独立 intra,跟单图一样,只是多个);(b)走 moov video track(真正的视频 track,有时间戳、可循环、可装 inter 帧)。ftyp 用 brand 区分:avif 是单图,avis 是 sequence。② 帧间预测可选(不一定都是 intra) —— 如果走 video track 模式,AVIF Sequence 就是真正的视频压缩:P 帧编码"和上一帧的差",B 帧编码"和前后帧的差",带 motion estimation / motion compensation 整套机制。这是它比 animated WebP / GIF 小 30-50% 的根因 —— animated WebP 每帧都是独立 VP8 intra(本质上是多张静图叠在一起),AVIF Sequence 真把"运动"压缩了。但这个能力的代价非常贵:编码时间 10-30× animated WebP,因为 motion estimation 是计算密集型搜索;客户端实时编码不可行(用户上传不能让其等 30 秒),只能服务器预编码或 CDN 中间层转码。这种"质量 / 体积赢、编码慢"的权衡跟 AVIF 单图是一致的 —— AVIF 全家就是"花更多 CPU 换更小文件"。
AVIF Sequence core, two pieces. ① Multiple image items or a video track inside HEIF — the HEIF (High Efficiency Image Format) container is an ISOBMFF-style box structure, sharing roots with MP4 / JP2. AVIF still mode places a single image item in a meta box (one intra frame). AVIF Sequence has two ways: (a) multiple image items (every frame independent intra, like still mode but several of them); (b) a real moov video track (with timestamps, looping, and inter frames). The ftyp brand distinguishes them: avif for still, avis for sequence. ② Inter-frame prediction is optional (not necessarily all intra) — in video-track mode AVIF Sequence is actual video compression: P-frames encode the delta from the previous frame, B-frames encode deltas from both sides, complete with motion estimation / motion compensation. This is why it's 30–50% smaller than animated WebP / GIF — animated WebP is essentially a stack of independent VP8-intra stills, while AVIF Sequence really compresses motion. The cost: encoding takes 10–30× animated WebP, because motion estimation is a compute-intensive search; client-side real-time encoding isn't viable (you can't make a user wait 30 seconds on upload), so it lives on the server side or in a CDN transcoder. The same "quality / size win, slow encode" trade-off as still AVIF — the whole AVIF family trades CPU for smaller files.
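avif / avis 的区分只看文件开头 ftyp box 的 major brand,一个最小示意(假设:isobmff_brand 是自造函数名;演示数据是手工拼的 ftyp 头,真实文件 ftyp 后面还跟着 meta / moov 等 box):
Telling still avif from avis takes only the major brand of the leading ftyp box. A minimal sketch (assumptions: isobmff_brand is our own name; the demo bytes are hand-built ftyp headers, while real files continue with meta / moov boxes):

```python
import struct

def isobmff_brand(data: bytes) -> str:
    """读取文件开头 ftyp box 的 major brand。"""
    size, btype = struct.unpack(">I4s", data[:8])
    if btype != b"ftyp":
        raise ValueError("not an ISOBMFF file")
    return data[8:12].decode("ascii")

# 手工拼两个最小 ftyp 头:'avif' 是单图,'avis' 是 sequence
still = struct.pack(">I4s", 16, b"ftyp") + b"avif" + b"\x00\x00\x00\x00"
seq   = struct.pack(">I4s", 16, b"ftyp") + b"avis" + b"\x00\x00\x00\x00"
print(isobmff_brand(still), isobmff_brand(seq))  # avif avis
```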
适用
USE FOR
- 高质量短动图(替代 GIF / animated WebP)
- 表情包 / sticker / 反应图(质量 + 体积双赢)
- web 短动画(.mp4 太重 / GIF 太丑的中间地带)
- 服务器预编码 / CDN 中间层转码场景
- Live Photo 类场景(短视频 + 关键帧静图)
- High-quality short animations (replacing GIF / animated WebP)
- Stickers / reactions (smaller and better-looking)
- Web short animations (the middle ground between heavy .mp4 and ugly GIF)
- Server-side pre-encode / CDN transcode setups
- Live-Photo-like flows (short video plus a keyframe still)
反适用
AVOID
- 客户端实时编码(用户上传场景 · 编码 10-30× 慢)
- 需要真正视频功能(音轨 / 长时长 · 改用 .mp4)
- 老浏览器兼容场景(同 AVIF · IE / 老 Safari 不行)
- Client-side real-time encoding (user uploads — 10–30× slower)
- True video features (audio track / long duration — use .mp4)
- Legacy browsers (same as AVIF — IE / old Safari out)
| scope | readers | editors | CLI |
|---|---|---|---|
| AVIF Sequence | ✓ Chrome / Firefox / Safari modern · iOS / macOS Photos · libavif | ~ ffmpeg(via libavif)· FFmpeg-based 转码工具 | avifenc -k 0 frames/*.png anim.avif · ffmpeg -i in.mp4 -c:v libaom-av1 out.avif |
JPEG XS — 低延迟广播
JPEG XS — sub-millisecond broadcast
"为'实时'而生:压缩比小,延迟极低,正好接 4K/8K 直播。"
"Built for live — modest compression, sub-millisecond latency, just right for 4K/8K broadcast."
现代 4K / 8K 广播正在从 SDI 光纤切到 IP 流(SMPTE 2110 标准):传统电视台用 12G-SDI 光纤把 4K 信号从摄像机送到导播台,布线昂贵;新一代直接走以太网 IP 包,跟数据中心同基础设施。但 IP 流要解决一个传统 SDI 不存在的问题:带宽。4K 60p 未压缩是 12 Gbps,8K 是 48 Gbps,数据中心万兆 / 25G 以太网装不下。所以需要"压一下,但不能影响实时性"的 codec。JPEG XL 太复杂(编码慢)、JPEG 2000 也慢(EBCOT entropy coding 计算量大)、H.264 / H.265 / AV1 是视频 codec 但有帧间预测延迟(至少要缓 1-2 帧才能编),完全不行。JPEG WG 在 2018 年推 JPEG XS(ISO/IEC 21122):简化的 wavelet(不做完整 EBCOT,只用更轻量的 entropy coding),牺牲压缩比(只 4-6×,而 JPEG 是 10-20×、JPEG 2000 是 20-50×),换微秒到亚毫秒级编 / 解延迟。设计目标写在标准首页:"visually lossless at 4-6× compression with sub-millisecond latency"(视觉无损 + 4-6 倍压缩 + 亚毫秒延迟)。SMPTE 2110-22(2019)正式把 JPEG XS 列入 IP 广播标准的 mezzanine compression 层。VR 头显的 wireless display(无线 VR · 把 PC 渲染的画面无线传到头显)也用 —— 因为头显需要"运动到光子"<20ms 延迟才能不晕,JPEG XS 的<1ms 编 / 解给了足够预算。
Modern 4K / 8K broadcast is moving from SDI fibre to IP streams (SMPTE 2110): traditional TV stations used 12G-SDI fibre to ship 4K from camera to control room — expensive cabling. New ones run straight Ethernet IP, sharing infrastructure with data centres. But IP brings a problem SDI never had: bandwidth. 4K 60p uncompressed is 12 Gbps; 8K is 48 Gbps — 10/25 Gigabit Ethernet can't carry it raw. So you need a codec that "compresses a little without breaking real-time". JPEG XL is too complex (slow encode); JPEG 2000 is also slow (EBCOT entropy coding is heavy); H.264 / H.265 / AV1 are video codecs but have inter-frame-prediction latency (need to buffer 1–2 frames before encoding), totally unacceptable. The JPEG WG shipped JPEG XS in 2018 (ISO/IEC 21122): simplified wavelet (no full EBCOT, lighter entropy coding), trading compression ratio (only 4–6× — vs. JPEG's 10–20× and JPEG 2000's 20–50×) for microsecond-to-sub-millisecond encode / decode latency. The standard's front page literally says: "visually lossless at 4-6× compression with sub-millisecond latency". SMPTE 2110-22 (2019) formally adopted JPEG XS as the mezzanine-compression layer for IP broadcast. VR headsets using wireless display (PC-rendered frames sent wirelessly to the headset) use it too — because headsets need "motion-to-photon" latency under 20 ms to avoid sickness, and JPEG XS's sub-1 ms encode / decode leaves enough budget for everything else.
技术内核
Technical core
JPEG XS 内核三件事。① 简化的 wavelet(不做完整 EBCOT) —— JPEG 2000 的核心是 5/3 或 9/7 wavelet 加上 EBCOT(Embedded Block Coding with Optimal Truncation)entropy coding,EBCOT 提供超高压缩比但计算密集。JPEG XS 砍掉 EBCOT,只保留更简化的小波分解 + 轻量 entropy coding(直接 run-length / VLC),损失大概 5-10× 压缩比但获得10-100× 速度。② 视觉无损(typical 4-6× compression) —— 设计目标不是"压到最小",而是"压到肉眼看不出区别但带宽够省"。在 4K 60p 12 Gbps 场景,4-6× 压到 2-3 Gbps,刚好塞进 10 Gigabit Ethernet。这种"够用就好"的目标决定了它不会出现在 web(web 要求最小体积)、不会出现在归档(归档要求最高保真)。③ 帧内独立编码,无需缓冲 —— 每帧完全独立(类似 motion JPEG / motion JPEG 2000),没有帧间预测,所以编码器拿到一帧立刻编、解码器拿到一帧立刻解,延迟主要来自计算时间本身(<1 ms),不来自缓冲。这是相比 H.264 / AV1 这些视频 codec 的本质差异:视频 codec 必须缓冲 1-2 帧才能做 motion estimation,JPEG XS 完全不缓冲。代价是没有视频 codec 那种"压两个数量级"的能力,但这是 trade-off,不是缺陷。
JPEG XS core, three pieces. ① Simplified wavelet (no full EBCOT) — JPEG 2000's core is the 5/3 or 9/7 wavelet plus EBCOT (Embedded Block Coding with Optimal Truncation) entropy coding; EBCOT delivers very high compression but at heavy computational cost. JPEG XS strips EBCOT, keeping a much simpler wavelet decomposition plus lightweight entropy coding (direct run-length / VLC) — losing roughly 5–10× compression but gaining 10–100× speed. ② Visually lossless (typical 4–6× compression) — the design goal isn't "compress as much as possible", it's "compress until the eye can't tell, while saving useful bandwidth". On 4K 60p at 12 Gbps, 4–6× brings it down to 2–3 Gbps — fitting cleanly inside 10 Gigabit Ethernet. This "good enough" target keeps it out of the web (which wants the smallest size) and out of archives (which want the highest fidelity). ③ Intra-frame independent coding, no buffering — every frame is fully independent (like motion JPEG / motion JPEG 2000), no inter-frame prediction, so the encoder can encode the moment a frame arrives and the decoder can decode the moment it lands. Latency comes from compute alone (< 1 ms), not buffering. This is the essential difference vs. H.264 / AV1 video codecs: video codecs must buffer 1–2 frames to run motion estimation; JPEG XS buffers nothing. The price is no two-orders-of-magnitude video-style compression — but that's the trade-off, not a defect.
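上面的带宽账可以直接算出来,一个示意(假设 8-bit RGB 即 24 bpp,跟正文"4K 60p 未压缩 12 Gbps"的口径一致;广播常用的 10-bit 4:2:2 数字会不同):
The bandwidth arithmetic above checks out directly; a quick sketch (assuming 8-bit RGB at 24 bits per pixel, matching the text's "12 Gbps for uncompressed 4K 60p"; broadcast's usual 10-bit 4:2:2 would differ):

```python
def uncompressed_gbps(w, h, fps, bits_per_px=24):
    """未压缩视频码率(Gbps),默认 8-bit RGB = 24 bpp。"""
    return w * h * fps * bits_per_px / 1e9

raw_4k = uncompressed_gbps(3840, 2160, 60)   # ≈ 11.9 Gbps,对应正文的 "12 Gbps"
raw_8k = uncompressed_gbps(7680, 4320, 60)   # ≈ 47.8 Gbps
for ratio in (4, 6):
    print(f"4K 60p 压 {ratio}x -> {raw_4k / ratio:.1f} Gbps,塞进 10GbE: {raw_4k / ratio < 10}")
```

4-6× 正好把 4K 60p 压进 10 Gigabit Ethernet,这就是"够用就好"目标的来历。
4-6× squeezes 4K 60p neatly inside 10 Gigabit Ethernet — the origin of the "good enough" target.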
适用
USE FOR
- 4K / 8K 广播 IP 流(SMPTE 2110-22 mezzanine 层)
- VR 头显 wireless display(PC → 头显无线传图)
- 实时多机位摄影棚 IP 切换台
- 低延迟视频墙 / 监控墙(楼宇 / 控制室)
- "够用即可、宁要延迟不要压缩比"的实时场景
- 4K / 8K broadcast IP streams (SMPTE 2110-22 mezzanine)
- VR-headset wireless display (PC → headset)
- Live multi-camera studio IP switching
- Low-latency video walls (control rooms / signage)
- Real-time use cases where "good enough" beats "smallest"
反适用
AVOID
- Web 静图(用 JPEG / WebP / AVIF)
- 归档 / 压缩比敏感场景(用 JPEG 2000 / JXL)
- VOD / 离线点播(用 H.265 / AV1 视频 codec)
- Web stills (use JPEG / WebP / AVIF)
- Archives / size-sensitive scenes (use JPEG 2000 / JXL)
- VOD / offline playback (use H.265 / AV1 video codecs)
| scope | readers | editors | CLI |
|---|---|---|---|
| JPEG XS | ✗ 浏览器无 · ~ intoPIX SDK · SMPTE 2110-22 设备 · Kakadu(部分) | ✗ 无主流编辑器 · 都是广播链路硬件 / 软件 | ~ CLI 限商业(intoPIX)· 开源参考实现仅用于研究 |
神经压缩 — HiFiC / CDC / NN-codec
Neural compression — HiFiC / CDC / NN-codec
"用神经网络当 codec —— 同 bpp 下视觉效果比 AVIF 好 30%,但解码要 GPU。"
"Use a neural net as codec — visually 30% better than AVIF at same bpp, but needs a GPU to decode."
传统 codec 的设计哲学是手工设计 transform + 量化 + 熵编码:JPEG 用 8×8 DCT、AVIF 用 AV1 intra block 变换、JPEG 2000 用 wavelet —— 每一步都是人写的数学。神经压缩从 2016-2017 起换了路:整个 codec 是一个端到端可训练的神经网络。Toderici 等人 2016 年在 ICLR 用 RNN 做图像压缩;Ballé 等人 2018 年在 ICLR 提出 Hyperprior(用一个小网络估计 latent 的概率分布给熵编码器,大幅提升压缩比);Mentzer / Toderici 等人 2020 年在 NeurIPS 发表 HiFiC(High-Fidelity Generative Compression),引入 GAN 训练让低 bpp 重建有"细节合成";2023 年 Stanford 出 CDC(Conditional Diffusion Codec)用扩散模型当 decoder。在视觉相似度指标(MS-SSIM / LPIPS)上明显赢传统 codec —— 特别在极低 bpp(< 0.3 bpp):传统 codec 这时已经糊成方块、出 ringing,而 NN codec 可以"幻觉"出合理的纹理和细节(虽然不是真实的 —— 是plausibly hallucinated)。但工业部署寥寥:解码器 NN 必须随客户端分发(几十 MB 模型 vs 几 KB 图,反向负担);模型版本升级会让旧 .nn-img 解不出来;解码 GPU 依赖让移动端不可接受;学术界每 6 个月一篇 NeurIPS 论文宣布超越 AVIF 30%,但生产部署没几个真站住的。短期不会替代 AVIF,但可能在"AI 生成内容"领域率先落地 —— 同 AI 生成的图,用 AI 压缩。
Traditional codecs are hand-designed transforms + quantisation + entropy coding: JPEG uses 8×8 DCT, AVIF uses AV1 intra-block transforms, JPEG 2000 uses wavelets — every step is human-written maths. Neural compression took a different path from 2016–2017: the whole codec is a single end-to-end trainable neural network. Toderici et al. did RNN-based compression at ICLR 2016; Ballé et al. introduced the Hyperprior at ICLR 2018 (a small network estimating the latent's probability distribution for the entropy coder, dramatically improving ratios); Mentzer / Toderici et al. published HiFiC (High-Fidelity Generative Compression) at NeurIPS 2020, adding GAN training so low-bpp reconstructions get "detail synthesis"; in 2023 Stanford shipped CDC (Conditional Diffusion Codec) using a diffusion model as decoder. On visual-similarity metrics (MS-SSIM / LPIPS) they clearly beat traditional codecs — especially at very low bpp (< 0.3 bpp): traditional codecs by then are blocky and full of ringing, while NN codecs can "hallucinate" plausible texture and detail (not real — plausibly hallucinated). Industrial deployment stays thin: the decoder NN must ship with the client (tens of MB of model vs. a few KB of image — inverted load); a model-version bump makes old .nn-img files undecodable; GPU dependency rules out mobile; every six months a NeurIPS paper claims +30% over AVIF, but few productions actually stick. Short term it won't replace AVIF, but it may land first in "AI-generated content" — AI images compressed by AI.
技术内核
Technical core
神经压缩内核五块。① Encoder / Decoder 都是 CNN(典型 10-50M params),从图到 latent 是几层下采样卷积 + GDN(generalised divisive normalisation)非线性激活,从 latent 到图是对应的反卷积 / 上采样;两侧权重通过端到端反向传播联合训练。② 超先验(Hyperprior):Ballé 2018 的关键贡献 —— 用一个小网络估计 latent 每个 channel 的 Gaussian / Laplace 概率分布参数 σ,再用 σ 喂 arithmetic coder。这一步让"latent 的统计结构"被显式建模,熵编码效率提升一个量级;之后所有 NN codec 都沿用 Hyperprior 思路。③ GAN 训练(HiFiC):在 rate-distortion 损失之外加一个 Discriminator 判别"重建图 vs 原图",Decoder 学着"骗过 Discriminator"。低 bpp 重建从"糊状方块"变成"幻觉的合理纹理",MS-SSIM / LPIPS / FID 都大幅好转,但细节是合成的不是真实的 —— 这是 NN codec 不能用于法医 / 医学的根本原因。④ Diffusion-based codec(CDC):Stanford 2023 工作 —— Decoder 不是单次反卷积,而是一个条件扩散模型(以 latent 为条件,从噪声开始多步去噪到图像)。优势:diffusion 的"多步细化"对低 bpp 修复尤其好;劣势:解码 50-100 步 NN forward,慢到完全反实时(1080p 几秒)。CDC 现在还是学术阶段,但代表了 NN codec 的下一程方向。⑤ 解码必须 GPU:这是工业部署最大的物理约束 —— 移动端的 GPU(Adreno / Mali)架构跟桌面 NVIDIA 差太远,也跟 NN 推理优化(TensorRT / Core ML)的高端路径差太远;Web 上要做就得走 WebGPU,但目前 WebGPU 对 NN 推理的优化跟原生差 5-10×。所以 NN codec 现在的工业部署模式都是"中央服务器 GPU 解码 → 把解出来的 RGB 再压成 AVIF / WebP / VP9 → 发给客户端" —— 客户端从来没真正解过 NN codec 的码流。
Neural compression's core, five pieces. ① Encoder / Decoder are both CNNs (typically 10–50 M params); image → latent is a few downsampling convolutions plus GDN (generalised divisive normalisation) non-linearity; latent → image is the matching upsampling. Both sides are jointly trained end-to-end via backprop. ② Hyperprior: Ballé 2018's key contribution — a small network estimates the Gaussian / Laplace distribution parameters σ for every channel of the latent, then feeds σ into the arithmetic coder. This explicitly models the latent's statistical structure, lifting entropy efficiency by an order of magnitude; every NN codec since uses Hyperprior. ③ GAN training (HiFiC): on top of rate-distortion loss, add a Discriminator distinguishing "reconstruction vs. original"; the Decoder learns to "fool the Discriminator". Low-bpp reconstructions go from "blurry mush" to "hallucinated plausible texture"; MS-SSIM / LPIPS / FID all improve sharply — but the detail is synthesised, not real. That's the fundamental reason NN codecs can't be used in forensic / medical settings. ④ Diffusion-based codec (CDC): Stanford 2023 — the Decoder isn't a single deconvolution but a conditional diffusion model (start from noise, denoise to the image conditioned on the latent). Pros: diffusion's "multi-step refinement" works especially well for low-bpp restoration. Cons: 50–100 NN forwards per decode, completely off real-time (seconds per 1080p frame). CDC is still academic but charts the next leg. ⑤ Decoding requires a GPU: the biggest physical deployment constraint — mobile GPUs (Adreno / Mali) differ too much from desktop NVIDIA, and from NN-inference optimisation paths (TensorRT / Core ML). On the web you'd go through WebGPU, but its NN-inference performance is 5–10× behind native. So today's industrial NN-codec deployments are "central GPU decode → re-compress as AVIF / WebP / VP9 → ship to client" — clients never actually decode the NN bitstream.
图 57 · 神经压缩完整流程。训练阶段(一次性):大数据集(CLIC / OpenImages ~1M 图)喂进 Encoder + Hyperprior + Decoder + (Discriminator) 联合训练,损失函数是"熵率 + 重建距离 + GAN 损失"的拉格朗日组合,几周 GPU 集群训出 10-50M 参数的模型(.pt / .onnx 文件 30-150 MB)。模型必须随 Decoder 一起分发到客户端。推理阶段(每次编码 / 解码):原图 → Encoder NN → quantise → arithmetic encode → bytes(0.1 bpp 的 1080p ≈ 30 KB);bytes → arithmetic decode → Decoder NN → 重建图。GPU 上每帧 ~80 ms,纯 CPU ~1.5 s。最反直觉的部分:模型升级会让旧 .nn-img 解不出来 —— 这跟 JPEG / PNG / AVIF 那种"几十年向后兼容"完全相反,这也是 NN codec 短期不可能上 web 主战场的根本原因。
Fig 57 · Full neural-compression workflow. Training (one-off): a large dataset (CLIC / OpenImages, ~1 M images) trains Encoder + Hyperprior + Decoder + (Discriminator) jointly under a Lagrangian of "entropy rate + reconstruction distance + GAN loss". A few weeks on a GPU cluster yields a 10–50 M-param model (.pt / .onnx, 30–150 MB). The model must ship to clients alongside the Decoder. Inference (per encode / decode): image → Encoder NN → quantise → arithmetic encode → bytes (1080p at 0.1 bpp ≈ 30 KB); bytes → arithmetic decode → Decoder NN → reconstruction. ~80 ms / frame on GPU, ~1.5 s on CPU. The most counterintuitive part: a model bump makes old .nn-img files undecodable — the opposite of JPEG / PNG / AVIF's "decades of backward compatibility", which is the fundamental reason NN codecs can't fight on the web's main front in the short term.
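bpp 和字节数的换算、以及"模型体积摊销"的账,可以这样粗算(假设:"每图比 AVIF 省 10 KB"是示意数字,不是实测;正文的 ~30 KB 是同量级约数):
The bpp-to-bytes conversion and the model-amortisation arithmetic can be sketched like this (assumption: the "10 KB saved per image vs AVIF" is illustrative, not measured; the text's ~30 KB is a same-order round number):

```python
def bpp_to_bytes(w, h, bpp):
    """给定分辨率和目标 bpp,估算压缩后体积(字节)。"""
    return w * h * bpp / 8

kb = bpp_to_bytes(1920, 1080, 0.1) / 1000
print(f"1080p @ 0.1 bpp ≈ {kb:.0f} KB")   # ≈ 26 KB

# 模型分发摊销:假设模型 100 MB、NN codec 每图比 AVIF 省 10 KB(示意数字)
model_mb, saving_kb = 100, 10
break_even = model_mb * 1000 / saving_kb
print(f"约需 {break_even:.0f} 张图才摊平模型体积")
```

这笔账正是"低流量场景摊不平"的量化版本:一万张图以下,分发模型本身就比省下的流量贵。
This is the quantified version of "low-volume scenes can't amortise": below ten thousand images, shipping the model costs more than the traffic it saves.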
| codec | year | author | feature |
|---|---|---|---|
| Toderici LSTM | 2016 | Google | 早期 RNN-based · 概念奠基 |
| Ballé Hyperprior | 2018 | NYU / Google | Gaussian Hyperprior · 领域基石 |
| HiFiC | 2020 | Google Research | GAN-based · 低 bpp 王 |
| ELIC | 2022 | SenseTime | Efficient Learned Image Compression · 实用化向 |
| CDC | 2023 | Stanford | Diffusion-based decoder · 多步去噪 |
| ContextFormer | 2023 | Microsoft Research | Transformer-based hyperprior |
$ pip install compressai                     # PyTorch NN codec 库 · InterDigital
$ python -m compressai.utils.eval_model \
      pretrained /path/to/images \
      -a bmshj2018-hyperprior                # Ballé 2018 Hyperprior 评测(参数以 CompressAI 文档为准)
$ # 编解码没有独立的 CLI 子命令,走 Python API:zoo 模型对象提供 compress() / decompress()
$ python -c "from compressai.zoo import bmshj2018_hyperprior; \
  m = bmshj2018_hyperprior(quality=8, pretrained=True)"
$ pip install neuralcompression              # Meta(Facebook AI Research)NN codec 研究库
适用
USE FOR
- (未来)AI 生成内容压缩(同生态:AI 图 + AI codec)
- 极低 bpp(< 0.3)+ 服务器侧 GPU 可用的场景
- 云游戏 / 流媒体的服务器侧解码 → 转 AVIF 流出
- 研究 / 学术评测 / 数据集压缩(实验性)
- "内容 ≫ 模型大小"的高带宽专用通道(Stadia 类)
- (future) AI-generated-content compression — AI image + AI codec, same ecosystem
- Very-low-bpp (< 0.3) scenes with server-side GPUs
- Cloud-gaming / streaming: server decode → re-encode as AVIF on the way out
- Research / academic benchmarking / dataset compression
- "Content ≫ model size" specialised channels (Stadia-style)
反适用
AVOID
- 当前 web · 任何无 GPU 端(移动 / IoT / 老电脑)
- 需要"每个像素真实"的场景(法医 / 医学 / 卫星)
- 需要长期归档(码流不向后兼容)
- 客户端实时编码(GPU 编码也很贵 · 普通用户上传不行)
- 低流量场景(模型 30-150 MB 摊不平)
- Today's web · any GPU-less endpoint (mobile / IoT / old PCs)
- Anything needing "every pixel real" (forensic / medical / satellite)
- Long-term archives (no bitstream backward compatibility)
- Client-side real-time encoding (GPU encode is also expensive — user uploads can't take it)
- Low-volume scenes (the 30–150 MB model can't amortise)
| scope | readers | editors / pipelines | CLI |
|---|---|---|---|
| NN codec(各家不互通) | ✗ 无任何浏览器原生 · ~ compressai(InterDigital)· neuralcompression(Meta)· 各家自家 SDK | ✗ 无主流编辑器 · 仅研究代码 · TensorFlow Compression · PyTorch + 自训模型 | compressai.utils.eval_model(评测脚本)· tfci(TensorFlow Compression)· 各家自家 CLI 工具 |
HEIC Live Photo — 苹果的图 + 视频混合容器
HEIC Live Photo — Apple's still + video twin container
"一张照片其实是一个 .heic + 一个 .mov 的双胞胎。"
"One 'photo' is actually a twin: one .heic and one .mov."
2015 年 9 月,Apple 在 iPhone 6s 上推出 Live Photo —— 拍照时同时录下前 1.5 秒 + 后 1.5 秒共 3 秒视频,让"静态照片"在长按时能"动一下"。技术上这不是一种新的图像格式,而是一个双文件容器思路:1 张 HEIC 静图(iOS 11 起 HEIC 取代 JPEG 成为默认)+ 1 段 MOV 视频(H.264 1080p 25fps 无声),通过 metadata 里的 asset identifier UUID 关联,Photos.app(iOS / macOS)把它们当一个对象呈现。这种"图 + 视频组合"是HEIC(基于 HEIF 的 ISOBMFF 容器)在工程层的延伸 —— HEIF 容器原生支持图像 + video track 共存(参见AVIF Sequence),但 Apple 选择不把它们装进同一个 HEIF 文件,而是分两个文件靠 UUID 维系。原因可能是 backward compatibility:老的 .heic / .mov 工具不需要为 Live Photo 改动,各自能独立打开。代价就是跨平台传输:AirDrop 给非 iOS 设备时,MOV 部分会丢失,接收方只看到一张静图。它是"关联式混合容器格式"在消费级场景的代表 —— 同思路 Google 的 Motion Photo(2017)、Samsung 的 Motion Photo 都是 .jpg 加内嵌 mp4,只是合在一个文件里。
In September 2015 Apple introduced Live Photo on the iPhone 6s — the camera simultaneously records 1.5 s before and 1.5 s after the shot, three seconds total, so a "still photo" can "move a little" on long-press. Technically it's not a new image format but a twin-file container idea: one HEIC still (HEIC replaced JPEG as the default starting iOS 11) + one MOV video (H.264 1080p 25 fps, no audio), linked via an asset identifier UUID in the metadata, with Photos.app (iOS / macOS) presenting them as one object. The "still + video pair" extends HEIC (an ISOBMFF-based HEIF container) at the engineering layer — HEIF natively supports image + video track in one file (see AVIF Sequence), but Apple chose not to pack them into a single HEIF file, instead splitting across two files held together by UUID. Probably for backward compatibility: legacy .heic / .mov tools needed no Live-Photo changes; each opens independently. The cost is cross-platform transfer — AirDrop to a non-iOS device drops the MOV; the receiver sees only the still. It's "linked-pair hybrid container" at consumer scale — Google Motion Photo (2017) and Samsung's equivalent take the same idea but pack the .jpg and the mp4 into a single file.
IMG_0001.HEIC(主静图,HEVC intra,2-3 MB)+ IMG_0001.MOV(3 秒 H.264 视频,1080p 25fps 无声,3-5 MB)。两者通过 metadata 里相同的 asset identifier UUID 关联(.heic 的 UUID box / .mov 的 com.apple.quicktime.content.identifier metadata),Photos.app(iOS / macOS)读两个文件的 UUID 一致就把它们当作一张 Live Photo 呈现。AirDrop 给非 iOS 设备时,MOV 不会被识别为关联资产,只有 HEIC 静图过去 —— 这是 Live Photo 跨平台兼容差的根因。
IMG_0001.HEIC (main still, HEVC intra, 2–3 MB) + IMG_0001.MOV (3-second H.264, 1080p 25 fps, audioless, 3–5 MB). They're linked via a shared asset identifier UUID in metadata (.heic's UUID box / .mov's com.apple.quicktime.content.identifier); Photos.app (iOS / macOS) sees the matching UUIDs and presents them as one Live Photo. AirDrop to a non-iOS device doesn't recognise the MOV as a linked asset — only the HEIC still travels — which is exactly why Live Photo's cross-platform compatibility is poor.
技术内核
Technical core
Live Photo 内核两件事。① 双文件 + UUID 关联:拍照那一刻 iPhone 同时存两个文件 —— IMG_xxxx.HEIC(默认 iOS 11+ 静图格式 · HEVC intra block 编码 · ~2-3 MB)+ IMG_xxxx.MOV(QuickTime 容器装 H.264 · 1080p 25fps · 无声 · 前 1.5 + 后 1.5 共 3 秒 · ~3-5 MB)。两者通过 metadata 里相同的 asset identifier UUID 关联:.heic 在 ISOBMFF 的 uuid box 里写,.mov 在 moov.meta.keys.com.apple.quicktime.content.identifier 里写。这套关联机制是 Apple 私有的,但 UUID 字段格式在 iOS Photos 框架里有公开 API。② Photos.app 把两个文件当一个对象:iOS / macOS 的 PhotoKit 框架在导入照片时检测到匹配的 UUID 就自动绑定,UI 层呈现一个图标(单张静图 + 长按播放视频),云端同步(iCloud Photos)也作为一个 asset 同步。第三方 app 想读 Live Photo 必须走 PhotoKit 的 PHAssetResource API —— 直接读两个文件 + 匹配 UUID 也行,但要自己实现绑定逻辑。AirDrop / iMessage 在 Apple 设备间能保留双文件;但跨平台(发到 Android / Windows)只发 HEIC,MOV 部分丢失 —— 这是"双文件容器"路线最大的代价。
Live Photo's core is two things. ① Twin files + UUID link: the iPhone stores two files at capture — IMG_xxxx.HEIC (default iOS 11+ still format · HEVC intra · ~2–3 MB) + IMG_xxxx.MOV (QuickTime container with H.264 · 1080p 25 fps · no audio · 1.5 s before + 1.5 s after = 3 s · ~3–5 MB). They're linked via a shared asset identifier UUID: the .heic writes it in an ISOBMFF uuid box; the .mov writes it under moov.meta.keys.com.apple.quicktime.content.identifier. The mechanism is Apple-private, but the UUID field is exposed through public APIs in iOS's Photos framework. ② Photos.app treats them as one asset: iOS / macOS PhotoKit detects matching UUIDs on import and binds them automatically; the UI shows a single item (still image, long-press to play the video); iCloud Photos syncs them as one asset. Third-party apps that want to read Live Photos should go through PhotoKit's PHAssetResource API — reading the two files directly and matching UUIDs works too, but then you implement the binding yourself. AirDrop / iMessage between Apple devices preserves both files; cross-platform (to Android / Windows) only the HEIC travels and the MOV is lost — the biggest cost of the "twin-file container" path.
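The uuid box mentioned above sits in the shared ISOBMFF layout: each box starts with a 4-byte big-endian size and a 4-byte type, with escapes for 64-bit sizes and to-end-of-file boxes. Here is a minimal top-level box walker; it is a sketch only, since Apple's content identifier actually lives in nested metadata boxes, so a real reader would recurse into meta / moov rather than stop at the top level.

```python
import struct

def iter_boxes(data: bytes):
    """Yield (type, payload) for each top-level ISOBMFF box.

    Both .heic (HEIF) and .mov (QuickTime) use this layout. size == 1 means
    a 64-bit "largesize" follows the type; size == 0 means the box extends
    to the end of the file.
    """
    pos = 0
    while pos + 8 <= len(data):
        size, btype = struct.unpack(">I4s", data[pos:pos + 8])
        hdr = 8
        if size == 1:
            size = struct.unpack(">Q", data[pos + 8:pos + 16])[0]
            hdr = 16
        if size == 0:
            size = len(data) - pos
        yield btype.decode("ascii", "replace"), data[pos + hdr:pos + size]
        pos += size

def uuid_payloads(data: bytes):
    """Payloads of top-level 'uuid' boxes: 16-byte usertype, then the body."""
    return [p for t, p in iter_boxes(data) if t == "uuid"]
```

Feeding it the first kilobyte of a .heic is enough to list the top-level box types (ftyp, meta, mdat, …); the identifier itself still requires descending into the metadata.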
适用
USE FOR
- iPhone / iPad Live Photo 拍照(默认开启)
- Apple 生态内分享(AirDrop / iMessage / iCloud)
- iOS 锁屏 / 壁纸"动起来"效果
- macOS Photos.app 浏览 / 编辑(导出可选静图 / GIF / video)
- iPhone / iPad Live Photo capture (on by default)
- Within Apple ecosystem (AirDrop / iMessage / iCloud)
- iOS lock-screen / wallpaper "moving" effects
- macOS Photos.app browsing / editing (export to still / GIF / video)
反适用
AVOID
- 任何非 Apple 生态(Android / Windows / Web)
- 跨平台分享(MOV 丢失,只剩静图)
- 第三方 app 不走 PhotoKit 的话需自己处理 UUID 绑定
- 需要"单文件即可"的纯静图场景(用 HEIC / JPEG)
- Anything outside Apple's ecosystem (Android / Windows / Web)
- Cross-platform sharing (MOV lost, still-only remains)
- Third-party apps not on PhotoKit must handle UUID binding themselves
- Pure stills where one file suffices (use HEIC / JPEG)
| scope | readers | editors | CLI |
|---|---|---|---|
| HEIC Live Photo | ✓ iOS Photos · macOS Photos · PhotoKit API · ~ third-party heif-tools (partial) · ✗ no web browser support | ✓ iOS Photos · macOS Photos · third-party Live Photo editor apps (Lively · Motion Stills, discontinued) | ~ exiftool reads the UUID metadata · ffmpeg handles the MOV part · heif-info reads the HEIC |
命令行 codec 一览
Command-line codec roster
这一节是查询表 —— 按"目标格式"找对应工具。所有命令均假设你已经装好对应工具(brew / apt 安装名见每行末尾)。命令风格各家不一,但参数语义大致互通:-q / --quality 控质量、-o / --output 给输出文件、-s / --speed 调编码速度(慢 = 小)。
A reference table — find the right tool by output format. Each row assumes you have installed the package (Homebrew / apt name at the end). Each codec uses its own flag dialect, but the semantics roughly converge: -q / --quality for quality, -o / --output for the output file, -s / --speed for the encoder speed (slower = smaller).
| format | encoder | decoder | typical command | install |
|---|---|---|---|---|
| JPEG | cjpeg / mozjpeg / jpegli | djpeg | cjpeg -quality 85 -optimize in.ppm > out.jpg | libjpeg-turbo |
| PNG | oxipng / pngcrush / optipng | libpng | oxipng -o6 in.png | oxipng |
| WebP | cwebp | dwebp | cwebp -q 75 in.png -o out.webp | libwebp |
| AVIF | avifenc | avifdec | avifenc -s 6 -a end-usage=q -a cq-level=23 in.png out.avif | libavif |
| JPEG XL | cjxl | djxl | cjxl in.png out.jxl --quality 90 | libjxl |
| HEIC | heif-enc | heif-dec | heif-enc -q 60 in.png -o out.heic | libheif |
| GIF | gifsicle / convert | gifsicle | gifsicle --colors 256 -O3 in.gif > out.gif | gifsicle |
| BC1-7 (DDS) | nvtt_export / texconv / ispc_texcomp | D3D / OpenGL native | nvtt_export --bc7 in.png -o out.dds | nvtt |
| ASTC | astcenc | astcdec / GPU | astcenc -cl in.png out.astc 6x6 -medium | astcenc |
| KTX2 | toktx | ktxinfo | toktx --bcmp 7 out.ktx2 in.png | KTX-Software |
| Basis | basisu | basisu | basisu -ktx2 in.png -output_file out.ktx2 | basis_universal |
| OpenEXR | oiiotool / exrtools | OpenImageIO | oiiotool in.png -o out.exr | OpenImageIO |
| TIFF | libtiff / convert | libtiff | convert in.png -compress lzw out.tif | libtiff |
| RAW | — | dcraw / LibRaw / rawtherapee-cli | dcraw -v -w in.NEF | libraw |
| DICOM | dcmconv | dcmdump / dcm2pnm | dcm2pnm in.dcm out.pnm | dcmtk |
| SVG | svgo / inkscape / resvg | resvg / browser | svgo in.svg -o out.svg | svgo |
| FITS | astropy / cfitsio | astropy / ds9 | python -c "from astropy.io import fits; ..." | astropy |
| generic | libvips / ImageMagick | same | vips copy in.png out.avif[Q=60] | libvips |
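The roster is easy to wrap in a small dispatcher keyed on the target suffix. The sketch below copies three command templates verbatim from the table rows above; the dict and function names are illustrative, and note that the quality scales are not interchangeable between tools.

```python
import pathlib
import shlex

# Target suffix → command template, taken from the roster above.
# Quality scales differ per tool: avifenc's cq-level is inverted
# (lower = higher quality), cjxl's --quality is 0-100 like JPEG.
ENCODERS = {
    ".webp": "cwebp -q 75 {src} -o {dst}",
    ".avif": "avifenc -s 6 -a end-usage=q -a cq-level=23 {src} {dst}",
    ".jxl":  "cjxl {src} {dst} --quality 90",
}

def encode_command(src: str, dst: str) -> list[str]:
    """Build the argv for the encoder matching dst's extension."""
    template = ENCODERS[pathlib.Path(dst).suffix.lower()]
    return shlex.split(template.format(src=shlex.quote(src), dst=shlex.quote(dst)))
```

Pass the argv to subprocess.run(...); a KeyError simply means the roster has no encoder registered for that suffix.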
几个值得知道的事实
A few worth-knowing facts
- libvips 是隐藏的性能王者 —— 大批量处理比 ImageMagick 快 5-10×、内存占用低 10×。所有需要"批量转码 100k+ 张图"的场景都应该首选 vips。
- jpegli 是 Google 2024 年从 libjxl 仓库剥出来的"现代 JPEG 编码器" —— 同 quality 比 mozjpeg 体积小约 35%,而且产出仍是合法 JPEG,所有 JPEG 解码器都能读。
- oxipng 是 Rust 写的 PNG 重压缩器 —— 比 pngcrush 快 5-10×,体积稍小;`oxipng -o6` 是大多数项目的默认配方。
- Squoosh CLI(`npm i -g @squoosh/cli`)是浏览器中 squoosh.app 的命令行版本 —— 一个 Node 包搞定 AVIF / WebP / JXL / mozjpeg / oxipng,适合 CI 流水线。
- libvips is the hidden performance king — bulk pipelines run 5-10× faster than ImageMagick at one-tenth the memory. Any "batch-convert 100 k images" job should reach for vips first.
- jpegli, spun out of the libjxl repo by Google in 2024, is the "modern JPEG encoder" — about 35% smaller than mozjpeg at the same quality, and the output is still legal JPEG that every decoder can read.
- oxipng is a Rust PNG re-packer — 5-10× faster than pngcrush and slightly smaller; `oxipng -o6` is the default recipe for most projects.
- Squoosh CLI (`npm i -g @squoosh/cli`) is the command-line cousin of the browser-based squoosh.app — one Node package wraps AVIF / WebP / JXL / mozjpeg / oxipng, ideal for CI pipelines.
DevTools 看响应头与解码时间
DevTools — response headers & decode time
浏览器选择哪种格式不是玄学,而是三个 HTTP 头 + 一段 JS 解码任务共同决定的。Network 面板看 Accept(请求时浏览器宣告支持哪些格式)、Content-Type(响应里服务器实际返回什么)、Content-Length(字节数)三个头,这是 picture / source 协商的全部凭证。Performance 面板里"Decode Image"任务才是真实的代价 —— AVIF 比 JPEG 慢 3 倍、JXL 又比 AVIF 快 2 倍,这些差异在 4G 慢网下会被字节数掩盖,在 5G 快网或本地 CDN 下却开始主导首屏渲染时间。
Which format the browser picks is not magic — it's decided by three HTTP headers plus a JS decode task. The Network panel shows Accept (the browser announces what it supports), Content-Type (what the server actually returns), and Content-Length (the byte count). These three are the entire vocabulary of picture / source negotiation. The Performance panel's "Decode Image" task is the real cost — AVIF decodes about 3× slower than JPEG, JXL about 2× faster than AVIF; over slow 4G the byte savings dominate, but on 5G or a near CDN, decode time starts to set first-paint.
Network 面板
Network panel
Accept(请求头)+ Content-Type(响应头)+ Vary: Accept(响应头)三段共同构成"按浏览器选格式"的协商凭证。CDN 上一定要设 Vary: Accept,否则 Chrome 拿到 AVIF 后,Safari 也会拿到同一份 AVIF —— 然后解码失败。
Accept (request) + Content-Type (response) + Vary: Accept (response) form the negotiation contract. CDNs must set Vary: Accept, otherwise Chrome's AVIF cache will be served to Safari, which then fails to decode.
Performance 面板 — Decode 任务
Performance panel — decode task
picture + source fallback 链
picture + source fallback chain
<picture> + 多 <source> 的 fallback 链是树状匹配。浏览器自顶向下扫描,遇到第一个 type 自己支持的就停下,后面的 source 完全不下载。所以"AVIF → WebP → JPEG"的顺序很重要 —— 反过来写 JPEG 永远赢,AVIF 永远没机会。
<picture> with multiple <source> tags is a tree-shaped match. The browser scans top-down and stops at the first type it supports — every later source is never fetched. Order matters: "AVIF → WebP → JPEG" is correct; reversed, JPEG always wins and AVIF never gets a chance.
同图横评 — 解码时间条形图
Same image — decode time bars
把"Accept 头 + Content-Type 响应 + Vary: Accept 缓存指令"理解透,你就抓住了"为什么这台浏览器收到 AVIF、那台收到 JPEG"的全部机理。把 Performance 面板里的 Decode Image 长度看习惯,你就知道"是不是该用 AVIF"不只是字节问题,而是字节 ÷ 解码时间的比值问题。
Understand the trio "Accept request + Content-Type response + Vary: Accept cache directive" and you have the full mechanism for "why this browser got AVIF and that one got JPEG." Get used to reading Decode Image durations in the Performance panel, and "should I serve AVIF" stops being a byte question and becomes a bytes ÷ decode-time ratio question.
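Both halves of that mechanism are the same first-match scan: a server walking the Accept header, or a browser walking a <picture> element's <source type=...> list. A minimal server-side sketch (the function name is illustrative; a real deployment must also send Vary: Accept, as described above):

```python
def pick_format(accept: str) -> str:
    """Return the best image Content-Type this client advertises.

    `accept` is the raw Accept request header. The preferred-order scan
    mirrors the <picture>/<source> fallback chain: first supported wins.
    """
    advertised = {part.split(";")[0].strip() for part in accept.split(",")}
    for candidate in ("image/avif", "image/webp"):  # newest format first
        if candidate in advertised:
            return candidate
    return "image/jpeg"  # universal fallback every browser decodes

# The response must carry 'Vary: Accept' so shared caches key on the header;
# otherwise one browser's cached AVIF gets served to one that can't decode it.
```

Reversing the candidate tuple reproduces the ordering bug from the <picture> section: JPEG would win for every client.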
libvips vs ImageMagick — 性能对比表
libvips vs ImageMagick — performance comparison
两个最常见的"通用图像处理库",设计目标完全相反 —— ImageMagick 1990 年代生于 Unix 工具传统,先把整张图解码到内存再做操作,简单直接;libvips 1990 年代末由 VIPS 项目演化而来,核心思路是 streaming pipeline:像素一行一行流过处理链,从不把整张图加载进内存。这条架构差异在批量处理场景被放大成 5-10× 的速度差和 10× 的内存差,直接决定了它们各自的最佳战场。
The two most common general-purpose image-processing libraries are designed for opposite goals. ImageMagick, born of 1990s Unix tools, decodes the whole image into memory and operates on it — simple and direct. libvips evolved from the VIPS project of the late 1990s and is built on a streaming pipeline: pixels flow through the chain row by row, the full image never loads into RAM. That single architectural choice expands into 5-10× speed and 10× memory differences at scale — and that decides which library belongs where.
| metric | libvips | ImageMagick |
|---|---|---|
| 设计 / design | streaming + parallel(pthread) | full-load(whole image in RAM) |
| 100 张 4K → JPEG 时间 / time | ~8 s | ~60 s |
| 100 张 4K → AVIF 时间 / time | ~120 s | ~600 s |
| 峰值内存 / peak RAM | ~50 MB | ~500 MB |
| 命令行 / CLI | vips copy in.png out.jpg[Q=85] | convert in.png -quality 85 out.jpg |
| 学习曲线 / learning curve | moderate(distinctive API style) | low(memorable command names) |
| format coverage | common + modern(AVIF / WebP / JXL) | 250+ formats(incl. legacy / rare containers) |
| 典型用法 / typical use | bulk services · high-throughput thumbnails | one-off edits · complex filters · legacy-format recovery |
数据基于 libvips 官方 benchmark + 社区验证,实测值因机型 / 任务类型浮动 ±30%。结论是稳定的:需要批量、需要省内存、需要快 用 libvips;需要冷门格式、需要复杂滤镜、单次任务 用 ImageMagick。
Numbers are taken from the libvips official benchmark plus community runs; real values shift ±30% by hardware and task type. The takeaway is robust: pick libvips when you need bulk, low memory, high speed; pick ImageMagick when you need rare formats, complex filters, or one-off jobs.
两种内存模型示意
Two memory models
一句记忆口诀:"vips 流水、IM 大屋" —— vips 像生产线传送带,材料(像素行)源源不断流过工位;ImageMagick 像把所有材料堆进一个大房间再一起加工。两种思路都对,只是适合不同规模。
A mnemonic: "vips is the conveyor, IM is the warehouse" — vips moves rows past stations like an assembly line; ImageMagick piles everything into one big room and processes in place. Both philosophies work; they just fit different scales.
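The conveyor-vs-warehouse contrast can be made concrete with a toy pipeline over integer "rows"; both functions compute the same answer, but the streaming one never holds more than a single row in memory (all names here are illustrative, not any real vips API):

```python
def rows(n: int, width: int):
    """Source stage: generate n rows of synthetic pixel values lazily."""
    for y in range(n):
        yield [y * width + x for x in range(width)]

def brighten(stream, amount: int):
    """Filter stage: transform one row at a time as it flows past."""
    for row in stream:
        yield [v + amount for v in row]

def streaming_sum(n: int, width: int) -> int:
    # conveyor: rows flow through the chain; peak memory is one row
    return sum(sum(row) for row in brighten(rows(n, width), 10))

def fullload_sum(n: int, width: int) -> int:
    # warehouse: materialise the whole image first, then operate in place
    image = [list(row) for row in rows(n, width)]
    image = [[v + 10 for v in row] for row in image]
    return sum(sum(row) for row in image)
```

Swap `n=4` for `n=40_000` and the warehouse version's memory grows with the image while the conveyor's stays one row; that is the whole architectural argument in miniature.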
「我应该用哪个格式」决策树
"Which format should I use" — decision tree
本文走完 50+ 格式之后,最常被问到的还是这一句:"那我到底用哪个?"答案分两层:第一层是用途场景 —— 屏幕显示、GPU 纹理、HDR 影视、科学医学,用途不同候选集就完全不同;第二层是具体约束 —— 兼容老浏览器吗?要透明通道吗?是 16-bit 工程影像吗?这张决策树是出发点,不是教条 —— 真实工程里你可能因为某个客户的 IT 政策只能用 JPEG,或因为某个 GPU 不支持 BC7 而退回 BC1,这些场景在决策树之外。
After 50+ formats, the question we still get most is: "OK, so which one do I use?" The answer has two layers. Layer one is use case — screen display, GPU texture, HDR film, science / medicine; different domains, totally different shortlists. Layer two is specific constraints — must support legacy browsers? need alpha? 16-bit engineering imagery? This tree is a starting point, not gospel — real projects sometimes pin you to JPEG for IT-policy reasons, or fall back from BC7 to BC1 because of a target GPU; those edge cases live outside the tree.
树根问"用途",叶子才到具体格式。中间几跳问的是"老浏览器要不要兜底""有没有透明""是不是 16-bit"。同一片叶子(比如"屏幕显示 · 照片"),最终选 AVIF 还是 JPEG,取决于客户群是不是全在 Safari 16+。这棵树没有"绝对正确",只有"在你的约束下,谁先上 + 谁兜底"。
The root asks "what for"; only the leaves name a format. The middle hops ask "do legacy browsers need a fallback?" "is there alpha?" "is it 16-bit?". On the same leaf — say "screen · photo" — choosing AVIF over JPEG depends entirely on whether your audience is all on Safari 16+. The tree has no absolute right answer; only "given your constraints, what's the primary and what's the fallback?"
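Condensed to code, the tree might look like the sketch below; the branch order and leaf shortlists are an illustrative distillation of the phases above, not an exhaustive mapping, and every name is a placeholder you would tune to your own constraints:

```python
def choose_format(use: str, *, legacy_browsers: bool = False,
                  alpha: bool = False, hdr: bool = False) -> str:
    """Toy decision tree: root asks the use case, middle hops ask the
    constraints, leaves name a shortlist (primary + fallback)."""
    if use == "screen":
        if alpha:
            return "PNG" if legacy_browsers else "AVIF + PNG fallback"
        return "JPEG" if legacy_browsers else "AVIF + JPEG fallback"
    if use == "gpu-texture":
        return "KTX2 (Basis Universal / ASTC)"
    if use == "film-hdr":
        return "OpenEXR" if hdr else "16-bit TIFF"
    if use == "science":
        return "FITS / GeoTIFF"
    raise ValueError(f"unknown use case: {use}")
```

The point of writing it down is the shape, not the leaves: the constraints (`legacy_browsers`, `alpha`, `hdr`) are keyword-only precisely because they are questions you must answer explicitly, never defaults.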
四个职业场景的"开箱组合"
Four professional starter kits
这四个组合不是"最优解",只是"开箱推荐"。真实工程里你会遇到甲方只接受 JPEG、Unity 强制要求 ASTC LDR、医院 PACS 系统只识别 DICOM 1995 子集 —— 这些约束才是决策树之外的真正变量。但当约束消失时,这四套组合是 2026 年最不出错的起点。
These four kits aren't "optimal" — they're "out-of-the-box recommendations." Real projects hit constraints — clients accepting only JPEG, Unity demanding ASTC LDR, a hospital PACS that only reads a 1995 DICOM subset — and those constraints are the real variables outside the tree. But when constraints recede, these four kits are the safest 2026 starting points.
像素的归宿
The fate of the pixel
它出生在一颗 CMOS sensor 的硅井底部 —— 一个 14-bit 的电荷,被 ADC 抬上数字总线,被相机固件写进一个叫 .ARW 的 RAW 文件,sleep 在某张 SD 卡的 NAND 块上六个月没人看。它当时还不是"像素",它只是一个电压样本,一个 16384 之中的整数,带着读出噪声和热噪声,带着一行 EXIF 和一段 ICC profile 等待被解释。
It was born at the bottom of a CMOS sensor's silicon well — a 14-bit charge, lifted onto a digital bus by an ADC, written by camera firmware into a .ARW RAW file, sleeping in the NAND of some SD card for six months with no one looking. It wasn't a "pixel" yet — just a voltage sample, an integer out of 16384, carrying read noise and thermal noise, a line of EXIF, and an ICC profile waiting to be interpreted.
六个月后它被 LibRaw 解码成 16-bit linear,被 Lightroom 调色,被导出成 16-bit TIFF 进 Photoshop 修瑕,被另存为 sRGB JPEG 上传朋友圈,又被同一张图压成 AVIF 上博客 hero,被 Cloudflare CDN 缓存到全球 200 个边缘节点,被一万个浏览器在一分钟内同时解码,在某些 WebGL 场景里它被上传到 GPU 显存压成 BC7 块,被 fragment shader 采样过 12 次,被 ICC profile 从 sRGB 映到 Display P3,被 mipmap 选了 LOD 2,被 trilinear filter 平滑掉了高频。它有时是 24 bit,有时是 8 bit,有时是 4 bit/pixel,有时是浮点。它一直在变形。
Six months later LibRaw decodes it into 16-bit linear, Lightroom grades it, it exports as 16-bit TIFF into Photoshop for retouching, saves as sRGB JPEG to a social feed, gets re-compressed as AVIF for a blog hero, lives in Cloudflare's CDN across 200 edge nodes, decodes simultaneously in ten thousand browsers within a minute, gets uploaded to GPU memory as a BC7 block in some WebGL scene, is sampled 12 times by a fragment shader, gets remapped from sRGB to Display P3 by an ICC profile, picks LOD 2 from a mipmap chain, gets smoothed by a trilinear filter. Sometimes it's 24 bits, sometimes 8, sometimes 4 bits per pixel, sometimes floating point. It never stops changing shape.
它最后变成了屏幕上一个发光的小方块。它当过 RAW、当过 AVIF、当过 BC7、当过显存、当过电压、当过光子。每一段旅程都给它换了一个容器,但它一直是同一颗像素 —— 一个被反复翻译、反复重写、反复压缩、反复采样,却始终保留某种"原意"的微小信号。
It ends as a glowing square on a screen. It has been a RAW, an AVIF, a BC7 block, GPU memory, a voltage, a photon. Every leg of the journey gave it a different container — but it stayed the same pixel: a tiny signal repeatedly translated, rewritten, compressed, and sampled, somehow holding onto its original meaning through every transform.
三个反直觉结论
Three counter-intuitive takeaways
沉淀这五十多种格式之后,有三件事是写完之前没意识到的。
After settling fifty-plus formats into this codex, three things surprised me — none of which I expected before writing.
QOI(2021)的 spec 一页 A4 写得下,实现 300 行 C 代码,比 PNG(1996)简单 100×,编码却比 libpng 快 20-50×、解码快 3-4×,文件只大约 20%。Farbfeld(2014)更激进 —— 干脆不压缩,只做"标头 + 像素"。PCX(1985)的 RLE 在纯色场景甚至比 PNG 还小。简洁是一种持久的设计姿态,不是历史遗物 —— 当 GIF 还在被使用、BMP 还在 Windows 剪贴板里跑、JPEG 仍占 web 图像 60%,你会发现"老"和"差"是两个独立维度。
QOI's spec (2021) fits on one A4 page; the reference implementation is 300 lines of C — 100× simpler than PNG (1996), yet it encodes 20-50× and decodes 3-4× faster than libpng, with files only ~20% larger. Farbfeld (2014) goes further — no compression at all, just "header + pixels." PCX's RLE (1985) beats PNG on flat-color art. Simplicity is a durable design posture, not a relic — when GIF is still in use, BMP still drives the Windows clipboard, and JPEG still serves 60% of the web, "old" and "bad" turn out to be independent axes.
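Two of QOI's ops fit in a dozen lines. This is a hedged toy over (r, g, b, a) tuples that emits symbolic ops instead of the spec's byte format; the 64-slot index hash is straight from the one-page spec, while the DIFF/LUMA ops, the run-length cap of 62, and the wire encoding are omitted:

```python
def qoi_sketch(pixels):
    """Emit ("RUN", n) / ("INDEX", slot) / ("RGBA", px) ops, QOI-style.

    index: 64-slot recently-seen-pixel array, addressed by the spec's hash
    (r*3 + g*5 + b*7 + a*11) mod 64. prev starts at opaque black, per spec.
    """
    index = [(0, 0, 0, 0)] * 64
    prev, run, ops = (0, 0, 0, 255), 0, []
    for px in pixels:
        if px == prev:                      # repeat of previous pixel
            run += 1
            continue
        if run:                             # flush the pending run
            ops.append(("RUN", run))
            run = 0
        h = (px[0] * 3 + px[1] * 5 + px[2] * 7 + px[3] * 11) % 64
        if index[h] == px:                  # seen recently: 1-byte op in real QOI
            ops.append(("INDEX", h))
        else:                               # new pixel: remember it, emit literal
            index[h] = px
            ops.append(("RGBA", px))
        prev = px
    if run:
        ops.append(("RUN", run))
    return ops
```

Even this toy shows why QOI works: photographic neighbours hit RUN and INDEX constantly, and every op is O(1) with no entropy coder anywhere.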
直觉上块越大越糊,但 ASTC 6×6(3.56 bpp)对比 BC7 4×4(8 bpp)显存少 2.25×,ΔPSNR 只有不到 1 dB,SSIM 差异在双盲测里几乎不可分辨。移动游戏开发者一致默认 ASTC 6×6 是甜点;Unity 的 mobile preset 直接以 6×6 为默认。压缩比的甜点不在 4×4,而在让 GPU 缓存命中率最大化的那个块大小 —— 4×4 太奢侈,8×8 太糊,6×6 恰好。
Intuition says bigger blocks blur more — but ASTC 6×6 (3.56 bpp) versus BC7 4×4 (8 bpp) is 2.25× less VRAM with under 1 dB of PSNR loss; SSIM differences are essentially invisible in blind tests. Mobile game devs converge on ASTC 6×6 as the sweet spot; Unity's mobile preset defaults to it. The sweet spot of texture compression isn't 4×4 — it's whichever block size maximizes GPU cache hit rate. 4×4 is luxury, 8×8 is mush, 6×6 is just right.
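The VRAM ratio is plain block arithmetic: every ASTC block is 128 bits regardless of footprint (BC7's fixed 4×4 block is also 128 bits), so only the footprint changes the bits-per-pixel.

```python
def block_bpp(block_w: int, block_h: int, block_bits: int = 128) -> float:
    """Bits per pixel of a fixed-size texture block.

    ASTC always spends 128 bits per block whatever the footprint;
    BC7 is the special case block_w = block_h = 4.
    """
    return block_bits / (block_w * block_h)
```

So BC7 4×4 is 8 bpp, ASTC 6×6 is 128/36 ≈ 3.56 bpp, ASTC 8×8 is exactly 2 bpp, and the 8 / 3.56 ratio is exactly 2.25.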
JXL 在所有维度都赢:HDR、lossless、JPEG 无损 transcode、渐进式解码、解码速度。但 Chrome 团队在 2022-10 以"业界兴趣不足"为由从 Chromium 砍掉 flag,理由是 AVIF 已经够用 —— 而 AVIF 背后是 AOMedia(Google + Netflix + Amazon + 多家芯片厂的联盟)。技术从不单独决定胜负,生态决定。同样的故事在 WebP vs JPEG2000、HEIC vs JXR、Opus vs Vorbis 反复上演;能写进一个 Chromium 的 if-branch,胜过所有 paper 上的曲线。
JXL wins on every axis: HDR, lossless, lossless JPEG transcode, progressive decoding, decode speed. But in October 2022 the Chrome team pulled the flag from Chromium citing "insufficient ecosystem interest" — AVIF, they argued, was enough. And AVIF stands on AOMedia (Google + Netflix + Amazon + chip vendors). Technology rarely decides alone — ecosystems do. The same story replays in WebP vs JPEG2000, HEIC vs JXR, Opus vs Vorbis: shipping inside one Chromium if-branch beats every curve in every paper.
三个发现共享一个底色:格式不是被它的技术指标决定的,是被使用它的人决定的。
All three share the same undertone: a format isn't decided by its technical merits — it's decided by the people who use it.
参考与扩展阅读
References & further reading
本文写作的关键依据。按章节分组,数据 / 引用全部来自公开来源。如发现错漏,欢迎邮件指正。
Sources this article relies on, grouped by phase. All data and quotations are drawn from public references — corrections welcome by email.
Phase I · Web 显示派
Phase I · Web display
- RFC 2083 — PNG (Portable Network Graphics) Specification, 1997
- ISO/IEC 15948 — PNG Specification (Second Edition), 2003
- ISO/IEC 10918-1 — JPEG, 1992
- RFC 1951 — DEFLATE Compressed Data Format Specification
- AOMedia — AV1 Image File Format (AVIF) specification
- ISO/IEC 23008-12 — HEIF (Image File Format)
- AOMedia AV1 Bitstream & Decoding Process Specification, v1.0.0-errata1
- libwebp documentation — developers.google.com/speed/webp
- mozjpeg — github.com/mozilla/mozjpeg
- Squoosh source — github.com/GoogleChromeLabs/squoosh
- Cloudflare blog — "Generating WebP, AVIF and JPEG XL all at once"
- Jon Sneyers — "The case for JPEG XL", Cloudinary blog (2021)
- Chrome JXL removal — bugs.chromium.org/p/chromium/issues/detail?id=1178058
- Smashing Magazine — "Comparing JPEG-XL, AVIF, WebP & JPEG" (2022)
Phase II · GPU 纹理派
Phase II · GPU textures
- Khronos KTX 2.0 specification — registry.khronos.org/KTX/specs/2.0/ktxspec_v2.html
- Basis Universal — github.com/BinomialLLC/basis_universal
- ARM ASTC specification — developer.arm.com/documentation/100672
- D3D11 BC1-BC7 specification — Microsoft Docs (Direct3D 11 texture block compression)
- Intel ISPCTextureCompressor — github.com/GameTechDev/ISPCTextureCompressor
- Lance Williams — "Pyramidal Parametrics", SIGGRAPH (1983)
- OpenGL ES 3.0 / 3.2 specification — Khronos Group
- NVIDIA Texture Tools — github.com/NVIDIAGameWorks/NVIDIATextureTools
- Iourcha, Nayak & Hong — "System and method for fixed-rate block-based image compression with inferred pixel values" (S3TC, 1999)
Phase III · HDR / 工程影像
Phase III · HDR / engineering imaging
- OpenEXR — openexr.com (Academy Software Foundation)
- Greg Ward — "Real Pixels", Graphics Gems II (1991, RGBE format)
- Adobe DNG specification 1.7 — helpx.adobe.com/camera-raw/digital-negative.html
- LibRaw documentation — libraw.org
- NEMA DICOM Standard PS 3.x — dicomstandard.org
- TIFF 6.0 specification — Adobe (1992)
- SMPTE ST 268 — DPX File Format for Digital Moving-Picture Exchange
- Dave Coffin's dcraw — cybercom.net/~dcoffin/dcraw/
- OpenColorIO — opencolorio.org
- ITU-R BT.2100 — Image parameter values for HDR television
- SMPTE ST 2084 — Perceptual Quantizer (PQ) transfer function
Phase IV · 矢量 / 文档
Phase IV · Vector / document
- W3C SVG 1.1 / SVG 2 Recommendation — w3.org/TR/SVG2/
- ISO 32000-1 / -2 — Document management — Portable Document Format (PDF)
- ITU-T T.88 — JBIG2 (Joint Bi-level Image experts Group)
- Adobe PostScript Language Reference Manual, 3rd ed. (1999)
- Lottie — airbnb.design/lottie / lottiefiles.com
- Encapsulated PostScript File Format Specification, Adobe v3.0
- WMF / EMF — Microsoft Open Specifications [MS-WMF], [MS-EMF]
Phase V · 复古 / 怪格式
Phase V · Retro / oddities
- QOI specification — qoiformat.org / github.com/phoboslab/qoi
- Farbfeld — tools.suckless.org/farbfeld/
- NetPBM (PBM/PGM/PPM) — netpbm.sourceforge.net
- EA IFF '85 specification — Jerry Morrison, Electronic Arts (1985)
- Truevision TGA File Format Specification, v2.0 (1989)
- ZSoft PCX Technical Reference Manual (1988)
- BMP / DIB structure — Microsoft Docs (Win32 GDI)
- XPM — X PixMap format, X.Org reference
Phase VI · 卫星 / 科学
Phase VI · Satellite / science
- FITS Standard 4.0 — fits.gsfc.nasa.gov/fits_standard.html
- OGC GeoTIFF 1.1 specification — ogc.org/standard/geotiff/
- NITF MIL-STD-2500C — National Imagery Transmission Format
- astropy — astropy.org
- GDAL — gdal.org (Geospatial Data Abstraction Library)
- Cloud-Optimized GeoTIFF (COG) — cogeo.org
- Zarr — zarr.dev (chunked, compressed N-dimensional arrays)
Phase VII · 神经压缩 / 未来
Phase VII · Neural / future
- Toderici et al. — "Variable Rate Image Compression with Recurrent Neural Networks", ICLR 2016
- Ballé, Minnen et al. — "Variational Image Compression with a Scale Hyperprior", ICLR 2018
- Mentzer, Toderici et al. — "High-Fidelity Generative Image Compression" (HiFiC), NeurIPS 2020
- Yang, Mandt — "Lossy Image Compression with Conditional Diffusion Models" (CDC), NeurIPS 2023
- CompressAI — github.com/InterDigitalInc/CompressAI
- ISO/IEC 21122 — JPEG XS (low-latency lightweight image coding)
- WebP2 — chromium.googlesource.com/codecs/libwebp2 (experimental)
- JPEG AI — Call for Proposals, ISO/IEC JTC 1/SC 29/WG 1 (2022)
综合 / 工具
General / tools
- libvips documentation — libvips.github.io/libvips/
- ImageMagick documentation — imagemagick.org
- OpenImageIO — openimageio.readthedocs.io
- David Salomon — "Data Compression: The Complete Reference", 4th ed. (Springer)
- Khalid Sayood — "Introduction to Data Compression", 5th ed.
- Charles Poynton — "Digital Video and HD: Algorithms and Interfaces", 2nd ed.
合计 ~70 条参考,覆盖 8 组。完整列表可视为这条沉积带的"地层钻孔",每一层都能往下挖。
About 70 references in total across 8 groups. Treat the list as a borehole through this sedimentary band — every stratum can be dug deeper.