BMP — A Childhood Without Compression
Just write the pixels straight to disk; that's enough.
In the late 1980s Windows needed a bitmap container that depended on no compression library, could be dumped straight from video memory, and loaded straight back in. The design goal was not file size; it was zero dependencies, zero decoding, zero thinking. BMP froze the scan direction, byte order, and row alignment of the era's video memory into the file header. Three decades later, those 1980s VRAM ghosts still live inside every .bmp.
Technical core
A BMP file opens with two fixed-size headers: a 14-byte BITMAPFILEHEADER (magic "BM", file size, pixel-data offset) and a 40-byte BITMAPINFOHEADER (width, height, bit depth, compression, palette size); the pixel array follows. The array is stored bottom-up: origin in the lower-left corner, mirroring the scan direction of 1980s CRT VRAM. Channel order is BGR, not RGB — again copied from how Windows laid out video memory. Every row is padded with zeros to a multiple of 4 bytes, so a 32-bit CPU can read one pixel per fetch with no alignment math. Later BMPv4 / BMPv5 added RLE-4 / RLE-8 run-length encoding, bitfield channel masks, ICC profiles, and a real alpha channel — but the ecosystem never caught up; most decoders still only recognise the original 40-byte info header.
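The whole layout above (two headers, bottom-up rows, BGR order, 4-byte row padding) fits in a short pure-Python encoder. A minimal sketch for the uncompressed 24-bit case; the 2835 pixels-per-metre resolution is just a conventional 72-dpi default, not a requirement:

```python
import struct

def encode_bmp24(rows):
    """Encode pixel rows (top-down, each a list of (r, g, b)) as a 24-bit BMP."""
    height, width = len(rows), len(rows[0])
    row_bytes = width * 3
    pad = (4 - row_bytes % 4) % 4                  # rows padded to 4-byte multiples
    image_size = (row_bytes + pad) * height
    offset = 14 + 40                               # BITMAPFILEHEADER + BITMAPINFOHEADER
    file_header = struct.pack("<2sIHHI", b"BM", offset + image_size, 0, 0, offset)
    # 40-byte BITMAPINFOHEADER: size, width, height, planes, bpp,
    # compression (0 = BI_RGB), image size, x/y px-per-metre, palette counts
    info_header = struct.pack("<IiiHHIIiiII", 40, width, height, 1, 24, 0,
                              image_size, 2835, 2835, 0, 0)
    body = bytearray()
    for row in reversed(rows):                     # bottom-up: last row stored first
        for r, g, b in row:
            body += bytes((b, g, r))               # BGR, not RGB
        body += b"\x00" * pad
    return bytes(file_header + info_header + body)
```

Writing `encode_bmp24(rows)` to a file with a `.bmp` extension yields an image any viewer opens; note how a 2-pixel-wide image already needs 2 padding bytes per row.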
USE FOR
- Windows system resources (.cur cursors, .ico inner bitmaps)
- Embedded / RTOS targets with no decoder library
- Codec teaching: the most stripped-down bitmap sample
- Throw-away framebuffer dumps when debugging
AVOID
- Anything on the web: 5–20× larger than PNG
- Mobile / bandwidth-sensitive delivery
- Engineering imagery that needs metadata, color profiles, or HDR
- UI assets needing reliable alpha (BMPv5 support is patchy)
| scope | browsers | tools | CLI |
|---|---|---|---|
| BMP | ✓ universal (though no one ships it on the web) | ✓✓✓ Photoshop · GIMP · Paint · Preview | convert in.png out.bmp (ImageMagick) |
GIF — 1987 and the LZW Patent Saga
Held the line with 256 colors for 39 years.
1987 was the dial-up era — a 100 KB image took a full minute to download. CompuServe needed something far smaller than BMP, cross-platform, and capable of stitching frames into a loop. Wilhite combined freshly published LZW dictionary compression with a 256-colour palette and shipped GIF87a. Two years later 89a added transparency and animation extensions, locking the format in. No one expected it to outlive GeoCities, broadband, and Flash — only to be re-ignited by the Twitter-era reaction-meme.
Technical core
GIF is four things stitched together. ① Palette — one global colour table (GCT) of up to 256 RGB888 entries, optionally overridden per frame by a local table (LCT). ② LZW compression — a variable-width 9-to-12-bit dictionary that grows with the pixel-index stream and resets when full. ③ Frames + disposal — each frame carries a Graphic Control Extension whose disposal method tells the decoder how to wipe the previous frame (keep / restore-background / restore-previous). ④ 89a extensions — a transparent-colour index (one palette slot becomes "transparent", which is why alpha is forever 1-bit), Comment and Plain-Text extensions, and the all-important NETSCAPE2.0 Application Extension that carries a 16-bit loop counter. That last one isn't part of any standard — Netscape just added it in 1995 — yet every looping GIF on Earth still owes Netscape a credit.
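That NETSCAPE2.0 block is tiny, and its byte layout follows directly from the 89a Application Extension grammar. A sketch that builds the exact 19 bytes (loop count 0 conventionally means "loop forever"):

```python
import struct

def netscape_loop_extension(loop_count=0):
    """Build the NETSCAPE2.0 Application Extension block (loop_count=0 = infinite)."""
    return (bytes([0x21, 0xFF, 11])      # Extension Introducer, Application label, block size
            + b"NETSCAPE2.0"             # 8-byte app identifier + 3-byte auth code
            + bytes([3, 1])              # data sub-block size, sub-block id 1
            + struct.pack("<H", loop_count)  # 16-bit little-endian loop count
            + b"\x00")                   # block terminator
```

Splicing these bytes in before the first image descriptor is all it takes to turn a one-shot GIF into a looping one.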
USE FOR
- Reaction memes (most platforms still pipe them as image/gif)
- Minimal motion loops, pixel art, spinners
- Low-colour line art, monochrome banners, low-fi previews
AVOID
- Photographs: 256 colours simply isn't enough, even with dither
- Gradients or long clips: file size explodes; mp4 / WebM win by 10–50×
- Anything needing real (non-binary) alpha
| scope | browsers | tools | CLI |
|---|---|---|---|
| GIF | ✓✓✓ universal since Mosaic 1993 | ✓✓✓ Photoshop · GIMP · ezgif · Figma | gifsicle -O3 in.gif -o out.gif · ffmpeg -i in.mp4 out.gif |
PNG — DEFLATE, Scanline Filters, and a Patricide
The day GIF started charging, free engineers wrote its successor.
In early 1995 Unisys began enforcing its LZW patent: every GIF encoder, including CompuServe's own, now owed money. Within two weeks a thirty-person volunteer crew on Usenet's comp.graphics rallied around Thomas Boutell with four goals: (a) wholly patent-free; (b) smaller than GIF; (c) real alpha, not a single transparent palette slot; (d) 16 bit / channel plus ICC profiles and gamma, ready for the next decade of hardware. Nine months later PNG 1.0, zlib, and DEFLATE shipped as three simultaneous RFCs — possibly the cleanest, fastest patricide in internet history.
Technical core
Five pillars hold PNG up. ① DEFLATE = LZ77 + Huffman — the exact stack used by zip and gzip, RFC 1951, patent-free by construction. ② Five scanline filters (None / Sub / Up / Average / Paeth): each row picks its best filter independently — the filter doesn't compress, it predicts residuals so DEFLATE can spot repetitions. Paeth, which predicts from the left, upper and upper-left neighbours, almost always wins on natural images. ③ The chunk system: IHDR / PLTE / IDAT / IEND are mandatory; everything else (tRNS for palette transparency, gAMA / cHRM / iCCP for colour management, tEXt / iTXt for metadata, acTL / fcTL for APNG, …) is optional and CRC-checked, and decoders must safely skip ancillary chunks they don't recognise — so PNG can grow forever without breaking old readers. ④ Real alpha: an independent 8- or 16-bit alpha channel, no longer disguised as a palette slot. PNG-32 is plain RGBA 8-bit. ⑤ Adam7 interlace: a 7-pass progressive scan — invaluable on 56 K modems twenty years ago, mostly obsolete today.
Fig 3 · Full pipeline · raw RGBA → per-row filter (None / Sub / Up / Average / Paeth) → DEFLATE (LZ77 + Huffman, zlib level 0–9) → pack into chunks (IHDR + IDAT × N + IEND, each CRC-checked) → emit .png. Optional: pre-shuffle rows with Adam7 before filtering.
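The Paeth predictor itself is only a few lines; a sketch following the spec's pseudocode (a = left, b = above, c = upper-left), plus filtering one row of single-byte samples as a worked step of the pipeline:

```python
def paeth(a, b, c):
    """PNG Paeth predictor: return the neighbour closest to the estimate a + b - c."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    if pb <= pc:
        return b
    return c

def filter_paeth_row(row, prev):
    """Filter type 4 on one row of bytes; prev is the reconstructed row above.
    Missing neighbours at the left edge are treated as 0, per the spec."""
    return [(row[i] - paeth(row[i - 1] if i else 0, prev[i], prev[i - 1] if i else 0)) % 256
            for i in range(len(row))]
```

On a flat region the residuals collapse to zero, which is exactly the kind of input DEFLATE compresses best.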
| format | year | lossless | palette | alpha | animation | typical size vs JPEG-Q85 |
|---|---|---|---|---|---|---|
| BMP | 1990 | ✓ | ✓ | partial (v5) | — | ≈ 8–20 × |
| GIF | 1987 | partial | ✓ (256) | 1-bit | ✓ | ≈ 0.4 × (low colour) |
| PNG-8 | 1996 | ✓ | ✓ (256) | 8-bit | — | ≈ 0.3 × |
| PNG-24/32 | 1996 | ✓ | — | 8 / 16-bit | — | ≈ 1.5–5 × |
| JPEG (Q85) | 1992 | — | — | — | — | 1.0 × (baseline) |
$ oxipng -o6 in.png # brute-force re-pack, 5–30% smaller
$ pngcrush -reduce -brute in.png out.png # classic, slower but still useful
$ convert in.png -strip out.png # ImageMagick — drop metadata chunks
$ pngquant --quality=70-90 in.png # lossy palette quantisation → PNG-8
$ zopflipng -m in.png out.png # Google's zopfli, max DEFLATE compression
USE FOR
- Screenshots, recorded frames, UI design exports (lossless + sharp edges)
- Transparent logos, PWA icons, Material icons
- Anything needing reliable alpha and cross-platform consistency
- A lightweight 16-bit/channel transit format before EXR enters the picture
AVOID
- Photographs: 5–10 × larger than JPEG / WebP / AVIF
- Video frame sequences: use H.264 / AV1 / WebM, not APNG
- Above-the-fold hero images where bytes matter most
| scope | browsers | tools | CLI |
|---|---|---|---|
| PNG | ✓✓✓ universal since IE 4 / Mozilla 1.0 | ✓✓✓ Photoshop · Figma · Sketch · GIMP · Preview | oxipng -o6 · pngquant · zopflipng |
APNG — PNG Secretly Grew Frames
PNG WG said no. Mozilla shipped it anyway.
In 2004 Mozilla wanted lightweight loading animations in Firefox: GIF's 256 colours looked ugly and the PNG working group's official MNG was so vast that almost no one implemented it. Two Mozilla engineers, Stuart Parmenter and Vladimir Vukićević, simply added three new chunks to PNG — acTL (animation control), fcTL (per-frame control), fdAT (frame data). They sent the proposal to the PNG mailing list; the working group flatly refused, citing damage to "PNG's simplicity". Mozilla shipped it anyway in Firefox 3 (2008). A decade later Apple and Google followed, and in 2017 the W3C finally adopted APNG as a standard. A rejected extension, ratified later by the market and the spec.
acTL sits right after IHDR; the first IDAT is still a perfectly legal static PNG — old decoders stop at the dashed red line. From there on, fcTL + fdAT alternate per frame, with each fcTL describing the frame's position, delay and disposal mode.
Technical core
APNG only adds three things to PNG. ① Three new chunks: acTL carries frame count and loop count; fcTL is a per-frame control block describing offset, width/height, delay (a 16-bit numerator and denominator), blend mode and disposal mode; fdAT is essentially an IDAT prefixed with a 4-byte sequence number — its data payload format is identical. ② The first frame is still a valid PNG: keeping frame 0 as an IDAT means a decoder that doesn't understand APNG (old Safari, older ImageMagick) just sees a static image. This backward-compatibility trick is the biggest reason APNG won where MNG failed. ③ Blend / disposal modes: blend modes are SOURCE (overwrite) and OVER (alpha composite); disposal modes are NONE (keep), BACKGROUND (clear), PREVIOUS (restore prior frame) — exact same semantics as GIF 89a. For everything else (colour space, filters, DEFLATE), APNG inherits PNG byte for byte.
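The static-fallback trick in ② can be checked mechanically: a file is APNG exactly when an acTL chunk appears before the first IDAT. A sketch of a chunk walker over the universal length / type / payload / CRC32 chunk shape:

```python
import struct, zlib

PNG_SIG = b"\x89PNG\r\n\x1a\n"

def make_chunk(ctype, payload):
    """length + type + payload + CRC32 over (type + payload) — every PNG chunk's shape."""
    return (struct.pack(">I", len(payload)) + ctype + payload
            + struct.pack(">I", zlib.crc32(ctype + payload)))

def png_chunks(data):
    """Yield (type, payload) for every chunk in a PNG / APNG byte string."""
    assert data[:8] == PNG_SIG
    pos = 8
    while pos < len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        yield ctype, data[pos + 8:pos + 8 + length]
        pos += 12 + length            # 4 length + 4 type + payload + 4 CRC

def is_animated(data):
    """APNG iff acTL precedes the first IDAT (the backward-compatibility rule)."""
    for ctype, _ in png_chunks(data):
        if ctype == b"acTL":
            return True
        if ctype == b"IDAT":
            return False
    return False
```

A decoder that has never heard of acTL simply skips it (it is an ancillary chunk) and renders the IDAT frame as a normal still PNG.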
USE FOR
- High-quality animated stickers (Twitter / Telegram / WeChat)
- Animations needing real alpha (GIF gives you only 1-bit)
- Anywhere a static fallback matters: non-APNG decoders still see frame 0
AVOID
- Bandwidth-tight contexts: 2–5 × larger than WebP / AVIF at equal quality
- Long sequences or video clips: use H.264 / AV1 / WebM
| scope | browsers | tools | CLI |
|---|---|---|---|
| APNG | ✓✓✓ Firefox 3+ · Safari 8+ · Chrome 59+ · Edge 18+ | ✓✓ GIMP · Photoshop (plugin) · ezgif | apngasm out.apng in_*.png · ffmpeg -plays 0 ... out.apng |
animated WebP — WebP's Multi-Frame Twin
WebP slipped multiple frames into one container — neater than PNG, prettier than GIF.
Static WebP already crushed GIF on bytes — the same sticker is typically a third the size in WebP. The next step was obvious: extend the RIFF container from one frame to many. Google added a VP8X extended header to declare feature flags (alpha / animation / ICC), an ANIM global animation block, and a stream of ANMF per-frame blocks. The extension landed around 2012 with libwebp 0.2, and overnight WebP went from a still-image format to one that beats GIF by ~30 % in size, an order of magnitude in quality, and finally adds real alpha. Today's "premium" stickers on Telegram and WhatsApp are almost all animated WebP.
VP8X declares which feature flags are active (animation / alpha / ICC); ANIM gives the background colour and loop count once; each ANMF then carries one frame, internally wrapping a VP8 (lossy) or VP8L (lossless) bitstream.
Technical core
Three sentences cover the entire mechanism. ① RIFF + VP8X header: RIFF is Microsoft's 1991 chunk container (anyone who's opened a .wav or .avi has met it). WebP reuses it verbatim and adds an 8-byte VP8X header — the first byte is a bitfield of feature flags (ICC profile / alpha / EXIF / XMP / animation), and the remainder encodes a 24-bit canvas width and height. ② ANIM + ANMF: ANIM sits at file scope and declares background colour plus loop count; each ANMF then carries per-frame offset, dimensions, duration, blend mode and disposal mode — exact same semantics as APNG and GIF. ③ Per-frame codec choice: WebP ships two encoders, VP8 (lossy, motion-compensation + DCT) and VP8L (lossless, LZ77 + colour transform + Huffman). An animated WebP can switch encoders frame by frame — a sticker's flat background uses lossless VP8L, the character animation uses lossy VP8, all in one file.
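A sketch of reading the VP8X flags and canvas size described in ①. Bit positions follow the public WebP container layout (ICC = 0x20, alpha = 0x10, animation = 0x02; canvas stored as 24-bit little-endian values minus one); this is a minimal walker over synthetic input, not a full decoder:

```python
import struct

def webp_features(data):
    """Return the VP8X feature flags and canvas size of a WebP file, or None if no VP8X."""
    assert data[:4] == b"RIFF" and data[8:12] == b"WEBP"
    pos = 12
    while pos + 8 <= len(data):
        fourcc = data[pos:pos + 4]
        size = struct.unpack("<I", data[pos + 4:pos + 8])[0]
        if fourcc == b"VP8X":
            flags = data[pos + 8]                 # 1 flag byte, then 3 reserved bytes
            w = int.from_bytes(data[pos + 12:pos + 15], "little") + 1
            h = int.from_bytes(data[pos + 15:pos + 18], "little") + 1
            return {"icc": bool(flags & 0x20), "alpha": bool(flags & 0x10),
                    "animation": bool(flags & 0x02), "canvas": (w, h)}
        pos += 8 + size + (size & 1)              # RIFF chunks are 2-byte aligned
    return None
```

The same loop extended with `b"ANMF"` handling would enumerate frames; each ANMF payload embeds its own VP8 or VP8L bitstream.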
USE FOR
- The default modern animated-image format · stickers · short loops
- Looped animation that needs alpha and small bytes
- Complex stickers mixing lossless and lossy frames in one file
AVOID
- Safari < 14 (older iOS): you'll need a GIF/APNG fallback
- Latency-critical hardware-decoded video — stick with H.264 / AV1
| scope | browsers | tools | CLI |
|---|---|---|---|
| animated WebP | ✓✓✓ Chrome 32+ · Firefox 65+ · Safari 14+ | ✓✓ Photoshop (plugin) · ezgif · GIMP 2.10+ | cwebp · webpmux -frame f1.webp +100 ... -o anim.webp |
JPEG — Three Decades of the 8×8 DCT
An 8×8 grid that held three decades of human vision.
By the late 1980s scanners, digital cameras and fax machines were arriving in parallel — everyone needed a way to crunch "natural images" to a tenth of their size while the human eye barely noticed. The JPEG committee built a pipeline around three facts: the eye is more sensitive to luma than to chroma, more sensitive to low frequencies than to high, and natural images carry enormous redundancy in their energy distribution. Translated into code, that becomes YCbCr + 4:2:0 + 8×8 DCT + quantisation — and lets JPEG turn a 5 MB photo into 250 KB at Q85 with practically no visible loss.
Technical core
Six stages make up the JPEG pipeline. ① RGB → YCbCr: split luma from chroma so the rest of the pipeline can treat them differently. ② 4:2:0 chroma subsampling: halve Cb and Cr horizontally and vertically — instantly drops 50 % of the data with virtually no perceptual cost. ③ Split into 8×8 blocks; run DCT-II per block: spatial → frequency domain. Natural-image energy clusters in the top-left (low frequency); the bottom-right is mostly near-zero. ④ Quantisation tables (luma + chroma, chroma being more aggressive): each coefficient is divided by the matching integer and rounded, killing huge swathes of high-frequency information. This is the only lossy step — every visible artefact JPEG ever produces comes from here. ⑤ Zig-zag scan + RLE + Huffman: unroll the 64 coefficients into a 1-D stream so the long zero-tail compresses cleanly under RLE, then Huffman-encode the remaining literals. Lossless. ⑥ JFIF / Exif container: the JPEG spec only defines the codec stream (SOI / APPn / DQT / DHT / SOF / SOS / EOI markers); the file format is a separate layer. JFIF 1.02 (1992) standardised an APP0 metadata segment, Exif (1995) tucked camera metadata into APP1. Almost every .jpg you've ever seen is "JFIF + Exif wrapping a JPEG codec stream".
Fig 6 · The full JPEG pipeline · RGB → YCbCr split → 4:2:0 subsample (−50 %) → 8×8 blocks → DCT-II → Quantise (★ the one and only lossy step, Q controls how brutal) → zig-zag scan → RLE → Huffman → JFIF / Exif wrapper → .jpg. The encoder really only has four knobs: Q, quant tables, subsample ratio, and baseline-vs-progressive scan.
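The zig-zag order in step ⑤ can be generated rather than hard-coded: walk the anti-diagonals of the 8×8 block, alternating direction so low-frequency coefficients come first and the zero tail clusters at the end. A sketch:

```python
def zigzag_order(n=8):
    """(row, col) visiting order of JPEG's zig-zag scan on an n×n block."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],                              # anti-diagonal index
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))   # alternate direction

def scan(block):
    """Unroll an 8×8 coefficient block into the 64-entry zig-zag sequence."""
    return [block[r][c] for r, c in zigzag_order()]
```

After quantisation, the high-frequency corner of the block is mostly zeros, so `scan` produces a sequence ending in a long zero run that RLE then collapses.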
| format | year | typical 1080p photo | quality at same size |
|---|---|---|---|
| JPEG Q85 | 1992 | ≈ 250 KB | baseline |
| WebP Q75 | 2010 | ≈ 165 KB | ≈ JPEG Q85 |
| HEIC Q60 | 2015 | ≈ 125 KB | ≈ JPEG Q85 |
| AVIF Q60 | 2019 | ≈ 95 KB | ≈ JPEG Q85 |
| JXL Q90 | 2021 | ≈ 85 KB | ≈ JPEG Q85 |
$ cjpeg -quality 85 -optimize -progressive in.ppm > out.jpg # reference libjpeg encoder, progressive scan
$ jpegoptim --max=85 --strip-all in.jpg # cap quality at 85, drop all metadata in place
$ cjpeg -quality 85 in.png > out.jpg # mozjpeg's cjpeg build — 5–10% smaller at same Q
$ exiftool -all= -overwrite_original in.jpg # nuke all Exif / GPS / thumbnail metadata
USE FOR
- Real photographs (nature, portraits, landscapes)
- Truecolor gradients, soft backgrounds, art photography
- Anything with rich, continuously varying colour
- Maximum compatibility — every device on Earth decodes JPEG
AVOID
- Text / screenshots / UI: visible 8×8 block artefacts
- Line art / cartoons / pixel art: ringing near sharp edges
- Anything needing an alpha channel
- Engineering images where every pixel must survive intact
| scope | browsers | tools | CLI |
|---|---|---|---|
| JPEG / JFIF / Exif | ✓✓✓ universal — every browser, every OS, every camera | ✓✓✓ Photoshop · Lightroom · Figma · Preview · everything | cjpeg · jpegoptim · mozjpeg · exiftool |
JPEG-LS — The Lossless JPEG You Never Heard Of
3× faster than PNG, but no container, no colour management — and no one remembered.
By the mid-1990s the medical-imaging world had a clear ask: CT and MRI frames were 12-bit greyscale, hundreds of slices per scan, and they had to be lossless — but simpler than PNG and more usable than JPEG's lossless mode. Marcelo Weinberger and team at HP Labs produced LOCO-I (LOw COmplexity LOssless COmpression for Images): a median-of-three predictor (MED) estimates each pixel, the residual goes into Golomb-Rice coding, and long flat runs switch to RLE. ISO/IEC 14495-1 shipped in 1997, beating PNG slightly on ratio and decoding 3–5× faster — but JPEG-LS arrived as a bare bitstream with minimal markers: no container, no ICC profile, no metadata, no alpha. Browsers ignored it entirely. Only medical imaging kept it alive — DICOM still embeds JPEG-LS as one of its standard encodings.
x is estimated from its neighbours b (above), a (left) and c (top-left) — three branches pick the best fit, essentially guessing whether the local context is a horizontal edge, a vertical edge, or a flat surface. The residual (actual minus predicted) is then Golomb-Rice coded. The arithmetic is so simple that even the weak CPUs in 1990s hospital CT scanners could keep up.
Technical core
JPEG-LS beats PNG with three tiny ideas. ① MED predictor: just three neighbouring pixels (left, above, top-left) decide whether the current pixel sits on a horizontal edge, vertical edge or smooth surface. c ≥ max(a,b) picks min(a,b); c ≤ min(a,b) picks max(a,b); otherwise a + b − c (planar extrapolation). When the predictor is right, the residual is near zero. ② Golomb-Rice entropy coding: residuals roughly follow a Laplacian / geometric distribution, and Golomb-Rice is the optimal prefix code for it — divide the residual by 2^k, encode the quotient in unary (that many ones plus a terminating zero) and the remainder in k bits flat. The parameter k adapts per context during encoding, so there's no Huffman table to construct and no extra pass over the data. ③ Run-length mode: when the codec sees consecutive pixels predicted by the same context with zero residuals, it switches to RLE and encodes the run length directly — the move that destroys PNG on medical greyscales (mostly black background) and document scans. The whole codec has no DCT, no transform, no quantisation (in lossless mode); it's almost pure arithmetic replacing transforms.
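Ideas ① and ② fit in a dozen lines. A sketch: the real codec also maps signed residuals to non-negative integers and adapts k from context statistics, both elided here:

```python
def med_predict(a, b, c):
    """LOCO-I median (MED) predictor: a = left, b = above, c = upper-left."""
    if c >= max(a, b):
        return min(a, b)        # c above both neighbours: edge, predict the smaller
    if c <= min(a, b):
        return max(a, b)        # c below both: edge the other way, predict the larger
    return a + b - c            # smooth region: planar extrapolation

def golomb_rice(value, k):
    """Golomb-Rice code for a non-negative integer: unary quotient + k-bit remainder."""
    q = value >> k
    rem = format(value & ((1 << k) - 1), "b").zfill(k) if k else ""
    return "1" * q + "0" + rem
```

With k = 2, the residual 9 splits into quotient 2 and remainder 1, giving the bit string "11001" — small residuals get short codes, which is exactly what the MED predictor sets up.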
USE FOR
- DICOM medical imaging (a standard lossless encoding for CT / MRI)
- High-throughput lossless archival — 3–5× faster decoding than PNG
- Embedded lossless cameras with very tight CPU budgets
- Document scans — flat regions plus sharp edges
AVOID
- The web — zero native browser support
- Anything requiring an alpha channel
- Anything requiring embedded ICC profiles / EXIF / metadata
| scope | browsers | tools | CLI |
|---|---|---|---|
| JPEG-LS | ✗ none | ✓ DCMTK · CharLS · MATLAB / Python (pylibjpeg) | charls -e in.pgm out.jls · dcmcjpls in.dcm out.dcm |
JPEG 2000 — The Tragic Defeat of the Wavelet
Technically beats JPEG. Patents tied its feet.
JPEG's 1990s pain points were obvious: visible 8×8 block boundaries, no alpha, only one compression curve, dated metadata. The JPEG WG tried to fix all of it with a clean-sheet algorithm — JPEG 2000. Replace the 8×8 DCT with a whole-image discrete wavelet transform (no block edges, naturally multi-resolution). Replace the entropy coder with EBCOT (Embedded Block Coding with Optimised Truncation), which lets a decoder grab any subset of quality / resolution / component / region from the same .jp2 file — pull just a thumbnail, or just one ROI. Technically it crushes JPEG. Two things broke it. ① Decoding cost is 10× JPEG or more — mobile silicon could not keep up. ② The standard sits on dozens of patents (most RAND-free, but the legal cloud was real), and browser vendors refused to implement it. Mozilla and Google both said no on record. JPEG 2000 survived only in three latency-insensitive, compute-rich worlds: digital cinema (DCI mandates it), satellite imagery, and medical imaging. Safari is the only browser that ships native support — and even that came along for free with Apple's ImageIO framework. Apple never promoted it.
LL₃ is a free 1/8-scale thumbnail — no re-decoding required. This is the physical basis of "decode any resolution you want": grab just LL₃ for 1/8, add the level-2 subbands for 1/4, and so on.
Technical core
Four pieces are worth remembering. ① Discrete wavelet transform replaces the DCT. Lossless mode uses the reversible 5/3 integer wavelet; lossy mode uses the 9/7 floating-point wavelet (higher efficiency). Whole-image transform = no 8×8 block edges = no JPEG-style tiling artefacts. The wavelet is also naturally multi-resolution (see figure above). ② tile + code-block + EBCOT three-level partitioning. Large images are first split into tiles (typically 256×256 or 1024×1024), each tile is wavelet-transformed, each subband is split into code-blocks (typically 64×64), and EBCOT bit-plane codes each block with arithmetic coding before R-D optimisation decides which bit-planes to truncate. ③ Quality / resolution / component / position progression: a single .jp2 can order its codestream four different ways, and any prefix the decoder receives yields either a "low-quality but complete" or "high-quality but single-resolution" or "single-region" image. This is the core capability behind IIIF (the library / museum high-resolution scan protocol). ④ One algorithm, both lossless and lossy — switching is just a matter of the quantisation step, not a separate standard like JPEG vs JPEG-LS.
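The reversible 5/3 lifting behind ① is short enough to show in full. A 1-D sketch for even-length integer signals: `>>` performs the spec's floor division, and boundaries are handled by clamping indices, a simplified stand-in for the standard's symmetric extension:

```python
def dwt53(x):
    """One level of the forward 5/3 integer wavelet (lifting), even-length 1-D signal."""
    n = len(x) // 2
    even, odd = x[0::2], x[1::2]
    # predict: detail = odd sample minus floor of the mean of its even neighbours
    d = [odd[i] - ((even[i] + even[min(i + 1, n - 1)]) >> 1) for i in range(n)]
    # update: approximation = even sample plus a rounded quarter of adjacent details
    s = [even[i] + ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(n)]
    return s, d

def idwt53(s, d):
    """Exact inverse: undo the update, then the predict step."""
    n = len(s)
    even = [s[i] - ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(n)]
    odd = [d[i] + ((even[i] + even[min(i + 1, n - 1)]) >> 1) for i in range(n)]
    x = [0] * (2 * n)
    x[0::2], x[1::2] = even, odd
    return x
```

Because both steps use the same integer arithmetic forwards and backwards, the round trip is bit-exact, which is what makes lossless mode possible; on 2-D images the same pass runs over rows, then columns, and s becomes the next level's input (LL₁ → LL₂ → LL₃).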
USE FOR
- DCI digital cinema (mandatory — every frame in your theatre is .j2k)
- Satellite / remote-sensing / aerial gigapixel imagery (decode-on-demand)
- DICOM medical imaging when "high-fidelity lossless" is required
- Cultural-heritage high-resolution scans (IIIF image servers)
AVOID
- The web — every browser except Safari refuses to ship it
- Mobile / any low-compute decoding context
- Desktop apps that need snappy thumbnails — decoding is slow
| scope | browsers | tools | CLI |
|---|---|---|---|
| JPEG 2000 | ✓ Safari only · ✗ Chrome / Firefox / Edge | ✓✓ Photoshop · GIMP · Preview · ImageMagick | opj_compress -i in.png -o out.jp2 · kdu_compress (commercial) |
JPEG XR — Microsoft's Last Attempt
Microsoft's first 32-bit float HDR web format. Chrome said no.
By 2006 Microsoft surveyed the web, saw JPEG / GIF / PNG still ruling the field, and spotted a gap: ship a next-generation format that beats JPEG, adds alpha, supports HDR floats, and decodes faster than JPEG 2000. Originally HD Photo / Windows Media Photo, it was standardised in 2009 as JPEG XR ("eXtended Range") under ISO/IEC 29199. The technology was genuinely good: a 16×16 photo core transform (PCT) replaces JPEG's 8×8 DCT, with much less visible blocking; native support for RGBE and scRGB 32-bit float HDR; lossless and lossy sharing one algorithm. Microsoft baked native support into Internet Explorer 9 and Edge Legacy. But Chromium refused. Mozilla refused. The reasoning was blunt: "we're already betting on WebP / AVIF; we don't want extra attack surface for a Microsoft-pushed format." When Edge gave up its own rendering engine and switched to Chromium in 2018, the last browser with native JPEG XR support vanished. The painful irony: the "Microsoft pushes a format → Chrome refuses → format dies" playbook was later inverted by Google for WebP — what you push, I'll accept; what I push, you'd better accept.
技术内核
Technical core
JPEG XR 的技术设计有三个亮点。① 整数 16×16 PCT(Photo Core Transform)——本质上是一个类 DCT 的整数变换,但块更大、内部还有一层 4×4 子变换做"重叠"(lapped transform),让块与块之间不再有硬边界。同等质量下,JPEG XR 的 blocking artifact 比 JPEG 弱得多,但解码复杂度只比 JPEG 高一点点(远低于 JPEG 2000 的 10×)。② 原生 HDR float 支持——这是 JPEG XR 最超前的部分。它直接编码 RGBE(共享指数 32-bit)和 scRGB 浮点,不需要色调映射就能存高动态范围内容。这比 HEIC / AVIF 推广 HDR 早了将近十年——但当时显示器和操作系统都没准备好,没人用得上。③ 共享熵编码思路——熵编码部分仍然用类 JPEG 的"块+扫描+游程+熵"路径,所以软件实现成本低,微软自己的参考实现一千多行 C 就够了。这跟 JPEG 2000 几万行的复杂度相比,工程上确实"够轻"——但终究敌不过浏览器厂商的政治意愿。
JPEG XR has three technical strengths. ① Integer 16×16 PCT (Photo Core Transform) — essentially a DCT-like integer transform with a larger block, plus an inner 4×4 sub-transform that does a lapped overlap, killing hard block edges between adjacent macro-blocks. At equal quality JPEG XR shows much weaker blocking than JPEG, while costing only marginally more to decode (nowhere near JPEG 2000's 10×). ② Native HDR float support — the most forward-looking piece. It encodes RGBE (shared-exponent 32-bit) and scRGB floating-point directly, storing high-dynamic-range content without tone-mapping. This predated HEIC's and AVIF's HDR push by nearly a decade — but in 2006 neither displays nor operating systems were ready, and nobody had a workflow for it. ③ Shared entropy-coding lineage — the entropy back end is still a JPEG-style "block + scan + run-length + entropy" pipeline, so implementations are small. Microsoft's own reference implementation is barely a thousand lines of C — far lighter than JPEG 2000's tens of thousands. Engineering cost wasn't the problem. Browser-vendor politics was.
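The shared-exponent idea behind RGBE is compact enough to show. This is a hedged sketch of Greg Ward's classic RGBE packing (the encoding JPEG XR can carry), not JPEG XR's actual bitstream layout: three 8-bit mantissas share one exponent byte, so 32 bits cover a huge dynamic range at ~1/256 relative precision.

```python
import math

def float_to_rgbe(r, g, b):
    """Pack three non-negative floats into 4 bytes: 3 mantissas + shared exponent."""
    v = max(r, g, b)
    if v < 1e-32:
        return (0, 0, 0, 0)
    m, e = math.frexp(v)               # v == m * 2**e with 0.5 <= m < 1
    scale = m * 256.0 / v              # maps the largest channel into [128, 256)
    return (int(r * scale), int(g * scale), int(b * scale), e + 128)

def rgbe_to_float(r8, g8, b8, e8):
    """Unpack: every channel reuses the shared exponent byte."""
    if e8 == 0:
        return (0.0, 0.0, 0.0)
    f = math.ldexp(1.0, e8 - (128 + 8))  # 2**(e - 128) / 256
    return (r8 * f, g8 * f, b8 * f)
```

Precision is relative to the *largest* channel — which is why RGBE stores bright HDR values effortlessly but can posterise a dim channel sitting next to a bright one.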
适用
USE FOR
- (历史) Windows 7 Photo Viewer 默认支持的高质量缩略图
- (历史) Office 2010+ 内置 HD Photo 编辑
- 研究 / 兼容老 Windows 资源时
- (historical) High-quality thumbnails in Windows 7 Photo Viewer
- (historical) HD Photo editing built into Office 2010+
- Research / interoperating with legacy Windows assets
反适用
AVOID
- 2026 任何现代场景:HEIC / AVIF / JPEG XL 全面替代
- Web — 没有任何主流浏览器原生支持
- Any modern 2026 scenario — HEIC / AVIF / JPEG XL fully replace it
- The web — no major browser ships native support
| scope | browsers | tools | CLI |
|---|---|---|---|
| JPEG XR | ✗ none (Edge Legacy only · removed in Chromium Edge) | ✓ Photoshop (plugin) · Windows Photos (legacy) | JxrEncApp -i in.tif -o out.jxr · JxrDecApp |
WebP — Google 把 VP8 帧内拿来做图
WebP — Google Carved an Image Format Out of a Video Frame
把 VP8 视频的一帧抠出来当图片,体积砍掉 30%。
Took one frame out of a VP8 video, shaved 30% off image size.
2010 年的 Google 看着 web 图片世界,觉得三件套(JPEG / PNG / GIF)中间还有一道明显的"裂缝":没有一种格式能同时满足"比 JPEG 小 30%、比 PNG 小 26%、还能动图 + alpha"。Google 当时刚刚在 2009 年用 1.246 亿美元收购了视频编码公司 On2 Technologies,手里握着一颗刚开源的 VP8 视频 codec——VP8 的 intra-frame(I 帧) 已经具备完整的图像帧内编码能力。Google 工程师的算盘很直接:与其重新发明轮子,不如直接把 VP8 的一帧拿出来,套一层 RIFF 容器,就是一种新的图片格式。WebP 由此诞生——它是历史上第一个"视频 codec 直接派生为图片格式"的工业级例子,后来 HEIC / AVIF 都走了完全相同的路线。
In 2010 Google looked at the web's image landscape and saw a clear gap in the JPEG / PNG / GIF triumvirate: nothing was simultaneously "30% smaller than JPEG, 26% smaller than PNG, and capable of both animation and alpha." Having just paid $124.6 million in 2009 to acquire the video-codec company On2 Technologies, Google now owned the VP8 video codec — and a VP8 intra-frame (I-frame) is already a complete still-image encoding pipeline. The Googlers did the obvious thing: pull out a single VP8 frame, wrap it in a RIFF container, ship it as a new image format. WebP was born — historically the first industrial-scale example of "video codec directly repurposed into still-image format". HEIC and AVIF later took the exact same playbook.
技术内核
Technical core
WebP 内部其实是两个完全独立的格式,共用一个 .webp 后缀和一个 RIFF 外壳。① VP8 intra-frame(有损):4×4 / 16×16 块预测(共 10 + 4 种 intra mode)→ 类 DCT 整数变换 → 量化 → boolean arithmetic coding(算术编码)。预测让"猜得准的部分不用传",算术编码比 Huffman 多挤出 5-15% 体积——这是 WebP 比 JPEG 小 30% 的两大功臣。② VP8L(无损):跟 VP8 一点关系都没有,是 Google 自己写的一套独立无损算法——14 种 spatial predictor + color cache(用 hash table 缓存最近用过的颜色)+ LZ77 + Huffman。在自然图像上比 PNG 小 26%,但编码慢 5-10×。③ RIFF 容器:借用微软 Wave 音频用过的 RIFF 格式——文件头是 RIFF<size>WEBP,后面跟 chunk 序列:VP8X(全局信息)/ VP8(有损主帧)/ VP8L(无损主帧)/ ALPH(独立 alpha 通道)/ ANIM + ANMF(动图)/ ICCP(色彩配置)/ EXIF / XMP。④ 独立 alpha:lossy 主帧不带 alpha,alpha 走单独的 ALPH chunk,可以选择无损 lossless 或有损 lossy 编 alpha——这是 WebP 比 JPEG + PNG 拼凑方案精巧的地方。⑤ animated WebP:ANIM 设全局参数(背景色 / 循环次数), ANMF 每帧带 disposal / blend / xy offset,逻辑跟 GIF 完全同源,但每帧用 VP8 / VP8L 编。
WebP is, in fact, two unrelated formats sharing a .webp extension and a RIFF wrapper. ① VP8 intra-frame (lossy): 4×4 / 16×16 block prediction (10 + 4 intra modes) → DCT-like integer transform → quantise → boolean arithmetic coding. Prediction means "the easy-to-guess parts don't need to ship" and arithmetic coding squeezes out another 5–15 % over Huffman — together those are why WebP runs ~30 % smaller than JPEG. ② VP8L (lossless): unrelated to VP8 — a separate lossless codec Google wrote from scratch — 14 spatial predictors + a color cache (hash-tabling recently-used colours) + LZ77 + Huffman. ~26 % smaller than PNG on natural images but 5–10 × slower to encode. ③ RIFF container: borrowed from Microsoft's Wave audio — the file starts with RIFF<size>WEBP, then a sequence of chunks: VP8X (global info) / VP8 (lossy main frame) / VP8L (lossless main frame) / ALPH (separate alpha channel) / ANIM + ANMF (animation) / ICCP (color profile) / EXIF / XMP. ④ Separate alpha: lossy main frames don't carry alpha; alpha lives in a dedicated ALPH chunk that can itself be encoded losslessly or lossily — much cleaner than JPEG + PNG patchwork. ⑤ animated WebP: ANIM sets the globals (background colour, loop count), each ANMF frame carries disposal / blend / xy-offset just like GIF, but each frame is itself VP8 or VP8L.
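The RIFF layout is simple enough to walk by hand. Below is a minimal chunk walker — a sketch only (real WebP parsing would also interpret the VP8X flag bits), exercised here against a synthetic container rather than a real VP8 payload:

```python
import struct

def walk_webp_chunks(buf):
    """Yield (fourcc, payload) for each top-level chunk in a RIFF/WEBP buffer."""
    assert buf[0:4] == b"RIFF" and buf[8:12] == b"WEBP", "not a WebP RIFF file"
    (riff_size,) = struct.unpack("<I", buf[4:8])   # RIFF sizes are little-endian
    assert riff_size == len(buf) - 8               # size excludes 'RIFF' + size field
    pos = 12
    while pos < len(buf):
        fourcc = buf[pos:pos + 4].decode("ascii")
        (size,) = struct.unpack("<I", buf[pos + 4:pos + 8])
        yield fourcc, buf[pos + 8:pos + 8 + size]
        pos += 8 + size + (size & 1)               # chunks are padded to even length

def make_chunk(fourcc, payload):
    """Build one RIFF chunk, including the odd-size pad byte."""
    return fourcc + struct.pack("<I", len(payload)) + payload + b"\x00" * (len(payload) & 1)
```

Note the even-length padding rule: an odd-sized chunk (EXIF, XMP) carries one trailing pad byte that is not counted in its size field — a RIFF convention WebP inherited from Wave audio.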
图 10 · WebP 全流程(lossy 主路径) · RGB → YUV 4:2:0 → 16×16/4×4 切块 → intra 预测(10 种)→ DCT-like 整数变换 → 量化(★ 唯一有损步骤,Q 0-100)→ boolean arithmetic 编码 → RIFF 包外壳(VP8X + VP8 + 可选 ALPH/ICCP/EXIF/ANIM)→ .webp。无损路径走另一条线:VP8L 的 14 predictor + color cache + LZ77 + Huffman。
Fig 10 · The full WebP pipeline (lossy main path) · RGB → YUV 4:2:0 → split into 16×16 / 4×4 blocks → intra prediction (10 modes) → DCT-like integer transform → quantise (★ the only lossy step, Q 0–100) → boolean arithmetic coding → RIFF wrap (VP8X + VP8 + optional ALPH / ICCP / EXIF / ANIM) → .webp. The lossless path goes elsewhere: VP8L's 14 predictors + color cache + LZ77 + Huffman.
RIFF<size>WEBP 是 12 字节文件头;再里头 VP8X 描述全局 flag + 画布尺寸;然后 VP8(lossy)和 VP8L(lossless)二选一;ALPH 单独装 alpha(可独立选有损或无损);ANIM + 多个 ANMF 用于动图;ICCP / EXIF / XMP 是可选 metadata。
The file opens with the 12-byte RIFF<size>WEBP header; VP8X holds global flags + canvas size; then either VP8 (lossy) or VP8L (lossless); ALPH carries alpha independently (itself lossy or lossless); ANIM + multiple ANMF chunks make up animation; ICCP / EXIF / XMP are optional metadata.
| codec | encode time | decode time | typical Q | 1080p photo |
|---|---|---|---|---|
| JPEG (mozjpeg) | 1.0 × | 1.0 × | 85 | ≈ 250 KB |
| WebP (cwebp) | ≈ 3 × | ≈ 1.5 × | 75 | ≈ 165 KB |
| AVIF (avifenc) | ≈ 50 × | ≈ 3 × | 60 | ≈ 95 KB |
$ cwebp -q 75 in.png -o out.webp # lossy default · Q 75 ≈ JPEG Q85 quality
$ cwebp -lossless in.png -o out.webp # VP8L lossless path · 5–10× slower
$ cwebp -near_lossless 60 in.png -o out.webp # lossy preprocessing then lossless encode
$ cwebp -q 80 -alpha_q 100 in.png -o out.webp # keep alpha lossless even with lossy RGB
$ webpmux -frame f1.webp +100 -frame f2.webp +100 \
-loop 0 -o anim.webp # build animated WebP from frames
$ dwebp out.webp -o decoded.png # decode back to PNG for inspection
适用
USE FOR
- 2026 web 图片首选——所有现代浏览器都支持(Chrome 32+ / Firefox 65+ / Safari 14+ / Edge 18+)
- 需要 alpha 的产品图、电商主图(替代 PNG-24)
- 需要 animation 的 UGC、表情、loading(替代 GIF,体积只有 1/4)
- CDN 自动转换 pipeline(Cloudinary、Fastly、Imgix 都支持)
- The default web image format in 2026 — every modern browser ships it (Chrome 32+, Firefox 65+, Safari 14+, Edge 18+)
- Product photos and e-commerce hero shots that need alpha (replaces PNG-24)
- UGC stickers, reactions, loading anims (replaces GIF at ¼ the size)
- CDN auto-conversion pipelines (Cloudinary, Fastly, Imgix all support it)
反适用
AVOID
- iOS < 14 设备(无法升级 iOS 14 的老机型,iPhone 6 及更早)
- 邮件附件(很多邮件客户端、Outlook 老版本不渲染)
- 设计交付 / 印刷输出(用 PNG / TIFF / PSD)
- 需要更高压缩率的现代场景——直接用 AVIF / JXL
- iOS < 14 devices (older hardware that can't run iOS 14 — iPhone 6 and earlier)
- Email attachments — many clients and older Outlook versions still won't render WebP
- Design hand-off / print output — use PNG / TIFF / PSD instead
- Modern scenarios that need maximum compression — go straight to AVIF / JXL
| scope | browsers | tools | CLI |
|---|---|---|---|
| WebP (lossy + lossless + alpha + anim) | ✓ Chrome 32+ · Firefox 65+ · Safari 14+ · Edge 18+ · Opera 19+ | ✓ Photoshop (24+ native) · Sketch · Figma · Squoosh · ImageMagick · GIMP · Affinity | cwebp / dwebp / webpmux / gif2webp (libwebp by Google) |
HEIC / HEIF — 苹果与专利墙
HEIC / HEIF — Apple and the Patent Wall
技术上是 AVIF 的爸爸,专利上是 AVIF 的反例。
Technically the parent of AVIF; legally the cautionary tale.
2015 年 MPEG 把 HEVC(H.265 视频)的帧内编码能力封装成一个图像容器规范,叫 HEIF(High Efficiency Image File Format),标准号 ISO/IEC 23008-12。思路与 WebP 完全同源:用现代视频 codec 的 intra-frame 做静态图像编码,用 ISOBMFF(MP4 同根的容器)装。HEIF 是个"容器规范",真正的像素 codec 由 payload 决定——用 HEVC 装就叫 HEIC(.heic),用 AVC/H.264 装就叫 HEIF AVCI;Apple 选了前者。2017 年 9 月 iOS 11 把相机默认存储格式从 JPEG 改成 HEIC——一夜之间,全球数亿台 iPhone 开始产生 HEIC 文件。比 JPEG 体积小一半、支持 10-bit HDR、支持 alpha、支持多对象嵌套——技术上没毛病,问题全在专利。
In 2015 MPEG wrapped HEVC's (H.265 video) intra-frame coding into an image-container spec called HEIF — High Efficiency Image File Format, ISO/IEC 23008-12. Same thinking as WebP: take a modern video codec's intra-frame, use it as a still-image codec, package it in ISOBMFF (the same container family as MP4). HEIF itself is just a container spec; the actual pixel codec depends on the payload — HEVC-payloaded HEIF is HEIC (.heic), AVC/H.264-payloaded HEIF is HEIF AVCI. Apple picked HEVC. In September 2017, iOS 11 switched the camera's default capture format from JPEG to HEIC — overnight, hundreds of millions of iPhones started producing HEIC files. Half the size of JPEG, 10-bit HDR support, alpha, nested multi-image objects — technically flawless. All the problems are in the patents.
ftyp 声明 brand(heic = HEVC payload);meta 是元数据容器,里头 hdlr 标"图像句柄"、pitm 指定主图 item id、iinf 列所有 item、iloc 给 byte 偏移、iprp 装属性(HEVC config / color / 尺寸);mdat 装真正的 HEVC bitstream——主图、缩略图、派生项、alpha 都是独立 item,通过 iloc 查表找位置。
ftyp declares the brand (heic = HEVC payload). meta is the metadata container — hdlr tags it as an image handler, pitm names the primary-item id, iinf lists every item, iloc gives their byte offsets, iprp carries item properties (HEVC config, colour, dimensions). mdat holds the actual HEVC bitstreams — main image, thumbnails, derived items, alpha all live as independent items, each addressed via iloc.
技术内核
Technical core
HEIF / HEIC 的技术构造分四层。① HEIF 容器 = ISOBMFF box 系——跟 MP4 / MOV / 3GP 同根的"box-in-box"二进制格式,每个 box 4-byte size + 4-byte FourCC type + payload。这套格式过去 20 年被全球视频行业打磨得极其成熟,标准库一抓一大把,Apple 自然顺手。② HEVC intra-frame payload——CTU(Coding Tree Unit)最大可达 64×64,远大于 JPEG 的 8×8 / WebP 的 16×16,同样质量下 macroblock artifact 几乎肉眼不可见;intra prediction 有 35 种方向(DC + Planar + 33 angular),比 VP8 的 10 种细得多;后处理还有 SAO(Sample Adaptive Offset)和 deblocking filter,把块边界进一步抹平。这是 HEIC 能比 JPEG 小 50% 的核心。③ 多对象 / 派生项 / 网格——HEIF 不止能存"一张图",它能存"主图 + 缩略图 + 多视角图 + 派生编辑(裁剪 / 旋转 / 网格拼接)",每个对象一个 item,iloc 表查偏移。Apple 利用这个特性做"突发拍照"(把一个 burst session 的 10 张图打包成 1 个 .heic)。④ Live Photo 混合容器——iPhone 的 Live Photo 不是单文件,它是 1 张 .heic 静图(主关键帧)+ 1 段 .mov 视频(前后 1.5 秒 + 音频)的组合,iCloud 同步时把它们绑在一起作为"一个资产"管理——这是 HEIF 最被低估的工程贡献。
HEIF / HEIC has four technical layers. ① HEIF container = ISOBMFF box family — the same "box-in-box" binary format as MP4 / MOV / 3GP, every box is 4-byte size + 4-byte FourCC type + payload. Twenty years of video-industry tooling makes the spec battle-tested and trivial for Apple to adopt. ② HEVC intra-frame payload — the Coding Tree Unit can reach 64×64, much larger than JPEG's 8×8 or WebP's 16×16, so macroblock artefacts are practically invisible at the same quality; intra prediction has 35 directions (DC + Planar + 33 angular) versus VP8's 10; post-processing adds SAO (Sample Adaptive Offset) and a deblocking filter that further smooth block boundaries. That's the core reason HEIC weighs ~50 % less than JPEG. ③ Multi-item, derived items, grids — HEIF doesn't store "one image"; it stores "main image + thumbnails + multi-view images + derived edits (crop / rotate / grid-tile composition)". Each object is its own item, addressed via the iloc table. Apple uses this to pack a burst-photo session of ten images into a single .heic file. ④ Live Photo as a hybrid container — iPhone's Live Photo isn't a single file; it's a .heic still (the keyframe) + a .mov video (1.5 s before + 1.5 s after, with audio). iCloud syncs them as a bound pair, treating the combo as a single asset — HEIF's most underappreciated engineering contribution.
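The box-in-box structure is easy to see with a top-level walker — and it makes a nice contrast with WebP's RIFF: ISOBMFF sizes are big-endian and *include* the 8-byte box header itself. A sketch only (it stays at the top level and does not descend into meta, which is a FullBox carrying 4 extra version/flags bytes):

```python
import struct

def walk_boxes(buf):
    """Yield (fourcc, payload) for each top-level ISOBMFF box."""
    pos = 0
    while pos < len(buf):
        (size,) = struct.unpack(">I", buf[pos:pos + 4])  # big-endian; counts the header too
        fourcc = buf[pos + 4:pos + 8].decode("ascii")
        if size == 1:                                    # 64-bit largesize follows the type
            (size,) = struct.unpack(">Q", buf[pos + 8:pos + 16])
            payload = buf[pos + 16:pos + size]
        elif size == 0:                                  # box extends to end of file
            payload, size = buf[pos + 8:], len(buf) - pos
        else:
            payload = buf[pos + 8:pos + size]
        yield fourcc, payload
        pos += size

def make_box(fourcc, payload):
    """Build one plain ISOBMFF box (synthetic, for illustration)."""
    return struct.pack(">I", 8 + len(payload)) + fourcc + payload
```

Walking ftyp first is how a reader tells .heic from .avif from plain .mp4 — the brand string inside that first box is the real switch, not the file extension.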
适用
USE FOR
- iPhone / iPad 拍照默认存储(2017 iOS 11+ 至今)
- iCloud 相册同步 / Apple Photos 编辑链
- 10-bit HDR 静态照片(P3 色域 + Dolby Vision Stills)
- Live Photo 双文件混合资产
- Apple 生态闭环内的高效存储与传输
- iPhone / iPad default photo storage (2017 iOS 11+ onward)
- iCloud Photos sync · Apple Photos edit chain
- 10-bit HDR stills (P3 gamut + Dolby Vision Stills)
- Live Photo's two-file hybrid asset
- High-efficiency storage and transfer inside Apple's walled garden
反适用
AVOID
- 任何需要 Web 通用兼容的场景:Chrome / Firefox 至今不支持原生 HEIC
- 跨平台分享 / 邮件附件:Windows、Android 默认看不了
- 商业项目的 Web 主图——用 WebP / AVIF
- 开源 / 自由软件管线——HEVC 专利费让大多数 FOSS 项目不愿意 ship 解码器
- Anything that needs broad web compatibility — Chrome and Firefox still won't ship native HEIC
- Cross-platform sharing / email attachments — Windows and Android won't render it by default
- Web hero images for commercial projects — use WebP / AVIF instead
- Open-source / libre pipelines — HEVC patent fees keep most FOSS projects from shipping a decoder
| scope | browsers | tools | CLI |
|---|---|---|---|
| HEIC / HEIF | ✓ Safari 17+ (macOS 14+ / iOS 17+) · ✗ Chrome · ✗ Firefox | ✓ macOS Preview · Apple Photos · Windows 10+ (HEIF Image Extension paid) · Photoshop 2023+ | heif-enc -q 60 in.png -o out.heic · heif-dec out.heic out.png (libheif) |
AVIF — AV1 的副产品成了王
AVIF — A Video Codec's Side-Effect Became King
为视频生的 codec,顺手把图片格式革命了一遍。
A video codec by birth — and it casually rewrote image formats.
2018 年 3 月 AOMedia 发布 AV1 视频编码 1.0,目标是做"完全免专利费的 HEVC 替代品"——背后是 Google / Mozilla / Cisco / Apple / Netflix / Microsoft / Intel / Amazon / Nvidia / Samsung 三十多家公司组成的联盟,带着各自的专利池交叉许可。AV1 走的是同一条"视频帧内 → 静态图片"路径(WebP / HEIC 都是这条路),把 intra-frame 编码能力套进 HEIF 容器(ISOBMFF),就拿到了 AVIF (AV1 Image File Format)——体积比 HEIC 略小、专利免费、跨厂商共识、Chrome 与 Firefox 与 Safari 三大引擎都点头。AVIF 2019 年 2 月发布标准,Chrome 85 (2020 年 8 月) 落地,Firefox 93 (2021 年 10 月) 跟进,Safari 16.4 (2023 年 3 月) 收尾——HEIC 阵营在 Web 上正式退场。
In March 2018 AOMedia shipped AV1 1.0 — the goal was a "completely royalty-free HEVC alternative". The alliance behind it is 30+ companies (Google, Mozilla, Cisco, Apple, Netflix, Microsoft, Intel, Amazon, Nvidia, Samsung…) cross-licensing their patent pools to make it stick. AV1 took the same "video intra-frame → still image" route as WebP and HEIC, wrapped its intra-frame encoder in HEIF (ISOBMFF), and out came AVIF (AV1 Image File Format) — smaller than HEIC, patent-free, cross-vendor, with all three big browser engines on board. The spec landed in February 2019, Chrome 85 shipped it in August 2020, Firefox 93 in October 2021, Safari 16.4 in March 2023. On the open web HEIC was officially out.
CfL(Chroma from Luma)把色度建模为亮度的线性函数 C = α·Y + β,只需要 signal 一个 α(每块 4 bit 左右);β 是块内均值。色度残差因此大大缩小——这是 AV1 在低 bitrate 下色彩还能保真的关键之一。
CfL (Chroma from Luma) models chroma as a linear function of luma: C = α·Y + β. Only α needs to be signalled (≈4 bits per block); β is the chroma mean. Chroma residuals shrink dramatically — a major reason AV1 keeps colour fidelity at low bitrates.
技术内核
Technical core
AVIF 的技术深度全在 AV1 这一侧——容器只是 HEIF 的复用。① AV1 intra prediction:56 种角度方向(粗扇 9° 步 + 细扇 3° 步)+ 4 种特殊模式(DC / Planar / Smooth / Paeth)+ CfL(Chroma from Luma,色度从亮度推导)+ Palette mode(每块独立小调色板,适合 UI 截图)+ Intra Block Copy(块内自指,跟视频的"运动补偿"对偶)——比 HEVC 的 35 方向、VP8 的 10 方向都细很多。② Superblock 128×128 + 多种切分:递归切到最小 4×4,还允许 2:1 / 1:2 / 4:1 矩形,平坦区整块保留、纹理区切细。③ 变换块 16 种组合:DCT-2 / ADST(非对称离散正弦)/ WHT(Walsh-Hadamard)/ IDTX(恒等)四种变换在 H/V 两个方向独立选择,共 4×4 = 16 种组合——纹理方向不同,选不同变换效率最优。④ HEIF 容器:跟 HEIC 完全同根的 ISOBMFF box 树(ftyp 'avif' · meta · iloc · iprp · mdat),thumbnail / alpha / depth map 都是独立 item。⑤ 专利策略是它最大的非技术杀招:AOMedia 的核心承诺是"会员单位互相 royalty-free 交叉许可,且对所有人 patent non-assert"——Google / Cisco 把已有专利池贡献进来,把"做一个免费 codec"从技术问题变成了行业政治问题,并赢了。
AVIF's technical depth lives on the AV1 side; the container is just HEIF reused. ① AV1 intra prediction: 56 angular directions (coarse 9° + fine 3° steps) + 4 special modes (DC / Planar / Smooth / Paeth) + CfL (Chroma-from-Luma) + Palette mode (per-block tiny palette, great for UI screenshots) + Intra Block Copy (intra-frame self-reference, the still-image dual of motion compensation). HEVC has 35 directions; VP8 had 10. ② Superblocks at 128×128 recursively split down to 4×4, with 2:1 / 1:2 / 4:1 rectangular partitions — flat regions stay whole, textured regions split. ③ Sixteen transform-block combinations: DCT-2 / ADST (asymmetric discrete sine) / WHT (Walsh-Hadamard) / IDTX (identity) chosen independently for H and V — 4×4 = 16 combos, different texture orientations get different transforms. ④ HEIF container: the same ISOBMFF box tree as HEIC (ftyp 'avif', meta · iloc · iprp, mdat); thumbnails, alpha and depth maps live as independent items. ⑤ Patent strategy is the real masterstroke: AOMedia's binding promise is "members cross-license royalty-free; every patent the alliance touches is non-asserted against the world". Google and Cisco committed their pools, and the question of "can a free codec exist?" turned from a technical one into an industry-politics one — which they won.
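The CfL idea fits in a few lines. Below is a sketch of the prediction plus a least-squares fit for α — roughly what an encoder searches for, though real AV1 signals α from a small quantised set rather than as a free float (a simplification assumed here):

```python
def cfl_fit_alpha(luma, chroma):
    """Least-squares alpha for the model C ≈ alpha * (L - mean(L)) + mean(C)."""
    ml = sum(luma) / len(luma)
    mc = sum(chroma) / len(chroma)
    num = sum((l - ml) * (c - mc) for l, c in zip(luma, chroma))
    den = sum((l - ml) ** 2 for l in luma)
    return num / den if den else 0.0

def cfl_predict(luma, alpha, chroma_dc):
    """Predict chroma: chroma DC plus alpha times the AC (mean-removed) luma."""
    ml = sum(luma) / len(luma)
    return [chroma_dc + alpha * (l - ml) for l in luma]
```

When chroma really does track luma linearly (skin, sky gradients, tinted surfaces), the residual after CfL prediction collapses to near zero — only α and the usual DC need to be coded.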
图 12 · AVIF 全流程 · YUV4:2:0 → 128×128 superblock 多级切分 → 56 方向 intra 预测(+ CfL/Palette/IBC) → 16 种变换组合 → 量化(★ 唯一有损步骤,cq-level 控制狠度) → CDF 自适应算术编码 → HEIF box 包外壳 → .avif。cq-level、speed、subsample 比、bit depth 是编码器主要旋钮。
Fig 12 · The full AVIF pipeline · YUV 4:2:0 → 128×128 superblock recursive split → intra prediction with 56 angular modes (+ CfL / Palette / IBC) → one of 16 transform combinations → quantise (★ the only lossy step, governed by cq-level) → CDF adaptive arithmetic coding → HEIF box wrapper → .avif. The main knobs are cq-level, encoder speed, chroma subsample and bit depth.
| codec | year | patent | 1080p photo @ JPEG-Q85 quality | encode time | browser support |
|---|---|---|---|---|---|
| JPEG | 1992 | free (post-2007) | ≈ 250 KB | 1 × | ✓✓✓ universal |
| WebP | 2010 | free | ≈ 165 KB | ≈ 3 × | ✓✓✓ since 2020 |
| HEIC | 2015 | $$$ (3 pools) | ≈ 125 KB | ≈ 20 × | only Safari |
| AVIF | 2019 | free (AOMedia) | ≈ 95 KB | ≈ 50 × | ✓✓✓ all modern |
| JXL | 2021 | free | ≈ 85 KB | ≈ 5 × | partial (Safari · Firefox flag) |
$ avifenc --min 0 --max 63 -a end-usage=q -a cq-level=23 in.png out.avif # typical Q23 — visually near-lossless
$ avifenc -j 8 -s 6 in.png out.avif # speed 0–10 (lower=better/slower); -j threads
$ avifenc -d 10 --yuv 444 in.png out.avif # 10-bit + 4:4:4 chroma — for HDR / design assets
$ avifdec out.avif decoded.png # reference decode via libavif
$ cavif --quality 80 in.png -o out.avif # Rust CLI built on rav1e; faster preset
适用
USE FOR
- 现代 Web 首屏主图 / Hero 图(预编码后 CDN 分发)
- 体积敏感 + 质量要求高的内容图(电商、媒体、博客)
- 透明 PNG 替代 — 体积可省 80–95%,肉眼几乎无损
- 10-bit HDR 图像分发(P3 / Rec.2020 色域)
- 响应式 <picture> 中作为优先 source(配 WebP / JPEG fallback)
- Modern web hero / above-the-fold images (pre-encoded, CDN-served)
- Bandwidth-sensitive content images — e-commerce, media, blogs
- Transparent-PNG replacement — 80–95 % smaller, visually identical
- 10-bit HDR image delivery (P3 / Rec.2020 gamut)
- Top source in <picture> with WebP / JPEG fallback
反适用
AVOID
- 需要 IE / 老 Android(< 5.0) / 老 Safari(< 16) 兼容的场景
- 编码时间敏感:CI 实时构建 / 服务器实时转码 / 浏览器端用户上传
- 用户头像 / 缩略图等"用一次就丢"的小图(编码成本不划算)
- 需要无损归档的工程影像(改用 PNG / EXR / TIFF)
- Anything that must run on IE, old Android (< 5.0), or old Safari (< 16)
- Encode-time-sensitive paths: CI builds, on-the-fly server transcoding, browser-side user uploads
- Avatars / throwaway thumbnails — encode cost outweighs the savings
- Lossless engineering archives — use PNG / EXR / TIFF instead
| scope | browsers | tools | CLI |
|---|---|---|---|
| AVIF · AVIF Sequence (anim) | ✓✓✓ Chrome 85+ (2020-08) · Firefox 93+ (2021-10) · Safari 16.4+ (2023-03) · Edge 121+ | ✓✓ Photoshop 24.2+ · Figma (export only) · GIMP 2.10+ · Squoosh · Cloudflare Images · imgix | avifenc (libavif) · cavif (rav1e) · sharp (Node) · ffmpeg -c:v libaom-av1 |
JPEG XL — 被 Chrome 砍掉的"完美"格式
JPEG XL — The "Perfect" Format Chrome Killed
技术上吊打所有人,被 Chrome 团队以"兴趣不足"砍掉。
Technically beats everyone. Chrome killed it citing "insufficient interest".
2017 年 AOMedia 已经在猛推 AVIF,但有一群人不满足:HDR 摄影师、印刷出版业、漫画 / 插画家、需要无损归档的博物馆、还有手里握着几十亿张 JPEG 资产没法迁移的所有人——AVIF 解决不了他们的问题。Cloudinary 与 Google Research 把两个独立项目(Cloudinary 的 FUIF + Google 的 PIK)合并,推出 JPEG XL,目标是做"一个能同时干完所有事的下一代格式":(a) 把现存 JPEG 文件 无损 transcode 成 JXL,体积省 ~20%,任何时候可逆向恢复原 byte-exact JPEG;(b) 现代 VarDCT lossy 编码,质量比 AVIF 略好;(c) Modular 模式做无损,比 PNG / WebP-LL 都小;(d) 真正的渐进式解码——第一段 ~1/64 数据就能显示完整的"像素化粗略图",随后几段越来越清晰;(e) 8–32 bit + float、HDR、宽色域、CMYK、高位深 alpha 全套原生。技术上几乎是"现代格式应该有的样子"的完整集成,2021 年 2 月以 ISO/IEC 18181 标准化通过——但落地之路比技术艰难得多。
By 2017 AOMedia was already pushing AVIF hard, but a constituency wasn't satisfied: HDR photographers, the print and publishing industry, comic/manga artists, archival museums, and anyone holding billions of legacy JPEGs they couldn't migrate — AVIF solved none of their problems. Cloudinary and Google Research merged two independent projects (Cloudinary's FUIF and Google's PIK) into JPEG XL with the explicit ambition of "doing all of it at once": (a) losslessly transcode existing JPEGs into JXL, ~20 % smaller, reversible to byte-exact original JPEG; (b) modern VarDCT lossy with quality slightly above AVIF; (c) Modular mode for lossless, smaller than both PNG and WebP-LL; (d) real progressive decoding — the first ≈1/64 of the bitstream already displays a complete coarse image, with subsequent segments adding detail; (e) native 8–32 bit + float, HDR, wide gamut, CMYK and high-bit-depth alpha. Technically it's the complete integration of "what a modern format should look like". ISO/IEC 18181 was published in February 2021. The path to adoption proved much harder than the engineering.
用 djxl 反向恢复时,bit-by-bit 还原原始 .jpg。这是其它现代 codec 都做不到的事。
Run djxl to recover the original .jpg byte for byte. No other modern codec offers this.
技术内核
Technical core
JXL 的技术广度是当代图像格式里最大的——它把"现代图像格式应该有的所有能力"打包进同一个容器,六个核心点:① VarDCT(可变块 DCT)——块大小可在 2×2 到 256×256 之间自由变化,远比 AVIF 的 4×4–128×128 灵活;搭配 XYB(感知分离的色彩空间,JXL 自创)+ 自适应量化矩阵(可按图像内容定制),lossy 模式直接对标 AVIF。② Modular 模式——meta-adaptive 预测器(WP / Gradient / Self-correcting,可学习权重)+ 通道变换链(Squeeze / RCT / 自定义 transform),做无损或 near-lossless,小于 PNG / WebP-LL 30–50%。③ JPEG 无损 transcode(最革命性):任意 JPEG 文件解码到 DCT 系数,不再变换、不再量化,直接用 JXL 的熵编码重新打包,体积省 ~20%;djxl 反向时 byte-exact 恢复原 JPEG——这是其它 codec 全都做不到的事。④ 真渐进式解码——比特流头部就是低分辨率版本,解码器收到前 ~1/64 字节就能渲染一张完整的低分辨率图(不像 progressive JPEG 是按频率扫描,中途看起来糊);非常适合慢网。⑤ HDR / 32-bit float / wide gamut / CMYK 全原生——无需 ICC profile hack,XYB 色空间内部就支持 HDR;打印行业的高位深 + CMYK 也是一等公民。⑥ Patch 系统——对图片中重复出现的 pattern(同一个表情、漫画里反复出现的角色脸)单独编码一次,在出现位置插入引用,极大压缩漫画 / 表情包 / 截图。技术上几乎是"现代图像格式应该有的样子"的完整集成。
JXL has the broadest technical surface area of any current image format — it bundles every capability "a modern image format ought to have" into one container. Six pillars: ① VarDCT — block sizes range freely from 2×2 to 256×256, far more flexible than AVIF's 4×4–128×128. Combined with XYB (a perceptually separated colour space JXL invented) and content-adaptive quantisation matrices, lossy mode trades blow-for-blow with AVIF. ② Modular mode — meta-adaptive predictors (WP / Gradient / Self-correcting, weights learnable) plus channel-transform chains (Squeeze / RCT / custom) deliver lossless or near-lossless that's 30–50 % smaller than PNG and WebP-lossless. ③ JPEG lossless transcode (the revolutionary one): decode any JPEG into its DCT coefficients, skip requantising and re-transforming, and just re-encode with JXL's entropy coder — about 20 % smaller. djxl recovers the original JPEG byte for byte. No other codec offers this. ④ True progressive decoding — the bitstream's head is the low-resolution version. Receive the first ~1/64 of bytes and the decoder renders a complete coarse image (unlike progressive JPEG, which scans by frequency and stays blurry mid-load). Excellent for slow networks. ⑤ HDR, 32-bit float, wide gamut, CMYK all native — no ICC-profile hacks; XYB supports HDR internally; high-bit-depth + CMYK are first-class for print. ⑥ Patch system — encode a repeating pattern (an emoji, a recurring character face in a comic) once, then place references at every occurrence. Comics, sticker sheets and screenshots compress dramatically.
图 13 · JXL 三条编码路径并存:lossy 走 VarDCT + 自适应量化(★ 唯一有损步骤)、lossless 走 Modular + 预测器、JPEG transcode 直接打包 DCT 系数;三条路径都汇入 ANS(asymmetric numeral system)熵编码,最后包进 .jxl 容器。djxl 可把 transcode 路径反向恢复为 byte-exact 的原 JPEG。
Fig 13 · JXL fans out into three coding paths: lossy via VarDCT + adaptive quantisation (★ the only lossy step), lossless via Modular + predictors, JPEG transcode by repacking DCT coefficients. All three converge on ANS (asymmetric numeral system) entropy coding before being wrapped in the .jxl container. djxl reverses the transcode path back to a byte-exact original JPEG.
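The "first ~1/64 of bytes shows a complete image" claim has a simple intuition: the DC coefficient of each 8×8 block is just that block's average, and the grid of block averages is a complete picture at 1/64 the pixel count. A toy illustration of that intuition — not JXL's actual bitstream ordering:

```python
def dc_preview(img, bs=8):
    """Downscale a 2-D integer image by averaging bs x bs blocks --
    roughly the complete coarse picture a DC-first decoder can show
    from just the head of the bitstream."""
    h, w = len(img), len(img[0])
    out = []
    for by in range(0, h, bs):
        row = []
        for bx in range(0, w, bs):
            block = [img[y][x] for y in range(by, by + bs)
                                for x in range(bx, bx + bs)]
            row.append(sum(block) // len(block))
        out.append(row)
    return out
```

Contrast with progressive JPEG, which interleaves *frequency* scans: mid-load you have every block's low frequencies and nothing sharp, so the whole image looks uniformly blurry rather than cleanly pixelated.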
| feature | JPEG | WebP | HEIC | AVIF | JXL |
|---|---|---|---|---|---|
| HDR / wide gamut | ✗ | ✗ | ✓ | ✓ | ✓ (XYB native) |
| 16+ bit depth | ✗ | ✗ | partial (10/12) | ✓ (10/12) | ✓ (up to 32 + float) |
| lossless mode | nominal (in base 10918-1 · rarely implemented) | ✓ | ✓ | ✓ | ✓ (best in class) |
| JPEG recompress | — | ✗ | ✗ | ✗ | ✓ (lossless · ~20 % smaller · reversible) |
| progressive | by frequency (blurry) | ✗ | ✗ | ✗ | ✓ true spatial / DC-first |
| CMYK | ✓ | ✗ | ✗ | ✗ | ✓ first-class |
| Chrome support | ✓ | ✓ | ✗ | ✓ | ✗ (removed 2022-10) |
| Safari 17+ | ✓ | ✓ | ✓ | ✓ | ✓ (since 2023-09) |
$ cjxl in.png out.jxl --quality 90 # quality 0–100 (≈90 visually lossless)
$ cjxl in.png out.jxl --distance 1.0 # distance: 0=lossless, ~1=Q90, ~3=Q75
$ cjxl in.jpg out.jxl --lossless_jpeg 1 # JPEG → JXL lossless transcode (~20% smaller)
$ djxl out.jxl roundtrip.jpg # reverse transcode — byte-exact original .jpg
$ cjxl in.png out.jxl -d 0 -e 9 # lossless, max effort (smallest, slowest)
适用
USE FOR
- macOS / iOS 17+ 内部存储链路(Apple Photos 后端)
- 摄影 / RAW 后期管线(Lightroom · Capture One 已 native)
- 印刷出版业(CMYK + 高位深 first-class)
- HDR / wide-gamut / Dolby Vision Stills 长期归档
- 把现存 JPEG 资产无损迁移省 ~20% 体积(可逆)
- 漫画 / 表情包 / 截图(patch 系统压缩极优)
- macOS / iOS 17+ internal storage pipeline (Apple Photos back-end)
- Photography / RAW post pipelines (Lightroom · Capture One ship JXL natively)
- Print and publishing (CMYK + high bit depth as first-class)
- HDR / wide-gamut / Dolby Vision Stills long-term archives
- Migrating existing JPEG libraries — ~20 % smaller, fully reversible
- Comics / sticker sheets / screenshots (patch system compresses superbly)
反适用
AVOID
- 桌面 Chrome / Edge 主流量场景(2022-10 已移除支持)
- Android 主流浏览器(WebView / Chrome 同样不支持)
- 实时性能敏感的服务端 / 客户端 transcoding(库还在快速演进)
- 需要"全 Web 兼容"的公共图床 / CDN 默认输出
- Desktop Chrome / Edge mainstream traffic (support removed Oct 2022)
- Android's main browsers (WebView / Chrome don't support it either)
- Latency-sensitive server/client transcoding (libraries still maturing)
- "Universal web compatibility" as the default CDN output
| scope | browsers | tools | CLI |
|---|---|---|---|
| JPEG XL | ✓ Safari 17+ (2023-09) · flag Firefox image.jxl.enabled · ✗ Chrome (removed 2022-10) · ✗ Edge | ✓✓ Photoshop 24.2+ · Camera Raw · Lightroom · Capture One · Krita · GIMP 2.10.30+ · Affinity Photo 2 · macOS Preview / iOS Photos | cjxl · djxl (libjxl) · sharp (Node, libjxl-bind) |
KTX / KTX2 — 容器与 payload 的分离
KTX / KTX2 — separating container from payload
"它本身不是格式,是装格式的盒子。"
"Not a format itself — a box that holds formats."
GPU 块压缩格式(BCn / ETC2 / ASTC)的规范只规定了"4×4 像素块怎么编成几个字节",但没规定一个完整的纹理资产文件要怎么组织——mipmap 链怎么排?cubemap 的六个面怎么放?array layer 怎么索引?ICC color profile 放哪?Khronos 看不下去,做了 KTX(Khronos TeXture)当通用容器:头部 + key-value metadata + level/layer/face 的 byte-offset 索引表 + 真正的像素 payload。KTX 不关心 payload 是 BC7 还是 ASTC,只负责"把它装好、运行时一次性 upload 到 GPU"。2019 年 KTX2 加上 supercompression(用 Zstd 或 Basis Universal 把已经 GPU 压过的 payload 再压一遍),并把 mip 顺序改成 smallest-first 便于流式加载——成了 glTF 2.0 / WebGPU / Babylon / three.js 的资产事实标准。
GPU block-compression specs (BCn / ETC2 / ASTC) only define "how a 4×4 pixel block is encoded into a few bytes" — they say nothing about how a complete texture asset is laid out: how the mip chain is ordered, how the six faces of a cubemap sit together, how array layers are indexed, where the ICC colour profile lives. Khronos picked up the slack with KTX (Khronos TeXture): header + key-value metadata + a byte-offset index table for every level/layer/face + the actual pixel payload. KTX is payload-agnostic — it doesn't care whether the payload is BC7 or ASTC, it just packs the asset and lets the runtime upload it to the GPU in one go. KTX2 (2019) added supercompression — running the already-GPU-compressed payload through Zstd or Basis Universal a second time — and reversed the mip order to smallest-first so streaming loaders can swap in a low-res placeholder immediately. It is now the de-facto asset format for glTF 2.0, WebGPU, Babylon.js and three.js.
技术内核
Technical core
KTX 的设计有四个支点。① header + index 表——文件头 80 字节,描述纹理的逻辑维度(width / height / depth / mip levels / array layers / faces);后面跟一张 level index 表,告诉 loader 第 N 级 mip 在文件内的 byte offset 和 byte length。这种"先索引后数据"的布局让 loader 不用扫整个文件就能跳读任意 level。② 每 mip level 内有 padding——GPU 上传时纹理需要按硬件对齐(通常 4 字节或 8 字节边界),KTX 直接在 file format 层面加 padding,运行时 memcpy 一行就能直接交给 glCompressedTexImage2D。③ KTX2 supercompression——这是 KTX2 相对 KTX1 最大的进化。GPU 块压缩(BC7 / ASTC)在 GPU 端是不能再压的——它们必须保持"硬件能直接 sample"的格式。但传输时(网络下载、磁盘存储)可以再用 Zstd 把字节流压一遍,运行时解压回原样再 upload。Basis Universal 更激进:它在 KTX2 里存的是一种"中间表示",运行时按目标设备转码成 BC7(桌面 D3D12 / Vulkan)、ETC2(老移动)或 ASTC(现代移动)——一个文件,所有平台。④ 多对象类型——同一份 KTX2 可以装单 2D 纹理、cubemap(6 face)、texture array(N layer)、3D 体积纹理,甚至带 mipmap 的 cubemap array(常用于 IBL 反射探针)。glTF 2.0 用 KHR_texture_basisu 扩展把 KTX2 + Basis 钉成 PBR 资产的官方携带格式。
KTX rests on four pillars. ① Header + level index — an 80-byte header describes the texture's logical dimensions (width / height / depth / mip levels / array layers / faces); then a level index lists the byte offset and byte length of every mip level. With "index first, data later" a loader can seek straight to any level without scanning the whole file. ② Padding inside each mip level — GPUs require texture rows to land on hardware-aligned boundaries (typically 4- or 8-byte). KTX bakes the padding into the file so the runtime can memcpy a row straight into glCompressedTexImage2D. ③ KTX2 supercompression — the headline upgrade over KTX1. GPU block compression (BC7 / ASTC) cannot be re-compressed on the GPU — the format has to stay "hardware-sampleable". But for transit (download, disk) the byte stream can be Zstd'd once and decompressed at load time before upload. Basis Universal goes further: KTX2 stores an intermediate representation that the runtime transcodes per-device into BC7 (desktop D3D12 / Vulkan), ETC2 (older mobile) or ASTC (modern mobile). One file, every platform. ④ Multi-object payload — a single KTX2 can carry a 2D texture, a cubemap (6 faces), a texture array (N layers), a 3D volume texture, even a mipmapped cubemap array for IBL reflection probes. glTF 2.0's KHR_texture_basisu extension nails KTX2 + Basis as the official carrier for PBR assets.
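The 80-byte header + index can be unpacked with a single pass of struct. A sketch assuming the KTX2 layout described above (12-byte identifier, nine u32 fields, then the dfd / kvd / sgd index, then one 24-byte entry per mip level) — real loaders should use Khronos's KTX-Software (libktx):

```python
import struct

KTX2_ID = b"\xabKTX 20\xbb\r\n\x1a\n"   # 12-byte KTX2 file identifier

def parse_ktx2_header(buf):
    """Parse the fixed 80-byte KTX2 header + index, then levelCount level entries."""
    assert buf[:12] == KTX2_ID, "not a KTX2 file"
    (vk_format, type_size, width, height, depth,
     layer_count, face_count, level_count, scheme) = struct.unpack("<9I", buf[12:48])
    (dfd_off, dfd_len, kvd_off, kvd_len,
     sgd_off, sgd_len) = struct.unpack("<4I2Q", buf[48:80])
    # one (byteOffset, byteLength, uncompressedByteLength) triple per mip level
    levels = [struct.unpack("<3Q", buf[80 + 24 * i:80 + 24 * (i + 1)])
              for i in range(level_count)]
    return {"vkFormat": vk_format, "width": width, "height": height,
            "levels": level_count, "supercompression": scheme,
            "levelIndex": levels}
```

This is the "index first, data later" pillar in action: from these 80 + 24·levels bytes a streaming loader can seek straight to the smallest mip and upload a placeholder before the rest of the file has even arrived.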
适用
USE FOR
- glTF 2.0 模型纹理(KHR_texture_basisu)
- WebGPU / WebGL 2 资产管线
- 跨平台游戏纹理(一个 .ktx2 + Basis,运行时转目标格式)
- cubemap / texture array / 3D volume 纹理打包
- 需要流式加载的大尺寸纹理(smallest-first mip 顺序)
- glTF 2.0 model textures (KHR_texture_basisu)
- WebGPU / WebGL 2 asset pipelines
- Cross-platform game textures (one .ktx2 + Basis, transcoded at runtime)
- Cubemaps, texture arrays, 3D volumes packed into one file
- Large textures that need streaming (smallest-first mip order)
反适用
AVOID
- Web 主图 / 普通照片——KTX2 不是"图片格式",浏览器 <img> 不解
- 编辑链(Photoshop / Affinity)——这是终端纹理资产,不是工作格式
- 不需要 GPU 直接 sample 的场景(用 PNG / WebP)
- Web hero images / regular photos — KTX2 is not an image format; <img> won't decode it
- Editing chains (Photoshop / Affinity) — this is a final-asset format, not a working format
- Anything that doesn't need direct GPU sampling — use PNG / WebP
| scope | browsers / engines | tools | CLI |
|---|---|---|---|
| KTX2 / Basis | ✗ 浏览器原生 · ✓ WebGL/WebGPU 通过 loader · ✓ Babylon.js · ✓ three.js KTX2Loader | ✓ Khronos KTX-Software · NVIDIA Texture Tools Exporter · AMD Compressonator | toktx --bcmp --t2 out.ktx2 in.png · basisu in.png -ktx2 -uastc |
DDS — DirectDraw Surface 容器
DDS — the DirectDraw Surface container
"D3D 时代的 KTX,只是没人记得它先来。"
"The KTX of the D3D era — except few remember it came first."
1999 年 Direct3D 7.0 推出的时候,游戏行业急需一个"硬件能直接 sample 的纹理容器"——你不能用 BMP / TGA,因为它们是 CPU 端 RGBA,显卡读到要先解压再上传,带宽吃不住。微软干脆把 DirectDraw Surface(.dds)定义成纹理资产的标准磁盘格式:头部 124 字节描述维度 / mip 数 / pixel format / cubemap 标记,后面直接是 DXT(后来的 BCn)块或未压缩 RGBA8 字节流。Khronos 的 KTX 要 6 年后(2005)才出来。所以严格讲,"GPU 纹理容器"这个范式是微软先做的——KTX 是开放生态对它的回应。Bethesda 时代的 PC 游戏 mod 圈,几乎所有纹理替换包都是 .dds——这就是它的护城河。
When Direct3D 7.0 shipped in 1999, the games industry urgently needed "a texture container the hardware could sample directly". BMP and TGA were CPU-side RGBA — the GPU would have to decompress and re-upload before sampling, and the bus simply couldn't take it. Microsoft defined DirectDraw Surface (.dds) as the standard on-disk texture asset: a 124-byte header describing dimensions / mip count / pixel format / cubemap flags, followed by raw DXT (later BCn) blocks or uncompressed RGBA8. Khronos's KTX wouldn't appear for another six years. Strictly speaking, the "GPU texture container" idea was Microsoft's first — KTX is the open-ecosystem reply. The Bethesda-era PC modding scene (Skyrim / Fallout) shipped texture replacements almost exclusively as .dds — that's the moat that keeps DDS relevant.
技术内核
Technical core
DDS 的结构简单到几乎没什么可讲的——这是它的优点。① 头部 124 字节 DDS_HEADER:固定字段描述 width / height / depth(volume 纹理用)/ mipMapCount / pitch(每行 byte 数)/ PixelFormat(老的 FourCC 字段:DXT1/DXT3/DXT5/...);加上 dwCaps / dwCaps2 标记(cubemap / volume / mip)。② DX10 扩展头 20 字节(可选):DirectX 10+ 引入的现代头,用 DXGI_FORMAT 枚举(DXGI_FORMAT_BC7_UNORM / DXGI_FORMAT_BC6H_UF16 / ...)替代 FourCC——因为新的块压缩格式(BC6H、BC7)的 FourCC 名字位不够用了。③ payload 直接是块压缩字节流——没有 padding 设计、没有 supercompression、没有 key-value metadata,只有最直接的 mip + face + layer 字节拼接。这是它跟 KTX2 最大的差距:DDS 是"足够好"的工程容器,KTX2 是"考虑到 Web / 跨平台 / Basis 转码"的现代容器。但对于 Windows / D3D 闭环,DDS 已经够用 25 年。
DDS's structure is almost embarrassingly simple — and that's its strength. ① The 124-byte DDS_HEADER: fixed fields for width / height / depth (for volume textures) / mipMapCount / pitch (bytes per row) / PixelFormat (the old FourCC field — DXT1 / DXT3 / DXT5 / …); plus dwCaps / dwCaps2 flags (cubemap / volume / mip). ② The optional 20-byte DX10 extension header: a modern header introduced in DirectX 10+ that swaps FourCC for the DXGI_FORMAT enum (DXGI_FORMAT_BC7_UNORM / DXGI_FORMAT_BC6H_UF16 / …) — necessary because newer block formats (BC6H, BC7) ran out of FourCC bits. ③ The payload is just block-compressed bytes — no padding scheme, no supercompression, no key-value metadata, just the most direct possible concatenation of mip × face × layer bytes. That's the gap with KTX2: DDS is a "good enough" engineering container, KTX2 is a modern container that thinks about Web, cross-platform delivery and Basis transcoding. For a Windows / D3D walled garden, though, DDS has been sufficient for 25 years.
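头部布局可以照着上面的字段描述直接读出来——下面是一个只取核心字段的最小 DDS 头解析草图(省略 pitch 校验与 DX10 扩展头的完整解析)。
The header layout reads off exactly as described above: a minimal sketch that pulls just the core DDS_HEADER fields (pitch validation and full DX10 extension-header parsing omitted).

```python
import struct

def parse_dds_header(data: bytes):
    """Minimal DDS_HEADER reader: magic, core dimensions, mip count, FourCC."""
    if data[:4] != b"DDS ":
        raise ValueError("missing 'DDS ' magic")
    # Seven little-endian u32s open the header; note height precedes width.
    size, flags, height, width, pitch, depth, mip_count = \
        struct.unpack_from("<7I", data, 4)
    if size != 124:
        raise ValueError("bad dwSize (expected 124)")
    # DDS_PIXELFORMAT sits 72 bytes into the header (file offset 76);
    # its FourCC field is 8 bytes further in (file offset 84).
    fourcc = data[84:88]
    # When FourCC reads "DX10", the 20-byte extension header follows immediately.
    has_dx10 = fourcc == b"DX10"
    return dict(width=width, height=height, depth=depth,
                mips=mip_count, fourcc=fourcc, dx10=has_dx10)
```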
适用
USE FOR
- Windows PC 游戏纹理资产
- D3D9 / D3D11 / D3D12 引擎(原生支持)
- Bethesda / Valve / id Tech 等老牌游戏 mod 包
- Unreal Engine 4 / 5 中间纹理(导入前)
- Windows PC game texture assets
- D3D9 / D3D11 / D3D12 engines (native support)
- Bethesda / Valve / id Tech mod packs
- Unreal Engine 4 / 5 intermediate textures (pre-import)
反适用
AVOID
- 跨平台 / Web / 移动端——用 KTX2
- 需要 supercompression(Zstd / Basis)的资产管线
- 需要 ICC color profile / 丰富 metadata 的工程影像
- Cross-platform / Web / mobile — use KTX2
- Pipelines that need supercompression (Zstd / Basis)
- Engineering imagery that needs ICC profiles or rich metadata
| scope | engines | tools | CLI |
|---|---|---|---|
| DDS | ✓ D3D9–12 原生 · ✓ Unreal · ✓ Unity · ✓ Source / id Tech | ✓ NVIDIA Texture Tools · DirectXTex · GIMP DDS 插件 · Photoshop NVIDIA Plug-in | texconv -f BC7_UNORM in.png · nvtt_export -f bc7 -o out.dds in.png |
BC1 (DXT1) — 4×4 块、4 bpp 的祖宗
BC1 (DXT1) — the 4×4-block, 4-bpp ancestor
"4 个像素压成 8 字节,显存砍掉 8 倍,从此再也回不去。"
"Four pixels squeezed into eight bytes — VRAM cut 8×, no going back."
1998 年的 GPU 显存极其稀缺——NVIDIA Riva TNT 旗舰 16 MB,普通卡 8 MB,而一张 256×256 的 RGBA 纹理就要 256 KB。一个游戏关卡要几十张纹理,显存装不下,带宽更扛不住(显存带宽要支撑帧缓冲、Z-buffer、纹理 sample 三路并发)。S3 Graphics 提出 S3TC(S3 Texture Compression):把 4×4 = 16 个像素打包成 8 字节,体积压到 1/8(原 64 字节),GPU 纹理单元在 sample 时硬件解块——不需要 CPU 全图解压上传,显存里存的就是块数据。一夜之间,同样显存能装 8 倍的纹理,带宽吃掉 1/8。这是 GPU 块压缩的开山之作,定义了往后 25 年所有 BCn / ETC / ASTC 的基础范式:固定大小块 + 端点 + 内插 + 索引。
In 1998, GPU VRAM was scarce — NVIDIA's flagship Riva TNT had 16 MB, mid-range cards 8 MB. A single 256×256 RGBA texture cost 256 KB. A game level needed dozens; the VRAM couldn't hold them and the bus couldn't feed them (memory bandwidth had to serve framebuffer, Z-buffer and texture sampling at the same time). S3 Graphics proposed S3TC (S3 Texture Compression): pack 4×4 = 16 pixels into 8 bytes, an 8× shrink from the original 64 bytes; the texture unit decodes a block on the fly during sampling, so VRAM stores the compressed blocks directly without any CPU-side full-image decompression. Overnight, the same VRAM could hold 8× as many textures and the bus had to move 1⁄8 the bytes. This is the founding act of GPU block compression and it set the template every later BCn / ETC / ASTC variant follows: fixed-size block + endpoints + interpolation + per-pixel index.
技术内核
Technical core
BC1 的"4×4 块 + 端点 + 内插 + 索引"四件套是它的全部技术内核,也是后面所有 BCn / ETC / ASTC 都在改进的同一个范式。① 固定大小块——4×4,绝不可变。这是为了让 GPU 纹理单元能直接通过坐标计算定位到块,不需要扫表;sample 一个像素只需要"算块号 → 加载 8 字节 → 解端点 → 查 index → 输出颜色"四步,完全硬件实现。② 端点 + 内插——只存两个端点 c0/c1(RGB565,各 16-bit),内插出 c2/c3 让块能表达 4 种颜色。这是个赌博:它假设一个 4×4 块内的颜色变化是"沿着色空间一条直线"的,适用于大多数自然纹理(草地、石头、皮肤)但对锯齿状颜色边缘会糊。③ 2-bit/像素 index——每像素只需要 2 bit 选 4 选 1,16 像素共 32 bit = 4 byte,跟 endpoints 的 4 byte 加一起正好 8 byte 一块。④ 1-bit alpha 隐藏档——如果 c0 ≤ c1(数值上),BC1 进入"alpha 模式":c3 变成"完全透明",c2 = (c0 + c1)/2 只有一种内插;每像素的 index = 3 表示透明。这就是 BC1 的"穷人 alpha"——只有透/不透,但不占额外字节。需要平滑 alpha 必须升级 BC2 / BC3。
BC1's "4×4 block + endpoints + interpolation + index" combo is the entire technical core — every later BCn / ETC / ASTC just iterates on this same template. ① Fixed-size blocks — 4×4, immutable. This lets the GPU's texture unit address a block directly via coordinate arithmetic, no lookup needed; sampling one pixel reduces to "compute block id → load 8 bytes → decode endpoints → read index → emit colour", four steps, all hardware. ② Endpoints + interpolation — only two endpoints c0/c1 (RGB565, 16 bits each) are stored; c2/c3 are interpolated so the block expresses four colours. It's a bet: BC1 assumes the colour variation in any 4×4 block lies along a straight line in colour space. True enough for most natural textures (grass, stone, skin), but jagged colour edges blur. ③ 2 bits per pixel of index — each pixel just needs 2 bits to choose one of four colours; 16 pixels × 2 bits = 32 bits = 4 bytes, which combined with the 4 bytes of endpoints lands exactly at 8 bytes per block. ④ 1-bit alpha hidden mode — if c0 ≤ c1 numerically, BC1 enters "alpha mode": c3 becomes fully transparent, c2 = (c0 + c1)/2 is the only interpolated colour, and an index of 3 means transparent. That's BC1's "poor man's alpha" — opaque/transparent only, no extra bytes. For smooth alpha you have to step up to BC2 / BC3.
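四件套落到代码里就是十几行——下面是一个 BC1 单块解码草图(整数内插的取整方式与具体硬件可能有细微差异)。
The four-piece kit is about a dozen lines of code: a sketch of decoding one BC1 block (integer-interpolation rounding may differ slightly from real hardware).

```python
import struct

def _rgb565(v):
    # Expand a packed RGB565 endpoint to 8-bit channels via bit replication.
    r = (v >> 11) & 0x1F; g = (v >> 5) & 0x3F; b = v & 0x1F
    return ((r << 3) | (r >> 2), (g << 2) | (g >> 4), (b << 3) | (b >> 2))

def decode_bc1_block(block: bytes):
    """Decode one 8-byte BC1 block into 16 RGBA pixels (row-major)."""
    c0, c1, bits = struct.unpack("<HHI", block)
    e0, e1 = _rgb565(c0), _rgb565(c1)
    if c0 > c1:                      # 4-colour mode: two interpolated colours
        palette = [e0 + (255,), e1 + (255,),
                   tuple((2 * a + b) // 3 for a, b in zip(e0, e1)) + (255,),
                   tuple((a + 2 * b) // 3 for a, b in zip(e0, e1)) + (255,)]
    else:                            # c0 <= c1: 1-bit-alpha mode, c3 transparent
        mid = tuple((a + b) // 2 for a, b in zip(e0, e1))
        palette = [e0 + (255,), e1 + (255,), mid + (255,), (0, 0, 0, 0)]
    # 32 bits of indices, 2 per pixel, pixel 0 in the lowest bits.
    return [palette[(bits >> (2 * i)) & 0b11] for i in range(16)]
```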
适用
USE FOR
- 不带 alpha 或仅需 1-bit alpha 的 RGB 纹理
- 老游戏 / 移动端低端设备(对带宽极度敏感)
- 显存预算极紧的 lightmap / 大尺寸地形纹理
- BC7 不可用的旧 D3D9 / OpenGL ES 2.0 平台
- RGB textures with no alpha (or 1-bit alpha at most)
- Older games / low-end mobile devices (extreme bandwidth sensitivity)
- Lightmaps and large terrain textures with tight VRAM budgets
- Legacy D3D9 / OpenGL ES 2.0 platforms where BC7 isn't available
反适用
AVOID
- 需要平滑 alpha 渐变(粒子、烟雾、UI 圆角)——用 BC3 / BC7
- 颜色梯度细致的高质量纹理——块伪影明显
- 法线贴图——4-bpp 端点精度不够,用 BC5
- HDR——用 BC6H
- Smooth alpha gradients (particles, smoke, UI rounded corners) — use BC3 / BC7
- Fine colour gradients in high-quality textures — block artefacts show
- Normal maps — 4-bpp endpoint precision is too coarse, use BC5
- HDR — use BC6H
| scope | APIs | tools | CLI |
|---|---|---|---|
| BC1 / DXT1 / S3TC | ✓ D3D 全版本 · ✓ Vulkan · ✓ Metal · ✓ OpenGL 4.2+ (ARB) / OpenGL ES 3.0+ (extension) | ✓ NVIDIA Texture Tools · AMD Compressonator · texconv · Crunch (cross-platform) | nvtt_export -f bc1 -o out.dds in.png · toktx --t2 --bcmp out.ktx2 in.png |
BC2 / BC3 (DXT3 / DXT5) — alpha 处理两条路
BC2 / BC3 (DXT3 / DXT5) — two ways to handle alpha
"BC2 给你显式 4-bit alpha,BC3 让 alpha 也学 BC1 的内插。"
"BC2 gives explicit 4-bit alpha; BC3 lets alpha use the BC1 trick too."
BC1 的 1-bit alpha(透 / 不透)对游戏 UI 圆角、粒子边缘、烟雾、玻璃、毛发都不够——这些都需要平滑的 alpha 渐变(0 到 255 中间的值)。S3 / Microsoft 在 1998-1999 同时提出 DXT3 和 DXT5 两条路:DXT3(BC2)粗暴,每像素直接给 4-bit alpha,16 像素共 64 bit = 8 byte;再加 BC1 的 8 byte 颜色块,共 16 byte/块,8 bpp。DXT5(BC3)聪明,把 alpha 也当成"端点 + 内插"块——存 2 个 8-bit alpha 端点 + 6 个内插值(共 8 种 alpha) + 每像素 3-bit index;颜色块仍用 BC1 那套。两者体积一样(16 byte/块),但 BC3 在平滑 alpha 渐变(粒子、烟雾)上明显好,BC2 在锐利 alpha 边缘(UI 图标的 1-bit-like alpha)上略好——但实践中 BC3 几乎全胜。所以游戏圈 BC3 / DXT5 才是事实主流。
BC1's 1-bit alpha (opaque or transparent, nothing in between) wasn't enough for game UI rounded corners, particle edges, smoke, glass or hair — all of those need smooth alpha gradients (values between 0 and 255). S3 / Microsoft proposed DXT3 and DXT5 in 1998-1999, two roads. DXT3 (BC2) is brute force: store an explicit 4-bit alpha per pixel; 16 pixels × 4 bits = 64 bits = 8 bytes; plus the 8-byte BC1 colour block, total 16 bytes per block at 8 bpp. DXT5 (BC3) is clever: treat alpha as an "endpoints + interpolation" block too — 2 × 8-bit alpha endpoints + 6 interpolated values (8 alpha levels in total) + a 3-bit index per pixel; the colour block still uses BC1. Both occupy the same 16 bytes per block, but BC3 clearly wins on smooth alpha gradients (particles, smoke); BC2 has a slight edge on razor-sharp alpha edges (UI icons that are basically 1-bit alpha). In practice BC3 wins almost everywhere — so the games industry treats BC3 / DXT5 as the de-facto default.
技术内核
Technical core
两个格式的核心差异全在 alpha 块。① 都是 16 byte/块,8 bpp——BC1 的颜色块 8 byte 不变,各加 8 byte 的 alpha 块。颜色端点和 BC1 一样:c0/c1 RGB565 + 内插 c2/c3 + 2-bit index——没区别。② BC2 的 alpha 块 = 16 个 4-bit 直接值——每像素 0-15 表示 alpha 量化到 16 阶。优点:对锐利 alpha 边界(UI 图标、纹理掩码)无量化误差;缺点:平滑渐变只有 16 阶,会出 banding。BC2 在 1999 年被一些早期 UI 系统用过,后来逐渐让位给 BC3。③ BC3 的 alpha 块 = BC1 alpha 化——存 2 个 8-bit alpha 端点 a0/a1(各 1 byte = 2 byte),如果 a0 > a1 用 6 个 1/7 步长内插值(共 8 阶),如果 a0 ≤ a1 用 4 个内插值 + 2 个保留(0 和 255 的硬端点)= 8 阶里有 2 个固定;每像素 3-bit index(16 px × 3 bit = 48 bit = 6 byte)。共 2+6 = 8 byte。BC3 在平滑 alpha(粒子、烟雾、毛发)上明显优于 BC2,代价是锐利 alpha 边缘会有轻微模糊。④ 命名混乱:游戏圈一般叫 DXT3 / DXT5(D3D 老命名),Khronos / Vulkan / Metal 一般叫 BC2 / BC3——同一个东西两套名字,是 OpenGL 和 D3D 命名分歧的活化石。
The whole difference between the two lives in the alpha block. ① Both are 16 bytes per block, 8 bpp — the BC1 colour block (8 bytes) is unchanged; each format adds an 8-byte alpha block. Colour endpoints, c2/c3 interpolation and 2-bit indices are identical to BC1 — no surprises there. ② BC2's alpha block = 16 explicit 4-bit values — each pixel quantises alpha to one of 16 levels. Pro: zero quantisation error on sharp alpha edges (UI icons, masks). Con: only 16 levels, so smooth gradients band. BC2 saw use in some early-2000s UI systems and then quietly handed the baton to BC3. ③ BC3's alpha block = BC1, applied to alpha — store 2 × 8-bit alpha endpoints a0/a1 (1 byte each = 2 bytes); if a0 > a1, interpolate 6 values at 1/7 steps (8 levels total); if a0 ≤ a1, interpolate 4 values + reserve two slots for hard 0 and 255 (2 of the 8 levels are fixed); 3-bit index per pixel (16 × 3 = 48 bits = 6 bytes). Total 2 + 6 = 8 bytes. BC3 clearly beats BC2 on smooth alpha (particles, smoke, hair), at the cost of slightly fuzzier sharp alpha edges. ④ Naming chaos: the games industry says DXT3 / DXT5 (D3D legacy); Khronos / Vulkan / Metal say BC2 / BC3 — same thing, two name systems, a living fossil of the OpenGL-vs-D3D naming split.
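BC3 alpha 块的两条内插规则可以写成一个小函数——下面的草图按上文描述生成 8 档 alpha 调色板(取整方式为示意,硬件实现的舍入可能略有不同)。
BC3's two alpha-interpolation rules fit in one small function: this sketch builds the 8-level alpha palette exactly as described above (rounding is illustrative; hardware may round slightly differently).

```python
def bc3_alpha_palette(a0: int, a1: int):
    """Build the 8-entry alpha palette of a BC3 alpha block."""
    if a0 > a1:
        # Six interpolated values at 1/7 steps between the endpoints.
        return [a0, a1] + [((7 - i) * a0 + i * a1) // 7 for i in range(1, 7)]
    # Four interpolated values at 1/5 steps, plus hard 0 and 255.
    return [a0, a1] + [((5 - i) * a0 + i * a1) // 5 for i in range(1, 5)] + [0, 255]
```

Each pixel's 3-bit index then picks one of these eight values, which is why smooth gradients survive BC3 so much better than BC2's 16 fixed steps.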
适用
USE FOR
- BC2 → 锐利 alpha 边缘的 UI 图标、纹理掩码
- BC3 → 平滑 alpha 渐变的粒子、烟雾、毛发、玻璃、UI 圆角
- 需要 alpha 但 BC7 不可用的旧平台(D3D9 / GL ES 2.0)
- BC2 → sharp alpha edges (UI icons, texture masks)
- BC3 → smooth alpha gradients (particles, smoke, hair, glass, UI rounded corners)
- Anything that needs alpha on legacy platforms where BC7 isn't available (D3D9 / GL ES 2.0)
反适用
AVOID
- 现代项目(2015+)——BC7 在质量上完全替代,体积一样
- 法线贴图——用 BC5
- HDR——用 BC6H
- Modern projects (2015+) — BC7 fully replaces both at the same size with better quality
- Normal maps — use BC5
- HDR — use BC6H
| scope | APIs | tools | CLI |
|---|---|---|---|
| BC2 / BC3 | ✓ D3D 全版本 · ✓ Vulkan · ✓ Metal · ✓ OpenGL 4.2+ | ✓ NVIDIA Texture Tools · AMD Compressonator · texconv · Crunch | nvtt_export -f bc3 -o out.dds in.png · texconv -f BC3_UNORM in.png |
BC4 / BC5 — 单/双通道,法线贴图省一通道
BC4 / BC5 — single / dual channel, dropping a channel from normal maps
"法线贴图省一通道,显存再砍一半。"
"Drop a channel from normal maps; halve the VRAM again."
游戏图形里,法线贴图是仅次于 albedo 的第二大显存消耗——每个像素一个法线向量(X, Y, Z)。直觉上要 RGB 三通道,但法线是单位向量(长度 = 1),所以 Z 可以由 X / Y 推导出来:Z = sqrt(1 - X² - Y²)。这意味着实际只需要存 X / Y 两个通道,Z 在 fragment shader 里现算。BC5 就是为这个场景设计的——只存 R / G 两通道,每个通道用 BC3 的 alpha 块法(端点 + 内插 + 3-bit index),共 16 byte/块、8 bpp。BC4 是 BC5 的"半个版本",只存一个通道,用于灰度纹理:高度图、roughness 图、AO 遮罩、metallic 通道。BC4 / BC5 的本质是"把 BC3 的 alpha 块单独拎出来当颜色通道用"——这种"通道拆分 + 几何内插"的思路让法线贴图维持与 BC3 RGB 相同的 8 bpp,但质量提升 3-5×(因为不浪费 bits 在不需要的通道上)。
In game graphics, normal maps are the second-largest VRAM hog after albedo — every pixel stores a normal vector (X, Y, Z). Intuitively that means three RGB channels, but a normal is a unit vector (length 1), so Z can be derived: Z = sqrt(1 − X² − Y²). You really only need to store X / Y; the fragment shader recomputes Z. BC5 is built for exactly that — store just R / G, each compressed with the BC3 alpha-block trick (endpoints + interpolation + 3-bit index), 16 bytes per block at 8 bpp. BC4 is the "half-version" of BC5: just one channel, for greyscale textures — height maps, roughness maps, AO masks, the metallic channel. BC4 / BC5 are essentially "BC3's alpha block lifted out and used as a colour channel". This "channel split + geometric interpolation" trick keeps normal maps at 8 bpp (same as BC3 RGB) but bumps quality 3-5× because no bits are wasted on a channel you don't need.
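shader 里重建 Z 的那一步,写出来就是几行——注意要 clamp,因为块压缩误差可能让 x² + y² 略超 1。
The shader-side Z reconstruction is a few lines in any language; note the clamp, because block-compression error can push x² + y² slightly past 1.

```python
import math

def reconstruct_normal(x_unorm: int, y_unorm: int):
    """Recover a unit normal from the two stored BC5 channels.

    Mirrors the shader math above: map 0..255 back to [-1, 1], then
    z = sqrt(max(0, 1 - x^2 - y^2)), clamped against compression error.
    """
    x = x_unorm / 255.0 * 2.0 - 1.0
    y = y_unorm / 255.0 * 2.0 - 1.0
    z = math.sqrt(max(0.0, 1.0 - x * x - y * y))
    return (x, y, z)
```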
技术内核
Technical core
BC4 / BC5 的设计思路简洁到一句话:把 BC3 的 alpha 块当成"通用的单通道压缩块"用。① BC4 = BC3 的 alpha 块独立——4×4 块,8 byte;存 2 个 8-bit 端点 r0 / r1(2 byte)+ 6 个内插值(隐含,不占字节,运行时算)+ 每像素 3-bit index(16 × 3 = 48 bit = 6 byte);共 8 byte / 16 像素 = 4 bpp。每像素只有一个 8-bit 通道(原数据的 R)。② BC5 = 两个 BC4 块叠加——一个 BC4 块存 R(法线 X),一个 BC4 块存 G(法线 Y);共 16 byte / 块 = 8 bpp。Z 不存,fragment shader 里算 z = sqrt(1 - x*x - y*y)——单 sqrt + 2 mul + 1 sub,在 GPU 上只是几条廉价的 ALU 指令。③ BC4 的"unsigned"和"signed"两种模式:BC4_UNORM(0-255)和 BC4_SNORM(-128 到 127),后者专门给法线分量这种"中心对称"信号用,避免 0.5 偏置。BC5 同理。④ 命名又分裂:Khronos 叫 BC4 / BC5,DDS / D3D 的 FourCC 老命名叫 ATI1 / ATI2(由 ATI 提出,后并入 AMD),OpenGL ARB 扩展叫 RGTC1 / RGTC2(Red-Green Texture Compression)——三套名,一个东西。游戏引擎源码里三种叫法都能见到。
BC4 / BC5's design boils down to one sentence: take BC3's alpha block and reuse it as a generic single-channel compression block. ① BC4 = BC3's alpha block, standalone — 4×4 block, 8 bytes; 2 × 8-bit endpoints r0 / r1 (2 bytes) + 6 implicit interpolated values (computed at runtime, no bytes spent) + 3-bit per-pixel index (16 × 3 = 48 bits = 6 bytes); total 8 bytes / 16 pixels = 4 bpp. Each pixel carries one 8-bit channel (the input's R). ② BC5 = two BC4 blocks stacked — one BC4 block for R (normal X), one for G (normal Y); 16 bytes per block = 8 bpp. Z isn't stored — the fragment shader computes z = sqrt(1 − x*x − y*y), one sqrt + two muls + one sub, a few cheap ALU instructions on any GPU. ③ BC4 has UNORM and SNORM modes — BC4_UNORM (0-255) and BC4_SNORM (−128 to 127); the signed variant is specifically for centre-symmetric signals like normal components, avoiding a 0.5 bias. BC5 mirrors this. ④ Naming forks again: Khronos says BC4 / BC5; the legacy DDS / D3D FourCC names are ATI1 / ATI2 (coined by ATI, later part of AMD); the OpenGL ARB extension calls them RGTC1 / RGTC2 (Red-Green Texture Compression). Three names, one thing — and you'll see all three in any sufficiently old engine source tree.
适用
USE FOR
- BC5 → 法线贴图(行业标准,Unreal / Unity / id Tech 默认)
- BC4 → roughness / metallic / AO / 高度图 等单通道
- SDF(Signed Distance Field)字体纹理(BC4)
- 需要 R / G 双通道但不需要 B 的任何场景
- BC5 → normal maps (industry standard — Unreal / Unity / id Tech default)
- BC4 → single-channel data: roughness / metallic / AO / height maps
- SDF (Signed Distance Field) font textures (BC4)
- Anything that needs R / G but not B
反适用
AVOID
- 需要 RGB 三通道的彩色纹理(用 BC1 / BC7)
- HDR(用 BC6H)
- 3-channel colour textures (use BC1 / BC7)
- HDR (use BC6H)
| scope | APIs | tools | CLI |
|---|---|---|---|
| BC4 / BC5 | ✓ D3D10+ · ✓ Vulkan · ✓ Metal · ✓ OpenGL 4.0+ (RGTC) | ✓ NVIDIA Texture Tools · AMD Compressonator · texconv · Unreal / Unity 自动用 | nvtt_export -f bc5 -o normal.dds normal.png · texconv -f BC5_UNORM normal.png |
BC6H — HDR 块压缩
BC6H — HDR block compression
"显存里的 HDR — 反射探针、cubemap 全靠它。"
"HDR in VRAM — reflection probes and cubemaps depend on it."
PBR(基于物理的渲染)需要 HDR 环境贴图——天空、室内 IBL 反射探针、自发光场景全是。问题是 BC1-5 都基于 8-bit/通道 端点 + 内插,根本无法表达 float16 的 [-65504, 65504] 范围。如果用未压缩 RGBA16F,一张 1024×1024 的 cubemap(6 面)要 1024×1024×6×8 = 48 MB。一个室外场景几张 cubemap 几百 MB 就没了。BC6H 是 D3D11 时代专门为 HDR 设计的块压缩:4×4 块、16 byte/块、8 bpp(跟 BC7 同尺寸),但 payload 直接是 float16 RGB(无 alpha)。它用 14 种块模式来权衡精度——根据这块的颜色分布选最合适的模式。BC6H 让 HDR cubemap 体积从 RGBA16F 的 64 bpp 砍到 8 bpp(8× 压缩),同时保持 float16 的动态范围——这是 PBR 渲染管线得以普及的硬件基础。Unreal Engine 4 / 5、Unity HDRP 默认对 cubemap 的 HDR 资产用 BC6H。
PBR (physically based rendering) needs HDR environment maps — skies, indoor IBL reflection probes, emissive scenes all live in HDR. The trouble is that BC1-5 all rely on 8-bit-per-channel endpoints + interpolation, so they simply cannot express float16's [−65504, 65504] range. Uncompressed RGBA16F would cost 1024 × 1024 × 6 × 8 = 48 MB for a single 1024² cubemap (six faces); an outdoor scene with a handful of cubemaps blows past hundreds of MB. BC6H is the D3D11-era block format built specifically for HDR: 4×4 block, 16 bytes per block, 8 bpp (same size as BC7), but the payload is float16 RGB (no alpha). Its trick is 14 block modes that trade off precision differently — the encoder picks the mode best suited to that block's colour distribution. BC6H takes HDR cubemaps from RGBA16F's 64 bpp down to 8 bpp (8× compression) while keeping float16's dynamic range. That's the hardware foundation that lets PBR pipelines exist at scale today. Unreal Engine 4 / 5 and Unity HDRP default to BC6H for HDR cubemap assets.
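上面的体积账可以直接算一遍——下面的小函数(假想的辅助函数,非任何引擎 API)对比 RGBA16F 与 BC6H 的 cubemap 字节数。
The byte math above checks out directly: this small helper (a hypothetical function, not any engine's API) compares RGBA16F vs BC6H cubemap sizes.

```python
def cubemap_bytes(size: int, bpp: float, mips: bool = False) -> int:
    """Bytes for a 6-face cubemap at `bpp` bits per pixel.

    Hypothetical helper for illustration. With `mips=True` a full mip chain
    adds roughly one third on top of the base level.
    """
    base = size * size * 6 * bpp / 8
    total = base * (4 / 3) if mips else base
    return int(total)

rgba16f = cubemap_bytes(1024, 64)   # uncompressed half-float RGBA: 64 bpp
bc6h    = cubemap_bytes(1024, 8)    # BC6H: 8 bpp, same dynamic range
```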
技术内核
Technical core
BC6H 跟 BC1-5 不是同一类设计——它没有"统一 4 端点 + 内插"的简洁结构,而是 14 种块模式让编码器按块的颜色分布挑最优解。① 14 种块模式——每种模式给端点不同 bit 数(如 10-10-10、7-6-6-6、11-5-4-4 等三/四个分量)、是否启用 2 分区(把 4×4 块拆成两组,每组独立端点 + 内插,适用于块内有明显颜色边界的情况)、index 用 3-bit 还是 4-bit。编码器对每个块尝试多种模式,挑 PSNR 最高那个塞进 16 byte。② 端点用 float16 表示——这是 BC6H 区别于所有其他 BC 的核心。BC1-5 的端点是定点整数(RGB565 或 8-bit),只能表示 0-1;BC6H 的端点是浮点,可以表示 [-65504, 65504]——HDR 高光、太阳直射、自发光物体的真实数值都能装进去。③ UF16 (unsigned) vs SF16 (signed)——UF16 范围 [0, 65504],适合不会有负值的 HDR 颜色;SF16 范围 [-65504, 65504],适合可能有负值的 HDR 法线或其他工程数据。④ 4×4 块仍只 16 byte——这是工程上最重要的一点:BC6H 跟 BC7 一样是 8 bpp,HDR 的体积成本只比 LDR 多 1×(BC1 是 4 bpp,BC7 / BC6H 都是 8 bpp)。这个"HDR 不贵"的承诺让 IBL 反射探针 / cubemap 的大规模使用成为可能——Unreal Engine 默认每个室外场景烘焙几十张 BC6H cubemap。
BC6H isn't built like BC1-5 — there's no clean "two endpoints + interpolation" template. Instead, 14 block modes let the encoder pick the best fit for that block's colour distribution. ① 14 block modes — each mode allocates different bit counts to the endpoints (e.g. 10-10-10, 7-6-6-6, 11-5-4-4, three or four components), optionally enables 2-partition mode (split the 4×4 block into two regions, each with its own endpoints + interpolation, which helps when a block has a sharp colour boundary), and uses 3- or 4-bit indices. The encoder tries multiple modes per block and packs whichever maximises PSNR into the 16-byte block. ② Endpoints expressed as float16 — this is the one thing that sets BC6H apart from every other BCn. BC1-5 endpoints are fixed-point integers (RGB565 or 8-bit) capped at 0-1; BC6H endpoints are floating point and can express [−65504, 65504] — the actual numerical range of HDR highlights, direct sun, emissive surfaces. ③ UF16 (unsigned) vs SF16 (signed) — UF16's range is [0, 65504], suitable for non-negative HDR colour; SF16's is [−65504, 65504], suitable for HDR normals or other engineering data that may go negative. ④ 4×4 block, still just 16 bytes — and this is the most important engineering fact: BC6H is 8 bpp, the same as BC7. HDR costs only 1× more bytes than LDR (BC1 is 4 bpp, BC7 / BC6H are 8 bpp). That "HDR isn't expensive" promise is what makes large-scale IBL reflection probes and HDR cubemaps practical — Unreal Engine routinely bakes dozens of BC6H cubemaps per outdoor scene.
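65504 这个数不是随便写的——它是 IEEE 754 binary16 的最大有限值,用 Python 的 struct 模块就能验证。
65504 is not an arbitrary number: it is the largest finite IEEE 754 binary16 value, verifiable with Python's struct module.

```python
import struct

# 65504 = (2 - 2**-10) * 2**15 is the largest finite binary16 value,
# the ceiling quoted for BC6H's UF16 / SF16 endpoint ranges.
FLOAT16_MAX = (2 - 2**-10) * 2**15

# struct's "e" format packs IEEE 754 binary16; a round-trip at the limit survives.
packed = struct.pack("<e", FLOAT16_MAX)
assert struct.unpack("<e", packed)[0] == 65504.0

# Anything past the limit cannot be represented in half precision.
try:
    struct.pack("<e", 131072.0)      # 2**17: out of binary16 range
    overflowed = False
except (OverflowError, struct.error):
    overflowed = True
```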
适用
USE FOR
- HDR cubemap(天空盒、IBL 反射探针)
- 烘焙的 lightmap HDR 部分
- HDR 自发光纹理(霓虹灯、屏幕、火焰)
- volumetric 体积纹理(雾 / 云,需要 HDR 强度)
- HDR cubemaps (skyboxes, IBL reflection probes)
- HDR portions of baked lightmaps
- HDR emissive textures (neon, screens, flames)
- Volumetric textures (fog / clouds — need HDR intensity)
反适用
AVOID
- LDR 纹理(用 BC7,质量更好且支持 alpha)
- 需要 alpha 的 HDR(BC6H 不支持 alpha)
- D3D10 及以下平台(BC6H 是 D3D11+)
- 移动 GPU 早期型号(看 BPTC / ASTC HDR 支持情况)
- LDR textures (use BC7 — better quality and supports alpha)
- HDR that needs alpha (BC6H has no alpha channel)
- D3D10 and earlier (BC6H requires D3D11+)
- Older mobile GPUs (check BPTC / ASTC HDR support)
| scope | APIs | tools | CLI |
|---|---|---|---|
| BC6H | ✓ D3D11+ · ✓ Vulkan · ✓ Metal (macOS / iOS Apple Silicon) · ✓ OpenGL 4.2+ (BPTC) | ✓ NVIDIA Texture Tools · AMD Compressonator · texconv · ISPC bc6h_enc | nvtt_export -f bc6h -o sky.dds sky.exr · texconv -f BC6H_UF16 sky.exr |
BC7 — 现代 BCn 的集大成
BC7 — the synthesis of modern BCn
"一种格式,八种块模式,自动挑最合适那种。"
"One format, eight block modes — pick whichever fits best."
BC1-5 各自只擅长一种场景:BC1 是 RGB 无 alpha、BC2 是 RGB + 锐利 alpha、BC3 是 RGB + 平滑 alpha、BC4 是单通道、BC5 是双通道。游戏纹理混合场景多——一张角色贴图可能同时有平滑 RGB 渐变 + 锐利 alpha 边缘 + 高频金属反光,任何单一 BCn 都解释不了整张。美术希望"一种格式覆盖所有"——不用每张图手动挑 BCn。BC7 的解法是 8 种内部块模式 + 编码器为每个 4×4 块自动挑最合适那种:同样 8 bpp(跟 BC2 / BC3 一样),但同图视觉质量比它们好 5-10×,几乎追上未压缩。BC7 因此成为 D3D11 时代之后桌面游戏纹理的事实唯一选择——AAA 游戏 90% 桌面贴图都用 BC7。
BC1-5 each excel at exactly one scenario: BC1 is RGB without alpha, BC2 is RGB + sharp alpha, BC3 is RGB + smooth alpha, BC4 is single-channel, BC5 is dual-channel. Real game textures mix scenarios — a single character map can carry smooth RGB gradients, sharp alpha edges and high-frequency metallic specular all at once, and no single BCn explains the whole thing. Artists want "one format that covers everything" without per-texture format picking. BC7's answer: 8 internal block modes plus an encoder that picks the best mode per 4×4 block. At the same 8 bpp as BC2 / BC3, BC7 looks 5-10× better visually — close to uncompressed. That's why, post-D3D11, BC7 became the de-facto only choice for desktop game textures: 90 % of AAA desktop textures are BC7.
技术内核
Technical core
BC7 的设计哲学跟 BC1-5 完全相反——BC1-5 是"一种结构覆盖一类场景",BC7 是"八种结构都做出来,让编码器临时挑"。① 8 种 mode (mode 0-7):每种 mode 内部不同的 (a) 区块切分(1 / 2 / 3 个子区,subsets——把 4×4 块拆成多组,每组独立端点 + 内插,适用于块内有明显颜色边界);(b) endpoint bit 分配(如 mode 1 给端点 6·6·6 高精度,mode 2 给 5·5·5 留更多 bit 给 index);(c) index bit width(2 或 3 或 4 bit,索引位越多越能精细内插);(d) 可选 p-bit(端点末位补一位精度)与 rotation(把 alpha 跟某个颜色通道交换,提升 alpha 精度)。② mode 0-3 偏 RGB 高质量,mode 4-7 偏 RGBA——RGB 模式给颜色更多 bit 但不要 alpha;RGBA 模式拨一些 bit 给 alpha 通道。这种"分工"让 BC7 既能当 BC1 的 RGB 升级,又能当 BC3 的 RGBA 升级,完全覆盖。③ 编码器枚举所有 mode 选最优——每个 4×4 块要对 8 mode × 几十种分区组合 × 端点优化跑一遍,计算 SSE(平方误差和),选 SSE 最低那个塞进 16 byte。这就是 BC7 编码慢的根本原因——典型 8K 纹理用 naive brute-force 要 40 分钟,Intel ISPC SIMD 后降到几秒。④ 8 bpp(同 BC2 / BC3,但视觉质量好 5-10×)——BC1 / BC4 是 4 bpp,BC7 / BC2 / BC3 / BC5 / BC6H 都是 8 bpp。BC7 跟 BC2 / BC3 同 bpp,胜在 mode 选择灵活,典型纹理 PSNR 高 +8-12 dB。⑤ 解码硬件原生——D3D11+ / GL 4.2+ / Vulkan / Metal 全平台支持,GPU sample 一个 BC7 texel 跟 sample 一个 RGBA8 一样快。这是 BC7 比"软件解码 + 上传"格式(如 KTX 装 zlib)的根本优势。
BC7's design philosophy inverts BC1-5: BC1-5 use one structure per scenario, BC7 ships eight structures and lets the encoder pick at runtime. ① 8 modes (mode 0-7), each varying along (a) partitioning (1, 2 or 3 subsets — splitting the 4×4 block into independent regions, useful when there's a sharp colour boundary inside the block); (b) endpoint bit allocation (mode 1 gives endpoints 6·6·6 high precision; mode 2 gives 5·5·5 and donates the saved bits to the index); (c) index bit width (2, 3 or 4 bits — more bits means finer interpolation); (d) optional p-bits (one extra LSB on the endpoints) and rotation (swap alpha with one of the colour channels to boost alpha precision when warranted). ② Mode 0-3 lean toward high-quality RGB; mode 4-7 lean toward RGBA — RGB modes give colour more bits with no alpha, RGBA modes shave bits off colour to fund alpha. That division of labour is what lets BC7 simultaneously upgrade BC1 (RGB) and BC3 (RGBA). ③ The encoder enumerates all modes and picks the optimum — for every 4×4 block it tries 8 modes × tens of partition combinations × endpoint optimisations, scores them by SSE (sum of squared error), and writes the best one into 16 bytes. This is the core reason BC7 encoding is slow: a typical 8K texture needs ~40 minutes with naive brute-force, dropping to seconds with Intel's ISPC SIMD encoder. ④ 8 bpp (the same as BC2 / BC3) with 5-10× better visual quality — BC1 / BC4 are 4 bpp; BC7 / BC2 / BC3 / BC5 / BC6H are all 8 bpp. At equal bpp BC7's mode-selection flexibility wins +8-12 dB PSNR over BC2 / BC3 on typical textures. ⑤ Hardware-native decoding — D3D11+ / GL 4.2+ / Vulkan / Metal all decode BC7 in silicon; sampling a BC7 texel costs the same as sampling RGBA8. That hardware-native sampling is BC7's fundamental advantage over "software-decode + upload" formats like KTX-with-zlib payloads.
图 20 · BC7 完整编码流程:输入一个 4×4 RGBA 块,编码器并行尝试 8 种 mode(每种 mode 内部还要枚举分区方案 / 端点优化),为每种 mode 算出 SSE(squared error sum),取最低那个,把"哪种 mode + 端点 + index"打包成 16 byte block。整张纹理重复几十万次——这就是 BC7 编码慢的根源,也是 ISPC / CUDA 加速器存在的理由。
Fig 20 · BC7's full encode pipeline: take a 4×4 RGBA block, run trial encodes through all 8 modes (each one in turn enumerates partition layouts and endpoint optimisations), score them by SSE (sum of squared error), pick the lowest, and pack "chosen mode + endpoints + indices" into a 16-byte block. A whole texture repeats this hundreds of thousands of times — exactly why BC7 encoding is slow, and exactly why ISPC / CUDA-accelerated encoders exist.
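这套"枚举候选 → 算 SSE → 取最小"的循环可以用一个玩具版示意——下面不是真 BC7 编码器,只演示同样的搜索骨架(把块内每对像素当候选端点)。
The enumerate-candidates / score-by-SSE / keep-the-argmin loop can be shown with a toy version: this is not a real BC7 encoder, just the same search skeleton (every pixel pair in the block tried as candidate endpoints).

```python
def sse(a, b):
    """Sum of squared error between two equal-length lists of RGB tuples."""
    return sum((x - y) ** 2 for pa, pb in zip(a, b) for x, y in zip(pa, pb))

def best_two_colour_fit(block):
    """Toy 'mode search': try every pixel pair as endpoints, snap each pixel
    to the nearer endpoint, keep the candidate with the lowest SSE.

    A real BC7 encoder runs this same try / score / argmin loop, but over
    8 modes, partition layouts and optimised (not sampled) endpoints.
    """
    best = None
    for e0 in block:
        for e1 in block:
            recon = [min((e0, e1), key=lambda e: sse([p], [e])) for p in block]
            err = sse(block, recon)
            if best is None or err < best[0]:
                best = (err, e0, e1, recon)
    return best
```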
| format | bpp | RGBA | quality | encode time |
|---|---|---|---|---|
| BC1 | 4 | RGB + 1-bit α | low | 1× (baseline) |
| BC3 | 8 | RGBA | medium | 1× |
| BC7 | 8 | RGBA | high | ~50-200× of BC1 |
| ASTC 4×4 | 8 | RGBA | high+ | similar to BC7 |
| ASTC 6×6 | 3.56 | RGBA | medium+ | similar to BC7 |
$ nvtt_export --bc7 in.png -o out.dds # NVIDIA Texture Tools, GPU-accelerated
$ ispc_texcomp -bc7 in.png out.dds # Intel SIMD encoder, ~10× faster than naive
$ toktx --encode bc7 out.ktx2 in.png # wrap into KTX2 (web / WebGPU friendly)
$ texconv -f BC7_UNORM in.png # Microsoft DirectXTex CLI
$ Compressonator.exe -fd BC7 in.png out.dds # AMD Compressonator
适用
USE FOR
- 桌面 AAA 游戏纹理(角色 / 场景 / UI / 道具,99% 默认)
- WebGPU 高质量纹理(KTX2 容器封装)
- 同时需要 RGB 高保真 + alpha 的混合贴图
- 升级现存 BC1 / BC3 资产以提升画质(同 / 双倍体积)
- 金属反光 / 高频细节贴图(mode 6 RGBA 单分区 + 4-bit index 表现极佳)
- Desktop AAA game textures (characters / environments / UI / props — 99 % default)
- High-quality WebGPU textures (wrapped in KTX2 containers)
- Mixed maps that need both fidelity-grade RGB and alpha
- Upgrading existing BC1 / BC3 assets for better quality (same or 2× the bytes)
- Metallic specular / high-frequency detail (mode 6 — RGBA single subset + 4-bit index — excels)
反适用
AVOID
- 移动端(用 ASTC,块尺寸更灵活、bpp 可调)
- HDR 纹理(用 BC6H,BC7 仍是 LDR 0-1)
- D3D10 及以下的老硬件(BC7 是 D3D11+)
- 实时编码场景(即便 SIMD 仍比 BC1 慢 5-10×,服务端实时压缩慎用)
- 单 / 双通道贴图(用 BC4 / BC5 更省空间)
- Mobile (use ASTC — flexible block sizes, tunable bpp)
- HDR textures (use BC6H — BC7 is still LDR 0-1)
- D3D10 or older hardware (BC7 requires D3D11+)
- Real-time encoding (even SIMD is 5-10× slower than BC1; server-side live compression is risky)
- Single / dual-channel maps (BC4 / BC5 are more space-efficient)
| scope | APIs | tools | CLI |
|---|---|---|---|
| BC7 | ✓ D3D11+ · ✓ Vulkan · ✓ Metal · ✓ OpenGL 4.2+ (BPTC) · ✓ WebGPU (texture-compression-bc) | ✓✓ NVIDIA Texture Tools (CUDA) · Intel ISPC ispc_texcomp · AMD Compressonator · Microsoft texconv · KTX-Software toktx | nvtt_export --bc7 · ispc_texcomp -bc7 · toktx --encode bc7 |
ETC1 — Android 早期标准
ETC1 — the early Android standard
"OpenGL ES 时代第一个免专利的块压缩。"
"The first patent-free block codec of the OpenGL ES era."
2005 年 Khronos 在为 OpenGL ES 标准化纹理压缩时遇到一个棘手问题——S3TC(BC1-3)效果好但被 S3 Graphics 申请了一堆专利,Khronos 不可能把"必须授权才能用"的格式塞进开放标准。Ericsson Research 提了 ETC1(Ericsson Texture Compression),声明免专利,正好填上空缺,跟着 OpenGL ES 2.0(2007)一起进入 Android 强制基线。Android 从此在游戏纹理上有了统一格式——美术不必为不同 GPU 厂商分别打包,Mali / Adreno / PowerVR / Tegra 全都能解 ETC1。代价是 ETC1 没有 alpha 通道,任何带透明度的资产(UI 图标、粒子、角色边缘)都要拆成"RGB 用 ETC1 + alpha 用 8-bit 灰度图"两份纹理上传——显存和带宽都要付双份钱。这是 ETC2 在 2013 年出生的根本原因。但回到 2005,免专利 + GLES 2.0 强制 = ETC1 一夜之间成了 Android 游戏纹理事实标准。Angry Birds(2009)、Cut the Rope(2010)这一代手机游戏的纹理资产几乎全是 ETC1。
In 2005, while Khronos was standardising texture compression for OpenGL ES, it ran into a thorny problem — S3TC (BC1-3) worked beautifully but was wrapped in patents owned by S3 Graphics, and an open standard couldn't mandate "must license to use" formats. Ericsson Research proposed ETC1 (Ericsson Texture Compression), declared it patent-free, and it slotted neatly into the gap, riding alongside OpenGL ES 2.0 (2007) into the Android mandatory baseline. Suddenly Android had a single texture format every artist could ship — no need to repackage per vendor, since Mali, Adreno, PowerVR and Tegra all decoded ETC1. The price was that ETC1 had no alpha channel, so anything translucent (UI icons, particles, character edges) had to be split into "RGB as ETC1 + alpha as an 8-bit greyscale map" — two texture uploads, double the VRAM and bandwidth. That is exactly why ETC2 was born in 2013. But back in 2005, patent-free + GLES 2.0 mandatory equals ETC1 becoming the de-facto Android texture standard overnight. Angry Birds (2009) and Cut the Rope (2010) — that generation of mobile games — shipped almost their entire texture base in ETC1.
技术内核
Technical core
ETC1 的设计是"把 BC1 的思路换一种几何切分,绕开专利"。① 4×4 块切两半——不像 BC1 把 4×4 当整体处理,ETC1 把块切成上下 2×4 或左右 4×2 两半(块头有 1 bit 标记 flip 方向),每半独立有自己的颜色 base + modifier。这是 ETC1 跟 BCn 最大的几何差异——BCn 块是统一的 16 像素插值,ETC1 是两组 8 像素插值。② RGB444 base + 8 行 modifier 表——每半的 base color 只有 12 bit(RGB444,individual 模式;块头另有 1 bit diff 标记,可切换成 RGB555 base + 3-bit delta 的 differential 模式),精度比 BC1 的 RGB565 还低;但靠 modifier 表补救——3 bit 选 8 行预设里的一行,每行给出 4 个亮度偏移值(如 ±2 / ±8 这种"小幅"行,或 ±47 / ±183 这种"大幅"行),覆盖从平滑渐变到硬边缘的不同需求。③ 2-bit/像素 index——每像素再用 2 bit 选 modifier 行里的 4 个偏移值之一,加到 base color 上得到最终颜色。换言之 ETC1 的颜色计算是"base ± modifier",只在亮度方向上调,色相不变——这意味着 ETC1 处理彩色高频细节(花布、彩色噪点)很差,但处理"单色平滑+亮度变化"(皮肤、墙面、地形)很好。④ 没有 alpha——这是 ETC1 最致命的局限。Android 游戏的解决方案是"双纹理上传":RGB 用 ETC1,alpha 用单通道 8-bit 灰度图(或 ETC1 的另一个块当 alpha 用,叫 ETC1+A 的 hack)。⑤ 每块 8 byte / 16 像素 = 4 bpp——跟 BC1 同体积。质量略差于 BC1(因为色相方向死板),但免专利 = 能强制进 GLES 标准,这是 BC1 做不到的。
ETC1's design is "use a different geometric split from BC1 to dodge the patents". ① 4×4 block split into two halves — unlike BC1, which treats the 4×4 as one unit, ETC1 splits the block into two 2×4 halves (or two 4×2 halves; the block header carries a single flip bit). Each half independently owns its colour base + modifier. That's the biggest geometric difference from BCn: BCn's block is one 16-pixel interpolation; ETC1's is two 8-pixel interpolations. ② RGB444 base + an 8-row modifier table — in individual mode each half's base colour is only 12 bits (RGB444), even less precise than BC1's RGB565 (differential mode gives the first half RGB555 and the second a 3-bit-per-channel delta); the modifier table makes up the difference. Three bits pick one of 8 preset rows, each row carrying four brightness offsets (a "fine" row like ±2 / ±8, a "coarse" row like ±47 / ±183), covering everything from smooth gradients to hard edges. ③ 2 bits per pixel for the index — each pixel picks one of the four offsets in the chosen modifier row and adds it to the base colour, producing the final value. ETC1's colour math is therefore "base ± modifier" — adjustment only along brightness, never along hue. That makes ETC1 poor on coloured high-frequency detail (patterned cloth, coloured noise) and excellent on monochrome-plus-brightness signals (skin, walls, terrain). ④ No alpha — ETC1's most fatal limitation. The Android workaround was the "two-texture upload": RGB as ETC1, alpha as a single-channel 8-bit greyscale map (or a second ETC1 block reused as alpha — the "ETC1+A" hack). ⑤ 8 bytes per 16-pixel block = 4 bpp, the same footprint as BC1. Quality lags BC1 slightly (because hue can't move) but ETC1 is patent-free, which lets it become a mandatory part of the GLES standard — something BC1 could never be.
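The base-plus-modifier arithmetic above can be sketched in a few lines of Python. This is a toy reconstruction, not a full bitstream decoder — in particular, the mapping from the raw 2-bit pixel index to a signed offset is simplified here (the real spec encodes sign and magnitude in separate bits):

```python
# Sketch of ETC1 sub-block colour reconstruction: each 8-pixel half stores
# a base colour, a 3-bit codeword choosing one of 8 preset modifier rows,
# and a 2-bit per-pixel index picking one of four offsets in that row.
# All three channels get the same offset, so only brightness moves.

# The 8 modifier rows from the ETC1 spec; each row (a, b) expands to the
# four offsets (-b, -a, +a, +b).
ETC1_MODIFIER_ROWS = [
    (2, 8), (5, 17), (9, 29), (13, 42),
    (18, 60), (24, 80), (33, 106), (47, 183),
]

def expand444(c4):
    """Expand a 4-bit channel to 8 bits by bit replication."""
    return (c4 << 4) | c4

def clamp(v):
    return max(0, min(255, v))

def decode_pixel(base444, codeword, offset_index):
    """base444: (r, g, b) each 0..15; codeword: 0..7; offset_index: 0..3
    mapping to (-b, -a, +a, +b) — a simplification of the spec's layout."""
    a, b = ETC1_MODIFIER_ROWS[codeword]
    offset = (-b, -a, a, b)[offset_index]
    return tuple(clamp(expand444(c) + offset) for c in base444)
```

For example, a mid-grey base (8, 8, 8) with the finest row and a small positive offset lands at 138 per channel, while the coarse ±183 row saturates against the clamp at either end — which is exactly why that row exists for hard edges.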
适用
USE FOR
- (历史) OpenGL ES 2.0 时代 Android 游戏纹理
- (历史) Android 4.x / 5.x 时代不带 alpha 的资产(地形、天空盒、道具背景)
- 极少数仍需要兼容 OpenGL ES 2.0 设备的旧游戏维护
- (historical) OpenGL ES 2.0-era Android game textures
- (historical) Android 4.x / 5.x assets without alpha (terrain, skyboxes, prop backgrounds)
- The rare modern case of maintaining a legacy game that still ships to GLES 2.0 devices
反适用
AVOID
| scope | APIs | tools | CLI |
|---|---|---|---|
| ETC1 | ✓ OpenGL ES 2.0+(强制) · ✓ OpenGL 4.3+(ARB_ES3_compatibility) · ~ Vulkan(扩展) · ✗ D3D / Metal | ✓ Khronos etc1tool · Mali Texture Compression Tool · ImageMagick · Unity / Unreal 早期内置 | etc1tool in.png --encode -o out.pkm · etc2comp -format ETC1 in.png -o out.ktx |
ETC2 / EAC — alpha 加成
ETC2 / EAC — adding alpha
"ETC1 加上 alpha 通道,正好赶上 OpenGL ES 3.0。"
"ETC1 with alpha — just in time for OpenGL ES 3.0."
ETC1 在 Android 上跑了 6 年(2007-2013),但"没有 alpha"这个缺陷越用越疼。任何带透明度的资产——UI 图标、HUD、粒子系统、抠图角色——都要拆成两份纹理上传:RGB 用 ETC1(4 bpp),alpha 用 8-bit 灰度图(8 bpp),合计 12 bpp,显存和带宽是单纹理的 3 倍。手机游戏的 UI 又特别多透明元素,这个负担实打实地让中低端 Android 设备跑不动。2012 年 Khronos 随 OpenGL ES 3.0 正式推出 ETC2 / EAC(2013 年随 Android 4.3 真正落地)——保持向下兼容(老的 ETC1 块在 ETC2 解码器里能直接用),同时加入 RGBA 模式(ETC2 RGB 块 + EAC alpha 块,共 16 byte = 8 bpp)。ETC2 还顺手补齐了 R11 / RG11 单/双通道格式(对应桌面的 BC4 / BC5,用于法线贴图、roughness 等),让移动端也有了完整的"通道拆分"工具箱。最重要的政治决定:Khronos 把 ETC2 定成 OpenGL ES 3.0 的强制基线——任何宣称支持 GLES 3.0 的 GPU 都必须解码 ETC2。这意味着 2014 年之后的 Android 游戏可以放心地"全资产 ETC2",不再需要为"老设备没 ETC2"留 fallback。Unity / Unreal 在 2014 年都把 Android 默认纹理改成了 ETC2。
ETC1 ran on Android for six years (2007-2013), but "no alpha" hurt more every year. Anything translucent — UI icons, HUDs, particle systems, alpha-masked characters — had to upload two textures: RGB as ETC1 (4 bpp) plus alpha as an 8-bit greyscale map (8 bpp), 12 bpp combined and roughly 3× the bandwidth of a single texture. Mobile UI is unusually heavy on translucent elements, and that overhead measurably dragged down mid- and low-end Android devices. In 2012 Khronos shipped ETC2 / EAC as part of OpenGL ES 3.0 (reaching Android in 2013 with Android 4.3): keep ETC1 backward compatibility (legacy ETC1 blocks decode unchanged in an ETC2 decoder) and add an RGBA mode (an ETC2 RGB block + an EAC alpha block, 16 bytes = 8 bpp total). ETC2 also rounded out single- and dual-channel formats with R11 / RG11 (the mobile counterparts to desktop BC4 / BC5 — normals, roughness, etc.), giving mobile its own full "channel-split" toolbox. The crucial political decision: Khronos made ETC2 a mandatory baseline for OpenGL ES 3.0. Any GPU that claims GLES 3.0 support must decode ETC2. Post-2014 Android games could finally ship all-ETC2 with no "device might not have ETC2" fallback. Unity and Unreal both flipped their Android default to ETC2 in 2014.
技术内核
Technical core
ETC2 的设计哲学是"在 ETC1 上做加法,不做减法"——所有 ETC1 块在 ETC2 解码器里都能正常工作(向下兼容),新增的能力通过块头里特殊的位组合切换。① RGB 块 4 种模式:(a) ETC1 兼容模式(老的"两半 base + modifier" 结构,8 byte);(b) T-mode(把 4×4 块按 T 形分成两个颜色区,适合块内有 L 形 / T 形硬边);(c) H-mode(把块按 H 形分两区,适合垂直硬边);(d) Planar mode(用三个角点的颜色定义平面,块内每像素从平面采样,适合平滑渐变如皮肤、天空)。模式不是靠显式模式位,而是借差分编码中本来非法的溢出位组合来标记——解码器遇到这些"不可能"的组合就切到 T / H / Planar;编码器为每块挑最优。② RGBA8 = ETC2 RGB block + EAC alpha block——一块 16 byte,前 8 byte 是 ETC2 RGB,后 8 byte 是 EAC(Ericsson Alpha Compression)alpha 块。EAC alpha 块存一个 8-bit 基准值 + 4-bit 乘数 + 4-bit 修饰表选择,每像素 3 bit 从 8 个偏移值里挑一个,结构上与 BC3 的 alpha 块同级;真正的 11-bit 精度属于下面的 R11 / RG11。③ R11 / RG11——独立的单/双 11-bit 通道格式,对应桌面的 BC4 / BC5,用于法线贴图(RG11)、高度图 / roughness(R11)等。R11 是 8 byte/块 = 4 bpp,RG11 是 16 byte/块 = 8 bpp。④ punch-through alpha——一种特殊模式叫 ETC2 RGBA1(RGB8_PUNCHTHROUGH_ALPHA1_ETC2),只允许 alpha = 0 或 255 的硬切边(像 BC1 的 1-bit alpha),用于树叶、栅栏这种"完全透明 / 完全不透"的资产,体积仍是 4 bpp。⑤ OpenGL ES 3.0 强制 = 不需要 fallback——这是 ETC2 最大的工程优势。BC1-7 在桌面是"硬件支持但要查 capability",ETC2 在 Android GLES 3.0+ 是"必然存在"。Unity / Unreal 因此在 2014 年果断把 Android 默认纹理改成 ETC2。
ETC2's design philosophy is "add to ETC1, never subtract" — every ETC1 block decodes correctly in an ETC2 decoder (backward compatibility), and new capabilities are gated behind special bit patterns in the block header. ① The RGB block has four modes: (a) ETC1-compatible (the legacy "two halves, base + modifier" structure, 8 bytes); (b) T-mode (the 4×4 block split into two colour regions in a T shape — handy for blocks with L- or T-shaped hard edges); (c) H-mode (split into two regions in an H shape — for vertical hard edges); (d) Planar mode (three corner colours define a plane, every pixel is sampled from that plane — for smooth gradients like skin and sky). The modes are signalled not by explicit mode bits but by bit combinations that would be illegal overflows in the differential encoding — a decoder hitting one of those "impossible" patterns switches to T / H / Planar; the encoder picks the best mode per block. ② RGBA8 = ETC2 RGB block + EAC alpha block — 16 bytes per block: the first 8 are ETC2 RGB, the next 8 are EAC (Ericsson Alpha Compression). The EAC alpha block stores an 8-bit base value + a 4-bit multiplier + a 4-bit modifier-table selector, with a 3-bit per-pixel index picking one of 8 offsets — structurally a peer of BC3's alpha block; the genuine 11-bit precision belongs to R11 / RG11 below. ③ R11 / RG11 — standalone single- and dual-channel 11-bit formats, the mobile counterparts to desktop BC4 / BC5, used for normal maps (RG11), height / roughness maps (R11), etc. R11 is 8 bytes per block = 4 bpp; RG11 is 16 bytes = 8 bpp. ④ Punch-through alpha — a special mode called ETC2 RGBA1 (RGB8_PUNCHTHROUGH_ALPHA1_ETC2) only allows alpha = 0 or 255 hard cut-outs (like BC1's 1-bit alpha), targeted at foliage / fences / "fully on or fully off" assets at 4 bpp. ⑤ OpenGL ES 3.0 mandatory = no fallback needed — and that is ETC2's biggest engineering advantage. On desktop, BC1-7 are "hardware-supported but capability-checked"; on Android with GLES 3.0+, ETC2 is guaranteed to exist. That is exactly why Unity and Unreal flipped the Android default to ETC2 in 2014.
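The memory arithmetic behind ETC2's RGBA8 mode — the whole reason the format exists — can be made concrete with a small helper. This is pure arithmetic from the block sizes stated above (ETC1 = 8 bytes per 16 pixels, ETC2 RGBA8 = 16 bytes per 16 pixels, raw A8 = 1 byte per pixel), ignoring mipmaps:

```python
# Back-of-envelope texture memory for a W×H texture, comparing the
# pre-ETC2 "split alpha" workaround against ETC2's native RGBA8 mode.

def texture_bytes(width, height, bpp):
    return width * height * bpp // 8

def etc1_plus_alpha_bytes(width, height):
    # the ETC1-era workaround: ETC1 RGB (4 bpp) + uncompressed A8 (8 bpp)
    return texture_bytes(width, height, 4) + texture_bytes(width, height, 8)

def etc2_rgba8_bytes(width, height):
    # one ETC2 RGB block + one EAC alpha block = 16 bytes per 4×4 = 8 bpp
    return texture_bytes(width, height, 8)
```

For a 1024×1024 texture the split costs 1.5 MiB against ETC2's 1 MiB — the 12-vs-8 bpp gap from the intro, and 3× the footprint of a plain 4 bpp ETC1 texture without alpha.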
适用
USE FOR
- Android 游戏纹理 / OpenGL ES 3.0+ 全部资产(默认选择)
- 需要兼容老 Android 设备但又想要 alpha 的项目
- Vulkan 移动端纹理(广泛支持)
- R11 / RG11 用于移动端法线贴图、roughness、高度图
- punch-through alpha 用于树叶 / 栅栏 / UI 硬边切边资产
- Android game textures / OpenGL ES 3.0+ assets (the default choice)
- Projects that need to support older Android devices yet still ship alpha
- Vulkan mobile textures (broadly supported)
- R11 / RG11 for mobile normal maps, roughness, height maps
- Punch-through alpha for foliage / fences / hard-edged UI cut-out assets
| scope | APIs | tools | CLI |
|---|---|---|---|
| ETC2 / EAC | ✓ OpenGL ES 3.0+(强制) · ✓ OpenGL 4.3+(ARB_ES3_compatibility) · ✓ Vulkan(VK_FORMAT_ETC2_*) · ~ Metal(iOS 13 之前) · ✗ D3D 原生 | ✓✓ Google etc2comp(开源,SIMD 加速) · Mali Texture Compression Tool · Compressonator · Unity / Unreal 内置 | etc2comp -format RGBA8 in.png -o out.ktx · toktx --encode etc2 out.ktx2 in.png |
PVRTC — Apple 早期独占
PVRTC — Apple's early proprietary lock-in
"PowerVR 的私有方案,iPhone 一代到 7 代的纹理本命。"
"PowerVR's proprietary scheme — texture-of-life for iPhone 1 through 7."
PVRTC 的诞生跟一个非常特定的硬件架构绑定:Imagination Technologies 的 PowerVR GPU 用的是 TBDR(Tile Based Deferred Rendering,基于瓦片的延迟渲染)——把屏幕切成小瓦片(典型 32×32 像素),每个瓦片独立渲染、合成,显著省功耗(手机的核心需求)。问题是 TBDR 处理瓦片时,纹理 sample 经常跨瓦片边界,如果纹理压缩格式是"块独立"的(像 BC1 / ETC1 那种,每个 4×4 块独立解码),瓦片边界处会出现明显的"块状不连续"(blocky artifact)。Imagination 在 2003 年提出 PVRTC 解决这个问题:不存"每块独立的颜色",而是存两层"低分辨率的颜色信号" + 一个"调制信号"——运行时 GPU 在采样点对两层信号做双线性插值,然后用调制信号在两个插值结果之间混合。这样块之间天然连续,没有边界 artifact——完美适配 TBDR。代价是 PVRTC 是私有格式,只有 PowerVR GPU 能解。但 Apple 初代 iPhone(2007)到 iPhone 7(2016)全部用 PowerVR GPU,所以 PVRTC 是 iOS 游戏的唯一标准纹理格式近十年。Infinity Blade、Real Racing、Monument Valley 一代游戏的纹理资产基本全是 PVRTC。iPhone 8 / X 搭载的 A11(2017)改用 Apple 自研 GPU,默认 ASTC,PVRTC 进入历史。
PVRTC's birth is tied to one very specific hardware architecture: Imagination Technologies' PowerVR GPUs use TBDR (Tile Based Deferred Rendering), which slices the screen into small tiles (typically 32×32 pixels), renders and composites each tile independently, and saves significant power — the core mobile requirement. The trouble is that during tile processing, texture samples regularly cross tile boundaries; if the texture format is "block independent" (like BC1 / ETC1, each 4×4 block decoded in isolation), tile boundaries grow visible "blocky" artifacts. In 2003 Imagination proposed PVRTC to solve this. Instead of storing "independent colour per block", PVRTC stores two layers of low-resolution colour signals plus a modulation signal — at sample time the GPU bilinearly interpolates both colour layers, then blends the two interpolated results using the modulation signal. Blocks are naturally continuous across boundaries — no block artifacts, a perfect TBDR fit. The price is that PVRTC is proprietary, decodable only on PowerVR GPUs. But every iPhone from the original iPhone (2007) through the iPhone 7 (2016) shipped with a PowerVR GPU, so PVRTC was the de facto sole texture standard on iOS for nearly a decade. Infinity Blade, Real Racing and Monument Valley — that generation of iOS games — basically shipped their entire texture base as PVRTC. The iPhone 8 / X's A11 (2017) switched to Apple's first in-house GPU, defaulting to ASTC, and PVRTC slid into history.
技术内核
Technical core
PVRTC 的技术结构跟 BCn / ETCn 完全是另一条思路——它不做"每块独立解码",而是用"全图低分辨率信号 + 调制图"的方案。① 两个低分辨率 RGB 层 + 一个调制层——记原图分辨率 W×H,PVRTC 把它编码为:(a) 信号 A,分辨率 (W/4)×(H/4)(4 bpp 模式)或 (W/8)×(H/4)(2 bpp 模式),每个采样点存 RGB 端点;(b) 信号 B,跟 A 同分辨率,存另一组 RGB 端点;(c) 调制信号 mod,跟原图同分辨率,每像素 1 bit(2 bpp 模式)或 2 bit(4 bpp 模式)指明 A 和 B 的混合比例。② 采样时的实际运算:GPU 对 A、B 各自做双线性插值得到 colorA、colorB,再用 mod 在两者之间混合。这不是块独立——同一个像素的颜色受周围 4 个 A 端点 / 4 个 B 端点的影响,块边界因此天然平滑过渡。③ 块尺寸 8×4(2 bpp)或 4×4(4 bpp)——两种码率档:2 bpp 是 8×4 块 / 8 byte = 2 bpp(注意是 8 byte/块,跟 BC1 同 byte 数但块更大,所以 bpp 减半),4 bpp 是 4×4 块 / 8 byte。④ 原生 alpha——比 ETC1 强,能直接装 RGBA 数据(虽然质量略差于 BC3 / BC7)。⑤ "分辨率必须是 2 的幂 + 正方形 + ≥8×8"——PVRTC v1 的硬限制。这个限制源于"信号 A、B 必须能均匀采样到原图所有像素"。PVRTC2(2009)放宽了这个限制(支持任意宽高 + punch-through alpha),但 PVRTC2 的硬件支持远不如 v1 普及。⑥ PowerVR 独占解码硬件——这同时是 PVRTC 的优势和坟墓。优势:iPhone 1-7 全部 PowerVR,PVRTC 在 iOS 游戏里是"必然支持";坟墓:其他 GPU 不解 PVRTC,Android 设备完全用不了,跨平台游戏要分别打包 PVRTC(iOS)+ ETC2(Android)两份纹理。Apple 在 A11(2017)改用自研 GPU,iOS 11+ 推荐 ASTC 后,PVRTC 就停止发展了。Imagination Technologies 也在 2017 年因 Apple 流失被收购,PVRTC 实际上跟着公司一起进入历史。
PVRTC's technical structure is on a completely different track from BCn / ETCn — it doesn't do "decode each block in isolation"; it uses "global low-resolution signals + a modulation map." ① Two low-resolution RGB layers plus one modulation layer — given source resolution W×H, PVRTC encodes: (a) signal A at (W/4)×(H/4) (4 bpp mode) or (W/8)×(H/4) (2 bpp mode), each sample storing an RGB endpoint; (b) signal B, same resolution as A, holding another RGB endpoint set; (c) modulation signal mod, at the source's full resolution, with 1 bit/pixel (2 bpp mode) or 2 bits/pixel (4 bpp mode) specifying the blend ratio between A and B. ② Sample-time arithmetic: the GPU bilinearly interpolates A and B independently to produce colourA and colourB, then uses mod to blend them. This is not block-independent — a single pixel's colour depends on the surrounding 4 A endpoints + 4 B endpoints, so block boundaries transition smoothly by construction. ③ Block size 8×4 (2 bpp) or 4×4 (4 bpp) — two bitrate tiers. The 2 bpp variant uses 8×4 blocks at 8 bytes per block (note: same bytes-per-block as BC1, but the block is larger, so bpp halves); the 4 bpp variant is 4×4 blocks at 8 bytes. ④ Native alpha — stronger than ETC1, can carry RGBA directly (though with somewhat lower quality than BC3 / BC7). ⑤ "Power-of-two, square, ≥ 8×8" — PVRTC v1's hard requirement, rooted in the need for signals A and B to sample uniformly onto every source pixel. PVRTC2 (2009) relaxed this (arbitrary aspect ratios + punch-through alpha), but PVRTC2 hardware support never reached v1's ubiquity. ⑥ PowerVR-exclusive decode hardware — both PVRTC's strength and its tomb. The strength: every iPhone 1-7 had a PowerVR GPU, so PVRTC was guaranteed-supported on iOS. The tomb: no other GPU decodes PVRTC, so Android couldn't use it at all, and cross-platform games had to ship two texture builds — PVRTC (iOS) + ETC2 (Android).
When Apple's A11 (2017) moved to an in-house GPU and iOS 11+ recommended ASTC, PVRTC stopped evolving. Imagination Technologies itself was acquired in 2017 after losing the Apple business; PVRTC effectively went into the history books with the company.
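The "two upscaled signals blended by modulation" idea can be sketched as follows. This is a single-channel toy assuming clamp-at-edge bilinear sampling — real PVRTC packs endpoints at reduced precision and has its own wrap rules — but it shows why block boundaries come out continuous: every output pixel interpolates across neighbouring low-res samples rather than decoding one block in isolation.

```python
# Sketch of PVRTC's sample-time reconstruction: bilinearly upscale the
# two low-resolution colour signals, then blend them per pixel using the
# full-resolution modulation values.

def lerp(a, b, t):
    return a + (b - a) * t

def bilinear_sample(grid, x, y):
    """Sample a low-res grid (list of rows of floats) at fractional
    (x, y), clamping at the edges."""
    h, w = len(grid), len(grid[0])
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = lerp(grid[y0][x0], grid[y0][x1], fx)
    bot = lerp(grid[y1][x0], grid[y1][x1], fx)
    return lerp(top, bot, fy)

def pvrtc_reconstruct(sig_a, sig_b, mod, scale):
    """Upscale signals A and B by `scale` and blend with per-pixel
    modulation values in [0, 1]; single-channel for brevity."""
    out = []
    for py in range(len(mod)):
        row = []
        for px in range(len(mod[0])):
            a = bilinear_sample(sig_a, px / scale, py / scale)
            b = bilinear_sample(sig_b, px / scale, py / scale)
            row.append(lerp(a, b, mod[py][px]))
        out.append(row)
    return out
```

With signal A all dark, signal B all bright, and modulation fixed at 0.5, every reconstructed pixel lands exactly midway — the modulation map is where the per-pixel detail lives.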
适用
USE FOR
- (历史) iPhone 1-7 / iPad 第一代到 Pro A9 的 iOS 游戏
- (历史) 老 Android PowerVR 设备(MX 系列芯片)
- 仍需要兼容到 iOS 9-10 的旧游戏维护
- 需要"块边界天然连续"的特殊场景(罕见)
- (historical) iOS games on iPhone 1-7 / first-gen iPad through iPad Pro A9
- (historical) Older Android PowerVR devices (MX-series SoCs)
- Maintaining a legacy game that still ships to iOS 9-10
- The rare niche that genuinely needs "natively continuous block boundaries"
反适用
AVOID
- 现代 iOS 项目(用 ASTC,Apple Silicon 默认)
- 任何非 PowerVR GPU 设备(Android Mali / Adreno / Tegra,完全不解)
- 跨平台游戏(双轨打包成本高,统一用 ASTC + ETC2 fallback)
- 非 2 的幂 / 非正方形纹理(PVRTC v1 硬限制)
- Modern iOS projects (use ASTC, Apple Silicon's default)
- Any non-PowerVR GPU device (Android Mali / Adreno / Tegra simply can't decode it)
- Cross-platform games (the dual-track packaging cost is high — unify on ASTC + ETC2 fallback)
- Non-power-of-two or non-square textures (a hard PVRTC v1 limitation)
| scope | APIs | tools | CLI |
|---|---|---|---|
| PVRTC v1 / v2 | ✓ PowerVR GPU(iOS A4-A9, 部分老 Android) · ~ Vulkan iOS 兼容层 · ✗ 其他 GPU | ✓ Imagination PVRTexTool(GUI + CLI) · texconv 不支持 · Unity / Unreal 早期 iOS 默认 | PVRTexToolCLI -i in.png -o out.pvr -f PVRTC1_4_RGB · PVRTexToolCLI -f PVRTC1_2_RGBA -i in.png -o out.pvr |
ASTC — 可变块大小的现代之王
ASTC — the modern king of variable block size
"4×4 还是 12×12?同一个格式,你自己挑。"
"4×4 or 12×12? Same format — you choose."
BCn / ETC2 都是固定 4×4 块,bpp 永远 4 或 8——只能"全图统一档位"。但游戏里的纹理质量需求从来不是一档的:UI 图标、角色脸需要高质量(密集块、高 bpp);远处地形、天空盒可以低质量(稀疏块、低 bpp)。美术希望一种格式同时支持"质量/体积"光谱滑块——同样一个文件结构,从 8 bpp 一路滑到 1 bpp。ARM 主导设计 ASTC(Adaptive Scalable Texture Compression),提供 14 种块大小(4×4 至 12×12),bpp 从 8 降到 0.89——同一格式覆盖近 9× 体积范围。Khronos 在 2012 年通过标准化(GLES 3.2 强制 + Vulkan 默认 + Apple A8 起原生支持),ASTC 成为现代移动 + WebGPU 的事实之王。BC7 守桌面、ASTC 守移动——这是 GPU 纹理压缩 2010 年代后的两强格局。
BCn / ETC2 are fixed at 4×4 blocks; bpp is locked at 4 or 8 — every texture in a project must pick one tier for the whole image. But real game textures need a spectrum: UI icons and character faces want high quality (dense blocks, high bpp), while distant terrain and skyboxes can run low quality (sparse blocks, low bpp). Artists want one format that exposes a quality / size dial — the same file structure sliding from 8 bpp down to 1 bpp. ARM led the design of ASTC (Adaptive Scalable Texture Compression), shipping 14 block sizes from 4×4 to 12×12, with bpp dropping from 8 to 0.89 — one format spanning nearly a 9× size range. Khronos standardised it in 2012 (mandatory in GLES 3.2, default in Vulkan, native on Apple from A8 onward), and ASTC became the de-facto king of modern mobile and WebGPU. BC7 owns desktop, ASTC owns mobile — that's the post-2010s duopoly of GPU texture compression.
技术内核
Technical core
ASTC 的设计哲学是"一个框架,无限档位"——所有块共用 16 byte 容器,但内部组件按块大小重新分配比例,让格式从 8 bpp 一路滑到 0.89 bpp。① 14 种块大小:4×4 / 5×4 / 5×5 / 6×5 / 6×6 / 8×5 / 8×6 / 8×8 / 10×5 / 10×6 / 10×8 / 10×10 / 12×10 / 12×12——LDR 和 HDR profile 都覆盖全部 14 种 2D 块。还有 3D 体素扩展(3×3×3 到 6×6×6 共 10 种 3D 块)。② 每块固定 16 byte——这是 ASTC "档位光谱"的根本机制:容器不变,块越大(像素更多)→ 每像素分到的 bit 越少 → bpp 越低。比如 4×4 块 = 16 px / 16 byte = 8 bpp;12×12 块 = 144 px / 16 byte = 0.89 bpp。同样的解码硬件、同样的文件结构,档位却覆盖近 9× 体积差。③ 16 种 endpoint 编码格式 (CEM):覆盖 LDR 亮度 / 亮度+alpha / RGB / RGBA 的直接与 base+offset 变体,以及对应的 HDR 变体;每块还可以切成 1-4 个 partition,每个 partition 有自己的一组端点(适合块内混有多种颜色分布)。每块在 CEM 字段内挑一种,精确匹配局部像素分布。④ 权重平面 + 双权重平面:基本 ASTC 用一张权重图控制所有通道的内插;双权重平面(dual-plane)模式让 alpha 或某一颜色通道走独立权重——类比 BC7 的 "rotation",但更通用,在彩色 + 高频 alpha 混合贴图上质量明显更好。⑤ HDR + 3D 双扩展——LDR profile(主流硬件全支持)给颜色 0-1 范围;HDR profile(部分硬件)给 float 范围,直接当移动版 BC6H 用;3D profile(更小众)给体素纹理(医疗影像、烟雾模拟、地形 3D 噪声)。⑥ 权重网格大小可独立于块大小——一个 12×12 块的权重网格可以是 4×4(更稀疏,更节省 bit 给 endpoint),也可以是 8×8(更密,牺牲 endpoint 精度换插值精度)。这是 ASTC 比 BC7 更灵活的核心,编码器要在"块大小 × endpoint mode × 权重网格"三维空间搜索最优。⑦ 编码极慢——astcenc 参考编码器是 brute-force 搜全部组合,单图 6×6 thorough 模式可能要几分钟。但解码硬件原生,sample 一个 ASTC texel 跟 sample BC7 一样快。
ASTC's design philosophy is "one frame, infinite tiers" — every block shares a 16-byte container, but the internal allocation re-balances by block size, sliding the format from 8 bpp all the way to 0.89 bpp. ① 14 block sizes: 4×4 / 5×4 / 5×5 / 6×5 / 6×6 / 8×5 / 8×6 / 8×8 / 10×5 / 10×6 / 10×8 / 10×10 / 12×10 / 12×12 — both the LDR and HDR profiles cover all 14 2D footprints. A 3D extension adds 10 voxel block sizes, from 3×3×3 to 6×6×6. ② Every block is exactly 16 bytes — the mechanism behind ASTC's tier spectrum. The container stays constant; the bigger the block (more pixels packed in), the fewer bits per pixel, and the lower the bpp. 4×4 = 16 px / 16 bytes = 8 bpp; 12×12 = 144 px / 16 bytes = 0.89 bpp. Same decode hardware, same file layout, nearly 9× size difference between extremes. ③ 16 endpoint encodings (CEM): LDR luminance / luminance+alpha / RGB / RGBA in direct and base+offset variants, plus the corresponding HDR variants; a block can also be split into 1-4 partitions, each with its own endpoint pair (for blocks mixing several colour distributions). Each block picks one CEM in its endpoint-mode field to match the local pixel distribution. ④ Weight plane + dual weight plane: basic ASTC uses one weight grid controlling all channels' interpolation; dual-plane mode lets alpha or one colour channel travel on an independent weight grid — analogous to BC7's "rotation" but more general, and visibly better on mixed colour + high-frequency-alpha maps. ⑤ HDR + 3D extensions — the LDR profile (universally supported) covers colour in [0, 1]; the HDR profile (partial hardware support) gives float range and effectively serves as mobile BC6H; the 3D profile (more niche) targets voxel textures (medical imaging, smoke simulation, 3D terrain noise). ⑥ Weight grid size independent of block size — a 12×12 block can use a 4×4 weight grid (sparser, donating bits to endpoints) or an 8×8 grid (denser, trading endpoint precision for interpolation precision).
This is what makes ASTC more flexible than BC7: the encoder searches a three-dimensional space of "block size × endpoint mode × weight grid." ⑦ Encoding is brutally slow — astcenc, the reference encoder, brute-forces the whole combination space; a single image at 6×6 with the thorough preset can take minutes. But decoding is hardware-native — sampling an ASTC texel costs the same as sampling BC7.
图 24 · ASTC 完整编码 + 采样流程:输入一个 RGBA 块,编码器对 14 种块大小逐一试压(每种内部还要枚举 endpoint mode 和权重网格组合),按 SSIM 评分,在用户给定的"bpp 预算"约束下选最优块大小,把结果塞进 16 byte。GPU 在 sample 时硬件原生解码——Vulkan / OpenGL ES 3.2 / Metal / WebGPU 全部一次 cycle 取出像素,跟 sample BC7 同样快。
Fig 24 · ASTC's full encode + sample pipeline: take an RGBA tile, run trial encodes against all 14 block sizes (each enumerating endpoint modes and weight-grid configurations), score by SSIM, and pick the best block size that fits the project's bpp budget. The result is always 16 bytes. GPUs decode it natively at sample time — Vulkan / OpenGL ES 3.2 / Metal / WebGPU all fetch a pixel in a single cycle, exactly as fast as sampling BC7.
| block | bpp | typical use | vs BCn at same bpp |
|---|---|---|---|
| 4×4 | 8.00 | UI icons, important textures | ≈ BC7 (slightly better) |
| 5×5 | 5.12 | mid-quality, skin / cloth | BCn no equivalent tier |
| 6×6 | 3.56 | environment, mobile default | ≫ BC1 (4 bpp) by ~6 dB |
| 8×8 | 2.00 | terrain, distant | BCn no tier here |
| 10×10 | 1.28 | far LOD, skybox | BCn no tier here |
| 12×12 | 0.89 | extreme low, ambient | BCn no tier here |
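Since every block is 16 bytes, the bpp column of the table above is pure arithmetic — a two-line check:

```python
# Every ASTC block is 16 bytes (128 bits) regardless of its footprint,
# so bits per pixel is just 128 / (block_w × block_h).

ASTC_BLOCKS = [(4, 4), (5, 5), (6, 6), (8, 8), (10, 10), (12, 12)]

def astc_bpp(w, h):
    return 128 / (w * h)

for w, h in ASTC_BLOCKS:
    print(f"{w}x{h}: {astc_bpp(w, h):.2f} bpp")  # reproduces the table column
```

Running it prints 8.00, 5.12, 3.56, 2.00, 1.28 and 0.89 — matching the table row for row.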
$ astcenc -cl in.png out.astc 6x6 -medium # LDR profile, 6×6 block, medium preset
$ astcenc -cs in.png out.astc 6x6 -thorough # thorough = brute-force search, much slower / better
$ astcenc -ch in.exr out.astc 6x6 -medium # HDR profile (input EXR float)
$ toktx --astc 6x6 out.ktx2 in.png # wrap into KTX2 (web / WebGPU friendly)
$ toktx --encode astc --astc_blk_d 6x6 out.ktx2 in.png # explicit block-size flag
适用
USE FOR
- 移动游戏(iOS A8+ / Android GLES 3.2+,99% 默认)
- WebGPU(macOS 默认 / 移动 Web,配合 KTX2)
- VR 头显纹理(Quest / Vision Pro 全部 ASTC)
- 跨平台游戏纹理打包(配合 Basis Universal 转码)
- 需要"质量/体积"档位灵活调节的项目(同 image 不同 mip 用不同档)
- Mobile games (iOS A8+ / Android GLES 3.2+ — 99 % default)
- WebGPU (macOS default / mobile web, paired with KTX2)
- VR headset textures (Quest / Vision Pro all use ASTC)
- Cross-platform game texture packaging (with Basis Universal transcoding)
- Projects that need a flexible quality / size dial (same image, different mip tiers)
反适用
AVOID
- 桌面 Windows / Linux 含 Intel HD 集显的目标(用 BC7)
- D3D11 / D3D12 桌面游戏(BC7 是事实标准)
- HDR 纹理在多数桌面硬件(用 BC6H;ASTC HDR Profile 桌面支持差)
- 实时编码场景(astcenc thorough 单图几分钟,极不适合服务端实时)
- 极老的 Android 设备(GLES 3.0 及以下,改用 ETC2 fallback)
- Desktop Windows / Linux targets that include Intel HD iGPUs (use BC7)
- D3D11 / D3D12 desktop games (BC7 is the de-facto standard)
- HDR textures on most desktop hardware (use BC6H; desktop ASTC HDR support is poor)
- Real-time encoding (astcenc thorough takes minutes per image — never use server-side live)
- Very old Android devices (GLES 3.0 and below — fall back to ETC2)
| scope | APIs | tools | CLI |
|---|---|---|---|
| ASTC LDR | ✓✓ Vulkan · ✓✓ OpenGL ES 3.2(强制)· ✓ Metal (Apple A8+) · ~ WebGPU (可选 feature) · ✗ Intel HD Graphics 桌面 | ✓✓ ARM astcenc(参考编码器,开源)· KTX-Software toktx · NVIDIA Texture Tools · Mali Texture Compression Tool · Unity / Unreal 内置 | astcenc -cl in.png out.astc 6x6 -medium · toktx --astc 6x6 out.ktx2 in.png |
| ASTC HDR | ~ Vulkan(部分硬件)· ~ Apple Metal (A11+ 部分支持) · ✗ 多数移动 GPU | ✓ astcenc -ch | astcenc -ch in.exr out.astc 6x6 -medium |
Basis Universal — 一次编码、多平台转码
Basis Universal — encode once, transcode anywhere
"一份 .basis,运行时按设备转 BC7、ETC2 或 ASTC。"
"One .basis file — transcoded at runtime to BC7, ETC2 or ASTC depending on the device."
Web 和跨平台游戏一直有个尴尬:桌面要 BC7、Android 要 ETC2、现代移动要 ASTC、老 iOS 要 PVRTC——同一张纹理要打四份,资产包体积爆炸,CDN 流量翻倍,管理痛苦不堪。Rich Geldreich(前 Valve、Crunch 作者、桌面纹理压缩领域的活字典)在 2018 年提出"中间格式"思路:编码时存为 Basis Universal(一种紧凑的 IR——intermediate representation),运行时用 JS 或 Wasm 解码到目标设备的块格式。一份资产 → 任意设备。Khronos 接受捐赠后,Basis 成为 KTX2 supercompression scheme 的事实标准,glTF 2.0 把 KTX2 + Basis 列为推荐的纹理 payload。three.js / Babylon.js / Unity WebGL / godot Web 全都内置 Basis transcoder。Web 端从此告别"打四份纹理"的时代。
Web and cross-platform games long suffered an awkward problem: desktop needs BC7, Android needs ETC2, modern mobile wants ASTC, legacy iOS wants PVRTC — the same texture has to be packed four ways, asset bundles balloon, CDN traffic doubles, and asset management becomes a nightmare. In 2018 Rich Geldreich (ex-Valve, author of Crunch, a living encyclopedia of desktop texture compression) proposed an "intermediate format" approach: encode once into Basis Universal (a compact IR — intermediate representation), then at runtime use JS or Wasm to transcode to whatever block format the target device wants. One asset → any device. After Geldreich donated the project to Khronos, Basis became the de-facto KTX2 supercompression scheme; glTF 2.0 lists KTX2 + Basis as the recommended texture payload. three.js / Babylon.js / Unity WebGL / godot Web all ship the Basis transcoder. The "pack four textures" era of the Web ended here.
basisu 把 PNG 压成 ETC1S(超小,~2 bpp,适合普通贴图)或 UASTC(高质,~8 bpp,适合法线/UI/重要纹理)中间格式,装进 .basis 独立容器或 KTX2 supercompression payload。运行时 JS/Wasm transcoder 检测设备能力——桌面转 BC7、Android 转 ETC2、现代移动转 ASTC、老 iOS 转 PVRTC——一份资产打通所有平台,GPU 拿到的是原生块格式可以直接 sample。
basisu compresses a PNG into either ETC1S (tiny, ~2 bpp, for general textures) or UASTC (high quality, ~8 bpp, for normals / UI / important textures), packed into a standalone .basis container or a KTX2 supercompression payload. At runtime a JS/Wasm transcoder probes the device — desktops get BC7, Android gets ETC2, modern mobile gets ASTC, legacy iOS gets PVRTC — one asset covers every platform, and the GPU receives a native block format it can sample directly.
技术内核
Technical core
Basis 的核心是"中间表示 + 运行时转码"——既不像 BCn/ASTC 那样直接是 GPU 块格式,也不像 PNG/JPEG 那样是 CPU 像素流,而是一种专门设计来"几乎零成本转码到任何块格式"的紧凑中间形态。① 两个 profile:ETC1S 基于 ETC1 的色彩端点结构,每块 ~2 bpp,体积极小,质量约相当于 JPEG 中等;UASTC("Universal ASTC")基于 ASTC 4×4 的子集,每块 8 bpp,质量约等于 ASTC 4×4 / BC7。两档对应"小贴图随便堆"和"重要纹理用高质量"。② 编码后再用 supercompression 压一遍——ETC1S 用 LZ-style + RDO(rate-distortion optimisation)再压缩 30-50%,UASTC 用 Zstd 压缩 ~30%。最终 .basis / KTX2 文件比裸 BCn 还小,多一步 supercompression 解压,但总成本仍远低于解码 PNG,且结果能直接送给 GPU。③ 运行时转码极快——transcoder 是设计成 O(blocks) 的简单查表 + 位重排,Wasm 实现单核能跑几百 MB/s,比 PNG 解码快一个数量级。这是 Basis 跟传统"在线解码 PNG → CPU RGBA → uploadTexture"路径的根本区别——后者占 CPU + 占带宽 + 占显存,前者一步到位送 GPU 块格式。④ 支持目标:BC1 / BC3 / BC4 / BC5 / BC7 / ETC1 / ETC2 / ASTC 4×4 / PVRTC1 / PVRTC2 / RGBA32(无块格式硬件兜底)——基本覆盖现役所有 GPU。
Basis's core idea is "intermediate representation + runtime transcode" — it is neither a direct GPU block format like BCn / ASTC, nor a CPU pixel stream like PNG / JPEG, but a compact intermediate form deliberately designed to transcode to any block format at almost zero cost. ① Two profiles: ETC1S is built on ETC1's colour-endpoint structure, ~2 bpp per block, extremely small, with quality roughly on par with mid-quality JPEG; UASTC ("Universal ASTC") is built on a subset of ASTC 4×4, 8 bpp per block, with quality close to ASTC 4×4 / BC7. The two tiers map to "stack lots of small textures" vs "use high quality on important textures." ② The encode is then run through supercompression — ETC1S uses an LZ-style codec plus RDO (rate-distortion optimisation) and shrinks another 30–50 %; UASTC uses Zstd for about 30 %. The resulting .basis / KTX2 file is smaller than raw BCn; it adds one supercompression-decompress step, but the total cost stays far below decoding a PNG, and the result ships straight to the GPU. ③ Runtime transcoding is blazing fast — the transcoder is engineered as O(blocks) with simple table lookups and bit re-shuffling; the Wasm build hits hundreds of MB/s on a single core, an order of magnitude faster than PNG decoding. That's the fundamental difference between Basis and the traditional "decode PNG → CPU RGBA → uploadTexture" path: the latter eats CPU + bandwidth + VRAM, while the former hands the GPU a block format in one step. ④ Supported targets: BC1 / BC3 / BC4 / BC5 / BC7 / ETC1 / ETC2 / ASTC 4×4 / PVRTC1 / PVRTC2 / RGBA32 (an uncompressed fallback for hardware without block formats) — effectively every GPU in service.
适用
USE FOR
- glTF 2.0 模型纹理(KTX2 + Basis 是官方推荐)
- WebGPU / WebGL 资产(配合 KTX2 容器)
- three.js / Babylon.js / PlayCanvas / godot Web 项目
- 跨平台游戏纹理打包(一份资产覆盖桌面/移动/Web)
- CDN 流量敏感的场景(ETC1S ~2 bpp 体积比 PNG 小很多)
- glTF 2.0 model textures (KTX2 + Basis is the official recommendation)
- WebGPU / WebGL assets (paired with the KTX2 container)
- three.js / Babylon.js / PlayCanvas / godot Web projects
- Cross-platform game texture packaging (one asset for desktop / mobile / Web)
- CDN-bandwidth-sensitive scenarios (ETC1S at ~2 bpp is much smaller than PNG)
| scope | runtimes | tools | CLI |
|---|---|---|---|
| Basis Universal | ✓✓ three.js / Babylon.js / PlayCanvas 内置 transcoder · ✓✓ Unity WebGL / godot Web · ✓ 任意 WebGL/WebGPU + Wasm transcoder | ✓✓ Khronos basisu(参考编码器,开源) · toktx(打 KTX2 + Basis payload) · KTX-Software 套件 | basisu in.png -uastc -output_file out.basis · toktx --encode uastc out.ktx2 in.png |
Crunch — 在 BC 体积上再砍一半
Crunch — halving BC's size with a second pass
"先 cluster 再 BC1 — 在 BC 体积上再砍一半。"
"Cluster first, then BC1 — halve the BC size."
2010s 初期移动 + 主机游戏的纹理资产包动辄几百 MB,主要是 BC1/BC3 块的累积——iOS App Store 限制单包 < 2 GB,主机光盘也是有限介质。Rich Geldreich 在 2012 年观察到一个事实:大量 4×4 块其实彼此相似——同一张草地纹理里成千上万个块都是"绿色为主、轻微噪点变化",同一面墙的砖块色调几乎一致。如果这些块共用一个码本(codebook),只存"指向码本的索引 + 微小偏移",体积可以再砍一半。Crunch 把这个想法落地:对所有块做 k-means 聚类,然后用 RC(range coder)+ Huffman 二次熵编码 BC 字典——磁盘体积比裸 BCn 再小 30-50%。运行时只需在 CPU 上花几十毫秒解回普通 BCn,再上传给 GPU。这是"BC 之上还能再压"的第一个工程化实践。后来同作者 6 年后用同样思路做了 Basis Universal,覆盖更广 GPU 块格式 + 加上 transcode 维度——Crunch 进入历史。
By the early 2010s, mobile and console game texture bundles had ballooned to hundreds of MB, mostly accumulated BC1/BC3 blocks — the iOS App Store capped single bundles at 2 GB, and console discs are finite media. In 2012 Rich Geldreich noticed an obvious truth: most 4×4 blocks are similar to each other — a grass texture has thousands of blocks that are all "mostly green with mild noise"; the bricks on a wall share an almost identical palette. If those blocks shared a single codebook and we only stored "codebook index + small offset," size could be halved again. Crunch put that idea into practice: run k-means clustering across all blocks, then run RC (range coder) + Huffman as a second-pass entropy code over the BC dictionary — on disk the result is 30–50 % smaller than raw BCn. At runtime the CPU spends tens of milliseconds decoding back to ordinary BCn, then uploads it to the GPU. It was the first engineering-grade demonstration of "compressing on top of BC." Six years later the same author took the idea further, covering more GPU block formats plus an extra transcode dimension — that became Basis Universal, and Crunch quietly walked off into history.
技术内核
Technical core
Crunch 的工程实现只有两步,但每步都精妙。① k-means cluster:把所有 4×4 块当成一个 64-bit 高维向量样本(BC1 块结构 = 2 个 16-bit 端点 + 32-bit 4-color 索引),用 k-means 在 BC 块空间内聚类成 N 个代表块(典型 N = 1024);每块只存"代表块索引 + 局部偏移量"。这一步把"每块 64 bit 独立"变成"每块 ~10 bit 索引 + 小残差",体积压缩比通常 4-6 倍,但因为 BCn 本身已经是有损,残差很小,质量损失可控。② RC + Huffman 二次熵编码:对 codebook 自身(1024 × 64 bit = 8 KB)和 index 流(~10 bit × 块数)再用 range coder + Huffman 树压缩——index 流通常有强自相关(同一区域的相邻块很可能落在同一 cluster),熵很低,Huffman 能再砍 30-50%。最终 .crn 文件平均比裸 BCn 小 50%,跟 PNG 体积差不多但能直接送 GPU(还要先在 CPU 上 swizzle 回 BCn,有几十 ms 解码延迟)。运行时解码是 streaming 的——可以一边读文件一边解块,不需要一次加载整张图——这是 Crunch 设计上对 mmap 友好的一个细节。
Crunch's engineering implementation has just two steps, but each is delicately tuned. ① k-means clustering: treat every 4×4 block as a 64-bit high-dimensional vector (a BC1 block = two 16-bit endpoints + a 32-bit 4-colour index), then run k-means in BC-block space to find N representative blocks (typically N = 1024); each block stores "representative-block index + local offset." This step turns "64 independent bits per block" into "about 10 bits of index + a tiny residual," giving a 4–6× size compression — and because BCn is already lossy, the residual is small and quality loss stays controlled. ② Second-pass RC + Huffman entropy coding: the codebook itself (1024 × 64 bits = 8 KB) and the index stream (~10 bits × block count) go through a range coder plus Huffman tree — the index stream is strongly auto-correlated (adjacent blocks in the same region almost always fall in the same cluster), entropy is low, and Huffman shaves another 30–50 %. The resulting .crn file averages 50 % smaller than raw BCn, comparable in size to PNG but ready to ship to the GPU (you do still need a CPU swizzle back to BCn first, costing tens of ms of decode latency). Runtime decode is streaming — blocks can be decoded as the file streams in, no need to load the whole image at once — a deliberate design choice that makes Crunch friendly to mmap.
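The clustering step can be illustrated with a toy vector-quantiser. Real Crunch runs k-means over 64-bit BC1 blocks and then entropy-codes the resulting index stream; this sketch shows only the nearest-representative assignment that turns "one full block per block" into "one small index per block":

```python
# Toy vector quantisation in the spirit of Crunch's first stage: each
# "block" (here just a tuple of numbers) is replaced by the index of its
# nearest codebook entry, measured by squared Euclidean distance.

def squared_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign_to_codebook(blocks, codebook):
    """Return one codebook index per block (nearest representative)."""
    return [min(range(len(codebook)),
                key=lambda i: squared_dist(block, codebook[i]))
            for block in blocks]
```

With a 1024-entry codebook, each 64-bit BC1 block collapses to a ~10-bit index — and because neighbouring blocks in the same region tend to hit the same entry, the index stream is highly compressible, which is exactly what the second-pass entropy coder exploits.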
适用
USE FOR
- 2010s 移动游戏纹理资产包压缩(BC1/BC3 体积敏感场景)
- 需要"BC 之上再省一半"的资产管线
- 研究/学习 cluster + 熵编码思路的参考
- 2010s mobile-game texture bundle compression (BC1 / BC3 size-sensitive cases)
- Asset pipelines needing another 50 % off on top of BC
- Reference for studying cluster + entropy-coding designs
反适用
AVOID
- 2018 之后任何新项目——直接用 Basis Universal
- 需要支持 BC7 / ASTC / ETC2 等多种块格式(Crunch 只覆盖 BC1/BC3)
- 需要运行时直接送 GPU,不接受 CPU 解码延迟
- 跨平台部署(无 ETC/ASTC 转码,移动端覆盖差)
- Any new project after 2018 — use Basis Universal instead
- Pipelines needing multiple block formats (Crunch only covers BC1 / BC3)
- Cases that demand zero CPU decode at runtime
- Cross-platform deployment (no ETC / ASTC transcode, weak mobile coverage)
| scope | engines | tools | CLI |
|---|---|---|---|
| Crunch (.crn) | ✓ Unity 内置 importer(早期版本) · ✓ Unreal Engine 4 移动端(可选) · ✗ 现代主流引擎已弃用 | ✓ crnlib(开源 C++ 库) · ✓ crunch CLI · ~ 命令仍可用但维护停滞 | crunch -file in.png -fileformat crn -dxt1 |
Mipmap — 一张纹理八张分辨率
Mipmap — one texture, eight resolutions
"一张纹理八张分辨率,采样时按距离自动选。"
"One texture, eight resolutions — auto-picked by distance at sample time."
1983 年 Lance Williams(NYIT,纽约理工学院计算机图形实验室)在 SIGGRAPH 发表 'Pyramidal Parametrics',第一次系统提出 mipmap 概念。它解决的问题是 3D 场景里最古老也最折磨人的视觉 bug:aliasing(走样/摩尔纹/闪烁)。当一个有纹理的多边形(墙、地面、远处地形)远离相机,屏幕上一个像素就会覆盖纹理上多个 texel——简单的"取最近 texel"采样会随机丢掉大部分信息,产生移动时的闪烁、网格图案上的摩尔纹、远处瓦片纹理的"沸腾"效果。Williams 的洞察:预先存好 N 个降采样层(每层是上一层的 2× 缩放,带 box filter 平均),采样时按"屏幕像素覆盖纹理多大"(LOD,Level of Detail)自动选合适那一层。代价是显存 +33%(几何级数 1+1/4+1/16+...→4/3),收益是无 aliasing + 缓存命中率提升(远处 mip 是小图,容易留在 GPU L2)。所有现代 GPU 纹理默认都带 mipmap,几乎所有引擎都强制开启——这是 GPU 时代的"一旦学会就回不去"的基础设施。
In 1983 Lance Williams (NYIT — the New York Institute of Technology Computer Graphics Lab) published "Pyramidal Parametrics" at SIGGRAPH, the first systematic proposal of the mipmap concept. It solved one of the oldest and most maddening visual bugs in 3D rendering: aliasing (moiré patterns, shimmering, sparkle). When a textured polygon (a wall, the ground, distant terrain) recedes from the camera, a single screen pixel covers many texels — naive "nearest-texel" sampling randomly throws away most of the information, producing shimmering as the camera moves, moiré on grid textures, and "boiling" on distant tiled surfaces. Williams's insight: pre-store N down-sampled layers (each is the previous one box-filtered to 2× smaller), and at sample time pick the right layer based on how much texture area each screen pixel covers (LOD — Level of Detail). The cost is +33 % VRAM (a geometric series 1 + 1/4 + 1/16 + … → 4/3); the payoff is zero aliasing plus better cache hits (distant mips are small and fit in GPU L2). Every modern GPU texture defaults to having mipmaps, almost every engine enforces them — once you know how it feels, you never go back. It's the infrastructure of the GPU era.
技术内核
Technical core
Mipmap 的工程实现非常直接,但每个细节都暗含思想。① 每张 base 图额外存 log₂(N) 个降采样层:mip 0 = base 原始图;mip 1 = box-filtered 2× 缩放;mip 2 = 又 2×;直到 1×1。一张 1024×1024 base 共 11 个 mip(0~10)。降采样可以用 box filter(简单平均)、Lanczos(更锐利但更贵)、或在 sRGB 空间需要先 gamma decode 再 filter 再 encode 回去——很多老引擎在 sRGB 纹理上没做 gamma-correct mip 生成,导致远处纹理看起来"灰蒙蒙"。② 显存额外 +33%:几何级数 1 + 1/4 + 1/16 + ... = 4/3,极限是基础体积的 4/3。这是个固定开销,大概率值得——除非你的纹理永远只在近距离用(UI、屏幕特效)。③ GPU 采样时按 LOD 自动选 mip level:GPU 在像素着色器里能算出当前像素的 dPdx/dPdy(纹理坐标在屏幕水平/垂直方向的偏导数),据此估出"一个屏幕像素覆盖纹理多大",对数运算后得到 LOD 浮点数。整数部分选 mip level,小数部分用于 trilinear filtering——在两个 mip 之间双线性插值,避免 mip 跳变可见的"接缝"。④ 各向异性过滤(anisotropic filtering)是 mipmap 的延伸——当视角倾斜时(比如往远处看的地面),屏幕像素在纹理上覆盖的不是正方形而是细长的矩形,简单 trilinear 会过模糊。aniso filtering 沿主轴方向多采样几次再加权,质量更好但带宽更大,通常给"开 16x aniso"档位。⑤ 容器内置 mip chain:KTX/KTX2/DDS 都把 mip 0 → mip N 顺序拼接进 payload,加载时一次 mmap 全部入显存——这就是为什么 GPU 容器格式天生跟 mipmap 绑定的设计。
The engineering of mipmap is straightforward, but every detail hides a small lesson. ① Each base texture stores log₂(N) extra down-sampled layers: mip 0 = the original base; mip 1 = box-filtered 2× smaller; mip 2 = another 2×; … down to 1×1. A 1024×1024 base has 11 total mips (0–10). The down-sample filter can be box (simple averaging), Lanczos (sharper but more expensive), or — for sRGB textures — must gamma-decode, filter in linear, then re-encode; many older engines skipped gamma-correct mip generation, which is why distant textures looked "washed out" in their games. ② +33 % VRAM: the geometric series 1 + 1/4 + 1/16 + … = 4/3, fixed extra cost. Almost always worth it, unless the texture is only ever used up close (UI, screen FX). ③ GPU picks the mip level by LOD at sample time: in a pixel shader the GPU can compute dPdx / dPdy (the partial derivatives of the texture coordinate in screen X / Y), use that to estimate "how much texture one screen pixel covers," and take the log to get a floating-point LOD. The integer part chooses the mip; the fractional part feeds trilinear filtering — bilinear blending between two adjacent mips to mask the visible "seams" of a mip transition. ④ Anisotropic filtering is an extension of mipmap — at oblique viewing angles (looking down at distant ground, say), one screen pixel covers a long thin rectangle on the texture, not a square, and plain trilinear over-blurs. Aniso filtering takes multiple samples along the major axis and weights them, giving better quality at the cost of bandwidth — usually exposed as a "16× aniso" toggle. ⑤ Containers embed the mip chain: KTX / KTX2 / DDS all concatenate mip 0 → mip N into the payload so a load can mmap the whole pyramid into VRAM at once — which is why GPU container formats have always been designed hand-in-glove with mipmaps.
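The arithmetic in ① and ③ fits in a few lines. A sketch with illustrative helper names; the LOD formula follows the usual GL-style footprint estimate (max of the two gradient lengths), ignoring per-API clamping and bias details:

```python
import math

def mip_chain(width, height):
    """Full mip chain dimensions: halve (rounding down, min 1) until 1x1."""
    levels = [(width, height)]
    while width > 1 or height > 1:
        width, height = max(1, width // 2), max(1, height // 2)
        levels.append((width, height))
    return levels

def lod(dudx, dvdx, dudy, dvdy):
    """Base-level LOD from screen-space derivatives of the texel coordinate:
    rho estimates how many texels one screen pixel covers, log2 picks the mip."""
    rho = max(math.hypot(dudx, dvdx), math.hypot(dudy, dvdy))
    return max(0.0, math.log2(rho)) if rho > 0 else 0.0
```

A 1024×1024 base yields 11 levels (mip 0–10), their total texel count stays under the 4/3 bound, and a pixel whose footprint spans two texels lands exactly on mip 1.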
适用
USE FOR
- 任何 3D 场景纹理(地形、建筑、角色、远景)——必须开启
- 视差贴图、normal map、AO、roughness 等 PBR 纹理
- 大场景中需要保留远处细节但避免 aliasing 的所有用途
- 纹理 cache 性能优化(远处用小 mip,提升 L2 命中)
- Any 3D scene texture (terrain, architecture, characters, vistas) — must be enabled
- Parallax / normal / AO / roughness and other PBR maps
- Any case in a large scene that wants distant detail without aliasing
- Texture cache optimisation (distant geometry uses small mips, boosting L2 hit-rate)
反适用
AVOID
- 2D UI 元素、屏幕后处理 LUT(永远 1:1 采样,mip 浪费 33% 显存)
- 动态生成的纹理(每帧重新生成 mip 太贵)
- Render target / framebuffer attachment(通常不需要 mip)
- 极小图(< 32×32,mip 层只有几层,收益小)
- 2D UI elements, screen post-process LUTs (always 1:1 sampling — mip wastes 33 % VRAM)
- Dynamically generated textures (regenerating mips every frame is expensive)
- Render target / framebuffer attachments (usually don't need mips)
- Very small textures (< 32×32 — only a couple of mip levels, minimal benefit)
| scope | APIs | tools | CLI |
|---|---|---|---|
| Mipmap | ✓✓ 所有 GPU 硬件原生 · OpenGL / D3D / Vulkan / Metal / WebGL / WebGPU 全部内置 | ✓✓ glGenerateMipmap(GPU 端生成) · texconv -m · NVIDIA Texture Tools · KTX-Software toktx · ImageMagick | texconv -m 0 -f BC7_UNORM in.png (0 = full chain) · toktx --genmipmap out.ktx2 in.png |
OpenEXR — 影视行业标准
OpenEXR — the film industry standard
"星战幕后用了 20 多年的格式,你做合成第一个学的就是它。"
"Two decades of Star Wars VFX runs on this. The first format you learn in compositing."
1999 年 ILM 在做《珍珠港》等片的前期合成时,发现手上没有一个合用的中间格式:16-bit TIFF 不够动态范围(镜头闪光、火焰、HDRI 环境贴图很容易超过 1.0),Radiance HDR(C29)只有 RGBE 三通道、不能装 Z-depth / motion vector / object ID。VFX 合成流程的真实需求是:(a) 真 HDR float——亮度无上限,可正可负;(b) 任意 channel——一张文件能塞 RGBA + Z + Normal + Motion + Object ID + UV pass + 几十层灯光分层;(c) tile-based 部分加载——一个 8K EXR 可能 2 GB,Nuke / Houdini 经常只读视口看得到的那一小块;(d) 多分辨率 mip——给 IBL 环境贴图直接拿不同 LOD 采样。OpenEXR 就是为这四件事设计的,二十多年没出过第二个对手。
In 1999 ILM was deep in pre-production on Pearl Harbor and other shows, and discovered that no existing intermediate format fit their pipeline: 16-bit TIFF lacked dynamic range (lens flares, explosions and HDRI environment maps easily exceed 1.0), and Radiance HDR (C29) only carried three RGBE channels — no Z-depth, motion vector or object ID. The real VFX-compositing requirements were: (a) true HDR float — unbounded brightness, possibly negative; (b) arbitrary channels — one file holding RGBA + Z + Normal + Motion + Object ID + UV pass + dozens of light groups; (c) tile-based partial loading — an 8K EXR can be 2 GB, and Nuke / Houdini routinely read only the viewport tile; (d) multi-resolution mips — sampling IBL environment maps at the right LOD. OpenEXR was designed for those four needs, and in more than two decades no rival has emerged.
技术内核
Technical core
OpenEXR 的设计跟 PNG / JPEG / TIFF 走的不是一条路——它不是"把一张 RGBA 图存好",而是"为合成流水线提供一个可流式部分加载、可任意配 channel、可分通道选 codec 的容器"。① 任意 channel:不止 RGBA,可以是 R / G / B / A / Z / object_id / motion.x / motion.y / normal.x / normal.y / normal.z / UV.x / UV.y 以及任意自定义命名。channel list 在 header 里,每个 channel 自带 pixelType(half / float / uint)和 sampling rate(支持次采样)。② 半精度 float16(half)是默认 pixelType,16 bit 表示 [−65504, +65504] + ±Inf + NaN——这是 ILM 跟 NVIDIA 在 1999 年一起定义的格式,后来被 IEEE 754-2008 收编(binary16),并成为 GPU 显存里 HDR 纹理的事实标准。③ 多压缩 codec:NONE(纯字节流)/ RLE(整数离散最佳)/ ZIP(zlib 通用)/ ZIPS(逐 scanline ZIP)/ PIZ(wavelet 无损,行业默认)/ PXR24(把 float32 截到 24-bit,Pixar 贡献,几乎无损)/ B44 / B44A(老 lossy)/ DWAA / DWAB(基于 DCT 的现代 lossy,DreamWorks 贡献,体积砍 5-10×,常用于 dailies)。可以 per-part 选不同 codec——RGBA 用 DWAA、Z 用 ZIP、Object ID 用 RLE,各取所长。④ Tile-based 部分加载:文件可选 scanline 或 tile 模式,tile 模式下 header 里有 offset table,Nuke / Houdini / Mari 加载 8K EXR 时只读视口看得到的几个 tile(可能 64 KB 而不是 2 GB)——这个能力是非线性合成软件 / 数字绘景的命脉。⑤ 多分辨率 mip:tiled EXR 可存 mipmaps(rip-maps 也行),IBL 环境贴图按 LOD 直接采样,不必外部生成 mip chain。⑥ 多帧 / multi-part:OpenEXR 1.x 用 .0001.exr / .0002.exr 帧序列(每帧一文件,管线友好);2.0(2013)引入单文件多 part,可在一个 .exr 里塞多个 layer / 多个 view(立体渲染左右眼)/ 多个分辨率,每个 part 独立 codec。这种"容器化"路线让 EXR 跟 USD / OCIO / ACES 这些现代 VFX 中间件无缝衔接。
OpenEXR's design takes a different road from PNG / JPEG / TIFF — it is not "store one RGBA image well" but "provide a streamable, partially-loadable container with arbitrary channels and per-channel codec choice for the compositing pipeline." ① Arbitrary channels: not just RGBA but R / G / B / A / Z / object_id / motion.x / motion.y / normal.x / normal.y / normal.z / UV.x / UV.y plus any custom name. The channel list lives in the header, each channel carrying its own pixelType (half / float / uint) and sampling rate (subsampling supported). ② Half-precision float16 is the default pixelType — 16 bits representing [−65504, +65504] plus ±Inf and NaN — a format ILM and NVIDIA jointly defined in 1999, later folded into IEEE 754-2008 (binary16) and now the de-facto standard for HDR textures in GPU VRAM. ③ Multiple compression codecs: NONE (raw bytes) / RLE (best for integer discrete data) / ZIP (general-purpose zlib) / ZIPS (per-scanline ZIP) / PIZ (wavelet lossless, industry default) / PXR24 (truncates float32 to 24 bits — Pixar's contribution, near-lossless) / B44 / B44A (legacy lossy) / DWAA / DWAB (modern DCT-based lossy, DreamWorks' contribution, 5-10× smaller — the dailies workhorse). Codecs can be picked per part — DWAA on RGBA, ZIP on Z, RLE on Object ID; each plays to strength. ④ Tile-based partial loading: files can be scanline or tile mode; in tile mode the header carries an offset table, so Nuke / Houdini / Mari loading an 8K EXR read only the visible tiles (possibly 64 KB out of 2 GB) — that capability is the lifeblood of node-based compositors and digital matte painters. ⑤ Multi-resolution mips: tiled EXR can carry mipmaps (or rip-maps), so IBL environment maps sample at the right LOD without an external mip chain. 
⑥ Multi-frame / multi-part: OpenEXR 1.x used per-frame .0001.exr / .0002.exr sequences (one file per frame, pipeline-friendly); 2.0 (2013) added single-file multi-part, packing multiple layers, multiple views (stereo left/right), or multiple resolutions into one .exr with per-part codec choice. That "container-like" direction lets EXR plug seamlessly into modern VFX middleware — USD, OCIO, ACES.
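Point ② is easy to poke at from Python, whose struct format `'e'` is the same IEEE 754 binary16 layout as EXR's half. This is a convenient stand-in for experimentation; real EXR I/O would go through the OpenEXR library:

```python
import struct

def as_half(x):
    """Round-trip a Python float through IEEE 754 binary16 ('e' format),
    the same bit layout OpenEXR uses for its half pixelType."""
    return struct.unpack('<e', struct.pack('<e', x))[0]
```

The round-trip shows the advertised properties directly: 65504 is the largest finite half, infinities survive, ordinary values keep roughly three decimal digits, and even the smallest subnormal (2⁻²⁴) is representable.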
图 28 · OpenEXR 完整编码流程。渲染输出是一个多通道浮点 buffer——RGBA、Z(深度,float32)、Normal.xyz、Motion.xy、Object ID(整数)、UV、若干灯光分层。EXR 编码器先按用途分组(beauty / data / int / light / UV),再 per-part 选 codec:RGBA 和灯光分层用 DWAA(视觉无损,体积砍 5-10×);Z / Normal / Motion / UV 必须 ZIP 无损(几何数据 1 bit 错就出 artifact);Object ID 是整数用 RLE 最优。然后把每个 part 切成 64×64 tile 并建 offset table(Nuke 加载时可只读视口 tile),最后封装进 multi-part 容器并附 ACES 色彩空间属性。整张 8K EXR 可能从 RAW 800 MB 压到 80 MB,而且任何子集可独立读出——这是二十多年没人能替代它的原因。
Fig 28 · OpenEXR's full encode pipeline. The render output is a multi-channel floating-point buffer — RGBA, Z (depth, float32), Normal.xyz, Motion.xy, Object ID (integer), UV, and several light-group passes. The EXR encoder first groups channels by purpose (beauty / data / int / light / UV) and then picks a codec per part: RGBA and light groups go through DWAA (visually lossless, 5-10× smaller); Z / Normal / Motion / UV must stay lossless ZIP (a single bit-flip in geometry data is a visible artifact); integer Object ID is best as RLE. Then each part is split into 64×64 tiles with an offset table (so Nuke can read just the viewport tiles), and finally everything is packed into a multi-part container with the ACES colour-space attribute attached. An entire 8K EXR can compress from raw 800 MB down to about 80 MB, and any subset can still be read independently — which is why no one has displaced it in more than two decades.
| format | bit depth | float | channels | typical use |
|---|---|---|---|---|
| 8-bit JPEG | 8 | ✗ | RGB(YCbCr 内部) | screen photo / web |
| 16-bit TIFF | 16 int | ✗ | RGBA | print photo / scan |
| Radiance HDR | RGBE 32-bit | ✓ (shared exp) | RGB | early CG / IBL |
| OpenEXR | 16 / 32 float | ✓ (true) | unlimited (任意 named) | VFX / film / render output |
| HDR10 / HLG | 10-bit PQ | ✗ (perceptual) | YCbCr | TV broadcast / streaming |
$ exrheader scene.exr # 看 channel / codec / displayWindow / attrs
$ exrinfo scene.exr # 简洁版 header 摘要 (OpenEXR 3.x)
$ oiiotool scene.exr -ch R,G,B,A -o rgb.exr # OpenImageIO 抽 channel
$ oiiotool scene.exr -ch Z -o depth.exr        # 单独抽 depth pass
$ exrenvmap input.exr cubemap.exr # latlong → cube · IBL 预处理
$ exrmaketiled in.exr tiled.exr # scanline → tiled (启用部分加载)
$ exrmultipart -combine -i a.exr b.exr -o m.exr  # 多 part 合并到一个文件
$ exr2aces in.exr out.exr # 转 ACES2065-1 色彩空间
适用
USE FOR
- VFX 合成中间格式(Nuke / After Effects / Fusion 必备)
- 渲染器输出(Arnold / V-Ray / RenderMan / Cycles 默认 EXR)
- IBL 环境贴图(latlong / cube,带 mip)
- ACES 工作流(2015+ 几乎所有好莱坞片)的全流程交换格式
- 数字绘景 / matte painting(Mari / Photoshop 32-bit 模式)
- 需要保留 Z / Normal / Motion / Object ID 等 AOV pass 的渲染管线
- 立体 / 多视图渲染(单文件 multi-part 装左右眼)
- VFX compositing intermediate (mandatory in Nuke / After Effects / Fusion)
- Renderer output (Arnold / V-Ray / RenderMan / Cycles default to EXR)
- IBL environment maps (latlong / cubemap, with mip chain)
- End-to-end exchange format for any ACES workflow (essentially every Hollywood release post-2015)
- Digital matte painting (Mari, Photoshop 32-bit mode)
- Render pipelines that must preserve AOV passes — Z / Normal / Motion / Object ID
- Stereo / multi-view renders (single-file multi-part packs left/right eyes)
反适用
AVOID
- Web 显示(浏览器不解 EXR · 文件巨大)
- 移动端 / app 资源(没有 GPU 硬件解码 · 体积不友好)
- 消费级照片分发(用 JPEG / AVIF / HEIF)
- 需要 8-bit / 整数像素的最终交付(用 TIFF / PNG)
- 对体积极敏感的传输场景(EXR 即便 DWAA 也比 JPEG 大几倍)
- Web display (browsers don't decode EXR; files are huge)
- Mobile / app assets (no GPU hardware decode; size unfriendly)
- Consumer photo distribution (use JPEG / AVIF / HEIF)
- Final-delivery 8-bit / integer pixels (use TIFF / PNG)
- Bandwidth-critical transmission (even DWAA EXR is several times larger than JPEG)
| scope | APIs / DCC | tools | CLI |
|---|---|---|---|
| OpenEXR | ✓✓ Nuke · Houdini · Mari · Maya · Blender · Cinema 4D · DaVinci Resolve · After Effects · Fusion · Photoshop(32-bit) · Arnold / V-Ray / RenderMan / Cycles 渲染器全部原生 | ✓✓ OpenEXR 官方 lib(C++) · OpenImageIO(oiiotool) · ImageMagick · ffmpeg(EXR sequence) · DJV / mrViewer 看片器 | exrheader · exrinfo · oiiotool · exrmaketiled · exrenvmap · exrmultipart |
Radiance HDR — 光照贴图老兵
Radiance HDR — the lightmap veteran
"用 8-bit 共享指数装 32-bit float —— 1989 的 hack。"
"32-bit float packed via shared 8-bit exponent — a 1989 hack."
1989 年 Greg Ward 在 Lawrence Berkeley National Lab 写 Radiance —— 一套物理光照模拟渲染器,要算光在场景里的真实辐射度,输出值会从 1e-6(月光)横跨到 1e6(太阳直射)。当时的难题不是算法,而是把这些数装到磁盘里:浮点 IEEE 754 32-bit/通道的话,一张 1024×768 的图就要 12 MB,而 1989 的硬盘是几十 MB 起跳的奢侈品。Greg Ward 的 hack:RGB 三个通道共享一个 8-bit 指数—— 把 R/G/B 三个 float 归一化到同一个 2^E 之下,只存归一化后的 8-bit 尾数 + 一个 8-bit 指数,合计 32 bit/像素(跟 RGBA8 一样大)。范围理论上 1e-38 到 1e38,精度 ~1%(对光照足够,对色彩管理就显粗糙)。再配一个极简的 RLE 行内压缩,这就是 .hdr 格式。靠着这个 hack,IBL 环境贴图、PSPI(panoramic stereo painted images)、HDR 全景照片在 90 年代到 2000 年代撑了 20 年,直到 OpenEXR 把它替换掉。
In 1989 Greg Ward at Lawrence Berkeley National Lab was writing Radiance — a physically-based lighting-simulation renderer — and needed to store radiance values that spanned 1e-6 (moonlight) to 1e6 (direct sun). The challenge wasn't the math; it was fitting that range on disk: IEEE 754 32-bit per channel meant a 1024×768 image cost 12 MB, and 1989 hard drives were luxuries measured in tens of megabytes. Ward's hack: have R/G/B share a single 8-bit exponent — normalise the three floats to the same 2^E, store the normalised 8-bit mantissas plus one 8-bit exponent, totalling 32 bit/pixel (the same as RGBA8). Range nominally 1e-38 to 1e38, precision ~1 % (good enough for lighting, coarse for colour management). Add a minimal scanline RLE on top, and you have the .hdr format. The hack carried IBL environment maps, PSPI panoramas and HDR photography through the 1990s and 2000s for two decades, until OpenEXR finally retired it.
value = (RGB / 256) × 2^(E − 128)——共享指数让三个通道一起放缩,代价是亮度差异极大的颜色(比如蓝色通道很弱、红色很强)精度退化。范围理论上 1e-38 ~ 1e38,精度 ~1%——对光照模拟够用,对色彩管理就嫌粗糙。这是 1989 年的工程取舍:用跟 RGBA8 一样的 32 bit/pixel 装下 6 个数量级的动态范围。
value = (RGB / 256) × 2^(E − 128) — the shared exponent scales the three channels together, the cost being precision loss when channel intensities differ wildly (a strong red beside a weak blue). Range is nominally 1e−38 to 1e38 at ~1 % precision — fine for lighting simulation, coarse for colour management. The 1989 trade-off: pack six orders of magnitude into the same 32 bit/pixel as RGBA8.
技术内核
Technical core
Radiance HDR 的内核小到只有三件事。① RGBE 编码——三个通道共用一个指数。编码时找 max(R, G, B),归一化到 [0, 1],尾数乘 256 取整,指数加 128 偏移存为一字节;解码时反向。共享指数让"亮度差异极大的颜色"(蓝色通道极弱、红色极强)精度退化——这是它跟 float16 / float32 在色彩管理意义上的本质差距。② 极简 RLE——文件里每一行像素分开压缩:旧格式整行 RGBE 一起 RLE,1991 之后改成"先把 R / G / B / E 四个字节流分别拆开,再各自 RLE",压缩率显著提升(因为 E 经常大段重复,RLE 在它上面收益最大)。压缩开销小到 90 年代的 SGI 能软件实时解码。③ 文本头——文件开头是 ASCII 头,几行 #?RADIANCE / 标识 / 曝光值 / EXPOSURE= / FORMAT= 32-bit_rle_rgbe,然后一个空行,然后是分辨率字符串(-Y 480 +X 640),再之后才是 RLE 二进制流。这种"文本头 + 二进制 payload"的设计后来被 PFM / NetPBM 继承。三件事加起来就是整个 .hdr 格式——简单、自包含、跨平台。代价:① 不支持 alpha;② 没有 metadata(没有 ICC profile、白点、色彩空间);③ 只有 RGB,不能存 Z / Normal / Motion;④ 共享指数精度天生粗糙。这些缺点直接催生了 OpenEXR 在 1999 年的设计目标——"做 Radiance HDR 做不到的所有事"。
The Radiance HDR core is just three things. ① RGBE encoding — three channels share one exponent. Encode by finding max(R, G, B), normalising to [0, 1], scaling mantissas by 256 and rounding, and storing the exponent biased by 128 in one byte; decode is the inverse. The shared exponent loses precision for "channels of wildly different magnitudes" (a strong red beside a weak blue) — that's the format's fundamental colour-management weakness compared to float16 / float32. ② Minimal RLE — pixels are compressed per scanline: the old format ran RLE over the interleaved RGBE bytes; the post-1991 format de-interleaves into four byte streams (R / G / B / E) and RLEs each separately, dramatically improving compression (E often has long runs, where RLE wins biggest). Compression overhead is light enough that 1990s SGI workstations decoded in software in real time. ③ Text header — the file begins with an ASCII header: a few lines of #?RADIANCE / identifier / EXPOSURE= / FORMAT=32-bit_rle_rgbe, then a blank line, then a resolution string (-Y 480 +X 640), and only then the RLE binary stream. The "text header + binary payload" pattern was later inherited by PFM and the NetPBM family. Those three things are the entire .hdr format — simple, self-contained, portable. The costs: ① no alpha; ② no metadata (no ICC profile, no white-point, no colour-space tag); ③ RGB only — no Z, normal or motion channels; ④ inherent precision floor from the shared exponent. Those very gaps drove OpenEXR's 1999 design brief: "do everything Radiance HDR can't."
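The RGBE round-trip in ① fits in a dozen lines. A simplified sketch of Ward's scheme (function names are mine; real Radiance code handles mantissa rounding slightly differently):

```python
import math

def rgbe_encode(r, g, b):
    """Pack non-negative floats into one RGBE pixel: three 8-bit mantissas
    sharing one 8-bit exponent (bias 128). Simplified from Ward's routine."""
    m = max(r, g, b)
    if m < 1e-38:
        return (0, 0, 0, 0)              # all-zero pixel means black
    e = math.frexp(m)[1]                 # m = f * 2**e with 0.5 <= f < 1
    scale = 256.0 / 2.0 ** e             # mantissas land in [0, 256)
    return (int(r * scale), int(g * scale), int(b * scale), e + 128)

def rgbe_decode(ri, gi, bi, ei):
    if ei == 0:
        return (0.0, 0.0, 0.0)
    f = 2.0 ** (ei - 128) / 256.0
    return (ri * f, gi * f, bi * f)
```

The shared exponent is also where the ~1 % precision figure, and the weak-channel failure mode, come from: a dim channel sitting under a bright one gets quantised down to zero.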
适用
USE FOR
- IBL 环境贴图老资产 · 兼容旧渲染器(1990s-2000s 的 .hdr 库)
- 全景 latlong HDR 照片(Bracketed exposure stitch 工作流末端)
- HDR 光照模拟的最终输出 · 论文 demo / 教学
- 需要跨多个 DCC 但又不愿付 EXR 复杂度的场景
- Legacy IBL environment maps · old-renderer compatibility (1990s-2000s .hdr libraries)
- Panoramic latlong HDR photographs (the tail end of bracketed-exposure stitch workflows)
- Final output of HDR lighting simulations · paper demos / teaching
- Cross-DCC interchange when EXR's complexity isn't worth paying for
反适用
AVOID
- 现代 VFX 合成场景(用 OpenEXR · 多通道 / 高精度)
- 需要 alpha 的任何场景(RGBE 没有 A)
- 色彩管理严格的工作流(共享指数精度太粗 · 没有 ICC)
- 负数 / 复杂数学中间值(RGBE 只能存非负 RGB)
- Modern VFX compositing (use OpenEXR — multi-channel, higher precision)
- Anything needing alpha (RGBE has no A)
- Strict colour-managed pipelines (the shared exponent is too coarse, no ICC)
- Negative or complex intermediate maths (RGBE stores only non-negative RGB)
| scope | renderers / tools | editors | CLI |
|---|---|---|---|
| Radiance HDR (.hdr / .pic) | ✓✓ Radiance · pbrt · Mitsuba · Arnold · V-Ray · Blender Cycles · 几乎所有 IBL 输入支持 | ✓ Photoshop(32-bit) · GIMP · Affinity Photo · HDRShop · Picturenaut | ra_ppm · ra_tiff · ra_xyze(Radiance 自带 ra_* 套件)· oiiotool in.hdr -o out.exr |
PFM — Portable FloatMap
PFM — Portable FloatMap
"NetPBM 的 HDR 表亲 —— ASCII 头加裸 float。"
"NetPBM's HDR cousin — ASCII header plus raw float."
学术研究和渲染器中间格式有一种长期需求,主流格式都满足不了:"最简单的 HDR 容器"——不要任何压缩(读写都是 mmap,瞬间)、不要任何 metadata(纯净,bit 级 reproducibility)、能直接当 float* 数组操作(C 代码 fopen + fseek 过头部就能用,不需要任何库)。OpenEXR 太复杂(几百种 attribute、wavelet codec、tile / scanline 切换),Radiance HDR 精度太粗(RGBE shared exponent),float TIFF 的 IFD 解析又是一坨。Paul Debevec 等学术圈的人在 NetPBM(PPM / PGM / PBM)风格基础上,做了 PFM:三行 ASCII 头(magic / 宽高 / scale 字段)紧跟 raw float32 像素流。论文 supplementary、渲染器中间盘、调试图像 dump,这些场景里 PFM 是最舒服的——别的格式都嫌"太聪明"。
Academic research and renderer-internal storage share a recurring need that no mainstream format satisfies: the simplest possible HDR container — no compression (read / write is just mmap), no metadata (pure, bit-exact reproducibility), and direct use as a float* array (C code can fopen, fseek past the header, and operate on the bytes without any library). OpenEXR is too complex (hundreds of attributes, wavelet codecs, tile / scanline modes), Radiance HDR is too coarse (RGBE shared exponent), float TIFF's IFD parsing is its own mess. Paul Debevec and colleagues in academia took the NetPBM lineage (PPM / PGM / PBM) and produced PFM: three ASCII header lines (magic / width-height / scale) followed by a raw float32 pixel stream. For paper supplementaries, renderer internal dumps, and debugging-image scratch storage, PFM is the most comfortable choice — every other format feels "too clever."
第一行是 magic(PF = RGB · Pf = 灰度);第二行宽高(空格分隔);第三行 scale 字段(浮点数,符号位决定字节序——负数小端,正数大端,数值本身用作曝光缩放)。三行加起来约 20 字节。紧跟着就是 raw float32 像素流,RGB 顺序排列,12 字节/像素,自下而上(跟 BMP 同向)。整个文件可以 mmap 直接当 float* 用,跳过头部就行——这是 PFM 唯一的设计目标。
Line 1 is the magic (PF = RGB, Pf = grayscale); line 2 is width and height separated by a space; line 3 is the scale field (a float whose sign bit encodes endianness — negative for little-endian, positive for big-endian — and whose magnitude doubles as an exposure factor). Total header ~20 bytes. The raw float32 pixel stream follows, RGB-interleaved, 12 bytes per pixel, stored bottom-up (same orientation as BMP). The entire file can be mmap'd as a float* after skipping the header — that simplicity is PFM's single design goal.
技术内核
Technical core
PFM 内核三件事,合起来不到 30 行 C 代码就能写完读写器。① NetPBM 风格的文本头:像 PPM 一样,前几行是 ASCII。第一行是 magic 标识——PF 表示 float32 RGB,Pf 表示 float32 单通道灰度。第二行是宽高(空格分隔的整数)。第三行是 scale 字段——一个浮点数,绝对值是曝光 / 缩放因子(读取时通常忽略,渲染器自己处理),符号编码字节序:负数小端,正数大端。三行,十几个字符。② raw float32 RGB 像素流:头部紧跟二进制 float32 数据,RGB 交错(R₀ G₀ B₀ R₁ G₁ B₁ …),12 字节/像素;灰度模式 4 字节/像素。自下而上(像 BMP,但跟 OpenGL 纹理坐标天然吻合)——这是最常见的踩坑点,新人写 reader 很容易上下颠倒。③ 无任何压缩 / 无任何 metadata:这是故意的。没有 ICC profile,没有色彩空间,没有曝光记录,没有作者注释——纯粹"一张数字"。这恰好是论文实验、渲染器调试、参考实现里最重要的属性:你要 reproduce 别人的结果,任何额外 metadata 都是干扰。代价是它没法用于生产:文件大(4K RGB float32 ≈ 95 MB,无压缩),没法做色彩管理,工具支持 niche。但在它的位置——学术调试、bit-exact 中间盘——没人能替代它。
PFM has three core elements; a complete reader/writer fits in under 30 lines of C. ① NetPBM-style text header: like PPM, the first few lines are ASCII. Line 1 is the magic — PF for float32 RGB, Pf for float32 grayscale. Line 2 is the width and height (space-separated integers). Line 3 is the scale field — a float whose absolute value is an exposure / scale factor (typically ignored at read time; the renderer handles tone mapping itself), and whose sign encodes endianness: negative is little-endian, positive is big-endian. Three lines, a dozen characters. ② Raw float32 RGB pixel stream: binary float32 data follows the header, RGB-interleaved (R₀ G₀ B₀ R₁ G₁ B₁ …) at 12 bytes per pixel; grayscale is 4 bytes per pixel. Bottom-up (like BMP, but conveniently aligned with OpenGL's texture coordinate origin) — the most common pitfall when writing a reader is flipping the rows. ③ No compression, no metadata: intentional. No ICC profile, no colour-space tag, no exposure record, no author comment — just "the numbers." That happens to be the single most important property for paper experiments, renderer debugging, and reference implementations: when you reproduce someone else's result, any extra metadata is noise. The cost is that PFM is unsuitable for production: files are huge (a 4K RGB float32 image is ~95 MB uncompressed), there's no colour management, and tooling support is niche. But in its niche — academic debugging, bit-exact intermediate scratch — nothing else replaces it.
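A minimal PFM writer/reader matching the three-line header plus raw float32 stream described above. A sketch for little-endian colour (`PF`) files only; `pixels` is assumed to already be a flat list of RGB floats in bottom-up row order:

```python
import struct

def write_pfm(path, width, height, pixels):
    """Write a little-endian colour PFM: 'PF' magic, dimensions, negative
    scale (= little-endian marker), then raw float32 RGB, bottom row first."""
    assert len(pixels) == width * height * 3
    with open(path, 'wb') as f:
        f.write(b'PF\n%d %d\n-1.0\n' % (width, height))
        f.write(struct.pack('<%df' % len(pixels), *pixels))

def read_pfm(path):
    with open(path, 'rb') as f:
        assert f.readline().strip() == b'PF'      # b'Pf' would be grayscale
        width, height = map(int, f.readline().split())
        scale = float(f.readline().decode('ascii'))
        endian = '<' if scale < 0 else '>'        # sign bit encodes byte order
        count = width * height * 3
        data = struct.unpack(endian + '%df' % count, f.read(count * 4))
    return width, height, list(data)
```

Values round-trip at float32 precision, byte-exact on re-write, which is exactly the reproducibility property the section describes.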
适用
USE FOR
- 学术论文 supplementary / reproducibility 数据集
- 渲染器内部中间盘(每帧 dump,要求 mmap 速度)
- 算法 bit-exact 对比(diff 两个 .pfm 必须完全一致)
- 调试 / 可视化 float buffer(GPU readback dump)
- Academic-paper supplementaries / reproducibility datasets
- Renderer internal scratch storage (per-frame dumps that need mmap speed)
- Bit-exact algorithm comparison (diff'ing two .pfm files must match byte for byte)
- Debugging / visualising float buffers (GPU readback dumps)
反适用
AVOID
- 任何生产场景(无压缩 · 文件巨大)
- 需要色彩管理的工作流(无 ICC / 无色彩空间)
- 跨工具协作(支持 niche)
- Web / 移动端(浏览器不解码)
- Any production scenario (uncompressed → enormous files)
- Colour-managed pipelines (no ICC, no colour-space tag)
- Cross-tool collaboration (niche support)
- Web / mobile (no browser decode)
| scope | tools | libraries | CLI |
|---|---|---|---|
| PFM (.pfm) | ✓ ImageMagick · OpenImageIO · pfstools · MATLAB / OpenCV(自定义 reader 常见) | libpfs · OIIO · pbrt 自带 reader · 论文配套源码常自带 30 行 C 实现 | pfsin / pfsout(pfstools)· oiiotool in.pfm -o out.exr |
16/32-bit TIFF — 被忽视的扛把子
16/32-bit TIFF — the unsung workhorse
"40 年了,印刷厂还在用它,因为没有更好的替代。"
"40 years on, print shops still use it because nothing better replaced it."
1986 年 Aldus(后来被 Adobe 在 1994 收购)推 PageMaker —— 桌面排版革命的开端,问题是当时图像格式碎成一地:各家位图格式普遍只有 256 色调色板,EPS 是 PostScript 矢量,Mac PICT 跨不了平台,扫描仪厂商各自用私有格式。Aldus 跟扫描仪厂商一起设计了 TIFF —— Tag Image File Format —— 目标是"任何位深、任何 codec、任何 metadata、跨平台无损"。它的解法是 tag 系统:不像 BMP 那样固定字段,而是用 IFD(Image File Directory)装一个"几百种 tag 都可选"的描述表,payload 可换 codec,可多页,可跨设备元信息。从此所有需要"高保真 + 灵活元数据"的领域都默认 TIFF:印刷出版、扫描仪、显微镜、医学影像、卫星遥感、文物档案。40 年了,没人替代得了——不是因为它优秀,是因为它什么都能装:DICOM 内嵌它,GeoTIFF 是它的子集,DNG 是它的子集,Photoshop 16-bit 工作流默认它。被忽视的扛把子。
In 1986 Aldus (acquired by Adobe in 1994) launched PageMaker, the start of the desktop publishing revolution. The problem: image formats were a Tower of Babel — the bitmap formats of the day topped out at 256-colour palettes, EPS was vector PostScript, Mac PICT didn't cross platforms, scanner vendors each shipped a proprietary format. Aldus partnered with scanner vendors to design TIFF — Tag Image File Format — aiming for "any bit depth, any codec, any metadata, cross-platform, lossless." The solution was a tag system: rather than fixed fields like BMP, an IFD (Image File Directory) carries a descriptive table of "hundreds of optional tags," the payload swaps codecs, files can be multi-page, and device metadata travels with the image. From then on every domain needing "high fidelity + flexible metadata" defaulted to TIFF: print publishing, scanners, microscopes, medical imaging, satellite remote-sensing, museum archives. Forty years later nothing has replaced it — not because it's elegant, but because it can hold anything: DICOM embeds it, GeoTIFF is its subset, DNG is its subset, Photoshop's 16-bit workflow defaults to it. The unsung workhorse.
技术内核
Technical core
TIFF 的设计核心可以总结成四条规则,40 年没变。① 基于 IFD 的 tag 系统:文件不是"按字段顺序"装数据,而是"我有什么属性,就在 tag 表里加一行"。tag id 是 16-bit 无符号整数(0~65535),Adobe 保留 32768 以下,32768~65535 是private tags(GeoTIFF / DNG / OME-TIFF 等子集格式都在这个区域)。每个 tag 自带数据类型(BYTE / SHORT / LONG / RATIONAL / ASCII / FLOAT / DOUBLE 等 12 种),解析器只要"我认识的 tag 处理,不认识的跳过"。这种设计直接借鉴了 IBM 的 EBCDIC 数据描述传统,后来又被 ISOBMFF / Matroska 等容器借鉴。② 多页(IFD chain):每个 IFD 末尾有一个指向"下一个 IFD"的 offset,多页 TIFF 就是把 IFD 串成链表。最经典用例是传真组 4(Group 4 Fax)——黑白文档扫描多页存一个 .tif;现在扩展到扫描仪批量扫描、显微镜 z-stack、卫星多光谱波段,每页一个 IFD。③ 多种 codec 可选:NONE(原始)/ PackBits(早期 Mac RLE)/ LZW(默认无损,90 年代有专利争议)/ DEFLATE(zlib,无损,现在最常用)/ JPEG-in-TIFF(把 JPEG bitstream 当 strip 数据装,1992 加,但 spec 模糊导致实现不一致)/ Group 3 / Group 4 Fax(双值黑白图像专用)/ LERC(地理空间近无损)。每个 strip 或 tile 独立 codec。④ 任意位深:1 bit(黑白扫描)/ 4 / 8(普通照片)/ 16(高保真扫描、医学影像)/ 32-bit float(IEEE 754,科研、HDR)。BitsPerSample tag 是个数组——可以是 (16, 16, 16) 表示 RGB 各 16 bit,可以是 (8, 8, 8, 8) 表示 RGBA8,甚至 (16, 16, 16, 16, 16, 16) 表示 6 通道高光谱。SampleFormat tag 进一步指定每个通道是 unsigned int / signed int / IEEE float / void(自定义)——这就是 TIFF 能存 16-bit 摄影、32-bit float HDR、整数 ID buffer 的根源。
TIFF's design boils down to four rules, unchanged in 40 years. ① IFD-based tag system: the file isn't laid out as "fields in fixed order," it's "whatever properties exist, add a row to the tag table." Tag IDs are 16-bit unsigned integers (0–65535); Adobe reserves 0–32767 and the 32768–65535 range is private tags (where GeoTIFF, DNG, OME-TIFF and other subset formats live). Each tag carries its own data type (BYTE / SHORT / LONG / RATIONAL / ASCII / FLOAT / DOUBLE — 12 in total), and a parser simply handles tags it knows and skips the rest. The design borrows directly from IBM's EBCDIC data-description tradition and was later borrowed by ISOBMFF, Matroska and other modern containers. ② Multi-page (IFD chain): each IFD ends with an offset to the next IFD, so multi-page TIFFs are linked lists of IFDs. The classic use case is Group 4 fax — multi-page black-and-white document scans in a single .tif; today this extends to flatbed batch scans, microscope z-stacks, and satellite multi-spectral bands, one IFD per page. ③ Multiple codec options: NONE (raw) / PackBits (early Mac RLE) / LZW (default lossless, embroiled in 1990s patent disputes) / DEFLATE (zlib, lossless, today's most common choice) / JPEG-in-TIFF (a JPEG bitstream stuffed into strip data, added in 1992 but with vague enough spec language that implementations still disagree) / Group 3 and Group 4 fax (bilevel black-and-white only) / LERC (near-lossless geospatial). Each strip or tile picks its codec independently. ④ Arbitrary bit depth: 1-bit (B&W scans) / 4 / 8 (regular photos) / 16 (high-fidelity scans, medical imaging) / 32-bit float (IEEE 754, science, HDR). The BitsPerSample tag is an array — it can be (16, 16, 16) for RGB at 16 bpp, (8, 8, 8, 8) for RGBA8, or even (16, 16, 16, 16, 16, 16) for six-channel hyperspectral. 
The SampleFormat tag further specifies whether each channel is unsigned int / signed int / IEEE float / void (custom) — that combination is exactly why TIFF can hold 16-bit photography, 32-bit float HDR, and integer ID buffers in the same container.
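Rules ① and ② can be demonstrated by hand-building a little-endian TIFF skeleton and walking its IFD chain. A sketch only: the 12-byte entry layout and next-IFD links, with no codec or strip handling:

```python
import struct

def ifd_entries(buf):
    """Yield (tag, type, count, value_or_offset) from every IFD in the chain.
    For short inline values (e.g. type 3 SHORT, count 1) the value sits in the
    low bytes of the 4-byte field, so reading it as a LONG still works on a
    little-endian file with zeroed padding."""
    assert buf[:4] == b'II*\x00'                   # 'II' + magic 42: LE TIFF
    off = struct.unpack_from('<I', buf, 4)[0]      # offset of the first IFD
    while off:                                     # offset 0 terminates the chain
        n = struct.unpack_from('<H', buf, off)[0]  # entry count
        for i in range(n):
            yield struct.unpack_from('<HHII', buf, off + 2 + 12 * i)
        off = struct.unpack_from('<I', buf, off + 2 + 12 * n)[0]

# Hand-built two-page skeleton: page 1 declares 640x480, page 2 declares 320 wide.
sample = (b'II' + struct.pack('<HI', 42, 8)        # header, first IFD at byte 8
          + struct.pack('<H', 2)                   # IFD 1: two entries
          + struct.pack('<HHII', 256, 3, 1, 640)   # tag 256 ImageWidth, SHORT
          + struct.pack('<HHII', 257, 3, 1, 480)   # tag 257 ImageLength, SHORT
          + struct.pack('<I', 38)                  # next IFD -> byte 38
          + struct.pack('<H', 1)                   # IFD 2: one entry
          + struct.pack('<HHII', 256, 3, 1, 320)
          + struct.pack('<I', 0))                  # end of chain
```

`list(ifd_entries(sample))` yields three tag rows across the two linked IFDs — the same linked-list mechanism that carries fax pages, microscope z-stacks and satellite bands, and the same skip-what-you-don't-know parsing a real reader does.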
适用
USE FOR
- 印刷出版 · 高保真扫描 · CMYK 精确印前流程
- 扫描仪 / 复印机 / 传真机默认输出(Group 4 多页)
- 卫星 / 航测 GeoTIFF · 医学 DICOM · 显微镜 OME-TIFF · 文物数字化
- 16-bit 摄影 / Photoshop 高位深工作流中间格式
- Print publishing · high-fidelity scanning · CMYK pre-press pipelines
- Default output of scanners / copiers / fax machines (multi-page Group 4)
- Satellite / aerial GeoTIFF · medical DICOM · microscopy OME-TIFF · cultural-heritage digitisation
- 16-bit photography / intermediate format in Photoshop's high-bit-depth workflow
反适用
AVOID
- Web 网页内嵌图(浏览器不解码,得转 JPEG / PNG / WebP / AVIF)
- 移动端 / app 内分发(体积大、解码慢)
- 消费级照片分享(用 JPEG / HEIC)
- 对"必须 100% 兼容"敏感的场景(各家 reader 支持 tag 子集不同)
- Web pages (browsers don't decode TIFF; convert to JPEG / PNG / WebP / AVIF)
- Mobile / in-app distribution (large, slow to decode)
- Consumer photo sharing (use JPEG / HEIC)
- Anywhere "must be 100 % compatible" matters (different readers support different tag subsets)
| scope | editors / DCC | libraries | CLI |
|---|---|---|---|
| TIFF (.tif / .tiff) | ✓✓✓ Photoshop · Lightroom · Capture One · Affinity · GIMP · DaVinci Resolve · ArcGIS / QGIS · Fiji / ImageJ · DICOM viewers · 几乎所有图像工具 | libtiff(40 年事实标准 reference 实现)· OpenImageIO · GDAL · scikit-image · libgeotiff · OME Bio-Formats | tiffinfo · tiffcp · tiffsplit · tiff2pdf · oiiotool · gdalinfo |
RAW — 厂商林立的原始数据
RAW — the manufacturer-fragmented origin data
"所谓 RAW,不是一个格式,是几十个互不兼容的格式族。"
"'RAW' is not one format — it's a zoo of several dozen incompatible formats."
数码相机 sensor 的原始输出是 12-14 bit Bayer pattern raw 数据——每个像素位置上只有一个颜色样本(R 或 G 或 B),需要 demosaic 算法才能算出完整 RGB。如果在相机里直接转 JPEG,会立即丢掉四样东西:(a) 高位深(14 bit → 8 bit,动态范围砍 64 倍);(b) demosaic 之前的灵活性(JPEG 已经是固定算法插值过的结果,不能换);(c) 白平衡可调性(JPEG 已经把 WB 烘进像素,后期改容易出色偏);(d) 曝光宽容度(过曝 / 欠曝在 14 bit RAW 里能拉回来,JPEG clip 后无法恢复)。摄影师需要"把决定留到后期再做"的格式 = RAW。但每家相机厂商都自己定义,互不兼容,这是后期工作流 30 年的最大头疼——也是 LibRaw / Lightroom 这些工具存在的全部理由。
A digital camera sensor's raw output is 12-14 bit Bayer-pattern data — each pixel position carries only one colour sample (R or G or B), and a demosaic algorithm has to interpolate the full RGB. Convert to JPEG inside the camera and you immediately lose four things: (a) high bit depth (14 bit → 8 bit, dynamic range cut by 64×); (b) flexibility before demosaic (JPEG is already a fixed-algorithm interpolation, you can't swap it); (c) white-balance malleability (JPEG bakes WB into pixels; later changes risk colour casts); (d) exposure latitude (over- and under-exposure can be recovered in 14 bit RAW; JPEG clips and the data is gone). The format that lets photographers "defer decisions to post" is RAW. But every camera maker defined its own, none compatible with the others — that has been the post-production headache of the past 30 years, and the entire reason LibRaw / Lightroom / Capture One exist.
技术内核
Technical core
RAW 不是一个格式,是一种思路的几十种实现。技术上有五条共同线索。① Bayer mosaic CFA:sensor 上每个物理像素只盖一种颜色滤镜(R/G/B 中的一种),按 2×2 重复排列。每个 2×2 块里有 2 绿 + 1 红 + 1 蓝(模拟人眼对绿色亮度更敏感)。读 RAW 必须先知道是 RGGB / BGGR / GRBG / GBRG 哪种,再用demosaic 算法(AHD / VNG / PPG / AMaZE / DCB / Igv …十多种)插出每个像素完整的 RGB。Fuji X-Trans 是个异类——6×6 X 形排列,普通 demosaic 算法对它效果差,得用专门的 X-Trans demosaic。② 12-14 bit/channel:不是 8 bit。这意味着比 JPEG 多 4-6 stop 动态范围(高光 / 暗部都能拉)。CMOS sensor 物理 ADC 通常 14 bit,Phase One 等中画幅可达 16 bit。RAW 把这些位深原样保留,后期"曝光 +2 / -2"才不会出 banding。③ 白平衡 / 色彩矩阵 / tone curve 全部未应用:相机只在 EXIF / MakerNotes 里"记录"拍摄时的 WB 是 5500K 还是 Auto,但不烘进像素。色彩矩阵(把 sensor 厂商特定的 R/G/B 响应曲线映射到标准 XYZ 色彩空间的 3×3 矩阵)也是同样:存为 metadata,由后期解码器应用。这是 RAW 跟 JPEG 的根本不同——后者是"决定都做完了的最终结果",前者是"原料 + 配方,但还没开火"。④ 容器基本都基于 TIFF/IFD:Canon CR2 / Nikon NEF / Sony ARW / Fuji RAF / Olympus ORF / Pentax PEF / Panasonic RW2 几乎全是 TIFF base 加私有 tag 区(0x8769 EXIF + 0x927C MakerNote + 厂商私有 tag id)。例外是 Canon CR3(2018 起,改用 ISOBMFF / HEIF 同源容器)和 Sigma X3F(自家完全独立)。这种"TIFF + 私有 tag"的设计意味着标准 TIFF reader 能看到大致结构,但解不出像素——必须靠厂商 SDK 或 LibRaw 的逆向工程。⑤ 解码必须靠厂商 SDK 或 LibRaw:Adobe Camera Raw / Lightroom 的 RAW 解码引擎是闭源商业;开源世界里 LibRaw(Dave Coffin 单文件 C 程序 dcraw 的继承者)通过逆向工程支持几乎所有相机 RAW,是 darktable / RawTherapee / digiKam / Fiji 的共同底层。dcraw 本身是工程史奇迹——Coffin 一个人 20 年维护一份单文件 C,支持上千款相机。LibRaw 接手后变成了正式 lib + 持续更新。
RAW is not one format but one idea realised dozens of times. Five common technical threads. ① Bayer mosaic CFA: every physical sensor pixel sits behind a single colour filter (one of R/G/B), arranged in a repeating 2×2. Each 2×2 has 2 green + 1 red + 1 blue (mirroring the eye's stronger luminance response to green). Reading a RAW requires first knowing the arrangement (RGGB / BGGR / GRBG / GBRG) and then running a demosaic algorithm (AHD / VNG / PPG / AMaZE / DCB / Igv … more than ten exist) to interpolate the full RGB at every pixel. Fuji X-Trans is the oddball — a 6×6 X-shaped pattern, on which generic Bayer demosaicers do poorly; it needs a dedicated X-Trans demosaic. ② 12-14 bit/channel, not 8. That means 4-6 stops more dynamic range than JPEG (highlights and shadows both recoverable). CMOS sensor ADCs are usually physically 14 bit; Phase One and similar medium-format gear reach 16 bit. RAW keeps every bit, so post-exposure "+2 / −2" doesn't band. ③ White balance, colour matrix, and tone curve are not applied. The camera only records in EXIF / MakerNotes that WB was set to 5500K or Auto — it does not bake it into the pixels. The colour matrix (a 3×3 mapping from the sensor's vendor-specific R/G/B response into standard XYZ) is likewise stored as metadata for the decoder to apply later. That is the deep difference from JPEG: JPEG is "all decisions, finalised"; RAW is "ingredients plus recipe, but the burner is off." ④ The container is almost always TIFF/IFD: Canon CR2 / Nikon NEF / Sony ARW / Fuji RAF / Olympus ORF / Pentax PEF / Panasonic RW2 are all TIFF-based with private tag regions (0x8769 EXIF + 0x927C MakerNote + vendor-private tag ids). Exceptions: Canon CR3 (since 2018, ISOBMFF — the HEIF / MP4 family) and Sigma X3F (entirely independent). The "TIFF + private tags" design means a generic TIFF reader can see the gross structure but can't decode the pixels — that requires the vendor SDK or LibRaw's reverse-engineering. 
⑤ Decoding leans on the vendor SDK or LibRaw: Adobe Camera Raw / Lightroom's RAW decoder is a closed-source commercial engine; in open source, LibRaw (the successor to Dave Coffin's single-file dcraw) supports nearly every camera RAW through reverse engineering and is the shared backend of darktable / RawTherapee / digiKam / Fiji. dcraw itself is an engineering miracle — Coffin maintained a single-file C program for 20 years that supported thousands of cameras solo. LibRaw took over and turned it into a proper library with continuous updates.
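To make the CFA geometry in ① concrete, here is the simplest demosaic that exists: the half-resolution "superpixel" scheme, which collapses each RGGB 2×2 cell into one RGB pixel. Production converters use far better algorithms (AHD / AMaZE and friends, as listed above); this sketch only illustrates where the samples sit in the mosaic:

```python
def superpixel_demosaic(bayer, width, height):
    """Half-resolution 'superpixel' demosaic of an RGGB mosaic:
    each 2x2 cell {R, G, G, B} collapses into one RGB pixel.
    `bayer` is a flat row-major list of sensor samples (one value per photosite)."""
    out = []
    for y in range(0, height, 2):
        row = []
        for x in range(0, width, 2):
            r  = bayer[y * width + x]            # top-left photosite: red filter
            g1 = bayer[y * width + x + 1]        # top-right: green
            g2 = bayer[(y + 1) * width + x]      # bottom-left: green (2 greens per cell)
            b  = bayer[(y + 1) * width + x + 1]  # bottom-right: blue
            row.append((r, (g1 + g2) // 2, b))   # average the two greens
        out.append(row)
    return out

# A synthetic 4x4 mosaic: the same flat RGGB cell repeated four times.
mosaic = [100, 50, 100, 50,
          50,  20, 50,  20,
          100, 50, 100, 50,
          50,  20, 50,  20]
print(superpixel_demosaic(mosaic, 4, 4))  # two rows of identical (100, 50, 20) pixels
```

Full-resolution demosaicers interpolate the two missing channels at every photosite instead of binning, which is where the algorithm zoo (and X-Trans's special treatment) comes from.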
图 32 · RAW 端到端处理流水线。左:相机内,sensor 出 14-bit Bayer raw,经厂商私有压缩(基本无损或可选有损)写到 .CR3 / .NEF / .ARW 文件,带 EXIF + MakerNotes 元数据。中:导入电脑后由 LibRaw / Adobe Camera Raw 用 DCP color profile + 厂商 SDK 解码,跑 demosaic → 白平衡 → 曝光 → tone curve → 色彩空间。右:输出多种最终格式——印刷归档用 16-bit TIFF,跨厂商归档用 DNG,网页用 JPEG,手机端用 HEIC / AVIF,新选项 JPEG XL。原 RAW 文件保留——这是 RAW 的全部价值:5 年后有更好的 demosaic 算法或调色风格,你可以重新出图。
Fig 32 · The end-to-end RAW pipeline. Left: in camera, the sensor produces 14-bit Bayer raw, vendor-private compression writes it to .CR3 / .NEF / .ARW with EXIF + MakerNotes metadata. Middle: imported to a host computer where LibRaw / Adobe Camera Raw decode it with the DCP colour profile and the vendor SDK, running demosaic → white balance → exposure → tone curve → colour space. Right: multiple final outputs — 16-bit TIFF for print masters and archive, DNG for vendor-neutral archive, JPEG for the web, HEIC / AVIF for mobile, and JPEG XL as the newer high-quality / low-size option. The original RAW is kept — this is the whole point of RAW: in five years, better demosaic algorithms or a new grade let you re-render the same shot.
| brand | format | year | bit depth | container |
|---|---|---|---|---|
| Canon | CR2 / CR3 | 2004 / 2018 | 14 | TIFF base · CR3 改 ISOBMFF |
| Nikon | NEF | 1999 | 12-14 | TIFF base |
| Sony | ARW | 2005 | 14 | TIFF base |
| Fujifilm | RAF | 2000 | 14 | TIFF base · X-Trans CFA |
| Olympus | ORF | 2003 | 12 | TIFF base |
| Adobe | DNG | 2004 | 12-32 | TIFF base · 公开 spec |
$ dcraw -v -w in.NEF # dcraw: 用相机 WB 解码 NEF 输出 PPM
$ dcraw -i -v in.CR2 # 只读 metadata 不解码
$ rawtherapee-cli -o out.tif -t -c in.CR2 # RawTherapee 命令行 RAW → 16-bit TIFF
$ darktable-cli in.ARW out.jpg # darktable 命令行 RAW → JPEG
$ exiv2 -p a in.RAF # 查 EXIF + MakerNotes
$ exiftool -a -G1 -s in.NEF # 万能元数据查看 · 厂商私有 tag 都列出来
$ libraw_unpack in.ARW # LibRaw 命令行: 输出未处理 Bayer raw
$ Adobe\ DNG\ Converter --convert in.CR3 out.dng # 转 DNG 归档
适用
USE FOR
- 商业摄影 / 婚礼 / 时尚 / 风光 / 影楼后期(必需 RAW)
- 专业新闻 / 体育摄影(后期裁剪 / 曝光宽容度)
- HDR 包围曝光合成源(三张 RAW 比三张 JPEG 信息多得多)
- 天文摄影 / 长时间曝光(暗部噪点处理依赖 14 bit)
- 需要"5 年后用新工具重出"的归档(DNG 推荐)
- Commercial / wedding / fashion / landscape / studio post-production (RAW required)
- Professional news / sports (post-crop, exposure latitude)
- HDR bracketed merging (three RAWs carry vastly more information than three JPEGs)
- Astrophotography / long exposures (shadow noise-handling needs 14-bit headroom)
- Archives expected to be re-rendered with future tools (DNG recommended)
反适用
AVOID
- 终端用户分享(没人想看 .NEF · 给 JPEG / HEIC)
- 实时预览 / 直播(解码慢)
- 移动端 / Web(浏览器不解 · 工具链没接)
- 手机日常拍照(ProRAW 例外,但 99% 场景普通 JPEG / HEIC 够用)
- 极小存储 / 极小内存设备(RAW 文件 20-100 MB / 张)
- Sharing with end users (nobody wants a .NEF — give them JPEG / HEIC)
- Live preview / streaming (decode is slow)
- Mobile / web (browsers don't decode; toolchains aren't wired)
- Everyday phone photography (ProRAW excepted; JPEG / HEIC suffices for 99 % of cases)
- Very-tight-storage / tight-memory devices (a RAW file is 20-100 MB)
| scope | commercial | open source | CLI / lib |
|---|---|---|---|
| vendor RAW (CR3 / NEF / ARW / RAF / ORF / RW2 / DNG …) | ✓✓✓ Adobe Lightroom · Camera Raw · Capture One · DxO PhotoLab · Phase One Capture · ON1 Photo RAW · Luminar | ✓✓ RawTherapee · darktable · ART · digiKam · UFRaw · Krita(导入)· GIMP(via plug-in)· Fiji | dcraw · libraw · exiftool · exiv2 · rawtherapee-cli · darktable-cli · Adobe DNG Converter |
DNG — Adobe 想统一 RAW
DNG — Adobe's attempt to unify RAW
"想做 RAW 的 PNG,部分成功。"
"Tried to be the PNG of RAW. Partial success."
2004 年 Adobe 看到 RAW 生态彻底碎掉:Canon CR2、Nikon NEF、Sony ARW、Fuji RAF、Olympus ORF、Pentax PEF、Panasonic RW2…几十种格式互不兼容,每出一款新相机 Adobe Camera Raw / Lightroom 就得加一个 decoder profile,工作量惊人;摄影师归档时也心慌——5 年后还能不能开一张今天的 .ARW?Adobe 推出 DNG(Digital Negative),基于开放的 TIFF/EP(TIFF Electronic Photography)扩展,目标只有一个:"一个公开 spec 的 RAW 格式装所有厂商的数据"。结果一半成功:Pentax / Leica / Hasselblad 选择原生输出 DNG,Apple 2020 年的 iPhone ProRAW 也用 DNG 包装;但 Canon / Nikon / Sony 三巨头坚持自家专有,从未给 DNG 让路。Adobe DNG Converter 工具可以把任意厂商 RAW 离线转 DNG 做归档,但转换过程可能有损 metadata——某些 MakerNotes 字段在 DNG 里没有标准对应,只能丢弃。
By 2004 Adobe saw the RAW ecosystem fully fragmented: Canon CR2, Nikon NEF, Sony ARW, Fuji RAF, Olympus ORF, Pentax PEF, Panasonic RW2 — dozens of mutually incompatible formats. Every new camera body forced Adobe Camera Raw / Lightroom to add another decoder profile, the workload was extraordinary, and photographers were nervous about archiving — would today's .ARW still open in five years? Adobe introduced DNG (Digital Negative), built on the open TIFF/EP (TIFF Electronic Photography) extension, with one goal: "one publicly specified RAW format that holds every vendor's data". The result was half a success: Pentax / Leica / Hasselblad chose to output DNG natively, and Apple's 2020 iPhone ProRAW wraps DNG too — but Canon / Nikon / Sony stuck with their proprietary formats and have never made room for DNG. The Adobe DNG Converter can offline-convert any vendor's RAW to DNG for archive, but conversion may lose some metadata — certain MakerNotes fields have no standard DNG equivalent and are simply dropped.
技术内核
Technical core
DNG 三件事撑起整个设计。① 基于 TIFF/EP 扩展:DNG 不是从零设计的容器,而是在 TIFF 6.0 + TIFF/EP(TIFF Electronic Photography,1998 ISO 12234-2)上加了一组规范化的私有 tag。这意味着已有 TIFF reader 能看到大致结构(虽然不能正确出图),也意味着 DNG spec 公开后,任何人能写 DNG 解码器——Adobe 故意降低门槛。② 厂商私有 metadata 透传:DNG 在容器里专门留一块 MakerNotes 区,把原厂的私有元数据(比如 Sony ARW 里的某个加密曝光块)原样塞进去,DNG 解码器看不懂也不会丢。这是 Adobe 跟厂商的"和解":你转 DNG 不会丢你的相机特定信息,某天厂商 SDK 想读还能读回去。③ 包含 demosaic 后的可选预览 + 完整原始 sensor 数据:DNG 文件里通常嵌一张 JPEG preview(给 Lightroom 缩略图秒开)+ 完整的 Bayer raw payload(给后期重新解码)。比起原厂 RAW 多 5-10% 体积,但换来"打开就有缩略图"的体验。某些 DNG 还可选 lossy compressed 模式(Adobe Lossy DNG,基于 JPEG 在 raw 域上做有损,体积砍 50% 但有损 RAW 的灵活度——主要给 iPhone ProRAW 用)。
DNG rests on three pillars. ① Built on TIFF/EP: DNG is not a from-scratch container; it sits on TIFF 6.0 + TIFF/EP (TIFF Electronic Photography, ISO 12234-2 from 1998) with a standardised set of private tags. Existing TIFF readers can see the gross structure (without rendering correctly), and once the DNG spec was public anyone could write a DNG decoder — Adobe deliberately lowered the barrier. ② Vendor metadata passthrough: DNG reserves a MakerNotes region in the container and stores the original vendor's private metadata (e.g. some encrypted exposure block from Sony ARW) verbatim; the DNG decoder needn't understand it, but it isn't dropped. This is Adobe's reconciliation gesture to vendors: converting to DNG doesn't lose your camera-specific information, and a vendor SDK could in principle read it back later. ③ Optional demosaiced preview + full original sensor data: a DNG file usually carries an embedded JPEG preview (so Lightroom thumbnails appear instantly) plus the complete Bayer raw payload (for re-decoding in post). The cost is 5-10 % more bytes than the original vendor RAW, in exchange for "opens with a thumbnail" UX. Some DNGs also enable a lossy mode (Adobe Lossy DNG — JPEG-style lossy in the raw domain, 50 % smaller, at the cost of some RAW flexibility — primarily targeted at iPhone ProRAW).
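Because DNG is just TIFF plus standardised tags, "is this file a DNG?" reduces to scanning IFD0 for the DNGVersion tag (id 50706 in Adobe's public DNG spec). A minimal sketch against hand-built fake files — a structural illustration, not a substitute for LibRaw or the Adobe DNG SDK:

```python
import struct

DNG_VERSION_TAG = 50706  # "DNGVersion" per Adobe's public DNG specification

def is_dng(buf):
    """Heuristic: a DNG is a TIFF whose IFD0 carries the DNGVersion tag."""
    if buf[:2] not in (b"II", b"MM"):
        return False
    order = "<" if buf[:2] == b"II" else ">"
    magic, offset = struct.unpack(order + "HI", buf[2:8])
    if magic != 42:
        return False
    (count,) = struct.unpack(order + "H", buf[offset:offset + 2])
    for i in range(count):                       # each IFD entry is 12 bytes, tag first
        (tag,) = struct.unpack(order + "H", buf[offset + 2 + 12 * i: offset + 4 + 12 * i])
        if tag == DNG_VERSION_TAG:
            return True
    return False

def tiny_tiff(tags):
    """Fake minimal little-endian TIFF with the given tag ids (values are dummies)."""
    head = struct.pack("<2sHI", b"II", 42, 8)
    ifd = struct.pack("<H", len(tags)) + b"".join(
        struct.pack("<HHII", t, 1, 4, 0) for t in tags) + struct.pack("<I", 0)
    return head + ifd

print(is_dng(tiny_tiff([256, DNG_VERSION_TAG])))  # True
print(is_dng(tiny_tiff([256, 257])))              # False — plain TIFF, not a DNG
```

This is the payoff of "not a from-scratch container": ten lines of generic TIFF walking is enough to classify the file.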
适用
USE FOR
- 厂商无关的 RAW 长期归档(摄影师整理 5-10 年素材)
- iPhone ProRAW(Apple 2020 起的官方 RAW 选项)
- Pentax / Leica / Hasselblad 原生输出
- Lightroom 默认导入选项("转 DNG 后导入")
- 需要可移植 RAW 的科研 / 文物数字化场景
- Vendor-neutral long-term RAW archive (5-10 years of photographer footage)
- iPhone ProRAW (Apple's official RAW option since 2020)
- Native output from Pentax / Leica / Hasselblad
- Lightroom's default import option ("convert to DNG on import")
- Research / cultural-heritage digitisation needing portable RAW
反适用
AVOID
- Canon / Nikon / Sony 主流相机原生输出(没有,只能事后转)
- 当前流水线已绑定厂商 SDK 的工作流(转换增加风险)
- 体积极敏感场景(DNG 通常比原厂 RAW 大 5-10%)
- 普通终端用户分享(用 JPEG / HEIC)
- Native output from mainstream Canon / Nikon / Sony bodies (none — can only convert)
- Workflows already bound to vendor SDKs (conversion adds risk)
- Strictly size-sensitive scenarios (DNG is typically 5-10 % larger than the original RAW)
- Sharing with regular end users (use JPEG / HEIC)
| scope | tools | libraries | CLI |
|---|---|---|---|
| DNG (.dng) | ✓✓ Adobe Camera Raw · Lightroom · Capture One · darktable · RawTherapee · Apple 系统(iPhone ProRAW 原生) | LibRaw(读)· Adobe DNG SDK(读 / 写)· libtiff(读基础结构) | Adobe DNG Converter(GUI + CLI)· dnglab(开源 RAW → DNG)· exiftool |
CR3 / NEF / ARW — 主流厂商的 RAW
CR3 / NEF / ARW — the big-three vendor RAWs
"三家相机巨头各做一套,都不兼容,都活得很好。"
"Three camera giants, three formats, none compatible — and all thriving."
Canon / Nikon / Sony 三家占数码相机市场 80% 以上,各家拥有完整的 DSLR / 无反 + 镜头生态(EF / RF / F / Z / E / FE 卡口等),RAW 格式是其专有生态的最后一环——锁定到自家 RAW 意味着用户后期工作流也跟着锁定(用 Canon DPP / Nikon NX Studio / Sony Imaging Edge 时体验最完整,跨家就得依赖 LibRaw 或商业第三方)。Canon 2018 把 CR2 升级 CR3,容器从 TIFF 换成 ISOBMFF(同 HEIF / MP4 spec 族)——为的是跟 HEIF 工具链共享 box 解析器,顺便能在 RAW 文件里塞 HEIF 缩略图、HEVC 视频片段、AAC 音频(给"双重曝光"和短视频功能用)。Nikon NEF 一直是 TIFF base,从 1999 年 D1 到现在 Z 系列没换。Sony ARW 也是 TIFF base,但有臭名昭著的"有损 RAW"模式——早期 α 系列默认输出"压缩 RAW"实际上是有损,被摄影社区批评后才允许选"未压缩"。三家都不公开 RAW spec,LibRaw / dcraw 全靠逆向工程支持。
Canon / Nikon / Sony together hold over 80 % of the digital-camera market, each with a complete DSLR / mirrorless + lens ecosystem (EF / RF / F / Z / E / FE mounts and so on), and the RAW format is the final piece of that proprietary stack — being locked into a vendor's RAW means your post workflow follows (the experience is most complete in Canon DPP / Nikon NX Studio / Sony Imaging Edge; cross-vendor work depends on LibRaw or commercial third parties). Canon upgraded CR2 to CR3 in 2018, swapping the container from TIFF to ISOBMFF (same family as HEIF / MP4) — to share box parsers with the HEIF toolchain and incidentally to embed HEIF thumbnails, HEVC video clips, and AAC audio in the RAW file (for "double-exposure" and short-video features). Nikon NEF has been TIFF-based since the 1999 D1 and the Z series has not changed it. Sony ARW is also TIFF-based, but with the notorious "lossy RAW" mode — early α bodies defaulted to "compressed RAW" that was actually lossy, and only after sustained criticism from the photography community was an "uncompressed" option allowed. None of the three publish RAW specs; LibRaw / dcraw support them entirely through reverse engineering.
技术内核
Technical core
三巨头 RAW 共三条线索。① 容器:CR3 是 ISOBMFF,CR2 / NEF / ARW 是 TIFF 系。Canon 2018 把 CR2 升级 CR3 时换了容器,目的就是跟现代 ISOBMFF 生态(HEIF / MP4 / AVIF / JPEG XL)对齐,顺便能在一个 .CR3 里塞 RAW + JPEG preview + HEVC 视频片段 + AAC 音频(给"双重曝光"和短视频功能用)。Nikon NEF 和 Sony ARW 还是传统 TIFF base——文件开头 TIFF header,接 IFD chain,每个 IFD 装一张图(thumbnail / preview JPEG / 真正 RAW),Sony 还在 IFD 里加私有 SR2 sub-IFD 装额外 metadata。② 各家私有有损 RAW 压缩。Canon 有 CRaw(visually lossless,体积砍 30-40%);Nikon 有 NEF Compressed(实际是把 14-bit raw 用一个查找表压成 12-bit 等价精度,有损但视觉无损);Sony 早期默认就是有损"压缩 RAW"(被批评后允许选"未压缩")。这些有损模式都是闭源算法,LibRaw 逆向支持但有时跟厂商官方解码结果略有偏差。③ "有损 RAW"概念的兴起。原本 RAW 的精神就是"无损保留 sensor 数据",但 14-bit 有损压缩(类似 Lossy DNG)能砍体积 50-70%、视觉几乎无损,对存储敏感的场景(连拍 / 4K 视频拍摄间隙拍照)很有吸引力。Canon CRaw / Sony Compressed RAW / Nikon NEF Compressed 都属于这类——长远看 RAW 文件正在向 "有损但视觉无损" 滑动,这跟 JPEG XL / HEIC 的设计哲学不谋而合。
Three threads connect the big-three RAWs. ① Container: CR3 is ISOBMFF, CR2 / NEF / ARW are TIFF-family. When Canon upgraded CR2 to CR3 in 2018 it swapped the container — the goal was to align with the modern ISOBMFF ecosystem (HEIF / MP4 / AVIF / JPEG XL) and incidentally to pack RAW + JPEG preview + HEVC video clips + AAC audio into one .CR3 (used by "double-exposure" and short-video features). Nikon NEF and Sony ARW remain traditional TIFF-based — the file opens with a TIFF header, then an IFD chain, each IFD holding one image (thumbnail / preview JPEG / actual RAW); Sony additionally puts a private SR2 sub-IFD inside the IFD to carry extra metadata. ② Each vendor's private lossy RAW compression. Canon offers CRaw (visually lossless, 30-40 % size reduction); Nikon offers NEF Compressed (effectively a lookup-table that compresses 14-bit raw to a 12-bit-equivalent precision, lossy but visually lossless); Sony's early default was a lossy "compressed RAW" (after criticism, an "uncompressed" option was added). These lossy modes are closed-source algorithms — LibRaw supports them via reverse engineering but its decoder occasionally diverges slightly from the vendor's. ③ The rise of "lossy RAW". RAW's original spirit is "preserve sensor data losslessly," but 14-bit lossy compression (similar to Lossy DNG) cuts size by 50-70 % with virtually no visible loss — attractive in storage-sensitive scenarios (burst shooting, stills between 4K video clips). Canon CRaw / Sony Compressed RAW / Nikon NEF Compressed all belong here. Long-term, RAW files are sliding toward "lossy but visually lossless" — coincidentally the same design philosophy as JPEG XL / HEIC.
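Thread ① can be demonstrated with a handful of bytes: TIFF-family RAWs open with the II/MM byte-order mark and magic 42, while CR3 opens with an ISOBMFF ftyp box whose major brand is reported as 'crx ' in published reverse-engineering notes (treat that brand string as an assumption). A small container sniffer, exercised on synthetic byte strings:

```python
import struct

def sniff_container(buf):
    """Classify a camera-raw file by container family (not by vendor decoding)."""
    if len(buf) >= 4 and buf[:2] in (b"II", b"MM"):
        order = "<" if buf[:2] == b"II" else ">"
        if struct.unpack(order + "H", buf[2:4])[0] == 42:   # TIFF magic
            return "TIFF-family (NEF / ARW / CR2 / DNG ...)"
    if len(buf) >= 12 and buf[4:8] == b"ftyp":              # ISOBMFF: size + 'ftyp' + brand
        brand = buf[8:12].decode("ascii", "replace")
        return f"ISOBMFF (ftyp brand {brand!r})"            # CR3 is said to use 'crx '
    return "unknown"

print(sniff_container(b"II" + struct.pack("<HI", 42, 8)))
print(sniff_container(struct.pack(">I", 24) + b"ftypcrx " + b"\0" * 8))
```

Sniffing tells you which parser family to hand the file to; actually decoding the pixels still needs LibRaw or the vendor SDK.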
适用
USE FOR
- 各家相机的原生输出(谁拍用谁的 RAW · 没第二选择)
- 用厂商官方软件做后期(Canon DPP / Nikon NX Studio / Sony Imaging Edge)
- 需要厂商完整 metadata 的场景(镜头校正 / 自动 WB 微调)
- 跟 HEIF / MP4 工具链协同(CR3 ISOBMFF 容器友好)
- Native output from each vendor's cameras (whoever you shoot with, that's your RAW — no choice)
- Post in the vendor's official software (Canon DPP / Nikon NX Studio / Sony Imaging Edge)
- Scenarios needing the vendor's complete metadata (lens correction, auto-WB fine-tuning)
- Pipelines aligned with the HEIF / MP4 toolchain (CR3's ISOBMFF container fits naturally)
反适用
AVOID
- 跨厂商 / 跨工具长期归档(转 DNG 更稳)
- 需要公开 spec 的科研归档(三家都不公开)
- 极度敏感的 bit-exact 比对(LibRaw 解出的结果跟厂商 SDK 可能略有偏差)
- 移动端 / Web 直接显示
- Cross-vendor / cross-tool long-term archiving (DNG is more reliable)
- Scientific archives needing public specs (none of the three publish)
- Highly bit-exact comparisons (LibRaw decodes can deviate slightly from vendor SDKs)
- Direct display on mobile / web
| scope | vendor | libraries | CLI |
|---|---|---|---|
| CR3 / NEF / ARW + 各家私有压缩 | Canon DPP · Nikon NX Studio · Sony Imaging Edge · Adobe Camera Raw / Lightroom · Capture One | LibRaw(逆向)· Canon EDSDK / Nikon SDK / Sony SDK(闭源 / 需申请) | libraw_unprocessed_raw · dcraw -D(输出原始 sensor 数据)· exiftool |
DICOM — 医学影像的封闭城堡 · 扛把子
DICOM — the walled city of medical imaging · heavy hitter
"它不是图片格式,是带 4000 个字段的医疗记录。"
"Not just an image format — a medical record with 4,000 attributes."
1980 年代,医院里 CT、MRI、X-ray、超声各家厂商各做一套协议:GE 的 CT 出不来 Siemens MRI 能读的文件,科室之间没法交换数据,医生想做一次跨设备影像会诊基本不可能。ACR(美国放射学院)与 NEMA(美国电气制造商协会)1985 年合作发布 ACR-NEMA 1.0,1993 年改名 DICOM 3.0 并加入网络协议。DICOM 同时定义了三件事:(a) 文件格式——一个 .dcm 既是图像也是患者完整病历;(b) 网络协议 DIMSE——医院里 CT 跟 PACS 之间的传输怎么走;(c) 元数据字典——4000+ 标准 tag 涵盖患者姓名、研究日期、modality、像素数据、窗宽窗位等任何医疗影像可能需要的字段。这套体系后来成了医院 IT 的事实标准——全球任何 CT / MRI / 超声 / 病理切片设备出厂时都说 DICOM,任何 PACS / EHR / 工作站默认输入也是 DICOM。30 年没人能挑战,因为它解决的不是"压像素",而是整个医疗影像的协议栈。
In the 1980s, hospital CT / MRI / X-ray / ultrasound vendors each defined their own protocols: a GE CT image couldn't be opened by a Siemens MRI station, departments couldn't exchange data, and a multi-modality consult was effectively impossible. ACR (American College of Radiology) and NEMA (National Electrical Manufacturers Association) jointly released ACR-NEMA 1.0 in 1985, then renamed it DICOM 3.0 in 1993 and added a network protocol. DICOM defines three things at once: (a) a file format — one .dcm is simultaneously an image and a complete patient record; (b) a network protocol, DIMSE — how images move between a CT scanner and a PACS server inside a hospital; (c) a metadata dictionary — 4,000+ standard tags covering patient name, study date, modality, pixel data, window width / level, and every medical-imaging attribute imaginable. The whole stack became the de-facto standard of hospital IT: every CT / MRI / ultrasound / pathology-slide device on Earth speaks DICOM out of the box, every PACS / EHR / workstation reads DICOM by default. 30 years later it remains unchallenged — because what it solved isn't "compressing pixels" but the entire medical-imaging protocol stack.
一个 .dcm 文件以 128 字节 preamble 开头,紧接 4 字节 magic 'DICM'。然后是 File Meta Information(group 0x0002 的 tag,核心是 transfer syntax UID,告诉解码器像素是 JPEG / JPEG-LS / 无压缩还是别的)。最后是 DataSet body——一长串 DataElement,每个是一个 (group, element) tag + 数据,既包含医疗 metadata(患者名 / modality / 窗宽窗位)也包含 (7FE0, 0010) PixelData——真正的图像。一份 .dcm 同时是图像 + 病历 + 设备信息 + 工作流上下文。
A .dcm file opens with a 128-byte preamble followed by the four-byte magic 'DICM'. Next is the File Meta Information (tags in group 0x0002 — most importantly the transfer-syntax UID, telling the decoder whether pixels are JPEG / JPEG-LS / uncompressed or something else). Last is the DataSet body — a long sequence of DataElements, each a (group, element) tag plus data, mixing medical metadata (patient name / modality / window width / level) and the (7FE0, 0010) PixelData — the actual image. One .dcm is image + record + device info + workflow context all at once.
技术内核
Technical core
DICOM 体系庞大但有六根支柱。① DataSet = 一组 DataElement,每个 DataElement 由 4 字段组成:Tag(group, element)+ VR(Value Representation,数据类型,如 PN=PersonName / DA=Date / US=UnsignedShort / OB=OtherByte)+ Length + Value。DataSet 可以嵌套(SQ 类型 = Sequence)。② 4000+ 标准 tag 由 DICOM 数据字典维护:(0010, 0010)=PatientName、(0008, 0060)=Modality、(0008, 0020)=StudyDate、(0028, 0010)=Rows、(0028, 1050)=WindowCenter、(7FE0, 0010)=PixelData…奇数 group 留给私有扩展(各厂商私有 tag 的栖息地)。③ Transfer Syntax UID 决定像素压缩方式——这是 DICOM 最关键的"开关":1.2.840.10008.1.2(无压缩, implicit VR)、.1.2.1(无压缩 explicit VR,最常见)、.1.2.4.50(JPEG baseline)、.1.2.4.80(JPEG-LS lossless,CT/MRI 默认)、.1.2.4.91(JPEG 2000 lossy)、.1.2.5(RLE)、.1.2.4.107(HEVC main profile,新)等几十种。同一份 .dcm 可以"换 transfer syntax"重新压缩,但 metadata 完全保留。④ Multi-frame:CT 和 MRI 一次扫描会出几十到几百张切片,DICOM 既支持每张切片一个 .dcm 文件(典型用法),也支持一个文件多帧(类似 GIF 多帧)——后者方便长 cine 序列。⑤ Window / Level metadata:CT 是 12-bit 灰度数据(范围 -1024~3071 Hounsfield Units),但屏幕只能显示 8-bit。DICOM 在 metadata 里存窗宽(WW)+ 窗位(WL)——告诉显示器"把哪段 12-bit 范围映射到 8-bit 灰度"。同一张 CT,医生可以切到"骨窗"(WW=2000, WL=300)看骨折,切到"软组织窗"(WW=400, WL=40)看肿瘤,切到"肺窗"(WW=1500, WL=-600)看肺纹理——一张图三种用途。⑥ DICOMweb(WADO-RS / STOW-RS / QIDO-RS):2010s 后基于 HTTP REST 的现代接口,正在逐步替代 1980s 设计的 DIMSE TCP 协议——本质上是把 DICOM 网络层从 OSI 7 层改造成 HTTP 友好版,方便跟现代云原生 PACS / Web 浏览器集成。
DICOM is sprawling but rests on six pillars. ① DataSet = a list of DataElements; each DataElement has four fields: Tag (group, element) + VR (Value Representation — the type, e.g. PN=PersonName / DA=Date / US=UnsignedShort / OB=OtherByte) + Length + Value. DataSets can nest (the SQ / Sequence type). ② 4,000+ standard tags, maintained by the DICOM Data Dictionary: (0010, 0010)=PatientName, (0008, 0060)=Modality, (0008, 0020)=StudyDate, (0028, 0010)=Rows, (0028, 1050)=WindowCenter, (7FE0, 0010)=PixelData… Odd-numbered groups are reserved for private extensions (where vendor-private tags live). ③ Transfer Syntax UID decides pixel compression — DICOM's most important switch: 1.2.840.10008.1.2 (uncompressed, implicit VR), .1.2.1 (uncompressed, explicit VR, most common), .1.2.4.50 (JPEG baseline), .1.2.4.80 (JPEG-LS lossless, the CT/MRI default), .1.2.4.91 (JPEG 2000 lossy), .1.2.5 (RLE), .1.2.4.107 (HEVC main profile, new), and dozens more. The same .dcm can be "transcoded to a new transfer syntax" — the metadata survives untouched. ④ Multi-frame: a CT or MRI scan produces tens to hundreds of slices; DICOM supports either one file per slice (the typical layout) or one file holding many frames (like a multi-frame GIF) — the latter is handy for long cine sequences. ⑤ Window / Level metadata: CT data is 12-bit greyscale (range −1024 to 3071 Hounsfield Units) but a display only shows 8 bits. DICOM stores window width (WW) + window level (WL) in metadata — telling the viewer "map this slice of the 12-bit range to 8-bit greys." A single CT can be re-windowed: "bone window" (WW=2000, WL=300) for fractures, "soft-tissue window" (WW=400, WL=40) for tumours, "lung window" (WW=1500, WL=−600) for lung markings — one image, three purposes. 
⑥ DICOMweb (WADO-RS / STOW-RS / QIDO-RS): a post-2010 HTTP-REST modernisation that is steadily replacing the 1980s-era DIMSE TCP protocol — fundamentally re-architecting DICOM's network layer from OSI-7 into something HTTP-friendly, so cloud-native PACS and web browsers can integrate cleanly.
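Pillar ① is small enough to parse by hand. A sketch for Explicit VR Little Endian, restricted to short-form VRs (long-form VRs such as OB / OW / SQ use 2 reserved bytes plus a 4-byte length and are out of scope here); real code should use pydicom or DCMTK:

```python
import struct

def parse_elements(buf):
    """Parse DICOM DataElements (Explicit VR Little Endian, short-form VRs only).
    Each element: tag = (group, element) as two LE uint16, then a 2-char VR,
    then a uint16 length, then the value bytes."""
    pos, out = 0, {}
    while pos < len(buf):
        group, elem = struct.unpack("<HH", buf[pos:pos + 4])
        vr = buf[pos + 4:pos + 6].decode("ascii")
        (length,) = struct.unpack("<H", buf[pos + 6:pos + 8])
        value = buf[pos + 8:pos + 8 + length]
        out[(group, elem)] = (vr, value)
        pos += 8 + length
    return out

# Hand-built two-element dataset: Modality (0008,0060) CS + PatientName (0010,0010) PN.
data = (struct.pack("<HH", 0x0008, 0x0060) + b"CS" + struct.pack("<H", 2) + b"CT"
      + struct.pack("<HH", 0x0010, 0x0010) + b"PN" + struct.pack("<H", 8) + b"DOE^JOHN")
elems = parse_elements(data)
print(elems[(0x0008, 0x0060)])  # ('CS', b'CT')
print(elems[(0x0010, 0x0010)])  # ('PN', b'DOE^JOHN')
```

The same (group, element) + VR + length + value framing carries everything from patient names to the multi-megabyte PixelData — which is why one dictionary and one parser cover the whole format.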
1.2.840.10008 是 DICOM 的 OID 根命名空间,后缀决定算法。CT / MRI 实际部署里 JPEG-LS lossless(.4.80) 是默认——因为医疗影像必须无损,而 JPEG-LS 对 12-16 bit 灰度高效。JPEG 2000 lossy 主要在科研和非诊断场景。HEVC(.4.107)是 2017 年加入的新选项,主要给超声 cine 和 4D 数据用。同一份 .dcm 可以离线 transcode 换 transfer syntax,metadata 不动,只换像素压缩——这是 DICOM 灵活性的关键。
1.2.840.10008 is DICOM's OID root namespace; the suffix selects an algorithm. In real CT / MRI deployments JPEG-LS lossless (.4.80) is the default — medical imagery must be lossless and JPEG-LS handles 12-16 bit greyscale efficiently. Lossy JPEG 2000 is mostly for research and non-diagnostic uses. HEVC (.4.107) was added in 2017 mainly for ultrasound cine loops and 4D data. The same .dcm can be transcoded offline to a different transfer syntax — metadata stays untouched, only the pixel compression changes — which is the heart of DICOM's flexibility.
图 33 · DICOM 端到端医疗 IT 流水线。左:CT 设备扫描出 12-bit Hounsfield,经 reconstruction 后写成 N 张 .dcm 切片(每张带 4000 tag,默认 JPEG-LS lossless),通过 DIMSE 的 C-STORE 命令推到医院 PACS。中:PACS 服务器(Orthanc / dcm4che)按"Patient → Study → Series → Instance"四级层次索引,对外提供 DICOMweb HTTP REST API——QIDO-RS 按 tag 搜、WADO-RS 取像素、STOW-RS 上传。右:三条下游路径——① 放射科医生工作站(OsiriX / Horos / RadiAnt)应用窗宽窗位 + MPR + 3D 重建,出诊断报告;② AI 模型(CheXpert / nnU-Net / MONAI)读 .dcm pixel + metadata 出 segmentation;③ EHR(Epic / Cerner)用 FHIR ImagingStudy 资源把影像挂到患者档案上。整个医院 IT 体系的"像素 + 协议 + 字典"层全是 DICOM——30 年没人能挑战。
Fig 33 · The end-to-end DICOM medical-IT pipeline. Left: a CT scanner produces 12-bit Hounsfield data, runs reconstruction, writes N .dcm slices (each carrying 4,000 tags, default JPEG-LS lossless), and pushes them to the hospital PACS via the DIMSE C-STORE command. Middle: the PACS server (Orthanc / dcm4che) indexes data along the four-level "Patient → Study → Series → Instance" hierarchy and exposes a DICOMweb HTTP REST API — QIDO-RS searches by tag, WADO-RS retrieves pixels, STOW-RS uploads. Right: three downstream consumers — ① the radiologist workstation (OsiriX / Horos / RadiAnt) applies window/level + MPR + 3D rendering and produces a written report; ② AI models (CheXpert / nnU-Net / MONAI) read .dcm pixel + metadata to output segmentations; ③ the EHR (Epic / Cerner) uses the FHIR ImagingStudy resource to attach the imaging to a patient record. The entire hospital-IT stack — pixel layer, protocol layer, dictionary layer — runs on DICOM. 30 years and no one has displaced it.
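The window/level remapping the workstation applies (pillar ⑤) is plain linear arithmetic. A simplified sketch — DICOM's official VOI LUT linear function adds half-pixel edge conditions omitted here:

```python
def apply_window(hu, ww, wl):
    """Map a Hounsfield value to a 0-255 display grey with a linear window.
    Simplified form of DICOM's linear VOI LUT: values below the window floor
    clamp to black, above the ceiling to white, linear in between."""
    lo = wl - ww / 2.0                       # window floor in HU
    x = (hu - lo) / ww * 255.0               # linear ramp across the window
    return int(min(255, max(0, round(x))))   # clamp to displayable range

# The same voxel (+60 HU, typical soft tissue) under three standard windows:
for name, ww, wl in [("bone", 2000, 300), ("soft-tissue", 400, 40), ("lung", 1500, -600)]:
    print(name, apply_window(60, ww, wl))    # bone 97 · soft-tissue 140 · lung 240
```

One stored slice, three renderings — the pixel data never changes, only the (WW, WL) pair in metadata.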
| transfer syntax UID | codec | lossy? | typical use |
|---|---|---|---|
| 1.2.840.10008.1.2 | uncompressed (implicit VR) | — | small images / legacy |
| 1.2.840.10008.1.2.1 | uncompressed (explicit VR) | — | most common · default |
| 1.2.840.10008.1.2.4.50 | JPEG baseline | lossy | low-priority / preview |
| 1.2.840.10008.1.2.4.80 | JPEG-LS lossless | lossless | CT / MRI default |
| 1.2.840.10008.1.2.4.91 | JPEG 2000 lossy | lossy | research / non-diagnostic |
| 1.2.840.10008.1.2.5 | RLE | lossless | simple / integer |
| 1.2.840.10008.1.2.4.107 | HEVC main profile | lossy | ultrasound cine / 4D (new) |
$ dcmdump in.dcm # DCMTK: 看所有 tag + value
$ dcm2pnm in.dcm out.pnm # DICOM → PNM (应用 W/L)
$ dcmconv -ti in.dcm out.dcm # 改 transfer syntax(转码)
$ dcmodify -ea "(0010,0010)" in.dcm # 删除 PatientName tag(匿名化)
$ python -c "import pydicom; print(pydicom.dcmread('in.dcm'))" # pydicom 读 .dcm
$ orthanc-cli upload http://pacs:8042/ in.dcm # 上传到 Orthanc PACS
$ curl http://pacs/dicom-web/studies # DICOMweb QIDO-RS 查 studies
$ TotalSegmentator -i ct.dcm -o seg/ # AI 分割: 100+ 解剖结构
适用
USE FOR
- 医学影像所有 modality(CT / MRI / X-ray / 超声 / 病理切片 / 核医学 / 心电)
- 医院 PACS / EHR 集成(没第二选择)
- 医学 AI 模型训练 / 推理(DICOM 是事实输入格式)
- 跨设备 / 跨医院影像交换(必须遵循)
- 公开医学影像数据集发布(MIMIC-CXR / BraTS / RSNA / NIH)
- 临床 GxP / HIPAA / GDPR 合规归档
- Every medical-imaging modality (CT / MRI / X-ray / ultrasound / pathology slides / nuclear medicine / ECG)
- Hospital PACS / EHR integration (no alternative)
- Medical-AI training / inference (DICOM is the de-facto input format)
- Cross-device / cross-hospital image exchange (mandatory)
- Releasing public medical-imaging datasets (MIMIC-CXR / BraTS / RSNA / NIH)
- Clinical GxP / HIPAA / GDPR compliant archival
反适用
AVOID
- 任何非医疗场景(几乎是定义)
- 消费级 / Web / 手机端图片(没浏览器支持)
- 纯科研非临床(NIfTI / NRRD / Zarr 更轻)
- 艺术 / 摄影 / 设计(用错赛道)
- 需要小文件 / 低复杂度的场景(DICOM 元数据开销大)
- Any non-medical scenario (almost by definition)
- Consumer / web / mobile imagery (no browser support)
- Pure-research non-clinical work (NIfTI / NRRD / Zarr are lighter)
- Art / photography / design (wrong lane entirely)
- Scenarios needing tiny files / low complexity (DICOM metadata overhead is heavy)
| scope | commercial | open source | CLI / lib |
|---|---|---|---|
| DICOM 文件 + DIMSE + DICOMweb | ✓✓✓ GE Centricity · Siemens syngo · Philips IntelliSpace · Epic Radiant · Sectra · Agfa IMPAX · Carestream | ✓✓ Orthanc · dcm4che · DCMTK · OHIF Viewer · Cornerstone.js · OsiriX Lite · Horos · Weasis · 3D Slicer · pydicom · MONAI | dcmdump · dcm2pnm · dcmconv · dcmodify · pydicom · SimpleITK · orthanc-cli · dcm4che-tools |
SVG — 不是位图,但 web 里就是图
SVG — not a bitmap, but on the web it just is the image
"不存像素,存数学。屏幕多大,它就多清晰。"
"Stores math, not pixels — sharp at any size."
1990 年代末,W3C 想要一个"web 上的矢量"——能在浏览器里直接渲染、能跟 HTML / CSS / JS 共存的开放格式。当时的对手是 Macromedia 的私有矢量动画 Flash(2005 年被 Adobe 收购),以及微软推的 VML(Vector Markup Language)。1999 年 W3C 启动 SVG WG,2001 年发布 SVG 1.0 Recommendation。SVG 的核心是 XML + 矢量数学:一份 .svg 文件就是一棵 DOM 树,根 <svg> 下挂着 <rect> / <circle> / <path> / <text> 等几何元素,辅以 <linearGradient> / <filter> 等装饰。整张图被嵌入到 HTML 的 DOM 里,可被 CSS 染色、被 JS 操控、被屏幕阅读器朗读。最关键的:它不是被栅格化后才渲染——浏览器在屏幕分辨率上重新计算每条 path,所以它在 1×、2×、3× DPR 上都同等清晰。这是位图永远做不到的事。最终,SVG 战胜 VML(微软 2010 起放弃),又熬过了 Flash——2010 年 Apple 以安全和性能为由把 Flash 挡在 iOS 之外,Flash 自此走向衰落,2020 年底 Adobe 正式停止支持,SVG 成为 web 矢量的唯一标准。
In the late 1990s, W3C wanted a "web-native vector" — an open format that could be rendered in the browser and live alongside HTML / CSS / JS. The contenders of the day were Flash (Macromedia's proprietary vector-animation format, acquired by Adobe in 2005) and Microsoft's VML (Vector Markup Language). W3C started the SVG WG in 1999 and shipped SVG 1.0 Recommendation in 2001. SVG's core is XML + vector math: an .svg file is a DOM tree — a root <svg> with <rect> / <circle> / <path> / <text> geometry inside, decorated by <linearGradient> / <filter> and friends. The whole image lives inside HTML's DOM — colourable by CSS, scriptable by JS, readable by screen readers. Most crucially: SVG is not rasterised first and rendered second — the browser recomputes every path at the screen's true resolution, so it stays equally sharp at 1×, 2×, 3× DPR. That is something a bitmap can never do. SVG eventually defeated VML (Microsoft abandoned it after 2010) and outlived Flash (its decline began in 2010 when Apple barred it from iOS over security and performance concerns; Adobe formally retired it at the end of 2020), leaving SVG as the web's sole vector standard.
d 属性是一段命令字符串:M 移笔(move,起点)/ L 直线(line)/ C 三次贝塞尔(cubic)/ Q 二次贝塞尔(quadratic)/ A 椭圆弧(arc)/ Z 闭合(close)。一条 path 就是一串这种命令拼起来的轨迹,引擎按命令顺序"画"一遍。所有矢量字体、Adobe Illustrator 输出、绝大多数图标 SVG 都是这种 path —— 圆和矩形只是它的语法糖。
d attribute is a command string: M move (start point) / L line / C cubic Bézier / Q quadratic Bézier / A elliptical arc / Z close. A path is just that command string strung together; the engine "draws" it in order. All vector fonts, Adobe Illustrator output and the vast majority of icon SVGs are paths like this — <rect> and <circle> are merely sugar.
同一份 SVG(viewBox="0 0 40 40")在 CSS 上分别按 40 / 60 / 80 px 渲染,内部的圆不被预栅格化再放大,而是浏览器在当前屏幕分辨率下重新解一遍 path/circle 的几何方程。这就是矢量"无限清晰"的本质 —— 不是"图变大",而是"图被重新画了一遍"。
The same SVG (viewBox="0 0 40 40") is rendered at 40 / 60 / 80 px in CSS; the inner circle is not pre-rasterised then scaled — the browser re-solves the geometric equation of path / circle at the current screen resolution. That is the essence of vector "infinite sharpness" — not "the picture got bigger," but "the picture was redrawn."
feGaussianBlur 高斯模糊;feColorMatrix 颜色矩阵(等价于 LUT,可做去色 / 偏色 / 反相);feOffset 像素偏移(常用作 drop shadow 第一步);feMerge 把若干层合并(把 offset+blur 跟原图叠成投影)。SVG filter 是一条链,跟 Photoshop 的滤镜栈同源 —— 实际上 Photoshop / Sketch / Figma 的"投影 / 内阴影 / 模糊"导出 SVG 时就是翻译成这几个 primitive。
feGaussianBlur for blur; feColorMatrix (an LUT — desaturate, tint, invert); feOffset for pixel translation (the first step of a drop shadow); feMerge to stack outputs (combine offset+blur with the source for a shadow). An SVG filter is a chain — the same lineage as Photoshop's filter stack — and indeed Photoshop / Sketch / Figma translate "drop shadow / inner shadow / blur" into exactly these primitives when exporting SVG.
技术内核
Technical core
SVG 的工程内核可分六块。① XML 文档——不是二进制,是文本,因此可被 grep / diff / git blame / sed / 任何文本工具处理,这一点跟 PNG / JPEG 完全相反。优点是版本管理友好、可程序生成、可手写;缺点是大体积场景(几十万节点的复杂可视化)解析慢、内存大。② shapes + path——基本几何元素 <rect> / <circle> / <ellipse> / <line> / <polyline> / <polygon>,加最强的 <path>(命令字符串拼出任意曲线 — M/L/H/V 直线类、C/S/Q/T 贝塞尔、A 椭圆弧、Z 闭合)。所有矢量字体、所有 Illustrator 输出本质都是 path。③ 装饰 = gradient + pattern + filter——<linearGradient> / <radialGradient> 渐变;<pattern> 平铺纹理;<filter> 是滤镜链,提供 feGaussianBlur(模糊)/ feColorMatrix(LUT)/ feOffset(偏移)/ feMerge(合并)/ feComposite(合成)/ feTurbulence(柏林噪声)/ feMorphology(膨胀腐蚀)等 20+ primitive,串联起来等价于 Photoshop 滤镜栈,Sketch / Figma 的"投影"导出 SVG 就是 feOffset+feGaussianBlur+feMerge 三件套。④ CSS 染色 + class——SVG 元素接受 fill / stroke / opacity / transform 等表现属性,也接受 CSS。一个图标 SVG 在不同 dark/light theme 下只需切换 CSS 变量,不必重新导出;currentColor 关键字让 fill 跟随父元素文字颜色,这是图标库(Heroicons / Lucide / Phosphor)的核心机制。⑤ JS 操控——每个 SVG 元素都是 DOM Node,document.querySelector('circle').setAttribute('cx', 100) 直接生效。这是 D3.js / Observable Plot / Chart.js / Recharts 这一整代数据可视化库的根基 —— 它们的真正能力不是"画 SVG",而是"把数据 join 到 SVG DOM 元素上,让 SVG DOM 跟随数据更新"。⑥ 动画三条路:(a) SMIL(Synchronized Multimedia Integration Language)在 SVG 1.0 时定的 <animate> / <animateTransform> / <animateMotion>,声明式但被 Chrome 一度想废弃,现在保留但不推荐;(b) CSS animation + transform / opacity,现代主流,跟 HTML 一致;(c) JS / requestAnimationFrame,最灵活,D3 / GSAP / anime.js 都用。在它们之上,Lottie(2017,Airbnb 的 Bodymovin AE 插件 → JSON,JS lib 渲染)是矢量动画的现代补充 —— 设计师在 After Effects 里做动画,导成 JSON,Lottie lib 在浏览器 / iOS / Android 上以 SVG 或 Canvas 渲染。底层渲染路径仍然是 SVG / Canvas 的几何指令。
SVG's engineering core breaks into six pieces. ① XML document — text, not binary, so it works with grep / diff / git blame / sed / any text tool — the polar opposite of PNG / JPEG. The upside is version-control friendliness, scriptability, hand-authorability; the downside is that giant scenes (a viz with 100k nodes) parse slowly and bloat memory. ② Shapes + path — primitive geometry <rect> / <circle> / <ellipse> / <line> / <polyline> / <polygon>, plus the killer <path> (a command string composing arbitrary curves — M/L/H/V for straight, C/S/Q/T for Béziers, A for elliptical arcs, Z to close). Every vector font and every Illustrator export is essentially a path. ③ Decoration = gradient + pattern + filter — <linearGradient> / <radialGradient>; <pattern> for tiles; <filter> is a filter chain with 20+ primitives — feGaussianBlur, feColorMatrix (LUT), feOffset, feMerge, feComposite, feTurbulence (Perlin noise), feMorphology — which, strung together, are equivalent to Photoshop's filter stack. Sketch / Figma's "drop shadow" export is exactly feOffset + feGaussianBlur + feMerge. ④ CSS styling + class — SVG elements accept presentation attributes (fill / stroke / opacity / transform) and also CSS. An icon SVG can switch dark/light theme via a single CSS variable; currentColor lets fill inherit the parent's text colour — the central mechanism behind icon libraries like Heroicons / Lucide / Phosphor. ⑤ JS manipulation — every SVG element is a DOM Node, so document.querySelector('circle').setAttribute('cx', 100) just works. That is the foundation of an entire generation of dataviz libraries — D3.js, Observable Plot, Chart.js, Recharts — whose true superpower isn't "drawing SVG" but "joining data to SVG DOM nodes so the DOM updates with the data."
⑥ Three animation paths: (a) SMIL (Synchronized Multimedia Integration Language) defined the original <animate> / <animateTransform> / <animateMotion> — declarative, briefly threatened with deprecation by Chrome, now retained but not recommended; (b) CSS animation + transform / opacity, the modern mainstream — same as HTML; (c) JS / requestAnimationFrame, the most flexible — D3 / GSAP / anime.js. Layered on top, Lottie (2017, Airbnb's Bodymovin After-Effects plugin → JSON + JS runtime) is the modern vector-animation supplement: designers animate in AE, export to JSON, and Lottie renders in browsers / iOS / Android via SVG or Canvas. The underlying render path is still SVG / Canvas geometric drawing.
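Pieces ①, ②, and ⑤ can be sketched without a browser. A minimal Python illustration using the stdlib's ElementTree (the element and attribute names follow the SVG spec; the build itself is illustrative, not how any real renderer works): construct the DOM tree, then mutate one attribute the way JS would.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # serialise with a default xmlns

# Root <svg> with a viewBox: all child geometry lives in a 40x40 user
# space, re-rasterised by the renderer at whatever CSS size / DPR applies.
svg = ET.Element(f"{{{SVG_NS}}}svg", viewBox="0 0 40 40")

# <circle> is sugar; <path> is the real primitive: M move, L line, Z close.
ET.SubElement(svg, f"{{{SVG_NS}}}circle",
              cx="20", cy="20", r="10", fill="currentColor")
ET.SubElement(svg, f"{{{SVG_NS}}}path",
              d="M 5 5 L 35 5 L 20 30 Z", fill="none", stroke="black")

# "JS manipulation": every element is a tree node, so mutating one
# attribute is all it takes to change the picture - no re-export needed.
svg.find(f"{{{SVG_NS}}}circle").set("cx", "25")

doc = ET.tostring(svg, encoding="unicode")
```

Because the file is plain XML, `doc` can be diffed, grepped, and versioned like any other source file — the property piece ① describes.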
图 36 · SVG 完整处理流程。XML 源被浏览器解析为 SVG DOM 树,CSS 样式与 JS 操控直接作用于 DOM 节点(可热更新),布局阶段按 viewBox + transform 计算几何,可选的 filter chain(由 feOffset / feBlur / feMerge 等 primitive 串联)在栅格化前应用,最后才在当前屏幕 DPR 上栅格化为像素。这条流水线跟 PNG / JPEG 走"解码 → 完整位图 → 缩放采样"完全不同 — SVG 永远在最后一刻、按设备分辨率重画一次,所以无论 1× / 2× / 3× 屏都同等清晰。CSS 染色 / JS 数据可视化 / filter 投影都发生在 DOM 上,不需要重新导出文件 — 这是 D3 / Heroicons / Figma 设计交付能跑起来的工程基础。
Fig 36 · SVG's full processing pipeline. The XML source is parsed by the browser into a live SVG DOM tree; CSS and JS act directly on the DOM (hot-reloadable); layout computes geometry from viewBox + transform; an optional filter chain (composed of feOffset / feBlur / feMerge primitives) runs before rasterisation; only at the very end is the result rasterised at the current screen DPR. This differs fundamentally from PNG / JPEG's "decode → full bitmap → resample on resize" — SVG is redrawn once, at the device's true resolution, in the last moment, so it stays equally sharp on 1× / 2× / 3× displays. CSS theming, JS-driven dataviz, filter shadows — all on the DOM, no re-export needed — that's the engineering foundation under D3, Heroicons and Figma's "design hand-off" workflow.
| feature | SVG | PNG | PDF | Lottie |
|---|---|---|---|---|
| 缩放无损 | ✓ | ✗ | ✓ | ✓ |
| Web 嵌入 | ✓ inline / img | ✓ img | ✗(大多数) | ✓ JS lib |
| 动画 | SMIL / CSS / JS | APNG | 无 | JSON timeline |
| 文本可搜索 | ✓ XML | ✗ | ✓ | partial |
| 体积(图标) | ~1 KB | ~3-10 KB | ~10 KB | ~30 KB |
$ svgo in.svg -o out.svg # 优化 SVG · 删冗余、合并 path
$ inkscape --export-png=out.png in.svg # SVG → PNG · CLI 友好
$ resvg in.svg out.png # Rust SVG 渲染器 · 服务端常用
$ lottie2html in.json out.html # Lottie JSON → HTML/SVG 静态化
$ npx @figma/code-connect svg in.fig # Figma → SVG 导出
适用
USE FOR
- 图标 / logo / UI 装饰(Heroicons / Lucide / Phosphor)
- 数据可视化(D3.js / Observable Plot / Chart.js / Recharts)
- 需要 Retina / 4K 屏天生免疫的任何图
- 需要 CSS 变量切 dark/light theme 的图
- 需要 currentColor 跟随父元素文字颜色的图标
- 简单动画 / loader / 微交互(CSS animate)
- 设计交付 / 跨 DCC(Figma / Sketch / Illustrator 都原生输出)
- Icons / logos / UI decoration (Heroicons / Lucide / Phosphor)
- Data visualization (D3.js / Observable Plot / Chart.js / Recharts)
- Anything that must stay sharp on Retina / 4K displays
- Anything theme-able via CSS variables (dark / light)
- Icons that follow parent text colour via currentColor
- Simple loaders / micro-interactions (CSS animation)
- Design hand-off across DCCs (Figma / Sketch / Illustrator all export SVG)
反适用
AVOID
- 照片(没有压缩比优势 · 体积爆炸)
- 复杂栅格内容(渐变噪点 / 真实纹理 / 模糊)
- 百万节点级 dataviz(DOM 解析 + 重排极慢 · 改用 Canvas / WebGL)
- 外部嵌入需执行 JS 的场景(<img src> 模式 JS 被浏览器禁)
- Photos (no compression edge — files explode)
- Complex raster content (noisy gradients, real textures, blur)
- Million-node dataviz (DOM parse + reflow are slow — use Canvas / WebGL)
- Embeds that need to run JS (browsers disable JS in <img src> mode)
| scope | browsers / runtimes | editors / DCC | CLI |
|---|---|---|---|
| SVG (W3C) | ✓✓ 所有现代浏览器原生 · React / Vue / Svelte 原生 JSX 支持 · iOS / Android (WebView · React Native SVG) · Skia / Cairo / Core Graphics 引擎 | ✓✓ Figma · Sketch · Illustrator · Inkscape · Affinity Designer · Boxy SVG · 所有现代设计工具均原生导出 | svgo · inkscape · resvg · rsvg-convert · imagemagick convert · cairosvg |
PDF — 容器之王
PDF — king of containers
"你以为它是文档,其实是个能装一切的容器 —— 矢量、位图、字体、JS。"
"You think it's a document. It's a container holding everything — vectors, bitmaps, fonts, JavaScript."
1993 年 Adobe Acrobat 1.0 推出 PDF(Portable Document Format),目标是"任何打印机、任何屏幕看到的内容一致"——这件事在 1993 年其实没解决:你在 Mac 上排好的版到 Windows 打印机上字体丢失、布局错位是日常,LaTeX / TeX 那种把布局序列化进文件的思路在工业界没普及。Adobe 创始人 John Warnock 决定把自家的 PostScript(打印机用的页面描述语言)简化、加上随机访问索引,做成一个面向查看与归档的格式 — 这就是 PDF。基于 PostScript 简化,固定页面布局,可嵌字体 + 位图 + 矢量 + JS + 表单。30 年后成为合同、表单、印刷、归档的事实标准,2008 年 PDF 1.7 成为开放 ISO 32000 标准,Adobe 对自家格式失去专有控制权 —— 这个让步反而是 PDF 真正普及的关键。
In 1993 Adobe Acrobat 1.0 launched PDF (Portable Document Format) with a single ambition: "any printer, any screen, the same page." That problem was genuinely unsolved at the time — setting a layout on a Mac and printing it from Windows routinely produced missing fonts and broken pages, while LaTeX / TeX's idea of serialising the layout into the file hadn't reached industry. Adobe co-founder John Warnock chose to simplify his own PostScript (the page-description language inside printers), add a random-access index, and ship it as a viewer / archival format — that is PDF: built on simplified PostScript, with a fixed page layout, able to embed fonts + bitmaps + vectors + JS + forms. Thirty years later it is the de-facto standard for contracts, forms, print, and archival. In 2008 PDF 1.7 became the open ISO 32000 standard and Adobe lost proprietary control of its own format — the concession that finally made PDF universal.
技术内核
Technical core
PDF 的工程内核四件事。① 基于 PostScript 改良的页面描述语言—— 矢量绘图原语(m moveto / l lineto / c curveto / S stroke / f fill),跟 SVG path 命令同源思想。但 PDF 把 PostScript 的"图灵完备 + 解释执行"裁掉了,只保留可渲染的子集,加上 xref 随机访问索引,让 1000 页文件能任意翻页。② 可嵌入字体 + 位图 + 矢量—— 字体支持 Type1 / TrueType / OpenType / CID(CJK 大字符集);图像支持 JPEG / JBIG2(C40·黑白扫描)/ CCITT G4(传真)/ JPEG 2000(C8)/ Flate(zlib)等多种 codec — 整个 PDF 文件本质上是一个容器,实际像素由内嵌的 codec 解码。③ 分页 + 表单 + JS + 数字签名—— Page Tree 支持长文档;AcroForm / XFA 表单(可填写、可提交);Action 对象可绑定 JavaScript(报税表 / 计算字段);Signature 字段配合 PKI 数字签名让 PDF 在合同 / 法律文件场景立足。④ 归档子集 PDF/A —— ISO 19005,2005 起定义,禁用透明 / JS / 外部依赖 / 加密,要求嵌入所有字体 — 是 PDF 的 strict 子集,目的是"30 年后还能打开"。法律 / 政府 / 科研论文归档是 PDF/A 的主战场。
PDF's engineering core is four things. ① Page-description language descended from PostScript — vector primitives (m moveto / l lineto / c curveto / S stroke / f fill), kindred to SVG's path commands. But PDF removed PostScript's "Turing-complete interpreted execution," keeping only the renderable subset and adding the xref random-access table, so jumping around a 1000-page file is fast. ② Embeddable fonts + bitmaps + vectors — fonts: Type1 / TrueType / OpenType / CID (large CJK glyph sets); images: JPEG / JBIG2 (C40 · black-and-white scans) / CCITT G4 (fax) / JPEG 2000 (C8) / Flate (zlib). PDF is fundamentally a container; actual pixels are decoded by the inner codecs. ③ Pagination + forms + JS + digital signatures — Page Tree scales to long docs; AcroForm / XFA forms (fillable, submittable); Action objects bind JavaScript (tax forms with computed cells); Signature fields use PKI to put PDF on solid legal ground for contracts. ④ PDF/A archival subset — ISO 19005, defined from 2005, bans transparency / JS / external dependencies / encryption and mandates embedded fonts — a strict subset designed to "still open in 30 years." Legal, government and scientific-paper archival lives on PDF/A.
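The xref random-access table in ① is easy to see by hand-assembling a minimal PDF skeleton — a sketch of the mechanism, not a production writer (the objects carry no page content, only the structural dictionaries). Each object's byte offset is recorded in a fixed-width table at the end of the file, so a reader seeks straight to object N instead of parsing front-to-back the PostScript way:

```python
# Three structural objects of a one-page PDF: catalog -> pages -> page.
objects = [
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n",
    b"2 0 obj\n<< /Type /Pages /Kids [3 0 R] /Count 1 >>\nendobj\n",
    b"3 0 obj\n<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >>\nendobj\n",
]

header = b"%PDF-1.4\n"
body = b""
offsets = []                       # byte offset of each object in the file
for obj in objects:
    offsets.append(len(header) + len(body))
    body += obj

xref_pos = len(header) + len(body)
xref = b"xref\n0 4\n0000000000 65535 f \n"
for off in offsets:                # one fixed-width 20-byte entry per object
    xref += b"%010d 00000 n \n" % off

trailer = (b"trailer\n<< /Size 4 /Root 1 0 R >>\nstartxref\n"
           + str(xref_pos).encode() + b"\n%%EOF\n")

pdf = header + body + xref + trailer
```

`startxref` points at the table, and the table points at every object — that indirection is the whole difference between "seekable document" and "program to be executed."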
适用
USE FOR
- 合同 / 法律文件(数字签名 + 跨平台一致)
- 印刷 / 排版交付(InDesign 导出 PDF/X 印刷标准)
- 表单(税表 / 申请表 / 可填可提交)
- 长文档归档(PDF/A · 30 年后仍可打开)
- 科研论文 / 学术出版(LaTeX → pdflatex 输出)
- 电子书(固定布局 · 不重排)
- Contracts / legal documents (digital signature + cross-platform consistency)
- Print / typography delivery (InDesign → PDF/X print standard)
- Forms (tax forms, applications — fillable, submittable)
- Long-term archival (PDF/A — still openable in 30 years)
- Scientific papers / academic publishing (LaTeX → pdflatex)
- E-books (fixed layout, no reflow)
反适用
AVOID
- Web 主图(浏览器有原生 viewer 但加载慢 · 用 SVG / image)
- 响应式 / 重排内容(PDF 是固定布局 · 用 EPUB / HTML)
- 移动端阅读体验(放大缩小笨重 · 用 EPUB)
- 需要修改 / 协作的活文档(用 Google Doc / Notion / Office 365)
- Web hero images (browsers have viewers but loading is slow — use SVG / image)
- Responsive / reflowable content (PDF is fixed-layout — use EPUB / HTML)
- Mobile reading (zoom is clumsy — use EPUB)
- Live collaborative documents (use Google Docs / Notion / Office 365)
| scope | viewers | tools | CLI |
|---|---|---|---|
| PDF (ISO 32000) | ✓✓ Adobe Acrobat / Reader · macOS Preview · pdf.js (Mozilla, 浏览器内置) · Foxit · SumatraPDF · Skim | ✓✓ Adobe Acrobat Pro · Affinity Publisher · LibreOffice · LaTeX (pdflatex) · Word / Pages 导出 · InDesign 导出 PDF/X | qpdf · pdftk · pdftoppm · pdfinfo · mutool · ghostscript · pandoc |
EPS — PostScript 的图片化身
EPS — PostScript dressed as an image
"PostScript 加上 BoundingBox,就成了'图片'。"
"PostScript plus a BoundingBox = an 'image'."
1987 年 Adobe 为印刷出版定义 EPS(Encapsulated PostScript)—— 解决一个具体的工程问题:PostScript 1985 起作为打印机页面描述语言,文件本身是整页描述,没有"这张图占多大区域"的概念。但当时的 DTP(桌面出版)排版软件(Aldus PageMaker · QuarkXPress · 后来的 InDesign)需要把插图嵌入文档,要知道图边界做版心 / 文字绕排 / 缩放。Adobe 的解法非常简洁:一个普通 PostScript 文档,加上一行 %%BoundingBox: x1 y1 x2 y2 注释声明图像边框 —— 排版软件读这一行就能知道该图占多少空间,不必真去解 PS。再加 %%BeginPreview / %%EndPreview 嵌入位图预览(给不能渲染 PS 的程序看)。这就是 EPS。一个超低成本的"约定":不修改 PS 语法本身,只用注释扩展。这个格式撑起 90 年代到 2000 年代的全部印刷设计与 LaTeX 论文图表,2010 年代后被 PDF 完全替代 —— 因为 PDF 同样能做这件事,而且不需要"约定",直接是标准。
In 1987 Adobe defined EPS (Encapsulated PostScript) for the print-publishing industry — solving a concrete engineering problem. PostScript, from 1985, was a printer page-description language; a file described an entire page, with no notion of "how much space this illustration takes." But DTP applications (Aldus PageMaker / QuarkXPress / later InDesign) needed to embed illustrations inside documents, with a known bounding box for layout, text wrap, and scaling. Adobe's fix was elegantly minimal: a regular PostScript document plus one comment line — %%BoundingBox: x1 y1 x2 y2 — declaring the image's frame. The DTP app reads that line to know the size, without ever interpreting the PS itself. Add %%BeginPreview / %%EndPreview to embed a bitmap preview (for apps that can't render PS), and you have EPS — a near-zero-cost "convention" that extends PS via comments rather than syntax. The format carried virtually all print design and LaTeX paper figures through the 1990s and 2000s, and was wholly replaced by PDF after the 2010s — because PDF does the same thing without needing a convention; it's just the standard.
EPS 文件结构:① 头部(%!PS-Adobe-3.0 EPSF-3.0 标识 + 一些 DSC structuring comments);② %%BoundingBox: x1 y1 x2 y2(图像边框,DTP 软件读这一行决定版心);③ 可选的位图预览(给不能渲染 PS 的旧程序看);④ 真正的 PostScript 绘图代码(m / l / c / S / f 等命令)。整个文件是合法的 PostScript,可被 GhostScript 直接解释 —— EPS 的"图片化"完全靠 BoundingBox 这一行注释,不修改 PS 语法本身。
EPS file structure: ① header (%!PS-Adobe-3.0 EPSF-3.0 marker + DSC structuring comments); ② %%BoundingBox: x1 y1 x2 y2 (the bounding box DTP apps read for layout); ③ optional bitmap preview (for legacy apps that can't render PS); ④ the actual PostScript drawing code (m / l / c / S / f commands). The whole file is valid PostScript, interpretable directly by GhostScript — the "image-ness" of EPS rests entirely on the BoundingBox comment, with no change to PS syntax itself.
技术内核
Technical core
EPS 的内核只有两件事。① 普通 PostScript 文档 + 必须包含 %%BoundingBox 注释 —— BoundingBox 用 4 个数字声明图框(左下 x / 左下 y / 右上 x / 右上 y,单位 PostScript point = 1/72 inch);DTP 软件读这一行做版心,完全不需要解释 PS 本体。这是"约定式扩展"的工程经典 —— 0 成本扩展旧标准。② 可选 %%BeginPreview / %%EndPreview 嵌入位图缩略图(TIFF / WMF / PICT 三种主流格式)。1990 年代很多 DTP 软件不能在屏幕上渲染 PS(GhostScript 普及前 PS 解释开销大),所以排版时屏幕看到的是 preview 位图,打印时打印机解释真正的 PS 输出矢量。这种"屏幕用预览 / 打印用矢量"的双轨工作流是 EPS 的实际使用模式。EPS 的限制也很清楚:不支持透明(PS 没有 alpha 通道概念)、不支持多页(单页才叫"图片")、不支持表单 / JS / 加密(那是 PDF 的事)。这些限制在 1987 年是合理的,但到 2000 年代设计需求复杂化后,PDF(同样基于 PostScript 但加了透明、压缩、随机访问、多页、嵌入字体)就成了天然替代。LaTeX \includegraphics{fig.eps} 是 90 年代-2010 年代论文标配 —— pdflatex 流行后,EPS 几乎被 PDF 替代,因为 pdflatex 不能直接吃 EPS,需要 epstopdf 转换。
EPS's core is only two things. ① A regular PostScript document that must contain a %%BoundingBox comment — BoundingBox declares the figure frame using four numbers (lower-left x / lower-left y / upper-right x / upper-right y, in PostScript points = 1/72 inch). DTP apps read that single line for layout without interpreting the PS body — a textbook example of "convention-based extension," which extends a legacy standard at zero cost. ② Optional %%BeginPreview / %%EndPreview embeds a bitmap thumbnail (TIFF / WMF / PICT being the main formats). In the 1990s many DTP apps couldn't render PS on screen (PS interpretation was expensive before GhostScript matured), so on screen they showed the preview bitmap and at print time the printer interpreted the real PS as vectors. This "preview on screen / vector at print" two-track workflow was how EPS was actually used. EPS's limitations are equally clear: no transparency (PS has no alpha concept), no multi-page (a single page is what "image" meant), no forms / JS / encryption (those came in PDF). Reasonable in 1987, but as design needs grew through the 2000s, PDF — also PostScript-derived but with transparency, compression, random access, multi-page, embedded fonts — became the natural successor. LaTeX's \includegraphics{fig.eps} was the standard for academic figures from the 90s through the 2010s; once pdflatex became dominant, EPS was almost entirely replaced by PDF, since pdflatex doesn't ingest EPS directly and requires epstopdf.
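The "read one comment line, never interpret the PS" trick in ① is a few lines of code — a sketch with a made-up sample figure (the regex and helper are mine; only the %%BoundingBox syntax comes from the DSC convention):

```python
import re

# A tiny hypothetical EPS: header comment, BoundingBox, then real PostScript.
EPS_SAMPLE = b"""%!PS-Adobe-3.0 EPSF-3.0
%%BoundingBox: 0 0 144 72
%%EndComments
newpath 0 0 moveto 144 0 lineto stroke
showpage
"""

def bounding_box(eps: bytes):
    """Return (x1, y1, x2, y2) in PostScript points (1 pt = 1/72 inch)."""
    m = re.search(rb"^%%BoundingBox:\s*(-?\d+)\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)",
                  eps, re.MULTILINE)
    if not m:
        raise ValueError("no %%BoundingBox comment - not a usable EPS")
    return tuple(int(v) for v in m.groups())

x1, y1, x2, y2 = bounding_box(EPS_SAMPLE)
width_in = (x2 - x1) / 72    # 144 pt = 2 inches
height_in = (y2 - y1) / 72   # 72 pt = 1 inch
```

This is exactly what a 1990s DTP app did at layout time: one regex-sized scan for the frame, while the PostScript body stayed opaque until the printer's interpreter saw it.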
适用
USE FOR
- (历史)90 年代-2000 年代印刷设计交付
- (历史)老 LaTeX 论文图表(latex+dvips 工作流)
- 跟老印刷机 / 老 RIP 兼容的图形交付
- 需要纯矢量 PostScript 输出的科学绘图(老版 gnuplot / xfig)
- (legacy) 1990s-2000s print design hand-off
- (legacy) old LaTeX figures (latex + dvips workflow)
- Compatibility with vintage presses / RIPs
- Pure-PostScript scientific plots (old gnuplot / xfig)
反适用
AVOID
- Web(浏览器不支持 EPS)
- 现代设计交付(用 PDF / SVG)
- 需要透明的设计(EPS 不支持透明)
- 多页文档(用 PDF)
- pdflatex 工作流(需
epstopdf转换 · 不如直接 PDF)
- Web (no browser support for EPS)
- Modern design hand-off (use PDF / SVG)
- Anything needing transparency (EPS has none)
- Multi-page documents (use PDF)
- pdflatex workflows (needs
epstopdf— go straight to PDF)
| scope | editors | renderers | CLI |
|---|---|---|---|
| EPS (Adobe) | ✓ Adobe Illustrator · Inkscape · Affinity Designer · CorelDRAW(都可读老资产) | ✓ GhostScript · old QuarkXPress · old PageMaker · 老印刷机 RIP | ps2pdf · epstopdf · gs(GhostScript)· pstoedit |
AI — Illustrator 的私有格式
AI — Illustrator's proprietary file
"实质是 PDF + Adobe 私有 metadata。"
"Actually a PDF with Adobe-private metadata."
1987 年 Adobe Illustrator 1.0 推出,自定义 .ai 格式存放矢量插画 —— 跟 EPS 同年诞生(PDF 要到 1993 年才出现),是 Adobe 80 年代末 PostScript 生态里专门给设计师用的源文件容器。早期(Illustrator 1.0 - CS1)的 .ai 是简化 PostScript,跟 EPS 几乎同源(都是 PS 子集),但加了 Illustrator 专有的图层、画板、笔刷等 metadata。CS2(2005)后 Adobe 做了一个有趣的工程决定:把 .ai 底层切到 PDF —— 因为 PDF 已经能装下 PostScript 矢量 + 字体 + 透明 + 嵌入位图(Adobe 内部的 PGF 私有 codec),再加上 Illustrator 私有的 PrivateData section 存放图层 / artboard / brush / 实时效果等 Illustrator 专有信息,就是完整的 .ai。结果:.ai 文件用 Adobe Reader 打开能看到栅格化预览(因为底层就是 PDF,Reader 直接渲染了内嵌的栅格化版本),但只有 Illustrator 才能完整编辑图层结构。这是设计师交付的"源文件"标准 —— 你在视觉行业接到的 brand kit、logo 源文件、海报源文件,90% 是 .ai。
In 1987 Adobe Illustrator 1.0 launched with the proprietary .ai format for vector illustrations — born the same year as EPS (PDF would not arrive until 1993), the designer-facing source-file container of Adobe's late-1980s PostScript ecosystem. Early .ai (Illustrator 1.0 - CS1) was a simplified PostScript, kindred to EPS (both PS subsets), with Illustrator-specific metadata layered on top — layers, artboards, brushes. From CS2 (2005), Adobe made an interesting engineering decision: switch the .ai underbelly to PDF — because PDF already carried PostScript vectors + fonts + transparency + embedded raster (via Adobe's private PGF codec), plus an Illustrator-private PrivateData section for layers, artboards, brushes, live effects. So a .ai opens in Adobe Reader and shows a rasterised preview (because the file is fundamentally a PDF, and Reader renders the embedded raster version) — but only Illustrator can edit the layer structure. That is the "source file" standard for design delivery: 90 % of the brand kits, logo sources, and poster sources you'll receive in the visual industry are .ai files.
技术内核
Technical core
.ai 的内核两件事。① CS2(2005)后 .ai 格式底层就是 PDF —— 严格说是带 Adobe PGF(私有位图 codec,内嵌栅格化预览)+ 完整矢量绘图指令的 PDF 文档。这个工程决定的副作用极其有趣:.ai 保存时会内嵌一份 PDF 兼容预览(默认勾选 "Create PDF Compatible File"),所以 Adobe Reader / Preview / 浏览器 PDF viewer 都能直接打开 .ai 看到栅格化效果 — 但拿不到图层。② 私有 PrivateData section 存 Illustrator 特有的图层(Layers,可命名 / 锁定 / 隐藏 / 嵌套)/ 画板(Artboards,一份 .ai 可有多张画板,做 brand kit 一次性交付 logo + favicon + 名片)/ 笔刷(自定义 brush)/ 实时效果(Live Effects:阴影 / 模糊 / 3D 等可编辑非破坏性效果)/ 符号(Symbol,可重用元件)。这部分私有 chunk 是 Adobe 的护城河 —— 没有公开规范,Inkscape / Affinity Designer 只能部分解析(读到矢量 path 和填色,但图层结构 / 实时效果常丢)。设计师交付源文件时圈内默认就是 .ai —— 因为它是唯一能保留全部"可编辑性"的格式;导出 SVG / PDF 都会损失部分 Illustrator-only 信息。
.ai's core, two pieces. ① From CS2 (2005), the .ai underbelly is PDF — strictly, a PDF document with Adobe's PGF (private bitmap codec for the embedded raster preview) plus full vector drawing commands. The side effect is amusing: when you save a .ai, Illustrator embeds a PDF-compatible preview by default (the "Create PDF Compatible File" checkbox), so Adobe Reader / Preview / browser PDF viewers all open it and show the rasterised view — but never the layer structure. ② Private PrivateData section for Illustrator-specific layers (named / lockable / hidable / nestable), artboards (a .ai can hold many artboards — deliver logo + favicon + business card in one brand-kit file), brushes (custom), live effects (non-destructive shadow / blur / 3D), and symbols (reusable components). That private section is Adobe's moat — undocumented; Inkscape and Affinity Designer can only partially parse it (reading vectors and fills but commonly losing layer hierarchy and live effects). The designer's industry default is to hand off .ai because it is the only format that preserves full "editability" — exporting to SVG / PDF discards Illustrator-only information.
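The "PDF underneath" claim in ① can be checked with a magic-byte sniff — a hedged sketch (the function is mine; real .ai variants differ in detail, and a missing %PDF- prefix may simply mean "Create PDF Compatible File" was unticked):

```python
def sniff_ai(data: bytes) -> str:
    """Classify an .ai payload by its leading magic bytes (sketch only).

    Post-CS2 .ai files saved with "Create PDF Compatible File" are PDF
    underneath, so they start with %PDF-. Pre-CS2 .ai files were a
    PostScript dialect like EPS and start with %!PS-Adobe.
    """
    if data.startswith(b"%PDF-"):
        return "pdf-based .ai (CS2+): any PDF viewer shows the preview"
    if data.startswith(b"%!PS-Adobe"):
        return "postscript-based .ai (pre-CS2): EPS-era dialect"
    return "unknown"
```

Either way, the Illustrator-only layer and live-effect data sits in the private section — the sniff tells you what can *view* the file, not what can *edit* it.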
适用
USE FOR
- 设计师交付源文件(brand kit / logo / 海报源)
- 多画板项目(一文件装多张交付)
- 需要保留图层 / 实时效果的可编辑设计
- 跟其他 Adobe CC 软件协作(InDesign / After Effects / Photoshop 智能对象)
- Designer source-file delivery (brand kit / logo / poster source)
- Multi-artboard projects (one file for many deliveries)
- Editable designs preserving layers / live effects
- Adobe-CC interop (InDesign / After Effects / Photoshop smart objects)
反适用
AVOID
- 任何不装 Illustrator 的场景(Inkscape / Affinity 只能部分读)
- Web 嵌入(用 SVG 导出)
- 跨工具协作(用 SVG / PDF 中间格式)
- 开源工作流(私有格式 · 锁定 Adobe 生态)
- Anywhere without Illustrator (Inkscape / Affinity only partially parse)
- Web embedding (export to SVG)
- Cross-tool collaboration (use SVG / PDF as the lingua franca)
- Open-source workflows (proprietary — locks you into Adobe)
| scope | full editor | partial readers | CLI |
|---|---|---|---|
| .ai (Adobe 私有) | ✓✓ Adobe Illustrator(唯一完整支持) | ~ Inkscape(读矢量 + path)· Affinity Designer · CorelDRAW · Adobe Reader(只读 PDF 预览) | 几乎无 · uconv / pdftocairo 把 PDF 部分提取 |
JBIG2 — PDF 里的黑白压缩
JBIG2 — the black-and-white compressor inside PDF
"扫描黑白合同的瘦身高手,但 2013 出过事故。"
"The B&W scan slimming wizard — with a 2013 incident on its record."
2000 年 ITU-T 标准化 JBIG2(T.88)替代上一代 G3 / G4 传真编码,专门压扫描黑白文档(合同、票据、账单、医学胶片黑白扫描)。它解决一个具体的工程问题:CCITT G4(1980 年代传真标准)是逐行 RLE,体积砍 10× 已经是上限,但 1990 年代末扫描分辨率从 200 DPI 升到 600 DPI,文件再次膨胀。JBIG2 的关键创新是把页面切成 symbol(连通域) —— 把扫描页面里所有连通的像素块识别出来,相似 symbol 共享一个 dictionary 模板,实际像素流变成"在 (x, y) 引用 dictionary 第 N 个 symbol"。一页扫描合同里所有的 'e' 在视觉上可能 90% 像,JBIG2 只存一个 'e' 模板,其余位置都是引用 —— 体积比 CCITT G4 砍一半到三分之二。Acrobat 9(2008)起,JBIG2 成为 PDF 默认黑白扫描压缩。但 2013 年 Xerox 复印机用有损 JBIG2(允许"用相似 symbol 替代")导致扫描合同里的数字 6 被替换成 8,工程图纸尺寸出错,Xerox 召回固件 —— 此后法律 / 工程行业默认关 JBIG2 选无损 CCITT G4。
In 2000 ITU-T standardised JBIG2 (T.88) to replace the previous-generation G3 / G4 fax encodings, targeting scanned black-and-white documents (contracts, invoices, statements, B&W medical scans). It solved a specific engineering problem: CCITT G4 (1980s fax) was per-row RLE, capped near 10× compression, but late-1990s scanners climbed from 200 DPI to 600 DPI and files swelled again. JBIG2's key innovation is cutting pages into symbols (connected components) — every connected pixel cluster on a page is identified, similar symbols share one dictionary template, and the actual stream becomes "at (x, y) reference symbol #N." On a scanned contract, all of the 'e' glyphs are ~90 % visually identical, so JBIG2 stores one 'e' template and turns the rest into references — file size drops to half or a third of CCITT G4. From Acrobat 9 (2008), JBIG2 became PDF's default for black-and-white scans. But in 2013 a Xerox copier using lossy JBIG2 (which permits "substitute a similar symbol") caused 6s in scanned contracts to be replaced by 8s, producing wrong dimensions in engineering blueprints. Xerox recalled the firmware. Legal and engineering industries have since defaulted to disabling JBIG2 and using lossless CCITT G4.
相似 symbol 共享一个 dictionary 模板,其余位置只存 (x, y, refid) 引用 —— 体积从 462 px 降到 154 px + 3 个坐标对。整个扫描页里所有重复字形都这样处理,实际能砍掉 50-70% 的 G4 体积。有损模式更激进:允许"非常相似的 symbol 共享一个模板",这就是 2013 年 Xerox 把数字 6 错配成 8 的原因 — 6 和 8 在扫描噪点下视觉相似度极高。
Similar symbols share one dictionary template; the remaining occurrences are stored as (x, y, refid) references — 462 px collapses to 154 px + three coordinate triples. Across an entire scanned page, all repeated glyphs go through this dictionary, knocking 50-70 % off CCITT G4. Lossy mode is more aggressive: it allows "very similar symbols to share one template," which is exactly how the 2013 Xerox firmware mis-substituted '6' for '8' — the two digits are visually indistinguishable to the heuristic under scan noise.
技术内核
Technical core
JBIG2 三件事。① 把扫描页面切成 symbol(连通域)—— 编码器扫描整页,识别出所有连通像素块(每个字符 / 每个标点 / 每段线条),把视觉相似的 symbol 共享一个 dictionary 模板。位流变成"位置 + dictionary 引用",而不是"逐像素栅格"。② 三种 region 编码:(a) generic region 用 MQ 算术编码逐像素压缩,处理不规则内容(图标、签名、印章);(b) text region 用上面的 symbol 字典,处理文本(占扫描合同的 90%);(c) halftone region 用 grayscale 模板字典,处理半调网点(扫描照片的二值化)。三种 region 在同一页里可混用 —— 编码器自动分割。③ 有损模式 vs 无损模式:无损模式严格匹配 symbol(只共享 bit-exact 相同的连通域);有损模式允许"用相似 symbol 替代",阈值由编码器决定 —— 体积更小,但可能静默修改字符。这就是 2013 年 Xerox 事故的根因:数字 6 和 8 在扫描噪点下连通域形状相似,有损 JBIG2 把同一份模板用在两个不同字符上,导致扫描出的合同跟原件数字不一样。Xerox 召回固件,法律 / 工程 / 医疗行业从此默认关 JBIG2 选无损 CCITT G4 —— 即便牺牲一倍体积也要保证 bit-exact。Acrobat 提供"无损 JBIG2"选项,但 default 是有损,所以你扫描合同前要手动关掉。
JBIG2 in three pieces. ① Slice the scanned page into symbols (connected components) — the encoder scans the whole page, identifies every connected pixel cluster (every character, every punctuation mark, every line stroke), and lets visually similar symbols share one dictionary template. The bitstream becomes "position + dictionary reference," not "pixel-by-pixel raster." ② Three region encodings: (a) generic region uses the MQ arithmetic coder per-pixel for irregular content (icons, signatures, stamps); (b) text region uses the symbol dictionary above, for text (about 90 % of a contract scan); (c) halftone region uses grayscale-template dictionaries, for halftone screens (the binarisation of scanned photos). All three coexist on a single page, with the encoder choosing per-area. ③ Lossy vs lossless mode: lossless matches symbols strictly (sharing only bit-exact identical components); lossy permits "substitute a similar symbol" by an encoder-side threshold — smaller, but can silently rewrite characters. That is exactly the 2013 Xerox bug's root cause: the digits 6 and 8 have visually similar connected components under scan noise, and lossy JBIG2 reused one template across two different characters — so the scanned contract's digits no longer matched the original. Xerox recalled the firmware, and legal / engineering / medical industries have since disabled JBIG2 in favour of lossless CCITT G4 — willing to pay 2× the size to guarantee bit-exactness. Acrobat does offer a "lossless JBIG2" option, but the default is lossy, so you must turn it off explicitly before scanning a contract.
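The symbol-dictionary mechanism in ① / ③ — and its lossy failure mode — can be mimicked on toy glyph bitmaps. This is nothing like the real T.88 coder (glyphs here are tuples of bit-strings, similarity is plain Hamming distance); it only illustrates why a similarity threshold can silently swap glyphs:

```python
# Toy glyphs as 4x5 bitmaps. Under scan noise, real 6s and 8s differ by
# only a few pixels - just like these two templates (Hamming distance 2).
GLYPH_6 = ("0110", "1000", "1110", "1001", "0110")
GLYPH_8 = ("0110", "1001", "0110", "1001", "0110")

def hamming(a, b):
    return sum(x != y for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def encode(page, threshold=0):
    """page: list of (x, y, glyph). Return (dictionary, references).

    threshold=0 is "lossless": only bit-exact glyphs share a template.
    threshold>0 is "lossy": near-matches share one - the Xerox trap.
    """
    dictionary, refs = [], []
    for x, y, g in page:
        for i, tmpl in enumerate(dictionary):
            if hamming(g, tmpl) <= threshold:
                refs.append((x, y, i))       # reuse template i
                break
        else:
            dictionary.append(g)             # new template
            refs.append((x, y, len(dictionary) - 1))
    return dictionary, refs

page = [(0, 0, GLYPH_6), (10, 0, GLYPH_8), (20, 0, GLYPH_6)]
lossless_dict, _ = encode(page, threshold=0)        # 6 and 8 stay distinct
lossy_dict, lossy_refs = encode(page, threshold=3)  # 8 merged into 6's slot
```

In the lossy run every position now references template 0 (the first-seen '6'), so the decoded page renders the '8' as a '6' — the 2013 substitution bug in miniature.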
适用
USE FOR
- 非关键扫描黑白文档(图书馆藏书 / 报纸归档 / 普通账单)
- 已开启无损模式的合同 / 票据扫描
- 需要 PDF 体积砍 5-10× 的纯文本扫描场景
- 医学胶片黑白图像归档(无损模式)
- Non-critical B&W scans (library books, newspaper archives, casual statements)
- Contract / receipt scans only when lossless mode is enabled
- Pure-text scan PDFs needing 5-10× shrink
- B&W medical-image archival (lossless mode)
反适用
AVOID
- 灰度 / 彩色扫描(JBIG2 只有 1-bit · 用 JPEG 2000)
- 法律合同(默认有损模式可能改字符 · 强烈建议无损或关闭)
- 工程图纸 / 数字尺寸(2013 Xerox 事故先例)
- 医学诊断报告(任何字符替换都不可接受)
- Grayscale / colour scans (JBIG2 is 1-bit only — use JPEG 2000)
- Legal contracts (default lossy can rewrite characters — force lossless or off)
- Engineering blueprints with numeric dimensions (the 2013 Xerox precedent)
- Medical diagnostic reports (no character substitution acceptable)
| scope | encoders | decoders | CLI |
|---|---|---|---|
| JBIG2 (ITU-T T.88) | ✓ Adobe Acrobat Pro · LuraTech · CVision · ABBYY · Foxit Phantom | ✓ 所有 PDF viewer(Adobe Reader · Preview · pdf.js)· 独立 .jb2 解码罕见 | jbig2enc(开源 · Lepton)· jbig2dec(GhostScript)· Acrobat 命令行 |
TGA — Truevision 时代的纹理王
TGA — the texture king from the Truevision era
"3D 游戏行业用了 20 年的纹理格式 —— 因为 alpha 简单。"
"Twenty years of 3D game textures — chosen for its simple alpha."
1984 年 Truevision 公司推出 Targa 系列显卡 —— 这是早期 PC 真彩(24-bit)显卡的代表作,而 TGA(Truevision TARGA,全称 Truevision Advanced Raster Graphics Adapter)正是该卡的"出厂格式":一种结构极简的位图容器,用来把显卡里 24-bit RGB / 32-bit RGBA 像素数据原样存到磁盘。规范一句话能讲完:18 byte 文件头 + 可选 image ID + 可选调色板 + 像素数据 + 可选 RLE,解析比 BMP 还快(BMP 还得分 V3 / V4 / V5 几代)。Truevision 公司在 90 年代后期被收购、品牌消失,但 TGA 因为另一个生态延续了生命 —— 1990 年代中期 id Software 的 Quake 引擎、Epic 的 Unreal 引擎、Valve 的 Source 引擎都把 TGA 当作纹理标准,理由极简单:① 32-bit RGBA 透明用一个独立 alpha 通道,不像 BMP 那样要靠 magic 像素;② 18 byte 头部解析 30 行 C 代码搞定,引擎启动时一次性吃掉成百上千张纹理零负担;③ 跨平台 (DOS / Windows / IRIX / Mac),老纹理工具链全部支持。所以"格式厂商死了,格式靠用户死撑"是 TGA 的故事 —— 你今天打开 Quake 1 的 mod 包,里面 90% 的纹理仍是 .tga,Photoshop 也仍原生支持。
In 1984 Truevision launched the Targa line of graphics cards — early flagship 24-bit colour cards for the PC — and TGA (Truevision TARGA, for Truevision Advanced Raster Graphics Adapter) was the card's "factory" format: a minimal bitmap container for dumping 24-bit RGB / 32-bit RGBA pixel data straight from VRAM to disk. The whole spec fits in a sentence: 18-byte header + optional image ID + optional palette + pixel data + optional RLE, parseable faster than BMP (which has the V3 / V4 / V5 generation soup). Truevision was acquired and its brand vanished in the late 1990s, but TGA lived on inside another ecosystem — id Software's Quake, Epic's Unreal, and Valve's Source engine all adopted TGA as their texture standard for embarrassingly simple reasons: (1) 32-bit RGBA carries transparency in a real alpha channel, not BMP's magic-pixel hack; (2) the 18-byte header parses in 30 lines of C, so an engine can wolf down hundreds of textures at startup; (3) it's cross-platform (DOS / Windows / IRIX / Mac) and every legacy texture tool already supported it. So TGA's story is "the vendor died, but the users carried the format" — open a Quake 1 mod pack today and 90 % of the textures are still .tga, with Photoshop still supporting it natively.
TGA 文件结构:① 18 byte 文件头(image type / 调色板属性 / 宽高 / 每像素位数 / origin / descriptor);② 可选 image ID 字段(通常空);③ 可选调色板(8 / 16-bit 模式才有);④ 真正的像素数据,顺序是 BGR(A) 而非 RGB(A)—— 这跟 BMP 一致,反映 80 年代显存按字节小端读出的事实;⑤ 可选的 v2.0 footer(20 byte),里面带 "TRUEVISION-XFILE.\0" 签名。整张图可选 RLE 压缩(每段 1 byte 头 + 像素),压缩率不高但解码 50 行 C。规范极简,所以 Quake/Unreal 系引擎用了 20 年。
TGA file structure: ① 18-byte header (image type / palette attrs / width / height / bits-per-pixel / origin / descriptor); ② optional image ID (usually empty); ③ optional colormap (only for 8 / 16-bit modes); ④ the real pixel payload, ordered BGR(A) rather than RGB(A) — same as BMP, a reflection of 1980s little-endian byte-by-byte VRAM reads; ⑤ an optional v2.0 footer (20 bytes) carrying the "TRUEVISION-XFILE.\0" signature. The whole pixel block can be RLE-compressed (1-byte header per run + pixels) — compression isn't great but the decoder is 50 lines of C. That minimalism is exactly why Quake / Unreal engines used it for two decades.
技术内核
Technical core
TGA 内核三件事。① 18 byte 文件头是规范的全部 —— 包含 image type ID(决定 colormap / RGB / B&W,有无 RLE)、colormap 属性、image origin / 宽 / 高、bits-per-pixel(8 / 16 / 24 / 32)、image descriptor(里面有 alpha 位数 + 行扫描方向);整个解析 50 行 C 全搞定。这是极简的工程胜利:对比 BMP 三代 header(BITMAPINFOHEADER → V4 → V5),TGA 一份 header 写到死。② 像素深度 8 / 16 / 24 / 32 四档:8-bit indexed 走调色板(老游戏精灵图);16-bit 是 5:5:5 + 1-bit alpha(老 3D 卡硬件最爱);24-bit 真彩 BGR;32-bit BGRA 带完整 alpha —— 后两个是 1990 年代游戏纹理的事实标准。注意像素是 BGR 顺序(同 BMP),这是 80 年代 x86 小端 + 显存按 byte 读取的共识,移植到 OpenGL / Direct3D 时引擎要逐像素 swap。③ 可选 RLE 压缩:每段 1 byte run header(高位 1 表示 RLE,7 位长度) + 像素值,简陋但解码极快。Quake 1 / 2 / 3 / Unreal Tournament / Half-Life 1 时代的纹理基本都是 24-bit TGA + RLE off(磁盘大,但 mmap 进显存零拷贝),这个工作流一直延续到 2005 年前后 DDS / KTX 把 TGA 替代 —— 因为 GPU 直接支持的压缩纹理(DXT / BCn)能在 VRAM 里压成 1/4 体积,TGA 只是 raw RGBA。
TGA's core, three pieces. ① The 18-byte header is the entire spec — image type ID (colormap / RGB / B&W, RLE or not), colormap attrs, image origin / width / height, bits-per-pixel (8 / 16 / 24 / 32), image descriptor (alpha bit count + scan direction). The whole parser is 50 lines of C. A win for minimalism: compare BMP's three-generation header soup (BITMAPINFOHEADER → V4 → V5); TGA wrote one header and never changed it. ② Four pixel depths: 8 / 16 / 24 / 32 — 8-bit indexed via colormap (old game sprites); 16-bit as 5:5:5 + 1-bit alpha (favourite of early 3D cards); 24-bit truecolour BGR; 32-bit BGRA with full alpha — the last two are the 1990s game-texture standards. Note pixels are BGR-ordered (same as BMP), the consensus of 1980s x86 little-endian + byte-by-byte VRAM reads; engines must swap per pixel when porting to OpenGL / Direct3D. ③ Optional RLE: each run is 1 byte (top bit = RLE flag, 7 bits = length) + pixel value — crude, but very fast to decode. Quake 1 / 2 / 3 / Unreal Tournament / Half-Life 1 era textures were almost all 24-bit TGA, RLE off (bigger on disk, but mmap straight into VRAM with zero copies). That workflow lasted until ~2005 when DDS / KTX replaced TGA — because GPU-native compressed textures (DXT / BCn) shrink to 1/4 in VRAM, while TGA is just raw RGBA.
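To make the "a few dozen lines of C" claim concrete, here is a minimal sketch in Python of unpacking the 18-byte header (field layout per the TGA spec; the dictionary keys are my own naming, not part of any API):

```python
import struct

# The 18-byte TGA header, all multi-byte fields little-endian:
# id_len, colormap_type, image_type, colormap spec (first, len, bits),
# x/y origin, width, height, bits-per-pixel, image descriptor.
TGA_HEADER = struct.Struct("<BBBHHBHHHHBB")

def parse_tga_header(buf: bytes) -> dict:
    (id_len, cmap_type, image_type,
     cmap_first, cmap_len, cmap_bits,
     x0, y0, width, height, bpp, descriptor) = TGA_HEADER.unpack(buf[:18])
    return {
        "image_type": image_type,             # 2 = raw truecolour, 10 = RLE truecolour
        "width": width, "height": height,
        "bpp": bpp,                           # 8 / 16 / 24 / 32
        "alpha_bits": descriptor & 0x0F,      # descriptor low nibble
        "top_down": bool(descriptor & 0x20),  # bit 5 flips the scan direction
        "rle": image_type in (9, 10, 11),     # RLE variants of types 1 / 2 / 3
    }

# A hypothetical 4x2 uncompressed 24-bit header, built then parsed back:
hdr = TGA_HEADER.pack(0, 0, 2, 0, 0, 0, 0, 0, 4, 2, 24, 0)
info = parse_tga_header(hdr)
```

Everything an engine needs to start streaming pixels fits in one `struct` call; that is the whole point of the format.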
适用
USE FOR
- 老 3D 引擎纹理(Quake / Unreal / Source 系 mod)
- 需要带 alpha + 极简头的中转格式
- 工具链中间格式(渲染输出 → TGA → 压缩成 DDS / KTX)
- 视频后期合成中转(Nuke / After Effects 序列帧)
- Old 3D-engine textures (Quake / Unreal / Source mods)
- Intermediate format that wants alpha + a tiny header
- Toolchain bridges (renderer output → TGA → DDS / KTX)
- VFX compositing intermediates (Nuke / After Effects sequences)
| scope | editors | engines / readers | CLI |
|---|---|---|---|
| TGA (Truevision) | ✓ Photoshop 原生 · GIMP · Krita · Affinity Photo | ✓✓ Quake / Unreal / Source / Cryengine 系老引擎 · Nuke · After Effects | convert in.png out.tga(ImageMagick)· tga2png · stb_image(50 行 C) |
ICO / CUR — 浏览器标签上的小图
ICO / CUR — the tiny image on your browser tab
"它装着多分辨率的同一个图标,16 / 32 / 48 / 256 一锅端。"
"It packs the same icon at 16 / 32 / 48 / 256 — a multi-res bundle."
1985 年 Windows 1.0 出现时,Microsoft 面临一个具体的工程问题:同一个应用图标在 16×16(任务栏 / 标题栏)、32×32(桌面 / 文件管理器)、48×48(start menu)甚至更大尺寸下都要好看 —— 但单纯把 32×32 缩到 16×16 会糊掉,小尺寸需要手工像素绘制(每个像素都要算)。Microsoft 的解法是设计一个"多分辨率包" —— 一个 ICO 文件存 N 个 image entry,每个 entry 是一张完整图(不同尺寸 / 不同色深),操作系统按显示场景挑最合适的那个。1999 年 IE 5 把它推上 web 一等公民:<link rel="icon" href="favicon.ico"> 让浏览器标签也能展示网站图标 —— 这是 favicon 的诞生。Vista(2007)给 ICO 加了"内嵌 PNG"支持,256×256 大尺寸图标终于可用(BMP 256×256 太大,PNG 压缩后只剩 1/10)。CUR 是 ICO 的鼠标指针变种,几乎一样的容器结构,只多两个字段:hotspot x / y(指针的"实际点击位置",比如箭头尖在哪个像素)。今天每个浏览器仍优先认 favicon.ico,即便你已经声明了 SVG / PNG favicon —— 因为 IE 时代的事实标准实在太顽强了。
When Windows 1.0 shipped in 1985, Microsoft faced a concrete engineering problem: the same application icon had to look right at 16×16 (taskbar / titlebar), 32×32 (desktop / Explorer), 48×48 (start menu), and beyond — but naively scaling 32×32 down to 16×16 looks blurry, because at small sizes every pixel must be hand-painted. Microsoft's fix was a "multi-res package": one ICO file holding N image entries, each a complete image at a different resolution / colour depth, with the OS picking the best match for the display scenario. In 1999, IE 5 promoted it to a first-class citizen of the web: <link rel="icon" href="favicon.ico"> let browser tabs show site icons — that's the birth of favicon. Vista (2007) added "embedded PNG" support to ICO, finally making 256×256 icons practical (a 256×256 BMP is huge; PNG-compressed it's a tenth the size). CUR is the cursor variant — same container, plus two extra fields: hotspot x / y (which pixel of the cursor counts as "the click point," e.g. the arrow tip). Every browser today still favours favicon.ico even after you declare SVG / PNG favicons — the IE-era de-facto standard is just that hard to dislodge.
技术内核
Technical core
ICO 内核三件事。① 容器结构 + N 个 image entry —— 6 byte ICONDIR 头(reserved + type=1 是 ICO / 2 是 CUR + image_count) + N 个 16 byte 的 ICONDIRENTRY(每条描述 width / height / colour count / planes / bit count / size / offset)+ N 段真正的 image data。entry 在文件末尾按偏移堆放。这种"目录 + payload"是 80 年代 PE / OLE 时期 Microsoft 的标准设计语言。② 早期 entry 内嵌 BMP,Vista 后允许内嵌 PNG —— 1985-2007 年所有 ICO 内嵌的都是 BMP(去掉 BITMAPFILEHEADER,只留 BITMAPINFOHEADER + 像素 + AND mask),32×32 32-bit alpha 一个 4096 byte 起步。Vista(2007)给 ICO 加 PNG 内嵌支持(magic 字节判断:开头 89 50 4E 47 是 PNG,否则当 BMP),终于让 256×256 大尺寸图标可用 —— BMP 256×256 32-bit 是 256 KB,PNG 压完通常 20-50 KB。这是 ICO 唯一一次重大演进。③ CUR 与 ICO 几乎一样,多 hotspot 字段 —— ICONDIRENTRY 里 reserved 的两个 byte,在 CUR 文件里被重新解释为 hotspot_x / hotspot_y(指针图像里的"实际点击点"坐标,比如箭头尖在 (0, 0) 像素位置)。这就是鼠标指针的全部技术差异。今天 favicon.ico 仍是浏览器最优先识别的图标格式 —— 即便你声明了 <link rel="icon" href="favicon.svg">,浏览器仍会先 GET /favicon.ico 再走 link 标签。
ICO's core, three pieces. ① Directory + N image entries — a 6-byte ICONDIR header (reserved + type=1 for ICO / 2 for CUR + image_count) + N 16-byte ICONDIRENTRYs (each describing width / height / colour count / planes / bit count / size / offset) + N image-data blobs piled at the end by offset. The "directory + payload" idiom is pure 1980s Microsoft (PE / OLE house style). ② Early entries embed BMP, Vista+ embeds PNG — from 1985 to 2007 every ICO entry was a BMP (BITMAPFILEHEADER stripped; just BITMAPINFOHEADER + pixels + AND mask), with a 32×32 32-bit alpha icon starting around 4 KB. Vista (2007) added PNG embedding (magic-byte sniff: 89 50 4E 47 means PNG, otherwise BMP), finally making 256×256 icons practical — a 256×256 32-bit BMP is 256 KB, PNG-compressed usually 20-50 KB. That was ICO's one and only major evolution. ③ CUR is ICO with hotspot fields — the two "reserved" bytes in ICONDIRENTRY are reinterpreted as hotspot_x / hotspot_y in CUR (the cursor's "actual click point," e.g. an arrow's tip lives at pixel (0, 0)). That's the whole cursor difference. Today favicon.ico is still the highest-priority icon for browsers — even with <link rel="icon" href="favicon.svg"> declared, browsers still GET /favicon.ico first, then check the link tags.
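A hedged sketch of the directory walk plus the Vista-era magic-byte sniff (helper name and dictionary keys are mine; the field layout follows the ICONDIR / ICONDIRENTRY description above):

```python
import struct

PNG_MAGIC = b"\x89PNG"

def read_ico_directory(data: bytes):
    """Parse ICONDIR (6 bytes) + N ICONDIRENTRYs (16 bytes each)."""
    reserved, res_type, count = struct.unpack_from("<HHH", data, 0)
    entries = []
    for i in range(count):
        w, h, colors, _rsv, planes, bitcount, size, img_off = \
            struct.unpack_from("<BBBBHHII", data, 6 + 16 * i)
        payload = data[img_off:img_off + size]
        entries.append({
            "width": w or 256,    # a stored 0 means 256 (the field is one byte)
            "height": h or 256,
            # Vista-era sniff: PNG magic first, otherwise a headerless BMP
            "kind": "png" if payload[:4] == PNG_MAGIC else "bmp",
        })
    return res_type, entries      # res_type: 1 = ICO, 2 = CUR

# A hypothetical one-entry ICO whose payload is an embedded PNG signature:
ico = (struct.pack("<HHH", 0, 1, 1)
       + struct.pack("<BBBBHHII", 0, 0, 0, 0, 1, 32, 8, 22)
       + b"\x89PNG\r\n\x1a\n")
res_type, entries = read_ico_directory(ico)
```

For a CUR file the same code applies, except `planes` / `bitcount` would be read as hotspot_x / hotspot_y.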
适用
USE FOR
- Web favicon(浏览器最高优先级)
- Windows 桌面 / 文件管理器 / 任务栏图标
- 需要多分辨率打包的应用图标
- CUR · 自定义鼠标指针(游戏 / 老 Windows 主题)
- Web favicon (highest browser priority)
- Windows desktop / Explorer / taskbar icons
- Application icons that need multi-res packaging
- CUR · custom mouse cursors (games / old Windows themes)
反适用
AVOID
- 任何非 favicon / 非桌面图标场景
- 需要矢量缩放的图标(用 favicon.svg)
- 跨平台(macOS 用 .icns · Linux 用 PNG)
- 动画图标(用 SVG / GIF / animated PNG)
- Anything that isn't a favicon or desktop icon
- Vector-scalable icons (use favicon.svg)
- Cross-platform (macOS uses .icns; Linux uses PNG)
- Animated icons (use SVG / GIF / animated PNG)
| scope | browsers / OS | editors | CLI |
|---|---|---|---|
| ICO / CUR (Microsoft) | ✓✓✓ 所有浏览器(自 IE 5)· Windows native · macOS / Linux 也能读 | ✓ Photoshop(插件)· GIMP · IcoFX · Greenfish Icon Editor | convert in.png -resize 256 out.ico(ImageMagick)· icotool · png2ico |
NetPBM (PPM / PGM / PBM) — 教科书最爱的 ASCII 三件套
NetPBM (PPM / PGM / PBM) — the textbook ASCII trio
"你用文本编辑器就能写一张图。"
"You can write an image in a text editor."
1988 年 Jef Poskanzer 写 NetPBM 工具集时,需要一个"最简单的图像格式" —— 简单到 vim 能直接编辑、cat 能看出大致内容、Unix 管道能 convert in.png pgm:- | sharpen | convert pgm:- out.png 串联。三件套从极简递增:PBM(Portable Bitmap,1-bit 黑白)、PGM(Portable Graymap,灰度)、PPM(Portable Pixmap,RGB)。每个有两套编码:ASCII 模式(P1 / P2 / P3 magic)用空格分隔的十进制数字写像素值;binary 模式(P4 / P5 / P6 magic)用 raw 字节。ASCII 模式可以 vim 直接编辑像素 —— 这就是计算机视觉教学最常见的"hello world":学生第一次自己 fwrite 出一张图,几乎都是 PPM。NetPBM 工具集本身有 200+ 个小命令(pamflip / pamcat / pnmtopng / pamscale / ...),每个只做一件事 —— 这是 80s Unix 哲学的活化石,跟 grep / sed / awk 是同一种生物。今天 NetPBM 在生产几乎不用,但学术研究、算法测试、工具链中转格式至今仍用 —— 因为它太简单,任何人 1 小时能写完整的 PPM reader / writer。
In 1988, while writing the NetPBM toolkit, Jef Poskanzer needed "the simplest possible image format" — simple enough that vim could edit it directly, cat could roughly read it, and Unix pipes could chain it as convert in.png pgm:- | sharpen | convert pgm:- out.png. The trio steps up in capability: PBM (Portable Bitmap, 1-bit black & white), PGM (Portable Graymap, grayscale), PPM (Portable Pixmap, RGB). Each has two encodings: ASCII mode (magic P1 / P2 / P3) writing pixel values as space-separated decimal numbers; binary mode (P4 / P5 / P6) using raw bytes. The ASCII mode is editable in vim — which is why PPM is the canonical "hello world" of computer-vision teaching: a student's first fwrite-an-image is almost always a PPM. The NetPBM toolkit itself ships 200+ small commands (pamflip / pamcat / pnmtopng / pamscale / ...), each doing exactly one thing — a living fossil of 1980s Unix philosophy, kin to grep / sed / awk. Today NetPBM is almost never used in production, but academic work, algorithm tests, and toolchain bridges still rely on it — because it is so simple that anyone can write a complete PPM reader / writer in an hour.
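As a concrete taste of that "hello world", a few lines of Python can serialize a P3 file (the function name is invented for illustration):

```python
def ppm_ascii(pixels, width, height, maxval=255):
    """Serialize RGB triples as a P3 (ASCII) PPM string:
    magic, dimensions, maxval, then one decimal triple per pixel."""
    header = f"P3\n{width} {height}\n{maxval}\n"
    body = "\n".join(f"{r} {g} {b}" for (r, g, b) in pixels)
    return header + body + "\n"

# A 4x4 all-red image; the first lines of the result read:
# P3
# 4 4
# 255
# 255 0 0
red = ppm_ascii([(255, 0, 0)] * 16, 4, 4)
```

The output opens in GIMP or ImageMagick as-is, and is exactly the kind of file you can hand-edit in vim.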
① P3(P1=PBM ASCII / P2=PGM / P3=PPM,P4 / 5 / 6 是对应 binary);② 4 4 宽高;③ 255 maxval(色深上限,8-bit 就是 255);④ 后面是 4×4=16 个像素的 RGB 三元组。整个文件可用 vim 编辑、cat 阅读。binary 模式(P6)只把第 ④ 段换成 raw 字节,前三行仍是 ASCII —— 所以 PPM 文件 head -3 永远是文本头,parser 一行解析一个字段即可。
① P3 (P1 = PBM ASCII / P2 = PGM / P3 = PPM; P4 / 5 / 6 are the binary counterparts); ② 4 4 width and height; ③ 255 maxval (colour depth ceiling — 255 for 8-bit); ④ then 16 RGB triples for the 4×4 image. Edit it in vim, read it with cat. Binary mode (P6) only replaces section ④ with raw bytes — the first three lines stay ASCII — so head -3 on any PPM is always a text header, and a parser can lex one field per line.
技术内核
Technical core
NetPBM 三件套的内核三件事。① 三档色深 = 三个格式:PBM(1-bit 黑白,每像素 0 / 1)、PGM(灰度,每像素一个 0..maxval 的整数)、PPM(RGB,每像素三个 0..maxval 的整数)。再加一个伞名 PNM(Portable Anymap)代表"上面三个之一",NetPBM 工具命令 pnmtopng 表示"任何 PNM 都能转 PNG"。② ASCII 模式 + binary 模式两套 magic:P1(PBM ASCII)/ P2(PGM ASCII)/ P3(PPM ASCII)的像素值用空格 / 换行分隔的十进制数字写;P4 / P5 / P6 是对应的 binary 模式,像素是 raw 字节(PBM packed bits,PGM / PPM 是 1 byte 或 2 byte per channel 取决于 maxval)。ASCII 体积大但能 vim 编辑、binary 紧凑但仍头部 ASCII。③ 头部 = magic + 宽 + 高 + maxval(PBM 无 maxval)+ 像素值,字段之间用任意空白(空格 / 制表 / 换行)分隔,允许 # 开头的注释行。整个 spec 一页能写完。NetPBM 工具集设计哲学:200+ 个小命令(pamflip 翻转 / pamcat 拼接 / pnmtopng 转 PNG / pamscale 缩放 / pamcomp 合成 / ...),每个 source 几百行 C,只做一件事,可 Unix 管道串联 —— cat input.ppm | pamflip -lr | pnmtopng > output.png 是合法工作流。这套哲学今天活在 ImageMagick / FFmpeg 里,但 NetPBM 是更纯的"原版"。
NetPBM trio's core, three pieces. ① Three depths = three formats: PBM (1-bit B&W, pixel is 0 / 1), PGM (grayscale, pixel is one 0..maxval integer), PPM (RGB, pixel is three 0..maxval integers). Plus an umbrella name PNM (Portable Anymap) meaning "any of the three"; NetPBM's pnmtopng command means "any PNM convertible to PNG." ② ASCII + binary modes, two magics each: P1 (PBM ASCII) / P2 (PGM ASCII) / P3 (PPM ASCII) write pixel values as decimal numbers separated by whitespace; P4 / P5 / P6 are the binary counterparts (PBM packed bits, PGM / PPM 1 byte or 2 bytes per channel depending on maxval). ASCII is bulky but vim-editable; binary is compact but still has an ASCII header. ③ Header = magic + width + height + maxval (PBM has no maxval) + pixel values, fields separated by any whitespace, with # comments allowed. The whole spec fits on one page. NetPBM's toolkit philosophy: 200+ small commands (pamflip, pamcat, pnmtopng, pamscale, pamcomp, ...), each a few hundred lines of C, each doing one thing, all pipeable — cat input.ppm | pamflip -lr | pnmtopng > output.png is a legitimate workflow. That spirit lives on inside ImageMagick / FFmpeg today, but NetPBM is the purer original.
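A sketch of the header lexer implied above: fields split on any whitespace, # comments skipped, and PBM (P1 / P4) carrying no maxval field (function names are mine):

```python
def _pnm_tokens(data: bytes):
    """Yield whitespace-separated header tokens, skipping # comments."""
    i = 0
    while i < len(data):
        c = data[i:i + 1]
        if c == b"#":
            i = data.index(b"\n", i) + 1   # comment runs to end of line
        elif c.isspace():
            i += 1
        else:
            j = i
            while j < len(data) and not data[j:j + 1].isspace():
                j += 1
            yield data[i:j]
            i = j

def parse_pnm_header(data: bytes):
    t = _pnm_tokens(data)
    magic = next(t).decode("ascii")
    width, height = int(next(t)), int(next(t))
    maxval = 1 if magic in ("P1", "P4") else int(next(t))  # PBM has no maxval
    return magic, width, height, maxval
```

The same lexer covers all six magics; for P4 / P5 / P6 the pixel payload simply begins after the single whitespace byte that ends the header.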
适用
USE FOR
- 学术研究 / 计算机视觉教学(算法 hello world)
- 算法测试(写 50 行 C / Python 就能 fwrite 出可视化)
- 工具链中间格式(很多 Unix 命令直接读写 PPM)
- 需要"vim 能改"的极端调试场景
- Academic research / CV teaching (the algorithm "hello world")
- Algorithm testing (50 lines of C / Python fwrites a visualisation)
- Toolchain bridges (many Unix tools read / write PPM natively)
- Extreme debugging where you need vim-editable pixels
反适用
AVOID
- 任何生产 / web 场景(无压缩 · 体积比 BMP 还大)
- 需要 alpha 通道(PPM 本身无 alpha · PAM 才有)
- 需要色彩管理 / EXIF / 嵌入 ICC 的场景
- 跟现代 web / GPU 工具链对接(用 PNG)
- Any production / web context (no compression — bigger than BMP)
- Anything needing alpha (PPM has none — only PAM does)
- Workflows needing colour management / EXIF / embedded ICC
- Modern web / GPU toolchains (use PNG)
| scope | readers | editors | CLI |
|---|---|---|---|
| NetPBM (PPM / PGM / PBM / PNM) | ✓ GIMP · ImageMagick · ffmpeg · Python (PIL / OpenCV) · 几乎所有 Unix 图像工具 | ✓ 任意文本编辑器(ASCII 模式)· GIMP · ImageMagick | NetPBM 200+ 命令套件:pamcat · pnmtopng · pnmtotiff · pamflip · pamscale |
XBM / XPM — X Window 的 ASCII 图
XBM / XPM — X Window's ASCII images
"图片就是 C 数组,#include 进 X 程序就能用。"
"The image is a C array — #include it into your X program."
1985 年 X Window System 在 MIT Athena 项目里诞生时,所有 X 应用都是 C 写的;开发者需要把图标(光标 / 应用 logo / 工具栏 button)直接嵌入程序 —— 当时没有"资源文件"的标准做法(Windows 的 .rc 资源系统 1985 年也才刚出来)。X Consortium 想出一个绝妙的偷懒解法:既然程序是 C,那图标就写成C 字节数组,编译时 #include "icon.xbm" 直接进 binary。XBM(X Bitmap)是 1-bit 黑白:static char name_bits[] = { 0xff, 0x80, 0x40, ... };,每个 byte 8 个像素。1989 年法国 Bull 公司扩展成 XPM(X PixMap)加调色板:文件顶部声明每个 ASCII 字符对应一种颜色(" c None" 透明,". c #ffffff" 白,"+ c #000000" 黑),下面是字符矩阵图,每个像素用一个或多个 ASCII 字符表示 —— 整个 .xpm 文件本身就是合法 C 字符串数组。Web 早期 Mozilla / Netscape 也支持过 XBM / XPM(因为 Unix 上的浏览器开发者太熟了),但 1990 年代后 PNG / GIF 成为主流,XBM / XPM 退到 X 老应用领域。今天 GIMP 安装目录里仍有 .xpm 图标,fvwm / twm 老 X 主题也仍用 XPM —— 这是 80 年代 Unix-C 共生关系的活化石。
When X Window System was born at MIT's Project Athena in 1985, every X app was written in C, and developers needed to embed icons (cursors / app logos / toolbar buttons) directly inside binaries — at the time there was no standard "resource file" idiom (Windows' .rc system also only emerged in 1985). The X Consortium's clever lazy fix: since the program is C, write the icon as a C byte array, then #include "icon.xbm" at compile time. XBM (X Bitmap) was 1-bit B&W: static char name_bits[] = { 0xff, 0x80, 0x40, ... };, eight pixels per byte. In 1989 France's Bull Research extended it to XPM (X PixMap) with palettes: the file header declares one ASCII character per colour (" c None" transparent, ". c #ffffff" white, "+ c #000000" black), followed by a character matrix where each pixel is one or more ASCII characters — the whole .xpm file is itself a valid C string array. Early web Mozilla / Netscape supported XBM / XPM (Unix-side browser devs knew them intimately), but after the 1990s PNG / GIF took over and XBM / XPM retreated to legacy X apps. Today GIMP's install directory still ships .xpm icons; fvwm / twm legacy X themes still use XPM — a living fossil of the 1980s Unix-C symbiosis.
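A hypothetical miniature XPM, transcribed from its C string-array form into a Python list of the same string literals, with a toy decoder (the name smile_xpm and the colours are invented for illustration):

```python
# In "smile.xpm" these literals would be wrapped in:
#   static char *smile_xpm[] = { ... };
SMILE_XPM = [
    "3 3 2 1",       # width height ncolors chars_per_pixel
    ". c #ffff00",   # '.' -> yellow
    "+ c #000000",   # '+' -> black
    ".+.",           # character matrix: one string per row,
    "...",           # one character per pixel
    "+.+",
]

def decode_xpm(lines):
    w, h, ncolors, cpp = map(int, lines[0].split())
    # colormap lines have the shape "<chars> c <colour>"
    cmap = {l[:cpp]: l.split()[-1] for l in lines[1:1 + ncolors]}
    rows = lines[1 + ncolors:1 + ncolors + h]
    return [[cmap[row[x * cpp:(x + 1) * cpp]] for x in range(w)]
            for row in rows]

grid = decode_xpm(SMILE_XPM)   # 3x3 grid of hex colour strings
```

The same list is simultaneously valid C source and trivially parseable data, which is the whole trick of the format.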
" " = 透明,"." = 黄,"+" = 黑),声明每个 ASCII 字符对应的颜色;② 字符矩阵图,每行一个字符串字面量,每个字符就是一个像素。整文件是合法的 C string 数组,#include "smile.xpm" 直接编译进 binary。这就是为什么早期 X 程序能"内嵌图标" —— 不需要加载器,编译时就嵌进去了。" " = transparent, "." = yellow, "+" = black) declaring each ASCII character's colour; ② a character matrix — one string literal per row, each character is one pixel. The whole file is a valid C string array; #include "smile.xpm" compiles it straight into the binary. That's how early X programs "embedded icons" — no loader needed; the icon enters the binary at compile time.技术内核
Technical core
XBM / XPM 内核两件事。① XBM = 1-bit 黑白 + C 字节数组 —— 文件就是 #define name_width 16 / #define name_height 16 + static char name_bits[] = { 0xff, 0x80, ... };;每个 byte 装 8 个像素(LSB-first,跟 X server 的 bitmap 图像对齐),解析 = C 编译器读取 = 0 解码代价。这种"图片即源码"的设计,只有在开发者就是用户(X Window 程序员)的语境下才合理 —— 没有非开发者会写 XBM。② XPM = 多色 + 调色板 + 字符矩阵 —— 1989 年 Bull 公司扩展:头部声明 "width height ncolors chars_per_pixel",然后是 ncolors 行 colormap("X c #rrggbb" 或 "X c colorname"),最后是字符矩阵(每行一个字符串字面量)。chars_per_pixel 通常是 1,但调色板超过 ASCII 印刷字符数(~94)时可以用 2 个字符代表一个像素(支持上千色)。整文件仍是 C string 数组语法,所以可 #include 进程序也可以从磁盘 fopen 解析(libXpm 提供解析器 —— 但其实就是个 C 源码 lexer)。这种"格式即源码"的设计后来还有几个变种:Apple PICT 早期可序列化为 PostScript,Lisp Machine 直接把图片存成 sexp,但只有 XBM / XPM 真正广泛使用过。今天 XBM / XPM 几乎被 PNG / SVG 完全替代,但你打开 GIMP 安装目录(/usr/share/gimp/2.10/)里的 themes / icons,仍能看到大量 .xpm —— 老 X 应用的代码资产惯性。
XBM / XPM's core, two pieces. ① XBM = 1-bit B&W + C byte array — the file is #define name_width 16 / #define name_height 16 + static char name_bits[] = { 0xff, 0x80, ... };; each byte holds 8 pixels (LSB-first to align with X server bitmaps), and "parsing" means letting the C compiler read it — zero decoding cost. The "image is source code" design only makes sense when developers are the users (X programmers). Non-developers don't author XBMs. ② XPM = multi-colour + palette + character matrix — Bull's 1989 extension declares "width height ncolors chars_per_pixel" at the top, then ncolors colormap lines ("X c #rrggbb" or "X c colorname"), then a character matrix (one string literal per row). chars_per_pixel is usually 1, but if the palette exceeds the ~94 printable ASCII range you can use 2 chars per pixel (supporting thousands of colours). The file remains valid C string-array syntax, so it's both #include-able into a program and fopen-parseable from disk (libXpm provides a parser — really just a C-source lexer). Several variants attempted similar tricks later (Apple PICT serialised to PostScript; Lisp Machines stored images as sexps), but XBM / XPM are the only ones that saw real adoption. They are essentially obsolete today, but the GIMP install directory (/usr/share/gimp/2.10/) still ships piles of .xpm icons in themes / icons — pure code-asset inertia from legacy X apps.
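The LSB-first packing is easy to demonstrate mechanically; a one-function sketch:

```python
def xbm_pixels(row_bytes):
    """Expand XBM bytes to pixels: within each byte, the leftmost pixel
    is the least significant bit (X server bit order)."""
    return [(b >> bit) & 1 for b in row_bytes for bit in range(8)]

# 0x01 lights the leftmost pixel of its byte, 0x80 the rightmost.
```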
适用
USE FOR
- (历史)X Window 系应用图标
- (历史)fvwm / twm / IceWM 等老 X 主题
- (历史)GIMP / xterm / xfig 内嵌图标资产
- 极简调试场景(直接 cat 看图,因为是 ASCII)
- (legacy) icons in X Window applications
- (legacy) fvwm / twm / IceWM old X themes
- (legacy) embedded icons in GIMP / xterm / xfig
- Extreme debugging — cat the file and "see" the image (it's ASCII)
反适用
AVOID
- 任何现代场景(用 PNG / SVG)
- 非 X 平台(Windows / macOS 几乎不用)
- 高色深 / 大尺寸(XPM 字符矩阵在 256×256 24-bit 下文件巨大)
- 需要压缩 / alpha 通道 / 色彩管理
- Anything modern (use PNG / SVG)
- Non-X platforms (almost unused on Windows / macOS)
- High-depth / large images (XPM's char matrix bloats at 256×256 24-bit)
- Anything needing compression / alpha / colour management
| scope | readers | editors | CLI |
|---|---|---|---|
| XBM / XPM (X Consortium) | ✓ ImageMagick · GIMP · libXpm · 老 X11 应用 · 早期 Mozilla / Netscape | ✓ 任意文本编辑器(本质是 C 源码)· GIMP 导出 · pixmap-tools | xbmtopbm · pamtoxpm · convert in.png out.xpm(ImageMagick) |
PCX — DOS 时代的痕迹
PCX — a fingerprint of the DOS era
"DOS Paintbrush 的产物,有过 BBS 时代的辉煌。"
"From DOS Paintbrush — once king of the BBS era."
1985 年 ZSoft 公司推出 PC Paintbrush —— 这是 DOS 时代最流行的画图程序;Microsoft 1990 年代收购 ZSoft 部分技术后,把 PC Paintbrush 改名 Windows Paintbrush 内置进 Windows 3.0(后来又改名 Paint)。配套的 PCX 格式有两个核心特性:简单 RLE 压缩(让 PCX 比 BMP 小一半)+ 可选 256-color 调色板放在文件末尾(这是个怪设计,也是 PCX 最大的工程特色)。1980 年代后期 BBS 时代是 PCX 的高光时刻 —— 那个年代上传 / 下载图片靠 9600 / 14400 baud 拨号调制解调器(几 KB/s),体积每减一半,下载时间就少一半。BBS 上传图片实际标准就是 PCX,跟 ZIP 套着发是常见组合。1990 年代初 GIF(1987)凭 LZW 压缩(更高压缩比 + 调色板更优)+ AOL / CompuServe 的推广迅速取代 PCX,JPEG(1992)再砍掉照片场景,PCX 跌出舞台。今天 PCX 几乎只在游戏考古(老 DOS 游戏的图形资产)和 fax 系统(早期 fax 软件用 PCX 做缓存)里出现;但 ImageMagick / GIMP 仍能读 .pcx 文件,这是格式生态学里"读得了但没人写"的典型案例。
In 1985 ZSoft Corporation launched PC Paintbrush — the most popular drawing program of the DOS era. Microsoft acquired some of ZSoft's tech in the early 1990s, renamed PC Paintbrush as Windows Paintbrush, and bundled it into Windows 3.0 (later renamed Paint). The accompanying PCX format had two core traits: simple RLE compression (making PCX about half the size of BMP) plus an optional 256-colour palette placed at the end of the file (a quirky design, and PCX's signature engineering trait). The late-1980s BBS era was PCX's golden age — that period's image transfers ran over 9600 / 14400 baud modems (a few KB/s), where halving file size meant halving download time. PCX was effectively the BBS image standard, often bundled inside ZIP archives. In the early 1990s GIF (1987) overtook PCX through LZW compression (better ratio + better palette handling) and AOL / CompuServe distribution; JPEG (1992) then claimed the photo niche, and PCX fell off the stage. Today PCX really only shows up in game archaeology (DOS-era game graphics) and fax systems (early fax software used PCX as an internal cache); ImageMagick / GIMP still read .pcx, a textbook case of "readable but no one writes" in format ecology.
技术内核
Technical core
PCX 内核两件事。① 简单 RLE 压缩 —— 像素数据每段读 1 byte:如果高 2 位是 11,低 6 位(1-63)就是 run length,下一 byte 是要重复的像素值;否则这 byte 本身就是单像素。这是极简 RLE,压缩率不高(漫画 / 大色块图能砍 50%,真彩照片几乎没效果),但解码代价低,DOS 上 8086 CPU 也能实时解。规范一句话写完。② 头部 128 byte + 像素数据 + 可选 256-color 调色板放在文件末尾 —— 这是 PCX 最大的工程特色,也是它怪的根源。原因:DOS 早期生成图像时是顺序写文件的(8086 上文件 seek 慢且不可靠),编码器边读屏幕像素边写 RLE,但还不知道会用到哪些颜色 —— 索引扩展时(偶尔出现新颜色)只能累计调色板,直到写完所有像素才能在文件末尾追加 769 byte 调色板(1 byte signature 0x0C 标记 + 256 × 3 byte RGB)。这是"流式编码 + 只能追加"硬约束下的产物,跟今天的 PNG / WebP 必须先决定调色板是完全相反的工程权衡。1980 年代 BBS 时代下载 PCX 的人其实经常解码到一半就显示出像素 —— 但调色板还没下载完,所以图像颜色是错的,直到下载完整个文件(包括末尾调色板),才能用正确颜色重绘。这就是 BBS 时代特有的"图像渐进显示但颜色慢慢校准"体验。今天的 progressive JPEG 是有意为之,PCX 的渐进显示其实是意外的副作用。
PCX's core, two pieces. ① Simple RLE — read 1 byte from the pixel stream: if its top two bits are 11, the lower six bits (1-63) give the run length and the next byte is the pixel value to repeat; otherwise the byte is itself a single pixel. Minimalist RLE — modest ratios (50% for cartoons / flat-colour images, near-nothing for photos), but cheap to decode (real-time on a DOS 8086 CPU). The whole spec fits in one sentence. ② 128-byte header + pixel data + optional 256-colour palette at the end of the file — PCX's signature engineering trait, and the source of all its quirkiness. Reason: in early DOS, file generation was sequential (8086 file seek was slow and unreliable). The encoder streamed screen pixels into RLE while writing the file, but didn't yet know the full set of colours used — palette accumulation could pick up a new colour at any time, and the palette could only be appended (769 bytes: 1-byte signature 0x0C + 256 × 3 bytes RGB) once all pixels had been written. It's the product of a "streaming encode, append-only" hard constraint — the opposite of today's PNG / WebP, which must commit a palette up front. BBS-era PCX downloaders frequently saw the image render mid-download — but with wrong colours, until the trailing palette finally arrived and a redraw fixed them. That's the BBS-specific experience: progressive image display with gradually-correcting colours. Modern progressive JPEG is intentional; PCX's progressive display was an accidental side effect.
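That one-sentence RLE spec really does translate to about ten lines; a sketch:

```python
def pcx_rle_decode(data: bytes, expected: int) -> bytes:
    """PCX RLE: a byte whose top two bits are 11 holds a run length
    (1-63) in its low six bits, and the next byte is the value to
    repeat; any other byte is a literal pixel."""
    out = bytearray()
    i = 0
    while len(out) < expected:
        b = data[i]
        i += 1
        if b & 0xC0 == 0xC0:                   # run marker
            out += bytes([data[i]]) * (b & 0x3F)
            i += 1
        else:                                  # literal pixel (values >= 0xC0
            out.append(b)                      # must be written as length-1 runs)
    return bytes(out)

# 0xC3 0xAA decodes to three 0xAA bytes, then 0x10 is a literal:
decoded = pcx_rle_decode(b"\xc3\xaa\x10", 4)
```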
适用
USE FOR
- (历史)DOS 时代图像 / PC Paintbrush 输出
- (历史)BBS 上传 / 下载图片
- (历史)早期 Windows 3.0 / 3.1 应用
- 游戏考古 / 老 DOS 游戏图形资产解码
- (legacy) DOS-era images / PC Paintbrush output
- (legacy) BBS image upload / download
- (legacy) early Windows 3.0 / 3.1 applications
- Game archaeology — DOS-era graphics asset decoding
反适用
AVOID
- 任何现代场景(用 PNG / WebP)
- 真彩照片(RLE 几乎无压缩 · 用 JPEG)
- 需要 alpha 通道(PCX 不支持透明)
- 需要色彩管理 / 嵌入 ICC profile
- Anything modern (use PNG / WebP)
- True-colour photos (RLE buys ~nothing — use JPEG)
- Anything needing alpha (PCX has no transparency)
- Workflows needing colour management or embedded ICC
| scope | readers | editors | CLI |
|---|---|---|---|
| PCX (ZSoft) | ✓ ImageMagick · GIMP · IrfanView · XnView · 老 DOS 软件 | ~ GIMP 导出 · ImageMagick · 现代编辑器很少原生写 PCX | convert in.pcx out.png(ImageMagick)· pcxtoppm(NetPBM) |
Sun Raster — 工作站老照片
Sun Raster — workstation snapshots
"SunOS 屏幕截图的格式 —— 现在只在博物馆里。"
"SunOS screenshot format — found only in museums."
1988 年 Sun Microsystems 推出 SPARCstation —— 那个年代 Unix 工作站的代名词,"the network is the computer" 的实体。SunOS 桌面环境(OpenWindows,基于 NeWS / X11 混合)需要一个标准截图格式,Sun 顺手定义了 Sun Raster:32-byte header(magic / width / height / depth / length / type / colormap type / colormap length)+ 可选 colormap + raw 或 byte-RLE 像素流。极简到一页规格能写完。当时 X11 + xv(著名的图像查看器)是 Sun 工作站圈子的实际标准 —— 写论文 / 投幻灯片 / 文档配图,大家都用 .ras。但走出 Sun 生态就没人认了:PC 那边是 BMP / GIF / PCX,Mac 那边是 PICT / TIFF,Sun Raster 是 Sun 圈内独有方言。1990 年代后期 SGI / HP / IBM 各家工作站逐渐被 Linux / PC 替代,Sun 自己 2009 年被 Oracle 收购,Sun Raster 就跟 SunOS 一起进入历史。今天 ImageMagick 仍能读 .ras,但写它的人几乎绝迹 —— 跟 PCX 一样的"读得了但没人写"格式 fossil。
In 1988 Sun Microsystems shipped the SPARCstation — the very face of Unix workstations in that era, the physical embodiment of "the network is the computer". SunOS's desktop environment (OpenWindows, a NeWS / X11 hybrid) needed a standard screenshot format, and Sun casually defined Sun Raster: a 32-byte header (magic / width / height / depth / length / type / colormap type / colormap length), an optional colormap, and a raw-or-byte-RLE pixel stream. Minimal enough to fit on a single page of spec. At the time X11 + xv (the legendary image viewer) was the workstation circle's de-facto stack — papers, slides, documentation figures, all shipped as .ras. But step outside the Sun ecosystem and no one knew the format: PC land had BMP / GIF / PCX, Mac land had PICT / TIFF, Sun Raster was a Sun-only dialect. In the late 1990s SGI / HP / IBM workstations gave way to Linux / PCs; Sun itself was acquired by Oracle in 2009, and Sun Raster slipped into history alongside SunOS. ImageMagick still reads .ras today, but writers have all but vanished — another "readable but no one writes" fossil, just like PCX.
技术内核
Technical core
Sun Raster 内核就一件事:32-byte header + 可选 colormap + raw 或 byte-RLE 像素。头部 8 个 32-bit big-endian 字段:magic(0x59A66A95)、width、height、depth(1/8/24/32)、length(像素数据字节数)、type(0=old / 1=standard / 2=byte-encoded RLE / 3=RGB / 4=TIFF / 5=IFF)、colormap type(0=none / 1=RGB / 2=raw)、colormap length。type 字段决定是 raw 还是 RLE:byte-encoded RLE 简单粗暴,跟 PCX RLE 一脉相承 —— 看到 0x80 byte 就开始 run-length 编码(0x80 = escape · 0x80 0x00 = 单个 0x80 字面量 · 0x80 N V = 重复 N+1 次 V)。整个格式没有 chunk 系统、没有 metadata、没有 EXIF、没有 ICC profile、没有 alpha(depth=32 时第 4 个 byte 是保留位通常不渲染)。这就是 1988 年工作站语境的设计:系统截图工具需要的是简单 + 快 + 跟 X server 内存布局对齐,其它都是包袱。Sun Raster 跟同期(也已死)Silicon Graphics 的 SGI RGB(.rgb / .sgi)、HP 的 PCL Raster 是同一类东西 —— 工作站厂商各自定义的"系统级图片格式",随着工作站本身退场而消亡。
Sun Raster's core is one thing: a 32-byte header + optional colormap + raw or byte-RLE pixels. The header has eight 32-bit big-endian fields: magic (0x59A66A95), width, height, depth (1 / 8 / 24 / 32), length (pixel byte count), type (0 = old / 1 = standard / 2 = byte-encoded RLE / 3 = RGB / 4 = TIFF / 5 = IFF), colormap type (0 = none / 1 = RGB / 2 = raw), colormap length. The type field decides raw versus RLE: byte-encoded RLE is brute-simple and cut from the same cloth as PCX RLE — when a 0x80 byte is seen, run-length kicks in (0x80 = escape · 0x80 0x00 = a literal 0x80 · 0x80 N V = N+1 copies of V). The format has no chunk system, no metadata, no EXIF, no ICC profile, no alpha (depth = 32 reserves the 4th byte but typically doesn't render it). That's the 1988 workstation mindset: a system screenshot tool wants simple + fast + aligned with the X server's framebuffer; everything else is overhead. Sun Raster sits next to Silicon Graphics' SGI RGB (.rgb / .sgi) and HP's PCL Raster as a class of "system-level image formats" defined by individual workstation vendors — and they all died with the workstations themselves.
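The whole container check is eight big-endian words; a sketch (dictionary keys are my own naming):

```python
import struct

RAS_MAGIC = 0x59A66A95

def parse_sun_raster(buf: bytes) -> dict:
    """Unpack the 32-byte Sun Raster header: eight 32-bit big-endian fields."""
    (magic, width, height, depth,
     length, ras_type, cmap_type, cmap_len) = struct.unpack(">8I", buf[:32])
    if magic != RAS_MAGIC:
        raise ValueError("not a Sun Raster file")
    return {
        "width": width, "height": height,
        "depth": depth,               # 1 / 8 / 24 / 32
        "rle": ras_type == 2,         # 2 = byte-encoded RLE
        "colormap_bytes": cmap_len,   # colormap bytes following the header
    }

# A hypothetical 640x480 8-bit RLE image with a 256-entry RGB colormap:
hdr = struct.pack(">8I", RAS_MAGIC, 640, 480, 8, 640 * 480, 2, 1, 768)
info = parse_sun_raster(hdr)
```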
适用
USE FOR
- (已死)SunOS / OpenWindows 屏幕截图
- (已死)1990s X11 + xv 工作站论文配图
- 计算机历史考古 · 老 Sun 实验室资产解码
- 博物馆数字化项目
- (dead) SunOS / OpenWindows screenshots
- (dead) 1990s X11 + xv workstation paper figures
- Computing-history archaeology — decoding old Sun lab assets
- Museum digitisation projects
反适用
AVOID
- 任何现代场景(用 PNG / WebP)
- 需要 alpha / metadata / 色彩管理
- 非 Unix 工作站平台
- 需要现代浏览器或编辑器原生支持
- Anything modern (use PNG / WebP)
- Anything needing alpha / metadata / colour management
- Non-Unix-workstation platforms
- Anything needing native browser or editor support
| scope | readers | editors | CLI |
|---|---|---|---|
| Sun Raster (Sun Microsystems) | ✓ ImageMagick(legacy)· NetPBM · 老 xv / xli · libsun-raster | ~ ImageMagick 还能写 · GIMP 早期版本支持读 · 几乎无现代写入器 | convert in.ras out.png(ImageMagick)· rasttopnm(NetPBM) |
IFF / ILBM — Amiga 的传家宝
IFF / ILBM — Amiga's family heirloom
"chunk 容器思想的祖宗 —— PNG 都受它影响。"
"The grandfather of chunk-based containers — even PNG owes it credit."
1985 年 Commodore 推出 Amiga 1000 —— 那台领先时代 5 年的多媒体个人电脑(custom 芯片组 Agnus / Denise / Paula 同时跑图形 / 音频 / DMA,自定义协处理器堆出 4096 色 HAM 模式 / 4 路立体声 8-bit PCM,1985 年的硬件配置直到 1990s 中期 PC 才追上)。EA(Electronic Arts)的工程师 Jerry Morrison 为 Amiga 设计了 IFF(Interchange File Format)—— 一个"通用的多媒体容器":每个 chunk 是 4-byte ASCII ID + 4-byte big-endian length + payload,FORM 是顶层 chunk(描述具体类型如 ILBM = Interleaved BitMap),内嵌 BMHD(图像头)、CMAP(调色板)、BODY(像素数据)等子 chunk。主流 image type 是 ILBM,按 bitplane 而非 packed pixels 存储 —— 6 张 1-bit bitmap 叠加表达 6-bit 色,正好对应 Amiga 显存的 bitplane 硬件布局。这套 chunk 容器思想后来直接影响了 PNG / WebP / RIFF / AIFF / ISOBMFF / MP4 —— 你今天用过的几乎所有 chunk-based 文件格式都欠 IFF 一个 credit。Amiga 1994 年破产被 Commodore 拖死,IFF 跟着退场到复古圈子,但它的设计 DNA仍活在你每天用的格式里。
In 1985 Commodore launched the Amiga 1000 — a multimedia PC five years ahead of its time (custom chipset Agnus / Denise / Paula running graphics / audio / DMA in parallel; coprocessors stacking up to 4096-colour HAM mode and four-channel 8-bit stereo PCM; 1985 hardware specs PCs only caught up to in the mid-1990s). EA's Jerry Morrison designed IFF (Interchange File Format) for Amiga — a "universal multimedia container": every chunk is a 4-byte ASCII ID + 4-byte big-endian length + payload; FORM is the top-level chunk (declaring the concrete type, e.g. ILBM = Interleaved BitMap), with sub-chunks like BMHD (image header), CMAP (palette), BODY (pixel data). The dominant image type is ILBM, stored as bitplanes rather than packed pixels — six 1-bit bitmaps stacked to represent 6-bit colour, matching Amiga's bitplane framebuffer layout exactly. This chunk-container idiom went on to shape PNG / WebP / RIFF / AIFF / ISOBMFF / MP4 — almost every chunk-based file format you touch today owes IFF a credit. Amiga went down with Commodore in 1994 and IFF retreated to the retro scene, but its design DNA still lives in formats you use every day.
技术内核
Technical core
IFF / ILBM 内核两件事。① chunk = 4-byte ASCII ID + 4-byte big-endian length + payload(payload 末尾 padding 到偶数对齐)。这套语法极其简洁:解码器读 8 byte 就知道这个 chunk 是什么、有多大、跳到哪;不认识的 chunk 直接 skip 不报错 —— 前向兼容靠这个机制实现。FORM / LIST / CAT 是几个特殊容器 chunk(payload 内嵌其它 chunk),其它的 BMHD / CMAP / BODY / GRAB / CRNG / CCRT / ANNO / AUTH 等都是 leaf chunk。整套机制后来被 PNG 抄了:PNG 的 IHDR / PLTE / IDAT / IEND chunk 系统(4-byte length + 4-byte type + data + 4-byte CRC)就是 IFF chunk 加了一个 CRC 校验字段。WebP 用 RIFF(IFF 的 little-endian 变种)更是直接继承。② ILBM 按 bitplane 而非 packed pixels 存储 —— 一行 320 像素 6-bit 颜色,packed pixels 存法是 320 × 6 bit / 8 = 240 byte,一行像素值连续;ILBM 存法是 6 张 320-bit(40-byte)bitmap 交错,每张 bitmap 上一个像素位置只放该像素值的某一位。这种"诡异"布局不是为了压缩,是为了跟 Amiga Denise 芯片的 bitplane DMA 硬件直接对齐 —— Denise 在每个像素时钟从 6 张 bitmap 同时取一位组成 6-bit 索引,然后查 CMAP 调色板得到 RGB。这是 1985 年定制硬件 + 文件格式协同设计的范例,跟 GIF 的 LZW(为通用 CPU 设计)完全不同的工程方向。今天 ILBM 只在 Amiga 模拟器(WinUAE / FS-UAE)和老游戏(Lemmings / Defender of the Crown / Shadow of the Beast)资产解码里用得到 —— 但 chunk-container 思想已经统治了一切。
IFF / ILBM's core, two pieces. ① chunk = 4-byte ASCII ID + 4-byte big-endian length + payload (payload padded to even-byte alignment). Brutally simple: read 8 bytes and the decoder knows what the chunk is, how big, and where to jump; unknown chunks are silently skipped — forward compatibility falls out of this mechanism. FORM / LIST / CAT are the special container chunks (their payload nests other chunks); everything else (BMHD / CMAP / BODY / GRAB / CRNG / CCRT / ANNO / AUTH ...) is a leaf chunk. PNG copied the idiom wholesale: PNG's IHDR / PLTE / IDAT / IEND system (4-byte length + 4-byte type + data + 4-byte CRC) is essentially IFF chunks plus a CRC field. WebP, built on RIFF (the little-endian variant of IFF), inherits even more directly. ② ILBM stores bitplanes rather than packed pixels — a 320-pixel row at 6-bit colour, packed-pixel storage is 320 × 6 bit / 8 = 240 bytes (pixel values laid out contiguously); ILBM stores it as six interleaved 320-bit (40-byte) bitmaps, each bitmap holding one bit of the pixel index. The "weird" layout isn't for compression — it's aligned with the Amiga Denise chip's bitplane DMA hardware: every pixel clock Denise reads one bit from each of the six bitmaps in parallel to assemble a 6-bit index, then a CMAP lookup gives RGB. A perfect 1985 example of custom hardware co-designed with the file format — the opposite engineering direction from GIF's LZW (designed for general-purpose CPUs). Today ILBM lives only in Amiga emulators (WinUAE / FS-UAE) and legacy game asset decoding (Lemmings / Defender of the Crown / Shadow of the Beast) — but the chunk-container idea has gone on to dominate everything.
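An eight-byte read per chunk is genuinely all a walker needs; a sketch (the chunk IDs in the demo are real IFF IDs, the payload bytes are invented):

```python
import struct

def iff_chunks(data: bytes):
    """Yield (id, payload) pairs: 4-byte ASCII ID + 4-byte big-endian
    length + payload, with payloads padded to even alignment."""
    offset = 0
    while offset + 8 <= len(data):
        cid = data[offset:offset + 4].decode("ascii")
        (size,) = struct.unpack(">I", data[offset + 4:offset + 8])
        yield cid, data[offset + 8:offset + 8 + size]
        offset += 8 + size + (size & 1)    # skip the pad byte on odd sizes

def make_chunk(cid: bytes, payload: bytes) -> bytes:
    pad = b"\0" if len(payload) & 1 else b""
    return cid + struct.pack(">I", len(payload)) + payload + pad

# A toy FORM ILBM: an (invented) BMHD payload and a 1-entry CMAP.
# The FORM payload starts with its 4-byte type, then nested chunks.
body = (b"ILBM"
        + make_chunk(b"BMHD", b"\x01\x40\x00\xc8")
        + make_chunk(b"CMAP", b"\xff\x00\x00"))
data = make_chunk(b"FORM", body)
(form_id, form_payload), = iff_chunks(data)
inner = [cid for cid, _ in iff_chunks(form_payload[4:])]
```

Unknown IDs fall out of the loop untouched, which is exactly the skip-what-you-don't-know forward compatibility described above.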
适用
USE FOR
- (已死)Amiga 应用程序图像 / 游戏资产
- Amiga 模拟器(WinUAE / FS-UAE)
- 计算机历史 / 复古游戏考古
- 研究 chunk-container 设计的"原型样本"
- (dead) Amiga application images / game assets
- Amiga emulators (WinUAE / FS-UAE)
- Computing history / retro game archaeology
- Studying chunk-container design from its prototype
反适用
AVOID
- 任何现代场景(用 PNG / WebP)
- 非 Amiga 平台原生显示
- 需要现代浏览器或社交媒体支持
- 大色深 / 真彩照片(bitplane 布局对 24-bit 不友好)
- Anything modern (use PNG / WebP)
- Native display outside Amiga platforms
- Anything needing modern browser / social-media support
- High-depth / true-colour photos (bitplane layout fights 24-bit)
| scope | readers | editors | CLI |
|---|---|---|---|
| IFF / ILBM (EA · Jerry Morrison) | ✓ ImageMagick · 部分 GIMP 版本 · WinUAE / FS-UAE · libilbm | ~ GIMP(部分版本)· DPaint(原生 Amiga)· ImageMagick 转换 | iffinfo(legacy)· ilbmtoppm / ppmtoilbm(NetPBM) |
QOI — 现代极简主义
QOI — modern minimalism
"一个人,一个周末,写了一个比 PNG 简单 100 倍的格式。"
"One person, one weekend, made a format 100× simpler than PNG."
2021 年 Dominic Szablewski(phoboslab,知名 JS 游戏引擎 Impact / Q1K3 系列作者)做了一个反思:"为什么 PNG 这么复杂?LZ77 + Huffman + 5 种 filter + zlib 包装 + chunks 系统 + CRC32?能不能用一个周末写一个'够用'的无损图片格式?"答案是 QOI(Quite OK Image format):6 个简单 op(RGB / RGBA / INDEX / DIFF / LUMA / RUN),编码器和解码器各 ~300 行 C 代码,速度比 PNG 快 50× 编码 / 3-4× 解码,体积大 5-10%。spec 一页 PDF 印得下。Dominic 把项目发到 Hacker News 后排第一,Reddit / Twitter 疯转,一周内 ffmpeg / ImageMagick / Rust crates / Go libraries 就接入了 QOI,phoboslab/qoi 单 repo 拿到 7000+ star。这是格式生态学里少见的"个人作品 vs 工业标准"现象 —— QOI 不会替代 PNG(浏览器零支持 + 体积更大),但它证明了"PNG 的复杂度其实很多是历史包袱,90% 用例不需要"。同时代 Farbfeld(2014 suckless)是同类哲学的另一个尝试,两者并存形成"现代极简主义"小流派。
In 2021 Dominic Szablewski (phoboslab, well-known author of the Impact JS engine and the Q1K3 game series) asked a sharp question: "Why is PNG so complex? LZ77 + Huffman + five filter types + zlib wrapping + chunks + CRC32 — could you write a 'good enough' lossless image format in a weekend?" The answer was QOI (Quite OK Image format): six simple ops (RGB / RGBA / INDEX / DIFF / LUMA / RUN), encoder and decoder ~300 lines of C each, encoding ~50× faster and decoding 3-4× faster than PNG, files 5-10% larger. The spec fits on a one-page PDF. Dominic posted it on Hacker News and hit #1; Reddit / Twitter blew up; within a week ffmpeg / ImageMagick / Rust crates / Go libraries had QOI support, and phoboslab/qoi crossed 7000+ stars. A rare format-ecology event: a personal project versus an industrial standard — QOI won't displace PNG (zero browser support + larger files), but it proved "much of PNG's complexity is historical baggage; 90% of use cases don't need it." Farbfeld (2014, suckless) is a contemporary attempt in the same vein; the two coexist as the "modern minimalism" mini-school.
(r·3 + g·5 + b·7 + a·11) % 64,极简但实测冲突率合理。
(r·3 + g·5 + b·7 + a·11) % 64 is dead simple but has a reasonable collision rate in practice.
技术内核
Technical core
QOI 内核四件事。① 极简 6 个 op:QOI_OP_RGB(prefix 8-bit + 3 byte 像素)、QOI_OP_RGBA(prefix 8-bit + 4 byte 像素)、QOI_OP_INDEX(prefix 2-bit + 6-bit 引用最近 64 像素 hash 表)、QOI_OP_DIFF(prefix 2-bit + 每通道 2-bit、范围 -2..+1 的 RGB 颜色差)、QOI_OP_LUMA(prefix 2-bit + 6-bit 绿色通道差,再跟 1 byte 装 dr−dg / db−dg 各 4-bit)、QOI_OP_RUN(prefix 2-bit + 6-bit run 长度 1-62)。整个码本就 6 个 op,跟 PNG 的 LZ77 + Huffman 比起来连"压缩"都谈不上 —— QOI 是用结构化预测(相邻像素相同 → RUN;相邻像素差落在小范围 → DIFF / LUMA;最近 64 像素曾出现 → INDEX)避免重复传输,LZ77 才是真正的字典压缩。② 编码 / 解码各只 ~300 行 C 代码 —— phoboslab 把整个 reference 实现做成单文件 header-only library `qoi.h`,500 行不到包括两套 API。对比之下 libpng + zlib 加起来 100k+ 行。③ 速度比 PNG 快 ~50×(编码)/ ~3-4×(解码)—— 因为没有 LZ77 字典查找、没有 Huffman 自适应、没有 5 种 filter 自适应预测;每个像素就是 1-5 byte 的 op + payload,编码器一遍扫,解码器也一遍扫,cache friendly 到极致。④ 体积比 PNG 大 ~5-10%—— 这是 QOI 的代价。在嵌入式 / 教学 / 不能依赖大型 codec lib 的场景里这点体积差完全可以接受,但 web 上下行带宽贵,所以 QOI 永远不会替代 PNG。Hacker News 上很多人不理解这一点 —— "为什么浏览器不集成?" 因为 web 是体积敏感的,QOI 是 CPU / 复杂度敏感的,目标用户群完全不同。
QOI's core, four pieces. ① Six minimal ops: QOI_OP_RGB (8-bit prefix + 3-byte pixel), QOI_OP_RGBA (8-bit prefix + 4-byte pixel), QOI_OP_INDEX (2-bit prefix + 6-bit reference into a hash table of the last 64 pixels), QOI_OP_DIFF (2-bit prefix + three 2-bit per-channel deltas in -2..+1), QOI_OP_LUMA (2-bit prefix + 6-bit green delta, followed by one byte holding dr−dg / db−dg as 4-bit fields), QOI_OP_RUN (2-bit prefix + 6-bit run length 1-62). The whole codebook is six ops — compared to PNG's LZ77 + Huffman, you can barely call QOI "compression": it uses structured prediction (adjacent pixel identical → RUN; delta within a small range → DIFF / LUMA; one of the last 64 → INDEX) to avoid retransmitting redundant data; LZ77 is true dictionary compression. ② Encoder / decoder each ~300 lines of C — phoboslab ships the whole reference implementation as a header-only single file `qoi.h`, under 500 lines including both APIs. Compare libpng + zlib together: 100k+ lines. ③ ~50× faster encoding, ~3-4× faster decoding than PNG — no LZ77 dictionary lookup, no adaptive Huffman, no five-way filter prediction; each pixel becomes a 1-5 byte op + payload; the encoder scans once, the decoder scans once, cache-friendly to the extreme. ④ Files ~5-10% larger than PNG — QOI's cost. Acceptable in embedded / teaching / no-big-codec-lib scenarios; unacceptable on the web, where downlink bandwidth is expensive — which is why QOI will never displace PNG. Many on Hacker News didn't get this — "why don't browsers integrate it?" Because the web is size-sensitive while QOI is CPU- and complexity-sensitive; their target audiences barely overlap.
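To make the hash table and run detection concrete, here is a toy encoder covering only the RUN / INDEX / RGB subset of the six ops; it emits symbolic op tuples rather than real bytes, so the reference `qoi.h` remains the authoritative implementation:

```python
def qoi_hash(r, g, b, a=255):
    # The spec's index-table position for a pixel.
    return (r * 3 + g * 5 + b * 7 + a * 11) % 64

def encode_ops(pixels):
    """Emit a symbolic op list for a sequence of (r, g, b) pixels,
    using only the RUN / INDEX / RGB subset of QOI's six ops."""
    index = [None] * 64
    prev, run, ops = (0, 0, 0), 0, []
    for px in pixels:
        if px == prev:
            run += 1
            if run == 62:              # RUN length saturates at 62
                ops.append(("RUN", 62)); run = 0
            continue
        if run:
            ops.append(("RUN", run)); run = 0
        h = qoi_hash(*px)
        if index[h] == px:
            ops.append(("INDEX", h))   # seen recently: 1 byte instead of 4
        else:
            index[h] = px
            ops.append(("RGB", px))
        prev = px
    if run:
        ops.append(("RUN", run))
    return ops

red = (255, 0, 0)
ops = encode_ops([red, red, red, (0, 0, 255), red])
print(ops)  # [('RGB', (255, 0, 0)), ('RUN', 2), ('RGB', (0, 0, 255)), ('INDEX', 50)]
```

Note the op priority (RUN before INDEX before a literal) matches the reference encoder's order of checks; adding DIFF / LUMA would slot in between INDEX and the literal.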
适用
USE FOR
- 嵌入式 / IoT(SRAM 紧张,装不下 libpng)
- 教学场景 · 让学生 1 周内写完一个无损图片格式
- 游戏内部资产(载入速度比体积重要)
- 命令行工具临时缓存(qoi 比 ppm / bmp 都好)
- Embedded / IoT (tight SRAM, no room for libpng)
- Teaching — students can write a complete lossless format in a week
- Game internal assets (load speed beats file size)
- CLI tool intermediate caches (better than ppm / bmp)
反适用
AVOID
- Web(浏览器零支持 + 体积比 PNG 大)
- 需要 progressive / interlace / 多帧动画
- 需要色彩管理 / EXIF / ICC profile
- 对体积敏感的存档场景(用 PNG / WebP)
- Web (zero browser support + larger than PNG)
- Anything needing progressive / interlaced / multi-frame
- Anything needing colour management / EXIF / ICC profiles
- Size-sensitive archival (use PNG / WebP)
| scope | readers | editors | CLI |
|---|---|---|---|
| QOI (phoboslab) | ✓ ImageMagick · ffmpeg · Rust qoi crate · Go qoi · Python qoi-py · phoboslab/qoi.h | ~ ImageMagick / GIMP(插件)/ ffmpeg 转换 · 浏览器零支持 | qoiconv in.png out.qoi · convert in.qoi out.png |
Farbfeld — suckless 的 16 字节头
Farbfeld — suckless's 16-byte header
"16 byte 头 + 16-bit BE RGBA,可被 gzip 替你压缩。"
"16-byte header + 16-bit BE RGBA — let gzip do the compression for you."
2014 年 suckless.org —— 那个推崇 dwm / dmenu / st / surf 的极简主义社区(口号"software that sucks less",拒绝任何"非必要"的功能)—— 做了 Farbfeld:一个完全无压缩的图像格式,16 byte 头部 + raw 16-bit big-endian RGBA。整个 spec 文档总共 11 行,比 PNG spec(200+ 页)短了好几个数量级。设计哲学就一句话:"压缩是 gzip 的事,不是图像格式的事。"于是 Unix 管道用得很好:png2ff in.png | gzip > out.ff.gz 就能存档,cat in.ff.gz | gunzip | ff2png > out.png 就能恢复;每个工具只做一件事(do one thing well),完全是 Unix 哲学的复刻。学术 / suckless 圈子里这种极简哲学很受欢迎,但生产几乎无人用 —— 没有压缩(gzip 替代品)、没有 alpha 行为标准、没有 metadata、没有色彩管理、没有 progressive。Farbfeld 跟同时代的 QOI(2021)是"现代极简主义"双子星:QOI 是"六个 op + ~300 行解码器",Farbfeld 是"16 byte 头 + 0 行解码器(就是 raw bytes)"—— 走得比 QOI 更远,但也更不实用。
In 2014 suckless.org — the minimalism community behind dwm / dmenu / st / surf, motto "software that sucks less", famous for refusing anything "non-essential" — released Farbfeld: a fully uncompressed image format, 16-byte header + raw 16-bit big-endian RGBA. The whole spec document is 11 lines — orders of magnitude shorter than PNG's 200+ page spec. The design philosophy is one sentence: "compression is gzip's job, not the image format's." So Unix pipes work beautifully: png2ff in.png | gzip > out.ff.gz for archive, cat in.ff.gz | gunzip | ff2png > out.png to restore; each tool does one thing well — pure Unix philosophy reimplemented. The academic / suckless circle adores this minimalism, but production usage is essentially zero — no compression (gzip stands in), no defined alpha semantics, no metadata, no colour management, no progressive. Farbfeld and the contemporaneous QOI (2021) are the twin stars of "modern minimalism": QOI is "six ops + ~300-line decoder", Farbfeld is "16-byte header + zero-line decoder (literally raw bytes)" — going further than QOI, and even less practical.
技术内核
Technical core
Farbfeld 内核两件事。① 头部固定 16 byte:8 byte ASCII magic "farbfeld"(注意是小写,跟 PNG 0x89PNG 不同 —— suckless 觉得 magic byte 里塞 high-bit 是过度工程)+ 4 byte big-endian uint32 width + 4 byte big-endian uint32 height。头部再无其它字段 —— 任何扩展应该靠 gzip 包装外面的 metadata 文件,而不是格式内部。② 像素流 16-bit big-endian RGBA × (width × height),完全无压缩 —— 设计哲学是"压缩是 gzip 的事,不是图像格式的事"。所以一张 1920×1080 的 Farbfeld 文件原始大小就是 16 + 1920 × 1080 × 8 ≈ 16.59 MB,gzip 之后大概 2-5 MB(取决于内容);PNG 同图像大概 200 KB-2 MB,WebP 大概 100 KB-1 MB。Farbfeld 在体积上完全打不过 —— 但它的 spec 短(11 行)、解码器短(0 行,因为就是 raw bytes,memcpy 就完事)、跟 Unix 管道完美兼容(每个 pipeline stage 只做一件事:png2ff 转入,gzip 压缩,网络传输,gunzip 解压,ff2png 转出)。这是哲学上的图像格式,不是工程上的图像格式。生产用不了,但教 Unix 哲学课的时候完美样本。
Farbfeld's core, two pieces. ① Fixed 16-byte header: 8 bytes of ASCII magic "farbfeld" (note lowercase — different from PNG's 0x89PNG; suckless considers high-bit-in-magic over-engineering) + 4-byte big-endian uint32 width + 4-byte big-endian uint32 height. Nothing else — any extension should be a separate metadata file wrapped in the same gzip, not inside the format. ② Pixel stream: 16-bit big-endian RGBA × (width × height), completely uncompressed — the design philosophy is "compression is gzip's job, not the image format's." So a 1920×1080 Farbfeld file is literally 16 + 1920 × 1080 × 8 ≈ 16.59 MB raw; gzip takes it down to 2-5 MB depending on content; PNG of the same image is ~200 KB-2 MB; WebP ~100 KB-1 MB. Farbfeld loses on size every time — but its spec is short (11 lines), its decoder is zero lines (raw bytes, memcpy and you're done), and it's perfect with Unix pipelines (each stage does one thing: png2ff converts in, gzip compresses, network transfers, gunzip decompresses, ff2png converts out). It's a philosophical image format, not an engineering one. Useless in production, perfect for teaching Unix philosophy.
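The format is simple enough that a complete round-trip codec fits in a few lines of Python; this sketch follows the layout described above (16-byte header, 16-bit BE RGBA):

```python
import struct

def ff_encode(width, height, pixels):
    """pixels: list of (r, g, b, a) with 16-bit values. Returns farbfeld bytes:
    8-byte magic + BE uint32 width + BE uint32 height + 16-bit BE RGBA stream."""
    out = b"farbfeld" + struct.pack(">II", width, height)
    for px in pixels:
        out += struct.pack(">4H", *px)
    return out

def ff_decode(buf):
    assert buf[:8] == b"farbfeld", "bad magic"
    w, h = struct.unpack(">II", buf[8:16])
    pixels = [struct.unpack(">4H", buf[16 + i * 8:24 + i * 8]) for i in range(w * h)]
    return w, h, pixels

pix = [(65535, 0, 0, 65535), (0, 65535, 0, 65535)]  # red, green at full 16-bit
data = ff_encode(2, 1, pix)
print(len(data))  # 16-byte header + 2 pixels × 8 bytes = 32
assert ff_decode(data) == (2, 1, pix)
```

Piping `data` through gzip is the format's entire compression story, exactly as the spec intends.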
适用
USE FOR
- Unix 管道临时缓存(`png2ff | gzip`)
- suckless 哲学的练习 / 教学样本
- 极简图像处理工具链(每段 pipeline 各司其职)
- 科研里需要完全确定字节布局的场景
- Unix pipeline intermediate caches (`png2ff | gzip`)
- suckless-philosophy exercises / teaching samples
- Minimalist image-processing toolchains (one job per stage)
- Research scenarios needing fully deterministic byte layout
反适用
AVOID
- 几乎任何实际场景(用 PNG / WebP / AVIF)
- Web(浏览器零支持 + 体积巨大)
- 需要 metadata / 色彩管理 / EXIF
- 需要 8-bit RGB(Farbfeld 强制 16-bit RGBA,常见 8-bit 输入要 expand 一倍体积)
- Almost any real-world scenario (use PNG / WebP / AVIF)
- Web (zero browser support + massive size)
- Anything needing metadata / colour management / EXIF
- Common 8-bit RGB input (Farbfeld forces 16-bit RGBA — doubles the size)
| scope | readers | editors | CLI |
|---|---|---|---|
| Farbfeld (suckless.org) | ✓ suckless farbfeld utils · NetPBM(部分版本)· ImageMagick | ~ 任意能写 16-bit BE RGBA 的工具(几行 C 就能写) | png2ff · ff2png · jpg2ff · pamtoff(NetPBM) |
GeoTIFF — 卫星图像的字面"地球图"
GeoTIFF — TIFF that literally maps Earth
"TIFF 加 6 个 tag,就成了卫星图像的字面意思的'地球图'。"
"Six extra tags turn TIFF into a literal 'image of Earth'."
1990 年代初,卫星遥感(Landsat、SPOT)生成的图像数量爆炸式增长,带来一个共性问题:像素本身只是亮度,但图像必须能告诉下游"第 (1024, 768) 个像素在地球上是哪一点经纬度、用什么投影、用什么 datum"——否则它就只是一张漂亮的灰阶,做不了 GIS 分析。USGS 联合一批遥感机构在 TIFF 6.0 之上加了 6 个核心 GeoKey,把"像素 → 大地坐标"的映射元数据标准化,并在 1995 年发布 GeoTIFF 1.0 规范。OGC 在 2019 年把它升级为 GeoTIFF 1.1 国际标准。结果是:卫星图像、航空摄影、数字高程模型(DEM)、土地利用图全部默认 GeoTIFF;GDAL(几乎所有 GIS 软件的底层 I/O 库)、QGIS、ArcGIS 是工具链命脉;NASA Worldview、Sentinel Hub、Google Earth Engine 内部也走 GeoTIFF。
By the early 1990s, satellite remote sensing (Landsat, SPOT) was producing images at industrial scale, all sharing one problem: a pixel is just a brightness value, but the image must tell downstream "where on Earth is pixel (1024, 768)? in what projection? on what datum?" — otherwise it's just a pretty greyscale, useless for GIS analysis. USGS and a coalition of remote-sensing agencies layered six core GeoKeys on top of TIFF 6.0 to standardise the "pixel → geographic coordinate" mapping metadata, and shipped the GeoTIFF 1.0 specification in 1995. OGC promoted it to international standard GeoTIFF 1.1 in 2019. The result: satellite imagery, aerial photography, digital elevation models (DEMs) and land-use maps all default to GeoTIFF; GDAL (the I/O backbone of nearly every GIS app), QGIS and ArcGIS form the toolchain spine; NASA Worldview, Sentinel Hub and Google Earth Engine all consume GeoTIFF internally.
ModelPixelScale + ModelTiepoint 把任意像素 (col, row) 映射到 (lat, lon) 在指定 datum / projection 上的真实地理坐标;右:地球经纬网格代表"像素终点"。整套机制对 TIFF 阅读器完全向后兼容 —— 不认识 GeoKey 的工具仍能把 .tif 当普通 TIFF 打开。
ModelPixelScale + ModelTiepoint map any pixel (col, row) to (lat, lon) on a chosen datum / projection; right: a lat/lon globe grid stands for the "pixel destination". The whole scheme is fully backward-compatible — a TIFF reader that doesn't know GeoKey can still open the .tif as a plain TIFF.
技术内核
Technical core
GeoTIFF 内核三件事。① 6 个核心 GeoKey tag:ModelTransformationTag(4×4 仿射矩阵,完整描述像素到地理坐标的线性变换)/ ModelTiepointTag(若干"控制点对",每对是像素位置 ↔ 地理坐标,适合非线性场景)/ ModelPixelScaleTag(每像素代表多少经纬度 / 米)/ GeoKeyDirectoryTag(主索引,记录所有 GeoKey 的 ID + value 偏移)/ GeoDoubleParamsTag(浮点参数表)/ GeoAsciiParamsTag(ASCII 字符串表,存投影名 / datum 名)。它们寄生在 TIFF 私有 tag 域:Model 系列占 33550 / 33922 / 34264,GeoKey 三件套占 34735-34737,因此对不识别这些 tag 的 TIFF 阅读器完全向后兼容 —— 这是 GeoTIFF 设计最聪明的地方。② 像素 → 地理坐标映射:典型组合是 ModelTiepoint(标定原点)+ ModelPixelScale(每像素单位),引擎按 lon = tiepoint.lon + col × scale.x / lat = tiepoint.lat - row × scale.y 反算;复杂场景上 ModelTransformationTag 直接给 4×4 矩阵。Datum / 投影通过 EPSG 代码引用(EPSG:4326 = WGS84,EPSG:3857 = Web Mercator)。③ 多波段(multi-band):一张 GeoTIFF 通常不止 RGB,Landsat 8 有 11 个 band(可见光 + 近红外 NIR + 短波红外 SWIR + 热红外 + 全色 panchromatic),Sentinel-2 有 13 个,科学家用 NIR - Red 算 NDVI(归一化植被指数)、SWIR 看含水量、Thermal 看地表温度。这些 band 全部走 TIFF 标准的 SamplesPerPixel + BitsPerSample 机制存,互不打扰。
GeoTIFF's core, three pieces. ① Six core GeoKey tags: ModelTransformationTag (a 4×4 affine matrix giving the full linear pixel → coordinate transform) / ModelTiepointTag (control-point pairs, each pixel position ↔ geographic coordinate, for non-linear cases) / ModelPixelScaleTag (lat/lon or metres per pixel) / GeoKeyDirectoryTag (the master index — every GeoKey's ID + value offset) / GeoDoubleParamsTag (floating-point parameter table) / GeoAsciiParamsTag (ASCII string table for projection / datum names). They live in TIFF private tag slots (the Model tags at 33550 / 33922 / 34264, the GeoKey trio at 34735–34737), so any TIFF reader that doesn't recognise them is fully backward-compatible — the cleverest part of the design. ② Pixel → coordinate mapping: the typical combination is ModelTiepoint (anchor) + ModelPixelScale (per-pixel unit) — the engine inverts lon = tiepoint.lon + col × scale.x / lat = tiepoint.lat - row × scale.y; for complex cases ModelTransformationTag carries a 4×4 matrix directly. Datum / projection are referenced via EPSG codes (EPSG:4326 = WGS84, EPSG:3857 = Web Mercator). ③ Multi-band: a GeoTIFF rarely stops at RGB — Landsat 8 has 11 bands (visible + NIR + SWIR + thermal IR + panchromatic), Sentinel-2 has 13; scientists compute NDVI (vegetation index) from NIR - Red, see water content with SWIR, surface temperature with thermal IR. All bands ride the standard TIFF SamplesPerPixel + BitsPerSample mechanism, no extra plumbing.
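The tiepoint + scale inversion in ② is one line of arithmetic. A sketch with hypothetical values (a scene anchored at 10°E 50°N, 0.01° per pixel; in real files these numbers come from tags 33922 / 33550, and in practice you would read them with rasterio or GDAL rather than hand-rolling):

```python
def pixel_to_lonlat(col, row, tiepoint, scale):
    """Invert the common ModelTiepoint + ModelPixelScale combination:
    tiepoint anchors pixel (0, 0) at (lon0, lat0); scale is degrees per pixel.
    Latitude decreases as row grows because rasters are stored top-down."""
    lon0, lat0 = tiepoint
    sx, sy = scale
    return lon0 + col * sx, lat0 - row * sy

# Hypothetical scene: upper-left corner at 10°E 50°N, 0.01 degrees per pixel.
lon, lat = pixel_to_lonlat(100, 200, tiepoint=(10.0, 50.0), scale=(0.01, 0.01))
print(lon, lat)  # 11.0 48.0
```

ModelTransformationTag generalises this to a full affine matrix, but the anchor-plus-scale case above covers the overwhelming majority of north-up imagery.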
适用
USE FOR
- 卫星图像(Landsat / Sentinel / SPOT / 高分系列)
- 航空摄影 / 无人机正射影像
- 数字高程模型(DEM / DSM / DTM)
- 土地利用图 / 植被覆盖图 / 气象格点
- Cloud Optimized GeoTIFF(COG)做云端流式读
- Satellite imagery (Landsat / Sentinel / SPOT / Gaofen)
- Aerial photography / UAV orthophotos
- Digital elevation models (DEM / DSM / DTM)
- Land-use / vegetation / weather grid maps
- Cloud Optimized GeoTIFF (COG) for streaming from object storage
反适用
AVOID
- 任何非地理场景(普通照片用 JPEG / WebP)
- Web 直接展示(浏览器不解 GeoTIFF · 需服务端切瓦片)
- 对 metadata 不敏感的临时图像处理
- Anything non-geographic (use JPEG / WebP for ordinary photos)
- Direct browser display (no native GeoTIFF support — serve tiles via WMS / XYZ instead)
- Throwaway image work that doesn't care about metadata
| scope | readers | editors | CLI |
|---|---|---|---|
| GeoTIFF (OGC) | ✓✓ GDAL · QGIS · ArcGIS · ENVI · ERDAS · rasterio (Python) · sf / terra (R) · OpenLayers / Leaflet (经服务端切瓦片) | ✓ QGIS · ArcGIS · GlobalMapper · 任意基于 GDAL 的 GIS 软件 | gdalinfo in.tif · gdal_translate · gdalwarp · rio info |
NITF — 军用情报图像
NITF — military intelligence imagery
"军方版 GeoTIFF,加上一堆 'security classification' 标记。"
"Military GeoTIFF plus a pile of security classification tags."
1980 年代,美国国防部需要一个"统一的情报图像格式" —— 卫星侦察、航空侦察、地面侦察、目标识别、地图、火控影像都要互通,且要带军方专用的元数据。1987 年发布 NITF 1.0(National Imagery Transmission Format),1998 年升 NITF 2.0,2005 年定 NITF 2.1 / MIL-STD-2500C,即今天的现役版本(同步对外发布的 ISO/IEC 12087-5 国际标准基本与之等价)。设计目标包括:(a) 多 segment 文件 —— 一个 .ntf 可以同时装多张图、多张文本注释、多张图形覆盖物(graphic / overlay)、已知地标(LUT 类);(b) 强制 security classification —— 每个 segment 有 1 字节的密级标记(U=Unclassified / C=Confidential / S=Secret / T=Top Secret),还有控制释放范围的 NOFORN / REL TO 等代号;(c) 支持 JPEG / JPEG 2000 / 无压缩的 payload。MIL-STD-2500C 全文 800 多页,涵盖从"怎么标记机密"到"怎么嵌入手绘图标"的全套军用流程。商业领域几乎没人用 —— 它是国防 + 北约 + 部分国家测绘局的内部语言。
In the 1980s the US Department of Defense needed a single intelligence-imagery container — satellite recon, aerial recon, ground recon, target identification, maps and fire-control imagery had to interoperate and carry military-specific metadata. NITF 1.0 (National Imagery Transmission Format) shipped in 1987, NITF 2.0 in 1998, and NITF 2.1 / MIL-STD-2500C in 2005 — the version still in service (ISO/IEC 12087-5, published in parallel, is essentially equivalent). Design goals: (a) multi-segment file — one .ntf can hold multiple images, text annotations, graphic / overlay layers and known-landmark tables (LUT-class); (b) mandatory security classification — every segment carries a one-byte clearance marker (U = Unclassified / C = Confidential / S = Secret / T = Top Secret) plus distribution caveats like NOFORN or REL TO; (c) JPEG / JPEG 2000 / uncompressed payload. MIL-STD-2500C runs over 800 pages, covering everything from "how to tag classification" to "how to embed hand-drawn icons". Commercial use is essentially zero — NITF is the internal language of the US DoD, NATO and a few national mapping agencies.
CLAS(security class)字段:U / C / S / T。文件总密级取所有 segment 中的最高值;读者只看到自己有权限的段。
CLAS (security class) field: U / C / S / T. The file's overall classification equals the max across segments; readers see only the segments their clearance permits.
技术内核
Technical core
NITF 内核三件事。① 多 segment 容器:一个 .ntf 文件 = 一个 file header(388 byte 起,固定字段)+ N 个 image segment + N 个 graphic segment + N 个 text segment + N 个 reserved extension segment(给 NGA 或第三方扩展用)。每个 segment 是独立单元 —— 自己的 header、自己的 payload、自己的密级。这跟 GeoTIFF 一个文件一张图的设计思路完全不同 —— NITF 一个文件就是一份"情报包"。② 强制 security classification:每个 segment 的 header 里都有 1 byte CLAS 字段(U=Unclassified / C=Confidential / S=Secret / T=Top Secret);此外还有 control caveats(NOFORN=No Foreign Nationals / REL TO XXX=Releasable To 列表 / ORCON=Originator Controlled 等)。文件总密级 = 所有 segment 密级的最高值;读取系统按读者权限逐段 redact —— 你看到的可能是同一份 .ntf 但里面只有 2 个 segment,其它的被空白替代。③ 支持多种 payload:image segment 的实际像素数据可以是无压缩 raw、JPEG(legacy)、JPEG 2000(主流 · 因为 JP2 的 progressive + ROI + 任意分辨率层级正好契合"先看缩略图再放大看局部"的情报场景)、Vector Quantization(VQ,1990s 老格式)。NITF 也支持嵌入 GeoTIFF 风格的地理元数据(通过专门的 ICHIPB / RPC00B 之类 TRE,Tagged Record Extensions),所以一张 NITF 同时是图、是地理参考、是密级文档。
NITF's core, three pieces. ① Multi-segment container: a .ntf is one file header (≥ 388 bytes, fixed fields) + N image segments + N graphic segments + N text segments + N reserved-extension segments (for NGA or third-party extensions). Each segment is independent — its own header, its own payload, its own classification. The opposite of GeoTIFF's "one file, one image" model — NITF is "one file, one intelligence package". ② Mandatory security classification: each segment header carries a one-byte CLAS field (U = Unclassified / C = Confidential / S = Secret / T = Top Secret); plus control caveats (NOFORN = No Foreign Nationals / REL TO XXX = Releasable To list / ORCON = Originator Controlled, etc.). File-level classification = max across segments; reading systems redact per segment by clearance — your copy of the .ntf may show only two segments, the rest blanked. ③ Multiple payload types: image segments can carry uncompressed raw, JPEG (legacy), JPEG 2000 (mainstream — JP2's progressive + ROI + arbitrary resolution layers fit the "thumbnail first, zoom into a region" intelligence workflow perfectly), or Vector Quantization (VQ, an older 1990s codec). NITF also embeds GeoTIFF-style geo metadata via dedicated TREs (Tagged Record Extensions) such as ICHIPB / RPC00B — so a single .ntf is at once an image, a geo-reference and a classified document.
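The classification roll-up in ② can be sketched as pure logic. This models only the max/redact rule; the real standard also defines R = Restricted plus the caveat system, and no attempt is made here at MIL-STD-2500C's actual field layout (the dict keys below are illustrative, not NITF field names):

```python
# Clearance order, low to high; NITF encodes each level as a single character.
LEVELS = "UCST"  # Unclassified < Confidential < Secret < Top Secret

def file_classification(segment_classes):
    """File-level classification = highest level among all segments."""
    return max(segment_classes, key=LEVELS.index)

def redact(segments, clearance):
    """Keep only the segments the reader's clearance permits."""
    limit = LEVELS.index(clearance)
    return [s for s in segments if LEVELS.index(s["clas"]) <= limit]

segments = [{"id": "IM1", "clas": "U"}, {"id": "IM2", "clas": "S"},
            {"id": "TX1", "clas": "C"}]
print(file_classification(s["clas"] for s in segments))  # S
print([s["id"] for s in redact(segments, "C")])          # ['IM1', 'TX1']
```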
适用
USE FOR
- 美国国防部 / NGA / 北约相关情报图像
- 需要文件内多 segment 不同密级混存的场景
- 侦察图像 + 分析员注记 + 覆盖物一体化交付
- US DoD / NGA / NATO intelligence imagery
- Mixed-classification segments inside a single file
- Recon image + analyst notes + overlays as one package
| scope | readers | editors | CLI |
|---|---|---|---|
| NITF (US DoD / NGA / NATO) | ~ 限国防工具链 · GDAL 部分支持 · ESRI ArcGIS for Defense · Hexagon ERDAS · BAE GXP | ~ ArcGIS for Defense · ENVI · Hexagon ERDAS · 其它需许可证 | nitfutils(NGA 官方) · gdalinfo --formats \| grep NITF · gdal_translate -of NITF |
FITS — 天文学的"什么都装"
FITS — astronomy's "one format to hold the universe"
"它是图,是表格,是光谱,是任何天文数据 —— '一种格式装下宇宙'。"
"It's an image, a table, a spectrum — anything astronomical. 'One format to hold the universe'."
1981 年,Don Wells、Eric Greisen、Ronald Harten 等几位射电 / 光学天文学家在《Astronomy & Astrophysics Supplement Series》上发表论文,提议一种"统一的天文数据格式":FITS = Flexible Image Transport System。痛点是当时美国国家光学天文台(NOAO)、国家射电天文台(NRAO)、欧南台(ESO)、各高校观测站都在用各自不兼容的二进制格式,数据交换得反复转换,且磁带寄送是常态(那个年代没有互联网,数据靠 9 轨磁带跨大洲邮寄)。设计目标:(a) 高位深图像(8-64 bit int / 32-64 bit float),因为 CCD 输出动辄 16-bit、长曝叠加后是 32-bit;(b) 多波段 / 多维数据立方体(空间 X/Y + 波长 Z 三维,甚至时间 T 第四维);(c) 表格存观测元数据(曝光时间、滤镜、坐标、温度、读出噪声等数百个字段);(d) 跨望远镜兼容,且自描述(读它的人不需要望远镜手册也能解读)。1988 年 IAU(国际天文联合会)正式认可 FITS 为天文数据交换标准。至今 40 余年未被替代 —— 因为天文需要严格的可读性、自描述性、跨工具一致性、长期归档能力,这些目标 HDF5 / NetCDF 等更现代格式都做不到比 FITS 更好的平衡。astropy 是 Python 天文社区的事实标准,from astropy.io import fits 是天文程序员的"hello world"。
In 1981, Don Wells, Eric Greisen and Ronald Harten — radio and optical astronomers — published a paper in Astronomy & Astrophysics Supplement Series proposing a unified astronomical data format: FITS = Flexible Image Transport System. The pain: the US NOAO (optical), NRAO (radio), ESO and university observatories all used incompatible binary formats; exchange meant constant conversion, and magnetic-tape shipping was the norm (no Internet — data travelled the world on 9-track tapes). Design goals: (a) high bit-depth images (8-64 bit int, 32-64 bit float — CCDs produce 16-bit, long-exposure stacks reach 32-bit); (b) multi-band / multi-dimensional data cubes (spatial X/Y + wavelength Z, or even a time axis T); (c) tabular metadata (exposure, filter, coordinates, temperature, read-noise — hundreds of fields); (d) cross-telescope, self-describing (a reader needs no instrument handbook). The IAU formally adopted FITS in 1988. Forty-plus years on, no replacement has stuck — modern formats like HDF5 / NetCDF can't beat FITS's balance of human-readability, self-description, cross-tool consistency and long-term archival. astropy is the de-facto Python astronomy library, and from astropy.io import fits is the astronomer-programmer's "hello world".
END 标记 header 结束。最神奇的是:你可以直接用 head -c 2880 image.fits 看到一段可读的元数据 —— 几十年的天文文件都能用文本编辑器"瞄一眼"。
END to mark header termination. The magic: you can head -c 2880 image.fits and read the metadata in plain text — decades-old astronomy files are still text-editor inspectable.
CRPIX1 / CRPIX2(参考像素位置)、CRVAL1 / CRVAL2(参考像素的 RA / Dec)、CDELT1 / CDELT2 或 CDi_j 矩阵(每像素的角秒)、CTYPE1 = 'RA---TAN'(投影类型,常见 TAN / SIN / ARC / ZEA)共同定义了像素到天球的可逆函数。这套机制由 Greisen & Calabretta 在 2002 年的两篇里程碑论文里完整化,所有现代天文软件(ds9 · astropy.wcs · IDL)都按这套读 WCS。
CRPIX1 / CRPIX2 (reference-pixel position), CRVAL1 / CRVAL2 (RA / Dec at the reference pixel), CDELT1 / CDELT2 or the CDi_j matrix (arcsec per pixel) and CTYPE1 = 'RA---TAN' (projection — typically TAN / SIN / ARC / ZEA) together define an invertible pixel-to-sky function. The framework was completed in two landmark Greisen & Calabretta papers (2002), and every modern astronomy tool (ds9 · astropy.wcs · IDL) reads WCS the same way.
技术内核
Technical core
FITS 内核六块。① HDU(Header Data Unit)链:一个 .fits 文件 = Primary HDU(必含)+ N 个 Extension HDU(可选),线性串联。Primary 装主图像;Extension 可以是 IMAGE(2D / 3D / N 维数组)、BINTABLE(二进制表)、TABLE(ASCII 表)。一个观测出来的 .fits 通常就是"主图 + mask + error + 源星表"四件套。② 80 byte ASCII header card:格式 KEYWORD = value / comment,全大写、固定 byte 1-8 是关键字、byte 9 是 '='、byte 11-30 是值、byte 31 是 '/'、byte 32-80 是注释。36 张卡 = 1 个 2880 byte 块(磁带遗产);最后一张是 END。你能用文本编辑器看见 FITS 的元数据 —— 这是 FITS 跨越 40 年的根本原因之一。③ 多维数据数组:NAXIS = N(维度数)/ NAXIS1, NAXIS2, …, NAXISN(各维大小)/ BITPIX(每像素位深 8/16/32/-32/-64,负数表示 IEEE 754 浮点)。NAXIS=2 是图,NAXIS=3 是数据立方体(X/Y/λ),NAXIS=4 加时间维。④ BINTABLE / TABLE:这是 FITS 真正的"杀手级"扩展 —— 二进制表能存源星表(N 行 × 几十列,从 ID / RA / Dec 到亮度 / 颜色 / 形态参数)、光谱(N 行波长 × 流量)、时序数据(N 行时间 × 通量),全部走标准 TFORM / TTYPE / TUNIT 描述,任何 FITS 工具都能读。这点让 FITS 既是"图像格式"又是"科学数据库",HDF5 / NetCDF 都没有这种"老牌 + 跨工具一致"的优势。⑤ WCS(World Coordinate System):由 Greisen & Calabretta 在 2002 年的两篇里程碑论文里完整化,通过 CRPIX(参考像素)/ CRVAL(参考点天球坐标)/ CDELT 或 CDi_j 矩阵(每像素角秒)/ CTYPE = 'RA---TAN'(投影类型,常见 TAN / SIN / ARC / ZEA)定义可逆函数。GeoTIFF 学的就是这套思路,只是把"天球"换成了"地球"。⑥ 多种 tile compression:Rice(整数、低噪图像最优 · 2-3×)/ GZIP(通用 · 2-3×)/ PLIO(mask 类整数图 · 4-8×)/ HCOMPRESS(有损 · 巡天图像 · 4-10×,JWST / Pan-STARRS 等大型项目用它)。压缩存在专门的 BINTABLE 扩展里,不破坏任何 FITS 兼容性 —— 不认识压缩的工具仍能识别 BINTABLE,只是看不懂里面是图。
FITS's core, six pieces. ① HDU (Header Data Unit) chain: a .fits file = Primary HDU (mandatory) + N Extension HDUs (optional), in a linear chain. Primary holds the main image; Extensions can be IMAGE (2D / 3D / N-D array), BINTABLE (binary table) or TABLE (ASCII table). A typical observation produces "main image + mask + error map + source catalogue" in a single file. ② 80-byte ASCII header card: format KEYWORD = value / comment, all uppercase, bytes 1–8 are the keyword, byte 9 is '=', bytes 11–30 are the value, byte 31 is '/', bytes 32–80 are the comment. Thirty-six cards = one 2880-byte block (magnetic-tape heritage); the last card is END. You can read FITS metadata in a text editor — one of the deep reasons FITS has lasted forty years. ③ Multidimensional data arrays: NAXIS = N (dimensions) / NAXIS1, NAXIS2, …, NAXISN (sizes) / BITPIX (per-pixel bit depth 8/16/32/-32/-64; negative = IEEE 754 float). NAXIS=2 is an image, NAXIS=3 is a data cube (X/Y/λ), NAXIS=4 adds time. ④ BINTABLE / TABLE — FITS's killer extension: binary tables hold source catalogues (N rows × tens of columns from ID / RA / Dec to magnitudes / colours / shape parameters), spectra (N rows of wavelength × flux), time series (N rows of time × flux) — all described via standard TFORM / TTYPE / TUNIT, readable by any FITS tool. This is what makes FITS simultaneously "image format" and "scientific database" — an edge HDF5 / NetCDF can't match. ⑤ WCS (World Coordinate System): completed in Greisen & Calabretta's two landmark 2002 papers — CRPIX (reference pixel) / CRVAL (sky coordinate at the reference pixel) / CDELT or CDi_j matrix (arcsec/pixel) / CTYPE = 'RA---TAN' (projection — typically TAN / SIN / ARC / ZEA) together define an invertible function. GeoTIFF borrowed exactly this design, just swapping "sky" for "Earth". 
⑥ Tile compressions: Rice (best for low-noise integer images, 2-3×) / GZIP (general, 2-3×) / PLIO (mask-style integer images, 4-8×) / HCOMPRESS (lossy, survey imagery, 4-10× — used by JWST, Pan-STARRS and other large surveys). Compressed data lives inside a dedicated BINTABLE extension, so FITS compatibility is preserved — tools that don't understand the compression still see a BINTABLE, just can't decode the image inside.
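The fixed-format card layout in ② makes a header block parseable with string slicing alone. A toy parser over a hand-built 2880-byte block, handling only simple fixed-format cards (real code should use astropy.io.fits, which also covers strings, continuation cards and non-standard values):

```python
def parse_header_block(block):
    """Split one 2880-byte FITS header block into its 36 80-byte cards and
    return {keyword: raw value string} up to the END card."""
    assert len(block) == 2880
    header = {}
    for i in range(36):
        card_text = block[i * 80:(i + 1) * 80].decode("ascii")
        keyword = card_text[:8].strip()
        if keyword == "END":
            break
        if card_text[8:10] == "= ":                      # value indicator
            header[keyword] = card_text[10:].split("/")[0].strip()
    return header

def card(text):
    """Pad a card to the mandatory 80 bytes."""
    return text.ljust(80).encode("ascii")

block = (card("SIMPLE  =                    T") +
         card("BITPIX  =                   16") +
         card("NAXIS   =                    2") +
         card("NAXIS1  =                 1024 / columns") +
         card("END")).ljust(2880, b" ")
hdr = parse_header_block(block)
print(hdr["BITPIX"], hdr["NAXIS1"])  # 16 1024
```

Because every card is plain ASCII in fixed columns, this is also exactly what `head -c 2880` shows you in a terminal.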
图 52 · FITS 完整天文工作流。望远镜 CCD 输出 16/32-bit raw,pipeline 写成多 HDU FITS(主图 + mask + error + 源星表),ds9 / astropy 读入后通过 header 里的 WCS 把像素转成 (RA, Dec) 天球坐标 —— 这一步让"今天 JWST 的红外图"和"30 年前 Hubble 的可见光图"可以叠在同一片天空上做联合分析。下游再走 SourceExtractor 测光、specutils 拟合光谱、matplotlib 出图,最后进入论文。这条流水线 1981 年至今没有结构上的变化,只是工具变了 —— 磁带 → CD → FTP → AWS S3。MAST(STScI)、IRSA(IPAC)、NED 等天文数据中心至今全部按 FITS 提供下载。
Fig 52 · The full FITS astronomical workflow. The telescope CCD emits 16/32-bit raw, the pipeline writes a multi-HDU FITS (main image + mask + error + source catalogue), ds9 / astropy reads it and uses the WCS in the header to convert pixels to (RA, Dec) sky coordinates — that single step lets "today's JWST infrared image" and "Hubble's visible-light image from thirty years ago" be stacked on the same patch of sky for joint analysis. Downstream: SourceExtractor for photometry, specutils for spectral fitting, matplotlib for publication figures, then into the paper. This pipeline has been structurally unchanged since 1981; only the tools changed — magnetic tape → CD → FTP → AWS S3. MAST (STScI), IRSA (IPAC), NED and other archives still distribute everything as FITS.
| compression | typical ratio | lossy? | typical use |
|---|---|---|---|
| Rice | 2-3× | 无损 | 低噪整数图(CCD raw) |
| GZIP | 2-3× | 无损 | 通用 / 文本类数据 |
| PLIO | 4-8× | 无损 | mask 图(整数 / 稀疏) |
| HCOMPRESS | 4-10× | 有损 | 巡天图像(JWST · Pan-STARRS) |
$ ds9 image.fits # SAOImage 查看 · 天文标配 GUI
$ python -c "from astropy.io import fits; \
    fits.open('img.fits').info()" # Python · 列出所有 HDU
$ funpack img.fits.fz # CFITSIO 解 Rice / GZIP / HCOMPRESS
$ wcsinfo img.fits # 看 WCS · CRPIX / CRVAL / CTYPE
$ fitsverify img.fits # 校验是否合规 FITS · NASA 出品
适用
USE FOR
- 所有天文数据(从太阳到深空)
- 多维科学数据(N-D 数组、数据立方体)
- 需要长期归档(40+ 年向后兼容)
- 需要 ASCII 可读 metadata 的科学场景
- 需要 BINTABLE 二进制表 + 图像同文件的工作流
- 跨望远镜 / 跨时代数据叠加(WCS 一致性)
- All astronomical data (Sun to deep sky)
- Multidimensional scientific data (N-D arrays, data cubes)
- Long-term archival (40-plus-year backward compatibility)
- Scientific work needing ASCII-readable metadata
- Workflows mixing BINTABLE and image in one file
- Cross-telescope / cross-era data stacking (WCS consistency)
反适用
AVOID
- 任何非科学场景(用 PNG / JPEG / TIFF)
- 需要浏览器原生展示(零浏览器支持)
- 对压缩比极度敏感的存档(用 HEIF / AVIF)
- 不需要 WCS / metadata 的工作流(开销浪费)
- Anything non-scientific (use PNG / JPEG / TIFF)
- Native browser display (zero browser support)
- Archives extremely size-sensitive (use HEIF / AVIF)
- Workflows that don't need WCS / metadata (overhead waste)
| scope | readers | editors / pipelines | CLI |
|---|---|---|---|
| FITS (IAU) | ✓✓ ds9 · astropy.io.fits · CFITSIO · IDL · IRAF · CASA · Aladin · ESA Datalabs · MAST / IRSA / NED 数据中心 | ✓✓ astropy 全家桶(specutils / photutils / lightkurve)· STScI Hubble pipeline · NASA JWST pipeline · ESO 镜像 | fitsinfo · fitsdump · fitscopy · fitsverify · funpack · wcsinfo |
JP2 / JPX — JPEG 2000 在科学领域的活路
JP2 / JPX — JPEG 2000's afterlife in science
"JPEG 2000 在 web 死了,在医学和卫星领域活得很好。"
"JPEG 2000 died on the web; it lives well in medicine and satellites."
2000 年 ISO/IEC 15444-1 发布,JPEG 2000 标准里附带 JP2 文件结构 —— 一个 ISOBMFF 风格的 box 容器(跟 MP4 同源,2001 年才被定为 ISOBMFF;但 JP2 box 框架是更早成型的同套思路),内部装 JPEG 2000 codestream payload + 色彩管理 + 分辨率信息 + ICC profile 等元数据。2003 年 ISO/IEC 15444-2 定义 JPX 扩展,允许多 codestream(类似多页)、复杂 metadata、跟 XML 的元数据集成、富互动结构。设计本意是替代 JPEG 成为下一代主流 —— 任意分辨率层级解码、无损 + 有损切换、ROI(region of interest)优先解码、progressive 流式播放。但是 web 浏览器拒绝实现:Chromium / Firefox 都说 "JPEG 2000 解码 CPU 开销太大,而 web 流量是体积敏感不是质量敏感",只有 Safari 至今支持(macOS / iOS 走系统 ImageIO 框架)。结果 JPEG 2000 在主流场景死亡,但在不被浏览器决定的场景里活得很好:DICOM transfer syntax(医学影像标准内嵌 JP2 codestream)、卫星图像归档(ESA / NASA 部分管线)、文化遗产高保真扫描(Library of Congress 古籍数字化)—— 这些场景需要"任意分辨率层级解码"和"同一文件无损 + 有损切换"。
In 2000 ISO/IEC 15444-1 shipped, with the JPEG 2000 standard also defining the JP2 file structure — an ISOBMFF-style box container (cousin of MP4 — formally ISOBMFF only in 2001, but JP2's box framework is an earlier instance of the same philosophy) wrapping a JPEG 2000 codestream plus colour-management, resolution and ICC-profile metadata. In 2003 ISO/IEC 15444-2 defined the JPX extension, allowing multiple codestreams (page-like), richer metadata, XML metadata integration and interactive structures. The original ambition was to replace JPEG as the mainstream — any-resolution-layer decoding, lossless / lossy switch, region-of-interest priority decoding, progressive streaming. Browsers refused: Chromium and Firefox both said "JPEG 2000 decode is too CPU-heavy, and web traffic is size-sensitive not quality-sensitive"; only Safari supports it today (via macOS / iOS ImageIO). So JPEG 2000 died on the mainstream — and lives well in scenes browsers don't gatekeep: DICOM transfer syntaxes (medical-imaging standards embedding JP2 codestreams), satellite-image archiving (ESA / NASA pipelines), cultural-heritage high-fidelity scans (Library of Congress book digitisation) — places that need "arbitrary resolution layers" and "lossless / lossy in one file".
jP(12 byte 签名)/ ftyp(文件类型 'jp2 ')/ jp2h(image header super-box,内含 ihdr 宽高位深 / colr 色彩空间)/ jp2c(实际的 JPEG 2000 codestream payload)。JPX(ISO/IEC 15444-2)扩展可加多个 jp2c(类似多页)+ uuid 自定义 box + XML metadata。这套 box 哲学跟 MP4 / HEIF 同源 —— 都把"容器和编码解耦"当成第一原则。
jP (12-byte signature) / ftyp (file type 'jp2 ') / jp2h (image-header super-box containing ihdr for width/height/depth and colr for colour space) / jp2c (the actual JPEG 2000 codestream payload). JPX (ISO/IEC 15444-2) extends this with multiple jp2c boxes (page-like), uuid custom boxes and XML metadata. The same box philosophy as MP4 / HEIF — "container decoupled from codec" as first principle.
技术内核
Technical core
JP2 / JPX 内核两件事。① JP2 = ISOBMFF box 容器 + JPEG 2000 codestream payload:容器层负责文件组织(签名 / 文件类型 / 图像头 / 色彩管理 / ICC profile / 分辨率信息 / metadata),payload 层是 JPEG 2000 的 wavelet codestream(EBCOT 码块 + 分辨率层 + 质量层),两者解耦。这套 box 哲学跟 MP4 / HEIF 同源,工业实现都共享 ISOBMFF parser。② JPX(ISO/IEC 15444-2)是 JP2 的扩展,加多 codestream(可装多张图,类似 PDF 的多页)、复杂 metadata(XML / RDF 集成,适合文化遗产场景描述古籍册次 / 著录信息)、富互动结构(超链接、分层标注)。JP2 在医学和卫星归档活下来的真正原因不是容器多复杂,而是 JPEG 2000 codestream 的两个核心特性:(a) 同一文件可无损或有损切换 —— 用 reversible 5/3 wavelet 是无损,irreversible 9/7 wavelet 是有损,客户端按 quality layer 选;DICOM 1.2.840.10008.1.2.4.91 transfer syntax 就是有损 9/7,部分医院 CT 用它做长期归档;(b) 任意分辨率层级解码 —— wavelet 多分辨率天然支持"先看 1/8 缩略图,再按需解 1/4、1/2、原始",对超大幅图像(古籍数字化 50K×50K 像素 / 卫星 10000×10000 多波段)做"渐进 + ROI 优先"流式查看是杀手锏。这种"同一文件多种用法"的能力 JPEG / WebP / AVIF 都没有(它们要么必须有损要么必须无损,无法切换)。
JP2 / JPX core, two pieces. ① JP2 = ISOBMFF box container + JPEG 2000 codestream payload: the container layer handles file organisation (signature / file type / image header / colour management / ICC profile / resolution / metadata); the payload is the JPEG 2000 wavelet codestream (EBCOT code blocks + resolution layers + quality layers); the two are decoupled. The same box philosophy as MP4 / HEIF — industrial implementations share the same ISOBMFF parser. ② JPX (ISO/IEC 15444-2) extends JP2 with multiple codestreams (page-like, à la PDF), richer metadata (XML / RDF integration — perfect for cultural-heritage cataloguing of book volumes and bibliographic records) and interactive structures (hyperlinks, layered annotations). The real reason JP2 lives on in medicine and satellite archiving isn't container sophistication — it's two core properties of the JPEG 2000 codestream itself: (a) lossless / lossy in one file — the reversible 5/3 wavelet is lossless, the irreversible 9/7 is lossy, and the client picks via quality layer; DICOM transfer syntax 1.2.840.10008.1.2.4.91 is lossy 9/7, used by some hospital CT archives; (b) arbitrary-resolution-layer decoding — wavelet multi-resolution naturally lets you "see a 1/8 thumbnail first, then decode 1/4, 1/2, original on demand". For huge imagery (50K×50K-pixel book scans, 10000×10000 multi-band satellite scenes), "progressive + ROI-priority" streaming is the killer feature. JPEG / WebP / AVIF have no equivalent — they're forced lossy or forced lossless, no switch.
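这套 box 结构用十几行代码就能走一遍。下面是一个最小示意(假设:walk_jp2_boxes 是自造函数名;演示数据是手工拼的 jP 签名 + ftyp 两个 box,不是完整 JP2 文件),按 ISOBMFF 的"4 字节大小 + 4 字节类型"规则顺序列出顶层 box:
The box layout above can be walked in a dozen lines. A minimal sketch (assumptions: walk_jp2_boxes is our own name; the demo bytes are a hand-built jP signature + ftyp pair, not a full JP2 file), following ISOBMFF's "4-byte size + 4-byte type" rule:

```python
import struct

def walk_jp2_boxes(data: bytes):
    """顺序遍历顶层 box,返回 (type, offset, size) 列表。"""
    boxes, pos = [], 0
    while pos + 8 <= len(data):
        size, btype = struct.unpack(">I4s", data[pos:pos + 8])
        if size == 1:                      # 64-bit 扩展长度
            size = struct.unpack(">Q", data[pos + 8:pos + 16])[0]
        elif size == 0:                    # box 延伸到数据末尾
            size = len(data) - pos
        boxes.append((btype.decode("latin-1"), pos, size))
        pos += size
    return boxes

# 手工拼一个最小文件头:12 字节 jP 签名 box(内容固定 0x0D0A870A)+ ftyp box(brand 'jp2 ')
sig  = struct.pack(">I4s", 12, b"jP  ") + bytes([0x0D, 0x0A, 0x87, 0x0A])
ftyp = struct.pack(">I4s", 20, b"ftyp") + b"jp2 " + b"\x00\x00\x00\x00" + b"jp2 "
print(walk_jp2_boxes(sig + ftyp))  # [('jP  ', 0, 12), ('ftyp', 12, 20)]
```

真正的 JP2 解析还要递归进 jp2h super-box 找 ihdr / colr;MP4 / HEIF 的 parser 用的是同一套循环。
A real JP2 parser would additionally recurse into the jp2h super-box for ihdr / colr; MP4 / HEIF parsers use the same loop.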
适用
USE FOR
- DICOM transfer syntax(医学影像归档)
- 卫星图像归档(ESA / NASA / 部分商业卫星)
- 文化遗产高保真扫描(LoC · 大英图书馆古籍)
- 电影 DCP(Digital Cinema Package · DCI 强制 JP2)
- 需要同一文件无损 / 有损切换的存档场景
- 需要任意分辨率层级 + ROI 优先解码的超大图
- DICOM transfer syntax (medical-image archives)
- Satellite-image archiving (ESA / NASA / commercial sats)
- Cultural-heritage high-fidelity scans (LoC, British Library)
- Digital Cinema Package (DCI mandates JP2)
- Archives needing in-file lossless / lossy switch
- Huge images needing arbitrary-resolution + ROI-priority decode
反适用
AVOID
- Web(只 Safari 原生,Chromium / Firefox 拒绝)
- 移动端(解码 CPU / 内存开销大)
- 追求极致压缩比的现代场景(用 AVIF / HEIF)
- 不需要分辨率层级或 ROI 的普通图像
- The web (Safari only — Chromium / Firefox refused)
- Mobile (heavy decode CPU / memory)
- Modern scenes chasing peak compression (use AVIF / HEIF)
- Plain images without resolution-layer / ROI needs
| scope | readers | editors | CLI |
|---|---|---|---|
| JP2 / JPX | ~ Safari(macOS / iOS · 原生)· OpenJPEG(开源)· Kakadu(商业 · 性能基准)· DICOM 阅读器(MicroDicom · OsiriX · Horos)· GDAL · ImageMagick | ~ Photoshop(JP2 / JPX 插件)· GIMP(部分版本)· IrfanView · 文化遗产专用扫描软件 | opj_decompress(OpenJPEG)· kdu_expand(Kakadu)· gdal_translate -of JP2OpenJPEG |
WebP2 — Google 的实验后代
WebP2 — Google's experimental heir
"WebP 的 mark II,但 AVIF 已经赢在了起跑线。"
"WebP Mark II — but AVIF already crossed the finish line."
2021 年 Google 启动 WebP2 项目,目标是"做 WebP 的下一代"—— 不再像 WebP v1 那样基于 VP8 帧内编码,而是吸收 AV1(2018 年发布的下一代视频 codec)的思想自研一套全新 codec,同时保留 WebP 的 web 优先哲学(简单容器、轻量解码、Chrome 直接支持)。但设计窗口已经关闭:WebP2 启动时,AVIF(直接基于 AV1 的图片格式)2019 年已被 Netflix / Google 推动落地,2020 年 Chrome 加入支持,2022 年 Firefox / Safari 全部跟进 —— Google 自己的浏览器都已经先支持了竞品。WebP2 在 libwebp2 仓库慢慢迭代,但从未推到 Chrome 主流支持;Google 自己也没有公开宣布要替代 WebP 或 AVIF。结果今天 WebP2 处于一种尴尬状态:技术上是真的在写,数据上压缩率确实跟 AVIF / JXL 在同一档,但商业上没有任何动机推它落地 —— 因为 AVIF 已经占据了"下一代 web 图片格式"的生态位。WebP2 项目自己的 README 第一句话就承认:"WebP 2 is an experimental successor of WebP. WebP 2 is not WebP, neither v2 of WebP."(WebP 2 是 WebP 的实验后继者,既不是 WebP,也不是 WebP 的 v2)—— 一个少见的、官方亲自打的"这是研究项目,不是产品"标签。
In 2021 Google started the WebP2 project, aiming to build "WebP's next generation" — no longer based on VP8 intra-frame coding like WebP v1, but absorbing ideas from AV1 (the 2018 next-gen video codec) into a freshly engineered codec, while keeping WebP's web-first philosophy (simple container, light decoder, native Chrome support). But the design window had already closed: by the time WebP2 launched, AVIF (the image format directly built on AV1) had been driven into production by Netflix and Google in 2019, picked up by Chrome in 2020, and joined by Firefox and Safari by 2022 — Google's own browser already supported the competitor. WebP2 keeps iterating in the libwebp2 repo, but has never been promoted to mainstream Chrome support; Google has never publicly committed to it replacing WebP or AVIF. Today WebP2 sits in an awkward limbo: technically real, with compression on par with AVIF / JXL, but with zero commercial pressure to ship — AVIF already owns the "next-gen web image" ecological niche. The project's own README opens with a rare self-aware disclaimer: "WebP 2 is an experimental successor of WebP. WebP 2 is not WebP, neither v2 of WebP." A research project, officially labelled as such.
技术内核
Technical core
WebP2 内核两件事。① 基于 AV1 思想自研 codec(不直接用 AV1) —— Google 没有像 AVIF 那样直接抄 AV1 的帧内编码,而是从 AV1 借鉴几个思路(更大的 transform block 64×64、更聪明的 intra prediction、entropy coding 改进)然后自研一套独立 codec。原因有政治也有技术:技术上,Google 想做更轻量的解码器,AVIF 的解码器其实是 AV1 的子集,代码量大,移动设备 CPU 紧;政治上,WebP / VP9 / AV1 都是 Google 系的开放视频 codec 生态,WebP2 是想做"web 图片专用、不背 video codec 包袱"的小而美。但代价是 —— 没有现成的 AV1 解码器可借,得自己写。② 仍在 Google libwebp2 开发,未推到 Chrome 主流 —— libwebp2 是 Google 自己的开源库,在 GitHub 持续提交,但 Chrome 至今没有 webp2 的 image decoder 注册(对比 WebP 是 2010 年原生,AVIF 是 2020 年原生)。Google 自己也没公开 commit 推它落地 —— 一种"我们继续研究,但不答应商业化"的姿态。这种姿态在大公司开源项目里很少见,通常要么开发要么砍,WebP2 罕见地处于"长期实验状态"。
WebP2 core, two pieces. ① AV1-inspired but home-grown codec (not AV1 itself) — Google did not, like AVIF, just adopt AV1's intra-frame coding directly. Instead it borrowed ideas from AV1 (larger 64×64 transform blocks, smarter intra prediction, improved entropy coding) and engineered its own independent codec. The reason is partly political, partly technical: technically, Google wanted a lighter decoder — AVIF's decoder is essentially a subset of AV1, code-heavy and tight on mobile CPU; politically, WebP / VP9 / AV1 are all Google-aligned open video codecs, and WebP2 was meant to be a small purpose-built web-image codec without the video-codec baggage. The cost: no off-the-shelf AV1 decoder to borrow — everything written from scratch. ② Still in Google's libwebp2, never promoted to mainstream Chrome — libwebp2 is Google's own open-source library, with continuing GitHub commits, but Chrome has no webp2 image decoder registered (compare: WebP native since 2010, AVIF native since 2020). Google has never publicly committed to shipping it — a "we keep researching, but won't promise productisation" posture. Rare for big-company open source: usually it's either ship or kill — WebP2 sits in unusual long-term experimental limbo.
适用
USE FOR
- (研究)Codec 对比基准
- libwebp2 开发者社区实验
- 关注下一代图像 codec 的从业者跟踪样本
- (Research) codec comparison benchmarks
- libwebp2 developer-community experiments
- Tracking sample for next-gen image-codec watchers
反适用
AVOID
- 任何生产环境(浏览器原生支持为零)
- 任何对兼容性有要求的场景
- 替代 AVIF / WebP / JXL —— 没有理由
- Any production setting (zero native browser support)
- Any compatibility-sensitive scenario
- Replacing AVIF / WebP / JXL — no reason to
| scope | readers | editors | CLI |
|---|---|---|---|
| WebP2 | ✗ 无浏览器原生 · ~ Google libwebp2 库自带的参考解码器 | ✗ 无主流编辑器支持 | cwp2 / dwp2(libwebp2 仓库自带 · 仅参考实现) |
AVIF Sequence — 视频帧序列
AVIF Sequence — when stills become a video track
"AVIF 的'多帧'就是把 AV1 的 video 模式接回来。"
"AVIF's multi-frame mode just dials AV1's video back in."
AVIF 单图模式只用 AV1 的 intra-frame(关键帧)编码 —— 因为 web 图片不需要"前一帧后一帧"。但 AVIF 用的容器是 HEIF(基于 ISOBMFF,跟 MP4 / JP2 同源),HEIF 容器原本就是为视频设计的,有完整的 video track 概念。所以 AVIF Sequence 做的事情非常简单:把 AV1 的 video 模式装回去 —— 让一个 AVIF 文件可以装多帧、有 timeline、可循环、可带帧间预测。结果是一种"高质量短动图替代品":代替 GIF 的 8 bit 256 色 + LZW 暴体积、代替 animated WebP 的 VP8 老 codec。实测体积比 animated WebP 小 30-50%,因为 AV1 的帧间预测远比 VP8 高效。但代价是 —— AVIF Sequence 是真正的视频压缩,带 motion estimation / motion compensation,编码时间是 animated WebP 的 10-30×。这意味着服务器侧预编码可行,用户实时上传不行:Twitter / Reddit / Imgur 这种用户上传场景你不能让用户等 30 秒;但 Cloudinary / imgix 这种 CDN 中间层服务器预编码 OK。AVIF Sequence 现在的实际用法:替代 GIF 表情包(质量 + 体积都赢)、替代 web 短动画(.mp4 太重 / GIF 太丑的中间地带)、替代某些 Live Photo 场景(iOS HEIF 走的是相邻路线)。
AVIF's still-image mode only uses AV1's intra-frame (keyframe) coding — web images don't need "previous-frame / next-frame". But AVIF's container is HEIF (built on ISOBMFF, sharing roots with MP4 / JP2), and HEIF was designed for video in the first place, with a full video-track concept. So AVIF Sequence does something extremely simple: dial AV1's video mode back in — let a single AVIF file hold multiple frames, with a timeline, looping, and inter-frame prediction. The result is a "high-quality short-animation substitute": replacing GIF's 8-bit 256-colour LZW bloat and animated WebP's older VP8 codec. Measured sizes are 30–50% smaller than animated WebP, because AV1's inter-frame prediction is far more efficient than VP8. The cost: AVIF Sequence is real video compression, with motion estimation / motion compensation — encoding takes 10–30× longer than animated WebP. So server-side pre-encoding is fine, real-time user uploads are not: Twitter / Reddit / Imgur, where users upload live, can't make the user wait 30 seconds; Cloudinary / imgix as a CDN middle layer can. Today's actual uses: replacing GIF stickers (better quality and smaller); replacing web short animations (the middle ground between heavy .mp4 and ugly GIF); replacing some Live-Photo flows (iOS HEIF takes a parallel path).
ftyp 用 brand avis(sequence)区分单图 avif;meta 装 image items(沿用单图模式的描述方式);moov 是真正的视频 track,装 AV1 codec 的 I 帧(intra,关键帧)+ P 帧(inter,帧间预测)的时间线。"I-P-P-P-P-I-..."就是 AVIF Sequence 比 animated WebP 小 30-50% 的原因 —— P 帧只编码"和上一帧的差",而 animated WebP 每帧都是独立的。
ftyp uses brand avis (sequence) to distinguish from still avif; meta holds image items (reusing still-mode description); moov is the actual video track with an AV1-codec timeline of I-frames (intra / keyframe) and P-frames (inter / predicted from previous). The "I-P-P-P-P-I-..." pattern is exactly why AVIF Sequence is 30–50% smaller than animated WebP — P-frames encode only the delta, while animated WebP encodes every frame independently.
技术内核
Technical core
AVIF Sequence 内核两件事。① HEIF 容器内多 image item 或 video track —— HEIF(High Efficiency Image Format)容器是 ISOBMFF 风格的 box 结构,跟 MP4 / JP2 同源。AVIF 单图模式用 meta box 装一个 image item(只一帧 intra);AVIF Sequence 有两种装法:(a)多 image item(每帧独立 intra,跟单图一样,只是多个);(b)走 moov video track(真正的视频 track,有时间戳、可循环、可装 inter 帧)。ftyp 用 brand 区分:avif 是单图,avis 是 sequence。② 帧间预测可选(不一定都是 intra) —— 如果走 video track 模式,AVIF Sequence 就是真正的视频压缩:P 帧编码"和上一帧的差",B 帧编码"和前后帧的差",带 motion estimation / motion compensation 整套机制。这是它比 animated WebP / GIF 小 30-50% 的根因 —— animated WebP 每帧都是独立 VP8 intra(本质上是多张静图叠在一起),AVIF Sequence 真把"运动"压缩了。但这个能力的代价非常贵:编码时间 10-30× animated WebP,因为 motion estimation 是计算密集型搜索;客户端实时编码不可行(用户上传不能让其等 30 秒),只能服务器预编码或 CDN 中间层转码。这种"质量 / 体积赢、编码慢"的权衡跟 AVIF 单图是一致的 —— AVIF 全家就是"花更多 CPU 换更小文件"。
AVIF Sequence core, two pieces. ① Multiple image items or a video track inside HEIF — the HEIF (High Efficiency Image Format) container is an ISOBMFF-style box structure, sharing roots with MP4 / JP2. AVIF still mode places a single image item in a meta box (one intra frame). AVIF Sequence has two ways: (a) multiple image items (every frame independent intra, like still mode but several of them); (b) a real moov video track (with timestamps, looping, and inter frames). The ftyp brand distinguishes them: avif for still, avis for sequence. ② Inter-frame prediction is optional (not necessarily all intra) — in video-track mode AVIF Sequence is actual video compression: P-frames encode the delta from the previous frame, B-frames encode deltas from both sides, complete with motion estimation / motion compensation. This is why it's 30–50% smaller than animated WebP / GIF — animated WebP is essentially a stack of independent VP8-intra stills, while AVIF Sequence really compresses motion. The cost: encoding takes 10–30× animated WebP, because motion estimation is a compute-intensive search; client-side real-time encoding isn't viable (you can't make a user wait 30 seconds on upload), so it lives on the server side or in a CDN transcoder. The same "quality / size win, slow encode" trade-off as still AVIF — the whole AVIF family trades CPU for smaller files.
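avif / avis 的区分只看文件开头 ftyp box 的 major brand,一个最小示意(假设:isobmff_brand 是自造函数名;演示数据是手工拼的 ftyp 头,真实文件 ftyp 后面还跟着 meta / moov 等 box):
Telling still avif from avis takes only the major brand of the leading ftyp box. A minimal sketch (assumptions: isobmff_brand is our own name; the demo bytes are hand-built ftyp headers, while real files continue with meta / moov boxes):

```python
import struct

def isobmff_brand(data: bytes) -> str:
    """读取文件开头 ftyp box 的 major brand。"""
    size, btype = struct.unpack(">I4s", data[:8])
    if btype != b"ftyp":
        raise ValueError("not an ISOBMFF file")
    return data[8:12].decode("ascii")

# 手工拼两个最小 ftyp 头:'avif' 是单图,'avis' 是 sequence
still = struct.pack(">I4s", 16, b"ftyp") + b"avif" + b"\x00\x00\x00\x00"
seq   = struct.pack(">I4s", 16, b"ftyp") + b"avis" + b"\x00\x00\x00\x00"
print(isobmff_brand(still), isobmff_brand(seq))  # avif avis
```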
适用
USE FOR
- 高质量短动图(替代 GIF / animated WebP)
- 表情包 / sticker / 反应图(质量 + 体积双赢)
- web 短动画(.mp4 太重 / GIF 太丑的中间地带)
- 服务器预编码 / CDN 中间层转码场景
- Live Photo 类场景(短视频 + 关键帧静图)
- High-quality short animations (replacing GIF / animated WebP)
- Stickers / reactions (smaller and better-looking)
- Web short animations (the middle ground between heavy .mp4 and ugly GIF)
- Server-side pre-encode / CDN transcode setups
- Live-Photo-like flows (short video plus a keyframe still)
反适用
AVOID
- 客户端实时编码(用户上传场景 · 编码 10-30× 慢)
- 需要真正视频功能(音轨 / 长时长 · 改用 .mp4)
- 老浏览器兼容场景(同 AVIF · IE / 老 Safari 不行)
- Client-side real-time encoding (user uploads — 10–30× slower)
- True video features (audio track / long duration — use .mp4)
- Legacy browsers (same as AVIF — IE / old Safari out)
| scope | readers | editors | CLI |
|---|---|---|---|
| AVIF Sequence | ✓ Chrome / Firefox / Safari modern · iOS / macOS Photos · libavif | ~ ffmpeg(via libavif)· FFmpeg-based 转码工具 | avifenc -k 0 frames/*.png anim.avif · ffmpeg -i in.mp4 -c:v libaom-av1 out.avif |
JPEG XS — 低延迟广播
JPEG XS — sub-millisecond broadcast
"为'实时'而生:压缩比小,延迟极低,正好接 4K/8K 直播。"
"Built for live — modest compression, sub-millisecond latency, just right for 4K/8K broadcast."
现代 4K / 8K 广播正在从 SDI 光纤切到 IP 流(SMPTE 2110 标准):传统电视台用 12G-SDI 光纤把 4K 信号从摄像机送到导播台,布线昂贵;新一代直接走以太网 IP 包,跟数据中心同基础设施。但 IP 流要解决一个传统 SDI 不存在的问题:带宽。4K 60p 未压缩是 12 Gbps,8K 是 48 Gbps,数据中心万兆 / 25G 以太网装不下。所以需要"压一下,但不能影响实时性"的 codec。JPEG XL 太复杂(编码慢)、JPEG 2000 也慢(EBCOT entropy coding 计算量大)、H.264 / H.265 / AV1 是视频 codec 但有帧间预测延迟(至少要缓 1-2 帧才能编),完全不行。JPEG WG 在 2018 年推 JPEG XS(ISO/IEC 21122):简化的 wavelet(不做完整 EBCOT,只用更轻量的 entropy coding),牺牲压缩比(只 4-6×,而 JPEG 是 10-20×、JPEG 2000 是 20-50×),换微秒到亚毫秒级编 / 解延迟。设计目标写在标准首页:"visually lossless at 4-6× compression with sub-millisecond latency"(视觉无损 + 4-6 倍压缩 + 亚毫秒延迟)。SMPTE 2110-22(2019)正式把 JPEG XS 列入 IP 广播标准的 mezzanine compression 层。VR 头显的 wireless display(无线 VR · 把 PC 渲染的画面无线传到头显)也用 —— 因为头显需要"运动到光子"<20ms 延迟才能不晕,JPEG XS 的<1ms 编 / 解给了足够预算。
Modern 4K / 8K broadcast is moving from SDI fibre to IP streams (SMPTE 2110): traditional TV stations used 12G-SDI fibre to ship 4K from camera to control room — expensive cabling. New ones run straight Ethernet IP, sharing infrastructure with data centres. But IP brings a problem SDI never had: bandwidth. 4K 60p uncompressed is 12 Gbps; 8K is 48 Gbps — 10/25 Gigabit Ethernet can't carry it raw. So you need a codec that "compresses a little without breaking real-time". JPEG XL is too complex (slow encode); JPEG 2000 is also slow (EBCOT entropy coding is heavy); H.264 / H.265 / AV1 are video codecs but have inter-frame-prediction latency (need to buffer 1–2 frames before encoding), totally unacceptable. The JPEG WG shipped JPEG XS in 2018 (ISO/IEC 21122): simplified wavelet (no full EBCOT, lighter entropy coding), trading compression ratio (only 4–6× — vs. JPEG's 10–20× and JPEG 2000's 20–50×) for microsecond-to-sub-millisecond encode / decode latency. The standard's front page literally says: "visually lossless at 4-6× compression with sub-millisecond latency". SMPTE 2110-22 (2019) formally adopted JPEG XS as the mezzanine-compression layer for IP broadcast. VR headsets using wireless display (PC-rendered frames sent wirelessly to the headset) use it too — because headsets need "motion-to-photon" latency under 20 ms to avoid sickness, and JPEG XS's sub-1 ms encode / decode leaves enough budget for everything else.
技术内核
Technical core
JPEG XS 内核三件事。① 简化的 wavelet(不做完整 EBCOT) —— JPEG 2000 的核心是 5/3 或 9/7 wavelet 加上 EBCOT(Embedded Block Coding with Optimal Truncation)entropy coding,EBCOT 提供超高压缩比但计算密集。JPEG XS 砍掉 EBCOT,只保留更简化的小波分解 + 轻量 entropy coding(直接 run-length / VLC),损失大概 5-10× 压缩比但获得10-100× 速度。② 视觉无损(typical 4-6× compression) —— 设计目标不是"压到最小",而是"压到肉眼看不出区别但带宽够省"。在 4K 60p 12 Gbps 场景,4-6× 压到 2-3 Gbps,刚好塞进 10 Gigabit Ethernet。这种"够用就好"的目标决定了它不会出现在 web(web 要求最小体积)、不会出现在归档(归档要求最高保真)。③ 帧内独立编码,无需缓冲 —— 每帧完全独立(类似 motion JPEG / motion JPEG 2000),没有帧间预测,所以编码器拿到一帧立刻编、解码器拿到一帧立刻解,延迟主要来自计算时间本身(<1 ms),不来自缓冲。这是相比 H.264 / AV1 这些视频 codec 的本质差异:视频 codec 必须缓冲 1-2 帧才能做 motion estimation,JPEG XS 完全不缓冲。代价是没有视频 codec 那种"压两个数量级"的能力,但这是 trade-off,不是缺陷。
JPEG XS core, three pieces. ① Simplified wavelet (no full EBCOT) — JPEG 2000's core is the 5/3 or 9/7 wavelet plus EBCOT (Embedded Block Coding with Optimal Truncation) entropy coding; EBCOT delivers very high compression but at heavy computational cost. JPEG XS strips EBCOT, keeping a much simpler wavelet decomposition plus lightweight entropy coding (direct run-length / VLC) — losing roughly 5–10× compression but gaining 10–100× speed. ② Visually lossless (typical 4–6× compression) — the design goal isn't "compress as much as possible", it's "compress until the eye can't tell, while saving useful bandwidth". On 4K 60p at 12 Gbps, 4–6× brings it down to 2–3 Gbps — fitting cleanly inside 10 Gigabit Ethernet. This "good enough" target keeps it out of the web (which wants the smallest size) and out of archives (which want the highest fidelity). ③ Intra-frame independent coding, no buffering — every frame is fully independent (like motion JPEG / motion JPEG 2000), no inter-frame prediction, so the encoder can encode the moment a frame arrives and the decoder can decode the moment it lands. Latency comes from compute alone (< 1 ms), not buffering. This is the essential difference vs. H.264 / AV1 video codecs: video codecs must buffer 1–2 frames to run motion estimation; JPEG XS buffers nothing. The price is no two-orders-of-magnitude video-style compression — but that's the trade-off, not a defect.
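上面的带宽账可以直接算出来,一个示意(假设 8-bit RGB 即 24 bpp,跟正文"4K 60p 未压缩 12 Gbps"的口径一致;广播常用的 10-bit 4:2:2 数字会不同):
The bandwidth arithmetic above checks out directly; a quick sketch (assuming 8-bit RGB at 24 bits per pixel, matching the text's "12 Gbps for uncompressed 4K 60p"; broadcast's usual 10-bit 4:2:2 would differ):

```python
def uncompressed_gbps(w, h, fps, bits_per_px=24):
    """未压缩视频码率(Gbps),默认 8-bit RGB = 24 bpp。"""
    return w * h * fps * bits_per_px / 1e9

raw_4k = uncompressed_gbps(3840, 2160, 60)   # ≈ 11.9 Gbps,对应正文的 "12 Gbps"
raw_8k = uncompressed_gbps(7680, 4320, 60)   # ≈ 47.8 Gbps
for ratio in (4, 6):
    print(f"4K 60p 压 {ratio}x -> {raw_4k / ratio:.1f} Gbps,塞进 10GbE: {raw_4k / ratio < 10}")
```

4-6× 正好把 4K 60p 压进 10 Gigabit Ethernet,这就是"够用就好"目标的来历。
4-6× squeezes 4K 60p neatly inside 10 Gigabit Ethernet — the origin of the "good enough" target.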
适用
USE FOR
- 4K / 8K 广播 IP 流(SMPTE 2110-22 mezzanine 层)
- VR 头显 wireless display(PC → 头显无线传图)
- 实时多机位摄影棚 IP 切换台
- 低延迟视频墙 / 监控墙(楼宇 / 控制室)
- "够用即可、宁要延迟不要压缩比"的实时场景
- 4K / 8K broadcast IP streams (SMPTE 2110-22 mezzanine)
- VR-headset wireless display (PC → headset)
- Live multi-camera studio IP switching
- Low-latency video walls (control rooms / signage)
- Real-time use cases where "good enough" beats "smallest"
反适用
AVOID
- Web 静图(用 JPEG / WebP / AVIF)
- 归档 / 压缩比敏感场景(用 JPEG 2000 / JXL)
- VOD / 离线点播(用 H.265 / AV1 视频 codec)
- Web stills (use JPEG / WebP / AVIF)
- Archives / size-sensitive scenes (use JPEG 2000 / JXL)
- VOD / offline playback (use H.265 / AV1 video codecs)
| scope | readers | editors | CLI |
|---|---|---|---|
| JPEG XS | ✗ 浏览器无 · ~ intoPIX SDK · SMPTE 2110-22 设备 · Kakadu(部分) | ✗ 无主流编辑器 · 都是广播链路硬件 / 软件 | ~ CLI 限商业(intoPIX)· 开源参考实现仅用于研究 |
神经压缩 — HiFiC / CDC / NN-codec
Neural compression — HiFiC / CDC / NN-codec
"用神经网络当 codec —— 同 bpp 下视觉效果比 AVIF 好 30%,但解码要 GPU。"
"Use a neural net as codec — visually 30% better than AVIF at same bpp, but needs a GPU to decode."
传统 codec 的设计哲学是手工设计 transform + 量化 + 熵编码:JPEG 用 8×8 DCT、AVIF 用 AV1 intra block 变换、JPEG 2000 用 wavelet —— 每一步都是人写的数学。神经压缩从 2016-2017 起换了路:整个 codec 是一个端到端可训练的神经网络。Toderici 等人 2016 年在 ICLR 用 RNN 做图像压缩;Ballé 等人 2018 年在 ICLR 提出 Hyperprior(用一个小网络估计 latent 的概率分布给熵编码器,大幅提升压缩比);Mentzer / Toderici 等人 2020 年在 NeurIPS 发表 HiFiC(High-Fidelity Generative Compression),引入 GAN 训练让低 bpp 重建有"细节合成";2023 年 Stanford 出 CDC(Conditional Diffusion Codec)用扩散模型当 decoder。在视觉相似度指标(MS-SSIM / LPIPS)上明显赢传统 codec —— 特别在极低 bpp(< 0.3 bpp):传统 codec 这时已经糊成方块、出 ringing,而 NN codec 可以"幻觉"出合理的纹理和细节(虽然不是真实的 —— 是plausibly hallucinated)。但工业部署寥寥:解码器 NN 必须随客户端分发(几十 MB 模型 vs 几 KB 图,反向负担);模型版本升级会让旧 .nn-img 解不出来;解码 GPU 依赖让移动端不可接受;学术界每 6 个月一篇 NeurIPS 论文宣布超越 AVIF 30%,但生产部署没几个真站住的。短期不会替代 AVIF,但可能在"AI 生成内容"领域率先落地 —— 同 AI 生成的图,用 AI 压缩。
Traditional codecs are hand-designed transforms + quantisation + entropy coding: JPEG uses 8×8 DCT, AVIF uses AV1 intra-block transforms, JPEG 2000 uses wavelets — every step is human-written maths. Neural compression took a different path from 2016–2017: the whole codec is a single end-to-end trainable neural network. Toderici et al. did RNN-based compression at ICLR 2016; Ballé et al. introduced the Hyperprior at ICLR 2018 (a small network estimating the latent's probability distribution for the entropy coder, dramatically improving ratios); Mentzer / Toderici et al. published HiFiC (High-Fidelity Generative Compression) at NeurIPS 2020, adding GAN training so low-bpp reconstructions get "detail synthesis"; in 2023 Stanford shipped CDC (Conditional Diffusion Codec) using a diffusion model as decoder. On visual-similarity metrics (MS-SSIM / LPIPS) they clearly beat traditional codecs — especially at very low bpp (< 0.3 bpp): traditional codecs by then are blocky and full of ringing, while NN codecs can "hallucinate" plausible texture and detail (not real — plausibly hallucinated). Industrial deployment stays thin: the decoder NN must ship with the client (tens of MB of model vs. a few KB of image — inverted load); a model-version bump makes old .nn-img files undecodable; GPU dependency rules out mobile; every six months a NeurIPS paper claims +30% over AVIF, but few productions actually stick. Short term it won't replace AVIF, but it may land first in "AI-generated content" — AI images compressed by AI.
技术内核
Technical core
神经压缩内核五块。① Encoder / Decoder 都是 CNN(典型 10-50M params),从图到 latent 是几层下采样卷积 + GDN(generalised divisive normalisation)非线性激活,从 latent 到图是对应的反卷积 / 上采样;两侧权重通过端到端反向传播联合训练。② 超先验(Hyperprior):Ballé 2018 的关键贡献 —— 用一个小网络估计 latent 每个 channel 的 Gaussian / Laplace 概率分布参数 σ,再用 σ 喂 arithmetic coder。这一步让"latent 的统计结构"被显式建模,熵编码效率提升一个量级;之后所有 NN codec 都沿用 Hyperprior 思路。③ GAN 训练(HiFiC):在 rate-distortion 损失之外加一个 Discriminator 判别"重建图 vs 原图",Decoder 学着"骗过 Discriminator"。低 bpp 重建从"糊状方块"变成"幻觉的合理纹理",MS-SSIM / LPIPS / FID 都大幅好转,但细节是合成的不是真实的 —— 这是 NN codec 不能用于法医 / 医学的根本原因。④ Diffusion-based codec(CDC):Stanford 2023 工作 —— Decoder 不是单次反卷积,而是一个条件扩散模型(以 latent 为条件,从噪声开始多步去噪到图像)。优势:diffusion 的"多步细化"对低 bpp 修复尤其好;劣势:解码 50-100 步 NN forward,慢到完全反实时(1080p 几秒)。CDC 现在还是学术阶段,但代表了 NN codec 的下一程方向。⑤ 解码必须 GPU:这是工业部署最大的物理约束 —— 移动端的 GPU(Adreno / Mali)架构跟桌面 NVIDIA 差太远,也跟 NN 推理优化(TensorRT / Core ML)的高端路径差太远;Web 上要做就得走 WebGPU,但目前 WebGPU 对 NN 推理的优化跟原生差 5-10×。所以 NN codec 现在的工业部署模式都是"中央服务器 GPU 解码 → 把解出来的 RGB 再压成 AVIF / WebP / VP9 → 发给客户端" —— 客户端从来没真正解过 NN codec 的码流。
Neural compression's core, five pieces. ① Encoder / Decoder are both CNNs (typically 10–50 M params); image → latent is a few downsampling convolutions plus GDN (generalised divisive normalisation) non-linearity; latent → image is the matching upsampling. Both sides are jointly trained end-to-end via backprop. ② Hyperprior: Ballé 2018's key contribution — a small network estimates the Gaussian / Laplace distribution parameters σ for every channel of the latent, then feeds σ into the arithmetic coder. This explicitly models the latent's statistical structure, lifting entropy efficiency by an order of magnitude; every NN codec since uses Hyperprior. ③ GAN training (HiFiC): on top of rate-distortion loss, add a Discriminator distinguishing "reconstruction vs. original"; the Decoder learns to "fool the Discriminator". Low-bpp reconstructions go from "blurry mush" to "hallucinated plausible texture"; MS-SSIM / LPIPS / FID all improve sharply — but the detail is synthesised, not real. That's the fundamental reason NN codecs can't be used in forensic / medical settings. ④ Diffusion-based codec (CDC): Stanford 2023 — the Decoder isn't a single deconvolution but a conditional diffusion model (start from noise, denoise to the image conditioned on the latent). Pros: diffusion's "multi-step refinement" works especially well for low-bpp restoration. Cons: 50–100 NN forwards per decode, completely off real-time (seconds per 1080p frame). CDC is still academic but charts the next leg. ⑤ Decoding requires a GPU: the biggest physical deployment constraint — mobile GPUs (Adreno / Mali) differ too much from desktop NVIDIA, and from NN-inference optimisation paths (TensorRT / Core ML). On the web you'd go through WebGPU, but its NN-inference performance is 5–10× behind native. So today's industrial NN-codec deployments are "central GPU decode → re-compress as AVIF / WebP / VP9 → ship to client" — clients never actually decode the NN bitstream.
图 57 · 神经压缩完整流程。训练阶段(一次性):大数据集(CLIC / OpenImages ~1M 图)喂进 Encoder + Hyperprior + Decoder + (Discriminator) 联合训练,损失函数是"熵率 + 重建距离 + GAN 损失"的拉格朗日组合,几周 GPU 集群训出 10-50M 参数的模型(.pt / .onnx 文件 30-150 MB)。模型必须随 Decoder 一起分发到客户端。推理阶段(每次编码 / 解码):原图 → Encoder NN → quantise → arithmetic encode → bytes(0.1 bpp 的 1080p ≈ 30 KB);bytes → arithmetic decode → Decoder NN → 重建图。GPU 上每帧 ~80 ms,纯 CPU ~1.5 s。最反直觉的部分:模型升级会让旧 .nn-img 解不出来 —— 这跟 JPEG / PNG / AVIF 那种"几十年向后兼容"完全相反,这也是 NN codec 短期不可能上 web 主战场的根本原因。
Fig 57 · Full neural-compression workflow. Training (one-off): a large dataset (CLIC / OpenImages, ~1 M images) trains Encoder + Hyperprior + Decoder + (Discriminator) jointly under a Lagrangian of "entropy rate + reconstruction distance + GAN loss". A few weeks on a GPU cluster yields a 10–50 M-param model (.pt / .onnx, 30–150 MB). The model must ship to clients alongside the Decoder. Inference (per encode / decode): image → Encoder NN → quantise → arithmetic encode → bytes (1080p at 0.1 bpp ≈ 30 KB); bytes → arithmetic decode → Decoder NN → reconstruction. ~80 ms / frame on GPU, ~1.5 s on CPU. The most counterintuitive part: a model bump makes old .nn-img files undecodable — the opposite of JPEG / PNG / AVIF's "decades of backward compatibility", which is the fundamental reason NN codecs can't fight on the web's main front in the short term.
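bpp 和字节数的换算、以及"模型体积摊销"的账,可以这样粗算(假设:"每图比 AVIF 省 10 KB"是示意数字,不是实测;正文的 ~30 KB 是同量级约数):
The bpp-to-bytes conversion and the model-amortisation arithmetic can be sketched like this (assumption: the "10 KB saved per image vs AVIF" is illustrative, not measured; the text's ~30 KB is a same-order round number):

```python
def bpp_to_bytes(w, h, bpp):
    """给定分辨率和目标 bpp,估算压缩后体积(字节)。"""
    return w * h * bpp / 8

kb = bpp_to_bytes(1920, 1080, 0.1) / 1000
print(f"1080p @ 0.1 bpp ≈ {kb:.0f} KB")   # ≈ 26 KB

# 模型分发摊销:假设模型 100 MB、NN codec 每图比 AVIF 省 10 KB(示意数字)
model_mb, saving_kb = 100, 10
break_even = model_mb * 1000 / saving_kb
print(f"约需 {break_even:.0f} 张图才摊平模型体积")
```

这笔账正是"低流量场景摊不平"的量化版本:一万张图以下,分发模型本身就比省下的流量贵。
This is the quantified version of "low-volume scenes can't amortise": below ten thousand images, shipping the model costs more than the traffic it saves.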
| codec | year | author | feature |
|---|---|---|---|
| Toderici LSTM | 2016 | Google | 早期 RNN-based · 概念奠基 |
| Ballé Hyperprior | 2018 | NYU / Google | Gaussian Hyperprior · 领域基石 |
| HiFiC | 2020 | Google Research | GAN-based · 低 bpp 王 |
| ELIC | 2022 | SenseTime | Efficient Learned Image Compression · 实用化向 |
| CDC | 2023 | Stanford | Diffusion-based decoder · 多步去噪 |
| ContextFormer | 2023 | Microsoft Research | Transformer-based hyperprior |
$ pip install compressai                     # PyTorch NN codec 库 · InterDigital
$ python -m compressai.utils.eval_model \
      pretrained /path/to/images \
      -a bmshj2018-hyperprior                # Ballé 2018 Hyperprior 评测(参数以 CompressAI 文档为准)
$ # 编解码没有独立的 CLI 子命令,走 Python API:zoo 模型对象提供 compress() / decompress()
$ python -c "from compressai.zoo import bmshj2018_hyperprior; \
  m = bmshj2018_hyperprior(quality=8, pretrained=True)"
$ pip install neuralcompression              # Meta(Facebook AI Research)NN codec 研究库
适用
USE FOR
- (未来)AI 生成内容压缩(同生态:AI 图 + AI codec)
- 极低 bpp(< 0.3)+ 服务器侧 GPU 可用的场景
- 云游戏 / 流媒体的服务器侧解码 → 转 AVIF 流出
- 研究 / 学术评测 / 数据集压缩(实验性)
- "内容 ≫ 模型大小"的高带宽专用通道(Stadia 类)
- (future) AI-generated-content compression — AI image + AI codec, same ecosystem
- Very-low-bpp (< 0.3) scenes with server-side GPUs
- Cloud-gaming / streaming: server decode → re-encode as AVIF on the way out
- Research / academic benchmarking / dataset compression
- "Content ≫ model size" specialised channels (Stadia-style)
反适用
AVOID
- 当前 web · 任何无 GPU 端(移动 / IoT / 老电脑)
- 需要"每个像素真实"的场景(法医 / 医学 / 卫星)
- 需要长期归档(码流不向后兼容)
- 客户端实时编码(GPU 编码也很贵 · 普通用户上传不行)
- 低流量场景(模型 30-150 MB 摊不平)
- Today's web · any GPU-less endpoint (mobile / IoT / old PCs)
- Anything needing "every pixel real" (forensic / medical / satellite)
- Long-term archives (no bitstream backward compatibility)
- Client-side real-time encoding (GPU encode is also expensive — user uploads can't take it)
- Low-volume scenes (the 30–150 MB model can't amortise)
| scope | readers | editors / pipelines | CLI |
|---|---|---|---|
| NN codec(各家不互通) | ✗ 无任何浏览器原生 · ~ compressai(InterDigital)· neuralcompression(Meta)· 各家自家 SDK | ✗ 无主流编辑器 · 仅研究代码 · TensorFlow Compression · PyTorch + 自训模型 | compressai.utils.eval_model(评测脚本)· tfci(TensorFlow Compression)· 各家自家 CLI 工具 |
HEIC Live Photo — 苹果的图 + 视频混合容器
HEIC Live Photo — Apple's still + video twin container
"一张照片其实是一个 .heic + 一个 .mov 的双胞胎。"
"One 'photo' is actually a twin: one .heic and one .mov."
2015 年 9 月,Apple 在 iPhone 6s 上推出 Live Photo —— 拍照时同时录下前 1.5 秒 + 后 1.5 秒共 3 秒视频,让"静态照片"在长按时能"动一下"。技术上这不是一种新的图像格式,而是一个双文件容器思路:1 张 HEIC 静图(iOS 11 起 HEIC 取代 JPEG 成为默认)+ 1 段 MOV 视频(H.264 1080p 25fps 无声),通过 metadata 里的 asset identifier UUID 关联,Photos.app(iOS / macOS)把它们当一个对象呈现。这种"图 + 视频组合"是HEIC(基于 HEIF 的 ISOBMFF 容器)在工程层的延伸 —— HEIF 容器原生支持图像 + video track 共存(参见AVIF Sequence),但 Apple 选择不把它们装进同一个 HEIF 文件,而是分两个文件靠 UUID 维系。原因可能是 backward compatibility:老的 .heic / .mov 工具不需要为 Live Photo 改动,各自能独立打开。代价就是跨平台传输:AirDrop 给非 iOS 设备时,MOV 部分会丢失,接收方只看到一张静图。它是"关联式混合容器格式"在消费级场景的代表 —— 同思路 Google 的 Motion Photo(2017)、Samsung 的 Motion Photo 都是 .jpg 加内嵌 mp4,只是合在一个文件里。
In September 2015 Apple introduced Live Photo on the iPhone 6s — the camera simultaneously records 1.5 s before and 1.5 s after the shot, three seconds total, so a "still photo" can "move a little" on long-press. Technically it's not a new image format but a twin-file container idea: one HEIC still (HEIC replaced JPEG as the default starting iOS 11) + one MOV video (H.264 1080p 25 fps, no audio), linked via an asset identifier UUID in the metadata, with Photos.app (iOS / macOS) presenting them as one object. The "still + video pair" extends HEIC (an ISOBMFF-based HEIF container) at the engineering layer — HEIF natively supports image + video track in one file (see AVIF Sequence), but Apple chose not to pack them into a single HEIF file, instead splitting across two files held together by UUID. Probably for backward compatibility: legacy .heic / .mov tools needed no Live-Photo changes; each opens independently. The cost is cross-platform transfer — AirDrop to a non-iOS device drops the MOV; the receiver sees only the still. It's "linked-pair hybrid container" at consumer scale — Google Motion Photo (2017) and Samsung's equivalent take the same idea but pack the .jpg and the mp4 into a single file.
IMG_0001.HEIC(主静图,HEVC intra,2-3 MB)+ IMG_0001.MOV(3 秒 H.264 视频,1080p 25fps 无声,3-5 MB)。两者通过 metadata 里相同的 asset identifier UUID 关联(.heic 的 UUID box / .mov 的 com.apple.quicktime.content.identifier metadata),Photos.app(iOS / macOS)读两个文件的 UUID 一致就把它们当作一张 Live Photo 呈现。AirDrop 给非 iOS 设备时,MOV 不会被识别为关联资产,只有 HEIC 静图过去 —— 这是 Live Photo 跨平台兼容差的根因。
IMG_0001.HEIC (main still, HEVC intra, 2–3 MB) + IMG_0001.MOV (3-second H.264, 1080p 25 fps, audioless, 3–5 MB). They're linked via a shared asset identifier UUID in metadata (.heic's UUID box / .mov's com.apple.quicktime.content.identifier); Photos.app (iOS / macOS) sees the matching UUIDs and presents them as one Live Photo. AirDrop to a non-iOS device doesn't recognise the MOV as a linked asset — only the HEIC still travels — which is exactly why Live Photo's cross-platform compatibility is poor.
技术内核
Technical core
Live Photo 内核两件事。① 双文件 + UUID 关联:拍照那一刻 iPhone 同时存两个文件 —— IMG_xxxx.HEIC(默认 iOS 11+ 静图格式 · HEVC intra block 编码 · ~2-3 MB)+ IMG_xxxx.MOV(QuickTime 容器装 H.264 · 1080p 25fps · 无声 · 前 1.5 + 后 1.5 共 3 秒 · ~3-5 MB)。两者通过 metadata 里相同的 asset identifier UUID 关联:.heic 在 ISOBMFF 的 uuid box 里写,.mov 在 moov.meta.keys.com.apple.quicktime.content.identifier 里写。这套关联机制是 Apple 私有的,但 UUID 字段格式在 iOS Photos 框架里有公开 API。② Photos.app 把两个文件当一个对象:iOS / macOS 的 PhotoKit 框架在导入照片时检测到匹配的 UUID 就自动绑定,UI 层呈现一个图标(单张静图 + 长按播放视频),云端同步(iCloud Photos)也作为一个 asset 同步。第三方 app 想读 Live Photo 必须走 PhotoKit 的 PHAssetResource API —— 直接读两个文件 + 匹配 UUID 也行,但要自己实现绑定逻辑。AirDrop / iMessage 在 Apple 设备间能保留双文件;但跨平台(发到 Android / Windows)只发 HEIC,MOV 部分丢失 —— 这是"双文件容器"路线最大的代价。
Live Photo's core is two things. ① Twin files + UUID link: the iPhone stores two files at capture — IMG_xxxx.HEIC (default iOS 11+ still format · HEVC intra · ~2–3 MB) + IMG_xxxx.MOV (QuickTime container with H.264 · 1080p 25 fps · no audio · 1.5 s before + 1.5 s after = 3 s · ~3–5 MB). They're linked via a shared asset identifier UUID: the .heic writes it in an ISOBMFF uuid box; the .mov writes it under moov.meta.keys.com.apple.quicktime.content.identifier. The mechanism is Apple-private, but the UUID field is exposed through public APIs in iOS's Photos framework. ② Photos.app treats them as one asset: iOS / macOS PhotoKit detects matching UUIDs on import and binds them automatically; the UI shows a single item (still image, long-press to play the video); iCloud Photos syncs them as one asset. Third-party apps that want to read Live Photos should go through PhotoKit's PHAssetResource API — reading the two files directly and matching UUIDs works too, but then you implement the binding yourself. AirDrop / iMessage between Apple devices preserves both files; cross-platform (to Android / Windows) only the HEIC travels and the MOV is lost — the biggest cost of the "twin-file container" path.
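The uuid box mentioned above sits in the shared ISOBMFF layout: each box starts with a 4-byte big-endian size and a 4-byte type, with escapes for 64-bit sizes and to-end-of-file boxes. Here is a minimal top-level box walker; it is a sketch only, since Apple's content identifier actually lives in nested metadata boxes, so a real reader would recurse into meta / moov rather than stop at the top level.

```python
import struct

def iter_boxes(data: bytes):
    """Yield (type, payload) for each top-level ISOBMFF box.

    Both .heic (HEIF) and .mov (QuickTime) use this layout. size == 1 means
    a 64-bit "largesize" follows the type; size == 0 means the box extends
    to the end of the file.
    """
    pos = 0
    while pos + 8 <= len(data):
        size, btype = struct.unpack(">I4s", data[pos:pos + 8])
        hdr = 8
        if size == 1:
            size = struct.unpack(">Q", data[pos + 8:pos + 16])[0]
            hdr = 16
        if size == 0:
            size = len(data) - pos
        yield btype.decode("ascii", "replace"), data[pos + hdr:pos + size]
        pos += size

def uuid_payloads(data: bytes):
    """Payloads of top-level 'uuid' boxes: 16-byte usertype, then the body."""
    return [p for t, p in iter_boxes(data) if t == "uuid"]
```

Feeding it the first kilobyte of a .heic is enough to list the top-level box types (ftyp, meta, mdat, …); the identifier itself still requires descending into the metadata.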
适用
USE FOR
- iPhone / iPad Live Photo 拍照(默认开启)
- Apple 生态内分享(AirDrop / iMessage / iCloud)
- iOS 锁屏 / 壁纸"动起来"效果
- macOS Photos.app 浏览 / 编辑(导出可选静图 / GIF / video)
- iPhone / iPad Live Photo capture (on by default)
- Within Apple ecosystem (AirDrop / iMessage / iCloud)
- iOS lock-screen / wallpaper "moving" effects
- macOS Photos.app browsing / editing (export to still / GIF / video)
反适用
AVOID
- 任何非 Apple 生态(Android / Windows / Web)
- 跨平台分享(MOV 丢失,只剩静图)
- 第三方 app 不走 PhotoKit 的话需自己处理 UUID 绑定
- 需要"单文件即可"的纯静图场景(用 HEIC / JPEG)
- Anything outside Apple's ecosystem (Android / Windows / Web)
- Cross-platform sharing (MOV lost, still-only remains)
- Third-party apps not on PhotoKit must handle UUID binding themselves
- Pure stills where one file suffices (use HEIC / JPEG)
| scope | readers | editors | CLI |
|---|---|---|---|
| HEIC Live Photo | ✓ iOS Photos · macOS Photos · PhotoKit API · ~ third-party heif-tools (partial) · ✗ no web browser support | ✓ iOS Photos · macOS Photos · third-party Live Photo editor apps (Lively · Motion Stills, discontinued) | ~ exiftool reads the UUID metadata · ffmpeg handles the MOV part · heif-info reads the HEIC |
命令行 codec 一览
Command-line codec roster
这一节是查询表 —— 按"目标格式"找对应工具。所有命令均假设你已经装好对应工具(brew / apt 安装名见每行末尾)。命令风格各家不一,但参数语义大致互通:-q / --quality 控质量、-o / --output 给输出文件、-s / --speed 调编码速度(慢 = 小)。
A reference table — find the right tool by output format. Each row assumes you have installed the package (Homebrew / apt name at the end). Each codec uses its own flag dialect, but the semantics roughly converge: -q / --quality for quality, -o / --output for the output file, -s / --speed for the encoder speed (slower = smaller).
| format | encoder | decoder | typical command | install |
|---|---|---|---|---|
| JPEG | cjpeg / mozjpeg / jpegli | djpeg | cjpeg -quality 85 -optimize in.ppm > out.jpg | libjpeg-turbo |
| PNG | oxipng / pngcrush / optipng | libpng | oxipng -o6 in.png | oxipng |
| WebP | cwebp | dwebp | cwebp -q 75 in.png -o out.webp | libwebp |
| AVIF | avifenc | avifdec | avifenc -s 6 -a end-usage=q -a cq-level=23 in.png out.avif | libavif |
| JPEG XL | cjxl | djxl | cjxl in.png out.jxl --quality 90 | libjxl |
| HEIC | heif-enc | heif-dec | heif-enc -q 60 in.png -o out.heic | libheif |
| GIF | gifsicle / convert | gifsicle | gifsicle --colors 256 -O3 in.gif > out.gif | gifsicle |
| BC1-7 (DDS) | nvtt_export / texconv / ispc_texcomp | D3D / OpenGL native | nvtt_export --bc7 in.png -o out.dds | nvtt |
| ASTC | astcenc | astcdec / GPU | astcenc -cl in.png out.astc 6x6 -medium | astcenc |
| KTX2 | toktx | ktxinfo | toktx --bcmp 7 out.ktx2 in.png | KTX-Software |
| Basis | basisu | basisu | basisu -ktx2 in.png -output_file out.ktx2 | basis_universal |
| OpenEXR | oiiotool / exrtools | OpenImageIO | oiiotool in.png -o out.exr | OpenImageIO |
| TIFF | libtiff / convert | libtiff | convert in.png -compress lzw out.tif | libtiff |
| RAW | — | dcraw / LibRaw / rawtherapee-cli | dcraw -v -w in.NEF | libraw |
| DICOM | dcmconv | dcmdump / dcm2pnm | dcm2pnm in.dcm out.pnm | dcmtk |
| SVG | svgo / inkscape / resvg | resvg / browser | svgo in.svg -o out.svg | svgo |
| FITS | astropy / cfitsio | astropy / ds9 | python -c "from astropy.io import fits; ..." | astropy |
| generic | libvips / ImageMagick | same | vips copy in.png out.avif[Q=60] | libvips |
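The roster is easy to wrap in a small dispatcher keyed on the target suffix. The sketch below copies three command templates verbatim from the table rows above; the dict and function names are illustrative, and note that the quality scales are not interchangeable between tools.

```python
import pathlib
import shlex

# Target suffix → command template, taken from the roster above.
# Quality scales differ per tool: avifenc's cq-level is inverted
# (lower = higher quality), cjxl's --quality is 0-100 like JPEG.
ENCODERS = {
    ".webp": "cwebp -q 75 {src} -o {dst}",
    ".avif": "avifenc -s 6 -a end-usage=q -a cq-level=23 {src} {dst}",
    ".jxl":  "cjxl {src} {dst} --quality 90",
}

def encode_command(src: str, dst: str) -> list[str]:
    """Build the argv for the encoder matching dst's extension."""
    template = ENCODERS[pathlib.Path(dst).suffix.lower()]
    return shlex.split(template.format(src=shlex.quote(src), dst=shlex.quote(dst)))
```

Pass the argv to subprocess.run(...); a KeyError simply means the roster has no encoder registered for that suffix.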
几个值得知道的事实
A few worth-knowing facts
- libvips 是隐藏的性能王者 —— 大批量处理比 ImageMagick 快 5-10×、内存占用低 10×。所有需要"批量转码 100k+ 张图"的场景都应该首选 vips。
- jpegli 是 Google 2024 年从 libjxl 仓库剥出来的"现代 JPEG 编码器" —— 同 quality 比 mozjpeg 体积小约 35%,而且产出仍是合法 JPEG,所有 JPEG 解码器都能读。
- oxipng 是 Rust 写的 PNG 重压缩器 —— 比 pngcrush 快 5-10×,体积稍小;`oxipng -o6` 是大多数项目的默认配方。
- Squoosh CLI(`npm i -g @squoosh/cli`)是浏览器中 squoosh.app 的命令行版本 —— 一个 Node 包搞定 AVIF / WebP / JXL / mozjpeg / oxipng,适合 CI 流水线。
- libvips is the hidden performance king — bulk pipelines run 5-10× faster than ImageMagick at one-tenth the memory. Any "batch-convert 100 k images" job should reach for vips first.
- jpegli, spun out of the libjxl repo by Google in 2024, is the "modern JPEG encoder" — about 35% smaller than mozjpeg at the same quality, and the output is still legal JPEG that every decoder can read.
- oxipng is a Rust PNG re-packer — 5-10× faster than pngcrush and slightly smaller; `oxipng -o6` is the default recipe for most projects.
- Squoosh CLI (`npm i -g @squoosh/cli`) is the command-line cousin of the browser-based squoosh.app — one Node package wraps AVIF / WebP / JXL / mozjpeg / oxipng, ideal for CI pipelines.
DevTools 看响应头与解码时间
DevTools — response headers & decode time
浏览器选择哪种格式不是玄学,而是三个 HTTP 头 + 一段 JS 解码任务共同决定的。Network 面板看 Accept(请求时浏览器宣告支持哪些格式)、Content-Type(响应里服务器实际返回什么)、Content-Length(字节数)三个头,这是 picture / source 协商的全部凭证。Performance 面板里"Decode Image"任务才是真实的代价 —— AVIF 比 JPEG 慢 3 倍、JXL 又比 AVIF 快 2 倍,这些差异在 4G 慢网下会被字节数掩盖,在 5G 快网或本地 CDN 下却开始主导首屏渲染时间。
Which format the browser picks is not magic — it's decided by three HTTP headers plus a JS decode task. The Network panel shows Accept (the browser announces what it supports), Content-Type (what the server actually returns), and Content-Length (the byte count). These three are the entire vocabulary of picture / source negotiation. The Performance panel's "Decode Image" task is the real cost — AVIF decodes about 3× slower than JPEG, JXL about 2× faster than AVIF; over slow 4G the byte savings dominate, but on 5G or a near CDN, decode time starts to set first-paint.
Network 面板
Network panel
Accept(请求头)+ Content-Type(响应头)+ Vary: Accept(响应头)三段共同构成"按浏览器选格式"的协商凭证。CDN 上一定要设 Vary: Accept,否则 Chrome 拿到 AVIF 后,Safari 也会拿到同一份 AVIF —— 然后解码失败。
Accept (request) + Content-Type (response) + Vary: Accept (response) form the negotiation contract. CDNs must set Vary: Accept, otherwise Chrome's AVIF cache will be served to Safari, which then fails to decode.
Performance 面板 — Decode 任务
Performance panel — decode task
picture + source fallback 链
picture + source fallback chain
<picture> + 多 <source> 的 fallback 链是树状匹配。浏览器自顶向下扫描,遇到第一个 type 自己支持的就停下,后面的 source 完全不下载。所以"AVIF → WebP → JPEG"的顺序很重要 —— 反过来写 JPEG 永远赢,AVIF 永远没机会。
<picture> with multiple <source> tags is a tree-shaped match. The browser scans top-down and stops at the first type it supports — every later source is never fetched. Order matters: "AVIF → WebP → JPEG" is correct; reversed, JPEG always wins and AVIF never gets a chance.
同图横评 — 解码时间条形图
Same image — decode time bars
把"Accept 头 + Content-Type 响应 + Vary: Accept 缓存指令"理解透,你就抓住了"为什么这台浏览器收到 AVIF、那台收到 JPEG"的全部机理。把 Performance 面板里的 Decode Image 长度看习惯,你就知道"是不是该用 AVIF"不只是字节问题,而是字节 ÷ 解码时间的比值问题。
Understand the trio "Accept request + Content-Type response + Vary: Accept cache directive" and you have the full mechanism for "why this browser got AVIF and that one got JPEG." Get used to reading Decode Image durations in the Performance panel, and "should I serve AVIF" stops being a byte question and becomes a bytes ÷ decode-time ratio question.
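Both halves of that mechanism are the same first-match scan: a server walking the Accept header, or a browser walking a <picture> element's <source type=...> list. A minimal server-side sketch (the function name is illustrative; a real deployment must also send Vary: Accept, as described above):

```python
def pick_format(accept: str) -> str:
    """Return the best image Content-Type this client advertises.

    `accept` is the raw Accept request header. The preferred-order scan
    mirrors the <picture>/<source> fallback chain: first supported wins.
    """
    advertised = {part.split(";")[0].strip() for part in accept.split(",")}
    for candidate in ("image/avif", "image/webp"):  # newest format first
        if candidate in advertised:
            return candidate
    return "image/jpeg"  # universal fallback every browser decodes

# The response must carry 'Vary: Accept' so shared caches key on the header;
# otherwise one browser's cached AVIF gets served to one that can't decode it.
```

Reversing the candidate tuple reproduces the ordering bug from the <picture> section: JPEG would win for every client.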
libvips vs ImageMagick — 性能对比表
libvips vs ImageMagick — performance comparison
两个最常见的"通用图像处理库",设计目标完全相反 —— ImageMagick 1990 年代生于 Unix 工具传统,先把整张图解码到内存再做操作,简单直接;libvips 1990 年代末由 VIPS 项目演化而来,核心思路是 streaming pipeline:像素一行一行流过处理链,从不把整张图加载进内存。这条架构差异在批量处理场景被放大成 5-10× 的速度差和 10× 的内存差,直接决定了它们各自的最佳战场。
The two most common general-purpose image-processing libraries are designed for opposite goals. ImageMagick, born of 1990s Unix tools, decodes the whole image into memory and operates on it — simple and direct. libvips evolved from the VIPS project of the late 1990s and is built on a streaming pipeline: pixels flow through the chain row by row, the full image never loads into RAM. That single architectural choice expands into 5-10× speed and 10× memory differences at scale — and that decides which library belongs where.
| metric | libvips | ImageMagick |
|---|---|---|
| 设计 / design | streaming + parallel(pthread) | full-load(whole image in RAM) |
| 100 张 4K → JPEG 时间 / time | ~8 s | ~60 s |
| 100 张 4K → AVIF 时间 / time | ~120 s | ~600 s |
| 峰值内存 / peak RAM | ~50 MB | ~500 MB |
| 命令行 / CLI | vips copy in.png out.jpg[Q=85] | convert in.png -quality 85 out.jpg |
| 学习曲线 / learning curve | moderate(distinctive API style) | low(memorable command names) |
| format coverage | common + modern(AVIF / WebP / JXL) | 250+ formats(incl. legacy / rare containers) |
| 典型用法 / typical use | bulk services · high-throughput thumbnails | one-off edits · complex filters · legacy-format recovery |
数据基于 libvips 官方 benchmark + 社区验证,实测值因机型 / 任务类型浮动 ±30%。结论是稳定的:需要批量、需要省内存、需要快 用 libvips;需要冷门格式、需要复杂滤镜、单次任务 用 ImageMagick。
Numbers are taken from the libvips official benchmark plus community runs; real values shift ±30% by hardware and task type. The takeaway is robust: pick libvips when you need bulk, low memory, high speed; pick ImageMagick when you need rare formats, complex filters, or one-off jobs.
两种内存模型示意
Two memory models
一句记忆口诀:"vips 流水、IM 大屋" —— vips 像生产线传送带,材料(像素行)源源不断流过工位;ImageMagick 像把所有材料堆进一个大房间再一起加工。两种思路都对,只是适合不同规模。
A mnemonic: "vips is the conveyor, IM is the warehouse" — vips moves rows past stations like an assembly line; ImageMagick piles everything into one big room and processes in place. Both philosophies work; they just fit different scales.
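The conveyor-vs-warehouse contrast can be made concrete with a toy pipeline over integer "rows"; both functions compute the same answer, but the streaming one never holds more than a single row in memory (all names here are illustrative, not any real vips API):

```python
def rows(n: int, width: int):
    """Source stage: generate n rows of synthetic pixel values lazily."""
    for y in range(n):
        yield [y * width + x for x in range(width)]

def brighten(stream, amount: int):
    """Filter stage: transform one row at a time as it flows past."""
    for row in stream:
        yield [v + amount for v in row]

def streaming_sum(n: int, width: int) -> int:
    # conveyor: rows flow through the chain; peak memory is one row
    return sum(sum(row) for row in brighten(rows(n, width), 10))

def fullload_sum(n: int, width: int) -> int:
    # warehouse: materialise the whole image first, then operate in place
    image = [list(row) for row in rows(n, width)]
    image = [[v + 10 for v in row] for row in image]
    return sum(sum(row) for row in image)
```

Swap `n=4` for `n=40_000` and the warehouse version's memory grows with the image while the conveyor's stays one row; that is the whole architectural argument in miniature.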
「我应该用哪个格式」决策树
"Which format should I use" — decision tree
本文走完 50+ 格式之后,最常被问到的还是这一句:"那我到底用哪个?"答案分两层:第一层是用途场景 —— 屏幕显示、GPU 纹理、HDR 影视、科学医学,用途不同候选集就完全不同;第二层是具体约束 —— 兼容老浏览器吗?要透明通道吗?是 16-bit 工程影像吗?这张决策树是出发点,不是教条 —— 真实工程里你可能因为某个客户的 IT 政策只能用 JPEG,或因为某个 GPU 不支持 BC7 而退回 BC1,这些场景在决策树之外。
After 50+ formats, the question we still get most is: "OK, so which one do I use?" The answer has two layers. Layer one is use case — screen display, GPU texture, HDR film, science / medicine; different domains, totally different shortlists. Layer two is specific constraints — must support legacy browsers? need alpha? 16-bit engineering imagery? This tree is a starting point, not gospel — real projects sometimes pin you to JPEG for IT-policy reasons, or fall back from BC7 to BC1 because of a target GPU; those edge cases live outside the tree.
树根问"用途",叶子才到具体格式。中间几跳问的是"老浏览器要不要兜底""有没有透明""是不是 16-bit"。同一片叶子(比如"屏幕显示 · 照片"),最终选 AVIF 还是 JPEG,取决于客户群是不是全在 Safari 16+。这棵树没有"绝对正确",只有"在你的约束下,谁先上 + 谁兜底"。
The root asks "what for"; only the leaves name a format. The middle hops ask "do legacy browsers need a fallback?" "is there alpha?" "is it 16-bit?". On the same leaf — say "screen · photo" — choosing AVIF over JPEG depends entirely on whether your audience is all on Safari 16+. The tree has no absolute right answer; only "given your constraints, what's the primary and what's the fallback?"
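Condensed to code, the tree might look like the sketch below; the branch order and leaf shortlists are an illustrative distillation of the phases above, not an exhaustive mapping, and every name is a placeholder you would tune to your own constraints:

```python
def choose_format(use: str, *, legacy_browsers: bool = False,
                  alpha: bool = False, hdr: bool = False) -> str:
    """Toy decision tree: root asks the use case, middle hops ask the
    constraints, leaves name a shortlist (primary + fallback)."""
    if use == "screen":
        if alpha:
            return "PNG" if legacy_browsers else "AVIF + PNG fallback"
        return "JPEG" if legacy_browsers else "AVIF + JPEG fallback"
    if use == "gpu-texture":
        return "KTX2 (Basis Universal / ASTC)"
    if use == "film-hdr":
        return "OpenEXR" if hdr else "16-bit TIFF"
    if use == "science":
        return "FITS / GeoTIFF"
    raise ValueError(f"unknown use case: {use}")
```

The point of writing it down is the shape, not the leaves: the constraints (`legacy_browsers`, `alpha`, `hdr`) are keyword-only precisely because they are questions you must answer explicitly, never defaults.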
四个职业场景的"开箱组合"
Four professional starter kits
这四个组合不是"最优解",只是"开箱推荐"。真实工程里你会遇到甲方只接受 JPEG、Unity 强制要求 ASTC LDR、医院 PACS 系统只识别 DICOM 1995 子集 —— 这些约束才是决策树之外的真正变量。但当约束消失时,这四套组合是 2026 年最不出错的起点。
These four kits aren't "optimal" — they're "out-of-the-box recommendations." Real projects hit constraints — clients accepting only JPEG, Unity demanding ASTC LDR, a hospital PACS that only reads a 1995 DICOM subset — and those constraints are the real variables outside the tree. But when constraints recede, these four kits are the safest 2026 starting points.
像素的归宿
The fate of the pixel
它出生在一颗 CMOS sensor 的硅井底部 —— 一个 14-bit 的电荷,被 ADC 抬上数字总线,被相机固件写进一个叫 .ARW 的 RAW 文件,sleep 在某张 SD 卡的 NAND 块上六个月没人看。它当时还不是"像素",它只是一个电压样本,一个 16384 之中的整数,带着读出噪声和热噪声,带着一行 EXIF 和一段 ICC profile 等待被解释。
It was born at the bottom of a CMOS sensor's silicon well — a 14-bit charge, lifted onto a digital bus by an ADC, written by camera firmware into a .ARW RAW file, sleeping in the NAND of some SD card for six months with no one looking. It wasn't a "pixel" yet — just a voltage sample, an integer out of 16384, carrying read noise and thermal noise, a line of EXIF, and an ICC profile waiting to be interpreted.
六个月后它被 LibRaw 解码成 16-bit linear,被 Lightroom 调色,被导出成 16-bit TIFF 进 Photoshop 修瑕,被另存为 sRGB JPEG 上传朋友圈,又被同一张图压成 AVIF 上博客 hero,被 Cloudflare CDN 缓存到全球 200 个边缘节点,被一万个浏览器在一分钟内同时解码,在某些 WebGL 场景里它被上传到 GPU 显存压成 BC7 块,被 fragment shader 采样过 12 次,被 ICC profile 从 sRGB 映到 Display P3,被 mipmap 选了 LOD 2,被 trilinear filter 平滑掉了高频。它有时是 24 bit,有时是 8 bit,有时是 4 bit/pixel,有时是浮点。它一直在变形。
Six months later LibRaw decodes it into 16-bit linear, Lightroom grades it, it exports as 16-bit TIFF into Photoshop for retouching, saves as sRGB JPEG to a social feed, gets re-compressed as AVIF for a blog hero, lives in Cloudflare's CDN across 200 edge nodes, decodes simultaneously in ten thousand browsers within a minute, gets uploaded to GPU memory as a BC7 block in some WebGL scene, is sampled 12 times by a fragment shader, gets remapped from sRGB to Display P3 by an ICC profile, picks LOD 2 from a mipmap chain, gets smoothed by a trilinear filter. Sometimes it's 24 bits, sometimes 8, sometimes 4 bits per pixel, sometimes floating point. It never stops changing shape.
它最后变成了屏幕上一个发光的小方块。它当过 RAW、当过 AVIF、当过 BC7、当过显存、当过电压、当过光子。每一段旅程都给它换了一个容器,但它一直是同一颗像素 —— 一个被反复翻译、反复重写、反复压缩、反复采样,却始终保留某种"原意"的微小信号。
It ends as a glowing square on a screen. It has been a RAW, an AVIF, a BC7 block, GPU memory, a voltage, a photon. Every leg of the journey gave it a different container — but it stayed the same pixel: a tiny signal repeatedly translated, rewritten, compressed, and sampled, somehow holding onto its original meaning through every transform.
三个反直觉结论
Three counter-intuitive takeaways
沉淀这五十多种格式之后,有三件事是写完之前没意识到的。
After settling fifty-plus formats into this codex, three things surprised me — none of which I expected before writing.
QOI(2021)的 spec 一页 A4 写得下,实现 300 行 C 代码,比 PNG(1996)简单 100×,编码却比 libpng 快 20-50×、解码快 3-4×,文件只大约 20%。Farbfeld(2014)更激进 —— 干脆不压缩,只做"标头 + 像素"。PCX(1985)的 RLE 在纯色场景甚至比 PNG 还小。简洁是一种持久的设计姿态,不是历史遗物 —— 当 GIF 还在被使用、BMP 还在 Windows 剪贴板里跑、JPEG 仍占 web 图像 60%,你会发现"老"和"差"是两个独立维度。
QOI's spec (2021) fits on one A4 page; the reference implementation is 300 lines of C — 100× simpler than PNG (1996), yet it encodes 20-50× and decodes 3-4× faster than libpng, with files only ~20% larger. Farbfeld (2014) goes further — no compression at all, just "header + pixels." PCX's RLE (1985) beats PNG on flat-color art. Simplicity is a durable design posture, not a relic — when GIF is still in use, BMP still drives the Windows clipboard, and JPEG still serves 60% of the web, "old" and "bad" turn out to be independent axes.
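Two of QOI's ops fit in a dozen lines. This is a hedged toy over (r, g, b, a) tuples that emits symbolic ops instead of the spec's byte format; the 64-slot index hash is straight from the one-page spec, while the DIFF/LUMA ops, the run-length cap of 62, and the wire encoding are omitted:

```python
def qoi_sketch(pixels):
    """Emit ("RUN", n) / ("INDEX", slot) / ("RGBA", px) ops, QOI-style.

    index: 64-slot recently-seen-pixel array, addressed by the spec's hash
    (r*3 + g*5 + b*7 + a*11) mod 64. prev starts at opaque black, per spec.
    """
    index = [(0, 0, 0, 0)] * 64
    prev, run, ops = (0, 0, 0, 255), 0, []
    for px in pixels:
        if px == prev:                      # repeat of previous pixel
            run += 1
            continue
        if run:                             # flush the pending run
            ops.append(("RUN", run))
            run = 0
        h = (px[0] * 3 + px[1] * 5 + px[2] * 7 + px[3] * 11) % 64
        if index[h] == px:                  # seen recently: 1-byte op in real QOI
            ops.append(("INDEX", h))
        else:                               # new pixel: remember it, emit literal
            index[h] = px
            ops.append(("RGBA", px))
        prev = px
    if run:
        ops.append(("RUN", run))
    return ops
```

Even this toy shows why QOI works: photographic neighbours hit RUN and INDEX constantly, and every op is O(1) with no entropy coder anywhere.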
直觉上块越大越糊,但 ASTC 6×6(3.56 bpp)对比 BC7 4×4(8 bpp)显存少 2.25×,ΔPSNR 只有不到 1 dB,SSIM 差异在双盲测里几乎不可分辨。移动游戏开发者一致默认 ASTC 6×6 是甜点;Unity 的 mobile preset 直接以 6×6 为默认。压缩比的甜点不在 4×4,而在让 GPU 缓存命中率最大化的那个块大小 —— 4×4 太奢侈,8×8 太糊,6×6 恰好。
Intuition says bigger blocks blur more — but ASTC 6×6 (3.56 bpp) versus BC7 4×4 (8 bpp) is 2.25× less VRAM with under 1 dB of PSNR loss; SSIM differences are essentially invisible in blind tests. Mobile game devs converge on ASTC 6×6 as the sweet spot; Unity's mobile preset defaults to it. The sweet spot of texture compression isn't 4×4 — it's whichever block size maximizes GPU cache hit rate. 4×4 is luxury, 8×8 is mush, 6×6 is just right.
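The VRAM ratio is plain block arithmetic: every ASTC block is 128 bits regardless of footprint (BC7's fixed 4×4 block is also 128 bits), so only the footprint changes the bits-per-pixel.

```python
def block_bpp(block_w: int, block_h: int, block_bits: int = 128) -> float:
    """Bits per pixel of a fixed-size texture block.

    ASTC always spends 128 bits per block whatever the footprint;
    BC7 is the special case block_w = block_h = 4.
    """
    return block_bits / (block_w * block_h)
```

So BC7 4×4 is 8 bpp, ASTC 6×6 is 128/36 ≈ 3.56 bpp, ASTC 8×8 is exactly 2 bpp, and the 8 / 3.56 ratio is exactly 2.25.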
JXL 在所有维度都赢:HDR、lossless、JPEG 无损 transcode、渐进式解码、解码速度。但 Chrome 团队在 2022-10 以"业界兴趣不足"为由从 Chromium 砍掉 flag,理由是 AVIF 已经够用 —— 而 AVIF 背后是 AOMedia(Google + Netflix + Amazon + 多家芯片厂的联盟)。技术从不单独决定胜负,生态决定。同样的故事在 WebP vs JPEG2000、HEIC vs JXR、Opus vs Vorbis 反复上演;能写进一个 Chromium 的 if-branch,胜过所有 paper 上的曲线。
JXL wins on every axis: HDR, lossless, lossless JPEG transcode, progressive decoding, decode speed. But in October 2022 the Chrome team pulled the flag from Chromium citing "insufficient ecosystem interest" — AVIF, they argued, was enough. And AVIF stands on AOMedia (Google + Netflix + Amazon + chip vendors). Technology rarely decides alone — ecosystems do. The same story replays in WebP vs JPEG2000, HEIC vs JXR, Opus vs Vorbis: shipping inside one Chromium if-branch beats every curve in every paper.
三个发现共享一个底色:格式不是被它的技术指标决定的,是被使用它的人决定的。
All three share the same undertone: a format isn't decided by its technical merits — it's decided by the people who use it.
参考与扩展阅读
References & further reading
本文写作的关键依据。按章节分组,数据 / 引用全部来自公开来源。如发现错漏,欢迎邮件指正。
Sources this article relies on, grouped by phase. All data and quotations are drawn from public references — corrections welcome by email.
Phase I · Web 显示派
Phase I · Web display
- RFC 2083 — PNG (Portable Network Graphics) Specification, 1997
- ISO/IEC 15948 — PNG Specification (Second Edition), 2003
- ISO/IEC 10918-1 — JPEG, 1992
- RFC 1951 — DEFLATE Compressed Data Format Specification
- AOMedia — AV1 Image File Format (AVIF) specification
- ISO/IEC 23008-12 — HEIF (Image File Format)
- AOMedia AV1 Bitstream & Decoding Process Specification, v1.0.0-errata1
- libwebp documentation — developers.google.com/speed/webp
- mozjpeg — github.com/mozilla/mozjpeg
- Squoosh source — github.com/GoogleChromeLabs/squoosh
- Cloudflare blog — "Generating WebP, AVIF and JPEG XL all at once"
- Jon Sneyers — "The case for JPEG XL", Cloudinary blog (2021)
- Chrome JXL removal — bugs.chromium.org/p/chromium/issues/detail?id=1178058
- Smashing Magazine — "Comparing JPEG-XL, AVIF, WebP & JPEG" (2022)
Phase II · GPU 纹理派
Phase II · GPU textures
- Khronos KTX 2.0 specification — registry.khronos.org/KTX/specs/2.0/ktxspec_v2.html
- Basis Universal — github.com/BinomialLLC/basis_universal
- ARM ASTC specification — developer.arm.com/documentation/100672
- D3D11 BC1-BC7 specification — Microsoft Docs (Direct3D 11 texture block compression)
- Intel ISPCTextureCompressor — github.com/GameTechDev/ISPCTextureCompressor
- Lance Williams — "Pyramidal Parametrics", SIGGRAPH (1983)
- OpenGL ES 3.0 / 3.2 specification — Khronos Group
- NVIDIA Texture Tools — github.com/NVIDIAGameWorks/NVIDIATextureTools
- Iourcha, Nayak & Hong — "System and method for fixed-rate block-based image compression with inferred pixel values" (S3TC, 1999)
Phase III · HDR / 工程影像
Phase III · HDR / engineering imaging
- OpenEXR — openexr.com (Academy Software Foundation)
- Greg Ward — "Real Pixels", Graphics Gems II (1991, RGBE format)
- Adobe DNG specification 1.7 — helpx.adobe.com/camera-raw/digital-negative.html
- LibRaw documentation — libraw.org
- NEMA DICOM Standard PS 3.x — dicomstandard.org
- TIFF 6.0 specification — Adobe (1992)
- SMPTE ST 268 — DPX File Format for Digital Moving-Picture Exchange
- Dave Coffin's dcraw — cybercom.net/~dcoffin/dcraw/
- OpenColorIO — opencolorio.org
- ITU-R BT.2100 — Image parameter values for HDR television
- SMPTE ST 2084 — Perceptual Quantizer (PQ) transfer function
Phase IV · 矢量 / 文档
Phase IV · Vector / document
- W3C SVG 1.1 / SVG 2 Recommendation — w3.org/TR/SVG2/
- ISO 32000-1 / -2 — Document management — Portable Document Format (PDF)
- ITU-T T.88 — JBIG2 (Joint Bi-level Image experts Group)
- Adobe PostScript Language Reference Manual, 3rd ed. (1999)
- Lottie — airbnb.design/lottie / lottiefiles.com
- Encapsulated PostScript File Format Specification, Adobe v3.0
- WMF / EMF — Microsoft Open Specifications [MS-WMF], [MS-EMF]
Phase V · 复古 / 怪格式
Phase V · Retro / oddities
- QOI specification — qoiformat.org / github.com/phoboslab/qoi
- Farbfeld — tools.suckless.org/farbfeld/
- NetPBM (PBM/PGM/PPM) — netpbm.sourceforge.net
- EA IFF '85 specification — Jerry Morrison, Electronic Arts (1985)
- Truevision TGA File Format Specification, v2.0 (1989)
- ZSoft PCX Technical Reference Manual (1988)
- BMP / DIB structure — Microsoft Docs (Win32 GDI)
- XPM — X PixMap format, X.Org reference
Phase VI · 卫星 / 科学
Phase VI · Satellite / science
- FITS Standard 4.0 — fits.gsfc.nasa.gov/fits_standard.html
- OGC GeoTIFF 1.1 specification — ogc.org/standard/geotiff/
- NITF MIL-STD-2500C — National Imagery Transmission Format
- astropy — astropy.org
- GDAL — gdal.org (Geospatial Data Abstraction Library)
- Cloud-Optimized GeoTIFF (COG) — cogeo.org
- Zarr — zarr.dev (chunked, compressed N-dimensional arrays)
Phase VII · 神经压缩 / 未来
Phase VII · Neural / future
- Toderici et al. — "Variable Rate Image Compression with Recurrent Neural Networks", ICLR 2016
- Ballé, Minnen et al. — "Variational Image Compression with a Scale Hyperprior", ICLR 2018
- Mentzer, Toderici et al. — "High-Fidelity Generative Image Compression" (HiFiC), NeurIPS 2020
- Yang, Mandt — "Lossy Image Compression with Conditional Diffusion Models" (CDC), NeurIPS 2023
- CompressAI — github.com/InterDigitalInc/CompressAI
- ISO/IEC 21122 — JPEG XS (low-latency lightweight image coding)
- WebP2 — chromium.googlesource.com/codecs/libwebp2 (experimental)
- JPEG AI — Call for Proposals, ISO/IEC JTC 1/SC 29/WG 1 (2022)
综合 / 工具
General / tools
- libvips documentation — libvips.github.io/libvips/
- ImageMagick documentation — imagemagick.org
- OpenImageIO — openimageio.readthedocs.io
- David Salomon — "Data Compression: The Complete Reference", 4th ed. (Springer)
- Khalid Sayood — "Introduction to Data Compression", 5th ed.
- Charles Poynton — "Digital Video and HD: Algorithms and Interfaces", 2nd ed.
合计 ~70 条参考,覆盖 8 组。完整列表可视为这条沉积带的"地层钻孔",每一层都能往下挖。
About 70 references in total across 8 groups. Treat the list as a borehole through this sedimentary band — every stratum can be dug deeper.