ursb.me / notes
FIELD NOTE / 05 图像编码 · 格式百科 Image Codecs · Codex 2026

沉积
像素

Sediment
of Pixels.

从 1985 BMP 到 2026 神经压缩,从屏幕显示到 GPU 显存,从医学 CT 到天文 FITS — 一份手绘的图片格式百科。

From 1985 BMP to 2026 neural codecs, from screen pixels to GPU memory, from medical CT to astronomical FITS — a hand-drawn codex of image formats.

穿过 7 个阵营的旅程 A journey through 7 families ▸ 7 phases
Web — screen pixels
GPU — block-compressed
HDR — high bit-depth
Vector — math, not pixels
Retro — time machine
Science — CT · sky · maps
Future — neural codecs

一颗像素的 50 种归宿

50 fates of one pixel

一颗像素从相机 sensor 出生 —— 它当时不是像素,是电荷。它经过 ADC 变成 12 位的数字,被写进 RAW 文件,然后一切才开始。

A pixel is born on a camera sensor — though, strictly, it isn't a pixel yet, it's charge. It passes through an ADC, becomes a 12-bit number, gets written into a RAW file, and only then does the journey begin.

这颗像素接下来可能:被压成 8 bit JPEG 上传社交网络;被 CDN 缓存;被另一台浏览器解码;被上传到 GPU 显存变成 BC7 4×4 块;被 fragment shader 采样;最后变成屏幕上一个会发光的小方块。

Next, this pixel might be: squeezed into 8-bit JPEG, uploaded to a social network, cached on a CDN, decoded in another browser, uploaded to GPU memory as a BC7 4×4 block, sampled by a fragment shader — and finally become one glowing square on a screen.

这是同一颗像素,但它穿过的容器有 50 多种形状。这篇文章是这些形状的百科。

This is the same pixel, but the containers it passes through have more than 50 shapes. This article is a codex of those shapes.

下一节是一张图 —— 50+ 个图片格式按时间和阵营画成的家族树。先看那张图,再选你最关心的阵营深入。这篇可以读半小时,也可以查一辈子。

The next section is a single image — 50+ image formats drawn as a family tree by time and family. Look at the tree first, then dive into the family you care about most. You can read this in half an hour, or use it for the rest of your life.

图片格式 家族树

A family tree of image formats

横轴是时间(1985 → 2026),纵轴是 7 个阵营。每个节点是一个格式,实心方块是 13 个"扛把子"。点任意节点跳到对应章节。

X-axis is time (1985 → 2026), Y-axis is the 7 families. Each node is a format; filled squares are the 13 "heavy hitters". Click any node to jump to its chapter.

birth → edit → compress → transmit → decode → VRAM → sample → screen

图 0.2 · 50+ 个图片格式按时间(横轴)和阵营(纵轴)分布。实心方块是 13 个"扛把子"(单独成专章),空心圆是其余格式。线表示继承/替代/派生/致敬。下方为 8 站像素旅程示意,各阶段对应不同阵营色。点击节点跳到章节。

Fig 0.2 · 50+ image formats laid out by time (X) and family (Y). Filled squares are the 13 "heavy hitters" (with extended chapters); circles are the rest. Lines mark inherit / replace / derive / tribute. The 8-stop bar below is the pixel's journey, colored by family at each stage. Click any node to jump.

BMP — 没有压缩的童年

BMP — A Childhood Without Compression

YEAR 1990 AUTHOR Microsoft / IBM EXT .bmp / .dib MIME image/bmp STD vendor (Microsoft) LOSSY lossless (opt. RLE) DEPTH 1/4/8/16/24/32 bit ALPHA since BMPv5 ANIM none STATUS legacy (still inside Windows)

把像素原样写进硬盘,这就够了。

Just write the pixels straight to disk; that's enough.

1980 年代末,Windows 需要一个不依赖任何压缩库、可以从显存里直接 dump 出来、又能直接 load 回去的位图容器。设计目标根本不是体积,而是"零依赖、零解码、零思考"。BMP 因此把当年显存的扫描方向、字节顺序、行对齐规则一并写进了文件头,并固定下来——三十多年后,这些 80 年代显存的影子仍然活在 .bmp 里。

In the late 1980s Windows needed a bitmap container that depended on no compression library, could be dumped straight from video memory, and loaded straight back in. The design goal was not file size; it was zero dependencies, zero decoding, zero thinking. BMP froze the scan direction, byte order, and row alignment of the era's video memory into the file header. Three decades later, those 1980s VRAM ghosts still live inside every .bmp.

图 1 · BMP 文件结构。FILE + INFO 两个头(共 54 字节)→ 可选调色板 → 像素阵列。像素以 BGR 顺序、自下而上存放,每行末尾用 0 padding 至 4 字节对齐——这三条都源自 1980 年代 Windows 显存布局。
Fig 1 · BMP file layout. Two headers (54 B total) → optional palette → pixel array. Pixels are stored in BGR order, bottom-up, with each row zero-padded to a 4-byte boundary — three rules inherited verbatim from 1980s Windows VRAM layout.

技术内核

Technical core

BMP 由两段定长头组成:14 字节 BITMAPFILEHEADER(magic "BM" + 文件大小 + 像素数据偏移)和 40 字节 BITMAPINFOHEADER(宽、高、位深、压缩方式、调色板大小等)。像素阵列自下而上排列——origin 在左下角,这直接对应 80 年代 CRT 显存的扫描方向。颜色通道顺序是 BGR 而不是 RGB,同样源自当年 Windows 显存的字节排列;打开任何一个 .bmp,前三个像素字节读出来都是蓝绿红。每一行的字节数必须是 4 字节的整数倍——不足的尾部用 0 padding,目的是让 32 位 CPU 一次取一个像素时不需要做对齐计算。RLE-4 / RLE-8 行程编码从 Windows 3.0 的 40 字节头时代就已存在;后期的 BMPv4 / BMPv5 又加了 bitfield 自定义通道掩码、色彩空间、ICC profile、alpha 通道,但生态并没有真的跟进——大多数解码器只认那 40 字节头。

A BMP file is two fixed-size headers: a 14-byte BITMAPFILEHEADER (magic "BM", file size, pixel-data offset) and a 40-byte BITMAPINFOHEADER (width, height, bit depth, compression, palette size). The pixel array is stored bottom-up: origin in the lower-left corner, mirroring the scan direction of 1980s CRT VRAM. Channel order is BGR, not RGB — again copied from how Windows laid out video memory. Every row must be padded with zeros to a multiple of 4 bytes, so a 32-bit CPU can read one pixel per fetch with no alignment math. RLE-4 / RLE-8 run-length encoding has been there since the Windows 3.0-era 40-byte header; the later BMPv4 / BMPv5 added bitfield channel masks, colour spaces, ICC profiles, and a real alpha channel — but the ecosystem never caught up; most decoders still only recognise the original 40-byte info header.
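The two fixed headers really are all the parsing BMP requires. A minimal sketch (field layout per the headers described above; the function name and return shape are mine):

```python
import struct

def read_bmp_header(data):
    """Parse BITMAPFILEHEADER (14 B) + the start of BITMAPINFOHEADER (40 B)."""
    magic, file_size, _r1, _r2, pixel_offset = struct.unpack_from("<2sIHHI", data, 0)
    assert magic == b"BM", "not a BMP"
    hdr_size, width, height, planes, bpp, compression = struct.unpack_from("<IiiHHI", data, 14)
    # each row is zero-padded to a 4-byte boundary — the 1980s VRAM rule
    stride = ((width * bpp + 31) // 32) * 4
    # a positive height means rows are stored bottom-up (the common case)
    bottom_up = height > 0
    return width, abs(height), bpp, stride, pixel_offset, bottom_up
```

For a 3-pixel-wide 24-bit image, each row is 9 bytes of BGR data padded to a 12-byte stride — exactly the padding rule from Fig 1.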

适用

USE FOR

  • Windows 系统资源(.cur 光标、.ico 图标内层)
  • 嵌入式 / RTOS 等没有解码库可用的环境
  • 编解码器教学:位图最素的样本
  • 临时 dump:debug 时把 framebuffer 直接存盘
  • Windows system resources (.cur cursors, .ico inner bitmaps)
  • Embedded / RTOS targets with no decoder library
  • Codec teaching: the most stripped-down bitmap sample
  • Throw-away framebuffer dumps when debugging

反适用

AVOID

  • 任何 web 场景:体积是 PNG 的 5–20×
  • 移动端 / 流量敏感传输
  • 需要 metadata、color profile、HDR 的工程影像
  • 需要稳定 alpha 的 UI 资源(BMPv5 兼容性差)
  • Anything on the web: 5–20× larger than PNG
  • Mobile / bandwidth-sensitive delivery
  • Engineering imagery that needs metadata, color profiles, or HDR
  • UI assets needing reliable alpha (BMPv5 support is patchy)
scope: BMP
browsers: 100% (no one ships it on the web)
tools: ✓✓✓ Photoshop · GIMP · Paint · Preview
CLI: convert in.png out.bmp (ImageMagick)
parent: none (one of the earliest PC bitmap standards)
children: ICO/CUR (embeds BMP) · concept lives on in NetPBM / TGA / Farbfeld

GIF — 1987 与 LZW 专利往事

GIF — 1987 and the LZW Patent Saga

YEAR 1987 (87a) · 1989 (89a) AUTHOR CompuServe · Steve Wilhite EXT .gif MIME image/gif STD CompuServe → de-facto LOSSY palette-lossy + LZW lossless DEPTH 8-bit indexed ALPHA 1-bit binary ANIM ✓ (89a) STATUS survivor (memes keep it alive)

用 256 个颜色坚持了 39 年。

Held the line with 256 colors for 39 years.

1987 年是拨号上网的年代——一张 100 KB 图片要传整整一分钟。CompuServe 需要一种比 BMP 小得多、跨平台、还能拼成动图的格式。Wilhite 拿了刚发表不久的 LZW 字典压缩,加上 256 色调色板,做出了 GIF87a。两年后 89a 又补上透明色与动画扩展,从此一锤定音——并且谁也没想到,它会撑过 GeoCities、撑过宽带、撑过 Flash,最后被 Twitter 时代的"表情包"二次激活。

1987 was the dial-up era — a 100 KB image took a full minute to download. CompuServe needed something far smaller than BMP, cross-platform, and capable of stitching frames into a loop. Wilhite combined freshly published LZW dictionary compression with a 256-colour palette and shipped GIF87a. Two years later 89a added transparency and animation extensions, locking the format in. No one expected it to outlive GeoCities, broadband, and Flash — only to be re-ignited by the Twitter-era reaction-meme.

图 2 · GIF 三件套。左:全局 256 色调色板,每像素一个 8-bit 索引。中:LZW 字典随输入流动态扩展(9 → 12 bit 自适应)。右:多帧 + frame disposal 三种模式;NETSCAPE2.0 扩展开启了"无限循环"。
Fig 2 · The GIF trio. Left: global 256-colour palette, one 8-bit index per pixel. Middle: LZW dictionary grows on the fly (9 → 12 bit adaptive). Right: multi-frame stack with three disposal modes; the NETSCAPE2.0 extension is what unlocks infinite looping.

技术内核

Technical core

GIF 由四件事拼出来:① 调色板——一张全局调色板(GCT),最多 256 个 RGB888 entry,每帧也可以再覆盖一张局部调色板(LCT);② LZW 压缩——变长 9 至 12 bit 字典,字典随像素索引流动态扩展,字典满了就清空重来;③ 多帧 + disposal——每帧带一个 Graphic Control Extension,disposal method 决定下一帧绘制前如何处理当前帧(保留 / 还原背景 / 还原前一帧);④ 89a 扩展——透明色索引(让其中一个 palette 槽位变透明,所以 alpha 永远是 1 bit)、Comment / Plain Text Extension、以及最关键的 NETSCAPE2.0 Application Extension——后者带一个 16-bit 循环计数,从此 GIF 可以无限循环。这个扩展不是 GIF 标准的一部分,是 Netscape 1995 年自己加的——但今天每一张循环 GIF 都欠 Netscape 一个 credit。

GIF is four things stitched together. ① Palette — one global colour table (GCT) of up to 256 RGB888 entries, optionally overridden per frame by a local table (LCT). ② LZW compression — a variable-width 9-to-12-bit dictionary that grows with the pixel-index stream and resets when full. ③ Frames + disposal — each frame carries a Graphic Control Extension whose disposal method tells the decoder how to wipe the previous frame (keep / restore-background / restore-previous). ④ 89a extensions — a transparent-colour index (one palette slot becomes "transparent", which is why alpha is forever 1-bit), Comment and Plain-Text extensions, and the all-important NETSCAPE2.0 Application Extension that carries a 16-bit loop counter. That last one isn't part of any standard — Netscape just added it in 1995 — yet every looping GIF on Earth still owes Netscape a credit.
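A toy version of ②, matching the worked example in Fig 2 (in=ABABABAC, AB=#258, ABA=#260). The real GIF stream also packs these codes into variable-width 9–12-bit fields and interleaves clear / end-of-information codes, all of which this sketch skips:

```python
def lzw_encode(indices, palette_size=256):
    """Minimal GIF-style LZW: emit dictionary codes, grow the dictionary on the fly."""
    # codes 0..palette_size-1 are literals; GIF reserves two more (clear, EOI),
    # so the first dictionary entry gets code palette_size + 2
    next_code = palette_size + 2
    dictionary = {bytes([i]): i for i in range(palette_size)}
    w, out = b"", []
    for k in indices:
        wk = w + bytes([k])
        if wk in dictionary:
            w = wk                       # keep extending the current phrase
        else:
            out.append(dictionary[w])    # emit the longest known prefix
            dictionary[wk] = next_code   # remember the new phrase
            next_code += 1
            w = bytes([k])
    if w:
        out.append(dictionary[w])
    return out
```

Feeding it the pixel indices for A B A B A B A C (A=65, B=66, C=67) reproduces the code stream in the figure: 65, 66, 258, 260, 67.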

适用

USE FOR

  • 表情包 / 反应 GIF(社交平台仍按 GIF MIME 处理)
  • 极简动图、像素艺术、loading 转圈
  • 少色线稿、单色 banner、低保真预览
  • Reaction memes (most platforms still pipe them as image/gif)
  • Minimal motion loops, pixel art, spinners
  • Low-colour line art, monochrome banners, low-fi previews

反适用

AVOID

  • 真实照片:256 色根本不够,带 dither 还是色块
  • 渐变 / 长视频片段:体积爆炸,远不如 mp4 / WebM
  • 需要半透明 alpha 的任何场景
  • Photographs: 256 colours simply isn't enough, even with dither
  • Gradients or long clips: file size explodes; mp4 / WebM win by 10–50×
  • Anything needing real (non-binary) alpha
scope: GIF
browsers: ✓✓✓ universal since Mosaic 1993
tools: ✓✓✓ Photoshop · GIMP · ezgif · Figma
CLI: gifsicle -O3 in.gif -o out.gif · ffmpeg -i in.mp4 out.gif
parent: none (contemporary of BMP, different lineage)
children: APNG (carries the animation idea) · WebP / AVIF (visual successors, palette dropped)

PNG — DEFLATE、scanline filter、与一场弑父

PNG — DEFLATE, Scanline Filters, and a Patricide

YEAR 1996 (PNG 1.0) · 2003 (W3C / ISO 15948) AUTHOR PNG Dev Group · Thomas Boutell et al. EXT .png MIME image/png STD ISO/IEC 15948 LOSSY lossless DEPTH 1/2/4/8/16 bit/channel ALPHA 8 / 16-bit ANIM none (see APNG) STATUS mainstream · everywhere

GIF 收钱的那天,自由工程师写了一个免费的弑父者。

The day GIF started charging, free engineers wrote its successor.

1995 年初 Unisys 开始执行 LZW 专利,所有 GIF 编码器都要付费——包括 CompuServe 自己。Usenet comp.graphics 上 Thomas Boutell 在两周内拉起一支 30 人志愿团队,目标四条:(a) 完全无专利;(b) 比 GIF 更小;(c) 真正的 alpha 通道,而不是 1 bit 透明色;(d) 16 bit/channel + ICC profile + gamma 校正,为下一个十年的设备做准备。九个月后 PNG 规范已接近定稿;随后 zlib(RFC 1950)、DEFLATE(RFC 1951)与 PNG 本身(RFC 2083)相继发布——这是互联网历史上最快、最干净的一场技术弑父。

In early 1995 Unisys began enforcing its LZW patent: every GIF encoder, including CompuServe's own, now owed money. Within two weeks a thirty-person volunteer crew on Usenet's comp.graphics rallied around Thomas Boutell with four goals: (a) wholly patent-free; (b) smaller than GIF; (c) real alpha, not a single transparent palette slot; (d) 16 bit / channel plus ICC profiles and gamma, ready for the next decade of hardware. Nine months later the PNG spec was essentially frozen; zlib (RFC 1950), DEFLATE (RFC 1951), and PNG itself (RFC 2083) followed as RFCs — possibly the cleanest, fastest patricide in internet history.

图 3a · DEFLATE 两步走。第一步 LZ77 把重复串替换成 (距离, 长度) 反向引用,第二步对剩下的字面量与引用做 Huffman 变长编码。两步都没专利——zip / gzip / PNG / WOFF 全部用同一套。
Fig 3a · DEFLATE in two passes. LZ77 first replaces repeated runs with (distance, length) back-references, then Huffman gives variable-length codes to the resulting literals and references. Zero patents — zip, gzip, PNG, and WOFF all share this exact stack.
图 3b · 五种 scanline filter。每一行像素独立挑选最优 filter——None / Sub(减左) / Up(减上) / Average(减平均) / Paeth(基于 L、U、UL 三像素的方向预测)。filter 的目的不是压缩,而是把数据变成"更易被 DEFLATE 压缩的样子"。
Fig 3b · The five scanline filters. Each row independently picks the best one — None / Sub (subtract left) / Up (subtract above) / Average (subtract floor-of-mean) / Paeth (a directional predictor over the L, U and UL neighbours). Filters don't compress on their own; they reshape the bytes into something DEFLATE can crush further.
图 3c · PNG chunk 链。强制顺序 IHDR → [PLTE] → IDAT × N → IEND;中间可以塞任意数量的 ancillary chunks(gamma、ICC、文本元数据等)。每个 chunk 都带 CRC32,且解码器可安全跳过未知 chunk——PNG 因此可以无限扩展而不破坏旧解码器。
Fig 3c · The PNG chunk chain. The mandatory order is IHDR → [PLTE] → IDAT × N → IEND; any number of ancillary chunks (gamma, ICC, text metadata, …) may be sprinkled in between. Every chunk carries a CRC32, and decoders are required to skip unknown chunks safely — which is exactly why PNG keeps gaining features without breaking old readers.
ADAM7 · 7-pass interlace, 8×8 super-block

1 6 4 6 2 6 4 6
7 7 7 7 7 7 7 7
5 6 5 6 5 6 5 6
7 7 7 7 7 7 7 7
3 6 4 6 3 6 4 6
7 7 7 7 7 7 7 7
5 6 5 6 5 6 5 6
7 7 7 7 7 7 7 7

progressive preview: pass 1 = 1/64 of pixels, pass 7 = 1/2
图 3d · Adam7 隔行扫描的 8×8 子块。第 1 趟只发 1 个像素(整图的 1/64),后续每趟密度翻倍——浏览器可以在还没下完整张图时,先按这个顺序拼出一个粗糙预览。
Fig 3d · Adam7's 8×8 super-block. Pass 1 ships a single pixel (1/64 of the image); each subsequent pass doubles density — browsers can render a rough preview before the file finishes downloading.

技术内核

Technical core

PNG 的五个支柱:① DEFLATE = LZ77 + Huffman——和 zip / gzip 完全同款,1996 年的 RFC 1951,从设计第一天起就保证无专利。② 5 种 scanline filter(None / Sub / Up / Average / Paeth):每行像素独立选最优 filter——filter 不压缩,而是把数据预测成残差,让后面的 DEFLATE 更容易找到重复。Paeth filter 用左、上、左上三像素做方向预测,在自然图像上几乎总赢。③ chunks 体系:IHDR / IDAT / IEND 是必须的(索引色图像再加 PLTE),后面可以追加 tRNS(调色板透明)、gAMA / cHRM / iCCP(色彩管理)、tEXt / iTXt(metadata)、acTL / fcTL(APNG 扩展)等等;每个 chunk 一个 CRC32 校验,未知的 ancillary chunk(小写首字母)解码器必须安全跳过——因此 PNG 的扩展性几乎是无限的。④ 真 alpha:8 或 16 bit 的独立 alpha 通道,不再借调色板槽位伪装;PNG-32 = RGBA 8-bit。⑤ Adam7 interlace:7 趟扫描的渐进显示——20 年前在 56 K 调制解调器上极其有用,今天不再常用。

Five pillars hold PNG up. ① DEFLATE = LZ77 + Huffman — the exact stack used by zip and gzip, RFC 1951, patent-free by construction. ② Five scanline filters (None / Sub / Up / Average / Paeth): each row picks its best filter independently — the filter doesn't compress, it predicts residuals so DEFLATE can spot repetitions. Paeth, which predicts from the left, upper and upper-left neighbours, almost always wins on natural images. ③ The chunks system: IHDR / IDAT / IEND are mandatory (plus PLTE for indexed images); everything else (tRNS for palette transparency, gAMA / cHRM / iCCP for colour management, tEXt / iTXt for metadata, acTL / fcTL for APNG, …) is optional and CRC-checked, and decoders are required to safely skip ancillary chunks (lowercase first letter) they don't recognise — so PNG can grow forever without breaking old readers. ④ Real alpha: an independent 8- or 16-bit alpha channel, no longer disguised as a palette slot. PNG-32 is plain RGBA 8-bit. ⑤ Adam7 interlace: a 7-pass progressive scan — invaluable on 56 K modems twenty years ago, mostly obsolete today.
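Pillar ② is small enough to write out. The Paeth predictor below follows the algorithm in the PNG spec: estimate the pixel from the local gradient, then return whichever neighbour is closest to that estimate (ties break left, then up):

```python
def paeth(left, up, upleft):
    """PNG's Paeth predictor: pick the neighbour closest to left + up - upleft."""
    p = left + up - upleft            # the gradient estimate
    pa = abs(p - left)
    pb = abs(p - up)
    pc = abs(p - upleft)
    if pa <= pb and pa <= pc:         # ties favour left, then up — spec-mandated order
        return left
    if pb <= pc:
        return up
    return upleft
```

The filtered byte stored in the file is then `x - paeth(left, up, upleft)` modulo 256; on smooth gradients those residuals hover near zero, which is exactly what DEFLATE's Huffman stage rewards.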


图 3 · 全流程 · 原始 RGBA → 逐行选 filter(None / Sub / Up / Average / Paeth)→ DEFLATE 压(LZ77 + Huffman, level 0–9)→ 打包成 chunks(IHDR + IDAT×N + IEND, 每个带 CRC32)→ 输出 .png 文件。可选:在 filter 之前先做 Adam7 隔行重排。

Fig 3 · Full pipeline · raw RGBA → per-row filter (None / Sub / Up / Average / Paeth) → DEFLATE (LZ77 + Huffman, zlib level 0–9) → pack into chunks (IHDR + IDAT × N + IEND, each CRC-checked) → emit .png. Optional: pre-shuffle rows with Adam7 before filtering.
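Because every chunk has the same length + type + data + CRC32 shape, a complete (if unoptimised) PNG encoder fits in a few lines. A sketch using filter 0 (None) on every row — helper names are mine, the byte layout is the spec's:

```python
import struct
import zlib

def chunk(ctype, data):
    """One PNG chunk: length(4) + type(4) + data + CRC32 over type+data."""
    return struct.pack(">I", len(data)) + ctype + data + struct.pack(">I", zlib.crc32(ctype + data))

def make_png(width, height, rgb_rows):
    """Bytes of a minimal truecolour PNG: signature + IHDR + one IDAT + IEND."""
    # IHDR: width, height, bit depth 8, colour type 2 (RGB), then compression/filter/interlace = 0
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    # prepend filter byte 0 (None) to each scanline, then DEFLATE the lot
    raw = b"".join(b"\x00" + row for row in rgb_rows)
    return (b"\x89PNG\r\n\x1a\n"
            + chunk(b"IHDR", ihdr)
            + chunk(b"IDAT", zlib.compress(raw))
            + chunk(b"IEND", b""))
```

A real encoder differs only in the boxed "FILTER" stage: it tries all five filters per row and keeps the one with the smallest residuals before handing the stream to zlib.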

format | year | lossless | palette | alpha | animation | typical size vs JPEG-Q85
BMP | 1990 | ✓ | ✓ (≤ 8-bit) | partial (v5) | — | ≈ 8–20 ×
GIF | 1987 | partial | ✓ (256) | 1-bit | ✓ | ≈ 0.4 × (low colour)
PNG-8 | 1996 | ✓ | ✓ (256) | 8-bit | — | ≈ 0.3 ×
PNG-24/32 | 1996 | ✓ | — | 8 / 16-bit | — | ≈ 1.5–5 ×
JPEG (Q85) | 1992 | — | — | — | — | 1.0 × (baseline)
$ oxipng -o6 in.png                  # brute-force re-pack, 5–30% smaller
$ pngcrush -reduce -brute in.png out.png   # classic, slower but still useful
$ convert in.png -strip out.png      # ImageMagick — drop metadata chunks
$ pngquant --quality=70-90 in.png    # lossy palette quantisation → PNG-8
$ zopflipng -m in.png out.png        # Google's zopfli, max DEFLATE compression

适用

USE FOR

  • 屏幕截图、录屏帧、UI 设计稿(无损 + 锐边)
  • 透明 logo、PWA 图标、Material 图标
  • 需要稳定 alpha + 跨平台一致性的任何资源
  • 16 bit/channel 工程影像中转(在 EXR 之前的轻量选项)
  • Screenshots, recorded frames, UI design exports (lossless + sharp edges)
  • Transparent logos, PWA icons, Material icons
  • Anything needing reliable alpha and cross-platform consistency
  • A lightweight 16-bit/channel transit format before EXR enters the picture

反适用

AVOID

  • 真实照片:体积比 JPEG / WebP / AVIF 大 5–10 ×
  • 视频帧序列:用 H.264 / AV1 / WebM,不是 APNG
  • 对加载时间极敏感的首屏大图
  • Photographs: 5–10 × larger than JPEG / WebP / AVIF
  • Video frame sequences: use H.264 / AV1 / WebM, not APNG
  • Above-the-fold hero images where bytes matter most
scope: PNG
browsers: ✓✓✓ universal since IE 4 / Mozilla 1.0
tools: ✓✓✓ Photoshop · Figma · Sketch · GIMP · Preview
CLI: oxipng -o6 · pngquant · zopflipng
parent: BMP · GIF (replaces both)
children: APNG (animation extension) · MNG (failed animated cousin)

APNG — PNG 偷偷做了动图

APNG — PNG Secretly Grew Frames

YEAR 2004 (Mozilla) · 2017 (W3C) AUTHOR Stuart Parmenter · Vladimir Vukićević (Mozilla) EXT .apng · .png MIME image/apng STD Mozilla (rejected by PNG WG) → 2017 W3C LOSSY lossless (same as PNG) DEPTH same as PNG (1–16 bit/ch) ALPHA same as PNG (8 / 16-bit) ANIM ✓ STATUS mainstream · all major browsers

PNG 工作组说不要,Mozilla 偷偷加了。

PNG WG said no. Mozilla shipped it anyway.

2004 年 Mozilla 想给 Firefox 加一类"加载中"的动态图标:GIF 只剩 256 色看起来很丑,而 PNG 工作组官方动图方案 MNG 又庞大复杂,几乎没人实现。两个 Mozilla 工程师 Stuart Parmenter 和 Vladimir Vukićević 干脆在 PNG 上塞了三个新 chunk:acTL(动画控制)、fcTL(每帧控制)、fdAT(帧数据)。提案发到 PNG 邮件列表,工作组明确拒绝接收——理由是"会破坏 PNG 简洁性"。Mozilla 不为所动,2008 年随 Firefox 3 直接发布;十年后 Apple、Google 跟进,2017 年 W3C 终于回头把它收编为标准。一个被官方拒绝的扩展,反过来被市场和标准追认。

In 2004 Mozilla wanted lightweight loading animations in Firefox: GIF's 256 colours looked ugly and the PNG working group's official MNG was so vast that almost no one implemented it. Two Mozilla engineers, Stuart Parmenter and Vladimir Vukićević, simply added three new chunks to PNG — acTL (animation control), fcTL (per-frame control), fdAT (frame data). They sent the proposal to the PNG mailing list; the working group flatly refused, citing damage to "PNG's simplicity". Mozilla shipped it anyway in Firefox 3 (2008). A decade later Apple and Google followed, and in 2017 the W3C finally adopted APNG as a standard. A rejected extension, ratified later by the market and the spec.

图 4 · APNG chunk 链。IHDR 之后插入新的 acTL(动画控制),第一个 IDAT 仍是合法的静态 PNG 首帧——不支持 APNG 的解码器止于此(图中红色虚线)。其后是 fcTL 与 fdAT 的反复:每帧一个 fcTL 描述位置/时长/disposal,后面紧跟若干 fdAT 携带帧数据。
Fig 4 · The APNG chunk chain. A new acTL sits right after IHDR; the first IDAT is still a perfectly legal static PNG — old decoders stop at the dashed red line. From there on, fcTL + fdAT alternate per frame, with each fcTL describing the frame's position, delay and disposal mode.

技术内核

Technical core

APNG 在 PNG 上做的改动只有三件事:① 三个新 chunk——acTL 携带帧数和循环次数;fcTL 是每帧控制块,描述偏移、宽高、显示时长(分子/分母两个 16-bit)、blend mode 与 disposal mode;fdAT 本质是带 4 字节 sequence 编号的 IDAT,数据段格式完全相同。② 第一帧仍是合法 PNG——把首帧仍写成 IDAT,意味着不支持 APNG 的解码器(早期 Safari、ImageMagick 旧版本)看到的就是一张静态图,向后兼容性极佳。这是 APNG 比 MNG 成功的最大原因之一。③ blend / disposal 模式——blend mode 有 SOURCE(直接覆盖)与 OVER(alpha 合成)两种;disposal mode 有 NONE(保留)、BACKGROUND(清空)、PREVIOUS(还原前一帧)三种,跟 GIF 89a 的 disposal 完全是同一套语义。除此之外,APNG 对色彩空间、滤波、压缩(DEFLATE)的处理与 PNG 一字不差。

APNG only adds three things to PNG. ① Three new chunks: acTL carries frame count and loop count; fcTL is a per-frame control block describing offset, width/height, delay (a 16-bit numerator and denominator), blend mode and disposal mode; fdAT is essentially an IDAT prefixed with a 4-byte sequence number — its data payload format is identical. ② The first frame is still a valid PNG: keeping frame 0 as an IDAT means a decoder that doesn't understand APNG (old Safari, older ImageMagick) just sees a static image. This backward-compatibility trick is the biggest reason APNG won where MNG failed. ③ Blend / disposal modes: blend modes are SOURCE (overwrite) and OVER (alpha composite); disposal modes are NONE (keep), BACKGROUND (clear), PREVIOUS (restore prior frame) — exact same semantics as GIF 89a. For everything else (colour space, filters, DEFLATE), APNG inherits PNG byte for byte.
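The three chunks are small enough to build by hand. A sketch of their binary layout (field order per the APNG spec; the helper names and defaults are mine — `chunk` is the same length + type + data + CRC32 wrapper every PNG chunk uses):

```python
import struct
import zlib

def chunk(ctype, data):
    """Standard PNG chunk framing, reused verbatim by APNG."""
    return struct.pack(">I", len(data)) + ctype + data + struct.pack(">I", zlib.crc32(ctype + data))

def actl(num_frames, num_plays=0):
    """Animation control: frame count + loop count (0 = loop forever)."""
    return chunk(b"acTL", struct.pack(">II", num_frames, num_plays))

def fctl(seq, w, h, x=0, y=0, delay=(1, 10), dispose=0, blend=0):
    """Per-frame control: offset, size, delay as a num/den fraction, dispose & blend modes."""
    num, den = delay  # e.g. (1, 10) = 0.1 s per frame
    return chunk(b"fcTL", struct.pack(">IIIIIHHBB", seq, w, h, x, y, num, den, dispose, blend))

def fdat(seq, idat_payload):
    """fdAT = a 4-byte sequence number + bytes in exactly the IDAT format."""
    return chunk(b"fdAT", struct.pack(">I", seq) + idat_payload)
```

Interleave these into an ordinary PNG — acTL after IHDR, fcTL/fdAT after the first IDAT — and any PNG reader that ignores lowercase-ancillary chunks still renders frame 0.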

适用

USE FOR

  • 高质量动图——Twitter / Telegram / 微信表情包
  • 需要真 alpha 通道的动效(GIF 只能 1-bit)
  • 静图回退极重要的场景:不支持 APNG 的浏览器仍能显示首帧
  • High-quality animated stickers (Twitter / Telegram / WeChat)
  • Animations needing real alpha (GIF gives you only 1-bit)
  • Anywhere a static fallback matters: non-APNG decoders still see frame 0

反适用

AVOID

  • 体积敏感场景:同等画质比 WebP / AVIF 大 2–5 ×
  • 长序列 / 视频片段:用 H.264 / AV1 / WebM
  • Bandwidth-tight contexts: 2–5 × larger than WebP / AVIF at equal quality
  • Long sequences or video clips: use H.264 / AV1 / WebM
scope: APNG
browsers: ✓✓✓ Firefox 3+ · Safari 8+ · Chrome 59+ · Edge 18+
tools: ✓✓ GIMP · Photoshop (plugin) · ezgif
CLI: apngasm in_*.png out.apng · ffmpeg -plays 0 ... out.apng
parent: PNG
children: conceptually replaces GIF animation · superseded by animated WebP & AVIF Sequence

animated WebP — WebP 的动图分身

animated WebP — WebP's Multi-Frame Twin

YEAR 2010 (WebP) · ~2012 (animation ext) AUTHOR Google EXT .webp MIME image/webp STD Google proprietary LOSSY lossy + lossless (dual mode) DEPTH 8 bit/channel ALPHA 8-bit ANIM ✓ STATUS mainstream

WebP 在容器里偷偷塞了多帧——比 PNG 优雅,比 GIF 漂亮。

WebP slipped multiple frames into one container — neater than PNG, prettier than GIF.

静态 WebP 已经在体积上把 GIF 摁在地上摩擦——同一张表情包,WebP 通常只要 GIF 的 1/3。Google 接下来要做的事很自然:把 RIFF 容器从"装一帧"扩展成"装多帧",加一个 VP8X 扩展头声明 alpha / animation / ICC 等 feature flags,再加一个 ANIM 全局动画块和若干 ANMF 帧块。这套扩展在 2012 年附近随 libwebp 0.2 公开,WebP 一夜之间从静图格式变成"比 GIF 小 30%、画质好一个数量级、还有真 alpha"的动图格式。今天 Telegram、WhatsApp 的"高级表情包"几乎都是 animated WebP。

Static WebP already crushed GIF on bytes — the same sticker is typically a third the size in WebP. The next step was obvious: extend the RIFF container from one frame to many. Google added a VP8X extended header to declare feature flags (alpha / animation / ICC), an ANIM global animation block, and a stream of ANMF per-frame blocks. The extension landed around 2012 with libwebp 0.2, and overnight WebP went from a still-image format to one that beats GIF by ~30 % in size, an order of magnitude in quality, and finally adds real alpha. Today's "premium" stickers on Telegram and WhatsApp are almost all animated WebP.

图 5 · animated WebP 的容器嵌套。最外层是 1991 年的 RIFF,次层 WEBP 子块。VP8X 声明本文件含 animation / alpha / ICC 等 feature flag;ANIM 一次性给出背景色和循环次数;然后每一帧是一个 ANMF 块,内部再嵌一段 VP8(有损)或 VP8L(无损)bitstream。
Fig 5 · How animated WebP nests. The outermost layer is the 1991-vintage RIFF; the WEBP sub-block sits one level in. VP8X declares which feature flags are active (animation / alpha / ICC); ANIM gives the background colour and loop count once; each ANMF then carries one frame, internally wrapping a VP8 (lossy) or VP8L (lossless) bitstream.

技术内核

Technical core

animated WebP 的扩展逻辑可以用三句话概括:① RIFF 容器 + VP8X 扩展头——RIFF 是 Microsoft 1991 年发明的"分块容器"标准(用过 .wav / .avi 都见过它),WebP 直接复用,VP8X 是 WebP 自己加的扩展头,10 字节载荷里第一字节是 feature flags(ICC profile / alpha / EXIF / XMP / animation 各占一位),3 字节保留,最后 6 字节给出 24-bit 的画布宽高(各存 宽−1 / 高−1)。② ANIM + ANMF——ANIM 块在文件级别声明背景色和循环次数,ANMF 块在帧级别给出偏移、宽高、duration、blend mode、disposal mode,跟 APNG / GIF 是同一套语义。③ 每帧可独立选编码——WebP 内置两套编码器 VP8(有损,基于运动补偿和 DCT)与 VP8L(无损,基于 LZ77 + 颜色变换 + Huffman),animated WebP 允许逐帧切换:贴纸的纯色背景一帧用 VP8L 无损,人物动画一帧用 VP8 有损,同一文件里混排。

Three sentences cover the entire mechanism. ① RIFF + VP8X header: RIFF is Microsoft's 1991 chunk container (anyone who's opened a .wav or .avi has met it). WebP reuses it verbatim and adds a VP8X header with a 10-byte payload — the first byte is a bitfield of feature flags (ICC profile / alpha / EXIF / XMP / animation), three bytes are reserved, and the last six encode the 24-bit canvas width and height (each stored as the dimension minus one). ② ANIM + ANMF: ANIM sits at file scope and declares background colour plus loop count; each ANMF then carries per-frame offset, dimensions, duration, blend mode and disposal mode — exact same semantics as APNG and GIF. ③ Per-frame codec choice: WebP ships two encoders, VP8 (lossy, motion-compensation + DCT) and VP8L (lossless, LZ77 + colour transform + Huffman). An animated WebP can switch encoders frame by frame — a sticker's flat background uses lossless VP8L, the character animation uses lossy VP8, all in one file.
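The VP8X payload in ① is easy to pick apart by hand. A sketch of a parser (the bit positions follow the WebP container spec's flag byte; the function name and returned dict are mine):

```python
def parse_vp8x(payload):
    """Decode a 10-byte VP8X payload: flag byte, 3 reserved bytes, 24-bit (w-1, h-1) little-endian."""
    flags = payload[0]
    features = {
        "icc":   bool(flags & 0x20),  # ICC profile present
        "alpha": bool(flags & 0x10),  # any frame carries alpha
        "exif":  bool(flags & 0x08),
        "xmp":   bool(flags & 0x04),
        "anim":  bool(flags & 0x02),  # ANIM / ANMF chunks follow
    }
    width = int.from_bytes(payload[4:7], "little") + 1   # stored as width - 1
    height = int.from_bytes(payload[7:10], "little") + 1
    return features, width, height
```

This is roughly what `webpmux -info file.webp` prints in its "Features present" line; the minus-one encoding is why the canvas can reach 16384 × 16384 in 24 bits per axis.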

适用

USE FOR

  • 现代动图首选 / 表情包 / 短动效
  • 需要 alpha 又要小体积的循环动画
  • 一帧无损一帧有损混排的复杂贴纸
  • The default modern animated-image format · stickers · short loops
  • Looped animation that needs alpha and small bytes
  • Complex stickers mixing lossless and lossy frames in one file

反适用

AVOID

  • Safari < 14(老 iOS):需要 GIF/APNG 兜底
  • 需要超低延迟硬解的视频流——还是用 H.264 / AV1
  • Safari < 14 (older iOS): you'll need a GIF/APNG fallback
  • Latency-critical hardware-decoded video — stick with H.264 / AV1
scope: animated WebP
browsers: ✓✓✓ Chrome 32+ · Firefox 65+ · Safari 14+
tools: ✓✓ Photoshop (plugin) · ezgif · GIMP 2.10+
CLI: cwebp · webpmux -frame f1.webp +100 ... -o anim.webp
parent: WebP · GIF (animation idea)
children: coexists with APNG · superseded by AVIF Sequence

JPEG — 8×8 DCT 三十年统治

JPEG — Three Decades of the 8×8 DCT

YEAR 1992 (ISO/IEC IS 10918-1) AUTHOR Joint Photographic Experts Group EXT .jpg · .jpeg MIME image/jpeg STD ISO/IEC 10918 · ITU-T T.81 LOSSY lossy (a near-lossless mode exists, rarely used) DEPTH 8 bit/channel ALPHA none ANIM none (MJPEG is a separate beast) STATUS the internet's largest image bucket

8×8 的格子,装下了三十年的人类视觉。

An 8×8 grid that held three decades of human vision.

1980 年代末,扫描仪、数码相机、传真机几乎同时崛起,所有人都需要一个能把"自然图像"压到 1/10 体积、人眼又看不太出来的标准。JPEG 委员会用三个事实搭了一条压缩流水线:人眼对亮度比对色度敏感、对低频比对高频敏感、对能量集中的信号(自然图像)有极强冗余可挖。把这三件事翻译成代码,就是 YCbCr + 4:2:0 + 8×8 DCT + 量化——JPEG 因此能在 Q85 这个挡位上把 5 MB 的照片压到 250 KB,而你几乎看不出差。

By the late 1980s scanners, digital cameras and fax machines were arriving in parallel — everyone needed a way to crunch "natural images" to a tenth of their size while the human eye barely noticed. The JPEG committee built a pipeline around three facts: the eye is more sensitive to luma than to chroma, more sensitive to low frequencies than to high, and natural images carry enormous redundancy in their energy distribution. Translated into code, that becomes YCbCr + 4:2:0 + 8×8 DCT + quantisation — and lets JPEG turn a 5 MB photo into 250 KB at Q85 with practically no visible loss.

图 6a · 第一步:RGB → YCbCr,把"亮度"与"色度"分开。再做 4:2:0 子采样——色度面在水平和垂直方向各砍一半,体积立省 50%。整张照片你的眼睛只会觉得"颜色边缘稍微软了一点",几乎察觉不到。
Fig 6a · Step one: RGB → YCbCr, separating luma from chroma. Then 4:2:0 subsampling — chroma planes are halved on both axes, killing 50 % of the data on the spot. The eye perceives only a slight softening of colour edges; almost no one notices.
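The RGB → YCbCr matrix itself is three dot products. A sketch using the full-range BT.601 coefficients that JFIF specifies (scalar version for clarity; a real encoder vectorises this over whole planes):

```python
def rgb_to_ycbcr(r, g, b):
    """JFIF full-range RGB -> YCbCr, BT.601 luma weights."""
    y  =  0.299 * r + 0.587 * g + 0.114 * b            # luma: green dominates
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128  # blue-difference, centred on 128
    cr =  0.5 * r - 0.418688 * g - 0.081312 * b + 128  # red-difference, centred on 128
    return y, cb, cr
```

Any grey input (r = g = b) lands exactly on cb = cr = 128, which is why the chroma planes of a mostly-neutral photo compress almost to nothing even before subsampling.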
图 6b · 第二步:每 8×8 块做一次 DCT-II,把"像素"换成"频率系数"。左上角 DC 系数代表整块平均亮度,右下角是最高频。自然图像的能量天然集中在左上区域——右下大量系数接近 0,正等着量化把它们清零。
Fig 6b · Step two: each 8×8 block runs through a DCT-II, swapping "pixels" for "frequency coefficients". The top-left DC term is the block's average; the bottom-right is the highest frequency. In natural images the energy clusters near the top-left — the bottom-right is mostly near-zero coefficients ready to be quantised away.
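The DCT-II itself, written naively (O(n⁴) — real codecs use fast factorisations, but the maths is identical):

```python
import math

def dct2(block):
    """Naive 2-D DCT-II on an 8x8 block, orthonormal scaling — what JPEG applies per block."""
    n = 8
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = sum(block[x][y]
                    * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                    * math.cos((2 * y + 1) * v * math.pi / (2 * n))
                    for x in range(n) for y in range(n))
            cu = math.sqrt(1 / n) if u == 0 else math.sqrt(2 / n)
            cv = math.sqrt(1 / n) if v == 0 else math.sqrt(2 / n)
            out[u][v] = cu * cv * s
    return out
```

Feed it a perfectly flat block and every AC coefficient is zero — all the energy lands in the single DC term, which is the extreme case of the "energy clusters top-left" behaviour the figure describes.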
QUANT TABLES · standard Q50

luma (Y):
 16  11  10  16  24  40  51  61
 12  12  14  19  26  58  60  55
 14  13  16  24  40  57  69  56
 14  17  22  29  51  87  80  62
 18  22  37  56  68 109 103  77
 24  35  55  64  81 104 113  92
 49  64  78  87 103 121 120 101
 72  92  95  98 112 100 103  99

chroma (Cb/Cr):
 17  18  24  47  99  99  99  99
 18  21  26  66  99  99  99  99
 24  26  56  99  99  99  99  99
 47  66  99  99  99  99  99  99
 99  99  99  99  99  99  99  99
 99  99  99  99  99  99  99  99
 99  99  99  99  99  99  99  99
 99  99  99  99  99  99  99  99
图 6c · 标准 Q50 量化表(亮度 + 色度)。每个频域系数除以表中对应位置的整数再四舍五入——表里的数从左上(低频)到右下(高频)递增,意味着高频被砍得更狠;色度表更激进,几乎所有中高频位置都是 99。JPEG 压缩的"损失"主要发生在这一步。
Fig 6c · The standard Q50 quantisation tables (luma + chroma). Each frequency coefficient is divided by the integer at the matching position and rounded — the values rise from top-left (low frequency) to bottom-right (high frequency), so high frequencies get crushed harder. The chroma table is more aggressive — nearly every mid- and high-frequency slot is 99. This step is where almost all of JPEG's loss lives.
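The "Q" knob is just a scaling of this Q50 table. A sketch using the IJG/libjpeg convention (the formula below is libjpeg's; other encoders map quality to tables differently):

```python
# first row of the standard luma table above, then the rest, flattened row-major
LUMA_Q50 = [
    16, 11, 10, 16, 24, 40, 51, 61,
    12, 12, 14, 19, 26, 58, 60, 55,
    14, 13, 16, 24, 40, 57, 69, 56,
    14, 17, 22, 29, 51, 87, 80, 62,
    18, 22, 37, 56, 68, 109, 103, 77,
    24, 35, 55, 64, 81, 104, 113, 92,
    49, 64, 78, 87, 103, 121, 120, 101,
    72, 92, 95, 98, 112, 100, 103, 99,
]

def scale_quant(table, quality):
    """IJG scaling: Q<50 multiplies the table up, Q>50 shrinks it toward all-ones."""
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    return [min(255, max(1, (q * scale + 50) // 100)) for q in table]
```

At Q85 the divisor for the DC slot drops from 16 to 5 (finer steps, less loss); at Q10 it balloons past 80, which is where the familiar blocky artefacts come from.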
ZIG-ZAG · DC at #1, AC outward

  1   2   6   7  15  16  28  29
  3   5   8  14  17  27  30  43
  4   9  13  18  26  31  42  44
 10  12  19  25  32  41  45  54
 11  20  24  33  40  46  53  55
 21  23  34  39  47  52  56  61
 22  35  38  48  51  57  60  62
 36  37  49  50  58  59  63  64

→ 64 coefs in one stream · tail = many 0s → RLE pays off
图 6d · zig-zag 扫描路径。把 8×8 = 64 个系数按"之"字形排成一维序列:DC 在 #1,后面按频率从低到高展开。量化后右下大量系数都是 0,序列的尾部出现长串零——刚好喂给 RLE(run-length)和 Huffman 吃。
Fig 6d · The zig-zag scan path. 64 coefficients get unrolled into a 1-D stream — DC sits at position #1, then frequencies fan out from low to high. After quantisation, the bottom-right is mostly zero, so the tail of the stream becomes long runs of zeros — perfect food for RLE and then Huffman.
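The path needn't be hard-coded: every anti-diagonal (constant row + col) is visited in order, alternating direction. A sketch that generates the same ordering as the grid above:

```python
def zigzag_order(n=8):
    """Visiting order of the JPEG zig-zag scan for an n x n block, DC first."""
    # sort by anti-diagonal d = r + c; within a diagonal, odd d walks down-left
    # (row ascending), even d walks up-right (col ascending)
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))
```

Indexing a quantised block through this order turns the near-zero bottom-right triangle into one long zero run at the end of the stream, which RLE then collapses to almost nothing.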

技术内核

Technical core

JPEG 的压缩流水线有六环:① RGB → YCbCr——分离亮度与色度,为后续区别对待打基础。② 4:2:0 子采样——色度面 Cb / Cr 在水平和垂直方向各降一半采样率,体积立刻砍掉 50%,人眼几乎无感。③ 切成 8×8 块,每块做 DCT-II——空域换频域,自然图像的能量集中在左上(低频),右下大量系数趋近 0。④ 量化表——亮度表 + 色度表(色度表更激进),把不重要的高频系数除以一个较大的整数后取整,大量系数被清零。这一步是 JPEG 唯一的有损步骤,所有"画质损失"都在此发生。⑤ zig-zag 扫描 + RLE + Huffman——把 64 个系数按"之"字形排成一维序列,尾部一长串零交给 RLE 压缩,剩下的字面量做 Huffman 熵编码,无损。⑥ JFIF / Exif 容器封装——JPEG 标准本身只规定 codec 流(SOI / APPn / DQT / DHT / SOF / SOS / EOI marker),文件格式是另一层:JFIF 1.02(1992)规定了标准的 APP0 元数据,Exif(1995)在 APP1 里塞相机参数。今天你看到的每张 .jpg 几乎都是 "JFIF + Exif 包了一段 JPEG codec 流"。

Six stages make up the JPEG pipeline. ① RGB → YCbCr: split luma from chroma so the rest of the pipeline can treat them differently. ② 4:2:0 chroma subsampling: halve Cb and Cr horizontally and vertically — instantly drops 50 % of the data with virtually no perceptual cost. ③ Split into 8×8 blocks; run DCT-II per block: spatial → frequency domain. Natural-image energy clusters in the top-left (low frequency); the bottom-right is mostly near-zero. ④ Quantisation tables (luma + chroma, chroma being more aggressive): each coefficient is divided by the matching integer and rounded, killing huge swathes of high-frequency information. This is the only lossy step — every visible artefact JPEG ever produces comes from here. ⑤ Zig-zag scan + RLE + Huffman: unroll the 64 coefficients into a 1-D stream so the long zero-tail compresses cleanly under RLE, then Huffman-encode the remaining literals. Lossless. ⑥ JFIF / Exif container: the JPEG spec only defines the codec stream (SOI / APPn / DQT / DHT / SOF / SOS / EOI markers); the file format is a separate layer. JFIF 1.02 (1992) standardised an APP0 metadata segment, Exif (1995) tucked camera metadata into APP1. Almost every .jpg you've ever seen is "JFIF + Exif wrapping a JPEG codec stream".

JPEG ENCODE PIPELINE · 8 stages
RGB (8 bit/ch) → YCbCr (colour split) → 4:2:0 chroma (−50% data) → 8×8 split (block grid) → DCT-II (space → freq) → QUANTISE Q 50–95 (★ lossy step) → zig-zag (to 1-D) → RLE (collapse 0s) → Huffman (entropy code) → JFIF / Exif wrap → .jpg (SOI … markers … EOI)
Layers: perceptual (YCbCr + 4:2:0 · DCT + quantise = where the bytes vanish) · entropy (lossless) · file (JFIF / Exif metadata)
Knobs: chroma subsample (4:4:4 / 4:2:2 / 4:2:0) · quality Q (50–95) · standard or custom quant tables · baseline vs progressive scan
Markers: SOI (start) · APPn (metadata) · DQT (quant tables) · DHT (Huffman tables) · SOF (frame header) · SOS (scan) · EOI (end)
Decode is the same in reverse: parse markers → Huffman+RLE → de-quantise → IDCT → YCbCr→RGB → upsample chroma.

图 6 · JPEG 全流程 · RGB → YCbCr 分离 → 4:2:0 子采样(−50%)→ 切 8×8 块 → DCT-II 频域变换 → 量化(★ 唯一有损步骤,Q 控制狠度)→ zig-zag 扫描成 1-D → RLE 压零 → Huffman 熵编码 → JFIF / Exif 包外壳 → .jpg。Q、量化表、子采样比、baseline / progressive 是编码器仅有的几个旋钮。

Fig 6 · The full JPEG pipeline · RGB → YCbCr split → 4:2:0 subsample (−50 %) → 8×8 blocks → DCT-II → Quantise (★ the one and only lossy step, Q controls how brutal) → zig-zag scan → RLE → Huffman → JFIF / Exif wrapper → .jpg. The encoder really only has four knobs: Q, quant tables, subsample ratio, and baseline-vs-progressive scan.
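Stages ③–⑤ above are compact enough to sketch in a few lines of numpy. This is an illustration, not libjpeg's code: an orthonormal DCT-II built as a basis matrix, the standard luminance quantisation table from Annex K of the spec, and a zig-zag unroll; the gradient block stands in for a smooth photo region.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis: row k = frequency k, column x = pixel index."""
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)  # DC row carries the 1/sqrt(2) normalisation
    return m

# Standard JPEG luminance quantisation table (Annex K of ITU-T T.81)
Q_LUMA = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
])

def zigzag_order(n=8):
    """Diagonal-by-diagonal scan order: (0,0), (0,1), (1,0), (2,0), (1,1), ..."""
    idx = [(i, j) for i in range(n) for j in range(n)]
    return sorted(idx, key=lambda p: (p[0] + p[1],
                                      p[0] if (p[0] + p[1]) % 2 else -p[0]))

C = dct_matrix()
# A smooth gradient block, already level-shifted by -128 as JPEG does
block = np.add.outer(np.arange(8), np.arange(8)) * 4.0 - 28.0

coeff = C @ block @ C.T                             # stage ③: 2-D DCT-II
quant = np.round(coeff / Q_LUMA).astype(int)        # stage ④: ★ the only lossy step
stream = [quant[i, j] for i, j in zigzag_order()]   # stage ⑤: 1-D stream for RLE
recon = C.T @ (quant * Q_LUMA) @ C                  # decoder side: de-quantise + IDCT
```

On a smooth block like this the zig-zag stream ends in a long run of zeros — exactly what RLE then eats; skip the quantisation line and the round-trip is bit-exact.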

format | year | typical 1080p photo | quality at same size
JPEG Q85 | 1992 | ≈ 250 KB | baseline
WebP Q75 | 2010 | ≈ 165 KB | ≈ JPEG Q85
HEIC Q60 | 2015 | ≈ 125 KB | ≈ JPEG Q85
AVIF Q60 | 2019 | ≈ 95 KB | ≈ JPEG Q85
JXL Q90 | 2021 | ≈ 85 KB | ≈ JPEG Q85
$ cjpeg -quality 85 -optimize -progressive in.ppm > out.jpg   # reference libjpeg encoder, progressive scan
$ jpegoptim --max=85 --strip-all in.jpg                       # cap quality at 85, drop all metadata in place
$ cjpeg -quality 85 in.png > out.jpg                          # mozjpeg's cjpeg build — 5–10% smaller at same Q
$ exiftool -all= -overwrite_original in.jpg                   # nuke all Exif / GPS / thumbnail metadata

适用

USE FOR

  • 真实照片(自然图像、人像、风景)
  • 真彩渐变、模糊背景、艺术摄影
  • 任何"颜色丰富、连续变化"的内容
  • 需要最大兼容性的场景:每一台设备都能解
  • Real photographs (nature, portraits, landscapes)
  • Truecolor gradients, soft backgrounds, art photography
  • Anything with rich, continuously varying colour
  • Maximum compatibility — every device on Earth decodes JPEG

反适用

AVOID

  • 文字 / 截图 / UI:8×8 块边界会出现明显方格 artifact
  • 线稿 / 卡通 / 像素艺术:锐边附近出现 ringing
  • 需要 alpha 通道的任何场景
  • 需要无损保留每一个像素的工程影像
  • Text / screenshots / UI: visible 8×8 block artefacts
  • Line art / cartoons / pixel art: ringing near sharp edges
  • Anything needing an alpha channel
  • Engineering images where every pixel must survive intact
scope | browsers | tools | CLI
JPEG / JFIF / Exif | ✓✓✓ universal — every browser, every OS, every camera | ✓✓✓ Photoshop · Lightroom · Figma · Preview · everything | cjpeg · jpegoptim · mozjpeg · exiftool
父:parent: none — DCT-II + Huffman + RLE was a fresh assembly 子:children: JPEG-LS · JPEG 2000 · JPEG XR · WebP · HEIC · AVIF · JPEG XL — every one of them is a "JPEG successor"

JPEG-LS — 你没听说过的无损 JPEG

JPEG-LS — The Lossless JPEG You Never Heard Of

YEAR 1997 (ISO/IEC 14495-1) AUTHOR HP Labs · LOCO-I (Weinberger et al.) EXT .jls / .jpgls MIME image/jls STD ISO/IEC 14495 LOSSY lossless + near-lossless DEPTH 2–16 bit ALPHA none (multi-channel handled separately) ANIM none STATUS medical-imaging niche

比 PNG 快 3 倍,但因为不带容器和颜色管理,谁都没记住它。

3× faster than PNG, but no container, no colour management — and no one remembered.

1990 年代中期,医学影像的需求摆在桌面上:CT 和 MRI 一帧就是 12-bit 灰度大图,一次扫描几百帧——必须无损,但要比 PNG 简单、要比 JPEG 的 lossless mode 实用。HP Labs 的 Marcelo Weinberger 团队拿出了 LOCO-I(LOw COmplexity LOssless COmpression for Images)算法:用三像素中位数预测器(MED)估算下一个像素,把残差送进 Golomb-Rice 编码,在长平滑区域切到 RLE。1997 年 ISO/IEC 14495-1 正式发布,实测比 PNG 压缩率略好、解码速度快 3-5 倍——但它没有自带容器(只是裸 bitstream + 极简 marker),没有 ICC profile,没有元数据,没有 alpha,Web 浏览器全部当它不存在。最后只有医学影像活到了今天:DICOM 把 JPEG-LS 当成它的标准内嵌编码之一。

By the mid-1990s the medical-imaging world had a clear ask: CT and MRI frames were 12-bit greyscale, hundreds of slices per scan, and they had to be lossless — but simpler than PNG and more usable than JPEG's lossless mode. Marcelo Weinberger and team at HP Labs produced LOCO-I (LOw COmplexity LOssless COmpression for Images): a median-of-three predictor (MED) estimates each pixel, the residual goes into Golomb-Rice coding, and long flat runs switch to RLE. ISO/IEC 14495-1 shipped in 1997, beating PNG slightly on ratio and decoding 3–5× faster — but JPEG-LS arrived as a bare bitstream with minimal markers: no container, no ICC profile, no metadata, no alpha. Browsers ignored it entirely. Only medical imaging kept it alive — DICOM still embeds JPEG-LS as one of its standard encodings.

MED · MEDIAN EDGE DETECTOR — neighbours: c (top-left) · b (above) · a (left) → predict x
if c >= max(a, b): predict = min(a, b)
if c <= min(a, b): predict = max(a, b)
otherwise: predict = a + b - c
图 7 · MED 预测器。当前像素 x 由其上方 b、左方 a、左上 c 三个邻居预测——根据 c 与 a、b 极值的关系三选一,本质上是在猜测当前位置是水平边缘、垂直边缘还是平面。真实值与预测值的差(残差)再送进 Golomb-Rice 编码——这套加法运算简单到 90 年代医院 CT 设备的弱 CPU 也能跑得动。
Fig 7 · The MED predictor. The current pixel x is estimated from neighbours b (above), a (left) and c (top-left) — three branches pick one of them based on how c compares with the extremes of a and b, essentially guessing whether the local context is a horizontal edge, a vertical edge, or a flat surface. The residual (actual minus predicted) is then Golomb-Rice coded. The arithmetic is so simple that even the weak CPUs in 1990s hospital CT scanners could keep up.

技术内核

Technical core

JPEG-LS 的精彩之处在于"用三件极简的事打败了 PNG"。① MED 预测器——只用左、上、左上三个像素就能判断当前像素位置在水平边、垂直边还是平坦区:c ≥ max(a,b) 说明这是从右上往下的边,预测取 min(a,b);c ≤ min(a,b) 反向,取 max(a,b);否则就是平坦区,预测取 a+b-c(即沿梯度延伸)。预测准了,残差就接近 0。② Golomb-Rice 熵编码——预测残差大致服从拉普拉斯/几何分布,Golomb-Rice 是这种分布的最优前缀码:把残差除以 2^k 拆成商和余数,商用 unary 码(若干个 1 加一个 0),余数用 k 位定长。参数 k 在编码过程中根据上下文自适应,完全跳过了 Huffman 表的构造,无需多遍扫描。③ Run-length mode——当编码器探测到连续多个像素被相同的上下文预测、且残差全部为 0 时,自动切换到 RLE 模式直接编码 run length——这是它在医学灰度图(大量黑底)和文档扫描上完胜 PNG 的关键。整个 codec 没有 DCT、没有变换、没有量化(无损模式),数学上接近"算术替代变换"。

JPEG-LS beats PNG with three tiny ideas. ① MED predictor: just three neighbouring pixels (left, above, top-left) decide whether the current pixel sits on a horizontal edge, vertical edge or smooth surface. c ≥ max(a,b) picks min(a,b); c ≤ min(a,b) picks max(a,b); otherwise a + b − c (planar extrapolation). When the predictor is right, the residual is near zero. ② Golomb-Rice entropy coding: residuals roughly follow a Laplacian / geometric distribution, and Golomb-Rice is the optimal prefix code for it — divide the residual by 2^k, encode the quotient in unary (as many ones as the quotient, plus a terminating zero) and the remainder in k bits flat. The parameter k adapts per context during encoding, so there's no Huffman table to construct and no extra pass over the data. ③ Run-length mode: when the encoder sees consecutive pixels predicted by the same context with zero residuals, it switches to RLE and encodes the run length directly — the move that destroys PNG on medical greyscales (mostly black background) and document scans. The whole codec has no DCT, no transform, no quantisation (in lossless mode); it's almost pure arithmetic replacing transforms.
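The whole predictor plus the skeleton of Golomb-Rice fits in a dozen lines. A sketch only — real JPEG-LS maps signed residuals to non-negative integers and adapts k per context, both omitted here:

```python
def med_predict(a, b, c):
    """LOCO-I median edge detector: a = left, b = above, c = top-left."""
    if c >= max(a, b):
        return min(a, b)      # c is a local peak: edge above, take the smaller neighbour
    if c <= min(a, b):
        return max(a, b)      # c is a local trough: edge to the left, take the larger
    return a + b - c          # smooth region: planar extrapolation along the gradient

def golomb_rice(value, k):
    """Encode a non-negative residual: unary quotient, then k fixed remainder bits."""
    q, r = value >> k, value & ((1 << k) - 1)
    rem = format(r, "b").zfill(k) if k else ""
    return "1" * q + "0" + rem
```

The encoder picks k so that typical residuals yield short unary prefixes; a well-predicted pixel costs only a couple of bits.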

适用

USE FOR

  • 医学影像 DICOM 内嵌(CT / MRI 标准无损编码)
  • 高速无损归档:解码比 PNG 快 3–5×
  • 嵌入式设备无损相机(算力极低)
  • 文档扫描:大平面 + 边缘的混合内容
  • DICOM medical imaging (a standard lossless encoding for CT / MRI)
  • High-throughput lossless archival — 3–5× faster decoding than PNG
  • Embedded lossless cameras with very tight CPU budgets
  • Document scans — flat regions plus sharp edges

反适用

AVOID

  • Web:浏览器零原生支持
  • 需要 alpha 通道
  • 需要内嵌 ICC profile / EXIF / 元数据
  • The web — zero native browser support
  • Anything requiring an alpha channel
  • Anything requiring embedded ICC profiles / EXIF / metadata
scope | browsers | tools | CLI
JPEG-LS | none | DCMTK · CharLS · MATLAB / Python (pylibjpeg) | charls -e in.pgm out.jls · dcmcjpls in.dcm out.dcm
父:parent: JPEG (shared standardisation track, independent algorithm) 子:children: indirectly informed PNG's filter strategy · still alive inside DICOM today

JPEG 2000 — 小波变换的悲壮失败

JPEG 2000 — The Tragic Defeat of the Wavelet

YEAR 2000 (ISO/IEC 15444-1) AUTHOR JPEG Working Group EXT .jp2 / .jpx / .j2k MIME image/jp2 STD ISO/IEC 15444 LOSSY lossy + lossless (same algorithm) DEPTH 1–38 bit ALPHA ✓ ANIM none STATUS cinema (DCI) · satellite · medical — dead on the web

技术比 JPEG 强,专利让它寸步难行。

Technically beats JPEG. Patents tied its feet.

90 年代末 JPEG 的痛点是清楚的:8×8 块边界肉眼可见、没有 alpha、压缩等级单一、metadata 设计落后。JPEG 工作组想用一个全新算法一次性解决——结果就是 JPEG 2000:用整图 DWT(离散小波变换) 取代 8×8 DCT,无块边界、天然多分辨率;用 EBCOT(Embedded Block Coding with Optimized Truncation) 做编码,可以按"质量层、分辨率层、组件层、空间区域"任意子集解码——同一个 .jp2 文件,你可以只取低分辨率缩略,也可以只取一个画面区域。技术上完胜 JPEG。但有两件事压垮了它:① 解码复杂度比 JPEG 高 10 倍以上,移动设备根本跑不动;② 标准里嵌了几十项专利(虽然多数 RAND 免费),浏览器厂商出于法律风险拒绝实现。Mozilla 和 Google 多次明确说"不"。最后只在三个不在意延迟和算力的领域活下来:数字影院(DCI 强制使用)、卫星图像、医学影像。Safari 是唯一原生支持的浏览器——这是 Apple ImageIO 框架顺带带的,Apple 自己也不主推。

JPEG's 1990s pain points were obvious: visible 8×8 block boundaries, no alpha, only one compression curve, dated metadata. The JPEG WG tried to fix all of it with a clean-sheet algorithm — JPEG 2000. Replace the 8×8 DCT with a whole-image discrete wavelet transform (no block edges, naturally multi-resolution). Replace the entropy coder with EBCOT (Embedded Block Coding with Optimised Truncation), which lets a decoder grab any subset of quality / resolution / component / region from the same .jp2 file — pull just a thumbnail, or just one ROI. Technically it crushes JPEG. Two things broke it. ① Decoding cost is 10× JPEG or more — mobile silicon could not keep up. ② The standard sits on dozens of patents (most RAND-free, but the legal cloud was real), and browser vendors refused to implement it. Mozilla and Google both said no on record. JPEG 2000 survived only in three latency-insensitive, compute-rich worlds: digital cinema (DCI mandates it), satellite imagery, and medical imaging. Safari is the only browser that ships native support — and even that came along for free with Apple's ImageIO framework. Apple never promoted it.

DWT · 3-LEVEL SUBBAND DECOMPOSITION — each level splits into LL (low-low: DC / thumbnail) · HL (horizontal detail) · LH (vertical detail) · HH (diagonal detail), then recurses on LL: LL₃ → HL₃/LH₃/HH₃ → HL₂/LH₂/HH₂ → HL₁/LH₁/HH₁ · 3 recursions → 8× thumbnail "for free" · decode any subset
图 8 · 三级 DWT 子带金字塔。每一级把图像分解成 LL(低频/缩略)、HL(水平细节)、LH(垂直细节)、HH(对角细节)四个子带,然后在 LL 上递归。三级递归后,最左上角那一小块 LL₃ 就是 1/8 大小的天然缩略图——无需重新解码原图就能拿到。这正是"按需取分辨率"的物理基础:解码器要 1/8 缩略只读 LL₃,要 1/4 加上 LH₂/HL₂/HH₂,以此类推。
Fig 8 · A 3-level DWT subband pyramid. Each level splits the image into LL (low-frequency / thumbnail), HL (horizontal detail), LH (vertical detail) and HH (diagonal detail), then recurses on LL. After three levels, the tiny top-left LL₃ is a free 1/8-scale thumbnail — no re-decoding required. This is the physical basis of "decode any resolution you want": grab just LL₃ for 1/8, add the level-2 subbands for 1/4, and so on.

技术内核

Technical core

JPEG 2000 的技术结构有四件事值得记住。① DWT 小波变换——替代 DCT。无损模式用 5/3 整数小波(可逆),有损模式用 9/7 浮点小波(更高效)。整图变换没有 8×8 块边界,所以彻底消除了 JPEG 那种"打格子"的 artifact;同时小波天然多分辨率——见上图。② tile + code-block + EBCOT 三层切分——大图先按 tile(典型 256×256 或 1024×1024)分块独立处理,tile 内部 DWT 后的每个子带再切成 code-block(典型 64×64),EBCOT 对每个 code-block 做位平面编码 + 算术编码,最后 R-D 优化决定哪些位平面截断。③ quality / resolution / component / position progression——同一个 .jp2 文件可以按四种顺序组织码流,解码器拿到任意前缀就能解出对应的一份"低质量但完整"或"高质量但单一分辨率"或"单一区域"的图像。这是 IIIF(图书馆/博物馆高分辨率扫描)的核心能力。④ 同算法既无损又有损——只通过量化步长切换,不像 JPEG / JPEG-LS 是两个独立标准。

Four pieces are worth remembering. ① Discrete wavelet transform replaces the DCT. Lossless mode uses the reversible 5/3 integer wavelet; lossy mode uses the 9/7 floating-point wavelet (higher efficiency). Whole-image transform = no 8×8 block edges = no JPEG-style tiling artefacts. The wavelet is also naturally multi-resolution (see figure above). ② tile + code-block + EBCOT three-level partitioning. Large images are first split into tiles (typically 256×256 or 1024×1024), each tile is wavelet-transformed, each subband is split into code-blocks (typically 64×64), and EBCOT bit-plane codes each block with arithmetic coding before R-D optimisation decides which bit-planes to truncate. ③ Quality / resolution / component / position progression: a single .jp2 can order its codestream four different ways, and any prefix the decoder receives yields either a "low-quality but complete" or "high-quality but single-resolution" or "single-region" image. This is the core capability behind IIIF (the library / museum high-resolution scan protocol). ④ One algorithm, both lossless and lossy — switching is just a matter of the quantisation step, not a separate standard like JPEG vs JPEG-LS.
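The reversible 5/3 lifting step from ① is small enough to show whole. A 1-D sketch with simplified boundary handling (mirroring the edge sample; the spec's symmetric extension behaves the same for this layout) — JPEG 2000 applies it separably to rows then columns, then recurses on the LL band:

```python
def fwd53(x):
    """One level of the 5/3 integer lifting wavelet: x -> (lowpass s, highpass d)."""
    n = len(x)                       # assume even length
    half = n // 2
    # predict step: each odd sample minus the average of its even neighbours
    d = [x[2*i + 1] - (x[2*i] + x[2*i + 2 if 2*i + 2 < n else n - 2]) // 2
         for i in range(half)]
    # update step: each even sample plus a quarter of the neighbouring details
    s = [x[2*i] + (d[i - 1 if i else 0] + d[i] + 2) // 4 for i in range(half)]
    return s, d

def inv53(s, d):
    """Exact inverse: undo the update step, then undo the predict step."""
    half = len(s)
    even = [s[i] - (d[i - 1 if i else 0] + d[i] + 2) // 4 for i in range(half)]
    odd = [d[i] + (even[i] + even[i + 1 if i + 1 < half else half - 1]) // 2
           for i in range(half)]
    x = [0] * (2 * half)
    x[0::2], x[1::2] = even, odd
    return x
```

Because every lifting step is an integer add that the inverse subtracts back, reconstruction is bit-exact — that is the "reversible" in the 5/3 wavelet, and why one algorithm covers both lossless and lossy.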

适用

USE FOR

  • DCI 数字影院(标准强制 — 你影院看的每一帧都是 .j2k)
  • 卫星 / 遥感 / 航拍超大图(按需取分辨率)
  • 医学影像 DICOM 的"高保真无损"选项
  • 文化遗产高分辨率扫描(IIIF 图像服务器)
  • DCI digital cinema (mandatory — every frame in your theatre is .j2k)
  • Satellite / remote-sensing / aerial gigapixel imagery (decode-on-demand)
  • DICOM medical imaging when "high-fidelity lossless" is required
  • Cultural-heritage high-resolution scans (IIIF image servers)

反适用

AVOID

  • Web — 除 Safari 外全军覆没
  • 移动端 / 任何低算力解码场景
  • 需要快速预览的桌面应用(解码慢)
  • The web — every browser except Safari refuses to ship it
  • Mobile / any low-compute decoding context
  • Desktop apps that need snappy thumbnails — decoding is slow
scope | browsers | tools | CLI
JPEG 2000 | Safari only · Chrome / Firefox / Edge: none | ✓✓ Photoshop · GIMP · Preview · ImageMagick | opj_compress -i in.png -o out.jp2 · kdu_compress (commercial)
父:parent: JPEG (intended successor that never inherited) 子:children: JPX (scientific variant) · survives in DCI / DICOM / IIIF

JPEG XR — 微软的最后一次努力

JPEG XR — Microsoft's Last Attempt

YEAR 2006 (HD Photo) · 2009 (ISO/IEC 29199) AUTHOR Microsoft EXT .jxr / .hdp / .wdp MIME image/jxr STD ISO/IEC 29199 LOSSY lossy + lossless DEPTH 8 / 16 / 32 bit · float HDR ALPHA ANIM none STATUS near-dead — Edge Legacy only · removed in Chromium Edge

微软第一个支持 HDR 32-bit float 的 web 格式,但 Chrome 没要它。

Microsoft's first 32-bit float HDR web format. Chrome said no.

2006 年微软看着 Web 上的图片格式仍然是 JPEG / GIF / PNG 三件套,觉得机会来了:推一个比 JPEG 强、带 alpha、支持 HDR 浮点、解码比 JPEG 2000 快的"下一代"web 图片格式。原名 HD Photo / Windows Media Photo,2009 年通过 ISO/IEC 29199 标准化为 JPEG XR(XR = eXtended Range)。技术上确实漂亮:16×16 大块 PCT 变换比 JPEG 的 8×8 DCT artifact 更不可见、原生支持 RGBE 和 scRGB 的 32-bit float HDR、无损与有损共用算法。微软在 Internet Explorer 9 / Edge Legacy 里直接内置了原生支持。但是——Chromium 拒绝实现,Mozilla 拒绝实现。理由很直白:"我们已经在押注 WebP / AVIF,不想为一个微软推的格式增加攻击面"。Edge 在 2018 年放弃自己的渲染引擎转 Chromium 后,JPEG XR 的最后一个原生支持者也消失了。讽刺的是,这套"微软推格式 - Chrome 拒绝实现 - 格式死亡"的剧本,后来被 Google 反过来用在了 WebP 推广上——你推什么我接什么,我推什么你最好接。

By 2006 Microsoft surveyed the web and saw JPEG / GIF / PNG still ruling the field. They saw a gap: ship a next-generation format that beats JPEG, adds alpha, supports HDR floats, and decodes faster than JPEG 2000. Originally HD Photo / Windows Media Photo, it was standardised in 2009 as JPEG XR ("eXtended Range") under ISO/IEC 29199. The technology was genuinely good: a 16×16 photo core transform (PCT) replaces JPEG's 8×8 DCT, with much less visible blocking; native support for RGBE and scRGB 32-bit float HDR; lossless and lossy sharing one algorithm. Microsoft baked native support into Internet Explorer 9 and Edge Legacy. But Chromium refused. Mozilla refused. The reasoning was blunt: "we're already betting on WebP / AVIF; we don't want extra attack surface for a Microsoft-pushed format." When Edge gave up its own rendering engine and switched to Chromium in 2018, the last browser with native JPEG XR support vanished. The painful irony: the "Microsoft pushes a format → Chrome refuses → format dies" playbook was later inverted by Google for WebP — what you push, I'll accept; what I push, you'd better accept.

BLOCK SIZE · LARGER = FEWER VISIBLE EDGES — left: one 16×16 PCT block (JPEG XR, larger blocks) vs right: four 8×8 DCT blocks (JPEG, 4 visible internal edges) · larger blocks → fewer block boundaries → fewer visible artefacts at low Q
图 9 · 块大小对比。同一片区域,JPEG XR 用一个 16×16 PCT 块覆盖(左,蓝边),JPEG 要拆成 4 个 8×8 DCT 块(右,红边——内部 4 条红线就是肉眼可见的 blocking artifact 来源)。块越大,边界越少,低质量(Q < 50)时画面就越平滑。这是 JPEG XR 视觉上比 JPEG 干净的核心原因。
Fig 9 · Block-size comparison. Across the same physical area, JPEG XR covers it with a single 16×16 PCT block (left, blue border), while JPEG must split it into four 8×8 DCT blocks (right, red borders — those four red lines inside are exactly where the visible blocking artefact lives). Larger blocks mean fewer boundaries, which means smoother images at low quality (Q < 50). That's the core reason JPEG XR looks visually cleaner than JPEG.

技术内核

Technical core

JPEG XR 的技术设计有三个亮点。① 整数 16×16 PCT(Photo Core Transform)——本质上是一个类 DCT 的整数变换,但块更大、内部还有一层 4×4 子变换做"重叠"(lapped transform),让块与块之间不再有硬边界。同等质量下,JPEG XR 的 blocking artifact 比 JPEG 弱得多,但解码复杂度只比 JPEG 高一点点(远低于 JPEG 2000 的 10×)。② 原生 HDR float 支持——这是 JPEG XR 最超前的部分。它直接编码 RGBE(共享指数 32-bit)和 scRGB 浮点,不需要色调映射就能存高动态范围内容。这比 HEIC / AVIF 推广 HDR 早了将近十年——但当时显示器和操作系统都没准备好,没人用得上。③ 共享熵编码思路——熵编码部分仍然用类 JPEG 的"块+扫描+游程+熵"路径,所以软件实现成本低,微软自己的参考实现一千多行 C 就够了。这跟 JPEG 2000 几万行的复杂度相比,工程上确实"够轻"——但终究敌不过浏览器厂商的政治意愿。

JPEG XR has three technical strengths. ① Integer 16×16 PCT (Photo Core Transform) — essentially a DCT-like integer transform with a larger block, plus an inner 4×4 sub-transform that does a lapped overlap, killing hard block edges between adjacent macro-blocks. At equal quality JPEG XR shows much weaker blocking than JPEG, while costing only marginally more to decode (nowhere near JPEG 2000's 10×). ② Native HDR float support — the most forward-looking piece. It encodes RGBE (shared-exponent 32-bit) and scRGB floating-point directly, storing high-dynamic-range content without tone-mapping. This predated HEIC's and AVIF's HDR push by nearly a decade — but in 2006 neither displays nor operating systems were ready, and nobody had a workflow for it. ③ Shared entropy-coding lineage — the entropy back end is still a JPEG-style "block + scan + run-length + entropy" pipeline, so implementations are small. Microsoft's own reference implementation is barely a thousand lines of C — far lighter than JPEG 2000's tens of thousands. Engineering cost wasn't the problem. Browser-vendor politics was.
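The RGBE representation from ② is a tiny trick worth seeing on its own. A sketch of the classic Radiance shared-exponent packing — one of the HDR pixel formats JPEG XR can carry; JPEG XR's own transform is not shown:

```python
import math

def float_to_rgbe(r, g, b):
    """Pack three non-negative floats into 8-bit mantissas plus one shared exponent byte."""
    m = max(r, g, b)
    if m < 1e-32:
        return (0, 0, 0, 0)
    e = math.frexp(m)[1]            # m = f * 2**e with 0.5 <= f < 1
    scale = 256.0 / 2.0 ** e        # the brightest channel maps into 128..255
    return (int(r * scale), int(g * scale), int(b * scale), e + 128)

def rgbe_to_float(r, g, b, e):
    if e == 0:
        return (0.0, 0.0, 0.0)
    f = 2.0 ** (e - 136)            # e - 128 undoes the bias, -8 undoes the mantissa scale
    return (r * f, g * f, b * f)
```

The brightest channel keeps ~8 bits of precision across a huge dynamic range; dim channels sharing a bright pixel's exponent lose precision — the classic RGBE trade-off.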

适用

USE FOR

  • (历史) Windows 7 Photo Viewer 默认支持的高质量缩略图
  • (历史) Office 2010+ 内置 HD Photo 编辑
  • 研究 / 兼容老 Windows 资源时
  • (historical) High-quality thumbnails in Windows 7 Photo Viewer
  • (historical) HD Photo editing built into Office 2010+
  • Research / interoperating with legacy Windows assets

反适用

AVOID

  • 2026 任何现代场景:HEIC / AVIF / JPEG XL 全面替代
  • Web — 没有任何主流浏览器原生支持
  • Any modern 2026 scenario — HEIC / AVIF / JPEG XL fully replace it
  • The web — no major browser ships native support
scope | browsers | tools | CLI
JPEG XR | none (Edge Legacy only · removed in Chromium Edge) | Photoshop (plugin) · Windows Photos (legacy) | JxrEncApp -i in.tif -o out.jxr · JxrDecApp
父:parent: JPEG 子:children: indirectly informed HEVC's intra coding · fully superseded by HEIC / AVIF

WebP — Google 把 VP8 帧内拿来做图

WebP — Google Carved an Image Format Out of a Video Frame

YEAR 2010 AUTHOR Google (acquired On2 Technologies, 2009) EXT .webp MIME image/webp STD Google proprietary → de-facto standard (RFC 6386 covers the underlying VP8) LOSSY VP8 intra (lossy) + VP8L (lossless) DEPTH 8 bpc ALPHA ✓ (8-bit, ALPH chunk) ANIM ✓ (ANIM + ANMF chunks) STATUS mainstream · Chrome 32+ / Firefox 65+ / Safari 14+ / Edge 18+

把 VP8 视频的一帧抠出来当图片,体积砍掉 30%。

Took one frame out of a VP8 video, shaved 30% off image size.

2010 年的 Google 看着 web 图片世界,觉得三件套(JPEG / PNG / GIF)中间还有一道明显的"裂缝":没有一种格式能同时满足"比 JPEG 小 30%、比 PNG 小 26%、还能动图 + alpha"。Google 当时刚刚在 2009 年用 1.246 亿美元收购了视频编码公司 On2 Technologies,手里握着一颗刚开源的 VP8 视频 codec——VP8 的 intra-frame(I 帧) 已经具备完整的图像帧内编码能力。Google 工程师的算盘很直接:与其重新发明轮子,不如直接把 VP8 的一帧拿出来,套一层 RIFF 容器,就是一种新的图片格式。WebP 由此诞生——它是历史上第一个"视频 codec 直接派生为图片格式"的工业级例子,后来 HEIC / AVIF 都走了完全相同的路线。

In 2010 Google looked at the web's image landscape and saw a clear gap in the JPEG / PNG / GIF triumvirate: nothing was simultaneously "30% smaller than JPEG, 26% smaller than PNG, and capable of both animation and alpha." Having just paid $124.6 million in 2009 to acquire the video-codec company On2 Technologies, Google now owned the VP8 video codec — and a VP8 intra-frame (I-frame) is already a complete still-image encoding pipeline. The Googlers did the obvious thing: pull out a single VP8 frame, wrap it in a RIFF container, ship it as a new image format. WebP was born — historically the first industrial-scale example of "video codec directly repurposed into still-image format". HEIC and AVIF later took the exact same playbook.

VP8 INTRA · 4 BASE PREDICTION MODES — H (horizontal: copy left column →) · V (vertical: copy top row ↓) · DC (average: mean of L + T) · TM (TrueMotion: L + T − TL) · each 4×4 block picks one mode, and only the residual is coded · VP8 also has 6 more directional modes (LD/RD/VR/VL/HD/HU) — 10 total for 4×4 blocks
图 10a · VP8 帧内 4 种基础预测。块拿左列、上行、或左上角已经解出来的像素去"猜"自己,然后只编"猜错的差值"。这是 VP8(以及后来 HEVC / AV1)能比 JPEG 小一截的核心原因——JPEG 完全没有空间预测这一步。
Fig 10a · VP8's four base intra-prediction modes. Each block uses the already-decoded pixels from the left column, top row, or top-left corner to "guess" itself, then only the prediction residual gets encoded. This is the core reason VP8 (and later HEVC / AV1) beats JPEG on size — JPEG has no spatial-prediction step at all.

技术内核

Technical core

WebP 内部其实是两个完全独立的格式,共用一个 .webp 后缀和一个 RIFF 外壳。① VP8 intra-frame(有损):4×4 / 16×16 块预测(共 10 + 4 种 intra mode)→ 类 DCT 整数变换 → 量化 → boolean arithmetic coding(算术编码)。预测让"猜得准的部分不用传",算术编码比 Huffman 多挤出 5-15% 体积——这是 WebP 比 JPEG 小 30% 的两大功臣。② VP8L(无损):跟 VP8 一点关系都没有,是 Google 自己写的一套独立无损算法——14 种 spatial predictor + color cache(用 hash table 缓存最近用过的颜色)+ LZ77 + Huffman。在自然图像上比 PNG 小 26%,但编码慢 5-10×。③ RIFF 容器:借用微软 Wave 音频用过的 RIFF 格式——文件头是 RIFF<size>WEBP,后面跟 chunk 序列:VP8X(全局信息)/ VP8(有损主帧)/ VP8L(无损主帧)/ ALPH(独立 alpha 通道)/ ANIM + ANMF(动图)/ ICCP(色彩配置)/ EXIF / XMP。④ 独立 alpha:lossy 主帧不带 alpha,alpha 走单独的 ALPH chunk,可以选择无损 lossless 或有损 lossy 编 alpha——这是 WebP 比 JPEG + PNG 拼凑方案精巧的地方。⑤ animated WebP:ANIM 设全局参数(背景色 / 循环次数), ANMF 每帧带 disposal / blend / xy offset,逻辑跟 GIF 完全同源,但每帧用 VP8 / VP8L 编。

WebP is, in fact, two unrelated formats sharing a .webp extension and a RIFF wrapper. ① VP8 intra-frame (lossy): 4×4 / 16×16 block prediction (10 + 4 intra modes) → DCT-like integer transform → quantise → boolean arithmetic coding. Prediction means "the easy-to-guess parts don't need to ship" and arithmetic coding squeezes out another 5–15 % over Huffman — together those are why WebP runs ~30 % smaller than JPEG. ② VP8L (lossless): unrelated to VP8 — a separate lossless codec Google wrote from scratch — 14 spatial predictors + a color cache (hash-tabling recently-used colours) + LZ77 + Huffman. ~26 % smaller than PNG on natural images but 5–10 × slower to encode. ③ RIFF container: borrowed from Microsoft's Wave audio — the file starts with RIFF<size>WEBP, then a sequence of chunks: VP8X (global info) / VP8 (lossy main frame) / VP8L (lossless main frame) / ALPH (separate alpha channel) / ANIM + ANMF (animation) / ICCP (color profile) / EXIF / XMP. ④ Separate alpha: lossy main frames don't carry alpha; alpha lives in a dedicated ALPH chunk that can itself be encoded losslessly or lossily — much cleaner than JPEG + PNG patchwork. ⑤ animated WebP: ANIM sets the globals (background colour, loop count), each ANMF frame carries disposal / blend / xy-offset just like GIF, but each frame is itself VP8 or VP8L.
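The four base intra modes are almost one line each. A sketch with a hypothetical helper (not libvpx code — real VP8 handles edge blocks and rounding slightly differently):

```python
import numpy as np

def predict_block(left, top, top_left, mode, n=4):
    """left: n already-decoded pixels on the left (top→bottom);
    top: n pixels above (left→right); top_left: the single corner pixel."""
    if mode == "H":                       # copy the left column rightwards
        return np.tile(left[:, None], (1, n))
    if mode == "V":                       # copy the top row downwards
        return np.tile(top[None, :], (n, 1))
    if mode == "DC":                      # flat block at the mean of all neighbours
        return np.full((n, n), (left.sum() + top.sum() + n) // (2 * n))
    if mode == "TM":                      # TrueMotion: L + T - TL, clamped to 0..255
        return np.clip(left[:, None] + top[None, :] - top_left, 0, 255)
    raise ValueError(mode)
```

The encoder tries each mode, keeps the one with the smallest residual, and only that residual goes through transform + quantise — the spatial-prediction step JPEG never had.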

VP8L · 14 PREDICTORS + COLOR CACHE — left: 14 spatial predictors (directional · average · Paeth-style · select …); right: color cache, a hash table of 2^(1..11) entries mapping recently-used colours (#FF1A2B → #2B6FAA → #4A8A3E → …) to short indices · predictors kill image redundancy, the color cache kills repeated colours · then LZ77 + Huffman
图 10b · VP8L 的两个核心武器。左:14 种空间 predictor(方向 / 平均 / Paeth 风格 / Select 等)负责"猜下一个像素的颜色";右:color cache 是一个 2^k 大小的 hash table,缓存最近用过的颜色,命中时只发一个短索引。两层杀完冗余,再交给 LZ77 + Huffman 做最终压缩。
Fig 10b · VP8L's two main weapons. Left: 14 spatial predictors (directional, average, Paeth-style, select-type, etc.) "guess" the next pixel. Right: a color cache — a 2^k-entry hash table of recently-seen colours; on a hit only a tiny index is emitted. Both layers strip redundancy before LZ77 + Huffman do the final pack.
WebP LOSSY ENCODE PIPELINE · VP8 INTRA-FRAME REUSED AS STILL IMAGE
RGB → YUV 4:2:0 (chroma subsample, −50%) → SPLIT (16×16 macroblocks → 4×4 sub-blocks, tile decision) → INTRA PRED (H · V · DC · TM + 6 directional = 10 modes) → TRANSFORM (DCT-like int 4×4 + Walsh-Hadamard on residual) → QUANTISE (★ Q 0–100, the only lossy step, default Q = 75) → BOOLEAN ARITH (5–15% smaller than Huffman, VP8 native) → RIFF WRAP (VP8X · VP8 · [ALPH] · [ICCP] · [EXIF] + optional ANIM / ANMF for animation) → .webp
alt: VP8L lossless path — RGBA → 14 predictors → color cache → LZ77 + Huffman → VP8L chunk
Knobs: -q 0–100 · -m 0–6 (compression effort) · -af (auto-filter) · -alpha_q 0–100 · -lossless / -near_lossless · -mt (multithread)

图 10 · WebP 全流程(lossy 主路径) · RGB → YUV 4:2:0 → 16×16/4×4 切块 → intra 预测(10 种)→ DCT-like 整数变换 → 量化(★ 唯一有损步骤,Q 0-100)→ boolean arithmetic 编码 → RIFF 包外壳(VP8X + VP8 + 可选 ALPH/ICCP/EXIF/ANIM)→ .webp。无损路径走另一条线:VP8L 的 14 predictor + color cache + LZ77 + Huffman。

Fig 10 · The full WebP pipeline (lossy main path) · RGB → YUV 4:2:0 → split into 16×16 / 4×4 blocks → intra prediction (10 modes) → DCT-like integer transform → quantise (★ the only lossy step, Q 0–100) → boolean arithmetic coding → RIFF wrap (VP8X + VP8 + optional ALPH / ICCP / EXIF / ANIM) → .webp. The lossless path goes elsewhere: VP8L's 14 predictors + color cache + LZ77 + Huffman.

RIFF · WEBP · CHUNK TREE — RIFF <size> WEBP → VP8X (global flags + canvas size) → VP8 (lossy) OR VP8L (lossless) → ALPH (independent alpha, lossy or lossless) → ANIM (loops / bg) + ANMF × N (frames) → ICCP · EXIF · XMP (optional)
图 10c · WebP 的 RIFF 容器结构。最外层是 RIFF<size>WEBP 12 字节文件头;再里头 VP8X 描述全局 flag + 画布尺寸;然后 VP8(lossy)和 VP8L(lossless)二选一;ALPH 单独装 alpha(可独立选有损或无损);ANIM + 多个 ANMF 用于动图;ICCP / EXIF / XMP 是可选 metadata。
Fig 10c · WebP's RIFF container layout. Outer 12-byte header is RIFF<size>WEBP; VP8X holds global flags + canvas size; then either VP8 (lossy) or VP8L (lossless); ALPH carries alpha independently (itself lossy or lossless); ANIM + multiple ANMF chunks make up animation; ICCP / EXIF / XMP are optional metadata.
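The container is simple enough to parse by hand. A minimal chunk walker over a synthetic buffer — illustrative, not a full WebP parser; it ignores the VP8X flag bits and just lists chunks:

```python
import struct

def walk_webp(buf):
    """List (fourcc, payload_size) for the chunks inside a RIFF/WEBP file."""
    assert buf[:4] == b"RIFF" and buf[8:12] == b"WEBP", "not a WebP file"
    chunks, pos = [], 12
    while pos + 8 <= len(buf):
        fourcc = buf[pos:pos + 4].decode("ascii")
        size, = struct.unpack("<I", buf[pos + 4:pos + 8])  # little-endian, RIFF style
        chunks.append((fourcc, size))
        pos += 8 + size + (size & 1)    # payloads are padded to even length
    return chunks

# synthetic two-chunk file: a VP8X header plus a (fake) lossy VP8 chunk
body = (b"WEBP"
        + b"VP8X" + struct.pack("<I", 10) + bytes(10)
        + b"VP8 " + struct.pack("<I", 3) + bytes(3) + b"\x00")  # 1 pad byte
data = b"RIFF" + struct.pack("<I", len(body)) + body
```

Note the RIFF inheritance showing through: little-endian sizes and even-byte padding, both straight from the 1991 Wave format.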
codec | encode time | decode time | typical Q | 1080p photo
JPEG (mozjpeg) | 1.0 × | 1.0 × | 85 | ≈ 250 KB
WebP (cwebp) | ≈ 3 × | ≈ 1.5 × | 75 | ≈ 165 KB
AVIF (avifenc) | ≈ 50 × | ≈ 3 × | 60 | ≈ 95 KB
SAME 1080p PHOTO · ≈ EQUAL VISUAL QUALITY — JPEG Q85: 250 KB · WebP Q75: 165 KB · AVIF Q60: 95 KB · WebP −34% vs JPEG · AVIF −62% vs JPEG · −42% vs WebP
图 10d · 同一张 1080p 照片在视觉接近的 Q 下三种格式的体积。WebP 砍掉 JPEG 三分之一,AVIF 在 WebP 基础上再砍掉四成。这正是"为什么 WebP 不是终点"——AVIF 后来居上的根本理由。
Fig 10d · The same 1080p photograph at visually-equivalent Q levels across three codecs. WebP cuts a third off JPEG; AVIF then takes another 40 % off WebP. This is exactly why "WebP isn't the finish line" — and the core reason AVIF eventually overtook it.
$ cwebp -q 75 in.png -o out.webp                        # lossy default · Q 75 ≈ JPEG Q85 quality
$ cwebp -lossless in.png -o out.webp                    # VP8L lossless path · 5–10× slower
$ cwebp -near_lossless 60 in.png -o out.webp            # lossy preprocessing then lossless encode
$ cwebp -q 80 -alpha_q 100 in.png -o out.webp           # keep alpha lossless even with lossy RGB
$ webpmux -frame f1.webp +100 -frame f2.webp +100 \
          -loop 0 -o anim.webp                          # build animated WebP from frames
$ dwebp out.webp -o decoded.png                         # decode back to PNG for inspection

适用

USE FOR

  • 2026 web 图片首选——所有现代浏览器都支持(Chrome 32+ / Firefox 65+ / Safari 14+ / Edge 18+)
  • 需要 alpha 的产品图、电商主图(替代 PNG-24)
  • 需要 animation 的 UGC、表情、loading(替代 GIF,体积只有 1/4)
  • CDN 自动转换 pipeline(Cloudinary、Fastly、Imgix 都支持)
  • The default web image format in 2026 — every modern browser ships it (Chrome 32+, Firefox 65+, Safari 14+, Edge 18+)
  • Product photos and e-commerce hero shots that need alpha (replaces PNG-24)
  • UGC stickers, reactions, loading anims (replaces GIF at ¼ the size)
  • CDN auto-conversion pipelines (Cloudinary, Fastly, Imgix all support it)

反适用

AVOID

  • iOS < 14 设备(iPhone 6 及更早机型,最高只能升到 iOS 12)
  • 邮件附件(很多邮件客户端、Outlook 老版本不渲染)
  • 设计交付 / 印刷输出(用 PNG / TIFF / PSD)
  • 需要更高压缩率的现代场景——直接用 AVIF / JXL
  • iOS < 14 devices (iPhone 6 and earlier, capped at iOS 12)
  • Email attachments — many clients and older Outlook versions still won't render WebP
  • Design hand-off / print output — use PNG / TIFF / PSD instead
  • Modern scenarios that need maximum compression — go straight to AVIF / JXL
scope | browsers | tools | CLI
WebP (lossy + lossless + alpha + anim) | Chrome 32+ · Firefox 65+ · Safari 14+ · Edge 18+ · Opera 19+ | Photoshop (24+ native) · Sketch · Figma · Squoosh · ImageMagick · GIMP · Affinity | cwebp / dwebp / webpmux / gif2webp (libwebp by Google)
父:parent: VP8 (Google / On2 video codec, 2008) 致敬:tribute: AVIF (same idea: video intra-frame → still image) 派生:derived: animated WebP · WebP2 (R&D, never shipped)

HEIC / HEIF — 苹果与专利墙

HEIC / HEIF — Apple and the Patent Wall

YEAR 2015 (HEIF, ISO/IEC 23008-12) · 2017 iOS 11 default capture AUTHOR MPEG (HEIF container) · Apple (HEIC instantiation) EXT .heic / .heif MIME image/heic · image/heif STD ISO/IEC 23008-12 (HEIF) + HEVC (payload, ISO/IEC 23008-2) LOSSY HEVC intra (lossy) · lossless mode exists but rare DEPTH 8 / 10 / 12 bit ALPHA ✓ (auxiliary item) ANIM ✓ + Live Photo hybrid (paired .heic + .mov files) STATUS iOS default / macOS native · almost no web support (Safari ✓ · Chrome / Firefox ✗)

技术上是 AVIF 的爸爸,专利上是 AVIF 的反例。

Technically the parent of AVIF; legally the cautionary tale.

2015 年 MPEG 把 HEVC(H.265 视频)的帧内编码能力封装成一个图像容器规范,叫 HEIF(High Efficiency Image File Format),标准号 ISO/IEC 23008-12。思路与 WebP 完全同源:用现代视频 codec 的 intra-frame 做静态图像编码,用 ISOBMFF(MP4 同根的容器)装。HEIF 是个"容器规范",真正的像素 codec 由 payload 决定——用 HEVC 装就叫 HEIC(.heic),用 AVC/H.264 装就叫 HEIF AVCI;Apple 选了前者。2017 年 9 月 iOS 11 把相机默认存储格式从 JPEG 改成 HEIC——一夜之间,全球数亿台 iPhone 开始产生 HEIC 文件。比 JPEG 体积小一半、支持 10-bit HDR、支持 alpha、支持多对象嵌套——技术上没毛病,问题全在专利。

In 2015 MPEG wrapped HEVC's (H.265 video) intra-frame coding into an image-container spec called HEIF — High Efficiency Image File Format, ISO/IEC 23008-12. Same thinking as WebP: take a modern video codec's intra-frame, use it as a still-image codec, package it in ISOBMFF (the same container family as MP4). HEIF itself is just a container spec; the actual pixel codec depends on the payload — HEVC-payloaded HEIF is HEIC (.heic), AVC/H.264-payloaded HEIF is HEIF AVCI. Apple picked HEVC. In September 2017, iOS 11 switched the camera's default capture format from JPEG to HEIC — overnight, hundreds of millions of iPhones started producing HEIC files. Half the size of JPEG, 10-bit HDR support, alpha, nested multi-image objects — technically flawless. All the problems are in the patents.

HEIF · ISOBMFF BOX TREE — ftyp (file type · brand 'heic' / 'mif1' / 'msf1') → meta (metadata container: hdlr 'pict' · pitm primary item id · iinf item info table · iloc item byte offsets · iprp / ipco / ipma item properties — HEVC config · colour · dimensions) → mdat (raw HEVC bitstream(s) — image items, thumbnails, derived items) · iloc maps each item-id to its byte offset inside mdat · alpha is a separate item
图 11 · HEIF 的 ISOBMFF box 树。ftyp 标 brand(heic = HEVC payload);meta 是元数据容器,里头 hdlr 标"图像句柄"、pitm 指定主图 item id、iinf 列所有 item、iloc 给 byte 偏移、iprp 装属性(HEVC config / color / 尺寸);mdat 装真正的 HEVC bitstream——主图、缩略图、派生项、alpha 都是独立 item,通过 iloc 查表找位置。
Fig 11 · HEIF's ISOBMFF box tree. ftyp declares the brand (heic = HEVC payload). meta is the metadata container — hdlr tags it as an image handler, pitm names the primary-item id, iinf lists every item, iloc gives their byte offsets, iprp carries item properties (HEVC config, colour, dimensions). mdat holds the actual HEVC bitstreams — main image, thumbnails, derived items, alpha all live as independent items, each addressed via iloc.

技术内核

Technical core

HEIF / HEIC 的技术构造分四层。① HEIF 容器 = ISOBMFF box 系——跟 MP4 / MOV / 3GP 同根的"box-in-box"二进制格式,每个 box 4-byte size + 4-byte FourCC type + payload。这套格式过去 20 年被全球视频行业打磨得极其成熟,标准库一抓一大把,Apple 自然顺手。② HEVC intra-frame payload——CTU(Coding Tree Unit)最大可达 64×64,远大于 JPEG 的 8×8 / WebP 的 16×16,同样质量下 macroblock artifact 几乎肉眼不可见;intra prediction 有 35 种方向(DC + Planar + 33 angular),比 VP8 的 10 种细得多;后处理还有 SAO(Sample Adaptive Offset)和 deblocking filter,把块边界进一步抹平。这是 HEIC 能比 JPEG 小 50% 的核心。③ 多对象 / 派生项 / 网格——HEIF 不止能存"一张图",它能存"主图 + 缩略图 + 多视角图 + 派生编辑(裁剪 / 旋转 / 网格拼接)",每个对象一个 item,iloc 表查偏移。Apple 利用这个特性做"突发拍照"(把一个 burst session 的 10 张图打包成 1 个 .heic)。④ Live Photo 混合容器——iPhone 的 Live Photo 不是单文件,它是 1 张 .heic 静图(主关键帧)+ 1 段 .mov 视频(前后 1.5 秒 + 音频)的组合,iCloud 同步时把它们绑在一起作为"一个资产"管理——这是 HEIF 最被低估的工程贡献。

HEIF / HEIC has four technical layers. ① HEIF container = ISOBMFF box family — the same "box-in-box" binary format as MP4 / MOV / 3GP, every box is 4-byte size + 4-byte FourCC type + payload. Twenty years of video-industry tooling makes the spec battle-tested and trivial for Apple to adopt. ② HEVC intra-frame payload — the Coding Tree Unit can reach 64×64, much larger than JPEG's 8×8 or WebP's 16×16, so macroblock artefacts are practically invisible at the same quality; intra prediction has 35 directions (DC + Planar + 33 angular) versus VP8's 10; post-processing adds SAO (Sample Adaptive Offset) and a deblocking filter that further smooth block boundaries. That's the core reason HEIC weighs ~50 % less than JPEG. ③ Multi-item, derived items, grids — HEIF doesn't store "one image"; it stores "main image + thumbnails + multi-view images + derived edits (crop / rotate / grid-tile composition)". Each object is its own item, addressed via the iloc table. Apple uses this to pack a burst-photo session of ten images into a single .heic file. ④ Live Photo as a hybrid container — iPhone's Live Photo isn't a single file; it's a .heic still (the keyframe) + a .mov video (1.5 s before + 1.5 s after, with audio). iCloud syncs them as a bound pair, treating the combo as a single asset — HEIF's most underappreciated engineering contribution.
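The box grammar described above (4-byte big-endian size, 4-byte FourCC type, payload, boxes nested in boxes) is simple enough to sketch in a few lines. Below is a minimal walker over a synthetic two-box stream; the box names come from the figure, while the helper `walk_boxes` is a hypothetical name. Real HEIF parsing additionally needs FullBox version/flags handling (e.g. inside meta) and brand-specific semantics.

```python
import struct

def walk_boxes(buf, offset=0, end=None):
    """Yield (type, payload_offset, payload_size) for each top-level ISOBMFF box."""
    end = len(buf) if end is None else end
    while offset + 8 <= end:
        size, fourcc = struct.unpack_from(">I4s", buf, offset)
        if size == 1:  # a 64-bit largesize follows the type field
            size = struct.unpack_from(">Q", buf, offset + 8)[0]
            header = 16
        else:          # (size == 0, "extends to end of file", is ignored in this sketch)
            header = 8
        yield fourcc.decode("ascii"), offset + header, size - header
        offset += size

# Synthetic two-box file: an 'ftyp' declaring brand 'heic', then an empty 'mdat'.
ftyp_payload = b"heic" + b"\x00\x00\x00\x00" + b"mif1heic"  # major brand, minor ver, compat brands
ftyp = struct.pack(">I4s", 8 + len(ftyp_payload), b"ftyp") + ftyp_payload
mdat = struct.pack(">I4s", 8, b"mdat")
boxes = list(walk_boxes(ftyp + mdat))
print(boxes)  # [('ftyp', 8, 16), ('mdat', 32, 0)]
```

The "size before type" layout is why a loader can skip any box it doesn't understand — the same property KTX2 later gets from its explicit level index.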

适用

USE FOR

  • iPhone / iPad 拍照默认存储(2017 iOS 11+ 至今)
  • iCloud 相册同步 / Apple Photos 编辑链
  • 10-bit HDR 静态照片(P3 色域 + Dolby Vision Stills)
  • Live Photo 双文件混合资产
  • Apple 生态闭环内的高效存储与传输
  • iPhone / iPad default photo storage (2017 iOS 11+ onward)
  • iCloud Photos sync · Apple Photos edit chain
  • 10-bit HDR stills (P3 gamut + Dolby Vision Stills)
  • Live Photo's two-file hybrid asset
  • High-efficiency storage and transfer inside Apple's walled garden

反适用

AVOID

  • 任何需要 Web 通用兼容的场景:Chrome / Firefox 至今不支持原生 HEIC
  • 跨平台分享 / 邮件附件:Windows、Android 默认看不了
  • 商业项目的 Web 主图——用 WebP / AVIF
  • 开源 / 自由软件管线——HEVC 专利费让大多数 FOSS 项目不愿意 ship 解码器
  • Anything that needs broad web compatibility — Chrome and Firefox still won't ship native HEIC
  • Cross-platform sharing / email attachments — Windows and Android won't render it by default
  • Web hero images for commercial projects — use WebP / AVIF instead
  • Open-source / libre pipelines — HEVC patent fees keep most FOSS projects from shipping a decoder
scope | browsers | tools | CLI
HEIC / HEIF | Safari 17+ (macOS 14+ / iOS 17+) · Chrome ✗ · Firefox ✗ | macOS Preview · Apple Photos · Windows 10+ (HEIF Image Extension, paid) · Photoshop 2023+ | heif-enc -q 60 in.png -o out.heic · heif-dec out.heic out.png (libheif)
parents: HEVC (H.265 intra) + ISOBMFF (MP4 container family)
children: AVIF (same container lineage, AV1 payload swapped in to escape patents)
derived: Live Photo (.heic + .mov hybrid asset)

AVIF — AV1 的副产品成了王

AVIF — A Video Codec's Side-Effect Became King

YEAR 2019 (AOMedia AV1 1.0 + AVIF spec 1.0.0)
AUTHOR Alliance for Open Media (Google · Mozilla · Cisco · Apple · Netflix · Microsoft · Intel · Amazon · Nvidia · Samsung, among others)
EXT .avif
MIME image/avif
STD AOMedia AVIF 1.0.0 + AV1 (AV1 Bitstream & Decoding Process)
LOSSY AV1 intra (lossy) · lossless mode also supported
DEPTH 8 / 10 / 12 bit
ALPHA ✓
ANIM ✓ (AVIF Sequence)
STATUS the modern web's primary recommendation — Chrome 85+ (2020-08) · Firefox 93+ (2021-10) · Safari 16.4+ (2023-03)

为视频生的 codec,顺手把图片格式革命了一遍。

A video codec by birth — and it casually rewrote image formats.

2018 年 3 月 AOMedia 发布 AV1 视频编码 1.0,目标是做"完全免专利费的 HEVC 替代品"——背后是 Google / Mozilla / Cisco / Apple / Netflix / Microsoft / Intel / Amazon / Nvidia / Samsung 三十多家公司组成的联盟,带着各自的专利池交叉许可。AV1 走的是同一条"视频帧内 → 静态图片"路径(WebP / HEIC 都是这条路),把 intra-frame 编码能力套进 HEIF 容器(ISOBMFF),就拿到了 AVIF (AV1 Image File Format)——体积比 HEIC 略小、专利免费、跨厂商共识、Chrome 与 Firefox 与 Safari 三大引擎都点头。AVIF 2019 年 2 月发布标准,Chrome 85 (2020 年 8 月) 落地,Firefox 93 (2021 年 10 月) 跟进,Safari 16.4 (2023 年 3 月) 收尾——HEIC 阵营在 Web 上正式退场。

In March 2018 AOMedia shipped AV1 1.0 — the goal was a "completely royalty-free HEVC alternative". The alliance behind it is 30+ companies (Google, Mozilla, Cisco, Apple, Netflix, Microsoft, Intel, Amazon, Nvidia, Samsung…) cross-licensing their patent pools to make it stick. AV1 took the same "video intra-frame → still image" route as WebP and HEIC, wrapped its intra-frame encoder in HEIF (ISOBMFF), and out came AVIF (AV1 Image File Format) — smaller than HEIC, patent-free, cross-vendor, with all three big browser engines on board. The spec landed in February 2019, Chrome 85 shipped it in August 2020, Firefox 93 in October 2021, Safari 16.4 in March 2023. On the open web HEIC was officially out.

[Figure 12.a diagram · AV1 superblock 128×128 split tree: depths 128 → 64 → 32 → 16 → 8 → 4, plus 2:1 / 1:2 / 4:1 rect partitions; flat regions keep 64 / 128.]
图 12.a · AV1 superblock 是 128×128,可以递归切到最小 4×4——比 H.264 的 16×16 / HEVC 的 64×64 都更灵活,平坦区可整块保留(省 bit)、纹理区可切到 4×4(省失真),还能用 2:1 / 1:2 / 4:1 矩形切分。
Fig 12.a · An AV1 superblock is 128×128 and can recursively split down to 4×4 — more flexible than H.264's 16×16 or HEVC's 64×64. Flat regions stay at 64 / 128 (saving bits); textured regions split to 4×4 (saving distortion); rectangular partitions (2:1, 1:2, 4:1) are also available.
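The split logic above can be caricatured with a toy quadtree: keep a block whole when it is flat, split when it is busy. This is only the intuition; a real AV1 encoder runs a rate-distortion search and also tries the rectangular partitions. The variance threshold and helper names here are invented for illustration.

```python
import random
import statistics

def split_tree(block, min_size=4, var_thresh=200.0):
    """Toy quadtree: return the leaf size for flat blocks, or a list of
    four recursively split quadrants for textured ones."""
    n = len(block)  # block is a square list-of-lists of pixel values
    values = [v for row in block for v in row]
    if n <= min_size or statistics.pvariance(values) < var_thresh:
        return n  # leaf: encode this block whole
    h = n // 2
    quads = [[row[:h] for row in block[:h]], [row[h:] for row in block[:h]],
             [row[:h] for row in block[h:]], [row[h:] for row in block[h:]]]
    return [split_tree(q, min_size, var_thresh) for q in quads]

flat = [[100.0] * 128 for _ in range(128)]                       # flat region
rng = random.Random(0)
noisy = [[rng.uniform(0, 255) for _ in range(128)] for _ in range(128)]  # texture
print(split_tree(flat))  # 128 — one whole superblock, almost no bits
```

The flat superblock stays a single 128×128 leaf, while the noisy one recurses toward 4×4 — exactly the "save bits where flat, save distortion where textured" trade the figure describes.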
[Figure 12.b diagram · AV1 intra-prediction compass: 56 angular directions (fine fan 3° steps, coarse fan 9° steps) + 4 special modes (DC · Planar · Smooth · Paeth, PNG-style); vs HEVC 35 dirs, VP8 10 dirs.]
图 12.b · AV1 的 intra prediction 罗盘 — 56 个角度方向(粗扇 9° 步、细扇 3° 步)+ 4 种特殊模式(DC / Planar / Smooth / Paeth);对比 HEVC 35、VP8 10。方向越细,纹理预测越准,残差越小。
Fig 12.b · AV1's intra-prediction compass — 56 angular directions (coarse 9° steps, fine 3° steps) plus 4 special modes (DC / Planar / Smooth / Paeth). HEVC has 35; VP8 has 10. Finer angles mean better texture prediction and smaller residuals.
[Figure 12.c diagram · CfL, Chroma from Luma: Cb / Cr predicted from the already-encoded luma as C = α·Y + β; α signalled, β = mean(C).]
图 12.c · CfL (Chroma from Luma) — 同一块的色度直接从已编码的亮度推导,公式 C = α·Y + β,只需要 signal 一个 α(每块 4 bit 左右);β 是块内均值。色度残差因此大大缩小——这是 AV1 在低 bitrate 下色彩还能保真的关键之一。
Fig 12.c · CfL (Chroma from Luma) — chroma is derived from the already-encoded luma via C = α·Y + β. Only α needs to be signalled (≈4 bits per block); β is the chroma mean. Chroma residuals shrink dramatically — a major reason AV1 keeps colour fidelity at low bitrates.
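The CfL formula in the caption is just a per-block linear fit, which a few lines make concrete. One assumption to flag: α here is an exact least-squares fit over mean-removed luma, whereas a real encoder signals a quantised α (a few bits) and takes β from the chroma DC prediction; `cfl_fit` is an invented helper name.

```python
import statistics

def cfl_fit(luma, chroma):
    """Toy CfL for one block: C ≈ α·Y_ac + β, with β = mean chroma and
    α fitted by least squares over the mean-removed (AC) luma."""
    beta = statistics.fmean(chroma)
    y_mean = statistics.fmean(luma)
    y_ac = [y - y_mean for y in luma]
    alpha = (sum(ya * (c - beta) for ya, c in zip(y_ac, chroma))
             / sum(ya * ya for ya in y_ac))
    pred = [alpha * ya + beta for ya in y_ac]
    residual = [c - p for c, p in zip(chroma, pred)]
    return alpha, beta, residual

# Chroma that tracks luma linearly: the residual collapses to zero,
# so only α (and the usual DC) needs to reach the bitstream.
alpha, beta, residual = cfl_fit([10, 20, 30, 40], [25, 45, 65, 85])
print(alpha, beta, max(abs(r) for r in residual))  # 2.0 55.0 0.0
```

Natural images have strongly luma-correlated chroma inside small blocks, which is why this one-multiplier model removes most of the chroma signal.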
[Figure 12.d data · same photo, size & PSNR: JPEG Q85 250 KB / 32 dB · WebP 165 KB / 33 dB · HEIC 125 KB / 35 dB · AVIF 95 KB / 35 dB — indicative numbers, 1080p natural photo, default encoder profiles.]
图 12.d · 同一张 1080p 自然照片在四种 codec 下的体积与 PSNR 参考值。AVIF 用约 38% 的 JPEG 体积达到比 JPEG 高 3 dB 的客观质量;AVIF 的 95 KB 已经接近 HEIC,但有完整的 web 兼容。
Fig 12.d · The same 1080p natural photo under four codecs (indicative numbers). AVIF reaches ≈3 dB higher PSNR at ≈38 % of JPEG's bytes; the 95 KB figure matches HEIC while keeping full web compatibility.

技术内核

Technical core

AVIF 的技术深度全在 AV1 这一侧——容器只是 HEIF 的复用。① AV1 intra prediction:56 种角度方向(粗扇 9° 步 + 细扇 3° 步)+ 4 种特殊模式(DC / Planar / Smooth / Paeth)+ CfL(Chroma from Luma,色度从亮度推导)+ Palette mode(每块独立小调色板,适合 UI 截图)+ Intra Block Copy(块内自指,跟视频的"运动补偿"对偶)——比 HEVC 的 35 方向、VP8 的 10 方向都细很多。② Superblock 128×128 + 多种切分:递归切到最小 4×4,还允许 2:1 / 1:2 / 4:1 矩形,平坦区整块保留、纹理区切细。③ 变换块 16 种组合:DCT-2 / ADST(非对称离散正弦)/ WHT(Walsh-Hadamard)/ IDTX(恒等)四种变换在 H/V 两个方向独立选择,共 4×4 = 16 种组合——纹理方向不同,选不同变换效率最优。④ HEIF 容器:跟 HEIC 完全同根的 ISOBMFF box 树(ftyp 'avif' · meta · iloc · iprp · mdat),thumbnail / alpha / depth map 都是独立 item。⑤ 专利策略是它最大的非技术杀招:AOMedia 的核心承诺是"会员单位互相 royalty-free 交叉许可,且对所有人 patent non-assert"——Google / Cisco 把已有专利池贡献进来,把"做一个免费 codec"从技术问题变成了行业政治问题,并赢了。

AVIF's technical depth lives on the AV1 side; the container is just HEIF reused. ① AV1 intra prediction: 56 angular directions (coarse 9° + fine 3° steps) + 4 special modes (DC / Planar / Smooth / Paeth) + CfL (Chroma-from-Luma) + Palette mode (per-block tiny palette, great for UI screenshots) + Intra Block Copy (intra-frame self-reference, the still-image dual of motion compensation). HEVC has 35 directions; VP8 had 10. ② Superblocks at 128×128 recursively split down to 4×4, with 2:1 / 1:2 / 4:1 rectangular partitions — flat regions stay whole, textured regions split. ③ Sixteen transform-block combinations: DCT-2 / ADST (asymmetric discrete sine) / WHT (Walsh-Hadamard) / IDTX (identity) chosen independently for H and V — 4×4 = 16 combos, different texture orientations get different transforms. ④ HEIF container: the same ISOBMFF box tree as HEIC (ftyp 'avif', meta · iloc · iprp, mdat); thumbnails, alpha and depth maps live as independent items. ⑤ Patent strategy is the real masterstroke: AOMedia's binding promise is "members cross-license royalty-free; every patent the alliance touches is non-asserted against the world". Google and Cisco committed their pools, and the question of "can a free codec exist?" turned from a technical one into an industry-politics one — which they won.

[Figure 12 diagram · AVIF encode pipeline, 8 stages: YUV 4:2:0 10/12-bit (+ alpha item) → superblock 128×128 split (64/32/16/8/4 + rect partitions) → intra prediction (56 dir + DC/Planar + CfL · Palette · IBC) → transform (DCT-2 / ADST / WHT / IDTX, 16 combos) → quantise (cq-level 0–63, ★ the lossy step) → CDF adaptive arithmetic entropy coding (Daala-style) → HEIF wrap (ftyp 'avif' · meta · iloc · iprp · mdat) → .avif box tree + AV1 OBUs.]
Knobs: cq-level (0 = best, 63 = worst) · speed (0–10, lower = more search) · subsample (4:4:4 / 4:2:2 / 4:2:0) · bit depth (8/10/12) · alpha (separate item). OBU = Open Bitstream Unit, AV1's basic packet (sequence header / frame header / tile group / metadata). Encode is ~50× slower than JPEG; decode is fast (dav1d ships in browsers and beats the reference decoder by 2–3×).

图 12 · AVIF 全流程 · YUV4:2:0 → 128×128 superblock 多级切分 → 56 方向 intra 预测(+ CfL/Palette/IBC) → 16 种变换组合 → 量化(★ 唯一有损步骤,cq-level 控制狠度) → CDF 自适应算术编码 → HEIF box 包外壳 → .avif。cq-levelspeed、subsample 比、bit depth 是编码器主要旋钮。

Fig 12 · The full AVIF pipeline · YUV 4:2:0 → 128×128 superblock recursive split → intra prediction with 56 angular modes (+ CfL / Palette / IBC) → one of 16 transform combinations → quantise (★ the only lossy step, governed by cq-level) → CDF adaptive arithmetic coding → HEIF box wrapper → .avif. The main knobs are cq-level, encoder speed, chroma subsample and bit depth.

codec | year | patent | 1080p photo @ JPEG-Q85 quality | encode time | browser support
JPEG | 1992 | free (post-2007) | ≈ 250 KB | 1 × | ✓✓✓ universal
WebP | 2010 | free | ≈ 165 KB | ≈ 3 × | ✓✓✓ since 2020
HEIC | 2015 | $$$ (3 pools) | ≈ 125 KB | ≈ 20 × | only Safari
AVIF | 2019 | free (AOMedia) | ≈ 95 KB | ≈ 50 × | ✓✓✓ all modern
JXL | 2021 | free | ≈ 85 KB | ≈ 5 × | partial (Safari · Firefox flag)
$ avifenc --min 0 --max 63 -a end-usage=q -a cq-level=23 in.png out.avif   # typical Q23 — visually near-lossless
$ avifenc -j 8 -s 6 in.png out.avif                # speed 0–10 (lower=better/slower); -j threads
$ avifenc -d 10 --yuv 444 in.png out.avif          # 10-bit + 4:4:4 chroma — for HDR / design assets
$ avifdec out.avif decoded.png                     # reference decode via libavif
$ cavif --quality 80 in.png -o out.avif            # Rust CLI built on rav1e; faster preset

适用

USE FOR

  • 现代 Web 首屏主图 / Hero 图(预编码后 CDN 分发)
  • 体积敏感 + 质量要求高的内容图(电商、媒体、博客)
  • 透明 PNG 替代 — 体积可省 80–95%,肉眼几乎无损
  • 10-bit HDR 图像分发(P3 / Rec.2020 色域)
  • 响应式 <picture> 中作为优先 source(配 WebP / JPEG fallback)
  • Modern web hero / above-the-fold images (pre-encoded, CDN-served)
  • Bandwidth-sensitive content images — e-commerce, media, blogs
  • Transparent-PNG replacement — 80–95 % smaller, visually identical
  • 10-bit HDR image delivery (P3 / Rec.2020 gamut)
  • Top source in <picture> with WebP / JPEG fallback

反适用

AVOID

  • 需要 IE / 老 Android(< 5.0) / 老 Safari(< 16) 兼容的场景
  • 编码时间敏感:CI 实时构建 / 服务器实时转码 / 浏览器端用户上传
  • 用户头像 / 缩略图等"用一次就丢"的小图(编码成本不划算)
  • 需要无损归档的工程影像(改用 PNG / EXR / TIFF)
  • Anything that must run on IE, old Android (< 5.0), or old Safari (< 16)
  • Encode-time-sensitive paths: CI builds, on-the-fly server transcoding, browser-side user uploads
  • Avatars / throwaway thumbnails — encode cost outweighs the savings
  • Lossless engineering archives — use PNG / EXR / TIFF instead
scope | browsers | tools | CLI
AVIF · AVIF Sequence (anim) | ✓✓✓ Chrome 85+ (2020-08) · Firefox 93+ (2021-10) · Safari 16.4+ (2023-03) · Edge 121+ | ✓✓ Photoshop 24.2+ · Figma (export only) · GIMP 2.10+ · Squoosh · Cloudflare Images · imgix | avifenc (libavif) · cavif (rav1e) · sharp (Node) · ffmpeg -c:v libaom-av1
parents: AV1 (video codec, AOMedia 2018) + HEIF / ISOBMFF (container, MPEG 2015)
tribute to: HEIC (same container lineage, AV1 swapped in to dodge HEVC patents)
derived: AVIF Sequence (animation extension)
rival: JPEG XL (technically slightly better, politically blocked from Chrome)

JPEG XL — 被 Chrome 砍掉的"完美"格式

JPEG XL — The "Perfect" Format Chrome Killed

YEAR 2021 (ISO/IEC 18181) · proposed from 2017
AUTHOR Cloudinary + Google Research · lead authors Jon Sneyers · Jyrki Alakuijala · Luca Versari · Zoltan Szabadka
EXT .jxl
MIME image/jxl
STD ISO/IEC 18181 (Part 1: codec · Part 2: file format · Part 3: conformance · Part 4: reference software)
LOSSY VarDCT mode · lossless: Modular mode · reversible JPEG transcode
DEPTH 8 / 10 / 12 / 16 / 32 bit + 16/32-bit float (HDR / wide gamut)
ALPHA ✓ (full bit depth)
ANIM ✓
STATUS Apple Safari 17+ (2023-09) · Firefox behind a flag · removed from Chrome (2022-10) · fully native inside macOS / iOS

技术上吊打所有人,被 Chrome 团队以"兴趣不足"砍掉。

Technically beats everyone. Chrome killed it citing "insufficient interest".

2017 年 AOMedia 已经在猛推 AVIF,但有一群人不满足:HDR 摄影师、印刷出版业、漫画 / 插画家、需要无损归档的博物馆、还有手里握着几十亿张 JPEG 资产没法迁移的所有人——AVIF 解决不了他们的问题。Cloudinary 与 Google Research 把两个独立项目(Cloudinary 的 FUIF + Google 的 PIK)合并,推出 JPEG XL,目标是做"一个能同时干完所有事的下一代格式":(a) 把现存 JPEG 文件 无损 transcode 成 JXL,体积省 ~20%,任何时候可逆向恢复原 byte-exact JPEG;(b) 现代 VarDCT lossy 编码,质量比 AVIF 略好;(c) Modular 模式做无损,比 PNG / WebP-LL 都小;(d) 真正的渐进式解码——第一段 ~1/64 数据就能显示完整的"像素化粗略图",随后几段越来越清晰;(e) 8–32 bit + float、HDR、宽色域、CMYK、高位深 alpha 全套原生。技术上几乎是"现代格式应该有的样子"的完整集成,2021 年 2 月以 ISO/IEC 18181 标准化通过——但落地之路比技术艰难得多。

By 2017 AOMedia was already pushing AVIF hard, but a constituency wasn't satisfied: HDR photographers, the print and publishing industry, comic/manga artists, archival museums, and anyone holding billions of legacy JPEGs they couldn't migrate — AVIF solved none of their problems. Cloudinary and Google Research merged two independent projects (Cloudinary's FUIF and Google's PIK) into JPEG XL with the explicit ambition of "doing all of it at once": (a) losslessly transcode existing JPEGs into JXL, ~20 % smaller, reversible to byte-exact original JPEG; (b) modern VarDCT lossy with quality slightly above AVIF; (c) Modular mode for lossless, smaller than both PNG and WebP-LL; (d) real progressive decoding — the first ≈1/64 of the bitstream already displays a complete coarse image, with subsequent segments adding detail; (e) native 8–32 bit + float, HDR, wide gamut, CMYK and high-bit-depth alpha. Technically it's the complete integration of "what a modern format should look like". ISO/IEC 18181 was published in February 2021. The path to adoption proved much harder than the engineering.

[Figure 13.a diagram · JXL dual mode: VarDCT (lossy, block sizes 2×2 – 256×256) and Modular (lossless / near-lossless, R·G·B·A channel chain; predictors: WP · Gradient · Self-correcting).]
图 13.a · JXL 同时支持两条编码路径:VarDCT(可变块大小 DCT,2×2 到 256×256,lossy)和 Modular(meta-adaptive 预测器 + 通道链变换,lossless 或 near-lossless)。两条路在同一个 .jxl 容器里可以混用——一张图的不同区域可以用不同模式。
Fig 13.a · JXL ships two coding paths in one container: VarDCT (variable-size DCT blocks from 2×2 to 256×256, lossy) and Modular (meta-adaptive predictors with per-channel transform chains, lossless or near-lossless). They can be mixed within a single .jxl — different regions of an image can take different modes.
[Figure 13.b diagram · JPEG → JXL lossless repack: .jpg 100 KB → parse Huffman, keep all DCT integers, no requantise → .jxl ≈ 80 KB; djxl recovers the byte-exact .jpg.]
图 13.b · JPEG → JXL 无损 transcode:解码 JPEG 拿到原始 DCT 系数(不去量化、不再变换),用 JXL 更强的熵编码重新打包,体积省约 20%;djxl 反向恢复时,bit-by-bit 还原原始 .jpg。这是其它现代 codec 都做不到的事。
Fig 13.b · JPEG → JXL lossless transcode: decode the JPEG to obtain its raw DCT coefficients (no requantising, no re-transform), then re-encode with JXL's stronger entropy coder — about 20 % smaller. Run djxl to recover the original .jpg byte for byte. No other modern codec offers this.
[Figure 13.c diagram · JXL true progressive: ~1/64 of bytes ≈ 1.5 KB (already useful) → ~1/16 ≈ 6 KB → ~1/4 ≈ 24 KB → full 100 % ≈ 100 KB.]
图 13.c · JXL 的渐进式是 真分辨率渐进(spatial / DC-first)——只下载前 ~1/64 的字节(约 1.5 KB)就能渲染一张完整的低分辨率图;再下载到 1/16、1/4,逐步精化。这跟 progressive JPEG"按 DCT 频率扫描"完全不同——JPEG 的渐进结果在中途看起来是糊的,JXL 是清晰的小图。
Fig 13.c · JXL's progressive mode is true spatial / DC-first: load only the first ~1/64 of bytes (~1.5 KB) and you can render a complete, low-resolution image; load to 1/16 and 1/4 to refine. Unlike progressive JPEG (which streams DCT frequencies and looks blurry mid-load), JXL renders crisp small images at every stage.
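The byte budgets in the caption follow from a back-of-envelope rule: a 1/s-scale preview holds 1/s² of the pixels, hence roughly 1/s² of the bytes. A sketch of that arithmetic (real JXL passes don't divide the stream this exactly; `progressive_budget` is an invented helper):

```python
def progressive_budget(total_bytes, scales=(8, 4, 2, 1)):
    """Map preview scale 1/s to its rough byte budget total/s².
    scale 1/8 → 1/64 of bytes, 1/4 → 1/16, 1/2 → 1/4, 1 → full."""
    return [(f"1/{s}" if s > 1 else "full", total_bytes // (s * s))
            for s in scales]

print(progressive_budget(100_000))
# [('1/8', 1562), ('1/4', 6250), ('1/2', 25000), ('full', 100000)]
```

For a 100 KB file this reproduces the caption's stages: ≈1.5 KB already renders a complete (if small) picture, which is the practical difference from frequency-ordered progressive JPEG.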
[Figure 13.d diagram · rate-distortion, JXL vs AVIF: bpp 0.1–2.0 on x, PSNR 28–42 dB on y; JXL ≈ +1 dB at the same bpp.]
图 13.d · 同一组测试图在 JXL 与 AVIF 下的客观 R-D 曲线(示意)。两条曲线非常接近,JXL 略上略左——同 bpp 下 PSNR / SSIM 通常高 0.5–1.5 dB,中高质量段尤明显;低 bpp 段两者 trade-off 各有胜负。技术差距远小于政治差距。
Fig 13.d · Indicative rate-distortion curves for JXL and AVIF on the same test set. The two are very close; JXL sits slightly above-left — typically 0.5–1.5 dB higher PSNR / SSIM at the same bitrate, most clearly in the mid-to-high quality range. At very low bitrates the trade-off goes either way. The technical gap is much smaller than the political gap.

技术内核

Technical core

JXL 的技术广度是当代图像格式里最大的——它把"现代图像格式应该有的所有能力"打包进同一个容器,六个核心点:① VarDCT(可变块 DCT)——块大小可在 2×2 到 256×256 之间自由变化,远比 AVIF 的 4×4–128×128 灵活;搭配 XYB(感知分离的色彩空间,JXL 自创)+ 自适应量化矩阵(可按图像内容定制),lossy 模式直接对标 AVIF。② Modular 模式——meta-adaptive 预测器(WP / Gradient / Self-correcting,可学习权重)+ 通道变换链(Squeeze / RCT / 自定义 transform),做无损或 near-lossless,小于 PNG / WebP-LL 30–50%。③ JPEG 无损 transcode(最革命性):任意 JPEG 文件解码到 DCT 系数,不再变换、不再量化,直接用 JXL 的熵编码重新打包,体积省 ~20%;djxl 反向时 byte-exact 恢复原 JPEG——这是其它 codec 全都做不到的事。④ 真渐进式解码——比特流头部就是低分辨率版本,解码器收到前 ~1/64 字节就能渲染一张完整的低分辨率图(不像 progressive JPEG 是按频率扫描,中途看起来糊);非常适合慢网。⑤ HDR / 32-bit float / wide gamut / CMYK 全原生——无需 ICC profile hack,XYB 色空间内部就支持 HDR;打印行业的高位深 + CMYK 也是一等公民。⑥ Patch 系统——对图片中重复出现的 pattern(同一个表情、漫画里反复出现的角色脸)单独编码一次,在出现位置插入引用,极大压缩漫画 / 表情包 / 截图。技术上几乎是"现代图像格式应该有的样子"的完整集成。

JXL has the broadest technical surface area of any current image format — it bundles every capability "a modern image format ought to have" into one container. Six pillars: ① VarDCT — block sizes range freely from 2×2 to 256×256, far more flexible than AVIF's 4×4–128×128. Combined with XYB (a perceptually separated colour space JXL invented) and content-adaptive quantisation matrices, lossy mode trades blow-for-blow with AVIF. ② Modular mode — meta-adaptive predictors (WP / Gradient / Self-correcting, weights learnable) plus channel-transform chains (Squeeze / RCT / custom) deliver lossless or near-lossless that's 30–50 % smaller than PNG and WebP-lossless. ③ JPEG lossless transcode (the revolutionary one): decode any JPEG into its DCT coefficients, skip requantising and re-transforming, and just re-encode with JXL's entropy coder — about 20 % smaller. djxl recovers the original JPEG byte for byte. No other codec offers this. ④ True progressive decoding — the bitstream's head is the low-resolution version. Receive the first ~1/64 of bytes and the decoder renders a complete coarse image (unlike progressive JPEG, which scans by frequency and stays blurry mid-load). Excellent for slow networks. ⑤ HDR, 32-bit float, wide gamut, CMYK all native — no ICC-profile hacks; XYB supports HDR internally; high-bit-depth + CMYK are first-class for print. ⑥ Patch system — encode a repeating pattern (an emoji, a recurring character face in a comic) once, then place references at every occurrence. Comics, sticker sheets and screenshots compress dramatically.
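Modular's Gradient predictor from pillar ② is, to a close approximation, the JPEG-LS-style clamped gradient: predict W + N − NW, clamped to [min(W, N), max(W, N)] — kin to the Paeth idea mentioned earlier. A sketch under that assumption (border handling and function names are invented here):

```python
def clamped_gradient(w, n, nw):
    """Predict from left (W), top (N), top-left (NW): the gradient W + N − NW,
    clamped so the prediction never leaves [min(W,N), max(W,N)] — this tames
    overshoot across edges."""
    g = w + n - nw
    lo, hi = min(w, n), max(w, n)
    return min(max(g, lo), hi)

def encode_residuals(img):
    """Residual = pixel − prediction, scanned row by row (borders predict from 0)."""
    res = []
    for y in range(len(img)):
        row = []
        for x in range(len(img[0])):
            W = img[y][x - 1] if x else 0
            N = img[y - 1][x] if y else 0
            NW = img[y - 1][x - 1] if x and y else 0
            row.append(img[y][x] - clamped_gradient(W, N, NW))
        res.append(row)
    return res

# A smooth ramp: residuals collapse to tiny near-constant values,
# which the entropy coder then stores in almost no bits.
ramp = [[x + 2 * y for x in range(4)] for y in range(4)]
print(encode_residuals(ramp))
```

Lossless decoding just runs the same predictor and adds the residual back — prediction plus cheap residuals is the whole game in Modular mode.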

[Figure 13 diagram · JPEG XL, three encode paths in one container: lossy → VarDCT (blocks 2–256, XYB colour) → adaptive quantise (★ the lossy step); lossless → Modular (WP / Gradient predictors, channel transforms); .jpg recompress → parse Huffman, keep DCT coefficients, no requantise (reversible). All three converge on ANS entropy coding (asymmetric numeral system, faster than arithmetic) and wrap into the .jxl codestream + boxes (Brotli-compressed metadata · Exif · ICC); djxl recovers the byte-exact original .jpg. Modes can mix per region.]

图 13 · JXL 三条编码路径并存:lossy 走 VarDCT + 自适应量化(★ 唯一有损步骤)、lossless 走 Modular + 预测器、JPEG transcode 直接打包 DCT 系数;三条路径都汇入 ANS(asymmetric numeral system)熵编码,最后包进 .jxl 容器。djxl 可把 transcode 路径反向恢复为 byte-exact 的原 JPEG。

Fig 13 · JXL fans out into three coding paths: lossy via VarDCT + adaptive quantisation (★ the only lossy step), lossless via Modular + predictors, JPEG transcode by repacking DCT coefficients. All three converge on ANS (asymmetric numeral system) entropy coding before being wrapped in the .jxl container. djxl reverses the transcode path back to a byte-exact original JPEG.

feature | JPEG | WebP | HEIC | AVIF | JXL
HDR / wide gamut | ✗ | ✗ | ✓ | ✓ | ✓ (XYB native)
16+ bit depth | ✗ | ✗ | partial (10/12) | ✓ (10/12) | ✓ (up to 32 + float)
lossless mode | nominal (10918-1 part 4) | ✓ | ✗ | ✓ | ✓ (best in class)
JPEG recompress | ✗ | ✗ | ✗ | ✗ | ✓ (lossless · ~20 % smaller · reversible)
progressive | by frequency (blurry) | ✗ | ✗ | ✗ | ✓ true spatial / DC-first
CMYK | ✗ | ✗ | ✗ | ✗ | ✓ first-class
Chrome support | ✓ | ✓ | ✗ | ✓ | ✗ (removed 2022-10)
Safari 17+ | ✓ | ✓ | ✓ | ✓ | ✓ (since 2023-09)
$ cjxl in.png out.jxl --quality 90       # quality 0–100 (≈90 visually lossless)
$ cjxl in.png out.jxl --distance 1.0     # distance: 0=lossless, ~1=Q90, ~3=Q75
$ cjxl in.jpg out.jxl --lossless_jpeg 1  # JPEG → JXL lossless transcode (~20% smaller)
$ djxl out.jxl roundtrip.jpg             # reverse transcode — byte-exact original .jpg
$ cjxl in.png out.jxl -d 0 -e 9          # lossless, max effort (smallest, slowest)

适用

USE FOR

  • macOS / iOS 17+ 内部存储链路(Apple Photos 后端)
  • 摄影 / RAW 后期管线(Lightroom · Capture One 已 native)
  • 印刷出版业(CMYK + 高位深 first-class)
  • HDR / wide-gamut / Dolby Vision Stills 长期归档
  • 把现存 JPEG 资产无损迁移省 ~20% 体积(可逆)
  • 漫画 / 表情包 / 截图(patch 系统压缩极优)
  • macOS / iOS 17+ internal storage pipeline (Apple Photos back-end)
  • Photography / RAW post pipelines (Lightroom · Capture One ship JXL natively)
  • Print and publishing (CMYK + high bit depth as first-class)
  • HDR / wide-gamut / Dolby Vision Stills long-term archives
  • Migrating existing JPEG libraries — ~20 % smaller, fully reversible
  • Comics / sticker sheets / screenshots (patch system compresses superbly)

反适用

AVOID

  • 桌面 Chrome / Edge 主流量场景(2022-10 已移除支持)
  • Android 主流浏览器(WebView / Chrome 同样不支持)
  • 实时性能敏感的服务端 / 客户端 transcoding(库还在快速演进)
  • 需要"全 Web 兼容"的公共图床 / CDN 默认输出
  • Desktop Chrome / Edge mainstream traffic (support removed Oct 2022)
  • Android's main browsers (WebView / Chrome don't support it either)
  • Latency-sensitive server/client transcoding (libraries still maturing)
  • "Universal web compatibility" as the default CDN output
scope | browsers | tools | CLI
JPEG XL | Safari 17+ (2023-09) · Firefox behind flag image.jxl.enabled · Chrome ✗ (removed 2022-10) · Edge ✗ | ✓✓ Photoshop 24.2+ · Camera Raw · Lightroom · Capture One · Krita · GIMP 2.10.30+ · Affinity Photo 2 · macOS Preview / iOS Photos | cjxl · djxl (libjxl) · sharp (Node, libjxl-bind)
parents: Google PIK (Practical Image Coding) + Cloudinary FUIF (Free Universal Image Format)
tribute to: JPEG (the only modern codec that can losslessly transcode it back and forth)
rival: AVIF (technically near-tied, politically AVIF won the open web)

KTX / KTX2 — 容器与 payload 的分离

KTX / KTX2 — separating container from payload

YEAR 2005 (KTX1) · 2019 (KTX2)
AUTHOR Khronos Group
EXT .ktx · .ktx2
MIME image/ktx2
STD Khronos KTX 1.0 / 2.0
KIND container (any GPU block-compressed payload)
DEPTH payload-defined (8 / 10 / 16 / float)
ALPHA payload-defined
ANIM mip chain · cubemap faces · array layers
STATUS KTX2 mainstream (the glTF 2.0 / WebGPU asset standard)

"它本身不是格式,是装格式的盒子。"

"Not a format itself — a box that holds formats."

GPU 块压缩格式(BCn / ETC2 / ASTC)的规范只规定了"4×4 像素块怎么编成几个字节",但没规定一个完整的纹理资产文件要怎么组织——mipmap 链怎么排?cubemap 的六个面怎么放?array layer 怎么索引?ICC color profile 放哪?Khronos 看不下去,做了 KTX(Khronos TeXture)当通用容器:头部 + key-value metadata + level/layer/face 的 byte-offset 索引表 + 真正的像素 payload。KTX 不关心 payload 是 BC7 还是 ASTC,只负责"把它装好、运行时一次性 upload 到 GPU"。2019 年 KTX2 加上 supercompression(用 Zstd 或 Basis Universal 把已经 GPU 压过的 payload 再压一遍),并把 mip 顺序改成 smallest-first 便于流式加载——成了 glTF 2.0 / WebGPU / Babylon / three.js 的资产事实标准。

GPU block-compression specs (BCn / ETC2 / ASTC) only define "how a 4×4 pixel block is encoded into a few bytes" — they say nothing about how a complete texture asset is laid out: how the mip chain is ordered, how the six faces of a cubemap sit together, how array layers are indexed, where the ICC colour profile lives. Khronos picked up the slack with KTX (Khronos TeXture): header + key-value metadata + a byte-offset index table for every level/layer/face + the actual pixel payload. KTX is payload-agnostic — it doesn't care whether the payload is BC7 or ASTC, it just packs the asset and lets the runtime upload it to the GPU in one go. KTX2 (2019) added supercompression — running the already-GPU-compressed payload through Zstd or Basis Universal a second time — and reversed the mip order to smallest-first so streaming loaders can swap in a low-res placeholder immediately. It is now the de-facto asset format for glTF 2.0, WebGPU, Babylon.js and three.js.

[Figure 14 diagram · KTX2 file layout, byte order: header 80 B → index (level offsets) → DFD (data format descriptor) → KVD (key-value data) → SGD (supercompression global data) → mip levels L7 … L0 (KTX2 = smallest-first · KTX1 = largest-first). Inside each mip level: GPU block payload (BC7 / ETC2 / ASTC / Basis-transcoded) + hardware-aligned padding, one byte range per face / layer, byte offsets supplied by the index table. Magic: AB 4B 54 58 20 32 30 BB 0D 0A 1A 0A → «KTX 20»\r\n\x1A\n.]
图 14 · KTX2 文件布局。前 80 字节是 header(魔数 + 格式 ID + width/height/levels/layers/faces);紧跟 index 表(每个 level 的 byte offset);DFD = data format descriptor(描述 payload 格式);KVD = key-value metadata(ICC profile / 作者信息);SGD = supercompression global data(Basis 字典);最后是按 mip level 排好的真正 payload——KTX2 反过来 smallest-first,L7(最小)在前、L0(最大)在后,流式加载时先拿小的占位。
Fig 14 · KTX2 file layout. The first 80 bytes are the header (magic + format ID + width/height/levels/layers/faces); next is the level index (a byte offset per level); DFD = data format descriptor (describes the payload's format); KVD = key-value metadata (ICC profile, author info); SGD = supercompression global data (Basis dictionary); then the actual payload, ordered by mip level. KTX2 reverses the order — L7 (smallest) first, L0 (largest) last — so streaming loaders can grab a tiny placeholder before the rest arrives.

技术内核

Technical core

KTX 的设计有四个支点。① header + index 表——文件头 80 字节,描述纹理的逻辑维度(width / height / depth / mip levels / array layers / faces);后面跟一张 level index 表,告诉 loader 第 N 级 mip 在文件内的 byte offset 和 byte length。这种"先索引后数据"的布局让 loader 不用扫整个文件就能跳读任意 level。② 每 mip level 内有 padding——GPU 上传时纹理需要按硬件对齐(通常 4 字节或 8 字节边界),KTX 直接在 file format 层面加 padding,运行时 memcpy 一行就能直接交给 glCompressedTexImage2D。③ KTX2 supercompression——这是 KTX2 相对 KTX1 最大的进化。GPU 块压缩(BC7 / ASTC)在 GPU 端是不能再压的——它们必须保持"硬件能直接 sample"的格式。但传输时(网络下载、磁盘存储)可以再用 Zstd 把字节流压一遍,运行时解压回原样再 upload。Basis Universal 更激进:它在 KTX2 里存的是一种"中间表示",运行时按目标设备转码成 BC7(桌面 D3D12 / Vulkan)、ETC2(老移动)或 ASTC(现代移动)——一个文件,所有平台。④ 多对象类型——同一份 KTX2 可以装单 2D 纹理、cubemap(6 face)、texture array(N layer)、3D 体积纹理,甚至带 mipmap 的 cubemap array(常用于 IBL 反射探针)。glTF 2.0 用 KHR_texture_basisu 扩展把 KTX2 + Basis 钉成 PBR 资产的官方携带格式。

KTX rests on four pillars. ① Header + level index — an 80-byte header describes the texture's logical dimensions (width / height / depth / mip levels / array layers / faces); then a level index lists the byte offset and byte length of every mip level. With "index first, data later" a loader can seek straight to any level without scanning the whole file. ② Padding inside each mip level — GPUs require texture rows to land on hardware-aligned boundaries (typically 4- or 8-byte). KTX bakes the padding into the file so the runtime can memcpy a row straight into glCompressedTexImage2D. ③ KTX2 supercompression — the headline upgrade over KTX1. GPU block compression (BC7 / ASTC) cannot be re-compressed on the GPU — the format has to stay "hardware-sampleable". But for transit (download, disk) the byte stream can be Zstd'd once and decompressed at load time before upload. Basis Universal goes further: KTX2 stores an intermediate representation that the runtime transcodes per-device into BC7 (desktop D3D12 / Vulkan), ETC2 (older mobile) or ASTC (modern mobile). One file, every platform. ④ Multi-object payload — a single KTX2 can carry a 2D texture, a cubemap (6 faces), a texture array (N layers), a 3D volume texture, even a mipmapped cubemap array for IBL reflection probes. glTF 2.0's KHR_texture_basisu extension nails KTX2 + Basis as the official carrier for PBR assets.
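The level-index bookkeeping in pillar ① is pure arithmetic once the payload format is fixed: BC7 spends 16 bytes per 4×4 block, so every level's size follows from its dimensions. A sketch of that mip-chain maths (helper names invented; real KTX2 files add alignment padding and carry the per-level byte offsets in the index):

```python
def bc7_level_bytes(width, height):
    """BC7 stores each 4×4 block in 16 bytes: ceil(w/4) · ceil(h/4) · 16."""
    return ((width + 3) // 4) * ((height + 3) // 4) * 16

def mip_chain(width, height):
    """Full mip chain, largest first, halving (floor, min 1) down to 1×1."""
    levels, w, h = [], width, height
    while True:
        levels.append((w, h, bc7_level_bytes(w, h)))
        if w == 1 and h == 1:
            break
        w, h = max(w // 2, 1), max(h // 2, 1)
    return levels

chain = mip_chain(256, 256)
print(len(chain), chain[0], chain[-1])  # 9 (256, 256, 65536) (1, 1, 16)

# KTX2 lays these out smallest-first, so a streaming loader hits the
# 16-byte 1×1 level before the 64 KB base level arrives:
ktx2_order = list(reversed(chain))
print(ktx2_order[0])  # (1, 1, 16)
```

Note the sub-4×4 tail levels (2×2, 1×1) still cost a whole 16-byte block each — block compression rounds dimensions up to block boundaries.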

适用

USE FOR

  • glTF 2.0 模型纹理(KHR_texture_basisu)
  • WebGPU / WebGL 2 资产管线
  • 跨平台游戏纹理(一个 .ktx2 + Basis,运行时转目标格式)
  • cubemap / texture array / 3D volume 纹理打包
  • 需要流式加载的大尺寸纹理(smallest-first mip 顺序)
  • glTF 2.0 model textures (KHR_texture_basisu)
  • WebGPU / WebGL 2 asset pipelines
  • Cross-platform game textures (one .ktx2 + Basis, transcoded at runtime)
  • Cubemaps, texture arrays, 3D volumes packed into one file
  • Large textures that need streaming (smallest-first mip order)

反适用

AVOID

  • Web 主图 / 普通照片——KTX2 不是"图片格式",浏览器 <img> 不解
  • 编辑链(Photoshop / Affinity)——这是终端纹理资产,不是工作格式
  • 不需要 GPU 直接 sample 的场景(用 PNG / WebP)
  • Web hero images / regular photos — KTX2 is not an image format, <img> won't decode it
  • Editing chains (Photoshop / Affinity) — this is a final-asset format, not a working format
  • Anything that doesn't need direct GPU sampling — use PNG / WebP
scope | browsers / engines | tools | CLI
KTX2 / Basis | browser-native ✗ · WebGL / WebGPU via loader ✓ · Babylon.js · three.js KTX2Loader | Khronos KTX-Software · NVIDIA Texture Tools Exporter · AMD Compressonator | toktx --bcmp --t2 out.ktx2 in.png · basisu in.png -ktx2 -uastc
conceptual ancestor: DDS (the Microsoft container that came first)
payloads it carries: BC1 · BC2/3 · BC4/5 · BC6H · BC7 · ETC2 · ASTC · Basis Universal
bound to: glTF 2.0 (KHR_texture_basisu) · WebGPU asset pipelines

DDS — DirectDraw Surface 容器

DDS — the DirectDraw Surface container

YEAR 1999 (DirectX 7.0)
AUTHOR Microsoft
EXT .dds
MIME — (no official MIME · proprietary format)
STD Microsoft DirectX SDK documentation (not an ISO standard)
KIND container (DXT / BCn / uncompressed RGBA payloads)
DEPTH payload-defined (8 / 16 / float16 / float32)
ALPHA payload-defined
ANIM mip chain · cubemap · array · volume
STATUS mainstream on Windows / D3D · rarely used cross-platform

"D3D 时代的 KTX,只是没人记得它先来。"

"The KTX of the D3D era — except few remember it came first."

1999 年 Direct3D 7.0 推出的时候,游戏行业急需一个"硬件能直接 sample 的纹理容器"——你不能用 BMP / TGA,因为它们是 CPU 端 RGBA,显卡读到要先解压再上传,带宽吃不住。微软干脆把 DirectDraw Surface(.dds)定义成纹理资产的标准磁盘格式:头部 124 字节描述维度 / mip 数 / pixel format / cubemap 标记,后面直接是 DXT(后来的 BCn)块或未压缩 RGBA8 字节流。Khronos 的 KTX 要 6 年后(2005)才出来。所以严格讲,"GPU 纹理容器"这个范式是微软先做的——KTX 是开放生态对它的回应。Bethesda 时代的 PC 游戏 mod 圈,几乎所有纹理替换包都是 .dds——这就是它的护城河。

When Direct3D 7.0 shipped in 1999, the games industry urgently needed "a texture container the hardware could sample directly". BMP and TGA were CPU-side RGBA — the GPU would have to decompress and re-upload before sampling, and the bus simply couldn't take it. Microsoft defined DirectDraw Surface (.dds) as the standard on-disk texture asset: a 124-byte header describing dimensions / mip count / pixel format / cubemap flags, followed by raw DXT (later BCn) blocks or uncompressed RGBA8. Khronos's KTX wouldn't appear for another six years. Strictly speaking, the "GPU texture container" idea was Microsoft's first — KTX is the open-ecosystem reply. The Bethesda-era PC modding scene (Skyrim / Fallout) shipped texture replacements almost exclusively as .dds — that's the moat that keeps DDS relevant.

[Figure 15 diagram · DDS file layout: 'DDS ' magic 4 B → DDS_HEADER 124 B (w/h/mip/fmt) → optional DX10 extension 20 B (DXGI_FORMAT) → pixel payload, mip 0 (largest) → mip 1 → … → mip N (1×1); cubemap faces concatenated in +X, −X, +Y, −Y, +Z, −Z order; arrays as layers 0..N; payload is BC1/2/3/4/5/6H/7 blocks or uncompressed RGBA8/16/float. Magic bytes: 0x44 0x44 0x53 0x20 → "DDS " (trailing space).]
图 15 · DDS 文件布局。"DDS " 4 字节魔数 → 124 字节 DDS_HEADER(width/height/mipcount/pitch/PixelFormat)→ 可选 20 字节 DX10 扩展头(用 DXGI_FORMAT 枚举,DX10+ 才能描述 BC6H/BC7) → 像素 payload。Mip 顺序与 KTX1 一致:largest-first;cubemap 按 +X/-X/+Y/-Y/+Z/-Z 顺序拼接。
Fig 15 · DDS file layout. The 4-byte "DDS " magic → a 124-byte DDS_HEADER (width / height / mipcount / pitch / PixelFormat) → an optional 20-byte DX10 extension header (DXGI_FORMAT enum — required for BC6H / BC7 on DX10+) → the pixel payload. Mip order matches KTX1 (largest-first); cubemap faces are concatenated in +X/-X/+Y/-Y/+Z/-Z order.
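The largest-first mip layout above makes byte offsets easy to compute by hand. Here is a minimal sketch in Python (function name `bc_mip_sizes` is mine, not from any SDK) that walks a BCn mip chain in DDS order, rounding each level up to whole 4×4 blocks:

```python
def bc_mip_sizes(width: int, height: int, mips: int, bytes_per_block: int) -> list[int]:
    """Byte size of each mip level for a BCn payload, largest-first (DDS order)."""
    sizes = []
    w, h = width, height
    for _ in range(mips):
        # BCn blocks cover 4×4 texels; a partial block still costs a full block.
        blocks_w = max(1, (w + 3) // 4)
        blocks_h = max(1, (h + 3) // 4)
        sizes.append(blocks_w * blocks_h * bytes_per_block)
        w, h = max(1, w // 2), max(1, h // 2)
    return sizes
```

For a 256×256 BC1 texture (8 bytes per block) with 3 mips this yields 32768, 8192 and 2048 bytes; note that 2×2 and 1×1 mips still occupy one full block each.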

技术内核

Technical core

DDS 的结构简单到几乎没什么可讲的——这是它的优点。① 头部 124 字节 DDS_HEADER:固定字段描述 width / height / depth(volume 纹理用)/ mipMapCount / pitch(每行 byte 数)/ PixelFormat(老的 FourCC 字段:DXT1/DXT3/DXT5/...);加上 dwCaps / dwCaps2 标记(cubemap / volume / mip)。② DX10 扩展头 20 字节(可选):DirectX 10+ 引入的现代头,用 DXGI_FORMAT 枚举(DXGI_FORMAT_BC7_UNORM / DXGI_FORMAT_BC6H_UF16 / ...)替代 FourCC——因为新的块压缩格式(BC6H、BC7)的 FourCC 名字位不够用了。③ payload 直接是块压缩字节流——没有 padding 设计、没有 supercompression、没有 key-value metadata,只有最直接的 mip + face + layer 字节拼接。这是它跟 KTX2 最大的差距:DDS 是"足够好"的工程容器,KTX2 是"考虑到 Web / 跨平台 / Basis 转码"的现代容器。但对于 Windows / D3D 闭环,DDS 已经够用 25 年。

DDS's structure is almost embarrassingly simple — and that's its strength. ① The 124-byte DDS_HEADER: fixed fields for width / height / depth (for volume textures) / mipMapCount / pitch (bytes per row) / PixelFormat (the old FourCC field — DXT1 / DXT3 / DXT5 / …); plus dwCaps / dwCaps2 flags (cubemap / volume / mip). ② The optional 20-byte DX10 extension header: a modern header introduced in DirectX 10+ that swaps FourCC for the DXGI_FORMAT enum (DXGI_FORMAT_BC7_UNORM / DXGI_FORMAT_BC6H_UF16 / …) — necessary because newer block formats (BC6H, BC7) ran out of FourCC bits. ③ The payload is just block-compressed bytes — no padding scheme, no supercompression, no key-value metadata, just the most direct possible concatenation of mip × face × layer bytes. That's the gap with KTX2: DDS is a "good enough" engineering container, KTX2 is a modern container that thinks about Web, cross-platform delivery and Basis transcoding. For a Windows / D3D walled garden, though, DDS has been sufficient for 25 years.
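As a sketch of just how little there is to the header, here is a minimal DDS header reader in Python, following the DDS_HEADER field offsets described above (error handling trimmed; only a few fields extracted):

```python
import struct

DDS_MAGIC = b"DDS "

def parse_dds_header(data: bytes) -> dict:
    """Read the 'DDS ' magic and a few DDS_HEADER fields. Minimal sketch."""
    if data[:4] != DDS_MAGIC:
        raise ValueError("not a DDS file")
    # DDS_HEADER starts right after the magic: dwSize (must be 124), dwFlags,
    # dwHeight, dwWidth, dwPitchOrLinearSize, dwDepth, dwMipMapCount,
    # all little-endian uint32.
    size, flags, height, width, pitch, depth, mips = struct.unpack_from("<7I", data, 4)
    if size != 124:
        raise ValueError("bad DDS_HEADER size")
    # DDS_PIXELFORMAT sits at header offset 72; its FourCC field is 8 bytes in,
    # so file offset 4 + 72 + 8 = 84 (e.g. b"DXT1", or b"DX10" for the ext header).
    fourcc = data[84:88]
    return {"width": width, "height": height, "mips": mips, "fourcc": fourcc}
```

A 128-byte buffer with the magic, the seven leading uint32s and a FourCC is already enough for this reader to identify a texture's dimensions and payload format.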

适用

USE FOR

  • Windows PC 游戏纹理资产
  • D3D9 / D3D11 / D3D12 引擎(原生支持)
  • Bethesda / Valve / id Tech 等老牌游戏 mod 包
  • Unreal Engine 4 / 5 中间纹理(导入前)
  • Windows PC game texture assets
  • D3D9 / D3D11 / D3D12 engines (native support)
  • Bethesda / Valve / id Tech mod packs
  • Unreal Engine 4 / 5 intermediate textures (pre-import)

反适用

AVOID

  • 跨平台 / Web / 移动端——用 KTX2
  • 需要 supercompression(Zstd / Basis)的资产管线
  • 需要 ICC color profile / 丰富 metadata 的工程影像
  • Cross-platform / Web / mobile — use KTX2
  • Pipelines that need supercompression (Zstd / Basis)
  • Engineering imagery that needs ICC profiles or rich metadata
scope: DDS
engines: D3D9–12 原生 · Unreal · Unity · Source / id Tech
tools: NVIDIA Texture Tools · DirectXTex · GIMP DDS 插件 · Photoshop NVIDIA Plug-in
CLI: texconv -f BC7_UNORM in.png · nvtt_export -f bc7 -o out.dds in.png
父:parent: Microsoft DirectDraw → DirectX 7.0 (1999) 概念后裔:conceptual descendant: KTX / KTX2 (open-ecosystem reply, 6 years later) 装载:payloads: BC1 · BC2/3 · BC4/5 · BC6H · BC7 · uncompressed RGBA

BC1 (DXT1) — 4×4 块、4 bpp 的祖宗

BC1 (DXT1) — the 4×4-block, 4-bpp ancestor

YEAR 1998 (S3TC) · 2000s 进入 D3D 改名 DXT1 · 2008+ Khronos 改名 BC1 AUTHOR S3 Graphics (S3TC) → Microsoft (DXT1) → Khronos (BC1) EXT — (payload, 装在 DDS / KTX 里) MIME STD D3D BC1_UNORM · OpenGL EXT_texture_compression_s3tc · Vulkan VK_FORMAT_BC1_RGB BLOCK 4×4 / 8 byte = 4 bpp DEPTH RGB(端点 5:6:5)+ 1-bit alpha(可选) ALPHA 1-bit (透 / 不透 二选一) SAMPLE GPU 硬件原生(纹理单元直接解块) STATUS 桌面 / 主机 GPU 全平台原生 25 年

"4 个像素压成 8 字节,显存砍掉 8 倍,从此再也回不去。"

"Four pixels squeezed into eight bytes — VRAM cut 8×, no going back."

1998 年的 GPU 显存极其稀缺——NVIDIA Riva TNT 旗舰 16 MB,普通卡 8 MB,而一张 256×256 的 RGBA 纹理就要 256 KB。一个游戏关卡要几十张纹理,显存装不下,带宽更扛不住(显存带宽要支撑帧缓冲、Z-buffer、纹理 sample 三路并发)。S3 Graphics 提出 S3TC(S3 Texture Compression):把 4×4 = 16 个像素打包成 8 字节,体积压到 1/8(原 64 字节),GPU 纹理单元在 sample 时硬件解块——不需要 CPU 全图解压上传,显存里存的就是块数据。一夜之间,同样显存能装 8 倍的纹理,带宽吃掉 1/8。这是 GPU 块压缩的开山之作,定义了往后 25 年所有 BCn / ETC / ASTC 的基础范式:固定大小块 + 端点 + 内插 + 索引。

In 1998, GPU VRAM was scarce — NVIDIA's flagship Riva TNT had 16 MB, mid-range cards 8 MB. A single 256×256 RGBA texture cost 256 KB. A game level needed dozens; the VRAM couldn't hold them and the bus couldn't feed them (memory bandwidth had to serve framebuffer, Z-buffer and texture sampling at the same time). S3 Graphics proposed S3TC (S3 Texture Compression): pack 4×4 = 16 pixels into 8 bytes, an 8× shrink from the original 64 bytes; the texture unit decodes a block on the fly during sampling, so VRAM stores the compressed blocks directly without any CPU-side full-image decompression. Overnight, the same VRAM could hold 8× as many textures and the bus had to move 1⁄8 the bytes. This is the founding act of GPU block compression and it set the template every later BCn / ETC / ASTC variant follows: fixed-size block + endpoints + interpolation + per-pixel index.

BC1 · 4×4 BLOCK = 8 BYTES = 4 BPP · c0 RGB565 · c1 RGB565 → interpolate: c2 = (2c0+c1)/3 · c3 = (c0+2c1)/3 · 2-bit index / pixel(示例): 0 2 3 1 / 2 2 3 3 / 0 2 3 1 / 0 0 2 1 · byte 布局 / byte layout: c0 (2 B) · c1 (2 B) · indices 16 px × 2 bit = 4 B · 总 8 B / 16 px = 4 bpp · 同尺寸未压缩 RGBA8 = 64 B → 8× 压缩比
图 16 · BC1 块解码。每 4×4 块 8 字节:c0 / c1 是两个 RGB565 端点(各 2 字节);c2 / c3 由 c0 / c1 线性插值算出(c2 = (2c0 + c1)/3,c3 = (c0 + 2c1)/3);剩 4 字节是 16 个像素的 2-bit index,每个像素从 c0/c1/c2/c3 四色里选一个。8 字节 / 16 像素 = 4 bpp,比未压缩 RGBA8 的 32 bpp 小 8 倍——而且 GPU 纹理单元硬件解块,sample 时 0 开销。
Fig 16 · BC1 block decode. Every 4×4 block is 8 bytes: c0 / c1 are two RGB565 endpoints (2 bytes each); c2 / c3 are linearly interpolated from c0 / c1 (c2 = (2c0 + c1)/3, c3 = (c0 + 2c1)/3); the remaining 4 bytes hold 16 × 2-bit indices, each picking one of {c0, c1, c2, c3} per pixel. 8 bytes ÷ 16 pixels = 4 bpp — 8× smaller than uncompressed RGBA8 at 32 bpp — and the GPU's texture unit decodes a block in hardware with zero per-sample cost.

技术内核

Technical core

BC1 的"4×4 块 + 端点 + 内插 + 索引"四件套是它的全部技术内核,也是后面所有 BCn / ETC / ASTC 都在改进的同一个范式。① 固定大小块——4×4,绝不可变。这是为了让 GPU 纹理单元能直接通过坐标计算定位到块,不需要扫表;sample 一个像素只需要"算块号 → 加载 8 字节 → 解端点 → 查 index → 输出颜色"四步,完全硬件实现。② 端点 + 内插——只存两个端点 c0/c1(RGB565,各 16-bit),内插出 c2/c3 让块能表达 4 种颜色。这是个赌博:它假设一个 4×4 块内的颜色变化是"沿着色空间一条直线"的,适用于大多数自然纹理(草地、石头、皮肤)但对锯齿状颜色边缘会糊。③ 2-bit/像素 index——每像素只需要 2 bit 选 4 选 1,16 像素共 32 bit = 4 byte,跟 endpoints 的 4 byte 加一起正好 8 byte 一块。④ 1-bit alpha 隐藏档——如果 c0 ≤ c1(数值上),BC1 进入"alpha 模式":c3 变成"完全透明",c2 = (c0 + c1)/2 只有一种内插;每像素的 index = 3 表示透明。这就是 BC1 的"穷人 alpha"——只有透/不透,但不占额外字节。需要平滑 alpha 必须升级 BC2 / BC3。

BC1's "4×4 block + endpoints + interpolation + index" combo is the entire technical core — every later BCn / ETC / ASTC just iterates on this same template. ① Fixed-size blocks — 4×4, immutable. This lets the GPU's texture unit address a block directly via coordinate arithmetic, no lookup needed; sampling one pixel reduces to "compute block id → load 8 bytes → decode endpoints → read index → emit colour", four steps, all hardware. ② Endpoints + interpolation — only two endpoints c0/c1 (RGB565, 16 bits each) are stored; c2/c3 are interpolated so the block expresses four colours. It's a bet: BC1 assumes the colour variation in any 4×4 block lies along a straight line in colour space. True enough for most natural textures (grass, stone, skin), but jagged colour edges blur. ③ 2 bits per pixel of index — each pixel just needs 2 bits to choose one of four colours; 16 pixels × 2 bits = 32 bits = 4 bytes, which combined with the 4 bytes of endpoints lands exactly at 8 bytes per block. ④ 1-bit alpha hidden mode — if c0 ≤ c1 numerically, BC1 enters "alpha mode": c3 becomes fully transparent, c2 = (c0 + c1)/2 is the only interpolated colour, and an index of 3 means transparent. That's BC1's "poor man's alpha" — opaque/transparent only, no extra bytes. For smooth alpha you have to step up to BC2 / BC3.
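The four-step decode described above fits in a few lines. Here is a sketch of a BC1 block decoder in Python — real GPU decoders use slightly different rounding (the spec permits small tolerances); this sketch uses integer division throughout:

```python
import struct

def decode_bc1_block(block: bytes) -> list[tuple[int, int, int, int]]:
    """Decode one 8-byte BC1 block into 16 RGBA pixels, row-major."""
    c0, c1, bits = struct.unpack("<HHI", block)  # 2 endpoints + 16 × 2-bit indices

    def expand565(c: int) -> tuple[int, int, int]:
        # Widen RGB565 endpoint channels to 8 bits each.
        return ((c >> 11 & 31) * 255 // 31,
                (c >> 5 & 63) * 255 // 63,
                (c & 31) * 255 // 31)

    e0, e1 = expand565(c0), expand565(c1)
    if c0 > c1:  # 4-colour mode: two interpolated colours on the c0→c1 line
        pal = [e0 + (255,), e1 + (255,),
               tuple((2 * a + b) // 3 for a, b in zip(e0, e1)) + (255,),
               tuple((a + 2 * b) // 3 for a, b in zip(e0, e1)) + (255,)]
    else:        # "alpha mode": one midpoint colour, index 3 means transparent
        pal = [e0 + (255,), e1 + (255,),
               tuple((a + b) // 2 for a, b in zip(e0, e1)) + (255,),
               (0, 0, 0, 0)]
    return [pal[bits >> (2 * i) & 3] for i in range(16)]
```

With c0 = 0xF800 (pure red) and c1 = 0x001F (pure blue), an all-zero index field decodes every pixel to opaque red; with c0 ≤ c1, an index of 3 decodes to fully transparent — exactly the hidden alpha mode described in ④.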

适用

USE FOR

  • 不带 alpha 或仅需 1-bit alpha 的 RGB 纹理
  • 老游戏 / 移动端低端设备(对带宽极度敏感)
  • 显存预算极紧的 lightmap / 大尺寸地形纹理
  • BC7 不可用的旧 D3D9 / OpenGL ES 2.0 平台
  • RGB textures with no alpha (or 1-bit alpha at most)
  • Older games / low-end mobile devices (extreme bandwidth sensitivity)
  • Lightmaps and large terrain textures with tight VRAM budgets
  • Legacy D3D9 / OpenGL ES 2.0 platforms where BC7 isn't available

反适用

AVOID

  • 需要平滑 alpha 渐变(粒子、烟雾、UI 圆角)——用 BC3 / BC7
  • 颜色梯度细致的高质量纹理——块伪影明显
  • 法线贴图——4-bpp 端点精度不够,用 BC5
  • HDR——用 BC6H
  • Smooth alpha gradients (particles, smoke, UI rounded corners) — use BC3 / BC7
  • Fine colour gradients in high-quality textures — block artefacts show
  • Normal maps — 4-bpp endpoint precision is too coarse, use BC5
  • HDR — use BC6H
scope: BC1 / DXT1 / S3TC
APIs: D3D 全版本 · Vulkan · Metal · OpenGL 4.2+ (ARB) / OpenGL ES 3.0+ (extension)
tools: NVIDIA Texture Tools · AMD Compressonator · texconv · Crunch (cross-platform)
CLI: nvtt_export -f bc1 -o out.dds in.png · toktx --t2 --bcmp out.ktx2 in.png
父:parent: S3 Graphics · S3TC (1998, the very first hardware-decodable block format) 子:descendants: BC2 · BC3 · BC4 · BC5 · BC6H · BC7 · ETC1 / ETC2 · ASTC 同辈:sibling lineage: ETC family (mobile / Khronos) — same template, different endpoint scheme

BC2 / BC3 (DXT3 / DXT5) — alpha 处理两条路

BC2 / BC3 (DXT3 / DXT5) — two ways to handle alpha

YEAR 1999 (DirectX 7.0) AUTHOR S3 Graphics → Microsoft (DXT3 / DXT5) → Khronos (BC2 / BC3) EXT — (payload, 装在 DDS / KTX 里) MIME STD D3D BC2_UNORM · BC3_UNORM · GL EXT_texture_compression_s3tc BLOCK 4×4 / 16 byte = 8 bpp(BC1 的两倍) DEPTH RGBA(端点 5:6:5 + 4-bit / 8-bit alpha) ALPHA BC2 = 显式 4-bit · BC3 = 内插 8-bit SAMPLE GPU 硬件原生 STATUS 老但仍用 · 现代项目首选 BC7

"BC2 给你显式 4-bit alpha,BC3 让 alpha 也学 BC1 的内插。"

"BC2 gives explicit 4-bit alpha; BC3 lets alpha use the BC1 trick too."

BC1 的 1-bit alpha(透 / 不透)对游戏 UI 圆角、粒子边缘、烟雾、玻璃、毛发都不够——这些都需要平滑的 alpha 渐变(0 到 255 中间的值)。S3 / Microsoft 在 1998-1999 同时提出 DXT3 和 DXT5 两条路:DXT3(BC2)粗暴,每像素直接给 4-bit alpha,16 像素共 64 bit = 8 byte;再加 BC1 的 8 byte 颜色块,共 16 byte/块,8 bpp。DXT5(BC3)聪明,把 alpha 也当成"端点 + 内插"块——存 2 个 8-bit alpha 端点 + 6 个内插值(共 8 种 alpha) + 每像素 3-bit index;颜色块仍用 BC1 那套。两者体积一样(16 byte/块),但 BC3 在平滑 alpha 渐变(粒子、烟雾)上明显好,BC2 在锐利 alpha 边缘(UI 图标的 1-bit-like alpha)上略好——但实践中 BC3 几乎全胜。所以游戏圈 BC3 / DXT5 才是事实主流。

BC1's 1-bit alpha (opaque or transparent, nothing in between) wasn't enough for game UI rounded corners, particle edges, smoke, glass or hair — all of those need smooth alpha gradients (values between 0 and 255). S3 / Microsoft proposed DXT3 and DXT5 in 1998-1999, two roads. DXT3 (BC2) is brute force: store an explicit 4-bit alpha per pixel; 16 pixels × 4 bits = 64 bits = 8 bytes; plus the 8-byte BC1 colour block, total 16 bytes per block at 8 bpp. DXT5 (BC3) is clever: treat alpha as an "endpoints + interpolation" block too — 2 × 8-bit alpha endpoints + 6 interpolated values (8 alpha levels in total) + a 3-bit index per pixel; the colour block still uses BC1. Both occupy the same 16 bytes per block, but BC3 clearly wins on smooth alpha gradients (particles, smoke); BC2 has a slight edge on razor-sharp alpha edges (UI icons that are basically 1-bit alpha). In practice BC3 wins almost everywhere — so the games industry treats BC3 / DXT5 as the de-facto default.

BC2 vs BC3 · ALPHA BLOCK CONTRAST · BC2 · 显式 4-bit / 像素(示例): 15 12 9 5 / 12 15 9 3 / 9 9 5 1 / 5 3 1 0 · 16 px × 4 bit = 8 B · 无端点 · 无内插 · 优:锐利 alpha 边缘 · 缺:渐变粗(只 16 阶) · BC3 · 端点 + 内插 + 3-bit index: a0=255 · a1=20 · + 6 内插值: a0 a2 a3 a4 a5 a6 a7 a1 · 每 px · 3-bit index → 选 a0..a7(0-7)· 8 阶平滑
图 17 · BC2 vs BC3 的 alpha 块对比。BC2 上半:每像素直接 4-bit alpha(0-15),共 8 byte,无端点无内插——alpha 边缘锐利但只有 16 阶,平滑渐变会出阶梯。BC3 下半:存 2 个 8-bit alpha 端点 + 6 个内插值(共 8 阶) + 每像素 3-bit index;颜色块两者都用 BC1 那 8 byte——总都是 16 byte/块。
Fig 17 · BC2 vs BC3 alpha-block contrast. BC2 (top): 4-bit alpha per pixel (0-15), 8 bytes total, no endpoints, no interpolation — sharp alpha edges but only 16 quantisation levels, so smooth gradients show banding. BC3 (bottom): 2 × 8-bit alpha endpoints + 6 interpolated values (8 levels total) + 3-bit per-pixel index; both formats use the same BC1 colour block (8 bytes), so each ends up at 16 bytes per block.

技术内核

Technical core

两个格式的核心差异全在 alpha 块。① 都是 16 byte/块,8 bpp——BC1 的颜色块 8 byte 不变,各加 8 byte 的 alpha 块。颜色端点和 BC1 一样:c0/c1 RGB565 + 内插 c2/c3 + 2-bit index——没区别。② BC2 的 alpha 块 = 16 个 4-bit 直接值——每像素 0-15 表示 alpha 量化到 16 阶。优点:对锐利 alpha 边界(UI 图标、纹理掩码)无量化误差;缺点:平滑渐变只有 16 阶,会出 banding。BC2 在 1999 年被一些早期 UI 系统用过,后来逐渐让位给 BC3。③ BC3 的 alpha 块 = BC1 alpha 化——存 2 个 8-bit alpha 端点 a0/a1(各 1 byte = 2 byte),如果 a0 > a1 用 6 个 1/7 步长内插值(共 8 阶),如果 a0 ≤ a1 用 4 个内插值 + 2 个保留(0 和 255 的硬端点)= 8 阶里有 2 个固定;每像素 3-bit index(16 px × 3 bit = 48 bit = 6 byte)。共 2+6 = 8 byte。BC3 在平滑 alpha(粒子、烟雾、毛发)上明显优于 BC2,代价是锐利 alpha 边缘会有轻微模糊。④ 命名混乱:游戏圈一般叫 DXT3 / DXT5(D3D 老命名),Khronos / Vulkan / Metal 一般叫 BC2 / BC3——同一个东西两套名字,是 OpenGL 和 D3D 命名分歧的活化石。

The whole difference between the two lives in the alpha block. ① Both are 16 bytes per block, 8 bpp — the BC1 colour block (8 bytes) is unchanged; each format adds an 8-byte alpha block. Colour endpoints, c2/c3 interpolation and 2-bit indices are identical to BC1 — no surprises there. ② BC2's alpha block = 16 explicit 4-bit values — each pixel quantises alpha to one of 16 levels. Pro: zero quantisation error on sharp alpha edges (UI icons, masks). Con: only 16 levels, so smooth gradients band. BC2 saw use in some early-2000s UI systems and then quietly handed the baton to BC3. ③ BC3's alpha block = BC1, applied to alpha — store 2 × 8-bit alpha endpoints a0/a1 (1 byte each = 2 bytes); if a0 > a1, interpolate 6 values at 1/7 steps (8 levels total); if a0 ≤ a1, interpolate 4 values + reserve two slots for hard 0 and 255 (2 of the 8 levels are fixed); 3-bit index per pixel (16 × 3 = 48 bits = 6 bytes). Total 2 + 6 = 8 bytes. BC3 clearly beats BC2 on smooth alpha (particles, smoke, hair), at the cost of slightly fuzzier sharp alpha edges. ④ Naming chaos: the games industry says DXT3 / DXT5 (D3D legacy); Khronos / Vulkan / Metal say BC2 / BC3 — same thing, two name systems, a living fossil of the OpenGL-vs-D3D naming split.
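Both alpha blocks decode in a handful of lines. A Python sketch of the two schemes side by side (function names are mine; the BC3 palette rules follow the a0 > a1 / a0 ≤ a1 split described above):

```python
def decode_bc2_alpha(block8: bytes) -> list[int]:
    """BC2: 16 explicit 4-bit alphas, low nibble first, scaled 0-15 → 0-255."""
    out = []
    for byte in block8:
        out.append((byte & 0xF) * 17)   # 15 × 17 = 255, so the scale is exact
        out.append((byte >> 4) * 17)
    return out

def decode_bc3_alpha(block8: bytes) -> list[int]:
    """BC3: 2 endpoints + 6 interpolants (or 4 + hard 0/255), 3-bit indices."""
    a0, a1 = block8[0], block8[1]
    if a0 > a1:   # 8 interpolated levels at 1/7 steps
        pal = [a0, a1] + [((7 - i) * a0 + i * a1) // 7 for i in range(1, 7)]
    else:         # 6 levels at 1/5 steps, plus the fixed hard 0 and 255 slots
        pal = [a0, a1] + [((5 - i) * a0 + i * a1) // 5 for i in range(1, 5)] + [0, 255]
    bits = int.from_bytes(block8[2:8], "little")  # 16 × 3-bit indices = 48 bits
    return [pal[bits >> (3 * i) & 7] for i in range(16)]
```

The contrast is visible in the code itself: BC2 is a straight nibble table, while BC3 re-runs the BC1 endpoint-plus-interpolation trick on the alpha channel.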

适用

USE FOR

  • BC2 → 锐利 alpha 边缘的 UI 图标、纹理掩码
  • BC3 → 平滑 alpha 渐变的粒子、烟雾、毛发、玻璃、UI 圆角
  • 需要 alpha 但 BC7 不可用的旧平台(D3D9 / GL ES 2.0)
  • BC2 → sharp alpha edges (UI icons, texture masks)
  • BC3 → smooth alpha gradients (particles, smoke, hair, glass, UI rounded corners)
  • Anything that needs alpha on legacy platforms where BC7 isn't available (D3D9 / GL ES 2.0)

反适用

AVOID

  • 现代项目(2015+)——BC7 在质量上完全替代,体积一样
  • 法线贴图——用 BC5
  • HDR——用 BC6H
  • Modern projects (2015+) — BC7 fully replaces both at the same size with better quality
  • Normal maps — use BC5
  • HDR — use BC6H
scope: BC2 / BC3
APIs: D3D 全版本 · Vulkan · Metal · OpenGL 4.2+
tools: NVIDIA Texture Tools · AMD Compressonator · texconv · Crunch
CLI: nvtt_export -f bc3 -o out.dds in.png · texconv -f BC3_UNORM in.png
父:parent: BC1 (alpha block bolted onto the 8-byte colour block) 现代替代:modern replacement: BC7 (same 8 bpp, much better quality, all-in-one mode) 兄弟:sibling: BC4 / BC5 (took the alpha block and re-purposed it for single / dual channels)

BC4 / BC5 — 单/双通道,法线贴图省一通道

BC4 / BC5 — single / dual channel, dropping a channel from normal maps

YEAR 2007 (DirectX 10) AUTHOR Microsoft / Khronos EXT — (payload, 装在 DDS / KTX 里) MIME STD D3D BC4_UNORM / BC5_UNORM · GL ARB_texture_compression_rgtc · Vulkan VK_FORMAT_BC4/5_* BLOCK 4×4 / BC4 = 8 byte (4 bpp) · BC5 = 16 byte (8 bpp) DEPTH BC4 = 1 通道 8-bit · BC5 = 2 通道 8-bit ALPHA — (没有 alpha 概念,通道任意) SAMPLE GPU 硬件原生 STATUS 法线贴图行业标准 · 灰度/高度图标准

"法线贴图省一通道,显存再砍一半。"

"Drop a channel from normal maps; halve the VRAM again."

游戏图形里,法线贴图是仅次于 albedo 的第二大显存消耗——每个像素一个法线向量(X, Y, Z)。直觉上要 RGB 三通道,但法线是单位向量(长度 = 1),所以 Z 可以由 X / Y 推导出来:Z = sqrt(1 - X² - Y²)。这意味着实际只需要存 X / Y 两个通道,Z 在 fragment shader 里现算。BC5 就是为这个场景设计的——只存 R / G 两通道,每个通道用 BC3 的 alpha 块法(端点 + 内插 + 3-bit index),共 16 byte/块、8 bpp。BC4 是 BC5 的"半个版本",只存一个通道,用于灰度纹理:高度图、roughness 图、AO 遮罩、metallic 通道。BC4 / BC5 的本质是"把 BC3 的 alpha 块单独拎出来当颜色通道用"——这种"通道拆分 + 几何内插"的思路让法线贴图在与 BC3 RGB 相同的 8 bpp 预算下质量提升 3-5×(因为不再浪费 bits 在可以推导出来的通道上)。

In game graphics, normal maps are the second-largest VRAM hog after albedo — every pixel stores a normal vector (X, Y, Z). Intuitively that means three RGB channels, but a normal is a unit vector (length 1), so Z can be derived: Z = sqrt(1 − X² − Y²). You really only need to store X / Y; the fragment shader recomputes Z. BC5 is built for exactly that — store just R / G, each compressed with the BC3 alpha-block trick (endpoints + interpolation + 3-bit index), 16 bytes per block at 8 bpp. BC4 is the "half-version" of BC5: just one channel, for greyscale textures — height maps, roughness maps, AO masks, the metallic channel. BC4 / BC5 are essentially "BC3's alpha block lifted out and used as a colour channel". This "channel split + geometric interpolation" trick keeps normal maps at 8 bpp (same as BC3 RGB) but bumps quality 3-5× because no bits are wasted on a channel you don't need.

NORMAL MAP · RGB → BC5 (RG) + SHADER Z · 朴素 RGB 法线(3 通道): R=X · G=Y · B=Z · BC3 RGB → 8 bpp · B 通道浪费(可推导)· BC5 仅存 RG(2 通道): R=X · G=Y · 同 8 bpp,但 X/Y 各得独立的 8-bit 端点 · 质量 3-5× 提升 · shader 现算 Z: Z = √(1−X²−Y²) · 单 sqrt + 2 mul + 1 sub · BC4 = BC5 半个版本(单通道):roughness · AO · 高度图 · metallic · 8 byte/块 = 4 bpp · 端点 a0/a1 + 6 内插 + 3-bit index · 跟 BC3 alpha 块同构
图 18 · 法线贴图从 RGB 3 通道压缩到 BC5 2 通道,Z 在 shader 里用 z = √(1 − x² − y²) 现算。同样 8 bpp,BC5 因为不浪费 bits 在 B 通道,X / Y 各得到更多 endpoint 精度,法线质量提升 3-5×。BC4 是 BC5 的半通道版,用于 roughness / AO / 高度图等单通道纹理。
Fig 18 · A normal map shrinks from 3-channel RGB to 2-channel BC5, with Z reconstructed in-shader as z = √(1 − x² − y²). Same 8 bpp budget — but because BC5 doesn't waste bits on the B channel, X / Y each get more endpoint precision, lifting normal-map quality 3-5×. BC4 is the half-channel version of BC5, used for single-channel textures (roughness, AO, height maps).

技术内核

Technical core

BC4 / BC5 的设计思路简洁到一句话:把 BC3 的 alpha 块当成"通用的单通道压缩块"用。① BC4 = BC3 的 alpha 块独立——4×4 块,8 byte;存 2 个 8-bit 端点 r0 / r1(2 byte)+ 6 个内插值(隐含,不占字节,运行时算)+ 每像素 3-bit index(16 × 3 = 48 bit = 6 byte);共 8 byte / 16 像素 = 4 bpp。每像素只有一个 8-bit 通道(原数据的 R)。② BC5 = 两个 BC4 块叠加——一个 BC4 块存 R(法线 X),一个 BC4 块存 G(法线 Y);共 16 byte / 块 = 8 bpp。Z 不存,fragment shader 里算 z = sqrt(1 - x*x - y*y)——单 sqrt + 2 mul + 1 sub,GPU 一周期完成。③ BC4 的"unsigned"和"signed"两种模式:BC4_UNORM(0-255)和 BC4_SNORM(-128 到 127),后者专门给法线分量这种"中心对称"信号用,避免 0.5 偏置。BC5 同理。④ 命名又分裂:Khronos 叫 BC4 / BC5,Microsoft 老命名叫 ATI1 / ATI2(AMD 提出的格式名),OpenGL ARB 扩展叫 RGTC1 / RGTC2(Red-Green Texture Compression)——三套名,一个东西。游戏引擎源码里三种叫法都能见到。

BC4 / BC5's design boils down to one sentence: take BC3's alpha block and reuse it as a generic single-channel compression block. ① BC4 = BC3's alpha block, standalone — 4×4 block, 8 bytes; 2 × 8-bit endpoints r0 / r1 (2 bytes) + 6 implicit interpolated values (computed at runtime, no bytes spent) + 3-bit per-pixel index (16 × 3 = 48 bits = 6 bytes); total 8 bytes / 16 pixels = 4 bpp. Each pixel carries one 8-bit channel (the input's R). ② BC5 = two BC4 blocks stacked — one BC4 block for R (normal X), one for G (normal Y); 16 bytes per block = 8 bpp. Z isn't stored — the fragment shader computes z = sqrt(1 − x*x − y*y), one sqrt + two muls + one sub, retired in a single GPU cycle. ③ BC4 has UNORM and SNORM modes — BC4_UNORM (0-255) and BC4_SNORM (−128 to 127); the signed variant is specifically for centre-symmetric signals like normal components, avoiding a 0.5 bias. BC5 mirrors this. ④ Naming forks again: Khronos says BC4 / BC5; Microsoft's legacy names are ATI1 / ATI2 (AMD-coined names); the OpenGL ARB extension calls them RGTC1 / RGTC2 (Red-Green Texture Compression). Three names, one thing — and you'll see all three in any sufficiently old engine source tree.
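The shader-side Z reconstruction is tiny. A Python sketch of what the fragment shader does after sampling a BC5_UNORM normal map (0-255 stands in for the sampled UNORM value; the clamp guards against quantisation pushing x² + y² slightly past 1):

```python
import math

def reconstruct_normal(x_unorm: int, y_unorm: int) -> tuple[float, float, float]:
    """Rebuild a unit normal from the two stored BC5 channels."""
    # UNORM 0..255 remaps to [-1, 1]; the SNORM variant skips this bias step.
    x = x_unorm / 255.0 * 2.0 - 1.0
    y = y_unorm / 255.0 * 2.0 - 1.0
    # z = sqrt(1 - x² - y²), clamped so rounding error never feeds sqrt a negative
    z = math.sqrt(max(0.0, 1.0 - x * x - y * y))
    return (x, y, z)
```

A mid-grey (128, 128) sample reconstructs a nearly straight-up normal, and the result is unit-length by construction — the reason BC5 can afford to drop the third channel at all.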

适用

USE FOR

  • BC5 → 法线贴图(行业标准,Unreal / Unity / id Tech 默认)
  • BC4 → roughness / metallic / AO / 高度图 等单通道
  • SDF(Signed Distance Field)字体纹理(BC4)
  • 需要 R / G 双通道但不需要 B 的任何场景
  • BC5 → normal maps (industry standard — Unreal / Unity / id Tech default)
  • BC4 → single-channel data: roughness / metallic / AO / height maps
  • SDF (Signed Distance Field) font textures (BC4)
  • Anything that needs R / G but not B

反适用

AVOID

  • 需要 RGB 三通道的彩色纹理(用 BC1 / BC7)
  • HDR(用 BC6H)
  • 3-channel colour textures (use BC1 / BC7)
  • HDR (use BC6H)
scope: BC4 / BC5
APIs: D3D10+ · Vulkan · Metal · OpenGL 4.0+ (RGTC)
tools: NVIDIA Texture Tools · AMD Compressonator · texconv · Unreal / Unity 自动用
CLI: nvtt_export -f bc5 -o normal.dds normal.png · texconv -f BC5_UNORM normal.png
父:parent: BC3 (alpha block lifted out and reused as a generic single-channel block) 高质量替代:higher-quality alternative: BC7 (for normal maps that demand the very best, at 8 bpp like BC5) 同辈别名:sibling aliases: ATI1 / ATI2 (AMD names) · RGTC1 / RGTC2 (OpenGL ARB)

BC6H — HDR 块压缩

BC6H — HDR block compression

YEAR 2011 (DirectX 11) AUTHOR Microsoft / Khronos (BPTC family) EXT — (payload, 装在 DDS / KTX 里) MIME STD D3D BC6H_UF16 / SF16 · GL ARB_texture_compression_bptc · Vulkan VK_FORMAT_BC6H_* BLOCK 4×4 / 16 byte = 8 bpp DEPTH float16 RGB · 范围 [-65504, 65504] ALPHA — (无 alpha,HDR 场景一般不需要) SAMPLE GPU 硬件原生 STATUS HDR cubemap / IBL 反射探针 行业标准

"显存里的 HDR — 反射探针、cubemap 全靠它。"

"HDR in VRAM — reflection probes and cubemaps depend on it."

PBR(基于物理的渲染)需要 HDR 环境贴图——天空、室内 IBL 反射探针、自发光场景全是。问题是 BC1-5 都基于 8-bit/通道 端点 + 内插,根本无法表达 float16 的 [-65504, 65504] 范围。如果用未压缩 RGBA16F,一张 1024×1024 的 cubemap(6 面)要 1024×1024×6×8 = 48 MB。一个室外场景几张 cubemap 几百 MB 就没了。BC6H 是 D3D11 时代专门为 HDR 设计的块压缩:4×4 块、16 byte/块、8 bpp(跟 BC7 同尺寸),但 payload 直接是 float16 RGB(无 alpha)。它用 14 种块模式来权衡精度——根据这块的颜色分布选最合适的模式。BC6H 让 HDR cubemap 体积从 RGBA16F 的 64 bpp 砍到 8 bpp(8× 压缩),同时保持 float16 的动态范围——这是 PBR 渲染管线得以普及的硬件基础。Unreal Engine 4 / 5、Unity HDRP 默认对 cubemap 的 HDR 资产用 BC6H。

PBR (physically based rendering) needs HDR environment maps — skies, indoor IBL reflection probes, emissive scenes all live in HDR. The trouble is that BC1-5 all rely on 8-bit-per-channel endpoints + interpolation, so they simply cannot express float16's [−65504, 65504] range. Uncompressed RGBA16F would cost 1024 × 1024 × 6 × 8 = 48 MB for a single 1024² cubemap (six faces); an outdoor scene with a handful of cubemaps blows past hundreds of MB. BC6H is the D3D11-era block format built specifically for HDR: 4×4 block, 16 bytes per block, 8 bpp (same size as BC7), but the payload is float16 RGB (no alpha). Its trick is 14 block modes that trade off precision differently — the encoder picks the mode best suited to that block's colour distribution. BC6H takes HDR cubemaps from RGBA16F's 64 bpp down to 8 bpp (8× compression) while keeping float16's dynamic range. That's the hardware foundation that lets PBR pipelines exist at scale today. Unreal Engine 4 / 5 and Unity HDRP default to BC6H for HDR cubemap assets.

BC6H · 14 BLOCK MODES + FLOAT16 RANGE
14 块模式 / 14 block modes(mode · endpt bits · partitions · idx bits):
  1 · 10·10·10 · 2 · 3
  2 · 7·6·6·6 · 2 · 3
  3 · 11·5·4·4 · 2 · 3
  4 · 11·4·5·4 · 2 · 3
  5 · 11·4·4·5 · 2 · 3
  6 · 9·9·9·9 · 2 · 3
  7 · 8·8·8·8 · 2 · 3
  8 · 8·8·8·8 · 2 · 3
  …(共 14 modes)
float16 数值范围: -65504 … 0 … +65504
对比: RGBA16F uncompressed → 64 bpp · BC6H compressed → 8 bpp(8× 压缩)· BC1 / BC7 → 仅 LDR(0-1)
UF16 = unsigned · SF16 = signed
图 19 · BC6H 的 14 种块模式表(每模式有不同的端点 bit 分配 / 是否分区 / index bit 数)。右侧:float16 的 [-65504, +65504] 数值范围,远超 BC1-7 的 0-1 LDR 区间;BC6H 是 GPU 唯一原生 HDR 块压缩,把 HDR cubemap 的体积从 RGBA16F 的 64 bpp 砍到 8 bpp。UF16 / SF16 区分无符号 / 有符号变体。
Fig 19 · BC6H's 14 block modes (each with different endpoint bit allocations, partition counts and index bit widths). Right: float16's [−65504, +65504] range, far beyond the 0-1 LDR range BC1-7 are limited to. BC6H is the GPU's only native HDR block-compression format, dropping HDR cubemaps from RGBA16F's 64 bpp to 8 bpp. UF16 / SF16 distinguish unsigned and signed variants.

技术内核

Technical core

BC6H 跟 BC1-5 不是同一类设计——它没有"统一 4 端点 + 内插"的简洁结构,而是 14 种块模式让编码器按块的颜色分布挑最优解。① 14 种块模式——每种模式给端点不同 bit 数(如 10-10-10、7-6-6-6、11-5-4-4 等三/四个分量)、是否启用 2 分区(把 4×4 块拆成两组,每组独立端点 + 内插,适用于块内有明显颜色边界的情况)、index 用 3-bit 还是 4-bit。编码器对每个块尝试多种模式,挑 PSNR 最高那个塞进 16 byte。② 端点用 float16 表示——这是 BC6H 区别于所有其他 BC 的核心。BC1-5 的端点是定点整数(RGB565 或 8-bit),只能表示 0-1;BC6H 的端点是浮点,可以表示 [-65504, 65504]——HDR 高光、太阳直射、自发光物体的真实数值都能装进去。③ UF16 (unsigned) vs SF16 (signed)——UF16 范围 [0, 65504],适合不会有负值的 HDR 颜色;SF16 范围 [-65504, 65504],适合可能有负值的 HDR 法线或其他工程数据。④ 4×4 块仍只 16 byte——这是工程上最重要的一点:BC6H 跟 BC7 一样是 8 bpp,HDR 的体积成本只比 LDR 多 1×(BC1 是 4 bpp,BC7 / BC6H 都是 8 bpp)。这个"HDR 不贵"的承诺让 IBL 反射探针 / cubemap 的大规模使用成为可能——Unreal Engine 默认每个室外场景烘焙几十张 BC6H cubemap。

BC6H isn't built like BC1-5 — there's no clean "two endpoints + interpolation" template. Instead, 14 block modes let the encoder pick the best fit for that block's colour distribution. ① 14 block modes — each mode allocates different bit counts to the endpoints (e.g. 10-10-10, 7-6-6-6, 11-5-4-4, three or four components), optionally enables 2-partition mode (split the 4×4 block into two regions, each with its own endpoints + interpolation, which helps when a block has a sharp colour boundary), and uses 3- or 4-bit indices. The encoder tries multiple modes per block and packs whichever maximises PSNR into the 16-byte block. ② Endpoints expressed as float16 — this is the one thing that sets BC6H apart from every other BCn. BC1-5 endpoints are fixed-point integers (RGB565 or 8-bit) capped at 0-1; BC6H endpoints are floating point and can express [−65504, 65504] — the actual numerical range of HDR highlights, direct sun, emissive surfaces. ③ UF16 (unsigned) vs SF16 (signed) — UF16's range is [0, 65504], suitable for non-negative HDR colour; SF16's is [−65504, 65504], suitable for HDR normals or other engineering data that may go negative. ④ 4×4 block, still just 16 bytes — and this is the most important engineering fact: BC6H is 8 bpp, the same as BC7. HDR costs only 1× more bytes than LDR (BC1 is 4 bpp, BC7 / BC6H are 8 bpp). That "HDR isn't expensive" promise is what makes large-scale IBL reflection probes and HDR cubemaps practical — Unreal Engine routinely bakes dozens of BC6H cubemaps per outdoor scene.
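The two headline numbers here, float16's ±65504 ceiling and the 64 → 8 bpp cut, are easy to check in a few lines. A Python sketch (`struct`'s "e" format is IEEE half precision; `cubemap_bytes` is a helper name of my own):

```python
import struct

def f16_roundtrip(x: float) -> float:
    """Pack a Python float into IEEE half precision ("e" format) and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def cubemap_bytes(edge: int, bpp: int) -> int:
    """Total bytes for a 6-face cubemap at the given bits per pixel."""
    return edge * edge * 6 * bpp // 8

# 65504 is float16's largest finite value, so it survives the round-trip exactly.
# A 1024² cubemap: RGBA16F (64 bpp) = 48 MiB; BC6H (8 bpp) = 6 MiB.
```

Those are precisely the figures quoted above: the uncompressed cubemap costs 48 MiB, and BC6H brings it down 8× while the endpoints stay inside float16's full range.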

适用

USE FOR

  • HDR cubemap(天空盒、IBL 反射探针)
  • 烘焙的 lightmap HDR 部分
  • HDR 自发光纹理(霓虹灯、屏幕、火焰)
  • volumetric 体积纹理(雾 / 云,需要 HDR 强度)
  • HDR cubemaps (skyboxes, IBL reflection probes)
  • HDR portions of baked lightmaps
  • HDR emissive textures (neon, screens, flames)
  • Volumetric textures (fog / clouds — need HDR intensity)

反适用

AVOID

  • LDR 纹理(用 BC7,质量更好且支持 alpha)
  • 需要 alpha 的 HDR(BC6H 不支持 alpha)
  • D3D10 及以下平台(BC6H 是 D3D11+)
  • 移动 GPU 早期型号(看 BPTC / ASTC HDR 支持情况)
  • LDR textures (use BC7 — better quality and supports alpha)
  • HDR that needs alpha (BC6H has no alpha channel)
  • D3D10 and earlier (BC6H requires D3D11+)
  • Older mobile GPUs (check BPTC / ASTC HDR support)
scope: BC6H
APIs: D3D11+ · Vulkan · Metal (macOS / iOS Apple Silicon) · OpenGL 4.2+ (BPTC)
tools: NVIDIA Texture Tools · AMD Compressonator · texconv · ISPC bc6h_enc
CLI: nvtt_export -f bc6h -o sky.dds sky.exr · texconv -f BC6H_UF16 sky.exr
家族:family: BCn block-compression lineage (extends the family into HDR for the first time) 同期发布:released alongside: BC7 (D3D11, the LDR sibling — same 8 bpp, complementary roles) 移动对应物:mobile counterpart: ASTC HDR (Khronos, same era, different geometry)

BC7 — 现代 BCn 的集大成

BC7 — the synthesis of modern BCn

YEAR 2011 (DirectX 11) AUTHOR Microsoft / Khronos (BPTC family) EXT — (payload, 装在 DDS / KTX 里) MIME STD D3D BC7_UNORM · GL ARB_texture_compression_bptc · Vulkan VK_FORMAT_BC7_* BLOCK 4×4 / 16 byte = 8 bpp DEPTH RGBA 8-bit MODES 8 种内部块模式 (mode 0-7) SAMPLE GPU 硬件原生 STATUS 桌面纹理王 · 现代 AAA 游戏 90% 资产用它

"一种格式,八种块模式,自动挑最合适那种。"

"One format, eight block modes — pick whichever fits best."

BC1-5 各自只擅长一种场景:BC1 是 RGB 无 alpha、BC2 是 RGB + 锐利 alpha、BC3 是 RGB + 平滑 alpha、BC4 是单通道、BC5 是双通道。游戏纹理混合场景多——一张角色贴图可能同时有平滑 RGB 渐变 + 锐利 alpha 边缘 + 高频金属反光,任何单一 BCn 都解释不了整张。美术希望"一种格式覆盖所有"——不用每张图手动挑 BCn。BC7 的解法是 8 种内部块模式 + 编码器为每个 4×4 块自动挑最合适那种:同样 8 bpp(跟 BC2 / BC3 一样),但同图视觉质量比它们好 5-10×,几乎追上未压缩。BC7 因此成为 D3D11 时代之后桌面游戏纹理的事实唯一选择——AAA 游戏 90% 桌面贴图都用 BC7。

BC1-5 each excel at exactly one scenario: BC1 is RGB without alpha, BC2 is RGB + sharp alpha, BC3 is RGB + smooth alpha, BC4 is single-channel, BC5 is dual-channel. Real game textures mix scenarios — a single character map can carry smooth RGB gradients, sharp alpha edges and high-frequency metallic specular all at once, and no single BCn explains the whole thing. Artists want "one format that covers everything" without per-texture format picking. BC7's answer: 8 internal block modes plus an encoder that picks the best mode per 4×4 block. At the same 8 bpp as BC2 / BC3, BC7 looks 5-10× better visually — close to uncompressed. That's why, post-D3D11, BC7 became the de-facto only choice for desktop game textures: 90 % of AAA desktop textures are BC7.

BC7 · 8 BLOCK MODES (FAMILY OVERVIEW)
mode · subsets · endpt bits · idx bits · p-bits · α / rot:
  0 · 3 · 4·4·4 · 3 · 6 · RGB · —
  1 · 2 · 6·6·6 · 3 · 2 · RGB · —
  2 · 3 · 5·5·5 · 2 · 0 · RGB · —
  3 · 2 · 7·7·7 · 2 · 4 · RGB · —
  4 · 1 · 5·5·5·6 · 2 / 3 · 0 · RGBA · rot
  5 · 1 · 7·7·7·8 · 2 / 2 · 0 · RGBA · rot
  6 · 1 · 7·7·7·7 · 4 · 2 · RGBA · —
  7 · 2 · 5·5·5·5 · 2 · 4 · RGBA · —
蓝 = mode 0-3 偏 RGB 高质量 · 桔 = mode 4-7 偏 RGBA · subsets = 块内分区数
图 20a · BC7 的 8 种内部块模式总览。每种 mode 给端点不同的 bit 分配、不同的分区数(1/2/3 个子区)、不同的 index bit 数,以及可选的 p-bit / 通道旋转。蓝色 mode 0-3 偏 RGB 高质量,橙色 mode 4-7 偏 RGBA。一张 16 byte block 只能用其中一种 mode——编码器要为每个 4×4 块挑最合适那种。
Fig 20a · Overview of BC7's eight internal block modes. Each mode allocates different bit budgets to the endpoints, picks a different partition count (1, 2 or 3 subsets), uses different index widths, and optionally adds p-bits / channel rotation. Blue (mode 0-3) lean toward high-quality RGB; orange (mode 4-7) lean toward RGBA. A single 16-byte block uses exactly one mode — the encoder must pick the best mode per 4×4 block.
PER-BLOCK MODE SELECTION: 4×4 px → try mode 0 → SSE · try mode 1 → SSE · try mode 2 → SSE · … · try mode 7 → SSE → argmin SSE → mode N → 16 byte
图 20b · BC7 的 per-block mode 选择:对每个 4×4 块,编码器枚举 8 种 mode(每种再加上若干分区候选)逐一试编码,计算 SSE(squared error sum),选最低那个,把结果塞进 16 byte block。这就是 BC7 编码慢的根源——每块要试几十到上百次。
Fig 20b · BC7's per-block mode selection: for each 4×4 block the encoder enumerates all 8 modes (each with several partition candidates), trial-encodes each, computes SSE (squared error sum), keeps the lowest, and packs it into the 16-byte block. This is exactly why BC7 encoding is slow — tens to hundreds of trials per block.
BC1 vs BC7 · PSNR (dB) · 纵轴 20-50 dB · 横轴: grass / brick / char / UI / normal · 两组柱: BC1 (4 bpp) vs BC7 (8 bpp)
图 20c · 5 类典型纹理(草地 / 砖墙 / 角色 / UI / 法线)的 PSNR 对比:BC1 vs BC7。BC7 在所有场景全面领先,典型 +8-12 dB(对应视觉质量 5-10× 提升)。代价:BC7 是 8 bpp 而 BC1 是 4 bpp(体积 2×),编码时间 50-200×(详见下图)。
Fig 20c · PSNR comparison across five typical texture categories (grass, brick, character, UI, normals): BC1 vs BC7. BC7 wins everywhere, typically by +8-12 dB (corresponding to a 5-10× visual quality improvement). The cost: BC7 is 8 bpp vs BC1's 4 bpp (2× the bytes) and 50-200× the encode time (see next figure).
BC7 ENCODE TIME (× of BC1): BC1 = 1× · naive brute ~250× · ispc SIMD ~5× · nvtt SIMD ~8× · nvtt CUDA ~2× · comp. SIMD ~6×
图 20d · BC7 在 5 种编码方法下相对 BC1 的编码时间:naive brute-force 8 mode × 64 子分区 ≈ 250×;Intel ISPC SIMD ≈ 5×(开源 ispc_texcomp 是行业救命稻草);NVIDIA nvtt SIMD ≈ 8×;nvtt CUDA GPU 加速 ≈ 2×;AMD Compressonator SIMD ≈ 6×。BC7 编码慢的本质是 per-block 枚举,工程上用 SIMD / GPU 才把它压回可接受水平。
Fig 20d · BC7 encode time relative to BC1 across five encoders: naive brute-force (8 modes × 64 partitions) ≈ 250×; Intel ISPC SIMD ≈ 5× (open-source ispc_texcomp was the industry's lifeline); NVIDIA nvtt SIMD ≈ 8×; nvtt CUDA GPU-accelerated ≈ 2×; AMD Compressonator SIMD ≈ 6×. BC7 is slow at heart because of per-block mode enumeration; SIMD and GPU are what bring it back to acceptable wall time.

技术内核

Technical core

BC7 的设计哲学跟 BC1-5 完全相反——BC1-5 是"一种结构覆盖一类场景",BC7 是"八种结构都做出来,让编码器临时挑"。① 8 种 mode (mode 0-7):每种 mode 内部不同的 (a) 区块切分(1 / 2 / 3 个子区,subsets——把 4×4 块拆成多组,每组独立端点 + 内插,适用于块内有明显颜色边界);(b) endpoint bit 分配(如 mode 1 给端点 6·6·6 高精度,mode 2 给 5·5·5 留更多 bit 给 index);(c) index bit width(2 或 3 或 4 bit,索引位越多越能精细内插);(d) 可选 p-bit(端点末位补一位精度)与 rotation(把 alpha 跟某个颜色通道交换,提升 alpha 精度)。② mode 0-3 偏 RGB 高质量,mode 4-7 偏 RGBA——RGB 模式给颜色更多 bit 但不要 alpha;RGBA 模式拨一些 bit 给 alpha 通道。这种"分工"让 BC7 既能当 BC1 的 RGB 升级,又能当 BC3 的 RGBA 升级,完全覆盖。③ 编码器枚举所有 mode 选最优——每个 4×4 块要对 8 mode × 几十种分区组合 × 端点优化跑一遍,计算 SSE(平方误差和),选 SSE 最低那个塞进 16 byte。这就是 BC7 编码慢的根本原因——典型 8K 纹理用 naive brute-force 要 40 分钟,Intel ISPC SIMD 后降到几秒。④ 8 bpp(同 BC2 / BC3,但视觉质量好 5-10×)——BC1 / BC4 是 4 bpp,BC7 / BC2 / BC3 / BC5 / BC6H 都是 8 bpp。BC7 跟 BC2 / BC3 同 bpp,胜在 mode 选择灵活,典型纹理 PSNR 高 +8-12 dB。⑤ 解码硬件原生——D3D11+ / GL 4.2+ / Vulkan / Metal 全平台支持,GPU sample 一个 BC7 texel 跟 sample 一个 RGBA8 一样快。这是 BC7 比"软件解码 + 上传"格式(如 KTX 装 zlib)的根本优势。

BC7's design philosophy inverts BC1-5: BC1-5 use one structure per scenario, BC7 ships eight structures and lets the encoder pick at runtime. ① 8 modes (mode 0-7), each varying along (a) partitioning (1, 2 or 3 subsets — splitting the 4×4 block into independent regions, useful when there's a sharp colour boundary inside the block); (b) endpoint bit allocation (mode 1 gives endpoints 6·6·6 high precision; mode 2 gives 5·5·5 and donates the saved bits to the index); (c) index bit width (2, 3 or 4 bits — more bits means finer interpolation); (d) optional p-bits (one extra LSB on the endpoints) and rotation (swap alpha with one of the colour channels to boost alpha precision when warranted). ② Mode 0-3 lean toward high-quality RGB; mode 4-7 lean toward RGBA — RGB modes give colour more bits with no alpha, RGBA modes shave bits off colour to fund alpha. That division of labour is what lets BC7 simultaneously upgrade BC1 (RGB) and BC3 (RGBA). ③ The encoder enumerates all modes and picks the optimum — for every 4×4 block it tries 8 modes × tens of partition combinations × endpoint optimisations, scores them by SSE (sum of squared error), and writes the best one into 16 bytes. This is the core reason BC7 encoding is slow: a typical 8K texture needs ~40 minutes with naive brute-force, dropping to seconds with Intel's ISPC SIMD encoder. ④ 8 bpp (the same as BC2 / BC3) with 5-10× better visual quality — BC1 / BC4 are 4 bpp; BC7 / BC2 / BC3 / BC5 / BC6H are all 8 bpp. At equal bpp BC7's mode-selection flexibility wins +8-12 dB PSNR over BC2 / BC3 on typical textures. ⑤ Hardware-native decoding — D3D11+ / GL 4.2+ / Vulkan / Metal all decode BC7 in silicon; sampling a BC7 texel costs the same as sampling RGBA8. That hardware-native sampling is BC7's fundamental advantage over "software-decode + upload" formats like KTX-with-zlib payloads.
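The argmin-by-SSE loop in ③ can be miniaturised. This is a toy sketch, not a real BC7 encoder: each candidate "mode" is collapsed into a single endpoint bit depth, and `quantize` / `trial_encode` / `encode_block` are names invented for illustration; only the try-everything-keep-min-SSE structure mirrors the real encoder.

```python
# Toy model of BC7's per-block search (not real BC7): each candidate "mode"
# is reduced to an endpoint bit depth; the argmin-by-SSE loop is the
# structural point: every 4x4 block repeats this whole search.

def quantize(v, bits):
    # Round an 8-bit value to `bits` of precision, expand back to 8 bit.
    levels = (1 << bits) - 1
    return round(v * levels / 255) * 255 // levels

def trial_encode(block, bits):
    # Stand-in for one mode's trial: endpoints = quantized min/max,
    # a 4-entry interpolated palette, nearest-palette snap per pixel.
    lo, hi = quantize(min(block), bits), quantize(max(block), bits)
    palette = [lo + (hi - lo) * i // 3 for i in range(4)]
    decoded = [min(palette, key=lambda p: abs(p - px)) for px in block]
    sse = sum((a - b) ** 2 for a, b in zip(block, decoded))
    return sse, decoded

def encode_block(block, candidate_bits=(4, 5, 6, 7)):
    # argmin over candidate modes: keep the trial with the lowest SSE.
    return min(trial_encode(block, b) for b in candidate_bits)

pixels = [12, 40, 40, 90, 12, 40, 90, 200, 12, 90, 200, 200, 40, 90, 200, 255]
best_sse, _ = encode_block(pixels)
```

The real encoder's cost multiplies this by partition layouts and endpoint refinement per mode, which is where the 50-200× over BC1 comes from.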

BC7 · PER-BLOCK 8-MODE TRIAL · ARGMIN SSE · 16 BYTE OUT
one block of source: 4×4 RGBA · 16 px · 64 byte raw
mode 0 · 3 subsets · endpt 4·4·4 · idx 3 · p-bit ×6 · RGB → SSE₀
mode 1 · 2 subsets · endpt 6·6·6 · idx 3 · p-bit ×2 · RGB → SSE₁
mode 2 · 3 subsets · endpt 5·5·5 · idx 2 · RGB → SSE₂
mode 3 · 2 subsets · endpt 7·7·7 · idx 2 · p-bit ×4 · RGB → SSE₃
mode 4 · 1 subset · endpt 5·5·5·6 · idx 2 / 3 · rotation · RGBA → SSE₄
mode 5 · 1 subset · endpt 7·7·7·8 · idx 2 / 2 · rotation · RGBA → SSE₅
mode 6 · 1 subset · endpt 7·7·7·7 · idx 4 · p-bit ×2 · RGBA → SSE₆
mode 7 · 2 subsets · endpt 5·5·5·5 · idx 2 · p-bit ×4 · RGBA → SSE₇
argmin: pick min SSE → 16 byte BC7 block
For each 4×4 block, the encoder runs 8 trial encodes (each itself enumerating partitions / endpoint optimisations), scores them by SSE, and writes the winning mode into 16 bytes.

图 20 · BC7 完整编码流程:输入一个 4×4 RGBA 块,编码器并行尝试 8 种 mode(每种 mode 内部还要枚举分区方案 / 端点优化),为每种 mode 算出 SSE(squared error sum),取最低那个,把"哪种 mode + 端点 + index"打包成 16 byte block。整张纹理重复几十万次——这就是 BC7 编码慢的根源,也是 ISPC / CUDA 加速器存在的理由。

Fig 20 · BC7's full encode pipeline: take a 4×4 RGBA block, run trial encodes through all 8 modes (each one in turn enumerates partition layouts and endpoint optimisations), score them by SSE (sum of squared error), pick the lowest, and pack "chosen mode + endpoints + indices" into a 16-byte block. A whole texture repeats this hundreds of thousands of times — exactly why BC7 encoding is slow, and exactly why ISPC / CUDA-accelerated encoders exist.

format     bpp    RGBA           quality   encode time
BC1        4      RGB + 1-bit α  low       1× (baseline)
BC3        8      RGBA           medium
BC7        8      RGBA           high      ~50-200× of BC1
ASTC 4×4   8      RGBA           high+     similar to BC7
ASTC 6×6   3.56   RGBA           medium+   similar to BC7
$ nvtt_export --bc7 in.png -o out.dds            # NVIDIA Texture Tools, GPU-accelerated
$ ispc_texcomp -bc7 in.png out.dds               # Intel SIMD encoder, ~50× faster than naive
$ toktx --encode bc7 out.ktx2 in.png             # wrap into KTX2 (web / WebGPU friendly)
$ texconv -f BC7_UNORM in.png                    # Microsoft DirectXTex CLI
$ Compressonator.exe -fd BC7 in.png out.dds      # AMD Compressonator

适用

USE FOR

  • 桌面 AAA 游戏纹理(角色 / 场景 / UI / 道具,99% 默认)
  • WebGPU 高质量纹理(KTX2 容器封装)
  • 同时需要 RGB 高保真 + alpha 的混合贴图
  • 升级现存 BC1 / BC3 资产以提升画质(同 / 双倍体积)
  • 金属反光 / 高频细节贴图(mode 6 RGBA 单分区 + 4-bit index 表现极佳)
  • Desktop AAA game textures (characters / environments / UI / props — 99 % default)
  • High-quality WebGPU textures (wrapped in KTX2 containers)
  • Mixed maps that need both fidelity-grade RGB and alpha
  • Upgrading existing BC1 / BC3 assets for better quality (same or 2× the bytes)
  • Metallic specular / high-frequency detail (mode 6 — RGBA single subset + 4-bit index — excels)

反适用

AVOID

  • 移动端(用 ASTC,块尺寸更灵活、bpp 可调)
  • HDR 纹理(用 BC6H,BC7 仍是 LDR 0-1)
  • D3D10 及以下的老硬件(BC7 是 D3D11+)
  • 实时编码场景(即便 SIMD 仍比 BC1 慢 5-10×,服务端实时压缩慎用)
  • 单 / 双通道贴图(用 BC4 / BC5 更省空间)
  • Mobile (use ASTC — flexible block sizes, tunable bpp)
  • HDR textures (use BC6H — BC7 is still LDR 0-1)
  • D3D10 or older hardware (BC7 requires D3D11+)
  • Real-time encoding (even SIMD is 5-10× slower than BC1; server-side live compression is risky)
  • Single / dual-channel maps (BC4 / BC5 are more space-efficient)
scope: BC7
APIs: D3D11+ · Vulkan · Metal · OpenGL 4.2+ (BPTC) · WebGPU (texture-compression-bc)
tools: NVIDIA Texture Tools (CUDA) · Intel ISPC ispc_texcomp · AMD Compressonator · Microsoft texconv · KTX-Software toktx
CLI: nvtt_export --bc7 · ispc_texcomp -bc7 · toktx --encode bc7
家族 family: BC1-BC6H(BCn 的统一进化,BC7 是 LDR 集大成)
同期发布 released alongside: BC6H(D3D11,HDR 兄弟,同 8 bpp 互补分工)
移动对应 mobile counterpart: ASTC(在桌面 / 移动并行,BC7 守桌面、ASTC 守移动)

ETC1 — Android 早期标准

ETC1 — the early Android standard

YEAR 2005
AUTHOR Ericsson Research
EXT — (payload, 装在 PKM / KTX 里)
MIME —
STD OES_compressed_ETC1_RGB8_texture · GLES 2.0+ 强制
BLOCK 4×4 / 8 byte = 4 bpp
DEPTH RGB 8-bit
ALPHA — (无 alpha,需要单独的 alpha 贴图)
SAMPLE GPU 硬件原生(全部 OpenGL ES 2.0+ 设备)
STATUS 历史标准 · 已被 ETC2 / ASTC 替代

"OpenGL ES 时代第一个免专利的块压缩。"

"The first patent-free block codec of the OpenGL ES era."

2005 年 Khronos 在为 OpenGL ES 标准化纹理压缩时遇到一个棘手问题——S3TC(BC1-3)效果好但被 S3 Graphics 申请了一堆专利,Khronos 不可能把"必须授权才能用"的格式塞进开放标准。Ericsson Research 提了 ETC1(Ericsson Texture Compression),声明免专利,正好填上空缺,跟着 OpenGL ES 2.0(2007)一起进入 Android 强制基线。Android 从此在游戏纹理上有了统一格式——美术不必为不同 GPU 厂商分别打包,Mali / Adreno / PowerVR / Tegra 全都能解 ETC1。代价是 ETC1 没有 alpha 通道,任何带透明度的资产(UI 图标、粒子、角色边缘)都要拆成"RGB 用 ETC1 + alpha 用 8-bit 灰度图"两份纹理上传——显存和带宽都要付双份钱。这是 ETC2 在 2013 年出生的根本原因。但回到 2005,免专利 + GLES 2.0 强制 = ETC1 一夜之间成了 Android 游戏纹理事实标准。Angry Birds(2009)、Cut the Rope(2010)这一代手机游戏的纹理资产几乎全是 ETC1。

In 2005, while Khronos was standardising texture compression for OpenGL ES, it ran into a thorny problem — S3TC (BC1-3) worked beautifully but was wrapped in patents owned by S3 Graphics, and an open standard couldn't mandate "must license to use" formats. Ericsson Research proposed ETC1 (Ericsson Texture Compression), declared it patent-free, and it slotted neatly into the gap, riding alongside OpenGL ES 2.0 (2007) into the Android mandatory baseline. Suddenly Android had a single texture format every artist could ship — no need to repackage per vendor, since Mali, Adreno, PowerVR and Tegra all decoded ETC1. The price was that ETC1 had no alpha channel, so anything translucent (UI icons, particles, character edges) had to be split into "RGB as ETC1 + alpha as an 8-bit greyscale map" — two texture uploads, double the VRAM and bandwidth. That is exactly why ETC2 was born in 2013. But back in 2005, patent-free + GLES 2.0 mandatory equals ETC1 becoming the de-facto Android texture standard overnight. Angry Birds (2009) and Cut the Rope (2010) — that generation of mobile games — shipped almost their entire texture base in ETC1.

ETC1 · 4×4 BLOCK = TWO 2×4 HALVES
4×4 px = 2×4 + 2×4
UPPER 2×4 HALF: base RGB444 + modifier id (3 bit) + 2-bit/px index (8 px)
LOWER 2×4 HALF: base RGB444 + modifier id (3 bit) + 2-bit/px index (8 px)
8 modifier rows · pick 1: id 0 ±2 / ±8 · id 1 ±5 / ±17 · id 2 ±9 / ±29 · … · id 7 ±47 / ±183
TOTAL · 8 byte/块: 2 × (12 bit base + 3 bit mod) + diff bit + flip bit + 16 × 2 bit index = 64 bit
图 21 · ETC1 把一个 4×4 块切成上下两半(各 2×4),每半独立存一个 RGB444 base color(12 bit)+ 一个 3-bit modifier id(从 8 行预设表里挑一行)+ 每像素 2-bit index(在 modifier 的 4 个偏移值里选一个)。整块 8 byte = 4 bpp,跟 BC1 同体积。但这个结构没有 alpha 通道——任何带透明度的资产都要单独再上传一张 alpha 贴图,显存和带宽要付双份。
Fig 21 · ETC1 splits a 4×4 block into two 2×4 halves; each half independently stores an RGB444 base colour (12 bits) + a 3-bit modifier id (one row picked out of an 8-row preset table) + a 2-bit per-pixel index (one of the modifier's four offset values). The whole block is 8 bytes = 4 bpp — the same footprint as BC1. But this structure has no alpha channel, so any translucent asset has to upload a second alpha texture, doubling VRAM and bandwidth.

技术内核

Technical core

ETC1 的设计是"把 BC1 的思路换一种几何切分,绕开专利"。① 4×4 块切两半——不像 BC1 把 4×4 当整体处理,ETC1 把块切成上下 2×4 或左右 4×2 两半(块头有 1 bit 标记 flip 方向),每半独立有自己的颜色 base + modifier。这是 ETC1 跟 BCn 最大的几何差异——BCn 块是统一的 16 像素插值,ETC1 是两组 8 像素插值。② RGB444 base + 8 行 modifier 表——每半的 base color 只有 12 bit(RGB444),精度比 BC1 的 RGB565 还低;但靠 modifier 表补救——3 bit 选 8 行预设里的一行,每行给出 4 个亮度偏移值(如 ±2 / ±8 这种"小幅"组,或 ±47 / ±183 这种"大幅"组),覆盖从平滑渐变到硬边缘的不同需求。③ 2-bit/像素 index——每像素再用 2 bit 选 modifier 行里的 4 个偏移值之一,加到 base color 上得到最终颜色。换言之 ETC1 的颜色计算是"base ± modifier",只在亮度方向上调,色相不变——这意味着 ETC1 处理彩色高频细节(花布、彩色噪点)很差,但处理"单色平滑+亮度变化"(皮肤、墙面、地形)很好。④ 没有 alpha——这是 ETC1 最致命的局限。Android 游戏的解决方案是"双纹理上传":RGB 用 ETC1,alpha 用单通道 8-bit 灰度图(或 ETC1 的另一个块当 alpha 用,叫 ETC1+A 的 hack)。⑤ 每块 8 byte / 16 像素 = 4 bpp——跟 BC1 同体积。质量略差于 BC1(因为色相方向死板),但免专利 = 能强制进 GLES 标准,这是 BC1 做不到的。

ETC1's design is "use a different geometric split from BC1 to dodge the patents". ① 4×4 block split into two halves — unlike BC1, which treats the 4×4 as one unit, ETC1 splits the block into two 2×4 halves (or two 4×2 halves; the block header carries a single flip bit). Each half independently owns its colour base + modifier. That's the biggest geometric difference from BCn: BCn's block is one 16-pixel interpolation; ETC1's is two 8-pixel interpolations. ② RGB444 base + an 8-row modifier table — each half's base colour is only 12 bits (RGB444), even less precise than BC1's RGB565; the modifier table makes up the difference. Three bits pick one of 8 preset rows, each row carrying four brightness offsets (a "fine" set like ±2 / ±8, a "coarse" set like ±47 / ±183), covering everything from smooth gradients to hard edges. ③ 2 bits per pixel for the index — each pixel picks one of the four offsets in the chosen modifier row and adds it to the base colour, producing the final value. ETC1's colour math is therefore "base ± modifier" — adjustment only along brightness, never along hue. That makes ETC1 poor on coloured high-frequency detail (patterned cloth, coloured noise) and excellent on monochrome-plus-brightness signals (skin, walls, terrain). ④ No alpha — ETC1's most fatal limitation. The Android workaround was the "two-texture upload": RGB as ETC1, alpha as a single-channel 8-bit greyscale map (or a second ETC1 block reused as alpha — the "ETC1+A" hack). ⑤ 8 bytes per 16-pixel block = 4 bpp, the same footprint as BC1. Quality lags BC1 slightly (because hue can't move) but ETC1 is patent-free, which lets it become a mandatory part of the GLES standard — something BC1 could never be.
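A one-texel decode makes the base-plus-modifier arithmetic concrete. A minimal sketch with our own function names, showing four of the spec's eight modifier rows (values taken from the ETC1 table):

```python
# ETC1 texel decode sketch: RGB444 base + one modifier row (3-bit id in the
# real format) + a 2-bit per-pixel index. Four of the eight spec rows shown.

MODIFIER_ROWS = [
    (-8, -2, 2, 8),        # id 0, fine: smooth gradients
    (-17, -5, 5, 17),      # id 1
    (-29, -9, 9, 29),      # id 2
    (-183, -47, 47, 183),  # id 7, coarse: hard edges
]

def expand444(r4, g4, b4):
    # RGB444 -> RGB888 by bit replication: 0xA -> 0xAA.
    return tuple((c << 4) | c for c in (r4, g4, b4))

def etc1_texel(base444, row, index):
    # base ± modifier: one offset is added to all three channels, so only
    # brightness moves; hue is frozen. Clamp to [0, 255].
    off = MODIFIER_ROWS[row][index]
    return tuple(max(0, min(255, c + off)) for c in expand444(*base444))

# Mid-grey base with the coarse row: index 0 slams to black, index 3 to white.
etc1_texel((8, 8, 8), 3, 0)   # → (0, 0, 0)
etc1_texel((8, 8, 8), 3, 3)   # → (255, 255, 255)
```

The fixed-hue weakness falls straight out of the last function: every channel gets the same offset.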

适用

USE FOR

  • (历史) OpenGL ES 2.0 时代 Android 游戏纹理
  • (历史) Android 4.x / 5.x 时代不带 alpha 的资产(地形、天空盒、道具背景)
  • 极少数仍需要兼容 OpenGL ES 2.0 设备的旧游戏维护
  • (historical) OpenGL ES 2.0-era Android game textures
  • (historical) Android 4.x / 5.x assets without alpha (terrain, skyboxes, prop backgrounds)
  • The rare modern case of maintaining a legacy game that still ships to GLES 2.0 devices

反适用

AVOID

  • 任何现代项目(用 ETC2 / ASTC 替代)
  • 需要 alpha 的纹理(ETC1 根本不支持)
  • 桌面 / 主机(用 BC7)
  • 彩色高频纹理(ETC1 色相方向死板,效果差)
  • Any modern project (use ETC2 or ASTC instead)
  • Textures that need alpha (ETC1 simply doesn't support it)
  • Desktop / console (use BC7)
  • Coloured high-frequency textures (ETC1's fixed-hue model handles them badly)
scope: ETC1
APIs: OpenGL ES 2.0+(强制) · OpenGL 4.3+(ARB_ES3_compatibility) · ~ Vulkan(扩展) · D3D / Metal ✗
tools: Khronos etc1tool · Mali Texture Compression Tool · ImageMagick · Unity / Unreal 早期内置
CLI: etc1tool in.png --encode -o out.pkm · etc2comp -format ETC1 in.png -o out.ktx
受启发于 inspired by: S3TC / BC1(免专利仿制,绕开 S3 Graphics 的专利墙)
直接继承 direct successor: ETC2 / EAC(在 ETC1 基础上加 alpha 通道)
同期对手 contemporary rival: PVRTC(PowerVR 私有方案,iOS 阵营)

ETC2 / EAC — alpha 加成

ETC2 / EAC — adding alpha

YEAR 2013
AUTHOR Khronos / Ericsson Research
EXT — (payload, 装在 KTX 里)
MIME —
STD GL_COMPRESSED_RGB8_ETC2 / RGBA8_ETC2_EAC · GLES 3.0+ 强制
BLOCK 4×4 / RGB8 = 8 byte (4 bpp) · RGBA8 = 16 byte (8 bpp)
DEPTH RGB / RGBA 8-bit · R11 / RG11 单/双 11-bit
ALPHA 11-bit EAC alpha block(高精度)
SAMPLE GPU 硬件原生(全部 OpenGL ES 3.0+ 设备)
STATUS Android 基线纹理格式 · Vulkan 移动端主流

"ETC1 加上 alpha 通道,正好赶上 OpenGL ES 3.0。"

"ETC1 with alpha — just in time for OpenGL ES 3.0."

ETC1 在 Android 上跑了 6 年(2007-2013),但"没有 alpha"这个缺陷越用越疼。任何带透明度的资产——UI 图标、HUD、粒子系统、抠图角色——都要拆成两份纹理上传:RGB 用 ETC1(4 bpp),alpha 用 8-bit 灰度图(8 bpp),合计 12 bpp,显存和带宽是单纹理的 3 倍。手机游戏的 UI 又特别多透明元素,这个负担实打实地让中低端 Android 设备跑不动。2013 年 Khronos 正式推 ETC2 / EAC——保持向下兼容(老的 ETC1 块在 ETC2 解码器里能直接用),同时加入 RGBA 模式(ETC2 RGB 块 + EAC alpha 块,共 16 byte = 8 bpp)。ETC2 还顺手补齐了 R11 / RG11 单/双通道格式(对应桌面的 BC4 / BC5,用于法线贴图、roughness 等),让移动端也有了完整的"通道拆分"工具箱。最重要的政治决定:Khronos 把 ETC2 定成 OpenGL ES 3.0 的强制基线——任何宣称支持 GLES 3.0 的 GPU 都必须解码 ETC2。这意味着 2014 年之后的 Android 游戏可以放心地"全资产 ETC2",不再需要为"老设备没 ETC2"留 fallback。Unity / Unreal 在 2014 年都把 Android 默认纹理改成了 ETC2。

ETC1 ran on Android for six years (2007-2013), but "no alpha" hurt more every year. Anything translucent — UI icons, HUDs, particle systems, alpha-masked characters — had to upload two textures: RGB as ETC1 (4 bpp) plus alpha as an 8-bit greyscale map (8 bpp), 12 bpp combined and roughly 3× the bandwidth of a single texture. Mobile UI is unusually heavy on translucent elements, and that overhead measurably broke mid- to low-end Android devices. In 2013 Khronos shipped ETC2 / EAC: keep ETC1 backward compatibility (legacy ETC1 blocks decode unchanged in an ETC2 decoder) and add an RGBA mode (an ETC2 RGB block + an EAC alpha block, 16 bytes = 8 bpp total). ETC2 also rounded out single- and dual-channel formats with R11 / RG11 (the mobile counterparts to desktop BC4 / BC5 — normals, roughness, etc.), giving mobile its own full "channel-split" toolbox. The crucial political decision: Khronos made ETC2 a mandatory baseline for OpenGL ES 3.0. Any GPU that claims GLES 3.0 support must decode ETC2. Post-2014 Android games could finally ship all-ETC2 with no "device might not have ETC2" fallback. Unity and Unreal both flipped their Android default to ETC2 in 2014.

ETC2 NEW MODES + EAC ALPHA BLOCK
RGB BLOCK · 4 模式: ETC1 base · T-mode · H-mode · Planar(T/H 处理硬边 · Planar 处理平滑渐变)
RGBA8 · 16 byte/块 = 8 + 8: ETC2 RGB block · 8 byte + EAC alpha block · 8 byte(11-bit alpha 高精度)
+ R11 / RG11 单/双通道(取代 BC4/BC5 的角色)
图 22 · ETC2 在 ETC1 的"上下两半 base + modifier"基础上新增 T-mode / H-mode / Planar mode 三种块布局——T/H 处理块内有硬边的情况,Planar 处理平滑渐变。RGBA8 模式则把一块拆成"前 8 byte ETC2 RGB block + 后 8 byte EAC alpha block",alpha 用 11-bit 高精度,共 16 byte = 8 bpp。EAC 还独立支持 R11 / RG11 单/双通道格式,对应桌面的 BC4 / BC5。
Fig 22 · On top of ETC1's "two halves with base + modifier", ETC2 adds three new block layouts — T-mode, H-mode and Planar mode. T and H handle blocks containing hard colour edges; Planar handles smooth gradients. The RGBA8 mode splits each block into "8 bytes of ETC2 RGB + 8 bytes of EAC alpha", with alpha quantised at 11-bit high precision — 16 bytes total = 8 bpp. EAC also supports standalone R11 / RG11 single- and dual-channel formats, the mobile counterparts to desktop BC4 / BC5.

技术内核

Technical core

ETC2 的设计哲学是"在 ETC1 上做加法,不做减法"——所有 ETC1 块在 ETC2 解码器里都能正常工作(向下兼容),新增的能力都通过"block 头部模式位"切换。① RGB 块 4 种模式:(a) ETC1 兼容模式(老的"两半 base + modifier" 结构,8 byte);(b) T-mode(把 4×4 块按 T 形分成两个颜色区,适合块内有 L 形 / T 形硬边);(c) H-mode(把块按 H 形分两区,适合垂直硬边);(d) Planar mode(用三个角点的颜色定义平面,块内每像素从平面采样,适合平滑渐变如皮肤、天空)。每块的头部 1 bit 指明用哪种模式,编码器为每块挑最优。② RGBA8 = ETC2 RGB block + EAC alpha block——一块 16 byte,前 8 byte 是 ETC2 RGB,后 8 byte 是 EAC(Ericsson Alpha Compression)alpha 块。EAC alpha 块用 8 个端点 + 内插值 + 3-bit/像素 index,提供 11-bit 等效精度,远高于 BC3 alpha 块的 8-bit。③ R11 / RG11——独立的单/双 11-bit 通道格式,对应桌面的 BC4 / BC5,用于法线贴图(RG11)、高度图 / roughness(R11)等。R11 是 8 byte/块 = 4 bpp,RG11 是 16 byte/块 = 8 bpp。④ punch-through alpha——一种特殊模式叫 ETC2 RGBA1(RGB8_PUNCHTHROUGH_ALPHA1_ETC2),只允许 alpha = 0 或 255 的硬切边(像 BC1 的 1-bit alpha),用于树叶、栅栏这种"完全透明 / 完全不透"的资产,体积仍是 4 bpp。⑤ OpenGL ES 3.0 强制 = 不需要 fallback——这是 ETC2 最大的工程优势。BC1-7 在桌面是"硬件支持但要查 capability",ETC2 在 Android GLES 3.0+ 是"必然存在"。Unity / Unreal 因此在 2014 年果断把 Android 默认纹理改成 ETC2。

ETC2's design philosophy is "add to ETC1, never subtract" — every ETC1 block decodes correctly in an ETC2 decoder (backward compatibility), and new capabilities are gated behind block-header mode bits. ① The RGB block has four modes: (a) ETC1-compatible (the legacy "two halves, base + modifier" structure, 8 bytes); (b) T-mode (the 4×4 block split into two colour regions in a T shape — handy for blocks with L- or T-shaped hard edges); (c) H-mode (split into two regions in an H shape — for vertical hard edges); (d) Planar mode (three corner colours define a plane, every pixel is sampled from that plane — for smooth gradients like skin and sky). One bit in the block header chooses the mode; the encoder picks per-block. ② RGBA8 = ETC2 RGB block + EAC alpha block — 16 bytes per block: the first 8 are ETC2 RGB, the next 8 are EAC (Ericsson Alpha Compression). The EAC alpha block carries 8 endpoints + interpolated values + 3-bit per-pixel index, delivering an effective 11-bit precision — far above BC3 alpha's 8-bit. ③ R11 / RG11 — standalone single- and dual-channel 11-bit formats, the mobile counterparts to desktop BC4 / BC5, used for normal maps (RG11), height / roughness maps (R11), etc. R11 is 8 bytes per block = 4 bpp; RG11 is 16 bytes = 8 bpp. ④ Punch-through alpha — a special mode called ETC2 RGBA1 (RGB8_PUNCHTHROUGH_ALPHA1_ETC2) only allows alpha = 0 or 255 hard cut-outs (like BC1's 1-bit alpha), targeted at foliage / fences / "fully on or fully off" assets at 4 bpp. ⑤ OpenGL ES 3.0 mandatory = no fallback needed — and that is ETC2's biggest engineering advantage. On desktop, BC1-7 are "hardware-supported but capability-checked"; on Android with GLES 3.0+, ETC2 is guaranteed to exist. That is exactly why Unity and Unreal flipped the Android default to ETC2 in 2014.
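The EAC alpha arithmetic in ② can be sketched per texel, assuming the base-plus-scaled-modifier form of the decode (alpha = clamp(base + multiplier × row[selector])). The row values below are illustrative rather than the spec table, and `eac_alpha` is our own name:

```python
# EAC alpha texel sketch: alpha = clamp(base + multiplier * row[selector]).
# In the real block: an 8-bit base codeword + a 4-bit multiplier + a row id
# + one 3-bit selector per pixel. Row values here are illustrative only.

SAMPLE_ROW = (-3, -6, -9, -15, 2, 5, 8, 14)  # 8 entries for the 3-bit selector

def eac_alpha(base, multiplier, selector):
    return max(0, min(255, base + multiplier * SAMPLE_ROW[selector]))

eac_alpha(128, 4, 7)  # 128 + 4*14 = 184
```

The multiplier is what stretches one small modifier row across both subtle alpha ramps and steep ones.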

适用

USE FOR

  • Android 游戏纹理 / OpenGL ES 3.0+ 全部资产(默认选择)
  • 需要兼容老 Android 设备但又想要 alpha 的项目
  • Vulkan 移动端纹理(广泛支持)
  • R11 / RG11 用于移动端法线贴图、roughness、高度图
  • punch-through alpha 用于树叶 / 栅栏 / UI 硬边切边资产
  • Android game textures / OpenGL ES 3.0+ assets (the default choice)
  • Projects that need to support older Android devices yet still ship alpha
  • Vulkan mobile textures (broadly supported)
  • R11 / RG11 for mobile normal maps, roughness, height maps
  • Punch-through alpha for foliage / fences / hard-edged UI cut-out assets

反适用

AVOID

  • 桌面 / 主机(用 BC7,质量更高且行业标准)
  • 新硬件移动端(用 ASTC,块尺寸灵活、bpp 可调)
  • iOS(Apple 设备只支持到 iOS 13;后续要 ASTC)
  • HDR 纹理(ETC2 仍是 LDR 0-1)
  • Desktop / console (use BC7 — higher quality and the industry standard)
  • Newer mobile hardware (use ASTC — flexible block sizes, tunable bpp)
  • iOS (Apple devices only supported it through iOS 13; everything later wants ASTC)
  • HDR textures (ETC2 is still LDR 0-1)
scope: ETC2 / EAC
APIs: OpenGL ES 3.0+(强制) · OpenGL 4.3+(ARB_ES3_compatibility) · Vulkan(VK_FORMAT_ETC2_*) · ~ Metal(iOS 13 之前) · D3D 原生 ✗
tools: Google etc2comp(开源,SIMD 加速) · Mali Texture Compression Tool · Compressonator · Unity / Unreal 内置
CLI: etc2comp -format RGBA8 in.png -o out.ktx · toktx --encode etc2 out.ktx2 in.png
直接父辈 direct parent: ETC1(向下兼容,所有 ETC1 块在 ETC2 解码器里直接可用)
同代但路径不同 contemporary but different track: BC7(D3D11 桌面方向)
在新硬件被替代 superseded on newer hardware: ASTC(更灵活的块尺寸 + HDR 支持)

PVRTC — Apple 早期独占

PVRTC — Apple's early proprietary lock-in

YEAR 2003
AUTHOR Imagination Technologies (PowerVR)
EXT .pvr · 也装在 KTX 里
MIME —
STD IMG_texture_compression_pvrtc · OES_compressed_paletted_texture(早期)
BLOCK 8×4 (2 bpp 模式) · 4×4 (4 bpp 模式) / 8 byte
DEPTH RGBA 8-bit · PVRTC2 加 punch-through alpha
ALPHA 原生 alpha 通道
SAMPLE PowerVR GPU 硬件原生 · 其他 GPU 不支持
STATUS iOS 老设备唯一 · A10 后被 ASTC 取代

"PowerVR 的私有方案,iPhone 一代到 7 代的纹理本命。"

"PowerVR's proprietary scheme — texture-of-life for iPhone 1 through 7."

PVRTC 的诞生跟一个非常特定的硬件架构绑定:Imagination Technologies 的 PowerVR GPU 用的是 TBDR(Tile Based Deferred Rendering,基于瓦片的延迟渲染)——把屏幕切成小瓦片(典型 32×32 像素),每个瓦片独立渲染、合成,显著省功耗(手机的核心需求)。问题是 TBDR 处理瓦片时,纹理 sample 经常跨瓦片边界,如果纹理压缩格式是"块独立"的(像 BC1 / ETC1 那种,每个 4×4 块独立解码),瓦片边界处会出现明显的"块状不连续"(blocky artifact)。Imagination 在 2003 年提出 PVRTC 解决这个问题:不存"每块独立的颜色",而是存两层"低分辨率的颜色信号" + 一个"调制信号"——运行时 GPU 在采样点对两层信号做双线性插值,然后用调制信号在两个插值结果之间混合。这样块之间天然连续,没有边界 artifact——完美适配 TBDR。代价是 PVRTC 是私有格式,只有 PowerVR GPU 能解。但 Apple iPhone 1(2007)到 iPhone 7(2016)全部用 PowerVR GPU,所以 PVRTC 是 iOS 游戏的唯一标准纹理格式近十年。Infinity Blade、Real Racing、Monument Valley 一代游戏的纹理资产基本全是 PVRTC。iPhone 8(A11,2017)改用 Apple 自研 GPU,默认 ASTC,PVRTC 进入历史。

PVRTC's birth is tied to one very specific hardware architecture: Imagination Technologies' PowerVR GPUs use TBDR (Tile Based Deferred Rendering), which slices the screen into small tiles (typically 32×32 pixels), renders and composites each tile independently, and saves significant power — the core mobile requirement. The trouble is that during tile processing, texture samples regularly cross tile boundaries; if the texture format is "block independent" (like BC1 / ETC1, each 4×4 block decoded in isolation), tile boundaries grow visible "blocky" artifacts. In 2003 Imagination proposed PVRTC to solve this. Instead of storing "independent colour per block", PVRTC stores two layers of low-resolution colour signals plus a modulation signal — at sample time the GPU bilinearly interpolates both colour layers, then blends the two interpolated results using the modulation signal. Blocks are naturally continuous across boundaries — no block artifacts, a perfect TBDR fit. The price is that PVRTC is proprietary, decodable only on PowerVR GPUs. But every iPhone from the iPhone 1 (2007) through the iPhone 7 (2016) shipped with a PowerVR GPU, so PVRTC was the de facto sole texture standard on iOS for nearly a decade. Infinity Blade, Real Racing and Monument Valley — that generation of iOS games — basically shipped their entire texture base as PVRTC. The iPhone 8's A11 (2017) switched to Apple's own GPU, defaulting to ASTC, and PVRTC slid into history.

PVRTC · 2 LOW-RES + MODULATION → PIXEL
SIGNAL A · 1/4 res RGB · 稀疏色块网格 → bilerp
SIGNAL B · 1/4 res RGB · 稀疏色块网格 → bilerp
MODULATION · 灰度 (1-2 bit/px) · 每像素 mod 值 = mix 比例
pixel(x,y) = mix( bilerp(A, x, y), bilerp(B, x, y), mod(x, y) )
FINAL · 全分辨率 · 块边界天然连续
2 bpp 模式 · 块 8×4 / 4 bpp 模式 · 块 4×4 · 完美适配 TBDR · PowerVR GPU 独占
图 23 · PVRTC 不像 BCn / ETCn 那样存"每块独立颜色",而是存两个 1/4 分辨率的 RGB 信号(A、B)+ 一个调制信号 mod。运行时 GPU 在采样点对 A、B 各做双线性插值,得到两个候选颜色,再用 mod 在两者之间混合得到最终像素。这种结构让块之间天然连续——块边界没有 BCn 那种"blocky" artifact,完美适配 PowerVR 的 TBDR(基于瓦片的延迟渲染)架构。块尺寸 8×4(2 bpp)或 4×4(4 bpp)双精度档可选。
Fig 23 · Unlike BCn / ETCn, PVRTC does not store "independent colour per block". Instead it stores two quarter-resolution RGB signals (A, B) plus a modulation signal mod. At sample time the GPU bilinearly interpolates A and B independently, producing two candidate colours, and then uses mod to blend the two into the final pixel. This structure makes block boundaries naturally continuous — no BCn-style "blocky" artifacts, perfectly suited to PowerVR's TBDR (tile-based deferred rendering). Block sizes are 8×4 (2 bpp) or 4×4 (4 bpp), two precision tiers.

技术内核

Technical core

PVRTC 的技术结构跟 BCn / ETCn 完全是另一条思路——它不做"每块独立解码",而是用"全图低分辨率信号 + 调制图"的方案。① 两个低分辨率 RGB 层 + 一个调制层——记原图分辨率 W×H,PVRTC 把它编码为:(a) 信号 A,分辨率 (W/4)×(H/4)(2 bpp 模式)或 (W/4)×(H/2)(4 bpp 模式),每个采样点存 RGB 端点;(b) 信号 B,跟 A 同分辨率,存另一组 RGB 端点;(c) 调制信号 mod,跟原图同分辨率,每像素 1 bit(2 bpp 模式)或 2 bit(4 bpp 模式)指明 A 和 B 的混合比例。② 采样时的实际运算:GPU 对 A、B 各自做双线性插值得到 colorA、colorB,再用 mod 在两者之间混合。这不是块独立——同一个像素的颜色受周围 4 个 A 端点 / 4 个 B 端点的影响,块边界因此天然平滑过渡。③ 块尺寸 8×4(2 bpp)或 4×4(4 bpp)——两种码率档:2 bpp 是 8×4 块 / 8 byte = 2 bpp(注意是 8 byte/块,跟 BC1 同 byte 数但块更大,所以 bpp 减半),4 bpp 是 4×4 块 / 8 byte。④ 原生 alpha——比 ETC1 强,能直接装 RGBA 数据(虽然质量略差于 BC3 / BC7)。⑤ "分辨率必须是 2 的幂 + 正方形 + ≥8×8"——PVRTC v1 的硬限制。这个限制源于"信号 A、B 必须能均匀采样到原图所有像素"。PVRTC2(2009)放宽了这个限制(支持任意宽高 + punch-through alpha),但 PVRTC2 的硬件支持远不如 v1 普及。⑥ PowerVR 独占解码硬件——这是 PVRTC 同时是它的优势和坟墓。优势:iPhone 1-7 全部 PowerVR,PVRTC 在 iOS 游戏里是"必然支持";坟墓:其他 GPU 不解 PVRTC,Android 设备完全用不了,跨平台游戏要分别打包 PVRTC(iOS)+ ETC2(Android)两份纹理。Apple 在 A10(2017)改用自研 GPU(基于 Imagination 但有改造),iOS 11+ 推荐 ASTC 后,PVRTC 就停止发展了。Imagination Technologies 也在 2017 年因 Apple 流失被收购,PVRTC 实际上跟着公司一起进入历史。

PVRTC's technical structure is on a completely different track from BCn / ETCn — it doesn't do "decode each block in isolation"; it uses "global low-resolution signals + a modulation map." ① Two low-resolution RGB layers plus one modulation layer — given source resolution W×H, PVRTC encodes: (a) signal A at (W/4)×(H/4) (2 bpp mode) or (W/4)×(H/2) (4 bpp mode), each sample storing an RGB endpoint; (b) signal B, same resolution as A, holding another RGB endpoint set; (c) modulation signal mod, at the source's full resolution, with 1 bit/pixel (2 bpp mode) or 2 bits/pixel (4 bpp mode) specifying the blend ratio between A and B. ② Sample-time arithmetic: the GPU bilinearly interpolates A and B independently to produce colourA and colourB, then uses mod to blend them. This is not block-independent — a single pixel's colour depends on the surrounding 4 A endpoints + 4 B endpoints, so block boundaries transition smoothly by construction. ③ Block size 8×4 (2 bpp) or 4×4 (4 bpp) — two bitrate tiers. The 2 bpp variant uses 8×4 blocks at 8 bytes per block (note: same bytes-per-block as BC1, but the block is larger, so bpp halves); the 4 bpp variant is 4×4 blocks at 8 bytes. ④ Native alpha — stronger than ETC1, can carry RGBA directly (though with somewhat lower quality than BC3 / BC7). ⑤ "Power-of-two, square, ≥ 8×8" — PVRTC v1's hard requirement, rooted in the need for signals A and B to sample uniformly onto every source pixel. PVRTC2 (2009) relaxed this (arbitrary aspect ratios + punch-through alpha), but PVRTC2 hardware support never reached v1's ubiquity. ⑥ PowerVR-exclusive decode hardware — both PVRTC's strength and its tomb. The strength: every iPhone 1-7 had a PowerVR GPU, so PVRTC was guaranteed-supported on iOS. The tomb: no other GPU decodes PVRTC, so Android couldn't use it at all, and cross-platform games had to ship two texture builds — PVRTC (iOS) + ETC2 (Android). 
When the iPhone 8's A11 (2017) moved to in-house GPUs (originally Imagination-derived, but heavily modified), and iOS 11+ recommended ASTC, PVRTC stopped evolving. Imagination Technologies itself was acquired in 2017 after losing the Apple business; PVRTC effectively went into the history books with the company.
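The sample-time math in ② is just two bilinear fetches plus a mix. A single-channel sketch with our own function names, using the 4 bpp mode's 4× upscale factor:

```python
# PVRTC sample-time math, single channel: bilinearly upsample two sparse
# signals A and B, then blend per pixel with a modulation value in [0, 1].
# Nothing here is block-independent, hence no blocky seams.

def bilerp(grid, x, y):
    # grid: 2D list sampled at integer coords; (x, y) continuous, clamped.
    x0, y0 = int(x), int(y)
    x1 = min(x0 + 1, len(grid[0]) - 1)
    y1 = min(y0 + 1, len(grid) - 1)
    fx, fy = x - x0, y - y0
    top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
    bot = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
    return top * (1 - fy) + bot * fy

def pvrtc_pixel(A, B, mod, x, y, scale=4):
    # pixel(x, y) = mix(bilerp(A), bilerp(B), mod(x, y))
    a = bilerp(A, x / scale, y / scale)
    b = bilerp(B, x / scale, y / scale)
    return a * (1 - mod) + b * mod

A = [[0, 0], [0, 0]]          # dark low-res signal
B = [[255, 255], [255, 255]]  # bright low-res signal
pvrtc_pixel(A, B, 0.5, 2, 2)  # halfway blend → 127.5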

适用

USE FOR

  • (历史) iPhone 1-7 / iPad 第一代到 Pro A9 的 iOS 游戏
  • (历史) 老 Android PowerVR 设备(MX 系列芯片)
  • 仍需要兼容到 iOS 9-10 的旧游戏维护
  • 需要"块边界天然连续"的特殊场景(罕见)
  • (historical) iOS games on iPhone 1-7 / first-gen iPad through iPad Pro A9
  • (historical) Older Android PowerVR devices (MX-series SoCs)
  • Maintaining a legacy game that still ships to iOS 9-10
  • The rare niche that genuinely needs "natively continuous block boundaries"

反适用

AVOID

  • 现代 iOS 项目(用 ASTC,Apple Silicon 默认)
  • 任何非 PowerVR GPU 设备(Android Mali / Adreno / Tegra,完全不解)
  • 跨平台游戏(双轨打包成本高,统一用 ASTC + ETC2 fallback)
  • 非 2 的幂 / 非正方形纹理(PVRTC v1 硬限制)
  • Modern iOS projects (use ASTC, Apple Silicon's default)
  • Any non-PowerVR GPU device (Android Mali / Adreno / Tegra simply can't decode it)
  • Cross-platform games (the dual-track packaging cost is high — unify on ASTC + ETC2 fallback)
  • Non-power-of-two or non-square textures (a hard PVRTC v1 limitation)
scope: PVRTC v1 / v2
APIs: PowerVR GPU(iOS A4-A9, 部分老 Android) · ~ Vulkan iOS 兼容层 · 其他 GPU ✗
tools: Imagination PVRTexTool(GUI + CLI) · texconv 不支持 · Unity / Unreal 早期 iOS 默认
CLI: PVRTexToolCLI -i in.png -o out.pvr -f PVRTC1_4_RGB · PVRTexToolCLI -f PVRTC1_2_RGBA -i in.png -o out.pvr
私有起源 proprietary origin: PowerVR 私有研发(1999-2003,跟 TBDR 架构同源)
同期对手 contemporary rival: ETC1(开放路线,Android 阵营)
Apple 抛弃后接班 Apple's chosen successor: ASTC(Khronos 标准,A10 后默认)

ASTC — 可变块大小的现代之王

ASTC — the modern king of variable block size

YEAR 2012
AUTHOR ARM + Khronos
EXT .astc · 装在 KTX2 里
MIME —
STD KHR_texture_compression_astc_ldr / _hdr · Vulkan VK_FORMAT_ASTC_*
BLOCK 4×4 ~ 12×12(14 档)/ 16 byte
BPP 8 bpp ~ 0.89 bpp 灵活档
DEPTH LDR / HDR 双 profile
ALPHA 原生 alpha 通道(独立权重平面)
SAMPLE GPU 硬件原生
STATUS Vulkan / OpenGL ES 3.2 强制 · Apple A8+ · WebGPU 可选 feature

"4×4 还是 12×12?同一个格式,你自己挑。"

"4×4 or 12×12? Same format — you choose."

BCn / ETC2 都是固定 4×4 块,bpp 永远 4 或 8——只能"全图统一档位"。但游戏里的纹理质量需求从来不是一档的:UI 图标、角色脸需要高质量(密集块、高 bpp);远处地形、天空盒可以低质量(稀疏块、低 bpp)。美术希望一种格式同时支持"质量/体积"光谱滑块——同样一个文件结构,从 8 bpp 一路滑到 1 bpp。ARM 主导设计 ASTC(Adaptive Scalable Texture Compression),提供 14 种块大小(4×4 至 12×12),bpp 从 8 降到 0.89——同一格式覆盖近 9× 体积范围。Khronos 在 2012 年通过标准化(GLES 3.2 强制 + Vulkan 默认 + Apple A8 起原生支持),ASTC 成为现代移动 + WebGPU 的事实之王。BC7 守桌面、ASTC 守移动——这是 GPU 纹理压缩 2010 年代后的两强格局。

BCn / ETC2 are fixed at 4×4 blocks; bpp is locked at 4 or 8 — every texture in a project must pick one tier for the whole image. But real game textures need a spectrum: UI icons and character faces want high quality (dense blocks, high bpp), while distant terrain and skyboxes can run low quality (sparse blocks, low bpp). Artists want one format that exposes a quality / size dial — the same file structure sliding from 8 bpp down to 1 bpp. ARM led the design of ASTC (Adaptive Scalable Texture Compression), shipping 14 block sizes from 4×4 to 12×12, with bpp dropping from 8 to 0.89 — one format spanning nearly a 9× size range. Khronos standardised it in 2012 (mandatory in GLES 3.2, default in Vulkan, native on Apple from A8 onward), and ASTC became the de-facto king of modern mobile and WebGPU. BC7 owns desktop, ASTC owns mobile — that's the post-2010s duopoly of GPU texture compression.

ASTC · 14 BLOCK SIZES → bpp
block    px/block   bpp
4×4      16         8.00
5×4      20         6.40
5×5      25         5.12
6×5      30         4.27
6×6      36         3.56
8×5      40         3.20
8×6      48         2.67
8×8      64         2.00
10×5     50         2.56
10×6     60         2.13
10×8     80         1.60
10×10    100        1.28
12×10    120        1.07
12×12    144        0.89
每块固定 16 byte → 块越大,bpp 越低 · 4×4 (8 bpp) 到 12×12 (0.89 bpp) ≈ 9× 体积差
同一格式覆盖近 9× 体积范围 · 同一文件结构 · 编码器为每种 mip / 用途选档位
图 24a · ASTC 的 14 种块大小及对应 bpp。每块固定 16 byte——块越大(像素更多),bpp 就越低。从 4×4 块(每块 16 px,8 bpp)滑到 12×12 块(每块 144 px,0.89 bpp),同一格式覆盖近 9× 体积范围。这是 ASTC 相对 BCn / ETC2 最根本的优势:不是"格式更好",而是"档位更宽"。
Fig 24a · ASTC's 14 block sizes and the bpp each one yields. Every block is a fixed 16 bytes — the larger the block (more pixels packed in), the lower the bpp. Sliding from 4×4 (16 px, 8 bpp) to 12×12 (144 px, 0.89 bpp), the same format covers nearly a 9× size range. ASTC's most fundamental advantage over BCn / ETC2 isn't "better quality" — it's "wider tier range."
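The bpp column is pure arithmetic, since every block is a fixed 128 bits regardless of its pixel footprint:

```python
# ASTC bpp arithmetic: a block is always 128 bits (16 bytes), so
# bpp = 128 / (block_w * block_h). The whole 14-tier table falls out of it.

def astc_bpp(w, h):
    return 128 / (w * h)

for w, h in [(4, 4), (6, 6), (8, 8), (10, 10), (12, 12)]:
    print(f"{w}x{h}: {astc_bpp(w, h):.2f} bpp")
```

This is also why no new decode hardware is needed per tier: only the block dimensions change, never the container.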
ASTC · 16-BYTE BLOCK LAYOUT
mode ~11 bit · part ~4-10 bit · CEM ~6-14 bit · endpoints variable · weights variable(← packed cfg)
CEM:13 种 endpoint 编码格式(LDR int · HDR float · 1/2/3 平面)
weights:从 byte 15 反向打包 · grid 大小 ≠ 块大小,可重采样
总计 128 bit = 16 byte · 块大小变,各字段比例自动调,总长不变
图 24b · ASTC 块的 16 byte 内部结构:① 块模式(~11 bit)指示 weight grid 的形状与精度;② partition 计数(1-4 子区);③ CEM(Color Endpoint Mode,13 种 endpoint 编码格式);④ 颜色端点(可变长);⑤ 权重网格(从 byte 15 反向打包,网格大小可独立于块大小)。无论块是 4×4 还是 12×12,总长永远 128 bit。
Fig 24b · The internal layout of an ASTC block's 16 bytes: ① block mode (~11 bits) describing the weight grid's shape and precision; ② partition count (1-4 subsets); ③ CEM (Color Endpoint Mode — 13 endpoint encodings); ④ colour endpoints (variable); ⑤ weight grid (packed in reverse from byte 15, with a grid size independent of the block size). Whether the block is 4×4 or 12×12, the total is always 128 bits.
SAME IMAGE · 4×4 vs 6×6 vs 12×12
4×4 · 8 bpp · PSNR ≈ 44 dB · 几乎无损
6×6 · 3.56 bpp · PSNR ≈ 38 dB · 良好
12×12 · 0.89 bpp · PSNR ≈ 30 dB · 明显模糊
同一张参考图,3 种块尺寸 → 3 种 bpp/质量档
图 24c · 同一张纹理在 ASTC 三种块大小下的视觉表现:4×4(8 bpp,PSNR ≈ 44 dB,几乎无损),6×6(3.56 bpp,PSNR ≈ 38 dB,移动端默认档),12×12(0.89 bpp,PSNR ≈ 30 dB,明显糊但 9× 省体积)。美术按用途挑档——UI 用 4×4,环境用 6×6,远处 LOD 用 10×10 或 12×12。
Fig 24c · The same texture rendered with three ASTC block sizes: 4×4 (8 bpp, PSNR ≈ 44 dB, near-lossless), 6×6 (3.56 bpp, PSNR ≈ 38 dB, mobile's default tier), 12×12 (0.89 bpp, PSNR ≈ 30 dB, visibly blurry but 9× smaller). Artists pick a tier per use — 4×4 for UI, 6×6 for environments, 10×10 or 12×12 for distant LODs.
ASTC vs BC7 / BC1 · PSNR 50 dB 35 dB 20 dB 8 bpp ASTC 4×4 vs BC7 ~46 ~44 3.56 bpp ASTC 6×6 vs BC1 (4 bpp) ~38 ~32 2 bpp ASTC 8×8 (BCn N/A) ~33 ASTC BCn
图 24d · ASTC vs BCn 在三个 bpp 档位的 PSNR 对比。8 bpp:ASTC 4×4 略胜 BC7 ~2 dB(同 bpp 同高质量);3.56 bpp:ASTC 6×6 比 BC1(4 bpp,稍多体积)还好 6 dB;2 bpp:ASTC 8×8 仍能 ~33 dB,而 BCn 根本没有这个档位——这是 ASTC 真正的杀手锏:低 bpp 档没有竞品。
Fig 24d · ASTC vs BCn at three bpp tiers. 8 bpp: ASTC 4×4 narrowly beats BC7 by ~2 dB (same bpp, same premium tier). 3.56 bpp: ASTC 6×6 even beats BC1 (4 bpp, more bytes) by ~6 dB. 2 bpp: ASTC 8×8 still hits ~33 dB while BCn has no tier in this range at all — this is ASTC's true killer move: at low bpp it has no competition.

技术内核

Technical core

ASTC 的设计哲学是"一格框架,无限档位"——所有块共用 16 byte 容器,但内部组件按块大小重新分配比例,让格式从 8 bpp 一路滑到 0.89 bpp。① 14 种块大小:4×4 / 5×4 / 5×5 / 6×5 / 6×6 / 8×5 / 8×6 / 8×8 / 10×5 / 10×6 / 10×8 / 10×10 / 12×10 / 12×12——LDR profile 全部支持,HDR profile 仅前 8 种。还有 3D 体素扩展(用 3×3×3 等 11 种 3D 块)。② 每块固定 16 byte——这是 ASTC "档位光谱"的根本机制:容器不变,块越大(像素更多)→ 每像素分到的 bit 越少 → bpp 越低。比如 4×4 块 = 16 px / 16 byte = 8 bpp;12×12 块 = 144 px / 16 byte = 0.89 bpp。同样的解码硬件、同样的文件结构,档位却覆盖近 9× 体积差。③ 13 种 endpoint 编码格式 (CEM):LDR int 8/16/24-bit、HDR float、单/双/三平面(单平面 = RGB 共享一组端点;三平面 = RGB 各自独立端点,适合高频彩色细节)。每块在 endpoint mode 字段内挑一种,精确匹配局部像素分布。④ 权重平面 + 双权重平面:基本 ASTC 用一张权重图控制所有通道的内插;双权重平面(dual-plane)模式让 alpha 或某一颜色通道走独立权重——类比 BC7 的 "rotation",但更通用,在彩色 + 高频 alpha 混合贴图上质量明显更好。⑤ HDR + 3D 双扩展——LDR profile(主流硬件全支持)给颜色 0-1 范围;HDR profile(部分硬件)给 float 范围,直接当移动版 BC6H 用;3D profile(更小众)给体素纹理(医疗影像、烟雾模拟、地形 3D 噪声)。⑥ 权重网格大小可独立于块大小——一个 12×12 块的权重网格可以是 4×4(更稀疏,更节省 bit 给 endpoint),也可以是 8×8(更密,牺牲 endpoint 精度换插值精度)。这是 ASTC 比 BC7 更灵活的核心,编码器要在"块大小 × endpoint mode × 权重网格"三维空间搜索最优。⑦ 编码极慢——astcenc 参考编码器是 brute-force 搜全部组合,单图 6×6 thorough 模式可能要几分钟。但解码硬件原生,sample 一个 ASTC texel 跟 sample BC7 一样快。

ASTC's design philosophy is "one frame, infinite tiers" — every block shares a 16-byte container, but the internal allocation re-balances by block size, sliding the format from 8 bpp all the way to 0.89 bpp. ① 14 block sizes: 4×4 / 5×4 / 5×5 / 6×5 / 6×6 / 8×5 / 8×6 / 8×8 / 10×5 / 10×6 / 10×8 / 10×10 / 12×10 / 12×12 — all supported by the LDR profile; HDR is limited to the first 8. A 3D extension also defines 11 voxel block sizes. ② Every block is exactly 16 bytes — the mechanism behind ASTC's tier spectrum. The container stays constant; the bigger the block (more pixels packed in), the fewer bits per pixel, and the lower the bpp. 4×4 = 16 px / 16 bytes = 8 bpp; 12×12 = 144 px / 16 bytes = 0.89 bpp. Same decode hardware, same file layout, nearly 9× size difference between extremes. ③ 13 endpoint encodings (CEM): LDR int at 8 / 16 / 24-bit, HDR float, with one / two / three planes (one-plane = RGB share endpoints; three-plane = each colour channel has independent endpoints, ideal for high-frequency colour). Each block picks one CEM in its endpoint-mode field to match the local pixel distribution. ④ Weight plane + dual weight plane: basic ASTC uses one weight grid controlling all channels' interpolation; dual-plane mode lets alpha or one colour channel travel on an independent weight grid — analogous to BC7's "rotation" but more general, and visibly better on mixed colour + high-frequency-alpha maps. ⑤ HDR + 3D extensions — the LDR profile (universally supported) covers colour in [0, 1]; the HDR profile (partial hardware support) gives float range and effectively serves as mobile BC6H; the 3D profile (more niche) targets voxel textures (medical imaging, smoke simulation, 3D terrain noise). ⑥ Weight grid size independent of block size — a 12×12 block can use a 4×4 weight grid (sparser, donating bits to endpoints) or an 8×8 grid (denser, trading endpoint precision for interpolation precision). This is what makes ASTC more flexible than BC7: the encoder searches a three-dimensional space of "block size × endpoint mode × weight grid." ⑦ Encoding is brutally slow — astcenc, the reference encoder, brute-forces the whole combination space; a single image at 6×6 with the thorough preset can take minutes. But decoding is hardware-native — sampling an ASTC texel costs the same as sampling BC7.
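固定 16 byte 容器与 bpp 的关系可以用几行 Python 复核(纯算术示意,与 astcenc 内部无关):

A few lines of Python (pure arithmetic — nothing to do with astcenc internals) reproduce the bpp column above, since bpp = 128 bits ÷ pixels per block:

```python
# Every ASTC block is a fixed 128 bits (16 bytes), so bpp = 128 / pixels-per-block.
BLOCK_SIZES = [(4, 4), (5, 4), (5, 5), (6, 5), (6, 6), (8, 5), (8, 6),
               (8, 8), (10, 5), (10, 6), (10, 8), (10, 10), (12, 10), (12, 12)]

for w, h in BLOCK_SIZES:
    bpp = 128 / (w * h)                     # fixed container / pixel count
    print(f"{w:>2}x{h:<2}  {w * h:3d} px/block  {bpp:.2f} bpp")
# e.g. 4x4 -> 8.00 bpp ... 12x12 -> 0.89 bpp (~9x range, same 16-byte container)
```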

ASTC · 14 BLOCK SIZES TRIAL · BUDGET-AWARE PICK · 16 BYTE OUT RGBA tile N×M px (block-sized) one block of source 4×48.00 bpp · near-lossless · UI / characters 5×55.12 bpp · high quality · skin / cloth 6×63.56 bpp · environment · mobile default 8×62.67 bpp · large environment / decals 8×82.00 bpp · terrain / distant 10×81.60 bpp · far LOD 10×101.28 bpp · skybox / very far 12×120.89 bpp · extreme low / ambient + 6 more sizes (5×4, 6×5, 8×5, 10×5, 10×6, 12×10) · 14 total → score₀ → score₁ → score₂ → score₃ → score₄ → score₅ → score₆ → score₇ best SSIM under bpp budget 16 byte ASTC block GPU sample Vulkan / GLES 3.2 native For each tile, the encoder trials all 14 block sizes (each with multiple endpoint modes / weight grids), scores them by SSIM, and picks the best size that meets the project's bpp budget. Output is always 16 bytes; GPU samples it natively in one cycle.

图 24 · ASTC 完整编码 + 采样流程:输入一个 RGBA 块,编码器对 14 种块大小逐一试压(每种内部还要枚举 endpoint mode 和权重网格组合),按 SSIM 评分,在用户给定的"bpp 预算"约束下选最优块大小,把结果塞进 16 byte。GPU 在 sample 时硬件原生解码——Vulkan / OpenGL ES 3.2 / Metal / WebGPU 全部一次 cycle 取出像素,跟 sample BC7 同样快。

Fig 24 · ASTC's full encode + sample pipeline: take an RGBA tile, run trial encodes against all 14 block sizes (each enumerating endpoint modes and weight-grid configurations), score by SSIM, and pick the best block size that fits the project's bpp budget. The result is always 16 bytes. GPUs decode it natively at sample time — Vulkan / OpenGL ES 3.2 / Metal / WebGPU all fetch a pixel in a single cycle, exactly as fast as sampling BC7.
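按 bpp 预算选档的逻辑可以退化成一个极简示意——真实编码器还要对每块按 SSIM 打分,这里只用 bpp 充当质量代理:

The budget-constrained tier choice reduces to a minimal sketch — a real encoder also scores every tile by SSIM; here bpp alone stands in for quality:

```python
# Minimal stand-in for astcenc's tier choice: quality is proxied by bpp,
# so "best under budget" = the smallest block whose bpp fits the budget.
BLOCK_SIZES = [(4, 4), (5, 4), (5, 5), (6, 5), (6, 6), (8, 5), (8, 6),
               (8, 8), (10, 5), (10, 6), (10, 8), (10, 10), (12, 10), (12, 12)]

def pick_block_size(bpp_budget):
    """Smallest block (= most bits per pixel) whose bpp fits the budget."""
    fitting = [(w, h) for w, h in BLOCK_SIZES if 128 / (w * h) <= bpp_budget]
    return min(fitting, key=lambda s: s[0] * s[1])

pick_block_size(4.0)    # (6, 6) -> 3.56 bpp, the mobile default tier
pick_block_size(1.0)    # (12, 12) -> 0.89 bpp
```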

block | bpp | typical use | vs BCn at same bpp
4×4 | 8.00 | UI icons, important textures | ≈ BC7 (slightly better)
5×5 | 5.12 | mid-quality, skin / cloth | BCn no equivalent tier
6×6 | 3.56 | environment, mobile default | ≫ BC1 (4 bpp) by ~6 dB
8×8 | 2.00 | terrain, distant | BCn no tier here
10×10 | 1.28 | far LOD, skybox | BCn no tier here
12×12 | 0.89 | extreme low, ambient | BCn no tier here
$ astcenc -cl in.png out.astc 6x6 -medium         # LDR profile, 6×6 block, medium preset
$ astcenc -cs in.png out.astc 6x6 -thorough       # -cs = sRGB LDR profile; -thorough = deeper search, much slower / better
$ astcenc -ch in.exr out.astc 6x6 -medium         # HDR profile (input EXR float)
$ toktx --encode astc --astc_blk_d 6x6 out.ktx2 in.png  # wrap into KTX2 (web / WebGPU friendly)

适用

USE FOR

  • 移动游戏(iOS A8+ / Android GLES 3.2+,99% 默认)
  • WebGPU(macOS 默认 / 移动 Web,配合 KTX2)
  • VR 头显纹理(Quest / Vision Pro 全部 ASTC)
  • 跨平台游戏纹理打包(配合 Basis Universal 转码)
  • 需要"质量/体积"档位灵活调节的项目(同 image 不同 mip 用不同档)
  • Mobile games (iOS A8+ / Android GLES 3.2+ — 99 % default)
  • WebGPU (macOS default / mobile web, paired with KTX2)
  • VR headset textures (Quest / Vision Pro all use ASTC)
  • Cross-platform game texture packaging (with Basis Universal transcoding)
  • Projects that need a flexible quality / size dial (same image, different mip tiers)

反适用

AVOID

  • 桌面 Windows / Linux 含 Intel HD 集显的目标(用 BC7)
  • D3D11 / D3D12 桌面游戏(BC7 是事实标准)
  • HDR 纹理在多数桌面硬件(用 BC6H;ASTC HDR Profile 桌面支持差)
  • 实时编码场景(astcenc thorough 单图几分钟,极不适合服务端实时)
  • 极老的 Android 设备(GLES 3.0 及以下,改用 ETC2 fallback)
  • Desktop Windows / Linux targets that include Intel HD iGPUs (use BC7)
  • D3D11 / D3D12 desktop games (BC7 is the de-facto standard)
  • HDR textures on most desktop hardware (use BC6H; desktop ASTC HDR support is poor)
  • Real-time encoding (astcenc thorough takes minutes per image — never use server-side live)
  • Very old Android devices (GLES 3.0 and below — fall back to ETC2)
scope · APIs · tools · CLI
ASTC LDR ✓✓ Vulkan · ✓✓ OpenGL ES 3.2(强制)· Metal (Apple A8+) · ~ WebGPU (可选 feature) · Intel HD Graphics 桌面 ✓✓ ARM astcenc(参考编码器,开源)· KTX-Software toktx · NVIDIA Texture Tools · Mali Texture Compression Tool · Unity / Unreal 内置 astcenc -cl in.png out.astc 6x6 -medium · toktx --astc 6x6 out.ktx2 in.png
ASTC HDR ~ Vulkan(部分硬件)· ~ Apple Metal (A11+ 部分支持) · 多数移动 GPU astcenc -ch astcenc -ch in.exr out.astc 6x6 -medium
突破固定块大小:broke the fixed-block-size mold: ETC2 / BC7(都是 4×4 固定,ASTC 14 档) 桌面 / 移动并行:desktop / mobile parallel: BC7(BC7 守桌面 · ASTC 守移动) 影响:influenced: KTX2 + Basis(把"自适应"思路推进到容器层)

Basis Universal — 一次编码、多平台转码

Basis Universal — encode once, transcode anywhere

YEAR 2018 AUTHOR Binomial LLC (Rich Geldreich) · 捐赠给 Khronos EXT .basis(独立) / 内嵌 KTX2 supercompression MIME — · 容器层走 image/ktx2 STD Khronos Basis Universal · KTX2 supercompressionScheme PROFILE ETC1S(~2 bpp · 小)/ UASTC(~8 bpp · 高质) DEPTH RGB / RGBA ALPHA 原生 alpha(UASTC 4×4 / ETC1S 双段) SAMPLE 运行时转码到 GPU 原生块格式 STATUS WebGL / WebGPU 主流 · glTF 推荐 · three.js / Babylon.js 内置

"一份 .basis,运行时按设备转 BC7、ETC2 或 ASTC。"

"One .basis file — transcoded at runtime to BC7, ETC2 or ASTC depending on the device."

Web 和跨平台游戏一直有个尴尬:桌面要 BC7、Android 要 ETC2、现代移动要 ASTC、老 iOS 要 PVRTC——同一张纹理要打四份,资产包体积爆炸,CDN 流量翻倍,管理痛苦不堪。Rich Geldreich(前 Valve、Crunch 作者、桌面纹理压缩领域的活字典)在 2018 年提出"中间格式"思路:编码时存为 Basis Universal(一种紧凑的 IR——intermediate representation),运行时用 JS 或 Wasm 解码到目标设备的块格式。一份资产 → 任意设备。Khronos 接受捐赠后,Basis 成为 KTX2 supercompression scheme 的事实标准,glTF 2.0 把 KTX2 + Basis 列为推荐的纹理 payload。three.js / Babylon.js / Unity WebGL / Godot Web 全都内置 Basis transcoder。Web 端从此告别"打四份纹理"的时代。

Web and cross-platform games long suffered an awkward problem: desktop needs BC7, Android needs ETC2, modern mobile wants ASTC, legacy iOS wants PVRTC — the same texture has to be packed four ways, asset bundles balloon, CDN traffic doubles, and asset management becomes a nightmare. In 2018 Rich Geldreich (ex-Valve, author of Crunch, a living encyclopedia of desktop texture compression) proposed an "intermediate format" approach: encode once into Basis Universal (a compact IR — intermediate representation), then at runtime use JS or Wasm to transcode to whatever block format the target device wants. One asset → any device. After Geldreich donated the project to Khronos, Basis became the de-facto KTX2 supercompression scheme; glTF 2.0 lists KTX2 + Basis as the recommended texture payload. three.js / Babylon.js / Unity WebGL / Godot Web all ship the Basis transcoder. The "pack four textures" era of the Web ended here.

BASIS UNIVERSAL · ENCODE ONCE · TRANSCODE PER DEVICE PNG RGBA8 basisu encode → IR ETC1S(~2 bpp) or UASTC(~8 bpp) .basis or KTX2 runtime device? JS / Wasm BC7 · desktop ETC2 · Android ASTC · modern mobile PVRTC · old iOS single asset → JS/Wasm transcoder picks per-device target → GPU samples natively
图 25 · Basis Universal 工作流。编码端用 basisu 把 PNG 压成 ETC1S(超小,~2 bpp,适合普通贴图)或 UASTC(高质,~8 bpp,适合法线/UI/重要纹理)中间格式,装进 .basis 独立容器或 KTX2 supercompression payload。运行时 JS/Wasm transcoder 检测设备能力——桌面转 BC7、Android 转 ETC2、现代移动转 ASTC、老 iOS 转 PVRTC——一份资产打通所有平台,GPU 拿到的是原生块格式可以直接 sample。
Fig 25 · The Basis Universal pipeline. On the encode side, basisu compresses a PNG into either ETC1S (tiny, ~2 bpp, for general textures) or UASTC (high quality, ~8 bpp, for normals / UI / important textures), packed into a standalone .basis container or a KTX2 supercompression payload. At runtime a JS/Wasm transcoder probes the device — desktops get BC7, Android gets ETC2, modern mobile gets ASTC, legacy iOS gets PVRTC — one asset covers every platform, and the GPU receives a native block format it can sample directly.

技术内核

Technical core

Basis 的核心是"中间表示 + 运行时转码"——既不像 BCn/ASTC 那样直接是 GPU 块格式,也不像 PNG/JPEG 那样是 CPU 像素流,而是一种专门设计来"几乎零成本转码到任何块格式"的紧凑中间形态。① 两个 profile:ETC1S 基于 ETC1 的色彩端点结构,每块 ~2 bpp,体积极小,质量约相当于 JPEG 中等;UASTC("Universal ASTC")基于 ASTC 4×4 的子集,每块 8 bpp,质量约等于 ASTC 4×4 / BC7。两档对应"小贴图随便堆"和"重要纹理用高质量"。② 编码后再用 supercompression 压一遍——ETC1S 用 LZ-style + RDO(rate-distortion optimisation)再压缩 30-50%,UASTC 用 Zstd 压缩 ~30%。最终 .basis / KTX2 文件比裸 BCn 还小,比 PNG/JPEG 慢一点解码但能直接送给 GPU。③ 运行时转码极快——transcoder 是设计成 O(blocks) 的简单查表 + 位重排,Wasm 实现单核能跑 几百 MB/s,比 PNG 解码快一个数量级。这是 Basis 跟传统"在线解码 PNG → CPU RGBA → uploadTexture"路径的根本区别——后者占 CPU + 占带宽 + 占显存,前者一步到位送 GPU 块格式。④ 支持目标:BC1 / BC3 / BC4 / BC5 / BC7 / ETC1 / ETC2 / ASTC 4×4 / PVRTC1 / PVRTC2 / RGBA32(无块格式硬件兜底)——基本覆盖现役所有 GPU。

Basis's core idea is "intermediate representation + runtime transcode" — it is neither a direct GPU block format like BCn / ASTC, nor a CPU pixel stream like PNG / JPEG, but a compact intermediate form deliberately designed to transcode to any block format at almost zero cost. ① Two profiles: ETC1S is built on ETC1's colour-endpoint structure, ~2 bpp per block, extremely small, with quality roughly on par with mid-quality JPEG; UASTC ("Universal ASTC") is built on a subset of ASTC 4×4, 8 bpp per block, with quality close to ASTC 4×4 / BC7. The two tiers map to "stack lots of small textures" vs "use high quality on important textures." ② The encode is then run through supercompression — ETC1S uses an LZ-style codec plus RDO (rate-distortion optimisation) and shrinks another 30–50 %; UASTC uses Zstd for about 30 %. The resulting .basis / KTX2 file is smaller than raw BCn, decodes a touch slower than PNG / JPEG, but ships straight to the GPU. ③ Runtime transcoding is blazing fast — the transcoder is engineered as O(blocks) with simple table lookups and bit re-shuffling; the Wasm build hits hundreds of MB/s on a single core, an order of magnitude faster than PNG decoding. That's the fundamental difference between Basis and the traditional "decode PNG → CPU RGBA → uploadTexture" path: the latter eats CPU + bandwidth + VRAM, while the former hands the GPU a block format in one step. ④ Supported targets: BC1 / BC3 / BC4 / BC5 / BC7 / ETC1 / ETC2 / ASTC 4×4 / PVRTC1 / PVRTC2 / RGBA32 (an uncompressed fallback for hardware without block formats) — effectively every GPU in service.
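运行时"按设备挑目标格式"的决策可以示意如下——格式名与能力集合均为假设性标签,并非真实 transcoder API:

The runtime "pick a target per device" decision can be sketched as follows — the format names and capability set here are hypothetical labels, not the real transcoder API:

```python
# Illustrative per-device target pick for a Basis-style loader.
# Preference order mirrors the text: best native block format first,
# uncompressed RGBA32 as the last-resort fallback.
PREFERENCE = ["ASTC_4x4", "BC7", "ETC2", "PVRTC1"]

def pick_target(device_formats):
    for fmt in PREFERENCE:
        if fmt in device_formats:
            return fmt
    return "RGBA32"                    # hardware without block formats

pick_target({"BC7"})                   # desktop -> "BC7"
pick_target({"ETC2"})                  # Android -> "ETC2"
pick_target(set())                     # -> "RGBA32"
```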

适用

USE FOR

  • glTF 2.0 模型纹理(KTX2 + Basis 是官方推荐)
  • WebGPU / WebGL 资产(配合 KTX2 容器)
  • three.js / Babylon.js / PlayCanvas / Godot Web 项目
  • 跨平台游戏纹理打包(一份资产覆盖桌面/移动/Web)
  • CDN 流量敏感的场景(ETC1S ~2 bpp 体积比 PNG 小很多)
  • glTF 2.0 model textures (KTX2 + Basis is the official recommendation)
  • WebGPU / WebGL assets (paired with the KTX2 container)
  • three.js / Babylon.js / PlayCanvas / Godot Web projects
  • Cross-platform game texture packaging (one asset for desktop / mobile / Web)
  • CDN-bandwidth-sensitive scenarios (ETC1S at ~2 bpp is much smaller than PNG)

反适用

AVOID

  • 服务端不预编译就直接用的场景(Basis 必须离线编码)
  • 需要无损质量的纹理(用未压缩 RGBA8 装进 KTX2)
  • 原生桌面引擎不走 Web 路径(直接用 DDS + BC7)
  • HDR 纹理(目前 Basis 只支持 LDR;HDR 用 BC6H / ASTC HDR)
  • Pipelines that skip pre-compilation on the server (Basis requires offline encoding)
  • Textures that demand lossless quality (use uncompressed RGBA8 inside KTX2)
  • Native desktop engines not on the Web path (use DDS + BC7 directly)
  • HDR textures (Basis is LDR-only today; use BC6H / ASTC HDR)
scope · runtimes · tools · CLI
Basis Universal ✓✓ three.js / Babylon.js / PlayCanvas 内置 transcoder · ✓✓ Unity WebGL / godot Web · 任意 WebGL/WebGPU + Wasm transcoder ✓✓ Khronos basisu(参考编码器,开源) · toktx(打 KTX2 + Basis payload) · KTX-Software 套件 basisu in.png -uastc -output_file out.basis · toktx --encode uastc out.ktx2 in.png
统一接口:unifies: BC7 / ETC2 / ASTC 承载:carried by: KTX2 supercompression payload 思想前身:conceptual ancestor: Crunch(同作者 6 年前作品)

Crunch — 在 BC 体积上再砍一半

Crunch — halving BC's size with a second pass

YEAR 2012 AUTHOR Rich Geldreich(Basis Universal 同作者) EXT .crn(独立容器) MIME — · 私有 STD crnlib 开源(MIT) · 无 ISO PAYLOAD BC1 / BC3(可选 BC4/5) DEPTH RGB / RGBA ALPHA BC3 通道(独立 64-bit alpha 段) SAMPLE 解码回 BCn → GPU 原生 sample STATUS niche · 桌面/移动游戏纹理(被 Basis 替代)

"先 cluster 再 BC1 — 在 BC 体积上再砍一半。"

"Cluster first, then BC1 — halve the BC size."

2010s 初期移动 + 主机游戏的纹理资产包动辄几百 MB,主要是 BC1/BC3 块的累积——iOS App Store 限制单包 < 2 GB,主机光盘也是有限介质。Rich Geldreich 在 2012 年观察到一个事实:大量 4×4 块其实彼此相似——同一张草地纹理里成千上万个块都是"绿色为主、轻微噪点变化",同一面墙的砖块色调几乎一致。如果这些块共用一个码本(codebook),只存"指向码本的索引 + 微小偏移",体积可以再砍一半。Crunch 把这个想法落地:对所有块做 k-means 聚类,然后用 RC(range coder)+ Huffman 二次熵编码 BC 字典——磁盘体积比裸 BCn 再小 30-50%。运行时只需在 CPU 上花几十毫秒解回普通 BCn,再上传给 GPU。这是"BC 之上还能再压"的第一个工程化实践。后来同作者 6 年后用同样思路做了 Basis Universal,覆盖更广 GPU 块格式 + 加上 transcode 维度——Crunch 进入历史。

By the early 2010s, mobile and console game texture bundles had ballooned to hundreds of MB, mostly accumulated BC1/BC3 blocks — the iOS App Store capped single bundles at 2 GB, and console discs are finite media. In 2012 Rich Geldreich noticed an obvious truth: most 4×4 blocks are similar to each other — a grass texture has thousands of blocks that are all "mostly green with mild noise"; the bricks on a wall share an almost identical palette. If those blocks shared a single codebook and we only stored "codebook index + small offset," size could be halved again. Crunch put that idea into practice: run k-means clustering across all blocks, then run RC (range coder) + Huffman as a second-pass entropy code over the BC dictionary — on disk the result is 30–50 % smaller than raw BCn. At runtime the CPU spends tens of milliseconds decoding back to ordinary BCn, then uploads it to the GPU. It was the first engineering-grade demonstration of "compressing on top of BC." Six years later the same author took the idea further, covering more GPU block formats plus an extra transcode dimension — that became Basis Universal, and Crunch quietly walked off into history.

CRUNCH · CLUSTER 4×4 BLOCKS → 1024 CODEBOOK ~10000 raw blocks 每块独立 64-bit BC1 k-means cluster 1024-entry codebook block₀ → idx 17, +δ block₁ → idx 17, +δ' block₂ → idx 91, +δ block₃ → idx 17, +δ" ~10 bit / block vs 64 bit raw BC1 RC + Huffman 二次熵编码 codebook 与 index → 比裸 BCn 再小 30-50%
图 26 · Crunch 的核心思路。把整张图所有 4×4 BCn 块视作样本,做 k-means 聚类成 1024 个代表块(codebook),原图每块只存"codebook 索引 + 小偏移量"——典型 ~10 bit 一块,远小于裸 BCn 的 64 bit。codebook 自身和 index 流再用 RC + Huffman 二次熵压缩,最终磁盘体积比 BCn 再砍 30-50%。运行时 CPU 侧解回普通 BC1/BC3 块,再上传 GPU——sample 性能跟普通 BCn 完全一样。
Fig 26 · Crunch's core idea. Treat every 4×4 BCn block in an image as a sample, run k-means to cluster them into 1024 representative blocks (a codebook), and store each block as just "codebook index + small offset" — typically about 10 bits per block, far less than the 64 bits raw BCn would use. Both the codebook and the index stream go through a second pass of RC + Huffman entropy coding, and the on-disk size shrinks another 30–50 % below BCn. At runtime the CPU decodes back to ordinary BC1 / BC3 blocks before upload — sample performance is identical to vanilla BCn.

技术内核

Technical core

Crunch 的工程实现只有两步,但每步都精妙。① k-means cluster:把所有 4×4 块当成一个 64-bit 高维向量样本(BC1 块结构 = 2 个 16-bit 端点 + 32-bit 4-color 索引),用 k-means 在 BC 块空间内聚类成 N 个代表块(典型 N = 1024);每块只存"代表块索引 + 局部偏移量"。这一步把"每块 64 bit 独立"变成"每块 ~10 bit 索引 + 小残差",体积压缩比通常 4-6 倍,但因为 BCn 本身已经是有损,残差很小,质量损失可控。② RC + Huffman 二次熵编码:对 codebook 自身(1024 × 64 bit = 8 KB)和 index 流(~10 bit × 块数)再用 range coder + Huffman 树压缩——index 流通常有强自相关(同一区域的相邻块很可能落在同一 cluster),熵很低,Huffman 能再砍 30-50%。最终 .crn 文件平均比裸 BCn 小 50%,跟 PNG 体积差不多但能直接送 GPU(还要先在 CPU 上 swizzle 回 BCn,有几十 ms 解码延迟)。运行时解码是 streaming 的——可以一边读文件一边解块,不需要一次加载整张图——这是 Crunch 设计上对 mmap 友好的一个细节。

Crunch's engineering implementation has just two steps, but each is delicately tuned. ① k-means clustering: treat every 4×4 block as a 64-bit high-dimensional vector (a BC1 block = two 16-bit endpoints + a 32-bit 4-colour index), then run k-means in BC-block space to find N representative blocks (typically N = 1024); each block stores "representative-block index + local offset." This step turns "64 independent bits per block" into "about 10 bits of index + a tiny residual," giving a 4–6× size compression — and because BCn is already lossy, the residual is small and quality loss stays controlled. ② Second-pass RC + Huffman entropy coding: the codebook itself (1024 × 64 bits = 8 KB) and the index stream (~10 bits × block count) go through a range coder plus Huffman tree — the index stream is strongly auto-correlated (adjacent blocks in the same region almost always fall in the same cluster), entropy is low, and Huffman shaves another 30–50 %. The resulting .crn file averages 50 % smaller than raw BCn, comparable in size to PNG but ready to ship to the GPU (you do still need a CPU swizzle back to BCn first, costing tens of ms of decode latency). Runtime decode is streaming — blocks can be decoded as the file streams in, no need to load the whole image at once — a deliberate design choice that makes Crunch friendly to mmap.
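码本 + 索引的体积账可以几行代码复核——只算聚类一步,不含 RC + Huffman 熵编码和残差,所以落在 4-6× 区间的低端:

The codebook + index bookkeeping checks out in a few lines — clustering only, with no RC + Huffman pass and no residuals, which is why it lands at the low end of the 4-6× range:

```python
import math

# Back-of-envelope for Crunch's clustering win. Block count is illustrative;
# the entropy pass and per-block residuals are deliberately ignored.
blocks           = 10_000              # 4x4 BC1 blocks in one texture
raw_bits         = blocks * 64         # a raw BC1 block is 64 bits
codebook_entries = 1024
index_bits = math.ceil(math.log2(codebook_entries))       # 10 bits per block

total_bits = codebook_entries * 64 + blocks * index_bits  # codebook + index stream
ratio = raw_bits / total_bits
print(index_bits, round(ratio, 1))     # 10 3.9  (before the entropy pass)
```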

适用

USE FOR

  • 2010s 移动游戏纹理资产包压缩(BC1/BC3 体积敏感场景)
  • 需要"BC 之上再省一半"的资产管线
  • 研究/学习 cluster + 熵编码思路的参考
  • 2010s mobile-game texture bundle compression (BC1 / BC3 size-sensitive cases)
  • Asset pipelines needing another 50 % off on top of BC
  • Reference for studying cluster + entropy-coding designs

反适用

AVOID

  • 2018 之后任何新项目——直接用 Basis Universal
  • 需要支持 BC7 / ASTC / ETC2 等多种块格式(Crunch 只覆盖 BC1/BC3)
  • 需要运行时直接送 GPU,不接受 CPU 解码延迟
  • 跨平台部署(无 ETC/ASTC 转码,移动端覆盖差)
  • Any new project after 2018 — use Basis Universal instead
  • Pipelines needing multiple block formats (Crunch only covers BC1 / BC3)
  • Cases that demand zero CPU decode at runtime
  • Cross-platform deployment (no ETC / ASTC transcode, weak mobile coverage)
scope · engines · tools · CLI
Crunch (.crn) Unity 内置 importer(早期版本) · Unreal Engine 4 移动端(可选) · 现代主流引擎已弃用 crnlib(开源 C++ 库) · crunch CLI · ~ 命令仍可用但维护停滞 crunch -file in.png -fileformat crn -dxt1
基于:based on: BC1 / BC3(额外熵编码层) 被替代:superseded by: Basis Universal(同作者 2018 年的通用化版本) 同代竞品:contemporary peer: KTX2 + Zstd supercompression

Mipmap — 一张纹理八张分辨率

Mipmap — one texture, eight resolutions

YEAR 1983 AUTHOR Lance Williams(NYIT) EXT — · 概念,不是文件格式 PAPER "Pyramidal Parametrics" · SIGGRAPH 1983 STD OpenGL / D3D / Vulkan / Metal / WebGL / WebGPU 全部内置 STORAGE base + log₂(N) 降采样层 · 显存 +33% SAMPLE GPU 按 LOD 自动选 mip level · 双线性 / trilinear / aniso CONTAINERS KTX / KTX2 / DDS 内置 mip chain STATUS 所有 3D 场景纹理的基础设施(必须)

"一张纹理八张分辨率,采样时按距离自动选。"

"One texture, eight resolutions — auto-picked by distance at sample time."

1983 年 Lance Williams(NYIT,纽约理工学院计算机图形实验室)在 SIGGRAPH 发表 'Pyramidal Parametrics',第一次系统提出 mipmap 概念。它解决的问题是 3D 场景里最古老也最折磨人的视觉 bug:aliasing(走样/摩尔纹/闪烁)。当一个有纹理的多边形(墙、地面、远处地形)远离相机,屏幕上一个像素就会覆盖纹理上多个 texel——简单的"取最近 texel"采样会随机丢掉大部分信息,产生移动时的闪烁、网格图案上的摩尔纹、远处瓦片纹理的"沸腾"效果。Williams 的洞察:预先存好 N 个降采样层(每层是上一层的 2× 缩放,带 box filter 平均),采样时按"屏幕像素覆盖纹理多大"(LOD,Level of Detail)自动选合适那一层。代价是显存 +33%(几何级数 1+1/4+1/16+...→4/3),收益是无 aliasing + 缓存命中率提升(远处 mip 是小图,容易留在 GPU L2)。所有现代 GPU 纹理默认都带 mipmap,几乎所有引擎都强制开启——这是 GPU 时代的"一旦学会就回不去"的基础设施。

In 1983 Lance Williams (NYIT — the New York Institute of Technology Computer Graphics Lab) published "Pyramidal Parametrics" at SIGGRAPH, the first systematic proposal of the mipmap concept. It solved one of the oldest and most maddening visual bugs in 3D rendering: aliasing (moiré patterns, shimmering, sparkle). When a textured polygon (a wall, the ground, distant terrain) recedes from the camera, a single screen pixel covers many texels — naive "nearest-texel" sampling randomly throws away most of the information, producing shimmering as the camera moves, moiré on grid textures, and "boiling" on distant tiled surfaces. Williams's insight: pre-store N down-sampled layers (each is the previous one box-filtered to 2× smaller), and at sample time pick the right layer based on how much texture area each screen pixel covers (LOD — Level of Detail). The cost is +33 % VRAM (a geometric series 1 + 1/4 + 1/16 + … → 4/3); the payoff is zero aliasing plus better cache hits (distant mips are small and fit in GPU L2). Every modern GPU texture defaults to having mipmaps, almost every engine enforces them — once you know how it feels, you never go back. It's the infrastructure of the GPU era.

MIPMAP CHAIN · 256² → 1² · VRAM ≈ 4/3 × base 256 128 64 32 16 8 4 2 1 mip 0 mip 1 mip 2 mip 3 mip 4 mip 5 mip 6 mip 7 mip 8 VRAM cost (relative to base = 1) base = 1.00 mip 1 = 0.25 mip 2+ = 0.0833... (geo series) Σ = 1 + 1/4 + 1/16 + ... → 4/3 × base +33% VRAM 换零 aliasing
图 27 · 标准 mipmap 金字塔。一张 256×256 的 base 纹理(mip 0)逐层降采样到 128 / 64 / 32 / ... / 1(mip 8)。每层面积是上一层的 1/4,所有 mip 加起来体积为 base × (1 + 1/4 + 1/16 + ...) = 4/3 × base——只多 33% 显存,换来全距离零 aliasing + 远处纹理 cache 命中率大幅提升。GPU sample 时基于像素 derivative(屏幕像素覆盖纹理多大)自动选合适 mip level,可在两层之间双线性插值(trilinear filtering)避免 mip 跳变可见的"接缝"。
Fig 27 · A standard mipmap pyramid. A 256×256 base texture (mip 0) is down-sampled layer by layer to 128 / 64 / 32 / … / 1 (mip 8). Each layer is one-quarter the area of the previous one, and the total comes to base × (1 + 1/4 + 1/16 + …) = 4/3 × base — only 33 % more VRAM in exchange for zero aliasing at any distance and dramatically better cache hit-rate for distant textures. At sample time the GPU picks the right mip level from pixel derivatives (how much texture area one screen pixel covers), and can bilinearly blend between two adjacent mips (trilinear filtering) to hide the visible "seams" of mip transitions.
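按 derivative 选 mip 的规则可以压缩成两行(示意——真实硬件还会叠加 aniso 与 LOD bias,并 clamp 到可用 mip 范围):

The derivative-driven mip pick compresses to two lines (a sketch — real hardware also folds in aniso and LOD bias, and clamps to the available mip range):

```python
import math

# Mip LOD from screen-space footprint: how many texels one screen pixel
# spans along an axis determines which mip level gets sampled.
def lod(texels_per_pixel):
    return max(0.0, math.log2(max(texels_per_pixel, 1e-6)))

level = lod(3.0)                              # ~1.58
mip, frac = int(level), level - int(level)    # integer picks the level,
# fraction feeds trilinear filtering: blend mip 1 and mip 2 with weight frac
```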

技术内核

Technical core

Mipmap 的工程实现非常直接,但每个细节都暗含思想。① 每张 base 图额外存 log₂(N) 个降采样层:mip 0 = base 原始图;mip 1 = box-filtered 2× 缩放;mip 2 = 又 2×;直到 1×1。一张 1024×1024 base 共 11 个 mip(0~10)。降采样可以用 box filter(简单平均)、Lanczos(更锐利但更贵)、或在 sRGB 空间需要先 gamma decode 再 filter 再 encode 回去——很多老引擎在 sRGB 纹理上没做 gamma-correct mip 生成,导致远处纹理看起来"灰蒙蒙"。② 显存额外 +33%:几何级数 1 + 1/4 + 1/16 + ... = 4/3,极限是基础体积的 4/3。这是个固定开销,大概率值得——除非你的纹理永远只在近距离用(UI、屏幕特效)。③ GPU 采样时按 LOD 自动选 mip level:GPU 在像素着色器里能算出当前像素的 dPdx/dPdy(纹理坐标在屏幕水平/垂直方向的偏导数),据此估出"一个屏幕像素覆盖纹理多大",对数运算后得到 LOD 浮点数。整数部分选 mip level,小数部分用于trilinear filtering——在两个 mip 之间双线性插值,避免 mip 跳变可见的"接缝"。④ 各向异性过滤(anisotropic filtering)是 mipmap 的延伸——当视角倾斜时(比如往远处看的地面),屏幕像素在纹理上覆盖的不是正方形而是细长的矩形,简单 trilinear 会过模糊。aniso filtering 沿主轴方向多采样几次再加权,质量更好但带宽更大,通常给"开 16x aniso"档位。⑤ 容器内置 mip chain:KTX/KTX2/DDS 都把 mip 0 → mip N 顺序拼接进 payload,加载时一次 mmap 全部入显存——这就是为什么 GPU 容器格式天生跟 mipmap 绑定的设计。

The engineering of mipmap is straightforward, but every detail hides a small lesson. ① Each base texture stores log₂(N) extra down-sampled layers: mip 0 = the original base; mip 1 = box-filtered 2× smaller; mip 2 = another 2×; … down to 1×1. A 1024×1024 base has 11 total mips (0–10). The down-sample filter can be box (simple averaging), Lanczos (sharper but more expensive), or — for sRGB textures — must gamma-decode, filter in linear, then re-encode; many older engines skipped gamma-correct mip generation, which is why distant textures looked "washed out" in their games. ② +33 % VRAM: the geometric series 1 + 1/4 + 1/16 + … = 4/3, fixed extra cost. Almost always worth it, unless the texture is only ever used up close (UI, screen FX). ③ GPU picks the mip level by LOD at sample time: in a pixel shader the GPU can compute dPdx / dPdy (the partial derivatives of the texture coordinate in screen X / Y), use that to estimate "how much texture one screen pixel covers," and take the log to get a floating-point LOD. The integer part chooses the mip; the fractional part feeds trilinear filtering — bilinear blending between two adjacent mips to mask the visible "seams" of a mip transition. ④ Anisotropic filtering is an extension of mipmap — at oblique viewing angles (looking down at distant ground, say), one screen pixel covers a long thin rectangle on the texture, not a square, and plain trilinear over-blurs. Aniso filtering takes multiple samples along the major axis and weights them, giving better quality at the cost of bandwidth — usually exposed as a "16× aniso" toggle. ⑤ Containers embed the mip chain: KTX / KTX2 / DDS all concatenate mip 0 → mip N into the payload so a load can mmap the whole pyramid into VRAM at once — which is why GPU container formats have always been designed hand-in-glove with mipmaps.
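+33% 的显存账直接来自这条几何级数,可以展开验证:

The +33 % VRAM figure drops straight out of the geometric series — easy to verify by expanding it:

```python
# Mip-chain VRAM overhead: 1 + 1/4 + 1/16 + ... -> 4/3 of the base texture.
def mip_levels(base):
    """Square-texture mip edge lengths from base down to 1x1."""
    sizes = []
    while base >= 1:
        sizes.append(base)
        base //= 2
    return sizes

levels = mip_levels(1024)                 # mips 0..10 -> 11 levels
texels = sum(s * s for s in levels)
overhead = texels / (1024 * 1024)
print(len(levels), round(overhead, 4))    # 11 1.3333
```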

适用

USE FOR

  • 任何 3D 场景纹理(地形、建筑、角色、远景)——必须开启
  • 视差贴图、normal map、AO、roughness 等 PBR 纹理
  • 大场景中需要保留远处细节但避免 aliasing 的所有用途
  • 纹理 cache 性能优化(远处用小 mip,提升 L2 命中)
  • Any 3D scene texture (terrain, architecture, characters, vistas) — must be enabled
  • Parallax / normal / AO / roughness and other PBR maps
  • Any case in a large scene that wants distant detail without aliasing
  • Texture cache optimisation (distant geometry uses small mips, boosting L2 hit-rate)

反适用

AVOID

  • 2D UI 元素、屏幕后处理 LUT(永远 1:1 采样,mip 浪费 33% 显存)
  • 动态生成的纹理(每帧重新生成 mip 太贵)
  • Render target / framebuffer attachment(通常不需要 mip)
  • 极小图(< 32×32,mip 层只有几层,收益小)
  • 2D UI elements, screen post-process LUTs (always 1:1 sampling — mip wastes 33 % VRAM)
  • Dynamically generated textures (regenerating mips every frame is expensive)
  • Render target / framebuffer attachments (usually don't need mips)
  • Very small textures (< 32×32 — only a couple of mip levels, minimal benefit)
scope · APIs · tools · CLI
Mipmap ✓✓ 所有 GPU 硬件原生 · OpenGL / D3D / Vulkan / Metal / WebGL / WebGPU 全部内置 ✓✓ glGenerateMipmap(GPU 端生成) · texconv -m · NVIDIA Texture Tools · KTX-Software toktx · ImageMagick texconv -m 0 -f BC7_UNORM in.png(0 = full chain)· toktx --genmipmap out.ktx2 in.png
起源:origin: Lance Williams · "Pyramidal Parametrics" · SIGGRAPH 1983 运行环境:runtime substrate for: BCn / ETC2 / ASTC 内置于:embedded in: KTX · DDS · KTX2 · 所有 GPU 容器 延伸:extension: anisotropic filtering / sparse virtual textures (Mega-Texture)

OpenEXR — 影视行业标准

OpenEXR — the film industry standard

YEAR 1999 (ILM 内部) / 2003 (开源) AUTHOR Industrial Light & Magic · Florian Kainz / Rod Bogart EXT .exr MIME image/x-exr STD ASWF (Academy Software Foundation, Linux Foundation) CODECS NONE / RLE / ZIP / PIZ (wavelet 无损) / PXR24 / B44 / DWAA / DWAB DEPTH float16 (half) · float32 · uint32 CHANNELS 任意数量 · 不止 RGBA · 可有 Z / motion / object_id / normal / UV / pass… ALPHA ✓ (premultiplied 默认) ANIM 帧序列 (多文件 .0001.exr / .0002.exr) · 单文件 multi-part (2.0+) STATUS VFX / 动画 / 渲染输出行业默认 · ACES 工作流绑定

"星战幕后用了 30 年的格式,你做合成第一个学的就是它。"

"30 years of Star Wars VFX runs on this. The first format you learn in compositing."

1999 年 ILM 在做《珍珠港》等片的前期合成时,发现手上没有一个合用的中间格式:16-bit TIFF 不够动态范围(镜头闪光、火焰、HDRI 环境贴图很容易超过 1.0),Radiance HDR(C29)只有 RGBE 三通道、不能装 Z-depth / motion vector / object ID。VFX 合成流程的真实需求是:(a) 真 HDR float——亮度无上限,可正可负;(b) 任意 channel——一张文件能塞 RGBA + Z + Normal + Motion + Object ID + UV pass + 几十层灯光分层;(c) tile-based 部分加载——一个 8K EXR 可能 2 GB,Nuke / Houdini 经常只读视口看得到的那一小块;(d) 多分辨率 mip——给 IBL 环境贴图直接拿不同 LOD 采样。OpenEXR 就是为这四件事设计的,30 年没出过第二个对手。

In 1999 ILM, deep in early compositing work on Pearl Harbor and other shows, discovered that no existing intermediate format fit their pipeline: 16-bit TIFF lacked dynamic range (lens flares, explosions and HDRI environment maps easily exceed 1.0), and Radiance HDR (C29) only carried three RGBE channels — no Z-depth, motion vector or object ID. The real VFX-compositing requirements were: (a) true HDR float — unbounded brightness, possibly negative; (b) arbitrary channels — one file holding RGBA + Z + Normal + Motion + Object ID + UV pass + dozens of light groups; (c) tile-based partial loading — an 8K EXR can be 2 GB, and Nuke / Houdini routinely read only the viewport tile; (d) multi-resolution mips — sampling IBL environment maps at the right LOD. OpenEXR was designed for those four needs, and in 30 years no rival has emerged.

EXR · FILE LAYOUT (SCANLINE / TILED) 0x00 EOF M magic 4 B V ver 4 B header attributes displayWindow / dataWindow / compression / lineOrder… name=value · null-terminated channel list R/G/B/A/Z/Normal.x/… name + pixelType offsets tile/scan → random IO pixel data tiles or scanline blocks codec compressed ↳ pixel data zoom · tiled mode (4×4 tiles) read only this offset table → seek to any single tile in O(1) → 2 GB EXR, read 64 KB Nuke / Houdini viewport streams just visible tiles
图 28a · OpenEXR 文件布局。开头 magic + version,接下来是 header attributes(displayWindow / dataWindow / compression / lineOrder 等键值对),然后是 channel list(每个 channel 名 + pixelType,可以有几十个),再到 tile / scanline offset table(随机访问索引),最后才是按 codec 压缩过的 pixel data。底部展示 tiled 模式下"只读视口那一块"的能力——offset table 让你能 seek 到任意 tile,Nuke / Houdini 加载 2 GB EXR 时只 stream 屏幕上看得到的几十 KB。这是 OpenEXR 跟 TIFF / PNG 最本质的区别:不是为完整读图设计的,是为合成流水线的"局部读"设计的。
Fig 28a · OpenEXR file layout. The header begins with magic + version, then a stream of attributes (displayWindow / dataWindow / compression / lineOrder…), then a channel list (each channel's name + pixelType — there can be dozens), then a tile / scanline offset table (random-access index), and finally the codec-compressed pixel data. The lower diagram shows tiled mode's "read only the visible tile" capability: the offset table lets you seek to any tile in O(1), so Nuke / Houdini reading a 2 GB EXR only ever streams the few kilobytes the viewport actually shows. That is OpenEXR's deepest difference from TIFF / PNG: it was not designed for whole-image reads, it was designed for partial reads in a compositing pipeline.
EXR CODECS · RELATIVE FILE SIZE (lower = smaller) NONE 1.00 RLE 0.78 ZIP 0.48 PIZ 0.33 ← lossless 默认 PXR24 0.27 (lossy 24-bit) DWAA 0.12 ← lossy 视觉无损 桔 = 无损快 · 蓝 = 无损 · 紫 = lossy float24 · 绿 = lossy 视觉无损 (DCT)
图 28b · EXR 6 种主流 codec 在同张 4K HDRI 上的相对体积:NONE 1.00(基准,纯字节流);RLE 0.78(适合大片同色,如 alpha matte);ZIP 0.48(zlib 通用快速);PIZ 0.33(wavelet 无损,行业默认,比 ZIP 略慢但小 30%);PXR24 0.27(把 float32 截到 24-bit,几乎无损);DWAA 0.12(基于 DCT 的 lossy,视觉无损,体积砍 5-10×,常用于 dailies)。EXR 的特殊之处:你能 per-part / per-channel 选 codec——RGBA 用 DWAA,Z-depth 必须 ZIP 无损,Object ID 用 RLE 整数最佳。
Fig 28b · Relative size of the six main EXR codecs on the same 4K HDRI: NONE 1.00 (raw byte stream baseline); RLE 0.78 (great for large flat regions like alpha mattes); ZIP 0.48 (general-purpose, zlib, fast); PIZ 0.33 (wavelet, lossless, the industry default — slightly slower than ZIP but ~30 % smaller); PXR24 0.27 (truncates float32 to 24 bits, nearly lossless); DWAA 0.12 (DCT-based lossy, visually lossless, 5-10× smaller — the dailies favourite). What makes EXR special: you can pick a codec per part / per channel — DWAA for RGBA, lossless ZIP for Z-depth, RLE for the integer Object ID.
EXR · ONE FILE, MANY CHANNELS scene_v003.exr · 1 file RGBA Z depth N.xyz normal M.xy motion objID int UV.xy pass light_* key/fill/rim/… channel list: R, G, B, A, Z, N.x, N.y, N.z, M.x, M.y, objID, UV.x, UV.y, light_key.r, light_fill.r, light_rim.r, … → Nuke "Shuffle" 节点按名抽取任意一组 · 合成时 Z + N + M 全在手
图 28c · 一张典型 VFX 渲染输出 EXR:除了常规 RGBA,还可同时存 Z(depth pass,景深 / 雾 / Z-merge 用)、Normal(法线 pass,relight 用)、Motion(运动矢量,motion blur 用)、Object ID(整数 ID,选择性合成用)、UV(投影修改贴图)、以及若干灯光分层(key / fill / rim 各自的 RGB,用于在合成里调灯比)。一张 EXR 几十个 channel 是常态。Nuke 用 "Shuffle" 节点按名抽取——这就是为什么 EXR 是合成行业的母语,而 PNG / JPEG 在合成里完全没用武之地。
Fig 28c · A typical VFX render-output EXR: alongside the regular RGBA it also stores Z (depth pass — for DOF, fog, Z-merge), Normal (for relighting), Motion (motion-blur vectors), Object ID (integer IDs for selective compositing), UV (projection-based texture edits), plus several light-group passes (key / fill / rim — each as its own RGB, used to re-balance lighting in the comp). Dozens of channels in one EXR is normal. Nuke's "Shuffle" node extracts any group by name — which is exactly why EXR is the compositing industry's mother tongue, and why PNG / JPEG simply have no role in the comp room.
PIXEL TYPE · NUMERIC RANGE (log scale) −1e38 −65504 0 +65504 +1e38 uint16 [0, 65535] · LDR · 老 TIFF float16 [−65504, 65504] · ±Inf/NaN · EXR 默认 float32 ±3.4e38 · 高精度科学渲染 2× 体积 SDR=1
图 28d · 三种像素类型的数值范围。uint16 只能 [0, 65535] 而且只是整数,SDR 上限是 1.0(=65535)——传统 TIFF 用它。float16(half)用 16 bit 表示 [−65504, +65504] 的浮点 + ±Inf + NaN,可以远超 SDR 1.0(高光、火焰、太阳),也可以是负数(色彩空间变换中间值),这是 EXR 的默认。float32 范围 ±3.4×10³⁸,精度极高但体积 2×,科学渲染 / 高精度 IBL 用它。HDR 的本质就是"突破 1.0 这条线",而 float16 是工程上最经济的实现。
Fig 28d · Numeric range for the three pixel types. uint16 covers only the integer range [0, 65535] — the SDR ceiling is 1.0 (= 65535), the realm of traditional TIFF. float16 (half) uses 16 bits to represent floats in [−65504, +65504] plus ±Inf and NaN, can far exceed SDR 1.0 (highlights, fire, the sun), and can be negative (intermediate values in colour-space transforms) — this is EXR's default. float32 spans ±3.4 × 10³⁸ with extreme precision, but at 2× the storage; reserved for scientific rendering and high-precision IBL. HDR is fundamentally about "crossing the 1.0 line," and float16 is the cheapest engineering vehicle for that.
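The float16 numbers above are easy to verify directly: Python's `struct` module has understood IEEE 754 binary16 (format code `'e'`) since 3.6. A minimal sketch decoding hand-built bit patterns — the constants below are standard binary16 encodings (sign 1 · exponent 5 · fraction 10 bits), nothing EXR-specific:

```python
import math
import struct

def half_from_bits(bits: int) -> float:
    """Decode an IEEE 754 binary16 bit pattern (sign 1 · exp 5 · frac 10)."""
    return struct.unpack('<e', struct.pack('<H', bits))[0]

# 0 11110 1111111111 → largest finite half: (2 − 2⁻¹⁰) × 2¹⁵ = 65504
assert half_from_bits(0x7BFF) == 65504.0
assert half_from_bits(0xFBFF) == -65504.0          # sign bit flips symmetrically
assert math.isinf(half_from_bits(0x7C00))          # exp all-ones, frac 0 → +Inf
assert math.isnan(half_from_bits(0x7C01))          # exp all-ones, frac ≠ 0 → NaN

# Precision near 1.0: one mantissa step is 2⁻¹⁰ ≈ 0.001 — ample for pixel data
assert half_from_bits(0x3C01) - half_from_bits(0x3C00) == 2 ** -10
```

The ±Inf / NaN patterns are exactly why half is comfortable as a render-buffer type: an over-bright sample saturates to Inf instead of silently wrapping.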

技术内核

Technical core

OpenEXR 的设计跟 PNG / JPEG / TIFF 走的不是一条路——它不是"把一张 RGBA 图存好",而是"为合成流水线提供一个可流式部分加载、可任意配 channel、可分通道选 codec 的容器"。① 任意 channel:不止 RGBA,可以是 R / G / B / A / Z / object_id / motion.x / motion.y / normal.x / normal.y / normal.z / UV.x / UV.y 以及任意自定义命名。channel list 在 header 里,每个 channel 自带 pixelType(half / float / uint)和 sampling rate(支持次采样)。② 半精度 float16(half)是默认 pixelType,16 bit 表示 [−65504, +65504] + ±Inf + NaN——这是 ILM 跟 NVIDIA 在 1999 年一起定义的格式,后来被 IEEE 754-2008 收编(binary16),并成为 GPU 显存里 HDR 纹理的事实标准。③ 多压缩 codec:NONE(纯字节流)/ RLE(整数离散最佳)/ ZIP(zlib 通用)/ ZIPS(逐 scanline ZIP)/ PIZ(wavelet 无损,行业默认)/ PXR24(把 float32 截到 24-bit,Pixar 贡献,几乎无损)/ B44 / B44A(老 lossy)/ DWAA / DWAB(基于 DCT 的现代 lossy,DreamWorks 贡献,体积砍 5-10×,常用于 dailies)。可以 per-part 选不同 codec——RGBA 用 DWAA、Z 用 ZIP、Object ID 用 RLE,各取所长。④ Tile-based 部分加载:文件可选 scanline 或 tile 模式,tile 模式下 header 里有 offset table,Nuke / Houdini / Mari 加载 8K EXR 时只读视口看得到的几个 tile(可能 64 KB 而不是 2 GB)——这个能力是非线性合成软件 / 数字绘景的命脉。⑤ 多分辨率 mip:tiled EXR 可存 mipmaps(rip-maps 也行),IBL 环境贴图按 LOD 直接采样,不必外部生成 mip chain。⑥ 多帧 / multi-part:OpenEXR 1.x 用 .0001.exr / .0002.exr 帧序列(每帧一文件,管线友好);2.0(2013)引入单文件多 part,可在一个 .exr 里塞多个 layer / 多个 view(立体渲染左右眼)/ 多个分辨率,每个 part 独立 codec。这种"容器化"路线让 EXR 跟 USD / OCIO / ACES 这些现代 VFX 中间件无缝衔接。

OpenEXR's design takes a different road from PNG / JPEG / TIFF — it is not "store one RGBA image well" but "provide a streamable, partially-loadable container with arbitrary channels and per-channel codec choice for the compositing pipeline." ① Arbitrary channels: not just RGBA but R / G / B / A / Z / object_id / motion.x / motion.y / normal.x / normal.y / normal.z / UV.x / UV.y plus any custom name. The channel list lives in the header, each channel carrying its own pixelType (half / float / uint) and sampling rate (subsampling supported). ② Half-precision float16 is the default pixelType — 16 bits representing [−65504, +65504] plus ±Inf and NaN — a format ILM and NVIDIA jointly defined in 1999, later folded into IEEE 754-2008 (binary16) and now the de-facto standard for HDR textures in GPU VRAM. ③ Multiple compression codecs: NONE (raw bytes) / RLE (best for integer discrete data) / ZIP (general-purpose zlib) / ZIPS (per-scanline ZIP) / PIZ (wavelet lossless, industry default) / PXR24 (truncates float32 to 24 bits — Pixar's contribution, near-lossless) / B44 / B44A (legacy lossy) / DWAA / DWAB (modern DCT-based lossy, DreamWorks' contribution, 5-10× smaller — the dailies workhorse). Codecs can be picked per part — DWAA on RGBA, ZIP on Z, RLE on Object ID; each plays to strength. ④ Tile-based partial loading: files can be scanline or tile mode; in tile mode the header carries an offset table, so Nuke / Houdini / Mari loading an 8K EXR read only the visible tiles (possibly 64 KB out of 2 GB) — that capability is the lifeblood of node-based compositors and digital matte painters. ⑤ Multi-resolution mips: tiled EXR can carry mipmaps (or rip-maps), so IBL environment maps sample at the right LOD without an external mip chain. 
⑥ Multi-frame / multi-part: OpenEXR 1.x used per-frame .0001.exr / .0002.exr sequences (one file per frame, pipeline-friendly); 2.0 (2013) added single-file multi-part, packing multiple layers, multiple views (stereo left/right), or multiple resolutions into one .exr with per-part codec choice. That "container-like" direction lets EXR plug seamlessly into modern VFX middleware — USD, OCIO, ACES.
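The "magic + version" prefix of an EXR file is small enough to parse by hand. A sketch of that first step, with the flag bit positions taken from the OpenEXR file-layout documentation (bit 9 = single-part tiled, bit 10 = long names, bit 11 = deep data, bit 12 = multi-part) — treat the exact bit assignments as something to double-check against the spec:

```python
import struct

EXR_MAGIC = 20000630  # 0x01312f76 — on disk as little-endian bytes 76 2f 31 01

def parse_exr_prefix(data: bytes) -> dict:
    """Parse the 8-byte magic + version prefix at the start of an EXR file."""
    magic, version = struct.unpack_from('<ii', data, 0)
    if magic != EXR_MAGIC:
        raise ValueError('not an EXR file')
    return {
        'version':    version & 0xFF,          # low byte: format version (2)
        'tiled':      bool(version & 0x200),   # single-part tiled file
        'long_names': bool(version & 0x400),   # >31-char attribute/channel names
        'deep':       bool(version & 0x800),   # deep (non-flat) data
        'multipart':  bool(version & 0x1000),  # OpenEXR 2.0 multi-part container
    }

# Synthetic prefix: version 2, tiled single-part file
prefix = struct.pack('<ii', EXR_MAGIC, 2 | 0x200)
print(parse_exr_prefix(prefix))
# → {'version': 2, 'tiled': True, 'long_names': False, 'deep': False, 'multipart': False}
```

Everything after these 8 bytes is the attribute stream and channel list described in Fig 28a.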

OPENEXR · PER-CHANNEL CODEC · TILE SPLIT · MULTI-PART CONTAINER render output float16 multi-channel buffer RGBA · float16 Z (depth) · float32 Normal.xyz · float16 Motion.xy · float16 objectID · uint32 UV.xy · float16 light_key.rgb · float16 light_fill.rgb · float16 … dozens more channels … GROUP & CODEC PICK RGBA → DWAA (lossy) Z (depth) → ZIP (lossless) Normal.xyz → ZIP (lossless) Motion.xy → ZIP (lossless) objectID → RLE (integer) light_*.rgb → DWAA (lossy) UV.xy → ZIP (lossless) TILE / SCANLINE SPLIT build offset table for random IO tiled · 64×64 px each · seekable → Nuke streams just visible tiles scene.exr multi-part container part 0 · beauty · DWAA part 1 · Z+N+M · ZIP part 2 · objectID · RLE part 3 · light_* · DWAA part 4 · UV · ZIP + offset tables + ACES color space attr RGBA / light passes use lossy DWAA (visually lossless, 5-10× smaller). Z / Normal / Motion / UV stay lossless ZIP (geometry data must survive bit-exact). Integer Object ID uses RLE. Each part can be a different codec, a different pixelType, a different resolution — that's the EXR philosophy: per-channel everything.

图 28 · OpenEXR 完整编码流程。渲染输出是一个多通道浮点 buffer——RGBA、Z(深度,float32)、Normal.xyz、Motion.xy、Object ID(整数)、UV、若干灯光分层。EXR 编码器先按用途分组(beauty / data / int / light / UV),再per-part 选 codec:RGBA 和灯光分层用 DWAA(视觉无损,体积砍 5-10×);Z / Normal / Motion / UV 必须 ZIP 无损(几何数据 1 bit 错就出 artifact);Object ID 是整数用 RLE 最优。然后把每个 part 切成 64×64 tile 并建 offset table(Nuke 加载时可只读视口 tile),最后封装进 multi-part 容器并附 ACES 色彩空间属性。整张 8K EXR 可能从 RAW 800 MB 压到 80 MB,而且任何子集可独立读出——这是 30 年没人能替代它的原因。

Fig 28 · OpenEXR's full encode pipeline. The render output is a multi-channel floating-point buffer — RGBA, Z (depth, float32), Normal.xyz, Motion.xy, Object ID (integer), UV, and several light-group passes. The EXR encoder first groups channels by purpose (beauty / data / int / light / UV) and then picks a codec per part: RGBA and light groups go through DWAA (visually lossless, 5-10× smaller); Z / Normal / Motion / UV must stay lossless ZIP (a single bit-flip in geometry data is a visible artifact); integer Object ID is best as RLE. Then each part is split into 64×64 tiles with an offset table (so Nuke can read just the viewport tiles), and finally everything is packed into a multi-part container with the ACES colour-space attribute attached. An entire 8K EXR can compress from raw 800 MB down to about 80 MB, and any subset can still be read independently — which is why no one has displaced it in 30 years.

format | bit depth | float | channels | typical use
8-bit JPEG | 8 | – | RGB(YCbCr 内部) | screen photo / web
16-bit TIFF | 16 int | – | RGBA | print photo / scan
Radiance HDR | RGBE 32-bit | ✓ (shared exp) | RGB | early CG / IBL
OpenEXR | 16 / 32 float | ✓ (true) | unlimited (任意 named) | VFX / film / render output
HDR10 / HLG | 10-bit PQ | ✗ (perceptual) | YCbCr | TV broadcast / streaming
$ exrheader scene.exr                          # 看 channel / codec / displayWindow / attrs
$ exrinfo scene.exr                            # 简洁版 header 摘要 (OpenEXR 3.x)
$ oiiotool scene.exr -ch R,G,B,A -o rgb.exr    # OpenImageIO 抽 channel
$ oiiotool scene.exr --ch Z -o depth.exr       # 单独抽 depth pass
$ exrenvmap input.exr cubemap.exr              # latlong → cube · IBL 预处理
$ exrmaketiled in.exr tiled.exr                # scanline → tiled (启用部分加载)
$ exrmultipart -combine -i a.exr b.exr -o m.exr # 多 part 合并到一个文件
$ exr2aces in.exr out.exr                      # 转 ACES2065-1 色彩空间

适用

USE FOR

  • VFX 合成中间格式(Nuke / After Effects / Fusion 必备)
  • 渲染器输出(Arnold / V-Ray / Renderman / Cycles 默认 EXR)
  • IBL 环境贴图(latlong / cube,带 mip)
  • ACES 工作流(2015+ 几乎所有好莱坞片)的全流程交换格式
  • 数字绘景 / matte painting(Mari / Photoshop 32-bit 模式)
  • 需要保留 Z / Normal / Motion / Object ID 等 AOV pass 的渲染管线
  • 立体 / 多视图渲染(单文件 multi-part 装左右眼)
  • VFX compositing intermediate (mandatory in Nuke / After Effects / Fusion)
  • Renderer output (Arnold / V-Ray / Renderman / Cycles default to EXR)
  • IBL environment maps (latlong / cubemap, with mip chain)
  • End-to-end exchange format for any ACES workflow (essentially every Hollywood release post-2015)
  • Digital matte painting (Mari, Photoshop 32-bit mode)
  • Render pipelines that must preserve AOV passes — Z / Normal / Motion / Object ID
  • Stereo / multi-view renders (single-file multi-part packs left/right eyes)

反适用

AVOID

  • Web 显示(浏览器不解 EXR · 文件巨大)
  • 移动端 / app 资源(没有 GPU 硬件解码 · 体积不友好)
  • 消费级照片分发(用 JPEG / AVIF / HEIF)
  • 需要 8-bit / 整数像素的最终交付(用 TIFF / PNG)
  • 对体积极敏感的传输场景(EXR 即便 DWAA 也比 JPEG 大几倍)
  • Web display (browsers don't decode EXR; files are huge)
  • Mobile / app assets (no GPU hardware decode; size unfriendly)
  • Consumer photo distribution (use JPEG / AVIF / HEIF)
  • Final-delivery 8-bit / integer pixels (use TIFF / PNG)
  • Bandwidth-critical transmission (even DWAA EXR is several times larger than JPEG)
scope | APIs / DCC | tools | CLI
OpenEXR | ✓✓ Nuke · Houdini · Mari · Maya · Blender · Cinema 4D · DaVinci Resolve · After Effects · Fusion · Photoshop(32-bit) · Arnold / V-Ray / Renderman / Cycles 渲染器全部原生 | ✓✓ OpenEXR 官方 lib(C++) · OpenImageIO(oiiotool) · ImageMagick · ffmpeg(EXR sequence) · DJV / mrViewer 看片器 | exrheader · exrinfo · oiiotool · exrmaketiled · exrenvmap · exrmultipart
前辈:predecessors: Radiance HDR · TIFF(EXR 是它们的精神接班人) 起源:origin: ILM (Florian Kainz / Rod Bogart) · 1999 内部使用 · 2003 SIGGRAPH 开源 同生态:ecosystem peers: OpenColorIO · OpenVDB · USD · MaterialX(同属 ASWF 现代 VFX 中间件) 影响:influence: USDZ / glTF 的 HDR IBL 标准都源自 EXR latlong/cubemap 的设计 绑定标准:bound to standard: ACES (AMPAS) — 现代电影工业 HDR 色彩管理 · EXR 是 ACES 的原生交换格式 荣誉:honour: Academy Award for Technical Achievement, 2007 — 唯一拿过奥斯卡的图像格式

Radiance HDR — 光照贴图老兵

Radiance HDR — the lightmap veteran

YEAR 1989 AUTHOR Greg Ward · Lawrence Berkeley National Lab(Radiance renderer) EXT .hdr · .pic MIME image/vnd.radiance LOSSY RGBE 编码(hack-y lossy · ~1% 精度) DEPTH 32-bit RGBE(三个 8-bit 尾数 + 一个 8-bit 共享指数) RANGE 理论 1e-38 ~ 1e38 ALPHA ✗ STATUS IBL / 全景图老兵 · 老资产仍在用

"用 8-bit 共享指数装 32-bit float —— 1989 的 hack。"

"32-bit float packed via shared 8-bit exponent — a 1989 hack."

1989 年 Greg Ward 在 Lawrence Berkeley National Lab 写 Radiance —— 一套物理光照模拟渲染器,要算光在场景里的真实辐射度,输出值会从 1e-6(月光)横跨到 1e6(太阳直射)。当时的难题不是算法,而是把这些数装到磁盘里:浮点 IEEE 754 32-bit/通道的话,一张 1024×768 的图就要约 9 MB,而 1989 的硬盘是几十 MB 起跳的奢侈品。Greg Ward 的 hack:RGB 三个通道共享一个 8-bit 指数—— 把 R/G/B 三个 float 归一化到同一个 2^E 之下,只存归一化后的 8-bit 尾数 + 一个 8-bit 指数,合计 32 bit/像素(跟 RGBA8 一样大)。范围理论上 1e-38 到 1e38,精度 ~1%(对光照足够,对色彩管理就显粗糙)。再配一个极简的 RLE 行内压缩,这就是 .hdr 格式。靠着这个 hack,IBL 环境贴图、PSPI(panoramic stereo painted images)、HDR 全景照片在 90 年代到 2000 年代撑了 20 年,直到 OpenEXR 把它替换掉。

In 1989 Greg Ward at Lawrence Berkeley National Lab was writing Radiance — a physically-based lighting-simulation renderer — and needed to store radiance values that spanned 1e-6 (moonlight) to 1e6 (direct sun). The challenge wasn't the math; it was fitting that range on disk: IEEE 754 32-bit per channel meant a 1024×768 image cost about 9 MB, and 1989 hard drives were luxuries measured in tens of megabytes. Ward's hack: have R/G/B share a single 8-bit exponent — normalise the three floats to the same 2^E, store the normalised 8-bit mantissas plus one 8-bit exponent, totalling 32 bit/pixel (the same as RGBA8). Range nominally 1e-38 to 1e38, precision ~1 % (good enough for lighting, coarse for colour management). Add a minimal scanline RLE on top, and you have the .hdr format. The hack carried IBL environment maps, PSPI panoramas and HDR photography through the 1990s and 2000s for two decades, until OpenEXR finally retired it.

RGBE · 4 BYTES PER PIXEL · SHARED EXPONENT R G B E 8 bit · mantissa 8 bit · mantissa 8 bit · mantissa 8 bit · shared exp ↑ shared by R, G, B value = (RGB / 256) × 2^(E − 128) range ~ 1e−38 to 1e38 · precision ~1 %
图 29 · Radiance RGBE 字节布局。每个像素 4 个字节:三个 8-bit 尾数 R / G / B 各占一字节,加一个三个通道共享的 8-bit 指数 E。解码公式 value = (RGB / 256) × 2^(E − 128)——共享指数让三个通道一起放缩,代价是亮度差异极大的颜色(比如蓝色通道很弱、红色很强)精度退化。范围理论上 1e-38 ~ 1e38,精度 ~1%——对光照模拟够用,对色彩管理就嫌粗糙。这是 1989 年的工程取舍:用跟 RGBA8 一样的 32 bit/pixel 装下 6 个数量级的动态范围。
Fig 29 · Radiance RGBE byte layout. Four bytes per pixel: an 8-bit mantissa for each of R / G / B plus a single 8-bit exponent E shared across all three channels. Decode is value = (RGB / 256) × 2^(E − 128) — the shared exponent scales the three channels together, the cost being precision loss when channel intensities differ wildly (a strong red beside a weak blue). Range is nominally 1e−38 to 1e38 at ~1 % precision — fine for lighting simulation, coarse for colour management. The 1989 trade-off: pack six orders of magnitude into the same 32 bit/pixel as RGBA8.

技术内核

Technical core

Radiance HDR 的内核小到只有三件事。① RGBE 编码——三个通道共用一个指数。编码时找 max(R, G, B),归一化到 [0, 1],尾数乘 256 取整,指数加 128 偏移存为一字节;解码时反向。共享指数让"亮度差异极大的颜色"(蓝色通道极弱、红色极强)精度退化——这是它跟 float16 / float32 在色彩管理意义上的本质差距。② 极简 RLE——文件里每一行像素分开压缩:旧格式整行 RGBE 一起 RLE,1991 之后改成"先把 R / G / B / E 四个字节流分别拆开,再各自 RLE",压缩率显著提升(因为 E 经常大段重复,RLE 在它上面收益最大)。压缩开销小到 90 年代的 SGI 能软件实时解码。③ 文本头——文件开头是 ASCII 头,几行 #?RADIANCE / 标识 / 曝光值 / EXPOSURE= / FORMAT= 32-bit_rle_rgbe,然后一个空行,然后是分辨率字符串(-Y 480 +X 640),再之后才是 RLE 二进制流。这种"文本头 + 二进制 payload"的设计后来被 PFM / NetPBM 继承。三件事加起来就是整个 .hdr 格式——简单、自包含、跨平台。代价:① 不支持 alpha;② 没有 metadata(没有 ICC profile、白点、色彩空间);③ 只有 RGB,不能存 Z / Normal / Motion;④ 共享指数精度天生粗糙。这些缺点直接催生了 OpenEXR 在 1999 年的设计目标——"做 Radiance HDR 做不到的所有事"。

The Radiance HDR core is just three things. ① RGBE encoding — three channels share one exponent. Encode by finding max(R, G, B), normalising to [0, 1], scaling mantissas by 256 and rounding, and storing the exponent biased by 128 in one byte; decode is the inverse. The shared exponent loses precision for "channels of wildly different magnitudes" (a strong red beside a weak blue) — that's the format's fundamental colour-management weakness compared to float16 / float32. ② Minimal RLE — pixels are compressed per scanline: the old format ran RLE over the interleaved RGBE bytes; the post-1991 format de-interleaves into four byte streams (R / G / B / E) and RLEs each separately, dramatically improving compression (E often has long runs, where RLE wins biggest). Compression overhead is light enough that 1990s SGI workstations decoded in software in real time. ③ Text header — the file begins with an ASCII header: a few lines of #?RADIANCE / identifier / EXPOSURE= / FORMAT=32-bit_rle_rgbe, then a blank line, then a resolution string (-Y 480 +X 640), and only then the RLE binary stream. The "text header + binary payload" pattern was later inherited by PFM and the NetPBM family. Those three things are the entire .hdr format — simple, self-contained, portable. The costs: ① no alpha; ② no metadata (no ICC profile, no white-point, no colour-space tag); ③ RGB only — no Z, normal or motion channels; ④ inherent precision floor from the shared exponent. Those very gaps drove OpenEXR's 1999 design brief: "do everything Radiance HDR can't."
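The encode / decode transform in ① fits in a few lines of Python — a minimal sketch (the real Radiance code adds careful rounding plus the per-scanline RLE from ②). Note how the strong channels survive to within a fraction of a percent while a channel several orders of magnitude weaker falls below one mantissa step and collapses to zero — exactly the shared-exponent weakness described above:

```python
import math

def rgbe_encode(r: float, g: float, b: float) -> tuple:
    """Pack three non-negative floats into one RGBE pixel (4 bytes)."""
    m = max(r, g, b)
    if m < 1e-38:
        return (0, 0, 0, 0)                 # all-zero pixel → E = 0 means black
    mant, exp = math.frexp(m)               # m = mant · 2^exp with mant ∈ [0.5, 1)
    scale = mant * 256.0 / m                # puts the max channel in [128, 256)
    return (int(r * scale), int(g * scale), int(b * scale), exp + 128)

def rgbe_decode(r: int, g: int, b: int, e: int) -> tuple:
    """value = (mantissa / 256) · 2^(E − 128), as in Fig 29."""
    if e == 0:
        return (0.0, 0.0, 0.0)
    f = math.ldexp(1.0, e - 128) / 256.0
    return (r * f, g * f, b * f)

# Bright HDR pixel: strong channels round-trip to ~0.5 %; the channel that is
# ~5 orders of magnitude weaker falls below one mantissa step and hits zero
out = rgbe_decode(*rgbe_encode(900.0, 1500.0, 0.04))
print(out)      # → (896.0, 1496.0, 0.0)
```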

适用

USE FOR

  • IBL 环境贴图老资产 · 兼容旧渲染器(1990s-2000s 的 .hdr 库)
  • 全景 latlong HDR 照片(Bracketed exposure stitch 工作流末端)
  • HDR 光照模拟的最终输出 · 论文 demo / 教学
  • 需要跨多个 DCC 但又不愿付 EXR 复杂度的场景
  • Legacy IBL environment maps · old-renderer compatibility (1990s-2000s .hdr libraries)
  • Panoramic latlong HDR photographs (the tail end of bracketed-exposure stitch workflows)
  • Final output of HDR lighting simulations · paper demos / teaching
  • Cross-DCC interchange when EXR's complexity isn't worth paying for

反适用

AVOID

  • 现代 VFX 合成场景(用 OpenEXR · 多通道 / 高精度)
  • 需要 alpha 的任何场景(RGBE 没有 A)
  • 色彩管理严格的工作流(共享指数精度太粗 · 没有 ICC)
  • 负数 / 复杂数学中间值(RGBE 只能存非负 RGB)
  • Modern VFX compositing (use OpenEXR — multi-channel, higher precision)
  • Anything needing alpha (RGBE has no A)
  • Strict colour-managed pipelines (the shared exponent is too coarse, no ICC)
  • Negative or complex intermediate maths (RGBE stores only non-negative RGB)
scope | renderers / tools | editors | CLI
Radiance HDR (.hdr / .pic) | ✓✓ Radiance · pbrt · Mitsuba · Arnold · V-Ray · Blender Cycles · 几乎所有 IBL 输入支持 | Photoshop(32-bit) · GIMP · Affinity Photo · HDRShop · Picturenaut | ra_ppm · ra_tiff · ra_xyze(Radiance 自带 ra_* 套件)· oiiotool --convert
起源:origin: Greg Ward · Lawrence Berkeley National Lab · 1989 · Radiance renderer 输出格式 思想前辈:conceptual ancestor: TIFF(自描述容器思想) 影视接班人:VFX successor: OpenEXR · 1999 ILM 设计目标"做 Radiance 做不到的" 仍活在:still alive in: 老 IBL 资产库 · HDRI Haven 早期下载包 · 教育材料

PFM — Portable FloatMap

PFM — Portable FloatMap

YEAR ~2005(spec by Paul Debevec / 学术圈) AUTHOR Paul Debevec(USC ICT)/ 计算机图形学术圈 EXT .pfm CONTAINER ASCII 头 + raw float32 像素流 DEPTH float32 RGB(PF)/ float32 灰度(Pf) COMPRESSION 无 · 故意 METADATA 无 · 故意 STATUS 学术 niche · 渲染器中间盘

"NetPBM 的 HDR 表亲 —— ASCII 头加裸 float。"

"NetPBM's HDR cousin — ASCII header plus raw float."

学术研究和渲染器中间格式有一种长期需求,主流格式都满足不了:"最简单的 HDR 容器"——不要任何压缩(读写都是 mmap,瞬间)、不要任何 metadata(纯净,bit 级 reproducibility)、能直接当 float* 数组操作(C 代码 fopen + fseek 过头部就能用,不需要任何库)。OpenEXR 太复杂(几百种 attribute、wavelet codec、tile / scanline 切换),Radiance HDR 精度太粗(RGBE shared exponent),float TIFF 的 IFD 解析又是一坨。Paul Debevec 等学术圈的人在 NetPBM(PPM / PGM / PBM)风格基础上,做了 PFM:三行 ASCII 头(magic / 宽高 / scale 字段)紧跟 raw float32 像素流。论文 supplementary、渲染器中间盘、调试图像 dump,这些场景里 PFM 是最舒服的——别的格式都嫌"太聪明"。

Academic research and renderer-internal storage share a recurring need that no mainstream format satisfies: the simplest possible HDR container — no compression (read / write is just mmap), no metadata (pure, bit-exact reproducibility), and direct use as a float* array (C code can fopen, fseek past the header, and operate on the bytes without any library). OpenEXR is too complex (hundreds of attributes, wavelet codecs, tile / scanline modes), Radiance HDR is too coarse (RGBE shared exponent), float TIFF's IFD parsing is its own mess. Paul Debevec and colleagues in academia took the NetPBM lineage (PPM / PGM / PBM) and produced PFM: three ASCII header lines (magic / width-height / scale) followed by a raw float32 pixel stream. For paper supplementaries, renderer internal dumps, and debugging-image scratch storage, PFM is the most comfortable choice — every other format feels "too clever."

PFM · 3-LINE TEXT HEADER + RAW FLOAT32 text header PF ← magic (PF=RGB · Pf=gray) 640 480 ← width height −1.0 ← scale (sign = endian) 3 lines · ~20 bytes raw float32 stream R₀ G₀ B₀ R₁ G₁ B₁ … 12 byte / pixel · RGB bottom-up (like BMP) no padding · mmap-able
图 30 · PFM 文件布局。前三行是 ASCII 文本:第一行 magic(PF = RGB · Pf = 灰度);第二行宽高(空格分隔);第三行 scale 字段(浮点数,符号位决定字节序——负数小端,正数大端,数值本身用作曝光缩放)。三行加起来约 20 字节。紧跟着就是 raw float32 像素流,RGB 顺序排列,12 字节/像素,自下而上(跟 BMP 同向)。整个文件可以 mmap 直接当 float* 用,跳过头部就行——这是 PFM 唯一的设计目标。
Fig 30 · PFM file layout. The first three lines are ASCII text: line 1 is the magic (PF = RGB, Pf = grayscale); line 2 is width and height separated by a space; line 3 is the scale field (a float whose sign bit encodes endianness — negative for little-endian, positive for big-endian — and whose magnitude doubles as an exposure factor). Total header ~20 bytes. The raw float32 pixel stream follows, RGB-interleaved, 12 bytes per pixel, stored bottom-up (same orientation as BMP). The entire file can be mmap'd as a float* after skipping the header — that simplicity is PFM's single design goal.

技术内核

Technical core

PFM 内核三件事,合起来不到 30 行 C 代码就能写完读写器。① NetPBM 风格的文本头:像 PPM 一样,前几行是 ASCII。第一行是 magic 标识——PF 表示 float32 RGB,Pf 表示 float32 单通道灰度。第二行是宽高(空格分隔的整数)。第三行是 scale 字段——一个浮点数,绝对值是曝光 / 缩放因子(读取时通常忽略,渲染器自己处理),符号编码字节序:负数小端,正数大端。三行,十几个字符。② raw float32 RGB 像素流:头部紧跟二进制 float32 数据,RGB 交错(R₀ G₀ B₀ R₁ G₁ B₁ …),12 字节/像素;灰度模式 4 字节/像素。自下而上(像 BMP,但跟 OpenGL 纹理坐标天然吻合)——这是最常见的踩坑点,新人写 reader 很容易上下颠倒。③ 无任何压缩 / 无任何 metadata:这是故意的。没有 ICC profile,没有色彩空间,没有曝光记录,没有作者注释——纯粹"一张数字"。这恰好是论文实验、渲染器调试、参考实现里最重要的属性:你要 reproduce 别人的结果,任何额外 metadata 都是干扰。代价是它没法用于生产:文件大(4K RGB float32 ≈ 95 MB,无压缩),没法做色彩管理,工具支持 niche。但在它的位置——学术调试、bit-exact 中间盘——没人能替代它。

PFM has three core elements; a complete reader/writer fits in under 30 lines of C. ① NetPBM-style text header: like PPM, the first few lines are ASCII. Line 1 is the magic — PF for float32 RGB, Pf for float32 grayscale. Line 2 is the width and height (space-separated integers). Line 3 is the scale field — a float whose absolute value is an exposure / scale factor (typically ignored at read time; the renderer handles tone mapping itself), and whose sign encodes endianness: negative is little-endian, positive is big-endian. Three lines, a dozen characters. ② Raw float32 RGB pixel stream: binary float32 data follows the header, RGB-interleaved (R₀ G₀ B₀ R₁ G₁ B₁ …) at 12 bytes per pixel; grayscale is 4 bytes per pixel. Bottom-up (like BMP, but conveniently aligned with OpenGL's texture coordinate origin) — the most common pitfall when writing a reader is flipping the rows. ③ No compression, no metadata: intentional. No ICC profile, no colour-space tag, no exposure record, no author comment — just "the numbers." That happens to be the single most important property for paper experiments, renderer debugging, and reference implementations: when you reproduce someone else's result, any extra metadata is noise. The cost is that PFM is unsuitable for production: files are huge (a 4K RGB float32 image is ~95 MB uncompressed), there's no colour management, and tooling support is niche. But in its niche — academic debugging, bit-exact intermediate scratch — nothing else replaces it.
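The "under 30 lines" claim is easy to make concrete. A minimal PFM writer / reader sketch following the layout in Fig 30 — RGB only (`PF`), scale magnitude ignored on read, rows flipped on both ends because the stream is bottom-up:

```python
import struct

def write_pfm(path: str, width: int, height: int, pixels: list) -> None:
    """Write an RGB float32 PFM ('PF'): 3 text lines, then raw bottom-up rows.

    `pixels` is a flat top-down list of 3·width·height floats; the scale of
    −1.0 declares little-endian (the sign is the endianness flag, Fig 30).
    """
    with open(path, 'wb') as f:
        f.write(b'PF\n%d %d\n-1.000000\n' % (width, height))
        for y in reversed(range(height)):              # PFM rows run bottom-up
            row = pixels[3 * width * y: 3 * width * (y + 1)]
            f.write(struct.pack('<%df' % (3 * width), *row))

def read_pfm(path: str):
    with open(path, 'rb') as f:
        assert f.readline().strip() == b'PF'           # 'Pf' would be grayscale
        width, height = map(int, f.readline().split())
        endian = '<' if float(f.readline()) < 0 else '>'
        rows = [struct.unpack('%s%df' % (endian, 3 * width), f.read(12 * width))
                for _ in range(height)]
    flat = [v for row in reversed(rows) for v in row]  # flip back to top-down
    return width, height, flat
```

Forgetting the `reversed(...)` on one side is the row-flip pitfall mentioned above — the image comes back upside down.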

适用

USE FOR

  • 学术论文 supplementary / reproducibility 数据集
  • 渲染器内部中间盘(每帧 dump,要求 mmap 速度)
  • 算法 bit-exact 对比(diff 两个 .pfm 必须完全一致)
  • 调试 / 可视化 float buffer(GPU readback dump)
  • Academic-paper supplementaries / reproducibility datasets
  • Renderer internal scratch storage (per-frame dumps that need mmap speed)
  • Bit-exact algorithm comparison (diff'ing two .pfm files must match byte for byte)
  • Debugging / visualising float buffers (GPU readback dumps)

反适用

AVOID

  • 任何生产场景(无压缩 · 文件巨大)
  • 需要色彩管理的工作流(无 ICC / 无色彩空间)
  • 跨工具协作(支持 niche)
  • Web / 移动端(浏览器不解码)
  • Any production scenario (uncompressed → enormous files)
  • Colour-managed pipelines (no ICC, no colour-space tag)
  • Cross-tool collaboration (niche support)
  • Web / mobile (no browser decode)
scope | tools | libraries | CLI
PFM (.pfm) | ImageMagick · OpenImageIO · pfstools · MATLAB / OpenCV(自定义 reader 常见) | libpfs · OIIO · pbrt 自带 reader · 论文配套源码常自带 30 行 C 实现 | pfsin / pfsout(pfstools)· oiiotool in.pfm -o out.exr
同源精神:kindred spirit: PPM / NetPBM · 简洁优先,文本头 + raw payload 并存:coexists with: OpenEXR(学术 vs 工业 · PFM 给 paper,EXR 给 pipeline) 活跃在:alive in: 计算机图形学论文 supplementary · pbrt / Mitsuba 教科书

16/32-bit TIFF — 被忽视的扛把子

16/32-bit TIFF — the unsung workhorse

YEAR 1986(1.0)· 1992(6.0,事实标准至今) AUTHOR Aldus → Adobe(1994 收购后接管) EXT .tif · .tiff MIME image/tiff STD TIFF 6.0(1992,Aldus / Adobe 联合发布) CODECS NONE / LZW / DEFLATE(ZIP)/ JPEG-in-TIFF / Group 4 fax / PackBits DEPTH 1-bit 黑白 → 32-bit float · 任意精度 ALPHA ✓(ExtraSamples · 多通道任意命名) ANIM 多页(IFD chain · 传真 / 扫描一文件多张) STATUS 印刷 / 摄影 / 扫描 / 卫星 / 医学 / 显微镜 行业默认

"40 年了,印刷厂还在用它,因为没有更好的替代。"

"40 years on, print shops still use it because nothing better replaced it."

1986 年 Aldus(后来被 Adobe 在 1994 收购)推 PageMaker —— 桌面排版革命的开端,问题是当时图像格式碎成一地:GIF 只有 256 色,EPS 是 PostScript 矢量,Mac PICT 跨不了平台,扫描仪厂商各自用私有格式。Aldus 跟扫描仪厂商一起设计了 TIFF —— Tag Image File Format —— 目标是"任何位深、任何 codec、任何 metadata、跨平台无损"。它的解法是 tag 系统:不像 BMP 那样固定字段,而是用 IFD(Image File Directory)装一个"几百种 tag 都可选"的描述表,payload 可换 codec,可多页,可跨设备元信息。从此所有需要"高保真 + 灵活元数据"的领域都默认 TIFF:印刷出版、扫描仪、显微镜、医学影像、卫星遥感、文物档案。40 年了,没人替代得了——不是因为它优秀,是因为它什么都能装:DICOM 内嵌它,GeoTIFF 是它的子集,DNG 是它的子集,Photoshop 16-bit 工作流默认它。被忽视的扛把子。

In 1986 Aldus (acquired by Adobe in 1994) launched PageMaker, the start of the desktop publishing revolution. The problem: image formats were a Tower of Babel — GIF only did 256 colours, EPS was vector PostScript, Mac PICT didn't cross platforms, scanner vendors each shipped a proprietary format. Aldus partnered with scanner vendors to design TIFF — Tag Image File Format — aiming for "any bit depth, any codec, any metadata, cross-platform, lossless." The solution was a tag system: rather than fixed fields like BMP, an IFD (Image File Directory) carries a descriptive table of "hundreds of optional tags," the payload swaps codecs, files can be multi-page, and device metadata travels with the image. From then on every domain needing "high fidelity + flexible metadata" defaulted to TIFF: print publishing, scanners, microscopes, medical imaging, satellite remote-sensing, museum archives. Forty years later nothing has replaced it — not because it's elegant, but because it can hold anything: DICOM embeds it, GeoTIFF is its subset, DNG is its subset, Photoshop's 16-bit workflow defaults to it. The unsung workhorse.

TIFF · IFD CHAIN · TAG-DRIVEN CONTAINER header 8 byte · MM/II · IFD0 ofs IFD0 · tag table tag=256 ImageWidth · LONG · 1 · 1024 tag=257 ImageLength · LONG · 1 · 768 tag=259 Compression · SHORT · 1 · 5(LZW) image data · LZW strip next IFD ofs IFD1 · page 2 tag=256 ImageWidth · … tag=259 Compression · ZIP tag=… ExtraSamples · α → next IFD ofs = 0(end) image data · ZIP strip 每个 tag = (id 2B, type 2B, count 4B, value/offset 4B) · 12 byte 一行 → 任意位深 / 任意 codec / 任意页数 / 任意自定义 metadata GeoTIFF / DNG / OME-TIFF / DICOM 都是这套 IFD 的子集扩展
图 31 · TIFF 的 IFD(Image File Directory)结构。文件以 8 字节 header 开始(MM=大端 / II=小端 + 版本号 42 + IFD0 偏移)。IFD0 是tag 表,每行 12 字节(tag id 2 字节 + 数据类型 2 字节 + 元素个数 4 字节 + value 或指向 value 的 offset 4 字节)。tag 256 = ImageWidth,tag 259 = Compression(5 = LZW,8 = DEFLATE,32773 = PackBits,7 = JPEG,…),tag 322 = TileWidth,几百种。每个 IFD 末尾还有"指向下一个 IFD 的 offset"——多页 TIFF 就是 IFD chain(传真组 4 文档常见,扫描仪扫一沓纸输出一个 .tif)。GeoTIFF / DNG / OME-TIFF / 内嵌 TIFF 的 DICOM 全都是在这个 tag 系统上加自定义 tag——TIFF 的真正持久力在于 tag 系统的扩展性,不在于 baseline TIFF 本身。
Fig 31 · TIFF's IFD (Image File Directory) structure. The file opens with an 8-byte header (MM=big-endian or II=little-endian + version 42 + IFD0 offset). IFD0 is a tag table; each row is 12 bytes (tag id 2 B + data type 2 B + element count 4 B + value-or-offset 4 B). Tag 256 = ImageWidth, tag 259 = Compression (5 = LZW, 8 = DEFLATE, 32773 = PackBits, 7 = JPEG, …), tag 322 = TileWidth — hundreds of tags exist. Each IFD ends with an "offset to the next IFD" — a multi-page TIFF is an IFD chain (Group 4 fax documents are the canonical case; flatbed scanners produce one .tif per stack of pages). GeoTIFF, DNG, OME-TIFF, and TIFF-embedded DICOM all extend the tag system with custom tags — TIFF's true longevity lies in the extensibility of the tag system, not in baseline TIFF itself.

技术内核

Technical core

TIFF 的设计核心可以总结成四条规则,40 年没变。① 基于 IFD 的 tag 系统:文件不是"按字段顺序"装数据,而是"我有什么属性,就在 tag 表里加一行"。tag id 是 16-bit 无符号整数(0~65535),Adobe 保留 32768 以下,32768~65535 是private tags(GeoTIFF / DNG / OME-TIFF 等子集格式都在这个区域)。每个 tag 自带数据类型(BYTE / SHORT / LONG / RATIONAL / ASCII / FLOAT / DOUBLE 等 12 种),解析器只要"我认识的 tag 处理,不认识的跳过"。这种设计直接借鉴了 IBM 的 EBCDIC 数据描述传统,后来又被 ISOBMFF / Matroska 等容器借鉴。② 多页(IFD chain):每个 IFD 末尾有一个指向"下一个 IFD"的 offset,多页 TIFF 就是把 IFD 串成链表。最经典用例是传真组 4(Group 4 Fax)——黑白文档扫描多页存一个 .tif;现在扩展到扫描仪批量扫描、显微镜 z-stack、卫星多光谱波段,每页一个 IFD。③ 多种 codec 可选:NONE(原始)/ PackBits(早期 Mac RLE)/ LZW(默认无损,90 年代有专利争议)/ DEFLATE(zlib,无损,现在最常用)/ JPEG-in-TIFF(把 JPEG bitstream 当 strip 数据装,1992 加,但 spec 模糊导致实现不一致)/ Group 3 / Group 4 Fax(双值黑白图像专用)/ LERC(地理空间近无损)。每个 strip 或 tile 独立 codec。④ 任意位深:1 bit(黑白扫描)/ 4 / 8(普通照片)/ 16(高保真扫描、医学影像)/ 32-bit float(IEEE 754,科研、HDR)。BitsPerSample tag 是个数组——可以是 (16, 16, 16) 表示 RGB 各 16 bit,可以是 (8, 8, 8, 8) 表示 RGBA8,甚至 (16, 16, 16, 16, 16, 16) 表示 6 通道高光谱。SampleFormat tag 进一步指定每个通道是 unsigned int / signed int / IEEE float / void(自定义)——这就是 TIFF 能存 16-bit 摄影、32-bit float HDR、整数 ID buffer 的根源。

TIFF's design boils down to four rules, unchanged in 40 years. ① IFD-based tag system: the file isn't laid out as "fields in fixed order," it's "whatever properties exist, add a row to the tag table." Tag IDs are 16-bit unsigned integers (0–65535); Adobe reserves 0–32767 and the 32768–65535 range is private tags (where GeoTIFF, DNG, OME-TIFF and other subset formats live). Each tag carries its own data type (BYTE / SHORT / LONG / RATIONAL / ASCII / FLOAT / DOUBLE — 12 in total), and a parser simply handles tags it knows and skips the rest. The design borrows directly from IBM's EBCDIC data-description tradition and was later borrowed by ISOBMFF, Matroska and other modern containers. ② Multi-page (IFD chain): each IFD ends with an offset to the next IFD, so multi-page TIFFs are linked lists of IFDs. The classic use case is Group 4 fax — multi-page black-and-white document scans in a single .tif; today this extends to flatbed batch scans, microscope z-stacks, and satellite multi-spectral bands, one IFD per page. ③ Multiple codec options: NONE (raw) / PackBits (early Mac RLE) / LZW (default lossless, embroiled in 1990s patent disputes) / DEFLATE (zlib, lossless, today's most common choice) / JPEG-in-TIFF (a JPEG bitstream stuffed into strip data, added in 1992 but with vague enough spec language that implementations still disagree) / Group 3 and Group 4 fax (bilevel black-and-white only) / LERC (near-lossless geospatial). Each strip or tile picks its codec independently. ④ Arbitrary bit depth: 1-bit (B&W scans) / 4 / 8 (regular photos) / 16 (high-fidelity scans, medical imaging) / 32-bit float (IEEE 754, science, HDR). The BitsPerSample tag is an array — it can be (16, 16, 16) for 16-bit-per-channel RGB, (8, 8, 8, 8) for RGBA8, or even (16, 16, 16, 16, 16, 16) for six-channel hyperspectral. The SampleFormat tag further specifies whether each channel is unsigned int / signed int / IEEE float / void (custom) — that combination is exactly why TIFF can hold 16-bit photography, 32-bit float HDR, and integer ID buffers in the same container.
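Rules ① and ② are concrete enough to parse by hand. Below is a minimal sketch of a classic little-endian TIFF walker in Python — header, then the IFD linked list, then 12-byte tag rows. The toy two-page file is hand-built in memory for illustration; real files are libtiff's job.

```python
import struct

def read_ifds(buf: bytes):
    """Walk a classic little-endian TIFF: header -> IFD chain -> tag rows."""
    byte_order, magic, ifd_ofs = struct.unpack_from("<2sHI", buf, 0)
    assert byte_order == b"II" and magic == 42    # "II" = little-endian, 42 = TIFF
    ifds = []
    while ifd_ofs:                                # rule ②: IFDs form a linked list
        (n_tags,) = struct.unpack_from("<H", buf, ifd_ofs)
        tags = {}
        for i in range(n_tags):                   # rule ①: each tag row is 12 bytes
            tag_id, dtype, count, value = struct.unpack_from(
                "<HHII", buf, ifd_ofs + 2 + 12 * i)
            tags[tag_id] = (dtype, count, value)  # value is inline here; large values
        ifds.append(tags)                         # are really offsets into the file
        (ifd_ofs,) = struct.unpack_from("<I", buf, ifd_ofs + 2 + 12 * n_tags)
    return ifds

# Build a toy 2-page TIFF skeleton: header + two IFDs, one tag each
def ifd(tag_id, value, next_ofs):
    return (struct.pack("<H", 1)                          # 1 tag in this IFD
            + struct.pack("<HHII", tag_id, 3, 1, value)   # type 3 = SHORT, count 1
            + struct.pack("<I", next_ofs))                # offset of next IFD (0 = end)

blob = struct.pack("<2sHI", b"II", 42, 8) + ifd(256, 640, 8 + 18) + ifd(256, 320, 0)
pages = read_ifds(blob)
print(len(pages), pages[0][256][2], pages[1][256][2])     # 2 640 320
```

Tag 256 is ImageWidth; an unknown tag would simply sit in the dict unexamined — exactly the "handle what you know, skip the rest" contract.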

适用

USE FOR

  • 印刷出版 · 高保真扫描 · CMYK 精确印前流程
  • 扫描仪 / 复印机 / 传真机默认输出(Group 4 多页)
  • 卫星 / 航测 GeoTIFF · 医学 DICOM · 显微镜 OME-TIFF · 文物数字化
  • 16-bit 摄影 / Photoshop 高位深工作流中间格式
  • Print publishing · high-fidelity scanning · CMYK pre-press pipelines
  • Default output of scanners / copiers / fax machines (multi-page Group 4)
  • Satellite / aerial GeoTIFF · medical DICOM · microscopy OME-TIFF · cultural-heritage digitisation
  • 16-bit photography / intermediate format in Photoshop's high-bit-depth workflow

反适用

AVOID

  • Web 网页内嵌图(浏览器不解码,得转 JPEG / PNG / WebP / AVIF)
  • 移动端 / app 内分发(体积大、解码慢)
  • 消费级照片分享(用 JPEG / HEIC)
  • 对"必须 100% 兼容"敏感的场景(各家 reader 支持 tag 子集不同)
  • Web pages (browsers don't decode TIFF; convert to JPEG / PNG / WebP / AVIF)
  • Mobile / in-app distribution (large, slow to decode)
  • Consumer photo sharing (use JPEG / HEIC)
  • Anywhere "must be 100 % compatible" matters (different readers support different tag subsets)
scope: TIFF (.tif / .tiff) ✓✓✓
editors / DCC: Photoshop · Lightroom · Capture One · Affinity · GIMP · DaVinci Resolve · ArcGIS / QGIS · Fiji / ImageJ · DICOM viewers · virtually every imaging tool
libraries: libtiff (the 40-year de-facto reference implementation) · OpenImageIO · GDAL · scikit-image · libgeotiff · OME Bio-Formats
CLI: tiffinfo · tiffcp · tiffsplit · tiff2pdf · oiiotool · gdalinfo
起源:origin: Aldus (1986) → Adobe (took over after the 1994 acquisition) · TIFF 6.0 (1992) is still the de-facto standard 设计灵感:design inspiration: IBM's EBCDIC data-description tradition → the tag system 子集 / 扩展:subsets / extensions: GeoTIFF (geospatial) · DNG (Adobe raw) · OME-TIFF (microscopy) · BigTIFF (64-bit offsets) 影响:influence: OpenEXR's attribute system · the box systems of ISOBMFF / Matroska descend from the same idea 嵌入于:embedded in: DICOM (medical) · print RIPs · scanner firmware · photography raw workflows

RAW — 厂商林立的原始数据

RAW — the manufacturer-fragmented origin data

YEAR from the 1990s · each vendor on its own schedule AUTHOR each vendor separately (Canon / Nikon / Sony / Fuji / Olympus / Pentax / Adobe …) EXT no single extension (.cr2 / .cr3 / .nef / .arw / .raf / .orf / .rw2 / .dng … dozens) CONTAINER almost all TIFF/IFD base · CR3 is the exception (ISOBMFF) DEPTH 12 / 14 / 16 bit/channel · sensor-native PATTERN usually Bayer (RGGB / BGGR / GRBG / GBRG) · Fuji X-Trans is the exception LOSSY mostly lossless · some vendors offer "lossy RAW" modes (Canon CRaw / Sony Lossy) METADATA EXIF + vendor-private MakerNotes (white balance / colour matrix / lens / shooting parameters) STATUS the mainstream of photo post-production · default for commercial, news, landscape, wedding, and studio work

"所谓 RAW,不是一个格式,是几十个互不兼容的格式族。"

"'RAW' is not one format — it's a zoo of several dozen incompatible formats."

数码相机 sensor 的原始输出是 12-14 bit Bayer pattern raw 数据——每个像素位置上只有一个颜色样本(R 或 G 或 B),需要 demosaic 算法才能算出完整 RGB。如果在相机里直接转 JPEG,会立即丢掉四样东西:(a) 高位深(14 bit → 8 bit,动态范围砍 64 倍);(b) demosaic 之前的灵活性(JPEG 已经是固定算法插值过的结果,不能换);(c) 白平衡可调性(JPEG 已经把 WB 烘进像素,后期改容易出色偏);(d) 曝光宽容度(过曝 / 欠曝在 14 bit RAW 里能拉回来,JPEG clip 后无法恢复)。摄影师需要"把决定留到后期再做"的格式 = RAW。但每家相机厂商都自己定义,互不兼容,这是后期工作流 30 年的最大头疼——也是 LibRaw / Lightroom 这些工具存在的全部理由。

A digital camera sensor's raw output is 12-14 bit Bayer-pattern data — each pixel position carries only one colour sample (R or G or B), and a demosaic algorithm has to interpolate the full RGB. Convert to JPEG inside the camera and you immediately lose four things: (a) high bit depth (14 bit → 8 bit, dynamic range cut by 64×); (b) flexibility before demosaic (JPEG is already a fixed-algorithm interpolation, you can't swap it); (c) white-balance malleability (JPEG bakes WB into pixels; later changes risk colour casts); (d) exposure latitude (over- and under-exposure can be recovered in 14 bit RAW; JPEG clips and the data is gone). The format that lets photographers "defer decisions to post" is RAW. But every camera maker defined its own, none compatible with the others — that has been the post-production headache of the past 30 years, and the entire reason LibRaw / Lightroom / Capture One exist.

BAYER CFA · FOUR COMMON 2×2 ARRANGEMENTS — RGGB: Sony / Nikon / most Canon · BGGR: older Olympus / some Phase One · GRBG: some Pentax / Hasselblad · GBRG: some Panasonic / older Leica. Repeating 2×2 · 2 green + 1 red + 1 blue (mimicking the eye's stronger response to green) → a RAW reader must know the arrangement first, or the demosaic is completely wrong
图 32a · 主流 Bayer CFA(Color Filter Array)的四种 2×2 排列。绿色样本是红/蓝的两倍,因为人眼对绿色亮度最敏感(Bayer 1976 在专利里就这样设计的)。RGGB 是 Sony / Nikon / Canon 大多数现代机的默认;BGGR / GRBG / GBRG 是早期或特定厂商的选择。RAW 文件必须在 metadata 里声明排列,否则解码器 demosaic 会把红绿蓝全部错位。Fuji X-Trans 是另外一套 6×6 排列,完全不是 Bayer——这是 Fuji RAF 文件特别麻烦的原因。
Fig 32a · The four common 2×2 arrangements of the Bayer CFA (Color Filter Array). Green samples are double red and blue because the human eye is most sensitive to green luminance — Bayer's original 1976 patent already specified this. RGGB is the default on most modern Sony / Nikon / Canon bodies; BGGR / GRBG / GBRG appear in earlier or vendor-specific lines. The RAW file must declare its arrangement in metadata, otherwise the demosaic step will permute red, green, and blue everywhere. Fuji's X-Trans is a separate 6×6 arrangement, not Bayer at all — which is what makes Fuji RAF files particularly awkward.
DEMOSAIC · 1 SAMPLE/PX → 3 SAMPLES/PX — Bayer raw (1 colour / pixel) → demosaic (AHD / VNG / PPG / AMaZE · interpolate) → demosaiced RGB (3 colours / pixel · interpolated)
图 32b · demosaic(去马赛克)是 RAW 处理的灵魂:Bayer raw 每个像素只有一个颜色样本,需要算法内插出另外两个通道。常见算法有 AHD(Adaptive Homogeneity-Directed,各向异性同质,默认平衡)、VNG(Variable Number of Gradients,梯度自适应)、PPG(Patterned Pixel Grouping,Canon DPP 老算法)、AMaZE(RawTherapee 自研,细节保留好但慢)。不同 demosaic 算法会让同一张 RAW 出来的细节、伪色、摩尔纹完全不同——这是为什么不同 RAW 处理软件(Lightroom / Capture One / DxO PhotoLab)对同一文件的"出图味道"差别很大。
Fig 32b · Demosaic is the soul of RAW processing: a Bayer raw stores only one colour sample per pixel, and an algorithm has to interpolate the other two channels. Common choices: AHD (Adaptive Homogeneity-Directed, a balanced default), VNG (Variable Number of Gradients, gradient-adaptive), PPG (Patterned Pixel Grouping, Canon DPP's classic), and AMaZE (RawTherapee's home-grown, detail-preserving but slow). Different demosaic algorithms produce visibly different detail, false colour and moiré on the very same RAW — which is why Lightroom, Capture One and DxO PhotoLab all give the same file a different "look."
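The algorithms in the caption are beyond a blog snippet, but the simplest possible demosaic — plain bilinear, for an RGGB mosaic — already shows the shape of the problem. A NumPy sketch via normalised convolution (far below AHD / AMaZE quality, and assuming RGGB; other arrangements just shift the masks):

```python
import numpy as np

def conv3(x: np.ndarray) -> np.ndarray:
    """3x3 tent-weighted sum (centre 1, edges 0.5, corners 0.25), zero-padded."""
    k = np.array([[0.25, 0.5, 0.25],
                  [0.5,  1.0, 0.5 ],
                  [0.25, 0.5, 0.25]])
    p = np.pad(x, 1)
    h, w = x.shape
    return sum(k[dy, dx] * p[dy:dy + h, dx:dx + w]
               for dy in range(3) for dx in range(3))

def demosaic_bilinear_rggb(raw: np.ndarray) -> np.ndarray:
    """Naive bilinear demosaic of an RGGB mosaic, (H, W) -> (H, W, 3).
    Each channel averages its known samples over a 3x3 neighbourhood;
    real engines (AHD / AMaZE) add edge-direction logic on top of this."""
    h, w = raw.shape
    masks = np.zeros((h, w, 3), bool)
    masks[0::2, 0::2, 0] = True        # R on even rows / even cols
    masks[0::2, 1::2, 1] = True        # G on even rows / odd cols
    masks[1::2, 0::2, 1] = True        # G on odd rows / even cols
    masks[1::2, 1::2, 2] = True        # B on odd rows / odd cols
    out = np.empty((h, w, 3))
    for c in range(3):
        samples = np.where(masks[..., c], raw, 0.0)
        out[..., c] = conv3(samples) / conv3(masks[..., c].astype(float))
    return out

# Sanity check: a flat grey mosaic must demosaic to flat grey
flat = demosaic_bilinear_rggb(np.full((6, 6), 100.0))
print(np.allclose(flat, 100.0))        # True
```

Feed this an edge instead of a flat field and you immediately see zippering and false colour — which is precisely the failure mode the fancier algorithms exist to suppress.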
RAW PROCESSING PIPELINE — sensor (12-14 bit Bayer raw · CFA) → demosaic (AHD / VNG) → WB (white balance) → color matrix (camera → XYZ) → tone curve (exposure / S-curve) → color space (sRGB / P3) → output (8-bit JPEG or 16-bit TIFF / DNG). Every step is tunable · redo it in post if you dislike the result · that is RAW's entire value. JPEG runs this pipeline once inside the camera · bakes it into 8-bit pixels · irreversible. → Lightroom / Capture One / RawTherapee are GUIs over this pipeline · the underlying engines come from LibRaw / dcraw / DCP color profiles
图 32c · RAW 处理流水线。从 sensor 出来的 Bayer raw,要经过 demosaic → 白平衡 → 色彩矩阵(camera RGB → XYZ → 输出色彩空间)→ 曝光 / tone curve → 最终编码(JPEG / TIFF / DNG / HEIC)。每一步都是可调参数——这正是 RAW 区别于 JPEG 的全部价值。JPEG 是相机里把整条流水线跑完一次、把结果烘进 8-bit 像素;RAW 是把"sensor 出厂的样子"原封不动存下来,把所有决定留到电脑前的后期阶段。Lightroom / Capture One / RawTherapee 的全部 UI,本质上都是这条流水线的可视化前端。
Fig 32c · The RAW processing pipeline. Bayer raw from the sensor passes through demosaic → white balance → colour matrix (camera RGB → XYZ → output colour space) → exposure / tone curve → final encoding (JPEG / TIFF / DNG / HEIC). Every step is a tunable parameter — that's the entire value RAW has over JPEG. JPEG runs this whole pipeline once inside the camera and bakes the result into 8-bit pixels; RAW preserves the sensor's "as-shipped" state byte-for-byte and defers every decision to post on the computer. The UIs of Lightroom, Capture One and RawTherapee are essentially visual front-ends to this pipeline.
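Fig 32c's stages map almost one-to-one onto array operations. A toy "develop" in NumPy — white balance, colour matrix, sRGB transfer curve, 8-bit quantise. The gains, matrix, and input patch are all made-up illustration values; in a real decoder they come from the file's metadata and a DCP profile:

```python
import numpy as np

def develop(rgb_lin, wb_gains, cam_to_srgb):
    """Toy develop: camera-linear RGB -> WB -> colour matrix -> sRGB gamma -> 8 bit.
    rgb_lin is (H, W, 3) in [0, 1], already demosaiced."""
    x = rgb_lin * wb_gains                        # per-channel WB multipliers
    x = np.clip(x @ cam_to_srgb.T, 0.0, 1.0)      # 3x3 camera -> sRGB matrix
    # sRGB transfer curve (the "tone curve" step, simplified)
    x = np.where(x <= 0.0031308, 12.92 * x, 1.055 * x ** (1 / 2.4) - 0.055)
    return (x * 255 + 0.5).astype(np.uint8)       # quantise to 8 bit

# Hypothetical numbers — real gains/matrix come from RAW metadata / DCP
wb = np.array([2.0, 1.0, 1.5])                    # daylight-ish WB gains
m = np.eye(3)                                     # identity stands in for the matrix
img = np.full((2, 2, 3), [0.09, 0.18, 0.12])      # greenish camera-linear patch
out = develop(img, wb, m)
print(out[0, 0])                                  # equal channels → neutral grey (≈118)
```

The point of the sketch: every line is a revisable parameter. In-camera JPEG runs exactly this once and discards `rgb_lin`; RAW keeps `rgb_lin` and lets you call `develop` again with different arguments forever.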
VENDOR RAW · WHO INVENTS WHAT
brand       ext           since         container
Canon       .CR2 / .CR3   2004 / 2018   TIFF · ISOBMFF
Nikon       .NEF          1999          TIFF
Sony        .ARW          2005          TIFF
Fujifilm    .RAF          2000          TIFF · X-Trans
Olympus     .ORF          2003          TIFF
Panasonic   .RW2          2008          TIFF
Pentax      .PEF / .DNG   2003          TIFF · optional DNG
Leica       .DNG          native        DNG · the only major brand shipping DNG natively
Adobe       .DNG          2004          TIFF · open spec
None of the big three (Canon / Nikon / Sony) publish an open spec.
图 32d · 主流相机厂商的 RAW 格式与起源年份。绝大多数都是 TIFF/IFD 容器加私有 tag + 私有有损/无损压缩;Canon CR3 是个例外——2018 年 Canon 把容器从 TIFF 改成了 ISOBMFF(同 HEIF / MP4),为的是跟 HEIF 工具链兼容。Leica 是唯一原生用 DNG 的大厂。三巨头(Canon / Nikon / Sony)都不公开 spec,LibRaw / dcraw 全靠逆向工程支持。这就是为什么"开一个 ARW 文件"在不同软件里得到的结果不完全一致——大家都在猜 Sony 的有损压缩到底怎么解。
Fig 32d · Major camera vendors' RAW formats and start years. The vast majority are TIFF/IFD containers with private tags and proprietary lossy/lossless compression; Canon CR3 is the exception — in 2018 Canon swapped the container from TIFF to ISOBMFF (same family as HEIF / MP4) to interoperate with the HEIF toolchain. Leica is the only major brand using DNG natively. The three giants (Canon, Nikon, Sony) don't publish open specs, and LibRaw / dcraw support them entirely through reverse engineering. That's why "open an ARW file" gives slightly different results across applications — everyone is guessing how Sony's lossy compression actually decodes.

技术内核

Technical core

RAW 不是一个格式,是一种思路的几十种实现。技术上有五条共同线索。① Bayer mosaic CFA:sensor 上每个物理像素只盖一种颜色滤镜(R/G/B 中的一种),按 2×2 重复排列。每个 2×2 块里有 2 绿 + 1 红 + 1 蓝(模拟人眼对绿色亮度更敏感)。读 RAW 必须先知道是 RGGB / BGGR / GRBG / GBRG 哪种,再用demosaic 算法(AHD / VNG / PPG / AMaZE / DCB / Igv …十多种)插出每个像素完整的 RGB。Fuji X-Trans 是个异类——6×6 X 形排列,普通 demosaic 算法对它效果差,得用专门的 X-Trans demosaic。② 12-14 bit/channel:不是 8 bit。这意味着比 JPEG 多 4-6 stop 动态范围(高光 / 暗部都能拉)。CMOS sensor 物理 ADC 通常 14 bit,Phase One 等中画幅可达 16 bit。RAW 把这些位深原样保留,后期"曝光 +2 / -2"才不会出 banding。③ 白平衡 / 色彩矩阵 / tone curve 全部未应用:相机只在 EXIF / MakerNotes 里"记录"拍摄时的 WB 是 5500K 还是 Auto,但不烘进像素。色彩矩阵(把 sensor 厂商特定的 R/G/B 响应曲线映射到标准 XYZ 色彩空间的 3×3 矩阵)也是同样:存为 metadata,由后期解码器应用。这是 RAW 跟 JPEG 的根本不同——后者是"决定都做完了的最终结果",前者是"原料 + 配方,但还没开火"。④ 容器基本都基于 TIFF/IFD:Canon CR2 / Nikon NEF / Sony ARW / Fuji RAF / Olympus ORF / Pentax PEF / Panasonic RW2 几乎全是 TIFF base 加私有 tag 区(0x8769 EXIF + 0x927C MakerNote + 厂商私有 tag id)。例外是 Canon CR3(2018 起,改用 ISOBMFF / HEIF 同源容器)和 Sigma X3F(自家完全独立)。这种"TIFF + 私有 tag"的设计意味着标准 TIFF reader 能看到大致结构,但解不出像素——必须靠厂商 SDK 或 LibRaw 的逆向工程。⑤ 解码必须靠厂商 SDK 或 LibRaw:Adobe Camera Raw / Lightroom 的 RAW 解码引擎是闭源商业;开源世界里 LibRaw(Dave Coffin 单文件 C 程序 dcraw 的继承者)通过逆向工程支持几乎所有相机 RAW,是 darktable / RawTherapee / digiKam / Fiji 的共同底层。dcraw 本身是工程史奇迹——Coffin 一个人 20 年维护一份单文件 C,支持上千款相机。LibRaw 接手后变成了正式 lib + 持续更新。

RAW is not one format but one idea realised dozens of times. Five common technical threads. ① Bayer mosaic CFA: every physical sensor pixel sits behind a single colour filter (one of R/G/B), arranged in a repeating 2×2. Each 2×2 has 2 green + 1 red + 1 blue (mirroring the eye's stronger luminance response to green). Reading a RAW requires first knowing the arrangement (RGGB / BGGR / GRBG / GBRG) and then running a demosaic algorithm (AHD / VNG / PPG / AMaZE / DCB / Igv … more than ten exist) to interpolate the full RGB at every pixel. Fuji X-Trans is the oddball — a 6×6 X-shaped pattern, on which generic Bayer demosaicers do poorly; it needs a dedicated X-Trans demosaic. ② 12-14 bit/channel, not 8. That means 4-6 stops more dynamic range than JPEG (highlights and shadows both recoverable). CMOS sensor ADCs are usually physically 14 bit; Phase One and similar medium-format gear reach 16 bit. RAW keeps every bit, so post-exposure "+2 / −2" doesn't band. ③ White balance, colour matrix, and tone curve are not applied. The camera only records in EXIF / MakerNotes that WB was set to 5500K or Auto — it does not bake it into the pixels. The colour matrix (a 3×3 mapping from the sensor's vendor-specific R/G/B response into standard XYZ) is likewise stored as metadata for the decoder to apply later. That is the deep difference from JPEG: JPEG is "all decisions, finalised"; RAW is "ingredients plus recipe, but the burner is off." ④ The container is almost always TIFF/IFD: Canon CR2 / Nikon NEF / Sony ARW / Fuji RAF / Olympus ORF / Pentax PEF / Panasonic RW2 are all TIFF-based with private tag regions (0x8769 EXIF + 0x927C MakerNote + vendor-private tag ids). Exceptions: Canon CR3 (since 2018, ISOBMFF — the HEIF / MP4 family) and Sigma X3F (entirely independent). The "TIFF + private tags" design means a generic TIFF reader can see the gross structure but can't decode the pixels — that requires the vendor SDK or LibRaw's reverse-engineering. 
⑤ Decoding leans on the vendor SDK or LibRaw: Adobe Camera Raw / Lightroom's RAW decoder is a closed-source commercial engine; in open source, LibRaw (the successor to Dave Coffin's single-file dcraw) supports nearly every camera RAW through reverse engineering and is the shared backend of darktable / RawTherapee / digiKam / Fiji. dcraw itself is an engineering miracle — Coffin maintained a single-file C program for 20 years that supported thousands of cameras solo. LibRaw took over and turned it into a proper library with continuous updates.

RAW · END-TO-END PIPELINE · CAMERA TO FINAL DELIVERABLE
capture (in camera): shutter pressed → CMOS sensor → 14-bit Bayer raw → vendor-private compression → .CR3 / .NEF / .ARW + EXIF + MakerNotes → stored to SD card
decode (host computer): LibRaw / Adobe Camera Raw · vendor SDK · DCP color profile — demosaic (AHD / VNG / AMaZE) · white balance (5500K / custom) · exposure (highlight / shadow) · tone curve · color space → sRGB / P3
output (final deliverable · multiple targets chosen per use): 16-bit TIFF (print master / archive) · DNG (vendor-neutral RAW archive) · JPEG (web / client previews) · HEIC / AVIF (high-quality mobile) · JPEG XL (the new high-quality low-size option)
The original RAW is kept · redo the develop if you dislike it → that is RAW's entire value: every decision is still open · the same RAW can be re-rendered better with new algorithms five years on.
RAW workflow = deferring the irreversible "run the whole pipeline in camera + bake into 8-bit JPEG" decision into reversible post-production on a computer.
Cost: vendor incompatibility · large files (20-100 MB per shot) · needs LibRaw or Adobe / Capture One to decode.
Payoff: 4-6 more stops of dynamic range · editable white balance · swappable demosaic algorithms · re-render with better tools five years later.

图 32 · RAW 端到端处理流水线。左:相机内,sensor 出 14-bit Bayer raw,经厂商私有压缩(基本无损或可选有损)写到 .CR3 / .NEF / .ARW 文件,带 EXIF + MakerNotes 元数据。中:导入电脑后由 LibRaw / Adobe Camera Raw 用 DCP color profile + 厂商 SDK 解码,跑 demosaic → 白平衡 → 曝光 → tone curve → 色彩空间。右:输出多种最终格式——印刷归档用 16-bit TIFF,跨厂商归档用 DNG,网页用 JPEG,手机端用 HEIC / AVIF,新选项 JPEG XL。原 RAW 文件保留——这是 RAW 的全部价值:5 年后有更好的 demosaic 算法或调色风格,你可以重新出图。

Fig 32 · The end-to-end RAW pipeline. Left: in camera, the sensor produces 14-bit Bayer raw, vendor-private compression writes it to .CR3 / .NEF / .ARW with EXIF + MakerNotes metadata. Middle: imported to a host computer where LibRaw / Adobe Camera Raw decode it with the DCP colour profile and the vendor SDK, running demosaic → white balance → exposure → tone curve → colour space. Right: multiple final outputs — 16-bit TIFF for print masters and archive, DNG for vendor-neutral archive, JPEG for the web, HEIC / AVIF for mobile, and JPEG XL as the newer high-quality / low-size option. The original RAW is kept — this is the whole point of RAW: in five years, better demosaic algorithms or a new grade let you re-render the same shot.

brand       format      year          bit depth   container
Canon       CR2 / CR3   2004 / 2018   14          TIFF base · CR3 moved to ISOBMFF
Nikon       NEF         1999          12-14       TIFF base
Sony        ARW         2005          14          TIFF base
Fujifilm    RAF         2000          14          TIFF base · X-Trans CFA
Olympus     ORF         2003          12          TIFF base
Adobe       DNG         2004          12-32       TIFF base · open spec
$ dcraw -v -w in.NEF                          # dcraw: decode NEF with camera WB, outputs PPM
$ dcraw -i -v in.CR2                          # read metadata only, no decode
$ rawtherapee-cli -o out.tif -t -c in.CR2     # RawTherapee CLI: RAW → 16-bit TIFF
$ darktable-cli in.ARW out.jpg                # darktable CLI: RAW → JPEG
$ exiv2 -p a in.RAF                           # inspect EXIF + MakerNotes
$ exiftool -a -G1 -s in.NEF                   # universal metadata viewer · lists vendor-private tags too
$ unprocessed_raw in.ARW                      # LibRaw sample tool: dump unprocessed Bayer raw
$ Adobe\ DNG\ Converter -c -o out.dng in.CR3  # convert to DNG for archive

适用

USE FOR

  • 商业摄影 / 婚礼 / 时尚 / 风光 / 影楼后期(必需 RAW)
  • 专业新闻 / 体育摄影(后期裁剪 / 曝光宽容度)
  • HDR 包围曝光合成源(三张 RAW 比三张 JPEG 信息多得多)
  • 天文摄影 / 长时间曝光(暗部噪点处理依赖 14 bit)
  • 需要"5 年后用新工具重出"的归档(DNG 推荐)
  • Commercial / wedding / fashion / landscape / studio post-production (RAW required)
  • Professional news / sports (post-crop, exposure latitude)
  • HDR bracketed merging (three RAWs carry vastly more information than three JPEGs)
  • Astrophotography / long exposures (shadow noise-handling needs 14-bit headroom)
  • Archives expected to be re-rendered with future tools (DNG recommended)

反适用

AVOID

  • 终端用户分享(没人想看 .NEF · 给 JPEG / HEIC)
  • 实时预览 / 直播(解码慢)
  • 移动端 / Web(浏览器不解 · 工具链没接)
  • 手机日常拍照(ProRAW 例外,但 99% 场景普通 JPEG / HEIC 够用)
  • 极小存储 / 极小内存设备(RAW 文件 20-100 MB / 张)
  • Sharing with end users (nobody wants a .NEF — give them JPEG / HEIC)
  • Live preview / streaming (decode is slow)
  • Mobile / web (browsers don't decode; toolchains aren't wired)
  • Everyday phone photography (ProRAW excepted; JPEG / HEIC suffices for 99 % of cases)
  • Very-tight-storage / tight-memory devices (a RAW file is 20-100 MB)
scope: vendor RAW (CR3 / NEF / ARW / RAF / ORF / RW2 / DNG …)
commercial: ✓✓✓ Adobe Lightroom · Camera Raw · Capture One · DxO PhotoLab · Phase One Capture · ON1 Photo RAW · Luminar
open source: ✓✓ RawTherapee · darktable · ART · digiKam · UFRaw · Krita (import) · GIMP (via plug-in) · Fiji
CLI / lib: dcraw · libraw · exiftool · exiv2 · rawtherapee-cli · darktable-cli · Adobe DNG Converter
容器祖宗:container ancestor: TIFF (nearly every RAW is TIFF-based · CR3 is the ISOBMFF exception) 试图统一:attempted unification: DNG (pushed by Adobe in 2004 · partially adopted) 主流厂商分支:vendor branches: CR3 / NEF / ARW (each proprietary to one of the big three: Canon / Nikon / Sony) 解码生态:decode ecosystem: dcraw → LibRaw (open source) · Adobe Camera Raw / Capture One / DxO (commercial) 输出去向:delivery targets: 16-bit TIFF · JPEG · JPEG XL · HEIC / AVIF

DNG — Adobe 想统一 RAW

DNG — Adobe's attempt to unify RAW

YEAR 2004 AUTHOR Adobe Systems EXT .dng MIME image/x-adobe-dng STD DNG 1.7 (2023) · public spec · royalty-free BASE TIFF 6.0 / TIFF/EP extension DEPTH 12 / 14 / 16 / 32 bit · integer or float STATUS native on Pentax / Leica / Hasselblad / iPhone ProRAW · Canon / Nikon / Sony abstain

"想做 RAW 的 PNG,部分成功。"

"Tried to be the PNG of RAW. Partial success."

2004 年 Adobe 看到 RAW 生态彻底碎掉:Canon CR2、Nikon NEF、Sony ARW、Fuji RAF、Olympus ORF、Pentax PEF、Panasonic RW2…几十种格式互不兼容,每出一款新相机 Adobe Camera Raw / Lightroom 就得加一个 decoder profile,工作量惊人;摄影师归档时也心慌——5 年后还能不能开一张今天的 .ARW?Adobe 推出 DNG(Digital Negative),基于开放的 TIFF/EP(TIFF Electronic Photography)扩展,目标只有一个:"一个公开 spec 的 RAW 格式装所有厂商的数据"。结果一半成功:Pentax / Leica / Hasselblad 选择原生输出 DNG,Apple 2020 年的 iPhone ProRAW 也用 DNG 包装;但 Canon / Nikon / Sony 三巨头坚持自家专有,从未给 DNG 让路。Adobe DNG Converter 工具可以把任意厂商 RAW 离线转 DNG 做归档,但转换过程可能有损 metadata——某些 MakerNotes 字段在 DNG 里没有标准对应,只能丢弃。

By 2004 Adobe saw the RAW ecosystem fully fragmented: Canon CR2, Nikon NEF, Sony ARW, Fuji RAF, Olympus ORF, Pentax PEF, Panasonic RW2 — dozens of mutually incompatible formats. Every new camera body forced Adobe Camera Raw / Lightroom to add another decoder profile, the workload was extraordinary, and photographers were nervous about archiving — would today's .ARW still open in five years? Adobe introduced DNG (Digital Negative), built on the open TIFF/EP (TIFF Electronic Photography) extension, with one goal: "one publicly specified RAW format that holds every vendor's data". The result was half a success: Pentax / Leica / Hasselblad chose to output DNG natively, and Apple's 2020 iPhone ProRAW wraps DNG too — but Canon / Nikon / Sony stuck with their proprietary formats and have never made room for DNG. The Adobe DNG Converter can offline-convert any vendor's RAW to DNG for archive, but conversion may lose some metadata — certain MakerNotes fields have no standard DNG equivalent and are simply dropped.

DNG · TIFF/EP-BASED CONTAINER — TIFF header (II/MM · version 42) → IFD0 ofs → DNG tags (CFAPattern · CalibrationIlluminant · ColorMatrix1/2 · AsShotNeutral) → MakerNotes (vendor-private passthrough · preserved, not dropped · partially supported) → raw Bayer 14 bit + JPEG preview. Based on TIFF 6.0 + TIFF/EP with DNG-private tags (colour matrices / calibration illuminant / CFA pattern …) → any TIFF reader can see the structure · only a DNG decoder renders it correctly. MakerNotes passthrough: vendor-private metadata kept verbatim (even though the DNG decoder can't parse it). Usually embeds a JPEG preview for fast browsing (Lightroom thumbnails)
图 33 · DNG 容器结构。底层是 TIFF 6.0 + TIFF/EP 扩展;Adobe 加了一组 DNG 私有 tag,核心是 ColorMatrix1 / ColorMatrix2(标准光源 D65 / A 下的色彩矩阵,把 sensor RGB 映射到 XYZ)、CFAPattern(Bayer 排列)、CalibrationIlluminant(校准光源)、AsShotNeutral(拍摄时白平衡)。第三个 chunk 是 MakerNotes 透传区——厂商专有元数据原样保留,即使 DNG 解码器看不懂也不丢。最后是原始 Bayer 像素数据,通常还嵌一张 JPEG preview 给 Lightroom 做缩略图。任何 TIFF reader 能看到结构,但只有 DNG-aware 解码器(Adobe Camera Raw / LibRaw)能正确出图。
Fig 33 · DNG container structure. The base is TIFF 6.0 + the TIFF/EP extension; Adobe adds a set of DNG-private tags whose core is ColorMatrix1 / ColorMatrix2 (colour matrices under standard illuminants D65 / A, mapping sensor RGB to XYZ), CFAPattern (the Bayer arrangement), CalibrationIlluminant (calibration illuminant), and AsShotNeutral (white balance at capture). The third chunk is a MakerNotes passthrough region — vendor-private metadata is preserved verbatim even if the DNG decoder can't interpret it. Finally comes the raw Bayer pixel data, usually with an embedded JPEG preview for Lightroom thumbnails. Any TIFF reader can see the structure, but only a DNG-aware decoder (Adobe Camera Raw / LibRaw) can render it correctly.

技术内核

Technical core

DNG 三件事撑起整个设计。① 基于 TIFF/EP 扩展:DNG 不是从零设计的容器,而是在 TIFF 6.0 + TIFF/EP(TIFF Electronic Photography,1998 ISO 12234-2)上加了一组规范化的私有 tag。这意味着已有 TIFF reader 能看到大致结构(虽然不能正确出图),也意味着 DNG spec 公开后,任何人能写 DNG 解码器——Adobe 故意降低门槛。② 厂商私有 metadata 透传:DNG 在容器里专门留一块 MakerNotes 区,把原厂的私有元数据(比如 Sony ARW 里的某个加密曝光块)原样塞进去,DNG 解码器看不懂也不会丢。这是 Adobe 跟厂商的"和解":你转 DNG 不会丢你的相机特定信息,某天厂商 SDK 想读还能读回去。③ 包含 demosaic 后的可选预览 + 完整原始 sensor 数据:DNG 文件里通常嵌一张 JPEG preview(给 Lightroom 缩略图秒开)+ 完整的 Bayer raw payload(给后期重新解码)。比起原厂 RAW 多 5-10% 体积,但换来"打开就有缩略图"的体验。某些 DNG 还可选 lossy compressed 模式(Adobe Lossy DNG,基于 JPEG 在 raw 域上做有损,体积砍 50% 但有损 RAW 的灵活度——主要给 iPhone ProRAW 用)。

DNG rests on three pillars. ① Built on TIFF/EP: DNG is not a from-scratch container; it sits on TIFF 6.0 + TIFF/EP (TIFF Electronic Photography, ISO 12234-2 from 1998) with a standardised set of private tags. Existing TIFF readers can see the gross structure (without rendering correctly), and once the DNG spec was public anyone could write a DNG decoder — Adobe deliberately lowered the barrier. ② Vendor metadata passthrough: DNG reserves a MakerNotes region in the container and stores the original vendor's private metadata (e.g. some encrypted exposure block from Sony ARW) verbatim; the DNG decoder needn't understand it, but it isn't dropped. This is Adobe's reconciliation gesture to vendors: converting to DNG doesn't lose your camera-specific information, and a vendor SDK could in principle read it back later. ③ Optional demosaiced preview + full original sensor data: a DNG file usually carries an embedded JPEG preview (so Lightroom thumbnails appear instantly) plus the complete Bayer raw payload (for re-decoding in post). The cost is 5-10 % more bytes than the original vendor RAW, in exchange for "opens with a thumbnail" UX. Some DNGs also enable a lossy mode (Adobe Lossy DNG — JPEG-style lossy in the raw domain, 50 % smaller, at the cost of some RAW flexibility — primarily targeted at iPhone ProRAW).
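How a decoder turns one of those DNG tags into pixels is a small but illustrative computation. AsShotNeutral stores the camera-space colour of a neutral surface under the capture illuminant, so the white-balance gains are simply its reciprocals, conventionally normalised to green. A sketch — the tag value below is hypothetical:

```python
# AsShotNeutral: camera-RGB of a neutral patch under the shooting light.
# A decoder multiplies each channel by 1/value so neutral surfaces
# come out with R = G = B.
as_shot_neutral = (0.47, 1.0, 0.65)        # hypothetical tag value (R, G, B)

gains = [1.0 / v for v in as_shot_neutral]
g = gains[1]                               # normalise so the green gain is 1.0
gains = [v / g for v in gains]
print([round(v, 3) for v in gains])        # [2.128, 1.0, 1.538]

# Applied to the neutral patch itself, the result is perfectly grey:
balanced = [n * v for n, v in zip(as_shot_neutral, gains)]
print([round(v, 3) for v in balanced])     # [1.0, 1.0, 1.0]
```

ColorMatrix1/2 then take over from here: a 3×3 multiply maps the white-balanced camera RGB into XYZ, interpolated between the two calibration illuminants.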

适用

USE FOR

  • 厂商无关的 RAW 长期归档(摄影师整理 5-10 年素材)
  • iPhone ProRAW(Apple 2020 起的官方 RAW 选项)
  • Pentax / Leica / Hasselblad 原生输出
  • Lightroom 默认导入选项("转 DNG 后导入")
  • 需要可移植 RAW 的科研 / 文物数字化场景
  • Vendor-neutral long-term RAW archive (5-10 years of photographer footage)
  • iPhone ProRAW (Apple's official RAW option since 2020)
  • Native output from Pentax / Leica / Hasselblad
  • Lightroom's default import option ("convert to DNG on import")
  • Research / cultural-heritage digitisation needing portable RAW

反适用

AVOID

  • Canon / Nikon / Sony 主流相机原生输出(没有,只能事后转)
  • 当前流水线已绑定厂商 SDK 的工作流(转换增加风险)
  • 体积极敏感场景(DNG 通常比原厂 RAW 大 5-10%)
  • 普通终端用户分享(用 JPEG / HEIC)
  • Native output from mainstream Canon / Nikon / Sony bodies (none — can only convert)
  • Workflows already bound to vendor SDKs (conversion adds risk)
  • Strictly size-sensitive scenarios (DNG is typically 5-10 % larger than the original RAW)
  • Sharing with regular end users (use JPEG / HEIC)
scope: DNG (.dng) ✓✓
tools: Adobe Camera Raw · Lightroom · Capture One · darktable · RawTherapee · Apple platforms (iPhone ProRAW native)
libraries: LibRaw (read) · Adobe DNG SDK (read / write) · libtiff (reads the base structure)
CLI: Adobe DNG Converter (GUI + CLI) · dnglab (open-source RAW → DNG) · exiftool
基于:based on: TIFF 6.0 / TIFF/EP (ISO 12234-2) 试图替代:attempted to replace: the vendors' proprietary RAWs — CR3 / NEF / ARW (partial success) 原生采用:native adopters: Pentax · Leica · Hasselblad · Apple iPhone ProRAW 解码生态:decode ecosystem: Adobe Camera Raw · LibRaw · darktable · RawTherapee

CR3 / NEF / ARW — 主流厂商的 RAW

CR3 / NEF / ARW — the big-three vendor RAWs

YEAR Canon CR3 (2018) / Nikon NEF (1999) / Sony ARW (2005) AUTHOR Canon · Nikon · Sony, separately EXT .cr3 · .cr2 · .nef · .nrw · .arw · .srf · .sr2 CONTAINER CR3 = ISOBMFF (same as HEIF) · CR2 / NEF / ARW = TIFF base DEPTH mostly 14-bit (12-bit on some entry-level bodies) COMPRESSION proprietary · mostly lossless + optional lossy (Canon CRaw / Nikon NEF Compressed / Sony Lossy) SPEC unpublished · supported entirely through LibRaw reverse engineering STATUS the three hold 80 %+ of the digital-camera market · mainstream on current bodies

"三家相机巨头各做一套,都不兼容,都活得很好。"

"Three camera giants, three formats, none compatible — and all thriving."

Canon / Nikon / Sony 三家占数码相机市场 80% 以上,各家拥有完整的 DSLR / 无反 + 镜头生态(EF / RF / F / Z / E / FE 卡口等),RAW 格式是其专有生态的最后一环——锁定到自家 RAW 意味着用户后期工作流也跟着锁定(用 Canon DPP / Nikon NX Studio / Sony Imaging Edge 时体验最完整,跨家就得依赖 LibRaw 或商业第三方)。Canon 2018 把 CR2 升级 CR3,容器从 TIFF 换成 ISOBMFF(同 HEIF / MP4 spec 族)——为的是跟 HEIF 工具链共享 box 解析器,顺便能在 RAW 文件里塞 HEIF 缩略图、HEVC 视频片段、AAC 音频(给"双重曝光"和短视频功能用)。Nikon NEF 一直是 TIFF base,从 1999 年 D1 到现在 Z 系列没换。Sony ARW 也是 TIFF base,但有臭名昭著的"有损 RAW"模式——早期 α 系列默认输出"压缩 RAW"实际上是有损,被摄影社区批评后才允许选"未压缩"。三家都不公开 RAW spec,LibRaw / dcraw 全靠逆向工程支持。

Canon / Nikon / Sony together hold over 80 % of the digital-camera market, each with a complete DSLR / mirrorless + lens ecosystem (EF / RF / F / Z / E / FE mounts and so on), and the RAW format is the final piece of that proprietary stack — being locked into a vendor's RAW means your post workflow follows (the experience is most complete in Canon DPP / Nikon NX Studio / Sony Imaging Edge; cross-vendor work depends on LibRaw or commercial third parties). Canon upgraded CR2 to CR3 in 2018, swapping the container from TIFF to ISOBMFF (same family as HEIF / MP4) — to share box parsers with the HEIF toolchain and incidentally to embed HEIF thumbnails, HEVC video clips, and AAC audio in the RAW file (for "double-exposure" and short-video features). Nikon NEF has been TIFF-based since the 1999 D1 and the Z series has not changed it. Sony ARW is also TIFF-based, but with the notorious "lossy RAW" mode — early α bodies defaulted to "compressed RAW" that was actually lossy, and only after sustained criticism from the photography community was an "uncompressed" option allowed. None of the three publish RAW specs; LibRaw / dcraw support them entirely through reverse engineering.

CR2 vs CR3 · CONTAINER REDESIGN (2018)
CR2 · TIFF base (2004): TIFF header (II + 42) → IFD0 · preview JPEG → IFD1 · thumbnail → IFD2 · RGB preview → IFD3 · raw + private tags → Canon CR2 raw payload · linear IFD chain
CR3 · ISOBMFF (2018): ftyp (brand=crx) → moov (trak[0] · raw image · trak[1] · JPEG preview · trak[2] · HEVC video clip) → mdat (media data · CRaw payload + JPEG + HEVC all interleaved) → meta (CTBO + EXIF) · box tree · same as HEIF / MP4
图 34 · Canon CR2 vs CR3 容器对比。CR2 (2004) 是 TIFF base——文件开头是 TIFF header,然后线性排着多个 IFD(每个 IFD 是一张图:预览 JPEG / 缩略图 / RGB preview / RAW 数据),最后是 Canon 私有 raw payload。CR3 (2018) 完全换成 ISOBMFF——跟 HEIF / MP4 同一个 spec 族,文件由 ftyp + moov + mdat + meta 等 box 组成,RAW 数据、JPEG 预览、HEVC 视频片段、AAC 音频可同时装在一个 mdat 里。Canon 这么改是为了:① 跟 HEIF 工具链共享 box 解析器;② 给"双重曝光"和短视频拍摄留接口;③ 跟现代 ISOBMFF 生态(MP4 / HEIF / AVIF / JPEG XL 容器)对齐。代价:老 dcraw / 老 LibRaw 全部要重写——CR3 出来后开源世界花了一年多才稳定支持。
Fig 34 · Canon CR2 vs CR3 container comparison. CR2 (2004) is TIFF-based — the file opens with a TIFF header, then several IFDs in linear order (each IFD is one image: preview JPEG / thumbnail / RGB preview / RAW data), and ends with the Canon-private raw payload. CR3 (2018) switches entirely to ISOBMFF — the same family as HEIF / MP4 — so the file is composed of ftyp + moov + mdat + meta boxes, and RAW data, JPEG previews, HEVC video clips, and AAC audio can all sit in the same mdat. Canon made the move to: ① share box parsers with the HEIF toolchain; ② leave room for "double-exposure" and short-video features; ③ align with the modern ISOBMFF ecosystem (MP4 / HEIF / AVIF / JPEG XL containers). Cost: old dcraw / old LibRaw had to be rewritten — after CR3 launched, it took the open-source world over a year to support it stably.

技术内核

Technical core

三巨头 RAW 共三条线索。① 容器:CR3 是 ISOBMFF,CR2 / NEF / ARW 是 TIFF 系。Canon 2018 把 CR2 升级 CR3 时换了容器,目的就是跟现代 ISOBMFF 生态(HEIF / MP4 / AVIF / JPEG XL)对齐,顺便能在一个 .CR3 里塞 RAW + JPEG preview + HEVC 视频片段 + AAC 音频(给"双重曝光"和短视频功能用)。Nikon NEF 和 Sony ARW 还是传统 TIFF base——文件开头 TIFF header,接 IFD chain,每个 IFD 装一张图(thumbnail / preview JPEG / 真正 RAW),Sony 还在 IFD 里加私有 SR2 sub-IFD 装额外 metadata。② 各家私有有损 RAW 压缩。Canon 有 CRaw(visually lossless,体积砍 30-40%);Nikon 有 NEF Compressed(实际是把 14-bit raw 用一个查找表压成 12-bit 等价精度,有损但视觉无损);Sony 早期默认就是有损"压缩 RAW"(被批评后允许选"未压缩")。这些有损模式都是闭源算法,LibRaw 逆向支持但有时跟厂商官方解码结果略有偏差。③ "有损 RAW"概念的兴起。原本 RAW 的精神就是"无损保留 sensor 数据",但 14-bit 有损压缩(类似 Lossy DNG)能砍体积 50-70%、视觉几乎无损,对存储敏感的场景(连拍 / 4K 视频拍摄间隙拍照)很有吸引力。Canon CRaw / Sony Compressed RAW / Nikon NEF Compressed 都属于这类——长远看 RAW 文件正在向 "有损但视觉无损" 滑动,这跟 JPEG XL / HEIC 的设计哲学不谋而合。

Three threads connect the big-three RAWs. ① Container: CR3 is ISOBMFF, CR2 / NEF / ARW are TIFF-family. When Canon upgraded CR2 to CR3 in 2018 it swapped the container — the goal was to align with the modern ISOBMFF ecosystem (HEIF / MP4 / AVIF / JPEG XL) and incidentally to pack RAW + JPEG preview + HEVC video clips + AAC audio into one .CR3 (used by "double-exposure" and short-video features). Nikon NEF and Sony ARW remain traditional TIFF-based — the file opens with a TIFF header, then an IFD chain, each IFD holding one image (thumbnail / preview JPEG / actual RAW); Sony additionally puts a private SR2 sub-IFD inside the IFD to carry extra metadata. ② Each vendor's private lossy RAW compression. Canon offers CRaw (visually lossless, 30-40 % size reduction); Nikon offers NEF Compressed (effectively a lookup-table that compresses 14-bit raw to a 12-bit-equivalent precision, lossy but visually lossless); Sony's early default was a lossy "compressed RAW" (after criticism, an "uncompressed" option was added). These lossy modes are closed-source algorithms — LibRaw supports them via reverse engineering but its decoder occasionally diverges slightly from the vendor's. ③ The rise of "lossy RAW". RAW's original spirit is "preserve sensor data losslessly," but 14-bit lossy compression (similar to Lossy DNG) cuts size by 50-70 % with virtually no visible loss — attractive in storage-sensitive scenarios (burst shooting, stills between 4K video clips). Canon CRaw / Sony Compressed RAW / Nikon NEF Compressed all belong here. Long-term, RAW files are sliding toward "lossy but visually lossless" — coincidentally the same design philosophy as JPEG XL / HEIC.
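The container swap in ① is visible in a dozen lines of code: an ISOBMFF file is just a sequence of `[size:4][type:4]` boxes, which is why one parser can serve MP4, HEIF, AVIF, and CR3 alike. A minimal top-level box walker over hand-built toy bytes (not a real CR3; `'crx '` as the major brand follows the community's reverse-engineering notes):

```python
import struct

def walk_boxes(buf: bytes):
    """Yield (type, payload) for each top-level ISOBMFF box.
    Handles the common 32-bit size only; real files may also use
    size=1 (64-bit largesize) or size=0 (box runs to end of file)."""
    ofs = 0
    while ofs + 8 <= len(buf):
        size, btype = struct.unpack_from(">I4s", buf, ofs)   # big-endian size + fourcc
        yield btype.decode("ascii"), buf[ofs + 8: ofs + size]
        ofs += size

def box(btype: bytes, payload: bytes) -> bytes:
    return struct.pack(">I4s", 8 + len(payload), btype) + payload

# Toy file shaped like a CR3: ftyp with brand 'crx ', then moov and mdat
blob = (box(b"ftyp", b"crx \x00\x00\x00\x01")
        + box(b"moov", b"")
        + box(b"mdat", b"\xff" * 4))
boxes = list(walk_boxes(blob))
print([t for t, _ in boxes])          # ['ftyp', 'moov', 'mdat']
print(boxes[0][1][:4])                # b'crx ' — the CR3 major brand
```

Contrast with the TIFF walker earlier in the article: IFDs are a linked list threaded through the file, while boxes are a flat (and nestable) sequence — same "skip what you don't know" philosophy, different geometry.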

适用

USE FOR

  • 各家相机的原生输出(谁拍用谁的 RAW · 没第二选择)
  • 用厂商官方软件做后期(Canon DPP / Nikon NX Studio / Sony Imaging Edge)
  • 需要厂商完整 metadata 的场景(镜头校正 / 自动 WB 微调)
  • 跟 HEIF / MP4 工具链协同(CR3 ISOBMFF 容器友好)
  • Native output from each vendor's cameras (whoever you shoot with, that's your RAW — no choice)
  • Post in the vendor's official software (Canon DPP / Nikon NX Studio / Sony Imaging Edge)
  • Scenarios needing the vendor's complete metadata (lens correction, auto-WB fine-tuning)
  • Pipelines aligned with the HEIF / MP4 toolchain (CR3's ISOBMFF container fits naturally)

反适用

AVOID

  • 跨厂商 / 跨工具长期归档(转 DNG 更稳)
  • 需要公开 spec 的科研归档(三家都不公开)
  • 极度敏感的 bit-exact 比对(LibRaw 解出的结果跟厂商 SDK 可能略有偏差)
  • 移动端 / Web 直接显示
  • Cross-vendor / cross-tool long-term archiving (DNG is more reliable)
  • Scientific archives needing public specs (none of the three publish)
  • Highly bit-exact comparisons (LibRaw decodes can deviate slightly from vendor SDKs)
  • Direct display on mobile / web
scope: CR3 / NEF / ARW + each vendor's private compression
vendor: Canon DPP · Nikon NX Studio · Sony Imaging Edge · Adobe Camera Raw / Lightroom · Capture One
libraries: LibRaw (reverse-engineered) · Canon EDSDK / Nikon SDK / Sony SDK (closed-source · application required)
CLI: unprocessed_raw (LibRaw sample) · dcraw -D (dump raw sensor data) · exiftool
起源:origin: separate vendor branches — Canon (2004 CR2 → 2018 CR3) · Nikon (1999 NEF) · Sony (2005 ARW) 容器关系:container kinship: CR2 / NEF / ARW are TIFF-based · CR3 moved to ISOBMFF (same as HEIC / MP4) 并存:coexists with: DNG (Adobe's unification attempt · the big three never followed) 解码靠:decoded by: vendor SDKs (closed-source) · LibRaw (reverse engineering) · Adobe Camera Raw / Capture One (commercial)

DICOM — 医学影像的封闭城堡 · 扛把子

DICOM — the walled city of medical imaging · heavy hitter

YEAR 1985 (ACR-NEMA 1.0) · 1993 (DICOM 3.0) AUTHOR ACR + NEMA · DICOM 标准委员会 EXT .dcm · .dcm30 · .dicom MIME application/dicom STD ISO 12052 · DICOM PS 3.x(2024 仍在版本化) LOSSY 多 transfer syntax(JPEG · JPEG-LS · JPEG 2000 · RLE · 无压缩 · HEVC) DEPTH 8-32 bit · 整数 / 浮点 / 12-16 bit 灰度主流 STATUS 医院唯一标准 · 全球 CT / MRI / PACS / EHR 通用

"它不是图片格式,是带 4000 个字段的医疗记录。"

"Not just an image format — a medical record with 4,000 attributes."

1980 年代,医院里 CT、MRI、X-ray、超声各家厂商各做一套协议:GE 的 CT 出不来 Siemens MRI 能读的文件,科室之间没法交换数据,医生想做一次跨设备影像会诊基本不可能。ACR(美国放射学院)与 NEMA(美国电气制造商协会)1985 年合作发布 ACR-NEMA 1.0,1993 年改名 DICOM 3.0 并加入网络协议。DICOM 同时定义了三件事:(a) 文件格式——一个 .dcm 既是图像也是患者完整病历;(b) 网络协议 DIMSE——医院里 CT 跟 PACS 之间的传输怎么走;(c) 元数据字典——4000+ 标准 tag 涵盖患者姓名、研究日期、modality、像素数据、窗宽窗位等任何医疗影像可能需要的字段。这套体系后来成了医院 IT 的事实标准——全球任何 CT / MRI / 超声 / 病理切片设备出厂时都说 DICOM,任何 PACS / EHR / 工作站默认输入也是 DICOM。30 年没人能挑战,因为它解决的不是"压像素",而是整个医疗影像的协议栈

In the 1980s, hospital CT / MRI / X-ray / ultrasound vendors each defined their own protocols: a GE CT image couldn't be opened by a Siemens MRI station, departments couldn't exchange data, and a multi-modality consult was effectively impossible. ACR (American College of Radiology) and NEMA (National Electrical Manufacturers Association) jointly released ACR-NEMA 1.0 in 1985, then renamed it DICOM 3.0 in 1993 and added a network protocol. DICOM defines three things at once: (a) a file format — one .dcm is simultaneously an image and a complete patient record; (b) a network protocol, DIMSE — how images move between a CT scanner and a PACS server inside a hospital; (c) a metadata dictionary — 4,000+ standard tags covering patient name, study date, modality, pixel data, window width / level, and every medical-imaging attribute imaginable. The whole stack became the de-facto standard of hospital IT: every CT / MRI / ultrasound / pathology-slide device on Earth speaks DICOM out of the box, every PACS / EHR / workstation reads DICOM by default. 30 years later it remains unchallenged — because what it solved isn't "compressing pixels" but the entire medical-imaging protocol stack.

DICOM · FILE LAYOUT preamble 128 bytes (任意数据) 'DICM' 4 B magic File Meta Info (0002, xxxx) group transfer syntax UID DataSet body DataElement[] + pixel data tag 展开 · DataSet 内部 (0010,0010) PatientName (0008,0060) Modality (0028,1050) WindowCenter (7FE0,0010) PixelData … 4000+ standard tags + private group(odd group number) … 注:preamble 128 字节可以塞任何数据 — 有人在那儿藏过 PNG 缩略图,让 .dcm 在文件管理器里能预览。
图 a · DICOM 文件结构。开头 128 字节 preamble 可以塞任何数据(规范允许),紧跟 4 字节 magic 'DICM'。然后是 File Meta Information(group 0x0002 的 tag,核心是 transfer syntax UID,告诉解码器像素是 JPEG / JPEG-LS / 无压缩还是别的)。最后是 DataSet body——一长串 DataElement,每个是一个 (group, element) tag + 数据,既包含医疗 metadata(患者名 / modality / 窗宽窗位)也包含 (7FE0, 0010) PixelData——真正的图像。一份 .dcm 同时是图像 + 病历 + 设备信息 + 工作流上下文。
Fig a · DICOM file layout. The first 128-byte preamble can hold any data (the spec permits it), followed by the 4-byte magic 'DICM'. Next is the File Meta Information (tags in group 0x0002 — most importantly the transfer-syntax UID, telling the decoder whether pixels are JPEG / JPEG-LS / uncompressed or something else). Last is the DataSet body — a long sequence of DataElements, each a (group, element) tag plus data, mixing medical metadata (patient name / modality / window width / level) and the (7FE0, 0010) PixelData — the actual image. One .dcm is image + record + device info + workflow context all at once.
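A practical consequence of that anything-goes preamble: you cannot identify a DICOM Part 10 file by sniffing offset 0 — you have to look 128 bytes in. A minimal check (the function name is ours, not any library's API):

```python
def is_dicom_part10(data: bytes) -> bool:
    # DICOM PS3.10: a 128-byte free-form preamble (often zeros, sometimes a
    # smuggled PNG thumbnail), then the 4-byte magic 'DICM' at offset 128.
    return len(data) >= 132 and data[128:132] == b"DICM"
```

Tools that only inspect the first bytes will misclassify .dcm files — which is also why the preamble can carry a second file format without breaking DICOM readers.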

技术内核

Technical core

DICOM 体系庞大但有六根支柱。① DataSet = 一组 DataElement,每个 DataElement 由 4 字段组成:Tag(group, element)+ VR(Value Representation,数据类型,如 PN=PersonName / DA=Date / US=UnsignedShort / OB=OtherByte)+ Length + Value。DataSet 可以嵌套(SQ 类型 = Sequence)。② 4000+ 标准 tag 由 DICOM 数据字典维护:(0010, 0010)=PatientName、(0008, 0060)=Modality、(0008, 0020)=StudyDate、(0028, 0010)=Rows、(0028, 1050)=WindowCenter、(7FE0, 0010)=PixelData…奇数 group 留给私有扩展(各厂商私有 tag 的栖息地)。③ Transfer Syntax UID 决定像素压缩方式——这是 DICOM 最关键的"开关":1.2.840.10008.1.2(无压缩, implicit VR)、.1.2.1(无压缩 explicit VR,最常见)、.1.2.4.50(JPEG baseline)、.1.2.4.80(JPEG-LS lossless,CT/MRI 默认)、.1.2.4.91(JPEG 2000 lossy)、.1.2.5(RLE)、.1.2.4.107(HEVC main profile,新)等几十种。同一份 .dcm 可以"换 transfer syntax"重新压缩,但 metadata 完全保留。④ Multi-frame:CT 和 MRI 一次扫描会出几十到几百张切片,DICOM 既支持每张切片一个 .dcm 文件(典型用法),也支持一个文件多帧(类似 GIF 多帧)——后者方便长 cine 序列。⑤ Window / Level metadata:CT 是 12-bit 灰度数据(范围 -1024~3071 Hounsfield Units),但屏幕只能显示 8-bit。DICOM 在 metadata 里存窗宽(WW)+ 窗位(WL)——告诉显示器"把哪段 12-bit 范围映射到 8-bit 灰度"。同一张 CT,医生可以切到"骨窗"(WW=2000, WL=300)看骨折,切到"软组织窗"(WW=400, WL=40)看肿瘤,切到"肺窗"(WW=1500, WL=-600)看肺纹理——一张图三种用途。⑥ DICOMweb(WADO-RS / STOW-RS / QIDO-RS):2010s 后基于 HTTP REST 的现代接口,正在逐步替代 1980s 设计的 DIMSE TCP 协议——本质上是把 DICOM 网络层从 OSI 7 层改造成 HTTP 友好版,方便跟现代云原生 PACS / Web 浏览器集成。

DICOM is sprawling but rests on six pillars. ① DataSet = a list of DataElements; each DataElement has four fields: Tag (group, element) + VR (Value Representation — the type, e.g. PN=PersonName / DA=Date / US=UnsignedShort / OB=OtherByte) + Length + Value. DataSets can nest (the SQ / Sequence type). ② 4,000+ standard tags, maintained by the DICOM Data Dictionary: (0010, 0010)=PatientName, (0008, 0060)=Modality, (0008, 0020)=StudyDate, (0028, 0010)=Rows, (0028, 1050)=WindowCenter, (7FE0, 0010)=PixelData… Odd-numbered groups are reserved for private extensions (where vendor-private tags live). ③ Transfer Syntax UID decides pixel compression — DICOM's most important switch: 1.2.840.10008.1.2 (uncompressed, implicit VR), .1.2.1 (uncompressed, explicit VR, most common), .1.2.4.50 (JPEG baseline), .1.2.4.80 (JPEG-LS lossless, the CT/MRI default), .1.2.4.91 (JPEG 2000 lossy), .1.2.5 (RLE), .1.2.4.107 (HEVC main profile, new), and dozens more. The same .dcm can be "transcoded to a new transfer syntax" — the metadata survives untouched. ④ Multi-frame: a CT or MRI scan produces tens to hundreds of slices; DICOM supports either one file per slice (the typical layout) or one file holding many frames (like a multi-frame GIF) — the latter is handy for long cine sequences. ⑤ Window / Level metadata: CT data is 12-bit greyscale (range −1024 to 3071 Hounsfield Units) but a display only shows 8 bits. DICOM stores window width (WW) + window level (WL) in metadata — telling the viewer "map this slice of the 12-bit range to 8-bit greys." A single CT can be re-windowed: "bone window" (WW=2000, WL=300) for fractures, "soft-tissue window" (WW=400, WL=40) for tumours, "lung window" (WW=1500, WL=−600) for lung markings — one image, three purposes. 
⑥ DICOMweb (WADO-RS / STOW-RS / QIDO-RS): a post-2010 HTTP-REST modernisation that is steadily replacing the 1980s-era DIMSE TCP protocol — fundamentally re-architecting DICOM's network layer from OSI-7 into something HTTP-friendly, so cloud-native PACS and web browsers can integrate cleanly.

DATAELEMENT · 4 FIELDS Tag (group, element) 4 bytes e.g. (0010, 0010) VR type 2 bytes e.g. PN Length value size 2 / 4 B e.g. 14 Value payload N bytes "AIRING^ZHANG^" (0010,0010) · PN · 14 · "AIRING^ZHANG^" → 一个 DataElement = 一行病历字段
图 b · DICOM DataElement 4 字段展开。Tag(group, element 各 2 字节)是字段身份,如 (0010, 0010) 是 PatientName,(0008, 0060) 是 Modality。VR(Value Representation)是 2 字符的类型代码——PN=PersonName / DA=Date / US=UnsignedShort / OB=OtherByte / SQ=Sequence(嵌套 DataSet)/ UI=UID 等约 30 种。Length 标 Value 字节数。Value 是真正的数据。一份 .dcm 就是几百到几千个这样的 DataElement 顺序排列——这种"自描述 + tag 化"的设计深受 TIFF IFD 影响,但比 TIFF 多了 4000 个标准化的 tag 字典。
Fig b · A DICOM DataElement expanded into its four fields. Tag (group + element, 2 bytes each) is the field's identity — e.g. (0010, 0010) is PatientName, (0008, 0060) is Modality. VR (Value Representation) is a 2-char type code — PN=PersonName / DA=Date / US=UnsignedShort / OB=OtherByte / SQ=Sequence (nested DataSet) / UI=UID — about 30 in total. Length records the byte count of Value. Value is the actual data. A .dcm is hundreds to thousands of these in sequence — the "self-describing, tag-based" design is deeply inspired by TIFF's IFDs, but with a 4,000-entry standardised tag dictionary on top.
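The four-field layout in Fig b maps directly onto a struct pack. A minimal sketch of the explicit-VR little-endian short form — VRs like OB / OW / SQ use a longer 12-byte header, which this deliberately skips:

```python
import struct

def encode_element(group: int, element: int, vr: str, value: bytes) -> bytes:
    """Explicit-VR little-endian, short form: tag (2+2) + VR (2) + length (2) + value."""
    if len(value) % 2:                 # DICOM values are always even-length;
        value += b" "                  # text VRs pad with space (OB pads with NUL)
    return struct.pack("<HH2sH", group, element, vr.encode(), len(value)) + value

def decode_element(data: bytes):
    group, element, vr, length = struct.unpack_from("<HH2sH", data)
    return (group, element), vr.decode(), data[8:8 + length]

# The example from Fig b: (0010,0010) · PN · "AIRING^ZHANG^" → padded to 14 bytes
raw = encode_element(0x0010, 0x0010, "PN", b"AIRING^ZHANG^")
```

Note how the 13-byte name pads to the advertised length of 14 — the even-length rule is why the figure shows Length=14.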
TRANSFER SYNTAX · UID → CODEC UID suffix codec use .1.2 uncompressed (implicit VR) small images .1.2.1 uncompressed (explicit VR) most common .1.2.4.50 JPEG baseline low-priority .1.2.4.80 JPEG-LS lossless CT/MRI default .1.2.4.91 JPEG 2000 lossy research .1.2.5 RLE simple/integer .1.2.4.107 HEVC main profile new (2017+)
图 c · DICOM transfer syntax UID 决定像素 codec。前缀 1.2.840.10008 是 DICOM 的 OID 根命名空间,后缀决定算法。CT / MRI 实际部署里 JPEG-LS lossless(.4.80) 是默认——因为医疗影像必须无损,而 JPEG-LS 对 12-16 bit 灰度高效。JPEG 2000 lossy 主要在科研和非诊断场景。HEVC(.4.107)是 2017 年加入的新选项,主要给超声 cine 和 4D 数据用。同一份 .dcm 可以离线 transcode 换 transfer syntax,metadata 不动,只换像素压缩——这是 DICOM 灵活性的关键。
Fig c · DICOM transfer-syntax UIDs select the pixel codec. Prefix 1.2.840.10008 is DICOM's OID root namespace; the suffix selects an algorithm. In real CT / MRI deployments JPEG-LS lossless (.4.80) is the default — medical imagery must be lossless and JPEG-LS handles 12-16 bit greyscale efficiently. Lossy JPEG 2000 is mostly for research and non-diagnostic uses. HEVC (.4.107) was added in 2017 mainly for ultrasound cine loops and 4D data. The same .dcm can be transcoded offline to a different transfer syntax — metadata stays untouched, only the pixel compression changes — which is the heart of DICOM's flexibility.
WINDOW / LEVEL · 12-BIT → 8-BIT CT data · 12-bit Hounsfield · range −1024 → 3071 −1024 0 (water) 3071 window: WW=400 WL=40 soft-tissue · −160 → 240 display · 8-bit · 0 → 255 同一张 CT,换 WW/WL 看不同组织:骨窗(2000/300)· 肺窗(1500/-600)· 脑窗(80/40)
图 d · CT 12-bit 数据怎么映射到 8-bit 屏幕。CT 的原始像素是 12-bit Hounsfield Units(范围 −1024 到 3071,水=0,空气=−1000,致密骨=+1000),共 4096 灰度级别——但屏幕只能显示 256 级。DICOM 在 metadata 里存 窗宽 WW + 窗位 WL,定义一个"取景框":只把 [WL−WW/2, WL+WW/2] 这段映射到 0~255 输出,框外要么纯黑要么纯白。同一张 CT,医生在工作站可以瞬间切换:骨窗(WW=2000, WL=300)看骨折,软组织窗(WW=400, WL=40)看肿瘤,肺窗(WW=1500, WL=−600)看肺纹理,脑窗(WW=80, WL=40)看脑出血。一张图三种诊断用途——这是 DICOM 必须用 12+ bit 的原因。
Fig d · How 12-bit CT data is mapped to an 8-bit screen. Raw CT pixels are 12-bit Hounsfield Units (range −1024 to 3071; water=0, air=−1000, dense bone=+1000) — 4,096 grey levels — but a display only shows 256. DICOM stores window width (WW) + level (WL) in metadata, defining a "viewing frame": only [WL−WW/2, WL+WW/2] is mapped to 0-255; outside that window everything saturates to black or white. From the workstation a radiologist can switch instantly: bone window (WW=2000, WL=300) for fractures, soft-tissue (WW=400, WL=40) for tumours, lung window (WW=1500, WL=−600) for lung markings, brain window (WW=80, WL=40) for haemorrhage. One image, multiple diagnostic uses — this is why DICOM has to be 12+ bit.
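The WW / WL mapping in Fig d is just a linear ramp plus clamping. A minimal sketch over a plain pixel list — real viewers vectorise this and also honour the rescale slope / intercept tags, which this skips:

```python
def apply_window(hu_pixels, ww, wl):
    """Map Hounsfield values through window [WL-WW/2, WL+WW/2] to 0-255 greys."""
    lo = wl - ww / 2
    out = []
    for hu in hu_pixels:
        g = (hu - lo) / ww * 255.0            # linear ramp inside the window
        out.append(max(0, min(255, int(g))))  # outside saturates to black / white
    return out

ct_row = [-1000, 0, 40, 240, 1000]            # air, water, soft tissue, contrast, bone-ish
soft = apply_window(ct_row, ww=400, wl=40)    # soft-tissue window → [0, 102, 127, 255, 255]
bone = apply_window(ct_row, ww=2000, wl=300)  # bone window → [0, 89, 94, 119, 216]
```

Same five pixels, two completely different grey renderings — exactly the "one image, multiple diagnostic uses" point above.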
DICOM · MEDICAL IT PIPELINE · SCANNER TO DIAGNOSIS CT scanner acquisition device X-ray detector 12-bit Hounsfield scan reconstruction .dcm × N slices + 4000 tags JPEG-LS lossless DIMSE C-STORE PACS SERVER Orthanc / dcm4che SQL index + blob store Patient → Study → Series → Instance QIDO-RS · search by tag WADO-RS · retrieve frames STOW-RS · upload new DICOMweb HTTP REST consumers three downstream paths radiologist workstation apply WW/WL · MPR · 3D rendering OsiriX / Horos / RadiAnt / Weasis → written report AI inference CheXpert · nnU-Net · MONAI · TotalSegmentator read pixel + metadata → findings / segmentation mask EHR / FHIR ImagingStudy link to patient record · longitudinal compare Epic · Cerner · OpenMRS DICOM 同时管:① 文件格式(.dcm)② 网络协议(DIMSE / DICOMweb)③ 元数据字典(4000+ tag) — 三件事一套规范。 医院 IT 全靠 DICOM 串起来:CT 一定吐 DICOM、PACS 一定收 DICOM、AI 一定读 DICOM、EHR 一定连 DICOM — 30 年没人能换。

图 33 · DICOM 端到端医疗 IT 流水线。左:CT 设备扫描出 12-bit Hounsfield,经 reconstruction 后写成 N 张 .dcm 切片(每张带 4000 tag,默认 JPEG-LS lossless),通过 DIMSE 的 C-STORE 命令推到医院 PACS。中:PACS 服务器(Orthanc / dcm4che)按"Patient → Study → Series → Instance"四级层次索引,对外提供 DICOMweb HTTP REST API——QIDO-RS 按 tag 搜、WADO-RS 取像素、STOW-RS 上传。右:三条下游路径——① 放射科医生工作站(OsiriX / Horos / RadiAnt)应用窗宽窗位 + MPR + 3D 重建,出诊断报告;② AI 模型(CheXpert / nnU-Net / MONAI)读 .dcm pixel + metadata 出 segmentation;③ EHR(Epic / Cerner)用 FHIR ImagingStudy 资源把影像挂到患者档案上。整个医院 IT 体系的"像素 + 协议 + 字典"层全是 DICOM——30 年没人能挑战。

Fig 33 · The end-to-end DICOM medical-IT pipeline. Left: a CT scanner produces 12-bit Hounsfield data, runs reconstruction, writes N .dcm slices (each carrying 4,000 tags, default JPEG-LS lossless), and pushes them to the hospital PACS via the DIMSE C-STORE command. Middle: the PACS server (Orthanc / dcm4che) indexes data along the four-level "Patient → Study → Series → Instance" hierarchy and exposes a DICOMweb HTTP REST API — QIDO-RS searches by tag, WADO-RS retrieves pixels, STOW-RS uploads. Right: three downstream consumers — ① the radiologist workstation (OsiriX / Horos / RadiAnt) applies window/level + MPR + 3D rendering and produces a written report; ② AI models (CheXpert / nnU-Net / MONAI) read .dcm pixel + metadata to output segmentations; ③ the EHR (Epic / Cerner) uses the FHIR ImagingStudy resource to attach the imaging to a patient record. The entire hospital-IT stack — pixel layer, protocol layer, dictionary layer — runs on DICOM. 30 years and no one has displaced it.

transfer syntax UID | codec | lossy? | typical use
1.2.840.10008.1.2 | uncompressed (implicit VR) | lossless | small images / legacy
1.2.840.10008.1.2.1 | uncompressed (explicit VR) | lossless | most common · default
1.2.840.10008.1.2.4.50 | JPEG baseline | lossy | low-priority / preview
1.2.840.10008.1.2.4.80 | JPEG-LS lossless | lossless | CT / MRI default
1.2.840.10008.1.2.4.91 | JPEG 2000 lossy | lossy | research / non-diagnostic
1.2.840.10008.1.2.5 | RLE | lossless | simple / integer
1.2.840.10008.1.2.4.107 | HEVC main profile | lossy | ultrasound cine / 4D (new)
$ dcmdump in.dcm                              # DCMTK: dump every tag + value
$ dcm2pnm in.dcm out.pnm                      # DICOM → PNM (applies W/L)
$ dcmconv +ti in.dcm out.dcm                  # rewrite to implicit-VR transfer syntax
$ dcmodify -ea "(0010,0010)" in.dcm           # erase PatientName tag (anonymisation)
$ python -c "import pydicom; print(pydicom.dcmread('in.dcm'))"  # read .dcm with pydicom
$ curl -X POST http://pacs:8042/instances --data-binary @in.dcm  # upload to Orthanc PACS (REST)
$ curl http://pacs/dicom-web/studies          # DICOMweb QIDO-RS: list studies
$ TotalSegmentator -i ct.dcm -o seg/          # AI segmentation: 100+ anatomical structures

适用

USE FOR

  • 医学影像所有 modality(CT / MRI / X-ray / 超声 / 病理切片 / 核医学 / 心电)
  • 医院 PACS / EHR 集成(没第二选择)
  • 医学 AI 模型训练 / 推理(DICOM 是事实输入格式)
  • 跨设备 / 跨医院影像交换(必须遵循)
  • 公开医学影像数据集发布(MIMIC-CXR / BraTS / RSNA / NIH)
  • 临床 GxP / HIPAA / GDPR 合规归档
  • Every medical-imaging modality (CT / MRI / X-ray / ultrasound / pathology slides / nuclear medicine / ECG)
  • Hospital PACS / EHR integration (no alternative)
  • Medical-AI training / inference (DICOM is the de-facto input format)
  • Cross-device / cross-hospital image exchange (mandatory)
  • Releasing public medical-imaging datasets (MIMIC-CXR / BraTS / RSNA / NIH)
  • Clinical GxP / HIPAA / GDPR compliant archival

反适用

AVOID

  • 任何非医疗场景(几乎是定义)
  • 消费级 / Web / 手机端图片(没浏览器支持)
  • 纯科研非临床(NIfTI / NRRD / Zarr 更轻)
  • 艺术 / 摄影 / 设计(用错赛道)
  • 需要小文件 / 低复杂度的场景(DICOM 元数据开销大)
  • Any non-medical scenario (almost by definition)
  • Consumer / web / mobile imagery (no browser support)
  • Pure-research non-clinical work (NIfTI / NRRD / Zarr are lighter)
  • Art / photography / design (wrong lane entirely)
  • Scenarios needing tiny files / low complexity (DICOM metadata overhead is heavy)
scope · commercial · open source · CLI / lib
DICOM 文件 + DIMSE + DICOMweb ✓✓✓ GE Centricity · Siemens syngo · Philips IntelliSpace · Epic Radiant · Sectra · Agfa IMPAX · Carestream ✓✓ Orthanc · dcm4che · DCMTK · OHIF Viewer · Cornerstone.js · OsiriX Lite · Horos · Weasis · 3D Slicer · pydicom · MONAI dcmdump · dcm2pnm · dcmconv · dcmodify · pydicom · SimpleITK · orthanc-cli · dcm4che-tools
设计灵感:design inspiration: TIFF(自描述 tag-based 容器思想)+ ACR-NEMA 1.0 (1985) 独立分支:independent branch: 与所有非医疗格式并存 · 完全不交集 内嵌 codec:embedded codecs: JPEG-LS(默认)· JPEG 2000 · RLE · HEVC payload 现代搭档:modern partner: HL7 / FHIR ImagingStudy(EHR 串影像)· DICOMweb(HTTP REST 现代化) AI 时代:AI era: CheXpert · nnU-Net · MONAI · TotalSegmentator(全部以 DICOM 为输入)

SVG — 不是位图,但 web 里就是图

SVG — not a bitmap, but on the web it just is the image

YEAR 2001 (SVG 1.0) · 2018 (SVG 2 部分) AUTHOR W3C SVG Working Group EXT .svg · .svgz (gzip) MIME image/svg+xml STD W3C Recommendation LOSSY 无 (矢量数学,栅格化时按 DPR 重算) DEPTH 任意 (display dependent) ALPHA ✓ (每个元素 fill-opacity / stroke-opacity) ANIM SMIL (legacy) · CSS animation · JS · Lottie (外接) STATUS Web 矢量唯一标准 · 设计/数据可视化默认

"不存像素,存数学。屏幕多大,它就多清晰。"

"Stores math, not pixels — sharp at any size."

1990 年代末,W3C 想要一个"web 上的矢量"——能在浏览器里直接渲染、能跟 HTML / CSS / JS 共存的开放格式。当时的对手是 Macromedia 的私有矢量动画 Flash(2005 年随 Macromedia 一起被 Adobe 收购),以及微软推的 VML(Vector Markup Language)。1999 年 W3C 启动 SVG WG,2001 年发布 SVG 1.0 Recommendation。SVG 的核心是 XML + 矢量数学:一份 .svg 文件就是一棵 DOM 树,根 <svg> 下挂着 <rect> / <circle> / <path> / <text> 等几何元素,辅以 <linearGradient> / <filter> 等装饰。整张图被嵌入到 HTML 的 DOM 里,可被 CSS 染色、被 JS 操控、被屏幕阅读器朗读。最关键的:它不是被栅格化后才渲染——浏览器在屏幕分辨率上重新计算每条 path,所以它在 1×、2×、3× DPR 上都同等清晰。这是位图永远做不到的事。最终,SVG 战胜了 VML(微软 2010 起放弃),又熬过了 Flash——Flash 自 2010 年起因安全和性能问题被移动平台逐步封杀,2020 年底 Adobe 正式停止支持——SVG 成为 web 矢量的唯一标准。

In the late 1990s, W3C wanted a "web-native vector" — an open format that could be rendered in the browser and live alongside HTML / CSS / JS. The contenders of the day were Macromedia's proprietary Flash vector animation (which passed to Adobe with the 2005 acquisition) and Microsoft's VML (Vector Markup Language). W3C started the SVG WG in 1999 and shipped the SVG 1.0 Recommendation in 2001. SVG's core is XML + vector math: an .svg file is a DOM tree — a root <svg> with <rect> / <circle> / <path> / <text> geometry inside, decorated by <linearGradient> / <filter> and friends. The whole image lives inside HTML's DOM — colourable by CSS, scriptable by JS, readable by screen readers. Most crucially: SVG is not rasterised first and rendered second — the browser recomputes every path at the screen's true resolution, so it stays equally sharp at 1×, 2×, 3× DPR. That is something a bitmap can never do. SVG eventually defeated VML (Microsoft abandoned it after 2010) and outlived Flash — progressively blocked on mobile platforms from 2010 over security and performance concerns, and formally ended by Adobe at the close of 2020 — leaving SVG as the web's sole vector standard.

SVG PATH · M / L / C / Q / A / Z · ONE STROKE M 30,90 L 70,60 C cubic Q quad A arc Z close M move · L line · C cubic Bézier · Q quadratic · A elliptical arc · Z close
图 36a · SVG path 的"一笔画"演示。d 属性是一段命令字符串:M 移笔(move,起点)/ L 直线(line)/ C 三次贝塞尔(cubic)/ Q 二次贝塞尔(quadratic)/ A 椭圆弧(arc)/ Z 闭合(close)。一条 path 就是一串这种命令拼起来的轨迹,引擎按命令顺序"画"一遍。所有矢量字体、Adobe Illustrator 输出、绝大多数图标 SVG 都是这种 path —— 圆和矩形只是它的语法糖。
Fig 36a · SVG path as a "single stroke" demo. The d attribute is a command string: M move (start point) / L line / C cubic Bézier / Q quadratic Bézier / A elliptical arc / Z close. A path is just that command string strung together; the engine "draws" it in order. All vector fonts, Adobe Illustrator output and the vast majority of icon SVGs are paths like this — <rect> and <circle> are merely sugar.
viewBox · STAYS THE SAME · CSS SIZE × 1 / 2 / 3 40×40 60×60 80×80 同一份 viewBox="0 0 40 40" · 浏览器在屏幕分辨率上重新栅格化
图 36b · viewBox / preserveAspectRatio 缩放示意。同一份 SVG(viewBox="0 0 40 40")在 CSS 上分别按 40 / 60 / 80 px 渲染,内部的圆不是被预栅格化再放大,而是浏览器在当前屏幕分辨率下重新解一遍 path/circle 的几何方程。这就是矢量"无限清晰"的本质 —— 不是"图变大",而是"图被重新画了一遍"。
Fig 36b · viewBox / preserveAspectRatio scaling demo. The same SVG (viewBox="0 0 40 40") is rendered at 40 / 60 / 80 px in CSS; the inner circle is not pre-rasterised then scaled — the browser re-solves the geometric equation of path / circle at the current screen resolution. That is the essence of vector "infinite sharpness" — not "the picture got bigger," but "the picture was redrawn."
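The "redrawn, not resized" point can be made concrete: the browser composes viewBox → CSS size → device-pixel-ratio into one scale and re-solves geometry directly in device pixels. A toy version of that mapping (uniform scale, default preserveAspectRatio; the function name is illustrative, not a browser API):

```python
def viewbox_to_device(x, y, viewbox=(0, 0, 40, 40), css_px=40, dpr=1):
    """Map a viewBox coordinate to a physical device pixel."""
    min_x, min_y, vb_w, vb_h = viewbox
    scale = css_px / vb_w * dpr          # viewBox units → CSS px → device px
    return ((x - min_x) * scale, (y - min_y) * scale)

# The circle centre (20, 20) of viewBox="0 0 40 40" lands on different
# physical pixels at each DPR — geometry is re-evaluated, never resampled.
centre_1x = viewbox_to_device(20, 20, dpr=1)   # → (20.0, 20.0)
centre_3x = viewbox_to_device(20, 20, dpr=3)   # → (60.0, 60.0)
```

A PNG has no equivalent of this step: its pixels are fixed at export time, so a 2× / 3× display can only resample them.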
SVG FILTER PRIMITIVES · 4 EXAMPLES feGaussianBlur feColorMatrix feOffset feMerge 模糊 变色 (LUT) 偏移 合并图层 SVG filter chain = 一组 primitive 串联,跟 Photoshop 滤镜栈同源
图 36c · SVG filter primitive 4 种。feGaussianBlur 高斯模糊;feColorMatrix 颜色矩阵(等价于 LUT,可做去色 / 偏色 / 反相);feOffset 像素偏移(常用作 drop shadow 第一步);feMerge 把若干层合并(把 offset+blur 跟原图叠成投影)。SVG filter 是一条链,跟 Photoshop 的滤镜栈同源 —— 实际上 Photoshop / Sketch / Figma 的"投影 / 内阴影 / 模糊"导出 SVG 时就是翻译成这几个 primitive。
Fig 36c · Four SVG filter primitives. feGaussianBlur for blur; feColorMatrix (an LUT — desaturate, tint, invert); feOffset for pixel translation (the first step of a drop shadow); feMerge to stack outputs (combine offset+blur with the source for a shadow). An SVG filter is a chain — the same lineage as Photoshop's filter stack — and indeed Photoshop / Sketch / Figma translate "drop shadow / inner shadow / blur" into exactly these primitives when exporting SVG.
SVG vs PNG · 1× / 2× / 3× DPR SVG always sharp always sharp always sharp PNG PNG 在 2×/3× 上需要重新采样 · 锯齿 / 模糊不可避免
图 36d · SVG vs PNG 在 1× / 2× / 3× DPR 显示对比。SVG 三种尺寸下都同等清晰(浏览器按显示分辨率重算)。PNG 在原生 1× 像素对齐时清晰,放到 2× / 3× 屏幕上必须做双线性 / 双三次重采样,锯齿与模糊不可避免。这就是为什么图标 / logo / UI 装饰永远应该用 SVG —— 不仅体积更小,而且抗 Retina / 抗 4K 屏天生免疫。
Fig 36d · SVG vs PNG at 1× / 2× / 3× DPR. SVG stays equally sharp across all three sizes (the browser recomputes at the device's true resolution). PNG is sharp at native 1× pixel alignment but must bilinear / bicubic resample on 2× / 3× screens, so aliasing and blurring are inevitable. That is why icons, logos and UI decoration should always be SVG — not just smaller, but inherently immune to Retina and 4K displays.

技术内核

Technical core

SVG 的工程内核可分六块。① XML 文档——不是二进制,是文本,因此可被 grep / diff / git blame / sed / 任何文本工具处理,这一点跟 PNG / JPEG 完全相反。优点是版本管理友好、可程序生成、可手写;缺点是大体积场景(几十万节点的复杂可视化)解析慢、内存大。② shapes + path——基本几何元素 <rect> / <circle> / <ellipse> / <line> / <polyline> / <polygon>,加最强的 <path>(命令字符串拼出任意曲线 — M/L/H/V 直线类、C/S/Q/T 贝塞尔、A 椭圆弧、Z 闭合)。所有矢量字体、所有 Illustrator 输出本质都是 path。③ 装饰 = gradient + pattern + filter——<linearGradient> / <radialGradient> 渐变;<pattern> 平铺纹理;<filter>滤镜链,提供 feGaussianBlur(模糊)/ feColorMatrix(LUT)/ feOffset(偏移)/ feMerge(合并)/ feComposite(合成)/ feTurbulence(柏林噪声)/ feMorphology(膨胀腐蚀)等 20+ primitive,串联起来等价于 Photoshop 滤镜栈,Sketch / Figma 的"投影"导出 SVG 就是 feOffset+feGaussianBlur+feMerge 三件套。④ CSS 染色 + class——SVG 元素接受 fill / stroke / opacity / transform 等表现属性,也接受 CSS。一个图标 SVG 在不同 dark/light theme 下只需切换 CSS 变量,不必重新导出;currentColor 关键字让 fill 跟随父元素文字颜色,这是图标库(Heroicons / Lucide / Phosphor)的核心机制。⑤ JS 操控——每个 SVG 元素都是 DOM Node,document.querySelector('circle').setAttribute('cx', 100) 直接生效。这是 D3.js / Observable Plot / Chart.js / Recharts 这一整代数据可视化库的根基 —— 它们的真正能力不是"画 SVG",而是"把数据 join 到 SVG DOM 元素上,让 SVG DOM 跟随数据更新"。⑥ 动画三条路:(a) SMIL(Synchronized Multimedia Integration Language)在 SVG 1.0 时定的 <animate> / <animateTransform> / <animateMotion>,声明式但被 Chrome 一度想废弃,现在保留但不推荐;(b) CSS animation + transform / opacity,现代主流,跟 HTML 一致;(c) JS / requestAnimationFrame,最灵活,D3 / GSAP / anime.js 都用。在它们之上,Lottie(2017,Airbnb 的 Bodymovin AE 插件 → JSON,JS lib 渲染)是矢量动画的现代补充 —— 设计师在 After Effects 里做动画,导成 JSON,Lottie lib 在浏览器 / iOS / Android 上以 SVG 或 Canvas 渲染。底层渲染路径仍然是 SVG / Canvas 的几何指令。

SVG's engineering core breaks into six pieces. ① XML document — text, not binary, so it can be processed by grep / diff / git blame / sed / any text tool — the polar opposite of PNG / JPEG. The upside is version-control friendliness, scriptability, hand-authorability; the downside is that giant scenes (a viz with 100k nodes) parse slowly and bloat memory. ② Shapes + path — primitive geometry <rect> / <circle> / <ellipse> / <line> / <polyline> / <polygon>, plus the killer <path> (a command string composing arbitrary curves — M/L/H/V for straight lines, C/S/Q/T for Béziers, A for elliptical arcs, Z to close). Every vector font and every Illustrator export is essentially a path. ③ Decoration = gradient + pattern + filter — <linearGradient> / <radialGradient>; <pattern> for tiles; <filter> is a filter chain with 20+ primitives — feGaussianBlur, feColorMatrix (LUT), feOffset, feMerge, feComposite, feTurbulence (Perlin noise), feMorphology — which, strung together, are equivalent to Photoshop's filter stack. Sketch / Figma's "drop shadow" export is exactly feOffset + feGaussianBlur + feMerge. ④ CSS styling + class — SVG elements accept presentation attributes (fill / stroke / opacity / transform) and also CSS. An icon SVG can switch dark/light theme via a single CSS variable; currentColor lets fill inherit the parent's text colour — the central mechanism behind icon libraries like Heroicons / Lucide / Phosphor. ⑤ JS manipulation — every SVG element is a DOM Node, so document.querySelector('circle').setAttribute('cx', 100) just works. That is the foundation of an entire generation of dataviz libraries — D3.js, Observable Plot, Chart.js, Recharts — whose true superpower isn't "drawing SVG" but "joining data to SVG DOM nodes so the DOM updates with the data."
⑥ Three animation paths: (a) SMIL (Synchronized Multimedia Integration Language) defined the original <animate> / <animateTransform> / <animateMotion> — declarative, briefly threatened with deprecation by Chrome, now retained but not recommended; (b) CSS animation + transform / opacity, the modern mainstream — same as HTML; (c) JS / requestAnimationFrame, the most flexible — D3 / GSAP / anime.js. Layered on top, Lottie (2017, Airbnb's Bodymovin After-Effects plugin → JSON + JS runtime) is the modern vector-animation supplement: designers animate in AE, export to JSON, and Lottie renders in browsers / iOS / Android via SVG or Canvas. The underlying render path is still SVG / Canvas geometric drawing.
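Pillar ② ("everything is a path") plus pillar ① ("it's just text") together mean an icon can be emitted by a dozen lines of any language. A sketch that generates a six-pointed-star <path> string — the star and the function name are invented for illustration, not any library's API:

```python
import math

def star_path(cx, cy, r_outer, r_inner, points=6):
    """Build an SVG path 'd' string for a star: M to start, L between vertices, Z to close."""
    cmds = []
    for i in range(points * 2):                 # alternate outer / inner radius
        r = r_outer if i % 2 == 0 else r_inner
        a = math.pi * i / points - math.pi / 2  # start at 12 o'clock
        cmds.append(f"{'M' if i == 0 else 'L'} "
                    f"{cx + r * math.cos(a):.1f},{cy + r * math.sin(a):.1f}")
    return " ".join(cmds) + " Z"

# currentColor makes the icon follow the parent element's text colour (pillar ④)
icon = (f'<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">'
        f'<path d="{star_path(12, 12, 10, 4)}" fill="currentColor"/></svg>')
```

The output is a complete, valid standalone SVG — paste it into HTML or save it as .svg and it renders; that is the whole trick behind programmatically generated icon sets.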

SVG · XML SOURCE → DOM → LAYOUT → RASTERISE → SCREEN PIXELS XML 源文档 .svg / inline <svg> <svg viewBox="0 0 24 24"> <path d="M..." fill="..."/> <circle cx="12".../> </svg> 浏览器解析 · DOM 树 svg ├─ defs ├─ path ← Node ├─ circle ← Node └─ g (group) CSS 样式应用 fill / stroke / transform / animation JS 操控 DOM D3 / setAttribute / event 布局 viewBox + transform filter chain (optional) feOffset feBlur feMerge → out applies before rasterise 栅格化 @ DPR screen 关键点:栅格化是最后一步,且发生在当前 DPR 上 — 同一 SVG 在 Retina / 4K 上重新栅格化、永远清晰。 CSS 样式 / JS 操控 / filter chain 都发生在 DOM 树上,不需重导出。这是 SVG 跟 PNG 最本质的区别。 Chromium / Firefox / WebKit 内部最终仍调用 Skia / Cairo / Core Graphics 的 path 渲染器,跟 Canvas 2D 同一条路径。

图 36 · SVG 完整处理流程。XML 源被浏览器解析为 SVG DOM 树,CSS 样式与 JS 操控直接作用于 DOM 节点(可热更新),布局阶段按 viewBox + transform 计算几何,可选的 filter chain(由 feOffset / feBlur / feMerge 等 primitive 串联)在栅格化前应用,最后才在当前屏幕 DPR 上栅格化为像素。这条流水线跟 PNG / JPEG 走"解码 → 完整位图 → 缩放采样"完全不同 — SVG 永远在最后一刻、按设备分辨率重画一次,所以无论 1× / 2× / 3× 屏都同等清晰。CSS 染色 / JS 数据可视化 / filter 投影都发生在 DOM 上,不需要重新导出文件 — 这是 D3 / Heroicons / Figma 设计交付能跑起来的工程基础。

Fig 36 · SVG's full processing pipeline. The XML source is parsed by the browser into a live SVG DOM tree; CSS and JS act directly on the DOM (hot-reloadable); layout computes geometry from viewBox + transform; an optional filter chain (composed of feOffset / feBlur / feMerge primitives) runs before rasterisation; only at the very end is the result rasterised at the current screen DPR. This differs fundamentally from PNG / JPEG's "decode → full bitmap → resample on resize" — SVG is redrawn once, at the device's true resolution, in the last moment, so it stays equally sharp on 1× / 2× / 3× displays. CSS theming, JS-driven dataviz, filter shadows — all on the DOM, no re-export needed — that's the engineering foundation under D3, Heroicons and Figma's "design hand-off" workflow.

feature | SVG | PNG | PDF | Lottie
缩放无损 | ✓ | ✗ | ✓ | ✓
Web 嵌入 | ✓ inline / img | ✓ img | ✗(大多数) | ✓ JS lib
动画 | SMIL / CSS / JS | APNG | ✗ | JSON timeline
文本可搜索 | ✓ XML | ✗ | partial | ✗
体积(图标) | ~1 KB | ~3-10 KB | ~10 KB | ~30 KB
$ svgo in.svg -o out.svg                  # optimise SVG: strip redundancy, merge paths
$ inkscape --export-png=out.png in.svg    # SVG → PNG, CLI-friendly
$ resvg in.svg out.png                    # Rust SVG renderer, common server-side
$ lottie2html in.json out.html            # Lottie JSON → static HTML/SVG
$ npx @figma/code-connect svg in.fig      # Figma → SVG export

适用

USE FOR

  • 图标 / logo / UI 装饰(Heroicons / Lucide / Phosphor)
  • 数据可视化(D3.js / Observable Plot / Chart.js / Recharts)
  • 需要 Retina / 4K 屏天生免疫的任何图
  • 需要 CSS 变量切 dark/light theme 的图
  • 需要 currentColor 跟随父元素文字颜色的图标
  • 简单动画 / loader / 微交互(CSS animate)
  • 设计交付 / 跨 DCC(Figma / Sketch / Illustrator 都原生输出)
  • Icons / logos / UI decoration (Heroicons / Lucide / Phosphor)
  • Data visualization (D3.js / Observable Plot / Chart.js / Recharts)
  • Anything that must stay sharp on Retina / 4K displays
  • Anything theme-able via CSS variables (dark / light)
  • Icons that follow parent text colour via currentColor
  • Simple loaders / micro-interactions (CSS animation)
  • Design hand-off across DCCs (Figma / Sketch / Illustrator all export SVG)

反适用

AVOID

  • 照片(没有压缩比优势 · 体积爆炸)
  • 复杂栅格内容(渐变噪点 / 真实纹理 / 模糊)
  • 百万节点级 dataviz(DOM 解析 + 重排极慢 · 改用 Canvas / WebGL)
  • 外部嵌入需执行 JS 的场景(<img src> 模式 JS 被浏览器禁)
  • Photos (no compression edge — files explode)
  • Complex raster content (noisy gradients, real textures, blur)
  • Million-node dataviz (DOM parse + reflow are slow — use Canvas / WebGL)
  • Embeds that need to run JS (browsers disable JS in <img src> mode)
scope · browsers / runtimes · editors / DCC · CLI
SVG (W3C) ✓✓ 所有现代浏览器原生 · React / Vue / Svelte 原生 JSX 支持 · iOS / Android (WebView · React Native SVG) · Skia / Cairo / Core Graphics 引擎 ✓✓ Figma · Sketch · Illustrator · Inkscape · Affinity Designer · Boxy SVG · 所有现代设计工具均原生导出 svgo · inkscape · resvg · rsvg-convert · imagemagick convert · cairosvg
前辈:predecessors: Microsoft VML · Adobe/Sun PGML · PostScript(思想源头) 起源:origin: W3C SVG Working Group · 2001 SVG 1.0 Recommendation 同源亲戚:cousins: PDF(同源 PostScript · 视为"分页 SVG")· AI(底层就是 PDF) 现代补充:modern supplement: Lottie · 复杂时间线动画 · 底层仍走 SVG / Canvas 2D 击败:defeated: Microsoft VML(2010)· Adobe Flash(2020)— 2021 起 web 矢量唯一标准

PDF — 容器之王

PDF — king of containers

YEAR 1993 (Acrobat 1.0) AUTHOR Adobe Systems · John Warnock EXT .pdf MIME application/pdf STD ISO 32000 (1.7 / 2.0) · PDF/A 归档子集 ISO 19005 DEPTH 任意(取决于嵌入对象) ALPHA ✓ (PDF 1.4+ 透明) STATUS 印刷 / 文档 / 表单 / 归档全行业

"你以为它是文档,其实是个能装一切的容器 —— 矢量、位图、字体、JS。"

"You think it's a document. It's a container holding everything — vectors, bitmaps, fonts, JavaScript."

1993 年 Adobe Acrobat 1.0 推出 PDF(Portable Document Format),目标是"任何打印机、任何屏幕看到的内容一致"——这件事在 1993 年其实没解决:你在 Mac 上排好的版到 Windows 打印机上字体丢失、布局错位是日常,LaTeX / TeX 那种把布局序列化进文件的思路在工业界没普及。Adobe 创始人 John Warnock 决定把自家的 PostScript(打印机用的页面描述语言)简化、加上随机访问索引,做成一个面向查看与归档的格式 — 这就是 PDF。基于 PostScript 简化,固定页面布局,可嵌字体 + 位图 + 矢量 + JS + 表单。30 年后成为合同、表单、印刷、归档的事实标准,2008 年 PDF 1.7 成为开放 ISO 32000 标准,Adobe 对自家格式失去专有控制权 —— 这个让步反而是 PDF 真正普及的关键。

In 1993 Adobe Acrobat 1.0 launched PDF (Portable Document Format) with a single ambition: "any printer, any screen, the same page." That problem genuinely wasn't solved at the time — a page laid out on a Mac and printed from Windows routinely lost fonts and broke its layout, while LaTeX / TeX's idea of serialising the layout into the file hadn't reached industry. Adobe co-founder John Warnock chose to simplify his own PostScript (the page-description language inside printers), add a random-access index, and ship it as a viewing / archival format — that is PDF. Built on simplified PostScript, with fixed page layout, able to embed fonts + bitmaps + vectors + JS + forms. Thirty years later it is the de-facto standard for contracts, forms, print, and archival. In 2008 PDF 1.7 became the open ISO 32000 standard and Adobe lost proprietary control of its own format — the concession that finally made PDF universal.

PDF · OBJECT TREE (random-access by xref) Catalog Page Tree Page 1 Page 2 Page 3 Resources Font (Type1/TT/CID) XObject · Image ContentStream
图 37 · PDF 对象树。文件根是 Catalog(全局入口),挂着 Page Tree(分页索引,允许嵌套以支持长文档),Page Tree 下挂多个 Page,每个 Page 引用一个 Resources 对象(字体 / XObject 图像 / ColorSpace 等共享资源),再加一个 ContentStream(实际的绘图指令流,跟 PostScript 同源)。文件末尾的 xref 表让阅读器随机访问任意对象,所以打开 1000 页 PDF 不需要先读完全部 — 这是 PDF 跟 PostScript 最关键的工程改进。
Fig 37 · PDF's object tree. The root is the Catalog (global entry), which references the Page Tree (page index, possibly nested to scale to long documents). Each Page points to a Resources object (fonts / XObject images / ColorSpace — shared resources) plus a ContentStream (the actual drawing-command stream, descended from PostScript). The xref table at the file tail lets viewers seek to any object in O(1), so opening a 1000-page PDF doesn't require streaming the whole file — the key engineering gain over plain PostScript.

技术内核

Technical core

PDF 的工程内核四件事。① 基于 PostScript 改良的页面描述语言—— 矢量绘图原语(m moveto / l lineto / c curveto / S stroke / f fill),跟 SVG path 命令同源思想。但 PDF 把 PostScript 的"图灵完备 + 解释执行"裁掉了,只保留可渲染的子集,加上 xref 随机访问索引,让 1000 页文件能任意翻页。② 可嵌入字体 + 位图 + 矢量—— 字体支持 Type1 / TrueType / OpenType / CID(CJK 大字符集);图像支持 JPEG / JBIG2(C40·黑白扫描)/ CCITT G4(传真)/ JPEG 2000(C8)/ Flate(zlib)等多种 codec — 整个 PDF 文件本质上是一个容器,实际像素由内嵌的 codec 解码。③ 分页 + 表单 + JS + 数字签名—— Page Tree 支持长文档;AcroForm / XFA 表单(可填写、可提交);Action 对象可绑定 JavaScript(报税表 / 计算字段);Signature 字段配合 PKI 数字签名让 PDF 在合同 / 法律文件场景立足。④ 归档子集 PDF/A —— ISO 19005,2005 起定义,禁用透明 / JS / 外部依赖 / 加密,要求嵌入所有字体 — 是 PDF 的 strict 子集,目的是"30 年后还能打开"。法律 / 政府 / 科研论文归档是 PDF/A 的主战场。

PDF's engineering core, four pieces. ① Page-description language descended from PostScript — vector primitives (m moveto / l lineto / c curveto / S stroke / f fill), kindred to SVG's path commands. But PDF removed PostScript's "Turing-complete interpreted execution," keeping only the renderable subset and adding the xref random-access table, so jumping around a 1000-page file is fast. ② Embeddable fonts + bitmaps + vectors — fonts: Type1 / TrueType / OpenType / CID (large CJK glyph sets); images: JPEG / JBIG2 (C40 · black-white scan) / CCITT G4 (fax) / JPEG 2000 (C8) / Flate (zlib). PDF is fundamentally a container; actual pixels are decoded by the inner codecs. ③ Pagination + forms + JS + digital signatures — Page Tree scales to long docs; AcroForm / XFA forms (fillable, submittable); Action objects bind JavaScript (tax forms with computed cells); Signature fields use PKI to put PDF on solid legal ground for contracts. ④ PDF/A archival subset — ISO 19005, defined from 2005, bans transparency / JS / external dependencies / encryption and mandates embedded fonts — a strict subset designed to "still open in 30 years." Legal, government and scientific-paper archival lives on PDF/A.
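The xref mechanics of ① can be sketched in a few lines of Python. Below is a hand-built minimal one-page PDF — illustrative only (real files come from libraries like qpdf or a LaTeX engine, and this object set is the bare minimum) — showing how each object's byte offset lands in the xref table that viewers seek with:

```python
# A minimal sketch of how PDF's xref table enables random access: build a
# tiny one-page PDF by hand, recording each object's byte offset, then emit
# the xref table + trailer that a viewer jumps to via startxref.

def build_minimal_pdf() -> bytes:
    objects = [
        b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n",
        b"2 0 obj\n<< /Type /Pages /Kids [3 0 R] /Count 1 >>\nendobj\n",
        b"3 0 obj\n<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >>\nendobj\n",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for obj in objects:
        offsets.append(len(out))       # byte offset where this object starts
        out += obj
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"    # entry 0: head of the free list
    for off in offsets:                # one fixed-width 20-byte line per object
        out += b"%010d 00000 n \n" % off
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)
```

The fixed-width 20-byte xref lines are the whole trick: a viewer reads startxref, jumps to the table, and can then seek straight to object N without parsing objects 1…N−1 — the random access plain PostScript never had.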

适用

USE FOR

  • 合同 / 法律文件(数字签名 + 跨平台一致)
  • 印刷 / 排版交付(InDesign 导出 PDF/X 印刷标准)
  • 表单(税表 / 申请表 / 可填可提交)
  • 长文档归档(PDF/A · 30 年后仍可打开)
  • 科研论文 / 学术出版(LaTeX → pdflatex 输出)
  • 电子书(固定布局 · 不重排)
  • Contracts / legal documents (digital signature + cross-platform consistency)
  • Print / typography delivery (InDesign → PDF/X print standard)
  • Forms (tax forms, applications — fillable, submittable)
  • Long-term archival (PDF/A — still openable in 30 years)
  • Scientific papers / academic publishing (LaTeX → pdflatex)
  • E-books (fixed layout, no reflow)

反适用

AVOID

  • Web 主图(浏览器有原生 viewer 但加载慢 · 用 SVG / image)
  • 响应式 / 重排内容(PDF 是固定布局 · 用 EPUB / HTML)
  • 移动端阅读体验(放大缩小笨重 · 用 EPUB)
  • 需要修改 / 协作的活文档(用 Google Doc / Notion / Office 365)
  • Web hero images (browsers have viewers but loading is slow — use SVG / image)
  • Responsive / reflowable content (PDF is fixed-layout — use EPUB / HTML)
  • Mobile reading (zoom is clumsy — use EPUB)
  • Live collaborative documents (use Google Docs / Notion / Office 365)
scopeviewerstoolsCLI
PDF (ISO 32000) ✓✓ Adobe Acrobat / Reader · macOS Preview · pdf.js (Mozilla, 浏览器内置) · Foxit · SumatraPDF · Skim ✓✓ Adobe Acrobat Pro · Affinity Publisher · LibreOffice · LaTeX (pdflatex) · Word / Pages 导出 · InDesign 导出 PDF/X qpdf · pdftk · pdftoppm · pdfinfo · mutool · ghostscript · pandoc
前辈:predecessor: Adobe PostScript(打印机页面描述语言) 起源:origin: Adobe Acrobat 1.0 · 1993 · John Warnock 同源亲戚:cousin: SVG(同源 PostScript · 视为"分页 SVG") 装载:embeds: JBIG2 · JPEG · JPEG 2000 · CCITT G4 · Flate 变体:variants: PDF/A(归档)· PDF/X(印刷)· PDF/UA(无障碍)· PDF/E(工程)

EPS — PostScript 的图片化身

EPS — PostScript dressed as an image

YEAR 1987 AUTHOR Adobe Systems EXT .eps · .epsf · .epsi(带 preview) MIME application/postscript DEPTH 任意(取决于 PS 内嵌资产) ALPHA 无(EPS 不支持透明) STATUS 印刷历史遗存 · 2010 年代后被 PDF 替代

"PostScript 加上 BoundingBox,就成了'图片'。"

"PostScript plus a BoundingBox = an 'image'."

1987 年 Adobe 为印刷出版定义 EPS(Encapsulated PostScript)—— 解决一个具体的工程问题:PostScript 1985 起作为打印机页面描述语言,文件本身是整页描述,没有"这张图占多大区域"的概念。但当时的 DTP(桌面出版)排版软件(Aldus PageMaker · QuarkXPress · 后来的 InDesign)需要把插图嵌入文档,要知道图边界做版心 / 文字绕排 / 缩放。Adobe 的解法非常简洁:一个普通 PostScript 文档,加上一行 %%BoundingBox: x1 y1 x2 y2 注释声明图像边框 —— 排版软件读这一行就能知道该图占多少空间,不必真去解 PS。再加 %%BeginPreview / %%EndPreview 嵌入位图预览(给不能渲染 PS 的程序看)。这就是 EPS。一个超低成本的"约定":不修改 PS 语法本身,只用注释扩展。这个格式撑起 90 年代到 2000 年代的全部印刷设计与 LaTeX 论文图表,2010 年代后被 PDF 完全替代 —— 因为 PDF 同样能做这件事,而且不需要"约定",直接是标准。

In 1987 Adobe defined EPS (Encapsulated PostScript) for the print-publishing industry — solving a concrete engineering problem. PostScript, from 1985, was a printer page-description language; a file described an entire page, with no notion of "how much space this illustration takes." But DTP applications (Aldus PageMaker / QuarkXPress / later InDesign) needed to embed illustrations inside documents, with a known bounding box for layout, text wrap, and scaling. Adobe's fix was elegantly minimal: a regular PostScript document plus one comment line — %%BoundingBox: x1 y1 x2 y2 — declaring the image's frame. The DTP app reads that line to know the size, without ever interpreting the PS itself. Add %%BeginPreview / %%EndPreview to embed a bitmap preview (for apps that can't render PS), and you have EPS — a near-zero-cost "convention" that extends PS via comments rather than syntax. The format carried virtually all print design and LaTeX paper figures through the 1990s and 2000s, and was wholly replaced by PDF after the 2010s — because PDF does the same thing without needing a convention; it's just the standard.

[图 38 diagram · EPS · FILE STRUCTURE: PS header (%!PS-Adobe DSC line) → %%BoundingBox: x1 y1 x2 y2 (required — DTP apps read this line to size the frame) → optional bitmap preview thumb → PostScript drawing code (m / l / c / S) → EOF]
图 38 · EPS 文件结构 — 横向 4 段。① PS header(%!PS-Adobe-3.0 EPSF-3.0 标识 + 一些 DSC structuring comments);② %%BoundingBox: x1 y1 x2 y2(图像边框,DTP 软件读这一行决定版心);③ 可选的位图预览(给不能渲染 PS 的旧程序看);④ 真正的 PostScript 绘图代码(m / l / c / S / f 等命令)。整个文件是合法的 PostScript,可被 GhostScript 直接解释 —— EPS 的"图片化"完全靠 BoundingBox 这一行注释,不修改 PS 语法本身。
Fig 38 · EPS file structure — four horizontal chunks. ① PS header (%!PS-Adobe-3.0 EPSF-3.0 marker + DSC structuring comments); ② %%BoundingBox: x1 y1 x2 y2 (the bounding box DTP apps read for layout); ③ optional bitmap preview (for legacy apps that can't render PS); ④ the actual PostScript drawing code (m / l / c / S / f commands). The whole file is valid PostScript, interpretable directly by GhostScript — the "image-ness" of EPS rests entirely on the BoundingBox comment, with no change to PS syntax itself.

技术内核

Technical core

EPS 的内核只有两件事。① 普通 PostScript 文档 + 必须包含 %%BoundingBox 注释 —— BoundingBox 用 4 个数字声明图框(左下 x / 左下 y / 右上 x / 右上 y,单位 PostScript point = 1/72 inch);DTP 软件读这一行做版心,完全不需要解释 PS 本体。这是"约定式扩展"的工程经典 —— 0 成本扩展旧标准。② 可选 %%BeginPreview / %%EndPreview 嵌入位图缩略图(TIFF / WMF / PICT 三种主流格式)。1990 年代很多 DTP 软件不能在屏幕上渲染 PS(GhostScript 普及前 PS 解释开销大),所以排版时屏幕看到的是 preview 位图,打印时打印机解释真正的 PS 输出矢量。这种"屏幕用预览 / 打印用矢量"的双轨工作流是 EPS 的实际使用模式。EPS 的限制也很清楚:不支持透明(PS 没有 alpha 通道概念)、不支持多页(单页才叫"图片")、不支持表单 / JS / 加密(那是 PDF 的事)。这些限制在 1987 年是合理的,但到 2000 年代设计需求复杂化后,PDF(同样基于 PostScript 但加了透明、压缩、随机访问、多页、嵌入字体)就成了天然替代。LaTeX \includegraphics{fig.eps} 是 90 年代-2010 年代论文标配 —— pdflatex 流行后,EPS 几乎被 PDF 替代,因为 pdflatex 不能直接吃 EPS,需要 epstopdf 转换。

EPS's core is only two things. ① A regular PostScript document that must contain a %%BoundingBox comment — BoundingBox declares the figure frame using four numbers (lower-left x / lower-left y / upper-right x / upper-right y, in PostScript points = 1/72 inch). DTP apps read that single line for layout without interpreting the PS body — a textbook example of "convention-based extension," which extends a legacy standard at zero cost. ② Optional %%BeginPreview / %%EndPreview embeds a bitmap thumbnail (TIFF / WMF / PICT being the main formats). In the 1990s many DTP apps couldn't render PS on screen (PS interpretation was expensive before GhostScript matured), so on screen they showed the preview bitmap and at print time the printer interpreted the real PS as vectors. This "preview on screen / vector at print" two-track workflow was how EPS was actually used. EPS's limitations are equally clear: no transparency (PS has no alpha concept), no multi-page (a single page is what "image" meant), no forms / JS / encryption (those came in PDF). Reasonable in 1987, but as design needs grew through the 2000s, PDF — also PostScript-derived but with transparency, compression, random access, multi-page, embedded fonts — became the natural successor. LaTeX's \includegraphics{fig.eps} was the standard for academic figures from the 90s through the 2010s; once pdflatex became dominant, EPS was almost entirely replaced by PDF, since pdflatex doesn't ingest EPS directly and requires epstopdf.
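The "convention-based extension" in ① is literally one regex away — a sketch of how a layout app might size an EPS without interpreting any PostScript (the function name is mine, not from any real layout engine):

```python
import re

# Parse the one DSC comment that turns a PostScript file into an "image":
# %%BoundingBox: llx lly urx ury, in PostScript points (1 pt = 1/72 inch).
def eps_bounding_box(header_text: str):
    m = re.search(r"^%%BoundingBox:\s*(-?\d+)\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)",
                  header_text, re.MULTILINE)
    if m is None:
        return None                      # not usable as an EPS for layout
    llx, lly, urx, ury = (int(g) for g in m.groups())
    return llx, lly, urx, ury

sample = "%!PS-Adobe-3.0 EPSF-3.0\n%%BoundingBox: 0 0 144 72\n0 0 moveto ..."
# 144 × 72 pt → a 2 in × 1 in frame, known without touching the PS body
```

This is the entire contract between EPS and a DTP app: four integers in a comment, and the PS body stays opaque until a printer's interpreter sees it.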

适用

USE FOR

  • (历史)90 年代-2000 年代印刷设计交付
  • (历史)老 LaTeX 论文图表(latex+dvips 工作流)
  • 跟老印刷机 / 老 RIP 兼容的图形交付
  • 需要纯矢量 PostScript 输出的科学绘图(老版 gnuplot / xfig)
  • (legacy) 1990s-2000s print design hand-off
  • (legacy) old LaTeX figures (latex + dvips workflow)
  • Compatibility with vintage presses / RIPs
  • Pure-PostScript scientific plots (old gnuplot / xfig)

反适用

AVOID

  • Web(浏览器不支持 EPS)
  • 现代设计交付(用 PDF / SVG)
  • 需要透明的设计(EPS 不支持透明)
  • 多页文档(用 PDF)
  • pdflatex 工作流(需 epstopdf 转换 · 不如直接 PDF)
  • Web (no browser support for EPS)
  • Modern design hand-off (use PDF / SVG)
  • Anything needing transparency (EPS has none)
  • Multi-page documents (use PDF)
  • pdflatex workflows (needs epstopdf — go straight to PDF)
scopeeditorsrenderersCLI
EPS (Adobe) Adobe Illustrator · Inkscape · Affinity Designer · CorelDRAW(都可读老资产) GhostScript · old QuarkXPress · old PageMaker · 老印刷机 RIP ps2pdf · epstopdf · gs(GhostScript)· pstoedit
前辈:predecessor: Adobe PostScript (1985) 起源:origin: Adobe · 1987 · 为 DTP 排版软件(PageMaker / QuarkXPress)定义 被替代:replaced by: PDF(同源 PostScript · 但更全能) 仍活在:still alive in: 老 LaTeX 资产 / 老印刷机 / Adobe Illustrator 兼容读

AI — Illustrator 的私有文件

AI — Illustrator's proprietary file

YEAR 1987 (Illustrator 1.0) AUTHOR Adobe Systems EXT .ai SPEC 私有 · 无公开规范 DEPTH 任意(底层 PDF) ALPHA ✓ (CS2 后 PDF 1.4+ 透明) STATUS 设计交付源文件 · 视觉行业事实标准

"实质是 PDF + Adobe 私有 metadata。"

"Actually a PDF with Adobe-private metadata."

1987 年 Adobe Illustrator 1.0 推出,自定义 .ai 格式存放矢量插画 —— 跟 EPS / PDF 同年诞生,是 Adobe 在 80 年代末"PostScript 三件套"里专门给设计师用的源文件容器。早期(Illustrator 1.0 - CS1)的 .ai 是简化 PostScript,跟 EPS 几乎同源(都是 PS 子集),但加了 Illustrator 专有的图层、画板、笔刷等 metadata。CS2(2005)后 Adobe 做了一个有趣的工程决定:把 .ai 底层切到 PDF —— 因为 PDF 已经能装下 PostScript 矢量 + 字体 + 透明 + 嵌入位图(Adobe 内部的 PGF 私有 codec),再加上 Illustrator 私有的 PrivateData section 存放图层 / artboard / brush / 实时效果等 Illustrator 专有信息,就是完整的 .ai。结果:.ai 文件用 Adobe Reader 打开能看到栅格化预览(因为底层就是 PDF,Reader 直接渲染了内嵌的栅格化版本),但只有 Illustrator 才能完整编辑图层结构。这是设计师交付的"源文件"标准 —— 你在视觉行业接到的 brand kit、logo 源文件、海报源文件,90% 是 .ai。

In 1987 Adobe Illustrator 1.0 launched with the proprietary .ai format for vector illustrations — born the same year as EPS / PDF, the source-file container of Adobe's "PostScript trifecta" of the late 1980s. Early .ai (Illustrator 1.0 - CS1) was a simplified PostScript, kindred to EPS (both PS subsets), with Illustrator-specific metadata layered on top — layers, artboards, brushes. From CS2 (2005), Adobe made an interesting engineering decision: switch the .ai underbelly to PDF — because PDF already carried PostScript vectors + fonts + transparency + embedded raster (via Adobe's private PGF codec), plus an Illustrator-private PrivateData section for layers, artboards, brushes, live effects. So a .ai opens in Adobe Reader and shows a rasterised preview (because the file is fundamentally a PDF, and Reader renders the embedded raster version) — but only Illustrator can edit the layer structure. That is the "source file" standard for design delivery: 90 % of the brand kits, logo sources, and poster sources you'll receive in the visual industry are .ai files.

[图 39 diagram · .ai vs .pdf · CHUNK STRUCTURE: .pdf = Catalog + Pages + Resources + ContentStream; .ai = the same four (PDF) + AI-only PrivateData (Layers / Artboards / Brushes / Effects)]
图 39 · AI vs PDF 文件结构对比。两个并排 chunk 树:.pdf 有 Catalog / Pages / Resources / ContentStream 四个核心对象;.ai 完全继承这四个(从 CS2 起 .ai 底层就是 PDF),但额外加一个私有 PrivateData section,存 Illustrator 特有的图层、画板、笔刷、实时效果。这就是为什么 .ai 改后缀为 .pdf 后 Adobe Reader 能直接打开看 — 它就是一份合法 PDF,只是 Illustrator 用私有 chunk 加了"图层结构"这层 metadata。
Fig 39 · AI vs PDF file structure side-by-side. .pdf has the four core objects: Catalog / Pages / Resources / ContentStream. .ai inherits all four exactly (since CS2 the underbelly is PDF) but adds a private PrivateData section for Illustrator-specific layers, artboards, brushes, and live effects. That's why renaming a .ai to .pdf opens cleanly in Adobe Reader — it is a valid PDF; Illustrator just hides "layer structure" in private chunks above the PDF base.

技术内核

Technical core

.ai 的内核两件事。① CS2(2005)后 .ai 格式底层就是 PDF —— 严格说是带 Adobe PGF(私有位图 codec,内嵌栅格化预览)+ 完整矢量绘图指令的 PDF 文档。这个工程决定的副作用极其有趣:.ai 保存时会内嵌一份 PDF 兼容预览(默认勾选 "Create PDF Compatible File"),所以 Adobe Reader / Preview / 浏览器 PDF viewer 都能直接打开 .ai 看到栅格化效果 — 但拿不到图层。② 私有 PrivateData section 存 Illustrator 特有的图层(Layers,可命名 / 锁定 / 隐藏 / 嵌套)/ 画板(Artboards,一份 .ai 可有多张画板,做 brand kit 一次性交付 logo + favicon + 名片)/ 笔刷(自定义 brush)/ 实时效果(Live Effects:阴影 / 模糊 / 3D 等可编辑非破坏性效果)/ 符号(Symbol,可重用元件)。这部分私有 chunk 是 Adobe 的护城河 —— 没有公开规范,Inkscape / Affinity Designer 只能部分解析(读到矢量 path 和填色,但图层结构 / 实时效果常丢)。设计师交付源文件时圈内默认就是 .ai —— 因为它是唯一能保留全部"可编辑性"的格式;导出 SVG / PDF 都会损失部分 Illustrator-only 信息。

.ai's core, two pieces. ① From CS2 (2005), the .ai underbelly is PDF — strictly, a PDF document with Adobe's PGF (private bitmap codec for the embedded raster preview) plus full vector drawing commands. The side effect is amusing: when you save a .ai, Illustrator embeds a PDF-compatible preview by default (the "Create PDF Compatible File" checkbox), so Adobe Reader / Preview / browser PDF viewers all open it and show the rasterised view — but never the layer structure. ② Private PrivateData section for Illustrator-specific layers (named / lockable / hidable / nestable), artboards (a .ai can hold many artboards — deliver logo + favicon + business card in one brand-kit file), brushes (custom), live effects (non-destructive shadow / blur / 3D), and symbols (reusable components). That private section is Adobe's moat — undocumented; Inkscape and Affinity Designer can only partially parse it (reading vectors and fills but commonly losing layer hierarchy and live effects). The designer's industry default is to hand off .ai because it is the only format that preserves full "editability" — exporting to SVG / PDF discards Illustrator-only information.
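Point ① is easy to verify at the byte level — a sketch that sniffs which era a .ai file belongs to by its magic bytes (a heuristic only; the %PDF header check is standard file sniffing, and the era labels follow this article's CS2 cutoff):

```python
def sniff_ai_flavour(head: bytes) -> str:
    # CS2+ .ai is a PDF underneath (which is why renaming it .pdf "works");
    # pre-CS2 .ai is simplified PostScript, same family as EPS.
    if head.startswith(b"%PDF"):
        return "pdf-based (CS2+)"
    if head.startswith(b"%!PS-Adobe"):
        return "postscript-based (pre-CS2)"
    return "unknown"
```

Note that the sniff only tells you the container: the PrivateData layer structure stays undocumented either way, which is exactly why non-Adobe tools can render a .ai but not fully edit it.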

适用

USE FOR

  • 设计师交付源文件(brand kit / logo / 海报源)
  • 多画板项目(一文件装多张交付)
  • 需要保留图层 / 实时效果的可编辑设计
  • 跟其他 Adobe CC 软件协作(InDesign / After Effects / Photoshop 智能对象)
  • Designer source-file delivery (brand kit / logo / poster source)
  • Multi-artboard projects (one file for many deliveries)
  • Editable designs preserving layers / live effects
  • Adobe-CC interop (InDesign / After Effects / Photoshop smart objects)

反适用

AVOID

  • 任何不装 Illustrator 的场景(Inkscape / Affinity 只能部分读)
  • Web 嵌入(用 SVG 导出)
  • 跨工具协作(用 SVG / PDF 中间格式)
  • 开源工作流(私有格式 · 锁定 Adobe 生态)
  • Anywhere without Illustrator (Inkscape / Affinity only partially parse)
  • Web embedding (export to SVG)
  • Cross-tool collaboration (use SVG / PDF as the lingua franca)
  • Open-source workflows (proprietary — locks you into Adobe)
scopefull editorpartial readersCLI
.ai (Adobe 私有) ✓✓ Adobe Illustrator(唯一完整支持) ~ Inkscape(读矢量 + path)· Affinity Designer · CorelDRAW · Adobe Reader(只读 PDF 预览) 几乎无 · uconv / pdftocairo 把 PDF 部分提取
前辈:predecessor: Adobe PostScript / 早期 .ai 是 simplified PS 底层:underbelly (CS2+): PDF + Adobe PGF private codec + PrivateData section 导出:exports to: SVG · PDF · PNG · 但都会损失 Illustrator-only 信息 影响:influence: 视觉行业事实标准 · "源文件 = .ai"是设计师默认假设

JBIG2 — PDF 里的黑白压缩

JBIG2 — the black-and-white compressor inside PDF

YEAR 2000 (ITU-T T.88) AUTHOR Joint Bi-level Image Experts Group EXT 通常嵌入 .pdf · 独立 .jb2 / .jbig2(罕见) MIME image/jbig2 (rare) STD ISO/IEC 14492 · ITU-T T.88 LOSSY 无损 + 有损(有损模式有 bug 历史) DEPTH 1-bit (黑白二值) STATUS PDF 内大量使用 · 法律 / 工程慎用有损

"扫描黑白合同的瘦身高手,但 2013 出过事故。"

"The B&W scan slimming wizard — with a 2013 incident on its record."

2000 年 ITU-T 标准化 JBIG2(T.88)替代上一代 G3 / G4 传真编码,专门压扫描黑白文档(合同、票据、账单、医学胶片黑白扫描)。它解决一个具体的工程问题:CCITT G4(1980 年代传真标准)是逐行 RLE,体积砍 10× 已经是上限,但 1990 年代末扫描分辨率从 200 DPI 升到 600 DPI,文件再次膨胀。JBIG2 的关键创新是把页面切成 symbol(连通域) —— 把扫描页面里所有连通的像素块识别出来,相似 symbol 共享一个 dictionary 模板,实际像素流变成"在 (x, y) 引用 dictionary 第 N 个 symbol"。一页扫描合同里所有的 'e' 在视觉上可能 90% 像,JBIG2 只存一个 'e' 模板,其余位置都是引用 —— 体积比 CCITT G4 砍一半到三分之二。Acrobat 9(2008)起,JBIG2 成为 PDF 默认黑白扫描压缩。但 2013 年 Xerox 复印机用有损 JBIG2(允许"用相似 symbol 替代")导致扫描合同里的数字 6 被替换成 8,工程图纸尺寸出错,Xerox 召回固件 —— 此后法律 / 工程行业默认关 JBIG2 选无损 CCITT G4。

In 2000 ITU-T standardised JBIG2 (T.88) to replace the previous-generation G3 / G4 fax encodings, targeting scanned black-and-white documents (contracts, invoices, statements, B&W medical scans). It solved a specific engineering problem: CCITT G4 (1980s fax) was per-row RLE, capped near 10× compression, but late-1990s scanners climbed from 200 DPI to 600 DPI and files swelled again. JBIG2's key innovation is cutting pages into symbols (connected components) — every connected pixel cluster on a page is identified, similar symbols share one dictionary template, and the actual stream becomes "at (x, y) reference symbol #N." On a scanned contract, all of the 'e' glyphs are ~90 % visually identical, so JBIG2 stores one 'e' template and turns the rest into references — file size drops to half or a third of CCITT G4. From Acrobat 9 (2008), JBIG2 became PDF's default for black-and-white scans. But in 2013 a Xerox copier using lossy JBIG2 (which permits "substitute a similar symbol") caused 6s in scanned contracts to be replaced by 8s, producing wrong dimensions in engineering blueprints. Xerox recalled the firmware. Legal and engineering industries have since defaulted to disabling JBIG2 and using lossless CCITT G4.

[图 40 diagram · JBIG2 · SYMBOL DICTIONARY · ALL "e" → ONE TEMPLATE: "the entered amount" has 3 'e' glyphs = 3 × (11×14) = 462 px raw; JBIG2 stores 1 dict template #17 (154 px) + 3 × (x, y, refid) refs → 1/3 the size]
图 40 · JBIG2 符号字典工作原理。一行文本 "the entered amount" 里有 3 个 'e',视觉上几乎一模一样;JBIG2 把一个 'e' 字形(连通域)抽取成 dictionary symbol #17,然后位流里只存一个模板加 3 个 (x, y, refid) 引用 —— 体积从 462 px 降到 154 px + 3 个坐标对。整个扫描页里所有重复字形都这样处理,实际能砍掉 50-70% 的 G4 体积。有损模式更激进:允许"非常相似的 symbol 共享一个模板",这就是 2013 年 Xerox 把数字 6 错配成 8 的原因 — 6 和 8 在扫描噪点下视觉相似度极高。
Fig 40 · JBIG2 symbol-dictionary mechanics. A line "the entered amount" contains three near-identical 'e' glyphs; JBIG2 extracts one 'e' connected component as dictionary symbol #17, then the bitstream stores only the template plus three (x, y, refid) references — 462 px collapses to 154 px + three coordinate triples. Across an entire scanned page, all repeated glyphs go through this dictionary, knocking 50-70 % off CCITT G4. Lossy mode is more aggressive: it allows "very similar symbols to share one template," which is exactly how Xerox's 2013 firmware substituted '8' for '6' — the two digits are visually indistinguishable to the heuristic under scan noise.

技术内核

Technical core

JBIG2 三件事。① 把扫描页面切成 symbol(连通域)—— 编码器扫描整页,识别出所有连通像素块(每个字符 / 每个标点 / 每段线条),把视觉相似的 symbol 共享一个 dictionary 模板。位流变成"位置 + dictionary 引用",而不是"逐像素栅格"。② 三种 region 编码:(a) generic region 用 CABAC 算术编码逐像素压缩,处理不规则内容(图标、签名、印章);(b) text region 用上面的 symbol 字典,处理文本(占扫描合同的 90%);(c) halftone region 用 grayscale 模板字典,处理半调网点(扫描照片的二值化)。三种 region 在同一页里可混用 —— 编码器自动分割。③ 有损模式 vs 无损模式:无损模式严格匹配 symbol(只共享 bit-exact 相同的连通域);有损模式允许"用相似 symbol 替代",阈值由编码器决定 —— 体积更小,但可能静默修改字符。这就是 2013 年 Xerox 事故的根因:数字 6 和 8 在扫描噪点下连通域形状相似,有损 JBIG2 把同一份模板用在两个不同字符上,导致扫描出的合同跟原件数字不一样。Xerox 召回固件,法律 / 工程 / 医疗行业从此默认关 JBIG2 选无损 CCITT G4 —— 即便牺牲一倍体积也要保证 bit-exact。Acrobat 提供"无损 JBIG2"选项,但 default 是有损,所以你扫描合同前要手动关掉。

JBIG2 in three pieces. ① Slice the scanned page into symbols (connected components) — the encoder scans the whole page, identifies every connected pixel cluster (every character, every punctuation mark, every line stroke), and lets visually similar symbols share one dictionary template. The bitstream becomes "position + dictionary reference," not "pixel-by-pixel raster." ② Three region encodings: (a) generic region uses CABAC arithmetic per-pixel for irregular content (icons, signatures, stamps); (b) text region uses the symbol dictionary above, for text (about 90 % of a contract scan); (c) halftone region uses grayscale-template dictionaries, for halftone screens (the binarisation of scanned photos). All three coexist on a single page, with the encoder choosing per-area. ③ Lossy vs lossless mode: lossless matches symbols strictly (sharing only bit-exact identical components); lossy permits "substitute a similar symbol" by an encoder-side threshold — smaller, but can silently rewrite characters. That is exactly the 2013 Xerox bug's root cause: the digits 6 and 8 have visually similar connected components under scan noise, and lossy JBIG2 reused one template across two different characters — so the scanned contract's digits no longer matched the original. Xerox recalled the firmware, and legal / engineering / medical industries have since disabled JBIG2 in favour of lossless CCITT G4 — willing to pay 2× the size to guarantee bit-exactness. Acrobat does offer a "lossless JBIG2" option, but the default is lossy, so you must turn it off explicitly before scanning a contract.
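The lossless half of ③ can be sketched as a toy (all names are mine; glyph bitmaps are tuples of row-strings): bit-exact symbols collapse into one dictionary template, and the page body shrinks to (x, y, symbol-id) references. The lossy mode differs only in replacing the exact-match test with a similarity threshold — which is precisely where a '6' can silently become an '8':

```python
# Toy model of a JBIG2 lossless text region: identical glyph bitmaps share
# one dictionary template; the page stream becomes (x, y, symbol_id) refs.

def build_symbol_dictionary(glyphs):
    """glyphs: list of (x, y, bitmap). Returns (dictionary, placements)."""
    dictionary, index, placements = [], {}, []
    for x, y, bitmap in glyphs:
        if bitmap not in index:        # lossless: share only bit-exact matches
            index[bitmap] = len(dictionary)
            dictionary.append(bitmap)
        placements.append((x, y, index[bitmap]))
    return dictionary, placements

e = ("0110", "1001", "1111", "1000", "0111")   # a fake 4×5 'e' glyph
t = ("1111", "0100", "0100", "0100", "0100")   # a fake 4×5 't' glyph
page = [(0, 0, t), (5, 0, e), (40, 0, e), (80, 0, e)]
dictionary, refs = build_symbol_dictionary(page)
# 4 glyphs on the page, but only 2 templates stored in the dictionary
```

Swap `bitmap not in index` for "hamming distance below a threshold" and you have the lossy mode — smaller files, plus the Xerox failure mode for free.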

适用

USE FOR

  • 非关键扫描黑白文档(图书馆藏书 / 报纸归档 / 普通账单)
  • 已开启无损模式的合同 / 票据扫描
  • 需要 PDF 体积砍 5-10× 的纯文本扫描场景
  • 医学胶片黑白图像归档(无损模式)
  • Non-critical B&W scans (library books, newspaper archives, casual statements)
  • Contract / receipt scans only when lossless mode is enabled
  • Pure-text scan PDFs needing 5-10× shrink
  • B&W medical-image archival (lossless mode)

反适用

AVOID

  • 灰度 / 彩色扫描(JBIG2 只有 1-bit · 用 JPEG 2000)
  • 法律合同(默认有损模式可能改字符 · 强烈建议无损或关闭)
  • 工程图纸 / 数字尺寸(2013 Xerox 事故先例)
  • 医学诊断报告(任何字符替换都不可接受)
  • Grayscale / colour scans (JBIG2 is 1-bit only — use JPEG 2000)
  • Legal contracts (default lossy can rewrite characters — force lossless or off)
  • Engineering blueprints with numeric dimensions (the 2013 Xerox precedent)
  • Medical diagnostic reports (no character substitution acceptable)
scopeencodersdecodersCLI
JBIG2 (ITU-T T.88) Adobe Acrobat Pro · LuraTech · CVision · ABBYY · Foxit Phantom 所有 PDF viewer(Adobe Reader · Preview · pdf.js)· 独立 .jb2 解码罕见 jbig2enc(开源 · 基于 Leptonica)· jbig2dec(GhostScript)· Acrobat 命令行
前辈:predecessors: CCITT G3(1980 传真)· CCITT G4(逐行 RLE 二进制压缩) 起源:origin: Joint Bi-level Image Experts Group · ITU-T T.88(2000)· ISO/IEC 14492 主要在:primarily lives inside: PDF(Acrobat 9+ 黑白扫描默认) 事故标记:incident marker: Xerox 2013 · 数字替换 bug · 法律行业从此关有损模式

TGA — Truevision 时代的纹理王

TGA — the texture king from the Truevision era

YEAR 1984 (Truevision Targa) AUTHOR Truevision, Inc. EXT .tga · .targa · .icb · .vda MIME image/x-tga LOSSY 无损(可选 RLE) DEPTH 8 / 16 / 24 / 32 bit ALPHA ✓ (32-bit RGBA) STATUS 游戏纹理老兵 · 工具链中转

"3D 游戏行业用了 20 年的纹理格式 —— 因为 alpha 简单。"

"Twenty years of 3D game textures — chosen for its simple alpha."

1984 年 Truevision 公司推出 Targa 系列显卡 —— 这是早期 PC 真彩(24-bit)显卡的代表作,而 TGA(Truevision Graphics Adapter)正是该卡的"出厂格式":一种结构极简的位图容器,用来把显卡里 24-bit RGB / 32-bit RGBA 像素数据原样存到磁盘。规范一句话能讲完:18 byte 文件头 + 可选 image ID + 可选调色板 + 像素数据 + 可选 RLE,解析比 BMP 还快(BMP 还得分 V3 / V4 / V5 几代)。Truevision 公司在 90 年代后期破产,但 TGA 因为另一个生态延续了生命 —— 1990 年代中期 id Software 的 Quake 引擎、Epic 的 Unreal 引擎、Valve 的 Source 引擎都把 TGA 当作纹理标准,理由极简单:① 32-bit RGBA 透明用一个独立 alpha 通道,不像 BMP 那样要靠 magic 像素;② 18 byte 头部解析 30 行 C 代码搞定,引擎启动时一次性吃掉成百上千张纹理零负担;③ 跨平台 (DOS / Windows / IRIX / Mac),老纹理工具链全部支持。所以"格式厂商死了,格式靠用户死撑"是 TGA 的故事 —— 你今天打开 Quake 1 的 mod 包,里面 90% 的纹理仍是 .tga,Photoshop 也仍原生支持。

In 1984 Truevision launched the Targa line of graphics cards — early flagship 24-bit colour cards for the PC — and TGA (Truevision Graphics Adapter) was the card's "factory" format: a minimal bitmap container for dumping 24-bit RGB / 32-bit RGBA pixel data straight from VRAM to disk. The whole spec fits in a sentence: 18-byte header + optional image ID + optional palette + pixel data + optional RLE, parseable faster than BMP (which has the V3 / V4 / V5 generation soup). Truevision went out of business in the late 1990s, but TGA lived on inside another ecosystem — id Software's Quake, Epic's Unreal, and Valve's Source engine all adopted TGA as their texture standard for embarrassingly simple reasons: (1) 32-bit RGBA carries transparency in a real alpha channel, not BMP's magic-pixel hack; (2) the 18-byte header parses in 30 lines of C, so an engine can wolf down hundreds of textures at startup; (3) it's cross-platform (DOS / Windows / IRIX / Mac) and every legacy texture tool already supported it. So TGA's story is "the vendor died, but the users carried the format" — open a Quake 1 mod pack today and 90 % of the textures are still .tga, with Photoshop still supporting it natively.

[图 41 diagram · TGA · FILE STRUCTURE: 18-byte header → optional image ID → optional colormap (8/16-bit modes) → pixel data (BGR/A · raw or RLE) → optional v2.0 footer; BGR(A) order, same as BMP, reflecting 80s VRAM layout — DTP-free, byte 0 to EOF in five chunks]
图 41 · TGA 文件结构 — 横向五段。① 18 byte 文件头(image type / 调色板属性 / 宽高 / 每像素位数 / origin / descriptor);② 可选 image ID 字段(通常空);③ 可选调色板(8 / 16-bit 模式才有);④ 真正的像素数据,顺序是 BGR(A) 而非 RGB(A)—— 这跟 BMP 一致,反映 80 年代显存按字节小端读出的事实;⑤ 可选的 v2.0 footer(20 byte),里面带 "TRUEVISION-XFILE.\0" 签名。整张图可选 RLE 压缩(每段 1 byte 头 + 像素),压缩率不高但解码 50 行 C。规范极简,所以 Quake/Unreal 系引擎用了 20 年。
Fig 41 · TGA file structure — five horizontal chunks. ① An 18-byte header (image type / palette attrs / width / height / bits-per-pixel / origin / descriptor); ② optional image ID (usually empty); ③ optional colormap (only for 8 / 16-bit modes); ④ the real pixel payload, ordered BGR(A) rather than RGB(A) — same as BMP, a reflection of 1980s little-endian byte-by-byte VRAM reads; ⑤ an optional v2.0 footer (20 bytes) carrying the "TRUEVISION-XFILE.\0" signature. The whole pixel block can be RLE-compressed (1-byte header per run + pixels) — compression isn't great but the decoder is 50 lines of C. That minimalism is exactly why Quake / Unreal engines used it for two decades.

技术内核

Technical core

TGA 内核三件事。① 18 byte 文件头是规范的全部 —— 包含 image type ID(决定 colormap / RGB / B&W,有无 RLE)、colormap 属性、image origin / 宽 / 高、bits-per-pixel(8 / 16 / 24 / 32)、image descriptor(里面有 alpha 位数 + 行扫描方向);整个解析 50 行 C 全搞定。这是极简的工程胜利:对比 BMP 三代 header(BITMAPINFOHEADER → V4 → V5),TGA 一份 header 写到死。② 像素深度 8 / 16 / 24 / 32 四档:8-bit indexed 走调色板(老游戏精灵图);16-bit 是 5:5:5 + 1-bit alpha(老 3D 卡硬件最爱);24-bit 真彩 BGR;32-bit BGRA 带完整 alpha —— 后两个是 1990 年代游戏纹理的事实标准。注意像素是 BGR 顺序(同 BMP),这是 80 年代 x86 小端 + 显存按 byte 读取的共识,移植到 OpenGL / Direct3D 时引擎要逐像素 swap。③ 可选 RLE 压缩:每段 1 byte run header(高位 1 表示 RLE,7 位长度) + 像素值,简陋但解码极快。Quake 1 / 2 / 3 / Unreal Tournament / Half-Life 1 时代的纹理基本都是 24-bit TGA + RLE off(磁盘大,但 mmap 进显存零拷贝),这个工作流一直延续到 2005 年前后 DDS / KTX 把 TGA 替代 —— 因为 GPU 直接支持的压缩纹理(DXT / BCn)能在 VRAM 里压成 1/4 体积,TGA 只是 raw RGBA。

TGA's core, three pieces. ① The 18-byte header is the entire spec — image type ID (colormap / RGB / B&W, RLE or not), colormap attrs, image origin / width / height, bits-per-pixel (8 / 16 / 24 / 32), image descriptor (alpha bit count + scan direction). The whole parser is 50 lines of C. A win for minimalism: compare BMP's three-generation header soup (BITMAPINFOHEADER → V4 → V5); TGA wrote one header and never changed it. ② Four pixel depths: 8 / 16 / 24 / 32 — 8-bit indexed via colormap (old game sprites); 16-bit as 5:5:5 + 1-bit alpha (favourite of early 3D cards); 24-bit truecolour BGR; 32-bit BGRA with full alpha — the last two are the 1990s game-texture standards. Note pixels are BGR-ordered (same as BMP), the consensus of 1980s x86 little-endian + byte-by-byte VRAM reads; engines must swap per pixel when porting to OpenGL / Direct3D. ③ Optional RLE: each run is 1 byte (top bit = RLE flag, 7 bits = length) + pixel value — crude, but very fast to decode. Quake 1 / 2 / 3 / Unreal Tournament / Half-Life 1 era textures were almost all 24-bit TGA, RLE off (bigger on disk, but mmap straight into VRAM with zero copies). That workflow lasted until ~2005 when DDS / KTX replaced TGA — because GPU-native compressed textures (DXT / BCn) shrink to 1/4 in VRAM, while TGA is just raw RGBA.
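Both halves of the spec fit in a screenful — a sketch of the 18-byte header parse plus the RLE packet decode (field layout per the Truevision spec; note the packet's low 7 bits store count − 1, so one packet covers 1-128 pixels):

```python
import struct

def parse_tga_header(buf: bytes) -> dict:
    """Unpack the 18-byte little-endian TGA header — the entire spec."""
    (id_len, cmap_type, img_type, _first, _cmap_len, _cmap_bpp,
     _x_org, _y_org, width, height, bpp, descriptor) = struct.unpack(
        "<BBBHHBHHHHBB", buf[:18])
    return {
        "image_type": img_type,           # 2 = truecolour, 10 = RLE truecolour
        "rle": img_type >= 9,
        "width": width, "height": height,
        "bpp": bpp,                       # 8 / 16 / 24 / 32
        "alpha_bits": descriptor & 0x0F,  # low nibble of the descriptor byte
        "top_down": bool(descriptor & 0x20),
    }

def decode_tga_rle(data: bytes, bpp_bytes: int, pixel_count: int) -> bytes:
    """Expand TGA run-length packets (image types 9-11) into raw BGR(A)."""
    out, pos = bytearray(), 0
    while len(out) < pixel_count * bpp_bytes:
        header = data[pos]; pos += 1
        count = (header & 0x7F) + 1       # low 7 bits store count - 1
        if header & 0x80:                 # RLE packet: one pixel, repeated
            out += data[pos:pos + bpp_bytes] * count
            pos += bpp_bytes
        else:                             # raw packet: count literal pixels
            out += data[pos:pos + count * bpp_bytes]
            pos += count * bpp_bytes
    return bytes(out)
```

The decoder really is this small — which is the whole reason a 1990s engine could afford to parse thousands of textures at startup in plain C.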

适用

USE FOR

  • 老 3D 引擎纹理(Quake / Unreal / Source 系 mod)
  • 需要带 alpha + 极简头的中转格式
  • 工具链中间格式(渲染输出 → TGA → 压缩成 DDS / KTX)
  • 视频后期合成中转(Nuke / After Effects 序列帧)
  • Old 3D-engine textures (Quake / Unreal / Source mods)
  • Intermediate format that wants alpha + a tiny header
  • Toolchain bridges (renderer output → TGA → DDS / KTX)
  • VFX compositing intermediates (Nuke / After Effects sequences)

反适用

AVOID

  • 现代 web(浏览器不支持 · 用 PNG / WebP)
  • 现代游戏引擎运行时(用 DDS / KTX)
  • 需要高压缩比的存储(TGA RLE 几乎无效)
  • HDR / 浮点纹理(TGA 只到 32-bit 整数)
  • Modern web (no browser support — use PNG / WebP)
  • Modern engine runtimes (use DDS / KTX)
  • Storage where compression matters (TGA RLE barely helps)
  • HDR / float textures (TGA caps at 32-bit integer)
scopeeditorsengines / readersCLI
TGA (Truevision) Photoshop 原生 · GIMP · Krita · Affinity Photo ✓✓ Quake / Unreal / Source / Cryengine 系老引擎 · Nuke · After Effects convert in.png out.tga(ImageMagick)· tga2png · stb_image(50 行 C)
同期:contemporary: BMP(同期 · 长期并存 · 同 BGR 顺序) 起源:origin: Truevision Targa 显卡(1984)· 卡带格式 游戏圈替代:replaced (in games) by: DDS(GPU 压缩纹理)· KTX(跨平台 GPU 容器) 仍活在:still alive in: 老 3D 引擎 mod / 工具链中转 / Photoshop 原生支持

ICO / CUR — 浏览器标签上的小图

ICO / CUR — the tiny image on your browser tab

YEAR 1985 (Windows 1.0) AUTHOR Microsoft EXT .ico · .cur(指针) MIME image/x-icon · image/vnd.microsoft.icon DEPTH BMP 内嵌(早期)+ PNG 内嵌(Vista+) ALPHA ✓ (32-bit BMP / PNG 内嵌) STATUS Web favicon 万年标准 · Windows 桌面图标

"它装着多分辨率的同一个图标,16 / 32 / 48 / 256 一锅端。"

"It packs the same icon at 16 / 32 / 48 / 256 — a multi-res bundle."

1985 年 Windows 1.0 出现时,Microsoft 面临一个具体的工程问题:同一个应用图标在 16×16(任务栏 / 标题栏)、32×32(桌面 / 文件管理器)、48×48(start menu)甚至更大尺寸下都要好看 —— 但单纯把 32×32 缩到 16×16 会糊掉,小尺寸需要手工像素绘制(每个像素都要算)。Microsoft 的解法是设计一个"多分辨率包" —— 一个 ICO 文件存 N 个 image entry,每个 entry 是一张完整图(不同尺寸 / 不同色深),操作系统按显示场景挑最合适的那个。1999 年 IE 5 把它推上 web 一等公民:<link rel="icon" href="favicon.ico"> 让浏览器标签也能展示网站图标 —— 这是 favicon 的诞生。Vista(2007)给 ICO 加了"内嵌 PNG"支持,256×256 大尺寸图标终于可用(BMP 256×256 太大,PNG 压缩后只剩 1/10)。CUR 是 ICO 的鼠标指针变种,几乎一样的容器结构,只多两个字段:hotspot x / y(指针的"实际点击位置",比如箭头尖在哪个像素)。今天每个浏览器仍优先认 favicon.ico,即便你已经声明了 SVG / PNG favicon —— 因为 IE 时代的事实标准实在太顽强了。

When Windows 1.0 shipped in 1985, Microsoft faced a concrete engineering problem: the same application icon had to look right at 16×16 (taskbar / titlebar), 32×32 (desktop / Explorer), 48×48 (start menu), and beyond — but naively scaling 32×32 down to 16×16 looks blurry, because at small sizes every pixel must be hand-painted. Microsoft's fix was a "multi-res package": one ICO file holding N image entries, each a complete image at a different resolution / colour depth, with the OS picking the best match for the display scenario. In 1999, IE 5 promoted it to a first-class citizen of the web: <link rel="icon" href="favicon.ico"> let browser tabs show site icons — that's the birth of favicon. Vista (2007) added "embedded PNG" support to ICO, finally making 256×256 icons practical (a 256×256 BMP is huge; PNG-compressed it's a tenth the size). CUR is the cursor variant — same container, plus two extra fields: hotspot x / y (which pixel of the cursor counts as "the click point," e.g. the arrow tip). Every browser today still favours favicon.ico even after you declare SVG / PNG favicons — the IE-era de-facto standard is just that hard to dislodge.

[图 42 diagram · ICO · MULTI-RES CONTAINER: 6-byte header (n=4) → ENTRY 0-3 → payloads 16×16 / 32×32 / 48×48 (BMP 32-bit) + 256×256 (PNG, Vista+); each size is independently hand-drawn, and the OS picks the best match per context]
图 42 · ICO 容器结构。一个总容器 box 装着:① 6-byte 头(reserved + type + image count);② N 个 ICONDIRENTRY(16 byte 一条 · 描述每个 entry 的宽 / 高 / 色深 / 偏移);③ N 个真正的 image payload(早期 entry 内嵌 BMP,Vista 后允许 256×256 内嵌 PNG)。OS 显示时根据当前场景(任务栏 / 桌面 / 高 DPI)挑最合适的尺寸;每个尺寸都是独立手绘的,这就是为什么好的 ICO 在 16×16 看不糊 —— 不是缩出来的,是单独画的。
Fig 42 · ICO container layout. One outer box holds: ① a 6-byte header (reserved + type + image count); ② N ICONDIRENTRYs (16 bytes each, describing each entry's width / height / colour depth / offset); ③ N actual image payloads (early entries embed BMP; Vista+ allows 256×256 to embed PNG). At display time the OS picks the best size for the scenario (taskbar / desktop / hi-DPI); each size is independently hand-drawn, which is why a good ICO doesn't blur at 16×16 — that resolution wasn't downscaled, it was painted on its own.

技术内核

Technical core

ICO 内核三件事。① 容器结构 + N 个 image entry —— 6 byte ICONDIR 头(reserved + type=1 是 ICO / 2 是 CUR + image_count) + N 个 16 byte 的 ICONDIRENTRY(每条描述 width / height / colour count / planes / bit count / size / offset)+ N 段真正的 image data。entry 在文件末尾按偏移堆放。这种"目录 + payload"是 80 年代 PE / OLE 时期 Microsoft 的标准设计语言。② 早期 entry 内嵌 BMP,Vista 后允许内嵌 PNG —— 1985-2007 年所有 ICO 内嵌的都是 BMP(去掉 BITMAPFILEHEADER,只留 BITMAPINFOHEADER + 像素 + AND mask),32×32 32-bit alpha 一个 4096 byte 起步。Vista(2007)给 ICO 加 PNG 内嵌支持(magic 字节判断:开头 89 50 4E 47 是 PNG,否则当 BMP),终于让 256×256 大尺寸图标可用 —— BMP 256×256 32-bit 是 256 KB,PNG 压完通常 20-50 KB。这是 ICO 唯一一次重大演进。③ CUR 与 ICO 几乎一样,多 hotspot 字段 —— ICONDIRENTRY 里 reserved 的两个 byte,在 CUR 文件里被重新解释为 hotspot_x / hotspot_y(指针图像里的"实际点击点"坐标,比如箭头尖在 (0, 0) 像素位置)。这就是鼠标指针的全部技术差异。今天 favicon.ico 仍是浏览器最优先识别的图标格式 —— 即便你声明了 <link rel="icon" href="favicon.svg">,浏览器仍会先 GET /favicon.ico 再走 link 标签。

ICO's core, three pieces. ① Directory + N image entries — a 6-byte ICONDIR header (reserved + type=1 for ICO / 2 for CUR + image_count) + N 16-byte ICONDIRENTRYs (each describing width / height / colour count / planes / bit count / size / offset) + N image-data blobs piled at the end by offset. The "directory + payload" idiom is pure 1980s Microsoft (PE / OLE house style). ② Early entries embed BMP, Vista+ embeds PNG — from 1985 to 2007 every ICO entry was a BMP (BITMAPFILEHEADER stripped; just BITMAPINFOHEADER + pixels + AND mask), with a 32×32 32-bit alpha icon starting around 4 KB. Vista (2007) added PNG embedding (magic-byte sniff: 89 50 4E 47 means PNG, otherwise BMP), finally making 256×256 icons practical — a 256×256 32-bit BMP is 256 KB, PNG-compressed usually 20-50 KB. That was ICO's one and only major evolution. ③ CUR is ICO with hotspot fields — the two "reserved" bytes in ICONDIRENTRY are reinterpreted as hotspot_x / hotspot_y in CUR (the cursor's "actual click point," e.g. an arrow's tip lives at pixel (0, 0)). That's the whole cursor difference. Today favicon.ico is still the highest-priority icon for browsers — even with <link rel="icon" href="favicon.svg"> declared, browsers still GET /favicon.ico first, then check the link tags.
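The directory layout in ① is small enough to parse by hand. A minimal Python sketch, assuming the structure described above (the `parse_ico` helper and the synthetic two-entry file are illustrative, not from any library):

```python
import struct

def parse_ico(data: bytes):
    """Read the 6-byte ICONDIR, then one 16-byte ICONDIRENTRY per image.

    Payloads are sniffed by magic bytes: 89 50 4E 47 means an embedded
    PNG (Vista+); anything else is treated as a headerless BMP.
    """
    reserved, img_type, count = struct.unpack_from("<HHH", data, 0)
    assert reserved == 0 and img_type in (1, 2)   # 1 = ICO, 2 = CUR
    entries = []
    for i in range(count):
        w, h, _colors, _rsv, _planes, _bpp, size, offset = \
            struct.unpack_from("<BBBBHHII", data, 6 + 16 * i)
        payload = data[offset:offset + size]
        entries.append({
            "width": w or 256,            # 0 encodes 256 in ICO
            "height": h or 256,
            "is_png": payload[:4] == b"\x89PNG",
        })
    return img_type, entries

# Synthetic two-entry file: a stub BMP entry plus a stub PNG entry.
def _entry(w, h, size, offset):
    return struct.pack("<BBBBHHII", w, h, 0, 0, 1, 32, size, offset)

bmp_stub = b"\x28" + b"\x00" * 39                 # fake BITMAPINFOHEADER
png_stub = b"\x89PNG\r\n\x1a\n" + b"\x00" * 8     # PNG signature + padding
ico = (struct.pack("<HHH", 0, 1, 2)               # header: type=1, count=2
       + _entry(16, 16, len(bmp_stub), 38)        # payloads start at 6 + 2×16
       + _entry(0, 0, len(png_stub), 38 + len(bmp_stub))
       + bmp_stub + png_stub)
```

Note how `width == 0` means 256 — the one-byte size fields are exactly why 256×256 is the ceiling for a single ICO entry.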

适用

USE FOR

  • Web favicon(浏览器最高优先级)
  • Windows 桌面 / 文件管理器 / 任务栏图标
  • 需要多分辨率打包的应用图标
  • CUR · 自定义鼠标指针(游戏 / 老 Windows 主题)
  • Web favicon (highest browser priority)
  • Windows desktop / Explorer / taskbar icons
  • Application icons that need multi-res packaging
  • CUR · custom mouse cursors (games / old Windows themes)

反适用

AVOID

  • 任何非 favicon / 非桌面图标场景
  • 需要矢量缩放的图标(用 favicon.svg)
  • 跨平台(macOS 用 .icns · Linux 用 PNG)
  • 动画图标(用 SVG / GIF / animated PNG)
  • Anything that isn't a favicon or desktop icon
  • Vector-scalable icons (use favicon.svg)
  • Cross-platform (macOS uses .icns; Linux uses PNG)
  • Animated icons (use SVG / GIF / animated PNG)
scope: ICO / CUR (Microsoft)
browsers / OS: ✓✓✓ every browser (since IE 5) · Windows native · macOS / Linux can read it too
editors: Photoshop (plugin) · GIMP · IcoFX · Greenfish Icon Editor
CLI: convert in.png -resize 256 out.ico (ImageMagick) · icotool · png2ico
predecessor: BMP (early entries embed BMP)
origin: Microsoft Windows 1.0 (1985) · IE 5 favicon (1999)
extension: Vista (2007) embedded PNG · 256×256 support at last
modern alternatives: favicon.svg / apple-touch-icon.png · yet favicon.ico still has top priority

NetPBM (PPM / PGM / PBM) — 教科书最爱的 ASCII 三件套

NetPBM (PPM / PGM / PBM) — the textbook ASCII trio

YEAR 1988 AUTHOR Jef Poskanzer EXT .ppm · .pgm · .pbm · .pnm MIME image/x-portable-pixmap LOSSY uncompressed (plain text / raw bytes) DEPTH 1-bit (PBM) · grayscale (PGM) · RGB (PPM) STATUS academic / toolchain bridge / living fossil of Unix philosophy

"你用文本编辑器就能写一张图。"

"You can write an image in a text editor."

1988 年 Jef Poskanzer 写 NetPBM 工具集时,需要一个"最简单的图像格式" —— 简单到 vim 能直接编辑、cat 能看出大致内容、Unix 管道能 convert in.png pgm:- | sharpen | convert pgm:- out.png 串联。三件套从极简递增:PBM(Portable Bitmap,1-bit 黑白)、PGM(Portable Graymap,灰度)、PPM(Portable Pixmap,RGB)。每个有两套编码:ASCII 模式(P1 / P2 / P3 magic)用空格分隔的十进制数字写像素值;binary 模式(P4 / P5 / P6 magic)用 raw 字节。ASCII 模式可以 vim 直接编辑像素 —— 这就是计算机视觉教学最常见的"hello world":学生第一次自己 fwrite 出一张图,几乎都是 PPM。NetPBM 工具集本身有 200+ 个小命令(pamflip / pamcat / pnmtopng / pamscale / ...),每个只做一件事 —— 这是 80s Unix 哲学的活化石,跟 grep / sed / awk 是同一种生物。今天 NetPBM 在生产几乎不用,但学术研究、算法测试、工具链中转格式至今仍用 —— 因为它太简单,任何人 1 小时能写完整的 PPM reader / writer。

In 1988, while writing the NetPBM toolkit, Jef Poskanzer needed "the simplest possible image format" — simple enough that vim could edit it directly, cat could roughly read it, and Unix pipes could chain it as convert in.png pgm:- | sharpen | convert pgm:- out.png. The trio steps up in capability: PBM (Portable Bitmap, 1-bit black & white), PGM (Portable Graymap, grayscale), PPM (Portable Pixmap, RGB). Each has two encodings: ASCII mode (magic P1 / P2 / P3) writing pixel values as space-separated decimal numbers; binary mode (P4 / P5 / P6) using raw bytes. The ASCII mode is editable in vim — which is why PPM is the canonical "hello world" of computer-vision teaching: a student's first fwrite-an-image is almost always a PPM. The NetPBM toolkit itself ships 200+ small commands (pamflip / pamcat / pnmtopng / pamscale / ...), each doing exactly one thing — a living fossil of 1980s Unix philosophy, kin to grep / sed / awk. Today NetPBM is almost never used in production, but academic work, algorithm tests, and toolchain bridges still rely on it — because it is so simple that anyone can write a complete PPM reader / writer in an hour.

PPM (P3) · ASCII · 4×4 RGB P3 4 4 255 255 0 0 255 0 0 0 255 0 0 255 0 0 0 255 0 0 255 255 255 0 255 255 0 ... ↑ vim 可直接编辑像素值 渲染结果
图 43 · PPM(P3 ASCII)文件示意。一份完整文件四行起步:① magic P3(P1=PBM ASCII / P2=PGM / P3=PPM,P4 / 5 / 6 是对应 binary);② 4 4 宽高;③ 255 maxval(色深上限,8-bit 就是 255);④ 后面是 4×4=16 个像素的 RGB 三元组。整个文件可用 vim 编辑、cat 阅读。binary 模式(P6)只把第 ④ 段换成 raw 字节,前三行仍是 ASCII —— 所以 PPM 文件 head -3 永远是文本头,parser 一行解析一个字段即可。
Fig 43 · PPM (P3 ASCII) file content. A complete file starts with four lines: ① magic P3 (P1 = PBM ASCII / P2 = PGM / P3 = PPM; P4 / 5 / 6 are the binary counterparts); ② 4 4 width and height; ③ 255 maxval (colour depth ceiling — 255 for 8-bit); ④ then 16 RGB triples for the 4×4 image. Edit it in vim, read it with cat. Binary mode (P6) only replaces section ④ with raw bytes — the first three lines stay ASCII — so head -3 on any PPM is always a text header, and a parser can lex one field per line.

技术内核

Technical core

NetPBM 三件套的内核三件事。① 三档色深 = 三个格式:PBM(1-bit 黑白,每像素 0 / 1)、PGM(灰度,每像素一个 0..maxval 的整数)、PPM(RGB,每像素三个 0..maxval 的整数)。再加一个伞名 PNM(Portable Anymap)代表"上面三个之一",NetPBM 工具命令 pnmtopng 表示"任何 PNM 都能转 PNG"。② ASCII 模式 + binary 模式两套 magic:P1(PBM ASCII)/ P2(PGM ASCII)/ P3(PPM ASCII)的像素值用空格 / 换行分隔的十进制数字写;P4 / P5 / P6 是对应的 binary 模式,像素是 raw 字节(PBM packed bits,PGM / PPM 是 1 byte 或 2 byte per channel 取决于 maxval)。ASCII 体积大但能 vim 编辑、binary 紧凑但仍头部 ASCII。③ 头部 = magic + 宽 + 高 + maxval(PBM 无 maxval)+ 像素值,字段之间用任意空白(空格 / 制表 / 换行)分隔,允许 # 开头的注释行。整个 spec 一页能写完。NetPBM 工具集设计哲学:200+ 个小命令(pamflip 翻转 / pamcat 拼接 / pnmtopng 转 PNG / pamscale 缩放 / pamcomp 合成 / ...),每个 source 几百行 C,只做一件事,可 Unix 管道串联 —— cat input.ppm | pamflip -lr | pnmtopng > output.png 是合法工作流。这套哲学今天活在 ImageMagick / FFmpeg 里,但 NetPBM 是更纯的"原版"。

NetPBM trio's core, three pieces. ① Three depths = three formats: PBM (1-bit B&W, pixel is 0 / 1), PGM (grayscale, pixel is one 0..maxval integer), PPM (RGB, pixel is three 0..maxval integers). Plus an umbrella name PNM (Portable Anymap) meaning "any of the three"; NetPBM's pnmtopng command means "any PNM convertible to PNG." ② ASCII + binary modes, two magics each: P1 (PBM ASCII) / P2 (PGM ASCII) / P3 (PPM ASCII) write pixel values as decimal numbers separated by whitespace; P4 / P5 / P6 are the binary counterparts (PBM packed bits, PGM / PPM 1 byte or 2 bytes per channel depending on maxval). ASCII is bulky but vim-editable; binary is compact but still has an ASCII header. ③ Header = magic + width + height + maxval (PBM has no maxval) + pixel values, fields separated by any whitespace, with # comments allowed. The whole spec fits on one page. NetPBM's toolkit philosophy: 200+ small commands (pamflip, pamcat, pnmtopng, pamscale, pamcomp, ...), each a few hundred lines of C, each doing one thing, all pipeable — cat input.ppm | pamflip -lr | pnmtopng > output.png is a legitimate workflow. That spirit lives on inside ImageMagick / FFmpeg today, but NetPBM is the purer original.
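The one-page spec really does translate directly into code. A sketch of a P3 writer and reader, assuming only the rules stated above (magic, whitespace-separated fields, `#` comments); the function names are my own:

```python
def write_ppm_p3(path, width, height, pixels, maxval=255):
    """Write an ASCII PPM: magic, dimensions, maxval, then RGB triples."""
    with open(path, "w") as f:
        f.write(f"P3\n{width} {height}\n{maxval}\n")
        for r, g, b in pixels:
            f.write(f"{r} {g} {b}\n")

def read_ppm_p3(path):
    """Read it back: tokenize on whitespace, dropping # comments."""
    tokens = []
    with open(path) as f:
        for line in f:
            tokens += line.split("#", 1)[0].split()
    assert tokens[0] == "P3"
    w, h, maxval = map(int, tokens[1:4])
    vals = [int(t) for t in tokens[4:4 + 3 * w * h]]
    return w, h, [tuple(vals[i:i + 3]) for i in range(0, len(vals), 3)]

# 2×2 test card: red, green, blue, white — vim-editable once written.
card = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 255)]
write_ppm_p3("card.ppm", 2, 2, card)
```

This is the whole "hello world" loop: a student's first image writer is these fifteen lines, and the reader is barely longer.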

适用

USE FOR

  • 学术研究 / 计算机视觉教学(算法 hello world)
  • 算法测试(写 50 行 C / Python 就能 fwrite 出可视化)
  • 工具链中间格式(很多 Unix 命令直接读写 PPM)
  • 需要"vim 能改"的极端调试场景
  • Academic research / CV teaching (the algorithm "hello world")
  • Algorithm testing (50 lines of C / Python fwrites a visualisation)
  • Toolchain bridges (many Unix tools read / write PPM natively)
  • Extreme debugging where you need vim-editable pixels

反适用

AVOID

  • 任何生产 / web 场景(无压缩 · 体积比 BMP 还大)
  • 需要 alpha 通道(PPM 本身无 alpha · PAM 才有)
  • 需要色彩管理 / EXIF / 嵌入 ICC 的场景
  • 跟现代 web / GPU 工具链对接(用 PNG)
  • Any production / web context (no compression — bigger than BMP)
  • Anything needing alpha (PPM has none — only PAM does)
  • Workflows needing colour management / EXIF / embedded ICC
  • Modern web / GPU toolchains (use PNG)
scope: NetPBM (PPM / PGM / PBM / PNM)
readers: GIMP · ImageMagick · ffmpeg · Python (PIL / OpenCV) · nearly every Unix image tool
editors: any text editor (ASCII mode) · GIMP · ImageMagick
CLI: the NetPBM 200+ command suite: pamcat · pnmtopng · pnmtotiff · pamflip · pamscale
predecessor / philosophy: 1980s Unix minimalism (same generation as grep / sed / awk)
origin: Jef Poskanzer · 1988 · the NetPBM toolkit
extension: PAM (Portable Arbitrary Map · adds alpha + multi-channel) · PFM (floating-point HDR cousin)
modern descendants: Farbfeld (2014 minimalist heir) · QOI (2021 · the same "ultra-simple" philosophy)

XBM / XPM — X Window 的 ASCII 图

XBM / XPM — X Window's ASCII images

YEAR 1985 (XBM) · 1989 (XPM) AUTHOR X Consortium · Bull Research EXT .xbm · .xpm FORMAT effectively C source code DEPTH 1-bit (XBM) · multi-colour + palette (XPM) STATUS legacy · icons of old X Window apps

"图片就是 C 数组,#include 进 X 程序就能用。"

"The image is a C array — #include it into your X program."

1985 年 X Window System 在 MIT Athena 项目里诞生时,所有 X 应用都是 C 写的;开发者需要把图标(光标 / 应用 logo / 工具栏 button)直接嵌入程序 —— 当时没有"资源文件"的标准做法(Windows 的 .rc 资源系统 1985 年也才刚出来)。X Consortium 想出一个绝妙的偷懒解法:既然程序是 C,那图标就写成C 字节数组,编译时 #include "icon.xbm" 直接进 binary。XBM(X Bitmap)是 1-bit 黑白:static char name_bits[] = { 0xff, 0x80, 0x40, ... };,每个 byte 8 个像素。1989 年法国 Bull 公司扩展成 XPM(X PixMap)加调色板:文件顶部声明每个 ASCII 字符对应一种颜色(" c None" 透明,". c #ffffff" 白,"+ c #000000" 黑),下面是字符矩阵图,每个像素用一个或多个 ASCII 字符表示 —— 整个 .xpm 文件本身就是合法 C 字符串数组。Web 早期 Mozilla / Netscape 也支持过 XBM / XPM(因为 Unix 上的浏览器开发者太熟了),但 1990 年代后 PNG / GIF 成为主流,XBM / XPM 退到 X 老应用领域。今天 GIMP 安装目录里仍有 .xpm 图标,fvwm / twm 老 X 主题也仍用 XPM —— 这是 80 年代 Unix-C 共生关系的活化石。

When X Window System was born at MIT's Project Athena in 1985, every X app was written in C, and developers needed to embed icons (cursors / app logos / toolbar buttons) directly inside binaries — at the time there was no standard "resource file" idiom (Windows' .rc system also only emerged in 1985). The X Consortium's clever lazy fix: since the program is C, write the icon as a C byte array, then #include "icon.xbm" at compile time. XBM (X Bitmap) was 1-bit B&W: static char name_bits[] = { 0xff, 0x80, 0x40, ... };, eight pixels per byte. In 1989 France's Bull Research extended it to XPM (X PixMap) with palettes: the file header declares one ASCII character per colour (" c None" transparent, ". c #ffffff" white, "+ c #000000" black), followed by a character matrix where each pixel is one or more ASCII characters — the whole .xpm file is itself a valid C string array. Early web Mozilla / Netscape supported XBM / XPM (Unix-side browser devs knew them intimately), but after the 1990s PNG / GIF took over and XBM / XPM retreated to legacy X apps. Today GIMP's install directory still ships .xpm icons; fvwm / twm legacy X themes still use XPM — a living fossil of the 1980s Unix-C symbiosis.

XPM · 一个 6×6 笑脸图标 /* XPM */ static char *smile[] = { "6 6 3 1", /* w h ncol cpp */ " c None", /* space = transp */ ". c #c89a3a", /* . = yellow */ "+ c #15171c", /* + = black */ " .... ", " .+..+. ", " ...... ", " .+..+. ", " .... "}; ↑ 整文件 = C string 数组 #include "smile.xpm" 直接用
图 44 · XPM 文件内容示例 — 一个 6×6 像素的笑脸图标。文件分两段:① 顶部 colormap(" " = 透明,"." = 黄,"+" = 黑),声明每个 ASCII 字符对应的颜色;② 字符矩阵图,每行一个字符串字面量,每个字符就是一个像素。整文件是合法的 C string 数组,#include "smile.xpm" 直接编译进 binary。这就是为什么早期 X 程序能"内嵌图标" —— 不需要加载器,编译时就嵌进去了。
Fig 44 · XPM file content — a 6×6 smiley icon. The file is two parts: ① a top colormap (" " = transparent, "." = yellow, "+" = black) declaring each ASCII character's colour; ② a character matrix — one string literal per row, each character is one pixel. The whole file is a valid C string array; #include "smile.xpm" compiles it straight into the binary. That's how early X programs "embedded icons" — no loader needed; the icon enters the binary at compile time.

技术内核

Technical core

XBM / XPM 内核两件事。① XBM = 1-bit 黑白 + C 字节数组 —— 文件就是 #define name_width 16 / #define name_height 16 + static char name_bits[] = { 0xff, 0x80, ... };;每个 byte 装 8 个像素(LSB-first,跟 X server 的 bitmap 图像对齐),解析 = C 编译器读取 = 0 解码代价。这种"图片即源码"的设计,只有在开发者就是用户(X Window 程序员)的语境下才合理 —— 没有非开发者会写 XBM。② XPM = 多色 + 调色板 + 字符矩阵 —— 1989 年 Bull 公司扩展:头部声明 "width height ncolors chars_per_pixel",然后是 ncolors 行 colormap("X c #rrggbb""X c colorname"),最后是字符矩阵(每行一个字符串字面量)。chars_per_pixel 通常是 1,但调色板超过 ASCII 印刷字符数(~94)时可以用 2 个字符代表一个像素(支持上千色)。整文件仍是 C string 数组语法,所以可 #include 进程序也可以从磁盘 fopen 解析(libXpm 提供解析器 —— 但其实就是个 C 源码 lexer)。这种"格式即源码"的设计后来还有几个变种:Apple PICT 早期可序列化为 PostScript,Lisp Machine 直接把图片存成 sexp,但只有 XBM / XPM 真正广泛使用过。今天 XBM / XPM 几乎被 PNG / SVG 完全替代,但你打开 GIMP 安装目录(/usr/share/gimp/2.10/)里的 themes / icons,仍能看到大量 .xpm —— 老 X 应用的代码资产惯性。

XBM / XPM's core, two pieces. ① XBM = 1-bit B&W + C byte array — the file is #define name_width 16 / #define name_height 16 + static char name_bits[] = { 0xff, 0x80, ... };; each byte holds 8 pixels (LSB-first to align with X server bitmaps), and "parsing" means letting the C compiler read it — zero decoding cost. The "image is source code" design only makes sense when developers are the users (X programmers). Non-developers don't author XBMs. ② XPM = multi-colour + palette + character matrix — Bull's 1989 extension declares "width height ncolors chars_per_pixel" at the top, then ncolors colormap lines ("X c #rrggbb" or "X c colorname"), then a character matrix (one string literal per row). chars_per_pixel is usually 1, but if the palette exceeds the ~94 printable ASCII range you can use 2 chars per pixel (supporting thousands of colours). The file remains valid C string-array syntax, so it's both #include-able into a program and fopen-parseable from disk (libXpm provides a parser — really just a C-source lexer). Several variants attempted similar tricks later (Apple PICT serialised to PostScript; Lisp Machines stored images as sexps), but XBM / XPM are the only ones that saw real adoption. They are essentially obsolete today, but the GIMP install directory (/usr/share/gimp/2.10/) still ships piles of .xpm icons in themes / icons — pure code-asset inertia from legacy X apps.
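Because an XPM is just string literals, "decoding" one is a dozen lines. A sketch for the common chars_per_pixel = 1 case, assuming the layout described above (the helper is mine, and real files may carry extra colormap variants that this ignores; the smiley data follows Fig 44's idea, squared up to a true 6×6 matrix):

```python
def parse_xpm(rows):
    """Decode a minimal XPM: a values line, ncolors colormap lines
    (char + ' c ' + colour), then h rows of w characters each."""
    w, h, ncol, cpp = map(int, rows[0].split())
    assert cpp == 1                     # one character per pixel
    cmap = {}
    for line in rows[1:1 + ncol]:
        cmap[line[0]] = line.split(" c ", 1)[1].strip()
    grid = [[cmap[ch] for ch in row] for row in rows[1 + ncol:1 + ncol + h]]
    return w, h, grid

# In a real .xpm these strings sit inside `static char *smile[] = {...};`
# so the same data is both #include-able C and parseable text.
smile = [
    "6 6 3 1",
    "  c None",
    ". c #c89a3a",
    "+ c #15171c",
    " .... ",
    ".+..+.",
    "......",
    ".+..+.",
    ". .. .",
    " .... ",
]
```

Stripping the C scaffolding (`/* XPM */`, the array declaration, quotes and commas) is all that separates this from a full libXpm-style loader — which is why the text calls libXpm "really just a C-source lexer".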

适用

USE FOR

  • (历史)X Window 系应用图标
  • (历史)fvwm / twm / IceWM 等老 X 主题
  • (历史)GIMP / xterm / xfig 内嵌图标资产
  • 极简调试场景(直接 cat 看图,因为是 ASCII)
  • (legacy) icons in X Window applications
  • (legacy) fvwm / twm / IceWM old X themes
  • (legacy) embedded icons in GIMP / xterm / xfig
  • Extreme debugging — cat the file and "see" the image (it's ASCII)

反适用

AVOID

  • 任何现代场景(用 PNG / SVG)
  • 非 X 平台(Windows / macOS 几乎不用)
  • 高色深 / 大尺寸(XPM 字符矩阵在 256×256 24-bit 下文件巨大)
  • 需要压缩 / alpha 通道 / 色彩管理
  • Anything modern (use PNG / SVG)
  • Non-X platforms (almost unused on Windows / macOS)
  • High-depth / large images (XPM's char matrix bloats at 256×256 24-bit)
  • Anything needing compression / alpha / colour management
scope: XBM / XPM (X Consortium)
readers: ImageMagick · GIMP · libXpm · legacy X11 apps · early Mozilla / Netscape
editors: any text editor (it is C source, after all) · GIMP export · pixmap-tools
CLI: xbmtopbm · pamtoxpm · convert in.png out.xpm (ImageMagick)
predecessor: X Consortium 1985 (MIT Project Athena) · the "image is C source" philosophy
origin / extension: XBM (1985 · 1-bit) → XPM (1989 Bull · multi-colour + palette)
contemporary, different ecosystem: ICO (Windows side) / XBM (Unix side)
replaced by: PNG / SVG · though legacy X apps such as GIMP / fvwm still ship .xpm assets

PCX — DOS 时代的痕迹

PCX — a fingerprint of the DOS era

YEAR 1985 AUTHOR ZSoft Corporation EXT .pcx · .dcx (multi-page) MIME image/x-pcx LOSSY lossless RLE DEPTH 1 / 4 / 8 / 24 bit STATUS legacy · image standard of the DOS / early BBS era

"DOS Paintbrush 的产物,有过 BBS 时代的辉煌。"

"From DOS Paintbrush — once king of the BBS era."

1985 年 ZSoft 公司推出 PC Paintbrush —— 这是 DOS 时代最流行的画图程序;Microsoft 1990 年代收购 ZSoft 部分技术后,把 PC Paintbrush 改名 Windows Paintbrush 内置进 Windows 3.0(后来又改名 Paint)。配套的 PCX 格式有两个核心特性:简单 RLE 压缩(让 PCX 比 BMP 小一半)+ 可选 256-color 调色板放在文件末尾(这是个怪设计,也是 PCX 最大的工程特色)。1980 年代后期 BBS 时代是 PCX 的高光时刻 —— 那个年代上传 / 下载图片靠 9600 / 14400 baud 拨号调制解调器(几 KB/s),体积每减一半,下载时间就少一半。BBS 上传图片实际标准就是 PCX,跟 ZIP 套着发是常见组合。1990 年代初 GIF(1987)凭 LZW 压缩(更高压缩比 + 调色板更优)+ AOL / CompuServe 的推广迅速取代 PCX,JPEG(1992)再砍掉照片场景,PCX 跌出舞台。今天 PCX 几乎只在游戏考古(老 DOS 游戏的图形资产)和 fax 系统(早期 fax 软件用 PCX 做缓存)里出现;但 ImageMagick / GIMP 仍能读 .pcx 文件,这是格式生态学里"读得了但没人写"的典型案例。

In 1985 ZSoft Corporation launched PC Paintbrush — the most popular drawing program of the DOS era. Microsoft acquired some of ZSoft's tech in the early 1990s, renamed PC Paintbrush as Windows Paintbrush, and bundled it into Windows 3.0 (later renamed Paint). The accompanying PCX format had two core traits: simple RLE compression (making PCX about half the size of BMP) plus an optional 256-colour palette placed at the end of the file (a quirky design, and PCX's signature engineering trait). The late-1980s BBS era was PCX's golden age — that period's image transfers ran over 9600 / 14400 baud modems (a few KB/s), where halving file size meant halving download time. PCX was effectively the BBS image standard, often bundled inside ZIP archives. In the early 1990s GIF (1987) overtook PCX through LZW compression (better ratio + better palette handling) and AOL / CompuServe distribution; JPEG (1992) then claimed the photo niche, and PCX fell off the stage. Today PCX really only shows up in game archaeology (DOS-era game graphics) and fax systems (early fax software used PCX as an internal cache); ImageMagick / GIMP still read .pcx, a textbook case of "readable but no one writes" in format ecology.

PCX · FILE STRUCTURE · 末尾调色板 HEADER RLE pixel data PALETTE 128 byte 变长 · 1-bit RLE 256 × 3 byte manuf / ver / w / h 高 2 位 = 11 表示 RLE 末尾! byte 0 EOF ↑ DOS 早期生成时 还不知道用了哪些颜色
图 45 · PCX 文件结构 — 横向三段。① 128 byte 文件头(manufacturer=10 / version / encoding / bits-per-pixel / window 边界 / dpi / 16-color EGA 调色板 / planes 数 / bytes-per-line);② RLE 像素数据(每段 = 1 byte run header + 1 byte 像素值;run header 高 2 位为 11 时,低 6 位是 run length 1-63);③ 256-color VGA 调色板放在文件末尾(769 byte:1 byte signature 0x0C + 256 × 3 byte RGB)。这个"调色板放末尾"的怪设计源于 DOS 早期生成图像时还不知道用了哪些颜色,只能边写像素边累计调色板,生成完才能写出来 —— 顺序写文件 + 不能 seek 回头改的硬约束。
Fig 45 · PCX file structure — three horizontal chunks. ① A 128-byte header (manufacturer = 10 / version / encoding / bits-per-pixel / window bounds / dpi / 16-colour EGA palette / planes / bytes-per-line); ② RLE pixel data (each run = 1 byte header + 1 byte pixel value; if the header's top two bits are 11, the lower six bits are the run length 1-63); ③ the 256-colour VGA palette lives at the file end (769 bytes: 1-byte signature 0x0C + 256 × 3 RGB bytes). The "palette at the end" oddity comes from a DOS-era constraint: when generating the image, the encoder didn't yet know which colours would be used — it had to accumulate the palette while streaming pixels, and only emit it once finished. Sequential write + no-seek constraint of early DOS file I/O.

技术内核

Technical core

PCX 内核两件事。① 简单 RLE 压缩 —— 像素数据每段读 1 byte:如果高 2 位是 11,低 6 位(1-63)就是 run length,下一 byte 是要重复的像素值;否则这 byte 本身就是单像素。这是极简 RLE,压缩率不高(漫画 / 大色块图能砍 50%,真彩照片几乎没效果),但解码代价低,DOS 上 8086 CPU 也能实时解。规范一句话写完。② 头部 128 byte + 像素数据 + 可选 256-color 调色板放在文件末尾 —— 这是 PCX 最大的工程特色,也是它的根源。原因:DOS 早期生成图像时是顺序写文件的(8086 上文件 seek 慢且不可靠),编码器边读屏幕像素边写 RLE,但还不知道会用到哪些颜色 —— 索引扩展时(偶尔出现新颜色)只能累计调色板,直到写完所有像素才能在文件末尾追加 769 byte 调色板(1 byte signature 0x0C 标记 + 256 × 3 byte RGB)。这是"流式编码 + 只能追加"硬约束下的产物,跟今天的 PNG / WebP 必须先决定调色板是完全相反的工程权衡。1980 年代 BBS 时代下载 PCX 的人其实经常解码到一半就显示出像素 —— 但调色板还没下载完,所以图像颜色是错的,直到下载完整个文件(包括末尾调色板),才能用正确颜色重绘。这就是 BBS 时代特有的"图像渐进显示但颜色慢慢校准"体验。今天的 progressive JPEG 是有意为之,PCX 的渐进显示其实是意外的副作用

PCX's core, two pieces. ① Simple RLE — read 1 byte from the pixel stream: if its top two bits are 11, the lower six bits (1-63) give the run length and the next byte is the pixel value to repeat; otherwise the byte is itself a single pixel. Minimalist RLE — modest ratios (50 % for cartoons / flat-colour images, near-nothing for photos), but cheap to decode (real-time on a DOS 8086 CPU). The whole spec fits in one sentence. ② 128-byte header + pixel data + optional 256-colour palette at the end of the file — PCX's signature engineering trait, and the source of all its quirkiness. Reason: in early DOS, file generation was sequential (8086 file seek was slow and unreliable). The encoder streamed screen pixels into RLE while writing the file, yet didn't yet know the full set of colours used — palette accumulation could pick up a new colour at any time, and the palette could only be appended (769 bytes: 1-byte signature 0x0C + 256 × 3 bytes RGB) once all pixels had been written. It's the product of a "streaming encode, append-only" hardware constraint — the opposite of today's PNG / WebP, which must commit a palette up front. BBS-era PCX downloaders frequently saw the image render mid-download — but with wrong colours, until the trailing palette finally arrived and a redraw fixed them. That's the BBS-specific experience: progressive image display with gradually-correcting colours. Modern progressive JPEG is intentional; PCX's progressive display was an accidental side effect.
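The one-sentence RLE in ① is easy to make concrete. A sketch of a decoder plus a matching encoder, assuming the rule as stated above (top two bits 11 → run of 1-63; function names are mine). Note the encoder must escape any literal byte ≥ 0xC0 as a run of one, or the decoder would misread it as a run marker:

```python
def pcx_rle_decode(data: bytes, expected: int) -> bytes:
    """Expand PCX RLE: a byte whose top two bits are 11 carries a run
    count in its low six bits, and the next byte is the value."""
    out = bytearray()
    i = 0
    while len(out) < expected:
        b = data[i]; i += 1
        if b & 0xC0 == 0xC0:            # run marker
            count = b & 0x3F            # run length 1-63
            out += bytes([data[i]]) * count
            i += 1
        else:                           # plain literal pixel
            out.append(b)
    return bytes(out)

def pcx_rle_encode(pixels: bytes) -> bytes:
    """Greedy encoder: emit runs up to 63; wrap literals >= 0xC0
    in a run of one (0xC1 V) so they can't be mistaken for markers."""
    out = bytearray()
    i = 0
    while i < len(pixels):
        v = pixels[i]; run = 1
        while i + run < len(pixels) and pixels[i + run] == v and run < 63:
            run += 1
        if run > 1 or v >= 0xC0:
            out += bytes([0xC0 | run, v])
        else:
            out.append(v)
        i += run
    return bytes(out)
```

The escape cost is also why PCX does so poorly on photos: noisy true-colour rows have few runs, and every high-valued literal costs two bytes instead of one.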

适用

USE FOR

  • (历史)DOS 时代图像 / PC Paintbrush 输出
  • (历史)BBS 上传 / 下载图片
  • (历史)早期 Windows 3.0 / 3.1 应用
  • 游戏考古 / 老 DOS 游戏图形资产解码
  • (legacy) DOS-era images / PC Paintbrush output
  • (legacy) BBS image upload / download
  • (legacy) early Windows 3.0 / 3.1 applications
  • Game archaeology — DOS-era graphics asset decoding

反适用

AVOID

  • 任何现代场景(用 PNG / WebP)
  • 真彩照片(RLE 几乎无压缩 · 用 JPEG)
  • 需要 alpha 通道(PCX 不支持透明)
  • 需要色彩管理 / 嵌入 ICC profile
  • Anything modern (use PNG / WebP)
  • True-colour photos (RLE buys ~nothing — use JPEG)
  • Anything needing alpha (PCX has no transparency)
  • Workflows needing colour management or embedded ICC
scope: PCX (ZSoft)
readers: ImageMagick · GIMP · IrfanView · XnView · old DOS software
editors: ~ GIMP export · ImageMagick · modern editors rarely write PCX natively
CLI: convert in.pcx out.png (ImageMagick) · pcxtoppm (NetPBM)
predecessor: ZSoft PC Paintbrush (1985 DOS) · tech later acquired by Microsoft → Windows Paintbrush
origin: ZSoft Corporation · 1985 · companion format of the DOS paint program
replaced by: GIF (late BBS era) · JPEG (photos) · BMP (built into Windows)
still alive in: old DOS game assets / early fax-system caches / ImageMagick can still read it

Sun Raster — 工作站老照片

Sun Raster — workstation snapshots

YEAR 1988 AUTHOR Sun Microsystems EXT .ras · .rast · .rs DEPTH 1 / 8 / 24 / 32 bit STATUS dead · museums only

"SunOS 屏幕截图的格式 —— 现在只在博物馆里。"

"SunOS screenshot format — found only in museums."

1988 年 Sun Microsystems 推出 SPARCstation —— 那个年代 Unix 工作站的代名词,"a network is the computer" 的实体。SunOS 桌面环境(OpenWindows,基于 NeWS / X11 混合)需要一个标准截图格式,Sun 顺手定义了 Sun Raster:32-byte header(magic / width / height / depth / length / type / colormap type / colormap length)+ 可选 colormap + raw 或 byte-RLE 像素流。极简到一页规格能写完。当时 X11 + xv(著名的图像查看器)是 Sun 工作站圈子的实际标准 —— 写论文 / 投幻灯片 / 文档配图,大家都用 .ras。但走出 Sun 生态就没人认了:PC 那边是 BMP / GIF / PCX,Mac 那边是 PICT / TIFF,Sun Raster 是 Sun 圈内独有方言。1990 年代后期 SGI / HP / IBM 各家工作站逐渐被 Linux / PC 替代,Sun 自己 2009 年被 Oracle 收购,Sun Raster 就跟 SunOS 一起进入历史。今天 ImageMagick 仍能读 .ras,但写它的人几乎绝迹 —— 跟 PCX 一样的"读得了但没人写"格式 fossil。

In 1988 Sun Microsystems shipped the SPARCstation — the very face of Unix workstations in that era, the physical embodiment of "the network is the computer". SunOS's desktop environment (OpenWindows, a NeWS / X11 hybrid) needed a standard screenshot format, and Sun casually defined Sun Raster: a 32-byte header (magic / width / height / depth / length / type / colormap type / colormap length), an optional colormap, and a raw-or-byte-RLE pixel stream. Minimal enough to fit on a single page of spec. At the time X11 + xv (the legendary image viewer) was the workstation circle's de-facto stack — papers, slides, documentation figures, all shipped as .ras. But step outside the Sun ecosystem and no one knew the format: PC land had BMP / GIF / PCX, Mac land had PICT / TIFF, Sun Raster was a Sun-only dialect. In the late 1990s SGI / HP / IBM workstations gave way to Linux / PCs; Sun itself was acquired by Oracle in 2009, and Sun Raster slipped into history alongside SunOS. ImageMagick still reads .ras today, but writers have all but vanished — another "readable but no one writes" fossil, just like PCX.

SUN RASTER · FILE LAYOUT MAGIC HEADER COLORMAP PIXELS 4 byte 28 byte 可选 raw / byte-RLE 0x59A66A95 w/h/depth/... 8-bit indexed RGB / BGR / 索引 'Y' + 三个非 ASCII byte ↑ 设计者刻意"坏掉"
图 46 · Sun Raster 文件布局 — 横向四段:① 4-byte magic 0x59A66A95;② 28-byte 头部(width / height / depth / length / type / colormap type / colormap length);③ 可选 colormap(8-bit 索引模式);④ 像素流 raw 或 byte-RLE。整个 spec 一页能写完,跟同时代 PCX(128-byte 头 + 末尾调色板)的工程取舍正好相反 —— Sun Raster 把头放在最前。
Fig 46 · Sun Raster file layout — four horizontal chunks: ① 4-byte magic 0x59A66A95; ② 28-byte header (width / height / depth / length / type / colormap type / colormap length); ③ optional colormap (8-bit indexed mode); ④ pixel stream, raw or byte-RLE. The whole spec fits on a single page — the opposite engineering trade-off from contemporary PCX (128-byte header + trailing palette); Sun Raster puts everything up front.

技术内核

Technical core

Sun Raster 内核就一件事:32-byte header + 可选 colormap + raw 或 byte-RLE 像素。头部 8 个 32-bit big-endian 字段:magic(0x59A66A95)、width、height、depth(1/8/24/32)、length(像素数据字节数)、type(0=old / 1=standard / 2=byte-encoded RLE / 3=RGB / 5=TIFF / 6=IFF)、colormap type(0=none / 1=RGB / 2=raw)、colormap length。type 字段决定是 raw 还是 RLE:byte-encoded RLE 简单粗暴,跟 PCX RLE 一脉相承 —— 看到 0x80 byte 就开始 run-length 编码(0x80 = escape · 0x80 0x00 = 单个 0x80 字面量 · 0x80 N V = 重复 N+1 次 V)。整个格式没有 chunk 系统、没有 metadata、没有 EXIF、没有 ICC profile、没有 alpha(depth=32 时第 4 个 byte 是保留位通常不渲染)。这就是 1988 年工作站语境的设计:系统截图工具需要的是简单 + 快 + 跟 X server 内存布局对齐,其它都是包袱。Sun Raster 跟同期(也已死)Silicon Graphics 的 SGI RGB(.rgb / .sgi)、HP 的 PCL Raster 是同一类东西 —— 工作站厂商各自定义的"系统级图片格式",随着工作站本身退场而消亡。

Sun Raster's core is one thing: a 32-byte header + optional colormap + raw or byte-RLE pixels. The header has eight 32-bit big-endian fields: magic (0x59A66A95), width, height, depth (1 / 8 / 24 / 32), length (pixel byte count), type (0 = old / 1 = standard / 2 = byte-encoded RLE / 3 = RGB / 5 = TIFF / 6 = IFF), colormap type (0 = none / 1 = RGB / 2 = raw), colormap length. The type field decides raw versus RLE: byte-encoded RLE is brute-simple and inherits straight from PCX RLE — when a 0x80 byte is seen, run-length kicks in (0x80 = escape · 0x80 0x00 = a literal 0x80 · 0x80 N V = N + 1 copies of V). The format has no chunk system, no metadata, no EXIF, no ICC profile, no alpha (depth = 32 reserves the 4th byte but typically doesn't render it). That's the 1988 workstation mindset: a system screenshot tool wants simple + fast + aligned with the X server's framebuffer; everything else is overhead. Sun Raster sits next to Silicon Graphics' SGI RGB (.rgb / .sgi) and HP's PCL Raster as a class of "system-level image formats" defined by individual workstation vendors — and they all died with the workstations themselves.
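The eight-field header unpacks in a single `struct` call, which is the whole point of a one-page spec. A sketch, assuming the field order given above (the synthetic header bytes are illustrative):

```python
import struct

RAS_MAGIC = 0x59A66A95

def parse_ras_header(data: bytes) -> dict:
    """Unpack the eight 32-bit big-endian fields of a Sun Raster header."""
    fields = struct.unpack(">8I", data[:32])
    names = ("magic", "width", "height", "depth",
             "length", "type", "maptype", "maplength")
    hdr = dict(zip(names, fields))
    assert hdr["magic"] == RAS_MAGIC
    return hdr

# A synthetic 4×2, 8-bit, raw (type=1), no-colormap header:
# length = 4 * 2 * 1 byte of pixel data.
hdr_bytes = struct.pack(">8I", RAS_MAGIC, 4, 2, 8, 8, 1, 0, 0)
```

Everything after these 32 bytes is either the optional colormap (maplength bytes) or the pixel stream; there is nowhere else for metadata to hide.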

适用

USE FOR

  • (已死)SunOS / OpenWindows 屏幕截图
  • (已死)1990s X11 + xv 工作站论文配图
  • 计算机历史考古 · 老 Sun 实验室资产解码
  • 博物馆数字化项目
  • (dead) SunOS / OpenWindows screenshots
  • (dead) 1990s X11 + xv workstation paper figures
  • Computing-history archaeology — decoding old Sun lab assets
  • Museum digitisation projects

反适用

AVOID

  • 任何现代场景(用 PNG / WebP)
  • 需要 alpha / metadata / 色彩管理
  • 非 Unix 工作站平台
  • 需要现代浏览器或编辑器原生支持
  • Anything modern (use PNG / WebP)
  • Anything needing alpha / metadata / colour management
  • Non-Unix-workstation platforms
  • Anything needing native browser or editor support
scope: Sun Raster (Sun Microsystems)
readers: ImageMagick (legacy) · NetPBM · old xv / xli · libsun-raster
editors: ~ ImageMagick can still write it · early GIMP versions read it · almost no modern writers
CLI: convert in.ras out.png (ImageMagick) · rasttopnm (NetPBM)
origin: Sun Microsystems 1988 · SPARCstation companion / SunOS OpenWindows screenshots
contemporaries of the same kind: SGI RGB (.rgb / .sgi) · HP PCL Raster · per-vendor workstation system formats
replaced by: PNG (entirely) · TIFF (scientific archiving)
still alive in: ImageMagick / NetPBM can still read it · virtually extinct in production

IFF / ILBM — Amiga 的传家宝

IFF / ILBM — Amiga's family heirloom

YEAR 1985 AUTHOR Electronic Arts · Jerry Morrison EXT .iff · .lbm · .ilbm STATUS Amiga retro community

"chunk 容器思想的祖宗 —— PNG 都受它影响。"

"The grandfather of chunk-based containers — even PNG owes it credit."

1985 年 Commodore 推出 Amiga 1000 —— 那台领先时代 5 年的多媒体个人电脑(custom 芯片组 Agnus / Denise / Paula 同时跑图形 / 音频 / DMA,自定义协处理器堆出 4096 色 HAM 模式 / 4 路立体声 8-bit PCM,1985 年的硬件配置直到 1990s 中期 PC 才追上)。EA(Electronic Arts)的工程师 Jerry Morrison 为 Amiga 设计了 IFF(Interchange File Format)—— 一个"通用的多媒体容器":每个 chunk 是 4-byte ASCII ID + 4-byte big-endian length + payload,FORM 是顶层 chunk(描述具体类型如 ILBM = Interleaved BitMap),内嵌 BMHD(图像头)、CMAP(调色板)、BODY(像素数据)等子 chunk。主流 image type 是 ILBM,按 bitplane 而非 packed pixels 存储 —— 6 张 1-bit bitmap 叠加表达 6-bit 色,正好对应 Amiga 显存的 bitplane 硬件布局。这套 chunk 容器思想后来直接影响了 PNG / WebP / RIFF / AIFF / ISOBMFF / MP4 —— 你今天用过的几乎所有 chunk-based 文件格式都欠 IFF 一个 credit。Amiga 1994 年破产被 Commodore 拖死,IFF 跟着退场到复古圈子,但它的设计 DNA仍活在你每天用的格式里。

In 1985 Commodore launched the Amiga 1000 — a multimedia PC five years ahead of its time (custom chipset Agnus / Denise / Paula running graphics / audio / DMA in parallel; coprocessors stacking up to 4096-colour HAM mode and four-channel 8-bit stereo PCM; 1985 hardware specs PCs only caught up to in the mid-1990s). EA's Jerry Morrison designed IFF (Interchange File Format) for Amiga — a "universal multimedia container": every chunk is a 4-byte ASCII ID + 4-byte big-endian length + payload; FORM is the top-level chunk (declaring the concrete type, e.g. ILBM = Interleaved BitMap), with sub-chunks like BMHD (image header), CMAP (palette), BODY (pixel data). The dominant image type is ILBM, stored as bitplanes rather than packed pixels — six 1-bit bitmaps stacked to represent 6-bit colour, matching Amiga's bitplane framebuffer layout exactly. This chunk-container idiom went on to shape PNG / WebP / RIFF / AIFF / ISOBMFF / MP4 — almost every chunk-based file format you touch today owes IFF a credit. Amiga went down with Commodore in 1994 and IFF retreated to the retro scene, but its design DNA still lives in formats you use every day.

IFF · CHUNK TREE FORM "ILBM" (top-level chunk · 4B id + 4B len + payload) BMHD CMAP BODY w / h / depth RGB palette pixels (planar bitplanes) ILBM = Interleaved BitMap 每行像素拆成 N 张 1-bit bitmap plane 0 — bit 0 plane 1 — bit 1 plane 2 — bit 2 plane 3 — bit 3 ↑ 对齐 Amiga 显存 bitplane 硬件
图 47 · IFF chunk 树 + ILBM bitplane 布局。左:FORM "ILBM" 顶层容器内嵌 BMHD(头)、CMAP(调色板)、BODY(像素)三个 sub-chunk —— 这就是 chunk-based 容器的范本。右:ILBM 把图像拆成 N 张 1-bit bitmap(plane 0 = 像素值的 bit 0,plane 1 = bit 1,...)按行交错存储,跟 Amiga 显存 bitplane 硬件直接对齐 —— 显示时 Denise 芯片可以同时从 N 张 bitmap 取位组成像素索引。
Fig 47 · IFF chunk tree + ILBM bitplane layout. Left: a FORM "ILBM" top-level container holding BMHD (header), CMAP (palette), BODY (pixels) sub-chunks — the very prototype of chunk-based containers. Right: ILBM splits the image into N 1-bit bitmaps (plane 0 = bit 0 of the pixel value, plane 1 = bit 1, ...), interleaved row by row, matching Amiga's bitplane framebuffer hardware so the Denise chip could read N bitmaps in parallel to assemble pixel indices.

技术内核

Technical core

IFF / ILBM 内核两件事。① chunk = 4-byte ASCII ID + 4-byte big-endian length + payload(payload 末尾 padding 到偶数对齐)。这套语法极其简洁:解码器读 8 byte 就知道这个 chunk 是什么、有多大、跳到哪;不认识的 chunk 直接 skip 不报错 —— 前向兼容靠这个机制实现。FORM / LIST / CAT 是几个特殊容器 chunk(payload 内嵌其它 chunk),其它的 BMHD / CMAP / BODY / GRAB / CRNG / CCRT / ANNO / AUTH 等都是 leaf chunk。整套机制后来被 PNG 抄了:PNG 的 IHDR / PLTE / IDAT / IEND chunk 系统(4-byte type + 4-byte length + data + 4-byte CRC)就是 IFF chunk 加了一个 CRC 校验字段。WebP 用 RIFF(IFF 的 little-endian 变种)更是直接继承。② ILBM 按 bitplane 而非 packed pixels 存储 —— 一行 320 像素 6-bit 颜色,packed pixels 存法是 320 × 6 bit / 8 = 240 byte,一行像素值连续;ILBM 存法是 6 张 320-bit(40-byte)bitmap 交错,每张 bitmap 上一个像素位置只放该像素值的某一位。这种"诡异"布局不是为了压缩,是为了跟 Amiga Denise 芯片的 bitplane DMA 硬件直接对齐 —— Denise 在每个像素时钟从 6 张 bitmap 同时取一位组成 6-bit 索引,然后查 CMAP 调色板得到 RGB。这是 1985 年定制硬件 + 文件格式协同设计的范例,跟 GIF 的 LZW(为通用 CPU 设计)完全不同的工程方向。今天 ILBM 只在 Amiga 模拟器(WinUAE / FS-UAE)和老游戏(Lemmings / Defender of the Crown / Shadow of the Beast)资产解码里用得到 —— 但 chunk-container 思想已经统治了一切。

IFF / ILBM's core, two pieces. ① chunk = 4-byte ASCII ID + 4-byte big-endian length + payload (payload padded to even-byte alignment). Brutally simple: read 8 bytes and the decoder knows what the chunk is, how big, and where to jump; unknown chunks are silently skipped — forward compatibility falls out of this mechanism. FORM / LIST / CAT are the special container chunks (their payload nests other chunks); everything else (BMHD / CMAP / BODY / GRAB / CRNG / CCRT / ANNO / AUTH ...) is a leaf chunk. PNG copied the idiom wholesale: PNG's IHDR / PLTE / IDAT / IEND system (4-byte length + 4-byte type + data + 4-byte CRC) is essentially IFF chunks plus a CRC field. WebP, built on RIFF (the little-endian variant of IFF), inherits even more directly. ② ILBM stores bitplanes rather than packed pixels — a 320-pixel row at 6-bit colour, packed-pixel storage is 320 × 6 bit / 8 = 240 bytes (pixel values laid out contiguously); ILBM stores it as six interleaved 320-bit (40-byte) bitmaps, each bitmap holding one bit of the pixel index. The "weird" layout isn't for compression — it's aligned with the Amiga Denise chip's bitplane DMA hardware: every pixel clock Denise reads one bit from each of the six bitmaps in parallel to assemble a 6-bit index, then a CMAP lookup gives RGB. A perfect 1985 example of custom hardware co-designed with the file format — the opposite engineering direction from GIF's LZW (designed for general-purpose CPUs). Today ILBM lives only in Amiga emulators (WinUAE / FS-UAE) and legacy game asset decoding (Lemmings / Defender of the Crown / Shadow of the Beast) — but the chunk-container idea has gone on to dominate everything.
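Both mechanisms fit in a few lines each — a minimal sketch (not libilbm: real BMHD parsing and BODY ByteRun1 decompression are omitted):

```python
# Sketch: walk IFF chunks, and convert one planar ILBM row to palette indices.
import struct

def iter_chunks(buf, offset=0):
    """Walk IFF chunks: 4-byte ASCII id + 4-byte big-endian length + payload,
    with the payload padded to an even byte boundary. Yields (id, payload)."""
    while offset + 8 <= len(buf):
        cid, length = struct.unpack_from(">4sI", buf, offset)
        yield cid, buf[offset + 8 : offset + 8 + length]
        offset += 8 + length + (length & 1)   # skip the pad byte on odd lengths

def planar_row_to_indices(planes, width):
    """Planar -> chunky: combine N 1-bit bitplane rows into palette indices.
    Plane k contributes bit k of each pixel, MSB-first within each byte."""
    out = []
    for x in range(width):
        byte, bit = x // 8, 7 - (x % 8)
        out.append(sum(((plane[byte] >> bit) & 1) << k
                       for k, plane in enumerate(planes)))
    return out
```

The same 8-byte preamble walk works unchanged on any IFF-family file, which is exactly the forward-compatibility property PNG and RIFF inherited.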

适用

USE FOR

  • (已死)Amiga 应用程序图像 / 游戏资产
  • Amiga 模拟器(WinUAE / FS-UAE)
  • 计算机历史 / 复古游戏考古
  • 研究 chunk-container 设计的"原型样本"
  • (dead) Amiga application images / game assets
  • Amiga emulators (WinUAE / FS-UAE)
  • Computing history / retro game archaeology
  • Studying chunk-container design from its prototype

反适用

AVOID

  • 任何现代场景(用 PNG / WebP)
  • 非 Amiga 平台原生显示
  • 需要现代浏览器或社交媒体支持
  • 大色深 / 真彩照片(bitplane 布局对 24-bit 不友好)
  • Anything modern (use PNG / WebP)
  • Native display outside Amiga platforms
  • Anything needing modern browser / social-media support
  • High-depth / true-colour photos (bitplane layout fights 24-bit)
scope: IFF / ILBM (EA · Jerry Morrison)
readers: ImageMagick · 部分 GIMP 版本 · WinUAE / FS-UAE · libilbm
editors (~): GIMP(部分版本)· DPaint(原生 Amiga)· ImageMagick 转换
CLI: iffinfo(legacy)· ilbmtoppm / ppmtoilbm(NetPBM)
起源:origin: Electronic Arts · Jerry Morrison · 1985 · 为 Amiga 1000 设计
设计 DNA 影响:DNA influence: PNG chunk 系统 · WebP RIFF 容器 · AIFF · ISOBMFF · 几乎所有 chunk-based 格式
同期同类:contemporaries of the same kind: Macintosh PICT · GIF(都是 1985-1987 年通用图像容器)
仍活在:still alive in: Amiga 模拟器 / 复古游戏资产 / NetPBM 工具链

QOI — 现代极简主义

QOI — modern minimalism

YEAR 2021 AUTHOR Dominic Szablewski (phoboslab) EXT .qoi MIME image/qoi STD spec 1 页 LOSSY 无损 DEPTH 8-bit/channel ALPHA ✓ STATUS 玩具 / 嵌入式 / 教学

"一个人,一个周末,写了一个比 PNG 简单 100 倍的格式。"

"One person, one weekend, made a format 100× simpler than PNG."

2021 年 Dominic Szablewski(phoboslab,知名 JS 游戏引擎 Impact / Q1K3 系列作者)做了一个反思:"为什么 PNG 这么复杂?LZ77 + Huffman + 5 种 filter + zlib 包装 + chunks 系统 + CRC32?能不能用一个周末写一个'够用'的无损图片格式?"答案是 QOI(Quite OK Image format):6 个简单 op(RGB / RGBA / INDEX / DIFF / LUMA / RUN),编码器和解码器各 ~300 行 C 代码,速度比 PNG 快 50× 编码 / 3-4× 解码,体积大 5-10%。spec 一页 PDF 印得下。Dominic 把项目发到 Hacker News 后排第一,Reddit / Twitter 疯转,一周内 ffmpeg / ImageMagick / Rust crates / Go libraries 就接入了 QOI,phoboslab/qoi 单 repo 拿到 7000+ star。这是格式生态学里少见的"个人作品 vs 工业标准"现象 —— QOI 不会替代 PNG(浏览器零支持 + 体积更大),但它证明了"PNG 的复杂度其实很多是历史包袱,90% 用例不需要"。同时代 Farbfeld(2014 suckless)是同类哲学的另一个尝试,两者并存形成"现代极简主义"小流派。

In 2021 Dominic Szablewski (phoboslab, well-known author of the Impact JS engine and the Q1K3 game series) asked a sharp question: "Why is PNG so complex? LZ77 + Huffman + five filter types + zlib wrapping + chunks + CRC32 — could you write a 'good enough' lossless image format in a weekend?" The answer was QOI (Quite OK Image format): six simple ops (RGB / RGBA / INDEX / DIFF / LUMA / RUN), encoder and decoder ~300 lines of C each, encoding ~50× faster and decoding 3-4× faster than PNG, files 5-10% larger. The spec fits on a one-page PDF. Dominic posted it on Hacker News and hit #1; Reddit / Twitter blew up; within a week ffmpeg / ImageMagick / Rust crates / Go libraries had QOI support, and phoboslab/qoi crossed 7000+ stars. A rare format-ecology event: a personal project versus an industrial standard — QOI won't displace PNG (zero browser support + larger files), but it proved "much of PNG's complexity is historical baggage; 90% of use cases don't need it." Farbfeld (2014, suckless) is a contemporary attempt in the same vein; the two coexist as the "modern minimalism" mini-school.

[图 48 diagram · QOI op 标签表(RGB / RGBA / INDEX / DIFF / RUN)+ 一行 6 像素 [R R R G B B] 的编码示意 + hash 公式 (r·3 + g·5 + b·7 + a·11) % 64]
图 48 · QOI op 标签表 + 一行像素的编码示意。① 图中画了 5 个 op:RGB(8-bit prefix · 3 byte 像素)、RGBA(8-bit prefix · 4 byte 像素)、INDEX(2-bit prefix · 6-bit hash 表 idx)、DIFF(2-bit prefix · 3×2-bit 颜色差,各通道 −2..+1)、RUN(2-bit prefix · 6-bit run 长度 1-62);spec 里还有第 6 个 op QOI_OP_LUMA(2-bit prefix · 6-bit 绿色差 + 1 byte dr−dg / db−dg),图中未画。② 编码一行 6 像素 [R R R G B B] 流:先 RGB R0 写第一个红、RUN 2 重复 2 次、DIFF 跳到 G(假设差在 DIFF 范围内)、RGB 写新蓝色 B、RUN 1 重复 1 次。③ INDEX 引用最近 64 像素的 hash 表 —— 公式 (r·3 + g·5 + b·7 + a·11) % 64,极简但实测冲突率合理。
Fig 48 · QOI's op tags and a sample one-row encoding. ① Five ops drawn: RGB (8-bit prefix · 3-byte pixel), RGBA (8-bit prefix · 4-byte pixel), INDEX (2-bit prefix · 6-bit hash-table index), DIFF (2-bit prefix · three 2-bit colour deltas, each −2..+1), RUN (2-bit prefix · 6-bit run length 1-62); the spec has a sixth op, QOI_OP_LUMA (2-bit prefix · 6-bit green delta + 1 byte of dr−dg / db−dg), not shown in the diagram. ② Encoding a 6-pixel row [R R R G B B]: write the first red as RGB R0, then RUN 2, then DIFF for G (assuming the delta fits), then RGB for the new blue B, then RUN 1. ③ INDEX references a hash table of the last 64 pixels — the formula (r·3 + g·5 + b·7 + a·11) % 64 is dead simple but has a reasonable collision rate in practice.

技术内核

Technical core

QOI 内核四件事。① 极简 6 个 op:QOI_OP_RGB(8-bit tag 0xFE + 3 byte 像素)、QOI_OP_RGBA(8-bit tag 0xFF + 4 byte 像素)、QOI_OP_INDEX(2-bit tag + 6-bit 引用最近 64 像素 hash 表)、QOI_OP_DIFF(2-bit tag + 3×2-bit RGB 颜色差,各通道范围 −2..+1)、QOI_OP_LUMA(2-bit tag + 6-bit 绿色差 −32..+31,再跟 1 byte 装 dr−dg / db−dg 各 4-bit)、QOI_OP_RUN(2-bit tag + 6-bit run 长度 1-62)。整个码本就 6 个 op,跟 PNG 的 LZ77 + Huffman 比起来连"压缩"都谈不上 —— QOI 是用结构化预测(相邻像素相同 → RUN;颜色差很小 → DIFF / LUMA;最近 64 像素曾出现 → INDEX)避免重复传输,LZ77 才是真正的字典压缩。② 编码 / 解码各只 ~300 行 C 代码 —— phoboslab 把整个 reference 实现做成单文件 header-only library `qoi.h`,500 行不到包括两套 API。对比之下 libpng + zlib 加起来 100k+ 行。③ 速度比 PNG 快 ~50×(编码)/ ~3-4×(解码)—— 因为没有 LZ77 字典查找、没有 Huffman 自适应、没有 5 种 filter 自适应预测;每个像素就是 1-5 byte 的 op + payload,编码器一遍扫,解码器也一遍扫,cache friendly 到极致。④ 体积比 PNG 大 ~5-10%—— 这是 QOI 的代价。在嵌入式 / 教学 / 不能依赖大型 codec lib 的场景里这点体积差完全可以接受,但 web 上下行带宽贵,所以 QOI 永远不会替代 PNG。Hacker News 上很多人不理解这一点 —— "为什么浏览器不集成?" 因为 web 是体积敏感的,QOI 是 CPU / 复杂度敏感的,目标用户群完全不同。

QOI's core, four pieces. ① Six minimal ops: QOI_OP_RGB (8-bit tag 0xFE + 3-byte pixel), QOI_OP_RGBA (8-bit tag 0xFF + 4-byte pixel), QOI_OP_INDEX (2-bit tag + 6-bit reference into a hash table of the last 64 pixels), QOI_OP_DIFF (2-bit tag + three 2-bit RGB deltas, each in −2..+1), QOI_OP_LUMA (2-bit tag + 6-bit green delta in −32..+31, followed by one byte holding dr−dg / db−dg as two 4-bit values), QOI_OP_RUN (2-bit tag + 6-bit run length 1-62). The whole codebook is six ops — compared to PNG's LZ77 + Huffman, you can barely call QOI "compression": it uses structured prediction (adjacent pixel identical → RUN; small delta → DIFF / LUMA; one of the last 64 → INDEX) to avoid retransmitting redundant data; LZ77 is true dictionary compression. ② Encoder / decoder each ~300 lines of C — phoboslab ships the whole reference implementation as a header-only single file `qoi.h`, under 500 lines including both APIs. Compare libpng + zlib together: 100k+ lines. ③ ~50× faster encoding, ~3-4× faster decoding than PNG — no LZ77 dictionary lookup, no adaptive Huffman, no five-way filter prediction; each pixel becomes a 1-5 byte op + payload; the encoder scans once, the decoder scans once, cache-friendly to the extreme. ④ Files ~5-10% larger than PNG — QOI's cost. Acceptable in embedded / teaching / no-big-codec-lib scenarios; unacceptable on the web, where downlink bandwidth is expensive — which is why QOI will never displace PNG. Many on Hacker News didn't get this — "why don't browsers integrate it?" Because the web is size-sensitive while QOI is CPU- and complexity-sensitive; their target audiences barely overlap.
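The op-selection logic above can be sketched in a few lines — a toy simplification of the spec (RGB-only pixels, alpha fixed at 255, no wraparound byte arithmetic, no 14-byte header or end marker), not the reference `qoi.h` implementation:

```python
# Toy sketch of how a QOI encoder picks among the six ops.

def qoi_hash(r, g, b, a=255):
    """The spec's running-index hash: (r*3 + g*5 + b*7 + a*11) % 64."""
    return (r * 3 + g * 5 + b * 7 + a * 11) % 64

def encode_ops(pixels):
    """Return (op, info) tuples for a stream of (r, g, b) pixels."""
    index = [None] * 64        # hash table of recently seen pixels
    prev = (0, 0, 0)           # decoding starts from an implicit black pixel
    ops, run = [], 0
    for px in pixels:
        if px == prev:         # identical to previous pixel: extend the run
            run += 1
            if run == 62:      # QOI_OP_RUN maxes out at 62
                ops.append(("RUN", 62))
                run = 0
            continue
        if run:
            ops.append(("RUN", run))
            run = 0
        h = qoi_hash(*px)
        dr, dg, db = (px[i] - prev[i] for i in range(3))
        if index[h] == px:                                  # seen recently
            ops.append(("INDEX", h))
        elif all(-2 <= d <= 1 for d in (dr, dg, db)):       # tiny delta
            ops.append(("DIFF", (dr, dg, db)))
        elif -32 <= dg <= 31 and -8 <= dr - dg <= 7 and -8 <= db - dg <= 7:
            ops.append(("LUMA", (dg, dr - dg, db - dg)))    # green-relative
        else:
            ops.append(("RGB", px))                         # literal pixel
        index[h] = px
        prev = px
    if run:
        ops.append(("RUN", run))
    return ops
```

One linear pass, no dictionary, no entropy coding — which is exactly where the ~50× encode speedup over PNG comes from.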

适用

USE FOR

  • 嵌入式 / IoT(SRAM 紧张,装不下 libpng)
  • 教学场景 · 让学生 1 周内写完一个无损图片格式
  • 游戏内部资产(载入速度比体积重要)
  • 命令行工具临时缓存(qoi 比 ppm / bmp 都好)
  • Embedded / IoT (tight SRAM, no room for libpng)
  • Teaching — students can write a complete lossless format in a week
  • Game internal assets (load speed beats file size)
  • CLI tool intermediate caches (better than ppm / bmp)

反适用

AVOID

  • Web(浏览器零支持 + 体积比 PNG 大)
  • 需要 progressive / interlace / 多帧动画
  • 需要色彩管理 / EXIF / ICC profile
  • 对体积敏感的存档场景(用 PNG / WebP)
  • Web (zero browser support + larger than PNG)
  • Anything needing progressive / interlaced / multi-frame
  • Anything needing colour management / EXIF / ICC profiles
  • Size-sensitive archival (use PNG / WebP)
scope: QOI (phoboslab)
readers: ImageMagick · ffmpeg · Rust qoi crate · Go qoi · Python qoi-py · phoboslab/qoi.h
editors (~): ImageMagick / GIMP(插件)/ ffmpeg 转换 · 浏览器零支持
CLI: qoiconv in.png out.qoi · convert in.qoi out.png
起源:origin: Dominic Szablewski (phoboslab) · 2021 · "PNG 太复杂"的反思
设计动机:design motivation: 反思 PNG 的 LZ77 + Huffman + filter + zlib + chunks 复杂度
同时代极简主义:contemporary minimalism: Farbfeld(2014 suckless · 完全无压缩)· NetPBM(1980s · ASCII 极简)
现实定位:real position: 不会替代 PNG · 但证明"PNG 90% 复杂度其实是历史包袱"

Farbfeld — suckless 的 16 字节头

Farbfeld — suckless's 16-byte header

YEAR 2014 AUTHOR suckless.org EXT .ff COMPRESS 无 · 让 gzip 来管 DEPTH 16-bit RGBA STATUS suckless 圈子

"16 byte 头 + 16-bit BE RGBA,可被 gzip 替你压缩。"

"16-byte header + 16-bit BE RGBA — let gzip do the compression for you."

2014 年 suckless.org —— 那个推崇 dwm / dmenu / st / surf 的极简主义社区(口号"software that sucks less",拒绝任何"非必要"的功能)—— 做了 Farbfeld:一个完全无压缩的图像格式,16 byte 头部 + raw 16-bit big-endian RGBA。整个 spec 文档总共 11 行,比 PNG spec(200+ 页)短了几个数量级。设计哲学就一句话:"压缩是 gzip 的事,不是图像格式的事。"于是 Unix 管道用得很好:png2ff in.png | gzip > out.ff.gz 就能存档,cat in.ff.gz | gunzip | ff2png > out.png 就能恢复;每个工具只做一件事(do one thing well),完全是 Unix 哲学的复刻。学术 / suckless 圈子里这种极简哲学很受欢迎,但生产几乎无人用 —— 没有压缩(gzip 替代品)、没有 alpha 行为标准、没有 metadata、没有色彩管理、没有 progressive。Farbfeld 跟同时代的 QOI(2021)是"现代极简主义"双子星:QOI 是"六个 op + ~300 行解码器",Farbfeld 是"16 byte 头 + 0 行解码器(就是 raw bytes)"—— 走得比 QOI 更远,但也更不实用。

In 2014 suckless.org — the minimalism community behind dwm / dmenu / st / surf, motto "software that sucks less", famous for refusing anything "non-essential" — released Farbfeld: a fully uncompressed image format, 16-byte header + raw 16-bit big-endian RGBA. The whole spec document is 11 lines — orders of magnitude shorter than PNG's 200+ page spec. The design philosophy is one sentence: "compression is gzip's job, not the image format's." So Unix pipes work beautifully: png2ff in.png | gzip > out.ff.gz for archive, cat in.ff.gz | gunzip | ff2png > out.png to restore; each tool does one thing well — pure Unix philosophy reimplemented. The academic / suckless circle adores this minimalism, but production usage is essentially zero — no compression (gzip stands in), no defined alpha semantics, no metadata, no colour management, no progressive. Farbfeld and the contemporaneous QOI (2021) are the twin stars of "modern minimalism": QOI is "six ops + ~300-line decoder", Farbfeld is "16-byte header + zero-line decoder (literally raw bytes)" — going further than QOI, and even less practical.

[图 49 diagram · Farbfeld 文件布局:magic "farbfeld"(8 byte)+ width / height(各 4 byte BE uint32)+ 像素流(每像素 8 byte · 16-bit BE RGBA)· 完全无压缩,压缩是 gzip 的事]
图 49 · Farbfeld 文件布局 — 16 byte 固定头(8 byte magic "farbfeld" + 4 byte 宽 BE + 4 byte 高 BE)+ 像素流(每像素 8 byte = 4 channel × 16-bit BE = R / G / B / A)。完全无压缩。整个 spec 11 行写完(magic / width / height / 像素流定义 / 颜色空间约定),官方 site 上一页就能看完。设计哲学:"压缩是 gzip 的事,不是图像格式的事" —— 跟 PNG 内置 zlib / WebP 内置 VP8 完全相反。
Fig 49 · Farbfeld file layout — a fixed 16-byte header (8-byte magic "farbfeld" + 4-byte BE width + 4-byte BE height) + pixel stream (each pixel 8 bytes = 4 channels × 16-bit BE = R / G / B / A). Fully uncompressed. The entire spec fits in 11 lines (magic / width / height / pixel stream definition / colour-space convention) — readable on a single page on the official site. Design philosophy: "compression is gzip's job, not the image format's" — the polar opposite of PNG bundling zlib or WebP bundling VP8.

技术内核

Technical core

Farbfeld 内核两件事。① 头部固定 16 byte:8 byte ASCII magic "farbfeld"(注意是小写,跟 PNG 0x89PNG 不同 —— suckless 觉得 magic byte 里塞 high-bit 是过度工程)+ 4 byte big-endian uint32 width + 4 byte big-endian uint32 height。没有保留区、没有任何 metadata 字段 —— 任何扩展应该靠 gzip 包装外面的 sidecar 文件,而不是格式内部。② 像素流 16-bit big-endian RGBA × (width × height),完全无压缩 —— 设计哲学是"压缩是 gzip 的事,不是图像格式的事"。所以一张 1920×1080 的 Farbfeld 文件原始大小就是 16 + 1920 × 1080 × 8 ≈ 16.59 MB,gzip 之后大概 2-5 MB(取决于内容);PNG 同图像大概 200 KB-2 MB,WebP 大概 100 KB-1 MB。Farbfeld 在体积上完全打不过 —— 但它的 spec 短(11 行)、解码器短(0 行,因为就是 raw bytes,memcpy 就完事)、跟 Unix 管道完美兼容(每个 pipeline stage 只做一件事:png2ff 转入,gzip 压缩,网络传输,gunzip 解压,ff2png 转出)。这是哲学上的图像格式,不是工程上的图像格式。生产用不了,但教 Unix 哲学课的时候完美样本。

Farbfeld's core, two pieces. ① Fixed 16-byte header: 8 bytes of ASCII magic "farbfeld" (note lowercase — different from PNG's 0x89PNG; suckless considers high-bit-in-magic over-engineering) + 4-byte big-endian uint32 width + 4-byte big-endian uint32 height. No reserved area and no metadata field at all — any extension should be a separate sidecar file wrapped in the same gzip, not something inside the format. ② Pixel stream: 16-bit big-endian RGBA × (width × height), completely uncompressed — the design philosophy is "compression is gzip's job, not the image format's." So a 1920×1080 Farbfeld file is literally 16 + 1920 × 1080 × 8 ≈ 16.59 MB raw; gzip takes it down to 2-5 MB depending on content; PNG of the same image is ~200 KB-2 MB; WebP ~100 KB-1 MB. Farbfeld loses on size every time — but its spec is short (11 lines), its decoder is zero lines (raw bytes, memcpy and you're done), and it's perfect with Unix pipelines (each stage does one thing: png2ff converts in, gzip compresses, network transfers, gunzip decompresses, ff2png converts out). It's a philosophical image format, not an engineering one. Useless in production, perfect for teaching Unix philosophy.
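The whole format is a dozen lines of struct code — a sketch writer/reader in pure stdlib (the real tools are png2ff / ff2png; this only demonstrates the byte layout):

```python
# Sketch: write and read the Farbfeld byte layout directly.
import struct

def write_farbfeld(path, width, height, pixels):
    """pixels: row-major iterable of (r, g, b, a), each 0..65535."""
    with open(path, "wb") as f:
        f.write(b"farbfeld")                        # 8-byte magic
        f.write(struct.pack(">II", width, height))  # 2 x 4-byte BE uint32
        for r, g, b, a in pixels:
            f.write(struct.pack(">4H", r, g, b, a))  # 16-bit BE RGBA

def read_farbfeld(path):
    with open(path, "rb") as f:
        assert f.read(8) == b"farbfeld"
        width, height = struct.unpack(">II", f.read(8))
        pixels = [struct.unpack(">4H", f.read(8)) for _ in range(width * height)]
    return width, height, pixels
```

File size is fully deterministic: 16 bytes of header plus exactly 8 bytes per pixel, which is why the format suits byte-exact research pipelines.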

适用

USE FOR

  • Unix 管道临时缓存(png2ff | gzip)
  • suckless 哲学的练习 / 教学样本
  • 极简图像处理工具链(每段 pipeline 各司其职)
  • 科研里需要完全确定字节布局的场景
  • Unix pipeline intermediate caches (png2ff | gzip)
  • suckless-philosophy exercises / teaching samples
  • Minimalist image-processing toolchains (one job per stage)
  • Research scenarios needing fully deterministic byte layout

反适用

AVOID

  • 几乎任何实际场景(用 PNG / WebP / AVIF)
  • Web(浏览器零支持 + 体积巨大)
  • 需要 metadata / 色彩管理 / EXIF
  • 需要 8-bit RGB(Farbfeld 强制 16-bit RGBA,常见 8-bit 输入要 expand 一倍体积)
  • Almost any real-world scenario (use PNG / WebP / AVIF)
  • Web (zero browser support + massive size)
  • Anything needing metadata / colour management / EXIF
  • Common 8-bit RGB input (Farbfeld forces 16-bit RGBA — doubles the size)
scope: Farbfeld (suckless.org)
readers: suckless farbfeld utils · NetPBM(部分版本)· ImageMagick
editors (~): 任意能写 16-bit BE RGBA 的工具(几行 C 就能写)
CLI: png2ff · ff2png · jpg2ff · pamtoff(NetPBM)
起源:origin: suckless.org · 2014 · "software that sucks less" 哲学
设计动机:design motivation: Unix 哲学 + "压缩是 gzip 的事"分层设计
同代极简主义双子星:contemporary minimalism twins: QOI(6 op + ~300 行解码器)· Farbfeld(0 行解码器 · raw bytes)· NetPBM(1980s 同哲学先驱)
现实定位:real position: 生产几乎无人用 · 但 Unix 哲学的活样本

GeoTIFF — 卫星图像的字面"地球图"

GeoTIFF — TIFF that literally maps Earth

YEAR 1995 (1.0) · 2019 OGC GeoTIFF 1.1 AUTHOR USGS / OGC / 各遥感机构 EXT .tif / .tiff (with GeoTIFF tags) BASE 基于 TIFF 6.0 STD OGC 19-008r4 DEPTH 8 / 16 / 32-bit int + float BANDS 多波段(RGB / NIR / SWIR / Thermal …) STATUS 遥感 / GIS 行业唯一

"TIFF 加 6 个 tag,就成了卫星图像的字面意思的'地球图'。"

"Six extra tags turn TIFF into a literal 'image of Earth'."

1990 年代末,卫星遥感(Landsat、SPOT、Sentinel)生成的图像数量爆炸式增长,带来一个共性问题:像素本身只是亮度,但图像必须能告诉下游"第 (1024, 768) 个像素在地球上是哪一点经纬度、用什么投影、用什么 datum"——否则它就只是一张漂亮的灰阶,做不了 GIS 分析。USGS 联合一批遥感机构在 TIFF 6.0 之上加了 6 个核心地理 tag(3 个 Model* 定位 tag + 3 个 GeoKey 容器 tag),把"像素 → 大地坐标"的映射元数据标准化,并在 1995 年正式发布 GeoTIFF 1.0。OGC 在 2019 年把它升级为 GeoTIFF 1.1 国际标准。结果是:卫星图像、航空摄影、数字高程模型(DEM)、土地利用图全部默认 GeoTIFF;GDAL(几乎所有 GIS 软件的底层 I/O 库)、QGIS、ArcGIS 是工具链命脉;NASA Worldview、Sentinel Hub、Google Earth Engine 内部也走 GeoTIFF。

By the late 1990s, satellite remote sensing (Landsat, SPOT, Sentinel) was producing images at industrial scale, all sharing one problem: a pixel is just a brightness value, but the image must tell downstream "where on Earth is pixel (1024, 768)? in what projection? on what datum?" — otherwise it's just a pretty greyscale, useless for GIS analysis. USGS and a coalition of remote-sensing agencies layered six core geo tags (three Model* georeferencing tags plus three GeoKey carrier tags) on top of TIFF 6.0 to standardise the "pixel → geographic coordinate" mapping metadata, and shipped GeoTIFF 1.0 in 1995. OGC promoted it to international standard GeoTIFF 1.1 in 2019. The result: satellite imagery, aerial photography, digital elevation models (DEMs) and land-use maps all default to GeoTIFF; GDAL (the I/O backbone of nearly every GIS app), QGIS and ArcGIS form the toolchain spine; NASA Worldview, Sentinel Hub and Google Earth Engine all consume GeoTIFF internally.

[图 50 diagram · GeoTIFF = TIFF 6.0 IFD(band 1-4:R / G / B / NIR)+ 地理 tag(ModelPixelScale / ModelTiepoint / ModelTransformation / GeoKeyDirectory / GeoDoubleParams / GeoAsciiParams)· pixel (col, row) 经 ModelTiepoint + ModelPixelScale 映射到 (lat, lon)]
图 50 · GeoTIFF = TIFF 6.0 容器 + 6 个地理 tag + 多波段。左:TIFF 容器内的 IFD 串挂着 N 个波段(R / G / B / NIR / SWIR / Thermal …);右上:3 个 GeoKey tag 寄生在 TIFF 私有 tag 域(34735 GeoKeyDirectory / 34736 GeoDoubleParams / 34737 GeoAsciiParams),加上 ModelPixelScale(33550)/ ModelTiepoint(33922)/ ModelTransformation(34264)三个定位 tag,通过 ModelPixelScale + ModelTiepoint 把任意像素 (col, row) 映射到 (lat, lon) 在指定 datum / projection 上的真实地理坐标;右:地球经纬网格代表"像素终点"。整套机制对 TIFF 阅读器完全向后兼容 —— 不认识这些 tag 的工具仍能把 .tif 当普通 TIFF 打开。
Fig 50 · GeoTIFF = TIFF 6.0 container + six geo tags + multi-band. Left: an IFD chain in the TIFF container holding N bands (R / G / B / NIR / SWIR / Thermal …); top right: the three GeoKey tags parasitise TIFF private tag slots (34735 GeoKeyDirectory / 34736 GeoDoubleParams / 34737 GeoAsciiParams), joined by the three georeferencing tags ModelPixelScale (33550) / ModelTiepoint (33922) / ModelTransformation (34264), and via ModelPixelScale + ModelTiepoint map any pixel (col, row) to (lat, lon) on a chosen datum / projection; right: a lat/lon globe grid stands for the "pixel destination". The whole scheme is fully backward-compatible — a TIFF reader that doesn't know these tags can still open the .tif as a plain TIFF.

技术内核

Technical core

GeoTIFF 内核三件事。① 6 个核心地理 tag:ModelTransformationTag(34264 · 4×4 仿射矩阵,完整描述像素到地理坐标的线性变换)/ ModelTiepointTag(33922 · 若干"控制点对",每对是像素位置 ↔ 地理坐标,适合非线性场景)/ ModelPixelScaleTag(33550 · 每像素代表多少经纬度 / 米)/ GeoKeyDirectoryTag(34735 · 主索引,记录所有 GeoKey 的 ID + value 偏移)/ GeoDoubleParamsTag(34736 · 浮点参数表)/ GeoAsciiParamsTag(34737 · ASCII 字符串表,存投影名 / datum 名)。它们全部落在 TIFF 私有 tag 区,因此对不识别这些 tag 的 TIFF 阅读器完全向后兼容 —— 这是 GeoTIFF 设计最聪明的地方。② 像素 → 地理坐标映射:典型组合是 ModelTiepoint(标定原点)+ ModelPixelScale(每像素单位),引擎按 lon = tiepoint.lon + col × scale.x / lat = tiepoint.lat - row × scale.y 反算;复杂场景上 ModelTransformationTag 直接给 4×4 矩阵。Datum / 投影通过 EPSG 代码引用(EPSG:4326 = WGS84,EPSG:3857 = Web Mercator)。③ 多波段(multi-band):一张 GeoTIFF 通常不止 RGB,Landsat 8 有 11 个 band(可见光 + 近红外 NIR + 短波红外 SWIR + 热红外 + 全色 panchromatic),Sentinel-2 有 13 个,科学家用 NIR - Red 算 NDVI(归一化植被指数)、SWIR 看含水量、Thermal 看地表温度。这些 band 全部走 TIFF 标准的 SamplesPerPixel + BitsPerSample 机制存,互不打扰。

GeoTIFF's core, three pieces. ① Six core geo tags: ModelTransformationTag (34264 — a 4×4 affine matrix giving the full linear pixel → coordinate transform) / ModelTiepointTag (33922 — control-point pairs, each pixel position ↔ geographic coordinate, for non-linear cases) / ModelPixelScaleTag (33550 — lat/lon or metres per pixel) / GeoKeyDirectoryTag (34735 — the master index: every GeoKey's ID + value offset) / GeoDoubleParamsTag (34736 — floating-point parameter table) / GeoAsciiParamsTag (34737 — ASCII string table for projection / datum names). They all live in TIFF private tag slots, so any TIFF reader that doesn't know them is fully backward-compatible — the cleverest part of the design. ② Pixel → coordinate mapping: the typical combination is ModelTiepoint (anchor) + ModelPixelScale (per-pixel unit) — the engine inverts lon = tiepoint.lon + col × scale.x / lat = tiepoint.lat - row × scale.y; for complex cases ModelTransformationTag carries a 4×4 matrix directly. Datum / projection are referenced via EPSG codes (EPSG:4326 = WGS84, EPSG:3857 = Web Mercator). ③ Multi-band: a GeoTIFF rarely stops at RGB — Landsat 8 has 11 bands (visible + NIR + SWIR + thermal IR + panchromatic), Sentinel-2 has 13; scientists compute NDVI (vegetation index) from NIR - Red, see water content with SWIR, surface temperature with thermal IR. All bands ride the standard TIFF SamplesPerPixel + BitsPerSample mechanism, no extra plumbing.
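The tiepoint + pixel-scale inversion in ② is one line of arithmetic — a standalone sketch assuming a north-up raster (real files may instead carry the full 4×4 ModelTransformation matrix, and GDAL handles both):

```python
# Sketch: ModelTiepoint + ModelPixelScale -> geographic coordinates.

def pixel_to_lonlat(col, row, tiepoint, scale):
    """tiepoint = (i, j, lon0, lat0): pixel (i, j) is anchored at (lon0, lat0).
    scale = (sx, sy): degrees (or metres) per pixel. Latitude decreases as
    row grows because rasters are stored top-down (north-up convention)."""
    i, j, lon0, lat0 = tiepoint
    sx, sy = scale
    return lon0 + (col - i) * sx, lat0 - (row - j) * sy
```

This is the entire "where on Earth is pixel (col, row)" answer for the common affine, axis-aligned case.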

适用

USE FOR

  • 卫星图像(Landsat / Sentinel / SPOT / 高分系列)
  • 航空摄影 / 无人机正射影像
  • 数字高程模型(DEM / DSM / DTM)
  • 土地利用图 / 植被覆盖图 / 气象格点
  • Cloud Optimized GeoTIFF(COG)做云端流式读
  • Satellite imagery (Landsat / Sentinel / SPOT / Gaofen)
  • Aerial photography / UAV orthophotos
  • Digital elevation models (DEM / DSM / DTM)
  • Land-use / vegetation / weather grid maps
  • Cloud Optimized GeoTIFF (COG) for streaming from object storage

反适用

AVOID

  • 任何非地理场景(普通照片用 JPEG / WebP)
  • Web 直接展示(浏览器不解 GeoTIFF · 需服务端切瓦片)
  • 对 metadata 不敏感的临时图像处理
  • Anything non-geographic (use JPEG / WebP for ordinary photos)
  • Direct browser display (no native GeoTIFF — server-side tile a WMS / XYZ)
  • Throwaway image work that doesn't care about metadata
scope: GeoTIFF (OGC)
readers (✓✓): GDAL · QGIS · ArcGIS · ENVI · ERDAS · rasterio (Python) · sf / terra (R) · OpenLayers / Leaflet(经服务端切瓦片)
editors: QGIS · ArcGIS · GlobalMapper · 任意基于 GDAL 的 GIS 软件
CLI: gdalinfo in.tif · gdal_translate · gdalwarp · rio info
起源:origin: USGS / OGC · 1995 GeoTIFF 1.0 · 2019 OGC 19-008r4 升 1.1
基于:based on: TIFF 6.0 · 私有 tag 区(GeoKey 三件套 34735-34737 + Model* 定位 tag)
同代邻居:contemporary neighbour: NITF(军用 / 国防)· COG(Cloud Optimized GeoTIFF · 2018 云原生变种)
现实定位:real position: 遥感 / GIS 行业唯一标准 · GDAL 是事实通用 I/O 后端

NITF — 军用情报图像

NITF — military intelligence imagery

YEAR 1987 (NITF 1.0) · 1998 (NITF 2.1) · 2006 (MIL-STD-2500C · 现役) AUTHOR US DoD / NGA EXT .ntf · .nitf STD MIL-STD-2500C(800+ 页) PAYLOAD JPEG / JPEG 2000 / 无压缩 SECURITY 强制 classification tag STATUS 美国国防部 / 北约

"军方版 GeoTIFF,加上一堆 'security classification' 标记。"

"Military GeoTIFF plus a pile of security classification tags."

1980 年代,美国国防部需要一个"统一的情报图像格式" —— 卫星侦察、航空侦察、地面侦察、目标识别、地图、火控影像都要互通,且要带军方专用的元数据。1987 年发布 NITF 1.0(National Imagery Transmission Format),1994 年升 NITF 2.0,1998 年定 NITF 2.1(MIL-STD-2500B),2006 年修订为现役的 MIL-STD-2500C(同步对外发布的 ISO/IEC 12087-5 国际标准基本与之等价)。设计目标包括:(a) 多 segment 文件 —— 一个 .ntf 可以同时装多张图、多张文本注释、多张图形覆盖物(graphic / overlay)、已知地标(LUT 类);(b) 强制 security classification —— 每个 segment 有 1 字节的密级标记(U=Unclassified / R=Restricted / C=Confidential / S=Secret / T=Top Secret),还有控制释放范围的 NOFORN / REL TO 等代号;(c) 支持 JPEG / JPEG 2000 / 无压缩的 payload。MIL-STD-2500C 全文 800 多页,涵盖从"怎么标记机密"到"怎么嵌入手绘图标"的全套军用流程。商业领域几乎没人用 —— 它是国防 + 北约 + 部分国家测绘局的内部语言。

In the 1980s the US Department of Defense needed a single intelligence-imagery container — satellite recon, aerial recon, ground recon, target identification, maps and fire-control imagery had to interoperate and carry military-specific metadata. NITF 1.0 (National Imagery Transmission Format) shipped in 1987, NITF 2.0 in 1994, and NITF 2.1 in 1998 (MIL-STD-2500B); the 2006 revision MIL-STD-2500C is the version still in service (ISO/IEC 12087-5, published in parallel, is essentially equivalent). Design goals: (a) multi-segment file — one .ntf can hold multiple images, text annotations, graphic / overlay layers and known-landmark tables (LUT-class); (b) mandatory security classification — every segment carries a one-byte clearance marker (U = Unclassified / R = Restricted / C = Confidential / S = Secret / T = Top Secret) plus distribution caveats like NOFORN or REL TO; (c) JPEG / JPEG 2000 / uncompressed payload. MIL-STD-2500C runs over 800 pages, covering everything from "how to tag classification" to "how to embed hand-drawn icons". Commercial use is essentially zero — NITF is the internal language of the US DoD, NATO and a few national mapping agencies.

[图 51 diagram · NITF 2.1 multi-segment 布局:file header(CLAS = TOP SECRET / NOFORN)+ image segment ×2(JPEG 2000 1024×1024 · 多光谱 4 band 16-bit)+ graphic segment(手绘标记 / 圆圈 / 箭头)+ text segment(分析员注记 / 释义)· 每段独立密级,文件总密级 = max(各 segment 密级)]
图 51 · NITF 2.1 文件 segment 列表。一个 .ntf 文件由 file header(388 byte 起,固定字段 + 文件级 classification)+ N×image segment(每段独立 payload:JPEG / JPEG 2000 / 无压缩 · 可多波段)+ N×graphic segment(覆盖物 · 手绘标记 · 圆圈 / 箭头 / 标尺)+ N×text segment(分析员注记 / 释义)组成。每个 segment 头部都有 1 byte 的 CLAS(security class)字段:U / R / C / S / T。文件总密级取所有 segment 中的最高值;读者只看到自己有权限的段。
Fig 51 · NITF 2.1 multi-segment file layout. A .ntf is a file header (≥ 388 bytes — fixed fields + file-level classification) + N image segments (each with its own payload: JPEG / JPEG 2000 / uncompressed, multi-band possible) + N graphic segments (overlays — hand-drawn marks, circles, arrows, scales) + N text segments (analyst notes / interpretations). Every segment header carries a one-byte CLAS (security class) field: U / R / C / S / T. The file's overall classification equals the max across segments; readers see only the segments their clearance permits.

技术内核

Technical core

NITF 内核三件事。① 多 segment 容器:一个 .ntf 文件 = 一个 file header(388 byte 起,固定字段)+ N 个 image segment + N 个 graphic segment + N 个 text segment + N 个 reserved extension segment(给 NGA 或第三方扩展用)。每个 segment 是独立单元 —— 自己的 header、自己的 payload、自己的密级。这跟 GeoTIFF 一个文件一张图的设计思路完全不同 —— NITF 一个文件就是一份"情报包"。② 强制 security classification:每个 segment 的 header 里都有 1 byte CLAS 字段(U=Unclassified / R=Restricted / C=Confidential / S=Secret / T=Top Secret);此外还有 control caveats(NOFORN=No Foreign Nationals / REL TO XXX=Releasable To 列表 / ORCON=Originator Controlled 等)。文件总密级 = 所有 segment 密级的最高值;读取系统按读者权限逐段 redact —— 你看到的可能是同一份 .ntf 但里面只有 2 个 segment,其它的被空白替代。③ 支持多种 payload:image segment 的实际像素数据可以是无压缩 raw、JPEG(legacy)、JPEG 2000(主流 · 因为 JP2 的 progressive + ROI + 任意分辨率层级正好契合"先看缩略图再放大看局部"的情报场景)、Vector Quantization(VQ,1990s 老格式)。NITF 也支持嵌入 GeoTIFF 风格的地理元数据(通过专门的 ICHIPB / RPC00B 之类 TRE,Tagged Record Extensions),所以一张 NITF 同时是图、是地理参考、是密级文档。

NITF's core, three pieces. ① Multi-segment container: a .ntf is one file header (≥ 388 bytes, fixed fields) + N image segments + N graphic segments + N text segments + N reserved-extension segments (for NGA or third-party extensions). Each segment is independent — its own header, its own payload, its own classification. The opposite of GeoTIFF's "one file, one image" model — NITF is "one file, one intelligence package". ② Mandatory security classification: each segment header carries a one-byte CLAS field (U = Unclassified / R = Restricted / C = Confidential / S = Secret / T = Top Secret); plus control caveats (NOFORN = No Foreign Nationals / REL TO XXX = Releasable To list / ORCON = Originator Controlled, etc.). File-level classification = max across segments; reading systems redact per segment by clearance — your copy of the .ntf may show only two segments, the rest blanked. ③ Multiple payload types: image segments can carry uncompressed raw, JPEG (legacy), JPEG 2000 (mainstream — JP2's progressive + ROI + arbitrary resolution layers fit the "thumbnail first, zoom into a region" intelligence workflow perfectly), or Vector Quantization (VQ, an older 1990s codec). NITF also embeds GeoTIFF-style geo metadata via dedicated TREs (Tagged Record Extensions) such as ICHIPB / RPC00B — so a single .ntf is at once an image, a geo-reference and a classified document.
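The "file classification = max over segments" rule in ② is a one-liner once the CLAS codes are ordered — a sketch assuming the U < R < C < S < T ordering of the NITF clearance letters:

```python
# Sketch: derive the file-level classification from per-segment CLAS codes.
CLAS_ORDER = "URCST"  # Unclassified < Restricted < Confidential < Secret < Top Secret

def file_classification(segment_clas):
    """Highest classification across all segment CLAS codes."""
    return max(segment_clas, key=CLAS_ORDER.index)
```

A redacting reader applies the same ordering in reverse: drop every segment whose CLAS exceeds the reader's clearance, then recompute.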

适用

USE FOR

  • 美国国防部 / NGA / 北约相关情报图像
  • 需要文件内多 segment 不同密级混存的场景
  • 侦察图像 + 分析员注记 + 覆盖物一体化交付
  • US DoD / NGA / NATO intelligence imagery
  • Mixed-classification segments inside a single file
  • Recon image + analyst notes + overlays as one package

反适用

AVOID

  • 任何民用场景(用 GeoTIFF)
  • 不需要 classification metadata 的工作流
  • 商业 GIS / web 地图(几乎没工具能直接渲染 NITF)
  • Anything civilian (use GeoTIFF)
  • Workflows that don't need classification metadata
  • Commercial GIS / web mapping (almost nothing renders NITF directly)
scope: NITF (US DoD / NGA / NATO)
readers (~): 限国防工具链 · GDAL 部分支持 · ESRI ArcGIS for Defense · Hexagon ERDAS · BAE GXP
editors (~): ArcGIS for Defense · ENVI · Hexagon ERDAS · 其它需许可证
CLI: nitfutils(NGA 官方)· gdalinfo --formats | grep NITF · gdal_translate -of NITF
起源:origin: US DoD · 1987 NITF 1.0 → 1994 2.0 → 1998 2.1 · 2006 MIL-STD-2500C 现役
设计借鉴:design borrows from: TIFF 私有 tag · GeoTIFF 地理参考 · JPEG 2000 progressive 解码
同代邻居:contemporary neighbour: GeoTIFF(民用对应物)· ISO/IEC 12087-5(等价国际标准)
现实定位:real position: 美国国防 / 北约的情报图像通用语 · 商业领域不可见

FITS — 天文学的"什么都装"

FITS — astronomy's "one format to hold the universe"

YEAR 1981(首版)· 1988 IAU 标准 · 至今多次扩展 AUTHOR NASA / NRAO + IAU FITS Working Group EXT .fits · .fit · .fts MIME image/fits · application/fits STD IAU FITS 4.0(2018) COMPRESS 可选 tiled compression(Rice / GZIP / PLIO / HCOMPRESS) DEPTH 8-64 bit int + 32 / 64-bit float ALPHA 一般无 STATUS 天文唯一标准 · 40+ 年未替代

"它是图,是表格,是光谱,是任何天文数据 —— '一种格式装下宇宙'。"

"It's an image, a table, a spectrum — anything astronomical. 'One format to hold the universe'."

1981 年,Don Wells、Eric Greisen、Ronald Harten 等几位射电 / 光学天文学家在《Astronomy & Astrophysics Supplement Series》期刊上发表论文,提议一种"统一的天文数据格式":FITS = Flexible Image Transport System。痛点是当时美国国家光学天文台(NOAO)、国家射电天文台(NRAO)、欧南台(ESO)、各高校观测站都在用各自不兼容的二进制格式,数据交换得反复转换,且磁带寄送是常态(那个年代没有互联网,数据靠 9 轨磁带跨大洲邮寄)。设计目标:(a) 高位深图像(8-64 bit int / 32-64 bit float),因为 CCD 输出动辄 16-bit、长曝叠加后是 32-bit;(b) 多波段 / 多维数据立方体(空间 X/Y + 波长 Z 三维,甚至时间 T 第四维);(c) 表格存观测元数据(曝光时间、滤镜、坐标、温度、读出噪声等数百个字段);(d) 跨望远镜兼容,且自描述(读它的人不需要望远镜手册也能解读)。1988 年 IAU(国际天文联合会)正式认可 FITS 为天文数据交换标准。至今 40 余年未被替代 —— 因为天文需要严格的可读性、自描述性、跨工具一致性、长期归档能力,这些目标 HDF5 / NetCDF 等更现代格式都做不到比 FITS 更好的平衡。astropy 是 Python 天文社区的事实标准,from astropy.io import fits 是天文程序员的"hello world"。

In 1981, Don Wells, Eric Greisen and Ronald Harten — radio and optical astronomers — published a paper in Astronomy & Astrophysics Supplement Series proposing a unified astronomical data format: FITS = Flexible Image Transport System. The pain: the US NOAO (optical), NRAO (radio), ESO and university observatories all used incompatible binary formats; exchange meant constant conversion, and magnetic-tape shipping was the norm (no Internet — data travelled the world on 9-track tapes). Design goals: (a) high bit-depth images (8-64 bit int, 32-64 bit float — CCDs produce 16-bit, long-exposure stacks reach 32-bit); (b) multi-band / multi-dimensional data cubes (spatial X/Y + wavelength Z, or even a time axis T); (c) tabular metadata (exposure, filter, coordinates, temperature, read-noise — hundreds of fields); (d) cross-telescope, self-describing (a reader needs no instrument handbook). The IAU formally adopted FITS in 1988. Forty-plus years on, no replacement has stuck — modern formats like HDF5 / NetCDF can't beat FITS's balance of human-readability, self-description, cross-tool consistency and long-term archival. astropy is the de-facto Python astronomy library, and from astropy.io import fits is the astronomer-programmer's "hello world".

[图 52a diagram · FITS HDU 链:PRIMARY HDU(SIMPLE = T · 主图像 1024×1024 float32)+ IMAGE extension(mask uint8 · 误差图 float32)+ BINTABLE extension(源星表 N 行 × M 列)· 每个 HDU = 80 byte ASCII 卡组(36 卡/块)+ 2880 byte 对齐 data block]
图 52a · FITS 的 HDU(Header Data Unit)链。一个 .fits 文件由一个 Primary HDU(必含)+ N 个 Extension HDU(可选)线性串联。Primary HDU 通常装主图像;Extension 可以是 IMAGE(2D / 3D 图)、BINTABLE(二进制表 · 例如源星表)、TABLE(ASCII 表)。每个 HDU 都自带 ASCII header(卡片格式 80 byte/卡)+ 二进制 data block(2880 byte 块对齐 · 这数字是 1980s 9 轨磁带的物理记录长度遗产)。这种结构让"主图 + mask + error map + 源表"可以放进一个文件,这是为什么 NASA 的 JWST / Hubble pipeline 一个观测就生成一个 .fits。
Fig 52a · FITS HDU (Header Data Unit) chain. A .fits file is a single Primary HDU (mandatory) followed by N Extension HDUs (optional) in a linear chain. The Primary HDU usually holds the main image; Extensions can be IMAGE (2D / 3D), BINTABLE (binary table — e.g. a source catalog), or TABLE (ASCII table). Every HDU carries its own ASCII header (80-byte cards) and a binary data block aligned to 2880 bytes (a number inherited from the physical record length of 1980s 9-track magnetic tape). The structure lets "main image + mask + error map + source table" live in a single file — which is why NASA's JWST / Hubble pipelines emit one .fits per observation.
FITS · 80-BYTE ASCII HEADER CARD · 示例 EXPTIME = 120.0000 / exposure time (sec) · 布局 KEYWORD = VALUE / COMMENT(byte 1-8 / 9 / 11-30 / 31 / 32-80)· 大写 ASCII · 全 80 byte 不可缩短 · 不足空格右补齐 · 36 张卡 = 1 块 = 2880 byte(磁带块大小)· 末卡 END 占位 → 你可以用文本编辑器看见 FITS 的整段元数据
图 52b · FITS 80 byte ASCII header card 格式。byte 1-8 = KEYWORD(大写)/ byte 9 = '=' / byte 11-30 = VALUE / byte 31 = '/' / byte 32-80 = COMMENT。全部大写 ASCII,不足部分用空格右补齐到 80 byte。36 张卡构成 1 个 2880 byte 块(继承 1980s 磁带物理记录长度),最后一张卡用 END 标记 header 结束。最神奇的是:你可以直接用 head -c 2880 image.fits 看到一段可读的元数据 —— 几十年的天文文件都能用文本编辑器"瞄一眼"。
Fig 52b · FITS 80-byte ASCII header card. Bytes 1–8 = KEYWORD (uppercase) / 9 = '=' / 11–30 = VALUE / 31 = '/' / 32–80 = COMMENT. All uppercase ASCII, padded with trailing spaces to 80 bytes. Thirty-six cards form a 2880-byte block (inherited from 1980s magnetic-tape record length); the last card carries END to mark header termination. The magic: you can head -c 2880 image.fits and read the metadata in plain text — decades-old astronomy files are still text-editor inspectable.
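The card layout above can be sliced with the standard library alone. A minimal sketch (real code should use astropy.io.fits.Card, which also handles quoted string values that contain '/'):

```python
def parse_card(card: bytes):
    """Split one fixed-format 80-byte FITS card into (keyword, value, comment)."""
    assert len(card) == 80, "FITS cards are exactly 80 bytes"
    keyword = card[0:8].decode("ascii").rstrip()      # bytes 1-8: keyword
    if card[8:10] != b"= ":                           # bytes 9-10: value indicator
        return keyword, None, None                    # COMMENT / HISTORY / END cards
    value, _, comment = card[10:].decode("ascii").partition("/")
    return keyword, value.strip(), comment.strip()

card = b"EXPTIME =                120.0 / exposure time (sec)".ljust(80)
print(parse_card(card))   # → ('EXPTIME', '120.0', 'exposure time (sec)')
```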
WCS · PIXEL (col,row) → SKY (RA, Dec) · pixel grid (col=512, row=384) · CRPIX / CRVAL · CDELT / CD matrix · CTYPE = 'RA---TAN' · celestial sphere · RA=12h34m · Dec=+45°
图 52c · WCS(World Coordinate System)像素 → 天球映射。FITS header 里的 CRPIX1 / CRPIX2(参考像素位置)、CRVAL1 / CRVAL2(参考像素的 RA / Dec)、CDELT1 / CDELT2 或 CDi_j 矩阵(每像素角度增量,单位由 CUNIT 决定,天球轴默认是度)、CTYPE1 = 'RA---TAN'(投影类型,常见 TAN / SIN / ARC / ZEA)共同定义了像素到天球的可逆函数。这套机制由 Greisen & Calabretta 在 2002 年的两篇里程碑论文里完整化,所有现代天文软件(ds9 · astropy.wcs · IDL)都按这套读 WCS。
Fig 52c · WCS (World Coordinate System) — pixel → sky mapping. Header keywords CRPIX1 / CRPIX2 (reference-pixel position), CRVAL1 / CRVAL2 (RA / Dec at the reference pixel), CDELT1 / CDELT2 or the CDi_j matrix (angular increment per pixel, in units set by CUNIT, degrees by default) and CTYPE1 = 'RA---TAN' (projection — typically TAN / SIN / ARC / ZEA) together define an invertible pixel-to-sky function. The framework was completed in two landmark Greisen & Calabretta papers (2002), and every modern astronomy tool (ds9 · astropy.wcs · IDL) reads WCS the same way.
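A hedged sketch of the linear core of that mapping, valid only near the reference pixel: it ignores the TAN projection's spherical terms and the cos(Dec) convergence of RA, and the header values below are made up. Real code should go through astropy.wcs:

```python
def pixel_to_sky(col, row, crpix, crval, cdelt):
    """RA/Dec = CRVAL + CDELT * (pixel - CRPIX), with FITS 1-based pixels."""
    ra  = crval[0] + cdelt[0] * (col - crpix[0])
    dec = crval[1] + cdelt[1] * (row - crpix[1])
    return ra, dec

# Hypothetical header: reference pixel (512, 384) sits at RA=188.5°, Dec=+45°,
# plate scale 0.5 arcsec/px expressed in degrees (CDELT's default unit).
hdr = dict(crpix=(512.0, 384.0), crval=(188.5, 45.0),
           cdelt=(-0.5 / 3600, 0.5 / 3600))    # RA axis increases to the left
print(pixel_to_sky(512.0, 384.0, **hdr))       # → (188.5, 45.0)
```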
FITS DATA CUBE · X / Y / λ · NAXIS = 3 · X (RA) · Y (Dec) · λ (wavelength) · 每片 = 一个波长的图像 · NAXIS=3 · NAXIS1=X · NAXIS2=Y · NAXIS3=λ
图 52d · FITS 数据立方体(data cube)— 同一份 NAXIS=3 的 FITS 数据集,沿空间 X(RA)/ Y(Dec)+ 第三轴 λ(波长)三维索引。每"切片"是同一片天空在某个特定波长下的图像;沿 λ 方向取一个像素就得到该天体的光谱。IFU(integral field unit)光谱仪、射电干涉阵的频谱立方体都是这种数据;再加一个时间轴 T 就成 NAXIS=4 的"动态光谱立方体"。这是 FITS 一个文件就能装下"任何天文数据"承诺的核心 —— 没有"只支持 2D 图像"的限制。
Fig 52d · FITS data cube — a single NAXIS=3 dataset indexed along spatial X (RA) / Y (Dec) + third axis λ (wavelength). Each "slice" is the same patch of sky at one wavelength; reading a single pixel along λ yields that object's spectrum. IFU (integral-field unit) spectrographs and radio interferometer spectral cubes both produce this. Add a time axis T and you get NAXIS=4 "dynamic spectral cubes". This is the heart of FITS's promise to carry "any astronomical data" — there's no "2D-image only" limit.
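In astropy the cube arrives as a NumPy array with the FITS axes reversed, data[λ, y, x], so both access patterns in the caption are one index away. A sketch on synthetic data (no file involved):

```python
import numpy as np

# Synthetic NAXIS=3 cube: NAXIS3=λ (200 planes), NAXIS2=Y, NAXIS1=X.
cube = np.zeros((200, 64, 64), dtype=np.float32)
cube[:, 32, 32] = np.linspace(1.0, 2.0, 200)   # plant a spectrum at one pixel

image_at_100 = cube[100]        # one slice: the sky at wavelength index 100
spectrum = cube[:, 32, 32]      # all wavelengths at pixel (x=32, y=32)

print(image_at_100.shape, spectrum.shape)   # → (64, 64) (200,)
```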

技术内核

Technical core

FITS 内核六块。① HDU(Header Data Unit)链:一个 .fits 文件 = Primary HDU(必含)+ N 个 Extension HDU(可选),线性串联。Primary 装主图像;Extension 可以是 IMAGE(2D / 3D / N 维数组)、BINTABLE(二进制表)、TABLE(ASCII 表)。一个观测出来的 .fits 通常就是"主图 + mask + error + 源星表"四件套。② 80 byte ASCII header card:格式 KEYWORD = value / comment,全大写、固定 byte 1-8 是关键字、byte 9 是 '='、byte 11-30 是值、byte 31 是 '/'、byte 32-80 是注释。36 张卡 = 1 个 2880 byte 块(磁带遗产);最后一张是 END。你能用文本编辑器看见 FITS 的元数据 —— 这是 FITS 跨越 40 年的根本原因之一。③ 多维数据数组:NAXIS = N(维度数)/ NAXIS1, NAXIS2, …, NAXISN(各维大小)/ BITPIX(每像素位深 8/16/32/64/-32/-64,负数表示 IEEE 754 浮点)。NAXIS=2 是图,NAXIS=3 是数据立方体(X/Y/λ),NAXIS=4 加时间维。④ BINTABLE / TABLE:这是 FITS 真正的"杀手级"扩展 —— 二进制表能存源星表(N 行 × 几十列,从 ID / RA / Dec 到亮度 / 颜色 / 形态参数)、光谱(N 行波长 × 流量)、时序数据(N 行时间 × 通量),全部走标准 TFORM / TTYPE / TUNIT 描述,任何 FITS 工具都能读。这点让 FITS 既是"图像格式"又是"科学数据库",HDF5 / NetCDF 都没有这种"老牌 + 跨工具一致"的优势。⑤ WCS(World Coordinate System):由 Greisen & Calabretta 在 2002 年的两篇里程碑论文里完整化,通过 CRPIX(参考像素)/ CRVAL(参考点天球坐标)/ CDELT 或 CDi_j 矩阵(每像素角度增量 · 默认单位是度)/ CTYPE = 'RA---TAN'(投影类型,常见 TAN / SIN / ARC / ZEA)定义可逆函数。GeoTIFF 学的就是这套思路,只是把"天球"换成了"地球"。⑥ 多种 tile compression:Rice(整数、低噪图像最优 · 2-3×)/ GZIP(通用 · 2-3×)/ PLIO(mask 类整数图 · 4-8×)/ HCOMPRESS(有损 · 巡天图像 · 4-10×,JWST / Pan-STARRS 等大型项目用它)。压缩存在专门的 BINTABLE 扩展里,不破坏任何 FITS 兼容性 —— 不认识压缩的工具仍能识别 BINTABLE,只是看不懂里面是图。

FITS's core, six pieces. ① HDU (Header Data Unit) chain: a .fits file = Primary HDU (mandatory) + N Extension HDUs (optional), in a linear chain. Primary holds the main image; Extensions can be IMAGE (2D / 3D / N-D array), BINTABLE (binary table) or TABLE (ASCII table). A typical observation produces "main image + mask + error map + source catalogue" in a single file. ② 80-byte ASCII header card: format KEYWORD = value / comment, all uppercase, bytes 1–8 are the keyword, byte 9 is '=', bytes 11–30 are the value, byte 31 is '/', bytes 32–80 are the comment. Thirty-six cards = one 2880-byte block (magnetic-tape heritage); the last card is END. You can read FITS metadata in a text editor — one of the deep reasons FITS has lasted forty years. ③ Multidimensional data arrays: NAXIS = N (dimensions) / NAXIS1, NAXIS2, …, NAXISN (sizes) / BITPIX (per-pixel bit depth 8/16/32/64/-32/-64; negative = IEEE 754 float). NAXIS=2 is an image, NAXIS=3 is a data cube (X/Y/λ), NAXIS=4 adds time. ④ BINTABLE / TABLE — FITS's killer extension: binary tables hold source catalogues (N rows × tens of columns from ID / RA / Dec to magnitudes / colours / shape parameters), spectra (N rows of wavelength × flux), time series (N rows of time × flux) — all described via standard TFORM / TTYPE / TUNIT, readable by any FITS tool. This is what makes FITS simultaneously "image format" and "scientific database" — an edge HDF5 / NetCDF can't match. ⑤ WCS (World Coordinate System): completed in Greisen & Calabretta's two landmark 2002 papers — CRPIX (reference pixel) / CRVAL (sky coordinate at the reference pixel) / CDELT or CDi_j matrix (angular increment per pixel; degrees by default) / CTYPE = 'RA---TAN' (projection — typically TAN / SIN / ARC / ZEA) together define an invertible function. GeoTIFF borrowed exactly this design, just swapping "sky" for "Earth".
⑥ Tile compressions: Rice (best for low-noise integer images, 2-3×) / GZIP (general, 2-3×) / PLIO (mask-style integer images, 4-8×) / HCOMPRESS (lossy, survey imagery, 4-10× — used by JWST, Pan-STARRS and other large surveys). Compressed data lives inside a dedicated BINTABLE extension, so FITS compatibility is preserved — tools that don't understand the compression still see a BINTABLE, just can't decode the image inside.
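Pieces ② and ③ can be made concrete by hand-writing a minimal single-HDU FITS in pure Python. This is a sketch of the block discipline only; value alignment is simplified, and astropy.io.fits is the real tool:

```python
import struct

def card(text: str) -> bytes:
    return text.ljust(80).encode("ascii")   # every card is exactly 80 bytes

width, height = 4, 3
header = b"".join([
    card("SIMPLE  =                    T / conforms to FITS"),
    card("BITPIX  =                   16 / 16-bit signed integers"),
    card("NAXIS   =                    2"),
    card("NAXIS1  =                    4"),
    card("NAXIS2  =                    3"),
    card("END"),
])
header += b" " * (-len(header) % 2880)   # pad header with spaces to a 2880 block

data = struct.pack(">12h", *range(12))   # FITS data is big-endian; BITPIX=16 → int16
data += b"\x00" * (-len(data) % 2880)    # pad data with zeros to a 2880 block

print(len(header), len(data))            # → 2880 2880
```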

FITS · TELESCOPE → MULTI-HDU FILE → ds9 / astropy → WCS → ANALYSIS · 望远镜 CCD(JWST / HST / Chandra · 16-bit / 32-bit raw)→ FITS 多 HDU 文件(PRIMARY · 主图 1024² float32 / EXT 1 · IMAGE · mask / EXT 2 · IMAGE · error / EXT 3 · IMAGE · 第二曝光 / EXT 4 · BINTABLE · 源星表;header 含 WCS · EXPTIME · INSTRUME · DATE-OBS)→ ds9 / astropy 读入($ ds9 image.fits;from astropy.io import fits; hdu = fits.open('img.fits'); data = hdu[0].data)→ WCS 应用(CRPIX + CRVAL + CDELT + CTYPE → pixel (col,row) 转 (RA, Dec) → 跨望远镜数据可叠加)→ 科学分析 / 论文图(测光 SourceExtractor · 光谱拟合 specutils · 时序光变 lightkurve · 多波段叠加 · 出版图 matplotlib · spectrum / lightcurve)。关键点:FITS 一个文件装下"图像 + mask + error + 源星表 + 光谱"整套观测产物,header 自描述、WCS 让跨望远镜数据可比较。这是一份观测的全部输出,而不只是"一张图" — 这条流水线 1981 年至今没有结构上的变化,只是工具变了(磁带 → CD → 网络 → S3)。JWST / Hubble / Chandra / SDSS 全部走这条管道 · MAST / IRSA / NED 等数据中心也按 FITS 提供下载。astropy + matplotlib + ds9 是天文程序员的"三件套" — 几乎所有现代天文论文里的图都从这条路径产出。

图 52 · FITS 完整天文工作流。望远镜 CCD 输出 16/32-bit raw,pipeline 写成多 HDU FITS(主图 + mask + error + 源星表),ds9 / astropy 读入后通过 header 里的 WCS 把像素转成 (RA, Dec) 天球坐标 —— 这一步让"今天 JWST 的红外图"和"30 年前 Hubble 的可见光图"可以叠在同一片天空上做联合分析。下游再走 SourceExtractor 测光、specutils 拟合光谱、matplotlib 出图,最后进入论文。这条流水线 1981 年至今没有结构上的变化,只是工具变了 —— 磁带 → CD → FTP → AWS S3。MAST(STScI)、IRSA(IPAC)、NED 等天文数据中心至今全部按 FITS 提供下载。

Fig 52 · The full FITS astronomical workflow. The telescope CCD emits 16/32-bit raw, the pipeline writes a multi-HDU FITS (main image + mask + error + source catalogue), ds9 / astropy reads it and uses the WCS in the header to convert pixels to (RA, Dec) sky coordinates — that single step lets "today's JWST infrared image" and "Hubble's visible-light image from thirty years ago" be stacked on the same patch of sky for joint analysis. Downstream: SourceExtractor for photometry, specutils for spectral fitting, matplotlib for publication figures, then into the paper. This pipeline has been structurally unchanged since 1981; only the tools changed — magnetic tape → CD → FTP → AWS S3. MAST (STScI), IRSA (IPAC), NED and other archives still distribute everything as FITS.

compression | typical ratio | lossy? | typical use
Rice | 2-3× | 无损 | 低噪整数图(CCD raw)
GZIP | 2-3× | 无损 | 通用 / 文本类数据
PLIO | 4-8× | 无损 | mask 图(整数 / 稀疏)
HCOMPRESS | 4-10× | 有损 | 巡天图像(JWST · Pan-STARRS)
$ ds9 image.fits                                # SAOImage 查看 · 天文标配 GUI
$ python -c "from astropy.io import fits; \
             hdu = fits.open('img.fits'); \
             print(hdu.info())"                  # Python · 列出所有 HDU
$ funpack img.fits.fz                            # CFITSIO 解 Rice / GZIP / HCOMPRESS
$ wcsinfo img.fits                               # 看 WCS · CRPIX / CRVAL / CTYPE
$ fitsverify img.fits                            # 校验是否合规 FITS · NASA 出品

适用

USE FOR

  • 所有天文数据(从太阳到深空)
  • 多维科学数据(N-D 数组、数据立方体)
  • 需要长期归档(40+ 年向后兼容)
  • 需要 ASCII 可读 metadata 的科学场景
  • 需要 BINTABLE 二进制表 + 图像同文件的工作流
  • 跨望远镜 / 跨时代数据叠加(WCS 一致性)
  • All astronomical data (Sun to deep sky)
  • Multidimensional scientific data (N-D arrays, data cubes)
  • Long-term archival (40-plus-year backward compatibility)
  • Scientific work needing ASCII-readable metadata
  • Workflows mixing BINTABLE and image in one file
  • Cross-telescope / cross-era data stacking (WCS consistency)

反适用

AVOID

  • 任何非科学场景(用 PNG / JPEG / TIFF)
  • 需要浏览器原生展示(零浏览器支持)
  • 对压缩比极度敏感的存档(用 HEIF / AVIF)
  • 不需要 WCS / metadata 的工作流(开销浪费)
  • Anything non-scientific (use PNG / JPEG / TIFF)
  • Native browser display (zero browser support)
  • Archives extremely size-sensitive (use HEIF / AVIF)
  • Workflows that don't need WCS / metadata (overhead waste)
scope | readers | editors / pipelines | CLI
FITS (IAU) | ✓✓ ds9 · astropy.io.fits · CFITSIO · IDL · IRAF · CASA · Aladin · ESA Datalabs · MAST / IRSA / NED 数据中心 | ✓✓ astropy 全家桶(specutils / photutils / lightkurve)· STScI Hubble pipeline · NASA JWST pipeline · ESO 镜像 | fitsinfo · fitsdump · fitscopy · fitsverify · funpack · wcsinfo
起源:origin: Don Wells / Eric Greisen / Ronald Harten · 1981 A&A paper · 1988 IAU 标准 设计动机:design motivation: 解决 1980s 天文台之间磁带格式互不兼容 + 自描述跨工具一致性 同源思想:cousins: TIFF(自描述容器思想)· GeoTIFF WCS 借鉴 FITS WCS · HDF5 / NetCDF(更现代但未替代) 现役:in service: JWST · Hubble · Chandra · SDSS · Pan-STARRS · LSST · 几乎所有现代望远镜 现实定位:real position: 天文唯一标准 · 40+ 年未替代 · 永恒存在 · 不与任何其它格式直接竞争

JP2 / JPX — JPEG 2000 在科学领域的活路

JP2 / JPX — JPEG 2000's afterlife in science

YEAR 2000(JP2)/ 2003(JPX) AUTHOR JPEG 委员会(ISO/IEC JTC 1/SC 29/WG 1) EXT .jp2 · .jpx · .j2k(裸 codestream) BASE JPEG 2000 codestream + ISOBMFF box 容器 STD ISO/IEC 15444-1(JP2)· 15444-2(JPX) LOSSY 可有可无(同一 codestream 切换) STATUS DICOM 内嵌 / 卫星归档 / 文化遗产

"JPEG 2000 在 web 死了,在医学和卫星领域活得很好。"

"JPEG 2000 died on the web; it lives well in medicine and satellites."

2000 年 ISO/IEC 15444-1 发布,JPEG 2000 标准里附带 JP2 文件结构 —— 一个 ISOBMFF 风格的 box 容器(跟 MP4 同源,2001 年才被定为 ISOBMFF;但 JP2 box 框架是更早成型的同套思路),内部装 JPEG 2000 codestream payload + 色彩管理 + 分辨率信息 + ICC profile 等元数据。2003 年 ISO/IEC 15444-2 定义 JPX 扩展,允许多 codestream(类似多页)、复杂 metadata、跟 XML 的元数据集成、富互动结构。设计本意是替代 JPEG 成为下一代主流 —— 任意分辨率层级解码、无损 + 有损切换、ROI(region of interest)优先解码、progressive 流式播放。但是 web 浏览器拒绝实现:Chromium / Firefox 都说 "JPEG 2000 解码 CPU 开销太大,而 web 流量是体积敏感不是质量敏感",只有 Safari 至今支持(macOS / iOS 走系统 ImageIO 框架)。结果 JPEG 2000 在主流场景死亡,但在不被浏览器决定的场景里活得很好:DICOM transfer syntax(医学影像标准内嵌 JP2 codestream)、卫星图像归档(ESA / NASA 部分管线)、文化遗产高保真扫描(Library of Congress 古籍数字化)—— 这些场景需要"任意分辨率层级解码"和"同一文件无损 + 有损切换"。

In 2000 ISO/IEC 15444-1 shipped, with the JPEG 2000 standard also defining the JP2 file structure — an ISOBMFF-style box container (cousin of MP4 — formally ISOBMFF only in 2001, but JP2's box framework is an earlier instance of the same philosophy) wrapping a JPEG 2000 codestream plus colour-management, resolution and ICC-profile metadata. In 2003 ISO/IEC 15444-2 defined the JPX extension, allowing multiple codestreams (page-like), richer metadata, XML metadata integration and interactive structures. The original ambition was to replace JPEG as the mainstream — any-resolution-layer decoding, lossless / lossy switch, region-of-interest priority decoding, progressive streaming. Browsers refused: Chromium and Firefox both said "JPEG 2000 decode is too CPU-heavy, and web traffic is size-sensitive not quality-sensitive"; only Safari supports it today (via macOS / iOS ImageIO). So JPEG 2000 died on the mainstream — and lives well in scenes browsers don't gatekeep: DICOM transfer syntaxes (medical-imaging standards embedding JP2 codestreams), satellite-image archiving (ESA / NASA pipelines), cultural-heritage high-fidelity scans (Library of Congress book digitisation) — places that need "arbitrary resolution layers" and "lossless / lossy in one file".

JP2 · ISOBMFF BOX TREE jP  · 12 byte signature ftyp · file type · 'jp2 ' jp2h · image header (super-box) ihdr · 宽 / 高 / 通道数 / bit depth colr · 色彩空间 · ICC profile jp2c · JPEG 2000 codestream(payload) JPX 扩展可加多 jp2c · uuid box · XML metadata
图 53 · JP2 文件 box 树。ISOBMFF 风格的层叠容器,根下挂 jP(12 byte 签名)/ ftyp(文件类型 'jp2 ')/ jp2h(image header super-box,内含 ihdr 宽高位深 / colr 色彩空间)/ jp2c(实际的 JPEG 2000 codestream payload)。JPX(ISO/IEC 15444-2)扩展可加多个 jp2c(类似多页)+ uuid 自定义 box + XML metadata。这套 box 哲学跟 MP4 / HEIF 同源 —— 都把"容器和编码解耦"当成第一原则。
Fig 53 · JP2 file box tree. An ISOBMFF-style nested container — at the root: jP (12-byte signature) / ftyp (file type 'jp2 ') / jp2h (image-header super-box containing ihdr for width/height/depth and colr for colour space) / jp2c (the actual JPEG 2000 codestream payload). JPX (ISO/IEC 15444-2) extends this with multiple jp2c boxes (page-like), uuid custom boxes and XML metadata. The same box philosophy as MP4 / HEIF — "container decoupled from codec" as first principle.
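The box grammar in Fig 53 is simple enough to walk by hand: each box starts with a 4-byte big-endian length (covering the 8-byte header) and a 4-byte type. A toy sketch on a synthetic two-box stream; real files may also use length == 1 for 64-bit extended sizes, which this ignores:

```python
import struct

def walk_boxes(buf: bytes):
    """Yield (type, payload) for each top-level box; no 64-bit sizes."""
    off = 0
    while off + 8 <= len(buf):
        length, btype = struct.unpack_from(">I4s", buf, off)
        yield btype.decode("ascii"), buf[off + 8 : off + length]
        off += length

# Synthetic stream: the 12-byte 'jP  ' signature box, then an 'ftyp' box.
stream = (struct.pack(">I4s4s", 12, b"jP  ", b"\r\n\x87\n") +
          struct.pack(">I4s4s", 12, b"ftyp", b"jp2 "))
print([t for t, _ in walk_boxes(stream)])   # → ['jP  ', 'ftyp']
```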

技术内核

Technical core

JP2 / JPX 内核两件事。① JP2 = ISOBMFF box 容器 + JPEG 2000 codestream payload:容器层负责文件组织(签名 / 文件类型 / 图像头 / 色彩管理 / ICC profile / 分辨率信息 / metadata),payload 层是 JPEG 2000 的 wavelet codestream(EBCOT 码块 + 分辨率层 + 质量层),两者解耦。这套 box 哲学跟 MP4 / HEIF 同源,工业实现都共享 ISOBMFF parser。② JPX(ISO/IEC 15444-2)是 JP2 的扩展,加多 codestream(可装多张图,类似 PDF 的多页)、复杂 metadata(XML / RDF 集成,适合文化遗产场景描述古籍册次 / 著录信息)、富互动结构(超链接、分层标注)。JP2 在医学和卫星归档活下来的真正原因不是容器多复杂,而是 JPEG 2000 codestream 的两个核心特性:(a) 同一文件可无损或有损切换 —— 用 reversible 5/3 wavelet 是无损,irreversible 9/7 wavelet 是有损,客户端按 quality layer 选;DICOM 1.2.840.10008.1.2.4.91 transfer syntax 就是有损 9/7,部分医院 CT 用它做长期归档;(b) 任意分辨率层级解码 —— wavelet 多分辨率天然支持"先看 1/8 缩略图,再按需解 1/4、1/2、原始",对超大幅图像(古籍数字化 50K×50K 像素 / 卫星 10000×10000 多波段)做"渐进 + ROI 优先"流式查看是杀手锏。这种"同一文件多种用法"的能力 JPEG / WebP / AVIF 都没有(它们要么必须有损要么必须无损,无法切换)。

JP2 / JPX core, two pieces. ① JP2 = ISOBMFF box container + JPEG 2000 codestream payload: the container layer handles file organisation (signature / file type / image header / colour management / ICC profile / resolution / metadata); the payload is the JPEG 2000 wavelet codestream (EBCOT code blocks + resolution layers + quality layers); the two are decoupled. The same box philosophy as MP4 / HEIF — industrial implementations share the same ISOBMFF parser. ② JPX (ISO/IEC 15444-2) extends JP2 with multiple codestreams (page-like, à la PDF), richer metadata (XML / RDF integration — perfect for cultural-heritage cataloguing of book volumes and bibliographic records) and interactive structures (hyperlinks, layered annotations). The real reason JP2 lives on in medicine and satellite archiving isn't container sophistication — it's two core properties of the JPEG 2000 codestream itself: (a) lossless / lossy in one file — the reversible 5/3 wavelet is lossless, the irreversible 9/7 is lossy, and the client picks via quality layer; DICOM transfer syntax 1.2.840.10008.1.2.4.91 is lossy 9/7, used by some hospital CT archives; (b) arbitrary-resolution-layer decoding — wavelet multi-resolution naturally lets you "see a 1/8 thumbnail first, then decode 1/4, 1/2, original on demand". For huge imagery (50K×50K-pixel book scans, 10000×10000 multi-band satellite scenes), "progressive + ROI-priority" streaming is the killer feature. JPEG / WebP / AVIF have no equivalent — they're forced lossy or forced lossless, no switch.
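The resolution-layer arithmetic is worth spelling out: discarding r wavelet decomposition levels at decode time yields an image at roughly 1/2^r scale per side (this is what OpenJPEG's opj_decompress -r flag exposes). A sketch:

```python
import math

def reduced_size(width: int, height: int, r: int):
    """Approximate decoded size after discarding r resolution levels."""
    return math.ceil(width / 2**r), math.ceil(height / 2**r)

# A 50K×50K heritage scan: thumbnail first, full detail only on demand.
for r in (3, 2, 1, 0):
    print(r, reduced_size(50000, 50000, r))
# r=3 → (6250, 6250): a 1/8-scale preview from a fraction of the codestream
```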

适用

USE FOR

  • DICOM transfer syntax(医学影像归档)
  • 卫星图像归档(ESA / NASA / 部分商业卫星)
  • 文化遗产高保真扫描(LoC · 大英图书馆古籍)
  • 电影 DCP(Digital Cinema Package · DCI 强制 JP2)
  • 需要同一文件无损 / 有损切换的存档场景
  • 需要任意分辨率层级 + ROI 优先解码的超大图
  • DICOM transfer syntax (medical-image archives)
  • Satellite-image archiving (ESA / NASA / commercial sats)
  • Cultural-heritage high-fidelity scans (LoC, British Library)
  • Digital Cinema Package (DCI mandates JP2)
  • Archives needing in-file lossless / lossy switch
  • Huge images needing arbitrary-resolution + ROI-priority decode

反适用

AVOID

  • Web(只 Safari 原生,Chromium / Firefox 拒绝)
  • 移动端(解码 CPU / 内存开销大)
  • 追求极致压缩比的现代场景(用 AVIF / HEIF)
  • 不需要分辨率层级或 ROI 的普通图像
  • The web (Safari only — Chromium / Firefox refused)
  • Mobile (heavy decode CPU / memory)
  • Modern scenes chasing peak compression (use AVIF / HEIF)
  • Plain images without resolution-layer / ROI needs
scope | readers | editors | CLI
JP2 / JPX | ~ Safari(macOS / iOS · 原生)· OpenJPEG(开源)· Kakadu(商业 · 性能基准)· DICOM 阅读器(MicroDicom · OsiriX · Horos)· GDAL · ImageMagick | ~ Photoshop(JP2 / JPX 插件)· GIMP(部分版本)· IrfanView · 文化遗产专用扫描软件 | opj_decompress(OpenJPEG)· kdu_expand(Kakadu)· gdal_translate -of JP2OpenJPEG
起源:origin: JPEG 委员会 · 2000 ISO/IEC 15444-1(JP2)· 2003 15444-2(JPX) 基于:based on: JPEG 2000 wavelet codestream + ISOBMFF box 容器 同源容器哲学:cousin container philosophy: MP4 / HEIF / AVIF · 都是 ISOBMFF box 框架 现实定位:real position: Web 死亡 · DICOM / 卫星归档 / 电影 DCP / 文化遗产高保真活得很好

WebP2 — Google 的实验后代

WebP2 — Google's experimental heir

YEAR 2021 起开发 AUTHOR Google libwebp2 团队 EXT .wp2(实验性) BASE 基于 AV1 思想自研 codec(非直接 AV1) STATUS 实验性 · libwebp2 0.x · 未推到 Chrome CONTEXT 与 AVIF / JXL 性能对比中

"WebP 的 mark II,但 AVIF 已经赢在了起跑线。"

"WebP Mark II — but AVIF already crossed the finish line."

2021 年 Google 启动 WebP2 项目,目标是"做 WebP 的下一代"—— 不再像 WebP v1 那样基于 VP8 帧内编码,而是吸收 AV1(2018 年发布的下一代视频 codec)的思想自研一套全新 codec,同时保留 WebP 的 web 优先哲学(简单容器、轻量解码、Chrome 直接支持)。但设计窗口已经关闭:WebP2 启动时,AVIF(直接基于 AV1 的图片格式)2019 年已被 Netflix / Google 推动落地,2020 年 Chrome 加入支持,2022 年 Firefox / Safari 全部跟进 —— Google 自己的浏览器都已经先支持了竞品。WebP2 在 libwebp2 仓库慢慢迭代,但从未推到 Chrome 主流支持;Google 自己也没有公开宣布要替代 WebP 或 AVIF。结果今天 WebP2 处于一种尴尬状态:技术上是真的在写,数据上压缩率确实跟 AVIF / JXL 在同一档,但商业上没有任何动机推它落地 —— 因为 AVIF 已经占据了"下一代 web 图片格式"的生态位。WebP2 项目自己的 README 第一句话就承认:"WebP 2 is an experimental successor of WebP. WebP 2 is not WebP, neither v2 of WebP."(WebP 2 是 WebP 的实验后继者,既不是 WebP,也不是 WebP 的 v2)—— 一个少见的、官方亲自打的"这是研究项目,不是产品"标签。

In 2021 Google started the WebP2 project, aiming to build "WebP's next generation" — no longer based on VP8 intra-frame coding like WebP v1, but absorbing ideas from AV1 (the 2018 next-gen video codec) into a freshly engineered codec, while keeping WebP's web-first philosophy (simple container, light decoder, native Chrome support). But the design window had already closed: by the time WebP2 launched, AVIF (the image format directly built on AV1) had been driven into production by Netflix and Google in 2019, picked up by Chrome in 2020, and joined by Firefox and Safari by 2022 — Google's own browser already supported the competitor. WebP2 keeps iterating in the libwebp2 repo, but has never been promoted to mainstream Chrome support; Google has never publicly committed to it replacing WebP or AVIF. Today WebP2 sits in an awkward limbo: technically real, with compression on par with AVIF / JXL, but with zero commercial pressure to ship — AVIF already owns the "next-gen web image" ecological niche. The project's own README opens with a rare self-aware disclaimer: "WebP 2 is an experimental successor of WebP. WebP 2 is not WebP, neither v2 of WebP." A research project, officially labelled as such.

CODEC ACTIVITY · 2010-2026(横轴 2010 → 2026)· WebP mainstream(Chrome 2010)· WebP2 experimental · AVIF rising(AV1 2018 → AVIF 2019)· JXL niche
图 54 · 四种"下一代 web 图片格式"的活跃度时间线 2010-2026。WebP(2010 Chrome 原生支持)是真正的主流;WebP2(2021 起开发)是细线 —— 实验性,从未被推到 Chrome 主线;AVIF(基于 2018 AV1,2019 落地)抢先占位"下一代 web 图片格式"生态位;JXL 在桌面 / 摄影 niche 活下来。WebP2 的窗口被 AVIF 关上的那一刻,就是 Chrome 自己 2020 年加入 AVIF 支持。
Fig 54 · activity timeline of four "next-gen web image" candidates, 2010-2026. WebP (Chrome native 2010) is actually mainstream; WebP2 (started 2021) is a thin line — experimental, never promoted into Chrome's main line; AVIF (built on AV1 2018, shipped 2019) seized the "next-gen web image" niche first; JXL survives in a desktop / photography niche. The moment that closed WebP2's window was Chrome itself adding AVIF support in 2020.

技术内核

Technical core

WebP2 内核两件事。① 基于 AV1 思想自研 codec(不直接用 AV1) —— Google 没有像 AVIF 那样直接抄 AV1 的帧内编码,而是从 AV1 借鉴几个思路(更大的 transform block 64×64、更聪明的 intra prediction、entropy coding 改进)然后自研一套独立 codec。原因有政治也有技术:技术上,Google 想做更轻量的解码器,AVIF 的解码器其实是 AV1 的子集,代码量大,移动设备 CPU 紧;政治上,WebP / VP9 / AV1 都是 Google 系的开放视频 codec 生态,WebP2 是想做"web 图片专用、不背 video codec 包袱"的小而美。但代价是 —— 没有现成的 AV1 解码器可借,得自己写。② 仍在 Google libwebp2 开发,未推到 Chrome 主流 —— libwebp2 是 Google 自己的开源库,在 GitHub 持续提交,但 Chrome 至今没有 webp2 的 image decoder 注册(对比 WebP 是 2010 年原生,AVIF 是 2020 年原生)。Google 自己也没公开 commit 推它落地 —— 一种"我们继续研究,但不答应商业化"的姿态。这种姿态在大公司开源项目里很少见,通常要么开发要么砍,WebP2 罕见地处于"长期实验状态"。

WebP2 core, two pieces. ① AV1-inspired but home-grown codec (not AV1 itself) — Google did not, like AVIF, just adopt AV1's intra-frame coding directly. Instead it borrowed ideas from AV1 (larger 64×64 transform blocks, smarter intra prediction, improved entropy coding) and engineered its own independent codec. The reason is partly political, partly technical: technically, Google wanted a lighter decoder — AVIF's decoder is essentially a subset of AV1, code-heavy and tight on mobile CPU; politically, WebP / VP9 / AV1 are all Google-aligned open video codecs, and WebP2 was meant to be a small purpose-built web-image codec without the video-codec baggage. The cost: no off-the-shelf AV1 decoder to borrow — everything written from scratch. ② Still in Google's libwebp2, never promoted to mainstream Chrome — libwebp2 is Google's own open-source library, with continuing GitHub commits, but Chrome has no webp2 image decoder registered (compare: WebP native since 2010, AVIF native since 2020). Google has never publicly committed to shipping it — a "we keep researching, but won't promise productisation" posture. Rare for big-company open source: usually it's either ship or kill — WebP2 sits in unusual long-term experimental limbo.

适用

USE FOR

  • (研究)Codec 对比基准
  • libwebp2 开发者社区实验
  • 关注下一代图像 codec 的从业者跟踪样本
  • (Research) codec comparison benchmarks
  • libwebp2 developer-community experiments
  • Tracking sample for next-gen image-codec watchers

反适用

AVOID

  • 任何生产环境(浏览器原生支持为零)
  • 任何对兼容性有要求的场景
  • 替代 AVIF / WebP / JXL —— 没有理由
  • Any production setting (zero native browser support)
  • Any compatibility-sensitive scenario
  • Replacing AVIF / WebP / JXL — no reason to
scope | readers | editors | CLI
WebP2 | 无浏览器原生 · ~ Google libwebp2 库自带的参考解码器 | 无主流编辑器支持 | cwp2 / dwp2(libwebp2 仓库自带 · 仅参考实现)
起源:origin: Google libwebp2(2021 起)· 基于 AV1 思想自研 前身:predecessor: WebP(VP8 帧内 → libwebp2 重做 codec) 平行竞争:parallel competitors: AVIF / JXL · 都在抢"下一代 web 图片"位 现实定位:real position: 未来不确定 · 长期实验状态 · 浏览器原生支持为零

AVIF Sequence — 视频帧序列

AVIF Sequence — when stills become a video track

YEAR 2019(与 AVIF 同期) AUTHOR AOMedia(Alliance for Open Media) EXT .avif(同 AVIF 单图) BASE HEIF(ISOBMFF)容器 · AV1 video codec MODE 多 image item / video track + 帧间预测 STATUS 与 AVIF 相同 · Chrome / Firefox / Safari modern

"AVIF 的'多帧'就是把 AV1 的 video 模式接回来。"

"AVIF's multi-frame mode just dials AV1's video back in."

AVIF 单图模式只用 AV1 的 intra-frame(关键帧)编码 —— 因为 web 图片不需要"前一帧后一帧"。但 AVIF 用的容器是 HEIF(基于 ISOBMFF,跟 MP4 / JP2 同源),HEIF 容器原本就是为视频设计的,有完整的 video track 概念。所以 AVIF Sequence 做的事情非常简单:把 AV1 的 video 模式装回去 —— 让一个 AVIF 文件可以装多帧、有 timeline、可循环、可带帧间预测。结果是一种"高质量短动图替代品":代替 GIF 的 8 bit 256 色 + LZW 暴体积、代替 animated WebP 的 VP8 老 codec。实测体积比 animated WebP 小 30-50%,因为 AV1 的帧间预测远比 VP8 高效。但代价是 —— AVIF Sequence 是真正的视频压缩,带 motion estimation / motion compensation,编码时间是 animated WebP 的 10-30×。这意味着服务器侧预编码可行,用户实时上传不行:Twitter / Reddit / Imgur 这种用户上传场景你不能让用户等 30 秒;但 Cloudinary / imgix 这种 CDN 中间层服务器预编码 OK。AVIF Sequence 现在的实际用法:替代 GIF 表情包(质量 + 体积都赢)、替代 web 短动画(.mp4 太重 / GIF 太丑的中间地带)、替代某些 Live Photo 场景(iOS HEIF 走的是相邻路线)。

AVIF's still-image mode only uses AV1's intra-frame (keyframe) coding — web images don't need "previous-frame / next-frame". But AVIF's container is HEIF (built on ISOBMFF, sharing roots with MP4 / JP2), and HEIF was designed for video in the first place, with a full video-track concept. So AVIF Sequence does something extremely simple: dial AV1's video mode back in — let a single AVIF file hold multiple frames, with a timeline, looping, and inter-frame prediction. The result is a "high-quality short-animation substitute": replacing GIF's 8-bit 256-colour LZW bloat and animated WebP's older VP8 codec. Measured sizes are 30–50% smaller than animated WebP, because AV1's inter-frame prediction is far more efficient than VP8. The cost: AVIF Sequence is real video compression, with motion estimation / motion compensation — encoding takes 10–30× longer than animated WebP. So server-side pre-encoding is fine, real-time user uploads are not: Twitter / Reddit / Imgur, where users upload live, can't make the user wait 30 seconds; Cloudinary / imgix as a CDN middle layer can. Today's actual uses: replacing GIF stickers (better quality and smaller); replacing web short animations (the middle ground between heavy .mp4 and ugly GIF); replacing some Live-Photo flows (iOS HEIF takes a parallel path).

AVIF SEQUENCE · HEIF BOX TREE · ftyp · brand 'avis' (sequence) vs 'avif' (still) · meta · image items (still mode reuse) · iinf · iref · iloc · ipma — describe per-frame items · moov · video track · AV1 codec · frame timeline · I P P P P I · I = intra(关键帧)· P = 帧间预测
图 55 · AVIF Sequence 的 HEIF box 结构。ftyp 用 brand avis(sequence)区分单图 avif;meta 装 image items(沿用单图模式的描述方式);moov 是真正的视频 track,装 AV1 codec 的 I 帧(intra,关键帧)+ P 帧(inter,帧间预测)的时间线。"I-P-P-P-P-I-..."就是 AVIF Sequence 比 animated WebP 小 30-50% 的原因 —— P 帧只编码"和上一帧的差",而 animated WebP 每帧都是独立的。
Fig 55 · AVIF Sequence's HEIF box structure. ftyp uses brand avis (sequence) to distinguish from still avif; meta holds image items (reusing still-mode description); moov is the actual video track with an AV1-codec timeline of I-frames (intra / keyframe) and P-frames (inter / predicted from previous). The "I-P-P-P-P-I-..." pattern is exactly why AVIF Sequence is 30–50% smaller than animated WebP — P-frames encode only the delta, while animated WebP encodes every frame independently.
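The avif-vs-avis distinction is sniffable in a dozen lines. A hedged sketch that assumes ftyp is the first box (true in practice) and ignores compatible brands:

```python
import struct

def avif_kind(data: bytes) -> str:
    """Classify a HEIF buffer by its ftyp major brand."""
    size, box, brand = struct.unpack_from(">I4s4s", data, 0)
    if box != b"ftyp":
        return "not-heif"
    return {b"avif": "still", b"avis": "sequence"}.get(brand, "other-heif")

# Synthetic header: size, 'ftyp', major brand 'avis', then minor_version.
sample = struct.pack(">I4s4s4s", 16, b"ftyp", b"avis", bytes(4))
print(avif_kind(sample))   # → sequence
```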

技术内核

Technical core

AVIF Sequence 内核两件事。① HEIF 容器内多 image item 或 video track —— HEIF(High Efficiency Image Format)容器是 ISOBMFF 风格的 box 结构,跟 MP4 / JP2 同源。AVIF 单图模式用 meta box 装一个 image item(只一帧 intra);AVIF Sequence 有两种装法:(a)多 image item(每帧独立 intra,跟单图一样,只是多个);(b)走 moov video track(真正的视频 track,有时间戳、可循环、可装 inter 帧)。ftyp 用 brand 区分:avif 是单图,avis 是 sequence。② 帧间预测可选(不一定都是 intra) —— 如果走 video track 模式,AVIF Sequence 就是真正的视频压缩:P 帧编码"和上一帧的差",B 帧编码"和前后帧的差",带 motion estimation / motion compensation 整套机制。这是它比 animated WebP / GIF 小 30-50% 的根因 —— animated WebP 每帧都是独立 VP8 intra(本质上是多张静图叠在一起),AVIF Sequence 真把"运动"压缩了。但这个能力的代价非常贵:编码时间 10-30× animated WebP,因为 motion estimation 是计算密集型搜索;客户端实时编码不可行(用户上传不能让其等 30 秒),只能服务器预编码或 CDN 中间层转码。这种"质量 / 体积赢、编码慢"的权衡跟 AVIF 单图是一致的 —— AVIF 全家就是"花更多 CPU 换更小文件"。

AVIF Sequence core, two pieces. ① Multiple image items or a video track inside HEIF — the HEIF (High Efficiency Image Format) container is an ISOBMFF-style box structure, sharing roots with MP4 / JP2. AVIF still mode places a single image item in a meta box (one intra frame). AVIF Sequence has two ways: (a) multiple image items (every frame independent intra, like still mode but several of them); (b) a real moov video track (with timestamps, looping, and inter frames). The ftyp brand distinguishes them: avif for still, avis for sequence. ② Inter-frame prediction is optional (not necessarily all intra) — in video-track mode AVIF Sequence is actual video compression: P-frames encode the delta from the previous frame, B-frames encode deltas from both sides, complete with motion estimation / motion compensation. This is why it's 30–50% smaller than animated WebP / GIF — animated WebP is essentially a stack of independent VP8-intra stills, while AVIF Sequence really compresses motion. The cost: encoding takes 10–30× animated WebP, because motion estimation is a compute-intensive search; client-side real-time encoding isn't viable (you can't make a user wait 30 seconds on upload), so it lives on the server side or in a CDN transcoder. The same "quality / size win, slow encode" trade-off as still AVIF — the whole AVIF family trades CPU for smaller files.

适用

USE FOR

  • 高质量短动图(替代 GIF / animated WebP)
  • 表情包 / sticker / 反应图(质量 + 体积双赢)
  • web 短动画(.mp4 太重 / GIF 太丑的中间地带)
  • 服务器预编码 / CDN 中间层转码场景
  • Live Photo 类场景(短视频 + 关键帧静图)
  • High-quality short animations (replacing GIF / animated WebP)
  • Stickers / reactions (smaller and better-looking)
  • Web short animations (the middle ground between heavy .mp4 and ugly GIF)
  • Server-side pre-encode / CDN transcode setups
  • Live-Photo-like flows (short video plus a keyframe still)

反适用

AVOID

  • 客户端实时编码(用户上传场景 · 编码 10-30× 慢)
  • 需要真正视频功能(音轨 / 长时长 · 改用 .mp4)
  • 老浏览器兼容场景(同 AVIF · IE / 老 Safari 不行)
  • Client-side real-time encoding (user uploads — 10–30× slower)
  • True video features (audio track / long duration — use .mp4)
  • Legacy browsers (same as AVIF — IE / old Safari out)
scope | readers | editors | CLI
AVIF Sequence | Chrome / Firefox / Safari modern · iOS / macOS Photos · libavif | ~ ffmpeg(via libavif)· FFmpeg-based 转码工具 | avifenc -k 0 frames/*.png anim.avif · ffmpeg -i in.mp4 -c:v libaom-av1 out.avif
起源:origin: AOMedia AV1(2018)· AVIF(2019)同时定义了 sequence 模式 基于:based on: AVIF 单图 + AV1 video codec + HEIF video track 替代:replaces: animated WebP / GIF(高质量短动图场景) 现实定位:real position: 服务器预编码场景 OK · 用户实时上传不行 · 表情包 / 短动画首选

JPEG XS — 低延迟广播

JPEG XS — sub-millisecond broadcast

YEAR 2018(ISO/IEC 21122) AUTHOR JPEG WG(ISO/IEC JTC 1/SC 29/WG 1) EXT .jxs · 多直接走流 BASE 简化 wavelet(不完整 EBCOT) DESIGN < 1 ms 编 / 解延迟 · 视觉无损(高 bpp) STATUS 广播(SMPTE 2110)/ IP 视频 / VR

"为'实时'而生:压缩比小,延迟极低,正好接 4K/8K 直播。"

"Built for live — modest compression, microsecond latency, just right for 4K/8K broadcast."

现代 4K / 8K 广播正在从 SDI 光纤切到 IP 流(SMPTE 2110 标准):传统电视台用 12G-SDI 光纤把 4K 信号从摄像机送到导播台,布线昂贵;新一代直接走以太网 IP 包,跟数据中心同基础设施。但 IP 流要解决一个传统 SDI 不存在的问题:带宽。4K 60p 未压缩是 12 Gbps,8K 是 48 Gbps,数据中心万兆 / 25G 以太网装不下。所以需要"压一下,但不能影响实时性"的 codec。JPEG XL 太复杂(编码慢)、JPEG 2000 也慢(EBCOT entropy coding 计算量大)、H.264 / H.265 / AV1 是视频 codec 但有帧间预测延迟(至少要缓 1-2 帧才能编),完全不行。JPEG WG 在 2018 年推 JPEG XS(ISO/IEC 21122):简化的 wavelet(不做完整 EBCOT,只用更轻量的 entropy coding),牺牲压缩比(只 4-6×,而 JPEG 是 10-20×、JPEG 2000 是 20-50×),换微秒到亚毫秒级编 / 解延迟。设计目标写在标准首页:"visually lossless at 4-6× compression with sub-millisecond latency"(视觉无损 + 4-6 倍压缩 + 亚毫秒延迟)。SMPTE 2110-22(2019)正式把 JPEG XS 列入 IP 广播标准的 mezzanine compression 层。VR 头显的 wireless display(无线 VR · 把 PC 渲染的画面无线传到头显)也用 —— 因为头显需要"运动到光子"<20ms 延迟才能不晕,JPEG XS 的 <1ms 编 / 解给了足够预算。

Modern 4K / 8K broadcast is moving from SDI fibre to IP streams (SMPTE 2110): traditional TV stations used 12G-SDI fibre to ship 4K from camera to control room — expensive cabling. New ones run straight Ethernet IP, sharing infrastructure with data centres. But IP brings a problem SDI never had: bandwidth. 4K 60p uncompressed is 12 Gbps; 8K is 48 Gbps — 10/25 Gigabit Ethernet can't carry it raw. So you need a codec that "compresses a little without breaking real-time". JPEG XL is too complex (slow encode); JPEG 2000 is also slow (EBCOT entropy coding is heavy); H.264 / H.265 / AV1 are video codecs but have inter-frame-prediction latency (need to buffer 1–2 frames before encoding), totally unacceptable. The JPEG WG shipped JPEG XS in 2018 (ISO/IEC 21122): simplified wavelet (no full EBCOT, lighter entropy coding), trading compression ratio (only 4–6× — vs. JPEG's 10–20× and JPEG 2000's 20–50×) for microsecond-to-sub-millisecond encode / decode latency. The standard's front page literally says: "visually lossless at 4-6× compression with sub-millisecond latency". SMPTE 2110-22 (2019) formally adopted JPEG XS as the mezzanine-compression layer for IP broadcast. VR headsets using wireless display (PC-rendered frames sent wirelessly to the headset) use it too — because headsets need "motion-to-photon" latency under 20 ms to avoid sickness, and JPEG XS's sub-1 ms encode / decode leaves enough budget for everything else.
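The bandwidth arithmetic in this paragraph checks out in a few lines; the 24 bpp raw assumption is mine, chosen because it reproduces the quoted 12 / 48 Gbps figures:

```python
# Back-of-envelope check of the bitrates quoted above.
# Assumption (mine, not the standard's): 8-bit RGB, i.e. 24 bits per pixel raw.

def raw_gbps(width: int, height: int, fps: int, bpp: int = 24) -> float:
    """Uncompressed bitrate in Gbit/s."""
    return width * height * fps * bpp / 1e9

uhd_4k = raw_gbps(3840, 2160, 60)   # ~11.9 Gbps, the "12 Gbps" in the text
uhd_8k = raw_gbps(7680, 4320, 60)   # ~47.8 Gbps, the "48 Gbps" in the text

# JPEG XS at its typical visually-lossless ratios:
for ratio in (4, 6):
    print(f"4K60 at {ratio}x -> {uhd_4k / ratio:.2f} Gbps")   # both fit a 10 GbE link
```

At 4-6×, 4K60 lands between roughly 2 and 3 Gbps, which is the whole point: it slips inside a stock 10 Gigabit Ethernet port with room for the rest of the SMPTE 2110 traffic.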

LATENCY × COMPRESSION · LOG-LOG | x: latency (encode + decode) · μs → 1s · y: 压缩比 / ratio | JPEG XS · 4-6× · <1ms(SMPTE 2110)| JPEG · 10-20× · ~10ms | JP2 · 30-50× · ~100ms
图 56 · JPEG / JPEG 2000 / JPEG XS 在"延迟 × 压缩比"二维平面上的散点。横轴是编 + 解总延迟(对数,μs → s),纵轴是压缩比。JPEG XS 在左下角(亚毫秒延迟、4-6× 压缩比),JPEG 在中间(~10ms、10-20×),JPEG 2000 在右上角(~100ms、30-50×)。三者各占不同生态位:JPEG XS = 实时广播 / VR、JPEG = 通用 web、JPEG 2000 = 离线归档。同一委员会出三种工具,各管一片。SMPTE 2110 标记标注 JPEG XS 的实际部署位置。
Fig 56 · JPEG / JPEG 2000 / JPEG XS plotted on a two-dimensional latency × compression-ratio plane. X axis is total encode + decode latency (log scale, μs → s); Y axis is compression ratio. JPEG XS sits bottom-left (sub-millisecond, 4–6×); JPEG middle (~10 ms, 10–20×); JPEG 2000 top-right (~100 ms, 30–50×). Three different niches: JPEG XS = real-time broadcast / VR, JPEG = general web, JPEG 2000 = offline archives. The same committee shipped three tools, each owning a region. The SMPTE 2110 marker shows JPEG XS's actual deployment.

技术内核

Technical core

JPEG XS 内核三件事。① 简化的 wavelet(不做完整 EBCOT) —— JPEG 2000 的核心是 5/3 或 9/7 wavelet 加上 EBCOT(Embedded Block Coding with Optimal Truncation)entropy coding,EBCOT 提供超高压缩比但计算密集。JPEG XS 砍掉 EBCOT,只保留更简化的小波分解 + 轻量 entropy coding(直接 run-length / VLC),损失大概 5-10× 压缩比但获得10-100× 速度。② 视觉无损(typical 4-6× compression) —— 设计目标不是"压到最小",而是"压到肉眼看不出区别但带宽够省"。在 4K 60p 12 Gbps 场景,4-6× 压到 2-3 Gbps,刚好塞进 10 Gigabit Ethernet。这种"够用就好"的目标决定了它不会出现在 web(web 要求最小体积)、不会出现在归档(归档要求最高保真)。③ 帧内独立编码,无需缓冲 —— 每帧完全独立(类似 motion JPEG / motion JPEG 2000),没有帧间预测,所以编码器拿到一帧立刻编、解码器拿到一帧立刻解,延迟主要来自计算时间本身(<1 ms),不来自缓冲。这是相比 H.264 / AV1 这些视频 codec 的本质差异:视频 codec 必须缓冲 1-2 帧才能做 motion estimation,JPEG XS 完全不缓冲。代价是没有视频 codec 那种"压两个数量级"的能力,但这是 trade-off,不是缺陷。

JPEG XS core, three pieces. ① Simplified wavelet (no full EBCOT) — JPEG 2000's core is the 5/3 or 9/7 wavelet plus EBCOT (Embedded Block Coding with Optimal Truncation) entropy coding; EBCOT delivers very high compression but at heavy computational cost. JPEG XS strips EBCOT, keeping a much simpler wavelet decomposition plus lightweight entropy coding (direct run-length / VLC) — losing roughly 5–10× compression but gaining 10–100× speed. ② Visually lossless (typical 4–6× compression) — the design goal isn't "compress as much as possible", it's "compress until the eye can't tell, while saving useful bandwidth". On 4K 60p at 12 Gbps, 4–6× brings it down to 2–3 Gbps — fitting cleanly inside 10 Gigabit Ethernet. This "good enough" target keeps it out of the web (which wants the smallest size) and out of archives (which want the highest fidelity). ③ Intra-frame independent coding, no buffering — every frame is fully independent (like motion JPEG / motion JPEG 2000), no inter-frame prediction, so the encoder can encode the moment a frame arrives and the decoder can decode the moment it lands. Latency comes from compute alone (< 1 ms), not buffering. This is the essential difference vs. H.264 / AV1 video codecs: video codecs must buffer 1–2 frames to run motion estimation; JPEG XS buffers nothing. The price is no two-orders-of-magnitude video-style compression — but that's the trade-off, not a defect.

适用

USE FOR

  • 4K / 8K 广播 IP 流(SMPTE 2110-22 mezzanine 层)
  • VR 头显 wireless display(PC → 头显无线传图)
  • 实时多机位摄影棚 IP 切换台
  • 低延迟视频墙 / 监控墙(楼宇 / 控制室)
  • "够用即可、宁要延迟不要压缩比"的实时场景
  • 4K / 8K broadcast IP streams (SMPTE 2110-22 mezzanine)
  • VR-headset wireless display (PC → headset)
  • Live multi-camera studio IP switching
  • Low-latency video walls (control rooms / signage)
  • Real-time use cases where "good enough" beats "smallest"

反适用

AVOID

  • Web 静图(用 JPEG / WebP / AVIF)
  • 归档 / 压缩比敏感场景(用 JPEG 2000 / JXL)
  • VOD / 离线点播(用 H.265 / AV1 视频 codec)
  • Web stills (use JPEG / WebP / AVIF)
  • Archives / size-sensitive scenes (use JPEG 2000 / JXL)
  • VOD / offline playback (use H.265 / AV1 video codecs)
scope | readers | editors | CLI
JPEG XS | 浏览器无 · ~ intoPIX SDK · SMPTE 2110-22 设备 · Kakadu(部分) | 无主流编辑器 · 都是广播链路硬件 / 软件 | ~ CLI 限商业(intoPIX)· 开源参考实现仅用于研究
起源:origin: JPEG WG · 2018 ISO/IEC 21122 · 为 IP 广播 / 实时设计 基于:based on: JPEG 2000 简化版本(砍 EBCOT 换速度) 平行存在:parallel niche: 在广播 / VR 业各自占据"低延迟视觉无损"生态位 现实定位:real position: SMPTE 2110-22 · 4K / 8K IP 广播 mezzanine · VR wireless display

神经压缩 — HiFiC / CDC / NN-codec

Neural compression — HiFiC / CDC / NN-codec

YEAR 2017 起(Toderici 2016 / Ballé 2018 / HiFiC 2020 / CDC 2023) AUTHOR Google / NYU / Stanford / Tencent / NVIDIA / Disney Research EXT 无统一(.nn-img · .hific · .cdc 各家各自) BASE Encoder NN + Hyperprior + Entropy + Decoder NN(±GAN ±Diffusion) DEPLOYMENT 解码必须 GPU(模型 10-50M params · 几十 MB) STATUS 实验 / 部分商用试点 · 短期不替代 AVIF

"用神经网络当 codec —— 同 bpp 下视觉效果比 AVIF 好 30%,但解码要 GPU。"

"Use a neural net as codec — visually 30% better than AVIF at same bpp, but needs a GPU to decode."

传统 codec 的设计哲学是手工设计 transform + 量化 + 熵编码:JPEG 用 8×8 DCT、AVIF 用 AV1 intra block 变换、JPEG 2000 用 wavelet —— 每一步都是人写的数学。神经压缩从 2016-2017 起换了路:整个 codec 是一个端到端可训练的神经网络。Toderici 等人 2016 年在 ICLR 用 RNN 做图像压缩;Ballé 等人 2018 年在 ICLR 提出 Hyperprior(用一个小网络估计 latent 的概率分布给熵编码器,大幅提升压缩比);Mentzer / Toderici 等人 2020 年在 NeurIPS 发表 HiFiC(High-Fidelity Generative Compression),引入 GAN 训练让低 bpp 重建有"细节合成";2023 年 Stanford 出 CDC(Conditional Diffusion Codec)用扩散模型当 decoder。在视觉相似度指标(MS-SSIM / LPIPS)上明显赢传统 codec —— 特别在极低 bpp(< 0.3 bpp):传统 codec 这时已经糊成方块、出 ringing,而 NN codec 可以"幻觉"出合理的纹理和细节(虽然不是真实的 —— 是plausibly hallucinated)。但工业部署寥寥:解码器 NN 必须随客户端分发(几十 MB 模型 vs 几 KB 图,反向负担);模型版本升级会让旧 .nn-img 解不出来;解码 GPU 依赖让移动端不可接受;学术界每 6 个月一篇 NeurIPS 论文宣布超越 AVIF 30%,但生产部署没几个真站住的。短期不会替代 AVIF,但可能在"AI 生成内容"领域率先落地 —— 同 AI 生成的图,用 AI 压缩。

Traditional codecs are hand-designed transforms + quantisation + entropy coding: JPEG uses 8×8 DCT, AVIF uses AV1 intra-block transforms, JPEG 2000 uses wavelets — every step is human-written maths. Neural compression took a different path from 2016–2017: the whole codec is a single end-to-end trainable neural network. Toderici et al. did RNN-based compression at ICLR 2016; Ballé et al. introduced the Hyperprior at ICLR 2018 (a small network estimating the latent's probability distribution for the entropy coder, dramatically improving ratios); Mentzer / Toderici et al. published HiFiC (High-Fidelity Generative Compression) at NeurIPS 2020, adding GAN training so low-bpp reconstructions get "detail synthesis"; in 2023 Stanford shipped CDC (Conditional Diffusion Codec) using a diffusion model as decoder. On visual-similarity metrics (MS-SSIM / LPIPS) they clearly beat traditional codecs — especially at very low bpp (< 0.3 bpp): traditional codecs by then are blocky and full of ringing, while NN codecs can "hallucinate" plausible texture and detail (not real — plausibly hallucinated). Industrial deployment stays thin: the decoder NN must ship with the client (tens of MB of model vs. a few KB of image — inverted load); a model-version bump makes old .nn-img files undecodable; GPU dependency rules out mobile; every six months a NeurIPS paper claims +30% over AVIF, but few productions actually stick. Short term it won't replace AVIF, but it may land first in "AI-generated content" — AI images compressed by AI.

NN CODEC · ENCODE / DECODE PIPELINE | ENCODE: image → Encoder NN → latent → quantise + entropy → .nn-img bytes | DECODE: bytes → entropy decode → latent → Decoder NN → image | 解码 NN 必须随客户端分发(几十 MB 模型)· GPU 加速
图 57a · 神经压缩典型 codec pipeline。编码侧:image → Encoder NN → latent(continuous tensor)→ quantise + entropy → bytes。解码侧:bytes → entropy decode → latent → Decoder NN → image。Encoder 和 Decoder 都是 CNN(典型 10-50M params),通过端到端联合训练优化"重建损失 + 熵率"的拉格朗日组合。最关键的非对称是:解码 NN 必须随客户端分发,这给"图很小但模型很大"的反向负担埋下伏笔 —— web 流量是体积敏感的,几十 MB 的模型一次下载,几 KB 的图无数次,要算才算得过。
Fig 57a · A typical neural-codec pipeline. Encode: image → Encoder NN → latent (continuous tensor) → quantise + entropy → bytes. Decode: bytes → entropy decode → latent → Decoder NN → image. Encoder and Decoder are both CNNs (typically 10–50 M params), trained end-to-end against a Lagrangian of "reconstruction loss + entropy rate". The defining asymmetry: the decoder NN must ship with the client — the "tiny image, fat model" inverted load. Web traffic is size-sensitive: tens of MB of model downloaded once vs. a few KB of image many times — the maths only works if usage is heavy enough.
HIFIC · GAN-BASED TRAINING TRIANGLE | Encoder NN → latent → Decoder NN → reconstructed image → Discriminator: real / fake? | loss = λ·rate + d(x, x̂) + β·GAN_loss
图 57b · HiFiC 的 GAN-based 训练三角。在 Ballé Hyperprior 的"Encoder + Decoder"基础上加一个 Discriminator,Decoder 既要让重建图接近原图(rate-distortion 项),又要让重建图看起来真实到能骗过 Discriminator(GAN 项)。低 bpp 时 rate-distortion 单独走会出糊状方块,GAN 项把"hallucinated 但 plausible 的纹理"补回去。结果是:HiFiC 在 0.1 bpp 看起来比 AVIF 0.3 bpp 还好 —— 但补出来的细节是"像那么回事",不是真的,法医 / 医学场景不能用。
Fig 57b · HiFiC's GAN-based training triangle. Add a Discriminator on top of Ballé's Hyperprior "Encoder + Decoder". The Decoder now has to make the reconstruction look like the original (rate-distortion term) AND look real enough to fool the Discriminator (GAN term). At low bpp, pure rate-distortion produces blurry mush; the GAN term fills in "hallucinated but plausible" texture. Result: HiFiC at 0.1 bpp looks better than AVIF at 0.3 bpp — but the synthesised detail is "looks right", not "is right". Forensic / medical use is forbidden.
QUALITY × BPP · JPEG / AVIF / HIFIC | 列: 0.1 bpp / 0.2 bpp / 0.5 bpp · 行: JPEG / AVIF / HiFiC | 条越长 = 视觉质量越好(MS-SSIM / LPIPS 主观)
图 57c · 同 bpp 下的视觉质量对比(示意)。横向三档 bpp(0.1 / 0.2 / 0.5),纵向 JPEG / AVIF / HiFiC 三种 codec。在极低 bpp(0.1):JPEG 已经糊成方块、AVIF 涂成马赛克,HiFiC 用 GAN 幻觉出可信纹理。在中 bpp(0.2):AVIF 翻盘可用,HiFiC 仍领先。到常规 bpp(0.5):三者趋近,HiFiC 的优势缩小。结论:NN codec 在极低 bpp 占优最明显,这是它的真正生态位 —— 但极低 bpp 应用场景本身有限(普通 web 跑 0.5 bpp 不缺地方放)。
Fig 57c · Visual quality at the same bpp (illustrative). Three bpp columns (0.1 / 0.2 / 0.5); three codecs in rows (JPEG / AVIF / HiFiC). At very low bpp (0.1): JPEG is blocky, AVIF mosaic-y; HiFiC hallucinates plausible texture via GAN. At mid bpp (0.2): AVIF becomes usable, HiFiC still ahead. At regular bpp (0.5): all three converge, HiFiC's edge shrinks. Takeaway: NN codecs win most at very low bpp, which is their real niche — but very-low-bpp applications are themselves limited (regular web at 0.5 bpp has plenty of room).
DECODE TIME · LOG SCALE(μs → s)| JPEG ~10 μs · AVIF ~30 ms · HiFiC ~80 ms (GPU) · HiFiC ~1.5 s (CPU · 1080p)
图 57d · 解码时间(单帧 1080p · 对数横轴)。JPEG 微秒级、AVIF 毫秒级(已经是"重 codec")、HiFiC有 GPU 是几十毫秒、HiFiC 纯 CPU秒级(1.5 秒一帧)。这条数据线就是 NN codec 工业部署最大的拦路虎 —— 桌面 GPU 还能接受,手机 / IoT / 老设备完全不行。所以 NN codec 短期不会进 web,要进也是"先在云端 / 流媒体的服务器侧解了再发"。
Fig 57d · Decode time (single 1080p frame, log x-axis). JPEG microseconds; AVIF milliseconds (already a "heavy codec"); HiFiC with GPU tens of ms; HiFiC CPU only is seconds (≈ 1.5 s/frame). This single line is NN-codec deployment's biggest blocker — desktop GPUs can swallow it, but phones / IoT / older devices can't. So NN codecs won't reach the web in the short term — if they appear, it'll be "decode in the cloud / streaming server, then send the pixels".

技术内核

Technical core

神经压缩内核五块。① Encoder / Decoder 都是 CNN(典型 10-50M params),从图到 latent 是几层下采样卷积 + GDN(generalised divisive normalisation)非线性激活,从 latent 到图是对应的反卷积 / 上采样;两侧权重通过端到端反向传播联合训练。② 超先验(Hyperprior):Ballé 2018 的关键贡献 —— 用一个小网络估计 latent 每个 channel 的 Gaussian / Laplace 概率分布参数 σ,再用 σ 喂 arithmetic coder。这一步让"latent 的统计结构"被显式建模,熵编码效率提升一个量级;之后所有 NN codec 都沿用 Hyperprior 思路。③ GAN 训练(HiFiC):在 rate-distortion 损失之外加一个 Discriminator 判别"重建图 vs 原图",Decoder 学着"骗过 Discriminator"。低 bpp 重建从"糊状方块"变成"幻觉的合理纹理",MS-SSIM / LPIPS / FID 都大幅好转,但细节是合成的不是真实的 —— 这是 NN codec 不能用于法医 / 医学的根本原因。④ Diffusion-based codec(CDC):Stanford 2023 工作 —— Decoder 不是单次反卷积,而是一个条件扩散模型(以 latent 为条件,从噪声开始多步去噪到图像)。优势:diffusion 的"多步细化"对低 bpp 修复尤其好;劣势:解码 50-100 步 NN forward,慢到完全反实时(1080p 几秒)。CDC 现在还是学术阶段,但代表了 NN codec 的下一程方向。⑤ 解码必须 GPU:这是工业部署最大的物理约束 —— 移动端的 GPU(Adreno / Mali)架构跟桌面 NVIDIA 差太远,也跟 NN 推理优化(TensorRT / Core ML)的高端路径差太远;Web 上要做就得走 WebGPU,但目前 WebGPU 对 NN 推理的优化跟原生差 5-10×。所以 NN codec 现在的工业部署模式都是"中央服务器 GPU 解码 → 把解出来的 RGB 再压成 AVIF / WebP / VP9 → 发给客户端" —— 客户端从来没真正解过 NN codec 的码流。

Neural compression's core, five pieces. ① Encoder / Decoder are both CNNs (typically 10–50 M params); image → latent is a few downsampling convolutions plus GDN (generalised divisive normalisation) non-linearity; latent → image is the matching upsampling. Both sides are jointly trained end-to-end via backprop. ② Hyperprior: Ballé 2018's key contribution — a small network estimates the Gaussian / Laplace distribution parameters σ for every channel of the latent, then feeds σ into the arithmetic coder. This explicitly models the latent's statistical structure, lifting entropy efficiency by an order of magnitude; every NN codec since uses Hyperprior. ③ GAN training (HiFiC): on top of rate-distortion loss, add a Discriminator distinguishing "reconstruction vs. original"; the Decoder learns to "fool the Discriminator". Low-bpp reconstructions go from "blurry mush" to "hallucinated plausible texture"; MS-SSIM / LPIPS / FID all improve sharply — but the detail is synthesised, not real. That's the fundamental reason NN codecs can't be used in forensic / medical settings. ④ Diffusion-based codec (CDC): Stanford 2023 — the Decoder isn't a single deconvolution but a conditional diffusion model (start from noise, denoise to the image conditioned on the latent). Pros: diffusion's "multi-step refinement" works especially well for low-bpp restoration. Cons: 50–100 NN forwards per decode, completely off real-time (seconds per 1080p frame). CDC is still academic but charts the next leg. ⑤ Decoding requires a GPU: the biggest physical deployment constraint — mobile GPUs (Adreno / Mali) differ too much from desktop NVIDIA, and from NN-inference optimisation paths (TensorRT / Core ML). On the web you'd go through WebGPU, but its NN-inference performance is 5–10× behind native. So today's industrial NN-codec deployments are "central GPU decode → re-compress as AVIF / WebP / VP9 → ship to client" — clients never actually decode the NN bitstream.
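The λ-weighted objective in ① and ② (and the "one λ = one bpp model" remark in the figure below) can be felt with a toy stand-in. This is my illustration, not CompressAI code: a scalar quantiser plays the whole Encoder/Decoder pair, and empirical entropy plays the learned rate model.

```python
import math
from collections import Counter

# Toy stand-in for the training objective "loss = d(x, x̂) + λ·rate".

def rate_distortion(samples, step):
    symbols = [round(x / step) for x in samples]          # "quantise" (the latent)
    n = len(symbols)
    counts = Counter(symbols)
    rate = -sum(c / n * math.log2(c / n) for c in counts.values())  # bits/sample
    recon = [s * step for s in symbols]                   # "decode"
    dist = sum((x - y) ** 2 for x, y in zip(samples, recon)) / n    # MSE
    return rate, dist

samples = [math.sin(i / 7) * 3 for i in range(1000)]      # stand-in "image"

def best_step(lam):
    """Quantiser step minimising dist + lam * rate over a sweep of step sizes."""
    def lagrangian(step):
        rate, dist = rate_distortion(samples, step)
        return dist + lam * rate
    return min((s / 10 for s in range(1, 40)), key=lagrangian)

# Small λ -> fine quantisation (high rate, low distortion); large λ -> coarse.
print(best_step(0.01), best_step(1.0))
```

In a learned codec the same trade-off is baked into the weights at training time, which is why a zoo ships one checkpoint per quality level instead of a quality flag.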

NEURAL CODEC · TRAINING (TOP) · INFERENCE (BOTTOM)
TRAINING · 一次性 · 离线 · 数千 GPU 小时:训练数据集(CLIC / OpenImages · ~1M images · 256×256 / 512×512 patches)→ 联合训练 Encoder NN + Hyperprior NN + Decoder NN + Discriminator (HiFiC) · loss = λ·rate(熵) + d(x, x̂)(MSE/MS-SSIM) + β·GAN_loss · end-to-end backprop · Adam · ~1-4 周训练 · trade-off 通过 λ 调节(多个 λ = 多个不同 bpp 模型)→ 模型权重 · 10-50M params · ~30-150 MB(.pt / .onnx)· 必须随 Decoder 分发 → 这是 NN codec 工业部署的反向负担:几十 MB 模型 vs 几 KB 图
INFERENCE · 每次编码 / 解码都跑一次模型:原图 image(RGB 1080p · ~6 MB raw)→ Encoder NN + quantise + arithmetic encode(GPU · ~80 ms/frame)→ .nn-img bytes(~30 KB · 0.1 bpp)→ arithmetic decode + Decoder NN(GPU · ~80 ms/frame)→ 重建图 x̂ · 视觉质量 ≈ 原图(低 bpp 时)· "plausibly hallucinated" 细节
关键非对称:训练一次(数周 GPU 集群),推理无数次,但每次推理都要跑模型 —— 解码 NN 必须随客户端,且必须 GPU 加速。码流不能跨模型版本兼容 — 模型升级则旧 .nn-img 解不出来(没有 backward compatibility)。现行工业部署模式:中央 GPU 解码 → 转 AVIF / WebP 发给客户端,客户端从未真正解过 NN 码流。短期不会进 web · 中期可能在"AI 生成内容 + AI 压缩"垂直生态率先落地 · 长期 ?

图 57 · 神经压缩完整流程。训练阶段(一次性):大数据集(CLIC / OpenImages ~1M 图)喂进 Encoder + Hyperprior + Decoder + (Discriminator) 联合训练,损失函数是"熵率 + 重建距离 + GAN 损失"的拉格朗日组合,几周 GPU 集群训出 10-50M 参数的模型(.pt / .onnx 文件 30-150 MB)。模型必须随 Decoder 一起分发到客户端。推理阶段(每次编码 / 解码):原图 → Encoder NN → quantise → arithmetic encode → bytes(0.1 bpp 的 1080p ≈ 30 KB);bytes → arithmetic decode → Decoder NN → 重建图。GPU 上每帧 ~80 ms,纯 CPU ~1.5 s。最反直觉的部分:模型升级会让旧 .nn-img 解不出来 —— 这跟 JPEG / PNG / AVIF 那种"几十年向后兼容"完全相反,这也是 NN codec 短期不可能上 web 主战场的根本原因。

Fig 57 · Full neural-compression workflow. Training (one-off): a large dataset (CLIC / OpenImages, ~1 M images) trains Encoder + Hyperprior + Decoder + (Discriminator) jointly under a Lagrangian of "entropy rate + reconstruction distance + GAN loss". A few weeks on a GPU cluster yields a 10–50 M-param model (.pt / .onnx, 30–150 MB). The model must ship to clients alongside the Decoder. Inference (per encode / decode): image → Encoder NN → quantise → arithmetic encode → bytes (1080p at 0.1 bpp ≈ 30 KB); bytes → arithmetic decode → Decoder NN → reconstruction. ~80 ms / frame on GPU, ~1.5 s on CPU. The most counterintuitive part: a model bump makes old .nn-img files undecodable — the opposite of JPEG / PNG / AVIF's "decades of backward compatibility", which is the fundamental reason NN codecs can't fight on the web's main front in the short term.

codec | year | author | feature
Toderici LSTM | 2016 | Google | 早期 RNN-based · 概念奠基
Ballé Hyperprior | 2018 | NYU / Google | Gaussian Hyperprior · 领域基石
HiFiC | 2020 | Google Research | GAN-based · 低 bpp 王
ELIC | 2022 | Tencent | Efficient Learned Image Compression · 实用化向
CDC | 2023 | Stanford | Diffusion-based decoder · 多步去噪
ContextFormer | 2023 | Microsoft Research | Transformer-based hyperprior
$ pip install compressai                          # PyTorch NN codec 库 · InterDigital
$ python -m compressai.utils.eval_model \
    pretrained /path/to/images \
    -a bmshj2018-hyperprior -q 8                  # Ballé 2018 Hyperprior 示例
$ python tfci.py compress hific-lo in.png         # TensorFlow Compression · HiFiC 预训练模型
$ python tfci.py decompress in.png.tfci recon.png
$ pip install neuralcompression                   # Meta(Facebook AI Research)NN codec
$ python -c "from compressai.zoo import bmshj2018_hyperprior; \
             m = bmshj2018_hyperprior(quality=8, pretrained=True)"

适用

USE FOR

  • (未来)AI 生成内容压缩(同生态:AI 图 + AI codec)
  • 极低 bpp(< 0.3)+ 服务器侧 GPU 可用的场景
  • 云游戏 / 流媒体的服务器侧解码 → 转 AVIF 流出
  • 研究 / 学术评测 / 数据集压缩(实验性)
  • "内容 ≫ 模型大小"的高带宽专用通道(Stadia 类)
  • (future) AI-generated-content compression — AI image + AI codec, same ecosystem
  • Very-low-bpp (< 0.3) scenes with server-side GPUs
  • Cloud-gaming / streaming: server decode → re-encode as AVIF on the way out
  • Research / academic benchmarking / dataset compression
  • "Content ≫ model size" specialised channels (Stadia-style)

反适用

AVOID

  • 当前 web · 任何无 GPU 端(移动 / IoT / 老电脑)
  • 需要"每个像素真实"的场景(法医 / 医学 / 卫星)
  • 需要长期归档(码流不向后兼容)
  • 客户端实时编码(GPU 编码也很贵 · 普通用户上传不行)
  • 低流量场景(模型 30-150 MB 摊不平)
  • Today's web · any GPU-less endpoint (mobile / IoT / old PCs)
  • Anything needing "every pixel real" (forensic / medical / satellite)
  • Long-term archives (no bitstream backward compatibility)
  • Client-side real-time encoding (GPU encode is also expensive — user uploads can't take it)
  • Low-volume scenes (the 30–150 MB model can't amortise)
scope | readers | editors / pipelines | CLI
NN codec(各家不互通) | 无任何浏览器原生 · ~ compressai(InterDigital)· neuralcompression(Meta)· 各家自家 SDK | 无主流编辑器 · 仅研究代码 · TensorFlow Compression · PyTorch + 自训模型 | compressai utils(eval_model 等)· tfci(TensorFlow Compression)· 各家自家 CLI 工具
起源:origin: Toderici 2016 ICLR(RNN)· Ballé 2018 ICLR(Hyperprior) 里程碑:milestone: HiFiC 2020 NeurIPS(Google)· CDC 2023(Stanford) 思想颠覆:paradigm break: 与传统 codec(JPEG / AVIF)正交 —— 端到端可训练 vs 手工设计 短期并存:parallel niche (short term): 不替代任何主流 codec · 仅作研究 / 实验存在 未来生态:future ecosystem: 与 AI 生成图像(diffusion / GAN)同生态 · "AI 内容 + AI 压缩"垂直闭环

HEIC Live Photo — 苹果的图 + 视频混合容器

HEIC Live Photo — Apple's still + video twin container

YEAR 2015(iPhone 6s 首发) AUTHOR Apple EXT .heic + .mov(双胞胎) REAL 1 张 HEIC 静图 + 1 段 MOV(3 秒 H.264)+ UUID 关联 DURATION 前 1.5 秒 + 后 1.5 秒(无声)· 3-5 MB STATUS iOS 主流(iPhone / iPad)· 跨平台兼容差

"一张照片其实是一个 .heic + 一个 .mov 的双胞胎。"

"One 'photo' is actually a twin: one .heic and one .mov."

2015 年 9 月,Apple 在 iPhone 6s 上推出 Live Photo —— 拍照时同时录下前 1.5 秒 + 后 1.5 秒共 3 秒视频,让"静态照片"在长按时能"动一下"。技术上这不是一种新的图像格式,而是一个双文件容器思路:1 张 HEIC 静图(iOS 11 起 HEIC 取代 JPEG 成为默认)+ 1 段 MOV 视频(H.264 1080p 25fps 无声),通过 metadata 里的 asset identifier UUID 关联,Photos.app(iOS / macOS)把它们当一个对象呈现。这种"图 + 视频组合"是HEIC(基于 HEIF 的 ISOBMFF 容器)在工程层的延伸 —— HEIF 容器原生支持图像 + video track 共存(参见AVIF Sequence),但 Apple 选择不把它们装进同一个 HEIF 文件,而是分两个文件靠 UUID 维系。原因可能是 backward compatibility:老的 .heic / .mov 工具不需要为 Live Photo 改动,各自能独立打开。代价就是跨平台传输:AirDrop 给非 iOS 设备时,MOV 部分会丢失,接收方只看到一张静图。它是"关联式混合容器格式"在消费级场景的代表 —— 同思路 Google 的 Motion Photo(2017)、Samsung 的 Motion Photo 都是 .jpg 加内嵌 mp4,只是合在一个文件里。

In September 2015 Apple introduced Live Photo on the iPhone 6s — the camera simultaneously records 1.5 s before and 1.5 s after the shot, three seconds total, so a "still photo" can "move a little" on long-press. Technically it's not a new image format but a twin-file container idea: one HEIC still (HEIC replaced JPEG as the default starting iOS 11) + one MOV video (H.264 1080p 25 fps, no audio), linked via an asset identifier UUID in the metadata, with Photos.app (iOS / macOS) presenting them as one object. The "still + video pair" extends HEIC (an ISOBMFF-based HEIF container) at the engineering layer — HEIF natively supports image + video track in one file (see AVIF Sequence), but Apple chose not to pack them into a single HEIF file, instead splitting across two files held together by UUID. Probably for backward compatibility: legacy .heic / .mov tools needed no Live-Photo changes; each opens independently. The cost is cross-platform transfer — AirDrop to a non-iOS device drops the MOV; the receiver sees only the still. It's "linked-pair hybrid container" at consumer scale — Google Motion Photo (2017) and Samsung's equivalent take the same idea but pack the .jpg and the mp4 into a single file.

LIVE PHOTO · TWO FILES · ONE ASSET | IMG_0001.HEIC:主静图 1 张 · HEVC intra · 4:2:0 · ~2-3 MB | IMG_0001.MOV:3 秒 H.264 · 1080p 25 fps · 无声 · ~3-5 MB | UUID asset identifier metadata | Photos.app · 视为一张 Live Photo
图 58 · Live Photo 的双文件结构。同一次拍摄产生两个文件:IMG_0001.HEIC(主静图,HEVC intra,2-3 MB)+ IMG_0001.MOV(3 秒 H.264 视频,1080p 25fps 无声,3-5 MB)。两者通过 metadata 里相同的 asset identifier UUID 关联(.heic 的 UUID box / .mov 的 com.apple.quicktime.content.identifier metadata),Photos.app(iOS / macOS)读两个文件的 UUID 一致就把它们当作一张 Live Photo 呈现。AirDrop 给非 iOS 设备时,MOV 不会被识别为关联资产,只有 HEIC 静图过去 —— 这是 Live Photo 跨平台兼容差的根因。
Fig 58 · Live Photo's twin-file structure. One capture creates two files: IMG_0001.HEIC (main still, HEVC intra, 2–3 MB) + IMG_0001.MOV (3-second H.264, 1080p 25 fps, audioless, 3–5 MB). They're linked via a shared asset identifier UUID in metadata (.heic's UUID box / .mov's com.apple.quicktime.content.identifier); Photos.app (iOS / macOS) sees the matching UUIDs and presents them as one Live Photo. AirDrop to a non-iOS device doesn't recognise the MOV as a linked asset — only the HEIC still travels — which is exactly why Live Photo's cross-platform compatibility is poor.

技术内核

Technical core

Live Photo 内核两件事。① 双文件 + UUID 关联:拍照那一刻 iPhone 同时存两个文件 —— IMG_xxxx.HEIC(默认 iOS 11+ 静图格式 · HEVC intra block 编码 · ~2-3 MB)+ IMG_xxxx.MOV(QuickTime 容器装 H.264 · 1080p 25fps · 无声 · 前 1.5 + 后 1.5 共 3 秒 · ~3-5 MB)。两者通过 metadata 里相同的 asset identifier UUID 关联:.heic 在 ISOBMFF 的 uuid box 里写,.mov 在 moov.meta.keys.com.apple.quicktime.content.identifier 里写。这套关联机制是 Apple 私有的,但 UUID 字段格式在 iOS Photos 框架里有公开 API。② Photos.app 把两个文件当一个对象:iOS / macOS 的 PhotoKit 框架在导入照片时检测到匹配的 UUID 就自动绑定,UI 层呈现一个图标(单张静图 + 长按播放视频),云端同步(iCloud Photos)也作为一个 asset 同步。第三方 app 想读 Live Photo 必须走 PhotoKit 的 PHAssetResource API —— 直接读两个文件 + 匹配 UUID 也行,但要自己实现绑定逻辑。AirDrop / iMessage 在 Apple 设备间能保留双文件;但跨平台(发到 Android / Windows)只发 HEIC,MOV 部分丢失 —— 这是"双文件容器"路线最大的代价。

Live Photo's core, two pieces. ① Twin files + UUID link: the iPhone stores two files at capture — IMG_xxxx.HEIC (default iOS 11+ still format · HEVC intra · ~2–3 MB) + IMG_xxxx.MOV (QuickTime container with H.264 · 1080p 25 fps · no audio · 1.5 s before + 1.5 s after = 3 s · ~3–5 MB). They're linked via a shared asset identifier UUID: the .heic writes it in an ISOBMFF uuid box; the .mov writes it under moov.meta.keys.com.apple.quicktime.content.identifier. The mechanism is Apple-private, but the UUID field format is exposed via public APIs in iOS's Photos framework. ② Photos.app treats them as one asset: iOS / macOS PhotoKit detects matching UUIDs on import and binds them automatically; the UI shows a single icon (still image, long-press to play video); iCloud Photos syncs them as one asset. Third-party apps wanting to read Live Photos must go through PhotoKit's PHAssetResource API — reading the two files directly and matching UUIDs works too, but you implement the binding yourself. AirDrop / iMessage between Apple devices preserves both files; cross-platform (to Android / Windows) only the HEIC goes, losing the MOV — the biggest cost of the "twin-file container" path.
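The do-it-yourself binding described in ② reduces to a join on the shared identifier. A sketch with hypothetical filenames and UUID values; in a real pipeline the identifiers would be read from the HEIC's uuid box and the MOV's QuickTime metadata key, e.g. via exiftool:

```python
def pair_live_photos(files):
    """Group {filename: asset-identifier-UUID} records into Live Photo pairs.

    Mirrors the Photos.app binding rule described above:
    same UUID + one .heic + one .mov = one Live Photo asset.
    """
    by_uuid = {}
    for name, uuid in files.items():
        by_uuid.setdefault(uuid, []).append(name)
    pairs, leftovers = [], []
    for names in by_uuid.values():
        heic = [n for n in names if n.lower().endswith(".heic")]
        mov = [n for n in names if n.lower().endswith(".mov")]
        if heic and mov:
            pairs.append((heic[0], mov[0]))
        else:
            leftovers.extend(names)   # orphaned half -> plain still / video
    return pairs, leftovers

# Hypothetical UUIDs for illustration only:
files = {
    "IMG_0001.HEIC": "6A5C-01",
    "IMG_0001.MOV":  "6A5C-01",
    "IMG_0002.HEIC": "9F2D-77",   # its MOV was lost in a cross-platform hop
}
pairs, leftovers = pair_live_photos(files)
print(pairs)      # the one intact Live Photo
print(leftovers)  # the orphaned still
```

The `leftovers` bucket is exactly what an Android or Windows recipient ends up with after AirDrop strips the MOV half.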

适用

USE FOR

  • iPhone / iPad Live Photo 拍照(默认开启)
  • Apple 生态内分享(AirDrop / iMessage / iCloud)
  • iOS 锁屏 / 壁纸"动起来"效果
  • macOS Photos.app 浏览 / 编辑(导出可选静图 / GIF / video)
  • iPhone / iPad Live Photo capture (on by default)
  • Within Apple ecosystem (AirDrop / iMessage / iCloud)
  • iOS lock-screen / wallpaper "moving" effects
  • macOS Photos.app browsing / editing (export to still / GIF / video)

反适用

AVOID

  • 任何非 Apple 生态(Android / Windows / Web)
  • 跨平台分享(MOV 丢失,只剩静图)
  • 第三方 app 不走 PhotoKit 的话需自己处理 UUID 绑定
  • 需要"单文件即可"的纯静图场景(用 HEIC / JPEG)
  • Anything outside Apple's ecosystem (Android / Windows / Web)
  • Cross-platform sharing (MOV lost, still-only remains)
  • Third-party apps not on PhotoKit must handle UUID binding themselves
  • Pure stills where one file suffices (use HEIC / JPEG)
scope | readers | editors | CLI
HEIC Live Photo | iOS Photos · macOS Photos · PhotoKit API · ~ 第三方 heif-tools(部分)· Web 浏览器无 | iOS Photos · macOS Photos · 第三方 Live Photo 编辑 app(Lively · Motion Stills 已停) | ~ exiftool 可读 UUID metadata · ffmpeg 处理 MOV 部分 · heif-info 读 HEIC
起源:origin: Apple · 2015 iPhone 6s 首发 · iOS 9 基于:based on: HEIC 静图 + QuickTime MOV 视频 · UUID metadata 关联 同思路:cousins: Google Motion Photo(2017)· Samsung Motion Photo · 单文件 vs 双文件之争 现实定位:real position: iOS 主流 · 跨平台兼容差 · 概念上启发了消费级"图 + 视频混合容器"

命令行 codec 一览

Command-line codec roster

这一节是查询表 —— 按"目标格式"找对应工具。所有命令均假设你已经装好对应工具(brew / apt 安装名见每行末尾)。命令风格各家不一,但参数语义大致互通:-q / --quality 控质量、-o / --output 给输出文件、-s / --speed 调编码速度(慢 = 小)。

A reference table — find the right tool by output format. Each row assumes you have installed the package (Homebrew / apt name at the end). Each codec uses its own flag dialect, but the semantics roughly converge: -q / --quality for quality, -o / --output for the output file, -s / --speed for the encoder speed (slower = smaller).

format | encoder | decoder | typical command | install
JPEG | cjpeg / mozjpeg / jpegli | djpeg | cjpeg -quality 85 -optimize in.ppm > out.jpg | libjpeg-turbo
PNG | oxipng / pngcrush / optipng | libpng | oxipng -o6 in.png | oxipng
WebP | cwebp | dwebp | cwebp -q 75 in.png -o out.webp | libwebp
AVIF | avifenc | avifdec | avifenc -s 6 -a end-usage=q -a cq-level=23 in.png out.avif | libavif
JPEG XL | cjxl | djxl | cjxl in.png out.jxl --quality 90 | libjxl
HEIC | heif-enc | heif-dec | heif-enc -q 60 in.png -o out.heic | libheif
GIF | gifsicle / convert | gifsicle | gifsicle --colors 256 -O3 in.gif > out.gif | gifsicle
BC1-7 (DDS) | nvtt_export / texconv / ispc_texcomp | D3D / OpenGL native | nvtt_export --bc7 in.png -o out.dds | nvtt
ASTC | astcenc | astcdec / GPU | astcenc -cl in.png out.astc 6x6 -medium | astcenc
KTX2 | toktx | ktxinfo | toktx --bcmp out.ktx2 in.png | KTX-Software
Basis | basisu | basisu | basisu -ktx2 in.png -output_file out.ktx2 | basis_universal
OpenEXR | oiiotool / exrtools | OpenImageIO | oiiotool in.png -o out.exr | OpenImageIO
TIFF | libtiff / convert | libtiff | convert in.png -compress lzw out.tif | libtiff
RAW | — | dcraw / LibRaw / rawtherapee-cli | dcraw -v -w in.NEF | libraw
DICOM | dcmconv | dcmdump / dcm2pnm | dcm2pnm in.dcm out.pnm | dcmtk
SVG | svgo / inkscape / resvg | resvg / browser | svgo in.svg -o out.svg | svgo
FITS | astropy / cfitsio | astropy / ds9 | python -c "from astropy.io import fits; ..." | astropy
generic | libvips / ImageMagick | same | vips copy in.png out.avif[Q=60] | libvips

几个值得知道的事实

A few facts worth knowing

  • libvips 是隐藏的性能王者 —— 大批量处理比 ImageMagick 快 5-10×、内存占用低 10×。所有需要"批量转码 100k+ 张图"的场景都应该首选 vips。
  • jpegli 是 Google 2024 年从 libjxl 仓库剥出来的"现代 JPEG 编码器" —— 同 quality 比 mozjpeg 体积小约 35%,而且产出仍是合法 JPEG,所有 JPEG 解码器都能读。
  • oxipng 是 Rust 写的 PNG 重压缩器 —— 比 pngcrush 快 5-10×,体积稍小;oxipng -o6 是大多数项目的默认配方。
  • Squoosh CLI(npm i -g @squoosh/cli)是浏览器中 squoosh.app 的命令行版本 —— 一个 Node 包搞定 AVIF / WebP / JXL / mozjpeg / oxipng,适合 CI 流水线。
  • libvips is the hidden performance king — bulk pipelines run 5-10× faster than ImageMagick at one-tenth the memory. Any "batch-convert 100 k images" job should reach for vips first.
  • jpegli, spun out of the libjxl repo by Google in 2024, is the "modern JPEG encoder" — about 35% smaller than mozjpeg at the same quality, and the output is still legal JPEG that every decoder can read.
  • oxipng is a Rust PNG re-packer — 5-10× faster than pngcrush and slightly smaller; oxipng -o6 is the default recipe for most projects.
  • Squoosh CLI (npm i -g @squoosh/cli) is the command-line cousin of the browser-based squoosh.app — one Node package wraps AVIF / WebP / JXL / mozjpeg / oxipng, ideal for CI pipelines.
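As a sketch of how these tools slot into a CI batch job, here is the libvips invocation from the table composed programmatically. Filenames, the output directory, and the Q value are placeholders; cjxl or avifenc would be wrapped the same way:

```python
from pathlib import Path

def vips_avif_cmd(src: Path, outdir: Path, quality: int = 60) -> list[str]:
    # libvips "copy" with an output-format option string, as in the table:
    #   vips copy in.png out.avif[Q=60]
    dst = outdir / (src.stem + ".avif")
    return ["vips", "copy", str(src), f"{dst}[Q={quality}]"]

# Compose (not run) the commands for a batch; a CI job would hand each
# list to subprocess.run(). Kept side-effect-free here on purpose.
cmds = [vips_avif_cmd(p, Path("out")) for p in (Path("a.png"), Path("b.png"))]
print(cmds[0])
```

Building argument lists rather than shell strings sidesteps quoting bugs when filenames contain spaces, which matters at the 100k-image scale where vips earns its keep.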

DevTools 看响应头与解码时间

DevTools — response headers & decode time

浏览器选择哪种格式不是玄学,而是三个 HTTP 头 + 一段 JS 解码任务共同决定的。Network 面板看 Accept(请求时浏览器宣告支持哪些格式)、Content-Type(响应里服务器实际返回什么)、Content-Length(字节数)三个头,这是picture / source 协商的全部凭证。Performance 面板里"Decode Image"任务才是真实的代价 —— AVIF 比 JPEG 慢 3 倍、JXL 又比 AVIF 快 2 倍,这些差异在 4G 慢网下会被字节数掩盖,在 5G 快网或本地 CDN 下却开始主导首屏渲染时间。

Which format the browser picks is not magic — it's decided by three HTTP headers plus a JS decode task. The Network panel shows Accept (the browser announces what it supports), Content-Type (what the server actually returns), and Content-Length (the byte count). These three are the entire vocabulary of picture / source negotiation. The Performance panel's "Decode Image" task is the real cost — AVIF decodes about 3× slower than JPEG, JXL about 2× faster than AVIF; over slow 4G the byte savings dominate, but on 5G or a near CDN, decode time starts to set first-paint.
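Server-side, the negotiation described above boils down to a few lines. A minimal sketch with my own preference order; it matches only the explicit AVIF / WebP tokens, so a bare image/* wildcard still falls back to JPEG:

```python
def negotiate(accept_header: str) -> dict:
    """Pick a Content-Type from the request's Accept header."""
    accepted = {part.split(";")[0].strip() for part in accept_header.split(",")}
    if "image/avif" in accepted:
        ctype = "image/avif"
    elif "image/webp" in accepted:
        ctype = "image/webp"
    else:
        ctype = "image/jpeg"          # universal fallback
    # Vary: Accept tells the CDN to keep one cache entry per Accept value,
    # so a cached AVIF is never served to a client that didn't ask for it.
    return {"Content-Type": ctype, "Vary": "Accept"}

chrome = "image/avif,image/webp,image/png,image/*,*/*;q=0.8"
legacy = "image/png,image/*;q=0.8,*/*;q=0.5"
print(negotiate(chrome)["Content-Type"])   # image/avif
print(negotiate(legacy)["Content-Type"])   # image/jpeg
```

This ignores q-values entirely; for images the explicit token list is usually all that matters, since browsers only advertise formats they can actually decode.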

Network 面板

Network panel

DevTools · Network · hero.avif
▸ Request Headers: GET /img/hero.avif HTTP/2 · Host: ursb.me · Accept: image/avif,image/webp,image/png,image/*,*/*;q=0.8 · Accept-Encoding: gzip, br · Sec-Ch-UA: "Chromium";v="120"
▾ Response Headers: HTTP/2 200 OK · Content-Type: image/avif · Content-Length: 12345 · Cache-Control: max-age=31536000 · ETag: "9af3...c2" · Vary: Accept
三段对协商起决定作用 · the three lines that drive negotiation:① Accept 列出浏览器支持的 mime · what the browser accepts ② Content-Type 是服务器最终选的 · what the server picked ③ Vary: Accept 告诉 CDN "同 URL 按 Accept 缓存多份" · per-Accept caching
图 a · DevTools Network 面板某图片请求。Accept(请求头)+ Content-Type(响应头)+ Vary: Accept(响应头)三段共同构成"按浏览器选格式"的协商凭证。CDN 上一定要设 Vary: Accept,否则 Chrome 拿到 AVIF 后,不支持 AVIF 的老 Safari 也会从缓存拿到同一份 AVIF —— 然后解码失败。
Fig a · A typical image request in DevTools Network. Accept (request) + Content-Type (response) + Vary: Accept (response) form the negotiation contract. CDNs must set Vary: Accept, otherwise the AVIF cached for Chrome will also be served to a browser that never asked for it (an older Safari, say), which then fails to decode.

Performance 面板 — Decode 任务

Performance panel — decode task

DevTools · Performance · main thread | JPEG: Parse + Decode 5 ms · WebP: 8 ms · AVIF: 18 ms (slow) · JXL: 7 ms (fast) | 数值仅示意:1080p 单图,实际机型 / 实现差异极大
图 b · Performance 面板时间线模拟。同一张 1080p 图,JPEG 解码 ~5 ms、WebP ~8 ms、AVIF ~18 ms、JXL ~7 ms —— AVIF 是当前主流格式里解码最慢的,因为 AV1 帧内解码本来就比 VP8 / JPEG 重 2-3×。在 5G 快网下,AVIF 节省的 30% 字节可能被慢解码吃掉,这也是 picture/source 协商策略要"看场景挑选"的根因。
Fig b · A simulated Performance timeline. Same 1080p image: JPEG ~5 ms, WebP ~8 ms, AVIF ~18 ms, JXL ~7 ms — AVIF is the slowest decoder among today's mainstream formats because AV1 intra decoding is intrinsically 2-3× heavier than VP8 / JPEG. On fast 5G networks, AVIF's 30% byte savings can be eaten by the slow decode — exactly why picture/source negotiation is a per-scenario choice, not a one-size-fits-all rule.

picture + source fallback 链

picture + source fallback chain

<picture>
  source · AVIF · type="image/avif" · Chrome / Safari 16+
  source · WebP · type="image/webp" · Chrome / FF / Edge
  img · JPEG · 默认兜底 / fallback · 所有浏览器 all
浏览器自顶向下匹配,命中第一个支持的 type 即停止,不再下载后续 source
browser scans top-down · first supported type stops the search
图 c · <picture> + 多 <source> 的 fallback 链是树状匹配。浏览器自顶向下扫描,遇到第一个 type 自己支持的就停下,后面的 source 完全不下载。所以"AVIF → WebP → JPEG"的顺序很重要 —— 反过来写 JPEG 永远赢,AVIF 永远没机会。
Fig c · <picture> with multiple <source> tags is a tree-shaped match. The browser scans top-down and stops at the first type it supports — every later source is never fetched. Order matters: "AVIF → WebP → JPEG" is correct; reversed, JPEG always wins and AVIF never gets a chance.
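Fig c's top-down scan is easy to model. A toy sketch of the browser's behavior, where the `supports` set is an assumption standing in for real decoder capabilities:

```python
def first_supported(sources, supports):
    """Return the first <source> whose type the browser can decode.

    Mirrors the <picture> algorithm: scan top-down, stop at the first
    match, never fetch anything below it.
    """
    for url, mime in sources:
        if mime in supports:
            return url
    raise ValueError("no fallback reached -- always end with an <img>")

SOURCES = [
    ("hero.avif", "image/avif"),
    ("hero.webp", "image/webp"),
    ("hero.jpg", "image/jpeg"),   # the <img> fallback: everyone decodes JPEG
]
```

A browser that supports only `{"image/webp", "image/jpeg"}` fetches `hero.webp`; reverse the list and `hero.jpg` always wins, which is exactly the ordering bug the caption warns about.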

同图横评 — 解码时间条形图

Same image — decode time bars

同一张 4K 图 · same 4K image · decode time (ms · M1 Mac · main thread)
JPEG — 12 ms
WebP — 16 ms
AVIF — 38 ms · slowest
JXL — 14 ms · fastest of the three new formats
AVIF 解码量约是 JPEG 的 3×,字节却小 30%(慢解码可能吃掉字节红利) · AVIF decodes ~3× slower than JPEG for ~30% fewer bytes
图 d · 同图 4K · 解码时间条形图。AVIF 字节最小,但解码最慢;JXL 字节也小、解码却比 WebP 还快 —— 这正是 JXL 在"现代格式"中独特的位置。本图基于 M1 Mac · 主线程 · 单核 · 仅作量级示意,真实数据按机型 / 编码参数浮动 ±50%。
Fig d · Same 4K image, decode time bars. AVIF wins on bytes but loses on decode; JXL is small and faster than WebP — exactly why JXL occupies its unique slot among the "modern" trio. Numbers measured on M1 Mac, main thread, single core; treat as orders of magnitude — real numbers shift ±50% with hardware and encoder settings.

把"Accept 头 + Content-Type 响应 + Vary: Accept 缓存指令"理解透,你就抓住了"为什么这台浏览器收到 AVIF、那台收到 JPEG"的全部机理。把 Performance 面板里的 Decode Image 长度看习惯,你就知道"是不是该用 AVIF"不只是字节问题,而是字节 ÷ 解码时间的比值问题。

Understand the trio "Accept request + Content-Type response + Vary: Accept cache directive" and you have the full mechanism for "why this browser got AVIF and that one got JPEG." Get used to reading Decode Image durations in the Performance panel, and "should I serve AVIF" stops being a byte question and becomes a bytes ÷ decode-time ratio question.
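The bytes ÷ decode-time trade can be made concrete with back-of-envelope arithmetic. The numbers below reuse Fig b's illustrative decode times, and the file sizes are assumptions; the crossover behavior, not the absolute values, is the point:

```python
def display_time_ms(size_kb, decode_ms, bandwidth_mbps):
    """Time until the image is ready to paint: transfer + decode.

    1 Mbit/s == 1 kbit/ms, so (kilobits / Mbps) yields milliseconds.
    """
    transfer_ms = size_kb * 8 / bandwidth_mbps
    return transfer_ms + decode_ms

# assumed 1080p hero image: AVIF ~30% fewer bytes, ~3x slower decode
# slow 4G (~5 Mbit/s): bytes dominate, AVIF wins
slow_jpeg = display_time_ms(300, 5, 5)     # 485 ms
slow_avif = display_time_ms(210, 18, 5)    # 354 ms
# fast link (~100 Mbit/s): decode dominates, JPEG wins
fast_jpeg = display_time_ms(300, 5, 100)   # 29 ms
fast_avif = display_time_ms(210, 18, 100)  # 34.8 ms
```

The same two files flip ranking purely with bandwidth, which is why "should I serve AVIF" has no network-independent answer.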

libvips vs ImageMagick — 性能对比表

libvips vs ImageMagick — performance comparison

两个最常见的"通用图像处理库",设计目标完全相反 —— ImageMagick 1990 年代生于 Unix 工具传统,先把整张图解码到内存再做操作,简单直接;libvips 1990 年代末由 VIPS 项目演化而来,核心思路是 streaming pipeline:像素一行一行流过处理链,从不把整张图加载进内存。这条架构差异在批量处理场景被放大成 5-10× 的速度差和 10× 的内存差,直接决定了它们各自的最佳战场。

The two most common general-purpose image-processing libraries are designed for opposite goals. ImageMagick, born of 1990s Unix tools, decodes the whole image into memory and operates on it — simple and direct. libvips evolved from the VIPS project of the late 1990s and is built on a streaming pipeline: pixels flow through the chain row by row, the full image never loads into RAM. That single architectural choice expands into 5-10× speed and 10× memory differences at scale — and that decides which library belongs where.

metric | libvips | ImageMagick
设计 / design | streaming + parallel(pthread) | full-load(全图入内存)
100 张 4K → JPEG 时间 | ~8 s | ~60 s
100 张 4K → AVIF 时间 | ~120 s | ~600 s
峰值内存 / peak RAM | ~50 MB | ~500 MB
命令行 / CLI | vips copy in.png out.jpg[Q=85] | convert in.png -quality 85 out.jpg
学习曲线 / learning curve | 中(API 风格独特) | 低(命令名好记)
format coverage | 常用 + 现代(AVIF / WebP / JXL) | 250+ 格式(含老格式 / 罕见容器)
典型用法 / typical use | 批量服务 · 高吞吐 thumbnail | 单张定制 · 复杂滤镜 · 老格式恢复

数据基于 libvips 官方 benchmark + 社区验证,实测值因机型 / 任务类型浮动 ±30%。结论是稳定的:需要批量、需要省内存、需要快 用 libvips;需要冷门格式、需要复杂滤镜、单次任务 用 ImageMagick。

Numbers are taken from the libvips official benchmark plus community runs; real values shift ±30% by hardware and task type. The takeaway is robust: pick libvips when you need bulk, low memory, high speed; pick ImageMagick when you need rare formats, complex filters, or one-off jobs.

两种内存模型示意

Two memory models

libvips · streaming pipeline:
input .png → resize(row × n)→ sharpen(row × n)→ encode avif → output .avif
~50 MB peak · 像素行接力 · pixels relay row-by-row

ImageMagick · full-load:
input .png → all pixels in memory → decode → resize → sharpen → encode → output .avif
~500 MB peak · 全图驻留 · whole image resident
图 e · 两种内存模型对比。libvips(上)是 streaming pipeline,像素一行一行接力穿过处理节点,峰值内存 ~50 MB;ImageMagick(下)是 full-load,先把整张 4K 图解到内存(~500 MB),所有处理就地完成。两种模型在"单张 100×100 缩略图"上看不出差,但在"批量 100k 张 4K → AVIF"任务里,libvips 用 1/10 的内存跑出 5× 的速度。
Fig e · Two memory models. libvips (top) is a streaming pipeline — pixels relay through processing nodes row by row at a ~50 MB peak. ImageMagick (bottom) is full-load — the whole 4K image decompresses into RAM (~500 MB), all operations happen in place. The difference is invisible on a single 100×100 thumbnail; on a 100 k batch of 4K → AVIF, libvips uses one-tenth the memory at 5× the speed.

一句记忆口诀:"vips 流水、IM 大屋" —— vips 像生产线传送带,材料(像素行)源源不断流过工位;ImageMagick 像把所有材料堆进一个大房间再一起加工。两种思路都对,只是适合不同规模。

A mnemonic: "vips is the conveyor, IM is the warehouse" — vips moves rows past stations like an assembly line; ImageMagick piles everything into one big room and processes in place. Both philosophies work; they just fit different scales.
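The conveyor-vs-warehouse split can be sketched as a toy pipeline. This is a deliberately simplified model of the two architectures, not libvips or ImageMagick code; lists of numbers stand in for scanlines:

```python
def sharpen(row):
    # stand-in for any per-row operation in the processing chain
    return [min(255, v + 10) for v in row]

def full_load(decode_rows):
    """ImageMagick model: materialize every row, then operate on the lot."""
    image = list(decode_rows)          # whole image resident (the ~500 MB peak)
    return [sharpen(r) for r in image]

def streaming(decode_rows):
    """libvips model: each row flows through and is released immediately."""
    for row in decode_rows:            # peak memory ~ one row (the ~50 MB peak)
        yield sharpen(row)
```

Both produce identical pixels; only the peak working set differs, which is why the gap is invisible on one thumbnail and decisive on a 100 k-image batch.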

「我应该用哪个格式」决策树

"Which format should I use" — decision tree

本文走完 50+ 格式之后,最常被问到的还是这一句:"那我到底用哪个?"答案分两层:第一层是用途场景 —— 屏幕显示、GPU 纹理、HDR 影视、科学医学,用途不同候选集就完全不同;第二层是具体约束 —— 兼容老浏览器吗?要透明通道吗?是 16-bit 工程影像吗?这张决策树是出发点,不是教条 —— 真实工程里你可能因为某个客户的 IT 政策只能用 JPEG,或因为某个 GPU 不支持 BC7 而退回 BC1,这些场景在决策树之外。

After 50+ formats, the question we still get most is: "OK, so which one do I use?" The answer has two layers. Layer one is use case — screen display, GPU texture, HDR film, science / medicine; different domains, totally different shortlists. Layer two is specific constraints — must support legacy browsers? need alpha? 16-bit engineering imagery? This tree is a starting point, not gospel — real projects sometimes pin you to JPEG for IT-policy reasons, or fall back from BC7 to BC1 because of a target GPU; those edge cases live outside the tree.

我要存什么图? · what kind of image?

屏幕显示 screen / web
· 照片 photo → 浏览器支持新格式? AVIF / JXL(现代);要兼容老浏览器? JPEG
· 图标 icon / logo → 几何图形? SVG(矢量);必须 raster? PNG(位图)
· 动图 animation → AVIF anim / WebP anim(非 GIF)

GPU 纹理 GPU texture
· 桌面 GPU desktop → BC7(DDS / D3D)
· 移动 GPU mobile → ASTC 6×6(iOS / Android)
· 跨平台 cross-platform → KTX2 / Basis(container)

HDR / 工程 HDR / engineering
· 影视 VFX film VFX → EXR(half-float)
· HDR 摄影 HDR photo → RAW(DNG / NEF)

科学 / 医学 science / medical
· 医学 medical → DICOM(病例标准)
· 天文 astronomy → FITS(天文标准)

几个常见的「附加约束」 · common extra constraints
· 需要 alpha? → PNG / WebP / AVIF / TIFF ✓;JPEG / BC1 ✗
· 需要 lossless? → PNG / TIFF / JXL / WebP-LL ✓;JPEG / AVIF(可,但少用)~
· 需要 GPU 直采? → BC7 / ASTC / KTX2 ✓;JPEG / PNG / AVIF ✗(必须 CPU 解码)
· 需要 16-bit/通道? → TIFF-16 / PNG-16 / EXR ✓;JPEG / WebP-8 ✗

几个经典「错配」反例 · classic mismatches
· 用 GIF 存照片 → 256 色,色斑 + 体积大 · photo → GIF: 256-colour banding, huge file
· 用 PNG 存照片 → 体积比 JPEG 大 5-10× · photo → PNG: 5-10× larger than JPEG
· 用 JPEG 当 GPU 纹理 → 必须先 CPU 解码 · JPEG → texture: needs full CPU decode first
· 用 8-bit JPEG 做 HDR 后期 → bit depth 不够,色阶断层 · 8-bit JPEG → HDR grading: banding
· 用 PNG 当矢量 logo → 缩放糊掉,应用 SVG · PNG → vector logo: blurs on scale, use SVG

树根问"用途",叶子才到具体格式。中间几跳问的是"老浏览器要不要兜底""有没有透明""是不是 16-bit"。同一片叶子(比如"屏幕显示 · 照片"),最终选 AVIF 还是 JPEG,取决于客户群是不是全在 Safari 16+。这棵树没有"绝对正确",只有"在你的约束下,谁主选、谁兜底"。

The root asks "what for"; only the leaves name a format. The middle hops ask "do legacy browsers need a fallback?" "is there alpha?" "is it 16-bit?". On the same leaf — say "screen · photo" — choosing AVIF over JPEG depends entirely on whether your audience is all on Safari 16+. The tree has no absolute right answer; only "given your constraints, what's the primary and what's the fallback?"
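The first two hops of the tree can be encoded as a lookup plus a constraint check. A sketch only: the shortlists mirror the tree above, while the key names and the `legacy_browsers` flag are assumptions invented for this example, not a real API:

```python
# Leaf shortlists from the decision tree: (use, kind) -> (primary, fallback).
TREE = {
    ("screen", "photo"):      ("AVIF", "JPEG"),       # JPEG for legacy browsers
    ("screen", "icon"):       ("SVG", "PNG"),         # PNG when a raster is required
    ("screen", "animation"):  ("AVIF-anim", "WebP-anim"),
    ("gpu", "desktop"):       ("BC7", None),
    ("gpu", "mobile"):        ("ASTC-6x6", None),
    ("hdr", "vfx"):           ("EXR", None),
    ("hdr", "photo"):         ("DNG", None),
    ("science", "medical"):   ("DICOM", None),
    ("science", "astronomy"): ("FITS", None),
}

def choose_format(use, kind, legacy_browsers=False):
    """Walk root -> use case -> kind, then let constraints pick the leaf."""
    primary, fallback = TREE[(use, kind)]
    if legacy_browsers and fallback is not None:
        return fallback        # constraints beat preferences
    return primary
```

The same leaf flips from AVIF to JPEG the moment the legacy-browser constraint is set, which is exactly the "primary vs fallback" point above.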

四个职业场景的"开箱组合"

Four professional starter kits

这四个组合不是"最优解",只是"开箱推荐"。真实工程里你会遇到甲方只接受 JPEG、Unity 强制要求 ASTC LDR、医院 PACS 系统只识别 DICOM 1995 子集 —— 这些约束才是决策树之外的真正变量。但当约束消失时,这四套组合是 2026 年最不出错的起点。

These four kits aren't "optimal" — they're "out-of-the-box recommendations." Real projects hit constraints — clients accepting only JPEG, Unity demanding ASTC LDR, a hospital PACS that only reads a 1995 DICOM subset — and those constraints are the real variables outside the tree. But when constraints recede, these four kits are the safest 2026 starting points.

像素的归宿

The fate of the pixel

它出生在一颗 CMOS sensor 的硅井底部 —— 一个 14-bit 的电荷,被 ADC 抬上数字总线,被相机固件写进一个叫 .ARW 的 RAW 文件,在某张 SD 卡的 NAND 块里沉睡六个月没人看。它当时还不是"像素",它只是一个电压样本,一个 16384 之中的整数,带着读出噪声和热噪声,带着一行 EXIF 和一段 ICC profile 等待被解释。

It was born at the bottom of a CMOS sensor's silicon well — a 14-bit charge, lifted onto a digital bus by an ADC, written by camera firmware into a .ARW RAW file, sleeping in the NAND of some SD card for six months with no one looking. It wasn't a "pixel" yet — just a voltage sample, an integer out of 16384, carrying read noise and thermal noise, a line of EXIF, and an ICC profile waiting to be interpreted.

六个月后它被 LibRaw 解码成 16-bit linear,被 Lightroom 调色,被导出成 16-bit TIFF 进 Photoshop 修瑕,被另存为 sRGB JPEG 上传朋友圈,又被同一张图压成 AVIF 上博客 hero,被 Cloudflare CDN 缓存到全球 200 个边缘节点,被一万个浏览器在一分钟内同时解码,在某些 WebGL 场景里它被上传到 GPU 显存压成 BC7 块,被 fragment shader 采样过 12 次,被 ICC profile 从 sRGB 映到 Display P3,被 mipmap 选了 LOD 2,被 trilinear filter 平滑掉了高频。它有时是 24 bit,有时是 8 bit,有时是 4 bit/pixel,有时是浮点。它一直在变形。

Six months later LibRaw decodes it into 16-bit linear, Lightroom grades it, it exports as 16-bit TIFF into Photoshop for retouching, saves as sRGB JPEG to a social feed, gets re-compressed as AVIF for a blog hero, lives in Cloudflare's CDN across 200 edge nodes, decodes simultaneously in ten thousand browsers within a minute, gets uploaded to GPU memory as a BC7 block in some WebGL scene, is sampled 12 times by a fragment shader, gets remapped from sRGB to Display P3 by an ICC profile, picks LOD 2 from a mipmap chain, gets smoothed by a trilinear filter. Sometimes it's 24 bits, sometimes 8, sometimes 4 bits per pixel, sometimes floating point. It never stops changing shape.

birth RAW → edit EXR → compress AVIF → transmit CDN → decode RGB → VRAM BC7 → sample LOD → screen photon(HDR · WEB · GPU · SCREEN)
图 9.1像素的 8 站旅程,这次不是预告,是回顾。
Fig 9.1The pixel's 8-stop journey — this time as recap, not preview.

它最后变成了屏幕上一个发光的小方块。它当过 RAW、当过 AVIF、当过 BC7、当过显存、当过电压、当过光子。每一段旅程都给它换了一个容器,但它一直是同一颗像素 —— 一个被反复翻译、反复重写、反复压缩、反复采样,却始终保留某种"原意"的微小信号。

It ends as a glowing square on a screen. It has been a RAW, an AVIF, a BC7 block, GPU memory, a voltage, a photon. Every leg of the journey gave it a different container — but it stayed the same pixel: a tiny signal repeatedly translated, rewritten, compressed, and sampled, somehow holding onto its original meaning through every transform.

三个反直觉结论

Three counter-intuitive takeaways

沉淀这五十多种格式之后,有三件事是写完之前没意识到的。

After settling fifty-plus formats into this codex, three things surprised me — none of which I expected before writing.

01
最古老的格式不一定最差。
The oldest format isn't always the worst.

QOI(2021)的 spec 一页 A4 写得下,实现 300 行 C 代码,比 PNG(1996)简单 100×,解码反而比 PNG 快 3-4×,文件只大 5-10%。Farbfeld(2014)更激进 —— 干脆不压缩,只做"标头 + 像素"。PCX(1985)的 RLE 在纯色场景甚至比 PNG 还小。简洁是一种持久的设计姿态,不是历史遗物 —— 当 GIF 还在被使用、BMP 还在 Windows 剪贴板里跑、JPEG 仍占 web 图像 60%,你会发现"老"和"差"是两个独立维度。

QOI's spec (2021) fits on one A4 page; the reference implementation is 300 lines of C — 100× simpler than PNG (1996), yet it decodes 3-4× faster with files only 5-10% larger. Farbfeld (2014) goes further — no compression at all, just "header + pixels." PCX's RLE (1985) beats PNG on flat-color art. Simplicity is a durable design posture, not a relic — when GIF is still in use, BMP still drives the Windows clipboard, and JPEG still serves 60% of the web, "old" and "bad" turn out to be independent axes.
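To feel how little machinery a "simple" codec needs, here is a PCX-style run-length coder in a dozen lines. A toy under assumptions (it is neither the PCX nor the QOI bitstream), but it shows why flat-color art collapses so well under RLE:

```python
def rle_encode(pixels):
    """Collapse runs of identical values into (count, value) pairs."""
    runs = []
    for p in pixels:
        if runs and runs[-1][1] == p:
            runs[-1][0] += 1           # extend the current run
        else:
            runs.append([1, p])        # start a new run
    return [(n, v) for n, v in runs]

def rle_decode(runs):
    out = []
    for n, v in runs:
        out.extend([v] * n)
    return out
```

A 64-pixel flat row becomes two pairs; a noisy photo row becomes 64 pairs, which is precisely why RLE wins on icons and loses on photos.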

02
ASTC 6×6 比 BC7 4×4 看似激进,实际几乎没视觉差。
ASTC 6×6 looks aggressive next to BC7 4×4 — but the visual gap is invisible.

直觉上块越大越糊,但 ASTC 6×6(3.56 bpp)对比 BC7 4×4(8 bpp)显存少 2.25×,ΔPSNR 只有不到 1 dB,SSIM 差异在双盲测里几乎不可分辨。移动游戏开发者一致默认 ASTC 6×6 是甜点;Unity 的 mobile preset 直接以 6×6 为默认。压缩比的甜点不在 4×4,而在让 GPU 缓存命中率最大化的那个块大小 —— 4×4 太奢侈,8×8 太糊,6×6 恰好。

Intuition says bigger blocks blur more — but ASTC 6×6 (3.56 bpp) versus BC7 4×4 (8 bpp) is 2.25× less VRAM with under 1 dB of PSNR loss; SSIM differences are essentially invisible in blind tests. Mobile game devs converge on ASTC 6×6 as the sweet spot; Unity's mobile preset defaults to it. The sweet spot of texture compression isn't 4×4 — it's whichever block size maximizes GPU cache hit rate. 4×4 is luxury, 8×8 is mush, 6×6 is just right.
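The VRAM arithmetic is worth doing once by hand. Both BC7 and every ASTC footprint pack one block into 128 bits, so bits-per-pixel is just 128 divided by the block's pixel count:

```python
BLOCK_BITS = 128   # BC7 and all ASTC footprints use fixed 128-bit blocks

def bpp(block_w, block_h):
    """Bits per pixel for a fixed-rate block-compressed format."""
    return BLOCK_BITS / (block_w * block_h)

bc7   = bpp(4, 4)   # 8.0 bpp
astc6 = bpp(6, 6)   # ~3.56 bpp
astc8 = bpp(8, 8)   # 2.0 bpp

vram_savings = bc7 / astc6   # 2.25x less VRAM at the same resolution
```

The 2.25× figure in the text falls straight out of 128/16 ÷ 128/36 = 36/16.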

03
AVIF 战胜 JXL 不是技术问题,是政治问题。
AVIF didn't out-engineer JXL — it out-politicked it.

JXL 在所有维度都赢:HDR、lossless、JPEG 无损 transcode、渐进式解码、解码速度。但 Chrome 团队在 2022-10 以"业界兴趣不足"为由从 Chromium 砍掉 flag,理由是 AVIF 已经够用 —— 而 AVIF 背后是 AOMedia(Google + Netflix + Amazon + 多家芯片厂的联盟)。技术从不单独决定胜负,生态决定。同样的故事在 WebP vs JPEG2000、HEIC vs JXR、Opus vs Vorbis 反复上演;能写进一个 Chromium 的 if-branch,胜过所有 paper 上的曲线。

JXL wins on every axis: HDR, lossless, lossless JPEG transcode, progressive decoding, decode speed. But in October 2022 the Chrome team pulled the flag from Chromium citing "insufficient ecosystem interest" — AVIF, they argued, was enough. And AVIF stands on AOMedia (Google + Netflix + Amazon + chip vendors). Technology rarely decides alone — ecosystems do. The same story replays in WebP vs JPEG2000, HEIC vs JXR, Opus vs Vorbis: shipping inside one Chromium if-branch beats every curve in every paper.

三个发现共享一个底色:格式不是被它的技术指标决定的,是被使用它的人决定的。

All three share the same undertone: a format isn't decided by its technical merits — it's decided by the people who use it.

参考与扩展阅读

References & further reading

本文写作的关键依据。按章节分组,数据 / 引用全部来自公开来源。如发现错漏,欢迎邮件指正。

Sources this article relies on, grouped by phase. All data and quotations are drawn from public references — corrections welcome by email.

Phase I · Web 显示派

Phase I · Web display

  1. RFC 2083 — PNG (Portable Network Graphics) Specification, 1997
  2. ISO/IEC 15948 — PNG Specification (Second Edition), 2003
  3. ISO/IEC 10918-1 — JPEG, 1992
  4. RFC 1951 — DEFLATE Compressed Data Format Specification
  5. AOMedia — AV1 Image File Format (AVIF) specification — aomediacodec.github.io/av1-avif/
  6. ISO/IEC 23008-12 — HEIF (Image File Format)
  7. AOMedia AV1 Bitstream & Decoding Process Specification, v1.0.0-errata1
  8. libwebp documentation — developers.google.com/speed/webp
  9. mozjpeg — github.com/mozilla/mozjpeg
  10. Squoosh source — github.com/GoogleChromeLabs/squoosh
  11. Cloudflare blog — "Generating WebP, AVIF and JPEG XL all at once"
  12. Jon Sneyers — "The case for JPEG XL", Cloudinary blog (2021)
  13. Chrome JXL removal — bugs.chromium.org/p/chromium/issues/detail?id=1178058
  14. Smashing Magazine — "Comparing JPEG-XL, AVIF, WebP & JPEG" (2022)

Phase II · GPU 纹理派

Phase II · GPU textures

  1. Khronos KTX 2.0 specification — registry.khronos.org/KTX/specs/2.0/ktxspec_v2.html
  2. Basis Universal — github.com/BinomialLLC/basis_universal
  3. ARM ASTC specification — developer.arm.com/documentation/100672
  4. D3D11 BC1-BC7 specification — Microsoft Docs (Direct3D 11 texture block compression)
  5. Intel ISPCTextureCompressor — github.com/GameTechDev/ISPCTextureCompressor
  6. Lance Williams — "Pyramidal Parametrics", SIGGRAPH (1983)
  7. OpenGL ES 3.0 / 3.2 specification — Khronos Group
  8. NVIDIA Texture Tools — github.com/NVIDIAGameWorks/NVIDIATextureTools
  9. Iourcha, Nayak & Hong — "System and method for fixed-rate block-based image compression with inferred pixel values" (S3TC, 1999)

Phase III · HDR / 工程影像

Phase III · HDR / engineering imaging

  1. OpenEXR — openexr.com (Academy Software Foundation)
  2. Greg Ward — "Real Pixels", Graphics Gems II (1991, RGBE format)
  3. Adobe DNG specification 1.7 — helpx.adobe.com/camera-raw/digital-negative.html
  4. LibRaw documentation — libraw.org
  5. NEMA DICOM Standard PS 3.x — dicomstandard.org
  6. TIFF 6.0 specification — Adobe (1992)
  7. SMPTE ST 268 — DPX File Format for Digital Moving-Picture Exchange
  8. Dave Coffin's dcraw — cybercom.net/~dcoffin/dcraw/
  9. OpenColorIO — opencolorio.org
  10. ITU-R BT.2100 — Image parameter values for HDR television
  11. SMPTE ST 2084 — Perceptual Quantizer (PQ) transfer function

Phase IV · 矢量 / 文档

Phase IV · Vector / document

  1. W3C SVG 1.1 / SVG 2 Recommendation — w3.org/TR/SVG2/
  2. ISO 32000-1 / -2 — Document management — Portable Document Format (PDF)
  3. ITU-T T.88 — JBIG2 (Joint Bi-level Image experts Group)
  4. Adobe PostScript Language Reference Manual, 3rd ed. (1999)
  5. Lottie — airbnb.design/lottie / lottiefiles.com
  6. Encapsulated PostScript File Format Specification, Adobe v3.0
  7. WMF / EMF — Microsoft Open Specifications [MS-WMF], [MS-EMF]

Phase V · 复古 / 怪格式

Phase V · Retro / oddities

  1. QOI specification — qoiformat.org / github.com/phoboslab/qoi
  2. Farbfeld — tools.suckless.org/farbfeld/
  3. NetPBM (PBM/PGM/PPM) — netpbm.sourceforge.net
  4. EA IFF '85 specification — Jerry Morrison, Electronic Arts (1985)
  5. Truevision TGA File Format Specification, v2.0 (1989)
  6. ZSoft PCX Technical Reference Manual (1988)
  7. BMP / DIB structure — Microsoft Docs (Win32 GDI)
  8. XPM — X PixMap format, X.Org reference

Phase VI · 卫星 / 科学

Phase VI · Satellite / science

  1. FITS Standard 4.0 — fits.gsfc.nasa.gov/fits_standard.html
  2. OGC GeoTIFF 1.1 specification — ogc.org/standard/geotiff/
  3. NITF MIL-STD-2500C — National Imagery Transmission Format
  4. astropy — astropy.org
  5. GDAL — gdal.org (Geospatial Data Abstraction Library)
  6. Cloud-Optimized GeoTIFF (COG) — cogeo.org
  7. Zarr — zarr.dev (chunked, compressed N-dimensional arrays)

Phase VII · 神经压缩 / 未来

Phase VII · Neural / future

  1. Toderici et al. — "Variable Rate Image Compression with Recurrent Neural Networks", ICLR 2016
  2. Ballé, Minnen et al. — "Variational Image Compression with a Scale Hyperprior", ICLR 2018
  3. Mentzer, Toderici et al. — "High-Fidelity Generative Image Compression" (HiFiC), NeurIPS 2020
  4. Yang, Mandt — "Lossy Image Compression with Conditional Diffusion Models" (CDC), NeurIPS 2023
  5. CompressAI — github.com/InterDigitalInc/CompressAI
  6. ISO/IEC 21122 — JPEG XS (low-latency lightweight image coding)
  7. WebP2 — chromium.googlesource.com/codecs/libwebp2 (experimental)
  8. JPEG AI — Call for Proposals, ISO/IEC JTC 1/SC 29/WG 1 (2022)

综合 / 工具

General / tools

  1. libvips documentation — libvips.github.io/libvips/
  2. ImageMagick documentation — imagemagick.org
  3. OpenImageIO — openimageio.readthedocs.io
  4. David Salomon — "Data Compression: The Complete Reference", 4th ed. (Springer)
  5. Khalid Sayood — "Introduction to Data Compression", 5th ed.
  6. Charles Poynton — "Digital Video and HD: Algorithms and Interfaces", 2nd ed.

合计 ~70 条参考,覆盖 8 组。完整列表可视为这条沉积带的"地层钻孔",每一层都能往下挖。

About 70 references in total across 8 groups. Treat the list as a borehole through this sedimentary band — every stratum can be dug deeper.

✦ ✦ ✦