The Life of an LLM Inference — A Prompt's 28 Stops Inside llama.cpp
After you press enter on "hello, llama", those 5 tokens travel 28 stops inside llama.cpp before they come back as an answer. The tokenizer splits bytes into BPE ids and the embedding table fetches 4096-dim vectors. The KV cache sets the memory ceiling of the whole run, while FlashAttention fuses Q·Kᵀ and an online softmax into a single kernel that stays in SRAM. The MoE router lights up only 8 of 256 experts, MLA collapses the entire KV cache into latent form, quantisation squeezes a 16 GB model into 5 GB, and speculative decoding shaves 60% off decode time. Continuous batching keeps the GPU busy between requests, TP/PP/EP splits 405B across 8 cards, GBNF forces output to obey a JSON schema, the vision encoder turns an image into 256 tokens, reasoning models "*think*" for 10000 internal tokens, and Blackwell fp4 fits 405B into a single node. Finally, SSE pushes every character to the user's screen, one frame at a time. Every stop maps to a real file and a real function in llama.cpp / vLLM, read line by line.
The Life of a Stylesheet — Inside Chromium's CSS Engine
A 4-line stylesheet has to traverse 9 stages, hit 3 index trees, and participate in an 8-step cascade before one pixel on screen turns oklch blue. Every step is unfolded, with Blink source-line citations, CSSOM field layouts, cascade rules, and the internals of container queries, @layer and :has().
The Life of a TLS Handshake — TLS 1.3 Protocol in Full
A client fires off a 538-byte ClientHello stuffed with key agreement, version negotiation, ALPN, SNI, and a 0-RTT invitation. One round-trip later, HKDF grows eight keys out of a single root seed, the certificate chain is verified, the handshake self-attests via HMAC, and AEAD takes over every application byte — all in 1 RTT. A byte-level field manual for TLS 1.3, from ClientHello to ECH to post-quantum KEM.
Eight Translations of One Dispatch — A WebGPU Stack Source-Level Walkthrough
One `pass.dispatchWorkgroups()` call traverses eight translations before it reaches GPU silicon: JS → Blink WebIDL → Renderer Wire → Mojo IPC → Dawn validation → Tint (WGSL → SPIR-V/MSL/HLSL) → Metal/D3D12/Vulkan → driver ISA. A 25-chapter vertical dissection of the full WebGPU stack, with Dawn and wgpu side by side, real source and real latencies, threaded by a compute matmul.
Many Ways to Die — A Family Map of 11 Garbage Collectors
The same line `list = null` frees immediately in CPython, waits for the next minor GC in V8, completes sub-ms concurrently in ZGC, dies only when the actor dies in Erlang, and has its drop decided at compile time in Rust. A cross-language tour of garbage collection across 20+ runtimes, organised by algorithm family, threaded by one main-line program.
The Life of One JS Line — A QuickJS Source-Level Walkthrough
How 70 000 lines of C turn one line of JS into [2,4,6]. A source-level walkthrough of QuickJS — lexer, bytecode, the 3000-line interpreter loop, refcount GC — with every step compared against V8 / JSC / SpiderMonkey / Hermes.
From Rust to SIMD — The Life of WebAssembly
Starting from cargo build, 11 lines of Rust convolution have to cross seventeen stages, two tiers of JIT, and 4 GiB of linear memory before they can light up one SIMD-vectorised pixel on screen. Every byte, every IR form, every machine instruction, and every W3C / IETF spec citation is fully unpacked.
The Life of a Request — A Field Map of HTTP/3
A single GET has to cross 13 stages on top of UDP, four cryptographic levels and three streams before it can land a 200 OK. Wire formats, keys, timelines and source paths for QUIC, HTTP/3 and QPACK.
Sediment of Pixels — A Codex of 50+ Image Formats
From 1985 BMP to 2026 neural codecs, from screen pixels to GPU memory, from medical CT to astronomical FITS — a hand-drawn codex of image formats. 50+ formats, 7 families, 67 chapters.
Bytecode to Pixels — A Field Map of Chromium's Rendering Pipeline
A stream of bytes from the network must cross 13 stages, 3 processes and 4 property trees before it can light a single pixel. Source paths, class hierarchies, algorithm skeletons and diagrams for every stage.
JavaScript at the Limit — V8 Internals and One Hot Function's Road to 10×
How to diagnose a slow piece of JS, locate its bottleneck, and rewrite it ten times faster using V8 internals as a guide. A 20-chapter methodology in which one real px2rem function goes from ~240 ms to ~24 ms per 1M iterations, each cut mapping to a concrete V8 mechanism.
Measuring «Smoothness» — From FrameTime to Stutter
A diagnostic vocabulary for the experience of lag. From FrameTime to Stutter: why high frame rates can still feel choppy, and what the machine sees the moment your eyes say it is stuttering.
How to Build a High-Performance Mini-Game Container — The Evolution & Tech Behind Helio
One year — from kickoff in 2023 to surpassing WebView in 2024. The complete engineering retrospective for a cross-app mini-game container, with 44 hand-built visualizations.
Claude 101: An Interactive Tour of Claude Code from the AI's Point of View
Claude 101 is an interactive learning site that uses 16 deep chapters, annotated real source code, and animated visualizations to teach you Claude Code's complete architecture — from the AI's first-person perspective.
Claude Code's Memory System, Explained
A deep dive into Claude Code's Memory system — its 5-layer architecture, 4 memory types, write/retrieve/delete flows, and the three-layer safety mechanism.
A Visual Guide to OpenHarness: Recreating Claude Code's Agent Architecture in 3% of the Code
OpenHarness is an open-source project from HKU's HKUDS group that recreates Claude Code's Agent Harness architecture in pure Python — 98% tool coverage in just 11,733 lines of code.
Understanding AI Coding Through Cybernetics
The feedback-loop model from cybernetics maps closely onto the AI coding workflow. From negative feedback loops to the complexity of the plant (the system under control) to the evolution of control strategies, this framework lets you rethink the nature of human-AI collaboration.