Traces of Thought

May 24, 2026

LLM, Inference Engine, llama.cpp, vLLM, Transformer, KV Cache, MoE, MLA, Quantization, Speculative Decoding, Continuous Batching, Tensor Parallelism, GBNF, Multimodal, Reasoning, Performance · Immersive

The Life of an LLM Inference — A Prompt's 28 Stops Inside llama.cpp

After you press enter on "hello, llama", those 5 tokens travel 28 stops inside llama.cpp before they come back as an answer — the tokenizer splits bytes into BPE ids, the embedding table fetches 4096-dim vectors, the KV cache decides the memory ceiling of the whole run, attention fuses Q·Kᵀ through FlashAttention's online softmax into a single SRAM kernel, the MoE router lights up only 8 of 256 experts, MLA collapses the entire KV table into latent form, quantisation squeezes a 16 GB model into 5 GB, speculative decoding shaves 60% off decode time, continuous batching keeps the GPU busy between requests, TP/PP/EP splits 405B across 8 cards, GBNF forces output to obey a JSON schema, the vision encoder turns an image into 256 tokens, reasoning models "*think*" 10000 internal tokens, Blackwell fp4 fits 405B into a single node, and finally SSE pushes every character one frame at a time to the user's screen. Every stop maps to a real file and a real function in llama.cpp / vLLM, read line by line.

May 23, 2026

CSS, Browser Engine, Blink, Performance Engineering, Rendering Pipeline · Immersive

The Life of a Stylesheet — Inside Chromium's CSS Engine

A 4-line stylesheet has to traverse 9 stages, hit 3 index trees, and participate in an 8-step cascade before one pixel on screen turns oklch blue. Every step unfolded — Blink source-line citations, CSSOM field layouts, cascade rules, container queries / @layer / :has() internals.

May 23, 2026

TLS, Network Protocol, Cryptography, ECH, Post-Quantum, X.509 · Immersive

The Life of a TLS Handshake — TLS 1.3 Protocol in Full

A client fires off a 538-byte ClientHello stuffed with key agreement, version negotiation, ALPN, SNI, and a 0-RTT invitation. One round-trip later, HKDF grows eight keys out of a single root seed, the certificate chain is verified, the handshake self-attests via HMAC, and AEAD takes over every application byte — all in 1 RTT. A byte-level field manual for TLS 1.3, from ClientHello to ECH to post-quantum KEM.

May 18, 2026

WebGPU, GPU, Graphics, WGSL, Dawn, wgpu, Systems · Immersive

Eight Translations of One Dispatch — A WebGPU Stack Source-Level Walkthrough

One `pass.dispatchWorkgroups()` call traverses eight translations before it reaches GPU silicon — JS → Blink WebIDL → Renderer Wire → Mojo IPC → Dawn validation → Tint(WGSL→SPIR-V/MSL/HLSL) → Metal/D3D12/Vulkan → driver ISA. A 25-chapter vertical dissection of the full WebGPU stack — Dawn + wgpu side by side, real source, real latencies, threaded by a compute matmul.

May 17, 2026

GC, Runtimes, Memory, Programming Languages, Performance Engineering · Immersive

Many Ways to Die — A Family Map of 11 Garbage Collectors

The same line `list = null` frees immediately in CPython, waits for the next minor GC in V8, completes sub-ms concurrently in ZGC, dies only when the actor dies in Erlang, and has its drop point fixed at compile time in Rust. A cross-language tour of garbage collection across 20+ runtimes, organised by algorithm family, threaded by one main-line program.

May 16, 2026

JS Engines, QuickJS, Interpreter, Bytecode, GC, Performance Engineering · Immersive

The Life of One JS Line — A QuickJS Source-Level Walkthrough

How 70 000 lines of C turn one line of JS into [2,4,6]. A source-level walkthrough of QuickJS — lexer, bytecode, the 3000-line interpreter loop, refcount GC — with every step compared against V8 / JSC / SpiderMonkey / Hermes.

May 16, 2026

WebAssembly, Compiler, JIT, V8, Liftoff, TurboFan, Performance Engineering · Immersive

From Rust to SIMD — The Life of WebAssembly

Eleven lines of Rust convolution code, starting from cargo build, have to cross seventeen stages, two tiers of JIT, and 4 GiB of linear memory before they can light up one SIMD-vectorised pixel on screen. Every byte, every IR form, every machine instruction, and every W3C / IETF spec citation, fully unpacked.

May 15, 2026

Network Protocols, HTTP/3, QUIC, TLS, Performance Engineering · Immersive

The Life of a Request — A Field Map of HTTP/3

A single GET has to cross 13 stages on top of UDP, four cryptographic levels and three streams before it can land a 200 OK. Wire formats, keys, timelines and source paths for QUIC, HTTP/3 and QPACK.

May 5, 2026

Image Formats, Graphics, Codecs, Compression, Codex · Immersive

Sediment of Pixels — A Codex of 50+ Image Formats

From 1985 BMP to 2026 neural codecs, from screen pixels to GPU memory, from medical CT to astronomical FITS — a hand-drawn codex of image formats. 50+ formats, 7 families, 67 chapters.

May 3, 2026

Browser, Rendering Engine, Chromium, Blink, Performance Engineering · Immersive

Bytecode to Pixels — A Field Map of Chromium's Rendering Pipeline

A stream of bytes from the network must cross 13 stages, 3 processes and 4 property trees before it can light a single pixel. Source paths, class hierarchies, algorithm skeletons and diagrams for every stage.

May 3, 2026

JavaScript, V8, Performance Engineering, JIT, Hidden Class, Inline Cache · Immersive

JavaScript at the Limit — V8 Internals and One Hot Function's Road to 10×

How to diagnose a slow piece of JS, locate its bottleneck, and rewrite it ten times faster using V8 internals as a guide. A 20-chapter methodology — one real px2rem function takes the trip from ~240ms to ~24ms / 1M iters, each cut mapping to a concrete V8 mechanism.

Apr 27, 2026

Performance Engineering, Frame Rate, Jank, Stutter, Mobile · Immersive

Measuring «Smoothness» — From FrameTime to Stutter

A discriminator language for the experience of lag. From FrameTime to Stutter — why high frame rates can still feel choppy, and what the machine sees the moment your eyes say it is stuttering.

Apr 26, 2026

Game Engine, Container, Performance, Engineering Retrospective · Immersive

How to Build a High-Performance Mini-Game Container — The Evolution & Tech Behind Helio

One year — from kickoff in 2023 to surpassing WebView in 2024. The complete engineering retrospective for a cross-app mini-game container, with 44 hand-built visualizations.

Apr 5, 2026

ai, agent, architecture

Claude 101: An Interactive Tour of Claude Code from the AI's Point of View

Claude 101 is an interactive learning site that uses 16 deep chapters, annotated real source code, and animated visualizations to teach you Claude Code's complete architecture — from the AI's first-person perspective.

Apr 5, 2026

ai, agent, architecture

Claude Code's Memory System, Explained

A deep dive into Claude Code's Memory system — its 5-layer architecture, 4 memory types, write/retrieve/delete flows, and the three-layer safety mechanism.

Apr 4, 2026

ai, agent, architecture

A Visual Guide to OpenHarness: Recreating Claude Code's Agent Architecture in 3% of the Code

OpenHarness is an open-source project from HKU's HKUDS group that recreates Claude Code's Agent Harness architecture in pure Python — 98% tool coverage in just 11,733 lines of code.

Mar 29, 2026

cybernetics, ai, systems-thinking

Understanding AI Coding Through Cybernetics

Cybernetics' feedback loop model maps perfectly onto the AI Coding workflow. From negative feedback loops to Plant complexity to the evolution of control strategies, this framework lets you rethink the nature of human-AI collaboration.
