LLM · Inference Engine · llama.cpp · vLLM · Transformer · KV Cache · MoE · MLA · Quantization · Speculative Decoding · Continuous Batching · Tensor Parallelism · GBNF · Multimodal · Reasoning · Performance · Immersive

The Life of an LLM Inference — A Prompt's 28 Stops Inside llama.cpp

After you press enter on "hello, llama", those 5 tokens travel 28 stops inside llama.cpp before they come back as an answer: the tokenizer splits bytes into BPE ids, the embedding table fetches 4096-dim vectors, the KV cache sets the memory ceiling of the whole run, FlashAttention fuses Q·Kᵀ and an online softmax into a single kernel that stays in SRAM, the MoE router lights up only 8 of 256 experts, MLA compresses the entire KV cache into latent form, quantisation squeezes a 16 GB model into 5 GB, speculative decoding shaves 60% off decode time, continuous batching keeps the GPU busy between requests, TP/PP/EP splits 405B across 8 cards, GBNF forces output to obey a JSON schema, the vision encoder turns an image into 256 tokens, reasoning models "think" through 10,000 internal tokens, Blackwell fp4 fits 405B into a single node, and finally SSE pushes every character one frame at a time to the user's screen. Every stop maps to a real file and a real function in llama.cpp / vLLM, read line by line.
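The "16 GB into 5 GB" figure can be sanity-checked with simple arithmetic. The sketch below is only an illustration under stated assumptions: the 8-billion parameter count and the 5-bit average width are hypothetical values chosen to match the blurb's numbers, the helper name `model_size_gb` is made up for this example, and the estimate ignores file metadata, per-block scales, and the KV cache.

```python
def model_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    total_bits = n_params * bits_per_weight
    return total_bits / 8 / 1e9


# Assumption: an 8-billion-parameter model (not stated in the text).
N_PARAMS = 8e9

fp16_gb = model_size_gb(N_PARAMS, 16)  # 16 bits per weight
q5_gb = model_size_gb(N_PARAMS, 5)     # assumed ~5 bits per weight on average

print(f"fp16: {fp16_gb:.1f} GB, quantised: {q5_gb:.1f} GB")
```

Real quantised files usually land slightly off this estimate, because block-wise formats store extra scale values and often keep some tensors at higher precision.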

CSS · Browser Engine · Blink · Performance Engineering · Rendering Pipeline · Immersive

The Life of a Stylesheet — Inside Chromium's CSS Engine

A 4-line stylesheet has to traverse 9 stages, hit 3 index trees, and participate in an 8-step cascade before one pixel on screen turns oklch blue. Every step unfolded — Blink source-line citations, CSSOM field layouts, cascade rules, container queries / @layer / :has() internals.

TLS · Network Protocol · Cryptography · ECH · Post-Quantum · X.509 · Immersive

The Life of a TLS Handshake — TLS 1.3 Protocol in Full

A client fires off a 538-byte ClientHello stuffed with key agreement, version negotiation, ALPN, SNI, and a 0-RTT invitation. One round-trip later, HKDF grows eight keys out of a single root seed, the certificate chain is verified, the handshake self-attests via HMAC, and AEAD takes over every application byte — all in 1 RTT. A byte-level field manual for TLS 1.3, from ClientHello to ECH to post-quantum KEM.
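How several keys can grow out of one secret is easiest to see in code. The following is a minimal sketch, not the full TLS 1.3 key schedule: it implements HKDF-Extract and HKDF-Expand as defined in RFC 5869 and the HKDF-Expand-Label wrapper from RFC 8446, then derives a write key and IV. The all-zero input secret and the AES-128-GCM sizes (16-byte key, 12-byte IV) are placeholder assumptions; a real handshake feeds in the (EC)DHE shared secret and transcript hashes.

```python
import hashlib
import hmac

HASH_LEN = 32  # SHA-256 output size in bytes


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """HKDF-Extract (RFC 5869): concentrate input keying material into a PRK."""
    if not salt:
        salt = b"\x00" * HASH_LEN
    return hmac.new(salt, ikm, hashlib.sha256).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """HKDF-Expand (RFC 5869): stretch a PRK into `length` output bytes."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested output is too long")
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def hkdf_expand_label(secret: bytes, label: str, context: bytes, length: int) -> bytes:
    """HKDF-Expand-Label (RFC 8446): Expand with a structured HkdfLabel as info."""
    full_label = b"tls13 " + label.encode("ascii")
    info = (
        length.to_bytes(2, "big")
        + bytes([len(full_label)]) + full_label
        + bytes([len(context)]) + context
    )
    return hkdf_expand(secret, info, length)


# Placeholder secret: an all-zero input, used only for illustration.
traffic_secret = hkdf_extract(b"", b"\x00" * HASH_LEN)

# Different labels yield independent outputs from the same secret.
write_key = hkdf_expand_label(traffic_secret, "key", b"", 16)
write_iv = hkdf_expand_label(traffic_secret, "iv", b"", 12)

print(write_key.hex(), write_iv.hex())
```

The label is what separates one derived key from another: the same secret expanded under "key" and "iv" gives unrelated byte strings, so a single root can safely feed many purposes.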

WebGPU · GPU · Graphics · WGSL · Dawn · wgpu · Systems · Immersive

Eight Translations of One Dispatch — A WebGPU Stack Source-Level Walkthrough

One `pass.dispatchWorkgroups()` call traverses eight translations before it reaches GPU silicon — JS → Blink WebIDL → Renderer Wire → Mojo IPC → Dawn validation → Tint(WGSL→SPIR-V/MSL/HLSL) → Metal/D3D12/Vulkan → driver ISA. A 25-chapter vertical dissection of the full WebGPU stack — Dawn + wgpu side by side, real source, real latencies, threaded by a compute matmul.

GC · Runtimes · Memory · Programming Languages · Performance Engineering · Immersive

Many Ways to Die — A Family Map of 11 Garbage Collectors

The same line `list = null` frees memory immediately in CPython, waits for the next minor GC in V8, is reclaimed concurrently with sub-ms pauses in ZGC, dies only when the actor dies in Erlang, and has its drop point fixed at compile time in Rust. A cross-language tour of garbage collection across 20+ runtimes, organised by algorithm family, threaded by one main-line program.

JS Engines · QuickJS · Interpreter · Bytecode · GC · Performance Engineering · Immersive

The Life of One JS Line — A QuickJS Source-Level Walkthrough

How 70 000 lines of C turn one line of JS into [2,4,6]. A source-level walkthrough of QuickJS — lexer, bytecode, the 3000-line interpreter loop, refcount GC — with every step compared against V8 / JSC / SpiderMonkey / Hermes.

WebAssembly · Compiler · JIT · V8 · Liftoff · TurboFan · Performance Engineering · Immersive

From Rust to SIMD — The Life of WebAssembly

An 11-line Rust convolution, starting at cargo build, has to cross seventeen stages, two tiers of JIT, and 4 GiB of linear memory before it can light up one SIMD-vectorised pixel on screen. Every byte, every IR form, every machine instruction, and every W3C / IETF spec citation is fully unpacked.

Network Protocols · HTTP/3 · QUIC · TLS · Performance Engineering · Immersive

The Life of a Request — A Field Map of HTTP/3

A single GET has to cross 13 stages on top of UDP, four cryptographic levels and three streams before it can land a 200 OK. Wire formats, keys, timelines and source paths for QUIC, HTTP/3 and QPACK.

Image Formats · Graphics · Codecs · Compression · Codex · Immersive

Sediment of Pixels — A Codex of 50+ Image Formats

From 1985 BMP to 2026 neural codecs, from screen pixels to GPU memory, from medical CT to astronomical FITS — a hand-drawn codex of image formats. 50+ formats, 7 families, 67 chapters.

Browser · Rendering Engine · Chromium · Blink · Performance Engineering · Immersive

Bytecode to Pixels — A Field Map of Chromium's Rendering Pipeline

A stream of bytes from the network must cross 13 stages, 3 processes and 4 property trees before it can light a single pixel. Source paths, class hierarchies, algorithm skeletons and diagrams for every stage.

JavaScript · V8 · Performance Engineering · JIT · Hidden Class · Inline Cache · Immersive

JavaScript at the Limit — V8 Internals and One Hot Function's Road to 10×

How to diagnose a slow piece of JS, locate its bottleneck, and rewrite it ten times faster using V8 internals as a guide. A 20-chapter methodology — one real px2rem function takes the trip from ~240ms to ~24ms / 1M iters, each cut mapping to a concrete V8 mechanism.

Performance Engineering · Frame Rate · Jank · Stutter · Mobile · Immersive

Measuring «Smoothness» — From FrameTime to Stutter

A vocabulary for telling apart the different experiences of lag. From FrameTime to Stutter: why high frame rates can still feel choppy, and what the machine sees the moment your eyes say it is stuttering.
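A small numeric example shows why an average frame rate can hide a visible hitch. This is a rough sketch under assumptions of its own: the helper `summarize_frames`, the synthetic frame times, and the "more than 2x the median frame time" stutter rule are all hypothetical, chosen for illustration rather than taken from the article's definitions.

```python
from statistics import median


def summarize_frames(frame_times_ms, stutter_factor=2.0):
    """Summarise a run of frame times given in milliseconds.

    A frame counts as a stutter here when it takes longer than
    `stutter_factor` times the median frame time (an assumed heuristic).
    """
    total_ms = sum(frame_times_ms)
    typical_ms = median(frame_times_ms)
    stutters = [t for t in frame_times_ms if t > stutter_factor * typical_ms]
    return {
        "avg_fps": 1000.0 * len(frame_times_ms) / total_ms,
        "worst_ms": max(frame_times_ms),
        "stutter_count": len(stutters),
    }


# Synthetic data: 59 smooth frames at 10 ms plus one 110 ms hitch.
frames = [10.0] * 59 + [110.0]
report = summarize_frames(frames)
print(report)
```

The run averages roughly 86 FPS, well above 60, yet it contains one frame that stays on screen for 110 ms. The average describes throughput; the outlier is what the eye notices.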

Game Engine · Container · Performance · Engineering Retrospective · Immersive

How to Build a High-Performance Mini-Game Container — The Evolution & Tech Behind Helio

One year — from kickoff in 2023 to surpassing WebView in 2024. The complete engineering retrospective for a cross-app mini-game container, with 44 hand-built visualizations.

ai · agent · architecture

Claude 101: An Interactive Tour of Claude Code from the AI's Point of View

Claude 101 is an interactive learning site that uses 16 deep chapters, annotated real source code, and animated visualizations to teach you Claude Code's complete architecture — from the AI's first-person perspective.

ai · agent · architecture

Claude Code's Memory System, Explained

A deep dive into Claude Code's Memory system — its 5-layer architecture, 4 memory types, write/retrieve/delete flows, and the three-layer safety mechanism.

ai · agent · architecture

A Visual Guide to OpenHarness: Recreating Claude Code's Agent Architecture in 3% of the Code

OpenHarness is an open-source project from HKU's HKUDS group that recreates Claude Code's Agent Harness architecture in pure Python — 98% tool coverage in just 11,733 lines of code.

cybernetics · ai · systems-thinking

Understanding AI Coding Through Cybernetics

Cybernetics' feedback loop model maps perfectly onto the AI Coding workflow. From negative feedback loops to Plant complexity to the evolution of control strategies, this framework lets you rethink the nature of human-AI collaboration.