Archive · fp4

H100 vs H200 vs B200: TCO for Inference Infrastructure

Beyond the spec sheet: deriving actual cost per million tokens for each generation, accounting for memory capacity, bandwidth, rack power, and cooling — the numbers that determine your infrastructure decision.

⚙⚙⚙⚙⚙ 2026.06.22

2606.002 System

System · 26 min

Intra-node vs Inter-node Interconnects in Distributed Training

NVLink, NVSwitch, InfiniBand, and RoCE — the bandwidth and latency numbers that determine whether your distributed training job scales or stalls.

⚙⚙⚙⚙⚙ 2026.06.20

2606.001 Silicon

Silicon · 22 min

GPU Memory Hierarchy and Kernel Performance

Why memory bandwidth — not FLOPs — is the binding constraint for most LLM workloads, and how H100's five-level hierarchy determines what your kernels can actually achieve.

⚙⚙⚙⚙⚙ 2026.06.18