fp4

The engineering layer of AI. Deep technical coverage of LLM, GPU, and ML systems internals.
https://fp4.dev/
Language: en-us

H100 vs H200 vs B200: TCO for Inference Infrastructure
https://fp4.dev/silicon/h100-h200-b200-tco/
Beyond the spec sheet: deriving actual cost per million tokens for each generation, accounting for memory capacity, bandwidth, rack power, and cooling, the numbers that determine your infrastructure decision.
Mon, 22 Jun 2026 00:00:00 GMT
Tags: silicon, h100, h200, b200, tco, inference, hardware, cost

Intra-node vs Inter-node Interconnects in Distributed Training
https://fp4.dev/system/distributed-interconnects/
NVLink, NVSwitch, InfiniBand, and RoCE: the bandwidth and latency numbers that determine whether your distributed training job scales or stalls.
Sat, 20 Jun 2026 00:00:00 GMT
Tags: system, nvlink, infiniband, roce, distributed-training, collective-ops, nccl

GPU Memory Hierarchy and Kernel Performance
https://fp4.dev/silicon/gpu-memory-hierarchy/
Why memory bandwidth, not FLOPs, is the binding constraint for most LLM workloads, and how H100's five-level hierarchy determines what your kernels can actually achieve.
Thu, 18 Jun 2026 00:00:00 GMT
Tags: silicon, gpu, memory, hbm, sram, bandwidth, kernels