Today I learned

llm

A collection of 3 posts
Ghostbusters: Who You Gonna Call When KV Cache Eats Your GPU?

My model was 60GB and my GPU had 141GB, so I should have had 81GB free — yet I kept hitting OOM errors. The culprit? The KV cache, an unseen memory hog that consumed 68GB without appearing in any config file. This article explores how the context window and batch size compete for that memory in a zero-sum game.
03 Nov 2025 14 min read
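The memory the teaser describes is easy to estimate from the standard KV cache formula: two tensors (K and V) per layer, sized by KV heads, head dimension, dtype width, sequence length, and batch size. A minimal sketch, using hypothetical Llama-70B-style numbers (80 layers, 8 GQA KV heads, head dim 128, FP16) rather than the article's actual model:

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim,
                   dtype_bytes, seq_len, batch_size):
    # 2 = one K tensor and one V tensor cached per layer
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes * seq_len * batch_size

# Per-token cost for the hypothetical config: 327,680 bytes (~320 KiB/token)
per_token = kv_cache_bytes(80, 8, 128, 2, seq_len=1, batch_size=1)

# Batch 8 at a 32k context window: 80 GiB of VRAM for the cache alone
total_gib = kv_cache_bytes(80, 8, 128, 2, seq_len=32768, batch_size=8) / 2**30
```

Halving the batch size or the context window halves the cache, which is exactly the zero-sum trade-off the post is about.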
ai

Fast & Furious Tensor Parallelism: GPU Heist Gone Wrong

I expected splitting a model across 4 H200 GPUs to quadruple throughput; instead, latency got 2.8x worse and throughput dropped 35%. Without NVLink, tensor parallelism adds more communication overhead than the split saves in compute, so sometimes one GPU outperforms four.
02 Nov 2025 17 min read
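The effect in the teaser can be sketched with a simple cost model: tensor parallelism divides the matmul time by the TP degree but adds roughly two all-reduces per transformer layer (after attention and after the MLP), and those all-reduces are far slower over PCIe than over NVLink. All numbers below are illustrative assumptions, not the article's measurements:

```python
def tp_decode_latency_ms(compute_ms, num_layers, allreduce_ms, tp_degree):
    # Compute shrinks with the TP degree; communication grows with depth.
    if tp_degree == 1:
        return compute_ms
    return compute_ms / tp_degree + 2 * num_layers * allreduce_ms

# Hypothetical: 40 ms of single-GPU compute, 80 layers.
pcie   = tp_decode_latency_ms(40, 80, allreduce_ms=0.3,  tp_degree=4)  # 58.0 ms: worse than 1 GPU
nvlink = tp_decode_latency_ms(40, 80, allreduce_ms=0.02, tp_degree=4)  # 13.2 ms: TP pays off
```

With a slow interconnect the communication term dominates and TP=4 loses to TP=1, matching the post's headline result.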
ai

Honey, I Shrunk the Model: When Quantizing 70B Parameters Broke Everything

I tried to shrink a 70B model from FP16 to FP8 to fit in my 141GB of VRAM. Spoiler: it broke everything. After testing 6 models and 3 quantization formats, I discovered that a 30B model in full precision outperformed every quantized 70B. Turns out precision matters more than parameter count.
01 Nov 2025 9 min read
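The motivation for quantizing in the first place is plain weight arithmetic: bytes per parameter times parameter count. A minimal sketch of that back-of-the-envelope math (weights only; the KV cache and activations come on top, which is why 140GB of FP16 weights cannot actually run on a 141GB GPU):

```python
def weight_gb(params_billion, bytes_per_param):
    # Weight memory only, in decimal GB.
    return params_billion * 1e9 * bytes_per_param / 1e9

fp16_70b = weight_gb(70, 2)  # 140 GB: no headroom on a 141GB GPU
fp8_70b  = weight_gb(70, 1)  # 70 GB: fits, but at reduced precision
fp16_30b = weight_gb(30, 2)  # 60 GB: fits in full precision
```

The post's finding is that the third option — fewer parameters at full precision — beat the quantized 70B variants in quality.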