Notes

Research, engineering write-ups, and the dead ends in between.

Distribution Matching Is Not Enough: Two Failure Modes in Latent Text Drifting

Probing Latent Directions in Video Diffusion Models

Hybrid Lexical–Semantic Retrieval for Tool Selection in Agent Systems

From Single-GPU to Distributed Training: A Framework for Making the Right Call

Distributed Data Parallel: How It Actually Works

Tensor Parallelism and Sequence Parallelism

Pipeline Parallelism: How It Actually Works

ZeRO and FSDP: Model Sharding

Kinetic-4B: A 4-Billion-Parameter Model That Outperforms Claude Haiku at Tool Calling

LLM Inference at the Edge