| Date | Post |
| --- | --- |
| Mar 29, 2026 | Scaling LLMs: MoE Routing & JAX Parallelism on TPU |
| Mar 17, 2026 | GPUs for LLMs: The Same Rooflines, Different Numbers |
| Mar 14, 2026 | TPU Profiling: When Math Meets Reality |
| Mar 08, 2026 | Serving LLaMA 3-70B: From Theory to Production Numbers |
| Mar 07, 2026 | Transformer Inference: Two Problems in Disguise |
| Mar 02, 2026 | Training LLaMA 3 on TPUs: Putting Theory Into Practice |
| Feb 10, 2026 | Training at Scale: When Communication Becomes the Enemy |
| Feb 07, 2026 | Transformer Math: The 6PT Rule and Other Accounting Tricks |
| Feb 07, 2026 | Sharding Strategies: The Art of Distributed Matrix Multiplication |
| Feb 04, 2026 | TPU Architecture: Understanding the Bandwidth Hierarchy |
| Feb 03, 2026 | Roofline Analysis: When Does Your Model Hit the Wall? |
| Feb 02, 2026 | Scaling LLMs: From Alchemy to Science (Part 0) |
| Jan 31, 2026 | Math to Model: GRPO, PCA Augmentation, and SVD |
| Jan 31, 2026 | Level Up: AlexNet, DeepSeek R1, & Linear Algebra Badges! 🚀 |
| Jan 28, 2026 | 3 New Deep-ML Badges Earned! 🏆 |
| Jan 28, 2026 | DenseNet Block: Brute-Force Feature Reuse |
| Jan 28, 2026 | Pegasos Kernel SVM: The Hardest Math So Far |
| Jan 28, 2026 | Building a Primitive GPT-2: Layers & Dimensions |
| Jan 27, 2026 | Building Autograd: Chain Rule and Topo-Sort |
| Jan 27, 2026 | Understanding LSTMs: A Hands-on Intuition |
| Jan 27, 2026 | Earned Deep-ML Badges - Attention Is All You Need & ResNet |
| Jan 23, 2026 | Hello World |