2026 | Yash Jayswal

Mar 29, 2026	Scaling LLMs: MoE Routing & JAX Parallelism on TPU
Mar 17, 2026	GPUs for LLMs: The Same Rooflines, Different Numbers
Mar 14, 2026	TPU Profiling: When Math Meets Reality
Mar 08, 2026	Serving LLaMA 3-70B: From Theory to Production Numbers
Mar 07, 2026	Transformer Inference: Two Problems in Disguise
Mar 02, 2026	Training LLaMA 3 on TPUs: Putting Theory Into Practice
Feb 10, 2026	Training at Scale: When Communication Becomes the Enemy
Feb 07, 2026	Transformer Math: The 6PT Rule and Other Accounting Tricks
Feb 07, 2026	Sharding Strategies: The Art of Distributed Matrix Multiplication
Feb 04, 2026	TPU Architecture: Understanding the Bandwidth Hierarchy
Feb 03, 2026	Roofline Analysis: When Does Your Model Hit the Wall?
Feb 02, 2026	Scaling LLMs: From Alchemy to Science (Part 0)
Jan 31, 2026	Math to Model: GRPO, PCA Augmentation, and SVD
Jan 31, 2026	Level Up: AlexNet, DeepSeek R1, & Linear Algebra Badges! 🚀
Jan 28, 2026	3 New Deep-ML Badges Earned! 🏆
Jan 28, 2026	DenseNet Block: Brute-Force Feature Reuse
Jan 28, 2026	Pegasos Kernel SVM: The Hardest Math So Far
Jan 28, 2026	Building a Primitive GPT-2: Layers & Dimensions
Jan 27, 2026	Building Autograd: Chain Rule and Topo-Sort
Jan 27, 2026	Understanding LSTMs: A Hands-on Intuition
Jan 27, 2026	Earned Deep-ML Badges - Attention Is All You Need & ResNet
Jan 23, 2026	Hello World