Building Autograd: Chain Rule and Topo-Sort
I recently tackled the “Basic Autograd Operations” problem on Deep-ML, and it was an eye-opening experience. Following in the footsteps of Andrej Karpathy (check out his legendary Micrograd video), I built a scalar-valued autograd engine.
🌉 Bridging the Gap: Theory to Code
What I loved most about this project was how mechanical the whole process turned out to be. The chain rule isn’t just a math formula here; it’s a series of small, local instructions that each Value object knows how to execute.
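To make that concrete, here is a minimal sketch of one such local instruction, written with plain floats (the variable names are mine, not part of the problem):

# For out = a * b, each input's local derivative is simply the other input.
# The chain rule multiplies it by the gradient flowing in from downstream.
a_data, b_data = 2.0, 3.0
out_grad = 1.0                  # gradient arriving from whatever consumes out
a_grad = b_data * out_grad      # d(out)/d(a) = b  ->  3.0
b_grad = a_data * out_grad      # d(out)/d(b) = a  ->  2.0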
🔄 Topological Sorting in the Wild
For a long time, I only saw Topological Sorting in competitive programming contexts (LeetCode/CodeForces). But in an autograd engine, it’s the “secret sauce”: it guarantees that by the time we propagate a node’s gradient to its inputs, every node that depends on it has already contributed its share.
This simple algorithm turns a messy graph of operations into a perfectly ordered sequence for backpropagation.
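For example (using the Value class shown below, with toy numbers of my own):

a, b, c = Value(2.0), Value(3.0), Value(1.0)
d = a * b + c
# The depth-first topological sort appends a node only after all of its
# children (inputs), giving an order like [a, b, a*b, c, d].
# Backpropagation then walks that list in reverse:
# d first, then a*b and c, and finally a and b.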
🐍 The Elegance of Python
The simplicity with which we can write this in Python is stunning. A single backward() call populates the gradients of every node in the network, mirroring the core functionality of heavyweights like PyTorch.
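For comparison, here is the same idea in PyTorch (a sketch I added for illustration; it assumes torch is installed):

import torch

x = torch.tensor(2.0, requires_grad=True)
y = torch.tensor(3.0, requires_grad=True)
loss = x * y + y.relu()
loss.backward()              # one call fills in every .grad
print(x.grad)                # tensor(3.)  -> d(loss)/dx = y
print(y.grad)                # tensor(3.)  -> d(loss)/dy = x + 1 (ReLU is active)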
💻 The Implementation
Here is my Value class that handles the heavy lifting:
class Value:
    def __init__(self, data, _children=(), _op=''):
        self.data = data
        self.grad = 0
        # Closure that propagates out.grad to this node's children;
        # a no-op for leaf nodes.
        self._backward = lambda: None
        self._prev = set(_children)
        self._op = _op

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other), '+')

        def _backward():
            # d(out)/d(self) = d(out)/d(other) = 1
            self.grad += 1 * out.grad
            other.grad += 1 * out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other), '*')

        def _backward():
            # d(out)/d(self) = other, d(out)/d(other) = self
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def relu(self):
        out = Value(0 if self.data < 0 else self.data, (self,), 'ReLU')

        def _backward():
            # Gradient passes through only where the output was positive.
            self.grad += (out.data > 0) * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph so every node appears after
        # all of its children (inputs).
        topo = []
        visited = set()

        def build_topo(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build_topo(child)
                topo.append(v)
        build_topo(self)

        # Seed the output gradient, then propagate in reverse order.
        self.grad = 1
        for node in reversed(topo):
            node._backward()
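And a quick sanity check of the whole thing (my own toy example, not part of the Deep-ML tests):

a = Value(2.0)
b = Value(3.0)
c = a * b                 # 6.0
d = (c + b).relu()        # relu(9.0) = 9.0
d.backward()
print(a.grad)             # 3.0 -> d(d)/d(a) = b
print(b.grad)             # 3.0 -> d(d)/d(b) = a + 1 (b feeds both the mul and the add)
print(c.grad)             # 1   -> gradient flows straight through the active ReLU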
Check out the full repository and tests here: 👉 Autograd Project on GitHub
Next up: Building actual neurons using this autograd engine!