Understanding LSTMs: A Hands-on Intuition
I recently implemented a Long Short-Term Memory (LSTM) network from scratch using NumPy. While the formulas can look daunting, the real beauty of LSTMs lies in their intuitive design as a “controlled memory stream.”
🧠 The “Controlled Memory” Intuition
If you’ve ever struggled with the technicalities of LSTMs, I highly recommend reading Christopher Olah’s classic blog post. It’s the gold standard for understanding how these networks truly function.
In my implementation, I focused on making the gates “feel” like what they actually do:
- `*` (multiplication) is the Selector: it decides exactly how much information is allowed to flow at any given moment.
- `+` (addition) is the Knowledge Adder: it creates a "highway" for the cell state, allowing gradients to flow without vanishing, which solves the key problem of traditional RNNs.
- Sigmoid is the Gatekeeper: because it outputs values between 0 and 1, it's perfect for selecting or blocking information.
- Tanh is the Featurizer: it's great for creating new candidate features because it normalizes values to (-1, 1) while keeping them expressive.
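As a minimal sketch of how these four operations interact in the cell-state update (using hypothetical random pre-activations rather than learned weights):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

H = 4  # hidden size (arbitrary for this sketch)
rng = np.random.default_rng(0)
c_prev = rng.standard_normal(H)  # previous cell state
z_f = rng.standard_normal(H)     # pre-activation for the forget gate
z_i = rng.standard_normal(H)     # pre-activation for the input gate
z_c = rng.standard_normal(H)     # pre-activation for candidate memory

f = sigmoid(z_f)           # Gatekeeper: values in (0, 1)
i = sigmoid(z_i)
c_tilde = np.tanh(z_c)     # Featurizer: values in (-1, 1)

# Selector (*) scales each piece of information; Knowledge Adder (+)
# merges it onto the cell-state "highway" without squashing gradients.
c = f * c_prev + i * c_tilde
```

Note how the update is purely elementwise: nothing on the highway is forced through a saturating nonlinearity, which is exactly why gradients survive across many time steps.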
🚪 What each gate “feels” like
- Forget Gate: The Eraser. It looks at the past and asks: “Is this still worth remembering?”
- Input Gate: The Filter. It decides which parts of the new input are actually useful.
- Candidate Memory: The Writer. It prepares the new "draft" of information using `tanh`.
- Output Gate: The Presenter. It takes the long-term memory and decides what to show for the current time step.
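Putting the four gates together, a single LSTM time step can be sketched as follows. The `lstm_step` signature and the stacked-weight layout (gate order f, i, candidate, o) are my own assumptions for illustration, not necessarily how the repository organizes its code:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step.

    x: input (D,); h_prev, c_prev: previous states (H,)
    W: stacked gate weights (4H, D+H); b: stacked biases (4H,)
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    f = sigmoid(z[:H])              # Forget gate: the Eraser
    i = sigmoid(z[H:2 * H])         # Input gate: the Filter
    c_tilde = np.tanh(z[2 * H:3 * H])  # Candidate memory: the Writer
    o = sigmoid(z[3 * H:])          # Output gate: the Presenter
    c = f * c_prev + i * c_tilde    # update the long-term memory
    h = o * np.tanh(c)              # expose a filtered view for this step
    return h, c

# Usage with random (untrained) parameters:
D, H = 3, 5
rng = np.random.default_rng(1)
W = rng.standard_normal((4 * H, D + H)) * 0.1
b = np.zeros(4 * H)
h, c = lstm_step(rng.standard_normal(D), np.zeros(H), np.zeros(H), W, b)
```

Because the output gate multiplies a sigmoid by `tanh(c)`, every entry of `h` stays strictly inside (-1, 1), which keeps the short-term signal bounded even when the cell state grows.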
💻 Implementation
You can find my full implementation, along with a detailed breakdown of the intuition, in my dedicated repository:
👉 LSTM Implementation on GitHub
This project is part of my ongoing journey through the Deep-ML problem sets!