Understanding LSTMs: A Hands-on Intuition

I recently implemented a Long Short-Term Memory (LSTM) network from scratch using NumPy. While the formulas can look daunting, the real beauty of LSTMs lies in their intuitive design as a “controlled memory stream.”

🧠 The “Controlled Memory” Intuition

If you’ve ever struggled with the technicalities of LSTMs, I highly recommend reading Christopher Olah’s classic blog post. It’s the gold standard for understanding how these networks truly function.

In my implementation, I focused on making the gates “feel” like what they actually do:

  • * (Multiplication) is the Selector: It decides exactly how much information is allowed to flow through at any given moment.
  • + (Addition) is the Knowledge Adder: It creates a “highway” for the cell state, letting gradients flow without vanishing and solving the key problem of traditional RNNs (see the sketch after this list).
  • Sigmoid is the Gatekeeper: Because it outputs values between 0 and 1, it’s perfect for selecting or blocking information.
  • Tanh is the Featurizer: It’s great for creating new candidate features because it squashes values into (-1, 1), keeping them normalized yet expressive.
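
To make these roles concrete, here is a minimal NumPy sketch of the cell-state update. It isn’t the code from my repository; the names (`f_gate`, `i_gate`, `c_candidate`, `c_prev`) are just illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative shapes: a single time step with a hidden size of 4.
c_prev = np.random.randn(4)                 # previous cell state (long-term memory)
f_gate = sigmoid(np.random.randn(4))        # gatekeeper: values in (0, 1)
i_gate = sigmoid(np.random.randn(4))        # gatekeeper for the new information
c_candidate = np.tanh(np.random.randn(4))   # featurizer: values in (-1, 1)

# * selects how much of each piece of memory to keep or admit;
# + adds the admitted information onto the cell-state "highway".
c_next = f_gate * c_prev + i_gate * c_candidate
```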

🚪 What each gate “feels” like

  1. Forget Gate: The Eraser. It looks at the past and asks: “Is this still worth remembering?”
  2. Input Gate: The Filter. It decides which parts of the new input are actually useful.
  3. Candidate Memory: The Writer. It prepares the new “draft” of information using tanh.
  4. Output Gate: The Presenter. It takes the long-term memory and decides what to show for the current time step. The sketch below wires all four gates into a single forward step.
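
Putting the four gates together, a single forward step might look roughly like this in NumPy. This is a sketch rather than my actual implementation; the weight layout (`W` and `b` as per-gate dictionaries) and the sizes are assumptions for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step; W maps [h_prev, x_t] to each gate's pre-activation."""
    z = np.concatenate([h_prev, x_t])

    f = sigmoid(W["f"] @ z + b["f"])         # forget gate: the eraser
    i = sigmoid(W["i"] @ z + b["i"])         # input gate: the filter
    c_tilde = np.tanh(W["c"] @ z + b["c"])   # candidate memory: the writer
    o = sigmoid(W["o"] @ z + b["o"])         # output gate: the presenter

    c_t = f * c_prev + i * c_tilde           # update the long-term memory
    h_t = o * np.tanh(c_t)                   # choose what to show this step
    return h_t, c_t

# Example usage with made-up sizes.
hidden, inputs = 4, 3
rng = np.random.default_rng(0)
W = {k: rng.standard_normal((hidden, hidden + inputs)) * 0.1 for k in "fico"}
b = {k: np.zeros(hidden) for k in "fico"}
h, c = np.zeros(hidden), np.zeros(hidden)
h, c = lstm_step(rng.standard_normal(inputs), h, c, W, b)
```

Stacking this step over a sequence (and, in practice, fusing the four weight matrices into one matrix multiply for efficiency) is essentially all an LSTM layer does.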

💻 Implementation

You can find my full implementation, along with a detailed breakdown of the intuition, in my dedicated repository:

👉 LSTM Implementation on GitHub

This project is part of my ongoing journey through the Deep-ML problem sets!



