Recurrent Neural Networks (RNNs)
Recurrent Neural Networks (RNNs) were among the first neural architectures designed specifically for processing sequential data. Unlike traditional feed-forward networks, an RNN maintains an internal state (a form of memory) that captures information about previous inputs as it processes a sequence word by word.
The basic RNN architecture processes inputs sequentially, updating its hidden state at each step based on both the current input and the previous hidden state. This recurrent connection allows information to persist across time steps, in principle enabling the network to capture dependencies between words separated by arbitrary distances.
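To make the recurrent update concrete, here is a minimal NumPy sketch of a vanilla RNN step applied over a toy sequence. The function name (rnn_step), weight names (W_xh, W_hh, b_h), and dimensions are illustrative choices, not part of any specific library or the original text.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One recurrent update: mix the current input with the previous hidden state."""
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

# Toy sizes: 4-dimensional inputs (e.g., small word embeddings), 8-dimensional hidden state.
rng = np.random.default_rng(0)
input_dim, hidden_dim, seq_len = 4, 8, 5

W_xh = rng.normal(scale=0.1, size=(input_dim, hidden_dim))   # input-to-hidden weights
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))  # hidden-to-hidden (recurrent) weights
b_h = np.zeros(hidden_dim)

inputs = rng.normal(size=(seq_len, input_dim))  # a sequence of 5 token embeddings
h = np.zeros(hidden_dim)                        # initial hidden state

for x_t in inputs:                              # process the sequence one step at a time
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)       # the hidden state carries information forward

print(h.shape)  # (8,) -- a running summary of everything seen so far
```

The key point is that the same weights are reused at every step; only the hidden state changes as the sequence is read.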
In NLP applications, RNNs can process text left-to-right (reading a sentence in its natural order), right-to-left (capturing dependencies in reverse), or bidirectionally (combining both directions for richer representations), as sketched below. This allowed models to develop context-sensitive representations of words based on the surrounding text.
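A simple way to picture bidirectional processing is to run the same kind of recurrence once forward and once over the reversed sequence, then concatenate the two hidden states at each position. This sketch reuses rnn_step and the toy weights from the previous example; the helper name run_rnn is likewise an illustrative assumption.

```python
def run_rnn(inputs, W_xh, W_hh, b_h):
    """Return the hidden state at every position for a single direction."""
    h = np.zeros(W_hh.shape[0])
    states = []
    for x_t in inputs:
        h = rnn_step(x_t, h, W_xh, W_hh, b_h)
        states.append(h)
    return np.stack(states)

forward_states = run_rnn(inputs, W_xh, W_hh, b_h)               # left-to-right pass
backward_states = run_rnn(inputs[::-1], W_xh, W_hh, b_h)[::-1]  # right-to-left pass, realigned

# Each position now carries context from both sides of the sentence.
bidirectional = np.concatenate([forward_states, backward_states], axis=-1)
print(bidirectional.shape)  # (5, 16): sequence length x twice the hidden size
```

In practice the two directions use separate weight matrices; sharing them here just keeps the sketch short.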
However, basic RNNs suffered from significant limitations when modeling long-range dependencies. The vanishing gradient problem—where gradient signals diminished exponentially during backpropagation through time—made it difficult for these networks to learn connections between distant words. This limitation spurred the development of more sophisticated architectures like LSTMs and GRUs.
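The vanishing gradient can be seen numerically by chaining the per-step Jacobians of the tanh recurrence: the gradient reaching a distant time step is the product of many factors whose magnitudes are typically below one. Continuing the earlier sketch (same rnn_step, weights, and random generator), the norms below shrink rapidly with distance; the exact numbers are only illustrative.

```python
# Run a longer 50-step sequence with the same toy RNN.
h = np.zeros(hidden_dim)
states = []
for x_t in rng.normal(size=(50, input_dim)):
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)
    states.append(h)

# Backpropagate through time: multiply the Jacobian of each tanh step,
# d h_t / d h_{t-1} = diag(1 - h_t^2) @ W_hh.T, from the last step backward.
grad = np.eye(hidden_dim)
norms = []
for h_t in reversed(states):
    jac = np.diag(1.0 - h_t**2) @ W_hh.T
    grad = grad @ jac
    norms.append(np.linalg.norm(grad))

# Gradient magnitude after backpropagating 1, 10, and 40 steps:
print(norms[0], norms[10], norms[40])  # falls off roughly exponentially with distance
```

LSTMs and GRUs address this by adding gated additive paths through which gradients can flow with far less attenuation.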