In-Depth Guide to Recurrent Neural Networks (RNNs) in 2021
Neural networks power a wide range of deep learning applications across industries, from natural language processing (NLP) to computer vision and drug discovery. Different types of neural networks suit different applications, including:
- Feedforward neural networks
- Convolutional neural networks (CNNs)
- Recurrent neural networks (RNNs)
In this article, we will explore RNNs and their use cases.
What are recurrent neural networks (RNNs)?
Recurrent neural networks (RNNs) are a class of neural networks that take information from previous steps as input to the current step. In this sense, RNNs have a “memory” of what has been calculated before. This makes them a good fit for sequential problems such as natural language processing (NLP), speech recognition, or time series analysis, where current observations depend on previous ones.
What is the difference between RNNs and other neural network algorithms?
RNNs differ from feedforward and convolutional neural networks (CNNs) in their temporal dimension. Other types of neural networks treat each input independently: the output for one input does not depend on the inputs that came before it. In RNNs, the output depends on previous elements of the sequence.
Suppose you have a speech recognition problem containing the sentence “What time is it?”. The deployed algorithm in this problem needs to account for the specific sequence of words for the output to make sense. As illustrated below, the RNN predicts the next word in the sentence by using previous words as inputs.
Since other types of neural networks treat each input independently, they are more appropriate for problems that do not have a sequential property, such as image recognition or tabular data analysis.
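The contrast can be illustrated with a minimal NumPy sketch. The layer sizes, the random weights, and the function names `feedforward` and `rnn_final_state` are assumptions made only for this illustration; they do not come from any particular library.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes: 3 time steps, 4 input features, 5 hidden units.
W_in = rng.normal(size=(5, 4))   # input-to-hidden weights
W_rec = rng.normal(size=(5, 5))  # hidden-to-hidden (recurrent) weights

def feedforward(x):
    """Process one input on its own; no state is carried between calls."""
    return np.tanh(W_in @ x)

def rnn_final_state(sequence):
    """Process inputs in order, carrying a hidden state from step to step."""
    h = np.zeros(5)
    for x in sequence:
        h = np.tanh(W_in @ x + W_rec @ h)
    return h

sequence = rng.normal(size=(3, 4))
reversed_sequence = sequence[::-1]

# Feedforward: each output depends only on its own input, so reversing the
# sequence simply reverses the outputs.
ff_out = np.array([feedforward(x) for x in sequence])
ff_out_rev = np.array([feedforward(x) for x in reversed_sequence])
ff_order_independent = np.allclose(ff_out, ff_out_rev[::-1])

# RNN: the final state depends on the order in which the inputs arrived.
rnn_order_dependent = not np.allclose(
    rnn_final_state(sequence), rnn_final_state(reversed_sequence)
)

print(ff_order_independent, rnn_order_dependent)
```

Feeding the same three inputs in reverse order leaves each feedforward output unchanged, while the recurrent network ends in a different state because every step builds on the one before it.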
How do RNNs work?
The image below demonstrates the basic structure of an RNN. The diagram on the right is the full (or unfolded) version of the diagram on the left.
Source: Deep Learning Book
- The model inputs are denoted with x(t) where t is time. x(t) can be a word and its place in a sentence or the price of a stock on a specific day.
- h(t) denotes the hidden state of the network at time t. Hidden states act as “memory” of the model and they are calculated based on the current input x(t) and previous state h(t-1).
- o(t) represents the output of the model at time t. It is computed from the current hidden state, h(t), which in turn depends on the current input, x(t), and on previous hidden states. This is the distinguishing feature of RNNs: the current output depends on both the current input and previous inputs.
- Parameters (U, V, W) are the weight matrices of the network: U connects inputs to hidden states, W connects each hidden state to the next, and V connects hidden states to outputs. They control how strongly each of these influences the others.
For more, you can check our article on how regular neural networks work. RNNs are an extension of these regular neural networks.
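The notation above can be turned into a short forward pass. This is a minimal sketch that follows the convention of the Deep Learning Book, where U is the input-to-hidden matrix, W the hidden-to-hidden matrix, and V the hidden-to-output matrix, so the output is computed from the hidden state. The tanh activation, the bias vectors `b` and `c`, the zero initial state, and the toy dimensions are assumptions chosen for illustration.

```python
import numpy as np

def rnn_forward(xs, U, W, V, b, c):
    """Run a simple RNN over a sequence and return hidden states and outputs.

    xs: array of shape (T, input_size), one row per time step
    U:  input-to-hidden weights,  shape (hidden_size, input_size)
    W:  hidden-to-hidden weights, shape (hidden_size, hidden_size)
    V:  hidden-to-output weights, shape (output_size, hidden_size)
    b, c: hidden and output biases (assumed; not named in the text above)
    """
    h = np.zeros(W.shape[0])            # initial hidden state, assumed zero
    hidden_states, outputs = [], []
    for x in xs:
        h = np.tanh(U @ x + W @ h + b)  # h(t) from x(t) and h(t-1)
        o = V @ h + c                   # o(t) from h(t)
        hidden_states.append(h)
        outputs.append(o)
    return np.array(hidden_states), np.array(outputs)

rng = np.random.default_rng(0)
T, input_size, hidden_size, output_size = 4, 3, 5, 2
xs = rng.normal(size=(T, input_size))
U = rng.normal(size=(hidden_size, input_size))
W = rng.normal(size=(hidden_size, hidden_size))
V = rng.normal(size=(output_size, hidden_size))
b = np.zeros(hidden_size)
c = np.zeros(output_size)

hs, outs = rnn_forward(xs, U, W, V, b, c)
print(hs.shape, outs.shape)  # (4, 5) (4, 2)
```

The same three matrices are reused at every time step, which is why the unfolded diagram repeats one cell rather than stacking distinct layers.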
What are the challenges with RNNs?
Recurrent neural networks suffer from a problem called the vanishing gradient, which also affects other deep neural networks. The problem arises during backpropagation, the algorithm that neural networks use to learn from their errors.
In short, the neural network model measures the difference between its output and the desired output and feeds this information back through the network to adjust parameters such as weights, using a value called the gradient. A bigger gradient means bigger adjustments to the parameters, and vice versa. This process continues until a satisfactory level of accuracy is reached.
RNNs leverage the backpropagation through time (BPTT) algorithm, where calculations depend on previous steps. However, if the gradient is small at one step during backpropagation, it becomes even smaller at the next step, because the gradients are multiplied together from step to step. This causes gradients to shrink exponentially, to the point where the model stops learning from earlier steps.
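The update loop described above can be sketched for the simplest possible model, a single weight. The model `prediction = weight * x`, the squared-error loss, the learning rate, and the helper name `gradient_step` are all assumptions chosen to keep the example small; real networks apply the same idea to many weights at once.

```python
def gradient_step(weight, x, target, learning_rate=0.1):
    """One gradient-descent update for the toy model prediction = weight * x."""
    prediction = weight * x
    error = prediction - target
    gradient = 2 * error * x               # derivative of squared error w.r.t. weight
    adjustment = learning_rate * gradient  # bigger gradient, bigger adjustment
    return weight - adjustment, adjustment

weight = 0.0
adjustments = []
for step in range(20):
    weight, adjustment = gradient_step(weight, x=1.5, target=3.0)
    adjustments.append(abs(adjustment))

print(weight)           # approaches 2.0, since 2.0 * 1.5 == 3.0
print(adjustments[:3])  # adjustments shrink as the error shrinks
```

As the prediction approaches the target, the gradient shrinks, so each adjustment is smaller than the one before.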
This is called the vanishing gradient problem and causes RNNs to have a short-term memory: earlier inputs have an increasingly small effect, or no effect at all, on the current output. This can be seen in the “What time is it?” problem above where colors for earlier words shrink as the model moves through the sentence.
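The exponential shrinkage can be observed directly in a small NumPy experiment. This is a sketch under stated assumptions: a tanh RNN with 8 hidden units and 30 time steps, random inputs, and a recurrent matrix `W` deliberately rescaled so that its largest singular value is 0.5. With those choices, each step back in time multiplies the gradient by a factor of at most 0.5.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size, T = 8, 30

# Assumed setup: rescale W so its largest singular value is exactly 0.5.
W = rng.normal(size=(hidden_size, hidden_size))
W = 0.5 * W / np.linalg.norm(W, 2)
U = rng.normal(size=(hidden_size, hidden_size))
xs = rng.normal(size=(T, hidden_size))

# Forward pass: h(t) = tanh(U x(t) + W h(t-1)), keeping every hidden state.
h = np.zeros(hidden_size)
hs = []
for x in xs:
    h = np.tanh(U @ x + W @ h)
    hs.append(h)

# Backward pass: chain the per-step Jacobians
#   dh(t)/dh(t-1) = diag(1 - h(t)^2) W
# from the last step back toward the first, tracking the size of the product.
grad = np.eye(hidden_size)
norms = []
for h_t in reversed(hs):
    grad = grad @ (np.diag(1.0 - h_t**2) @ W)
    norms.append(np.linalg.norm(grad))

for steps_back in (1, 5, 10, 20, 30):
    print(steps_back, norms[steps_back - 1])
```

The printed norm measures how strongly the final hidden state responds to a change in a hidden state that many steps earlier. After 30 steps it is close to zero, so the earliest inputs barely influence learning.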
The vanishing gradient problem can be mitigated by RNN variants. Two of them are Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRUs). These use mechanisms called “gates” to control which information to retain and which to forget, and how much.
AI consultant Positronic has developed a deep learning hospital monitoring solution for a client to support better preventive care and personalized interventions in hospitals and senior living facilities. The solution leverages LSTM technology and accurately predicts when patients are about to attempt to exit their beds.
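To make the idea of gates concrete, here is a minimal sketch of a single GRU step in NumPy. The weight names, the toy dimensions, and the omission of bias terms are assumptions made for brevity. The final line uses one common convention for combining the old state with the candidate state; some references swap the roles of `z` and `1 - z`.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x, h_prev, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU step (biases omitted for brevity)."""
    z = sigmoid(Wz @ x + Uz @ h_prev)   # update gate: how much new content to admit
    r = sigmoid(Wr @ x + Ur @ h_prev)   # reset gate: how much of the past to consult
    h_candidate = np.tanh(Wh @ x + Uh @ (r * h_prev))
    # Blend the old state with the candidate; z near 0 keeps the old memory.
    return (1.0 - z) * h_prev + z * h_candidate

rng = np.random.default_rng(0)
input_size, hidden_size, T = 3, 4, 6
Wz, Wr, Wh = (rng.normal(size=(hidden_size, input_size)) for _ in range(3))
Uz, Ur, Uh = (rng.normal(size=(hidden_size, hidden_size)) for _ in range(3))

h = np.zeros(hidden_size)
for x in rng.normal(size=(T, input_size)):
    h = gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh)

print(h)
```

Because the update gate can stay close to zero, the previous state can be carried forward almost unchanged over many steps, which gives gradients a path that does not shrink as quickly as in a plain RNN.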
What are the use cases and applications of RNNs?
RNNs and their variants LSTMs and GRUs are used in problems where the input data is sequential by nature. These include:
- Time series analysis such as stock price forecasting
- Machine translation
- Speech recognition. Google’s voice search uses LSTM.
- Image captioning
- Sentiment analysis
Note: We do not own this content. We have used it for educational purposes and for the benefit of our students.