
Introduction to Recurrent Neural Networks and the Math That Powers Them

Published: 24th Sep, 2024

    Deep Learning is a branch of artificial intelligence that uses artificial neural networks to make decisions and predict solutions for a given problem. Basic machine learning models, such as linear or logistic regression, play the role of individual neurons in deep learning. Specialized architectures help neural networks perform better on specific kinds of datasets. Recurrent Neural Networks, or RNNs, are a type of artificial neural network within the deep learning ecosystem that specializes in working with sequences. Time series data such as stock prices, sales figures, etc. are popular examples of sequential data. In this article, you will learn the theory, types, and applications of RNNs, and see how to build a recurrent neural network example in Python using the Keras library. Check out this Machine Learning crash course, which is a great place to ace key concepts and fundamentals of Deep Learning and Machine Learning.

    What is Recurrent Neural Network (RNN)? 

    An artificial neural network that works with sequential or time series data is known as a recurrent neural network (RNN). Recurrent neural networks learn from training data just like feedforward and convolutional neural networks (CNNs) do. Unlike typical deep neural networks, which assume that inputs and outputs are independent, the outputs of a recurrent neural network depend on the previous elements in the sequence.

    Architecture Of Recurrent Neural Network and its types 

    To understand the recurrent neural network architecture, one must first be aware of the architecture of an artificial feed-forward neural network. A simple neural network contains one input layer, one or more hidden layers, and an output layer. This architecture can be represented with the diagram below, which has 2 inputs, a single output, and one hidden layer.

    We know that RNNs work with sequential data. In a sequence, we need to remember the previous output to generate the next element. For this purpose, an RNN has an additional feedback loop in the hidden layer, also known as the temporal loop. This means the hidden layer not only produces an output but also feeds that output back into itself. The neurons in an RNN have a short-term memory which helps them connect to themselves through time. The recurrent neural network diagram below depicts how the neurons connect through time. Here, ‘x’ is the input, ‘y’ is the output, and ‘h’ is the output from the previous input stored in the memory cell of the RNN.

    There are four different types of RNN architecture. Let us discuss these types of recurrent neural networks one by one.

    1. One to One

    One-to-One RNNs are the most basic RNN type because they support only a single input and a single output. They operate like a conventional neural network and have fixed input and output sizes. This is an unpopular choice for RNNs, since such general machine learning problems are usually handled better by ordinary feed-forward networks (ANNs).

    2. One to Many

    In a one-to-many relationship, we have one input and multiple outputs. For example, consider a neural network that describes an image: it takes one input in the form of an image matrix and produces a sentence describing the image. The output sentence is put together word by word, making this a multi-output architecture.

    3. Many to One

    A many-to-one relationship consists of multiple inputs and a single output. An example of such a network is a sentiment analysis model, where we have a lot of text data and need to specify whether the sentiment is positive or negative. The output is therefore either a binary label or a probability; in both cases, this makes it a single-output architecture.

    4. Many to Many

    An architecture with multiple inputs and multiple outputs forms a many-to-many relationship. A language translator, for example, takes in a sequence of words and then predicts the translated sentence or phrase. One thing worth noting is that the network needs to remember the whole sequence to translate the phrase accurately; word-by-word translation is neither accurate nor representative of how an RNN model works.

    How Do Recurrent Neural Networks Work? 

    Convolutional and feed-forward neural networks process fixed-size inputs such as matrices. To handle sequences of various lengths, however, we require connections that feed a layer's output back into it as input. RNNs can handle sequential data of varying lengths because of this feedback loop. The RNN explained in this section is the basic form; the same underlying principle applies to its variations.

    Recurrent neural networks resemble feedforward neural networks in appearance, except that they also have connections pointing backwards. Let's consider the most basic RNN, which consists of a single neuron that receives inputs, generates an output, and then feeds that output back to itself. At each time step t, this recurrent neuron receives the input x(t) as well as its own output from the previous time step, denoted h(t-1). After unrolling this network over time, we get a network similar to the one shown below.

    At each time step t, every neuron receives the input vector as well as the output vector from the previous time step. Each recurrent neuron therefore has two sets of weights: one for the inputs and one for the outputs of the previous time step. Let’s call these weight matrices Wx and Wh. The weights are first initialized with random values and are then adjusted using the error from the loss function; this adjustment is carried out by the back-propagation method.

    We can process a sequence of vectors x by applying the following recurrence formula at every time step:

    h(t) = f(x(t)·Wx + h(t-1)·Wh + b)

    Here, the current state h(t) holds information about the previous output in the sequence. It is calculated from the previous output, denoted by h(t-1), and the current input, denoted by x(t). The term ‘b’ is the bias and ‘f’ is a non-linear activation function applied to the sum of the two weighted terms.

    The whole layer’s output can be computed in vectorized form using the equation below:

    Y(t) = g(H(t)·Wy + b)

    Here, ‘b’ is the bias term, ‘g’ is the activation function applied to the network’s output layer, and Wy is the weight matrix of the output layer at the current time step.
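
    To make these equations concrete, below is a minimal NumPy sketch of a single-layer RNN forward pass. The dimensions, the tanh activation for ‘f’, and the linear output for ‘g’ are illustrative assumptions, not part of any particular library API.

    ### A MINIMAL NUMPY SKETCH OF THE RNN FORWARD PASS ###
    import numpy as np

    n_inputs, n_hidden, n_outputs, n_steps = 3, 5, 2, 4      # assumed sizes

    Wx = np.random.randn(n_inputs, n_hidden) * 0.1           # input-to-hidden weights
    Wh = np.random.randn(n_hidden, n_hidden) * 0.1           # hidden-to-hidden weights
    Wy = np.random.randn(n_hidden, n_outputs) * 0.1          # hidden-to-output weights
    b, by = np.zeros(n_hidden), np.zeros(n_outputs)          # bias terms

    x_seq = np.random.randn(n_steps, n_inputs)               # one input sequence
    h = np.zeros(n_hidden)                                   # initial hidden state

    for t in range(n_steps):
        # h(t) = f(x(t)·Wx + h(t-1)·Wh + b), with f = tanh
        h = np.tanh(x_seq[t] @ Wx + h @ Wh + b)
        # Y(t) = g(H(t)·Wy + b), with g left as the identity for a regression-style output
        y = h @ Wy + by
        print("step", t, "output", y)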

    Types or Variations of Recurrent Neural Network (RNN) 

    • Long Short-Term Memory (LSTM)

    Unfortunately, the simplest RNN model has a major drawback, called the vanishing gradient problem, which prevents it from being accurate. The problem comes from the fact that the same weights are used at every step to calculate the output, and the same multiplication is repeated during back-propagation. The further we move backwards through the sequence, the more the error signal shrinks (or, in some cases, explodes). This means the network has difficulty remembering words from far back in the sequence and makes predictions based only on the most recent ones. That is why more powerful models such as long short-term memory networks, or LSTMs, invented in 1997 by Hochreiter and Schmidhuber, come in handy. To this day, they remain among the most widely used recurrent neural networks and are responsible for many state-of-the-art results in fields ranging from speech recognition to machine translation.

    • Gated Recurrent Unit (GRU)

    One of the most important recurrent neural networks is the Gated Recurrent Unit, or GRU. Like LSTMs, these networks also mitigate the vanishing gradient problem seen in vanilla RNN models. The GRU is a simplified version of the LSTM recurrent neural network, and it often performs just as well. Nowadays, many NLP applications such as language translation, sentiment analysis, and document summarization make use of GRU networks. A GRU uses less memory and is faster than an LSTM; however, an LSTM tends to be more accurate on datasets with longer sequences. Compared to LSTMs, the GRU has a simpler architecture, with two gates per GRU cell as opposed to three gates in an LSTM cell. A short sketch showing how these layers can be swapped in Keras is given below.
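
    As a hedged illustration of how these variants relate in practice, the snippet below builds the same small Keras model with a vanilla SimpleRNN cell, an LSTM cell, and a GRU cell; the layer sizes and input shape are assumptions for illustration only.

    ### SWAPPING VANILLA RNN, LSTM AND GRU LAYERS IN KERAS (ILLUSTRATIVE) ###
    from keras.models import Sequential
    from keras.layers import SimpleRNN, LSTM, GRU, Dense

    def build_model(cell):
        model = Sequential()
        model.add(cell(32, input_shape = (60, 1)))   # 60 time steps, 1 feature (assumed)
        model.add(Dense(1))
        model.compile(optimizer = 'adam', loss = 'mean_squared_error')
        return model

    vanilla_model = build_model(SimpleRNN)   # suffers most from vanishing gradients
    lstm_model    = build_model(LSTM)        # three gates per cell
    gru_model     = build_model(GRU)         # two gates per cell, fewer parameters, faster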

    • Bidirectional Recurrent Neural Network (BRNN)

    Bidirectional recurrent neural networks (BRNNs) connect two hidden layers running in opposite directions to the same output. With this architecture, the output layer can receive information from both past and future states at the same time. BRNNs were developed to give the network access to more input context. The basic idea behind a BRNN is to split a conventional RNN's neurons into two directions: one processes the sequence forward in time and the other processes it backward, so that information from both the past and the future of the current time frame can be utilized.
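
    A minimal Keras sketch of a bidirectional layer is shown below; the sequence length, layer size, and binary output are assumptions for illustration.

    ### A BIDIRECTIONAL LSTM LAYER IN KERAS (ILLUSTRATIVE) ###
    from keras.models import Sequential
    from keras.layers import Bidirectional, LSTM, Dense

    model = Sequential()
    # The Bidirectional wrapper runs one LSTM forward and one backward over the
    # sequence and concatenates their outputs.
    model.add(Bidirectional(LSTM(64), input_shape = (100, 1)))   # 100 time steps, 1 feature (assumed)
    model.add(Dense(1, activation = 'sigmoid'))
    model.compile(optimizer = 'adam', loss = 'binary_crossentropy')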

    • Encoder-Decoder RNNs

    The conventional neural machine translation approach, which outperforms traditional statistical machine translation techniques, uses recurrent neural networks in an encoder-decoder design. This architecture has been adopted by various language translation services such as Google Translate and Microsoft Translator. The network consists of two parts: an encoder and a decoder. The encoder is a stack of recurrent units, such as LSTM or GRU cells, each of which accepts one element of the input sequence and propagates information about it forward. The decoder is another stack of recurrent units, where each unit predicts an output at a given time step. This model's strength lies in its ability to map sequences of different lengths to one another: the lengths of the inputs and outputs can vary and need not match.
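
    The sketch below outlines one common way to wire an encoder-decoder pair with Keras LSTM layers, trained with teacher forcing. The vocabulary sizes and latent dimension are illustrative assumptions, and this is not the exact architecture used by any particular translation service.

    ### A MINIMAL ENCODER-DECODER (SEQ2SEQ) SKETCH IN KERAS (ILLUSTRATIVE) ###
    from keras.models import Model
    from keras.layers import Input, Embedding, LSTM, Dense

    src_vocab, tgt_vocab, latent_dim = 5000, 5000, 256            # assumed sizes

    # Encoder: reads the source sequence and keeps only its final states
    encoder_inputs = Input(shape = (None,))
    enc_emb = Embedding(src_vocab, latent_dim)(encoder_inputs)
    _, state_h, state_c = LSTM(latent_dim, return_state = True)(enc_emb)

    # Decoder: generates the target sequence, initialised with the encoder states
    decoder_inputs = Input(shape = (None,))
    dec_emb = Embedding(tgt_vocab, latent_dim)(decoder_inputs)
    decoder_outputs, _, _ = LSTM(latent_dim, return_sequences = True,
                                 return_state = True)(dec_emb, initial_state = [state_h, state_c])
    decoder_outputs = Dense(tgt_vocab, activation = 'softmax')(decoder_outputs)

    model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
    model.compile(optimizer = 'adam', loss = 'sparse_categorical_crossentropy')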

    Common Activation Functions 

    To obtain the desired output, the net input to a layer is passed through an activation function, and the artificial neural network's output is determined by evaluating this function. Non-linear activation functions are used to ensure that a neuron's response is bounded. A multilayer network with only linear activation functions produces the same output as a single-layer network, which is why nonlinear activation functions, rather than linear ones, are used in multilayer networks. Some of the common activation functions used in RNNs are listed below, followed by a small sketch of each.

    • Sigmoidal function – Sigmoidal functions are widely used in back-propagation networks and come in two forms: the binary sigmoidal function and the bipolar sigmoidal function.
    • ReLU function – ReLU is short for Rectified Linear Unit. If the input is positive, the ReLU function outputs the input directly; if it is negative, it outputs zero. Because a model that utilises it is simpler to train and frequently performs better, it has become the default activation function for many kinds of neural networks.
    • Leaky ReLU function – The Leaky ReLU function is similar to the ReLU function, but it gives negative values a small slope, which helps prevent neurons from dying during training.
    • Softmax function – Fully connected output layers often utilise a softmax activation function to categorise inputs, producing probabilities between 0 and 1.
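
    The following is a minimal NumPy sketch of these four activation functions; the example input vector is an illustrative assumption.

    ### A SMALL NUMPY SKETCH OF THE ACTIVATION FUNCTIONS LISTED ABOVE ###
    import numpy as np

    def sigmoid(x):                   # binary sigmoidal: squashes values into (0, 1)
        return 1.0 / (1.0 + np.exp(-x))

    def relu(x):                      # passes positive values through, zeroes out negatives
        return np.maximum(0.0, x)

    def leaky_relu(x, alpha = 0.01):  # small slope for negatives keeps gradients alive
        return np.where(x > 0, x, alpha * x)

    def softmax(x):                   # turns a vector of scores into probabilities summing to 1
        e = np.exp(x - np.max(x))
        return e / e.sum()

    z = np.array([-2.0, -0.5, 0.0, 1.5])          # illustrative input vector
    print(sigmoid(z), relu(z), leaky_relu(z), softmax(z), sep = "\n")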

    Basic Python Implementation (RNN with Keras) 

    Now that we have a basic understanding of recurrent neural networks, we can proceed towards building our first RNN model using the Keras library in Python. We will try to predict the trend of a stock price using an LSTM model, working with historical stock prices of Infosys (NSE - INFY) over the 5 years from 2014 to 2019. We avoid the pandemic years, whose external factors make the trend hard for the model to predict. You can download the data from this link. The model will consist of an input layer, four stacked LSTM layers with dropout, and an output layer.

    ### IMPORTING THE REQUIRED LIBRARIES ###
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.preprocessing import MinMaxScaler
    
    from keras.models import Sequential
    from keras.layers import Dense
    from keras.layers import LSTM
    from keras.layers import Dropout
    
    ### READ THE CSV FILE ###
    data = pd.read_csv("Infosys_Historical_Stock_Price_2014_19.csv")
    
    ### TRAIN TEST SPLIT ###
    train_data = data.iloc[:-30, :]
    test_data  = data.iloc[-30:, :]
    
    ### CONSIDERING OPEN VALUES OF STOCK ###
    train_arr = data.iloc[:-30, 1:2].values
    test_arr  = data.iloc[-30:, 1:2].values
    
    ### PERFORMING FEATURE SCALING ###
    sc = MinMaxScaler(feature_range = (0, 1))
    train_arr_scaled = sc.fit_transform(train_arr)
    
    ### CREATE X_TRAIN AND Y_TRAIN SETS ###
    X_train = []
    y_train = []
    
    for i in range(60, len(train_arr_scaled)):
        X_train.append(train_arr_scaled[i - 60:i, 0])
        y_train.append(train_arr_scaled[i, 0])
    
    X_train, y_train = np.array(X_train), np.array(y_train)
    
    ### RESHAPING X_TRAIN SET ###
    X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))
    
    ### INITIALISING THE RNN ###
    regressor = Sequential()
    
    ### ADDING THE FIRST LSTM LAYER AND SOME DROPOUT REGULARISATION ###
    regressor.add(LSTM(units = 50, return_sequences = True, input_shape = (X_train.shape[1], 1)))
    regressor.add(Dropout(0.2))
    
    ### ADDING A SECOND LSTM LAYER AND SOME DROPOUT REGULARISATION ###
    regressor.add(LSTM(units = 50, return_sequences = True))
    regressor.add(Dropout(0.2))
    
    ### ADDING A THIRD LSTM LAYER AND SOME DROPOUT REGULARISATION ###
    regressor.add(LSTM(units = 50, return_sequences = True))
    regressor.add(Dropout(0.2))
    
    ### ADDING A FOURTH LSTM LAYER AND SOME DROPOUT REGULARISATION ###
    regressor.add(LSTM(units = 50))
    regressor.add(Dropout(0.2))
    
    ### ADDING THE OUTPUT LAYER ###
    regressor.add(Dense(units = 1))
    
    ### COMPILING THE RNN ###
    regressor.compile(optimizer = 'adam', loss = 'mean_squared_error')
    
    ### FITTING THE RNN TO THE TRAINING SET ###
    regressor.fit(X_train, y_train, epochs = 100, batch_size = 32)
    
    ### PREPARING X_TEST FROM THE LAST 60 TRAINING VALUES PLUS THE TEST PERIOD ###
    inputs = data.iloc[len(data) - len(test_arr) - 60:, 1:2].values
    inputs = sc.transform(inputs)
    
    X_test = []
    for i in range(60, 60 + len(test_arr)):
        X_test.append(inputs[i - 60:i, 0])
    X_test = np.array(X_test)
    X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))
    
    ### GET PREDICTIONS FROM LSTM MODEL ###
    y_predict = regressor.predict(X_test)
    y_predict = sc.inverse_transform(y_predict)
    
    ### VISUALISE OUTPUT ###
    plt.figure(figsize=(12, 6))
    plt.plot(test_arr, color = 'blue', label = 'Actual Stock Price')
    plt.plot(y_predict, color = 'green', label = 'Predicted Stock Price')
    plt.title('Infosys Stock Price Prediction using LSTM')
    plt.xlabel('Time')
    plt.ylabel('Infosys Stock Price')
    plt.grid()
    plt.legend()
    plt.show()

    Running the above code requires Python 3.7+ and the following libraries installed in your Python environment.

    • Pandas
    • Numpy
    • Matplotlib
    • Scikit-Learn
    • Keras

    You can follow along with this at KnowledgeHut’s Machine Learning crash course to build and deploy your own deep learning and data visualization models in a real-world project.

    Applications of Recurrent Neural Networks 

    Recurrent Neural Networks (RNNs) are a set of powerful models with applications in a wide range of areas involving sequential data. Some of the most popular RNN applications are listed below.

    1. Predictive Analytics

    As part of predictive analytics, any time series problem can be solved with the help of RNNs. Time series data such as stock prices, metal prices, DNA sequencing, etc. can be forecasted using RNN models.

    2. Language Modeling

    RNNs can power natural language applications such as language translation models, speech recognition systems, and grammar detection.

    3. Text Summarization

    Generating image descriptions or captions, sentiment analysis, document summarization, etc. are some of the areas in which RNNs have gained increasing popularity.

    Difference between RNN and Simple Neural Network 

    As the name suggests, simple neural networks are the basic artificial feed-forward neural networks. RNNs, on the other hand, are among the more advanced and complex neural networks.

    In a simple neural network or feed-forward neural network, the information flow is unidirectional: information moves in only one direction, from the input layer, through the hidden layers, to the output layer. In an RNN, by contrast, the information cycles through a loop that combines the current input with the output from previous inputs.

    Simple neural networks are typically trained on tabular data, whereas RNNs are trained on sequential data.

    Simple neural networks have a variety of applications, including predictive analytics and image recognition. RNNs are useful for speech recognition, stock price prediction, and similar sequence-based tasks.

    Conclusion 

    In this article, we have learnt how RNNs work and how they are used in machine learning and deep learning tasks. RNNs form a powerful set of deep learning models with the ability to model complex tasks. They have gained popularity in recent times due to the availability of massive datasets and excellent computing capabilities. These networks belong to the family of complex black-box models, and one needs to understand the basics before transitioning to them. You can learn to tackle such complex data science and machine learning problems through KnowledgeHut’s best Data Science courses available online. The courses cover data cleaning, mathematics, statistics, SQL, Python, Tableau, ML, DL and more. With hours of instructor-led training from 650+ expert trainers, comprehensive hands-on Python training, deep learning techniques using popular libraries such as TensorFlow and Keras, live projects, MCQs and assignments, you can learn to build an AI application from scratch.

    Frequently Asked Questions (FAQs)

    1. What is RNN used for?

    A recurrent neural network has an architecture similar to feed-forward networks, but it also incorporates connections that point backward in the network, much like feedback loops. This makes RNNs suitable for applications such as predicting stock prices, autonomous driving systems, speech-to-text, and sentiment analysis.

    2. What is the difference between CNN and RNN?

    CNNs are a particular kind of feed-forward artificial neural network that works well for processing images and videos. RNNs are the better choice for text and speech analysis because, unlike feed-forward neural networks, they can analyse input sequences of arbitrary length using their internal memory. For example, it is not a good idea to use a recurrent neural network for image classification; a convolutional neural network performs better for that task.

    3. Why is RNN used for machine learning?

    RNNs are used in machine learning mainly because of their ability to imitate the activity of neurons in the human brain. They can detect patterns in a variety of data sequences, including text, DNA sequences, spoken language, handwriting, and numerical time series data from sensors, stock markets, and other sources.

    4. Which is better, CNN or RNN?

    CNN and RNN architectures are different, and each is suited to specific use cases. CNNs are often used to solve problems involving spatial data, such as images. RNNs perform better when analysing temporal, sequential data such as text or videos.


    Amit Pathak

    Author

    Amit is an experienced Software Engineer, specialising in Data Science and Operations Research. In the past five years, he has worked in different domains including full stack development, GUI programming, and machine learning. In addition to his work, Amit has a keen interest in learning about the latest technologies and trends in the field of Artificial Intelligence and Machine Learning.
