Recurrent Neural Networks – Ep. 9 (Deep Learning SIMPLIFIED)

What do you do if the patterns in your data change with time? In that case, your best bet is to use a recurrent neural network. This deep learning model has a simple structure with a built-in feedback loop, allowing it to act as a forecasting engine. Let's take a closer look. Recurrent neural networks, or RNNs, have a long history, but their recent popularity is mostly due to the works of Juergen Schmidhuber, Sepp Hochreiter, and Alex Graves. Their applications are extremely versatile – ranging from speech recognition to driverless cars.

All the nets we've seen up to this point have been feedforward neural networks. In a feedforward neural network, signals flow in only one direction, from input to output, one layer at a time. In a recurrent net, the output of a layer is added to the next input and fed back into the same layer, which is typically the only layer in the entire network. You can think of this process as a passage through time – shown here are 4 such time steps. At t=1, the net takes the output of time t=0 and sends it back into the net along with the next input. The net repeats this for t=2, t=3, and so on. Unlike feedforward nets, a recurrent net can receive a sequence of values as input, and it can also produce a sequence of values as output. The ability to operate with sequences opens up these nets to a wide variety of applications.
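To make the feedback loop concrete, here is a minimal sketch of a vanilla RNN forward pass in NumPy. It is not the code behind the video; the weight names (W_xh, W_hh, W_hy), the tanh activation, and the toy sizes are assumptions chosen only for illustration.

```python
import numpy as np

# Toy sizes chosen only for illustration.
input_size, hidden_size, output_size = 3, 5, 2
rng = np.random.default_rng(0)

# Input-to-hidden, hidden-to-hidden (the feedback loop), and hidden-to-output weights.
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
W_hy = rng.normal(scale=0.1, size=(output_size, hidden_size))

def rnn_forward(inputs):
    """Run over a sequence; each step reuses the previous hidden state."""
    h = np.zeros(hidden_size)                # state at t=0
    outputs = []
    for x in inputs:                         # t = 1, 2, 3, ...
        h = np.tanh(W_xh @ x + W_hh @ h)     # mix the current input with the fed-back state
        outputs.append(W_hy @ h)             # one output per time step
    return outputs

# A sequence of 4 time steps, mirroring the 4 steps shown in the video.
sequence = [rng.normal(size=input_size) for _ in range(4)]
print([y.round(3) for y in rnn_forward(sequence)])
```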
Here are a few examples. When the input is singular and the output is a sequence, a potential application is image captioning. A sequence of inputs with a single output can be used for document classification. When both the input and output are sequences, these nets can classify videos frame by frame. If a time delay is introduced, the net can statistically forecast the demand in supply chain planning. Have you ever used an RNN for one of these applications? If so, please comment and share your experiences.

Like we've seen with previous deep learning models, by stacking RNNs on top of each other, you can form a net capable of more complex output than a single RNN working alone.
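For instance, stacking can be sketched in a few lines of tf.keras; this is an illustrative assumption about how one might wire it up, not code from the video, and the layer sizes are arbitrary. The key detail is that the lower layer must return its full output sequence so the layer above can read it step by step.

```python
import tensorflow as tf

# Two recurrent layers stacked: the lower one hands its whole output sequence upward.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(None, 8)),                 # variable-length sequences, 8 features per step
    tf.keras.layers.SimpleRNN(32, return_sequences=True),   # lower RNN emits an output at every time step
    tf.keras.layers.SimpleRNN(32),                           # upper RNN keeps only its final state
    tf.keras.layers.Dense(1),                                # e.g. a single forecast value
])
model.compile(optimizer="adam", loss="mse")
```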
Typically, an RNN is an extremely difficult net to train. Since these nets use backpropagation, we once again run into the problem of the vanishing gradient. Unfortunately, the vanishing gradient is exponentially worse for an RNN. The reason for this is that each time step is the equivalent of an entire layer in a feedforward network. So training an RNN for 100 time steps is like training a 100-layer feedforward net – this leads to exponentially small gradients and a decay of information through time.
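A back-of-the-envelope illustration of that decay: if each time step scales the gradient by some factor below 1 (the 0.9 here is an arbitrary assumption, not a number from the video), 100 steps shrink it to almost nothing.

```python
# Repeatedly multiplying by a per-step factor below 1 collapses the gradient.
factor = 0.9
for steps in (10, 50, 100):
    print(f"{steps:3d} steps -> gradient scaled by {factor ** steps:.2e}")
# 10 steps -> 3.49e-01, 50 steps -> 5.15e-03, 100 steps -> 2.66e-05
```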
There are several ways to address this problem – the most popular of which is gating. Gating is a technique that helps the net decide when to forget the current input, and when to remember it for future time steps. The most popular gating types today are GRU and LSTM. Besides gating, there are also a few other techniques like gradient clipping, steeper gates, and better optimizers.
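As a hedged sketch of how two of these ideas look in practice (again using tf.keras as an assumed framework; the sizes are arbitrary): the LSTM layer provides the gating, and the optimizer's clipnorm argument applies gradient clipping.

```python
import tensorflow as tf

# Gating via an LSTM layer; gradient clipping via the optimizer's clipnorm.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(None, 8)),   # sequences with 8 features per time step
    tf.keras.layers.LSTM(64),                 # gates decide what to keep and what to forget
    tf.keras.layers.Dense(1),                 # e.g. the next value in the series
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(clipnorm=1.0),  # rescale any gradient whose norm exceeds 1.0
    loss="mse",
)
```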
When it comes to training a recurrent net, GPUs are an obvious choice over an ordinary CPU. This was validated by a research team at Indico, which uses these nets on text processing tasks like sentiment analysis and helpfulness extraction. The team found that GPUs were able to train the nets 250 times faster! That's the difference between one day of training and over eight months!

So under what circumstances would you use a recurrent net over a feedforward net? We know that a feedforward net outputs one value, which in many cases is a class or a prediction. A recurrent net is suited for time series data, where an output can be the next value in a sequence, or the next several values. So the answer depends on whether the application calls for classification, regression, or forecasting.

In the next video, we'll take a look at a family of deep learning models known as autoencoders.

51 thoughts on Recurrent Neural Networks – Ep. 9 (Deep Learning SIMPLIFIED)

  1. Hi, I have a question. In the illustration "Decay of Information through time" you show that the gradient decays with every time step we move forward, so at the most recent time step (here 100) the influence of the gradient is the smallest.
    Up to this point my understanding was that the influence of the gradient is big at recent time steps (short-term dependencies) and decays exponentially as we go back in time (long-term dependencies). Is this just another way to view the vanishing gradient problem, or is my understanding flawed? Would really appreciate it if someone could clarify 🙂

  2. Very useful set of videos. I actually know a lot about the details of deep learning and it is really refreshing to step back from all the complexity and see the big picture like this. Very helpful. Thanks,

  3. Out of curiosity, has anyone tried to diminish the vanishing gradient problem with Batch Normalization? It would be cool to know what happened and whether it made training easier! 🙂

  4. Fantastic series, first and foremost! Surprisingly, it is the first video where I could not understand the core concept right away… It was hard to tell in the images where the input was and where the output was.

    total_RNN_input(1) = {RNN_itself(0); input(1)}
    total_RNN_output(1) = {RNN_itself(1); output(1)}

    Thank you so much for making this series!

  5. I used them to predict product failure, where we identify if a product will cause problems in the future. Basically I'm trying to identify possible epidemics.

  6. I made an LSTM to do some time series prediction the other day with Keras; my best MAPE was around 25%. I'm wondering if it can predict more accurately. I tried to find information about how to choose the right number of hidden layers, but I didn't find anything useful. I've got a question: I have about 1000 samples and every row has 25 features. Does the following idea necessarily hold: "the deeper the network, the better the performance"?

  7. The figures are quite small and hence not easy to look at… Please make the images and figures bigger.
    The explanation is really good.
    Thanks

  8. In your Vanishing Gradients video (Episode 5), you said that the problem was solved by 3 breakthrough papers by "Hinton", "LeCun", & "Bengio".
    Geoffrey Hinton = Deep Belief
    Yann LeCun = Convolutional
    Bengio = ? (Please tell us what he did, because the Recurrent NN is by Jurgen Schmidhuber and Sepp Hochreiter, as you told us in this video)

  9. Very confusing video. What do the different colors mean? I understand that this video was supposed to be simplified, but you simplified it too much and it made no sense to someone who hasn't seen RNNs before.

  10. Tried to train a recurrent net with a genetic algorithm long ago, without very impressive results.
    Awesome series, by the way!

  11. Going through the video for the first time. One issue I'm having is that some terms are used without being defined. A major term in this video is "feedforward network". I'm assuming that this refers to RBMs and DBNs, in how each neuron is connected to one in the next layer, but no definition is actually given. In this video "feedforward network" is used quite a few times, and not knowing the actual definition of the term makes it hard to form accurate associations and contrasts between recurrent neural networks and feedforward networks.

  12. Hello,
    I have seen this video. I need to apply recurrent neural networks for text classification on the TensorFlow platform. Can you please help me?

  13. Hello. Thank you for a great video. Do you have any example of the use of RNNs in vibration control? I am working on vehicle vibration and having difficulty applying RNNs. Can anyone help, please?

  14. Can you tell me how a recurrent neural network can be used for speech recognition? Is it because it can do forecasting to predict which words are possible given the previous words spoken?

    And I've got one more question in my head:
    according to 1:57,
    "A sequence of inputs with a single output can be used for document classification"
    Does this apply to speech recognition too? Or would another variety of input-output do?

    By the way, it's a good series. Thanks 🙂

  15. I do not know much about these things. But wow!! I like your voice! I could hear you speaking for hours, even if I do not understand much. Are you a singer?

  16. You are reading a script. I can tell by the tone of your voice that you don't understand this topic enough to be presenting Deep Learning.

  17. Such a nontrivial explanation in such a short time. WoWW!! The best pieces were – 1) RNN Usages, 2) On the training of RNNs, and 3) Their solutions (LSTM). Usually we find these different pieces as different models when trying to learn from the internet. But this is awesome! Thank you!

  18. Great job, I repeated my prof's videos 100 times to understand this but couldn't, while you explain it so nicely in 5 minutes. Thank you

  19. https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=3&cad=rja&uact=8&ved=0ahUKEwjs6amqwpvXAhUE24MKHTnNCQEQFggxMAI&url=https%3A%2F%2Fmotherboard.vice.com%2Fen_us%2Farticle%2Fwjgnjq%2Fai-vicarious-recursive-cortical-network-captcha&usg=AOvVaw3-5LaqR877vyZUqGSFF0lT

  20. Awesome tutorial… btw, which software or template are you using for creating this type of animation in the presentation? Please tell!

  21. I want to use an RNN for forecasting the Bitcoin price for my research; is it suitable for that data?
    Is there another neural network that would be more accurate for forecasting the Bitcoin price? Please help…

  22. Regarding the supply chain… I compared an LSTM against a structural time series model (based on the Kalman filter), and not surprisingly, the second won, whether on theoretical data or on "real world" data. Cherry on the cake: the Kalman filter makes it possible to understand what happens and to find a solution (well, "not all the time", of course), which is not the case for the LSTM, which remains a black box, like all the other neural networks.
    Cheers

  23. Can anyone explain to me the mathematics of differentiating the functions of an LSTM, please? I want to know how the backprop works. I need it for my math school project! Please help!!
