Recurrent Neural Networks (RNN) – Deep Learning w/ Python, TensorFlow & Keras p.7

In this part we’re going to be covering recurrent neural networks. The idea of a recurrent neural network is that sequences and order matter. For many tasks, they certainly do.

Text tutorials and sample code: https://pythonprogramming.net/recurrent-neural-network-deep-learning-python-tensorflow-keras/

Discord: https://discord.gg/sentdex
Support the content: https://pythonprogramming.net/support-donate/
Twitter: https://twitter.com/sentdex
Facebook: https://www.facebook.com/pythonprogramming.net/
Twitch: https://www.twitch.tv/sentdex
G+: https://plus.google.com/+sentdex

118 comments

  1. ארד ארבל on

    cool video! really like your channel!

    and I have a little question for you:
    I am trying to create a neural network in python from scratch, kind of like you did in the beginning of your practical machine learning tutorials series, and I managed to do it with some data sets.
    now, I am trying to apply MNIST hand written digits, by using softmax regression (input layer of 784 neurons, output layer of 10 neurons, softmax as activation, cross entropy as loss). but it just does not work. is it possible to do something like that without using convolutional network? and if it is, can you post a video about it? (or did you already upload one in the past?)
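    A from-scratch softmax regression (784 → 10) can work on MNIST without any convolutional layers; a common failure mode is skipping input normalization and softmax stabilization, so exp() overflows on raw 0–255 pixels. Below is a minimal NumPy sketch of that setup on stand-in random data (swap in the real, /255-normalized MNIST arrays); the layer sizes match the comment, everything else is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((256, 784))   # stand-in for flattened, normalized images
y = rng.integers(0, 10, size=256)     # stand-in labels
W = np.zeros((784, 10))
b = np.zeros(10)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max so exp() can't overflow
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(X, y):
    p = softmax(X @ W + b)
    return -np.log(p[np.arange(len(y)), y]).mean()

lr = 0.1
for _ in range(100):
    p = softmax(X @ W + b)
    p[np.arange(len(y)), y] -= 1.0        # dL/dlogits = softmax(z) - one_hot(y)
    W -= lr * X.T @ p / len(y)            # gradient step on weights
    b -= lr * p.mean(axis=0)              # gradient step on biases
```

    The loss starts at ln(10) ≈ 2.30 (uniform guessing) and should drop steadily; if it doesn't, check the normalization first.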

    Reply
  2. Panchsheel on

    Hello Sentdex, please make a video on building a PUBG map radar that can track all players in real time and show where they currently are.

    Reply
  3. Panchsheel on

    Can anybody send me an i3, i5, or i7 processor? I have a Core 2 Duo and I can’t afford a new one. Please help…

    Reply
  4. Alex Smith on

    CuDNNLSTM is amazing! It used to take me 3 hours to evaluate a model on a large dataset; now it only takes about 20 minutes. Training also takes about a tenth of the time.

    Reply
  5. atrumluminarium on

    Not sure if I’m getting confused here, but when setting the decay of the learning rate, wouldn’t a decay close to 0 be very strong and a decay close to 1 very weak (for example, setting it to 0.9999 would mean the change is almost negligible)? Could that be why it was running slowly?
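    For what it’s worth, the `decay` argument in the tf.keras optimizers of this era is not a multiplicative factor: it is time-based decay, lr_t = lr0 / (1 + decay * t), so values near 0 mean weak decay, not strong. A small sketch (the lr=1e-3, decay=1e-6 values are the ones used in the video; the function name is mine):

```python
# Time-based decay as implemented by tf.keras optimizers of this era:
# the rate is divided, not multiplied, so decay near 0 is WEAK.
def keras_decayed_lr(lr0, decay, step):
    return lr0 / (1.0 + decay * step)

# With lr=1e-3 and decay=1e-6, even after 10,000 steps the rate has
# barely moved (~0.00099), so decay is unlikely to explain slow training.
```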

    Reply
  6. Akash Adhikari on

    “And we’ll throw in a dropout because that’s what you do.” 😀

    Man, I haven’t gone through the previous videos in this series, but I think you should probably also make videos on bias, variance, why we use regularization, dropout, etc. You know, a COMPLETE mathematical overview for each of the implementations shown in the video.
    Anyway, this looks like a great series. I would love to go through it.

    Reply
  7. Fadgedo on

    Awesome video series! Thanks for the inspiration, Harrison. I’m about to get my first job, thanks to you. Thanks a lot man, keep it up.

    Reply
  8. fuba44 on

    I would really like to see you do real, proper data preprocessing, really getting into the details, on real data pulled from somewhere. Maybe cover best practices and handy ways to store/load/condition large amounts of data, even if you only have a low-ish amount of RAM.

    Reply
  9. Shivam Chandhok on

    The training accuracy is less than the validation accuracy because dropout is active during training, so some nodes are switched off. However, at test time dropout doesn’t switch off any nodes, so all nodes are involved in computing the validation accuracy.
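    That matches how inverted dropout (the scheme Keras uses) behaves; a toy NumPy sketch of the two modes (the function and names are illustrative, not Keras internals):

```python
import numpy as np

def dropout(x, rate, training, rng):
    if not training:
        return x                               # inference: identity, every unit active
    mask = rng.random(x.shape) >= rate         # keep each unit with prob 1 - rate
    return x * mask / (1.0 - rate)             # rescale so expected activation matches

rng = np.random.default_rng(0)
x = np.ones(1000)
train_out = dropout(x, 0.2, training=True, rng=rng)
test_out = dropout(x, 0.2, training=False, rng=rng)
# train_out has ~20% zeros; test_out is x untouched, which is why
# validation metrics can come out better than training metrics.
```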

    Reply
  10. J_Net Reloaded on

    None of this is practical for me. At least show how to do something useful, like face recognition; that’s why I subbed, and you still haven’t shown how to do it. I need my door to open when it recognizes me. Simple request!

    Reply
  11. Manas Hejmadi on

    Wow, CuDNNLSTM is literally a billion times faster than my PC 😂😂😂 I’m a student and I use an Intel i3 laptop processor to train my models. I recently shifted to the cloud, but damn, it’s so fast!

    Reply
  12. Idris PendisBey on

    Separating the dataset into train, validation, and test sets would be better practice. Only use the test data after training to evaluate the final performance of the model, and do not feed any information from this evaluation back into the development of the model.

    Reply
  13. Sam Slim on

    Thank you. But these are always datasets that are well known and already preprocessed. I wish for some video tutorials about how to preprocess data (because I want to learn how to process my own data) for text, images, and videos. Thank you.

    Reply
  14. Trinedy on

    Usually you remove dropout layers from the model at validation time; Keras might do this silently, which would explain the higher validation accuracy. I’m not 100% sure, since I usually use tf estimators.

    Reply
  15. James Briggs on

    Correct me if I’m wrong, but regarding the diagram at 2:15: I think you describe each cell (A) as a separate recurrent cell, whereas each following ‘cell’ is in fact the same cell, just one timestep later?

    Reply
  17. Patrick Littlefield on

    I’m wondering why you chose Long Short-Term Memory (LSTM) instead of the Gated Recurrent Unit (GRU) for the recurrent hidden unit in the layers. While Googling around for more info on LSTMs, I came across a research paper (https://arxiv.org/pdf/1412.3555v1.pdf) which seemed to say that, in most cases, GRU was the way to go both in terms of results and computational speed. And since there appears to be a CuDNNGRU function in the CuDNN library, it seems like a lay-up.

    From the paper:
    “Based on our experiments, we concluded that by using a fixed number of parameters for all models, on some datasets GRU can outperform LSTM units both in terms of convergence in CPU time and in terms of parameter updates and generalization.”
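    For anyone weighing the two: in Keras the swap is literally one line (CuDNNLSTM(128) → CuDNNGRU(128)). One concrete difference is size: an LSTM has 4 gate blocks to the classic GRU’s 3, so the GRU is about 25% smaller. A back-of-the-envelope check against this tutorial’s first layer (128 units, 28 input features), using the classic GRU formulation and ignoring the extra biases the CuDNN kernels add:

```python
def lstm_params(units, input_dim):
    # 4 blocks (input, forget, cell, output), each with an input kernel,
    # a recurrent kernel, and a bias vector.
    return 4 * (units * input_dim + units * units + units)

def gru_params(units, input_dim):
    # 3 blocks (update, reset, candidate): ~25% fewer parameters.
    return 3 * (units * input_dim + units * units + units)
```

    For LSTM(128) on (28, 28) input this gives 80,384 parameters, which matches what model.summary() reports for the tutorial’s first layer; the GRU equivalent is 60,288.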

    Reply
  18. Prajwal V Atreyas on

    Always love your videos.
    Could you do a video on TensorFlow Lite models that can be used for running ML on Android phones?

    Reply
  19. Александр Назаров on

    Hi, when I tried to get a prediction with model.predict(x_test[0]) I got the error “Error when checking input: expected lstm_input to have 3 dimensions, but got array with shape (28, 28)”. I didn’t catch why it wants 3 dimensions rather than the (28, 28) input shape. Thanks for any answer.
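    The missing dimension is the batch axis: model.predict expects (batch, timesteps, features), and x_test[0] is just one (28, 28) sample. Adding a leading axis fixes it (pure NumPy sketch; `model` is assumed to be the tutorial’s trained model):

```python
import numpy as np

sample = np.zeros((28, 28))                 # stands in for x_test[0]
batch = sample[np.newaxis, ...]             # same as np.expand_dims(sample, axis=0)
print(batch.shape)                          # (1, 28, 28): a batch of one sequence
# model.predict(batch) now sees the 3 dimensions it asked for;
# model.predict(x_test[:1]) is an equivalent shortcut.
```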

    Reply
  20. Nyx IoT on

    Can you make a simple tutorial on how sequence-to-sequence (seq2seq) works?
    I am curious about it and how it is used to make chatbots.
    How do the encoder and decoder work?

    Reply
  23. Jason Wolbrom on

    Since you mentioned training on crypto in your next video, what if you created an algorithm trained on many cryptocurrencies that produces one output which can predict the price of any currency?

    Reply
  24. TheCreatingdestiny on

    I would like to know if you are going to explore applications of deep learning to financial markets in the near term. If not, what would be the best way to start, assuming you don’t know anything about programming, machine learning, or deep learning?

    Reply
  26. BigBadBurrow on

    Dropout on every layer? I thought the general consensus was to only add them before the output Dense layers, not on the hidden layers?

    Reply
  27. Daniel Santos on

    Hi Sentdex, really nice job. In the next videos, where you will show deep learning on time series for real-world examples, can you please use a multiple-input LSTM example and talk a bit about the best approaches and practices for using LSTMs? You could use the cryptocurrency example you mentioned in this last video. My biggest difficulty with deep learning, and I guess for most people in general, is filtering the correct information to feed the neural network. Thank you. Keep up your fantastic work.

    Reply
  28. ChrisHalden007 on

    Very interesting. I would have never thought about using LSTM with MNIST.

    BTW, when using the plain LSTM (running your code), I get much worse results than you:

    Train on 60000 samples, validate on 10000 samples
    Epoch 1/3
    60000/60000 [==============================] - 341s 6ms/step - loss: 2.0034 - acc: 0.2651 - val_loss: 1.2642 - val_acc: 0.5651
    Epoch 2/3
    60000/60000 [==============================] - 341s 6ms/step - loss: 0.9755 - acc: 0.6602 - val_loss: 0.4962 - val_acc: 0.8445
    Epoch 3/3
    60000/60000 [==============================] - 341s 6ms/step - loss: 0.5671 - acc: 0.8212 - val_loss: 0.3070 - val_acc: 0.9101

    But when I run it using CuDNNLSTM, I get results very similar to yours:

    Train on 60000 samples, validate on 10000 samples
    Epoch 1/3
    60000/60000 [==============================] - 45s 745us/step - loss: 0.3338 - acc: 0.8980 - val_loss: 0.0977 - val_acc: 0.9708
    Epoch 2/3
    60000/60000 [==============================] - 43s 722us/step - loss: 0.1053 - acc: 0.9722 - val_loss: 0.0602 - val_acc: 0.9828
    Epoch 3/3
    60000/60000 [==============================] - 43s 713us/step - loss: 0.0758 - acc: 0.9802 - val_loss: 0.0596 - val_acc: 0.9845

    Any idea why this happens? I tried the plain LSTM several times and the training accuracy is always much lower than yours.
    Just curious what your opinion is.
    Thanks

    Keep up the good work. I am really enjoying your videos 😉

    Reply
  29. Dal Sprout on

    Hi, great tutorial sir!
    Just a simple comment: the MNIST training set is usually shuffled before the beginning of training. Maybe that is the reason the val_accuracy is bigger than the accuracy, but I’m sorry if I’m mistaken.
    Thank you!

    Reply
  30. Soon-Chang Poh on

    I’m confused. What’s a cell?
    When you add an LSTM layer, you called it adding an LSTM cell.
    Then you called the argument to LSTM(), which is 128, the number of cells. I checked the documentation; I think 128 is the units.

    Reply
  32. Hamedato Hamedato on

    I remember my self-driving-car algorithm trained on a CNN used to go backward when there was no need for that and it could have gone forward. And I knew why: we shuffled the data in order to balance it, so training was confused about what to do with a single image whose target sometimes says to go forward and sometimes to go backward. So I’m thinking of training my self-driving-car algorithm using an RNN, which I think would help it learn how to smoothly reduce speed and turn. But two concerns: 1) Should I still balance the data? 2) LSTMs need the input image to be reshaped to a vector. Can I feed the whole image as 2D input to an LSTM by modifying it into a convolutional network with some sort of memory to remember dependencies between consecutive frames? Thanks!

    Reply
  33. sould3mon on

    If people are getting an error while following this with the CUDA version, namely:
    InvalidArgumentError (see above for traceback): No OpKernel was registered to support Op 'CudnnRNN' with these attrs.

    ----my solution----
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    session = tf.Session(config=config)

    Add this just after the imports and it should run fine from the command line (though for some reason not from a Jupyter notebook).

    Reply
  34. Viraat Chandra on

    What does using an LSTM RNN mean on this dataset? What exactly is the input for each timestep, and is there any practical use of RNNs on image data like MNIST? Also, what exactly is 128? Is it related to the shapes of the gate tensors for the LSTM?

    Reply
  35. Jules on

    If the input is 28×28, so 28 sequences of length 28, how come we can feed that into a 128-node network? Shouldn’t the first layer have only 28 nodes? I must be missing something.

    Reply
  36. Saurabh Kumar on

    Hey, I have an architectural doubt; I would be happy if you replied. The sequence here is rows of 28 pixel values, right, and you have 128 LSTM nodes. So what exactly happens: does each LSTM node take in these 28 pixels (the sequence), output to the next LSTM node and also to the next layer, and does this repeat for all 28 rows and all 60k images? Is that what’s happening? Thanks

    Reply
  37. deepblundon on

    It would be great if you could draw out the architecture of the MNIST example in terms of inputs and blocks. I have a little trouble visualizing how a 28×28 array feeds into a layer with 128 LSTM blocks. Otherwise, terrific tutorial!

    Reply
  38. Alex nixi on

    I would have thought the input shape would be 1 (line) × 28 (pixels) if each element of the time series is one line of the MNIST digit. Can you please explain why your input is 28 × 28? It is as if you are feeding the entire image at once; how does the network then know to “look at it” as if each line were an element of a time series? How do you know it is not “looking at it” as if each pixel were an element of a time series?
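    The shape questions above all come down to the same bookkeeping: input_shape=(28, 28) means 28 timesteps, each a 28-pixel row, and 128 is the size of the hidden state that one shared layer carries between timesteps, not a count of input lines. A stripped-down vanilla RNN cell (no gates; random weights and names are mine) makes the shapes visible:

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((28, 28))                    # one digit: 28 rows of 28 pixels
W_in = rng.standard_normal((28, 128)) * 0.1     # row (28) -> hidden (128)
W_rec = rng.standard_normal((128, 128)) * 0.1   # hidden -> hidden recurrence
h = np.zeros(128)                               # hidden state, reused every step
for row in image:                               # one timestep per row, SAME weights each time
    h = np.tanh(row @ W_in + h @ W_rec)         # an LSTM adds gates around this update
print(h.shape)                                  # (128,): what the next layer receives
```

    So Keras iterates over the first axis of each (28, 28) sample as time; it treats rows, not pixels, as timesteps because that is what the declared input shape says.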

    Reply
  39. Uknown on

    I get 10% after the 3 epochs running exactly the same code as you did 🙁 I tried the CuDNN and CPU variations and also normalized the data 🙁

    Reply
  40. Rahul Bhatia on

    Hello everyone! Great video, loved it! Just one question: in general, for how many epochs should we train an LSTM? I know it depends case by case, but still: you would train an ANN for 100 epochs, so why train an LSTM for just 3, and what difference does it make? If anyone could help, I would really appreciate it.

    Thanks

    Reply
  41. Navodit Jain on

    Please help me out. I am getting this error while executing the line
    model.fit(x_train, y_train, epochs=3, validation_data=(x_test, y_test))
    I am using CuDNNLSTM for the layers.

    InvalidArgumentError Traceback (most recent call last)
    ~/minorproject-2016-20/minorproject-2016-20/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
    1333 try:
    -> 1334 return fn(*args)
    1335 except errors.OpError as e:

    ~/minorproject-2016-20/minorproject-2016-20/lib/python3.6/site-packages/tensorflow/python/client/session.py in _run_fn(feed_dict, fetch_list, target_list, options, run_metadata)
    1316 # Ensure any changes to the graph are reflected in the runtime.
    -> 1317 self._extend_graph()
    1318 return self._call_tf_sessionrun(

    ~/minorproject-2016-20/minorproject-2016-20/lib/python3.6/site-packages/tensorflow/python/client/session.py in _extend_graph(self)
    1351 with self._graph._session_run_lock(): # pylint: disable=protected-access
    -> 1352 tf_session.ExtendSession(self._session)
    1353

    InvalidArgumentError: No OpKernel was registered to support Op 'CudnnRNN' with these attrs. Registered devices: [CPU,XLA_CPU,XLA_GPU], Registered kernels:
    device='GPU'; T in [DT_DOUBLE]
    device='GPU'; T in [DT_FLOAT]
    device='GPU'; T in [DT_HALF]

    [[{{node cu_dnnlstm/CudnnRNN}} = CudnnRNN[T=DT_FLOAT, direction="unidirectional", dropout=0, input_mode="linear_input", is_training=true, rnn_mode="lstm", seed=0, seed2=0](cu_dnnlstm/transpose, cu_dnnlstm/ExpandDims, cu_dnnlstm/ExpandDims_1, cu_dnnlstm/concat)]]

    Any help would be appreciated.

    Reply
  42. Li Qian on

    You could use the previous code to test/use the model:

    import matplotlib.pyplot as plt
    import numpy as np
    predictions = model.predict([x_test])
    print(np.argmax(predictions[219])) # shortcut
    plt.imshow(x_test[219], cmap = plt.cm.binary)
    plt.show()

    Reply
  43. Krutarth Dave on

    I took a reference from a GitHub “image-captioning” project where the language model uses LSTM(256), and the merged image-and-language model uses LSTM(1000). What is the significance of this? Many thanks in advance!

    Reply
  44. surankan de on

    Can you do something on AI like Jarvis or Friday that can talk like us and control many things? Thank you, your videos are the best.

    Reply
  45. Ian Song on

    In the first layer of your neural network, why do you always slice the input data and exclude the 0th index? Or does : have a different meaning in TensorFlow and Keras? I am very confused 🙁
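    The : is ordinary Python tuple slicing, nothing Keras-specific: x_train.shape is (samples, timesteps, features), and shape[1:] drops the sample count so that input_shape describes a single example. A sketch with the tutorial’s MNIST shapes:

```python
# x_train.shape for the tutorial's MNIST data
shape = (60000, 28, 28)        # (samples, timesteps, features)
per_sample = shape[1:]         # plain tuple slicing: drop index 0, the sample count
print(per_sample)              # (28, 28): what input_shape=x_train.shape[1:] passes
```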

    Reply
  46. pradeep kumar on

    So I had this issue with the rank of the labels due to one-hot encoding when using a different dataset of images.
    Is there any workaround for the issue?

    Reply
  47. Le Duck on

    Guys, I have a question:
    Where do machine learning algorithms come into play in terms of neural networks?
    For example, when you’re using an LSTM for classification vs. using the kNN algorithm for classification, do neural network layers apply machine learning algorithms?
    Thanks for the help in advance

    Reply
  48. Vasco Cansado Carvalho on

    One thing I noticed is that a lot of the time TensorFlow spends is actually wasted printing progress to the screen, so if you silence it by setting model.fit(…, verbose=0, …) it runs WAY faster!

    Reply
  49. Aneesh Prabu on

    Hey, I understand you wanted to teach us the basics, but my LSTM model did not predict the values properly. Then I understood that MNIST is a locality-based problem, not a sequential one. It did work for the IMDB dataset.

    Reply
  50. Basel Ghaibour on

    Thanks for your effort. I have a question: how can I build a model with one encoder and two decoders? I want to use the same encoder output with two separate decoders, one of them Dense and the second an LSTM.

    Reply
