Loading in your own data – Deep Learning basics with Python, TensorFlow and Keras p.2

Welcome to a tutorial where we’ll be discussing how to load in our own outside datasets, which comes with all sorts of challenges!

First, we need a dataset. Let’s grab the Dogs vs Cats dataset from Microsoft: https://www.microsoft.com/en-us/download/confirmation.aspx?id=54765

Text tutorials and sample code: https://pythonprogramming.net/loading-custom-data-deep-learning-python-tensorflow-keras/

Discord: https://discord.gg/sentdex
Support the content: https://pythonprogramming.net/support-donate/
Twitter: https://twitter.com/sentdex
Facebook: https://www.facebook.com/pythonprogramming.net/
Twitch: https://www.twitch.tv/sentdex
G+: https://plus.google.com/+sentdex

162 comments

  1. Daniel on

    The first video was great, looking forward to watching this one through as well. Can you make a video about using CPU vs GPU for some of these training processes? I would like to learn more about forcing the script to use the GPU instead of the CPU. For instance, some of your older videos (like the Monte Carlo Simulation series) could benefit from this. Thanks!

    Reply
  2. Panchsheel on

    One more question: if I have an Nvidia GT 710 graphics card and 4 GB of RAM installed in my PC, can I install TensorFlow-GPU on Windows 10?
    I can't seem to install it.

    Reply
  3. Harsath Mark Zuckerberg on

    These were the videos that I requested. Please make more project videos on machine learning and deep learning, and real-world machine learning projects in Python, because you are the best to learn from.

    Reply
  5. p95humbucker on

    amazing videos, great AI tutorials, honestly one of the best programming channels on YouTube. thank you for making these videos

    Reply
  6. Sam Witteveen on

    Cool video as always, but as of TF 1.9 you can use tf.data with Keras to do what you did here, and it will make a much more efficient pipeline for training on larger datasets. This will also work for converting to TFRecords if you want to change the format. This becomes important when using fast GPUs/TPUs, as they are no longer the bottleneck; loading data into the model is.
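
    A minimal sketch of the tf.data approach described above, assuming X and y are the arrays built at the end of the video; batch and buffer sizes are placeholders:

    import numpy as np
    import tensorflow as tf

    # Build an input pipeline from the in-memory arrays.
    dataset = tf.data.Dataset.from_tensor_slices((X.astype("float32") / 255.0, np.array(y)))
    dataset = dataset.shuffle(buffer_size=1000)   # shuffle examples
    dataset = dataset.batch(32)                   # batch for training
    dataset = dataset.prefetch(1)                 # overlap data loading with training

    # With TF >= 1.9, a tf.keras model can train from the dataset directly, e.g.:
    # model.fit(dataset, epochs=3, steps_per_epoch=len(X) // 32)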

    Reply
  7. Harsath Mark Zuckerberg on

    Also, make videos on data cleaning and converting categorical data. Don't always work on larger datasets; make machine learning videos on smaller datasets too, because those are the kinds of datasets that come up in the real world. That's a piece of my advice. Keep making videos, we love you always.

    Reply
  9. Thor Odinson on

    Can you also show us the steps to creating your own neural network so we will know how to create our own for other things?

    Reply
  10. slavenya001 on

    Great tutorial!!!
    Could you please add something for highly imbalanced datasets? For example, one from e-commerce where people are not buying 95-96% of the time.
    Could you please also cover session-based sequence prediction? Like many users with many sessions…
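
    One common way to handle that kind of imbalance is Keras's class_weight argument to fit; a minimal sketch, assuming a compiled model and arrays X, y where 1 is the rare "purchase" class (the weights are placeholders to tune):

    # Weight the rare class more heavily so it contributes more to the loss.
    class_weight = {0: 1.0, 1: 20.0}  # placeholder weights; pick based on the class ratio
    model.fit(X, y, batch_size=32, epochs=3, validation_split=0.1, class_weight=class_weight)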

    Reply
  11. Matthew Grotheer on

    I’m not sure which ends up being better….the videos or the random (read: dope) coffee mugs you keep pulling out in them 😉

    Reply
  12. Cineva in comentarii on

    Please update the video where you teach us how to use the object detection API made by TensorFlow. Hours of Googling couldn't help me fix protobuf, and a lot of people in the comments have problems too.

    I really like that you are remaking the neural network videos, because the old ones can be harder to understand.

    Reply
  13. Seth Adams on

    What is your opinion on setting an aspect ratio and adding padding during resizing? I just feel like forcing an n x n dimension distorts images too much when the original resolutions are so varied.

    Reply
  14. ROY on

    Great tutorial video, sir.
    Please make a video on colour image data preparation.
    I actually prepared the data for colour images just by removing the grayscale conversion line and using 3 in the reshape instead of 1,
    but my final image is showing up blue.

    Reply
  15. Aasrith Chennapragada on

    Couldn’t you use matplotlib’s imread function? (Well cv2’s does the same thing but one less import line ✌🏻)

    Reply
  16. Tyler K on

    Great tutorial as always. Very easy to listen to and follow. I was wondering, are you planning to cover things like TFRecords for handling very large datasets sometime in the future? There are other tutorials, but I think the topic would really benefit from your style.

    Reply
  17. Tozzzer on

    Love these vids. I keep getting an error to do with input size that really fools me all the time. eg when you feed data through the model and it says something like: ‘input_1 needs 3 arguments, but 2 given: (6,2)’ 🙁

    Reply
  18. Fatih on

    Hey sentdex, please keep up with your videos. They are really helpful in so many ways. I'm just starting to get into ML and started studying computer science just because of ML, and your videos are so helpful. Thumbs up to you.

    Reply
  19. عبدالرحمن العيسى on

    Great tutorial as always, I'm in love with this channel and I learn a lot. I'm trying to build OCR to read low-resolution documents; do you recommend any resources to help me out with this? I wish some day you'd create a video about this. Regards

    Reply
  20. Niclas Wüstenbecker on

    Great tutorial, but the way you load the data is not very memory efficient and this will cause problems with large datasets. First the training_data list is written into RAM and afterwards the same amount of memory is reserved when converting into a numpy array. So this approach is only good for datasets < RAM size/2. Another option would be to create the numpy array at the beginning using np.empty and then write the data as entries into the array. This way the dataset can be as large as your RAM. If the dataset is larger than the RAM size it is suggested to use a generator that loads and yields the data during training. This way your dataset can be as large as your SSD, but training speed is most likely limited by the read speed of the drive. Just something I had to deal with during my thesis in the last couple of months. Maybe you could make a tutorial on the generator one, not a lot of people know about this. Anyways, keep up the good work!
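
    A rough sketch of the generator idea described above, using keras.utils.Sequence so only one batch of images is read from disk at a time; the directory layout and grayscale handling follow the video, everything else (names, batch size) is illustrative:

    import os
    import cv2
    import numpy as np
    from tensorflow.keras.utils import Sequence

    IMG_SIZE = 50

    class PetImagesSequence(Sequence):
        """Yields one (X, y) batch per __getitem__ call, loading images lazily."""
        def __init__(self, samples, batch_size=32):
            self.samples = samples          # list of (filepath, label) tuples
            self.batch_size = batch_size

        def __len__(self):
            return int(np.ceil(len(self.samples) / self.batch_size))

        def __getitem__(self, idx):
            batch = self.samples[idx * self.batch_size:(idx + 1) * self.batch_size]
            X, y = [], []
            for filepath, label in batch:
                img = cv2.imread(filepath, cv2.IMREAD_GRAYSCALE)
                if img is None:             # skip unreadable/corrupt files
                    continue
                X.append(cv2.resize(img, (IMG_SIZE, IMG_SIZE)))
                y.append(label)
            X = np.array(X).reshape(-1, IMG_SIZE, IMG_SIZE, 1) / 255.0
            return X, np.array(y)

    # Build the (filepath, label) list once, without loading any pixels, then train:
    # samples = [(os.path.join(DATADIR, cat, fname), CATEGORIES.index(cat))
    #            for cat in CATEGORIES
    #            for fname in os.listdir(os.path.join(DATADIR, cat))]
    # model.fit_generator(PetImagesSequence(samples), epochs=3)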

    Reply
  21. Yoni Fihrer on

    I suggest using context managers for file opening. It's cleaner and better for beginners, as you don't have to remember to close the file.
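
    For example, the pickle-saving step at the end of the video could be written with context managers like this (a sketch; the file names follow the tutorial):

    import pickle

    # "with" closes each file automatically, even if an exception is raised.
    with open("X.pickle", "wb") as f:
        pickle.dump(X, f)

    with open("y.pickle", "wb") as f:
        pickle.dump(y, f)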

    Reply
  22. Banama on

    Are you typing with 10 fingers? All the symbols that programming requires kind of complicate things; I'm still trying to figure out the best hand position to type code faster…

    Reply
  23. Yunusa Muhammed on

    I did everything right, but when I get to print(len(training_data)), after waiting for the execution it shows zero, and when I print a sample it doesn't show anything.

    Reply
  24. DemonSlayer627 on

    If you're using Keras, you should use the flow_from_directory function; it's really the same thing without the hassle of running out of memory trying to load the entire dataset.
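
    A minimal sketch of that approach, assuming the PetImages layout from the video (one subfolder per class); the parameters are illustrative:

    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    datagen = ImageDataGenerator(rescale=1.0 / 255)

    train_gen = datagen.flow_from_directory(
        "PetImages",              # folder containing Dog/ and Cat/ subfolders
        target_size=(50, 50),     # images are resized on the fly
        color_mode="grayscale",
        class_mode="binary",
        batch_size=32,
    )

    # Batches are read from disk as needed, so the whole dataset never sits in RAM:
    # model.fit_generator(train_gen, epochs=3)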

    Reply
  25. YumekuiNeru on

    So these datasets are pretty small.
    How do you divide the dataset into batches of some sort if your dataset is too large to fit in memory at once?
    Is this what something like HDF5 is for?
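
    HDF5 is indeed one option for that: the arrays live on disk and you slice batches out of them without loading everything. A minimal sketch with h5py (file and dataset names are made up), assuming X and y are the arrays from the video:

    import h5py

    # Write once: store the full arrays on disk.
    with h5py.File("petimages.h5", "w") as f:
        f.create_dataset("X", data=X)
        f.create_dataset("y", data=y)

    # Later: read one batch at a time; only the requested slice is loaded into RAM.
    with h5py.File("petimages.h5", "r") as f:
        batch_X = f["X"][0:32]
        batch_y = f["y"][0:32]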

    Reply
  26. vikas mishra on

    Hey sentdex, can you please make a video on training our own audio dataset using neural networks?
    Love all your videos, from India.

    Reply
  27. Suleiman Mustafa on

    Thanks, I have been looking forward to this tutorial; it will help with my thesis.
    For Windows, if you have Anaconda installed and cannot find the cv2 module, you may simply have to do:

    pip install opencv-python

    If you are on Linux, you can do:

    pip install opencv-python

    Reply
  28. RickertBrandsen on

    These vids always cheer me up 🙂 You are by far my most favourite instructor. 🙂 When I feel depressed i just watch your videos.

    Reply
  29. Srijal shrestha on

    I made my own datasets of trucks and mobile phones, but when I used your code, I got this error: "OSError: [Errno 2] No such file or directory: '//tmp/images". What should I do?

    Reply
  30. Shubham Paul on

    I honestly hate that ‘blue’ dog. OpenCV follows BGR whereas matplotlib RGB, I trust.

    One would like to…

    image = cv2.imread('your_image.jpg')
    x, y, z = cv2.split(image)
    image = cv2.merge([z, y, x])
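
    If it helps, cv2 also has a single call that does the same channel swap:

    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # convert BGR to RGB before plt.imshow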

    Reply
  31. pratyush pradhan on

    At 3:33 you converted the data to grayscale, and then at 14:45 you said you put a 1 because it's grayscale. If it's already grayscale data, why do you have to put that 1? Could you explain? I'm kind of confused.

    Reply
  32. Simon Moore on

    The latest version of opencv-python wouldn’t work for me inside of a docker container. It has ‘qt’ as a dependency. You can get around it by installing version 3.3.0.9 instead with “pip install opencv-contrib-python==3.3.0.9”

    Reply
  33. Quang Huy Ngô on

    Hi Harrison.
    You've been doing an absolutely amazing series of deep learning implementation videos with Python, TensorFlow, Keras, etc.
    This is the most useful work you've ever done. I've learned machine learning and deep learning theory easily, but implementation and application are difficult for me. Please keep doing this.

    Reply
  34. Reza Hosseini on

    I converted my X to an np.array without reshaping it as you did, and I got the exact same shape as you did! So I guess you no longer need to reshape it to (-1, 50, 50, 1). Please tell me if I've done something wrong.

    Reply
  35. Mohmed Hussein on

    Great tutorial, but if the images have multiple labels, is the way to load the data the same for binary classification and multi-label classification?

    Reply
  36. Mad Muffin on

    I'm not very well versed in all this computer stuff; I'm mainly doing it for fun.
    Is there a way I don't need to download all the images? Or do I really have to get this almost-a-gigabyte of cats and dogs onto my PC?

    Reply
  37. effe rossi on

    How can I do multi-label classification, for example cats, dogs, the number of cats or dogs inside the images, their colors, etc.?

    Reply
  38. Doug P on

    Thanks for the awesome videos! I’ve been following along in Kaggle. If anyone wants quick access to this, I’ve uploaded as a public dataset here:

    https://www.kaggle.com/thesherpafromalabama/cats-and-dogs-sentdex-tutorial

    Obviously, I had to make some tweaks to make the data load into Kaggle rather than my Python IDE. Admittedly, I didn’t get the exact same results and am still messing around with the code to figure out what went wrong (for example, the length of my training_data was 24946, +30, after running the create_training_data function). If you see what I did wrong or have any suggestions, please let me know!

    Kernel:
    https://www.kaggle.com/thesherpafromalabama/sentdex-2-deep-learning-basics-p2

    Reply
  39. ElChe-Ko on

    For those interested, I believe I reached the same output X, y by creating them directly inside the first for loop when you load the data. Here is the code if you want to test it:

    # Define directory where the data is
    DATADIR = './Data/kagglecatsanddogs_3367a/PetImages'

    # Define categories
    CATEGORIES = ['Dog', 'Cat']

    # Load data
    IMG_SIZE = 120
    X_train, y_train = [], []  # <------------------ NEW PART!!!!

    for category in CATEGORIES:
        path = os.path.join(DATADIR, category)  # path to dogs or cats dir
        label = CATEGORIES.index(category)      # define label as 0 (Dog) or 1 (Cat)
        for img in os.listdir(path):
            try:
                img_array = cv2.imread(os.path.join(path, img), cv2.IMREAD_COLOR)
                img_array_resized = cv2.resize(img_array, (IMG_SIZE, IMG_SIZE))  # resize image to IMG_SIZE x IMG_SIZE pixels (1 channel for grayscale, 3 for color)
                X_train.append(img_array_resized)  # <------------------ NEW PART!!!!
                y_train.append(label)              # <------------------ NEW PART!!!!
                #plt.imshow(img_array_resized, cmap='gray')
                #plt.show()
            except Exception as e:
                pass

    # Shuffle the data to avoid bias, keeping images and labels paired
    perm = list(range(len(X_train)))
    random.shuffle(perm)

    # Convert into arrays
    X_train = np.asarray(X_train)[perm]  # <------------------ NEW PART!!!!
    y_train = np.asarray(y_train)[perm]  # <------------------ NEW PART!!!!

    Reply
  40. Eivind Strømsvåg on

    It seems that I can't find the directory to load the files… any help? I'm on a Mac.

    [Errno 2] No such file or directory: ‘X:/Datasets/PetImages/Dog’

    Reply
  41. thrivikram reddy on

    from IPython.core.interactiveshell import InteractiveShell

    InteractiveShell.ast_node_interactivity = "all"

    Does this help?

    Reply
  42. jan biel on

    The value of these videos is fucking incredible. After some setup with Anaconda to get TensorFlow and Python 3.6 to work in PyCharm, I was able to reproduce all of this with my own data. Your explanations are absolutely on point and I have no questions left after this part.

    Reply
  43. Pra Yogiz on

    I enjoyed this so much. I have a CS degree, but I didn't have a great time with programming, so I decided to get a job that isn't programming. But after trying to learn pyautogui and Selenium from your videos, I got so excited to learn ML, and now here I am… following your Keras tutorial 😀

    Reply
  44. Kin Fai on

    I don't understand the reshape parameters for converting X from a list to a numpy array. That -1 is a catch-all is fine, but IMG_SIZE, IMG_SIZE and especially the 1 for grayscale don't really make sense to me. Can anyone explain it in more detail, please?
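
    A small worked example of what that reshape does, using the names from the video:

    import numpy as np

    IMG_SIZE = 50
    X = [np.zeros((IMG_SIZE, IMG_SIZE)) for _ in range(3)]   # pretend: 3 grayscale images

    X = np.array(X)
    print(X.shape)                            # (3, 50, 50): samples, height, width

    X = X.reshape(-1, IMG_SIZE, IMG_SIZE, 1)
    print(X.shape)                            # (3, 50, 50, 1): the trailing 1 is the single
                                              # grayscale channel a Conv2D layer expects;
                                              # -1 tells numpy to infer the sample count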

    Reply
  45. tic tac on

    Has anyone tried this with another random dataset? I'm facing an error while reshaping: cannot reshape array of size 72 into shape (28,28,1).

    Reply
  46. Kinshuk Das on

    After importing matplotlib.pyplot you can write %matplotlib inline, then you don’t have to write plt.show()…

    What if I have images of more than two classes, let's say dog, cat and bird? Then how can I label them? Should I take the index, or do I have to one-hot encode them?
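
    A sketch of the one-hot option for more than two classes, assuming integer labels produced by CATEGORIES.index(category) as in the video:

    from tensorflow.keras.utils import to_categorical

    CATEGORIES = ["Dog", "Cat", "Bird"]   # hypothetical three-class setup
    labels = [0, 2, 1, 0]                 # integer labels from CATEGORIES.index(...)

    y = to_categorical(labels, num_classes=len(CATEGORIES))
    print(y)   # each label becomes a one-hot row, e.g. 0 -> [1, 0, 0]
    # Plain integer labels also work if the model is compiled with sparse_categorical_crossentropy.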

    Reply
  47. HuntHoot on

    I feel very dumb. I thought I had a grasp on this stuff in your last video, but this one just breezes through a lot of stuff that I didn’t understand. Not your fault, definitely my own, but it’s super discouraging that I’m apparently the only one here who doesn’t understand most of what I just wrote down. I’ve been programming in python for about a year, guess I need more experience still.

    Reply
  48. Skyscraper on

    Sir, can you make a couple of videos on emotion recognition with CNNs? I tried Haar cascades and then switched to CNNs, and I am still getting <50% accuracy on the Kaggle dataset fer2013. I want this to work on a real-time video feed, but 50% accuracy is no good because all it shows is happy/neutral every time, and I feel all the work I have put in is of no use. If you could spare some time, these videos could be of great help for others as well :-).

    Reply
  49. Hassan Shaikh on

    Facing a Unicode error like: DATADIR = "C:\Users\Junaid\Desktop\rose vs sunflower"
    ^
    SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape

    Anybody help.
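
    That error comes from the backslashes in a plain Windows path string being read as escape sequences (\U starts a Unicode escape). A likely fix is a raw string or forward slashes; the path below is just the example from the comment above:

    DATADIR = r"C:\Users\Junaid\Desktop\rose vs sunflower"     # raw string
    DATADIR = "C:/Users/Junaid/Desktop/rose vs sunflower"      # forward slashes
    DATADIR = "C:\\Users\\Junaid\\Desktop\\rose vs sunflower"  # escaped backslashes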

    Reply
  50. Gt Cline on

    Another amazing set of tutorials. You truly are helping me understand Python and Deep Learning at a whole different level. Thank you for your time and expertise, Sentdex.

    Reply
  51. Kim JinYoung on

    Great video 🙂, thank you.

    I have one question.

    Why is there a difference between the result of my code on my computer and your result?

    At print(len(training_data)), my result is 0, but yours is 24916.

    Above all, it's the same code.

    Reply
  52. ShayCreations : on

    The training data does not give me a length. It just shows me corrupt image errors, and then when I print the length it says 0.

    Reply
  53. Sagar Khuteta on

    new_array = cv2.resize(img_array, (50, 50))
    cv2.error: OpenCV(3.4.2) C:\projects\opencv-python\opencv\modules\imgproc\src\resize.cpp:4044: error: (-215:Assertion failed) !ssize.empty() in function 'cv::resize'

    Can anyone help me with this error..
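
    That assertion usually means cv2.imread returned None (an unreadable or corrupt file), so resize receives an empty image. A small guard, sketched against the loop from the video:

    img_array = cv2.imread(os.path.join(path, img), cv2.IMREAD_GRAYSCALE)
    if img_array is None:      # unreadable/corrupt file: skip it instead of crashing resize
        continue
    new_array = cv2.resize(img_array, (50, 50))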

    Reply
  54. Hameed Hm on

    Great video @Sentdex on CNNs, this will really help all newcomers who want to learn DL. I was trying to replicate your code, but I am getting the same error you got about the X variable at X.append(features):
    AttributeError: 'numpy.ndarray' object has no attribute 'append'
    Could you help with how to resolve this error? Thanks
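
    That error typically means X has already been turned into a numpy array (for example by re-running the reshape cell) before append is called; Python lists have append, numpy arrays do not. A tiny illustration:

    import numpy as np

    X = []                 # a Python list: X.append(...) works
    X.append([1, 2, 3])
    X = np.array(X)        # from here on, X is an ndarray and has no .append()
    # Fix: re-initialise X = [] (re-run the cells in order) before appending again.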

    Reply
  55. Maximillian Fam on

    Hello,
    may I know if there are any updates in TF and Keras regarding reshaping the image array "features"?
    I am not sure, but all I could find is the tf.reshape() function, which does the same thing as NumPy's reshape.

    Reply
  56. vishnu p.v on

    Best video from a pro. I loved it, and it helped me a lot to get the basic idea. Please add a tutorial on extracting frames from 100 videos spread across different folders. I expect a positive reply from you, pro…..

    Reply
  57. Omar Cusma Fait on

    I'm using Colab:
    ----> 6 for img in os.listdir(path):
    FileNotFoundError: [Errno 2] No such file or directory: 'C:/Datasets/PetImages/Dog'

    Reply
  58. Maor Cohen on

    When I run the function create_training_data() I get this error: "Corrupt JPEG data: 128 extraneous bytes before marker 0xd9". How do I fix this?

    Reply
  59. Aniket Patil on

    How should I augment my data? I am doing cancer prediction and I have 50 images; I want to turn them into 200. How should I do that? Flipping, rotating, etc.?
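
    A rough augmentation sketch with Keras's ImageDataGenerator, assuming images is a (num_images, height, width, channels) array; the transforms and counts are placeholders, and which transforms are safe for medical images is worth checking:

    import numpy as np
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    datagen = ImageDataGenerator(
        rotation_range=15,        # random rotations up to 15 degrees
        horizontal_flip=True,     # random left-right flips
        vertical_flip=True,
        width_shift_range=0.1,    # small random shifts
        height_shift_range=0.1,
    )

    augmented = []
    for batch in datagen.flow(images, batch_size=len(images), shuffle=False):
        augmented.append(batch)           # one randomly transformed copy of every image
        if len(augmented) >= 3:           # 3 copies + the originals is roughly 200 images
            break
    augmented = np.concatenate([images] + augmented)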

    Reply
  60. Koutini Marwan on

    When I try to execute the code
    I get this error: NameError: name 'create_training_data' is not defined
    Does anyone know how to solve this problem?

    Reply
  61. David G on

    How would I convert this to using layers and then giving me a list of the top 5 predictions, if I add 10 categories, for instance defining a kitchen and a bathroom and what's in them, like a plastic bottle, glass, cup, person, food, that kind of thing?

    Reply
  62. Graham Sahagian on

    I like your videos, but this tutorial series, or at least the first few parts, seems to be taken directly from François Chollet's book on deep learning with Python. His first example is the MNIST dataset, then he goes into more depth with the cat & dog dataset…. I'm not sure if it's just a coincidence, but if you did use his book as a guide then you should at least cite him as a reference. Just saying.

    Reply
  63. Kacem ICHAKDI on

    Hi sir, I hope you are doing well. I just have a problem with 'except Exception' but I don't know why:

    File “C:/Users/kacem/Desktop/deep learnig/cat-dog.py”, line 28
    except Exception as e:
    ^
    IndentationError: unindent does not match any outer indentation level

    Reply
