Loading in your own data – Deep Learning basics with Python, TensorFlow and Keras p.2

Welcome to a tutorial where we’ll be discussing how to load in our own outside datasets, which comes with all sorts of challenges!

First, we need a dataset. Let’s grab the Dogs vs Cats dataset from Microsoft: https://www.microsoft.com/en-us/download/confirmation.aspx?id=54765

Text tutorials and sample code: https://pythonprogramming.net/loading-custom-data-deep-learning-python-tensorflow-keras/



  1. Daniel on

    The first video was great, looking forward to watching this one through as well. Can you make a video about using CPU vs GPU for some of these training processes? I would like to learn more about forcing the script to use the GPU for running instead of the CPU. For instance, some of your older videos (like the Monte Carlo Simulation series) could benefit from this. Thanks!

  2. Panchsheel on

    One more question: I have an Nvidia GT 710 graphics card and 4 GB of RAM in my PC. Can I install TensorFlow-GPU on Windows 10?
    So far I can’t install it.

  3. Harsath Mark Zuckerberg on

    These were the videos that I requested. Please make more project videos on machine learning and deep learning, and real-world machine learning projects in Python, because you are the best to learn from.

  5. p95humbucker on

    amazing videos, great AI tutorials, honestly one of the best programming channels on YouTube. thank you for making these videos

  6. Sam Witteveen on

    Cool Video as always, but as of TF 1.9 you can use tf.data with Keras to do what you did in here and it will make a much more efficient pipeline for training larger datasets. This will also work for converting to tf.records if you want to change the format. This becomes important when using fast GPUs/TPUs as they no longer are the bottleneck and loading of data into the model is the bottleneck.

  7. Harsath Mark Zuckerberg on

    Also, make videos on data cleaning and converting categorical data. Don’t always work on larger datasets; also make machine learning videos on smaller datasets, because those types of datasets come up in the real world. That’s my piece of advice. Keep making videos, and we love you always.

  9. Thor Odinson on

    Can you also show us the steps to creating your own neural network so we will know how to create our own for other things?

  10. slavenya001 on

    Great tutorial!!!
    Could you please add something for highly imbalanced data set? For example, one from eCommerce when people are not buying 95-96% of the time.
    Could you please also cover a session based sequence prediction? Like many users with many sessions…

  11. Matthew Grotheer on

    I’m not sure which ends up being better….the videos or the random (read: dope) coffee mugs you keep pulling out in them 😉

  12. Cineva in comentarii on

    Please update the video where you teach us how to use the Object Detection API made by TensorFlow. Hours of googling can’t make me fix protobuf, and a lot of people in the comments have problems too.

    i really like that you remake the neural networks videos , cuz old ones can be harder to understand

  13. Seth Adams on

    What is your opinion on setting an aspect ratio and adding padding during resizing? I just feel like forcing an n x n dimension distorts images too much when we have the varied original resolutions.
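
One way to try the padding idea from this question, as a rough sketch only: pad the shorter side so the image becomes square before it is resized, which avoids stretching. The helper name `pad_to_square` and the black (zero) fill are illustrative assumptions, not something shown in the video.

```python
import numpy as np


def pad_to_square(image, fill=0):
    """Pad a (H, W) or (H, W, C) array with `fill` so that H == W.

    Hypothetical helper: the original pixels are centred and left unscaled.
    """
    height, width = image.shape[:2]
    size = max(height, width)
    pad_top = (size - height) // 2
    pad_left = (size - width) // 2
    pad_spec = [(pad_top, size - height - pad_top),
                (pad_left, size - width - pad_left)]
    # Leave any trailing channel axis untouched.
    pad_spec += [(0, 0)] * (image.ndim - 2)
    return np.pad(image, pad_spec, mode="constant", constant_values=fill)


wide = np.ones((2, 6), dtype=np.uint8)  # stand-in for a wide grayscale image
square = pad_to_square(wide)
print(square.shape)
```

The square result could then be passed to cv2.resize(square, (IMG_SIZE, IMG_SIZE)) so that the subject keeps its proportions.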

  14. ROY on

    Great tutorial video, sir.
    Please make a video on colour image data preparation.
    I prepared the data for colour images by just removing the grayscale conversion line and using 3 instead of 1 in the reshape,
    but my final image shows up in blue.

  15. Aasrith Chennapragada on

    Couldn’t you use matplotlib’s imread function? (Well cv2’s does the same thing but one less import line ✌🏻)

  16. Tyler K on

    Great tutorial as always. Very easy to listen to and follow. I was wondering, are you planning to cover things like TFRecords for handling very large datasets sometime in the future? There are other tutorials, but I think the topic would really benefit from your style.

  17. Tozzzer on

    Love these vids. I keep getting an error to do with input size that really fools me all the time. eg when you feed data through the model and it says something like: ‘input_1 needs 3 arguments, but 2 given: (6,2)’ 🙁

  18. Fatih on

    Hey sentdex, please keep up with your videos. They are really helpful in so many ways. I’m just starting to get into ML, and I started studying computer science just because of ML, and your videos are so helpful. Thumbs up to you.

  19. عبدالرحمن العيسى on

    Great tutorial as always. I’m in love with this channel and I learn a lot. I’m trying to build OCR to identify low-resolution documents. Do you recommend any source to help me out on this? I wish some day you would create a video about this. Regards

  20. Niclas Wüstenbecker on

    Great tutorial, but the way you load the data is not very memory efficient and this will cause problems with large datasets. First the training_data list is written into RAM and afterwards the same amount of memory is reserved when converting into a numpy array. So this approach is only good for datasets < RAM size/2. Another option would be to create the numpy array at the beginning using np.empty and then write the data as entries into the array. This way the dataset can be as large as your RAM. If the dataset is larger than the RAM size it is suggested to use a generator that loads and yields the data during training. This way your dataset can be as large as your SSD, but training speed is most likely limited by the read speed of the drive. Just something I had to deal with during my thesis in the last couple of months. Maybe you could make a tutorial on the generator one, not a lot of people know about this. Anyways, keep up the good work!
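
The two alternatives described in this comment could look roughly like the sketch below: one function fills a preallocated np.empty array in place, and one generator yields a batch at a time. `load_image` is a hypothetical stand-in that fabricates pixel data so the sketch runs without any files; in practice it would wrap the cv2.imread and cv2.resize calls from the tutorial.

```python
import numpy as np

IMG_SIZE = 50


def load_image(index):
    """Hypothetical stand-in for reading and resizing one image from disk."""
    return np.full((IMG_SIZE, IMG_SIZE), index % 256, dtype=np.uint8)


def load_preallocated(n_images):
    """Fill one preallocated array instead of building a list and copying it."""
    X = np.empty((n_images, IMG_SIZE, IMG_SIZE, 1), dtype=np.uint8)
    for i in range(n_images):
        X[i, :, :, 0] = load_image(i)
    return X


def batch_generator(n_images, batch_size):
    """Yield one batch at a time so only `batch_size` images sit in memory."""
    for start in range(0, n_images, batch_size):
        stop = min(start + batch_size, n_images)
        batch = np.empty((stop - start, IMG_SIZE, IMG_SIZE, 1), dtype=np.uint8)
        for offset, index in enumerate(range(start, stop)):
            batch[offset, :, :, 0] = load_image(index)
        yield batch


X = load_preallocated(10)
batches = list(batch_generator(10, batch_size=4))
print(X.shape, [b.shape[0] for b in batches])
```

The preallocated version needs the number of usable images up front, so unreadable files would have to be counted or skipped before the array is sized.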

  21. Yoni Fihrer on

    I suggest using context managers for file opening. It is cleaner and better for beginners, as you don’t have to remember to close the file.
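
A minimal sketch of that suggestion, assuming the file being written is a pickle of the features. The file name X.pickle, the temporary directory and the toy data are all illustrative.

```python
import os
import pickle
import tempfile

X = [[0, 1], [2, 3]]  # toy stand-in for the feature list

# Hypothetical location; a temporary directory keeps the sketch self-contained.
path = os.path.join(tempfile.mkdtemp(), "X.pickle")

# The file is closed when the block exits, even if pickle.dump raises.
with open(path, "wb") as pickle_out:
    pickle.dump(X, pickle_out)

with open(path, "rb") as pickle_in:
    restored = pickle.load(pickle_in)

print(restored == X)
```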

  22. Banama on

    are you typing with 10 fingers ? all the symbols that programming require kinda complicates things, I am still trying to figure out best position to type code faster…

  23. Yunusa Muhammed on

    I did everything right, but when I get to print(len(training_data)), after waiting for the execution, it shows zero, and when I print a sample it doesn’t show anything.

  24. DemonSlayer627 on

    If you’re using Keras you should use the flow_from_directory function. It’s really the same thing without the hassle of running out of memory trying to load the entire dataset.

  25. YumekuiNeru on

    so these datasets are pretty small
    how do you divide the dataset into batches of some sort if your dataset is too large to fit in memory at once?
    is this what like hdf5 is for?

  26. vikas mishra on

    Hey sentdex, can you please make video on training our own audio dataset using neural networks.
    love all your videos from india.

  27. Suleiman Mustafa on

    Thanks, I have been looking forward to this tutorial. It will help with my thesis.
    For windows, if you have anaconda installed and cannot find module cv2, you may simply have to do:

    pip install opencv-python

    if you are on linux you can do :

    pip install opencv-python

  28. RickertBrandsen on

    These vids always cheer me up 🙂 You are by far my most favourite instructor. 🙂 When I feel depressed i just watch your videos.

  29. Srijal shrestha on

    i got my own datasets of truck and mobile phone, but when i used your code, it has this error “OSError: [Errno 2] No such file or directory: ‘//tmp/images” what should i do?

  30. Shubham Paul on

    I honestly hate that ‘blue’ dog. OpenCV follows BGR whereas matplotlib RGB, I trust.

    One would like to…

    image = cv2.imread('your_image.jpg')
    x, y, z = cv2.split(image)
    image = cv2.merge([z, y, x])
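
The same channel swap can also be written as a NumPy slice, sketched here on a made-up 2x2 array so it runs without an image file. OpenCV offers cv2.cvtColor(image, cv2.COLOR_BGR2RGB) for the same purpose.

```python
import numpy as np

# Stand-in for the array cv2.imread returns: shape (height, width, 3), BGR order.
bgr = np.zeros((2, 2, 3), dtype=np.uint8)
bgr[..., 0] = 255  # pure blue in BGR

# Reversing the last axis swaps the first and third channels.
rgb = bgr[..., ::-1]
print(rgb[0, 0])  # [  0   0 255]
```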

  31. pratyush pradhan on

    At 3:33 you converted the data to grayscale, and at 14:45 you said you put the 1 because it’s grayscale. If the data is already grayscale, why do you have to put that 1? Could you explain? I am kind of confused.

  32. Simon Moore on

    The latest version of opencv-python wouldn’t work for me inside of a docker container. It has ‘qt’ as a dependency. You can get around it by installing version instead with “pip install opencv-contrib-python==”

  33. Quang Huy Ngô on

    Hi Harrison.
    You’ve been doing an absolutely amazing series of videos on implementing deep learning with Python, TensorFlow, Keras, etc.
    This is the most useful job you’ve ever done. I’ve learned the Machine learning, Deep learning theory easily but implementation and application is something difficult to me. Keep doing this please.

  34. Reza Hosseini on

    I converted my x to an np.array without reshaping it as you did, and I got the exact same shape as you did! So I guess you no longer need to reshape it to (-1,50,50,1). Please tell me if I’ve done something wrong.

  35. Mohmed Hussein on

    Great tutorial, but if the images have multiple labels, is the way of loading the data the same as for binary classification, or is it different for multi-label classification?

  36. Mad Muffin on

    I’m not very well versed in all this computer stuff. Mainly doing it for fun.
    Is there a way to avoid downloading all the images? Or do I really have to get this almost a GB of cats and dogs on my PC?

  37. effe rossi on

    How can I do a multi label classification, for example cats, dogs and number of cats or dogs inside the images, their colors etc?

  38. Doug P on

    Thanks for the awesome videos! I’ve been following along in Kaggle. If anyone wants quick access to this, I’ve uploaded as a public dataset here:


    Obviously, I had to make some tweaks to make the data load into Kaggle rather than my Python IDE. Admittedly, I didn’t get the exact same results and am still messing around with the code to figure out what went wrong (for example, the length of my training_data was 24946, +30, after running the create_training_data function). If you see what I did wrong or have any suggestions, please let me know!


  39. ElChe-Ko on

    For those interested, I believe I reached the same output X, y by creating them directly inside the first for loop when you load the data. Here is the code if you wanna test it:

    import os

    import cv2
    import numpy as np

    # Define directory where data is
    DATADIR = './Data/kagglecatsanddogs_3367a/PetImages'

    # Define categories
    CATEGORIES = ["Dog", "Cat"]

    # Load data
    IMG_SIZE = 120
    X_train, y_train = [], []  # <------------------ NEW PART!!!!
    for category in CATEGORIES:
        path = os.path.join(DATADIR, category)  # path to dogs or cats dir
        label = CATEGORIES.index(category)  # define label as 0 (Dog) or 1 (Cat)
        for img in os.listdir(path):
            try:
                img_array = cv2.imread(os.path.join(path, img), cv2.IMREAD_COLOR)
                # resize image as IMG_SIZE X IMG_SIZE pixels X (1 for grayscale or 3 for color)
                img_array_resized = cv2.resize(img_array, (IMG_SIZE, IMG_SIZE))
                X_train.append(img_array_resized)  # <------------------ NEW PART!!!!
                y_train.append(label)  # <------------------ NEW PART!!!!
                #plt.imshow(img_array_resized, cmap='gray')
                #plt.show()
            except Exception as e:
                pass  # skip files that cannot be read or resized

    # Convert into array
    X_train = np.asarray(X_train)  # <------------------ NEW PART!!!!
    y_train = np.asarray(y_train)  # <------------------ NEW PART!!!!

    # Shuffle data to avoid bias; one shared permutation keeps images and labels aligned
    order = np.random.permutation(len(X_train))
    X_train, y_train = X_train[order], y_train[order]

  40. Eivind Strømsvåg on

    it seems that i can’t find the directory to load the file… any help? i use mac

    [Errno 2] No such file or directory: ‘X:/Datasets/PetImages/Dog’

  41. thrivikram reddy on

    from IPython.core.interactiveshell import InteractiveShell

    InteractiveShell.ast_node_interactivity = "all"

    Does this help?

  42. jan biel on

    The value of these videos is fucking incredible. After some setup with anaconda to get tensorflow and python 3.6 to work in pycharm, i was able to reproduce all of this with my own data. Your explanations are absolutely on point and i have no questions left after this part.

  43. Pra Yogiz on

    I enjoyed this so much. I have a CS degree, but I did not have a good time with programming, so I decided to get a job that wasn’t programming. But since I tried to learn about pyautogui and selenium from your videos, I got so excited to learn ML, and now here I am, following your Keras tutorial 😀

  44. Kin Fai on

    I don’t understand the reshape parameters for converting X from list to numpy array. -1 is a catch all is fine, but IMG_SIZE, IMG_SIZE and especially 1 for grey scale doesn’t really make sense to me. anyone can explain it in more details please?
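
A small sketch of what that reshape does, using three made-up blank images in place of the real list. Stacking 2-D grayscale images gives a 3-D array, while Keras convolution layers expect a fourth axis for the channels, which is what the trailing 1 adds.

```python
import numpy as np

IMG_SIZE = 50

# Three fake grayscale images, each a 2-D array, standing in for the list X.
X = [np.zeros((IMG_SIZE, IMG_SIZE), dtype=np.uint8) for _ in range(3)]

stacked = np.array(X)
print(stacked.shape)   # (3, 50, 50)

# -1 lets NumPy infer the number of images; the trailing 1 is the channel axis.
reshaped = stacked.reshape(-1, IMG_SIZE, IMG_SIZE, 1)
print(reshaped.shape)  # (3, 50, 50, 1)
```

For colour images loaded with three channels, the stacked array already has shape (N, IMG_SIZE, IMG_SIZE, 3), so the last value would be 3 rather than 1.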

  45. tic tac on

    Has anyone tried with another random dataset? I’m facing an error while reshaping: cannot reshape array of size 72 into shape (28,28,1)

  46. Kinshuk Das on

    After importing matplotlib.pyplot you can write %matplotlib inline, then you don’t have to write plt.show()…

    What if I have images of more than two classes, let’s say dog, cat, bird? How can I label them? Should I take the index, or do I have to one-hot encode them?

  47. HuntHoot on

    I feel very dumb. I thought I had a grasp on this stuff in your last video, but this one just breezes through a lot of stuff that I didn’t understand. Not your fault, definitely my own, but it’s super discouraging that I’m apparently the only one here who doesn’t understand most of what I just wrote down. I’ve been programming in python for about a year, guess I need more experience still.

  48. Skyscraper on

    Sir can you make a couple of videos on emotion recognition with CNN’s, I tried haar cascade and then switched to CNN’s and still i am getting <50% accuracy on kaggle dataset fer2013 , I want this to work on realtime video feed but 50% accuracy is no good because all it shows is happy/neutral every time i feel all the work i have put in is of no use.If you could spare some time these videos could be of great help for others as well :-).

  49. Hassan Shaikh on

    Facing a Unicode error like: DATADIR = "C:\Users\Junaid\Desktop\rose vs sunflower"
    SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape

    Anybody help.
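
One likely cause, assuming the path in the question was typed with backslashes: inside a normal string literal, \U starts a Unicode escape. A raw string or forward slashes avoids that, as in this sketch (the path is the one from the question).

```python
# A raw string keeps the backslashes literal, so \U is not read as an escape.
DATADIR = r"C:\Users\Junaid\Desktop\rose vs sunflower"

# Forward slashes are another option that Windows accepts.
DATADIR_ALT = "C:/Users/Junaid/Desktop/rose vs sunflower"

print(DATADIR)
```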

  50. Gt Cline on

    Another amazing set of tutorials. You truly are helping me understand Python and Deep Learning at a whole different level. Thank you for your time and expertise, Sentdex.

  51. Kim JinYoung on

    great video 🙂 , thank you

    i have one Question.

    Why is the result on my computer different from yours?

    At print(len(training_data)) my result is 0, but your result is 24916,

    even though the code is the same.

  52. ShayCreations : on

    The training data does not give me a length. It just shows me corrupt image errors, and then when I print the length it says 0.

  53. Sagar Khuteta on

    new_array = cv2.resize(img_array, (50, 50))
    cv2.error: OpenCV(3.4.2) C:\projects\opencv-python\opencv\modules\imgproc\src\resize.cpp:4044: error: (-215:Assertion failed) !ssize.empty() in function 'cv::resize'

    Can anyone help me with this error..

  54. Hameed Hm on

    Great Video @Sentdex on CNN, this will really help all new comers who want to learn DL. I was trying to replicate your code but I am getting the same error what you got about X var, X.append (features) :
    AttributeError: ‘numpy.ndarray’ object has no attribute ‘append’
    Could you help how to resolve this error? Thanks

  55. Maximillian Fam on

    May I know if there are any updates in TF and Keras on the reshaping of the image array “features”?
    I am not sure, but all I could find is the tf.reshape() function, which does the same thing as NumPy’s reshape.

  56. vishnu p.v on

    Best video from a pro. I loved it, and it helped me a lot to get the basic idea. Please add a tutorial on extracting frames from 100 videos stored in different folders. I expect a positive reply from you, pro.

  57. Omar Cusma Fait on

    I’m using CoLab
    ----> 6 for img in os.listdir(path):
    FileNotFoundError: [Errno 2] No such file or directory: ‘C:/Datasets/PetImages/Dog’

  58. Maor Cohen on

    when I run the function “create_training_data()” I get this error “Corrupt JPEG data: 128 extraneous bytes before marker 0xd9”, how do I fix this.

  59. Aniket Patil on

    How should I augment my data? I am doing cancer prediction and I have 50 images I want to make them to 200 how should I do that? Like flipping rotating etc

  60. Koutini Marwan on

    When I try to execute the code,
    I get this error: NameError: name ‘create_training_data’ is not defined
    Does anyone know how to solve this problem?

  61. David G on

    how would i convert this to using layers then giving me a list of the top 5 predictions if i add 10 categories and for instance defining a kitchen and a bathroom and whats in them like plastic bottle , glass, cup, person, food kind of thing ?

  62. Graham Sahagian on

    I like your videos, but this tutorial series, or at least the first few, seems to be taken directly from Francois Chollet’s book on Deep Learning with Python. His first example is the MNIST data set, then he goes into more depth with the cat & dog data set. I’m not sure if it’s just a coincidence, but if you did use his book as a guide then you should at least cite him as a reference. Just saying.

  63. Kacem ICHAKDI on

    Hi sir, I hope that u are fine. I just have a problem in ‘except Exception’ but I don’t know why

    File “C:/Users/kacem/Desktop/deep learnig/cat-dog.py”, line 28
    except Exception as e:
    IndentationError: unindent does not match any outer indentation level

  64. syed mustafa on

    hey, @sentdex I am little confused in some tutorials you use Ubuntu and in some windows but windows sucks. I Have seen your object detection tutorials but there you have used for first part windows and another part windows

  66. Nor Eddine Belhaoua on

    Hello, when i try to reshape X as you do i got this error can you help me with it
    ValueError: cannot reshape array of size 140000 into shape (50,50,3)


    X = np.array(X).reshape(-1, img_size, img_size, 1)

    NameError: name ‘X’ is not defined
    tell the solution

  68. Biranavan Pulendralingam on

    Hey I was following along with your tutorial, but at the:
    print(len(training_data)) i get 0

    and in my terminal it says images are corrupted? Any help, because i could not find anything helpful :/?

  69. federmontes on

    Hi. Love your videos. It is giving me an error on the image conversion to an array.

    in line 15 plt.imshow(img_array, cmap='gray')
    TypeError: Image data cannot be converted to float.

  70. Ali R. Memon on

    I would like to compare the performance of with and without resizing image size. Could someone refer me to the tutorial how to train network on variable size of images?

  71. Suyog Nepal on

    ----> 1 create_training_data()

    in create_training_data()
          4 for category in CATEGORIES:
          5     path = os.path.join(DATA, category)
    ----> 6     class_num = CATEGORIES.index(category)
          7     for img in os.listdir(path):

    AttributeError: ‘set’ object has no attribute ‘index’
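
This error suggests that CATEGORIES was written with curly braces, which creates a set. A set has no order and therefore no index method. A short sketch with a list instead, using the category names from the tutorial:

```python
# Square brackets make a list, which keeps its order and supports .index().
CATEGORIES = ["Dog", "Cat"]

class_num = CATEGORIES.index("Cat")
print(class_num)  # 1

# Curly braces would make a set instead, which has no .index() method.
as_set = {"Dog", "Cat"}
print(hasattr(as_set, "index"))  # False
```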


  72. Dhiraj Neupane on

    File “C:/Users/User/PycharmProjects/Myproject/TensorFlow-Tutorials-master/Loading Data.py”, line 41
    SyntaxError: invalid syntax

    Error Encountered:
    Please Help

  73. Sai Krishna on

    Qsn: image looks like this [[255 255 255 … 255 255 255]
    [255 255 255 … 255 255 255]
    [255 255 255 … 255 255 255]

    [255 255 255 … 255 255 255]
    [255 255 255 … 255 255 255]
    [255 255 255 … 255 255 255]]
    What does that mean ?

  74. lautaro dapin on

    Hi, I have been having problems loading the data using Jupyter Notebook:
    FileNotFoundError: [WinError 3] El sistema no puede encontrar la ruta especificada: ‘abcbelen\a’ (the system cannot find the specified path)
    I could not find out how to solve the issue.
    Other errors that I had were “permission denied”.

  75. mohamed touati on

    HI sentdex ! good work i like ur codes , how can i train my image dataset and do my classification of an image from a camera video ?

  76. Ahmed Abdelwahab on

    os.path.join won’t take a list as an argument, how did you get it to work? I’ve struggled with this for a really long time


    Hi sentdex, the dataset is not downloadable from the website. So please host it in your website or here. Thanks in advance.

  78. Raphael Nazirullah on

    I use Google Colab because of their GPU. Running NNs on my local machine takes a lot of time. But I don’t know how to load local data from my PC into Google Colab. Could you please make a short video on that? I haven’t been able to figure it out from Google searches.

  79. Jan Kuliga on

    I always get an error (FileNotFoundError: [Errno 2] No such file or directory: ‘C:/DATASETS/TEST/Cat’) on the line: for img in os.listdir(path)
    The folder exists, so why can Python not find it?

