Building Search Word Database – Data Analysis of Google Tracking Takeout p.1

In this series, we’re going to analyze some of the data that Google tracks about us, which we can download for ourselves through Google Takeout.

To begin, we’ll analyze our Google search queries to see if we can pick out trends and changes over time. First, we’ll build a database of these search terms; then we can graph them over time.
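
As a rough sketch of that plan, the snippet below stores every word of each query next to its timestamp in an SQLite table, then counts how often a word shows up per month. The `searches` sample data, the `words` table, and its column names are assumptions made for illustration; they are not taken from the video.

```python
import sqlite3
from datetime import datetime

# Hypothetical sample data: (timestamp, query) pairs standing in for parsed Takeout entries.
searches = [
    (datetime(2018, 1, 5, 9, 30), "python sqlite tutorial"),
    (datetime(2018, 1, 20, 14, 0), "python list comprehension"),
    (datetime(2018, 2, 3, 11, 15), "matplotlib bar chart"),
]

def build_word_db(rows, path=":memory:"):
    """Store one row per (timestamp, word) so counts can be grouped by time later."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS words (ts TEXT, word TEXT)")
    for when, query in rows:
        for word in query.lower().split():
            # "YYYY-MM-DD HH:MM:SS" text is a format SQLite's date functions understand.
            conn.execute(
                "INSERT INTO words VALUES (?, ?)",
                (when.isoformat(sep=" "), word),
            )
    conn.commit()
    return conn

def monthly_counts(conn, word):
    """Return [(YYYY-MM, count), ...] for a single word, oldest month first."""
    cur = conn.execute(
        "SELECT strftime('%Y-%m', ts) AS month, COUNT(*) "
        "FROM words WHERE word = ? GROUP BY month ORDER BY month",
        (word,),
    )
    return cur.fetchall()

conn = build_word_db(searches)
print(monthly_counts(conn, "python"))  # [('2018-01', 2)]
```

The per-month rows returned by `monthly_counts` are the kind of series that could then be handed to a plotting library to graph a word over time.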

32 comments

  1. Celivalg on

    Have you tried using Jupyter Notebook? It’s an interactive environment for Python that’s designed for script-style work.
    It’s really, really nice when you want to script stuff quickly.

  2. space i s water on

    Great video really interesting what you are doing with google’s data, looking forward to part II.

  3. scott harwood on

    Imma be that bro, this is so ugly 😛 you could have used ElementTree to parse the XML 😛 Love this though, you are a brave person for doing this on a personal data set. It’s always nice to see that a dev still has to google common things.

  4. dddk097777 on

    You made me curious. I’m gonna try to see my top searched words. I have a feeling that I may see python at the top.

  5. Sahil Shukla on

    I am not ashamed of my search history, it’s 90% “how to {task} in Python” and 10% of mismatched lyrics trying to find a song from years ago.

  6. wi1 on

    you can get nltk’s official stopwords by doing from nltk.corpus import stopwords

    edit: you have to do nltk.download('stopwords') first

    the set if anyone wants it: {'y', "mustn't", 'o', "hadn't", 'that', 'ourselves', "weren't", 'can', 'then', 'it', 'these', 'theirs', 'more', 'before', 'has', 'we', "should've", "couldn't", "shan't", 'his', 'between', 'aren', 'whom', 'hadn', 'yourselves', 'll', 'isn', 'those', 'not', 'herself', 'couldn', 's', 'weren', "didn't", 'against', 'other', 'haven', "you'd", 'don', 'for', 'myself', 'both', 'again', 'here', 'in', 'he', 'below', 'him', 'over', 'should', 'themselves', 'doesn', 'up', 'same', 'the', 'into', "shouldn't", "that'll", 'after', 'hasn', 'she', 'which', 'had', "haven't", 'any', 'its', 'they', 'most', 'himself', 't', 'their', 'just', 'such', 'on', 'during', 'this', 'down', 'd', 'm', 'too', 'me', 'as', 'by', 'wasn', 'yours', 'who', 'have', 'am', 'hers', 'further', "won't", 'what', 'to', 'nor', 'some', 'you', 'if', "wouldn't", "it's", 'because', 'ma', 'out', 'shan', 'will', 'been', "aren't", "hasn't", 're', 'own', 'very', 'needn', 'our', 'when', "wasn't", 'her', 'through', "you'll", 'only', 'why', 'won', 'an', 'itself', 'there', 'be', 'do', 'were', 'are', 'above', 'now', 'ain', 'a', 'how', 'being', "she's", "doesn't", 'so', 'under', 'few', 'is', 'each', 'ours', 'once', 'at', "you're", 'mustn', 'having', "don't", 'of', 'no', 'about', 'from', 'your', 'off', 'with', 'or', 'my', 'until', 'where', 'mightn', 'yourself', 'them', 'while', 'all', 'and', 'didn', 'doing', 'i', 'than', 'but', 'was', 'did', 'wouldn', "isn't", 'shouldn', "mightn't", "needn't", "you've", 'does', 've'}

  7. Kyle Gunby on

    You can actually set it to download the data as a JSON file. When selecting which info to download, there is a format option.

  8. paschein on

    EDT is Eastern Daylight Time, the daylight saving time counterpart of EST.

    Any idea how to pass ffmpeg through tqdm? 🙂

  9. Jonathan Vusich on

    It probably would have been a good idea to use an HTML/XML parser such as lxml or Beautiful Soup for the first section of the video.

  10. fuuman5 on

    I downloaded my dataset and analyzed the google searches.
    My top 3 occurring words:
    2458 python
    1759 youtube
    1645 %CITY_I_LIVE_IN%

    Another awesome set of videos. Love your content bro. <3

  11. Josselin Marnat on

    Hi, so, why didn’t you use BeautifulSoup or at least regexes to parse this HTML file? I think you would have been able to write this much faster (and cleaner?).
    Great tutorial anyway!

  12. WhatMACHI on

    What blew my mind was typing cmd into the Windows Explorer path bar to open CMD in that directory O_O

  13. Elí Benson Vasquez Hdz on

    I love when you pick on your viewers (the ones that complain), LOL. Please keep making videos, man!!
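
Two of the comments above touch on the same step: dropping stop words before counting the most common search words. Here is a minimal sketch of that idea using only the standard library. The `queries` list is made-up sample data, and `STOP_WORDS` is a small hand-picked subset of the set quoted in the comments; with nltk installed, the full list should be available via `stopwords.words('english')` after the download step mentioned above.

```python
from collections import Counter

# Small subset of the stop-word set quoted in the comments (assumption: enough for a demo).
STOP_WORDS = {"how", "to", "in", "the", "a", "of", "from", "for"}

# Hypothetical search queries standing in for real Takeout data.
queries = [
    "how to read a file in python",
    "how to sort a list in python",
    "lyrics of the song from years ago",
]

def top_words(queries, stop_words, n=3):
    """Count every word that is not a stop word and return the n most common."""
    counts = Counter(
        word
        for query in queries
        for word in query.lower().split()
        if word not in stop_words
    )
    return counts.most_common(n)

print(top_words(queries, STOP_WORDS, n=1))  # [('python', 2)]
```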

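
Several commenters suggest a real parser (ElementTree, lxml, Beautiful Soup) instead of manual string handling. The sketch below shows the general idea with the standard library’s `html.parser`, swapped in here only to avoid third-party dependencies. The `SAMPLE` markup is hypothetical; the actual Takeout file may be laid out differently, so treat the tag choice (`<a>`) as an assumption.

```python
from html.parser import HTMLParser

# Hypothetical markup; the real Takeout export may be structured differently.
SAMPLE = (
    '<div>Searched for <a href="https://www.google.com/search?q=python+sqlite">'
    "python sqlite</a><br>Jan 5, 2018</div>"
    '<div>Searched for <a href="https://www.google.com/search?q=tqdm+ffmpeg">'
    "tqdm ffmpeg</a><br>Jan 6, 2018</div>"
)

class LinkTextParser(HTMLParser):
    """Collect the visible text of every <a> element."""

    def __init__(self):
        super().__init__()
        self.in_link = False
        self.queries = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.in_link = True
            self.queries.append("")  # start a new query string

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_link = False

    def handle_data(self, data):
        # Text can arrive in several chunks, so append to the current query.
        if self.in_link:
            self.queries[-1] += data

def extract_queries(html):
    """Return the stripped link text of each <a> element in document order."""
    parser = LinkTextParser()
    parser.feed(html)
    parser.close()
    return [q.strip() for q in parser.queries]

print(extract_queries(SAMPLE))  # ['python sqlite', 'tqdm ffmpeg']
```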
