Graphing/visualization – Data Analysis with Python and Pandas p.2

Doing some basic visualizations with our Pandas dataframe in Python with Matplotlib.

Text-based tutorial: https://pythonprogramming.net/graph-visualization-python3-pandas-data-analysis/

63 comments

  1. markobe08 on

    Coding O.G. strikes again! Today at work I liked the video from my other account (savage). I’m joking. Enough respect!! I know that you already covered machine learning, but is there any possibility of going in that direction after data analysis?

    Reply
  2. atrumluminarium on

    “pandas at the end of the day,
    Can be a multidimensional array,
    And how might that happen, by the way”

    Damn… Sentdex dropping those lit rhymes

    Reply
  3. Milan Lora on

    I didn’t really understand the “organic” part. What would’ve happened if you chose the other object? Are there multiple rows with the same date now?

    Reply
  4. Skull3r1121990 on

    I just wanted to thank you for your tutorials, especially the “older” finance and visualisation ones. They helped me a lot in creating my own little stock analysis program for my exam.
    I also recommended your channel to my professor 😀
    I’d be happy if you could do a new finance and stock analysis tutorial, maybe without machine learning, more like the Tkinter tutorial.

    Reply
  5. Vinayak Gosale on

    Quick tip: if you put a semicolon at the end of your .plot(..) functions, you will no longer see the matplotlib object printed in the output area of Jupyter just above the figures. Hope this is useful.

    Reply
  6. Semih Öztürk on

    How can I become as expert as you in Python? lol
    You are too good, congrats.
    Do you have any suggestions on how to study to become a data scientist/machine learning engineer?
    Thank you.

    Reply
  7. heavy pump on

    Please talk about underflow, overflow, and numerical methods: floating-point precision, how a machine stores a number as significand × base^exponent, and how to solve nonlinear equations with Newton’s method. But all of this with Python.

    Reply
  8. Yaqiong Li on

    Thank you for doing this!! I have a question here though. Your moving/rolling average is the average of the first 25 points, so how you sort your data actually matters. If the previous records/rows have different regions, then plotting the rolling mean for different regions doesn’t make sense to me. Can someone explain?
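
One way to look at this question, as a hedged sketch: going by the other comments here, the rolling mean is taken on a frame already filtered to one region (`albany_df`), so rows from other regions never enter the window. Row order still matters, which is why the sketch sorts by date first. The data and the `price2ma` column name are made up, and the window is 2 only because the sample is tiny.

```python
import pandas as pd

# Made-up rows: two regions interleaved and out of date order
df = pd.DataFrame({
    "Date": pd.to_datetime(["2015-01-18", "2015-01-04", "2015-01-11",
                            "2015-01-04", "2015-01-18", "2015-01-11"]),
    "region": ["Albany", "Albany", "Boston", "Boston", "Boston", "Albany"],
    "AveragePrice": [1.20, 1.00, 2.10, 2.00, 2.20, 1.10],
})

# Keep one region, then put its rows in chronological order
albany = df[df["region"] == "Albany"].copy()
albany = albany.set_index("Date").sort_index()

# Each value is the mean of the current row and the one before it,
# all from the same region and in date order.
albany["price2ma"] = albany["AveragePrice"].rolling(2).mean()
print(albany[["AveragePrice", "price2ma"]])
```

Running `rolling` on the unfiltered, unsorted frame would mix regions and dates in each window, which is the situation the question describes.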

    Reply
  9. amr nashawaty on

    really thank you for your time. If I had a way to support you I would, but unfortunately I am in Syria
    pray for us bro 😉

    Reply
  10. Vinícius Zanchini on

    When I execute the initial code up to albany_df["AveragePrice"].plot(), my graph doesn’t display in the cell, just this:

    Do I need to import matplotlib?

    Update: I used %matplotlib inline
    and it worked.

    Reply
  11. Ilyas k on

    Hey there! I have a problem running a simple program on the Raspberry Pi, and I hope you can help me with it. I’m new to programming, so excuse me if you find it too easy.

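    from time import sleep
    import RPi.GPIO as GPIO

    # Pin numbers are kept as posted. Note that 17, 18, 20, 22, 23 and 26
    # look like BCM (GPIO) numbers rather than physical BOARD pin numbers.
    GPIO.setmode(GPIO.BOARD)

    button1 = 17
    button2 = 18
    button3 = 26
    button4 = 20

    ledpin1 = 22
    ledpin2 = 23

    GPIO.setup(ledpin1, GPIO.OUT)
    GPIO.setup(ledpin2, GPIO.OUT)
    GPIO.setup(button1, GPIO.IN, pull_up_down=GPIO.PUD_UP)
    GPIO.setup(button2, GPIO.IN, pull_up_down=GPIO.PUD_UP)
    GPIO.setup(button3, GPIO.IN, pull_up_down=GPIO.PUD_UP)
    GPIO.setup(button4, GPIO.IN, pull_up_down=GPIO.PUD_UP)

    try:
        while True:
            # With PUD_UP and a button wired to ground, the input reads 1
            # while the button is NOT pressed; compare against 0 in that case.
            if GPIO.input(button1) == 1:
                GPIO.output(ledpin1, True)
                GPIO.output(ledpin2, True)
                print('button1 was pressed')

            elif GPIO.input(button2) == 1:
                GPIO.output(ledpin1, True)
                GPIO.output(ledpin2, False)
                print('button2 was pressed')

            elif GPIO.input(button3) == 1:
                GPIO.output(ledpin1, False)
                GPIO.output(ledpin2, True)
                print('button3 was pressed')

            # The posted code had "else:" followed by a bare comparison;
            # an elif is presumably what was intended.
            elif GPIO.input(button4) == 1:
                GPIO.output(ledpin1, False)
                GPIO.output(ledpin2, False)
                print('button4 was pressed')
    except KeyboardInterrupt:
        # Release the pins when the script is stopped with Ctrl+C
        GPIO.cleanup()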

    Reply
  12. Malko Gindrat on

    It’s amazing how you can pack in so many things and still remain clear. Thanks to you I’ve really started using pandas, Kaggle, and Jupyter!

    Reply
  13. clockwerkz on

    I’m a little unclear on the warning: so is it because the new dataset we’re creating, albany_df, is referencing the original df dataset and any changes to df will affect albany_df’s data? That’s why we make a copy of df so that it breaks the reference dependency?

    Also, when I plot the graph of all of the 25mas I only get one line showing up. BaltimoreWashington is the only one displaying for me.

    Great video series, thank you for making a new version of working with pandas!
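
On the warning question, a minimal sketch under assumptions: in pandas versions that emit `SettingWithCopyWarning`, the issue is less that changes to `df` flow into `albany_df` and more that pandas cannot tell whether writing to the filtered frame should also write to `df`. An explicit `.copy()` removes that ambiguity. The data and the `doubled` column are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "region": ["Albany", "Boston", "Albany"],
    "AveragePrice": [1.00, 2.00, 1.20],
})

# .copy() makes albany_df an independent object, so later assignments
# are unambiguous and cannot be mistaken for writes into df.
albany_df = df[df["region"] == "Albany"].copy()

# Adding a column to the explicit copy leaves df untouched
albany_df["doubled"] = albany_df["AveragePrice"] * 2

print(list(df.columns))
print(list(albany_df.columns))
```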

    Reply
  14. The Pug Engineer on

    I was just looking for a tutorial to visualize reproduction of a Pug with the 349 known breeds. Seems that I have to wait for the continuation of that series, hopefully heading towards GANs soon!

    Reply
  15. Damian Shaw on

    When you use set_index you can pass verify_integrity=True, which checks that the index is unique. That is generally what you want from an index, and it also means join won’t produce many-to-many matches that multiply your dataframe size (and explode RAM).
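
A short sketch of this tip, with made-up data: `set_index(..., verify_integrity=True)` raises a `ValueError` when the new index would contain duplicates. Dropping duplicate dates first is just one possible way to get past the check, not necessarily what the video does.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2015-01-04", "2015-01-04", "2015-01-11"],
    "AveragePrice": [1.00, 1.05, 1.10],
})

# Two rows share a date, so the integrity check rejects this index
try:
    df.set_index("Date", verify_integrity=True)
    duplicate_found = False
except ValueError as err:
    duplicate_found = True
    print(f"set_index refused: {err}")

# After keeping the first row per date, the same call succeeds
unique = df.drop_duplicates(subset="Date").set_index("Date", verify_integrity=True)
print(unique)
```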

    Reply
  16. Vinícius Zanchini on

    I have a Kaggle dataset that contains 1M+ tweets, and I would like to create a model that classifies unseen tweets as DEPRESSIVE or not, based on 500 tweets that I label as DEPRESSIVE myself.
    I’ll label them according to my intuition about depression symptoms conveyed through the words of the tweet. I know that this guesswork will limit my model’s accuracy, because I can’t be certain whether a tweet’s author is depressive or not.
    Does anybody know any YouTube videos, lectures, articles, or books about NLP that would guide me through this problem and clear up my mind a little bit?
    Thanks.

    Reply
  17. Nahid Muzammil on

    Hey, thanks for these videos.

    So, you said working on Pandas is faster because it’s running on C++. And I get that, but we’re using Pandas with Python – so how do I make sense of that? Is the Python code being translated to C++ and then compiled on my computer?

    Reply
  18. Jan Kowalski on

    Can somebody explain to me what the rolling value exactly is? Is it a period of time, and if so, which: month or day? Does this function calculate an average? I don’t understand this well but need to know it.
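
A hedged sketch that may help here: with an integer argument, `rolling(n)` counts rows, not days or months, and `.mean()` is what turns each window into an average. How much time a window covers then depends on how far apart the rows are. A time-based window is also possible when the index holds dates. The numbers below are made up.

```python
import pandas as pd

# Integer window: each result is the mean of the current row and the
# two rows before it; the first two rows lack enough history.
s = pd.Series([1, 2, 3, 4, 5])
by_rows = s.rolling(3).mean()
print(by_rows.tolist())  # [nan, nan, 2.0, 3.0, 4.0]

# Offset window: "14D" averages whatever rows fall in the 14 days
# up to and including each date, however many rows that is.
weekly = pd.Series(
    [1.0, 2.0, 3.0, 4.0],
    index=pd.to_datetime(["2015-01-04", "2015-01-11", "2015-01-18", "2015-01-25"]),
)
by_time = weekly.rolling("14D").mean()
print(by_time.tolist())
```

So `rolling(25)` on one row per week would span roughly 25 weeks, while on daily rows it would span 25 days.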

    Reply
  19. Francis Boey on

    Would you define what ‘PLU’ is? I don’t understand how it relates to the column header ‘Type’ and why it takes up more RAM. Please advise. Many thanks.

    Reply
  20. Justin Siegel on

    I love when you stop to take a sip from your always interesting coffee mugs! Also your videos are really helpful. Thank you!

    Reply
  21. G T on

    Hey Sentdex, I forgot to put in the limit [:16], and I think it blows up the memory because you are basically loading a full outer join into memory. Because of the repeated dates you keep computing combinations of rows with the same dates, and these keep growing, thus *BUUUM*, I think.
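
A tiny illustration of the effect described here, using made-up frames: when both sides of a `join` repeat the same index value, every left row pairs with every matching right row, so row counts multiply.

```python
import pandas as pd

# Both frames carry the same date three times
dates = ["2015-01-04"] * 3
left = pd.DataFrame({"Albany": [1.0, 1.1, 1.2]}, index=dates)
right = pd.DataFrame({"Boston": [2.0, 2.1, 2.2]}, index=dates)

# 3 left rows x 3 matching right rows = 9 rows after the join
joined = left.join(right)
print(len(left), len(right), len(joined))  # 3 3 9

# With duplicate index entries removed first, the join stays small
left_unique = left[~left.index.duplicated()]
right_unique = right[~right.index.duplicated()]
print(len(left_unique.join(right_unique)))  # 1
```

Repeating the join for each additional region multiplies the count again, which is consistent with memory running out after a handful of regions.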

    Reply
  22. MrOberschlumpf on

    Hi,
    first I want to say great video! I am new to all this and have had some problems:
    f-strings are not available for me. That’s because they were introduced in Python 3.6. No problem now…
    But now it says: AttributeError: ‘Series’ object has no attribute ‘join’
    and I don’t know why I don’t have this. Can someone help? I got everything so far and am approximately 20 minutes into the video.
    Sorry for my bad English.
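
One possible cause, sketched under the assumption that the object being joined was selected with single brackets: `df["col"]` returns a Series, which has no `join` method, while `df[["col"]]` returns a DataFrame, which does. The frames below are made up.

```python
import pandas as pd

df = pd.DataFrame(
    {"AveragePrice": [1.0, 1.1], "region": ["Albany", "Albany"]},
    index=pd.Index(["2015-01-04", "2015-01-11"], name="Date"),
)
other = pd.DataFrame({"Boston": [2.0, 2.1]}, index=df.index)

as_series = df["AveragePrice"]     # single brackets -> Series
as_frame = df[["AveragePrice"]]    # double brackets -> DataFrame

print(hasattr(as_series, "join"))  # False
print(hasattr(as_frame, "join"))   # True

# A Series can be turned into a one-column DataFrame before joining
joined = as_series.to_frame().join(other)
print(joined)
```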

    Reply
  23. Ayanava Dasgupta on

    At the time of plotting the graph
    graph_df.plot()
    I am getting only a single curve… Not that crowded one with so many legends shown in the tutorial… Why?

    Reply
  24. cartmankiller2 on

    Maybe one could have done this as well with the groupby operation with something like:

    _ = df.set_index("Date").groupby("region").rolling(25)["AveragePrice"].mean().groupby("region").plot(legend=False, figsize=(8,5))

    But – my Graph looks slightly different compared to yours. Not sure why.

    Reply
