Python Histogram Plotting: NumPy, Matplotlib, Pandas & Seaborn (Conclusion)

At this point, you’ve seen more than a handful of functions and methods to choose from for plotting a Python histogram. How do they compare? In short, there is no one-size-fits-all answer. Here’s a recap of the functions and methods you’ve covered so far, all of which relate to breaking down and representing distributions in Python:

You have/want to: Clean-cut integer data housed in a data structure such as a list, tuple, or set, and you want to create a Python histogram without importing any third-party libraries.
Consider using: collections.Counter() from the Python standard library, which offers a fast and straightforward way to get frequency counts from a container of data.
Note(s): This is a frequency table, so it doesn’t use the concept of binning as a “true” histogram does.

You have/want to: Large array of data, and you want to compute the “mathematical” histogram that represents bins and the corresponding frequencies.
Consider using: NumPy’s np.histogram() and np.bincount(), which are useful for computing the histogram values numerically and the corresponding bin edges.
Note(s): For more, check out np.digitize().

You have/want to: Tabular data in Pandas’ Series or DataFrame object.
Consider using: Pandas methods such as Series.plot.hist(), DataFrame.plot.hist(), and Series.value_counts(), the pd.cut() function, as well as Series.plot.kde() and DataFrame.plot.kde().
Note(s): Check out the Pandas visualization docs for inspiration.

You have/want to: Create a highly customizable, fine-tuned plot from any data structure.
Consider using: pyplot.hist(), a widely used histogram plotting function that uses np.histogram() and is the basis for Pandas’ plotting functions.
Note(s): Matplotlib, and especially its object-oriented framework, is great for fine-tuning the details of a histogram. This interface can take a bit of time to master, but ultimately allows you to be very precise in how any visualization is laid out.

You have/want to: Pre-canned design and integration.
Consider using: Seaborn’s distplot(), for combining a histogram and KDE plot or plotting distribution-fitting. Note that distplot() has been deprecated since Seaborn 0.11 in favor of histplot() and displot().
Note(s): Essentially a “wrapper around a wrapper” that leverages a Matplotlib histogram internally, which in turn utilizes NumPy.

With that, best of luck creating histograms in the wild. Whatever you do, just don’t use a pie chart!

00:00 Now you know a couple of different ways to produce and plot histograms, as well as how to customize them to suit your needs. Because this has been a lot of information, we’re going to take some time to summarize all of these methods and try to help guide you towards picking the right tool for the job.

00:16 Let’s say you have clean-cut integer data or you want to have a histogram without any third-party libraries.

00:23 If this is the case, collections.Counter() is the way to go. Note that this produces a frequency table and not a “true” histogram, but because it doesn’t rely on any third-party dependencies, you can be up and running very quickly.
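As a rough illustration of the standard-library route, here is a minimal sketch built on collections.Counter(). The helper names frequency_table() and ascii_histogram() and the sample list are made up for this example and do not come from the lesson.

```python
from collections import Counter


def frequency_table(values):
    """Count how often each distinct value occurs (no binning involved)."""
    return Counter(values)


def ascii_histogram(values):
    """Return one text line per distinct value, sorted by value."""
    counts = frequency_table(values)
    return [f"{value} {'+' * counts[value]}" for value in sorted(counts)]


# Made-up sample data, purely for illustration.
sample = [1, 3, 3, 4, 4, 4, 7, 7]

for line in ascii_histogram(sample):
    print(line)
```

Because every distinct value gets its own row, this works best for a small set of discrete values. Continuous data would need binning first.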

00:36 Another case is that you have a large array of data where you need a mathematical histogram, with “true” bins and frequencies. Here, NumPy’s histogram() and bincount() functions can suit you very well.
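The difference between the two NumPy functions can be sketched as follows. The data is synthetic, and the choice of five bins over the range 0 to 10 is an assumption made only so the two results are easy to compare.

```python
import numpy as np

# Synthetic data: 1,000 non-negative integers in the range 0..9.
rng = np.random.default_rng(seed=0)
data = rng.integers(low=0, high=10, size=1_000)

# np.histogram() returns the per-bin counts and the bin edges.
# Five bins over [0, 10] give edges 0, 2, 4, 6, 8, 10.
counts, bin_edges = np.histogram(data, bins=5, range=(0, 10))

# np.bincount() counts occurrences of each non-negative integer value,
# so index i holds the number of times i appears in the data.
per_value = np.bincount(data, minlength=10)

print(counts)
print(bin_edges)
print(per_value)
```

np.histogram() always returns one more edge than it has bins, while np.bincount() only accepts non-negative integers and uses each value as its own "bin".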

00:48 A common case is that you have data that’s already in a Pandas Series or DataFrame. And if you have this, you can go ahead and just plot those directly from Pandas, using Series.plot.hist() or DataFrame.plot.hist().
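Here is a minimal sketch of the Pandas route, combining the plotting call with the pd.cut() and Series.value_counts() helpers listed in the recap. The data is synthetic, and selecting the non-interactive "Agg" backend is an assumption made so the snippet runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: headless backend, no window is opened

import numpy as np
import pandas as pd

# Synthetic data: 500 draws from a normal distribution.
rng = np.random.default_rng(seed=1)
ages = pd.Series(rng.normal(loc=40, scale=10, size=500), name="age")

# pd.cut() assigns each value to one of five equal-width intervals;
# value_counts(sort=False) tallies them in interval order.
binned = pd.cut(ages, bins=5)
table = binned.value_counts(sort=False)
print(table)

# Plot directly from the Series; the call returns a Matplotlib Axes.
ax = ages.plot.hist(bins=5, alpha=0.7, title="Ages (synthetic)")
```

The same plot.hist() accessor is available on a DataFrame, where it draws one histogram per numeric column on shared axes.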

01:02 You also have the option of using the Pandas cut() function and the .value_counts() method, as well as adding KDE plots on top of the histograms. Now let’s say that you need to have a highly customized plot from any type of data.

01:15 This is where you have to make that special report that has to look exactly how you want it to. In this case, matplotlib.pyplot has you covered. Its hist() function calls NumPy’s histogram() and gives you plenty of customization options.
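To make the relationship with NumPy concrete, here is a small sketch using Matplotlib’s object-oriented interface. The data, colors, and labels are illustrative assumptions, and the "Agg" backend is chosen only so the snippet runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: headless backend, no window is opened

import matplotlib.pyplot as plt
import numpy as np

# Synthetic data: 500 draws from a Laplace distribution.
rng = np.random.default_rng(seed=2)
data = rng.laplace(loc=15, scale=3, size=500)

fig, ax = plt.subplots(figsize=(6, 4))

# Axes.hist() returns the counts, the bin edges, and the drawn bar patches.
counts, bin_edges, patches = ax.hist(
    data, bins=20, color="#0504aa", alpha=0.7, rwidth=0.85
)
ax.set_xlabel("Value")
ax.set_ylabel("Frequency")
ax.set_title("Synthetic Laplace sample")
ax.grid(axis="y", alpha=0.75)

# The same numbers come back from np.histogram() with identical arguments.
np_counts, np_edges = np.histogram(data, bins=20)
```

Calling fig.savefig() with a file name of your choice would write the figure to disk. Every element set above (labels, grid, bar width) can be adjusted further through the Axes object.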

01:30 I’ve emphasized “plenty” here, because going through the documentation can be a bit overwhelming. But if you spend some time at it, you can really make your graph look exactly how you want it to. Finally, you might just need a quick, pre-made design to wrap up your data and get it ready for a presentation. When this happens, go for Seaborn.

01:50 You can think of it as a double wrapper, because it uses Matplotlib, which then uses NumPy. But it can make you very nice-looking graphs with very few lines of code and is usually worth checking out. And that’s everything!

02:04 Hopefully by now you have a pretty good idea of how you can use Python to generate histograms for your projects. Histograms are a great tool for exploring your data, and once you get used to putting them together quickly, they can really help you get a feel for your data. Thanks for watching.


KB on Dec. 7, 2022

Can you update this video in light of ‘distplot’ being a deprecated function that will be removed in seaborn v0.14.0?
