NumPy Review (Optional)

Python Plotting With Matplotlib Austin Cepalia 07:57

One of the many dependencies of matplotlib is called numpy, which is short for “numerical Python.” It’s a very popular library used for scientific computing.

numpy provides objects that can represent more complex data than the built-in data types in Python can. It also provides efficient yet advanced mathematical operations you can perform on this data.

In this course, you’ll use numpy mostly to generate pseudorandom numbers, which you’ll then store in multidimensional arrays.

00:00 One of Matplotlib’s many dependencies is called NumPy. NumPy, short for Numerical Python, is a very popular library used for scientific computing in Python. It’s sometimes pronounced “num-pee” but I’m not going to be that guy. NumPy provides objects that can represent more complex data than the built-in data types in Python.

00:27 It also provides efficient yet advanced mathematical operations we can perform on this data. Most of our utilization of NumPy will revolve around generating pseudorandom numbers, which we will then store in multidimensional arrays.

00:46 These arrays will then be graphed and visualized using Matplotlib. Once you’ve gotten NumPy installed, using it is as simple as import numpy as np.

00:59 It’s convention to reference numpy as np, but of course it’s not required.

01:06 The first function we’ll learn about is the seed() function, which will set the seed for the pseudorandom number generator. Computers can only generate what are known as pseudorandom numbers.

01:20 That is, they aren’t truly random because they’re computed with some sort of algorithm that’s, by nature, predictable. There are ways of making these numbers as random as possible, but computers alone can’t generate a truly random number. By setting the seed, we’ll ensure that every time we run our program, we get the same set of random data.
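As a minimal sketch of what setting the seed does, the snippet below seeds NumPy’s global generator twice with the same value and compares the results. The seed value 444 and the variable names are arbitrary choices for illustration, not taken from the video.

```python
import numpy as np

# Seed the global pseudorandom number generator (444 is an arbitrary value).
np.random.seed(444)
first = np.random.randint(0, 10, size=5)

# Re-seeding with the same value restarts the same pseudorandom sequence.
np.random.seed(444)
second = np.random.randint(0, 10, size=5)

print(first)
print(second)
print(np.array_equal(first, second))  # True
```

Changing the seed value, or removing the call entirely, would generally give different numbers on each run.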

01:45 That is, the same seed always produces the same sequence of pseudorandom numbers, so each run of the program works with identical data. If you’d like to learn more about pseudorandom number generators, see the link in the video notes down below. NumPy uses what are called ndarrays, short for n-dimensional arrays.

02:08 These look a lot like Python lists, except they act like arrays traditionally do. They are created at a fixed size and cannot grow later, and all of their elements share a common data type.

02:23 This is what makes them so efficient, which is important when dealing with large sets of data. The Python list just doesn’t cut it. There are a few important properties of ndarrays. First, we have the number of dimensions.

02:40 This here is a one-dimensional array, meaning that it contains just one set of numbers. Simple enough. This is a two-dimensional array. Each element in this array is actually a one-dimensional array in and of itself.

02:57 That’s right—you can put arrays inside of other arrays. The shape is a tuple that shows how many elements are in each dimension.

03:09 Our 2D array has two 1D arrays inside of it, and each of those has four elements, and so its shape is (2, 4). The size of an ndarray is simply the values in the shape multiplied, so our 2D array here has a size of 8.
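The three properties described above can be checked directly on an array. This is a small sketch; the arrays shown on screen in the video are not part of the transcript, so the values here are made up to match the (2, 4) shape being discussed.

```python
import numpy as np

# Hypothetical example values; only the shapes match the discussion above.
one_d = np.array([1, 2, 3, 4])
two_d = np.array([[1, 2, 3, 4],
                  [5, 6, 7, 8]])

print(one_d.ndim, one_d.shape, one_d.size)  # 1 (4,) 4
print(two_d.ndim, two_d.shape, two_d.size)  # 2 (2, 4) 8
```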

03:31 We can create an ndarray with as many dimensions as we want, but in this course, we’ll limit ourselves to just two. To create a one-dimensional ndarray, we can use the arange() function.

03:46 This function will ensure that each element is evenly spaced, for example, [1, 2, 3, 4] or [3, 6, 9, 12]. If we supply it just a single integer, it will create an ndarray starting at 0 incrementing by 1 and ending at the number we supply, but not including it.

04:10 We say it’s inclusive of the lower bound and exclusive of the upper bound.
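Here is a short sketch of arange() in use. The single-argument call is the form described above; the start, stop, and step arguments in the other two calls are assumed here to reproduce the evenly spaced examples mentioned earlier.

```python
import numpy as np

# One argument: start at 0, step by 1, stop before 5.
a = np.arange(5)
print(a)  # [0 1 2 3 4]

# Start and stop: the lower bound is included, the upper bound is not.
b = np.arange(1, 5)
print(b)  # [1 2 3 4]

# Start, stop, and step: evenly spaced multiples of 3.
c = np.arange(3, 13, 3)
print(c)  # [ 3  6  9 12]
```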

04:18 One way to create multidimensional arrays full of random numbers is by using the randint() function. The first two parameters are the inclusive lower and exclusive upper bounds of each random number.

04:34 That means that if we supply 2 and 4 here, then each of our random numbers will be either 2 or 3, because we don’t include 4. The last parameter we’ll use is called size, but you can really think of it like the shape.

04:53 The number of elements in this tuple corresponds to the number of dimensions in our ndarray.

05:00 We’ll be working with 2D arrays in the future, and so we’ll give this function a tuple containing the number of sub 1D arrays to create as well as the number of random numbers to generate inside of each array. To make this more clear, let’s look at this example. np.random.randint() with a lower bound of 0, an upper bound of 10, and a size of (3, 4).

05:32 This will return an ndarray containing three subarrays, and each of those will contain four random numbers from 0 to 9 inclusive.
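The call described above might look like the following sketch. The seed value and variable names are arbitrary; the actual numbers depend on the seed, so only the shape and value range are checked here.

```python
import numpy as np

np.random.seed(444)  # arbitrary seed so repeated runs match

# Three subarrays, each holding four integers from 0 to 9 inclusive.
arr = np.random.randint(0, 10, size=(3, 4))
print(arr.shape)  # (3, 4)
print(bool(arr.min() >= 0 and arr.max() <= 9))  # True

# With bounds of 2 and 4, every value is either 2 or 3.
small = np.random.randint(2, 4, size=10)
print(set(small.tolist()) <= {2, 3})  # True
```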

05:45 The next function to know is column_stack(). This function will take a tuple containing two ndarrays and stack them side by side as columns.

05:57 We’ll be passing this two 1D arrays, and so it will return to us a 2D array where the elements at each index of the original 1D arrays form their own sub 1D array.

06:13 I know that’s probably pretty confusing, so let’s look at an example. If we supply [1, 2, 3] and [4, 5, 6], our new 2D array will be a 1D array of [1, 4], a 1D array of [2, 5], and finally, a 1D array of [3, 6].
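The example just described can be written out as a short sketch. The variable names a, b, and stacked are chosen here for illustration.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# column_stack() takes a single tuple of arrays, hence the double parentheses.
stacked = np.column_stack((a, b))
print(stacked)
# [[1 4]
#  [2 5]
#  [3 6]]
print(stacked.shape)  # (3, 2)
```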

06:38 Next is transposing and reshaping. We’ve already seen how we can generate an ndarray from 0 to 5 inclusive with something like this: x = np.arange(6).

06:55 If we call .reshape() on that, our 1D array will turn into something 2D and it’ll look like this. We’ve literally reshaped the array.

07:08 Now we can transpose this, which will—in a way—reverse the shape. If we add a .T to the end here, we can see that our 3 by 2 is now a 2 by 3.
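A minimal sketch of reshaping and transposing follows. The transcript does not include the arguments passed to .reshape(), so (3, 2) is assumed here based on the “3 by 2” mentioned above.

```python
import numpy as np

x = np.arange(6)
print(x)  # [0 1 2 3 4 5]

# Assumed shape: 3 rows of 2 elements each.
reshaped = x.reshape(3, 2)
print(reshaped)
# [[0 1]
#  [2 3]
#  [4 5]]

# .T swaps the two axes, so the shape goes from (3, 2) to (2, 3).
transposed = reshaped.T
print(transposed)
# [[0 2 4]
#  [1 3 5]]
```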

07:23 Finally, the last piece of NumPy you should be familiar with is the diag() function. This will create a two-dimensional ndarray out of all the values we pass in, with the rest of the elements being zeros.

07:39 Notice how the values we pass in are arranged diagonally.
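As a quick sketch of diag(), the input values below are made up, since the ones shown in the video are not in the transcript.

```python
import numpy as np

# Hypothetical input values placed along the main diagonal.
d = np.diag([1, 2, 3])
print(d)
# [[1 0 0]
#  [0 2 0]
#  [0 0 3]]
print(d.shape)  # (3, 3)
```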

07:44 That should be everything you need to know about NumPy in order to understand the rest of this course. If you’d like to learn more about NumPy, see the video notes down below for links.

Anonymous on Oct. 23, 2019

I think you meant np.column_stack((a,b))

Austin Cepalia RP Team on Oct. 23, 2019

That’s correct. I’ll have this video updated shortly

Pradeep Kumar on Feb. 2, 2020

Thanks very much Austin, your tutorial is helpful to many, particularly to those who are very confused because of too many different tools in Python for doing the same thing. The Zen of Python says “There should be one– and preferably only one –obvious way to do it.” If that is so, why are there so many packages for doing the same stuff, for example: matplotlib (the OOP version and the non-OOP version), Altair, Bokeh, ggplot, seaborn, pandas plot, Plotly, and so on? I was very confused until I found this tutorial and the other one on Real Python, which explains how matplotlib also exists in stateless and stateful versions. Really, it saved me. But I wonder why there are so many libraries for doing the same thing. It’s just really frustrating, and not one of them is easy to learn. I am really thankful that you made this tutorial.

Austin Cepalia RP Team on Feb. 6, 2020

@Pradeep Kumar thanks for the kind words, I’m glad the course was useful to you. You’re absolutely right about the Zen of Python. I’m not entirely sure if this is the best answer to your question, but if I had to guess I would say that there exist so many libraries, frameworks, and packages because the “best” way of doing something for one person might not be the best way of doing it for someone else. It’s like asking why there are so many different cars; they all get you from place A to place B, but some are faster on road, some can carry a heavy load, and some just look fancy! It’s a matter of personal preference. Likewise, the different plotting libraries accomplish basically the same thing but in radically different ways. They also have different features and capabilities.

Pradeep Kumar on Feb. 9, 2020

Thanks for your reply. Can you suggest a good book based on the stateless approach to matplotlib? Sorry, this might be an off-topic question.

Austin Cepalia RP Team on Feb. 13, 2020

@Pradeep Kumar I’m not sure of any books specifically about the stateless approach to matplotlib, but because the stateless approach is so common, I would imagine any good book on matplotlib would cover it extensively. Since the stateless approach makes use of object-oriented Python concepts, you may be interested in strengthening those skills too. I’ve got a course on that here if you’re interested.

Pradeep Kumar on Feb. 19, 2020

Thanks, I have already done your course on OOP.
