Using Indices

00:00 In this lesson, you’ll be looking at the books dataset indices, how to use them, and how to work with them. Indices, then. You usually want something to refer to the rows, something unique with which you can refer to each row, something like an index number, but it doesn’t have to be an index number.

00:17 It can be a string. It can be a hash value or a checksum. You have many different options there. It doesn’t have to be unique, but it’s generally much more helpful if it is. As you’ve seen, if you don’t give the DataFrame an index, pandas will add one automatically.

00:32 And usually that will just be a list of incrementing numbers from 0 up to however many rows there are. Again, it’s better if it’s unique, but it doesn’t have to be. pandas won’t enforce that on you.
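To make the default index and the lack of a uniqueness check concrete, here is a minimal sketch. It uses a tiny made-up DataFrame (the name toy and its values are placeholders, not the books dataset from the lesson).

```python
import pandas as pd

# Hypothetical stand-in data, not the lesson's books dataset.
toy = pd.DataFrame({"title": ["A", "B", "C"]})

# With no index supplied, pandas adds incrementing integers starting at 0.
default_index = toy.index  # RangeIndex(start=0, stop=3, step=1)

# pandas does not enforce uniqueness: duplicate labels are accepted.
dupes = pd.DataFrame({"title": ["A", "B", "C"]}, index=["x", "x", "y"])
dupes_unique = dupes.index.is_unique  # False

# The cost of duplicates: one label now refers to more than one row.
rows_for_x = dupes.loc["x"]
```

Looking up "x" in the second DataFrame returns two rows rather than one, which is why a unique index is usually easier to work with.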

00:44 If you already have an identifier column, you can first check to see if the values are unique, and then you can explicitly set that DataFrame’s index to that column that you want to be the identifier.

00:58 So there is one clear candidate to be our identifier, our index, which is the id column. We can take a look at this. books.loc[:] will get all the rows, and we’ll look at the id column.
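As a rough sketch of that selection, assuming the column is picked with .loc by passing a full row slice and the column label. The DataFrame below is a small made-up stand-in, so its id values and titles are placeholders rather than the real books data.

```python
import pandas as pd

# Hypothetical stand-in for the books DataFrame from the lesson.
books = pd.DataFrame(
    {
        "id": [206, 216, 218],
        "title": ["First Book", "Second Book", "Third Book"],
    }
)

# The ":" selects every row, and "id" selects just that column.
ids = books.loc[:, "id"]

# The result is a pandas Series, one value per row.
print(ids)
```

Writing books["id"] would return the same Series. The .loc form just makes the "all rows, one column" selection explicit.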

01:18 That looks like a bunch of numbers. They go up quite a bit, far more than the number of rows there are. But the thing that you’re interested in right now is whether the values are unique or not.

01:28 So, one thing you can do is use the property called is_unique right on the Series that was returned, which is basically the id column.

01:37 This is not a method, it’s just a property. So if you run this, it will return a True or False value indicating whether it’s unique.
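Here is a small sketch of that check, using two made-up Series (the values are placeholders, not the real id column) so you can see both outcomes.

```python
import pandas as pd

# Hypothetical id values: one Series without repeats, one with a repeat.
unique_ids = pd.Series([206, 216, 218], name="id")
repeated_ids = pd.Series([206, 206, 218], name="id")

# is_unique is a property, so there are no parentheses after it.
first_check = unique_ids.is_unique     # True
second_check = repeated_ids.is_unique  # False

print(first_check, second_check)
```

Because it is a property, adding parentheses, as in unique_ids.is_unique(), would try to call the returned boolean and raise a TypeError.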

01:44 And as you can see, it is unique. To set that as the index, you can call the .set_index() method on the DataFrame, again chaining this all together and passing in the column name.

02:01 Now we’ll run this

02:07 and inspect the data. As you can see, the index is now the id column that came with the dataset. It’s not incremental, but it is unique.
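The whole step can be sketched as follows, again on a small made-up DataFrame (the id values and titles are placeholders). It checks uniqueness first and only then sets the index.

```python
import pandas as pd

# Hypothetical stand-in for the books DataFrame from the lesson.
books = pd.DataFrame(
    {
        "id": [206, 216, 218],
        "title": ["First Book", "Second Book", "Third Book"],
    }
)

# Only promote the column to the index if its values are unique.
if books["id"].is_unique:
    # set_index() returns a new DataFrame by default, so assign the result.
    books = books.set_index("id")

# The id values are now row labels, so .loc can look up a row by its id.
second_title = books.loc[216, "title"]

print(books)
```

After this runs, id is the name of the index rather than a regular column, so it no longer appears in books.columns.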

02:19 You’re not exactly sure what this refers to, but perhaps it has some significance. There’s no real downside to using it, so you might as well.

02:30 Now that you’ve tweaked the indices of the books dataset, in the next lesson, you’re going to get into cleaning the date column.
