Join us and get access to thousands of tutorials and a community of expert Pythonistas.

Unlock This Lesson

This lesson is for members only. Join us and get access to thousands of tutorials and a community of expert Pythonistas.

Unlock This Lesson

Hint: You can adjust the default video playback speed in your account settings.
Hint: You can set your subtitle preferences in your account settings.
Sorry! Looks like there’s an issue with video playback 🙁 This might be due to a temporary outage or because of a configuration issue with your browser. Please refer to our video player troubleshooting guide for assistance.

Querying Your Dataset

00:00 So far, you’ve been using the columns and index to select data from your dataset. Using queries, you can get more precise and select data based on the values.

00:12 Here’s how to select all games from the nba DataFrame played after the year 2010.

00:20 Now notice that this returns a DataFrame. The columns are still there, but the number of rows is smaller because only games played after 2010 have been included.

00:33 So, how does this work? First, look at the query itself. This evaluates the 'year_id' column and compares each value to 2010. If the value is greater, the result is True, and False if it is less or equal.

00:51 Then all of the results are returned in a Series. Note that this Series has all 126,314 rows in the nba DataFrame. However, if you query the DataFrame, it will only return the rows where a True value exists in the Series.

01:11 This DataFrame contains only the 15,000 or so games played after 2010.

01:19 You can get quite detailed with queries. For example, the notes column in the nba DataFrame does not always contain data.

01:27 First of all, verify this with the .info() method. It seems that only 5,400 games have notes. How can you weed out the games with no notes? The notes column will be null for the 120,000 or so games that have no notes.

01:44 Therefore, you can use the .notnull() method to test the 'notes' column. Just like the previous demo, this returns a Series of bools, and you can use this to query the entire DataFrame to include just the data you want.

02:01 Remember the object data type? It’s how strings are stored in the DataFrame. You can access the .str attribute of a object column and call string methods to filter the column on the values. For example, the 'fran_id' column stores the name of the franchise of the team.

02:19 Many of them end in 'ers''Lakers', 'Cavaliers', and so on. Using the .endswith() method for a string, you can create a filter to get only those teams.

02:33 Sometimes, a single criteria is not enough to select the data you need. You can combine criteria with the logical AND and OR operators.

02:42 The logical AND operator is the single ampersand (&), while the logical OR operator is the single pipe (|).

02:50 This next query gets all of the games in the nba DataFrame with an ID of 'BLB' that scored more than 100 points

03:01 and the opposing team scored more than 100 points.

03:07 Notice that the criteria are in parentheses to ensure they are evaluated separately, and it’s easier to remember what you meant the next time you see it.

03:17 In the next lesson, you’ll learn how to summarize your dataset with aggregation and grouping.

Become a Member to join the conversation.