Determining Ratios

00:00 Determining Ratios. Vertical and horizontal bar charts are often a good choice if you want to see the difference between your categories. If you’re interested in ratios, however, pie plots are the way to go.

00:14 Since cat_totals contains a few smaller categories, creating a pie plot with the code seen onscreen leads to a plot with issues.
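The onscreen code is not part of this transcript, so the following is only a minimal sketch of the kind of call that produces such a plot. The numbers in `cat_totals` are made-up placeholders, and the series name `"Total"` is an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import pandas as pd

# Placeholder totals for illustration only, NOT the real dataset values.
cat_totals = pd.Series(
    {
        "Interdisciplinary": 12_000,
        "Agriculture & Natural Resources": 75_000,
        "Engineering": 540_000,
        "Humanities & Liberal Arts": 710_000,
        "Business": 1_300_000,
    },
    name="Total",
).sort_values()

# One wedge per category; the two small categories become thin,
# crowded slices whose labels are likely to overlap.
ax = cat_totals.plot(kind="pie")
```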

00:28 There are several tiny slices with overlapping labels. To address the problem, the smaller categories can all be lumped together into a single group. This code merges all the categories with a total of under 100,000 into a category called "Other", and then creates a pie plot.
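The merging step described above might look roughly like the sketch below. It is not necessarily the code shown in the video: the values are placeholders, the variable names are hypothetical, and `pd.concat` is used here as one way to attach the `"Other"` entry.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import pandas as pd

# Placeholder totals for illustration only, NOT the real dataset values.
cat_totals = pd.Series(
    {
        "Interdisciplinary": 12_000,
        "Agriculture & Natural Resources": 75_000,
        "Engineering": 540_000,
        "Humanities & Liberal Arts": 710_000,
        "Business": 1_300_000,
    },
    name="Total",
).sort_values()

threshold = 100_000

# Split the categories at the threshold. Using >= for the large group
# keeps a category that is exactly at the threshold from being dropped.
small_cat_totals = cat_totals[cat_totals < threshold]
big_cat_totals = cat_totals[cat_totals >= threshold]

# Collapse all small categories into a single "Other" entry.
other = pd.Series([small_cat_totals.sum()], index=["Other"])
merged_totals = pd.concat([big_cat_totals, other])

# label="" suppresses the axis label pandas would otherwise add.
ax = merged_totals.plot(kind="pie", label="")
```

Because the small categories are summed rather than discarded, `merged_totals.sum()` still equals `cat_totals.sum()`, so the ratios in the pie remain comparable to the original plot.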

01:20 Notice that the argument for label is an empty string ("").

01:26 By default, pandas adds a label with the column name. That often makes sense, but in this case it would only add some noise. Now the pie plot is much better, as you can see here.

01:36 The "Other" category still only makes up a very small slice of the pie. That’s a good sign that merging those categories was the right choice.

01:45 Zooming in on Categories. Sometimes you also want to verify whether a certain categorization makes sense. Are the members of a category more similar to one another than they are to the rest of the dataset? Again, a distribution is a good tool to get a first overview.

02:01 Generally, we expect the distribution of a category to be similar to the normal distribution but have a smaller range. This code creates a histogram plot showing the distribution of the median earnings for the engineering majors.

02:15 It will generate a histogram that you can compare to the histogram of all majors from the beginning.
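A filter-then-plot step like this can be sketched as follows. The DataFrame here is a tiny stand-in with invented rows, and the column names `Major_category` and `Median` are assumptions about how the dataset is laid out, so adjust them to match your data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import pandas as pd

# Invented rows for illustration only, NOT the real dataset.
df = pd.DataFrame(
    {
        "Major": ["A", "B", "C", "D", "E", "F"],
        "Major_category": [
            "Engineering", "Engineering", "Engineering", "Engineering",
            "Business", "Humanities & Liberal Arts",
        ],
        "Median": [40_000, 52_000, 60_000, 75_000, 38_000, 31_000],
    }
)

# Keep only the rows that belong to the category of interest.
engineering = df[df["Major_category"] == "Engineering"]

# Histogram of median earnings within that category.
ax = engineering["Median"].plot(kind="hist")

# Numeric range of the category versus the whole dataset,
# to back up what the two histograms show visually.
eng_range = (engineering["Median"].min(), engineering["Median"].max())
all_range = (df["Median"].min(), df["Median"].max())
```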

02:28 The range of median earnings for the engineering majors is somewhat smaller, starting at $40,000. The distribution is closer to normal, although its peak is still on the left.

02:39 So even if you’ve decided to pick a major in the engineering category, it would be wise to dive deeper and analyze your options more thoroughly.

pnmcdos on April 8, 2022

Seems like a lot of lines of code to remove the overlap of:

  • Two majors
  • Interdisciplinary
  • Agriculture & Natural Resources

Was there a way to combine the two into ‘other’ with one line then plot it?
