Graphing Performance With matplotlib
Now you’ll extend the script to use matplotlib to produce charts of the performance to allow deeper analysis of the two approaches. As always, it’s good practice when using any third-party package to install it into a virtual environment.
The contents of the original script are copied to a new file, chart.py, and then the code to create the graphs is added. First, the pyplot module is imported with the traditional alias.
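As a rough sketch of this step, the timing and the first chart could look something like the following. The helper names (find_with_loop, find_with_generator, time_both, plot_times), the constant values, and the axis labels are assumptions made for illustration and are not taken from the course's script. The pyplot import sits inside the plotting function only so that the timing helpers can be tried without matplotlib installed; in a script like chart.py it would normally go at the top of the file.

```python
from timeit import timeit

# Hypothetical values; the course's script defines its own constants.
TIMEIT_TIMES = 100
LIST_SIZE = 500
POSITION_INCREMENT = 50


def find_with_loop(iterable, target):
    """Return the first item equal to target using a for loop."""
    for item in iterable:
        if item == target:
            return item
    return None


def find_with_generator(iterable, target):
    """Return the first item equal to target using next() on a generator."""
    return next((item for item in iterable if item == target), None)


def time_both(values, positions, number=TIMEIT_TIMES):
    """Time both strategies once per target position."""
    looping_times = []
    generator_times = []
    for position in positions:
        looping_times.append(
            timeit(lambda: find_with_loop(values, position), number=number)
        )
        generator_times.append(
            timeit(lambda: find_with_generator(values, position), number=number)
        )
    return looping_times, generator_times


def plot_times(positions, looping_times, generator_times):
    """Plot the raw timings of both strategies on one chart."""
    # pyplot imported with its traditional alias, plt.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(positions, looping_times, label="loop")
    ax.plot(positions, generator_times, label="generator")
    ax.set_xlabel("Index of element to be found")
    ax.set_ylabel("Time in seconds")
    ax.legend()
    plt.show()
```

With values = list(range(LIST_SIZE)) and positions = list(range(0, LIST_SIZE, POSITION_INCREMENT)), calling plot_times(positions, *time_both(values, positions)) would open a chart with one line per strategy.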
Finally, the plot is displayed. The second plot shows the ratio of performance between generators and loops, so the generator_ratio that was previously seen in interactive mode is calculated, along with the looping_ratio, which is always one.
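The ratio calculation might be sketched as follows. This assumes the ratio is taken as generator time divided by loop time, so values below one mean the generator was faster. The function names performance_ratios and plot_ratios, and the axis labels, are hypothetical.

```python
def performance_ratios(looping_times, generator_times):
    """Return (looping_ratio, generator_ratio), both relative to the loop."""
    # The loop is the baseline, so its ratio is always one.
    looping_ratio = [1.0 for _ in looping_times]
    # Assumption: generator time divided by loop time at each position.
    generator_ratio = [
        gen / loop for gen, loop in zip(generator_times, looping_times)
    ]
    return looping_ratio, generator_ratio


def plot_ratios(positions, looping_ratio, generator_ratio):
    """Plot both ratios so the generator line can be read against 1.0."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(positions, looping_ratio, label="loop")
    ax.plot(positions, generator_ratio, label="generator")
    ax.set_xlabel("Index of element to be found")
    ax.set_ylabel("Time relative to the for loop")
    ax.legend()
    plt.show()
```

Plotting the flat looping_ratio line alongside generator_ratio gives a fixed reference, which makes it easier to see where the generator line crosses below one.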
Then the plot is displayed. Depending on the system you’re running and the values for POSITION_INCREMENT that you use, running the script can take a while, but it should produce one chart that shows the times plotted against each other. Additionally, after closing the first chart, you’ll get another chart that shows the ratio between the two strategies.
This chart clearly illustrates that in this test, when the target item is near the beginning of the iterator, generators are far slower than for loops. However, once the element to find is at position a hundred or greater, generators beat the for loop quite consistently and by a fair margin.
04:18 You can interactively zoom in on the chart with the magnifying glass icon. The zoomed chart shows there’s a performance gain of around 5 or 6 percent. This may not be anything to write home about, but it’s also not negligible.
Whether it’s worth it for you depends on the specific data you’ll be using and how often you need to use it. A point of note is that for low values of TIMEIT_TIMES, you’ll often get spikes in the chart, as seen on-screen.
04:47 These are an inevitable side effect of testing on a computer that’s not dedicated to the task. If the computer needs to do something, then it will pause the Python process without hesitation, and this can inflate certain results.
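One common way to dampen such spikes, which is not the approach shown in the course, is to repeat each measurement and keep the fastest run, since interruptions from other processes can only make a run slower. A minimal sketch using the standard library's timeit.repeat, with a hypothetical helper name steadier_time:

```python
from timeit import repeat


def steadier_time(func, number=100, repeats=5):
    """Time func several times and keep the fastest of the runs."""
    # Each entry is the total time for `number` calls of func.
    runs = repeat(func, number=number, repeat=repeats)
    # The minimum is the run least disturbed by other activity.
    return min(runs)
```

The trade-off is a longer running script, as every position is now timed repeats times instead of once.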
With these results, you can tentatively say that generators are faster than for loops, even though generators can be significantly slower when the item to find is in the first hundred elements of the iterable. When you’re dealing with small lists, the overall difference in terms of raw milliseconds lost isn’t much. Yet for large iterables, where a 5 percent gain can mean minutes, it’s something to bear in mind.
05:43 This last chart shows the performance for very large intervals, with the increase in performance stabilizing at around 6 percent. Now that you’ve seen the performance of the two hard-coded solutions for finding the first match, in the next section of the course, you’ll take a look at a general reusable function which will allow you to do the same in more situations.