Understanding Generators

Python Generators 101 Christian Mondorf 11:33

In this lesson, you’ll learn how to create generator functions and how to use the yield statement. A generator function is a special kind of function that returns a lazy iterator: an object that you can loop over like a list. However, unlike lists, lazy iterators do not store their contents in memory.

You’ll also compare list comprehensions and generator expressions. Once you’ve learned the difference in syntax, you’ll compare the memory footprint of both, and profile their performance using cProfile.

00:00 Welcome to the second video in our course on generators and the yield keyword in Python. In this video, we’re finally going to be looking at some real code.

00:09 So here’s my code editor. I’ve written a function here. It’s called infinite_sequence() and it initializes a variable called num, and then while True:, there’s a loop here which returns num and then increments num by 1 in each iteration.

00:26 So, there is a problem with this function, and you can see that my IDE is already highlighting the num variable on line 5, but I’ve chosen to ignore that for now. Instead, what I’m hoping will happen is that since True is always True, this while loop will run forever, and so it will return an infinite sequence like the name suggests. Let’s try this out and see what happens.

00:48 I’m going to enter the REPL and now I will import my function since it’s saved in an external file. Now let’s try printing the output from this function.

01:00 Okay. So that gave me 0 and nothing else. It’s not quite an infinite sequence. Now, you may already have realized this, but the problem is here on line 4 there is a return statement.

01:11 So when this line executes, the function returns the variable num, which at this stage has a value of 0. And then, it stops. We’ve exited the function. The function is gone.
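The function itself is not reproduced in the transcript. Going by the description (a num variable, a while True: loop, a return on line 4, and an increment on line 5), it probably looks something like the following sketch:

```python
def infinite_sequence():
    num = 0
    while True:
        return num  # exits the function on the very first pass through the loop
        num += 1    # never reached, which is presumably what the IDE is flagging


result = infinite_sequence()
print(result)  # 0
```

Every call starts again from num = 0, so calling the function repeatedly keeps giving 0.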

01:22 It evaporates. And we never progress past the initial value of 0. Now a generator function would allow us to get around this problem and I’ll show you how. I’ll turn this function, this very normal function, into a generator function with one quick change.

01:38 And that change is that I’m going to replace the return keyword on line 4 with a yield statement. Watch.

01:47 So, I’ll quickly restart the REPL and import this again.

01:56 Since this is a generator function, I won’t use it in quite the same way as a normal function. Instead, I’m going to call it and store the result in a variable. Let’s call this variable infinite.

02:10 Okay, there we go, so now I have this variable, infinite, and in that variable, I’m storing the generator object that the generator function returned. Let’s just check that this is an actual generator, and I’m going to do that with the type() function. Okay, so you can see that this object is of type <class 'generator'>.
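As a rough reconstruction of what is on screen at this point (the exact code is not in the transcript), the one-word change and the type() check could look like this:

```python
def infinite_sequence():
    num = 0
    while True:
        yield num  # pause here and hand num to the caller
        num += 1   # runs only once the generator is resumed


# Calling the generator function does not run its body yet.
# It builds and returns a generator object.
infinite = infinite_sequence()
print(type(infinite))  # <class 'generator'>
```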

02:28 So this is looking good. The next question is, how do we then use this generator? How do we get it to give us values? Well, we do that with a built-in function called next().

02:38 I’m going to call next() on it right now. Okay, so it gave us the value 0. So far it hasn’t done anything very extraordinary, it just gave us back the same value the previous function was giving us, but watch what happens if I call next() on it again.

02:54 So, this is interesting. It didn’t give us 0. It remembered where we left off, it incremented the num value by 1 and then returned, or actually yielded the num variable to us.

03:06 What we’re seeing here is quite different from the behavior we see from normal functions. In this case, we started the generator, and then every time we’ve been calling next() on it, it’s running this loop once and it’s stopping.

03:19 It’s stopping here, where the yield keyword is. So, the yield keyword is causing it to stop and pause there and return this value, or yield this value. Let’s try running this again and you see, it keeps going, and rather than just stopping and evaporating, the way a normal function would, the generator function is remembering where it left off. This is called state.

03:43 We say that it’s keeping state between calls, so it’s keeping track of its own condition. And this is quite nifty because we can keep on calling next(), potentially forever. And yet, as we’ll see later, the memory footprint is quite modest because we don’t have to store an infinite sequence in memory. Okay.
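A minimal sketch of the REPL steps described above, assuming the same infinite_sequence() generator function, shows the state being carried from one next() call to the following one:

```python
def infinite_sequence():
    num = 0
    while True:
        yield num
        num += 1


infinite = infinite_sequence()

first = next(infinite)   # runs from the top of the function to the first yield
second = next(infinite)  # resumes after the yield, increments num, yields again
third = next(infinite)   # same again: the local variable num survived the pause

print(first, second, third)  # 0 1 2
```

Only the current value of num is kept between calls, which is why the sequence can be unbounded without using unbounded memory.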

04:04 Let’s leave the REPL for now and make a few changes to our function again. What I’m going to do this time is I’m going to add another yield.

04:14 So now let’s restart the REPL and re-import this.

04:21 Okay, so I’ve repeated the same steps I had done before, except this time when I loaded the infinite_sequence() generator, I had made some changes to it. It has a second yield statement.

04:32 Let’s see how this plays out when we call next() on it. So the first time I run this, the same thing happens as last time. It runs all the way to the first yield statement, it yields num, whose value is 0 because it was just initialized up here and it stopped here, the way it had the last time we called this.

04:52 But now there’s a second yield statement, so if I call next() on it again, it runs line 5, line 6, yields this value, so this string, and then stops here.

05:03 So, what happens if you have several yield statements in a generator is that every time next() is called on the generator, it runs until the next yield statement, and then it stops, and that’s where it will wait. Let’s call this one more time.

05:18 It’s returning 1. If I call it again, I get the second yield statement to run. Okay. There’s one more special case I’d like to show you before we move on to something else. And that’s what happens when we exhaust our generator.
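The two-yield version is not shown in the transcript either. The sketch below is one plausible layout that matches the line numbers mentioned; the text of the string is a placeholder, since the actual string is not given:

```python
def infinite_sequence():
    num = 0
    while True:
        yield num
        num += 1
        yield "second yield"  # placeholder string, not taken from the video


infinite = infinite_sequence()

# Each next() call runs only as far as the following yield statement.
values = [next(infinite) for _ in range(4)]
print(values)  # [0, 'second yield', 1, 'second yield']
```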

05:32 A generator is exhausted when it doesn’t have a next value to go to. I’ll show you how this works by turning this into a finite sequence.

05:58 So, I’m at a similar position to where I was before. I’ve created this generator. This time it’s a finite sequence. Let’s see what happens when I call next() on it.

06:08 So, it’s returning the first item in that list. If I call this again, it returns the next item in the list. And then again, but now we’ve reached the end of this list.

06:19 There are no more numbers in the list. So what will happen if I call this again?

06:24 So, as you can see, I got a traceback and the error type is a StopIteration. Now, something interesting is that whenever you’re using an iterator in Python on any structure which has baked-in double-under __iter__() methods, this is what’s happening behind the scenes.

06:40 The next() function is being called to move the iterator along and when it reaches the end, it actually raises a StopIteration, but you tend not to see it because the for loop catches the StopIteration and simply ends. That’s sort of an aside. If you want to go a bit deeper into that topic, you can read up on iterators.
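To make that aside concrete, here is a simplified sketch of what a for loop over a list does behind the scenes. It is an approximation written with iter() and next(), not the interpreter's actual implementation:

```python
numbers = [1, 2, 3]

# Roughly equivalent to:
#     for n in numbers:
#         collected.append(n)
iterator = iter(numbers)  # calls numbers.__iter__()
collected = []
while True:
    try:
        n = next(iterator)  # calls iterator.__next__()
    except StopIteration:
        break  # the loop ends quietly instead of showing a traceback
    collected.append(n)

print(collected)  # [1, 2, 3]
```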

06:59 But the main point I want to make here is that when generators are exhausted, that is to say, when they’ve reached the end of the values which they can yield, they will raise this StopIteration exception.
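The finite version is typed off screen, so the following is only an illustrative sketch. The function name finite_sequence() and the list [1, 2, 3] are assumptions; the transcript says only that the generator walks through a short list of numbers:

```python
def finite_sequence():
    # Hypothetical contents: the real list is not given in the transcript.
    for num in [1, 2, 3]:
        yield num


finite = finite_sequence()
print(next(finite))  # 1
print(next(finite))  # 2
print(next(finite))  # 3

exhausted = False
try:
    next(finite)  # nothing is left to yield
except StopIteration:
    exhausted = True
    print("StopIteration raised: the generator is exhausted")
```

In the REPL, without the try/except, the fourth call would print a traceback ending in StopIteration.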

07:11 You’ve seen how to create a generator function using the yield keyword, basically writing a normal function and replacing return with yield, but there’s another way to create generators, and that’s generator expressions.

07:24 You might remember list comprehensions. As a reminder, here on line 1, there’s a list comprehension which returns a list of all squared numbers in the range from zero to five. We can try running this,

07:38 and you can see that this returned a list of the squares of the numbers 0 to 4. range() stops at 4, since it stops just before the value we’ve given it.

07:48 And what you have here is a generator comprehension, more formally known as a generator expression. So, let’s see what happens when we call that. And you see, in this case, a generator object is being returned. And if we had saved this in a variable, then we could operate on it using the next() function.

08:04 So, what was the difference between the two? It’s a bit subtle to catch if you haven’t been paying attention, but the list comprehension has square brackets at the beginning and at the end.

08:15 An easy way to remember this is that when you’re creating a list in Python, you’re using square brackets, right? So you open and close square brackets, and in between, you have the different list elements. A generator comprehension, on the other hand, uses parentheses, as you can see here on line 2. Okay.
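The two lines being compared might look like the sketch below. The variable names are made up for illustration; the only difference that matters is the kind of bracket:

```python
squares_list = [num ** 2 for num in range(5)]  # square brackets: the whole list is built now
squares_gen = (num ** 2 for num in range(5))   # parentheses: a lazy generator object

print(squares_list)       # [0, 1, 4, 9, 16]
print(next(squares_gen))  # 0
print(list(squares_gen))  # [1, 4, 9, 16]
```

The last line shows that the generator picks up after the value already consumed by next().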

08:31 So now that we’ve seen the syntax to create generators using functions, or how to create generator comprehensions, let’s spend a minute or two looking at how they perform in terms of both memory and speed. The way I’m going to do that is I’m going to import sys, and this will allow us to see how much memory is being taken up by different objects. So now I will first create a list comprehension with squared numbers,

08:56 and then I’ll create a generator comprehension with the same values.

09:03 So just as a quick reminder, in the list comprehension, I used square brackets, and for the generator comprehension, I used parentheses. Now let’s see how much memory these different objects are taking up.

09:15 So, we have over 824,000 bytes for the list comprehension. Let’s see how the generator comprehension is performing.

09:25 That comes in at a very modest 128 bytes. So you can see that the generator comprehension has a much smaller memory footprint than the list comprehension.
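A comparison along these lines can be reproduced with sys.getsizeof(). The range size below is an assumption, and the exact byte counts depend on the Python version and platform, so this sketch prints the sizes rather than promising particular numbers:

```python
import sys

# Assumption: 100_000 items is an arbitrary size chosen for illustration.
nums_squared_list = [i ** 2 for i in range(100_000)]
nums_squared_gen = (i ** 2 for i in range(100_000))

# getsizeof() reports the size of the container object itself,
# not the integers that the list refers to.
list_size = sys.getsizeof(nums_squared_list)
gen_size = sys.getsizeof(nums_squared_gen)

print(f"list comprehension:   {list_size} bytes")
print(f"generator expression: {gen_size} bytes")
```

The generator's size stays the same no matter how large the range is, because no values have been produced yet.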

09:34 But let’s see how these two structures compare in terms of speed. And for that, we’re going to import cProfile.

09:50 So here we go. Summing these values with a list comprehension took 0.018 seconds. Let’s see how a generator comprehension compares.

10:08 And that took 0.031 seconds, so significantly longer but with a smaller memory footprint. So, that’s it for this video. We looked at the generator syntax.
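The profiling commands are not in the transcript. One way to run a similar comparison is sketched below; the helper function names, the range size, and the i * 2 expression are assumptions, and the timings will differ from machine to machine:

```python
import cProfile


def sum_with_list():
    # Builds the full list in memory first, then sums it.
    return sum([i * 2 for i in range(10_000)])


def sum_with_generator():
    # Feeds sum() one value at a time; no intermediate list is built.
    return sum(i * 2 for i in range(10_000))


list_profiler = cProfile.Profile()
list_total = list_profiler.runcall(sum_with_list)
list_profiler.print_stats()

gen_profiler = cProfile.Profile()
gen_total = gen_profiler.runcall(sum_with_generator)
gen_profiler.print_stats()
```

Both versions compute the same total. In the generator version's report, the generator expression shows up as thousands of separate calls, one per value produced.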

10:21 Basically, you can turn a function into a generator function by using yield where you would otherwise use return. You just have to remember that every time next() is called on the generator, it will run until it hits the yield statement, and that’s where it will stop.

10:34 And if there are several yield statements, then it will be stopping at a yield statement each time. The other thing we looked at was the generator expression.

10:43 The syntax for this is very similar to a list comprehension, except that instead of using square brackets, you’re using parentheses. And the key points here are that generators are iterable objects, they keep state between calls, that is to say that they remember where they left off.

10:57 It’s a bit like having a bookmark in a book. You don’t have to start all the way from the beginning each time. Whereas a normal function would exit when it hit return and it would start from the very beginning each time. They save memory, they have a very modest memory footprint, but you do pay a price for it in terms of speed.

11:14 So there is a tradeoff there between memory and speed, and what is the right solution for you really depends on what kind of application you’re working on, and what your use case is. In the next video, we’ll go a bit deeper into generators by looking at some advanced generator methods you can use. I’ll see you there!

MrSteve3000 on June 26, 2020

In your cProfile examples, I see you are using i * 2 instead of exponentiation. I tried the code both ways, and of course the generator expression is still a bit slower, but not as dramatically.

StevenRF on Aug. 4, 2021

It looks like next(infinite) works but next(infinite_sequence()) does not work. Both are generators, so what is the difference?

>>> def infinite_sequence():
...     num = 0
...     while True:
...         yield num
...         num += 1
... 
>>> infinite = infinite_sequence()
>>> type(infinite)
<class 'generator'>
>>> type(infinite_sequence())
<class 'generator'>
>>> next(infinite)
0
>>> next(infinite)
1
>>> next(infinite)
2
>>> next(infinite_sequence())
0
>>> next(infinite_sequence())
0
>>> next(infinite_sequence())
0
>>> next(infinite_sequence())
0
>>>

Bartosz Zaczyński RP Team on Aug. 5, 2021

@StevenRF Using the yield keyword turns your function into a generator function, which behaves differently from regular functions:

>>> def generator_function():
...     print("Hello ")
...     yield 42
...     print("world")
... 
>>> generator_function()
<generator object generator_function at 0x7fef0e59f270>

When you call such a function, it won’t execute its body but rather it will create a new generator object that encapsulates the state of iteration. It supports next() or, more formally, the iterator protocol:

>>> generator_object = generator_function()
>>> next(generator_object)
Hello
42
>>> next(generator_object)
world
Traceback (most recent call last):
  File "<input>", line 1, in <module>
    next(generator_object)
StopIteration

When you call next(generator_object), it advances the generator object by resuming the execution of the underlying generator function. Each invocation of next(generator_object) runs the code inside of the function until the next yield keyword.

So, calling next(infinite) advances the existing generator object stored in your infinite variable. On the other hand, calling next(infinite_sequence()) creates a fresh generator object every time, which is why you end up at the beginning.
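As a short illustration of that difference, using the infinite_sequence() function from the question above, compare advancing one stored generator object with creating a new one for every call:

```python
def infinite_sequence():
    num = 0
    while True:
        yield num
        num += 1


infinite = infinite_sequence()

# One generator object, advanced three times.
same_object = [next(infinite) for _ in range(3)]

# Three brand-new generator objects, each advanced once from its start.
fresh_objects = [next(infinite_sequence()) for _ in range(3)]

print(same_object)    # [0, 1, 2]
print(fresh_objects)  # [0, 0, 0]
```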
