Best Practices for Python Main Functions

Defining Main Functions in Python Rich Bibby 11:17

Now that you know the differences between how Python handles the various execution modes, it’s useful for you to know which best practices you can implement. You’ll learn about four best practices you can use to make sure that your code can serve a dual purpose:

Put most code into a function or class.
Use __name__ to control execution of your code.
Create a function called main() to contain the code you want to run.
Call other functions from main().

00:00 Now that you can see the differences in how Python handles the different execution modes, it’s useful for you to know some best practices to use. These will apply whenever you want to write code that you can run as a script and import into another module or an interactive session.

00:15 You’ll learn about four best practices to make sure that your code can serve a dual purpose. These are: one, putting most code into a function or class; two, using __name__ to control execution of your code; three, using a function called main() to contain the code you want to run; and four, calling other functions from main().

00:39 Let’s break these down, starting with putting most code into a function or class. Remember that the Python interpreter executes all the code in a module when it imports the module.

00:49 Sometimes the code you write will have side effects that you want the user to control, such as running a computation that takes a long time, or writing to a file on the disk, or printing information that would clutter the user’s terminal.

01:02 In these cases, you want the user to control triggering the execution of this code, rather than letting the Python interpreter execute the code when it imports the module.

01:12 Therefore, the best practice here is to include most code inside of a function or a class. This is because when the Python interpreter encounters the def or class keywords, it only stores the function and method definitions for later use, and it doesn’t execute the code inside them until you call them.
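As a minimal sketch of this point, the snippet below records which function and method bodies have actually run. The names executed, expensive_step(), and Report are hypothetical and exist only for this illustration.

```python
executed = []  # records which bodies have actually run


def expensive_step():
    # This body runs only when expensive_step() is called.
    executed.append("expensive_step")
    return "result"


class Report:
    def build(self):
        # This method body runs only when build() is called on an instance.
        executed.append("Report.build")
        return "report"


# Snapshot taken right after the definitions: no body has run yet.
after_definition = list(executed)

expensive_step()
Report().build()
after_calls = list(executed)

print(after_definition)  # []
print(after_calls)  # ['expensive_step', 'Report.build']
```

Defining the function and the class leaves the list empty. Entries appear only after the explicit calls.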

01:29 Let’s walk through an example. Here, we have a file that we’ve called best_practices_1.py to demonstrate this idea. In this code, you first import sleep() from the time module. sleep() pauses the interpreter for however many seconds you give as an argument, which lets this example simulate a function that takes a long time to run.

01:52 Next, you use print() to print a sentence describing the purpose of this code. Then, you define a function called process_data() that does five things. Firstly, it prints some output to tell the user that the data processing is starting.

02:05 Then, it modifies the input data. Next, it pauses the execution for 3 seconds using the sleep() function. Then it prints some output to tell the user that the processing is finished, and finally, it returns the modified data.
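A file matching this description might look roughly like the sketch below. The exact message strings and the text appended to the data are assumptions, because the narration only describes them in general terms.

```python
from time import sleep

# Module-level code: runs whenever the file is executed or imported.
print("This is my file to demonstrate best practices.")


def process_data(data):
    # Tell the user that processing is starting.
    print("Beginning data processing...")
    # Assumed modification: append a short suffix to the input string.
    modified_data = data + " that has been modified"
    # Pause for 3 seconds to simulate a long-running computation.
    sleep(3)
    # Tell the user that processing is finished.
    print("Data processing finished.")
    return modified_data
```

Nothing in this file calls process_data(), so only the module-level print() produces output.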

02:19 Now, let’s see what happens when you execute this file as a script on the command line. The Python interpreter will execute the from time import sleep and the print() lines that are outside the function definition, and then it will create the definition of the function called process_data().

02:35 Then, the script will exit without doing anything further, because the script does not have any code that executes process_data(). This is a result of running this file as a script.

02:45 The output that we can see here is the result of the first print() call. Notice that importing from time and defining process_data() produce no output.

02:55 Specifically, the outputs of the calls to print() that are inside the definition of process_data() are not printed. When you import this file in an interactive session or another module, the Python interpreter will perform exactly the same steps as when it executes the file as a script. Once the Python interpreter imports the file, you can use any variables, classes, or functions defined in the module you’ve imported. To demonstrate this, we’ll use the Python interactive interpreter.

03:24 We’ll start the interactive interpreter and then we’ll type import best_practices_1. The only output we can see is from the first print() call defined outside process_data(). Importing from time and defining process_data() produced no output, just like when you executed the code from the command line.

03:44 So, what if you want process_data() to execute when you run the script from the command line, but not when the Python interpreter imports the file? Well, you can use the if __name__ == "__main__" idiom to determine the execution context and conditionally run process_data() only when __name__ is equal to "__main__".

04:03 You can either modify your file or, as I’ve done here, create a new file called best_practices_2 that adds this code to the script. In this code, we’ve added a conditional statement that checks the value of __name__.

04:17 This conditional will evaluate to True when __name__ is equal to the string "__main__".

04:23 Remember that the special value of "__main__" for the __name__ variable means the Python interpreter is executing your script and not importing it. Inside the conditional block, you have added four lines of code. In lines 13 and 14, you’re creating a variable called data that stores the data that you’ve acquired from the web and prints it. In line 15, you’re processing this data. And in line 16, you’re printing the modified data.
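Here is a sketch of what best_practices_2 could look like with the conditional block added. The sample data string and the messages are assumptions, and the line numbers mentioned in the narration refer to the file shown on screen, not to this sketch.

```python
from time import sleep

print("This is my file to demonstrate best practices.")


def process_data(data):
    print("Beginning data processing...")
    modified_data = data + " that has been modified"  # assumed modification
    sleep(3)  # simulate a long-running computation
    print("Data processing finished.")
    return modified_data


# Runs only when the file is executed as a script, not when it is imported.
if __name__ == "__main__":
    data = "My data read from the Web"  # assumed sample data
    print(data)
    modified_data = process_data(data)
    print(modified_data)
```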

04:53 Now, let’s run our new script from the command line to see how the output will change.

05:02 First, the output shows the results of the print() call outside of process_data(). After that, the value of data is printed.

05:10 This happened because the variable __name__ has the value of "__main__" when the Python interpreter executes the file as a script, so the conditional statement evaluated to True. Next, the script called our process_data() function and passed data in for modification.

05:27 When process_data() executes, it prints some status messages to the output. Finally, the value of modified_data is printed. Now let’s check what happens when you import the file from the interactive interpreter or another module.

05:41 Notice that you get the same behavior as before you added the conditional statement at the end of the file. This is because the __name__ variable had the value "best_practices_2", so Python did not execute the code inside the block, including process_data(), because the conditional statement evaluated to False.
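If you’d like to see both execution modes side by side without switching between a terminal and an interactive session, the sketch below is one way to do it. It writes a tiny stand-in module to a temporary folder, imports it with importlib, and then runs it with runpy.run_path() using run_name="__main__", which approximates running it from the command line. The module name demo_module and the helper functions are hypothetical.

```python
import importlib.util
import io
import runpy
import tempfile
from contextlib import redirect_stdout
from pathlib import Path

# Hypothetical stand-in module, kept tiny so the demo runs quickly.
MODULE_SOURCE = '''
print("module-level code runs, __name__ is", repr(__name__))

if __name__ == "__main__":
    print("guarded block runs")
'''


def capture_import(path):
    """Import the file as a module named after its stem and return what it printed."""
    spec = importlib.util.spec_from_file_location(path.stem, path)
    module = importlib.util.module_from_spec(spec)
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        spec.loader.exec_module(module)
    return buffer.getvalue()


def capture_script_run(path):
    """Execute the file with __name__ set to "__main__" and return what it printed."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        runpy.run_path(str(path), run_name="__main__")
    return buffer.getvalue()


with tempfile.TemporaryDirectory() as folder:
    module_path = Path(folder) / "demo_module.py"
    module_path.write_text(MODULE_SOURCE)
    imported_output = capture_import(module_path)
    script_output = capture_script_run(module_path)

print("When imported:")
print(imported_output)
print("When run as a script:")
print(script_output)
```

The import reports the module’s own name and skips the guarded block, while the script-style run reports "__main__" and executes it.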

06:00 Now you’re able to write Python code that can be run from the command line as a script and imported without unwanted side effects. Next, you’re going to learn about how to write your code to make it easy for other Python programmers to follow what you mean. Many languages, such as C, Java, and several others, define a special function that must be named main(), which the operating system automatically calls when it executes the compiled program.

06:23 This function is often called the entry point because it’s where execution enters the program. By contrast, Python does not have a special function that serves as the entry point to a script. You can actually give the entry point in a Python script any name you want.

06:36 Although Python does not assign any significance to a function called main(), the best practice here is to name the entry point function main() anyway.

06:45 That way, any other programmers who read your script immediately know that this function is the starting point of the code that accomplishes the primary task of the script. In addition, main() should contain any code that you want to run when the Python interpreter executes the file.

07:00 This is better than putting the code directly into the conditional block because a user can reuse main() if they import your module. Now, let’s check the third version of our script to help understand this concept. In this version, we’ve added the definition of main() that includes the code that was previously inside the conditional block, and then we’ve changed the conditional block so that it executes main().
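A sketch of this third version might look like the following, with the body of the old conditional block moved into main(). As before, the sample data and message strings are assumptions.

```python
from time import sleep

print("This is my file to demonstrate best practices.")


def process_data(data):
    print("Beginning data processing...")
    modified_data = data + " that has been modified"  # assumed modification
    sleep(3)  # simulate a long-running computation
    print("Data processing finished.")
    return modified_data


def main():
    # The code that used to live directly inside the conditional block.
    data = "My data read from the Web"  # assumed sample data
    print(data)
    modified_data = process_data(data)
    print(modified_data)


# The conditional block now only decides whether to call main().
if __name__ == "__main__":
    main()
```

Because the work now lives in a named function, someone who imports the module can call main() themselves.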

07:23 If you run this code as a script or import it, you’ll get the same output as before. Another common practice in Python is to have the main() function execute other functions, rather than including the task-accomplishing code in main().

07:36 This is especially useful when you compose your overall task from several smaller sub-tasks that can execute independently. For example, you may have a script that does the following: reads data from a source that could be a database, a file on disk, or a web API; processes the data; and finally, writes the processed data to another location.

07:57 If you implement each of these sub-tasks in separate functions, then it’s easy for you or the user to reuse a few of the steps and ignore the ones you don’t want. Then you can create a default workflow in main(), and you can have the best of both worlds.

08:11 Whether to apply this practice to your code is a judgment call on your part. Splitting the work into several functions makes reuse easier but increases the difficulty for someone else trying to interpret your code because they have to follow several jumps in the flow of the program.

08:26 So let’s look at the final version of our script called best_practices_4. Stepping through this code, the first nine lines of the file have the same content that they had before.

08:37 The second function definition on line 11 creates and returns some sample data, and the third function definition on line 16 simulates writing the modified data to a database. On line 20, main() is defined. In this example, we’ve modified main() so that it calls the data reading, data processing, and data writing functions in turn. First, the data is created from read_data_from_web().

09:04 This data is passed to process_data(), which returns the modified_data. Finally, modified_data is passed into write_data_to_database().

09:13 The last two lines of the script are the conditional block that checks __name__ and runs main() if the statement is True. Now let’s run the whole processing pipeline from the command line. You can see that the Python interpreter executed main(), which executed read_data_from_web(), process_data(), and write_data_to_database().
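Put together, best_practices_4 might look something like this sketch. The function names come from the narration, but the bodies, messages, and sample data are assumptions, and the line numbers mentioned above refer to the file shown on screen rather than to this sketch.

```python
from time import sleep

print("This is my file to demonstrate best practices.")


def process_data(data):
    print("Beginning data processing...")
    modified_data = data + " that has been modified"  # assumed modification
    sleep(3)  # simulate a long-running computation
    print("Data processing finished.")
    return modified_data


def read_data_from_web():
    # Stand-in for a real download: create and return some sample data.
    print("Reading data from the Web")
    data = "Data from the web"  # assumed sample data
    return data


def write_data_to_database(data):
    # Stand-in for a real database write: print what would be stored.
    print("Writing data to a database")
    print(data)


def main():
    # Default workflow: read, process, then write.
    data = read_data_from_web()
    modified_data = process_data(data)
    write_data_to_database(modified_data)


if __name__ == "__main__":
    main()
```

Each step is a separate function, so main() only describes the default order in which they run.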

09:34 However, you can also import this file and reuse process_data() for a different input data source. First, import the file under the shortened name bp for this code.

09:46 The import process caused the Python interpreter to execute all of the lines of code in the file. Now that bp is imported, you can use those imported functions, create a variable named data, and set its value to the string "Data from a file" instead of reading the data from the web.

10:06 Then, reuse the process_data() and write_data_to_database() functions from the file. In this case, you take advantage of reusing the code instead of defining all of the logic in main(). That was quite a lot to take in, so here’s a recap of the four key best practices about main() in Python that you just saw. Number one, put code that takes a long time to run or has other effects on the computer in a function or class so that you can control exactly when that code is executed.

10:39 Two, use the different values of __name__ to determine the context and change the behavior of your code with a conditional statement. Three, you should name your entry point function main() in order to communicate the intention of the function, even though Python does not assign any special significance to a function called main(). And lastly, number four, if you want to reuse functionality from your code, define the logic in functions outside main() and call those functions from within main(). In the next lesson, you will review everything you’ve learned in this course.

Gregory Klassen on March 10, 2020

Is it tmux or some other terminal setup you use in your presentation?

Dan Bader RP Team on March 10, 2020

@Gregory: I believe this is the built-in terminal of the VS Code editor. We have a dedicated course on it here :)

mattc on March 11, 2020

What happens if you import more than one module with a main() function? Does the namespace of the module keep the interpreter from getting confused?

Dima on March 11, 2020

Very often on the top of a file one defines module-level variables. Does it mean they become available from the scope of the importing script? Can they override each other?

pallavlearn on March 18, 2020

Do you record the audio and video separately or is it in one go?

Zarata on Nov. 12, 2020

I personally found a point of (picayune(??)) confusion here. There are at least two statements made that I believed were probably technically true, but without further realization on my part seemed to contradict other points that were made. “Although Python does not assign any significance to a function called ‘main’ …” and “We can actually give the entry point in a Python script any name we want …”

My problem was that language spec. quoted in the previous portion of this module on the face did imply something “special” about the letters “main”: “A module’s __name__ is set equal to ‘__main__’ when read from standard input, a script, or from an interactive prompt. docs.python.org/3/library/__main__.html”

Long short: the missing bit for me was my finally realizing that “__main__” is a special STRING VALUE, NOT another special variable symbol (like __name__). The spec is saying a special String value “__main__” is occasionally assigned by the language to __name__. There is not some (unexplained) mechanism that changes the value of a variable __main__ to the name of one’s entry point function.

Therefore, one may write a “guarding conditional” (so called in another reference) for an arbitrarily-named entry point “foo” as:

if __name__ == "__main__":  # "__main__" is a string value, NOT a variable symbol
    foo()  # A non-conventionally-named entry point function

I suppose I’ll leave this embarrassing bit, just in case anyone else who tends to trip over their own shoe strings has a similar “huh??” moment.
