argparse

Command Line Interfaces in Python | Liam Pulsifer | 09:06

Here are resources for more information about argparse, SHA-1, and hashes:

00:00 In this lesson, I’ll demonstrate how you can use argparse to build command line interfaces. I’ll be showing you how to replicate a Linux utility called sha1sum, which computes the SHA-1 hash of files and other data.
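For reference, here is a minimal sketch of what a SHA-1 hash looks like when computed with Python’s standard hashlib module (this snippet is illustrative and not part of the lesson’s file). The digest is 160 bits, printed as 40 hexadecimal characters, and feeding the same bytes in pieces gives the same result.

```python
import hashlib

data = b"abc"
digest = hashlib.sha1(data).hexdigest()
print(digest)       # a9993e364706816aba3e25717850c26c9cd0d89d
print(len(digest))  # 40 hex characters = 160 bits

# Hashing the same bytes incrementally produces the same digest.
h = hashlib.sha1()
h.update(b"a")
h.update(b"bc")
print(h.hexdigest() == digest)  # True
```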

00:16 So, let me take you through how that utility works.

00:20 I’m here in my macOS terminal, but if you’re on any kind of Linux or Unix terminal, such as the Windows Subsystem for Linux (for example, with Ubuntu) or the built-in terminal on a Linux system, then this demo should work fine for you. So, sha1sum, and I’m going to pass the -h flag so you can see the usage and the options.

00:39 The usage is quite simple: sha1sum, followed by the options listed here, and then any number of files. You can pass in one file, two files, or none at all (with no files, it reads standard input instead), and it will calculate the SHA-1 hash of each one.

00:55 I’ll demonstrate it here. So, sha1sum main.py main.c, and as you can see, for each file it prints the hash followed by the name of the file. Now, there’s another mode here, checking files, which I won’t implement in Python, but I’ll show you how it works because I think it’s pretty cool.

01:18 You can run sha1sum main.py main.c and redirect that output into a file, for example one I’ll call m.test. Later, if you want to check that the contents of those files haven’t changed, you can run sha1sum --check m.test, and it will recalculate those hashes and compare them with the ones saved last time. So that’s a cool feature, but it’s not one I’m going to implement in the Python code.
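The --check mode isn’t implemented in this lesson, but the idea fits in a few lines. The sketch below is a hypothetical, simplified version: check() and sha1_of_file() are illustrative names, and it assumes each saved line is a hex digest, two spaces, and a file name, ignoring GNU details such as the binary-mode "*" marker and escaped file names.

```python
import hashlib
import os
import tempfile


def sha1_of_file(path: str) -> str:
    """Hash a file's raw bytes (hypothetical helper)."""
    with open(path, "rb") as f:
        return hashlib.sha1(f.read()).hexdigest()


def check(lines: list[str]) -> dict[str, bool]:
    """Hypothetical sketch of sha1sum --check: compare saved vs. current digests."""
    results = {}
    for line in lines:
        expected, name = line.rstrip("\n").split("  ", 1)
        results[name] = sha1_of_file(name) == expected
    return results


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "main.py")
    with open(path, "wb") as f:
        f.write(b"print('hi')\n")
    saved = [f"{sha1_of_file(path)}  {path}"]  # like `sha1sum main.py > m.test`
    before = check(saved)
    with open(path, "ab") as f:  # change the file after saving its hash
        f.write(b"# edited\n")
    after = check(saved)

print(before[path], after[path])  # True False
```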

01:45 One feature I will replicate in the Python code, though, is passing in a hyphen, which lets you type some data to hash on the standard input. When you’re done, you signal end-of-file with Control + D on Linux and macOS, or Control + Z followed by Enter on Windows, and it will hash whatever you entered.

02:06 So I typed Some stuff to hash, and the SHA-1 hash of that content is this long hex number here. So that’s how sha1sum works, and you can even pass it shell wildcards: I can say main.* and it will do exactly the same thing, because the shell expands main.* into the matching file names before sha1sum runs.

02:23 But that’s generally the idea: you just pass in some number of files, or this hyphen to indicate standard input, and it will just calculate the SHA1 hash.

02:30 So, let’s see how you can do that using argparse in Python.

02:36 All right. So, I have a mostly written version of sha1sum using argparse, but I’ve left out the most important parts of the argparse functionality, and I’ll walk you through writing those in just a minute.

02:48 Well, let’s actually start at the bottom of this file and take a look at the main() function. So of course, if __name__ == "__main__", then I’ll just run main(), and what main() does is it creates a parser.

03:01 This is the key to understanding how argparse works: you instantiate a parser object, and then you get the parsed arguments by calling parser.parse_args(), which returns a Namespace object. If your parser is set up correctly, all of your arguments will be attributes on that object, so they’re easy to access and work with.
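To see what parse_args() returns, you can hand it an explicit list instead of letting it read sys.argv[1:]. This minimal sketch assumes a single positional "files" argument like the one added later in this lesson:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("files", nargs="*")

# Passing a list makes the demo repeatable; normally parse_args() reads sys.argv[1:].
args = parser.parse_args(["main.py", "main.c"])
print(type(args).__name__)  # Namespace
print(args.files)           # ['main.py', 'main.c']
print(vars(args))           # {'files': ['main.py', 'main.c']}
```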

03:24 So, this clause says if not args.files: output_sha1sum(process_stdin()). In other words, if no files are passed in (if the user just runs sha1sum), you read the standard input and hash that. Otherwise, you go through each file in args.files: if the file is a hyphen, you call output_sha1sum() on process_stdin(); otherwise, you try to call output_sha1sum() on the processed contents of that file, and if there’s an error, you print a short usage message along with the error.
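The dispatch logic in main() can be sketched as one self-contained function. This is an illustrative version, not the lesson’s file: it takes argv and a binary stdin stream as parameters (assumptions made so it can run without a terminal), and it folds the no-files case into the loop by treating it as a single "-".

```python
import argparse
import hashlib
import io
import os
import sys
import tempfile
from typing import BinaryIO, List, Optional


def main(argv: Optional[List[str]] = None,
         stdin: Optional[BinaryIO] = None) -> List[str]:
    """Sketch of the dispatch in main(); returns the lines it printed."""
    parser = argparse.ArgumentParser()
    parser.add_argument("files", nargs="*")
    args = parser.parse_args(argv)
    if stdin is None:
        stdin = sys.stdin.buffer  # raw bytes from standard input
    printed = []
    # No files at all behaves like a single "-": hash standard input.
    for name in args.files or ["-"]:
        try:
            if name == "-":
                data = stdin.read()
            else:
                with open(name, "rb") as f:
                    data = f.read()
        except OSError as err:
            # Report the problem on stderr and move on to the next file.
            print(f"{parser.prog}: {name}: {err.strerror}", file=sys.stderr)
            continue
        line = f"{hashlib.sha1(data).hexdigest()}  {name}"
        print(line)
        printed.append(line)
    return printed


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "abc.txt")
    with open(path, "wb") as f:
        f.write(b"abc")
    missing = os.path.join(tmp, "missing.txt")
    lines = main([path, "-", missing], io.BytesIO(b""))
```

Here the missing file produces an error message on stderr, and the other two inputs are hashed; empty standard input hashes to the well-known SHA-1 of zero bytes.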

04:02 So, let’s go back up. init_argparse() is something I’ll write out in just a second, but otherwise what’s happening here is simple. sha1sum() uses hashlib to compute the SHA-1 hash of whatever bytes you pass it, and process_stdin() and process_file() get those bytes, either from a file or from the joined contents of the

04:28 standard input. So everything on the standard input gets read in and converted to bytes. And then output_sha1sum() just prints the result.
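Here is a minimal sketch of those helpers. The function names come from the lesson, but their exact signatures (the optional stream and name parameters in particular) are assumptions. process_stdin() follows the "join everything and convert to bytes" approach described above; going through text this way assumes UTF-8 input, whereas reading sys.stdin.buffer gives you the exact raw bytes that sha1sum itself would hash.

```python
import hashlib
import io
import sys
from typing import Optional, TextIO


def sha1sum(data: bytes) -> str:
    """Return the SHA-1 hex digest of some bytes."""
    return hashlib.sha1(data).hexdigest()


def process_file(path: str) -> bytes:
    with open(path, "rb") as f:  # binary mode: hash the exact bytes on disk
        return f.read()


def process_stdin(stream: Optional[TextIO] = None) -> bytes:
    """Join every line of (text) standard input and encode it to bytes."""
    stream = sys.stdin if stream is None else stream
    return "".join(stream).encode("utf-8")


def output_sha1sum(data: bytes, name: str = "-") -> None:
    print(f"{sha1sum(data)}  {name}")


# A StringIO stands in for the terminal here.
data = process_stdin(io.StringIO("first line\nsecond line\n"))
output_sha1sum(data)
```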

04:36 There’s not much more to it. So everything really hinges on initializing the argument parser here. So, how does that work? Here’s what you do.

04:47 You’re going to instantiate a parser, which, as you can tell from the type hint here, is an argparse.ArgumentParser,

04:57 and there are many different options you can pass to it. The two I’m going to use are usage and description. For usage, I’ll start with %(prog)s, a bit of formatting syntax that argparse replaces with the name of the program, followed by [OPTIONS] and then [FILES].

05:16 And I won’t worry too much about the specific formatting of this usage message, but that gives you a general idea of how it works. And then you can also put in a description, which is always useful to have.

05:25 So you’ll say "Calculates the sha1 hash of some number of files or the standard input". This is how you initialize this parser, and then what you need to do is add arguments to the parser, so you’re going to call parser.add_argument().
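A minimal sketch of the constructor call described above. The prog value is fixed here only so the printed usage line is predictable (an assumption for the demo; normally argparse derives it from sys.argv[0]), and %(prog)s in usage is replaced with that name:

```python
import argparse

parser = argparse.ArgumentParser(
    prog="sha1sum_argparse.py",           # fixed for a predictable demo
    usage="%(prog)s [OPTIONS] [FILES]",   # %(prog)s becomes the program name
    description="Calculates the sha1 hash of some number of files or the standard input",
)
usage = parser.format_usage()
print(usage, end="")  # usage: sha1sum_argparse.py [OPTIONS] [FILES]
```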

05:41 Normally, you pass in one or more names for the argument,

05:48 so "-v" and "--version" are the first ones I’m going to add, because that argument is really simple. Then you can specify an action to take when the argument is used,

05:59 and then other keyword arguments that configure that action, in this case the string to print. So for the version keyword argument, I’ll use f"{parser.prog}" followed by the version number "1.0.0", because this isn’t a real application, so the version number is made up.
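Here is a sketch of the version argument on its own. The "version" action prints the version string and then exits with status 0, so this demo captures stdout and catches SystemExit; prog is again set explicitly only to make the output predictable.

```python
import argparse
import contextlib
import io

parser = argparse.ArgumentParser(prog="sha1sum_argparse.py")
parser.add_argument(
    "-v", "--version",
    action="version",  # print the version string, then exit
    version=f"{parser.prog} version 1.0.0",
)

buf = io.StringIO()
exit_code = None
with contextlib.redirect_stdout(buf):
    try:
        parser.parse_args(["--version"])
    except SystemExit as exc:  # the version action always exits
        exit_code = exc.code
print(buf.getvalue(), end="")  # sha1sum_argparse.py version 1.0.0
print(exit_code)               # 0
```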

06:19 Now I can add another argument, called "files", with nargs="*", which means zero or more arguments.
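A quick sketch of how nargs="*" behaves: with no positional arguments the list is empty (which is what lets if not args.files detect the stdin case), and a lone "-" is treated as an ordinary positional value rather than an option.

```python
import argparse

parser = argparse.ArgumentParser(prog="sha1sum_argparse.py")
parser.add_argument("files", nargs="*")  # "*" means zero or more values

none_given = parser.parse_args([]).files
some_given = parser.parse_args(["main.py", "-", "main.c"]).files
print(none_given)  # []
print(some_given)  # ['main.py', '-', 'main.c']
```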

06:32 One thing that can be confusing about the .add_argument() approach is that it’s not always clear what each keyword argument passed to it actually means, so below this video I’m going to link a much more in-depth argparse tutorial, which will give you more information about that sort of thing.

06:56 And that’s actually all you need to get this to run, because argparse is a really powerful library that does a lot of this work for you.

07:05 You just add the "--version" argument with an action to perform when it’s included, and then an argument for "files". Since "files" has no leading hyphens, it’s a positional argument, and you can pass any number of them. The -h/--help option is already taken care of for you.
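You can confirm that -h/--help comes for free: ArgumentParser registers it by default, and its add_help=False parameter turns it off. In this minimal sketch, exit_status() is a hypothetical helper that runs parse_args(), hides its output, and reports the exit status. -h prints help and exits with 0 on the default parser, while the parser without it rejects -h as an unrecognized argument and exits with 2.

```python
import argparse
import contextlib
import io
from typing import List, Optional


def exit_status(parser: argparse.ArgumentParser, argv: List[str]) -> Optional[int]:
    """Hypothetical helper: run parse_args(), swallow output, return the exit status."""
    sink = io.StringIO()
    with contextlib.redirect_stdout(sink), contextlib.redirect_stderr(sink):
        try:
            parser.parse_args(argv)
        except SystemExit as exc:
            return exc.code
    return None  # parse_args() returned normally


with_help = argparse.ArgumentParser(prog="demo")  # add_help=True by default
without_help = argparse.ArgumentParser(prog="demo", add_help=False)

print(exit_status(with_help, ["-h"]))     # 0 (help was printed)
print(exit_status(without_help, ["-h"]))  # 2 (unrecognized argument)
```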

07:26 Now, let’s watch it run. python sha1sum_argparse.py, and then, first, I’ll just run it with no arguments at all,

07:35 and so it should be taking some stuff from stdin—and I’ll get out of that—

07:42 and it does generate a hash. Now I can also run python sha1sum_argparse.py on this file itself, so I can make the script hash its own source code, which is kind of cool. And again, that produces a hash.

07:58 So, let’s run the real sha1sum on sha1sum_argparse.py. As you can see, it has exactly the same values. That’s pretty darn cool.

08:07 And let’s try it with another file in here: the demonstration file I’ll be using next. And it seems to work pretty darn well.

08:17 And let’s add a hyphen just to make sure that this all works, too. So, it gets the hash of the first file and then it allows me to enter stdin, and so it gets that hash and then it gets the hash of seq_getopt.py.

08:32 So this is working great, and it’s amazing how much flexibility and power you get from just a few lines of argparse code. Everything else here is program logic you’d need regardless of which command line interface library you use. With argparse, all you really do is initialize the parser, add a few arguments, and call parse_args(), and then you have an object that contains all of your arguments.

09:00 In the next lesson, I’ll show you a bit of a different approach with getopt.
