Reading Information From ZIP Files

00:00 Reading Information From ZIP Files. In this section of the course, you’ll see a number of methods for reading information from ZIP files, starting with how to read metadata.

00:10 You’ve already put .printdir() into action. It’s a useful method that you can use to list the content of your ZIP files quickly. Along with .printdir(), the ZipFile class provides several handy methods for extracting metadata from existing ZIP files. On-screen, you can see a summary of these methods: .getinfo() returns a ZipInfo object, .infolist() returns a list of ZipInfo objects, and .namelist() returns a list holding the names of all the member files.

00:39 With these three tools, you can retrieve a lot of useful information about the content of your ZIP files. On-screen, you can see .getinfo() in use.

01:01 .getinfo() takes a member file as an argument and returns a ZipInfo object with information about it. ZipInfo objects have several attributes that allow you to retrieve valuable information about the target member file. For example, .file_size and .compress_size hold the size, in bytes, of the original and compressed files, respectively.

01:22 The class also has some other useful attributes, such as .filename and .date_time, which hold the member file’s name and its last modification date and time.
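The on-screen code isn't part of this transcript, so here is a minimal sketch of .getinfo() in use. It builds a small archive in memory so that it runs on its own. The member name hello.txt and its content are assumptions made for the demo, not necessarily the course's sample file.

```python
import io
import zipfile

# Build a tiny archive in memory so the example is self-contained.
# The member name and content are made up for this demo.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w") as archive:
    archive.writestr("hello.txt", "Hello, World!\n")

with zipfile.ZipFile(buffer, mode="r") as archive:
    info = archive.getinfo("hello.txt")  # Returns a ZipInfo object

print(info.filename)       # Name of the member file
print(info.file_size)      # Size of the original data, in bytes
print(info.compress_size)  # Size of the data as stored in the archive
print(info.date_time)      # (year, month, day, hour, minute, second)
```

Note that .date_time is a plain tuple of six integers rather than a datetime object.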

01:34 By default, ZipFile doesn’t compress the input files when it adds them to the archive. That’s why the size and the compressed size are the same in the examples seen previously.
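You can check this default yourself with a short sketch like the following. It writes highly repetitive data, which would normally compress well, without passing any compression argument. The member name spam.txt is an assumption for the demo.

```python
import io
import zipfile

# Repetitive data that would shrink a lot if it were compressed
data = b"spam " * 1000

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w") as archive:  # No compression argument
    archive.writestr("spam.txt", data)

with zipfile.ZipFile(buffer, mode="r") as archive:
    info = archive.getinfo("spam.txt")

# With the default settings, the member is stored as-is
is_stored = info.compress_type == zipfile.ZIP_STORED
print(info.file_size, info.compress_size, is_stored)
```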

01:44 You’ll look at compressing files and directories later on in the course. With .infolist(), you can extract information from all the files in a given archive.

01:55 On-screen is an example that uses this method to generate a minimal report with information about all the member files in your archive.

02:12 The for loop iterates over the ZipInfo objects from .infolist(), retrieving the filename, the last modification date, the normal size, and the compressed size of each member file. In this example, you use datetime to format the date in a human-readable way.
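A report along these lines could be sketched as follows. The archive is created in memory with two made-up member files, and the exact report layout is an assumption rather than the one shown in the video.

```python
import datetime
import io
import zipfile

# Demo archive with two hypothetical member files
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w") as archive:
    archive.writestr("hello.txt", "Hello, World!\n")
    archive.writestr("lorem.md", "Lorem ipsum dolor sit amet.\n")

report = []
with zipfile.ZipFile(buffer, mode="r") as archive:
    for info in archive.infolist():
        # .date_time is a six-item tuple, so it unpacks into datetime()
        modified = datetime.datetime(*info.date_time)
        report.append(
            f"Filename: {info.filename}\n"
            f"Modified: {modified:%Y-%m-%d %H:%M:%S}\n"
            f"Normal size: {info.file_size} bytes\n"
            f"Compressed size: {info.compress_size} bytes"
        )

print("\n\n".join(report))
```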

02:42 If you just need to perform a quick check on a ZIP file and list the names of its member files, then you can use .namelist().

03:01 Because the filenames in this output are valid arguments to .getinfo(), you can combine these two methods to retrieve information about selected member files only.
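As a rough illustration of combining the two methods, the sketch below lists all member names and then looks up the size of only the .txt members. The file names and the filtering rule are assumptions chosen for the demo.

```python
import io
import zipfile

# Demo archive with hypothetical member files
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w") as archive:
    archive.writestr("hello.txt", "Hello, World!\n")
    archive.writestr("lorem.md", "Lorem ipsum dolor sit amet.\n")
    archive.writestr("notes.txt", "Remember the milk.\n")

with zipfile.ZipFile(buffer, mode="r") as archive:
    names = archive.namelist()
    # Each name is a valid argument to .getinfo()
    txt_sizes = {
        name: archive.getinfo(name).file_size
        for name in names
        if name.endswith(".txt")
    }

print(names)
print(txt_sizes)
```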

03:13 Sometimes you have a ZIP file and need to read the content of a given member file without extracting it. To do that, you can use .read(). This method takes a member file’s name and returns that file’s content as bytes.

03:33 To use .read(), you need to open the ZIP file for reading or appending. Note that .read() returns the content of the target file as a stream of bytes. In this example, you use .split() to split the stream into lines using the line feed character "\n" as a separator.

03:50 Because .split() is operating on a bytes object, you need to add a leading b to the string used as an argument. ZipFile’s .read() method also accepts a second positional argument called pwd.
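Here is a small sketch of reading a member file and splitting its content into lines. The two-line hello.txt is a made-up stand-in for the course's sample file.

```python
import io
import zipfile

# Demo archive with a hypothetical two-line text file
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w") as archive:
    archive.writestr("hello.txt", "Hello, World!\nWelcome to ZIP files!\n")

with zipfile.ZipFile(buffer, mode="r") as archive:
    content = archive.read("hello.txt")  # A bytes object, not a str

# The separator must be bytes too, hence the leading b
lines = content.split(b"\n")
print(lines)

# Decode when you need regular text instead of bytes
text = content.decode("utf-8")
print(text)
```

Because the content ends with a line feed, the list returned by .split() ends with an empty bytes object.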

04:07 This argument allows you to provide a password for reading encrypted files. To try this feature, you can rely on the file that you downloaded with the materials for this course.

04:31 First, you provide the password secret to read the encrypted file. The pwd argument accepts values of the bytes type. As you can see here, if you call .read() on an encrypted file without providing the required password, then you get a RuntimeError.
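The pattern can be sketched with a small helper. The function read_member() is hypothetical and not part of zipfile. Because zipfile can't create encrypted archives, the demo archive below is unencrypted, so the password is simply ignored here. With an encrypted archive such as the one from the course materials, the same call would need the correct password.

```python
import io
import zipfile

def read_member(archive, name, password=None):
    """Return a member's content, or None if zipfile raises RuntimeError.

    Hypothetical helper: accepts the password as a str and encodes it,
    because the pwd argument only accepts bytes.
    """
    pwd = password.encode("utf-8") if password is not None else None
    try:
        return archive.read(name, pwd=pwd)
    except RuntimeError as error:
        # Raised, for example, when an encrypted member needs a password
        print(f"Could not read {name!r}: {error}")
        return None

# Unencrypted demo archive, so pwd has no effect in this run
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w") as archive:
    archive.writestr("hello.txt", "Hello, World!\n")

with zipfile.ZipFile(buffer, mode="r") as archive:
    content = read_member(archive, "hello.txt", password="secret")

print(content)
```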

04:59 Note that Python’s zipfile supports decryption. However, it doesn’t support the creation of encrypted ZIP files. That’s why you’d need to use an external file archiver to encrypt your files.

05:11 Some popular file archivers include 7z and WinRAR for Windows, Ark and GNOME Archive Manager for Linux, and Archiver and Keka for macOS. For large encrypted ZIP files, keep in mind that the decryption operation can be extremely slow because it’s implemented in pure Python. In such cases, consider using a specialized program to handle your archives instead of using zipfile. If you regularly work with encrypted files, then you may want to avoid providing the decryption password every time you call .read() or another method that accepts a pwd argument.

05:51 If that’s the case, you can use ZipFile.setpassword() to set a global password. With .setpassword(), you just need to provide the password once. ZipFile then uses that single password for decrypting all of the member files.

06:25 In contrast, if you have ZIP files with different passwords for individual member files, then you need to provide the specific password for each file using the pwd argument of .read().

06:47 Here, you use secret1 as a password to read hello.txt

07:01 and secret2 to read

07:14 A final detail to consider is that when you use the pwd argument, you’re overriding whatever archive-level password you may have set with .setpassword().
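The two ways of supplying a password can be sketched side by side. As before, the demo archive is unencrypted because zipfile can't create encrypted files, so both passwords are ignored in this run and the code only shows the shape of the calls. The member names and contents are made up for the demo.

```python
import io
import zipfile

# Unencrypted demo archive with hypothetical member files
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w") as archive:
    archive.writestr("hello.txt", "Hello, World!\n")
    archive.writestr("notes.txt", "Remember the milk.\n")

with zipfile.ZipFile(buffer, mode="r") as archive:
    # Archive-level default password; must be bytes
    archive.setpassword(b"secret1")

    # No pwd argument: an encrypted member would use the default password
    hello = archive.read("hello.txt")

    # Explicit pwd: takes precedence over the default for this call only
    notes = archive.read("notes.txt", pwd=b"secret2")

print(hello)
print(notes)
```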

07:24 If you call .read() on a ZIP file that uses an unsupported compression method, this raises a NotImplementedError. You’ll also get an error if the required compression module isn’t available in your Python installation. In the next section of the course, you’ll see some other ways of opening and reading the contents of ZIP files.
