Compressing and Extracting Member Files

Manipulating ZIP Files With Python Darren Jones 06:19

00:01 Compressing and Extracting Member Files. If your files are taking up too much space, then you might consider compressing them. Python’s zipfile supports a few popular compression methods. However, the module doesn’t compress your files by default.

00:15 If you want to make your files smaller, then you need to explicitly supply a compression method to ZipFile. Typically, you’ll use the term stored to refer to member files written into a ZIP file without compression.

00:27 That’s why the default compression method of ZipFile is called ZIP_STORED, which actually refers to uncompressed member files that are simply stored in the containing archive.

00:39 The compression method is the third argument to the initializer of ZipFile. If you want to compress your files when you write them to a ZIP archive, then you can set this argument to one of the constants seen on-screen: ZIP_DEFLATED, ZIP_BZIP2, or ZIP_LZMA.

00:51 These are the compression methods you can currently use with ZipFile. A different method will raise a NotImplementedError. There are no additional compression methods available to zipfile as of Python 3.11. As an additional requirement, if you choose one of these methods, then the compression module that supports it must be available in your Python installation. Otherwise, you’ll get a RuntimeError exception, and your code will break.
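
As a rough illustration of the points above, the following sketch writes the same data once with ZIP_STORED and once with ZIP_DEFLATED, then passes an unsupported method. The payload, the member name hello.txt, and the value 99 are made up for this example, and the sketch assumes the zlib module is available in your Python installation.

```python
import io
import zipfile

# Hypothetical payload: repetitive text makes the size difference obvious.
payload = b"Hello, ZIP! " * 1000


def archive_size(method):
    """Write payload into an in-memory archive and return its size in bytes."""
    buffer = io.BytesIO()
    # The compression method is the third argument to ZipFile.
    with zipfile.ZipFile(buffer, "w", method) as archive:
        archive.writestr("hello.txt", payload)
    return len(buffer.getvalue())


stored_size = archive_size(zipfile.ZIP_STORED)      # the default: no compression
deflated_size = archive_size(zipfile.ZIP_DEFLATED)  # needs the zlib module
print(f"stored: {stored_size} bytes, deflated: {deflated_size} bytes")

# 99 is an arbitrary value that isn't a supported compression method.
unsupported_rejected = False
try:
    zipfile.ZipFile(io.BytesIO(), "w", 99)
except NotImplementedError:
    unsupported_rejected = True
```

Using io.BytesIO keeps the sketch from touching your file system. With a real file, you’d pass a filename as the first argument instead.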

01:18 Another relevant argument of ZipFile when it comes to compressing your files is compresslevel. This argument controls which compression level you use. With the Deflate method, compresslevel can take integer numbers from zero through nine.

01:31 With the Bzip2 method, you can pass integers from one through nine. In both cases, when the level increases, you get more compression but lower compression speed.
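
To get a feel for compresslevel, here’s a small sketch that compresses the same data with Deflate at levels 0, 1, and 9 and records the compressed size of the member. The sample data and the member name sample.txt are assumptions for illustration only. The same keyword argument works with ZIP_BZIP2 for levels 1 through 9.

```python
import io
import zipfile

# Hypothetical sample data: repetitive text that compresses well.
data = b"The quick brown fox jumps over the lazy dog. " * 2000


def deflated_size(level):
    """Return the compressed size of data at the given Deflate level."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(
        buffer, "w", zipfile.ZIP_DEFLATED, compresslevel=level
    ) as archive:
        archive.writestr("sample.txt", data)
        return archive.getinfo("sample.txt").compress_size


# Level 0 applies no compression, 1 favors speed, and 9 favors size.
sizes = {level: deflated_size(level) for level in (0, 1, 9)}
for level, size in sizes.items():
    print(f"level {level}: {size} bytes (original: {len(data)} bytes)")
```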

01:43 Note that many binary files, such as PNG, JPG, and MP3, already use some kind of compression. As a result, adding them to a ZIP file may not make the data any smaller because it’s already compressed.
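
You can see this effect without a real image or audio file. In the sketch below, random bytes from os.urandom() stand in for already-compressed data, since neither leaves any redundancy for Deflate to remove. The member name random.bin is hypothetical.

```python
import io
import os
import zipfile

# Random bytes act as a stand-in for already-compressed data
# such as the contents of a PNG, JPG, or MP3 file.
already_compressed = os.urandom(100_000)

buffer = io.BytesIO()
with zipfile.ZipFile(
    buffer, "w", zipfile.ZIP_DEFLATED, compresslevel=9
) as archive:
    archive.writestr("random.bin", already_compressed)
    info = archive.getinfo("random.bin")

# Even at the maximum level, the member doesn't get meaningfully smaller.
print(f"original: {info.file_size} bytes")
print(f"compressed: {info.compress_size} bytes")
```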

01:57 Let’s say you want to archive and compress the content of a given directory using the Deflate method, the most commonly used one in ZIP files. To do that, you can run the code seen on-screen.
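
The on-screen code isn’t captured in this transcript, so what follows is a minimal sketch of the idea rather than the exact code from the lesson. It builds a throwaway source_dir/ directory so that it can run anywhere. The directory name and the files inside it are assumptions, while comp_dir.zip matches the archive name used in the lesson.

```python
import pathlib
import tempfile
from zipfile import ZIP_DEFLATED, ZipFile

# Build a throwaway directory to archive. The names source_dir/,
# hello.txt, and notes/todo.txt are hypothetical.
workspace = pathlib.Path(tempfile.mkdtemp())
directory = workspace / "source_dir"
(directory / "notes").mkdir(parents=True)
(directory / "hello.txt").write_text("Hello, World!\n" * 500)
(directory / "notes" / "todo.txt").write_text("Write more ZIP files.\n" * 500)

archive_path = workspace / "comp_dir.zip"

# compresslevel is passed by keyword because the fourth positional
# parameter of ZipFile is allowZip64, not compresslevel.
with ZipFile(archive_path, "w", ZIP_DEFLATED, compresslevel=9) as archive:
    for file_path in sorted(directory.rglob("*")):
        # Store each path relative to source_dir/ inside the archive.
        archive.write(file_path, arcname=file_path.relative_to(directory))

with ZipFile(archive_path) as archive:
    member_names = archive.namelist()

source_size = sum(p.stat().st_size for p in directory.rglob("*") if p.is_file())
print(member_names)
print(f"source: {source_size} bytes, archive: {archive_path.stat().st_size} bytes")
```

In your own code, you’d point directory at an existing folder and drop the setup lines that create the sample files.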

02:26 Here you pass 9 to compresslevel to get maximum compression. To provide this argument, you use a keyword argument. You need to do this because compresslevel isn’t the fourth positional argument in the ZipFile initializer.

02:41 The initializer of ZipFile takes a fourth argument called allowZip64. It’s a Boolean argument that tells ZipFile to use the ZIP64 format extensions when an archive grows larger than four gigabytes.

03:09 After running this code, you’ll have a comp_dir.zip file in your current directory. If you compare the size of that file with the size of directory.zip, which contains the same member files but without compression, then you’ll notice a significant size reduction.

03:31 Creating ZIP files sequentially can be another common requirement in your day-to-day programming. For example, you may need to create an initial ZIP file with or without content and then append new member files as soon as they become available. In this situation, you need to open and close the target ZIP file multiple times.

03:50 To solve this problem, you can use ZipFile in append mode ("a"), as you’ve already done. This mode allows you to safely append new member files to a ZIP archive without truncating its current content.

04:09 Here append_member() is a function that appends a file (member) to the input ZIP archive (zip_file). To perform this action, the function opens and closes the target archive every time you call it.

04:22 Using a function to perform this task allows you to reuse the code as many times as you need. The get_file_from_stream() function is a generator function simulating a stream of files to process.

04:43 Meanwhile, the for loop sequentially adds member files to incremental.zip using append_member().
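
The three pieces described above can be sketched as follows. This isn’t necessarily the exact code from the lesson: the three filenames are assumptions, and the sketch creates them in a temporary directory first so that it’s runnable on its own.

```python
import pathlib
import tempfile
import zipfile

# Set up three sample files to append. The filenames are hypothetical.
workspace = pathlib.Path(tempfile.mkdtemp())
for name in ("hello.txt", "lorem.md", "realpython.md"):
    (workspace / name).write_text(f"Contents of {name}\n")


def append_member(zip_file, member):
    """Open zip_file in append mode, add one member, and close it again."""
    with zipfile.ZipFile(zip_file, mode="a") as archive:
        # Store only the filename, not the full temporary path.
        archive.write(member, arcname=pathlib.Path(member).name)


def get_file_from_stream():
    """Simulate a stream of files that become available one at a time."""
    for name in ("hello.txt", "lorem.md", "realpython.md"):
        yield workspace / name


incremental_zip = workspace / "incremental.zip"
for file_path in get_file_from_stream():
    append_member(incremental_zip, file_path)

with zipfile.ZipFile(incremental_zip) as archive:
    names = archive.namelist()
print(names)
```

Append mode creates incremental.zip on the first call because the file doesn’t exist yet, and each later call adds a member without touching the ones already there.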

04:54 If you check your working directory after running this code, then you’ll find an incremental.zip archive containing the three files you passed into the loop.

05:08 One of the most common operations you’ll ever perform on ZIP files is to extract their content to a given directory in your file system. You already learned the basics of using .extract() and .extractall() to extract one or all of the files from an archive. As an additional example, go back to the sample.zip file. At this point, the archive contains four files of different types.

05:30 You have two text files and two Markdown files. Let’s say you want to extract only the Markdown files. To do so, you can run the code seen on-screen.

05:46 The with statement opens sample.zip for reading, and the for loop iterates over each file in the archive using namelist(), while the conditional statement checks if the filename ends with the .md extension. If it does, then you extract the file at hand to a target directory, output_dir/, using .extract().
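
The steps just described can be sketched like this. Because the real sample.zip isn’t available here, the sketch first builds a stand-in archive with two text files and two Markdown files whose names are assumptions, and it works in a temporary directory.

```python
import pathlib
import tempfile
import zipfile

workspace = pathlib.Path(tempfile.mkdtemp())
sample_zip = workspace / "sample.zip"
output_dir = workspace / "output_dir"

# Stand-in for sample.zip. The member names are hypothetical.
with zipfile.ZipFile(sample_zip, mode="w") as archive:
    for name in ("hello.txt", "lorem.md", "realpython.md", "new_hello.txt"):
        archive.writestr(name, f"Contents of {name}\n")

# Extract only the members whose names end with .md.
with zipfile.ZipFile(sample_zip, mode="r") as archive:
    for file in archive.namelist():
        if file.endswith(".md"):
            archive.extract(file, output_dir)

extracted = sorted(path.name for path in output_dir.iterdir())
print(extracted)
```

Note that .extract() creates output_dir/ if it doesn’t exist yet, so you don’t need to make the directory beforehand.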

06:10 Having taken a deep dive into ZipFile and ZipInfo, in the next section of the course, you’ll take a look at other classes from the zipfile module.
