How to Work With a PDF in Python

Andrew Stephen 6 Lessons 31m intermediate

The Portable Document Format, or PDF, is a file format for presenting and exchanging documents reliably across operating systems. While PDF was originally invented by Adobe, it is now an open standard maintained by the International Organization for Standardization (ISO). You can work with an existing PDF in Python by using the PyPDF2 package.

PyPDF2 is a pure-Python package that you can use for many different types of PDF operations.

By the end of this course, you’ll know how to:

Extract document information from a PDF in Python
Rotate pages
Merge PDFs
Split PDFs
Add watermarks
Encrypt a PDF
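
To give a feel for two of the operations listed above, here is a minimal sketch of splitting a PDF and optionally rotating its pages. It is not taken from the course itself. It assumes a PyPDF2 release that provides the `PdfReader` and `PdfWriter` classes (older releases used the names `PdfFileReader` and `PdfFileWriter`), and the helper names `page_chunks` and `split_pdf` are hypothetical, chosen only for illustration.

```python
import io


def page_chunks(total_pages, pages_per_file):
    """Return (start, stop) index pairs that cover total_pages in order."""
    return [
        (start, min(start + pages_per_file, total_pages))
        for start in range(0, total_pages, pages_per_file)
    ]


def split_pdf(source, pages_per_file=1, rotate_degrees=0):
    """Split a PDF into smaller PDFs, returned as a list of bytes objects.

    `source` is a file path or a binary file-like object.
    """
    # Third-party dependency (pip install PyPDF2). Imported here so that
    # page_chunks() stays usable even when PyPDF2 is not installed.
    from PyPDF2 import PdfReader, PdfWriter

    reader = PdfReader(source)
    parts = []
    for start, stop in page_chunks(len(reader.pages), pages_per_file):
        writer = PdfWriter()
        for index in range(start, stop):
            page = reader.pages[index]
            if rotate_degrees:
                # Rotation is clockwise and must be a multiple of 90.
                page.rotate(rotate_degrees)
            writer.add_page(page)
        # Write each part to memory; the caller decides where to save it.
        buffer = io.BytesIO()
        writer.write(buffer)
        parts.append(buffer.getvalue())
    return parts
```

A call such as `split_pdf("charts.pdf", pages_per_file=1)` (the file name is a placeholder) would return one bytes object per page, each of which can be written to its own file. Merging is roughly the reverse: add pages from several readers to a single writer.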

What’s Included:

6 Lessons
Video Subtitles and Full Transcripts
3 Downloadable Resources
Accompanying Text-Based Tutorial
Q&A With Python Experts: Ask a Question
Certificate of Completion

1. How to Work With a PDF in Python (Overview) 01:29

2. History of PyPDF2 03:44

3. Extracting Metadata and Rotating Pages 11:50

4. Merging and Splitting PDFs 04:16

5. Watermarking and Encrypting PDFs 08:23

6. How to Work With a PDF in Python (Summary) 01:41

About Andrew Stephen

Andrew is an avid Pythonista and creates video tutorials for Real Python. He is a qualified robotics and mechatronics engineer who works for an engineering firm as a production engineer and loves his sport, music, gaming and learning.

Each tutorial at Real Python is created by a team of developers so that it meets our high quality standards. The team members who worked on this tutorial are:

Christopher

Joanna

Sadie

Tappan

Mike

Brad

Aldren

Participant Comments

dthomas01 on April 14, 2020

I’m late to the party... really enjoyed this tutorial. Thought I would mention that PyPDF2 hangs in the middle of writing out the encrypted PDF file. Switching to the newer PyPDF4 you mentioned earlier solved that issue. I’m using Python 3.7 on Windows 10 Pro. The rest of the programs ran flawlessly. Very impressive and hope you keep up the good work, Andrew!

Alan ODannel on April 14, 2020

Very informative lesson. I’ll be able to put this to use in the near future.

sion on March 23, 2020

Many thanks for an excellent and useful presentation. Some years ago I scraped PDFs for this information. It was MESSY. Now, “never again.” Thank you.

rgusaas on March 7, 2020

Ditto on excellent presentation. The ReportLab reference was a real eye opener. Greatly appreciated.

Perhaps another lesson on reading a PDF’s contents. I wrote a PDF reader that would split a 100+ Page invoice document into separate pages and pulled the account manager name, invoice number and job number for the output file naming convention. Seems that most of the world struggles with how to strip out contents or search the contents of PDF files.

mikesult on March 1, 2020

Thank you Andrew for a great and very useful tutorial. I learned a lot about working with PDFs. I use pdf files as music charts quite a bit and these techniques will be very useful to split, merge and organize charts from pdf books. I appreciate your links to additional resources too.
