How to Work With a PDF in Python (Summary)

The PyPDF2 package is useful and usually fast. You can use it to automate large, repetitive PDF jobs that would otherwise have to be done by hand.

In this course, you learned how to do the following:

  • Extract metadata from a PDF
  • Rotate pages
  • Merge and split PDFs
  • Add watermarks
  • Add encryption
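
As a rough sketch of how several of these tasks fit together, the example below reads a PDF’s title from its metadata, splits the file into fixed-size chunks, rotates each page, and encrypts every output file. It assumes the PyPDF2 3.x naming (PdfReader, PdfWriter, reader.pages, page.rotate(), writer.add_page()); older releases such as 1.26 spell these PdfFileReader, PdfFileWriter, getPage(), rotateClockwise(), and addPage(). The helper names page_chunks and split_pdf, the chunk size, and the output file naming are illustrative choices, not part of the course’s sample code.

```python
from pathlib import Path


def page_chunks(num_pages, chunk_size):
    """Return (start, stop) page-index pairs that cover num_pages in order."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [
        (start, min(start + chunk_size, num_pages))
        for start in range(0, num_pages, chunk_size)
    ]


def split_pdf(src, out_dir, chunk_size, password):
    """Split src into rotated, encrypted chunks and return the output paths.

    Hypothetical helper: assumes PyPDF2 3.x is installed.
    """
    # Imported here so page_chunks() stays usable without PyPDF2 installed.
    from PyPDF2 import PdfReader, PdfWriter

    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    reader = PdfReader(str(src))

    # Metadata can be missing entirely, so guard against None.
    title = reader.metadata.title if reader.metadata else None
    print("Title:", title)

    outputs = []
    for start, stop in page_chunks(len(reader.pages), chunk_size):
        writer = PdfWriter()
        for index in range(start, stop):
            page = reader.pages[index]
            page.rotate(90)  # clockwise, must be a multiple of 90
            writer.add_page(page)

        # Set the user password; this must happen before write().
        writer.encrypt(password)

        # Name each file after the 1-based page range it contains.
        target = out_dir / f"{src.stem}_{start + 1}-{stop}.pdf"
        with open(target, "wb") as fh:
            writer.write(fh)
        outputs.append(target)
    return outputs
```

Calling split_pdf("charts.pdf", "out", 10, "secret") on a 25-page file (both names are placeholders) would be expected to write three files covering pages 1-10, 11-20, and 21-25. Keeping the page-range arithmetic in its own function makes it easy to check without touching any PDF.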

Also keep an eye on the newer PyPDF4 package, which may replace PyPDF2. You might also want to check out pdfrw, which covers many of the same tasks as PyPDF2.

If you’d like to learn more about working with PDFs in Python, then you should check out some of the following resources for more information:


  • Course Slides (.pdf), 153.8 KB
  • Sample Code (.zip), 2.7 KB
  • Course Documents (.zip), 3.4 MB

mikesult on March 1, 2020

Thank you Andrew for a great and very useful tutorial. I learned a lot about working with PDFs. I use pdf files as music charts quite a bit and these techniques will be very useful to split, merge and organize charts from pdf books. I appreciate your links to additional resources too.

fahmico on March 5, 2020

Thank you for the tutorial! You explain very well. ^_^ This is really worth learning.

Andrew Stephen RP Team on March 6, 2020

Hi @mikesult. Thanks for the feedback. Glad you enjoyed the course and that you will be getting almost immediate real-world use from what you have learnt.

Andrew Stephen RP Team on March 6, 2020

Hi @fahmico, Thanks for the kind words. Glad you enjoyed it!

rgusaas on March 7, 2020

Ditto on the excellent presentation. The ReportLab reference was a real eye-opener. Greatly appreciated.

Perhaps another lesson on reading a PDF’s contents? I wrote a PDF reader that split a 100+ page invoice document into separate pages and pulled the account manager name, invoice number, and job number for the output file naming convention. It seems that most of the world struggles with how to strip out or search the contents of PDF files.

sion on March 23, 2020

Many thanks for an excellent and useful presentation. Some years ago I scraped PDFs for this information. It was MESSY. Now, “never again.” Thank you.

Alan ODannel on April 14, 2020

Very informative lesson. I’ll be able to put this to use in the near future.

dthomas01 on April 14, 2020

I’m late to the party... really enjoyed this tutorial. Thought I would mention that PyPDF2 hangs in the middle of writing out the encrypted PDF file. Switching to the newer PyPDF4 that you mentioned earlier solved that issue. I’m using Python 3.7 on Windows 10 Pro. The rest of the programs ran flawlessly. Very impressive, and I hope you keep up the good work, Andrew!

Felix M on May 24, 2020

Very informative course. Thank you!

andresfmesad on Sept. 14, 2021

Very well explained! Is there a way to write a pandas dataframe to a PDF file and specify some format?

Hugh Tipping on Sept. 14, 2021

Very happy with this presentation. It gives a solid foundation in starting to work with PDFs with enough outside reference material to keep me busy for a long time. Many thanks.
