PdfHandler

High-level utilities for inspecting and modifying PDF files.

This module exposes PdfHandler, a convenience wrapper around pikepdf and pdfminer for:

text extraction and word counting
encryption, decryption, and permission inspection
moving, deleting, and resizing PDFs
merging PDFs with optional separator pages

class pdfhandler.pdf_handler.PdfHandler(pdf_path)

Bases: object

Helper for common operations on a single PDF file.

The handler validates the input path on construction and then provides methods for:

extracting text and counting words
checking and changing encryption / permissions
moving, deleting, and resizing the file
merging PDFs and inserting separator pages

cp(new_path=None)

Copy the PDF to a specified location and return the Path of the copy.

Parameters:

new_path (str | Path | None, default None) – Path for the new copy. If None, the copy is saved next to the original PDF with "-copy" inserted between the stem and the suffix.

Return type:

Path
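
The default destination described for new_path=None can be reproduced with pathlib alone. The sketch below only illustrates the naming rule; default_copy_path is a hypothetical helper, not part of PdfHandler.

```python
from pathlib import Path


def default_copy_path(pdf_path):
    """Return the sibling path with '-copy' between the stem and the suffix."""
    p = Path(pdf_path)
    # "report.pdf" becomes "report-copy.pdf" in the same directory.
    return p.with_name(f"{p.stem}-copy{p.suffix}")


print(default_copy_path("docs/report.pdf"))
```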

decrypt(output=None, in_place=False, owner_password=None)

Decrypt the PDF if it is currently encrypted.

If in_place is False (recommended), a decrypted copy is saved to a new file; otherwise, the original file is overwritten. If the PDF is not encrypted, no changes are made.

Parameters:

output (str | Path | None, default None) – Destination path for the decrypted PDF. Ignored if in_place=True. If None, a new file is created with "-Decrypted" appended to the original name.
in_place (bool, default False) – Whether to overwrite the original file in place.
owner_password (str | None, default None) – The owner password used to unlock and decrypt the PDF.

Return type:

None
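
How output and in_place interact can be sketched as a small path-resolution function. This is an illustration under assumptions, not the library's code: resolve_output is a hypothetical name, and the sketch assumes the "-Decrypted" tag is placed before the .pdf suffix.

```python
from pathlib import Path


def resolve_output(pdf_path, output=None, in_place=False, tag="-Decrypted"):
    """Pick the file a decrypt-style operation would write to."""
    src = Path(pdf_path)
    if in_place:
        # `output` is ignored and the original file is overwritten.
        return src
    if output is not None:
        return Path(output)
    # Assumption: the tag is inserted before the ".pdf" suffix.
    return src.with_name(f"{src.stem}{tag}{src.suffix}")


print(resolve_output("invoices/march.pdf"))
```

The same rule would apply to encrypt() with tag="-Encrypted".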

encrypt(output=None, in_place=False, password=None, owner_password=None)

Encrypt the PDF if it is not already encrypted.

This creates an encrypted version of the PDF using restrictive permissions by default. If in_place is False, the encrypted file is saved to a new path; otherwise, the original file is overwritten.

For fine-grained control over permissions, use save_pike_pdf() directly.

Parameters:

output (str | Path | None, default None) – Destination path for the encrypted PDF. Ignored if in_place=True. If None, a new file is created with "-Encrypted" appended to the original name.
in_place (bool, default False) – Whether to overwrite the original file in place.
password (str | None, default None) – The user password required to open the PDF. If None or empty, no password is required to view.
owner_password (str | None, default None) – The owner password used to set encryption and permissions.

Return type:

None

get_pdf_permissions()

Return the current permission settings of the PDF.

Returns:

A dictionary mapping permission names to boolean values. Keys include:

"extract"
"modify_annotation"
"modify_assembly"
"modify_form"
"modify_other"
"print_lowres"
"print_highres"

Return type:

dict[str, bool]
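
Because the result is a plain dict[str, bool], it can be filtered with ordinary dictionary operations. The values below are made up for illustration; only the key names come from the documentation above, and denied_permissions is a hypothetical helper.

```python
# Hypothetical return value, using the documented keys.
permissions = {
    "extract": False,
    "modify_annotation": True,
    "modify_assembly": True,
    "modify_form": True,
    "modify_other": False,
    "print_lowres": True,
    "print_highres": False,
}


def denied_permissions(perms):
    """Return the names of all permissions that are switched off."""
    return sorted(name for name, allowed in perms.items() if not allowed)


print(denied_permissions(permissions))
```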

get_pdf_text(pages=None)

Extract text from the PDF, optionally from specific pages.

Parameters:

pages (PageNumberType, optional) –

Pages to extract text from. If None (default), all pages are included. Acceptable formats include:

a single int or str (e.g., 5 or "5")
a range as a str (e.g., "2-4")
a comma/space/"and"-delimited str (e.g., "1, 3 and 5-6")
a list of ints and/or strs (e.g., [1, "3", "5-7"])

Returns:

The extracted text as a single string. Returns an empty string if no text is found.

Return type:

str
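
The accepted pages formats could be normalized to a flat list of page numbers along the following lines. This is a minimal sketch, not the library's parser: expand_pages is a hypothetical name, and sorting and de-duplicating the result is an assumption.

```python
import re


def expand_pages(pages):
    """Expand a page specification into a sorted list of unique page numbers."""
    if pages is None:
        return None  # the caller treats None as "all pages"
    items = pages if isinstance(pages, (list, tuple)) else [pages]
    numbers = set()
    for item in items:
        # Split on commas, whitespace, and the word "and".
        for token in re.split(r"[,\s]+|\band\b", str(item)):
            if not token:
                continue
            if "-" in token:
                # A range such as "5-7" includes both endpoints.
                start, end = (int(part) for part in token.split("-", 1))
                numbers.update(range(start, end + 1))
            else:
                numbers.add(int(token))
    return sorted(numbers)


print(expand_pages("1, 3 and 5-6"))
```

Ranges written with spaces around the hyphen (for example "2 - 4") are not among the documented formats and are not handled by this sketch.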

classmethod merge_pdfs(pdf0_path, pdf1_path, output_path, add_separator=False, separator_type='black')

Merge two PDF files, placing the pages of the first file before those of the second.

Parameters:

pdf0_path (str | Path) – Path to the first PDF, which will appear first in the output.
pdf1_path (str | Path) – Path to the second PDF, which will appear after the first.
output_path (str | Path) – Path to save the merged output PDF.
add_separator (bool, default False) – If True, insert a separator page between the PDFs.
separator_type ({"black", "blank"}, default "black") –
Type of separator page to insert:
- "black" : a black bar (about 1 inch tall)
- "blank" : a full blank page

Raises:

ValueError – If separator_type is not "black" or "blank".

Return type:

None
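
The resulting page order and the separator_type check can be modelled without touching any PDF. In this sketch pages are stand-in strings, merged_page_order is a hypothetical name, and validating separator_type even when add_separator is False is an assumption.

```python
VALID_SEPARATORS = ("black", "blank")


def merged_page_order(pages0, pages1, add_separator=False, separator_type="black"):
    """Return the page sequence of the merged document as a list of labels."""
    if separator_type not in VALID_SEPARATORS:
        raise ValueError(f"separator_type must be one of {VALID_SEPARATORS}")
    # At most one separator page sits between the two documents.
    separator = [f"<{separator_type} separator>"] if add_separator else []
    return list(pages0) + separator + list(pages1)


print(merged_page_order(["a1", "a2"], ["b1"], add_separator=True))
```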

mv(dst)

Move the PDF to a new location and update the internal path.

Parameters:

dst (str | Path) – Destination path, including the filename and .pdf extension.

Return type:

None

pdf_is_encrypted()

Return whether the PDF is encrypted.

Returns:

True if the PDF is encrypted, False otherwise.

Return type:

bool

classmethod pdfs_are_duplicates(pdf0_path, pdf1_path)

Return whether two PDFs have identical extracted text content.

Text is extracted using pdfminer. Layout, formatting, and metadata differences are ignored.

Parameters:

pdf0_path (str | Path) – Path to the first PDF file.
pdf1_path (str | Path) – Path to the second PDF file.

Returns:

True if the extracted text from both PDFs is identical, False otherwise.

Return type:

bool

print_permissions()

Print encryption and permission status to the console.

Output is color-coded using colorama:

green for enabled permissions
red for disabled permissions

Return type:

None

resize(width, height, output_path=None)

Resize all pages in the PDF to the specified dimensions.

Parameters:

width (int) – Desired page width in points (1 inch = 72 points).
height (int) – Desired page height in points (1 inch = 72 points).
output_path (str | Path | None, default None) – Path to save the resized PDF. If None, a new file is created in the same directory with the name pattern {original_name}-{width}x{height}.pdf.

Raises:

ValueError – If output_path is provided and does not end with .pdf.

Return type:

None
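
Since width and height are given in points, physical sizes need converting first. The sketch below shows the conversion for a US Letter page and the documented default file name pattern. Both helper names are hypothetical, and reading {original_name} as the file stem is an assumption.

```python
from pathlib import Path

POINTS_PER_INCH = 72


def inches_to_points(inches):
    """Convert a length in inches to whole PDF points."""
    return round(inches * POINTS_PER_INCH)


def default_resized_path(pdf_path, width, height):
    """Build the {original_name}-{width}x{height}.pdf name next to the source."""
    src = Path(pdf_path)
    return src.with_name(f"{src.stem}-{width}x{height}.pdf")


# US Letter is 8.5 in x 11 in.
width, height = inches_to_points(8.5), inches_to_points(11)
print(width, height)
print(default_resized_path("scans/form.pdf", width, height))
```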

rm()

Delete the PDF file from disk.

Return type:

None

save_pike_pdf(output, in_place=False, crypt_type=None, password=None, owner_password=None, extract=True, modify_annotation=True, modify_assembly=True, modify_form=True, modify_other=True, print_lowres=True, print_highres=True)

Save the PDF with optional encryption or decryption applied.

Parameters:

output (str | Path | None) – Destination for the saved file. Ignored if in_place is True. If None, a new file is saved with a suffix such as "-Encrypted" or "-Decrypted" depending on usage.
in_place (bool, default False) – If True, overwrites the original file. If False, creates a new file.
crypt_type (str | None, default None) –
A preset encryption mode. Must be one of:
- "decrypt" : disables encryption entirely
- "encrypt" : enables encryption with all permissions set to False
- "no_copy" : like "decrypt" but with extract permission set to False
- None : uses the individual permission arguments below
password (str | None, default None) – User password for opening the encrypted PDF. If None or an empty string, no password is required to open.
owner_password (str | None, default None) – Owner password used to set permissions. A default value is used if this is None.
extract (bool, default True) – Whether users can extract text or images.
modify_annotation (bool, default True) – Whether users can modify annotations.
modify_assembly (bool, default True) – Whether users can rearrange pages or merge documents.
modify_form (bool, default True) – Whether users can fill in or edit form fields.
modify_other (bool, default True) – Whether users can make general modifications.
print_lowres (bool, default True) – Whether users can print in low resolution.
print_highres (bool, default True) – Whether users can print in high resolution.

Raises:

ValueError – If crypt_type is invalid or if the resolved output path is invalid.

Return type:

None
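
One way to read the crypt_type presets is as a mapping from preset name to a set of permission flags. The sketch below is an interpretation, not the library's code: resolve_permissions is a hypothetical name, and treating "decrypt" as "every permission granted" is an assumption.

```python
PERMISSION_NAMES = (
    "extract", "modify_annotation", "modify_assembly", "modify_form",
    "modify_other", "print_lowres", "print_highres",
)


def resolve_permissions(crypt_type=None, **flags):
    """Return the permission flags a given crypt_type would imply."""
    unknown = set(flags) - set(PERMISSION_NAMES)
    if unknown:
        raise ValueError(f"unknown permission(s): {sorted(unknown)}")
    if crypt_type is None:
        # No preset: each flag defaults to True unless passed explicitly.
        return {name: flags.get(name, True) for name in PERMISSION_NAMES}
    if crypt_type == "encrypt":
        return {name: False for name in PERMISSION_NAMES}
    if crypt_type in ("decrypt", "no_copy"):
        # Assumption: "decrypt" leaves every permission granted.
        perms = {name: True for name in PERMISSION_NAMES}
        if crypt_type == "no_copy":
            perms["extract"] = False
        return perms
    raise ValueError(f"invalid crypt_type: {crypt_type!r}")


print(resolve_permissions("no_copy"))
```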

word_count(pages=None)

Count the number of words in the PDF.

Parameters:

pages (PageNumberType, optional) – Pages to include in the word count. If None (default), all pages are included. See get_pdf_text() for accepted formats.

Returns:

The total number of words found on the specified pages.

Return type:

int
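
The documentation does not say how a word is delimited. A minimal sketch, assuming a word is any run of non-whitespace characters in the extracted text, is shown below; count_words is a hypothetical helper.

```python
def count_words(text):
    """Count whitespace-separated tokens in a string of extracted text."""
    # Assumption: a word is any run of non-whitespace characters.
    return len(text.split())


print(count_words("Hello,  PDF world\nsecond line"))
```

Under this assumption, a PDF with no extractable text (an empty string) yields a count of 0.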