PdfHandler
High-level utilities for inspecting and modifying PDF files.
This module exposes PdfHandler, a convenience wrapper around pikepdf
and pdfminer for:
text extraction and word counting
encryption, decryption, and permission inspection
moving, deleting, and resizing PDFs
merging PDFs with optional separator pages
- class pdfhandler.pdf_handler.PdfHandler(pdf_path)
Bases: object
Helper for common operations on a single PDF file.
The handler validates the input path on construction and then provides methods for:
extracting text and counting words
checking and changing encryption / permissions
moving, deleting, and resizing the file
merging PDFs and inserting separator pages
- cp(new_path=None)
Copy the PDF to a specified location and return its Path.
- Parameters:
new_path (str | Path | None, default None) – Path to the new copy. If None, the copy is saved alongside the original PDF with "-copy" inserted between the stem and the suffix.
- Return type:
Path
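The default destination described for cp() can be sketched with pathlib alone. The helper name default_copy_path is hypothetical and not part of PdfHandler; it only illustrates where "-copy" lands relative to the stem and the suffix.

```python
from pathlib import Path


def default_copy_path(pdf_path):
    """Return the sibling path with "-copy" between the stem and the suffix.

    Hypothetical helper for illustration; not part of PdfHandler.
    """
    pdf_path = Path(pdf_path)
    # with_name() keeps the parent directory and swaps only the file name.
    return pdf_path.with_name(f"{pdf_path.stem}-copy{pdf_path.suffix}")


print(default_copy_path("reports/summary.pdf"))
```

Under this reading, reports/summary.pdf is copied to summary-copy.pdf in the same directory.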
- decrypt(output=None, in_place=False, owner_password=None)
Decrypt the PDF if it is currently encrypted.
If in_place is False (recommended), a decrypted copy is saved to a new file; otherwise, the original file is overwritten. If the PDF is not encrypted, no changes are made.
- Parameters:
output (str | Path | None, default None) – Destination path for the decrypted PDF. Ignored if in_place=True. If None, a new file is created with "-Decrypted" appended to the original name.
in_place (bool, default False) – Whether to overwrite the original file in place.
owner_password (str | None, default None) – The owner password used to unlock and decrypt the PDF.
- Return type:
None
- encrypt(output=None, in_place=False, password=None, owner_password=None)
Encrypt the PDF if it is not already encrypted.
This creates an encrypted version of the PDF using restrictive permissions by default. If in_place is False, the encrypted file is saved to a new path; otherwise, the original file is overwritten.
For fine-grained control over permissions, use save_pike_pdf() directly.
- Parameters:
output (str | Path | None, default None) – Destination path for the encrypted PDF. Ignored if in_place=True. If None, a new file is created with "-Encrypted" appended to the original name.
in_place (bool, default False) – Whether to overwrite the original file in place.
password (str | None, default None) – The user password required to open the PDF. If None or empty, no password is required to view.
owner_password (str | None, default None) – The owner password used to set encryption and permissions.
- Return type:
None
- get_pdf_permissions()
Return the current permission settings of the PDF.
- Returns:
A dictionary mapping permission names to boolean values. Keys include:
"extract"
"modify_annotation"
"modify_assembly"
"modify_form"
"modify_other"
"print_lowres"
"print_highres"
- Return type:
dict[str, bool]
- get_pdf_text(pages=None)
Extract text from the PDF, optionally from specific pages.
- Parameters:
pages (PageNumberType, optional) – Pages to extract text from. If None (default), all pages are included. Acceptable formats include:
a single int or str (e.g., 5 or "5")
a range as a str (e.g., "2-4")
a comma/space/"and"-delimited str (e.g., "1, 3 and 5-6")
a list of ints and/or strs (e.g., [1, "3", "5-7"])
- Returns:
The extracted text as a single string. Returns an empty string if no text is found.
- Return type:
str
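The accepted page formats listed for get_pdf_text() can be illustrated with a small parser. This is only a sketch: parse_pages is a hypothetical name, the library's own parsing of PageNumberType may differ, and ranges are assumed to be inclusive and written without spaces around the hyphen.

```python
import re


def parse_pages(pages):
    """Expand a page specification into a sorted list of unique page numbers.

    Hypothetical helper; returns None when `pages` is None (meaning all pages).
    """
    if pages is None:
        return None
    # Treat a single int or str as a one-element list.
    items = pages if isinstance(pages, (list, tuple)) else [pages]
    result = set()
    for item in items:
        # Split each item on commas, whitespace, and the standalone word "and".
        for token in re.split(r"[,\s]+|\band\b", str(item)):
            if not token:
                continue
            if "-" in token:
                # Assumption: "a-b" means every page from a to b inclusive.
                start, end = (int(part) for part in token.split("-", 1))
                result.update(range(start, end + 1))
            else:
                result.add(int(token))
    return sorted(result)


print(parse_pages("1, 3 and 5-6"))  # [1, 3, 5, 6]
```

Whether page numbers are 1-based or 0-based is not stated above, so the sketch passes the numbers through unchanged.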
- classmethod merge_pdfs(pdf0_path, pdf1_path, output_path, add_separator=False, separator_type='black')
Merge two PDF files, placing the pages of the first file before those of the second.
- Parameters:
pdf0_path (str | Path) – Path to the first PDF, which will appear first in the output.
pdf1_path (str | Path) – Path to the second PDF, which will appear after the first.
output_path (str | Path) – Path to save the merged output PDF.
add_separator (bool, default False) – If True, insert a separator page between the PDFs.
separator_type ({"black", "blank"}, default "black") – Type of separator page to insert:
"black": a black bar (~1 in height)
"blank": a full blank page
- Raises:
ValueError – If separator_type is not "black" or "blank".
- Return type:
None
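The ordering and validation rules for merge_pdfs() can be sketched without touching any PDF. In this hypothetical helper, pages are plain labels and the separator is a placeholder string; validating separator_type even when add_separator is False is an assumption, since the text above does not say when the check runs.

```python
def plan_merge(pages0, pages1, add_separator=False, separator_type="black"):
    """Return the output page order for a two-file merge.

    Hypothetical helper: pages are represented by labels, and the separator
    by the placeholder "<black separator>" or "<blank separator>".
    """
    # Assumption: the separator type is validated up front.
    if separator_type not in ("black", "blank"):
        raise ValueError(
            f'separator_type must be "black" or "blank", got {separator_type!r}'
        )
    merged = list(pages0)  # pages of the first PDF come first
    if add_separator:
        merged.append(f"<{separator_type} separator>")
    merged.extend(pages1)  # pages of the second PDF follow
    return merged


print(plan_merge(["a1", "a2"], ["b1"], add_separator=True))
```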
- mv(dst)
Move the PDF to a new location and update the internal path.
- Parameters:
dst (str | Path) – Destination path, including the filename and .pdf extension.
- Return type:
None
- pdf_is_encrypted()
Return whether the PDF is encrypted.
- Returns:
True if the PDF is encrypted, False otherwise.
- Return type:
bool
- classmethod pdfs_are_duplicates(pdf0_path, pdf1_path)
Return whether two PDFs have identical extracted text content.
Text is extracted using pdfminer. Layout, formatting, and metadata differences are ignored.
- Parameters:
pdf0_path (str | Path) – Path to the first PDF file.
pdf1_path (str | Path) – Path to the second PDF file.
- Returns:
True if the extracted text from both PDFs is identical, False otherwise.
- Return type:
bool
- print_permissions()
Print encryption and permission status to the console.
Output is color-coded using colorama:
green for enabled permissions
red for disabled permissions
- Return type:
None
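The color coding can be sketched from a permissions dictionary shaped like the one get_pdf_permissions() returns. To avoid a third-party dependency, this sketch writes the ANSI escape codes directly as a stand-in for colorama's Fore.GREEN, Fore.RED, and Style.RESET_ALL; format_permissions is a hypothetical name and the exact layout printed by print_permissions() is not specified above.

```python
# Raw ANSI codes, used here as a stand-in for colorama's constants.
GREEN = "\033[32m"
RED = "\033[31m"
RESET = "\033[0m"


def format_permissions(permissions):
    """Return one colored line per permission: green if enabled, red if not.

    Hypothetical helper; the line layout is an assumption.
    """
    lines = []
    for name, allowed in permissions.items():
        color = GREEN if allowed else RED
        lines.append(f"{color}{name}: {allowed}{RESET}")
    return "\n".join(lines)


print(format_permissions({"extract": True, "modify_form": False}))
```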
- resize(width, height, output_path=None)
Resize all pages in the PDF to the specified dimensions.
- Parameters:
width (int) – Desired page width in points (1 inch = 72 points).
height (int) – Desired page height in points (1 inch = 72 points).
output_path (str | Path | None, default None) – Path to save the resized PDF. If None, a new file is created in the same directory with the name pattern {original_name}-{width}x{height}.pdf.
- Raises:
ValueError – If output_path is provided and does not end with .pdf.
- Return type:
None
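Because resize() takes its dimensions in points, a quick conversion from inches helps when choosing arguments. The helpers below are hypothetical, and the default file name assumes that {original_name} in the pattern above means the file stem.

```python
from pathlib import Path

POINTS_PER_INCH = 72  # 1 inch = 72 points


def inches_to_points(inches):
    """Convert a length in inches to whole points."""
    return round(inches * POINTS_PER_INCH)


def default_resized_path(pdf_path, width, height):
    """Build the default output name for a resized copy.

    Hypothetical helper; assumes {original_name} is the file stem.
    """
    pdf_path = Path(pdf_path)
    return pdf_path.with_name(f"{pdf_path.stem}-{width}x{height}.pdf")


# US Letter is 8.5 x 11 inches.
width, height = inches_to_points(8.5), inches_to_points(11)
print(width, height)  # 612 792
print(default_resized_path("docs/letter.pdf", width, height))
```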
- save_pike_pdf(output, in_place=False, crypt_type=None, password=None, owner_password=None, extract=True, modify_annotation=True, modify_assembly=True, modify_form=True, modify_other=True, print_lowres=True, print_highres=True)
Save the PDF with optional encryption or decryption applied.
- Parameters:
output (str | Path | None) – Destination for the saved file. Ignored if in_place is True. If None, a new file is saved with a suffix such as "-Encrypted" or "-Decrypted" depending on usage.
in_place (bool, default False) – If True, overwrites the original file. If False, creates a new file.
crypt_type (str | None, default None) – A preset encryption mode. Must be one of:
"decrypt": disables encryption entirely
"encrypt": enables encryption with all permissions set to False
"no_copy": like "decrypt" but with extract permission set to False
None: uses the individual permission arguments below
password (str | None, default None) – User password for opening the encrypted PDF. If None or an empty string, no password is required to open.
owner_password (str | None, default None) – Owner password used to set permissions. A default value is used if this is None.
extract (bool, default True) – Whether users can extract text or images.
modify_annotation (bool, default True) – Whether users can modify annotations.
modify_assembly (bool, default True) – Whether users can rearrange pages or merge documents.
modify_form (bool, default True) – Whether users can fill in or edit form fields.
modify_other (bool, default True) – Whether users can make general modifications.
print_lowres (bool, default True) – Whether users can print in low resolution.
print_highres (bool, default True) – Whether users can print in high resolution.
- Raises:
ValueError – If crypt_type is invalid or if the resolved output path is invalid.
- Return type:
None
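One way to read the crypt_type presets is as a mapping from preset name to a permission dictionary. The sketch below is an interpretation, not the library's code: resolve_permissions is a hypothetical name, returning None stands for "save without encryption", and "no_copy" is read as every permission allowed except extract.

```python
PERMISSION_NAMES = (
    "extract",
    "modify_annotation",
    "modify_assembly",
    "modify_form",
    "modify_other",
    "print_lowres",
    "print_highres",
)


def resolve_permissions(crypt_type=None, **flags):
    """Return a permission dict, or None when no encryption is applied.

    Hypothetical helper illustrating the documented presets.
    """
    if crypt_type == "decrypt":
        return None  # encryption disabled entirely
    if crypt_type == "encrypt":
        return dict.fromkeys(PERMISSION_NAMES, False)
    if crypt_type == "no_copy":
        # Assumption: everything stays allowed except text/image extraction.
        permissions = dict.fromkeys(PERMISSION_NAMES, True)
        permissions["extract"] = False
        return permissions
    if crypt_type is None:
        unknown = set(flags) - set(PERMISSION_NAMES)
        if unknown:
            raise ValueError(f"unknown permission(s): {sorted(unknown)}")
        # Individual flags default to True, matching the signature above.
        permissions = dict.fromkeys(PERMISSION_NAMES, True)
        permissions.update(flags)
        return permissions
    raise ValueError(f"invalid crypt_type: {crypt_type!r}")


print(resolve_permissions("no_copy")["extract"])  # False
```

In a pikepdf-based implementation, a dictionary like this would typically be unpacked into pikepdf.Permissions, which accepts keyword arguments with these names.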
- word_count(pages=None)
Count the number of words in the PDF.
- Parameters:
pages (PageNumberType, optional) – Pages to include in the word count. If None (default), all pages are included. See get_pdf_text() for accepted formats.
- Returns:
The total number of words found on the specified pages.
- Return type:
int
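How a word is delimited is not specified above. A minimal sketch, assuming words are whitespace-separated tokens in the extracted text, looks like this; count_words is a hypothetical helper and the library's tokenization may differ.

```python
def count_words(text):
    """Count whitespace-separated tokens in already-extracted text.

    Hypothetical helper; assumes a word is any run of non-whitespace.
    """
    # str.split() with no arguments collapses runs of spaces and newlines.
    return len(text.split())


print(count_words("Hello,  world\nsecond line"))  # 4
```

Under this assumption, punctuation attached to a word does not create an extra word, and an empty string yields 0.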