Getting and Setting Page Properties using Python
Aspose.PDF for Python via .NET lets you read and set the properties of pages in a PDF file from your Python applications. This section shows how to get the number of pages in a PDF file, how to read page properties such as the page boxes and color type, and how to set page properties. The examples given are in Python.
Get Number of Pages in a PDF File
When working with documents, you often want to know how many pages they contain. With Aspose.PDF this takes no more than two lines of code.
To get the number of pages in a PDF file:
- Open the PDF file using the Document class.
- Then call len() on the Document object’s pages collection (a PageCollection) to get the total number of pages in the document.
The following code snippet shows how to get the number of pages of a PDF file.
import os
import aspose.pdf as ap

# Global configuration
DATA_DIR = "your path here"


def get_page_count(input_file_name):
    """
    Get the total number of pages in a PDF document.

    Args:
        input_file_name (str): Path to the input PDF file.

    Returns:
        None: Prints the page count to console.

    Example:
        get_page_count("example.pdf")
        # Output: Page Count: 10
    """
    # Open document
    document = ap.Document(input_file_name)
    # Get page count
    print("Page Count:", str(len(document.pages)))
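The function above prints the count. When the number is needed elsewhere in a program, a variant that returns it can be easier to reuse. The following is a minimal sketch, not part of the Aspose.PDF API: `count_pages` and `count_pages_in_file` are hypothetical helper names, and the only library behavior assumed is what the snippet above already relies on, namely that `ap.Document(path)` opens a file and that `len(document.pages)` gives the page count.

```python
def count_pages(document):
    """Return the page count of an already opened document.

    Hypothetical helper: it only assumes that `document.pages` supports len(),
    as in the snippet above.
    """
    return len(document.pages)


def count_pages_in_file(input_file_name):
    """Open a PDF file and return its page count (hypothetical helper)."""
    # Imported here so count_pages() stays usable without the library installed.
    import aspose.pdf as ap

    return count_pages(ap.Document(input_file_name))
```

Splitting the helper in two keeps the counting logic separate from file handling, so the same function can be applied to a document that is still being built in memory.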
Get Page Count without Saving the Document
PDF files are sometimes generated on the fly, and during generation you may need the page count (for example, to build a table of contents) before the file has been saved to disk or to a stream. The Document class provides the process_paragraphs() method for this purpose. The following code snippet shows how to get the page count without saving the document.
import os
import aspose.pdf as ap

# Global configuration
DATA_DIR = "your path here"


def get_page_count_without_saving(input_file_name):
    """
    Get the page count of a PDF document after adding content without saving the file.

    This function opens an existing PDF document, adds a new page with 300 text fragments,
    processes the paragraphs to ensure accurate page counting, and prints the total number
    of pages in the document. The document is not saved to disk.

    Args:
        input_file_name (str): Path to the input PDF file to be processed.

    Returns:
        None: This function prints the page count but does not return a value.

    Example:
        >>> get_page_count_without_saving("sample.pdf")
        Number of pages in document = 2
    """
    # Instantiate Document instance
    document = ap.Document(input_file_name)
    # Add page to pages collection of PDF file
    page = document.pages.add()
    # Add 300 text fragments to the new page
    for _ in range(300):
        # Add TextFragment to paragraphs collection of page object
        page.paragraphs.add(ap.text.TextFragment("Pages count test"))
    # Process the paragraphs in PDF file to get accurate page count
    document.process_paragraphs()
    # Print number of pages in document
    print("Number of pages in document =", str(len(document.pages)))
Get Page Properties
Each page in a PDF file has a number of properties, such as the width, height, bleed box, crop box and trim box. Aspose.PDF allows you to access these properties.
Understanding Page Properties: the Difference between ArtBox, BleedBox, CropBox, MediaBox, TrimBox and the Rect Property
- Media box: The media box is the largest page box. It corresponds to the page size (for example A4, A5, US Letter, etc.) selected when the document was printed to PostScript or PDF. In other words, the media box determines the physical size of the media on which the PDF document is displayed or printed.
- Bleed box: If the document has bleed, the PDF will also have a bleed box. Bleed is the amount of color (or artwork) that extends beyond the edge of a page. It is used to make sure that when the document is printed and cut to size (“trimmed”), the ink will go all the way to the edge of the page. Even if the page is mistrimmed - cut slightly off the trim marks - no white edges will appear on the page.
- Trim box: The trim box indicates the final size of a document after printing and trimming.
- Art box: The art box is the box drawn around the actual contents of the pages in your documents. This page box is used when importing PDF documents in other applications.
- Crop box: The crop box is the “page” size at which your PDF document is displayed in Adobe Acrobat. In normal view, only the contents of the crop box are displayed in Adobe Acrobat.
- Page.Rect: The intersection (the commonly visible rectangle) of the MediaBox and CropBox.
For detailed descriptions of these properties, read the Adobe PDF specification, particularly section 10.10.1, Page Boundaries. The picture below illustrates these properties.
For further details, please visit this page.
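The relationship between the boxes is plain rectangle arithmetic. The sketch below is independent of Aspose.PDF and uses hypothetical names (`Box`, `intersect`, `expand`): it models a box by its lower-left and upper-right corners, computes the intersection of a media box and a crop box (the rectangle that Page.Rect is described as above), and derives a bleed box by growing a trim box by a bleed margin. The sample coordinates are made up for illustration.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """A rectangle given by its lower-left (llx, lly) and upper-right (urx, ury) corners."""
    llx: float
    lly: float
    urx: float
    ury: float

    @property
    def width(self) -> float:
        return self.urx - self.llx

    @property
    def height(self) -> float:
        return self.ury - self.lly


def intersect(a: Box, b: Box) -> Box:
    """Return the overlap of two boxes; raise ValueError if they do not overlap."""
    box = Box(max(a.llx, b.llx), max(a.lly, b.lly), min(a.urx, b.urx), min(a.ury, b.ury))
    if box.width <= 0 or box.height <= 0:
        raise ValueError("boxes do not overlap")
    return box


def expand(box: Box, margin: float) -> Box:
    """Grow a box by the same margin on every side (for example trim box -> bleed box)."""
    return Box(box.llx - margin, box.lly - margin, box.urx + margin, box.ury + margin)


# Illustrative values in PDF points: a 612 x 792 media box and a smaller crop box.
media_box = Box(0, 0, 612, 792)
crop_box = Box(18, 18, 594, 774)
visible = intersect(media_box, crop_box)
print(visible, visible.width, visible.height)

# A trim box with a 9-point bleed on every side.
trim_box = Box(27, 27, 585, 765)
bleed_box = expand(trim_box, 9)
print(bleed_box)
```

When the crop box lies entirely inside the media box, as in this example, the intersection is simply the crop box itself.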
Accessing Page Properties
The Page class provides all the properties related to a particular PDF page. All the pages of a PDF file are contained in the Document object’s PageCollection collection (document.pages).
From there, you can either access individual Page objects by index or loop through the collection with a for loop to get all pages. Page indexes start at 1. Once an individual page is accessed, you can get its properties. The following code snippet shows how to get page properties.
import os
import aspose.pdf as ap

# Global configuration
DATA_DIR = "your path here"


def get_page_properties(input_file_name):
    """
    Retrieves and displays various page properties for the first page of a PDF document.

    Args:
        input_file_name (str): Path to the PDF file to analyze.
    """
    # Open document
    document = ap.Document(input_file_name)
    # Get particular page
    page = document.pages[1]
    # Get page properties
    boxes = {
        "ArtBox": page.art_box,
        "BleedBox": page.bleed_box,
        "CropBox": page.crop_box,
        "MediaBox": page.media_box,
        "TrimBox": page.trim_box,
        "Rect": page.rect
    }
    # Print box properties
    for box_name, box in boxes.items():
        print(f"{box_name} : Height={box.height},Width={box.width},LLX={box.llx},LLY={box.lly},URX={box.urx},URY={box.ury}")
    # Print other page properties
    print(f"Page Number : {page.number}")
    print(f"Rotate : {page.rotate}")
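The box values printed above are in PDF units, which by default are points of 1/72 inch, so converting them to millimetres is often useful. The sketch below loops over every page rather than only the first one and returns each media box size in millimetres. `points_to_mm` and `page_sizes_mm` are hypothetical helpers; the code assumes only what the snippets in this section already use: `len()` on `document.pages`, 1-based indexing of that collection, and `width`/`height` on `media_box`. It also assumes the page does not override the default unit.

```python
MM_PER_INCH = 25.4
POINTS_PER_INCH = 72.0  # default PDF user unit: 1 point = 1/72 inch


def points_to_mm(points):
    """Convert a length in PDF points to millimetres."""
    return points / POINTS_PER_INCH * MM_PER_INCH


def page_sizes_mm(document):
    """Return (page number, width in mm, height in mm) for every page.

    Hypothetical helper; `document` is expected to behave like the Document
    object used above (1-based `pages` collection, `media_box` per page).
    """
    sizes = []
    for number in range(1, len(document.pages) + 1):
        box = document.pages[number].media_box
        width_mm = round(points_to_mm(box.width), 1)
        height_mm = round(points_to_mm(box.height), 1)
        sizes.append((number, width_mm, height_mm))
    return sizes
```

For instance, a media box of 595.276 x 841.89 points comes out at roughly 210 x 297 mm, which is the A4 size.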
Determine Page Color
The Page class provides the properties related to a particular page in a PDF document, including the type of color the page uses: RGB, black and white, grayscale or undefined.
All the pages of a PDF file are contained in the PageCollection collection. To determine the color information for a particular page, use the Page object’s color_type property, which specifies the color type of the elements on the page.
The following code snippet shows how to iterate through each page of a PDF file to get its color information.
import os
import aspose.pdf as ap

# Global configuration
DATA_DIR = "your path here"


def get_page_color_type(input_file_name):
    """
    Analyzes and prints the color type information for each page in a PDF document.

    This function opens a PDF file and iterates through all pages to determine
    the color type of each page (black and white, grayscale, RGB, or undefined).
    The results are printed to the console with human-readable descriptions.

    Args:
        input_file_name (str): Path to the PDF file to analyze.

    Returns:
        None: This function prints results directly to console and doesn't return a value.

    Example:
        >>> get_page_color_type("sample.pdf")
        Page # 1 is RGB.
        Page # 2 is Gray Scale.
        Page # 3 is Black and white.

    Note:
        Requires the aspose.pdf library (imported as ap) to be installed and available.
        The PDF file must be accessible at the specified path.
    """
    # Open source PDF file
    document = ap.Document(input_file_name)
    # Map each color type to a human-readable description (built once, outside the loop)
    color_type_map = {
        ap.ColorType.BLACK_AND_WHITE: "Black and white",
        ap.ColorType.GRAYSCALE: "Gray Scale",
        ap.ColorType.RGB: "RGB",
        ap.ColorType.UNDEFINED: "undefined"
    }
    # Iterate through all the pages of the PDF file
    for page_number in range(1, len(document.pages) + 1):
        # Get the color type information for the particular PDF page
        page_color_type = document.pages[page_number].color_type
        color_description = color_type_map.get(page_color_type, "unknown")
        print(f"Page # {page_number} is {color_description}.")
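When a document has many pages, a per-type tally can be more useful than one line per page. The sketch below is a hypothetical helper, `tally_color_types`, that counts how often each color type occurs. It assumes the same access pattern as the snippet above (`len()` on `document.pages`, 1-based indexing, and a `color_type` attribute per page) and takes the description map as a parameter rather than relying on any particular enum values.

```python
from collections import Counter


def tally_color_types(document, color_type_map):
    """Count pages per color description.

    Hypothetical helper: `color_type_map` maps a page's color_type value to a
    label, like the dictionary in the snippet above. Unmapped values are
    counted under "unknown".
    """
    tally = Counter()
    for page_number in range(1, len(document.pages) + 1):
        color_type = document.pages[page_number].color_type
        tally[color_type_map.get(color_type, "unknown")] += 1
    return tally
```

Passing the map in keeps the helper free of any direct dependency on the library, so the same dictionary used for printing can be reused for counting.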