Replace Text in PDF with Python

These examples show how to modify or remove text in an existing PDF.

Use this page when you need to update text values, remove unwanted content, or apply text replacement rules across PDF pages.

Replace existing text

Replace Text on all pages of a PDF document

Text replacement is a common requirement when updating or correcting content in existing PDF documents — for instance, changing product names, fixing typos, or updating terminology across multiple pages.

Aspose.PDF for Python via .NET searches for and replaces text programmatically through the TextFragmentAbsorber class.

This example demonstrates how to find all occurrences of a specific phrase (in this case, “Black cat”) and replace them with a new phrase (“White dog”) throughout an entire PDF document.

  1. Specify Search and Replacement Phrases. Set the text you want to find and the text you want it replaced with.
  2. Load the PDF Document.
  3. Create a Text Absorber. A TextFragmentAbsorber is initialized with the search phrase. It scans the document for all instances of the given phrase.
  4. Apply the Absorber to All Pages. This iterates through all pages and collects text fragments matching the phrase.
  5. Replace each found fragment. Every instance of “Black cat” should be changed to “White dog”.
  6. Save the Updated PDF.
import aspose.pdf as ap

def replace_text_on_all_pages(infile, outfile):
    search_phrase = "Black cat"
    replace_phrase = "White dog"

    with ap.Document(infile) as document:
        absorber = ap.text.TextFragmentAbsorber(search_phrase)
        document.pages.accept(absorber)

        for fragment in absorber.text_fragments:
            fragment.text = replace_phrase

        document.save(outfile)

Replace Text in a particular page region

Sometimes, you may need to replace text only within a specific area of a PDF page instead of searching the entire document — for example, updating a header, footer, or a table cell within a known position.

Aspose.PDF for Python via .NET supports this by combining the TextFragmentAbsorber with a search rectangle set in its text search options.

This example demonstrates how to find and replace all occurrences of a target phrase within a defined rectangular region on a specific page.

  1. Specify Search and Replacement Phrases.
  2. Load the PDF Document.
  3. Create a Text Absorber for Searching. Initialize a TextFragmentAbsorber to find the desired text.
  4. Restrict the Search Area. Enable limit_to_page_bounds and assign a rectangle to the search options. The rectangle is given as lower-left x, lower-left y, upper-right x, and upper-right y.
  5. Apply the Absorber to a Specific Page. This performs the search and collects matching text fragments within the specified area.
  6. Replace the Found Text. Every occurrence of ‘doc’ in the defined region becomes ‘DOC’.
  7. Save the Updated PDF.
import aspose.pdf as ap

def replace_text_in_particular_page_region(infile, outfile):
    search_phrase = "doc"
    replace_phrase = "DOC"

    with ap.Document(infile) as document:
        absorber = ap.text.TextFragmentAbsorber(search_phrase)
        absorber.text_search_options.limit_to_page_bounds = True
        absorber.text_search_options.rectangle = ap.Rectangle(300, 442, 500, 742, True)
        document.pages[1].accept(absorber)

        for fragment in absorber.text_fragments:
            fragment.text = replace_phrase

        document.save(outfile)
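
The rectangle used above is easier to reason about with a small stand-alone sketch. The `Rect` class below is a hypothetical helper, not an Aspose class, and the US Letter page size is an assumption. It only illustrates how the four coordinates translate into a width, a height, and a position on the page, using the default PDF coordinate system (origin at the lower-left corner, units of 1/72 inch).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in PDF points (hypothetical helper, not an Aspose class)."""

    llx: float  # lower-left x
    lly: float  # lower-left y
    urx: float  # upper-right x
    ury: float  # upper-right y

    @property
    def width(self) -> float:
        return self.urx - self.llx

    @property
    def height(self) -> float:
        return self.ury - self.lly

    def contains(self, other: "Rect") -> bool:
        """Return True when `other` lies entirely inside this rectangle."""
        return (
            self.llx <= other.llx
            and self.lly <= other.lly
            and other.urx <= self.urx
            and other.ury <= self.ury
        )


# Assumed page size: US Letter, 612 x 792 points.
page = Rect(0, 0, 612, 792)
# Same coordinates as the search rectangle in the example above.
region = Rect(300, 442, 500, 742)

print(region.width, region.height)  # 200 300
print(page.contains(region))        # True
```

On a page of that size, the region is a 200 x 300 point box in the upper-right part of the page. Checking a rectangle this way before running the replacement can catch swapped or out-of-page coordinates.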

Resize and Shift Text Without Changing Font Size

When replacing text in a PDF, sometimes you want to fit or reposition the new text within a specific area without modifying the font size. Aspose.PDF for Python via .NET provides options to adjust text layout and spacing while keeping the original font size intact.

  1. Load the PDF Document.
  2. Collect all text fragments on the page using a ‘TextFragmentAbsorber’.
  3. Select the Fragment to Modify.
  4. Shift and resize the text rectangle.
  5. Adjust Text Spacing. Enable spacing adjustment to fit the text within the modified rectangle.
  6. Replace the fragment text.
  7. Save the Updated PDF.
import aspose.pdf as ap

def replace_text_and_resize_and_shift_without_changing_font_size(infile, outfile):
    with ap.Document(infile) as document:
        absorber = ap.text.TextFragmentAbsorber()
        absorber.visit(document.pages[1])
        fragment = absorber.text_fragments[1]
        text = fragment.text
        rect = fragment.rectangle
        rect.llx += 50
        rect.urx -= 50
        fragment.replace_options.rectangle = rect
        fragment.replace_options.replace_adjustment_action = (
            ap.text.TextReplaceOptions.ReplaceAdjustment.ADJUST_SPACE_WIDTH
        )
        fragment.text = f"{text} {text}"
        document.save(outfile)

Resize and Shift a Paragraph in PDF

When working with PDFs, sometimes you need to replace or expand a paragraph while keeping it visually aligned with the page layout. Aspose.PDF allows you to resize the paragraph’s bounding rectangle and adjust spacing to fit new text, all without changing the font size.

  1. Load the PDF Document.
  2. Use ‘TextFragmentAbsorber’ to collect all text fragments on the page.
  3. Select the Fragment to Modify.
  4. Resize and Shift the Paragraph. Use the page’s media box to determine bounds and adjust the rectangle.
  5. Adjust Spacing. This modifies the spacing between words/letters instead of changing font size.
  6. Replace the fragment text.
  7. Save the Modified PDF.
import aspose.pdf as ap

def replace_text_and_resize_and_shift_paragraph(infile, outfile):
    with ap.Document(infile) as document:
        absorber = ap.text.TextFragmentAbsorber()
        absorber.visit(document.pages[1])
        fragment = absorber.text_fragments[1]
        text = fragment.text
        rect = document.pages[1].media_box
        rect.llx += 20
        rect.urx -= 20
        rect.ury -= 20
        fragment.replace_options.rectangle = rect
        fragment.replace_options.replace_adjustment_action = (
            ap.text.TextReplaceOptions.ReplaceAdjustment.ADJUST_SPACE_WIDTH
        )
        fragment.text = f"{text} {text}"
        document.save(outfile)
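
The rectangle arithmetic in this example and the previous one amounts to insetting a box by fixed margins. The following is a minimal sketch in plain Python, assuming an A4 media box of 595 x 842 points. The `inset` function is a hypothetical helper and does not call Aspose.PDF.

```python
def inset(rect, left=0, bottom=0, right=0, top=0):
    """Return (llx, lly, urx, ury) moved inward by the given margins, in points."""
    llx, lly, urx, ury = rect
    result = (llx + left, lly + bottom, urx - right, ury - top)
    # Guard against margins that collapse or invert the rectangle.
    if result[0] >= result[2] or result[1] >= result[3]:
        raise ValueError("margins leave no room for text")
    return result


# Assumed A4 media box; real pages may differ.
media_box = (0, 0, 595, 842)

# Same margins as the paragraph example: 20 points left, right, and top.
paragraph_area = inset(media_box, left=20, right=20, top=20)
print(paragraph_area)  # (20, 0, 575, 822)

# Same margins as the previous example: 50 points left and right.
narrow = inset((100, 700, 400, 712), left=50, right=50)
print(narrow)  # (150, 700, 350, 712)
```

Insetting by 50 points on each side narrows the second rectangle from 300 to 200 points, which is why the spacing adjustment is needed to fit the longer replacement text.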

Replace Text and Automatically Expand Font to Fill Target Area

This example replaces text in a PDF while automatically scaling the font to fill a specific rectangular area. Aspose.PDF for Python via .NET adjusts the font size and spacing so that the new text fits the defined bounding box, with no manual font calculations.

  1. Load the PDF.
  2. Capture Text Fragments.
  3. Select a Specific Fragment.
  4. Define Target Rectangle.
  5. Enable Text Adjustment Options.
  6. Replace Text.
  7. Save the Document.
import aspose.pdf as ap

def replace_text_and_resize_and_expand_font(infile, outfile):
    with ap.Document(infile) as document:
        absorber = ap.text.TextFragmentAbsorber()
        absorber.visit(document.pages[1])
        fragment = absorber.text_fragments[1]
        text = fragment.text
        fragment.replace_options.rectangle = ap.Rectangle(100, 300, 512, 692, True)
        fragment.replace_options.replace_adjustment_action = (
            ap.text.TextReplaceOptions.ReplaceAdjustment.ADJUST_SPACE_WIDTH
        )
        fragment.replace_options.font_size_adjustment_action = (
            ap.text.TextReplaceOptions.FontSizeAdjustment.SCALE_TO_FILL
        )
        fragment.text = f"{text} {text}"
        document.save(outfile)

Replace Text and Fit It into a Rectangle

Replace text in a PDF document while ensuring the new content fits within the original text’s rectangular area by automatically reducing the font size when needed.

Using the Aspose.PDF for Python via .NET library, this function adjusts both the text layout and font size dynamically, preserving document structure while preventing overflow.

  1. Create a TextFragmentAbsorber object to extract all text fragments from the first page.
  2. Access a Specific Text Fragment.
  3. Set the Replacement Area.
  4. Configure Text Adjustment Options. Set two key replacement options:
    • Font size adjustment - ‘SHRINK_TO_FIT’ automatically reduces font size if the new text is too long.
    • Spacing adjustment - ‘ADJUST_SPACE_WIDTH’ changes the width of the spaces between words so that the line length is kept.
  5. Replace the Text.
  6. Save the Modified PDF.
import aspose.pdf as ap

def replace_text_and_fit_text_into_rectangle(infile, outfile):
    with ap.Document(infile) as document:
        absorber = ap.text.TextFragmentAbsorber()
        absorber.visit(document.pages[1])
        fragment = absorber.text_fragments[1]
        text = fragment.text
        fragment.replace_options.rectangle = fragment.rectangle
        fragment.replace_options.font_size_adjustment_action = (
            ap.text.TextReplaceOptions.FontSizeAdjustment.SHRINK_TO_FIT
        )
        fragment.replace_options.replace_adjustment_action = (
            ap.text.TextReplaceOptions.ReplaceAdjustment.ADJUST_SPACE_WIDTH
        )
        fragment.text = f"{text} {text}"
        document.save(outfile)
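
To build intuition for the two font size options, the sketch below estimates the resulting size from a measured text width. It is an illustration of the idea only, not the algorithm Aspose.PDF uses. It assumes that text width scales linearly with font size and ignores character and word spacing. Both function names are hypothetical.

```python
def shrink_to_fit(font_size, text_width, box_width):
    """Largest size not above font_size at which the text fits box_width."""
    if text_width <= 0 or box_width <= 0:
        raise ValueError("widths must be positive")
    # Never enlarge: keep the original size when the text already fits.
    return min(font_size, font_size * box_width / text_width)


def scale_to_fill(font_size, text_width, box_width):
    """Size at which the text spans box_width exactly (may grow or shrink)."""
    if text_width <= 0 or box_width <= 0:
        raise ValueError("widths must be positive")
    return font_size * box_width / text_width


# Text measured at 300 points wide in a 12-point font, target box 200 points wide.
print(shrink_to_fit(12, 300, 200))  # 8.0
# Text measured at 100 points wide, target box 250 points wide.
print(shrink_to_fit(12, 100, 250))  # 12
print(scale_to_fill(12, 100, 250))  # 30.0
```

The difference between the two is the cap: shrinking never exceeds the original size, while filling scales in either direction.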

Automatically Replace Placeholder Text and Rearrange PDF Layout

Replace placeholder text inside a PDF (for example, in templates or forms) with actual data such as names or company information. The library rearranges the page contents to fit the new text, and the example applies custom formatting (font, color, size) to each replacement.

  1. Import and Load the PDF.
  2. Create a Text Absorber for the Placeholder.
  3. Apply the Absorber to All Pages.
  4. Loop Through Found Text Fragments.
  5. Apply Custom Text Formatting.
  6. Save the Updated Document.
import aspose.pdf as ap

def automatically_rearrange_page_contents(input_file, output_file):
    document = ap.Document(input_file)

    absorber = ap.text.TextFragmentAbsorber("[Long_placeholder_Long_placeholder]")
    document.pages.accept(absorber)

    for text_fragment in absorber.text_fragments:
        # A shorter replacement also works, for example: text_fragment.text = "John Smith"
        text_fragment.text = "John Smith, South Development Studio"
        text_fragment.text_state.font = ap.text.FontRepository.find_font("Calibri")
        text_fragment.text_state.font_size = 12
        text_fragment.text_state.foreground_color = ap.Color.navy

    # Save PDF document
    document.save(output_file)

Replace Text Based on a Regular Expression

When working with PDF documents, you may need to replace text that follows a pattern rather than a specific phrase — for example, phone numbers, codes, or date-like formats.

Aspose.PDF for Python via .NET allows you to perform such replacements using regular expressions (regex) with the TextFragmentAbsorber class.

This example demonstrates how to find text patterns (in this case, any text matching the format ####-####, such as 1234-5678) and replace them with the string ‘ABC1-2XZY’. It also shows how to set the font, size, and foreground and background colors of the replaced text.

  1. Load the PDF Document.
  2. Create a Regex-Based Text Absorber. Initialize the TextFragmentAbsorber with a regular expression pattern.
  3. Enable Regular Expression Mode. Assign a TextSearchOptions object created with ‘True’, which turns on regular expression matching for the absorber.
  4. Apply the Absorber to a Page. This scans the page for all text fragments that match the defined regex pattern.
  5. Replace each match with new text and apply custom styling.
  6. Save the Modified Document.
import aspose.pdf as ap

def replace_text_based_on_regex(infile, outfile):
    with ap.Document(infile) as document:
        absorber = ap.text.TextFragmentAbsorber(r"\d{4}-\d{4}")
        absorber.text_search_options = ap.text.TextSearchOptions(True)
        document.pages[1].accept(absorber)

        for fragment in absorber.text_fragments:
            fragment.text = "ABC1-2XZY"
            fragment.text_state.font = ap.text.FontRepository.find_font("Verdana")
            fragment.text_state.font_size = 12
            fragment.text_state.foreground_color = ap.Color.blue
            fragment.text_state.background_color = ap.Color.light_green

        document.save(outfile)
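
Before running a pattern against a PDF, it can help to try it on plain strings. The sketch below uses Python's built-in `re` module as a stand-in. Aspose.PDF evaluates the pattern with its own engine, so treat the results as an approximation. The sample strings are made up for illustration.

```python
import re

# Same pattern as in the example above.
pattern = re.compile(r"\d{4}-\d{4}")
# Stricter variant that rejects matches inside longer digit runs.
bounded = re.compile(r"\b\d{4}-\d{4}\b")

samples = [
    "Order 1234-5678 shipped",
    "Call 123-4567",
    "IDs: 0001-0002, 9999-0000",
    "12345-67890",
]

for text in samples:
    print(repr(text), pattern.findall(text), bounded.findall(text))

# Preview of the replacement on a plain string.
print(pattern.sub("ABC1-2XZY", samples[0]))  # Order ABC1-2XZY shipped
```

The last sample shows that the unanchored pattern matches ‘2345-6789’ inside a longer number, while the variant with word boundaries does not match it. Which behavior is wanted depends on the document.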

Replace fonts or remove unused fonts

Replace fonts in an existing PDF file

On occasion, you need to standardize or update fonts across a PDF — for instance, replacing an outdated or proprietary font with a more accessible one. The Aspose.PDF for Python via .NET library allows you to detect and replace fonts programmatically, ensuring consistent typography and document compatibility.

This example demonstrates how to replace all instances of a specific font (e.g., ‘Arial-BoldMT’) with another font (e.g., ‘Verdana’) throughout a PDF document.

The following code snippet shows how to replace the font inside a PDF document:

  1. Open the PDF Document.
  2. Initialize a TextFragmentAbsorber.
  3. Use the Absorber to extract text fragments from every page in the document.
  4. Identify and Replace Fonts. The script checks if a fragment’s current font is ‘Arial-BoldMT’. If true, it replaces it with the ‘Verdana’ font using the FontRepository.find_font() method.
  5. Save the Modified Document.
import aspose.pdf as ap

def replace_fonts(infile, outfile):
    with ap.Document(infile) as document:
        absorber = ap.text.TextFragmentAbsorber()
        document.pages.accept(absorber)

        for fragment in absorber.text_fragments:
            if fragment.text_state.font.font_name == "Arial-BoldMT":
                fragment.text_state.font = ap.text.FontRepository.find_font("Verdana")

        document.save(outfile)
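
An exact comparison against ‘Arial-BoldMT’ can miss embedded font subsets, whose names in a PDF carry a tag of six uppercase letters and a plus sign (for example ‘ABCDEF+Arial-BoldMT’). Whether `font_name` reports that tag is an assumption to verify on your own files. The sketch below is plain Python with hypothetical helper names, and shows one way to normalize names before looking up a replacement.

```python
import re

# Subset tag defined by the PDF format: six uppercase letters followed by "+".
_SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")

# Hypothetical mapping from source font names to replacement font names.
FONT_MAP = {"Arial-BoldMT": "Verdana"}


def base_font_name(name):
    """Strip a leading subset tag, if present."""
    return _SUBSET_TAG.sub("", name)


def replacement_for(name):
    """Return the replacement font name, or None when the font is kept."""
    return FONT_MAP.get(base_font_name(name))


print(base_font_name("ABCDEF+Arial-BoldMT"))  # Arial-BoldMT
print(replacement_for("Arial-BoldMT"))        # Verdana
print(replacement_for("Helvetica"))           # None
```

In the loop above, the result of `replacement_for(fragment.text_state.font.font_name)` could then decide which font to pass to `FontRepository.find_font()`.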

Remove unused fonts

Over time, PDF documents can accumulate unused or embedded fonts that increase file size and slow down processing. These unused fonts often remain even after text edits or replacements, especially when working with large or complex PDFs.

The Aspose.PDF for Python via .NET library provides an efficient way to remove such redundant fonts using the TextEditOptions class. This not only optimizes your document but also ensures it uses only the fonts actually applied to visible text.

The ‘remove_unused_fonts()’ function in this example is a wrapper written for this page, not a library method. The setting that removes the redundant font data is ‘TextEditOptions.FontReplace.REMOVE_UNUSED_FONTS’.

This example demonstrates how to:

  • Scan a PDF for unused fonts.
  • Remove them safely.
  • Reassign active text fragments to a consistent font (e.g., Times New Roman).

  1. Open the PDF Document.
  2. Configure Text Editing Options. This instructs the engine to eliminate any embedded fonts not currently used in the visible text.
  3. Create a Text Absorber with Options. A TextFragmentAbsorber extracts text fragments from the document for editing.
  4. Reassign a Standard Font. Once the absorber has collected all fragments, iterate through them and apply a consistent font.
  5. Save the Cleaned PDF.
import aspose.pdf as ap

def remove_unused_fonts(input_file, output_file):
    # Open PDF document
    document = ap.Document(input_file)

    # Initialize text edit options to remove unused fonts
    options = ap.text.TextEditOptions(
        ap.text.TextEditOptions.FontReplace.REMOVE_UNUSED_FONTS
    )

    # Create a TextFragmentAbsorber with the specified options
    absorber = ap.text.TextFragmentAbsorber(options)
    document.pages.accept(absorber)

    # Iterate through all TextFragments
    for text_fragment in absorber.text_fragments:
        text_fragment.text_state.font = ap.text.FontRepository.find_font(
            "TimesNewRoman"
        )

    # Save the updated PDF document
    document.save(output_file)

Remove all Text

Remove Text from PDF

Remove all text content from a PDF file while keeping images, shapes, and layout structures intact. By using TextFragmentAbsorber, the code efficiently scans the entire document and deletes every text fragment found on each page.

  1. Load the PDF Document.
  2. A TextFragmentAbsorber object is created to detect and handle text fragments in the PDF.
  3. Remove All Text Content. The method ‘absorber.remove_all_text()’ removes every text element from the loaded document, leaving non-text components untouched.
  4. Save the Updated Document.
import aspose.pdf as ap

def remove_all_text_using_absorber1(infile, outfile):
    with ap.Document(infile) as document:
        absorber = ap.text.TextFragmentAbsorber()
        absorber.remove_all_text(document)
        document.save(outfile)

Remove all Text from a Specific Page

Remove all text from a single page of a PDF document using the TextFragmentAbsorber class in Aspose.PDF. Unlike full-document removal, this method performs page-level text cleanup, deleting text only from the chosen page while leaving all other pages untouched.

  1. Load the PDF File.
  2. Create a TextFragmentAbsorber Instance.
  3. Remove All Text from the First Page.
  4. Save the Modified PDF.
import aspose.pdf as ap

def remove_all_text_using_absorber2(infile, outfile):
    with ap.Document(infile) as document:
        absorber = ap.text.TextFragmentAbsorber()
        absorber.remove_all_text(document.pages[1])
        document.save(outfile)

Remove all Text from a particular area on a PDF page

Remove all text from a specific rectangular region on a page using Aspose.PDF’s TextFragmentAbsorber. Instead of clearing an entire page, this method performs targeted text removal, allowing precise control over which part of the page is affected.

  1. Load the PDF Document.
  2. Create a TextFragmentAbsorber.
  3. Define the Target Area (Rectangle).
  4. Remove Text from the Specified Region.
  5. Preserve the Rest of the Document.
  6. Save the Modified PDF.
import aspose.pdf as ap

def remove_all_text_using_absorber3(infile, outfile):
    with ap.Document(infile) as document:
        absorber = ap.text.TextFragmentAbsorber()
        absorber.remove_all_text(
            document.pages[1], ap.Rectangle(10, 200, 120, 600, True)
        )
        document.save(outfile)

Remove all hidden Text from a PDF document

Remove hidden (invisible) text from every page of a PDF document while leaving visible text in place. The code collects all text fragments with TextFragmentAbsorber, checks the ‘text_state.invisible’ flag of each fragment, and clears the text of the fragments that are flagged.

  1. Load the PDF Document.
  2. Create a TextFragmentAbsorber.
  3. Set the Replace Adjustment to ‘NONE’. This keeps other text fragments from moving when hidden text is cleared.
  4. Apply the Absorber to All Pages.
  5. Clear Hidden Fragments. Set the text of each fragment whose ‘text_state.invisible’ flag is set to an empty string.
  6. Save the Modified PDF.
import aspose.pdf as ap

def remove_hidden_text(infile, outfile):
    # Open PDF document
    with ap.Document(infile) as document:
        text_absorber = ap.text.TextFragmentAbsorber()
        # This option can be used to prevent other text fragments from moving after hidden text replacement
        text_absorber.text_replace_options = ap.text.TextReplaceOptions(
            ap.text.TextReplaceOptions.ReplaceAdjustment.NONE
        )
        document.pages.accept(text_absorber)
        # Remove hidden text
        for fragment in text_absorber.text_fragments:
            if fragment.text_state.invisible:
                fragment.text = ""
        # Save PDF document
        document.save(outfile)