Convert PDF to Text in Python

Convert PDF to Text

Aspose.PDF for Python supports converting a whole PDF document or a single page to a text file.

Convert PDF document to Text file

You can convert a PDF document to a TXT file using the PDF extractor facade. The conversion involves the following steps:

Create the input and output file paths.
Create an instance of the PDF extractor facade with [extractor_create](https://reference.aspose.com/pdf/python-cpp/core/extractor_create/).
Bind the PDF file to the extractor with extractor_bind_pdf.
Extract the text from the PDF file using extractor_extract_text.
Write the extracted text to the output file.

The following code snippet shows how to extract the text from all pages of a PDF document.


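    from AsposePdfPython import *

    # DIR_INPUT and DIR_OUTPUT are not defined by the library; the values
    # below are placeholders, so point them at your own directories.
    DIR_INPUT = "./"
    DIR_OUTPUT = "./"

    input_pdf = DIR_INPUT + "sample.pdf"
    output_txt = DIR_OUTPUT + "convert_pdf_to_txt.txt"

    # Create the extractor facade, bind the PDF, and extract its text
    extractor = extractor_create()
    extractor_bind_pdf(extractor, input_pdf)
    text = extractor_extract_text(extractor)

    # Write the extracted text to the output file
    with open(output_txt, 'w', encoding='utf-8') as f:
        f.write(text)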
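
The same three extractor calls can be reused to convert every PDF in a folder. The following is a minimal sketch, not part of the Aspose.PDF API: the helper names `extract_with_aspose` and `convert_folder` are hypothetical, and it assumes the functions used in the snippet above can be imported by name from `AsposePdfPython`. The output is written as UTF-8 explicitly, because the default encoding of `open` depends on the system locale and may fail on non-ASCII text.

```python
from pathlib import Path
from typing import Callable, List


def extract_with_aspose(pdf_path: str) -> str:
    """Run the extractor calls from the snippet above on a single file."""
    # Imported lazily so convert_folder can also be used with another extractor.
    # Assumption: these names are importable individually from AsposePdfPython.
    from AsposePdfPython import (
        extractor_bind_pdf,
        extractor_create,
        extractor_extract_text,
    )

    extractor = extractor_create()
    extractor_bind_pdf(extractor, pdf_path)
    return extractor_extract_text(extractor)


def convert_folder(
    src_dir: str,
    dst_dir: str,
    extract: Callable[[str], str] = extract_with_aspose,
) -> List[Path]:
    """Write one UTF-8 .txt file per *.pdf in src_dir; return the paths written."""
    out_dir = Path(dst_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf in sorted(Path(src_dir).glob("*.pdf")):
        target = out_dir / (pdf.stem + ".txt")
        # Explicit encoding avoids locale-dependent failures on non-ASCII text.
        target.write_text(extract(str(pdf)), encoding="utf-8")
        written.append(target)
    return written
```

Calling `convert_folder("input", "output")` would then write `output/sample.txt` for `input/sample.pdf`, and so on for each PDF in the folder. The `extract` parameter is there only so that a different extraction function can be passed in, for example when trying the helper without the library installed.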

Convert PDF to Different Image Formats in Python