Convert PDF to Text in Python

Convert PDF to Text

Aspose.PDF for Python supports converting a whole PDF document or a single page to a text file.

Convert PDF document to Text file

You can convert a PDF document to a TXT file with the PDF extractor facade:

  1. Create the input and output file paths.
  2. Create an instance of the PDF extractor facade with [extractor_create](https://reference.aspose.com/pdf/python-cpp/core/extractor_create/).
  3. Bind the PDF file to the extractor with extractor_bind_pdf.
  4. Extract the text from the PDF file with extractor_extract_text.
  5. Write the extracted text to the output TXT file.
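Building paths by string concatenation, as in DIR_INPUT + "sample.pdf", only works when the folder variable ends with a path separator. A minimal sketch of step 1 using the standard-library pathlib module instead; make_paths and the folder names are hypothetical and not part of Aspose.PDF:

```python
from pathlib import Path
from typing import Tuple


def make_paths(input_dir: str, output_dir: str, pdf_name: str) -> Tuple[Path, Path]:
    """Step 1 (hypothetical helper): build the input PDF path and a matching .txt path."""
    input_pdf = Path(input_dir) / pdf_name
    # Reuse the PDF's base name and swap the extension for .txt
    output_txt = Path(output_dir) / input_pdf.with_suffix(".txt").name
    # Create the output folder now so writing the text file cannot fail on a missing directory
    output_txt.parent.mkdir(parents=True, exist_ok=True)
    return input_pdf, output_txt


input_pdf, output_txt = make_paths("input", "output", "sample.pdf")
print(input_pdf, output_txt)
```

The `/` operator inserts the separator for the current platform, so "input" and "input/" give the same result.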

The following code snippet shows how to extract the text from all pages and write it to a TXT file.


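    from AsposePdfPython import *

    # Input and output folders (adjust to your environment; must end with a separator)
    DIR_INPUT = "input/"
    DIR_OUTPUT = "output/"

    input_pdf = DIR_INPUT + "sample.pdf"
    output_txt = DIR_OUTPUT + "convert_pdf_to_txt.txt"

    # Create the PDF extractor facade and bind it to the source PDF
    extractor = extractor_create()
    extractor_bind_pdf(extractor, input_pdf)

    # Extract the text from all pages of the document
    text = extractor_extract_text(extractor)

    # Write the extracted text to the output TXT file as UTF-8
    with open(output_txt, 'w', encoding='utf-8') as f:
        f.write(text)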
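To convert a whole folder, the same extractor calls can be wrapped in a function and applied to each PDF. A minimal sketch: convert_folder and aspose_extract_text are hypothetical helper names, and only the three AsposePdfPython functions used in the snippet above are assumed to exist. The extraction step is a parameter, so the file handling can be tried with a stand-in function when the library is not installed.

```python
from pathlib import Path
from typing import Callable, List


def aspose_extract_text(pdf_path: str) -> str:
    """Extract all text from one PDF with the extractor facade (hypothetical wrapper)."""
    # Imported lazily so the batch logic below works without the library installed
    from AsposePdfPython import extractor_create, extractor_bind_pdf, extractor_extract_text

    extractor = extractor_create()
    extractor_bind_pdf(extractor, pdf_path)
    return extractor_extract_text(extractor)


def convert_folder(input_dir: str, output_dir: str,
                   extract_text: Callable[[str], str] = aspose_extract_text) -> List[Path]:
    """Convert every *.pdf in input_dir to a UTF-8 .txt file in output_dir."""
    out_root = Path(output_dir)
    out_root.mkdir(parents=True, exist_ok=True)
    written = []
    # Sort for a predictable processing order
    for pdf in sorted(Path(input_dir).glob("*.pdf")):
        txt = out_root / (pdf.stem + ".txt")
        txt.write_text(extract_text(str(pdf)), encoding="utf-8")
        written.append(txt)
    return written
```

Path.glob("*.pdf") is case-sensitive on POSIX systems, so files named with a .PDF extension would be skipped there.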