Convert PDF to TXT in Python

Convert PDF to TXT

Aspose.PDF for Python via C++ supports converting a PDF document to a text file in the following steps:

  1. Creating the input and output file paths
  2. Creating an instance of the PDF extractor facade with [extractor_create](https://reference.aspose.com/pdf/python-cpp/core/extractor_create/)
  3. Binding the PDF file to the extractor with extractor_bind_pdf
  4. Extracting the text from the PDF file using extractor_extract_text
  5. Writing the extracted text to the output file

The code snippet below shows how to convert a PDF document to a TXT file using Python via C++:


    import AsposePDFPython as apCore
    import os

    # Creating the data directory path
    dataDir = os.path.join(os.getcwd(), "samples")

    # Creating the input file path
    input_file = os.path.join(dataDir, "sample.pdf")

    # Creating the output file path
    output_file = os.path.join(dataDir, "results", "pdf-to-txt.txt")

    # Making sure the output folder exists before writing to it
    os.makedirs(os.path.dirname(output_file), exist_ok=True)

    # Creating an instance of the PDF extractor facade
    extractor = apCore.facades_pdf_extractor_create()

    # Binding the PDF file to the extractor
    apCore.facades_facade_bind_pdf(extractor, input_file)

    # Extracting the text from the PDF file
    text = apCore.facades_pdf_extractor_extract_text(extractor)

    # Writing the extracted text to the output file
    # (UTF-8 is an assumption here; change the encoding if your text needs another one)
    with open(output_file, 'w', encoding='utf-8') as f:
        f.write(text)
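
The steps above can also be wrapped in a small reusable function. The following is only a sketch under stated assumptions: `convert_pdf_to_txt`, `extract_text_with_aspose`, and the `extract` parameter are hypothetical names introduced for illustration and are not part of Aspose.PDF. The three facade calls are copied unchanged from the snippet above, and UTF-8 output is assumed.

```python
import os


def extract_text_with_aspose(input_file):
    """Extract text with the same three facade calls used in the snippet above."""
    # Imported lazily so this module still loads where the package is not installed.
    import AsposePDFPython as apCore

    extractor = apCore.facades_pdf_extractor_create()
    apCore.facades_facade_bind_pdf(extractor, input_file)
    return apCore.facades_pdf_extractor_extract_text(extractor)


def convert_pdf_to_txt(input_file, output_file, extract=extract_text_with_aspose):
    """Write the text of one PDF to output_file; return the number of characters written.

    `extract` is a hypothetical hook: any callable that takes a PDF path and
    returns a string. It defaults to the Aspose-based helper above.
    """
    if not os.path.isfile(input_file):
        raise FileNotFoundError(f"Input PDF not found: {input_file}")

    text = extract(input_file)

    # Create the output folder if it does not exist yet.
    out_dir = os.path.dirname(output_file)
    if out_dir:
        os.makedirs(out_dir, exist_ok=True)

    # UTF-8 is assumed; adjust the encoding if your documents need another one.
    with open(output_file, "w", encoding="utf-8") as f:
        f.write(text)
    return len(text)
```

With the file layout from the snippet above, a call might look like `convert_pdf_to_txt(input_file, output_file)`. Passing the extraction step as a parameter keeps the file handling testable without the library installed.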