Convert PDF to TXT in Python
Convert PDF to TXT
Aspose.PDF for Python via C++ supports converting a PDF document to a text file through the following steps:
- Creating the input, and output file path
- Creating an instance of the PDF extractor facade with [extractor_create](https://reference.aspose.com/pdf/python-cpp/core/extractor_create/)
- Binding the PDF file to the extractor with extractor_bind_pdf
- Extracting the text from the PDF file using extractor_extract_text
- Writing the extracted text to the output file
The code snippet below shows how to convert a PDF document to a text file using Python via C++:
import AsposePDFPython as apCore
import os
import os.path
# Creating the data directory path
dataDir = os.path.join(os.getcwd(), "samples")
# Creating the input file path
input_file = os.path.join(dataDir, "sample.pdf")
# Creating the output file path
output_file = os.path.join(dataDir, "results", "pdf-to-txt.txt")
# Creating an instance of the PDF extractor facade
extractor = apCore.facades_pdf_extractor_create()
# Binding the PDF file to the extractor
apCore.facades_facade_bind_pdf(extractor, input_file)
# Extracting the text from the PDF file
text = apCore.facades_pdf_extractor_extract_text(extractor)
# Writing the extracted text to the output file
with open(output_file, 'w') as f:
    f.write(text)
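
The final write step assumes that the output directory already exists and relies on the platform's default text encoding, which may not handle every character found in a PDF. The sketch below is one possible way to make that step more robust using only the Python standard library. The helper name `write_text_file` is hypothetical and is not part of Aspose.PDF; it simply takes the already extracted string and a target path.

```python
from pathlib import Path


def write_text_file(text: str, output_path: str) -> Path:
    """Write text to output_path as UTF-8, creating parent folders if needed.

    Hypothetical helper for illustration; not part of the Aspose.PDF API.
    """
    path = Path(output_path)
    # Create the parent directory (for example "results") when it is missing.
    path.parent.mkdir(parents=True, exist_ok=True)
    # Use an explicit encoding so non-ASCII characters are written consistently.
    path.write_text(text, encoding="utf-8")
    return path
```

With this helper, the last two lines of the snippet above could be replaced by `write_text_file(text, output_file)`.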