Convert PDF to Text in Python
Convert PDF to Text
Aspose.PDF for Python supports converting a whole PDF document or a single page to a text file.
Convert PDF document to Text file
You can convert a PDF document to a TXT file using the PDF extractor facade.
- Create the input and output file paths.
- Create an instance of the PDF extractor facade with [extractor_create](https://reference.aspose.com/pdf/python-cpp/core/extractor_create/).
- Bind the PDF file to the extractor with extractor_bind_pdf.
- Extract the text from the PDF file using extractor_extract_text.
- Write the extracted text to the output file.
The following code snippet shows how to extract the text from all pages of a PDF document.
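from AsposePdfPython import *

# DIR_INPUT and DIR_OUTPUT are assumed placeholder folders; adjust them to your setup
DIR_INPUT = "./"
DIR_OUTPUT = "./"
input_pdf = DIR_INPUT + "sample.pdf"
output_txt = DIR_OUTPUT + "convert_pdf_to_txt.txt"

# Create the extractor facade and bind the source PDF to it
extractor = extractor_create()
extractor_bind_pdf(extractor, input_pdf)
# Extract the text from all pages, then write it to the output file
text = extractor_extract_text(extractor)
with open(output_txt, 'w') as f:
    f.write(text)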
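The last step, writing the extracted text to the output file, can be made a little more robust. Calling open() without an encoding uses a platform-dependent default, which may fail on non-ASCII characters that often appear in PDF text. The sketch below is one possible approach using only the Python standard library; the helper name save_text is hypothetical and is not part of Aspose.PDF.

```python
from pathlib import Path


def save_text(text: str, output_path: str) -> Path:
    """Write text to output_path as UTF-8, creating missing parent folders.

    save_text is a hypothetical helper for illustration, not an Aspose.PDF API.
    """
    path = Path(output_path)
    # Create the output folder if it does not exist yet
    path.parent.mkdir(parents=True, exist_ok=True)
    # An explicit encoding avoids relying on the platform default
    path.write_text(text, encoding="utf-8")
    return path
```

Under these assumptions, the string returned by extractor_extract_text could be passed to save_text together with the output path, in place of the plain open() and write() calls.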