Convert PDF to Microsoft Word Documents in Python
Convert PDF to DOC
One of the most popular features is the PDF to Microsoft Word DOC conversion, which makes content management easier. Aspose.PDF for Python via .NET allows you to convert PDF files not only to DOC but also to DOCX format, easily and efficiently.
The DocSaveOptions class provides numerous properties that improve the process of converting PDF files to DOC format. Among these properties, Mode lets you specify the recognition mode for PDF content. You can assign any value from the RecognitionMode enumeration to this property. Each value has its own benefits and limitations.
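As a rough illustration of the Mode property, the sketch below sets a recognition mode before saving. It rests on assumptions that should be checked against the API reference for your installed version: that the Python binding exposes the property as `mode`, that the enumeration is reachable as `apdf.DocSaveOptions.RecognitionMode`, and that it has a member named `FLOW`. The helper names `resolve_recognition_mode` and `convert_with_mode` are hypothetical and not part of the library.

```python
def resolve_recognition_mode(mode_enum, name):
    """Look up a recognition mode by name on the given enumeration.

    Raises ValueError with a readable message if the member does not exist.
    """
    try:
        return getattr(mode_enum, name.upper())
    except AttributeError:
        raise ValueError(f"unknown recognition mode: {name!r}") from None


def convert_with_mode(path_infile, path_outfile, mode_name="FLOW"):
    """Convert a PDF to DOC using the requested recognition mode."""
    # Imported here so the helper above can be used without the library installed.
    import aspose.pdf as apdf

    document = apdf.Document(path_infile)
    save_options = apdf.DocSaveOptions()
    save_options.format = apdf.DocSaveOptions.DocFormat.DOC
    # Assumption: the property is named `mode` and the enum has a FLOW member.
    save_options.mode = resolve_recognition_mode(
        apdf.DocSaveOptions.RecognitionMode, mode_name
    )
    document.save(path_outfile, save_options)
```

Resolving the member by name keeps the mode configurable from a command line or settings file, and an unsupported name fails early with a clear error rather than during the save.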
Steps: Convert PDF to DOC in Python
- Load the PDF into an ‘apdf.Document’ object.
- Create a ‘DocSaveOptions’ instance.
- Set the format property to ‘DocFormat.DOC’ to ensure the output is in .doc format (older Word format).
- Save the PDF as a Word document using the specified save options.
- Print a confirmation message.
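from os import path
import aspose.pdf as apdf

def convert_pdf_to_doc(data_dir, infile, outfile):
    # Build the input and output paths under the data directory
    path_infile = path.join(data_dir, infile)
    path_outfile = path.join(data_dir, "python", outfile)
    # Load the source PDF
    document = apdf.Document(path_infile)
    # Configure the save options for the older .doc format
    save_options = apdf.DocSaveOptions()
    save_options.format = apdf.DocSaveOptions.DocFormat.DOC
    # Save as a Word document and report the result
    document.save(path_outfile, save_options)
    print(infile + " converted into " + outfile)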
Try to convert PDF to DOC online
Aspose.PDF for Python offers a free online application, “PDF to DOC”, where you can explore the functionality and the quality of its output.
Convert PDF to DOCX
The Aspose.PDF for Python API lets you read and convert PDF documents to DOCX using Python via .NET. DOCX is a well-known format for Microsoft Word documents whose structure was changed from plain binary to a combination of XML and binary files. DOCX files can be opened with Word 2007 and later versions, but not with earlier versions of MS Word, which support the DOC file extension.
The following Python code snippet shows the process of converting a PDF file into DOCX format.
Steps: Convert PDF to DOCX in Python
- Load the source PDF using ‘apdf.Document’.
- Create an instance of ‘DocSaveOptions’.
- Set the format property to ‘DocFormat.DOC_X’ to generate a .docx file (modern Word format).
- Save the PDF as a DOCX file with the configured save options.
- Print a confirmation message after conversion.
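from os import path
import aspose.pdf as apdf

def convert_pdf_to_docx(data_dir, infile, outfile):
    # Build the input and output paths under the data directory
    path_infile = path.join(data_dir, infile)
    path_outfile = path.join(data_dir, "python", outfile)
    # Load the source PDF
    document = apdf.Document(path_infile)
    # Configure the save options for the modern .docx format
    save_options = apdf.DocSaveOptions()
    save_options.format = apdf.DocSaveOptions.DocFormat.DOC_X
    # Save as a DOCX file and report the result
    document.save(path_outfile, save_options)
    print(infile + " converted into " + outfile)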
The DocSaveOptions class has a property named format that specifies the format of the resulting document, that is, DOC or DOCX. To convert a PDF file to DOCX format, set this property to the DOC_X value from the DocSaveOptions.DocFormat enumeration.
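Since the two conversions differ only in the format value, one possible approach is to derive that value from the output file extension. The following is a minimal sketch under that assumption. The helper names `doc_format_name` and `convert_pdf_to_word` are hypothetical and not part of Aspose.PDF, while `DOC` and `DOC_X` are the enumeration members used in the snippets above.

```python
from os import path


def doc_format_name(path_outfile):
    """Return the DocFormat member name matching the output file extension."""
    formats = {".doc": "DOC", ".docx": "DOC_X"}
    extension = path.splitext(path_outfile)[1].lower()
    if extension not in formats:
        raise ValueError(f"unsupported Word extension: {extension!r}")
    return formats[extension]


def convert_pdf_to_word(path_infile, path_outfile):
    """Convert a PDF to DOC or DOCX, depending on the output extension."""
    # Imported here so doc_format_name can be used without the library installed.
    import aspose.pdf as apdf

    document = apdf.Document(path_infile)
    save_options = apdf.DocSaveOptions()
    # Pick DocFormat.DOC or DocFormat.DOC_X by name.
    save_options.format = getattr(
        apdf.DocSaveOptions.DocFormat, doc_format_name(path_outfile)
    )
    document.save(path_outfile, save_options)
```

For example, calling `convert_pdf_to_word("sample.pdf", "sample.docx")` would select `DOC_X`, whereas an output name ending in `.doc` would select `DOC`. The file names here are placeholders.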
Try to convert PDF to DOCX online
Aspose.PDF for Python offers a free online application, “PDF to Word”, where you can explore the functionality and the quality of its output.
