Convert PDF to Microsoft Word Documents in Python
Overview
This article explains how to convert PDF to Microsoft Word Documents using Python. It covers these topics.
Format: DOC
Format: DOCX
Format: Word
Python PDF to DOC and DOCX Conversion
One of the most popular features is the PDF to Microsoft Word DOC conversion, which makes content management easier. Aspose.PDF for Python allows you to convert PDF files not only to DOC but also to DOCX format, easily and efficiently.
Convert PDF to DOC (Word 97-2003) file
Convert a PDF file to DOC format with ease and full control. Aspose.PDF for Python is flexible and supports a wide variety of conversions. Converting pages from PDF documents to images, for example, is a very popular feature.
A conversion that many of our customers have requested is PDF to DOC: converting a PDF file to a Microsoft Word document. Customers want this because PDF files cannot easily be edited, whereas Word documents can. Some companies want their users to be able to manipulate text, tables and images in files that started as PDFs.
True to its goal of keeping things simple and understandable, Aspose.PDF for Python lets you convert a source PDF file into a DOC file in two lines of code. The SaveFormat enumeration supports this: its Doc value saves the source file in Microsoft Word format.
The following Python code snippet shows the process of converting a PDF file into DOC format.
Steps: Convert PDF to DOC in Python
- Create an instance of the Document object with the source PDF document.
- Save it in SaveFormat.Doc format by calling the Document.save() method.
from asposepdf import Api
documentName = "testdata/Hello.pdf"
doc = Api.Document(documentName)
documentOutName = "testout/out.doc"
doc.save(documentOutName, Api.SaveFormat.Doc)
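The snippet above writes to testout/out.doc and assumes that folder already exists. As a minimal sketch using only the standard library, a helper like the following could derive the output path from the input file name and create the output folder first. The name build_output_path is hypothetical and is not part of Aspose.PDF.

```python
from pathlib import Path


def build_output_path(input_pdf, output_dir, extension):
    """Return output_dir/<input file stem>.<extension>, creating output_dir if needed.

    Hypothetical helper for illustration; not part of Aspose.PDF.
    """
    out_dir = Path(output_dir)
    # Create the output folder (and any missing parents) before saving into it.
    out_dir.mkdir(parents=True, exist_ok=True)
    # Accept both "doc" and ".doc" as the extension argument.
    return out_dir / (Path(input_pdf).stem + "." + extension.lstrip("."))
```

The returned path could then be passed to the save call shown above, for example doc.save(str(build_output_path(documentName, "testout", "doc")), Api.SaveFormat.Doc).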
Using the DocSaveOptions Class
The DocSaveOptions class provides numerous properties that improve the process of converting PDF files to DOC format. Among these properties, Mode enables you to specify the recognition mode for PDF content. You can specify any value from the RecognitionMode enumeration for this property. Each of these values has specific benefits and limitations. The following code snippet uses the Flow mode.
from asposepdf import Api
DIR_INPUT = "testdata/"
DIR_OUTPUT = "testout/"
input_pdf = DIR_INPUT + "Hello.pdf"
output_pdf = DIR_OUTPUT + "convert_pdf_to_doc_with_options.doc"
# Open PDF document
document = Api.Document(input_pdf)
save_options = Api.DocSaveOptions()
save_options.format = Api.DocSaveOptions.DocFormat.Doc
# Set the recognition mode as Flow
save_options.mode = Api.DocSaveOptions.RecognitionMode.Flow
# Set the Horizontal proximity as 2.5
save_options.relative_horizontal_proximity = 2.5
# Enable the value to recognize bullets during conversion process
save_options.recognize_bullets = True
# Save the file into MS Word document format
document.save(output_pdf, save_options)
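When the same conversion settings are reused across several documents, it can help to keep them in one dictionary and copy them onto the options object. The following is a rough sketch, not an Aspose.PDF feature: apply_settings is a hypothetical helper, and it assumes that DocSaveOptions exposes its properties as ordinary Python attributes, as the assignments in the snippet above suggest.

```python
def apply_settings(options, settings):
    """Copy each name/value pair in settings onto the options object.

    Hypothetical helper for illustration; assumes the options object
    exposes its properties as regular attributes.
    """
    for name, value in settings.items():
        # Reject unknown names so a typo does not silently add a new attribute.
        if not hasattr(options, name):
            raise AttributeError(f"unknown save option: {name}")
        setattr(options, name, value)
    return options


# Possible usage with the property names from the snippet above:
# settings = {"relative_horizontal_proximity": 2.5, "recognize_bullets": True}
# save_options = apply_settings(Api.DocSaveOptions(), settings)
```

Failing on unknown names is a deliberate choice here: a misspelled property would otherwise be ignored by the conversion without any warning.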
Try to convert PDF to DOC online
Aspose.PDF for Python offers a free online application, “PDF to DOC”, where you can try out the functionality and check the quality of the results.
Convert PDF to DOCX
Aspose.PDF for Python API lets you read and convert PDF documents to DOCX using Python via .NET. DOCX is a well-known format for Microsoft Word documents whose structure was changed from plain binary to a combination of XML and binary files. DOCX files can be opened with Word 2007 and later versions, but not with earlier versions of MS Word, which support only the DOC format.
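The structural difference between the two formats can be used to sanity-check a converted file. The sketch below relies on two facts about the formats rather than on Aspose.PDF: a DOCX file is a ZIP package that conventionally contains word/document.xml, and a legacy DOC file is an OLE2 compound file that starts with a fixed 8-byte signature. The function name detect_word_format is hypothetical. Other legacy Office files share the OLE2 signature, so a "doc" result is only a hint.

```python
import io
import zipfile

# Signature that opens every OLE2 compound file, the container used by legacy .doc files.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def detect_word_format(data):
    """Return "docx", "doc", or "unknown" for the given file contents (bytes)."""
    if data.startswith(OLE2_MAGIC):
        return "doc"
    buffer = io.BytesIO(data)
    if zipfile.is_zipfile(buffer):
        with zipfile.ZipFile(buffer) as package:
            # Word stores the main document part at this location by convention.
            if "word/document.xml" in package.namelist():
                return "docx"
    return "unknown"
```

For a file on disk, the contents could be read with open(path, "rb").read() and passed to the function.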
The following Python code snippet shows the process of converting a PDF file into DOCX format.
Steps: Convert PDF to DOCX in Python
- Create an instance of the Document object with the source PDF document.
- Save it in SaveFormat.DocX format by calling the Document.save() method.
from asposepdf import Api
DIR_INPUT = "testdata/"
DIR_OUTPUT = "testout/"
input_pdf = DIR_INPUT + "Hello.pdf"
output_pdf = DIR_OUTPUT + "convert_pdf_to_doc_with_options.docx"
# Open PDF document
document = Api.Document(input_pdf)
save_options = Api.DocSaveOptions()
save_options.format = Api.DocSaveOptions.DocFormat.Docx
# Set the recognition mode as Flow
save_options.mode = Api.DocSaveOptions.RecognitionMode.Flow
# Set the Horizontal proximity as 2.5
save_options.relative_horizontal_proximity = 2.5
# Enable the value to recognize bullets during conversion process
save_options.recognize_bullets = True
# Save the file into MS Word document format
document.save(output_pdf, save_options)
The DocSaveOptions class has a property named Format that specifies the format of the resulting document, that is, DOC or DOCX. To convert a PDF file to DOCX format, set this property to the Docx value from the DocSaveOptions.DocFormat enumeration.
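Because the only difference between the DOC and DOCX snippets is the DocFormat value, the value could be chosen from the output file extension. The following is a minimal sketch under that assumption; doc_format_name and DOC_FORMAT_NAMES are hypothetical names, and the member names Doc and Docx are taken from the snippets above.

```python
from pathlib import Path

# Maps an output file extension to the DocFormat member name used in this article.
DOC_FORMAT_NAMES = {".doc": "Doc", ".docx": "Docx"}


def doc_format_name(output_path):
    """Return "Doc" or "Docx" depending on the extension of output_path."""
    suffix = Path(output_path).suffix.lower()
    try:
        return DOC_FORMAT_NAMES[suffix]
    except KeyError:
        raise ValueError(f"unsupported Word extension: {suffix!r}") from None


# Possible usage with the options object from the snippet above:
# save_options.format = getattr(Api.DocSaveOptions.DocFormat, doc_format_name(output_pdf))
```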
Try to convert PDF to DOCX online
Aspose.PDF for Python offers a free online application, “PDF to Word”, where you can try out the functionality and check the quality of the results.
See Also
This article also covers these topics. The code is the same as above.
Format: Word
- Python PDF to Word Code
- Python PDF to Word API
- Python PDF to Word Programmatically
- Python PDF to Word Library
- Python Save PDF as Word
- Python Generate Word from PDF
- Python Create Word from PDF
- Python PDF to Word Converter
Format: DOC
- Python PDF to DOC Code
- Python PDF to DOC API
- Python PDF to DOC Programmatically
- Python PDF to DOC Library
- Python Save PDF as DOC
- Python Generate DOC from PDF
- Python Create DOC from PDF
- Python PDF to DOC Converter
Format: DOCX