Convert HTML to DOCX – Python Code Examples

A DOCX file is a Microsoft Word document that typically contains text but can also hold a wide range of data, including tables, raster and vector graphics, video, sound, and diagrams. DOCX files are highly editable, easy to use, and manageable in size. The format is popular because of the variety of options it gives users for writing any type of document.

Using the Converter.convert_html() methods is the most common way to convert HTML into various formats. With Aspose.HTML for Python via .NET, you can convert HTML to DOCX programmatically with full control over a wide range of conversion parameters. This article explains how to convert HTML to DOCX using the convert_html() methods of the Converter class and how to apply DocSaveOptions. You can also try the Online HTML Converter to test the Aspose.HTML functionality and convert HTML on the fly.

To follow this tutorial, install and configure Aspose.HTML for Python via .NET in your Python project. The code examples in this article show how to convert HTML to DOCX using the Python library.

Online HTML Converter

You can test the functionality of the Aspose.HTML for Python via .NET API and perform real-time HTML conversions. Load an HTML file from your local system or a URL, select the desired output format, and run the example. Default save options are applied, and you receive the converted file instantly.

Convert HTML to DOCX – Python Code Examples

Converting HTML to another format with the convert_html() method is a sequence of operations that includes loading and saving the document:

  1. Load an HTML file using the HTMLDocument class.
  2. Create a new DocSaveOptions object. The DocSaveOptions class provides numerous properties that give you full control over a wide range of parameters and improve the process of converting HTML to DOCX format.
  3. Use one of the convert_html() methods to save HTML as a DOCX file. To do so, pass the HTMLDocument, the DocSaveOptions, and the output file path to the convert_html() method.
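
The three steps above can be wrapped in one small function. This is a minimal sketch rather than a helper provided by the library: `convert_html_file` and `docx_path_for` are hypothetical names, the module locations of `HTMLDocument`, `Converter`, and `DocSaveOptions` are assumed from the wildcard imports used in this article's examples, and the Aspose imports are deferred into the function body so the path logic can be used without the library installed.

```python
from pathlib import Path


def docx_path_for(html_path, output_dir):
    """Return the .docx path in output_dir that matches the HTML file name."""
    return Path(output_dir) / (Path(html_path).stem + ".docx")


def convert_html_file(html_path, output_dir="output"):
    """Load an HTML file, create save options, and save it as DOCX."""
    # Deferred imports (assumed module locations): Aspose.HTML for Python
    # via .NET must be installed for this function to run.
    from aspose.html import HTMLDocument
    from aspose.html.converters import Converter
    from aspose.html.saving import DocSaveOptions

    save_path = docx_path_for(html_path, output_dir)
    save_path.parent.mkdir(parents=True, exist_ok=True)

    document = HTMLDocument(str(html_path))                     # step 1: load
    options = DocSaveOptions()                                  # step 2: options
    Converter.convert_html(document, options, str(save_path))   # step 3: save
    return save_path
```

With the library installed, calling `convert_html_file("data/document.html")` would be expected to write `output/document.docx`.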

HTML to DOCX in a single line of code

The static methods of the Converter class are the easiest way to convert HTML into various formats. You can convert HTML to DOCX in your Python application with literally a single line of code!

from aspose.html import *
from aspose.html.converters import *
from aspose.html.saving import *

# Convert HTML to DOCX
Converter.convert_html("document.html", DocSaveOptions(), "document.docx")
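
The same single-line call can be applied to a whole folder of files. The sketch below is one possible arrangement, not part of the Aspose API: `convert_folder` and `aspose_convert` are hypothetical names, and the conversion step is passed in as a callable so the folder logic can be exercised without the library installed.

```python
from pathlib import Path


def aspose_convert(src, dst):
    """Convert one file with the single-line Converter.convert_html() call."""
    # Deferred imports (assumed module locations): requires
    # Aspose.HTML for Python via .NET.
    from aspose.html.converters import Converter
    from aspose.html.saving import DocSaveOptions

    Converter.convert_html(str(src), DocSaveOptions(), str(dst))


def convert_folder(input_dir, output_dir, convert=aspose_convert):
    """Convert every .html file in input_dir and return the output paths."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    results = []
    # Sort for a stable processing order; non-HTML files are ignored.
    for src in sorted(Path(input_dir).glob("*.html")):
        dst = out / (src.stem + ".docx")
        convert(src, dst)
        results.append(dst)
    return results
```

Passing a different `convert` callable, for example one that only logs its arguments, gives a dry run that shows which files would be converted.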

Convert HTML to DOCX using DocSaveOptions

The following Python code snippet shows how to convert HTML to DOCX with DocSaveOptions, specifying the document format, font embedding rule, CSS media type, and resolution:

import os
from aspose.html import *
from aspose.html.saving import *
from aspose.html.drawing import *
from aspose.html.converters import *
from aspose.html.rendering import *  # assumed location of the MediaType enum
from aspose.html.rendering.doc import *

# Set up directories and define paths
output_dir = "output/"
input_dir = "data/"
os.makedirs(output_dir, exist_ok=True)

document_path = os.path.join(input_dir, "document.html")
save_path = os.path.join(output_dir, "output1.docx")

# Initialize an HTML document from the file
document = HTMLDocument(document_path)

# Initialize DocSaveOptions
options = DocSaveOptions()

# Customize save options for DOCX by assigning enum values
# (DocumentFormat, FontEmbeddingRule, and MediaType are assumed enum names)
options.document_format = DocumentFormat.DOCX
options.font_embedding_rule = FontEmbeddingRule.FULL
options.css.media_type = MediaType.PRINT
options.horizontal_resolution = Resolution.from_dots_per_inch(96.0)
options.vertical_resolution = Resolution.from_dots_per_inch(96.0)

# Convert HTML to DOCX
Converter.convert_html(document, options, save_path)

print(f"HTML document converted to DOCX successfully and saved to {save_path}")

In this example, we convert an HTML document to a DOCX file using save options. The process involves initializing the HTML document; setting custom save options such as the document format, font embedding rule, CSS media type, and resolution; and then performing the conversion. The converted DOCX file is saved to the specified output directory.
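
After a conversion, it can be useful to confirm that the saved file is a well-formed DOCX package. A DOCX file is a ZIP archive, so a quick sanity check needs only the Python standard library. The sketch below is not part of Aspose.HTML: `looks_like_docx` is a hypothetical helper, and it assumes the conventional part name `word/document.xml`, although the format allows the main document part to be located elsewhere through package relationships.

```python
import zipfile


def looks_like_docx(path):
    """Return True if path is a ZIP package with the core DOCX parts."""
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as package:
        names = set(package.namelist())
    # Both parts are expected in a conventionally laid out DOCX package.
    return {"[Content_Types].xml", "word/document.xml"} <= names
```

For example, `looks_like_docx("output/output1.docx")` could be called right after the conversion to catch an empty or truncated output file before it is passed on.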

Save Options – DocSaveOptions Class

The DocSaveOptions class lets you fine-tune the conversion of HTML documents to the DOCX format. Some of its properties are inherited from base classes such as DocRenderingOptions or RenderingOptions. DocSaveOptions is configured to save the document as DOCX, and its properties include document_format, font_embedding_rule, css, horizontal_resolution, and vertical_resolution, all of which are used in the example above.

Download the Aspose.HTML for Python via .NET library to convert your HTML, MHTML, EPUB, SVG, and Markdown documents to the most popular formats quickly and easily.

Aspose.HTML offers a free online HTML to DOCX Converter that converts HTML to DOCX quickly, easily, and with high quality. Just upload your files, convert them, and get the results in a few seconds!
