Convert HTML to DOCX – Python Code Examples

A DOCX file is a Microsoft Word document that typically contains text but can also hold a wide range of data, including tables, raster and vector graphics, video, sounds, and diagrams. DOCX files are highly editable, easy to use, and manageable in size. The format is popular because of the variety of options it offers for writing any type of document.

The Converter.convert_html() methods are the most common way to convert HTML code into various formats. With Aspose.HTML for Python via .NET, you can convert HTML to DOCX programmatically with full control over a wide range of conversion parameters. In this article, you will find information on how to convert HTML to DOCX using the convert_html() methods of the Converter class and how to apply DocSaveOptions. You can also try the Online HTML Converter to test the Aspose.HTML functionality and convert HTML on the fly.

To follow this tutorial, install and configure Aspose.HTML for Python via .NET in your Python project. The code examples below show how to convert HTML to DOCX using the Python library.

Online HTML Converter

You can test the functionality of the Aspose.HTML for Python via .NET API and perform real-time HTML conversions. Load an HTML file from your local system or a URL, select the desired output format, and run the example. Default save options are applied, and you will receive the converted file instantly.


Convert HTML to DOCX – Python Code Examples

Converting HTML to another format with the convert_html() method is a short sequence of operations, chiefly loading the document and saving it:

  1. Load an HTML file using the HTMLDocument class.
  2. Create a new DocSaveOptions object. The DocSaveOptions class provides numerous properties that give you full control over a wide range of parameters and improve the process of converting HTML to DOCX format.
  3. Use one of the convert_html() methods to save HTML as a DOCX file, passing the HTMLDocument, the DocSaveOptions, and the output file path to the method.

HTML to DOCX with one line of code

The static methods of the Converter class are the easiest way to convert HTML code into various formats. You can convert HTML to DOCX in your Python application with a single line of code:

# Convert HTML to DOCX using Python

import aspose.html.saving as sav
import aspose.html.converters as conv

# Convert HTML to DOCX
conv.Converter.convert_html("document.html", sav.DocSaveOptions(), "document.docx")
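
The same one-line call extends naturally to a whole folder of files. The sketch below is a minimal illustration rather than part of the Aspose.HTML API: the helper names `docx_path_for`, `aspose_convert`, and `convert_folder` are hypothetical, and the converter is passed in as a callable so the path handling can be exercised without the library installed. The default converter simply wraps the `Converter.convert_html()` call shown above.

```python
from pathlib import Path


def docx_path_for(html_path, output_dir):
    """Return the .docx path in output_dir that corresponds to html_path."""
    return Path(output_dir) / (Path(html_path).stem + ".docx")


def aspose_convert(source, target):
    """Default converter: the one-line Aspose.HTML call from this article."""
    # Imported lazily so the helpers in this sketch work without the library.
    import aspose.html.converters as conv
    import aspose.html.saving as sav

    conv.Converter.convert_html(str(source), sav.DocSaveOptions(), str(target))


def convert_folder(input_dir, output_dir, convert=aspose_convert):
    """Convert every *.html file in input_dir and return the output paths."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    results = []
    # Sort the inputs so the processing order is predictable.
    for source in sorted(Path(input_dir).glob("*.html")):
        target = docx_path_for(source, out)
        convert(source, target)
        results.append(target)
    return results
```

Calling `convert_folder("data", "output")` would then write one DOCX file per HTML file, each converted with default save options.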

Convert HTML to DOCX using DocSaveOptions

Let’s look at the following Python code snippet, which converts HTML to DOCX with DocSaveOptions specifying the page size, margins, document format, font embedding rule, CSS media type, resolution, and background color:

# Convert HTML to DOCX using Python with custom settings

import os
import aspose.html as ah
import aspose.html.saving as sav
import aspose.html.drawing as dr
import aspose.html.converters as conv
import aspose.html.rendering as rn
import aspose.html.rendering.doc as rd
import aspose.pydrawing as pd

# Setup directories and define paths
output_dir = "output/"
input_dir = "data/"
os.makedirs(output_dir, exist_ok=True)

document_path = os.path.join(input_dir, "document.html")
save_path = os.path.join(output_dir, "document.docx")

# Load an HTML document from a file or URL
doc = ah.HTMLDocument(document_path)

# Initialize saving options: page size and margins
options = sav.DocSaveOptions()
options.page_setup.any_page.size = dr.Size(300, 300)
page_margin = dr.Margin(40, 40, 10, 10)
options.page_setup.any_page.margin = page_margin

# Assign the enum values explicitly; a bare attribute access such as
# options.document_format.DOCX leaves the setting unchanged.
# Assumption: the enum module paths (rn, rd) mirror the .NET namespaces.
options.document_format = rd.DocumentFormat.DOCX
options.font_embedding_rule = rd.FontEmbeddingRule.FULL
options.css.media_type = rn.MediaType.PRINT

# Set the output resolution and background color
options.horizontal_resolution = dr.Resolution.from_dots_per_inch(300.0)
options.vertical_resolution = dr.Resolution.from_dots_per_inch(300.0)
options.background_color = pd.Color.bisque

# Convert HTML to DOCX
conv.Converter.convert_html(doc, options, save_path)

In this example, we convert an HTML document to a DOCX file using custom save options. The process involves loading the HTML document, setting save options such as page size and margins, document format, font embedding rule, CSS media type, background color, and resolution, and then performing the conversion. The converted DOCX file is saved to the specified output directory.
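
The example passes plain numbers to `dr.Size` and `dr.Margin`. Assuming those numbers are interpreted as CSS pixels (an assumption here, worth checking against the API reference), a standard paper size has to be converted from physical units first. The helpers below are a hypothetical convenience, not part of Aspose.HTML; they rely only on the CSS definition of 96 pixels per inch.

```python
CSS_PX_PER_INCH = 96.0  # CSS reference pixel: 1 in = 96 px
MM_PER_INCH = 25.4


def mm_to_px(mm):
    """Convert millimetres to CSS pixels."""
    return mm / MM_PER_INCH * CSS_PX_PER_INCH


def page_size_px(width_mm, height_mm):
    """Return (width, height) in whole CSS pixels for a page given in mm."""
    return round(mm_to_px(width_mm)), round(mm_to_px(height_mm))


# A4 paper measures 210 x 297 mm.
a4_width, a4_height = page_size_px(210, 297)
print(a4_width, a4_height)  # 794 1123
```

Under the pixel assumption, these values could be passed as `dr.Size(a4_width, a4_height)` in place of the 300 x 300 page used in the example.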

Save Options – DocSaveOptions Class

The DocSaveOptions class lets you fine-tune the conversion of HTML documents to the DOCX format. Some of its properties are inherited from base classes such as DocRenderingOptions or RenderingOptions. DocSaveOptions is configured to save the document as DOCX, and it includes, among others, the properties used in the example above: page_setup, document_format, font_embedding_rule, css, horizontal_resolution, vertical_resolution, and background_color.

Download the Aspose.HTML for Python via .NET library to successfully, quickly, and easily convert your HTML, MHTML, EPUB, SVG, and Markdown documents to the most popular formats.

Aspose.HTML offers a free online HTML to DOCX Converter that converts HTML to DOCX quickly, easily, and with high quality. Just upload your files, convert them, and get the results in a few seconds!
