Convert HTML to DOCX – Python Code Examples
A DOCX file is a Microsoft Word document that typically contains text but can also hold a wide range of data, including tables, raster and vector graphics, video, sounds, and diagrams. DOCX files are highly editable, easy to use, and manageable in size. The format is popular because of the variety of options it offers users for writing any type of document.
Using the Converter.convert_html() methods is the most common way to convert HTML code into various formats. With Aspose.HTML for Python via .NET, you can convert HTML to DOCX format programmatically with full control over a wide range of conversion parameters. In this article, you will find information on how to convert HTML to DOCX using the convert_html() methods of the Converter class and how to apply DocSaveOptions. Also, you can try an Online HTML Converter to test the Aspose.HTML functionality and convert HTML on the fly.
To follow this tutorial, install and configure Aspose.HTML for Python via .NET in your Python project. The code examples in this article show how to convert HTML to DOCX using the Python library.
Online HTML Converter
You can test the functionality of the Aspose.HTML for Python via .NET API and perform real-time HTML conversions. Load an HTML file from your local system or a URL, select the desired output format, and run the example. Default save options are applied, and you will receive the converted file instantly.
Convert HTML to DOCX – Python Code Examples
Converting HTML to another format using the convert_html() method is a sequence of operations that includes loading and saving the document:
- Load an HTML file using the HTMLDocument class.
- Create a new DocSaveOptions object. The DocSaveOptions class provides numerous properties that give you full control over a wide range of parameters and improve the process of converting HTML to DOCX format.
- Use one of the convert_html() methods to save HTML as a DOCX file. You pass the HTMLDocument, DocSaveOptions, and output file path to the convert_html() method.
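The three steps above can be sketched as a small function. This is a minimal sketch, not the library's prescribed pattern: the function name html_to_docx and the up-front file check are hypothetical additions for illustration, and the Aspose imports are placed inside the function so the check runs before the library is needed.

```python
from pathlib import Path


def html_to_docx(html_path, docx_path):
    """Hypothetical helper: load an HTML file, create options, convert to DOCX."""
    source = Path(html_path)
    # Fail early with a clear message if the input file is missing
    if not source.is_file():
        raise FileNotFoundError(f"HTML source not found: {source}")

    # Imported here so the check above does not depend on the library
    from aspose.html import HTMLDocument
    from aspose.html.converters import Converter
    from aspose.html.saving import DocSaveOptions

    target = Path(docx_path)
    target.parent.mkdir(parents=True, exist_ok=True)

    document = HTMLDocument(str(source))   # Step 1: load the HTML file
    options = DocSaveOptions()             # Step 2: create the save options
    Converter.convert_html(document, options, str(target))  # Step 3: convert
    return target
```

A call such as html_to_docx("data/document.html", "output/document.docx") would then run all three steps in order.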
HTML to DOCX with a single line of code
The static methods of the Converter class are the easiest way to convert HTML code into various formats. You can convert HTML to DOCX in your Python application with literally a single line of code!
from aspose.html import *
from aspose.html.converters import *
from aspose.html.saving import *

# Convert HTML to DOCX
Converter.convert_html("document.html", DocSaveOptions(), "document.docx")
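Because the conversion is a single call, it is easy to repeat for several files. The following is a minimal sketch that applies the same call to every .html file in a folder; the helper names plan_conversions and convert_folder are hypothetical and not part of the Aspose.HTML API.

```python
from pathlib import Path


def plan_conversions(input_dir, output_dir):
    """Pair every .html file in input_dir with a .docx target in output_dir."""
    return [
        (source, Path(output_dir) / (source.stem + ".docx"))
        for source in sorted(Path(input_dir).glob("*.html"))
    ]


def convert_folder(input_dir, output_dir):
    """Hypothetical helper: convert each planned file with default save options."""
    from aspose.html.converters import Converter
    from aspose.html.saving import DocSaveOptions

    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for source, target in plan_conversions(input_dir, output_dir):
        # The same single-line call as above, once per file
        Converter.convert_html(str(source), DocSaveOptions(), str(target))
```

Keeping the path planning separate from the conversion makes it possible to inspect which files would be produced before running the converter.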
Convert HTML to DOCX using DocSaveOptions
Let’s look at the following Python code snippet, which shows how to convert HTML to DOCX with DocSaveOptions specified:
import os
from aspose.html import *
from aspose.html.saving import *
from aspose.html.drawing import *
from aspose.html.converters import *
from aspose.html.rendering import MediaType
from aspose.html.rendering.doc import *

# Setup directories and define paths
output_dir = "output/"
input_dir = "data/"
if not os.path.exists(output_dir):
    os.makedirs(output_dir)

document_path = os.path.join(input_dir, "document.html")
save_path = os.path.join(output_dir, "output1.docx")

# Initialize an HTML document from the file
document = HTMLDocument(document_path)

# Initialize DocSaveOptions
options = DocSaveOptions()

# Customize save options for DOCX
# (assumption: DocumentFormat and FontEmbeddingRule come from aspose.html.rendering.doc,
# and MediaType from aspose.html.rendering)
options.document_format = DocumentFormat.DOCX
options.font_embedding_rule = FontEmbeddingRule.FULL
options.css.media_type = MediaType.PRINT
options.horizontal_resolution = Resolution.from_dots_per_inch(96.0)
options.vertical_resolution = Resolution.from_dots_per_inch(96.0)

# Convert HTML to DOCX
Converter.convert_html(document, options, save_path)

print(f"HTML document converted to DOCX successfully and saved to {save_path}")
In this example, we convert an HTML document to a DOCX file using save options. The process involves initializing the HTML document, setting custom save options such as document format, font embedding rule, CSS media type, and resolution, and then performing the conversion. Finally, the converted DOCX file is saved to the specified output directory.
Save Options – DocSaveOptions Class
The DocSaveOptions class is a powerful configuration tool that allows you to fine-tune the conversion of HTML documents to the DOCX format. Some of its properties are inherited from base classes, such as DocRenderingOptions or RenderingOptions. DocSaveOptions is configured to save the document as DOCX and includes the following properties:
- page_setup – This property lets you define the page’s layout, including page size, margins, and other layout aspects, ensuring the output document matches the desired format.
- horizontal_resolution – This property sets or gets the horizontal resolution for internal images in pixels per inch. By default, it is 300 dpi. Higher resolutions can produce better rendering quality but larger file sizes. This property allows you to control the trade-offs between quality and file size.
- vertical_resolution – This property sets or gets the vertical resolution for internal images in pixels per inch. By default, it is 300 dpi. Similar to horizontal_resolution, this controls the vertical resolution of documents, affecting their clarity and overall size.
- background_color – This property allows you to set the background color for the rendered output. If not set, the default background is transparent.
- css – This property gets a CssOptions object, which is used to configure CSS properties processing. For example, the css.media_type property specifies different styles for different media types, ensuring that the correct CSS rules are applied based on how the document is being rendered.
- font_embedding_rule – This property sets the rule for embedding fonts and controls whether and how fonts are embedded in the output document. The default value is NONE.
- document_format – This property sets the file format of the output document. The default is DOCX.
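To get a feel for the quality versus size trade-off behind the resolution properties, the arithmetic can be worked through in plain Python. This is only an illustration under a simplifying assumption, namely that the pixel dimensions of an internal image scale linearly with the resolution; the function names and the 4 × 3 inch image are hypothetical, and the library itself is not used here.

```python
def pixel_size(width_in, height_in, dpi):
    """Pixel dimensions of an image of the given size in inches at a resolution."""
    return round(width_in * dpi), round(height_in * dpi)


def pixel_count_ratio(low_dpi, high_dpi):
    """How many times more pixels the higher resolution needs for the same area."""
    return (high_dpi / low_dpi) ** 2


# A 4 x 3 inch image at the 96 dpi used in the example and at the 300 dpi default
print(pixel_size(4, 3, 96))    # (384, 288)
print(pixel_size(4, 3, 300))   # (1200, 900)
print(pixel_count_ratio(96, 300))
```

Under this assumption, the 300 dpi default needs almost ten times as many pixels as 96 dpi for the same image area, which is why lowering the resolution can noticeably reduce the size of the output file.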
Download our Aspose.HTML for Python via .NET library to successfully, quickly, and easily convert your HTML, MHTML, EPUB, SVG, and Markdown documents to the most popular formats.
Aspose.HTML offers a free online HTML to DOCX Converter that converts HTML to DOCX easily, quickly, and with high quality. Just upload your files, convert them, and get results in a few seconds!