Create a Document in Python with Aspose.HTML – Create or Load HTML, SVG, MHTML, EPUB

This article offers a detailed guide on how to create an HTML document using Aspose.HTML for Python via .NET API. The API provides the HTMLDocument class, which is the root of the HTML hierarchy and holds the entire content. This class has a set of constructors that allow you to create or load HTML documents in different ways. HTML documents can be created from scratch as an empty document with an HTML structure, from a string, from a memory stream, or loaded from a file or URL.

HTML Document

The HTMLDocument class is the starting point for the Aspose.HTML Python library. You can load an HTML document into the Document Object Model (DOM) using one of the HTMLDocument() constructors and then programmatically read and modify the document tree: add and remove nodes or change node properties, as described in the official specifications.

The HTMLDocument class provides an in-memory representation of an HTML DOM that is fully compliant with W3C DOM and WHATWG DOM specifications. If you are familiar with the WHATWG DOM, WHATWG HTML, and JavaScript standards, using the Aspose.HTML for Python via .NET API will be quite comfortable and easy.

Create an Empty HTML Document

The following Python code snippet shows the usage of the default HTMLDocument() constructor to create an empty HTML document and save it to a file.

# Create an empty HTML document using Python

import os
import aspose.html as ah

# Setup an output directory and prepare a path to save the document
output_dir = "output"
if not os.path.exists(output_dir):
    os.makedirs(output_dir)
save_path = os.path.join(output_dir, "document-empty.html")

# Initialize an empty HTML document
document = ah.HTMLDocument()

# Work with the document here...

# Save the document to a file
document.save(save_path)

After creation, the file document-empty.html contains the initial document structure: the <html>, <head>, and <body> elements. Once the document object is created, it can be filled with HTML elements.
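
The directory check that opens each snippet on this page can be folded into one small helper. The following is a minimal sketch that uses only the Python standard library; the name prepare_output_path is hypothetical and is not part of Aspose.HTML.

```python
import os
import tempfile


def prepare_output_path(output_dir, file_name):
    """Create output_dir if it is missing and return the full path for file_name."""
    # exist_ok=True makes the call safe when the directory already exists
    os.makedirs(output_dir, exist_ok=True)
    return os.path.join(output_dir, file_name)


# Demonstrate inside a temporary directory so nothing is left behind
with tempfile.TemporaryDirectory() as tmp:
    save_path = prepare_output_path(os.path.join(tmp, "output"), "document-empty.html")
    print(os.path.basename(save_path))
    print(os.path.isdir(os.path.dirname(save_path)))
```

The returned path can then be passed to document.save() in the same way as save_path in the example above.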

Create a New HTML Document

To create an HTML document programmatically from scratch, use the constructor without parameters, as shown in the following code snippet:

# Create an HTML document using Python

import os
import aspose.html as ah

# Prepare the output path to save a document
output_dir = "output/"
if not os.path.exists(output_dir):
    os.makedirs(output_dir)
document_path = os.path.join(output_dir, "create-new-document.html")

# Initialize an empty HTML document
with ah.HTMLDocument() as document:
    # Create a text node and add it to the document
    text = document.create_text_node("Hello, World!")
    document.body.append_child(text)

    # Save the document to a file
    document.save(document_path)

In the new document, we created a text node from the specified string using the create_text_node() method and added it to the <body> element using the append_child() method.
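
The create_text_node() and append_child() calls correspond to the standard DOM operations createTextNode and appendChild. As a rough comparison only, the sketch below performs the same two steps with Python's built-in xml.dom.minidom. This is a different and much smaller DOM implementation, so it illustrates the pattern rather than the behavior of HTMLDocument.

```python
from xml.dom.minidom import getDOMImplementation

# Build a bare <html><head/><body/></html> tree by hand;
# minidom does not create this structure automatically
implementation = getDOMImplementation()
doc = implementation.createDocument(None, "html", None)
root = doc.documentElement
head = doc.createElement("head")
body = doc.createElement("body")
root.appendChild(head)
root.appendChild(body)

# Same two steps as in the Aspose.HTML example: create a text node, then append it
text = doc.createTextNode("Hello, World!")
body.appendChild(text)

print(root.toxml())
```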

How to edit an HTML file is described in detail in the Edit HTML Document article.

For more details about saving HTML files, see the Save HTML Document article.

Load HTML from a File

If you need to load an existing HTML file, work with it, and save it, the following code snippet will help:

# Load HTML from a file using Python

import os
import aspose.html as ah

# Setup directories and define paths
output_dir = "output/"
input_dir = "data/"
if not os.path.exists(output_dir):
    os.makedirs(output_dir)

document_path = os.path.join(input_dir, "document.html")
save_path = os.path.join(output_dir, "document-edited.html")

# Initialize a document from a file
document = ah.HTMLDocument(document_path)

# Work with the document

# Save the document to a file
document.save(save_path)

In the example above, the Python code sets up directories and paths for reading an HTML document from the “data” directory and saving an edited version to the “output” directory. It initializes the HTML document from the specified file, processes it, and then saves the edited document to the designated output path.
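
The "work with the document" step can be wrapped together with loading and saving. The sketch below is one possible arrangement, assuming a hypothetical helper name append_text_and_save and assuming that the with form shown earlier for the empty constructor also applies to a document loaded from a file. It uses only calls that appear in the examples on this page and checks the input path before the library is touched.

```python
import os


def append_text_and_save(input_path, output_path, text):
    """Load an HTML file, append a text node to <body>, and save a copy."""
    # Fail early with a clear message instead of passing a bad path to the library
    if not os.path.isfile(input_path):
        raise FileNotFoundError(f"Input HTML file not found: {input_path}")

    # Imported here so the path check above runs before the library is loaded
    import aspose.html as ah

    # Assumption: HTMLDocument(path) supports the same with-statement usage
    # as HTMLDocument() in the earlier example
    with ah.HTMLDocument(input_path) as document:
        document.body.append_child(document.create_text_node(text))
        document.save(output_path)
    return output_path


# Example call (requires aspose.html and an existing data/document.html):
# append_text_and_save("data/document.html", "output/document-edited.html", "Edited")
```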

Load HTML from a URL

A document does not have to come from a local file. In the next Python code snippet, you can see how to load a web page into the HTMLDocument.

If you pass a wrong URL that cannot be reached at the moment, the library throws a PlatformException with the specialized code NetworkError to inform you that the selected resource cannot be found.

# Load HTML from a URL using Python

import aspose.html as ah

# Load a document from the specified web page
document = ah.HTMLDocument("https://docs.aspose.com/html/files/aspose.html")

# Write the document content to the output stream
print(document.document_element.outer_html)

In the example above, we specified the aspose.html file to load from the URL.
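
Because an unreachable address is reported only when the constructor runs, it can be useful to reject obviously malformed input before that point. The following sketch uses the standard urllib.parse module; the function name looks_like_web_url is hypothetical, and the check does not replace handling the library's own network error, since a well-formed URL can still be unreachable.

```python
from urllib.parse import urlparse


def looks_like_web_url(url):
    """Return True if url is an absolute http(s) URL with a host part."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


print(looks_like_web_url("https://docs.aspose.com/html/files/aspose.html"))
print(looks_like_web_url("docs.aspose.com/html/files/aspose.html"))
```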

Load from HTML Code

If your HTML code has linked resources (styles, scripts, images, etc.), you need to pass a valid base_uri parameter to the document constructor. It is used to resolve the location of the resources during document loading.
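
To see what resolving against a base URI means in practice, the sketch below applies the generic URL resolution rules implemented by urllib.parse.urljoin. It illustrates the general mechanism only and makes no claim about how Aspose.HTML resolves resources internally; the example.com addresses are placeholders.

```python
from urllib.parse import urljoin

# Hypothetical base URI of the document that contains the links
base_uri = "https://example.com/docs/page.html"

# Relative references as they might appear in href or src attributes
references = ["styles/main.css", "/img/logo.png", "../script.js"]

# Each reference is resolved against the base URI
resolved = [urljoin(base_uri, ref) for ref in references]
for ref, url in zip(references, resolved):
    print(f"{ref} -> {url}")
```

Without a meaningful base URI, relative references such as styles/main.css have nothing to be resolved against, which is why the constructor asks for one.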

Load HTML from a String

You can create a document from string content using one of the HTMLDocument() constructors. If you want to create a document from a user string directly in your code and save it to a file, the following example can help. It creates an HTML document that contains the text “Hello, World!”.

# Create HTML from a string using Python

import os
import aspose.html as ah

# Prepare HTML code
html_code = "<p>Hello, World!</p>"

# Setup output directory
output_dir = "output"
if not os.path.exists(output_dir):
    os.makedirs(output_dir)

# Initialize a document from the string variable
document = ah.HTMLDocument(html_code, ".")

# Save the document to disk
document.save(os.path.join(output_dir, "create-html-from-string.html"))
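
When the string comes from user input rather than from a literal in your code, characters such as < and & would otherwise be read as markup. A minimal sketch, assuming a hypothetical helper paragraph_from_text, escapes the text with the standard html module before it is wrapped in a <p> element:

```python
import html


def paragraph_from_text(user_text):
    """Wrap plain text in a <p> element, escaping HTML special characters."""
    return f"<p>{html.escape(user_text)}</p>"


html_code = paragraph_from_text("1 < 2 & 3 > 2")
print(html_code)  # <p>1 &lt; 2 &amp; 3 &gt; 2</p>
```

The resulting html_code string can be passed to the constructor in the same way as in the example above.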

Load HTML from a Stream

If you prepare HTML code as an in-memory io.BytesIO object, you don’t need to save it to a file; simply pass the stream to the appropriate constructor. The following example uses the HTMLDocument(content, base_uri) constructor to create an HTML document from a stream:

# Load HTML from a stream using Python

import os
import io
import aspose.html as ah

# Prepare an output path for saving the document
output_dir = "output"
if not os.path.exists(output_dir):
    os.makedirs(output_dir)

# Use BytesIO instead of StringIO
content_stream = io.BytesIO(b"<p>Hello, World!</p>")
base_uri = "."

# Initialize a document from the content stream
document = ah.HTMLDocument(content_stream, base_uri)

# Save the document to disk
document.save(os.path.join(output_dir, "load-from-stream.html"))

io.BytesIO creates a stream object that resides entirely in memory. This is useful for temporary data storage without needing to write to disk.
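
Two details of io.BytesIO are easy to overlook: it holds bytes, so a str has to be encoded first, and it keeps a read position, so a stream that has already been read returns nothing until it is rewound. The following standard-library sketch shows both points:

```python
import io

html_text = "<p>Hello, World!</p>"

# BytesIO stores bytes, so encode the string explicitly
content_stream = io.BytesIO(html_text.encode("utf-8"))

first_read = content_stream.read()   # reads everything, moves the position to the end
second_read = content_stream.read()  # nothing left to read at the end of the stream

# Rewind before handing the stream to another consumer
content_stream.seek(0)
third_read = content_stream.read()

print(first_read)
print(second_read)
print(third_read == first_read)
```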

SVG Document

Scalable Vector Graphics (SVG) is part of the W3C standards and can be embedded within an HTMLDocument; we have implemented the SVGDocument class with full SVG functionality based on the official SVG2 specification. This allows you to load, read, and manipulate SVG documents in accordance with the standard.

Since both SVGDocument and HTMLDocument are based on the same WHATWG DOM standard, operations such as loading, reading, editing, converting, and saving are similar for both types of documents. Therefore, any examples demonstrating manipulation with HTMLDocument also apply to SVGDocument.

You can create a document from string content using the appropriate SVGDocument() constructor. If you want to load an SVG document from a content_stream variable in memory without first saving the content to a file, the example below shows how to do it:

# Load SVG from a string using Python

import io
import aspose.html.dom.svg as ahsvg

# Initialize an SVG document from a string object
svg_content = "<svg xmlns='http://www.w3.org/2000/svg'><circle cx='50' cy='50' r='40'/></svg>"
base_uri = "."
content_stream = io.BytesIO(svg_content.encode('utf-8'))

document = ahsvg.SVGDocument(content_stream, base_uri)

# Write the document content to the output stream
print(document.document_element.outer_html)

# Save the document to a disk
document.save("load-from-stream.svg")

In the example above, we created an SVG document that contains a circle with a radius of 40 pixels centered at (50, 50). You can learn more about working with SVG documents from the Aspose.SVG for Python via .NET documentation.
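
The attributes of the circle can be checked independently of the library. The sketch below parses the same SVG string with Python's built-in xml.etree.ElementTree, which is used here only as a neutral XML parser and not as a replacement for SVGDocument:

```python
import xml.etree.ElementTree as ET

svg_content = "<svg xmlns='http://www.w3.org/2000/svg'><circle cx='50' cy='50' r='40'/></svg>"

# ElementTree writes namespaced tags as {namespace}name
SVG_NS = "{http://www.w3.org/2000/svg}"

root = ET.fromstring(svg_content)
circle = root.find(f"{SVG_NS}circle")

# Attribute values are returned as strings
print(circle.get("cx"), circle.get("cy"), circle.get("r"))
```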

MHTML Document

MHTML is a format for archiving web pages. It bundles the HTML content of a web page along with all associated resources, such as CSS, JavaScript, images, and audio files, into a single file. MHTML is commonly used by web developers to save a snapshot of a web page for archival purposes. The Aspose.HTML Python library supports only rendering/converting MHTML files to various output formats. For more information, refer to the Converting Between Formats article.
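
Under the hood, an MHTML file is a MIME multipart/related message in which each part carries one resource. To make the "single file with all resources" idea concrete, the sketch below builds and re-reads a tiny archive with Python's standard email package. It does not use Aspose.HTML, and the example.com locations are placeholders.

```python
import email
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# The container: one multipart/related message
archive = MIMEMultipart("related")

# Part 1: the HTML page itself
page = MIMEText("<html><body><p>Hello</p></body></html>", "html", "utf-8")
page["Content-Location"] = "https://example.com/index.html"
archive.attach(page)

# Part 2: a stylesheet the page depends on
css = MIMEText("p { color: red; }", "css", "utf-8")
css["Content-Location"] = "https://example.com/style.css"
archive.attach(css)

# Serialize to bytes (what would be written to an .mht file) and parse it back
raw = archive.as_bytes()
parsed = email.message_from_bytes(raw)

# List the bundled resources, skipping the multipart container itself
parts = [
    (part.get_content_type(), part["Content-Location"])
    for part in parsed.walk()
    if not part.is_multipart()
]
for content_type, location in parts:
    print(content_type, location)
```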

EPUB Document

EPUB is a widely supported format for eBooks and electronic publications, compatible with most reading devices, including smartphones, tablets, and computers. Similar to MHTML, Aspose.HTML supports only rendering EPUB files to various output formats. For further details, see the Converting Between Formats article.

See Also

  • Download the Aspose.HTML for Python via .NET library to successfully, quickly, and easily manipulate your HTML documents. The Python library can create, modify, extract data, convert, and render HTML documents without the need for external software. It supports popular file formats such as EPUB, MHTML, XML, SVG, and Markdown and can render to PDF, DOCX, XPS, and Image file formats.

  • You can download the complete examples and data files from GitHub.

  • Aspose.HTML offers a free online HTML Converter for converting HTML documents to a variety of popular formats. Just load HTML from a file or URL, choose the format to convert, and you’re done. It’s fast and completely free!
