Create a Document in Python – Create or Load HTML, SVG, MHTML, EPUB
This article offers a detailed guide on how to create an HTML document using the Aspose.HTML for Python via .NET API. The API provides the HTMLDocument class, which is the root of the HTML hierarchy and holds the entire content. This class has a set of constructors that allow you to create or load HTML documents in different ways. An HTML document can be created from scratch as an empty document with an HTML structure, created from a string or a memory stream, or loaded from a file or URL.
HTML Document
HTMLDocument is the starting point for the Aspose.HTML Python library. You can load an HTML document into the Document Object Model (DOM) by using one of the HTMLDocument() constructors and then programmatically read and modify the document tree, add and remove nodes, and change node properties, as described in the official specifications.
The HTMLDocument class provides an in-memory representation of an HTML DOM that is fully compliant with the W3C DOM and WHATWG DOM specifications. If you are familiar with the WHATWG DOM, WHATWG HTML, and JavaScript standards, you will find the Aspose.HTML for Python via .NET API comfortable and easy to use.
Create an Empty HTML Document
The following Python code snippet shows the usage of the default HTMLDocument() constructor to create an empty HTML document and save it to a file.
import os
from aspose.html import HTMLDocument

# Set up an output directory and prepare a path to save the document
output_dir = "output/"
os.makedirs(output_dir, exist_ok=True)

save_path = os.path.join(output_dir, "document-empty.html")

# Initialize an empty HTML document
document = HTMLDocument()

# Work with the document here...

# Save the document to a file
document.save(save_path)
After creation, the file document-empty.html appears with the initial document structure: the empty document includes elements such as <html>, <head>, and <body>. Once the document object is created, it can be filled later with HTML elements.
Create a New HTML Document
To create an HTML document programmatically from scratch, use the constructor without parameters, as shown in the following code snippet:
import os
from aspose.html import HTMLDocument

# Prepare an output path for saving the document
output_dir = "output/"
os.makedirs(output_dir, exist_ok=True)

document_path = os.path.join(output_dir, "create-new-document.html")

# Initialize an empty HTML document
with HTMLDocument() as document:
    # Create a text node and add it to the document
    text = document.create_text_node("Hello, World!")
    document.body.append_child(text)

    # Save the document to a file
    document.save(document_path)
In the new document, we created a text node from the specified string using the create_text_node() method and added it to the <body> element using the append_child() method.
How to edit an HTML file is described in detail in the Edit HTML Document article.
More details about saving HTML files are in the Save HTML Document article.
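The two DOM operations used above, creating a text node and appending it to a parent, are standard DOM calls. As a rough analogy only, the sketch below performs the same steps with Python's built-in xml.dom.minidom, which exposes them under their W3C camelCase names (createTextNode, appendChild) rather than the snake_case names used by Aspose.HTML. It does not use or emulate the Aspose.HTML API.

```python
from xml.dom.minidom import getDOMImplementation

# Build a tiny document with the standard-library DOM (not Aspose.HTML)
implementation = getDOMImplementation()
doc = implementation.createDocument(None, "html", None)

# Create a <body> element and attach it to the root <html> element
body = doc.createElement("body")
doc.documentElement.appendChild(body)

# Create a text node and append it to <body>, mirroring
# create_text_node() + append_child() from the example above
text = doc.createTextNode("Hello, World!")
body.appendChild(text)

# Serialize the tree to inspect the result
markup = doc.documentElement.toxml()
print(markup)
```

The node is only part of the tree once it has been appended; creating it alone does not change the document.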
Load HTML from a File
If you need to load an existing HTML document from a file, work with it, and save it, the following code snippet will help you:
import os
from aspose.html import HTMLDocument

# Set up directories and define paths
output_dir = "output/"
input_dir = "data/"
os.makedirs(output_dir, exist_ok=True)

document_path = os.path.join(input_dir, "document.html")
save_path = os.path.join(output_dir, "document-edited.html")

# Initialize a document from a file
document = HTMLDocument(document_path)

# Work with the document

# Save the document to disk
document.save(save_path)
In the example above, the Python code sets up directories and paths for reading an HTML document from the “data” directory and saving an edited version to the “output” directory. It initializes the HTML document from the specified file, processes it, and then saves the edited document to the designated output path.
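The directory setup repeated in these examples can be factored into a small helper. The following is a minimal sketch using only the standard library; prepare_save_path is a hypothetical name introduced here for illustration and is not part of Aspose.HTML.

```python
import tempfile
from pathlib import Path


def prepare_save_path(output_dir, file_name):
    """Create output_dir if needed and return the full path for file_name."""
    directory = Path(output_dir)
    # parents=True creates intermediate folders; exist_ok=True makes repeat calls safe
    directory.mkdir(parents=True, exist_ok=True)
    return directory / file_name


# Demonstration in a throwaway directory so nothing is left behind
with tempfile.TemporaryDirectory() as tmp:
    save_path = prepare_save_path(Path(tmp) / "output", "document-edited.html")
    directory_created = save_path.parent.is_dir()
    file_name = save_path.name
```

The examples in this article pass plain strings to document.save(), so converting the result with str(save_path) before saving is the cautious choice; whether save() also accepts Path objects is not stated here.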
Load HTML from a URL
Besides local files, a document can be loaded directly from a web address. The next Python code snippet shows how to load a web page into the HTMLDocument.
If you pass a URL that cannot be reached at the moment, the library throws a PlatformException with the specialized code ‘NetworkError’ to inform you that the selected resource cannot be found.
from aspose.html import HTMLDocument

# Load a document from the URL
document = HTMLDocument("https://docs.aspose.com/html/net/creating-a-document/document.html")

# Write the document content to the output stream
print(document.document_element.outer_html)
In the example above, we specified the document.html file to load from the URL.
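Because an unreachable URL raises an error, it can be useful to wrap the load in a guard. The sketch below is written under assumptions: this article names PlatformException but does not say which Python exception class it surfaces as, so the code catches Exception broadly. try_load and unreachable_loader are hypothetical helpers introduced for illustration; the loader is passed in as a parameter so the failure path can be shown without network access.

```python
def try_load(url, loader):
    """Return (document, None) on success or (None, error_message) on failure."""
    try:
        return loader(url), None
    except Exception as error:  # broad on purpose: the exact exception class is an assumption
        return None, f"{type(error).__name__}: {error}"


def unreachable_loader(url):
    """Stand-in loader that always fails, used only to demonstrate the failure path."""
    raise RuntimeError(f"NetworkError: cannot reach {url}")


document, problem = try_load("https://example.invalid/page.html", unreachable_loader)
if document is None:
    print("Could not load the page:", problem)
```

With the library installed, the same helper could presumably be called as try_load(url, HTMLDocument), since the constructor takes the address as its argument in the example above.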
Load from HTML Code
If your HTML code has linked resources (styles, scripts, images, etc.), you need to pass a valid base_uri parameter to the document constructor. It is used to resolve the location of each resource while the document is loading.
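To see what resolving a resource against a base URI means, the sketch below applies the general URL resolution rules with Python's urllib.parse.urljoin. It illustrates the concept only and makes no claim about how Aspose.HTML resolves resources internally; the base address and resource names are hypothetical.

```python
from urllib.parse import urljoin

# Hypothetical base address of the document being loaded
base_uri = "https://example.com/docs/page.html"

# References as they might appear in href/src attributes
references = [
    "css/style.css",                    # relative to the document's folder
    "../images/logo.png",               # one folder up
    "/scripts/app.js",                  # relative to the site root
    "https://cdn.example.net/lib.js",   # already absolute, left unchanged
]

resolved = {ref: urljoin(base_uri, ref) for ref in references}

for ref, absolute in resolved.items():
    print(f"{ref} -> {absolute}")
```

Without a meaningful base, relative references such as css/style.css have nothing to be resolved against, which is why the constructor asks for base_uri when the markup comes from a string or stream.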
Load HTML from a String
You can create a document from string content using one of the HTMLDocument() constructors. If you want to create a document from a string directly in your code and save it to a file, the following example could help. It creates an HTML document that contains the text “Hello, World!”:
import os
from aspose.html import HTMLDocument

# Prepare HTML code
html_code = "<p>Hello, World!</p>"

# Set up the output directory
output_dir = "output/"
os.makedirs(output_dir, exist_ok=True)

# Initialize a document from the string variable
document = HTMLDocument(html_code, ".")

# Save the document to disk
document.save(os.path.join(output_dir, "create-html-from-string.html"))
Load HTML from a Stream
If you prepare HTML code as an in-memory io.BytesIO object, you don’t need to save it to a file; simply pass the stream to the specialized constructor. The following example uses the HTMLDocument(content, base_uri) constructor to create an HTML document from a stream:
import os
import io
from aspose.html import HTMLDocument

# Prepare an output path for saving the document
output_dir = "output/"
os.makedirs(output_dir, exist_ok=True)

# Use BytesIO and pass a bytes string to it
content_stream = io.BytesIO(b"<p>Hello, World!</p>")
base_uri = "."

# Initialize a document from the content stream
document = HTMLDocument(content_stream, base_uri)

# Save the document to a file
document.save(os.path.join(output_dir, "load-from-stream.html"))
io.BytesIO creates a stream object that resides entirely in memory. This is useful for temporary data storage without needing to write to disk.
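One detail of io.BytesIO worth knowing is that it tracks a read/write position. The standard-library sketch below shows that a stream filled with write() sits at its end afterwards and must be rewound with seek(0) before it can be read from the start. Whether HTMLDocument requires a rewound stream is not stated in this article; rewinding is simply a safe habit with stream consumers in general.

```python
import io

html_code = "<p>Hello, World!</p>"

# Start with an empty in-memory stream and write encoded HTML into it
content_stream = io.BytesIO()
content_stream.write(html_code.encode("utf-8"))

# After writing, the position is at the end of the data
position_after_write = content_stream.tell()
nothing_left = content_stream.read()  # reads from the current position onward

# Rewind to the beginning so a consumer sees the whole content
content_stream.seek(0)
round_trip = content_stream.read().decode("utf-8")

print(position_after_write, nothing_left, round_trip)
```

Constructing the stream directly from bytes, as in io.BytesIO(b"..."), leaves the position at 0, so the example above it needs no explicit rewind.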
SVG Document
Scalable Vector Graphics (SVG) is part of the W3C standards and can be embedded within an HTMLDocument; we have implemented the SVGDocument class with full SVG functionality based on the official SVG2 specification. This allows you to load, read, and manipulate SVG documents in accordance with the standard.
Since both SVGDocument and HTMLDocument are based on the same WHATWG DOM standard, operations such as loading, reading, editing, converting, and saving are similar for both types of documents. Therefore, any examples demonstrating manipulation with HTMLDocument also apply to SVGDocument.
You can create a document from string content using the appropriate SVGDocument() constructor. If you want to load an SVG document from a content_stream variable in memory and don’t need to save it to a file, the example below shows how to do it:
import io
from aspose.html.dom.svg import SVGDocument

# Prepare SVG code as a string; single quotes outside keep the attribute quotes valid
svg_content = '<svg xmlns="http://www.w3.org/2000/svg"><circle cx="50" cy="50" r="50"/></svg>'
base_uri = "."

# Wrap the UTF-8 encoded SVG code in an in-memory stream
content_stream = io.BytesIO(svg_content.encode("utf-8"))

# Initialize an SVG document from the content stream
document = SVGDocument(content_stream, base_uri)

# Write the document content to the output stream
print(document.document_element.outer_html)
In the example above, we have created the SVG document that contains a circle with a radius of 50 pixels. You can learn more about working with SVG documents from the Aspose.SVG for Python via .NET documentation.
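Since SVG is XML, a quick well-formedness check on the string before handing it to a document constructor can catch quoting mistakes early. The sketch below does this with the standard library's xml.etree.ElementTree; it is independent of Aspose.HTML and only illustrates the idea.

```python
import io
import xml.etree.ElementTree as ET

svg_content = '<svg xmlns="http://www.w3.org/2000/svg"><circle cx="50" cy="50" r="50"/></svg>'
content_stream = io.BytesIO(svg_content.encode("utf-8"))

# ET.parse raises xml.etree.ElementTree.ParseError if the markup is not well-formed
root = ET.parse(content_stream).getroot()

# Parsing consumed the stream, so rewind it for the next consumer
content_stream.seek(0)

# ElementTree prefixes tag names with the namespace in braces
SVG_NS = "{http://www.w3.org/2000/svg}"
circle = root.find(f"{SVG_NS}circle")
radius = circle.get("r")

print(root.tag, radius)
```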
MHTML Document
MHTML is a format for archiving web pages. It bundles the HTML content of a web page along with all associated resources, such as CSS, JavaScript, images, and audio files, into a single file. MHTML is commonly used by web developers to save a snapshot of a web page for archival purposes. The Aspose.HTML Python library supports only rendering/converting MHTML files to various output formats. For more information, refer to the Converting Between Formats article.
EPUB Document
EPUB is a widely supported format for eBooks and electronic publications, compatible with most reading devices, including smartphones, tablets, and computers. Similar to MHTML, Aspose.HTML supports only rendering EPUB files to various output formats. For further details, see the Converting Between Formats article.
Download the Aspose.HTML for Python via .NET library to manipulate your HTML documents quickly and easily. The Python library can create, modify, convert, and render HTML documents and extract data from them without the need for external software. It supports popular file formats such as EPUB, MHTML, XML, SVG, and Markdown and can render to PDF, DOCX, XPS, and image file formats.