Edit HTML Document in Python

DOM namespace

Aspose.HTML for Python via .NET allows you to access and manipulate the HTML DOM (Document Object Model) in Python. The aspose.html.dom namespace provides an API for interacting with HTML, XML, and SVG documents. It represents the document as a node tree, in which each node represents a part of the document, such as an element, a text string, or a comment. The namespace includes classes such as Attr, CharacterData, Comment, and more, each serving a specific purpose within the document model.

The Document class represents the entire HTML, XML, or SVG document and serves as the root of the document tree. Other classes, such as Element, Node, DocumentFragment, and EventTarget, provide access to different parts of the document and let you manipulate and interact with the document’s data. The API is based on the WHATWG DOM standard, so Aspose.HTML for Python via .NET is easy to use if you have a basic knowledge of HTML and JavaScript.
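To make the node-tree idea concrete, here is a small sketch that walks a parsed document and labels each node as an element, a text string, or a comment. It uses Python's built-in xml.dom.minidom as a stand-in because it implements the same W3C DOM node model; it is not Aspose.HTML, its method names are camelCase, and it only parses well-formed XML markup. The walk() helper and the NODE_KINDS mapping are hypothetical names introduced for this example.

```python
# Sketch: inspecting a DOM node tree with Python's built-in xml.dom.minidom.
# minidom is a stand-in for illustration only; it is not Aspose.HTML.
from xml.dom import minidom
from xml.dom.minidom import Node

markup = '<html><body><!-- note --><p id="intro">Hello</p></body></html>'
doc = minidom.parseString(markup)

# Hypothetical mapping from numeric nodeType constants to readable labels.
NODE_KINDS = {
    Node.ELEMENT_NODE: "element",
    Node.TEXT_NODE: "text",
    Node.COMMENT_NODE: "comment",
}


def walk(node, depth=0, out=None):
    """Collect (depth, kind, label) tuples for every node in document order."""
    if out is None:
        out = []
    for child in node.childNodes:
        kind = NODE_KINDS.get(child.nodeType, "other")
        # Elements are labelled by tag name; text and comment nodes by their data.
        label = child.nodeName if kind in ("element", "other") else child.data
        out.append((depth, kind, label))
        walk(child, depth + 1, out)
    return out


for depth, kind, label in walk(doc):
    print("  " * depth + f"{kind}: {label!r}")
```

Running it prints the html, body, comment, p, and text nodes, each indented by its depth in the tree, which mirrors how the aspose.html.dom classes expose a document.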

Edit HTML

To edit an HTML document using the DOM tree, use the HTMLDocument class, which represents the entire document. There are many ways to edit HTML with our Python library: you can modify the document by inserting new nodes, removing nodes, or editing the content of existing nodes. To create a new element or node, call the document’s factory methods, such as create_element() and create_text_node().

To manipulate element attributes, use methods such as set_attribute(), get_attribute(), and remove_attribute().

Once new nodes are created, several DOM methods help you insert them into the document tree or remove them from it. The most common ones are append_child(), insert_before(), replace_child(), and remove_child().

For a complete list of the classes and methods in the DOM namespace, please visit the API Reference.
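The create, attribute, insert, replace, and remove operations described above can be tried end to end in a few lines. This sketch again uses Python's built-in xml.dom.minidom as a stand-in, since it implements the same DOM interfaces with camelCase names (createElement, appendChild, insertBefore, replaceChild, removeChild); Aspose.HTML exposes snake_case equivalents such as create_element() and append_child(), as the examples in this article show. The variable names are illustrative only.

```python
# Sketch: core DOM editing operations, using xml.dom.minidom as a stand-in.
# minidom is not Aspose.HTML; its method names are camelCase.
from xml.dom import minidom

doc = minidom.parseString("<html><body></body></html>")
body = doc.getElementsByTagName("body")[0]

# Create element and text nodes
first = doc.createElement("p")
first.appendChild(doc.createTextNode("First"))
second = doc.createElement("p")
second.appendChild(doc.createTextNode("Second"))

# Manipulate attributes: add two, then remove one again
first.setAttribute("id", "p1")
first.setAttribute("class", "draft")
first.removeAttribute("class")

# Insert nodes: append one, then insert another in front of it
body.appendChild(second)
body.insertBefore(first, second)

# Replace one node with a new heading, then detach the other
heading = doc.createElement("h1")
heading.appendChild(doc.createTextNode("Title"))
body.replaceChild(heading, first)  # heading takes the place of first
body.removeChild(second)           # second is no longer in the tree

print(body.toxml())  # <body><h1>Title</h1></body>
```

Note that replaced and removed nodes are detached rather than destroyed: they keep their attributes and children and can be inserted elsewhere later.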

Edit a Document Tree

Aspose.HTML Python API supports the set of HTML elements defined in the HTML Standard, along with the rules about how those elements can be nested. Consider the simple steps below to create HTML from scratch and edit it using the DOM tree and the functionality mentioned above. The document will contain a text paragraph with an id attribute.

This code demonstrates how to create a basic HTML document programmatically, add a paragraph with an attribute and text, and save the resulting HTML to a file:

```python
# Edit HTML document using DOM Tree in Python

import os
import aspose.html as ah

# Define the output directory and file path
output_dir = "output/"
output_path = os.path.join(output_dir, "edit-document-tree.html")

# Ensure the output directory exists
os.makedirs(output_dir, exist_ok=True)

# Create an instance of an HTML document
document = ah.HTMLDocument()

# Access the document <body> element
body = document.body

# Create a paragraph element <p>
p = document.create_element("p")

# Set the id attribute of the paragraph
p.set_attribute("id", "my-paragraph")

# Create a text node
text = document.create_text_node("The Aspose.Html.Dom namespace provides an API for representing and interfacing with HTML, XML, or SVG documents.")

# Add the text to the paragraph
p.append_child(text)

# Attach the paragraph to the document body
body.append_child(p)

# Save the HTML document to a file
document.save(output_path)
```

Let’s look at creating a more complex HTML document. In the following code snippet, we construct an HTML document from scratch, create new elements, populate them with content, and add them to the document structure using the HTML DOM and the methods mentioned above.

```python
# Create and add new HTML elements using Python

import os
import aspose.html as ah

# Define output directory and file paths
output_dir = "output/"
save_path = os.path.join(output_dir, "edit-document.html")

# Ensure the output directory exists
os.makedirs(output_dir, exist_ok=True)

# Create an instance of an HTML document
document = ah.HTMLDocument()

# Create a style element and set the teal color for elements with class "col"
style = document.create_element("style")
style.text_content = ".col { color: teal }"

# Find the document <head> element and append the <style> element
head = document.get_elements_by_tag_name("head")[0]
head.append_child(style)

# Create a paragraph <p> element with class "col"
p = document.create_element("p")
p.class_name = "col"

# Create a text node
text = document.create_text_node("Edit HTML document")

# Append the text node to the paragraph
p.append_child(text)

# Append the paragraph to the document <body>
document.body.append_child(p)

# Save the HTML document to a file
document.save(save_path)
```


Using inner_html and outer_html properties

Working with DOM objects provides a powerful way to manipulate an HTML document in Python. In some cases, however, it is more convenient to work directly with strings. The inner_html and outer_html properties give you an element’s HTML content as a string, but they differ in what they represent and how they are used:

  1. The inner_html property represents the HTML content inside an element, excluding the element’s own start and end tags.
  2. The outer_html property represents all of an element’s HTML content, including its own start and end tags.

The following code snippet shows you how to use the inner_html and outer_html properties of the Element class to edit HTML.

```python
# Edit HTML body content and get modified document as a string using Python

import aspose.html as ah

# Create an instance of an HTML document
document = ah.HTMLDocument()

# Write the initial content of the HTML document to the console
print(document.document_element.outer_html)  # output: <html><head></head><body></body></html>

# Set the content of the body element
document.body.inner_html = "<p>HTML is the standard markup language for Web pages.</p>"

# Find the document <p> element
p = document.get_elements_by_tag_name("p")[0]

# Write the paragraph's inner HTML (its content without the <p> tags) to the console
print(p.inner_html)  # output: HTML is the standard markup language for Web pages.

# Write the updated content of the HTML document to the console
print(document.document_element.outer_html)  # output: <html><head></head><body><p>HTML is the standard markup language for Web pages.</p></body></html>
```
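The difference between the two properties comes down to whether the element’s own tags are included in the serialization. The sketch below reproduces that distinction with Python's built-in xml.dom.minidom, which has no inner_html or outer_html properties; outer_html() and inner_html() here are hypothetical helper functions written for this example, not Aspose.HTML APIs, and minidom requires well-formed XML input.

```python
# Sketch: inner vs. outer serialization, using xml.dom.minidom as a stand-in.
# outer_html() and inner_html() are hypothetical helpers, not Aspose.HTML APIs.
from xml.dom import minidom


def outer_html(element):
    """Serialize the element including its own start and end tags."""
    return element.toxml()


def inner_html(element):
    """Serialize only the element's children, without its own tags."""
    return "".join(child.toxml() for child in element.childNodes)


doc = minidom.parseString("<html><body><p>Hello</p></body></html>")
body = doc.getElementsByTagName("body")[0]
p = doc.getElementsByTagName("p")[0]

print(outer_html(body))  # <body><p>Hello</p></body>
print(inner_html(body))  # <p>Hello</p>
print(inner_html(p))     # Hello
```

In other words, the inner form of an element is the concatenated outer form of its children.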

Edit CSS

Cascading Style Sheets (CSS) is a style sheet language used to describe how web pages appear in a browser. CSS can be added to HTML documents in three ways: inline, internal, and external. You can define a unique style for a single HTML element using inline CSS, or let multiple web pages share formatting by placing the relevant CSS in a separate .css file. Aspose.HTML for Python via .NET not only supports CSS out of the box but also gives you tools to manipulate document styles on the fly before converting the HTML document to other formats, as follows.

Inline CSS

When CSS is written in the style attribute of an HTML tag, it is called “inline CSS”. Inline CSS lets you apply an individual style to one HTML element at a time: you set CSS on an HTML element by using the style attribute with any CSS properties defined within it. The following code snippet shows how to specify CSS style properties for an HTML <p> element.

```python
# How to set inline CSS styles in an HTML element using Python

import os
import aspose.html as ah
import aspose.html.rendering.pdf as rp

# Define the content of the HTML document
content = "<p>Edit inline CSS using Aspose.HTML for Python via .NET</p>"

# Create an instance of an HTML document with specified content
document = ah.HTMLDocument(content, ".")

# Find the paragraph element and set a style attribute
paragraph = document.get_elements_by_tag_name("p")[0]
paragraph.set_attribute("style", "font-size: 150%; font-family: arial; color: teal")

# Save the HTML document to a file
output_dir = "output/"
os.makedirs(output_dir, exist_ok=True)
html_path = os.path.join(output_dir, "edit-inline-css.html")
document.save(html_path)

# Create an instance of the PDF output device and render the document to this device
pdf_path = os.path.join(output_dir, "edit-inline-css.pdf")
with rp.PdfDevice(pdf_path) as device:
    document.render_to(device)
```

In this example, the color, font-size, and font-family properties apply to the <p> element. A fragment of the rendered PDF page looks like this:

Figure: Text “Edit inline CSS”
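Because set_attribute("style", ...) replaces the whole attribute value, changing one property means rebuilding the full declaration string. The helpers below are a minimal sketch of that, in plain Python: to_inline_style(), parse_inline_style(), and merge_inline_style() are hypothetical names written for this example and are not part of Aspose.HTML. The parser is deliberately naive and assumes no value contains a semicolon (values such as url("a;b") or quoted strings would be split incorrectly).

```python
# Sketch: composing the value passed to set_attribute("style", ...).
# These are hypothetical helpers, not part of Aspose.HTML.


def to_inline_style(properties):
    """Join {property: value} pairs into 'prop: value; prop: value' form."""
    return "; ".join(f"{name}: {value}" for name, value in properties.items())


def parse_inline_style(style):
    """Split a style attribute into an ordered {property: value} dict.

    Naive: assumes no value contains ';' (for example inside url() or quotes).
    """
    result = {}
    for declaration in style.split(";"):
        if ":" in declaration:
            name, value = declaration.split(":", 1)
            result[name.strip().lower()] = value.strip()
    return result


def merge_inline_style(existing, updates):
    """Override selected properties while keeping the rest of the declaration."""
    merged = parse_inline_style(existing)
    merged.update(updates)
    return to_inline_style(merged)


style = to_inline_style({"font-size": "150%", "font-family": "arial", "color": "teal"})
print(style)  # font-size: 150%; font-family: arial; color: teal
print(merge_inline_style(style, {"color": "#434343"}))
# font-size: 150%; font-family: arial; color: #434343
```

The resulting string can be passed to paragraph.set_attribute("style", ...) exactly as in the example above; keeping the properties in a dict preserves their order and avoids duplicate declarations.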

Internal CSS

The internal CSS styling option is popular for applying properties to individual pages: all styles are enclosed in a <style> element placed in the <head> of the HTML document.

```python
# Edit HTML with internal CSS using Python

import os
import aspose.html as ah
import aspose.html.rendering.pdf as rp

# Define the content of the HTML document
content = "<div><h1>Internal CSS</h1><p>An internal CSS is used to define a style for a single HTML page</p></div>"

# Create an instance of an HTML document with specified content
document = ah.HTMLDocument(content, ".")

# Create a <style> element and define internal CSS rules
style = document.create_element("style")
style.text_content = (
    ".frame1 { margin-top:50px; margin-left:50px; padding:25px; width:360px; height:90px; "
    "background-color:#82011a; font-family:arial; color:#fff5ee;} \r\n"
    ".frame2 { margin-top:-70px; margin-left:160px; text-align:center; padding:20px; width:360px; "
    "height:100px; background-color:#ebd2d7;}"
)

# Find the <head> element and append the style element
head = document.get_elements_by_tag_name("head")[0]
head.append_child(style)

# Find the <h1> header element and assign the "frame1" class
header = document.get_elements_by_tag_name("h1")[0]
header.class_name = "frame1"

# Add extra properties for the header using the style attribute directly
header.set_attribute("style", "font-size: 200%; text-align: center;")

# Find the paragraph element, assign the "frame2" class, and add inline styles
paragraph = document.get_elements_by_tag_name("p")[0]
paragraph.class_name = "frame2"
paragraph.set_attribute("style", "color: #434343; font-size: 150%; font-family: verdana;")

# Save the HTML document to a file
output_dir = "output/"
os.makedirs(output_dir, exist_ok=True)
html_path = os.path.join(output_dir, "edit-internal-css.html")
document.save(html_path)

# Create an instance of the PDF output device and render the document to this device
pdf_path = os.path.join(output_dir, "edit-internal-css.pdf")
with rp.PdfDevice(pdf_path) as device:
    document.render_to(device)
```

In this example, we use internal CSS and also declare additional style properties for individual elements using the style attribute of the <h1> and <p> tags. The figure shows a fragment of the rendered “edit-internal-css.pdf” file:

Figure: Text “Edit internal CSS”
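When a page needs several rules, the text assigned to the <style> element’s text_content can be generated from data instead of being written as one long string literal. This is a minimal plain-Python sketch: build_stylesheet() is a hypothetical helper, not part of Aspose.HTML, and it does no validation or escaping of selectors or values. The rule values below are a subset of the .frame1 and .frame2 rules from the example above.

```python
# Sketch: generating internal CSS text from {selector: {property: value}} data.
# build_stylesheet() is a hypothetical helper, not part of Aspose.HTML.


def build_stylesheet(rules):
    """Turn {selector: {property: value}} into CSS text, one rule per line."""
    lines = []
    for selector, declarations in rules.items():
        body = " ".join(f"{name}: {value};" for name, value in declarations.items())
        lines.append(f"{selector} {{ {body} }}")
    return "\n".join(lines)


css = build_stylesheet({
    ".frame1": {"margin-top": "50px", "background-color": "#82011a", "color": "#fff5ee"},
    ".frame2": {"margin-top": "-70px", "text-align": "center", "background-color": "#ebd2d7"},
})
print(css)
```

The generated string could then be assigned with style.text_content = css before appending the <style> element to <head>, as in the example above.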

Download the Aspose.HTML for Python via .NET library to manipulate your HTML documents quickly and easily. The Python library can create, modify, extract data from, convert, and render HTML documents without the need for external software. It supports popular file formats such as EPUB, MHTML, XML, SVG, and Markdown and can render to PDF, DOCX, XPS, and image formats.
