Edit HTML Document in Python

DOM namespace

Aspose.HTML for Python via .NET allows you to access and manipulate the HTML DOM (Document Object Model) in Python language. The aspose.html.dom namespace provides an API for interacting with HTML, XML, and SVG documents. It represents the document as a node tree, with each node representing a part of the document, such as an element, text string, or comment. The namespace includes classes like Attr, CharacterData, Comment, and more, each serving specific purposes within the document model.

The Document class represents the entire HTML, XML, or SVG document and serves as the root of the document tree. Other classes like Element, Node, DocumentFragment, and EventTarget provide access to different parts of the document and allow for manipulation and interaction with the document’s data. The API is based on the WHATWG DOM standard. So, it is easy to use Aspose.HTML for Python via .NET having a basic knowledge of HTML and JavaScript languages.

Edit HTML

To edit an HTML document using the DOM tree, use the HTMLDocument class, which represents the entire document. There are many ways you can edit HTML by using our Python library. You can modify the document by inserting new nodes, removing, or editing the content of existing nodes. If you need to create a new element or node, the following methods are ones that need to be invoked:

To manipulate element attributes, use the methods:

Once you have new nodes are created, there are several methods in the DOM that can help you to insert nodes into the document tree. The following list describes the most common way of inserting or removing nodes:

For a complete list of classes and methods represented in the DOM namespace please visit API Reference Source.

Edit a Document Tree

Aspose.HTML Python API supports a set of HTML elements that are defined in HTML Standard, along with rules about how the elements can be nested. Consider simple steps to create HTML from scratch and edit it using a DOM tree and the functional mentioned above. The document will contain a text paragraph with an id attribute:

This code demonstrates how to create a basic HTML document programmatically, add a paragraph with an attribute and text, and save the resulting HTML to a file:

 1import os
 2from aspose.html import *
 3
 4# Define the output directory and file path
 5output_dir = "output/"
 6output_path = os.path.join(output_dir, "edit-document-tree.html")
 7
 8# Ensure the output directory exists
 9os.makedirs(output_dir, exist_ok=True)
10
11# Create an instance of an HTML document
12document = HTMLDocument()
13
14# Access the document <body> element
15body = document.body
16
17# Create a paragraph element <p>
18p = document.create_element("p")
19
20# Set a custom attribute
21p.set_attribute("id", "my-paragraph")
22
23# Create a text node
24text = document.create_text_node("The Aspose.Html.Dom namespace provides an API for representing and interfacing with HTML, XML, or SVG documents.")
25
26# Add the text to the paragraph
27p.append_child(text)
28
29# Attach the paragraph to the document body
30body.append_child(p)
31
32# Save the HTML document to a file
33document.save(output_path)

Let’s look at creating a more complex HTML document. In the following code snippet, we will construct an HTML document from scratch, create new elements, populate them with content, and add them to the HTML document structure. The HTML DOM model and the functionality mentioned above will help us with this.

 1import os
 2from aspose.html import *
 3from aspose.html.saving import *
 4
 5# Define output directory and file paths
 6output_dir = "output/"
 7save_path = os.path.join(output_dir, "edit-document.html")
 8
 9# Ensure the output directory exists
10os.makedirs(output_dir, exist_ok=True)
11
12# Create an instance of an HTML document
13document = HTMLDocument()
14
15# Create a style element and set the teal color for elements with class "col"
16style = document.create_element("style")
17style.text_content = ".col { color: teal }"
18
19# Find the document <head> element and append the <style> element
20head = document.get_elements_by_tag_name("head")[0]
21head.append_child(style)
22
23# Create a paragraph <p> element with class "col"
24p = document.create_element("p")
25p.class_name = "col"
26
27# Create a text node
28text = document.create_text_node("Edit HTML document")
29
30# Append the text node to the paragraph
31p.append_child(text)
32
33# Append the paragraph to the document <body>
34document.body.append_child(p)
35
36# Save the HTML document to a file
37document.save(save_path)

The API Reference Source provides a comprehensive list of classes and methods in the DOM namespace.

Using inner_html and outer_html properties

Working with DOM objects provides a powerful way to manipulate an HTML document in Python. However, in some cases, it can be more convenient to work directly with strings. The inner_html and outer_html properties are used to access and manipulate HTML content in a document, but they differ in what they represent and how they are used:

  1. The inner_html property represents the HTML content inside an element, excluding the element’s own start and end tags.
  2. The outer_html property represents all of an element’s HTML content, including its own start and end tags.

The following code snippet shows you how to use the inner_html and outer_html properties of the Element class to edit HTML.

 1from aspose.html import *
 2
 3# Create an instance of an HTML document
 4document = HTMLDocument()
 5
 6# Write the content of the HTML document to the console
 7print(document.document_element.outer_html)  # output: <html><head></head><body></body></html>
 8
 9# Set the content of the body element
10document.body.inner_html = "<p>HTML is the standard markup language for Web pages.</p>"
11
12# Find the document <p> element
13p = document.get_elements_by_tag_name("p")[0]
14
15# Write the content of the <p> element to the console
16print(p.inner_html)  # output: HTML is the standard markup language for Web pages.
17
18# Write the updated content of the HTML document to the console
19print(document.document_element.outer_html)  # output: <html><head></head><body><p>HTML is the standard markup language for Web pages.</p></body></html>

Edit CSS

Cascading Style Sheets (CSS) is a style sheet language used to describe how web pages appear in a browser. CSS can be added to HTML documents in an inline, internal, and external way. Thus, you can define a unique style for a single HTML element using inline CSS or for multiple web pages to share formatting by specifying the relevant CSS in a separate .css file. Aspose.HTML for Python via .NET not only support CSS out-of-the-box but also gives you instruments to manipulate with document styles just on the fly before converting the HTML document to the other formats, as it follows.

Inline CSS

When CSS is written using the style attribute inside of an HTML tag, it’s called an “inline CSS”. The inline CSS gives you to apply an individual style to one HTML element at a time. You set CSS to an HTML element by using the style attribute with any CSS properties defined within it. In the following code snippet, you can see how to specify CSS style properties for an HTML <p> element.

 1import os
 2from aspose.html import *
 3from aspose.html.rendering.pdf import *
 4
 5# Define the content of the HTML document
 6content = "<p>Edit inline CSS using Aspose.HTML for Python via .NET</p>"
 7
 8# Create an instance of an HTML document with specified content
 9document = HTMLDocument(content, ".")
10
11# Find the paragraph element and set a style attribute
12paragraph = document.get_elements_by_tag_name("p")[0]
13paragraph.set_attribute("style", "font-size: 150%; font-family: arial; color: teal")
14
15# Save the HTML document to a file
16output_dir = "output/"
17os.makedirs(output_dir, exist_ok=True)
18html_path = os.path.join(output_dir, "edit-inline-css.html")
19document.save(html_path)
20
21# Create an instance of the PDF output device and render the document to this device
22pdf_path = os.path.join(output_dir, "edit-inline-css.pdf")
23with PdfDevice(pdf_path) as device:
24    document.render_to(device)

In this particular example, color, font-size and font-family apply to the <p> element. The fragment of rendered pdf page looks like this:

Text “Edit inline CSS”

Internal CSS

The internal CSS styling option is popular for applying properties to individual pages by encasing all styles in the <style> element placed it in the <head> of HTML documents.

 1import os
 2from aspose.html import *
 3from aspose.html.rendering.pdf import *
 4
 5
 6# Define the content of the HTML document
 7content = "<div><h1>Internal CSS</h1><p>An internal CSS is used to define a style for a single HTML page</p></div>"
 8
 9# Create an instance of an HTML document with specified content
10document = HTMLDocument(content, ".")
11
12# Create a <style> element and define internal CSS rules
13style = document.create_element("style")
14style.text_content = (
15    ".frame1 { margin-top:50px; margin-left:50px; padding:25px; width:360px; height:90px; "
16    "background-color:#82011a; font-family:arial; color:#fff5ee;} \r\n"
17    ".frame2 { margin-top:-70px; margin-left:160px; text-align:center; padding:20px; width:360px; "
18    "height:100px; background-color:#ebd2d7;}"
19)
20
21# Find the <head> element and append the <style> element
22head = document.get_elements_by_tag_name("head")[0]
23head.append_child(style)
24
25# Find the <h1> element and apply styles
26header = document.get_elements_by_tag_name("h1")[0]
27header.class_name = "frame1"
28
29# Update the style of <h1> using the style attribute directly
30header.set_attribute("style", "font-size: 200%; text-align: center;")
31
32# Find the paragraph element and apply styles
33paragraph = document.get_elements_by_tag_name("p")[0]
34paragraph.class_name = "frame2"
35paragraph.set_attribute("style", "color: #434343; font-size: 150%; font-family: verdana;")
36
37# Save the HTML document to a file
38output_dir = "output/"
39os.makedirs(output_dir, exist_ok=True)
40html_path = os.path.join(output_dir, "edit-internal-css.html")
41document.save(html_path)
42
43# Create an instance of the PDF output device and render the document to this device
44pdf_path = os.path.join(output_dir, "edit-internal-css.pdf")
45with PdfDevice(pdf_path) as device:
46    document.render_to(device)

In this example, we use internal CSS and also declare additional style properties for individual elements using the style attribute inside of the <h1> and <p> tags. The figure illustrates the fragment of rendered “edit-internal-css.pdf” file:

Text “Edit internal CSS”

Download the Aspose.HTML for Python via .NET library to successfully, quickly, and easily manipulate your HTML documents. The Python library can create, modify, extract data, convert, and render HTML documents without the need for external software. It supports popular file formats such as EPUB, MHTML, XML, SVG, and Markdown and can render to PDF, DOCX, XPS, and Image file formats.

Subscribe to Aspose Product Updates

Get monthly newsletters & offers directly delivered to your mailbox.