Edit HTML Document in Python
DOM namespace
Aspose.HTML for Python via .NET lets you access and manipulate the HTML DOM (Document Object Model) from Python. The aspose.html.dom namespace provides an API for interacting with HTML, XML, and SVG documents. It represents a document as a node tree in which each node stands for one part of the document, such as an element, a text string, or a comment. The namespace includes classes such as Attr, CharacterData, and Comment, each serving a specific purpose within the document model.
The Document class represents the entire HTML, XML, or SVG document and serves as the root of the document tree. Other classes, such as Element, Node, DocumentFragment, and EventTarget, provide access to different parts of the document and let you manipulate and interact with its data. Because the API is based on the WHATWG DOM standard, Aspose.HTML for Python via .NET is easy to use for anyone with a basic knowledge of HTML and JavaScript.
Edit HTML
To edit an HTML document through the DOM tree, use the HTMLDocument class, which represents the entire document. Our Python library offers many ways to edit HTML: you can insert new nodes, remove existing ones, or edit their content. To create new elements or nodes, or to find existing ones, use the following methods:
- create_element(local_name) for generating new elements such as <style> and <p>;
- get_elements_by_tag_name(tagname) to retrieve a list of existing elements with the given tag name;
- get_element_by_id(element_id) to return the first element whose id attribute has the given value;
- create_text_node(data) for adding textual content.
To manipulate element attributes, use the methods:
- set_attribute(qualified_name, value) adds a new attribute and sets its value. If the element already has an attribute with that name, its value is changed to that of the value parameter;
- get_attribute(qualified_name) retrieves the value of an attribute by name.
Once you have created new nodes, several DOM methods can help you insert them into the document tree. The following list describes the most common ways to insert or remove nodes:
- The append_child(node) method adds elements or nodes to existing elements;
- The insert_before(node, child) method inserts node before the existing child node child. If child is null (None in Python), node is inserted at the end of the list of children;
- The remove() method removes this instance from the HTML DOM tree;
- The remove_child(child) method removes the child node from the list of children.
For a complete list of the classes and methods in the DOM namespace, see the API Reference Source.
Edit a Document Tree
The Aspose.HTML Python API supports the set of HTML elements defined in the HTML Standard, along with the rules for how those elements can be nested. The following steps create an HTML document from scratch and edit it using the DOM tree and the methods described above. The document will contain a text paragraph with an id attribute:
- Use the HTMLDocument() constructor to create an instance of an HTML document.
- The body property of the HTMLDocument class points to the document's <body> element.
- Use the create_element() method of HTMLDocument to create a paragraph element <p>.
- Use the set_attribute() method to set the id attribute of the paragraph element.
- Create a text node with the create_text_node() method.
- Add this text node to the paragraph element using the append_child() method.
- Add the paragraph element to the <body> of the document using the append_child() method.
- Save the HTML document to a file.
This code demonstrates how to create a basic HTML document programmatically, add a paragraph with an attribute and text, and save the resulting HTML to a file:
```python
# Edit HTML document using DOM Tree in Python

import os
import aspose.html as ah

# Define the output directory and file path
output_dir = "output/"
output_path = os.path.join(output_dir, "edit-document-tree.html")

# Ensure the output directory exists
os.makedirs(output_dir, exist_ok=True)

# Create an instance of an HTML document
document = ah.HTMLDocument()

# Access the document <body> element
body = document.body

# Create a paragraph element <p>
p = document.create_element("p")

# Set the id attribute
p.set_attribute("id", "my-paragraph")

# Create a text node
text = document.create_text_node("The Aspose.Html.Dom namespace provides an API for representing and interfacing with HTML, XML, or SVG documents.")

# Add the text to the paragraph
p.append_child(text)

# Attach the paragraph to the document body
body.append_child(p)

# Save the HTML document to a file
document.save(output_path)
```
Now let’s create a more complex HTML document. The following code snippet constructs an HTML document from scratch, creates new elements, populates them with content, and adds them to the document structure using the DOM methods described above:
- Use the HTMLDocument() constructor to initialize a new HTMLDocument object, which represents the HTML document.
- Use the create_element() method of HTMLDocument to generate a <style> element.
- Assign CSS rules to the style element using the text_content property.
- Retrieve the <head> element using the get_elements_by_tag_name() method.
- Append the <style> element to the <head> element using the append_child() method.
- Use the create_element() method to create a <p> paragraph element.
- Set the class_name property of the paragraph to apply the desired CSS styles.
- Create a text node with the create_text_node() method.
- Append this text node to the paragraph element using the append_child() method.
- Add the paragraph element to the <body> of the document using the append_child() method on the body property.
- Call the save() method of HTMLDocument to save the document to the specified HTML file path.
```python
# Create and add new HTML elements using Python

import os
import aspose.html as ah

# Define output directory and file paths
output_dir = "output/"
save_path = os.path.join(output_dir, "edit-document.html")

# Ensure the output directory exists
os.makedirs(output_dir, exist_ok=True)

# Create an instance of an HTML document
document = ah.HTMLDocument()

# Create a style element and set the teal color for elements with class "col"
style = document.create_element("style")
style.text_content = ".col { color: teal }"

# Find the document <head> element and append the <style> element
head = document.get_elements_by_tag_name("head")[0]
head.append_child(style)

# Create a paragraph <p> element with class "col"
p = document.create_element("p")
p.class_name = "col"

# Create a text node
text = document.create_text_node("Edit HTML document")

# Append the text node to the paragraph
p.append_child(text)

# Append the paragraph to the document <body>
document.body.append_child(p)

# Save the HTML document to a file
document.save(save_path)
```
The API Reference Source provides a comprehensive list of classes and methods in the DOM namespace.
Using inner_html and outer_html properties
Working with DOM objects provides a powerful way to manipulate an HTML document in Python. In some cases, however, it is more convenient to work directly with strings. The inner_html and outer_html properties both give access to the HTML content of a document, but they differ in what they represent and how they are used:
- The inner_html property represents the HTML content inside an element, excluding the element’s own start and end tags.
- The outer_html property represents all of an element’s HTML content, including its own start and end tags.
The following code snippet shows how to use the inner_html and outer_html properties of the Element class to edit HTML.
```python
# Edit HTML body content and get modified document as a string using Python

import aspose.html as ah

# Create an instance of an HTML document
document = ah.HTMLDocument()

# Write the content of the HTML document to the console
print(document.document_element.outer_html) # output: <html><head></head><body></body></html>

# Set the content of the body element
document.body.inner_html = "<p>HTML is the standard markup language for Web pages.</p>"

# Find the document <p> element
p = document.get_elements_by_tag_name("p")[0]

# Write the content of the <p> element to the console
print(p.inner_html) # output: HTML is the standard markup language for Web pages.

# Write the updated content of the HTML document to the console
print(document.document_element.outer_html) # output: <html><head></head><body><p>HTML is the standard markup language for Web pages.</p></body></html>
```
Edit CSS
Cascading Style Sheets (CSS) is a style sheet language used to describe how web pages appear in a browser. CSS can be added to HTML documents inline, internally, or externally. You can therefore define a unique style for a single HTML element with inline CSS, or let multiple web pages share formatting by putting the relevant CSS in a separate .css file. Aspose.HTML for Python via .NET not only supports CSS out of the box but also gives you tools to manipulate document styles on the fly before converting the HTML document to other formats, as shown below.
Inline CSS
When CSS is written in the style attribute inside an HTML tag, it’s called “inline CSS”. Inline CSS lets you apply an individual style to one HTML element at a time. You set CSS for an HTML element by using the style attribute with any CSS properties defined within it.
The following code snippet shows how to specify CSS style properties for an HTML <p> element.
```python
# How to set inline CSS styles in an HTML element using Python

import os
import aspose.html as ah
import aspose.html.rendering.pdf as rp

# Define the content of the HTML document
content = "<p>Edit inline CSS using Aspose.HTML for Python via .NET</p>"

# Create an instance of an HTML document with specified content
document = ah.HTMLDocument(content, ".")

# Find the paragraph element and set a style attribute
paragraph = document.get_elements_by_tag_name("p")[0]
paragraph.set_attribute("style", "font-size: 150%; font-family: arial; color: teal")

# Save the HTML document to a file
output_dir = "output/"
os.makedirs(output_dir, exist_ok=True)
html_path = os.path.join(output_dir, "edit-inline-css.html")
document.save(html_path)

# Create an instance of the PDF output device and render the document to this device
pdf_path = os.path.join(output_dir, "edit-inline-css.pdf")
with rp.PdfDevice(pdf_path) as device:
    document.render_to(device)
```
In this example, the color, font-size, and font-family properties apply to the <p> element. A fragment of the rendered PDF page looks like this:
Internal CSS
Internal CSS is a popular way to apply properties to an individual page: all styles are enclosed in a <style> element placed in the <head> of the HTML document.
```python
# Edit HTML with internal CSS using Python

import os
import aspose.html as ah
import aspose.html.rendering.pdf as rp

# Define the content of the HTML document
content = "<div><h1>Internal CSS</h1><p>An internal CSS is used to define a style for a single HTML page</p></div>"

# Create an instance of an HTML document with specified content
document = ah.HTMLDocument(content, ".")

# Create a <style> element and define internal CSS rules
style = document.create_element("style")
style.text_content = (
    ".frame1 { margin-top:50px; margin-left:50px; padding:25px; width:360px; height:90px; "
    "background-color:#82011a; font-family:arial; color:#fff5ee;} \r\n"
    ".frame2 { margin-top:-70px; margin-left:160px; text-align:center; padding:20px; width:360px; "
    "height:100px; background-color:#ebd2d7;}"
)

# Find the <head> element and append the style element
head = document.get_elements_by_tag_name("head")[0]
head.append_child(style)

# Find the <h1> element and apply the frame1 class
header = document.get_elements_by_tag_name("h1")[0]
header.class_name = "frame1"

# Add inline styles to the <h1> element using the style attribute
header.set_attribute("style", "font-size: 200%; text-align: center;")

# Find the <p> element, apply the frame2 class, and add inline styles
paragraph = document.get_elements_by_tag_name("p")[0]
paragraph.class_name = "frame2"
paragraph.set_attribute("style", "color: #434343; font-size: 150%; font-family: verdana;")

# Save the HTML document to a file
output_dir = "output/"
os.makedirs(output_dir, exist_ok=True)
html_path = os.path.join(output_dir, "edit-internal-css.html")
document.save(html_path)

# Create an instance of the PDF output device and render the document to this device
pdf_path = os.path.join(output_dir, "edit-internal-css.pdf")
with rp.PdfDevice(pdf_path) as device:
    document.render_to(device)
```
In this example, we use internal CSS and also declare additional style properties for individual elements using the style attribute inside the <h1> and <p> tags. The figure illustrates a fragment of the rendered “edit-internal-css.pdf” file:
Download the Aspose.HTML for Python via .NET library to manipulate your HTML documents quickly and easily. The Python library can create, modify, convert, and render HTML documents and extract data from them without the need for external software. It supports popular file formats such as EPUB, MHTML, XML, SVG, and Markdown and can render to PDF, DOCX, XPS, and image formats.