Edit HTML Document in Python
DOM namespace
Aspose.HTML for Python via .NET allows you to access and manipulate the HTML DOM (Document Object Model) in Python. The aspose.html.dom namespace provides an API for interacting with HTML, XML, and SVG documents. It represents the document as a node tree, with each node representing a part of the document, such as an element, a text string, or a comment. The namespace includes classes like Attr, CharacterData, Comment, and more, each serving a specific purpose within the document model.
The Document class represents the entire HTML, XML, or SVG document and serves as the root of the document tree. Other classes like Element, Node, DocumentFragment, and EventTarget provide access to different parts of the document and allow for manipulation and interaction with the document’s data. The API is based on the WHATWG DOM standard, so Aspose.HTML for Python via .NET is easy to use if you have a basic knowledge of HTML and JavaScript.
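The node-tree model described above can be explored with a minimal sketch. It uses Python's standard-library xml.dom.minidom purely as a stand-in for aspose.html.dom (an assumption made so the example runs without the library); minidom follows the older W3C DOM and uses camelCase names, and the describe helper is a hypothetical name introduced for illustration.

```python
from xml.dom import minidom, Node

# Stand-in for aspose.html.dom: the standard library's minidom (illustration only)
doc = minidom.parseString("<html><body><p id='a'>Hello<!-- note --></p></body></html>")

NODE_KINDS = {
    Node.ELEMENT_NODE: "element",
    Node.TEXT_NODE: "text",
    Node.COMMENT_NODE: "comment",
}

def describe(node, depth=0, out=None):
    """Collect one (depth, kind, label) tuple per node, in document order."""
    out = [] if out is None else out
    kind = NODE_KINDS.get(node.nodeType, "other")
    # Elements are labeled by tag name, text and comments by their content
    label = node.nodeName if kind == "element" else node.nodeValue
    out.append((depth, kind, label))
    for child in node.childNodes:
        describe(child, depth + 1, out)
    return out

tree = describe(doc.documentElement)
for depth, kind, label in tree:
    print("  " * depth + f"{kind}: {label}")
```

Each element, text string, and comment appears as its own node, nested under its parent, which is the structure the classes in the DOM namespace expose.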
Edit HTML
To edit an HTML document using the DOM tree, use the HTMLDocument class, which represents the entire document. The Python library offers several ways to edit HTML: you can insert new nodes, remove nodes, or edit the content of existing nodes. To create new elements or nodes, or to find existing ones, use the following methods:
- create_element(local_name) for generating new elements like <style> and <p>;
- get_elements_by_tag_name(tagname) to retrieve a list of existing elements with a given tag name;
- get_element_by_id(element_id) to return the first element whose ID attribute has the given value;
- create_text_node(data) for adding textual content.
To manipulate element attributes, use the methods:
- set_attribute(qualified_name, value) adds a new attribute and sets its value. If an attribute with that name is already present in the element, its value is changed to that of the value parameter;
- get_attribute(qualified_name) retrieves the value of an attribute by name.
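The add-or-replace behavior of set_attribute() can be sketched with the standard library's xml.dom.minidom, used here only as a stand-in (an assumption for illustration): its setAttribute() and getAttribute() are the camelCase counterparts of the methods above, and the Aspose API itself is not exercised.

```python
from xml.dom import minidom

# Stand-in document with a single empty <p> element (illustration only)
doc = minidom.parseString("<p/>")
p = doc.documentElement

p.setAttribute("id", "first")     # the attribute does not exist yet, so it is added
p.setAttribute("id", "second")    # same name: the value is replaced, not duplicated

current_id = p.getAttribute("id")
attribute_count = p.attributes.length
print(current_id, attribute_count)
```

Setting the same attribute twice leaves a single attribute holding the most recent value.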
Once new nodes are created, several DOM methods help you insert them into the document tree. The following list describes the most common ways of inserting or removing nodes:
- The append_child(node) method adds elements or nodes to existing elements;
- The insert_before(node, child) method inserts node before the existing child node child. If child is null, node is inserted at the end of the list of children.
- The remove() method removes this instance from the HTML DOM tree.
- The remove_child(child) method removes the child node from the list of children.
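The ordering rules for inserting and removing children can be illustrated with a small sketch. It again relies on the standard library's xml.dom.minidom as a stand-in (an assumption, not the Aspose API), whose insertBefore() and removeChild() mirror insert_before() and remove_child(); make_paragraph is a hypothetical helper.

```python
from xml.dom import minidom

# Stand-in document with one existing paragraph (illustration only)
doc = minidom.parseString("<body><p>second</p></body>")
body = doc.documentElement
existing = body.firstChild

def make_paragraph(text):
    """Build a <p> element that contains a single text node."""
    p = doc.createElement("p")
    p.appendChild(doc.createTextNode(text))
    return p

first = make_paragraph("first")
last = make_paragraph("last")

body.insertBefore(first, existing)   # placed in front of the existing child
body.insertBefore(last, None)        # no reference child: appended at the end
order_after_insert = [p.firstChild.data for p in body.childNodes]

body.removeChild(existing)           # detach one child from its parent
order_after_remove = [p.firstChild.data for p in body.childNodes]

print(order_after_insert, order_after_remove)
```

Passing no reference child makes the insertion behave like an append, which matches the rule stated in the list above.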
For a complete list of the classes and methods available in the DOM namespace, please visit the API Reference Source.
Edit a Document Tree
The Aspose.HTML Python API supports a set of HTML elements that are defined in the HTML Standard, along with rules about how the elements can be nested. Consider the simple steps to create HTML from scratch and edit it using a DOM tree and the functionality mentioned above. The document will contain a text paragraph with an id attribute:
- Use the HTMLDocument() constructor to create an instance of an HTML document.
- The body property of the HTMLDocument class points to the <body> element of the document.
- Employ the create_element() method of HTMLDocument to create a paragraph element <p>.
- Use the set_attribute() method to set the id attribute for the paragraph element.
- Create a text node with the create_text_node() method.
- Add this text node to the paragraph element using the append_child() method.
- Add the paragraph element into the <body> of the document using the append_child() method.
- Save the HTML document to a file.
This code demonstrates how to create a basic HTML document programmatically, add a paragraph with an attribute and text, and save the resulting HTML to a file:
import os
from aspose.html import *

# Define the output directory and file path
output_dir = "output/"
output_path = os.path.join(output_dir, "edit-document-tree.html")

# Ensure the output directory exists
os.makedirs(output_dir, exist_ok=True)

# Create an instance of an HTML document
document = HTMLDocument()

# Access the document <body> element
body = document.body

# Create a paragraph element <p>
p = document.create_element("p")

# Set the id attribute
p.set_attribute("id", "my-paragraph")

# Create a text node
text = document.create_text_node("The Aspose.Html.Dom namespace provides an API for representing and interfacing with HTML, XML, or SVG documents.")

# Add the text to the paragraph
p.append_child(text)

# Attach the paragraph to the document body
body.append_child(p)

# Save the HTML document to a file
document.save(output_path)
Let’s look at creating a more complex HTML document. In the following code snippet, we will construct an HTML document from scratch, create new elements, populate them with content, and add them to the HTML document structure. The HTML DOM model and the functionality mentioned above will help us with this.
- Use the HTMLDocument() constructor to initialize a new HTMLDocument object, which represents the HTML document.
- Employ the create_element() method of HTMLDocument to generate a <style> element.
- Assign CSS rules to the style element using the text_content property.
- Retrieve the <head> element using the get_elements_by_tag_name() method.
- Append the <style> element to the <head> element using the append_child() method.
- Use the create_element() method to create a <p> paragraph element.
- Set the class_name property of the paragraph to apply the desired CSS styles.
- Create a text node with the create_text_node() method.
- Append this text node to the paragraph element using the append_child() method.
- Add the paragraph element into the <body> of the document using the append_child() method on the body property.
- Call the save() method of HTMLDocument to save the document to the specified HTML file path.
import os
from aspose.html import *
from aspose.html.saving import *

# Define output directory and file paths
output_dir = "output/"
save_path = os.path.join(output_dir, "edit-document.html")

# Ensure the output directory exists
os.makedirs(output_dir, exist_ok=True)

# Create an instance of an HTML document
document = HTMLDocument()

# Create a style element and set the teal color for elements with class "col"
style = document.create_element("style")
style.text_content = ".col { color: teal }"

# Find the document <head> element and append the <style> element
head = document.get_elements_by_tag_name("head")[0]
head.append_child(style)

# Create a paragraph <p> element with class "col"
p = document.create_element("p")
p.class_name = "col"

# Create a text node
text = document.create_text_node("Edit HTML document")

# Append the text node to the paragraph
p.append_child(text)

# Append the paragraph to the document <body>
document.body.append_child(p)

# Save the HTML document to a file
document.save(save_path)
The API Reference Source provides a comprehensive list of classes and methods in the DOM namespace.
Using inner_html and outer_html properties
Working with DOM objects provides a powerful way to manipulate an HTML document in Python. However, in some cases, it can be more convenient to work directly with strings. The inner_html and outer_html properties are used to access and manipulate HTML content in a document, but they differ in what they represent and how they are used:
- The inner_html property represents the HTML content inside an element, excluding the element’s own start and end tags.
- The outer_html property represents all of an element’s HTML content, including its own start and end tags.
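The difference between the two views can be sketched with the standard library's xml.dom.minidom as a stand-in (an assumption for illustration). minidom has no inner_html or outer_html properties, so the sketch approximates them by serializing nodes with toxml(): the element itself for the outer view, and its children for the inner view.

```python
from xml.dom import minidom

# Stand-in document (illustration only, not the Aspose API)
doc = minidom.parseString("<body><p>HTML is <b>markup</b>.</p></body>")
p = doc.getElementsByTagName("p")[0]

# Outer view: the element's own start and end tags plus everything inside
outer = p.toxml()

# Inner view: only the serialized children, without the <p> tags
inner = "".join(child.toxml() for child in p.childNodes)

print(outer)
print(inner)
```

The inner string is exactly the outer string with the element's own start and end tags removed.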
The following code snippet shows you how to use the inner_html and outer_html properties of the Element class to edit HTML.
from aspose.html import *

# Create an instance of an HTML document
document = HTMLDocument()

# Write the content of the HTML document to the console
print(document.document_element.outer_html) # output: <html><head></head><body></body></html>

# Set the content of the body element
document.body.inner_html = "<p>HTML is the standard markup language for Web pages.</p>"

# Find the document <p> element
p = document.get_elements_by_tag_name("p")[0]

# Write the content of the <p> element to the console
print(p.inner_html) # output: HTML is the standard markup language for Web pages.

# Write the updated content of the HTML document to the console
print(document.document_element.outer_html) # output: <html><head></head><body><p>HTML is the standard markup language for Web pages.</p></body></html>
Edit CSS
Cascading Style Sheets (CSS) is a style sheet language used to describe how web pages appear in a browser. CSS can be added to HTML documents in three ways: inline, internal, and external. Thus, you can define a unique style for a single HTML element using inline CSS, or let multiple web pages share formatting by specifying the relevant CSS in a separate .css file. Aspose.HTML for Python via .NET not only supports CSS out of the box but also gives you tools to manipulate document styles on the fly before converting the HTML document to other formats, as shown in the following sections.
Inline CSS
When CSS is written using the style attribute inside an HTML tag, it’s called “inline CSS”. Inline CSS lets you apply an individual style to one HTML element at a time. You set CSS on an HTML element by using the style attribute with any CSS properties defined within it.
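Because set_attribute() replaces the whole attribute value, changing a single inline property means rebuilding the full style string. The sketch below shows one way to do that in plain Python; parse_style and format_style are hypothetical helpers, not part of the library, and the naive split on ";" and ":" assumes values contain no semicolons (for example inside url(...) or quoted strings).

```python
def parse_style(style):
    """Split an inline style string into an ordered {property: value} dict."""
    declarations = {}
    for part in style.split(";"):
        if ":" in part:
            name, value = part.split(":", 1)
            declarations[name.strip().lower()] = value.strip()
    return declarations

def format_style(declarations):
    """Join declarations back into a string suitable for a style attribute."""
    return "; ".join(f"{name}: {value}" for name, value in declarations.items())

style = parse_style("font-size: 150%; font-family: arial; color: teal")
style["color"] = "navy"            # change one existing property
style["font-weight"] = "bold"      # add a new property
merged = format_style(style)
print(merged)
```

The resulting string could then be passed as the value argument of set_attribute("style", ...).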
In the following code snippet, you can see how to specify CSS style properties for an HTML <p> element.
import os
from aspose.html import *
from aspose.html.rendering.pdf import *

# Define the content of the HTML document
content = "<p>Edit inline CSS using Aspose.HTML for Python via .NET</p>"

# Create an instance of an HTML document with specified content
document = HTMLDocument(content, ".")

# Find the paragraph element and set a style attribute
paragraph = document.get_elements_by_tag_name("p")[0]
paragraph.set_attribute("style", "font-size: 150%; font-family: arial; color: teal")

# Save the HTML document to a file
output_dir = "output/"
os.makedirs(output_dir, exist_ok=True)
html_path = os.path.join(output_dir, "edit-inline-css.html")
document.save(html_path)

# Create an instance of the PDF output device and render the document to this device
pdf_path = os.path.join(output_dir, "edit-inline-css.pdf")
with PdfDevice(pdf_path) as device:
    document.render_to(device)
In this particular example, the color, font-size, and font-family properties apply to the <p> element, and the rendered PDF page reflects these styles.
Internal CSS
The internal CSS styling option is popular for applying properties to individual pages by enclosing all styles in a <style> element placed in the <head> of the HTML document.
import os
from aspose.html import *
from aspose.html.rendering.pdf import *


# Define the content of the HTML document
content = "<div><h1>Internal CSS</h1><p>An internal CSS is used to define a style for a single HTML page</p></div>"

# Create an instance of an HTML document with specified content
document = HTMLDocument(content, ".")

# Create a <style> element and define internal CSS rules
style = document.create_element("style")
style.text_content = (
    ".frame1 { margin-top:50px; margin-left:50px; padding:25px; width:360px; height:90px; "
    "background-color:#82011a; font-family:arial; color:#fff5ee;} \r\n"
    ".frame2 { margin-top:-70px; margin-left:160px; text-align:center; padding:20px; width:360px; "
    "height:100px; background-color:#ebd2d7;}"
)

# Find the <head> element and append the <style> element
head = document.get_elements_by_tag_name("head")[0]
head.append_child(style)

# Find the <h1> element and apply styles
header = document.get_elements_by_tag_name("h1")[0]
header.class_name = "frame1"

# Update the style of <h1> using the style attribute directly
header.set_attribute("style", "font-size: 200%; text-align: center;")

# Find the paragraph element and apply styles
paragraph = document.get_elements_by_tag_name("p")[0]
paragraph.class_name = "frame2"
paragraph.set_attribute("style", "color: #434343; font-size: 150%; font-family: verdana;")

# Save the HTML document to a file
output_dir = "output/"
os.makedirs(output_dir, exist_ok=True)
html_path = os.path.join(output_dir, "edit-internal-css.html")
document.save(html_path)

# Create an instance of the PDF output device and render the document to this device
pdf_path = os.path.join(output_dir, "edit-internal-css.pdf")
with PdfDevice(pdf_path) as device:
    document.render_to(device)
In this example, we use internal CSS and also declare additional style properties for individual elements using the style attribute inside the <h1> and <p> tags. The result is rendered to the “edit-internal-css.pdf” file.
Download the Aspose.HTML for Python via .NET library to successfully, quickly, and easily manipulate your HTML documents. The Python library can create, modify, extract data, convert, and render HTML documents without the need for external software. It supports popular file formats such as EPUB, MHTML, XML, SVG, and Markdown and can render to PDF, DOCX, XPS, and Image file formats.