Save HTML Document in Python
How to Save HTML in Python
This article offers a detailed guide on how to save an HTML document using Aspose.HTML for Python via .NET API. After working with HTML documents, you can save the changes using one of the HTMLDocument.save() methods. There are methods to save a document to a file or URL.
- The Aspose.HTML Python API provides the aspose.html.saving namespace with the SaveOptions and ResourceHandlingOptions classes, which let you set options for saving operations.
- Aspose.HTML for Python via .NET provides the aspose.html.saving.resourcehandlers namespace, which contains the ResourceHandler and FileSystemResourceHandler classes responsible for handling resources.
Please note that Aspose.HTML for Python via .NET offers two different approaches for creating the output files:
HTML-Based Approach: This approach produces HTML-like files as output. It uses the SaveOptions class, which serves as the foundation for managing how related resources such as scripts, styles, and images are saved. The ResourceHandler class is responsible for handling these resources. It is designed to save HTML content and associated resources into streams, and it provides methods that let you control what happens to each resource.
Visual Representation Approach: This approach focuses on creating a visual representation of HTML. It is based on the RenderingOptions class, which offers specialized methods for defining page size, margins, resolution, user styles, and more.
This article covers the use of the SaveOptions and ResourceHandler classes.
Save HTML
Once you have completed your changes to your HTML document, you may want to save it. You can do this using one of the save() methods provided by the HTMLDocument class. Here is a simple Python example to save an HTML file:
import os
from aspose.html import *

# Prepare an output path for saving the document
output_dir = "output/"
if not os.path.exists(output_dir):
    os.makedirs(output_dir)

document_path = os.path.join(output_dir, "create-new-document.html")

# Initialize an empty HTML document
with HTMLDocument() as document:
    # Create a text node and add it to the document
    text = document.create_text_node("Hello, World!")
    document.body.append_child(text)

    # Save the document to a file
    document.save(document_path)
In the example above, we use the HTMLDocument() constructor to initialize an empty HTML document. The create_text_node(data) method of the HTMLDocument class creates a text node from the specified string. When you call document.save(document_path), the method writes the HTML content of the document object to the file specified by document_path.
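Each example in this article prepares its output folder before calling save(). If you save many documents, you can factor that step into a small helper. The sketch below is a hypothetical convenience function (prepare_output_path is not part of Aspose.HTML) that uses only the standard library. It returns a plain string because the examples in this article pass string paths to save().

```python
from pathlib import Path


def prepare_output_path(output_dir: str, file_name: str) -> str:
    """Create output_dir (and any missing parents) and return the path for file_name.

    Hypothetical helper, not part of Aspose.HTML. The result is a str because
    the examples in this article pass string paths to HTMLDocument.save().
    """
    directory = Path(output_dir)
    directory.mkdir(parents=True, exist_ok=True)  # no error if the folder already exists
    return str(directory / file_name)


# Usage with the example above (requires Aspose.HTML for Python via .NET):
# document_path = prepare_output_path("output/", "create-new-document.html")
# document.save(document_path)
```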
The sample above is quite simple. However, real-life applications often need more control over the saving process. The next few sections describe how to use resource handling options and how to save your document to different formats.
SaveOptions and ResourceHandlingOptions
SaveOptions is a base class that lets you specify additional options for saving operations and helps manage linked resources. Its resource_handling_options property configures how resources are handled. The ResourceHandlingOptions class defines options for managing the resources associated with an HTML document. It provides several properties that control how different types of resources are handled when the document is saved or processed:
- java_script determines how JavaScript resources are managed. Scripts can be saved as separate linked files, embedded into the HTML file, or ignored. The default value is ResourceHandling.SAVE.
- default sets the default handling method for all resources. The default value is ResourceHandling.SAVE.
- resource_url_restriction defines URL restrictions for resources such as CSS, JavaScript, and images. The default value is UrlRestriction.SAME_HOST, which restricts handling to resources hosted on the same domain as the document.
- page_url_restriction specifies URL restrictions for pages. The default value is UrlRestriction.ROOT_AND_SUB_FOLDERS, meaning that only pages within the root directory and its subfolders are handled.
- max_handling_depth controls the maximum depth of page handling. A depth of 1 means that only pages directly referenced from the saved document are handled, and a value of -1 means that all pages are handled. The default value is 0, which means that only the document itself is processed.
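The two URL restrictions are easier to reason about with concrete URLs. The sketch below is a simplified, hypothetical model of the behavior described above, written with the standard library only. It is not how Aspose.HTML implements these checks, and it ignores details such as schemes, ports, and ".." path segments. same_host models UrlRestriction.SAME_HOST, and in_root_or_sub_folders models UrlRestriction.ROOT_AND_SUB_FOLDERS under the assumption that the "root" is the folder containing the saved document on the same host.

```python
import posixpath
from urllib.parse import urlsplit


def same_host(document_url: str, resource_url: str) -> bool:
    """Simplified model of UrlRestriction.SAME_HOST: the hosts must match."""
    return urlsplit(document_url).hostname == urlsplit(resource_url).hostname


def in_root_or_sub_folders(document_url: str, page_url: str) -> bool:
    """Simplified model of UrlRestriction.ROOT_AND_SUB_FOLDERS.

    Assumption: the root is the folder that contains the document, on the same host.
    """
    doc, page = urlsplit(document_url), urlsplit(page_url)
    if doc.hostname != page.hostname:
        return False
    # Folder of the document, normalized to end with exactly one slash
    root = posixpath.dirname(doc.path).rstrip("/") + "/"
    return page.path.startswith(root)


base = "https://example.com/docs/index.html"
print(same_host(base, "https://example.com/css/site.css"))
print(same_host(base, "https://cdn.example.net/app.js"))
print(in_root_or_sub_folders(base, "https://example.com/docs/guide/a.html"))
print(in_root_or_sub_folders(base, "https://example.com/blog/post.html"))
```

In this model, a stylesheet on example.com passes the SAME_HOST check while a script from another host does not, and a page under /docs/ passes the ROOT_AND_SUB_FOLDERS check while a page under /blog/ does not.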
Save HTML to a File
The following Python code snippet shows how to use the resource_handling_options property of the SaveOptions class (here through an HTMLSaveOptions instance) to manage files linked to your document.
import os
from aspose.html import *
from aspose.html.saving import *

# Prepare an output path for the document
output_dir = "output/"
if not os.path.exists(output_dir):
    os.makedirs(output_dir)

document_path = os.path.join(output_dir, "save-with-linked-file.html")

# Prepare a simple HTML file with a linked document
with open(document_path, "w") as file:
    file.write("<p>Hello, World!</p>" +
               "<a href='linked.html'>linked file</a>")

# Prepare a simple linked HTML file
with open(os.path.join(output_dir, "linked.html"), "w") as file:
    file.write("<p>Hello, linked file!</p>")

# Load the "save-with-linked-file.html" into memory
document = HTMLDocument(document_path)

# Create a save options instance
options = HTMLSaveOptions()

# With max_handling_depth = 1, pages referenced directly from this document
# ("linked.html") are saved to the output folder as well.
# With the default value 0, only the document itself is saved.
options.resource_handling_options.max_handling_depth = 1

# Save the document with the save options
output_path = os.path.join(output_dir, "save-with-linked-file_out.html")
document.save(output_path, options)
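To see which pages a given max_handling_depth value reaches, you can model the documented semantics as a breadth-first walk over a link graph. The sketch below is hypothetical: pages_to_handle is not an Aspose.HTML function, the links dictionary stands in for the links the library would discover, and "deeper.html" is an invented page used only to show the difference between depth 1 and depth -1. The model also ignores the URL restrictions, which can further limit the handled pages.

```python
from collections import deque


def pages_to_handle(start: str, links: dict, max_depth: int) -> set:
    """Model of max_handling_depth: 0 = start only, n = up to n links away, -1 = all reachable."""
    handled = {start}
    queue = deque([(start, 0)])  # (page, number of links from the start page)
    while queue:
        page, depth = queue.popleft()
        if max_depth != -1 and depth >= max_depth:
            continue  # pages at the depth limit are handled, but their links are not followed
        for target in links.get(page, []):
            if target not in handled:
                handled.add(target)
                queue.append((target, depth + 1))
    return handled


links = {
    "save-with-linked-file.html": ["linked.html"],
    "linked.html": ["deeper.html"],  # hypothetical second-level page
}
for depth in (0, 1, -1):
    print(depth, sorted(pages_to_handle("save-with-linked-file.html", links, depth)))
```

With depth 0 only the start page is included, with depth 1 "linked.html" is added, and with -1 every reachable page is included, which matches the property description above.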
If you need to save HTML as an image or document with a fixed layout like PDF, you can convert the document to the format you need. Refer to the section HTML Converter for more information.
Save HTML to MHTML
In some cases, you need to save a web page as a single file. An MHTML document is handy for this purpose because it is a web-page archive that stores everything inside itself. The HTMLSaveFormat enumeration specifies the format in which a document is saved: HTML, MHTML, or Markdown. The example below shows how to use the save(path, save_format) method to save HTML as MHTML.
import os
from aspose.html import *
from aspose.html.saving import *

# Define the output directory and document path
output_dir = "output/"
document_path = os.path.join(output_dir, "save-to-MHTML.mht")

# Ensure the output directory exists
os.makedirs(output_dir, exist_ok=True)

# Prepare a simple HTML file with a linked document
with open("document.html", "w") as file:
    file.write("<p>Hello, World! I save HTML to MHTML.</p>"
               "<a href='linked-file.html'>linked file</a>")

# Prepare a simple linked HTML file
with open("linked-file.html", "w") as file:
    file.write("<p>Hello, linked file!</p>")

# Load the "document.html" into memory
with HTMLDocument("document.html") as document:
    # Save the document to MHTML format
    document.save(document_path, HTMLSaveFormat.MHTML)
The saved “save-to-MHTML.mht” file stores the HTML of both the “document.html” and “linked-file.html” files.
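MHTML files use the MIME multipart/related format (RFC 2557), so you can check what an archive contains with Python's standard email package. The sketch below is a hypothetical helper, not part of Aspose.HTML. This article does not describe the exact headers Aspose.HTML writes for each part, so a part may lack a Content-Location header; the helper then reports None for it.

```python
import email
from email import policy


def list_mhtml_parts(mht_path: str) -> list:
    """Return (content type, Content-Location) for each leaf part of an MHTML file.

    MHTML is a MIME multipart/related archive, so the standard email parser can read it.
    """
    with open(mht_path, "rb") as f:
        message = email.message_from_binary_file(f, policy=policy.default)
    parts = []
    for part in message.walk():
        if part.is_multipart():
            continue  # skip container parts; keep only the stored documents and resources
        location = part.get("Content-Location")
        parts.append((part.get_content_type(), str(location) if location is not None else None))
    return parts


# Usage after running the example above:
# for content_type, location in list_mhtml_parts("output/save-to-MHTML.mht"):
#     print(content_type, location)
```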
Save HTML to Markdown
Markdown is a markup language with plain-text syntax. As in the HTML to MHTML example, you can use the HTMLSaveFormat enumeration to save HTML as Markdown. Please take a look at the following Python example:
import os
from aspose.html import *
from aspose.html.saving import *

# Prepare paths to the source HTML file and the output Markdown file
data_dir = "data"
output_dir = "output/"
if not os.path.exists(output_dir):
    os.makedirs(output_dir)

# The source file data/document.html must already exist
input_path = os.path.join(data_dir, "document.html")
output_path = os.path.join(output_dir, "html-to-markdown.md")

# Load the HTML document from a file
document = HTMLDocument(input_path)

# Save the document to Markdown format
document.save(output_path, HTMLSaveFormat.MARKDOWN)
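The example above converts a single file. To convert every HTML file in the data folder, you first need an output path for each one. The sketch below is a hypothetical helper (plan_markdown_outputs is not an Aspose.HTML API) that pairs each .html file with a same-named .md path. The commented loop shows how the pairs could be passed to the same HTMLDocument constructor and save() call used above; it has not been verified against the library here.

```python
from pathlib import Path


def plan_markdown_outputs(data_dir: str, output_dir: str) -> list:
    """Pair every *.html file in data_dir with a same-named .md path in output_dir."""
    out = Path(output_dir)
    return [(src, out / (src.stem + ".md")) for src in sorted(Path(data_dir).glob("*.html"))]


# With Aspose.HTML for Python via .NET installed:
# for src, dst in plan_markdown_outputs("data", "output"):
#     document = HTMLDocument(str(src))
#     document.save(str(dst), HTMLSaveFormat.MARKDOWN)
```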
Save SVG
Usually, SVG is embedded within an HTML file to represent vector graphics such as images, icons, tables, and more. However, SVG can also be extracted from a web page and manipulated independently, much like an HTML document.
Since both SVGDocument and HTMLDocument adhere to the WHATWG DOM standard, their operations, such as loading, reading, editing, converting, and saving, are largely similar. Thus, any example that manipulates an HTMLDocument can also be applied to an SVGDocument. The following example creates an SVG document from a string and saves it to a file.
import os
from aspose.html import *
from aspose.html.dom.svg import *

# Define the output directory and document path
output_dir = "output/"
document_path = os.path.join(output_dir, "save-to.svg")

# Ensure the output directory exists
os.makedirs(output_dir, exist_ok=True)

# Prepare SVG code
svg_code = """
<svg xmlns="http://www.w3.org/2000/svg" height="400" width="300">
    <path stroke="#a06e84" stroke-width="3" fill="#74aeaf" d="
    M 150,50 L 150, 300
    M 120,100 L 150,50 L 180, 100
    M 110,150 L 150,90 L 190, 150
    M 90,220 L 150,130 L 210, 220
    M 70,300 L 150,190 L 230, 300
    M 110,310 L 150,240 L 190, 310
    " />
</svg>
"""

# Initialize an SVG instance from the content string
document = SVGDocument(svg_code, ".")

# Save SVG
document.save(document_path)
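This article does not describe how SVGDocument reacts to malformed markup. If your SVG code is built from strings or user input, one optional safeguard is to check that it is well-formed XML with an SVG root element before loading it. The sketch below uses only the standard library; check_svg_markup is a hypothetical helper, not part of Aspose.HTML or Aspose.SVG.

```python
import xml.etree.ElementTree as ET

SVG_ROOT_TAG = "{http://www.w3.org/2000/svg}svg"


def check_svg_markup(svg_code: str) -> tuple:
    """Return the (width, height) attributes of the root <svg> element.

    Raises xml.etree.ElementTree.ParseError if the markup is not well-formed XML,
    and ValueError if the root element is not an <svg> element in the SVG namespace.
    """
    root = ET.fromstring(svg_code.strip())
    if root.tag != SVG_ROOT_TAG:
        raise ValueError(f"expected an <svg> root element, got {root.tag!r}")
    return root.get("width"), root.get("height")


# Usage before loading (svg_code as defined in the example above):
# check_svg_markup(svg_code)
# document = SVGDocument(svg_code, ".")
```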
For more information about Aspose.SVG Python API usage for the processing and rendering of SVG documents, see the Aspose.SVG for Python via .NET Documentation.
Download the Aspose.HTML for Python via .NET library to successfully, quickly, and easily manipulate your HTML documents. The Python library can create, modify, extract data, convert, and render HTML documents without the need for external software. It supports popular file formats such as EPUB, MHTML, XML, SVG, and Markdown and can render to PDF, DOCX, XPS, and Image file formats.