Save HTML Document – C# Examples
After downloading an existing file or creating an HTML document from scratch, you can save the changes using one of the HTMLDocument.Save() methods. There are overloaded methods to save a document to a file, URL, or streams.
- The API provides the Aspose.Html.Saving namespace with the SaveOptions and ResourceHandlingOptions classes, which allow you to set options for saving operations.
- The API provides the Aspose.Html.Saving.ResourceHandlers namespace, which contains the ResourceHandler and FileSystemResourceHandler classes responsible for handling resources.
Please note that there are two different concepts for creating output files:
- The first concept is based on producing HTML-like files as output. SaveOptions, the base class for this approach, helps to handle the saving of related resources such as scripts, styles, images, etc. The ResourceHandler class is responsible for handling resources. It is designed to save HTML content and resources into streams and provides methods that allow you to control what is done with each resource.
- The second concept is used to create a visual representation of HTML as a result. The base class for this concept is RenderingOptions; it has specialized methods to specify the page size, page margins, resolution, user styles, etc.
This article only describes how to use the SaveOptions and ResourceHandler classes. To read more about the rendering mechanism, please follow the Renderers and Rendering Options articles.
SaveOptions & ResourceHandlingOptions
SaveOptions is a base class that allows you to specify additional options for saving operations and helps to manage linked resources. The ResourceHandlingOptions property of the SaveOptions class is used to configure resource handling. The ResourceHandlingOptions class represents resource handling options; the available options are listed in the following table:
Option | Description |
---|---|
UrlRestriction | Applies restrictions to the host or folders where resources are located. |
MaxHandlingDepth | If you need to save not only the specified HTML document but also the HTML pages it links to, this option controls the depth of linked pages that are saved. |
JavaScript | Specifies how JavaScript files are treated: they can be saved as separate linked files, embedded into the HTML file, or ignored. |
Default | Specifies the behavior for resources other than JavaScript files. Gets or sets an enum that represents the default way of resource handling. Currently, the Save, Ignore, and Embed values are supported. The default value is Save. |
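As a rough illustration of how the options in the table fit together, the sketch below sets all four of them on an HTMLSaveOptions instance. The file names are hypothetical, and the enum values used here (UrlRestriction.SameHost, ResourceHandling.Embed, ResourceHandling.Save) are assumed to be available in the Aspose.Html.Saving namespace of your library version; please check the API reference before relying on them.

```csharp
using Aspose.Html;
using Aspose.Html.Saving;

// "document.html" is a hypothetical input file used only for illustration
using (var document = new HTMLDocument("document.html"))
{
    var options = new HTMLSaveOptions();

    // Assumption: only handle resources located on the same host as the document
    options.ResourceHandlingOptions.UrlRestriction = UrlRestriction.SameHost;

    // Also save HTML pages linked directly from the document (one level deep)
    options.ResourceHandlingOptions.MaxHandlingDepth = 1;

    // Embed JavaScript into the saved HTML instead of writing separate files
    options.ResourceHandlingOptions.JavaScript = ResourceHandling.Embed;

    // Save every other resource type as a separate linked file (the default)
    options.ResourceHandlingOptions.Default = ResourceHandling.Save;

    // "document_out.html" is a hypothetical output path
    document.Save("document_out.html", options);
}
```

Setting Default explicitly is redundant when the value is Save; it is shown only to make the relationship between the JavaScript and Default options visible.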
Save HTML
Once you have finished your changes in HTML, you may want to save the document. You can do it using one of the Save() methods of the HTMLDocument class. The following example is the easiest way to save an HTML file:
// Prepare an output path for a document saving
string documentPath = Path.Combine(OutputDir, "save-to-file.html");

// Initialize an empty HTML document
using (var document = new HTMLDocument())
{
    // Create a text element and add it to the document
    var text = document.CreateTextNode("Hello, World!");
    document.Body.AppendChild(text);

    // Save the HTML document to the file on a disk
    document.Save(documentPath);
}
In the example above, we use the HTMLDocument() constructor for initializing an empty HTML document. The CreateTextNode(data) method of the HTMLDocument class creates a text node given the specified string. The Save(path) method saves the document to a local file specified by path.
The sample above is quite simple. However, in real-life applications, you often need additional control over the saving process. The next few sections describe how to use resource handling options or save your document to different formats.
Save HTML to a File
The following code snippet shows how to use the ResourceHandlingOptions property of the SaveOptions class to manage files linked to your document.
// Prepare an output path for an HTML document
string documentPath = Path.Combine(OutputDir, "save-with-linked-file.html");

// Prepare a simple HTML file with a linked document
File.WriteAllText(documentPath, "<p>Hello, World!</p>" +
                                "<a href='linked.html'>linked file</a>");

// Prepare a simple linked HTML file
File.WriteAllText(Path.Combine(OutputDir, "linked.html"), "<p>Hello, linked file!</p>");

// Load the "save-with-linked-file.html" into memory
using (var document = new HTMLDocument(documentPath))
{
    // Create a save options instance
    var options = new HTMLSaveOptions();

    // The following line with value '0' cuts off all other linked HTML-files while saving this instance
    // If you remove this line or change value to the '1', the 'linked.html' file will be saved as well to the output folder
    options.ResourceHandlingOptions.MaxHandlingDepth = 0;

    // Save the document with the save options
    document.Save(Path.Combine(OutputDir, "save-with-linked-file_out.html"), options);
}
Save HTML to a Local File System Storage
The HTML document can contain different resources like CSS, external images and files. Aspose.HTML for .NET provides a way to save HTML with all linked files – the ResourceHandler class is developed for saving HTML content and resources to streams. This class is responsible for handling resources and provides methods that allow you to control what is done with each resource.
Let’s consider an example of saving HTML with resources to user-specified local file storage. The source with-resources.html document and its linked image file are in the same directory. The FileSystemResourceHandler(customOutDir) constructor takes a path indicating where the document with resources will be saved and creates a FileSystemResourceHandler object. The Save(resourceHandler) method takes this object and saves HTML to the output storage.
// Prepare a path to a source HTML file
string inputPath = Path.Combine(DataDir, "with-resources.html");

// Prepare a full path to an output directory
string customOutDir = Path.Combine(Directory.GetCurrentDirectory(), "./../../../../tests-out/saving/");

// Load the HTML document from a file
using (var doc = new HTMLDocument(inputPath))
{
    // Save HTML with resources
    doc.Save(new FileSystemResourceHandler(customOutDir));
}
Save HTML to a Zip Archive
You can implement the ResourceHandler by creating a ZipResourceHandler class. It allows you to create a structured and compressed archive containing HTML documents and associated resources, making it suitable for scenarios such as archiving and storage optimization. The HandleResource() method in the ZipResourceHandler class customizes how individual resources are processed and stored in a Zip archive.
In the following example, the ZipResourceHandler class is used to save the with-resources.html document along with its linked resources to a Zip archive:
// Prepare a path to a source HTML file
string inputPath = Path.Combine(DataDir, "with-resources.html");

var dir = Directory.GetCurrentDirectory();

// Prepare a full path to an output zip storage
string customArchivePath = Path.Combine(dir, "./../../../../tests-out/saving/archive.zip");

// Load the HTML document
using (var doc = new HTMLDocument(inputPath))
{
    // Initialize an instance of the ZipResourceHandler class
    using (var resourceHandler = new ZipResourceHandler(customArchivePath))
    {
        // Save HTML with resources to a Zip archive
        doc.Save(resourceHandler);
    }
}
The ResourceHandler class is intended to be implemented by users of the library. The ZipResourceHandler class extends the ResourceHandler base class and provides a convenient way to manage the entire process of handling and storing resources linked with an HTML document into a Zip archive:
using System;
using System.IO;
using System.IO.Compression;
using Aspose.Html;
using Aspose.Html.Saving.ResourceHandlers;

internal class ZipResourceHandler : ResourceHandler, IDisposable
{
    private FileStream zipStream;
    private ZipArchive archive;
    private int streamsCounter;
    private bool initialized;

    public ZipResourceHandler(string name)
    {
        DisposeArchive();
        zipStream = new FileStream(name, FileMode.Create);
        archive = new ZipArchive(zipStream, ZipArchiveMode.Update);
        initialized = false;
    }

    public override void HandleResource(Resource resource, ResourceHandlingContext context)
    {
        // The first resource (the document itself) is stored under its file name;
        // every following resource is stored under "<parent folder>/<file name>"
        var zipUri = (streamsCounter++ == 0
            ? Path.GetFileName(resource.OriginalUrl.Href)
            : Path.Combine(Path.GetFileName(Path.GetDirectoryName(resource.OriginalUrl.Href)),
                Path.GetFileName(resource.OriginalUrl.Href)));

        // Every entry after the first one gets the "my_" sample prefix
        var samplePrefix = String.Empty;
        if (initialized)
            samplePrefix = "my_";
        else
            initialized = true;

        // Create an archive entry and save the resource into it
        using (var newStream = archive.CreateEntry(samplePrefix + zipUri).Open())
        {
            resource.WithOutputUrl(new Url("file:///" + samplePrefix + zipUri)).Save(newStream, context);
        }
    }

    private void DisposeArchive()
    {
        if (archive != null)
        {
            archive.Dispose();
            archive = null;
        }

        if (zipStream != null)
        {
            zipStream.Dispose();
            zipStream = null;
        }

        streamsCounter = 0;
    }

    public void Dispose()
    {
        DisposeArchive();
    }
}
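Once the archive has been written and the handler disposed, it can be useful to check what ended up inside it. The following minimal sketch uses only the standard System.IO.Compression API to list the entries; the archive path is an assumption and should point to the file produced by the example above.

```csharp
using System;
using System.IO.Compression;

// Assumption: adjust this path to the archive created by the ZipResourceHandler example
string archivePath = "archive.zip";

// Open the archive for reading and print the name and uncompressed size of each entry
using (ZipArchive zip = ZipFile.OpenRead(archivePath))
{
    foreach (ZipArchiveEntry entry in zip.Entries)
    {
        Console.WriteLine($"{entry.FullName}, length:{entry.Length}");
    }
}
```

With the handler shown above, the first entry should be the HTML document itself, and the remaining entries should carry the "my_" prefix that HandleResource() adds to every resource after the first one.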
Save HTML to Memory Streams
The ResourceHandler class implementation in the MemoryResourceHandler class allows saving HTML to memory streams. The following code shows how to use the MemoryResourceHandler class to store an HTML document in memory, collecting and printing information about the handled resources.
- Initialize an HTML document using the specified HTML file path.
- Create an instance of the MemoryResourceHandler class. This class is designed to capture and store resources in memory streams during the resource-handling process.
- Call the Save() method of the HTML document and pass it the MemoryResourceHandler instance as an argument. This associates the resource handling logic of the MemoryResourceHandler with the HTML document-saving process.
- Use the PrintInfo() method of the MemoryResourceHandler to print information about the handled resources.
// Prepare a path to a source HTML file
string inputPath = Path.Combine(DataDir, "with-resources.html");

// Load the HTML document
using (var doc = new HTMLDocument(inputPath))
{
    // Create an instance of the MemoryResourceHandler class and save HTML to memory
    var resourceHandler = new MemoryResourceHandler();
    doc.Save(resourceHandler);
    resourceHandler.PrintInfo();
}
After the example runs, information about the memory storage is printed:
uri:memory:///with-resources.html, length:256
uri:memory:///photo1.png, length:57438
ResourceHandler is a base class that supports the creation and management of output streams. The MemoryResourceHandler class allows you to capture and store resources in in-memory streams, providing a dynamic and flexible way to handle resources without physically saving them to the file system. The following code snippet shows the implementation of the ResourceHandler in the MemoryResourceHandler class:
using System;
using System.Collections.Generic;
using System.IO;
using Aspose.Html;
using Aspose.Html.Saving.ResourceHandlers;

internal class MemoryResourceHandler : ResourceHandler
{
    // Each handled resource is stored together with the stream it was saved to
    public List<Tuple<Stream, Resource>> Streams;

    public MemoryResourceHandler()
    {
        Streams = new List<Tuple<Stream, Resource>>();
    }

    public override void HandleResource(Resource resource, ResourceHandlingContext context)
    {
        // Save the resource into a new memory stream under a "memory:///" URL
        var outputStream = new MemoryStream();
        Streams.Add(Tuple.Create<Stream, Resource>(outputStream, resource));
        resource
            .WithOutputUrl(new Url(Path.GetFileName(resource.OriginalUrl.Pathname), "memory:///"))
            .Save(outputStream, context);
    }

    public void PrintInfo()
    {
        foreach (var stream in Streams)
            Console.WriteLine($"uri:{stream.Item2.OutputUrl}, length:{stream.Item1.Length}");
    }
}
Save HTML to MHTML
In some cases, you need to save your web page as a single file. An MHTML document is handy for this purpose since it is a web-page archive that stores everything inside itself. The HTMLSaveFormat enumeration specifies the format in which the document is saved; it can be HTML, MHTML, or MD. The example below shows how to use the Save(path, saveFormat) method to save HTML as MHTML.
// Prepare an output path for a document saving
string savePath = Path.Combine(OutputDir, "save-to-mhtml.mht");

// Prepare a simple HTML file with a linked document
File.WriteAllText("save-to-mhtml.html", "<p>Hello, World!</p>" +
                                        "<a href='linked-file.html'>linked file</a>");

// Prepare a simple linked HTML file
File.WriteAllText("linked-file.html", "<p>Hello, linked file!</p>");

// Load the "save-to-mhtml.html" into memory
using (var document = new HTMLDocument("save-to-mhtml.html"))
{
    // Save the document to MHTML format
    document.Save(savePath, HTMLSaveFormat.MHTML);
}
The saved “save-to-mhtml.mht” file stores the HTML of the “save-to-mhtml.html” and “linked-file.html” files.
Save HTML to Markdown
Markdown is a markup language with plain-text syntax. As in the HTML to MHTML example, you can use the HTMLSaveFormat enumeration to save HTML as MD. Please take a look at the following example:
// Prepare an output path for a document saving
string documentPath = Path.Combine(OutputDir, "save-html-to-markdown.md");

// Prepare HTML code
var html_code = "<H2>Hello, World!</H2>";

// Initialize a document from a string variable
using (var document = new HTMLDocument(html_code, "."))
{
    // Save the document as a Markdown file
    document.Save(documentPath, HTMLSaveFormat.Markdown);
}
For more information on how to use the HTML Converter, please visit the Convert HTML to Markdown article.
Save SVG
Usually, you see SVG as part of an HTML file, where it is used to represent vector data on the page: images, icons, tables, etc. However, SVG can also be extracted from the web page, and you can manipulate it in a similar way to an HTML document.
Since SVGDocument and HTMLDocument are based on the same WHATWG DOM standard, all operations such as loading, reading, editing, converting and saving are similar for both documents. So, all examples where you can see manipulation with the HTMLDocument are applicable for the SVGDocument as well.
To save your changes, use the following code:
// Prepare an output path for a document saving
string documentPath = Path.Combine(OutputDir, "save-html-to-svg.svg");

// Prepare SVG code
var code = @"
    <svg xmlns='http://www.w3.org/2000/svg' height='200' width='300'>
        <g fill='none' stroke-width= '10' stroke-dasharray='30 10'>
            <path stroke='red' d='M 25 40 l 215 0' />
            <path stroke='black' d='M 35 80 l 215 0' />
            <path stroke='blue' d='M 45 120 l 215 0' />
        </g>
    </svg>";

// Initialize an SVG instance from the content string
using (var document = new SVGDocument(code, "."))
{
    // Save the SVG file to a disk
    document.Save(documentPath);
}
For more information about SVG drawing basics and the API usage for processing and rendering SVG documents, see the Aspose.SVG for .NET Documentation.
You can download the complete examples and data files from GitHub.