Save HTML Document – C# Examples

After loading an existing file or creating an HTML document from scratch, you can save the changes using one of the HTMLDocument.Save() methods. Overloaded methods let you save a document to a file, a URL, or a stream.

Please note that there are two different concepts for producing output files:

  • The first concept is based on producing HTML-like files as output. The SaveOptions base class for this approach helps handle the saving of related resources such as scripts, styles, and images. The ResourceHandler class is responsible for handling resources: it is designed to save HTML content and resources into streams and provides methods that let you control what is done with each resource.
  • The second concept is used to create a visual representation of HTML as a result. The base class for this concept is RenderingOptions; it provides specialized options to specify the page size, page margins, resolution, user styles, etc.

This article only describes how to use the SaveOptions and ResourceHandler classes. To read more about the rendering mechanism, see the Renderers and Rendering Options articles.

SaveOptions & ResourceHandlingOptions

SaveOptions is a base class that lets you specify additional options for saving operations and helps manage linked resources. Its ResourceHandlingOptions property configures how resources are handled. The ResourceHandlingOptions class exposes the options listed in the following table:

Option | Description
--- | ---
UrlRestriction | Applies restrictions to the hosts or folders where resources are located.
MaxHandlingDepth | If you need to save not only the specified HTML document but also the linked HTML pages, this option controls the depth of linked pages that are saved.
JavaScript | Specifies how JavaScript files are treated: they can be saved as separate linked files, embedded into the HTML file, or ignored.
Default | Specifies the behavior for resources other than JavaScript files. Gets or sets an enum value that represents the default way of handling resources. Currently, the Save, Ignore, and Embed values are supported. The default value is Save.
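The depth and URL options are easiest to reason about on a toy link graph. The following sketch is library-independent and hypothetical: `LinkedPageWalker`, `CollectPages`, and the dictionary-based link model are illustrative names, not Aspose.HTML APIs, and the "same host" check is only one assumed reading of a URL restriction. It also assumes that depth 0 means the root document alone and depth 1 adds directly linked pages, which matches the MaxHandlingDepth comment in the file-saving example of this article.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, library-independent model of depth- and host-limited saving.
// This is NOT Aspose.HTML code; it only illustrates the idea behind the options.
public static class LinkedPageWalker
{
    // Returns the pages reachable from rootUrl within maxDepth link hops.
    // Depth 0 is the root document alone; depth 1 adds directly linked pages.
    public static List<string> CollectPages(
        string rootUrl,
        IReadOnlyDictionary<string, string[]> links,
        int maxDepth,
        bool sameHostOnly)
    {
        string rootHost = new Uri(rootUrl).Host;
        var visited = new HashSet<string> { rootUrl };
        var result = new List<string>();
        var queue = new Queue<(string Url, int Depth)>();
        queue.Enqueue((rootUrl, 0));

        // Breadth-first walk, so pages are listed in order of increasing depth.
        while (queue.Count > 0)
        {
            var (url, depth) = queue.Dequeue();
            result.Add(url);

            // Stop descending once the depth limit is reached or the page has no links.
            if (depth >= maxDepth || !links.TryGetValue(url, out var children))
                continue;

            foreach (string child in children)
            {
                // Assumed "same host" restriction: skip links to other hosts.
                if (sameHostOnly && new Uri(child).Host != rootHost)
                    continue;

                // HashSet.Add returns false for pages that were already queued.
                if (visited.Add(child))
                    queue.Enqueue((child, depth + 1));
            }
        }

        return result;
    }
}
```

With a root page that links to one same-host page and one external page, a depth of 0 yields only the root, a depth of 1 adds the same-host page, and turning the host check off also pulls in the external page. In Aspose.HTML itself, these settings are made through HTMLSaveOptions.ResourceHandlingOptions rather than through a helper like this.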

Save HTML

Once you have finished making changes to the HTML, you may want to save the document. You can do this with one of the Save() methods of the HTMLDocument class. The following example shows the simplest way to save an HTML file:

// Prepare an output path for saving the document
string documentPath = Path.Combine(OutputDir, "save-to-file.html");

// Initialize an empty HTML document
using (var document = new HTMLDocument())
{
    // Create a text node and add it to the document body
    var text = document.CreateTextNode("Hello, World!");
    document.Body.AppendChild(text);

    // Save the HTML document to a file on disk
    document.Save(documentPath);
}

In the example above, we use the HTMLDocument() constructor to initialize an empty HTML document. The CreateTextNode(data) method of the HTMLDocument class creates a text node from the specified string. The Save(path) method saves the document to the local file specified by path.

The sample above is quite simple. However, real-life applications often need more control over the saving process. The next few sections describe how to use resource handling options and how to save your document to different formats.

Save HTML to a File

The following code snippet shows how to use the ResourceHandlingOptions property of the SaveOptions class to manage the files linked to your document.

// Prepare an output path for an HTML document
string documentPath = Path.Combine(OutputDir, "save-with-linked-file.html");

// Prepare a simple HTML file with a linked document
File.WriteAllText(documentPath, "<p>Hello, World!</p>" +
                                "<a href='linked.html'>linked file</a>");

// Prepare a simple linked HTML file
File.WriteAllText(Path.Combine(OutputDir, "linked.html"), "<p>Hello, linked file!</p>");

// Load "save-with-linked-file.html" into memory
using (var document = new HTMLDocument(documentPath))
{
    // Create a save options instance
    var options = new HTMLSaveOptions();

    // With the value 1, the linked 'linked.html' file is saved to the output folder as well
    // (removing this line has the same effect). Set the value to 0 to cut off all linked HTML files.
    options.ResourceHandlingOptions.MaxHandlingDepth = 1;

    // Save the document with the save options
    document.Save(Path.Combine(OutputDir, "save-with-linked-file_out.html"), options);
}

Save HTML to a Local File System Storage

An HTML document can contain various resources such as CSS, external images, and files. Aspose.HTML for .NET provides a way to save HTML together with all linked files: the ResourceHandler class is designed for saving HTML content and resources to streams. This class is responsible for handling resources and provides methods that let you control what is done with each resource.

Let’s consider an example of saving HTML with resources to user-specified local file storage. The source with-resources.html document and its linked image file are in the same directory. The FileSystemResourceHandler(customOutDir) constructor takes a path indicating where the document with resources will be saved and creates a FileSystemResourceHandler object. The Save(resourceHandler) method takes this object and saves the HTML to the output storage.

// Prepare a path to a source HTML file
string inputPath = Path.Combine(DataDir, "with-resources.html");

// Prepare a full path to an output directory
string customOutDir = Path.Combine(Directory.GetCurrentDirectory(), "./../../../../tests-out/saving/");

// Load the HTML document from a file
using (var doc = new HTMLDocument(inputPath))
{
    // Save HTML with resources
    doc.Save(new FileSystemResourceHandler(customOutDir));
}
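What a file-system handler does for each resource can be pictured with a small sketch that uses only the .NET base class library. `OutputFolderWriter.Write` is a hypothetical helper, not part of Aspose.HTML, and FileSystemResourceHandler may name, nest, or rewrite resources differently; the sketch only shows the basic idea of copying each resource stream into a file under the output directory.

```csharp
using System.IO;

// Hypothetical helper, NOT Aspose.HTML's FileSystemResourceHandler:
// it copies one resource stream into a file under an output folder.
public static class OutputFolderWriter
{
    public static string Write(string outputDir, string resourceName, Stream content)
    {
        // Create the output folder (and any missing parents) if needed.
        Directory.CreateDirectory(outputDir);

        // Keep only the file name so a resource cannot be written outside outputDir.
        string targetPath = Path.Combine(outputDir, Path.GetFileName(resourceName));

        using (FileStream file = File.Create(targetPath))
        {
            content.CopyTo(file);
        }

        return targetPath;
    }
}
```

Flattening every resource to its file name is a deliberate simplification; a real handler has to preserve enough of the relative structure for the saved HTML's links to keep working.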

Save HTML to a Zip Archive

You can implement ResourceHandler yourself, for example by creating a ZipResourceHandler class. It lets you create a structured, compressed archive containing HTML documents and their associated resources, which suits scenarios such as archiving and storage optimization. The HandleResource() method of the ZipResourceHandler class customizes how individual resources are processed and stored in the Zip archive.

In the following example, the ZipResourceHandler class is used to save the with-resources.html document along with its linked resources to a Zip archive:

// Prepare a path to a source HTML file
string inputPath = Path.Combine(DataDir, "with-resources.html");

var dir = Directory.GetCurrentDirectory();

// Prepare a full path to an output zip storage
string customArchivePath = Path.Combine(dir, "./../../../../tests-out/saving/archive.zip");

// Load the HTML document
using (var doc = new HTMLDocument(inputPath))
{
    // Initialize an instance of the ZipResourceHandler class
    using (var resourceHandler = new ZipResourceHandler(customArchivePath))
    {
        // Save HTML with resources to a Zip archive
        doc.Save(resourceHandler);
    }
}

The ResourceHandler class is intended to be extended by your own implementations. The ZipResourceHandler class extends the ResourceHandler base class and provides a convenient way to manage the entire process of handling and storing the resources linked to an HTML document in a Zip archive:

// Requires: using System; using System.IO; using System.IO.Compression;
// plus the Aspose.HTML namespaces that declare ResourceHandler, Resource, ResourceHandlingContext, and Url.
internal class ZipResourceHandler : ResourceHandler, IDisposable
{
    private FileStream zipStream;
    private ZipArchive archive;
    private int streamsCounter;
    private bool initialized;

    public ZipResourceHandler(string name)
    {
        DisposeArchive();
        // Create (or overwrite) the archive file and open it for writing entries
        zipStream = new FileStream(name, FileMode.Create);
        archive = new ZipArchive(zipStream, ZipArchiveMode.Update);
        initialized = false;
    }

    public override void HandleResource(Resource resource, ResourceHandlingContext context)
    {
        // The first resource (the HTML document itself) is stored under its file name;
        // later resources are stored under their parent folder name combined with the file name
        var zipUri = (streamsCounter++ == 0
            ? Path.GetFileName(resource.OriginalUrl.Href)
            : Path.Combine(Path.GetFileName(Path.GetDirectoryName(resource.OriginalUrl.Href)),
                Path.GetFileName(resource.OriginalUrl.Href)));

        // Sample naming rule: every resource after the first one gets a "my_" prefix
        var samplePrefix = String.Empty;
        if (initialized)
            samplePrefix = "my_";
        else
            initialized = true;

        // Write the resource into a new archive entry and point its output URL at that entry
        using (var newStream = archive.CreateEntry(samplePrefix + zipUri).Open())
        {
            resource.WithOutputUrl(new Url("file:///" + samplePrefix + zipUri)).Save(newStream, context);
        }
    }

    private void DisposeArchive()
    {
        // Dispose the archive first: this writes the Zip central directory to the stream
        if (archive != null)
        {
            archive.Dispose();
            archive = null;
        }

        if (zipStream != null)
        {
            zipStream.Dispose();
            zipStream = null;
        }

        streamsCounter = 0;
    }

    public void Dispose()
    {
        DisposeArchive();
    }
}
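To check what a handler such as ZipResourceHandler actually wrote, you can open the archive with the standard System.IO.Compression API. `ZipInspector.ListEntries` is a hypothetical helper name; it only reads entry names and uncompressed sizes and does not depend on Aspose.HTML.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical helper: lists "name: length" for every entry in a Zip stream.
public static class ZipInspector
{
    public static List<string> ListEntries(Stream zipStream)
    {
        var result = new List<string>();

        // leaveOpen: true so the caller keeps ownership of the stream.
        using (var archive = new ZipArchive(zipStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                // Length is the uncompressed size of the entry in bytes.
                result.Add($"{entry.FullName}: {entry.Length}");
            }
        }

        return result;
    }
}
```

For the archive created in the example above, you could call it as `using (var fs = File.OpenRead(customArchivePath)) foreach (var line in ZipInspector.ListEntries(fs)) Console.WriteLine(line);` after the handler has been disposed, since the archive is only complete once ZipResourceHandler.Dispose() has run.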

Save HTML to Memory Streams

The MemoryResourceHandler class implements ResourceHandler to save HTML to memory streams. The following code shows how to use the MemoryResourceHandler class to store an HTML document in memory and to collect and print information about the handled resources:

  1. Initialize an HTML document using the specified HTML file path.
  2. Create an instance of the MemoryResourceHandler class. This class is designed to capture and store resources within memory streams during the resource-handling process.
  3. Call the Save() method of the HTML document and pass it the MemoryResourceHandler instance as an argument. This associates the resource handling logic of the MemoryResourceHandler with the HTML document-saving process.
  4. Use the PrintInfo() method of the MemoryResourceHandler to print information about the handled resources.

// Prepare a path to a source HTML file
string inputPath = Path.Combine(DataDir, "with-resources.html");

// Load the HTML document
using (var doc = new HTMLDocument(inputPath))
{
    // Create an instance of the MemoryResourceHandler class and save HTML to memory
    var resourceHandler = new MemoryResourceHandler();
    doc.Save(resourceHandler);

    // Print the URI and length of each captured resource
    resourceHandler.PrintInfo();
}

After the example runs, information about the memory storage is printed:

uri:memory:///with-resources.html, length:256
uri:memory:///photo1.png, length:57438

ResourceHandler is a base class that supports the creation and management of output streams. The MemoryResourceHandler class lets you capture and store resources in memory streams, providing a dynamic and flexible way to handle resources without physically saving them to the file system. The following code snippet shows how MemoryResourceHandler implements ResourceHandler:

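// Requires: using System; using System.Collections.Generic; using System.IO;
// plus the Aspose.HTML namespaces that declare ResourceHandler, Resource, ResourceHandlingContext, and Url.
internal class MemoryResourceHandler : ResourceHandler
{
    // Each handled resource paired with the memory stream it was saved into
    public List<Tuple<Stream, Resource>> Streams;

    public MemoryResourceHandler()
    {
        Streams = new List<Tuple<Stream, Resource>>();
    }

    public override void HandleResource(Resource resource, ResourceHandlingContext context)
    {
        // Save every resource into its own MemoryStream instead of a file
        var outputStream = new MemoryStream();
        Streams.Add(Tuple.Create<Stream, Resource>(outputStream, resource));

        // Point the output URL at memory:///<file name> and write the resource
        resource
            .WithOutputUrl(new Url(Path.GetFileName(resource.OriginalUrl.Pathname), "memory:///"))
            .Save(outputStream, context);
    }

    public void PrintInfo()
    {
        // Print the output URL and byte length of each captured resource
        foreach (var stream in Streams)
            Console.WriteLine($"uri:{stream.Item2.OutputUrl}, length:{stream.Item1.Length}");
    }
}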
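One practical detail when consuming the captured streams: after data has been written into a MemoryStream, its Position sits at the end, so reading it straight away returns nothing. The hypothetical `CapturedStreamReader.ReadText` helper below (standard .NET only, not an Aspose.HTML API) rewinds the stream before reading and leaves it open so it can be reused.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper: reads the text held in a stream that was just written to.
public static class CapturedStreamReader
{
    public static string ReadText(Stream stream)
    {
        // After writing, Position is at the end of the data; rewind before reading.
        stream.Position = 0;

        // leaveOpen: true keeps the captured stream usable after the reader is disposed.
        using (var reader = new StreamReader(stream, Encoding.UTF8, true, 1024, leaveOpen: true))
        {
            return reader.ReadToEnd();
        }
    }
}
```

With the MemoryResourceHandler above, `CapturedStreamReader.ReadText(resourceHandler.Streams[0].Item1)` would return the saved markup, assuming the first captured stream is the HTML document itself, as the printed output order suggests. Binary resources such as photo1.png should be read as bytes rather than text.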

Save HTML to MHTML

In some cases, you need to save your web page as a single file. An MHTML document is handy for this purpose because it is a web-page archive that stores everything inside itself. The HTMLSaveFormat enumeration specifies the format in which a document is saved: HTML, MHTML, or Markdown (MD). The example below shows how to use the Save(path, saveFormat) method to save HTML as MHTML.

// Prepare an output path for saving the document
string savePath = Path.Combine(OutputDir, "save-to-mhtml.mht");

// Prepare a simple HTML file with a linked document
File.WriteAllText("save-to-mhtml.html", "<p>Hello, World!</p>" +
                                        "<a href='linked-file.html'>linked file</a>");

// Prepare a simple linked HTML file
File.WriteAllText("linked-file.html", "<p>Hello, linked file!</p>");

// Load "save-to-mhtml.html" into memory
using (var document = new HTMLDocument("save-to-mhtml.html"))
{
    // Save the document in MHTML format
    document.Save(savePath, HTMLSaveFormat.MHTML);
}

The saved “save-to-mhtml.mht” file stores the HTML of both the “save-to-mhtml.html” and “linked-file.html” files.
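Why a single .mht file can hold both pages becomes clearer from the general MHTML layout (RFC 2557): one MIME multipart/related message whose parts are the page and its resources, each tagged with a Content-Location. The sketch below is a simplified, hypothetical illustration of that layout (`MhtmlSketch.Build` and the boundary string are made up for this example), not the writer Aspose.HTML uses; real files usually also carry Content-Transfer-Encoding headers and may differ in other headers.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical, simplified sketch of the MHTML (multipart/related) layout.
// NOT the Aspose.HTML MHTML writer.
public static class MhtmlSketch
{
    public static string Build(IReadOnlyList<(string Location, string Html)> parts)
    {
        const string boundary = "----=_Part_Boundary";
        var sb = new StringBuilder();

        // Top-level MIME headers declare a multipart/related message.
        sb.Append("MIME-Version: 1.0\r\n");
        sb.Append($"Content-Type: multipart/related; type=\"text/html\"; boundary=\"{boundary}\"\r\n\r\n");

        // Each page becomes one part, identified by its Content-Location URL.
        foreach (var (location, html) in parts)
        {
            sb.Append($"--{boundary}\r\n");
            sb.Append("Content-Type: text/html; charset=utf-8\r\n");
            sb.Append($"Content-Location: {location}\r\n\r\n");
            sb.Append(html).Append("\r\n");
        }

        // The closing delimiter ends the multipart body.
        sb.Append($"--{boundary}--\r\n");
        return sb.ToString();
    }
}
```

Because every part keeps its original location, a browser opening the archive can resolve the link to linked-file.html against the embedded part instead of the file system.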

Save HTML to Markdown

Markdown is a markup language with plain-text syntax. As in the HTML-to-MHTML example, you can use HTMLSaveFormat to save HTML as Markdown. Take a look at the following example:

// Prepare an output path for saving the document
string documentPath = Path.Combine(OutputDir, "save-html-to-markdown.md");

// Prepare HTML code
var htmlCode = "<H2>Hello, World!</H2>";

// Initialize a document from a string variable
using (var document = new HTMLDocument(htmlCode, "."))
{
    // Save the document as a Markdown file
    document.Save(documentPath, HTMLSaveFormat.Markdown);
}

For more information on how to use the HTML Converter, please visit the Convert HTML to Markdown article.

Save SVG

SVG usually appears as part of an HTML file, where it represents vector data on the page: images, icons, tables, etc. However, SVG can also be extracted from a web page, and you can manipulate it in much the same way as an HTML document.

Since SVGDocument and HTMLDocument are based on the same WHATWG DOM standard, operations such as loading, reading, editing, converting, and saving are similar for both documents. So, all examples that manipulate an HTMLDocument apply to an SVGDocument as well.

To save your changes, use the following code:

// Prepare an output path for saving the document
string documentPath = Path.Combine(OutputDir, "save-html-to-svg.svg");

// Prepare SVG code
var code = @"
    <svg xmlns='http://www.w3.org/2000/svg' height='200' width='300'>
        <g fill='none' stroke-width= '10' stroke-dasharray='30 10'>
            <path stroke='red' d='M 25 40 l 215 0' />
            <path stroke='black' d='M 35 80 l 215 0' />
            <path stroke='blue' d='M 45 120 l 215 0' />
        </g>
    </svg>";

// Initialize an SVG document from the content string
using (var document = new SVGDocument(code, "."))
{
    // Save the SVG file to disk
    document.Save(documentPath);
}
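Because an SVG file is plain XML, a saved file can be sanity-checked with LINQ to XML from the .NET base class library. The hypothetical `SvgInspector.CountPaths` helper below counts `<path>` elements in the SVG namespace; it is an independent check, not part of Aspose.HTML.

```csharp
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: counts <path> elements in SVG markup.
public static class SvgInspector
{
    // SVG elements live in the namespace declared by xmlns on the <svg> root.
    private static readonly XNamespace Svg = "http://www.w3.org/2000/svg";

    public static int CountPaths(string svgMarkup)
    {
        XDocument doc = XDocument.Parse(svgMarkup);

        // Only elements in the SVG namespace are counted; an <svg> without
        // xmlns would put its children in no namespace, so they would not match.
        return doc.Descendants(Svg + "path").Count();
    }
}
```

For the file saved above, `SvgInspector.CountPaths(File.ReadAllText(documentPath))` should return 3, assuming the save keeps the three `<path>` elements in the SVG namespace.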

For more information about SVG drawing basics and the API for processing and rendering SVG documents, see the Aspose.SVG for .NET Documentation.

You can download the complete examples and data files from GitHub.
