Save HTML Document – C# Examples

After loading an existing file or creating an HTML document from scratch, you can save your changes using one of the HTMLDocument.Save() methods. Overloads are available for saving a document to a file, a URL, or streams.

Please note that we have two different concepts for creating the output files:

  • The first concept is based on producing HTML-like files as output. SaveOptions is the base class for this approach and helps to handle the saving of related resources such as scripts, styles, images, etc. The ResourceHandler class is responsible for handling resources. It is designed to save HTML content and resources into streams and provides methods that let you control what is done with each resource.
  • The second concept is used to create a visual representation of HTML as a result. The base class for this concept is RenderingOptions; it has specialized methods to specify the page size, page margins, resolution, user styles, etc.

This article only describes how to use the SaveOptions and ResourceHandler classes. To read more about the rendering mechanism, see the Renderers and Rendering Options articles.

SaveOptions & ResourceHandlingOptions

SaveOptions is a base class that allows you to specify additional options for saving operations and helps to manage linked resources. The ResourceHandlingOptions property of the SaveOptions class is used to configure resource handling. The ResourceHandlingOptions class represents the resource handling options; the available options are listed in the following table:

| Option | Description |
|---|---|
| UrlRestriction | Applies restrictions to the host or folders where resources are located. |
| MaxHandlingDepth | When you need to save not only the specified HTML document but also the HTML pages it links to, this option controls the depth of linked pages that are saved. |
| JavaScript | Specifies how JavaScript files are treated: they can be saved as separate linked files, embedded into the HTML file, or ignored. |
| Default | Specifies the behavior for all resources other than JavaScript files. Gets or sets an enum value that represents the default way of resource handling. Currently, the Save, Ignore, and Embed values are supported. The default value is Save. |
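
As a rough sketch of how the options in the table fit together, the snippet below sets each of them on an HTMLSaveOptions instance. It assumes that the ResourceHandling and UrlRestriction enums live in the Aspose.Html.Saving namespace and that UrlRestriction has a SameHost value; the file names "input.html" and "output.html" are placeholders. Check the member names against the API reference for your library version before relying on them.

```csharp
using Aspose.Html;
using Aspose.Html.Saving;

class ResourceHandlingOptionsSketch
{
    static void Main()
    {
        // "input.html" and "output.html" are placeholder paths used for illustration only
        using (var document = new HTMLDocument("input.html"))
        {
            var options = new HTMLSaveOptions();

            // Restrict handling to resources on the same host (assumed enum value)
            options.ResourceHandlingOptions.UrlRestriction = UrlRestriction.SameHost;

            // Do not follow links to other HTML pages
            options.ResourceHandlingOptions.MaxHandlingDepth = 0;

            // Embed scripts into the saved HTML file instead of saving them separately
            options.ResourceHandlingOptions.JavaScript = ResourceHandling.Embed;

            // Save all other resources as separate files (the documented default)
            options.ResourceHandlingOptions.Default = ResourceHandling.Save;

            document.Save("output.html", options);
        }
    }
}
```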

You can download the complete examples and data files from GitHub.

Save HTML

Once you have finished your changes in HTML, you may want to save the document. You can do this using one of the Save() methods of the HTMLDocument class. The following example shows the simplest way to save an HTML file:

using System.IO;
using Aspose.Html;
...
    // Prepare an output path for a document saving
    string documentPath = Path.Combine(OutputDir, "save-to-file.html");

    // Initialize an empty HTML document
    using (var document = new HTMLDocument())
    {
        // Create a text node and add it to the document
        var text = document.CreateTextNode("Hello World!");
        document.Body.AppendChild(text);

        // Save the HTML document to a local file
        document.Save(documentPath);
    }

In the example above, we use the HTMLDocument() constructor for initializing an empty HTML document. The CreateTextNode(data) method of the HTMLDocument class creates a text node given the specified string. The Save(path) method saves the document to a local file specified by path.

The sample above is quite simple. However, in real-life applications, you often need additional control over the saving process. The next few sections describe how to use resource handling options or save your document to different formats.

Save HTML to a File

The following code snippet shows how to use the ResourceHandlingOptions property of the SaveOptions class to manage the files linked to your document.

using System.IO;
using Aspose.Html;
using Aspose.Html.Saving;
...
    // Prepare an output path for a document
    string documentPath = Path.Combine(OutputDir, "save-with-linked-file.html");

    // Prepare a simple HTML file with a linked document
    File.WriteAllText(documentPath, "<p>Hello World!</p>" +
                                    "<a href='linked.html'>linked file</a>");
    // Prepare a simple linked HTML file next to the main document so that the relative link resolves
    File.WriteAllText(Path.Combine(OutputDir, "linked.html"), "<p>Hello linked file!</p>");

    // Load the "save-with-linked-file.html" into memory
    using (var document = new HTMLDocument(documentPath))
    {
        // Create a save options instance
        var options = new HTMLSaveOptions();

        // The following line with value '0' cuts off all other linked HTML-files while saving this instance
        // If you remove this line or change value to the '1', the 'linked.html' file will be saved as well to the output folder
        options.ResourceHandlingOptions.MaxHandlingDepth = 0;

        // Save the document with the save options
        document.Save(Path.Combine(OutputDir, "save-with-linked-file_out.html"), options);
    }

Save HTML to a Local File System Storage

The HTML document can contain different resources like CSS, external images and files. Aspose.HTML for .NET provides a way to save HTML with all linked files – the ResourceHandler class is developed for saving HTML content and resources to streams. This class is responsible for handling resources and provides methods that allow you to control what is done with each resource.

Let’s consider an example of saving HTML with resources to user-specified local file storage. The source with-resources.html document and its linked image file are in the same directory. The FileSystemResourceHandler(customOutDir) constructor takes a path indicating where the document with resources will be saved and creates a FileSystemResourceHandler object. The Save(resourceHandler) method takes this object and saves HTML to the output storage.

using System.IO;
using Aspose.Html;
using Aspose.Html.Saving.ResourceHandlers;
...

    // Prepare a path to a source HTML file
    string inputPath = Path.Combine(DataDir, "with-resources.html");

    // Prepare a full path to an output directory
    string customOutDir = Path.Combine(Directory.GetCurrentDirectory(), "./../../../../tests-out/save/");

    // Load the HTML document from a file
    using (var doc = new HTMLDocument(inputPath))
    {
        // Save HTML with resources
        doc.Save(new FileSystemResourceHandler(customOutDir));
    }

Save HTML to a Zip Archive

You can implement ResourceHandler in your own class, such as the ZipResourceHandler class shown in this section. It allows you to create a structured and compressed archive containing HTML documents and associated resources, making it suitable for scenarios such as archiving and storage optimization. The HandleResource() method in the ZipResourceHandler class customizes how individual resources are processed and stored in a Zip archive.

In the following example, the ZipResourceHandler class is used to save the with-resources.html document along with its linked resources to a Zip archive:

using System;
using System.IO;
using Aspose.Html;
using Aspose.Html.Saving;
using Aspose.Html.Saving.ResourceHandlers;
using System.IO.Compression;
...

    // Prepare a path to a source HTML file
    string inputPath = Path.Combine(DataDir, "with-resources.html");

    var dir = Directory.GetCurrentDirectory();

    // Prepare a full path to an output zip storage
    string customArchivePath = Path.Combine(dir, "./../../../../tests-out/save/archive.zip");

    // Load the HTML document
    using (var doc = new HTMLDocument(inputPath))
    {
        // Initialize an instance of the ZipResourceHandler class
        using (var resourceHandler = new ZipResourceHandler(customArchivePath))
        {
            // Save HTML with resources to a Zip archive
            doc.Save(resourceHandler);
        }
    }

The ResourceHandler class is intended to be implemented by users. The ZipResourceHandler class extends the ResourceHandler base class and provides a convenient way to manage the entire process of handling and storing resources linked with an HTML document in a Zip archive:

    // Custom resource handler that writes the document and its resources into a Zip archive
    internal class ZipResourceHandler : ResourceHandler, IDisposable
    {
        private FileStream zipStream;
        private ZipArchive archive;
        private int streamsCounter;
        private bool initialized;

        public ZipResourceHandler(string name)
        {
            DisposeArchive();
            zipStream = new FileStream(name, FileMode.Create);
            archive = new ZipArchive(zipStream, ZipArchiveMode.Update);
            initialized = false;
        }

        public override void HandleResource(Resource resource, ResourceHandlingContext context)
        {
            // The first handled resource is stored under its file name;
            // every later resource is stored under its parent folder name combined with its file name
            var zipUri = (streamsCounter++ == 0
                ? Path.GetFileName(resource.OriginalUrl.Href)
                : Path.Combine(Path.GetFileName(Path.GetDirectoryName(resource.OriginalUrl.Href)),
                    Path.GetFileName(resource.OriginalUrl.Href)));

            // Every resource after the first one gets the "my_" sample prefix
            var samplePrefix = String.Empty;
            if (initialized)
                samplePrefix = "my_";
            else
                initialized = true;

            // Create an archive entry and save the resource into it
            using (var newStream = archive.CreateEntry(samplePrefix + zipUri).Open())
            {
                resource.WithOutputUrl(new Url("file:///" + samplePrefix + zipUri)).Save(newStream, context);
            }
        }

        // Releases the archive and the underlying file stream and resets the resource counter
        private void DisposeArchive()
        {
            if (archive != null)
            {
                archive.Dispose();
                archive = null;
            }

            if (zipStream != null)
            {
                zipStream.Dispose();
                zipStream = null;
            }

            streamsCounter = 0;
        }

        public void Dispose()
        {
            DisposeArchive();
        }
    }
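
To check what the handler actually wrote, the finished archive can be listed with the standard System.IO.Compression API. The following is a minimal sketch that uses only .NET classes; the default archive path "archive.zip" is a placeholder, and the listing should run only after the ZipResourceHandler has been disposed so that the archive is complete on disk.

```csharp
using System;
using System.IO.Compression;

class ArchiveListing
{
    static void Main(string[] args)
    {
        // Path to the archive to inspect; "archive.zip" is a placeholder default
        string archivePath = args.Length > 0 ? args[0] : "archive.zip";

        // Open the archive read-only and print the name and uncompressed size of every entry
        using (ZipArchive archive = ZipFile.OpenRead(archivePath))
        {
            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                Console.WriteLine($"{entry.FullName}, length:{entry.Length}");
            }
        }
    }
}
```

With the handler shown above, the main document is expected to appear under its own file name and later resources under names that start with the "my_" prefix.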

Save HTML to Memory Streams

The ResourceHandler class implementation in the MemoryResourceHandler class allows saving HTML to memory streams. The following code shows how to use the MemoryResourceHandler class to store an HTML document in memory, collecting and printing information about the handled resources.

  1. Initialize an HTML Document using the specified HTML file path.
  2. Create an instance of the MemoryResourceHandler class. This class is designed to capture and store resources within memory streams during the resource-handling process.
  3. Call the Save() method of the HTML document and pass it the MemoryResourceHandler instance as an argument. This associates the resource handling logic of the MemoryResourceHandler with the HTML document-saving process.
  4. Use the PrintInfo() method of the MemoryResourceHandler to print information about the handled resources.

using System;
using System.IO;
using Aspose.Html;
using Aspose.Html.Saving;
using Aspose.Html.Saving.ResourceHandlers;
using System.Collections.Generic;
...

    // Prepare a path to a source HTML file
    string inputPath = Path.Combine(DataDir, "with-resources.html");

    // Load the HTML document
    using (var doc = new HTMLDocument(inputPath))
    {
        // Create an instance of the MemoryResourceHandler class and save HTML to memory
        var resourceHandler = new MemoryResourceHandler();
        doc.Save(resourceHandler);
        resourceHandler.PrintInfo();
    }

After the example runs, information about the memory storage is printed:

uri:memory:///with-resources.html, length:256
uri:memory:///photo1.png, length:57438

ResourceHandler is a base class that supports the creation and management of output streams. The MemoryResourceHandler class allows you to capture and store resources in in-memory streams, providing a dynamic and flexible way to handle resources without physically saving them to the file system. The following code snippet shows the implementation of ResourceHandler in the MemoryResourceHandler class:

    internal class MemoryResourceHandler : ResourceHandler
    {
        public List<Tuple<Stream, Resource>> Streams;

        public MemoryResourceHandler()
        {
            Streams = new List<Tuple<Stream, Resource>>();
        }

        public override void HandleResource(Resource resource, ResourceHandlingContext context)
        {
            var outputStream = new MemoryStream();
            Streams.Add(Tuple.Create<Stream, Resource>(outputStream, resource));
            resource
                .WithOutputUrl(new Url(Path.GetFileName(resource.OriginalUrl.Pathname), "memory:///"))
                .Save(outputStream, context);
        }

        public void PrintInfo()
        {
            foreach (var stream in Streams)
                Console.WriteLine($"uri:{stream.Item2.OutputUrl}, length:{stream.Item1.Length}");
        }
    }
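
Once resources are held in memory, you may want to read one of them back as text. The sketch below uses only standard .NET classes; the StreamText helper and its ReadAll method are hypothetical names introduced for illustration. It assumes that the stream is still open and seekable and that its content is UTF-8 text, which holds for the stand-in MemoryStream used here but is not guaranteed for every saved resource (images, for example, are binary).

```csharp
using System;
using System.IO;
using System.Text;

static class StreamText
{
    // Reads the whole stream as UTF-8 text, starting from the beginning.
    // The stream is left open so that it can be reused afterwards.
    public static string ReadAll(Stream stream)
    {
        stream.Position = 0;
        using (var reader = new StreamReader(stream, Encoding.UTF8, true, 1024, leaveOpen: true))
        {
            return reader.ReadToEnd();
        }
    }

    static void Main()
    {
        // Stand-in for a stream captured by a resource handler
        var stream = new MemoryStream();
        byte[] bytes = Encoding.UTF8.GetBytes("<p>Hello World!</p>");
        stream.Write(bytes, 0, bytes.Length);

        // Prints "<p>Hello World!</p>"
        Console.WriteLine(ReadAll(stream));
    }
}
```

Under the same assumptions, passing the Item1 stream of an entry in the MemoryResourceHandler Streams list to the helper would return the saved markup of that resource.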

Save HTML to MHTML

In some cases, you need to save a web page as a single file. An MHTML document is useful for this purpose because it is a web page archive that stores everything inside itself. The HTMLSaveFormat enumeration specifies the format in which the document is saved: HTML, MHTML, or MD. The example below shows how to use the Save(path, saveFormat) method to save HTML as MHTML.

using System.IO;
using Aspose.Html;
using Aspose.Html.Saving;
...
    // Prepare an output path for a document saving
    string documentPath = Path.Combine(OutputDir, "save-to-MTHML.mht");

    // Prepare a simple HTML file with a linked document
    File.WriteAllText("document.html", "<p>Hello World!</p>" +
                                       "<a href='linked-file.html'>linked file</a>");
    // Prepare a simple linked HTML file
    File.WriteAllText("linked-file.html", "<p>Hello linked file!</p>");

    // Load the "document.html" into memory
    using (var document = new HTMLDocument("document.html"))
    {
        // Save the document to MHTML format
        document.Save(documentPath, HTMLSaveFormat.MHTML);
    }

The saved “save-to-MTHML.mht” file stores HTML of the “document.html” and “linked-file.html” files.
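
If you need control over how many linked pages end up in the archive, the resource handling options described earlier can be combined with MHTML output. The following is a minimal sketch, assuming that the Aspose.Html.Saving namespace provides an MHTMLSaveOptions class with the same ResourceHandlingOptions property as HTMLSaveOptions and that Save() accepts it; the output name "save-with-options.mht" is a placeholder.

```csharp
using System.IO;
using Aspose.Html;
using Aspose.Html.Saving;

class SaveToMhtmlWithOptions
{
    static void Main()
    {
        // Prepare a simple HTML file with a linked document (placeholder file names)
        File.WriteAllText("document.html", "<p>Hello World!</p>" +
                                           "<a href='linked-file.html'>linked file</a>");
        File.WriteAllText("linked-file.html", "<p>Hello linked file!</p>");

        using (var document = new HTMLDocument("document.html"))
        {
            // Assumed API: MHTML-specific save options with resource handling settings
            var options = new MHTMLSaveOptions();

            // Limit how deep linked HTML pages are followed;
            // a value of 0 is expected to keep only the main document
            options.ResourceHandlingOptions.MaxHandlingDepth = 1;

            // Save the document to MHTML with the options applied
            document.Save("save-with-options.mht", options);
        }
    }
}
```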

Save HTML to Markdown

Markdown is a markup language with plain-text syntax. As in the HTML to MHTML example, you can use HTMLSaveFormat to save HTML as MD. Take a look at the following example:

using System.IO;
using Aspose.Html;
using Aspose.Html.Saving;
...
    // Prepare an output path for a document saving
    string documentPath = Path.Combine(OutputDir, "save-to-MD.md");

    // Prepare HTML code
    var html_code = "<H2>Hello World!</H2>";

    // Initialize a document from the string variable
    using (var document = new HTMLDocument(html_code, "."))
    {
        // Save the document as a Markdown file
        document.Save(documentPath, HTMLSaveFormat.Markdown);
    }

For more information on how to use the HTML Converter, see the Convert HTML to Markdown article.

Save SVG

SVG usually appears as part of an HTML file, where it represents vector data on the page: images, icons, tables, etc. However, SVG can also be extracted from the web page and manipulated in much the same way as an HTML document.

Since SVGDocument and HTMLDocument are based on the same WHATWG DOM standard, all operations such as loading, reading, editing, converting, and saving are similar for both documents. So, all examples that manipulate an HTMLDocument also apply to SVGDocument.

To save your changes, use the following code:

using System.IO;
using Aspose.Html;
using Aspose.Html.Dom.Svg;
...
    // Prepare an output path for an SVG document saving
    string documentPath = Path.Combine(OutputDir, "save-to-SVG.svg");

    // Prepare SVG code
    var code = @"
        <svg xmlns='http://www.w3.org/2000/svg' height='200' width='300'>
            <g fill='none' stroke-width='10' stroke-dasharray='30 10'>
                <path stroke='red' d='M 25 40 l 215 0' />
                <path stroke='black' d='M 35 80 l 215 0' />
                <path stroke='blue' d='M 45 120 l 215 0' />
            </g>
        </svg>";

    // Initialize an SVG instance from the content string
    using (var document = new SVGDocument(code, "."))
    {
        // Save the SVG file to a disk
        document.Save(documentPath);
    }

For more information about SVG drawing basics and the API usage for processing and rendering SVG documents, see the Aspose.SVG for .NET Documentation.

