Website to HTML – Save Website – C#

Website to HTML

Although wired internet and Wi-Fi are widely available, you may still lose your connection from time to time: between subway stops, on a trip, or on a flight. To access information without a network connection, you need to save the contents of websites in advance so that you can use them offline.

The most common reasons to convert a website to HTML are offline reading, research, and entertainment.

This article shows how to save a website or a single webpage using the Aspose.HTML for .NET API. You can customize the process to save either an entire website or a single webpage.

How to Save a Webpage

You can use the Aspose.HTML for .NET library to convert a website to HTML for offline reading. Follow these steps:

  1. Use the HTMLDocument(Url) constructor to load an HTMLDocument object from a URL.
  2. Create an instance of the HTMLSaveOptions class and set the required properties to customize the saving process. If you don't pass HTMLSaveOptions, the default save options are used, as shown in the example below.
  3. Call the Save(savePath) or the Save(savePath, options) method to save the page for offline use.

The following C# example shows how to save a webpage. With the default save options, only the single web page and its related resources are saved. Note that only resources on the same domain as the page are saved.

using System.IO;
using Aspose.Html;
...
    // Initialize HTML document from a URL to convert webpage to HTML
    using var document = new HTMLDocument("https://docs.aspose.com/html/net/message-handlers/");

    // Save webpage
    document.Save(Path.Combine(OutputDir, "root/result.html"));

Using the functionality described in this article, you can save both individual pages and entire websites with resources. To customize the saving process, you can specify the resource handling options for the HTMLSaveOptions object.

Resource Handling Options

The HTMLSaveOptions and ResourceHandlingOptions classes let you customize the saving process. For example, you can set the depth of pages that will be handled in order to save an entire website or only a single webpage.

The table below summarizes the main properties of the ResourceHandlingOptions class. They control the maximum page depth, the URL restrictions that apply, and the handling of external resources such as images, CSS files, and JavaScript when saving an HTML document using the API.

| Property | Description |
|---|---|
| Default | Gets or sets an enum representing the default resource handling method. Currently, Save, Ignore, and Embed values are supported. The default value is Save, meaning the resource will be saved as a file. |
| JavaScript | Represents the way scripts are handled. Currently, Save, Ignore, Discard, and Embed values are supported. The default value is Save, meaning the resource will be saved as a file. |
| MaxHandlingDepth | This property contains information about the maximum depth of pages that will be handled. Using this property, you can manipulate the depth of pages that will be handled and save entire website or only save a web page. A depth of 1 means only pages directly referenced from the saved document will be handled. Setting this property to -1 will lead to the handling of all pages. The default value is 0. |
| PageUrlRestriction | Contains information about restrictions applied to URLs of handled pages. The default value is RootAndSubFolders, meaning only resources in the root and subfolders are processed. |
| ResourceUrlRestriction | This property contains information about restrictions applied to URLs of handled resources such as CSS, js, images, etc. The default value is SameHost, meaning only resources in the same host are processed. |
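
As a rough illustration of how several of these properties might be combined, the sketch below embeds resources by default and states the resource URL restriction explicitly. It uses only the property and enum names listed in the table; the OutputDir value and the output file name are assumptions made for this sketch and are not defined by the article.

```csharp
using System.IO;
using Aspose.Html;
using Aspose.Html.Saving;

// Assumption: any writable folder works here; this path is hypothetical
string OutputDir = Path.Combine(Path.GetTempPath(), "aspose-html-output");

// Load the page to be saved
using var document = new HTMLDocument("https://docs.aspose.com/html/net/message-handlers/");

var options = new HTMLSaveOptions
{
    ResourceHandlingOptions =
    {
        // Embed resources into the HTML file instead of saving them as separate files
        Default = ResourceHandling.Embed,
        // Handle only resources on the same host (the documented default, stated explicitly)
        ResourceUrlRestriction = UrlRestriction.SameHost
    }
};

// Save the page with the customized options
document.Save(Path.Combine(OutputDir, "embedded/result.html"), options);
```

Embedding everything yields a single larger HTML file, which can be convenient for sharing; keeping the default Save value produces separate resource files next to the page.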

Website to HTML using JavaScript Property

Aspose.HTML lets you control how scripts are saved. They can be saved in separate files, embedded, or removed from the resulting document. The following C# example shows how to save a website and embed all of its JavaScript into the resulting HTML document.

To save a website from a URL, follow these steps:

  1. Use the HTMLDocument(Url) constructor to load an HTMLDocument object from a URL.
  2. Create an instance of the HTMLSaveOptions class and set the JavaScript property to ResourceHandling.Embed.
  3. Call the Save(savePath, options) method to save the website for offline use.

In the following example, the ResourceHandling.Embed option specifies that any JavaScript resources should be embedded in the HTML document when saved. This means the resulting HTML file will contain all of the JavaScript resources within the document rather than referencing them as external files.

using System.IO;
using Aspose.Html;
using Aspose.Html.Saving;
...
    // Initialize HTML document from URL to convert website to HTML
    using var document = new HTMLDocument("https://docs.aspose.com/html/net/message-handlers/");

    // Create an HTMLSaveOptions object and set JavaScript property
    var options = new HTMLSaveOptions
    {
        ResourceHandlingOptions =
        {
            JavaScript = ResourceHandling.Embed
        }
    };

    // Prepare a path to save HTML
    string savePath = Path.Combine(OutputDir, "rootAndEmbedJs/result.html");

    // Save website
    document.Save(savePath, options);
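
If scripts are not needed offline, the same pattern can presumably be used with another value of the JavaScript property. The sketch below uses ResourceHandling.Discard, which the properties table lists among the supported values; that it corresponds to scripts being removed from the saved document is an assumption here, and the OutputDir value and folder name are hypothetical.

```csharp
using System.IO;
using Aspose.Html;
using Aspose.Html.Saving;

// Assumption: any writable folder works here; this path is hypothetical
string OutputDir = Path.Combine(Path.GetTempPath(), "aspose-html-output");

// Load the page to be saved
using var document = new HTMLDocument("https://docs.aspose.com/html/net/message-handlers/");

var options = new HTMLSaveOptions
{
    ResourceHandlingOptions =
    {
        // Assumed to drop scripts from the saved document rather than save or embed them
        JavaScript = ResourceHandling.Discard
    }
};

// Save the page without its scripts
document.Save(Path.Combine(OutputDir, "rootWithoutJs/result.html"), options);
```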

Website to HTML using MaxHandlingDepth Property

The MaxHandlingDepth property specifies the maximum depth of linked pages that will be handled when saving. Pages beyond this depth are not handled. Limiting the depth therefore bounds the number of pages to be processed, which helps reduce the time and memory the saving process requires.

By default, only the open document and its resources are saved as an individual web page, but you can control the handling depth with the MaxHandlingDepth property. The following example shows how to save not only the document but also all the pages to which it links and whose URLs are nested relative to the URL of this page. In the C# example below, the property is set to 1, which means that only pages directly referenced from the saved document are handled in addition to the document itself:

using System.IO;
using Aspose.Html;
using Aspose.Html.Saving;
...
    // Load an HTML document from URL to convert website to HTML
    using var document = new HTMLDocument("https://docs.aspose.com/html/net/message-handlers/");

    // Create an HTMLSaveOptions object and set MaxHandlingDepth property
    var options = new HTMLSaveOptions
    {
        ResourceHandlingOptions =
        {
            MaxHandlingDepth = 1
        }
    };

    // Prepare a path for a file saving
    string savePath = Path.Combine(OutputDir, "rootAndAdjacent/result.html");

    // Save website offline
    document.Save(savePath, options);
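
To make the notion of page depth more concrete, here is a small standalone sketch of a depth-limited, breadth-first walk over a link graph. It is not how Aspose.HTML is implemented; the graph, the page names, and the PagesWithinDepth helper are all hypothetical and only mirror the documented meaning of the values 0, 1, and -1.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical link graph: page name -> pages it links to (not real site data)
var links = new Dictionary<string, string[]>
{
    ["root"] = new[] { "a", "b" },
    ["a"] = new[] { "c" },
    ["b"] = new[] { "root" },
    ["c"] = Array.Empty<string>()
};

// Returns the pages reachable from start within maxDepth link hops.
// maxDepth = 0 keeps only the start page; -1 means no limit.
List<string> PagesWithinDepth(string start, int maxDepth)
{
    var visited = new HashSet<string> { start };
    var result = new List<string> { start };
    var queue = new Queue<(string Page, int Depth)>();
    queue.Enqueue((start, 0));

    while (queue.Count > 0)
    {
        var (page, depth) = queue.Dequeue();

        // Do not follow links from pages that already sit at the depth limit
        if (maxDepth >= 0 && depth >= maxDepth) continue;
        if (!links.TryGetValue(page, out var targets)) continue;

        foreach (var next in targets)
        {
            // Skip pages that were already handled (avoids loops such as b -> root)
            if (visited.Add(next))
            {
                result.Add(next);
                queue.Enqueue((next, depth + 1));
            }
        }
    }
    return result;
}

Console.WriteLine(string.Join(", ", PagesWithinDepth("root", 0)));  // root
Console.WriteLine(string.Join(", ", PagesWithinDepth("root", 1)));  // root, a, b
Console.WriteLine(string.Join(", ", PagesWithinDepth("root", -1))); // root, a, b, c
```

In this toy graph, a depth of 1 picks up only the pages linked directly from the start page, while -1 follows every link until no new pages remain, which on a real site can mean a very large download.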

Save Website using PageUrlRestriction Property

Aspose.HTML for .NET provides several options for filtering the URLs of the pages saved for a website. The PageUrlRestriction property restricts which linked pages are handled when saving an HTML document, based on their URLs.

By default, the PageUrlRestriction property is set to RootAndSubFolders, meaning only pages in the root and subfolders are processed. You can also set this property to SameHost or None. Setting it to None allows pages from any domain that the saved website links to. Use this value with caution, because allowing pages to be loaded from any domain can increase the risk of security vulnerabilities.
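
One way to picture the difference between the two restrictive values is with plain System.Uri checks. The helpers below are hypothetical and are not part of Aspose.HTML, whose actual matching rules may differ; the example.com and example.org URLs are placeholders.

```csharp
using System;

// Hypothetical approximation of SameHost: the host names must match
bool IsSameHost(Uri startUrl, Uri pageUrl) =>
    string.Equals(startUrl.Host, pageUrl.Host, StringComparison.OrdinalIgnoreCase);

// Hypothetical approximation of RootAndSubFolders: the page must lie
// in the folder of the start URL or in one of its subfolders
bool IsInRootOrSubfolder(Uri startUrl, Uri pageUrl) =>
    startUrl.IsBaseOf(pageUrl);

var start = new Uri("https://example.com/docs/guide/");
var nested = new Uri("https://example.com/docs/guide/chapter/page.html");
var sibling = new Uri("https://example.com/blog/post.html");
var external = new Uri("https://example.org/docs/guide/");

Console.WriteLine(IsInRootOrSubfolder(start, nested));  // True
Console.WriteLine(IsInRootOrSubfolder(start, sibling)); // False
Console.WriteLine(IsSameHost(start, sibling));          // True
Console.WriteLine(IsSameHost(start, external));         // False
```

Under these assumptions, the folder-based check is the strictest, the host-based check also admits pages elsewhere on the same site, and None would correspond to skipping both checks.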

In the following example, all pages that the HTML document links to directly and that are on the same host are saved in addition to the document itself:

using System.IO;
using Aspose.Html;
using Aspose.Html.Saving;
...
    // Initialize HTML document from URL to convert website to HTML
    using var document = new HTMLDocument("https://docs.aspose.com/html/net/message-handlers/");

    // Create an HTMLSaveOptions object and set MaxHandlingDepth and PageUrlRestriction properties
    var options = new HTMLSaveOptions
    {
        ResourceHandlingOptions =
        {
            MaxHandlingDepth = 1,
            PageUrlRestriction = UrlRestriction.SameHost
        }
    };

    // Prepare a path to save HTML
    string savePath = Path.Combine(OutputDir, "rootAndManyAdjacent/result.html");

    // Save website offline
    document.Save(savePath, options);

You can download the complete C# examples and data files from GitHub.

Aspose.HTML offers HTML Web Applications that are an online collection of free converters, mergers, SEO tools, HTML code generators, URL tools, and more. The applications work on any operating system with a web browser and do not require any additional software installation. Easily convert, merge, encode, generate HTML code, extract data from the web, or analyze web pages in terms of SEO wherever you are. Use our collection of HTML Web Applications to perform your daily matters and make your workflow seamless!
