Website to HTML – Save Website – C#
Website to HTML
Although wired internet and Wi-Fi are widely available, you may still be without a connection from time to time: between subway stops, on a trip, or on a flight. To access information offline, you need to save the contents of websites in advance, whether for reading, research, or entertainment.
There are several reasons why you might want to convert a website to HTML:
- accessing information without a network connection;
- extracting useful information from websites for analysis or other purposes;
- recording a page’s content in case the data becomes unavailable for whatever reason (you may have run into broken bookmarks before);
- moving static HTML websites to another host;
- archiving content and creating backups;
- learning, for example studying the design or templates of web pages.
This article shows how to save a website or a single webpage using the Aspose.HTML for .NET API. You can customize the process to save either an entire website or one page.
How to Save a Webpage
You can use the Aspose.HTML for .NET library to convert a website to HTML for offline reading. Follow these steps:
- Use the HTMLDocument(Url) constructor to load an HTMLDocument object from a URL.
- Create an instance of the HTMLSaveOptions class and set the required properties to customize the saving process. If you don’t initialize HTMLSaveOptions, the process uses the default save options, as shown in the example below.
- Call the Save(savePath) or Save(savePath, options) method to save the website offline.
The following C# example shows how to save a webpage. With the default save options, only the single web page and its related resources are saved. Note that only resources on the same domain as the page are saved.
using System.IO;
using Aspose.Html;
...
    // Initialize HTML document from a URL to convert webpage to HTML
    using var document = new HTMLDocument("https://docs.aspose.com/html/net/message-handlers/");

    // Save webpage
    document.Save(Path.Combine(OutputDir, "root/result.html"));
Using the functionality described in this article, you can save both individual pages and entire websites with resources. To customize the saving process, you can specify the resource handling options for the HTMLSaveOptions object.
Resource Handling Options
Using the HTMLSaveOptions and ResourceHandlingOptions classes allows you to customize the saving process. For example, you can manipulate the depth of pages that will be handled and save an entire website or only save a webpage.
The table below summarizes the main properties of the ResourceHandlingOptions class. They provide options for handling maximum page depth, applying limits, or handling external resources such as images, CSS files, and JavaScript when loading an HTML document using the API.
Property | Description |
---|---|
Default | Gets or sets an enum representing the default resource handling method. Currently, Save, Ignore, and Embed values are supported. The default value is Save, meaning the resource will be saved as a file. |
JavaScript | Represents the way scripts are handled. Currently, Save, Ignore, Discard, and Embed values are supported. The default value is Save, meaning the resource will be saved as a file. |
MaxHandlingDepth | The maximum depth of pages that will be handled. Use it to choose between saving an entire website and saving a single web page. A depth of 1 means only pages directly referenced from the saved document are handled; -1 handles all pages. The default value is 0. |
PageUrlRestriction | Contains information about restrictions applied to URLs of handled pages. The default value is RootAndSubFolders, meaning only pages in the root and subfolders are processed. |
ResourceUrlRestriction | Contains information about restrictions applied to URLs of handled resources such as CSS, JavaScript, and images. The default value is SameHost, meaning only resources on the same host are processed. |
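
The properties in the table can be combined in a single HTMLSaveOptions object. The following is a minimal sketch, not an official sample: it assumes that Default, JavaScript, and ResourceUrlRestriction can be set together as the table suggests, and that the JavaScript setting takes precedence over Default for scripts. The class name SingleFileSnapshot, the BuildOptions helper, and the output folder are hypothetical names chosen for illustration.

```csharp
using System.IO;
using Aspose.Html;
using Aspose.Html.Saving;

static class SingleFileSnapshot
{
    // Hypothetical helper: builds save options from the properties listed in the table.
    public static HTMLSaveOptions BuildOptions()
    {
        return new HTMLSaveOptions
        {
            ResourceHandlingOptions =
            {
                // Embed resources into the HTML file instead of saving them as separate files
                Default = ResourceHandling.Embed,
                // Leave scripts out of the saved document (assumed to override Default for scripts)
                JavaScript = ResourceHandling.Discard,
                // Only handle resources served from the same host (the documented default)
                ResourceUrlRestriction = UrlRestriction.SameHost,
                // 0 is the documented default: only the loaded page itself is handled
                MaxHandlingDepth = 0
            }
        };
    }

    static void Main()
    {
        // Hypothetical output location; the article's OutputDir is not defined in its snippets
        string outputDir = Path.Combine("output", "singleFile");
        Directory.CreateDirectory(outputDir);
        string savePath = Path.Combine(outputDir, "result.html");

        // Load the page from a URL and save it with the combined options
        using var document = new HTMLDocument("https://docs.aspose.com/html/net/message-handlers/");
        document.Save(savePath, BuildOptions());
    }
}
```

Keeping the options in a separate method makes it easy to reuse one configuration for several pages; whether embedding everything is practical depends on the size of the page's resources.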
Website to HTML using JavaScript Property
Aspose.HTML lets you control how scripts are saved: they can be written to separate files, embedded, or discarded from the resulting document. The following C# example shows how to save a website and embed all of its JavaScript in the resulting HTML document.
To save a website from a URL, follow these steps:
- Use the HTMLDocument(Url) constructor to load an HTMLDocument object from a URL.
- Create an instance of the HTMLSaveOptions class and set the JavaScript property of its ResourceHandlingOptions to ResourceHandling.Embed.
- Call the Save(savePath, options) method to save the website offline.
In the following example, the ResourceHandling.Embed option specifies that any JavaScript resources should be embedded in the HTML document when it is saved. The resulting HTML file therefore contains all of the JavaScript resources within the document rather than referencing them as external files.
using System.IO;
using Aspose.Html;
using Aspose.Html.Saving;
...
    // Initialize HTML document from URL to convert website to HTML
    using var document = new HTMLDocument("https://docs.aspose.com/html/net/message-handlers/");

    // Create an HTMLSaveOptions object and set JavaScript property
    var options = new HTMLSaveOptions
    {
        ResourceHandlingOptions =
        {
            JavaScript = ResourceHandling.Embed
        }
    };

    // Prepare a path to save HTML
    string savePath = Path.Combine(OutputDir, "rootAndEmbedJs/result.html");

    // Save website
    document.Save(savePath, options);
Website to HTML using MaxHandlingDepth Property
The MaxHandlingDepth property specifies the maximum depth of linked pages that are loaded and processed when saving. The API does not follow links beyond this depth, so limiting MaxHandlingDepth reduces the number of pages to process and, with it, the time and memory the save operation requires.
By default, only the open document and its resources are saved as an individual web page, but you can control the handling depth with the MaxHandlingDepth property. The following C# example sets this property to 1, which saves not only the document but also the pages it links to directly, provided their URLs are nested relative to the URL of this page:
using System.IO;
using Aspose.Html;
using Aspose.Html.Saving;
...
    // Load an HTML document from URL to convert website to HTML
    using var document = new HTMLDocument("https://docs.aspose.com/html/net/message-handlers/");

    // Create an HTMLSaveOptions object and set MaxHandlingDepth property
    var options = new HTMLSaveOptions
    {
        ResourceHandlingOptions =
        {
            MaxHandlingDepth = 1
        }
    };

    // Prepare a path for a file saving
    string savePath = Path.Combine(OutputDir, "rootAndAdjacent/result.html");

    // Save website offline
    document.Save(savePath, options);
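To make the depth values concrete, the sketch below models page depth as a breadth-first walk over links. It is a conceptual illustration only and does not use or reproduce the Aspose.HTML implementation. It follows the description in the properties table: 0 handles only the loaded page, 1 adds directly referenced pages, and -1 handles all pages. The DepthSketch class, the PagesWithinDepth method, and the tiny in-memory site are hypothetical.

```csharp
using System;
using System.Collections.Generic;

static class DepthSketch
{
    // Returns the pages reached from 'root' when links are followed
    // for at most 'maxDepth' hops; -1 means no limit.
    public static List<string> PagesWithinDepth(
        IReadOnlyDictionary<string, string[]> links, string root, int maxDepth)
    {
        var visited = new HashSet<string> { root };
        var order = new List<string> { root };
        var queue = new Queue<(string Page, int Depth)>();
        queue.Enqueue((root, 0));

        while (queue.Count > 0)
        {
            var (page, depth) = queue.Dequeue();

            // Stop following links once the depth limit is reached
            if (maxDepth >= 0 && depth >= maxDepth) continue;
            if (!links.TryGetValue(page, out var targets)) continue;

            foreach (var target in targets)
            {
                // Each page is handled once, even if several pages link to it
                if (visited.Add(target))
                {
                    order.Add(target);
                    queue.Enqueue((target, depth + 1));
                }
            }
        }
        return order;
    }

    static void Main()
    {
        // Hypothetical site: index links to a and b; a links to c
        var links = new Dictionary<string, string[]>
        {
            ["index"] = new[] { "a", "b" },
            ["a"] = new[] { "c" },
        };

        Console.WriteLine(string.Join(", ", PagesWithinDepth(links, "index", 0)));  // index
        Console.WriteLine(string.Join(", ", PagesWithinDepth(links, "index", 1)));  // index, a, b
        Console.WriteLine(string.Join(", ", PagesWithinDepth(links, "index", -1))); // index, a, b, c
    }
}
```

In the real API, the set of pages is further narrowed by PageUrlRestriction, so a depth of -1 does not necessarily mean every linked page on the web is saved.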
Save Website using PageUrlRestriction Property
Aspose.HTML for .NET provides various options for filtering the URLs of saved pages for a website. The PageUrlRestriction property restricts loading web pages from specific URLs or domains when saving an HTML document.
By default, the PageUrlRestriction property is set to RootAndSubFolders, meaning only pages in the root and subfolders are processed. However, you can set this property to another value: SameHost or None. Setting it to None allows pages from any domain that the saved website links to. Use this setting with caution, because loading pages from arbitrary domains can increase the risk of security vulnerabilities.
In the following example, the document is saved together with all pages that it links to and that are on the same host:
using System.IO;
using Aspose.Html;
using Aspose.Html.Saving;
...
    // Initialize HTML document from URL to convert website to HTML
    using var document = new HTMLDocument("https://docs.aspose.com/html/net/message-handlers/");

    // Create an HTMLSaveOptions object and set MaxHandlingDepth and PageUrlRestriction properties
    var options = new HTMLSaveOptions
    {
        ResourceHandlingOptions =
        {
            MaxHandlingDepth = 1,
            PageUrlRestriction = UrlRestriction.SameHost
        }
    };

    // Prepare a path to save HTML
    string savePath = Path.Combine(OutputDir, "rootAndManyAdjacent/result.html");

    // Save website offline
    document.Save(savePath, options);
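The difference between the three restriction values can be approximated with plain System.Uri comparisons. The sketch below is a reading of the value names and descriptions given in this article, not the library's actual matching logic, which may differ in details. The Restriction enum only mirrors the names of the UrlRestriction values, and UrlRestrictionSketch, IsAllowed, and the example URLs other than the documentation page are hypothetical.

```csharp
using System;

// Hypothetical enum mirroring the names of the UrlRestriction values; not the library type
enum Restriction { RootAndSubFolders, SameHost, None }

static class UrlRestrictionSketch
{
    // Decides whether 'candidate' would pass the given restriction relative to 'root'
    public static bool IsAllowed(Uri root, Uri candidate, Restriction restriction)
    {
        bool sameHost = string.Equals(root.Host, candidate.Host, StringComparison.OrdinalIgnoreCase);

        switch (restriction)
        {
            case Restriction.None:
                // No restriction: any URL is accepted
                return true;
            case Restriction.SameHost:
                return sameHost;
            default: // RootAndSubFolders
                // Assumption: "root" is the folder part of the root document's path
                string rootPath = root.AbsolutePath;
                string rootFolder = rootPath.Substring(0, rootPath.LastIndexOf('/') + 1);
                return sameHost && candidate.AbsolutePath.StartsWith(rootFolder, StringComparison.Ordinal);
        }
    }

    static void Main()
    {
        var root = new Uri("https://docs.aspose.com/html/net/message-handlers/");
        var nested = new Uri("https://docs.aspose.com/html/net/message-handlers/nested-page/");
        var parent = new Uri("https://docs.aspose.com/html/net/");
        var external = new Uri("https://www.example.com/page");

        Console.WriteLine(IsAllowed(root, nested, Restriction.RootAndSubFolders));   // True
        Console.WriteLine(IsAllowed(root, parent, Restriction.RootAndSubFolders));   // False
        Console.WriteLine(IsAllowed(root, parent, Restriction.SameHost));            // True
        Console.WriteLine(IsAllowed(root, external, Restriction.SameHost));          // False
        Console.WriteLine(IsAllowed(root, external, Restriction.None));              // True
    }
}
```

Read this way, each value widens the set of pages accepted by the previous one, which is why None calls for the most caution.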
You can download the complete C# examples and data files from GitHub.
Aspose.HTML offers HTML Web Applications, an online collection of free converters, mergers, SEO tools, HTML code generators, URL tools, and more. The applications work on any operating system with a web browser and do not require any additional software installation. You can convert, merge, encode, generate HTML code, extract data from the web, or analyze web pages in terms of SEO wherever you are. Use the collection of HTML Web Applications for your everyday tasks and to streamline your workflow.