Save a Website or Web Page Using Java

Why Save a Website as HTML?

To ensure uninterrupted access to important content, it is helpful to save websites for offline use. Saving a website as HTML lets you view its pages without an internet connection, whether for research, education, or entertainment. You can also save web pages for analysis, for backups, or to study web design and templates. A local copy gives you flexibility and control over the content you rely on.

This article shows how to save a website using the Aspose.HTML for Java API. You can customize the process to save either an entire website or a single web page.

How to Save a Webpage in Java

You can use the Aspose.HTML for Java library to save a web page as HTML for offline reading. Follow these steps:

  1. Use the HTMLDocument(Url) constructor to create an instance of the HTMLDocument class by passing the URL of the web page you want to download.
  2. Prepare a local file path where the HTML content will be saved.
  3. Call the save(savePath) or the save(savePath, options) method to save the downloaded HTML document to the specified location.

This Java example demonstrates how to save a webpage using the default save options. By default, only the specific web page and its related resources from the same domain are saved. Resources from external domains will not be downloaded.

import com.aspose.html.HTMLDocument;

// Initialize an HTML document from a URL
final HTMLDocument document = new HTMLDocument("https://docs.aspose.com/html/net/message-handlers/");

// Prepare a path to save the downloaded file
String savePath = "root/result.html";

// Save the HTML document to the specified file
document.save(savePath);
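
Step 2 above ("prepare a local file path") can be sketched with the standard library alone. This page does not say whether save() creates missing folders, so the sketch assumes it is safest to create the parent directory up front. The class and method names (SavePathSketch, prepareSavePath) are hypothetical and are not part of Aspose.HTML.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Aspose.HTML.
final class SavePathSketch {

    // Builds an absolute path from the given segments and makes sure
    // the parent directory exists. The file itself is not created.
    static Path prepareSavePath(String first, String... more) {
        Path target = Paths.get(first, more).toAbsolutePath().normalize();
        Path parent = target.getParent();
        try {
            if (parent != null) {
                Files.createDirectories(parent); // no-op if the directory already exists
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return target;
    }

    public static void main(String[] args) {
        Path savePath = prepareSavePath("root", "result.html");
        System.out.println("Save target: " + savePath);
    }
}
```

The resulting path can then be passed to the save call as a string, for example savePath.toString().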

Using the functionality described in this article, you can save both individual pages and entire websites with resources. To customize the saving process, you can specify the resource handling options for the HTMLSaveOptions object.

The HTMLSaveOptions and ResourceHandlingOptions classes allow you to customize how resources, such as images, CSS, and JavaScript files, are handled when saving an HTML document. They include properties for setting the default resource handling behavior (such as preserving, ignoring, or embedding resources), controlling JavaScript handling, specifying the maximum depth of linked pages to save, and applying URL restrictions to both pages and resources. Together, these classes give you fine-grained control over how external content is managed during the save process, so you can tailor the behavior to different needs, such as preserving entire websites or retrieving specific resources.

Save Website using setJavaScript()

The JavaScript property defines how scripts are handled when a web page is saved. The available options are Save, Ignore, Discard, and Embed; the default is Save. The following Java example shows how to save a website and embed all JavaScript into the resulting HTML document.

To save a website from a URL, follow these steps:

  1. Use the HTMLDocument(Url) constructor to initialize an HTML document from a specified URL.
  2. Create an instance of the HTMLSaveOptions class to configure how the document will be saved.
  3. Modify the ResourceHandlingOptions property within HTMLSaveOptions to set the handling behavior for JavaScript.
  4. Call the save(savePath, options) method to save the document with the desired settings.

In the following example, the ResourceHandling.Embed option specifies that any JavaScript resources should be embedded directly into the HTML document when saved. This means that instead of linking to external .js files, the JavaScript content will be included within the saved HTML file itself, ensuring that all necessary scripts are contained in the same document for offline access or distribution. This approach eliminates the need for external file references and helps to preserve the integrity of the webpage when transferred or saved.

import com.aspose.html.HTMLDocument;
import com.aspose.html.saving.HTMLSaveOptions;
import com.aspose.html.saving.ResourceHandling;

// Initialize an HTML document from a URL
final HTMLDocument document = new HTMLDocument("https://docs.aspose.com/html/net/message-handlers/");

// Create an HTMLSaveOptions object and set the JavaScript property
HTMLSaveOptions options = new HTMLSaveOptions();
options.getResourceHandlingOptions().setJavaScript(ResourceHandling.Embed);

// Prepare a path to save the downloaded file
String savePath = "rootAndEmbedJs/result.html";

// Save the HTML document to the specified file
document.save(savePath, options);
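
To make the idea of embedding concrete, here is a minimal standard-library sketch of what replacing an external script reference with inline content looks like. It is not how Aspose.HTML implements ResourceHandling.Embed: the class name EmbedScriptSketch, the method embedScripts, and the in-memory map of script sources are all assumptions made for illustration. The regular expression only handles the simplest form of a script tag.

```java
import java.util.Collections;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Conceptual illustration only; not the library's implementation.
final class EmbedScriptSketch {

    // Matches only the simplest form: <script src="..."></script>
    private static final Pattern EXTERNAL_SCRIPT =
            Pattern.compile("<script\\s+src=\"([^\"]+)\"\\s*>\\s*</script>");

    // Replaces each external script reference with an inline <script> element
    // when its source text is known; unknown references are left untouched.
    static String embedScripts(String html, Map<String, String> scriptSources) {
        Matcher matcher = EXTERNAL_SCRIPT.matcher(html);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String source = scriptSources.get(matcher.group(1));
            String replacement = (source == null)
                    ? matcher.group()
                    : "<script>" + source + "</script>";
            // quoteReplacement keeps '$' and '\' in the script text literal
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> sources =
                Collections.singletonMap("app.js", "console.log('hello');");
        String html = "<head><script src=\"app.js\"></script></head>";
        System.out.println(embedScripts(html, sources));
    }
}
```

After embedding, the page no longer depends on a separate .js file, which is the property that makes the saved document portable.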

Save Website using setMaxHandlingDepth()

The MaxHandlingDepth property defines the maximum depth of linked pages that are followed and saved together with the document. Limiting this depth keeps the amount of downloaded content, and the memory and processing time needed during the save, under control. By default, only the open document and its direct resources are saved as an individual web page. Setting this property to 1 saves the document itself and the pages it links to directly.

import com.aspose.html.HTMLDocument;
import com.aspose.html.saving.HTMLSaveOptions;

// Load an HTML document from a URL
final HTMLDocument document = new HTMLDocument("https://docs.aspose.com/html/net/message-handlers/");

// Create an HTMLSaveOptions object and set the MaxHandlingDepth property
HTMLSaveOptions options = new HTMLSaveOptions();
options.getResourceHandlingOptions().setMaxHandlingDepth(1);

// Prepare a path to save the downloaded file
String savePath = "rootAndAdjacent/result.html";

// Save the HTML document to the specified file
document.save(savePath, options);
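
The effect of a depth limit can be illustrated with a small breadth-first traversal over an in-memory link graph. This is a conceptual sketch only and does not reflect the library's internals. The class DepthLimitSketch, the method pagesWithinDepth, and the page names are hypothetical. Depth 0 stands for the document itself and depth 1 for the pages it links to directly.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Conceptual illustration only; not the library's implementation.
final class DepthLimitSketch {

    // Returns every page reachable from 'start' within 'maxDepth' link hops,
    // mapped to the depth at which it was first reached.
    static Map<String, Integer> pagesWithinDepth(
            Map<String, List<String>> links, String start, int maxDepth) {
        Map<String, Integer> depthByPage = new LinkedHashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        depthByPage.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty()) {
            String page = queue.remove();
            int depth = depthByPage.get(page);
            if (depth == maxDepth) {
                continue; // do not follow links beyond the limit
            }
            for (String linked : links.getOrDefault(page, Collections.<String>emptyList())) {
                if (!depthByPage.containsKey(linked)) {
                    depthByPage.put(linked, depth + 1);
                    queue.add(linked);
                }
            }
        }
        return depthByPage;
    }

    public static void main(String[] args) {
        // Hypothetical site structure: index links to a and b, a links to c
        Map<String, List<String>> links = new HashMap<>();
        links.put("index", Arrays.asList("a", "b"));
        links.put("a", Arrays.asList("c"));
        System.out.println(pagesWithinDepth(links, "index", 1).keySet());
    }
}
```

With a limit of 1, page c is never visited because it is two hops away from the starting document.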

Save Website using setPageUrlRestriction()

Aspose.HTML for Java offers various options for filtering page URLs when saving a website. The PageUrlRestriction property controls which web pages can be loaded based on their URLs or domains during the saving process. By default, this property is set to RootAndSubFolders, which means that only pages in the root and its subfolders are processed. You can change this property to apply different restrictions as needed.

In the following example, the document is saved together with the pages it links to directly, provided those pages are located on the same host:

import com.aspose.html.HTMLDocument;
import com.aspose.html.saving.HTMLSaveOptions;
import com.aspose.html.saving.UrlRestriction;

// Initialize an HTML document from a URL
final HTMLDocument document = new HTMLDocument("https://docs.aspose.com/html/net/message-handlers/");

// Create an HTMLSaveOptions object and set MaxHandlingDepth and PageUrlRestriction properties
HTMLSaveOptions options = new HTMLSaveOptions();
options.getResourceHandlingOptions().setMaxHandlingDepth(1);
options.getResourceHandlingOptions().setPageUrlRestriction(UrlRestriction.SameHost);

// Prepare a path to save the downloaded file
String savePath = "rootAndManyAdjacent/result.html";

// Save the HTML document to the specified file
document.save(savePath, options);
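
The difference between the two restrictions mentioned above can be sketched as two URL predicates built on java.net.URI. The exact matching rules used by Aspose.HTML are not described here, so this is one plausible interpretation. The class UrlRestrictionSketch, its method names, and the "sub-page" URL are hypothetical.

```java
import java.net.URI;

// Conceptual illustration only; the library's exact matching rules may differ.
final class UrlRestrictionSketch {

    // SameHost idea: the candidate must be served from the same host as the root document.
    static boolean sameHost(URI root, URI candidate) {
        return root.getHost() != null
                && root.getHost().equalsIgnoreCase(candidate.getHost());
    }

    // RootAndSubFolders idea: same host, and the candidate path lies in the
    // root document's folder or in one of its subfolders.
    static boolean rootAndSubFolders(URI root, URI candidate) {
        if (!sameHost(root, candidate)) {
            return false;
        }
        String rootPath = root.getPath();
        String folder = rootPath.substring(0, rootPath.lastIndexOf('/') + 1);
        return candidate.getPath().startsWith(folder);
    }

    public static void main(String[] args) {
        URI root = URI.create("https://docs.aspose.com/html/net/message-handlers/");
        // Hypothetical page below the root folder
        URI child = URI.create("https://docs.aspose.com/html/net/message-handlers/sub-page/");
        // Page on the same host but outside the root folder
        URI parent = URI.create("https://docs.aspose.com/html/net/");

        System.out.println("child, RootAndSubFolders: " + rootAndSubFolders(root, child));
        System.out.println("parent, RootAndSubFolders: " + rootAndSubFolders(root, parent));
        System.out.println("parent, SameHost: " + sameHost(root, parent));
    }
}
```

Under this reading, switching from RootAndSubFolders to SameHost widens the set of eligible pages from one folder tree to the whole host, which is why it is usually combined with a small MaxHandlingDepth.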

