Extract Images From Website in Java

If you are a web developer, graphic designer, researcher, journalist, or student, or are just working on a personal project, you will probably need to collect images from websites at some point. Saving images manually, by copying each URL and downloading the files one by one, is time-consuming and inefficient. Instead, you can use the Aspose.HTML for Java library to automate this process and extract images from a website programmatically.

This article explores how to extract images from a website programmatically using Java. It covers both regular pictures defined with <img> elements and site icons. With Aspose.HTML for Java, you can easily create a tool that parses an HTML page, identifies image sources, and downloads those images. It is a practical solution for anyone who needs to collect images for analysis, archiving, or content creation without the hassle of doing it manually.

Extract Images from Website

Most pictures in an HTML document are represented by the <img> element. The following code snippet demonstrates how to use Aspose.HTML for Java to find images specified by this element. To download these images from a website, take the following steps:

  1. Use the HTMLDocument(Url) constructor to create an instance of HTMLDocument passing the URL of the web page you want to process.
  2. Call the getElementsByTagName("img") method to retrieve all <img> elements from the document. The method returns an HTMLCollection of the <img> elements present on the page.
  3. Iterate through the <img> elements and use the getAttribute("src") method to get the value of each image's src attribute. Each src is added to the urls set, which removes duplicates.
  4. Use the Url class together with the document's getBaseURI() method to convert relative image paths into absolute URLs.
  5. For each absolute image URL, create a request using the RequestMessage(url) constructor and send it using document.getContext().getNetwork().send(request). This returns a ResponseMessage.
  6. If the response indicates success, extract the image data using response.getContent().readAsByteArray() and save it to your local file system using FileHelper.writeAllBytes().
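// Required imports (Aspose.HTML for Java and the JDK)
import com.aspose.html.HTMLDocument;
import com.aspose.html.Url;
import com.aspose.html.collections.HTMLCollection;
import com.aspose.html.dom.Element;
import com.aspose.html.net.RequestMessage;
import com.aspose.html.net.ResponseMessage;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Open a document you want to download images from
final HTMLDocument document = new HTMLDocument("https://docs.aspose.com/svg/net/drawing-basics/svg-shapes/");

// Collect all <img> elements
HTMLCollection images = document.getElementsByTagName("img");

// Create a distinct collection of relative image URLs, skipping <img> elements without a src
Set<String> urls = new HashSet<>();
for (Element e : images) {
    String src = e.getAttribute("src");
    if (src != null && !src.isEmpty()) {
        urls.add(src);
    }
}

// Resolve relative URLs against the document's base URI
List<Url> absUrls = urls.stream()
    .map(src -> new Url(src, document.getBaseURI()))
    .collect(Collectors.toList());

// Download each image
for (Url url : absUrls) {
    // Create an image request message
    final RequestMessage request = new RequestMessage(url);

    // Send the request through the document's network service
    final ResponseMessage response = document.getContext().getNetwork().send(request);

    // Save only successful responses
    if (response.isSuccess()) {
        // Use the last segment of the URL path as the local file name
        String pathname = url.getPathname();
        String path = pathname.substring(pathname.lastIndexOf('/') + 1);
        if (path.isEmpty()) {
            continue; // the URL path ends with "/" and has no file name
        }

        // Save file to a local file system
        // (java.nio.file.Files.write(java.nio.file.Paths.get(path), bytes) is a JDK equivalent)
        FileHelper.writeAllBytes(path, response.getContent().readAsByteArray());
    }
}

// Release resources held by the document
document.dispose();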
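Step 4 relies on the Url(src, baseURI) constructor to turn relative src values into absolute addresses, much as a browser does. If you want to check what that resolution produces for typical src values, the sketch below does the same job with the JDK's java.net.URI.resolve() instead of Aspose's Url class, so it runs without the library. Treat it as an illustration under that assumption, not the library's own logic: ImageUrlSketch and toAbsolute are hypothetical names, and java.net.URI is stricter than a browser's URL parser (it rejects unescaped spaces, for example), so such values are simply skipped.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: JDK-only stand-in for "new Url(src, document.getBaseURI())".
public class ImageUrlSketch {

    // Resolves each src value against the page's base URI, skipping missing or
    // unparsable values and dropping duplicates while keeping document order.
    static List<URI> toAbsolute(String baseUri, List<String> srcValues) {
        URI base = URI.create(baseUri);
        String basePath = base.getPath();
        if (basePath == null || basePath.isEmpty()) {
            // "https://example.com" has an empty path; java.net.URI would resolve
            // "a.png" against it to "https://example.coma.png", so use "/" instead.
            base = base.resolve("/");
        }
        Set<URI> result = new LinkedHashSet<>();
        for (String src : srcValues) {
            if (src == null || src.isBlank()) {
                continue; // <img> without a usable src attribute
            }
            try {
                result.add(base.resolve(src.trim()));
            } catch (IllegalArgumentException e) {
                // java.net.URI rejects unescaped characters such as spaces; skip them
            }
        }
        return new ArrayList<>(result);
    }

    public static void main(String[] args) {
        List<URI> urls = toAbsolute("https://example.com/docs/page/",
                List.of("images/a.png", "/static/b.svg", "images/a.png", "", "https://cdn.example.com/c.jpg"));
        for (URI u : urls) {
            System.out.println(u);
        }
    }
}
```

With a base of https://example.com/docs/page/, the value images/a.png resolves to https://example.com/docs/page/images/a.png, /static/b.svg resolves to https://example.com/static/b.svg, and an already absolute URL is kept unchanged.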

This simple and effective solution enables you to automate the image extraction process, saving you valuable time.

Note: Always respect copyright laws and make sure you have the necessary permissions or licenses before using saved images for commercial purposes. We do not support the extraction and use of content from third-party sources for commercial purposes without proper permission.

Extract Icons from Website

Icons in HTML documents are typically defined using <link> elements with the rel="icon" attribute. To extract icons from a website using Aspose.HTML for Java, follow these steps:

  1. Load the web page using the HTMLDocument(Url) constructor, passing in the URL of the website you want to analyze.
  2. Use the getElementsByTagName("link") method to collect all <link> elements from the document.
  3. Filter the results to include only elements where the rel attribute is set to "icon", as these define icon links.
  4. Extract relative URLs by calling getAttribute("href") on each filtered <link> element.
  5. Create absolute icon URLs using the Url class and the getBaseURI() method of HTMLDocument.
  6. Send a request for each icon using the RequestMessage class and the document.getContext().getNetwork().send() method.
  7. Check the response, and if successful, save the icon locally using FileHelper.writeAllBytes(). As a result, all website icons referenced in the HTML will be downloaded and saved to your local file system.
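// Required imports (Aspose.HTML for Java and the JDK)
import com.aspose.html.HTMLDocument;
import com.aspose.html.Url;
import com.aspose.html.collections.HTMLCollection;
import com.aspose.html.dom.Element;
import com.aspose.html.net.RequestMessage;
import com.aspose.html.net.ResponseMessage;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Open a document you want to download icons from
final HTMLDocument document = new HTMLDocument("https://docs.aspose.com/html/net/message-handlers/");

// Collect all <link> elements
HTMLCollection links = document.getElementsByTagName("link");

// Leave only "icon" elements; rel holds space-separated, case-insensitive tokens,
// so this check also matches rel="shortcut icon"
Set<Element> icons = new HashSet<>();
for (Element link : links) {
    String rel = link.getAttribute("rel");
    if (rel != null && Arrays.stream(rel.trim().split("\\s+")).anyMatch("icon"::equalsIgnoreCase)) {
        icons.add(link);
    }
}

// Create a distinct collection of relative icon URLs
Set<String> urls = new HashSet<>();
for (Element icon : icons) {
    String href = icon.getAttribute("href");
    if (href != null && !href.isEmpty()) {
        urls.add(href);
    }
}

// Resolve relative URLs against the document's base URI
List<Url> absUrls = urls.stream()
    .map(href -> new Url(href, document.getBaseURI()))
    .collect(Collectors.toList());

// Download each icon
for (Url url : absUrls) {
    // Create a downloading request
    final RequestMessage request = new RequestMessage(url);

    // Send the request through the document's network service
    final ResponseMessage response = document.getContext().getNetwork().send(request);

    // Save only successful responses
    if (response.isSuccess()) {
        // Use the last segment of the URL path as the local file name
        String pathname = url.getPathname();
        String path = pathname.substring(pathname.lastIndexOf('/') + 1);
        if (path.isEmpty()) {
            continue; // the URL path ends with "/" and has no file name
        }

        // Save file to a local file system
        FileHelper.writeAllBytes(path, response.getContent().readAsByteArray());
    }
}

// Release resources held by the document
document.dispose();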
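Both examples name each saved file after the last segment of the URL path, so two files such as /a/logo.png and /b/logo.png would overwrite each other, and a URL ending in / has no file name at all. One way to handle this is a small helper that falls back to a default name and appends a numeric suffix on collisions. This is a sketch that assumes you pass in the string returned by url.getPathname(); LocalFileNames and nameFor are hypothetical names, not part of Aspose.HTML.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: turns URL pathnames into unique local file names.
public class LocalFileNames {

    private final Set<String> used = new HashSet<>();
    private final String fallback;

    public LocalFileNames(String fallback) {
        this.fallback = fallback;
    }

    // Takes the last segment of a URL pathname (e.g. "/v2/favicon.ico") as the
    // file name, uses the fallback when that segment is empty, and appends
    // "-1", "-2", ... before the extension if the name was already returned.
    public String nameFor(String pathname) {
        String name = (pathname == null) ? "" : pathname.substring(pathname.lastIndexOf('/') + 1);
        if (name.isEmpty()) {
            name = fallback;
        }
        int dot = name.lastIndexOf('.');
        String stem = dot > 0 ? name.substring(0, dot) : name;
        String ext = dot > 0 ? name.substring(dot) : "";
        String candidate = name;
        // Set.add returns false when the name is taken, so keep trying suffixes
        for (int i = 1; !used.add(candidate); i++) {
            candidate = stem + "-" + i + ext;
        }
        return candidate;
    }

    public static void main(String[] args) {
        LocalFileNames names = new LocalFileNames("download.bin");
        String[] pathnames = {"/favicon.ico", "/v2/favicon.ico", "/icons/", "/"};
        for (String p : pathnames) {
            System.out.println(p + " -> " + names.nameFor(p));
        }
    }
}
```

To use it, create one instance (for example, new LocalFileNames("download.bin")) before the download loop and compute the file name inside if (response.isSuccess()) as String path = names.nameFor(url.getPathname());. For the pathnames /favicon.ico, /v2/favicon.ico, /icons/, and /, it returns favicon.ico, favicon-1.ico, download.bin, and download-1.bin.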

You can use these Java examples to automate extracting images and icons from a website, which can be helpful for tasks such as archiving, research, analyzing web content, or other applications for personal use. They are also useful for web designers and developers who want to pull images from sites without digging through the page source.

Aspose.HTML provides a set of free online HTML Web Applications, including converters, mergers, SEO tools, HTML code generators, URL utilities, and more. These browser-based tools work on any operating system and require no additional software installation. Whether you need to convert or merge files, extract web data, generate HTML code, or analyze pages for SEO, you can do it all right on the web. Streamline your daily tasks and increase your productivity with our easy-to-use HTML Web Apps – anytime, anywhere.
