Extract Images From Website in Java
If you are a web developer, graphic designer, researcher, journalist, student, or just working on a personal project, you will probably need to collect images from websites at some point. Saving images manually (copying each URL and downloading the files one by one) is time-consuming and inefficient. Instead, you can use the Aspose.HTML for Java library to automate this process and extract images from a website programmatically.
This article explores how to extract different types of images from a website programmatically using Java. With Aspose.HTML for Java, you can build a tool that parses an HTML page, identifies image sources, and downloads those images. This is useful for anyone who needs to collect images for analysis, archiving, or content creation without doing it by hand.
Extract Images from Website
Most pictures in an HTML document are represented using the <img> element. The following code snippet demonstrates how to use Aspose.HTML for Java to find images specified by this element. To download images from a website, take the following steps:
- Use the HTMLDocument constructor to create an instance of HTMLDocument, passing the URL of the web page you want to process.
- Call the getElementsByTagName("img") method to retrieve all <img> elements from the document. The method returns a collection of the <img> elements present on the page.
- Iterate through the <img> elements and use the getAttribute("src") method to get the value of each image's src attribute. Each src is added to the urls set, which removes duplicates.
- Use the Url class along with the document's getBaseURI() method to convert relative image paths into absolute URLs.
- For each absolute image URL, create a request using the RequestMessage(url) constructor and send it using document.getContext().getNetwork().send(request). This returns a ResponseMessage.
- If the response indicates success, extract the image data using response.getContent().readAsByteArray() and save it to your local file system, for example with FileHelper.writeAllBytes() or the JDK's java.nio.file.Files.write().
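To make the relative-to-absolute step concrete outside of Aspose.HTML, here is a minimal sketch that uses the JDK's java.net.URI.resolve() as a stand-in for the Url class and getBaseURI(). The class and method names (ResolveImageUrls, resolveAll) are hypothetical, the sample src values are made up for illustration, and Java 11 or later is assumed.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ResolveImageUrls {

    // Hypothetical helper: a JDK-only stand-in for new Url(src, document.getBaseURI()),
    // not part of Aspose.HTML. Skips null or blank src values and drops duplicates
    // while keeping the order in which URLs were first seen.
    static Set<URI> resolveAll(String baseUri, List<String> srcValues) {
        URI base = URI.create(baseUri);
        Set<URI> result = new LinkedHashSet<>();
        for (String src : srcValues) {
            if (src == null || src.isBlank()) {
                continue; // an <img> without a usable src attribute
            }
            // Relative paths are resolved against the base; absolute URLs pass through unchanged
            result.add(base.resolve(src.trim()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> srcs = Arrays.asList(
                "images/shapes.png", "/img/logo.svg", "images/shapes.png",
                "https://cdn.example.com/pic.jpg", "", null);
        resolveAll("https://docs.aspose.com/svg/net/drawing-basics/svg-shapes/", srcs)
                .forEach(System.out::println);
    }
}
```

URI.create() throws IllegalArgumentException for strings that are not valid URIs (for example, a src containing unencoded spaces), so a production version would need to catch or pre-encode such values.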
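import com.aspose.html.HTMLDocument;
import com.aspose.html.Url;
import com.aspose.html.collections.HTMLCollection;
import com.aspose.html.dom.Element;
import com.aspose.html.net.RequestMessage;
import com.aspose.html.net.ResponseMessage;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class ExtractImages {
    public static void main(String[] args) throws IOException {
        // Open a document you want to download images from
        final HTMLDocument document = new HTMLDocument("https://docs.aspose.com/svg/net/drawing-basics/svg-shapes/");
        try {
            // Collect all <img> elements
            HTMLCollection images = document.getElementsByTagName("img");

            // Create a distinct collection of relative image URLs, skipping <img> elements without a src
            Set<String> urls = new HashSet<>();
            for (Element e : images) {
                String src = e.getAttribute("src");
                if (src != null && !src.isEmpty()) {
                    urls.add(src);
                }
            }

            // Create absolute image URLs by resolving each src against the document's base URI
            List<Url> absUrls = urls.stream()
                    .map(src -> new Url(src, document.getBaseURI()))
                    .collect(Collectors.toList());

            // Download each image
            for (Url url : absUrls) {
                // Create an image request message
                final RequestMessage request = new RequestMessage(url);

                // Send the request through the document's network service
                final ResponseMessage response = document.getContext().getNetwork().send(request);

                // Check whether the response is successful
                if (response.isSuccess()) {
                    // Use the last path segment as the local file name; skip URLs that have none
                    String pathname = url.getPathname();
                    String fileName = pathname.substring(pathname.lastIndexOf('/') + 1);
                    if (fileName.isEmpty()) {
                        continue;
                    }

                    // Save the file to the local file system
                    Files.write(Paths.get(fileName), response.getContent().readAsByteArray());
                }
            }
        } finally {
            // Release resources held by the document
            document.dispose();
        }
    }
}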
This simple and effective solution enables you to automate the image extraction process, saving you valuable time.
Note: Always respect copyright laws and make sure you have the necessary permissions or licenses before using saved images for commercial purposes. We do not support the extraction and use of content from third-party sources for commercial purposes without proper permission.
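Deriving the local file name from the last segment of url.getPathname() needs some care: a naive split("/") followed by split[split.length - 1] throws ArrayIndexOutOfBoundsException for a pathname of "/" and yields an empty file name for an empty pathname. The sketch below shows a more defensive variant; fileNameFromPath and its fallback parameter are hypothetical names, not Aspose.HTML API.

```java
public class ImageFileNames {

    // Hypothetical helper (not Aspose.HTML API): returns the text after the last '/'
    // in a URL path, e.g. "/a/b/logo.png" -> "logo.png". Returns `fallback` when the
    // path is null or that text is empty, as for "", "/" or "/images/".
    static String fileNameFromPath(String pathname, String fallback) {
        if (pathname == null) {
            return fallback;
        }
        String name = pathname.substring(pathname.lastIndexOf('/') + 1);
        return name.isEmpty() ? fallback : name;
    }

    public static void main(String[] args) {
        System.out.println(fileNameFromPath("/svg/net/images/shapes.png", "image-0")); // shapes.png
        System.out.println(fileNameFromPath("/", "image-1"));                         // image-1
        System.out.println(fileNameFromPath("", "image-2"));                          // image-2
    }
}
```

Passing a counter-based fallback such as "image-" + index keeps every successful download, instead of failing or skipping URLs whose path has no usable last segment.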
Extract Icons from Website
Icons in HTML documents are typically defined using <link> elements with the rel="icon" attribute. To extract icons from a website using Aspose.HTML for Java, follow these steps:
- Load the web page using the HTMLDocument constructor, passing in the URL of the website you want to analyze.
- Use the getElementsByTagName("link") method to collect all <link> elements from the document.
- Filter the results to include only elements whose rel attribute is set to "icon", as these define icon links.
- Extract relative URLs by calling getAttribute("href") on each filtered <link> element.
- Create absolute icon URLs using the Url class and the getBaseURI() method of HTMLDocument.
- Send a request for each icon using the RequestMessage class and the document.getContext().getNetwork().send() method.
- Check the response and, if it is successful, save the icon locally, for example with FileHelper.writeAllBytes() or java.nio.file.Files.write().

As a result, every icon referenced by a <link rel="icon"> element in the HTML is downloaded and saved to your local file system.
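The filter described above compares rel with "icon" exactly. In HTML, however, rel holds a space-separated set of keywords that are matched ASCII case-insensitively, and for historical reasons many sites still declare their favicon as rel="shortcut icon", which an exact comparison skips. A minimal sketch of token-based matching, using a hypothetical hasIconToken helper that could replace the equality check inside the filtering loop:

```java
public class IconRel {

    // Hypothetical helper (not Aspose.HTML API): returns true when the space-separated
    // rel value contains the keyword "icon", compared case-insensitively, so "icon",
    // "shortcut icon" and "Icon" all match. "apple-touch-icon" is a different keyword
    // and does NOT match.
    static boolean hasIconToken(String rel) {
        if (rel == null) {
            return false;
        }
        for (String token : rel.trim().split("\\s+")) {
            if (token.equalsIgnoreCase("icon")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] samples = {"icon", "shortcut icon", "Icon", "apple-touch-icon", "stylesheet", null};
        for (String rel : samples) {
            System.out.println(rel + " -> " + hasIconToken(rel));
        }
    }
}
```

In the loop over <link> elements, the condition would become hasIconToken(link.getAttribute("rel")). If you also want Apple touch icons, check for the "apple-touch-icon" keyword separately.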
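import com.aspose.html.HTMLDocument;
import com.aspose.html.Url;
import com.aspose.html.collections.HTMLCollection;
import com.aspose.html.dom.Element;
import com.aspose.html.net.RequestMessage;
import com.aspose.html.net.ResponseMessage;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class ExtractIcons {
    public static void main(String[] args) throws IOException {
        // Open a document you want to download icons from
        final HTMLDocument document = new HTMLDocument("https://docs.aspose.com/html/net/message-handlers/");
        try {
            // Collect all <link> elements
            HTMLCollection links = document.getElementsByTagName("link");

            // Leave only "icon" elements
            Set<Element> icons = new HashSet<>();
            for (Element link : links) {
                if ("icon".equals(link.getAttribute("rel"))) {
                    icons.add(link);
                }
            }

            // Create a distinct collection of relative icon URLs, skipping links without an href
            Set<String> urls = new HashSet<>();
            for (Element icon : icons) {
                String href = icon.getAttribute("href");
                if (href != null && !href.isEmpty()) {
                    urls.add(href);
                }
            }

            // Create absolute icon URLs by resolving each href against the document's base URI
            List<Url> absUrls = urls.stream()
                    .map(href -> new Url(href, document.getBaseURI()))
                    .collect(Collectors.toList());

            // Download each icon
            for (Url url : absUrls) {
                // Create a download request
                final RequestMessage request = new RequestMessage(url);

                // Send the request through the document's network service
                final ResponseMessage response = document.getContext().getNetwork().send(request);

                // Check whether the response is successful
                if (response.isSuccess()) {
                    // Use the last path segment as the local file name; skip URLs that have none
                    String pathname = url.getPathname();
                    String fileName = pathname.substring(pathname.lastIndexOf('/') + 1);
                    if (fileName.isEmpty()) {
                        continue;
                    }

                    // Save the file to the local file system
                    Files.write(Paths.get(fileName), response.getContent().readAsByteArray());
                }
            }
        } finally {
            // Release resources held by the document
            document.dispose();
        }
    }
}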
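Because both examples name each saved file after the last path segment, two URLs that end in the same name (for example, favicon.ico under two different paths or hosts) overwrite each other on disk. A minimal sketch of one way to avoid this, assuming a hypothetical uniqueName helper that appends -1, -2, and so on before the extension:

```java
import java.util.HashSet;
import java.util.Set;

public class UniqueFileNames {

    // Hypothetical helper: returns `name` if it has not been used yet; otherwise inserts
    // "-1", "-2", ... before the extension until an unused name is found.
    // The chosen name is recorded in `used` so later calls see it.
    static String uniqueName(String name, Set<String> used) {
        int dot = name.lastIndexOf('.');
        // Treat a leading dot (".htaccess") as part of the name, not as an extension
        String stem = dot > 0 ? name.substring(0, dot) : name;
        String ext = dot > 0 ? name.substring(dot) : "";
        String candidate = name;
        for (int i = 1; used.contains(candidate); i++) {
            candidate = stem + "-" + i + ext;
        }
        used.add(candidate);
        return candidate;
    }

    public static void main(String[] args) {
        Set<String> used = new HashSet<>();
        System.out.println(uniqueName("favicon.ico", used)); // favicon.ico
        System.out.println(uniqueName("favicon.ico", used)); // favicon-1.ico
        System.out.println(uniqueName("favicon.ico", used)); // favicon-2.ico
        System.out.println(uniqueName("README", used));      // README
    }
}
```

The helper only tracks names chosen during the current run; it does not check for files that already exist in the target directory.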
You can use these Java examples to automate extracting all images from a website, which is helpful for tasks such as archiving, research, analyzing web content, or other personal-use applications. They are also handy for web designers and developers who want to pull images from sites without digging through the page source.