HTML Navigation Using Aspose.HTML for Java

In this article, you will learn how to navigate an HTML document and inspect its elements in detail using the Aspose.HTML for Java API. The API provides a toolset for navigating a document with custom filters, XPath queries, or CSS selectors, so you can build your own applications that analyze, collect, or extract information from HTML documents.

HTML navigation

There are many ways to navigate an HTML document. The simplest is to use the following properties of the Node class, which give access to every node in the DOM tree:

| Property (Java getter) | Description |
|---|---|
| FirstChild (`getFirstChild()`) | Returns the first child node of the node, or null if it has no children. |
| LastChild (`getLastChild()`) | Returns the last child node of the node, or null if it has no children. |
| NextSibling (`getNextSibling()`) | Returns the node that immediately follows this node among its parent's children, or null if there is none. |
| PreviousSibling (`getPreviousSibling()`) | Returns the node that immediately precedes this node among its parent's children, or null if there is none. |
| ChildNodes (`getChildNodes()`) | Returns a list that contains all children of the node. |

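These properties return every node type, including the whitespace text nodes between tags. To move directly between elements, the example below uses the element-only counterparts getFirstElementChild() and getNextElementSibling(), which skip the space between the two <span> elements: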

```java
// Navigate the HTML DOM using Aspose.HTML for Java
import com.aspose.html.HTMLDocument;
import com.aspose.html.dom.Element;

// Prepare HTML code
String htmlCode = "<span>Hello,</span> <span>World!</span>";

// Initialize a document from the prepared code
HTMLDocument document = new HTMLDocument(htmlCode, ".");

// Get the first element child (the first <span>) of the document body
Element element = document.getBody().getFirstElementChild();
System.out.println(element.getTextContent());
// @output: Hello,

// Get the next element sibling (the second <span>), skipping the whitespace text node between them
element = element.getNextElementSibling();
System.out.println(element.getTextContent());
// @output: World!
```
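The difference between the properties in the table and their element-only counterparts is easy to see by walking the same markup both ways. The following minimal sketch uses the JDK's built-in W3C DOM (org.w3c.dom, parsed with javax.xml.parsers) rather than Aspose.HTML, so that it runs without extra dependencies; it assumes the input is well-formed XML, which is why the two <span> elements are wrapped in a <body> element. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class SiblingNavigation {
    // Parses well-formed XML with the JDK's DOM parser (a stand-in for Aspose.HTML's HTMLDocument).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML: " + e.getMessage(), e);
        }
    }

    // Visits every child via getFirstChild()/getNextSibling(), text nodes included.
    static List<String> childNodeNames(Node parent) {
        List<String> names = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            names.add(n.getNodeName());
        }
        return names;
    }

    // Same walk, but keeps only element nodes, mimicking getFirstElementChild()/getNextElementSibling().
    static List<String> childElementTexts(Node parent) {
        List<String> texts = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                texts.add(n.getTextContent());
            }
        }
        return texts;
    }

    public static void main(String[] args) {
        Element body = parse("<body><span>Hello,</span> <span>World!</span></body>").getDocumentElement();
        System.out.println(childNodeNames(body));    // [span, #text, span]
        System.out.println(childElementTexts(body)); // [Hello,, World!]
    }
}
```

The #text entry is the single space between the tags, which is exactly the node that getNextElementSibling() steps over in the Aspose.HTML example.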

For more complex scenarios, such as finding nodes that match a specific pattern (for example, collecting all headings or links), you can use a TreeWalker or NodeIterator object together with a custom NodeFilter implementation.

The following example implements a custom NodeFilter that skips every node except <img> elements:

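```java
// Create a custom NodeFilter that accepts only image elements in Aspose.HTML for Java
import com.aspose.html.dom.Node;
import com.aspose.html.dom.traversal.filters.NodeFilter;

// Declared as a static nested class of your own class (NodeFilterUsageExample in the next example)
public static class OnlyImageFilter extends NodeFilter {
    @Override
    public short acceptNode(Node n) {
        // Accept <img> elements and skip every other node
        return "img".equals(n.getLocalName())
                ? FILTER_ACCEPT
                : FILTER_SKIP;
    }
}
```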
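OnlyImageFilter hard-codes a single tag. A hypothetical generalization, sketched below against the JDK's org.w3c.dom.traversal interfaces rather than Aspose.HTML's NodeFilter class, accepts any tag from a set, which covers cases such as collecting headings and links. It assumes well-formed XML input; TagNameFilter, TagNameFilterDemo, and collect are illustrative names, not part of any library.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.TreeWalker;

// Hypothetical filter: accepts elements whose tag name is in the given set.
class TagNameFilter implements NodeFilter {
    private final Set<String> tagNames;

    TagNameFilter(Set<String> tagNames) {
        this.tagNames = tagNames;
    }

    @Override
    public short acceptNode(Node n) {
        // FILTER_SKIP hides only this node; the walker still descends into its children.
        return n.getNodeType() == Node.ELEMENT_NODE && tagNames.contains(n.getNodeName())
                ? FILTER_ACCEPT
                : FILTER_SKIP;
    }
}

class TagNameFilterDemo {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML: " + e.getMessage(), e);
        }
    }

    // Returns "tag:text" for every element the filter accepts, in document order.
    static List<String> collect(String xml, Set<String> tagNames) {
        Document doc = parse(xml);
        // The JDK's default DOM implementation also implements DocumentTraversal.
        TreeWalker walker = ((DocumentTraversal) doc).createTreeWalker(
                doc, NodeFilter.SHOW_ELEMENT, new TagNameFilter(tagNames), true);
        List<String> found = new ArrayList<>();
        for (Node n = walker.nextNode(); n != null; n = walker.nextNode()) {
            found.add(n.getNodeName() + ":" + n.getTextContent());
        }
        return found;
    }

    public static void main(String[] args) {
        String xml = "<body><h1>Title</h1><p>See <a href='x.html'>docs</a></p><h2>Details</h2></body>";
        System.out.println(collect(xml, Set.of("h1", "h2", "a"))); // [h1:Title, a:docs, h2:Details]
    }
}
```

Because the filter returns FILTER_SKIP rather than FILTER_REJECT for <p>, the walker still descends into it, which is how the nested <a> element is found.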

Once the filter is implemented, you can pass it to a TreeWalker and iterate over the nodes it accepts:

```java
// Filter HTML elements using TreeWalker and a custom NodeFilter in Aspose.HTML for Java
import com.aspose.html.HTMLDocument;
import com.aspose.html.HTMLImageElement;
import com.aspose.html.dom.traversal.ITreeWalker;
import com.aspose.html.dom.traversal.filters.NodeFilter;

// Prepare HTML code
String code = "<p>Hello,</p>\n" +
        "<img src='image1.png'>\n" +
        "<img src='image2.png'>\n" +
        "<p>World!</p>\n";

// Initialize a document based on the prepared code
HTMLDocument document = new HTMLDocument(code, ".");

// Create a TreeWalker that starts at the document root, considers all node types (SHOW_ALL),
// and lets the custom filter decide which nodes to accept
ITreeWalker walker = document.createTreeWalker(document, NodeFilter.SHOW_ALL, new NodeFilterUsageExample.OnlyImageFilter());

// Iterate over the nodes accepted by the filter
while (walker.nextNode() != null) {
    // The filter accepts only <img> elements, so the current node is always an HTMLImageElement
    // and the cast needs no additional check
    HTMLImageElement image = (HTMLImageElement) walker.getCurrentNode();

    System.out.println(image.getSrc());
    // @output: image1.png
    // @output: image2.png
}
```
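The same image filter also works with a NodeIterator, the other traversal object mentioned above. This sketch again uses the JDK's W3C DOM traversal API instead of Aspose.HTML, with the filter written as a lambda; ImageIteratorDemo and imageSources are hypothetical names, and the markup is made well-formed XML (self-closing <img/> tags inside a <body> element) so the JDK parser accepts it.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;

class ImageIteratorDemo {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML: " + e.getMessage(), e);
        }
    }

    // Returns the src attribute of every <img>, visited through a NodeIterator.
    static List<String> imageSources(String xml) {
        Document doc = parse(xml);
        // NodeFilter has a single abstract method, so a lambda can implement it.
        NodeFilter onlyImages = n -> "img".equals(n.getNodeName())
                ? NodeFilter.FILTER_ACCEPT
                : NodeFilter.FILTER_SKIP;
        NodeIterator it = ((DocumentTraversal) doc).createNodeIterator(
                doc, NodeFilter.SHOW_ELEMENT, onlyImages, true);
        List<String> sources = new ArrayList<>();
        for (Node n = it.nextNode(); n != null; n = it.nextNode()) {
            sources.add(((Element) n).getAttribute("src"));
        }
        it.detach(); // release the iterator once traversal is finished
        return sources;
    }

    public static void main(String[] args) {
        String xml = "<body><p>Hello,</p><img src='image1.png'/><img src='image2.png'/><p>World!</p></body>";
        System.out.println(imageSources(xml)); // [image1.png, image2.png]
    }
}
```

A NodeIterator presents the accepted nodes as a flat sequence in document order. Under the DOM Traversal specification, FILTER_REJECT and FILTER_SKIP behave the same for a NodeIterator, whereas for a TreeWalker FILTER_REJECT also hides the rejected node's descendants.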

XPath

An alternative to manual HTML navigation is the XML Path Language (XPath). XPath expressions have a compact syntax that is easy to read and maintain.

The following example shows how to use XPath queries within Aspose.HTML for Java API:

```java
// Select HTML elements using an XPath expression in Aspose.HTML for Java
import com.aspose.html.HTMLDocument;
import com.aspose.html.dom.Node;
import com.aspose.html.dom.xpath.IXPathResult;
import com.aspose.html.dom.xpath.XPathResultType;

// Prepare HTML code
String code = "<div class='happy'>\n" +
        "    <div>\n" +
        "        <span>Hello!</span>\n" +
        "    </div>\n" +
        "</div>\n" +
        "<p class='happy'>\n" +
        "    <span>World!</span>\n" +
        "</p>\n";

// Initialize a document based on the prepared code
HTMLDocument document = new HTMLDocument(code, ".");

// Select all <span> elements that are descendants (at any depth) of elements whose 'class' attribute equals 'happy'
IXPathResult result = document.evaluate("//*[@class='happy']//span",
        document,
        null,
        XPathResultType.Any,
        null
);

// Iterate over the resulting nodes
for (Node node; (node = result.iterateNext()) != null; ) {
    System.out.println(node.getTextContent());
    // @output: Hello!
    // @output: World!
}
```
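The same XPath expression can be tried outside Aspose.HTML with the JDK's javax.xml.xpath package. The sketch below is an illustration under the assumption that the markup is well-formed XML (wrapped in a <body> element); XPathDemo and select are hypothetical names. It also runs a second query that uses the child axis instead of the descendant axis.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class XPathDemo {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML: " + e.getMessage(), e);
        }
    }

    // Evaluates an XPath expression to a node-set and returns the text content of each node.
    static List<String> select(String xml, String expression) {
        Document doc = parse(xml);
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath expression: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<body><div class='happy'><div><span>Hello!</span></div></div>"
                + "<p class='happy'><span>World!</span></p></body>";
        System.out.println(select(xml, "//*[@class='happy']//span")); // [Hello!, World!]
        System.out.println(select(xml, "//*[@class='happy']/span"));  // [World!]
    }
}
```

The second query returns only World! because the outer happy <div> has a <div> child, not a <span> child. Note also that @class='happy' compares the whole attribute value, so an element with class="happy big" would not match; CSS class selectors, by contrast, match individual class tokens.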

CSS Selector

Along with HTML navigation and XPath, you can use the CSS Selector API, which our library also supports. It lets you match elements in the document tree with a search pattern written in CSS selector syntax.

```java
// Select HTML elements using the CSS selector querySelectorAll() method in Aspose.HTML for Java
import com.aspose.html.HTMLDocument;
import com.aspose.html.HTMLElement;
import com.aspose.html.collections.NodeList;

// Prepare HTML code
String code = "<div class='happy'>\n" +
        "    <div>\n" +
        "        <span>Hello,</span>\n" +
        "    </div>\n" +
        "</div>\n" +
        "<p class='happy'>\n" +
        "    <span>World!</span>\n" +
        "</p>\n";

// Initialize a document based on the prepared code
HTMLDocument document = new HTMLDocument(code, ".");

// Select all <span> elements that are descendants (at any depth) of elements with the 'happy' class
NodeList elements = document.querySelectorAll(".happy span");

// Iterate over the resulting list of elements
elements.forEach(element -> {
    System.out.println(((HTMLElement) element).getInnerHTML());
    // @output: Hello,
    // @output: World!
});
```
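To make the meaning of .happy span concrete, the sketch below evaluates that single pattern by hand: it takes every <span> and keeps it if some ancestor carries the happy class token. It is a hypothetical illustration on the JDK's W3C DOM, not how Aspose.HTML implements querySelectorAll, and it handles only the ".class tag" form; DescendantSelectorSketch, hasClass, and matchDescendants are illustrative names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class DescendantSelectorSketch {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML: " + e.getMessage(), e);
        }
    }

    // A class selector matches when cls is one of the whitespace-separated tokens of the class attribute.
    static boolean hasClass(Element e, String cls) {
        for (String token : e.getAttribute("class").trim().split("\\s+")) {
            if (token.equals(cls)) {
                return true;
            }
        }
        return false;
    }

    // Evaluates ".cls tag": every <tag> element that has at least one ancestor with class cls.
    static List<String> matchDescendants(Document doc, String cls, String tag) {
        List<String> texts = new ArrayList<>();
        NodeList candidates = doc.getElementsByTagName(tag); // document order
        for (int i = 0; i < candidates.getLength(); i++) {
            Element candidate = (Element) candidates.item(i);
            // Walk up the ancestor chain; stop at the document node, which is not an element.
            for (Node p = candidate.getParentNode();
                    p != null && p.getNodeType() == Node.ELEMENT_NODE;
                    p = p.getParentNode()) {
                if (hasClass((Element) p, cls)) {
                    texts.add(candidate.getTextContent());
                    break;
                }
            }
        }
        return texts;
    }

    public static void main(String[] args) {
        Document doc = parse("<body><div class='happy big'><div><span>Hello,</span></div></div>"
                + "<p class='happy'><span>World!</span></p><span>Ignored</span></body>");
        System.out.println(matchDescendants(doc, "happy", "span")); // [Hello,, World!]
    }
}
```

The last <span> is excluded because none of its ancestors has the happy class, and the first one is included even though its ancestor's class attribute is "happy big".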

