How to use XPath – Evaluate() method

XPath

XPath (XML Path Language) provides a flexible way of pointing to different parts of an XML-based document using a non-XML syntax. The name XPath derives from the path expression, which addresses nodes hierarchically in a document tree. XPath is a query language: it works on a DOM representation of the document and selects nodes by various criteria, such as element name, position, or attribute value. XPath expressions can be used from C and C++, JavaScript, PHP, Python, and many other languages, as well as in XML technologies such as XML Schema.

XPath is primarily used to navigate the DOM of an XML-based document. In HTML and SVG documents, you can use XPath expressions instead of relying on methods such as GetElementById() or QuerySelectorAll() and other DOM functions.

This article shows how to use the Evaluate() method to navigate an HTML document and select nodes with an XPath query. You will learn how to select all photos from an HTML document using XPath expressions.

You can download the data files and complete C# examples that demonstrate the use of the Evaluate() method for XPath Queries from GitHub.

Evaluate() method

XPath queries are mainly made using the Evaluate() method of the Document class. The Evaluate(expression, contextNode, resolver, type, result) method evaluates an XPath expression with the given parameters and returns a result of the specified type. Let’s consider the method parameters:

expression is a string representation of the XPath to be evaluated.
contextNode specifies the context node for the evaluation of the XPath expression. It’s common to pass the document as the context node.
resolver translates any prefixes within the XPath expression, including the xml namespace prefix, into the appropriate namespace URIs.
type specifies the type of XPathResult to return. If a specific type is requested, the result is returned as that type; with XPathResultType.Any, the result has whatever type the expression naturally produces.
result specifies an existing result object that may be reused and returned by this method. Passing null, the most common choice, creates a new XPathResult.
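
As a rough illustration of how the five parameters fit together, the JavaScript sketch below wraps a call to evaluate() and requests a specific result type instead of the default. The helper name snapshotNodes is hypothetical, and the numeric constant stands in for XPathResult.ORDERED_NODE_SNAPSHOT_TYPE so that the function can also be exercised against a stub outside a browser.

```javascript
// Value of XPathResult.ORDERED_NODE_SNAPSHOT_TYPE in the DOM XPath API.
const ORDERED_NODE_SNAPSHOT_TYPE = 7;

// Hypothetical helper: evaluates `expression` against `doc` and returns
// the matching nodes as a plain array, in document order.
function snapshotNodes(doc, expression) {
  const result = doc.evaluate(
    expression,                 // expression: the XPath string
    doc,                        // contextNode: evaluate relative to the document
    null,                       // resolver: the expression uses no namespace prefixes
    ORDERED_NODE_SNAPSHOT_TYPE, // type: ask for a static, ordered list of nodes
    null                        // result: let the method create a new XPathResult
  );
  const nodes = [];
  for (let i = 0; i < result.snapshotLength; i++) {
    nodes.push(result.snapshotItem(i));
  }
  return nodes;
}
```

In a browser, snapshotNodes(document, "//img") would return every img element on the page as an array.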

XPath Query – Get images from a web page

Often you want to save a large number of images from a web service, such as the photos from a particular album. You could do it by hand, but that is slow and inefficient, so you can use the Aspose.HTML library to automate the process.

In this example, you will learn how to find links to all the desired images on a web page using the Evaluate() method and an XPath expression. XPath is a powerful query language that gives you a lot of freedom to customize queries. Let’s take a look at the HTML document xpath-image.htm. It consists of a header and a footer that contain advertising images, and a main element that contains rows of photos interspersed with advertising banners.

Get links to all images in the document

Let’s start with a straightforward XPath Query for all the images in the document. The following uses the XPath expression //img. It selects all img elements no matter where they are in the document:

XPath Expression

//img

C# code

var result = doc.Evaluate("//img", doc, doc.CreateNSResolver(doc), XPathResultType.Any, null);

JavaScript code

var result = document.evaluate("//img", document, null, XPathResult.ANY_TYPE, null);

This XPath query returns all images in the document: the banners in the header and footer, the photos in the rows, and the banners between and among those rows.

Get rid of banners in the header and footer

First, let’s get rid of the banners in the header and footer. There are many ways to do this, but in this example we filter by parent. The XPath query //main//img returns all img elements nested at any depth inside main elements. This result is closer to what we want, but it still contains extra banners.

XPath Expression

//main//img

C# code

var result = doc.Evaluate("//main//img", doc, doc.CreateNSResolver(doc), XPathResultType.Any, null);

JavaScript code

var result = document.evaluate("//main//img", document, null, XPathResult.ANY_TYPE, null);

Get rid of some banners in the “main” container

In the next step, let’s get rid of the banners in the even div children of the main container. This XPath expression selects all div child elements whose position leaves a remainder of 1 when divided by 2, i.e. the odd ones:

XPath Expression

//main/div[position() mod 2 = 1]//img

C# code

var result = doc.Evaluate("//main/div[position() mod 2 = 1]//img", doc, doc.CreateNSResolver(doc), XPathResultType.Any, null);

JavaScript code

var result = document.evaluate("//main/div[position() mod 2 = 1]//img", document, null, XPathResult.ANY_TYPE, null);

So, we get a list of the photos and banner ads located in all odd div elements that are children of the main container.
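
XPath positions start at 1, not 0, which is why the remainder test keeps the first, third, and fifth div rather than the second and fourth. The following JavaScript sketch mimics that predicate on a plain array. The row and banner labels are made up for illustration and are not taken from xpath-image.htm.

```javascript
// Mimics the XPath predicate [position() mod 2 = 1] on a plain array.
// XPath positions are 1-based, so array index i corresponds to position i + 1.
function keepOddPositions(items) {
  return items.filter((_, index) => (index + 1) % 2 === 1);
}

// Hypothetical div children of <main>, in document order.
const divs = ["photos-1", "banner-1", "photos-2", "banner-2", "photos-3"];
console.log(keepOddPositions(divs)); // the items at positions 1, 3 and 5
```

In the expression itself, position() is counted among the div children of each main element, because the predicate is attached to the div step.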

Get only links to photos from the HTML document

To get rid of the banner ads located among the photos, the XPath expression must also filter by class, because all the photos in the rows have the class photo:

XPath Expression

//main/div[position() mod 2 = 1]//img[@class = 'photo']

C# code

var result = doc.Evaluate("//main/div[position() mod 2 = 1]//img[@class = 'photo']", doc, doc.CreateNSResolver(doc), XPathResultType.Any, null);

JavaScript code

var result = document.evaluate("//main/div[position() mod 2 = 1]//img[@class = 'photo']", document, null, XPathResult.ANY_TYPE, null);

As a result, we got a list containing only links to photos. So the only thing left to do is download them.
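
One caveat worth keeping in mind: @class = 'photo' compares the whole attribute value, so an element written with class="photo large" would not match. If a page uses several classes per image, a common XPath 1.0 idiom is to test for the class as a space-separated token instead. The JavaScript sketch below builds such a predicate. The helper name classTokenPredicate is hypothetical, and the expression used in this article is left unchanged.

```javascript
// Hypothetical helper: builds an XPath 1.0 predicate that matches elements
// whose class attribute contains `className` as a whole space-separated token.
function classTokenPredicate(className) {
  // Guard against characters that would break the quoted XPath string literal.
  if (/['"\s]/.test(className)) {
    throw new Error("class name must not contain quotes or whitespace");
  }
  return `[contains(concat(' ', normalize-space(@class), ' '), ' ${className} ')]`;
}

const expression =
  "//main/div[position() mod 2 = 1]//img" + classTokenPredicate("photo");
console.log(expression);
```

Padding both the attribute value and the token with spaces keeps a class such as photograph from matching by accident.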

C# Example – Get only links to photos from the HTML document

Let’s consider a C# example that uses the Evaluate() method to select all photos from an HTML document using XPath expressions. Follow these steps:

Load an existing HTML file (xpath-image.htm).
Use the Evaluate() method of the Document class and pass the XPath expression and other parameters to it.
Iterate over the resulting nodes and print them to the console.
You will get a list containing only links to photos from the HTML document.

// Use XPath to get only links to photos from HTML
using System;
using System.IO;
using Aspose.Html;
using Aspose.Html.Dom;
using Aspose.Html.Dom.XPath;

// Create an instance of an HTML document
// (DataDir is assumed to be defined elsewhere and to point to the folder with the data files)
using (HTMLDocument doc = new HTMLDocument(Path.Combine(DataDir, "xpath-image.htm")))
{
    // Evaluate the XPath expression
    IXPathResult result = doc.Evaluate("//main/div[position() mod 2 = 1]//img[@class = 'photo']", doc, doc.CreateNSResolver(doc), XPathResultType.Any, null);
    // Iterate over the resulting nodes and print the image sources to the console
    Node node;
    while ((node = result.IterateNext()) != null)
    {
        HTMLImageElement img = (HTMLImageElement)node;
        Console.WriteLine(img.Src);
    }
}
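
For comparison, the same loop might look as follows in browser JavaScript. This is a sketch rather than part of the GitHub example: the function name collectImageSources is hypothetical, and the document is passed in as a parameter so that the function does not depend on a global.

```javascript
// Value of XPathResult.ANY_TYPE; for a node-set expression the result
// is an iterator that can be walked with iterateNext().
const ANY_TYPE = 0;

// Hypothetical helper: returns the src of every image matched by `expression`.
function collectImageSources(doc, expression) {
  const result = doc.evaluate(expression, doc, null, ANY_TYPE, null);
  const sources = [];
  let node;
  // iterateNext() returns null once all matching nodes have been visited
  while ((node = result.iterateNext()) !== null) {
    sources.push(node.src);
  }
  return sources;
}
```

In a page script it could be called as collectImageSources(document, "//main/div[position() mod 2 = 1]//img[@class = 'photo']").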


Aspose.HTML offers free HTML Web Applications that are an online collection of converters, mergers, SEO tools, HTML code generators, URL tools, and more. The applications work on any operating system with a web browser and do not require any additional software installation. It’s a fast and easy way to efficiently and effectively solve your HTML-related tasks.
