What is this page about?
This page describes how to extract selected content between nodes in a document.
When working with documents, you often need to extract content from a specific range within a document. That content may consist of complex elements such as paragraphs, tables, and images.
Regardless of what needs to be extracted, the extraction method is always determined by the nodes that mark the start and end of the range. These can be entire text bodies or simple text runs.
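There are many possible situations, and therefore many node types to consider, when extracting content. For example, you might want to extract content between two nodes of the same type, such as two paragraphs, two runs, two fields, or the start and end of a bookmark.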
In some situations, you may even need to combine different node types, such as extracting content between a paragraph and a field, or between a run and a bookmark.
This article provides the code implementation for extracting text between different nodes, as well as examples of common scenarios.
Often the goal of extracting content is to duplicate it or save it separately in a new document.
This can be achieved using Aspose.Words and the code implementation below.
The code in this section addresses all of the situations described above with one generalized, reusable method. The technique is outlined below.
To extract content from your document, call the ExtractContent method and pass the appropriate parameters. The method works by finding block-level nodes (paragraphs and tables) and cloning them to create identical copies. If the marker nodes passed are block-level, the method can simply copy the content at that level and add it to the result.
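The block-level case can be pictured with a small toy model. The sketch below is not the Aspose.Words API: `Block`, `index_of`, and `extract_blocks` are hypothetical names, and the document body is assumed to be a plain Python list. It only illustrates cloning the block-level nodes between two markers and how an inclusive flag changes the result.

```python
import copy
from dataclasses import dataclass


@dataclass
class Block:
    """Toy stand-in for a block-level node (a paragraph or a table)."""
    kind: str
    text: str


def index_of(body, node):
    """Locate a node by identity, so equal-looking blocks are not confused."""
    for i, candidate in enumerate(body):
        if candidate is node:
            return i
    raise ValueError("marker is not part of this body")


def extract_blocks(body, start, end, is_inclusive):
    """Return deep copies of the blocks between two block-level markers."""
    start_idx = index_of(body, start)
    end_idx = index_of(body, end)
    if start_idx > end_idx:
        raise ValueError("start must not come after end")
    if not is_inclusive:
        # Skip the markers themselves and keep only what lies between them.
        start_idx += 1
        end_idx -= 1
    # Clones leave the source body untouched.
    return [copy.deepcopy(block) for block in body[start_idx:end_idx + 1]]


body = [
    Block("paragraph", "Intro"),
    Block("table", "T1"),
    Block("paragraph", "Middle"),
    Block("paragraph", "Outro"),
]
inclusive = extract_blocks(body, body[0], body[2], True)
exclusive = extract_blocks(body, body[0], body[2], False)
```

Here `inclusive` holds copies of the first three blocks, while `exclusive` holds only a copy of the table between the two marker paragraphs.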
However, if the marker nodes are inline (children of a paragraph), the situation becomes more complex, because the paragraph must be split at the inline node, be it a run, a bookmark, a field, and so on. Content in the cloned parent nodes that does not lie between the markers is removed. This ensures that the inline nodes retain the formatting of their parent paragraph. The method also checks the nodes passed as parameters and throws an exception if either node is invalid. Its parameters are the start node, the end node, and the Boolean IsInclusive flag, which controls whether the marker nodes themselves are included in the extracted content.
You can find the implementation of the ExtractContent method on the Aspose.Words GitHub. The scenarios in this article refer to this method.
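As a rough illustration of the inline case described above, the following toy sketch clones each parent paragraph and then strips the runs that fall outside the markers, so every clone keeps its paragraph style. It is not the Aspose.Words implementation: `Run`, `Paragraph`, `find_run`, and `extract_between_runs` are hypothetical names, and runs are the only inline nodes modeled.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Run:
    """Toy inline node holding a piece of text."""
    text: str


@dataclass
class Paragraph:
    """Toy block-level node: a style name plus its child runs."""
    style: str
    runs: list = field(default_factory=list)


def find_run(body, run):
    """Return (paragraph index, run index) of a run, matched by identity."""
    for p_idx, para in enumerate(body):
        for r_idx, candidate in enumerate(para.runs):
            if candidate is run:
                return p_idx, r_idx
    raise ValueError("run is not part of this body")


def extract_between_runs(body, start_run, end_run, is_inclusive):
    """Clone the paragraphs spanning two inline markers and trim the clones."""
    start_p, start_r = find_run(body, start_run)
    end_p, end_r = find_run(body, end_run)
    if (start_p, start_r) > (end_p, end_r):
        raise ValueError("start run must not come after end run")
    # First and last run index to keep in the first and last paragraph.
    first_keep = start_r if is_inclusive else start_r + 1
    last_keep = end_r if is_inclusive else end_r - 1
    result = []
    for p_idx in range(start_p, end_p + 1):
        clone = copy.deepcopy(body[p_idx])  # the clone keeps the paragraph style
        lo = first_keep if p_idx == start_p else 0
        hi = last_keep if p_idx == end_p else len(clone.runs) - 1
        clone.runs = clone.runs[lo:hi + 1]  # drop runs outside the markers
        result.append(clone)
    return result


body = [
    Paragraph("Heading 1", [Run("Alpha "), Run("Beta "), Run("Gamma")]),
    Paragraph("Normal", [Run("Delta "), Run("Epsilon")]),
]
start, end = body[0].runs[1], body[1].runs[0]
inclusive = extract_between_runs(body, start, end, True)
exclusive = extract_between_runs(body, start, end, False)
```

In this toy model an exclusive extraction can leave an empty paragraph clone behind (the second paragraph in `exclusive`); whether to drop such clones is a design choice the sketch leaves open.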
We will also define a custom method to easily generate a document from extracted nodes. This method is used in many of the scenarios below and simply creates a new document and imports the extracted content into it.
The following code example shows how to take a list of nodes and insert them into a new document:
… (remaining content unchanged) …
You may need to extract document images to perform some tasks. Aspose.Words allows you to do this as well.
The following code example shows how to extract images from a document:
Q: How do I control whether the start and end marker nodes are included in the extracted content?
A: Pass a Boolean value to the IsInclusive parameter of the ExtractContent method. Set it to true to include the marker nodes (e.g., the whole field or bookmark), or false to exclude them and extract only the content between the markers.
Q: Can I extract content between nodes that belong to different sections of the document?
A: Yes. Retrieve the desired nodes from their respective sections (e.g., using Document.GetChild or Section.FirstParagraph) and pass those nodes to ExtractContent. The method works across sections as long as both nodes belong to the same Document instance.
Q: After extracting nodes, how can I obtain plain text without any formatting or control characters?
A: Create a new Document, import the extracted nodes, then either call newDocument.GetText() for raw text with control characters or newDocument.Save(stream, SaveFormat.Text) to get clean plain‑text output. Using SaveFormat.Text removes formatting and Word control characters.
Q: Why does ExtractContent throw an exception about invalid nodes?
A: This usually occurs when the start or end node is null, belongs to a different Document, or the start node appears after the end node in the document order. Verify that both nodes are non‑null, belong to the same document, and that the start node precedes the end node before calling the method.
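The checks listed in this answer can be sketched as a small pre-flight function. This is a toy model under stated assumptions, not Aspose.Words code: the `Node` class, its `document_id` and `position` fields, and the `verify_markers` and `check` helpers are all hypothetical, and document order is reduced to a single integer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Toy marker node: records its owning document and its place in document order."""
    document_id: str
    position: int


def verify_markers(start: Optional[Node], end: Optional[Node]) -> None:
    """Raise ValueError if the pair of markers cannot define an extraction range."""
    if start is None:
        raise ValueError("start node is missing")
    if end is None:
        raise ValueError("end node is missing")
    if start.document_id != end.document_id:
        raise ValueError("start and end nodes belong to different documents")
    # In this sketch, equal positions (the same node used twice) are allowed.
    if start.position > end.position:
        raise ValueError("start node comes after end node")


def check(start, end):
    """Return None when the markers are valid, otherwise the error message."""
    try:
        verify_markers(start, end)
        return None
    except ValueError as exc:
        return str(exc)


first = Node("doc-a", 2)
last = Node("doc-a", 7)
foreign = Node("doc-b", 3)
```

Running the same three checks on your own start and end nodes before calling the method makes it easier to see which condition is failing.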