Aspose.Words Document Object Model (DOM)
The Aspose.Words Document Object Model (DOM) is an in-memory representation of a Word document. The Aspose.Words DOM allows you to programmatically read, manipulate, and modify the content and formatting of a Word document.
This section describes the main classes of the Aspose.Words DOM and their relationships. By using the Aspose.Words DOM classes, you can obtain programmatic access to document elements and formatting.
Create Document Object Tree
When a document is read into the Aspose.Words DOM, an object tree is built, and each type of element in the source document is represented by its own DOM tree object with its own properties.
Build Document Nodes Tree
When Aspose.Words reads a Word document into memory, it creates objects of different types that represent various document elements. Every run of text, paragraph, table, or section is a node, and even the document itself is a node. Aspose.Words defines a class for every document node type.
The document tree in Aspose.Words follows the Composite Design Pattern:
- All node classes ultimately derive from the Node class, which is the base class in the Aspose.Words Document Object Model.
- Nodes that can contain other nodes, for example, Section or Paragraph, derive from the CompositeNode class, which in turn derives from the Node class.
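The two rules above can be sketched with a deliberately simplified model. The classes below (SketchNode, SketchComposite, SketchLeaf) are hypothetical stand-ins invented for illustration, not Aspose.Words classes. They only show the shape of the Composite pattern: one common base type, and one subtype that can hold children.

```java
import java.util.ArrayList;
import java.util.List;

final class CompositeSketch {

    /** Hypothetical stand-in for the Node base class: every node knows its parent. */
    static abstract class SketchNode {
        private final String name;
        private SketchComposite parent;

        SketchNode(String name) { this.name = name; }
        String getName() { return name; }
        SketchComposite getParent() { return parent; }
        void setParent(SketchComposite parent) { this.parent = parent; }
        boolean isComposite() { return false; }
    }

    /** Hypothetical stand-in for CompositeNode: a node that can contain other nodes. */
    static class SketchComposite extends SketchNode {
        private final List<SketchNode> children = new ArrayList<>();

        SketchComposite(String name) { super(name); }
        @Override boolean isComposite() { return true; }
        List<SketchNode> getChildren() { return children; }

        SketchComposite appendChild(SketchNode child) {
            child.setParent(this);
            children.add(child);
            return this;
        }
    }

    /** A leaf node, such as a run of text, that cannot contain children. */
    static class SketchLeaf extends SketchNode {
        SketchLeaf(String name) { super(name); }
    }

    /** Counts a node and all of its descendants through the common base type. */
    static int countNodes(SketchNode node) {
        int count = 1;
        if (node.isComposite()) {
            for (SketchNode child : ((SketchComposite) node).getChildren()) {
                count += countNodes(child);
            }
        }
        return count;
    }

    /** Builds Document > Section > Body > Paragraph > two Runs. */
    static SketchComposite buildSample() {
        SketchComposite paragraph = new SketchComposite("Paragraph")
                .appendChild(new SketchLeaf("Run"))
                .appendChild(new SketchLeaf("Run"));
        SketchComposite body = new SketchComposite("Body").appendChild(paragraph);
        SketchComposite section = new SketchComposite("Section").appendChild(body);
        return new SketchComposite("Document").appendChild(section);
    }

    public static void main(String[] args) {
        System.out.println(countNodes(buildSample())); // prints 6
    }
}
```

Because every element shares one base type, code such as countNodes can walk the whole tree without knowing the concrete class of each node in advance.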
The diagram below shows the inheritance between node classes of the Aspose.Words Document Object Model (DOM). The names of abstract classes are in italics.

Let’s look at an example. The following image shows a Microsoft Word document with different types of content.

When Aspose.Words reads the above document into the DOM, it creates the tree of objects shown in the schema below.

Document, Section, Paragraph, Table, Shape, Run, and all other ellipses on the diagram are Aspose.Words objects that represent elements of the Word document.
Get a Node Type
Although the Node class is sufficient to distinguish different nodes from each other, Aspose.Words provides the NodeType enumeration to simplify some API tasks, such as selecting nodes of a specific type.
The type of each node can be obtained using the NodeType property. This property returns a NodeType value; in the Java API, as the example below shows, getNodeType() returns it as an int constant. For example, a paragraph node represented by the Paragraph class returns NodeType.PARAGRAPH, and a table node represented by the Table class returns NodeType.TABLE.
The following example shows how to get a node type using the NodeType enumeration:
// For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-Java
Document doc = new Document();
// Returns NodeType.DOCUMENT
int type = doc.getNodeType();
Document Tree Navigation
Aspose.Words represents a document as a node tree, which enables you to navigate between nodes. This section describes how to explore and navigate the document tree in Aspose.Words.
When you open the sample document, presented earlier, in the Document Explorer, the node tree appears exactly as it is represented in Aspose.Words.

Document Node Relationships
The nodes in the tree have relationships between them:
- A node containing another node is a parent.
- The node contained in the parent node is a child. Child nodes of the same parent are sibling nodes.
- The root node is always the Document node.
The nodes that can contain other nodes derive from the CompositeNode class, and all nodes ultimately derive from the Node class. These two base classes provide common methods and properties for the tree structure navigation and modification.
The following UML object diagram shows several nodes of the sample document and their relations to each other via the parent, child, and sibling properties:

Document is Node Owner
A node always belongs to a particular document, even if it has just been created or removed from the tree, because vital document-wide structures such as styles and lists are stored in the Document node. For example, it is not possible to have a Paragraph without a Document because each paragraph has an assigned style that is defined globally for the document. This rule applies whenever a new node is created. Adding a new Paragraph directly to the DOM requires a document object passed to the constructor.
When creating a new paragraph using DocumentBuilder, the builder always has a Document object linked to it through the DocumentBuilder.Document property.
The following code example shows that when creating any node, a document that will own the node is always defined:
// Create a new empty document.
Document doc = new Document();
// Creating a new node of any type requires a document passed into the constructor.
Paragraph para = new Paragraph(doc);
// The new paragraph node does not yet have a parent.
System.out.println("Paragraph has no parent node: " + (para.getParentNode() == null));
// But the paragraph node knows its document.
System.out.println("Both nodes' documents are the same: " + (para.getDocument() == doc));
// The fact that a node always belongs to a document allows us to access and modify
// properties that reference the document-wide data such as styles or lists.
para.getParagraphFormat().setStyleName("Heading 1");
// Now add the paragraph to the main text of the first section.
doc.getFirstSection().getBody().appendChild(para);
// The paragraph node is now a child of the Body node.
System.out.println("Paragraph has a parent node: " + (para.getParentNode() != null));
Parent Node
Each node has a parent specified by the ParentNode property. A node has no parent node, that is, ParentNode is null, in the following cases:
- The node has just been created and has not yet been added to the tree.
- The node has been removed from the tree.
- This is the root Document node which always has a null parent node.
You can remove a node from its parent by calling the Remove method.
The following code example shows how to access the parent node:
// Create a new empty document. It has one section.
Document doc = new Document();
// The section is the first child node of the document.
Node section = doc.getFirstChild();
// The section's parent node is the document.
System.out.println("Section parent is the document: " + (doc == section.getParentNode()));
Child Nodes
The most efficient way to access child nodes of a CompositeNode is via the FirstChild and LastChild properties that return the first and last child nodes, respectively. If there are no child nodes, these properties return null.
CompositeNode also provides the ChildNodes collection enabling indexed or enumerated access to the child nodes. The ChildNodes property is a live collection of nodes, which means that whenever the document is changed, such as when nodes are removed or added, the ChildNodes collection is automatically updated.
If a node has no children, the ChildNodes property returns an empty collection. You can check whether a CompositeNode contains any child nodes using the HasChildNodes property.
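To make the term "live collection" concrete, here is a small sketch that uses only java.util classes rather than Aspose.Words itself. The list of strings is a hypothetical stand-in for a node's children. A live view reflects later changes to the underlying data, while a snapshot copied at one moment does not.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class LiveCollectionSketch {

    /**
     * Returns the sizes of a live view and of a snapshot after one more
     * child has been appended to the underlying list.
     */
    static int[] sizesAfterAppend() {
        // Hypothetical stand-in for the children of a composite node.
        List<String> children = new ArrayList<>();
        children.add("Run 1");

        // A live view: it reads through to the backing list on every call.
        List<String> liveView = Collections.unmodifiableList(children);
        // A snapshot: copied once, unaffected by later changes.
        List<String> snapshot = new ArrayList<>(children);

        children.add("Run 2");

        return new int[] { liveView.size(), snapshot.size() };
    }

    public static void main(String[] args) {
        int[] sizes = sizesAfterAppend();
        System.out.println("live=" + sizes[0] + ", snapshot=" + sizes[1]); // prints live=2, snapshot=1
    }
}
```

The practical consequence is that adding or removing nodes while looping over a live collection changes the collection under the loop. Copying the nodes into a separate array or list first is a common way to avoid surprises. Whether your Aspose.Words version offers a helper for this is worth confirming in its API reference.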
The following code example shows how to enumerate immediate child nodes of a CompositeNode using the enumerator provided by the ChildNodes collection:
Document doc = new Document(dataDir + "Document.doc");
Paragraph paragraph = (Paragraph) doc.getChild(NodeType.PARAGRAPH, 0, true);
NodeCollection children = paragraph.getChildNodes();
for (Node child : (Iterable<Node>) children) {
    // Paragraph may contain children of various types such as runs, shapes and so on.
    if (child.getNodeType() == NodeType.RUN) {
        // Say we found the node that we want, do something useful.
        Run run = (Run) child;
        System.out.println(run.getText());
    }
}
The following code example shows how to enumerate immediate child nodes of a CompositeNode using indexed access:
Document doc = new Document(dataDir + "Document.doc");
Paragraph paragraph = (Paragraph) doc.getChild(NodeType.PARAGRAPH, 0, true);
NodeCollection children = paragraph.getChildNodes();
for (int i = 0; i < children.getCount(); i++) {
    Node child = children.get(i);
    // Paragraph may contain children of various types such as runs, shapes and so on.
    if (child.getNodeType() == NodeType.RUN) {
        // Say we found the node that we want, do something useful.
        Run run = (Run) child;
        System.out.println(run.getText());
    }
}
Sibling Nodes
You can obtain the node that immediately precedes or follows a particular node using the PreviousSibling and NextSibling properties, respectively. If a node is the last child of its parent, then the NextSibling property is null. Conversely, if the node is the first child of its parent, the PreviousSibling property is null.
The following code example shows how to efficiently visit all direct and indirect child nodes of a composite node:
public static void main(String[] args) throws Exception {
    String dataDir = Utils.getSharedDataDir(ChildNodes.class) + "DocumentObjectModel/";
    recurseAllNodes(dataDir);
}

public static void recurseAllNodes(String dataDir) throws Exception {
    // Open a document.
    Document doc = new Document(dataDir + "Node.RecurseAllNodes.doc");
    // Invoke the recursive function that will walk the tree.
    traverseAllNodes(doc);
}

/**
 * A simple function that will walk through all children of a specified node
 * recursively and print the type of each node to the screen.
 */
public static void traverseAllNodes(CompositeNode parentNode) throws Exception {
    // This is the most efficient way to loop through immediate children of a node.
    for (Node childNode = parentNode.getFirstChild(); childNode != null; childNode = childNode.getNextSibling()) {
        // Do some useful work.
        System.out.println(Node.nodeTypeToString(childNode.getNodeType()));
        // Recurse into the node if it is a composite node.
        if (childNode.isComposite())
            traverseAllNodes((CompositeNode) childNode);
    }
}
Typed Access to Child and Parent Nodes
So far, we have discussed the properties that return one of the base types, Node or CompositeNode. Sometimes, however, you need to cast these values to a specific node class, such as Run or Paragraph. Casting cannot be avoided entirely when working with a composite DOM such as that of Aspose.Words.
To reduce the need for casting, most Aspose.Words classes offer properties and collections that provide strongly typed access. There are three basic patterns of typed access:
- A parent node exposes typed FirstXXX and LastXXX properties. For example, the Document has FirstSection and LastSection properties. Similarly, Table has properties such as FirstRow, LastRow, and others.
- A parent node exposes a typed collection of child nodes, such as Document.Sections, Body.Paragraphs, and others.
- A child node provides typed access to its parent, such as Run.ParentParagraph, Paragraph.ParentSection, and others.
Typed properties are merely useful shortcuts that sometimes provide easier access than the generic properties, such as Node.ParentNode and CompositeNode.FirstChild.
The following code example shows how to use typed properties to access nodes of the document tree:
Document doc = new Document();
// Quick typed access to the first child Section node of the Document.
Section section = doc.getFirstSection();
// Quick typed access to the Body child node of the Section.
Body body = section.getBody();
// Quick typed access to all Table child nodes contained in the Body.
TableCollection tables = body.getTables();
for (Table table : tables) {
    // Quick typed access to the first row of the table.
    if (table.getFirstRow() != null)
        table.getFirstRow().remove();
    // Quick typed access to the last row of the table.
    if (table.getLastRow() != null)
        table.getLastRow().remove();
}