Working with OneNote Documents
Extract OneNote Content using DocumentVisitor
Aspose.Note can parse Microsoft Office OneNote documents and extract individual document elements such as pages, images, rich text, outlines, titles, tables, and attached files. Other possible tasks include finding all of the text in a document or counting its nodes.
Use the DocumentVisitor class to implement this scenario. This class corresponds to the well-known Visitor design pattern. With DocumentVisitor, you can define and execute custom operations that require enumerating the document tree.
DocumentVisitor provides a set of visitXXX methods that are invoked when a particular document element (node) is encountered. For example, DocumentVisitor.visitRichTextStart is called when a RichText node is reached, and DocumentVisitor.visitImageEnd is called after the visitor has visited all of the Image node's children. Each visitXXX method receives the node it encounters, so you can use that object as needed.
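To make the start/end ordering concrete, here is a minimal sketch in plain Java. It uses a hypothetical, simplified node model (TraceNode, TracePage, TraceText, TraceVisitor) rather than the Aspose.Note classes, and only illustrates the callback order: a composite node's start method fires before its children are visited, and its end method fires after all of them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model for illustration only; these are NOT the Aspose.Note classes.
abstract class TraceNode {
    abstract void accept(TraceVisitor visitor);
}

// Leaf node holding a run of text.
class TraceText extends TraceNode {
    final String text;

    TraceText(String text) { this.text = text; }

    @Override
    void accept(TraceVisitor visitor) {
        visitor.visitTextStart(this);
        visitor.visitTextEnd(this);
    }
}

// Composite node: its start callback fires before its children, its end callback after all of them.
class TracePage extends TraceNode {
    final List<TraceNode> children = new ArrayList<>();

    @Override
    void accept(TraceVisitor visitor) {
        visitor.visitPageStart(this);
        for (TraceNode child : children) {
            child.accept(visitor);
        }
        visitor.visitPageEnd(this);
    }
}

// Visitor that records the order in which the callbacks are invoked.
class TraceVisitor {
    final List<String> calls = new ArrayList<>();

    void visitPageStart(TracePage page) { calls.add("PageStart"); }
    void visitPageEnd(TracePage page) { calls.add("PageEnd"); }
    void visitTextStart(TraceText text) { calls.add("TextStart:" + text.text); }
    void visitTextEnd(TraceText text) { calls.add("TextEnd:" + text.text); }
}
```

For a page containing the texts "a" and "b", calling page.accept(visitor) leaves calls as PageStart, TextStart:a, TextEnd:a, TextStart:b, TextEnd:b, PageEnd.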
Follow these steps to programmatically identify and extract various parts of a document:
- Create a class derived from DocumentVisitor.
- Override some or all of the DocumentVisitor.visitXXX methods to perform custom operations.
- Call [Node.accept](https://reference.aspose.com/note/java/com.aspose.note/node#accept-com.aspose.note.DocumentVisitor) on the node from which you want to start the enumeration. For example, to enumerate the whole document, call accept on the Document object.
DocumentVisitor provides default implementations for all of the visitXXX methods. This makes it easy to create new document visitors: only the methods required for a particular visitor need to be overridden, not all of them.
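The steps above can be sketched with a hypothetical mini model in plain Java (the Mini* classes and TextCollector are illustrative names, not Aspose.Note API). Because every callback in MiniVisitor has an empty default body, TextCollector overrides a single method, and calling accept on one page instead of the document restricts the traversal to that subtree.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mini object model (NOT the Aspose.Note API) used only to illustrate
// default visitor methods and starting the traversal from an arbitrary node.
abstract class MiniNode {
    final List<MiniNode> children = new ArrayList<>();

    // Convenience for building trees in the example; returns this node for chaining.
    MiniNode add(MiniNode child) {
        children.add(child);
        return this;
    }

    abstract void accept(MiniVisitor visitor);

    // Visits all children in document order.
    void acceptChildren(MiniVisitor visitor) {
        for (MiniNode child : children) {
            child.accept(visitor);
        }
    }
}

class MiniDocument extends MiniNode {
    @Override
    void accept(MiniVisitor v) { v.visitDocumentStart(this); acceptChildren(v); v.visitDocumentEnd(this); }
}

class MiniPage extends MiniNode {
    @Override
    void accept(MiniVisitor v) { v.visitPageStart(this); acceptChildren(v); v.visitPageEnd(this); }
}

class MiniRichText extends MiniNode {
    final String text;

    MiniRichText(String text) { this.text = text; }

    @Override
    void accept(MiniVisitor v) { v.visitRichTextStart(this); v.visitRichTextEnd(this); }
}

// Every callback has an empty default body, so subclasses override only what they need.
abstract class MiniVisitor {
    void visitDocumentStart(MiniDocument d) {}
    void visitDocumentEnd(MiniDocument d) {}
    void visitPageStart(MiniPage p) {}
    void visitPageEnd(MiniPage p) {}
    void visitRichTextStart(MiniRichText r) {}
    void visitRichTextEnd(MiniRichText r) {}
}

// Derives from the visitor and overrides a single callback.
class TextCollector extends MiniVisitor {
    final StringBuilder text = new StringBuilder();

    @Override
    void visitRichTextStart(MiniRichText r) { text.append(r.text); }
}
```

With a document holding two pages whose texts are "Hello " and "World", doc.accept(collector) collects "Hello World", while calling accept on the second page alone collects only "World".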
This example shows how to use the Visitor pattern to add new operations to the Aspose.Note object model. In this case, the visitor converts a document to plain text and counts the nodes of the types it handles.
import com.aspose.note.*;
import java.io.IOException;

public class ExtractOneNoteContentUsingDocumentvisitor extends DocumentVisitor {

    private final StringBuilder mBuilder;
    // Output flag. It is never switched on in this example, so all text is collected.
    private final boolean mIsSkipText;
    private int nodeCount;

    public ExtractOneNoteContentUsingDocumentvisitor() {
        nodeCount = 0;
        mIsSkipText = false;
        mBuilder = new StringBuilder();
    }

    // Gets the plain text of the document that was accumulated by the visitor.
    public String getText() {
        return mBuilder.toString();
    }

    // Adds text to the current output. Honors the enabled/disabled output flag.
    private void appendText(String text) {
        if (!mIsSkipText) {
            mBuilder.append(text);
        }
    }

    // Called when a RichText node is encountered in the document.
    @Override
    public void visitRichTextStart(RichText run) {
        ++nodeCount;
        appendText(run.getText());
    }

    // Called when a Document node is encountered in the document.
    @Override
    public void visitDocumentStart(Document document) {
        ++nodeCount;
    }

    // Called when a Page node is encountered in the document.
    @Override
    public void visitPageStart(Page page) {
        ++nodeCount;
    }

    // Called when a Title node is encountered in the document.
    @Override
    public void visitTitleStart(Title title) {
        ++nodeCount;
    }

    // Called when an Image node is encountered in the document.
    @Override
    public void visitImageStart(Image image) {
        ++nodeCount;
    }

    // Called when an OutlineGroup node is encountered in the document.
    @Override
    public void visitOutlineGroupStart(OutlineGroup outlineGroup) {
        ++nodeCount;
    }

    // Called when an Outline node is encountered in the document.
    @Override
    public void visitOutlineStart(Outline outline) {
        ++nodeCount;
    }

    // Called when an OutlineElement node is encountered in the document.
    @Override
    public void visitOutlineElementStart(OutlineElement outlineElement) {
        ++nodeCount;
    }

    // Gets the total number of nodes counted by the visitor
    // (only the node types handled above are counted).
    public int getNodeCount() {
        return nodeCount;
    }

    public static void main(String[] args) throws IOException {
        // Utils is a helper from the examples project that resolves the sample data directory.
        String dataDir = Utils.getSharedDataDir(ExtractOneNoteContentUsingDocumentvisitor.class) + "load/";

        // Open the document we want to convert.
        Document doc = new Document(dataDir + "Sample1.one", new LoadOptions());

        // Create an object that inherits from the DocumentVisitor class.
        ExtractOneNoteContentUsingDocumentvisitor myConverter = new ExtractOneNoteContentUsingDocumentvisitor();

        // This is the well-known Visitor pattern: the model accepts a visitor and
        // iterates through itself, calling the corresponding methods on the visitor
        // object (this is called visiting).
        //
        // Every node in the object model has the accept method, so visiting can be
        // started not only from the whole document but from any node in it.
        doc.accept(myConverter);

        // Once visiting is complete, retrieve the results accumulated in the visitor.
        System.out.println(myConverter.getText());
        System.out.println(myConverter.getNodeCount());
    }
}
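In the example above, mIsSkipText is final and always false, so every RichText is appended. One possible use for such a flag, sketched here on a hypothetical mini model rather than the Aspose.Note classes, is to switch it on in a start callback and off again in the matching end callback, so that text inside a particular kind of node (here, a title) is excluded from the output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mini model (NOT Aspose.Note): a page holds a title and body text.
abstract class SkipNode {
    final List<SkipNode> children = new ArrayList<>();

    abstract void accept(SkipVisitor v);

    void acceptChildren(SkipVisitor v) {
        for (SkipNode c : children) {
            c.accept(v);
        }
    }
}

class SkipTitle extends SkipNode {
    @Override
    void accept(SkipVisitor v) { v.visitTitleStart(this); acceptChildren(v); v.visitTitleEnd(this); }
}

class SkipText extends SkipNode {
    final String text;

    SkipText(String text) { this.text = text; }

    @Override
    void accept(SkipVisitor v) { v.visitRichTextStart(this); }
}

class SkipPage extends SkipNode {
    @Override
    void accept(SkipVisitor v) { acceptChildren(v); }
}

// Default no-op callbacks.
abstract class SkipVisitor {
    void visitTitleStart(SkipTitle t) {}
    void visitTitleEnd(SkipTitle t) {}
    void visitRichTextStart(SkipText r) {}
}

// Body-text extractor: the skip flag is on between visitTitleStart and visitTitleEnd.
class BodyTextExtractor extends SkipVisitor {
    private final StringBuilder builder = new StringBuilder();
    private boolean skipText = false;

    String getText() { return builder.toString(); }

    @Override
    void visitTitleStart(SkipTitle t) { skipText = true; }

    @Override
    void visitTitleEnd(SkipTitle t) { skipText = false; }

    @Override
    void visitRichTextStart(SkipText r) {
        if (!skipText) {
            builder.append(r.text);
        }
    }
}
```

For a page with a title containing "Heading" followed by the body text "Body", the extractor returns only "Body".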
Aspose.Note Document Object Model
Node Classes
When Aspose.Note reads a OneNote document into memory, it creates objects of different types to represent the various document elements. Every run of text, title, table, and even the document itself is a node. Aspose.Note defines a class for every type of document node. The following illustration is a UML class diagram that shows the inheritance between the node classes of the Aspose.Note Document Object Model (DOM). The abstract classes are Node and CompositeNode. Note that the Aspose.Note DOM also contains non-node classes, such as ParagraphStyle, Margins, and NumberList, which do not participate in this inheritance hierarchy and are not shown in the diagram.
The following table lists Aspose.Note node classes and their short descriptions.
Aspose.Note Class | Category | Description |
---|---|---|
Document | Document | A document object that, as the root of the document tree, provides access to the entire OneNote document. |
Title | Title | The title of a page in a OneNote document. |
Page | Page | A page of a OneNote document. |
AttachedFile | File | Represents an attached file within the OneNote document. |
Image | Image | Represents an image file within the OneNote document. |
OutlineGroup | Outline | Represents a group of outlines. |
OutlineElement | Outline | Represents an outline element. |
Outline | Outline | Represents an outline. |
Table | Tables | A table in a OneNote document. |
TableCell | Tables | A cell of a table row. |
TableRow | Tables | A row of a table. |
RichText | Text | A run of text with consistent formatting. |
The following table lists Aspose.Note base node classes that help to form the class hierarchy.
Class | Description |
---|---|
Node | Abstract base class for all nodes of a OneNote document. Provides basic functionality of a child node. |
CompositeNode | Base class for nodes that can contain other nodes. Provides operations to access, insert, remove and select child nodes. |
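The split between Node and CompositeNode is an instance of the Composite pattern. The following plain-Java sketch shows the idea with hypothetical TreeNode and TreeComposite classes; the method names are chosen for illustration, and this is not the Aspose.Note implementation. Leaf nodes only know their parent, while composite nodes also manage an ordered list of children and keep the parent links consistent.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of a Node / CompositeNode hierarchy (Composite pattern).
// This is NOT the Aspose.Note implementation.
abstract class TreeNode {
    private TreeComposite parent;

    TreeComposite getParentNode() { return parent; }

    // Called by TreeComposite when this node is inserted or removed.
    void setParentNode(TreeComposite parent) { this.parent = parent; }
}

abstract class TreeComposite extends TreeNode {
    private final List<TreeNode> children = new ArrayList<>();

    // Read-only view of the children, in order.
    List<TreeNode> getChildNodes() { return Collections.unmodifiableList(children); }

    void appendChildLast(TreeNode child) { insertChild(children.size(), child); }

    void insertChild(int index, TreeNode child) {
        if (child.getParentNode() != null) {
            throw new IllegalArgumentException("node already has a parent");
        }
        children.add(index, child);
        child.setParentNode(this);
    }

    void removeChild(TreeNode child) {
        if (children.remove(child)) {
            child.setParentNode(null);
        }
    }
}

// Leaf node: cannot contain other nodes.
class TreeText extends TreeNode {
    final String text;

    TreeText(String text) { this.text = text; }
}

// Composite node: can contain other nodes.
class TreePage extends TreeComposite {}
```

Keeping insertion and removal inside the composite means a node's parent link can never disagree with the child list that contains it.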
Distinguish Nodes by NodeType
Although a node's class is sufficient to distinguish it from other nodes, Aspose.Note provides the NodeType enumeration to simplify some API tasks, such as selecting nodes of a specific type. The type of each node is exposed as a NodeType enumeration value. For example, a RichText node (represented by the RichText class) returns NodeType.RichText, a table node (represented by the Table class) returns NodeType.Table, and so on.
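Selecting nodes by type amounts to a tree walk that compares each node's type value with the requested one. The sketch below uses a hypothetical KindOfNode enum and KindNode class in place of NodeType and the real node classes; it is meant only to show why an enumeration makes such filtering simple.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a node-type enumeration (NOT Aspose.Note's NodeType).
enum KindOfNode { DOCUMENT, PAGE, TABLE, RICH_TEXT }

class KindNode {
    final KindOfNode type;
    final List<KindNode> children = new ArrayList<>();

    KindNode(KindOfNode type, KindNode... kids) {
        this.type = type;
        for (KindNode k : kids) {
            children.add(k);
        }
    }

    // Depth-first search collecting this node and every descendant of the requested type.
    List<KindNode> select(KindOfNode wanted) {
        List<KindNode> result = new ArrayList<>();
        collect(wanted, result);
        return result;
    }

    private void collect(KindOfNode wanted, List<KindNode> out) {
        if (type == wanted) {
            out.add(this);
        }
        for (KindNode child : children) {
            child.collect(wanted, out);
        }
    }
}
```

A single select method handles every node type, instead of one instanceof check per class.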
Create an Empty OneNote Document with a Page Title
Aspose.Note for Java supports generating OneNote documents from scratch.
This example creates a new document with a page title from scratch. It works as follows:
- Create an object of the Document class.
- Initialize a Page object by passing the Document object to its constructor.
- Build a Title object and set it on the page with the Page.setTitle() method.
- Call the appendChildLast method of the Document object, passing the Page object.
- Finally, save the document by calling the save method of the Document class.
The following code snippet shows you how to create an empty OneNote document with a page title.
// Requires: import com.aspose.note.*; import java.awt.Color; import java.util.Calendar;

// The path to the documents directory (Utils is a helper from the examples project).
String dataDir = Utils.getSharedDataDir(CreateDocWithPageTitle.class);

// Create an object of the Document class
Document doc = new Document();

// Initialize Page class object
Page page = new Page(doc);

// Default style for all text in the document.
ParagraphStyle textStyle = new ParagraphStyle();
textStyle.setFontColor(Color.BLACK);
textStyle.setFontName("Arial");
textStyle.setFontSize(10);

// Set page title properties
Title title = new Title(doc);

RichText titleText = new RichText(doc);
titleText.setText("Title text.");
titleText.setParagraphStyle(textStyle);
title.setTitleText(titleText);

RichText titleDate = new RichText(doc);
Calendar cal = Calendar.getInstance();
// Calendar months are zero-based: Calendar.MAY == 4, so this is May 3, 2018.
cal.set(2018, Calendar.MAY, 3);
titleDate.setText(cal.getTime().toString());
titleDate.setParagraphStyle(textStyle);
title.setTitleDate(titleDate);

RichText titleTime = new RichText(doc);
titleTime.setText("12:34");
titleTime.setParagraphStyle(textStyle);
// Assign the time to the title's time field (not its text field).
title.setTitleTime(titleTime);

page.setTitle(title);

// Append Page node in the document
doc.appendChildLast(page);

dataDir = dataDir + "load/CreateDocWithPageTitle_out.one";

// Save OneNote document
doc.save(dataDir);
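The title date above is built with java.util.Calendar, whose month field is zero-based: a month argument of 4 means May, not April. The following standalone sketch demonstrates this and shows java.time.LocalDate, which numbers months from 1, as a less error-prone way to build the same date before formatting it as text. CalendarMonthCheck is a hypothetical helper for illustration.

```java
import java.time.LocalDate;
import java.util.Calendar;

class CalendarMonthCheck {
    // java.util.Calendar numbers months from 0 (JANUARY) to 11 (DECEMBER),
    // so a month argument of 4 selects May.
    static Calendar legacy() {
        Calendar cal = Calendar.getInstance();
        cal.clear();                        // drop the current time of day
        cal.set(2018, Calendar.MAY, 3);     // same as cal.set(2018, 4, 3)
        return cal;
    }

    // java.time numbers months from 1, which avoids the off-by-one trap.
    static LocalDate modern() {
        return LocalDate.of(2018, 5, 3);
    }
}
```

Using the named constant (Calendar.MAY) or java.time keeps the intended month visible in the code.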
Getting File Format Information
The Aspose.Note API supports the Microsoft OneNote file format as well as the OneNote Online file format. The latter currently has some limitations; for example, attachments and images in OneNote Online documents are not supported. To help users handle these two types of documents, the API provides a way to determine the file format of a OneNote document, so that its limitations are known in advance until they are fully supported.
This article shows how to retrieve the file format of a OneNote document using the FileFormat property of the Document class (getFileFormat() in Java).
String dataDir = Utils.getSharedDataDir(GetFileFormatInfo.class) + "load/";

Document document = new Document(dataDir + "Aspose.one");

// Compare the detected format with the known FileFormat values.
if (document.getFileFormat() == FileFormat.OneNote2010) {
    // Process OneNote 2010
} else if (document.getFileFormat() == FileFormat.OneNoteOnline) {
    // Process OneNote Online
}
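Once the format is known, code can decide up front which features to rely on. The sketch below uses a hypothetical DetectedFormat enum instead of Aspose.Note's FileFormat, and encodes the limitation described above (no attachment or image support for OneNote Online) as a simple capability check. It also shows the usual way to switch on a Java enum, with bare constant names as case labels.

```java
// Hypothetical format enum for illustration (NOT Aspose.Note's FileFormat).
enum DetectedFormat { ONENOTE_2010, ONENOTE_ONLINE }

final class FormatCapabilities {
    private FormatCapabilities() {}

    // Returns whether attachments and images can be relied on for the given format.
    static boolean supportsAttachmentsAndImages(DetectedFormat format) {
        switch (format) {
            case ONENOTE_2010:      // enum case labels use the bare constant name
                return true;
            case ONENOTE_ONLINE:
                return false;
            default:
                throw new IllegalArgumentException("Unknown format: " + format);
        }
    }
}
```

Centralizing the rule in one method means it only has to change in one place once OneNote Online support improves.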