Working with OneNote Documents

Extract OneNote Content using DocumentVisitor

Aspose.Note can parse Microsoft Office OneNote documents and extract individual document elements such as pages, images, rich text, outlines, titles, tables, and attached files. Other typical tasks include collecting all of the text in a document or counting its nodes.

Use the DocumentVisitor class to implement these scenarios. This class corresponds to the well-known Visitor design pattern. With DocumentVisitor, you can define and execute custom operations that require enumerating the document tree.

DocumentVisitor provides a set of VisitXXX methods that are invoked when a particular document element (node) is encountered. For example, DocumentVisitor.VisitRichTextStart is called when the visitor reaches a RichText node, and DocumentVisitor.VisitImageEnd is called after the visitor has finished visiting an Image node and all of its child nodes. Each DocumentVisitor.VisitXXX method receives the node it encounters as an argument, so you can use that object as needed.

These are the steps you should follow to programmatically determine and extract various parts of a document:

  1. Create a class derived from DocumentVisitor.
  2. Override and provide implementations for some or all of the DocumentVisitor.VisitXXX methods to perform custom operations.
  3. Call Node.Accept on the node from which you want to start the enumeration. For example, to enumerate the whole document, call Document.Accept.

DocumentVisitor provides default implementations for all of the DocumentVisitor.VisitXXX methods. This makes it easier to create new document visitors as only the methods required for the particular visitor need to be overridden. It is not necessary to override all of the visitor methods.
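The Start/End call order and the default implementations described above can be illustrated without the Aspose.Note library. The following self-contained sketch uses a hypothetical miniature model (MiniVisitor, MiniNode, MiniPage, MiniText, TraceVisitor and VisitorDemo are invented names, not Aspose.Note types) to show a composite node calling its Start method, visiting its children depth-first, and then calling its End method, and a visitor subclass that overrides only the methods it needs. It is an assumption about how such a traversal is typically organized, not a description of Aspose.Note internals.

```csharp
using System.Collections.Generic;

// Hypothetical miniature model for illustration; these are NOT Aspose.Note types.
public abstract class MiniVisitor
{
    // Empty default implementations, so subclasses override only what they need.
    public virtual void VisitPageStart(MiniPage page) { }
    public virtual void VisitPageEnd(MiniPage page) { }
    public virtual void VisitTextStart(MiniText text) { }
    public virtual void VisitTextEnd(MiniText text) { }
}

public abstract class MiniNode
{
    public abstract void Accept(MiniVisitor visitor);
}

public sealed class MiniText : MiniNode
{
    public string Text { get; }
    public MiniText(string text) { Text = text; }

    // A leaf node: Start and End are called back to back.
    public override void Accept(MiniVisitor visitor)
    {
        visitor.VisitTextStart(this);
        visitor.VisitTextEnd(this);
    }
}

public sealed class MiniPage : MiniNode
{
    public List<MiniNode> Children { get; } = new List<MiniNode>();

    public override void Accept(MiniVisitor visitor)
    {
        visitor.VisitPageStart(this);   // before any child is visited
        foreach (MiniNode child in Children)
            child.Accept(visitor);      // depth-first into the children
        visitor.VisitPageEnd(this);     // after all children are visited
    }
}

// Records the callback order; VisitTextEnd is not overridden and stays a no-op.
public sealed class TraceVisitor : MiniVisitor
{
    private readonly List<string> _log = new List<string>();
    public IReadOnlyList<string> Log => _log;

    public override void VisitPageStart(MiniPage page) => _log.Add("PageStart");
    public override void VisitPageEnd(MiniPage page) => _log.Add("PageEnd");
    public override void VisitTextStart(MiniText text) => _log.Add("Text:" + text.Text);
}

public static class VisitorDemo
{
    public static string Run()
    {
        var page = new MiniPage();
        page.Children.Add(new MiniText("Hello"));
        page.Children.Add(new MiniText("World"));

        var visitor = new TraceVisitor();
        page.Accept(visitor);           // start the enumeration at the page node
        return string.Join(",", visitor.Log);
    }
}
```

VisitorDemo.Run() returns "PageStart,Text:Hello,Text:World,PageEnd": the page's Start callback fires first, each text node is visited in order, and the End callback fires last. Because TraceVisitor does not override VisitTextEnd, that callback falls back to the empty base implementation, which is the same convenience DocumentVisitor provides.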

The following code example demonstrates how to use the Visitor pattern to add new operations to the Aspose.Note object model. In this case, we create a simple converter that collects the document's text in plain-text format and counts the visited nodes.

using System;
using System.Text;
using Aspose.Note;

public static void Run()
{
    // The path to the documents directory.
    // RunExamples is a helper class from the Aspose.Note examples project.
    string dataDir = RunExamples.GetDataDir_LoadingAndSaving();

    // Open the document we want to convert.
    Document doc = new Document(dataDir + "Aspose.one");

    // Create an object that inherits from the DocumentVisitor class.
    MyOneNoteToTxtWriter myConverter = new MyOneNoteToTxtWriter();

    // This is the well-known Visitor pattern. Get the model to accept a visitor.
    // The model iterates through itself by calling the corresponding methods
    // on the visitor object (this is called visiting).
    //
    // Note that every node in the object model has the Accept method, so visiting
    // can be executed not only for the whole document but for any node in the document.
    doc.Accept(myConverter);

    // Once visiting is complete, retrieve the result of the operation,
    // which in this example has been accumulated in the visitor.
    Console.WriteLine(myConverter.GetText());
    Console.WriteLine(myConverter.NodeCount);
}

/// <summary>
/// Simple implementation of saving a document in plain-text format. Implemented as a visitor.
/// </summary>
public class MyOneNoteToTxtWriter : DocumentVisitor
{
    public MyOneNoteToTxtWriter()
    {
        nodecount = 0;
        mIsSkipText = false;
        mBuilder = new StringBuilder();
    }

    /// <summary>
    /// Gets the plain text of the document that was accumulated by the visitor.
    /// </summary>
    public string GetText()
    {
        return mBuilder.ToString();
    }

    /// <summary>
    /// Adds text to the current output. Honors the enabled/disabled output flag.
    /// </summary>
    private void AppendText(string text)
    {
        if (!mIsSkipText)
            mBuilder.Append(text);
    }

    /// <summary>
    /// Called when a RichText node is encountered in the document.
    /// </summary>
    public override void VisitRichTextStart(RichText run)
    {
        ++nodecount;
        AppendText(run.Text);
    }

    /// <summary>
    /// Called when a Document node is encountered in the document.
    /// </summary>
    public override void VisitDocumentStart(Document document)
    {
        ++nodecount;
    }

    /// <summary>
    /// Called when a Page node is encountered in the document.
    /// </summary>
    public override void VisitPageStart(Page page)
    {
        ++nodecount;
    }

    /// <summary>
    /// Called when a Title node is encountered in the document.
    /// </summary>
    public override void VisitTitleStart(Title title)
    {
        ++nodecount;
    }

    /// <summary>
    /// Called when an Image node is encountered in the document.
    /// </summary>
    public override void VisitImageStart(Image image)
    {
        ++nodecount;
    }

    /// <summary>
    /// Called when an OutlineGroup node is encountered in the document.
    /// </summary>
    public override void VisitOutlineGroupStart(OutlineGroup outlineGroup)
    {
        ++nodecount;
    }

    /// <summary>
    /// Called when an Outline node is encountered in the document.
    /// </summary>
    public override void VisitOutlineStart(Outline outline)
    {
        ++nodecount;
    }

    /// <summary>
    /// Called when an OutlineElement node is encountered in the document.
    /// </summary>
    public override void VisitOutlineElementStart(OutlineElement outlineElement)
    {
        ++nodecount;
    }

    /// <summary>
    /// Gets the total number of nodes counted by the visitor.
    /// </summary>
    public Int32 NodeCount
    {
        get { return this.nodecount; }
    }

    private readonly StringBuilder mBuilder;
    private bool mIsSkipText;
    private Int32 nodecount;
}

Aspose.Note Document Object Model

Node Classes

When Aspose.Note reads a OneNote document into memory, objects of different types are created to represent the various document elements. Every run of text, every title and table, and even the OneNote document itself is a node. Aspose.Note defines a class for every type of document node.

The following illustration is a UML class diagram that shows the inheritance between the node classes of the Aspose.Note Document Object Model (DOM). The abstract classes are Node and CompositeNode. Note that the Aspose.Note DOM also contains non-node classes, such as ParagraphStyle, Margins, and NumberList, that do not participate in this inheritance and are not shown in the diagram.

The following table lists Aspose.Note node classes with short descriptions.

| Aspose.Note class | Category | Description |
|---|---|---|
| Document | Document | A document object that, as the root of the document tree, provides access to the entire OneNote document. |
| Title | Title | A page title of a OneNote document. |
| Page | Page | A page of a OneNote document. |
| AttachedFile | File | Represents an attached file within the OneNote document. |
| Image | Image | Represents an image file within the OneNote document. |
| OutlineGroup | Outline | Represents a group of outlines. |
| OutlineElement | Outline | Represents an outline element. |
| Outline | Outline | Represents an outline. |
| Table | Tables | A table in a OneNote document. |
| TableCell | Tables | A cell of a table row. |
| TableRow | Tables | A row of a table. |
| RichText | Text | A run of text with consistent formatting. |

The following table lists the Aspose.Note base node classes that form the class hierarchy.

| Class | Description |
|---|---|
| Node | Abstract base class for all nodes of a OneNote document. Provides the basic functionality of a child node. |
| CompositeNode | Base class for nodes that can contain other nodes. Provides operations to access, insert, remove, and select child nodes. |
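The split between Node and CompositeNode follows the Composite design pattern: leaf nodes carry content, while composite nodes own a list of child nodes. The sketch below is a library-independent illustration of that idea under stated assumptions; SketchNode, SketchText, SketchComposite, AddLast, Remove, Descendants and CompositeDemo are hypothetical names invented for this example, and they only loosely mirror the access, insert, remove, and select operations the table attributes to CompositeNode. For brevity, the composite class here is concrete rather than abstract.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical types for illustration; these are NOT the Aspose.Note API.
public abstract class SketchNode
{
    // The composite that currently owns this node, or null for a root or detached node.
    public SketchComposite Parent { get; internal set; }
}

public sealed class SketchText : SketchNode
{
    public string Text { get; }
    public SketchText(string text) { Text = text; }
}

public class SketchComposite : SketchNode
{
    private readonly List<SketchNode> _children = new List<SketchNode>();

    // Access: a read-only view of the direct children.
    public IReadOnlyList<SketchNode> Children => _children;

    // Insert: detach the child from any previous parent, then append it here.
    public void AddLast(SketchNode child)
    {
        child.Parent?.Remove(child);
        child.Parent = this;
        _children.Add(child);
    }

    // Remove: detach a direct child; returns false if it was not a child.
    public bool Remove(SketchNode child)
    {
        if (!_children.Remove(child)) return false;
        child.Parent = null;
        return true;
    }

    // Select: all descendants of type T, in depth-first order.
    public IEnumerable<T> Descendants<T>() where T : SketchNode
    {
        foreach (SketchNode child in _children)
        {
            if (child is T match) yield return match;
            if (child is SketchComposite composite)
                foreach (T nested in composite.Descendants<T>())
                    yield return nested;
        }
    }
}

public static class CompositeDemo
{
    public static string Run()
    {
        var root = new SketchComposite();   // plays the role of a document
        var page = new SketchComposite();   // plays the role of a page
        root.AddLast(page);

        var b = new SketchText("b");
        page.AddLast(new SketchText("a"));
        page.AddLast(b);
        page.AddLast(new SketchText("c"));
        page.Remove(b);                     // "b" is detached from the tree

        return string.Join("|", root.Descendants<SketchText>().Select(t => t.Text));
    }
}
```

CompositeDemo.Run() returns "a|c", because the removed node is no longer reachable when the descendants of the root are selected.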

Distinguish Nodes by NodeType

Although the class of a node is sufficient to distinguish different nodes from each other, Aspose.Note provides the NodeType enumeration to simplify some API tasks, such as selecting nodes of a specific type. The type of each node can be obtained from its NodeType property, which returns a NodeType enumeration value. For example, a RichText node (represented by the RichText class) returns NodeType.RichText, a table node (represented by the Table class) returns NodeType.Table, and so on.

The following code example demonstrates how to use the NodeType enumeration.

using Aspose.Note;

// Create an empty document.
Document doc = new Document();

// Returns NodeType.Document
NodeType type = doc.NodeType;
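An enumeration tag lets code filter or dispatch on node kinds with a single comparison or a switch instead of a chain of type checks. The following sketch illustrates that idea with a hypothetical SketchNodeType enum, TaggedNode class, and NodeTypeDemo helper defined only for this example; with Aspose.Note itself you would compare against the real NodeType values, as in the snippet above.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical enum and node class for illustration; NOT Aspose.Note types.
public enum SketchNodeType { Document, Page, Title, RichText, Table }

public sealed class TaggedNode
{
    public SketchNodeType NodeType { get; }
    public string Name { get; }

    public TaggedNode(SketchNodeType nodeType, string name)
    {
        NodeType = nodeType;
        Name = name;
    }
}

public static class NodeTypeDemo
{
    // Select nodes of one kind with a single comparison on the tag.
    public static List<string> NamesOf(IEnumerable<TaggedNode> nodes, SketchNodeType type) =>
        nodes.Where(n => n.NodeType == type).Select(n => n.Name).ToList();

    // Dispatch on the tag with a switch expression; unlisted kinds fall through to "other".
    public static string Describe(TaggedNode node) => node.NodeType switch
    {
        SketchNodeType.Page => "page",
        SketchNodeType.RichText => "text",
        SketchNodeType.Table => "table",
        _ => "other",
    };
}
```

Given a page node "p1" and two rich-text nodes "t1" and "t2", NamesOf(nodes, SketchNodeType.RichText) yields only "t1" and "t2".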

Create an Empty OneNote Document with a Page Title

Aspose.Note for .NET supports generating OneNote documents from scratch.

Use the code snippet given below to create a new document with a title.

This example works as follows:

  1. Create an object of the Document class.
  2. Initialize a Page object by passing the Document object to its constructor.
  3. Set the properties of the Page.Title object.
  4. Call the Document.AppendChildLast method, passing the Page object.
  5. Finally, save the document by calling the Save method of the Document class.

The following code snippet demonstrates how to create an empty OneNote document with a page title:

using System;
using System.Drawing;
using System.Globalization;
using Aspose.Note;

// The path to the documents directory.
string dataDir = RunExamples.GetDataDir_LoadingAndSaving();

// Create an object of the Document class
Document doc = new Aspose.Note.Document();
// Initialize a Page object
Aspose.Note.Page page = new Aspose.Note.Page(doc);

// Default style for all text in the document.
ParagraphStyle textStyle = new ParagraphStyle { FontColor = Color.Black, FontName = "Arial", FontSize = 10 };
// Set page title properties
page.Title = new Title(doc)
{
    TitleText = new RichText(doc) { Text = "Title text.", ParagraphStyle = textStyle },
    TitleDate = new RichText(doc) { Text = new DateTime(2011, 11, 11).ToString("D", CultureInfo.InvariantCulture), ParagraphStyle = textStyle },
    TitleTime = new RichText(doc) { Text = "12:34", ParagraphStyle = textStyle }
};
// Append the Page node to the document
doc.AppendChildLast(page);

// Save the OneNote document
string outputPath = dataDir + "CreateDocWithPageTitle_out.one";
doc.Save(outputPath);

Getting File Format Information

The Aspose.Note API supports both the Microsoft OneNote desktop and the OneNote Online file formats. Support for the latter is currently limited; for example, attachments and images in OneNote Online documents are not supported. To help you handle these two kinds of documents, the API lets you determine the file format of a OneNote document so that you can account for these limitations until they are fully supported.

The following code example demonstrates how to retrieve the file format information of the OneNote document using the FileFormat property of the Document class.

using Aspose.Note;

// The path to the documents directory.
string dataDir = RunExamples.GetDataDir_LoadingAndSaving();

var document = new Aspose.Note.Document(dataDir + "Aspose.one");
switch (document.FileFormat)
{
    case FileFormat.OneNote2010:
        // Process OneNote 2010
        break;
    case FileFormat.OneNoteOnline:
        // Process OneNote Online
        break;
    default:
        // Other or unrecognized formats
        break;
}