Working with Table of Contents
Insert and Work with the Table of Contents Field
Often you will work with documents containing a table of contents (TOC). Using Aspose.Words you can insert your own table of contents or completely rebuild the existing table of contents in the document using just a few lines of code.
This article outlines how to work with the table of contents field and demonstrates:
- How to insert a brand new TOC.
- How to update new or existing TOCs in the document.
- How to specify switches to control the formatting and overall structure of the TOC.
- How to modify the styles and appearance of the table of contents.
- How to remove an entire TOC field along with all entries from the document.
Insert a Table of Contents Programmatically
The DocumentBuilder.insertTableOfContents(java.lang.String) method is called to insert a TOC field into the document at the current position of the DocumentBuilder.
A table of contents in a Word document can be built in several ways and formatted using a variety of options. The field switches that you pass to the method control the way the table is built and displayed in your document.
The default switches used by a TOC inserted in Microsoft Word are \o "1-3" \h \z \u. Descriptions of these switches, as well as a list of supported switches, can be found later in this article. You can either use that guide to obtain the correct switches or, if you already have a document containing a TOC similar to the one you want, show the field codes (ALT+F9) and copy the switches directly from the field.
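When the switches are written in Java source, each backslash and each inner quotation mark has to be escaped. The sketch below only illustrates that escaping; the class and the headingSwitches helper are hypothetical and not part of Aspose.Words.

```java
final class TocSwitchString {
    // The default Microsoft Word switches as they appear in a field code: \o "1-3" \h \z \u
    // In Java source a backslash is written as \\ and an inner quote as \".
    static final String DEFAULT_SWITCHES = "\\o \"1-3\" \\h \\z \\u";

    // Hypothetical helper: the same switches with a configurable heading level range.
    static String headingSwitches(int fromLevel, int toLevel) {
        return "\\o \"" + fromLevel + "-" + toLevel + "\" \\h \\z \\u";
    }

    public static void main(String[] args) {
        System.out.println(DEFAULT_SWITCHES);      // prints: \o "1-3" \h \z \u
        System.out.println(headingSwitches(1, 2)); // prints: \o "1-2" \h \z \u
    }
}
```

The resulting string is what would be passed to DocumentBuilder.insertTableOfContents.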
The following code example shows how to insert a table of contents (TOC) field into a document using heading styles as entries.
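A minimal sketch of such an example is given here, assuming Aspose.Words for Java is on the classpath. The class name, the sample heading text and the output file name are illustrative choices, not taken from the original sample.

```java
import com.aspose.words.BreakType;
import com.aspose.words.Document;
import com.aspose.words.DocumentBuilder;
import com.aspose.words.StyleIdentifier;

final class InsertTocSketch {
    static Document buildDocumentWithToc() throws Exception {
        Document doc = new Document();
        DocumentBuilder builder = new DocumentBuilder(doc);

        // Insert the TOC field at the beginning of the blank document.
        builder.insertTableOfContents("\\o \"1-3\" \\h \\z \\u");
        // Start the actual content on a new page.
        builder.insertBreak(BreakType.PAGE_BREAK);

        // Sample content; the heading styles mark what the TOC should pick up.
        builder.getParagraphFormat().setStyleIdentifier(StyleIdentifier.HEADING_1);
        builder.writeln("Heading 1");
        builder.getParagraphFormat().setStyleIdentifier(StyleIdentifier.HEADING_2);
        builder.writeln("Heading 1.1");
        builder.getParagraphFormat().setStyleIdentifier(StyleIdentifier.HEADING_1);
        builder.writeln("Heading 2");

        // Populate the newly inserted TOC.
        doc.updateFields();
        doc.updatePageLayout();
        return doc;
    }

    public static void main(String[] args) throws Exception {
        buildDocumentWithToc().save("InsertTableOfContents.docx");
    }
}
```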
In the code, a new table of contents is inserted into a blank document. The DocumentBuilder class is then used to insert some sample content formatted with the appropriate heading styles, which mark the content to be included in the TOC. The next lines populate the TOC by updating the fields and the page layout of the document.
Without these calls, the output document would contain a TOC field with no visible content when opened. This is because the TOC field has been inserted but is not populated until it is updated in the document. This is discussed further in the next section.
Updating the Table of Contents
Aspose.Words allows you to completely update a TOC with only a few lines of code. This can be done to populate a newly inserted TOC or to update an existing TOC after changes to the document have been made.
The following two methods must be called to update the TOC fields in the document: Document.updateFields() and Document.updatePageLayout().
Note that these two update methods must be called in that order. If the order is reversed, the table of contents will be populated but no page numbers will be displayed. Any number of different TOCs can be updated; these methods automatically update all TOCs found in the document.
The following code example shows how to completely rebuild TOC fields in the document by invoking field updates.
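The two calls can be sketched as follows, assuming Aspose.Words for Java is available. The class name, the rebuildTocs helper and the file names are hypothetical.

```java
import com.aspose.words.Document;

final class UpdateTocSketch {
    // Rebuilds every TOC in the document; the order of the two calls matters.
    static void rebuildTocs(Document doc) throws Exception {
        doc.updateFields();      // builds the TOC entries
        doc.updatePageLayout();  // lays out the document so page numbers can be resolved
    }

    public static void main(String[] args) throws Exception {
        Document doc = new Document("Input.docx"); // hypothetical input file containing a TOC
        rebuildTocs(doc);
        doc.save("Output.docx");
    }
}
```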
The first call, to Document.updateFields(), builds the TOC: all text entries are populated and the TOC appears almost complete. The only thing missing is the page numbers, which for now are displayed as “?”.
The second call to Document.updatePageLayout() will build the layout of the document in memory. This needs to be done to gather the page numbers of the entries. The correct page numbers calculated from this call are then inserted into the TOC.
Using Switches to Control the Behavior of the Table of Contents
As with any other field, the TOC field can accept switches defined within the field code that control how the table of contents is built. Certain switches control which entries are included and at what level, while others control the appearance of the TOC. Switches can be combined to produce a complex table of contents.
The \o "1-3" \h \z \u switches are included by default when a TOC is inserted in Microsoft Word. A TOC with no switches will include content from the built-in heading styles (as if the \O switch is set).
The available TOC switches that are supported by Aspose.Words are listed below and their uses are described in detail. They can be divided into separate sections based on their type. The switches in the first section define what content to include in the TOC and the switches in the second section control the appearance of the TOC.
If a switch is not listed here then it is currently unsupported. All switches will be supported in future versions. We are adding further support to every release.
Entry Marking Switches
Use Heading Styles (\O Switch)
This switch defines that the TOC should be built from the built-in heading styles. In Microsoft Word, these are Heading 1 through Heading 9. In Aspose.Words these styles are represented by the corresponding values of the StyleIdentifier enumeration, which is a locale-independent identifier of a style; for example, StyleIdentifier.HEADING_1 represents the Heading 1 style. Using this identifier, the corresponding Style object, with its formatting and properties, can be retrieved from the collection returned by Document.getStyles().
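Use Outline Levels (\U Switch)
Each paragraph can define an outline level under Paragraph options. The \U switch includes paragraphs in the TOC according to that outline level.
Note that built-in heading styles such as Heading 1 always have an outline level set in their style settings.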
Use Custom Styles (\T Switch)
This switch allows custom styles to be used when collecting entries for the TOC. It is often used in conjunction with the \O switch to include custom styles along with built-in heading styles in the TOC.
For example, the field code { TOC \o "1-3" \t "CustomHeading1, 1, CustomHeading2, 2" } will use content styled with CustomHeading1 as level 1 content in the TOC and CustomHeading2 as level 2.
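The argument of the \t switch is a list of style name and level pairs. As a rough illustration, the pairs can be assembled in code before being appended to the other switches. The helper below is hypothetical and assumes a comma as the list separator (Word takes the separator from the regional settings, so it may be a semicolon on some systems).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class CustomStyleSwitch {
    // Hypothetical helper: turns "style name -> TOC level" pairs into a \t switch.
    static String customStylesSwitch(Map<String, Integer> styleLevels) {
        StringJoiner pairs = new StringJoiner(", ");
        for (Map.Entry<String, Integer> entry : styleLevels.entrySet()) {
            pairs.add(entry.getKey()).add(String.valueOf(entry.getValue()));
        }
        return "\\t \"" + pairs + "\"";
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the insertion order of the styles.
        Map<String, Integer> styles = new LinkedHashMap<>();
        styles.put("CustomHeading1", 1);
        styles.put("CustomHeading2", 2);
        System.out.println("\\o \"1-3\" " + customStylesSwitch(styles));
        // prints: \o "1-3" \t "CustomHeading1, 1, CustomHeading2, 2"
    }
}
```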
Use TC Fields (\F and \L Switches)
In older versions of Microsoft Word, the only way to build a TOC was to use TC fields. These fields are inserted into the document as hidden text and stay hidden even when field codes are shown. They include the text that should be displayed in the entry, and the TOC is built from them. This functionality is not used very often now but may still be useful on some occasions to include entries in the TOC which are not intended to be visible in the document.
These fields can be inserted into a document at any position like any other field and are represented by the FieldType.FIELD_TOC_ENTRY enumeration value.
The \F switch specifies that only TC fields are used as entries. For example, the field code { TOC \f t } will only include TC fields carrying the same identifier, such as { TC "Entry Text" \f t }.
The TOC field also has a related switch: the \L switch specifies that only TC fields with levels within the specified range are included.
Appearance Related Switches
Omit Page Numbers (\N Switch)
This switch is used to hide page numbers for certain levels of the TOC. For example, you can define { TOC \n 3-4 } and the page numbers on the entries of levels 3 and 4 will be hidden along with the leader dots (if there are any). To specify only one level, a range should still be used; for example, “1-1” will exclude page numbers only for the first level.
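The rule that a single level is still written as a range can be captured in a small helper. This is only a sketch; the class and method names are hypothetical, and the 1 to 9 bounds reflect the nine TOC levels available in Word.

```java
final class OmitPageNumbersSwitch {
    // Hypothetical helper: builds the \n switch for a level range; a single level is "n-n".
    static String omitPageNumbers(int fromLevel, int toLevel) {
        if (fromLevel < 1 || toLevel > 9 || fromLevel > toLevel) {
            throw new IllegalArgumentException("Levels must satisfy 1 <= from <= to <= 9");
        }
        return "\\n " + fromLevel + "-" + toLevel;
    }

    public static void main(String[] args) {
        System.out.println(omitPageNumbers(3, 4)); // prints: \n 3-4
        System.out.println(omitPageNumbers(1, 1)); // prints: \n 1-1
    }
}
```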
Insert As Hyperlinks (\H Switch)
This switch specifies that TOC entries are inserted as hyperlinks. When viewing a document in Microsoft Word these entries will still appear as normal text inside the TOC but are hyperlinked and thus can be used to navigate to the position of the original entry in the document by using Ctrl + Left Click in Microsoft Word. When this switch is included then these links are also preserved in other formats. For instance, in HTML based formats including EPUB and rendered formats such as PDF and XPS, these will be exported as working links.
Set Separator Character (\P Switch)
This switch allows the content separating the title of the entry and the page number to be easily changed in the TOC. The separator to use should be specified after this switch and enclosed in quotation marks.
Preserve Tab Entries (\W Switch)
Using this switch will specify that any entries that have a tab character, for instance, a heading that has a tab at the end of the line, will be retained as a proper tab character when populating the TOC. This means the function of the tab character will be present in the TOC and can be used to format the entry. For example, certain entries may use tab stops and tab characters to evenly space out the text. As long as the corresponding TOC level defines the equivalent tab stops then the generated TOC entries will appear with similar spacing.
Preserve New Line Entries (\X Switch)
Similar to the switch above, this switch specifies that headings spanning multiple lines (using newline characters, not separate paragraphs) will be preserved as they are in the generated TOC. For example, a heading which is to be spread across multiple lines can use the new line character (Shift + Enter or ControlChar.LINE_BREAK) to separate content across different lines. With this switch specified, the entry in the TOC will preserve these new line characters.
Insert TC Fields
You can insert a new TC field at the current position of the DocumentBuilder by calling the DocumentBuilder.insertField method and specifying the field name as “TC” along with any switches that are needed.
The following code example shows how to insert a TC field into the document using DocumentBuilder.
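A minimal sketch, assuming Aspose.Words for Java is on the classpath. The entry text, the "t" identifier, the class name and the output file name are illustrative.

```java
import com.aspose.words.Document;
import com.aspose.words.DocumentBuilder;

final class InsertTcFieldSketch {
    static Document buildDocumentWithTcField() throws Exception {
        Document doc = new Document();
        DocumentBuilder builder = new DocumentBuilder(doc);

        // Insert a TC field at the current builder position.
        // "Entry Text" is what a TOC built with the matching \f switch would display.
        builder.insertField("TC \"Entry Text\" \\f t");
        return doc;
    }

    public static void main(String[] args) throws Exception {
        buildDocumentWithTcField().save("InsertTcField.docx");
    }
}
```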
Often a specific line of text is designated for the TOC and is marked with a TC field. The easy way to do this in MS Word is to highlight the text and press ALT+SHIFT+O. This automatically creates a TC field using the selected text. The same technique can be accomplished through code. The code below will find text matching the input and insert a TC field in the same position as the text. The code is based on the same technique used in the article. The following code example shows how to find and insert a TC field at the text in a document.
Modify a Table of Contents
Change the Formatting of Styles
The formatting of entries in the TOC does not use the original styles of the marked entries; instead, each level is formatted using an equivalent TOC style. For example, the first level in the TOC is formatted with the TOC1 style, the second level with the TOC2 style and so on. This means that to change the look of the TOC these styles must be modified. In Aspose.Words these styles are represented by the locale-independent StyleIdentifier.TOC_1 through StyleIdentifier.TOC_9 and can be retrieved from the collection returned by Document.getStyles() using these identifiers.
Once the appropriate style of the document has been retrieved the formatting for this style can be modified. Any changes to these styles will be automatically reflected in the TOCs in the document.
The following code example changes a formatting property used in the first level TOC style.
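As a sketch of such a change, assuming Aspose.Words for Java, the snippet below makes first-level TOC entries bold. The class and method names and the file names are hypothetical.

```java
import com.aspose.words.Document;
import com.aspose.words.Style;
import com.aspose.words.StyleIdentifier;

final class TocStyleSketch {
    // Makes entries of the first TOC level bold by editing the TOC 1 style.
    static void makeFirstLevelBold(Document doc) throws Exception {
        Style toc1 = doc.getStyles().getByStyleIdentifier(StyleIdentifier.TOC_1);
        toc1.getFont().setBold(true);
    }

    public static void main(String[] args) throws Exception {
        Document doc = new Document("Input.docx"); // hypothetical input file
        makeFirstLevelBold(doc);
        doc.save("Output.docx");
    }
}
```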
It is also useful to note that any direct formatting of a paragraph (defined on the paragraph itself and not in the style) marked for inclusion in the TOC will be copied over to the entry in the TOC. For example, if the Heading 1 style is used to mark content for the TOC and this style has bold formatting, while the paragraph also has italic formatting directly applied to it, the resulting TOC entry will not be bold, as that is part of the style formatting, but it will be italic, as this is formatted directly on the paragraph.
You can also control the formatting of the separators used between each entry and the page number. By default, this is a dotted line that is spread across to the page numbering using a tab character and a right tab stop lined up close to the right margin.
Using the Style class retrieved for the particular TOC level you want to modify, you can also modify how these appear in the document.
To change how this appears, first call Style.getParagraphFormat() to retrieve the paragraph formatting for the style. From this, the tab stops can be retrieved by calling ParagraphFormat.getTabStops() and the appropriate tab stop modified. Using this same technique, the tab itself can be moved or removed altogether.
The following code example shows how to modify the position of the right tab stop in TOC related paragraphs.
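One possible sketch, assuming Aspose.Words for Java and a document whose TOC is already populated. It moves the first tab stop defined on each TOC paragraph 50 points to the left; the offset, the assumption that the first tab stop is the right-aligned one, and all names are illustrative.

```java
import com.aspose.words.Document;
import com.aspose.words.Node;
import com.aspose.words.NodeType;
import com.aspose.words.Paragraph;
import com.aspose.words.StyleIdentifier;
import com.aspose.words.TabStop;
import com.aspose.words.TabStopCollection;

final class TocTabStopSketch {
    // Returns the number of TOC paragraphs whose tab stop was moved.
    static int shiftTocTabStops(Document doc, double offset) throws Exception {
        int changed = 0;
        for (Node node : doc.getChildNodes(NodeType.PARAGRAPH, true).toArray()) {
            Paragraph para = (Paragraph) node;
            int styleId = para.getParagraphFormat().getStyle().getStyleIdentifier();
            // Only paragraphs formatted with one of the TOC 1 to TOC 9 styles are of interest.
            if (styleId < StyleIdentifier.TOC_1 || styleId > StyleIdentifier.TOC_9) continue;

            TabStopCollection tabs = para.getParagraphFormat().getTabStops();
            if (tabs.getCount() == 0) continue; // no tab stop defined on this paragraph

            // Replace the first tab stop with one at the shifted position.
            TabStop tab = tabs.get(0);
            tabs.removeByPosition(tab.getPosition());
            tabs.add(tab.getPosition() - offset, tab.getAlignment(), tab.getLeader());
            changed++;
        }
        return changed;
    }

    public static void main(String[] args) throws Exception {
        Document doc = new Document("Input.docx"); // hypothetical file with a populated TOC
        shiftTocTabStops(doc, 50);
        doc.save("Output.docx");
    }
}
```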
Removing a Table of Contents from the Document
A table of contents can be removed from the document by removing all nodes found between the FieldStart and FieldEnd nodes of the TOC field.
The code below demonstrates this. The removal of the TOC field is simpler than that of a normal field as we do not keep track of nested fields. Instead, we check whether a FieldEnd node is of type FieldType.FIELD_TOC, which means we have encountered the end of the current TOC. This technique can be used here without worrying about nested fields because we can assume that a properly formed document will have no TOC field fully nested inside another TOC field.
Firstly the FieldStart nodes of each TOC are collected and stored. The specified TOC is then enumerated so all nodes within the field are visited and stored. The nodes are then removed from the document. The following code example demonstrates how to remove a specified TOC from a document.
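The steps described above can be sketched as follows, assuming Aspose.Words for Java. Unlike the description, this sketch also removes the closing FieldEnd node itself so that no dangling field end is left behind; the class name and file names are hypothetical.

```java
import com.aspose.words.Document;
import com.aspose.words.FieldEnd;
import com.aspose.words.FieldStart;
import com.aspose.words.FieldType;
import com.aspose.words.Node;
import com.aspose.words.NodeType;
import java.util.ArrayList;
import java.util.List;

final class RemoveTocSketch {
    // Removes the TOC with the given zero-based index from the document.
    static void removeTableOfContents(Document doc, int index) throws Exception {
        // Collect the FieldStart node of every TOC field.
        List<FieldStart> tocStarts = new ArrayList<>();
        for (Node node : doc.getChildNodes(NodeType.FIELD_START, true).toArray()) {
            FieldStart start = (FieldStart) node;
            if (start.getFieldType() == FieldType.FIELD_TOC) tocStarts.add(start);
        }
        if (index < 0 || index >= tocStarts.size()) {
            throw new IndexOutOfBoundsException("TOC index is out of range");
        }

        // Walk forward from the FieldStart until the FieldEnd of the TOC is reached.
        List<Node> toRemove = new ArrayList<>();
        Node current = tocStarts.get(index);
        while (current != null) {
            toRemove.add(current);
            if (current.getNodeType() == NodeType.FIELD_END
                    && ((FieldEnd) current).getFieldType() == FieldType.FIELD_TOC) {
                break; // end of this TOC
            }
            current = current.nextPreOrder(doc);
        }

        // Remove the collected nodes from the document.
        for (Node node : toRemove) node.remove();
    }

    public static void main(String[] args) throws Exception {
        Document doc = new Document("Input.docx"); // hypothetical file containing a TOC
        removeTableOfContents(doc, 0);
        doc.save("Output.docx");
    }
}
```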
Extract Table of Contents
If you want to extract a table of contents from any Word document, the following code sample can be used.
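One way to do this, sketched under the assumption that the TOC was built with hyperlinked entries (the \h switch), is to look for HYPERLINK fields that point to the _Toc bookmarks Word creates for entries and to read the text of the paragraph containing each one. The class name, method name and input file are hypothetical.

```java
import com.aspose.words.Document;
import com.aspose.words.Field;
import com.aspose.words.FieldHyperlink;
import com.aspose.words.FieldType;
import com.aspose.words.NodeType;
import com.aspose.words.Paragraph;
import com.aspose.words.SaveFormat;
import java.util.ArrayList;
import java.util.List;

final class ExtractTocSketch {
    // Returns the text of every TOC entry paragraph found in the document.
    static List<String> extractTocEntries(Document doc) throws Exception {
        List<String> entries = new ArrayList<>();
        for (Field field : doc.getRange().getFields()) {
            if (field.getType() != FieldType.FIELD_HYPERLINK) continue;

            FieldHyperlink link = (FieldHyperlink) field;
            String target = link.getSubAddress();
            // TOC entries link to bookmarks whose names start with "_Toc".
            if (target == null || !target.startsWith("_Toc")) continue;

            Paragraph tocItem = (Paragraph) field.getStart().getAncestor(NodeType.PARAGRAPH);
            if (tocItem != null) entries.add(tocItem.toString(SaveFormat.TEXT).trim());
        }
        return entries;
    }

    public static void main(String[] args) throws Exception {
        Document doc = new Document("Input.docx"); // hypothetical file containing a TOC
        for (String entry : extractTocEntries(doc)) System.out.println(entry);
    }
}
```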