Working with Table of Contents
Often you will work with documents containing a table of contents (TOC). Using Aspose.Words, you can insert your own table of contents or completely rebuild an existing one in the document using just a few lines of code. This article outlines how to work with the table of contents field and demonstrates:
- How to insert a brand new TOC.
- How to update new or existing TOCs in the document.
- How to specify switches to control the formatting and overall structure of the TOC.
- How to modify the styles and appearance of the table of contents.
- How to remove an entire TOC field, along with all its entries, from the document.
Insert a Table of Contents Programmatically
You can insert a TOC (table of contents) field into the document at the current position by calling the DocumentBuilder.insert_table_of_contents method.
A table of contents in a Word document can be built in a number of ways and formatted using a variety of options. The field switches that you pass to the method control the way the table is built and displayed in your document.
The default switches used in a TOC inserted in Microsoft Word are \o "1-3" \h \z \u. The switches supported by Aspose.Words are described later in this article. You can either use that guide to obtain the correct switches or, if you already have a document containing a similar TOC, show the field codes (ALT+F9) and copy the switches directly from the field.
The following code example shows how to insert a Table of Contents field into a document:
The code demonstrates the new table of contents being inserted into a blank document. The DocumentBuilder class is then used to insert some sample content formatted with the appropriate heading styles, which mark the content to be included in the TOC. The next lines then populate the TOC by updating the fields and page layout of the document.
Without these update calls, the output document would contain a TOC field but no visible content. This is because the TOC field has been inserted but is not populated until it is updated in the document. This is discussed further in the next section.
Update the Table of Contents
Aspose.Words allows you to completely update a TOC with only a few lines of code. This can be done to populate a newly inserted TOC or to update an existing TOC after changes to the document have been made. The following two methods must be used to update the TOC fields in the document:
- Document.update_fields
- Document.update_page_layout
Please note that these two update methods must be called in that order. If the order is reversed, the table of contents will be populated but no page numbers will be displayed. Any number of different TOCs can be updated: these methods automatically update all TOCs found in the document.
The following code example shows how to completely rebuild TOC fields in the document by invoking field update:
The first call to Document.update_fields builds the TOC: all text entries are populated and the TOC appears almost complete. The only thing missing is the page numbers, which for now are displayed as "?". The second call, to Document.update_page_layout, builds the layout of the document in memory. This is needed to gather the page numbers of the entries. The correct page numbers calculated by this call are then inserted into the TOC.
Use Switches to Control the Behavior of the Table of Contents
As with any other field, the TOC field can accept switches defined within the field code that control how the table of contents is built. Certain switches control which entries are included and at what level, while others control the appearance of the TOC. Switches can be combined to produce complex tables of contents.
By default, the default switches mentioned above (\o "1-3" \h \z \u) are included when inserting a TOC in the document. A TOC with no switches will include content from the built-in heading styles (as if the \O switch is set). The TOC switches supported by Aspose.Words are listed below, and their uses are described in detail. They are divided into two sections based on their type: the switches in the first section define what content to include in the TOC, and the switches in the second section control the appearance of the TOC. If a switch is not listed here, it is currently unsupported. Support for all switches is planned for future versions, and further support is added with every release.
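Because switches are plain text inside the field code, combining them amounts to joining strings. The sketch below uses a hypothetical helper, toc_switches (not part of Aspose.Words), that assembles a switch string you could pass to DocumentBuilder.insert_table_of_contents. It covers only a few of the switches described below.

```python
def toc_switches(levels=(1, 3), hyperlinks=True, hide_in_web=True,
                 outline_levels=True, custom_styles=None, omit_page_numbers=None):
    """Compose a TOC switch string (hypothetical helper, not an Aspose.Words API)."""
    parts = []
    if levels:
        parts.append('\\o "%d-%d"' % levels)      # built-in heading levels
    if hyperlinks:
        parts.append("\\h")                       # entries as hyperlinks
    if hide_in_web:
        parts.append("\\z")                       # hide leader/page numbers in Web Layout view
    if outline_levels:
        parts.append("\\u")                       # use paragraph outline levels
    if custom_styles:
        # \t "Style,Level,Style,Level"
        pairs = ",".join("%s,%d" % (name, level) for name, level in custom_styles.items())
        parts.append('\\t "%s"' % pairs)
    if omit_page_numbers:
        parts.append('\\n "%d-%d"' % omit_page_numbers)  # levels without page numbers
    return " ".join(parts)

print(toc_switches())  # \o "1-3" \h \z \u
```

The result could then be used as builder.insert_table_of_contents(toc_switches(omit_page_numbers=(3, 4))).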
Entry Marking Switches
Switch | Description |
---|---|
Heading Styles (\O Switch) | This switch specifies that the TOC is built from the built-in heading styles (Heading 1 to Heading 9), limited to the range of levels given after the switch; for example, \o "1-3" includes only Heading 1, Heading 2 and Heading 3. Any content formatted with these styles is included in the table of contents. The level of the heading defines the hierarchical level of the entry in the TOC. For instance, a paragraph with the Heading 1 style is treated as the first level in the TOC, a paragraph with Heading 2 as the second level, and so on. |
Outline Levels (\U Switch) | Each paragraph can define an outline level under Paragraph options. This setting dictates at which level the paragraph is treated in the document hierarchy. This is a common way to structure the layout of a document, and the hierarchy can be viewed by switching to Outline View in Microsoft Word. Similar to heading styles, there can be 1 – 9 outline levels in addition to the "Body Text" level. Outline levels 1 – 9 will appear in the TOC. Note that built-in heading styles such as Heading 1 always have an outline level set in their style settings. |
Custom Styles (\T Switch) | This switch allows custom styles to be used when collecting entries for the TOC. It is often used together with the \O switch to include custom styles along with the built-in heading styles in the TOC. Each style name is followed by the TOC level it maps to, separated by commas and enclosed in quotes; for example, \t "CustomHeading1,1" will use content styled with CustomHeading1 as level 1 content in the TOC. |
Use TC Fields (\F and \L Switches) | In older versions of Microsoft Word, the only way to build a TOC was from TC fields. These fields are inserted as hidden text and hold the text of the entry to display in the TOC. In Aspose.Words they are represented by the FieldType.FIELD_TOC_ENTRY enumeration value. The \F switch in a TOC specifies that TC fields should be used as entries. The switch on its own, without any extra identifier, means that any TC field in the document will be included. Any extra parameter, often a single letter, designates that only TC fields with a matching \f switch will be included in the TOC. For instance, \f t will only include TC fields such as TC "Entry Text" \f t. The \L switch of the TOC limits the included TC fields to those whose level falls within the given range. TC fields themselves accept the following switches: \F (explained above); \L (defines the level in the TOC at which this TC field appears); \N (the page number for this TC field is not displayed). |
Appearance Related Switches
Switch | Description |
---|---|
Omit Page Numbers (\N Switch) | This switch hides page numbers for certain levels of the TOC. For example, with \n "3-4" the page numbers of entries at levels 3 and 4 are hidden, along with their leader dots (if any). To specify only one level, a range must still be used; for example, "1-1" omits page numbers only for the first level. |
Insert As Hyperlinks (\H Switch) | This switch specifies that TOC entries are inserted as hyperlinks. The entries still look like normal text in the TOC, but each one links to the position of the original entry in the document and can be used for navigation (Ctrl + click in Microsoft Word). |
Set Separator Character (\P Switch) | This switch allows the content separating the entry title from its page number to be easily changed in the TOC. The separator is specified after this switch and enclosed in quotation marks. |
Preserve Tab Entries (\W Switch) | This switch specifies that entries containing a tab character, for instance a heading with a tab at the end of the line, keep it as a proper tab character when the TOC is populated. This means the tab character keeps its function in the TOC. |
Preserve New Line Entries (\X Switch) | Similar to the switch above, this switch specifies that headings spanning multiple lines (using line break characters, not separate paragraphs) are preserved as they are in the generated TOC. For example, a heading that spreads across multiple lines can use a manual line break (Shift + Enter in Microsoft Word, or ControlChar.LINE_BREAK in Aspose.Words). |
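When adjusting an existing TOC, it can help to see which of the switches above its field code already uses. The sketch below is a hypothetical regex-based helper, parse_toc_switches (not part of Aspose.Words), that maps each switch letter to its argument. In Aspose.Words, the code of an existing field can be read with Field.get_field_code().

```python
import re

# A backslash, a switch letter, then an optional quoted or bare argument.
_SWITCH = re.compile(r'\\([A-Za-z])(?:\s+(?:"([^"]*)"|([^\s"\\]+)))?')

def parse_toc_switches(field_code):
    """Map lower-cased switch letters to their argument, or None (hypothetical helper)."""
    switches = {}
    for m in _SWITCH.finditer(field_code):
        quoted, bare = m.group(2), m.group(3)
        switches[m.group(1).lower()] = quoted if quoted is not None else bare
    return switches

print(parse_toc_switches('TOC \\o "1-3" \\h \\z \\u \\n "3-4" \\p "-"'))
```

The helper only tokenizes the field code; it does not check whether a switch is supported by Aspose.Words.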
Insert TC Fields
You can insert a new TC field at the current position of the DocumentBuilder by calling the DocumentBuilder.insert_field method and specifying the field name as “TC” along with any switches that are needed. The example below shows how to insert a TC field into the document using DocumentBuilder.
Modify a Table of Contents
Entries in the TOC do not use the original styles of the marked entries; instead, each level is formatted using an equivalent TOC style. For example, the first level in the TOC is formatted with the TOC1 style, the second level with the TOC2 style, and so on. This means that to change the look of the TOC, these styles must be modified. In Aspose.Words these styles are represented by the locale-independent StyleIdentifier.TOC1 through StyleIdentifier.TOC9 and can be retrieved from the Document.styles collection using these identifiers.
Once the appropriate style has been retrieved from the document, its formatting can be modified. Any changes to these styles are automatically reflected in the TOCs in the document. The example below changes a formatting property used in the first-level TOC style.
It is also useful to note that any direct formatting of a paragraph (defined on the paragraph itself and not in the style) marked to be included in the TOC will be copied to the entry in the TOC. For example, suppose the Heading 1 style is used to mark content for the TOC, this style has bold formatting, and the paragraph also has italic formatting applied directly to it. The resulting TOC entry will not be bold, because bold is part of the style formatting, but it will be italic, because italic is applied directly to the paragraph.
You can also control the formatting of the separators used between each entry and its page number. By default, this is a dotted line that extends to the page number, created by a tab character and a right tab stop aligned close to the right margin.
Using the Style object retrieved for the particular TOC level you want to modify, you can also change how these separators appear in the document. To do this, first access Style.paragraph_format to retrieve the paragraph formatting of the style. From this, the tab stops can be retrieved through ParagraphFormat.tab_stops and the appropriate tab stop modified. Using the same technique, the tab itself can be moved or removed altogether. The example below shows how to modify the position of the right tab stop in TOC-related paragraphs.
Remove a Table of Contents from the Document
A table of contents can be removed from the document by removing all nodes found between the FieldStart and FieldEnd nodes of the TOC field. The code below demonstrates this. Removing a TOC field is simpler than removing a normal field, because we do not need to keep track of nested fields. Instead, we check whether the FieldEnd node is of type FieldType.FIELD_TOC, which means we have reached the end of the current TOC. This technique can be used here without worrying about nested fields, because we can assume that a properly formed document will not have a TOC field fully nested inside another TOC field.
First, the FieldStart nodes of each TOC are collected and stored. The specified TOC is then enumerated so that all nodes within the field are visited and stored. The nodes are then removed from the document. The code sample below demonstrates how to remove a specified TOC from a document.
Extract Table of Contents
If you want to extract a table of contents from any Word document, the following code sample can be used.
import aspose.words as aw

# Path to the input document.
doc = aw.Document("Table of contents.docx")

for field in doc.range.fields:
    if field.type != aw.fields.FieldType.FIELD_HYPERLINK:
        continue
    hyperlink = field.as_field_hyperlink()
    # TOC entries link to bookmarks whose names start with "_Toc".
    if hyperlink.sub_address is None or not hyperlink.sub_address.startswith("_Toc"):
        continue
    # The paragraph holding the TOC entry itself.
    toc_item = field.start.get_ancestor(aw.NodeType.PARAGRAPH).as_paragraph()
    print(toc_item.to_string(aw.SaveFormat.TEXT).strip())
    print("------------------")
    # The paragraph the entry points to (the bookmarked heading).
    bm = doc.range.bookmarks.get_by_name(hyperlink.sub_address)
    if bm is not None:
        pointer = bm.bookmark_start.get_ancestor(aw.NodeType.PARAGRAPH).as_paragraph()
        print(pointer.to_string(aw.SaveFormat.TEXT))