Clean Up a Document

Sometimes you may need to remove unused or duplicate information to reduce the size of the output document and processing time.

You can find and remove unused data, such as styles or lists, and duplicate information manually, but it is far more convenient to let Aspose.Words do this for you.

The CleanupOptions class specifies which kinds of data are removed during cleanup. To remove duplicate styles, unused styles, or unused lists from a document, set the corresponding options and pass the CleanupOptions instance to the document's cleanup method.

Remove Unused Information from a Document

You can use the unused_styles and unused_builtin_styles properties to detect and remove styles that are marked as “unused”.

You can use the unused_lists property to detect and remove lists and list definitions that are marked as “unused”.
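
To make the notion of “unused” concrete, here is a minimal, library-independent sketch. It does not show how Aspose.Words implements cleanup internally; the function find_unused and the sample data are hypothetical and only model the idea that a style or list is unused when no content in the document refers to it.

```python
def find_unused(defined, referenced):
    """Return the defined names that nothing references, in definition order."""
    referenced = set(referenced)
    return [name for name in defined if name not in referenced]


# Hypothetical document model: the styles defined in the document ...
defined_styles = ["Normal", "Heading 1", "MyStyle1", "MyStyle2"]
# ... and the style applied to each paragraph.
paragraph_styles = ["Normal", "Heading 1", "Normal", "MyStyle1"]

unused = find_unused(defined_styles, paragraph_styles)
print(unused)  # ['MyStyle2']
```

In this toy model, "MyStyle2" is the only style that cleanup would be allowed to drop, because every other style is applied to at least one paragraph.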

The following code example shows how to remove only unused styles from a document:

# For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-Python-via-.NET
import aspose.words as aw

# docs_base is assumed to be the helper from the examples repository above;
# it supplies the input (my_dir) and output (artifacts_dir) directories.
doc = aw.Document(docs_base.my_dir + "Unused styles.docx")
# A custom style is marked as "used" while there is any text within the document
# formatted in that style. Styles that no text refers to are unused.
print(f"Count of styles before Cleanup: {doc.styles.count}\n"
      f"Count of lists before Cleanup: {doc.lists.count}")
# Remove unused styles but keep unused lists, as selected by the CleanupOptions flags.
cleanup_options = aw.CleanupOptions()
cleanup_options.unused_lists = False
cleanup_options.unused_styles = True
doc.cleanup(cleanup_options)
print(f"Count of styles after Cleanup was decreased: {doc.styles.count}\n"
      f"Count of lists after Cleanup is the same: {doc.lists.count}")
doc.save(docs_base.artifacts_dir + "WorkingWithDocumentOptionsAndSettings.cleanup_unused_styles_and_lists.docx")
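
The before-and-after counting in the example above can be wrapped in a small helper. The following is a sketch only: cleanup_and_report and FakeDocument are hypothetical names, and FakeDocument is a stand-in that imitates just the members used above (styles.count, lists.count, cleanup) so that the snippet runs without Aspose.Words installed.

```python
from types import SimpleNamespace


def cleanup_and_report(doc, options):
    """Run doc.cleanup(options) and return how many styles and lists were removed."""
    styles_before, lists_before = doc.styles.count, doc.lists.count
    doc.cleanup(options)
    return {
        "styles_removed": styles_before - doc.styles.count,
        "lists_removed": lists_before - doc.lists.count,
    }


class FakeDocument:
    """Hypothetical stand-in for a document; not part of Aspose.Words."""

    def __init__(self, style_count, unused_style_count, list_count, unused_list_count):
        self.styles = SimpleNamespace(count=style_count)
        self.lists = SimpleNamespace(count=list_count)
        self._unused_styles = unused_style_count
        self._unused_lists = unused_list_count

    def cleanup(self, options):
        # Drop only the categories that the options enable.
        if options.unused_styles:
            self.styles.count -= self._unused_styles
            self._unused_styles = 0
        if options.unused_lists:
            self.lists.count -= self._unused_lists
            self._unused_lists = 0


# Same flags as in the example above: remove unused styles, keep unused lists.
options = SimpleNamespace(unused_styles=True, unused_lists=False)
report = cleanup_and_report(FakeDocument(8, 4, 2, 1), options)
print(report)  # {'styles_removed': 4, 'lists_removed': 0}
```

With the real library, the same helper should work if you pass it an aw.Document and an aw.CleanupOptions instance, since it relies only on the members that the example above already uses.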

Remove Duplicate Information from a Document

You can also use the duplicate_style property to replace every duplicate style with the original one and remove the duplicates from the document.
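
The idea of substituting duplicates with the original can be illustrated without the library. The sketch below is a conceptual model, not the Aspose.Words algorithm: merge_duplicate_styles is a hypothetical function, each style is reduced to a hypothetical tuple describing its formatting, and the first style with a given formatting is treated as the original.

```python
def merge_duplicate_styles(styles):
    """Map every style name to the first style that has identical formatting.

    `styles` maps a style name to a hashable description of its formatting.
    """
    first_seen = {}  # formatting -> name of the first style that uses it
    remap = {}       # style name -> name of the style that survives
    for name, formatting in styles.items():
        remap[name] = first_seen.setdefault(formatting, name)
    return remap


# Hypothetical styles: MyStyle2 duplicates the formatting of MyStyle1.
styles = {
    "MyStyle1": ("Arial", 12, "bold"),
    "MyStyle2": ("Arial", 12, "bold"),
    "MyStyle3": ("Times New Roman", 10, "regular"),
}

remap = merge_duplicate_styles(styles)
survivors = sorted(set(remap.values()))
print(remap["MyStyle2"])  # MyStyle1
print(survivors)          # ['MyStyle1', 'MyStyle3']
```

Content formatted with "MyStyle2" would be switched to "MyStyle1", after which "MyStyle2" can be removed without changing how the document looks.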

The following code example shows how to remove duplicate styles from a document:

# For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-Python-via-.NET
import aspose.words as aw

# docs_base is assumed to be the helper from the examples repository above;
# it supplies the input (my_dir) and output (artifacts_dir) directories.
doc = aw.Document(docs_base.my_dir + "Document.docx")
# Count of styles before Cleanup.
print(doc.styles.count)
# Replace duplicate styles with the original one and remove the duplicates.
options = aw.CleanupOptions()
options.duplicate_style = True
doc.cleanup(options)
# Count of styles after Cleanup was decreased.
print(doc.styles.count)
doc.save(docs_base.artifacts_dir + "WorkingWithDocumentOptionsAndSettings.cleanup_duplicate_style.docx")