Clean Up a Document

Sometimes you may need to remove unused or duplicate information to reduce the size of the output document and processing time.

You can find and remove unused data, such as styles or lists, or duplicate information manually, but it is more convenient to use the features that Aspose.Words provides.

The CleanupOptions class specifies what to clean up. Pass a CleanupOptions object to the Cleanup method of the document to remove duplicate styles, unused styles, or unused lists.
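As a minimal sketch of the simplest case, the Cleanup method can also be called without arguments; in that form it is understood to remove unused styles and unused lists. The file names "Input.docx" and "Output.docx" and the class name are placeholders chosen for illustration, not files from the examples repository.

```csharp
using System;
using Aspose.Words;

class DefaultCleanupSketch
{
    static void Main()
    {
        // "Input.docx" is a placeholder path; substitute your own document.
        Document doc = new Document("Input.docx");

        int stylesBefore = doc.Styles.Count;
        int listsBefore = doc.Lists.Count;

        // Parameterless overload: no CleanupOptions object is passed.
        doc.Cleanup();

        // Report how many styles and lists the cleanup removed.
        Console.WriteLine($"Styles removed: {stylesBefore - doc.Styles.Count}");
        Console.WriteLine($"Lists removed: {listsBefore - doc.Lists.Count}");

        // "Output.docx" is a placeholder path as well.
        doc.Save("Output.docx");
    }
}
```

Use the overload that takes CleanupOptions when you need finer control over what is removed.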

Remove Unused Information from a Document

Set the UnusedStyles and UnusedBuiltinStyles properties to true to remove styles that are marked as “unused”.

Set the UnusedLists property to true to remove lists and list definitions that are marked as “unused”.

The following code example shows how to remove only unused styles from a document:

// For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-.NET.git.
Document doc = new Document(MyDir + "Unused styles.docx");
// A custom style is marked as "used" while there is any text within the document
// formatted in that style. The sample document is assumed to contain styles that no text uses.
Console.WriteLine($"Count of styles before Cleanup: {doc.Styles.Count}\n" +
                  $"Count of lists before Cleanup: {doc.Lists.Count}");
// Remove unused styles only; UnusedLists = false keeps unused lists in the document.
CleanupOptions cleanupOptions = new CleanupOptions { UnusedLists = false, UnusedStyles = true };
doc.Cleanup(cleanupOptions);
Console.WriteLine($"Count of styles after Cleanup was decreased: {doc.Styles.Count}\n" +
                  $"Count of lists after Cleanup is the same: {doc.Lists.Count}");
doc.Save(ArtifactsDir + "WorkingWithDocumentOptionsAndSettings.CleanupUnusedStylesAndLists.docx");
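The same pattern can be turned around to remove only unused lists and keep every style. The sketch below is an illustration under assumptions, not part of the official examples: the file names "Input.docx" and "Output.docx" and the class name are placeholders, and every option is set explicitly so the result does not depend on default values.

```csharp
using System;
using Aspose.Words;

class CleanupUnusedListsSketch
{
    static void Main()
    {
        // "Input.docx" is a placeholder path; substitute your own document.
        Document doc = new Document("Input.docx");

        Console.WriteLine($"Count of styles before Cleanup: {doc.Styles.Count}");
        Console.WriteLine($"Count of lists before Cleanup: {doc.Lists.Count}");

        // Remove unused lists only; leave custom, built-in, and duplicate styles alone.
        CleanupOptions options = new CleanupOptions
        {
            UnusedLists = true,
            UnusedStyles = false,
            UnusedBuiltinStyles = false,
            DuplicateStyle = false
        };
        doc.Cleanup(options);

        // The style count should be unchanged; the list count drops only
        // if the document actually contained unused lists.
        Console.WriteLine($"Count of styles after Cleanup: {doc.Styles.Count}");
        Console.WriteLine($"Count of lists after Cleanup: {doc.Lists.Count}");

        // "Output.docx" is a placeholder path as well.
        doc.Save("Output.docx");
    }
}
```

Setting UnusedBuiltinStyles to true in the same object would additionally remove built-in styles that the document does not use.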

Remove Duplicate Information from a Document

Set the DuplicateStyle property to true to replace every duplicate style with the original one and remove the duplicates from the document.

The following code example shows how to remove duplicate styles from a document:

Document doc = new Document(MyDir + "Document.docx");
// Count of styles before Cleanup.
Console.WriteLine(doc.Styles.Count);
// Cleans duplicate styles from the document.
CleanupOptions options = new CleanupOptions { DuplicateStyle = true };
doc.Cleanup(options);
// Count of styles after Cleanup; it is lower if the document contained duplicate styles.
Console.WriteLine(doc.Styles.Count);
doc.Save(ArtifactsDir + "WorkingWithDocumentOptionsAndSettings.CleanupDuplicateStyle.docx");