Compare Documents
Comparing documents is a process that identifies changes between two documents and contains the changes as revisions. This process compares any two documents, including versions of one specific document, then the changes between both documents will be shown as revisions in the first document.
The comparison method is achieved by comparing words at the character level or at the word level. If a word contains a change of at least one character, in the result, the difference will be displayed as a change of the entire word, not a character. This process of comparison is a usual task in the legal and financial industries.
Instead of manually searching for differences between documents or between different versions of them, you can use Aspose.Words for comparing documents and getting content changes in formatting, header/footer, tables, and more.
This article explains how to compare documents and how to specify advanced comparing properties.
Try online
You can compare two documents online by using the Document comparison online tool.
Note that the comparison method, described below, is used in this tool to ensure getting equal results. So you will get the same results even by using the online comparison tool or by using the comparison method in Aspose.Words.
Limitations and Supported File Formats
Comparing documents is a very complex feature. There are varied parts of content combination that need to be analyzed to recognize all differences. The reason for this complexity is because Aspose.Words aims to get the same comparison results as the Microsoft Word comparison algorithm.
The general limitation for two documents being compared is that they must not have revisions before calling the compare method as this limitation exists in Microsoft Word.
Compare Two Documents
When you compare documents, differences of the latter document from the former show up as revisions to the former. When you modify a document, each edit will have its own revision after running the compare method.
Aspose.Words allows you to identify documents differences using the Compare method – this is similar to the Microsoft Word document compare feature. It allows you to check documents or document versions to find differences and changes, including formatting modifications such as font changes, spacing changes, the addition of words and paragraphs.
As a result of the comparison, documents can be determined as equal or not equal. The term “equal” documents mean that the comparison method is not able to represent changes as revisions. This means that both document text and text formatting are the same. But there can be other differences between documents. For example, Microsoft Word supports only format revisions for styles, and you cannot represent style insertion/deletion. So documents can have a different set of styles, and the Compare method still produces no revisions.
The following code example shows how to check if two documents are equal or not:
// For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-Java | |
Document docA = new Document(dataDir + "DocumentA.doc"); | |
Document docB = new Document(dataDir + "DocumentB.doc"); | |
docA.compare(docB, "user", new Date()); | |
if (docA.getRevisions().getCount() == 0) | |
System.out.println("Documents are equal"); | |
else | |
System.out.println("Documents are not equal"); |
The following code example shows how to simply apply the Compare
method to two documents:
// For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-Java | |
// The source document doc1 | |
Document doc1 = new Document(); | |
DocumentBuilder builder = new DocumentBuilder(doc1); | |
builder.writeln("This is the original document."); | |
// The target document doc2 | |
Document doc2 = new Document(); | |
builder = new DocumentBuilder(doc2); | |
builder.writeln("This is the edited document."); | |
// If either document has a revision, an exception will be thrown | |
if (doc1.getRevisions().getCount() == 0 && doc2.getRevisions().getCount() == 0) | |
doc1.compare(doc2, "authorName", new Date()); | |
// If doc1 and doc2 are different, doc1 now has some revisions after the comparison, which can now be viewed and processed | |
if (doc1.getRevisions().getCount() == 2) | |
System.out.println("Documents are equal"); | |
for (Revision r : doc1.getRevisions()) | |
{ | |
System.out.println("Revision type: " + r.getRevisionType() + ", on a node of type " + r.getParentNode().getNodeType() + ""); | |
System.out.println("\tChanged text: " + r.getParentNode().getText() + ""); | |
} | |
// All the revisions in doc1 are differences between doc1 and doc2, so accepting them on doc1 transforms doc1 into doc2 | |
doc1.getRevisions().acceptAll(); | |
// doc1, when saved, now resembles doc2 | |
doc1.save(dataDir + "Document.Compare.docx"); | |
doc1 = new Document(dataDir + "Document.Compare.docx"); | |
if (doc1.getRevisions().getCount() == 0) | |
System.out.println("Documents are equal"); | |
if (doc2.getText().trim() == doc1.getText().trim()) | |
System.out.println("Documents are equal"); |
Specify Advanced Comparison Options
There are many different properties of the CompareOptions class which you can apply when you want to compare documents.
For example, Aspose.Words allows you to ignore changes made during a comparison operation for certain types of objects within the original document. You can select the appropriate property for the object type, such as IgnoreHeadersAndFooters, IgnoreFormatting, IgnoreComments, and others by setting them to “true”.
In addition, Aspose.Words provides the Granularity property with which you can specify whether to track changes by character or by word.
Another common property is a choice in which document to show comparison changes. For example, the “Compare documents dialogue box” in Microsoft Word has the option “Show changes in” – this also affects the comparison results. Aspose.Words provides the Target property that serves this purpose.
The following code example shows how to set the advanced comparing properties:
// For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-Java | |
// Create the original document | |
Document docOriginal = new Document(); | |
DocumentBuilder builder = new DocumentBuilder(docOriginal); | |
// Insert paragraph text with an endnote | |
builder.writeln("Hello world! This is the first paragraph."); | |
builder.insertFootnote(FootnoteType.ENDNOTE, "Original endnote text."); | |
// Insert a table | |
builder.startTable(); | |
builder.insertCell(); | |
builder.write("Original cell 1 text"); | |
builder.insertCell(); | |
builder.write("Original cell 2 text"); | |
builder.endTable(); | |
// Insert a textbox | |
Shape textBox = builder.insertShape(ShapeType.TEXT_BOX, 150, 20); | |
builder.moveTo(textBox.getFirstParagraph()); | |
builder.write("Original textbox contents"); | |
// Insert a DATE field | |
builder.moveTo(docOriginal.getFirstSection().getBody().appendParagraph("")); | |
builder.insertField(" DATE "); | |
// Insert a comment | |
Comment newComment = new Comment(docOriginal, "John Doe", "J.D.", new Date()); | |
newComment.setText("Original comment."); | |
builder.getCurrentParagraph().appendChild(newComment); | |
// Insert a header | |
builder.moveToHeaderFooter(HeaderFooterType.HEADER_PRIMARY); | |
builder.writeln("Original header contents."); | |
// Create a clone of our document, which we will edit and later compare to the original | |
Document docEdited = (Document)docOriginal.deepClone(true); | |
Paragraph firstParagraph = docEdited.getFirstSection().getBody().getFirstParagraph(); | |
// Change the formatting of the first paragraph, change casing of original characters and add text | |
firstParagraph.getRuns().get(0).setText("hello world! this is the first paragraph, after editing."); | |
firstParagraph.getParagraphFormat().setStyle(docEdited.getStyles().get(StyleIdentifier.HEADING_1)); | |
// Edit the footnote | |
Footnote footnote = (Footnote)docEdited.getChild(NodeType.FOOTNOTE, 0, true); | |
footnote.getFirstParagraph().getRuns().get(1).setText("Edited endnote text."); | |
// Edit the table | |
Table table = (Table)docEdited.getChild(NodeType.TABLE, 0, true); | |
table.getFirstRow().getCells().get(1).getFirstParagraph().getRuns().get(0).setText("Edited Cell 2 contents"); | |
// Edit the textbox | |
textBox = (Shape)docEdited.getChild(NodeType.SHAPE, 0, true); | |
textBox.getFirstParagraph().getRuns().get(0).setText("Edited textbox contents"); | |
// Edit the DATE field | |
FieldDate fieldDate = (FieldDate)docEdited.getRange().getFields().get(0); | |
fieldDate.setUseLunarCalendar(true); | |
// Edit the comment | |
Comment comment = (Comment)docEdited.getChild(NodeType.COMMENT, 0, true); | |
comment.getFirstParagraph().getRuns().get(0).setText("Edited comment."); | |
// Edit the header | |
docEdited.getFirstSection().getHeadersFooters().getByHeaderFooterType(HeaderFooterType.HEADER_PRIMARY).getFirstParagraph().getRuns().get(0).setText("Edited header contents."); | |
// Apply different comparing options | |
CompareOptions compareOptions = new CompareOptions(); | |
compareOptions.setIgnoreFormatting(false); | |
compareOptions.setIgnoreCaseChanges(false); | |
compareOptions.setIgnoreComments(false); | |
compareOptions.setIgnoreTables(false); | |
compareOptions.setIgnoreFields(false); | |
compareOptions.setIgnoreFootnotes(false); | |
compareOptions.setIgnoreTextboxes(false); | |
compareOptions.setIgnoreHeadersAndFooters(false); | |
compareOptions.setTarget(ComparisonTargetType.NEW); | |
// compare both documents | |
docOriginal.compare(docEdited, "John Doe", new Date(), compareOptions); | |
docOriginal.save(dataDir + "Document.CompareOptions.docx"); |