Compare Documents
Comparing documents is a process that identifies changes between two documents and contains the changes as revisions. This process compares any two documents, including versions of one specific document, then the changes between both documents will be shown as revisions in the first document.
The comparison method is achieved by comparing words at the character level or the word level. If a word contains a change of at least one character, in the result, the difference will be displayed as a change of the entire word, not a character. This process of comparison is a usual task in the legal and financial industries.
Instead of manually searching for differences between documents or between different versions of them, you can use Aspose.Words for comparing documents and getting content changes in formatting, header/footer, tables, and more.
This article explains how to compare documents and how to specify advanced comparing properties.
Try online
You can compare two documents online by using the Document comparison online tool.
Note that the comparison method, described below, is used in this tool to ensure getting equal results. So you will get the same results even by using the online comparison tool or by using the comparison method in Aspose.Words.
Limitations and Supported File Formats
Comparing documents is a very complex feature. There are varied parts of content combination that need to be analyzed to recognize all differences. The reason for this complexity is due to the fact that Aspose.Words aims to get the same comparison results as the Microsoft Word comparison algorithm.
The general limitation for two documents being compared is that they must not have revisions before calling the compare method as this limitation exists in Microsoft Word.
Compare Two Documents
When you compare documents, differences of the latter document from the former show up as revisions to the former. When you modify a document, each edit will have its own revision after running the compare method.
Aspose.Words allows you to identify documents differences using the Compare method – this is similar to the Microsoft Word document compare feature. It allows you to check documents or document versions to find differences and changes, including formatting modifications such as font changes, spacing changes, the addition of words and paragraphs.
As a result of the comparison, documents can be determined as equal or not equal. The term “equal” documents means that the comparison method is not able to represent changes as revisions. This means that both document text and text formatting are the same. But there can be other differences between documents. For example, Microsoft Word supports only format revisions for styles, and you cannot represent style insertion/deletion. So documents can have a different set of styles, and the Compare method still produces no revisions.
The following code example shows how to check if two documents are equal or not:
For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-C | |
System::SharedPtr<Document> docA = System::MakeObject<Document>(inputDataDir + u"TestFile.doc"); | |
System::SharedPtr<Document> docB = System::MakeObject<Document>(inputDataDir + u"TestFile - Copy.doc"); | |
// DocA now contains changes as revisions. | |
docA->Compare(docB, u"user", System::DateTime::get_Now()); | |
if (docA->get_Revisions()->get_Count() == 0) | |
{ | |
std::cout << "Documents are equal" << std::endl; | |
} | |
else | |
{ | |
std::cout << "Documents are not equal" << std::endl; | |
} |
The following code example shows how to simply apply the Compare
method to two documents:
For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-C | |
// The source document doc1 | |
System::SharedPtr<Document> doc1 = System::MakeObject<Document>(); | |
System::SharedPtr<DocumentBuilder> builder = System::MakeObject<DocumentBuilder>(doc1); | |
builder->Writeln(u"This is the original document."); | |
// The target document doc2 | |
System::SharedPtr<Document> doc2 = System::MakeObject<Document>(); | |
builder = System::MakeObject<DocumentBuilder>(doc2); | |
builder->Writeln(u"This is the edited document."); | |
// If either document has a revision, an exception will be thrown | |
if (doc1->get_Revisions()->get_Count() == 0 && doc2->get_Revisions()->get_Count() == 0) | |
doc1->Compare(doc2, u"authorName", System::DateTime::get_Now()); | |
// If doc1 and doc2 are different, doc1 now has some revisions after the comparison, which can now be viewed and processed | |
if (doc1->get_Revisions()->get_Count() == 2) | |
std::cout << "Documents are equal." << std::endl << std::endl; | |
for (System::SharedPtr<Revision> r : System::IterateOver(doc1->get_Revisions())) | |
{ | |
std::cout << "Revision type: " << System::ObjectExt::ToString(r->get_RevisionType()).ToUtf8String() | |
<< ", on a node of type " << System::ObjectExt::ToString(r->get_ParentNode()->get_NodeType()).ToUtf8String() << std::endl; | |
std::cout << "Changed text: " << r->get_ParentNode()->GetText() << std::endl; | |
} | |
// All the revisions in doc1 are differences between doc1 and doc2, so accepting them on doc1 transforms doc1 into doc2 | |
doc1->get_Revisions()->AcceptAll(); | |
// doc1, when saved, now resembles doc2 | |
doc1->Save(inputDataDir + u"Document.Compare.docx"); | |
doc1 = System::MakeObject<Document>(inputDataDir + u"Document.Compare.docx"); | |
if (doc1->get_Revisions()->get_Count() == 0) | |
std::cout << "Documents are equal" << std::endl; | |
if (doc2->GetText().Trim() == doc1->GetText().Trim()) | |
std::cout << "Documents are equal" << std::endl; | |
Specify Advanced Comparison Options
There are many different properties of the CompareOptions class which you can apply when you want to compare documents.
For example, Aspose.Words allows you to ignore changes made during a comparison operation for certain types of objects within the original document. You can select the appropriate property for the object type, such as IgnoreHeadersAndFooters, IgnoreFormatting, IgnoreComments, and others by setting them to “true”.
In addition, Aspose.Words provides the Granularity property with which you can specify whether to track changes by character or by word.
Another common property is a choice in which document to show comparison changes. For example, the “Compare documents dialogue box” in Microsoft Word has the option “Show changes in” – this also affects the comparison results. Aspose.Words provides the Target property that serves this purpose.
The following code example shows how to set the advanced comparing properties:
For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-C | |
// Create the original document | |
System::SharedPtr<Document> docOriginal = System::MakeObject<Document>(); | |
System::SharedPtr<DocumentBuilder> builder = System::MakeObject<DocumentBuilder>(docOriginal); | |
// Insert paragraph text with an endnote | |
builder->Writeln(u"Hello world! This is the first paragraph."); | |
builder->InsertFootnote(FootnoteType::Endnote, u"Original endnote text."); | |
// Insert a table | |
builder->StartTable(); | |
builder->InsertCell(); | |
builder->Write(u"Original cell 1 text"); | |
builder->InsertCell(); | |
builder->Write(u"Original cell 2 text"); | |
builder->EndTable(); | |
// Insert a textbox | |
System::SharedPtr<Shape> textBox = builder->InsertShape(ShapeType::TextBox, 150, 20); | |
builder->MoveTo(textBox->get_FirstParagraph()); | |
builder->Write(u"Original textbox contents"); | |
// Insert a DATE field | |
builder->MoveTo(docOriginal->get_FirstSection()->get_Body()->AppendParagraph(u"")); | |
builder->InsertField(u" DATE "); | |
// Insert a comment | |
auto newComment = System::MakeObject<Comment>(docOriginal, u"John Doe", u"J.D.", System::DateTime::get_Now()); | |
newComment->SetText(u"Original comment."); | |
builder->get_CurrentParagraph()->AppendChild(newComment); | |
// Insert a header | |
builder->MoveToHeaderFooter(Aspose::Words::HeaderFooterType::HeaderPrimary); | |
builder->Writeln(u"Original header contents."); | |
// Create a clone of our document, which we will edit and later compare to the original | |
auto docEdited = System::DynamicCast<Aspose::Words::Document>(System::StaticCast<Node>(docOriginal)->Clone(true)); | |
System::SharedPtr<Paragraph> firstParagraph = docEdited->get_FirstSection()->get_Body()->get_FirstParagraph(); | |
// Change the formatting of the first paragraph, change casing of original characters and add text | |
firstParagraph->get_Runs()->idx_get(0)->set_Text(u"hello world! this is the first paragraph, after editing."); | |
firstParagraph->get_ParagraphFormat()->set_Style(docEdited->get_Styles()->idx_get(Aspose::Words::StyleIdentifier::Heading1)); | |
// Edit the footnote | |
auto footnote = System::DynamicCast<Aspose::Words::Footnote>(docEdited->GetChild(Aspose::Words::NodeType::Footnote, 0, true)); | |
footnote->get_FirstParagraph()->get_Runs()->idx_get(1)->set_Text(u"Edited endnote text."); | |
// Edit the table | |
auto table = System::DynamicCast<Aspose::Words::Tables::Table>(docEdited->GetChild(Aspose::Words::NodeType::Table, 0, true)); | |
table->get_FirstRow()->get_Cells()->idx_get(1)->get_FirstParagraph()->get_Runs()->idx_get(0)->set_Text(u"Edited Cell 2 contents"); | |
// Edit the textbox | |
textBox = System::DynamicCast<Aspose::Words::Drawing::Shape>(docEdited->GetChild(Aspose::Words::NodeType::Shape, 0, true)); | |
textBox->get_FirstParagraph()->get_Runs()->idx_get(0)->set_Text(u"Edited textbox contents"); | |
// Edit the DATE field | |
auto fieldDate = System::DynamicCast<Aspose::Words::Fields::FieldDate>(docEdited->get_Range()->get_Fields()->idx_get(0)); | |
fieldDate->set_UseLunarCalendar(true); | |
// Edit the comment | |
auto comment = System::DynamicCast<Aspose::Words::Comment>(docEdited->GetChild(Aspose::Words::NodeType::Comment, 0, true)); | |
comment->get_FirstParagraph()->get_Runs()->idx_get(0)->set_Text(u"Edited comment."); | |
// Edit the header | |
docEdited->get_FirstSection()->get_HeadersFooters()->idx_get(Aspose::Words::HeaderFooterType::HeaderPrimary)->get_FirstParagraph()->get_Runs()->idx_get(0)->set_Text(u"Edited header contents."); | |
// When we compare documents, the differences of the latter document from the former show up as revisions to the former | |
// Each edit that we've made above will have its own revision, after we run the Compare method | |
// We can compare with a CompareOptions object, which can suppress changes done to certain types of objects within the original document | |
// from registering as revisions after the comparison by setting some of these members to "true" | |
auto compareOptions = System::MakeObject<Aspose::Words::CompareOptions>(); | |
compareOptions->set_IgnoreFormatting(false); | |
compareOptions->set_IgnoreCaseChanges(false); | |
compareOptions->set_IgnoreComments(false); | |
compareOptions->set_IgnoreTables(false); | |
compareOptions->set_IgnoreFields(false); | |
compareOptions->set_IgnoreFootnotes(false); | |
compareOptions->set_IgnoreTextboxes(false); | |
compareOptions->set_IgnoreHeadersAndFooters(false); | |
compareOptions->set_Target(Aspose::Words::ComparisonTargetType::New); | |
docOriginal->Compare(docEdited, u"John Doe", System::DateTime::get_Now(), compareOptions); | |
docOriginal->Save(inputDataDir + u"Document.CompareOptions.docx"); |