Compare Documents

Comparing documents is a process that identifies changes between two documents and contains the changes as revisions. This process compares any two documents, including versions of one specific document, then the changes between both documents will be shown as revisions in the first document.

The comparison method is achieved by comparing words at the character level or the word level. If a word contains a change of at least one character, in the result, the difference will be displayed as a change of the entire word, not a character. This process of comparison is a usual task in the legal and financial industries.

Instead of manually searching for differences between documents or between different versions of them, you can use Aspose.Words for comparing documents and getting content changes in formatting, header/footer, tables, and more.

This article explains how to compare documents and how to specify advanced comparing properties.

Limitations and Supported File Formats

Comparing documents is a very complex feature. There are varied parts of content combination that need to be analyzed to recognize all differences. The reason for this complexity is due to the fact that Aspose.Words aims to get the same comparison results as the Microsoft Word comparison algorithm.

The general limitation for two documents being compared is that they must not have revisions before calling the compare method as this limitation exists in Microsoft Word.

Compare Two Documents

When you compare documents, differences of the latter document from the former show up as revisions to the former. When you modify a document, each edit will have its own revision after running the compare method.

Aspose.Words allows you to identify documents differences using the Compare method – this is similar to the Microsoft Word document compare feature. It allows you to check documents or document versions to find differences and changes, including formatting modifications such as font changes, spacing changes, the addition of words and paragraphs.

As a result of the comparison, documents can be determined as equal or not equal. The term “equal” documents means that the comparison method is not able to represent changes as revisions. This means that both document text and text formatting are the same. But there can be other differences between documents. For example, Microsoft Word supports only format revisions for styles, and you cannot represent style insertion/deletion. So documents can have a different set of styles, and the Compare method still produces no revisions.

The following code example shows how to check if two documents are equal or not:

For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-C
System::SharedPtr<Document> docA = System::MakeObject<Document>(inputDataDir + u"TestFile.doc");
System::SharedPtr<Document> docB = System::MakeObject<Document>(inputDataDir + u"TestFile - Copy.doc");
// DocA now contains changes as revisions.
docA->Compare(docB, u"user", System::DateTime::get_Now());
if (docA->get_Revisions()->get_Count() == 0)
{
std::cout << "Documents are equal" << std::endl;
}
else
{
std::cout << "Documents are not equal" << std::endl;
}

The following code example shows how to simply apply the Compare method to two documents:

For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-C
// The source document doc1
System::SharedPtr<Document> doc1 = System::MakeObject<Document>();
System::SharedPtr<DocumentBuilder> builder = System::MakeObject<DocumentBuilder>(doc1);
builder->Writeln(u"This is the original document.");
// The target document doc2
System::SharedPtr<Document> doc2 = System::MakeObject<Document>();
builder = System::MakeObject<DocumentBuilder>(doc2);
builder->Writeln(u"This is the edited document.");
// If either document has a revision, an exception will be thrown
if (doc1->get_Revisions()->get_Count() == 0 && doc2->get_Revisions()->get_Count() == 0)
doc1->Compare(doc2, u"authorName", System::DateTime::get_Now());
// If doc1 and doc2 are different, doc1 now has some revisions after the comparison, which can now be viewed and processed
if (doc1->get_Revisions()->get_Count() == 2)
std::cout << "Documents are equal." << std::endl << std::endl;
for (System::SharedPtr<Revision> r : System::IterateOver(doc1->get_Revisions()))
{
std::cout << "Revision type: " << System::ObjectExt::ToString(r->get_RevisionType()).ToUtf8String()
<< ", on a node of type " << System::ObjectExt::ToString(r->get_ParentNode()->get_NodeType()).ToUtf8String() << std::endl;
std::cout << "Changed text: " << r->get_ParentNode()->GetText() << std::endl;
}
// All the revisions in doc1 are differences between doc1 and doc2, so accepting them on doc1 transforms doc1 into doc2
doc1->get_Revisions()->AcceptAll();
// doc1, when saved, now resembles doc2
doc1->Save(inputDataDir + u"Document.Compare.docx");
doc1 = System::MakeObject<Document>(inputDataDir + u"Document.Compare.docx");
if (doc1->get_Revisions()->get_Count() == 0)
std::cout << "Documents are equal" << std::endl;
if (doc2->GetText().Trim() == doc1->GetText().Trim())
std::cout << "Documents are equal" << std::endl;

Specify Advanced Comparison Options

There are many different properties of the CompareOptions class which you can apply when you want to compare documents.

For example, Aspose.Words allows you to ignore changes made during a comparison operation for certain types of objects within the original document. You can select the appropriate property for the object type, such as IgnoreHeadersAndFooters, IgnoreFormatting, IgnoreComments, and others by setting them to “true”.

In addition, Aspose.Words provides the Granularity property with which you can specify whether to track changes by character or by word.

Another common property is a choice in which document to show comparison changes. For example, the “Compare documents dialogue box” in Microsoft Word has the option “Show changes in” – this also affects the comparison results. Aspose.Words provides the Target property that serves this purpose.

The following code example shows how to set the advanced comparing properties:

For complete examples and data files, please go to https://github.com/aspose-words/Aspose.Words-for-C
// Create the original document
System::SharedPtr<Document> docOriginal = System::MakeObject<Document>();
System::SharedPtr<DocumentBuilder> builder = System::MakeObject<DocumentBuilder>(docOriginal);
// Insert paragraph text with an endnote
builder->Writeln(u"Hello world! This is the first paragraph.");
builder->InsertFootnote(FootnoteType::Endnote, u"Original endnote text.");
// Insert a table
builder->StartTable();
builder->InsertCell();
builder->Write(u"Original cell 1 text");
builder->InsertCell();
builder->Write(u"Original cell 2 text");
builder->EndTable();
// Insert a textbox
System::SharedPtr<Shape> textBox = builder->InsertShape(ShapeType::TextBox, 150, 20);
builder->MoveTo(textBox->get_FirstParagraph());
builder->Write(u"Original textbox contents");
// Insert a DATE field
builder->MoveTo(docOriginal->get_FirstSection()->get_Body()->AppendParagraph(u""));
builder->InsertField(u" DATE ");
// Insert a comment
auto newComment = System::MakeObject<Comment>(docOriginal, u"John Doe", u"J.D.", System::DateTime::get_Now());
newComment->SetText(u"Original comment.");
builder->get_CurrentParagraph()->AppendChild(newComment);
// Insert a header
builder->MoveToHeaderFooter(Aspose::Words::HeaderFooterType::HeaderPrimary);
builder->Writeln(u"Original header contents.");
// Create a clone of our document, which we will edit and later compare to the original
auto docEdited = System::DynamicCast<Aspose::Words::Document>(System::StaticCast<Node>(docOriginal)->Clone(true));
System::SharedPtr<Paragraph> firstParagraph = docEdited->get_FirstSection()->get_Body()->get_FirstParagraph();
// Change the formatting of the first paragraph, change casing of original characters and add text
firstParagraph->get_Runs()->idx_get(0)->set_Text(u"hello world! this is the first paragraph, after editing.");
firstParagraph->get_ParagraphFormat()->set_Style(docEdited->get_Styles()->idx_get(Aspose::Words::StyleIdentifier::Heading1));
// Edit the footnote
auto footnote = System::DynamicCast<Aspose::Words::Footnote>(docEdited->GetChild(Aspose::Words::NodeType::Footnote, 0, true));
footnote->get_FirstParagraph()->get_Runs()->idx_get(1)->set_Text(u"Edited endnote text.");
// Edit the table
auto table = System::DynamicCast<Aspose::Words::Tables::Table>(docEdited->GetChild(Aspose::Words::NodeType::Table, 0, true));
table->get_FirstRow()->get_Cells()->idx_get(1)->get_FirstParagraph()->get_Runs()->idx_get(0)->set_Text(u"Edited Cell 2 contents");
// Edit the textbox
textBox = System::DynamicCast<Aspose::Words::Drawing::Shape>(docEdited->GetChild(Aspose::Words::NodeType::Shape, 0, true));
textBox->get_FirstParagraph()->get_Runs()->idx_get(0)->set_Text(u"Edited textbox contents");
// Edit the DATE field
auto fieldDate = System::DynamicCast<Aspose::Words::Fields::FieldDate>(docEdited->get_Range()->get_Fields()->idx_get(0));
fieldDate->set_UseLunarCalendar(true);
// Edit the comment
auto comment = System::DynamicCast<Aspose::Words::Comment>(docEdited->GetChild(Aspose::Words::NodeType::Comment, 0, true));
comment->get_FirstParagraph()->get_Runs()->idx_get(0)->set_Text(u"Edited comment.");
// Edit the header
docEdited->get_FirstSection()->get_HeadersFooters()->idx_get(Aspose::Words::HeaderFooterType::HeaderPrimary)->get_FirstParagraph()->get_Runs()->idx_get(0)->set_Text(u"Edited header contents.");
// When we compare documents, the differences of the latter document from the former show up as revisions to the former
// Each edit that we've made above will have its own revision, after we run the Compare method
// We can compare with a CompareOptions object, which can suppress changes done to certain types of objects within the original document
// from registering as revisions after the comparison by setting some of these members to "true"
auto compareOptions = System::MakeObject<Aspose::Words::CompareOptions>();
compareOptions->set_IgnoreFormatting(false);
compareOptions->set_IgnoreCaseChanges(false);
compareOptions->set_IgnoreComments(false);
compareOptions->set_IgnoreTables(false);
compareOptions->set_IgnoreFields(false);
compareOptions->set_IgnoreFootnotes(false);
compareOptions->set_IgnoreTextboxes(false);
compareOptions->set_IgnoreHeadersAndFooters(false);
compareOptions->set_Target(Aspose::Words::ComparisonTargetType::New);
docOriginal->Compare(docEdited, u"John Doe", System::DateTime::get_Now(), compareOptions);
docOriginal->Save(inputDataDir + u"Document.CompareOptions.docx");