Compare PDF documents

Comparing PDF Documents with Aspose.PDF for .NET

When working with PDF documents, there are times when you need to compare the content of two documents to identify differences. The Aspose.PDF for .NET library provides a powerful toolset for this purpose. In this article, we’ll explore how to compare PDF documents using a couple of simple code snippets.

The comparison functionality in Aspose.PDF allows you to compare two PDF documents page by page. You can choose to compare either specific pages or entire documents. The resulting comparison document highlights differences, making it easier to identify changes between the two files.

Comparing Specific Pages

The first code snippet demonstrates how to compare the first pages of two PDF documents.

Steps:

  1. Document Initialization. The code starts by initializing two PDF documents using their respective file paths (‘documentPath1’ and ‘documentPath2’). The paths are empty strings in the snippet; in practice, replace them with the actual file paths.
  2. Comparison Process.
  • Page Selection - the comparison is limited to the first page of each document (‘Pages[1]’; page numbering starts at 1).
  • Comparison Options:
    ‘AdditionalChangeMarks = true’ - this option ensures that additional change markers are displayed. These markers highlight differences that might be present on other pages, even if they are not on the current page being compared.
    ‘ComparisonMode = ComparisonMode.IgnoreSpaces’ - this mode tells the comparer to ignore spaces in the text, focusing only on changes within words.
  3. Saving the Result. The resulting comparison document, which highlights the differences between the two pages, is saved to the file path specified in ‘resultPdfPath’.

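    using Aspose.Pdf;
    using Aspose.Pdf.Comparison;

    // Replace the empty strings with the actual file paths.
    string documentPath1 = "";
    string documentPath2 = "";

    // Path where the comparison result will be written.
    string resultPdfPath = "";

    using (Document document1 = new Document(documentPath1), document2 = new Document(documentPath2))
    {
        // Compare only the first page of each document (pages are numbered from 1).
        SideBySidePdfComparer.Compare(document1.Pages[1], document2.Pages[1], resultPdfPath, new SideBySideComparisonOptions()
        {
            AdditionalChangeMarks = true,
            ComparisonMode = ComparisonMode.IgnoreSpaces
        });
    }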
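The same page-level call can be repeated to cover more than one page. The following is a minimal sketch, not part of the original example: it assumes you want one result file per page pair, and it only walks the pages that exist in both documents. The ‘outputDirectory’ variable and the ‘comparison_page_N.pdf’ naming scheme are hypothetical; the ‘Compare’ call and its options are the ones used in the snippet above.

```csharp
using System;
using Aspose.Pdf;
using Aspose.Pdf.Comparison;

class PageByPageComparison
{
    static void Main()
    {
        // Placeholder paths, as in the snippet above; replace with real values.
        string documentPath1 = "";
        string documentPath2 = "";
        string outputDirectory = ""; // hypothetical: folder for the per-page results

        using (Document document1 = new Document(documentPath1), document2 = new Document(documentPath2))
        {
            // Only pages present in both documents can be paired up.
            int pageCount = Math.Min(document1.Pages.Count, document2.Pages.Count);

            // Pages are numbered from 1, so the loop starts at 1.
            for (int pageNumber = 1; pageNumber <= pageCount; pageNumber++)
            {
                // Hypothetical naming scheme: one result PDF per compared page pair.
                string resultPdfPath = System.IO.Path.Combine(outputDirectory, $"comparison_page_{pageNumber}.pdf");

                SideBySidePdfComparer.Compare(document1.Pages[pageNumber], document2.Pages[pageNumber], resultPdfPath, new SideBySideComparisonOptions()
                {
                    AdditionalChangeMarks = true,
                    ComparisonMode = ComparisonMode.IgnoreSpaces
                });
            }
        }
    }
}
```

Pages beyond the shorter document's length are skipped in this sketch. If you need a single result file covering every page, compare the whole documents instead.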

Comparing Entire Documents

The second code snippet expands the scope to compare the entire content of two PDF documents.

Steps:

  1. Document Initialization. Just like in the first example, two PDF documents are initialized with their file paths.
  2. Comparison Process.
  • Entire Document Comparison - unlike the first snippet, this code compares the entire content of the two documents.
  • Comparison Options - the options are the same as in the first snippet, ensuring that spaces are ignored, and additional change markers are displayed.
  3. Saving the Result. The comparison result, which highlights differences across all pages of the two documents, is saved in the file specified by ‘resultPdfPath’.

    using Aspose.Pdf;
    using Aspose.Pdf.Comparison;

    // Replace the empty strings with the actual file paths.
    string documentPath1 = "";
    string documentPath2 = "";

    // Path where the comparison result will be written.
    string resultPdfPath = "";

    using (Document document1 = new Document(documentPath1), document2 = new Document(documentPath2))
    {
        // Pass the documents themselves to compare every page, not just one.
        SideBySidePdfComparer.Compare(document1, document2, resultPdfPath, new SideBySideComparisonOptions()
        {
            AdditionalChangeMarks = true,
            ComparisonMode = ComparisonMode.IgnoreSpaces
        });
    }
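Both snippets use a single comparison mode. To see how the mode changes the output for your own files, one option is to run the same whole-document comparison once per mode and inspect the results side by side. The sketch below is an illustration only. It assumes ‘ComparisonMode’ is an ordinary enum (as its use in the snippets suggests) and enumerates its values at run time, so it does not depend on any particular mode name. The ‘comparison_MODE.pdf’ output naming is hypothetical.

```csharp
using System;
using Aspose.Pdf;
using Aspose.Pdf.Comparison;

class ComparisonModeSurvey
{
    static void Main()
    {
        // Placeholder paths, as in the snippets above; replace with real values.
        string documentPath1 = "";
        string documentPath2 = "";

        using (Document document1 = new Document(documentPath1), document2 = new Document(documentPath2))
        {
            // Enumerate whatever modes the installed library version defines
            // (assumption: ComparisonMode is a plain enum).
            foreach (ComparisonMode mode in Enum.GetValues(typeof(ComparisonMode)))
            {
                // Hypothetical naming scheme: one result file per mode.
                string resultPdfPath = $"comparison_{mode}.pdf";

                SideBySidePdfComparer.Compare(document1, document2, resultPdfPath, new SideBySideComparisonOptions()
                {
                    AdditionalChangeMarks = true,
                    ComparisonMode = mode
                });
            }
        }
    }
}
```

Opening the generated files next to each other shows which differences each mode reports for the same pair of inputs.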

The comparison results generated by these snippets are PDF documents that you can open in a viewer like Adobe Acrobat. If you use the Two-page view in Adobe Acrobat, you’ll see the changes side by side:

  • Deletions - these are noted on the left page.
  • Insertions - these are noted on the right page.

By setting ‘AdditionalChangeMarks’ to ‘true’, you can also see markers for changes that may occur on other pages, even if those changes aren’t on the current page being viewed.

Aspose.PDF for .NET lets you compare PDF documents either page by page or as whole documents. By using options like ‘AdditionalChangeMarks’ and different ‘ComparisonMode’ settings, you can tailor the comparison process to your specific needs. The resulting document provides a clear, side-by-side view of changes, making it easier to track revisions and ensure document accuracy.