Compare PDF documents

Note that all comparison tools are available in the Aspose.PDF.Drawing library.

Ways to compare PDF Documents

When working with PDF documents, there are times when you need to compare the content of two documents to identify differences. The Aspose.PDF for .NET library provides a powerful toolset for this purpose. In this article, we’ll explore how to compare PDF documents using a couple of simple code snippets.

The comparison functionality in Aspose.PDF allows you to compare two PDF documents page by page. You can choose to compare either specific pages or entire documents. The resulting comparison document highlights differences, making it easier to identify changes between the two files.

Here is a list of possible ways to compare PDF documents using the Aspose.PDF for .NET library:

  1. Comparing Specific Pages - Compare the first pages of two PDF documents.

  2. Comparing Entire Documents - Compare the entire content of two PDF documents.

  3. Compare PDF documents graphically:

  • Compare PDF with GetDifference method - individual images where changes are marked.

  • Compare PDF with CompareDocumentsToPdf method - PDF document with images where changes are marked.

Comparing Specific Pages

The first code snippet demonstrates how to compare the first pages of two PDF documents.

Steps:

  1. Document Initialization. The code starts by initializing two PDF documents using their respective file paths (documentPath1 and documentPath2). The paths are specified as empty strings for now, but in practice, you would replace these with the actual file paths.

  2. Comparison Process.

  • Page Selection - the comparison is limited to the first page of each document (‘Pages[1]’); page numbering starts at 1.
  • Comparison Options:

    ‘AdditionalChangeMarks = true’ - this option ensures that additional change markers are displayed. These markers highlight differences that might be present on other pages, even if they are not on the current page being compared.

    ‘ComparisonMode = ComparisonMode.IgnoreSpaces’ - this mode tells the comparer to ignore spaces in the text, focusing only on changes within words.

  3. Saving the Result. The resulting comparison document, which highlights the differences between the two pages, is saved to the file path specified in ‘resultPdfPath’.
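// Replace the empty strings with actual file paths.
string documentPath1 = "";
string documentPath2 = "";

string resultPdfPath = "";

// Open both documents; they are disposed once the comparison is finished.
using (Document document1 = new Document(documentPath1), document2 = new Document(documentPath2))
{
    // Compare only the first page of each document and save the result as a PDF.
    SideBySidePdfComparer.Compare(document1.Pages[1], document2.Pages[1], resultPdfPath, new SideBySideComparisonOptions()
    {
        // Also show markers for changes located on other pages.
        AdditionalChangeMarks = true,
        // Ignore differences that consist only of spaces.
        ComparisonMode = ComparisonMode.IgnoreSpaces
    });
}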
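To compare several pages individually rather than only the first one, the same page-level call can be repeated in a loop. The following is a minimal sketch rather than an official sample: it reuses only the Compare call and options shown above, assumes that SideBySidePdfComparer lives in the Aspose.Pdf.Comparison namespace, and introduces hypothetical helper names (PageByPageComparison, ResultPathForPage, ComparePagesOneByOne) and a hypothetical output file naming scheme.

```csharp
using System;
using System.IO;
using Aspose.Pdf;
using Aspose.Pdf.Comparison; // assumed namespace of SideBySidePdfComparer

static class PageByPageComparison
{
    // Hypothetical helper: builds the output file name for one page pair.
    public static string ResultPathForPage(string outputDirectory, int pageNumber)
    {
        return Path.Combine(outputDirectory, $"comparison_page_{pageNumber}.pdf");
    }

    // Hypothetical helper: compares pages with the same number and
    // writes one result PDF per page pair.
    public static void ComparePagesOneByOne(string documentPath1, string documentPath2, string outputDirectory)
    {
        using (Document document1 = new Document(documentPath1), document2 = new Document(documentPath2))
        {
            // Pages are numbered from 1; only pages present in both documents are compared.
            int pageCount = Math.Min(document1.Pages.Count, document2.Pages.Count);
            for (int pageNumber = 1; pageNumber <= pageCount; pageNumber++)
            {
                SideBySidePdfComparer.Compare(
                    document1.Pages[pageNumber],
                    document2.Pages[pageNumber],
                    ResultPathForPage(outputDirectory, pageNumber),
                    new SideBySideComparisonOptions()
                    {
                        AdditionalChangeMarks = true,
                        ComparisonMode = ComparisonMode.IgnoreSpaces
                    });
            }
        }
    }
}
```

Pages that exist in only one of the documents are skipped by this sketch; comparing the entire documents, as described in the next section, covers that case.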

Comparing Entire Documents

The second code snippet expands the scope to compare the entire content of two PDF documents.

Steps:

  1. Document Initialization. Just like in the first example, two PDF documents are initialized with their file paths.

  2. Comparison Process.

  • Entire Document Comparison - unlike the first snippet, this code passes the whole documents to the comparer instead of single pages.

  • Comparison Options - the options are the same as in the first snippet, ensuring that spaces are ignored and additional change markers are displayed.

  3. Saving the Result. The comparison result, which highlights differences across all pages of the two documents, is saved in the file specified by ‘resultPdfPath’.
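// Replace the empty strings with actual file paths.
string documentPath1 = "";
string documentPath2 = "";

string resultPdfPath = "";

// Open both documents; they are disposed once the comparison is finished.
using (Document document1 = new Document(documentPath1), document2 = new Document(documentPath2))
{
    // Compare the documents as a whole and save the result as a PDF.
    SideBySidePdfComparer.Compare(document1, document2, resultPdfPath, new SideBySideComparisonOptions()
    {
        // Also show markers for changes located on other pages.
        AdditionalChangeMarks = true,
        // Ignore differences that consist only of spaces.
        ComparisonMode = ComparisonMode.IgnoreSpaces
    });
}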

The comparison results generated by these snippets are PDF documents that you can open in a viewer like Adobe Acrobat. If you use the Two-page view in Adobe Acrobat, you’ll see the changes side by side:

  • Deletions - these are noted on the left page.
  • Insertions - these are noted on the right page.

By setting ‘AdditionalChangeMarks’ to ‘true’, you can also see markers for changes that may occur on other pages, even if those changes aren’t on the current page being viewed.

Aspose.PDF for .NET provides tools for comparing PDF documents, whether you need to compare specific pages or entire documents. By using options like ‘AdditionalChangeMarks’ and different ‘ComparisonMode’ settings, you can tailor the comparison process to your specific needs. The resulting document provides a side-by-side view of changes, making it easier to track revisions and ensure document accuracy.

Compare PDF documents using GraphicalPdfComparer

When collaborating on documents, especially in professional environments, you often end up with multiple versions of the same file.

You can use the GraphicalPdfComparer class to compare PDF documents and pages. The class is suitable for comparing changes in a page’s graphic content.

With Aspose.PDF for .NET, it’s possible to compare documents and pages and output the comparison result to a PDF document or image file.

You can set the following class properties:

  • Resolution - resolution in DPI units for output images, as well as for images generated during the comparison.
  • Color - the color of change marks.
  • Threshold - change threshold in percent. The default value is zero. Setting a value other than zero allows you to ignore graphic changes that are insignificant to you.

The class has a method that allows you to get page image differences in a form suitable for further processing: ImagesDifference GetDifference(Page page1, Page page2).

This method returns an object of the ImagesDifference class, which contains an image of the first page being compared and an array of differences. The array of differences and the original image have the RGB24bpp pixel format.

ImagesDifference allows you to generate a difference image and to get an image of the second page being compared by adding the array of differences to the original image. To do this, use the ImagesDifference.DifferenceToImage and ImagesDifference.GetDestinationImage methods, respectively.
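The relationship between the original image, the array of differences, and the image of the second page can be illustrated without the library. The sketch below is only a conceptual model in plain C#, not the internal format used by ImagesDifference: it assumes both pages are available as RGB 24bpp buffers of equal length and stores one signed difference per color channel. The names DifferenceSketch, ComputeDifference, and ApplyDifference are hypothetical.

```csharp
using System;

static class DifferenceSketch
{
    // Hypothetical helper: per-channel signed difference between two
    // RGB 24bpp buffers (3 bytes per pixel) of equal length.
    public static short[] ComputeDifference(byte[] source, byte[] destination)
    {
        if (source.Length != destination.Length)
            throw new ArgumentException("Buffers must have the same length.");

        var difference = new short[source.Length];
        for (int i = 0; i < source.Length; i++)
            difference[i] = (short)(destination[i] - source[i]);
        return difference;
    }

    // Hypothetical helper: rebuilds the destination buffer by adding
    // the differences back onto the source buffer.
    public static byte[] ApplyDifference(byte[] source, short[] difference)
    {
        if (source.Length != difference.Length)
            throw new ArgumentException("Buffers must have the same length.");

        var destination = new byte[source.Length];
        for (int i = 0; i < source.Length; i++)
            destination[i] = (byte)(source[i] + difference[i]);
        return destination;
    }
}
```

For a white pixel that turns red, ComputeDifference yields 0, -255, -255 for that pixel, and ApplyDifference on the source buffer restores the red pixel. Unchanged pixels produce zeros, which is why a difference array is convenient for further processing.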

Compare PDF with GetDifference method

The following code uses the GetDifference method of the GraphicalPdfComparer class to compare pages of two PDF documents and generate visual representations of the differences between them.

This method compares the first pages of two PDF files and generates two PNG images:

  • One image (diffPngFilePath) highlights the differences between the pages in red.
  • The other image (destPngFilePath) is a visual representation of the destination (second) PDF page.

This process can be useful for visually comparing changes or differences between two versions of a document.

// Replace the empty strings with actual file paths.
string doc1Path = "";
string doc2Path = "";
string destPngFilePath = "";
string diffPngFilePath = "";

using (Document doc1 = new Document(doc1Path), doc2 = new Document(doc2Path))
{
    GraphicalPdfComparer comparer = new GraphicalPdfComparer();
    // Compare the first page of each document.
    using (ImagesDifference imagesDifference = comparer.GetDifference(doc1.Pages[1], doc2.Pages[1]))
    {
        // Save the difference image with the changes marked in red.
        using (Bitmap diffImg = imagesDifference.DifferenceToImage(Color.Red, Color.White))
        {
            diffImg.Save(diffPngFilePath);
        }

        // Save the image of the second page, rebuilt from the first page and the differences.
        using (Bitmap destImg = imagesDifference.GetDestinationImage())
        {
            destImg.Save(destPngFilePath);
        }
    }
}

Compare PDF with CompareDocumentsToPdf method

The following code snippet uses the CompareDocumentsToPdf method, which compares two documents and generates a PDF report of the comparison results.

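// Replace the empty strings with actual file paths.
string firstPath = "";
string secondPath = "";
string resultPdfPath = "";

using (Document doc1 = new Document(firstPath), doc2 = new Document(secondPath))
{
    GraphicalPdfComparer comparer = new GraphicalPdfComparer()
    {
        Threshold = 3.0,                    // change threshold in percent
        Color = Color.Blue,                 // color of the change marks
        Resolution = new Resolution(300)    // resolution in DPI
    };
    // Compare the documents and write the report to resultPdfPath.
    comparer.CompareDocumentsToPdf(doc1, doc2, resultPdfPath);
}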
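Choosing a suitable Threshold value usually takes some experimentation. The following sketch, offered as one possible approach rather than an official sample, generates one report per candidate threshold so the results can be compared by eye. It uses only the Threshold and Resolution properties and the CompareDocumentsToPdf call shown above, and it assumes that GraphicalPdfComparer lives in the Aspose.Pdf.Comparison namespace. The names ThresholdSweep, ReportPath, and CompareWithThresholds, the output file naming, and the 150 DPI setting are hypothetical.

```csharp
using System.Globalization;
using System.IO;
using Aspose.Pdf;
using Aspose.Pdf.Comparison; // assumed namespace of GraphicalPdfComparer
using Aspose.Pdf.Devices;    // Resolution

static class ThresholdSweep
{
    // Hypothetical helper: builds a report file name that includes the threshold value.
    public static string ReportPath(string outputDirectory, double threshold)
    {
        string label = threshold.ToString("0.0", CultureInfo.InvariantCulture);
        return Path.Combine(outputDirectory, $"comparison_threshold_{label}.pdf");
    }

    // Hypothetical helper: writes one comparison report per threshold value.
    public static void CompareWithThresholds(string firstPath, string secondPath, string outputDirectory, double[] thresholds)
    {
        using (Document doc1 = new Document(firstPath), doc2 = new Document(secondPath))
        {
            foreach (double threshold in thresholds)
            {
                GraphicalPdfComparer comparer = new GraphicalPdfComparer()
                {
                    Threshold = threshold,            // change threshold in percent
                    Resolution = new Resolution(150)  // lower DPI keeps the trial runs lighter
                };
                comparer.CompareDocumentsToPdf(doc1, doc2, ReportPath(outputDirectory, threshold));
            }
        }
    }
}
```

Calling CompareWithThresholds with the values 0.0, 1.0, and 3.0, for instance, would produce three reports whose change marks can be inspected to decide which graphic changes are insignificant for a given set of documents.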