DOC Converter

This article guides you through using the Aspose.Pdf DOC Converter for .NET to convert a PDF document to Microsoft Word format (.doc / .docx).

Prerequisites

You will need the following:

  • Visual Studio 2019 or later
  • Aspose.PDF for .NET 24.1 or later
  • A sample PDF file that contains some form fields

You can download the Aspose.PDF for .NET library from the official website or install them using the NuGet Package Manager in Visual Studio.

Steps

1. Setting Up Your Conversion (screenshot of FileDataSource class)

The conversion process involves three main steps: defining input and output files, creating a PdfDoc object, and specifying conversion options.

1.1. Defining Data Sources

  • Input File: We’ll use the FileDataSource class to specify the location of the PDF file you want to convert.
  FileDataSource inputDataSource = new(Path.Combine(@"C:\Samples\", "sample.pdf"));
  • Replace "C:\Samples\sample.pdf" with the actual path to your PDF file.

  • Output File: Similarly, use another FileDataSource object to define the location and filename for the resulting Word document.

  FileDataSource outputDataSource = new(Path.Combine(@"C:\Samples\", "sample.docx"));
  • Replace "C:\Samples\sample.docx" with your desired output path and filename.

2. Creating the PdfDoc Plugin Object (screenshot of PdfDoc class)

Next, we create an instance of the PdfDoc class to perform the conversion.

  var plugin = new PdfDoc();

This object serves as the engine for the conversion process.

3. Configuring Conversion Options

The PdfToDocOptions class allows you to fine-tune the conversion process. Here’s how to set the essential options:

  • Save Format: Specify the desired output format for the Word document. In this case, we use SaveFormat.DocX to generate a Microsoft Word 2007 or later compatible document (.docx).

  • Conversion Mode: Define how the plugin interprets the PDF structure during conversion. We’ll use ConversionMode.EnhancedFlow to optimize the resulting Word document for layout and formatting.

Here’s the code snippet for configuring options:

  PdfToDocOptions options = new()
  {
      SaveFormat = SaveFormat.DocX,
      ConversionMode = ConversionMode.EnhancedFlow
  };

Adding Input and Output:

Finally, we associate the previously defined data sources with the conversion options using the AddInput and AddOutput methods:

  options.AddInput(inputDataSource);
  options.AddOutput(outputDataSource);

This connects the input PDF and desired output Word document to the conversion process.

4. Performing the Conversion

With everything set up, let’s initiate the conversion by calling the Process method of the PdfDoc plugin and passing the configured options:

  var resultContainer = plugin.Process(options);

This method executes the conversion and returns a ResultContainer object containing details about the process.

Retrieving Results:

Although not essential for basic conversion, you can access the results through the ResultCollection property of the ResultContainer object. This might be useful for debugging or tracking specific conversion details.

  var result = resultContainer.ResultCollection[0];

  // Print the result (optional for demonstration purposes)
  Console.WriteLine(result);

With this final step, your PDF document will be converted to the specified Word format and saved to the defined output location.