How to Create a PDF Using C#

We are always looking for ways to generate and work with PDF documents in C# projects more accurately and efficiently. Easy-to-use library functions let us focus on the work itself rather than on the time-consuming details of generating PDFs in .NET.

The following code snippets also work with the Aspose.PDF.Drawing library.

Create (or Generate) a PDF Document Using C#

The Aspose.PDF for .NET API lets you create and read PDF files using C# and VB.NET. You can use the API in many kinds of .NET applications, including WinForms and ASP.NET applications. This article shows how to use Aspose.PDF for .NET to generate and read PDF files in .NET applications.

How to Create a Simple PDF File

To create a PDF file using C#, follow these steps:

  1. Create an object of the Document class.
  2. Add a Page object to the Pages collection of the Document object.
  3. Add a TextFragment to the Paragraphs collection of the page.
  4. Save the resulting PDF document.

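using System.IO;
using Aspose.Pdf;
using Aspose.Pdf.Text;

// The path to the documents directory (RunExamples is a helper from the Aspose.PDF for .NET examples project)
string dataDir = RunExamples.GetDataDir_AsposePdf_QuickStart();

// Initialize the document object; Document implements IDisposable
using (Document document = new Document())
{
    // Add a page
    Page page = document.Pages.Add();
    // Add text to the new page
    page.Paragraphs.Add(new TextFragment("Hello World!"));
    // Save the PDF; Path.Combine works whether or not dataDir ends with a separator
    document.Save(Path.Combine(dataDir, "HelloWorld_out.pdf"));
}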
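If you want a quick sanity check that the file was actually written as a PDF, a small plain .NET helper can look at its first bytes. This is a sketch, not part of Aspose.PDF: the helper name HasPdfHeader and the command-line usage are illustrative assumptions. It only checks the "%PDF-" signature that PDF files begin with and does not validate the rest of the document.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

// Hypothetical helper: returns true when the file starts with the "%PDF-" signature.
// It checks the header only; it does not validate the rest of the document.
static bool HasPdfHeader(string path)
{
    byte[] expected = Encoding.ASCII.GetBytes("%PDF-");
    byte[] actual = new byte[expected.Length];
    using FileStream stream = File.OpenRead(path);
    int read = 0;
    // Stream.Read may return fewer bytes than requested, so keep reading until the buffer is full.
    while (read < actual.Length)
    {
        int n = stream.Read(actual, read, actual.Length - read);
        if (n == 0)
            return false; // the file is shorter than the signature
        read += n;
    }
    return actual.SequenceEqual(expected);
}

// Usage: pass the path of the saved file as the first command-line argument.
string outputPath = args.Length > 0 ? args[0] : "HelloWorld_out.pdf";
if (File.Exists(outputPath))
    Console.WriteLine(HasPdfHeader(outputPath) ? "PDF header found" : "Not a PDF file");
else
    Console.WriteLine($"File not found: {outputPath}");
```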

How to Create a Searchable PDF Document

Aspose.PDF for .NET can create new PDF documents as well as manipulate existing ones. When you add text elements to a PDF file, the resulting PDF is searchable. However, if you convert an image that contains text to a PDF file, the text inside the PDF is not searchable. As a workaround, you can run OCR on the resulting file so that it becomes searchable.

The logic below recognizes text in the images of a PDF. For recognition, you can use any external OCR engine that supports the hOCR standard. For testing purposes, we used the free Google Tesseract OCR engine, so you first need to install Tesseract-OCR on your system to make the tesseract console application available.
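hOCR is an XHTML-based format in which the OCR engine marks up recognized words together with their positions on the image. To illustrate what such output contains, here is a small plain .NET sketch (not Aspose.PDF code) that lists words and bounding boxes from an hOCR fragment. It assumes Tesseract-style markup, where each word is a span with class ocrx_word and a title such as "bbox 36 92 96 116; x_wconf 93"; the helper name ReadHocrWords and the sample markup are hypothetical. The sketch was written against the hand-written fragment below only; real Tesseract files start with a DOCTYPE declaration that may need to be removed before XDocument.Parse accepts them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: returns each recognized word with its bounding box (x0, y0, x1, y1 in pixels).
static List<(string Word, int[] BBox)> ReadHocrWords(string hocr)
{
    var words = new List<(string Word, int[] BBox)>();
    foreach (XElement element in XDocument.Parse(hocr).Descendants())
    {
        // Compare local names so the XHTML namespace declaration does not matter.
        if (element.Name.LocalName != "span" || (string)element.Attribute("class") != "ocrx_word")
            continue;

        // The title holds ';'-separated properties, e.g. "bbox 36 92 96 116; x_wconf 93".
        string title = (string)element.Attribute("title") ?? "";
        string bbox = title.Split(';')[0].Trim();
        if (!bbox.StartsWith("bbox ", StringComparison.Ordinal))
            continue;

        int[] coords = bbox.Substring(5)
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(s => int.Parse(s))
            .ToArray();
        words.Add((element.Value, coords));
    }
    return words;
}

// A hand-written hOCR fragment in the style Tesseract produces.
string sample =
    "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>" +
    "<span class=\"ocr_line\" title=\"bbox 36 92 210 116\">" +
    "<span class=\"ocrx_word\" title=\"bbox 36 92 96 116; x_wconf 93\">Hello</span> " +
    "<span class=\"ocrx_word\" title=\"bbox 108 92 210 116; x_wconf 91\">World!</span>" +
    "</span></body></html>";

foreach (var (word, box) in ReadHocrWords(sample))
    Console.WriteLine($"{word}: [{string.Join(", ", box)}]");
```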

The following is the complete code for this task:

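using System;
using Aspose.Pdf;

// 'file' holds the path of the PDF to make searchable; it is assumed to be defined earlier.
using (Document document = new Document(file))
{
    bool convertResult = false;
    try
    {
        // Run OCR over the images in the document; Aspose.PDF calls CallBackGetHocr for each image
        convertResult = document.Convert(CallBackGetHocr);
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex.Message);
    }
    Console.WriteLine("OCR conversion result: " + convertResult);
    document.Save(file);
}

// Saves the image to a temporary file, runs tesseract on it, and returns the resulting hOCR markup
static string CallBackGetHocr(System.Drawing.Image img)
{
    string tmpFile = System.IO.Path.GetTempFileName();
    try
    {
        // Write the image as a BMP file; dispose the bitmap as soon as it is saved
        using (System.Drawing.Bitmap bmp = new System.Drawing.Bitmap(img))
        {
            bmp.Save(tmpFile, System.Drawing.Imaging.ImageFormat.Bmp);
        }

        // Command line: tesseract "<image>" "<output base>" -l eng hocr
        // Tesseract appends ".hocr" to the output base name.
        string inputFile = string.Concat('"', tmpFile, '"');
        string outputFile = string.Concat('"', tmpFile, '"');
        string arguments = string.Concat(inputFile, " ", outputFile, " -l eng hocr");
        // Adjust this path to match your Tesseract installation
        string tesseractProcessName = @"C:\Program Files\Tesseract-OCR\Tesseract.exe";

        System.Diagnostics.ProcessStartInfo psi =
            new System.Diagnostics.ProcessStartInfo(tesseractProcessName, arguments)
            {
                // CreateNoWindow is ignored when UseShellExecute is true
                UseShellExecute = false,
                CreateNoWindow = true,
                WorkingDirectory = System.IO.Path.GetDirectoryName(tesseractProcessName)
            };

        // Start tesseract, wait for it to finish, and fail clearly if it reports an error
        using (System.Diagnostics.Process p = System.Diagnostics.Process.Start(psi))
        {
            p.WaitForExit();
            if (p.ExitCode != 0)
            {
                throw new InvalidOperationException("Tesseract exited with code " + p.ExitCode);
            }
        }

        // Return the hOCR markup produced by tesseract to Aspose.PDF
        return System.IO.File.ReadAllText(tmpFile + ".hocr");
    }
    finally
    {
        // Remove the temporary image and hOCR files
        if (System.IO.File.Exists(tmpFile))
        {
            System.IO.File.Delete(tmpFile);
        }
        if (System.IO.File.Exists(tmpFile + ".hocr"))
        {
            System.IO.File.Delete(tmpFile + ".hocr");
        }
    }
}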
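The callback above builds the tesseract command line by hand and quotes each path itself. On .NET Core 2.1 and later, ProcessStartInfo.ArgumentList can do the quoting for you. The sketch below shows that variant; CreateTesseractStartInfo and HocrPathFor are hypothetical helper names, and the executable path is the same default installation path assumed in the code above. As in the callback, tesseract is expected to write its result to the output base name plus ".hocr".

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical helper: builds the start info for  tesseract <image> <outputBase> -l <language> hocr
// ArgumentList quotes each entry as needed, so paths containing spaces need no manual quoting.
static ProcessStartInfo CreateTesseractStartInfo(string tesseractExe, string imagePath, string outputBase, string language = "eng")
{
    var psi = new ProcessStartInfo(tesseractExe)
    {
        UseShellExecute = false, // required for CreateNoWindow to take effect
        CreateNoWindow = true,
    };
    psi.ArgumentList.Add(imagePath);
    psi.ArgumentList.Add(outputBase);
    psi.ArgumentList.Add("-l");
    psi.ArgumentList.Add(language);
    psi.ArgumentList.Add("hocr");
    return psi;
}

// Tesseract writes the hOCR result next to the output base, adding the ".hocr" extension.
static string HocrPathFor(string outputBase) => outputBase + ".hocr";

// Show the command that would be run for a temp path that contains a space.
string imagePath = Path.Combine(Path.GetTempPath(), "page 1.bmp");
ProcessStartInfo info = CreateTesseractStartInfo(@"C:\Program Files\Tesseract-OCR\Tesseract.exe", imagePath, imagePath);
Console.WriteLine($"{info.FileName} {string.Join(" ", info.ArgumentList)}");
Console.WriteLine(HocrPathFor(imagePath));
```

To run it, pass the start info to Process.Start, wait for exit, and read the file returned by HocrPathFor, as the callback does.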

How to Create an Accessible PDF Using Low-Level Functions

This code snippet works with a PDF document and its tagged content, using the Aspose.PDF library to process it.

The example creates a new span element in the tagged content of a PDF, finds all BDC (begin marked-content) operators on the first page, and associates them with the span. The modified document is then saved.

You can create a BDC operator that specifies the MCID, language, and expansion text by using a BDCProperties object:

BDC bdc = new BDC(PdfConsts.P, new BDCProperties(1, "de", "Hallo, welt!"));
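To see roughly what such an operator looks like inside a page's content stream, the following plain C# sketch renders a BDC operator with an inline property list. It assumes that the three BDCProperties arguments map to the standard PDF marked-content keys MCID, Lang, and E (expansion text); the helper names FormatBdc and PdfLiteral are hypothetical, and only ASCII text is handled.

```csharp
using System;
using System.Text;

// Hypothetical helper: writes a PDF literal string, escaping backslashes and parentheses.
// ASCII only; non-ASCII text would need PDFDocEncoding or UTF-16BE, which this sketch skips.
static string PdfLiteral(string s)
{
    var sb = new StringBuilder("(");
    foreach (char c in s)
    {
        if (c == '\\' || c == '(' || c == ')')
            sb.Append('\\');
        sb.Append(c);
    }
    return sb.Append(')').ToString();
}

// Hypothetical helper: renders "/Tag <</MCID n /Lang (..) /E (..)>> BDC".
// Assumption: these are the keys BDCProperties(mcid, lang, expansionText) corresponds to.
static string FormatBdc(string tag, int mcid, string lang, string expansionText) =>
    $"/{tag} <</MCID {mcid} /Lang {PdfLiteral(lang)} /E {PdfLiteral(expansionText)}>> BDC";

Console.WriteLine(FormatBdc("P", 1, "de", "Hallo, welt!"));
```

For these arguments the sketch prints /P <</MCID 1 /Lang (de) /E (Hallo, welt!)>> BDC. The marked content that follows such an operator is closed by a matching EMC operator.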

After creating the structure tree, you can bind the BDC operator to a specific structure element by calling the Tag method on the element object:

SpanElement span = content.CreateSpanElement();
span.Tag(bdc);

Steps to create an accessible PDF:

  1. Load the PDF document.
  2. Access the tagged content.
  3. Create a span element.
  4. Append the span to the root element.
  5. Iterate over the page contents.
  6. Check for BDC operators and tag them.
  7. Save the modified document.

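using Aspose.Pdf;
using Aspose.Pdf.LogicalStructure;
using Aspose.Pdf.Operators;
using Aspose.Pdf.Tagged;

// somepdffilepath and output are placeholders for the input and output file paths
using (var document = new Document(somepdffilepath))
{
    // Access the document's tagged content (structure tree)
    ITaggedContent content = document.TaggedContent;
    // Create a span element and attach it to the root of the structure tree
    SpanElement span = content.CreateSpanElement();
    content.RootElement.AppendChild(span);
    // Tag every BDC operator on the first page (pages are 1-based) with the span
    foreach (var op in document.Pages[1].Contents)
    {
        if (op is BDC bdc)
        {
            span.Tag(bdc);
        }
    }

    document.Save(output);
}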

This code modifies a PDF by creating a span element in the document’s tagged content and tagging the marked content (BDC operators) on the first page with this span. The modified PDF is then saved to a new file.
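The loop above picks BDC operators out of the page's parsed operator collection. To illustrate what those operators and their MCIDs look like in an uncompressed content stream, here is a deliberately simplified plain C# sketch. FindBdcMcids is a hypothetical helper that uses a regular expression instead of a real content-stream parser, so it only recognizes inline property lists such as /P <</MCID 0>> BDC and misses named property resources or dictionaries that contain a '>' character.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: lists (tag, MCID) pairs for BDC operators with inline property lists.
// Simplification: regex matching, not a content-stream tokenizer.
static List<(string Tag, int Mcid)> FindBdcMcids(string contentStream)
{
    var result = new List<(string Tag, int Mcid)>();
    var pattern = new Regex(@"/(\w+)\s*<<[^>]*?/MCID\s+(\d+)[^>]*>>\s*BDC");
    foreach (Match m in pattern.Matches(contentStream))
        result.Add((m.Groups[1].Value, int.Parse(m.Groups[2].Value)));
    return result;
}

// A hand-written, uncompressed content stream with two marked-content sequences.
string stream =
    "/P <</MCID 0>> BDC BT /F1 12 Tf 72 712 Td (Hello World!) Tj ET EMC\n" +
    "/Span <</MCID 1 /Lang (de)>> BDC BT (Hallo) Tj ET EMC";

foreach (var (tag, mcid) in FindBdcMcids(stream))
    Console.WriteLine($"{tag}: MCID {mcid}");
```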