Developers often need to generate PDF documents and work with them in C# projects accurately and efficiently. Easy-to-use library functions let us focus on the work itself rather than on the time-consuming details of generating PDFs in .NET.
The following code snippets also work with the Aspose.PDF.Drawing library.
The Aspose.PDF for .NET API lets you create and read PDF files using C# and VB.NET. The API can be used in a variety of .NET applications, including WinForms, ASP.NET, and several others. This article shows how to use the Aspose.PDF for .NET API to generate and read PDF files in .NET applications.
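To create a PDF file using C#, create a Document object, add a page to its Pages collection, add a TextFragment to the page's Paragraphs collection, and save the document: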
// For complete examples and data files, visit https://github.com/aspose-pdf/Aspose.PDF-for-.NET
private static void CreateHelloWorldDocument()
{
    // The path to the documents directory
    var dataDir = RunExamples.GetDataDir_AsposePdf_QuickStart();
    // Create PDF document
    using (var document = new Aspose.Pdf.Document())
    {
        // Add page
        var page = document.Pages.Add();
        // Add text to new page
        page.Paragraphs.Add(new Aspose.Pdf.Text.TextFragment("Hello World!"));
        // Save PDF document
        document.Save(dataDir + "HelloWorld_out.pdf");
    }
}
Aspose.PDF for .NET can create new PDF documents as well as manipulate existing ones. When text elements are added to a PDF file, the resulting PDF is searchable. However, when an image containing text is converted to a PDF file, the contents of the PDF are not searchable. As a workaround, we can run OCR over the resulting file so that it becomes searchable.
The logic below recognizes text in the images of a PDF. For recognition, you may use any external OCR engine that supports the hOCR standard. For testing purposes, we used the free Google Tesseract OCR. You therefore first need to install Tesseract-OCR on your system, which provides the tesseract console application.
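To make the role of hOCR more concrete: an hOCR document is HTML in which each recognized word carries its bounding box, which is the information needed to place searchable text over an image. The following is a minimal sketch, not part of the Aspose.PDF API. The `HocrWord` and `HocrSketch` names are hypothetical, the sample markup is hand-written in the style Tesseract typically emits, and the regular expression is a simplification (a production parser should use a real HTML or XML parser).

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical type for illustration: one recognized word and its bounding box in pixels.
public sealed class HocrWord
{
    public HocrWord(string text, int left, int top, int right, int bottom)
    {
        Text = text; Left = left; Top = top; Right = right; Bottom = bottom;
    }

    public string Text { get; }
    public int Left { get; }
    public int Top { get; }
    public int Right { get; }
    public int Bottom { get; }
}

public static class HocrSketch
{
    // Hand-written sample (an assumption), modeled on typical Tesseract hOCR output.
    public const string Sample =
        "<div class='ocr_page' id='page_1' title='bbox 0 0 640 480'>" +
        "<span class='ocr_line' id='line_1_1' title='bbox 36 92 200 116'>" +
        "<span class='ocrx_word' id='word_1_1' title='bbox 36 92 96 116; x_wconf 95'>Hello</span> " +
        "<span class='ocrx_word' id='word_1_2' title='bbox 110 92 200 116; x_wconf 93'>World!</span>" +
        "</span></div>";

    // Matches word-level spans and captures the four bbox numbers and the word text.
    private static readonly Regex WordPattern = new Regex(
        @"<span\s+class=['""]ocrx_word['""][^>]*?title=['""]bbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)[^'""]*['""][^>]*>([^<]*)</span>",
        RegexOptions.IgnoreCase);

    public static List<HocrWord> ExtractWords(string hocr)
    {
        var words = new List<HocrWord>();
        foreach (Match m in WordPattern.Matches(hocr))
        {
            words.Add(new HocrWord(
                WebUtility.HtmlDecode(m.Groups[5].Value),
                int.Parse(m.Groups[1].Value, CultureInfo.InvariantCulture),
                int.Parse(m.Groups[2].Value, CultureInfo.InvariantCulture),
                int.Parse(m.Groups[3].Value, CultureInfo.InvariantCulture),
                int.Parse(m.Groups[4].Value, CultureInfo.InvariantCulture)));
        }
        return words;
    }
}
```

Calling `HocrSketch.ExtractWords(HocrSketch.Sample)` yields the two words of the sample together with their pixel coordinates. Aspose.PDF consumes the raw hOCR string directly, so this parsing step is only meant to show what the string contains.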
The following is the complete code to accomplish this requirement:
// For complete examples and data files, visit https://github.com/aspose-pdf/Aspose.PDF-for-.NET
// Requires: using System.IO; (for Path, File, and StreamReader)
private static void CreateSearchableDocument()
{
    // The path to the documents directory
    var dataDir = RunExamples.GetDataDir_AsposePdf_QuickStart();
    // Open PDF document
    using (var document = new Aspose.Pdf.Document(dataDir + "SearchableDocument.pdf"))
    {
        // Run OCR on each image through the callback and add the recognized text
        document.Convert(CallBackGetHocr);
        // Save PDF document
        document.Save(dataDir + "SearchableDocument_out.pdf");
    }
}

// Saves the image to a temporary file, runs Tesseract on it, and returns the hOCR output
private static string CallBackGetHocr(System.Drawing.Image img)
{
    var tmpFile = Path.GetTempFileName();
    try
    {
        using (var bmp = new System.Drawing.Bitmap(img))
        {
            bmp.Save(tmpFile, System.Drawing.Imaging.ImageFormat.Bmp);
        }
        // Input and output base name are the same; Tesseract appends ".hocr" to the output
        var inputFile = string.Concat('"', tmpFile, '"');
        var outputFile = string.Concat('"', tmpFile, '"');
        var arguments = string.Concat(inputFile, " ", outputFile, " -l eng hocr");
        var tesseractProcessName = RunExamples.GetTesseractExePath();
        var psi = new System.Diagnostics.ProcessStartInfo(tesseractProcessName, arguments)
        {
            UseShellExecute = true,
            CreateNoWindow = true,
            WindowStyle = System.Diagnostics.ProcessWindowStyle.Hidden,
            WorkingDirectory = Path.GetDirectoryName(tesseractProcessName)
        };
        var p = new System.Diagnostics.Process
        {
            StartInfo = psi
        };
        p.Start();
        p.WaitForExit();
        using (var streamReader = new StreamReader(tmpFile + ".hocr"))
        {
            string text = streamReader.ReadToEnd();
            return text;
        }
    }
    finally
    {
        // Remove the temporary image and the hOCR file
        if (File.Exists(tmpFile))
        {
            File.Delete(tmpFile);
        }
        if (File.Exists(tmpFile + ".hocr"))
        {
            File.Delete(tmpFile + ".hocr");
        }
    }
}
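The callback above does not check whether Tesseract actually succeeded before it reads the `.hocr` file. The following is a minimal sketch of a slightly more defensive variant, using only `System.Diagnostics` and `System.IO`. The `TesseractSketch` class and its method names are hypothetical, and the sketch assumes the same command-line convention as the sample above (`"<image>" "<output base>" -l eng hocr`, with Tesseract appending `.hocr` to the output base).

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical helper for illustration; not part of Aspose.PDF or Tesseract.
public static class TesseractSketch
{
    // Builds the same argument string as the sample above: quoted input, quoted output base, language, hocr config.
    public static string BuildArguments(string imagePath, string outputBase, string language = "eng")
    {
        return string.Format("\"{0}\" \"{1}\" -l {2} hocr", imagePath, outputBase, language);
    }

    // Runs Tesseract on an image file and returns the hOCR text, throwing if the process reports failure.
    public static string RunAndReadHocr(string tesseractExePath, string imagePath)
    {
        string outputBase = imagePath;           // Tesseract writes outputBase + ".hocr"
        string hocrPath = outputBase + ".hocr";

        var psi = new ProcessStartInfo(tesseractExePath, BuildArguments(imagePath, outputBase))
        {
            UseShellExecute = false,             // required for stream redirection
            CreateNoWindow = true,
            RedirectStandardError = true
        };

        try
        {
            using (var process = Process.Start(psi))
            {
                if (process == null)
                {
                    throw new InvalidOperationException("Could not start Tesseract.");
                }
                // Read stderr before waiting so a full pipe cannot block the child process.
                string errors = process.StandardError.ReadToEnd();
                process.WaitForExit();
                if (process.ExitCode != 0)
                {
                    throw new InvalidOperationException(
                        "Tesseract exited with code " + process.ExitCode + ": " + errors);
                }
            }
            return File.ReadAllText(hocrPath);
        }
        finally
        {
            // Remove the hOCR file; the caller still owns the image file.
            if (File.Exists(hocrPath))
            {
                File.Delete(hocrPath);
            }
        }
    }
}
```

A callback could save the image to a temporary file as before and then return `TesseractSketch.RunAndReadHocr(tesseractExePath, tmpFile)`. Setting `UseShellExecute = false` is a design choice here: it allows standard error to be captured, and `CreateNoWindow` only takes effect in that mode.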
The following code snippet works with a PDF document and its tagged content, using the Aspose.PDF library to process it.
The example creates a new span element in the tagged content of the document, finds all BDC operators on the first page, and associates them with the span. The modified document is then saved.
You can create a BDC operator that specifies the MCID, language, and expansion text by using the BDCProperties object:
var bdc = new Aspose.Pdf.Operators.BDC("P", new Aspose.Pdf.Facades.BDCProperties(1, "de", "Hallo, welt!"));
After creating the structure tree, you can bind the BDC operator to a specific structure element by calling the Tag method on the element object:
Aspose.Pdf.LogicalStructure.SpanElement span = content.CreateSpanElement();
span.Tag(bdc);
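To create an accessible PDF, open the document, get its tagged content, create a span element and append it to the root element, tag every BDC operator on the first page with the span, and save the result: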
// For complete examples and data files, visit https://github.com/aspose-pdf/Aspose.PDF-for-.NET
private static void CreateAnAccessibleDocument()
{
    // The path to the documents directory
    var dataDir = RunExamples.GetDataDir_AsposePdf_QuickStart();
    // Open PDF document
    using (var document = new Aspose.Pdf.Document(dataDir + "tourguidev2_gb_tags.pdf"))
    {
        // Access tagged content
        Aspose.Pdf.Tagged.ITaggedContent content = document.TaggedContent;
        // Create a span element
        Aspose.Pdf.LogicalStructure.SpanElement span = content.CreateSpanElement();
        // Append span to root element
        content.RootElement.AppendChild(span);
        // Iterate over page contents
        foreach (var op in document.Pages[1].Contents)
        {
            var bdc = op as Aspose.Pdf.Operators.BDC;
            if (bdc != null)
            {
                span.Tag(bdc);
            }
        }
        // Save PDF document
        document.Save(dataDir + "AccessibleDocument_out.pdf");
    }
}
This code modifies a PDF by creating a span element within the document's tagged content and tagging the marked content (BDC operators) on the first page with this span. The modified PDF is then saved to a new file.