Hello, world!
In this article, you will learn how to build a basic Node.js application that loads an image file and extracts the text from it with Aspose.OCR for Node.js via C++.
You will need
- A computer with Node.js 14 or later.
- Any text editor.
- An image containing text. You can simply download the one from this article.
- 15 minutes of spare time.
Preparing
- Create a directory somewhere on your system where the project files will be kept, for example, C:\Aspose-OCR-Example\. This directory will be referred to below as the project directory.
- Create a node_modules directory in the project directory.
- Download the Aspose.OCR for Node.js via C++ distribution package.
- Unpack the downloaded package to the aspose-ocr directory inside the node_modules directory.
- Download a sample image to the project directory under the name source.png.
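Before moving on, it can help to confirm that the project directory contains what the steps above describe. The following is a minimal sketch, not part of Aspose.OCR: the function name findMissingItems is hypothetical, and it only checks that the aspose-ocr directory and source.png exist, not that the package was unpacked correctly.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: returns the expected items that are missing
// from the given project directory (an empty array means all are present).
function findMissingItems(projectDir) {
  const expected = [
    path.join("node_modules", "aspose-ocr"), // unpacked Aspose.OCR package
    "source.png",                            // sample image
  ];
  return expected.filter(
    (item) => !fs.existsSync(path.join(projectDir, item))
  );
}

// Check the directory the script is started from.
const missing = findMissingItems(process.cwd());
console.log(missing.length === 0 ? "Nothing is missing." : "Missing: " + missing.join(", "));
```

Save the sketch under any name in the project directory and run it with node from that directory.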
Coding
- Create an index.js file in the project directory. It will be used as the main (startup) project script.
- Import the Aspose.OCR and file system modules:
  const Module = require("aspose-ocr/lib/asposeocr");
  const fs = require("fs");
- Start recognition once the Aspose.OCR for Node.js via C++ module finishes loading:
  Module.onRuntimeInitialized = async _ => {
    // Recognition code goes here
  }
- Load the image into the module's temporary (virtual) storage:
  fs.readFile("source.png", (err, imageData) => {
    // Stop if the image could not be read
    if (err) throw err;
    const imageBytes = new Uint8Array(imageData);
    let internalFileName = "temp";
    let stream = Module.FS.open(internalFileName, "w+");
    Module.FS.write(stream, imageBytes, 0, imageBytes.length, 0);
    Module.FS.close(stream);
  });
  The binary code of Aspose.OCR for Node.js via C++ is based on WebAssembly (Wasm) technology. As such, it cannot access physical files on disk and works with a virtual file system instead.
  To work around this restriction, we read the image file into an array of bytes and then save it to the virtual file system of the Aspose.OCR module. The code from the following steps goes inside this callback, after Module.FS.close(stream), because it uses the internalFileName variable.
- Add the image to the recognition batch:
  let source = Module.WasmAsposeOCRInput();
  source.url = internalFileName;
  let batch = new Module.WasmAsposeOCRInputs();
  batch.push_back(source);
- Specify the recognition language:
  let recognitionSettings = Module.WasmAsposeOCRRecognitionSettings();
  recognitionSettings.language_alphabet = Module.Language.ENG;
- Send the image for recognition:
  var result = Module.AsposeOCRRecognize(batch, recognitionSettings);
- Output the image text to the console:
  var text = Module.AsposeOCRSerializeResult(result, Module.ExportFormat.text);
  console.log(text);
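The step that copies the image into the virtual file system can be factored into a small helper. The sketch below is an illustration rather than part of the Aspose.OCR API: the name copyToVirtualFs is hypothetical, and it assumes only that the object passed as virtualFs exposes open, write and close with the same arguments used for Module.FS in the steps above. It reads the file synchronously with fs.readFileSync instead of fs.readFile to keep the example short.

```javascript
const fs = require("fs");

// Hypothetical helper: copies a file from disk into a virtual file system
// object that exposes open(), write() and close(), such as Module.FS.
// Returns the number of bytes copied.
function copyToVirtualFs(virtualFs, hostPath, virtualName) {
  // Read the physical file into an array of bytes
  const bytes = new Uint8Array(fs.readFileSync(hostPath));
  // Create (or truncate) the virtual file and write all bytes from position 0
  const stream = virtualFs.open(virtualName, "w+");
  try {
    virtualFs.write(stream, bytes, 0, bytes.length, 0);
  } finally {
    // Close the stream even if the write fails
    virtualFs.close(stream);
  }
  return bytes.length;
}
```

Inside the onRuntimeInitialized handler, it would be called as copyToVirtualFs(Module.FS, "source.png", "temp"), after which "temp" can be assigned to source.url.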
Full listing (index.js)
const Module = require("aspose-ocr/lib/asposeocr");
const fs = require("fs");

Module.onRuntimeInitialized = async _ => {
  // Load image file
  fs.readFile("source.png", (err, imageData) => {
    // Stop if the image could not be read
    if (err) throw err;
    // Save image to the virtual storage
    const imageBytes = new Uint8Array(imageData);
    let internalFileName = "temp";
    let stream = Module.FS.open(internalFileName, "w+");
    Module.FS.write(stream, imageBytes, 0, imageBytes.length, 0);
    Module.FS.close(stream);
    // Add image to recognition batch
    let source = Module.WasmAsposeOCRInput();
    source.url = internalFileName;
    let batch = new Module.WasmAsposeOCRInputs();
    batch.push_back(source);
    // Specify recognition language
    let recognitionSettings = Module.WasmAsposeOCRRecognitionSettings();
    recognitionSettings.language_alphabet = Module.Language.ENG;
    // Send image for OCR
    var result = Module.AsposeOCRRecognize(batch, recognitionSettings);
    // Output image text to the console
    var text = Module.AsposeOCRSerializeResult(result, Module.ExportFormat.text);
    console.log(text);
  });
}
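The listing above nests the recognition code inside two callbacks. If you prefer async/await, the wait for the module to finish loading can be wrapped in a promise. This is a minimal sketch under stated assumptions: the name whenRuntimeReady is hypothetical, and it relies only on the onRuntimeInitialized callback already used in the listing. It also assumes the helper is called right after require, before the module has finished loading.

```javascript
// Hypothetical helper: returns a promise that resolves when the module
// invokes its onRuntimeInitialized callback.
function whenRuntimeReady(wasmModule) {
  return new Promise((resolve) => {
    // Resolve without a value; the caller already holds the module object
    wasmModule.onRuntimeInitialized = () => resolve();
  });
}
```

With this helper, the startup script could call await whenRuntimeReady(Module) inside an async function and then run the steps from the listing in sequence.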
Running
- Open the command prompt and navigate to the project directory.
- Run the index.js script with the following command:
node --no-experimental-fetch index
- Wait for recognition to complete. It may take a while depending on the image size and your system performance.
You will see the extracted text in the console output:
Hello. World! I can read this text
What’s next
Congratulations! You have performed OCR on an image and extracted the machine-readable text from it using Node.js.