Extracting text from passport images
Automatic passport recognition and verification is a common task in many areas, including border control, banking, and security. Retyping passport data by hand is error-prone and time-consuming, and mistakes can lead to security breaches and other undesirable consequences.
Aspose.OCR for Python via .NET offers a special recognition algorithm that extracts text from scanned or photographed passports. The extracted text can then be saved to a database or verified automatically.
To extract text from a passport image, use the recognize_passport() method of the AsposeOcr class.
This method allows you to customize recognition accuracy, performance, and other settings.
The method takes an OcrInput object and returns an OcrOutput object containing the passport data.
# Import the Aspose.OCR package under the ocr alias
import aspose.ocr as ocr

# Instantiate Aspose.OCR API
api = ocr.AsposeOcr()
# Add images to the recognition batch
ocr_input = ocr.OcrInput(ocr.InputType.SINGLE_IMAGE)
ocr_input.add("passport1.png")
ocr_input.add("passport2.png")
# Set recognition language
settings = ocr.PassportRecognitionSettings()
settings.language = ocr.Language.Latin
# Recognize passports
results = api.recognize_passport(ocr_input, settings)
# Print the recognized text of each passport
for result in results:
    print(result.recognition_text)
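Once the text is recognized, it can be stored for later processing. The following is a minimal sketch of one way to do that with Python's built-in sqlite3 module. The table name, the column names, and the save_recognition_text() helper are hypothetical and not part of Aspose.OCR, and the sample string is a placeholder standing in for result.recognition_text:

```python
import sqlite3


def save_recognition_text(conn, source, text):
    """Store the recognized text of one passport image and return the new row ID."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS passports ("
            "id INTEGER PRIMARY KEY, "
            "source TEXT NOT NULL, "
            "recognized_text TEXT NOT NULL)"
        )
        cursor = conn.execute(
            "INSERT INTO passports (source, recognized_text) VALUES (?, ?)",
            (source, text),
        )
    return cursor.lastrowid


# Placeholder text; in a real run, pass result.recognition_text instead
conn = sqlite3.connect(":memory:")
row_id = save_recognition_text(conn, "passport1.png", "SAMPLE PASSPORT TEXT")
stored = conn.execute(
    "SELECT source, recognized_text FROM passports WHERE id = ?", (row_id,)
).fetchone()
print(stored)
```

The query uses parameter placeholders rather than string formatting, so recognized text containing quotes or other special characters is stored safely.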
Extracting passport details
Besides recognizing passport text, this method can extract essential information from a passport image, such as the date of birth, names, and more. The specific details extracted depend on the passport’s origin, which is specified in the country parameter of the recognition settings.
To retrieve the passport details, use the get_keywords() method of the recognition results object. The information is returned as a collection of Keyword objects, each representing a single passport detail as a key-value pair:
key - passport detail ID, for example DATE_OF_BIRTH;
value - specific passport detail, for example 1 Sep 2000.
Example
The following code snippet shows how to extract key details from a US passport:
# Import the Aspose.OCR package under the ocr alias
import aspose.ocr as ocr

# Instantiate Aspose.OCR API
api = ocr.AsposeOcr()
# Add image to the recognition batch
ocr_input = ocr.OcrInput(ocr.InputType.SINGLE_IMAGE)
ocr_input.add("us_passport.png")
# Enable US passport recognition
settings = ocr.PassportRecognitionSettings()
settings.country = ocr.Country.USA
# Extract passport details from the first recognized image
results = api.recognize_passport(ocr_input, settings)
details = results[0].get_keywords()
# Print each detail ID followed by its recognized value
for detail in details:
    print(detail.key)
    print(detail.value.text_in_line)
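For saving or verifying the details, it is often more convenient to collect them into a dictionary than to print them one by one. The sketch below assumes only what the example above shows: each detail has a key attribute and a value attribute that may expose text_in_line. The keywords_to_dict() helper is hypothetical, and the SimpleNamespace objects are stand-ins for real Keyword objects so the snippet runs without a passport image:

```python
from types import SimpleNamespace


def keywords_to_dict(keywords):
    """Collect Keyword-like objects into a plain {detail ID: text} dictionary."""
    details = {}
    for keyword in keywords:
        value = keyword.value
        # Use text_in_line when the value exposes it; otherwise use the value itself
        text = getattr(value, "text_in_line", value)
        details[str(keyword.key)] = str(text).strip()
    return details


# Stand-in data; in a real run, pass results[0].get_keywords() instead
sample = [
    SimpleNamespace(key="DATE_OF_BIRTH", value=SimpleNamespace(text_in_line="1 Sep 2000 ")),
    SimpleNamespace(key="SURNAME", value="DOE"),
]
passport = keywords_to_dict(sample)
print(passport["DATE_OF_BIRTH"])
```

With the details in a dictionary, a single field can be looked up by its ID and compared against a value from another source, such as an application form.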