Recognition languages
To recognize text in certain languages, you must install additional recognition models:
- Cyrillic text recognition: aspose-ocr-cyrillic-v1
- Chinese text recognition: aspose-ocr-chinese-v2
- Indic (Devanagari) text recognition: aspose-ocr-hindi-v2
- Arabic text recognition: aspose-ocr-arabic-v1
- Persian (Farsi) text recognition: aspose-ocr-arabic-v1
- Uyghur text recognition: aspose-ocr-arabic-v1
- Urdu text recognition: aspose-ocr-arabic-v1
- Japanese text recognition: aspose-ocr-japanese-v1
- Korean text recognition: aspose-ocr-korean-v1
- Kannada text recognition: aspose-ocr-kannada-v1
- Tamil text recognition: aspose-ocr-tamil-v1
- Telugu text recognition: aspose-ocr-telugu-v1
- Mixed-language Cyrillic/English recognition: aaspose-ocr-cyrillic-v2
- Mongolian text recognition: aspose-ocr-cyrillic-v1
Aspose.OCR for Python via .NET can recognize a text in a large number of languages and all popular writing scripts, including texts with mixed languages.
To specify a language for recognition, provide one of the following values in language
property of recognition settings:
Value | Alphabet |
---|---|
Language.MULTILANGUAGE Aspose.OCR.Language.AUTO Aspose.OCR.Language.UNIVERSAL |
Automatically detects the language. Supports multiple language families, including Latin, Cyrillic, Arabic, Chinese and more. |
Language.EXT_LATIN |
Auto-detect all supported Latin characters and diacritics |
Language.CYRILLIC |
Auto-detect all supported Cyrillic characters |
Language.CHINESE |
All Chinese languages. Mixed-language Chinese/English texts also supported. |
Language.DEVANAGARI Language.INDIC |
Indic texts based on Devanagari script, including mixed Devanagari/English texts. |
Language.EUROPEAN |
Mixed-language Cyrillic/English texts (experimental). |
Language.PERSO_ARABIC Language.ISLAMIC |
Mixed-language texts with Arabic, Persian and English. |
Language.AFR
| Afrikaans
Language.ALN
| Albanian
Language.ARA
| Arabic, including texts in mixed Arabic/English
Language.AWA
| Awadhi
Language.AZB
| Azerbaijani (Azeri)
Language.BCL
| Bikol
Language.BEL
| Belarusan (Belorussian)
Language.BEM
| Bemba (Chibemba)
Language.BEW
| Betawi
Language.BGC
| Haryanvi
Language.BHO
| Bhojpuri
Language.BHR
| Malagasy
Language.BJJ
| Kanauji
Language.BOS
| Bosnian
Language.BUL
| Bulgarian
Language.CAT
| Catalan
Language.CCX
| Zhuang
Language.CDO
| Min Dong
Language.CEB
| Cebuano
Language.CES
| Czech
Language.CHE
| Chechen
Language.CMN
| Mandarin (Chinese)
Language.CPX
| Pu-Xian
Language.DAN
| Danish
Language.DEU
| German
Language.DHD
| Dhundari
Language.DIQ
| Dimli
Language.DOC
| Dong
Language.ENG
| English
Language.EST
| Estonian
Language.FIN
| Finnish
Language.FRA
| French
Language.GAN
| Gan
Language.GAX
| Oromo
Language.GBM
| Garhwali
Language.GLG
| Galician
Language.GLK
| Gilaki
Language.GUZ
| Gusii
Language.HAK
| Hakka
Language.HAU
| Hausa
Language.HBS
| Serbo-Croatian (Latin)
Language.HIL
| Hiligaynon
Language.HIN
| Hindi
Language.HMN
| Hmong
Language.HNE
| Chattisgarhi (Laria, Khaltahi)
Language.HRV
| Croatian
Language.HSN
| Xiang
Language.HUN
| Hungarian (Magyar)
Language.ILO
| Ilocano
Language.IND
| Indonesian
Language.ITA
| Italian
Language.JPN
| Japanese (mixed texts in Japanese and English are also supported)
Language.KAN
| Mixed-language Kannada/English texts.
Language.KAZ
| Kazakh
Language.KBD
| Kabardian
Language.KFY
| Kumauni
Language.KIN
| Rwanda
Language.KLN
| Nandi
Language.KMR
| Kurdish (Kurmanji)
Language.KNC
| Kanuri
Language.KNN
| Konkani
Language.KON
| Kikongo
Language.KOR
| Korean (mixed texts in Korean and English are also supported)
Language.LATIN
| Latin
Language.LAV
| Latvian
Language.LIT
| Lithuanian
Language.LMN
| Lamani (Lambadi)
Language.LNC
| Occitan
Language.LUO
| Luo
Language.MAG
| Magahi
Language.MAI
| Maithili
Language.MAK
| Makassar (Makasar)
Language.MAR
| Marathi
Language.MER
| Meru
Language.MIN
| Minangkabau
Language.MLY
| Malay (Melayu)
Language.MNP
| Min Bei
Language.MON
| Mongolian
Language.MTQ
| Muong
Language.MTR
| Mewari
Language.MUI
| Musi
Language.MUP
| Malvi
Language.NAN
| Min Nan
Language.NBL
| Ndebele
Language.NDS
| Low German
Language.NEP
| Nepali
Language.NLD
| Dutch
Language.NOR
| Norwegian
Language.NSO
| Sotho (Northern)
Language.NYA
| Chichewa (Chewa, Nyanja)
Language.PAG
| Pangasinan
Language.PAM
| Kapampangan
Language.PCC
| Bouyei (Buyi, Giáy)
Language.PES
| Persian (Farsi), including texts in mixed Persian/English
Language.PLM
| Palembang
Language.POL
| Polish
Language.POR
| Portuguese
Language.QUC
| K’iche'
Language.QXA
| Quechua
Language.RJB
| Rajbanshi
Language.RON
| Romanian
Language.RUF
| Luguru
Language.RUS
| Russian
Language.RWR
| Marwari
Language.SAS
| Sasak
Language.SLK
| Slovak
Language.SLV
| Slovene (Slovenian)
Language.SNA
| Shona (Karanga)
Language.SOM
| Somali
Language.SOT
| Sotho (Southern)
Language.SPA
| Spanish
Language.SRP
| Serbian (Cyrillic)
Language.SRR
| Serer-Sine
Language.SSW
| Swati (Swazi)
Language.SUK
| Sukuma
Language.SUN
| Sundanese (Sunda)
Language.SWE
| Swedish
Language.SWH
| Swahili
Language.TAM
| Mixed-language Tamil/English texts.
Language.TEL
| Mixed-language Telugu/English texts.
Language.TGL
| Tagalog (Pilipino)
Language.TOI
| Tonga
Language.TSN
| Tswana
Language.TSO
| Tsonga
Language.TUK
| Turkmen
Language.TUM
| Tumbuka
Language.TUR
| Turkish
Language.UIG
| Uyghur, including texts in mixed Uyghur/English
Language.UKR
| Ukrainian
Language.UMB
| Umbundu
Language.URD
| Urdu, including texts in mixed Urdu/English
Language.VIE
| Vietnamese
Language.VMW
| Makua (Makhuwa)
Language.WAL
| Wolaytta
Language.WAR
| Waray-Waray
Language.WBR
| Wagdi
Language.WTM
| Mewati
Language.WUU
| Wu (Changzhou)
Language.XHO
| Xhosa
Language.YAO
| Yao
Language.YOR
| Yoruba
Language.YUE
| Cantonese
Language.ZUL
| Zulu
If this parameter is omitted, the OCR engine will assume that the text is written in extended Latin.
Language.ENG
, only the characters that look the same in both languages (for example, с
and c
) will be correctly recognized. Other characters will be replaced with similar-looking alternatives.
Example
The following code sample demonstrates how to specify the recognition language:
# Instantiate Aspose.OCR API
api = AsposeOcr()
# Add image to the recognition batch
input = OcrInput(InputType.SINGLE_IMAGE)
input.add("source.png")
# Recognize Ukrainian text
recognitionSettings = RecognitionSettings()
recognitionSettings.language = Language.UKR
# Recognize the image
result = api.recognize(input, recognitionSettings)
# Print recognition result
print(result[0].recognition_text)
input("Press Enter to continue...")