Kurdish Optical Character Recognition

Keywords: Optical Character Recognition, Character Segmentation, Upper Contour Labeling, Kurdish OCR, Kurdish NLP


Currently, no offline tool is available for Optical Character Recognition (OCR) in Kurdish. Kurdish is spoken in several dialects and written in several scripts, of which the Persian/Arabic script is widely used. This script is written from right to left (RTL), is cursive, and uses unique diacritics. These features, particularly the last two, complicate the segmentation stage of a Kurdish OCR system. In this article, we introduce an enhanced character-segmentation-based method that addresses these characteristics. We applied the method to text-only images and tested the Kurdish OCR on documents with different fonts, font sizes, and image resolutions. In the experiments, the proposed method achieved an average character recognition accuracy of 90.82%.
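
To illustrate why cursive writing makes segmentation harder, the following is a minimal sketch of contour-based cut detection. It is not the method of this article, whose details are not given in the abstract. It assumes a binarized text line stored as a list of rows (1 for ink, 0 for background) and uses a common heuristic: in connected script, neighbouring letters are often joined by a thin stroke near the baseline, so columns where the upper contour drops to the baseline are candidate cut points. The names `upper_contour`, `candidate_cuts`, `baseline_row`, and `sample` are hypothetical and chosen for illustration only.

```python
def upper_contour(image):
    """For each column, return the row index of the topmost ink pixel.

    `image` is a list of rows, each a list of 0/1 values (1 = ink).
    Columns that contain no ink are reported as None.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    contour = []
    for x in range(width):
        top = None
        for y in range(height):
            if image[y][x]:
                top = y
                break
        contour.append(top)
    return contour


def candidate_cuts(contour, baseline_row):
    """Return columns that are empty or whose upper contour lies at or
    below `baseline_row` (assumed here to mark a thin connecting stroke)."""
    return [
        x for x, top in enumerate(contour)
        if top is None or top >= baseline_row
    ]


# Toy line image: two tall strokes (columns 1 and 5) joined to their
# neighbours along the baseline (row 2), with a gap at column 4.
sample = [
    [0, 1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [1, 1, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0],
]

contour = upper_contour(sample)
cuts = candidate_cuts(contour, baseline_row=2)
```

A practical system would also have to handle diacritics, which sit above or below the main stroke and distort the contour. This sketch ignores them.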


How to Cite
Yaseen, R., & Hassani, H. (2018, June 30). Kurdish Optical Character Recognition. UKH Journal of Science and Engineering, 2(1), 18-27. https://doi.org/10.25079/ukhjse.v2n1y2018.pp18-27
Research Articles