Sistem za prepoznavanje štampanog teksta korišćenjem Tesseract biblioteke

Mikloš Popović

doi:10.24867/19IH04Popovic

Mechatronics

Vol. 37 No. 10 (2022): Proceedings of Faculty of Technical Sciences

System for printed text recognition using Tesseract library



DOI: https://doi.org/10.24867/19IH04Popovic
Submitted: 2022-05-26
Published: 2022-10-05

Abstract

This paper presents a system that recognizes printed text in an image using digital image processing, i.e. machine vision. The goal was to build an application in Python using the Tesseract library, which performs the printed text recognition, after which a file containing the recognized text is written. The sensitivity of the proposed solution was analyzed with respect to the type of camera acquisition, i.e. character size, and with respect to which processing techniques most improve the result.
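The pipeline described in the abstract (load an image, preprocess it, run Tesseract, write the text to a file) might be sketched as follows. This is a minimal illustration and not the author's implementation: it assumes the `pytesseract` wrapper and Pillow are installed alongside the Tesseract binary, and the function names, the grayscale-plus-upscaling preprocessing, the `scale` factor and the `"eng"` language code are all hypothetical choices that the abstract does not specify.

```python
from pathlib import Path


def load_grayscale(image_path, scale=2.0):
    """Load an image, convert it to grayscale and upscale it.

    Assumption: upscaling stands in for the paper's character-size
    analysis; the factor 2.0 is illustrative only.
    """
    from PIL import Image, ImageOps  # Pillow, imported lazily

    img = ImageOps.grayscale(Image.open(image_path))
    new_size = (int(img.width * scale), int(img.height * scale))
    return img.resize(new_size, Image.LANCZOS)


def tesseract_ocr(image, lang="eng"):
    """Run Tesseract on a PIL image through the pytesseract wrapper."""
    import pytesseract  # requires the Tesseract binary to be installed

    return pytesseract.image_to_string(image, lang=lang)


def recognize_to_file(image_path, out_path, load=load_grayscale, ocr=tesseract_ocr):
    """Recognize printed text in an image and write it to a text file.

    `load` and `ocr` are injectable so that other preprocessing steps
    or OCR settings can be compared without changing this function.
    """
    image = load(image_path)
    text = ocr(image)
    Path(out_path).write_text(text, encoding="utf-8")
    return text
```

A call such as `recognize_to_file("scan.png", "scan.txt")` would write the recognized text to `scan.txt`; both file names are placeholders. Keeping the preprocessing step as a parameter is one way to compare processing techniques of the kind the abstract mentions.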
