IRTUM – Institutional Repository of the Technical University of Moldova

Exploring OCR: Combining open-source engines for improved document digitization


dc.contributor.author PANDELICĂ, Mihai-Lucian
dc.contributor.author VLĂSCESANU, Giorgiana Violeta
dc.contributor.author ŢURCANU, Mihai
dc.date.accessioned 2026-02-18T16:43:15Z
dc.date.available 2026-02-18T16:43:15Z
dc.date.issued 2025
dc.identifier.citation PANDELICĂ, Mihai-Lucian; Giorgiana Violeta VLĂSCESANU and Mihai ŢURCANU. Exploring OCR: Combining open-source engines for improved document digitization. In: 24th RoEduNet International Conference Networking in Education and Research, Chisinau, Republic of Moldova, 17-19 September, 2025. Universitatea Politehnică din Bucureşti. IEEE, 2025, pp. 1-11. ISBN 979-8-3315-5714-0, eISBN 979-8-331-55713-3, ISSN 2068-1038, eISSN 2247-5443. en_US
dc.identifier.isbn 979-8-3315-5714-0
dc.identifier.isbn 979-8-331-55713-3
dc.identifier.issn 2068-1038
dc.identifier.issn 2247-5443
dc.identifier.uri https://doi.org/10.1109/RoEduNet68395.2025.11208270
dc.identifier.uri https://repository.utm.md/handle/5014/35311
dc.description Access full text: https://doi.org/10.1109/RoEduNet68395.2025.11208270 en_US
dc.description.abstract Document digitization involves converting physical documents into editable digital text, a process that offers significant benefits such as preserving archives, enabling remote access, and simplifying content modification. Optical Character Recognition (OCR) technologies facilitate this transformation by extracting text from scanned or photographed document images. However, OCR accuracy can be hindered by the wide variety of document layouts and conditions, including issues like faded text and uneven lighting. In this study, we investigate the potential of combining multiple open-source OCR engines to improve digitization accuracy, focusing on the Tesseract and EasyOCR engines. We developed a testing pipeline and conducted experiments targeting challenging scenarios for character recognition. Our results demonstrate that integrating outputs from both engines can enhance performance, highlighting their complementary strengths and the promise of ensemble approaches for more reliable document digitization. en_US
dc.language.iso en en_US
dc.publisher IEEE (Institute of Electrical and Electronics Engineers) en_US
dc.rights Attribution-NonCommercial-NoDerivs 3.0 United States *
dc.rights.uri http://creativecommons.org/licenses/by-nc-nd/3.0/us/ *
dc.subject document digitization en_US
dc.subject tesseract en_US
dc.title Exploring OCR: Combining open-source engines for improved document digitization en_US
dc.type Article en_US


Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 United States.