| dc.contributor.author | PANDELICĂ, Mihai-Lucian | |
| dc.contributor.author | VLĂSCESANU, Giorgiana Violeta | |
| dc.contributor.author | ŢURCANU, Mihai | |
| dc.date.accessioned | 2026-02-18T16:43:15Z | |
| dc.date.available | 2026-02-18T16:43:15Z | |
| dc.date.issued | 2025 | |
| dc.identifier.citation | PANDELICĂ, Mihai-Lucian; Giorgiana Violeta VLĂSCESANU and Mihai ŢURCANU. Exploring OCR: Combining open-source engines for improved document digitization. In: 24th RoEduNet International Conference Networking in Education and Research, Chisinau, Republic of Moldova, 17-19 September, 2025. Universitatea Politehnică din Bucureşti. IEEE, 2025, pp. 1-11. ISBN 979-8-3315-5714-0, eISBN 979-8-331-55713-3, ISSN 2068-1038, eISSN 2247-5443. | en_US |
| dc.identifier.isbn | 979-8-3315-5714-0 | |
| dc.identifier.isbn | 979-8-331-55713-3 | |
| dc.identifier.issn | 2068-1038 | |
| dc.identifier.issn | 2247-5443 | |
| dc.identifier.uri | https://doi.org/10.1109/RoEduNet68395.2025.11208270 | |
| dc.identifier.uri | https://repository.utm.md/handle/5014/35311 | |
| dc.description | Access full text: https://doi.org/10.1109/RoEduNet68395.2025.11208270 | en_US |
| dc.description.abstract | Document digitization involves converting physical documents into editable digital text, a process that offers significant benefits such as preserving archives, enabling remote access, and simplifying content modification. Optical Character Recognition (OCR) technologies facilitate this transformation by extracting text from scanned or photographed document images. However, OCR accuracy can be hindered by the wide variety of document layouts and conditions, including issues like faded text and uneven lighting. In this study, we investigate the potential of combining multiple open-source OCR engines to improve digitization accuracy, focusing on the Tesseract and EasyOCR engines. We developed a testing pipeline and conducted experiments targeting challenging scenarios for character recognition. Our results demonstrate that integrating outputs from both engines can enhance performance, highlighting their complementary strengths and the promise of ensemble approaches for more reliable document digitization. | en_US |
| dc.language.iso | en | en_US |
| dc.publisher | IEEE (Institute of Electrical and Electronics Engineers) | en_US |
| dc.rights | Attribution-NonCommercial-NoDerivs 3.0 United States | * |
| dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/3.0/us/ | * |
| dc.subject | document digitization | en_US |
| dc.subject | tesseract | en_US |
| dc.title | Exploring OCR: Combining open-source engines for improved document digitization | en_US |
| dc.type | Article | en_US |