Evaluating Whisper’s speech-to-text performance for Romanian using audio from diverse domains

dc.contributor.advisor BEȘLIU, Corina
dc.contributor.author CERNEI, Ion
dc.date.accessioned 2026-01-15T06:25:19Z
dc.date.available 2026-01-15T06:25:19Z
dc.date.issued 2026
dc.identifier.citation CERNEI, Ion. Evaluating Whisper’s speech-to-text performance for Romanian using audio from diverse domains. In: Conferinţa Tehnico-Ştiinţifică a Colaboratorilor, Doctoranzilor şi Studenţilor = The Technical Scientific Conference of Undergraduate, Master and PhD Students, 14-16 May 2025. Universitatea Tehnică a Moldovei. Chişinău: Tehnica-UTM, 2026, vol. 1, pp. 771-774. ISBN 978-9975-64-612-3, ISBN 978-9975-64-613-0 (PDF). en_US
dc.identifier.isbn 978-9975-64-612-3
dc.identifier.isbn 978-9975-64-613-0
dc.identifier.uri https://repository.utm.md/handle/5014/34447
dc.description.abstract This study evaluates the Whisper model’s performance on Romanian speech-to-text transcription, investigating how transcription accuracy varies across diverse audio domains. Audio sources, including audiobooks, news broadcasts, and official public speeches, were selected because verified textual references exist for them, which allows accurate alignment and therefore a robust evaluation. Each domain presents distinct linguistic and acoustic characteristics, ranging from the clear, structured narration of audiobooks, through the dynamic and occasionally noisy conditions of live news, to the formal rhetoric of political discourse. The study uses standard evaluation metrics such as Word Error Rate (WER) and Character Error Rate (CER), enabling a consistent assessment of transcription performance across domains. By focusing on Romanian, a low-resource language in automatic speech recognition, this study provides novel insights into Whisper’s effectiveness and the influence of the audio domain on transcription quality, contributing to progress in speech recognition for under-resourced languages. Results show that Whisper performs best on scripted, high-quality audio such as audiobooks, while accuracy decreases in more variable and spontaneous contexts, highlighting the model’s sensitivity to content structure and recording conditions. en_US
dc.language.iso en en_US
dc.publisher Universitatea Tehnică a Moldovei en_US
dc.relation.ispartofseries Conferinţa tehnico-ştiinţifică a studenţilor, masteranzilor şi doctoranzilor = The Technical Scientific Conference of Undergraduate, Master and PhD Students: 14-16 May 2025;
dc.rights Attribution-NonCommercial-NoDerivs 3.0 United States *
dc.rights.uri http://creativecommons.org/licenses/by-nc-nd/3.0/us/ *
dc.subject automatic speech recognition en_US
dc.subject low-resource languages en_US
dc.subject error metrics en_US
dc.subject speech analysis en_US
dc.subject domain-specific evaluation en_US
dc.title Evaluating Whisper’s speech-to-text performance for Romanian using audio from diverse domains en_US
dc.type Article en_US
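The abstract names Word Error Rate (WER) and Character Error Rate (CER) as the evaluation metrics. The record does not include the study's evaluation code, so the following is only a minimal sketch, assuming the common definition: the Levenshtein edit distance (substitutions + deletions + insertions) between the hypothesis and the reference, divided by the reference length, counted over words for WER and over characters for CER. The function names, the whitespace tokenization, and the choice to count spaces as characters in CER are illustrative assumptions. Text normalization (case, punctuation, Romanian diacritic variants such as ş/ș) is deliberately not handled here, although in practice it can noticeably change the scores.

```python
from collections.abc import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (words or characters)."""
    # prev[j] = distance between the first i-1 reference items and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference item
                curr[j - 1] + 1,     # insertion of a hypothesis item
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate; assumption: plain whitespace tokenization, no normalization."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate; assumption: spaces are counted as characters."""
    if not reference:
        raise ValueError("reference must contain at least one character")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical Romanian example: one substituted word ("mere" -> "pere")
    print(round(wer("ana are mere", "ana are pere"), 3))  # 0.333
    print(round(cer("ana are mere", "ana are pere"), 3))  # 0.083
```

In the usage example, a single wrong word costs one third of the WER for a three-word reference but only one twelfth of the CER, because only one of twelve characters differs. This is one reason the two metrics are usually reported together.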



Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 United States.