From scores to insights: Predicting MT errors using reliable metrics and linguistic typology in Slavic languages

IF 1.9 · JCR Q2 · Multidisciplinary Sciences
MethodsX, Volume 15, Article 103613 · Published 2025-09-08 · DOI: 10.1016/j.mex.2025.103613
Dasa Munkova, Lucia Benkova, Michal Munk, Ľubomír Benko, Petr Hajek
Citations: 0

Abstract

Machine translation (MT) evaluation plays a crucial role in improving systems that translate into morphologically rich, low-resource languages such as Slovak. Existing automatic evaluation methods typically produce a single quality score and give no insight into specific error types. This paper proposes a novel, linguistically informed methodology that predicts the probability of individual MT error categories by integrating manual annotation with automatic evaluation metrics. The method builds on a modified MQM (Multidimensional Quality Metrics) framework adapted for Slovak and uses a dataset of English-to-Slovak translations that combines outputs from statistical and neural MT systems with human reference translations. Manual annotation identified five linguistically motivated error categories. The reliability of 68 automatic metrics was assessed using Cronbach's alpha, correlation coefficients, the coefficient of determination (R²), and entropy. Bootstrapped logistic regression models were then developed to predict the probability that each error category occurs. By bridging the gap between holistic scoring and detailed error categorization, the methodology improves the explainability and reliability of automatic MT evaluation. It substantially reduces the human effort required for quality assessment while maintaining a high degree of linguistic relevance, particularly for complex target languages such as Slovak.
  • Predicts probabilities of specific MT error categories
  • Integrates linguistic expertise with statistical reliability analysis
  • Reduces human effort in MT evaluation while preserving linguistic precision

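The abstract names Cronbach's alpha, correlation coefficients and the coefficient of determination (R²) as reliability checks for the 68 metrics, but it gives neither formulas nor the data layout. The Python sketch below is a minimal illustration under stated assumptions: each metric is treated as an "item" and each translated segment as a "case", the correlation coefficient is Pearson's r, and R² comes from a simple least-squares line between two metrics (so R² = r²). The function names and example scores are hypothetical, and the entropy check is not shown.

```python
import statistics
from typing import Sequence


def cronbach_alpha(items: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for k items scored on the same n cases.

    Assumption (not from the paper): items[i][j] is the score metric i gives segment j.
    alpha = k/(k-1) * (1 - sum of item variances / variance of per-case totals)
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    n = len(items[0])
    if n < 2 or any(len(col) != n for col in items):
        raise ValueError("items need equal length and at least two cases")
    item_vars = [statistics.variance(col) for col in items]  # sample variance per metric
    totals = [sum(col[j] for col in items) for j in range(n)]  # summed score per segment
    return (k / (k - 1)) * (1 - sum(item_vars) / statistics.variance(totals))


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long score vectors."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def r_squared_simple(x: Sequence[float], y: Sequence[float]) -> float:
    """R² of an ordinary least-squares line y ~ a + b*x, which equals r squared."""
    return pearson_r(x, y) ** 2


if __name__ == "__main__":
    # Hypothetical segment-level scores from three metrics on the same 5 segments.
    metric_a = [0.20, 0.40, 0.50, 0.70, 0.90]
    metric_b = [0.25, 0.35, 0.55, 0.65, 0.95]
    metric_c = [0.30, 0.30, 0.60, 0.60, 0.80]
    print("alpha:", round(cronbach_alpha([metric_a, metric_b, metric_c]), 3))
    print("r(a, b):", round(pearson_r(metric_a, metric_b), 3))
    print("R2(a, b):", round(r_squared_simple(metric_a, metric_b), 3))
```

Treating metrics as items means a high alpha indicates that the metrics rank segments consistently. A metric whose scale runs the other way, such as an error rate, would need its sign flipped before it is included.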

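The abstract states that bootstrapped logistic regression models predict the probability that an error category occurs, but it does not specify the predictors, the number of resamples, or the fitting routine. Below is a minimal Python sketch built on several assumptions: a single segment-level metric score is the only predictor, and a binary flag records whether annotators found an error of one category in that segment. Each bootstrap replicate resamples segments with replacement and refits the model using plain gradient descent on log-loss. This fitting method is substituted for illustration and is not necessarily the one the authors used. The replicate predictions are then averaged and reported with a 95% percentile interval. All function names, hyperparameters (`lr`, `epochs`, `n_boot`) and data are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(x: Sequence[float], y: Sequence[int],
                 lr: float = 0.5, epochs: int = 1000) -> Tuple[float, float]:
    """Fit P(error | score) = sigmoid(b0 + b1 * score) by batch gradient descent on log-loss."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            residual = sigmoid(b0 + b1 * xi) - yi  # d(log-loss)/d(linear predictor)
            g0 += residual
            g1 += residual * xi
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


def bootstrap_error_probability(x: Sequence[float], y: Sequence[int], x_new: float,
                                n_boot: int = 100, seed: int = 0) -> Tuple[float, float, float]:
    """Mean predicted error probability at x_new over n_boot bootstrap refits,
    plus the 2.5th and 97.5th percentiles of those predictions."""
    rng = random.Random(seed)
    n = len(x)
    preds: List[float] = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample segments with replacement
        b0, b1 = fit_logistic([x[i] for i in idx], [y[i] for i in idx])
        preds.append(sigmoid(b0 + b1 * x_new))
    preds.sort()
    lower = preds[int(0.025 * (n_boot - 1))]
    upper = preds[int(0.975 * (n_boot - 1))]
    return sum(preds) / n_boot, lower, upper


if __name__ == "__main__":
    # Hypothetical data: one metric score per segment and whether an error
    # of a given category was annotated (1) or not (0).
    scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
    has_error = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
    for s in (0.2, 0.8):
        mean_p, lo, hi = bootstrap_error_probability(scores, has_error, s)
        print(f"score={s}: P(error) ~ {mean_p:.2f} (95% interval {lo:.2f} to {hi:.2f})")
```

Averaging predictions across replicates, rather than reporting a single fit, shows how stable the predicted error probability is when the annotated sample is small. In a full model, one such regression would be fitted per error category.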
Source journal: MethodsX
Subject category: Health Professions-Medical Laboratory Technology
CiteScore: 3.60
Self-citation rate: 5.30%
Articles published: 314
Review time: 7 weeks