From scores to insights: Predicting MT errors using reliable metrics and linguistic typology in Slavic languages

IF 1.9, JCR Q2 (Multidisciplinary Sciences)

MethodsX, Pub Date: 2025-09-08, DOI: 10.1016/j.mex.2025.103613

Dasa Munkova, Lucia Benkova, Michal Munk, Ľubomír Benko, Petr Hajek

MethodsX, volume 15, Article 103613. Journal Article. https://www.sciencedirect.com/science/article/pii/S2215016125004571


Abstract

Machine translation (MT) evaluation plays a crucial role in advancing systems that translate into morphologically rich, low-resource languages such as Slovak. Existing automatic evaluation methods typically return a single quality score and give no insight into specific error types. This work proposes a linguistically informed methodology that predicts the probability of MT error categories by integrating manual annotation with automatic evaluation metrics. The method builds on a modified MQM (Multidimensional Quality Metrics) framework adapted for Slovak and uses a dataset of English-to-Slovak translations that combines outputs from statistical and neural MT systems with human reference translations. Manual annotation identified five linguistically motivated error categories. The reliability of 68 automatic metrics was assessed using Cronbach’s alpha, correlation coefficients, the coefficient of determination (R²), and entropy. Bootstrapped logistic regression models were then developed to predict error occurrence probabilities. The proposed methodology improves the explainability and reliability of automatic MT evaluation by bridging the gap between holistic scoring and detailed error categorization. It significantly reduces the human effort required for quality assessment while maintaining a high degree of linguistic relevance, particularly for complex target languages like Slovak.
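The reliability step mentioned in the abstract can be illustrated with a minimal sketch of Cronbach's alpha, using the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of the summed score). This is not the authors' implementation: the function name `cronbach_alpha`, the variable `toy_scores`, and the three-metric toy data are hypothetical, and the sketch assumes every metric is scored per segment on a comparable scale with higher values meaning better quality.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a table of metric scores.

    scores: list of rows, one row per translated segment,
            one column per automatic metric (the "items").
    """
    n_segments = len(scores)
    k = len(scores[0])
    if k < 2 or n_segments < 2:
        raise ValueError("need at least 2 metrics and 2 segments")

    # Sample variance of each metric across segments.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of the per-segment sum over all metrics.
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("summed scores have zero variance")

    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical toy data: 5 segments scored by 3 metrics (illustration only).
toy_scores = [
    [0.62, 0.58, 0.70],
    [0.35, 0.40, 0.45],
    [0.80, 0.75, 0.82],
    [0.20, 0.30, 0.28],
    [0.55, 0.50, 0.60],
]
alpha = cronbach_alpha(toy_scores)
print(f"Cronbach's alpha: {alpha:.3f}")
```

Values close to 1 indicate that the metrics rank the segments consistently. Metrics where lower is better (error rates, for example) would need to be inverted before being combined this way.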

• Predicts probabilities of specific MT error categories
• Integrates linguistic expertise with statistical reliability analysis
• Reduces human effort in MT evaluation while preserving linguistic precision
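To make the idea of predicting error probabilities concrete, here is a minimal sketch of a bootstrapped logistic regression with a single predictor. It rests on several assumptions that do not come from the paper: one automatic metric score per segment as the only feature, a binary label marking whether a given error category was annotated, fitting by plain gradient ascent, and a percentile interval over the bootstrap predictions. All names (`fit_logistic`, `bootstrap_error_probability`) and the toy data are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, lr=0.5, epochs=500):
    """Fit P(error = 1 | x) = sigmoid(b0 + b1 * x) by batch gradient ascent."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            residual = y - sigmoid(b0 + b1 * x)
            g0 += residual
            g1 += residual * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


def bootstrap_error_probability(xs, ys, x_new, n_boot=100, seed=0):
    """Mean and 95% percentile interval of the predicted error probability."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(by)) < 2:
            continue  # skip resamples that contain only one class
        b0, b1 = fit_logistic(bx, by)
        preds.append(sigmoid(b0 + b1 * x_new))
    if not preds:
        raise ValueError("no usable bootstrap resamples")
    preds.sort()
    mean = sum(preds) / len(preds)
    lower = preds[int(0.025 * (len(preds) - 1))]
    upper = preds[int(0.975 * (len(preds) - 1))]
    return mean, lower, upper


# Hypothetical toy data: metric score per segment (higher = better) and
# whether an error of some category was annotated (1) or not (0).
metric = [0.10, 0.20, 0.25, 0.30, 0.40, 0.45, 0.50, 0.60, 0.70, 0.80, 0.85, 0.90]
has_error = [1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0]

mean, lower, upper = bootstrap_error_probability(metric, has_error, x_new=0.35)
print(f"P(error | score=0.35) ~ {mean:.2f} (95% interval {lower:.2f} to {upper:.2f})")
```

Resampling the segments and refitting gives a spread of predicted probabilities rather than a single point estimate, which is one common reason to bootstrap a model fitted on a small annotated sample.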
