Semantic Structure Invariance-Based Metamorphic Testing for Machine Translation Systems

IF 5.7 · Zone 2 (Computer Science) · Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

IEEE Transactions on Reliability Pub Date : 2025-01-07 DOI:10.1109/TR.2024.3521029

Chang-ai Sun;Jian Mu;Mingjun Xiao;Huai Liu;Pinjia He

IEEE Transactions on Reliability, 74(3), pp. 3251-3265. https://ieeexplore.ieee.org/document/10830582/

Citations: 0

Abstract

In recent years, deep neural networks have been applied in machine translation systems, resulting in neural machine translation (NMT) models that significantly improve translation quality. However, because deep neural networks are brittle, machine translation systems can return erroneous translations that lead to misunderstandings or even serious losses. Various testing techniques have been proposed to detect translation errors. Metamorphic testing, a widely used technique, mainly relies on the text or syntactic structure of translations while ignoring the meaning of sentences (i.e., semantic information). Compared with text and syntactic information, the semantic information of sentences is more stable when dealing with languages that have rich vocabulary and flexible word order. Motivated by this observation, we propose semantic structure invariance-based metamorphic testing (SSIMT) for machine translation systems. The key insight is that contextually similar sentences should typically have translations with similar semantic structures. We evaluated SSIMT on two widely used machine translation systems, Microsoft Bing Translator and Google Translate, with 600 seed sentences crawled from well-known news websites covering six different corpus topics. The experimental results show that SSIMT finds thousands of erroneous translations in both translation systems with high accuracy (over 70%). The translation errors reported by SSIMT cover a wide variety of common error types.
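The key insight above can be read as a metamorphic relation: translate a seed sentence and a contextually similar follow-up sentence, then flag the pair when the semantic structures of the two translations diverge. The sketch below is only an illustration under stated assumptions, not the authors' implementation. The abstract does not say how semantic structure is extracted or compared, so here a structure is assumed to be a plain list of role labels, the distance is a multiset Jaccard distance, and `translate`, `extract`, and the `threshold` value are hypothetical stand-ins for a real translation system, a real semantic parser, and a tuned cutoff.

```python
from collections import Counter
from typing import Callable, List

# Assumption: a "semantic structure" is modelled as a list of role labels.
Structure = List[str]


def structure_distance(a: Structure, b: Structure) -> float:
    """Return 1 minus the multiset Jaccard similarity of two label lists."""
    ca, cb = Counter(a), Counter(b)
    union = sum((ca | cb).values())
    if union == 0:
        return 0.0  # two empty structures are treated as identical
    shared = sum((ca & cb).values())
    return 1.0 - shared / union


def violates_relation(
    seed: str,
    follow_up: str,
    translate: Callable[[str], str],      # hypothetical translation system
    extract: Callable[[str], Structure],  # hypothetical semantic parser
    threshold: float = 0.3,               # hypothetical cutoff
) -> bool:
    """Return True when the two translations differ too much in structure."""
    seed_structure = extract(translate(seed))
    follow_structure = extract(translate(follow_up))
    return structure_distance(seed_structure, follow_structure) > threshold


# Toy stand-ins so the sketch runs without any external service.
FAKE_TRANSLATIONS = {
    "The cat chased a mouse.": "T1",
    "The dog chased a mouse.": "T2",
    "The dog chased a ball.": "T3",
}
FAKE_STRUCTURES = {
    "T1": ["ARG0", "V", "ARG1"],
    "T2": ["ARG0", "V", "ARG1"],
    "T3": ["V"],  # simulates a translation that dropped both arguments
}

consistent = violates_relation(
    "The cat chased a mouse.", "The dog chased a mouse.",
    FAKE_TRANSLATIONS.__getitem__, FAKE_STRUCTURES.__getitem__,
)
suspicious = violates_relation(
    "The cat chased a mouse.", "The dog chased a ball.",
    FAKE_TRANSLATIONS.__getitem__, FAKE_STRUCTURES.__getitem__,
)
```

With these toy tables, `consistent` is `False` (identical structures) and `suspicious` is `True` (only one of three labels is shared). In a real setting the reported pairs would still need manual inspection, since a flagged pair is a suspected error rather than a confirmed one.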

Source journal

IEEE Transactions on Reliability · Engineering & Technology - Engineering: Electrical & Electronic

CiteScore

12.20

Self-citation rate

8.50%

Articles published

153

Review time

7.5 months

About the journal: IEEE Transactions on Reliability is a refereed journal for the reliability and allied disciplines including, but not limited to, maintainability, physics of failure, life testing, prognostics, design and manufacture for reliability, reliability for systems of systems, network availability, mission success, warranty, safety, and various measures of effectiveness. Topics eligible for publication range from hardware to software, from materials to systems, from consumer and industrial devices to manufacturing plants, from individual items to networks, from techniques for making things better to ways of predicting and measuring behavior in the field. As an engineering subject that supports new and existing technologies, we constantly expand into new areas of the assurance sciences.