Extensive mutation for testing of word sense disambiguation models

IF 4.3 · CAS Region 2 (Computer Science) · JCR Q2: COMPUTER SCIENCE, INFORMATION SYSTEMS

Information and Software Technology · Pub Date: 2025-04-05 · DOI: 10.1016/j.infsof.2025.107734

Deping Zhang, Zhaohui Yang, Xiang Huang, Yanhui Li

Volume 183, Article 107734. Full text: https://www.sciencedirect.com/science/article/pii/S0950584925000734

Citations: 0

Abstract

Context:

Word sense disambiguation (WSD) models are widely used in translation and question-answering systems. Assessing the WSD capability of these models helps improve them and makes them more dependable. Recently, researchers have introduced the concept of “mutation” to induce WSD errors in machine translation systems and thereby evaluate their WSD ability.

Objective:

Inspired by this recent research, this study aims to extend the types of mutations and to examine whether they can be applied to testing WSD models, that is, whether these mutations can effectively provoke WSD errors.

Method:

We designed and implemented nine new types of mutations that target the words, phrases, and sentence structure of the sentences used in WSD testing. Based on these mutations, we propose a WSD testing framework that uses large language models to generate sentence mutations and assesses the disambiguation capability of WSD models.
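The testing loop described in the Method can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not name the nine mutation types, so `append_clause` and `replace_context_word` are hypothetical rule-based stand-ins for LLM-generated mutations; `toy_wsd` is a keyword lookup standing in for a real WSD model; and the oracle (a sense-preserving mutation should not change the predicted sense of the target word) is one plausible reading, not necessarily the authors' criterion.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Case:
    sentence: str  # sentence containing the ambiguous word
    target: str    # the word whose sense is being disambiguated


# Hypothetical type aliases: a model maps (sentence, target) to a sense label,
# and a mutation maps a test case to a mutated test case.
WsdModel = Callable[[str, str], str]
Mutation = Callable[[Case], Case]


def toy_wsd(sentence: str, target: str) -> str:
    """Toy stand-in for a WSD model: keyword lookup for the word 'bank'."""
    words = re.findall(r"[a-z]+", sentence.lower())
    if target == "bank":
        if "river" in words or "water" in words:
            return "bank%river"
        return "bank%finance"
    return f"{target}%unknown"


def append_clause(case: Case) -> Case:
    """Hypothetical sentence-level mutation: add a clause that should keep the sense."""
    return Case(case.sentence.rstrip(".") + ", as everyone had expected.", case.target)


def replace_context_word(case: Case) -> Case:
    """Hypothetical word-level mutation: swap a context word for a near-synonym."""
    return Case(case.sentence.replace("river", "stream"), case.target)


def find_suspected_errors(
    model: WsdModel, case: Case, mutations: List[Mutation]
) -> List[Tuple[str, str, str]]:
    """Flag mutants whose predicted sense differs from the original prediction."""
    original_sense = model(case.sentence, case.target)
    suspects = []
    for mutate in mutations:
        mutant = mutate(case)
        predicted = model(mutant.sentence, mutant.target)
        if predicted != original_sense:
            suspects.append((mutate.__name__, mutant.sentence, predicted))
    return suspects


if __name__ == "__main__":
    case = Case("She sat on the bank of the river.", "bank")
    for name, sentence, sense in find_suspected_errors(
        toy_wsd, case, [append_clause, replace_context_word]
    ):
        print(f"{name}: {sentence!r} -> {sense}")
```

With the toy model, the appended clause leaves the prediction unchanged, while replacing "river" with "stream" flips the predicted sense and is flagged. In a real setup the flagged mutants would still need checking, since a mutation can legitimately change the sense of the target word.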

Results:

We conducted experiments on five widely recognized WSD test sets with five widely used WSD models. The experimental results show that (a) our testing framework can produce correct mutants for the nine proposed mutation types, and (b) the newly developed mutations trigger a substantial number of factual and unique WSD errors.

Conclusions:

The new types of mutations we designed can be applied effectively in mutation-based WSD testing. This suggests that exploring more types of mutations can trigger more WSD errors.


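Source journal

Information and Software Technology (Engineering & Technology: Computer Science, Software Engineering)

CiteScore

9.10

Self-citation rate

7.70%

Articles published

164

Review time

9.6 weeks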

About the journal: Information and Software Technology is the international archival journal focusing on research and experience that contributes to the improvement of software development practices. The journal's scope includes methods and techniques to better engineer software and manage its development. Articles submitted for review should have a clear component of software engineering or address ways to improve the engineering and management of software development. Areas covered by the journal include:
• Software management, quality and metrics
• Software processes
• Software architecture, modelling, specification, design and programming
• Functional and non-functional software requirements
• Software testing and verification & validation
• Empirical studies of all aspects of engineering and managing software development
Short Communications is a new section dedicated to short papers addressing new ideas, controversial opinions, "negative" results and much more. Read the Guide for Authors for more information. The journal encourages and welcomes submissions of systematic literature studies (reviews and maps) within the scope of the journal. Information and Software Technology is the premier outlet for systematic literature studies in software engineering.