Metamorphic testing for textual and visual entailment: A unified framework for model evaluation and explanation

IF 4.3 | Zone 2, Computer Science | JCR Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS
Mingyue Jiang, Bintao Hu, Xiao-Yi Zhang
Information and Software Technology, Volume 187, Article 107855. Journal article, published 2025-08-07.
DOI: 10.1016/j.infsof.2025.107855
URL: https://www.sciencedirect.com/science/article/pii/S0950584925001946
Citations: 0

Abstract

Context:

Textual entailment (TE) and visual entailment (VE) serve as the basis for a broad spectrum of tasks in natural language processing and vision–language modeling. However, although they have been extensively studied, both TE and VE models exhibit several quality issues. Additionally, their black-box nature hampers understanding of their behavior, making it unclear why a model fails to predict an entailment relationship correctly. Consequently, there is a pressing need for methods that can effectively evaluate and explain TE and VE models.

Objective:

This study aims to develop a unified approach for detecting and interpreting failures in both TE and VE models.

Methods:

We propose a metamorphic testing-based approach for evaluating and explaining both TE and VE models. At its core are three proposed metamorphic relations, which are generic to both TE and VE and also preserve specific associations among related inputs. The approach conducts metamorphic testing to detect failures in TE and VE models. When a failure is revealed, it further performs a post-hoc analysis within the relevant group of inputs to identify the information that is critical to the detected failure.
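The abstract does not spell out the three metamorphic relations, so the sketch below uses a hypothetical, text-only relation (premise extension) purely to illustrate the general workflow: run the model on a source input and a follow-up input, flag a failure when the two outputs break the relation, then analyse the failing input group. The leave-one-out ablation in `critical_words` is likewise an assumed stand-in for the post-hoc analysis, not the authors' procedure, and `toy_model`, `mr_premise_extension`, `critical_words`, and `Violation` are hypothetical names. A VE version would need an image-side transformation of the premise instead of appended text.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Three-way label set used by common TE/VE benchmarks.
ENTAIL, NEUTRAL, CONTRADICT = "entailment", "neutral", "contradiction"

# A model under test maps (premise, hypothesis) to one of the labels above.
Model = Callable[[str, str], str]


@dataclass
class Violation:
    """A source/follow-up input group whose predictions break the MR."""
    premise: str
    hypothesis: str
    extra: str  # sentence appended to build the follow-up premise
    source_label: str
    follow_up_label: str


def extend(premise: str, extra: str) -> str:
    """Build a follow-up premise by appending extra text."""
    return f"{premise} {extra}".strip()


def mr_premise_extension(model: Model, premise: str, hypothesis: str,
                         extra: str) -> Optional[Violation]:
    """Hypothetical MR: appending a sentence that does not conflict with the
    premise should not change an 'entailment' or 'contradiction' verdict.
    Returns a Violation if the MR is broken, otherwise None."""
    source_label = model(premise, hypothesis)
    follow_up_label = model(extend(premise, extra), hypothesis)
    if source_label in (ENTAIL, CONTRADICT) and follow_up_label != source_label:
        return Violation(premise, hypothesis, extra, source_label, follow_up_label)
    return None


def critical_words(model: Model, v: Violation) -> List[str]:
    """Leave-one-out ablation over the appended sentence: return the words
    whose removal makes the follow-up prediction match the source label."""
    words = v.extra.split()
    critical = []
    for i, word in enumerate(words):
        reduced = " ".join(words[:i] + words[i + 1:])
        if model(extend(v.premise, reduced), v.hypothesis) == v.source_label:
            critical.append(word)
    return critical


def toy_model(premise: str, hypothesis: str) -> str:
    """Deliberately brittle stand-in model: predicts contradiction whenever
    'not' occurs in the premise (a negation shortcut); otherwise entailment
    if every hypothesis word occurs in the premise, else neutral."""
    p = set(premise.lower().replace(".", "").split())
    h = set(hypothesis.lower().replace(".", "").split())
    if "not" in p:
        return CONTRADICT
    return ENTAIL if h <= p else NEUTRAL


if __name__ == "__main__":
    v = mr_premise_extension(toy_model, "A dog runs in the park.",
                             "A dog runs.", "It is not raining.")
    print(v.source_label, "->", v.follow_up_label)  # entailment -> contradiction
    print(critical_words(toy_model, v))  # ['not']
```

In this toy run the follow-up premise adds harmless information, yet the prediction flips, so the MR reports a failure; the ablation then isolates the negation word as the information driving it. With a real system, `toy_model` would be replaced by a wrapper around an actual TE classifier, and the analysis only re-queries the model on reduced variants drawn from the same input group.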

Results:

Experimental results demonstrate the effectiveness of the proposed approach in failure detection and confirm its potential to provide useful information for pinpointing the root causes of detected failures.

Conclusion:

This study presents a general metamorphic testing approach for both TE and VE. It also demonstrates that, with specifically designed metamorphic relations, metamorphic testing can serve as an effective basis for model explanation.
Source journal
Information and Software Technology (Engineering & Technology, Computer Science: Software Engineering)
CiteScore: 9.10
Self-citation rate: 7.70%
Articles published: 164
Review time: 9.6 weeks
Journal description: Information and Software Technology is the international archival journal focusing on research and experience that contributes to the improvement of software development practices. The journal's scope includes methods and techniques to better engineer software and manage its development. Articles submitted for review should have a clear software engineering component or address ways to improve the engineering and management of software development. Areas covered by the journal include:
• Software management, quality and metrics
• Software processes
• Software architecture, modelling, specification, design and programming
• Functional and non-functional software requirements
• Software testing and verification & validation
• Empirical studies of all aspects of engineering and managing software development
Short Communications is a new section dedicated to short papers addressing new ideas, controversial opinions, "negative" results and much more. Read the Guide for authors for more information. The journal encourages and welcomes submissions of systematic literature studies (reviews and maps) within the scope of the journal. Information and Software Technology is the premier outlet for systematic literature studies in software engineering.