Metamorphic testing for textual and visual entailment: A unified framework for model evaluation and explanation
Authors: Mingyue Jiang, Bintao Hu, Xiao-Yi Zhang
Journal: Information and Software Technology, Volume 187, Article 107855
Publication date: 2025-08-07
DOI: 10.1016/j.infsof.2025.107855
URL: https://www.sciencedirect.com/science/article/pii/S0950584925001946
Context:
Textual entailment (TE) and visual entailment (VE) serve as the basis for a broad spectrum of tasks in natural language processing and vision–language modeling. Although both have been studied extensively, TE and VE models still exhibit several quality issues. Additionally, their black-box nature hampers the understanding of their behaviors, making it unclear why a model fails to correctly predict entailment relationships. Consequently, there is a pressing need for methods that can effectively evaluate and explain TE and VE models.
Objective:
This study aims to develop a unified approach for detecting and interpreting failures in both TE and VE models.
Methods:
We propose a metamorphic testing-based approach for evaluating and explaining both TE and VE models. At the core of the approach are three proposed metamorphic relations, which are generic to both TE and VE and preserve specific associations among relevant inputs. The approach first conducts metamorphic testing to detect failures in TE and VE models. When a failure is revealed, it then performs a post-hoc analysis within the relevant group of inputs to identify information that is critical for the detected failure.
Results:
Experimental results demonstrate the effectiveness of the proposed approach in failure detection and also confirm its potential to provide useful information to pinpoint the root causes of detected failures.
Conclusion:
This study presents a general metamorphic testing approach for both TE and VE. It also demonstrates that, with specifically designed metamorphic relations, metamorphic testing can serve as an effective basis for model explanation.
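The failure-detection step described under Methods can be illustrated with a minimal sketch. The abstract does not spell out the three metamorphic relations, so the relation used here is an assumption chosen for illustration only: if a premise entails a hypothesis, the premise extended with an extra sentence should still entail it. The names `toy_model`, `extend_premise`, and `check_mr` are hypothetical, and `toy_model` is a deliberately flawed stand-in rather than a real TE model.

```python
from typing import Callable, Tuple

Label = str  # "entailment" or "neutral" in this toy setting
Model = Callable[[str, str], Label]


def toy_model(premise: str, hypothesis: str) -> Label:
    """Hypothetical stand-in for a TE model (not from the paper)."""
    p_words = premise.lower().replace(".", "").split()
    h_words = hypothesis.lower().replace(".", "").split()
    covered = all(word in p_words for word in h_words)
    # Deliberate flaw: premises longer than 8 words never yield entailment.
    if covered and len(p_words) <= 8:
        return "entailment"
    return "neutral"


def extend_premise(premise: str, extra: str) -> str:
    """Build the follow-up premise by appending an extra sentence."""
    return f"{premise} {extra}"


def check_mr(
    model: Model, premise: str, hypothesis: str, extra: str
) -> Tuple[bool, Label, Label]:
    """Run the source and follow-up inputs and check the assumed relation.

    Assumed relation (illustrative): an entailment on the source input
    must remain an entailment after the premise is extended.
    """
    source_label = model(premise, hypothesis)
    followup_label = model(extend_premise(premise, extra), hypothesis)
    violated = source_label == "entailment" and followup_label != "entailment"
    return violated, source_label, followup_label


if __name__ == "__main__":
    result = check_mr(
        toy_model,
        "A man plays a guitar.",
        "A man plays.",
        "The room is full of people.",
    )
    print(result)
```

A violation of the relation signals a failure without requiring a ground-truth label for the follow-up input, which is the main appeal of metamorphic testing. Because the source and follow-up inputs differ only in the added sentence, that difference is a natural starting point for the kind of post-hoc analysis the abstract mentions.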
About the journal:
Information and Software Technology is the international archival journal focusing on research and experience that contributes to the improvement of software development practices. The journal's scope includes methods and techniques to better engineer software and manage its development. Articles submitted for review should have a clear component of software engineering or address ways to improve the engineering and management of software development. Areas covered by the journal include:
• Software management, quality and metrics
• Software processes
• Software architecture, modelling, specification, design and programming
• Functional and non-functional software requirements
• Software testing and verification & validation
• Empirical studies of all aspects of engineering and managing software development
Short Communications is a new section dedicated to short papers addressing new ideas, controversial opinions, "Negative" results and much more. Read the Guide for authors for more information.
The journal encourages and welcomes submissions of systematic literature studies (reviews and maps) within the scope of the journal. Information and Software Technology is the premier outlet for systematic literature studies in software engineering.