DeepVec: State-Vector Aware Test Case Selection for Enhancing Recurrent Neural Network

IF 5.6 | Zone 1: Computer Science | JCR Q1: COMPUTER SCIENCE, SOFTWARE ENGINEERING

IEEE Transactions on Software Engineering Pub Date : 2025-04-28 DOI:10.1109/TSE.2025.3565037

Zhonghao Jiang;Meng Yan;Li Huang;Weifeng Sun;Chao Liu;Song Sun;David Lo

IEEE Transactions on Software Engineering, vol. 51, no. 6, pp. 1702-1723. https://ieeexplore.ieee.org/document/10979368/

Citations: 0

Abstract

Deep Neural Networks (DNNs) have achieved significant success across many application domains. Testing and enhancing a pre-trained DNN that has been deployed in an application scenario is crucial because it reduces the DNN's failures. DNN-driven software testing and enhancement require large amounts of labeled data, but the high cost and inefficiency of manually labeling large volumes of data, together with the time needed to test every case in real scenarios, are unacceptable. Test case selection techniques have therefore been proposed to reduce this cost by selecting and labeling only representative test cases without compromising testing performance. Test case selection based on neuron coverage (NC) or uncertainty metrics has achieved significant success in testing Convolutional Neural Networks (CNNs). However, these methods are difficult to transfer to Recurrent Neural Networks (RNNs), which excel at text tasks, because of the mismatch in model output formats and the reliance on image-specific characteristics. Balancing the execution cost and performance of the selection algorithm is also essential. In this paper, we propose DeepVec, a state-vector aware test case selection method for RNN models that reduces the cost of data labeling, saves computing resources, and balances execution cost and performance. DeepVec selects data using an uncertainty metric based on the norm of the output vector at each time step (i.e., the state-vector) and a similarity metric based on the direction angle of the state-vector. The rationale is that test cases with smaller state-vector norms often possess greater information entropy, and similar changes in the state-vector direction angle indicate similar RNN internal states. These metrics can be calculated with just a single inference, which gives DeepVec strong bug detection and model improvement capabilities.
We evaluate DeepVec on five popular datasets containing images and texts, using three commonly used RNN classification models, and compare it with NC-based, uncertainty-based, and other black-box methods. Experimental results demonstrate that DeepVec achieves an average relative improvement of 12.5%-118.22% over baseline methods in selecting fault-revealing test cases, with time costs reduced to only 1% to 1‱ (0.01%). We also find that, when 15% of the data is selected for retraining, the absolute accuracy improvement after retraining exceeds that of the baseline methods by 0.29%-24.01%.
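
The abstract names two ingredients, a norm-based uncertainty metric and a direction-angle similarity metric, but does not give their formulas. The following is only a minimal sketch of how such metrics could be combined, not DeepVec's actual algorithm. Everything specific in it is an assumption made for illustration: averaging the norm over time steps, measuring the angle between consecutive state-vectors, the greedy filtering loop, the `min_distance` threshold, and all function names. It also assumes the per-step state-vectors have already been collected from one inference pass and that all sequences have the same length.

```python
import math

def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))

def angle(u, v):
    """Angle in radians between two vectors; 0.0 if either one is zero."""
    nu, nv = norm(u), norm(v)
    if nu == 0.0 or nv == 0.0:
        return 0.0
    cos = sum(a * b for a, b in zip(u, v)) / (nu * nv)
    return math.acos(max(-1.0, min(1.0, cos)))  # clamp against rounding error

def uncertainty_score(states):
    """Mean state-vector norm over time steps (assumed aggregation).
    A smaller value is treated here as higher uncertainty."""
    return sum(norm(s) for s in states) / len(states)

def angle_trace(states):
    """Direction change between each pair of consecutive state-vectors."""
    return [angle(a, b) for a, b in zip(states, states[1:])]

def trace_distance(t1, t2):
    """Mean absolute difference between two equal-length angle traces."""
    return sum(abs(a - b) for a, b in zip(t1, t2)) / len(t1)

def select(test_cases, budget, min_distance=0.1):
    """Greedy selection (hypothetical): visit cases from smallest to largest
    mean norm and skip any case whose angle trace is closer than
    min_distance to the trace of an already selected case."""
    ranked = sorted(test_cases, key=lambda name: uncertainty_score(test_cases[name]))
    traces = {name: angle_trace(test_cases[name]) for name in test_cases}
    chosen = []
    for name in ranked:
        if len(chosen) >= budget:
            break
        if all(trace_distance(traces[name], traces[c]) >= min_distance for c in chosen):
            chosen.append(name)
    return chosen

# Toy state-vectors: 3 time steps, 2 dimensions per test case.
cases = {
    "a": [[0.1, 0.0], [0.0, 0.1], [0.1, 0.0]],  # small norms, 90-degree turns
    "b": [[0.2, 0.0], [0.0, 0.2], [0.2, 0.0]],  # small norms, same turns as "a"
    "c": [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]],  # large norms, no turning
    "d": [[0.5, 0.0], [0.5, 0.5], [0.0, 0.5]],  # medium norms, 45-degree turns
}

selected = select(cases, budget=2)
print(selected)  # ['a', 'd']
```

In this toy run, "b" has the second-smallest mean norm but is skipped because its angle trace is identical to that of "a", so the next most uncertain case with a different trace, "d", is picked instead. This illustrates the intuition stated in the abstract: similar direction-angle changes suggest similar internal states, so labeling both cases would add little.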

Source journal

IEEE Transactions on Software Engineering (Engineering and Technology - Engineering: Electrical and Electronic)

CiteScore

9.70

Self-citation rate

10.80%

Articles published

724

Review time

6 months

Journal description: IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of this Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include:
a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models.
b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects.
c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards.
d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues.
e) System issues: Hardware-software trade-offs.
f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.