Ming Yan;Junjie Chen;Tianjie Jiang;Jiajun Jiang;Zan Wang
{"title":"Evaluating Spectrum-Based Fault Localization on Deep Learning Libraries","authors":"Ming Yan;Junjie Chen;Tianjie Jiang;Jiajun Jiang;Zan Wang","doi":"10.1109/TSE.2025.3552622","DOIUrl":null,"url":null,"abstract":"Deep learning (DL) libraries have become increasingly popular and their quality assurance is also gaining significant attention. Although many fault detection techniques have been proposed, effective fault localization techniques tailored to DL libraries are scarce. Due to the unique characteristics of DL libraries (e.g., complicated code architecture supporting DL model training and inference with extensive multidimensional tensor calculations), the effectiveness of existing fault localization techniques for traditional software is also unknown on DL library faults. To bridge this gap, we conducted the first empirical study to investigate the effectiveness of fault localization on DL libraries. Specifically, we evaluated spectrum-based fault localization (SBFL) due to its high generalizability and affordable overhead on such complicated libraries. Based on the key aspects in SBFL, our study investigated the effectiveness of SBFL with different sources of passing test cases (including human-written, fuzzer-generated, and mutation-based test cases) and various suspicious value calculation methods. In particular, mutation-based test cases are produced by our designed rule-based mutation technique and LLM-based mutation technique tailored to DL library faults. To enable our extensive study, we built the first benchmark (Defects4DLL), which contains 120 real-world faults in PyTorch and TensorFlow with easy-to-use experimental environments. Our study delivered a series of useful findings. For example, the rule-based approach is effective in localizing crash faults in DL libraries, successfully localizing 44.44% of crash faults within Top-10 functions and 74.07% of crash faults within Top-10 files, while the passing test cases from DL library fuzzers perform poorly on this task. Furthermore, based on our findings on the complementarity of different sources, we designed a hybrid technique by effectively integrating human-written, LLM-mutated, rule-based mutated test cases, which further achieves 31.48%<inline-formula><tex-math>$\\boldsymbol{\\sim}$</tex-math></inline-formula>61.36% improvements over each single source in terms of the number of detected faults within Top-5 files.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"51 5","pages":"1399-1414"},"PeriodicalIF":6.5000,"publicationDate":"2025-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Software Engineering","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10930847/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
引用次数: 0
Abstract
Deep learning (DL) libraries have become increasingly popular and their quality assurance is also gaining significant attention. Although many fault detection techniques have been proposed, effective fault localization techniques tailored to DL libraries are scarce. Due to the unique characteristics of DL libraries (e.g., complicated code architecture supporting DL model training and inference with extensive multidimensional tensor calculations), the effectiveness of existing fault localization techniques for traditional software is also unknown on DL library faults. To bridge this gap, we conducted the first empirical study to investigate the effectiveness of fault localization on DL libraries. Specifically, we evaluated spectrum-based fault localization (SBFL) due to its high generalizability and affordable overhead on such complicated libraries. Based on the key aspects in SBFL, our study investigated the effectiveness of SBFL with different sources of passing test cases (including human-written, fuzzer-generated, and mutation-based test cases) and various suspicious value calculation methods. In particular, mutation-based test cases are produced by our designed rule-based mutation technique and LLM-based mutation technique tailored to DL library faults. To enable our extensive study, we built the first benchmark (Defects4DLL), which contains 120 real-world faults in PyTorch and TensorFlow with easy-to-use experimental environments. Our study delivered a series of useful findings. For example, the rule-based approach is effective in localizing crash faults in DL libraries, successfully localizing 44.44% of crash faults within Top-10 functions and 74.07% of crash faults within Top-10 files, while the passing test cases from DL library fuzzers perform poorly on this task. Furthermore, based on our findings on the complementarity of different sources, we designed a hybrid technique by effectively integrating human-written, LLM-mutated, rule-based mutated test cases, which further achieves 31.48%$\boldsymbol{\sim}$61.36% improvements over each single source in terms of the number of detected faults within Top-5 files.
期刊介绍:
IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of this Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include:
a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models.
b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects.
c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards.
d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues.
e) System issues: Hardware-software trade-offs.
f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.