JDExtractor: an automated approach for efficient extraction of defect-related methods in Java projects
Authors: Tianyang Liu, Jiawei Ye, Weixing Ji
Journal: Automated Software Engineering, 33 (1)
Publication date: 2025-10-18
DOI: 10.1007/s10515-025-00563-z
URL: https://link.springer.com/article/10.1007/s10515-025-00563-z
Impact factor: 3.1 (JCR Q3, Computer Science, Software Engineering)
Citations: 0
Abstract
High-quality repositories containing real-world defects are essential for developing defect-related algorithms. Although plenty of defect repositories exist, they often fail to capture the context of inter-procedural defects, which include all methods in the propagation path from the defect-source method to the defect-triggering method. This limitation is particularly critical for the Null Pointer Exception (NPE), a common defect that often propagates across multiple methods in Java systems. To address this problem, we propose a novel and automatic approach, called JDExtractor, to extract defect-related methods from real applications. The main challenge is how to identify all defect-related methods efficiently and accurately. JDExtractor tackles this challenge by constructing a method-level data graph using the principle of Java type compatibility and simplifying the data graph using filtering criteria. Data flow analysis helps construct a coarse-grained method-level data graph, which reflects the potential patterns of inter-procedural data interaction, thereby ensuring analysis efficiency. Afterward, filtering analysis simplifies the data graph based on the propagation properties of inter-procedural defects, thus ensuring analysis accuracy. Evaluation results suggest that both the static slicing tool WALA and the dynamic slicing tool Slicer4J yield several false positives, whereas JDExtractor successfully extracts defect-related methods and defect propagation paths with fewer false positives in a short time. Moreover, JDExtractor has been applied to open source projects on GitHub, ultimately extracting defect-related methods for 67 defects from 319 compiled open source applications.
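To make the notion of an inter-procedural NPE concrete, the following is a minimal illustrative sketch in Java. It is not taken from the paper or from JDExtractor; the class `NpePropagationDemo` and all of its method names are hypothetical. It shows a null value created in one method (the defect-source method), passed through an intermediate method untouched, and finally dereferenced in a third method (the defect-triggering method). All three methods lie on the propagation path described in the abstract.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical example: none of these names come from the paper.
public class NpePropagationDemo {
    private static final Map<String, String> USERS = new HashMap<>();

    static {
        USERS.put("alice", "Alice Smith");
    }

    // Defect-source method: Map.get returns null for an unknown key,
    // and that null is returned to the caller unchecked.
    static String lookupName(String id) {
        return USERS.get(id);
    }

    // Intermediate method: forwards the (possibly null) value
    // without ever dereferencing it, so no exception occurs here.
    static String resolveDisplayName(String id) {
        String name = lookupName(id);
        return name;
    }

    // Defect-triggering method: dereferences the value, so the
    // NullPointerException is thrown here, far from its origin.
    static int displayNameLength(String id) {
        String name = resolveDisplayName(id);
        return name.length();
    }

    // Helper that reports whether the call chain ends in an NPE.
    static boolean triggersNpe(String id) {
        try {
            displayNameLength(id);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("alice triggers NPE: " + triggersNpe("alice"));
        System.out.println("bob triggers NPE: " + triggersNpe("bob"));
    }
}
```

In this sketch, a stack trace for the failing call would point at `displayNameLength`, while the null actually originates in `lookupName`. A defect record that captures only the triggering method would miss the other two methods on the path.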
About the journal:
This journal publishes research papers, tutorials, surveys, and accounts of significant industrial experience in the foundations, techniques, tools, and applications of automated software engineering technology. This includes the study of techniques for constructing, understanding, adapting, and modeling software artifacts and processes.
Coverage in Automated Software Engineering examines both automatic systems and collaborative systems as well as computational models of human software engineering activities. In addition, it presents knowledge representations and artificial intelligence techniques applicable to automated software engineering, and formal techniques that support or provide theoretical foundations. The journal also includes reviews of books, software, conferences and workshops.