LineFlowDP: A Deep Learning-Based Two-Phase Approach for Line-Level Defect Prediction

IF 3.5 2区 计算机科学 Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING
Fengyu Yang, Fa Zhong, Guangdong Zeng, Peng Xiao, Wei Zheng
{"title":"LineFlowDP: A Deep Learning-Based Two-Phase Approach for Line-Level Defect Prediction","authors":"Fengyu Yang, Fa Zhong, Guangdong Zeng, Peng Xiao, Wei Zheng","doi":"10.1007/s10664-023-10439-z","DOIUrl":null,"url":null,"abstract":"<p>Software defect prediction plays a key role in guiding resource allocation for software testing. However, previous defect prediction studies still have some limitations: (1) the granularity of defect prediction is still coarse, so high-risk code statements cannot be accurately located; (2) in fine-grained defect prediction, the semantic and structural information available in a single line of code is limited, and the content of code semantic information is not sufficient to achieve semantic differentiation. To address the above problems, we propose a two-phase line-level defect prediction method based on deep learning called LineFlowDP. We first extract the program dependency graph (PDG) of the source files. The lines of code corresponding to the nodes in the PDG are extended semantically with data flow and control flow information and embedded as nodes, and the model is further trained using an relational graph convolutional network. Finally, a graph interpreter GNNExplainer and a social network analysis method are used to rank the lines of code in the defective file according to risk. On 32 datasets from 9 projects, the experimental results show that LineFlowDP is 13%-404% more cost-effective than four state-of-the-art line-level defect prediction methods. The effectiveness of the flow information extension and code line risk ranking methods was also verified via ablation experiments.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"3 1","pages":""},"PeriodicalIF":3.5000,"publicationDate":"2024-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Empirical Software Engineering","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s10664-023-10439-z","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
引用次数: 0

Abstract

Software defect prediction plays a key role in guiding resource allocation for software testing. However, previous defect prediction studies still have some limitations: (1) the granularity of defect prediction is still coarse, so high-risk code statements cannot be accurately located; (2) in fine-grained defect prediction, the semantic and structural information available in a single line of code is limited, and the content of code semantic information is not sufficient to achieve semantic differentiation. To address the above problems, we propose a two-phase line-level defect prediction method based on deep learning called LineFlowDP. We first extract the program dependency graph (PDG) of the source files. The lines of code corresponding to the nodes in the PDG are extended semantically with data flow and control flow information and embedded as nodes, and the model is further trained using an relational graph convolutional network. Finally, a graph interpreter GNNExplainer and a social network analysis method are used to rank the lines of code in the defective file according to risk. On 32 datasets from 9 projects, the experimental results show that LineFlowDP is 13%-404% more cost-effective than four state-of-the-art line-level defect prediction methods. The effectiveness of the flow information extension and code line risk ranking methods was also verified via ablation experiments.

Abstract Image

LineFlowDP:基于深度学习的线路级缺陷预测两阶段方法
软件缺陷预测在指导软件测试资源分配方面发挥着关键作用。然而,以往的缺陷预测研究仍存在一些局限性:(1)缺陷预测的粒度仍然较粗,无法准确定位高风险代码语句;(2)在细粒度缺陷预测中,单行代码中可获得的语义和结构信息有限,代码语义信息内容不足以实现语义区分。针对上述问题,我们提出了一种基于深度学习的两阶段行级缺陷预测方法,称为 LineFlowDP。我们首先提取源文件的程序依赖图(PDG)。然后使用关系图卷积网络进一步训练模型。最后,使用图解释器 GNNExplainer 和社会网络分析方法对缺陷文件中的代码行进行风险排序。在来自 9 个项目的 32 个数据集上,实验结果表明 LineFlowDP 比四种最先进的行级缺陷预测方法的性价比高出 13%-404%。流量信息扩展和代码行风险排序方法的有效性也通过消融实验得到了验证。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Empirical Software Engineering
Empirical Software Engineering 工程技术-计算机:软件工程
CiteScore
8.50
自引率
12.20%
发文量
169
审稿时长
>12 weeks
期刊介绍: Empirical Software Engineering provides a forum for applied software engineering research with a strong empirical component, and a venue for publishing empirical results relevant to both researchers and practitioners. Empirical studies presented here usually involve the collection and analysis of data and experience that can be used to characterize, evaluate and reveal relationships between software development deliverables, practices, and technologies. Over time, it is expected that such empirical results will form a body of knowledge leading to widely accepted and well-formed theories. The journal also offers industrial experience reports detailing the application of software technologies - processes, methods, or tools - and their effectiveness in industrial settings. Empirical Software Engineering promotes the publication of industry-relevant research, to address the significant gap between research and practice.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信