Defect Prediction via Tree-Based Encoding with Hybrid Granularity for Software Sustainability

IF 3.0 | Region 3 (Computer Science) | JCR Q2 (COMPUTER SCIENCE, HARDWARE & ARCHITECTURE)
Shaojian Qiu;Huihao Huang;Wenchao Jiang;Fanlong Zhang;Weilin Zhou
DOI: 10.1109/TSUSC.2023.3248965
Journal: IEEE Transactions on Sustainable Computing, volume 9, issue 3, pages 249-260
Publication date: 2023-02-24
Publication type: Journal Article
Open access: No
URL: https://ieeexplore.ieee.org/document/10052729/
Citations: 0

Abstract

Defects in software may result in system crashes, sluggish performance, or even deadlock, wasting valuable resources. Defect prediction can help quality assurance teams identify potential software issues and allocate testing resources rationally, thereby reducing resource waste and enhancing software sustainability. Researchers have recently incorporated deep learning into defect prediction, extracting structural-semantic features from the abstract syntax trees (ASTs) of source code. However, inappropriate node granularity in ASTs may adversely affect the effectiveness of the extracted features. In addition, converting AST nodes into integer vectors may lose structural information, resulting in poor predictive capability. To address these challenges, this paper proposes a tree-based encoding method with hybrid granularity for defect prediction. Specifically, five granularity selection schemes are extended to generate various ASTs from source code. A tree-based continuous bag-of-words model is then used to map AST nodes into numeric vector representations that conform to the tree-like structure of code. The matrices converted from the ASTs are fed into a convolutional neural network to extract program features automatically. Experiments involving 24 versions of open-source projects demonstrate that the method can improve the effectiveness of extracted features in defect prediction tasks.
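The two ideas in the abstract, node granularity and tree-shaped context, can be illustrated with a minimal sketch. It uses Python's standard `ast` module on Python source purely for convenience. The granularity levels `coarse` and `fine`, the `COARSE` node set, and all function names are hypothetical; they are not the paper's five schemes or its trained model. The sketch only builds (target, context) samples in which a node's context is its tree neighbours (nearest kept parent and children) rather than its neighbours in a flattened token sequence. Embedding training and the convolutional network are omitted.

```python
import ast

# Hypothetical "coarse" granularity: keep only declarations, control flow and calls.
COARSE = (ast.FunctionDef, ast.ClassDef, ast.If, ast.For, ast.While, ast.Try, ast.Call)

SOURCE = '''
def safe_div(a, b):
    if b == 0:
        return 0
    return a / b
'''

def token(node):
    """Label a node by type, adding an identifier when one is readily available."""
    if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
        return f"{type(node).__name__}:{node.name}"
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        return f"Call:{node.func.id}"
    return type(node).__name__

def keep(node, granularity):
    """Decide whether a node survives at the chosen (assumed) granularity."""
    return granularity == "fine" or isinstance(node, COARSE)

def tree_cbow_samples(source, granularity="coarse"):
    """Return [(target_token, [context_tokens])] using tree neighbours as context."""
    samples = []

    def visit(node, parent_sample):
        current = parent_sample
        if keep(node, granularity):
            current = (token(node), [])
            samples.append(current)
            if parent_sample is not None:
                current[1].append(parent_sample[0])   # parent is context of the child
                parent_sample[1].append(current[0])   # child is context of the parent
        for child in ast.iter_child_nodes(node):
            visit(child, current)                     # skipped nodes pass the ancestor through

    visit(ast.parse(source), None)
    return samples

if __name__ == "__main__":
    for level in ("coarse", "fine"):
        result = tree_cbow_samples(SOURCE, level)
        print(level, len(result), result[:3])
```

Changing the granularity changes both the vocabulary and the neighbourhood each node sees, which is one plausible reading of why the choice of granularity affects the quality of the learned features.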
Source journal: IEEE Transactions on Sustainable Computing
Subject area: Mathematics-Control and Optimization
CiteScore: 7.70
Self-citation rate: 2.60%
Articles published: 54