Graph-Based Mining of In-the-Wild, Fine-Grained, Semantic Code Change Patterns

H. Nguyen, T. Nguyen, Danny Dig, S. Nguyen, H. Tran, Michael C Hilton

Published in: 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE), pp. 819-830, May 2019
DOI: 10.1109/ICSE.2019.00089 (https://doi.org/10.1109/ICSE.2019.00089)
Citations: 41

Abstract

Prior research exploited the repetitiveness of code changes to enable several tasks such as code completion, bug-fix recommendation, and library adaptation. These and other novel applications require accurate detection of semantic changes, but state-of-the-art methods are limited to algorithms that detect specific kinds of changes at the syntactic level. Existing algorithms relying on syntactic similarity have lower accuracy and cannot effectively detect semantic change patterns. We introduce a novel graph-based mining approach, CPatMiner, to detect previously unknown repetitive changes in the wild by mining fine-grained semantic code change patterns from a large number of repositories. To overcome unique challenges such as detecting meaningful change patterns and scaling to large repositories, we rely on fine-grained change graphs to capture program dependencies. We evaluate CPatMiner by mining change patterns in a diverse corpus of 5,000+ open-source projects from GitHub across a population of 170,000+ developers. We use three complementary methods. First, we sent the mined patterns to 108 open-source developers. We found that 70% of respondents recognized those patterns as their meaningful frequent changes. Moreover, 79% of respondents even named the patterns, and 44% wanted future IDEs to automate such repetitive changes. We found that the mined change patterns belong to various development activities: adaptive (9%), perfective (20%), corrective (35%), and preventive (36%, including refactorings). Second, we compared our tool with the state-of-the-art AST-based technique and found that CPatMiner detects 2.1x more meaningful patterns. Third, we used CPatMiner to search for patterns in a corpus of 88 GitHub projects with longer histories, consisting of 164M SLOC. It constructed 322K fine-grained change graphs containing 3M nodes and detected 17K instances of change patterns, from which we provide unique insights on the practice of change patterns among individuals and teams. We found that a large percentage (75%) of the change patterns from individual developers are commonly shared with others, and this holds true for teams. Moreover, we found that the patterns are not intermittent but spread widely over time. Thus, we call for a community-based change pattern database to provide important resources in novel applications.
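
To make the idea of mining repeated patterns from change graphs more concrete, here is a minimal Python sketch. It is not CPatMiner's algorithm, which the abstract does not detail. It assumes each code change has already been reduced to a set of labeled dependency edges, and it simply counts small edge subsets that recur across changes, which is a much weaker stand-in for real subgraph mining. The function name `mine_frequent_patterns`, the edge encoding, and the sample edges are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical encoding: an edge is
# (source node label, dependency kind, target node label).
Edge = tuple[str, str, str]


def mine_frequent_patterns(
    change_graphs: list[frozenset[Edge]],
    min_support: int,
    max_size: int = 2,
) -> dict[frozenset[Edge], int]:
    """Return edge subsets (up to max_size edges) that occur in at
    least min_support change graphs, mapped to their support count."""
    support: Counter = Counter()
    for graph in change_graphs:
        edges = sorted(graph)  # deterministic enumeration order
        for size in range(1, max_size + 1):
            # Each subset is produced once per graph, so the counter
            # records the number of graphs containing the subset.
            for subset in combinations(edges, size):
                support[frozenset(subset)] += 1
    return {p: n for p, n in support.items() if n >= min_support}


# Illustrative (made-up) edges for a change that adds a null check.
null_check = ("if (x != null)", "control", "x.call()")
data_dep = ("x = get()", "data", "x.call()")
log_edge = ("x.call()", "control", "log()")

graphs = [
    frozenset({null_check, data_dep}),
    frozenset({null_check, data_dep, log_edge}),
    frozenset({data_dep}),
]

patterns = mine_frequent_patterns(graphs, min_support=2)
for pattern, count in patterns.items():
    print(count, sorted(pattern))
```

With `min_support=2`, the sketch keeps the null-check edge, the data-dependency edge, and their combination, and discards anything involving the logging edge, which appears in only one change. Enumerating edge subsets grows combinatorially with `max_size`, so this approach would not scale to the corpus sizes reported in the abstract; it only illustrates the notion of support for a change pattern.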