An Empirical Analysis of Git Commit Logs for Potential Inconsistency in Code Clones

Reishi Yokomori, Katsuro Inoue
arXiv - CS - Software Engineering, published 2024-09-13. https://doi.org/arxiv-2409.08555

Abstract

Code clones are code snippets that are identical or similar to other snippets within the same or different files. They are often created through copy-and-paste practices and modified during development and maintenance activities. Since a pair of code clones, known as a clone pair, has a possible logical coupling between them, it is expected that changes to each snippet are made simultaneously (co-changed) and consistently. There is extensive research on code clones, including studies related to the co-change of clones; however, detailed analysis of commit logs for code clone pairs has been limited. In this paper, we investigate the commit logs of code snippets from clone pairs, using the git-log command to extract changes to cloned code snippets. We analyzed 45 repositories owned by the Apache Software Foundation on GitHub and addressed three research questions regarding commit frequency, co-change ratio, and commit patterns. Our findings indicate that (1) on average, clone snippets are changed infrequently, typically only two or three times throughout their lifetime, (2) co-changes account for about half of all clone changes, with 10-20% of co-changed commits being concerning (potentially inconsistent), and (3) 35-65% of all clone pairs are classified as concerning clone pairs (potentially inconsistent clone pairs). These results suggest the need for a consistent management system through the commit timeline of clones.
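
As an illustration of the kind of extraction the abstract describes, the following is a minimal sketch of how commits touching a cloned snippet could be listed with git-log and compared across a clone pair. It is not the authors' tooling. The function names, the `COMMIT:` marker, and the ratio definition (shared commits divided by all commits touching either snippet) are assumptions made for this example; the paper may define the co-change ratio differently. The only git feature relied on is the `-L <start>,<end>:<file>` option of `git log`, which traces the history of a line range.

```python
import subprocess

# Hypothetical marker used to pick commit hashes out of `git log -L` output,
# which also prints patch text for each commit.
MARKER = "COMMIT:"


def parse_commit_hashes(log_output: str) -> list[str]:
    """Return the commit hashes found on marker lines, newest first."""
    return [
        line[len(MARKER):]
        for line in log_output.splitlines()
        if line.startswith(MARKER)
    ]


def commits_touching(repo: str, path: str, start: int, end: int) -> list[str]:
    """List commits that changed lines start..end of path, via git log -L."""
    result = subprocess.run(
        ["git", "-C", repo, "log", f"--format={MARKER}%H",
         f"-L{start},{end}:{path}"],
        capture_output=True, text=True, check=True,
    )
    return parse_commit_hashes(result.stdout)


def co_change_ratio(commits_a, commits_b) -> float:
    """Assumed definition: commits touching both snippets / commits touching either."""
    a, b = set(commits_a), set(commits_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

Given the commit lists for the two snippets of a clone pair, `co_change_ratio(commits_touching(repo, file_a, s1, e1), commits_touching(repo, file_b, s2, e2))` would give a rough indication of how often the pair was changed together. Deciding whether a co-changed commit is "concerning" requires inspecting the changes themselves and is outside the scope of this sketch.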