PaReco:在软件家族的不同变体中修补克隆和遗漏的补丁

Poedjadevie Ramkisoen, John Businge, B. V. Bladel, Alexandre Decan, S. Demeyer, Coen De Roover, Foutse Khomh
{"title":"PaReco:在软件家族的不同变体中修补克隆和遗漏的补丁","authors":"Poedjadevie Ramkisoen, John Businge, B. V. Bladel, Alexandre Decan, S. Demeyer, Coen De Roover, Foutse Khomh","doi":"10.1145/3540250.3549112","DOIUrl":null,"url":null,"abstract":"Re-using whole repositories as a starting point for new projects is often done by maintaining a variant fork parallel to the original. However, the common artifacts between both are not always kept up to date. As a result, patches are not optimally integrated across the two repositories, which may lead to sub-optimal maintenance between the variant and the original project. A bug existing in both repositories can be patched in one but not the other (we see this as a missed opportunity) or it can be manually patched in both probably by different developers (we see this as effort duplication). In this paper we present a tool (named PaReCo) which relies on clone detection to mine cases of missed opportunity and effort duplication from a pool of patches. We analyzed 364 (source to target) variant pairs with 8,323 patches resulting in a curated dataset containing 1,116 cases of effort duplication and 1,008 cases of missed opportunities. We achieve a precision of 91%, recall of 80%, accuracy of 88%, and F1-score of 85%. Furthermore, we investigated the time interval between patches and found out that, on average, missed patches in the target variants have been introduced in the source variants 52 weeks earlier. Consequently, PaReCo can be used to manage variability in “time” by automatically identifying interesting patches in later project releases to be backported to supported earlier releases.","PeriodicalId":68155,"journal":{"name":"软件产业与工程","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"PaReco: patched clones and missed patches among the divergent variants of a software family\",\"authors\":\"Poedjadevie Ramkisoen, John Businge, B. V. Bladel, Alexandre Decan, S. Demeyer, Coen De Roover, Foutse Khomh\",\"doi\":\"10.1145/3540250.3549112\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Re-using whole repositories as a starting point for new projects is often done by maintaining a variant fork parallel to the original. However, the common artifacts between both are not always kept up to date. As a result, patches are not optimally integrated across the two repositories, which may lead to sub-optimal maintenance between the variant and the original project. A bug existing in both repositories can be patched in one but not the other (we see this as a missed opportunity) or it can be manually patched in both probably by different developers (we see this as effort duplication). In this paper we present a tool (named PaReCo) which relies on clone detection to mine cases of missed opportunity and effort duplication from a pool of patches. We analyzed 364 (source to target) variant pairs with 8,323 patches resulting in a curated dataset containing 1,116 cases of effort duplication and 1,008 cases of missed opportunities. We achieve a precision of 91%, recall of 80%, accuracy of 88%, and F1-score of 85%. Furthermore, we investigated the time interval between patches and found out that, on average, missed patches in the target variants have been introduced in the source variants 52 weeks earlier. Consequently, PaReCo can be used to manage variability in “time” by automatically identifying interesting patches in later project releases to be backported to supported earlier releases.\",\"PeriodicalId\":68155,\"journal\":{\"name\":\"软件产业与工程\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-11-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"软件产业与工程\",\"FirstCategoryId\":\"1089\",\"ListUrlMain\":\"https://doi.org/10.1145/3540250.3549112\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"软件产业与工程","FirstCategoryId":"1089","ListUrlMain":"https://doi.org/10.1145/3540250.3549112","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

摘要

重用整个存储库作为新项目的起点,通常是通过维护一个与原始版本并行的变体分支来完成的。然而,两者之间的公共工件并不总是保持最新。因此,补丁不能在两个存储库之间最佳地集成,这可能导致变体和原始项目之间的次优维护。存在于两个存储库中的错误可以在一个存储库中修补,而在另一个存储库中则不能(我们认为这是错失的机会),或者可以由不同的开发人员在两个存储库中手动修补(我们认为这是重复工作)。在本文中,我们提出了一个工具(名为PaReCo),它依靠克隆检测从补丁池中挖掘错失机会和工作重复的情况。我们分析了包含8323个补丁的364对(从源到目标)变体对,生成了一个包含1116个重复工作案例和1008个错失机会案例的精选数据集。我们的准确率为91%,召回率为80%,准确率为88%,f1分数为85%。此外,我们调查了补丁之间的时间间隔,发现平均而言,目标变体中缺失的补丁已经在52周前引入到源变体中。因此,PaReCo可以用来管理“时间”上的可变性,方法是自动识别后期项目版本中需要反向移植到受支持的早期版本中的有趣补丁。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
PaReco: patched clones and missed patches among the divergent variants of a software family
Re-using whole repositories as a starting point for new projects is often done by maintaining a variant fork parallel to the original. However, the common artifacts between both are not always kept up to date. As a result, patches are not optimally integrated across the two repositories, which may lead to sub-optimal maintenance between the variant and the original project. A bug existing in both repositories can be patched in one but not the other (we see this as a missed opportunity) or it can be manually patched in both probably by different developers (we see this as effort duplication). In this paper we present a tool (named PaReCo) which relies on clone detection to mine cases of missed opportunity and effort duplication from a pool of patches. We analyzed 364 (source to target) variant pairs with 8,323 patches resulting in a curated dataset containing 1,116 cases of effort duplication and 1,008 cases of missed opportunities. We achieve a precision of 91%, recall of 80%, accuracy of 88%, and F1-score of 85%. Furthermore, we investigated the time interval between patches and found out that, on average, missed patches in the target variants have been introduced in the source variants 52 weeks earlier. Consequently, PaReCo can be used to manage variability in “time” by automatically identifying interesting patches in later project releases to be backported to supported earlier releases.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
676
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信