{"title":"Benchmarks for software clone detection: A ten-year retrospective","authors":"C. Roy, J. Cordy","doi":"10.1109/SANER.2018.8330194","DOIUrl":null,"url":null,"abstract":"There have been a great many methods and tools proposed for software clone detection. While some work has been done on assessing and comparing performance of these tools, very little empirical evaluation has been done. In particular, accuracy measures such as precision and recall have only been roughly estimated, due both to problems in creating a validated clone benchmark against which tools can be compared, and to the manual effort required to hand check large numbers of candidate clones. In order to cope with this issue, over the last 10 years we have been working towards building cloning benchmarks for objectively evaluating clone detection tools. Beginning with our WCRE 2008 paper, where we conducted a modestly large empirical study with the NiCad clone detection tool, over the past ten years we have extended and grown our work to include several languages, much larger datasets, and model clones in languages such as Simulink. From a modest set of 15 C and Java systems comprising a total of 7 million lines in 2008, our work has progressed to a benchmark called BigCloneBench with eight million manually validated clone pairs in a large inter-project source dataset of more than 25,000 projects and 365 million lines of code. In this paper, we present a history and overview of software clone detection benchmarks, and review the steps of ourselves and others to come to this stage. We outline a future for clone detection benchmarks and hope to encourage researchers to both use existing benchmarks and to contribute to building the benchmarks of the future.","PeriodicalId":6602,"journal":{"name":"2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"10 1","pages":"26-37"},"PeriodicalIF":0.0000,"publicationDate":"2018-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"34","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SANER.2018.8330194","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 34
Abstract
A great many methods and tools have been proposed for software clone detection. While some work has assessed and compared the performance of these tools, very little empirical evaluation has been done. In particular, accuracy measures such as precision and recall have only been roughly estimated, due both to the difficulty of creating a validated clone benchmark against which tools can be compared, and to the manual effort required to hand-check large numbers of candidate clones. To address this issue, over the last 10 years we have been working towards building cloning benchmarks for objectively evaluating clone detection tools. Beginning with our WCRE 2008 paper, in which we conducted a modestly large empirical study with the NiCad clone detection tool, over the past ten years we have extended our work to include several languages, much larger datasets, and model clones in languages such as Simulink. From a modest set of 15 C and Java systems comprising a total of 7 million lines in 2008, our work has progressed to a benchmark called BigCloneBench, with eight million manually validated clone pairs in a large inter-project source dataset of more than 25,000 projects and 365 million lines of code. In this paper, we present a history and overview of software clone detection benchmarks, and review the steps that we and others have taken to reach this stage. We outline a future for clone detection benchmarks, and hope to encourage researchers both to use existing benchmarks and to contribute to building the benchmarks of the future.
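To make the precision and recall measures mentioned above concrete, here is a minimal sketch (not taken from the paper) of how a detector's reported clone pairs could be scored against a manually validated benchmark such as BigCloneBench. The function name, fragment identifiers, and example pairs are hypothetical and serve only to illustrate the arithmetic.

```python
# Minimal sketch: precision and recall of a clone detector's reported pairs
# measured against a set of manually validated clone pairs.
# All names and example data below are hypothetical.

def precision_recall(reported_pairs, validated_pairs):
    """Each argument is a set of frozensets, each frozenset holding two code-fragment IDs."""
    true_positives = reported_pairs & validated_pairs
    precision = len(true_positives) / len(reported_pairs) if reported_pairs else 0.0
    recall = len(true_positives) / len(validated_pairs) if validated_pairs else 0.0
    return precision, recall

# Hypothetical example: the benchmark validates three clone pairs,
# the tool reports three, two of which are correct.
validated = {frozenset({"A.java:10-40", "B.java:5-35"}),
             frozenset({"A.java:10-40", "C.java:60-90"}),
             frozenset({"B.java:5-35", "C.java:60-90"})}
reported = {frozenset({"A.java:10-40", "B.java:5-35"}),
            frozenset({"A.java:10-40", "C.java:60-90"}),
            frozenset({"D.java:1-30", "E.java:1-30"})}

p, r = precision_recall(reported, validated)
print(f"precision = {p:.2f}, recall = {r:.2f}")  # precision = 0.67, recall = 0.67
```

The difficulty the abstract points to is not this arithmetic but obtaining the validated set in the first place: at the scale of millions of candidate pairs, hand-checking each one is impractical, which is what motivates curated benchmarks.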