Large scale clone detection, analysis, and benchmarking: An evolutionary perspective (Keynote)

International Workshop on Software Clones. Pub Date: 2018-03-01. DOI: 10.1109/IWSC.2018.8327311

C. Roy


Citations: 1

Abstract

Copying a code fragment and reusing it by pasting and adapting it (e.g., adding, modifying, or deleting statements) is a common practice in software development, and it results in a significant amount of duplicated code in software systems. Developers consider cloning one of the principled re-engineering approaches and often practice it intentionally for a variety of reasons, such as faster development, avoiding risk by reusing stable old code, or time pressure. On the other hand, duplicated code poses a number of threats to the maintenance of software systems. Clones are the #1 “bad smell” in Fowler's refactoring list, and several recent studies, including studies of industrial systems, show that although clones are in many cases not really harmful, and can even be useful in some cases, they can also be detrimental to software maintenance. For example, reusing a fragment that contains unknown bugs may propagate those bugs, and any change in requirements involving a cloned fragment may require changes to all fragments similar to it, multiplying the work to be done. Furthermore, inconsistent changes to cloned fragments during updates may lead to severe unexpected behaviour. Software clones are thus considered one of the major contributors to the high cost of software maintenance, which can be up to 80% of total software development cost. The era of Big Data has introduced new applications for clone detection. For example, clone detection has been used on large inter-project repositories to find similar mobile applications, to intelligently tag code snippets, and to identify code examples. The dual role of clones in software development and maintenance, along with these emerging applications of clone detection, has led to a great many clone detection tools and analysis frameworks.
In this keynote talk, I will review the cloning literature to date. In particular, I will talk about our recent work on large-scale clone detection, the challenges in evaluating such clone detectors, and how we have overcome them, at least in part, with our BigCloneBench and Mutation framework. I will then talk about recent advances in clone analysis and management, along with a vision for a comprehensive clone management system.

