{"title":"Large scale clone detection, analysis, and benchmarking: An evolutionary perspective (Keynote)","authors":"C. Roy","doi":"10.1109/IWSC.2018.8327311","DOIUrl":"https://doi.org/10.1109/IWSC.2018.8327311","url":null,"abstract":"Copying a code fragment and then reusing it by pasting and adapting (e.g., adding/modifying/deleting statements) is a common practice in software development, which results in a significant amount of duplicated code in software systems. Developers consider cloning as one of the principled re-engineering approaches and often intentionally practice cloning for a variety of reasons such as faster development, avoiding risk by reusing stable old code, or time pressure. On the other hand, duplicated code poses a number of threats to the maintenance of software systems: clones are the #1 “bad smell” in Fowler's refactoring list, and several recent studies, including studies with industrial systems, show that although in many cases clones are not really harmful, and could even be useful in some cases, they could also be detrimental to software maintenance. For example, reusing a fragment containing unknown bugs may result in bug propagation, and any change in requirements involving a cloned fragment may lead to changes to all the fragments similar to it, multiplying the work to be done. Furthermore, inconsistent changes to the cloned fragments during any updating process may lead to severe unexpected behaviour. Software clones are thus considered to be one of the major contributors to the high software maintenance cost, which could be up to 80% of total software development cost. The era of Big Data has introduced new applications for clone detection. For example, clone detection has been used to find similar mobile applications, to intelligently tag code snippets, to identify code examples, and so on from large inter-project repositories.\nThe dual role of clones in software development and maintenance, along with these many emerging new applications of clone detection, has led to a great many clone detection tools and analysis frameworks. In this keynote talk, I will review the cloning literature to date; in particular, I will talk about our recent work on large scale clone detection, the challenges in evaluating such clone detectors, and how we have overcome them at least in part with our BigCloneBench and Mutation framework. I will then talk about the recent advances in clone analysis and management along with a vision for a comprehensive clone management system.","PeriodicalId":374295,"journal":{"name":"International Workshop on Software Clones","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121203907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extracting code clones for refactoring using combinations of clone metrics","authors":"Eunjong Choi, Norihiro Yoshida, T. Ishio, Katsuro Inoue, Tateki Sano","doi":"10.1145/1985404.1985407","DOIUrl":"https://doi.org/10.1145/1985404.1985407","url":null,"abstract":"Code clone detection tools may report a large number of code clones, while software developers are interested in only a subset of code clones that are relevant to software development tasks such as refactoring. Our research group has supported many software developers with the code clone detection tool CCFinder and its GUI front-end Gemini. Gemini shows clone sets (i.e., a set of code clones identical or similar to each other) with several clone metrics including their length and the number of code clones; however, it is not clear how to use those metrics to extract interesting code clones for developers. In this paper, we propose a method combining clone metrics to extract code clones for refactoring activity. We have conducted an empirical study on a web application developed by a Japanese software company. The result indicates that combinations of simple clone metrics are more effective for extracting refactoring candidates from detected code clones than individual clone metrics.","PeriodicalId":374295,"journal":{"name":"International Workshop on Software Clones","volume":"179 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126765435","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Is cloned code older than non-cloned code?","authors":"J. Krinke","doi":"10.1145/1985404.1985410","DOIUrl":"https://doi.org/10.1145/1985404.1985410","url":null,"abstract":"It is still a debated question whether cloned code causes increased maintenance effort. If cloned code is more stable than non-cloned code, i.e. it is changed less often, it will require less maintenance effort. The more stable cloned code is, the longer it will not have been changed, so the stability can be estimated through the code's age. This paper presents a study on the average age of cloned code. For three large open source systems, the age of every line of source code is computed as the date of the last change in that line. In addition, every line is categorized by whether it belongs to cloned code as detected by a clone detector. The study shows that on average, cloned code is older than non-cloned code. Moreover, if a file has cloned code, the average age of the cloned code of the file is lower than the average age of the non-cloned code in the same file. The results support the previous findings that cloned code is more stable than non-cloned code.","PeriodicalId":374295,"journal":{"name":"International Workshop on Software Clones","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130680468","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Viewing simple clones from structural clones' perspective","authors":"H. Basit, Usman Ali, S. Jarzabek","doi":"10.1145/1985404.1985406","DOIUrl":"https://doi.org/10.1145/1985404.1985406","url":null,"abstract":"In previous work, we described a technique for detecting design-level similar program structures that we called structural clones. Structural clones are recurring configurations of simple clones (i.e., similar code fragments). In this paper, we show how structural clone analysis extends the benefits of analysis based on simple clones only. First, we present experimental results showing that in many cases simple clones participated in structural clones. In such cases, structural clones, being larger than simple clones but smaller in number, allow analysts to see the \"forest from the trees\", as far as the similarity situation is concerned. We provide arguments and examples to show how the knowledge of structural clones - their location and exact similarities and differences - helps in program understanding, design recovery, maintenance, and refactoring.","PeriodicalId":374295,"journal":{"name":"International Workshop on Software Clones","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121547617","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"How code skips over revisions","authors":"Toshihiro Kamiya","doi":"10.1145/1985404.1985420","DOIUrl":"https://doi.org/10.1145/1985404.1985420","url":null,"abstract":"This paper explores the need for 'history-aware' searches, by experimentally showing a development process that includes code fragments which disappear at a revision and appear again at a later revision. Some of these code re-appearances are not a result of a revert command of a version control system, but a result of a developer who copied a code fragment from old source files.","PeriodicalId":374295,"journal":{"name":"International Workshop on Software Clones","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123976557","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Clone detection through process algebras and Java bytecode","authors":"A. Santone","doi":"10.1145/1985404.1985422","DOIUrl":"https://doi.org/10.1145/1985404.1985422","url":null,"abstract":"In this paper we present a formal method-based approach to detecting source code clones by means of analysing and comparing the Java Bytecode that is produced when the source code is compiled. A preliminary investigation has also been conducted to assess the validity of the proposed approach.","PeriodicalId":374295,"journal":{"name":"International Workshop on Software Clones","volume":"100 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125572956","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Visualizing the evolution of code clones","authors":"Ripon K. Saha, C. Roy, Kevin A. Schneider","doi":"10.1145/1985404.1985421","DOIUrl":"https://doi.org/10.1145/1985404.1985421","url":null,"abstract":"The knowledge of code clone evolution throughout the history of a software system is essential in comprehending and managing its clones properly and cost-effectively. However, investigating and observing facts in a huge set of text-based data provided by a clone genealogy extractor could be challenging without the support of a visualization tool. In this position paper, we present an idea of visualizing code clone evolution by exploiting the advantages of existing clone visualization techniques that would be both scalable and useful.","PeriodicalId":374295,"journal":{"name":"International Workshop on Software Clones","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116523663","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Research in cloning beyond code: a first roadmap","authors":"Elmar Jürgens","doi":"10.1145/1985404.1985419","DOIUrl":"https://doi.org/10.1145/1985404.1985419","url":null,"abstract":"Most research in software cloning has a strong focus on source code. However, cloning occurs in other software artifacts, as well. In this paper, we summarize existing work on cloning in other software artifacts and provide a list of research questions for future work.","PeriodicalId":374295,"journal":{"name":"International Workshop on Software Clones","volume":"99 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123395786","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CloneDiff: semantic differencing of clones","authors":"Yinxing Xue, Zhenchang Xing, S. Jarzabek","doi":"10.1145/1985404.1985428","DOIUrl":"https://doi.org/10.1145/1985404.1985428","url":null,"abstract":"Clone detection provides a scalable and efficient way to detect similar codes, while program differencing is a powerful and effective way to analyze similar codes. CloneDiff, a Program Dependence Graphs (PDGs) differencing tool, complements clone detection with program differencing for the purpose of characterizing clones. It captures semantic information of clones from PDGs, and uses graph matching techniques to compute a precise characterization of clones in terms of a category of semantic differences.","PeriodicalId":374295,"journal":{"name":"International Workshop on Software Clones","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133544633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"VisCad: flexible code clone analysis support for NiCad","authors":"M. Asaduzzaman, C. Roy, Kevin A. Schneider","doi":"10.1145/1985404.1985425","DOIUrl":"https://doi.org/10.1145/1985404.1985425","url":null,"abstract":"Clone detector results can be better understood with tools that support visualization and facilitate in-depth analysis. In this tool demo paper we present VisCad, a comprehensive code clone analysis and visualization tool that provides such support for the near-miss hybrid clone detection tool, NiCad. Through carefully selected metrics and visualization techniques VisCad can guide users to explore the cloning of a system from different perspectives.","PeriodicalId":374295,"journal":{"name":"International Workshop on Software Clones","volume":"81 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121885194","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}