{"title":"Clone Swarm: A Cloud Based Code-Clone Analysis Tool","authors":"V. Bandi, C. Roy, C. Gutwin","doi":"10.1109/IWSC50091.2020.9047642","DOIUrl":"https://doi.org/10.1109/IWSC50091.2020.9047642","url":null,"abstract":"A code clone is defined as a pair of similar code fragments within a software system. While code clones are not always harmful, they can have a detrimental effect on the overall quality of a software system due to the propagation of bugs and other maintenance implications. Because of this, software developers need to analyse the code clones that exist in a software system. However, despite the availability of several clone detection systems, the adoption of such tools outside of the clone community remains low. A possible reason for this is the difficulty and complexity involved in setting up and using these tools. In this paper, we present Clone Swarm, a code clone analytics tool that identifies clones in a project and presents the information in an easily accessible manner. Clone Swarm is publicly available and can mine any open-source Git repository. Clone Swarm internally uses NiCad, a popular clone detection tool, in the cloud and lets users interactively explore code clones using a web-based interface at multiple granularity levels (function and block level). Clone results are visualized in multiple overviews, ranging from a high-level plot down to an individual line-by-line comparison view of cloned fragments. Also, to facilitate future research in the area of clone detection and analysis, users can directly download the clone detection results for their projects. Clone Swarm is available online at clone-swarm.usask.ca. The source code for Clone Swarm is freely available under the MIT license on GitHub.","PeriodicalId":127830,"journal":{"name":"2020 IEEE 14th International Workshop on Software Clones (IWSC)","volume":"195 16","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113999274","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Clone Detection on Large Scala Codebases","authors":"Wahidur Rahman, Yisen Xu, Fan Pu, J. Xuan, Xiangyang Jia, Michail Basios, Leslie Kanthan, Lingbo Li, Fan Wu, Baowen Xu","doi":"10.1109/IWSC50091.2020.9047640","DOIUrl":"https://doi.org/10.1109/IWSC50091.2020.9047640","url":null,"abstract":"Code clones are identical or similar code segments. The wide existence of code clones can increase the cost of maintenance and jeopardise the quality of software. The research community has developed many techniques to detect code clones; however, there is little evidence of how these techniques may perform in industrial use cases. In this paper, we aim to uncover the differences when such techniques are applied in industrial use cases. We conducted large scale experimental research on the performance of two state-of-the-art code clone detection techniques, SourcererCC and AutoenCODE, on both open source projects and an industrial project written in the Scala language. Our results reveal that both algorithms perform differently on the industrial project, with the largest drop in precision being 30.7%, and the largest increase in recall being 32.4%. By having its developers manually label samples of the industrial project, we discovered that there are substantially fewer Type-3 clones in the aforementioned project than in the open source projects.","PeriodicalId":127830,"journal":{"name":"2020 IEEE 14th International Workshop on Software Clones (IWSC)","volume":"136 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123247075","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparison and Visualization of Code Clone Detection Results","authors":"Kazuki Matsushima, Katsuro Inoue","doi":"10.1109/IWSC50091.2020.9047633","DOIUrl":"https://doi.org/10.1109/IWSC50091.2020.9047633","url":null,"abstract":"Many techniques for code clone detection have been proposed and implemented as clone detectors in the past. These studies show that the result of code clone detection changes drastically across different tools and/or their detection parameters. Therefore, it is important to apply different clone detectors or different parameters and to identify the different or common parts of the obtained detection results. In this paper, we propose a method for comparison and visualization of detection results based on the correspondence of clone pairs. It enables developers to compare detection results obtained with different tools and/or detection parameters. Using this method, we show the comparison results for an OSS project using two code clone detectors, CCFinderX and NiCad.","PeriodicalId":127830,"journal":{"name":"2020 IEEE 14th International Workshop on Software Clones (IWSC)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128119085","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Blanker: A Refactor-Oriented Cloned Source Code Normalizer","authors":"Davide Pizzolotto, Katsuro Inoue","doi":"10.1109/IWSC50091.2020.9047634","DOIUrl":"https://doi.org/10.1109/IWSC50091.2020.9047634","url":null,"abstract":"Refactoring is widely practiced by developers and has become a key factor in increasing the maintainability of software. However, code clones pose a threat in any refactoring process because a developer must edit identical portions of code more than once. Despite extensive research on this topic, most of the results are focused on discovering type-3 and type-4 clones, which require a higher effort to be refactored and removed. In this paper we present our tool, Blanker, that searches and unifies equivalent statements available in the language before feeding the source to an existing code clone detector limited to type-2 clones. This step acts as a normalization step and produces refactorable results without the error introduced by potentially unrelated added statements (like in type-3 clones), that would be unsuitable for refactoring purposes, and with added flexibility compared to checking for identical code portions (like in type-2 clones). We used NiCad to detect clones before and after our normalization step and found up to 10% more type-2 clones after our normalization, all of them being refactor candidates.","PeriodicalId":127830,"journal":{"name":"2020 IEEE 14th International Workshop on Software Clones (IWSC)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114365278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving Syntactical Clone Detection Methods through the Use of an Intermediate Representation","authors":"Pedro M. Caldeira, Kazunori Sakamoto, H. Washizaki, Y. Fukazawa, Takahisa Shimada","doi":"10.1109/IWSC50091.2020.9047637","DOIUrl":"https://doi.org/10.1109/IWSC50091.2020.9047637","url":null,"abstract":"Detection of type-3 and type-4 clones remains a difficult task. Current methods are complex, both on a conceptual and computational level. Similarly, their usage requires substantial implementation efforts. Instead of creating yet another method, it might be more productive to combine the simplicity of syntactic approaches with the abstractions granted by intermediate representations (IR). To this end, we devised a C-like IR based on LLVM and ran NiCad on it (LLNiCad). To establish whether the clone detection capabilities of syntactic approaches can be improved through an IR, we compared NiCad and LLNiCad on three open source projects taken from Krutz's benchmark and a subset of Google Code Jam (GCJ) solutions. In our results, the F1-score of LLNiCad consistently outperforms NiCad. Indeed, for all clone types in Krutz's benchmark, LLNiCad has an F1-score that is 37% higher than NiCad, with both better precision and recall. For type-4 clones in our GCJ benchmark, the F1-score of LLNiCad also outperforms CCCD (a semantic clone detector) by 44%. These findings suggest that IRs are beneficial for improving clone detection and that they have a larger impact on type-3 and type-4 clones.","PeriodicalId":127830,"journal":{"name":"2020 IEEE 14th International Workshop on Software Clones (IWSC)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114695216","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CPPCD: A Token-Based Approach to Detecting Potential Clones","authors":"Yu-Liang Hung, Shingo Takada","doi":"10.1109/IWSC50091.2020.9047636","DOIUrl":"https://doi.org/10.1109/IWSC50091.2020.9047636","url":null,"abstract":"Most state-of-the-art clone detection approaches are aimed at finding clones accurately and/or efficiently. Yet, whether a code fragment is a clone often varies according to different people's perspectives and different clone detection tools. In this paper, we present CPPCD (CP-based Potential Clone Detection), a novel token-based approach to detecting potential clones. It generates CP (clone probability) values and CP distribution graphs for developers to decide if a method is a clone. We have evaluated our approach on large-scale software projects written in Java. Our experiments suggest that the majority of clones have CP values greater than or equal to 0.75 and that CPPCD is an accurate (with respect to Type-1, Type-2, and Type-3 clones), efficient, and scalable approach to detecting potential clones.","PeriodicalId":127830,"journal":{"name":"2020 IEEE 14th International Workshop on Software Clones (IWSC)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131194713","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proceedings of the 2020 IEEE 14th International Workshop on Software Clones","authors":"","doi":"10.1109/iwsc50091.2020.9047630","DOIUrl":"https://doi.org/10.1109/iwsc50091.2020.9047630","url":null,"abstract":"","PeriodicalId":127830,"journal":{"name":"2020 IEEE 14th International Workshop on Software Clones (IWSC)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134006620","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Empirical Study on Accidental Cross-Project Code Clones","authors":"Mitchel Pyl, B. V. Bladel, S. Demeyer","doi":"10.1109/IWSC50091.2020.9047641","DOIUrl":"https://doi.org/10.1109/IWSC50091.2020.9047641","url":null,"abstract":"Software clones are considered a code smell in software development. While most clones occur due to developers' copy-paste behaviour, some of them arise accidentally as a symptom of coding idioms. If such accidental clones occur across projects, then they may indicate a lack of abstraction in the underlying programming language or libraries. In this research, we study accidental cross-project clones from the perspective of missing abstraction. We discuss the six cases of frequent cross-project clones, three of them symptoms of missing language features (which have been resolved with the release of Java 7 and Java 12), and two of them symptoms of missing library features (which have not yet been addressed).","PeriodicalId":127830,"journal":{"name":"2020 IEEE 14th International Workshop on Software Clones (IWSC)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133095834","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SemanticCloneBench: A Semantic Code Clone Benchmark using Crowd-Source Knowledge","authors":"Farouq Al-Omari, C. Roy, Tonghao Chen","doi":"10.1109/IWSC50091.2020.9047643","DOIUrl":"https://doi.org/10.1109/IWSC50091.2020.9047643","url":null,"abstract":"Not only newly proposed code clone detection techniques but also existing techniques and tools need to be evaluated and compared. This evaluation process could be done by assessing the reported clones manually or by using benchmarks. The main limitations of available benchmarks include: they are restricted to one programming language; they have a limited number of clone pairs that are confined within the selected system(s); they require manual validation; they do not support all types of code clones. To overcome these limitations, we proposed a methodology to generate a wide range of semantic clone benchmark(s) for different programming languages with minimal human validation. Our technique is based on the knowledge provided by developers who participate in the crowd-sourced information website, Stack Overflow. We applied automatic filtering, selection and validation to the source code in Stack Overflow answers. Finally, we built a semantic code clone benchmark of 4000 clone pairs for the languages Java, C, C# and Python.","PeriodicalId":127830,"journal":{"name":"2020 IEEE 14th International Workshop on Software Clones (IWSC)","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130214816","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Semantic Clone Detection via Probabilistic Software Modeling","authors":"Hannes Thaller, L. Linsbauer, Alexander Egyed","doi":"10.1109/IWSC50091.2020.9047635","DOIUrl":"https://doi.org/10.1109/IWSC50091.2020.9047635","url":null,"abstract":"Semantic clones are program components with similar behavior, but different textual representation. Semantic similarity is hard to detect, and semantic clone detection is still an open issue. We present semantic clone detection via Probabilistic Software Modeling (PSM) as a robust method for detecting semantically equivalent methods. PSM inspects the structure and runtime behavior of a program and synthesizes a network of Probabilistic Models (PMs). Each PM in the network represents a method in the program and is capable of generating and evaluating runtime events. We leverage these capabilities to accurately find semantic clones. Results show that the approach can detect semantic clones in the complete absence of syntactic similarity with high precision and low error rates.","PeriodicalId":127830,"journal":{"name":"2020 IEEE 14th International Workshop on Software Clones (IWSC)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124899587","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}