SemanticCloneBench: A Semantic Code Clone Benchmark using Crowd-Source Knowledge

2020 IEEE 14th International Workshop on Software Clones (IWSC). Pub Date: 2020-02-01. DOI: 10.1109/IWSC50091.2020.9047643

Farouq Al-Omari, C. Roy, Tonghao Chen

Citations: 7

Abstract

Both newly proposed and existing code clone detection techniques and tools need to be evaluated and compared. This evaluation can be done either by manually assessing the reported clones or by using benchmarks. Available benchmarks have four main limitations: they are restricted to one programming language; they contain a limited number of clone pairs, confined to the selected system(s); they require manual validation; and they do not support all types of code clones. To overcome these limitations, we propose a methodology for generating a wide range of semantic clone benchmarks for different programming languages with minimal human validation. Our technique is based on the knowledge provided by developers who participate in the crowd-sourced information website Stack Overflow. We apply automatic filtering, selection, and validation to the source code in Stack Overflow answers. Finally, we build a semantic code clone benchmark of 4,000 clone pairs for the languages Java, C, C#, and Python.
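
As an illustration of what evaluating a detector "by using benchmarks" can look like, the sketch below computes recall of a detector's reported clone pairs against a set of known clone pairs. It is a minimal sketch under stated assumptions, not the authors' tooling: the fragment identifiers, the `normalize` and `recall` helpers, and the sample data are all hypothetical, and the actual file layout of SemanticCloneBench is not described in the abstract.

```python
from typing import FrozenSet, Iterable, Set, Tuple

# A clone pair is unordered: (a, b) and (b, a) denote the same pair.
Pair = FrozenSet[str]


def normalize(pairs: Iterable[Tuple[str, str]]) -> Set[Pair]:
    """Convert ordered tuples into unordered pairs, dropping self-pairs."""
    return {frozenset(p) for p in pairs if p[0] != p[1]}


def recall(benchmark: Iterable[Tuple[str, str]],
           reported: Iterable[Tuple[str, str]]) -> float:
    """Fraction of benchmark clone pairs that the detector reported."""
    truth = normalize(benchmark)
    found = normalize(reported)
    if not truth:
        raise ValueError("benchmark contains no clone pairs")
    return len(truth & found) / len(truth)


# Hypothetical fragment identifiers, used only for illustration.
benchmark_pairs = [
    ("java/1_a", "java/1_b"),
    ("java/2_a", "java/2_b"),
    ("py/3_a", "py/3_b"),
    ("c/4_a", "c/4_b"),
]
reported_pairs = [
    ("java/1_b", "java/1_a"),  # same pair as in the benchmark, reversed order
    ("py/3_a", "py/3_b"),
    ("c/9_a", "c/9_b"),        # not in the benchmark, so it does not count
]

score = recall(benchmark_pairs, reported_pairs)
print(f"recall = {score:.2f}")
```

Recall is the natural measure here because a benchmark lists known true clone pairs; measuring precision would additionally require judging the reported pairs that fall outside the benchmark.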
