SemanticCloneBench:使用众源知识的语义代码克隆基准

Farouq Al-Omari, C. Roy, Tonghao Chen
{"title":"SemanticCloneBench:使用众源知识的语义代码克隆基准","authors":"Farouq Al-Omari, C. Roy, Tonghao Chen","doi":"10.1109/IWSC50091.2020.9047643","DOIUrl":null,"url":null,"abstract":"Not only do newly proposed code clone detection techniques, but existing techniques and tools also need to be evaluated and compared. This evaluation process could be done by assessing the reported clones manually or by using benchmarks. The main limitations of available benchmarks include: they are restricted to one programming language; they have a limited number of clone pairs that are confined within the selected system(s); they require manual validation; they do not support all types of code clones. To overcome these limitations, we proposed a methodology to generate a wide range of semantic clone benchmark(s) for different programming languages with minimal human validation. Our technique is based on the knowledge provided by developers who participate in the crowd-sourced information website, Stack Overflow. We applied automatic filtering, selection and validation to the source code in Stack Overflow answers. Finally, we build a semantic code clone benchmark of 4000 clones pairs for the languages Java, C, C# and Python.","PeriodicalId":127830,"journal":{"name":"2020 IEEE 14th International Workshop on Software Clones (IWSC)","volume":"67 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"SemanticCloneBench: A Semantic Code Clone Benchmark using Crowd-Source Knowledge\",\"authors\":\"Farouq Al-Omari, C. Roy, Tonghao Chen\",\"doi\":\"10.1109/IWSC50091.2020.9047643\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Not only do newly proposed code clone detection techniques, but existing techniques and tools also need to be evaluated and compared. This evaluation process could be done by assessing the reported clones manually or by using benchmarks. The main limitations of available benchmarks include: they are restricted to one programming language; they have a limited number of clone pairs that are confined within the selected system(s); they require manual validation; they do not support all types of code clones. To overcome these limitations, we proposed a methodology to generate a wide range of semantic clone benchmark(s) for different programming languages with minimal human validation. Our technique is based on the knowledge provided by developers who participate in the crowd-sourced information website, Stack Overflow. We applied automatic filtering, selection and validation to the source code in Stack Overflow answers. Finally, we build a semantic code clone benchmark of 4000 clones pairs for the languages Java, C, C# and Python.\",\"PeriodicalId\":127830,\"journal\":{\"name\":\"2020 IEEE 14th International Workshop on Software Clones (IWSC)\",\"volume\":\"67 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-02-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 IEEE 14th International Workshop on Software Clones (IWSC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IWSC50091.2020.9047643\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE 14th International Workshop on Software Clones (IWSC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IWSC50091.2020.9047643","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7

摘要

不仅需要对新提出的代码克隆检测技术进行评估和比较,还需要对现有的技术和工具进行评估和比较。此评估过程可以通过手动评估报告的克隆或使用基准来完成。可用基准测试的主要限制包括:它们仅限于一种编程语言;它们在选定的系统中有有限数量的克隆对;它们需要手动验证;它们不支持所有类型的代码克隆。为了克服这些限制,我们提出了一种方法,以最少的人工验证为不同的编程语言生成广泛的语义克隆基准。我们的技术是基于参与众包信息网站Stack Overflow的开发人员提供的知识。我们对Stack Overflow答案中的源代码应用了自动过滤、选择和验证。最后,我们为Java、C、c#和Python语言构建了一个包含4000个克隆对的语义代码克隆基准。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
SemanticCloneBench: A Semantic Code Clone Benchmark using Crowd-Source Knowledge
Not only do newly proposed code clone detection techniques, but existing techniques and tools also need to be evaluated and compared. This evaluation process could be done by assessing the reported clones manually or by using benchmarks. The main limitations of available benchmarks include: they are restricted to one programming language; they have a limited number of clone pairs that are confined within the selected system(s); they require manual validation; they do not support all types of code clones. To overcome these limitations, we proposed a methodology to generate a wide range of semantic clone benchmark(s) for different programming languages with minimal human validation. Our technique is based on the knowledge provided by developers who participate in the crowd-sourced information website, Stack Overflow. We applied automatic filtering, selection and validation to the source code in Stack Overflow answers. Finally, we build a semantic code clone benchmark of 4000 clones pairs for the languages Java, C, C# and Python.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信