Improving Syntactical Clone Detection Methods through the Use of an Intermediate Representation

Pedro M. Caldeira, Kazunori Sakamoto, H. Washizaki, Y. Fukazawa, Takahisa Shimada
Published in: 2020 IEEE 14th International Workshop on Software Clones (IWSC), 2020-02-01
DOI: 10.1109/IWSC50091.2020.9047637 (https://doi.org/10.1109/IWSC50091.2020.9047637)
Citations: 10

Abstract

Detection of type-3 and type-4 clones remains a difficult task. Current methods are complex on both a conceptual and a computational level, and using them requires substantial implementation effort. Instead of creating yet another method, it might be more productive to combine the simplicity of syntactic approaches with the abstractions granted by intermediate representations (IRs). To this end, we devised a C-like IR based on LLVM and ran NiCad on it (LLNiCad). To establish whether the clone detection capabilities of syntactic approaches can be improved through an IR, we compared NiCad and LLNiCad on three open-source projects taken from Krutz's benchmark and on a subset of Google Code Jam (GCJ) solutions. In our results, the F1-score of LLNiCad is consistently higher than that of NiCad. Indeed, for all clone types in Krutz's benchmark, LLNiCad has an F1-score that is 37% higher than NiCad's, with both better precision and better recall. For type-4 clones in our GCJ benchmark, the F1-score of LLNiCad also exceeds that of CCCD (a semantic clone detector) by 44%. These findings suggest that IRs are beneficial for improving clone detection and that they have a larger impact on type-3 and type-4 clones.
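The abstract compares detectors by precision, recall, and F1-score without stating how these are computed. The following is a minimal sketch using the standard definitions, assuming the paper uses them in the usual way; the function name and the example counts are hypothetical and are not taken from the paper.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from counts of true positives,
    false positives, and false negatives among reported clone pairs."""
    # Precision: fraction of reported clone pairs that are real clones.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: fraction of real clone pairs that were reported.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts for illustration only; not results from the paper.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Because F1 is a harmonic mean, a detector scores well only when precision and recall are both reasonably high, which is why the abstract reports that LLNiCad improves on both.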