Go-clone:基于图嵌入的Golang克隆检测器

Cong Wang, Jian Gao, Yu Jiang, Zhenchang Xing, Huafeng Zhang, Weiliang Yin, M. Gu, Jiaguang Sun
{"title":"Go-clone:基于图嵌入的Golang克隆检测器","authors":"Cong Wang, Jian Gao, Yu Jiang, Zhenchang Xing, Huafeng Zhang, Weiliang Yin, M. Gu, Jiaguang Sun","doi":"10.1145/3293882.3338996","DOIUrl":null,"url":null,"abstract":"Golang (short for Go programming language) is a fast and compiled language, which has been increasingly used in industry due to its excellent performance on concurrent programming. Golang redefines concurrent programming grammar, making it a challenge for traditional clone detection tools and techniques. However, there exist few tools for detecting duplicates or copy-paste related bugs in Golang. Therefore, an effective and efficient code clone detector on Golang is especially needed. In this paper, we present Go-Clone, a learning-based clone detector for Golang. Go-Clone contains two modules -- the training module and the user interaction module. In the training module, firstly we parse Golang source code into llvm IR (Intermediate Representation). Secondly, we calculate LSFG (labeled semantic flow graph) for each program function automatically. Go-Clone trains a deep neural network model to encode LSFGs for similarity classification. In the user interaction module, users can choose one or more Golang projects. Go-Clone identifies and presents a list of function pairs, which are most likely clone code for user inspection. To evaluate Go-Clone's performance, we collect 6,110 commit versions from 48 Github projects to construct a Golang clone detection data set. Go-Clone can reach the value of AUC (Area Under Curve) and ACC (Accuracy) for 89.61% and 83.80% in clone detection. By testing several groups of unfamiliar data, we also demonstrates the generility of Go-Clone. The address of the abstract demo video: https://youtu.be/o5DogtYGbeo","PeriodicalId":20624,"journal":{"name":"Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"6 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2019-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"Go-clone: graph-embedding based clone detector for Golang\",\"authors\":\"Cong Wang, Jian Gao, Yu Jiang, Zhenchang Xing, Huafeng Zhang, Weiliang Yin, M. Gu, Jiaguang Sun\",\"doi\":\"10.1145/3293882.3338996\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Golang (short for Go programming language) is a fast and compiled language, which has been increasingly used in industry due to its excellent performance on concurrent programming. Golang redefines concurrent programming grammar, making it a challenge for traditional clone detection tools and techniques. However, there exist few tools for detecting duplicates or copy-paste related bugs in Golang. Therefore, an effective and efficient code clone detector on Golang is especially needed. In this paper, we present Go-Clone, a learning-based clone detector for Golang. Go-Clone contains two modules -- the training module and the user interaction module. In the training module, firstly we parse Golang source code into llvm IR (Intermediate Representation). Secondly, we calculate LSFG (labeled semantic flow graph) for each program function automatically. Go-Clone trains a deep neural network model to encode LSFGs for similarity classification. In the user interaction module, users can choose one or more Golang projects. Go-Clone identifies and presents a list of function pairs, which are most likely clone code for user inspection. To evaluate Go-Clone's performance, we collect 6,110 commit versions from 48 Github projects to construct a Golang clone detection data set. Go-Clone can reach the value of AUC (Area Under Curve) and ACC (Accuracy) for 89.61% and 83.80% in clone detection. By testing several groups of unfamiliar data, we also demonstrates the generility of Go-Clone. The address of the abstract demo video: https://youtu.be/o5DogtYGbeo\",\"PeriodicalId\":20624,\"journal\":{\"name\":\"Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis\",\"volume\":\"6 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-07-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3293882.3338996\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3293882.3338996","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 6

摘要

Golang (Go编程语言的简称)是一种快速编译的语言,由于其在并发编程方面的优异性能,在工业中得到了越来越多的应用。Golang重新定义了并发编程语法,这对传统的克隆检测工具和技术来说是一个挑战。然而,在Golang中很少有工具可以检测复制或复制粘贴相关的错误。因此,在Golang上开发一种高效的代码克隆检测器是非常必要的。在本文中,我们提出了Go-Clone,一个基于学习的Golang克隆检测器。Go-Clone包含两个模块——培训模块和用户交互模块。在训练模块中,我们首先将Golang源代码解析为llvm IR (Intermediate Representation)。其次,我们自动计算每个程序函数的标记语义流图(LSFG)。Go-Clone训练一个深度神经网络模型来编码LSFGs进行相似性分类。在用户交互模块中,用户可以选择一个或多个Golang项目。Go-Clone识别并呈现一个函数对列表,这些函数对最可能是用于用户检查的克隆代码。为了评估Go-Clone的性能,我们从48个Github项目中收集了6110个提交版本来构建Golang克隆检测数据集。Go-Clone在克隆检测中的AUC (Area Under Curve)和ACC (Accuracy)分别达到89.61%和83.80%。通过测试几组不熟悉的数据,我们也证明了Go-Clone的通用性。抽象演示视频的地址:https://youtu.be/o5DogtYGbeo
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Go-clone: graph-embedding based clone detector for Golang
Golang (short for Go programming language) is a fast and compiled language, which has been increasingly used in industry due to its excellent performance on concurrent programming. Golang redefines concurrent programming grammar, making it a challenge for traditional clone detection tools and techniques. However, there exist few tools for detecting duplicates or copy-paste related bugs in Golang. Therefore, an effective and efficient code clone detector on Golang is especially needed. In this paper, we present Go-Clone, a learning-based clone detector for Golang. Go-Clone contains two modules -- the training module and the user interaction module. In the training module, firstly we parse Golang source code into llvm IR (Intermediate Representation). Secondly, we calculate LSFG (labeled semantic flow graph) for each program function automatically. Go-Clone trains a deep neural network model to encode LSFGs for similarity classification. In the user interaction module, users can choose one or more Golang projects. Go-Clone identifies and presents a list of function pairs, which are most likely clone code for user inspection. To evaluate Go-Clone's performance, we collect 6,110 commit versions from 48 Github projects to construct a Golang clone detection data set. Go-Clone can reach the value of AUC (Area Under Curve) and ACC (Accuracy) for 89.61% and 83.80% in clone detection. By testing several groups of unfamiliar data, we also demonstrates the generility of Go-Clone. The address of the abstract demo video: https://youtu.be/o5DogtYGbeo
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信