One Pass Preprocessing for Token-Based Source Code Clone Detection

Dingkun Li, Minghao Piao, H. Shon, K. Ryu, Incheon Paik
2014 IEEE 6th International Conference on Awareness Science and Technology (iCAST), published 2014-12-11
DOI: 10.1109/ICAWST.2014.6981824 (https://doi.org/10.1109/ICAWST.2014.6981824)
Citations: 3

Abstract

Token-based source code clone detection is a promising way to detect source code duplication and redundancy. Preprocessing plays an important role in clone detection, as it does in knowledge discovery in databases (KDD) generally, because it shapes all further processing: as the old saying goes, well begun is half done. However, processing the unstructured source code files of large software systems is challenging and costly in both time and space. This paper introduces a novel way to clean, tokenize, and transform source code into a form appropriate for mining. A tool called OPP (One Pass Preprocessor) was developed to preprocess source code files efficiently and flexibly. Experiments on three large open-source projects written in different host languages, Wildfly1.02, Linux core-3.6, and VTK, showed that the tool preprocesses source code files with considerable power and flexibility and produces high-quality output.
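The abstract does not spell out how OPP works internally. As a rough illustration of what a single clean/tokenize/transform pass for token-based clone detection can look like, here is a minimal Python sketch. It is not the OPP implementation: the token classes, the keyword set, the `ID`/`LIT` placeholders, and the helper names `preprocess` and `token_windows` are all hypothetical, and the sketch assumes a C-like host language.

```python
import re
from typing import List, Tuple

# Hypothetical token rules for a C-like language (assumption, not from the paper).
# Order matters: comments are tried before operators so "//" is not read as "/".
TOKEN_SPEC = [
    ("comment", r"//[^\n]*|/\*[\s\S]*?\*/"),
    ("string", r'"(?:\\.|[^"\\\n])*"' + r"|'(?:\\.|[^'\\\n])*'"),
    ("number", r"\d+(?:\.\d+)?"),
    ("ident", r"[A-Za-z_]\w*"),
    ("op", r"==|!=|<=|>=|&&|\|\||\+\+|--|->|[-+*/%=<>!&|^~?:;,.(){}\[\]]"),
    ("ws", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

# Hypothetical keyword subset; keywords are kept verbatim, other identifiers are not.
KEYWORDS = {"if", "else", "for", "while", "do", "return", "int", "char",
            "float", "double", "void", "struct", "const", "static"}


def preprocess(source: str) -> List[str]:
    """Clean, tokenize, and normalize source text in one left-to-right scan."""
    tokens: List[str] = []
    # finditer silently skips characters no rule matches (e.g. '#' or '@').
    for m in TOKEN_RE.finditer(source):
        kind, text = m.lastgroup, m.group()
        if kind in ("comment", "ws"):
            continue                      # cleaning: drop comments and layout
        if kind == "ident":
            tokens.append(text if text in KEYWORDS else "ID")  # abstract names
        elif kind in ("string", "number"):
            tokens.append("LIT")          # abstract literal values
        else:
            tokens.append(text)           # operators and punctuation as-is
    return tokens


def token_windows(tokens: List[str], k: int = 4) -> List[Tuple[str, ...]]:
    """Transform a token stream into overlapping k-token windows for mining."""
    return [tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)]


if __name__ == "__main__":
    print(preprocess("int x = 42; // set x\nreturn x + y;"))
    print(token_windows(preprocess("a = b + 1;"), k=3))
```

Because identifiers and literals are normalized, `a = b + 1;` and `total = count + 7; /* renamed */` produce the same token stream, which is what allows a token-based detector to flag such renamed copies as clones. Using one compiled regular expression means each file is cleaned and tokenized in a single left-to-right scan; the window size `k` is an arbitrary illustrative choice.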