Performance Evaluation of Source Code Plagiarism Detection System

Published in: 4th International Conference on Communication Engineering and Computer Science (CIC-COCOS’2022)
DOI: https://doi.org/10.24086/cocos2022/paper.732
Citations: 0

Abstract
Plagiarism detection systems are particularly useful in the educational sector, where scientific publications and articles are common. Plagiarism occurs when someone replicates a piece of work without permission from, or citation of, the original creator. Advances in information and communication technologies (ICT) and the easy availability of scientific material on the internet have made plagiarism detection a top priority, and the broad availability of freeware text editors has made source code plagiarism especially difficult to detect. Several studies have already examined the plagiarism detection algorithms used in identification systems, including those for source code. This work proposes a strategy that combines TF-IDF transformations with a Random Forest classifier and reports an accuracy of 93.5 percent, which is high compared with previous strategies. The proposed system is implemented in Python.
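
The abstract does not say how source files are tokenized or how TF-IDF features are passed to the classifier, so the following is only a minimal sketch of the TF-IDF step, using the Python standard library. The tokenizer, the smoothed IDF formula, and the sample submissions are assumptions made for illustration. The cosine similarity at the end is a stand-in pairwise feature, not the paper's method, and the Random Forest stage is not shown.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer: identifiers, integer literals, single punctuation marks.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|[^\sA-Za-z0-9_]")


def tokenize(source):
    """Split source code into a flat list of tokens."""
    return TOKEN_RE.findall(source)


def tfidf_vectors(documents):
    """Return one L2-normalised sparse TF-IDF vector (a dict) per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each token appears.
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed IDF (an assumption): a token found in every document gets weight 1.
    idf = {tok: math.log((1 + n_docs) / (1 + df)) + 1.0
           for tok, df in doc_freq.items()}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vec = {tok: count * idf[tok] for tok, count in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({tok: w / norm for tok, w in vec.items()})
    return vectors


def cosine(u, v):
    """Dot product of two L2-normalised sparse vectors."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(tok, 0.0) for tok, w in u.items())


# Made-up sample submissions: an original, a copy with renamed variables,
# and an unrelated snippet.
submissions = [
    "def add(a, b):\n    return a + b\n",
    "def add(x, y):\n    return x + y\n",
    "for i in range(10):\n    print(i * i)\n",
]
vectors = tfidf_vectors(submissions)
sim_renamed = cosine(vectors[0], vectors[1])
sim_unrelated = cosine(vectors[0], vectors[2])
print(f"renamed copy: {sim_renamed:.3f}, unrelated: {sim_unrelated:.3f}")
```

On this toy input the renamed copy scores higher than the unrelated snippet, because the two versions still share keywords and punctuation. In a pipeline like the one the abstract describes, vectors or pairwise features of this kind would be the input to a trained classifier, which would need labelled plagiarised and non-plagiarised examples.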