Source Code Plagiarism Detection: A Machine Intelligence Approach
Akhil Eppa, Anirudh Murali
2022 IEEE Fourth International Conference on Advances in Electronics, Computers and Communications (ICAECC), published 2022-01-10
DOI: 10.1109/ICAECC54045.2022.9716671 (https://doi.org/10.1109/ICAECC54045.2022.9716671)
Plagiarism is the act of taking someone else's work and claiming it as your own, and it is a concern in every field of education. Various tools exist to detect plagiarism and help maintain the necessary integrity. This paper addresses plagiarism in one specific category: C programming assignments. Several machine learning and deep learning methods are investigated in detail, along with their pros and cons. K-nearest neighbors (KNN), support vector machines (SVM), decision trees, recurrent neural networks (RNNs), and attention-based transformer networks are tested for their effectiveness in detecting plagiarism in source code. A comprehensive dataset of code pairs was prepared during the course of this research. The results show that machine learning and deep learning methods detect plagiarism more accurately than current state-of-the-art plagiarism detectors that use a text-based approach. A tool that makes the resulting software available for detecting plagiarism in source code is also presented.
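To make the idea of classifying code pairs concrete, the following is a minimal sketch in Python and is not the authors' implementation. It assumes a simple regex tokenizer for C-like source, two hand-picked pair features (Jaccard similarity over normalized token trigrams and a token-length ratio), and a tiny k-nearest-neighbors vote. All function names, the features, and the labeled training vectors are hypothetical and chosen only for illustration; the abstract does not describe the features or data the paper actually uses.

```python
import re
from collections import Counter

# Hypothetical, deliberately incomplete keyword list for C-like source.
C_KEYWORDS = {"int", "float", "char", "double", "void", "return", "if",
              "else", "for", "while", "do", "break", "continue", "struct"}

# Identifiers/keywords, integer literals, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def normalize(source):
    """Tokenize source; replace identifiers with ID and integers with NUM."""
    tokens = []
    for tok in TOKEN_RE.findall(source):
        if tok in C_KEYWORDS:
            tokens.append(tok)
        elif re.match(r"[A-Za-z_]", tok):
            tokens.append("ID")      # renaming variables no longer matters
        elif tok.isdigit():
            tokens.append("NUM")
        else:
            tokens.append(tok)
    return tokens


def ngrams(tokens, n=3):
    """Return the set of token n-grams."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def pair_features(src_a, src_b):
    """Assumed feature vector for a code pair: (trigram overlap, length ratio)."""
    ta, tb = normalize(src_a), normalize(src_b)
    overlap = jaccard(ngrams(ta), ngrams(tb))
    length_ratio = min(len(ta), len(tb)) / max(len(ta), len(tb), 1)
    return (overlap, length_ratio)


def knn_predict(train, query, k=3):
    """Majority vote among the k training points closest to query."""
    def sq_dist(item):
        return sum((x - y) ** 2 for x, y in zip(item[0], query))
    nearest = sorted(train, key=sq_dist)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labeled feature vectors; a real system would compute these
# from a labeled dataset of code pairs.
TOY_TRAIN = [
    ((0.95, 0.98), "plagiarized"),
    ((0.90, 0.92), "plagiarized"),
    ((0.85, 0.80), "plagiarized"),
    ((0.10, 0.50), "original"),
    ((0.20, 0.70), "original"),
    ((0.05, 0.30), "original"),
]

if __name__ == "__main__":
    a = "int sum(int a, int b) { return a + b; }"
    b = "int add(int x, int y) { return x + y; }"
    print(pair_features(a, b), knn_predict(TOY_TRAIN, pair_features(a, b)))
```

Because identifiers are collapsed to a single placeholder, a copy that only renames variables and functions produces the same token stream as its source, which is one kind of disguise a purely text-based comparison can miss. An SVM or decision tree could be trained on the same kind of feature vectors in place of the k-NN vote.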