{"title":"Source code similarity detection by using data mining methods","authors":"E. Stankov, M. Jovanov, A. Bogdanova","doi":"10.2498/iti.2013.0576","DOIUrl":null,"url":null,"abstract":"Programming courses at university and high school level, and competitions in informatics (programming), often require fast assessment of received solutions of the programming tasks. This problem is usually solved by use of automated systems that check the produced output for some test cases for every solution. In our paper we present a novel approach of representation of the programming codes as vectors, and use of these vectors in data mining analysis that could produce better assessment of the solutions. We present the results of cluster analysis that go up to 88% of correctly clustered items on average.","PeriodicalId":262789,"journal":{"name":"Proceedings of the ITI 2013 35th International Conference on Information Technology Interfaces","volume":"43 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ITI 2013 35th International Conference on Information Technology Interfaces","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2498/iti.2013.0576","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5
Abstract
Programming courses at university and high school level, and competitions in informatics (programming), often require fast assessment of received solutions of the programming tasks. This problem is usually solved by use of automated systems that check the produced output for some test cases for every solution. In our paper we present a novel approach of representation of the programming codes as vectors, and use of these vectors in data mining analysis that could produce better assessment of the solutions. We present the results of cluster analysis that go up to 88% of correctly clustered items on average.