Chukwuka Victor Obionwu, R. Kumar, Suhas Shantharam, David Broneske, Gunter Saake
Semantic Relatedness: A Strategy for Plagiarism Detection in SQL Assignments
2023 6th World Conference on Computing and Communication Technologies (WCCCT), published 2023-01-06
DOI: https://doi.org/10.1109/WCCCT56755.2023.10052438
Citation count: 1
Abstract
The Structured Query Language (SQL) is the de facto language for defining and manipulating data in a relational database. Its mastery is therefore important for students in computer science and related disciplines, and most universities offer several courses that enable students to acquire SQL skills. However, this objective is undermined by code plagiarism, a major problem affecting the academic community. While plagiarism in other languages is detectable, detecting copied code in SQL is difficult because most queries are relatively similar to one another, which makes plagiarism detection strategies ineffective when the objects are SQL queries. Research in natural language processing has produced several strategies that facilitate complex evaluation of text strings. In this endeavour, we leverage semantic similarity, a method that evaluates the semantic textual similarity between text strings, together with the idea of distance between words and the likeness of their meaning, to detect plagiarised SQL queries by semantically evaluating raw student query submissions from our SQL courses, which are offered every semester. Results show that the semantic similarity strategy was able to detect code similarity, which translated to plagiarism in a considerable number of submissions. In this paper, we describe our plagiarism detection strategy, its limitations, and possible means of addressing these limitations.
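The idea of scoring raw query submissions against one another can be illustrated with a minimal sketch. The abstract does not specify the similarity model, so this example swaps in a plain bag-of-tokens cosine similarity as a simple lexical stand-in for a semantic measure. The function names, the sample submissions, and the 0.9 threshold are all hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(query: str) -> list[str]:
    """Lowercase a raw SQL string and split it into identifier, number, and symbol tokens."""
    return re.findall(r"[a-z_][a-z0-9_]*|\d+|[^\s\w]", query.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the token-count vectors of two queries."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    sq_a = sum(c * c for c in va.values())
    sq_b = sum(c * c for c in vb.values())
    norm = math.sqrt(sq_a * sq_b)
    return dot / norm if norm else 0.0


def flag_similar_pairs(submissions: dict[str, str], threshold: float = 0.9):
    """Return (student_a, student_b, score) for each pair scoring at or above the threshold."""
    flagged = []
    for (sa, qa), (sb, qb) in combinations(submissions.items(), 2):
        score = cosine_similarity(qa, qb)
        if score >= threshold:
            flagged.append((sa, sb, round(score, 3)))
    return flagged


# Hypothetical submissions: student_b differs from student_a only in case and spacing.
submissions = {
    "student_a": "SELECT name, age FROM students WHERE age > 20 ORDER BY name;",
    "student_b": "select name,age from students where age>20 order by name ;",
    "student_c": "SELECT COUNT(*) FROM courses GROUP BY department;",
}
print(flag_similar_pairs(submissions))
```

With these sample inputs only the `student_a` and `student_b` pair is flagged, since their token sequences are identical once case and whitespace are ignored. Because correct answers to the same exercise tend to resemble one another, a threshold like this would need tuning per assignment, and a flagged pair is a prompt for manual review rather than proof of copying.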