SPPlagiarise: A Tool for Generating Simulated Semantics-Preserving Plagiarism of Java Source Code

Hayden Cheers, Yuqing Lin, Shamus P. Smith
Published in: 2019 IEEE 10th International Conference on Software Engineering and Service Science (ICSESS)
Publication date: 2019-10-01
DOI: 10.1109/ICSESS47205.2019.9040853 (https://doi.org/10.1109/ICSESS47205.2019.9040853)
Citation count: 10

Abstract

Source code plagiarism is a common occurrence in undergraduate computer science education. Studies have indicated that at least 50% of students plagiarize during their undergraduate career. To identify cases of source code plagiarism, many source code plagiarism detection tools have been proposed. However, conclusively determining the effectiveness of these tools at identifying cases of source code plagiarism is difficult. Evaluations are typically performed using unreleased data sets. Without a comprehensive, publicly available data set for evaluating source code plagiarism detection, it is difficult to perform unbiased and reproducible evaluations of tools. To address this problem, this paper presents a tool, SPPlagiarise, which is designed to produce simulated source code plagiarism of Java source code. SPPlagiarise applies a random number of semantics-preserving source code obfuscations at random locations to a Java code base to simulate source code plagiarism. This paper presents the design of the tool and an evaluation of a generated plagiarism data set.
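
The idea of applying a random number of semantics-preserving obfuscations at random locations can be illustrated with a small sketch. This is not SPPlagiarise's implementation, which the abstract does not describe in detail. The class `ObfuscationSketch`, its methods, and the two toy edits (inserting a comment line, and rewriting a standalone `x++;` as `x += 1;`) are all hypothetical and chosen only for illustration. The sketch works on lines of text rather than a parsed program, so it assumes the input contains no text blocks or block comments that a line-level edit could disturb.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only; not the SPPlagiarise tool or its API.
final class ObfuscationSketch {

    /** Rewrites a standalone "x++;" statement as "x += 1;"; any other line is returned unchanged. */
    static String expandIncrement(String line) {
        return line.replaceAll("^(\\s*)(\\w+)\\+\\+;\\s*$", "$1$2 += 1;");
    }

    /**
     * Returns a copy of the source with between 1 and maxEdits edits applied,
     * each at a randomly chosen line. maxEdits must be at least 1.
     */
    static List<String> obfuscate(List<String> source, int maxEdits, Random rng) {
        List<String> out = new ArrayList<>(source);
        if (out.isEmpty()) {
            return out; // nothing to edit
        }
        int edits = 1 + rng.nextInt(maxEdits); // random number of obfuscations
        for (int i = 0; i < edits; i++) {
            int at = rng.nextInt(out.size()); // random location
            if (rng.nextBoolean()) {
                // Edit 1: insert a comment line before line `at`.
                out.add(at, "// reviewed");
            } else {
                // Edit 2: expand a post-increment statement, if the line is one.
                out.set(at, expandIncrement(out.get(at)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> original = List.of(
                "int count = 0;",
                "count++;",
                "System.out.println(count);");
        // A fixed seed makes a generated variant reproducible.
        for (String line : obfuscate(original, 4, new Random(42))) {
            System.out.println(line);
        }
    }
}
```

Seeding the random number generator is a design choice made here so that the same variant can be regenerated on demand. A realistic generator would operate on a syntax tree rather than on raw lines, so that each edit can be checked for applicability before it is made.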