FPSS: Fingerprint-based semantic similarity detection in big data environment
M. Elhoseny, M. Zaher, A. Shehab, A. Hassanien
2017 Eighth International Conference on Intelligent Computing and Information Systems (ICICIS), 2017-12-01
DOI: 10.1109/INTELCIS.2017.8260066 (https://doi.org/10.1109/INTELCIS.2017.8260066)
Citations: 1
Abstract
Plagiarism predates the internet, but the free and easy availability of electronic papers online has made the problem both more complex and more widespread. Many systems exist for detecting plagiarism in natural-language documents. Unlike Latin script, however, Arabic script writes the same letter in three different ways depending on its position in the word, and this complexity makes building such a system for Arabic documents a significant challenge. Accordingly, this paper presents a fingerprint-based semantic similarity detection system, called FPSS, to detect plagiarism in Arabic documents. FPSS generates a digital fingerprint (df) for each sentence and compares all the df values. It also analyzes the corresponding detection schemes in order to detect semantic similarity effectively. FPSS improves effectiveness in terms of the matched similarity ratio, precision, recall, F-measure, plagdet, and granularity.
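
The abstract does not say how a digital fingerprint is computed, so the following is only a minimal sketch of the general idea of sentence-level fingerprinting, not the FPSS method itself. It assumes a fingerprint is a SHA-1 hash of a normalized sentence, that positional letter forms are folded with Unicode NFKC normalization, and that the matched ratio is the share of suspicious sentences whose fingerprint also occurs in the source. All function names are hypothetical, and exact-hash matching captures only verbatim reuse after normalization, not semantic similarity.

```python
import hashlib
import re
import unicodedata

# Assumption: strip Arabic short-vowel marks (U+064B..U+0652) and tatweel (U+0640).
_MARKS = re.compile(r"[\u064B-\u0652\u0640]")


def normalize(sentence: str) -> str:
    """Fold presentation forms, drop diacritics, lowercase, collapse whitespace."""
    # NFKC maps positional glyphs (isolated/initial/medial/final) to the base letter.
    text = unicodedata.normalize("NFKC", sentence)
    text = _MARKS.sub("", text)
    return " ".join(text.lower().split())


def split_sentences(text: str) -> list:
    """Naive splitter on '.', '!', '?' and the Arabic question mark (U+061F)."""
    parts = re.split(r"[.!?\u061F]+", text)
    return [p.strip() for p in parts if p.strip()]


def fingerprint(sentence: str) -> str:
    """Hypothetical df: SHA-1 hex digest of the normalized sentence."""
    return hashlib.sha1(normalize(sentence).encode("utf-8")).hexdigest()


def matched_ratio(suspicious: str, source: str) -> float:
    """Fraction of suspicious sentences whose fingerprint appears in the source."""
    source_dfs = {fingerprint(s) for s in split_sentences(source)}
    sentences = split_sentences(suspicious)
    if not sentences:
        return 0.0
    hits = sum(1 for s in sentences if fingerprint(s) in source_dfs)
    return hits / len(sentences)


if __name__ == "__main__":
    src = "The cat sat. Dogs bark loudly. Rain falls."
    sus = "the  CAT sat. Birds sing."
    print(matched_ratio(sus, src))
```

Storing the source fingerprints in a set keeps each lookup constant-time on average, which is one reason hashing suits large collections. Detecting paraphrase would need a different fingerprint, for example one built from word n-grams or synonym-normalized tokens.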