Hadj Ahmed Bouarara, Amine Rahmani, R. M. Hamou, Abdelmalek Amine
Title: Machine learning tool and meta-heuristic based on genetic algorithms for plagiarism detection over mail service
Published in: 2014 IEEE/ACIS 13th International Conference on Computer and Information Science (ICIS)
Publication date: 2014-06-04
DOI: 10.1109/ICIS.2014.6912125 (https://doi.org/10.1109/ICIS.2014.6912125)
Citations: 11
Abstract
Plagiarism is one of the most topical problems that computer science is trying to solve. In this article we present a new approach to automatic plagiarism detection for mail services. Our system represents texts as character n-grams and uses tf-idf weighting to measure the importance of each term in the corpus. We combine several machine learning methods to decide whether a document is plagiarized, and we use the PAN 09 corpus to build and evaluate the prediction model. We then simulate a meta-heuristic method based on genetic algorithms, varying its parameters, to determine whether it can improve the results. The main objective of our work is to protect intellectual property and to improve the efficiency of plagiarism detection systems.
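To make the text representation concrete, the following is a minimal sketch of character n-gram extraction with tf-idf weighting. It is not the authors' implementation: the function names, the choice of n = 3, the unsmoothed log(N/df) form of idf, and the cosine comparison at the end are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of a lowercased text."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tfidf_vectors(docs, n=3):
    """Map each document to a sparse vector {ngram: tf * idf}.

    tf is the relative frequency of the n-gram in the document;
    idf is log(N / df), an assumed variant (no smoothing).
    """
    counts = [Counter(char_ngrams(doc, n)) for doc in docs]
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())  # each document counts once per n-gram
    num_docs = len(docs)
    vectors = []
    for c in counts:
        total = sum(c.values()) or 1  # avoid division by zero on short texts
        vectors.append({
            gram: (freq / total) * math.log(num_docs / doc_freq[gram])
            for gram, freq in c.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors (assumed comparison step)."""
    dot = sum(w * v.get(gram, 0.0) for gram, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

In a pipeline like the one the abstract describes, vectors of this kind would be the input features for the machine learning classifiers; the cosine function is included only to show that near-duplicate texts end up with similar vectors.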