{"title":"Multi-Agents Machine Learning (MML) System for Plagiarism Detection","authors":"Hadj Ahmed Bouarara","doi":"10.4018/IJATS.2016010101","PeriodicalId":93648,"journal":{"name":"International journal of agent technologies and systems","volume":"26 1","pages":"1-17"},"publicationDate":"2016-01-01","publicationTypes":"Journal Article","isOpenAccess":false,"citationCount":"4","platform":"Semanticscholar","PeriodicalName":"International journal of agent technologies and systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.4018/IJATS.2016010101"}
Multi-Agents Machine Learning (MML) System for Plagiarism Detection
Cases of plagiarism increase day after day and have become a crucial problem in the modern world, driven by the quantity of textual information available on the web. As data mining becomes the foundation for many different domains, one of its tasks is text categorization, which can be used to address the problem of automatic plagiarism detection. This chapter is devoted to a new approach for combating plagiarism named MML (Multi-agents Machine Learning system), composed of three modules: data preparation and digitalization, using character n-grams or bag of words as methods for text representation and TF*IDF as the weighting that calculates the importance of each term in the corpus in order to transform each document into a vector; and a learning and vote phase using three supervised learning algorithms (C4.5 decision tree, naïve Bayes, and support vector machine).
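The representation and vote steps named in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the function names (`char_ngrams`, `tfidf_vectors`, `majority_vote`), the trigram size, and the `log(N / df)` form of IDF are assumptions made for illustration, since the abstract does not specify them. The three learners (C4.5, naïve Bayes, SVM) are assumed to be trained elsewhere, so only the combination of their predicted labels is shown.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of a lower-cased text."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tfidf_vectors(docs, n=3):
    """Turn each document into a sparse {term: weight} vector using TF*IDF.

    TF is the raw count of a term in the document; IDF is log(N / df).
    This is one common variant, assumed here for illustration.
    """
    counts = [Counter(char_ngrams(d, n)) for d in docs]
    # Document frequency: number of documents containing each term.
    df = Counter(term for c in counts for term in c)
    n_docs = len(docs)
    return [
        {term: tf * math.log(n_docs / df[term]) for term, tf in c.items()}
        for c in counts
    ]


def majority_vote(labels):
    """Return the label predicted by most agents."""
    return Counter(labels).most_common(1)[0][0]


# Hypothetical toy corpus and hypothetical agent outputs.
docs = ["the cat sat", "the cat ran", "a dog barked"]
vectors = tfidf_vectors(docs)
decision = majority_vote(["plagiarized", "original", "plagiarized"])
```

With this weighting, a term that occurs in every document gets weight zero, while a term confined to a single document gets the largest IDF. Using an odd number of agents with two classes avoids ties in the vote.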