Authors: Muna Alsallal, R. Iqbal, S. Amin, Anne E. James
Venue: 2013 Sixth International Conference on Developments in eSystems Engineering
Published: 2013-12-16
DOI: 10.1109/DeSE.2013.34 (https://doi.org/10.1109/DeSE.2013.34)
Intrinsic Plagiarism Detection Using Latent Semantic Indexing and Stylometry
Plagiarism has grown steadily over the last few years, driven by the rapid proliferation of information through the World Wide Web (WWW). In this paper, we present an integrated approach to intrinsic plagiarism detection based on Latent Semantic Indexing (LSI) and stylometry. LSI is applied to the term-document matrix of the dataset, whereas stylometry is used to approximate human writing style intrinsically. We conducted a series of experiments on a small corpus to investigate the dimensionality reduction (DR) parameter, which is at the core of the LSI technique, and to gain insight into its effects. We then carried out a comparative evaluation of our approach, applying LSI and stylometry separately and then together. Our results show that performance improved when LSI and stylometry were applied together as an integrated approach.
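To make the two ingredients concrete, the following is a minimal sketch and not the authors' implementation. It assumes the document has already been split into segments, builds a raw-count term-document matrix, and obtains a rank-k LSI representation from the eigenpairs of the Gram matrix by power iteration with deflation. Here `k` plays the role of the DR parameter. The three stylometric markers (mean word length, mean sentence length, type-token ratio) are an illustrative choice, since the abstract does not list the features used, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def term_document_matrix(segments):
    """Rows are terms, columns are segments; entries are raw term counts."""
    counts = [Counter(tokenize(s)) for s in segments]
    vocab = sorted(set().union(*counts))
    return [[c[t] for c in counts] for t in vocab]


def top_eigenpair(G, iters=200):
    """Power iteration for the dominant eigenpair of a symmetric PSD matrix."""
    n = len(G)
    x = [1.0 + i / n for i in range(n)]  # fixed start vector keeps runs repeatable
    for _ in range(iters):
        y = [sum(G[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        if norm < 1e-12:  # nothing left to extract (k exceeds the rank)
            return 0.0, [0.0] * n
        x = [v / norm for v in y]
    lam = sum(x[i] * G[i][j] * x[j] for i in range(n) for j in range(n))
    return lam, x


def lsi_coordinates(segments, k):
    """Segment coordinates in a k-dimensional latent space (rows of V_k * Sigma_k)."""
    A = term_document_matrix(segments)
    n = len(segments)
    # Gram matrix G = A^T A; its eigenpairs give the right singular vectors of A.
    G = [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]
    coords = [[] for _ in range(n)]
    for _ in range(k):
        lam, v = top_eigenpair(G)
        sigma = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i].append(sigma * v[i])
        # Deflate so the next pass finds the next-largest eigenpair.
        G = [[G[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def style_features(segment):
    """Three illustrative stylometric markers (assumed, not the paper's feature set)."""
    words = tokenize(segment)
    sentences = [s for s in re.split(r"[.!?]+", segment) if s.strip()]
    n_words = max(len(words), 1)
    return [
        sum(len(w) for w in words) / n_words,  # mean word length
        len(words) / max(len(sentences), 1),   # mean sentence length in words
        len(set(words)) / n_words,             # type-token ratio
    ]


# Toy segments: the first two share vocabulary, the third does not.
segments = ["cat dog cat dog", "dog cat dog cat", "stock market price"]
coords = lsi_coordinates(segments, k=2)
```

In this toy input, the first two segments end up with a cosine similarity near 1 in the latent space, while the third is close to orthogonal to both. A segment whose latent coordinates or style features sit far from the rest of the document would be a candidate for closer inspection. How the two signals are weighted against each other is left open here, because the abstract does not specify it.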