{"title":"Text Classification of Network Pyramid Scheme based on Topic Model","authors":"Pengyu Mu, Jingsha He, Nafei Zhu","doi":"10.1145/3342827.3342835","DOIUrl":null,"url":null,"abstract":"At present, the network pyramid scheme has become a major tumor that hinders social development. In order to curb the propagation of the network pyramid scheme and effectively identify the pyramid scheme text in the network, this study proposes a joint topic model, Paragraph Vector Latent Dirichlet Allocation (PV_LDA), based on the characteristics of high-yield, high rebate, hierarchical salary and text topic diversity described in the text. The model uses the paragraph as the minimum processing unit to generate the topic distribution matrix of \"high-interest rate\" and \"hierarchical salary\" from the network pyramid scheme text. The Gibbs sampling is used to derive the \"pyramid scheme\" topic distribution matrix represented by the two features, which is used for classification processing by the classifier. the classification accuracy rate for the network pyramid scheme text can reach 86.25%. The conclusions show that the topic model proposed in this paper can capture the characteristics of the pyramid scheme more reasonably.","PeriodicalId":254461,"journal":{"name":"Proceedings of the 2019 3rd International Conference on Natural Language Processing and Information Retrieval","volume":"3 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2019 3rd International Conference on Natural Language Processing and Information Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3342827.3342835","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
At present, the network pyramid scheme has become a major tumor that hinders social development. In order to curb the propagation of the network pyramid scheme and effectively identify the pyramid scheme text in the network, this study proposes a joint topic model, Paragraph Vector Latent Dirichlet Allocation (PV_LDA), based on the characteristics of high-yield, high rebate, hierarchical salary and text topic diversity described in the text. The model uses the paragraph as the minimum processing unit to generate the topic distribution matrix of "high-interest rate" and "hierarchical salary" from the network pyramid scheme text. The Gibbs sampling is used to derive the "pyramid scheme" topic distribution matrix represented by the two features, which is used for classification processing by the classifier. the classification accuracy rate for the network pyramid scheme text can reach 86.25%. The conclusions show that the topic model proposed in this paper can capture the characteristics of the pyramid scheme more reasonably.