Research on manifold nonnegative matrix decomposition algorithm for weakly supervised text classification
Weiqiang Xiao, Xiaoli Chai, Danmo Zhang
International Conference on Artificial Intelligence, Virtual Reality, and Visualization, 2023-03-01
DOI: 10.1117/12.2667485 (https://doi.org/10.1117/12.2667485)
Traditional text classifiers based on supervised learning require a large number of labeled documents. Labeling documents often requires a certain amount of expertise to ensure accuracy, which makes it time-consuming and costly. Dataless text classification therefore shows good promise: instead of labeled documents, it draws the supervision for the classification task from a small number of easily accessible label descriptions, i.e., seed words. However, because the seed word set is much smaller than the vocabulary of the documents, many documents contain no seed words at all, or contain seed words that are irrelevant to their category, which limits the effect of seed word supervision. The manifold assumption suggests that highly similar texts tend to belong to the same category, so we maintain a local neighborhood structure for each document and construct a manifold regularizer to spread the limited supervised information between similar documents. We propose a Laplacian Nonnegative Matrix Factorization (LapNMF) method that adds the seed word prior information and the document manifold to the nonnegative matrix factorization framework, and we solve the resulting problem with the block coordinate descent method. Experiments show that in most cases LapNMF performs better than current weakly supervised classification methods, which indicates that it is competitive.
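The abstract does not give the LapNMF objective, so the following is only a minimal sketch of the general idea, under several assumptions. It uses the standard graph-regularized NMF form, minimizing ||X - V U^T||_F^2 + lam * tr(V^T L V) with L = D - W, and it uses multiplicative updates rather than the block coordinate descent solver the paper names. The seed word prior is injected simply by initializing the document-class matrix from seed word counts, which is an illustrative choice and not necessarily the paper's formulation. All function names, parameters, and the toy data are hypothetical.

```python
import numpy as np

def knn_graph(X, k=2):
    """Symmetric 0/1 k-nearest-neighbor graph over the rows of X (cosine similarity)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.maximum(norms, 1e-12)
    S = Xn @ Xn.T
    np.fill_diagonal(S, -np.inf)              # a document is not its own neighbor
    n = X.shape[0]
    nearest = np.argsort(-S, axis=1)[:, :k]   # indices of the k most similar documents
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    return np.maximum(W, W.T)                 # symmetrize

def seed_init(X, seed_words, smooth=0.1):
    """Assumed prior: V0[d, c] = count of class-c seed words in document d, plus smoothing."""
    V0 = np.full((X.shape[0], len(seed_words)), smooth)
    for c, words in enumerate(seed_words):
        V0[:, c] += X[:, words].sum(axis=1)
    return V0

def graph_regularized_nmf(X, W, V0, lam=1.0, n_iter=200, eps=1e-10):
    """Multiplicative updates for ||X - V U^T||_F^2 + lam * tr(V^T (D - W) V), V, U >= 0.

    X: documents x words, V: documents x classes, U: words x classes.
    """
    V = V0.astype(float).copy()
    U = X.T @ V + 1e-3                        # deterministic, strictly positive start
    D = np.diag(W.sum(axis=1))
    for _ in range(n_iter):
        U *= (X.T @ V) / (U @ (V.T @ V) + eps)
        V *= (X @ U + lam * (W @ V)) / (V @ (U.T @ U) + lam * (D @ V) + eps)
    return V, U

# Toy corpus: words 0-2 belong to one topic, words 3-5 to another.
# Only documents 0 and 3 contain a seed word (word 0 and word 3 respectively).
X = np.array([
    [2, 1, 1, 0, 0, 0],
    [0, 2, 1, 0, 0, 0],
    [0, 1, 2, 0, 0, 0],
    [0, 0, 0, 2, 1, 1],
    [0, 0, 0, 0, 2, 1],
    [0, 0, 0, 0, 1, 2],
], dtype=float)
seed_words = [[0], [3]]                       # hypothetical seed words per class

W = knn_graph(X, k=2)
V, U = graph_regularized_nmf(X, W, seed_init(X, seed_words), lam=1.0)
labels = V.argmax(axis=1)                     # predicted class per document
print(labels)
```

In this toy setting, documents 1, 2, 4, and 5 contain no seed word, so any class signal they receive comes from the shared factorization and from their graph neighbors, which is the role the abstract assigns to the manifold regularizer.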