Research on manifold nonnegative matrix decomposition algorithm for weakly supervised text classification
Weiqiang Xiao, Xiaoli Chai, Danmo Zhang
International Conference on Artificial Intelligence, Virtual Reality, and Visualization, 2023-03-01
DOI: 10.1117/12.2667485 (https://doi.org/10.1117/12.2667485)
Citation count: 0
Abstract
Traditional text classifiers based on supervised learning require a large number of labeled documents. Labeling documents accurately often requires domain expertise, which makes it time-consuming and costly. Dataless text classification, which draws its supervision from a small set of easily obtained label descriptions (i.e., seed words) rather than from labeled documents, is therefore a promising alternative. However, because the seed word set is much smaller than the vocabulary of the documents, many documents contain no seed words at all, or contain seed words that are irrelevant to their category, which limits the effectiveness of seed-word supervision. The manifold assumption suggests that highly similar texts tend to belong to the same category, so we maintain a local neighborhood structure for each document and construct a manifold regularizer that spreads the limited supervision information between similar documents. We propose a Laplacian Nonnegative Matrix Factorization (LapNMF) method that incorporates the seed-word prior information and the document manifold into the nonnegative matrix factorization framework, and we solve the resulting problem with the block coordinate descent method. Experiments show that, in most cases, LapNMF performs better than current weakly supervised classification methods, which indicates that it is competitive.
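The abstract does not give the objective function or the update rules, so the following is only a minimal sketch of the general idea, not the authors' LapNMF. It assumes a standard graph-regularized NMF objective, ||X - WH||_F^2 + lam * tr(W^T L W), where L = D - A is the Laplacian of a k-nearest-neighbour document graph. It also swaps in the well-known multiplicative updates in place of the block coordinate descent solver named in the abstract. The seed-word prior is modelled, purely as an assumption, by boosting the seed-word entries of the initial class-word matrix. All function and parameter names (`knn_affinity`, `lap_nmf_sketch`, `boost`, `lam`) are hypothetical.

```python
import numpy as np


def knn_affinity(X, k):
    """Symmetric 0/1 k-nearest-neighbour graph over the rows of X (cosine similarity)."""
    n = X.shape[0]
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.maximum(norms, 1e-12)
    S = Xn @ Xn.T
    np.fill_diagonal(S, -np.inf)             # a document is not its own neighbour
    idx = np.argsort(-S, axis=1)[:, :k]      # k most similar documents per row
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    return np.maximum(A, A.T)                # symmetrise the graph


def objective(X, W, H, L, lam):
    """Reconstruction error plus the manifold (graph Laplacian) penalty on W."""
    return np.linalg.norm(X - W @ H) ** 2 + lam * np.trace(W.T @ L @ W)


def lap_nmf_sketch(X, seed_words, k=2, lam=1.0, n_iter=200, boost=1.0, rng_seed=0):
    """Graph-regularized NMF: X (docs x words) ~ W (docs x classes) @ H (classes x words)."""
    rng = np.random.default_rng(rng_seed)
    n, m = X.shape
    c = len(seed_words)

    A = knn_affinity(X, k)
    D = np.diag(A.sum(axis=1))
    L = D - A

    W = rng.random((n, c)) + 1e-3
    H = rng.random((c, m)) + 1e-3
    # ASSUMPTION: the seed-word prior is injected only through initialisation,
    # by raising the weight of each class's seed words in H.
    for j, word_ids in enumerate(seed_words):
        H[j, word_ids] += boost

    eps = 1e-12
    history = [objective(X, W, H, L, lam)]
    for _ in range(n_iter):
        # Multiplicative updates (swapped in for block coordinate descent).
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T + lam * A @ W) / (W @ H @ H.T + lam * D @ W + eps)
        history.append(objective(X, W, H, L, lam))
    return W, H, history
```

A predicted class for each document can then be read off as `W.argmax(axis=1)`. Whether class indices line up with the seed-word lists depends on how strongly the prior is enforced, which this sketch handles only loosely through `boost`.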