Research on manifold nonnegative matrix decomposition algorithm for weakly supervised text classification

Weiqiang Xiao, Xiaoli Chai, Danmo Zhang
Published in: International Conference on Artificial Intelligence, Virtual Reality, and Visualization, 2023-03-01
DOI: 10.1117/12.2667485 (https://doi.org/10.1117/12.2667485)
Citations: 0

Abstract

Traditional text classifiers based on supervised learning require a large number of labeled documents. Labeling documents often demands domain expertise to ensure accuracy, which makes it time-consuming and costly. Dataless text classification, which draws its supervision from a small number of easily accessible label descriptions (i.e., seed words) rather than from labeled documents, is therefore a promising alternative. However, because the seed word set is much smaller than the vocabulary of the documents, many documents contain no seed words at all, or contain seed words that are irrelevant to their category, which limits the effect of seed word supervision. The manifold assumption suggests that highly similar texts tend to belong to the same category, so we maintain a local neighborhood structure for each document and construct a manifold regularizer to propagate the limited supervision between similar documents. We propose a Laplacian Nonnegative Matrix Factorization (LapNMF) method that incorporates the seed word prior and the document manifold into the nonnegative matrix factorization framework, and we solve the resulting problem with block coordinate descent. Experiments show that, in most cases, LapNMF performs better than current weakly supervised classification methods, demonstrating its competitiveness.
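
One way to make the manifold regularizer concrete is sketched below. It is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not give the objective, so the sketch assumes a cosine-similarity kNN graph over documents, the standard graph-Laplacian penalty Tr(V^T L V), and the multiplicative updates commonly used for graph-regularized NMF in place of the block coordinate descent solver named in the abstract. The seed word prior is left out because the abstract does not say how it enters the objective. All function and parameter names (knn_graph, manifold_penalty, graph_nmf, k, lam) are hypothetical.

```python
import numpy as np


def knn_graph(X, k=2):
    """Symmetric 0/1 kNN affinity matrix from cosine similarity between rows of X.

    Assumption: the neighborhood structure is a plain kNN graph.
    """
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.maximum(norms, 1e-12)
    S = Xn @ Xn.T
    np.fill_diagonal(S, -np.inf)           # a document is not its own neighbor
    n = X.shape[0]
    nbrs = np.argsort(-S, axis=1)[:, :k]   # k most similar documents per row
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    return np.maximum(W, W.T)              # symmetrize


def manifold_penalty(V, W):
    """Tr(V^T L V) with L = D - W, i.e. 0.5 * sum_ij W_ij * ||v_i - v_j||^2."""
    L = np.diag(W.sum(axis=1)) - W
    return float(np.trace(V.T @ L @ V))


def graph_nmf(X, W, r, lam=1.0, iters=200, seed=0):
    """Graph-regularized NMF by multiplicative updates (stand-in solver).

    Approximately minimizes ||X - V U^T||_F^2 + lam * Tr(V^T L V), U, V >= 0.
    X is documents x words; V is documents x r; U is words x r.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    V = rng.random((n, r)) + 0.1
    U = rng.random((m, r)) + 0.1
    D = np.diag(W.sum(axis=1))
    eps = 1e-12                             # guards against division by zero
    for _ in range(iters):
        U *= (X.T @ V) / (U @ (V.T @ V) + eps)
        V *= (X @ U + lam * (W @ V)) / (V @ (U.T @ U) + lam * (D @ V) + eps)
    return U, V


if __name__ == "__main__":
    # Toy document-word counts: two groups of two similar documents.
    X = np.array([[2., 1., 0., 0.],
                  [1., 2., 0., 0.],
                  [0., 0., 2., 1.],
                  [0., 0., 1., 2.]])
    W = knn_graph(X, k=1)
    U, V = graph_nmf(X, W, r=2)
    print("predicted component per document:", V.argmax(axis=1))
    print("manifold penalty:", manifold_penalty(V, W))
```

The penalty is small when neighboring documents have similar rows in V, which is how supervision attached to a few documents could spread to their neighbors. A row-wise argmax over V is one simple way to read off a category per document.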