An effective term weighting method using random walk model for text classification

M. Islam
Published in: 2008 11th International Conference on Computer and Information Technology, 2008-12-01
DOI: 10.1109/ICCITECHN.2008.4803000 (https://doi.org/10.1109/ICCITECHN.2008.4803000)
Citations: 9

Abstract

Text classification can be viewed as assigning texts to a predefined set of categories. However, many digital documents are not organized according to their content, which makes it difficult for a user to find relevant documents. Automatic text classification can address this problem. In this paper we introduce a new random walk term weighting method for improved text classification. To weight a term, we exploit the relationship between local (term position, term frequency) and global (inverse document frequency, information gain) information about terms (vertices). Moreover, we weight terms by considering co-occurrence and semantic relations between terms as a measure of dependency. To evaluate our term weighting approach, we integrate it into the Rocchio text classification algorithm; experimental results show that our method performs better than other random walk models.
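The abstract does not give the weighting formulas, so the following is only a minimal sketch of the general idea of random walk term weighting, not the paper's method. It assumes a plain PageRank-style walk over an unweighted, undirected term co-occurrence graph. The function name `random_walk_term_weights` and the parameters `window`, `damping`, and `iterations` are hypothetical, and the local, global, and semantic signals described above (term position, term frequency, inverse document frequency, information gain, semantic relations) are not modeled here.

```python
from collections import defaultdict


def random_walk_term_weights(tokens, window=2, damping=0.85, iterations=50):
    """Score terms with a PageRank-style walk over a co-occurrence graph.

    Illustrative assumption only: edges are unweighted and link terms that
    appear within `window` following positions of each other.
    """
    # Build the undirected co-occurrence graph (terms are the vertices).
    neighbors = defaultdict(set)
    for i, term in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != term:
                neighbors[term].add(other)
                neighbors[other].add(term)

    vocab = sorted(set(tokens))
    n = len(vocab)
    if n == 0:
        return {}

    # Start from a uniform distribution and iterate the walk.
    scores = {t: 1.0 / n for t in vocab}
    for _ in range(iterations):
        updated = {}
        for t in vocab:
            # Each neighbor u passes an equal share of its score to t.
            incoming = sum(scores[u] / len(neighbors[u]) for u in neighbors[t])
            updated[t] = (1 - damping) / n + damping * incoming
        scores = updated
    return scores


if __name__ == "__main__":
    doc = "text classification assigns text to categories".split()
    weights = random_walk_term_weights(doc, window=1)
    for term, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
        print(f"{term}\t{weight:.3f}")
```

In a classifier such as Rocchio, scores like these would stand in for the per-term weights of each document vector before class centroids are computed; how the paper combines them with its local and global features is not stated in the abstract.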