A semantic weighting method for document classification based on Markov logic networks

Eunji Lee, Jeongin Kim, Junho Choi, Chang Choi, Byeongkyu Ko, Pankoo Kim
Research in Adaptive and Convergent Systems, published 2014-10-05
DOI: 10.1145/2663761.2664212 (https://doi.org/10.1145/2663761.2664212)
Citation count: 0

Abstract

This paper proposes a semantic weighting method for classifying textual documents. Web documents hold great potential, and the amount of valuable information on the web has grown consistently over the years. The sheer size of the web, however, makes it increasingly difficult to find the documents relevant to what a user wants. Many researchers have worked on this problem, and document classification is central to it. Every document is composed of many words, and many classification methods extract keywords from documents and then analyze keyword patterns or frequencies. In this paper, we propose the Category Term Weight (CTW), computed from document keywords, to improve document classification performance. CTW combines keyword frequency with semantic information, both of which are useful for finding similarities between documents. We therefore calculate CTW from a collection of training documents. The CTW of an unknown document and the CTW values stored in the previously built Category Term Database are then applied to the Markov Logic Network (MLN) model we designed. Using CTW, we compare our MLN model with an existing Naive Bayes model. The experimental results show improved precision compared with the existing model.
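The abstract does not give the CTW formula, so the following is only an illustrative sketch of the general idea of combining keyword frequency with semantic information. Everything in it is an assumption rather than the authors' method: the multiplicative combination rule, the function names `category_term_weights` and `classify`, and the invented `RELATEDNESS` table that stands in for a real semantic resource. The final scoring step is a plain additive sum, not Markov logic network inference.

```python
from collections import Counter, defaultdict


def category_term_weights(training_docs, semantic_score):
    """Build {category: {term: weight}} from (category, tokens) pairs.

    Assumed rule (not from the paper):
    weight = relative frequency of the term in the category
             * semantic_score(term, category)
    """
    counts = defaultdict(Counter)
    for category, tokens in training_docs:
        counts[category].update(tokens)

    weights = {}
    for category, counter in counts.items():
        total = sum(counter.values())
        weights[category] = {
            term: (n / total) * semantic_score(term, category)
            for term, n in counter.items()
        }
    return weights


def classify(tokens, weights):
    """Score each category by summing the stored weights of the document's terms.

    This is a simple additive stand-in, not MLN inference.
    """
    scores = {
        category: sum(table.get(t, 0.0) for t in tokens)
        for category, table in weights.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical relatedness values, invented purely for illustration.
RELATEDNESS = {
    ("goal", "sports"): 0.9, ("match", "sports"): 0.8,
    ("goal", "finance"): 0.2, ("market", "finance"): 0.9,
    ("stock", "finance"): 1.0,
}


def toy_semantic_score(term, category):
    # Fall back to a neutral score for pairs missing from the toy table.
    return RELATEDNESS.get((term, category), 0.5)


training = [
    ("sports", ["goal", "match", "goal"]),
    ("finance", ["market", "stock", "market", "goal"]),
]
weights = category_term_weights(training, toy_semantic_score)
label, scores = classify(["stock", "market"], weights)
print(label, scores)
```

In this toy setup the per-category weight table plays the role of the Category Term Database: it is built once from training documents and then consulted for each unknown document.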