A comprehensive analysis of using semantic information in text categorization

Kerem Çelik, T. Gungor
Published in: 2013 IEEE INISTA, 2013-06-19
DOI: 10.1109/INISTA.2013.6577651 (https://doi.org/10.1109/INISTA.2013.6577651)
Citations: 16

Abstract

Traditional text categorization methods deal only with the content of the documents and use statistics-based metrics to represent them. A machine learning approach then uses this representation to determine the document class. In this picture, the meaning of the document is missing. To add meaning to the text categorization process, we start by using part-of-speech (POS) tagging. As expected, the part-of-speech tags in a document do not all contribute the same amount of information to its meaning. In addition to the POS information, we use WordNet to add semantic features such as synonyms, hypernyms, hyponyms, meronyms and topics to the classification process. Using WordNet's semantic features introduces ambiguity, and not all semantic features are actually related to the document content. To overcome this problem, we introduce a new method to eliminate the ambiguity. Various combinations of POS, WordNet and word sense disambiguation are applied, and the results show that using semantic features performs better than the traditional, context-based methods.
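The pipeline the abstract describes (POS filtering, semantic expansion, then disambiguation) can be illustrated with a minimal sketch. This is not the authors' implementation. The toy lexicon below is a hypothetical stand-in for WordNet, the POS weights are assumed values, and the gloss-overlap sense selection is a simplified Lesk-style heuristic swapped in for the paper's own disambiguation method, which the abstract does not specify.

```python
from collections import Counter

# Hypothetical toy lexicon standing in for WordNet: each word maps to a list
# of senses, and each sense has a gloss, synonyms and hypernyms.
TOY_LEXICON = {
    "bank": [
        {"gloss": "financial institution that accepts deposits and lends money",
         "synonyms": ["depository"], "hypernyms": ["institution"]},
        {"gloss": "sloping land beside a river or lake",
         "synonyms": ["riverside"], "hypernyms": ["slope"]},
    ],
    "loan": [
        {"gloss": "money lent by a bank at interest",
         "synonyms": ["credit"], "hypernyms": ["debt"]},
    ],
}

# Assumed weights: tags missing from this table contribute nothing.
POS_WEIGHTS = {"NOUN": 1.0, "VERB": 0.5, "ADJ": 0.5}


def pick_sense(word, context):
    """Return the sense whose gloss shares the most words with the context."""
    senses = TOY_LEXICON.get(word)
    if not senses:
        return None
    return max(senses, key=lambda s: len(set(s["gloss"].split()) & context))


def semantic_features(tagged_tokens):
    """Build a POS-weighted bag of words extended with related terms."""
    context = {word for word, _ in tagged_tokens}
    features = Counter()
    for word, pos in tagged_tokens:
        weight = POS_WEIGHTS.get(pos, 0.0)
        if weight == 0.0:
            continue  # skip tags assumed to carry no meaning
        features[word] += weight
        sense = pick_sense(word, context)
        if sense is not None:
            # Add only the synonyms and hypernyms of the selected sense.
            for related in sense["synonyms"] + sense["hypernyms"]:
                features[related] += weight
    return features


# Tokens are assumed to be POS-tagged already by some external tagger.
doc = [("the", "DET"), ("bank", "NOUN"), ("approved", "VERB"),
       ("money", "NOUN"), ("loan", "NOUN")]
features = semantic_features(doc)
```

Here the word "money" in the context selects the financial sense of "bank", so "institution" enters the feature vector and "slope" does not. Expanding every sense without this step would add both, which is the kind of ambiguity the abstract refers to.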