Korean Document Classification Using Extended Vector Space Model

S. Lee
Journal: The KIPS Transactions: Part B
Published: 2011-04-30
DOI: 10.3745/KIPSTB.2011.18B.2.093 (https://doi.org/10.3745/KIPSTB.2011.18B.2.093)

Abstract

We propose an extended vector space model that uses ambiguous words and disambiguating words to improve the results of Korean document classification. In this paper we study how to improve the precision of the vector space model, and we propose a new axis that represents a weight value. Conventional classification methods that lack this weight value have problems in vector comparison. After calculating the mutual information between a term and its classification field, we define a word whose weight value falls on the same axis as an ambiguous word. We define a word that resolves the ambiguity of an ambiguous word as a disambiguating word. We determine the strength of a disambiguating word among the words that co-occur with an ambiguous word in the same document. Finally, we propose a new classification method based on extending the vector dimension with ambiguous and disambiguating words.
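
The abstract does not give the formula used for the mutual information between a term and its classification field. As a rough illustration only, the sketch below computes pointwise mutual information from document counts, PMI(t, c) = log(P(t, c) / (P(t) P(c))). The function names, the toy English tokens, and the `tolerance` rule for flagging an ambiguous word are all assumptions made for this example and are not taken from the paper; real Korean text would also need morphological analysis before counting.

```python
import math
from collections import Counter


def pointwise_mutual_information(docs):
    """Return {(term, category): PMI} estimated from document frequencies.

    docs is a list of (category, tokens) pairs. This is a generic PMI
    estimate, not necessarily the weighting used in the paper.
    """
    n_docs = len(docs)
    category_count = Counter()
    term_count = Counter()
    joint_count = Counter()
    for category, tokens in docs:
        category_count[category] += 1
        for term in set(tokens):  # count each term once per document
            term_count[term] += 1
            joint_count[(term, category)] += 1

    pmi = {}
    for (term, category), n_tc in joint_count.items():
        p_tc = n_tc / n_docs
        p_t = term_count[term] / n_docs
        p_c = category_count[category] / n_docs
        pmi[(term, category)] = math.log(p_tc / (p_t * p_c))
    return pmi


def is_ambiguous(term, pmi, tolerance=0.1):
    """Hypothetical criterion: a term is ambiguous when its two highest
    PMI scores across categories differ by at most `tolerance`."""
    scores = sorted((v for (t, _), v in pmi.items() if t == term), reverse=True)
    return len(scores) >= 2 and scores[0] - scores[1] <= tolerance


# Toy corpus: "bank" appears equally in both categories.
docs = [
    ("sports", ["bank", "goal", "match"]),
    ("sports", ["goal", "team"]),
    ("finance", ["bank", "loan"]),
    ("finance", ["loan", "rate"]),
]
pmi = pointwise_mutual_information(docs)
```

In this toy corpus "bank" is equally associated with both categories, so the hypothetical rule flags it as ambiguous, while "goal" occurs only in sports documents and is not flagged.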