Text Classification Based on Enriched Vector Space Model

T. Georgieva-Trifonova
Published in: Proceedings of the 18th International Conference on Computer Systems and Technologies
Publication date: 2017-06-23
DOI: 10.1145/3134302.3134343 (https://doi.org/10.1145/3134302.3134343)
Citations: 1

Abstract

One challenge in text classification is to apply a model with two characteristics: acceptable computational complexity of model construction, and dimension reduction of the vector space without a significant decrease in classification performance. The present research aims to find a possible solution to these problems. This paper proposes a model obtained by enriching the vector space model with the association relationships between words, extracted from their co-occurrence in the text documents. For this purpose, the lift measure of association rules between word pairs is calculated. Experiments are conducted on the Reuters-21578 dataset using an SVM classifier. The results confirm that, with respect to the F-measure, applying the model improves the binominal and polynomial classification performance in comparison to the vector space model, even after word filtering that leads to a significant dimension reduction.
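The lift measure mentioned in the abstract can be illustrated with a small sketch. It uses the standard association-rule definition, lift(a, b) = supp(a, b) / (supp(a) * supp(b)), where each support is the fraction of documents containing the word or the pair. The function name `pairwise_lift`, the whitespace tokenization, and the toy corpus are assumptions made for illustration. The abstract does not say how the paper tokenizes text or how lift values are folded into the document vectors, so that step is not shown.

```python
from collections import Counter
from itertools import combinations


def pairwise_lift(documents):
    """Return {(word_a, word_b): lift} for every word pair that co-occurs in a document.

    Each document is treated as a set of words (one transaction), and
    lift(a, b) = supp(a, b) / (supp(a) * supp(b)), with supports as document fractions.
    """
    n_docs = len(documents)
    word_counts = Counter()
    pair_counts = Counter()
    for doc in documents:
        # Naive lowercase whitespace tokenization (an assumption, not the paper's method).
        words = sorted(set(doc.lower().split()))
        word_counts.update(words)
        # Pairs are stored in alphabetical order, e.g. ("oil", "prices").
        pair_counts.update(combinations(words, 2))
    lifts = {}
    for (a, b), n_ab in pair_counts.items():
        supp_ab = n_ab / n_docs
        supp_a = word_counts[a] / n_docs
        supp_b = word_counts[b] / n_docs
        lifts[(a, b)] = supp_ab / (supp_a * supp_b)
    return lifts


# Hypothetical four-document corpus, for illustration only.
toy_corpus = [
    "oil prices rise",
    "oil prices fall",
    "wheat exports rise",
    "wheat harvest grows",
]
lift = pairwise_lift(toy_corpus)
print(lift[("oil", "prices")])  # the two words always appear together
print(lift[("oil", "rise")])    # the two words appear together as often as chance predicts
```

A lift above 1 means the two words co-occur more often than they would if they were independent, a lift of 1 means no association, and a lift below 1 means they tend to avoid each other. Pairs that never co-occur are simply absent from the returned dictionary.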