Text Classification Based on Enriched Vector Space Model

Proceedings of the 18th International Conference on Computer Systems and Technologies. Pub Date: 2017-06-23. DOI: 10.1145/3134302.3134343

T. Georgieva-Trifonova


Citations: 1

Abstract

One challenge in text classification is to apply a model that combines two characteristics: acceptable computational complexity of model construction, and dimension reduction of the vector space without a significant decrease in classification performance. This research seeks a possible solution to these problems. The paper proposes a model that enriches the vector space model with association relationships between words, extracted from their co-occurrence in text documents. For this purpose, the lift measure of association rules between word pairs is calculated. Experiments are conducted on the Reuters-21578 dataset using an SVM classifier. The results confirm that, compared with the plain vector space model, the enriched model improves binominal and polynomial classification performance with respect to the F-measure, even after word filtering that leads to a significant dimension reduction.
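To illustrate the lift measure mentioned in the abstract, the sketch below computes lift for word pairs from document-level co-occurrence, using the standard association-rule definition lift(a, b) = P(a, b) / (P(a) * P(b)). The function name `pairwise_lift`, the whitespace tokenization, and the toy documents are assumptions made for illustration. The abstract does not say how the paper tokenizes text or how lift values are folded into the document vectors, so this is not the author's implementation.

```python
from collections import Counter
from itertools import combinations


def pairwise_lift(documents):
    """Return {(word_a, word_b): lift} for word pairs sharing at least one document.

    Hypothetical helper: each document is treated as a set of lowercase,
    whitespace-separated tokens (an assumption, not the paper's preprocessing).
    """
    n_docs = len(documents)
    word_df = Counter()  # number of documents containing each word
    pair_df = Counter()  # number of documents containing both words of a pair
    for text in documents:
        words = sorted(set(text.lower().split()))
        word_df.update(words)
        pair_df.update(combinations(words, 2))
    # lift(a, b) = P(a, b) / (P(a) * P(b)) = n_docs * df(a, b) / (df(a) * df(b))
    return {
        (a, b): n_docs * df_ab / (word_df[a] * word_df[b])
        for (a, b), df_ab in pair_df.items()
    }


# Toy corpus, invented for this example only.
docs = [
    "oil prices rise",
    "oil prices fall",
    "wheat harvest rise",
    "wheat exports fall",
]
lift = pairwise_lift(docs)
print(lift[("oil", "prices")])  # above 1: the pair co-occurs more often than chance
print(lift[("oil", "rise")])    # equal to 1: consistent with independence
```

A lift above 1 indicates that two words appear together more often than independence would predict, which is the kind of association signal the abstract describes adding to the vector space model. Pairs are keyed in alphabetical order, so look up `("harvest", "wheat")` rather than `("wheat", "harvest")`.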
