Text Classification Based on Enriched Vector Space Model
T. Georgieva-Trifonova
Proceedings of the 18th International Conference on Computer Systems and Technologies, 2017-06-23
DOI: 10.1145/3134302.3134343
One challenge in text classification is finding a model that has acceptable computational complexity to construct and that reduces the dimension of the vector space without significantly degrading classification performance. This research aims to find a possible solution to these problems. The paper proposes a model obtained by enriching the vector space model with association relationships between words, extracted from their co-occurrence in the text documents. For this purpose, the lift measure of association rules between word pairs is calculated. Experiments are conducted on the Reuters-21578 dataset using an SVM classifier. The results confirm that, with respect to the F-measure, the model improves binary (binominal) and multi-class classification performance over the plain vector space model, even after word filtering that leads to a significant dimension reduction.
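
To make the lift measure concrete, the sketch below computes it for word pairs from document-level co-occurrence, using the standard association-rule definition lift(a, b) = P(a, b) / (P(a) · P(b)), where probabilities are estimated as fractions of documents. The function name `pairwise_lift` and the toy documents are assumptions made for illustration only; the abstract does not state how the lift values are folded into the document vectors, so that step is not shown.

```python
from collections import Counter
from itertools import combinations


def pairwise_lift(documents):
    """Return {(word_a, word_b): lift} for word pairs that co-occur
    in at least one document. Pairs are keyed in alphabetical order."""
    n_docs = len(documents)
    word_df = Counter()  # number of documents containing each word
    pair_df = Counter()  # number of documents containing both words
    for tokens in documents:
        vocab = sorted(set(tokens))  # count each word once per document
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))

    lift = {}
    for (a, b), n_ab in pair_df.items():
        # lift = P(a, b) / (P(a) * P(b)) = n_ab * N / (n_a * n_b)
        lift[(a, b)] = n_ab * n_docs / (word_df[a] * word_df[b])
    return lift


# Hypothetical toy corpus, not taken from Reuters-21578.
docs = [
    ["oil", "price", "rise"],
    ["oil", "price", "fall"],
    ["wheat", "export"],
    ["wheat", "price"],
]
scores = pairwise_lift(docs)
for pair in [("oil", "price"), ("price", "wheat"), ("export", "wheat")]:
    print(pair, round(scores[pair], 3))
```

A lift above 1 indicates that two words appear together more often than independence would predict, and a lift below 1 indicates the opposite. Counting each word once per document keeps the estimate tied to co-occurrence across documents rather than to term frequency within them.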