Feature Reduction for Text Categorization Using Cluster-Based Discriminant Coefficient

2012 Conference on Technologies and Applications of Artificial Intelligence | Pub Date: 2012-11-16 | DOI: 10.1109/TAAI.2012.16

Li-Ju Gao, Been-Chian Chien


Citations: 2


Feature Reduction for Text Categorization Using Cluster-Based Discriminant Coefficient

Text classification is an important research topic for managing large collections of electronic documents. Feature reduction is the key issue when documents are represented by high-dimensional keyword vectors. An earlier document analysis method, the discriminant coefficient, was proposed to reduce features and achieve high-precision text classification. However, the main problem of this discriminant-based feature reduction method is that the final number of reduced features is exactly equal to the number of document classes. Although the classification precision of such a method is high, its recall is relatively low. In this paper, we propose an improvement to the analysis method based on discriminant coefficients. We apply a simple clustering method to the documents within each document class in order to preserve hidden differences among keywords in the same class. The clustering results make it possible to adjust the number of reduced features flexibly. The experimental results show that the proposed clustering mechanism supports adaptive feature reduction and that both recall and F1 are improved.
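The idea of letting the number of reduced features follow the number of clusters rather than the number of classes can be illustrated with a minimal sketch. The abstract does not give the discriminant coefficient formula, so the sketch below substitutes plain cosine similarity to cluster centroids as a stand-in score; it is not the authors' method. The function names (`kmeans`, `build_prototypes`, `reduce_features`), the toy vocabulary, and the per-class cluster counts are all hypothetical choices made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def kmeans(vectors, k, iters=10):
    """Plain k-means using cosine similarity; seeds are the first k vectors."""
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for v in vectors:
            best = max(range(len(centroids)),
                       key=lambda i: cosine(v, centroids[i]))
            groups[best].append(v)
        # Keep the previous centroid if a cluster ends up empty.
        centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
    return centroids

def build_prototypes(docs_by_class, clusters_per_class):
    """Cluster the documents of each class separately; one prototype per cluster."""
    prototypes = []
    for label, docs in docs_by_class.items():
        k = clusters_per_class.get(label, 1)
        for centroid in kmeans(docs, k):
            prototypes.append((label, centroid))
    return prototypes

def reduce_features(doc, prototypes):
    """Map a keyword vector to one score per cluster prototype.

    Assumption: cosine similarity stands in for the discriminant coefficient.
    """
    return [cosine(doc, centroid) for _, centroid in prototypes]

# Toy keyword space (hypothetical): [goal, racket, cpu, code]
docs_by_class = {
    "sports": [[3, 0, 0, 0], [0, 3, 0, 0], [2, 0, 0, 0], [0, 2, 1, 0]],
    "tech": [[0, 0, 3, 2], [0, 0, 1, 3]],
}
# Two clusters inside "sports" keep its two keyword patterns apart.
prototypes = build_prototypes(docs_by_class, {"sports": 2, "tech": 1})
reduced = reduce_features([0, 4, 0, 0], prototypes)
print(len(prototypes), reduced)
```

With one cluster per class the reduced vector would have two components, one per class. Splitting "sports" into two clusters yields three components, so a document that matches only one of the keyword patterns inside a class still scores highly on that cluster's feature. This is the kind of within-class difference the abstract says the clustering step is meant to preserve.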
