Feature Reduction for Text Categorization Using Cluster-Based Discriminant Coefficient

Li-Ju Gao, Been-Chian Chien
2012 Conference on Technologies and Applications of Artificial Intelligence, published 2012-11-16
DOI: 10.1109/TAAI.2012.16 (https://doi.org/10.1109/TAAI.2012.16)
Citations: 2

Abstract

Text classification is an important research topic for managing large numbers of electronic documents, and feature reduction is a key issue when documents are represented by high-dimensional keyword features. A document analysis method called the discriminant coefficient was previously proposed to reduce features and achieve high-precision text classification. Its main limitation is that the final number of reduced features is exactly equal to the number of document classes; although classification precision is high with this method, recall is relatively low. In this paper, we propose an improvement to the discriminant coefficient analysis method. We apply a simple clustering method to the documents within each class in order to preserve hidden differences among keywords in the same class. The clustering results make it possible to adjust the number of reduced features flexibly. Experimental results show that the proposed clustering mechanism supports adaptive feature reduction and that both recall and F1 are improved.
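
The idea of clustering inside each class so that the number of reduced features is no longer tied to the number of classes can be illustrated with a minimal sketch. The abstract does not give the discriminant coefficient formula, so the code below swaps in cosine similarity to each cluster centroid as a stand-in score; the function names (`kmeans`, `fit_cluster_prototypes`, `reduce_features`), the toy term-count vectors, and the `clusters_per_class` parameter are all assumptions made for illustration and are not taken from the paper.

```python
import math
import random
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two term vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def kmeans(vectors, k, iters=20, seed=0):
    """Plain k-means; returns the centroids of the non-empty clusters (at most k)."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, min(k, len(vectors)))
    for _ in range(iters):
        groups = defaultdict(list)
        for v in vectors:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(v, centroids[i]))
            groups[nearest].append(v)
        centroids = [mean(g) for g in groups.values()]
    return centroids


def fit_cluster_prototypes(docs, labels, clusters_per_class):
    """Cluster the documents of each class separately; one prototype per cluster."""
    by_class = defaultdict(list)
    for vec, label in zip(docs, labels):
        by_class[label].append(vec)
    prototypes = []  # list of (class label, centroid)
    for label in sorted(by_class):
        for centroid in kmeans(by_class[label], clusters_per_class):
            prototypes.append((label, centroid))
    return prototypes


def reduce_features(doc, prototypes):
    """Map a term vector to one score per prototype (stand-in for the paper's coefficient)."""
    return [cosine(doc, centroid) for _, centroid in prototypes]


# Hypothetical term-count vectors over four keywords.
docs = [
    [5, 0, 0, 0], [5, 1, 0, 0],  # "sports", first sub-topic
    [0, 5, 0, 0], [1, 5, 0, 0],  # "sports", second sub-topic
    [0, 0, 4, 1], [0, 0, 1, 4],  # "tech"
]
labels = ["sports"] * 4 + ["tech"] * 2

per_class = fit_cluster_prototypes(docs, labels, clusters_per_class=1)
per_cluster = fit_cluster_prototypes(docs, labels, clusters_per_class=2)

print(len(reduce_features(docs[0], per_class)))    # 2: one feature per class
print(len(reduce_features(docs[0], per_cluster)))  # 4: one feature per cluster
```

With one cluster per class the reduced representation has as many features as there are classes, which is the limitation the abstract describes; raising `clusters_per_class` gives additional features that separate sub-groups of documents within a class.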