Text Feature Ranking Based on Rough-set Theory

IEEE/WIC/ACM International Conference on Web Intelligence (WI'07). Pub Date: 2007-11-02. DOI: 10.1109/WI.2007.150

Songbo Tan, Yuefen Wang, Xueqi Cheng


Citations: 10

Abstract

To reduce dimensionality without sacrificing classification performance, the authors draw on attribute reduction based on the discernibility matrix in rough-set theory and propose two text feature selection algorithms, DB1 and DB2. Experimental results indicate that DB2 not only yields much higher accuracy than information gain when the number of features is below 6000, but also requires much less CPU time than information gain.
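The abstract does not spell out DB1 or DB2, so the following is only a minimal sketch of the general idea of ranking terms with a discernibility matrix, not the authors' algorithms. It assumes binary term presence (each document is a set of terms) and scores a term by the number of differently labeled document pairs it tells apart. The function names `discernibility_scores` and `rank_features` and the toy data are hypothetical.

```python
from itertools import combinations


def discernibility_scores(docs, labels):
    """Count, for each term, the differently labeled document pairs it discerns.

    Assumption: a term "discerns" a pair when it occurs in exactly one of
    the two documents. This is an illustrative criterion, not DB1 or DB2.
    """
    scores = {}
    for (doc_a, label_a), (doc_b, label_b) in combinations(zip(docs, labels), 2):
        if label_a == label_b:
            continue  # same-class pairs have no discernibility-matrix entry
        # Symmetric difference: terms present in exactly one document.
        for term in doc_a ^ doc_b:
            scores[term] = scores.get(term, 0) + 1
    return scores


def rank_features(docs, labels):
    """Return terms sorted by descending score, ties broken alphabetically."""
    scores = discernibility_scores(docs, labels)
    return sorted(scores, key=lambda term: (-scores[term], term))


# Toy corpus (hypothetical): each document is a set of terms.
docs = [
    {"goal", "match", "team"},
    {"goal", "team", "score"},
    {"stock", "market", "team"},
    {"stock", "price", "market"},
]
labels = ["sports", "sports", "finance", "finance"]

scores = discernibility_scores(docs, labels)
ranking = rank_features(docs, labels)
print(ranking)
```

In this toy corpus, terms that occur in only one class throughout (such as "goal" or "stock") discern every cross-class pair and rank first, while "team", which appears in both classes, ranks lower. The pairwise loop is quadratic in the number of documents and is written for clarity rather than speed.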

