A new approach for detecting the best feature set

Proceedings. 2005 IEEE Networking, Sensing and Control, 2005. Pub Date: 2005-03-19. DOI: 10.1109/ICNSC.2005.1461164

Shuang Gang, Hongnian Yu

Citations: 3

Abstract

The efficiency and accuracy of training neural-net classifiers are typically improved by eliminating redundant and irrelevant features. The objective is to reduce the size of the input feature set while retaining as much of the class-discriminatory information as possible. The reduced input feature set, called the best feature space, lowers both the cost and the complexity of feature collection and improves the efficiency and accuracy of the resulting classifier. In this paper, we develop and evaluate two composite feature selection algorithms: mutual information feature space forward selection (MIFSFS) and mutual information feature space backward selection (MIFSBS). Both algorithms use the mutual information (MI) between the feature space and the class, for both continuous-valued and discrete-valued features, to find the best feature space from the original input features. The most important outcome of the new algorithms is that they identify not only the irrelevant features but also the redundant features, which common feature selection algorithms (for example, artificial neural networks, ANN) cannot identify. Empirical studies of both realistic new and previously published classification problems indicate that the proposed algorithms are robust, stable and efficient. One real application is drug discovery. Knowledge discovery from gene expression data is a highly topical research area in drug discovery. Finding, among thousands of genes, the few that cause a certain type of disease is a significant contribution to preventing and fighting disease. Here we use myeloma as an example to demonstrate how to identify the genes that cause the disease.
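The abstract does not spell out the steps of the two algorithms, so the following is only a minimal sketch of the general idea: greedy forward selection and backward elimination driven by the mutual information between a feature subset and the class. It is not the authors' MIFSFS or MIFSBS. The function names, the `tol` threshold, and the toy data are assumptions made for illustration, and the count-based estimator handles discrete-valued features only.

```python
from collections import Counter
from math import log2


def mutual_information(data, labels, cols):
    """Estimate I(X_cols; C) in bits from counts (discrete features only)."""
    n = len(labels)
    xs = [tuple(row[j] for j in cols) for row in data]
    joint = Counter(zip(xs, labels))
    px, pc = Counter(xs), Counter(labels)
    return sum(k / n * log2(k * n / (px[x] * pc[c])) for (x, c), k in joint.items())


def forward_select(data, labels, tol=1e-9):
    """Greedily add the feature that most increases the MI of the selected subset."""
    remaining = list(range(len(data[0])))
    selected, best = [], 0.0
    while remaining:
        scores = {f: mutual_information(data, labels, selected + [f]) for f in remaining}
        cand = max(remaining, key=scores.get)
        if scores[cand] - best <= tol:
            break  # nothing left adds information: irrelevant or redundant
        selected.append(cand)
        remaining.remove(cand)
        best = scores[cand]
    return selected, best


def backward_eliminate(data, labels, tol=1e-9):
    """Drop each feature whose removal leaves the MI of the subset unchanged."""
    selected = list(range(len(data[0])))
    full = mutual_information(data, labels, selected)
    for f in list(selected):
        rest = [g for g in selected if g != f]
        if full - mutual_information(data, labels, rest) <= tol:
            selected = rest
    return selected, mutual_information(data, labels, selected)


# Hypothetical toy data: feature 2 duplicates feature 0 (redundant),
# feature 3 is independent of the class (irrelevant).
data = [(a, b, a, z) for a in (0, 1) for b in (0, 1) for z in (0, 1)]
labels = [2 * a + b for a, b, _, _ in data]

print(forward_select(data, labels))
print(backward_eliminate(data, labels))
```

On this toy data, forward selection keeps features 0 and 1, while backward elimination keeps features 1 and 2. Both subsets carry the full 2 bits of class information, because features 0 and 2 are interchangeable. Scoring the whole subset jointly, rather than each feature on its own, is what lets the redundant copy be detected: it has high individual MI with the class but adds nothing once its twin is already selected.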
