A new approach for detecting the best feature set
Shuang Gang, Hongnian Yu
Proceedings. 2005 IEEE Networking, Sensing and Control, 2005. Published 2005-03-19. DOI: 10.1109/ICNSC.2005.1461164

The efficiency and accuracy of training neural-net classifiers are typically improved by eliminating redundant and irrelevant features. The objective is to reduce the size of the input feature set while retaining as much of the class-discriminatory information as possible. The reduced feature set obtained from the original input features, which we call the best feature space, lowers both the cost and the complexity of feature collection and improves the efficiency and accuracy of the resulting classifier. In this paper, we develop and evaluate two composite feature selection algorithms: mutual information feature space forward selection (MIFSFS) and mutual information feature space backward selection (MIFSBS). Both algorithms use the mutual information (MI) between the feature space and the class, for both continuous-valued and discrete-valued features, to find the best feature space among the original input features. The most important outcome of the new algorithms is that they identify not only irrelevant features but also redundant features, which common feature selection approaches (for example, those based on artificial neural networks, ANNs) cannot identify. Empirical studies on both new real-world and previously published classification problems indicate that the proposed algorithms are robust, stable, and efficient.

One real application is drug discovery, in which knowledge discovery from gene expression data is a highly active research area. Identifying the few genes, out of thousands, that cause a given type of disease is a significant contribution to preventing and fighting disease. Here we use myeloma as an example to demonstrate how to identify the genes that cause it.
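The abstract does not spell out the search procedures, so what follows is only a minimal Python sketch of mutual-information forward and backward selection over a feature space, under stated assumptions: features are discrete-valued; the MI between the joint feature subset S and the class C is estimated by plug-in counting as I(S; C) = H(S) + H(C) - H(S, C); and a search stops when adding (or removing) a feature changes that estimate by no more than a hypothetical tolerance `tol`. The function names `joint_mi`, `forward_select`, and `backward_eliminate` and the toy data are illustrative assumptions, not the authors' implementations.

```python
from __future__ import annotations

from collections import Counter
from math import log2
from typing import Hashable, Sequence


def entropy(symbols: Sequence[Hashable]) -> float:
    """Plug-in Shannon entropy, in bits, of a sequence of hashable symbols."""
    n = len(symbols)
    counts = Counter(symbols)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def joint_mi(X: Sequence[Sequence[Hashable]], y: Sequence[Hashable],
             subset: Sequence[int]) -> float:
    """Estimate I(S; C), treating the columns in `subset` as one joint variable S.

    Assumes discrete features; continuous ones would need another estimator.
    """
    if not subset:
        return 0.0
    s = [tuple(row[j] for j in subset) for row in X]
    # I(S; C) = H(S) + H(C) - H(S, C)
    return entropy(s) + entropy(y) - entropy(list(zip(s, y)))


def forward_select(X, y, tol: float = 1e-9) -> list[int]:
    """Greedily add the feature that most increases I(S; C); stop when nothing helps."""
    selected: list[int] = []
    remaining = list(range(len(X[0])))
    current = 0.0
    while remaining:
        scores = {f: joint_mi(X, y, selected + [f]) for f in remaining}
        best = max(scores, key=scores.get)  # ties go to the lowest index
        if scores[best] - current <= tol:
            break  # no remaining feature adds information about C given S
        selected.append(best)
        remaining.remove(best)
        current = scores[best]
    return selected


def backward_eliminate(X, y, tol: float = 1e-9) -> list[int]:
    """Drop features one at a time while I(S; C) stays at its full-set value."""
    selected = list(range(len(X[0])))
    full = joint_mi(X, y, selected)
    while selected:
        kept = {f: joint_mi(X, y, [g for g in selected if g != f]) for f in selected}
        drop = max(kept, key=kept.get)  # removal losing the least MI; ties -> lowest index
        if full - kept[drop] > tol:
            break  # removing any remaining feature would lose information about C
        selected.remove(drop)
    return selected


if __name__ == "__main__":
    # Hypothetical toy data: column 0 equals the label, column 1 duplicates
    # column 0 (redundant), and column 2 is independent of the label (irrelevant).
    X = [[0, 0, 0], [0, 0, 1], [1, 1, 0], [1, 1, 1]]
    y = [0, 0, 1, 1]
    print(forward_select(X, y))      # [0]
    print(backward_eliminate(X, y))  # [1]
```

On the toy data, each search keeps exactly one of the two identical columns and discards the independent one, which illustrates how a joint-MI criterion can flag redundant features as well as irrelevant ones. Two caveats apply to this sketch: the plug-in estimate of I(S; C) is biased upward as S grows relative to the sample size (once every row's feature tuple is unique, the estimate reaches H(C)), and continuous-valued features would need discretization or a different estimator, such as a k-nearest-neighbour MI estimator, which is again an assumption rather than the paper's method.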