Feature Selection with Non-Linear Dependence Based on Multi-objective Strategy

2016 IEEE 16th International Conference on Bioinformatics and Bioengineering (BIBE). Pub Date: 2016-10-01. DOI: 10.1109/BIBE.2016.38

Chun-Liang Lu, Wei Tang, Yu-Shuen Tsai, N. Pal, I. Chung

Citations: 1

Abstract

Identifying a small set of useful features from high-dimensional data for use in designing a classifier is an interesting and important problem. Researchers usually prefer features with high relevance, in the sense that the correlation of each feature with the class labels is high, or the mutual information between each feature and the class labels is high. Such approaches often end up selecting features that are linearly dependent on each other. For some biological studies, it may be of interest to find a set of genes (features) that are highly relevant to the class labels and are also nonlinearly dependent on one another; we explicitly want to exclude relevant genes that are linearly correlated among themselves. Although our primary focus in this study is finding such genes in microarray data sets, such features may also be important in other studies. In this study, Combinations of Relevantly Non-linear Dependency Subsets (CoRNDS) is proposed to tackle this multi-objective problem. It offers a way to simultaneously control the number of selected features, optimize the relevance between the selected features and the class labels, and optimize the non-linear dependency among the selected features. We design three new objectives and optimize them using the well-known multi-objective evolutionary algorithm based on decomposition (MOEA/D). To the best of our knowledge, this is the first attempt at feature (gene) selection together with identification of non-linear dependency between features via a multi-objective strategy. Experimental results demonstrate the feasibility and effectiveness of the method on microarray cancer data. Investigating the auxiliary role of these selected gene subsets in co-regulation within biological pathways, and their occurrence in the pathogenesis of cancer, is interesting future work.
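The abstract names three objectives (subset size, relevance to the class labels, and non-linear dependency among the selected features) but does not define them, so the following is only an illustrative sketch and not the CoRNDS formulation. All function names here are hypothetical. Relevance is assumed to be the mean absolute Pearson correlation with numeric class labels, and non-linear dependency is assumed to be approximated by how much more variance a quadratic fit explains than a linear fit. The Tchebycheff scalarization at the end is one standard way MOEA/D turns several objectives into a single subproblem; the abstract does not say which decomposition the authors use.

```python
import itertools

import numpy as np


def r_squared(x, y, degree):
    """R^2 of a least-squares polynomial fit of y on x (inputs assumed non-constant)."""
    coeffs = np.polyfit(x, y, degree)
    residual = y - np.polyval(coeffs, x)
    centered = y - y.mean()
    return 1.0 - residual.dot(residual) / centered.dot(centered)


def nonlinear_gain(x, y):
    """Assumed proxy: extra variance a quadratic fit explains beyond a linear fit."""
    forward = r_squared(x, y, 2) - r_squared(x, y, 1)
    backward = r_squared(y, x, 2) - r_squared(y, x, 1)
    return max(forward, backward)  # take the larger direction so the score is symmetric


def objectives(X, labels, mask):
    """Three values to minimize for the feature subset selected by a boolean mask."""
    idx = np.flatnonzero(mask)
    size = idx.size / X.shape[1]  # fraction of features kept
    relevance = 0.0
    if idx.size > 0:
        relevance = np.mean([abs(np.corrcoef(X[:, j], labels)[0, 1]) for j in idx])
    dependency = 0.0
    if idx.size > 1:
        pairs = itertools.combinations(idx, 2)
        dependency = np.mean([nonlinear_gain(X[:, i], X[:, j]) for i, j in pairs])
    # Relevance and dependency are to be maximized, so they are negated here.
    return np.array([size, -relevance, -dependency])


def tchebycheff(f, weights, ideal):
    """Tchebycheff scalarization of an objective vector for one weight vector."""
    return float(np.max(weights * np.abs(f - ideal)))
```

In a decomposition-based search, each weight vector would define one scalar subproblem via `tchebycheff(objectives(X, labels, mask), weights, ideal)`, and candidate masks would be compared on that scalar value. A pair of features related by `y = x**2` scores close to 1 on `nonlinear_gain`, while a pair related by `y = 2*x + 1` scores close to 0, which matches the stated goal of favoring non-linear over linear dependence.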
