A multivariate feature selection framework for high dimensional biomedical data classification

Abeer Alzubaidi, G. Cosma
Published in: 2017 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), 2017-08-01
DOI: 10.1109/CIBCB.2017.8058528 (https://doi.org/10.1109/CIBCB.2017.8058528)
Citations: 3

Abstract

High dimensional biomedical data are becoming common in predictive models developed for disease diagnosis and prognosis. Extracting knowledge from high dimensional data, which contain a large number of features and a small number of samples, presents intrinsic challenges for classification models. Genetic Algorithms can be adopted to search high dimensional spaces efficiently, and multivariate classification methods can be used to evaluate combinations of features for constructing optimized predictive models. This paper proposes a framework for building prediction models for high dimensional biomedical data. The proposed framework comprises three main phases: the feature filtering phase, which filters out noisy features; the feature selection phase, which uses multivariate machine learning techniques and the Genetic Algorithm to evaluate the filtered features and select the most informative subsets of features for achieving maximum classification performance; and the predictive modeling phase, during which machine learning algorithms are trained on the selected features to construct a reliable prediction model. Experiments were conducted using four high dimensional biomedical datasets, including protein and gene expression data. The results revealed optimistic performance for the multivariate selection approaches that utilize classification measurements based on implicit assumptions.
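
The three phases described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the abstract does not name the filter criterion, the classifier, or the Genetic Algorithm operators, so the class-mean-difference filter, the nearest-centroid classifier, the leave-one-out fitness, the truncation selection with one-point crossover, the synthetic data, and every function name below are assumptions chosen to keep the example dependency-free.

```python
import random

def make_data(n_samples=40, n_features=60, informative=(0, 1, 2), seed=0):
    """Synthetic two-class data (assumption): only `informative` columns carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            row[j] += 3.0 * label  # shift class 1 on the informative features
        X.append(row)
        y.append(label)
    return X, y

def filter_features(X, y, keep):
    """Phase 1 (assumed criterion): keep the features with the largest class-mean gap."""
    def score(j):
        a = [r[j] for r, t in zip(X, y) if t == 0]
        b = [r[j] for r, t in zip(X, y) if t == 1]
        return abs(sum(a) / len(a) - sum(b) / len(b))
    return sorted(range(len(X[0])), key=score, reverse=True)[:keep]

def centroid_predict(X_train, y_train, cols, x):
    """Nearest-centroid classifier restricted to the columns in `cols`."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(y_train)):
        rows = [r for r, t in zip(X_train, y_train) if t == label]
        dist = sum((x[j] - sum(r[j] for r in rows) / len(rows)) ** 2 for j in cols)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def loo_accuracy(X, y, cols):
    """Leave-one-out accuracy of the classifier on one feature subset."""
    if not cols:
        return 0.0
    hits = 0
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        hits += centroid_predict(X_tr, y_tr, cols, X[i]) == y[i]
    return hits / len(X)

def ga_select(X, y, candidates, pop_size=12, generations=8, seed=1):
    """Phase 2: GA over bit masks of `candidates`; fitness is leave-one-out accuracy."""
    rng = random.Random(seed)

    def cols(mask):
        return [c for c, bit in zip(candidates, mask) if bit]

    def fitness(mask):
        return loo_accuracy(X, y, cols(mask))

    pop = [[rng.randint(0, 1) for _ in candidates] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]               # truncation selection, keeps the elite
        children = []
        while len(parents) + len(children) < pop_size:
            p, q = rng.sample(parents, 2)
            cut = rng.randrange(1, len(candidates))  # one-point crossover
            child = p[:cut] + q[cut:]
            k = rng.randrange(len(child))            # flip exactly one bit as mutation
            child[k] = 1 - child[k]
            children.append(child)
        pop = parents + children
    best = max(pop, key=fitness)
    return cols(best), fitness(best)

# Hold out a test split before any filtering or selection takes place.
X, y = make_data()
X_train, y_train, X_test, y_test = X[:30], y[:30], X[30:], y[30:]

candidates = filter_features(X_train, y_train, keep=10)      # phase 1
selected, cv_acc = ga_select(X_train, y_train, candidates)   # phase 2
# Phase 3: fit the final model on the selected features, score it on unseen data.
test_acc = sum(
    centroid_predict(X_train, y_train, selected, x) == t for x, t in zip(X_test, y_test)
) / len(y_test)
print(selected, cv_acc, test_acc)
```

The held-out split is a design choice of this sketch: because filtering and the GA both look at class labels, a score computed on the same samples used for selection tends to be optimistic, so the final accuracy is measured on samples that neither phase has seen.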