OmicPredict: a framework for omics data prediction using ANOVA-Firefly algorithm for feature selection.

IF 1.7 | Zone 4 (Medicine) | JCR Q3: COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS
Parampreet Kaur, Ashima Singh, Inderveer Chana
Journal: Computer Methods in Biomechanics and Biomedical Engineering
DOI: 10.1080/10255842.2023.2268236 (https://doi.org/10.1080/10255842.2023.2268236)
Published online (Epub): 2023-10-16; publication date: 2024-11-01
Publication type: Journal Article
Citations: 0

Abstract

High-throughput technologies and machine learning (ML) enable efficient analysis of large pools of medical data such as omics data. Recent research aims to develop and apply ML models that predict disease in a timely manner from available omics datasets. The present work proposes a framework, 'OmicPredict', that combines a hybrid feature selection method with a deep neural network (DNN) model to predict multiple diseases from omics data. The hybrid feature selection method is built on the Analysis of Variance (ANOVA) technique and the firefly algorithm. The OmicPredict framework is applied to three case studies: Alzheimer's disease, breast cancer, and coronavirus disease 2019 (COVID-19). In the Alzheimer's disease case study, the framework predicts patients using the GSE33000 and GSE44770 datasets. In the breast cancer case study, it predicts human epidermal growth factor receptor 2 (HER2) subtype status using the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) dataset. In the COVID-19 case study, it classifies patients using the GSE157103 dataset. The experimental results show that the DNN model achieved an Area Under Curve (AUC) score of 0.949 on the Alzheimer's (GSE33000 and GSE44770) datasets. Furthermore, it achieved AUC scores of 0.987 and 0.989 on the breast cancer (METABRIC) and COVID-19 (GSE157103) datasets, respectively, outperforming Random Forest and Naïve Bayes models as well as existing research.
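
To make the idea of a hybrid ANOVA-firefly feature selector concrete, the following is a minimal sketch and not the authors' implementation. It assumes that an ANOVA F-score filter first keeps the top-ranked features and that a standard firefly search then refines that subset; the abstract does not state how the two steps are combined. The function names (anova_f, fitness, firefly_select, anova_firefly), the nearest-centroid fitness with a per-feature penalty, the 0.5 binarisation threshold, all parameter values, and the synthetic data are hypothetical choices made for illustration. The paper itself evaluates the selected features with a DNN on real omics datasets.

```python
import math
import random

def anova_f(feature, labels):
    """One-way ANOVA F-statistic of a single feature across class groups."""
    groups = {}
    for x, t in zip(feature, labels):
        groups.setdefault(t, []).append(x)
    n, k = len(feature), len(groups)
    grand = sum(feature) / n
    between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups.values())
    within = sum((x - sum(g) / len(g)) ** 2 for g in groups.values() for x in g)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return (between / (k - 1)) / (within / (n - k))

def to_mask(position):
    """Binarise a continuous firefly position (assumed threshold: 0.5)."""
    return [p > 0.5 for p in position]

def fitness(mask, X, y, penalty=0.01):
    """Illustrative fitness: nearest-centroid training accuracy minus a size penalty."""
    cols = [j for j, m in enumerate(mask) if m]
    if not cols:
        return 0.0
    centroids = {}
    for label in set(y):
        rows = [r for r, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for r, t in zip(X, y):
        pred = min(centroids, key=lambda c: sum(
            (r[j] - v) ** 2 for j, v in zip(cols, centroids[c])))
        correct += pred == t
    return correct / len(y) - penalty * len(cols)

def firefly_select(X, y, n_fireflies=8, n_iter=15, beta0=1.0, gamma=1.0,
                   alpha=0.2, seed=0):
    """Firefly search over feature masks; brighter (fitter) fireflies attract dimmer ones."""
    rng = random.Random(seed)
    d = len(X[0])
    swarm = [[rng.random() for _ in range(d)] for _ in range(n_fireflies)]
    light = [fitness(to_mask(p), X, y) for p in swarm]
    best = max(range(n_fireflies), key=light.__getitem__)
    best_mask, best_fit = to_mask(swarm[best]), light[best]
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if light[j] > light[i]:
                    r2 = sum((a - b) ** 2 for a, b in zip(swarm[i], swarm[j]))
                    beta = beta0 * math.exp(-gamma * r2)  # attractiveness decays with distance
                    swarm[i] = [min(1.0, max(0.0, a + beta * (b - a)
                                             + alpha * (rng.random() - 0.5)))
                                for a, b in zip(swarm[i], swarm[j])]
                    light[i] = fitness(to_mask(swarm[i]), X, y)
                    if light[i] > best_fit:
                        best_mask, best_fit = to_mask(swarm[i]), light[i]
    return best_mask, best_fit

def anova_firefly(X, y, top_k=10, **kwargs):
    """Assumed hybrid: ANOVA filter keeps top_k features, firefly search refines them."""
    d = len(X[0])
    scores = [anova_f([r[j] for r in X], y) for j in range(d)]
    kept = sorted(range(d), key=lambda j: scores[j], reverse=True)[:top_k]
    X_reduced = [[r[j] for j in kept] for r in X]
    mask, fit = firefly_select(X_reduced, y, **kwargs)
    return [j for j, m in zip(kept, mask) if m], fit

# Synthetic stand-in for an expression matrix: 60 samples, 30 features,
# where only features 0 and 1 differ between the two classes.
data_rng = random.Random(1)
y = [i % 2 for i in range(60)]
X = [[data_rng.gauss(3.0 * t if j < 2 else 0.0, 1.0) for j in range(30)] for t in y]
selected, fit = anova_firefly(X, y, top_k=8)
print("selected feature indices:", selected, "fitness:", round(fit, 3))
```

The filter step keeps the wrapper search cheap, because the firefly swarm only explores masks over the top_k features rather than the full feature space. In a realistic setting the fitness would be computed on held-out data rather than on the training samples used here.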

Source journal
CiteScore: 4.10
Self-citation rate: 6.20%
Articles published: 179
Review time: 4-8 weeks
Journal description: The primary aims of Computer Methods in Biomechanics and Biomedical Engineering are to provide a means of communicating the advances being made in the areas of biomechanics and biomedical engineering and to stimulate interest in the continually emerging computer-based technologies that are being applied in these multidisciplinary subjects. Computer Methods in Biomechanics and Biomedical Engineering will also provide a focus for the importance of integrating the disciplines of engineering with medical technology and clinical expertise. Such integration will have a major impact on health care in the future.