用于识别乳腺癌绝经状态的多组学生物标记物的机器学习模型

IF 1.8 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Algorithms Pub Date : 2023-12-28 DOI:10.3390/a17010013
Firas Alghanim, Ibrahim Al-Hurani, H. Qattous, Abdullah Al-Refai, Osamah Batiha, A. Alkhateeb, Salama Ikki
{"title":"用于识别乳腺癌绝经状态的多组学生物标记物的机器学习模型","authors":"Firas Alghanim, Ibrahim Al-Hurani, H. Qattous, Abdullah Al-Refai, Osamah Batiha, A. Alkhateeb, Salama Ikki","doi":"10.3390/a17010013","DOIUrl":null,"url":null,"abstract":"Identifying menopause-related breast cancer biomarkers is crucial for enhancing diagnosis, prognosis, and personalized treatment at that stage of the patient’s life. In this paper, we present a comprehensive framework for extracting multiomics biomarkers specifically related to breast cancer incidence before and after menopause. Our approach integrates DNA methylation, gene expression, and copy number alteration data using a systematic pipeline encompassing data preprocessing and handling class imbalance, dimensionality reduction, and classification. The framework starts with MutSigCV for data preprocessing and ensuring data quality. The Synthetic Minority Over-sampling Technique (SMOTE) up-sampling technique is applied to address the class imbalance representation. Then, Principal Component Analysis (PCA) transforms the DNA methylation, gene expression, and copy number alteration data into a latent space. The purpose is to discard irrelevant variations and extract relevant information. Finally, a classification model is built based on the transformed multiomics data into a unified representation. The framework contributes to understanding the complex interplay between menopause and breast cancer, thereby revealing more precise diagnostic and therapeutic strategies in the future. The explainable artificial intelligence model Shapley based on the XGBoost regressor showed the power of the selected gene expressions for predicting the menopause status, and the potential biomarkers included RUNX1, PTEN, MAP3K1, and CDH1. The literature confirmed the findings.","PeriodicalId":7636,"journal":{"name":"Algorithms","volume":"221 8","pages":""},"PeriodicalIF":1.8000,"publicationDate":"2023-12-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Machine Learning Model for Multiomics Biomarkers Identification for Menopause Status in Breast Cancer\",\"authors\":\"Firas Alghanim, Ibrahim Al-Hurani, H. Qattous, Abdullah Al-Refai, Osamah Batiha, A. Alkhateeb, Salama Ikki\",\"doi\":\"10.3390/a17010013\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Identifying menopause-related breast cancer biomarkers is crucial for enhancing diagnosis, prognosis, and personalized treatment at that stage of the patient’s life. In this paper, we present a comprehensive framework for extracting multiomics biomarkers specifically related to breast cancer incidence before and after menopause. Our approach integrates DNA methylation, gene expression, and copy number alteration data using a systematic pipeline encompassing data preprocessing and handling class imbalance, dimensionality reduction, and classification. The framework starts with MutSigCV for data preprocessing and ensuring data quality. The Synthetic Minority Over-sampling Technique (SMOTE) up-sampling technique is applied to address the class imbalance representation. Then, Principal Component Analysis (PCA) transforms the DNA methylation, gene expression, and copy number alteration data into a latent space. The purpose is to discard irrelevant variations and extract relevant information. Finally, a classification model is built based on the transformed multiomics data into a unified representation. The framework contributes to understanding the complex interplay between menopause and breast cancer, thereby revealing more precise diagnostic and therapeutic strategies in the future. The explainable artificial intelligence model Shapley based on the XGBoost regressor showed the power of the selected gene expressions for predicting the menopause status, and the potential biomarkers included RUNX1, PTEN, MAP3K1, and CDH1. The literature confirmed the findings.\",\"PeriodicalId\":7636,\"journal\":{\"name\":\"Algorithms\",\"volume\":\"221 8\",\"pages\":\"\"},\"PeriodicalIF\":1.8000,\"publicationDate\":\"2023-12-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Algorithms\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3390/a17010013\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Algorithms","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/a17010013","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

摘要

确定与更年期相关的乳腺癌生物标志物对于加强该阶段的诊断、预后和个性化治疗至关重要。在本文中,我们提出了一个提取与绝经前后乳腺癌发病率特别相关的多组学生物标志物的综合框架。我们的方法使用一个系统管道整合了 DNA 甲基化、基因表达和拷贝数改变数据,该管道包括数据预处理、类不平衡处理、降维和分类。该框架从 MutSigCV 开始,进行数据预处理并确保数据质量。应用合成少数群体过度采样技术(SMOTE)向上采样技术来处理类不平衡表示。然后,主成分分析法(PCA)将 DNA 甲基化、基因表达和拷贝数改变数据转化为潜在空间。这样做的目的是摒弃无关变异,提取相关信息。最后,根据转换后的多组学数据建立一个统一表示的分类模型。该框架有助于理解更年期与乳腺癌之间复杂的相互作用,从而揭示未来更精确的诊断和治疗策略。基于 XGBoost 回归器的可解释人工智能模型 Shapley 显示了所选基因表达预测绝经状态的能力,潜在的生物标志物包括 RUNX1、PTEN、MAP3K1 和 CDH1。文献证实了这些发现。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Machine Learning Model for Multiomics Biomarkers Identification for Menopause Status in Breast Cancer
Identifying menopause-related breast cancer biomarkers is crucial for enhancing diagnosis, prognosis, and personalized treatment at that stage of the patient’s life. In this paper, we present a comprehensive framework for extracting multiomics biomarkers specifically related to breast cancer incidence before and after menopause. Our approach integrates DNA methylation, gene expression, and copy number alteration data using a systematic pipeline encompassing data preprocessing and handling class imbalance, dimensionality reduction, and classification. The framework starts with MutSigCV for data preprocessing and ensuring data quality. The Synthetic Minority Over-sampling Technique (SMOTE) up-sampling technique is applied to address the class imbalance representation. Then, Principal Component Analysis (PCA) transforms the DNA methylation, gene expression, and copy number alteration data into a latent space. The purpose is to discard irrelevant variations and extract relevant information. Finally, a classification model is built based on the transformed multiomics data into a unified representation. The framework contributes to understanding the complex interplay between menopause and breast cancer, thereby revealing more precise diagnostic and therapeutic strategies in the future. The explainable artificial intelligence model Shapley based on the XGBoost regressor showed the power of the selected gene expressions for predicting the menopause status, and the potential biomarkers included RUNX1, PTEN, MAP3K1, and CDH1. The literature confirmed the findings.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Algorithms
Algorithms Mathematics-Numerical Analysis
CiteScore
4.10
自引率
4.30%
发文量
394
审稿时长
11 weeks
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信