Asmaa M. Hassan, Safaa M. Naeem, Mohamed A. A. Eldosoky, Mai S. Mabrouk
{"title":"基于多组学的机器学习用于乳腺癌亚型分类","authors":"Asmaa M. Hassan, Safaa M. Naeem, Mohamed A. A. Eldosoky, Mai S. Mabrouk","doi":"10.1007/s13369-024-09341-7","DOIUrl":null,"url":null,"abstract":"<p>Cancer is a complicated disease that produces deregulatory changes in cellular activities (such as proteins). Data from these levels must be integrated into multi-omics analyses to better understand cancer and its progression. Deep learning approaches have recently helped with multi-omics analysis of cancer data. Breast cancer is a prevalent form of cancer among women, resulting from a multitude of clinical, lifestyle, social, and economic factors. The goal of this study was to predict breast cancer using several machine learning methods. We applied the architecture for mono-omics data analysis of the Cancer Genome Atlas Breast Cancer datasets in our analytical investigation. The following classifiers were used: random forest, partial least squares, Naive Bayes, decision trees, neural networks, and Lasso regularization. They were used and evaluated using the area under the curve metric. The random forest classifier and the Lasso regularization classifier achieved the highest area under the curve values of 0.99 each. These areas under the curve values were obtained using the mono-omics data employed in this investigation. The random forest and Lasso regularization classifiers achieved the maximum prediction accuracy, showing that they are appropriate for this problem. For all mono-omics classification models used in this paper, random forest and Lasso regression offer the best results for all metrics (precision, recall, and F1 score). The integration of various risk factors in breast cancer prediction modeling can aid in early diagnosis and treatment, utilizing data collection, storage, and intelligent systems for disease management. The integration of diverse risk factors in breast cancer prediction modeling holds promise for early diagnosis and treatment. Leveraging data collection, storage, and intelligent systems can further enhance disease management strategies, ultimately contributing to improved patient outcomes.</p>","PeriodicalId":8109,"journal":{"name":"Arabian Journal for Science and Engineering","volume":"2 1","pages":""},"PeriodicalIF":2.9000,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Multi-omics-based Machine Learning for the Subtype Classification of Breast Cancer\",\"authors\":\"Asmaa M. Hassan, Safaa M. Naeem, Mohamed A. A. Eldosoky, Mai S. Mabrouk\",\"doi\":\"10.1007/s13369-024-09341-7\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Cancer is a complicated disease that produces deregulatory changes in cellular activities (such as proteins). Data from these levels must be integrated into multi-omics analyses to better understand cancer and its progression. Deep learning approaches have recently helped with multi-omics analysis of cancer data. Breast cancer is a prevalent form of cancer among women, resulting from a multitude of clinical, lifestyle, social, and economic factors. The goal of this study was to predict breast cancer using several machine learning methods. We applied the architecture for mono-omics data analysis of the Cancer Genome Atlas Breast Cancer datasets in our analytical investigation. The following classifiers were used: random forest, partial least squares, Naive Bayes, decision trees, neural networks, and Lasso regularization. They were used and evaluated using the area under the curve metric. The random forest classifier and the Lasso regularization classifier achieved the highest area under the curve values of 0.99 each. These areas under the curve values were obtained using the mono-omics data employed in this investigation. The random forest and Lasso regularization classifiers achieved the maximum prediction accuracy, showing that they are appropriate for this problem. For all mono-omics classification models used in this paper, random forest and Lasso regression offer the best results for all metrics (precision, recall, and F1 score). The integration of various risk factors in breast cancer prediction modeling can aid in early diagnosis and treatment, utilizing data collection, storage, and intelligent systems for disease management. The integration of diverse risk factors in breast cancer prediction modeling holds promise for early diagnosis and treatment. Leveraging data collection, storage, and intelligent systems can further enhance disease management strategies, ultimately contributing to improved patient outcomes.</p>\",\"PeriodicalId\":8109,\"journal\":{\"name\":\"Arabian Journal for Science and Engineering\",\"volume\":\"2 1\",\"pages\":\"\"},\"PeriodicalIF\":2.9000,\"publicationDate\":\"2024-09-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Arabian Journal for Science and Engineering\",\"FirstCategoryId\":\"103\",\"ListUrlMain\":\"https://doi.org/10.1007/s13369-024-09341-7\",\"RegionNum\":4,\"RegionCategory\":\"综合性期刊\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"Multidisciplinary\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Arabian Journal for Science and Engineering","FirstCategoryId":"103","ListUrlMain":"https://doi.org/10.1007/s13369-024-09341-7","RegionNum":4,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"Multidisciplinary","Score":null,"Total":0}
引用次数: 0
摘要
癌症是一种复杂的疾病,会导致细胞活动(如蛋白质)发生脱节变化。必须将这些层面的数据整合到多组学分析中,才能更好地了解癌症及其进展。最近,深度学习方法为癌症数据的多组学分析提供了帮助。乳腺癌是女性中的一种常见癌症,由多种临床、生活方式、社会和经济因素导致。本研究的目标是使用多种机器学习方法预测乳腺癌。我们在分析调查中应用了癌症基因组图谱乳腺癌数据集的单组学数据分析架构。我们使用了以下分类器:随机森林、偏最小二乘、奈夫贝叶斯、决策树、神经网络和拉索正则化。我们使用曲线下面积指标对这些分类器进行了评估。随机森林分类器和 Lasso 正则化分类器的曲线下面积值最高,均为 0.99。这些曲线下面积值是使用本研究中使用的单组学数据获得的。随机森林分类器和 Lasso 正则化分类器的预测准确率最高,表明它们适用于这一问题。在本文使用的所有单组学分类模型中,随机森林和拉索回归在所有指标(精确度、召回率和 F1 分数)上都取得了最佳结果。在乳腺癌预测模型中整合各种风险因素有助于早期诊断和治疗,利用数据收集、存储和智能系统进行疾病管理。在乳腺癌预测建模中整合各种风险因素,有望实现早期诊断和治疗。利用数据收集、存储和智能系统可以进一步加强疾病管理策略,最终有助于改善患者的治疗效果。
Multi-omics-based Machine Learning for the Subtype Classification of Breast Cancer
Cancer is a complicated disease that produces deregulatory changes in cellular activities (such as proteins). Data from these levels must be integrated into multi-omics analyses to better understand cancer and its progression. Deep learning approaches have recently helped with multi-omics analysis of cancer data. Breast cancer is a prevalent form of cancer among women, resulting from a multitude of clinical, lifestyle, social, and economic factors. The goal of this study was to predict breast cancer using several machine learning methods. We applied the architecture for mono-omics data analysis of the Cancer Genome Atlas Breast Cancer datasets in our analytical investigation. The following classifiers were used: random forest, partial least squares, Naive Bayes, decision trees, neural networks, and Lasso regularization. They were used and evaluated using the area under the curve metric. The random forest classifier and the Lasso regularization classifier achieved the highest area under the curve values of 0.99 each. These areas under the curve values were obtained using the mono-omics data employed in this investigation. The random forest and Lasso regularization classifiers achieved the maximum prediction accuracy, showing that they are appropriate for this problem. For all mono-omics classification models used in this paper, random forest and Lasso regression offer the best results for all metrics (precision, recall, and F1 score). The integration of various risk factors in breast cancer prediction modeling can aid in early diagnosis and treatment, utilizing data collection, storage, and intelligent systems for disease management. The integration of diverse risk factors in breast cancer prediction modeling holds promise for early diagnosis and treatment. Leveraging data collection, storage, and intelligent systems can further enhance disease management strategies, ultimately contributing to improved patient outcomes.
期刊介绍:
King Fahd University of Petroleum & Minerals (KFUPM) partnered with Springer to publish the Arabian Journal for Science and Engineering (AJSE).
AJSE, which has been published by KFUPM since 1975, is a recognized national, regional and international journal that provides a great opportunity for the dissemination of research advances from the Kingdom of Saudi Arabia, MENA and the world.