Integrated Machine Learning Algorithms-Enhanced Predication for Cervical Cancer from Mass Spectrometry-Based Proteomics Data
Da Zhang, Lihong Zhao, Bo Guo, Aihong Guo, Jiangbo Ding, Dongdong Tong, Bingju Wang, Zhangjian Zhou
Bioengineering, volume 12, issue 3. Journal Article, published 2025-03-09.
DOI: 10.3390/bioengineering12030269 (https://doi.org/10.3390/bioengineering12030269)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11939187/pdf/
Abstract
Early diagnosis is critical for improving outcomes in cancer patients; however, the application of diagnostic markers derived from serum proteomic screening remains challenging. Artificial intelligence (AI), encompassing deep learning and machine learning (ML), has gained increasing prominence across various scientific disciplines. In this study, we utilized cervical cancer (CC) as a model to develop an AI-driven pipeline for the identification and validation of serum biomarkers for early cancer diagnosis, leveraging mass spectrometry-based proteomics data. By processing and normalizing serum polypeptide differential peaks from 240 patients, we employed eight distinct ML algorithms to classify and analyze these differential polypeptide peaks, subsequently constructing receiver operating characteristic (ROC) curves and confusion matrices. Key performance metrics, including accuracy, precision, recall, and F1 score, were systematically evaluated. Furthermore, by integrating feature importance values, Shapley values, and local interpretable model-agnostic explanation (LIME) values, we demonstrated that the diagnostic area under the curve (AUC) achieved by our multi-dimensional learning models approached 1, significantly outperforming the diagnostic AUC of single markers derived from the PRIDE database. These findings underscore the potential of proteomics-driven integrated machine learning as a robust strategy to enhance early cancer diagnosis, offering a promising avenue for clinical translation.
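The evaluation metrics named in the abstract (confusion matrix, accuracy, precision, recall, F1 score, and AUC) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and the AUC is computed with the standard pairwise-ranking formulation rather than any particular library.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 derived from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: 1 = cancer, 0 = control; scores are model outputs.
    y_true = [1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5
    print(confusion_counts(y_true, y_pred))
    print(classification_metrics(y_true, y_pred))
    print(roc_auc(y_true, scores))
```

Precision, recall, and F1 depend on the chosen threshold, whereas AUC depends only on how the scores rank positives against negatives. An AUC close to 1 therefore means that almost every positive sample scores above almost every negative sample.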
About the journal:
Aims
Bioengineering (ISSN 2306-5354) provides an advanced forum for the science and technology of bioengineering. It publishes original research papers, comprehensive reviews, communications and case reports. Our aim is to encourage scientists to publish their experimental and theoretical results in as much detail as possible. All aspects of bioengineering are welcomed from theoretical concepts to education and applications. There is no restriction on the length of the papers. The full experimental details must be provided so that the results can be reproduced. There are, in addition, four key features of this Journal:
● We are introducing a new concept in scientific and technical publications: “The Translational Case Report in Bioengineering”. It is a descriptive explanatory analysis of a transformative or translational event. Understanding that the goal of bioengineering scholarship is to advance towards a transformative or clinical solution to an identified transformative/clinical need, the translational case report is used to explore causation in order to find underlying principles that may guide other similar transformative/translational undertakings.
● Manuscripts regarding research proposals and research ideas will be particularly welcomed.
● Electronic files and software regarding the full details of the calculation and experimental procedure, if unable to be published in a normal way, can be deposited as supplementary material.
● We also accept manuscripts communicating to a broader audience with regard to research projects financed with public funds.
Scope
● Bionics and biological cybernetics: implantology; bio–abio interfaces
● Bioelectronics: wearable electronics; implantable electronics; “more than Moore” electronics; bioelectronics devices
● Bioprocess and biosystems engineering and applications: bioprocess design; biocatalysis; bioseparation and bioreactors; bioinformatics; bioenergy; etc.
● Biomolecular, cellular and tissue engineering and applications: tissue engineering; chromosome engineering; embryo engineering; cellular, molecular and synthetic biology; metabolic engineering; bio-nanotechnology; micro/nano technologies; genetic engineering; transgenic technology
● Biomedical engineering and applications: biomechatronics; biomedical electronics; biomechanics; biomaterials; biomimetics; biomedical diagnostics; biomedical therapy; biomedical devices; sensors and circuits; biomedical imaging and medical information systems; implants and regenerative medicine; neurotechnology; clinical engineering; rehabilitation engineering
● Biochemical engineering and applications: metabolic pathway engineering; modeling and simulation
● Translational bioengineering