Xiaowei Qin , Zhibin Bi , Wenbin Li , Huipeng Zhang , Ming Han , Kongxi Zhang , Jian Wu , Lei Huang
Computer Methods and Programs in Biomedicine, volume 272, Article 109064. Published 2025-09-03. DOI: 10.1016/j.cmpb.2025.109064
https://www.sciencedirect.com/science/article/pii/S016926072500481X
Machine learning-based plasma-derived extracellular vesicle signatures for digestive system cancers prediction
Background
Digestive system cancers (DSCs) are a heterogeneous group of malignancies characterized by poor prognosis and a lack of accurate early diagnostic methods. Traditional serological biomarkers and non-coding RNAs remain the commonly used diagnostic markers for these cancers, but their sensitivity and specificity are often limited. RNA in plasma-derived extracellular vesicles (PDEV) has emerged as a promising diagnostic tool for a variety of cancers, but its application to the detection of the various DSCs has not yet been fully explored.
Methods
PDEV sequencing data were integrated from the exoRBase 2.0 database, giving a total of 444 participants: 326 patients with DSCs and 118 healthy individuals. The dataset was divided into training and test sets. The PDEV-diagnostic model was constructed using various machine learning algorithms and underwent 5-fold cross-validation in the training set. The model's performance metrics were further evaluated in the test set. Additionally, the selected features were assessed using bulk RNA-seq and single-cell RNA-seq datasets for different DSCs.
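The split-then-cross-validate protocol described above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the abstract does not give the train/test ratio or say whether the splits were stratified, so the 30% hold-out fraction, the stratification, the random seed, and all function names here are assumptions. Only the cohort sizes (326 patients, 118 healthy individuals) and the choice of 5 folds come from the text.

```python
import random

def group_by_class(indices, labels, rng):
    """Return {class: shuffled list of indices} for the given subset."""
    groups = {}
    for i in indices:
        groups.setdefault(labels[i], []).append(i)
    for members in groups.values():
        rng.shuffle(members)
    return groups

def stratified_holdout(labels, test_frac, seed=0):
    """Split all sample indices into train/test, keeping class proportions."""
    rng = random.Random(seed)
    train, test = [], []
    for members in group_by_class(range(len(labels)), labels, rng).values():
        n_test = round(len(members) * test_frac)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return sorted(train), sorted(test)

def stratified_kfold(indices, labels, k=5, seed=0):
    """Yield (fit_idx, val_idx) pairs over `indices`, dealing each class round-robin."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for members in group_by_class(indices, labels, rng).values():
        for pos, i in enumerate(members):
            folds[pos % k].append(i)
    for f in range(k):
        val = sorted(folds[f])
        fit = sorted(i for g in range(k) if g != f for i in folds[g])
        yield fit, val

# Cohort sizes from the abstract: 1 = DSC patient, 0 = healthy individual.
labels = [1] * 326 + [0] * 118

# Hypothetical 70/30 hold-out; the real ratio is not stated in the abstract.
train_idx, test_idx = stratified_holdout(labels, test_frac=0.3)

# 5-fold cross-validation restricted to the training set, as in the Methods.
cv_splits = list(stratified_kfold(train_idx, labels, k=5))
for n, (fit, val) in enumerate(cv_splits, 1):
    cases = sum(labels[i] for i in val)
    print(f"fold {n}: fit={len(fit)} val={len(val)} cases_in_val={cases}")
```

Each candidate model would be fitted on `fit` and scored on `val` in every fold, with the held-out `test_idx` touched only once for the final evaluation.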
Results
Based on several feature selection methods and a comparison of 10 machine learning algorithms across seven metrics, the XGBoost model was selected as the PDEV-diagnostic model. It achieved AUCs of 0.83 and 0.94 in the training and test sets, respectively, using nine exosomal predictors for DSC prediction: BANK1, MALAT1, FGA, UBR4, ILR-7, FGB, PLPP5, PCAT19, and CIITA.
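For readers unfamiliar with the AUC figures quoted above, the metric can be computed directly from predicted scores as the probability that a randomly chosen patient receives a higher score than a randomly chosen healthy individual. The sketch below uses the rank-sum (Mann-Whitney) formulation; the function name and the toy scores are illustrative assumptions and are not data from the study.

```python
def roc_auc(y_true, scores):
    """AUC via the rank-sum statistic; tied scores share an average rank."""
    pairs = sorted(zip(scores, y_true))
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative sample")
    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        # Find the run of tied scores starting at position i.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(y for _, y in pairs[i:j])
        i = j
    # Subtract the smallest possible positive rank sum, then normalise.
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)

# Toy example (not study data): 1 = DSC patient, 0 = healthy individual.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(toy_labels, toy_scores))
```

An AUC of 0.5 corresponds to scores that carry no ranking information, and 1.0 to perfect separation of the two groups.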
Conclusions
The machine learning-based PDEV diagnostic model identified patients with DSCs with high accuracy (test-set AUC of 0.94). The nine exosomal mRNAs/lncRNAs therefore show promise as non-invasive biomarkers for DSC diagnosis.
About the journal:
To encourage the development of formal computing methods, and their application in biomedical research and medical practice, by illustration of fundamental principles in biomedical informatics research; to stimulate basic research into application software design; to report the state of research of biomedical information processing projects; to report new computer methodologies applied in biomedical areas; the eventual distribution of demonstrable software to avoid duplication of effort; to provide a forum for discussion and improvement of existing software; to optimize contact between national organizations and regional user groups by promoting an international exchange of information on formal methods, standards and software in biomedicine.
Computer Methods and Programs in Biomedicine covers computing methodology and software systems derived from computing science for implementation in all aspects of biomedical research and medical practice. It is designed to serve: biochemists; biologists; geneticists; immunologists; neuroscientists; pharmacologists; toxicologists; clinicians; epidemiologists; psychiatrists; psychologists; cardiologists; chemists; (radio)physicists; computer scientists; programmers and systems analysts; biomedical, clinical, electrical and other engineers; teachers of medical informatics and users of educational software.