Shuilin Liao, Haonan Long, Qi Zhu, Shoujiang Li, Le Li, Shanghui Lu, Nan Tang, Yong Liang, Ming Dong
BMC Genomics 26(1):872. Published 2025-09-29. DOI: 10.1186/s12864-025-12093-9 (https://doi.org/10.1186/s12864-025-12093-9). Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12482335/pdf/. Impact factor: 3.7; JCR: Q2 (Biotechnology & Applied Microbiology).
MWENA: a novel sample re-weighting-based algorithm for disease classification and data interpretation using extracellular vesicles omics data.
Background and objective: Extracellular vesicles (EVs), regarded as a form of liquid biopsy, have gained significant attention in recent years because they are stable and preserve disease markers. Studies underscore the clinical significance of molecules found in EVs and highlight their role as mediators of communication between cells. However, analyzing EV omics data is challenging: measurements are noisy, variables far outnumber samples, and some groups (e.g., disease subtypes or experimental conditions) have much less data than others. We therefore develop an algorithm that addresses these challenges in the classification of imbalanced EV omics data.
Methods and results: We propose the EV Meta-Weight Elastic Net Algorithm (MWENA), which uses logistic regression with elastic net regularization to classify samples and identify EV signatures, addressing the challenges posed by high-dimensional data with small sample sizes. To mitigate class imbalance and high noise levels, MWENA incorporates an automatic sample re-weighting function, which uses a meta-net to adaptively learn generalizable patterns directly from the data. We validate MWENA on both simulated data and EV omics data, covering six classification tasks that involve four diseases (pancreatic ductal adenocarcinoma, interstitial lung diseases, colorectal cancer, and ovarian cancer) and three clinical scenarios (disease diagnosis, disease-stage screening, and disease-subtype classification). Compared with other machine learning methods, MWENA is better at identifying samples from small classes and achieves the highest sensitivity and G-mean scores. Biological analysis is also performed to explore the significance of the selected signatures as biological markers and their roles in disease mechanisms.
Conclusions: We anticipate that the proposed approach will be a modest step toward harnessing EV omics data to discover biomarkers, helping researchers gain a comprehensive understanding of biological processes.
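As a rough illustration of the ingredients named in the abstract, the sketch below evaluates a sample-weighted logistic loss with an elastic net penalty and computes sensitivity and the G-mean. It is not the authors' implementation. The per-sample weights are passed in as fixed numbers, whereas MWENA learns them with a meta-net. The penalty form `lam * (alpha * L1 + 0.5 * (1 - alpha) * L2)`, the normalization by the sum of weights, and all function and parameter names are assumptions made for this example.

```python
import math


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def weighted_elastic_net_loss(coef, intercept, X, y, sample_weight, lam, alpha):
    """Sample-weighted logistic loss plus an elastic net penalty.

    Labels in y are 0 or 1. The weights are fixed inputs here (assumption);
    in MWENA they would come from the learned re-weighting function.
    """
    total = 0.0
    for row, label, weight in zip(X, y, sample_weight):
        z = intercept + sum(c * x for c, x in zip(coef, row))
        # Logistic loss for a 0/1 label: log(1 + exp(z)) - label * z
        total += weight * (softplus(z) - label * z)
    data_term = total / sum(sample_weight)
    l1 = sum(abs(c) for c in coef)
    l2 = sum(c * c for c in coef)
    # Assumed penalty convention: alpha mixes the L1 and squared L2 terms.
    return data_term + lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2)


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity), treating label 1 as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens, spec


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity."""
    sens, spec = sensitivity_specificity(y_true, y_pred)
    return math.sqrt(sens * spec)


if __name__ == "__main__":
    # Toy imbalanced data: two minority (1) and four majority (0) samples.
    X = [[1.0, 0.5], [0.8, 0.1], [-1.0, 0.2], [-0.7, 0.0], [-1.2, 0.3], [0.9, 0.4]]
    y = [1, 1, 0, 0, 0, 0]
    weights = [2.0, 2.0, 1.0, 1.0, 1.0, 1.0]  # hypothetical up-weighting
    print(weighted_elastic_net_loss([1.0, 0.0], 0.0, X, y, weights, 0.1, 0.5))
    print(g_mean(y, [1, 0, 0, 0, 0, 1]))
```

The G-mean rewards classifiers that do well on both classes: a model that predicts only the majority class has zero sensitivity and therefore a G-mean of zero, regardless of its overall accuracy.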
About the journal:
BMC Genomics is an open access, peer-reviewed journal that considers articles on all aspects of genome-scale analysis, functional genomics, and proteomics.
BMC Genomics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.