MWENA: a novel sample re-weighting-based algorithm for disease classification and data interpretation using extracellular vesicles omics data.

IF 3.7 · Zone 2 (Biology) · JCR Q2: BIOTECHNOLOGY & APPLIED MICROBIOLOGY

BMC Genomics · Pub Date: 2025-09-29 · DOI: 10.1186/s12864-025-12093-9

Shuilin Liao, Haonan Long, Qi Zhu, Shoujiang Li, Le Li, Shanghui Lu, Nan Tang, Yong Liang, Ming Dong


Citations: 0

Abstract



Background and objective: Extracellular vesicles (EVs), regarded as a form of liquid biopsy, have gained significant attention in recent years because they are stable and preserve disease markers. Studies underscore the clinical significance of molecules found in EVs and highlight their role as mediators of communication between cells. However, analyzing EV omics data is challenging: measurements are noisy, variables far outnumber samples, and some groups (e.g., disease subtypes or experimental conditions) have much less data than others. We therefore develop an algorithm that addresses these challenges in the classification of imbalanced EV omics data.

Methods and results: We propose the EV Meta-Weight Elastic Net Algorithm (MWENA), which uses logistic regression with elastic net regularization to classify samples and identify EV signatures, addressing the challenges of high-dimensional data with small sample sizes. To mitigate class imbalance and high noise levels, MWENA incorporates an automatic sample re-weighting function, which uses a meta-net to adaptively learn generalizable patterns directly from the data. We validate MWENA on both simulated data and EV omics data, covering six classification tasks that involve four diseases (pancreatic ductal adenocarcinoma, interstitial lung diseases, colorectal cancer, and ovarian cancer) and three clinical scenarios (disease diagnosis, disease-stage screening, and disease-subtype classification). Compared with other machine learning methods, MWENA is better at identifying samples from small (minority) classes and achieves the highest scores in both sensitivity and G-means. Biological analysis is also performed to further explore the significance of the selected signatures as biological markers and their roles in disease mechanisms.
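The abstract does not spell out the objective function or the meta-net, so the following is only a minimal sketch of the ingredients it names, under stated assumptions: a sample-weighted logistic loss with an elastic net penalty, and the sensitivity and G-mean metrics (taken here to be the geometric mean of sensitivity and specificity). The function names, the `lam`/`alpha` parameterization of the penalty, and the fixed inverse class-frequency weights (a simple stand-in for weights that MWENA learns with a meta-net) are all illustrative choices, not the authors' implementation.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def weighted_enet_loss(X, y, w, beta, b, lam=0.1, alpha=0.5):
    """Sample-weighted logistic loss plus an elastic net penalty.

    X: list of feature rows, y: 0/1 labels, w: per-sample weights,
    beta: coefficients, b: intercept.
    Assumed penalty form: lam * (alpha * L1 + (1 - alpha) / 2 * L2).
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for xi, yi, wi in zip(X, y, w):
        p = sigmoid(sum(bj * xj for bj, xj in zip(beta, xi)) + b)
        total -= wi * (yi * math.log(p + eps) + (1 - yi) * math.log(1 - p + eps))
    l1 = sum(abs(bj) for bj in beta)
    l2 = sum(bj * bj for bj in beta)
    return total / sum(w) + lam * (alpha * l1 + 0.5 * (1 - alpha) * l2)


def inverse_frequency_weights(y):
    # Hypothetical stand-in for learned weights: each class receives the
    # same total weight. Assumes both classes are present in y.
    n = len(y)
    n_pos = sum(y)
    n_neg = n - n_pos
    return [n / (2.0 * n_pos) if yi == 1 else n / (2.0 * n_neg) for yi in y]


def sensitivity_and_gmean(y_true, y_pred):
    # Assumes y_true contains at least one positive and one negative label.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return sens, math.sqrt(sens * spec)
```

With one positive among four samples, `inverse_frequency_weights([1, 0, 0, 0])` gives the single minority sample as much total weight as the three majority samples combined, which is the effect re-weighting is meant to have on an imbalanced loss. A learned weighting function could additionally down-weight noisy samples, which a fixed class-frequency rule cannot do.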

Conclusions: We anticipate that the proposed approach will be a modest step toward harnessing EV omics data to discover biomarkers, helping researchers gain a comprehensive understanding of biological processes.


Source journal

BMC Genomics (Biology: Biotechnology & Applied Microbiology)

CiteScore: 7.40
Self-citation rate: 4.50%
Articles published: 769
Review time: 6.4 months

About the journal: BMC Genomics is an open access, peer-reviewed journal that considers articles on all aspects of genome-scale analysis, functional genomics, and proteomics. BMC Genomics is part of the BMC series, which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.