AI-driven discovery of novel extracellular matrix biomarkers in pelvic organ prolapse.
Yanlin Mi, Ben Cahill, Venkata V B Yallapragada, Reut Rotem, Barry A O'Reilly, Sabin Tabirca
PLoS Computational Biology, 21(10): e1013483. Published 2025-10-07. DOI: https://doi.org/10.1371/journal.pcbi.1013483
Abstract
Deep learning for protein function prediction faces significant challenges in identifying disease-specific proteins. We present Extracellular Matrix Protein Predictor (EPOP), an advanced transfer learning framework leveraging protein language models to decode disease mechanisms. Focusing on pelvic organ prolapse (POP), which affects up to 50% of women worldwide, EPOP demonstrates AI's power to reveal novel therapeutic targets. We developed a sophisticated fine-tuning protocol for the ESM-2 model, optimized for ECM protein prediction. Our architecture integrates specialized attention mechanisms with interpretability modules, trained on expertly curated and balanced datasets totaling 80,000 proteins (40,000 ECM and 40,000 non-ECM). The framework employs a novel validation strategy using a 16,000-sample independent test set and clinical proteomics data. EPOP achieved unprecedented performance (99.40% accuracy) in ECM protein classification, significantly surpassing traditional deep learning architectures (10.81% improvement over Transformer models, 21.71% over Long Short-Term Memory). Applied to clinical samples, our model revealed a previously unknown pattern of ECM remodeling, identifying 24 novel disease-associated proteins. Model interpretability analysis uncovered specific sequence motifs and structural features critical for ECM protein function, providing mechanistic insights into disease progression. EPOP demonstrates how advanced AI bridges molecular analysis and clinical applications, uncovering novel therapeutic targets. Its success suggests broader applications across ECM-related disorders, potentially transforming approaches to diseases affecting connective tissue architecture.
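The abstract reports a balanced 80,000-protein dataset and a 16,000-sample independent test set but does not say how that test set was drawn. As a minimal sketch, assuming it is a 20% stratified hold-out of the 80,000 proteins (an assumption, not something the abstract confirms), the split and the accuracy metric could look like the following. The names `stratified_split` and `accuracy` are hypothetical helpers, and the labels are placeholders rather than real data.

```python
import random


def stratified_split(labels, test_fraction, seed=0):
    """Return (train_idx, test_idx), preserving the class ratio in both parts."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Placeholder labels: 1 = ECM, 0 = non-ECM, 40,000 of each (sizes from the abstract).
labels = [1] * 40_000 + [0] * 40_000

# Assumed 20% hold-out; the abstract only states the resulting test-set size.
train_idx, test_idx = stratified_split(labels, test_fraction=0.2)
n_test_ecm = sum(labels[i] for i in test_idx)

print(len(train_idx), len(test_idx), n_test_ecm)
```

Stratifying by class keeps the test set balanced like the training data, so plain accuracy remains a meaningful summary. On an imbalanced split it would need to be read alongside per-class metrics.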
About the journal:
PLOS Computational Biology features works of exceptional significance that further our understanding of living systems at all scales—from molecules and cells, to patient populations and ecosystems—through the application of computational methods. Readers include life and computational scientists, who can take the important findings presented here to the next level of discovery.
Research articles must be declared as belonging to a relevant section. More information about the sections can be found in the submission guidelines.
Research articles should model aspects of biological systems, demonstrate both methodological and scientific novelty, and provide profound new biological insights.
Generally, reliability and significance of biological discovery through computation should be validated and enriched by experimental studies. Inclusion of experimental validation is not required for publication, but should be referenced where possible. Inclusion of experimental validation of a modest biological discovery through computation does not render a manuscript suitable for PLOS Computational Biology.
Research articles specifically designated as Methods papers should describe outstanding methods of exceptional importance that have been shown, or have the promise to provide new biological insights. The method must already be widely adopted, or have the promise of wide adoption by a broad community of users. Enhancements to existing published methods will only be considered if those enhancements bring exceptional new capabilities.