Unsupervised Machine Learning for Differential Analysis in Proteomics
Authors: Guanyang Xu, Enhui Wu, Yuxiang Lin, Ling Lin, Liang Qiao
Journal: Analytical Chemistry
Published: 2025-10-06
DOI: https://doi.org/10.1021/acs.analchem.5c03117
Differential analysis in proteomics is pivotal for biomarker discovery and disease mechanism elucidation, yet traditional statistical methods are constrained by distributional assumptions and by their dependence on empirical fold-change thresholds. This study systematically evaluates 18 unsupervised anomaly detection machine learning (ML) algorithms against established statistical frameworks for differential protein detection from proteomic data sets. Using in silico simulated data sets derived from experimental data, we enabled cross-algorithm comparability through a probability-based transformation. Results demonstrated that ML methods, particularly the Minimum Covariance Determinant (MCD), outperformed statistical tests in recall, precision, and accuracy, with superior robustness to intersample heterogeneity. Validation on real-world proteomic data further confirmed that the MCD-identified differentially expressed proteins comprehensively covered canonical pathways while uncovering novel tumor-associated functional biomolecules. This work establishes unsupervised ML methods as robust alternatives to traditional hypothesis-driven statistical approaches in proteomics differential analysis, offering enhanced reliability for precision medicine research.
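To make the idea of treating differential proteins as anomalies concrete, the following is a minimal Python sketch of an MCD-style detector. It is not the authors' pipeline. Everything specific in it is an assumption made for illustration: each protein is summarized by two hypothetical features (mean log2 intensity in a control group and in a case group), the data are simulated with arbitrary parameters, the estimator uses a simplified concentration-step search rather than a full FAST-MCD implementation, and the chi-square tail probability stands in for the probability-based transformation mentioned in the abstract, whose exact form is not given there.

```python
import math
import random


def mean_cov(points):
    """Return the mean and 2x2 sample covariance (sxx, sxy, syy) of (x, y) pairs."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def sq_mahalanobis(p, mean, cov):
    """Squared Mahalanobis distance of p, using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = p[0] - mean[0], p[1] - mean[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def mcd_fit(points, support_fraction=0.75, n_starts=20, max_iter=50, seed=0):
    """Approximate MCD location and scatter via repeated concentration steps."""
    rng = random.Random(seed)
    n = len(points)
    h = int(support_fraction * n)  # size of the "clean" subset; must be < n
    best = None
    for _ in range(n_starts):
        subset = set(rng.sample(range(n), h))
        for _ in range(max_iter):
            mean, cov = mean_cov([points[i] for i in subset])
            ranked = sorted(range(n),
                            key=lambda i: sq_mahalanobis(points[i], mean, cov))
            refined = set(ranked[:h])  # keep the h points closest to the fit
            if refined == subset:
                break
            subset = refined
        mean, cov = mean_cov([points[i] for i in subset])
        det = cov[0] * cov[2] - cov[1] ** 2
        if best is None or det < best[0]:  # smallest determinant wins
            best = (det, mean, cov)
    _, mean, cov = best
    # Consistency factor for 2-D Gaussian data, so that distances are on the
    # chi-square scale: alpha / P(chi2_4 <= chi2_2 quantile at alpha).
    alpha = h / n
    factor = alpha / (1.0 - (1.0 - alpha) * (1.0 - math.log(1.0 - alpha)))
    return mean, tuple(factor * c for c in cov)


def simulate(n_proteins=200, n_diff=10, seed=1):
    """Hypothetical data: (control mean, case mean) log2 intensity per protein."""
    rng = random.Random(seed)
    spiked = set(rng.sample(range(n_proteins), n_diff))
    points = []
    for i in range(n_proteins):
        base = rng.gauss(20.0, 2.0)            # protein abundance level
        control = base + rng.gauss(0.0, 0.2)   # measurement noise
        case = base + rng.gauss(0.0, 0.2)
        if i in spiked:
            case += rng.choice((-2.0, 2.0))    # simulated log2 fold change
        points.append((control, case))
    return points, spiked


points, spiked = simulate()
mean, cov = mcd_fit(points)
# Assumed probability transform: chi-square survival function with 2 degrees
# of freedom, which has the closed form exp(-d^2 / 2).
p_values = [math.exp(-sq_mahalanobis(p, mean, cov) / 2.0) for p in points]
flagged = {i for i, p in enumerate(p_values) if p < 0.01}
print(f"flagged {len(flagged)} proteins; "
      f"recovered {len(flagged & spiked)} of {len(spiked)} spiked")
```

Because the scatter estimate is built only from the most concentrated subset of proteins, the spiked proteins do not inflate the covariance they are later scored against, which is the property that makes MCD attractive when samples are heterogeneous. In practice a library implementation such as scikit-learn's `MinCovDet` would be the more usual choice; the pure-Python version here only keeps the sketch dependency-free.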
About the journal:
Analytical Chemistry, a peer-reviewed research journal, focuses on disseminating new and original knowledge across all branches of analytical chemistry. Fundamental articles may explore general principles of chemical measurement science and need not directly address existing or potential analytical methodology. They can be entirely theoretical or report experimental results. Contributions may cover various phases of analytical operations, including sampling, bioanalysis, electrochemistry, mass spectrometry, microscale and nanoscale systems, environmental analysis, separations, spectroscopy, chemical reactions and selectivity, instrumentation, imaging, surface analysis, and data processing. Papers discussing known analytical methods should present a significant, original application of the method, a notable improvement, or results on an important analyte.