Matthew J. K. Vince, Kristin A. Hughes, Anastasiya Buzuk, Deborah L. Perlstein, Lauren A. Viarengo-Baker, Adrian Whitty
{"title":"Getting Started with Machine Learning for Experimental Biochemists and Other Molecular Scientists","authors":"Matthew J. K. Vince, Kristin A. Hughes, Anastasiya Buzuk, Deborah L. Perlstein, Lauren A. Viarengo-Baker, Adrian Whitty","doi":"10.1002/cpz1.70085","DOIUrl":null,"url":null,"abstract":"<p>Machine learning (ML) is rapidly gaining traction in many areas of experimental molecular science for elucidating relationships and patterns in large or complex data sets. Historically, ML was largely the preserve of those with specialized training in fields such as statistics or cheminformatics. Increasingly, however, ML methodologies are becoming part of the standard toolkit for experimental scientists across a range of disciplines. For scientists without a significant background in computer science or statistics, lowering the barrier of entry to these ML techniques is important to broadening access to these powerful methods. Here we provide detailed, step-by-step protocols for performing four ML methods that are particularly useful for applications in biochemistry, cell biology, and drug discovery: hierarchical clustering, principal component analysis (PCA), partial least squares discriminant analysis (PLSDA), and partial least squares regression (PLSR). The protocols are written for the widely used software MATLAB, but no prior experience with MATLAB is required to use them. We include an explanation of each step, pitched at a level to be understood by investigators without any prior experience with ML, MATLAB, or any kind of coding. We also highlight the scientific issues pertaining to selecting and scaling the data to be analyzed. Throughout, we emphasize the relationship between the scientific question and how to choose data and methods that will allow it to be addressed in a meaningful way. Our aim is to provide a basic introduction that will equip experimental chemical biologists, chemists, and other biomedical scientists with the knowledge required to use ML to aid in the design of experiments, the formulation and data-driven testing of hypotheses, and the analysis of experimental data. © 2025 Wiley Periodicals LLC.</p><p><b>Basic Protocol 1</b>: Clustering</p><p><b>Basic Protocol 2</b>: Principal component analysis</p><p><b>Basic Protocol 3</b>: Partial least squares-discriminant analysis</p><p><b>Basic Protocol 4</b>: Partial least squares regression</p>","PeriodicalId":93970,"journal":{"name":"Current protocols","volume":"5 4","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2025-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Current protocols","FirstCategoryId":"1085","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/cpz1.70085","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0