Supervised Learning of Protein Melting Temperature: Cross-Species vs. Species-Specific Prediction
Sebastián García López, Jesper Salomon, Wouter Boomsma
Proteins: Structure, Function, and Bioinformatics. Journal article, published 2025-07-14. DOI: 10.1002/prot.70019 (https://doi.org/10.1002/prot.70019)
Impact factor: 2.8 (JCR Q2, Biochemistry & Molecular Biology)
Abstract
Protein melting temperatures are important proxies for stability and are frequently probed in protein engineering campaigns, for instance in enzyme discovery and protein optimization. With the emergence of large datasets of melting temperatures for diverse natural proteins, it has become possible to train models to predict this quantity, and the literature has reported impressive performance in terms of Spearman's rho. These high correlation scores suggest that it should be possible to accurately predict melting temperature changes in engineered variants and to reliably identify naturally thermostable proteins. In practice, however, results in these settings are often disappointing. In this paper, we explore this apparent discrepancy. We show that Spearman's rho over cross-species data gives an overly optimistic impression of prediction performance, and that this metric reflects the ability to distinguish global differences in amino acid composition between species rather than the specific effects of genetic variation. We then investigate whether cross-species training on melting temperature is beneficial at all compared to training a separate model for each species. We address this question using four different transfer-learning approaches and a fine-tuning procedure. Surprisingly, we consistently find no benefit from cross-species training. We conclude that (1) current models for supervised prediction of melting temperature perform substantially worse than the literature suggests, and (2) reliable transfer across species remains a challenging problem. An implementation of this work is available at https://github.com/deltadedirac/thermocontrast_tm.
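The abstract's central claim, that a pooled cross-species Spearman correlation can look strong while carrying little within-species signal, can be illustrated with a small synthetic sketch. Nothing below comes from the paper's data or code: the species names, mean melting temperatures, noise levels, and the toy "model" (which only knows each species' mean) are all assumptions made for illustration.

```python
import random
from statistics import mean


def ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


rng = random.Random(0)

# Hypothetical species with different mean melting temperatures (degrees C).
species_mean_tm = {
    "mesophile_a": 45.0,
    "mesophile_b": 55.0,
    "thermophile_a": 75.0,
    "thermophile_b": 90.0,
}

true_tm, pred_tm, labels = [], [], []
for name, mu in species_mean_tm.items():
    for _ in range(200):
        # True values scatter around the species mean.
        true_tm.append(mu + rng.gauss(0, 4))
        # The toy model predicts the species mean plus independent noise,
        # so it has no information about individual proteins.
        pred_tm.append(mu + rng.gauss(0, 4))
        labels.append(name)

# Correlation over all proteins pooled across species.
pooled_rho = spearman(true_tm, pred_tm)

# Correlation computed separately inside each species.
per_species_rho = {}
for name in species_mean_tm:
    idx = [i for i, label in enumerate(labels) if label == name]
    per_species_rho[name] = spearman(
        [true_tm[i] for i in idx], [pred_tm[i] for i in idx]
    )

print(f"pooled rho: {pooled_rho:.2f}")
for name, rho in per_species_rho.items():
    print(f"{name}: rho = {rho:.2f}")
```

In this toy setup the pooled score is driven entirely by between-species offsets, while the per-species scores hover around zero. Reporting both numbers side by side is one simple way to check whether a high pooled correlation reflects protein-level signal.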
About the journal:
PROTEINS: Structure, Function, and Bioinformatics publishes original reports of significant experimental and analytic research in all areas of protein research: structure, function, computation, genetics, and design. The journal encourages reports that present new experimental or computational approaches for interpreting and understanding data from biophysical chemistry, structural studies of proteins and macromolecular assemblies, alterations of protein structure and function engineered through techniques of molecular biology and genetics, functional analyses under physiologic conditions, as well as the interactions of proteins with receptors, nucleic acids, or other specific ligands or substrates. Research in protein and peptide biochemistry directed toward synthesizing or characterizing molecules that simulate aspects of the activity of proteins, or that act as inhibitors of protein function, is also within the scope of PROTEINS. In addition to full-length reports, short communications (usually not more than 4 printed pages) and prediction reports are welcome. Reviews are typically by invitation; authors are encouraged to submit proposed topics for consideration.