Supervised Learning of Protein Melting Temperature: Cross-Species vs. Species-Specific Prediction.

IF 2.8; Zone 4 (Biology); JCR Q2, Biochemistry & Molecular Biology

Proteins-Structure Function and Bioinformatics Pub Date : 2025-07-14 DOI:10.1002/prot.70019

Sebastián García López, Jesper Salomon, Wouter Boomsma


Citations: 0

Abstract

Protein melting temperatures are important proxies for stability and are frequently probed in protein engineering campaigns, for instance in enzyme discovery and protein optimization. With the emergence of large datasets of melting temperatures for diverse natural proteins, it has become possible to train models to predict this quantity, and the literature has reported impressive performance values in terms of Spearman rho. The high correlation scores suggest that it should be possible to accurately predict melting temperature changes in engineered variants and to reliably identify naturally thermostable proteins. In practice, however, results in these settings are often disappointing. In this paper, we explore this apparent discrepancy. We show that Spearman rho over cross-species data gives an overly optimistic impression of prediction performance, and that this metric reflects the ability to distinguish global differences in amino acid composition between species rather than the specific effects of genetic variation. We then investigate whether cross-species training on melting temperature is beneficial at all, compared to training a specific model for each species. We address this question using four different transfer-learning approaches and a fine-tuning procedure. Surprisingly, we consistently find no benefit of cross-species training. We conclude that (1) current models for supervised prediction of melting temperature perform substantially worse than the literature suggests, and (2) reliable transfer across species is still a challenging problem. An implementation of this work is available at https://github.com/deltadedirac/thermocontrast_tm.
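The claim that a pooled Spearman rho can mostly reflect between-species differences can be illustrated with a small synthetic sketch. This is not the authors' analysis or data: the species names, mean melting temperatures, noise levels, and the "predictor" below are all invented for illustration. The assumed predictor knows only each species' average melting temperature and nothing about individual proteins, yet the pooled correlation comes out high while the per-species correlations stay near zero.

```python
import random


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rho, computed as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def simulate(seed=0, n_per_species=200):
    """Generate synthetic (true, predicted) melting temperatures per species."""
    rng = random.Random(seed)
    # Hypothetical species-level mean melting temperatures in deg C (made up).
    species_means = {
        "species_A": 40.0,
        "species_B": 50.0,
        "species_C": 60.0,
        "species_D": 75.0,
        "species_E": 90.0,
    }
    data = {}
    for name, mean_tm in species_means.items():
        true_tm = [mean_tm + rng.gauss(0, 3) for _ in range(n_per_species)]
        # Assumed predictor: species mean plus noise that is independent of
        # the protein-level variation in true_tm.
        pred_tm = [mean_tm + rng.gauss(0, 3) for _ in range(n_per_species)]
        data[name] = (true_tm, pred_tm)
    return data


data = simulate()
pooled_true = [t for true_tm, _ in data.values() for t in true_tm]
pooled_pred = [p for _, pred_tm in data.values() for p in pred_tm]
pooled_rho = spearman(pooled_true, pooled_pred)
per_species_rho = {name: spearman(t, p) for name, (t, p) in data.items()}

print(f"pooled rho: {pooled_rho:.2f}")
for name, rho in per_species_rho.items():
    print(f"{name} rho: {rho:.2f}")
```

Under these assumptions, reporting the per-species values alongside the pooled value is one simple way to check whether a model captures protein-level variation rather than species identity.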


Source journal

Proteins-Structure Function and Bioinformatics (Biology: Biochemistry & Molecular Biology)

CiteScore: 5.90

Self-citation rate: 3.40%

Articles published: 172

Review time: 3 months

Journal description: PROTEINS: Structure, Function, and Bioinformatics publishes original reports of significant experimental and analytic research in all areas of protein research: structure, function, computation, genetics, and design. The journal encourages reports that present new experimental or computational approaches for interpreting and understanding data from biophysical chemistry, structural studies of proteins and macromolecular assemblies, alterations of protein structure and function engineered through techniques of molecular biology and genetics, functional analyses under physiologic conditions, as well as the interactions of proteins with receptors, nucleic acids, or other specific ligands or substrates. Research in protein and peptide biochemistry directed toward synthesizing or characterizing molecules that simulate aspects of the activity of proteins, or that act as inhibitors of protein function, is also within the scope of PROTEINS. In addition to full-length reports, short communications (usually not more than 4 printed pages) and prediction reports are welcome. Reviews are typically by invitation; authors are encouraged to submit proposed topics for consideration.