Fine-tuning Protein Language Models with Deep Mutational Scanning improves Variant Effect Prediction

arXiv - QuanBio - Genomics | Pub Date: 2024-05-10 | DOI: arxiv-2405.06729

Aleix Lafita, Ferran Gonzalez, Mahmoud Hossam, Paul Smyth, Jacob Deasy, Ari Allyn-Feuer, Daniel Seaton, Stephen Young


Abstract

Protein Language Models (PLMs) have emerged as performant and scalable tools for predicting the functional impact and clinical significance of protein-coding variants, but they still lag behind experimental accuracy. Here, we present a novel fine-tuning approach that uses a Normalised Log-odds Ratio (NLR) head to improve the performance of PLMs with experimental maps of variant effects from Deep Mutational Scanning (DMS) assays. We find consistent improvements on a held-out protein test set and on independent DMS and clinical variant annotation benchmarks from ProteinGym and ClinVar. These findings demonstrate that DMS is a promising source of sequence diversity and supervised training data for improving the performance of PLMs for variant effect prediction.
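The abstract does not define the NLR head, so the sketch below is not the authors' method. It only illustrates the plain, un-normalised log-odds ratio that is commonly used to score a substitution with a PLM: the log-probability of the mutant residue minus that of the wild-type residue at the same position. The function names, the alphabet ordering, and the toy logits are assumptions made for illustration; the normalisation and the DMS fine-tuning described in the paper are not reproduced here.

```python
import math

# Assumed ordering of the 20 standard amino acids (hypothetical; a real PLM
# defines its own vocabulary and token indices).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def log_softmax(logits):
    """Convert raw logits to log-probabilities in a numerically stable way."""
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_z for x in logits]


def log_odds_ratio(position_logits, wild_type, mutant):
    """Score a substitution as log p(mutant) - log p(wild type).

    `position_logits` stands in for a PLM's output over AMINO_ACIDS at the
    mutated position. Negative scores mean the model finds the mutant
    residue less likely than the wild-type residue.
    """
    log_probs = log_softmax(position_logits)
    mutant_lp = log_probs[AMINO_ACIDS.index(mutant)]
    wild_type_lp = log_probs[AMINO_ACIDS.index(wild_type)]
    return mutant_lp - wild_type_lp


# Toy logits (made up, not model output): leucine favoured, proline disfavoured.
toy_logits = [0.0] * len(AMINO_ACIDS)
toy_logits[AMINO_ACIDS.index("L")] = 2.0
toy_logits[AMINO_ACIDS.index("P")] = -1.0

score = log_odds_ratio(toy_logits, wild_type="L", mutant="P")
print(f"L->P log-odds ratio: {score:.2f}")
```

In a supervised setting such as the one the abstract describes, a score of this kind would be compared against measured DMS effects; how the paper normalises the score and trains against those measurements is specified in the full text, not here.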

