Accurate top protein variant discovery via low-N pick-and-validate machine learning.

Cell Systems, pp. 193-203.e6. Pub Date: 2024-02-21. Epub Date: 2024-02-09. DOI: 10.1016/j.cels.2024.01.002
Hoi Yee Chu, John H C Fong, Dawn G L Thean, Peng Zhou, Frederic K C Fung, Yuanhua Huang, Alan S L Wong

Abstract

A strategy that obtains the greatest number of best-performing variants with the least experimental effort across the vast combinatorial mutational landscape would have enormous utility in boosting resource producibility for protein engineering. Toward this goal, we present a simple and effective machine learning-based strategy that outperforms other state-of-the-art methods. Our strategy integrates zero-shot prediction and multi-round sampling to direct active learning by experimentally testing only a few predicted top variants. We find that four rounds of low-N pick-and-validate sampling of 12 variants each for machine learning yield the best accuracy, up to 92.6%, in selecting the true top 1% of variants in combinatorial mutant libraries; two rounds of 24 variants each can also be used. We demonstrate the strategy by successfully discovering high-performance protein variants from diverse families, including CRISPR-based genome editors, supporting its generalizable application to protein engineering tasks. A record of this paper's transparent peer review process is included in the supplemental information.
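The pick-and-validate loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the additive per-site model, the synthetic 625-variant library, the simulated zero-shot scores, and every function name below are assumptions chosen only to show the shape of the loop (rank by a zero-shot prior, measure a small batch of top picks, refit, repeat).

```python
import itertools
import random


def fit_additive(measured, library):
    """Hypothetical stand-in model: score = sum of per-site mean fitness."""
    mean = sum(measured.values()) / len(measured)
    sums, counts = {}, {}
    for variant, y in measured.items():
        for site in enumerate(variant):  # site = (position, residue)
            sums[site] = sums.get(site, 0.0) + y
            counts[site] = counts.get(site, 0) + 1

    def predict(variant):
        # Unseen (position, residue) pairs fall back to the global mean.
        return sum(
            sums[s] / counts[s] if s in counts else mean
            for s in enumerate(variant)
        )

    return {v: predict(v) for v in library}


def pick_and_validate(library, zero_shot, measure, rounds=4, batch=12):
    """Run `rounds` rounds, each measuring the `batch` top-ranked unmeasured variants."""
    measured = {}
    scores = dict(zero_shot)  # round 1 ranks by the zero-shot prior
    for _ in range(rounds):
        candidates = [v for v in library if v not in measured]
        picks = sorted(candidates, key=lambda v: scores[v], reverse=True)[:batch]
        for v in picks:
            measured[v] = measure(v)  # stands in for a wet-lab assay
        scores = fit_additive(measured, library)  # refit on all data so far
    return measured, scores


# Synthetic demo: 4 mutated positions x 5 residues = 625 variants.
rng = random.Random(0)
residues = "ACDEF"
library = list(itertools.product(residues, repeat=4))
effects = {(p, a): rng.gauss(0, 1) for p in range(4) for a in residues}


def true_fitness(variant):
    return sum(effects[s] for s in enumerate(variant))


# Simulated zero-shot predictor: the true fitness plus heavy noise.
zero_shot = {v: true_fitness(v) + rng.gauss(0, 2) for v in library}

measured, scores = pick_and_validate(library, zero_shot, true_fitness)
top_true = set(sorted(library, key=true_fitness, reverse=True)[:6])  # ~top 1%
top_pred = set(sorted(library, key=scores.get, reverse=True)[:6])
print(f"measured {len(measured)} variants; "
      f"{len(top_true & top_pred)} of 6 predicted top variants are true top 1%")
```

With the defaults, the loop measures 48 variants in total (four rounds of 12); passing `rounds=2, batch=24` mirrors the alternative schedule mentioned in the abstract. The overlap it prints reflects only this synthetic landscape and says nothing about the accuracy reported in the paper.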
