Grant M Landwehr, Jonathan W Bogart, Carol Magalhaes, Eric Hammarlund, Ashty S Karim, Michael C Jewett
Accelerated enzyme engineering by machine-learning guided cell-free expression
bioRxiv - Synthetic Biology, published 2024-08-02
DOI: 10.1101/2024.07.30.605672 (https://doi.org/10.1101/2024.07.30.605672)
Abstract
Enzyme engineering is limited by the challenge of rapidly generating and using large datasets of sequence-function relationships for predictive design. To address this challenge, we developed a machine learning (ML)-guided platform that integrates cell-free DNA assembly, cell-free gene expression, and functional assays to rapidly map fitness landscapes across protein sequence space and optimize enzymes for multiple, distinct chemical reactions. We applied this platform to engineer amide synthetases by evaluating substrate preference for 1,217 enzyme variants in 10,953 unique reactions. We used these data to build augmented ridge regression ML models for predicting amide synthetase variants capable of making 9 small molecule pharmaceuticals. Our ML-guided, cell-free framework promises to accelerate enzyme engineering by enabling iterative exploration of protein sequence space to build specialized biocatalysts in parallel.
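To make the modeling step concrete, the following is a minimal sketch of how a ridge regression model can be fit to sequence-function data and used to rank untested variants. It is an illustration under stated assumptions, not the authors' pipeline: the abstract does not describe how the regression is "augmented", so this sketch uses plain ridge regression on one-hot encoded sequences. The four-residue sequences, the activity values, the `alpha` setting, and the helper names `one_hot`, `fit_ridge`, and `predict` are all hypothetical.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical residues


def one_hot(seq: str) -> np.ndarray:
    """Encode a protein sequence as a flat one-hot vector of length len(seq) * 20."""
    vec = np.zeros((len(seq), len(AMINO_ACIDS)))
    for pos, aa in enumerate(seq):
        vec[pos, AMINO_ACIDS.index(aa)] = 1.0
    return vec.ravel()


def fit_ridge(X: np.ndarray, y: np.ndarray, alpha: float = 1.0):
    """Closed-form ridge fit on centered data: w = (Xc^T Xc + alpha*I)^-1 Xc^T yc."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w  # intercept recovered from the centering
    return w, b


def predict(X: np.ndarray, w: np.ndarray, b: float) -> np.ndarray:
    return X @ w + b


# Hypothetical training set: a parent sequence plus three single mutants,
# each paired with a made-up measured activity.
train_seqs = ["ACDA", "GCDA", "AFDA", "ACDW"]
train_activity = np.array([1.0, 1.8, 0.4, 1.3])

X_train = np.stack([one_hot(s) for s in train_seqs])
w, b = fit_ridge(X_train, train_activity, alpha=0.1)

# Score untested double mutants and rank them from best to worst prediction.
candidates = ["GCDW", "GFDA", "AFDW"]
X_cand = np.stack([one_hot(s) for s in candidates])
scores = predict(X_cand, w, b)
ranked = [candidates[i] for i in np.argsort(-scores)]
print(ranked)
```

Because the model is linear in the one-hot features, it predicts a combined variant by adding up the effects it learned from single mutants, which is what allows a screen of single-site variants to suggest multi-site candidates. In a setting like the one the abstract describes, one such model would presumably be fit per target reaction.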