LevSeq: Rapid Generation of Sequence-Function Data for Directed Evolution and Machine Learning.

IF 3.7 · Zone 2 (Biology) · Q1 BIOCHEMICAL RESEARCH METHODS

ACS Synthetic Biology Pub Date : 2025-01-17 Epub Date: 2024-12-24 DOI:10.1021/acssynbio.4c00625

Yueming Long, Ariane Mora, Francesca-Zhoufan Li, Emre Gürsoy, Kadina E Johnston, Frances H Arnold

Pages: 230-238. Publication type: Journal Article. URL: https://doi.org/10.1021/acssynbio.4c00625

Citations: 0

Abstract

Sequence-function data provides valuable information about the protein functional landscape but is rarely obtained during directed evolution campaigns. Here, we present Long-read every variant Sequencing (LevSeq), a pipeline that combines a dual barcoding strategy with nanopore sequencing to rapidly generate sequence-function data for entire protein-coding genes. LevSeq integrates into existing protein engineering workflows and comes with open-source software for data analysis and visualization. The pipeline facilitates data-driven protein engineering by consolidating sequence-function data to inform directed evolution and provide the requisite data for machine learning-guided protein engineering (MLPE). LevSeq enables quality control of mutagenesis libraries prior to screening, which reduces time and resource costs. Simulation studies demonstrate LevSeq's ability to accurately detect variants under various experimental conditions. Finally, we show LevSeq's utility in engineering protoglobins for new-to-nature chemistry. Widespread adoption of LevSeq and sharing of the data will enhance our understanding of protein sequence-function landscapes and empower data-driven directed evolution.
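The dual barcoding idea mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the LevSeq software or its API. The barcode sequences, the barcode length, the exact-match lookup, and the function names `demultiplex` and `call_variants` are all assumptions made for illustration. The sketch also assumes reads are already oriented and have the same length as the reference, whereas real nanopore reads contain insertions and deletions and would need alignment first.

```python
from collections import Counter, defaultdict

# Hypothetical barcodes: real barcode sets are longer and tolerate errors.
PLATE_BARCODES = {"ACGT": "plate1", "TGCA": "plate2"}  # assumed 5' barcode -> plate
WELL_BARCODES = {"GGAA": "A1", "CCTT": "A2"}           # assumed 3' barcode -> well
BC_LEN = 4


def demultiplex(reads):
    """Group read inserts by (plate, well) using exact barcode matches at both ends."""
    groups = defaultdict(list)
    for read in reads:
        plate = PLATE_BARCODES.get(read[:BC_LEN])
        well = WELL_BARCODES.get(read[-BC_LEN:])
        if plate and well:  # reads with an unknown barcode are discarded
            groups[(plate, well)].append(read[BC_LEN:-BC_LEN])
    return groups


def call_variants(inserts, reference):
    """Take a per-position majority vote and report substitutions vs. the reference."""
    usable = [s for s in inserts if len(s) == len(reference)]
    if not usable:
        return None  # no read of the expected length in this well
    variants = []
    for i, ref_base in enumerate(reference):
        base, _count = Counter(s[i] for s in usable).most_common(1)[0]
        if base != ref_base:
            variants.append(f"{ref_base}{i + 1}{base}")  # e.g. G4A, 1-indexed
    return variants


reference = "ATGGCC"
reads = [
    "ACGT" + "ATGACC" + "GGAA",  # plate1/A1, carries G4A
    "ACGT" + "ATGACC" + "GGAA",  # plate1/A1, carries G4A
    "ACGT" + "ATGGCC" + "GGAA",  # plate1/A1, minority read matching the reference
    "TGCA" + "ATGGCC" + "CCTT",  # plate2/A2, parent sequence
    "AAAA" + "ATGGCC" + "GGAA",  # unknown plate barcode, dropped
]
for key, inserts in sorted(demultiplex(reads).items()):
    print(key, call_variants(inserts, reference))
```

Pairing each per-well variant call with the functional measurement from the same well is what would yield a sequence-function table. The majority vote here only stands in for whatever consensus or variant-calling step a real pipeline uses.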


Source journal

ACS Synthetic Biology (Biology-

CiteScore: 8.00

Self-citation rate: 10.60%

Articles published: 380

Review time: 6-12 weeks

About the journal: The journal is particularly interested in studies on the design and synthesis of new genetic circuits and gene products; computational methods in the design of systems; and integrative applied approaches to understanding disease and metabolism. Topics may include, but are not limited to:

Design and optimization of genetic systems
Genetic circuit design and the principles for their organization into programs
Computational methods to aid the design of genetic systems
Experimental methods to quantify genetic parts, circuits, and metabolic fluxes
Genetic parts libraries: their creation, analysis, and ontological representation
Protein engineering including computational design
Metabolic engineering and cellular manufacturing, including biomass conversion
Natural product access, engineering, and production
Creative and innovative applications of cellular programming
Medical applications, tissue engineering, and the programming of therapeutic cells
Minimal cell design and construction
Genomics and genome replacement strategies
Viral engineering
Automated and robotic assembly platforms for synthetic biology
DNA synthesis methodologies
Metagenomics and synthetic metagenomic analysis
Bioinformatics applied to gene discovery, chemoinformatics, and pathway construction
Gene optimization
Methods for genome-scale measurements of transcription and metabolomics
Systems biology and methods to integrate multiple data sources
In vitro and cell-free synthetic biology and molecular programming
Nucleic acid engineering