Connecting chemical and protein sequence space to predict biocatalytic reactions
Alexandra E. Paton, Daniil A. Boiko, Jonathan C. Perkins, Nicholas I. Cemalovic, Thiago Reschützegger, Gabe Gomes, Alison R. H. Narayan
Nature, volume 646, issue 8083, pages 108-116. Published 2025-10-01.
DOI: 10.1038/s41586-025-09519-5
https://www.nature.com/articles/s41586-025-09519-5
Abstract
The application of biocatalysis in synthesis has the potential to offer streamlined routes towards target molecules [1], tunable catalyst-controlled selectivity [2], as well as processes with improved sustainability [3]. Despite these advantages, biocatalysis is often a high-risk strategy to implement, as identifying an enzyme capable of performing chemistry on a specific intermediate required for a synthesis can be a roadblock that requires extensive screening of enzymes and protein engineering to overcome [4]. Strategies for predicting which enzyme and small molecule are compatible have been hindered by the lack of well-studied biocatalytic reaction datasets [5]. The underexploration of connections between chemical and protein sequence space constrains navigation between these two landscapes. Here we report a two-phase effort relying on high-throughput experimentation to populate connections between productive substrate and enzyme pairs and the subsequent development of a tool, CATNIP, for predicting compatible α-ketoglutarate (α-KG)/Fe(II)-dependent enzymes for a given substrate or, conversely, for ranking potential substrates for a given α-KG/Fe(II)-dependent enzyme sequence. We anticipate that our approach can be readily expanded to further enzyme and transformation classes and will derisk the investigation and application of biocatalytic methods.
A two-phase machine-learning-based tool making use of high-throughput experimentation is introduced to examine the connections between chemical and protein sequence space and predict productive biocatalytic reactions among substrate and enzyme pairs.
About the journal:
Nature is a prestigious international journal that publishes peer-reviewed research across scientific and technological fields. Articles are selected for originality, importance, interdisciplinary relevance, timeliness, accessibility, elegance, and surprising conclusions. In addition to showcasing significant scientific advances, Nature delivers rapid, authoritative, and insightful news and interpretation of current and upcoming trends affecting science, scientists, and the broader public. The journal serves a dual purpose: first, to promptly share noteworthy scientific advances and foster discussion among scientists; and second, to ensure the swift dissemination of scientific results globally, emphasizing their significance for knowledge, culture, and daily life.