Deep learning for NAD/NADP cofactor prediction and engineering using transformer attention analysis in enzymes

IF 6.8, CAS Zone 1 (Biology), JCR Q1 (Biotechnology & Applied Microbiology)

Metabolic Engineering, Pub Date: 2025-01-01, DOI: 10.1016/j.ymben.2024.11.007

Jaehyung Kim, Jihoon Woo, Joon Young Park, Kyung-Jin Kim, Donghyuk Kim

Volume 87, Pages 86-94. Full text: https://www.sciencedirect.com/science/article/pii/S1096717624001496


Abstract

Understanding and manipulating the cofactor preferences of NAD(P)-dependent oxidoreductases, the most widely distributed enzyme group in nature, is increasingly crucial in bioengineering. However, large-scale identification of cofactor preferences and the design of mutants that switch cofactor specificity remain complex tasks. Here, we introduce DISCODE (Deep learning-based Iterative pipeline to analyze Specificity of COfactors and to Design Enzyme), a novel transformer-based deep learning model that predicts NAD(P) cofactor preferences. For model training, a total of 7,132 NAD(P)-dependent enzyme sequences were collected. Leveraging full-length sequence information, DISCODE classifies the cofactor preferences of NAD(P)-dependent oxidoreductase protein sequences without structural or taxonomic limitation. The model achieved an accuracy of 97.4% and an F1 score of 97.3%. A notable feature of DISCODE is the interpretability of its transformer layers. Analysis of the attention layers in the model identifies several residues with significantly higher attention weights. These residues align well with structurally important residues that closely interact with NAD(P), which facilitates the identification of key residues that determine cofactor specificity. The key residues are highly consistent with experimentally verified cofactor-switching mutants. Integrated into an enzyme design pipeline, DISCODE coupled with attention analysis enables a fully automated approach to redesigning cofactor specificity.
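The attention analysis described in the abstract can be illustrated with a minimal sketch. The abstract does not state how attention weights are aggregated per residue or which statistical criterion defines "significantly higher", so the following uses a simple z-score cutoff as a stand-in. The function name `high_attention_residues`, the `z_cutoff` parameter, and the toy sequence and weights are all hypothetical and are not taken from DISCODE.

```python
from statistics import mean, stdev


def high_attention_residues(sequence, attention, z_cutoff=2.0):
    """Return (position, residue, z_score) for residues whose attention
    weight lies more than z_cutoff sample standard deviations above the mean.

    Assumption: `attention` holds one already-aggregated weight per residue
    (for example, averaged over heads and layers). The aggregation and the
    z-score criterion are illustrative choices, not the paper's method.
    """
    if len(sequence) != len(attention):
        raise ValueError("sequence and attention must have the same length")
    if len(attention) < 2:
        return []  # a standard deviation needs at least two values

    mu = mean(attention)
    sigma = stdev(attention)
    if sigma == 0:
        return []  # uniform attention: no residue stands out

    hits = []
    # Positions are reported 1-based, as is conventional for protein residues.
    for position, (residue, weight) in enumerate(zip(sequence, attention), start=1):
        z_score = (weight - mu) / sigma
        if z_score > z_cutoff:
            hits.append((position, residue, round(z_score, 2)))
    return hits


# Made-up toy data: a short peptide with one artificially high weight.
toy_sequence = "MKVAVLGAAGGIGQ"
toy_attention = [0.02, 0.02, 0.03, 0.02, 0.02, 0.02, 0.30,
                 0.02, 0.03, 0.02, 0.02, 0.02, 0.02, 0.02]

for position, residue, z_score in high_attention_residues(toy_sequence, toy_attention):
    print(f"{residue}{position}: z = {z_score}")
```

On this toy input only position 7 (G) exceeds the cutoff. In a real pipeline the flagged positions would be candidates to compare against residues known to contact NAD(P) or against published cofactor-switching mutations.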




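Source journal

Metabolic Engineering (Engineering & Technology: Biotechnology & Applied Microbiology)

CiteScore: 15.60

Self-citation rate: 6.00%

Articles published: 140

Review time: 44 days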

Journal description: Metabolic Engineering (MBE) publishes original research papers on the directed modulation of metabolic pathways for metabolite overproduction or the enhancement of cellular properties. It welcomes papers that describe the engineering of native pathways and the synthesis of heterologous pathways to convert microorganisms into microbial cell factories. The journal covers experimental, computational, and modeling approaches for understanding metabolic pathways and manipulating them through genetic, media, or environmental means. Effective exploration of metabolic pathways requires molecular biology and biochemistry methods, as well as engineering techniques for modeling and data analysis. MBE serves as a platform for interdisciplinary research in fields such as biochemistry, molecular biology, applied microbiology, cellular physiology, cellular nutrition in health and disease, and biochemical engineering. The journal publishes original research papers and review papers. It is indexed and abstracted in Scopus, Embase, EMBiology, Current Contents - Life Sciences and Clinical Medicine, Science Citation Index, PubMed/Medline, CAS, and Biotechnology Citation Index.