TCNeKP: A Novel Deep Learning Architecture for Enzyme Catalytic Activity Prediction.

IF 5.3, Zone 2 (Chemistry), Q1 CHEMISTRY, MEDICINAL

Journal of Chemical Information and Modeling, Pub Date: 2025-10-14, DOI: 10.1021/acs.jcim.5c01830

Yuanyuan Lei, Rui Liu, Hanxi Yu, Wentao Xu, Ting Long, Hu Mei


Citations: 0

Abstract

Accurate prediction of enzyme kinetic parameters (Kcat and Km) is crucial for rational enzyme design and engineering research. Based on a heterogeneous data set encompassing 17,893 Kcat and 24,585 Km records across 8,911 enzyme sequences from 7 EC classes and 5,023 substrates, we introduce novel TCNeKP models for predicting Kcat and Km values. Enzyme sequences were autoembedded and processed by a temporal convolutional network (TCN) module to extract the key features of catalytic and binding residues, which are frequently located far apart in the primary sequence; substrates were encoded by a pretrained SMILES-Transformer language model; and catalytic conditions (pH and temperature) were encoded via radial basis functions (RBFs). The fused features were then fed into a fully connected network for single-task prediction of Kcat and Km. The results demonstrate that the TCNeKP-Kcat and TCNeKP-Km models achieve robust performance across wild-type and mutant enzymes from 7 EC classes, outperforming the state-of-the-art MPEK, UniKP, and DLKcat models (Table S3). Leveraging a cross-task dynamic parameter-sharing module with an attention mechanism, we further developed a multitask TCNeKP model that achieves the highest R² values among the benchmark models for both Kcat (0.677) and Km (0.657) prediction. These findings indicate that collaborative learning between the Kcat and Km prediction tasks enhances feature extraction for enzyme-substrate binding and catalysis, thereby significantly improving predictive performance.
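To make two of the encoding ideas in the abstract more concrete, the following is a minimal sketch rather than the authors' implementation. It shows how a scalar condition such as pH or temperature can be expanded into Gaussian RBF features, and how stacking dilated convolutions in a TCN widens the receptive field so that residues far apart in the sequence can influence the same output position. The abstract gives no hyperparameters, so the center grids, the `gamma` widths, the kernel size, and the dilation schedule below are all assumptions chosen for illustration, as are the function names.

```python
import math


def linspace(start, stop, n):
    """Return n evenly spaced values from start to stop (inclusive)."""
    step = (stop - start) / (n - 1)
    return [start + i * step for i in range(n)]


def rbf_encode(value, centers, gamma):
    """Expand a scalar into Gaussian RBF features, one feature per center.

    Each feature is exp(-gamma * (value - center)^2), so it equals 1.0 when
    the value sits exactly on a center and decays smoothly with distance.
    """
    return [math.exp(-gamma * (value - c) ** 2) for c in centers]


# Hypothetical grids: the paper's actual centers and widths are not stated
# in the abstract.
PH_CENTERS = linspace(0.0, 14.0, 15)      # one center per integer pH unit
TEMP_CENTERS = linspace(0.0, 100.0, 11)   # one center every 10 degrees C


def encode_conditions(ph, temp_c):
    """Concatenate RBF features for pH and temperature into one vector."""
    ph_feats = rbf_encode(ph, PH_CENTERS, gamma=1.0)         # assumed width
    temp_feats = rbf_encode(temp_c, TEMP_CENTERS, gamma=0.01)  # assumed width
    return ph_feats + temp_feats


def tcn_receptive_field(kernel_size, dilations):
    """Receptive field of a stack of dilated 1D convolutions.

    Assumes one convolution per entry in `dilations`; each layer adds
    (kernel_size - 1) * dilation positions to the receptive field.
    """
    return 1 + sum((kernel_size - 1) * d for d in dilations)


if __name__ == "__main__":
    features = encode_conditions(ph=7.0, temp_c=37.0)
    print(len(features))
    # Doubling the dilation at each layer grows the receptive field
    # exponentially with depth (assumed schedule, kernel size 3).
    print(tcn_receptive_field(3, [1, 2, 4, 8]))
```

With doubling dilations the receptive field grows exponentially in the number of layers, which is the usual motivation for using a TCN on long sequences; whether TCNeKP uses this particular schedule is not stated in the abstract.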


Source journal

Journal of Chemical Information and Modeling (Chemistry - Chemistry, Multidisciplinary)

CiteScore

9.80

Self-citation rate

10.70%

Articles published

529

Review time

1.4 months

About the journal: The Journal of Chemical Information and Modeling publishes papers reporting new methodology and/or important applications in the fields of chemical informatics and molecular modeling. Specific topics include the representation and computer-based searching of chemical databases, molecular modeling, computer-aided molecular design of new materials, catalysts, or ligands, development of new computational methods or efficient algorithms for chemical software, and biopharmaceutical chemistry including analyses of biological activity and other issues related to drug discovery. Astute chemists, computer scientists, and information specialists look to this monthly’s insightful research studies, programming innovations, and software reviews to keep current with advances in this integral, multidisciplinary field. As a subscriber you’ll stay abreast of database search systems, use of graph theory in chemical problems, substructure search systems, pattern recognition and clustering, analysis of chemical and physical data, molecular modeling, graphics and natural language interfaces, bibliometric and citation analysis, and synthesis design and reactions databases.