A novel interpretability framework for enzyme turnover number prediction boosted by pre-trained enzyme embeddings and adaptive gate network
Bing-Xue Du, Haoyang Yu, Bei Zhu, Yahui Long, Min Wu, Jian-Yu Shi
Methods, volume 237, pages 45-52. Published 2025-02-26. DOI: 10.1016/j.ymeth.2025.02.010
https://www.sciencedirect.com/science/article/pii/S1046202325000519
Publication type: Journal Article. Journal impact factor: 4.2. JCR: Q1 (Biochemical Research Methods).
Citations: 0
Abstract
Determining the enzyme turnover number (kcat) is a vital step in synthetic biology and early-stage drug discovery. Recently, deep learning methods have made encouraging progress in predicting kcat, driven by the growth of turnover number data for multi-species enzyme-substrate pairs. However, the performance of existing approaches still depends heavily on how effectively features are extracted from enzymes and substrates, and on how well these two types of features are fused. It is also essential to identify the key molecular substructures that significantly affect kcat prediction. To address these issues, we develop GELKcat, a novel end-to-end dual-representation interpretability framework that uses graph transformers to encode substrate molecules and CNNs to process enzyme word2vec embeddings. We further integrate substrate and enzyme features with an adaptive gate network, which assigns optimal weights to capture the most suitable feature combinations. Comparison with several state-of-the-art methods demonstrates the superiority of GELKcat, and ablation studies further illuminate the roles of its three main components. Furthermore, case studies illustrate the interpretability of GELKcat by identifying the key functional groups in a substrate that are significantly associated with turnover number. We anticipate that this work can bridge current gaps in enzyme-substrate representation and offer guidance for drug discovery and synthetic biology.
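The abstract does not give the equations of the adaptive gate network, so the following is only a minimal sketch of one common form of gated feature fusion, not GELKcat's actual implementation. It assumes the substrate and enzyme encoders have already produced vectors of equal length, and that an element-wise sigmoid gate computed from their concatenation decides how much of each feature to keep. The function name `gated_fusion`, the 4-dimensional toy embeddings, and the randomly initialised parameters `W` and `b` are all hypothetical.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Logistic function, maps a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(substrate, enzyme, weights, bias):
    """Fuse two equal-length feature vectors with an element-wise gate.

    gate_i  = sigmoid(weights[i] . concat(substrate, enzyme) + bias[i])
    fused_i = gate_i * substrate_i + (1 - gate_i) * enzyme_i

    This is an assumed, generic gating scheme, not the paper's formula.
    """
    if len(substrate) != len(enzyme):
        raise ValueError("feature vectors must have the same length")
    joint = list(substrate) + list(enzyme)
    fused, gates = [], []
    for i, (s, e) in enumerate(zip(substrate, enzyme)):
        z = sum(w * x for w, x in zip(weights[i], joint)) + bias[i]
        g = sigmoid(z)
        gates.append(g)
        fused.append(g * s + (1.0 - g) * e)
    return fused, gates


# Toy inputs: hypothetical 4-dimensional substrate and enzyme embeddings.
rng = random.Random(0)
dim = 4
substrate_vec = [rng.uniform(-1, 1) for _ in range(dim)]
enzyme_vec = [rng.uniform(-1, 1) for _ in range(dim)]

# Randomly initialised gate parameters; in a trained model these are learned.
W = [[rng.gauss(0, 0.5) for _ in range(2 * dim)] for _ in range(dim)]
b = [0.0] * dim

fused_vec, gate_values = gated_fusion(substrate_vec, enzyme_vec, W, b)
print("gates:", [round(g, 3) for g in gate_values])
print("fused:", [round(v, 3) for v in fused_vec])
```

Because each gate value lies strictly between 0 and 1, every fused feature is a convex combination of the corresponding substrate and enzyme features. A gate near 1 favours the substrate representation and a gate near 0 favours the enzyme representation.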
About the journal:
Methods focuses on rapidly developing techniques in the experimental biological and medical sciences.
Each topical issue, organized by a guest editor who is an expert in the area covered, consists solely of invited quality articles by specialist authors, many of them reviews. Issues are devoted to specific technical approaches with emphasis on clear detailed descriptions of protocols that allow them to be reproduced easily. The background information provided enables researchers to understand the principles underlying the methods; other helpful sections include comparisons of alternative methods giving the advantages and disadvantages of particular methods, guidance on avoiding potential pitfalls, and suggestions for troubleshooting.