A multimodal deep learning framework for enzyme turnover prediction with missing modality

IF 7.0 | Zone 2, Medicine | JCR Q1, Biology
Xin Sun, Yu Guang Wang, Yiqing Shen
Computers in Biology and Medicine, volume 193, Article 110348. Published 2025-05-22. DOI: 10.1016/j.compbiomed.2025.110348
https://www.sciencedirect.com/science/article/pii/S0010482525006997
Citations: 0

Abstract

A multimodal deep learning framework for enzyme turnover prediction with missing modality
Accurate prediction of the turnover number (kcat), which quantifies the maximum rate of substrate conversion at an enzyme’s active site, is essential for assessing catalytic efficiency and understanding biochemical reaction mechanisms. Traditional wet-lab measurements of kcat are time-consuming and resource-intensive, making deep learning (DL) methods an appealing alternative. However, existing DL models often overlook the effect that reaction products have on kcat through feedback inhibition, resulting in suboptimal performance. The multimodal nature of this kcat prediction task, which takes enzymes, substrates, and products as inputs, presents an additional challenge: when certain modalities are unavailable during inference because of incomplete data or experimental constraints, existing DL models become inapplicable. To address these limitations, we introduce MMKcat, a novel framework employing a prior-knowledge-guided missing modality training mechanism, which treats substrates and enzyme sequences as essential inputs while considering other modalities as maskable terms. Moreover, an auxiliary regularizer is incorporated to encourage the learning of informative features from various modal combinations, enabling robust predictions even with incomplete multimodal inputs. We demonstrate the superior performance of MMKcat compared to state-of-the-art methods, including DLKcat, TurNup, UniKP, EITLEM-Kinetic, DLTKcat, and GELKcat, using BRENDA and SABIO-RK. Our results show significant improvements under both complete and missing modality scenarios in RMSE, R², and SRCC metrics, with average improvements of 6.41%, 22.18%, and 8.15%, respectively. Code is available at https://github.com/ProEcho1/MMKcat.
Source journal
Computers in Biology and Medicine
Category: Engineering and Technology, Engineering: Biomedical
CiteScore: 11.70
Self-citation rate: 10.40%
Articles published: 1086
Review time: 74 days
About the journal: Computers in Biology and Medicine is an international forum for sharing groundbreaking advancements in the use of computers in bioscience and medicine. This journal serves as a medium for communicating essential research, instruction, ideas, and information regarding the rapidly evolving field of computer applications in these domains. By encouraging the exchange of knowledge, we aim to facilitate progress and innovation in the utilization of computers in biology and medicine.