A multimodal deep learning framework for enzyme turnover prediction with missing modality

IF 7.0 | Zone 2 (Medicine) | JCR Q1 (Biology)

Computers in Biology and Medicine, vol. 193, Article 110348 | Pub Date: 2025-05-22 | DOI: 10.1016/j.compbiomed.2025.110348

Xin Sun, Yu Guang Wang, Yiqing Shen


Citations: 0

Abstract


Accurate prediction of the turnover number ($k_{cat}$), which quantifies the maximum rate of substrate conversion at an enzyme’s active site, is essential for assessing catalytic efficiency and understanding biochemical reaction mechanisms. Traditional wet-lab measurements of $k_{cat}$ are time-consuming and resource-intensive, making deep learning (DL) methods an appealing alternative. However, existing DL models often overlook the impact of reaction products on $k_{cat}$ due to feedback inhibition, resulting in suboptimal performance. The multimodal nature of this $k_{cat}$ prediction task, involving enzymes, substrates, and products as inputs, presents additional challenges when certain modalities are unavailable during inference due to incomplete data or experimental constraints, leading to the inapplicability of existing DL models. To address these limitations, we introduce MMKcat, a novel framework employing a prior-knowledge-guided missing modality training mechanism, which treats substrates and enzyme sequences as essential inputs while considering other modalities as maskable terms. Moreover, an innovative auxiliary regularizer is incorporated to encourage the learning of informative features from various modal combinations, enabling robust predictions even with incomplete multimodal inputs. We demonstrate the superior performance of MMKcat compared to state-of-the-art methods, including DLKcat, TurNup, UniKP, EITLEM-Kinetic, DLTKcat and GELKcat, using BRENDA and SABIO-RK. Our results show significant improvements under both complete and missing modality scenarios in RMSE, $R^{2}$, and SRCC metrics, with average improvements of 6.41%, 22.18%, and 8.15%, respectively. Codes are available at https://github.com/ProEcho1/MMKcat.
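The idea of treating some modalities as essential and others as maskable can be illustrated with a small sketch. This is not the MMKcat implementation (see the repository linked in the abstract for that); it only assumes that enzyme and substrate embeddings are always present, that the product embedding may be dropped at random during training or be absent at inference, and that available embeddings are fused by a plain mean. The function names, the drop probability `p_drop`, and the mean fusion are all hypothetical choices made for illustration.

```python
import random

# Assumption based on the abstract: these two inputs are always required.
ESSENTIAL = ("enzyme", "substrate")
# Assumption: "product" stands in for the maskable modalities.
MASKABLE = ("product",)


def sample_mask(rng, p_drop=0.5):
    """Training-time mask: essential modalities are always kept,
    each maskable modality is dropped with probability p_drop."""
    mask = {name: True for name in ESSENTIAL}
    for name in MASKABLE:
        mask[name] = rng.random() >= p_drop
    return mask


def availability_mask(embeddings):
    """Inference-time mask: keep whatever modalities were actually supplied,
    but refuse to run if an essential one is missing."""
    missing = [name for name in ESSENTIAL if name not in embeddings]
    if missing:
        raise ValueError(f"missing essential modalities: {missing}")
    return {name: name in embeddings for name in ESSENTIAL + MASKABLE}


def fuse(embeddings, mask):
    """Hypothetical fusion step: element-wise mean of the kept embeddings."""
    kept = [embeddings[name] for name, keep in mask.items()
            if keep and name in embeddings]
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy 2-dimensional embeddings, purely illustrative.
sample = {"enzyme": [1.0, 0.0], "substrate": [0.0, 1.0], "product": [1.0, 1.0]}

# Training: the product embedding may or may not be masked out.
train_mask = sample_mask(random.Random(0))
fused_train = fuse(sample, train_mask)

# Inference with the product modality unavailable.
partial = {name: vec for name, vec in sample.items() if name != "product"}
fused_partial = fuse(partial, availability_mask(partial))
```

Because the same `fuse` function handles both the randomly masked training input and the genuinely incomplete inference input, a model trained this way sees the missing-product case during training rather than only at test time. The auxiliary regularizer mentioned in the abstract is not modeled here.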


Source journal

Computers in Biology and Medicine (Engineering & Technology: Biomedical Engineering)

CiteScore: 11.70

Self-citation rate: 10.40%

Publication volume: 1086

Review time: 74 days

Journal description: Computers in Biology and Medicine is an international forum for sharing groundbreaking advancements in the use of computers in bioscience and medicine. This journal serves as a medium for communicating essential research, instruction, ideas, and information regarding the rapidly evolving field of computer applications in these domains. By encouraging the exchange of knowledge, we aim to facilitate progress and innovation in the utilization of computers in biology and medicine.