A multimodal deep learning framework for enzyme turnover prediction with missing modality
Xin Sun, Yu Guang Wang, Yiqing Shen
Computers in Biology and Medicine, volume 193, Article 110348, published 2025-05-22. DOI: 10.1016/j.compbiomed.2025.110348
Accurate prediction of the turnover number ($k_{\mathrm{cat}}$), which quantifies the maximum rate of substrate conversion at an enzyme’s active site, is essential for assessing catalytic efficiency and understanding biochemical reaction mechanisms. Traditional wet-lab measurements of $k_{\mathrm{cat}}$ are time-consuming and resource-intensive, making deep learning (DL) methods an appealing alternative. However, existing DL models often overlook the impact of reaction products on $k_{\mathrm{cat}}$ due to feedback inhibition, resulting in suboptimal performance. The multimodal nature of the $k_{\mathrm{cat}}$ prediction task, which takes enzymes, substrates, and products as inputs, presents a further challenge: when certain modalities are unavailable at inference time because of incomplete data or experimental constraints, existing DL models cannot be applied. To address these limitations, we introduce MMKcat, a novel framework employing a prior-knowledge-guided missing modality training mechanism, which treats substrates and enzyme sequences as essential inputs while considering other modalities as maskable terms. Moreover, an auxiliary regularizer is incorporated to encourage the learning of informative features from various modality combinations, enabling robust predictions even with incomplete multimodal inputs. We compare MMKcat against state-of-the-art methods, including DLKcat, TurNup, UniKP, EITLEM-Kinetic, DLTKcat, and GELKcat, on BRENDA and SABIO-RK. Our results show significant improvements under both complete and missing modality scenarios in the RMSE, $R^2$, and SRCC metrics, with average improvements of 6.41%, 22.18%, and 8.15%, respectively. Code is available at https://github.com/ProEcho1/MMKcat.
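The masking idea described in the abstract can be illustrated with a minimal sketch. This is not the MMKcat implementation: the modality names, the choice of `product` as the only maskable modality, the drop probability `p_drop`, and zero-filling of masked features are all assumptions made here for illustration. The only detail taken from the abstract is that enzyme and substrate inputs are always kept while other modalities may be masked.

```python
import random

# Taken from the abstract: these inputs are treated as essential and never masked.
ESSENTIAL = ("enzyme", "substrate")
# Assumption: the abstract only says "other modalities"; "product" is used as an example.
MASKABLE = ("product",)


def sample_modality_mask(rng, p_drop=0.5):
    """Return {modality: bool}, where True means the modality stays visible.

    p_drop is a hypothetical hyperparameter, not a value from the paper.
    """
    mask = {name: True for name in ESSENTIAL}
    for name in MASKABLE:
        mask[name] = rng.random() >= p_drop
    return mask


def apply_mask(features, mask):
    """Replace the feature vector of each masked modality with zeros.

    Zero-filling is an assumption; a learned placeholder would also work.
    """
    return {
        name: list(vec) if mask[name] else [0.0] * len(vec)
        for name, vec in features.items()
    }


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy feature vectors standing in for real encoder outputs.
    features = {
        "enzyme": [0.2, 0.7],
        "substrate": [1.1, 0.4],
        "product": [0.9, 0.3],
    }
    mask = sample_modality_mask(rng)
    print(mask)
    print(apply_mask(features, mask))
```

Sampling a fresh mask for every training example exposes a model to both complete and incomplete inputs, which is the situation the abstract says arises at inference time.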
About the journal:
Computers in Biology and Medicine is an international forum for sharing groundbreaking advancements in the use of computers in bioscience and medicine. This journal serves as a medium for communicating essential research, instruction, ideas, and information regarding the rapidly evolving field of computer applications in these domains. By encouraging the exchange of knowledge, we aim to facilitate progress and innovation in the utilization of computers in biology and medicine.