Deep-Learning-Guided Mining and Clustering of Remote Amino Acid Residues for the Simultaneous Engineering of the Catalytic Activity and Thermostability of a Processive Endoglucanase
Mujunqi Wu, Yuzhen Huang, Xinyan He, Kequan Chen, Bin Wu, Gerhard Schenk
ACS Synthetic Biology, published 2025-10-03. DOI: 10.1021/acssynbio.5c00454 (https://doi.org/10.1021/acssynbio.5c00454)
Processive endoglucanases, which possess both endo- and exoglucanase activities, are considered highly promising catalysts for cellulose degradation. In this study, we employed multiple deep learning models, including MutCompute, DeepSequence, and ESM-1v, to guide the engineering of EG5C-1, a processive endoglucanase derived from Bacillus subtilis BS-5. This enabled a systematic exploration of the enzyme's sequence space. Through a combination of clustering analysis and a greedy algorithm, we optimized combinations of amino acid substitutions and ultimately identified an elite variant, M8 (R23Q/E43Q/K91I/K191P/A198T/Q237D/V240P/S245A), composed entirely of substituted residues. Compared to the wild-type enzyme, M8 exhibited 10-fold and 5-fold improvements in catalytic efficiency (kcat/Km) toward the soluble substrate carboxymethyl cellulose-Na (CMC) and the insoluble substrate phosphoric acid-swollen cellulose (PASC), respectively, along with a higher optimal temperature and improved thermostability. Molecular mechanistic analyses revealed that all distal substituted residues enhanced dynamic coupling and coordination, primarily influencing the conformation of three loops near the substrate pocket. These structural changes modulated substrate binding and product release, thereby contributing to the improved catalytic efficiency (kcat/Km). This work not only suggests a feasible strategy to explore the "dark space" within sequences but also provides insights into the practical application of machine learning in experiments.
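The abstract names a greedy algorithm for combining substitutions but does not spell out the procedure, so the following is only a minimal sketch of one generic form such a search could take (forward selection), not the authors' method. `parse_variant` splits a variant string such as the M8 notation into individual substitutions. `greedy_combine` repeatedly adds whichever candidate most improves a caller-supplied score and stops when no addition helps. The function names, the placeholder substitutions (`A1B`, `C2D`, `E3F`), and their integer effects are all hypothetical; in practice the score would come from model predictions or measurements.

```python
import re
from typing import Callable, FrozenSet, Iterable, List, Tuple

# One-letter wild-type residue, 1-based position, one-letter substitute, e.g. "R23Q".
SUBSTITUTION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_variant(variant: str) -> List[Tuple[str, int, str]]:
    """Split a string like 'R23Q/E43Q' into (wild_type, position, substitute) tuples."""
    parsed = []
    for token in variant.split("/"):
        match = SUBSTITUTION_RE.match(token.strip())
        if match is None:
            raise ValueError(f"Unrecognized substitution: {token!r}")
        wild_type, position, substitute = match.groups()
        parsed.append((wild_type, int(position), substitute))
    return parsed


def greedy_combine(
    candidates: Iterable[str],
    score: Callable[[FrozenSet[str]], float],
) -> Tuple[FrozenSet[str], float]:
    """Forward selection: add the best-scoring candidate until nothing improves."""
    chosen: FrozenSet[str] = frozenset()
    best = score(chosen)
    remaining = set(candidates)
    while remaining:
        # sorted() keeps tie-breaking deterministic across runs.
        top = max(sorted(remaining), key=lambda m: score(chosen | {m}))
        top_score = score(chosen | {top})
        if top_score <= best:
            break  # no remaining candidate improves the current combination
        chosen, best = chosen | {top}, top_score
        remaining.remove(top)
    return chosen, best


# Hypothetical placeholder substitutions with made-up effects (NOT data from the paper).
TOY_EFFECTS = {"A1B": 5, "C2D": -2, "E3F": 3}


def toy_score(combo: FrozenSet[str]) -> int:
    """Purely additive toy score; real substitutions may interact non-additively."""
    return sum(TOY_EFFECTS[m] for m in combo)


if __name__ == "__main__":
    m8 = parse_variant("R23Q/E43Q/K91I/K191P/A198T/Q237D/V240P/S245A")
    print(len(m8), m8[0])
    print(greedy_combine(TOY_EFFECTS, toy_score))
```

A greedy search evaluates on the order of n² combinations for n candidates instead of all 2ⁿ subsets, which is presumably why a heuristic of this kind is attractive when each evaluation is an experiment. The trade-off is that it can miss combinations whose benefit only appears when several substitutions are present together.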
About the journal:
The journal is particularly interested in studies on the design and synthesis of new genetic circuits and gene products; computational methods in the design of systems; and integrative applied approaches to understanding disease and metabolism.
Topics may include, but are not limited to:
Design and optimization of genetic systems
Genetic circuit design and the principles for organizing circuits into programs
Computational methods to aid the design of genetic systems
Experimental methods to quantify genetic parts, circuits, and metabolic fluxes
Genetic parts libraries: their creation, analysis, and ontological representation
Protein engineering including computational design
Metabolic engineering and cellular manufacturing, including biomass conversion
Natural product access, engineering, and production
Creative and innovative applications of cellular programming
Medical applications, tissue engineering, and the programming of therapeutic cells
Minimal cell design and construction
Genomics and genome replacement strategies
Viral engineering
Automated and robotic assembly platforms for synthetic biology
DNA synthesis methodologies
Metagenomics and synthetic metagenomic analysis
Bioinformatics applied to gene discovery, chemoinformatics, and pathway construction
Gene optimization
Methods for genome-scale measurements of transcription and metabolomics
Systems biology and methods to integrate multiple data sources
In vitro and cell-free synthetic biology and molecular programming
Nucleic acid engineering.