Machine learning-based prediction of activity and substrate specificity for OleA enzymes in the thiolase superfamily

IF 3.2; Zone 4 (Biology); JCR Q1 (Agricultural and Biological Sciences)
S. Robinson, Megan D. Smith, J. Richman, Kelly G. Aukema, L. Wackett
Synthetic Biology, 2020. DOI: 10.1093/synbio/ysaa004 (https://doi.org/10.1093/synbio/ysaa004)
Citations: 18

Abstract

Enzymes in the thiolase superfamily catalyze carbon–carbon bond formation for the biosynthesis of polyhydroxyalkanoate storage molecules, membrane lipids and bioactive secondary metabolites. Natural and engineered thiolases have applications in synthetic biology for the production of high-value compounds, including personal care products and therapeutics. A fundamental understanding of thiolase substrate specificity is lacking, particularly within the OleA protein family. The ability to predict substrates from sequence would advance (meta)genome mining efforts to identify active thiolases for the production of desired metabolites. To gain a deeper understanding of substrate scope within the OleA family, we measured the activity of 73 diverse bacterial thiolases with a library of 15 p-nitrophenyl ester substrates to build a training set of 1095 unique enzyme–substrate pairs. We then used machine learning to predict thiolase substrate specificity from physicochemical and structural features. The area under the receiver operating characteristic curve was 0.89 for random forest classification of enzyme activity, and our regression model had a test set root mean square error of 0.22 (R² = 0.75) to quantitatively predict enzyme activity levels. Substrate aromaticity, oxygen content and molecular connectivity were the strongest predictors of enzyme–substrate pairing. Key amino acid residues A173, I284, V287, T292 and I316 in the Xanthomonas campestris OleA crystal structure lining the substrate binding pockets were important for thiolase substrate specificity and are attractive targets for future protein engineering studies. The predictive framework described here is generalizable and demonstrates how machine learning can be used to quantitatively understand and predict enzyme substrate specificity.
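For readers less familiar with the metrics quoted in the abstract, the following is a minimal sketch of how a 73 × 15 screen yields 1095 enzyme–substrate pairs and how AUROC, RMSE and R² are computed. It is not the authors' pipeline: the enzyme and substrate identifiers are placeholders, the toy labels, scores and activity values are invented purely for illustration, and no random forest is trained here.

```python
from itertools import product
from math import sqrt


def build_pairs(enzymes, substrates):
    """Cross every enzyme with every substrate to get one candidate pair each."""
    return list(product(enzymes, substrates))


def auroc(labels, scores):
    """AUROC as the probability that a randomly chosen active pair is scored
    above a randomly chosen inactive pair (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive pair")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted activity."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Placeholder identifiers (assumption): only the counts come from the abstract.
enzymes = [f"enzyme_{i}" for i in range(73)]
substrates = [f"substrate_{j}" for j in range(15)]
pairs = build_pairs(enzymes, substrates)
print(len(pairs))  # 73 * 15 = 1095

# Invented toy data (assumption): 1 = active pair, 0 = inactive pair.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.7, 0.6, 0.3, 0.1]
print(auroc(toy_labels, toy_scores))

# Invented toy activity levels (assumption) for the regression metrics.
toy_true = [0.0, 0.5, 1.0, 1.5]
toy_pred = [0.1, 0.4, 1.1, 1.4]
print(rmse(toy_true, toy_pred), r_squared(toy_true, toy_pred))
```

In practice the scores and predictions would come from a trained classifier and regressor evaluated on held-out pairs; the functions above only show what the reported numbers measure.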
Source journal: Synthetic Biology
Subject area: Agricultural and Biological Sciences (miscellaneous)
CiteScore: 4.50
Self-citation rate: 3.10%
Articles published: 28
Review time: 25 weeks