A Novel Computational Machine Learning Pipeline to Quantify Similarities in Three-Dimensional Protein Structures

Shreyas U Hirway, Xiao Xu, Fan Fan
bioRxiv - Pharmacology and Toxicology, published 2024-08-17
DOI: 10.1101/2024.08.14.607969 (https://doi.org/10.1101/2024.08.14.607969)

Abstract

Animal models are widely used during drug development. The selection of a suitable animal model relies on factors such as target biology, animal resource availability, and legacy species. It is imperative that the selected animal species exhibit the highest resemblance to humans, both in target biology and in the target protein itself. The current practice for assessing cross-species protein similarity relies on pairwise comparison of protein sequences rather than on the biologically relevant three-dimensional (3D) structure of proteins. We developed a novel quantitative machine learning pipeline using 3D structure-based feature data from the Protein Data Bank, nominal data from UniProt, and bioactivity data from ChEMBL, all of which were matched between human and animal data. Using an XGBoost regression model, similarity scores between targets were calculated and, based on these scores, the best animal species for each target was identified. For real-world application, targets from an alternative source, AlphaFold, were tested using the model, and the animal species whose protein was most similar to its human counterpart was predicted. These targets were then grouped by their associated phenotype so that the pipeline could predict an optimal animal species.
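
The final selection step described above, going from similarity scores to a recommended species, can be illustrated with a minimal sketch. This is not the authors' implementation: the target names, species, score values, and phenotype labels are all hypothetical. Averaging scores within a phenotype group is likewise an assumption, since the abstract does not state how grouped targets are aggregated. In the pipeline itself the scores would come from the trained regression model.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical similarity scores between a human target and its animal
# counterpart. The values are made up for illustration; in the paper they
# would be produced by the regression model.
scores = {
    "TARGET_A": {"mouse": 0.81, "rat": 0.78, "dog": 0.90},
    "TARGET_B": {"mouse": 0.70, "rat": 0.88, "dog": 0.84},
    "TARGET_C": {"mouse": 0.95, "rat": 0.60, "dog": 0.65},
}

# Hypothetical mapping from each target to its associated phenotype.
phenotype_of = {
    "TARGET_A": "phenotype_1",
    "TARGET_B": "phenotype_1",
    "TARGET_C": "phenotype_2",
}


def best_species_per_target(scores):
    """Return, for each target, the species with the highest score."""
    return {
        target: max(by_species, key=by_species.get)
        for target, by_species in scores.items()
    }


def best_species_per_phenotype(scores, phenotype_of):
    """Pool scores by phenotype and pick the species with the highest mean.

    Using the mean is an assumption made for this sketch.
    """
    pooled = defaultdict(lambda: defaultdict(list))
    for target, by_species in scores.items():
        for species, score in by_species.items():
            pooled[phenotype_of[target]][species].append(score)
    return {
        phenotype: max(by_species, key=lambda sp: mean(by_species[sp]))
        for phenotype, by_species in pooled.items()
    }


if __name__ == "__main__":
    print(best_species_per_target(scores))
    print(best_species_per_phenotype(scores, phenotype_of))
```

With these made-up numbers, the per-target choice differs between TARGET_A and TARGET_B, while pooling by phenotype yields a single species for the group. This is the kind of trade-off a phenotype-level recommendation has to resolve.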