A novel computational machine learning pipeline to quantify similarities in 3D protein structures

IF 4.1 | Zone 3, Medicine | Q2, TOXICOLOGY
Shreyas U Hirway, Xiao Xu, Fan Fan
Toxicological Sciences, pp. 48-56. Published 2025-09-01 (Journal Article).
DOI: 10.1093/toxsci/kfaf007 (https://doi.org/10.1093/toxsci/kfaf007)
Citations: 0

Abstract

Animal models are widely used during drug development. The selection of a suitable animal model relies on various factors, such as target biology, animal resource availability, and legacy species. It is imperative that the selected animal species resemble humans as closely as possible, both in target biology and in the target protein itself. Current practice for assessing cross-species protein similarity relies on pairwise comparison of protein sequences rather than on the biologically relevant 3D structure of the proteins. We developed a novel quantitative machine learning pipeline using 3D structure-based feature data from the Protein Data Bank, nominal data from UniProt, and bioactivity data from ChEMBL, all matched between human and animal counterparts. Similarity scores between targets were calculated with an XGBoost regression model, and the best animal species for each target was identified from these scores. To demonstrate real-world application, targets from an alternative source, AlphaFold, were scored with the model, and the animal species whose protein was most similar to the human counterpart was predicted. These targets were then grouped by associated phenotype so that the pipeline could predict an optimal animal species.
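The final selection steps described in the abstract (choosing the best species per target from similarity scores, then grouping targets by phenotype) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The scores are invented placeholders standing in for predictions from a trained regression model (the paper uses XGBoost). The target identifiers and phenotype labels are hypothetical. Taking the mean score within each phenotype group is an assumed aggregation rule.

```python
from collections import defaultdict
from statistics import mean

# (human target, candidate species) -> predicted similarity score (higher = more human-like).
# In a real pipeline these values would come from a trained regressor; here they are placeholders.
Scores = dict[tuple[str, str], float]


def best_species_per_target(scores: Scores) -> dict[str, str]:
    """For each human target, return the species with the highest similarity score."""
    best: dict[str, tuple[str, float]] = {}
    for (target, species), score in scores.items():
        if target not in best or score > best[target][1]:
            best[target] = (species, score)
    return {target: species for target, (species, _) in best.items()}


def best_species_per_phenotype(scores: Scores, phenotype_of: dict[str, str]) -> dict[str, str]:
    """Group targets by phenotype and return the species with the highest mean score per group.

    Averaging within a phenotype group is an assumption made for this sketch.
    """
    per_group: dict[str, dict[str, list[float]]] = defaultdict(lambda: defaultdict(list))
    for (target, species), score in scores.items():
        per_group[phenotype_of[target]][species].append(score)
    return {
        phenotype: max(by_species, key=lambda s: mean(by_species[s]))
        for phenotype, by_species in per_group.items()
    }


# Hypothetical, illustrative scores only; not data from the paper.
EXAMPLE_SCORES: Scores = {
    ("TARGET_A", "rat"): 0.82, ("TARGET_A", "dog"): 0.91, ("TARGET_A", "cynomolgus monkey"): 0.88,
    ("TARGET_B", "rat"): 0.75, ("TARGET_B", "dog"): 0.70, ("TARGET_B", "cynomolgus monkey"): 0.95,
    ("TARGET_C", "rat"): 0.60, ("TARGET_C", "dog"): 0.65, ("TARGET_C", "cynomolgus monkey"): 0.62,
}
# Hypothetical phenotype assignments for the hypothetical targets.
EXAMPLE_PHENOTYPES = {"TARGET_A": "cardiovascular", "TARGET_B": "cardiovascular", "TARGET_C": "hepatic"}

print(best_species_per_target(EXAMPLE_SCORES))
# {'TARGET_A': 'dog', 'TARGET_B': 'cynomolgus monkey', 'TARGET_C': 'dog'}
print(best_species_per_phenotype(EXAMPLE_SCORES, EXAMPLE_PHENOTYPES))
# {'cardiovascular': 'cynomolgus monkey', 'hepatic': 'dog'}
```

The two levels can disagree, as the toy numbers show. The dog scores highest for TARGET_A on its own, but the cynomolgus monkey wins the cardiovascular group once TARGET_B is averaged in. Another aggregation rule, such as the minimum score per group, could reasonably change the outcome.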

Source journal: Toxicological Sciences (Medicine: Toxicology)
CiteScore: 7.70
Self-citation rate: 7.90%
Articles published: 118
Review time: 1.5 months
Journal description: The mission of Toxicological Sciences, the official journal of the Society of Toxicology, is to publish a broad spectrum of impactful research in the field of toxicology. The primary focus of Toxicological Sciences is on original research articles. The journal also provides expert insight via contemporary and systematic reviews, as well as forum articles and editorial content that address important topics in the field. The scope of Toxicological Sciences is focused on a broad spectrum of impactful toxicological research that will advance the multidisciplinary field of toxicology, ranging from basic research to model development and application, and decision making. Submissions will include diverse technologies and approaches including, but not limited to: bioinformatics and computational biology, biochemistry, exposure science, histopathology, mass spectrometry, molecular biology, population-based sciences, tissue and cell-based systems, and whole-animal studies. Integrative approaches that combine realistic exposure scenarios with impactful analyses that move the field forward are encouraged.