A novel computational machine learning pipeline to quantify similarities in 3D protein structures.

IF 4.1, Zone 3 (Medicine), JCR Q2 (Toxicology)

Toxicological Sciences, Pub Date: 2025-09-01, DOI: 10.1093/toxsci/kfaf007

Shreyas U Hirway, Xiao Xu, Fan Fan

Publication type: Journal Article. Pages: 48-56.

Citations: 0

Abstract

Animal models are widely used during drug development. The selection of a suitable animal model relies on various factors such as target biology, animal resource availability, and legacy species. It is imperative that the selected animal species exhibit the highest resemblance to humans, both in target biology and in the target protein itself. Current practice for assessing cross-species protein similarity relies on pairwise comparison of protein sequences rather than on the biologically relevant 3D structure of proteins. We developed a novel quantitative machine learning pipeline using 3D structure-based feature data from the Protein Data Bank, nominal data from UniProt, and bioactivity data from ChEMBL, all of which were matched for human and animal data. Using an XGBoost regression model, similarity scores between targets were calculated, and based on these scores the best animal species for each target was identified. For real-world application, targets from an alternative source, i.e. AlphaFold, were tested using the model, and the animal species with the protein most similar to its human counterpart was predicted. These targets were then grouped by their associated phenotype so that the pipeline could predict an optimal animal species.
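
The selection step described in the abstract (score each animal species against the human target, then pick the highest-scoring species per target and per phenotype group) can be sketched as below. This is only an illustration under stated assumptions, not the authors' implementation: the target names, species, phenotype labels, and score values are hypothetical placeholders standing in for the output of a regression model, and averaging scores within a phenotype group is an assumed aggregation rule that the abstract does not specify.

```python
from collections import defaultdict

# Hypothetical model output: (target, species) -> predicted similarity to the
# human protein. These numbers are placeholders, not results from the paper.
predicted_similarity = {
    ("TARGET_A", "mouse"): 0.71,
    ("TARGET_A", "rat"): 0.64,
    ("TARGET_A", "dog"): 0.83,
    ("TARGET_B", "mouse"): 0.90,
    ("TARGET_B", "rat"): 0.88,
    ("TARGET_B", "dog"): 0.52,
    ("TARGET_C", "mouse"): 0.40,
    ("TARGET_C", "rat"): 0.45,
    ("TARGET_C", "dog"): 0.79,
}

# Hypothetical phenotype annotation for each target.
target_phenotype = {
    "TARGET_A": "phenotype_1",
    "TARGET_B": "phenotype_2",
    "TARGET_C": "phenotype_1",
}


def best_species_per_target(scores):
    """Return {target: (species, score)} for the highest-scoring species."""
    best = {}
    for (target, species), score in scores.items():
        if target not in best or score > best[target][1]:
            best[target] = (species, score)
    return best


def best_species_per_phenotype(scores, phenotypes):
    """Average scores per (phenotype, species), then pick the top species.

    Averaging is an assumed aggregation rule chosen for illustration.
    """
    grouped = defaultdict(list)
    for (target, species), score in scores.items():
        grouped[(phenotypes[target], species)].append(score)

    best = {}
    for (phenotype, species), values in grouped.items():
        mean = sum(values) / len(values)
        if phenotype not in best or mean > best[phenotype][1]:
            best[phenotype] = (species, mean)
    return best


per_target = best_species_per_target(predicted_similarity)
per_phenotype = best_species_per_phenotype(predicted_similarity, target_phenotype)
```

In a real pipeline the score dictionary would be filled from the trained regressor's predictions rather than typed in by hand; the ranking logic itself does not depend on how the scores were produced.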

Source journal

Toxicological Sciences (Medicine: Toxicology)

CiteScore: 7.70

Self-citation rate: 7.90%

Articles published: 118

Review time: 1.5 months

About the journal: The mission of Toxicological Sciences, the official journal of the Society of Toxicology, is to publish a broad spectrum of impactful research in the field of toxicology. The primary focus of Toxicological Sciences is on original research articles. The journal also provides expert insight via contemporary and systematic reviews, as well as forum articles and editorial content that addresses important topics in the field. The scope of Toxicological Sciences is focused on a broad spectrum of impactful toxicological research that will advance the multidisciplinary field of toxicology ranging from basic research to model development and application, and decision making. Submissions will include diverse technologies and approaches including, but not limited to: bioinformatics and computational biology, biochemistry, exposure science, histopathology, mass spectrometry, molecular biology, population-based sciences, tissue and cell-based systems, and whole-animal studies. Integrative approaches that combine realistic exposure scenarios with impactful analyses that move the field forward are encouraged.