A Novel Computational Machine Learning Pipeline to Quantify Similarities in Three-Dimensional Protein Structures.
Authors: Shreyas U Hirway, Xiao Xu, Fan Fan
Journal: Toxicological Sciences (Journal Article, published 2025-01-17)
DOI: https://doi.org/10.1093/toxsci/kfaf007
Animal models are widely used during drug development. The selection of a suitable animal model relies on various factors, such as target biology, animal resource availability, and legacy species. It is imperative that the selected animal species most closely resemble humans, both in target biology and in the target protein itself. Current practice assesses cross-species protein similarity through pairwise comparison of protein sequences rather than the biologically relevant three-dimensional (3D) structure of proteins. We developed a novel quantitative machine learning pipeline using 3D structure-based feature data from the Protein Data Bank, nominal data from UniProt, and bioactivity data from ChEMBL, all of which were matched for human and animal data. Using an XGBoost regression model, similarity scores between targets were calculated, and based on these scores the best animal species for a target was identified. For real-world application, targets from an alternative source, i.e., AlphaFold, were tested using the model, and the animal species whose protein was most similar to its human counterpart was predicted. These targets were then grouped by their associated phenotype so that the pipeline could predict an optimal animal species.
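The final selection step described in the abstract (choose the best species per target, then per phenotype group) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the target names, species, scores, and phenotype labels are made up, and averaging scores within a phenotype group is an assumption, since the abstract does not say how grouped targets are combined. In the paper the scores come from an XGBoost regression model; here they are hard-coded.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical similarity scores between a human target and its animal
# counterpart. All names and numbers are illustrative placeholders.
SCORES = {
    ("TARGET_A", "mouse"): 0.82,
    ("TARGET_A", "rat"): 0.79,
    ("TARGET_A", "dog"): 0.91,
    ("TARGET_B", "mouse"): 0.88,
    ("TARGET_B", "rat"): 0.70,
    ("TARGET_B", "dog"): 0.93,
}

# Hypothetical mapping from target to associated phenotype.
PHENOTYPE = {"TARGET_A": "phenotype_1", "TARGET_B": "phenotype_1"}


def best_species_per_target(scores):
    """Return {target: (species, score)} for the highest-scoring species."""
    best = {}
    for (target, species), score in scores.items():
        if target not in best or score > best[target][1]:
            best[target] = (species, score)
    return best


def best_species_per_phenotype(scores, phenotype_of):
    """Return {phenotype: species} using the mean score across grouped targets.

    Using the mean is an assumption made for this sketch.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for (target, species), score in scores.items():
        grouped[phenotype_of[target]][species].append(score)
    result = {}
    for phenotype, species_scores in grouped.items():
        means = {sp: mean(vals) for sp, vals in species_scores.items()}
        result[phenotype] = max(means, key=means.get)
    return result


if __name__ == "__main__":
    print(best_species_per_target(SCORES))
    print(best_species_per_phenotype(SCORES, PHENOTYPE))
```

Other aggregation rules (for example, the minimum score across a group, to guard against one poorly matched target) would fit the same structure; the choice depends on how conservative the species selection needs to be.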
About the journal:
The mission of Toxicological Sciences, the official journal of the Society of Toxicology, is to publish a broad spectrum of impactful research in the field of toxicology.
The primary focus of Toxicological Sciences is on original research articles. The journal also provides expert insight via contemporary and systematic reviews, as well as forum articles and editorial content that addresses important topics in the field.
The scope of Toxicological Sciences covers a broad spectrum of impactful toxicological research that will advance the multidisciplinary field of toxicology, ranging from basic research to model development and application, and decision making. Submissions will include diverse technologies and approaches including, but not limited to: bioinformatics and computational biology, biochemistry, exposure science, histopathology, mass spectrometry, molecular biology, population-based sciences, tissue and cell-based systems, and whole-animal studies. Integrative approaches that combine realistic exposure scenarios with impactful analyses that move the field forward are encouraged.