Daniel Li, B. Tsui, Charles Xue, J. Haga, Koheix Ichikawa, S. Date
Protein Structure Modeling in a Grid Computing Environment
2013 IEEE 9th International Conference on e-Science, 2013-10-22. DOI: 10.1109/eScience.2013.15 (https://doi.org/10.1109/eScience.2013.15)
Abstract
Advances in sequencing technology have resulted in an exponential increase in the availability of protein sequence information. To fully utilize this information, it is important to translate the primary sequences into high-resolution tertiary protein structures. MODELLER is a leading homology modeling method that produces high-quality protein structures. In this study, the function of MODELLER was expanded by configuring and deploying it on a parallel grid computing platform using a custom four-step workflow. The workflow consisted of template selection through a protein BLAST algorithm, target-template protein sequence alignment, distribution of model generation jobs among the compute clusters, and final protein model optimization. To test the validity of this workflow, we used the Dual Specificity Phosphatase (DSP) protein family, whose members share high homology with one another. Comparison of the DSP member SSH-2 with its model counterpart revealed a difference of only 1.3% in output energy scores. Furthermore, the Dali Pairwise Comparison program demonstrated a 98% match among amino acid features and a Z-score of 26.6, indicating very significant similarity between the model and the actual protein structure. After confirming the accuracy of our workflow, we generated 23 previously unknown DSP family protein structure models. Over 40,000 models were generated 30 times faster than with conventional computing. Virtual receptor-ligand screening results for the modeled protein DSP21 were compared with those for two known structures that had either higher or lower structural homology to DSP21. There was a significant difference (p < 0.001) between the average ligand ranking discrepancy of the more homologous protein pair and that of the less homologous protein pair, suggesting that the generated protein models were sufficiently accurate for virtual screening.
These results demonstrate the accuracy and usability of a grid-enabled MODELLER program and the increased efficiency of processing protein structure models. This workflow will help increase the speed of future drug development pipelines.
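The job-distribution step of the workflow described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how jobs were divided, so proportional splitting by core count is an assumption, and the `Cluster` type, the `split_jobs` function, the cluster names, and the core counts are all hypothetical. Only the figure of roughly 40,000 models comes from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    name: str   # hypothetical cluster label
    cores: int  # hypothetical number of usable cores


def split_jobs(n_models: int, clusters: list[Cluster]) -> dict[str, range]:
    """Assign contiguous model-index ranges to clusters, proportional to cores."""
    if n_models < 0:
        raise ValueError("n_models must be non-negative")
    total_cores = sum(c.cores for c in clusters)
    if total_cores <= 0:
        raise ValueError("clusters must provide at least one core")

    assignment: dict[str, range] = {}
    start = 0
    cores_seen = 0
    for cluster in clusters:
        cores_seen += cluster.cores
        # Cumulative integer rounding: the final range always ends at n_models,
        # so every model index is assigned exactly once.
        end = (n_models * cores_seen) // total_cores
        assignment[cluster.name] = range(start, end)
        start = end
    return assignment


if __name__ == "__main__":
    # Hypothetical grid of three clusters sharing 40,000 model-generation jobs.
    grid = [Cluster("site-a", 64), Cluster("site-b", 32), Cluster("site-c", 32)]
    for name, jobs in split_jobs(40_000, grid).items():
        print(f"{name}: models {jobs.start}..{jobs.stop - 1} ({len(jobs)} jobs)")
```

Each cluster would then run its range of model-generation jobs independently, with the resulting models collected for the final optimization step. A real deployment would likely also need to handle job failures and uneven run times, which this sketch ignores.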