Protein Structure Modeling in a Grid Computing Environment

2013 IEEE 9th International Conference on e-Science. Pub Date: 2013-10-22. DOI: 10.1109/eScience.2013.15

Daniel Li, B. Tsui, Charles Xue, J. Haga, Koheix Ichikawa, S. Date


Abstract

Advances in sequencing technology have resulted in an exponential increase in the availability of protein sequence information. To fully utilize this information, it is important to translate primary sequences into high-resolution tertiary protein structures. MODELLER is a leading homology modeling program that produces high-quality protein structures. In this study, the functionality of MODELLER was expanded by configuring and deploying it on a parallel grid computing platform using a custom four-step workflow. The workflow consisted of template selection through a protein BLAST algorithm, target-template protein sequence alignment, distribution of model generation jobs among the compute clusters, and final protein model optimization. To test the validity of this workflow, we used the Dual Specificity Phosphatase (DSP) protein family, whose members share high homology with one another. Comparison of the DSP member SSH-2 with its model counterpart revealed a minimal 1.3% difference in output energy scores. Furthermore, the Dali Pairwise Comparison Program demonstrated a 98% match among amino acid features and a Z-score of 26.6, indicating very significant similarity between the model and the actual protein structure. After confirming the accuracy of our workflow, we generated 23 previously unknown DSP family protein structure models. Over 40,000 models were generated 30 times faster than with conventional computing. Virtual receptor-ligand screening results of the modeled protein DSP21 were compared with those of two known structures that had either higher or lower structural homology to DSP21. There was a significant difference (p < 0.001) between the average ligand ranking discrepancy of the more homologous protein pair and that of the less homologous protein pair, suggesting that the generated protein models were sufficiently accurate for virtual screening. These results demonstrate the accuracy and usability of a grid-enabled MODELLER program and the increased efficiency of processing protein structure models. This workflow will help increase the speed of future drug development pipelines.
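
The third workflow step, distribution of model generation jobs among the compute clusters, can be illustrated with a minimal sketch. The abstract does not describe the scheduling scheme, so the even, contiguous split below is an assumption, and the names `ModelJob`, `split_jobs`, and the cluster labels are hypothetical. The sketch only plans which range of model indices each cluster would build; it does not call MODELLER or any grid middleware.

```python
from typing import List, NamedTuple, Sequence


class ModelJob(NamedTuple):
    """One batch of models for one target, assigned to one cluster (hypothetical type)."""
    target: str
    cluster: str
    first_model: int  # 1-based index of the first model in the batch
    last_model: int   # inclusive index of the last model in the batch


def split_jobs(targets: Sequence[str],
               clusters: Sequence[str],
               models_per_target: int) -> List[ModelJob]:
    """Divide each target's models into near-equal contiguous batches, one per cluster.

    Assumption: an even split. A real grid scheduler might instead weight
    clusters by core count or current load.
    """
    if not clusters:
        raise ValueError("at least one cluster is required")
    if models_per_target < 0:
        raise ValueError("models_per_target must be non-negative")

    # Every cluster gets `base` models; the first `extra` clusters get one more.
    base, extra = divmod(models_per_target, len(clusters))
    jobs: List[ModelJob] = []
    for target in targets:
        start = 1
        for i, cluster in enumerate(clusters):
            size = base + (1 if i < extra else 0)
            if size == 0:
                continue  # more clusters than models: skip idle clusters
            jobs.append(ModelJob(target, cluster, start, start + size - 1))
            start += size
    return jobs


if __name__ == "__main__":
    # Illustrative numbers only; the per-target model count is not given in the abstract.
    for job in split_jobs(["DSP21"], ["cluster-a", "cluster-b", "cluster-c"], 10):
        print(job)
```

Each resulting batch could then be handed to a separate model-building run on its cluster, with the results gathered for the final optimization step.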
