Using SAGA and the Open Science Grid to Search for Aptamers

Proceedings of XSEDE16 : Diversity, Big Data, and Science at Scale : July 17-21, 2016, Intercontinental Miami Hotel, Miami, Florida, USA. Conference on Extreme Science and Engineering Discovery Environment (5th : 2016 : Miami, Fla.) Pub Date : 2014-07-13 DOI:10.1145/2616498.2616517

Kevin R. Shieh, Pilib Ó Broin, David Rhee, M. Levy, A. Golden

Pages: 27:1-27:4

Citations: 1

Abstract


RNA aptamers are small oligonucleotide molecules whose composition and resulting folded structure enable them to bind target ligands with high affinity and high selectivity; they therefore hold great promise as potential therapeutic drugs. Functional aptamers are selected from a large, randomized initial library in a process known as SELEX (systematic evolution of ligands by exponential enrichment). This is an iterative process involving numerous rounds of binding, elution, and amplification against a specific target substrate. During each iteration, or round of selection, we enrich for the species with the highest binding affinity to the target. After multiple rounds, we ideally have an enriched aptamer library suitable for subsequent investigation. Modern techniques employ massively parallel sequencing, enabling the generation of large libraries (~10^6 sequences) in a matter of hours for each round of selection. As RNA is single-stranded, covariance models (CMs) are ideal for representing motifs in its secondary structure, allowing us to discover patterns within functional aptamer populations following each round. CMs have been implemented in Infernal, a program that infers RNA alignments based on RNA sequence and structure. Calibrating a single CM in Infernal can take several hours and is a significant performance bottleneck for our work. However, because each CM calculation is independent of the others and requires defined processing and memory resources, computing them in parallel offers a potential solution to this problem. In this paper, we describe using the Open Science Grid (OSG) to facilitate the identification of aptamer motifs by running CM calibrations and refinements in parallel across up to ten OSG clients. We use the Simple API for Grid Applications (SAGA) to interface with OSG and manage job submissions and file transfers. When run in parallel, our results show a significant speedup, constrained by the typical latencies and quality of service (QoS) associated with nominal OSG usage. Our work demonstrates the ability of SAGA and the OSG to assist in parallelizing solutions to complex sequencing-based biomedical challenges.
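
The parallelization idea in the abstract, that each CM calibration is independent and can run at the same time as the others, can be illustrated with a minimal local sketch. This is not the authors' SAGA/OSG code: it uses Python's standard thread pool as a stand-in for grid job submission, the file names are hypothetical, and it only builds the command lines rather than executing them. The use of Infernal's `cmcalibrate` program as the calibration command is an assumption, since the abstract does not name the command.

```python
from concurrent.futures import ThreadPoolExecutor

# The abstract mentions up to ten OSG clients; here this is just a local cap.
MAX_WORKERS = 10


def build_calibration_command(cm_path):
    """Return the command line for calibrating one CM file (not executed here).

    Assumption: calibration is done with Infernal's `cmcalibrate <cmfile>`.
    """
    return ["cmcalibrate", cm_path]


def calibrate(cm_path):
    """Placeholder for one independent calibration job.

    A real workflow would submit this command as a grid job and wait for it;
    this sketch only returns the command so it runs without Infernal installed.
    """
    return cm_path, build_calibration_command(cm_path)


def calibrate_all(cm_paths, max_workers=MAX_WORKERS):
    """Run the independent calibration tasks concurrently, keeping input order."""
    if not cm_paths:
        return []
    workers = min(max_workers, len(cm_paths))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the inputs.
        return list(pool.map(calibrate, cm_paths))


if __name__ == "__main__":
    # Hypothetical CM file names, one per candidate motif.
    models = [f"motif_{i:02d}.cm" for i in range(12)]
    for path, cmd in calibrate_all(models):
        print(path, "->", " ".join(cmd))
```

Because the tasks share no state, the achievable speedup in a setup like this is bounded by the number of workers and by per-job overhead, which matches the abstract's remark that the gain is constrained by grid latencies and QoS.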
