Exploring Scalable Computing Architectures for Interactions Analysis

2018 27th International Conference on Computer Communication and Networks (ICCCN) Pub Date : 2018-07-01 DOI:10.1109/ICCCN.2018.8487405

Taruna Seth, Chao Feng, M. Ramanathan, V. Chaudhary


Citation count: 0

Abstract

Characterizing the pharmacological signal transduction that leads to drug-induced expression of genes and proteins requires the ability to identify interactions among different potential predictor components, e.g., genomic, clinical, and environmental data. Detecting these gene-gene and gene-environment interactions remains challenging because of the exponential computational complexity and high dimensionality of the interaction problem. The problem is further exacerbated by the involvement of very large-scale epidemiological datasets. Efficient high-order interaction analysis of such large-scale data is not feasible with traditional frameworks. Parallel implementations of such applications in traditional cluster environments are often inefficient because of storage bandwidth and network I/O limitations. Scalable distributed platforms can offer better scalability for such problems than cluster architectures. Moreover, such data- and compute-intensive problems can benefit even further from data-intensive supercomputing (DISC) architectures, which have been shown to yield superior performance compared to commonly used cluster platforms. In this paper, we evaluate the applicability of different architectures to the Interaction Analysis problem, such as traditional server-based distributed architectures supported on commodity hardware and shared-nothing architectures with massively parallel processing capabilities. Our experiments show that the benefits of the massively parallel processing, shared-nothing architecture outweigh those often realized through traditional server-based and even distributed computing architectures. We conclude that the rapidly growing class of shared-nothing architectures offers a potentially efficient and viable alternative for high-order interaction analysis involving extremely large-scale biological datasets, and is well suited to this category of data- and compute-intensive problems that cannot be addressed effectively using traditional frameworks.
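To make the scaling argument concrete, the sketch below is an illustration only and is not taken from the paper. It counts the candidate predictor combinations at a given interaction order and shows one simple way such candidates could be split across independent workers that share no state. The function names `count_interactions` and `partition_combinations`, and the round-robin assignment, are assumptions made for this example; the abstract does not describe the authors' actual partitioning scheme.

```python
from itertools import combinations, islice
from math import comb


def count_interactions(n_predictors: int, order: int) -> int:
    """Number of distinct predictor subsets of size `order`, i.e. C(n, k)."""
    return comb(n_predictors, order)


def partition_combinations(n_predictors: int, order: int,
                           n_workers: int, worker_id: int):
    """Lazily yield the combinations assigned to one worker.

    Hypothetical round-robin scheme: worker w takes every n_workers-th
    combination starting at position w. Each worker needs only these four
    integers, so no state is shared between workers.
    """
    all_combos = combinations(range(n_predictors), order)
    return islice(all_combos, worker_id, None, n_workers)


if __name__ == "__main__":
    # Growth of the search space with interaction order for 1000 predictors.
    for k in (2, 3, 4):
        print(k, count_interactions(1000, k))

    # Split the 2-way candidates over 5 predictors among 3 workers.
    for w in range(3):
        print(w, list(partition_combinations(5, 2, 3, w)))
```

With 1000 predictors there are 499500 pairwise candidates and 166167000 three-way candidates, and summing over all orders from 1 to n gives 2^n - 1 subsets, which is the exponential growth the abstract refers to. Round-robin assignment is chosen here only because it is easy to verify that the partitions are disjoint and complete; a production system would more likely assign contiguous index ranges so that each worker can skip directly to its share.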

