Exploring Scalable Computing Architectures for Interactions Analysis
Taruna Seth, Chao Feng, M. Ramanathan, V. Chaudhary
2018 27th International Conference on Computer Communication and Networks (ICCCN)
DOI: 10.1109/ICCCN.2018.8487405
Abstract
Characterizing the pharmacological signal transductions that lead to drug-induced expression of genes and proteins requires the ability to identify interactions among different potential predictors, e.g. genomic, clinical, and environmental data. Detecting these gene-gene and gene-environment interactions remains challenging because of the exponential computational complexity and high dimensionality of the interaction problem. Very large-scale epidemiological datasets exacerbate the problem further. Efficient high-order interaction analysis of such large-scale data is not feasible with traditional frameworks. Parallel implementations of such applications in traditional cluster environments are often inefficient because of storage-bandwidth and network I/O limitations. Scalable distributed platforms can scale better on such problems than cluster architectures. Moreover, these data- and compute-intensive problems can benefit even further from data-intensive supercomputing (DISC) architectures, which have been shown to outperform commonly used cluster platforms. In this paper, we evaluate how well different architectures apply to the interaction analysis problem, including traditional server-based distributed architectures running on commodity hardware and shared-nothing architectures with massively parallel processing (MPP) capabilities. Our experiments show that the benefits of the massively parallel processing, shared-nothing architecture outweigh those often realized with traditional server-based and even distributed computing architectures.
We conclude that the rapidly growing class of shared-nothing architectures offers a potentially efficient and viable alternative for high-order interaction analysis of extremely large-scale biological datasets, and is well suited to data- and compute-intensive problems that traditional frameworks cannot address effectively.
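The abstract attributes the difficulty to the combinatorial size of the search. Among p predictors there are C(p, k) distinct k-way interactions, so even 1,000 predictors yield C(1000, 3) = 166,167,000 three-way candidates. The Python sketch below illustrates that count and one hypothetical way to split the search in a shared-nothing style: each node scores only its own disjoint partition of combinations, keeps no shared state, and returns a single small local result for a final merge. The scoring statistic, the round-robin partitioning, and all function names are illustrative assumptions, not the authors' method or any platform's API. The "nodes" are simulated sequentially in one process.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Result = Tuple[float, Tuple[int, ...]]  # (score, predictor indices)


def num_interactions(p: int, k: int) -> int:
    """Number of distinct k-way interactions among p predictors: C(p, k)."""
    return math.comb(p, k)


def interaction_score(X: Sequence[Sequence[float]], y: Sequence[float],
                      combo: Tuple[int, ...]) -> float:
    """Hypothetical placeholder statistic (not the paper's): absolute
    covariance between y and the product of the selected predictor columns."""
    n = len(y)
    feature = [math.prod(row[j] for j in combo) for row in X]
    mean_f = sum(feature) / n
    mean_y = sum(y) / n
    cov = sum((f - mean_f) * (t - mean_y) for f, t in zip(feature, y)) / n
    return abs(cov)


def node_task(X, y, k: int, node_id: int, n_nodes: int) -> Result:
    """Work for one shared-nothing node: score only the combinations whose
    enumeration index falls in this node's round-robin partition."""
    best: Result = (-1.0, ())
    combos = itertools.combinations(range(len(X[0])), k)
    for idx, combo in enumerate(combos):
        if idx % n_nodes != node_id:
            continue  # belongs to another node's partition
        # Tuple max breaks score ties by combo, so the merge is order-independent.
        best = max(best, (interaction_score(X, y, combo), combo))
    return best


def run_shared_nothing(X, y, k: int, n_nodes: int) -> Result:
    """Simulate n_nodes independent nodes, then merge their local bests."""
    local_results: List[Result] = [
        node_task(X, y, k, node_id, n_nodes) for node_id in range(n_nodes)
    ]
    # Coordinator step: only one small result per node is combined.
    return max(local_results)


if __name__ == "__main__":
    print(num_interactions(1000, 3))  # 166167000
    X = [[(i * j + i) % 3 for j in range(5)] for i in range(20)]
    y = [row[1] * row[3] for row in X]
    print(run_shared_nothing(X, y, k=2, n_nodes=3))
```

Because the partitions are disjoint and together cover every combination, the merged result matches a serial scan regardless of the node count. This mirrors the shared-nothing property the abstract relies on: nodes never exchange intermediate data, and only the final per-node summaries cross the network.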