{"title":"Database engine integration and performance analysis of the BigDAWG polystore system","authors":"Katherine Yu, V. Gadepally, M. Stonebraker","doi":"10.1109/HPEC.2017.8091081","DOIUrl":"https://doi.org/10.1109/HPEC.2017.8091081","url":null,"abstract":"The BigDAWG polystore database system aims to address workloads dealing with large, heterogeneous datasets. The need for such a system is motivated by an increase in Big Data applications dealing with disparate types of data, from large scale analytics to realtime data streams to text-based records, each suited for different storage engines. These applications often perform cross-engine queries on correlated data, resulting in complex query planning, data migration, and execution. One such application is a medical application built by the Intel Science and Technology Center (ISTC) on data collected from an intensive care unit (ICU). We present work done to add support for two commonly used database engines, Vertica and MySQL, to the BigDAWG system, as well as results and analysis from performance evaluation of the system using the TPC-H benchmark.","PeriodicalId":364903,"journal":{"name":"2017 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"108 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115617053","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient parallel streaming algorithms for large-scale inverse problems","authors":"H. Sundar","doi":"10.1109/HPEC.2017.8091033","DOIUrl":"https://doi.org/10.1109/HPEC.2017.8091033","url":null,"abstract":"Large-scale inverse problems and uncertainty quantification (UQ), i.e., quantifying uncertainties in complex mathematical models and their large-scale computational implementations, is one of the outstanding challenges in computational science and will be a driver for the acquisition of future supercomputers. These methods generate significant amounts of simulation data that is used by other parts of the computation in a complex fashion, requiring either large inmemory storage and/or redundant computations. We present a streaming algorithm for such computation that achieves high performance without requiring additional in-memory storage or additional computations. By reducing the memory footprint of the application we are able to achieve a significant speedup (∼3×) by operating in a more favorable region of the strong scaling curve.","PeriodicalId":364903,"journal":{"name":"2017 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121941324","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An introduction to an array memory processor for application specific acceleration","authors":"G. Pechanek, N. Pitsianis","doi":"10.1109/HPEC.2017.8091069","DOIUrl":"https://doi.org/10.1109/HPEC.2017.8091069","url":null,"abstract":"In this paper, we introduce an Array Memory (AM) processor. The AM processor uses a shared memory network amenable to on-chip 3D stacking. Node couplings use a 1 to K adjacency of connections in each dimension of communication of an array of nodes, such as an R×C array where R ≥ K and C ≥ K and K is a positive odd integer. This design also provides data sharing between processors within sub-arrays of the R × C array to support high-performance programmable application specific processing. A new instruction set architecture is proposed that has arithmetic instructions that do not require the specification of any source or target operand addresses. The source operands and target values are provided by separate load, store, and arithmetic instructions that are appropriately scheduled with the arithmetic instruction to be executed to reduce the storage of temporary variables for lower power implementations.","PeriodicalId":364903,"journal":{"name":"2017 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117031953","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Software-defined extreme scale networks for bigdata applications","authors":"Haitham Ghalwash, Chun-Hsi Huang","doi":"10.1109/HPEC.2017.8091087","DOIUrl":"https://doi.org/10.1109/HPEC.2017.8091087","url":null,"abstract":"Software-Defined Networking (SDN) is an emerging technology that supports recent network applications. An SDN redefines networks by introducing the concept of decoupling the control plane from the data plane, thus providing centralized management, programmability, and dynamic reconfiguration. In this research, we specifically investigate the significance of using SDNs in support of Big-Data applications. SDN proved to support Big-Data applications through a more efficient use of distributed nodes. With Hadoop as an example of a Big-Data application, we investigate the performance in terms of throughput and execution time for the read/write and sorting operations. The experiments take into consideration different network sizes of a Fat-tree topology. A Hadoop multi-node cluster is installed in Docker containers connected through a Fat-tree of OpenFlow switches. The packet forwarding is either by way of an SDN controller or the normal packet switching rules. Experimental results show that using an SDN controller outperforms normal forwarding by the switches. As a result, our research suggests that using SDN controllers has a strong potential to greatly enhance the performance of Big-Data applications on extreme-scale networks.","PeriodicalId":364903,"journal":{"name":"2017 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"5 21","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113955708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scalable stochastic block partition","authors":"Ahsen J. Uppal, Guy Swope, H. H. Huang, The George Washington","doi":"10.1109/HPEC.2017.8091050","DOIUrl":"https://doi.org/10.1109/HPEC.2017.8091050","url":null,"abstract":"The processing of graph data at large scale, though important and useful for real-world applications, continues to be challenging, particularly for problems such as graph partitioning. Optimal graph partitioning is NP-hard, but several methods provide approximate solutions in reasonable time. Yet scaling these approximate algorithms is also challenging. In this paper, we describe our efforts towards improving the scalability of one such technique, stochastic block partition, which is the baseline algorithm for the IEEE HPEC Graph Challenge [1]. Our key contributions are: improvements to the parallelization of the baseline bottom-up algorithm, especially the Markov Chain Monte Carlo (MCMC) nodal updates for Bayesian inference; a new top-down divide and conquer algorithm capable of reducing the algorithmic complexity of static partitioning and also suitable for streaming partitioning; a parallel single-node multi-CPU implementation and a parallel multi-node MPI implementation. Although our focus is on algorithmic scalability, our Python implementation obtains a speedup of 1.65× over the fastest baseline parallel C++ run at a graph size of 100k vertices divided into 8 subgraphs on a multi-CPU single node machine. It achieves a speedup of 61× over itself on a cluster of 4 machines with 256 CPUs for a 20k node graph divided into 4 subgraphs, and 441× speedup over itself on a 50k node graph divided into 8 subgraphs on a multi-CPU single node machine.","PeriodicalId":364903,"journal":{"name":"2017 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127653851","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A quantitative and qualitative analysis of tensor decompositions on spatiotemporal data","authors":"Thomas Henretty, M. Baskaran, J. Ezick, David Bruns-Smith, T. Simon","doi":"10.1109/HPEC.2017.8091028","DOIUrl":"https://doi.org/10.1109/HPEC.2017.8091028","url":null,"abstract":"With the recent explosion of systems capable of generating and storing large quantities of GPS data, there is an opportunity to develop novel techniques for analyzing and gaining meaningful insights into this spatiotemporal data. In this paper we examine the application of tensor decompositions, a high-dimensional data analysis technique, to georeferenced data sets. Guidance is provided on fitting spatiotemporal data into the tensor model and analyzing the results. We find that tensor decompositions provide insight and that future research into spatiotemporal tensor decompositions for pattern detection, clustering, and anomaly detection is warranted.","PeriodicalId":364903,"journal":{"name":"2017 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115825239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distributed-memory fast maximal independent set","authors":"Thejaka Amila Kanewala, Marcin Zalewski, A. Lumsdaine","doi":"10.1109/HPEC.2017.8091032","DOIUrl":"https://doi.org/10.1109/HPEC.2017.8091032","url":null,"abstract":"The Maximal Independent Set (MIS) graph problem arises in many applications such as computer vision, information theory, molecular biology, and process scheduling. The growing scale of MIS problems suggests the use of distributed-memory hardware as a cost-effective approach to providing necessary compute and memory resources. Luby proposed four randomized algorithms to solve the MIS problem. All those algorithms are designed focusing on shared-memory machines and are analyzed using the PRAM model. These algorithms do not have direct efficient distributed-memory implementations. In this paper, we extend two of Luby's seminal MIS algorithms, \"Luby(A)\" and \"Luby(B),\" to distributed-memory execution, and we evaluate their performance. We compare our results with the \"Filtered MIS\" implementation in the Combinatorial BLAS library for two types of synthetic graph inputs.","PeriodicalId":364903,"journal":{"name":"2017 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117233116","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"xDCI, a data science cyberinfrastructure for interdisciplinary research","authors":"A. Krishnamurthy, K. Bradford, Chris Calloway, C. Castillo, Mike C. Conway, Jason Coposky, Yue Guo, R. Idaszak, W. Lenhardt, K. Robasky, Terrell G. Russell, Erik Scott, Marcin Sliwowski, M. Stealey, Kelsey Urgo, Hao Xu, H. Yi, S. Ahalt","doi":"10.1109/HPEC.2017.8091022","DOIUrl":"https://doi.org/10.1109/HPEC.2017.8091022","url":null,"abstract":"This paper introduces xDCI, a Data Science Cyberinfrastructure to support research in a number of scientific domains including genomics, environmental science, biomedical and health science, and social science. xDCI leverages open-source software packages such as the integrated Rule Oriented Data System and the CyVerse Discovery Environment to address significant challenges in data storage, sharing, analysis, and visualization. We provide three example applications to evaluate xDCI for different domains: analysis of 3D images of mice brains, video analysis of neonatal resuscitation, and risk analytics. Finally, we conclude with a discussion of potential improvements to xDCI.","PeriodicalId":364903,"journal":{"name":"2017 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114192964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Memory-efficient parallel tensor decompositions","authors":"M. Baskaran, Thomas Henretty, B. Pradelle, M. H. Langston, David Bruns-Smith, J. Ezick, R. Lethin","doi":"10.1109/HPEC.2017.8091026","DOIUrl":"https://doi.org/10.1109/HPEC.2017.8091026","url":null,"abstract":"Tensor decompositions are a powerful technique for enabling comprehensive and complete analysis of real-world data. Data analysis through tensor decompositions involves intensive computations over large-scale irregular sparse data. Optimizing the execution of such data intensive computations is key to reducing the time-to-solution (or response time) in real-world data analysis applications. As high-performance computing (HPC) systems are increasingly used for data analysis applications, it is becoming increasingly important to optimize sparse tensor computations and execute them efficiently on modern and advanced HPC systems. In addition to utilizing the large processing capability of HPC systems, it is crucial to improve memory performance (memory usage, communication, synchronization, memory reuse, and data locality) in HPC systems. In this paper, we present multiple optimizations that are targeted towards faster and memory-efficient execution of large-scale tensor analysis on HPC systems. We demonstrate that our techniques achieve reduction in memory usage and execution time of tensor decomposition methods when they are applied on multiple datasets of varied size and structure from different application domains. We achieve up to 11× reduction in memory usage and up to 7× improvement in performance. More importantly, we enable the application of large tensor decompositions on some important datasets on a multi-core system that would not have been feasible without our optimization.","PeriodicalId":364903,"journal":{"name":"2017 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126164376","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A distributed algorithm for the efficient computation of the unified model of social influence on massive datasets","authors":"Alex Popa, M. Frîncu, C. Chelmis","doi":"10.1109/HPEC.2017.8091084","DOIUrl":"https://doi.org/10.1109/HPEC.2017.8091084","url":null,"abstract":"Online social networks offer a rich data source for analyzing diffusion processes including rumor and viral spreading in communities. While many models exist, a unified model which enables analytical computation of complex, nonlinear phenomena while considering multiple factors was only recently proposed. We design an optimized implementation of the unified model of influence for vertex centric graph processing distributed platforms such as Apache Giraph. We validate and test the weak and strong scalability of our implementation on a Google Cloud Platform Hadoop and a Giraph installation using two real datasets. Results show a ∼3.2× performance improvement over the single node runtime on the same platform. We also analyze the cost of achieving this speedup on public clouds as well as the impact of the underlying platform and the requirement of having more distributed nodes to process the same dataset as compared to a shared memory system.","PeriodicalId":364903,"journal":{"name":"2017 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125980887","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}