W. Badgett, K. Biery, C. Green, J. Kowalkowski, K. Maeshima, M. Paterno, R. Roser
{"title":"Poster: High-Speed Decision Making on Live Petabyte Data Streams","authors":"W. Badgett, K. Biery, C. Green, J. Kowalkowski, K. Maeshima, M. Paterno, R. Roser","doi":"10.1109/SC.Companion.2012.218","DOIUrl":"https://doi.org/10.1109/SC.Companion.2012.218","url":null,"abstract":"High Energy Physics has a long history of coping with cutting-edge data rates in its efforts to extract meaning from experimental data. The quantity of data from planned future experiments that must be analyzed practically in real-time to enable efficient filtering and storage of the scientifically interesting data has driven the development of sophisticated techniques which leverage technologies such as MPI, OpenMP and Intel TBB. We show the evolution of data collection, triggering and filtering from the 1990s with TeVatron experiments into the future of Intensity Frontier and Cosmic Frontier experiments and show how the requirements of upcoming experiments lead us to the development of high-performance streaming triggerless DAQ systems.","PeriodicalId":6346,"journal":{"name":"2012 SC Companion: High Performance Computing, Networking Storage and Analysis","volume":"22 1","pages":"1404-1404"},"PeriodicalIF":0.0,"publicationDate":"2012-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84406460","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Many-Core Accelerated LIBOR Swaption Portfolio Pricing","authors":"Jörg Lotze, P. Sutton, Hicham Lahlou","doi":"10.1109/SC.Companion.2012.143","DOIUrl":"https://doi.org/10.1109/SC.Companion.2012.143","url":null,"abstract":"This paper describes the acceleration of a MonteCarlo algorithm for pricing a LIBOR swaption portfolio using multi-core CPUs and GPUs. Speedups of up to 305x are achieved on two Nvidia Tesla M2050 GPUs and up to 20.8x on two Intel Xeon E5620 CPUs, compared to a sequential CPU implementation. This performance is achieved by using the Xcelerit platform - writing sequential, high-level C++ code and adopting a simple dataflow programming model. It avoids the complexity involved when using low-level high-performance computing frameworks such as OpenMP, OpenCL, CUDA, or SIMD intrinsics. The paper provides an overview of the Xcelerit platform, details how high performance is achieved through various automatic optimisation and parallelisation techniques, and shows how the tool can be used to implement portable accelerated Monte-Carlo algorithms in finance. It illustrates the implementation of the Monte-Carlo LIBOR swaption portfolio pricer and gives performance results. A comparison of the Xcelerit platform implementation with an equivalent low-level CUDA version shows that the overhead introduced is less than 1.5% in all scenarios.","PeriodicalId":6346,"journal":{"name":"2012 SC Companion: High Performance Computing, Networking Storage and Analysis","volume":"30 1","pages":"1185-1192"},"PeriodicalIF":0.0,"publicationDate":"2012-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83404584","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Abstract: Autonomic Modeling of Data-Driven Application Behavior","authors":"S. Monteiro, G. Bronevetsky, Marc Casas","doi":"10.1109/SC.Companion.2012.277","DOIUrl":"https://doi.org/10.1109/SC.Companion.2012.277","url":null,"abstract":"Computational behavior of large-scale data driven applications is a complex function of their input, configuration settings, and underlying system architecture. Difficulty in predicting the behavior of these applications makes it challenging to optimize their performance and schedule them onto compute resources. However, manually diagnosing performance problems and reconfiguring resource settings to improve application performance is infeasible and inefficient. We thus need autonomic optimization techniques that observe the application, learn from the observations, and subsequently successfully predict application behavior across different systems and load scenarios. This work presents a modular modeling approach for complex data-driven applications using statistical techniques. These techniques capture important characteristics of input data, consequent dynamic application behavior and system properties to predict application behavior with minimum human intervention. The work demonstrates how to adaptively structure and configure the models based on the observed complexity of application behavior in different input and execution scenarios.","PeriodicalId":6346,"journal":{"name":"2012 SC Companion: High Performance Computing, Networking Storage and Analysis","volume":"38 1","pages":"1485-1486"},"PeriodicalIF":0.0,"publicationDate":"2012-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85631644","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Qi Hu, N. Gumerov, Rio Yokota, L. Barba, R. Duraiswami
{"title":"Abstract: Scalable Fast Multipole Methods for Vortex Element Methods","authors":"Qi Hu, N. Gumerov, Rio Yokota, L. Barba, R. Duraiswami","doi":"10.1109/SC.COMPANION.2012.221","DOIUrl":"https://doi.org/10.1109/SC.COMPANION.2012.221","url":null,"abstract":"We use a particle-based method to simulate incompressible flows, where the Fast Multipole Method (FMM) is used to accelerate the calculation of particle interactions. The most time-consuming kernels-the Biot-Savart equation and stretching term of the vorticity equation-are mathematically reformulated so that only two Laplace scalar potentials are used instead of six, while automatically ensuring divergence-free far-field computation. Based on this formulation, and on our previous work for a scalar heterogeneous FMM algorithm, we develop a new FMM-based vortex method capable of simulating general flows including turbulence on heterogeneous architectures. Our work for this poster focuses on the computation perspective and our implementation can perform one time step of the velocity+stretching for one billion particles on 32 nodes in 55.9 seconds, which yields 49.12 Tflop/s.","PeriodicalId":6346,"journal":{"name":"2012 SC Companion: High Performance Computing, Networking Storage and Analysis","volume":"34 1","pages":"1408-1408"},"PeriodicalIF":0.0,"publicationDate":"2012-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86476942","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Herbert Huber, A. Auweter, T. Wilde, G. Meijer, Charles Archer, Torsten Bloth, Achim Bomelburg, S. Waitz
{"title":"Case Study: LRZ Liquid Cooling, Energy Management, Contract Specialities","authors":"Herbert Huber, A. Auweter, T. Wilde, G. Meijer, Charles Archer, Torsten Bloth, Achim Bomelburg, S. Waitz","doi":"10.1109/SC.Companion.2012.123","DOIUrl":"https://doi.org/10.1109/SC.Companion.2012.123","url":null,"abstract":"This presentation explores energy management, liquid cooling and heat re-use as well as contract specialities for LRZ: Leibniz-Rechenzentrum.","PeriodicalId":6346,"journal":{"name":"2012 SC Companion: High Performance Computing, Networking Storage and Analysis","volume":"49 1","pages":"962-992"},"PeriodicalIF":0.0,"publicationDate":"2012-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82231010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Babak Behzad, Joey Huchette, Huong Luu, R. Aydt, Q. Koziol, Prabhat, S. Byna, M. Chaarawi, Yushu Yao
{"title":"Abstract: Auto-Tuning of Parallel IO Parameters for HDF5 Applications","authors":"Babak Behzad, Joey Huchette, Huong Luu, R. Aydt, Q. Koziol, Prabhat, S. Byna, M. Chaarawi, Yushu Yao","doi":"10.1109/SC.Companion.2012.236","DOIUrl":"https://doi.org/10.1109/SC.Companion.2012.236","url":null,"abstract":"Parallel I/O is an unavoidable part of modern high-performance computing (HPC), but its system-wide dependencies means it has eluded optimization across platforms and applications. This can introduce bottlenecks in otherwise computationally efficient code, especially as scientific computing becomes increasingly data-driven. Various studies have shown that dramatic improvements are possible when the parameters are set appropriately. However, as a result of having multiple layers in the HPC I/O stack - each with its own optimization parameters-and nontrivial execution time for a test run, finding the optimal parameter values is a very complex problem. Additionally, optimal sets do not necessarily translate between use cases, since tuning I/O performance can be highly dependent on the individual application, the problem size, and the compute platform being used. Tunable parameters are exposed primarily at three levels in the I/O stack: the system, middleware, and high-level data-organization layers. HPC systems need a parallel file system, such as Lustre, to intelligently store data in a parallelized fashion. Middleware communication layers, such as MPI-IO, support this kind of parallel I/O and offer a variety of optimizations, such as collective buffering. Scientists and application developers often use HDF5, a high-level cross-platform I/O library that offers a hierarchical object-database representation of scientific data.","PeriodicalId":6346,"journal":{"name":"2012 SC Companion: High Performance Computing, Networking Storage and Analysis","volume":"48 1","pages":"1430-1430"},"PeriodicalIF":0.0,"publicationDate":"2012-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81694451","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bytes and BTUs: Keys to a Net Zero","authors":"S. Hammond","doi":"10.1109/SC.Companion.2012.125","DOIUrl":"https://doi.org/10.1109/SC.Companion.2012.125","url":null,"abstract":"","PeriodicalId":6346,"journal":{"name":"2012 SC Companion: High Performance Computing, Networking Storage and Analysis","volume":"10 1","pages":"1018-1039"},"PeriodicalIF":0.0,"publicationDate":"2012-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81980075","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimizing Local File Accesses for FUSE-Based Distributed Storage","authors":"Shun Ishiguro, J. Murakami, Y. Oyama, O. Tatebe","doi":"10.1109/SC.Companion.2012.104","DOIUrl":"https://doi.org/10.1109/SC.Companion.2012.104","url":null,"abstract":"Modern distributed file systems can store huge amounts of information while retaining the benefits of high reliability and performance. Many of these systems are prototyped with FUSE, a popular framework for implementing user-level file systems. Unfortunately, when these systems are mounted on a client that uses FUSE, they suffer from I/O overhead caused by extra memory copies and context switches during local file access. Overhead imposed by FUSE on file systems is not small and becomes more pronounced during local file access. This overhead may significantly degrade the performance of data-intensive applications running with distributed file systems that aggressively use local storage. In this paper, we propose a mechanism that achieves rapid local file access in FUSE-based distributed file systems by reducing the number of memory copies and context switches. We then incorporate the mechanism into the FUSE framework and demonstrate its effectiveness through experiments, using the Gfarm distributed file system.","PeriodicalId":6346,"journal":{"name":"2012 SC Companion: High Performance Computing, Networking Storage and Analysis","volume":"368 1","pages":"760-765"},"PeriodicalIF":0.0,"publicationDate":"2012-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76608769","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ryusuke Egawa, J. Tada, Yusuke Endo, H. Takizawa, Hiroaki Kobayashi
{"title":"Abstract: Exploring Design Space of a 3D Stacked Vector Cache","authors":"Ryusuke Egawa, J. Tada, Yusuke Endo, H. Takizawa, Hiroaki Kobayashi","doi":"10.1109/SC.Companion.2012.270","DOIUrl":"https://doi.org/10.1109/SC.Companion.2012.270","url":null,"abstract":"Although 3D integration technologies with through silicon vias (TSVs) have expected to overcome the memory and power wall problems in the future microprocessor design, there is no promising EDA tools to design 3D integrated VLSIs. In addition, effects of 3D integration on microprocessor design have not been discussed well. Under this situation, this paper presents design approach of 3D stacked cache memories using existing EDA tools, and shows early performances evaluation of 3D stacked cache memories for vector processors.","PeriodicalId":6346,"journal":{"name":"2012 SC Companion: High Performance Computing, Networking Storage and Analysis","volume":"81 1","pages":"1475-1476"},"PeriodicalIF":0.0,"publicationDate":"2012-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90036785","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The long term impact of codesign","authors":"A. Gara","doi":"10.1109/SC.Companion.2012.357","DOIUrl":"https://doi.org/10.1109/SC.Companion.2012.357","url":null,"abstract":"","PeriodicalId":6346,"journal":{"name":"2012 SC Companion: High Performance Computing, Networking Storage and Analysis","volume":"70 1-2","pages":"2212-2246"},"PeriodicalIF":0.0,"publicationDate":"2012-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72624775","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}