{"title":"Wallaby: A scalable semantic configuration service for grids and clouds","authors":"W. C. Benton, Robert H. Rati, Erik J. Erlandson","doi":"10.1145/2063348.2063362","DOIUrl":"https://doi.org/10.1145/2063348.2063362","url":null,"abstract":"Job schedulers for grids and clouds can offer great generality and configurability, but they typically do so at the cost of increased administrator complexity. In this paper, we present Wallaby, an open-source, scalable configuration service for compute resources managed by the Condor high-throughput computing system. Wallaby offers several notable advantages over similar systems: it lets administrators write declarative specifications of user-visible functionality on groups of nodes instead of low-level configuration file fragments; it presents a high-level semantic model of Condor features and their interactions and dependencies; it validates configurations before pushing them to nodes; it supports version control, \"undo,\" and configuration differencing; and it includes a networked API that enables extensions and advanced functionality. Wallaby allows administrators to extend pools to include more physical, virtual, or cloud nodes with minimal explicit configuration. Finally, it is scalable, supporting pools consisting of thousands of nodes with hundreds of configuration parameters each.","PeriodicalId":358797,"journal":{"name":"2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123985821","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sheng Li, Ke Chen, Ming-yu Hsieh, Naveen Muralimanohar, C. Kersey, J. Brockman, Arun Rodrigues, N. Jouppi
{"title":"System implications of memory reliability in exascale computing","authors":"Sheng Li, Ke Chen, Ming-yu Hsieh, Naveen Muralimanohar, C. Kersey, J. Brockman, Arun Rodrigues, N. Jouppi","doi":"10.1145/2063384.2063445","DOIUrl":"https://doi.org/10.1145/2063384.2063445","url":null,"abstract":"Resiliency will be one of the toughest challenges in future exascale systems. Memory errors contribute more than 40% of the total hardware-related failures and are projected to increase in future exascale systems. The use of error correction codes (ECC) and checkpointing are two effective approaches to fault tolerance. While there are numerous studies on ECC or checkpointing in isolation, this is the first paper to investigate the combined effect of both on overall system performance and power. Specifically, we study the impact of various ECC schemes (SECDED, BCH, and chipkill) in conjunction with checkpointing on future exascale systems. Our simulation results show that while chipkill is 13% better for computation-intensive applications, BCH has a 28% advantage in system energy-delay product (EDP) for memory-intensive applications. We also propose to use BCH in tagged memory systems with commodity DRAMs where chipkill is impractical. Our proposed architecture achieves 2.3× better system EDP than state-of-the-art tagged memory systems.","PeriodicalId":358797,"journal":{"name":"2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128681287","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
J. Greenfield, L. Ice, S. Corwell, K. Haskell, C. Pavlakos, J. Noe
{"title":"One stop high performance computing user support at SNL","authors":"J. Greenfield, L. Ice, S. Corwell, K. Haskell, C. Pavlakos, J. Noe","doi":"10.1145/2063348.2063383","DOIUrl":"https://doi.org/10.1145/2063348.2063383","url":null,"abstract":"To improve the quality of user support for scientific, engineering, and high performance computing customers, the HPC OneStop Team unified the customer support activities of ten separate groups at Sandia National Laboratories (SNL). To our user communities, this team has been successful in providing a single, \"one stop\" interface for all engineering and scientific computing support, for everything from scientific applications on workstations, through small cluster operations, to large problems on the largest capability systems. To the service providers, HPC OneStop has promoted synergies, reduced redundancy of ticketing tools, and improved the capabilities for sharing problems and solutions among groups. HPC OneStop successfully accomplished the task of providing a \"one stop shop\" for our customers by: creating a unified portal for information access, integrating one ticketing tool to help improve collaboration among the various support groups, and developing a tiered HPC support structure focused on the customer.","PeriodicalId":358797,"journal":{"name":"2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116152633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Brandt M. Westing, B. Urick, M. Esteva, Freddy Rojas, Weijia Xu
{"title":"Integrating multi-touch in high-resolution display environments","authors":"Brandt M. Westing, B. Urick, M. Esteva, Freddy Rojas, Weijia Xu","doi":"10.1145/2063348.2063359","DOIUrl":"https://doi.org/10.1145/2063348.2063359","url":null,"abstract":"High-resolution display environments consisting of many individual displays arrayed to form a single visible surface are commonly used to present large scale data. Using these displays often involves a control paradigm where interactions become cumbersome and non-intuitive. By combining high-resolution displays with multi-touch and gesture interactive hardware, researchers can explore data more naturally, efficiently and collaboratively. This fusion of technology is necessary to effectively use tiled-display environments and mediate their primary weakness: interaction. In order to realize these objectives, a team at the Texas Advanced Computing Center (TACC) developed an economical display system using a combination of commodity hardware and customized software. In this paper we explain the requirements, design process, functions and best practices for constructing such displays. In addition, we explain how these systems can be used effectively with application examples.","PeriodicalId":358797,"journal":{"name":"2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127259425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chao Mei, Yanhua Sun, G. Zheng, Eric J. Bohm, L. Kalé, James C. Phillips, Christopher B. Harrison
{"title":"Enabling and scaling biomolecular simulations of 100 million atoms on petascale machines with a multicore-optimized message-driven runtime","authors":"Chao Mei, Yanhua Sun, G. Zheng, Eric J. Bohm, L. Kalé, James C. Phillips, Christopher B. Harrison","doi":"10.1145/2063384.2063466","DOIUrl":"https://doi.org/10.1145/2063384.2063466","url":null,"abstract":"A 100-million-atom biomolecular simulation with NAMD is one of the three benchmarks for the NSF-funded sustainable petascale machine. Simulating this large molecular system on a petascale machine presents great challenges, including handling I/O, large memory footprint and getting good strong-scaling results. In this paper, we present parallel I/O techniques to enable the simulation. A new SMP model is designed to efficiently utilize ubiquitous wide multicore clusters by extending the Charm++ asynchronous message-driven runtime. We exploit node-aware techniques to optimize both the application and the underlying SMP runtime. Hierarchical load balancing is further exploited to scale NAMD to the full Jaguar PF Cray XT5 (224,076 cores) at Oak Ridge National Laboratory, both with and without PME full electrostatics, achieving 93% parallel efficiency (vs 6720 cores) at 9 ms per step for a simple cutoff calculation. Excellent scaling is also obtained on 65,536 cores of the Intrepid Blue Gene/P at Argonne National Laboratory.","PeriodicalId":358797,"journal":{"name":"2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC)","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129137709","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SCMFS: A file system for Storage Class Memory","authors":"XiaoJian Wu, Sheng Qiu, A. Reddy","doi":"10.1145/2063384.2063436","DOIUrl":"https://doi.org/10.1145/2063384.2063436","url":null,"abstract":"This paper considers the problem of how to implement a file system on Storage Class Memory (SCM), that is directly connected to the memory bus, byte addressable and is also non-volatile. In this paper, we propose a new file system, called SCMFS, which is implemented on the virtual address space. In SCMFS, we utilize the existing memory management module in the operating system to do the block management and keep the space always contiguous for each file. The simplicity of SCMFS not only makes it easy to implement, but also improves the performance. We have implemented a prototype in Linux and evaluated its performance through multiple benchmarks.","PeriodicalId":358797,"journal":{"name":"2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116882032","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
K. Kowalski, S. Krishnamoorthy, R. M. Olson, V. Tipparaju, E. Aprá
{"title":"Scalable implementations of accurate excited-state coupled cluster theories: Application of high-level methods to porphyrin-based systems","authors":"K. Kowalski, S. Krishnamoorthy, R. M. Olson, V. Tipparaju, E. Aprá","doi":"10.1145/2063384.2063481","DOIUrl":"https://doi.org/10.1145/2063384.2063481","url":null,"abstract":"The development of reliable tools for excited-state simulations is very important for understanding complex processes in the broad class of light harvesting systems and optoelectronic devices. Over the last years we have been developing equation of motion coupled cluster (EOMCC) methods capable of tackling these problems. In this paper we discuss the parallel performance of EOMCC codes which provide accurate description of excited-state correlation effects. Two aspects are discussed in detail: (1) a new algorithm for the iterative EOMCC methods based on improved parallel task scheduling algorithms, and (2) parallel algorithms for the non-iterative methods describing the effect of triply excited configurations. We demonstrate that the most computationally intensive non-iterative part can take advantage of 210,000 cores of the Cray XT5 system at the Oak Ridge Leadership Computing Facility (OLCF), achieving over 80% parallel efficiency. In particular, we demonstrate the importance of the computationally demanding non-iterative many-body methods in matching experimental level of accuracy for several porphyrin-based systems.","PeriodicalId":358797,"journal":{"name":"2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117289879","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rory C. Kelly, Siddhartha S. Ghosh, Si Liu, D. D. Vento, R. Valent
{"title":"The NWSC benchmark suite: Using scientific throughput to measure supercomputer performance","authors":"Rory C. Kelly, Siddhartha S. Ghosh, Si Liu, D. D. Vento, R. Valent","doi":"10.1145/2063348.2063358","DOIUrl":"https://doi.org/10.1145/2063348.2063358","url":null,"abstract":"The NCAR-Wyoming Supercomputing Center (NWSC) will begin operating in June 2012, and will house NCAR's next generation HPC system. The NWSC will support a broad spectrum of Earth Science research drawn from a user community with diverse requirements for computing, storage, and data analysis resources. To ensure that the NWSC satisfies the needs of this community, the procurement benchmarking process was driven by science requirements from the start. We will discuss the science objectives for NWSC, translating scientific goals into technical requirements for a machine, and assembling a benchmark suite from community science models and synthetic tests to measure the technical capabilities of the proposed HPC systems. We will also talk about the benchmark analysis process, extending the benchmark suite as a testing tool over the life of the machine, and the applicability of the NWSC benchmarking suite to other HPC centers.","PeriodicalId":358797,"journal":{"name":"2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115503903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yandong Wang, Xinyu Que, Weikuan Yu, Dror Goldenberg, Dhiraj Sehgal
{"title":"Hadoop acceleration through network levitated merge","authors":"Yandong Wang, Xinyu Que, Weikuan Yu, Dror Goldenberg, Dhiraj Sehgal","doi":"10.1145/2063384.2063461","DOIUrl":"https://doi.org/10.1145/2063384.2063461","url":null,"abstract":"Hadoop is a popular open-source implementation of the MapReduce programming model for cloud computing. However, it faces a number of issues to achieve the best performance from the underlying system. These include a serialization barrier that delays the reduce phase, repetitive merges and disk access, and lack of capability to leverage latest high speed interconnects. We describe Hadoop-A, an acceleration framework that optimizes Hadoop with plugin components implemented in C++ for fast data movement, overcoming its existing limitations. A novel network-levitated merge algorithm is introduced to merge data without repetition and disk access. In addition, a full pipeline is designed to overlap the shuffle, merge and reduce phases. Our experimental results show that Hadoop-A doubles the data processing throughput of Hadoop, and reduces CPU utilization by more than 36%.","PeriodicalId":358797,"journal":{"name":"2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114178960","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chang-Seo Park, Koushik Sen, Paul H. Hargrove, Costin Iancu
{"title":"Efficient data race detection for distributed memory parallel programs","authors":"Chang-Seo Park, Koushik Sen, Paul H. Hargrove, Costin Iancu","doi":"10.1145/2063384.2063452","DOIUrl":"https://doi.org/10.1145/2063384.2063452","url":null,"abstract":"In this paper we present a precise data race detection technique for distributed memory parallel programs. Our technique, which we call Active Testing, builds on our previous work on race detection for shared memory Java and C programs and it handles programs written using shared memory approaches as well as bulk communication. Active testing works in two phases: in the first phase, it performs an imprecise dynamic analysis of an execution of the program and finds potential data races that could happen if the program is executed with a different thread schedule. In the second phase, active testing re-executes the program by actively controlling the thread schedule so that the data races reported in the first phase can be confirmed. A key highlight of our technique is that it can scalably handle distributed programs with bulk communication and single- and split-phase barriers. Another key feature of our technique is that it is precise: a data race confirmed by active testing is an actual data race present in the program; however, being a testing approach, our technique can miss actual data races. We implement the framework for the UPC programming language and demonstrate scalability up to a thousand cores for programs with both fine-grained and bulk (MPI style) communication. The tool confirms previously known bugs and uncovers several unknown ones. Our extensions capture constructs proposed in several modern programming languages for High Performance Computing, most notably non-blocking barriers and collectives.","PeriodicalId":358797,"journal":{"name":"2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC)","volume":"171 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114527306","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}