2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC): Latest Publications

Wallaby: A scalable semantic configuration service for grids and clouds
W. C. Benton, Robert H. Rati, Erik J. Erlandson
DOI: 10.1145/2063348.2063362 · Published: 2011-11-12
Abstract: Job schedulers for grids and clouds can offer great generality and configurability, but they typically do so at the cost of increased administrator complexity. In this paper, we present Wallaby, an open-source, scalable configuration service for compute resources managed by the Condor high-throughput computing system. Wallaby offers several notable advantages over similar systems: it lets administrators write declarative specifications of user-visible functionality on groups of nodes instead of low-level configuration file fragments; it presents a high-level semantic model of Condor features and their interactions and dependencies; it validates configurations before pushing them to nodes; it supports version control, "undo," and configuration differencing; and it includes a networked API that enables extensions and advanced functionality. Wallaby allows administrators to extend pools to include more physical, virtual, or cloud nodes with minimal explicit configuration. Finally, it is scalable, supporting pools consisting of thousands of nodes with hundreds of configuration parameters each.
Citations: 4
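To make the declarative idea concrete, here is a minimal sketch; the feature model, group structure, and function names are hypothetical stand-ins, not Wallaby's actual API. It shows what the abstract describes: a node group declared in terms of user-visible features, with dependency validation happening before any configuration is pushed.

```python
# Minimal sketch (not Wallaby's API): declare features on node groups and
# validate feature dependencies before materializing low-level config.

FEATURE_DEPENDENCIES = {          # hypothetical feature model
    "ExecuteNode": set(),
    "DynamicSlots": {"ExecuteNode"},
}

def validate(features: set[str]) -> list[str]:
    """Return unmet dependencies; an empty list means the group is pushable."""
    errors = []
    for feature in features:
        for dep in FEATURE_DEPENDENCIES.get(feature, set()):
            if dep not in features:
                errors.append(f"{feature} requires {dep}")
    return errors

group = {"name": "worker-pool", "features": {"ExecuteNode", "DynamicSlots"}}
problems = validate(group["features"])
if not problems:
    print(f"push validated config for {group['name']}")  # stand-in for the push
else:
    print("rejected:", problems)
```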
System implications of memory reliability in exascale computing
Sheng Li, Ke Chen, Ming-yu Hsieh, Naveen Muralimanohar, C. Kersey, J. Brockman, Arun Rodrigues, N. Jouppi
DOI: 10.1145/2063384.2063445 · Published: 2011-11-12
Abstract: Resiliency will be one of the toughest challenges in future exascale systems. Memory errors contribute more than 40% of the total hardware-related failures and are projected to increase in future exascale systems. Error correction codes (ECC) and checkpointing are two effective approaches to fault tolerance. While there are numerous studies on ECC or checkpointing in isolation, this is the first paper to investigate the combined effect of both on overall system performance and power. Specifically, we study the impact of various ECC schemes (SECDED, BCH, and chipkill) in conjunction with checkpointing on future exascale systems. Our simulation results show that while chipkill is 13% better for computation-intensive applications, BCH has a 28% advantage in system energy-delay product (EDP) for memory-intensive applications. We also propose to use BCH in tagged memory systems with commodity DRAMs where chipkill is impractical. Our proposed architecture achieves 2.3× better system EDP than state-of-the-art tagged memory systems.
Citations: 53
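The EDP comparisons above couple ECC strength (which shifts the effective memory MTBF) with checkpoint overhead. A back-of-the-envelope sketch of that interaction, using Young's classic approximation for the optimal checkpoint interval and illustrative numbers that are not the paper's:

```python
# Back-of-the-envelope model of system energy-delay product (EDP) under
# checkpointing. Stronger ECC -> longer MTBF -> less checkpoint overhead
# -> lower EDP. All numbers are illustrative, not from the paper.
import math

def optimal_interval(ckpt_cost_s: float, mtbf_s: float) -> float:
    """Young's approximation: tau ~ sqrt(2 * checkpoint_cost * MTBF)."""
    return math.sqrt(2.0 * ckpt_cost_s * mtbf_s)

def effective_runtime(work_s: float, ckpt_cost_s: float, mtbf_s: float) -> float:
    """Inflate solve time by checkpoint overhead (ignoring rework on failure)."""
    tau = optimal_interval(ckpt_cost_s, mtbf_s)
    return work_s * (1.0 + ckpt_cost_s / tau)

work, ckpt, power_w = 3600.0, 60.0, 2.0e6  # 1 h solve, 60 s checkpoints, 2 MW
for name, mtbf in [("weaker ECC", 3600.0), ("stronger ECC", 4 * 3600.0)]:
    t = effective_runtime(work, ckpt, mtbf)
    print(f"{name}: runtime {t:.0f} s, EDP {power_w * t * t:.3e} J*s")
```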
One stop high performance computing user support at SNL
J. Greenfield, L. Ice, S. Corwell, K. Haskell, C. Pavlakos, J. Noe
DOI: 10.1145/2063348.2063383 · Published: 2011-11-12
Abstract: To improve the quality of user support for scientific, engineering, and high performance computing customers, the HPC OneStop Team unified the customer support activities of ten separate groups at Sandia National Laboratories (SNL). To our user communities, this team has been successful in providing a single, "one stop" interface for all engineering and scientific computing support, for everything from scientific applications on workstations, through small cluster operations, to large problems on the largest capability systems. To the service providers, HPC OneStop has promoted synergies, reduced redundancy of ticketing tools, and improved the capabilities for sharing problems and solutions among groups. HPC OneStop successfully accomplished the task of providing a "one stop shop" for our customers by: creating a unified portal for information access, integrating one ticketing tool to help improve collaboration among the various support groups, and developing a tiered HPC support structure focused on the customer.
Citations: 0
Integrating multi-touch in high-resolution display environments
Brandt M. Westing, B. Urick, M. Esteva, Freddy Rojas, Weijia Xu
DOI: 10.1145/2063348.2063359 · Published: 2011-11-12
Abstract: High-resolution display environments consisting of many individual displays arrayed to form a single visible surface are commonly used to present large-scale data. Using these displays often involves a control paradigm where interactions become cumbersome and non-intuitive. By combining high-resolution displays with multi-touch and gesture interactive hardware, researchers can explore data more naturally, efficiently, and collaboratively. This fusion of technology is necessary to effectively use tiled-display environments and mitigate their primary weakness: interaction. In order to realize these objectives, a team at the Texas Advanced Computing Center (TACC) developed an economical display system using a combination of commodity hardware and customized software. In this paper we explain the requirements, design process, functions, and best practices for constructing such displays. In addition, we explain how these systems can be used effectively with application examples.
Citations: 10
Enabling and scaling biomolecular simulations of 100 million atoms on petascale machines with a multicore-optimized message-driven runtime
Chao Mei, Yanhua Sun, G. Zheng, Eric J. Bohm, L. Kalé, James C. Phillips, Christopher B. Harrison
DOI: 10.1145/2063384.2063466 · Published: 2011-11-12
Abstract: A 100-million-atom biomolecular simulation with NAMD is one of the three benchmarks for the NSF-funded sustainable petascale machine. Simulating this large molecular system on a petascale machine presents great challenges, including handling I/O, large memory footprint and getting good strong-scaling results. In this paper, we present parallel I/O techniques to enable the simulation. A new SMP model is designed to efficiently utilize ubiquitous wide multicore clusters by extending the Charm++ asynchronous message-driven runtime. We exploit node-aware techniques to optimize both the application and the underlying SMP runtime. Hierarchical load balancing is further exploited to scale NAMD to the full Jaguar PF Cray XT5 (224,076 cores) at Oak Ridge National Laboratory, both with and without PME full electrostatics, achieving 93% parallel efficiency (vs 6720 cores) at 9 ms per step for a simple cutoff calculation. Excellent scaling is also obtained on 65,536 cores of the Intrepid Blue Gene/P at Argonne National Laboratory.
Citations: 66
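The headline numbers admit a quick sanity check: strong-scaling parallel efficiency is the ratio of reference core-seconds to scaled core-seconds. The 279 ms reference step time below is inferred from the reported 93% and 9 ms figures; it is not stated in the abstract.

```python
# Strong-scaling arithmetic behind "93% parallel efficiency (vs 6720 cores)
# at 9 ms per step": efficiency = (t_ref * n_ref) / (t * n).
def strong_scaling_efficiency(t_ref: float, n_ref: int, t: float, n: int) -> float:
    return (t_ref * n_ref) / (t * n)

# 0.279 s/step at 6720 cores is the implied reference, not a reported number.
eff = strong_scaling_efficiency(t_ref=0.279, n_ref=6720, t=0.009, n=224076)
print(f"parallel efficiency: {eff:.0%}")   # ~93%
```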
SCMFS: A file system for Storage Class Memory
XiaoJian Wu, Sheng Qiu, A. Reddy
DOI: 10.1145/2063384.2063436 · Published: 2011-11-12
Abstract: This paper considers the problem of how to implement a file system on Storage Class Memory (SCM), which is directly connected to the memory bus, byte-addressable, and non-volatile. In this paper, we propose a new file system, called SCMFS, which is implemented on the virtual address space. In SCMFS, we utilize the existing memory management module in the operating system to do the block management and keep the space always contiguous for each file. The simplicity of SCMFS not only makes it easy to implement, but also improves the performance. We have implemented a prototype in Linux and evaluated its performance through multiple benchmarks.
Citations: 364
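A toy model of the design point the abstract describes: each file lives as a single contiguous extent in a flat address space, so block management reduces to extent bookkeeping and I/O to pointer arithmetic. This is a user-space sketch, not SCMFS's kernel implementation, which reuses the OS virtual memory manager and must also handle file growth and remapping.

```python
# Toy model (not SCMFS itself): files as contiguous extents in one flat
# address space, with reads/writes as plain offset arithmetic.
class ContiguousFS:
    def __init__(self, size: int):
        self.space = bytearray(size)   # stands in for memory-mapped SCM
        self.next_free = 0             # bump allocator for extents
        self.files = {}                # name -> (start, length)

    def create(self, name: str, length: int) -> None:
        self.files[name] = (self.next_free, length)  # one extent per file
        self.next_free += length

    def write(self, name: str, offset: int, data: bytes) -> None:
        start, length = self.files[name]
        assert offset + len(data) <= length, "toy model: fixed-size extents"
        self.space[start + offset : start + offset + len(data)] = data

    def read(self, name: str, offset: int, n: int) -> bytes:
        start, _ = self.files[name]
        return bytes(self.space[start + offset : start + offset + n])

fs = ContiguousFS(1 << 20)
fs.create("log", 4096)
fs.write("log", 0, b"hello, scm")
print(fs.read("log", 0, 10))
```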
Scalable implementations of accurate excited-state coupled cluster theories: Application of high-level methods to porphyrin-based systems
K. Kowalski, S. Krishnamoorthy, R. M. Olson, V. Tipparaju, E. Aprá
DOI: 10.1145/2063384.2063481 · Published: 2011-11-12
Abstract: The development of reliable tools for excited-state simulations is very important for understanding complex processes in the broad class of light harvesting systems and optoelectronic devices. Over the last few years we have been developing equation-of-motion coupled cluster (EOMCC) methods capable of tackling these problems. In this paper we discuss the parallel performance of EOMCC codes which provide an accurate description of excited-state correlation effects. Two aspects are discussed in detail: (1) a new algorithm for the iterative EOMCC methods based on improved parallel task scheduling algorithms, and (2) parallel algorithms for the non-iterative methods describing the effect of triply excited configurations. We demonstrate that the most computationally intensive non-iterative part can take advantage of 210,000 cores of the Cray XT5 system at the Oak Ridge Leadership Computing Facility (OLCF), achieving over 80% parallel efficiency. In particular, we demonstrate the importance of the computationally demanding non-iterative many-body methods in matching the experimental level of accuracy for several porphyrin-based systems.
Citations: 36
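The "parallel task scheduling" the abstract credits can be pictured generically: workers draw work units from a shared counter so load imbalance self-corrects. The sketch below is an illustrative threading model with hypothetical names, not NWChem's actual Global Arrays-based implementation.

```python
# Generic dynamic task scheduling sketch: workers fetch-and-increment a
# shared counter to claim work units (stand-ins for tensor-contraction
# tiles), so faster workers naturally absorb more tasks.
import itertools
import threading

tasks = [f"tile-{i}" for i in range(12)]   # hypothetical work units
counter = itertools.count()
lock = threading.Lock()

def worker(wid: int) -> None:
    while True:
        with lock:                         # atomic "fetch and increment"
            i = next(counter)
        if i >= len(tasks):
            return
        print(f"worker {wid} processes {tasks[i]}")

threads = [threading.Thread(target=worker, args=(w,)) for w in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```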
The NWSC benchmark suite: Using scientific throughput to measure supercomputer performance
Rory C. Kelly, Siddhartha S. Ghosh, Si Liu, D. D. Vento, R. Valent
DOI: 10.1145/2063348.2063358 · Published: 2011-11-12
Abstract: The NCAR-Wyoming Supercomputing Center (NWSC) will begin operating in June 2012, and will house NCAR's next generation HPC system. The NWSC will support a broad spectrum of Earth Science research drawn from a user community with diverse requirements for computing, storage, and data analysis resources. To ensure that the NWSC satisfies the needs of this community, the procurement benchmarking process was driven by science requirements from the start. We will discuss the science objectives for NWSC, translating scientific goals into technical requirements for a machine, and assembling a benchmark suite from community science models and synthetic tests to measure the technical capabilities of the proposed HPC systems. We will also talk about the benchmark analysis process, extending the benchmark suite as a testing tool over the life of the machine, and the applicability of the NWSC benchmarking suite to other HPC centers.
Citations: 1
Hadoop acceleration through network levitated merge
Yandong Wang, Xinyu Que, Weikuan Yu, Dror Goldenberg, Dhiraj Sehgal
DOI: 10.1145/2063384.2063461 · Published: 2011-11-12
Abstract: Hadoop is a popular open-source implementation of the MapReduce programming model for cloud computing. However, it faces a number of issues in achieving the best performance from the underlying system. These include a serialization barrier that delays the reduce phase, repetitive merges and disk access, and a lack of capability to leverage the latest high-speed interconnects. We describe Hadoop-A, an acceleration framework that optimizes Hadoop with plugin components implemented in C++ for fast data movement, overcoming its existing limitations. A novel network-levitated merge algorithm is introduced to merge data without repetition and disk access. In addition, a full pipeline is designed to overlap the shuffle, merge and reduce phases. Our experimental results show that Hadoop-A doubles the data processing throughput of Hadoop, and reduces CPU utilization by more than 36%.
Citations: 123
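The "network-levitated merge" can be pictured as a streaming k-way merge in which each mapper's sorted output is pulled over the network exactly once and never spilled to disk. A compact sketch of that merge pattern, with in-memory iterators standing in for remote fetches; this illustrates the idea, not Hadoop-A's C++ plugin:

```python
# Streaming k-way merge sketch: each sorted segment is consumed as a
# stream and merged exactly once, with no intermediate spill to disk.
import heapq

def remote_segment(records):
    """Hypothetical stand-in for a sorted map-output stream fetched remotely."""
    yield from records

segments = [
    remote_segment([("a", 1), ("c", 3), ("e", 5)]),
    remote_segment([("b", 2), ("c", 4), ("f", 6)]),
]

# heapq.merge keeps only one head record per segment in memory, so merged
# data can flow straight from the network into the reduce phase.
for key, value in heapq.merge(*segments):
    print(key, value)
```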
Efficient data race detection for distributed memory parallel programs
Chang-Seo Park, Koushik Sen, Paul H. Hargrove, Costin Iancu
DOI: 10.1145/2063384.2063452 · Published: 2011-11-12
Abstract: In this paper we present a precise data race detection technique for distributed memory parallel programs. Our technique, which we call Active Testing, builds on our previous work on race detection for shared memory Java and C programs and it handles programs written using shared memory approaches as well as bulk communication. Active testing works in two phases: in the first phase, it performs an imprecise dynamic analysis of an execution of the program and finds potential data races that could happen if the program is executed with a different thread schedule. In the second phase, active testing re-executes the program by actively controlling the thread schedule so that the data races reported in the first phase can be confirmed. A key highlight of our technique is that it can scalably handle distributed programs with bulk communication and single- and split-phase barriers. Another key feature of our technique is that it is precise — a data race confirmed by active testing is an actual data race present in the program; however, being a testing approach, our technique can miss actual data races. We implement the framework for the UPC programming language and demonstrate scalability up to a thousand cores for programs with both fine-grained and bulk (MPI style) communication. The tool confirms previously known bugs and uncovers several unknown ones. Our extensions capture constructs proposed in several modern programming languages for High Performance Computing, most notably non-blocking barriers and collectives.
Citations: 45
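Phase one of active testing is deliberately imprecise: it only needs to nominate candidate races for phase two to confirm by controlling the schedule. A condensed sketch of the classic candidate test (same address, different threads, at least one write) over a toy access trace; the real tool instruments UPC programs and also handles barriers and bulk transfers:

```python
# Phase-one sketch: flag pairs of accesses to the same address from
# different threads where at least one access is a write. Phase two
# (not shown) would re-execute, forcing each candidate pair to race.
from itertools import combinations

Access = tuple  # (thread_id, address, is_write)

def potential_races(trace: list[Access]) -> list[tuple[Access, Access]]:
    races = []
    for a, b in combinations(trace, 2):
        same_addr = a[1] == b[1]
        diff_thread = a[0] != b[0]
        some_write = a[2] or b[2]
        if same_addr and diff_thread and some_write:
            races.append((a, b))
    return races

trace = [(0, 0x10, True), (1, 0x10, False), (1, 0x20, True)]
print(potential_races(trace))   # the two accesses to 0x10 are a candidate race
```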