ACM/IEEE SC 2006 Conference (SC'06): Latest Papers

High-Performance and Scalable MPI over InfiniBand with Reduced Memory Usage: An In-Depth Performance Analysis
ACM/IEEE SC 2006 Conference (SC'06) Pub Date: 2006-11-11 DOI: 10.1145/1188455.1188565
S. Sur, Matthew J. Koop, D. Panda
Abstract: InfiniBand is an emerging HPC interconnect being deployed in very large scale clusters, with even larger InfiniBand-based clusters expected to be deployed in the near future. The Message Passing Interface (MPI) is the programming model of choice for scientific applications running on these large-scale clusters. Thus, it is critical for the MPI implementation used to be based on a scalable and high-performance design. We analyze the performance and scalability aspects of MVAPICH, a popular open-source MPI implementation on InfiniBand, from an application standpoint. We analyze the performance and memory requirements of the MPI library while executing several well-known applications and benchmarks, such as NAS, SuperLU, NAMD, and HPL, on a 64-node InfiniBand cluster. Our analysis reveals that the latest design of MVAPICH requires an order of magnitude less internal MPI memory (average per process) and yet delivers the best possible performance. Further, we observe that for the benchmarks and applications evaluated, the internal memory requirement of MVAPICH remains nearly constant at around 5-10 MB as the number of processes increases, indicating that the MVAPICH design is highly scalable.
Citations: 60
Blue Matter: Approaching the Limits of Concurrency for Classical Molecular Dynamics
ACM/IEEE SC 2006 Conference (SC'06) Pub Date: 2006-11-11 DOI: 10.1145/1188455.1188547
B. Fitch, A. Rayshubskiy, M. Eleftheriou, T. Ward, M. Giampapa, M. Pitman, R. Germain
Abstract: This paper describes a novel spatial-force decomposition for N-body simulations for which we observe O(sqrt(p)) communication scaling. This has enabled Blue Matter to approach the effective limits of concurrency for molecular dynamics using particle-mesh (FFT-based) methods for handling electrostatic interactions. Using this decomposition, Blue Matter running on Blue Gene/L has achieved simulation rates in excess of 1000 time steps per second and demonstrated significant speed-ups to O(1) atom per node. Blue Matter employs a communicating sequential process (CSP) style model with application communication state machines compiled to hardware interfaces. The scalability achieved has enabled methodologically rigorous biomolecular simulations on biologically interesting systems, such as membrane-bound proteins, whose time scales dwarf previous work on those systems. Major scaling improvements require exploration of alternative algorithms for treating the long-range electrostatics.
Citations: 55
Evaluation of a Workflow Scheduler Using Integrated Performance Modelling and Batch Queue Wait Time Prediction
ACM/IEEE SC 2006 Conference (SC'06) Pub Date: 2006-11-11 DOI: 10.1145/1188455.1188579
Daniel Nurmi, A. Mandal, J. Brevik, C. Koelbel, R. Wolski, K. Kennedy
Abstract: Large-scale distributed systems offer computational power at unprecedented levels. In the past, HPC users typically had access to relatively few individual supercomputers and, in general, would assign a one-to-one mapping of applications to machines. Modern HPC users have simultaneous access to a large number of individual machines and are beginning to make use of all of them for single-application execution cycles. One method that application developers have devised in order to take advantage of such systems is to organize an entire application execution cycle as a workflow. The scheduling of such workflows has been the topic of a great deal of research in the past few years and, although very sophisticated algorithms have been devised, a very specific aspect of these distributed systems, namely that most supercomputing resources employ batch queue scheduling software, has been omitted from consideration, presumably because it is difficult to model accurately. In this work, we augment an existing workflow scheduler through the introduction of methods which make accurate predictions of both the performance of the application on specific hardware and the amount of time individual workflow tasks would spend waiting in batch queues. Our results show that although a workflow scheduler alone may choose correct task placement based on data locality or network connectivity, this benefit is often compromised by the fact that most jobs submitted to current systems must wait in overcommitted batch queues for a significant portion of time. However, incorporating the enhancements we describe improves workflow execution time in settings where batch queues impose significant delays on constituent workflow tasks.
Citations: 70
A Memory Model for Scientific Algorithms on Graphics Processors
ACM/IEEE SC 2006 Conference (SC'06) Pub Date: 2006-11-11 DOI: 10.1145/1188455.1188549
N. Govindaraju, S. Larsen, J. Gray, Dinesh Manocha
Abstract: We present a memory model to analyze and improve the performance of scientific algorithms on graphics processing units (GPUs). Our memory model is based on texturing hardware, which uses a 2D block-based array representation to perform the underlying computations. We incorporate many characteristics of GPU architectures, including smaller cache sizes and 2D block representations, and use the 3C's model to analyze the cache misses. Moreover, we present techniques to improve the performance of nested loops on GPUs. In order to demonstrate the effectiveness of our model, we highlight its performance on three memory-intensive scientific applications: sorting, fast Fourier transform, and dense matrix multiplication. In practice, our cache-efficient algorithms for these applications are able to achieve memory throughput of 30-50 GB/s on an NVIDIA 7900 GTX GPU. We also compare our results with prior GPU-based and CPU-based implementations on high-end processors. In practice, we are able to achieve a 2-5x performance improvement.
Citations: 211
Toward Real-Time Image Guided Neurosurgery Using Distributed and Grid Computing
ACM/IEEE SC 2006 Conference (SC'06) Pub Date: 2006-11-11 DOI: 10.1145/1188455.1188536
N. Chrisochoides, Andrey Fedorov, A. Kot, N. Archip, P. Black, O. Clatz, A. Golby, R. Kikinis, S. Warfield
Abstract: Neurosurgical resection is a therapeutic intervention in the treatment of brain tumors. Precision of the resection can be improved by utilizing magnetic resonance imaging (MRI) as an aid in decision making during image guided neurosurgery (IGNS). Image registration adjusts pre-operative data according to intra-operative tissue deformation. Some of the approaches increase the registration accuracy by tracking image landmarks through the whole brain volume. High computational cost used to render these techniques inappropriate for clinical applications. In this paper we present a parallel implementation of a state-of-the-art registration method, and a number of needed incremental improvements. Overall, we reduced the response time for registration of an average dataset from about an hour, and for some cases more than an hour, to less than seven minutes, which is within the time constraints imposed by neurosurgeons. For the first time in clinical practice we demonstrated that, with the help of distributed computing, non-rigid MRI registration based on volume tracking can be computed intra-operatively.
Citations: 57
End-System Aware, Rate-Adaptive Protocol for Network Transport in LambdaGrid Environments
ACM/IEEE SC 2006 Conference (SC'06) Pub Date: 2006-11-11 DOI: 10.1145/1188455.1188572
P. Datta, W. Feng, Sushant Sharma
Abstract: Next-generation e-Science applications would require the ability to transfer information at high data rates between distributed computing centers and data repositories. A LambdaGrid offers dedicated, optical, circuit-switched, point-to-point connections that can be reserved exclusively for such applications. These dedicated high-speed connections eliminate network congestion as seen in the traditional Internet, but they effectively push the network congestion to the end systems, as processing speeds cannot keep up with networking speeds. Thus, developing an efficient transport protocol over such high-speed dedicated circuits is of critical importance. We propose the idea of an end-system aware, rate-adaptive protocol for network transport, based on end-system performance monitoring. Our proposed protocol significantly improves the performance of data transfer over LambdaGrids by intelligently adapting the sending rate based on end-system constraints. We demonstrate the effectiveness of our proposed protocol and illustrate the performance gains achieved via wide-area network emulation.
Citations: 6
Architectures and APIs: Assessing Requirements for Delivering FPGA Performance to Applications
ACM/IEEE SC 2006 Conference (SC'06) Pub Date: 2006-11-11 DOI: 10.1145/1188455.1188571
K. Underwood, K. Hemmert, C. Ulmer
Abstract: Reconfigurable computing leveraging field programmable gate arrays (FPGAs) is one of many accelerator technologies that are being investigated for application to high performance computing (HPC). Like most accelerators, FPGAs are very efficient at both dense matrix multiplication and FFT computations, but two important aspects of how to deliver that performance to applications have received too little attention. First, the standard API for important compute kernels hides parallelism from the system. Second, the issue of system architecture is virtually never addressed. This paper explores both issues and their implications for applications. We find that high-bandwidth, low-latency connectivity can be important, but the right API can be even more important.
Citations: 30
Designing a Highly-Scalable Operating System: The Blue Gene/L Story
ACM/IEEE SC 2006 Conference (SC'06) Pub Date: 2006-11-11 DOI: 10.1145/1188455.1188578
José E. Moreira, Michael Brutman, J. Castaños, Thomas Engelsiepen, M. Giampapa, Thomas Gooding, R. Haskin, T. Inglett, D. Lieber, P. McCarthy, M. Mundy, Jeff Parker, Brian P. Wallenfelt
Abstract: Blue Gene/L is currently the world's fastest and most scalable supercomputer. It has demonstrated essentially linear scaling all the way to 131,072 processors in several benchmarks and real applications. The operating systems for the compute and I/O nodes of Blue Gene/L are among the components responsible for that scalability. Compute nodes are dedicated to running application processes, whereas I/O nodes are dedicated to performing system functions. The operating systems adopted for each of these nodes reflect this separation of function. Compute nodes run a lightweight operating system called the compute node kernel. I/O nodes run a port of the Linux operating system. This paper discusses the architecture and design of this solution for Blue Gene/L in the context of the hardware characteristics that led to the design decisions. It also explains and demonstrates how those decisions are instrumental in achieving the performance and scalability for which Blue Gene/L is famous.
Citations: 66
Toward a Doctrine of Containment: Grid Hosting with Adaptive Resource Control
ACM/IEEE SC 2006 Conference (SC'06) Pub Date: 2006-11-11 DOI: 10.1145/1188455.1188561
L. Ramakrishnan, David E. Irwin, Laura E. Grit, Aydan R. Yumerefendi, Adriana Iamnitchi, J. Chase
Abstract: Grid computing environments need secure resource control and predictable service quality in order to be sustainable. We propose a grid hosting model in which independent, self-contained grid deployments run within isolated containers on shared resource provider sites. Sites and hosted grids interact via an underlying resource control plane to manage a dynamic binding of computational resources to containers. We present a prototype grid hosting system, in which a set of independent Globus grids share a network of cluster sites. Each grid instance runs a coordinator that leases and configures cluster resources for its grid on demand. Experiments demonstrate adaptive provisioning of cluster resources and contrast job-level and container-level resource management in the context of two grid application managers.
Citations: 71
Evaluating Grid Portal Security
ACM/IEEE SC 2006 Conference (SC'06) Pub Date: 2006-11-11 DOI: 10.1145/1188455.1188574
D. Vecchio, Victor Hazlewood, M. Humphrey
Abstract: Grid portals are an increasingly popular mechanism for creating customizable, Web-based interfaces to grid services and resources. Due to the powerful, general-purpose nature of grid technology, the security of any portal or entry point to such resources cannot be taken lightly. This is particularly true if the portal is running inside of a trusted perimeter, such as a science gateway running on an SDSC machine for access to the TeraGrid. To evaluate the current state of grid portal security, we undertake a comparative analysis of the three most popular grid portal frameworks that are being pursued as frontends to the TeraGrid: GridSphere, OGCE, and Clarens. We explore general challenges that grid portals face in the areas of authentication (including user identification), authorization, auditing (logging), and session management, then contrast how the different grid portal implementations address these challenges. We find that although most grid portals address these security concerns to a certain extent, there is still room for improvement, particularly in the areas of secure default configurations and comprehensive logging and auditing support. We conclude with specific recommendations for designing, implementing, and configuring secure grid portals.
Citations: 19