ACM/IEEE SC 2001 Conference (SC'01): Latest Publications

Delivering Acceleration: The Potential for Increased HPC Application Performance Using Reconfigurable Logic
ACM/IEEE SC 2001 Conference (SC'01) Pub Date: 2001-11-10 DOI: 10.1145/582034.582066
D. Caliga, David Barker
Abstract: SRC Computers, Inc. has integrated adaptive computing into its SRC-6 high-end server, incorporating reconfigurable processors as peers to the microprocessors. Performance improvements resulting from reconfigurable computing can provide orders of magnitude speedups for a wide variety of algorithms. Reconfigurable logic in Field Programmable Gate Arrays (FPGAs) has shown great advantage to date in special purpose applications and specialty hardware. SRC Computers is working to bring this technology into the general purpose HPC world via an advanced system interconnect and enhanced compiler technology.
Citations: 14
Parallel Implementation and Performance of FastDNAml - A Program for Maximum Likelihood Phylogenetic Inference
ACM/IEEE SC 2001 Conference (SC'01) Pub Date: 2001-11-10 DOI: 10.1145/582034.582054
C. Stewart, Dave Hart, Donald K. Berry, G. Olsen, E. Wernert, William Fischer
Abstract: This paper describes the parallel implementation of fastDNAml, a program for the maximum likelihood inference of phylogenetic trees from DNA sequence data. Mathematical means of inferring phylogenetic trees have been made possible by the wealth of DNA data now available. Maximum likelihood analysis of phylogenetic trees is extremely computationally intensive. Availability of computer resources is a key factor limiting use of such analyses. fastDNAml is implemented in serial, PVM, and MPI versions, and may be modified to use other message passing libraries in the future. We have developed a viewer for comparing phylogenies. We tested the scaling behavior of fastDNAml on an IBM RS/6000 SP up to 64 processors. The parallel version of fastDNAml is one of very few computational phylogenetics codes that scale well. fastDNAml is available for download as source code or compiled for Linux or AIX.
Citations: 111
Removing the Overhead from Software-Based Shared Memory
ACM/IEEE SC 2001 Conference (SC'01) Pub Date: 2001-11-10 DOI: 10.1145/582034.582090
Z. Radovic, Erik Hagersten
Abstract: The implementation presented in this paper, DSZOOM-WF, is a sequentially consistent, fine-grained distributed software-based shared memory. It demonstrates a protocol-handling overhead below a microsecond for all the actions involved in a remote load operation, to be compared to the fastest implementation to date of around ten microseconds. The all-software protocol is implemented assuming some basic low-level primitives in the cluster interconnect and an operating system bypass functionality, similar to the emerging InfiniBand standard. All interrupt- and/or poll-based asynchronous protocol processing is completely removed by running the entire coherence protocol in the requesting processor. This not only removes the asynchronous overhead, but also makes use of a processor that otherwise would stall. The technique is applicable to both page-based and fine-grain software-based shared memory. DSZOOM-WF consistently demonstrates performance comparable to hardware-based distributed shared memory implementations.
Citations: 29
Increasing Temporal Locality with Skewing and Recursive Blocking
ACM/IEEE SC 2001 Conference (SC'01) Pub Date: 2001-11-10 DOI: 10.1145/582034.582077
G. Jin, J. Mellor-Crummey, R. Fowler
Abstract: We present a strategy, called recursive prismatic time skewing, that increases temporal reuse at all memory hierarchy levels, thus improving the performance of scientific codes that use iterative methods. Prismatic time skewing partitions the iteration space of multiple loops into skewed prisms with both spatial and temporal (or convergence) dimensions. Novel aspects of this work include: multi-dimensional loop skewing; handling carried data dependences in the skewed loops without additional storage; bi-directional skewing to accommodate periodic boundary conditions; and an analysis and transformation strategy that works inter-procedurally. We combine prismatic skewing with a recursive blocking strategy to boost reuse at all levels in a memory hierarchy. A preliminary evaluation of these techniques shows significant performance improvements compared both to original codes and to methods described previously in the literature. With an inter-procedural application of our techniques, we were able to reduce total primary cache misses of a large application code by 27% and secondary cache misses by 119%.
Citations: 45
Improving Parallel Irregular Reductions Using Partial Array Expansion
ACM/IEEE SC 2001 Conference (SC'01) Pub Date: 2001-11-10 DOI: 10.1145/582034.582072
E. Gutiérrez, O. Plata, E. Zapata
Abstract: Much effort has been devoted recently to efficiently parallelizing irregular reductions. In this paper, parallelizing techniques for these computations are analyzed in terms of three performance aspects: parallelism, data locality and memory overhead. These aspects have a strong influence on the overall performance and scalability of the parallel code. We will discuss how the parallelization techniques usually try to optimize some of these aspects, while missing the other(s). We will show that by combining complementary techniques we can improve the overall performance/scalability of the parallel irregular reduction, obtaining an effective solution for large problems on large machines. Specifically, a combination of array expansion and a locality-oriented method (DWA-LIP), named partial array expansion, is introduced. An implementation of the proposed method is discussed, showing that the transformation that the compiler must apply to the irregular reduction code is not excessively complex. Finally, the method is analyzed and experimentally evaluated.
Citations: 12
Dynamic Page Placement to Improve Locality in CC-NUMA Multiprocessors for TPC-C
ACM/IEEE SC 2001 Conference (SC'01) Pub Date: 2001-11-10 DOI: 10.1145/582034.582067
Kenneth M. Wilson, B. Aglietti
Abstract: The use of CC-NUMA multiprocessors complicates the placement of physical memory pages. Memory closest to a processor provides the best access time, but optimal memory page placement is a difficult problem with process movement, multiple processes requiring access to the same physical memory page, and application behavior changing over execution time. We use dynamic page placement to move memory pages where needed for the database benchmark TPC-C executing on a four-node CC-NUMA multiprocessor. Dynamic page placement achieves local memory accesses up to 73% of the time instead of the static page placement results of 34% locality achieved with first touch and 25% with round robin. This can result in a 17% improvement in performance.
Citations: 34
Parallel Graphics and Interactivity with the Scaleable Graphics Engine
ACM/IEEE SC 2001 Conference (SC'01) Pub Date: 2001-11-10 DOI: 10.1145/582034.582039
Kenneth A. Perrine, Donald R. Jones
Abstract: A parallel rendering environment is being developed to utilize the IBM Scaleable Graphics Engine (SGE), a hardware frame buffer for parallel computers. Goals of this software development effort include finding efficient ways of producing and displaying graphics generated on IBM SP nodes and of assisting programmers in adapting or creating scientific simulation applications to use the SGE. Four software development phases discussed utilize the SGE: tunneling, SMP rendering, development of an OpenGL API implementation which utilizes the SGE in parallel environments, and additions to the SGE-enabled OpenGL implementation that uses threads. The performance observed in software tests shows that programmers would be able to utilize the SGE to output interactive graphics in a parallel environment.
Citations: 40
A Case Study in Application I/O on Linux Clusters
ACM/IEEE SC 2001 Conference (SC'01) Pub Date: 2001-11-10 DOI: 10.1145/582034.582045
R. Ross, Daniel Nurmi, A. Cheng, M. Zingale
Abstract: A critical but often ignored component of system performance is the I/O system. Today’s applications demand a great deal from underlying storage systems and software, and both high-performance distributed storage and high level interfaces have been developed to fill these needs. In this paper we discuss the I/O performance of a parallel scientific application on a Linux cluster, the FLASH astrophysics code. This application relies on three I/O software components to provide high-performance parallel I/O on Linux clusters: the Parallel Virtual File System, the ROMIO MPI-IO implementation, and the Hierarchical Data Format library. Through instrumentation of both the application and underlying system software code we discover the location of major software bottlenecks. We work around the most inhibiting of these bottlenecks, showing substantial performance improvement. We point out similarities between the inefficiencies found here and those found in message passing systems, indicating that research in the message passing field could be leveraged to solve similar problems in high-level I/O interfaces.
Citations: 57
The Sun Fireplane System Interconnect
ACM/IEEE SC 2001 Conference (SC'01) Pub Date: 2001-11-10 DOI: 10.1145/582034.582041
Alan E. Charlesworth
Abstract: System interconnect is a key determiner of the cost, performance, and reliability of large cache-coherent, shared-memory multiprocessors. Interconnect implementations have to accommodate ever greater numbers of ever faster processors. This paper describes the Sun™ Fireplane two-level cache-coherency protocol, and its use in the medium and large-sized UltraSPARC-III-based Sun Fire™ servers.
Citations: 71
On Using SCALEA for Performance Analysis of Distributed and Parallel Programs
ACM/IEEE SC 2001 Conference (SC'01) Pub Date: 2001-11-10 DOI: 10.1145/582034.582068
Hong Linh Truong, T. Fahringer, Georg Madsen, A. Malony, H. Moritsch, S. Shende
Abstract: In this paper we give an overview of SCALEA, which is a new performance analysis tool for OpenMP, MPI, HPF, and mixed parallel/distributed programs. SCALEA instruments, executes and measures programs and computes a variety of performance overheads based on a novel overhead classification. Source code and HW profiling are combined in a single system, which significantly extends the scope of possible overheads that can be measured and examined, ranging from HW counters, such as the number of cache misses or floating point operations, to more complex performance metrics, such as control or loss of parallelism. Moreover, SCALEA uses a new representation of code regions, called the dynamic code region call graph, which enables detailed overhead analysis for arbitrary code regions. An instrumentation description file is used to relate performance information to code regions of the input program and to reduce instrumentation overhead. Several experiments with realistic codes that cover MPI, OpenMP, HPF, and mixed OpenMP/MPI codes demonstrate the usefulness of SCALEA.
Citations: 29