2005 IEEE International Conference on Cluster Computing最新文献

筛选
英文 中文
Extracting Critical Path Graphs from MPI Applications 从MPI应用程序中提取关键路径图
2005 IEEE International Conference on Cluster Computing Pub Date : 2005-07-27 DOI: 10.1109/CLUSTR.2005.347035
M. Schulz
{"title":"Extracting Critical Path Graphs from MPI Applications","authors":"M. Schulz","doi":"10.1109/CLUSTR.2005.347035","DOIUrl":"https://doi.org/10.1109/CLUSTR.2005.347035","url":null,"abstract":"The critical path is one of the fundamental runtime characteristics of a parallel program. It identifies the longest execution sequence without wait delays. In other words, the critical path is the global execution path that inflicts wait operations on other nodes without itself being stalled. Hence, it dictates the overall runtime and knowing it is important to understand an application's runtime and message behavior and to target optimizations. We have developed a toolset that identifies the critical path of MPI applications, extracts it, and then produces a graphical representation of the corresponding program execution graph to visualize it. To implement this, we intercept all MPI library calls, use the information to build the relevant subset of the execution graph, and then extract the critical path from there. We have applied our technique to several scientific benchmarks and successfully produced critical path diagrams for applications running on up to 128 processors","PeriodicalId":255312,"journal":{"name":"2005 IEEE International Conference on Cluster Computing","volume":"109 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132801481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 52
Accelerating List Management for MPI 加速列表管理的MPI
2005 IEEE International Conference on Cluster Computing Pub Date : 2005-07-01 DOI: 10.1109/CLUSTR.2005.347036
K. Underwood, Arun Rodrigues, K. S. Hemmeit
{"title":"Accelerating List Management for MPI","authors":"K. Underwood, Arun Rodrigues, K. S. Hemmeit","doi":"10.1109/CLUSTR.2005.347036","DOIUrl":"https://doi.org/10.1109/CLUSTR.2005.347036","url":null,"abstract":"The latency and throughput of MPI messages are critically important to a range of parallel scientific applications. In many modern networks, both of these performance characteristics are largely driven by the performance of a processor on the network interface. Because of the semantics of MPI, this embedded processor is forced to traverse a linked list of posted receives each time a messages is received. As this list grows long, the latency of message reception grows and the throughput of MPI messages decreases. This paper presents a novel hardware feature to handle list management functions on a network interface. By moving functions such as list insertion, list traversal, and list deletion to the hardware unit, latencies are decreased by up to 20% in the zero length queue case with dramatic improvements in the presence of long queues. Similarly, the throughput is increased by up to 10% in the zero length queue case and by nearly 100% in the presence queues of 30 messages","PeriodicalId":255312,"journal":{"name":"2005 IEEE International Conference on Cluster Computing","volume":"339 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114728481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 6
Fast Query Processing by Distributing an Index over CPU Caches 通过在CPU缓存上分配索引来快速处理查询
2005 IEEE International Conference on Cluster Computing Pub Date : 2004-10-25 DOI: 10.1109/CLUSTR.2005.347047
Xiaoqin Ma, G. Cooperman
{"title":"Fast Query Processing by Distributing an Index over CPU Caches","authors":"Xiaoqin Ma, G. Cooperman","doi":"10.1109/CLUSTR.2005.347047","DOIUrl":"https://doi.org/10.1109/CLUSTR.2005.347047","url":null,"abstract":"Data intensive applications on clusters often require requests quickly be sent to the node managing the desired data. In many applications, one must look through a sorted tree structure to determine the responsible node for accessing or storing the data. Examples include object tracking in sensor networks, packet routing over the Internet, request processing in publish-subscribe middleware, and query processing in database systems. When the tree structure is larger than the CPU cache, the standard implementation potentially incurs many cache misses for each lookup; one cache miss at each successive level of the tree. As the CPU-RAM gap grows, this performance degradation will only become worse in the future. We propose a solution that takes advantage of the growing speed of local area networks for clusters. We split the sorted tree structure among the nodes of the cluster. We assume that the structure will fit inside the aggregation of the CPU caches of the entire cluster. We then send a word over the network (as part of a larger packet containing other words) in order to examine the tree structure in another node's CPU cache. We show that this is often faster than the standard solution, which locally incurs multiple cache misses while accessing each successive level of the tree. The principle is demonstrated with a cluster configured with Pentium III nodes connected with a Myrinet network. The new approach is shown to be 50% faster on this current cluster. In the future, the new approach is expected to have a still greater advantage as networks grow in speed, and as cache lines grow in length (greater cache miss penalty). This can be used to successfully overcome the inherent memory latency associated with cache misses","PeriodicalId":255312,"journal":{"name":"2005 IEEE International Conference on Cluster Computing","volume":"265 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126052746","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信