Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162): Latest Papers

An integrated processor management scheme for the mesh-connected multicomputer systems
Chung-Yen Chang, P. Mohapatra
{"title":"An integrated processor management scheme for the mesh-connected multicomputer systems","authors":"Chung-Yen Chang, P. Mohapatra","doi":"10.1109/ICPP.1997.622572","DOIUrl":"https://doi.org/10.1109/ICPP.1997.622572","url":null,"abstract":"The performance of a multicomputer system depends on the processor management strategy. Processor management deals with processor allocation and job scheduling. Most of the processor allocation and job scheduling schemes proposed in the literature incur high implementation complexity and are therefore impractical to be integrated. In this paper, we propose an integrated processor management scheme that includes a bypass-queue scheduling policy and a fixed-orientation allocation algorithm. Both policies have very low complexities and are hence suitable to be integrated. Both policies improve the system performance considerably when applied in isolation. The integrated scheme provides even better performance.","PeriodicalId":221761,"journal":{"name":"Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133052091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 13
Quantifying the effects of communication optimizations
Sung-Eun Choi, L. Snyder
{"title":"Quantifying the effects of communication optimizations","authors":"Sung-Eun Choi, L. Snyder","doi":"10.1109/ICPP.1997.622647","DOIUrl":"https://doi.org/10.1109/ICPP.1997.622647","url":null,"abstract":"Using a specially constructed machine independent communication optimizer that allows control over optimization selection, we quantify the performance benefit of three well known communication optimizations: redundant communication removal, communication combination, and communication pipelining. The numbers are shown relative to the base performance of benchmark programs using the standard communication optimization of message vectorization. The effects on the number of calls to communication routines, both static and dynamic, are tabulated. We consider a variety of communication primitives including those found in Intel's NX library, PVM and the T3D's SHMEM library. The results show substantial improvement, with two combinations of optimizations being most effective.","PeriodicalId":221761,"journal":{"name":"Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162)","volume":"173 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133408117","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 26
Multidimensional network performance with unidirectional links
James R. Anderson, S. Abraham
{"title":"Multidimensional network performance with unidirectional links","authors":"James R. Anderson, S. Abraham","doi":"10.1109/ICPP.1997.622544","DOIUrl":"https://doi.org/10.1109/ICPP.1997.622544","url":null,"abstract":"A stochastic analysis of multidimensional networks with unidirectional links between nodes is presented, which is more accurate than previous models and valid for the hypercube. The results are reconciled with those of previous researchers who have reported conflicting conclusions. In addition to the classic constraints of constant link width, pin-out, and bisection width, a new constraint, constant maximum throughput, is introduced. This constraint dramatizes the performance and cost trade-offs between different network topologies.","PeriodicalId":221761,"journal":{"name":"Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129980980","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
A parametrized branch-and-bound strategy for scheduling precedence-constrained tasks on a multiprocessor system
Jan Jonsson, K. Shin
{"title":"A parametrized branch-and-bound strategy for scheduling precedence-constrained tasks on a multiprocessor system","authors":"Jan Jonsson, K. Shin","doi":"10.1109/ICPP.1997.622580","DOIUrl":"https://doi.org/10.1109/ICPP.1997.622580","url":null,"abstract":"In this paper we experimentally evaluate the performance of a parametrized branch-and-bound (B&B) algorithm for scheduling real-time tasks on a multiprocessor system. The objective of the B&B algorithm is to minimize the maximum task lateness in the system. We show that a last-in-first-out (LIFO) vertex selection rule clearly outperforms the commonly used least-lower-bound (LLB) rule for the scheduling problem. We also present a new adaptive lower-bound cost function that greatly improves the performance of the B&B algorithm when parallelism in the application cannot be fully exploited on the multiprocessor architecture. Finally, we evaluate a set of heuristic strategies, one of which generates near-optimal results with performance guarantees and another of which generates approximate results without performance guarantees.","PeriodicalId":221761,"journal":{"name":"Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129586193","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 56
Reducing overheads of local communications in fine-grain parallel computation
Jin-Soo Kim, S. Ha, C. Jhon
{"title":"Reducing overheads of local communications in fine-grain parallel computation","authors":"Jin-Soo Kim, S. Ha, C. Jhon","doi":"10.1109/ICPP.1997.622648","DOIUrl":"https://doi.org/10.1109/ICPP.1997.622648","url":null,"abstract":"For fine-grain computation to be effective, the cost of communications between the large number of subtasks should be minimised. In this paper we present an optimization technique which reduces overheads of communications between local subtasks by bypassing the network interface and transferring data directly from memory or registers to memory. On average, the optimization results in 35.6% improvement in total execution time on instruction-level simulations with six benchmark programs from 1 to 32 nodes.","PeriodicalId":221761,"journal":{"name":"Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129753386","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Efficient algorithms for multi-dimensional block-cyclic redistribution of arrays
Y. Lim, Neungsoo Park, V. Prasanna
{"title":"Efficient algorithms for multi-dimensional block-cyclic redistribution of arrays","authors":"Y. Lim, Neungsoo Park, V. Prasanna","doi":"10.1109/ICPP.1997.622650","DOIUrl":"https://doi.org/10.1109/ICPP.1997.622650","url":null,"abstract":"We present a uniform framework for a classical problem, redistribution of a multi-dimensional array. Using a generalized circulant matrix formalism, we derive efficient direct, indirect and hybrid contention-free communication schedules. Our indirect schedule reduces the number of communication steps significantly compared with the previous approaches. Our approach exploits the regularity of the block-cyclic redistribution to minimize the index computation overheads. For the case of 2-d redistribution, when the block size increases by factors of K_1 and K_2 along each dimension and the process topology remains fixed, our indirect schedule performs the redistribution in O(log(K_1 K_2)) communication steps. When the block size is fixed and the processor topology is transposed, our indirect schedule results in O(log(L/G)) communication steps. Implementations of our algorithms on the IBM SP-2 show superior performance over previous approaches.","PeriodicalId":221761,"journal":{"name":"Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128975608","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
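As a minimal sketch of what block-cyclic redistribution involves (this is not the circulant-matrix scheduling the abstract describes), the Python below computes which process owns each element of a one-dimensional array under a block-cyclic layout, then lists the process pairs that must exchange data when the block size changes. The function names `owner` and `redistribution_pairs` and the example sizes are assumptions chosen for illustration; the paper itself treats the multi-dimensional case.

```python
def owner(i: int, block: int, procs: int) -> int:
    """Process that owns global index i when blocks of size `block`
    are dealt out round-robin to `procs` processes."""
    return (i // block) % procs


def redistribution_pairs(n: int, block_old: int, block_new: int, procs: int) -> set:
    """(source, destination) process pairs that must communicate when an
    n-element array moves from block size block_old to block_new."""
    pairs = set()
    for i in range(n):
        src = owner(i, block_old, procs)
        dst = owner(i, block_new, procs)
        if src != dst:  # elements that stay on the same process need no message
            pairs.add((src, dst))
    return pairs


if __name__ == "__main__":
    # Hypothetical example: 16 elements, 4 processes, block size grows from 1 to 2.
    print(sorted(redistribution_pairs(16, 1, 2, 4)))
```

A communication schedule then has to order these pairs into steps so that no process sends or receives more than one message at a time, which is the part the paper's framework addresses.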
Probabilistic rotation: scheduling graphs with uncertain execution time
S. Tongsima, C. Phongpensri, E. Sha, N. Passos
{"title":"Probabilistic rotation: scheduling graphs with uncertain execution time","authors":"S. Tongsima, C. Phongpensri, E. Sha, N. Passos","doi":"10.1109/ICPP.1997.622658","DOIUrl":"https://doi.org/10.1109/ICPP.1997.622658","url":null,"abstract":"This paper proposes an algorithm called probabilistic rotation scheduling which takes advantage of loop pipelining to schedule tasks with uncertain times to a parallel processing system. These tasks normally occur when conditional instructions are employed and/or inputs of the tasks influence the computation time. We show that based on our loop scheduling algorithm the length of the resulting schedule can be guaranteed to be satisfied for a given probability. The experiments show that the resulting schedule length for a given probability of confidence can be significantly better than the schedules obtained by worst-case or average-case scenario.","PeriodicalId":221761,"journal":{"name":"Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130849397","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Sufficient conditions for optimal multicast communication
B. Birchler, A. Esfahanian, E. Torng
{"title":"Sufficient conditions for optimal multicast communication","authors":"B. Birchler, A. Esfahanian, E. Torng","doi":"10.1109/ICPP.1997.622671","DOIUrl":"https://doi.org/10.1109/ICPP.1997.622671","url":null,"abstract":"In this paper, we give a general technique for computing optimal multicast calling schedules in any multiprocessor system that utilizes a direct network interconnection structure as long as a few simple conditions are satisfied. Since almost any real system will satisfy these conditions, this result essentially means that multicast can always be performed in ⌈log(d+1)⌉ phases where d is the number of multicast destinations. In particular, previous results on optimal multicast algorithms in specific direct network topologies are simply corollaries of our result.","PeriodicalId":221761,"journal":{"name":"Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131130348","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
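The logarithmic phase bound quoted in the abstract above can be illustrated with a small counting sketch. It assumes the logarithm is base 2 and is rounded up, and that in each phase every node already holding the message calls exactly one node that does not; it ignores link contention, which is what the paper's conditions are about. The function name `multicast_phases` is hypothetical.

```python
def multicast_phases(d: int) -> int:
    """Count phases needed to reach d destinations when the set of
    informed nodes can at most double in each phase."""
    informed = 1  # only the source holds the message at the start
    phases = 0
    while informed < d + 1:
        # every informed node calls one uninformed node
        informed = min(2 * informed, d + 1)
        phases += 1
    return phases


if __name__ == "__main__":
    for d in (1, 3, 4, 7, 8):
        # For d >= 0, d.bit_length() equals ceil(log2(d + 1)).
        print(d, multicast_phases(d), d.bit_length())
```

Because the informed set can at most double per phase, no schedule can finish in fewer phases under these assumptions, so a schedule that achieves this count is optimal.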
Hybrid compiler/hardware prefetching for multiprocessors using low-overhead cache miss traps
J. Skeppstedt, M. Dubois
{"title":"Hybrid compiler/hardware prefetching for multiprocessors using low-overhead cache miss traps","authors":"J. Skeppstedt, M. Dubois","doi":"10.1109/ICPP.1997.622659","DOIUrl":"https://doi.org/10.1109/ICPP.1997.622659","url":null,"abstract":"We propose and evaluate a new data prefetching technique for cache coherent multiprocessors. Prefetches are issued by a prefetch engine which is controlled by the compiler. Second-level cache misses generate cache miss traps, and start the prefetch engine in a trap handler generated by the compiler. The only instruction overhead in our approach is when a trap handler terminates after data arrives. We present the functionality of the prefetch engine and a compiler algorithm to control it. We also study emulation of the prefetch engine in software. Our techniques are evaluated on six parallel applications using a compiler which incorporates our algorithm and a simulated multiprocessor. The prefetch engines remove up to 67% of the memory access stall time at an instruction overhead less than 0.42%. The emulated prefetch engines remove in general less stall time at a higher instruction overhead.","PeriodicalId":221761,"journal":{"name":"Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126675968","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 21
Implementations of a feature-based visual tracking algorithm on two MIMD machines
M. B. Kulaczewski, H. Siegel
{"title":"Implementations of a feature-based visual tracking algorithm on two MIMD machines","authors":"M. B. Kulaczewski, H. Siegel","doi":"10.1109/ICPP.1997.622676","DOIUrl":"https://doi.org/10.1109/ICPP.1997.622676","url":null,"abstract":"As an example of a task that processes complex visual information to generate control signals for a system, an existing feature-based visual tracking algorithm for a static camera was mapped onto two parallel machines representing the MIMD execution model. The algorithm is described and a version suitable for mapping onto parallel machines is developed. Timing results for the implementation on the Intel Paragon and the IBM SP2 are presented, using real image data for all experiments. For each subtask of the algorithm, its performance is measured as a function of data layout. In addition, the impact of the time required to distribute image data across processing elements on the performance is considered. For the subtask of finding the best match of a feature in an image, load balancing approaches dependent on machine characteristics and submachine size are discussed. This type of matching is used in many vision tasks.","PeriodicalId":221761,"journal":{"name":"Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114705909","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5