Proceedings of SPDP '96: 8th IEEE Symposium on Parallel and Distributed Processing: latest papers

Periodically regular chordal rings: generality, scalability, and VLSI layout
D. Kwai, B. Parhami
DOI: https://doi.org/10.1109/SPDP.1996.570327
Published: 1996-10-23
Abstract: Based on the chordal ring structure, we introduce a general framework to describe networks with periodic connection patterns. The periodically regular chordal (PRC) ring is proposed as an alternative for realizing massively parallel processors. A PRC ring consists of identical nodes that are connected cyclically via a finite set of skip links and has the desirable properties of bounded node degree and regular layout. In this paper, we investigate the scalability and layout aspects of PRC rings with fixed period and chord lengths and show that they lead to linearly increasing area and constant wire length without deviating significantly from optimal architectural parameters.
Citations: 6

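To make the periodic connection pattern concrete, here is a minimal Python sketch under assumed conventions: node i links to its ring neighbours i-1 and i+1, and to i+s (mod n) for each skip length s listed for its residue class i mod period. The function names `prc_ring` and `diameter`, and the example period and skip table, are hypothetical and not taken from the paper; the sketch only illustrates that a fixed period with fixed chord lengths keeps node degree bounded while the chords shorten distances.

```python
from collections import defaultdict, deque


def prc_ring(n, period, skips):
    """Undirected adjacency of a hypothetical periodically regular chordal ring.

    Node i is linked to ring neighbours i-1 and i+1 and, in addition, to
    i+s (mod n) for every skip length s in skips[i % period].
    """
    if n % period != 0:
        raise ValueError("n must be a multiple of the period so the pattern wraps cleanly")
    adj = defaultdict(set)
    for i in range(n):
        for s in [1, -1] + list(skips.get(i % period, [])):
            j = (i + s) % n
            if j != i:  # ignore degenerate self-loops
                adj[i].add(j)
                adj[j].add(i)
    return adj


def diameter(adj):
    """Largest BFS distance over all source nodes (assumes a connected graph)."""
    best = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best
```

For instance, `prc_ring(16, 2, {0: [3]})` gives every node degree 3 (each even node adds a forward chord of length 3) and a smaller diameter than the plain 16-node ring, whose diameter is 8.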
Integrating task and data parallelism in an irregular application: a case study
Ky MacPherson, P. Banerjee
DOI: https://doi.org/10.1109/SPDP.1996.570335
Published: 1996-10-23
Abstract: Recently, there has been growing interest in the simultaneous exploitation of task and data parallelism in scientific applications, and in compiler and runtime support for this combined form of parallelism. In this paper we report on the integration of task and data parallelism in an important irregular application from the field of VLSI computer-aided design, namely VLSI layout verification. We report on the implementation and on experimental results of our study on a Sun SPARCserver 1000 shared-memory multiprocessor and a CM-5 distributed-memory multiprocessor.
Citations: 9

Recovering scalable spin locks
P. Bohannon, D. Lieuwen, A. Silberschatz
DOI: https://doi.org/10.1109/SPDP.1996.570349
Published: 1996-10-23
Abstract: We present a mechanism for making a scalable spin lock protocol, the MCS lock, recoverable, thereby ensuring that a lock never becomes permanently unavailable, even if one or more processes using the lock die. This is achieved by modifying the original protocol to write additional information to shared memory and by introducing a cleanup process that returns locks to a usable state in case of process death(s). Our method requires no kernel or hardware support other than the swap instruction, and maintains performance comparable to the original protocol (one third as fast in the uncontested case). We have proven the correctness of our scheme in the face of the weak memory models provided by modern systems.
Citations: 11

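For readers unfamiliar with the base protocol, the following Python sketch shows a plain, non-recoverable MCS queue lock: each waiter enqueues itself with an atomic swap on the tail pointer and then spins only on a flag in its own queue node. Python exposes no hardware swap or compare-and-swap, so both are emulated under an internal mutex; release also uses compare-and-swap for brevity, whereas the abstract states that the authors' method needs only swap. The recovery mechanism itself (the extra shared-memory state and the cleanup process) is not shown, and the names `QNode`, `MCSLock`, and `run_demo` are illustrative.

```python
import threading
import time


class QNode:
    """Per-thread queue node; a waiter spins only on its own `locked` flag."""

    def __init__(self):
        self.locked = False
        self.next = None


class MCSLock:
    def __init__(self):
        self.tail = None
        # Stand-in for hardware atomicity: swap and CAS are emulated under one mutex.
        self._atomic = threading.Lock()

    def _swap(self, node):
        with self._atomic:
            prev, self.tail = self.tail, node
            return prev

    def _compare_and_swap(self, expected, new):
        with self._atomic:
            if self.tail is expected:
                self.tail = new
                return True
            return False

    def acquire(self, node):
        node.next = None
        node.locked = True
        pred = self._swap(node)      # enqueue at the tail
        if pred is not None:
            pred.next = node         # link behind the predecessor
            while node.locked:       # spin on our own flag only
                time.sleep(0)

    def release(self, node):
        if node.next is None:
            if self._compare_and_swap(node, None):
                return               # no waiter: the lock is now free
            while node.next is None:  # a waiter swapped in but has not linked yet
                time.sleep(0)
        node.next.locked = False     # hand the lock directly to the successor


def run_demo(num_threads=4, iters=200):
    lock = MCSLock()
    counter = [0]

    def worker():
        node = QNode()               # one queue node reused by this thread
        for _ in range(iters):
            lock.acquire(node)
            counter[0] += 1
            lock.release(node)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]
```

`run_demo(4, 200)` has four threads each increment a shared counter 200 times under the lock, so it should return 800.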
The active-ray approach to rendering on distributed memory multiprocessors
A. Law, R. Yagel
DOI: https://doi.org/10.1109/SPDP.1996.570363
Published: 1996-10-23
Abstract: Object dataflow is a popular approach to parallel rendering. The data representing the 3D scene is statically distributed among processors, and objects are fetched and cached only on demand. Most previous object dataflow methods were implemented on shared-memory architectures and exploited spatial coherence to reduce hardware cache misses. We propose an efficient model for object dataflow parallel volume rendering on message-passing machines. The active ray tracing algorithm is introduced, and its ray storage mechanism is used to support latency hiding by postponing computation on inactive rays. Memory usage is optimized by letting objects migrate and replicate at different processors rather than using the common static assignments. Our cache-only-memory approach uses a distributed-directory scheme to track the location of objects at other nodes. We also implemented a mechanism that minimizes network congestion by optimizing channel utilization. Unlike previous methods, our approach can benefit from temporal coherence and effectively minimizes communication costs in successive frames. We implemented a volume ray casting instance of the algorithm on the Cray T3D and achieved higher efficiency and scalability than existing algorithms. We achieve interactive frame rates of approximately 20 Hz for a 128³ volume and 4 Hz for a 256³ volume on 128 processors.
Citations: 2

Index-shuffle graphs
M. Baumslag, B. Obrenic
DOI: https://doi.org/10.1109/SPDP.1996.570329
Published: 1996-10-23
Abstract: Index-shuffle graphs are introduced as candidate interconnection networks for parallel computers. The comparative advantages of index-shuffle graphs over the standard bounded-degree "approximations" of the hypercube, namely butterfly-like and shuffle-like graphs, are demonstrated in the theoretical framework of graph embedding and network emulation. An N-node index-shuffle graph emulates: (1) an N-node shuffle-exchange graph with no slowdown, whereas the currently best emulations of shuffle-like graphs by hypercubes and butterflies incur a slowdown of Ω(log N); (2) its like-sized butterfly graph with a slowdown of O(log log log N), whereas the currently best emulations of butterfly-like graphs by shuffle-like graphs incur a slowdown of Ω(log log N); (3) an N-node hypercube executing an on-line leveled algorithm with a slowdown of O(log log N) and without data circulation, whereas the slowdown of the currently best such emulations of the hypercube by its bounded-degree shuffle-like and butterfly-like derivatives remains Ω(log N), and only if the entire local data set of every processor is allowed to circulate through the network.
Citations: 3

Optimistic parallel computation: an example from computational chemistry
Emily Angerer Crawford, K. Schwan, S. Yalamanchili
DOI: https://doi.org/10.1109/SPDP.1996.570336
Published: 1996-10-23
Abstract: Performance penalties due to synchronization are a common concern in parallel programming. Traditional approaches enforce the correct ordering of write operations using locks, but this can be time-consuming and can drastically reduce the benefits of using a parallel machine. Instead, for certain classes of programs, we propose an optimistic approach in which the solution is calculated without any locks. This approach detects data races by maintaining statistics on memory writes, and corrects potentially inappropriate data values by repeating selected computations and write operations. The scheme is evaluated with a novel parallel implementation of the Møller-Plesset perturbation theory energy calculation for closed-shell molecules.
Citations: 0

A framework for automatic dynamic data mapping
Jordi Garcia, E. Ayguadé, Jesús Labarta
DOI: https://doi.org/10.1109/SPDP.1996.570321
Published: 1996-10-23
Abstract: Physically distributed memory multiprocessors are becoming popular, and data distribution and loop parallelization are aspects that a parallelizing compiler has to consider in order to use such systems efficiently. The costs of accessing local and remote data can differ by one or more orders of magnitude, which can dramatically affect system performance. It would be desirable to free the programmer from having to consider the low-level details of the target architecture, program explicit processes, or specify interprocess communication. We present an approach to automatically derive static or dynamic data distribution strategies for the arrays used in a program. All the information required about data movement and parallelism is contained in a single data structure, called the Communication-Parallelism Graph (CPG). The problem is modeled and solved using a general-purpose linear 0-1 integer programming solver. This allows us to find the optimal solution for one-dimensional array distributions. We also show the feasibility of this approach in terms of compilation time and the quality of the solutions generated.
Citations: 11

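As a toy illustration of the kind of optimization the CPG feeds into, the sketch below chooses one distribution per program phase so as to minimize execution cost plus redistribution cost. Each (phase, distribution) choice corresponds to a 0-1 decision variable; here exhaustive enumeration stands in for the general-purpose 0-1 integer programming solver the authors use, so it only scales to tiny instances. The phases, the BLOCK/CYCLIC labels, the function name `best_mapping`, and all costs are hypothetical.

```python
from itertools import product


def best_mapping(exec_cost, remap_cost):
    """Pick one distribution per phase minimising total cost, by enumeration.

    exec_cost[p][d]: hypothetical cost of running phase p under distribution d.
    remap_cost(a, b): hypothetical cost of redistributing arrays from a to b
    between consecutive phases (zero if the distribution does not change).
    Returns (total_cost, tuple_of_distributions).
    """
    phases = len(exec_cost)
    dists = sorted(exec_cost[0])
    best = None
    for choice in product(dists, repeat=phases):
        cost = sum(exec_cost[p][choice[p]] for p in range(phases))
        cost += sum(remap_cost(choice[p], choice[p + 1]) for p in range(phases - 1))
        if best is None or cost < best[0]:
            best = (cost, choice)
    return best
```

With `exec_cost = [{"BLOCK": 1, "CYCLIC": 5}, {"BLOCK": 6, "CYCLIC": 2}, {"BLOCK": 1, "CYCLIC": 5}]`, a remapping cost of 3 makes the static mapping (BLOCK, BLOCK, BLOCK) optimal at cost 8, while a remapping cost of 1 makes the dynamic mapping (BLOCK, CYCLIC, BLOCK) optimal at cost 6.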
Compile-time inter-query dependence analysis
S. Parthasarathy, Wei Li, Michal Cierniak, Mohammed J. Zaki
DOI: https://doi.org/10.1109/SPDP.1996.570377
Published: 1996-10-23
Abstract: Most parallel databases exploit two types of parallelism: intra-query parallelism and inter-transaction concurrency. Between these two lies another type of parallelism: inter-query parallelism within a transaction or application. Exploiting inter-query parallelism requires either compiler support to automatically parallelize existing embedded query programs, or programming support to write explicitly parallel query programs. The authors present compiler analysis to automatically detect parallelism in embedded query programs, including compiler algorithms for detecting dependences in such programs. They show that the properties of some aggregate functions, such as MIN and MAX, can help reduce the statically computed dependences.
Citations: 3

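One way to see why MIN and MAX help is that two updates of the form x = MIN(x, e1) and x = MIN(x, e2) commute, so a dependence test need not order them. The sketch below encodes that idea over a hypothetical summary of each query's reads, writes, and MIN/MAX reduction updates; the `Query` record and the `depends` function are illustrative assumptions, not the authors' algorithms.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Query:
    """Hypothetical summary of one embedded query's effect on program variables."""
    reads: set[str] = field(default_factory=set)
    writes: set[str] = field(default_factory=set)
    # var -> "MIN" or "MAX" for updates of the form var = MIN(var, <query result>)
    reductions: dict[str, str] = field(default_factory=dict)


def depends(a: Query, b: Query) -> bool:
    """Conservatively decide whether queries a and b must keep their order."""
    a_red, b_red = set(a.reductions), set(b.reductions)
    # A plain write conflicts with any access to the same variable by the other query.
    if a.writes & (b.reads | b.writes | b_red):
        return True
    if b.writes & (a.reads | a_red):
        return True
    # A reduction still conflicts with a plain read of its intermediate value.
    if (a_red & b.reads) or (b_red & a.reads):
        return True
    # Two MIN (or two MAX) updates commute; mixing MIN and MAX on one variable does not.
    return any(a.reductions[v] != b.reductions[v] for v in a_red & b_red)
```

Two queries that each fold a result into the same variable with MIN are reported independent, while pairing either one with a plain read or write of that variable, or with a MAX update of it, still yields a dependence.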
Mapping and parallel implementation of Bayesian belief networks
N. Saxena, Sudeep Sarkar, N. Ranganathan
DOI: https://doi.org/10.1109/SPDP.1996.570391
Published: 1996-10-23
Abstract: We present an efficient technique for mapping arbitrarily large Bayesian belief networks onto hypercubes, with a deadlock-free implementation. We show that the speedup does not vary with the number of nodes in the Bayesian network and is limited by the height of the Peot-Shachter tree, which is obtained by hanging the Bayesian polytree from a pivot node. We also found that the overhead of implementing Bayesian networks on parallel machines such as hypercubes can be large because of the communication-intensive nature of the network.
Citations: 2

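Since the abstract ties the achievable speedup to the height of the tree obtained by hanging the polytree from a pivot node, the following sketch computes that height with a breadth-first search over the polytree's undirected skeleton. Choosing the pivot that minimizes the height (`best_pivot`) is an assumption added for illustration; the abstract does not say how the pivot is chosen. The function names and the edge-list input format are hypothetical.

```python
from collections import defaultdict, deque


def hang_height(edges, pivot):
    """Height of the tree obtained by hanging a polytree's skeleton from `pivot`."""
    adj = defaultdict(list)
    for u, v in edges:  # edge direction is ignored: a polytree's skeleton is a tree
        adj[u].append(v)
        adj[v].append(u)
    depth = {pivot: 0}
    queue = deque([pivot])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in depth:
                depth[w] = depth[u] + 1
                queue.append(w)
    return max(depth.values())


def best_pivot(edges):
    """Hypothetical pivot choice: the node giving the shallowest hung tree."""
    nodes = {x for edge in edges for x in edge}
    return min(sorted(nodes), key=lambda p: hang_height(edges, p))
```

For the path 0-1-2-3-4, hanging from node 0 gives height 4, while hanging from node 2 gives height 2, which `best_pivot` selects.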
A loop allocation policy for DOACROSS loops
Joford T. Lim, A. Hurson, K. Kavi, Ben Lee
DOI: https://doi.org/10.1109/SPDP.1996.570340
Published: 1996-10-23
Abstract: The dataflow model of computation in general, and its recent direction of combining dataflow processing with control-flow processing in particular, provide attractive alternatives for satisfying the computational demands of new applications without experiencing the shortcomings of traditional concurrent systems. This should motivate researchers to analyze the applicability of familiar concepts, such as scheduling and load balancing, within this new architectural framework. Effective execution of loop iterations as a means to improve performance and hardware utilization has received a great deal of attention in the past. In this paper we address the problem of scheduling/allocating DOACROSS loops in a multithreaded dataflow environment. An extension to the staggered scheme, called the cyclic staggered scheme, which produces a more balanced distribution of iterations among processors, is introduced, and its performance improvement in dataflow and control-flow environments is simulated and analyzed.
Citations: 8

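The abstract does not spell out the cyclic staggered scheme, so the sketch below models only plain cyclic (round-robin) allocation of DOACROSS iterations under simplifying assumptions: every iteration takes `iter_time`, and iteration i may start no earlier than `delay` after iteration i-1 starts, which stands in for a loop-carried dependence of distance 1. It shows how the processor count and the dependence delay together bound the finish time; `doacross_makespan` and its parameters are hypothetical.

```python
def doacross_makespan(n_iters, n_procs, iter_time, delay):
    """Finish time of a DOACROSS loop under cyclic (round-robin) allocation.

    Simplified model (an assumption, not the paper's scheme): iteration i runs
    on processor i % n_procs, takes iter_time, and may start no earlier than
    `delay` after iteration i-1 started.
    """
    proc_free = [0] * n_procs  # time at which each processor becomes idle
    prev_start = None
    finish = 0
    for i in range(n_iters):
        p = i % n_procs
        if prev_start is None:
            start = proc_free[p]
        else:
            start = max(proc_free[p], prev_start + delay)
        proc_free[p] = start + iter_time
        finish = max(finish, proc_free[p])
        prev_start = start
    return finish
```

With 4 iterations of length 10 and a delay of 3, two processors finish at time 23, and four processors finish at time 19, where the finish time is bounded by the dependence chain rather than by processor availability.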