Proceedings of the International Symposium on Parallel Architectures, Algorithms and Networks (ISPAN): Latest Articles

Integrated VLSI layout compaction and wire balancing on a shared memory multiprocessor: evaluation of a parallel algorithm
P. Chalasani, K. Thulasiraman, M. Corneau
{"title":"Integrated VLSI layout compaction and wire balancing on a shared memory multiprocessor: evaluation of a parallel algorithm","authors":"P. Chalasani, K. Thulasiraman, M. Corneau","doi":"10.1109/ISPAN.1994.367165","DOIUrl":"https://doi.org/10.1109/ISPAN.1994.367165","url":null,"abstract":"We first present a unified formulation of three problems in VLSI physical design: layout compaction, wire balancing, and integrated layout compaction and wire balancing. The aim of layout compaction is to achieve minimum chip width. Whereas wire balancing seeks to achieve minimum total wire length, integrated layout compaction and wire balancing seeks to minimize wire length while maintaining the chip width at the optimum value. Our formulation is in terms of the dual transshipment problem. We then review our recent work on a parallel algorithm for the dual transshipment problem. We show how this algorithm, called the Modified Network Dual Simplex (MNDS) method, provides a unified approach to solving the three problems mentioned above, and we present experimental results. Our implementations were done on the BBN Butterfly machine. We draw attention to certain rather unusual results and argue that if the MNDS method is used, then integrated layout compaction and wire balancing will achieve minimum chip width and a total wire length close to the optimum achieved by the wire balancing algorithm.","PeriodicalId":142405,"journal":{"name":"Proceedings of the International Symposium on Parallel Architectures, Algorithms and Networks (ISPAN)","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131752602","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Software pipelining for Jetpipeline architecture
M. Katahira, Takehito Sasaki, Hong Shen, Hiroaki Kobayashi, Tadao Nakamura
{"title":"Software pipelining for Jetpipeline architecture","authors":"M. Katahira, Takehito Sasaki, Hong Shen, Hiroaki Kobayashi, Tadao Nakamura","doi":"10.1109/ISPAN.1994.367155","DOIUrl":"https://doi.org/10.1109/ISPAN.1994.367155","url":null,"abstract":"High performance processors based on pipeline processing play an important role in scientific computation. We proposed a hybrid pipeline architecture named Jetpipeline in our previous work. The concept of Jetpipeline comes from the integration of superscalar, VLIW and vector architectures. Jetpipeline has multiple instruction pipelines, which execute multiple instructions as superscalar architectures do. Instructions to be executed simultaneously are statically scheduled by a compiler, as in VLIW architectures. Therefore, parallelism derivation and instruction scheduling are very important for Jetpipeline. Software pipelining is one of the well-known techniques for achieving high throughput when processing loop programs. In this paper, we propose software pipelining for Jetpipeline. First, an overview of the Jetpipeline architecture is given. Then the banked register configuration of Jetpipeline, which reduces hardware complexity and supports software pipelining, is presented. Finally, the effectiveness of software pipelining for Jetpipeline is evaluated by simulation.","PeriodicalId":142405,"journal":{"name":"Proceedings of the International Symposium on Parallel Architectures, Algorithms and Networks (ISPAN)","volume":"104 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129016335","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Quicksort and permutation routing on the hypercube and de Bruijn networks
David S. L. Wei
{"title":"Quicksort and permutation routing on the hypercube and de Bruijn networks","authors":"David S. L. Wei","doi":"10.1109/ISPAN.1994.367189","DOIUrl":"https://doi.org/10.1109/ISPAN.1994.367189","url":null,"abstract":"We consider the problems of sorting and routing on some unbounded interconnection networks, namely the hypercube and the de Bruijn network. We first present two efficient implementations of quicksort on the hypercube. The first algorithm sorts N items on an N-node hypercube, one item per node, in O((log^2 N)/(log log N)) time with high probability, while the other sorts N items on an (N/log N)-node hypercube, log N items per node, in O(log^2 N) time with high probability, which achieves optimal speedup in the sense of the PT product. Both algorithms beat the fastest previous quicksort, which runs in O(log^2 N) expected time on a butterfly of N nodes. We also present a deterministic (nonoblivious) permutation routing algorithm which runs in O(d·n^2) time on a d-ary de Bruijn network of N = d^n nodes. To the best of our knowledge, this routing algorithm is so far the fastest deterministic one for the de Bruijn network of arbitrary degree. The best previous one runs in O((log d)·d·n^2) time. All algorithms presented are simple, and the constants hidden by the big-O notation are small.","PeriodicalId":142405,"journal":{"name":"Proceedings of the International Symposium on Parallel Architectures, Algorithms and Networks (ISPAN)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134607107","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Views of mixed-mode computing and network evaluation
H. Siegel, J. Antonio
{"title":"Views of mixed-mode computing and network evaluation","authors":"H. Siegel, J. Antonio","doi":"10.1109/ISPAN.1994.367171","DOIUrl":"https://doi.org/10.1109/ISPAN.1994.367171","url":null,"abstract":"Trade-offs between the SIMD and MIMD models of architecture for parallelism are presented. Mixed-mode parallelism, where a machine can switch between the SIMD and MIMD modes of parallelism at instruction-level granularity with generally negligible overhead, is discussed. Advantages and disadvantages of mixed-mode parallelism and an example of a mixed-mode parallel algorithm are given. The relationship of mixed-mode processing to high-performance heterogeneous computing is overviewed. Difficulties involved with evaluating interconnection networks for parallel machines are then considered. There are a myriad of metrics that have been used in the literature. The problems involved with choosing the most appropriate metric or weighted set of metrics, and performing \"fair\" comparisons, are explored.","PeriodicalId":142405,"journal":{"name":"Proceedings of the International Symposium on Parallel Architectures, Algorithms and Networks (ISPAN)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131513435","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Increase analysis in the total execution time of a parallel program
Dingchao Li, Hiromitsu Takagi, N. Ishii
{"title":"Increase analysis in the total execution time of a parallel program","authors":"Dingchao Li, Hiromitsu Takagi, N. Ishii","doi":"10.1109/ISPAN.1994.367175","DOIUrl":"https://doi.org/10.1109/ISPAN.1994.367175","url":null,"abstract":"A lower bound on the finishing time of optimal schedules is used as an absolute performance measure of static scheduling heuristics. This paper presents an efficient method of computing such a bound, based on estimating overlaps among the execution ranges of tasks in a given task graph and analyzing the delays of tasks on the critical paths of the graph. The bound computed by this method is shown to be of higher quality than those of other known methods. Future work and directions on this topic are also indicated.","PeriodicalId":142405,"journal":{"name":"Proceedings of the International Symposium on Parallel Architectures, Algorithms and Networks (ISPAN)","volume":"34 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132440494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
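For orientation, the kind of quantity the abstract above refers to can be illustrated with the classical baseline bound, which is not the method of the paper: no schedule can finish before the longest dependency chain completes, nor before the total work divided by the number of processors. The sketch below assumes integer task durations and a task graph given as an edge list; the function name `schedule_lower_bound` and its parameters are hypothetical.

```python
from collections import defaultdict, deque
from math import ceil


def schedule_lower_bound(durations, edges, num_processors):
    """Classical lower bound on the finishing time of any schedule.

    durations: dict mapping task name -> integer execution time (assumed).
    edges: iterable of (u, v) pairs meaning task u must finish before v starts.
    num_processors: number of identical processors (assumed >= 1).
    """
    successors = defaultdict(list)
    indegree = {task: 0 for task in durations}
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1

    # Earliest finish time of each task, computed in topological order.
    finish = dict(durations)
    queue = deque(task for task, deg in indegree.items() if deg == 0)
    visited = 0
    while queue:
        u = queue.popleft()
        visited += 1
        for v in successors[u]:
            finish[v] = max(finish[v], finish[u] + durations[v])
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    if visited != len(durations):
        raise ValueError("task graph contains a cycle")

    critical_path = max(finish.values(), default=0)
    # With integer durations, the work bound can be rounded up.
    work_bound = ceil(sum(durations.values()) / num_processors)
    return max(critical_path, work_bound)
```

For a diamond-shaped graph with durations a=2, b=3, c=4, d=1 and edges a→b, a→c, b→d, c→d, the critical path a, c, d has length 7, so the bound is 7 on two processors and 10 (the total work) on one. The paper's method is described as refining bounds of this kind by accounting for overlaps among task execution ranges.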
Organization of a parallel virtual machine
V. Beletsky, T. Popova, A. Chemeris
{"title":"Organization of a parallel virtual machine","authors":"V. Beletsky, T. Popova, A. Chemeris","doi":"10.1109/ISPAN.1994.367137","DOIUrl":"https://doi.org/10.1109/ISPAN.1994.367137","url":null,"abstract":"A virtual parallel machine is presented. The virtual machine includes the following programs: loop parallelization, dependence graph construction, job scheduling, a compiler, and simulation programs. The basic principles and ideas on which the programs are based are explained. The basic capabilities and advantages of the virtual machine are enumerated. The results of simulation on the virtual machine are presented.","PeriodicalId":142405,"journal":{"name":"Proceedings of the International Symposium on Parallel Architectures, Algorithms and Networks (ISPAN)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124169557","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A systolic array implementation of common factor algorithm to compute DFT
S. He, M. Torkelson
{"title":"A systolic array implementation of common factor algorithm to compute DFT","authors":"S. He, M. Torkelson","doi":"10.1109/ISPAN.1994.367177","DOIUrl":"https://doi.org/10.1109/ISPAN.1994.367177","url":null,"abstract":"An extension to the common factor algorithm (CFA) for computing the discrete Fourier transform (DFT), under the condition that the size of the transform is N = M^2, shows that the input and output data arrays of the transform may have identical index mapping. A simple planar 2-dimensional systolic array implementation of the CFA is presented. The systolic array consists of N homogeneous processing elements (PEs). A DFT of size N = M^2 can be computed in 2M+1 steps of pipelined operations, achieving the area-time complexity AT^2 = O(N^2 log^3 N). Asymptotically sub-optimal, but without the need for complicated index mapping and data shuffling, the proposed approach compares favorably with other existing approaches in realistic VLSI implementation. The architecture is also highly expandable: a DFT of size 2^t N can be computed on 2^t nearest-neighbor connected N-size arrays with reloaded twiddle factors, which makes it more suitable for VLSI implementation of DFTs of various practical sizes.","PeriodicalId":142405,"journal":{"name":"Proceedings of the International Symposium on Parallel Architectures, Algorithms and Networks (ISPAN)","volume":"229 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127216239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 15
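As background for the abstract above, the textbook common-factor (Cooley-Tukey) decomposition for N = M^2 can be sketched in a few lines. This is a sequential illustration under the standard index mapping (input n = M·n1 + n2, output k = k1 + M·k2); it is not the paper's extension with identical input and output mappings, nor its systolic array. The function names are hypothetical.

```python
import cmath


def naive_dft(x):
    """Direct O(N^2) DFT, used only as a reference."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def common_factor_dft(x, m):
    """DFT of length N = m*m via the common-factor decomposition."""
    n = m * m
    if len(x) != n:
        raise ValueError("input length must equal m * m")

    # Step 1: length-m DFTs over n1 for each fixed n2 (input n = m*n1 + n2).
    b = [
        [
            sum(
                x[m * n1 + n2] * cmath.exp(-2j * cmath.pi * n1 * k1 / m)
                for n1 in range(m)
            )
            for n2 in range(m)
        ]
        for k1 in range(m)
    ]

    # Step 2: multiply by the twiddle factors W_N^(n2 * k1).
    c = [
        [b[k1][n2] * cmath.exp(-2j * cmath.pi * n2 * k1 / n) for n2 in range(m)]
        for k1 in range(m)
    ]

    # Step 3: length-m DFTs over n2; output index is k = k1 + m*k2.
    out = [0j] * n
    for k1 in range(m):
        for k2 in range(m):
            out[k1 + m * k2] = sum(
                c[k1][n2] * cmath.exp(-2j * cmath.pi * n2 * k2 / m)
                for n2 in range(m)
            )
    return out
```

Calling `common_factor_dft(x, 3)` on a length-9 sequence should agree with `naive_dft(x)` up to floating-point rounding. The two stages of M-point DFTs separated by a twiddle multiplication are the structure that a 2-dimensional M×M array of processing elements can exploit.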