Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing: Latest Articles

Mapping of backpropagation learning onto distributed memory multiprocessors
S. Mahapatra, R. Mahapatra
{"title":"Mapping of backpropagation learning onto distributed memory multiprocessors","authors":"S. Mahapatra, R. Mahapatra","doi":"10.1109/ICAPP.1995.472188","DOIUrl":"https://doi.org/10.1109/ICAPP.1995.472188","url":null,"abstract":"This paper presents a mapping scheme for parallel pipelined execution of the Backpropagation Learning Algorithm on distributed memory multiprocessors (DMMs). The proposed implementation exhibits training set parallelism that involves batch updating. Simple algorithms have been presented, which allow the data transfer involved in both the forward and backward execution phases of the backpropagation algorithm to be carried out with a small communication overhead. The effectiveness of our mapping has been illustrated by estimating the speedup of a proposed implementation on an array of T-805 transputers.","PeriodicalId":448130,"journal":{"name":"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130344975","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
A performance comparison of buffering schemes for multistage switches
Bin Zhou, Mohammed Atiquzzaman
{"title":"A performance comparison of buffering schemes for multistage switches","authors":"Bin Zhou, Mohammed Atiquzzaman","doi":"10.1109/ICAPP.1995.472271","DOIUrl":"https://doi.org/10.1109/ICAPP.1995.472271","url":null,"abstract":"Multistage Interconnection Networks (MIN) are used to connect processors and memories in large scale scalable multiprocessor systems. MINs have also been proposed as switching fabrics in ATM networks in the future Broadband ISDN networks. A MIN consists of several stages of small crossbar switching elements (SE). Buffers are used in the SEs to increase the throughput of the MIN and prevent internal loss of packets. Different buffering schemes for the SEs are discussed in this paper. The objective of this paper is to study the performance of MINs with different buffering schemes, in the presence of uniform and hot spot traffic patterns. The results obtained from the study will help the network designers in choosing appropriate buffering strategies for MINs. For comparing different buffering strategies, the throughput and packet delay have been used as the performance measures.","PeriodicalId":448130,"journal":{"name":"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing","volume":"135 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132613045","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
On deflection worm routing on meshes
A. Roberts, A. Symvonis
{"title":"On deflection worm routing on meshes","authors":"A. Roberts, A. Symvonis","doi":"10.1109/ICAPP.1995.472207","DOIUrl":"https://doi.org/10.1109/ICAPP.1995.472207","url":null,"abstract":"In this paper, we consider the deflection worm routing problem on two-dimensional n×n meshes. Our results include: (i) an off-line algorithm for routing permutations in O(kn) steps, and (ii) a general method to obtain deflection worm routing algorithms from packet routing algorithms.","PeriodicalId":448130,"journal":{"name":"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing","volume":"176 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114075009","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Embedded real-time video decompression algorithm and architecture for HDTV applications
R. Neogi
{"title":"Embedded real-time video decompression algorithm and architecture for HDTV applications","authors":"R. Neogi","doi":"10.1109/ICAPP.1995.472212","DOIUrl":"https://doi.org/10.1109/ICAPP.1995.472212","url":null,"abstract":"DCT/IDCT based source coding and decoding techniques are widely accepted in HDTV systems and other MPEG based applications. In this paper, we propose a new direct 2-D IDCT algorithm based on the parallel divide-and-conquer approach. The algorithm distributes computation by considering one transformed coefficient at a time and doing partial computation and updating as every coefficient arrives. A novel parallel and fully pipelined architecture with an effective processing time of one cycle per pixel for an N×N size block is designed to implement the algorithm. A unique feature of this architecture is that it integrates inverse-shuffling, inverse-quantization, inverse-source-coding, and motion-compensation into a single compact data-path. We avoid the insertion of a FIFO between the bit-stream decoder and decompression engine. The entire block of pixel values is sampled in a single cycle for post-processing after de-compression. Also, we use only (N/2(N/2+1))/2 multipliers and N² adders.","PeriodicalId":448130,"journal":{"name":"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122309485","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
A multicast mechanism for actual causal ordering
W. Cheng, X. Jia, M. Werner
{"title":"A multicast mechanism for actual causal ordering","authors":"W. Cheng, X. Jia, M. Werner","doi":"10.1109/ICAPP.1995.472199","DOIUrl":"https://doi.org/10.1109/ICAPP.1995.472199","url":null,"abstract":"A number of multicast algorithms have been proposed to guarantee potential causal ordering. In these mechanisms, the delivery of a message would be blocked waiting for the delivery of causally earlier messages, even though it may not actually have a causal relationship with these messages. This represents a higher latency cost than necessary. Our objective is to reduce the latency time of multicast message delivery. This can be achieved by reducing the number of messages that a multicast message has to wait for. We propose a mechanism in which the delivery of a message would only be blocked by the delivery of messages with which it has an actual causal relationship. The mechanism includes causality information, supplied by users, in the multicast messages. Receivers deliver messages to application processes according to this information. We introduce a programming construction, message blocks, to simplify the task of expressing causality. Simulation results are included and discussed in detail.","PeriodicalId":448130,"journal":{"name":"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing","volume":"319 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127390011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Analysis of shared buffer multistage networks with hot spot
M. Saleh, Mohammed Atiquzzaman
{"title":"Analysis of shared buffer multistage networks with hot spot","authors":"M. Saleh, Mohammed Atiquzzaman","doi":"10.1109/ICAPP.1995.472270","DOIUrl":"https://doi.org/10.1109/ICAPP.1995.472270","url":null,"abstract":"Multistage interconnection networks based on shared buffering are known to have better performance and buffer utilization than input or output buffered switches. Shared buffer switches do not suffer from head-of-line blocking, which is a common problem in simple input buffering. Shared buffer switches have previously been studied under uniform and unbalanced traffic patterns. However, due to the complexity of the model, the performance of such a network, in the presence of a single hot spot, has not been fully explored. A hot spot arises when one of the outputs of the network becomes very popular. We develop a model for a multistage interconnection network constructed from shared buffer switching elements and operating under a hot spot traffic pattern. The model is validated by comparison with simulation results. The model is used to study the network performance in terms of the throughput, packet delay, packet loss probability and the optimal buffer utilization. Numerical results show that, in the presence of hot spot traffic, shared buffer switches degrade more significantly than switches with dedicated input and/or output buffers.","PeriodicalId":448130,"journal":{"name":"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127732435","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
Package of iterative solvers for the Fujitsu VPP500 parallel supercomputer
Z. Leyk, M. Dow
{"title":"Package of iterative solvers for the Fujitsu VPP500 parallel supercomputer","authors":"Z. Leyk, M. Dow","doi":"10.1109/ICAPP.1995.472196","DOIUrl":"https://doi.org/10.1109/ICAPP.1995.472196","url":null,"abstract":"We are implementing iterative methods on the VPP500 parallel computer. During this process we have met with different kinds of problems. It is easy to notice that performance on the VPP500 depends critically on the type of matrices taken for computations. In sparse computations, it is important to take advantage of the structure of the matrix. There can be a big difference between the performance obtained from a matrix stored in the diagonal format and one stored in a more general format. Therefore it is necessary to choose an appropriate format for a matrix used in computations. Preliminary tests show that implementation of the package is scalable with respect to the number of processors, especially for large problems. It is becoming clear to us that the traditional efficient preconditioning techniques result only in a speedup of a factor of 2 at best. We need to look for new preconditioners more adjusted for parallel computations. The polynomial preconditioning approach is attractive because of the negligible preprocessing cost involved. We favour the reverse communication interface for added flexibility necessary for doing tests with different storage formats and preconditioners. We can conclude that it is crucial to experiment with existing parallel machines to better understand the effects that are difficult to derive from theory, such as impact of communication costs or ways of storing data.","PeriodicalId":448130,"journal":{"name":"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116743448","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Motion detection on distributed-memory machines: a case study
P. Cremonesi, M. Pugassi, N. Scarabottolo
{"title":"Motion detection on distributed-memory machines: a case study","authors":"P. Cremonesi, M. Pugassi, N. Scarabottolo","doi":"10.1109/ICAPP.1995.472211","DOIUrl":"https://doi.org/10.1109/ICAPP.1995.472211","url":null,"abstract":"The problem considered in this paper is the implementation of motion-detection on distributed-memory MIMD machines. The solution proposed here is based on pyramidal algorithms that, by iteratively discarding uninteresting details, make it possible to focus on the moving parts of an image stream. Different parallelisation methodologies have been evaluated and the most promising ones have been implemented on a Transputer-based parallel machine. Experimental results are presented here and compared with the theoretical ones.","PeriodicalId":448130,"journal":{"name":"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114950596","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Improved parallel algorithms for finding connected components
K. W. Chong, Tak-Wah Lam
{"title":"Improved parallel algorithms for finding connected components","authors":"K. W. Chong, Tak-Wah Lam","doi":"10.1109/ICAPP.1995.472217","DOIUrl":"https://doi.org/10.1109/ICAPP.1995.472217","url":null,"abstract":"Finding the connected components of a graph is a basic computational problem. In recent years, there have been several exciting results in breaking the log² n-time barrier to finding connected components on parallel machines using shared memory without concurrent-write capability. This paper further presents two new parallel algorithms both using less than log² n time. The merit of the first algorithm is that it uses only a sublinear number of processors, yet retains the time complexity of the fastest existing algorithm. The second algorithm is slightly slower but its work (i.e., the time-processor product) is closer to optimal than all previous algorithms using less than log² n time.","PeriodicalId":448130,"journal":{"name":"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130251059","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Essential features of a compiler target language for parallel machines
G. A. Papadopoulos
{"title":"Essential features of a compiler target language for parallel machines","authors":"G. A. Papadopoulos","doi":"10.1109/ICAPP.1995.472172","DOIUrl":"https://doi.org/10.1109/ICAPP.1995.472172","url":null,"abstract":"Term Graph Rewriting Systems (TGRS) have been used extensively as an implementation vehicle for a number of, often divergent, programming paradigms ranging from the traditional functional programming ones to the (concurrent) logic programming ones and various amalgamations of them, to (concurrent) object-oriented ones. More recently, the relationship between TGRS and process calculi (such as the π-calculus) as well as Linear Logic has also been explored. In this paper we describe our experience in using an intermediate Compiler Target Language (CTL) based on TGRS for mapping a variety of programming paradigms of the aforementioned types onto it, highlighting in the process some of the issues which we feel any such intermediate representation should address and which form effectively a minimum set of features every CTL should possess.","PeriodicalId":448130,"journal":{"name":"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130789411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0