Proceedings. Advances in Parallel and Distributed Computing: Latest Articles

A multithreaded processor designed for distributed shared memory systems
Proceedings. Advances in Parallel and Distributed Computing Pub Date : 1997-03-19 DOI: 10.1109/APDC.1997.574034
Winfried Grünewald, T. Ungerer
Abstract: The multithreaded processor, called Rhamma, uses a fast context switch to bridge latencies caused by memory accesses or by synchronization operations. Load/store, synchronization, and execution operations of different threads of control are executed simultaneously by appropriate functional units. A fast context switch is performed whenever a functional unit comes across an operation that is destined for another unit. The overall performance depends on the speed of the context switch. We present two techniques to reduce the context switch cost to at most one processor cycle: a context switch is explicitly coded in the opcode, and a context switch buffer is used. The load/store unit shows up as the principal bottleneck. We evaluate four implementation alternatives of the load/store unit to increase processor performance.
Citations: 28
Construction of multimedia server in a distributed multimedia system
Proceedings. Advances in Parallel and Distributed Computing Pub Date : 1997-03-19 DOI: 10.1109/APDC.1997.574040
Xiaoqiang Fei, P. Shi
Abstract: This paper describes a framework for constructing a distributed multimedia system based on the client/server architecture. We focus on realizing the synchronized presentation of different media in a multimedia application. A set of QoS (quality of service) parameters is given as a criterion for making a trade-off between the overall performance of the system and the synchronized presentation in each multimedia application.
Citations: 1
An effective parallelizing scheme of MPEG-1 video encoding on Ethernet-connected workstations
Proceedings. Advances in Parallel and Distributed Computing Pub Date : 1997-03-19 DOI: 10.1109/APDC.1997.574007
J. Nang, Junwha Kim
Abstract: Although MPEG-1 Video is a promising and the most widely used moving picture compression standard, it requires substantial computational resources to encode moving pictures with a reasonable frame size and quality. In this paper we propose and implement an efficient parallelizing scheme for an MPEG-1 Video encoding algorithm on Ethernet-connected workstations, which are the most widely available computing environment nowadays. In this parallelizing scheme, the slice-level, frame-level, and GOP (Group of Pictures)-level parallelisms are identified as the attractive parallelisms that can be exploited on Ethernet-connected workstations. Three efficient parallel implementation schemes that consider the communication characteristics of Ethernet-connected workstations are also proposed and evaluated. A series of experiments using thirty workstations shows that the MPEG-1 Video encoding time can be reduced in proportion to the number of workstations used in the encoding computations, although there is a saturation point in the speedup graphs.
Citations: 16
Precise dependence test for scalars within nested loops
Proceedings. Advances in Parallel and Distributed Computing Pub Date : 1997-03-19 DOI: 10.1109/APDC.1997.574055
Gao Nianshu, Zhaoqing Zhang, Ruliang Qiao
Abstract: Exact direction and distance vectors are essential for detecting hierarchical parallelism and for checking the legality of loop transformations on a multi-level loop nest. Much of this work has concentrated on array references. Little has been done to address the problem of finding precise dependences between scalar references, except to use extended SSA form with factored use-def links. In this paper, we present a technique for calculating precise direction and distance vectors for scalar references within nested loops without using any form of SSA. To do this, we use conventional use-def links in combination with joint dominator and joint postdominator relationships, which are extended from the dominator and postdominator relationships, respectively, of standard data flow analysis. The precision of the dependence information gathered by our algorithm cannot be achieved by traditional analysis of dominators or reaching definitions.
Citations: 0
Adaptive hybrid scheduling of nonuniform loops on UMA models
Proceedings. Advances in Parallel and Distributed Computing Pub Date : 1997-03-19 DOI: 10.1109/APDC.1997.574059
Hua-ping Chen, Jing Li, Guoliang Chen
Abstract: It is very difficult to balance the load among processors for nonuniform loops at compile time, and dynamic methods come at the price of extra overhead. This paper proposes an adaptive hybrid scheduling method, in which the distribution of the loop is divided into a few rounds and the block size in each round is determined adaptively according to the average overhead due to dynamic scheduling. Several experimental results also show the effect of the scheduling parameter, which programmers can select according to the probability that a fetching processor may not perform an additional task fetch.
Citations: 0
Efficient implementation of portable C*-like data-parallel library in C++
Proceedings. Advances in Parallel and Distributed Computing Pub Date : 1997-03-19 DOI: 10.1109/APDC.1997.574061
Motohiko Matsuda, M. Sato, Y. Ishikawa
Abstract: The C* language is a data-parallel extension of the C language which incorporates parallel data types. Since the C++ language provides operator overloading, a C++ library can implement the C* parallel extensions with a similar syntax. Although library implementations are highly portable, some overheads make them impractical. The two major overheads incurred are temporaries in each operator application and the inability to detect regular communication patterns. The C++ overloading mechanism forces a temporary for each operator application. Also, regular communications in C* are syntactically indistinguishable from general point-to-point communications. We tackled these problems extensively in a library. The template mechanism, a type parameterization in C++, is used to eliminate temporaries by delaying operator application and evaluating the entire expression at once. The polymorphic type dispatch mechanism is used to detect regular communications by assigning particular types to potentially regular communications. We have implemented the library on the CM-5 and compared its performance with the C* compiler using three simple examples. The techniques presented offer performance comparable to the C* compiler: the library is close to the compiler or 1.5 times slower in two examples, and even faster in one example.
Citations: 2
ATOLL: a high-performance communication device for parallel systems
Proceedings. Advances in Parallel and Distributed Computing Pub Date : 1997-03-19 DOI: 10.1109/APDC.1997.574037
U. Bruening, Lambert Schaelicke
Abstract: Fast and efficient communication is one of the major design goals not only for parallel systems but also for clusters of workstations. The proposed model of the high-performance communication device ATOLL features very low latency for the start of communication operations and reduces the software overhead for communication-specific functions. To close the gap between off-the-shelf microprocessors and the communication system, a highly sophisticated processor interface implements atomic start of communication, MMU support, and a flexible event scheduling scheme. The interconnectivity of ATOLL, provided by four independent network ports combined with cut-through routing, allows the configuration of a large variety of network topologies. A software-transparent error correction mechanism significantly reduces the required protocol overhead. The presented simulation results promise high-performance, low-latency communication.
Citations: 11