Proceedings 2000 International Conference on Parallel Processing: Latest Articles

The double scheme: deadlock-free dynamic reconfiguration of cut-through networks

Proceedings 2000 International Conference on Parallel Processing Pub Date : 2000-08-21 DOI: 10.1109/ICPP.2000.876160

Ruoming Pang, T. Pinkston, J. Duato

Citations: 27

A novel channel-adaptive uplink access control protocol for nomadic computing

Proceedings 2000 International Conference on Parallel Processing Pub Date : 2000-08-21 DOI: 10.1109/ICPP.2000.876174

Yu-Kwong Kwok, V. Lau

Abstract: We consider the uplink access control problem in a mobile computing system based on a cellular phone network, in which a user can use the mobile device to transmit voice or file data. This resource management problem is important because an efficient solution to uplink access control is critical for supporting a large user population with a reasonable level of quality of service (QoS). While a number of protocols for uplink access control have recently been proposed, they share a common drawback: they do not exploit well the burst error properties that are inevitable in a wireless communication system. In this paper, we propose a novel TDMA-based uplink access protocol that employs a channel-state-dependent allocation strategy. Our protocol is motivated by two observations: (1) when the channel state is bad, throughput is low because a large amount of FEC (forward error correction) or excessive ARQ (automatic repeat request) is needed; and (2) because of (1), much of the mobile device's energy is wasted. The proposed protocol works closely with the underlying physical layer: by observing the channel state information (CSI) of each mobile user, the MAC protocol first segregates a set of users with good CSI from the requests gathered in the request contention phase of an uplink frame. The protocol then judiciously allocates channel bandwidth to contending users based on their channel conditions. Simulation results indicate that the proposed protocol considerably outperforms five state-of-the-art protocols in terms of packet loss, delay, and throughput.
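The two-step idea in this abstract (first set aside the requests from users with good CSI, then hand out bandwidth according to channel condition) can be illustrated with a minimal sketch. It is not the authors' protocol: the `Request` type, the numeric CSI score, the `csi_threshold` value, and the best-channel-first ordering are all assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    user: str
    csi: float   # hypothetical channel-quality score in [0, 1]; higher is better
    slots: int   # number of TDMA slots the user asks for


def allocate_slots(requests, total_slots, csi_threshold=0.5):
    """Grant slots only to good-channel users, best channel first.

    The threshold and the ordering rule are illustrative assumptions,
    not details taken from the paper.
    """
    # Step 1: segregate the requests whose channel state is good enough.
    good = [r for r in requests if r.csi >= csi_threshold]

    # Step 2: allocate the frame's slots according to channel condition.
    grants = {}
    remaining = total_slots
    for r in sorted(good, key=lambda r: r.csi, reverse=True):
        if remaining == 0:
            break
        granted = min(r.slots, remaining)
        grants[r.user] = granted
        remaining -= granted
    return grants
```

In this sketch a user with a poor channel simply receives nothing in the current frame; how the real protocol treats such users is not stated in the abstract.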

Citations: 27

ADAPT: Automated De-coupled Adaptive Program Transformation

Proceedings 2000 International Conference on Parallel Processing Pub Date : 2000-08-21 DOI: 10.1109/ICPP.2000.876107

Michael J. Voss, R. Eigenmann

Abstract: Dynamic program optimization offers performance improvements far beyond those possible with traditional compile-time optimization. These gains are due to the ability to exploit both architectural and input data set characteristics that are unknown prior to execution time. In this paper, we propose a novel framework for dynamic program optimization, ADAPT (Automated De-coupled Adaptive Program Transformation), that builds on the strengths of existing approaches. The key to our framework is the de-coupling of the dynamic compilation of new code variants from the dynamic selection of these variants at their points of use. This allows code generation to occur concurrently with program execution, removing dynamic compilation overheads from the critical path. We present a compilation system, based on the Polaris optimizing compiler, that automatically applies this framework to general "plugged-in" optimization techniques. We evaluate our system on three programs from the SPEC floating point benchmark suite by dynamically applying loop distribution, loop unrolling, loop tiling and automatic parallelization. We show that our techniques can improve performance by as much as 70% over statically optimized code.
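The de-coupling described above, where new variants are produced off the critical path and the call site merely selects whichever variant is currently installed, can be sketched in a few lines. This is only a toy under stated assumptions: `VariantSlot`, `baseline_sum`, and `build_variant` are hypothetical names, and "compilation" is simulated by installing Python's built-in `sum`; nothing here reflects how ADAPT or Polaris is actually implemented.

```python
import threading


class VariantSlot:
    """Holds the implementation currently selected for one call site."""

    def __init__(self, baseline):
        self._current = baseline
        self._lock = threading.Lock()

    def install(self, variant):
        # Called by the background "compiler" when a new variant is ready.
        with self._lock:
            self._current = variant

    def __call__(self, *args):
        # Selection at the point of use: pick whatever is installed now.
        with self._lock:
            fn = self._current
        return fn(*args)


def baseline_sum(xs):
    total = 0
    for x in xs:
        total += x
    return total


def build_variant(slot):
    # Stand-in for dynamic compilation (an assumption for illustration):
    # the "optimized" variant is simply the built-in sum.
    slot.install(sum)


slot = VariantSlot(baseline_sum)
worker = threading.Thread(target=build_variant, args=(slot,))
worker.start()
result_during = slot([1, 2, 3])   # correct whether or not the variant is ready
worker.join()
result_after = slot([1, 2, 3])    # now served by the installed variant
```

The point of the design is that the caller never waits for the worker: it keeps running the baseline until a replacement has been installed.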

Citations: 73

Partial resolution in data value predictors

Proceedings 2000 International Conference on Parallel Processing Pub Date : 2000-08-21 DOI: 10.1109/ICPP.2000.876078

Toshinori Sato, I. Arita

Citations: 12

Match virtual machine: an adaptive runtime system to execute MATLAB in parallel

Proceedings 2000 International Conference on Parallel Processing Pub Date : 2000-08-21 DOI: 10.1109/ICPP.2000.876100

M. Haldar, A. Nayak, Abhay Kanhere, P. Joisha, N. Shenoy, A. Choudhary, P. Banerjee

Abstract: MATLAB is one of the most popular languages for desktop numerical computations as well as for signal and image processing applications. Applying parallel processing techniques to improve the performance of MATLAB codes has been the goal of many recent works. Most current frameworks require the user to specify parallelism and/or information regarding the type/shape of the variables, thereby sacrificing the user-friendliness that is one of MATLAB's most popular features. Other systems work on a restricted subset of MATLAB, thereby limiting the class of applications MATLAB can support. We present a runtime system capable of executing MATLAB code in parallel without any user intervention. The runtime system performs automatic parallelization and type/shape inference of the code at runtime. A unique feature of the runtime system is its capability to automatically adapt to changes in the underlying architecture, making it particularly useful for systems where predicting performance statically is difficult. We present experimental results obtained for the runtime system running on an SGI Origin2000 shared-memory multiprocessor.

Citations: 17

A scalable, cost-effective, and flexible disk system using high-performance embedded-processors

Proceedings 2000 International Conference on Parallel Processing Pub Date : 2000-08-21 DOI: 10.1109/ICPP.2000.876147

Aki W. Tomita, Naoki Watanabe, Y. Takamoto, S. Inohara, F. Maciel, Hiroaki Odawara, M. Sugie

Abstract: As a scalable, cost-effective, and flexible solution for data-intensive systems, we are exploring active-network-storage (ANS), which is an array of ANS disk drives. The ANS drive improves flexibility by using a modular software design; that is, users can specify functions of the ANS drive by loading/unloading the corresponding modules on it. To keep the ANS drive cost-effective, users are allowed to choose whether native code modules or platform-independent Java-bytecode modules are executed on the drive. We forecast that a current high-performance embedded-processor is powerful enough to enable this modular design to be implemented and to provide a scalable, cost-effective, and flexible ANS system. We have confirmed our forecast by conducting an experiment with an ANS drive prototype with a 200 MHz embedded-processor running database sequential scanning and NFS, which are typical off-loaded functions with different characteristics. To evaluate scalability and cost-effectiveness of the ANS system, we estimated the throughput from measurements on our ANS prototype, and we compared it with the throughput that was measured on a 450 MHz Pentium II Xeon server. Our estimation indicates that the scan throughput of the ANS system increases up to 71 MB/s while that of the server saturates at 25 MB/s because of its CPU bottleneck. The NFS read/write throughputs of two ANS drives surpassed the server maximum throughputs.

Citations: 5

Lower bounds on precedence-constrained scheduling for parallel processors

Proceedings 2000 International Conference on Parallel Processing Pub Date : 2000-08-21 DOI: 10.1109/ICPP.2000.876172

Ivan D. Baev, W. Meleis, A. Eichenberger

Abstract: We consider two general precedence-constrained scheduling problems that have wide applicability in the areas of parallel processing, high performance compiling, and digital system synthesis. These problems are intractable, so it is important to be able to compute tight bounds on their solutions. A tight lower bound on makespan scheduling can be obtained by replacing precedence constraints with release and due dates, giving a problem that can be efficiently solved. We demonstrate that recursively applying this approach yields a bound that is provably tighter than other known bounds, and that is experimentally shown to achieve the optimal value at least 86.5% of the time over a synthetic benchmark. We compute the best known lower bound on weighted completion time scheduling by applying the recent discovery of a new algorithm for solving a related scheduling problem. Experiments show that this bound significantly outperforms the linear programming-based bound. We have therefore demonstrated that combinatorial algorithms can be a valuable alternative to linear programming for computing tight bounds on large scheduling problems.
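To make the notion of "replacing precedence constraints with release dates" concrete, here is a small sketch that derives each task's release time (its earliest possible start) from the precedence graph and combines two classical makespan lower bounds: the critical-path length and the total work divided by the number of processors. This is deliberately much simpler than the recursive bound the abstract describes and is not the authors' algorithm; the function names and the assumption of integer task durations are illustrative.

```python
import math
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def release_times(durations, preds):
    """Earliest start of each task: longest chain of predecessor durations."""
    graph = {t: preds.get(t, ()) for t in durations}
    release = {}
    # static_order() yields every task after all of its predecessors.
    for t in TopologicalSorter(graph).static_order():
        release[t] = max(
            (release[p] + durations[p] for p in preds.get(t, ())), default=0
        )
    return release


def makespan_lower_bound(durations, preds, processors):
    """Max of the critical-path bound and the work bound.

    Assumes integer durations, so rounding the work bound up is valid.
    """
    release = release_times(durations, preds)
    critical_path = max(release[t] + durations[t] for t in durations)
    work_bound = math.ceil(sum(durations.values()) / processors)
    return max(critical_path, work_bound)
```

For example, with durations `{"a": 2, "b": 3, "c": 1, "d": 4}` and `c` depending on `a` and `b`, and `d` depending on `c`, the release times are 0, 0, 3, and 4, so no schedule can finish before time 8 regardless of processor count.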

Citations: 6

Multi-node multicast in three and higher dimensional wormhole tori and meshes with load balance

Proceedings 2000 International Conference on Parallel Processing Pub Date : 2000-08-21 DOI: 10.1109/ICPP.2000.876067

Ming-Hour Yang, Y. Tseng, Ming-Shian Jian, Chao Lin

Citations: 0

Load redundancy removal through instruction reuse

Proceedings 2000 International Conference on Parallel Processing Pub Date : 2000-08-21 DOI: 10.1109/ICPP.2000.876075

Jun Yang, R. Gupta

Abstract: Instruction reuse techniques have been developed to detect and remove redundancy at runtime. By maintaining the execution history of an instruction, reuse techniques detect if a subsequent execution of an instruction will yield the same result as its previous execution, and if this is the case, the result is made available to dependent instructions without executing the instruction. This approach eliminates same instruction redundancy, that is, redundancy across different dynamic instances of the same static instruction. However, the main limitation of existing instruction reuse techniques is that they do not detect or eliminate different instruction redundancy, that is, redundancy across dynamic instances of statically distinct instructions. We present instruction reuse techniques for load redundancy removal that eliminate both same and different instruction redundancy. We first present a study that shows that in addition to significant levels of same instruction redundancy (average of 20%), load instructions also contain high levels (average of 35%) of different instruction redundancy arising at other load or store instructions. We also describe studies that characterize the behavior of the redundancy and develop a hardware implementation guided by this characterization. Our experiments show that our techniques yield IPC improvements of up to 11% and reduce off-chip traffic due to cache misses by as much as 32% for SPECint95 benchmarks.
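The distinction between same-instruction and different-instruction redundancy can be illustrated with a toy software model. Because the table below is indexed by memory address rather than by instruction, a load can reuse a value produced by any earlier load or store to that address. This is an assumption-laden sketch: the `LoadReuseBuffer` class, its unbounded dictionary, and its hit/miss counters are hypothetical and do not describe the hardware design evaluated in the paper.

```python
class LoadReuseBuffer:
    """Toy value-reuse table indexed by memory address (unbounded, for clarity)."""

    def __init__(self):
        self.table = {}   # address -> last value loaded from or stored to it
        self.hits = 0
        self.misses = 0

    def load(self, memory, addr):
        # Reuse the recorded value if any earlier load or store touched addr.
        if addr in self.table:
            self.hits += 1
            return self.table[addr]
        self.misses += 1
        value = memory[addr]
        self.table[addr] = value
        return value

    def store(self, memory, addr, value):
        memory[addr] = value
        # Record the stored value so a later load to addr can reuse it.
        self.table[addr] = value
```

A short usage example: after `buf.load(mem, 0x10)` misses once, a second load of `0x10` hits, and a load of an address that was just written by `buf.store` also hits without reading memory.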

Citations: 38

Depth first search and location based localized routing and QoS routing in wireless networks

Proceedings 2000 International Conference on Parallel Processing Pub Date : 2000-08-21 DOI: 10.1109/ICPP.2000.876111

I. Stojmenovic, Mark Russell, B. Vukojevic

Abstract: In a localized routing algorithm, node A currently holding the message forwards it based on the location of itself, its neighboring nodes and the destination. We propose to use the depth first search (DFS) method for routing decisions. Each node A, upon receiving the message for the first time, sorts all its neighbors according to a criterion such as their distance to the destination and uses that order in the DFS algorithm. It is the first localized algorithm that guarantees delivery for (connected) wireless networks modeled by arbitrary graphs, even with inaccurate location information. We then propose the first localized QoS routing algorithm for wireless networks. It performs the DFS routing algorithm after edges with insufficient bandwidth or insufficient connection time are deleted from the graph, and attempts to minimize hop count. This is also the first paper to apply GPS in QoS routing decisions, and to consider the connection time (estimated lifetime of a link) as a QoS criterion. The average length of the measured QoS path in our experiments, obtained by the DFS method, was between 1 and 1.34 times the length of the QoS path obtained by the shortest path algorithm. The overhead is considerably reduced by applying the concept of internal nodes.
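The routing rule described above (each node orders its neighbors by distance to the destination and the message proceeds depth first, backtracking from dead ends) can be sketched as a small simulation. It is a centralized approximation written under assumptions: the function name `dfs_route`, the 2-D coordinates, and the single global `visited` set are illustrative, whereas in the localized algorithm each node would keep its own state. The QoS edge filtering and the internal-node optimization mentioned in the abstract are not modeled.

```python
import math


def dfs_route(positions, neighbors, source, dest):
    """Return every hop the message makes (including backtracking), or None."""

    def dist(n):
        # Euclidean distance from node n to the destination.
        return math.dist(positions[n], positions[dest])

    visited = {source}
    path = [source]    # current DFS stack
    trace = [source]   # full hop-by-hop record
    while path:
        node = path[-1]
        if node == dest:
            return trace
        # Unvisited neighbors, closest to the destination first. The ordering
        # is static, so re-sorting on each visit matches sorting once.
        candidates = sorted(
            (n for n in neighbors[node] if n not in visited), key=dist
        )
        if candidates:
            nxt = candidates[0]
            visited.add(nxt)
            path.append(nxt)
            trace.append(nxt)
        else:
            path.pop()
            if path:
                trace.append(path[-1])   # message goes back to the sender
    return None
```

Because the search only gives up after every reachable node has been tried, it finds the destination whenever the graph is connected, which mirrors the delivery guarantee claimed in the abstract; the price is the extra hops spent backtracking.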

Citations: 162