Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162): Latest Papers, Page 5

Communication in parallel applications: characterization and sensitivity analysis

Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162) Pub Date : 1997-08-11 DOI: 10.1109/ICPP.1997.622679

Dale Seed, A. Sivasubramaniam, C. Das

Abstract: Communication characterization of parallel applications is essential to understand the interplay between architectures and applications in determining the maximum achievable performance. Although a significant amount of research has been conducted on execution-based architectural evaluations, very little effort has gone into capturing the communication behavior of an application mathematically. In this paper, we attempt to characterize the communication behavior of applications by temporal, spatial and volume attributes. We also study the impact of variation in application and architectural parameters on the communication behavior in terms of the three attributes. Our results show that for the chosen suite of applications, the message arrival and spatial distributions can be closely approximated by known statistical distributions and that the temporal as well as spatial distributions of all applications remain unchanged with respect to four parameters considered in this study. These results lead us closer to the belief that it is possible to abstract the communication properties of parallel applications in convenient mathematical forms that have wide applicability.

Citations: 2
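The abstract states that message arrival distributions can be closely approximated by known statistical distributions, though it does not say which ones. The Python below is a minimal sketch of how such a fit could be checked. It assumes an exponential inter-arrival model and uses hypothetical trace data. The code estimates the rate by maximum likelihood, which is the reciprocal of the sample mean. It then measures the Kolmogorov-Smirnov distance between the empirical CDF and the fitted CDF. A small distance suggests a close fit.

```python
import math

# Assumption: an exponential inter-arrival model is used purely for illustration;
# the distributions actually fitted in the paper may differ.

def fit_exponential(interarrivals):
    """Maximum-likelihood rate for an exponential model: 1 / sample mean."""
    if not interarrivals:
        raise ValueError("need at least one sample")
    mean = sum(interarrivals) / len(interarrivals)
    return 1.0 / mean

def exponential_cdf(rate):
    """Return the CDF of an exponential distribution with the given rate."""
    return lambda x: 1.0 - math.exp(-rate * x) if x >= 0 else 0.0

def ks_statistic(samples, cdf):
    """Kolmogorov-Smirnov distance between the empirical CDF of samples and a model CDF."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF steps from i/n to (i+1)/n at x; check both sides of the step.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

if __name__ == "__main__":
    # Hypothetical inter-arrival times (in cycles) between messages arriving at one node.
    gaps = [12.0, 3.5, 40.2, 7.8, 19.1, 2.2, 25.6, 9.9, 14.3, 5.0]
    rate = fit_exponential(gaps)
    d = ks_statistic(gaps, exponential_cdf(rate))
    print(f"rate={rate:.4f} per cycle, KS distance={d:.3f}")
```

The same `ks_statistic` helper can be reused with any other candidate CDF, so several distributions can be compared on the same trace.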

Modeling the impact of run-time uncertainty on optimal computation scheduling using feedback

Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162) Pub Date : 1997-08-11 DOI: 10.1109/ICPP.1997.622683

R. Dietz, T. Casavant, T. Scheetz, T. Braun, M. Andersland

Citations: 5

The affinity entry consistency protocol

Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162) Pub Date : 1997-08-11 DOI: 10.1109/ICPP.1997.622646

C. Bentes, R. Bianchini, C. Amorim

Abstract: In this paper we propose a novel software-only distributed shared memory system (SW-DSM), the Affinity Entry Consistency (AEC) protocol. The protocol is based on Entry Consistency but, unlike previous approaches, does not require the explicit association of shared data to synchronization variables, uses the page as its coherence unit, and generates the set of modifications (in the form of diffs) made to shared pages eagerly. The AEC protocol hides the overhead of generating and applying diffs behind synchronization delays, and uses a novel technique, Lock Acquirer Prediction (LAP), to tolerate the overhead of transferring diffs through the network. LAP attempts to predict the next acquirer of a lock at the time of the release, so that the acquirer can be updated even before requesting ownership of the lock. Using execution-driven simulation of real applications, we show that LAP performs very well under AEC; LAP predictions are within the 80-97% range of accuracy. Our results also show that LAP improves performance by 7-28% for our applications. In addition we find that most of the diff creation overhead in the AEC protocol can usually be overlapped with synchronization latencies. A comparison against simulated TreadMarks shows that AEC outperforms TreadMarks by as much as 47%. We conclude that LAP is a useful technique for improving the performance of update-based SW-DSMs, while AEC is an efficient implementation of the Entry Consistency model.

Citations: 26
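LAP tries to guess, at release time, which processor will acquire a lock next, so that diffs can be pushed to that processor before it asks for the lock. The abstract does not describe the prediction rule. The sketch below therefore uses a hypothetical stand-in and is not the paper's LAP algorithm. For each pair of lock and releasing processor, it counts which processor acquired the lock next, and it predicts the most frequent successor. The class name `SuccessorPredictor` and the trace format are illustrative assumptions.

```python
from collections import Counter, defaultdict

class SuccessorPredictor:
    """Hypothetical lock-acquirer predictor (NOT the LAP rule from the paper).

    For each (lock, releasing processor) pair it counts which processor
    acquired the lock next, and predicts the most frequent one.
    """

    def __init__(self):
        self.history = defaultdict(Counter)  # (lock, releaser) -> Counter of next acquirers
        self.last_release = {}               # lock -> processor that released it last

    def on_acquire(self, lock, proc):
        """Record that proc acquired lock, crediting the previous releaser's history."""
        releaser = self.last_release.get(lock)
        if releaser is not None:
            self.history[(lock, releaser)][proc] += 1

    def on_release(self, lock, proc):
        """Record the release and return the predicted next acquirer, or None if unknown."""
        self.last_release[lock] = proc
        counts = self.history.get((lock, proc))
        if not counts:
            return None
        return counts.most_common(1)[0][0]

if __name__ == "__main__":
    p = SuccessorPredictor()
    hits = total = 0
    pending = None
    # Hypothetical trace: processors 0, 1, 2 take lock "L" in round-robin order.
    for step in range(12):
        proc = step % 3
        p.on_acquire("L", proc)
        if pending is not None:
            total += 1
            hits += (pending == proc)
        pending = p.on_release("L", proc)
    print(f"correct predictions: {hits}/{total}")
```

On this regular round-robin trace the predictor makes no guesses during the first round. After that, every prediction is correct (8/8). Irregular acquisition orders would naturally lower its accuracy.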

Decisive path scheduling: a new list scheduling method

Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162) Pub Date : 1997-08-11 DOI: 10.1109/ICPP.1997.622682

G. Park, B. Shirazi, J. Marquis, Hyunseung Choo

Abstract: Scheduling parallel tasks represented as a Directed Acyclic Graph (DAG), on a multiprocessor system has been an important research area in the past decades. One of the critical aspects of a class of scheduling algorithms, called "List Scheduling", is how to decide which task is to be scheduled next. This is achieved by assigning priorities to the nodes or the edges of the input DAG, and thus the task with the highest priority will be scheduled next. This paper proposes a low complexity scheduling algorithm to improve the priority node selection criteria in list scheduling algorithms. The worst case performance of the proposed algorithm is analyzed for general input DAGs. Also, the worst case performance and the optimality conditions are obtained for free structured input DAGs. The performance comparison study shows that the proposed algorithm outperforms existing scheduling algorithms especially for input DAGs with high communication overheads. The performance improvement over existing algorithms becomes larger as the input DAG becomes more dense and the level of parallelism in the DAG is increased.

Citations: 32
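The abstract describes the core loop of list scheduling: assign a priority to each node of the DAG, then repeatedly schedule the highest-priority ready task. The Python below is a minimal sketch of that generic loop and does not implement the paper's Decisive Path criterion. It uses the common bottom-level (b-level) priority in its place, defined as the longest computation-plus-communication path from a task to an exit task. Each task goes on the processor where it can start earliest. Communication cost is charged only when a predecessor ran on a different processor. The input format (cost and edge dictionaries) is an assumption for illustration.

```python
from functools import lru_cache

def list_schedule(cost, edges, num_procs):
    """Generic list scheduling sketch (b-level priority, NOT the Decisive Path method).

    cost:  dict task -> computation time
    edges: dict (u, v) -> communication time, paid only if u and v run on different processors
    Returns dict task -> (processor, start, finish).
    """
    succs = {t: [] for t in cost}
    preds = {t: [] for t in cost}
    for (u, v) in edges:
        succs[u].append(v)
        preds[v].append(u)

    @lru_cache(maxsize=None)
    def b_level(t):
        # Longest path (computation + communication) from t to any exit task.
        return cost[t] + max((edges[(t, s)] + b_level(s) for s in succs[t]), default=0)

    proc_free = [0.0] * num_procs
    schedule = {}
    remaining = {t: len(preds[t]) for t in cost}
    ready = [t for t in cost if remaining[t] == 0]

    while ready:
        # Highest priority first; ties broken by task name for determinism.
        ready.sort(key=lambda t: (-b_level(t), str(t)))
        task = ready.pop(0)
        best = None
        for p in range(num_procs):
            # Data from a predecessor on another processor arrives after the edge's delay.
            data_ready = max(
                (schedule[u][2] + (0 if schedule[u][0] == p else edges[(u, task)])
                 for u in preds[task]),
                default=0.0,
            )
            start = max(proc_free[p], data_ready)
            if best is None or start < best[1]:
                best = (p, start)
        p, start = best
        schedule[task] = (p, start, start + cost[task])
        proc_free[p] = start + cost[task]
        for s in succs[task]:
            remaining[s] -= 1
            if remaining[s] == 0:
                ready.append(s)
    return schedule

if __name__ == "__main__":
    # Hypothetical diamond-shaped DAG: A fans out to B and C, which both feed D.
    cost = {"A": 2, "B": 3, "C": 2, "D": 1}
    edges = {("A", "B"): 4, ("A", "C"): 1, ("B", "D"): 1, ("C", "D"): 5}
    result = list_schedule(cost, edges, 2)
    for task, (proc, start, finish) in sorted(result.items(), key=lambda kv: kv[1][1]):
        print(f"{task}: P{proc} [{start}, {finish}]")
```

Here the communication costs are high relative to computation, so the sketch places every task on P0 (makespan 8). This is the kind of input where the abstract reports that better node-selection criteria matter most.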

Hardware versus software implementation of COMA

Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162) Pub Date : 1997-08-11 DOI: 10.1109/ICPP.1997.622652

Adrian Moga, M. Dubois, A. Gefflaut

Abstract: Traditionally, cache coherence in multiprocessors has been maintained in hardware. However, the cost-effectiveness of hardwired protocols is questionable. Virtual Shared Memory systems have highlighted the many advantages of software-implemented protocols, albeit at a performance price. The performance gap is narrowed by hybrid systems with the addition of hardware support for fine-grain sharing. We have developed a software protocol for a COMA (Cache-Only Memory Architecture). We call the system SC-COMA for Software-Controlled COMA, to emphasize that the protocol engine is emulated by software executed on the main processor. Contrary to user-level protocols, the software handling coherence events in SC-COMA runs in sub-kernel mode, transparently providing the same services to applications as a hardware counterpart. The software emulation layer has been written and we compare SC-COMA to an idealized hardware COMA through detailed simulations. Our results show that SC-COMA is competitive. On systems with 32 processors, it achieves a slowdown of 11-56% with respect to its hardware counterpart, across a range of applications and memory pressures. SC-COMA scales well, up to 32 nodes. A study on the impact of faster processors on SC-COMA's relative performance indicates a consistent improvement, but with a limitation due to the loosely-integrated design. We conclude that SC-COMA is a viable solution to easily transform networks of workstations into powerful multiprocessors.

Citations: 11

Data distribution analysis and optimization for Pointer-based distributed programs

Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162) Pub Date : 1997-08-11 DOI: 10.1109/ICPP.1997.622556

Jenq Kuen Lee, Daniel Ho, Yue-Chee Chuang

Abstract: A critical open question is whether the compiler can understand the distribution pattern of pointer-based distributed objects built by application programmers, and perform optimization as effectively as the HPF compiler does with distributed arrays. In this paper, we address this challenging issue. In our work, we first present a parallel programming model which allows application programmers to build pointer-based distributed objects at the application level. Next we propose a distribution analysis algorithm which can automatically summarize the distribution pattern of pointer-based distributed objects built by application programmers. To the best of our knowledge, our work is the first to attempt to address this open issue. Our distribution analysis framework employs Feautrier's parametric integer programming as the basic solver, and can always obtain precise distribution information from the class of programs written in our parallel programming model with static control. Experimental results on a 16-node IBM SP-2 machine show that the compiler, with the help of the distribution analysis algorithm, can significantly improve the performance of pointer-based distributed programs.

Citations: 8

Exploiting task and data parallelism in parallel Hough and Radon transforms

Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162) Pub Date : 1997-08-11 DOI: 10.1109/ICPP.1997.622678

D. Krishnaswamy, P. Banerjee

Citations: 9

Local search for DAG scheduling and task assignment

Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162) Pub Date : 1997-08-11 DOI: 10.1109/ICPP.1997.622584

Minyou Wu, W. Shu, J. Gu

Citations: 19

Design of scalable and multicast capable cut-through switches for high-speed LANs

Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162) Pub Date : 1997-08-11 DOI: 10.1109/ICPP.1997.622662

Mingyao Yang, L. Ni

Citations: 10

Throttle and preempt: a new flow control for real-time communications in wormhole networks

Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162) Pub Date : 1997-08-11 DOI: 10.1109/ICPP.1997.622589

Hyojeong Song, Boseob Kwon, H. Yoon

Citations: 44