Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162): latest articles

Communication in parallel applications: characterization and sensitivity analysis
Dale Seed, A. Sivasubramaniam, C. Das
{"title":"Communication in parallel applications: characterization and sensitivity analysis","authors":"Dale Seed, A. Sivasubramaniam, C. Das","doi":"10.1109/ICPP.1997.622679","DOIUrl":"https://doi.org/10.1109/ICPP.1997.622679","url":null,"abstract":"Communication characterization of parallel applications is essential to understand the interplay between architectures and applications in determining the maximum achievable performance. Although a significant amount of research has been conducted on execution-based architectural evaluations, very little effort has gone into capturing the communication behavior of an application mathematically. In this paper, we attempt to characterize the communication behavior of applications by temporal, spatial and volume attributes. We also study the impact of variation in application and architectural parameters on the communication behavior in terms of the three attributes. Our results show that for the chosen suite of applications, the message arrival and spatial distributions can be closely approximated by known statistical distributions and that the temporal as well as spatial distributions of all applications remain unchanged with respect to four parameters considered in this study. These results lead us closer to the belief that it is possible to abstract the communication properties of parallel applications in convenient mathematical forms that have wide applicability.","PeriodicalId":221761,"journal":{"name":"Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121457470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Modeling the impact of run-time uncertainty on optimal computation scheduling using feedback
R. Dietz, T. Casavant, T. Scheetz, T. Braun, M. Andersland
{"title":"Modeling the impact of run-time uncertainty on optimal computation scheduling using feedback","authors":"R. Dietz, T. Casavant, T. Scheetz, T. Braun, M. Andersland","doi":"10.1109/ICPP.1997.622683","DOIUrl":"https://doi.org/10.1109/ICPP.1997.622683","url":null,"abstract":"Increasingly, feedback of measured run-time information is being used in the optimization of computation execution. This paper introduces a model relating the static view of a computation to its run-time variance that is useful in this context. A notion of uncertainty is then used to provide bounds on key scheduling parameters of the run-time computation. To illustrate the relationship between fidelity in measured information and minimum schedulable grain size, we apply the bounds to three existing parallel architectures for the case of run-time variance caused by monitoring intrusion. We also outline a hybrid static-dynamic scheduling paradigm, SEDIA, that uses the model of uncertainty to optimize computation for execution in the presence of run-time variance from sources other than monitoring intrusion.","PeriodicalId":221761,"journal":{"name":"Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162)","volume":"3 Suppl N 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116895618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
The affinity entry consistency protocol
C. Bentes, R. Bianchini, C. Amorim
{"title":"The affinity entry consistency protocol","authors":"C. Bentes, R. Bianchini, C. Amorim","doi":"10.1109/ICPP.1997.622646","DOIUrl":"https://doi.org/10.1109/ICPP.1997.622646","url":null,"abstract":"In this paper we propose a novel software-only distributed shared memory system (SW-DSM), the Affinity Entry Consistency (AEC) protocol. The protocol is based on Entry Consistency but, unlike previous approaches, does not require the explicit association of shared data to synchronization variables, uses the page as its coherence unit, and generates the set of modifications (in the form of diffs) made to shared pages eagerly. The AEC protocol hides the overhead of generating and applying diffs behind synchronization delays, and uses a novel technique, Lock Acquirer Prediction (LAP), to tolerate the overhead of transferring diffs through the network. LAP attempts to predict the next acquirer of a lock at the time of the release, so that the acquirer can be updated even before requesting ownership of the lock. Using execution-driven simulation of real applications, we show that LAP performs very well under AEC; LAP predictions are within the 80-97% range of accuracy. Our results also show that LAP improves performance by 7-28% for our applications. In addition we find that most of the diff creation overhead in the AEC protocol can usually be overlapped with synchronization latencies. A comparison against simulated TreadMarks shows that AEC outperforms TreadMarks by as much as 47%. We conclude that LAP is a useful technique for improving the performance of update-based SW-DSMs, while AEC is an efficient implementation of the Entry Consistency model.","PeriodicalId":221761,"journal":{"name":"Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115489049","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 26
Decisive path scheduling: a new list scheduling method
G. Park, B. Shirazi, J. Marquis, Hyunseung Choo
{"title":"Decisive path scheduling: a new list scheduling method","authors":"G. Park, B. Shirazi, J. Marquis, Hyunseung Choo","doi":"10.1109/ICPP.1997.622682","DOIUrl":"https://doi.org/10.1109/ICPP.1997.622682","url":null,"abstract":"Scheduling parallel tasks represented as a Directed Acyclic Graph (DAG), on a multiprocessor system has been an important research area in the past decades. One of the critical aspects of a class of scheduling algorithms, called \"List Scheduling\", is how to decide which task is to be scheduled next. This is achieved by assigning priorities to the nodes or the edges of the input DAG, and thus the task with the highest priority will be scheduled next. This paper proposes a low complexity scheduling algorithm to improve the priority node selection criteria in list scheduling algorithms. The worst case performance of the proposed algorithm is analyzed for general input DAGs. Also, the worst case performance and the optimality conditions are obtained for free structured input DAGs. The performance comparison study shows that the proposed algorithm outperforms existing scheduling algorithms especially for input DAGs with high communication overheads. The performance improvement over existing algorithms becomes larger as the input DAG becomes more dense and the level of parallelism in the DAG is increased.","PeriodicalId":221761,"journal":{"name":"Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129970218","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 32
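The abstract above describes list scheduling as repeatedly picking the highest-priority ready task of a DAG. As a rough illustration only, here is a generic list scheduler in Python. It is not the paper's Decisive Path Scheduling algorithm: the priority used (the "b-level", i.e. the longest compute-plus-communication path from a task to an exit node), the earliest-start processor choice, and all names (`list_schedule`, `costs`, `edges`) are assumptions made for this sketch.

```python
from functools import lru_cache


def list_schedule(costs, edges, num_procs):
    """Generic list scheduling sketch (NOT the paper's DPS algorithm).

    costs: {task: compute_time}
    edges: {(u, v): comm_time}, paid only if u and v run on different processors
    Returns {task: (processor, start, finish)}.
    """
    succs = {t: [] for t in costs}
    preds = {t: [] for t in costs}
    for (u, v) in edges:
        succs[u].append(v)
        preds[v].append(u)

    @lru_cache(maxsize=None)
    def b_level(t):
        # Assumed priority: longest path (compute + communication) to an exit node.
        return costs[t] + max(
            (edges[(t, s)] + b_level(s) for s in succs[t]), default=0
        )

    proc_free = [0] * num_procs  # time at which each processor becomes idle
    placed = {}
    remaining = set(costs)
    while remaining:
        # A task is ready once all of its predecessors have been scheduled.
        ready = [t for t in remaining if all(p in placed for p in preds[t])]
        task = max(ready, key=lambda t: (b_level(t), str(t)))
        best = None
        for proc in range(num_procs):
            # Data from a predecessor on another processor arrives after comm_time.
            data_ready = max(
                (
                    placed[p][2] + (0 if placed[p][0] == proc else edges[(p, task)])
                    for p in preds[task]
                ),
                default=0,
            )
            start = max(proc_free[proc], data_ready)
            if best is None or start < best[1]:
                best = (proc, start)
        proc, start = best
        placed[task] = (proc, start, start + costs[task])
        proc_free[proc] = start + costs[task]
        remaining.remove(task)
    return placed
```

With high communication costs, this simple heuristic tends to keep dependent tasks on one processor; the paper's contribution concerns a better priority criterion, which this sketch does not attempt to reproduce.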
Hardware versus software implementation of COMA
Adrian Moga, M. Dubois, A. Gefflaut
{"title":"Hardware versus software implementation of COMA","authors":"Adrian Moga, M. Dubois, A. Gefflaut","doi":"10.1109/ICPP.1997.622652","DOIUrl":"https://doi.org/10.1109/ICPP.1997.622652","url":null,"abstract":"Traditionally, cache coherence in multiprocessors has been maintained in hardware. However, the cost-effectiveness of hardwired protocols is questionable. Virtual Shared Memory systems have highlighted the many advantages of software-implemented protocols, albeit at a performance price. The performance gap is narrowed by hybrid systems with the addition of hardware support for fine-grain sharing. We have developed a software protocol for a COMA (Cache-Only Memory Architecture). We call the system SC-COMA for Software-Controlled COMA, to emphasize that the protocol engine is emulated by software executed on the main processor. Contrary to user-level protocols, the software handling coherence events in SC-COMA runs in sub-kernel mode, transparently providing the same services to applications as a hardware counterpart. The software emulation layer has been written and we compare SC-COMA to an idealized hardware COMA through detailed simulations. Our results show that SC-COMA is competitive. On systems with 32 processors, it achieves a slowdown of 11-56% with respect to its hardware counterpart, across a range of applications and memory pressures. SC-COMA scales well, up to 32 nodes. A study on the impact of faster processors on SC-COMA's relative performance indicates a consistent improvement, but with a limitation due to the loosely-integrated design. We conclude that SC-COMA is a viable solution to easily transform networks of workstations into powerful multiprocessors.","PeriodicalId":221761,"journal":{"name":"Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162)","volume":"246 5","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120892870","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11
Data distribution analysis and optimization for Pointer-based distributed programs
Jenq Kuen Lee, Daniel Ho, Yue-Chee Chuang
{"title":"Data distribution analysis and optimization for Pointer-based distributed programs","authors":"Jenq Kuen Lee, Daniel Ho, Yue-Chee Chuang","doi":"10.1109/ICPP.1997.622556","DOIUrl":"https://doi.org/10.1109/ICPP.1997.622556","url":null,"abstract":"A critical question remains open if the compiler can understand the distribution pattern of pointer-based distributed objects built by application programmers, and perform optimization as effectively as the HPF compiler does with distributed arrays. In this paper, we address this challenging issue. In our work, we first present a parallel programming model which allows application programmers to build pointer-based distributed objects at application levels. Next we propose a distribution analysis algorithm which can automatically summarize the distribution pattern of pointer-based distributed objects built by application programmers. Our work, to our best knowledge, is the first work to attempt to address this open issue. Our distribution analysis framework employs Feautrier's parametric integer programming as the basic solver, and can always obtain precise distribution information from the class of programs written in our parallel programming model with static control. Experimental results done on a 16-node IBM SP-2 machine show that the compiler with the help of distribution analysis algorithm can significantly improve the performance of pointer-based distributed programs.","PeriodicalId":221761,"journal":{"name":"Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122782055","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
Exploiting task and data parallelism in parallel Hough and Radon transforms
D. Krishnaswamy, P. Banerjee
{"title":"Exploiting task and data parallelism in parallel Hough and Radon transforms","authors":"D. Krishnaswamy, P. Banerjee","doi":"10.1109/ICPP.1997.622678","DOIUrl":"https://doi.org/10.1109/ICPP.1997.622678","url":null,"abstract":"Edge detection and shape detection in digital images are very computationally intensive problems. Parallel algorithms can potentially provide significant speedups while preserving the quality of the result obtained. Hough and Radon Transforms are projection-based transforms which are commonly used for edge detection and shape detection respectively. We propose in this paper various new parallel algorithms which exploit both task and data parallelism available in Hough and Radon transforms algorithms. A memory scalable aggressive task parallel algorithm is shown to be the most optimal algorithm in terms of memory scalability and performance on an IBM SP2.","PeriodicalId":221761,"journal":{"name":"Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162)","volume":"116 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116456652","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
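To make the data-parallel side of the abstract above more concrete, here is a minimal Python sketch of a Hough transform for straight lines in which the edge points are split into chunks, each chunk votes into its own accumulator, and the accumulators are then merged. This is only an illustration of the general idea, not any of the paper's algorithms: the function names (`hough_votes`, `hough_data_parallel`), the round-robin chunking, and the integer `rho` binning are assumptions of this sketch, and the chunks are processed in a plain loop rather than on separate processors.

```python
import math
from collections import Counter


def hough_votes(points, num_thetas=180):
    """Accumulate Hough votes for one chunk of edge points.

    A line is parameterized as rho = x*cos(theta) + y*sin(theta);
    each point votes once per theta bin.
    """
    votes = Counter()
    for x, y in points:
        for k in range(num_thetas):
            theta = math.pi * k / num_thetas
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            votes[(k, rho)] += 1
    return votes


def hough_data_parallel(points, num_workers=4, num_thetas=180):
    """Split the points across hypothetical workers and merge their accumulators."""
    chunks = [points[i::num_workers] for i in range(num_workers)]
    total = Counter()
    for chunk in chunks:
        # Each chunk is independent, so this loop body could run on its own processor.
        total.update(hough_votes(chunk, num_thetas))
    return total
```

Because voting is additive, merging per-chunk accumulators gives the same result as a single pass over all points; the cost of this simple scheme is one full accumulator per worker, which is the kind of memory pressure the abstract's "memory scalable" algorithm is concerned with.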
Local search for DAG scheduling and task assignment
Minyou Wu, W. Shu, J. Gu
{"title":"Local search for DAG scheduling and task assignment","authors":"Minyou Wu, W. Shu, J. Gu","doi":"10.1109/ICPP.1997.622584","DOIUrl":"https://doi.org/10.1109/ICPP.1997.622584","url":null,"abstract":"Scheduling DAGs to multiprocessors is one of the key issues in high-performance computing. Local search can be used to effectively improve the quality of a scheduling algorithm. In this paper, based on topological ordering, we present a fast local search algorithm which can improve the quality of DAG scheduling algorithms. This low complexity algorithm can effectively reduce the length of a given schedule.","PeriodicalId":221761,"journal":{"name":"Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127309968","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 19
Design of scalable and multicast capable cut-through switches for high-speed LANs
Mingyao Yang, L. Ni
{"title":"Design of scalable and multicast capable cut-through switches for high-speed LANs","authors":"Mingyao Yang, L. Ni","doi":"10.1109/ICPP.1997.622662","DOIUrl":"https://doi.org/10.1109/ICPP.1997.622662","url":null,"abstract":"High-speed switches play an important role in building switched LANs. Among different techniques used in switch design, cut-through switching promises short latency delivery and thus is well suited to distributed/parallel applications. The back pressure flow control of cut-through switching also prevents packet loss due to buffer overflow. This paper presents an incremental switch design based on modular building blocks using cut-through switching technique. The switch can be either nonblocking with full configuration and deterministic routing, or blocking but having more flexibility in configuration and fault tolerance. A kind of switch configuration that fits the client/server computing paradigm is presented. Simulation results are given for various switch configurations and traffic loads. The switch also has built-in hardware multicast capability. Issues of physical layout and integration into practical LANs are also discussed.","PeriodicalId":221761,"journal":{"name":"Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162)","volume":"138 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131981145","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
Throttle and preempt: a new flow control for real-time communications in wormhole networks
Hyojeong Song, Boseob Kwon, H. Yoon
{"title":"Throttle and preempt: a new flow control for real-time communications in wormhole networks","authors":"Hyojeong Song, Boseob Kwon, H. Yoon","doi":"10.1109/ICPP.1997.622589","DOIUrl":"https://doi.org/10.1109/ICPP.1997.622589","url":null,"abstract":"We study wormhole routed networks and their suitability for real-time traffic in a priority-driven paradigm. A traditional blocking flow control in wormhole routing may lead to a priority inversion in the sense that high priority packets are blocked by low priority packets for unlimited time. This uncontrolled priority inversion causes the frequent deadline missing. This paper therefore proposes a new flow control called throttle and preempt flow control, where high priority packets can preempt network resources held by low priority packets, if necessary. As a result, this flow control does not cause priority inversion. Our simulations show that the throttle and preempt flow control dramatically reduces deadline miss ratio without extra virtual channels. It is also observed that the throttle and preempt flow control offers shorter delay for non-real-time traffic than existing real-time flow control does.","PeriodicalId":221761,"journal":{"name":"Proceedings of the 1997 International Conference on Parallel Processing (Cat. No.97TB100162)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115090426","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 44