Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000: Latest Articles

Caching single-assignment structures to build a robust fine-grain multi-threading system
Wen-Yen Lin, J. Gaudiot, J. N. Amaral, G. Gao
{"title":"Caching single-assignment structures to build a robust fine-grain multi-threading system","authors":"Wen-Yen Lin, J. Gaudiot, J. N. Amaral, G. Gao","doi":"10.1109/IPDPS.2000.846039","DOIUrl":"https://doi.org/10.1109/IPDPS.2000.846039","url":null,"abstract":"We present the design, implementation, and evaluation of single-assignment data structures and of a software-controlled cache in an existing multi-threaded architecture platform, the Efficient Architecture for Running Threads (EARTH). The software-controlled cache (ISSC) exploits temporal and spatial locality of EARTH split-phased memory transactions for single-assignment memory references. Our experimental evaluation indicates that the caching mechanism for single-assignment storage makes the EARTH memory system more robust to variations in the latency of memory operations. As a consequence, the system can be ported to a wider range of machine platforms and deliver speedup for both regular and irregular applications.","PeriodicalId":206541,"journal":{"name":"Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133683496","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Thread migration and load balancing in non-dedicated environments
K. Thitikamol, P. Keleher
{"title":"Thread migration and load balancing in non-dedicated environments","authors":"K. Thitikamol, P. Keleher","doi":"10.1109/IPDPS.2000.846038","DOIUrl":"https://doi.org/10.1109/IPDPS.2000.846038","url":null,"abstract":"Networks of workstations are fast becoming the standard environment for parallel applications. However, the use of \"found\" resources as a platform for tightly-coupled runtime environments has at least three obstacles: contention for resources, differing processor speeds, and processor heterogeneity. All three obstacles result in load imbalance, leading to poor performance for scientific applications. This paper describes the use of thread migration in transparently addressing this load imbalance in the context of the CVM software distributed shared memory system. We describe the implementation and performance of mechanisms and policies that accommodate both resource contention, and heterogeneity in clock speed and processor type. Our results show that these cycles can indeed be effectively exploited, and that the runtime cost of processor heterogeneity can be quite manageable. Along the way, however, we identify a number of problems that need to be addressed before such systems can enjoy widespread use.","PeriodicalId":206541,"journal":{"name":"Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127463061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 24
Using available remote memory dynamically for parallel data mining application on ATM-connected PC cluster
M. Oguchi, M. Kitsuregawa
{"title":"Using available remote memory dynamically for parallel data mining application on ATM-connected PC cluster","authors":"M. Oguchi, M. Kitsuregawa","doi":"10.1109/IPDPS.2000.846014","DOIUrl":"https://doi.org/10.1109/IPDPS.2000.846014","url":null,"abstract":"Personal computer/Workstation (PC/WS) clusters are promising candidates for future high performance computers, because of their good scalability and cost performance ratio. Data intensive applications, such as data mining and ad hoc query processing in databases, are considered very important for massively parallel processors, as well as conventional scientific calculations. Thus, investigating the feasibility of data intensive applications on a PC cluster is meaningful. Association rule mining, one of the best-known problems in data mining, differs from conventional scientific calculations in its usage of main memory. It allocates many small data areas in main memory, and the number of those areas suddenly grows enormously during execution. As a result, the contents of memory must be swapped out if the requirement for memory space exceeds the real memory size. However, because the size of each data area is rather small and the elements are accessed almost at random, swapping out to a storage device must degrade the performance severely. In this paper, we investigate the feasibility of using available remote nodes' memory as a swap area when application execution nodes need to swap out their real memory contents during the execution of parallel data mining on PC clusters. We report our experiments in which application execution nodes acquire extra memory dynamically from several available remote nodes through an ATM network. A method of remote memory utilization with remote update operations is proposed and evaluated. The experimental results on our PC cluster show that the proposed method is expected to be considerably better than using hard disks as a swapping device. The dynamic decision mechanism for remote memory availability and the migration operations are also evaluated.","PeriodicalId":206541,"journal":{"name":"Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127360246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 14
On sorting an intransitive total ordered set using semi-heap
Jie Wu
{"title":"On sorting an intransitive total ordered set using semi-heap","authors":"Jie Wu","doi":"10.1109/IPDPS.2000.845993","DOIUrl":"https://doi.org/10.1109/IPDPS.2000.845993","url":null,"abstract":"The problem of sorting an intransitive total ordered set, a generalization of regular sorting, is considered. This generalized sorting is based on the fact that there exists a special linear ordering for any intransitive total ordered set. A new data structure called semi-heap is proposed to construct an optimal Θ(n log n) sorting algorithm. Finally, we propose a cost-optimal parallel algorithm using semi-heap. The run time of this algorithm is Θ(n) with Θ(log n) processors under the EREW PRAM model.","PeriodicalId":206541,"journal":{"name":"Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129945779","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
VisOK: a flexible visualization system for distributed Java object application
Dong-Woo Lee, R. S. Ramakrishna
{"title":"VisOK: a flexible visualization system for distributed Java object application","authors":"Dong-Woo Lee, R. S. Ramakrishna","doi":"10.1109/IPDPS.2000.846011","DOIUrl":"https://doi.org/10.1109/IPDPS.2000.846011","url":null,"abstract":"Distributed object systems are known to be very complex. Consequently, it is very difficult, if not impossible, to see the overall relationship among participating objects in the system. That complicates the issues connected with performance tuning and maintenance. An economical way to visualize the system is clearly needed. In this paper, we propose a tracing facility for Java-based distributed object systems, especially Java RMI (Remote Method Invocation). Our visualization system VisOK (Visual Object-Kit) uses a two-phase hybrid post-mortem/on-the-fly technique. The fundamental tracing part has a flexible and dynamic mechanism. The main idea behind the tracing technique is the plug-in sensor model (PSM). There is a close relationship between the tracing part and the visualization part. For effective visualization of a working system, the causality of events has to be preserved. VisOK supports global event ordering. And for collecting and assembling local states of objects, we propose a distributed snapshot algorithm.","PeriodicalId":206541,"journal":{"name":"Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134073239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
The memory bandwidth bottleneck and its amelioration by a compiler
C. Ding, K. Kennedy
{"title":"The memory bandwidth bottleneck and its amelioration by a compiler","authors":"C. Ding, K. Kennedy","doi":"10.1109/IPDPS.2000.845980","DOIUrl":"https://doi.org/10.1109/IPDPS.2000.845980","url":null,"abstract":"As the speed gap between CPU and memory widens, memory hierarchy has become the primary factor limiting program performance. Until now, the principal focus of hardware and software innovations has been overcoming latency. However, the advent of latency tolerance techniques such as non-blocking cache and software prefetching begins the process of trading bandwidth for latency by overlapping and pipelining memory transfers. Since actual latency is the inverse of the consumed bandwidth, memory latency cannot be fully tolerated without infinite bandwidth. This perspective has led us to two questions. Do current machines provide sufficient data bandwidth? If not, can a program be restructured to consume less bandwidth? This paper answers these questions in two parts. The first part defines a new bandwidth-based performance model and demonstrates the serious performance bottleneck due to the lack of memory bandwidth. The second part describes a new set of compiler optimizations for reducing bandwidth consumption of programs.","PeriodicalId":206541,"journal":{"name":"Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123088461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 60
Optimal periodic remapping of bulk synchronous computations on multiprogrammed distributed systems
N. Fong, Chengzhong Xu, L. Wang
{"title":"Optimal periodic remapping of bulk synchronous computations on multiprogrammed distributed systems","authors":"N. Fong, Chengzhong Xu, L. Wang","doi":"10.1109/IPDPS.2000.845970","DOIUrl":"https://doi.org/10.1109/IPDPS.2000.845970","url":null,"abstract":"For bulk synchronous computations that have nondeterministic behaviors, dynamic remapping is an effective approach to ensure parallel efficiency. There are two basic issues in remapping: when and how to remap. This paper presents a formal treatment of the first issue for dynamic computations with a priori known statistical behaviors. We have formulated the problem as two complementary sequential stochastic optimization problems, with an objective of finding optimal remapping frequencies for a given tolerance of load imbalance on multiprogrammed distributed systems. We have developed analytical approaches to precisely characterize the transient statistical behaviors of the workload process and derived optimal remapping frequencies for various random workload change processes.","PeriodicalId":206541,"journal":{"name":"Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129786080","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
A task duplication based scheduling algorithm for heterogeneous systems
S. Ranaweera, D. Agrawal
{"title":"A task duplication based scheduling algorithm for heterogeneous systems","authors":"S. Ranaweera, D. Agrawal","doi":"10.1109/IPDPS.2000.846020","DOIUrl":"https://doi.org/10.1109/IPDPS.2000.846020","url":null,"abstract":"Optimal scheduling of tasks of a directed acyclic graph (DAG) onto a set of processors is a strongly NP-hard problem. In this paper we present a scheduling scheme called TDS to schedule tasks of a DAG onto a heterogeneous system. This models a network of workstations, with processors of varying computing power. The primary objective of this scheme is to minimize schedule length and scheduling time itself. The existing task duplication based scheduling scheme is primarily done for totally homogeneous systems. We compare the performance of this algorithm with an existing scheduling scheme for heterogeneous processors called BIL. In initial simulations TDS has been observed to generate schedule lengths shorter than those of BIL, for communication-to-computation cost ratios (CCR) of 0.2 to 1. Moreover, TDS is far superior to BIL as far as scheduling time is concerned.","PeriodicalId":206541,"journal":{"name":"Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129608524","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 162
Repartitioning unstructured adaptive meshes
J. Castaños, J. Savage
{"title":"Repartitioning unstructured adaptive meshes","authors":"J. Castaños, J. Savage","doi":"10.1109/IPDPS.2000.846070","DOIUrl":"https://doi.org/10.1109/IPDPS.2000.846070","url":null,"abstract":"We present a new parallel repartitioning algorithm for adaptive finite-element meshes that significantly reduces the amount of data that needs to move between processors in order to rebalance a workload after mesh adaptation (refinement or coarsening). These results derive their importance from the fact that the time to migrate data can be a large fraction of the total time for the parallel adaptive solution of partial differential equations.","PeriodicalId":206541,"journal":{"name":"Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124653837","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 13
S3MP: a task duplication based scalable scheduling algorithm for symmetric multiprocessors
O. Kang, D. Agrawal
{"title":"S3MP: a task duplication based scalable scheduling algorithm for symmetric multiprocessors","authors":"O. Kang, D. Agrawal","doi":"10.1109/IPDPS.2000.846021","DOIUrl":"https://doi.org/10.1109/IPDPS.2000.846021","url":null,"abstract":"We present a task duplication based scalable scheduling algorithm for Symmetric Multiprocessors (SMP), called S3MP (Scalable Scheduling for SMP), to address the problem of task scheduling. The algorithm pre-allocates network communication resources so as to avoid potential communication conflicts, and generates a schedule for the number of processors available in an SMP. This algorithm employs heuristics to select duplication of tasks so that schedule length is reduced/minimized. The performance of the S3MP algorithm has been observed by comparing the schedule length under various numbers of processors and ratios of communication to computation cost. This algorithm has also been applied to some practical directed acyclic graphs (DAGs).","PeriodicalId":206541,"journal":{"name":"Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000","volume":"145 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132608082","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9