{"title":"Parallel computation for chromosome reconstruction on a cluster of workstations","authors":"S. Bhandarkar, Salem Machaka, S. Shete, J. Arnold","doi":"10.1109/IPDPS.2000.845965","DOIUrl":"https://doi.org/10.1109/IPDPS.2000.845965","url":null,"abstract":"Reconstructing a physical map of a chromosome from a genomic library presents a central computational problem in genetics. Physical map reconstruction in the presence of errors is a problem of high computational complexity which provides the motivation for parallel computing. Parallelization strategies for a maximum likelihood estimation-based approach to physical map reconstruction are presented. The estimation procedure entails gradient descent search for determining the optimal spacings between probes for a given probe ordering. The optimal probe ordering is determined using a stochastic optimization algorithm. A two-tier parallelization strategy is proposed wherein the gradient descent search is parallelized at the lower level and the stochastic optimization algorithm is simultaneously parallelized at the higher level. Implementation and experimental results on a distributed memory multiprocessor cluster running the Parallel Virtual Machine (PVM) environment are presented.","PeriodicalId":206541,"journal":{"name":"Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000","volume":"41 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128481804","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Effective load sharing on heterogeneous networks of workstations","authors":"Li Xiao, Xiaodong Zhang, Yanxia Qu","doi":"10.1109/IPDPS.2000.846016","DOIUrl":"https://doi.org/10.1109/IPDPS.2000.846016","url":null,"abstract":"We consider networks of workstations which are not only time-sharing, but also heterogeneous with a large variation in the computing power and memory capacities of different workstations. Many load sharing schemes mainly target sharing CPU resources, and have been intensively evaluated in homogeneous distributed environments. However, the penalties of data accesses and movement in modern computer systems, such as page faults, have grown to the point where the overall performance of distributed systems cannot be further improved without serious considerations concerning memory resources in the design of load sharing policies. Considering both system heterogeneity and effective usage of memory resources, we design and evaluate load sharing policies in order to minimize both CPU idle times and the number of page faults in heterogeneous distributed systems. Conducting trace-driven simulations, we show that load sharing policies considering both CPU and memory resources are robust and effective in heterogeneous systems. We also show that the functionality and the nature of load sharing policies are quite independent of several memory demand distributions of workloads.","PeriodicalId":206541,"journal":{"name":"Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128970732","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimal on demand packet scheduling in single-hop multichannel communication systems","authors":"M. Bonuccelli, S. Pelagatti","doi":"10.1109/IPDPS.2000.846004","DOIUrl":"https://doi.org/10.1109/IPDPS.2000.846004","url":null,"abstract":"In this paper, we study the problem of on demand minimum length packet scheduling in single-hop multichannel systems. Examples of these systems are those centered around switching networks, like crossbar switches, and WDM optical fiber networks. On demand scheduling requires that packets are scheduled upon receipt, and without changing the schedule of earlier packets. On demand scheduling is performed by on-line algorithms. In this paper we show that a large group of on-line scheduling algorithms, called maximal algorithms, are asymptotically optimal (in the worst case sense). This result is established by first giving the competitive ratio of these algorithms (nearly 3), and then by showing that no on-line algorithm can (asymptotically) perform better in the worst case. Then, we run a simulation experiment on randomly generated problem instances, whose outcome indicates an average increase of the schedule length of maximal algorithms, of 5% with respect to the lower bound.","PeriodicalId":206541,"journal":{"name":"Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130508774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Study of a multilevel approach to partitioning for parallel logic simulation","authors":"Swaminathan Subramanian, D. Rao, P. Wilsey","doi":"10.1109/IPDPS.2000.846071","DOIUrl":"https://doi.org/10.1109/IPDPS.2000.846071","url":null,"abstract":"Parallel simulation techniques are often employed to meet the computational requirements of large hardware simulations in order to reduce simulation time. In addition, partitioning for parallel simulations has been shown to be vital for achieving higher simulation throughput. This paper presents the results of our partitioning studies conducted on an optimistic-parallel logic simulation framework based on the time warp synchronization protocol. The paper also presents the design and implementation of a new partitioning algorithm based on a multilevel heuristic, developed as a part of this study. The multilevel algorithm attempts to balance load, maximize concurrency, and reduce inter-processor communication in three phases to improve performance. The experimental results obtained from our benchmarks indicate that the multilevel algorithm yields better partitions than other partitioning algorithms included in the study.","PeriodicalId":206541,"journal":{"name":"Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000","volume":"23 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133071406","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Relating two-dimensional reconfigurable meshes with optically pipelined buses","authors":"A. Bourgeois, J. Trahan","doi":"10.1109/IPDPS.2000.846060","DOIUrl":"https://doi.org/10.1109/IPDPS.2000.846060","url":null,"abstract":"Recently, many models using reconfigurable optically pipelined buses have been proposed in the literature. We present simulations for a number of these models and establish that they possess the same complexity, so that any of these models can simulate a step of one of the other models in constant time with a polynomial increase in size. Specifically, we determine the complexity of three optical models (the PR-Mesh, APPBS, and AROB) to be the same as the well known LR-Mesh and the cycle-free LR-Mesh.","PeriodicalId":206541,"journal":{"name":"Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130991062","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A mechanism for speculative memory accesses following synchronizing operations","authors":"Takayuki Sato, K. Ohno, H. Nakashima","doi":"10.1109/IPDPS.2000.845976","DOIUrl":"https://doi.org/10.1109/IPDPS.2000.845976","url":null,"abstract":"In order to reduce the overhead of synchronizing operations of shared memory multiprocessors, this paper proposes a mechanism, named specMEM, to execute memory accesses following a synchronizing operation speculatively before the completion of the synchronization is confirmed. A unique feature of our mechanism is that the detection of speculation failure and the restoration of computational state on the failure are implemented by a small extension of coherent cache. It is also remarkable that operations for speculation on its success and failure are performed in a constant time for each independent of the number of speculative accesses. This is realized by implementing a part of cache tag for cache line state with a simple functional memory. This paper also describes an evaluation result of specMEM applied to barrier synchronization. Performance data was obtained by simulation running benchmark programs in SPLASH-2. We found that the execution time of LU decomposition, in which the length of period between a pair of barriers significantly varies because of the fluctuation of computational load, is improved by 13%.","PeriodicalId":206541,"journal":{"name":"Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132251509","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimal all-to-all personalized exchange in a class of optical multistage networks","authors":"Yuanyuan Yang, Jianchao Wang","doi":"10.1109/IPDPS.2000.846061","DOIUrl":"https://doi.org/10.1109/IPDPS.2000.846061","url":null,"abstract":"All-to-all personalized exchange is one of the most dense collective communication patterns and occurs in many important parallel computing/networking applications. In this paper, we look into the issue of realizing all-to-all personalized exchange in optical multistage networks. Advances in electro-optic technologies have made optical communication a promising networking choice to meet the increasing demands for high channel bandwidth and low communication latency of high-performance computing/communication applications. Although optical multistage networks hold great promise and have demonstrated advantages over their electronic counterpart, they also hold their own challenges. Due to the unique properties of optics, crosstalk in optical switches should be avoided to make them work properly. In this paper, we will provide an optimal scheme for realizing all-to-all personalized exchange in a class of unique-path, self-routing optical multistage networks crosstalk-free. The basic idea of realizing all-to-all personalized exchange in such a multistage network is to transform it to multiple semi-permutations, each of which can be realized crosstalk-free in a single pass, and take advantage of pipelined message transmission in consecutive passes. As can be seen, the time complexity of our all-to-all personalized exchange algorithms matches the lower bound of the communication delay in this type of network.","PeriodicalId":206541,"journal":{"name":"Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133607314","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic data layouts for cache-conscious factorization of DFT","authors":"Neungsoo Park, Dongsoo Kang, K. Bondalapati, V. Prasanna","doi":"10.1109/IPDPS.2000.846054","DOIUrl":"https://doi.org/10.1109/IPDPS.2000.846054","url":null,"abstract":"Effective utilization of cache memories is a key factor in achieving high performance in computing the Discrete Fourier Transform (DFT). Most optimization techniques for computing the DFT rely on either modifying the computation and data access order or exploiting low level platform specific details, while keeping the data layout in memory static. In this paper we propose a high level optimization technique, dynamic data layout (DDL). In DDL, data reorganization is performed between computations to effectively utilize the cache. This cache-conscious factorization of the DFT including the data reorganization steps is automatically computed by using efficient techniques in our approach. An analytical model of the cache miss pattern is utilized to predict the performance and explore the search space of factorizations. Our technique results in up to a factor of 4 improvement over standard FFT implementations and up to 33% improvement over other optimization techniques such as copying on SUN UltraSPARC-II, DEC Alpha and Intel Pentium III.","PeriodicalId":206541,"journal":{"name":"Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132655276","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A constructive solution to the juggling problem in processor array synthesis","authors":"A. Darte, R. Schreiber, B. R. Rau, F. Vivien","doi":"10.1109/IPDPS.2000.846069","DOIUrl":"https://doi.org/10.1109/IPDPS.2000.846069","url":null,"abstract":"We describe a new, practical, constructive method for solving the well-known conflict-free scheduling problem for the locally sequential, globally parallel (LSGP) case of processor array synthesis. First, we provide a closed form solution that enables the enumeration of all conflict-free schedules. Then, we discuss the reduction of the cost of hardware whose function is to control the flow of data, enable or disable functional units, and generate memory addresses. We present a new technique for controlling the complexity of these housekeeping functions in a processor array. Both of these techniques have been incorporated into a software system for the automatic synthesis of hardware accelerators developed by HP Labs.","PeriodicalId":206541,"journal":{"name":"Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000","volume":"51 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132723486","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallel performance study of Monte Carlo photon transport code on shared-, distributed-, and distributed-shared-memory architectures","authors":"A. Majumdar","doi":"10.1109/IPDPS.2000.845969","DOIUrl":"https://doi.org/10.1109/IPDPS.2000.845969","url":null,"abstract":"We have parallelized a Monte Carlo photon transport algorithm. Three different parallel versions of the algorithm were developed. The first version is for the Tera Multi-Threaded Architecture (MTA) and uses Tera specific directives. The second version, which uses MPI library calls, has been implemented on both the CRAY T3E and the 8-way SMP IBM SP with Power3 processors. The third version is a hybrid MPI-OpenMP implementation and is used on the SMP IBM SP. This version uses MPI to communicate between nodes and OpenMP to perform shared memory operations among processors within a node. We explain the three different parallelization approaches and present parallel performance results of these three parallel implementations on three different machines. We observe near perfect speedup for the three versions on the three architectures. The results on the SMP IBM SP suggest that the hybrid MPI-OpenMP programming is suitable for SMP type machines.","PeriodicalId":206541,"journal":{"name":"Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000","volume":"90 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114827924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}