{"title":"Dynamic load balancing schemes for computing accessible surface area of protein molecules","authors":"E. Suh, B. Narahari, R. Simha","doi":"10.1109/HIPC.1998.738005","DOIUrl":"https://doi.org/10.1109/HIPC.1998.738005","url":null,"abstract":"This paper presents an experimental study of dynamic load balancing methods for a parallelized solution to a well-known problem in computational molecular biology: computing the accessible surface areas (ASA) of proteins. The main contribution is a better understanding of how certain techniques for load estimation and redistribution must be combined carefully for effectiveness and how these combinations need to change during the course of a computation. In particular, the Shrake-Rupley ASA algorithm is implemented and three aspects of dynamic load balancing are studied: how to estimate load imbalance (the estimation problem); when to invoke load redistribution (the invocation problem); and how to load balance (the mapping problem). The results in this paper show that a dynamically-selected mix of algorithms in each category that adapts to changing structure within the protein works better than a static periodic application of a static mix of algorithms.","PeriodicalId":175528,"journal":{"name":"Proceedings. Fifth International Conference on High Performance Computing (Cat. No. 98EX238)","volume":"203 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123018817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Precise control of instruction caches","authors":"Maria Smirli, D. Lioupis, K. Kissell","doi":"10.1109/HIPC.1998.737965","DOIUrl":"https://doi.org/10.1109/HIPC.1998.737965","url":null,"abstract":"Instruction caches are usually designed to fetch the whole block from memory in case of a miss. However, the fetched blocks might contain branch instructions which if taken, will render the rest of the block useless. A novel approach is introduced, namely the Precise Control, which fetches only the words of a cache block that are likely to be used. The performance of Precise Control is evaluated and it is shown that it has a positive influence on system performance. Precise Control reduces the words fetched from memory by up to 60%, thus reducing significantly the communication overhead between cache and main memory as well as the total execution time.","PeriodicalId":175528,"journal":{"name":"Proceedings. Fifth International Conference on High Performance Computing (Cat. No. 98EX238)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133144467","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"One to all broadcast in hyper butterfly networks","authors":"Wei Shi, P. Srimani","doi":"10.1109/HIPC.1998.737984","DOIUrl":"https://doi.org/10.1109/HIPC.1998.737984","url":null,"abstract":"The authors further investigate the topological properties of the hyper butterfly networks; they develop algorithms for constructing edge disjoint spanning trees in wrapped butterfly graphs and hyper butterfly networks and they use those results to design asymptotically optimal one-to-all broadcast algorithms in those two classes of networks.","PeriodicalId":175528,"journal":{"name":"Proceedings. Fifth International Conference on High Performance Computing (Cat. No. 98EX238)","volume":"48 7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125728891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance analysis of wavelength converters in WDM wavelength routed optical networks","authors":"K. Venugopal, E. E. Rajan, P. S. Kumar","doi":"10.1109/HIPC.1998.737994","DOIUrl":"https://doi.org/10.1109/HIPC.1998.737994","url":null,"abstract":"This paper attempts to study the impact of wavelength converters in WDM wavelength routed all-optical networks. A new heuristic approach for placement of wavelength converters to reduce blocking probabilities is explored. Multihop virtual topology is designed to minimize the number and overall cost of the converters. Blocking probabilities for static lightpath establishment (SLE) and dynamic lightpath establishment (DLE) are analyzed. In the case of SLE, arranging lightpaths in ascending order of their path length reduces blocking probability. Wavelength converters placed at nodes with high nodal degree further reduce the blocking probabilities. Simulation studies performed on a 28-node USA long haul network and a 20-node arbitrary mesh network validate the above observations.","PeriodicalId":175528,"journal":{"name":"Proceedings. Fifth International Conference on High Performance Computing (Cat. No. 98EX238)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129450764","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On topology and bisection bandwidth of hierarchical-ring networks for shared-memory multiprocessors","authors":"G. Ravindran, M. Stumm","doi":"10.1109/HIPC.1998.737997","DOIUrl":"https://doi.org/10.1109/HIPC.1998.737997","url":null,"abstract":"Hierarchical-ring based multiprocessors are interesting alternatives to the more popular two-dimensional direct networks. They allow for simple router designs and wider communication paths than their direct network counterparts. There are several ways hierarchical-ring networks can be configured for a given number of processors. Feasible topologies range from tall, lean networks to short, wide networks, but only a few of these possess high throughput and low latency. We present the results of a simulation study: to determine how large hierarchical-ring networks can become before their performance deteriorates due to their bisection bandwidth constraints; and to derive topologies with high throughput and low latency for a given number of processors. We show that a system with a maximum of 120 processors and three levels of hierarchy can sustain most memory access behaviours, but that larger systems can be sustained, only if their bisection bandwidth is increased.","PeriodicalId":175528,"journal":{"name":"Proceedings. Fifth International Conference on High Performance Computing (Cat. No. 98EX238)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126059402","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Virtual channel multiplexing in networks of workstations with irregular topology","authors":"F. Silla, J. Duato, A. Sivasubramaniam, C. Das","doi":"10.1109/HIPC.1998.737983","DOIUrl":"https://doi.org/10.1109/HIPC.1998.737983","url":null,"abstract":"Networks of workstations are becoming a cost-effective alternative for small-scale parallel computing. Although they may not provide the closely coupled environment of multicomputers and multiprocessors, they meet the needs of a great variety of parallel computing problems at a lower cost. However, in order to achieve a high efficiency, the interconnects used to build the network of workstations must provide a very high bandwidth and low latencies, making their design a critical issue. Recently, a very efficient flow control protocol for networks of workstations has been proposed by the authors. This protocol multiplexes physical channels between several virtual channels and minimizes the use of control flits by transmitting several data flits each time a virtual channel gets the link. In this protocol, a virtual channel sends data flits until the message blocks or is completely transmitted. However, it can reduce network throughput by increasing short message latency, due to long messages monopolizing channels and hindering the progress of short messages. In this paper, we analyze the impact of limiting the number of flits (block size) that a virtual channel can send once it gets the link. We propose a new version of the previous flow control protocol that is easily implementable in hardware. Simulation results show that limiting the maximum block size is not a good design decision, because the overall network performance decreases. Only when short message latency is crucial is it acceptable to limit the block size.","PeriodicalId":175528,"journal":{"name":"Proceedings. Fifth International Conference on High Performance Computing (Cat. No. 98EX238)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131263996","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data structure distribution and multi-threading of Linux file system for multiprocessors","authors":"Anish Sheth, K. Gopinath","doi":"10.1109/HIPC.1998.737976","DOIUrl":"https://doi.org/10.1109/HIPC.1998.737976","url":null,"abstract":"The standard Linux design assumes a uniprocessor architecture. Allowing several processors to execute simultaneously in the kernel mode on behalf of different processes can cause consistency problems unless appropriate exclusion mechanisms are used. In addition, if the file system data structures are not distributed, performance can be affected. We discuss a multiprocessor file system design for Linux ext2fs with various data structures, such as super block, inodes, buffer cache, directory cache (name cache), distributed with respect to different processors with appropriate exclusion mechanisms.","PeriodicalId":175528,"journal":{"name":"Proceedings. Fifth International Conference on High Performance Computing (Cat. No. 98EX238)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131886655","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extended collective I/O for efficient retrieval of large objects","authors":"S. More, A. Choudhary","doi":"10.1109/HIPC.1998.738009","DOIUrl":"https://doi.org/10.1109/HIPC.1998.738009","url":null,"abstract":"Object-relational database management systems (OR-DBMS) extend the capabilities of the relational databases by allowing definition of new data types and methods to operate on these data types while retaining most of the relational model semantics. In this paper we examine issues related to parallel processing of queries in the object-relational model with respect to efficient storage and retrieval of large objects. We extend the concept of collective I/O and other related techniques such as request merging and data sieving in the database domain to achieve high performance in the retrieval of large objects. We deal with the I/O optimization problem in the query executor, access methods and the low level runtime system. We also propose a new technique called pooled striping for efficient storage of large objects on multiple disks. The results presented in this paper clearly show the effectiveness of the proposed I/O optimization techniques in handling large amounts of data in a parallel object-relational database system.","PeriodicalId":175528,"journal":{"name":"Proceedings. Fifth International Conference on High Performance Computing (Cat. No. 98EX238)","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133896189","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Permutation admissibility in shuffle-exchange networks with arbitrary number of stages","authors":"Nabanita Das, B. Bhattacharya, R. Menon, S. Bezrukov","doi":"10.1109/HIPC.1998.737998","DOIUrl":"https://doi.org/10.1109/HIPC.1998.737998","url":null,"abstract":"The set of input-output permutations that are routable through a multistage interconnection network without any conflict (known as the admissible set), plays an important role in determining the capability of the network. Recent works on the permutation admissibility problem of shuffle-exchange networks (SEN) of size N×N, deal with (n+k) stages, where n=log_2 N, and k denotes the number of extra stages. For k=0 or 1, O(Nn) algorithms exist to check if any permutation is admissible, but for k≥2, a polynomial time solution is not yet known. The more general problem of finding the minimum number (m) of shuffle-exchange stages required to realize an arbitrary permutation, 1≤m≤2n-1, is also an open problem. In this paper, we present an O(Nn) algorithm that checks whether a given permutation P is admissible in an m stage SEN, 1≤m≤n, and determines in O(Nnlogn) time the minimum number of stages m of shuffle-exchange, required to realize P. Thus, a single-stage shuffle-exchange network will be able to realize such a permutation with m passes, by recirculating all the paths m times through a single-stage, i.e., with minimum transmission delay, which, otherwise cannot be achieved with a fixed-stage SEN. Furthermore, we present a necessary condition for permutation admissibility in an m stage SEN, where n","PeriodicalId":175528,"journal":{"name":"Proceedings. Fifth International Conference on High Performance Computing (Cat. No. 98EX238)","volume":"205 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116381628","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data prefetching with co-operative caching","authors":"Chi-Hung Chi, Siu-Chung Lau","doi":"10.1109/HIPC.1998.737967","DOIUrl":"https://doi.org/10.1109/HIPC.1998.737967","url":null,"abstract":"Recent research in data cache prefetching is found to be selective in nature: achieving high prediction accuracy over a set of selected references such as array access with constant strides. As a result, for applications where the memory latency is mainly due to data accesses in the set of non selected references of a program, they lose their effectiveness. In fact, their performance might be worse than that of the traditional, less accurate prefetch-on-miss scheme. To overcome this situation, we propose three cooperative cache techniques to assist data prefetching. They are: [1] default prefetching to increase the overall prefetch coverage; [2] block concept to perform variable distance lookahead prefetching; and [3] a spatial data buffer with load balancing to reduce the interference between spatial data and temporal data. To illustrate the potentials of these techniques, they were implemented on top of our previously proposed Instruction Opcode-Based Prefetching (IOBP) scheme (T.F. Chen, 1993). Trace driven simulation on SPEC92 showed that an 8 Kbytes data cache with a 512 bytes spatial buffer can achieve similar performance as a 32 Kbytes data cache through these techniques.","PeriodicalId":175528,"journal":{"name":"Proceedings. Fifth International Conference on High Performance Computing (Cat. No. 98EX238)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121948011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}