{"title":"A framework for portable shared memory programming","authors":"M. Schulz, S. Mckee","doi":"10.1109/IPDPS.2003.1213146","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213146","url":null,"abstract":"Widespread adaptation of shared memory programming for high performance computing has been inhibited by a lack of standardization and the resulting portability problems between platforms and APIs. We present the HAMSTER framework, which helps overcome these problems via cross-platform support and easy retargetability to a wide range of programming models. HAMSTER currently supports models ranging from thread APIs to one-sided put/get interfaces, all on top of a single, core middleware architecture. The HAMSTER framework allows programmers to use any of these models, without modification, on top of SMPs, NUMA-like clusters, and Beowulf systems. In addition, our experiments show that HAMSTER achieves this flexibility and portability without sacrificing performance.","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128898051","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A reconfigurable low-power high-performance matrix multiplier architecture with borrow parallel counters","authors":"R. Lin","doi":"10.1109/IPDPS.2003.1213336","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213336","url":null,"abstract":"A novel run-time reconfigurable matrix processor and its prototype implementation with new circuits, called borrow parallel counters, achieving low power, high speed, simple inter-connections and extra compact design, are presented. For typical graphics and image applications, the multiplier can produce in parallel the products of four 4×4 matrix pairs of 8-bit data, or two matrices X(4×4) and Y(4×4) of 16-bit data, or two matrices X(4×4) and Y(4×4) of 32-bit data, or two 64-b numbers. The proposed parallel counters utilize 4-bit 1-hot integer encoding and borrow bits, i.e. input bits of weight 2, effectively merging type-conversions and additions through using a unique embedded full adder circuit.","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"224 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130753508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On scheduling collaborative computations on the Internet, I: mesh-DAGs and their close relatives","authors":"A. Rosenberg","doi":"10.1109/IPDPS.2003.1213078","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213078","url":null,"abstract":"Advancing technology has rendered the Internet a viable medium for collaborative computing, via mechanisms such as Web-based computing and Grid computing. We present a \"pebble game\" that abstracts the process of scheduling a computation-DAG (directed acyclic graph) for computing over the Internet, including a novel formal criterion for comparing the qualities of competing schedules. Within this formal setting, we identify a strategy for scheduling the task-nodes of a computation-DAG whose dependencies have the structure of a mesh of any finite dimensionality (a mesh-DAG), that is optimal to within a small constant factor (to within a low-order additive term for 2- and 3-dimensional mesh-DAGs). We show that this strategy remains nearly optimal for a generalization of 2-dimensional mesh-DAGs whose structures are determined by abelian monoids (a monoid-based version of Cayley graphs).","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132158441","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Implementation of a calendar application based on SyD coordination links","authors":"S. Prasad, A. Bourgeois, Erdogan Dogdu, Rajshekhar Sunderraman, Yi Pan, S. Navathe, Vijay K. Madisetti","doi":"10.1109/IPDPS.2003.1213438","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213438","url":null,"abstract":"System on devices (SyD) is a specification for a middleware to enable heterogeneous collections of information, databases, or devices (such as hand-held devices) to collaborate with each other. This paper illustrates the advantages of SyD by describing a prototype calendar of meetings application. This application highlights some of the technical merits of SyD by exploiting the use of coordination links. Based on the underlying event-and-trigger mechanism, these links allow automatic updates as well as real-time enforcements of global constraints and interdependencies, not available with existing calendar applications. Additionally, the calendar application illustrates coordination among heterogeneous devices and databases, formation and maintenance of dynamic groups, mobility support through proxies, and performance group transactions across independent data stores.","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131317014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance prediction of paging workloads using lightweight tracing","authors":"A. Burton, P. Kelly","doi":"10.1109/IPDPS.2003.1213499","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213499","url":null,"abstract":"A trace of a workload's system calls can be obtained with minimal interference, and can be used to drive repeatable experiments to evaluate system configuration alternatives. Replaying system call traces alone sometimes leads to inaccurate predictions because paging, and access to memory-mapped files, are not modelled. The paper extends tracing to handle such workloads. At trace capture time, the application's page-level virtual memory access is monitored. The size of the page access trace, and capture overheads, are reduced by excluding recently-accessed pages. This leads to a slight loss of accuracy. Using a suite of memory-intensive applications, we evaluate the capture overhead and measure the predictive accuracy of the approach.","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131740972","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The case for fair multiprocessor scheduling","authors":"A. Srinivasan, Philip Holman, James H. Anderson, Sanjoy Baruah","doi":"10.1109/IPDPS.2003.1213226","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213226","url":null,"abstract":"In this paper, we compare the PD² Pfair algorithm to the EDF-FF partitioning scheme, which uses \"first fit\" (FF) as a partitioning heuristic and the earliest-deadline-first (EDF) algorithm for per-processor scheduling. We present experimental results that show that PD² is competitive with, and in some cases outperforms, EDF-FF. These results suggest that Pfair scheduling is a viable alternative to partitioning. Furthermore, as discussed herein, Pfair scheduling provides many additional benefits, such as simple and efficient synchronization, temporal isolation, fault tolerance, and support for dynamic tasks.","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129223982","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A BSP/CGM algorithm for the all-substrings longest common subsequence problem","authors":"C. E. R. Alves, E. Cáceres, S. W. Song","doi":"10.1109/IPDPS.2003.1213150","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213150","url":null,"abstract":"Given two strings X and Y of lengths m and n, respectively, the all-substrings longest common subsequence (ALCS) problem obtains the lengths of the subsequences common to X and any substring of Y. The sequential algorithm takes O(mn) time and O(n) space. We present a parallel algorithm for ALCS on a coarse-grained multicomputer (BSP/CGM) model with p < √m processors that takes O(mn/p) time and O(n√m) space per processor, with O(log p) communication rounds. The proposed parallel algorithm also solves the well-known LCS problem. To our knowledge this is the best BSP/CGM algorithm for the ALCS problem in the literature.","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129302557","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Applying optical reconfiguration on ATM switch fabrics","authors":"H. S. Laskaridis, G. Papadimitriou, A. Pomportsis","doi":"10.1109/IPDPS.2003.1213338","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213338","url":null,"abstract":"Altering the internal structure of an ATM switch fabric, based on the correlation among ports, can be proved to be advantageous in terms of performance, especially in LAN or campus ATM switches, where we witness stronger traffic correlation. Such reconfiguration can be easily performed in the optical domain, using simple optical elements. We prove the performance improvement, by applying data collected from a campus production ATM switch onto our proposed architecture.","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128831036","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The feelfem system: a repository system for the finite element method","authors":"H. Fujio","doi":"10.1109/IPDPS.2003.1213461","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213461","url":null,"abstract":"We have developed a finite element method (FEM) software repository tool named feelfem that serves as a code generator. One important feature of feelfem is that it is designed to generate various program models of FEM analysis, including users' own newly developed numerical schemes. Another feature is that interfaces to newly developed parallel programming paradigms and parallel solvers can easily be added to it. Software reuse is an important target of the feelfem system. To achieve flexibility and expandability for the system, we adopt an object-oriented technique and implementation-oriented pseudo-code representation of numerical algorithms. In its latest released version, feelfem has strong interaction with the personal pre/post processor GiD. By using a combination of feelfem and GiD, users can generate prototype parallel FEM applications with newly developed solvers very easily and quickly.","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"87 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128845094","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Concurrent bug patterns and how to test them","authors":"E. Farchi, Yarden Nir-Buchbinder, S. Ur","doi":"10.1109/IPDPS.2003.1213511","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213511","url":null,"abstract":"We present and categorize a taxonomy of concurrent bug patterns. We then use the taxonomy to create new timing heuristics for ConTest. Initial industrial experience indicates that these heuristics improve the bug finding ability of ConTest. We also show how concurrent bug patterns can be derived from concurrent design patterns. Further research is required to complete the concurrent bug taxonomy and formal experiments are needed to show that heuristics derived from the taxonomy improve the bug finding ability of ConTest.","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122226530","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}