{"title":"Model-driven specification of component-based distributed real-time and embedded systems for verification of systemic QoS properties","authors":"James H. Hill, A. Gokhale","doi":"10.1109/IPDPS.2008.4536573","DOIUrl":"https://doi.org/10.1109/IPDPS.2008.4536573","url":null,"abstract":"The adage \"the whole is not equal to the sum of its parts\" is very appropriate in the context of verifying a range of systemic properties, such as deadlocks, correctness, and conformance to quality of service (QoS) requirements, for component-based distributed real-time and embedded (DRE) systems. For example, end-to-end worst case response time (WCRT) in component-based DRE systems is not as simple as accumulating WCRT for each individual component in the system because of inherent complexities introduced by the large solution space of possible deployment and configurations. This paper describes a novel process and tool-based artifacts that simplify the formal specification of component-based DRE systems for verification of systemic QoS properties. Our approach is based on the mathematical formalism of Timed Input/Output Automata and uses generative programming techniques for automating the verification of systemic QoS properties for component-based DRE systems.","PeriodicalId":162608,"journal":{"name":"2008 IEEE International Symposium on Parallel and Distributed Processing","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128383896","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A game theoretical data replication technique for mobile ad hoc networks","authors":"S. Khan, A. A. Maciejewski, H. Siegel, I. Ahmad","doi":"10.1109/IPDPS.2008.4536303","DOIUrl":"https://doi.org/10.1109/IPDPS.2008.4536303","url":null,"abstract":"Adaptive replication of data items on servers of a mobile ad hoc network can alleviate access delays. The selection of data items and servers requires solving a constrained optimization problem, which is in general NP-complete. The problem is further complicated by frequent partitions of the ad hoc network. In this paper, a mathematical model for data replication in ad hoc networks is formulated. We treat the mobile servers in the ad hoc network as self-interested entities, hence they have the capability to manipulate the outcome of a resource allocation mechanism by misrepresenting their valuations. We design a game theoretic \"truthful\" mechanism in which replicas are allocated to mobile servers based on reported valuations. We sketch the exact properties of the truthful mechanism and derive a payment scheme that suppresses the selfish behavior of the mobile servers. The proposed technique is extensively evaluated against three ad hoc network replica allocation methods: (a) extended static access frequency, (b) extended dynamic access frequency and neighborhood, and (c) extended dynamic connectivity grouping. The experimental results reveal that the proposed approach outperforms the three techniques in solution quality and has competitive execution times.","PeriodicalId":162608,"journal":{"name":"2008 IEEE International Symposium on Parallel and Distributed Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129310010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance evaluation of parallel applications on next generation memory architecture with power-aware paging method","authors":"Yuto Hosogaya, Toshio Endo, S. Matsuoka","doi":"10.1109/IPDPS.2008.4536222","DOIUrl":"https://doi.org/10.1109/IPDPS.2008.4536222","url":null,"abstract":"With increasing demand for low power high performance computing, reducing power of not only CPUs but also memory is becoming important. In typical general-purpose HPC environments, DRAM is installed in an over-provisioned fashion to avoid swapping, although in most cases not all such memory is used, leading to unnecessary and excessive power consumption, even in a standby state. We propose a next generation low power memory system that reduces required DRAM capacity while minimizing application performance degradation. In this system, both DRAM and MRAM, fast non-volatile memory, are used as main memory, while flash memory is used as a swap device. Our profile-based paging algorithm optimizes memory accesses by using faster memory as much as possible, reducing accesses to slower memory. Simulated results of our architecture show that the overall energy consumption of the memory system can be reduced to 25% in the best case by reducing DRAM capacity, with only 17% performance loss in application benchmarks.","PeriodicalId":162608,"journal":{"name":"2008 IEEE International Symposium on Parallel and Distributed Processing","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124546279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The potential of computation reuse in high-level optimization of a signal recognition system","authors":"M. Demertzi, P. Diniz, Mary W. Hall, A. Gilbert, Yi Wang","doi":"10.1109/IPDPS.2008.4536402","DOIUrl":"https://doi.org/10.1109/IPDPS.2008.4536402","url":null,"abstract":"This paper evaluates the potential of exploiting computation reuse in a signal recognition system that is jointly optimized from mathematical representation, algorithm design and final implementation. Walsh wavelet packets in conjunction with a BestBasis algorithm are used to derive transforms that discriminate between signals. The FPGA implementation of this computation exploits the structure of the resulting transform matrices in several ways to derive a highly optimized hardware representation of this signal recognition problem. Specifically, we observe in the transform matrices a significant amount of reuse of subrows, thus indicating redundant computation. Through analysis of this reuse, we discover the potential for a 3× reduction in the amount of computation of combining a transform matrix and signal. In this paper, we focus on how the implementation might exploit this reuse in a profitable way. By exploiting a subset of this computation reuse, the system can navigate the tradeoff space of reducing computation and the extra storage required.","PeriodicalId":162608,"journal":{"name":"2008 IEEE International Symposium on Parallel and Distributed Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129898007","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Online scheduling in grids","authors":"U. Schwiegelshohn, Andrei Tchernykh, R. Yahyapour","doi":"10.1109/IPDPS.2008.4536273","DOIUrl":"https://doi.org/10.1109/IPDPS.2008.4536273","url":null,"abstract":"This paper addresses nonclairvoyant and non-preemptive online job scheduling in Grids. In the applied basic model, the grid system consists of a large number of identical processors that are divided into several machines. Jobs are independent, they have a fixed degree of parallelism, and they are submitted over time. Further, a job can only be executed on the processors belonging to the same machine. It is our goal to minimize the total makespan. We show that the performance of Garey and Graham's list scheduling algorithm is significantly worse in grids than in multiprocessors. Then we present a Grid scheduling algorithm that guarantees a competitive factor of 5. This algorithm can be implemented using a \"job stealing\" approach and may be well suited to serve as a starting point for Grid scheduling algorithms in real systems.","PeriodicalId":162608,"journal":{"name":"2008 IEEE International Symposium on Parallel and Distributed Processing","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127030142","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reusable context pipelining for low power coarse-grained reconfigurable architecture","authors":"Yoonjin Kim, R. Mahapatra","doi":"10.1109/IPDPS.2008.4536523","DOIUrl":"https://doi.org/10.1109/IPDPS.2008.4536523","url":null,"abstract":"Coarse-grained reconfigurable architectures (CGRA) require many processing elements and a configuration memory unit (configuration cache) for reconfiguration of the ALU array elements. This structure consumes a significant amount of power. Power reduction during reconfiguration is necessary for the reconfigurable architecture to be used as a competitive IP core in embedded systems. In this paper, we propose a power-conscious reusable context pipelining architecture for CGRA that efficiently reduces power consumption in configuration cache without performance degradation. Experimental results show that the proposed approach saves up to 57.97% of the total power consumed in the configuration cache with reduced configuration cache size compared to the previous approach.","PeriodicalId":162608,"journal":{"name":"2008 IEEE International Symposium on Parallel and Distributed Processing","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123942183","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gedae’s automated management of hierarchical memories on multicore processors Commercial Tutorial","authors":"W. Lundgren","doi":"10.1109/IPDPS.2008.4536578","DOIUrl":"https://doi.org/10.1109/IPDPS.2008.4536578","url":null,"abstract":"Multicore processors present new programming challenges even to those with experience programming parallel and distributed systems. Gedae offers improved productivity and an expanded developer pool for these architectures by automating many of the difficult and tedious issues such as threading, deadlock avoidance, planning of memory use, and runtime observability. Gedae is able to address these issues while still creating highly efficient applications through the automatic incorporation of target-optimized compute kernels and minimal impact of the Gedae scheduler during runtime. The focus of this tutorial is one of those challenges heightened by the advent of multicores - the management of hierarchical memories. Because multiple cores are being brought together on the limited real estate of a single chip, there is limited room to provide core-specific memory, bringing about programming challenges for the software developer. An example is the cell broadband engine (Cell/B.E.) processor. The Cell/B.E. processor combines 8 synergistic processing elements (SPEs) with one power processing element (PPE). The SPEs each have a small, 256 kB local storage, while the system has a larger monolithic memory available to all PEs. Accessing the system memory from the SPEs must utilize a single memory interface with limited bandwidth compared to the element interconnect bus (EIB) between the SPEs. Programming this hierarchical memory involves manual management of the SPEs' local storage, overlapping of memory puts and gets with computation, and special consideration of alignment issues to provide high performance. While the Cell/B.E. memory structure presents special programming considerations, other multicores also utilize hierarchical memory structures, such as the use of core-specific multilayered cache on Intel Core 2 and Tilera Tile64 processors. Gedae comprises a programming language, a compiler, and analysis tools that provide a method for specifying the use of hierarchical memory and automating the use of these memories. This tutorial will introduce the concepts of managing hierarchical memories on multicore processors, discuss how those issues affect programming the processors, illustrate Gedae's solution for programming these memories, and walk through example applications that show how Gedae automatically manages these issues.","PeriodicalId":162608,"journal":{"name":"2008 IEEE International Symposium on Parallel and Distributed Processing","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116266134","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient automated marshaling of C++ data structures for MPI applications","authors":"Wesley Tansey, E. Tilevich","doi":"10.1109/IPDPS.2008.4536307","DOIUrl":"https://doi.org/10.1109/IPDPS.2008.4536307","url":null,"abstract":"We present an automated approach for marshaling C++ data structures in high performance computing (HPC) applications. Our approach utilizes a graphical editor through which the user can express a subset of an object's state to be marshaled and sent across a network. Our tool, MPI serializer, then automatically generates efficient marshaling and unmarshaling code for use with the message passing interface (MPI), the predominant communication middleware for HPC systems. Our approach provides a more comprehensive level of support for C++ language features than the existing state of the art, and does so in full compliance with the C++ language standard. Specifically, we can marshal effectively and efficiently non-trivial language constructs such as polymorphic pointers, dynamically allocated arrays, non-public member fields, inherited members, and STL container classes. Additionally, our marshaling approach is also applicable to third party libraries, as it does not require any modifications to the existing C++ source code. We validate our approach through two case studies of applying our tool to automatically generate the marshaling functionality of two realistic HPC applications. The case studies demonstrate that the automatically generated code matches the performance of typical hand-written implementations and surpasses current state-of-the-art C++ marshaling libraries, in some cases by more than an order of magnitude. The results of our case studies indicate that our approach can be beneficial for both the initial construction of HPC applications as well as for the refactoring of sequential applications for parallel execution.","PeriodicalId":162608,"journal":{"name":"2008 IEEE International Symposium on Parallel and Distributed Processing","volume":"1 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121660985","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distributed, heterogeneous resource management using artificial immune systems","authors":"Lucas A. Wilson","doi":"10.1109/IPDPS.2008.4536363","DOIUrl":"https://doi.org/10.1109/IPDPS.2008.4536363","url":null,"abstract":"As high performance and distributed computing become more important tools for enabling scientists and engineers to solve large computational problems, the need for methods to fairly and efficiently schedule tasks across multiple, possibly geographically distributed, computing resources becomes more crucial. Given the nature of distributed systems and the immense numbers of resources to be managed in distributed and large-scale cluster environments, traditional centralized schedulers will not be extremely effective at providing timely scheduling information. In order to manage large numbers of resources quickly, less computationally intensive methods for scheduling tasks must be explored. This paper proposes a novel resource management system based on the immune system metaphor, making use of the concepts in Immune Network Theory and Danger Theory. By emulating various elements in the immune system, the proposed manager could efficiently execute tasks on very large systems of heterogeneous resources across geographic and/or administrative domains. The distributed nature of the immune system is also exploited in order to allow efficient scheduling of tasks, even in extremely large environments, without the use of a centralized or hierarchical scheduler.","PeriodicalId":162608,"journal":{"name":"2008 IEEE International Symposium on Parallel and Distributed Processing","volume":"320 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113966961","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Providing security to the Desktop Data Grid","authors":"Jesus Luna, Michail Flouris, M. Marazakis, A. Bilas","doi":"10.1109/IPDPS.2008.4536443","DOIUrl":"https://doi.org/10.1109/IPDPS.2008.4536443","url":null,"abstract":"Volunteer computing is becoming a new paradigm not only for the computational grid, but also for institutions using production-level data grids because of the enormous storage potential that may be achieved at a low cost by using commodity hardware within their own computing premises. However, this novel \"Desktop Data Grid\" depends on a set of widely distributed and untrusted storage nodes, therefore offering no guarantees of either availability or protection for the stored data. These security challenges must be carefully managed before fully deploying desktop data grids in sensitive environments (such as eHealth) to cope with a broad range of storage needs, including backup and caching. In this paper we propose a cryptographic protocol able to fulfil the storage security requirements related to a generic desktop data grid scenario, which were identified after applying an analysis framework extended from our previous research on the data grid's storage services. The proposed protocol uses three basic mechanisms to accomplish its goal: (a) symmetric cryptography and hashing, (b) an information dispersal algorithm and the novel (c) \"quality of security\" (QoSec) quantitative metric. Although the focus of this work is the associated protocol, we also present an early evaluation using an analytical model. Our results show a strong relationship between the assurance of the data at rest, the QoSec of the volunteer storage client and the number of fragments required to rebuild the original file.","PeriodicalId":162608,"journal":{"name":"2008 IEEE International Symposium on Parallel and Distributed Processing","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126426514","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}