{"title":"Supporting the hard real-time requirements of mechatronic systems by 2-level interrupt service management","authors":"Christian Siemers, R. Falsett, R. Seyer, K. Ecker","doi":"10.1109/IPDPS.2003.1213236","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213236","url":null,"abstract":"Mechatronic systems often require hard real-time behaviour of the controlling system. The standard solution for this kind of application is based on the time-triggered approach, and for certain circumstances the schedulability is provable. In contrast to this, the approach in this paper introduces some hardware enhancements that allow first to substitute the time-triggered system by an event-triggered system and second to enhance the event-triggered system by a two-level reaction system while conserving the hard real-time capabilities. This results in a hard-time-but-weak-logic reaction system when computing time is tight but maintains full processing capabilities and therefore exact reaction values for all reactions whenever possible. Combining two or more events will improve the theoretical schedulability of the system too.","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125331124","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A new DMA registration strategy for pinning-based high performance networks","authors":"Christian Bell, D. Bonachea","doi":"10.1109/IPDPS.2003.1213363","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213363","url":null,"abstract":"This paper proposes a new memory registration strategy for supporting Remote DMA (RDMA) operations over pinning-based networks, as existing approaches are insufficient for efficiently implementing Global Address Space (GAS) languages. Although existing approaches often maximize bandwidth, they require levels of synchronization that discourage one-sided communication, and can have significant latency costs for small messages. The proposed Firehose algorithm attempts to expose one-sided, zero-copy communication as a common case, while minimizing the number of host-level synchronizations required to support remote memory operations. The basic idea is to reap the performance benefits of a pin-everything approach in the common case (without the drawbacks) and revert to a rendezvous-based approach to handle the uncommon case. In all cases, the algorithm attempts to amortize the cost of synchronization and pinning over multiple remote memory operations, improving performance over rendezvous by avoiding many handshaking messages and the cost of re-pinning recently used pages. 
Performance results are presented which demonstrate that the cost of two-sided handshaking and memory registration is negligible when the set of remotely referenced memory pages on a given node is smaller than the physical memory (where the entire working set can remain pinned), and for applications with larger working sets the performance degrades gracefully and consistently outperforms conventional approaches.","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"263 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116037566","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Approximation in non-product form multiple queue systems","authors":"N. Thomas","doi":"10.1109/IPDPS.2003.1213507","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213507","url":null,"abstract":"In this paper a class of finite length Markovian queueing models is studied that, in general, does not exhibit a product form solution. Good approximations can be derived for the marginal queue size distributions in this case, and hence measures such as the average response time can be calculated. However, because no product form exists, expressions for the joint queue size distribution are much more costly to derive, hence many performance measures of interest cannot be easily computed. An approximation for the joint queue size distributions is explored here, which improves on a naive product form assumption by considering various boundary cases. This approximation is explored numerically by example.","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116048824","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance monitoring and evaluation of a UPC implementation on a NUMA architecture","authors":"François Cantonnet, Yiyi Yao, Smita Annareddy, A. Mohamed, T. El-Ghazawi","doi":"10.1109/IPDPS.2003.1213492","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213492","url":null,"abstract":"UPC is an explicit parallel extension of ANSI C, which has been gaining attention from vendors and users. In this paper, we consider the low-level monitoring and experimental performance evaluation of a new implementation of the UPC compiler on the SGI Origin family of NUMA architectures. These systems offer many opportunities for the high-performance implementation of UPC. They also offer, due to their many hardware monitoring counters, the opportunity for low-level performance measurements to guide compiler implementations. Early UPC compilers have the challenge of meeting the syntax and semantics requirements of the language. As a result, such compilers tend to focus on correctness rather than on performance. In this paper, we report on the performance of selected applications and kernels under this new compiler. The measurements were designed to help shed some light on the next steps that should be taken by UPC compiler developers to harness the full performance and usability potential of UPC under these architectures.","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122659019","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The unlinkability of randomization-enhanced Chaum's blind signature scheme","authors":"Zichen Li, Junmei Zhang, W. Kou","doi":"10.1109/IPDPS.2003.1213443","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213443","url":null,"abstract":"The key issue in e-commerce security is digital signature. Chaum first proposed the concept of blind digital signature, and designed untraceable payments. To avoid threats from chosen-message attacks presented by Coron et al. (1999), Fan et al. (2000) proposed a randomization enhanced Chaum blind signature scheme, by injecting a random factor into messages. In this paper, we first formally define the unlinkability of the blind signature scheme. According to this definition, we prove that Fan's scheme does not possess the unlinkability property: after the message and signature have been revealed to the public by the sender, the signer can trace the corresponding blinded message and signature by constructing a linkage between the message and the blind message. Therefore, Fan's scheme cannot provide true blind signatures.","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121993701","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"New dynamic heuristics in the client-agent-server model","authors":"Y. Caniou, E. Jeannot","doi":"10.1109/IPDPS.2003.1213200","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213200","url":null,"abstract":"MCT is a widely used heuristic for scheduling tasks onto Grid platforms. However, when dealing with many tasks, MCT tends to dramatically delay already mapped task completion time, while scheduling a new task. In this paper we propose heuristics based on two features: the historical trace manager that simulates the environment and the perturbation that defines the impact a new allocated task has on already mapped tasks. Our simulations and experiments on a real environment show that the proposed heuristics outperform MCT.","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122099496","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MUSE: a software oscilloscope for clusters and grids","authors":"M. Gardner, M. Broxton, Adam Engelhart, Wu-chun Feng","doi":"10.1109/IPDPS.2003.1213096","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213096","url":null,"abstract":"Oscilloscopes and their cousins, logic analyzers, are the tools of choice for difficult electronic hardware problems. In the hands of a skilled engineer or technician, these tools can be used to solve stubborn problems. The key to the utility of oscilloscopes is the depth of detail they provide and their flexibility, which allows the level of detail to be adjusted to fit the task at hand. Distributed applications, which run on computing clusters and computational grids, are also complex and difficult to tame. We need tools to understand their complexities and the ability to choose the level of detail to fit the task, whether the task be debugging, tuning, monitoring or controlling. The MAGNET User-Space Environment (MUSE) has been designed as a \"software oscilloscope\" for computing clusters and computational grids. It is a toolkit for applications and developers to obtain detailed information about the environment on the host. The information can be used on-line or saved for off-line analysis. It has low overhead and allows the level of detail to be adjusted. Furthermore, MUSE monitors without requiring the modification or relinking of applications. 
It has been designed to make it easy to develop \"adaptive applications\" - applications that are aware of their environment and can adapt to changes.","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128328729","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improved methods for divisible load distribution on k-dimensional meshes using pipelined communications","authors":"Keqin Li","doi":"10.1109/IPDPS.2003.1213185","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213185","url":null,"abstract":"We give the closed form solutions to the parallel time and speedup of the classic method for processing divisible loads on linear arrays as functions of N, the network size. We propose two methods which employ pipelined communications to distribute divisible loads on linear arrays. We derive the closed form solutions to the parallel time and speedup for both methods and show that the asymptotic speedup of both methods is β+1, where β is the ratio of the time for computing a unit load to the time for communicating a unit load. Such performance is even better than that of the known methods on k-dimensional meshes with k>1. The two new algorithms which use pipelined communications are generalized to distribute divisible loads on k-dimensional meshes, and we show that the asymptotic speedup of both algorithms is kβ+1, where k≥1. We also prove that on k-dimensional meshes where k≥1, as the network size becomes large, the asymptotic speedup of 2kβ+1 can be achieved for processing divisible loads by using interior initial processors.","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128540244","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Energy-aware compilation and execution in Java-enabled mobile devices","authors":"Guilin Chen, B. Kang, M. Kandemir, N. Vijaykrishnan, M. J. Irwin, R. Chandramouli","doi":"10.1109/IPDPS.2003.1213116","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213116","url":null,"abstract":"Java-enabled wireless devices are preferred for various reasons such as enhanced user experience and the support for dynamically downloading applications on demand. The dynamic download capability supports extensibility of the mobile client features and centralizes application maintenance at the server. Also, it enables service providers to customize features for the clients. In this work, we extend this client-server collaboration further by offloading some of the computations (i.e., method execution and dynamic compilation) normally performed by the mobile client to the resource-rich server in order to conserve energy consumed by the client in a wireless Java environment. In the proposed framework, the object serialization feature of Java is used to allow offloading of both method execution and bytecode-to-native code compilation to the server when executing a Java application. 
Our framework takes into account communication, computation and compilation energies to dynamically decide where to compile and execute a method (locally or remotely) and how to execute it (using interpretation or just-in-time compilation with different levels of optimizations).","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129002823","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Continuous compilation: a new approach to aggressive and adaptive code transformation","authors":"B. Childers, J. Davidson, M. Soffa","doi":"10.1109/IPDPS.2003.1213375","DOIUrl":"https://doi.org/10.1109/IPDPS.2003.1213375","url":null,"abstract":"Over the past several decades, the compiler research community has developed a number of sophisticated and powerful algorithms for a variety of code improvements. While there are still promising directions for particular optimizations, research on new or improved optimizations is reaching the point of diminishing returns and new approaches are needed to achieve significant performance improvements beyond traditional optimizations. In this paper, we describe a new strategy based on a continuous compilation system that constantly improves application code by applying aggressive and adaptive code optimizations at all times, from static optimization to online dynamic optimization. In this paper, we describe our general approach and process for continuous compilation of application code. We also present initial results from our research with continuous compilation. These initial results include a new prediction framework that can estimate the benefit of applying code transformations without actually doing the transformation. 
We also describe results that demonstrate the benefit of adaptively changing application code for embedded systems to make trade-offs between code size, performance, and power consumption.","PeriodicalId":177848,"journal":{"name":"Proceedings International Parallel and Distributed Processing Symposium","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129132750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}