{"title":"Predicting Loop Termination to Boost Speculative Thread-Level Parallelism in Embedded Applications","authors":"Md. Mafijul Islam","doi":"10.1109/SBAC-PAD.2007.23","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2007.23","url":null,"abstract":"The necessity of devising novel thread-level speculation (TLS) techniques has become extremely important with the growing acceptance of multi-core architectures by the industry. However, the achievable performance to commensurate the actual potential of TLS is limited by the thread-management overhead. In this paper, we have exploited the run-time behavior of the performance-critical loops to minimize such overhead to improve the performance using embedded applications. We have shown that an average speedup of 2.4 is achievable on a 4-way machine which supports TLS, but has no special mechanism to predict the loop trip count. Then we have augmented the machine with the perfect knowledge of the loop trip count and obtained an average speedup of 2.6. Finally, we have incorporated a simple stride predictor to predict the loop trip count dynamically. The proposed predictor has an average prediction accuracy of 96% and the machine then yields an average speedup of 2.5 for the chosen applications.","PeriodicalId":261956,"journal":{"name":"19th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD'07)","volume":"176 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121402694","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Node Level Primitives for Parallel Exact Inference","authors":"Yinglong Xia, V. Prasanna","doi":"10.1109/SBAC-PAD.2007.18","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2007.18","url":null,"abstract":"We present node level primitives for parallel exact inference on an arbitrary Bayesian network. We explore the probability representation on each node of Bayesian networks and each clique of junction trees. We study the operations with respect to these probability representations and categorize the operations into four node level primitives: table extension, table multiplication, table division, and table marginalization. Exact inference on Bayesian networks can be implemented based on these node level primitives. We develop parallel algorithms for the above and achieve parallel computational complexity of O(w^2 r^(w+1) N/p), O(N r^w) space complexity and scalability up to O(r^w), where N is the number of cliques in the junction tree, r is the number of states of a random variable, w is the maximal size of the cliques, and p is the number of processors. Experimental results illustrate the scalability of our parallel algorithms for each of these primitives.","PeriodicalId":261956,"journal":{"name":"19th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD'07)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115893868","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Intelligent Mechanism to Explore a Two-Level Cache Hierarchy Considering Energy Consumption and Time Performance","authors":"A. Silva-Filho, Carmelo J. A. Bastos Filho, Ricardo Massa Ferreira Lima, D. Falcão, F. Cordeiro, Marília P. Lima","doi":"10.1109/SBAC-PAD.2007.14","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2007.14","url":null,"abstract":"Cache memory hierarchy contributes positively to system performance. Moreover, tuning cache architectures in platforms for embedded applications can dramatically reduce energy consumption. This paper presents an automated method for adjusting two-level cache memory hierarchy intended for data caches in order to reduce energy consumption and improve the performance of embedded applications. We propose an automated mechanism called TEMGA (Two-level cache Exploration Mechanism based on Genetic Algorithm), to determine the suitable cache hierarchy configuration by exploring a small part of search space. In our experiments, we applied the proposed mechanism to 12 different benchmarks from the MiBench suite. The results show an average reduction of about 15% in the energy consumption for data caches when compared to existing heuristics and a reduction of 5 times in the number of cycles needed to execute applications from Mibench Benchmark suite.","PeriodicalId":261956,"journal":{"name":"19th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD'07)","volume":"107 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123247360","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic Constraint Partitioning to Speed Up CLP Execution","authors":"M. Pereira, P. Vargas, M. D. Castro, F. França, I. Dutra","doi":"10.1109/SBAC-PAD.2007.29","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2007.29","url":null,"abstract":"Speedup in distributed executions of constraint logic programming (CLP) applications is directly related to a good constraint partitioning algorithm. In this work we study different mechanisms to distribute constraints to processors based on straightforward mechanisms such as round-robin and block distribution, and on a more sophisticated automatic distribution method, grouping-sink, that takes into account the connectivity of the constraint network graph. This aims at reducing the communication overhead in distributed environments. Our results show that grouping-sink is, in general, the best alternative for partitioning constraints as it produces results as good or better than round-robin or blocks with low communication rate.","PeriodicalId":261956,"journal":{"name":"19th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD'07)","volume":"131 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114864566","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On-line Scheduling of MPI-2 Programs with Hierarchical Work Stealing","authors":"G. P. Pezzi, M. C. Cera, E. Mathias, N. Maillard, P. Navaux","doi":"10.1109/SBAC-PAD.2007.36","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2007.36","url":null,"abstract":"MPI (Message Passing Interface) is the de facto standard in High Performance Computing. By using some MPI-2 new features, such as the dynamic creation of processes, it is possible to implement highly efficient parallel programs that can run on dynamic and/or heterogeneous resources, provided a good schedule of the processes can be computed at run-time. A classical solution to schedule parallel programs on-line is Work Stealing. However, its use with MPI-2 is complicated by a restricted communication scheme between the processes: namely, spawned processes in MPI-2 can only communicate with their direct parents. This work presents an on-line scheduling algorithm, called Hierarchical Work Stealing, to obtain good load-balancing of MPI-2 programs that follow a Divide & Conquer strategy. Experimental results are provided, based on a synthetic application, the N-Queens computation. The results show that the Hierarchical Work Stealing algorithm enables the use of MPI with high efficiency, even in parallel dynamic HPC platforms that are not as homogeneous as clusters.","PeriodicalId":261956,"journal":{"name":"19th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD'07)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114335776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Self-Imposed Temporal Redundancy: An Efficient Technique to Enhance the Reliability of Pipelined Functional Units","authors":"E. Mizan, Tileli Amimeur, M. Jacome","doi":"10.1109/SBAC-PAD.2007.39","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2007.39","url":null,"abstract":"Temporal redundancy (TR) improves the reliability of computational functional units (FUs). However, it can guarantee detection of transient errors only, and may have a substantial power and area overhead. In this paper we present self-imposed temporal redundancy (SITR), a form of TR that can be applied to pipelined FUs and does not suffer from the aforementioned problems. A SITR-enhanced FU forces redundant computations to fire in consecutive cycles and requires a single additional cycle for the second computation and the comparison of the two results. We evaluate the power and area overhead of SITR and conclude that it is always smaller than that of standard TR and that it does not depend on the FU complexity. We also use SITR to improve the reliability of the execution datapath of a simple out-of-order engine, typical of that used in high reliability embedded systems and future many-core architectures. Our simulations show that SITR outperforms TR, especially in FP applications. When the number of integer ALUs is larger than the machine width, the performance penalty of SITR is consistently less than 10%.","PeriodicalId":261956,"journal":{"name":"19th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD'07)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125155466","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exigency-based real-time scheduling policy to provide absolute QoS for web services","authors":"Lucas S. Casagrande, Rodrigo Fernandes de Mello, Ricardo Bertagna, J. A. A. Filho, Francisco José Monaco","doi":"10.1109/SBAC-PAD.2007.21","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2007.21","url":null,"abstract":"Telemedicine, distance learning and e-commerce applications impose time constraints directly related to the efficacy of their operations. In order to offer reliability levels capable of meeting such requirements, mechanisms to provide QoS have been widely employed, what motivates this work to propose, implement and validate a real-time scheduling policy for providing absolute QoS for web services. The policy, named Exigency-Based Scheduling (EBS), intends to fast serve the most urgent requests, without degrading the whole system service. The current approach is based on the real-time scheduling, low latency and feedback scheduling, allowing a balanced configuration by the quantification of the exigency imposed to the system by the service classes. The technique evaluation uses metrics proposed in the present work. Experimental results confirm improvements in terms of QoS and client satisfaction.","PeriodicalId":261956,"journal":{"name":"19th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD'07)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126957538","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Component-Oriented Support for Hierarchical MPI Programming on Multi-Cluster Grid Environments","authors":"E. Mathias, F. Baude, Vincent Cavé, N. Maillard","doi":"10.1109/SBAC-PAD.2007.37","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2007.37","url":null,"abstract":"In this paper, we present a proposal for hierarchical MPI programming through some intuitive extensions to the MPI standard that may help users to develop non-embarrassingly parallel grid applications in a topology-aware manner. Afterwards, we present the design of such a support based upon a component model suited to grid computing (the EU CoreGRID grid component model - GCM - and its implementation in the ProActive grid environment) to handle inter-cluster and group communications. The usage of such components to handle high-level data distribution, parallelism and synchronization seems to be the most adequate technology to support MPI primitives in multi-cluster grids as they provide a built-in support to the encapsulation of native code, collective interfaces, tunneling of communications and a hierarchical and adaptable structure. The preliminary results have shown that the overhead is not negligible, but within the expected range. However we can expect the benefits to applications to bypass the generated overhead.","PeriodicalId":261956,"journal":{"name":"19th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD'07)","volume":"36 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131550211","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Novel Algorithm for Indirect Reputation-Based Grid Resource Management","authors":"Javier Echaiz, Jorge Ardenghi, Guillermo R. Simari","doi":"10.1109/SBAC-PAD.2007.24","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2007.24","url":null,"abstract":"A computational grid is a distributed infrastructure that appears to the end user as one large computing resource across organization boundaries. Grid technologies enable large-scale sharing of resources within formal or informal consortia of individuals and/or institutions, usually called virtual organizations. In these settings, the discovery, characterization, management, and monitoring of resources, services, and computations can be challenging due to the considerable diversity, large numbers, dynamic behavior, and geographical distribution of the entities in which a user might be interested. Trust is one of the biggest concerns in the grid resource management field. Grid systems can employ reputation mechanisms in order to provide this essential trust, but not usually without incurring certain additional costs that negate the potential performance gains offered by grid computing technologies. Moreover, current reputation mechanisms are not appropriate for resource management in large-scale systems. In this paper, we present a new reputation model for resource management based on an economy model. Also we demonstrate how it can be employed to add trust into algorithms for grid scheduling. Finally, we simulate the proposed resource management algorithm in order to verify its effectiveness.","PeriodicalId":261956,"journal":{"name":"19th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD'07)","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126019641","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}