{"title":"An approach for pre-runtime scheduling in embedded hard real-time systems with power constraints","authors":"E. Tavares, R. Barreto, Meuse N. Oliveira, P. Maciel, Marília Neves, R. Lima","doi":"10.1109/CAHPC.2004.7","DOIUrl":"https://doi.org/10.1109/CAHPC.2004.7","url":null,"abstract":"Embedded hard real-time systems have stringent timing constraints that must be satisfied for the correct functioning of the system. Hence, all tasks must be finished before their deadlines. In addition, there are systems where energy is another constraint that must also be satisfied. In this paper, a pre-runtime scheduling algorithm is presented in order to find schedules satisfying both timing and energy constraints. The proposed approach uses state space exploration for finding pre-runtime schedules. However, the main problem with such methods is the size of the state space, which can grow exponentially. This paper tackles this problem through a depth-first search method for generating a partial timed labeled transition system derived from the time Petri net model.","PeriodicalId":375288,"journal":{"name":"16th Symposium on Computer Architecture and High Performance Computing","volume":"5 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120986499","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Combining a shared-memory high performance computer and a heterogeneous cluster for the simulation of light interaction with human skin","authors":"A. Krishnaswamy, G. Baranoski","doi":"10.1109/CAHPC.2004.13","DOIUrl":"https://doi.org/10.1109/CAHPC.2004.13","url":null,"abstract":"When light interacts with human skin, a complex and involved process begins as the light is absorbed and propagated by cells, fibers and other microscopic materials. This interaction happens countless times each day and its accurate simulation is essential to biomedical and computer graphics applications. Simulating this interaction is computationally intensive, yet highly suitable to parallelization. This paper describes the use of both a shared-memory high performance computer and a heterogeneous cluster to accelerate these simulations. With a description of the parallel software used, we present results to show the performance gains from using such a hybrid approach.","PeriodicalId":375288,"journal":{"name":"16th Symposium on Computer Architecture and High Performance Computing","volume":"180 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134408888","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimizations for compiled simulation using instruction type information","authors":"Marcus Bartholomeu, R. Azevedo, S. Rigo, G. Araújo","doi":"10.1109/CAHPC.2004.28","DOIUrl":"https://doi.org/10.1109/CAHPC.2004.28","url":null,"abstract":"The design of new architectures can be simplified with the use of retargetable instruction set simulation tools, which can validate the design decisions in the design exploration cycle with high flexibility and reduced cost. The growing system complexity makes the traditional approach inefficient for today's architectures. Compiled simulation techniques make use of a priori knowledge to accelerate the simulation, with the highest efficiency achieved by employing static scheduling techniques. This paper presents our approach to the static scheduling compiled simulation technique that is 90% faster than the best published performance results. It also introduces two novel optimization techniques based on instruction type information that further increase the simulation speed by more than 100%. The applicability of the so-called fast static compiled simulation (FSCS) technique is demonstrated using the SPARC and MIPS architectures.","PeriodicalId":375288,"journal":{"name":"16th Symposium on Computer Architecture and High Performance Computing","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116139607","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallel adaptive mesh coarsening for seismic tomography","authors":"M. Grunberg, S. Genaud, C. Mongenet","doi":"10.1109/CAHPC.2004.29","DOIUrl":"https://doi.org/10.1109/CAHPC.2004.29","url":null,"abstract":"Seismic tomography enables modeling of the internal structure of the Earth. In order to improve the precision of existing models, a huge amount of acquired seismic data must be analyzed. The analysis of such massive data requires considerable computing power, which can only be delivered by parallel computing equipment. Yet, parallel computation is not sufficient for the task: we also need algorithms to automatically concentrate the computations on the most relevant data parts. The objective of the paper is to present such an algorithm. From an initial regular mesh in which cells carry data with varying relevance, we present a method to aggregate elementary cells so as to homogenize the relevance of data. The result is an irregular mesh, which has the advantage over the initial mesh of having orders of magnitude fewer cells while preserving the geophysical meaning of data. We present both a sequential and a parallel algorithm to solve this problem under the hypotheses and constraints inherited from the geophysical context.","PeriodicalId":375288,"journal":{"name":"16th Symposium on Computer Architecture and High Performance Computing","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120962743","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Self-monitored adaptive cache warm-up for microprocessor simulation","authors":"Yue Luo, L. John, L. Eeckhout","doi":"10.1109/CAHPC.2004.38","DOIUrl":"https://doi.org/10.1109/CAHPC.2004.38","url":null,"abstract":"Simulation is the most important tool for computer architects to evaluate the performance of new computer designs. However, detailed simulation is extremely time consuming. Sampling is one of the techniques that effectively reduce simulation time. In order to achieve accurate sampling results, microarchitectural structures must be adequately warmed up before each measurement. In this paper, a new technique for warming up microprocessor caches is proposed. The simulator monitors the warm-up process of the caches and decides when the caches are warmed up based on simple heuristics. In our experiments the self-monitored adaptive (SMA) warm-up technique on average exhibits only 0.2% warm-up error in CPI. SMA achieves smaller warm-up error with only 1/2-1/3 of the warm-up length of previous methods. In addition, it is adaptive to the cache configuration simulated. For simulating small caches, the SMA technique can reduce the warm-up overhead by an order of magnitude compared to previous techniques. Finally, SMA gives the user an indicator of the warm-up error at the end of the cycle-accurate simulation, which helps the user gauge the accuracy of the warm-up.","PeriodicalId":375288,"journal":{"name":"16th Symposium on Computer Architecture and High Performance Computing","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116663375","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A study of errant pipeline flushes caused by value misspeculation","authors":"D. Balkan, J. Kalamatianos, D. Kaeli","doi":"10.1109/CAHPC.2004.6","DOIUrl":"https://doi.org/10.1109/CAHPC.2004.6","url":null,"abstract":"Value speculation has been proposed as a technique that can overcome true data dependencies, hide memory latencies, and expose higher degrees of instruction level parallelism (ILP). Branch direction prediction and target address prediction are two widely used control speculation techniques aimed at providing a steady stream of instructions to the instruction window. In this paper we consider a load value predictor used together with an aggressive branch predictor microarchitecture and investigate the effects of load value misspeculations on branch resolution. We study the performance impact of the interaction of these mechanisms and characterize the occurrence of these events in a multiple issue, out-of-order, superscalar pipeline. We perform execution-driven studies using integer benchmarks taken from the SPECint2000, SPECint95 and Olden suites. We show that IPC can deteriorate by as much as 4.7% due to unnecessary pipeline flushes caused by branch resolutions that use speculative data. This paper also proposes a mechanism that can prevent these unnecessary squashes from occurring.","PeriodicalId":375288,"journal":{"name":"16th Symposium on Computer Architecture and High Performance Computing","volume":"207 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114665326","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FlowCert : probabilistic certification for peer-to-peer computations","authors":"S. Varrette, Jean-Louis Roch, Franck Leprévost","doi":"10.1109/CAHPC.2004.17","DOIUrl":"https://doi.org/10.1109/CAHPC.2004.17","url":null,"abstract":"Large-scale clusters, peer-to-peer computing systems and grid computing systems gather thousands of nodes for computing parallel applications. At this scale, the problem arises of checking the result of the parallel execution of a program on an unsecured grid. This domain is the subject of numerous works, either at the hardware or at the software level. We propose here an original software method based on the dynamic computation of the data-flow associated with a partial execution of the program on a secure machine. This data-flow is a summary of the execution: any complete execution of the program on an unsecured remote machine with the same inputs supplies a flow whose summary must correspond to the one obtained by partial execution.","PeriodicalId":375288,"journal":{"name":"16th Symposium on Computer Architecture and High Performance Computing","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123888074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scheduling in Bag-of-Task grids: the PAUA case","authors":"W. Cirne, F. Brasileiro, L. Costa, D. D. Silva, E. Santos-Neto, N. Andrade, C. Rose, T. Ferreto, M. Mowbray, R. Scheer, João Jornada","doi":"10.1109/CAHPC.2004.37","DOIUrl":"https://doi.org/10.1109/CAHPC.2004.37","url":null,"abstract":"In this paper we discuss the difficulties involved in the scheduling of applications on computational grids. We highlight two main sources of difficulties: 1) the size of the grid rules out the possibility of using a centralized scheduler; 2) since resources are managed by different parties, the scheduler must consider several different policies. Thus, we argue that scheduling applications on a grid requires the orchestration of several schedulers, with possibly conflicting goals. We discuss how we have addressed this issue in the context of PAUA, a grid for Bag-of-Tasks applications (i.e. parallel applications whose tasks are independent) that we are currently deploying throughout Brazil.","PeriodicalId":375288,"journal":{"name":"16th Symposium on Computer Architecture and High Performance Computing","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126893096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance characterisation of intra-cluster collective communications","authors":"L. Steffenel, G. Mounié","doi":"10.1109/CAHPC.2004.32","DOIUrl":"https://doi.org/10.1109/CAHPC.2004.32","url":null,"abstract":"Although recent works try to improve collective communication in grid systems by separating intra and inter-cluster communication, the optimisation of communications focuses only on inter-cluster communications. We believe, instead, that the overall performance of the application may be improved if intra-cluster collective communications performance is known in advance. Hence, it is important to have an accurate model of the intra-cluster collective communications, which provides the necessary evidence to tune and to predict their performance correctly. In this paper we present our experience on modelling such communication strategies. We describe and compare different implementation strategies with their communication models, evaluating the models' accuracy and describing the practical challenges that can be found when modelling collective communications.","PeriodicalId":375288,"journal":{"name":"16th Symposium on Computer Architecture and High Performance Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129838714","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}