Guangyu Chen, N. Vijaykrishnan, M. Kandemir, M. J. Irwin, M. Wolczko
{"title":"Tracking object life cycle for leakage energy optimization","authors":"Guangyu Chen, N. Vijaykrishnan, M. Kandemir, M. J. Irwin, M. Wolczko","doi":"10.1145/944645.944701","DOIUrl":"https://doi.org/10.1145/944645.944701","url":null,"abstract":"The focus of this work is on utilizing the state of objects during their lifespan in optimizing the leakage energy consumed in the data caches when executing embedded Java applications. Our analysis reveals that a major portion of the leakage energy is actually wasted in retaining the objects beyond their last use. In order to eliminate this wastage, we investigate three approaches that use the garbage collector, escape analysis and last use analysis for reducing leakage energy. Finally, we track the access gap between successive object accesses to reduce leakage energy of live objects. A combination of these schemes is shown to provide 21% data cache leakage energy reduction in our default configuration.","PeriodicalId":174422,"journal":{"name":"First IEEE/ACM/IFIP International Conference on Hardware/ Software Codesign and Systems Synthesis (IEEE Cat. No.03TH8721)","volume":"2017 32","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120847166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Transaction level modeling: an overview","authors":"Lukai Cai, D. Gajski","doi":"10.1145/944645.944651","DOIUrl":"https://doi.org/10.1145/944645.944651","url":null,"abstract":"Recently, the transaction-level modeling has been widely referred to in system-level design community. However, the transaction-level models (TLMs) are not well defined and the usage of TLMs in the existing design domains, namely modeling, validation, refinement, exploration, and synthesis, is not well coordinated. This paper introduces a TLM taxonomy and compares the benefits of TLMs' use.","PeriodicalId":174422,"journal":{"name":"First IEEE/ACM/IFIP International Conference on Hardware/ Software Codesign and Systems Synthesis (IEEE Cat. No.03TH8721)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128377428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
J. Cong, Yiping Fan, Guoling Han, Xun Yang, Zhiru Zhang
{"title":"Architecture and synthesis for multi-cycle on-chip communication","authors":"J. Cong, Yiping Fan, Guoling Han, Xun Yang, Zhiru Zhang","doi":"10.1145/944645.944667","DOIUrl":"https://doi.org/10.1145/944645.944667","url":null,"abstract":"There are two important infection points in the development of deep submicron (DSM) process technologies. The first point is when the average interconnect delay exceeds the gate delay, which happened during mid 1990s and led to the so-called timing closure problem. The second point is when single-cycle full chip synchronization is no longer possible, which is about to happen soon. It can be shown that, even with the aggressive interconnect optimization techniques (e.g., buffer insertion and wire-sizing), 5 clock cycles are still needed to go from corner-to-corner for the die of 28.3 mm /spl times/ 28.3 mm in the 0.07 /spl mu/m technology generation, assuming a 5.63 GHz clock by 2006 predicted in ITRS'01 (2001). This clearly suggests that multi-cycle on-chip communication is a necessity in multi-gigahertz synchronous designs. However, it is not supported in the current design tools and methodologies, as most of these implicitly assume that full chip synchronization in a single clock cycle is feasible. Our contributions are as follows: (i) we propose a regular distributed register (RDR) microarchitecture which offers high regularity and direct support of multi-cycle communication; (ii) we develop a set of novel architectural synthesis algorithms to efficiently synthesize behavior-level designs onto the RDR architecture.","PeriodicalId":174422,"journal":{"name":"First IEEE/ACM/IFIP International Conference on Hardware/ Software Codesign and Systems Synthesis (IEEE Cat. No.03TH8721)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126780788","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pareto-optimization-based run-time task scheduling for embedded systems","authors":"Peng Yang, F. Catthoor","doi":"10.1145/944645.944680","DOIUrl":"https://doi.org/10.1145/944645.944680","url":null,"abstract":"Pareto-set-based optimization can be found in several different areas of embedded system design. One example is task scheduling, where different task mapping and ordering choices for a target platform will lead to different performance/cost tradeoffs. To explore this design space at runtime, a fast and effective heuristic is needed. We have modeled the problem as the well known Multiple Choice Knapsack Problem (MCKP) and have developed a fast greedy heuristic for the run-time task scheduling. To show the effectiveness of our algorithm, examples from randomly generated task graphs and realistic applications are studied. Compared to the optimal dynamic programming solver, the heuristic is more than ten times faster while the result is less than 5% away from the optimum. Moreover, due to its iterative feature, the algorithm is well suitable to be used as an online algorithm.","PeriodicalId":174422,"journal":{"name":"First IEEE/ACM/IFIP International Conference on Hardware/ Software Codesign and Systems Synthesis (IEEE Cat. No.03TH8721)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127095337","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An efficient retargetable framework for instruction-set simulation","authors":"Mehrdad Reshadi, N. Bansal, P. Mishra, N. Dutt","doi":"10.1145/944645.944649","DOIUrl":"https://doi.org/10.1145/944645.944649","url":null,"abstract":"Instruction-set structure (ISA) simulators are an integral part of today's processor and software design process. While increasing complexity of the architectures demands high performance simulation, the increasing variety of available architectures makes retargetability a critical feature of an instruction-set simulator. Retargetability requires generic models while high performance demands target specific customizations. To address these contradictory requirements, we have developed a generic instruction model and a generic decode algorithm that facilitates easy and efficient retargetability of the ISA-simulator for a wide range of processor architectures such as RISC, CISC, VLIW and variable length instruction set processors. The instruction model is used to generate compact and easy to debug instruction descriptions that are very similar to that of architecture manual. These descriptions are used to generate high performance simulators. The generation of the simulator is completely separate from the simulation engine. Hence, we can incorporate any fast simulation technique in our retargetable framework without losing performance. We illustrate the retargetability of our approach using two popular, yet different realistic architectures: the Sparc and the ARM.","PeriodicalId":174422,"journal":{"name":"First IEEE/ACM/IFIP International Conference on Hardware/ Software Codesign and Systems Synthesis (IEEE Cat. No.03TH8721)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121221402","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accurate estimation of cache-related preemption delay","authors":"H. S. Negi, T. Mitra, Abhik Roychoudhury","doi":"10.1145/944645.944698","DOIUrl":"https://doi.org/10.1145/944645.944698","url":null,"abstract":"Multitasked real-time systems often employ caches to boost performance. However the unpredictable dynamic behavior of caches makes schedulability analysis of such systems difficult. In particular, the effect of caches needs to be considered for estimating the inter-task interference. As the memory blocks of different tasks can map to the same cache blocks, preemption of a task may introduce additional cache misses. The time penalty introduced by these misses is called the cache-related preemption delay (CRPD). In this paper, we provide a program path analysis technique to estimate CRPD. Our technique performs path analysis of both the preempted and the preempting tasks. Furthermore, we improve the accuracy of the analysis by estimating the possible states of the entire cache at each possible preemption point rather than estimating the states of each cache block independently. To avoid incurring high space requirements, the cache states can be maintained symbolically as a binary decision diagram. Experimental results indicate that we obtain tight CRPD estimates for realistic benchmarks.","PeriodicalId":174422,"journal":{"name":"First IEEE/ACM/IFIP International Conference on Hardware/ Software Codesign and Systems Synthesis (IEEE Cat. No.03TH8721)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128480981","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hardware support for real-time operating systems","authors":"P. Kohout, B. Ganesh, B. Jacob","doi":"10.1145/944645.944656","DOIUrl":"https://doi.org/10.1145/944645.944656","url":null,"abstract":"The growing complexity of embedded applications and pressure on time-to-market has resulted in the increasing use of embedded real-time operating systems. Unfortunately, RTOSes can introduce a significant performance degradation. The paper presents the Real-Time Task Manager (RTM) - a processor extension that minimizes the performance drawbacks associated with RTOSes. The RTM accomplishes this by supporting, in hardware, a few of the common RTOS operations that are performance bottlenecks: task scheduling, time management, and event management. By exploiting the inherent parallelism of these operations, the RTM completes them in constant time, thereby significantly reducing RTOS overhead. It decreases both the processor time used by the RTOS and the maximum response time by an order of magnitude.","PeriodicalId":174422,"journal":{"name":"First IEEE/ACM/IFIP International Conference on Hardware/ Software Codesign and Systems Synthesis (IEEE Cat. No.03TH8721)","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132795773","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Early estimation of the size of VHDL projects","authors":"W. Fornaciari, F. Salice, D. Scarpazza","doi":"10.1145/944645.944699","DOIUrl":"https://doi.org/10.1145/944645.944699","url":null,"abstract":"The analysis of the amount of human resources required to complete a project is felt as a critical issue in any company of the electronics industry. In particular, early estimation of the effort involved in a development process is a key requirement for any cost-driven system-level design decision. In this paper, we present a methodology to predict the final size of a VHDL project on the basis of a high-level description, obtaining a significant indication about the development effort. The methodology is the composition of a number of specialized models, tailored to estimate the size of specific component types. Models were trained and tested on two disjoint and large sets of real VHDL projects. Quality-of-result indicators show that the methodology is both accurate and robust.","PeriodicalId":174422,"journal":{"name":"First IEEE/ACM/IFIP International Conference on Hardware/ Software Codesign and Systems Synthesis (IEEE Cat. No.03TH8721)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131369432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extending the SystemC synthesis subset by object-oriented features","authors":"E. Grimpe, F. Oppenheimer","doi":"10.1145/944645.944652","DOIUrl":"https://doi.org/10.1145/944645.944652","url":null,"abstract":"In this article we present an approach to object-oriented hardware design and synthesis based on SystemC. We give an introduction to an extended SystemC synthesis subset which we propose, and, in particular, its object-oriented features. We will also briefly outline our basic synthesis concepts for object-oriented hardware specifications. Finally, we present some examples for the application of the extended synthesis subset, which are directly processable by a first synthesis tool prototype which we have developed for this purpose.","PeriodicalId":174422,"journal":{"name":"First IEEE/ACM/IFIP International Conference on Hardware/ Software Codesign and Systems Synthesis (IEEE Cat. No.03TH8721)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133166942","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design optimization of mixed time/event-triggered distributed embedded systems","authors":"T. Pop, P. Eles, Zebo Peng","doi":"10.1145/944645.944672","DOIUrl":"https://doi.org/10.1145/944645.944672","url":null,"abstract":"Distributed embedded systems implemented with mixed, event-triggered task sets, which communicate over bus protocols consisting of both static and dynamic phases, are emerging as the new standard in application areas such as automotive electronics. In a previous paper, we developed a holistic timing analysis and scheduling approach for this category of systems. Based on this result, new design problems are solved, which we identified as characteristic for such hybrid systems: partitioning of the system functionality into time-triggered and event-triggered domains and the optimization of parameters corresponding to the communication protocol. We addressed both problems in the context of a heuristic which performs mapping and scheduling of the system functionality. We demonstrate the efficiency of the proposed technique with extensive experiments.","PeriodicalId":174422,"journal":{"name":"First IEEE/ACM/IFIP International Conference on Hardware/ Software Codesign and Systems Synthesis (IEEE Cat. No.03TH8721)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129261263","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}