Xiang Yun, Thidapat Chantem, R. Dick, X. Hu, L. Shang
{"title":"System-level reliability modeling for MPSoCs","authors":"Xiang Yun, Thidapat Chantem, R. Dick, X. Hu, L. Shang","doi":"10.1145/1878961.1879013","DOIUrl":"https://doi.org/10.1145/1878961.1879013","url":null,"abstract":"The reliability of multi-processor systems-on-chip (MPSoCs) is affected by several inter-dependent system-level and physical effects. Accurate and fast reliability modeling is a primary challenge in the design and optimization of reliable MPSoCs. This paper presents a reliability modeling framework that integrates device-, component-, and system-level models. This framework contains modules for electromigration, time-dependent dielectric breakdown, stress migration, and variable-amplitude thermal cycling. A new statistical reliability distribution is proposed for accurate characterization of components containing too few devices for an extreme value distribution to be appropriate. A hierarchical system-level survival lattice based Monte Carlo technique is used to estimate the temporal fault distributions of MPSoCs that use arbitrary static and dynamic reliability-enhancing redundancy schemes. Physical process variation, which may have a significant impact on MPSoC reliability, is considered in the model. The proposed modeling technique has 5% average error in mean time to failure and reduces simulation time by nearly 3 orders of magnitude relative to a non-hierarchical Monte Carlo technique.","PeriodicalId":118816,"journal":{"name":"2010 IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129056048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Workload characterization and its impact on multicore platform design","authors":"P. Bogdan, R. Marculescu","doi":"10.1145/1878961.1879003","DOIUrl":"https://doi.org/10.1145/1878961.1879003","url":null,"abstract":"Networks-on-chip (NoCs) have been proposed as a scalable solution to solving the communication problem in multicore systems. Although the queuing-based approaches have been traditionally used for performance analysis purposes, they cannot properly account for many of the traffic characteristics (e.g., non-stationary, self-similarity, higher order statistics) that are crucial for multicore platform design when communication happens via the NoC approach. To overcome this limitation, we propose a mean field approach to analyze the traffic dynamics in multicore systems and show how the non-stationary effects of the NoC workload can be effectively captured; this is of fundamental significance for rethinking the very basis of multicore systems design. Moreover, our experimental results demonstrate that both network architecture and application characteristics are the main sources of power law behavior observed in network traffic. Our findings open new research directions into NoC optimization which require accurate models of time- and space-dependent traffic behavior.","PeriodicalId":118816,"journal":{"name":"2010 IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127749307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
X. Hu, R. Murphy, S. Dosanjh, K. Olukotun, S. Poole
{"title":"Hardware/software co-design for high performance computing: Challenges and opportunities","authors":"X. Hu, R. Murphy, S. Dosanjh, K. Olukotun, S. Poole","doi":"10.1145/1878961.1878975","DOIUrl":"https://doi.org/10.1145/1878961.1878975","url":null,"abstract":"This special session aims to introduce to the hardware/software codesign community challenges and opportunities in designing high performance computing (HPC) systems. Though embedded system design and HPC system design have traditionally been considered as two separate areas of research, they in fact share quite some common features, especially as CMOS devices continue along their scaling trends and the HPC community hits hard power and energy limits. Understanding the similarities and differences between the design practices adopted in the two areas will help bridge the two communities and lead to design tool developments benefiting both communities.","PeriodicalId":118816,"journal":{"name":"2010 IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132350262","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Power aware SID-based simulator for embedded multicore DSP subsystems","authors":"Cheng-Yen Lin, Po-Yu Chen, Chun-Kai Tseng, Chung-Wen Huang, Chia-Chien Weng, Chi-Bang Kuan, Shih-Han Lin, Shi-Yu Huang, Jenq-Kuen Lee","doi":"10.1145/1878961.1878981","DOIUrl":"https://doi.org/10.1145/1878961.1878981","url":null,"abstract":"The embedded multicore DSP systems are playing increasingly important role for consumer electronic design. Such systems try to optimize the objective for both performance and power with mobile devices. Embedded application developers will then devise designs to optimize embedded applications for not only performance but also power. However, currently there are no power metrics support for popular application design platforms such as QEMU and SID, where application developers develop their applications. This hinders application developers to help tune optimizations for power. In this paper, we propose a power aware simulation framework on embedded multicore DSP subsystems for SID framework. To the best of our knowledge, this is the first work to attempt to build a power aware simulator based on SID simulation framework. The power estimation flow includes two phases, IP level power modeling and system level power prower profiling. In the IP level power modeling, PowerMixerIP is employed to build up the power model for PAC DSP and major IPs. In the system level power profiling, we provide a power profiling hierarchy that meets the demand of embedded software developers. The granularity of power profiling can be configured to the whole simulation stage or any specific time slot in the simulation such as a dedicated function loop. In our experiments, DSP programs with SIMD intrinsics for DSPStone benchmark are examined with our proposed power aware simulator. In addition, a face detection application is deployed as a running example on multi-core DSP systems to show how our power simulator can be used to help collaborate with developers in the optimization process to illustrate views of power dissipations of applications.","PeriodicalId":118816,"journal":{"name":"2010 IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128575618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Intermediate fabrics: Virtual architectures for circuit portability and fast placement and routing","authors":"J. Coole, G. Stitt","doi":"10.1145/1878961.1878966","DOIUrl":"https://doi.org/10.1145/1878961.1878966","url":null,"abstract":"Although hardware/software partitioning of embedded applications onto FPGAs is widely known to have performance and power advantages, FPGA usage has been typically limited to hardware experts, due largely to several problems: 1) difficulty of integrating hardware design tools into well-established software tool flows, 2) increasingly lengthy FPGA design iterations due to placement and routing, and 3) a lack of portability and interoperability resulting from device/platform-specific tools and bitfiles. In this paper, we directly address the last two problems by introducing intermediate fabrics, which are virtual reconfigurable architectures specialized for different application domains, implemented on top of commercial-off-the-shelf devices. Such specialization enables near-instantaneous placement and routing by hiding the complexity of fine-grained physical devices, while also enabling circuit portability across all devices that implement the intermediate fabric. When combined with existing work on runtime synthesis from software binaries, intermediate fabrics reduce the effects of all three problems by enabling transparent usage of COTS FPGAs by software designers. In this paper, we explore intermediate fabric architectures using specialization techniques to minimize area and performance overhead of the virtual fabric while maximizing routability and speedup of placement and routing. We present results showing an average placement and routing speedup of 554×, with an average area overhead of 10% and clock overhead of 18%, which corresponds to an average frequency of 195 MHz.","PeriodicalId":118816,"journal":{"name":"2010 IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"93 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125974696","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic, non-linear cache architecture for power-sensitive mobile processors","authors":"Garo Bournoutian, A. Orailoglu","doi":"10.1145/1878961.1878997","DOIUrl":"https://doi.org/10.1145/1878961.1878997","url":null,"abstract":"Today, mobile smartphones are expected to be able to run the same complex, algorithm-heavy, memory-intensive applications that were originally designed and coded for general-purpose processors. All the while, it is also expected that these mobile processors be power-conscientious as well as of minimal area impact. These devices pose unique usage demands of ultra-portability, but also demand an always-on, continuous data access paradigm. As a result, this dichotomy of continuous execution versus long battery life poses a difficult challenge. This paper explores a novel approach to mitigating mobile processor power consumption, with a nonlinear degradation in execution speed. The concept relies on using dynamic application memory behavior to intelligently target adjustments in the cache to significantly reduce overall processor power, taking into account both the dynamic and leakage power footprint of the cache subsystem. The simulation results show a significant reduction in power consumption of approximately 16% to 19%, while only incurring a nominal increase in execution time and area.","PeriodicalId":118816,"journal":{"name":"2010 IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116926721","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Felix Reimann, M. Glaß, C. Haubelt, M. Eberl, J. Teich
{"title":"Improving platform-based system synthesis by satisfiability modulo theories solving","authors":"Felix Reimann, M. Glaß, C. Haubelt, M. Eberl, J. Teich","doi":"10.1145/1878961.1878986","DOIUrl":"https://doi.org/10.1145/1878961.1878986","url":null,"abstract":"Due to the ever increasing system complexity, deciding whether a given platform is sufficient to implement a set of applications under given constraints becomes a serious bottleneck in platform-based design. As a remedy, the work at hand proposes a novel automatic platform-based system synthesis procedure, inspired by techniques developed in the context of automatic system verification known as Satisfiability Modulo Theories. It tightly couples the computation of a feasible allocation and binding with nonfunctional constraint checking where, in contrast to existing approaches, not only linear constraints but even nonlinear constraints are supported. This allows to efficiently prove whether there exists a feasible implementation of a set of applications on the given platform with respect to both, functional and nonfunctional constraints. Moreover, an approach for early learning based on feasibility checking of partial implementations is proposed that can significantly improve the synthesis runtime, especially in case the selected platform imposes stringent constraints on the implementation. The effectiveness of this approach is shown for an automotive ECU network design that requires Modular Performance Analysis to ensure nonfunctional nonlinear timing constraints.","PeriodicalId":118816,"journal":{"name":"2010 IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121924963","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hardware/software optimization of error detection implementation for real-time embedded systems","authors":"A. Lifa, P. Eles, Zebo Peng, V. Izosimov","doi":"10.1145/1878961.1878970","DOIUrl":"https://doi.org/10.1145/1878961.1878970","url":null,"abstract":"This paper presents an approach to system-level optimization of error detection implementation in the context of fault-tolerant real-time distributed embedded systems used for safety-critical applications. An application is modeled as a set of processes communicating by messages. Processes are mapped on computation nodes connected to the communication infrastructure. To provide resiliency against transient faults, efficient error detection and recovery techniques have to be employed. Our main focus in this paper is on the efficient implementation of the error detection mechanisms. We have developed techniques to optimize the hardware/software implementation of error detection, in order to minimize the global worst-case schedule length, while meeting the imposed hardware cost constraints and tolerating multiple transient faults. We present two design optimization algorithms which are able to find feasible solutions given a limited amount of resources: the first one assumes that, when implemented in hardware, error detection is deployed on static reconfigurable FPGAs, while the second one considers partial dynamic reconfiguration capabilities of the FPGAs.","PeriodicalId":118816,"journal":{"name":"2010 IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"116 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128076719","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance modeling of embedded applications with zero architectural knowledge","authors":"M. Lattuada, Fabrizio Ferrandi","doi":"10.1145/1878961.1879010","DOIUrl":"https://doi.org/10.1145/1878961.1879010","url":null,"abstract":"Performance estimation is a key step in the development of an embedded system. Normally, the performance evaluation is performed using a simulator or a performance mathematical model of the target architecture. However, both these approaches are usually based on the knowledge of the architectural details of the target. In this paper we present a methodology for automatically building an analytical model to estimate the performance of an application on a generic processor without requiring any information about the processor architecture but the one provided by the GNU GCC Intermediate Representation. The proposed methodology exploits the linear regression technique based on an application analysis performed on the Register Transfer Level internal representation of the GNU GCC compiler. The benefits of working with this type of model and with this intermediate representation are three: we take into account most of the compiler optimizations, we implicitly consider some architectural characteristics of the target processor and we can easily estimate the performance of portions of the specification. We validate our approach by evaluating with cross-validation technique the accuracy and the generality of the performance models built for the ARM926EJ-S and the LEON3 processors.","PeriodicalId":118816,"journal":{"name":"2010 IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114567813","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring models of computation with Ptolemy II","authors":"Christopher X. Brooks, Edward A. Lee, S. Tripakis","doi":"10.1145/1878961.1879020","DOIUrl":"https://doi.org/10.1145/1878961.1879020","url":null,"abstract":"The Ptolemy project studies modeling, simulation, and design of concurrent, real-time, embedded systems. The focus is on assembly of concurrent components. The key underlying principle in the project is the use of well-defined models of computation that govern the interaction between components. A major problem area being addressed is the use of heterogeneous mixtures of models of computation. Ptolemy II takes a component view of design, in that models are constructed as a set of interacting components. A model of computation governs the semantics of the interaction, and thus imposes an execution-time discipline. Ptolemy II has implementations of many models of computation including Synchronous Data Flow, Kahn Process Networks, Discrete Event, Continuous Time, Synchronous/Reactive and Modal Models This hands-on tutorial explores how these models of computation are implemented in Ptolemy II and how to create new models of computation such as a \"non-dogmatic\" Process Networks example and a left-to-right execution policy example.","PeriodicalId":118816,"journal":{"name":"2010 IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130788255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}