V. Athavale, Sam Hertz, Darshan Jetly, V. Ganesan, Jim Krysl, Shobha Vasudevan
{"title":"Using static analysis for coverage extraction fromemulation/prototyping platforms","authors":"V. Athavale, Sam Hertz, Darshan Jetly, V. Ganesan, Jim Krysl, Shobha Vasudevan","doi":"10.1145/2380445.2380481","DOIUrl":"https://doi.org/10.1145/2380445.2380481","url":null,"abstract":"Full-system emulation and prototyping is now being used widely in the industry for System-on-Chip (SoC) verification. Emulation/ prototyping platforms run tests in a fraction of time compared to the traditional simulation based verification. However, unlike simulation, they do not provide visibility into the hardware design source code. As a result, they fail to provide any information about code coverage achieved, which is an important metric to measure the completeness of the verification process. In this paper, we present a novel technique to extract code coverage from emulation/prototyping platforms. Through analysis of the source code for the hardware design, we relate the evaluation of branch conditions to other statements in the code. Evaluation of the branch conditions is recorded using additional logic during emulation, and mapped back to the code to obtain coverage information. We apply our technique to an industrial system, and show that it can efficiently provide code coverage statistics that are faithful to the coverage obtained from simulation. We also perform experiments on the publicly available OpenRISC processor and demonstrate similar results.","PeriodicalId":268500,"journal":{"name":"Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134150885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
P. Liuha, K. Pehkonen, J. Rummukainen, Veli-Pekka Vatula, T. Koljonen
{"title":"Research issues in smart phones, notepads and related services","authors":"P. Liuha, K. Pehkonen, J. Rummukainen, Veli-Pekka Vatula, T. Koljonen","doi":"10.1145/2380445.2380453","DOIUrl":"https://doi.org/10.1145/2380445.2380453","url":null,"abstract":"Networked embedded systems are building intelligence to everyplace. There are more and more incentives to open proprietary data and interfaces for free to third party service developers. This development called \"ubiquitous communication\" or \"internet of things\" requires new dominant design for user interfaces, interoperability and contextuality. In the user interface design, for example, we have witnessed the growth of the smartphone display, which today is at about 5 inches. Developing the new dominant designs is part of the \"ecosystem war\", where the most attractive platforms are getting the most developer, users and profits. The four speakers of today represent companies that have entered the competition with different platforms, assets and strategies and hence have different research challenges to be solved.","PeriodicalId":268500,"journal":{"name":"Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis","volume":"332 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133808576","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The roce-bush router: a case for routing-centric dimensional decomposition for low-latency 3D noC routers","authors":"M. Salas, S. Pasricha","doi":"10.1145/2380445.2380476","DOIUrl":"https://doi.org/10.1145/2380445.2380476","url":null,"abstract":"As 3D System-On-Chips (SoCs) come ever closer to becoming the standard for high performance ICs, 3D Networks on Chips (NoCs) have emerged as a key component in meeting performance constraints and ensuring power-efficiency. Among the proposed 3D router architectures, dimensionally-decomposed routers are widely accepted as an efficient solution to deal with the increased port count and the accompanying exponential power and area increases. All decompositions proposed thus far have however been dimensionally static, that is, they have set in stone a particular bias among the three dimensions. This paper presents a novel router with a routing-centric decomposition and virtual channel buffer sharing called the Roce-Bush router. To our knowledge, this is the first work that integrates routing-awareness in the context of dimensional decomposition and buffer resource allocation for NoC routers. Experimental results involving RTL level implementations of our router and synthesis at 45nm show that compared to a dimensional-agnostic decomposed router, the Roce-Bush router can achieve up to 14% better performance and 5% lower power.","PeriodicalId":268500,"journal":{"name":"Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114333670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chen-Chun Huang, Bailey Miller, F. Vahid, T. Givargis
{"title":"Synthesis of custom networks of heterogeneous processing elements for complex physical system emulation","authors":"Chen-Chun Huang, Bailey Miller, F. Vahid, T. Givargis","doi":"10.1145/2380445.2380483","DOIUrl":"https://doi.org/10.1145/2380445.2380483","url":null,"abstract":"Physical system models that consist of thousands of ordinary differential equations can be synthesized to field-programmable gate arrays (FPGAs) for highly-parallelized, real-time physical system emulation. Previous work introduced synthesis of custom networks of homogeneous processing elements, consisting of processing elements that are either all general differential equation solvers or are all custom solvers tailored to solve specific equations. However, a complex physical system model may contain different types of equations such that using only general solvers or only custom solvers does not provide all of the possible speedup. We introduce methods to synthesize a custom network of heterogeneous processing elements for emulating physical systems, where each element is either a general or custom differential equation solver. We show average speedups of 45x over a 3 GHz single-core desktop processor, and of 11x and 20x over a 3 GHz four-core desktop and a 763 MHz NVIDIA graphical processing unit, respectively. Compared to a commercial high-level synthesis tool including regularity extraction, the networks of heterogeneous processing elements were on average 10.8x faster. Compared to homogeneous networks of general and single-type custom processing elements, heterogeneous networks were on average 7x and 6x faster, respectively.","PeriodicalId":268500,"journal":{"name":"Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116735771","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Managing parallelism in multi-core systems","authors":"R. Ernst","doi":"10.1145/3250268","DOIUrl":"https://doi.org/10.1145/3250268","url":null,"abstract":"","PeriodicalId":268500,"journal":{"name":"Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129597648","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generating interlocked instruction pipelines from specifications of instruction sets","authors":"R. Dreesen","doi":"10.1145/2380445.2380492","DOIUrl":"https://doi.org/10.1145/2380445.2380492","url":null,"abstract":"The development of application specific processors (ASIPs) for systems-on-a-chip (SoCs) became increasingly popular in recent years. To efficiently develop such processors, respective tools are crucial. This paper presents methods to generate pipelined processors from a bare instruction set specification in ViDL. All microarchitectural aspects of the processor are contributed by a generator. Hazard resolution by forwarding, interlocking and branch prediction is automatically derived from instruction semantics, information on the targeted chip technology and an user supplied timing constraint. By variation of the latter, a set of compatible processor implementations is generated with different physical and dynamic characteristics. The processor generator has been evaluated using realistic instruction sets, such as ARM, MIPS, Power, SRC, DNACore and CoreVA. The generated processors have been tested on register-transfer-level and gate-level. In total, 83 processors have been generated and synthesized for a 65 nm STM low power technology, yielding clock frequencies of 260 - 680MHz for 2 - 7 stage pipelines. Clock frequency and the number of cycles per instruction (CPI) is similar to handcrafted designs.","PeriodicalId":268500,"journal":{"name":"Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130590976","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Managing latency in embedded streaming applications under hard-real-time scheduling","authors":"M. Bamakhrama, T. Stefanov","doi":"10.1145/2380445.2380464","DOIUrl":"https://doi.org/10.1145/2380445.2380464","url":null,"abstract":"In this paper, we consider the problem of hard-real-time scheduling of embedded streaming applications, modeled using dataflow graphs, while minimizing the application latency. Recently, it has been shown that the actors in an acyclic Cyclo-Static Dataflow (CSDF) graph can be scheduled as a set of implicit-deadline periodic tasks. Such scheduling approach has been shown to yield the maximum achievable throughput for a large set of graphs, called matched I/O rates graphs. We show that scheduling the graph actors as implicit-deadline periodic tasks increases the latency significantly for a class of graphs called unbalanced graphs. To alleviate this problem, we propose a new task-set representation for the actors in which the actors are scheduled as a set of constrained-deadline periodic tasks. We prove that scheduling the actors as constrained-deadline periodic tasks delivers optimal throughput (i.e., rate) and latency for graphs with repetition vector equal to $vec{1}$. Furthermore, we evaluate the constrained-deadline representation using a set of 19 real-life applications and show that it is capable of achieving the minimum achievable latency for more than 70% of the applications, and even if the application has a repetition vector not equal to $vec{1}$. We show that choosing the task deadline involves a trade-off between the latency and the resources requirements. Finally, we propose a decision tree to assist the designer in choosing the appropriate real-time periodic task model for scheduling acyclic CSDF graphs.","PeriodicalId":268500,"journal":{"name":"Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131508073","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
N. Bombieri, S. Vinco, V. Bertacco, Debapriya Chatterjee
{"title":"SystemC simulation on GP-GPUs: CUDA vs. OpenCL","authors":"N. Bombieri, S. Vinco, V. Bertacco, Debapriya Chatterjee","doi":"10.1145/2380445.2380500","DOIUrl":"https://doi.org/10.1145/2380445.2380500","url":null,"abstract":"SystemC is a widespread language for developing SoC designs. Unfortunately, most SystemC simulators are based on a strictly sequential scheduler that heavily limits their performance, impacting verification schedules and time-to-market of new designs. Parallelizing SystemC simulation entails a complete re-design of the simulator kernel for the specific target parallel architectures. This paper proposes an automatic methodology to generate a parallel SystemC simulator kernel, exploiting the massive parallelism of GP-GPU architectures. Our solution leverages static scheduling to reduce synchronization overheads. The generated simulator code targets both CUDA and OpenCL libraries, to boost scalability and provide support for multiple GP-GPU architectures. Finally, the paper compares the performance of our solution on CUDA vs. OpenCL platforms, with the goal of investigating advantages and drawbacks that the two thread management libraries offer to concurrent SystemC simulation.","PeriodicalId":268500,"journal":{"name":"Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130404558","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Daniel Lorenz, Kim Grüttner, N. Bombieri, V. Guarnieri, S. Bocchio
{"title":"From RTL IP to functional system-level models with extra-functional properties","authors":"Daniel Lorenz, Kim Grüttner, N. Bombieri, V. Guarnieri, S. Bocchio","doi":"10.1145/2380445.2380529","DOIUrl":"https://doi.org/10.1145/2380445.2380529","url":null,"abstract":"The paper presents a novel abstraction methodology for generating time- and power-annotated TLM models from synthesizable RTL descriptions. The proposed techniques allow the integration of existing RTL IP components into virtual platforms for early software development and platform design, configuration, and exploration. With the proposed approach, IP models can be natively integrated into SystemC TLM-2.0 platforms and executed 10-1000 times faster compared to state-of-the-art RTL simulators. The abstraction methodology guarantees preservation of the behaviour and timing of the RTL models. Target technology dependent power properties of IP components are represented as power state-machines and integrated into the abstracted TLM models. The experimental results show a relative error less than 10% of the abstracted model's power consumption compared to state-of-the-art RTL power simulators. The evaluation has been performed on RTL IP components with different characteristics and demonstrates the effectiveness of the presented abstraction methodology.","PeriodicalId":268500,"journal":{"name":"Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130411112","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Software solutions for handling physical effects in embedded platforms","authors":"Karam S. Chatha","doi":"10.1145/3250266","DOIUrl":"https://doi.org/10.1145/3250266","url":null,"abstract":"","PeriodicalId":268500,"journal":{"name":"Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis","volume":"151 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134282524","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}