{"title":"K-Periodic schedules for evaluating the maximum throughput of a Synchronous Dataflow graph","authors":"Bruno Bodin, Alix Munier Kordon, B. Dinechin","doi":"10.1109/SAMOS.2012.6404169","DOIUrl":"https://doi.org/10.1109/SAMOS.2012.6404169","url":null,"abstract":"Synchronous Dataflow graphs, introduced by Lee and Messerschmitt in 1987, are a well-known formalism commonly used to model data exchanges between parallel processes. This model has been studied extensively over the last two decades because of the importance of its applications. However, determining the maximum throughput is a difficult question, for which no polynomial-time algorithm exists to date. In this context, several authors proved that a K-Periodic schedule, where K is a vector of values that are not polynomially bounded, reaches the maximum throughput. On the other hand, a 1-Periodic schedule may be built polynomially, but without any guarantee on the throughput achieved. Therefore, the investigated problem is the trade-off between the schedule size induced by the vector K (called the periodicity vector) and its corresponding throughput. Necessary and sufficient conditions for the existence of K-Periodic schedules are first shown for any fixed value in the vector K; the computation of the maximum throughput of a K-Periodic schedule is deduced. A set of dominant values of K is exhibited, and a relationship between the optimal throughput of these values is proved. 
Some real-life experiments measure the variation of the throughput according to K.","PeriodicalId":130275,"journal":{"name":"2012 International Conference on Embedded Computer Systems (SAMOS)","volume":"107 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128669732","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using OpenMP superscalar for parallelization of embedded and consumer applications","authors":"M. Andersch, C. C. Chi, B. Juurlink","doi":"10.1109/SAMOS.2012.6404154","DOIUrl":"https://doi.org/10.1109/SAMOS.2012.6404154","url":null,"abstract":"In recent years, research and industry have introduced several parallel programming models to simplify the development of parallel applications. A popular class among these models is task-based programming models, which proclaim ease-of-use, portability, and high performance. A novel model in this class, OpenMP Superscalar (OmpSs), combines advanced features such as automated runtime dependency resolution, while maintaining simple pragma-based programming for C/C++. OpenMP Superscalar has proven to be effective in leveraging parallelism in HPC workloads. Embedded and consumer applications, however, are currently still mainly parallelized using traditional thread-based programming models. In this work, we investigate how effective OpenMP Superscalar is for embedded and consumer applications in terms of usability and performance. To determine the usability of OmpSs, we show in detail how to implement complex parallelization strategies such as the ones used in parallel H.264 decoding. To evaluate the performance, we created a collection of ten embedded and consumer benchmarks parallelized in both OmpSs and Pthreads.","PeriodicalId":130275,"journal":{"name":"2012 International Conference on Embedded Computer Systems (SAMOS)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125226560","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"System-on-Chip deployment with MCAPI abstraction and IP-XACT metadata","authors":"Lauri Matilainen, Lasse Lehtonen, J. Määttä, E. Salminen, T. Hämäläinen","doi":"10.1109/SAMOS.2012.6404176","DOIUrl":"https://doi.org/10.1109/SAMOS.2012.6404176","url":null,"abstract":"IP-XACT, the recent IEEE 1685 standard, defines a metadata format for IP packaging and integration in System-on-Chip designs. It was originally proposed for hardware descriptions, but we have extended it for software, HW/SW mappings and application communication abstraction. The latter is realized with the Multicore Association MCAPI, which is a lightweight message passing interface. In this paper we present, as a work in progress, how we utilize all these to deploy and move application tasks between different platforms for FPGA prototyping, execution acceleration or verification. The focus is on the metadata format since it is a foundation for automation and tool development. The design flow is illustrated with two case studies: a Motion JPEG encoder and a 12-node workload model of a video object plane decoder (VOPD). These are deployed to PC and to Altera and Xilinx FPGA boards in five variations. The results are reported as the deployment time for both non-recurring and deployment-specific tasks. 
Setting up a new deployment is a matter of hours when there is an IP-XACT library of HW and SW components.","PeriodicalId":130275,"journal":{"name":"2012 International Conference on Embedded Computer Systems (SAMOS)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130007658","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HNOCS: Modular open-source simulator for Heterogeneous NoCs","authors":"Y. Ben-Itzhak, E. Zahavi, I. Cidon, A. Kolodny","doi":"10.1109/SAMOS.2012.6404157","DOIUrl":"https://doi.org/10.1109/SAMOS.2012.6404157","url":null,"abstract":"We present HNOCS (Heterogeneous Network-on-Chip Simulator), an open-source NoC simulator based on OMNeT++. To the best of our knowledge, HNOCS is the first simulator to support modeling of heterogeneous NoCs with variable link capacities and number of VCs per unidirectional port. The HNOCS simulation platform provides an open-source, modular, scalable, extendible and fully parameterizable framework for modeling NoCs. It includes three types of NoC routers: synchronous, synchronous virtual output queue (VoQ) and asynchronous. HNOCS provides a rich set of statistical measurements at the flit and packet levels: end-to-end latencies, throughput, VC acquisition latencies, transfer latencies, etc. We describe the architecture, structure, available models and the features that make HNOCS suitable for advanced NoC exploration. We also evaluate several case studies which cannot be evaluated with any other existing NoC simulator.","PeriodicalId":130275,"journal":{"name":"2012 International Conference on Embedded Computer Systems (SAMOS)","volume":"158 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124441001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Architecture-level fault-tolerance for biomedical implants","authors":"R. M. Seepers, C. Strydis, G. Gaydadjiev","doi":"10.1109/SAMOS.2012.6404163","DOIUrl":"https://doi.org/10.1109/SAMOS.2012.6404163","url":null,"abstract":"In this paper, we describe the design and implementation of a new fault-tolerant RISC-processor architecture suitable for a design framework targeting biomedical implants. The design targets both soft and hard faults and is original in efficiently combining as well as enhancing classic fault-tolerance techniques. The proposed architecture allows run-time tradeoffs between performance and fault tolerance by means of instruction-level configurability. The system design is synthesized for UMC 90nm CMOS standard-process and is evaluated in terms of fault coverage, area, average power consumption, total energy consumption and performance for various duplication policies and test-sequence schedules. It is shown that area and power overheads of approximately 25% and 32%, respectively, are required to implement our techniques on the baseline processor. The major overheads of the proposed architecture are performance (up to 107%) and energy consumption (up to 157%). It is observed that the average power consumption is often reduced when a higher degree of fault tolerance is targeted. It is shown that test sequences can effectively be scheduled during the available program stalls and that nearly all soft faults are tolerated by using instruction duplication. 
The main advantages of the proposed architecture are the high portability of the introduced architecture-level fault-tolerance techniques, the flexibility in trading processor overheads for required fault-tolerance degree as well as affordable area and power consumption overheads.","PeriodicalId":130275,"journal":{"name":"2012 International Conference on Embedded Computer Systems (SAMOS)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124348396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive processor architecture - invited paper","authors":"M. Hübner, D. Göhringer, Carsten Tradowsky, J. Henkel, J. Becker","doi":"10.1109/SAMOS.2012.6404181","DOIUrl":"https://doi.org/10.1109/SAMOS.2012.6404181","url":null,"abstract":"This paper introduces a novel methodology to adapt the microarchitecture of a processor at run-time. The goal is to tailor the internal architecture to the requirements of an application and the data to be processed. The latter parameter is normally not known at design time. This leads to the development of more general-purpose processors which are capable of handling the data to be processed in any case. With the novel approach, which keeps the microarchitecture of a processor flexible, the processor can start as a general-purpose device and end up with a specific parameterization, comparable with application-specific processor architectures. Furthermore, the increased degree of freedom that this approach enables for a novel quality of processors is described.","PeriodicalId":130275,"journal":{"name":"2012 International Conference on Embedded Computer Systems (SAMOS)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116819021","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reconfigurable miniature sensor nodes for condition monitoring","authors":"Teemu Nylanden, J. Boutellier, Karri Nikunen, J. Hannuksela, O. Silvén","doi":"10.1109/SAMOS.2012.6404164","DOIUrl":"https://doi.org/10.1109/SAMOS.2012.6404164","url":null,"abstract":"Wireless sensor networks are being deployed at an escalating rate in various application fields. The ever-growing number of application areas requires a diverse set of algorithms with disparate processing needs. Wireless sensor networks also need to adapt to the prevailing energy conditions and processing requirements. These reasons rule out the use of a single fixed design. Instead, a general-purpose design that can rapidly adapt to different conditions and requirements is desired. In lieu of the traditional inflexible wireless sensor node consisting of a micro-controller, radio transceiver, sensor array and energy storage, we propose a rapidly reconfigurable miniature sensor node, implemented with a transport triggered architecture processor on a low-power Flash FPGA. A comparison of power consumption and silicon area usage between 16-bit fixed- and floating-point and 32-bit floating-point implementations is also presented in this paper. 
The implemented processors and algorithms are intended for rolling bearing condition monitoring, but can be fully extended for other applications as well.","PeriodicalId":130275,"journal":{"name":"2012 International Conference on Embedded Computer Systems (SAMOS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129060467","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Simultaneous reconfiguration of issue-width and instruction cache for a VLIW processor","authors":"Fakhar Anjam, Stephan Wong, L. Carro, G. Nazar, M. B. Rutzig","doi":"10.1109/SAMOS.2012.6404173","DOIUrl":"https://doi.org/10.1109/SAMOS.2012.6404173","url":null,"abstract":"This paper presents an analysis of the impact of simultaneous instruction cache (I-cache) and issue-width reconfiguration for a very long instruction word (VLIW) processor. The issue-width of the processor can be adjusted at run-time to be 2-issue, 4-issue, or 8-issue, and the I-cache can be reconfigured in terms of associativity, cache size, and line size. We observe that, compared to reconfiguring only the I-cache for a fixed issue-width core, reconfiguring the issue-width and I-cache together can further reduce the execution time, energy consumption, and/or the energy-delay product (EDP). The results for the MiBench and the PowerStone benchmark suites show that compared to “2-issue + the best I-cache”, “4-issue + the best I-cache” can reduce execution time, energy consumption, and EDP by up to 37%, 11%, and 36%, respectively, for different applications. Similarly, compared to “2-issue + the best I-cache”, “8-issue + the best I-cache” can reduce execution time and EDP by up to 46% and 30%, respectively, for different applications.","PeriodicalId":130275,"journal":{"name":"2012 International Conference on Embedded Computer Systems (SAMOS)","volume":"125 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128654435","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Throughput driven transformations of Synchronous Data Flows for mapping to heterogeneous MPSoCs","authors":"Anastasia Stulova, R. Leupers, G. Ascheid","doi":"10.1109/SAMOS.2012.6404168","DOIUrl":"https://doi.org/10.1109/SAMOS.2012.6404168","url":null,"abstract":"Due to energy efficiency requirements of modern embedded systems, chip vendors are inclined towards multicore architectures with different types of processing engines and non-uniform interconnect fabrics. At the same time multiple applications are intended to run concurrently on the devices with such heterogeneous architectures. This rapid growth in the complexity of the hardware and its use cases imposes new challenges on the software development tools. To overcome this complexity, model of computation based approaches are becoming increasingly promising. Synchronous Data Flow (SDF) is a popular specification formalism for streaming applications with inherently concurrent nature. However, the parallelism expressed in the original representation is often not sufficient to maximally exploit the potential of multicore platforms. In this paper we present a holistic methodology for improving the throughput of streaming applications while mapping them onto heterogeneous architectures. The approach uses transformations that adapt the parallelism in SDF according to available platform resources. We use a genetic algorithm to explore SDF instances with the objective of maximizing throughput on a target platform. Our model supports architecture heterogeneity and multi-application scenarios. 
The experiments indicate that our approach outperforms other techniques for exploiting parallelism on a single application in most of the test cases and enables concurrent applications optimization.","PeriodicalId":130275,"journal":{"name":"2012 International Conference on Embedded Computer Systems (SAMOS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121255951","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An application-specific Network-on-Chip for control architectures in RF transceivers","authors":"S. Brandstätter, M. Huemer","doi":"10.1109/SAMOS.2012.6404159","DOIUrl":"https://doi.org/10.1109/SAMOS.2012.6404159","url":null,"abstract":"This paper focuses on the design of an on-chip communication system for control architectures used in RF (Radio Frequency) transceivers. Continuous developments and enhancements of RF transceivers, especially of smart transceivers supporting multi-mode standards, led to new and complex SoC (System-on-Chip) designs. These designs are defined by a distributed controlling concept using several processing modules which are connected over an advanced communication system. Based on the requirements and restrictions of this communication system an application-specific NoC (Network-on-Chip) is presented and analyzed in this work.","PeriodicalId":130275,"journal":{"name":"2012 International Conference on Embedded Computer Systems (SAMOS)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121665654","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}