F. Cancare, C. Pilato, Andrea Cazzaniga, D. Sciuto, M. Santambrogio
{"title":"D-RECS: A complete methodology to implement Self Dynamic Reconfigurable FPGA-based systems","authors":"F. Cancare, C. Pilato, Andrea Cazzaniga, D. Sciuto, M. Santambrogio","doi":"10.1109/ReCoSoC.2013.6581550","DOIUrl":"https://doi.org/10.1109/ReCoSoC.2013.6581550","url":null,"abstract":"Dynamic self reconfigurable embedded systems are gathering, day after day, an increasing interest from both the scientific and the industrial world. At the same time, however, the need for a comprehensive and easy-to-use tool that can guide designers through the whole implementation process is becoming stronger. Up to now, every proposed methodology for implementing dynamic self reconfigurable systems has been architecture-centered. In most cases the system development process is time consuming and requires a very specific technical background. The aim of this work is to provide a fast brain-to-bit design flow whose goal is to simplify the dynamic reconfigurable system development process by shifting the designer's focus from the architecture point of view to the application point of view: designers will not need to possess Dynamic Reconfigurability expertise, only skill in the application domain.","PeriodicalId":354964,"journal":{"name":"2013 8th International Workshop on Reconfigurable and Communication-Centric Systems-on-Chip (ReCoSoC)","volume":"115 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115533360","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ACMA: Accuracy-configurable multiplier architecture for error-resilient System-on-Chip","authors":"Kartikeya Bhardwaj, P. Mane","doi":"10.1109/ReCoSoC.2013.6581532","DOIUrl":"https://doi.org/10.1109/ReCoSoC.2013.6581532","url":null,"abstract":"In nanometer regime, optimization of System-on-Chip (SoC) designs w.r.t. speed, power and area is a major concern for VLSI designers today. Imprecise/approximate design obviates the constraints on accuracy, stemming a novel Speed-Power-Accuracy-Area (SPAA) metrics which can pilot to tremendous improvements in speed and/or power with a feeble accord in accuracy. This astonishingly expediency captivated researchers to delve into imprecise/approximate VLSI design evolution. In this paper, we present a new accuracy-configurable multiplier architecture (ACMA) for error-resilient systems. The ACMA uses a technique called Carry-in Prediction for approximate multiplication based on efficient precomputation logic that increases its throughput. The proposed multiplication reduces the latency of an accurate multiplier by almost half by reducing its critical path. The simulation results suggest that SPAA metrics can be administered by exploiting the design for apposite number of iterations. 
The results for 16-bit multiplication show a mean accuracy of 99.85% to 99.9% when there is no lower bound on operand size; when the operands are 10 bits or wider (numbers > 1000), the mean accuracy is 99.965%.","PeriodicalId":354964,"journal":{"name":"2013 8th International Workshop on Reconfigurable and Communication-Centric Systems-on-Chip (ReCoSoC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129035611","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Juan Fernando Eusse Giraldo, Christopher Williams, R. Leupers
{"title":"CoEx: A novel profiling-based algorithm/architecture co-exploration for ASIP design","authors":"Juan Fernando Eusse Giraldo, Christopher Williams, R. Leupers","doi":"10.1109/ReCoSoC.2013.6581520","DOIUrl":"https://doi.org/10.1109/ReCoSoC.2013.6581520","url":null,"abstract":"Application Specific Instruction Set Processor (ASIP) design methodologies have not been significantly altered during the past decade, and are still based on a highly manual and iterative process. Profiling has been established as a first step to prune the design space, and gain a deep understanding of the algorithms that underpin the application for which an ASIP is to be tailored. Independently of the profiling strategy, none of the existing ASIP-oriented profiling technologies enables on-the-loop application optimization or algorithmic exploration, which are mandatory steps throughout ASIP design. An innovative multi-grained approach that enables multiple levels of profiling detail according to the ASIP design stage (i.e. hot spot identification, application optimization, algorithmic exploration and architectural design) is presented. To validate our multi-grained profiling approach, the design of an ASIP for Marker-Based Augmented Reality was undertaken, achieving a 6x speedup in application execution in two days of design time.","PeriodicalId":354964,"journal":{"name":"2013 8th International Workshop on Reconfigurable and Communication-Centric Systems-on-Chip (ReCoSoC)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127969132","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Approximation of hyperbolic tangent activation function using hybrid methods","authors":"M. Sartin, A. M. Silva","doi":"10.1109/ReCoSoC.2013.6581545","DOIUrl":"https://doi.org/10.1109/ReCoSoC.2013.6581545","url":null,"abstract":"Artificial Neural Networks (ANNs) are widely used in various engineering applications, such as the solution of nonlinear problems. Implementing this technique in reconfigurable devices is a great challenge for researchers because of several factors, such as floating point precision, the nonlinear activation function, performance and the area used in the FPGA. The contribution of this work is the approximation of a nonlinear function used in ANNs, the popular hyperbolic tangent activation function. The system architecture is composed of several scenarios that provide a tradeoff between performance, precision and area used in the FPGA. The results are compared across the different scenarios and with current literature in terms of error analysis, area and system performance.","PeriodicalId":354964,"journal":{"name":"2013 8th International Workshop on Reconfigurable and Communication-Centric Systems-on-Chip (ReCoSoC)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133687094","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Memory allocation and optimization in system-level architectural synthesis","authors":"Shuo Li, A. Hemani","doi":"10.1109/ReCoSoC.2013.6581537","DOIUrl":"https://doi.org/10.1109/ReCoSoC.2013.6581537","url":null,"abstract":"In this paper, we present a novel approach to optimally allocate memory resources in a system-level synthesis flow, which converts a dataflow-style system description (synchronous data flow) into a register-transfer level description in the specified implementation style (ASIC, FPGA or CGRA). The first problem encountered by the synthesis flow is that, since it covers different implementation styles, a generic model is required to support resource allocation and optimization. The second problem is finding a memory allocation method that optimally allocates memory resources in the RTL model. The contribution of this paper has two parts: 1) a generic memory model for different memory architectures in ASIC, FPGA and CGRA, and 2) a memory allocation and optimization method for optimally allocating storage elements in the intermediate representation with actual implementations (e.g. on-chip SRAM for ASIC, memory controller and off-chip SDRAM for FPGA). The memory allocation method is an implementation-style-dependent procedure and has three steps: architecture-independent optimization, resource allocation and architecture-dependent optimization. The experimental results show that the proposed method is efficient and effective. The automatically generated implementation uses only approximately 4% more resources than a manual implementation. 
The fast and automatic memory allocation method enables fast design space exploration that requires little effort from the system designer.","PeriodicalId":354964,"journal":{"name":"2013 8th International Workshop on Reconfigurable and Communication-Centric Systems-on-Chip (ReCoSoC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131023810","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
G. Hempel, Jan Hoyer, Thilo Pionteck, C. Hochberger
{"title":"Register allocation for high-level synthesis of hardware accelerators targeting FPGAs","authors":"G. Hempel, Jan Hoyer, Thilo Pionteck, C. Hochberger","doi":"10.1109/ReCoSoC.2013.6581522","DOIUrl":"https://doi.org/10.1109/ReCoSoC.2013.6581522","url":null,"abstract":"This work evaluates the benefits of several register allocation strategies as part of a design flow for automatic generation of application-specific hardware accelerators targeting FPGAs. As usage of vendor-specific design tools is mandatory for system designs targeting FPGAs, high-level synthesis has to account for the optimization capabilities already implemented in these design tools. In addition, FPGA-specific hardware characteristics have to be considered as well. Therefore, several register allocation strategies are evaluated in the context of a GCC-based C-to-HDL design flow for application-specific hardware accelerators. Evaluation was done by means of several example designs from typical application domains for embedded systems. These designs were synthesized using the ISE design suite with either area or speed as the optimization criterion. 
Synthesis results for Spartan 6 and Artix 7 FPGAs show that with regards to clock frequency and area requirements, register allocation strategy should be kept simple when generating HDL code as an input for FPGA vendor-specific design tools.","PeriodicalId":354964,"journal":{"name":"2013 8th International Workshop on Reconfigurable and Communication-Centric Systems-on-Chip (ReCoSoC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128503428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Christoph Roth, Harald Bucher, Simon Reder, O. Sander, J. Becker
{"title":"Improving parallel MPSoC simulation performance by exploiting dynamic routing delay prediction","authors":"Christoph Roth, Harald Bucher, Simon Reder, O. Sander, J. Becker","doi":"10.1109/ReCoSoC.2013.6581524","DOIUrl":"https://doi.org/10.1109/ReCoSoC.2013.6581524","url":null,"abstract":"Raising the abstraction level or parallel execution are two possible solutions in order to cope with extremely long runtimes of complex Multi-Processor System-on-Chip (MPSoC) simulations. Within previous works, a SystemC/TLM based modeling methodology targeting accurate simulation of NoC-based MPSoCs has been proposed that benefits from both. Communication is abstracted into transactions. This enables extraction of parallelism through temporal decoupling for increasing efficiency of parallel simulation if a loss of accuracy is acceptable. This work extends previous works by a dynamic prediction mechanism that allows adapting the degree of temporal decoupling during runtime and thus prevents any loss of accuracy. The method is based on local time quanta that exist once for every module connection. Delay annotations within modules are exploited for predicting communication delays between modules. Based on these predictions, local time quanta are dynamically adjusted. The approach is evaluated by means of a realistic MPSoC model. Measurements have been performed on different host platforms. 
Results demonstrate that the method can significantly contribute to acceleration of parallel and sequential simulation.","PeriodicalId":354964,"journal":{"name":"2013 8th International Workshop on Reconfigurable and Communication-Centric Systems-on-Chip (ReCoSoC)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127331883","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RecMIN: A reconfiguration architecture for network on chip","authors":"A. Logvinenko, Carsten Gremzow, D. Tutsch","doi":"10.1109/ReCoSoC.2013.6581547","DOIUrl":"https://doi.org/10.1109/ReCoSoC.2013.6581547","url":null,"abstract":"RecMIN (Reconfigurable Multi-Interconnection Network) is a new network architecture that reduces inefficiency and increases the throughput of the network. With reconfiguration, the network topology adapts itself to traffic. Like FPGA topology, the proposed RecMIN topology consists of dynamically reconfigurable cells. The reconfiguration of each cell can possibly be done in one clock cycle, and the reconfiguration of each cell is independent. The architecture can be easily expanded to utilize reconfigurable cells together with non-reconfigurable network structures. The number of network inputs and outputs does not affect the viability of the architecture.","PeriodicalId":354964,"journal":{"name":"2013 8th International Workshop on Reconfigurable and Communication-Centric Systems-on-Chip (ReCoSoC)","volume":"180 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121263356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamically reconfigurable FIR filter architectures with fast reconfiguration","authors":"M. Kumm, Konrad Möller, P. Zipf","doi":"10.1109/ReCoSoC.2013.6581517","DOIUrl":"https://doi.org/10.1109/ReCoSoC.2013.6581517","url":null,"abstract":"This work compares two finite impulse response (FIR) filter architectures for FPGAs for which the coefficients can be reconfigured during run-time. One is a recently proposed filter architecture based on distributed arithmetic (DA) and the other is based on a LUT multiplication scheme. Instead of using the common internal configuration access port (ICAP) for reconfiguration, which is able to change the logic as well as the routing, it is sufficient to reconfigure only the logic in the regarded architectures. This is realized by using the configurable look-up table (CFGLUT) primitive of Xilinx, which allows reconfiguration times that are orders of magnitude faster than using ICAP. The resulting FIR filter architectures achieve reconfiguration times of typically less than 100 ns. They can be reconfigured with arbitrary coefficients that are only limited by their length and word size. As their resource consumption depends on different parameters of the filter, a detailed comparison is done. 
It turned out that if the input word size is greater than approximately half the number of coefficients, the LUT-based multiplication scheme needs fewer resources than the DA architecture, and vice versa.","PeriodicalId":354964,"journal":{"name":"2013 8th International Workshop on Reconfigurable and Communication-Centric Systems-on-Chip (ReCoSoC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129931961","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jesús Carabaño, Francisco Dios, M. Daneshtalab, M. Ebrahimi
{"title":"An exploration of heterogeneous systems","authors":"Jesús Carabaño, Francisco Dios, M. Daneshtalab, M. Ebrahimi","doi":"10.1109/ReCoSoC.2013.6581542","DOIUrl":"https://doi.org/10.1109/ReCoSoC.2013.6581542","url":null,"abstract":"Heterogeneous computing represents a trendy way to achieve further scalability in the high-performance computing area. It aims to join different processing units in a network-based system such that each task is preferably executed by the unit which is able to efficiently perform that task. Memory hierarchy, instruction set, control logic, and other properties may differ between processing units so that each is specialized for a different variety of problems. However, it will be more time-consuming for computer engineers to understand, design, and program these systems. On the other hand, suitable problems running on well-chosen heterogeneous systems achieve higher performance and superior energy efficiency. Such a balance of attributes seldom makes a heterogeneous system useful for fields other than embedded computing or high-performance computing. Among them, embedded computing is more area and energy efficient while high-performance computing obtains more performance. GPUs, FPGAs or the new Xeon Phi are examples of common computational units that, along with CPUs, can compose heterogeneous systems aiming to accelerate the execution of programs. 
In this paper, we have explored these architectures in terms of energy efficiency, performance, and productivity.","PeriodicalId":354964,"journal":{"name":"2013 8th International Workshop on Reconfigurable and Communication-Centric Systems-on-Chip (ReCoSoC)","volume":"96 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122798501","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}