{"title":"Among slow dwarfs and fast giants: A systematic design space exploration of KECCAK","authors":"Bernhard Jungk, Marc Stöttinger","doi":"10.1109/ReCoSoC.2013.6581527","DOIUrl":"https://doi.org/10.1109/ReCoSoC.2013.6581527","url":null,"abstract":"The SHA-3 competition ended in late 2012 by announcing KECCAK as the winning algorithm. During the contest, several criteria were evaluated for hardware implementations, foremost the resource consumption, the throughput and the tradeoff between both criteria. Unfortunately, especially for lightweight and midrange implementations, a clear rationale for the design choices was missing most of the time. Therefore, in this paper a new methodology is proposed to evaluate such implementations using a new and systematic procedure. With this novel approach we show that there are several different implementation styles to implement KECCAK with different tradeoffs. Furthermore, we substantiate the usefulness of the new methodology with several concrete and competitive implementations. These implementations are derived from our evaluation estimates and add several data points for midrange and lightweight designs to the current state of the art.","PeriodicalId":354964,"journal":{"name":"2013 8th International Workshop on Reconfigurable and Communication-Centric Systems-on-Chip (ReCoSoC)","volume":"119 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116673322","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic task remapping for power and latency performance improvement in priority-based non-preemptive Networks On Chip","authors":"J. Harbin, L. Indrusiak","doi":"10.1109/ReCoSoC.2013.6581526","DOIUrl":"https://doi.org/10.1109/ReCoSoC.2013.6581526","url":null,"abstract":"In dynamic system-on-chip and multicore CPU applications, the communication patterns between tasks are not easy to characterise in advance. Dynamic task mapping is commonly used in Network-On-Chip (NoC) research in order to redistribute tasks around network processing elements at runtime in response to changes in network loading. Dynamic task mapping is anticipated to become more important as general purpose CPUs become massively multicore and system-on-chip (SoC) designs become more reconfigurable in their application usage patterns. Simultaneously, reducing NoC power consumption is a necessary consideration in the development of future scaleable and energy efficient NoC systems. The work illustrated here uses a dynamic metric which combines contention and the power consumption impact of task remapping decisions, in order to produce a non-preemptive NoC that can deliver as good or better latency as a preemptive NoC in a real application scenario, while reducing overall power consumption. The results obtained show a power consumption reduction of approximately 35% in an application case involving an autonomous vehicle application, and significant reductions in the latency of individual flows.","PeriodicalId":354964,"journal":{"name":"2013 8th International Workshop on Reconfigurable and Communication-Centric Systems-on-Chip (ReCoSoC)","volume":"96 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126818432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Component based design using constraint programming for module placement on FPGAs","authors":"Alexander Wold, Dirk Koch, J. Tørresen","doi":"10.1109/ReCoSoC.2013.6581541","DOIUrl":"https://doi.org/10.1109/ReCoSoC.2013.6581541","url":null,"abstract":"Constraint satisfaction modeling is both an efficient, and an elegant approach to model and solve many real world problems. In this paper, we present a constraint solver targeting module placement in static and partial run-time reconfigurable systems. We use the constraint solver to compute feasible placement positions. Our placement model incorporates communication, implementation variants and device configuration granularity. In addition, we model heterogeneous resources such as embedded memory, multipliers and logic. Furthermore, we take into account that logic resources consist of different types including logic only LUTs, arithmetic LUTs with carry chains, and LUTs with distributed memory. Our work targets state of the art field-programmable gate arrays (FPGAs) in both design-time and run-time applications. In order to evaluate our placement model and module placer implementation, we have implemented a repository containing 200 fully functional, placed and routed relocatable modules. The modules are used to implement complete systems. This validates the feasibility of both the model and the module placer. Furthermore, we present simulated results for run-time applications, and compare this to other state of the art research. In run-time applications, the results point to improved resource utilization. This is a result of using a finer tile grid and complex module shapes.","PeriodicalId":354964,"journal":{"name":"2013 8th International Workshop on Reconfigurable and Communication-Centric Systems-on-Chip (ReCoSoC)","volume":"143 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131857297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A parallelization methodology for reconfigurable systems applied to edge detection","authors":"Juan M. Campos, R. Cumplido, C. F. Uribe, Roberto Perez-Andrade","doi":"10.1109/ReCoSoC.2013.6581546","DOIUrl":"https://doi.org/10.1109/ReCoSoC.2013.6581546","url":null,"abstract":"In this paper, a novel parallelization methodology is applied to the Edge Detection Algorithm (EDA). The proposed methodology is based on a multiprojection approach and on a fusion of processor elements. It eliminates the relationship between problem size and processor array size when using methodologies based on projections. EDA is an interesting problem because of its data dependencies and its potential parallelism; besides, EDA is used in multiple applications. In this study, multiple versions of the EDA architecture are generated in order to fulfill requirements of throughput and implementation area.","PeriodicalId":354964,"journal":{"name":"2013 8th International Workshop on Reconfigurable and Communication-Centric Systems-on-Chip (ReCoSoC)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133937197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Measuring memory access latency for software objects in a NUMA system-on-chip architecture","authors":"Daniela Genius","doi":"10.1109/ReCoSoC.2013.6581525","DOIUrl":"https://doi.org/10.1109/ReCoSoC.2013.6581525","url":null,"abstract":"We consider streaming applications modeled as a set of tasks communicating via channels. These channels are mapped to on-chip memory of a multi-processor system on chip (MPSoC) with non-uniform memory access. In complex applications like advanced packet processing and video streaming, often only part of the data transits through the channels. Tasks also communicate via shared memory; synchronization mechanisms like locks and barriers might be required. Effects of I/O on the traffic on the interconnect also have to be taken into account, all together increasing traffic to and from memory. Our clustered MPSoC architecture is modeled with SoCLib. SoCLib's design space exploration tool proposes, among others, communication channels and shared memory for inter-task communication. Each consists of one of several software objects which are mapped to on-chip memory. The difficulty when measuring latency is to find out which (co-)processor issued a request for a particular software object. We intervene early in the design process by monitoring the transfers on the interconnection network caused by the access to these software objects. We identify the software objects by name and trace the corresponding memory accesses. In spite of the cycle accurate bit accurate level of simulation, our method has little overhead and avoids distorting the performance results.","PeriodicalId":354964,"journal":{"name":"2013 8th International Workshop on Reconfigurable and Communication-Centric Systems-on-Chip (ReCoSoC)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134020844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A framework for effective exploitation of partial reconfiguration in dataflow computing","authors":"Riccardo Cattaneo, Xinyu Niu, C. Pilato, Tobias Becker, W. Luk, M. Santambrogio","doi":"10.1109/ReCoSoC.2013.6581535","DOIUrl":"https://doi.org/10.1109/ReCoSoC.2013.6581535","url":null,"abstract":"The exploitation of high-performance architectures based on reconfigurable hardware to build power efficient supercomputing clusters is becoming more and more common. Indeed, large speedups have already been demonstrated in several high-performance computing (HPC) applications. On the other hand, partial reconfiguration (PR) has the potential to further increase performance and power efficiency in many applications; however, there is currently very limited support for transforming a traditional design into a reconfigurable one. In this work, we introduce a design methodology for PR designs that combines application analysis, partitioning, mapping and scheduling, and supports fast exploration of various design options. These steps are integrated in an automated toolchain which allows a designer to implement reconfigurable designs with simple guidance through a graphical interface. We demonstrate our approach by applying the methodology to an image processing application, implementing the proposed design on a Maxeler MaxWorkstation.","PeriodicalId":354964,"journal":{"name":"2013 8th International Workshop on Reconfigurable and Communication-Centric Systems-on-Chip (ReCoSoC)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115342876","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploiting FPGA block memories for protected cryptographic implementations","authors":"S. Bhasin, J. Danger, S. Guilley, W. He","doi":"10.1145/2629552","DOIUrl":"https://doi.org/10.1145/2629552","url":null,"abstract":"Modern Field Programmable Gate Arrays (FPGAs) are power packed with features to facilitate designers. Availability of features like huge block memory (BRAM), Digital Signal Processing (DSP) cores, embedded CPU makes the design strategy of FPGAs quite different from ASICs. FPGAs are also widely used in security-critical applications where protection against known attacks is of prime importance. We focus on physical attacks which target physical implementations. To design countermeasures against such attacks, the strategy for FPGA designers should also be different from that in ASIC. The available features should be exploited to design compact and strong countermeasures. In this paper, we propose methods to exploit the BRAMs in FPGAs for designing compact countermeasures. BRAM can be used to optimize intrinsic countermeasures like masking and dual-rail logic, which otherwise have significant overhead (at least 2X). The optimizations are applied on a real AES-128 co-processor and tested for area overhead and resistance on Xilinx Virtex-5 chips. The presented masking countermeasure has an overhead of only 16% when applied on AES. Moreover Dual-rail Precharge Logic (DPL) countermeasure has been optimized to pack the whole sequential part in the BRAM, hence enhancing the security. Proper robustness evaluations are conducted to analyze the optimization for area and security.","PeriodicalId":354964,"journal":{"name":"2013 8th International Workshop on Reconfigurable and Communication-Centric Systems-on-Chip (ReCoSoC)","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114461270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hardware/software co-compilation with the Nymble system","authors":"Jens Huthmann, B. Liebig, J. Oppermann, A. Koch","doi":"10.1109/ReCoSoC.2013.6581538","DOIUrl":"https://doi.org/10.1109/ReCoSoC.2013.6581538","url":null,"abstract":"The Nymble compiler system accepts C code, annotated by the user with partitioning directives, and translates the indicated parts into hardware accelerators for execution on FPGA-based reconfigurable computers. The interface logic between the remaining software parts and the accelerators is automatically created, taking into account details such as cache flushes and copying of FPGA-local memories to the shared main memory. The system also supports calls from hardware back into software, both for infrequent operations that do not merit hardware area, as well as for using operating system / library services such as memory management and I/O.","PeriodicalId":354964,"journal":{"name":"2013 8th International Workshop on Reconfigurable and Communication-Centric Systems-on-Chip (ReCoSoC)","volume":"312 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122020641","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Simulation framework for cycle-accurate RTL modeling of partial run-time reconfiguration in VHDL","authors":"Simen Gimle Hansen, Dirk Koch, J. Tørresen","doi":"10.1109/ReCoSoC.2013.6581519","DOIUrl":"https://doi.org/10.1109/ReCoSoC.2013.6581519","url":null,"abstract":"Partial run-time reconfiguration has brought forward a new dimension and many new possibilities when designing systems. However, it also leads to many new challenges that need to be addressed for partial run-time reconfiguration to be successful. One of the most significant challenges is how to perform functional verification of systems using partial run-time reconfiguration. In this paper, we propose a simulation framework for functional modeling and verification of partial run-time reconfiguration at the Register Transfer Level (RTL) using VHDL. The proposed simulation framework provides cycle-accurate modeling of the reconfiguration process using the real bitstream file, and supports both island-based and slot-based reconfigurable design styles. For slot-based design styles, the simulation framework supports modules that occupy either one slot or multiple slots, as well as module relocation.","PeriodicalId":354964,"journal":{"name":"2013 8th International Workshop on Reconfigurable and Communication-Centric Systems-on-Chip (ReCoSoC)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122108462","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bitfile preservation - Generation of reusable out of context modules","authors":"C. Stüllein, N. Abel, U. Kebschull","doi":"10.1109/ReCoSoC.2013.6581544","DOIUrl":"https://doi.org/10.1109/ReCoSoC.2013.6581544","url":null,"abstract":"This paper presents the idea of bitfile preservation which enables the re-use of partial bitfiles in different environments and at different positions of the FPGA without any re-compilation. This way, the behaviour of a system can be changed on the fly just by plugging together different pre-compiled modules (represented by partial bitfiles). These modules can be developed without any knowledge of the final surrounding system. They may even be provided by third-party vendors as closed-source components. Current implementations demonstrate that it is possible to enhance the standard partial reconfiguration vendor flow in a way that enables bitfile preservation. Moreover, the already established cooperations prove that bitfile preservation is highly relevant for contemporary FPGA industry.","PeriodicalId":354964,"journal":{"name":"2013 8th International Workshop on Reconfigurable and Communication-Centric Systems-on-Chip (ReCoSoC)","volume":"202 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133823079","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}