{"title":"Area-efficient dynamically reconfigurable protocol-processing-hardware for access network communications SoC","authors":"Saki Hatta, N. Tanaka, S. Shigematsu","doi":"10.1109/ReConFig.2014.7032501","DOIUrl":"https://doi.org/10.1109/ReConFig.2014.7032501","url":null,"abstract":"Our proposed architecture of dynamically reconfigurable hardware for protocol processing (DRHPP) provides flexibility with high area efficiency. It can be used for a communications system-on-a-chip (SoC) in access networks. The DRHPP enables the modification and addition of various functions for protocol processing. Our architecture consists of three types of cells. The optimized number of these types of cells for the intended protocol processing can be implemented for increasing cell utilization, which can decrease the total area. Additionally, the best granularity for the cell also contributes to a decrease of the total area. We implemented a protocol-processing circuit using DRHPP for protocol-frame parser processing. Implementation results show the proposed architecture improves flexibility with only a 33% area penalty in comparison with a hard-wired protocol-processing circuit.","PeriodicalId":137331,"journal":{"name":"2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127172509","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A high-level analysis of a multi-core vision processor using SystemC and TLM2.0","authors":"J. Y. Mori, M. Hübner","doi":"10.1109/ReConFig.2014.7032491","DOIUrl":"https://doi.org/10.1109/ReConFig.2014.7032491","url":null,"abstract":"Vision Processors are integrated circuits that aim to put together sensors and processing elements on the same chip. There are several constraints a designer may take into account when developing a vision processor: available technology, power consumption, thermal management, fault tolerance, speed, silicon area and application-specific needs. Most of these vision processors are based on analog circuits and can perform only low-level processing, like filtering and contrast adjustment. Digital processing elements can allow for more programmability in such systems; however, the approaches found in the literature do not explore the integration of sensor and processing elements in an efficient way. In addition, it is envisioned that vision processors can take advantage of the recent Multi/Many-Core advances. In this work, a full integration is analyzed, exploring the spatial distribution of sensors and processors. All the design blocks were developed using the SystemC language with the TLM2.0 standard, in order to allow for a better ESL analysis. The exploration of pure LT and mixed LT/AT models is used to extract information about parallelism in data transfer and operations. An application with some well-known algorithms is analyzed for a variable number of cores, in order to validate the tool-set and the methodology used.","PeriodicalId":137331,"journal":{"name":"2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14)","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127421389","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Net reordering and multicommodity flow based global routing for FPGAs","authors":"Cristinel Ababei, R. Kavasseri, M. Zare","doi":"10.1109/ReConFig.2014.7032540","DOIUrl":"https://doi.org/10.1109/ReConFig.2014.7032540","url":null,"abstract":"The most popular algorithm for solving the routing problem for field programmable gate arrays (FPGAs) has virtually remained the same for the past two decades. It is essentially an iterative maze technique, such as Dijkstra's algorithm, applied to each net in the circuit repeatedly. During multiple routing iterations, nets are ripped-up and rerouted via different paths to resolve competition for routing resources or to improve circuit delay. The most popular implementation of such a routing approach is the PathFinder algorithm used inside the VPR tool [1]. The quality of the routing solution, however, depends on the order in which nets are processed during each of the routing iterations. This is commonly referred to as the net ordering problem. PathFinder addresses this problem through continuous updates of the cost associated with overusing routing resources. After each routing iteration, the cost of overusing a routing resource is increased based on the routing so far, so that the probability of resolving all congestion during future iterations increases. To further address the net ordering problem, in this paper, we investigate the effectiveness of two combined techniques to enhance PathFinder. We change the order in which nets are ripped-up and rerouted to give higher priority to nets with two, three, and more than eleven pins because these nets have the largest impact on the quality of the routing solution. Also, we alter the cost calculation during wave expansions for two-pin nets based on the global routing solution obtained by solving an equivalent multicommodity flow problem. Preliminary results suggest that the conventional FPGA routing solutions can still be improved.","PeriodicalId":137331,"journal":{"name":"2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117237362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"3D-LeukoNoC: A dynamic NoC protection","authors":"Martha Johanna Sepúlveda, G. Gogniat, Daniel Flórez, J. Diguet, C. Pedraza, M. Strum","doi":"10.1109/ReConFig.2014.7032485","DOIUrl":"https://doi.org/10.1109/ReConFig.2014.7032485","url":null,"abstract":"Weaknesses derived from Through-Silicon-Via (TSV)-based physical configurations of three-dimensional Multiprocessor Systems-on-Chip (3D-MPSoC) can be exploited by malicious software to attack the system. By manipulating vertical communication, an attacker is able to modify, spy on, and even deny TSV communication. In this paper we propose 3D-LeukoNoC, a flexible and efficient security architecture based on the 3D-NoC communication structure which, like the biological immune system, is able to detect attacks and isolate sensitive vertical communication while guaranteeing correct system behavior. We compare our approach with several 3D-protection proposals, including the simple extension of 2D security countermeasures. We show that our solution outperforms other approaches with respect to attack detection while decreasing cost and performance impact on the system.","PeriodicalId":137331,"journal":{"name":"2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14)","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115326624","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Place Reservation technique for online task placement on a multi-context heterogeneous reconfigurable architecture","authors":"Quang-Hoa Le, E. Casseau, A. Courtay","doi":"10.1109/ReConFig.2014.7032553","DOIUrl":"https://doi.org/10.1109/ReConFig.2014.7032553","url":null,"abstract":"Dynamically and partially reconfigurable architectures, like FPGAs, have increasingly become heterogeneous with DSP, RAM and communication interface blocks. However, in most online FPGA task placement approaches, the FPGA is modeled as a homogeneous architecture. In this work, we propose a heuristic that focuses on the online task placement problem on a multi-context, dynamically and partially reconfigurable heterogeneous architecture. The well-known Configuration Prefetching and Anti-fragmentation techniques are combined with the Place Reservation technique in order to improve resource usage capacity. Compared to a placement without reservation, our approach improves the number of placed tasks by 33% and the resource utilization rate by 46%, on average.","PeriodicalId":137331,"journal":{"name":"2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128485802","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Characterization of OpenCL on a scalable FPGA architecture","authors":"Shanyuan Gao, Jeremy Chritz","doi":"10.1109/ReConFig.2014.7032505","DOIUrl":"https://doi.org/10.1109/ReConFig.2014.7032505","url":null,"abstract":"The recent release of Altera's SDK for OpenCL has greatly eased the development of FPGA-based systems. Research has shown performance improvements brought by OpenCL using a single FPGA device. However, to meet the objectives of high performance computing, OpenCL needs to be evaluated using multiple FPGAs. This work proposes a scalable FPGA architecture for high performance computing. The design includes multiple FPGA modules and a high performance backplane. The modular nature of this architecture supports the combination of different FPGAs and allows for easy hardware updates. FPGA modules based on Stratix V are compatible with Altera's OpenCL tool flow. The evaluation tested the native IO performance of the architecture, and the results demonstrate scalability using six FPGAs. The host-to-device peak bandwidth is measured as 13.1 GB/s for read operations and 12.1 GB/s for write operations. The FPGA-to-memory bandwidth is measured as 64.5 GB/s in total. An OpenCL AES kernel is selected to test the scalable multi-FPGA architecture. The test results show that peak throughput is achieved when six FPGAs are used. The throughput per watt shows 5× improvement using four FPGAs, over a general-purpose processor.","PeriodicalId":137331,"journal":{"name":"2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122143017","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mission control: A performance metric and analysis of control logic for pipelined architectures on FPGAs","authors":"Sam Skalicky, S. López, M. Lukowiak, Christopher A. Wood","doi":"10.1109/ReConFig.2014.7032539","DOIUrl":"https://doi.org/10.1109/ReConFig.2014.7032539","url":null,"abstract":"The performance of a pipelined architecture is often limited by incorrectly designed or poorly implemented control logic. Once a design is implemented and meets timing constraints, the mission is to evaluate if it is achieving optimum performance. At this stage, the number of pipelines and functional units are fixed and the amount of resources and memory bandwidth are finalized. If a design is performing suboptimally the only recourse is to improve the control logic. In this paper we present a metric to quantify the achievable performance of a design and use it to analyze performance degradation due to control logic. We analyze the control logic of existing architectures and present improvements that achieve speedups of up to 10.7×.","PeriodicalId":137331,"journal":{"name":"2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14)","volume":"748 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133556964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A power-efficient real-time architecture for SURF feature extraction","authors":"C. Wilson, P. Zicari, S. Craciun, P. Gauvin, E. Carlisle, A. George, H. Lam","doi":"10.1109/ReConFig.2014.7032492","DOIUrl":"https://doi.org/10.1109/ReConFig.2014.7032492","url":null,"abstract":"This paper presents a novel FPGA-based architecture for the Speeded-Up Robust Feature (SURF) extractor. By leveraging the inherent parallelism of the SURF algorithm, we designed a fully pipelined architecture implemented on the FPGA fabric of a Xilinx Zynq-7020 device (XC7Z020CLG484-1). Compared with other high-performing SURF designs in the literature, our implementation achieved the highest frame rate (131.36 fps) while compactly fitting on a single device and consuming only 0.608 Watts of average power. An experimental platform featuring a 640×480 resolution camera was used to compare the proposed design with OpenSURF, a widely used open-source C++ library, running on a high-end Intel i7 processor. Our system achieved real-time performance independent of the number of interest points extracted from the targeted image, and consistently outperformed the SURF software baseline, reaching a maximum speedup of 15. An extensive analysis was conducted to prove that the performance of our proposed architecture was as robust as the SURF algorithm to image transformations (rotation and scaling) and image distortions (blurring and pixelation), demonstrating that interest-point repeatability was maintained under varying viewing conditions.","PeriodicalId":137331,"journal":{"name":"2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134000937","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enabling FPGA support in Matlab based heterogeneous systems","authors":"Sam Skalicky, Tyler Kwolek, S. López, M. Lukowiak","doi":"10.1109/ReConFig.2014.7032515","DOIUrl":"https://doi.org/10.1109/ReConFig.2014.7032515","url":null,"abstract":"FPGAs have been shown to provide orders of magnitude improvement over CPUs and GPUs in terms of absolute performance and energy efficiency for various kernels such as Cholesky decomposition, matrix inversion, and FFT among others. Despite this, the overall performance of many applications suffers when implemented entirely in FPGAs. Combining FPGAs with CPUs and GPUs provides the range of capabilities needed to support diverse computational requirements of applications. Integrating FPGAs into these systems challenges application developers with constructing hardware kernel implementations and interfacing from the low level hardware logic in the FPGA to the high speed networks that connect processors in the system. In this work we extend the compute capabilities of Matlab by incorporating support for FPGAs and automating the parallel code generation. We characterize the system and evaluate the performance gains that can be achieved by adding the FPGA for two compute intensive applications. We present performance results for medical imaging and fluid dynamics applications implemented in a CPU+GPU+FPGA system and achieve up to 40× improvement compared to the standard Matlab CPU+GPU environment.","PeriodicalId":137331,"journal":{"name":"2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131214030","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TNT10G: A high-accuracy 10 GbE traffic player and recorder for multi-Terabyte traces","authors":"J. F. Zazo, Marco Forconesi, S. López-Buedo, G. Sutter, J. Aracil","doi":"10.1109/ReConFig.2014.7032561","DOIUrl":"https://doi.org/10.1109/ReConFig.2014.7032561","url":null,"abstract":"In this paper we present TNT10G (multi-Terabyte trace Network Tester), an FPGA-based tool for replaying and capturing massive Ethernet traces at 10 Gb/s. The tool is capable of reproducing and storing terabytes of network traffic at line rate, even if small packets are being used. Moreover, since the design works at low level (XGMII), accuracy is better than 10 ns, and it is also possible to observe and generate anomalous conditions, such as malformed frames, FCS errors, or illegal inter-frame gaps. All such features make TNT10G a truly useful tool for network testing and monitoring at 10 Gb/s. The design uses the NetFPGA-10G platform, although it could be easily ported to other boards since it uses standard AXI buses. The key element to achieve line-rate operation is a custom-developed Linux driver, which works in conjunction with a high-speed DMA backend core from Northwest Logic. Such blocks, together with a RAID0 array of commodity SSD disks, enable operation at 10 Gb/s. Finally, the use of a low-cost academic board together with off-the-shelf components allows for an open, extensible and cost-effective solution, a unique combination not found in commercial products.","PeriodicalId":137331,"journal":{"name":"2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129260889","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}