{"title":"Binary stochastic implementation of digital logic","authors":"Yanzi Zhu, Peiran Suo, K. Bazargan","doi":"10.1145/2554688.2554778","DOIUrl":"https://doi.org/10.1145/2554688.2554778","url":null,"abstract":"Stochastic computing refers to a mode of computation in which numbers are treated as probabilities implemented as 0/1 bit streams, which essentially is a unary encoding scheme. Previous work has shown significant reduction in area and increase in fault tolerance for low to medium resolution values (6-10 bits). However, this comes at very high latency cost. We propose a novel hybrid approach combining traditional binary with unary stochastic encoding, called binary stochastic. Similar to the binary representation, it is a positional number system, but instead of only 0/1 digits, the digits would be fractions. We show how simple logic such as adders and multipliers can be implemented, and then show more complex function implementations such as the gamma correction function and functions such as tanh, absolute and exponentiation using both combinational and sequential binary stochastic logic. Our experiments show significant reduction in latency compared to unary stochastic, while using significantly smaller area compared to binary implementations on FPGAs.","PeriodicalId":390562,"journal":{"name":"Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays","volume":"44 10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130490016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A new basic logic structure for data-path computation (abstract only)","authors":"P. Gaillardon, L. Amarù, G. Micheli","doi":"10.1145/2554688.2554701","DOIUrl":"https://doi.org/10.1145/2554688.2554701","url":null,"abstract":"Nowadays, Field Programmable Gate Arrays (FPGA) implement arithmetic functions using specific circuits at the logic block level, such as the carry paths, or at the structure level adopting Digital Signal Processing (DSP) blocks. Nevertheless, all these approaches, introduced to ease the realization of specific functions, lack generality. In this paper, we introduce a new logic block that natively realizes arithmetic functions while preserving the versatility to implement general logic functions. It consists of a partially interconnected matrix of signal routers driven by comparators. We demonstrate that this structure can realize (i) any 2-output 2-input logic function or (ii) any single-output 3-input logic function or (iii) specific logic, such as arithmetic functions, with up to 4 outputs and 8 inputs. As compared to a standard 6-input Look Up Table (LUT), the proposed block requires roughly the same area but is 35.3% faster. Even though the proposed block does not have the same exhaustive configurability as a 6-input LUT, there are arithmetic functions realizable in a single block that do not fit in one, or even several, 6-input LUTs. For example, a single block inherently implements an entire 3-bit adder that requires 3× more resources with LUTs plus custom circuitry. From a system level perspective, we show that a 256-bit adder is implemented with a gain on area×delay product of 31% as compared to its traditional LUT-based counterpart.","PeriodicalId":390562,"journal":{"name":"Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127897732","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An automatic netlist and floorplanning approach to improve the MTTR of scrubbing techniques (abstract only)","authors":"Bernhard Schmidt, Daniel Ziener, J. Teich","doi":"10.1145/2554688.2554730","DOIUrl":"https://doi.org/10.1145/2554688.2554730","url":null,"abstract":"We introduce a new SEU mitigation approach which minimizes the scrubbing effort by a) using an automatic classification of the criticality of netlist instances and their resulting configuration bits, and by b) minimizing the number of frames which must be scrubbed by using intelligent floorplanning. The criticality of configuration bits is defined by the actions needed to correct a radiation-induced SEU at this bit. Indeed, circuits that involve feedback loops might continue to malfunction indefinitely even if scrubbing is applied to the involved configuration frames. Here, only supplementary state-restoring might be a viable solution. By analyzing an FPGA design already at the logic level and partitioning the configuration bits of the resulting FPGA mapping into so-called essential bits and critical bits, we are able to significantly reduce the number of time-consuming state-restoring actions. Moreover, by using placement and routing constraints, it is shown how to minimize the number of frames which have to be reconfigured or checked when using scrubbing. By applying both methods, we will show a reduction of the Mean-Time-To-Repair (MTTR) for sequential benchmark circuits by up to 48.5% compared to a state-of-the-art approach.","PeriodicalId":390562,"journal":{"name":"Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays","volume":"119 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123259606","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Energy-efficient multiplier-less discrete convolver through probabilistic domain transformation","authors":"Mohammed Alawad, Yu Bai, R. Demara, Mingjie Lin","doi":"10.1145/2554688.2554769","DOIUrl":"https://doi.org/10.1145/2554688.2554769","url":null,"abstract":"Energy efficiency and algorithmic robustness typically are conflicting circuit characteristics, yet with CMOS technology scaling towards 10-nm feature size, both become critical design metrics simultaneously for modern logic circuits. This paper proposes a novel computing scheme hinged on probabilistic domain transformation aiming for both low power operation and fault resilience. In such a computing paradigm, algorithm inputs are first encoded through probabilistic means, which translates the input values into a number of random samples. Subsequently, light-weight operations, such as simple additions, will be performed on these random samples in order to generate new random variables. Finally, the resulting random samples will be decoded probabilistically to give the final results.","PeriodicalId":390562,"journal":{"name":"Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114078938","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Redefining the role of FPGAs in the next generation avionic systems (abstract only)","authors":"V. Viswanathan, R. B. Atitallah, J. Dekeyser, Benjamin Nakache, M. Nakache","doi":"10.1145/2554688.2554744","DOIUrl":"https://doi.org/10.1145/2554688.2554744","url":null,"abstract":"Embedded reconfigurable computing is becoming a new paradigm for system designers in avionic applications. In fact, FPGAs can be used for more than just computational purposes in order to improve the system performance. The introduction of the FPGA Mezzanine Card (FMC) I/O standard has given a new purpose for FPGAs to be used as a communication platform. Taking into account the features offered by FPGAs and FMCs, such as runtime reconfiguration and modularity, we have redefined the role of these devices to be used as a generic communication and computation-centric platform. A new modular, runtime reconfigurable, Intellectual Property (IP)-based communication-centric platform for avionic applications has been designed. This means that, when the communication requirement of an avionic system changes, the necessary communication protocol is installed and executed on demand, without disturbing the normal operation of a time-critical avionic system. The efficiency and the performance of our platform are illustrated through a real industrial use-case designed using a computationally intensive application and several avionic I/O bus standards. The reconfiguration latency can be hidden entirely in many cases, while in certain others the overhead of reconfiguration can be justified by the reduction in resource utilization.","PeriodicalId":390562,"journal":{"name":"Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116212236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Power estimation tool for system on programmable chip based platforms (abstract only)","authors":"S. Rethinagiri, Oscar Palomar, A. Cristal, O. Unsal","doi":"10.1145/2554688.2554718","DOIUrl":"https://doi.org/10.1145/2554688.2554718","url":null,"abstract":"The ever-increasing complexity of applications results in the development of power-hungry processors. There is a scarcity of standalone tools that have a good trade-off between estimation speed and accuracy to estimate power/energy at an earlier phase of the design flow. There are very few tools that address the design space exploration issue based on power and energy. In this paper, we propose a virtual platform based standalone power and energy estimation tool for System-on-Programmable Chip (SoPC) embedded platforms, which is independent of in-house tools. There are two steps involved in this tool development. The first step is power model generation. For the power model development, we used functional parameters to set up generic power models for the different parts of the system. This is a one-time activity. In the second step, a simulation based virtual platform framework is developed to evaluate accurately the activities used in the related power models developed in the first step. The combination of the two steps leads to a hybrid power estimation, which gives a better trade-off between accuracy and speed. The proposed tool has several benefits: it considers the power consumption of the embedded system in its entirety and leads to accurate estimates without costly and complex equipment. The proposed tool is also scalable for exploring complex embedded multi-core architectures. The effectiveness of our proposed tool is validated through a dual-core RISC processor designed around the FPGA board and extended to accommodate futuristic multi-core processors for a reliable energy based design space exploration. The accuracy of our proposed tool is evaluated by using a variety of industrial benchmarks such as Multimedia, EEMBC and SPEC2006. Estimated power values are compared to real board measurements and also to McPAT. Our obtained power/energy estimation results provide less than 9% error for a heterogeneous MPSoC based system and are 200% faster compared to other state-of-the-art power estimation tools.","PeriodicalId":390562,"journal":{"name":"Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays","volume":"09 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116536653","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Tools and methods","authors":"J. Anderson","doi":"10.1145/3260938","DOIUrl":"https://doi.org/10.1145/3260938","url":null,"abstract":"","PeriodicalId":390562,"journal":{"name":"Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123799353","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Co-processing with dynamic reconfiguration on heterogeneous MPSoC: practices and design tradeoffs (abstract only)","authors":"Chao Wang, Xi Li, Xuehai Zhou, Yunji Chen, K. Bertels","doi":"10.1145/2554688.2554695","DOIUrl":"https://doi.org/10.1145/2554688.2554695","url":null,"abstract":"Reconfiguration has been considered one of the most promising electronic design automation (EDA) technologies in MPSoC design paradigms. However, due to the unavoidable latency in the reconfiguration procedure, it still poses a significant challenge to efficiently analyze the trade-offs for the software/hardware execution, static reconfiguration and dynamic reconfiguration. In this paper we first present a heterogeneous MPSoC middleware to support state-of-the-art dynamic partial reconfigurable technologies. Furthermore, we evaluate the reconfiguration latency and analyze the trade-off for the dynamic partial reconfiguration technologies. As a practical study, a heterogeneous MPSoC prototype with a JPEG application has been developed on a Xilinx Zynq FPGA with state-of-the-art static/dynamic partial reconfigurable technologies. Experimental results on the JPEG case studies demonstrated the leverage among the software execution, hardware execution, and static/dynamic reconfiguration. For the quantitative approach, we have demonstrated the execution time for the different configurations of the hardware steps in JPEG, and the quantitative impact of the dynamic reconfiguration execution. The dynamic reconfiguration could gain performance benefits for large scale (larger than a certain threshold) computational tasks. Furthermore, overheads and HWICAP hardware utilization have been measured and discussed. This work was supported by the NSFC grants No. 61379040, No. 61272131 and No. 61202053.","PeriodicalId":390562,"journal":{"name":"Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays","volume":"162 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116161565","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A FPGA prototype design emphasis on low power technique","authors":"Xu Hanyang, Wang Jian, Jin Meilai","doi":"10.1145/2554688.2554762","DOIUrl":"https://doi.org/10.1145/2554688.2554762","url":null,"abstract":"In this paper, we propose a fully-functional Nanometer FPGA prototype chip. Compared to traditional single supply voltage, single threshold voltage design, we explore low power nanometer FPGA design challenges with Multi-Vt, Static Voltage Scaling and sleep mode techniques. Compared to Dynamic Voltage Scaling (DVS), we make a table of Voltage-Delay parameter pairs under different voltage conditions so that timing information can be calculated by a Static Timing Analysis (STA) tool. Thus the lowest supply power is chosen among all results which meet the timing requirements. This approach would simplify the hardware design since we don't need a complex workload detection circuit compared to a DVS system. By separating supply voltages, we can directly shut down the power supply of the unused circuits. Compared to inserting a sleep transistor in pull-up or pull-down networks, we can eliminate the speed penalty caused by the additional sleep transistor. We implement a tile-based heterogeneous architecture with island style routing and embedded specific blocks such as DSP and memory. The array size is 64×31 (Row×Col) including 64×24 CLBs. The final design is fabricated using a 1P10M 65-nm bulk CMOS process. Test results show a 53% reduction in static power compared to a commercial FPGA device which is also fabricated in a 65-nm process and has a similar array size.","PeriodicalId":390562,"journal":{"name":"Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121336954","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Control signal aware slice-level window based legalization method for FPGA placement (abstract only)","authors":"Yu Wang, Donghoon Yeo, Muhammad Sohail, Hyunchul Shin","doi":"10.1145/2554688.2554727","DOIUrl":"https://doi.org/10.1145/2554688.2554727","url":null,"abstract":"The control signal sharing while packing flip-flops and other instances in slices is a necessary constraint in the placement of instances in FPGAs. Global placement usually does not consider signal sharing. In this paper, we propose a control signal aware slice-level packing algorithm within the framework of window based legalization method to obtain an optimized legal layout, satisfying all constraints, after global placement. We select a target window with the highest number of overlaps. Then, we check the capacity of the target window and adjust its size to secure enough space required for legalization. Lastly, window based legalization takes three constraints into account: 1) Control Signal Sharing: Two Flip-Flops in a slice must share a single control signal in FPGA architecture. 2) CLB Architecture Matching: Instances should be placed within a half slice to minimize the routing requirement. 3) Slice Level Packing: Instances are packed into slices for effective utilization of available empty space within a window. 
The experimental results show that our algorithm performs better with 45% less block displacement and 10% less runtime with the same wirelength when compared to a previous well-known mixed size block greedy legalization method [1].","PeriodicalId":390562,"journal":{"name":"Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127764393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}