{"title":"Automatic generation of run-time parameterizable configurations","authors":"Karel Bruneel, D. Stroobandt","doi":"10.1109/FPL.2008.4629964","DOIUrl":"https://doi.org/10.1109/FPL.2008.4629964","url":null,"abstract":"In many applications, subsequent data manipulations differ only in a small set of parameter values. Because of their reconfigurability, FPGAs (field programmable gate arrays) can be configured with an optimized configuration every time the parameter values change. These optimized configurations are smaller and faster than their generic counterparts. However, the overhead involved in generating the configurations at run-time with conventional tools is very large. This paper introduces an automatic method for generating runtime parameterizable configurations from arbitrary Boolean circuits. These configurations in which some of the configuration bits are expressed as a function of a set of parameters enable very fast run-time specialization since specialization only involves evaluating these functions. Our approach is validated on adaptive filtering. We show that the specialized filter configurations produced by our method are 2.3 times smaller and 36% faster than a generic filter configuration and that these configurations can be generated in on average 166 μs. Being a generic method, run-time hardware optimization suddenly becomes feasible for a large class of applications.","PeriodicalId":137963,"journal":{"name":"2008 International Conference on Field Programmable Logic and Applications","volume":"83 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132203281","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Resource allocation algorithm and OpenMP extensions for parallel execution on a heterogeneous reconfigurable platform","authors":"V. Sima, E. Panainte, K. Bertels","doi":"10.1109/FPL.2008.4630031","DOIUrl":"https://doi.org/10.1109/FPL.2008.4630031","url":null,"abstract":"In this paper, we present the compiler extensions, based on OpenMP libraries, needed for supporting parallel execution on the reconfigurable Molen platform. More specifically, we propose an ILP algorithm to map parallel applications on the target platform, assuming that for a section of the application, the designer can select from a set of hardware implementations with different area and speedup features. Based on profile information, the algorithm aims to minimize the total execution time of the running threads, taking into account the limited reconfigurable area. We show that the speedup of our algorithm compared to other related algorithms is up to 1.9 times for a real application and the real hardware implementation of the kernels. We also investigate the impact of several factors such as the size of the reconfigurable area and the number of threads on our algorithm and determine the range of parameters for which the algorithm is efficient.","PeriodicalId":137963,"journal":{"name":"2008 International Conference on Field Programmable Logic and Applications","volume":"8 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114034806","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SPP1148 booth: Network processors","authors":"Thilo Pionteck, R. Koch, C. Albrecht, E. Maehle, Michael Meitinger, Rainer Ohlendorf, Thomas Wild, A. Herkersdorf","doi":"10.1109/FPL.2008.4629960","DOIUrl":"https://doi.org/10.1109/FPL.2008.4629960","url":null,"abstract":"Traditional design of network processors is complicated by two conflicting demands, flexibility and performance. On the one side, network processors should be flexible enough to adapt to changing protocols and varying traffic profiles, on the other side they have to cope with increasing data rates of network links. This demonstrator shows that runtime reconfigurable systems have the potential to optimise both criteria without affecting each other negatively. The demonstrator addresses edge router applications and consists of two independently developed subsystems, the FlexPath NP architecture designed at the TU München and the Dyna-CORE architecture designed at the University of Lübeck.","PeriodicalId":137963,"journal":{"name":"2008 International Conference on Field Programmable Logic and Applications","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114485155","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An ILP formulation for architectural synthesis and application mapping on FPGA-based hybrid multi-processor SOC","authors":"Jason Wu, John W. Williams, N. Bergmann","doi":"10.1109/FPL.2008.4629981","DOIUrl":"https://doi.org/10.1109/FPL.2008.4629981","url":null,"abstract":"In this paper, we present an ILP formulation to assist designers to identify the architectural design, binding schema and scheduling algorithm while satisfying physical constraints such as available logic resources, computation time and memory usage. Directing the solver to optimise for logic usage, execution time, or other parameters allows ease of exploration of the design space. This case study shows how a proposed ILP formulation solves the design exploration problem in the domain of FPGA-based MPSoC design.","PeriodicalId":137963,"journal":{"name":"2008 International Conference on Field Programmable Logic and Applications","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132293850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Application-adaptive reconfiguration of memory address shuffler for FPGA-embedded instruction-set processor","authors":"Youngsu Kwon, N. Eum","doi":"10.1142/S0218126610006748","DOIUrl":"https://doi.org/10.1142/S0218126610006748","url":null,"abstract":"Programmability requirement in reconfigurable systems necessitates the integration of soft processors in FPGAs. The extensive memory bandwidth sets a major performance bottleneck in soft processors for media applications. While the parallel memory system is a viable solution to account for a large amount of memory transactions in media processors, the memory access conflicts caused by multiple memory buses limit the overall performance. We propose and evaluate the configurable memory address shuffler to be integrated in the memory access arbiter for the parallel memory system in a soft processor. The novel address shuffling algorithm reallocates the decomposed memory sub-pages based on the access conflict graph obtained by profiling the memory access pattern of the application to produce the synthesizable code. The address shuffler efficiently translates the requested memory addresses into the shuffled addresses such that the amount of simultaneous accesses to the identical physical memory block diminishes. The reconfigurability of the address shuffler enables the adaptive address shuffling depending on the memory access pattern of an application running on the soft processor. The configurable address shuffler reduces the amount of access conflicts by 80% on average utilizing 1592 LUTs which is 14% of that of the processor.","PeriodicalId":137963,"journal":{"name":"2008 International Conference on Field Programmable Logic and Applications","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128083540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A technique for minimizing power during FPGA placement","authors":"Kristofer Vorwerk, Madhu Raman, J. Dunoyer, Yaun-Chung Hsu, A. Kundu, A. Kennings","doi":"10.1109/FPL.2008.4629937","DOIUrl":"https://doi.org/10.1109/FPL.2008.4629937","url":null,"abstract":"This paper considers the implementation of an annealing technique for dynamic power reduction in FPGAs. The proposed method comprises a power-aware objective function for placement and is implemented in a commercial tool. In particular, a capacitance model based on multi-dimensional nonlinear regression is described, as well as a new capacitance model for global nets. The importance and advantages of these models are highlighted in terms of the overall attainable reduction in power in a real, commercially-available architecture and tool flow. The results are quantified across a range of industrial benchmarks targeting the Actel® IGLOO™ FPGA architecture. Power measurements show that, across a suite of 120 industrial designs, the technique described in this paper reduces dynamic power by 13% on average, with only a 1% degradation in timing performance.","PeriodicalId":137963,"journal":{"name":"2008 International Conference on Field Programmable Logic and Applications","volume":"84 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121911778","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cluster architecture based on low cost reconfigurable hardware","authors":"C. Pedraza, Emilio Castillo, J. Castillo, C. Camarero, J. L. Bosque, J. Martínez, R. M. D. Llano","doi":"10.1109/FPL.2008.4630017","DOIUrl":"https://doi.org/10.1109/FPL.2008.4630017","url":null,"abstract":"The SMILE project accelerates scientific and industrial applications by means of a cluster of low-cost FPGA boards. With this approach the intensive calculation tasks are accelerated using the FPGA logic, while the communication patterns of the applications remain unchanged by using a Message Passing Library over Linux. This paper explains the cluster architecture: the SMILE nodes and the developed high-speed communication network for the FPGA RocketIO interfaces. A SystemC model developed to simulate the cluster is also detailed. In order to show the potential of the SMILE proposal a Content-Based Information Retrieval parallel application has been developed and compared with a HP cluster architecture in terms of response time and power consumption.","PeriodicalId":137963,"journal":{"name":"2008 International Conference on Field Programmable Logic and Applications","volume":"306 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125765073","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ATCA-based computation platform for data acquisition and triggering in particle physics experiments","authors":"Ming Liu, J. Lang, Shuo Yang, T. Perez, W. Kuehn, Hao Xu, D. Jin, Qiang Wang, Lu Li, Zhen'An Liu, Zhonghai Lu, A. Jantsch","doi":"10.1109/FPL.2008.4629946","DOIUrl":"https://doi.org/10.1109/FPL.2008.4629946","url":null,"abstract":"An ATCA-based computation platform for data acquisition and trigger applications in nuclear and particle physics experiments has been developed. Each compute node (CN) which appears as a field replaceable unit (FRU) in an ATCA shelf, features 5 Xilinx Virtex-4 FX60 FPGAs and up to 10 GBytes DDR2 memory. Connectivity is provided with 8 optical links and 5 Gigabit Ethernet ports, which are mounted on each board to receive data from detectors and forward results to outer shelves or PC farms with attached mass storage. Fast point-to-point on-board interconnections between FPGAs as well as the full-mesh shelf backplane provide flexibility and high bandwidth to partition algorithms and correlate results among them. The system represents a highly reconfigurable and scalable solution for multiple applications.","PeriodicalId":137963,"journal":{"name":"2008 International Conference on Field Programmable Logic and Applications","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122013317","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GICS: Generic interconnection system","authors":"Tamas Malek, Tomáš Martínek, J. Korenek","doi":"10.1109/FPL.2008.4629942","DOIUrl":"https://doi.org/10.1109/FPL.2008.4629942","url":null,"abstract":"The division of an application between a conventional processor and an acceleration card with FPGA chips has been proved as a suitable way for an acceleration of computationally intensive tasks. In such applications, the designer usually has to implement an interconnection between components placed in FPGA and the host system bus. This task is often complicated by different requirements of user components for throughput, latency of reading operations, need for DMA transfers etc. The objective of this work is to show a new approach for implementation of interconnection systems and to enable the designer to focus on the development of the target application. The proposed interconnection system is based on tree topology. The system eliminates the sensitivity of wide buses to the distance, supports the connection of components with different requirements for throughput, supports split transaction model and many other features. The proposed system is implemented and evaluated on chips with Virtex 5 technology.","PeriodicalId":137963,"journal":{"name":"2008 International Conference on Field Programmable Logic and Applications","volume":"89 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126023085","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Numerical function generators using bilinear interpolation","authors":"Shinobu Nagayama, Tsutomu Sasao, J. T. Butler","doi":"10.1109/FPL.2008.4629984","DOIUrl":"https://doi.org/10.1109/FPL.2008.4629984","url":null,"abstract":"Two-variable numerical functions are widely used in various applications, such as computer graphics and digital signal processing. Fast and compact hardware implementations are required. This paper introduces the bilinear interpolation method to produce fast and compact numerical function generators (NFGs) for two-variable functions. This paper also introduces a design method for symmetric two-variable functions. This method can reduce the memory size needed for symmetric functions by nearly half with small speed penalty. Experimental results show that the bilinear interpolation method can significantly reduce the memory size needed for two-variable functions, and the speed of NFGs based on the bilinear method is comparable to that of NFGs based on tangent plane approximation. For a complicated function, our NFG is faster and more compact than a circuit designed using a one-variable NFG.","PeriodicalId":137963,"journal":{"name":"2008 International Conference on Field Programmable Logic and Applications","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130188226","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}