Yuuri Sugihara, Yohei Kume, Kazutoshi Kobayashi, H. Onodera
{"title":"Performance optimization by track swapping on critical paths utilizing random variations for FPGAS","authors":"Yuuri Sugihara, Yohei Kume, Kazutoshi Kobayashi, H. Onodera","doi":"10.1109/FPL.2008.4629994","DOIUrl":"https://doi.org/10.1109/FPL.2008.4629994","url":null,"abstract":"Since FPGAs in future deep sub-micron processes will suffer from drastic speed and yield losses caused by device variations, we propose variation-aware reconfiguration that utilizes these variations for performance enhancement. To utilize random variations on a current deep submicron process for performance enhancement, optimizing each device from a common configuration is better than producing optimized configurations based on detailed measurement results. In this paper we apply a track swapping procedure to critical path reconfiguration. First, we configure all fabricated FPGAs with common configuration data. The configuration of each die is optimized to reroute the critical paths that do not satisfy timing specifications. The rerouting of a critical path usually causes serious topology changes that may prolong other paths and create new critical paths. In the track swapping procedure, we swap a wire track on a critical path for the adjacent track without any topology changes by switching blocks with more flexibility. We experiment on performance enhancement by applying track swapping to LGSynth93 benchmark circuits. The average speed enhancement is 2.45%, and the average yield enhancement is 32.7% when the standard deviation of the random variations is 10.0%.","PeriodicalId":137963,"journal":{"name":"2008 International Conference on Field Programmable Logic and Applications","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127418245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Meikang Qiu, Jiande Wu, C. Xue, J. Hu, Wei-Che Tseng, E. Sha
{"title":"Loop scheduling and assignment to minimize energy while hiding latency for heterogeneous multi-bank memory","authors":"Meikang Qiu, Jiande Wu, C. Xue, J. Hu, Wei-Che Tseng, E. Sha","doi":"10.1109/FPL.2008.4629983","DOIUrl":"https://doi.org/10.1109/FPL.2008.4629983","url":null,"abstract":"Many high-performance DSP processors employ multi-bank on-chip memory to improve performance and energy consumption. This architectural feature supports higher memory bandwidth by allowing multiple data memory accesses to be executed in parallel. This paper studies the scheduling and assignment problem of minimizing total energy consumption while satisfying timing constraints with heterogeneous multi-bank memory for applications with loops. An algorithm, TASL (Type Assignment and Scheduling for Loops), is proposed. The algorithm uses loop scheduling and assignment with the consideration of variable partition to find the best configuration for both memory and ALU.","PeriodicalId":137963,"journal":{"name":"2008 International Conference on Field Programmable Logic and Applications","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126747853","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A computation- and communication- infrastructure for modular special instructions in a dynamically reconfigurable processor","authors":"L. Bauer, M. Shafique, J. Henkel","doi":"10.1109/FPL.2008.4629932","DOIUrl":"https://doi.org/10.1109/FPL.2008.4629932","url":null,"abstract":"Processors with a reconfigurable instruction set combine the performance of dedicated application accelerators with a flexibility that goes beyond that of traditional application specific instruction set processors (ASIPs). The latter are optimized for certain application domains and thus typically do not provide a high performance and/or efficiency when deployed in other domains. State-of-the-art reconfigurable processors on the other side still use the concept of monolithic Special Instructions (SIs, i.e. the application accelerators). In our work, we instead present modular SIs as a hierarchy of elementary data paths and different SI implementations that facilitate a high flexibility and performance. This is a novel concept that achieves a speedup of 26.6x compared to a general purpose processor and 1.24x compared to a state-of-the-art reconfigurable processor (that is statically optimized for the predetermined benchmark situation) when executing an H.264 video encoder. We introduce a novel infrastructure for computation and communication that actually enables the implementation of modular SIs and offers various parameters to match specific requirements. The infrastructure is implemented and tested on an FPGA-based prototype to demonstrate its feasibility.","PeriodicalId":137963,"journal":{"name":"2008 International Conference on Field Programmable Logic and Applications","volume":"49 17","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114006334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Acceleration of a production rigid molecule docking code","authors":"Bharat Sukhwani, M. Herbordt","doi":"10.1109/FPL.2008.4629955","DOIUrl":"https://doi.org/10.1109/FPL.2008.4629955","url":null,"abstract":"Modeling the interactions of biological molecules, or docking, is critical to both understanding basic life processes and to designing new drugs. Here we describe the FPGA-based acceleration of a recently developed, complex, production docking code. We find that it is necessary to extend our previous 3D correlation structure in several ways, most significantly to support simultaneous computation of several correlation functions. The result is a hundred-fold speed-up of a section of the code that represents over 92% of the original run-time. An additional 4% is accelerated through a previously described method, yielding a total acceleration of almost 25 times for typical protein-ligand combinations.","PeriodicalId":137963,"journal":{"name":"2008 International Conference on Field Programmable Logic and Applications","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122401218","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"File system access from reconfigurable FPGA hardware processes in BORPH","authors":"Hayden Kwok-Hay So, R. Brodersen","doi":"10.1109/FPL.2008.4630010","DOIUrl":"https://doi.org/10.1109/FPL.2008.4630010","url":null,"abstract":"This paper presents the design and implementation of BORPH's kernel file system layer that provides FPGA processes direct access to the general file system. Using a semantics resembling that of conventional UNIX file I/Os, an FPGA accesses the file system through a special hardware system call interface. By extending the semantics of a UNIX pipe, a single file system access mechanism is used for both regular file I/O, as well as for hardware/software and hardware/hardware data streaming. An FPGA design may switch between different communication modes dynamically during run time by means of file redirection. Design trade-offs among system manageability, user usability and application performance are explored. An example of constructing a video processing system during run time using commodity software and FPGA applications connected by pipes is used to demonstrate the feasibility and potential of such FPGA-centric file system access capability.","PeriodicalId":137963,"journal":{"name":"2008 International Conference on Field Programmable Logic and Applications","volume":"96 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126975308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Combating process variation on FPGAS with a precise at-speed delay measurement method","authors":"Justin S. J. Wong, P. Cheung, N. P. Sedcole","doi":"10.1109/FPL.2008.4630046","DOIUrl":"https://doi.org/10.1109/FPL.2008.4630046","url":null,"abstract":"The goal of this PhD project is to devise a way to combat the effect of process variation on propagation delays in modern FPGAs. Through our research, we have devised a novel measurement method that is capable of measuring the delays of components on FPGAs with picosecond timing resolution and fine spatial granularity. The method avoids the use of external test equipment and is able to measure stochastic delay variability, which is becoming increasingly significant. The aim is to exhaustively test FPGA components based on this method and use the results to optimise the placement and routing of circuits in FPGAs to maximise performance under the negative influence of process variation.","PeriodicalId":137963,"journal":{"name":"2008 International Conference on Field Programmable Logic and Applications","volume":"266 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133244910","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kenshu Seto, Yuta Nonaka, T. Maruizumi, Y. Shiraki
{"title":"SAT-based resource binding for reducing critical path delays","authors":"Kenshu Seto, Yuta Nonaka, T. Maruizumi, Y. Shiraki","doi":"10.1109/FPL.2008.4629995","DOIUrl":"https://doi.org/10.1109/FPL.2008.4629995","url":null,"abstract":"In this paper, a new function unit binding approach based on SAT is proposed. Unlike previous approaches, which heuristically minimize the total numbers of inputs of multiplexers, the proposed approach generates SAT formulas that constrain the numbers of inputs of specific multiplexers to certain numbers and produces a solution that satisfies the constraints with a SAT solver. The proposed approach is applied to constrain the numbers of inputs of the multiplexers that lie between the input and output registers of multipliers, since these multiplexers are likely to be on critical paths. Experimental comparisons with a traditional approach show that the proposed approach is promising for reducing critical path delays.","PeriodicalId":137963,"journal":{"name":"2008 International Conference on Field Programmable Logic and Applications","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133954910","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An element-by-element preconditioned Conjugate Gradient solver of 3D tetrahedral finite elements on an FPGA coprocessor","authors":"Jing Hu, S. Quigley, A. Chan","doi":"10.1109/FPL.2008.4630012","DOIUrl":"https://doi.org/10.1109/FPL.2008.4630012","url":null,"abstract":"An element-by-element preconditioned conjugate gradient (PCG) iterative solver for the solution of 3D finite element analysis has been implemented on an FPGA-based 32-bit floating-point reconfigurable computer. The algorithm formulation has been chosen in order to optimize the match with the capabilities of the FPGA platform. It is capable of giving a speed-up of about 40 times compared to an optimized 64-bit software version running on a fast PC.","PeriodicalId":137963,"journal":{"name":"2008 International Conference on Field Programmable Logic and Applications","volume":"133 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134158961","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
S. Fekete, Tom Kamphans, Nils Schweer, Christopher Tessars, J. V. D. Veen, Josef Angermeier, Dirk Koch, J. Teich
{"title":"No-break dynamic defragmentation of reconfigurable devices","authors":"S. Fekete, Tom Kamphans, Nils Schweer, Christopher Tessars, J. V. D. Veen, Josef Angermeier, Dirk Koch, J. Teich","doi":"10.1109/FPL.2008.4629917","DOIUrl":"https://doi.org/10.1109/FPL.2008.4629917","url":null,"abstract":"We propose a new method for defragmenting the module layout of a reconfigurable device, enabled by a novel approach for dealing with communication needs between relocated modules and with inhomogeneities found in commonly used FPGAs. Our method is based on dynamic relocation of module positions during runtime, with only very little reconfiguration overhead; the objective is to maximize the length of contiguous free space that is available for new modules. We describe a number of algorithmic aspects of good defragmentation, and present an optimization method based on tabu search. Experimental results indicate that we can improve the quality of module layout by roughly 50% over static layout. Among other benefits, this improvement avoids unnecessary rejection of modules.","PeriodicalId":137963,"journal":{"name":"2008 International Conference on Field Programmable Logic and Applications","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134086596","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FPGA acceleration of Monte-Carlo based credit derivative pricing","authors":"Alexander Kaganov, P. Chow, A. Lakhany","doi":"10.1109/FPL.2008.4629953","DOIUrl":"https://doi.org/10.1109/FPL.2008.4629953","url":null,"abstract":"In recent years the financial world has seen an increasing demand for faster risk simulations, driven by growth in client portfolios. Traditionally many financial models employ Monte-Carlo simulation, which can take excessively long to compute in software. This paper describes a hardware implementation for collateralized debt obligations (CDOs) pricing, using the one-factor Gaussian copula (OFGC) model. We explore the precision requirements and the resulting resource utilization for each number representation. Our results show that our hardware implementation mapped onto a Xilinx XC5VSX50T is over 63 times faster than a software implementation running on a 3.4 GHz Intel Xeon processor.","PeriodicalId":137963,"journal":{"name":"2008 International Conference on Field Programmable Logic and Applications","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116151744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}