{"title":"Session details: Tools and models 2","authors":"K. Rupnow","doi":"10.1145/3260943","DOIUrl":"https://doi.org/10.1145/3260943","url":null,"abstract":"","PeriodicalId":390562,"journal":{"name":"Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays","volume":"103 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131947754","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lin Meng, K. Matsuyama, Naoto Nojiri, T. Izumi, K. Yamazaki
{"title":"Pipelining FPPGA-based defect detction in FPDs (abstract only)","authors":"Lin Meng, K. Matsuyama, Naoto Nojiri, T. Izumi, K. Yamazaki","doi":"10.1145/2554688.2554729","DOIUrl":"https://doi.org/10.1145/2554688.2554729","url":null,"abstract":"The real-time detection of defects in Flat-Panel Displays (FPDs) is very important during the production stages. This paper describes the manner in which defects induced by bubbles are detected as fast as possible by using 4-stage image processing pipelines with 3-line buffers on a Field-Programmable Gate Array (FPGA). The image processing consists of reading a Time Delay Integration (TDI) image, Laplacian filtering, binarization, and labeling. TDI is applied to the initial image of the FPD to reduce noises induced when taking the FPD images. Laplacian filtering and binarization are used to detect the edges in the image, and labeling is used to number the objects in the image for defect detection. In the 4-stage pipelining, the first stage reads the TDI image from the Block Random Access Memory (BRAM), the second stage implements Laplacian filtering and binarization, the third stage implements labeling, and the final stage revises the labels and writes them into the BRAM. The target pixel and its eight surrounding neighbors are required during Laplacian filtering, and four neighbors are necessary during labeling. Thus, three line registers (3-line buffer) are used as a general pipeline register between two neighboring stages in our system. The pipelining system accesses these 3-line buffers and runs four image processing steps in parallel. Therefore, the system uses four different addresses to access the BRAM and the 3-line buffers. Further, to facilitate performance comparison, we implemented sequential image processing systems with 3-line buffers on FPGA and CPU software. The experiments reveal that Laplacian filtering, binarization, and labeling for FPD defect detection can be executed in less than 1 ms by using four-stage pipelining on an FPGA, which is 3.62 times faster than the sequential system and 158.7 times faster than the CPU software. The pipelining system is 28% larger as compared to the sequential system in terms of the size of the LUTs.","PeriodicalId":390562,"journal":{"name":"Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays","volume":"94 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129512714","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scalable multi-access flash store for big data analytics","authors":"S. Jun, Ming Liu, Kermin Fleming, Arvind","doi":"10.1145/2554688.2554789","DOIUrl":"https://doi.org/10.1145/2554688.2554789","url":null,"abstract":"For many \"Big Data\" applications, the limiting factor in performance is often the transportation of large amount of data from hard disks to where it can be processed, i.e. DRAM. In this paper we examine an architecture for a scalable distributed flash store which aims to overcome this limitation in two ways. First, the architecture provides a high-performance, high-capacity, scalable random-access storage. It achieves high-throughput by sharing large numbers of flash chips across a low-latency, chip-to-chip backplane network managed by the flash controllers. The additional latency for remote data access via this network is negligible as compared to flash access time. Second, it permits some computation near the data via a FPGA-based programmable flash controller. The controller is located in the datapath between the storage and the host, and provides hardware acceleration for applications without any additional latency. We have constructed a small-scale prototype whose network bandwidth scales directly with the number of nodes, and where average latency for user software to access flash store is less than 70mus, including 3.5mus of network overhead.","PeriodicalId":390562,"journal":{"name":"Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays","volume":"167 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133690086","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wenyi Feng, J. Greene, Kristofer Vorwerk, V. Pevzner, A. Kundu
{"title":"Rent's rule based FPGA packing for routability optimization","authors":"Wenyi Feng, J. Greene, Kristofer Vorwerk, V. Pevzner, A. Kundu","doi":"10.1145/2554688.2554763","DOIUrl":"https://doi.org/10.1145/2554688.2554763","url":null,"abstract":"Packing is a critical step in the CAD flow for cluster-based FPGA architectures, and has a significant impact on the quality of the final placement and routing results. One basic quality metric is routability. Traditionally, minimizing cut (the number of external signals) has been used as the main criterion in packing for routability optimization. This paper shows that minimizing cut is a sub-optimal criterion, and argues to use the Rent characteristic as the new criterion for FPGA packing. We further propose using a recursive bipartitioning-based k-way partitioner to optimize the Rent characteristic during packing. We developed a new packer, PPack2, based on this approach. Compared to T-VPack, PPack2 achieves 35.4%, 35.6%, and 11.2% reduction in wire length, minimal channel width, and critical path delay, respectively. These improvements show that PPack2 outperforms all previous leading packing tools (including iRAC, HDPack, and the original PPack) by a wide margin.","PeriodicalId":390562,"journal":{"name":"Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121315168","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A methodology for identifying and placing heterogeneous cluster groups based on placement proximity data (abstract only)","authors":"Farnaz Gharibian, Lesley Shannon, P. Jamieson","doi":"10.1145/2554688.2554726","DOIUrl":"https://doi.org/10.1145/2554688.2554726","url":null,"abstract":"Due to the rapid growth in the size of designs and Field Programmable Gate Arrays (FPGAs), CAD run-time has increased dramatically. Reducing FPGA design compilation times without degrading circuit performance is crucial. In this work, we describe a novel approach for incremental design flows that both identifies tightly grouped FPGA logic blocks and then uses this information during circuit placement. Our approach reduces placement run-time on average by more than 17% while typically maintaining the design's critical path delay and marginally increasing its minimum channel width and wire length on average. Instead of following the traditional approach of evaluating a circuit's pre-placement netlist, this new algorithm analyzes designs post-placement to detect proximity data. It uses this information to non-aggressively extract heterogeneous cluster groupings from the design, which we call \"gems,\" that consist of two to seventeen clusters. We modified VPR's simulated annealing placement algorithm to use our Singularity Placer, which first crushes each cluster grouping into a \"singularity,\" to be treated as a single cluster. We then run the annealer over this condensed circuit, followed by an expansion of the singularities, and a second annealing phase for the entire expanded circuit.","PeriodicalId":390562,"journal":{"name":"Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133113180","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}