{"title":"Analysis of matrix multiplication on high density Virtex-7 FPGA","authors":"Wilson Jose, Ana Rita Silva, H. Neto, M. Véstias","doi":"10.1109/FPL.2013.6645604","DOIUrl":"https://doi.org/10.1109/FPL.2013.6645604","url":null,"abstract":"In this work, we have developed a theoretical model of matrix multiplication including a detailed model of external memory access. We have used the model to guide the design of a many core architecture. The architecture was modeled and simulated in SystemC and a small prototype was implemented in an FPGA board to determine the accuracy of the model. Finally, using the model, we determined the achievable performance in Virtex-7 FPGAs. The results indicate the correctness of the model and the performance of state-of-the-art FPGAs in the execution of matrix-multiplication.","PeriodicalId":200435,"journal":{"name":"2013 23rd International Conference on Field programmable Logic and Applications","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116635264","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Run-time optimization of a dynamically reconfigurable embedded system through performance prediction","authors":"Giovanni Mariani, V. Sima, G. Palermo, V. Zaccaria, G. Marchiori, C. Silvano, K. Bertels","doi":"10.1109/FPL.2013.6645523","DOIUrl":"https://doi.org/10.1109/FPL.2013.6645523","url":null,"abstract":"A key tool to increase the exploitation of dynamic reconfigurable platforms is the run-time resource manager. This system module coordinates the usage of both software and reconfigurable hardware resources in the context of a multi-programmed environment, by alleviating the operating system's induced overhead. This paper introduces a two-layers run-time resource manager for dynamic reconfigurable platforms. The upper level is composed of several application-level managers (one for each application) that provide the most suitable mapping based on resource constraints and performance prediction. The lower level is composed of a centralized system-level resource manager that assigns the HW/SW resources to each application. We present a video surveillance case study in which the proposed resource management technique outperforms the performance of the state of the art by 28% on average, introducing a computational time overhead within 2%.","PeriodicalId":200435,"journal":{"name":"2013 23rd International Conference on Field programmable Logic and Applications","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131160776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FEMIP: A high performance FPGA-based features extractor & matcher for space applications","authors":"S. Carlo, Giulio Gambardella, P. Prinetto, Daniele Rolfo, Pascal Trotta, P. Lanza","doi":"10.1109/FPL.2013.6645606","DOIUrl":"https://doi.org/10.1109/FPL.2013.6645606","url":null,"abstract":"Nowadays, Video-Based Navigation (VBN) is increasingly used in space-applications. The future space-missions will include this approach during the Entry, Descent and Landing (EDL) phase, in order to increase the landing point precision. This paper presents FEMIP: a high performance FPGA-based features extractor and matcher tuned for space applications. It outperforms the current state-of-the-art, ensuring a higher throughput and a lower hardware resources usage.","PeriodicalId":200435,"journal":{"name":"2013 23rd International Conference on Field programmable Logic and Applications","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131558404","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rapid modular assembly of Xilinx FPGA designs","authors":"A. Love, P. Athanas","doi":"10.1109/FPL.2013.6645620","DOIUrl":"https://doi.org/10.1109/FPL.2013.6645620","url":null,"abstract":"This paper presents an alternative FPGA design compilation flow that reduces the back-end time required to implement a design. Beginning with the GReasy front-end and proceeding through the TFlow back-end, this flow consists of a rapid method for design assembly, decoupled from the vendor tools. This enables software-like turnaround time for faster prototyping and increased productivity.","PeriodicalId":200435,"journal":{"name":"2013 23rd International Conference on Field programmable Logic and Applications","volume":"76 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113990097","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A framework for hardware cellular genetic algorithms: An application to spectrum allocation in cognitive radio","authors":"P. V. Santos, J. Alves, J. Ferreira","doi":"10.1109/FPL.2013.6645599","DOIUrl":"https://doi.org/10.1109/FPL.2013.6645599","url":null,"abstract":"The genetic algorithm (GA) is an optimization metaheuristic that relies on the evolution of a set of solutions (population) according to genetically inspired transformations. In the variant of this technique called cellular GA, the evolution is done separately for subgroups of solutions. This paper describes a hardware framework capable of efficiently supporting custom accelerators for this metaheuristic. This approach builds a regular array of problem-specific processing elements (PEs), which perform the genetic evolution, connected to shared memories holding the local subpopulations. To assist the design of the custom PEs, a methodology based on high-level synthesis from C++ descriptions is used. The proposed architecture was applied to a spectrum allocation problem in cognitive radio networks. For an array of 5×5 PEs in a Virtex-6 FPGA, the results show a minimum speedup of 22× compared to a software version running on a PC and a speedup near 2000× over a MicroBlaze soft processor.","PeriodicalId":200435,"journal":{"name":"2013 23rd International Conference on Field programmable Logic and Applications","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127911170","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accelerating maximum likelihood estimation for Hawkes point processes","authors":"Ce Guo, W. Luk","doi":"10.1109/FPL.2013.6645502","DOIUrl":"https://doi.org/10.1109/FPL.2013.6645502","url":null,"abstract":"Hawkes processes are point processes that can be used to build probabilistic models to describe and predict occurrence patterns of random events. They are widely used in high-frequency trading, seismic analysis and neuroscience. A critical numerical calculation in Hawkes process models is parameter estimation, which is used to fit a Hawkes process model to a data set. The parameter estimation problem can be solved by searching for a parameter set that maximises the log-likelihood. A core operation of this search process, the log-likelihood evaluation, is computationally demanding if the number of data points is large. To accelerate the computation, we present a log-likelihood evaluation strategy which is suitable for hardware acceleration. We then design and optimise a pipelined engine based on our proposed strategy. In the experiments, an FPGA-based implementation of the proposed engine is shown to be up to 72 times faster than a single-core CPU, and 10 times faster than an 8-core CPU.","PeriodicalId":200435,"journal":{"name":"2013 23rd International Conference on Field programmable Logic and Applications","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128433484","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Token-based dictionary pattern matching for text analytics","authors":"R. Polig, K. Atasu, C. Hagleitner","doi":"10.1109/FPL.2013.6645535","DOIUrl":"https://doi.org/10.1109/FPL.2013.6645535","url":null,"abstract":"When performing queries for text analytics on unstructured text data, a large amount of the processing time is spent on regular expressions and dictionary matching. In this paper we present a compilable architecture for token-bound pattern matching with support for token pattern sequence detection. The architecture presented is capable of detecting several hundreds of dictionaries, each containing thousands of elements at high throughput. A programmable state machine is used as pattern detection engine to achieve deterministic performance while maintaining low storage requirements. For the detection of token sequences, a dedicated circuitry is compiled based on a non-deterministic automaton. A cascaded result lookup ensures efficient storage while allowing multi-token elements to be detected and multiple dictionary hits to be reported. We implemented on an Altera Stratix IV GX530, and were able to process up to 16 documents in parallel at a peak throughput rate of 9.7 Gb/s.","PeriodicalId":200435,"journal":{"name":"2013 23rd International Conference on Field programmable Logic and Applications","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128031321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Remote FPGA design through eDiViDe — European Digital Virtual Design Lab","authors":"Jochen Vandorpe, Jo Vliegen, R. Smeets, N. Mentens, M. Drutarovský, M. Varchola, Kerstin Lemke-Rust, Paul Plöger, P. Samarin, Dirk Koch, Yngve Hafting, J. Tørresen","doi":"10.1109/FPL.2013.6645621","DOIUrl":"https://doi.org/10.1109/FPL.2013.6645621","url":null,"abstract":"Summary form only given. The design and development of digital electronic systems is mainly performed by use of a hardware description language. To prepare students in electrical engineering for a career in hardware design many universities provide courses on VHDL. The traditional approach in teaching VHDL is mainly by means of textbook examples and simulation provided by software applications. These exercises are perceived as monotonous by the students and do not or only very slightly correspond with actual real-life applications based on FPGAs. Moreover, most real-life applications are too expensive to be equipped in student laboratories. To bridge the gap between a simulation-only environment and affordable real-life applications students should be provided access to remote real-life setups with a 24/7 availability and preferably shared between multiple institutes. The eDiViDe platform (European Digital Virtual Design Lab, http://www.edivide.eu), see Fig. 1, provides students with this unlimited and exciting access to FPGA based setups. Instead of theory-only courses and a quick basic lab, they can work their way through digital design courses testing their skills on real-life setups to trigger their interest. The platform hosts multiple FPGA setups at different European institutes. These setups are accessible through a web-based interface with video feedback. VHDL development is performed offline, given an entity and specific setup information. All further steps of the FPGA toolchain are performed on the platform. A reservation system takes care of the FPGA programming and student interaction with the setups. Similar initiatives provide stable solutions with educational support [1,2,3]. The eDiViDe platform differentiates with a distributed platform across several institutes and with the support for advanced setups. It is the result of a joint effort and easily expandable with additional setups at any location. At this moment following setups are available: greenhouse, stepper motor control, sea noise emulator, state machine workshop, Geffe generator, pong / game of life, traffic light control, MIPS CPU. This set will be extended with more advanced setups that include e.g. a partial reconfiguration workshop for audio/video filters, a side-channel analysis setup and a mars rover playfield. Besides promoting digital design education, the eDiViDe platform creates a channel to make the research activities in the contributing universities more visible. Industry could also benefit from this platform to promote their brand and products to soon to be engineers.","PeriodicalId":200435,"journal":{"name":"2013 23rd International Conference on Field programmable Logic and Applications","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126832352","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High performance architecture for object detection in streamed videos","authors":"P. Zemčík, Roman Juránek, Petr Musil, M. Musil, Michal Hradiš","doi":"10.1109/FPL.2013.6645559","DOIUrl":"https://doi.org/10.1109/FPL.2013.6645559","url":null,"abstract":"In this paper, we introduce a novel architecture of an engine for high performance multi-scale detection of objects in videos based on WaldBoost training algorithm. The key properties of the architecture include processing of streamed data and low resource consumption. We implemented the engine in FPGA and we show that it can process 640×480 pixel video streams at over 160 fps without the need of external memory. We evaluate the design on the face detection task, compare it to state of the art designs, and discuss its features and limitations.","PeriodicalId":200435,"journal":{"name":"2013 23rd International Conference on Field programmable Logic and Applications","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124152298","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Binarization based implementation for real-time human detection","authors":"Shuai Xie, Yibin Li, Zhiping Jia, Lei Ju","doi":"10.1109/FPL.2013.6645590","DOIUrl":"https://doi.org/10.1109/FPL.2013.6645590","url":null,"abstract":"Hardware implementation of human detection is a challenging task for embedded designs. This paper presents a real-time image-based field-programmable gate array (FPGA) implementation of human detection. Our implementation is based on the histograms of oriented gradients (HOG) feature and linear support vector machine (SVM) classifier. The novelty of this work is that we replace normalization process of HOG with a modified binarization process. Therefore, during classification process with SVM classifier, all multiplication operations are replaced by addition operations. All these modifications result in reduction of hardware resource. Experimental evaluation reveals that 293 fps can be achieved on a low-end Xilinx Spartan-3e FPGA. Moreover, a detection accuracy of 1.97% miss rate and 1% false positive rate is achieved. For further demonstration, a prototype system is developed with an OV7670 camera device. Restricted to the speed of camera, a detection rate of 30 fps is achieved.","PeriodicalId":200435,"journal":{"name":"2013 23rd International Conference on Field programmable Logic and Applications","volume":"284 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124247530","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}