{"title":"Proposition and evaluation of a real-time generic architecture for a laser stripe detection system on FPGA","authors":"Seher Colak, E. Dumas, V. Fresse, O. Alata","doi":"10.1109/DASIP.2017.8122110","DOIUrl":"https://doi.org/10.1109/DASIP.2017.8122110","url":null,"abstract":"Laser triangulation applications are commonly used for industrial quality control. Such algorithms require real-time systems, often made of a computing unit close to the image sensor and connected through a short, fast link. Choosing a camera with an integrated Field Programmable Gate Array (FPGA) as the computing unit can provide the highly pipelined and parallel computing needed to process images in real time. Moreover, industrial code must be maintained for several years regardless of system upgrades, so the designed operators should be flexible enough to adapt to any hardware change (sensor or FPGA) or tool update with minimum effort. The purpose of this article is to present a generic architecture for laser stripe detection based on the centroid algorithm for an FPGA-based system. Resource usage is evaluated with respect to two parameters (image width and parallelism). From three syntheses, models have been extracted to forecast the evolution of these resources, and an error analysis has been conducted to validate these models.","PeriodicalId":6637,"journal":{"name":"2017 Conference on Design and Architectures for Signal and Image Processing (DASIP)","volume":"160 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72718654","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detecting data-parallel synchronous dataflow graphs","authors":"Sudeep Kanur, J. Lilius, Johan Ersfolk","doi":"10.1109/DASIP.2017.8122118","DOIUrl":"https://doi.org/10.1109/DASIP.2017.8122118","url":null,"abstract":"Synchronous Dataflow (SDF), a popular subset of the dataflow programming paradigm, gives a well-structured formalism to capture signal and stream processing applications. With data-parallel architectures becoming ubiquitous, several frameworks leverage the SDF formalism to map applications to parallel architectures. However, these frameworks assume that the Synchronous Dataflow graphs (SDFGs) under consideration are already data-parallel. In this paper, we address the lack of mechanisms required to detect whether an SDFG can be executed in a data-parallel fashion. We develop necessary and sufficient conditions that an SDFG must satisfy for its data-parallel execution. In addition, we develop methods that detect and transform SDFGs that cannot be determined to be data-parallel through visual graph inspection alone. We report on a prototype implementation of the developed conditions as a compiler pass in the PREESM framework and test them against some useful applications expressed as SDFGs.","PeriodicalId":6637,"journal":{"name":"2017 Conference on Design and Architectures for Signal and Image Processing (DASIP)","volume":"31 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84290403","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hardware-software abandoned object detection vision system in heterogeneous zynq device","authors":"T. Kryjak, Artur Skirzynski, M. Gorgon","doi":"10.1109/DASIP.2017.8122122","DOIUrl":"https://doi.org/10.1109/DASIP.2017.8122122","url":null,"abstract":"In this paper a hardware-software abandoned object detection vision system implemented in the Zynq SoC (System on Chip) device is presented. First, the solution was implemented in C++ and run as a bare metal application on the ARM processor core of the Zynq (using floating-point and fixed-point computations). For the target video stream of 1280 × 720 @ 50 fps (74.25 MHz pixel clock) it reached only 2 fps. Therefore, to speed up the application, some of the image processing and analysis operations were moved to the programmable logic. This made it possible to obtain real-time image processing, i.e. 50 fps, with power consumption of less than 4 W.","PeriodicalId":6637,"journal":{"name":"2017 Conference on Design and Architectures for Signal and Image Processing (DASIP)","volume":"129 1","pages":"1-2"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74692953","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The best of both: High-performance and deterministic real-time executive by application-specific multi-core SoCs","authors":"Steffen Vaas, Peter Ulbrich, M. Reichenbach, D. Fey","doi":"10.1109/DASIP.2017.8122107","DOIUrl":"https://doi.org/10.1109/DASIP.2017.8122107","url":null,"abstract":"Embedded multi-core processors improve performance significantly and are desirable in many application fields. This in particular includes safety-critical real-time systems, which typically require a deterministic temporal behavior. However, even tasks without dependencies running on different cores can interfere due to shared hardware resources, some of them hidden, such as common memories or buses. Consequently, only a pessimistic assumption of the worst-case execution time (WCET) that incorporates interference can be given. Hence, the aspired performance gain is lost to poor temporal analyzability. Based on the fact that in safety-critical systems all tasks and their dependencies are known at compile-time, this paper presents an approach to generate application-specific, deterministic multi-core processor architectures for these systems. Safety-critical tasks are executed on dedicated Deterministic Execution Units (DEUs) including lightweight, deterministic processor cores, bus systems, memories and peripherals. The remaining soft real-time tasks are executed on a general purpose multi-core processor that favors performance over determinism. Consequently, timing analysis for hard real-time tasks is significantly simplified, since interferences caused by shared resources and scheduling are effectively eliminated. To show the benefits of our approach, an application-specific architecture for a flight controller was generated and compared to an ARM Cortex-A9 dual-core as reference. Overall, we were able to significantly improve temporal properties of safety-critical tasks while preserving the overall performance for soft real-time tasks.","PeriodicalId":6637,"journal":{"name":"2017 Conference on Design and Architectures for Signal and Image Processing (DASIP)","volume":"7 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78960169","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Single-FPGA complete 3D and 2D medical ultrasound imager","authors":"A. Ibrahim, W. Simon, Damien Doy, E. Pignat, F. Angiolini, M. Arditi, J. Thiran, G. Micheli","doi":"10.1109/DASIP.2017.8122113","DOIUrl":"https://doi.org/10.1109/DASIP.2017.8122113","url":null,"abstract":"3D ultrasound (US) acquisition acquires volumetric images, thus alleviating a classical US imaging bottleneck that requires a highly-trained sonographer to operate the US probe. However, this opportunity has not been explored in practice, since 3D US machines are only suitable for hospital usage in terms of cost, size and power requirements. In this work we propose the first fully-digital, single-chip 3D US imager on FPGA. The proposed design is a complete processing pipeline that includes pre-processing, image reconstruction, and post-processing. It supports up to 1024 input channels, which matches or exceeds the state of the art, in an unprecedented estimated power budget of 6.1 W. The imager exploits a highly scalable architecture which can be either downscaled for 2D imaging, or further upscaled on a larger FPGA. Our platform supports either real-time inputs over an optical cable or test data feeds sent by a laptop running Matlab and custom tools over an Ethernet connection. Additionally, the design allows HDMI video output on a screen.","PeriodicalId":6637,"journal":{"name":"2017 Conference on Design and Architectures for Signal and Image Processing (DASIP)","volume":"12 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75556644","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Embedded fluorescence lifetime determination for high throughput real-time droplet sorting with microfluidics","authors":"T. Lieske, W. Uhring, N. Dumas, J. Léonard, D. Fey","doi":"10.1109/DASIP.2017.8122129","DOIUrl":"https://doi.org/10.1109/DASIP.2017.8122129","url":null,"abstract":"Time-resolved fluorescence (TRF) analysis is considered to be among the primary research tools in biochemistry and biophysics. One application of this method is the investigation of biomolecular interactions with promising applications for biosensing. For the latter context, time-correlated single photon counting (TCSPC) is the most sensitive, hence preferred implementation of TRF. However, high throughput applications are presently limited by the maximum achievable photon acquisition rate, and even more by the data processing rate. The latter rate is limited by the computational complexity of accurately estimating the fluorescence lifetime from TCSPC data. Here we propose a solution that would enable the implementation of TRF detection for fluorescence-activated droplet sorting (FADS), a particularly high throughput, microfluidic-based technology. Most fluorescence lifetime algorithms require a large number of detected photons for an accurate lifetime computation. This paper presents an implementation based on a maximum likelihood estimator (MLE), enabling high precision estimation with a limited number of detected photons, significantly reducing the total measurement time. This speedup rapidly increases the input data rate. As a result, off-the-shelf embedded products cannot handle the data rates produced by current TCSPC units that are used to measure the fluorescence. Therefore, a configurable real-time capable hardware architecture is implemented on a field-programmable gate array (FPGA) that can handle the data rates of future TCSPC units, rendering high throughput droplet sorting with microfluidics possible.","PeriodicalId":6637,"journal":{"name":"2017 Conference on Design and Architectures for Signal and Image Processing (DASIP)","volume":"38 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75648080","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hardware-based architecture for asymmetric numeral systems entropy decoder","authors":"Seyyed Mahdi Najmabadi, Harsimran Singh Tungal, Trung-Hieu Tran, S. Simon","doi":"10.1109/DASIP.2017.8122109","DOIUrl":"https://doi.org/10.1109/DASIP.2017.8122109","url":null,"abstract":"In this paper, two novel hardware architectures based on the tabled asymmetric numeral systems decoding algorithm are proposed. In the proposed architectures the decoding throughput is highly dependent on how much the data is compressed at encoding time. The synthesis results presented here show that the throughput of the parallel architecture can reach up to 200 MB/s. The benchmarks show that the parallel architecture running on a Xilinx Kintex FPGA provides higher throughput than the same algorithm running on a Core i3 CPU.","PeriodicalId":6637,"journal":{"name":"2017 Conference on Design and Architectures for Signal and Image Processing (DASIP)","volume":"84 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82406237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Power efficient dataflow design for a heterogeneous smart camera architecture","authors":"Deepayan Bhowmik, Paulo Garcia, A. Wallace, Robert J. Stewart, G. Michaelson","doi":"10.1109/DASIP.2017.8122128","DOIUrl":"https://doi.org/10.1109/DASIP.2017.8122128","url":null,"abstract":"Visual attention modelling characterises the scene to segment regions of visual interest and is increasingly being used as a pre-processing step in many computer vision applications including surveillance and security. Smart camera architectures are an emerging technology and a foundation of security and safety frameworks in modern vision systems. In this paper, we present a dataflow design of a visual saliency based camera architecture targeting a heterogeneous CPU+FPGA platform to propose a smart camera network infrastructure. The proposed design flow encompasses image processing algorithm implementation, hardware & software integration and network connectivity through a unified model. By leveraging the properties of the dataflow paradigm, we iteratively refine the algorithm specification into a deployable solution, addressing distinct requirements at each design stage: from algorithm accuracy to hardware-software interactions, real-time execution and power consumption. Our design achieved real-time performance, and the power consumption of the optimised asynchronous design is reported at only 0.25 W. The resource usage on a Xilinx Zynq platform remains significantly low.","PeriodicalId":6637,"journal":{"name":"2017 Conference on Design and Architectures for Signal and Image Processing (DASIP)","volume":"79 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84108852","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enabling GPU software developers to optimize their applications — The LPGPU2 approach","authors":"B. Juurlink, J. Lucas, Nadjib Mammeri, G. Keramidas, Katerina Pontzolkova, I. Aransay, Chrysa Kokkala, Martyn Bliss, A. Richards","doi":"10.1109/DASIP.2017.8122116","DOIUrl":"https://doi.org/10.1109/DASIP.2017.8122116","url":null,"abstract":"Low-power GPUs have become ubiquitous; they can be found in domains ranging from wearable and mobile computing to automotive systems. With this ubiquity has come a wider range of applications exploiting low-power GPUs, placing ever increasing demands on the expected performance and power efficiency of the devices. The LPGPU2 project is an EU-funded, 30-month Innovation Action that aims to develop an analysis and visualization framework that enables GPU application developers to improve the performance and power consumption of their applications. To this end, the project follows a holistic approach. First, several applications (use cases) are being developed for or ported to low-power GPUs. These applications will be optimized using the tooling framework in the last phase of the project. In addition, power measurement devices and power models are devised that are 10× more accurate than the state of the art. The ultimate goal of the project is to promote open vendor-neutral standards via the Khronos group. This paper briefly reports on the achievements made in the first phase of the project (until month 18) and focuses on the progress made in applications; in power measurement, estimation, and modelling; and in the analysis and visualization tool suite.","PeriodicalId":6637,"journal":{"name":"2017 Conference on Design and Architectures for Signal and Image Processing (DASIP)","volume":"26 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82060300","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Energy consumption characterization of a Massively Parallel Processor Array (MPPA) platform running a hyperspectral SVM classifier","authors":"D. Madroñal, R. Lazcano, H. Fabelo, S. Ortega, R. Salvador, G. Callicó, E. Juárez, C. Sanz","doi":"10.1109/DASIP.2017.8122112","DOIUrl":"https://doi.org/10.1109/DASIP.2017.8122112","url":null,"abstract":"In this paper, a Massively Parallel Processor Array platform is characterized in terms of energy consumption using a Support Vector Machine for hyperspectral image classification. This platform gathers 16 clusters composed of 16 cores each, i.e., 256 processors working in parallel. The objective of the work is to associate power dissipation and energy consumed by the platform with the different resources of the architecture. Experimenting with a hyperspectral SVM classifier, this study has been conducted using three strategies: i) modifying the number of processing elements, i.e., clusters and cores, ii) increasing system frequency, and iii) varying the number of active communication links during the analysis, i.e., I/Os and DMAs. As a result, a relationship between the energy consumption and the active platform resources has been exposed using two different parallelization strategies. Finally, the implementation that fully exploits the parallelization possibilities working at 500 MHz has also proven to be the most efficient one, as it reduces the energy consumption by 98% when compared to the sequential version running at 400 MHz.","PeriodicalId":6637,"journal":{"name":"2017 Conference on Design and Architectures for Signal and Image Processing (DASIP)","volume":"8 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82068279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}