{"title":"Mapping and Frequency Joint Optimization for Energy Efficient Execution of Multiple Applications on Multicore Systems","authors":"Simei Yang, S. L. Nours, M. M. Real, S. Pillement","doi":"10.1109/DASIP48288.2019.9049177","DOIUrl":"https://doi.org/10.1109/DASIP48288.2019.9049177","url":null,"abstract":"Run-time resource managers are essential components to optimize energy consumption in cluster-based multicore architectures. However, with the ever increasing number of functionalities supported by these architectures, it is also necessary to optimize the usage of processing resources while guaranteeing that applications' timing constraints are met. In this paper, we present a new run-time management strategy that includes both processing resource allocation and frequency tuning to optimize clusters energy consumption when multiple applications are executed concurrently. The proposed hybrid allocation process minimizes the number of used processing cores while meeting the latency constraint of each application. This approach offers a good trade-off between efficiency and complexity. The achieved energy saving has been demonstrated through various case-studies with different sets of active applications. Results show an improvement of energy saving up to 206% when compared to the literature.","PeriodicalId":120855,"journal":{"name":"2019 Conference on Design and Architectures for Signal and Image Processing (DASIP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129673372","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Speeding-up CNN inference through dimensionality reduction","authors":"Lucas Fernández Brillet, N. Leclaire, S. Mancini, Sébastien Cleyet-Merle, M. Nicolas, Jean-Paul Henriques, C. Delnondedieu","doi":"10.1109/DASIP48288.2019.9049204","DOIUrl":"https://doi.org/10.1109/DASIP48288.2019.9049204","url":null,"abstract":"Computational complexity of CNNs makes their integration in embedded systems with low power consumption requirements a challenging task, which requires the joint design and adaptation of hardware and algorithms. In this paper, we propose a new general CNN compression method, allowing to reduce both the number of parameters and operations. This method is applied to a binary face detection network which is then implemented and evaluated on hardware.","PeriodicalId":120855,"journal":{"name":"2019 Conference on Design and Architectures for Signal and Image Processing (DASIP)","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124797918","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using Time-of-Flight Sensors for People Counting Applications","authors":"Michal Stec, Viktor Herrmann, B. Stabernack","doi":"10.1109/DASIP48288.2019.9049169","DOIUrl":"https://doi.org/10.1109/DASIP48288.2019.9049169","url":null,"abstract":"Precisely detecting and counting people who are using public transportation is one of the key methods for predicting and planning an efficient use of buses, trams and trains. Providing an effective, well-planned public transportation service is not only important for economic reasons. It also helps to tackle a variety of environmental problems and contributes to a reduction of traffic congestion in urban areas. A couple of such systems had been developed in the past. Those were not sufficiently precise, however. In most cases, these systems rely on data processing generated by one particular type of a 2D image sensor. In this paper we present a robust people counting application, which runs on embedded systems with reasonable requirements as far as computational power is concerned and relies on the processing of 3D data generated by a Time-of-Flight (ToF) sensor. Processing of time-of-flight data requires a couple of preprocessing steps, which is crucial for the subsequent people detection, tracking and counting algorithms. The influence of these preprocessing steps and the effect on the developed detection algorithm are presented. Methods of avoiding misinterpretations by the detection algorithms are discussed. A detailed description of the core algorithms which were developed to process 3D data is provided. An overview will be given on how this method could be further enhanced for the purpose of detecting and differentiating vital and non-vital objects.","PeriodicalId":120855,"journal":{"name":"2019 Conference on Design and Architectures for Signal and Image Processing (DASIP)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131089085","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FPGA-Based Acceleration of Expectation Maximization Algorithm Using High-Level Synthesis","authors":"M. A. Momen, Mohammed A. S. Khalid, Mohammad Abdul Moin Oninda","doi":"10.1109/DASIP48288.2019.9049183","DOIUrl":"https://doi.org/10.1109/DASIP48288.2019.9049183","url":null,"abstract":"Expectation Maximization (EM) is a soft clustering algorithm which partitions data iteratively into M clusters. It is one of the most popular data mining algorithms that uses Gaussian Mixture Models (GMM) for probability density modeling and is widely used in applications such as signal processing and Machine Learning (ML). EM requires high computation time when dealing with large data sets. This paper presents an optimized implementation of EM algorithm on Stratix V and Arria 10 FPGAs using Intel FPGA Software Development Kit (SDK) for Open Computing Language (OpenCL). Comparison of performance and power consumption between Central Processing Unit (CPU), Graphics Processing Unit (GPU) and FPGA is presented for various dimension and cluster sizes. Compared to Intel® Xeon® CPU E5-2637, our fully optimized OpenCL model for EM targeting Arria 10 FPGA achieved up to 1000x speedup in terms of throughput (T) and 5395x speedup in terms of throughput per unit of power consumed (T/P). Compared to previous research on EM-GMM implementation on GPUs, Arria 10 FPGA obtained up to 64.74x speedup (T) and 486.78x speedup (T/P).","PeriodicalId":120855,"journal":{"name":"2019 Conference on Design and Architectures for Signal and Image Processing (DASIP)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134283604","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A New Real-Time Embedded Video Denoising Algorithm","authors":"Andrea Petreto, Thomas Romera, F. Lemaitre, I. Masliah, B. Gaillard, Manuel Bouyer, Quentin L. Meunier, L. Lacassagne","doi":"10.1109/DASIP48288.2019.9049189","DOIUrl":"https://doi.org/10.1109/DASIP48288.2019.9049189","url":null,"abstract":"Many embedded applications rely on video processing or on video visualization. Noisy video is thus a major issue for such applications. However, video denoising requires a lot of computational effort and most of the state-of-the-art algorithms cannot be run in real-time at camera framerate. This article introduces a new real-time video denoising algorithm for embedded platforms called RTE-VD. We first compare its denoising capabilities with other online and offline algorithms. We show that RTE-VD can achieve real-time performance (25 frames per second) for qHD video (960⨯540 pixels) on embedded CPUs and the output image quality is comparable to state-of-the-art algorithms. In order to reach real-time denoising, we applied several high-level transforms and optimizations (SIMDization, multi-core parallelization, operator fusion and pipelining). We study the relation between computation time and power consumption on several embedded CPUs and show that it is possible to determine different frequency and core configurations in order to minimize either the computation time or the energy.","PeriodicalId":120855,"journal":{"name":"2019 Conference on Design and Architectures for Signal and Image Processing (DASIP)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114957376","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"POLYCiNN: Multiclass Binary Inference Engine using Convolutional Decision Forests","authors":"A. Abdelsalam, A. Elsheikh, J. David, Pierre Langlois","doi":"10.1109/DASIP48288.2019.9049176","DOIUrl":"https://doi.org/10.1109/DASIP48288.2019.9049176","url":null,"abstract":"Convolutional Neural Networks (CNNs) have achieved significant success in image classification. One of the main reasons that CNNs achieve state-of-the-art accuracy is using many multi-scale learnable windowed feature detectors called kernels. Fetching of kernel feature weights from memory and performing the associated multiply and accumulate computations consume massive amount of energy. This hinders the widespread usage of CNNs, especially in embedded devices. In comparison with CNNs, decision forests are computationally efficient since they are composed of decision trees, which are binary classifiers by nature and can be implemented using AND-OR gates instead of costly multiply and accumulate units. In this paper, we investigate the migration of CNNs to decision forests as one of the promising approaches for reducing both execution time and power consumption while achieving acceptable accuracy. We introduce POLYCiNN, an architecture composed of a stack of decision forests. Each decision forest classifies one of the overlapped sub-images of the original image. Then, all decision forest classifications are fused together to classify the input image. In POLYCiNN, each decision tree is implemented in a single 6-input Look-Up Table and requires no memory access. Therefore, POLYCiNN can be efficiently mapped to simple and densely parallel hardware designs. We validate the performance of POLYCiNN on the benchmark image classification tasks of the MNIST, CIFAR-10 and SVHN datasets.","PeriodicalId":120855,"journal":{"name":"2019 Conference on Design and Architectures for Signal and Image Processing (DASIP)","volume":"118 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128860285","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hybrid Prototyping Methodology for Rapid System Validation in HW/SW Co-Design","authors":"Arief Wicaksana, A. Charif, Caaliph Andriamisaina, N. Ventroux","doi":"10.1109/DASIP48288.2019.9049195","DOIUrl":"https://doi.org/10.1109/DASIP48288.2019.9049195","url":null,"abstract":"As the System-on-Chip (SoC) complexity increases, hardware/software co-design plays an important role to improve design productivity, reduce time to market, and optimize the overall results. Consequently, there is a high interest in providing rapid system validation in such a paradigm to achieve the aforementioned objectives. There exist in previous works prototyping techniques related to the development phase. FPGA-based prototyping has the benefits of enabling HW/SW integration and system validation after the Register Transfer Level (RTL) implementation is available while virtual platforms provide capabilities to accelerate software development with higher level functional models, e.g. Transaction Level Modeling (TLM). In this paper, we propose a hybrid prototyping methodology which takes advantage of virtual and FPGA-based prototyping in a single framework. We aim to provide a rapid and flexible system validation solution for HW/SW co-design at various stages of development based on the availability of TLM and RTL implementations. The proposed methodology allows online and offline performance analysis and debugging for early feedback in HW/SW architecture exploration. This was evaluated in the experiments with a neural network processor as a case study.","PeriodicalId":120855,"journal":{"name":"2019 Conference on Design and Architectures for Signal and Image Processing (DASIP)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133484526","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Real-Time Implementation of Adaptive Correlation Filter Tracking for 4K Video Stream in Zynq UltraScale+ MPSoC","authors":"M. Kowalczyk, Dominika Przewlocka, T. Kryjak","doi":"10.1109/DASIP48288.2019.9049203","DOIUrl":"https://doi.org/10.1109/DASIP48288.2019.9049203","url":null,"abstract":"In this paper a hardware-software implementation of adaptive correlation filter tracking for a 3840 ⨯ 2160 @ 60 fps video stream in a Zynq UltraScale+ MPSoC is discussed. Correlation filters gained popularity in recent years because of their efficiency and good results in the VOT (Visual Object Tracking) challenge. An implementation of the MOSSE (Minimum Output Sum of Squared Error) algorithm is presented. It utilizes 2-dimensional FFT for computing correlation and updates filter coefficients in every frame. The initial filter coefficients are computed on the ARM processor in the PS (Processing System), while all other operations are performed in the PL (Programmable Logic). The presented architecture was described with the use of Verilog hardware description language.","PeriodicalId":120855,"journal":{"name":"2019 Conference on Design and Architectures for Signal and Image Processing (DASIP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130532002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Run-Time Coarse-Grained Hardware Mitigation for Multiple Faults on VLIW Processors","authors":"Rafail Psiakis, A. Kritikakou, O. Sentieys, E. Casseau","doi":"10.1109/DASIP48288.2019.9049194","DOIUrl":"https://doi.org/10.1109/DASIP48288.2019.9049194","url":null,"abstract":"As transistors scale down, processors are more vulnerable to radiation that can cause multiple transient faults in function units. Rather than excluding these units from execution, performance overhead of VLIW processors can be reduced when fault-free components of these affected units are still used. In the proposed approach, the function units are enhanced with coarse-grained fault detectors. A re-scheduling of the instructions is performed at run-time to use not only the healthy function units, but also the fault-free components of the faulty function units. The scheduling window of the proposed mechanism is two instruction bundles being able to explore mitigation solutions in the current and the next instruction execution. Experiments show that the proposed approach can mitigate a large number of faults with low performance and area overheads.","PeriodicalId":120855,"journal":{"name":"2019 Conference on Design and Architectures for Signal and Image Processing (DASIP)","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124401245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distilling the knowledge in CNN for WCE screening tool","authors":"Thomas Garbay, Orlando Chuquimia, A. Pinna, H. Sahbi, X. Dray, B. Granado","doi":"10.1109/DASIP48288.2019.9049201","DOIUrl":"https://doi.org/10.1109/DASIP48288.2019.9049201","url":null,"abstract":"A way to improve the early detection of colorectal cancer is screening. Polyps are a marker of colorectal cancer and the best modality to detect them is the image. In 2003 Wireless Capsule Endoscopy was introduced and opened a way to integrate automatic image processing to realize a screening tool. Moreover, the capacity to detect polyps with Convolutional Neural Networks was shown in many scientific studies, but one issue is the integration of these networks. In this article, we present our work to integrate a CNN, or image processing based on a CNN, inside a WCE to realize a powerful screening tool. We apply the knowledge distillation method. We prove that knowledge distillation is efficient from VGG16 to Squeezenet in the polyp detection context.","PeriodicalId":120855,"journal":{"name":"2019 Conference on Design and Architectures for Signal and Image Processing (DASIP)","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132375636","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}