{"title":"Hardware-software implementation of the PointPillars network for 3D object detection in point clouds","authors":"Joanna Stanisz, K. Lis, T. Kryjak, M. Gorgon","doi":"10.1145/3441110.3441150","DOIUrl":"https://doi.org/10.1145/3441110.3441150","url":null,"abstract":"In this paper, we present a hardware-software implementation of a deep neural network for object detection based on a point cloud obtained by a LiDAR sensor. The Brevitas / PyTorch tools were used for network quantisation and the FINN tool for hardware implementation in the reprogrammable Zynq UltraScale+ MPSoC device. The PointPillars network was used in the research, as it is a reasonable compromise between detection accuracy and calculation complexity. The obtained results show that quite a significant computation precision limitation along with a few network architecture simplifications allows the solution to be implemented on a heterogeneous embedded platform with reasonable detection accuracy.","PeriodicalId":398729,"journal":{"name":"Workshop on Design and Architectures for Signal and Image Processing (14th edition)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122110479","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Convolutional Fully-Connected Capsule Network (CFC-CapsNet)","authors":"Pouya Shiri, A. Baniasadi","doi":"10.1145/3441110.3441148","DOIUrl":"https://doi.org/10.1145/3441110.3441148","url":null,"abstract":"Capsule Networks (CapsNets) are the new generation of classifiers with several advantages over the previous ones. Such advantages include higher robustness to affine transformed datasets and detection of overlapping images. CapsNets, while obtaining state-of-the-art accuracy on the MNIST digit recognition dataset, fall behind Convolutional Neural Networks (CNNs) for other datasets. Moreover, CapsNets are slow compared to CNNs. In this work, we propose Convolutional Fully Connected (CFC) CapsNet as an alternative enhanced architecture to conventional CapsNet [8]. CFC-CapsNet is a more efficient network: training and testing are performed faster and a slightly higher accuracy is achieved compared to the conventional CapsNet. CFC-CapsNet includes fewer trainable weights (parameters) and therefore is more efficient in terms of memory usage. The code for CFC-CapsNet is available on GitHub.","PeriodicalId":398729,"journal":{"name":"Workshop on Design and Architectures for Signal and Image Processing (14th edition)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122681925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automotive perception system evaluation with reference data obtained by a UAV","authors":"Krzysztof Błachut, M. Danilowicz, Hubert Szolc, Mateusz Wasala, T. Kryjak, Nikodem Pankiewicz, M. Komorkiewicz","doi":"10.1145/3441110.3441151","DOIUrl":"https://doi.org/10.1145/3441110.3441151","url":null,"abstract":"Testing and evaluation of an automotive perception system is a complicated task which requires special equipment and infrastructure. To compute key performance indicators and compare the results with the real-world situation, some additional sensors and manual data labelling are often required. In this article, we propose a different approach, which is based on a UAV equipped with a 4K camera flying above the test track. Thanks to the synchronisation of the sensors between the tested vehicle and the UAV, it is possible to precisely determine the positions of the objects around the car and correlate them with the perception system readings. The performed experiments indicate that this approach could be an interesting alternative to the existing evaluation solutions.","PeriodicalId":398729,"journal":{"name":"Workshop on Design and Architectures for Signal and Image Processing (14th edition)","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132043983","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Low-Power Sign-Magnitude FFT Design for FMCW Radar Signal Processing","authors":"O. Meteer, M. Bekooij","doi":"10.1145/3441110.3441145","DOIUrl":"https://doi.org/10.1145/3441110.3441145","url":null,"abstract":"Fully integrated CMOS frequency-modulated continuous-wave radar ICs are under development, in which computing FFTs cost a significant amount of energy. In this paper we introduce a power-efficient FFT solution which exploits that intermediate results of FFT computations typically have small amplitudes in FMCW radar systems. We propose using the sign-magnitude number representation combined with a custom, unsigned Booth multiplier that does not generate negative numbers internally, significantly decreasing switching activity. RTL power-simulation results show up to 46.45% less power usage with our sign-magnitude radix-2 FFT implementation compared to a two’s complement design, while only having a 6.67% lower maximum clock speed.","PeriodicalId":398729,"journal":{"name":"Workshop on Design and Architectures for Signal and Image Processing (14th edition)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125178783","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On Cache Limits for Dataflow Applications and Related Efficient Memory Management Strategies","authors":"Alemeh Ghasemi, R. Cataldo, J. Diguet, Kevin J. M. Martin","doi":"10.1145/3441110.3441573","DOIUrl":"https://doi.org/10.1145/3441110.3441573","url":null,"abstract":"The dataflow paradigm frees the designer to focus on the functionality of an application, independently from the underlying architecture executing it. While mapping the dataflow computational part to the cores seems obvious, the memory aspects do not match accordingly. Dataflow compilers usually do not consider the presence of caches when generating code. A generally accepted idea is that bigger and multi-level caches improve the performance of applications. Unfortunately, state-of-the-art dataflow compilers may prove the exception to this rule. This paper presents two efficient memory management strategies for dataflow applications through a study on the impact of sharing, size, and the number of levels of caches on them. The results show that bigger is not always better, and the foreseen future of more cores and bigger caches does not guarantee software-free better performance for dataflow applications. We propose two strategies that can be used concurrently to address the memory aspects of the dataflow model: copy-on-write and non-temporal memory transfers. Experimental results show that we speed up a computer stereo vision application by 2.1× and reduce the number of L1 data cache misses by 45% while maintaining the actors’ source code and design intact.","PeriodicalId":398729,"journal":{"name":"Workshop on Design and Architectures for Signal and Image Processing (14th edition)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121667171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multiple Transform Selection concept modeling and implementation using Interface Based SDF graphs","authors":"Naouel Haggui, Fatma Belghith, W. Hamidouche, N. Masmoudi, J. Nezan","doi":"10.1145/3441110.3441153","DOIUrl":"https://doi.org/10.1145/3441110.3441153","url":null,"abstract":"Recent studies predict that video data will account for 82% of Internet traffic by 2022. This fact has motivated MPEG to define a new Video Coding Standard called Versatile Video Coding (VVC), which will be released by the end of 2020. VVC will offer the possibility to handle new video formats and to significantly improve video compression over its predecessor HEVC. Indeed, the objective is to reduce the necessary bit rate by half, at equivalent quality. These advances require the use of more complex algorithms, although the increase in complexity has been limited throughout the standardization process. In order to decrease the complexity of VVC and consequently the coding execution time, several methods have been introduced at different stages of the encoder. The aim of this paper is to explore the available parallelism of VVC to accelerate the coding and the decoding processes. This paper focuses on the transformation block and more specifically the new concept of Multiple Transform Selection (MTS) introduced by VVC. Moreover, a study of several granularity levels of Interface-Based Synchronous Dataflow (IBSDF) models and their impact on the performance obtained on x86 architectures is presented. An IBSDF dataflow graph has been developed to reveal the available parallelism of MTS. The PREESM fast prototyping tool is then used for the mapping and the scheduling of MTS on virtual and real parallel architectures and for generating efficient parallel implementations on real architectures. PREESM has been used in this work to explore the potential parallelism offered by MTS and to prove the efficiency of MTS on multicore x86 architectures. Experimental results show a speed-up close to the optimum.","PeriodicalId":398729,"journal":{"name":"Workshop on Design and Architectures for Signal and Image Processing (14th edition)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131931103","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gegelati: Lightweight Artificial Intelligence through Generic and Evolvable Tangled Program Graphs","authors":"K. Desnos, Nicolas Sourbier, Pierre-Yves Raumer, Olivier Gesny, M. Pelcat","doi":"10.1145/3441110.3441575","DOIUrl":"https://doi.org/10.1145/3441110.3441575","url":null,"abstract":"Tangled Program Graph (TPG) is a reinforcement learning technique based on genetic programming concepts. On state-of-the-art learning environments, TPGs have been shown to offer comparable competence with Deep Neural Networks (DNNs), for a fraction of their computational and storage cost. This lightness of TPGs, both for training and inference, makes them an interesting model to implement Artificial Intelligences (AIs) on embedded systems with limited computational and storage resources. In this paper, we introduce the Gegelati library for TPGs. Besides introducing the general concepts and features of the library, two main contributions are detailed in the paper: 1/ The parallelization of the deterministic training process of TPGs, for supporting heterogeneous Multiprocessor Systems-on-Chips (MPSoCs). 2/ The support for customizable instruction sets and data types within the genetically evolved programs of the TPG model. The scalability of the parallel training process is demonstrated through experiments on architectures ranging from a high-end 24-core processor to a low-power heterogeneous MPSoC. The impact of customizable instructions on the outcome of a training process is demonstrated on a state-of-the-art reinforcement learning environment.","PeriodicalId":398729,"journal":{"name":"Workshop on Design and Architectures for Signal and Image Processing (14th edition)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126202449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}