{"title":"Low Resource Species Agnostic Bird Activity Detection","authors":"Mark Anderson, J. Kennedy, N. Harte","doi":"10.1109/SiPS52927.2021.00015","DOIUrl":"https://doi.org/10.1109/SiPS52927.2021.00015","url":null,"abstract":"This paper explores low resource classifiers and features for the detection of bird activity, suitable for embedded Automatic Recording Units which are typically deployed for long term remote monitoring of bird populations. Features include low-level spectral parameters, statistical moments on pitch samples, and features derived from amplitude modulation. Performance is evaluated on several lightweight classifiers using the NIPS4Bplus dataset. Our experiments show that random forest classifiers perform best on this task, achieving an accuracy of 0.721 and an F1-Score of 0.604. We compare the results of our system against both a Convolutional Neural Network based detector, and standard MFCC features. Our experiments show that we can achieve equal or better performance in most metrics using features and models with a smaller computational cost and which are suitable for edge deployment.","PeriodicalId":103894,"journal":{"name":"2021 IEEE Workshop on Signal Processing Systems (SiPS)","volume":"114 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132145318","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Digital Predistortion with Compressed Observations for Cloud-Based Learning","authors":"Arne Fischer-Bühner, E. Matús, M. Gomony, L. Anttila, G. Fettweis, M. Valkama","doi":"10.1109/SiPS52927.2021.00016","DOIUrl":"https://doi.org/10.1109/SiPS52927.2021.00016","url":null,"abstract":"This paper presents a novel system architecture for digital predistortion (DPD) of power amplifiers (PA), where the training of the DPD model is done in a remote compute infrastructure, i.e., the cloud or a distributed unit (DU). In beyond-5G systems, it is no longer feasible to perform computationally intensive tasks such as DPD training locally in the radio unit front-end, which has stringent power consumption requirements. Thus, we propose to split the DPD system and perform the compute-intensive DPD training in the DU, where more processing resources are available. To enable the distant training, the observed PA output, i.e., the observation signal, must be available; however, sending the data-intensive observation signal to the DU adds communication overhead to the system. In this paper, a low-complexity compression method is proposed to reduce the bit-resolution of the observation signal by removing the known linear part in the observation, so that fewer bits are needed to represent the remaining information. Our numerical simulations show a reduction of 50 % of bits/samples for the accurate training of the DPD model. The DPD performance was evaluated based on simulation for a strongly driven PA operated at 28 GHz with a 200 MHz wide OFDM signal.","PeriodicalId":103894,"journal":{"name":"2021 IEEE Workshop on Signal Processing Systems (SiPS)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126562690","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Time sliding window for the detection of CCSK frames","authors":"Camille Monière, Kassem Saied, Bertrand Le Gal, Emmanuel Boutillon","doi":"10.1109/SiPS52927.2021.00026","DOIUrl":"https://doi.org/10.1109/SiPS52927.2021.00026","url":null,"abstract":"In wireless communications, frame detection and synchronization are usually performed using a preamble that consumes bandwidth and resources. A new type of frame called Quasi Cyclic Short Packet offers the advantage of avoiding the preamble (thus saving resources) while allowing a simple detection algorithm. The paper presents a time sliding window method to simplify the proposed detection algorithm and make it robust to channel gain variation. First results show that a receiver can reliably detect short packets transmitted at a few hundred kbit/s at very low signal-to-noise ratio (-10 dB, typically).","PeriodicalId":103894,"journal":{"name":"2021 IEEE Workshop on Signal Processing Systems (SiPS)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114519804","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"[Title page iii]","authors":"","doi":"10.1109/sips52927.2021.00002","DOIUrl":"https://doi.org/10.1109/sips52927.2021.00002","url":null,"abstract":"","PeriodicalId":103894,"journal":{"name":"2021 IEEE Workshop on Signal Processing Systems (SiPS)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124245978","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Globally Assisted Instance Normalization for Bandwidth-Efficient Neural Style Transfer","authors":"Hsiu-Pin Hsu, Chao-Tsung Huang","doi":"10.1109/SiPS52927.2021.00019","DOIUrl":"https://doi.org/10.1109/SiPS52927.2021.00019","url":null,"abstract":"Instance normalization (IN) has been widely considered a key technique in fast neural style transfer algorithms to generate high-quality stylized images. However, because of the calculations of channel-wise means and standard deviations, instance normalization requires a layer-by-layer inference flow for CNN accelerators. This kind of dataflow results in huge DRAM bandwidth, which is unaffordable for mobile devices or embedded applications. We propose a novel normalization method named globally assisted instance normalization (GAIN), which receives generated statistics from a global branch without actually calculating channel-wise means and standard deviations. Our method generates comparable stylized results and incorporates block-based inference flows to avoid intermediate data transmission. For fast neural style transfer at Full HD 30 fps and 4K UHD 60 fps, we only need 2.52 GB/s and 15.40 GB/s of DRAM bandwidth respectively, which are 90.22% and 92.53% lower than IN with the layer-by-layer inference flow method.","PeriodicalId":103894,"journal":{"name":"2021 IEEE Workshop on Signal Processing Systems (SiPS)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116579781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Short Codes with Near-ML Universal Decoding: Are Random Codes Good Enough?","authors":"Vivian Papadopoulou, Marzieh Hashemipour-Nazari, Alexios Balatsoukas-Stimming","doi":"10.1109/SiPS52927.2021.00025","DOIUrl":"https://doi.org/10.1109/SiPS52927.2021.00025","url":null,"abstract":"Short blocklength codes have an important role in machine-type and ultra-low-latency communications. Unfortunately, reducing the blocklength makes it very challenging to achieve good error-correcting performance. There exist near-ML decoding algorithms with manageable complexity for short blocklength codes, such as ordered statistics decoding and the more recent guessing random additive noise decoding algorithm. These algorithms have the additional advantage that they are universal, in the sense that they can decode any linear block code. For this reason, some recent works have attempted to construct unstructured linear codes for use with universal decoders using sophisticated techniques, such as reinforcement learning. In this work, we first describe a genetic-algorithm-aided (GA-aided) construction method for unstructured codes and we then compare a very simple random construction to both the GA-aided construction and the reinforcement learning construction. Our simulation results indicate that, while some care should be taken when selecting an unstructured code, sophisticated and complex code construction methods may not be necessary in the sense that they lead to minimal improvements.","PeriodicalId":103894,"journal":{"name":"2021 IEEE Workshop on Signal Processing Systems (SiPS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131485988","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploration and Generation of Efficient FPGA-based Deep Neural Network Accelerators","authors":"Nermine Ali, Jean-Marc Philippe, Benoît Tain, P. Coussy","doi":"10.1109/SiPS52927.2021.00030","DOIUrl":"https://doi.org/10.1109/SiPS52927.2021.00030","url":null,"abstract":"Convolutional Neural Networks (CNNs) have emerged as an answer to next-generation applications such as complex image recognition and object detection. Embedding such compute-intensive and memory-hungry algorithms on edge systems will lead to smarter high-value applications. However, the algorithmic innovations in the CNN field leave the hardware accelerators one step behind. Reconfigurable hardware (e.g. FPGAs) allows designing custom accelerators adapted to new algorithms. Furthermore, new design approaches such as high-level synthesis (HLS) enable the generation of RTL code from high-level function descriptions. This paper presents a high-level CNN accelerator generation framework for FPGAs. A first phase of the framework characterizes CNN descriptions using hardware-aware metrics. These metrics then drive a hardware generation phase which builds the proper C source code implementation for each layer of the network. Finally, an HLS tool outputs the synthesizable RTL code of the accelerator. This approach aims at reducing the gap between the evolving applications based on artificial intelligence and hardware accelerators, thus reducing time-to-market of new systems.","PeriodicalId":103894,"journal":{"name":"2021 IEEE Workshop on Signal Processing Systems (SiPS)","volume":"520 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131886759","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Efficient Parallel Architecture for Resource-Shareable Reed-Solomon Encoder","authors":"Yok Jye Tang, Xinmiao Zhang","doi":"10.1109/SiPS52927.2021.00035","DOIUrl":"https://doi.org/10.1109/SiPS52927.2021.00035","url":null,"abstract":"Reed-Solomon (RS) codes are adopted in many digital communication and storage systems to ensure data reliability. For many of these systems, the encoder and decoder are not active at the same time. In previous designs, RS encoders implemented as linear feedback shift registers in a concatenated structure are reused to compute the syndromes so that the decoder complexity is reduced. However, the parallel versions of such encoders have a very long critical path and hence cannot achieve high speed. This paper proposes a new parallel resource-shareable RS encoder architecture based on the Chinese Remainder Theorem (CRT). The generator polynomial of RS codes is decomposed into factors of degree one and state transformation is developed to enable the sharing of the hardware units for syndrome computation. As a result, the critical path is reduced to only one multiplier and one adder, regardless of the parallelism. Additionally, by utilizing the property that the degrees of the decomposed polynomial factors are one, optimizations are also developed to greatly simplify the CRT-based encoder. For example encoders of a (255, 229) RS code over GF(2^8), our proposed design can achieve at least 29% higher efficiency in terms of area-time product for moderate or higher parallelisms compared to the previous resource-shareable RS encoder and traditional parallel RS encoders combined with syndrome computation units that implement the same functionality.","PeriodicalId":103894,"journal":{"name":"2021 IEEE Workshop on Signal Processing Systems (SiPS)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129120680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Implementation of a Two-Dimensional FFT/IFFT Processor for Real-Time High-Resolution Synthetic Aperture Radar Imaging","authors":"Hung-Yuan Chin, P. Tsai, Sz-Yuan Lee","doi":"10.1109/sips52927.2021.00045","DOIUrl":"https://doi.org/10.1109/sips52927.2021.00045","url":null,"abstract":"The demand for high-resolution synthetic aperture radar (SAR) images entails large-size fast Fourier transforms (FFTs) in the range and azimuth directions and makes real-time processing a challenging task. A 2D-FFT/IFFT processor is implemented to support 8192-, 16384-, and 32768-point range FFT/IFFT and 8192-point azimuth FFT/IFFT. To exploit the burst read/write of external DDR memory for access efficiency, azimuth decomposition is adopted. In addition, normal-order input for azimuth FFT and bit-reversed-order input for azimuth IFFT are designed to save latency and storage for re-ordering. The control logic for look-up tables of twiddle factors in normal-order FFT and bit-reversed-order IFFT given azimuth decomposition is derived and a significant ROM-table reduction is achieved. The radix-2^3 single-path delay feedback (SDF) architecture is employed to reduce the number of complex multipliers and to allow for streaming input/output. A customized floating-point data-path is utilized. Our 2D-FFT/IFFT processor, realized on a Xilinx UltraScale VU37P HBM FPGA, reaches a maximum operating frequency of 111 MHz. The SQNR exceeds 48 dB for one transformation and is about 38 dB for successive 2D-FFT and 2D-IFFT operations. We demonstrate a promising solution of 2D FFT/IFFT for real-time SAR imaging.","PeriodicalId":103894,"journal":{"name":"2021 IEEE Workshop on Signal Processing Systems (SiPS)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114738675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Multi-Domain Architectural Efficiency Metric","authors":"S. Nagi, D. Markovic","doi":"10.1109/SiPS52927.2021.00055","DOIUrl":"https://doi.org/10.1109/SiPS52927.2021.00055","url":null,"abstract":"Developments in the field of circuit design and computer architecture have introduced a variety of architectures such as in-memory compute, near-memory compute, FPGA, CGRA, DSP, GPU and many more still in the research phase. There is an increasing need for a unifying architectural efficiency metric that is applicable during the architecture design phase as well as post-silicon benchmarking, which quantifies different architectures, and is independent of the underlying implementation technology. This paper introduces an architectural efficiency metric that satisfies the above mentioned criteria. The metric quantifies the number of instructions or the size of reconfiguration bits required to perform a computation over a range of program sizes in the architecture. The metric helps understand limitations and benefits of different architectures, and provides insight into theoretical throughput. Our efficiency metric also informs the user/compiler about hardware options in a multi-architecture system based on the size of computation required.","PeriodicalId":103894,"journal":{"name":"2021 IEEE Workshop on Signal Processing Systems (SiPS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129006981","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}