{"title":"Time Reversal Based Robust Gesture Recognition Using Wifi","authors":"Sai Deepika Regani, Beibei Wang, Min Wu, K. Liu","doi":"10.1109/ICASSP40776.2020.9053420","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9053420","url":null,"abstract":"Gesture recognition using wireless sensing has opened up a plethora of applications in the field of human-computer interaction. However, most existing approaches either require wearables or tedious training/calibration, or are not robust. In this work, we propose WiGRep, a time reversal based gesture recognition approach using Wi-Fi, which can recognize different gestures by counting the number of repeating gesture segments. Built upon the time reversal phenomenon in RF transmission, the Time Reversal Resonating Strength (TRRS) is used to detect repeating patterns in a gesture. A robust low-complexity algorithm is proposed to accommodate possible variations of gestures and indoor environments. The main advantages of WiGRep are that it is calibration-free and independent of location and environment. Experiments performed in both line-of-sight and non-line-of-sight scenarios demonstrate detection rates of 99.6% and 99.4%, respectively, for a fixed false alarm rate of 5%.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"17 1","pages":"8309-8313"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73620392","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Matching Pursuit Based Dynamic Phase-Amplitude Coupling Measure","authors":"T. T. Munia, Selin Aviyente","doi":"10.1109/ICASSP40776.2020.9054503","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9054503","url":null,"abstract":"Long-distance neuronal communication in the brain is enabled by interactions across various oscillatory frequencies. One interaction that is gaining importance during cognitive brain functions is phase-amplitude coupling (PAC), where the phase of a slow oscillation modulates the amplitude of a fast oscillation. Current techniques for calculating PAC provide a numerical index that represents an average value across a pre-determined time window. However, there is growing empirical evidence that PAC is dynamic, varying across time. Current approaches to quantifying time-varying PAC rely on computing PAC over sliding short time windows. This approach suffers from the arbitrary selection of the window length and does not adapt to the signal dynamics. In this paper, we introduce a data-driven approach to quantify dynamic PAC. The proposed approach relies on decomposing the signal using matching pursuit (MP) to extract time- and frequency-localized atoms that best describe the given signal. These atoms are then used to compute PAC across time and frequency. Since the atoms are localized in time and frequency, we compute PAC only over the time-frequency regions determined by the selected atoms rather than over the whole time-frequency range. The proposed approach is evaluated on both simulated and real electroencephalogram (EEG) signals.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"9 1","pages":"1279-1283"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85416048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TDMF: Task-Driven Multilevel Framework for End-to-End Speaker Verification","authors":"Chen Chen, Jiqing Han","doi":"10.1109/ICASSP40776.2020.9052957","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9052957","url":null,"abstract":"In this paper, a task-driven multilevel framework (TDMF) is proposed for end-to-end speaker verification. The TDMF has four layers, and each layer acts differently on speaker models or representations to implement the functions of the universal background model (UBM), Gaussian mixture model (GMM), total variability model (TVM), and probabilistic linear discriminant analysis (PLDA). Unlike the typical i-vector method, the proposed TDMF can steer the optimal solution of each phase (layer) towards the direction required by the PLDA classifier. Moreover, different from most end-to-end neural network approaches, which first extract embeddings and then additionally compute the distance between two embeddings as the verification score, the TDMF can directly provide scores via the fourth-layer PLDA. The experimental results show that the TDMF achieves better performance than both the typical i-vector framework and the VGG-M convolutional neural network (CNN) framework.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"39 1","pages":"6809-6813"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85706615","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Subjective Quality Estimation Using PESQ For Hands-Free Terminals","authors":"S. Kurihara, M. Fukui, Suehiro Shimauchi, N. Harada","doi":"10.1109/ICASSP40776.2020.9053960","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9053960","url":null,"abstract":"Previous reports have mentioned the possibility that the subjective quality of an echo-suppressed speech signal can be estimated based on perceptual evaluation of speech quality (PESQ), but there are few experimental results. We propose third-party listening and conversational test procedures to assess whether PESQ can be used to predict the subjective quality of an acoustic echo canceler. In the proposed third-party listening test procedure, near-end and far-end signals are presented separately in the left and right channels of stereo playback, and differential category rating evaluation is applied to those stimuli to obtain differential mean opinion scores. In the proposed conversational test procedure, impaired and non-impaired reference signals are recorded during a conversation to make PESQ processing possible. Experimental results indicate a strong correlation between PESQ and subjective scores.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"40 1","pages":"921-925"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85844595","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Transmit Beamforming Design with Received-Interference Power Constraints: The Zero-Forcing Relaxation","authors":"E. Lagunas, A. Pérez-Neira, M. Lagunas, M. Vázquez","doi":"10.1109/ICASSP40776.2020.9053471","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9053471","url":null,"abstract":"The use of multi-antenna transmitters is emerging as an essential technology for future wireless communication systems. While Zero-Forcing Beamforming (ZFB) has become the most popular low-complexity transmit beamforming design, it has drawbacks stemming from the attempt to invert the channel coefficients towards the interfered users. In particular, ZFB performs poorly in the low Signal-to-Noise Ratio (SNR) regime and does not work when the interfered users outnumber the transmit antennas. In this paper, we study in detail an alternative transmit beamforming design framework, in which we allow some residual received-interference power instead of trying to null it out completely. Subsequently, we provide a closed-form, non-iterative optimal solution that avoids the sophisticated convex optimization techniques that compromise applicability to practical systems. Supporting results based on numerical simulations show that the proposed transmit beamforming performs close to the optimum with much lower computational complexity.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"29 1","pages":"4727-4731"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84154790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Voice based classification of patients with Amyotrophic Lateral Sclerosis, Parkinson’s Disease and Healthy Controls with CNN-LSTM using transfer learning","authors":"Jhansi Mallela, Aravind Illa, BN Suhas, Sathvik Udupa, Yamini Belur, A. Nalini, R. Yadav, P. Reddy, D. Gope, P. Ghosh","doi":"10.1109/ICASSP40776.2020.9053682","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9053682","url":null,"abstract":"In this paper, we consider 2-class and 3-class classification problems for classifying patients with Amyotrophic Lateral Sclerosis (ALS), Parkinson’s Disease (PD), and Healthy Controls (HC) using a CNN-LSTM network. Classification performance is examined for three different tasks, namely, Spontaneous speech (SPON), Diadochokinetic rate (DIDK), and Sustained phoneme production (PHON). Experiments are conducted using speech data recorded from 60 ALS, 60 PD, and 60 HC subjects. Classifications using SVM and DNN are considered as baseline schemes. Classification accuracy of ALS vs. HC (denoted ALS/HC) using CNN-LSTM shows an improvement of 10.40%, 4.22%, and 0.08% for the PHON, SPON, and DIDK tasks, respectively, over the best of the baseline schemes. Furthermore, the CNN-LSTM network achieves the highest PD/HC classification accuracy of 88.5% for the SPON task and the highest 3-class (ALS/PD/HC) classification accuracy of 85.24% for the DIDK task. Experiments using transfer learning with low-resource training data show that data from ALS benefits PD/HC classification and vice versa. Fine-tuning the weights of the 3-class (ALS/PD/HC) classifier for 2-class classification (PD/HC or ALS/HC) yields an absolute improvement of 2% in classification accuracy on the SPON task compared with a randomly initialized 2-class classifier.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"51 1","pages":"6784-6788"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78291256","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"JPEG Steganography with Side Information from the Processing Pipeline","authors":"Quentin Giboulot, R. Cogranne, P. Bas","doi":"10.1109/ICASSP40776.2020.9054486","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9054486","url":null,"abstract":"The current art in schemes using a deflection criterion, such as MiPOD, for JPEG steganography is either under-performing or on par with distortion-based schemes. We link this lack of performance to a poor estimate of the variance in the model of the noise on the cover image. In this paper, we propose a method to better estimate the variances of DCT coefficients by taking into account the dependencies between pixels that arise from the development pipeline. Using this estimate, we are able to extend statistically-informed steganographic schemes to the JPEG domain while significantly outperforming the current state-of-the-art in JPEG steganography. An extension of Gaussian Embedding to the JPEG domain using quantization error as side information is also formulated and shown to attain state-of-the-art performance.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"16 6 1","pages":"2767-2771"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78339393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Balancing Rates and Variance via Adaptive Batch-Sizes in First-Order Stochastic Optimization","authors":"Zhan Gao, Alec Koppel, Alejandro Ribeiro","doi":"10.1109/ICASSP40776.2020.9054292","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9054292","url":null,"abstract":"Stochastic gradient descent is a canonical tool for addressing stochastic optimization problems, and forms the bedrock of modern machine learning and statistics. In this work, we seek to balance the fact that attenuating step-sizes are required for exact asymptotic convergence with the fact that larger constant step-sizes learn faster in finite time, up to an error. To do so, rather than fixing the mini-batch size and step-size at the outset, we propose a strategy that allows these parameters to evolve adaptively. Specifically, the batch-size is set to be a piecewise-constant increasing sequence, where an increase occurs when a suitable error criterion is satisfied. Moreover, the step-size is selected as the one that yields the fastest convergence. The overall algorithm, the two-scale adaptive (TSA) scheme, is shown to inherit the exact asymptotic convergence of the stochastic gradient method. More importantly, the optimal error decreasing rate is achieved theoretically, along with an overall reduction in sample computational cost. Experimentally, we observe a favorable tradeoff relative to standard SGD schemes, absorbing their advantages, which illustrates the strong performance of the proposed TSA scheme.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"45 1","pages":"5385-5389"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73324555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi Image Depth from Defocus Network with Boundary Cue for Dual Aperture Camera","authors":"Gwangmo Song, Yumee Kim, K. Chun, Kyoung Mu Lee","doi":"10.1109/ICASSP40776.2020.9054346","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9054346","url":null,"abstract":"In this paper, we estimate depth information using two defocused images from a dual-aperture camera. Recent advances in deep learning techniques have increased the accuracy of depth estimation. In addition, methods using defocused images, in which an object is blurred according to its distance from the camera, have been widely studied. We further improve the accuracy of depth estimation by training the network on two images with different degrees of depth-of-field. Using images taken with different apertures of the same scene, we can determine the degree of blur in an image more accurately. In this work, we propose a novel deep convolutional network that estimates a depth map from dual-aperture images based on a boundary cue. Our proposed method achieves state-of-the-art performance on a synthetically modified NYU-v2 dataset. Furthermore, we built a new camera with fast variable apertures to construct a test environment in the real world. In particular, we collected a new dataset consisting of real-world vehicle driving scenes. Our proposed method shows excellent performance on this new dataset.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"22 1","pages":"2293-2297"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73384588","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Acoustic Scene Classification for Mismatched Recording Devices Using Heated-Up Softmax and Spectrum Correction","authors":"Truc The Nguyen, F. Pernkopf, Michal Kosmider","doi":"10.1109/ICASSP40776.2020.9053582","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9053582","url":null,"abstract":"Deep neural networks (DNNs) are successful in applications where the inference and training distributions match. In real-world scenarios, DNNs have to cope with truly new data samples during inference, potentially coming from a shifted data distribution. This usually causes a drop in performance. Acoustic scene classification (ASC) with different recording devices is one such situation. Furthermore, an imbalance in the quality and amount of data recorded by different devices poses severe challenges. In this paper, we introduce two calibration methods to tackle these challenges. In particular, we apply feature scaling to deal with the varying frequency responses of the recording devices. Furthermore, to account for the shifted data distribution, a heated-up softmax is embedded to calibrate the predictions of the model. We use robust and resource-efficient models, and show the efficiency of the heated-up softmax. Our ASC system reaches state-of-the-art performance on the development set of the DCASE 2019 challenge task 1B with only ~70K parameters, achieving 70.1% average classification accuracy for devices B and C. It performs on par with the best single-model system of the DCASE 2019 challenge and outperforms the baseline system by 28.7% (absolute).","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"38 1","pages":"126-130"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79998896","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}