{"title":"Robust Binary Loss for Multi-Category Classification with Label Noise","authors":"Defu Liu, Guowu Yang, Jinzhao Wu, Jiayi Zhao, Fengmao Lv","doi":"10.1109/ICASSP39728.2021.9414493","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9414493","url":null,"abstract":"Deep learning has achieved tremendous success in image classification. However, the corresponding performance leap relies heavily on large-scale accurate annotations, which are usually hard to collect in reality. It is essential to explore methods that can train deep models effectively under label noise. To address the problem, we propose to train deep models with robust binary loss functions. To be specific, we tackle the K-class classification task by using K binary classifiers. We can immediately use multi-category large margin classification approaches, e.g., Pairwise-Comparison (PC) or One-Versus-All (OVA), to jointly train the binary classifiers for multi-category classification. Our method can be robust to label noise if symmetric functions, e.g., the sigmoid loss or the ramp loss, are employed as the binary loss function in the framework of risk minimization. The learning theory reveals that our method can be inherently tolerant to label noise in multi-category classification tasks. Extensive experiments over different datasets with different types of label noise are conducted. The experimental results clearly confirm the effectiveness of our method.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122701894","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Privacy-Accuracy Trade-Off of Inference as Service","authors":"Yulu Jin, L. Lai","doi":"10.1109/ICASSP39728.2021.9413438","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9413438","url":null,"abstract":"In this paper, we propose a general framework to provide a desirable trade-off between inference accuracy and privacy protection in the inference as service scenario. Instead of sending data directly to the server, the user will preprocess the data through a privacy-preserving mapping, which will increase privacy protection but reduce inference accuracy. To properly address the trade-off between privacy protection and inference accuracy, we formulate an optimization problem to find the optimal privacy-preserving mapping. Even though the problem is non-convex in general, we characterize nice structures of the problem and develop an iterative algorithm to find the desired privacy-preserving mapping.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122733087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Lung Auscultation Using Acoustic Biomarkers for Abnormal Respiratory Sound Event Detection","authors":"Upasana Tiwari, Swapnil Bhosale, Rupayan Chakraborty, S. Kopparapu","doi":"10.1109/ICASSP39728.2021.9414845","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9414845","url":null,"abstract":"Lung Auscultation is a non-invasive process of distinguishing normal respiratory sounds from abnormal ones by analyzing the airflow along the respiratory tract. With developments in the Deep Learning (DL) techniques and wider access to anonymized medical data, automatic detection of specific sounds such as crackles and wheezes have been gaining popularity. In this paper, we propose to use two sets of diversified acoustic biomarkers extracted using Discrete Wavelet Transform (DWT) and deep encoded features from the intermediate layer of a pre-trained Audio Event Detection (AED) model trained using sounds from daily activities. First set of biomarkers highlight the time frequency localization characteristics obtained from DWT coefficients. However, the second set of deep encoded biomarkers captures a generalized reliable representation, and thus indemnifies the scarcity of training samples and the class imbalance in dataset. The model trained using these features achieves a 15.05% increase in terms of the specificity over the baseline model that uses spectrogram features. Moreover, ensemble of DWT features and deep encoded feature based models show absolute improvements of 8.32%, 6.66% and 7.40% in terms of sensitivity, specificity and ICBHI-score, respectively, and clearly outperforms the state-of-the-art with a significant margin.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122465927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Periodic Frame Learning Approach for Accurate Landmark Localization in M-Mode Echocardiography","authors":"Yinbing Tian, Shibiao Xu, Li Guo, Fu'ze Cong","doi":"10.1109/ICASSP39728.2021.9414375","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9414375","url":null,"abstract":"Anatomical landmark localization has been a key challenge for medical image analysis. Existing researches mostly adopt CNN as the main architecture for landmark localization while they are not applicable to process image modalities with periodic structure. In this paper, we propose a novel two-stage frame-level detection and heatmap regression model for accurate landmark localization in m-mode echocardiography, which promotes better integration between global context information and local appearance. Specifically, a periodic frame detection module with LSTM is designed to model periodic context and detect frames of systole and diastole from original echocardiography. Next, a CNN based heatmap regression model is introduced to predict landmark localization in each systolic or diastolic local region. Experiment results show that the proposed model achieves average distance error of 9.31, which is at a reduction by 24% comparing to baseline models.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122477849","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improvements to Prosodic Alignment for Automatic Dubbing","authors":"Yogesh Virkar, Marcello Federico, Robert Enyedi, R. Barra-Chicote","doi":"10.1109/ICASSP39728.2021.9414966","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9414966","url":null,"abstract":"Automatic dubbing is an extension of speech-to-speech translation such that the resulting target speech is carefully aligned in terms of duration, lip movements, timbre, emotion, prosody, etc. of the speaker in order to achieve audiovisual coherence. Dubbing quality strongly depends on isochrony, i.e., arranging the translation of the original speech to optimally match its sequence of phrases and pauses. To this end, we present improvements to the prosodic alignment component of our recently introduced dubbing architecture. We present empirical results for four dubbing directions – English to French, Italian, German and Spanish – on a publicly available collection of TED Talks. Compared to previous work, our enhanced prosodic alignment model significantly improves prosodic alignment accuracy and provides segmentation perceptibly better or on par with manually annotated reference segmentation.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114521812","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sparse Parameter Estimation for PMCW MIMO Radar Using Few-Bit ADCs","authors":"Chao-Yi Wu, Jian Li, T. Wong","doi":"10.1109/ICASSP39728.2021.9414267","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9414267","url":null,"abstract":"In this work, we consider target parameter estimation of phase-modulated continuous-wave (PMCW) multiple-input multiple-output (MIMO) radars with few-bit analog-to-digital converters (ADCs). We formulate the estimation problem as a sparse signal recovery problem and modify the fast iterative shrinkage-thresholding algorithm (FISTA) to solve it. The ℓ2,1-norm is adopted to promote the sparsity in the range-Doppler-angle domain. Simulation results show that using few-bit ADCs can achieve comparable performance to many-bit ADCs when targets are widely separated. However, if targets are spaced closely, performance losses can occur when 1-bit ADCs are applied.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114574315","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DoA estimation of a hidden RF source exploiting simple backscatter radio tags","authors":"G. Vougioukas, A. Bletsas","doi":"10.1109/ICASSP39728.2021.9414918","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9414918","url":null,"abstract":"Conventional direction of arrival (DoA) techniques employ multi-antenna receivers with increased complexity and cost. This work emulates a multi-antenna system using a single-antenna receiver and exploiting the beauty and simplicity of backscatter radio. More specifically, a number of simple backscatter radio tags offer copies of the hidden RF source, relayed in space and shifted in frequency, while requiring minimal time-synchronisation. DoA of a hidden RF source was estimated with an error of less than 5 degrees, exploiting a small number of simple, ultra-low-cost backscattering tags.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121972477","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Closer Look at Audio-Visual Multi-Person Speech Recognition and Active Speaker Selection","authors":"Otavio Braga, O. Siohan","doi":"10.1109/ICASSP39728.2021.9414160","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9414160","url":null,"abstract":"Audio-visual automatic speech recognition is a promising approach to robust ASR under noisy conditions. However, up until recently it had been traditionally studied in isolation assuming the video of a single speaking face matches the audio, and selecting the active speaker at inference time when multiple people are on screen was put aside as a separate problem. As an alternative, recent work has proposed to address the two problems simultaneously with an attention mechanism, baking the speaker selection problem directly into a fully differentiable model. One interesting finding was that the attention indirectly learns the association between the audio and the speaking face even though this correspondence is never explicitly provided at training time. In the present work we further investigate this connection and examine the interplay between the two problems. With experiments involving over 50 thousand hours of public YouTube videos as training data, we first evaluate the accuracy of the attention layer on an active speaker selection task. Secondly, we show under closer scrutiny that an end-to-end model performs at least as well as a considerably larger two-step system that utilizes a hard decision boundary under various noise conditions and number of parallel face tracks.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122033490","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Radio Frequency Based Heart Rate Variability Monitoring","authors":"Fengyu Wang, Xiaolu Zeng, Chenshu Wu, Beibei Wang, K. Liu","doi":"10.1109/ICASSP39728.2021.9413465","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9413465","url":null,"abstract":"Heart Rate Variability (HRV), which measures the fluctuation of heartbeat intervals, has been considered as an important indicator for general health evaluation. In this paper, we present mmHRV, a contact-free HRV monitoring system using commercial millimeter-wave (mmWave) radio. We devise a heartbeat signal extractor, which can optimize the decomposition of the phase of the channel information modulated by the chest movement, and thus estimate the heartbeat signal. The exact time of heartbeats is estimated by finding the peak location of the heartbeat signal while the Inter-Beat Intervals (IBIs) can be further derived for evaluating the HRV metrics. Experimental results show that mmHRV can measure the HRV accurately with 3.68ms average error of mean IBI (w.r.t. 99.49% accuracy) based on the experiments over 10 participants.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122065170","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Arrhythmia Classification with Heartbeat-Aware Transformer","authors":"Bin Wang, Chang Liu, Chuanyan Hu, Xudong Liu, Jun Cao","doi":"10.1109/ICASSP39728.2021.9413938","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9413938","url":null,"abstract":"Electrocardiography (ECG) is a conventional method in arrhythmia diagnosis. In this paper, we proposed a novel neural network model which treats typical heartbeat classification task as ‘Translation’ problem. By introducing Transformer structure into model, and adding heartbeat-aware attention mechanism to enhance the alignment between encoded sequence and decoded sequence, after trained with ECG database, (which are collected from 200k patients in over 2000 hospitals for more than 10 years), the validation result of independent test dataset shows that this new heartbeat-aware Transformer model can outperform classic Transformer and other sequence to sequence methods. Finally, we show that the visualization of encoder-decoder attention weights provides more interpretable information about how a Transformer make a diagnosis based on raw ECG signals, which has guiding significance in clinical diagnosis.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122142524","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}