{"title":"Progressive Co-Teaching for Ambiguous Speech Emotion Recognition","authors":"Yifei Yin, Yu Gu, Longshan Yao, Ying Zhou, Xuefeng Liang, He Zhang","doi":"10.1109/ICASSP39728.2021.9414494","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9414494","url":null,"abstract":"Speech emotion recognition is a challenging task due to the ambiguity of emotion, which makes it difficult to learn the features of emotion data using machine learning algorithms. However, previous studies conventionally ignore the ambiguity of emotion and treat the emotion data as the same difficulty level, which results in low recognition accuracy. Motivated by human and animal learning studies, we propose a novel method named Progressive Co-teaching (PCT) to learn speech emotion features from simple to difficult. PCT method automatically identifies the difficulty level of data by itself using loss values, and then each network exchanges easy instances with small loss to peer network for early training. The rest instances with large loss are added gradually for later training. The experiment results demonstrate that our method achieves an improvement of 3.8% and 1.27% on MAS and IEMOCAP database than the state-of-the-arts, respectively.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132502026","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identifying Spammers to Boost Crowdsourced Classification","authors":"Panagiotis A. Traganitis, G. Giannakis","doi":"10.1109/ICASSP39728.2021.9414242","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9414242","url":null,"abstract":"The present work addresses the problem of adversarial attacks in unsupervised ensemble or crowdsourcing classification tasks. Under certain conditions, it is shown, both analytically and through numerical tests, that spammers cause the most damage with respect to classification performance. To curb their effect, a novel spectral algorithm for spammer detection that utilizes second-order statistics of annotators, is developed and preliminary results on synthetic and real data showcase the potential of this approach.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130058053","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Acute Lymphoblastic Leukemia Detection Based on Adaptive Unsharpening and Deep Learning","authors":"A. Genovese, M. S. Hosseini, V. Piuri, K. Plataniotis, F. Scotti","doi":"10.1109/ICASSP39728.2021.9414362","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9414362","url":null,"abstract":"Computer Aided Diagnosis (CAD) systems are increasingly utilizing image analysis and Deep Learning (DL) techniques, due to their high accuracy in several medical imaging fields, including the detection of Acute Lymphoblastic (or Lymphocytic) Leukemia (ALL) from peripheral blood samples. However, no method in the literature has specifically analyzed the focus quality of ALL images or proposed a technique for sharpening the samples in an adaptive way for the purpose of classification. To address this issue, in this paper we propose the first machine learning-based approach able to enhance blood sample images by an adaptive unsharpening method. The method uses image processing techniques and DL to normalize the radius of the cell, estimate the focus quality, adaptively improve the sharpness of the images, and then perform the classification. We evaluated the methodology on a public database of ALL images, considering several state-of-the-art CNNs to perform the classification, with results showing the validity of the proposed approach. For a complete reproducibility of the work, the source code is available at: http://iebil.di.unimi.it/cnnALL/index.htm.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130451136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Ranked Similarity Loss Function with pair Weighting for Deep Metric Learning","authors":"Jian Wang, Zhichao Zhang, Dongmei Huang, Wei Song, Quanmiao Wei, Xinyue Li","doi":"10.1109/ICASSP39728.2021.9414668","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9414668","url":null,"abstract":"Metric learning is a widely-used method for image retrieval. The object of metric learning is to limit the distance between similar samples and increase the distance between samples of different classes through learning. Many studies tend to pay more attention to keep the distance between positive and negative samples, but ignore the distance between different classes of negative samples. In fact, query samples should be separated from negative samples of different classes by different distances. To address these problems, we propose to build a ranked similarity loss function with pair weighting (dubbed RMS loss). The proposed RMS loss can keep a distance between samples of different classes by weighting the negative samples according to the sorting order. Meanwhile, it further widens the distance between positive and negative samples by different processing of similarity of positive pairs and negative pairs. The effectiveness of our method is evaluated by extensive experiments on four public datasets and compared with state-of-the-art methods. The results show the proposed method obtains new performance on four public datasets, e.g., reaching 67.4% on CUB200 at Recall@1.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134408805","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Noise-Robust Adaptation Control for Supervised Acoustic System Identification Exploiting a Noise Dictionary","authors":"Thomas Haubner, Andreas Brendel, Mohamed Elminshawi, Walter Kellermann","doi":"10.1109/ICASSP39728.2021.9414180","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9414180","url":null,"abstract":"We present a noise-robust adaptation control strategy for block-online supervised acoustic system identification by exploiting a noise dictionary. The proposed algorithm takes advantage of the pronounced spectral structure which characterizes many types of interfering noise signals. We model the noisy observations by a linear Gaussian Discrete Fourier Transform-domain state space model whose parameters are estimated by an online generalized Expectation-Maximization algorithm. Unlike all other state-of-the-art approaches we suggest to model the covariance matrix of the observation probability density function by a dictionary model. We propose to learn the noise dictionary from training data, which can be gathered either offline or online whenever the system is not excited, while we infer the activations continuously. The proposed algorithm represents a novel machine-learning-based approach to noise-robust adaptation control which allows for faster convergence in applications characterized by high-level and non-stationary interfering noise signals and abrupt system changes.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133902493","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Task Aware Multi-Task Learning for Speech to Text Tasks","authors":"S. Indurthi, Mohd Abbas Zaidi, Nikhil Kumar Lakumarapu, Beomseok Lee, HyoJung Han, Seokchan Ahn, Sangha Kim, Chanwoo Kim, Inchul Hwang","doi":"10.1109/ICASSP39728.2021.9414703","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9414703","url":null,"abstract":"In general, the direct Speech-to-text translation (ST) is jointly trained with Automatic Speech Recognition (ASR), and Machine Translation (MT) tasks. However, the issues with the current joint learning strategies inhibit the knowledge transfer across these tasks. We propose a task modulation network which allows the model to learn task specific features, while learning the shared features simultaneously. This proposed approach removes the need for separate finetuning step resulting in a single model which performs all these tasks. This single model achieves a performance of 28.64 BLEU score on ST MuST-C English-German, WER of 11.61% on ASR TEDLium v3, 23.35 BLEU score on MT WMT’15 English-German task. This sets a new state-of-the-art performance (SOTA) on the ST task while outperforming the existing end-to-end ASR systems.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134370824","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gating Feature Dense Network for Single Anisotropic MR Image Super-Resolution","authors":"W. He, Yangjinan Hu, Lulu Wang, Zhongshi He, Jinglong Du","doi":"10.1109/ICASSP39728.2021.9414646","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9414646","url":null,"abstract":"High resolution (HR) magnetic resonance (MR) images are crucial for medical diagnosis. However, in practice, low resolution MR images are often acquired due to hardware limitation. In this work, we propose a gating feature dense network to reconstruct HR MR images from low resolution acquisitions, where we use local residual dense block (LRDB) as the backbone. We propose gating mechanism, which includes absorption gate and release gate, to adaptively introduce the informative features of previous LRDBs to current LRDB to solve the problem of insufficient features sharing. The absorption gate can fuse the output feature of LRDBs with adaptive weights, which allows the model to adaptively learn the effects of different LRDBs for MR image super-resolution (SR). Experimental results show that our proposed method achieves a new state-of-the-art quantitative and visual performance in anisotropic MR image SR.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131513337","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generalized Thinned Coprime Array for DOA Estimation","authors":"Junpeng Shi, Yongxiang Liu, Fang-qing Wen, Zhen Liu, Panhe Hu, Zhenghui Gong","doi":"10.1109/ICASSP39728.2021.9414146","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9414146","url":null,"abstract":"Owing to the large degrees of freedom and reduced mutual coupling by producing difference coarrays, nonuniform linear arrays have aroused great interest in direction of arrival (DOA) estimation. Previous works have presented some new sparse arrays, such as the thinned coprime array. In this paper, we propose a generalized thinned coprime array by introducing the flexible inter-element spacings, where the conventional one can be seen as a special case. We derive closed-form expression for the range of consecutive lags, written as the functions of the antenna numbers and inter-element spacings. We show that, after optimization, the proposed array can achieve more consecutive lags than the other coprime arrays. In particular, the optimized results also provide the minimum number of antenna pairs with small separation. Simulation results demonstrate the superiority of the proposed GTCA using the subspace-based method.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132960198","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unsupervised Stacked Capsule Autoencoder for Hyperspectral Image Classification","authors":"Erting Pan, Yong Ma, Xiaoguang Mei, Fan Fan, Jiayi Ma","doi":"10.1109/ICASSP39728.2021.9413664","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9413664","url":null,"abstract":"Since CapsNet [1] shattered all previous records of algorithms for image recognition, the capsule's conception has attracted bright attention. It interprets an object by the geometrical arrangement of parts. We think it can be transferred to hyperspectral images. In a hyperspectral data cube, each pixel spectrum can be regarded as a continuous curve representing its inherent properties. In the spatial domain, there are various spatial distributions in different positions and there is usually a specific structural relationship between adjacently distributed categories. Based on HSI data's aforementioned structural characteristics, combined with the stacked capsule autoencoder, we propose our model to achieve an unsupervised HSI classification. In our model, the ConvLSTM is employed to discover part capsules of HSI, and we utilize Set Transformer to encode relations among all parts and indicate object capsules. The decoders of both phases use Gaussian mixture models to reconstruct specific information. Experimental results of the Pavia Center dataset show the exceptional of our model.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127562213","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A New Tubular Structure Tracking Algorithm Based On Curvature-Penalized Perceptual Grouping","authors":"Li Liu, Da Chen, Minglei Shu, H. Shu, L. Cohen","doi":"10.1109/ICASSP39728.2021.9414114","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9414114","url":null,"abstract":"In this paper, we propose a new minimal path-based framework for minimally interactive tubular structure tracking in conjunction with a perceptual grouping scheme. The minimal path models have shown great advantages in tubular structures tracing. However, they suffer from shortcuts or short branches combination problems especially in the case of tubular network with complicated structures or background. Thus, we utilize the curvature-penalized minimal paths and the prescribed tubular trajectories to seek the desired shortest path. The proposed approach benefits from the local smoothness prior on tubular structures and the global optimality of the graph-based path searching scheme. Experimental results on synthetic and real images prove that the proposed model indeed obtains outperformance to state-of-the-art minimal path-based algorithms.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128942506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}