Harshavardhan Sundar, Weiran Wang, Ming Sun, Chao Wang
{"title":"Raw Waveform Based End-to-end Deep Convolutional Network for Spatial Localization of Multiple Acoustic Sources","authors":"Harshavardhan Sundar, Weiran Wang, Ming Sun, Chao Wang","doi":"10.1109/ICASSP40776.2020.9054090","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9054090","url":null,"abstract":"In this paper, we present an end-to-end deep convolutional neural network operating on multi-channel raw audio data to localize multiple simultaneously active acoustic sources in space. Previously reported deep-learning-based approaches work well in localizing a single source directly from multi-channel raw audio, but are not easily extended to localize multiple sources due to the well-known permutation problem. We propose a novel encoding scheme to represent the spatial coordinates of multiple sources, which facilitates 2D localization of multiple sources in an end-to-end fashion, avoiding the permutation problem and achieving arbitrary spatial resolution. Experiments on a simulated data set and real recordings from the AV16.3 Corpus demonstrate that the proposed method generalizes well to unseen test conditions, and outperforms a recent time difference of arrival (TDOA) based multiple source localization approach reported in the literature.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"134 1","pages":"4642-4646"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76427371","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deepak S. Kalhan, A. S. Bedi, Alec Koppel, K. Rajawat, Abhishek K. Gupta, Adrish Banerjee
{"title":"Projection Free Dynamic Online Learning","authors":"Deepak S. Kalhan, A. S. Bedi, Alec Koppel, K. Rajawat, Abhishek K. Gupta, Adrish Banerjee","doi":"10.1109/ICASSP40776.2020.9053771","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9053771","url":null,"abstract":"Projection-based algorithms are popular in the literature for online convex optimization with convex constraints, but the projection step is a bottleneck in practical implementations. To avoid this bottleneck, we propose a projection-free scheme based on Frank-Wolfe, where instead of online gradient steps, we use steps that are collinear with the gradient but guaranteed to be feasible. We establish performance in terms of dynamic regret, which quantifies cost accumulation as compared with the optimum at each individual time slot. Specifically, for convex losses, we establish $\\mathcal{O}\\left(T^{1/2}\\right)$ dynamic regret up to metrics of non-stationarity. We relax the algorithm’s required information to only noisy gradient estimates, i.e., partial feedback, and derive the corresponding dynamic regret bounds. Experiments on matrix completion and background separation in video demonstrate favorable performance of the proposed scheme.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"22 1","pages":"3957-3961"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76519411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Angular Discriminative Deep Feature Learning for Face Verification","authors":"Bowen Wu, Huaming Wu","doi":"10.1109/ICASSP40776.2020.9053675","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9053675","url":null,"abstract":"Thanks to the development of deep Convolutional Neural Networks (CNNs), face verification has rapidly achieved great success. Specifically, Deep Distance Metric Learning (DDML), as an emerging area, has achieved great improvements in the computer vision community. Softmax loss is widely used to supervise the training of most available CNN models, whereas feature normalization is often used to compute pair similarities at test time. In order to bridge the gap between training and testing, we require that the intra-class cosine similarity of the inner-product layer before softmax loss is larger than a margin in the training step, accompanied by the supervision signal of softmax loss. To enhance the discriminative power of the deeply learned features, we extend the intra-class constraint to force the intra-class cosine similarity to be larger than the mean of the nearest neighboring inter-class ones by a margin in the normalized exponential feature projection space. Extensive experiments on the Labeled Faces in the Wild (LFW) and YouTube Faces (YTF) datasets demonstrate that the proposed approaches achieve competitive performance for the open-set face verification task.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"98 1","pages":"2133-2137"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76536403","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yunjie Wu, Zhengxing Sun, Youcheng Song, Yunhan Sun, Jinlong Shi
{"title":"Slicenet: Slice-Wise 3D Shapes Reconstruction from Single Image","authors":"Yunjie Wu, Zhengxing Sun, Youcheng Song, Yunhan Sun, Jinlong Shi","doi":"10.1109/ICASSP40776.2020.9054674","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9054674","url":null,"abstract":"3D object reconstruction from a single image is a highly ill-posed problem, requiring strong prior knowledge of 3D shapes. Deep learning methods are popular for this task. In particular, most works use 3D deconvolution to generate 3D shapes. However, the resolution of the results is limited by the high resource consumption of 3D deconvolution. In this paper, we propose SliceNet, which sequentially generates 2D slices of 3D shapes with shared 2D deconvolution parameters. To capture relations between slices, an RNN is also introduced. Our model has three main advantages. First, the introduction of the RNN allows the CNN to focus more on local geometry details, improving the fine-grained plausibility of the results. Second, replacing 3D deconvolution with 2D deconvolution greatly reduces memory consumption, enabling higher resolution in the final results. Third, a slice-aware attention mechanism is designed to provide dynamic information for the generation of each slice, which helps model the differences between slices and makes the learning process easier. Experiments on both synthesized data and real data illustrate the effectiveness of our method.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"40 1","pages":"1833-1837"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76188018","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust Phase Retrieval with Outliers","authors":"Xue Jiang, H. So, Xingzhao Liu","doi":"10.1109/ICASSP40776.2020.9053060","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9053060","url":null,"abstract":"An outlier-resistant phase retrieval algorithm based on the alternating direction method of multipliers (ADMM) is devised in this paper. Instead of the widely used least squares criterion, which is optimal only for Gaussian noise, we adopt the least absolute deviation criterion to enhance the robustness against outliers. Considering both intensity- and amplitude-based observation models, the framework of ADMM is developed to solve the resulting non-differentiable optimization problems. It is demonstrated that the core subproblem of ADMM is the proximity operator of the ℓ1-norm, which can be computed efficiently by soft-thresholding in each iteration. Simulation results are provided to validate the accuracy and efficiency of the proposed approach compared to the existing schemes.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"81 1","pages":"5320-5324"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87474215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploiting Vocal Tract Coordination Using Dilated CNNS For Depression Detection In Naturalistic Environments","authors":"Zhaocheng Huang, J. Epps, Dale Joachim","doi":"10.1109/ICASSP40776.2020.9054323","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9054323","url":null,"abstract":"Depression detection from speech continues to attract significant research attention but remains a major challenge, particularly when the speech is acquired from diverse smartphones in natural environments. Analysis methods based on vocal tract coordination have shown great promise in depression and cognitive impairment detection for quantifying relationships between features over time through eigenvalues of multi-scale cross-correlations. Motivated by the success of these methods, this paper proposes a novel way to extract full vocal tract coordination (FVTC) features by use of convolutional neural networks (CNNs), overcoming earlier shortcomings. Evaluations of the proposed FVTC-CNN structure on depressed speech data show improvements in mean F1 scores of at least 16.4% under clean conditions and comparable results under noisy conditions relative to existing VTC baseline systems.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"15 1","pages":"6549-6553"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87922450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Ghodsi, Xiaofeng Liu, J. Apfel, Rodrigo Cabrera, Eugene Weinstein
{"title":"Rnn-Transducer with Stateless Prediction Network","authors":"M. Ghodsi, Xiaofeng Liu, J. Apfel, Rodrigo Cabrera, Eugene Weinstein","doi":"10.1109/ICASSP40776.2020.9054419","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9054419","url":null,"abstract":"The RNN-Transducer (RNNT) outperforms classic Automatic Speech Recognition (ASR) systems when a large amount of supervised training data is available. For low-resource languages, the RNNT models overfit, and cannot directly take advantage of additional large text corpora as in classic ASR systems. We focus on the prediction network of the RNNT, since it is believed to be analogous to the Language Model (LM) in the classic ASR systems. We pre-train the prediction network with text-only data and find that this does not help. Moreover, removing the recurrent layers from the prediction network, which makes the prediction network stateless, performs virtually as well as the original RNNT model when using wordpieces. The stateless prediction network does not depend on the previous output symbols, except the last one. Therefore, it simplifies the RNNT architecture and inference. Our results suggest that the RNNT prediction network does not function as the LM in classical ASR. Instead, it merely helps the model align to the input audio, while the RNNT encoder and joint networks capture both the acoustic and the linguistic information.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"39 1","pages":"7049-7053"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87004778","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Favorable Propagation and Linear Multiuser Detection for Distributed Antenna Systems","authors":"R. Gholami, L. Cottatellucci, D. Slock","doi":"10.1109/ICASSP40776.2020.9053449","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9053449","url":null,"abstract":"Cell-free MIMO, employing distributed antenna systems (DAS), is a promising approach to deal with the capacity crunch of next generation wireless communications. In this paper, we consider a wireless network with transmit and receive antennas distributed according to homogeneous point processes. The received signals are jointly processed at a central processing unit. We study if the favorable propagation properties, which enable almost optimal low complexity detection via matched filtering in massive MIMO systems, hold for DAS with line of sight (LoS) channels and general attenuation exponent. Making use of Euclidean random matrices (ERM) and their moments, we show that the analytical conditions for favorable propagation are not satisfied. Hence, we propose multistage detectors, of which the matched filter represents the initial stage. We show that polynomial expansion detectors and multistage Wiener filters coincide in DAS and substantially outperform matched filtering. Simulation results are presented which validate the analytical results.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"19 1","pages":"5190-5194"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87517141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yibin Tang, Xufei Li, Ying Chen, Y. Zhong, A. Jiang, Xiaofeng Liu
{"title":"High-Accuracy Classification of Attention Deficit Hyperactivity Disorder with L2,1-Norm Linear Discriminant Analysis","authors":"Yibin Tang, Xufei Li, Ying Chen, Y. Zhong, A. Jiang, Xiaofeng Liu","doi":"10.1109/ICASSP40776.2020.9053391","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9053391","url":null,"abstract":"Attention Deficit Hyperactivity Disorder (ADHD) is a high-incidence neurobehavioral disorder in school-age children. Its neurobiological classification is meaningful for clinicians. The existing ADHD classification methods suffer from two problems, i.e., insufficient data and noise disturbance. Here, a high-accuracy classification method is proposed, which uses brain Functional Connectivity (FC) as material for ADHD feature analysis. In detail, we introduce a binary hypothesis testing framework as the classification outline to cope with the insufficient data in the ADHD database. Under binary hypotheses, the FCs of test data are allowed to be used for training and thus affect the subspace learning of training data. To overcome noise disturbance, an l2,1-norm LDA model is adopted to robustly learn ADHD features in subspaces. The subspace energies of training data under binary hypotheses are then calculated, and an energy-based comparison is finally performed to identify ADHD individuals. On the ADHD-200 database, the experiments show that our method outperforms other state-of-the-art methods with an average accuracy of 97.6%.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"108 1","pages":"1170-1174"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87589816","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tian Lan, Yilan Lyu, Guoqiang Hui, Refuoe Mokhosi, Sen Li, Qiao Liu
{"title":"Redundant Convolutional Network With Attention Mechanism For Monaural Speech Enhancement","authors":"Tian Lan, Yilan Lyu, Guoqiang Hui, Refuoe Mokhosi, Sen Li, Qiao Liu","doi":"10.1109/ICASSP40776.2020.9053277","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9053277","url":null,"abstract":"The redundant convolutional encoder-decoder network has proven useful in speech enhancement tasks. It can capture localized time-frequency details of speech signals through both the fully convolutional network structure and the feature selection capability resulting from the encoder-decoder mechanism. However, it does not explicitly consider the signal filtering mechanism, which we regard as important for speech enhancement models. In this study, we introduce an attention mechanism into the convolutional encoder-decoder model. This mechanism adaptively filters channel-wise feature responses by explicitly modeling attentions (on speech versus noise signals) between channels. Experimental results show that the proposed attention model is effective in separating speech signals from background noise, and performs especially well in unseen noise conditions compared to other state-of-the-art models.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"29 1","pages":"6654-6658"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87751460","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}