ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): Latest Publications

Raw Waveform Based End-to-end Deep Convolutional Network for Spatial Localization of Multiple Acoustic Sources
Harshavardhan Sundar, Weiran Wang, Ming Sun, Chao Wang
{"title":"Raw Waveform Based End-to-end Deep Convolutional Network for Spatial Localization of Multiple Acoustic Sources","authors":"Harshavardhan Sundar, Weiran Wang, Ming Sun, Chao Wang","doi":"10.1109/ICASSP40776.2020.9054090","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9054090","url":null,"abstract":"In this paper, we present an end-to-end deep convolutional neural network operating on multi-channel raw audio data to localize multiple simultaneously active acoustic sources in space. Previously reported deep learning based approaches work well in localizing a single source directly from multi-channel raw-audio, but are not easily extendable to localize multiple sources due to the well known permutation problem. We propose a novel encoding scheme to represent the spatial coordinates of multiple sources, which facilitates 2D localization of multiple sources in an end-to-end fashion, avoiding the permutation problem and achieving arbitrary spatial resolution. Experiments on a simulated data set and real recordings from the AV16.3 Corpus demonstrate that the proposed method generalizes well to unseen test conditions, and outperforms a recent time difference of arrival (TDOA) based multiple source localization approach reported in the literature.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"134 1","pages":"4642-4646"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76427371","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 28
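The abstract leaves the encoding scheme itself unspecified, but the key property it needs is order invariance over sources. Below is a minimal sketch of one such grid-based heat-map target; the grid size, room dimensions, Gaussian kernel, and the name encode_sources are illustrative assumptions, not the paper's actual scheme:

```python
import numpy as np

def encode_sources(coords, grid=(40, 40), room=(5.0, 5.0), sigma=0.3):
    """Order-invariant grid encoding: each source adds a Gaussian bump.

    coords: list of (x, y) source positions in metres, in any order.
    Summing bumps over sources makes the target independent of source
    ordering, which sidesteps the permutation problem that plagues
    per-source regression targets.
    """
    gx = np.linspace(0, room[0], grid[0])
    gy = np.linspace(0, room[1], grid[1])
    X, Y = np.meshgrid(gx, gy, indexing="ij")
    heat = np.zeros(grid)
    for (sx, sy) in coords:
        heat += np.exp(-((X - sx) ** 2 + (Y - sy) ** 2) / (2 * sigma ** 2))
    return np.clip(heat, 0.0, 1.0)

# Two simultaneous sources; swapping their order yields the same target.
target = encode_sources([(1.2, 3.4), (4.0, 1.1)])
```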
Projection Free Dynamic Online Learning
Deepak S. Kalhan, A. S. Bedi, Alec Koppel, K. Rajawat, Abhishek K. Gupta, Adrish Banerjee
{"title":"Projection Free Dynamic Online Learning","authors":"Deepak S. Kalhan, A. S. Bedi, Alec Koppel, K. Rajawat, Abhishek K. Gupta, Adrish Banerjee","doi":"10.1109/ICASSP40776.2020.9053771","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9053771","url":null,"abstract":"Projection based algorithms are popular in the literature for online convex optimization with convex constraints and the projection step results in a bottleneck for the practical implementation of the algorithms. To avoid this bottleneck, we propose a projection-free scheme based on Frank-Wolfe: where instead of online gradient steps, we use steps that are collinear with the gradient but guaranteed to be feasible. We establish performance in terms of dynamic regret, which quantifies cost accumulation as compared with the optimal at each individual time slot. Specifically, for convex losses, we establish $mathcal{O}left( {{T^{1/2}}} right)$ dynamic regret up to metrics of non-stationarity. We relax the algorithm’s required information to only noisy gradient estimates, i.e., partial feedback and derived the dynamic regret bounds. Experiments on matrix completion problem and background separation in video demonstrate favorable performance of the proposed scheme.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"22 1","pages":"3957-3961"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76519411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
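To make the projection-free idea concrete, here is a minimal online Frank-Wolfe sketch over an ℓ1-ball constraint set. The constraint set, step-size schedule, and oracle interface are assumptions for illustration; the paper's analysis covers general convex sets, dynamic comparators, and noisy gradient feedback:

```python
import numpy as np

def lmo_l1_ball(grad, radius=1.0):
    """Linear minimization oracle over the l1 ball:
    argmin_{||s||_1 <= radius} <grad, s>.
    A cheap closed form: this is what replaces the projection step."""
    s = np.zeros_like(grad)
    i = np.argmax(np.abs(grad))
    s[i] = -radius * np.sign(grad[i])
    return s

def online_frank_wolfe(grad_fn, x0, T, radius=1.0):
    """One Frank-Wolfe step per round: move toward the feasible vertex
    s_t, so every iterate stays feasible without ever projecting."""
    x = x0.copy()
    for t in range(1, T + 1):
        g = grad_fn(x, t)            # (possibly noisy) gradient feedback
        s = lmo_l1_ball(g, radius)   # feasible direction from the oracle
        gamma = 2.0 / (t + 2)        # standard Frank-Wolfe step size
        x = (1 - gamma) * x + gamma * s
    return x

# Toy usage: track a slowly drifting quadratic loss inside the l1 ball.
gfn = lambda x, t: 2 * (x - 0.1 * np.sin(0.01 * t) * np.ones(5))
x_T = online_frank_wolfe(gfn, np.zeros(5), T=500)
```

The point of the oracle is that minimizing a linear function over the ℓ1 ball has a closed-form vertex solution, far cheaper than a Euclidean projection onto the same set.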
Angular Discriminative Deep Feature Learning for Face Verification
Bowen Wu, Huaming Wu
{"title":"Angular Discriminative Deep Feature Learning for Face Verification","authors":"Bowen Wu, Huaming Wu","doi":"10.1109/ICASSP40776.2020.9053675","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9053675","url":null,"abstract":"Thanks to the development of deep Convolutional Neural Network (CNN), face verification has achieved great success rapidly. Specifically, Deep Distance Metric Learning (DDML), as an emerging area, has achieved great improvements in computer vision community. Softmax loss is widely used to supervise the training of most available CNN models. Whereas, feature normalization is often used to compute the pair similarities when testing. In order to bridge the gap between training and testing, we require that the intra-class cosine similarity of the inner-product layer before softmax loss is larger than a margin in the training step, accompanied by the supervision signal of softmax loss. To enhance the discriminative power of the deeply learned features, we extend the intra-class constraint to force the intra-class cosine similarity larger than the mean of nearest neighboring inter-class ones with a margin in the normalized exponential feature projection space. Extensive experiments on Labeled Face in the Wild (LFW) and Youtube Faces (YTF) datasets demonstrate that the proposed approaches achieve competitive performance for the open-set face verification task.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"98 1","pages":"2133-2137"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76536403","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
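A hedged sketch of the basic intra-class cosine-margin idea described above: features and class weight vectors are normalized, and a hinge penalty activates whenever the target-class cosine similarity drops below a margin. The margin value, weighting, and PyTorch formulation are assumptions for illustration, not the authors' exact loss:

```python
import torch
import torch.nn.functional as F

def cosine_margin_penalty(feats, weights, labels, m=0.35):
    """Penalize samples whose cosine similarity to their own class
    weight vector falls below margin m, computed in the normalized
    space also used at test time.
    feats: (N, D) deep features; weights: (C, D) inner-product layer."""
    f = F.normalize(feats, dim=1)          # unit-norm features
    w = F.normalize(weights, dim=1)        # unit-norm class vectors
    cos = f @ w.t()                        # (N, C) cosine similarities
    cos_target = cos[torch.arange(len(labels)), labels]
    return F.relu(m - cos_target).mean()   # hinge: active only when cos < m

# Used alongside the softmax supervision signal during training, e.g.:
# loss = F.cross_entropy(logits, labels) + lam * cosine_margin_penalty(
#     feats, weights, labels)
```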
Slicenet: Slice-Wise 3D Shapes Reconstruction from Single Image
Yunjie Wu, Zhengxing Sun, Youcheng Song, Yunhan Sun, Jinlong Shi
{"title":"Slicenet: Slice-Wise 3D Shapes Reconstruction from Single Image","authors":"Yunjie Wu, Zhengxing Sun, Youcheng Song, Yunhan Sun, Jinlong Shi","doi":"10.1109/ICASSP40776.2020.9054674","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9054674","url":null,"abstract":"3D object reconstruction from a single image is a highly ill-posed problem, requiring strong prior knowledge of 3D shapes. Deep learning methods are popular for this task. Especially, most works utilized 3D deconvolution to generate 3D shapes. However, the resolution of results is limited by the high resource consumption of 3D deconvolution. In this paper, we propose SliceNet, sequentially generating 2D slices of 3D shapes with shared 2D deconvolution parameters. To capture relations between slices, the RNN is also introduced. Our model has three main advantages: First, the introduction of RNN allows the CNN to focus more on local geometry details,improving the results’ fine-grained plausibility. Second, replacing 3D deconvolution with 2D deconvolution reducs much consumption of memory, enabling higher resolution of final results. Third, an slice-aware attention mechanism is designed to provide dynamic information for each slice’s generation, which helps modeling the difference between multiple slices, making the learning process easier. Experiments on both synthesized data and real data illustrate the effectiveness of our method.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"40 1","pages":"1833-1837"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76188018","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
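The memory argument is easy to see in code: one shared stack of 2D deconvolutions decodes every slice, while a GRU supplies inter-slice context. The sketch below is a minimal stand-in under assumed dimensions (32 slices at 32x32), not the published SliceNet architecture, and it omits the slice-aware attention mechanism:

```python
import torch
import torch.nn as nn

class SliceDecoder(nn.Module):
    """Generate a 3D voxel grid as a sequence of 2D slices: a GRU carries
    inter-slice context, and a single 2D deconv stack shared across all
    slices avoids memory-hungry 3D deconvolutions."""
    def __init__(self, z_dim=256, n_slices=32):
        super().__init__()
        self.n_slices = n_slices
        self.rnn = nn.GRU(z_dim, z_dim, batch_first=True)
        self.deconv = nn.Sequential(  # shared 2D decoder: 1x1 -> 32x32
            nn.ConvTranspose2d(z_dim, 128, 4), nn.ReLU(),      # 4x4
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),   # 8x8
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),    # 16x16
            nn.ConvTranspose2d(32, 1, 4, 2, 1), nn.Sigmoid(),  # 32x32 occupancy
        )

    def forward(self, z):                  # z: (B, z_dim) image embedding
        seq = z.unsqueeze(1).repeat(1, self.n_slices, 1)
        h, _ = self.rnn(seq)               # per-slice states with context
        B, S, D = h.shape
        slices = self.deconv(h.reshape(B * S, D, 1, 1))
        return slices.reshape(B, S, slices.shape[-2], slices.shape[-1])

vox = SliceDecoder()(torch.randn(2, 256))  # (2, 32, 32, 32) volume
```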
Robust Phase Retrieval with Outliers
Xue Jiang, H. So, Xingzhao Liu
{"title":"Robust Phase Retrieval with Outliers","authors":"Xue Jiang, H. So, Xingzhao Liu","doi":"10.1109/ICASSP40776.2020.9053060","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9053060","url":null,"abstract":"An outlier-resistance phase retrieval algorithm based on alternating direction method of multipliers (ADMM) is devised in this paper. Instead of the widely used least squares criterion that is only optimal for Gaussian noise environment, we adopt the least absolute deviation criterion to enhance the robustness against outliers. Considering both intensityand amplitude-based observation models, the framework of ADMM is developed to solve the resulting non-differentiable optimization problems. It is demonstrated that the core subproblem of ADMM is the proximity operator of the ℓ1-norm, which can be computed efficiently by soft-thresholding in each iteration. Simulation results are provided to validate the accuracy and efficiency of the proposed approach compared to the existing schemes.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"81 1","pages":"5320-5324"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87474215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
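The soft-thresholding step mentioned in the abstract has a simple closed form. The snippet below implements the proximity operator of the ℓ1-norm that each ADMM iteration invokes (the demo values are illustrative):

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximity operator of the l1-norm, the core ADMM subproblem:
    prox_{tau*||.||_1}(v) = sign(v) * max(|v| - tau, 0), elementwise."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

# Entries within tau of zero are zeroed; the rest shrink toward zero.
print(soft_threshold(np.array([-2.0, 0.3, 1.5]), 0.5))  # [-1.5  0.   1. ]
```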
Exploiting Vocal Tract Coordination Using Dilated CNNS For Depression Detection In Naturalistic Environments
Zhaocheng Huang, J. Epps, Dale Joachim
{"title":"Exploiting Vocal Tract Coordination Using Dilated CNNS For Depression Detection In Naturalistic Environments","authors":"Zhaocheng Huang, J. Epps, Dale Joachim","doi":"10.1109/ICASSP40776.2020.9054323","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9054323","url":null,"abstract":"Depression detection from speech continues to attract significant research attention but remains a major challenge, particularly when the speech is acquired from diverse smartphones in natural environments. Analysis methods based on vocal tract coordination have shown great promise in depression and cognitive impairment detection for quantifying relationships between features over time through eigenvalues of multi-scale cross-correlations. Motivated by the success of these methods, this paper proposes a novel way to extract full vocal tract coordination (FVTC) features by use of convolutional neural networks (CNNs), overcoming earlier shortcomings. Evaluations of the proposed FVTC-CNN structure on depressed speech data show improvements in mean F1 scores of at least 16.4% under clean conditions and comparable results under noisy conditions relative to existing VTC baseline systems.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"15 1","pages":"6549-6553"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87922450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 26
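For context, a hedged sketch of the classical vocal-tract-coordination representation the abstract builds on: eigenvalues of cross-correlation matrices between feature channels at several delay scales. The delay values and feature choice are assumptions; the paper's FVTC-CNN replaces this hand-crafted pipeline with learned dilated convolutions:

```python
import numpy as np

def vtc_eigenvalues(feats, delays=(1, 3, 7, 15)):
    """Eigenvalue spectra of delayed channel cross-correlations.
    feats: (T, D) frame-level features (e.g., formants or MFCCs).
    Returns the concatenated, sorted eigenvalues for each delay scale."""
    feats = feats - feats.mean(axis=0)       # zero-mean each channel
    eigs = []
    for d in delays:
        a, b = feats[:-d], feats[d:]
        c = (a.T @ b) / len(a)               # (D, D) delayed cross-correlation
        c = 0.5 * (c + c.T)                  # symmetrize before eigendecomposition
        eigs.append(np.sort(np.linalg.eigvalsh(c))[::-1])
    return np.concatenate(eigs)

feature_vec = vtc_eigenvalues(np.random.randn(500, 16))  # toy input
```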
Rnn-Transducer with Stateless Prediction Network
M. Ghodsi, Xiaofeng Liu, J. Apfel, Rodrigo Cabrera, Eugene Weinstein
{"title":"Rnn-Transducer with Stateless Prediction Network","authors":"M. Ghodsi, Xiaofeng Liu, J. Apfel, Rodrigo Cabrera, Eugene Weinstein","doi":"10.1109/ICASSP40776.2020.9054419","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9054419","url":null,"abstract":"The RNN-Transducer (RNNT) outperforms classic Automatic Speech Recognition (ASR) systems when a large amount of supervised training data is available. For low-resource languages, the RNNT models overfit, and can not directly take advantage of additional large text corpora as in classic ASR systems.We focus on the prediction network of the RNNT, since it is believed to be analogous to the Language Model (LM) in the classic ASR systems. We pre-train the prediction network with text-only data, which is not helpful. Moreover, removing the recurrent layers from the prediction network, which makes the prediction network stateless, performs virtually as well as the original RNNT model, when using wordpieces. The stateless prediction network does not depend on the previous output symbols, except the last one. Therefore it simplifies the RNNT architectures and the inference.Our results suggest that the RNNT prediction network does not function as the LM in classical ASR. Instead, it merely helps the model align to the input audio, while the RNNT encoder and joint networks capture both the acoustic and the linguistic information.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"39 1","pages":"7049-7053"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87004778","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 77
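The stateless prediction network is striking precisely because it is so simple: conditioning only on the single previous wordpiece reduces it to an embedding lookup. A minimal sketch, with the embedding dimension assumed for illustration:

```python
import torch
import torch.nn as nn

class StatelessPrediction(nn.Module):
    """Prediction network reduced to an embedding of the single previous
    output symbol: no recurrent state to carry across decoding steps."""
    def __init__(self, vocab_size, dim=320):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)

    def forward(self, prev_token):      # (B,) last emitted wordpiece id
        return self.embed(prev_token)   # (B, dim), fed to the joint network

# The joint network then combines this with the encoder output as usual,
# e.g. logits = joint(enc_t, StatelessPrediction(vocab)(prev_token)).
```

Because there is no hidden state, beam search hypotheses that share the same last token can also share the prediction network output, simplifying inference.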
Favorable Propagation and Linear Multiuser Detection for Distributed Antenna Systems
R. Gholami, L. Cottatellucci, D. Slock
{"title":"Favorable Propagation and Linear Multiuser Detection for Distributed Antenna Systems","authors":"R. Gholami, L. Cottatellucci, D. Slock","doi":"10.1109/ICASSP40776.2020.9053449","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9053449","url":null,"abstract":"Cell-free MIMO, employing distributed antenna systems (DAS), is a promising approach to deal with the capacity crunch of next generation wireless communications. In this paper, we consider a wireless network with transmit and receive antennas distributed according to homogeneous point processes. The received signals are jointly processed at a central processing unit. We study if the favorable propagation properties, which enable almost optimal low complexity detection via matched filtering in massive MIMO systems, hold for DAS with line of sight (LoS) channels and general attenuation exponent. Making use of Euclidean random matrices (ERM) and their moments, we show that the analytical conditions for favorable propagation are not satisfied. Hence, we propose multistage detectors, of which the matched filter represents the initial stage. We show that polynomial expansion detectors and multistage Wiener filters coincide in DAS and substantially outperform matched filtering. Simulation results are presented which validate the analytical results.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"19 1","pages":"5190-5194"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87517141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
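A polynomial expansion detector applies a weighted polynomial in the channel Gram matrix to the matched-filter output, so no matrix inversion is needed. The sketch below shows the generic form; in practice the stage weights would be optimized (e.g., to approximate the linear MMSE solution), but here they are left as inputs:

```python
import numpy as np

def polynomial_expansion_detector(H, y, weights):
    """Multistage detection via x_hat = sum_k w_k (H^H H)^k H^H y.
    Stage k=0 is the matched filter; each extra stage costs only one
    matrix-vector product with the Gram matrix, no inversion."""
    G = H.conj().T @ H          # Gram matrix of the channel
    mf = H.conj().T @ y         # matched-filter front end (stage 0)
    x_hat = np.zeros_like(mf)
    term = mf.copy()
    for w in weights:           # accumulate w_k * G^k mf stage by stage
        x_hat += w * term
        term = G @ term
    return x_hat

# Toy usage: 8 receive antennas, 4 users, 3 detection stages.
H = (np.random.randn(8, 4) + 1j * np.random.randn(8, 4)) / np.sqrt(2)
x = np.sign(np.random.randn(4)) + 0j
x_hat = polynomial_expansion_detector(H, H @ x, weights=[1.0, -0.2, 0.05])
```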
High-Accuracy Classification of Attention Deficit Hyperactivity Disorder with L2,1-Norm Linear Discriminant Analysis
Yibin Tang, Xufei Li, Ying Chen, Y. Zhong, A. Jiang, Xiaofeng Liu
{"title":"High-Accuracy Classification of Attention Deficit Hyperactivity Disorder with L2,1-Norm Linear Discriminant Analysis","authors":"Yibin Tang, Xufei Li, Ying Chen, Y. Zhong, A. Jiang, Xiaofeng Liu","doi":"10.1109/ICASSP40776.2020.9053391","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9053391","url":null,"abstract":"Attention Deficit Hyperactivity Disorder (ADHD) is a high incidence of neurobehavioral disease in school-age children. Its neurobiological classification is meaningful for clinicians. The existing ADHD classification methods suffer from two problems, i.e., insufficient data and noise disturbance. Here, a high-accuracy classification method is proposed, which uses brain Functional Connectivity (FC) as material for ADHD feature analysis. In detail, we introduce a binary hypothesis testing framework as the classification outline to cope with insufficient data of ADHD database. Under binary hypotheses, the FCs of test data are allowed to use for training and thus affect the subspace learning of training data. To overcome noise disturbance, an l2,1-norm LDA model is adopted to robustly learn ADHD features in subspaces. The subspace energies of training data under binary hypotheses are then calculated, and an energy-based comparison is finally performed to identify ADHD individuals. On the platform of ADHD-200 database, the experiments show our method outperforms other state-of-the-art methods with the significant average accuracy of 97.6%.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"108 1","pages":"1170-1174"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87589816","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
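For reference, the ℓ2,1-norm used in the robust LDA model is simply the sum of row-wise ℓ2 norms; unlike the squared Frobenius norm of classic LDA objectives, it grows only linearly with the magnitude of an outlying row, which is what buys the robustness. A one-function sketch:

```python
import numpy as np

def l21_norm(M):
    """l2,1-norm of a matrix: sum over rows of each row's l2 norm.
    A single corrupted row contributes linearly, not quadratically,
    so outliers dominate the objective far less."""
    return np.sum(np.linalg.norm(M, axis=1))

M = np.vstack([np.eye(3), 100.0 * np.ones((1, 3))])  # one outlying row
print(l21_norm(M))                 # ~176.2, vs. ~30003 for ||M||_F^2
```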
Redundant Convolutional Network With Attention Mechanism For Monaural Speech Enhancement
Tian Lan, Yilan Lyu, Guoqiang Hui, Refuoe Mokhosi, Sen Li, Qiao Liu
{"title":"Redundant Convolutional Network With Attention Mechanism For Monaural Speech Enhancement","authors":"Tian Lan, Yilan Lyu, Guoqiang Hui, Refuoe Mokhosi, Sen Li, Qiao Liu","doi":"10.1109/ICASSP40776.2020.9053277","DOIUrl":"https://doi.org/10.1109/ICASSP40776.2020.9053277","url":null,"abstract":"The redundant convolutional encoder-decoder network has proven useful in speech enhancement tasks. It can capture localized time-frequency details of speech signals through both the fully convolutional network structure and feature selection capability resulting from the encoder-decoder mechanism. However, it does not explicitly consider the signal filtering mechanism, which we regard as important for speech enhancement models. In this study, we introduce an attention mechanism into the convolutional encoderdecoder model. This mechanism adaptively filters channelwise feature responses by explicitly modeling attentions (on speech versus noise signals) between channels. Experimental results show that the proposed attention model is effective in capturing speech signals from background noise, and performs especially better in unseen noise conditions compared to other state-of-the-art models.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"29 1","pages":"6654-6658"},"PeriodicalIF":0.0,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87751460","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
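A hedged sketch of channel-wise attention in the squeeze-and-excitation style, which matches the description of adaptively re-weighting channel responses; the reduction ratio and placement within the encoder-decoder are assumptions, not the paper's exact design:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Per-channel gates re-weight feature maps, acting as an adaptive
    filter over channels (speech-dominated channels can be boosted,
    noise-dominated ones suppressed)."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),   # squeeze: global context per channel
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid(),
        )

    def forward(self, x):              # x: (B, C, F, T) spectrogram features
        return x * self.gate(x)        # excite: channel-wise re-weighting

y = ChannelAttention(32)(torch.randn(1, 32, 64, 100))  # same shape out
```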