ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): Latest Papers

Unveiling Anomalous Nodes Via Random Sampling and Consensus on Graphs
V. N. Ioannidis, Dimitris Berberidis, G. Giannakis
{"title":"Unveiling Anomalous Nodes Via Random Sampling and Consensus on Graphs","authors":"V. N. Ioannidis, Dimitris Berberidis, G. Giannakis","doi":"10.1109/ICASSP39728.2021.9414953","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9414953","url":null,"abstract":"The present paper develops a graph-based sampling and consensus (GraphSAC) approach to effectively detect anomalous nodes in large-scale graphs. GraphSAC randomly draws subsets of nodes, and relies on graph-aware criteria to judiciously filter out sets contaminated by anomalous nodes, before employing a semi-supervised learning (SSL) module to estimate nominal label distributions per node. These learned nominal distributions are minimally affected by the anomalous nodes, and hence can be directly adopted for anomaly detection. The per-draw complexity grows linearly with the number of edges, which implies efficient SSL, while draws can be run in parallel, thereby ensuring scalability to large graphs. GraphSAC is tested under different anomaly generation models based on random walks, as well as contemporary adversarial attacks for graph data. Experiments with real-world graphs showcase the advantage of GraphSAC relative to state-of-the-art alternatives.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123680286","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Structured Support Exploration for Multilayer Sparse Matrix Factorization
Quoc-Tung Le, R. Gribonval
{"title":"Structured Support Exploration for Multilayer Sparse Matrix Factorization","authors":"Quoc-Tung Le, R. Gribonval","doi":"10.1109/ICASSP39728.2021.9414238","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9414238","url":null,"abstract":"Matrix factorization with sparsity constraints plays an important role in many machine learning and signal processing problems such as dictionary learning, data visualization, dimension reduction. Among the most popular tools for sparse matrix factorization are proximal algorithms, a family of algorithms based on proximal operators. In this paper, we address two problems with the application of proximal algorithms to sparse matrix factorization. On the one hand, we analyze a weakness of proximal algorithms in sparse matrix factorization: the premature convergence of the support. A remedy is also proposed to address this problem. On the other hand, we describe a new tractable proximal operator called Generalized Hungarian Method, associated to so-called k-regular matrices, which are useful for the factorization of a class of matrices associated to fast linear transforms. We further illustrate the effectiveness of our proposals by numerical experiments on the Hadamard Transform and magnetoencephalography matrix factorization.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125562522","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
End-to-End Multilingual Automatic Speech Recognition for Less-Resourced Languages: The Case of Four Ethiopian Languages
S. Abate, Martha Yifiru Tachbelie, T. Schultz
{"title":"End-to-End Multilingual Automatic Speech Recognition for Less-Resourced Languages: The Case of Four Ethiopian Languages","authors":"S. Abate, Martha Yifiru Tachbelie, T. Schultz","doi":"10.1109/ICASSP39728.2021.9415020","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9415020","url":null,"abstract":"The End-to-End (E2E) approach, which maps a sequence of input features into a sequence of graphemes or words, to Automatic Speech Recognition (ASR) is a hot research agenda. It is interesting for less-resourced languages since it avoids the use of pronunciation dictionary, which is one of the major components in the traditional ASR systems. However, like any deep neural network (DNN) approaches, E2E is data greedy. This makes the application of E2E to less-resourced languages questionable. However, using data from other languages in a multilingual (ML) setup is being applied to solve the problem of data scarcity. We have, therefore, conducted ML E2E ASR experiments for four less-resourced Ethiopian languages using different language and acoustic modelling units. The results of our experiments show that relative Word Error Rate (WER) reductions (over the monolingual E2E systems) of up to 29.83% can be achieved by just using data of two related languages in E2E ASR system training. Moreover, we have also noticed that the use of data from less related languages also leads to E2E ASR performance improvement over the use of monolingual data.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115019358","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Exploiting Non-Negative Matrix Factorization for Binaural Sound Localization in the Presence of Directional Interference
Ingvi Örnolfsson, T. Dau, Ning Ma, T. May
{"title":"Exploiting Non-Negative Matrix Factorization for Binaural Sound Localization in the Presence of Directional Interference","authors":"Ingvi Örnolfsson, T. Dau, Ning Ma, T. May","doi":"10.1109/ICASSP39728.2021.9414233","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9414233","url":null,"abstract":"This study presents a novel solution to the problem of binaural localization of a speaker in the presence of interfering directional noise and reverberation. Using a state-of-the-art binaural localization algorithm based on a deep neural network (DNN), we propose adding a source separation stage based on non-negative matrix factorization (NMF) to improve the localization performance in conditions with interfering sources. The separation stage is coupled with the localization stage and is optimized with respect to a broad range of different acoustic conditions, emphasizing a robust and generalizable solution. The machine listening system is shown to greatly benefit from the NMF-based separation stage at low target-to-masker ratios (TMRs) for a variety of noise types, especially for non-stationary noise. It is also demonstrated that training the NMF algorithm on anechoic speech provides better performance than using reverberant speech, and that optimizing the source separation stage using a localization metric rather than a source separation metric substantially increases the system performance.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115222588","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Prediction of EGFR Mutation Status in Lung Adenocarcinoma Using Multi-Source Feature Representations
Jianhong Cheng, Jin Liu, M. Jiang, H. Yue, Lin Wu, Jianxin Wang
{"title":"Prediction of EGFR Mutation Status in Lung Adenocarcinoma Using Multi-Source Feature Representations","authors":"Jianhong Cheng, Jin Liu, M. Jiang, H. Yue, Lin Wu, Jianxin Wang","doi":"10.1109/ICASSP39728.2021.9414064","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9414064","url":null,"abstract":"Epidermal growth factor receptor (EGFR) genotyping is essential to treatment guidelines for the use of tyrosine kinase inhibitors in lung adenocarcinoma. However, accurate and noninvasive methods to detect the EGFR gene are ongoing challenges. In this study, we propose a hybrid framework, namely HC-DLR, to noninvasively predict EGFR mutation status by fusing multi-source features including low-level handcrafted radiomics (HCR) features, high-level deep learning-based radiomics (DLR) features, and demographics features. The HCR features first are selected from massive handcrafted features extracted from CT images. The DLR features are also extracted from CT images using the pre-trained 3D DenseNet. Then, multi-source feature representations are refined and fused to build an HC-DLR model for improving the predictive performance of EGFR mutations. The proposed method is evaluated on a newly collected dataset with 670 patients. Experimental results show that the HC-DLR model achieves an encouraging predictive performance with an AUC of 0.76, an accuracy of 72.47%, and an F1-score of 71.35%, which may have potential clinical value for predicting EGFR mutations in lung adenocarcinoma.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115577867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Low Latency Online Blind Source Separation Based on Joint Optimization with Blind Dereverberation
Tetsuya Ueda, T. Nakatani, Rintaro Ikeshita, K. Kinoshita, S. Araki, S. Makino
{"title":"Low Latency Online Blind Source Separation Based on Joint Optimization with Blind Dereverberation","authors":"Tetsuya Ueda, T. Nakatani, Rintaro Ikeshita, K. Kinoshita, S. Araki, S. Makino","doi":"10.1109/ICASSP39728.2021.9413700","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9413700","url":null,"abstract":"This paper presents a new low-latency online blind source separation (BSS) algorithm. Although algorithmic delay of a frequency domain online BSS can be reduced simply by shortening the short-time Fourier transform (STFT) frame length, it degrades the source separation performance in the presence of reverberation. This paper proposes a method to solve this problem by integrating BSS with Weighted Prediction Error (WPE) based dereverberation. Although a simple cascade of online BSS after online WPE upgrades the separation performance, the overall optimality is not guaranteed. Instead, this paper extends a recently proposed batch processing algorithm that can jointly optimize dereverberation and separation so that it can perform online processing with low computational cost and little processing delay (< 12 ms). The results of a source separation experiment in a noisy car environment suggest that the proposed online method has better separation performance than the simple cascaded methods.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115582236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
Compressing Deep Neural Networks for Efficient Speech Enhancement
Ke Tan, Deliang Wang
{"title":"Compressing Deep Neural Networks for Efficient Speech Enhancement","authors":"Ke Tan, Deliang Wang","doi":"10.1109/ICASSP39728.2021.9413536","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9413536","url":null,"abstract":"The use of deep neural networks (DNNs) has dramatically improved the performance of speech enhancement in the past decade. However, a large DNN is typically required to achieve strong enhancement performance, and this kind of model is both computationally intensive and memory consuming. Hence it is difficult to deploy such DNNs on devices with limited hardware resources or in applications with strict latency requirements. In order to address this problem, we propose a model compression pipeline to reduce DNN size for speech enhancement, which is based on three kinds of techniques: sparse regularization, iterative pruning and clustering-based quantization. Evaluation results show that our approach substantially reduces the sizes of different DNNs without significantly affecting their enhancement performance. Moreover, we find that training and compressing a large DNN yields higher STOI and PESQ than directly training a small DNN that has a comparable size to the compressed DNN. This further suggests the benefits of using the proposed model compression approach.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116409768","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 13
Nested Error Map Generation Network for No-Reference Image Quality Assessment
Junming Chen, Haiqiang Wang, Ge Li, Shan Liu
{"title":"Nested Error Map Generation Network for No-Reference Image Quality Assessment","authors":"Junming Chen, Haiqiang Wang, Ge Li, Shan Liu","doi":"10.1109/ICASSP39728.2021.9413489","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9413489","url":null,"abstract":"We propose a multi-task learning neural network for No-Reference image quality assessment (NR-IQA). The proposed architecture consists of a backbone feature extractor, a nested multi-task generative module and a quality regression module. We adopt a coarse-to-fine strategy to predict objective error maps in two subtasks optimized with different loss functions. The network is designed to be nested such that discriminative features learned from subtasks are efficiently shared by the primary task. Perceptual distortion maps are achieved by applying masking mechanism between reconstructed error maps and the learned distortion sensitivity map. At last, a quality regression module is adopted to nonlinearly map masked distortions to the subjective score. Experimental results demonstrate the superior performances of the proposed model over state-of-the-art models. The implementation of our method is released at https://github.com/R-JunmingChen/NEMG-IQA.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116460901","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Blind Extraction of Moving Audio Source in a Challenging Environment Supported by Speaker Identification Via X-Vectors
J. Málek, Jakub Janský, Tomás Kounovský, Zbyněk Koldovský, J. Zdánský
{"title":"Blind Extraction of Moving Audio Source in a Challenging Environment Supported by Speaker Identification Via X-Vectors","authors":"J. Málek, Jakub Janský, Tomás Kounovský, Zbyněk Koldovský, J. Zdánský","doi":"10.1109/ICASSP39728.2021.9414331","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9414331","url":null,"abstract":"We propose a novel approach for semi-supervised extraction of a moving audio source of interest (SOI) applicable in reverberant and noisy environments. The blind part of the method is based on independent vector extraction (IVE) and uses the recently proposed constant separating vector (CSV) mixing model. This model allows for changes of mixing parameters within the processed interval of the mixture, which potentially leads to higher accuracy of SOI estimation. The supervised part of the method concerns a pilot signal, which is related to the SOI and ensures the convergence of the blind method towards the SOI. The pilot is based on robust detection of frames where SOI is dominant via speaker embeddings called X-vectors. Robustness of the detection is achieved through augmentation of the data for the supervised training of the X-vectors. The pilot-supported extraction yields significantly better performance compared to its unsupervised counterpart identifying SOI solely using the initialization.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116563347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
A Scale Invariant Measure of Flatness for Deep Network Minima
Akshay Rangamani, Nam H. Nguyen, Abhishek Kumar, D. Phan, S. Chin, T. Tran
{"title":"A Scale Invariant Measure of Flatness for Deep Network Minima","authors":"Akshay Rangamani, Nam H. Nguyen, Abhishek Kumar, D. Phan, S. Chin, T. Tran","doi":"10.1109/ICASSP39728.2021.9413771","DOIUrl":"https://doi.org/10.1109/ICASSP39728.2021.9413771","url":null,"abstract":"It has been empirically observed that the flatness of minima obtained from training deep networks seems to correlate with better generalization. However, for deep networks with positively homogeneous activations, most measures of flatness are not invariant to rescaling of the network parameters. This means that the measure of flatness can be made as small or as large as possible through rescaling, rendering the quantitative measures meaningless. In this paper we show that for deep networks with positively homogeneous activations, these rescalings constitute equivalence relations, and that these equivalence relations induce a quotient manifold structure in the parameter space. Using an appropriate Riemannian metric, we propose a Hessian-based measure for flatness that is invariant to rescaling and perform simulations to empirically verify our claim. Finally we perform experiments to verify that our flatness measure correlates with generalization by using minibatch stochastic gradient descent with different batch sizes to find deep network minima with different generalization properties.","PeriodicalId":347060,"journal":{"name":"ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122428929","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0