ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): Latest Publications

A Unified Two-Stage Model for Separating Superimposed Images
Huiyu Duan, Xiongkuo Min, Wei Shen, Guangtao Zhai
DOI: https://doi.org/10.1109/icassp43922.2022.9746606 | Published: 2022-05-23
Abstract: A single superimposed image containing two image views causes visual confusion for both human vision and computer vision. Human vision needs a "develop-then-rival" process to decompose the superimposed image into two individual images, which effectively suppresses visual confusion. In this paper, we propose a human vision-inspired framework for separating superimposed images. We first propose a network to simulate the development stage, which tries to understand and distinguish the semantic information of the two layers of a single superimposed image. To further simulate the rivalry activation/suppression process in human brains, we carefully design a rivalry stage, which combines the original mixed input (the superimposed image) with the activated visual information (the outputs of the development stage) and then rivals to obtain images without ambiguity. Experimental results show that our framework effectively separates superimposed images and significantly outperforms state-of-the-art methods in output quality.
Citations: 3
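The abstract describes a two-stage, develop-then-rival data flow: a development network first predicts both layers from the mixture, and a rivalry stage refines them given the mixture together with the initial estimates. The PyTorch sketch below only illustrates that data flow; the layer widths, depths, and losses are our own assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class TwoStageSeparator(nn.Module):
    """Minimal develop-then-rival sketch (assumed architecture, not the paper's)."""
    def __init__(self, ch=3, feat=32):
        super().__init__()
        # development stage: coarse estimates of both layers from the mixture
        self.develop = nn.Sequential(
            nn.Conv2d(ch, feat, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat, 2 * ch, 3, padding=1))
        # rivalry stage: refine using the mixture plus both coarse estimates
        self.rival = nn.Sequential(
            nn.Conv2d(3 * ch, feat, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat, 2 * ch, 3, padding=1))

    def forward(self, mixed):                      # mixed: (B, ch, H, W)
        coarse = self.develop(mixed)               # (B, 2*ch, H, W)
        refined = self.rival(torch.cat([mixed, coarse], dim=1))
        return refined.chunk(2, dim=1)             # two separated images
```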
Attention Guided Invariance Selection for Local Feature Descriptors
Jiapeng Li, Ge Li, Thomas H. Li
DOI: https://doi.org/10.1109/icassp43922.2022.9746419 | Published: 2022-05-23
Abstract: To cope with the extreme variations of illumination and rotation in the real world, popular descriptors have recently captured more invariance, but more invariance makes descriptors less informative. This paper therefore designs an attention guided framework (named AISLFD) to select appropriate invariance for local feature descriptors, which boosts descriptor performance even in scenes with extreme changes. Specifically, we first explore an efficient multi-scale feature extraction module that provides our local descriptors with more useful information. In addition, we propose a novel parallel self-attention module to obtain meta descriptors with a global receptive field, which guides the invariance selection more accurately. Extensive experiments show that our method achieves performance competitive with state-of-the-art methods.
Citations: 3
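The core idea of selecting among descriptor variants with different degrees of invariance can be sketched as an attention-weighted soft selection. The module below is a toy illustration under our own assumptions (precomputed descriptor variants, a single linear scoring layer); it is not the paper's AISLFD or its parallel self-attention module.

```python
import torch
import torch.nn as nn

class InvarianceSelector(nn.Module):
    """Toy soft selection among descriptor variants guided by a meta descriptor
    (our own illustration, not the paper's module)."""
    def __init__(self, dim, n_variants):
        super().__init__()
        self.score = nn.Linear(dim, n_variants)

    def forward(self, meta_desc, variants):
        # meta_desc: (N, dim); variants: (N, n_variants, dim), e.g. rotation-
        # invariant vs. rotation-variant descriptors for the same keypoints
        w = torch.softmax(self.score(meta_desc), dim=-1)   # selection weights
        return (w.unsqueeze(-1) * variants).sum(dim=1)     # blended descriptor
```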
New Improved Criterion for Model Selection in Sparse High-Dimensional Linear Regression Models
P. B. Gohain, M. Jansson
DOI: https://doi.org/10.1109/icassp43922.2022.9746867 | Published: 2022-05-23
Abstract: The extended Bayesian information criterion (EBIC) and the extended Fisher information criterion (EFIC) are two popular criteria for model selection in sparse high-dimensional linear regression models. However, EBIC is inconsistent in scenarios where the signal-to-noise ratio (SNR) is high but the sample size is small, and EFIC is not invariant to data scaling, which affects its performance under different signal and noise statistics. In this paper, we present a refined criterion called EBICR, where the 'R' stands for robust. EBICR is an improved version of EBIC and EFIC. It is scale-invariant and a consistent estimator of the true model as the sample size grows large and/or the SNR tends to infinity. The performance of EBICR is compared to existing methods such as EBIC, EFIC and the multi-beta-test (MBT). Simulation results indicate that EBICR identifies the true model on par with or better than the other considered methods.
Citations: 4
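The abstract does not give the EBICR formula, so the sketch below shows only the baseline extended BIC (Chen and Chen, 2008) that this family of criteria refines, for a Gaussian linear model with k of p regressors selected. The function and parameter names are ours; rss is the residual sum of squares, n the sample size, and gamma the usual EBIC tuning parameter.

```python
from math import lgamma, log

def log_binom(p, k):
    """log of the binomial coefficient C(p, k), computed via log-gamma."""
    return lgamma(p + 1) - lgamma(k + 1) - lgamma(p - k + 1)

def ebic(rss, n, k, p, gamma=1.0):
    """Baseline extended BIC for a Gaussian linear model: residual fit term,
    the usual BIC complexity penalty, and the extra 2*gamma*log C(p, k) term
    that accounts for the size of the candidate model space."""
    return n * log(rss / n) + k * log(n) + 2 * gamma * log_binom(p, k)
```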
HOQRI: Higher-Order QR Iteration for Scalable Tucker Decomposition
Yuchen Sun, Kejun Huang
DOI: https://doi.org/10.1109/icassp43922.2022.9746726 | Published: 2022-05-23
Abstract: We propose a new algorithm called higher-order QR iteration (HOQRI) for computing the Tucker decomposition of large and sparse tensors. Compared to the celebrated higher-order orthogonal iterations (HOOI), HOQRI relies on a simple orthogonalization step in each iteration rather than the more sophisticated singular value decomposition step used in HOOI. More importantly, when dealing with extremely large and sparse data tensors, HOQRI completely eliminates the intermediate memory explosion by defining a new sparse tensor operation called TTMcTC. Furthermore, HOQRI is shown to monotonically improve the objective function, thus enjoying the same convergence guarantee as HOOI. Numerical experiments on synthetic and real data showcase the effectiveness of HOQRI.
Citations: 2
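As a rough illustration of trading HOOI's truncated SVD for a QR-based orthogonalization, the dense NumPy sweep below replaces the leading-singular-vector update with one orthogonal (subspace) iteration step. This is our own reconstruction for small dense 3-way tensors only; the paper's sparse TTMcTC operation and the exact HOQRI update are not reproduced here.

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding of a 3-way tensor into a matrix."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def qr_sweep(X, U):
    """One HOOI-style sweep where each factor update uses a QR-based
    orthogonal-iteration step instead of a truncated SVD (dense sketch,
    assumes each rank is at most the corresponding tensor dimension)."""
    for n in range(3):
        Y = X
        for m in range(3):
            if m != n:
                # multiply mode m by U[m]^T: contract dimension I_m down to R_m
                Y = np.moveaxis(np.tensordot(Y, U[m], axes=(m, 0)), -1, m)
        Yn = unfold(Y, n)                          # (I_n, product of other ranks)
        Q, _ = np.linalg.qr(Yn @ (Yn.T @ U[n]))    # orthogonalize, no SVD
        U[n] = Q
    return U
```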
Signal Recovery from Inconsistent Nonlinear Observations
P. L. Combettes, Zev Woodstock
DOI: https://doi.org/10.1109/icassp43922.2022.9746145 | Published: 2022-05-23
Abstract: We show that many nonlinear observation models in signal recovery can be represented using firmly nonexpansive operators. To address problems with inaccurate measurements, we propose solving a variational inequality relaxation which is guaranteed to possess solutions under mild conditions and which coincides with the original problem if it happens to be consistent. We then present an efficient algorithm for its solution, as well as numerical applications in signal and image recovery, including an experimental operator-theoretic method of promoting sparsity.
Citations: 0
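For context on the key property the abstract relies on: an operator T is firmly nonexpansive when ||Tx - Ty||^2 <= <x - y, Tx - Ty> for all x, y, and projections onto convex sets (such as the clipping that models a saturated sensor) satisfy it. The snippet below is only a numerical sanity check of that inequality on random pairs, not part of the paper's algorithm.

```python
import numpy as np

def looks_firmly_nonexpansive(T, dim=8, trials=1000, tol=1e-9, seed=0):
    """Check ||Tx - Ty||^2 <= <x - y, Tx - Ty> on random pairs
    (numerical evidence only; not a proof)."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        x, y = rng.standard_normal((2, dim))
        d = T(x) - T(y)
        if d @ d > (x - y) @ d + tol:
            return False
    return True

# Clipping to [-1, 1] is the projection onto a box, hence firmly nonexpansive.
print(looks_firmly_nonexpansive(lambda v: np.clip(v, -1.0, 1.0)))  # True
```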
Generation of Personal Sound Fields in Reverberant Environments Using Interframe Correlation
Liming Shi, Guoli Ping, Xiaoxiang Shen, M. G. Christensen
DOI: https://doi.org/10.1109/icassp43922.2022.9747574 | Published: 2022-05-23
Abstract: Personal sound field control techniques aim to reproduce different sound contents in different regions of an acoustic space without interference. The limitations of state-of-the-art methods for sound field control include high latency and computational complexity, especially when the reverberation time is long and the number of loudspeakers is large. In this paper, we propose a personal sound field control approach that exploits interframe correlation. By taking past frames into account, the proposed method can accommodate long reverberation times with low latency. To find the optimal parameters for the physically meaningful constraints, subspace decomposition and Newton's method are applied. Furthermore, a sound-field-distortion-oriented subspace construction method is proposed to reduce the subspace dimension. Simulation results with measured room impulse responses show that, compared with traditional methods, the proposed algorithm obtains a good trade-off between acoustic contrast and reproduction error at low latency.
Citations: 1
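For reference, the classic acoustic contrast control baseline that personal sound field methods are typically measured against chooses loudspeaker weights that maximize bright-zone energy relative to dark-zone energy, which reduces to a generalized eigenvalue problem. The sketch below shows only that baseline, with assumed plant matrices G_bright and G_dark (microphones x loudspeakers); it is not the paper's interframe-correlation method.

```python
import numpy as np
from scipy.linalg import eigh

def acoustic_contrast_weights(G_bright, G_dark, reg=1e-6):
    """Classic acoustic contrast control: maximize w^H R_b w / w^H R_d w.
    The maximizer is the principal generalized eigenvector of (R_b, R_d)."""
    R_b = G_bright.conj().T @ G_bright
    R_d = G_dark.conj().T @ G_dark + reg * np.eye(G_dark.shape[1])
    vals, vecs = eigh(R_b, R_d)       # ascending generalized eigenvalues
    return vecs[:, -1]                # weights giving the largest contrast
```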
Learning Task-Specific Representation for Video Anomaly Detection with Spatial-Temporal Attention
Y. Liu, Jing Liu, Xiaoguang Zhu, Donglai Wei, Xiaohong Huang, Liang Song
DOI: https://doi.org/10.1109/icassp43922.2022.9746822 | Published: 2022-05-23
Abstract: The automatic detection of abnormal events in surveillance videos with weak supervision has been formulated as a multiple instance learning task, which aims to temporally localize the clips containing abnormal events using only video-level labels. However, most existing methods rely on features extracted by pre-trained action recognition models, which are not discriminative enough for video anomaly detection. In this work, we propose a spatial-temporal attention mechanism to learn inter- and intra-correlations of video clips, and the boosted features are encouraged to be task-specific via a mutual cosine embedding loss. Experimental results on standard benchmarks demonstrate the effectiveness of the spatial-temporal attention, and our method achieves performance superior to state-of-the-art methods.
Citations: 22
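The weakly supervised formulation referred to in the abstract is commonly trained with a multiple-instance ranking objective: the highest-scoring clip of an abnormal video should outrank the highest-scoring clip of a normal video by a margin. The snippet below sketches that generic baseline loss only, not the paper's full objective (which also uses the mutual cosine embedding loss).

```python
import torch

def mil_ranking_loss(abnormal_scores, normal_scores, margin=1.0):
    """Generic MIL ranking loss for weakly supervised video anomaly detection.
    Inputs are per-clip anomaly scores of one abnormal and one normal video;
    only video-level labels are needed to form the pair."""
    return torch.relu(margin - abnormal_scores.max() + normal_scores.max())
```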
Discourse-Level Prosody Modeling with a Variational Autoencoder for Non-Autoregressive Expressive Speech Synthesis
Ning Wu, Zhaoci Liu, Zhenhua Ling
DOI: https://doi.org/10.1109/icassp43922.2022.9746238 | Published: 2022-05-23
Abstract: To address the issue of one-to-many mapping from phoneme sequences to acoustic features in expressive speech synthesis, this paper proposes a method of discourse-level prosody modeling with a variational autoencoder (VAE) based on the non-autoregressive architecture of FastSpeech. In this method, phone-level prosody codes are extracted from prosody features by combining the VAE with FastSpeech, and are predicted using discourse-level text features together with BERT embeddings. The continuous wavelet transform (CWT) used in FastSpeech2 for F0 representation is no longer necessary. Experimental results on a Chinese audiobook dataset show that the proposed method can effectively take advantage of discourse-level linguistic information and outperforms FastSpeech2 in the naturalness and expressiveness of synthetic speech.
Citations: 1
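The phone-level prosody codes mentioned in the abstract are latent variables of a VAE. The sketch below shows only the standard encoding and reparameterization step for such codes; the plain linear encoder and the dimensions are assumptions of ours, not the paper's FastSpeech-based model.

```python
import torch
import torch.nn as nn

class PhoneProsodyEncoder(nn.Module):
    """Standard VAE reparameterization for phone-level prosody codes
    (illustrative only; architecture and dimensions are assumptions)."""
    def __init__(self, in_dim=4, z_dim=8):
        super().__init__()
        self.enc = nn.Linear(in_dim, 2 * z_dim)

    def forward(self, prosody_feats):                 # (batch, n_phones, in_dim)
        mu, logvar = self.enc(prosody_feats).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # sampled codes
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1)
        return z, kl.mean()                           # codes and KL regularizer
```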
Learning Approach For Fast Approximate Matrix Factorizations
Haiyan Yu, Zhen Qin, Zhihui Zhu
DOI: https://doi.org/10.1109/icassp43922.2022.9747165 | Published: 2022-05-23
Abstract: Efficiently computing an (approximate) orthonormal basis and low-rank approximation for the input data X plays a crucial role in data analysis. One of the most efficient algorithms for such tasks is the randomized algorithm, which proceeds by computing a projection XA with a random sketching matrix A of much smaller size, and then computing the orthonormal basis as well as low-rank factorizations of the tall matrix XA. While a random matrix A is the de facto choice, in this work we improve upon its performance by using a learning approach to find an adaptive sketching matrix A from a set of training data. We derive a closed-form formulation for the gradient of the training problem, enabling us to use efficient gradient-based algorithms. We also extend this approach to learning a structured sketching matrix, such as a sparse sketching matrix that amounts to selecting a small number of representative columns from the input data. Our experiments on both synthetic and real data show that both learned dense and sparse sketching matrices outperform random ones in finding approximate orthonormal bases and low-rank approximations.
Citations: 0
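The baseline randomized procedure described in the abstract (sketch Y = XA, orthonormalize, then factor the small problem) fits in a few lines of NumPy. The sketch below uses a Gaussian A; in the paper's approach, A would instead be learned from training data. Function and parameter names are ours.

```python
import numpy as np

def randomized_lowrank(X, k, oversample=5, seed=None):
    """Baseline randomized orthonormal basis and low-rank approximation:
    Y = X @ A with a Gaussian sketch A, Q from QR(Y), approximation Q (Q^T X)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    A = rng.standard_normal((n, k + oversample))   # random sketching matrix
    Y = X @ A                                      # tall sketch, much smaller than X
    Q, _ = np.linalg.qr(Y)                         # approximate orthonormal basis
    B = Q.T @ X                                    # small (k + oversample) x n factor
    return Q, Q @ B                                # basis and low-rank approximation
```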
NEX+: Novel View Synthesis with Neural Regularisation Over Multi-Plane Images
Wenpeng Xing, Jie Chen
DOI: https://doi.org/10.1109/icassp43922.2022.9746938 | Published: 2022-05-23
Abstract: We propose Nex+, a neural Multi-Plane Image (MPI) representation with alpha denoising for the task of novel view synthesis (NVS). Overfitting to training data is a common challenge for all learning-based models. We propose a novel solution for resolving this issue in the context of NVS, using signal-denoising-motivated operations over the alpha coefficients of the MPI without any additional supervision requirements. Nex+ contains a novel 5D Alpha Neural Regulariser (ANR), which favors low-frequency components in the angular domain, i.e., the alpha coefficients' signal sub-space indicating various viewing directions. ANR's angular low-frequency property derives from its small number of angular encoding levels and output basis. The regularised alpha in Nex+ models the scene geometry more accurately than Nex, and outperforms other state-of-the-art methods on public datasets for the task of NVS.
Citations: 3
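For context, MPI-based methods such as this one render a view by standard front-to-back alpha compositing of the per-plane colours and alpha maps (after warping the planes to the target view, which is omitted here); the ANR in the abstract regularises exactly these alpha coefficients. The sketch below shows only that standard compositing step, not the paper's model.

```python
import numpy as np

def composite_mpi(colors, alphas):
    """Front-to-back alpha compositing of a multi-plane image.
    colors: (D, H, W, 3); alphas: (D, H, W), with plane 0 closest to the camera."""
    out = np.zeros(colors.shape[1:])
    transmittance = np.ones(alphas.shape[1:])
    for c, a in zip(colors, alphas):
        out += transmittance[..., None] * a[..., None] * c   # visible contribution
        transmittance *= (1.0 - a)                           # light blocked so far
    return out
```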