ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): Latest Publications

TeAw: Text-Aware Few-Shot Remote Sensing Image Scene Classification
Kaihui Cheng, Chule Yang, Zunlin Fan, Dayan Wu, Naiyang Guan
{"title":"TeAw: Text-Aware Few-Shot Remote Sensing Image Scene Classification","authors":"Kaihui Cheng, Chule Yang, Zunlin Fan, Dayan Wu, Naiyang Guan","doi":"10.1109/ICASSP49357.2023.10095523","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10095523","url":null,"abstract":"The recent advance has shown that few-shot learning may be a promising way to alleviate the data reliance of remote sensing image scene classification. However, most existing works focus on extracting distinguishable features only from visual modality, while the problem of learning knowledge from multiple modalities has barely been visited. In this work, we propose a text-aware framework for few-shot remote sensing image scene classification (TeAw). Specifically, TeAw converts the class names to more detailed text descriptions and extracts text features using a pre-trained text encoder. Mean-while, TeAw obtains image features via an image encoder. Then we compute the correlation between the text and the image features, which helps the model grasp the core concept of the input image. Finally, TeAw calculates the similarity of local features between supports and queries to get the predictions. Extensive experiments show the outperformance of our TeAw compared with other SOTA methods.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127479709","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
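A minimal sketch of the prediction step described in the abstract above, combining query-support visual similarity with query-text correlation. The encoders are assumed to be given, and the feature dimensions, prototype averaging, and equal weighting of the two cues are assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def teaw_predict(query_feats, support_feats, text_feats):
    """
    query_feats:   (Q, D)    image features of query samples
    support_feats: (C, K, D) image features of K support samples per class
    text_feats:    (C, D)    features of the class text descriptions
    Returns (Q, C) class scores.
    """
    q = F.normalize(query_feats, dim=-1)
    s = F.normalize(support_feats.mean(dim=1), dim=-1)   # class prototypes from supports
    t = F.normalize(text_feats, dim=-1)

    vis_sim = q @ s.t()          # query-support visual similarity
    txt_sim = q @ t.t()          # query-text correlation
    return vis_sim + txt_sim     # fuse the two cues (assumed equal weighting)

# toy usage: 5-way 1-shot with 3 queries and 128-d features
scores = teaw_predict(torch.randn(3, 128), torch.randn(5, 1, 128), torch.randn(5, 128))
print(scores.argmax(dim=-1))
```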
mmWave Wi-Fi Trajectory Estimation with Continuous-Time Neural Dynamic Learning
Cristian J. Vaca-Rubio, P. Wang, T. Koike-Akino, Ye Wang, P. Boufounos, P. Popovski
{"title":"mmWave Wi-Fi Trajectory Estimation with Continuous-Time Neural Dynamic Learning","authors":"Cristian J. Vaca-Rubio, P. Wang, T. Koike-Akino, Ye Wang, P. Boufounos, P. Popovski","doi":"10.1109/ICASSP49357.2023.10096474","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10096474","url":null,"abstract":"We leverage standards-compliant beam training measurements from commercial-of-the-shelf (COTS) 802.11ad/ay devices for localization of a moving object. Two technical challenges need to be addressed: (1) the beam training measurements are intermittent due to beam scanning overhead control and contention-based channel-time allocation, and (2) how to exploit underlying object dynamics to assist the localization. To this end, we formulate the trajectory estimation as a sequence regression problem. We propose a dual-decoder neural dynamic learning framework to simultaneously reconstruct Wi-Fi beam training measurements at irregular time instances and learn the unknown dynamics over the latent space in a continuous-time fashion by enforcing strong supervision at both the coordinate and measurement levels. The proposed method was evaluated on an in-house mmWave Wi-Fi dataset and compared with a range of baseline methods, including traditional machine learning methods and recurrent neural networks.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124742395","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
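A rough sketch of the dual-decoder, continuous-time idea described above: a latent state is advanced between irregular measurement times by a learned dynamics network (here with simple Euler steps), and two heads decode coordinates and beam measurements for supervision at both levels. The module sizes, GRU-based encoder, and Euler integrator are assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class ContinuousTrajectoryModel(nn.Module):
    def __init__(self, meas_dim=16, latent_dim=32, coord_dim=2):
        super().__init__()
        self.latent_dim = latent_dim
        self.encoder = nn.GRUCell(meas_dim, latent_dim)        # folds each measurement into the latent state
        self.dynamics = nn.Sequential(nn.Linear(latent_dim, latent_dim), nn.Tanh(),
                                      nn.Linear(latent_dim, latent_dim))
        self.coord_decoder = nn.Linear(latent_dim, coord_dim)  # decoder supervised at the coordinate level
        self.meas_decoder = nn.Linear(latent_dim, meas_dim)    # decoder supervised at the measurement level

    def forward(self, measurements, times, steps_per_gap=4):
        # measurements: (T, meas_dim); times: (T,) increasing but irregularly spaced
        z = torch.zeros(1, self.latent_dim)
        coords, recons = [], []
        for i in range(len(times)):
            if i > 0:  # advance the latent state across the irregular gap (Euler steps)
                dt = (times[i] - times[i - 1]) / steps_per_gap
                for _ in range(steps_per_gap):
                    z = z + dt * self.dynamics(z)
            z = self.encoder(measurements[i].unsqueeze(0), z)
            coords.append(self.coord_decoder(z))
            recons.append(self.meas_decoder(z))
        return torch.cat(coords), torch.cat(recons)

model = ContinuousTrajectoryModel()
xy, rec = model(torch.randn(6, 16), torch.tensor([0.0, 0.3, 0.35, 0.9, 1.4, 1.45]))
print(xy.shape, rec.shape)  # (6, 2) coordinates and (6, 16) reconstructed measurements
```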
Modaldrop: Modality-Aware Regularization for Temporal-Spectral Fusion in Human Activity Recognition
Xin Zeng, Yiqiang Chen, Benfeng Xu, Tengxiang Zhang
{"title":"Modaldrop: Modality-Aware Regularization for Temporal-Spectral Fusion in Human Activity Recognition","authors":"Xin Zeng, Yiqiang Chen, Benfeng Xu, Tengxiang Zhang","doi":"10.1109/ICASSP49357.2023.10095880","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10095880","url":null,"abstract":"Although most of existing works for sensor-based Human Activity Recognition rely on the temporal view, we argue that the spectral view also provides complementary prior and accordingly benchmark a standard multi-view framework with extensive experiments to demonstrate its consistent superiority over single-view opponents. We then delve into the intrinsic mechanism of the multi-view representation fusion, and propose ModalDrop as a novel modality-aware regularization method to learn and exploit representations of both views effectively. We demonstrate its advantage over existing representation fusion alternatives with comprehensive experiments and ablations. The improvements are consistent for various settings and are orthogonal with different backbones. We also discuss its potential application for other related tasks regarding representation or modality fusion. The source code is available on https://github.com/studyzx/ModalDrop.git.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124798493","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
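A minimal sketch of the modality-dropping regularization idea described above for a temporal-spectral two-view model: during training, one view's representation is occasionally zeroed out before fusion so neither view can be ignored. The drop probability and fusion by concatenation are assumptions; the authors' released code in the linked repository is the reference implementation.

```python
import torch

def modal_drop(temporal_feat, spectral_feat, p=0.5, training=True):
    """Randomly zero out one of the two view representations during training."""
    if training and torch.rand(()) < p:
        if torch.rand(()) < 0.5:
            temporal_feat = torch.zeros_like(temporal_feat)
        else:
            spectral_feat = torch.zeros_like(spectral_feat)
    return torch.cat([temporal_feat, spectral_feat], dim=-1)  # fused representation

fused = modal_drop(torch.randn(8, 64), torch.randn(8, 64))
print(fused.shape)  # (8, 128)
```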
A Novel Extrapolation Technique to Accelerate WMMSE
Kaiwen Zhou, Zhilin Chen, Guochen Liu, Zhitang Chen
{"title":"A Novel Extrapolation Technique to Accelerate WMMSE","authors":"Kaiwen Zhou, Zhilin Chen, Guochen Liu, Zhitang Chen","doi":"10.1109/ICASSP49357.2023.10096806","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10096806","url":null,"abstract":"Precoding design is essential for massive multi-user multiple-input multiple-output (MU-MIMO) systems, which aims at maximizing the weighted sum-rate (WSR). This problem is known to be NP-hard, and iterative algorithms are typically used to approximately solve it. The weighted minimum mean-squared error (WMMSE) algorithm is a popular solver for WSR maximization, which efficiently finds a local maxima of WSR. In this work, we introduce a novel extrapolation technique to further accelerate WMMSE. This technique is inspired by the momentum technique in convex optimization, and can be interpreted as an accelerated second-order method. The merits of the proposed extrapolation technique are (i) lightweight, as it almost does not increase the iteration complexity, (ii) generic, since it works in various settings such as the sum power constraint or per-antenna power constraint cases and coordinated multi-point joint transmission networks, and (iii) effective, that our simulation results show it significantly accelerates the convergence of WMMSE in the high channel correlation regime.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124940387","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
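A generic sketch of the momentum-style extrapolation described above, wrapped around an abstract one-step WMMSE update: each iteration first extrapolates the precoder along its last move, then applies the usual update from the extrapolated point. The `wmmse_step` below is a dummy placeholder (a real solver would update receive filters, MSE weights, and the precoder), and the fixed extrapolation weight is an assumption.

```python
import numpy as np

def wmmse_step(V, H):
    """Placeholder for one WMMSE iteration on precoder V given channel H (illustration only)."""
    return 0.9 * V + 0.1 * H.conj().T  # dummy contraction standing in for the real update

def extrapolated_wmmse(H, V0, beta=0.5, iters=50):
    V_prev, V = V0, wmmse_step(V0, H)
    for _ in range(iters):
        V_ex = V + beta * (V - V_prev)       # extrapolate along the last move (momentum)
        V_prev, V = V, wmmse_step(V_ex, H)   # run the usual WMMSE update from the extrapolated point
    return V

H = np.random.randn(4, 4) + 1j * np.random.randn(4, 4)
V = extrapolated_wmmse(H, np.zeros((4, 4), dtype=complex))
print(V.shape)
```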
Self-Supervised Audio-Visual Speaker Representation with Co-Meta Learning
Hui Chen, Hanyi Zhang, Longbiao Wang, Kong-Aik Lee, Meng Liu, J. Dang
{"title":"Self-Supervised Audio-Visual Speaker Representation with Co-Meta Learning","authors":"Hui Chen, Hanyi Zhang, Longbiao Wang, Kong-Aik Lee, Meng Liu, J. Dang","doi":"10.1109/ICASSP49357.2023.10096925","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10096925","url":null,"abstract":"In self-supervised speaker verification, the quality of pseudo labels determines the upper bound of its performance and it is not uncommon to end up with massive amount of unreliable pseudo labels. We observe that the complementary information in different modalities ensures a robust supervisory signal for audio and visual representation learning. This motivates us to propose an audio-visual self-supervised learning framework named Co-Meta Learning. Inspired by the Coteaching+, we design a strategy that allows the information of two modalities to be coordinated through the Update by Disagreement. Moreover, we use the idea of modelagnostic meta learning (MAML) to update the network parameters, which makes the hard samples of two modalities to be better resolved by the other modality through gradient regularization. Compared to the baseline, our proposed method achieves a 29.8%, 11.7% and 12.9% relative improvement on Vox-O, Vox-E and Vox-H trials of Voxceleb1 evaluation dataset respectively.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"118 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125008891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
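A small sketch of the "Update by Disagreement" idea described above, in the spirit of Co-teaching+: each modality branch is updated only on samples where the two branches currently disagree and, among those, on the samples the other branch finds low-loss under the pseudo labels. The selection ratio and the MAML-style outer update are omitted or assumed here.

```python
import torch
import torch.nn.functional as F

def disagreement_batches(logits_audio, logits_visual, pseudo_labels, keep_ratio=0.5):
    pred_a = logits_audio.argmax(dim=-1)
    pred_v = logits_visual.argmax(dim=-1)
    disagree = (pred_a != pred_v).nonzero(as_tuple=True)[0]
    if disagree.numel() == 0:
        return disagree, disagree  # nothing to update on this step

    loss_a = F.cross_entropy(logits_audio[disagree], pseudo_labels[disagree], reduction="none")
    loss_v = F.cross_entropy(logits_visual[disagree], pseudo_labels[disagree], reduction="none")
    k = max(1, int(keep_ratio * disagree.numel()))
    # the audio branch is updated on samples the visual branch finds clean, and vice versa
    idx_for_audio = disagree[torch.topk(-loss_v, k).indices]
    idx_for_visual = disagree[torch.topk(-loss_a, k).indices]
    return idx_for_audio, idx_for_visual

ia, iv = disagreement_batches(torch.randn(16, 10), torch.randn(16, 10),
                              torch.randint(0, 10, (16,)))
print(ia.shape, iv.shape)
```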
Estimating Acoustic Direction of Arrival Using a Single Structural Sensor on a Resonant Surface
Tre Dipassio, Michael C. Heilemann, Benjamin Thompson, M. Bocko
{"title":"Estimating Acoustic Direction of Arrival Using a Single Structural Sensor on a Resonant Surface","authors":"Tre Dipassio, Michael C. Heilemann, Benjamin Thompson, M. Bocko","doi":"10.1109/ICASSP49357.2023.10095986","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10095986","url":null,"abstract":"The direction of arrival (DOA) of an acoustic source is a signal characteristic used by smart audio devices to enable signal enhancement algorithms. Though DOA estimations are traditionally made using a multi-microphone array, we propose that the resonant modes of a surface excited by acoustic waves contain sufficient spatial information that DOA may be estimated using a singular structural vibration sensor. In this work, sensors are affixed to an acrylic panel and used to record acoustic noise signals at various angles of incidence. From these recordings, feature vectors containing the sums of the energies in the panel’s isolated modal regions are extracted and used to train deep neural networks to estimate DOA. Experimental results show that when all 13 of the acrylic panel’s isolated modal bands are utilized, the DOA of incident acoustic waves for a broadband noise signal may be estimated by a single structural sensor to within ±5° with a reliability of 98.4%. The size of the feature set may be reduced by eliminating the resonant modes that do not have strong spatial coupling to the incident acoustic wave. Reducing the feature set to the 7 modal bands that provide the most spatial information produces a reliability of 89.7% for DOA estimates within ±5° using a single sensor.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125117383","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
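A minimal sketch of the feature extraction described above: the sensor signal's energy is summed within each isolated modal frequency band, and the resulting vector is what the deep network takes as input. The band edges below are made-up placeholders, not the acrylic panel's actual modal bands.

```python
import numpy as np

def modal_band_energies(signal, fs, bands):
    """signal: 1-D array; fs: sample rate in Hz; bands: list of (lo, hi) band edges in Hz."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    return np.array([spectrum[(freqs >= lo) & (freqs < hi)].sum() for lo, hi in bands])

fs = 48_000
bands = [(100 * i, 100 * (i + 1)) for i in range(1, 14)]   # 13 placeholder modal bands
x = np.random.randn(fs)                                     # 1 s of noise as a stand-in recording
features = modal_band_energies(x, fs, bands)
print(features.shape)  # (13,) -- one energy per modal band, the DNN input
```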
DQFORMER: Dynamic Query Transformer for Lane Detection
Hao Yang, Shuyuan Lin, Runqing Jiang, Yang Lu, Hanzi Wang
{"title":"DQFORMER: Dynamic Query Transformer for Lane Detection","authors":"Hao Yang, Shuyuan Lin, Runqing Jiang, Yang Lu, Hanzi Wang","doi":"10.1109/ICASSP49357.2023.10097047","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10097047","url":null,"abstract":"Lane detection is one of the most important tasks in self-driving. The critical purpose of lane detection is the prediction of lane shapes. Meanwhile, it is challenging and difficult to determine lane instance positions before predicting lane shapes in an image. In this paper, we propose a top-down method called Dynamic Query Transformer (DQFormer), which uses a Dynamic Lane Queries (DLQs) module to predict lane shapes. Specifically, to accurately predict lane shapes, we propose a new framework for generating dynamic weights based on DLQs, which can focus on the context of lane shapes dynamically. Unlike existing transformer-based methods, the proposed DQFormer does not require setting a fixed number of lane queries, so it is suitable for various scenes. In addition, we further propose a Line Voting Module (LVM) which collects votes from other lanes to enhance lane features, to determine lane instance positions. Extensive experiments demonstrate that DQFormer outperforms several state-of-the-art methods on two popular lane detection benchmarks (i.e., CULane and TuSimple).","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"25 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125893402","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Facial Texture Perceiver: Towards High-Fidelity Facial Texture Recovery with Input-Level Inductive Biased Perceiver IO
Seungeun Lee
{"title":"Facial Texure Perceiver: Towards High-Fidelity Facial Texture Recovery with Input-Level Inductive Biased Perceiver IO","authors":"Seungeun Lee","doi":"10.1109/ICASSP49357.2023.10096776","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10096776","url":null,"abstract":"This paper presents a new method, called Facial Texture Perceiver. It deals with the task of facial texture recovery from in-the-wild images without 3D supervision. Motivated by their success in various computer vision tasks, we attempt to use transformers for this task. However, capturing high-fidelity facial details requires a large number of mesh vertices and in this case, naively applying vanilla transformer can incur prohibitively high computational and memory costs. We address this challenge by mapping the input with a large number of mesh vertices to a latent space and performing their attention on this space. Also, we introduce input-level inductive biases by injecting the geometry and appearance embeddings as extra inputs. It helps to data-efficiently learn and generalize in-the-wild domains. The resulting architecture enable the application of Transformers to high-resolution facial meshes. Experiments on CelebA, MICC-Florence and MoFA-test datasets demonstrate that our method can accurately reconstruct facial textures, outperforming state-of-the-art methods.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125976131","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
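A compact sketch of the Perceiver-IO-style bottleneck described above: a small learned latent array cross-attends to the (many) mesh-vertex tokens, so attention cost no longer scales quadratically in the vertex count, and geometry and appearance embeddings are simply concatenated to each vertex token as input-level biases. The sizes, single-block depth, and per-vertex RGB output head are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class LatentTextureBottleneck(nn.Module):
    def __init__(self, vert_dim=32, geom_dim=16, app_dim=16,
                 latent_dim=128, num_latents=256, heads=4):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(num_latents, latent_dim))
        self.proj_in = nn.Linear(vert_dim + geom_dim + app_dim, latent_dim)
        self.encode = nn.MultiheadAttention(latent_dim, heads, batch_first=True)
        self.decode = nn.MultiheadAttention(latent_dim, heads, batch_first=True)
        self.to_rgb = nn.Linear(latent_dim, 3)

    def forward(self, vert_feats, geom_emb, app_emb):
        # all inputs: (B, V, *) with V possibly in the tens of thousands
        tokens = self.proj_in(torch.cat([vert_feats, geom_emb, app_emb], dim=-1))
        lat = self.latents.expand(tokens.shape[0], -1, -1)
        lat, _ = self.encode(lat, tokens, tokens)   # O(V * num_latents) cross-attention into the latent array
        out, _ = self.decode(tokens, lat, lat)      # per-vertex queries read the latents back out
        return self.to_rgb(out)                     # per-vertex texture color

model = LatentTextureBottleneck()
rgb = model(torch.randn(1, 5000, 32), torch.randn(1, 5000, 16), torch.randn(1, 5000, 16))
print(rgb.shape)  # (1, 5000, 3)
```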
Core: Transferable Long-Range Time Series Forecasting Enhanced by Covariates-Guided Representation
Xin-Yi Li, Pei-Nan Zhong, Dingquan Chen, Yubin Yang
{"title":"Core: Transferable Long-Range Time Series Forecasting Enhanced by Covariates-Guided Representation","authors":"Xin-Yi Li, Pei-Nan Zhong, Dingquan Chen, Yubin Yang","doi":"10.1109/ICASSP49357.2023.10096231","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10096231","url":null,"abstract":"In recent years, long-range time series forecasting has been actively studied and has shown promising results. However, since these methods mainly focus on predicting time series with a fixed dimension, they are inapplicable to the large-scale and ever-changing datasets that are common in real-world applications. Additionally, existing methods only take a window of the near past as input, which prevents the models from learning persistent historical patterns. To tackle these problems, we propose CoRe, a novel transferable long-term forecasting method enhanced by Covariates-guided Representation. By encoding the input series into a dense vector, CoRe is able to extract instance-wise global features. Specifically, the representation is learned by modeling the correlation between the target series and constructed auxiliary covariates, which is implemented by our proposed cross-dependency network. Comprehensive experiments on six real-world datasets show that CoRe achieves overall state-of-the-art results and can transfer to unseen data with stable performance.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126057639","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Hearttoheart: The Arts of Infant Versus Adult-Directed Speech Classification
Najla D. Al Futaisi, Alejandrina Cristia, B. Schuller
{"title":"Hearttoheart: The Arts of Infant Versus Adult-Directed Speech Classification","authors":"Najla D. Al Futaisi, Alejandrina Cristia, B. Schuller","doi":"10.1109/ICASSP49357.2023.10096728","DOIUrl":"https://doi.org/10.1109/ICASSP49357.2023.10096728","url":null,"abstract":"Psycholinguistics researchers investigate child language exposure by studying children’s language environment. A main factor is whether, in humanistic heart-to-heart dialogue, the speech is directed to the infant (infant-directed speech) versus to another adult (adult-directed speech). The former has been found to better predict children’s lexicon, and therefore constitutes a more relevant part of children’s language environment. Listening to, segmenting and annotating naturalistic long-form recordings collected through infant-worn devices is highly costly and time-consuming, and could be prone to errors in misclassification. We aim to overcome these challenges by automatically classifying speech as infant-directed versus adult-directed. In this research, we exploit multiple datasets, combined to form a larger corpus for training. In addition, we employ four different methods: Multi-task learning, adversarial training, autoencoder multi-task learning and adversarial multi-task learning, the last of which yielded the best results on all datasets.","PeriodicalId":113072,"journal":{"name":"ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"26 10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126060472","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
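A brief sketch of the adversarial multi-task setup that the abstract above reports as working best: a shared encoder feeds (i) the infant- versus adult-directed speech classifier and (ii) a corpus/domain discriminator trained through a gradient reversal layer, so the shared features become corpus-invariant. The layer sizes and the use of gradient reversal (rather than another adversarial scheme) are assumptions.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lamb * grad_out, None  # reverse the gradient for the domain branch

class AdversarialMTL(nn.Module):
    def __init__(self, feat_dim=40, hidden=64, num_domains=3):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU())
        self.ids_ads_head = nn.Linear(hidden, 2)            # infant- vs adult-directed speech
        self.domain_head = nn.Linear(hidden, num_domains)   # which corpus the clip came from

    def forward(self, x, lamb=1.0):
        h = self.encoder(x)
        return self.ids_ads_head(h), self.domain_head(GradReverse.apply(h, lamb))

model = AdversarialMTL()
speech_logits, domain_logits = model(torch.randn(8, 40))
print(speech_logits.shape, domain_logits.shape)
```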