{"title":"A CNN-Transformer Hybrid Framework for Mapping Annual Wheat Fractional Cover From 2001-2023 Using MODIS Satellite Data Over Asia","authors":"Wenyuan Li;Shunlin Liang;Yongzhe Chen;Han Ma;Jianglei Xu;Yichuan Ma;Zhongxin Chen;Husheng Fang;Fengjiao Zhang","doi":"10.1109/JSTSP.2026.3660045","DOIUrl":"https://doi.org/10.1109/JSTSP.2026.3660045","url":null,"abstract":"Wheat is a staple crop in over 40 countries, with Asia accounting for more than 40% of global cultivation. Long-term mapping of wheat cover is critical for agricultural management and food security. However, existing wheat mapping products face a key limitation in spatiotemporal coverage: they either offer broad spatial coverage for a single or few years, or provide long time series that are confined to specific regions. To address this gap, we propose DeepMapping, a hybrid deep learning framework designed to generate a consistent annual fractional wheat cover product at 250 m resolution from 2001 to 2023 over Asia. Our framework integrates Convolutional Neural Networks (CNNs) and Transformer models to extract complementary spatial and temporal features. It processes multi-resolution data, including 250 m and 500 m MODIS reflectance, alongside GLASS Leaf Area Index and Fractional Vegetation Cover products. The model is trained with fractions derived from the 10 m resolution WorldCereal 2021 map and refined with ancillary information such as coarse-resolution crop products, land cover data, and agricultural statistics to enhance reliability. By learning the relationship between short-term, high-resolution labels and long-term, coarse-resolution MODIS observations, DeepMapping is efficiently used to generates annual wheat fractional cover product over Asia at 250 m from 2001–2023. Validation with 2,556 samples demonstrates overall accuracy of 84.1%, producer's accuracy of 87.8%, and user's accuracy of 70.07%. Comparison with national statistical data confirms high consistency (<inline-formula><tex-math>$R^{2}$</tex-math></inline-formula>: 0.880–0.943). DeepMapping offers a scalable solution for long-term, reliable agricultural mapping.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"20 2","pages":"153-167"},"PeriodicalIF":13.7,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147571087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Labels Generated by Large Language Models Help Measure People’s Empathy in Vitro","authors":"Md Rakibul Hasan;Yue Yao;Md Zakir Hossain;Aneesh Krishna;Imre Rudas;Shafin Rahman;Tom Gedeon","doi":"10.1109/JSTSP.2026.3671186","DOIUrl":"https://doi.org/10.1109/JSTSP.2026.3671186","url":null,"abstract":"Large language models (LLMs) have revolutionised many fields, with LLM-as-a-service (LLMSaaS) offering accessible, general-purpose solutions without costly task-specific training. In contrast to the widely studied prompt engineering for directly solving tasks (in vivo), this paper explores LLMs’ potential for in-vitro applications: using LLM-generated labels to improve supervised training of mainstream models. We examine two strategies – (1) noisy label correction and (2) training data augmentation – in empathy computing, an emerging task to predict psychology-based questionnaire outcomes from inputs like textual narratives. Crowdsourced datasets in this domain often suffer from noisy labels that misrepresent underlying empathy. We show that replacing or supplementing these crowdsourced labels with LLM-generated labels, developed using psychology-based scale-aware prompts, achieves statistically significant accuracy improvements. Notably, the RoBERTa pre-trained language model (PLM) trained with noise-reduced labels yields a state-of-the-art Pearson correlation coefficient of 0.648 on the public NewsEmp benchmarks. This paper further analyses evaluation metric selection and demographic biases to help guide the future development of more equitable empathy computing models.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"20 2","pages":"213-226"},"PeriodicalIF":13.7,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147571041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cross-Model Adjudication for Bias Mitigation in Large Language Models","authors":"Xiangmei Li;Changwei Li;Wenjie Liu;Yan Long;Yifan Huang;Yi Liu","doi":"10.1109/JSTSP.2026.3662478","DOIUrl":"https://doi.org/10.1109/JSTSP.2026.3662478","url":null,"abstract":"With the increasing adoption and prevalence of large language models (LLMs), concerns regarding their inherent biases have become paramount. Existing approaches often rely on fixed datasets and metrics for bias detection and subsequent fine-tuning for mitigation. However, relying solely on such fixed datasets and metrics, akin to standardized exams, may be insufficient due to their inherent inflexibility (e.g., fixed testing content) and susceptibility to gaming. Furthermore, these benchmarks risk contamination if inadvertently included in training data. Drawing an analogy to human peer review and collaborative learning, this paper introduces a novel Cross-Model Adjudication Framework (CMAF) for detecting and mitigating biases in LLMs. We implement a distributed peer-review mechanism where four state-of-the-art models (Qwen2.5-7B, DeepSeek-7B-chat, Gemma2-9B, LLaMA3.1-8B) critically evaluate each other’s responses to prompts from the HolisticBias dataset. The consensus-derived, low-bias outputs are then utilized for parameter-efficient fine-tuning. Our method achieves a bias reduction of up to 12.3%, measured by the statistical significance of token likelihood differences across demographic groups, and yields another comparable B-score metric against commercial LLMs, while preserving core task performance and maintaining minimal inference latency overhead post-fine-tuning.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"20 2","pages":"177-191"},"PeriodicalIF":13.7,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11373720","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147571011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SPGrasp: Spatiotemporal Prompt-Driven Grasp Synthesis in Dynamic Scenes","authors":"Yunpeng Mei;Hongjie Cao;Wei Xiao;Yinqiu Xia;Zhaohan Feng;Gang Wang;Jie Chen","doi":"10.1109/JSTSP.2026.3671182","DOIUrl":"https://doi.org/10.1109/JSTSP.2026.3671182","url":null,"abstract":"Real-time interactive grasp synthesis for dynamic objects remains challenging, as existing instance-level methods struggle to achieve low-latency inference while maintaining robust temporal consistency. To bridge this gap, we propose SPGrasp (Spatiotemporal Prompt-driven dynamic Grasp synthesis), a novel framework that extends the Segment Anything Model 2 (SAM 2) for video-stream grasp estimation. Our core innovation integrates user prompts with a spatiotemporal context module, enabling real-time interaction with end-to-end latency as low as 59 ms while preserving consistent instance identity and grasp predictions in dynamic, cluttered scenes, including object overlap and occlusion. In benchmark evaluations, SPGrasp achieves instance-level grasp accuracies of 90.6% on OCID and 93.8% on Jacquard. On the GraspNet-1Billion dataset under continuous tracking, SPGrasp reaches 92.0% accuracy with 73.1 ms per-frame latency, corresponding to a 58.5% latency reduction over the prior state-of-the-art promptable method RoG-SAM while maintaining competitive accuracy. Real-world experiments further demonstrate reliable interactive grasping under frequent occlusions, achieving a 94.8% success rate. These results suggest that SPGrasp effectively mitigates the latency–interactivity trade-off in dynamic grasp synthesis.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"20 2","pages":"202-212"},"PeriodicalIF":13.7,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147571048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"OmniMamba: Omnidirectional Scanning Meets State Space Models for Efficient Hyperspectral Image Classification","authors":"Qiyun Zheng;Taosheng Xu;Chenglong Zhang;Peng Li;Wenwen Min;Changmiao Wang","doi":"10.1109/JSTSP.2026.3657243","DOIUrl":"https://doi.org/10.1109/JSTSP.2026.3657243","url":null,"abstract":"Accurate classification of hyperspectral images (HSI) is crucial for earth observation and agricultural production analysis, yet remains challenging due to high dimensionality, spectral variability, and limited training samples. Traditional approaches often struggle to effectively balance computational complexity with the ability to capture spatial-spectral relationships of observation targets. To address this challenge, we propose OmniMamba, a novel omnidirectional state space model that adopts a collaborative alternation strategy integrating single-scale and multi-scale feature processing with omnidirectional scanning mechanisms. Four complementary scanning patterns (row, column, zigzag, and snake) are employed in the omnidirectional scanning mechanism to transfer 2D spatial data into 1D spatially structured feature sequence, which preserves directional sensitivity while achieving global dependency modeling with only linear complexity. This avoids the quadratic complexity bottleneck inherent in self-attention mechanisms. Our collaborative alternation strategy coordinates fine-grained spectral signatures with hierarchical spatial contexts through cascaded processing stages, addressing the spectral-spatial feature fusion challenge in HSI classification. Extensive experiments conducted on four benchmark datasets validate the superiority of OmniMamba, achieving a mean overall accuracy of 99.28%, significantly outperforming existing methods. Remarkably, our model accomplishes the performance with only 246 K parameters and 0.04 GFLOPs, demonstrating dramatically low computational complexity than the state-of-the art conventional CNN and transformer-based architectures.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"20 2","pages":"142-152"},"PeriodicalIF":13.7,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147571056","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"StyleAvatar3D: Leveraging Image-Text Diffusion Models for High-Fidelity 3D Avatar Generation","authors":"Chi Zhang;Yiwen Chen;Yijun Fu;Wei Cheng;Zhenglin Zhou;Wenjia Jiang;Zhibin Wang;Bin Fu;Tao Chen;Gang Yu;Guosheng Lin;Chenxi Song","doi":"10.1109/JSTSP.2026.3662496","DOIUrl":"https://doi.org/10.1109/JSTSP.2026.3662496","url":null,"abstract":"The recent advancements in image-text diffusion models have stimulated research interest in large-scale 3D generative models. Nevertheless, the limited availability of diverse 3D resources presents significant challenges to learning. In this paper, we present a novel method for generating high-quality, stylized 3D avatars that utilizes pre-trained image-text diffusion models for data generation and a Generative Adversarial Network (GAN)-based 3D generation network for training. Our method leverages the comprehensive priors of appearance and geometry offered by image-text diffusion models to generate multi-view images of avatars in various styles. During data generation, we employ poses extracted from existing 3D models to guide the generation of multi-view images. To handle inaccurate pose annotations of stylized images, we investigate view-specific prompts and develop a coarse-to-fine discriminator for GAN training. Additionally, we develop a latent diffusion model within the style space of StyleGAN to enable the generation of avatars based on image or text inputs. Our approach demonstrates superior performance over current state-of-the-art methods in terms of visual quality and diversity of the produced avatars.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"20 2","pages":"192-201"},"PeriodicalIF":13.7,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147571080","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DepthCropSeg++: Scaling a Crop Segmentation Foundation Model With Depth-Labeled Data","authors":"Jiafei Zhang;Songliang Cao;Binghui Xu;Yanan Li;Weiwei Jia;Tingting Wu;Hao Lu;Weijuan Hu;Zhiguo Han","doi":"10.1109/JSTSP.2026.3654362","DOIUrl":"https://doi.org/10.1109/JSTSP.2026.3654362","url":null,"abstract":"We introduce DepthCropSeg++, a foundation model for crop segmentation, capable of segmenting different crop species under open in-field environment. Crop segmentation is a fundamental task for modern agriculture, underpinning many downstream tasks such as plant phenotyping, density estimation, and weed control. In foundation-model era, a number of generic large language and vision models have been developed. These models have demonstrated remarkable real-world generalization due to significant model capacity and large-scale datasets. However, current crop segmentation models mostly learn from limited data due to expensive pixel-level labelling cost, often performing well only under specific crop types or controlled environment. In this work, we follow the vein of our previous work DepthCropSeg, an almost unsupervised approach to crop segmentation, to scale up a cross-species and cross-scene crop segmentation dataset, with 28,406 images across 30+ species and 15 environmental conditions. We build upon a state-of-the-art semantic segmentation architecture ViT-Adapter, enhance it with dynamic upsampling for improved. detail awareness, and train it with a two-stage self-training pipeline. To systematically validate model performance, we conduct comprehensive experiments to justify the effectiveness and generalization capabilities across multiple crop datasets. Results demonstrate that DepthCropSeg++ achieves 93.11% mIoU on a comprehensive testing set, outperforming both supervised baselines and general-purpose vision foundation models like Segmentation Anything Model (SAM) by significant margins (<inline-formula><tex-math>$+0.36%$</tex-math></inline-formula> and +48.57% respectively). The model particularly excels in challenging scenarios including night-time environment (86.90% mIoU), high-density canopies (99.86% mIoU), and unseen crop varieties (90.09% mIoU), indicating a new state of the art for crop segmentation.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"20 2","pages":"129-141"},"PeriodicalIF":13.7,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147571088","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Multidimensional Tactile Feature Information Fusion Method for Food Graininess Evaluation","authors":"Liwen Huang;Feng Gao;Gaofeng Li;Jun Wang;Dongdong Du","doi":"10.1109/JSTSP.2026.3665106","DOIUrl":"https://doi.org/10.1109/JSTSP.2026.3665106","url":null,"abstract":"Conventional texture instruments rely on limited feature parameters, which poses challenges for the accurate quantitative evaluation of food graininess. In this study, a multidimensional tactile feature fusion method was proposed to assess food graininess, by using a self-developed tactile information acquisition system. A total of 180 sets of tactile friction and vibration data were collected based on six types of food-saliva mixture samples. Four machine learning algorithms were applied for feature extraction, and the convolutional neural network (CNN) exhibited the best performance, enabling the identification of nine key tactile friction and vibration features. These features, together with sensory evaluation scores, were utilized for correlation analysis, qualitative discrimination, and quantitative prediction. The results demonstrated that tactile vibration features exhibited a stronger correlation with sensory graininess than friction features. Principal component analysis (PCA) effectively discriminated food samples with different graininess levels, achieving an accuracy exceeding 95.8%. Furthermore, graininess was accurately predicted by a stepwise multiple linear regression (SMLR) model (<inline-formula><tex-math>$R_{p} = 0.889$</tex-math></inline-formula>, RMSEP = 0.622). Overall, these findings confirm the effectiveness of multidimensional tactile feature information fusion methods for evaluating food graininess and highlight their potential for the quantitative characterization of food texture.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"20 2","pages":"168-176"},"PeriodicalIF":13.7,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147571077","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IEEE Signal Processing Society Information","authors":"","doi":"10.1109/JSTSP.2026.3675759","DOIUrl":"https://doi.org/10.1109/JSTSP.2026.3675759","url":null,"abstract":"","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"20 2","pages":"C3-C3"},"PeriodicalIF":13.7,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11458059","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147571079","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IEEE Signal Processing Society Publication Information","authors":"","doi":"10.1109/JSTSP.2026.3675763","DOIUrl":"https://doi.org/10.1109/JSTSP.2026.3675763","url":null,"abstract":"","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"20 2","pages":"C2-C2"},"PeriodicalIF":13.7,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11458057","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147571074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}