IEEE Signal Processing Letters: Latest Articles, Page 7

You Might Not Need Attention Diagonals

IF 3.9 | Zone 2, Engineering & Technology

IEEE Signal Processing Letters Pub Date : 2025-08-21 DOI: 10.1109/LSP.2025.3601497

Yiming Cui;Xin Yao;Shijin Wang;Guoping Hu

Citations: 0

A Dual-Path Multiple Instance Learning Network Guided by Image Quality Assessment for Cervical Whole Slide Image Classification

IF 3.9 | Zone 2, Engineering & Technology

IEEE Signal Processing Letters Pub Date : 2025-08-20 DOI: 10.1109/LSP.2025.3601043

Lanlan Kang;Jian Wang;Jian Qin;Yongjun He;Bo Ding

Citations: 0

Grid-Free Radio Map Estimation via Unsupervised Implicit Continuous Representation

IF 3.9 | Zone 2, Engineering & Technology

IEEE Signal Processing Letters Pub Date : 2025-08-20 DOI: 10.1109/LSP.2025.3601038

Xiaonan Chen;Jun Wang

Abstract: Radio map estimation (RME), also known as spectrum cartography (SC), aims to estimate the instantaneous signal power distribution over a certain space-frequency region. Recent RME approaches typically discretize the to-be-estimated radio map into grid cells under a fixed resolution. Meshing subtly adds structural priors, e.g., low-rankness or deep image priors, to the radio map. These priors can effectively enhance the performance of RME, especially in blind scenarios. However, the downside is that all locations in a grid cell share the same signal power, which is overly simplistic and contradicts the continuous nature of power propagation. This work puts forth a blind grid-free RME framework. We introduce implicit continuous representation (ICR), which learns a mapping between spatial coordinates and the power propagation pattern of each transmitter. This mechanism conceptually enables estimating the signal power at any spatial location within a certain region. With some model-based interpretations and designated optimization criteria, the ICR-based framework can be fully unsupervised, using only sampled data for training. This implies that our approach is not prone to the prevalent generalizability issue. Experiments on simulated and ray-tracing datasets verify the effectiveness of the proposed approach.

IEEE Signal Processing Letters, vol. 32, pp. 3430-3434.
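
The grid-free idea in this abstract (a function from coordinates to signal power that can be queried anywhere) can be illustrated with a much simpler stand-in. The sketch below is not the authors' method: it swaps the learned implicit representation for random Fourier features fitted by ridge regression, and the path-loss field, transmitter position, feature count, and all function names are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def fourier_features(xy, freqs, phases):
    """Map coordinates of shape (n, 2) to random Fourier features of shape (n, m)."""
    return np.cos(xy @ freqs + phases)

def toy_power_db(xy, tx=np.array([0.3, 0.6])):
    """Hypothetical log-distance path-loss field (dB) for a single transmitter."""
    dist = np.linalg.norm(xy - tx, axis=1)
    return -20.0 * np.log10(dist + 0.05)

# Sparse measurements at arbitrary (off-grid) sensor positions in the unit square.
xy_train = rng.uniform(0.0, 1.0, size=(150, 2))
p_train = toy_power_db(xy_train)

# Continuous model: power(x, y) ~ mean + features(x, y) @ w, fitted by ridge regression.
m = 100
freqs = rng.normal(scale=6.0, size=(2, m))
phases = rng.uniform(0.0, 2.0 * np.pi, size=m)
A = fourier_features(xy_train, freqs, phases)
p_mean = p_train.mean()
w = np.linalg.solve(A.T @ A + 1e-3 * np.eye(m), A.T @ (p_train - p_mean))

def predict_power_db(xy):
    """Evaluate the fitted field at any coordinates; no grid is involved."""
    return p_mean + fourier_features(np.atleast_2d(xy), freqs, phases) @ w

train_mse = np.mean((predict_power_db(xy_train) - p_train) ** 2)
query = predict_power_db(np.array([[0.123, 0.456], [0.9, 0.1]]))
```

Because the model is a function of continuous coordinates, `predict_power_db` accepts any location in the region, whereas a gridded estimate would return one value per cell.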

Citations: 0

Multi-Scale Cross-Dimensional Attention Network for Gland Segmentation

IF 3.9 | Zone 2, Engineering & Technology

IEEE Signal Processing Letters Pub Date : 2025-08-19 DOI: 10.1109/LSP.2025.3600374

Chaozhi Yu;Hongnan Cheng;Yufei Huang;Zhizhe Lin;Teng Zhou

Citations: 0

Least-Square Estimation of FM Rates for Removing LFM Interference in SAR Images

IF 3.9 | Zone 2, Engineering & Technology

IEEE Signal Processing Letters Pub Date : 2025-08-19 DOI: 10.1109/LSP.2025.3600375

Peijun Jin;Huizhang Yang;Meng Sun;Xuanchen Guo;Shuolin Pan

Citations: 0

Altering Query Prompting With Contrastive Learning for Multimodal Intent Recognition

IF 3.9 | Zone 2, Engineering & Technology

IEEE Signal Processing Letters Pub Date : 2025-08-19 DOI: 10.1109/LSP.2025.3599107

Yuxin Jia;Xueping Wang;Zhanpeng Shao;Min Liu

Abstract: Multimodal intent recognition utilizes heterogeneous modalities such as visual, auditory, and textual cues to infer user intent, serving as a pivotal component in human-machine interaction. Existing approaches, however, often rely on unimodal paradigms or shallow multimodal fusion, failing to model cross-modal semantic dependencies and struggling to extract discriminative features from non-verbal modalities, which limits their robustness in complex scenarios. To mitigate these limitations, we propose an Altering Query Prompting with Contrastive Learning framework (AQP-CL) that dynamically aligns and refines multimodal representations. Specifically, the Altering Query Prompting (AQP) module introduces a tri-modality rotation attention mechanism, where textual, visual, and acoustic modalities cyclically alternate as queries in cross-attention operations. This approach addresses modality bias while strengthening interdependencies between modalities, ultimately yielding intent-aware fused feature representations that preserve discriminative cues. The Label-semantic Augmented Contrastive Learning (LACL) strategy generates augmented samples through the intent-aware query prompt and enhances feature discrimination via NT-Xent loss on label tokens. By integrating high-confidence textual semantics from intent labels, LACL refines auxiliary modality features through contrastive alignment, ensuring robust cross-modal representation learning. Evaluations on IEMOCAP and MIntRec validate AQP-CL's superiority, achieving state-of-the-art precision of 77.78% on IEMOCAP, a 3.41% improvement over existing methods.

IEEE Signal Processing Letters, vol. 32, pp. 3345-3349.
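
The NT-Xent loss named in this abstract is the standard normalized temperature-scaled cross-entropy used in contrastive learning. A minimal NumPy sketch of the generic loss follows; how the paper builds its positive pairs from label tokens is not given in the abstract, so the pairing (`z_a[i]` with `z_b[i]`), the temperature value, and the function name are assumptions.

```python
import numpy as np

def nt_xent_loss(z_a, z_b, temperature=0.5):
    """Generic NT-Xent loss for N positive pairs (z_a[i], z_b[i]), each of shape (N, D)."""
    z = np.concatenate([z_a, z_b], axis=0).astype(float)
    z /= np.linalg.norm(z, axis=1, keepdims=True)   # unit vectors, so dot product = cosine
    n = z_a.shape[0]
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)                  # a sample is never its own candidate
    # Row i of the first half is paired with row i + n, and vice versa.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable row-wise log-softmax over all other samples.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()

# Toy check: correctly paired embeddings score a lower loss than mismatched ones.
z_a = np.eye(3)
loss_aligned = nt_xent_loss(z_a, z_a.copy())
loss_shuffled = nt_xent_loss(z_a, np.roll(z_a, 1, axis=0))
```

Lower temperatures sharpen the softmax, so hard negatives contribute more to the loss.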

Citations: 0

Multi-Task Diffusion With Masked Measurements

IF 3.9 | Zone 2, Engineering & Technology

IEEE Signal Processing Letters Pub Date : 2025-08-19 DOI: 10.1109/LSP.2025.3600370

Mahdi Shamsi;Farokh Marvasti

Citations: 0

StreamMel: Real-Time Zero-Shot Text-to-Speech Via Interleaved Continuous Autoregressive Modeling

IF 3.9 | Zone 2, Engineering & Technology

IEEE Signal Processing Letters Pub Date : 2025-08-19 DOI: 10.1109/LSP.2025.3600376

Hui Wang;Yifan Yang;Shujie Liu;Jinyu Li;Lingwei Meng;Yanqing Liu;Jiaming Zhou;Haoqin Sun;Yan Lu;Yong Qin

Citations: 0

Domain-Factored Untrained Deep Prior for Spectrum Cartography

IF 3.9 | Zone 2, Engineering & Technology

IEEE Signal Processing Letters Pub Date : 2025-08-19 DOI: 10.1109/LSP.2025.3599714

Subash Timilsina;Sagar Shrestha;Lei Cheng;Xiao Fu

Citations: 0

DOA or Speaker Embedding: Which is Better for Multi-Microphone Target Speaker Extraction

IF 3.9 | Zone 2, Engineering & Technology

IEEE Signal Processing Letters Pub Date : 2025-08-19 DOI: 10.1109/LSP.2025.3600168

Shuang Zhang;Jie Zhang;Yichi Wang;Haoyin Yan

Abstract: Target speaker extraction (TSE) is a useful front-end for improving speech quality and intelligibility in speech applications, and direction-of-arrival (DOA) and speaker embedding are two of the most frequently used assistive clues for identifying the target speaker in audio-only multi-microphone systems. Both can significantly improve TSE performance compared to blind TSE models, but the two have not yet been comprehensively compared in the literature. To show their pros and cons, in this work we build a unified framework for a fair comparison that allows either DOA or speaker embedding as the assistive clue. The DOA is used to calculate multichannel spatiotemporal speech features, and a speaker encoder is designed to extract the speaker embedding; either of these is then fused with the noisy speech features for TSE. We can then evaluate their respective strengths in diverse acoustic conditions, e.g., varying noise level, number of microphones, and speaker location. Results show that, given true DOA angles, the DOA-based TSE model always outperforms its speaker-embedding-based counterpart regardless of noise/microphone/location conditions, indicating that DOA is more discriminative of speaker identity. This superiority shrinks as the DOA mismatch increases, and the speaker-embedding-based model can do better in the large DOA mismatch case.

IEEE Signal Processing Letters, vol. 32, pp. 3350-3354.
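
The abstract does not say which spatial features are computed from the DOA. One commonly used choice is an "angle feature", which compares the observed inter-microphone phase difference with the phase difference implied by a far-field source at the given DOA. The sketch below assumes that choice, a single microphone pair, and a speed of sound of 343 m/s; none of these details, nor the function names, come from the paper.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s (assumed)

def target_phase_difference(doa_deg, mic_distance, freqs_hz):
    """Far-field phase difference between two mics for a source at doa_deg."""
    delay = mic_distance * np.cos(np.deg2rad(doa_deg)) / SPEED_OF_SOUND
    return 2.0 * np.pi * freqs_hz * delay

def angle_feature(stft_ref, stft_other, doa_deg, mic_distance, freqs_hz):
    """Cosine of (observed phase difference - DOA-implied phase difference).

    stft_ref, stft_other: complex spectrograms of shape (freq, time).
    Values near 1 mark time-frequency bins consistent with the given DOA.
    """
    ipd = np.angle(stft_ref * np.conj(stft_other))
    tpd = target_phase_difference(doa_deg, mic_distance, freqs_hz)[:, None]
    return np.cos(ipd - tpd)

# Toy check: one far-field source at 60 degrees, two mics 5 cm apart.
rng = np.random.default_rng(0)
freqs_hz = np.linspace(0.0, 8000.0, 257)
stft_ref = rng.normal(size=(257, 20)) + 1j * rng.normal(size=(257, 20))
true_tpd = target_phase_difference(60.0, 0.05, freqs_hz)[:, None]
stft_other = stft_ref * np.exp(-1j * true_tpd)   # second mic hears a delayed copy

feat_true = angle_feature(stft_ref, stft_other, 60.0, 0.05, freqs_hz)
feat_wrong = angle_feature(stft_ref, stft_other, 120.0, 0.05, freqs_hz)
```

With the true DOA the feature is 1 in every bin, while a mismatched DOA lowers it. This is consistent with the abstract's remark that the advantage of DOA shrinks as the DOA mismatch grows.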

Citations: 0