{"title":"Real-Time Self-Supervised Ultrasound Image Enhancement Using Test-Time Adaptation for Sophisticated Rotator Cuff Tear Diagnosis","authors":"Haeyun Lee;Kyungsu Lee;Jong Pil Yoon;Jihun Kim;Jun-Young Kim","doi":"10.1109/LSP.2025.3557754","DOIUrl":"https://doi.org/10.1109/LSP.2025.3557754","url":null,"abstract":"Medical ultrasound imaging is a key diagnostic tool across various fields, with computer-aided diagnosis systems benefiting from advances in deep learning. However, its lower resolution and artifacts pose challenges, particularly for non-specialists. The simultaneous acquisition of degraded and high-quality images is infeasible, limiting supervised learning approaches. Additionally, self-supervised and zero-shot methods require extensive processing time, conflicting with the real-time demands of ultrasound imaging. Therefore, to address the aforementioned issues, we propose real-time ultrasound image enhancement via a self-supervised learning technique and a test-time adaptation for sophisticated rotational cuff tear diagnosis. The proposed approach learns from other domain image datasets and performs self-supervised learning on an ultrasound image during inference for enhancement. Our approach not only demonstrated superior ultrasound image enhancement performance compared to other state-of-the-art methods but also achieved an 18% improvement in the RCT segmentation performance.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"1635-1639"},"PeriodicalIF":3.2,"publicationDate":"2025-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143865252","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian Language Model Adaptation for Personalized Speech Recognition","authors":"Mun-Hak Lee;Ji-Hwan Mo;Ji-Hun Kang;Jin-Young Son;Joon-Hyuk Chang","doi":"10.1109/LSP.2025.3556787","DOIUrl":"https://doi.org/10.1109/LSP.2025.3556787","url":null,"abstract":"In deployment environments for speech recognition models, diverse proper nouns such as personal names, song titles, and application names are frequently uttered. These proper nouns are often sparsely distributed within the training dataset, leading to performance degradation and limiting the practical utility of the models. Personalization strategies that leverage user-specific information, such as contact lists or search histories, have proven effective in mitigating performance degradation caused by rare words. In this study, we propose a novel personalization method for combining the scores of a general language model (LM) and a personal LM within a probabilistic framework. The proposed method entails low computational costs, storage requirements, and latency. Through experiments using a real-world dataset collected from the vehicle environment, we demonstrate that the proposed method effectively overcomes the out-of-vocabulary problem and improves recognition performance for rare words.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"1620-1624"},"PeriodicalIF":3.2,"publicationDate":"2025-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143871000","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Viewport-Independent Blind Quality Assessment of AI-Generated Omnidirectional Images via Vision-Language Correspondence","authors":"Xuelin Liu;Jiebin Yan;Chenyi Lai;Yang Li;Yuming Fang","doi":"10.1109/LSP.2025.3556791","DOIUrl":"https://doi.org/10.1109/LSP.2025.3556791","url":null,"abstract":"The advancement of deep generation technology has significantly enhanced the growth of artificial intelligence-generated content (AIGC). Among these, AI-generated omnidirectional images (AGOIs), hold considerable promise for applications in virtual reality (VR). However, the quality of AGOIs varies widely, and there has been limited research focused on their quality assessment. In this letter, inspired by the characteristics of the human visual system, we propose a novel viewport-independent blindquality assessment method for AGOIs, termed VI-AGOIQA, which leverages vision-language correspondence. Specifically, to minimize the computational burden associated with viewport-based prediction methods for omnidirectional image quality assessment, a set of image patches are first extracted from AGOIs in Equirectangular Projection (ERP) format. Then, the correspondence between visual and textual inputs is effectively learned by utilizing the pre-trained image and text encoders of the Contrastive Language-Image Pre-training (CLIP) model. Finally, a multimodal feature fusion module is applied to predict human visual preferences based on the learned knowledge of visual-language consistency. Extensive experiments conducted on publicly available database demonstrate the promising performance of the proposed method.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"1630-1634"},"PeriodicalIF":3.2,"publicationDate":"2025-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143865253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Party Reversible Data Hiding in Ciphertext Binary Images Based on Visual Cryptography","authors":"Bing Chen;Jingkun Yu;Bingwen Feng;Wei Lu;Jun Cai","doi":"10.1109/LSP.2025.3557273","DOIUrl":"https://doi.org/10.1109/LSP.2025.3557273","url":null,"abstract":"Existing methods for reversible data hiding in ciphertext binary images only involve one data hider to perform data embedding. When the data hider is attacked, the original binary image cannot be perfectly reconstructed. To this end, this letter proposes multi-party reversible data hiding in ciphertext binary images, where multiple data hiders are involved in data embedding. In this solution, we use visual cryptography technology to encrypt a binary image into multiple ciphertext binary images, and transmit the ciphertext binary images to different data hiders. Each data hider can embed data into a ciphertext binary image and generate a marked ciphertext binary image. The original binary image is perfectly reconstructed by collecting a portion of marked ciphertext binary images from the unattacked data hiders. Compared with existing solutions, the proposed solution enhances the recoverability of the original binary image. Besides, the proposed solution maintains a stable embedding capacity for different categories of images.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"1560-1564"},"PeriodicalIF":3.2,"publicationDate":"2025-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143845568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Perceptual Screen Content Image Hashing Using Adaptive Texture and Shape Features","authors":"Xue Yang;Ziqing Huang;Yonghua Zhang;Shuo Zhang;Zhenjun Tang","doi":"10.1109/LSP.2025.3557272","DOIUrl":"https://doi.org/10.1109/LSP.2025.3557272","url":null,"abstract":"With the flourishing development of multi-client interactive systems, a new type of digital image known as Screen Content Image (SCI) has emerged. Unlike traditional natural scene images, SCI encompasses various visual contents, including natural images, graphics, and text. Because the multi-region distribution characteristics of screen content images result in the presence of blank regions, malicious modifications are easier to operate and harder to perceive, making a serious threat to visual content security. To this end, this paper proposes a color screen content image hashing algorithm using adaptive text regions features and global shape features. Specifically, the text regions are adaptively collected by calculating the local standard deviation of sub-blocks. Then, quaternion Fourier significant maps are computed for the text regions, and texture statistical features are further extracted to reflect the essential visual content robustness. Moreover, the global shape features are represented from the entire color SCI to ensure the discrimination. Finally, the hash sequence with a length of 142 bits is derived from the above features. Importantly, a specialized tampering dataset for SCIs has been established, and the proposed hashing shows highly sensitive to malicious modifications with a satisfactory detection accuracy. Meanwhile, the ROC curve analysis indicates that the proposed method outperforms existing hashing algorithms.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"1655-1659"},"PeriodicalIF":3.2,"publicationDate":"2025-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143883431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MDFormer: Multi-Scale Downsampling-Based Transformer for Low-Light Image Enhancement","authors":"Yang Zhou;Liangtian He;Liang-Jian Deng;Hongming Chen;Chao Wang","doi":"10.1109/LSP.2025.3556786","DOIUrl":"https://doi.org/10.1109/LSP.2025.3556786","url":null,"abstract":"Vision Transformers have achieved impressive performance in the field of low-light image enhancement. Some Transformer-based methods acquire attention maps within channel dimension, whereas the spatial resolutions of queries and keys involved in matrix multiplication are much larger than the dimensions of channels. During the key-query dot-product interaction to generate attention maps, massive information redundancy and expensive computational costs are incurred. Simultaneously, most previous feed-forward networks in Transformers do not model the multi-range information that plays an important role for feature reconstruction. Based on the above observations, we propose an effective Multi-Scale Downsampling-Based Transformer (MDFormer) for low-light image enhancement, which consists of multi-scale downsampling-based self-attention (MDSA) and multi-range gated extraction block (MGEB). MDSA employs downsampling with two different factors for queries and keys to save the computational cost when implementing self-attention operations within channel dimension. Furthermore, we introduce learnable parameters for the two generated attention maps to adjust the weights for fusion, which allows MDSA to adaptively retain the most significant attention scores from attention maps. The proposed MGEB captures multi-range information by virtue of the multi-scale depth-wise convolutions and dilated convolutions, to enhance modeling capabilities. Extensive experiments on four challenging low-light image enhancement datasets demonstrate that our method outperforms the state-of-the-art.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"1575-1579"},"PeriodicalIF":3.2,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143856192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive and Self-Tuning SBL With Total Variation Priors for Block-Sparse Signal Recovery","authors":"Hamza Djelouat;Reijo Leinonen;Mikko J. Sillanpää;Bhaskar D. Rao;Markku Juntti","doi":"10.1109/LSP.2025.3556790","DOIUrl":"https://doi.org/10.1109/LSP.2025.3556790","url":null,"abstract":"This letter addresses the problem of estimating block sparse signal with unknown group partitions in a multiple measurement vector (MMV) setup. We propose a Bayesian framework by applying an adaptive total variation (TV) penalty on the hyper-parameter space of the sparse signal. The main contributions are two-fold. 1) We extend the TV penalty beyond the immediate neighbor, thus enabling better capture of the signal structure. 2) A dynamic framework is provided to learn the regularization weights for the TV penalty based on the statistical dependencies between the entries of tentative blocks, thus eliminating the need for fine-tuning. The superior performance of the proposed method is empirically demonstrated by extensive computer simulations with the state-of-art benchmarks. The proposed solution exhibits both excellent performance and robustness against sparsity model mismatch.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"1555-1559"},"PeriodicalIF":3.2,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10946850","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143845569","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CLIP-TNseg: A Multi-Modal Hybrid Framework for Thyroid Nodule Segmentation in Ultrasound Images","authors":"Xinjie Sun;Boxiong Wei;Yalong Jiang;Liquan Mao;Qi Zhao","doi":"10.1109/LSP.2025.3556789","DOIUrl":"https://doi.org/10.1109/LSP.2025.3556789","url":null,"abstract":"Thyroid nodule segmentation in ultrasound images is crucial for accurate diagnosis and treatment planning. However, existing methods struggle with segmentation accuracy, interpretability, and generalization. This letter proposes CLIP-TNseg, a novel framework that integrates a multimodal large model with a neural network architecture to address these challenges. We innovatively divide visual features into coarse-grained and fine-grained components, leveraging textual integration with coarse-grained features for enhanced semantic understanding. Specifically, the Coarse-grained Branch extracts high-level semantic features from a frozen CLIP model, while the Fine-grained Branch refines spatial details using U-Net-style residual blocks. Extensive experiments on the newly collected PKTN dataset and other public datasets demonstrate the competitive performance of CLIP-TNseg. Additional ablation experiments confirm the critical contribution of textual inputs, particularly highlighting the effectiveness of our carefully designed textual prompts compared to fixed or absent textual information.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"1625-1629"},"PeriodicalIF":3.2,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143871045","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TJCMNet: An Efficient Vision-Text Joint Identity Clues Mining Network for Visible-Infrared Person Re-Identification","authors":"ZhuXuan Cheng;ZhiJia Zhang;Huijie Fan;XingQi Na","doi":"10.1109/LSP.2025.3556784","DOIUrl":"https://doi.org/10.1109/LSP.2025.3556784","url":null,"abstract":"Retrieving images for Visible-Infrared Person Re-identification task is challenging, because of the huge modality discrepancy caused by the different imaging principle of RGB and infrared cameras. Existing approaches rely on seeking distinctive information within unified visual feature space, ignoring the stable identity information brought by textual description. To overcome these problems, this letter propose a novel Text-vision Joint Clue Mining (TJCM) network to aggregate vision and text features, then distill the joint knowledge for enhancing the modality-shared branch. Specifically, we first extract modality-shared and textual features using a parameter-shared vision encoder and a text encoder. Then, a text-vision co-refinement module is proposed to refine the implicit information within vision feature and text feature, then aggregate them into joint feature. Finally, introduce the heterogeneous distillation alignment loss provides enhancement for modality-shared feature through joint knowledge distillation at feature-level and logit-level. Our TJCMNet achieves significant improvements over the state-of-the-art methods on three mainstream datasets.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"1615-1619"},"PeriodicalIF":3.2,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143871028","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Feature Bank-Guided Reconstruction for Anomaly Detection","authors":"Sihan He;Tao Zhang;Wei Song;Hongbin Yu","doi":"10.1109/LSP.2025.3555544","DOIUrl":"https://doi.org/10.1109/LSP.2025.3555544","url":null,"abstract":"Visual surface anomaly detection targets the location of anomalies, with numerous methods available to address the challenge. Reconstruction-based methods are popular for their adaptability and interpretability. However, reconstruction-based methods currently struggle with the challenge of achieving low image fidelity and a tendency to reconstruct anomalies. To overcome these challenges, we introduces the Feature Bank-guided Reconstruction method (FBR), incorporating three innovative modules: anomaly simulation, feature bank module, and a cross-fused Discrete Cosine Transform channel attention module. Guided by these modules, our method is capable of reconstructing images with enhanced robustness. The experimental results validate the effectiveness of the proposed approach, which not only achieves outstanding performance on the BeanTech AD dataset with an 96.4% image-AUROC and a 97.3% pixel-AUROC, but also demonstrates competitive performance on the MVTec AD dataset with a 99.5% image-AUROC and a 98.3% pixel-AUROC.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"1480-1484"},"PeriodicalIF":3.2,"publicationDate":"2025-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143809016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}