{"title":"Sampling-Based Pruned Knowledge Distillation for Training Lightweight RNN-T","authors":"Sungsoo Kim;Dongjune Lee;Ju Yeon Kang;Myeonghun Jeong;Nam Soo Kim","doi":"10.1109/LSP.2025.3528364","DOIUrl":"https://doi.org/10.1109/LSP.2025.3528364","url":null,"abstract":"We present a novel training method for small-scale RNN-T models, widely used in real-world speech recognition applications. Despite efforts to scale down models for edge devices, the demand for even smaller and more compact speech recognition models persists to accommodate a broader range of devices. In this letter, we propose Sampling-based Pruned Knowledge Distillation (SP-KD) for training lightweight RNN-T models. In contrast to the conventional knowledge distillation techniques, the proposed method enables student models to distill knowledge from the distribution of teacher models, which is estimated by considering not only the best paths but also less likely paths. Additionally, we leverage pruning the output lattice of RNN-T to comprehensively transfer knowledge from teacher models to student models. Experimental results demonstrate that our proposed method outperforms the baseline in training tiny RNN-T models.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"631-635"},"PeriodicalIF":3.2,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143184328","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Video Inpainting Localization With Contrastive Learning","authors":"Zijie Lou;Gang Cao;Man Lin","doi":"10.1109/LSP.2025.3527196","DOIUrl":"https://doi.org/10.1109/LSP.2025.3527196","url":null,"abstract":"Video inpainting techniques typically serve to restore destroyed or missing regions in digital videos. However, such techniques may also be illegally used to remove important objects for creating forged videos. This letter proposes a simple yet effective forensic scheme for Video Inpainting LOcalization with ContrAstive Learning (ViLocal). A 3D Uniformer encoder is applied to the video noise residual for learning effective spatiotemporal features. To enhance discriminative power, supervised contrastive learning is adopted to capture the local regional inconsistency by separating the pristine and inpainted pixels. The pixel-wise inpainting localization map is produced by a lightweight convolution decoder with two-stage training. To prepare enough training samples, we build a video object segmentation dataset (VOS2k5) of 2500 videos with pixel-level annotations per frame. Extensive experimental results validate the superiority of ViLocal over state-of-the-art methods.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"611-615"},"PeriodicalIF":3.2,"publicationDate":"2025-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142993346","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive Superpixel-Guided Non-Homogeneous Image Dehazing","authors":"Hao Zhang;Ping Lu;Te Qi;Yan Xu;Tieyong Zeng","doi":"10.1109/LSP.2025.3527197","DOIUrl":"https://doi.org/10.1109/LSP.2025.3527197","url":null,"abstract":"Image dehazing is regarded as a fundamental image processing task with a major impact on higher-level imaging tasks. Many existing haze removal methods are designed for homogeneous haze, but in real-world cases, the haze is normally non-homogeneous. Superpixels, which segment an image into a set of closely spaced regions, can be employed in real-world scenarios to deal with non-homogeneous haze. In our paper, an adaptive non-homogeneous image dehazing approach that utilizes the superpixel-guided algorithm is designed to segment different hazy regions. Considering that both ambient light and transmission map estimation have a significant impact on the results, our research focuses on the development of a variational dehazing model that takes into account non-uniform ambient light and non-uniform transmission maps to address varying levels of haze. A series of numerical results illustrate the superiority and efficacy of our method.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"591-595"},"PeriodicalIF":3.2,"publicationDate":"2025-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10833755","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142993093","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cross-View Fusion for Multi-View Clustering","authors":"Zhijie Huang;Binqiang Huang;Qinghai Zheng;Yuanlong Yu","doi":"10.1109/LSP.2025.3527231","DOIUrl":"https://doi.org/10.1109/LSP.2025.3527231","url":null,"abstract":"Multi-view clustering has attracted significant attention in recent years because it can leverage the consistent and complementary information of multiple views to improve clustering performance. However, effectively fusing the information and balancing the consistent and complementary information of multiple views are common challenges faced by multi-view clustering. Most existing multi-view fusion works focus on weighted-sum fusion and concatenation fusion, which are unable to fully fuse the underlying information and do not consider balancing the consistent and complementary information of multiple views. To this end, we propose Cross-view Fusion for Multi-view Clustering (CFMVC). Specifically, CFMVC combines a deep neural network and a graph convolutional network for cross-view information fusion, which fully fuses the feature information and structural information of multiple views. In order to balance the consistent and complementary information of multiple views, CFMVC enhances the correlation among the same samples to maximize the consistent information while simultaneously reinforcing the independence among different samples to maximize the complementary information. Experimental results on several multi-view datasets demonstrate the effectiveness of CFMVC for the multi-view clustering task.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"621-625"},"PeriodicalIF":3.2,"publicationDate":"2025-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142993445","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Dual-Branch Network With Feature Assistance for Automatic Modulation Recognition","authors":"Yuhang Feng;Ruifeng Duan;Shurui Li;Peng Cheng;Wanchun Liu","doi":"10.1109/LSP.2025.3527901","DOIUrl":"https://doi.org/10.1109/LSP.2025.3527901","url":null,"abstract":"Automatic modulation recognition (AMR) is a critical technology in wireless communications, aiming to achieve high recognition accuracy with low complexity in increasingly intricate electromagnetic environments. To tackle this challenge, in this paper, we propose a dual-branch convolution cascaded transformer network with feature assistance, termed DCTFANet. To enhance the differentiation between samples, we employ the Gramian angular field (GAF) to capture potential temporal correlations between data points. Subsequently, both I/Q sequences and GAF data are input into the model for joint signal feature extraction. The network backbone is constructed using multiple improved depthwise separable convolution (DSC) blocks, which significantly reduce computational complexity. Moreover, the backbone depth is flexibly adjustable to fully exploit local features of different data types. Finally, feature transition and the transformer encoder are used to reduce parameters and extract global features. Experimental results on RML2016.10b show that the proposed method achieves higher recognition accuracy compared to several state-of-the-art methods, especially at low signal-to-noise ratios (SNRs), with an increase of at least 10.80% at −20 dB.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"701-705"},"PeriodicalIF":3.2,"publicationDate":"2025-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143184252","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rethinking Feature Guidance for Medical Image Segmentation","authors":"Wei Wang;Jixing He;Xin Wang","doi":"10.1109/LSP.2025.3526745","DOIUrl":"https://doi.org/10.1109/LSP.2025.3526745","url":null,"abstract":"Despite the evident advantages of variants of UNet in medical image segmentation, these methods still exhibit limitations in the extraction of foreground, background, and boundary features. Based on feature guidance, we propose a new network (FG-UNet). Specifically, adjacent high-level and low-level features are used to gradually guide the network to perceive lesion features. To accommodate lesion features of different scales, the multi-order gated aggregation (MGA) block is designed based on multi-order feature interactions. Furthermore, a novel feature-guided context-aware (FGCA) block is devised to enhance the capability of FG-UNet to segment lesions by fusing boundary-enhancing features, object-enhancing features, and uncertain areas. Finally, a bi-dimensional interaction attention (BIA) block is designed to enable the network to highlight crucial features effectively. To appraise the effectiveness of FG-UNet, experiments were conducted on Kvasir-seg, ISIC2018, and COVID-19 datasets. The experimental results illustrate that FG-UNet achieves a DSC score of 92.70% on the Kvasir-seg dataset, which is 1.15% higher than that of the latest SCUNet++, 4.70% higher than that of ACC-UNet, and 5.17% higher than that of UNet.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"641-645"},"PeriodicalIF":3.2,"publicationDate":"2025-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143184214","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PFCNet: Enhancing Rail Surface Defect Detection With Pixel-Aware Frequency Conversion Networks","authors":"Yue Wu;Fangfang Qiang;Wujie Zhou;Weiqing Yan","doi":"10.1109/LSP.2025.3525855","DOIUrl":"https://doi.org/10.1109/LSP.2025.3525855","url":null,"abstract":"Applying computer vision techniques to rail surface defect detection (RSDD) is crucial for preventing catastrophic accidents. However, challenges such as complex backgrounds and irregular defect shapes persist. Previous methods have focused on extracting salient object information from a pixel perspective, thereby neglecting valuable high- and low-frequency image information, which can better capture global structural information. In this study, we design a pixel-aware frequency conversion network (PFCNet) to explore RSDD from a frequency domain perspective. We use different attention mechanisms and frequency enhancement for high-level and shallow features to explore local details and global structures comprehensively. In addition, we design a dual-control reorganization module to refine the features across levels. We conducted extensive experiments on an industrial RGB-D dataset (NEU RSDDS-AUG), and PFCNet achieved superior performance.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"606-610"},"PeriodicalIF":3.2,"publicationDate":"2025-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142993097","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Piecewise Student's t-distribution Mixture Model-Based Estimation for NAND Flash Memory Channels","authors":"Cheng Wang;Zhen Mei;Jun Li;Kui Cai;Lingjun Kong","doi":"10.1109/LSP.2024.3521326","DOIUrl":"https://doi.org/10.1109/LSP.2024.3521326","url":null,"abstract":"Accurate modeling and estimation of the threshold voltages of the flash memory can facilitate the efficient design of channel codes and detectors. However, most flash memory channel models are based on Gaussian distributions, which fail to capture certain key properties of the threshold voltages, such as their heavy tails. To enhance the model accuracy, we first propose a piecewise Student's t-distribution mixture model (PSTMM), which features degrees of freedom to control the left and right tails of the voltage distributions. We further propose a PSTMM-based expectation maximization (PSTMM-EM) algorithm to estimate model parameters for flash memories by alternately computing the expected values of the missing data and maximizing the likelihood function with respect to the model parameters. Simulation results demonstrate that our proposed algorithm exhibits superior stability and can effectively extend the flash memory lifespan by 1700 program/erase (PE) cycles compared with the existing parameter estimation algorithms.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"451-455"},"PeriodicalIF":3.2,"publicationDate":"2025-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142938333","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Noise Covariance Matrix Estimation in Block-Correlated Noise Field for Direction Finding","authors":"Majdoddin Esfandiari;Sergiy A. Vorobyov","doi":"10.1109/LSP.2025.3525898","DOIUrl":"https://doi.org/10.1109/LSP.2025.3525898","url":null,"abstract":"A noise covariance matrix estimation approach for direction finding in an unknown noise field, applicable to the practically important cases of nonuniform and block-diagonal sensor noise, is proposed. It is based on an alternating procedure that can be adjusted for a specific noise type. Numerical simulations are conducted in order to establish the generality and superiority of the proposed approach over the existing state-of-the-art methods, especially in challenging scenarios.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"531-535"},"PeriodicalIF":3.2,"publicationDate":"2025-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10824965","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142976117","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bridging the Modality Gap in Multimodal Eye Disease Screening: Learning Modality Shared-Specific Features via Multi-Level Regularization","authors":"Jiayue Zhao;Shiman Li;Yi Hao;Chenxi Zhang","doi":"10.1109/LSP.2025.3526094","DOIUrl":"https://doi.org/10.1109/LSP.2025.3526094","url":null,"abstract":"Color fundus photography (CFP) and optical coherence tomography (OCT) are two common modalities used in eye disease screening, providing crucial complementary information for the diagnosis of eye diseases. However, existing multimodal learning methods cannot fully leverage the information from each modality due to the large dimensional and semantic gap between 2D CFP and 3D OCT images, leading to suboptimal classification performance. To bridge the modality gap and fully exploit the information from each modality, we propose a novel feature disentanglement method that decomposes features into modality-shared and modality-specific components. We design a multi-level regularization strategy including intra-modality, inter-modality, and intra-inter-modality regularization to facilitate the effective learning of the modality shared-specific features. Our method achieves state-of-the-art performance on two eye disease diagnosis tasks using two publicly available datasets and promises to serve as a useful tool for multimodal eye disease diagnosis.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"586-590"},"PeriodicalIF":3.2,"publicationDate":"2025-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142993094","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}