{"title":"Compressed Line Spectral Estimation Using Covariance: A Sparse Reconstruction Perspective","authors":"Jiahui Cao;Zhibo Yang;Xuefeng Chen","doi":"10.1109/LSP.2024.3457449","DOIUrl":"10.1109/LSP.2024.3457449","url":null,"abstract":"Efficient line spectral estimation methods applicable to sub-Nyquist sampling are drawing considerable attention in both academia and industry. In this letter, we propose an enhanced compressed sensing (CS) framework for line spectral estimation, termed sparsity-based compressed covariance sensing (SCCS). In terms of sampling, SCCS is implemented by periodic non-uniform sampling; in terms of recovery, SCCS focuses on compressed line spectral recovery using covariance information. Due to the dual priors on sparsity and structure, SCCS theoretically performs better than CS in compressed line spectral estimation. We explain this superiority from the mutual incoherence perspective: the sensing matrix in SCCS has a lower mutual coherence than that in classic CS. Extensive experimental results are highly consistent with the theoretical inference. All in all, SCCS opens many avenues for line spectral estimation.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":null,"pages":null},"PeriodicalIF":3.2,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142218753","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Maximum Entropy Attack on Decision Fusion With Herding Behaviors","authors":"Yiqing Lin;H. Vicky Zhao","doi":"10.1109/LSP.2024.3457244","DOIUrl":"10.1109/LSP.2024.3457244","url":null,"abstract":"The reliability and security of distributed detection systems have become increasingly important due to their growing prevalence in various applications. As advancements in human-machine systems continue, human factors, such as herding behaviors, are becoming influential in the decision fusion process of these systems. The presence of malicious users further highlights the necessity to mitigate security concerns. In this paper, we propose a maximum entropy attack that exploits the herding behaviors of users to amplify the damage caused by attackers. Different from prior works that try to maximize the fusion error rate, the proposed attack maximizes the entropy of the system states inferred by the fusion center, making the fusion results no better than a random coin toss. Moreover, we design static and dynamic attack modes to maximize the entropy of fusion results at the steady state and during the dynamic evolution stage, respectively. Simulation results show that the proposed attack strategy can cause the fusion accuracy to hover around 50% and that existing fusion rules cannot resist the proposed attack, demonstrating its effectiveness.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":null,"pages":null},"PeriodicalIF":3.2,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142227365","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Kalman-SSM: Modeling Long-Term Time Series With Kalman Filter Structured State Spaces","authors":"Zheng Zhou;Xu Guo;Yu-Jie Xiong;Chun-Ming Xia","doi":"10.1109/LSP.2024.3457862","DOIUrl":"10.1109/LSP.2024.3457862","url":null,"abstract":"In the field of time series forecasting, time series are often modeled as linear time-varying systems, which facilitates their analysis and modeling from a structural state perspective. Due to the non-stationary nature of real-world data and noise interference, existing models struggle to predict long-term time series effectively. To address this issue, we propose a novel model that integrates the Kalman filter with a state space model (SSM) approach to enhance the accuracy of long-term time series forecasting. The Kalman filter requires recursive computation, whereas the SSM approach reformulates the Kalman filtering process into a convolutional form, simplifying training and enhancing model efficiency. Our Kalman-SSM model estimates the future state of dynamic systems for forecasting from noisy time series data. On real-world datasets, Kalman-SSM has demonstrated competitive performance and satisfactory efficiency in comparison to state-of-the-art (SOTA) models.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":null,"pages":null},"PeriodicalIF":3.2,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142218752","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving Visual Representations of Masked Autoencoders With Artifacts Suppression","authors":"Zhengwei Miao;Hui Luo;Dongxu Liu;Jianlin Zhang","doi":"10.1109/LSP.2024.3458792","DOIUrl":"10.1109/LSP.2024.3458792","url":null,"abstract":"Recently, Masked Autoencoders (MAE) have gained attention for their ability to generate visual representations efficiently through pretext tasks. However, little research has evaluated the visual representations obtained by pre-trained MAE during the fine-tuning process. In this study, we address this gap by examining the attention maps within each block of the pre-trained MAE during fine-tuning. We observed artifacts in pre-trained models, which appear as significant responses in the attention maps of shallow blocks. These artifacts may negatively impact the transfer performance of MAE. We trace the cause of these artifacts to the asymmetry between the pre-training and fine-tuning processes. To suppress these artifacts, we propose a novel semantic masking strategy. This strategy aims to preserve complete and continuous semantic information within visible patches while maintaining randomness to facilitate robust representation learning. Experimental results demonstrate that the proposed masking strategy improves the performance of various downstream tasks while reducing artifacts. Specifically, we observed a 3.2% improvement in linear probing, a 0.5% improvement in fine-tuning on ImageNet-1K, and a 0.6% improvement in semantic segmentation on ADE20K.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":null,"pages":null},"PeriodicalIF":3.2,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142218751","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HiFi-GANw: Watermarked Speech Synthesis via Fine-Tuning of HiFi-GAN","authors":"Xiangyu Cheng;Yaofei Wang;Chang Liu;Donghui Hu;Zhaopin Su","doi":"10.1109/LSP.2024.3456673","DOIUrl":"10.1109/LSP.2024.3456673","url":null,"abstract":"Advancements in speech synthesis technology bring generated speech closer to natural human voices, but they also introduce a series of potential risks, such as the dissemination of false information and voice impersonation. Therefore, it is important to detect any potential misuse of released speech content. This letter introduces an active strategy that combines audio watermarking with the HiFi-GAN vocoder to embed an imperceptible watermark in all synthesized speech for detection purposes. We first pre-train a watermark extraction network as the watermark extractor, and then use the watermark extraction loss and speech quality loss of the extractor to fine-tune the HiFi-GAN generator so that the watermark can be extracted from the synthesized speech. We evaluate the imperceptibility and robustness of the watermark across various speech synthesis models. The experimental results demonstrate that our method effectively withstands various attacks and exhibits excellent imperceptibility. Moreover, our method is universal and compatible with various vocoder-based speech synthesis models.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":null,"pages":null},"PeriodicalIF":3.2,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142218754","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interactive Fusion and Correlation Network for Three-Modal Images Few-Shot Semantic Segmentation","authors":"Haolan He;Xianguo Dong;Xiaofei Zhou;Bo Wang;Jiyong Zhang","doi":"10.1109/LSP.2024.3456634","DOIUrl":"10.1109/LSP.2024.3456634","url":null,"abstract":"This letter presents a novel method for few-shot semantic segmentation of three-modal images. Some previous efforts fuse multiple modalities before feature correlation, which alters the original visual information that is useful for subsequent feature matching. Others are built on early correlation learning, which can cause loss of detail and thereby impair multi-modal integration. To address these challenges, we build a novel interactive fusion and correlation network (IFCNet). Specifically, the proposed fusing and correlating (FC) module performs feature correlation and attention-based multi-modal fusion interactively, which establishes effective inter-modal complementarity and benefits intra-modal query-support correlation. Furthermore, we add a multi-modal correlation (MC) module, which leverages multi-layer cosine similarity maps to enrich multi-modal visual correspondence. Experiments on the VDT-2048-5$^{i}$ dataset demonstrate the network's superior performance: it outperforms existing state-of-the-art methods in both 1-shot and 5-shot settings. The study also includes an ablation analysis to validate the contributions of the FC module and the MC module to the overall segmentation accuracy.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":null,"pages":null},"PeriodicalIF":3.2,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142218756","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spatial Adaptive Filter Network With Scale-Sharing Convolution for Image Demoiréing","authors":"Yong Xu;Zhiyu Wei;Ruotao Xu;Zihan Zhou;Zhuliang Yu","doi":"10.1109/LSP.2024.3451948","DOIUrl":"10.1109/LSP.2024.3451948","url":null,"abstract":"Removing moiré patterns is a challenging task because moiré is a spatially varying degradation that varies in shape, color, and scale. Existing image restoration models often rely on static convolutional neural network (CNN)-based architectures, which are potentially suboptimal for addressing the diverse manifestations of moiré patterns across different images and spatial positions. To this end, we propose a spatially adaptive neural network for image demoiréing. This network introduces a dual-branch filter prediction module designed to predict pixel-wise adaptive filters that can handle moiré patterns of varying orientations as well as color-shift issues. To further tackle scale variability, a scale-sharing convolution module is proposed, which uses pixel-wise adaptive filters with multiple dilations to effectively handle moiré patterns of different sizes but similar shapes. In extensive evaluations on three benchmark datasets, our model consistently outperforms existing methods, yielding a PSNR improvement of over 0.37 dB across all evaluated datasets and providing additional benefits in terms of model size.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":null,"pages":null},"PeriodicalIF":3.2,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142218755","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Two-Stream Temporal Feature Aggregation Based on Clustering for Few-Shot Action Recognition","authors":"Long Deng;Ao Li;Bingxin Zhou;Yongxin Ge","doi":"10.1109/LSP.2024.3456670","DOIUrl":"10.1109/LSP.2024.3456670","url":null,"abstract":"The metric learning paradigm has achieved notable success in few-shot action recognition; however, it still faces unaddressed challenges. Specifically, (1) limited training data can impede the exploration of temporal action relations, and (2) precision can decline due to outliers during frame-level feature alignment. To address these challenges, we propose a two-stream temporal feature aggregation method based on clustering, incorporating a temporal augmentation module (TAM) and a feature aggregation module (FAM). The TAM integrates three consecutive grayscale frames into the original RGB frame through weighted summation, thereby addressing color-related misguidance and enhancing temporal information extraction. Meanwhile, the FAM employs clustering to aggregate the frame-level features into high-level semantic sub-actions and replaces the original features with cluster centers to mitigate the adverse impact of outliers on model performance. Experimental results on benchmark datasets demonstrate the effectiveness of our method in few-shot action recognition. We further validate the proposed approach through comprehensive ablation experiments.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":null,"pages":null},"PeriodicalIF":3.2,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142218758","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distributed Online Ordinal Regression Based on VUS Maximization","authors":"Huan Liu;Jiankai Tu;Chunguang Li","doi":"10.1109/LSP.2024.3456629","DOIUrl":"10.1109/LSP.2024.3456629","url":null,"abstract":"Ordinal regression (OR) is a multi-class classification problem with ordered labels. The objective functions of most OR methods are based on the misclassification error. The volume under the ROC surface (VUS) is a measure for OR that quantifies the ranking ability of OR models, and it can also be used as an objective function in OR. In practice, data may be collected by multiple nodes in a distributed and online manner, making centralized processing difficult. In this paper, we develop a VUS-based distributed online OR method. Computing VUS requires a sequence of data from all categories, but the available online data may not cover all categories, and the required data may be distributed across different nodes. Besides, existing approximation methods for VUS are unsuitable for use in OR. To address these issues, we first propose two new surrogate losses of the VUS in OR. We then derive their decomposed formulations and propose distributed online OR algorithms based on VUS maximization (dVMOR). The experimental results demonstrate their effectiveness.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":null,"pages":null},"PeriodicalIF":3.2,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142218759","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast H.266/VVC Intra Coding by Early Skipping Joint Coding of Chroma Residuals","authors":"Yifan Wang;Shixuan Feng;Wei Zhang;Kai Li;Fuzheng Yang","doi":"10.1109/LSP.2024.3456631","DOIUrl":"10.1109/LSP.2024.3456631","url":null,"abstract":"Versatile Video Coding (H.266/VVC) significantly enhances compression performance but also increases encoding complexity. Numerous fast intra coding algorithms have been proposed to balance the performance gains against the increased complexity introduced by various advanced tools. However, few fast algorithms target Joint Coding of Chroma Residuals (JCCR), a newly introduced technique that requires additional rate-distortion optimization (RDO) processes. To address this gap, this letter introduces a fast intra coding algorithm that reduces complexity by skipping unnecessary JCCR early. Specifically, we propose skipping JCCR when the correlation between chroma components is low, and we construct a JCCR normalized reconstruction distortion to measure this correlation. The skip conditions are determined by statistical analysis, reducing the number of RDO processes for chroma components. Experimental results show that, compared to the Fraunhofer Versatile Video Encoder (VVenC), our algorithm achieves encoding time savings of 1.68% and 2.82% under the Random Access (RA) and All Intra (AI) settings, respectively, with performance losses of only 0.03% and 0.09%, demonstrating the effectiveness of our fast algorithm.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":null,"pages":null},"PeriodicalIF":3.2,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142218760","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}