{"title":"Diagnosis of Parkinson's Disease Based on Hybrid Fusion Approach of Offline Handwriting Images","authors":"Shanyu Dong;Jin Liu;Jianxin Wang","doi":"10.1109/LSP.2024.3496579","DOIUrl":"https://doi.org/10.1109/LSP.2024.3496579","url":null,"abstract":"Handwriting images are commonly used to diagnose Parkinson's disease due to their intuitive nature and easy accessibility. However, existing methods have not explored the potential of the fusion of different handwriting image sources for diagnosis. To address this issue, this study proposes a hybrid fusion approach that makes use of the visual information derived from different handwriting images and handwriting templates, significantly enhancing the performance in diagnosing Parkinson's disease. The proposed method involves several key steps. Initially, different preprocessed handwriting images undergo pixel-level fusion using Laplacian transformation. Subsequently, the fused and original images are fed into a pre-trained CNN separately to extract visual features. Finally, feature-level fusion is performed by concatenating the feature vectors extracted from the flatten layer, and the fused feature vectors are input into SVM to obtain classification results. Our experimental results validate that the proposed method achieves excellent performance by only utilizing visual features from images, with 95.45% accuracy on the NewHandPD. 
Furthermore, the results obtained on our dataset verify the strong generalizability of the proposed approach.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"31 ","pages":"3179-3183"},"PeriodicalIF":3.2,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142671993","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust Multi-Prototypes Aware Integration for Zero-Shot Cross-Domain Slot Filling","authors":"Shaoshen Chen;Peijie Huang;Zhanbiao Zhu;Yexing Zhang;Yuhong Xu","doi":"10.1109/LSP.2024.3495561","DOIUrl":"https://doi.org/10.1109/LSP.2024.3495561","url":null,"abstract":"Cross-domain slot filling is a widely explored problem in spoken language understanding (SLU), which requires the model to transfer between different domains under data sparsity conditions. Dominant two-step hierarchical models first extract slot entities and then calculate the similarity score between slot description-based prototypes and the last hidden layer of the slot entity, selecting the closest prototype as the predicted slot type. However, these models use only slot descriptions as prototypes, which lacks robustness. Moreover, these approaches pay little regard to the inherent knowledge in the slot entity embedding and therefore suffer from overfitting. In this letter, we propose a Robust Multi-prototypes Aware Integration (RMAI) method for zero-shot cross-domain slot filling. In RMAI, more robust slot entity-based prototypes and inherent knowledge in the slot entity embedding are utilized to improve the classification performance and alleviate the risk of overfitting. Furthermore, a multi-prototypes aware integration approach is proposed to effectively integrate both our proposed slot entity-based prototypes and the slot description-based prototypes. 
Experimental results on the SNIPS dataset demonstrate the strong performance of RMAI.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"31 ","pages":"3169-3173"},"PeriodicalIF":3.2,"publicationDate":"2024-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142672126","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SoLAD: Sampling Over Latent Adapter for Few Shot Generation","authors":"Arnab Kumar Mondal;Piyush Tiwary;Parag Singla;Prathosh A.P.","doi":"10.1109/LSP.2024.3496822","DOIUrl":"https://doi.org/10.1109/LSP.2024.3496822","url":null,"abstract":"Few-shot adaptation of Generative Adversarial Networks (GANs) under distributional shift is generally achieved via regularized retraining or latent space adaptation. While the former methods offer fast inference, the latter generate diverse images. This work aims to solve these issues and achieve the best of both regimes in a principled manner via Bayesian reformulation of the GAN objective. We highlight a hidden expectation term over GAN parameters, that is often overlooked but is critical in few-shot settings. This observation helps us justify prepending a latent adapter network (LAN) before a pre-trained GAN and propose a sampling procedure over the parameters of LAN (called SoLAD) to compute the usually-ignored hidden expectation. SoLAD enables fast generation of quality samples from multiple few-shot target domains using a GAN pre-trained on a single source domain.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"31 ","pages":"3174-3178"},"PeriodicalIF":3.2,"publicationDate":"2024-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142672064","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Differentiable Duration Refinement Using Internal Division for Non-Autoregressive Text-to-Speech","authors":"Jaeuk Lee;Yoonsoo Shin;Joon-Hyuk Chang","doi":"10.1109/LSP.2024.3495578","DOIUrl":"https://doi.org/10.1109/LSP.2024.3495578","url":null,"abstract":"Most non-autoregressive text-to-speech (TTS) models acquire target phoneme duration (target duration) from internal or external aligners. They transform the speech-phoneme alignment produced by the aligner into the target duration. Since this transformation is not differentiable, the gradient of the loss function that maximizes the TTS model's likelihood of speech (e.g., mel spectrogram or waveform) cannot be propagated to the target duration. In other words, the target duration is produced regardless of the TTS model's likelihood of speech. Hence, we introduce a differentiable duration refinement that produces a learnable target duration for maximizing the likelihood of speech. The proposed method uses an internal division to locate the phoneme boundary, which is determined to improve the performance of the TTS model. Additionally, we propose a duration distribution loss to enhance the performance of the duration predictor. Our baseline model is JETS, a representative end-to-end TTS model, and we apply the proposed methods to the baseline model. 
Experimental results show that the proposed method outperforms the baseline model in terms of subjective naturalness and character error rate.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"31 ","pages":"3154-3158"},"PeriodicalIF":3.2,"publicationDate":"2024-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142671988","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LFSamba: Marry SAM With Mamba for Light Field Salient Object Detection","authors":"Zhengyi Liu;Longzhen Wang;Xianyong Fang;Zhengzheng Tu;Linbo Wang","doi":"10.1109/LSP.2024.3493799","DOIUrl":"https://doi.org/10.1109/LSP.2024.3493799","url":null,"abstract":"A light field camera can reconstruct 3D scenes using captured multi-focus images that contain rich spatial geometric information, enhancing applications in stereoscopic photography, virtual reality, and robotic vision. In this work, a state-of-the-art salient object detection model for multi-focus light field images, called LFSamba, is introduced to emphasize four main insights: (a) Efficient feature extraction, where SAM is used to extract modality-aware discriminative features; (b) Inter-slice relation modeling, leveraging Mamba to capture long-range dependencies across multiple focal slices, thus extracting implicit depth cues; (c) Inter-modal relation modeling, utilizing Mamba to integrate all-focus and multi-focus images, enabling mutual enhancement; (d) Weakly supervised learning capability, developing a scribble annotation dataset from an existing pixel-level mask dataset, establishing the first scribble-supervised baseline for light field salient object detection.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"31 ","pages":"3144-3148"},"PeriodicalIF":3.2,"publicationDate":"2024-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142671986","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Binomial Harmonic Approximation Double-Phase Estimator Tracking for BOC Modulated Signals","authors":"Xiangjie Ding;Zhi Zhao;Ying Yang","doi":"10.1109/LSP.2024.3493793","DOIUrl":"https://doi.org/10.1109/LSP.2024.3493793","url":null,"abstract":"For binary offset carrier (BOC) signal tracking, the two-dimensional (2D) tracking method that independently tracks the code and subcarrier has garnered significant attention. The double estimator (DE) and the double phase estimator (DPE) are prominent approaches. However, the performance of the DE suffers under limited front-end bandwidths and sampling rates. The DPE, which treats the subcarrier as a sine wave, neglects side lobes, leading to performance degradation. This letter introduces the Binomial Harmonic Approximation DPE (BH-DPE), which uses two phase lock loops to track the first and third harmonics of the subcarrier. By applying a weighted combination of correlation values, the BH-DPE effectively reduces coherent output signal-to-noise ratio (SNR) loss and enhances ranging accuracy through combined delay estimations from both harmonics. Theoretical analysis and simulations show that the BH-DPE outperforms both the DE and the DPE in terms of SNR loss and ranging accuracy under constrained front-end bandwidths and sampling rates, and approaches the DE while exceeding the DPE under wide front-end bandwidths.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"31 ","pages":"3139-3143"},"PeriodicalIF":3.2,"publicationDate":"2024-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142672065","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CADeTT: Context-Adaptive Deep-Trinary-Tree Lossless Compression of Event Camera Frames","authors":"Ionut Schiopu;Radu Ciprian Bilcu","doi":"10.1109/LSP.2024.3493801","DOIUrl":"https://doi.org/10.1109/LSP.2024.3493801","url":null,"abstract":"The letter proposes an efficient context-adaptive lossless compression method for encoding event frame sequences. A first contribution proposes the use of a deep-ternary-tree of the current pixel position context as the context-tree model selector. The arithmetic codec encodes each trinary symbol using the probability distribution of the associated context-tree-leaf model. Another contribution proposes a novel context design based on several frames, where the context order controls the codec's complexity. Another contribution proposes a model search procedure to replace the context-tree prune-and-encode strategy by searching for the closest “mature” context model between lower-order context-tree models. The experimental evaluation shows that the proposed method provides an improved coding performance of 34.34% and a smaller runtime of up to $5.18\\times$ compared with state-of-the-art lossless image codec FLIF and, respectively, 6.95% and $14.42\\times$ compared with our prior work.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"31 ","pages":"3149-3153"},"PeriodicalIF":3.2,"publicationDate":"2024-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142671989","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multiple Subspace-Based Target Detection in Deterministic Interference","authors":"Mengru Sun;Weijian Liu;Jun Liu;Chengpeng Hao;Kefei Li","doi":"10.1109/LSP.2024.3491012","DOIUrl":"https://doi.org/10.1109/LSP.2024.3491012","url":null,"abstract":"In this letter, the problem of detecting a multiple subspace-based target in the presence of deterministic interference is considered. To solve the problem, we utilize the Kullback-Leibler information criterion and model order selection rules to design detection schemes. The alternative hypothesis related to the most likely signal subspace is selected from multiple alternative hypotheses, and is tested versus the null hypothesis for target detection. Numerical examples verify the effectiveness of the proposed detection schemes, which can achieve the target detection and subspace-based target classification simultaneously.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"31 ","pages":"3134-3138"},"PeriodicalIF":3.2,"publicationDate":"2024-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142671985","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Precise Analysis of Covariance Identifiability for Activity Detection in Grant-Free Random Access","authors":"Shengsong Luo;Junjie Ma;Chongbin Xu;Xin Wang","doi":"10.1109/LSP.2024.3491018","DOIUrl":"https://doi.org/10.1109/LSP.2024.3491018","url":null,"abstract":"We consider the identifiability issue of maximum-likelihood based activity detection in massive MIMO-based grant-free random access. An intriguing observation by (Chen et al., 2022) indicates that the identifiability undergoes a phase transition for commonly-used random user signatures as $L^{2}$, $N$ and $K$ tend to infinity with fixed ratios, where $L$, $N$ and $K$ denote the user signature length, the total number of users, and the number of active users, respectively. In this letter, we provide a precise analytical characterization of the phase transition based on a spectral universality conjecture. 
Numerical results demonstrate excellent agreement between our theoretical predictions and the empirical phase transitions.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"31 ","pages":"3184-3188"},"PeriodicalIF":3.2,"publicationDate":"2024-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142671996","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gradient-Level Differential Privacy Against Attribute Inference Attack for Speech Emotion Recognition","authors":"Haijiao Chen;Huan Zhao;Zixing Zhang","doi":"10.1109/LSP.2024.3490379","DOIUrl":"https://doi.org/10.1109/LSP.2024.3490379","url":null,"abstract":"The Federated Learning (FL) paradigm for distributed privacy preservation is valued for its ability to collaboratively train Speech Emotion Recognition (SER) models while keeping data localized. However, recent studies reveal privacy leakage in the model sharing process. Existing differential privacy schemes face increasing inference attack risks as clients expose more model updates. To address these challenges, we propose a Gradient-level Hierarchical Differential Privacy (GHDP) strategy to mitigate attribute inference attacks. GHDP employs normalization to distinguish gradient importance, clipping significant gradients and filtering out sensitive information that may lead to privacy leaks. Additionally, increased random perturbations are applied to early model layers during backpropagation, achieving hierarchical differential privacy through layered noise addition. This theoretically grounded approach offers enhanced protection for critical information. 
Our experiments show that GHDP maintains stable SER performance while providing robust privacy protection, unaffected by the number of model updates.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"31 ","pages":"3124-3128"},"PeriodicalIF":3.2,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142671123","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}