{"title":"On Conditional Independence Graph Learning From Multi-Attribute Gaussian Dependent Time Series","authors":"Jitendra K. Tugnait","doi":"10.1109/OJSP.2025.3578807","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3578807","url":null,"abstract":"Estimation of the conditional independence graph (CIG) of high-dimensional multivariate Gaussian time series from multi-attribute data is considered. Existing methods for graph estimation for such data are based on single-attribute models where one associates a scalar time series with each node. In multi-attribute graphical models, each node represents a random vector or vector time series. In this paper we provide a unified theoretical analysis of multi-attribute graph learning for dependent time series using a penalized log-likelihood objective function formulated in the frequency domain using the discrete Fourier transform of the time-domain data. We consider both convex (sparse-group lasso) and non-convex (log-sum and SCAD group penalties) penalty/regularization functions. We establish sufficient conditions in a high-dimensional setting for consistency (convergence of the inverse power spectral density to the true value in the Frobenius norm), local convexity when using non-convex penalties, and graph recovery. We do not impose any incoherence or irrepresentability condition for our convergence results. 
We also empirically investigate selection of the tuning parameters based on the Bayesian information criterion, and illustrate our approach using numerical examples utilizing both synthetic and real data.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"705-721"},"PeriodicalIF":2.9,"publicationDate":"2025-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11030300","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144481841","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Random Matrix Theory Predictions of Dominant Mode Rejection SINR Loss due to Signal in the Training Data","authors":"Christopher C. Hulbert;Kathleen E. Wage","doi":"10.1109/OJSP.2025.3578812","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3578812","url":null,"abstract":"Detection and estimation performance depends on signal-to-interference-plus-noise ratio (SINR) at the output of an array. The Capon beamformer (BF) designed with ensemble statistics achieves the optimum SINR in stationary environments. Adaptive BFs compute their weights using the sample covariance matrix (SCM) obtained from snapshots, i.e., training samples. SINR loss, the ratio of adaptive to optimal SINR, quantifies the number of snapshots required to achieve a desired average level of performance. For adaptive Capon BFs that invert the full SCM, Reed et al. derived the SINR loss distribution and Miller quantified how the desired signal’s presence in the snapshots degrades that loss. Abraham and Owsley designed dominant mode rejection (DMR) for cases where the number of snapshots is less than or approximately equal to the number of sensors. DMR’s success in snapshot-starved passive sonar scenarios led to its application in other areas such as hyperspectral sensing and medical imaging. DMR forms a modified SCM as a weighted combination of the identity matrix and the dominant eigensubspace containing the loud interferers, thereby eliminating the inverse of the poorly estimated noise subspace. This work leverages recent random matrix theory (RMT) results to develop DMR performance predictions under the assumption that the desired signal is contained in the training data. Using white noise gain and interference suppression predictions, the paper derives a lower bound on DMR’s average SINR loss and confirms its accuracy using Monte Carlo simulations. 
Moreover, this paper creates a new eigensubspace leakage estimator applicable to broader RMT applications.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"735-752"},"PeriodicalIF":2.9,"publicationDate":"2025-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11030297","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144550496","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The First Cadenza Challenges: Using Machine Learning Competitions to Improve Music for Listeners With a Hearing Loss","authors":"Gerardo Roa-Dabike;Michael A. Akeroyd;Scott Bannister;Jon P. Barker;Trevor J. Cox;Bruno Fazenda;Jennifer Firth;Simone Graetzer;Alinka Greasley;Rebecca R. Vos;William M. Whitmer","doi":"10.1109/OJSP.2025.3578299","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3578299","url":null,"abstract":"Listening to music can be an issue for those with a hearing impairment, and hearing aids are not a universal solution. This paper details the first use of an open challenge methodology to improve the audio quality of music for those with hearing loss through machine learning. The first challenge (CAD1) had 9 participants. The second was a 2024 ICASSP grand challenge (ICASSP24), which attracted 17 entrants. The challenge tasks concerned demixing and remixing pop/rock music to allow a personalized rebalancing of the instruments in the mix, along with amplification to correct for raised hearing thresholds. The software baselines provided for entrants to build upon used two state-of-the-art demix algorithms: Hybrid Demucs and Open-Unmix. Objective evaluation used HAAQI, the Hearing-Aid Audio Quality Index. No entries improved on the best baseline in CAD1. It is suggested that this arose because demixing algorithms are relatively mature, and recent work has shown that access to large (private) datasets is needed to further improve performance. Learning from this, for ICASSP24 the scenario was made more difficult by using loudspeaker reproduction and specifying gains to be applied before remixing. This also made the scenario more useful for listening through hearing aids. Nine entrants scored better than the best ICASSP24 baseline. Most of the entrants used a refined version of Hybrid Demucs and NAL-R amplification. The highest scoring system combined the outputs of several demixing algorithms in an ensemble approach. 
These challenges are now open benchmarks for future research with freely available software and data.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"722-734"},"PeriodicalIF":2.9,"publicationDate":"2025-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11030066","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144536564","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AVCaps: An Audio-Visual Dataset With Modality-Specific Captions","authors":"Parthasaarathy Sudarsanam;Irene Martín-Morató;Aapo Hakala;Tuomas Virtanen","doi":"10.1109/OJSP.2025.3578296","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3578296","url":null,"abstract":"This paper introduces AVCaps, an audio-visual dataset that contains separate textual captions for the audio, visual, and audio-visual contents of video clips. The dataset contains 2061 video clips constituting a total of 28.8 hours. We provide up to 5 captions for the audio, visual, and audio-visual content of each clip, crowdsourced separately. Existing datasets focus on a single modality or do not provide modality-specific captions, limiting the study of how each modality contributes to overall comprehension in multimodal settings. Our dataset addresses this critical gap in multimodal research by offering a resource for studying how audio and visual content are captioned individually, as well as how audio-visual content is captioned in relation to these individual modalities. Crowdsourced audio-visual captions are prone to favor visual content over audio content. To avoid this we use large language models (LLMs) to generate three balanced audio-visual captions for each clip based on the crowdsourced captions. We present captioning and retrieval experiments to illustrate the effectiveness of modality-specific captions in evaluating model performance. Specifically, we show that the modality-specific captions allow us to quantitatively assess how well a model understands audio and visual information from a given video. Notably, we find that a model trained on the balanced LLM-generated audio-visual captions captures audio information more effectively compared to a model trained on crowdsourced audio-visual captions. 
This model achieves a 14% higher Sentence-BERT similarity on crowdsourced audio captions compared to a model trained on crowdsourced audio-visual captions, which are typically more biased towards visual information. We also discuss the possibilities in multimodal representation learning, question answering, developing new video captioning metrics, and generative AI that this dataset unlocks. The dataset is available publicly at Zenodo and Hugging Face.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"691-704"},"PeriodicalIF":2.9,"publicationDate":"2025-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11029114","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144511206","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unsupervised Action Anticipation Through Action Cluster Prediction","authors":"Jiuxu Chen;Nupur Thakur;Sachin Chhabra;Baoxin Li","doi":"10.1109/OJSP.2025.3578300","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3578300","url":null,"abstract":"Predicting near-future human actions in videos has become a focal point of research, driven by applications such as human-assisting robotics, collaborative AI services, and surveillance video analysis. However, the inherent challenge lies in deciphering the complex spatial-temporal dynamics of typical video feeds. While existing works excel in constrained settings with fine-grained action ground-truth labels, the general unavailability of such labeling at the frame level poses a significant hurdle. In this paper, we present an innovative solution to anticipate future human actions without relying on any form of supervision. Our approach involves generating pseudo-labels for video frames through the clustering of frame-wise visual features. These pseudo-labels are then input into a temporal sequence modeling module that learns to predict future actions in terms of pseudo-labels. Alongside the action anticipation method, we propose GreedyMapper, an evaluation scheme that provides a practical solution to the many-to-one mapping challenge, a task that existing mapping algorithms struggle to address. Through comprehensive experimentation conducted on demanding real-world cooking datasets, our unsupervised method demonstrates superior performance compared to weakly-supervised approaches by a significant margin on the 50Salads dataset. 
When applied to the Breakfast dataset, our approach yields strong performance compared to the baselines in an unsupervised setting and delivers competitive results to (weakly) supervised methods under a similar setting.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"641-650"},"PeriodicalIF":2.9,"publicationDate":"2025-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11029147","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144366940","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Federated Learning With Automated Dual-Level Hyperparameter Tuning","authors":"Rakib Ul Haque;Panagiotis Markopoulos","doi":"10.1109/OJSP.2025.3578273","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3578273","url":null,"abstract":"Federated Learning (FL) is a decentralized machine learning (ML) approach where multiple clients collaboratively train a shared model over several update rounds without exchanging local data. Similar to centralized learning, determining hyperparameters (HPs) like learning rate and batch size remains challenging yet critical for model performance. Current adaptive HP-tuning methods are often domain-specific and heavily influenced by initialization. Moreover, model accuracy often improves slowly, requiring many update rounds. This slow improvement is particularly problematic for FL, where each update round incurs high communication costs in addition to computation and energy costs. In this work, we introduce FLAUTO, the first method to perform dynamic HP-tuning simultaneously at both local (client) and global (server) levels. This dual-level adaptation directly addresses critical bottlenecks in FL, including slow convergence, client heterogeneity, and high communication costs, distinguishing it from existing approaches. FLAUTO leverages training loss and relative local model deviation as novel metrics, enabling robust and dynamic hyperparameter adjustments without reliance on initial guesses. By prioritizing high performance in early update rounds, FLAUTO significantly reduces communication and energy overhead—key challenges in FL deployments. 
Comprehensive experimental studies on image classification and object detection tasks demonstrate that FLAUTO consistently outperforms state-of-the-art methods, establishing its efficacy and broad applicability.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"795-802"},"PeriodicalIF":2.9,"publicationDate":"2025-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11029096","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144634874","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multidimensional Polynomial Phase Estimation","authors":"Heedong Do;Namyoon Lee;Angel Lozano","doi":"10.1109/OJSP.2025.3577503","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3577503","url":null,"abstract":"An estimation method is presented for polynomial phase signals, i.e., those adopting the form of a complex exponential whose phase is polynomial in its indices. Transcending the scope of existing techniques, the proposed estimator can handle an arbitrary number of dimensions and an arbitrary set of polynomial degrees along each dimension; the only requirement is that the number of observations per dimension exceeds the highest degree thereon. Embodied by a highly compact sequential algorithm, this estimator is efficient at high signal-to-noise ratios (SNRs), exhibiting a computational complexity that is strictly linear in the number of observations and at most quadratic in the number of polynomial terms. To reinforce the performance at low and medium SNRs, where any phase estimator is bound to be hampered by the inherent ambiguity caused by phase wrappings, suitable functionalities are incorporated and shown to be highly effective.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"651-681"},"PeriodicalIF":2.9,"publicationDate":"2025-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11027552","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144367013","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mask Optimization for Image Inpainting Using No-Reference Image Quality Assessment","authors":"Taiki Uchiyama;Mariko Isogawa","doi":"10.1109/OJSP.2025.3577089","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3577089","url":null,"abstract":"Image inpainting is a technique designed to remove unwanted regions from images and restore them. This technique is expected to be applied in various applications, including image editing, virtual reality (VR), mixed reality (MR), and augmented reality (AR). Typically, the inpainting process is based on missing regions predefined by user-applied masks. However, the specified areas may not always be ideal for inpainting, and the quality of the inpainting results varies depending on the annotated masked region. Therefore, this paper addresses the task of <b>generating masks that improve inpainting results</b>. To this end, we propose a method that utilizes No-Reference Image Quality Assessment (NR-IQA), which can score image quality without a reference image, to generate masked regions that maximize inpainting quality.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"856-864"},"PeriodicalIF":2.7,"publicationDate":"2025-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11025170","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144725309","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing Learning-Based Cross-Modality Prediction for Lossless Medical Imaging Compression","authors":"Daniel S. Nicolau;Lucas A. Thomaz;Luis M. N. Tavora;Sergio M. M. Faria","doi":"10.1109/OJSP.2025.3564830","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3564830","url":null,"abstract":"Multimodal medical imaging, which involves the simultaneous acquisition of different modalities, enhances diagnostic accuracy and provides comprehensive visualization of anatomy and physiology. However, this significantly increases data size, posing storage and transmission challenges. Standard image codecs fail to properly exploit cross-modality redundancies, limiting coding efficiency. In this paper, a novel approach is proposed to enhance the compression gain and to reduce the computational complexity of a lossless cross-modality coding scheme for multimodal image pairs. The scheme uses a deep learning-based approach with Image-to-Image translation based on a Generative Adversarial Network architecture to generate an estimated image of one modality from its cross-modal pair. Two different approaches for inter-modal prediction are considered: one using the original and the estimated images for the inter-prediction scheme and another considering a weighted sum of both images. Subsequently, a decider based on a Convolutional Neural Network is employed to estimate the best coding approach to be selected among the two alternatives, before the coding step. A novel loss function that considers the decision accuracy and the compression gain of the chosen prediction approach is applied to improve the decision-making task. The experimental results on PET-CT and PET-MRI datasets demonstrate that the proposed approach improves by 11.76% and 4.61% the compression efficiency when compared with the single modality intra-coding of the Versatile Video Coding. 
Additionally, this approach reduces the computational complexity by almost half in comparison to testing both schemes and selecting the more compression-efficient one.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"489-497"},"PeriodicalIF":2.9,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10978054","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143943910","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Content-Adaptive Inference for State-of-the-Art Learned Video Compression","authors":"Ahmet Bilican;M. Akın Yılmaz;A. Murat Tekalp","doi":"10.1109/OJSP.2025.3564817","DOIUrl":"https://doi.org/10.1109/OJSP.2025.3564817","url":null,"abstract":"While the BD-rate performance of recent learned video codec models in both low-delay and random-access modes exceeds that of the respective modes of traditional codecs on average over common benchmarks, the performance improvements for individual videos with complex/large motions are much smaller compared to scenes with simple motion. This is related to the inability of a learned encoder model to generalize to motion vector ranges that have not been seen in the training set, which causes loss of performance in both coding of flow fields as well as frame prediction and coding. As a remedy, we propose a generic (model-agnostic) framework to control the scale of motion vectors in a scene during inference (encoding) to approximately match the range of motion vectors in the test and training videos by adaptively downsampling frames. This results in down-scaled motion vectors enabling: i) better flow estimation and, hence, better frame prediction; and ii) more efficient flow compression. We show that the proposed framework for content-adaptive inference improves the BD-rate performance of the already state-of-the-art low-delay video codec DCVC-FM by up to 41% on individual videos without any model fine-tuning. 
We present ablation studies to show measures of motion and scene complexity can be used to predict the effectiveness of the proposed framework.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"6 ","pages":"498-506"},"PeriodicalIF":2.9,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10978087","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143943980","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}