{"title":"SDAT: Sub-Dataset Alternation Training for Improved Image Demosaicing","authors":"Yuval Becker;Raz Z. Nossek;Tomer Peleg","doi":"10.1109/OJSP.2024.3395179","DOIUrl":"https://doi.org/10.1109/OJSP.2024.3395179","url":null,"abstract":"Image demosaicing is an important step in the image processing pipeline for digital cameras. In data-centric approaches, such as deep learning, the distribution of the dataset used for training can impose a bias on the networks' outcome. For example, in natural images most patches are smooth, and high-content patches are much rarer. This can lead to a bias in the performance of demosaicing algorithms. Most deep learning approaches address this challenge by utilizing specific losses or designing special network architectures. We propose a novel approach, SDAT (Sub-Dataset Alternation Training), that tackles the problem from a training protocol perspective. SDAT comprises two essential phases. In the initial phase, we employ a method to create sub-datasets from the entire dataset, each inducing a distinct bias. The subsequent phase involves an alternating training process, which uses the derived sub-datasets in addition to training on the entire dataset. SDAT can be applied regardless of the chosen architecture, as demonstrated by various experiments we conducted for the demosaicing task. The experiments are performed across a range of architecture sizes and types, namely CNNs and transformers. We show improved performance in all cases. 
We are also able to achieve state-of-the-art results on three highly popular image demosaicing benchmarks.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"5 ","pages":"611-620"},"PeriodicalIF":0.0,"publicationDate":"2024-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10510581","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140919094","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Denoiser-Based Projections for 2D Super-Resolution MRA","authors":"Jonathan Shani;Tom Tirer;Raja Giryes;Tamir Bendory","doi":"10.1109/OJSP.2024.3394369","DOIUrl":"https://doi.org/10.1109/OJSP.2024.3394369","url":null,"abstract":"We study the 2D super-resolution multi-reference alignment (SR-MRA) problem: estimating an image from its down-sampled, circularly translated, and noisy copies. The SR-MRA problem serves as a mathematical abstraction of the structure determination problem for biological molecules. Since the SR-MRA problem is ill-posed without prior knowledge, accurate image estimation relies on designing priors that describe the statistics of the images of interest. In this work, we build on recent advances in image processing and harness the power of denoisers as priors for images. To estimate an image, we propose utilizing denoisers as projections and using them within two computational frameworks that we propose: projected expectation-maximization and projected method of moments. We provide an efficient GPU implementation and demonstrate the effectiveness of these algorithms through extensive numerical experiments on a wide range of parameters and images.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"5 ","pages":"621-629"},"PeriodicalIF":0.0,"publicationDate":"2024-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10508939","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141078833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluation of Spectral Estimation Parameters for Direct Sampling FFT-Based Measuring Receivers","authors":"Marc García-Bermúdez;Jordi Solé-Lloveras;Martin Hudlička;Marco A. Azpúrua","doi":"10.1109/OJSP.2024.3389825","DOIUrl":"https://doi.org/10.1109/OJSP.2024.3389825","url":null,"abstract":"The standard CISPR 16-1-1 defines the measuring receiver using a black-box approach and sets requirements for its accuracy and spectral properties. Traditionally, such test receivers were developed using a superheterodyne architecture. Recently, time-domain electromagnetic emission measurement systems have been built employing direct sampling instruments, mainly oscilloscopes, and relying on specific signal processing to emulate the performance of compliant instruments. In these cases, the short-time Fourier transform is used for spectral estimation, but the corresponding electromagnetic compatibility standards lack details for its correct use with respect to parameters such as windowing function, overlapping factor, and frequency interpolation. Moreover, it is unclear which combination of spectral estimation parameters is best fit for this purpose. Obtaining reliable, consistent and low uncertainty spectral estimates of electromagnetic emissions measured in time-domain needs appropriate configuration and tuning of the signal processing algorithms. This paper investigates the error in the calculated spectrum for various reference signals: multitone, chirp pulses and rectangular pulses. The analysis is carried out for each CISPR band from A to D, that is, between 9 kHz and 1 GHz. 
After $489.6\\times 10^{3}$ iterations, distributed over 1700 different digital implementations of the CISPR 16-1-1 measuring receiver, the simulation outcomes point to certain sets of parameters that showed satisfactory performance overall: the Nuttall, Kaiser, and Parzen windows with more than 75% overlap and an interpolation factor higher than 5 are generally suitable. Calibration results are used to experimentally verify that a valid set of parameters is adequate to fulfil CISPR 16-1-1 requirements.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"5 ","pages":"588-598"},"PeriodicalIF":0.0,"publicationDate":"2024-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10502171","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140818796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sparse Index Tracking: Simultaneous Asset Selection and Capital Allocation via ℓ0-Constrained Portfolio","authors":"Eisuke Yamagata;Shunsuke Ono","doi":"10.1109/OJSP.2024.3389810","DOIUrl":"https://doi.org/10.1109/OJSP.2024.3389810","url":null,"abstract":"Sparse index tracking is a prominent passive portfolio management strategy that constructs a sparse portfolio to track a financial index. A sparse portfolio is preferable to a full portfolio in terms of reducing transaction costs and avoiding illiquid assets. To achieve portfolio sparsity, conventional studies have utilized $\\ell_{p}$-norm regularizations as a continuous surrogate of the $\\ell_{0}$-norm regularization. Although these formulations can construct sparse portfolios, their practical application is challenging due to the intricate and time-consuming process of tuning parameters to define the precise upper limit of assets in the portfolio. In this paper, we propose a new problem formulation of sparse index tracking using an $\\ell_{0}$-norm constraint that enables easy control of the upper bound on the number of assets in the portfolio. Moreover, our approach offers a choice between constraints on portfolio and turnover sparsity, further reducing transaction costs by limiting asset updates at each rebalancing interval. Furthermore, we develop an efficient algorithm for solving this problem based on a primal-dual splitting method. 
Finally, we illustrate the effectiveness of the proposed method through experiments on the S&P500 and Russell3000 index datasets.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"5 ","pages":"810-819"},"PeriodicalIF":2.9,"publicationDate":"2024-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10502015","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141447986","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MMSFormer: Multimodal Transformer for Material and Semantic Segmentation","authors":"Md Kaykobad Reza;Ashley Prater-Bennette;M. Salman Asif","doi":"10.1109/OJSP.2024.3389812","DOIUrl":"https://doi.org/10.1109/OJSP.2024.3389812","url":null,"abstract":"Leveraging information across diverse modalities is known to enhance performance on multimodal segmentation tasks. However, effectively fusing information from different modalities remains challenging due to the unique characteristics of each modality. In this paper, we propose a novel fusion strategy that can effectively fuse information from different modality combinations. We also propose a new model named Multi-Modal Segmentation TransFormer (MMSFormer) that incorporates the proposed fusion strategy to perform multimodal material and semantic segmentation tasks. MMSFormer outperforms current state-of-the-art models on three different datasets. As we begin with only one input modality, performance improves progressively as additional modalities are incorporated, showcasing the effectiveness of the fusion block in combining useful information from diverse input modalities. Ablation studies show that different modules in the fusion block are crucial for overall model performance. 
Furthermore, our ablation studies also highlight the capacity of different input modalities to improve performance in the identification of different types of materials.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"5 ","pages":"599-610"},"PeriodicalIF":0.0,"publicationDate":"2024-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10502124","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140822028","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Contextual Multi-Armed Bandit With Costly Feature Observation in Non-Stationary Environments","authors":"Saeed Ghoorchian;Evgenii Kortukov;Setareh Maghsudi","doi":"10.1109/OJSP.2024.3389809","DOIUrl":"https://doi.org/10.1109/OJSP.2024.3389809","url":null,"abstract":"Maximizing long-term rewards is the primary goal in sequential decision-making problems. The majority of existing methods assume that side information is freely available, enabling the learning agent to observe all features' states before making a decision. In real-world problems, however, collecting beneficial information is often costly. That implies that, besides individual arms' reward, learning the observations of the features' states is essential to improve the decision-making strategy. The problem is aggravated in a non-stationary environment where reward and cost distributions undergo abrupt changes over time. To address the aforementioned dual learning problem, we extend the contextual bandit setting and allow the agent to observe subsets of features' states. The objective is to maximize the long-term average gain, which is the difference between the accumulated rewards and the paid costs on average. Therefore, the agent faces a trade-off between minimizing the cost of information acquisition and possibly improving the decision-making process using the obtained information. To this end, we develop an algorithm that guarantees a sublinear regret in time. 
Numerical results demonstrate the superiority of our proposed policy in a real-world scenario.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"5 ","pages":"820-830"},"PeriodicalIF":2.9,"publicationDate":"2024-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10502231","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141447972","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adversarial Training of Denoising Diffusion Model Using Dual Discriminators for High-Fidelity Multi-Speaker TTS","authors":"Myeongjin Ko;Euiyeon Kim;Yong-Hoon Choi","doi":"10.1109/OJSP.2024.3386495","DOIUrl":"https://doi.org/10.1109/OJSP.2024.3386495","url":null,"abstract":"The diffusion model is capable of generating high-quality data through a probabilistic approach. However, it suffers from the drawback of slow generation speed due to its requirement for many time steps. To address this limitation, recent models such as denoising diffusion implicit models (DDIM) focus on sample generation without explicitly modeling the entire probability distribution, while models like denoising diffusion generative adversarial networks (GAN) combine diffusion processes with GANs. In the field of speech synthesis, a recent diffusion speech synthesis model called DiffGAN-TTS, which utilizes the structure of GANs, has been introduced and demonstrates superior performance in both speech quality and generation speed. In this paper, to further enhance the performance of DiffGAN-TTS, we propose a speech synthesis model with two discriminators: a diffusion discriminator to learn the distribution of the reverse process, and a spectrogram discriminator to learn the distribution of the generated data. Objective metrics such as the structural similarity index measure (SSIM), mel-cepstral distortion (MCD), F0 root mean squared error (F0-RMSE), phoneme error rate (PER), word error rate (WER), as well as subjective metrics like mean opinion score (MOS), are used to evaluate the performance of the proposed model. The evaluation results demonstrate that our model matches or exceeds recent state-of-the-art models like FastSpeech 2 and DiffGAN-TTS across various metrics. 
Our code and audio samples are available on GitHub.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"5 ","pages":"577-587"},"PeriodicalIF":0.0,"publicationDate":"2024-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10494889","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140647820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Non-Stationary Linear Bandits With Dimensionality Reduction for Large-Scale Recommender Systems","authors":"Saeed Ghoorchian;Evgenii Kortukov;Setareh Maghsudi","doi":"10.1109/OJSP.2024.3386490","DOIUrl":"https://doi.org/10.1109/OJSP.2024.3386490","url":null,"abstract":"Taking advantage of contextual information can potentially boost the performance of recommender systems. In the era of Big Data, such side information often has several dimensions. Thus, developing decision-making algorithms to cope with such a high-dimensional context in real time is essential. That is specifically challenging when the decision-maker has a variety of items to recommend. In addition, changes in items' popularity or users' preferences can hinder the performance of the deployed recommender system due to a lack of robustness to distribution shifts in the environment. In this paper, we build upon the linear contextual multi-armed bandit framework to address this problem. We develop a decision-making policy for a linear bandit problem with high-dimensional feature vectors, a large set of arms, and non-stationary reward-generating processes. Our Thompson sampling-based policy reduces the dimension of feature vectors using random projection and uses exponentially increasing weights to decrease the influence of past observations with time. Our proposed recommender system employs this policy to learn the users' item preferences online while minimizing runtime. We prove a regret bound that scales as a factor of the reduced dimension instead of the original one. To evaluate our proposed recommender system numerically, we apply it to three real-world datasets. 
The theoretical and numerical results demonstrate the effectiveness of our proposed algorithm in making a trade-off between computational complexity and regret performance compared to the state-of-the-art.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"5 ","pages":"548-558"},"PeriodicalIF":0.0,"publicationDate":"2024-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10494875","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140633561","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Body Motion Segmentation via Multilayer Graph Processing for Wearable Sensor Signals","authors":"Qinwen Deng;Songyang Zhang;Zhi Ding","doi":"10.1109/OJSP.2024.3407662","DOIUrl":"https://doi.org/10.1109/OJSP.2024.3407662","url":null,"abstract":"Human body motion segmentation plays a major role in many applications, ranging from computer vision to robotics. Among a variety of algorithms, graph-based approaches have demonstrated exciting potential in motion analysis owing to their power to capture the underlying correlations among joints. However, most existing works focus on simpler single-layer geometric structures, whereas multi-layer spatial-temporal graph structure can provide more informative results. To provide an interpretable analysis on multilayer spatial-temporal structures, we revisit the emerging field of multilayer graph signal processing (M-GSP), and propose novel approaches based on M-GSP to human motion segmentation. Specifically, we model the spatial-temporal relationships via multilayer graphs (MLG) and introduce M-GSP spectrum analysis for feature extraction. We present two different M-GSP based algorithms for unsupervised segmentation in the MLG spectrum and vertex domains, respectively. Our experimental results demonstrate the robustness and effectiveness of our proposed methods.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"5 ","pages":"934-947"},"PeriodicalIF":2.9,"publicationDate":"2024-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10542374","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141993917","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptable L4S Congestion Control for Cloud-Based Real-Time Streaming Over 5G","authors":"Jangwoo Son;Yago Sanchez;Cornelius Hellge;Thomas Schierl","doi":"10.1109/OJSP.2024.3405719","DOIUrl":"https://doi.org/10.1109/OJSP.2024.3405719","url":null,"abstract":"Achieving reliable low-latency streaming on real-time immersive services that require seamless interaction has been of increasing importance recently. To cope with such an immersive service requirement, IETF and 3GPP defined Low Latency, Low Loss, and Scalable Throughput (L4S) architecture and terminologies to enable delay-critical applications to achieve low congestion and scalable bitrate control over 5G. With low-latency applications in mind, this paper presents a cloud-based streaming system using WebRTC for real-time communication with an adaptable L4S congestion control (aL4S-CC). aL4S-CC is designed to prevent the target service from surpassing a required end-to-end latency. It is evaluated against existing congestion controls GCC and ScreamV2 across two configurations: 1) standard L4S (sL4S) which has no knowledge of Explicit Congestion Notification (ECN) marking scheme information; 2) conscious L4S (cL4S) which recognizes the ECN marking scheme information. The results show that aL4S-CC achieves high link utilization with low latency while maintaining good performance in terms of fairness, and cL4S improves sL4S's performance by having an efficient trade-off between link utilization and latency. 
In the entire simulation, the gain of link utilization on cL4S is 1.4%, 4%, and 17.9% on average compared to sL4S, GCC, and ScreamV2, respectively, and the ratio of duration exceeding the target queuing delay achieves the lowest values of 1% and 0.9% for cL4S and sL4S, respectively.","PeriodicalId":73300,"journal":{"name":"IEEE open journal of signal processing","volume":"5 ","pages":"841-849"},"PeriodicalIF":2.9,"publicationDate":"2024-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10539241","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141543935","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}