A $\mathcal{K}$-Divergence Based Approach for Robust Regression Analysis
Authors: Yair Sorek; Koby Todros
Journal: IEEE Transactions on Signal Processing, vol. 73, pp. 3253-3269 (published 2025-07-10)
DOI: 10.1109/TSP.2025.3585830
URL: https://ieeexplore.ieee.org/document/11075545/
Journal metrics: impact factor 5.8; JCR Q1 (Engineering, Electrical & Electronic)
Citations: 0
Abstract
This paper deals with the problem of robust regression analysis in the presence of outliers in both input and output data sets. In this context, we infer the input-output relation of a system by minimizing a new robust loss between the outputs and a presumed parametric function of the inputs. The considered loss is derived from an empirical estimate of a non-trivially modified version of the recently developed $\mathcal{K}$-divergence (adapted here for regression analysis). This modified version utilizes a model-free data-weighting mechanism based on Parzen’s non-parametric $\mathcal{K}$ernel density estimator, associated with the underlying joint distribution of the input and output data. The considered Parzen’s estimator involves two strictly positive smoothing “$\mathcal{K}$”ernel functions. These are defined independently across the input and output domains with possibly different bandwidth parameters. This data-weighting strategy leads to mitigation of low-density contaminations attributed to different types of input and output outlying measurements. The considered approach is illustrated for robust training of GELU neural networks, with applications to function approximation and time-series prediction.
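The data-weighting idea in the abstract can be illustrated with a minimal sketch. It is not the paper's $\mathcal{K}$-divergence loss, only a simplified stand-in under stated assumptions: Gaussian kernels are used (the abstract requires only strictly positive smoothing kernels), the bandwidths `h_x` and `h_y` are arbitrary, and the function names `parzen_weights` and `weighted_linear_fit` are hypothetical. Each sample is weighted by a product-kernel Parzen estimate of the joint input-output density at that sample, so isolated (low-density) points in either domain contribute little to the fit.

```python
import numpy as np


def gaussian_kernel(sq_dist, bandwidth):
    """Strictly positive Gaussian kernel on squared distances (assumed choice)."""
    return np.exp(-0.5 * sq_dist / bandwidth**2)


def parzen_weights(X, Y, h_x, h_y):
    """Hypothetical helper: product-kernel Parzen density at each sample.

    X has shape (N, d_x), Y has shape (N, d_y). The kernels act independently
    on the input and output domains, with separate bandwidths h_x and h_y.
    Kernel normalizing constants are dropped because the weights are
    rescaled to sum to one.
    """
    dx = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    dy = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    density = (gaussian_kernel(dx, h_x) * gaussian_kernel(dy, h_y)).mean(axis=1)
    return density / density.sum()


def weighted_linear_fit(X, y, weights):
    """Weighted least squares for y ~ a*x + b (a stand-in parametric model)."""
    A = np.column_stack([X[:, 0], np.ones(len(X))])
    sw = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef  # array [slope, intercept]


# Synthetic data: y = 2x + noise, with one input outlier and one output outlier.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
Y = 2.0 * X + 0.1 * rng.normal(size=(200, 1))
X[0, 0] = 8.0   # input outlier
Y[1, 0] = 15.0  # output outlier

w = parzen_weights(X, Y, h_x=0.5, h_y=0.5)
slope, intercept = weighted_linear_fit(X, Y[:, 0], w)
```

In this sketch the weights are computed once from the data and then held fixed; how the actual loss in the paper combines the Parzen estimate with the parametric model is not specified in the abstract.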
About the journal:
The IEEE Transactions on Signal Processing covers novel theory, algorithms, performance analyses and applications of techniques for the processing, understanding, learning, retrieval, mining, and extraction of information from signals. The term “signal” includes, among others, audio, video, speech, image, communication, geophysical, sonar, radar, medical and musical signals. Examples of topics of interest include, but are not limited to, information processing and the theory and application of filtering, coding, transmitting, estimating, detecting, analyzing, recognizing, synthesizing, recording, and reproducing signals.