Generalized kernel density estimation with limited data contamination
Jerome Krief
Journal of Computational and Applied Mathematics, Volume 474, Article 116937 (published 2025-07-25)
DOI: 10.1016/j.cam.2025.116937
https://www.sciencedirect.com/science/article/pii/S0377042725004510
Citations: 0
Abstract
This paper treats the deconvolution model $Y = X + (1 - S)U$, where $X$ has Lebesgue density $f_X$, $U$ has a known distribution, and $S$ has a known Bernoulli distribution with $P[S = 1] \in (0, 1)$. The aim is to estimate $f_X$ using observations of $Y$. Unlike the classic deconvolution model, where $P[S = 1] = 0$ (Fan 1991, Annals of Statistics), this estimation problem is well-posed, which substantially reduces its difficulty. Existing estimators require either the characteristic function of $U$ to be real-valued or $P[S = 1] > 1/2$, and in the latter case their implementation requires selecting three tuning parameters, which is not very appealing in applied work. I present an easily implementable nonparametric methodology that removes these restrictions on the distribution of $(X, U, S)$.

If $U$ is noisy in a certain sense, or if $P[S = 1] > 1/2$, then a target density belonging to the classic Hölder class can be identified as the solution of a well-posed Fredholm integral equation of the second kind. The proposed estimator has a Mean Integrated Squared Error converging at a rate equal to the optimal nonparametric rate without data contamination (i.e. when $P[S = 1] = 1$). Moreover, if the distribution of $S$ is unknown, a feasible estimator is proposed under the assumption that either the first or the second moment of $X$ is known. The feasible estimator has an Integrated Squared Error converging in probability at the same rate. A Monte Carlo experiment reveals good finite-sample properties for the proposed estimators when the distribution of $U$ is supersmooth or skewed.
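The abstract does not spell out the estimator, but the well-posedness claim can be illustrated with a short sketch. Assume (an assumption made here, not stated in the abstract) that $X$, $U$ and $S$ are mutually independent, and write $p = P[S = 1]$. Then $f_Y(y) = p f_X(y) + (1 - p)\int f_U(y - x) f_X(x)\,dx$, which rearranges into a Fredholm equation of the second kind:

$$f_X(y) = \frac{f_Y(y)}{p} - \frac{1 - p}{p}\int f_U(y - x)\, f_X(x)\,dx.$$

Convolution with a probability density has operator norm at most 1 on $L^1$, so for $p > 1/2$ the factor $(1 - p)/p$ is below 1 and the Neumann series (equivalently, fixed-point iteration) converges. The Python sketch below plugs a Gaussian kernel estimate of $f_Y$ into this equation and iterates it on a grid. It is not the paper's estimator: the design ($X \sim N(0, 1)$, $U \sim$ Exponential(1), $p = 0.7$), Silverman's bandwidth, the grid, and the helper names `simulate_y`, `estimate_p`, `kde` and `solve_second_kind` are all hypothetical choices for illustration. Exponential(1) noise is skewed and has the complex-valued characteristic function $1/(1 - it)$. Treating the law of $S$ as unknown, the sketch estimates $p$ from $E[Y] = E[X] + (1 - p)E[U]$ with $E[X]$ known, a simple stand-in for the first-moment case mentioned in the abstract and not necessarily the paper's construction.

```python
import math
import random


def simulate_y(n, p, rng):
    """Draw n observations of Y = X + (1 - S) U under a hypothetical design.

    Assumed for illustration: X ~ N(0, 1), U ~ Exponential(1) (skewed, E[U] = 1),
    S ~ Bernoulli(p), with X, U, S mutually independent.
    """
    ys = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        s = 1 if rng.random() < p else 0
        u = rng.expovariate(1.0)
        ys.append(x + (1 - s) * u)
    return ys


def estimate_p(ys, mean_x, mean_u):
    """Moment estimate of p = P[S = 1] from E[Y] = E[X] + (1 - p) E[U]; needs E[U] != 0."""
    mean_y = sum(ys) / len(ys)
    return 1.0 - (mean_y - mean_x) / mean_u


def kde(ys, grid, h):
    """Gaussian-kernel density estimate of f_Y evaluated at each grid point."""
    c = 1.0 / (len(ys) * h * math.sqrt(2.0 * math.pi))
    return [c * sum(math.exp(-0.5 * ((t - y) / h) ** 2) for y in ys) for t in grid]


def solve_second_kind(g, grid, cdf_u, p, n_iter=50):
    """Fixed-point (Neumann series) solution of f = g / p - ((1 - p) / p) K f.

    K f(y) = int f_U(y - x) f(x) dx is discretized on the uniform grid with cell
    probabilities P[U in (y_i - x_j - dx/2, y_i - x_j + dx/2]], so every column
    of K sums to at most 1 and the map contracts in discrete L1 when p > 1/2.
    """
    dx = grid[1] - grid[0]
    lam = (1.0 - p) / p
    kmat = [[cdf_u(yi - xj + dx / 2) - cdf_u(yi - xj - dx / 2) for xj in grid]
            for yi in grid]
    f = [gi / p for gi in g]  # first Neumann term
    for _ in range(n_iter):
        kf = [sum(kij * fj for kij, fj in zip(row, f)) for row in kmat]
        f = [gi / p - lam * v for gi, v in zip(g, kf)]
    return f


def std_normal_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def exp_cdf(t):
    """CDF of Exponential(1); zero for t <= 0."""
    return 1.0 - math.exp(-t) if t > 0 else 0.0


rng = random.Random(0)
p_true = 0.7
ys = simulate_y(3000, p_true, rng)
p_hat = estimate_p(ys, mean_x=0.0, mean_u=1.0)  # E[X] = 0 treated as known

n = len(ys)
mean_y = sum(ys) / n
sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
h = 1.06 * sd_y * n ** (-0.2)  # Silverman's rule of thumb, applied to Y

grid = [-5.0 + 0.1 * k for k in range(111)]  # -5.0, -4.9, ..., 6.0
f_hat = solve_second_kind(kde(ys, grid, h), grid, exp_cdf, p_hat)

dx = grid[1] - grid[0]
ise = sum((fh - std_normal_pdf(t)) ** 2 for fh, t in zip(f_hat, grid)) * dx
mass = sum(f_hat) * dx
print(f"p_hat = {p_hat:.3f}, ISE = {ise:.5f}, mass = {mass:.3f}")
```

Running the script prints the estimated $p$, the integrated squared error of the estimate against the true $N(0, 1)$ density, and the estimate's total mass on the grid. The cell-probability discretization keeps each column of the discretized operator summing to at most 1, so the iteration stays a contraction whenever the estimated $p$ exceeds 1/2; the abstract's other condition (a $U$ that is noisy in a certain sense, with $P[S = 1] \le 1/2$) is not covered by this sketch.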
About the journal:
The Journal of Computational and Applied Mathematics publishes original papers of high scientific value in all areas of computational and applied mathematics. The main interest of the Journal is in papers that describe and analyze new computational techniques for solving scientific or engineering problems. Improved analysis of existing methods and algorithms, including their effectiveness and applicability, is also of importance. The computational efficiency (e.g. convergence, stability, accuracy) should be proved and illustrated by nontrivial numerical examples. Papers describing only variants of existing methods, without adding significant new computational properties, are not of interest.
The audience consists of: applied mathematicians, numerical analysts, computational scientists and engineers.