Generalized kernel density estimation with limited data contamination
Author: Jerome Krief
Journal: Journal of Computational and Applied Mathematics, volume 474, Article 116937
Publication date: 2025-07-25
Publication type: Journal Article
DOI: 10.1016/j.cam.2025.116937
URL: https://www.sciencedirect.com/science/article/pii/S0377042725004510
Impact factor: 2.6 · JCR: Q1 (Mathematics, Applied) · Open access: no
Citations: 0
Abstract
This paper treats the deconvolution model $Y = X + (1 - S)U$, where $X$ has Lebesgue density $f_X$, $U$ has a known distribution, and $S$ has a known Bernoulli distribution with $P[S=1] \in (0,1)$. The aim is to estimate $f_X$ using observations from $Y$. Unlike the classic deconvolution model where $P[S=1] = 0$ (Fan 1991, Annals of Statistics), this estimation problem is well-posed, which substantially reduces its difficulty. Existing estimators require the characteristic function of $U$ to be real-valued or else $P[S=1] > 1/2$, but the implementation in that case requires selecting three tuning parameters, which is not very appealing in applied work. I present an easily implementable nonparametric methodology which removes these restrictions concerning the distribution of $(X, U, S)$. If $U$ is noisy in a certain sense or else if $P[S=1] > 1/2$, then a target density belonging to the classic Hölder class can be identified as the solution of a well-posed Fredholm integral equation of the second kind. The proposed estimator has a Mean Integrated Square Error converging at a rate which is equal to the optimal nonparametric rate without data contamination (i.e. if $P[S=1] = 1$). Moreover, if the distribution of $S$ is unknown, then a feasible estimator is proposed assuming that either the first moment or the second moment of $X$ is known. The feasible estimator has an Integrated Square Error displaying the same speed of convergence in probability. A Monte Carlo experiment reveals good finite-sample properties for the proposed estimators when the distribution of $U$ is supersmooth or skewed.
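To make the "Fredholm integral equation of the second kind" concrete, here is a minimal numerical sketch. It is not the paper's estimator. It assumes (the abstract does not state this) that $X$, $U$ and $S$ are mutually independent and that $U$ has a density $f_U$. Writing $p = P[S=1]$, the density of $Y$ is then $f_Y = p f_X + (1-p)(f_X * f_U)$, which rearranges to $f_X = f_Y/p - \frac{1-p}{p}(f_X * f_U)$. When $p > 1/2$ the factor $(1-p)/p$ is below 1, and convolution with a probability density does not increase the $L^1$ norm, so a plain fixed-point iteration contracts. The function names, the grid and the Gaussian choices for $X$ and $U$ are illustrative assumptions, and the exact $f_Y$ is used where a real application would plug in a kernel estimate from data.

```python
import numpy as np


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated on the array x."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def solve_second_kind(f_y, f_u, p, dx, n_iter=60):
    """Fixed-point iteration for f = f_y / p - ((1 - p) / p) * (f conv f_u).

    Hypothetical helper for illustration only. All densities are sampled on
    the same symmetric, evenly spaced grid with an odd number of points.
    """
    if not 0.5 < p < 1.0:
        raise ValueError("this sketch needs 1/2 < p < 1 so the iteration contracts")
    f = f_y / p  # starting guess: ignore the contaminated part
    for _ in range(n_iter):
        # Riemann-sum approximation of the convolution (f * f_u)(x_i)
        conv = np.convolve(f, f_u, mode="same") * dx
        f = f_y / p - (1.0 - p) / p * conv
    return f


# Symmetric grid with an odd number of points, so the middle index sits at 0
# (up to rounding) and mode="same" lines up with the grid.
x = np.linspace(-12.0, 12.0, 1201)
dx = x[1] - x[0]

p = 0.7                # assumed P[S = 1]
mu_u, sd_u = 1.0, 0.5  # assumed U ~ N(1, 0.5^2): its characteristic function is not real-valued

f_x_true = normal_pdf(x, 0.0, 1.0)  # assumed X ~ N(0, 1)
f_u = normal_pdf(x, mu_u, sd_u)
# Exact density of Y: with probability p no noise, otherwise X + U ~ N(mu_u, 1 + sd_u^2)
f_y = p * f_x_true + (1.0 - p) * normal_pdf(x, mu_u, np.sqrt(1.0 + sd_u**2))

f_hat = solve_second_kind(f_y, f_u, p, dx)
max_err = float(np.max(np.abs(f_hat - f_x_true)))
print(f"max abs error on the grid: {max_err:.2e}")
```

The error contracts by roughly $(1-p)/p$ per iteration, so a few dozen iterations suffice for $p = 0.7$. The sketch deliberately refuses $p \le 1/2$, where this simple iteration is no longer guaranteed to converge. The abstract indicates that the paper covers that case under a separate condition on $U$.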
About the journal:
The Journal of Computational and Applied Mathematics publishes original papers of high scientific value in all areas of computational and applied mathematics. The main interest of the Journal is in papers that describe and analyze new computational techniques for solving scientific or engineering problems. Also the improved analysis, including the effectiveness and applicability, of existing methods and algorithms is of importance. The computational efficiency (e.g. the convergence, stability, accuracy, ...) should be proved and illustrated by nontrivial numerical examples. Papers describing only variants of existing methods, without adding significant new computational properties are not of interest.
The audience consists of: applied mathematicians, numerical analysts, computational scientists and engineers.