A cross-validation-based statistical theory for point processes

IF 2.8, Zone 2 (Mathematics), Q2 (BIOLOGY)

Biometrika Pub Date : 2023-06-27 DOI:10.1093/biomet/asad041

O. Cronie, M. Moradi, C. Biscio


Citations: 0

Abstract

Motivated by cross-validation’s general ability to reduce overfitting and mean square error, we develop a cross-validation-based statistical theory for general point processes. It is based on the combination of two novel concepts for general point processes: cross-validation and prediction errors. Our cross-validation approach uses thinning to split a point process/pattern into pairs of training and validation sets, while our prediction errors measure discrepancy between two point processes. The new statistical approach, which may be used to model different distributional characteristics, exploits the prediction errors to measure how well a given model predicts validation sets using associated training sets. Having indicated that our new framework generalizes many existing statistical approaches, we then establish different theoretical properties for it, including large sample properties. We further recognize that non-parametric intensity estimation is an instance of Papangelou conditional intensity estimation, which we exploit to apply our new statistical theory to kernel intensity estimation. Using independent thinning-based cross-validation, we numerically show that the new approach substantially outperforms the state of the art in bandwidth selection. Finally, we carry out intensity estimation for a dataset in forestry (Euclidean domain) and a dataset in neurology (linear network).
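The thinning-based split described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the choice of `p` as the probability of sending a point to the validation set, and the uniform example pattern are all assumptions made for illustration. Each point is assigned to the validation set independently with probability `p`, and the remaining points form the training set; repeating the split gives several training/validation pairs.

```python
import numpy as np


def thinning_split(points, p, rng):
    """Split a point pattern by independent thinning (illustrative helper).

    Each point goes to the validation set with probability p,
    independently of all other points; the rest form the training set.
    """
    points = np.asarray(points)
    to_validation = rng.random(len(points)) < p
    return points[~to_validation], points[to_validation]


def monte_carlo_folds(points, p, k, seed=0):
    """Generate k independent (training, validation) pairs."""
    rng = np.random.default_rng(seed)
    return [thinning_split(points, p, rng) for _ in range(k)]


# Hypothetical example pattern: 200 uniform points in the unit square.
pattern = np.random.default_rng(42).random((200, 2))
folds = monte_carlo_folds(pattern, p=0.2, k=5)

for train, val in folds:
    # Every point lands in exactly one of the two sets.
    print(len(train), len(val))
```

Because the thinning is independent, a validation set drawn this way has intensity equal to `p` times that of the original process, and the training set has intensity `1 - p` times the original. Any estimate fitted on a training set therefore has to be rescaled before it is compared with the corresponding validation set.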




Source journal: Biometrika (Biology)

CiteScore: 5.50

Self-citation rate: 3.70%

Articles published: (not listed)

Review time: 6-12 weeks

About the journal: Biometrika is primarily a journal of statistics in which emphasis is placed on papers containing original theoretical contributions of direct or potential value in applications. From time to time, papers in bordering fields are also published.