Johanna de Haan-Ward, Douglas G. Woolford, Simon J. Bonner
Canadian Journal of Statistics-Revue Canadienne De Statistique, volume 53, issue 3. Published 2025-04-22. DOI: 10.1002/cjs.70008. https://onlinelibrary.wiley.com/doi/10.1002/cjs.70008
Predicting rare events using training data from stratified sampling designs, with application to human-caused wildfire prediction
Response-based sampling is often used in modelling rare events from large, imbalanced data for efficiency. When modelling the event with logistic regression, the sampling design may be adjusted for using sampling weights or an offset. We propose a stratified sampling design for modelling rare events with large data which improves on previous methods by providing unbiased estimates of the standard errors of the coefficients in a multiple logistic regression scenario. We use multiple intercepts to model the incidence in the sampled data, then adjust each intercept via a stratum-specific offset. Our simulations provide no evidence of bias in the estimated logistic regression coefficients or their standard errors. We apply this method to spatio-temporal, fine-scale human-caused fire occurrence modelling for a region in northwestern Ontario, Canada, illustrating how the stratified sampling approach results in more locally precise estimates of fire occurrence.
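The offset idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a simplified design in which every event is kept and non-events in each stratum are kept with a known rate, so that the sampled-data log-odds exceed the population log-odds by -log(rate). The stratum names, counts, sampling rates, and the helpers `corrected_intercepts` and `probability` are all hypothetical.

```python
import math


def corrected_intercepts(sample_intercepts, nonevent_sampling_rates):
    """Shift each stratum's sampled-data intercept back to the population scale.

    Assumption: all events are kept, and non-events in stratum s are kept with
    probability nonevent_sampling_rates[s]. Under that design the sampled
    log-odds are inflated by -log(rate), so log(rate) is added back.
    """
    return {
        stratum: intercept + math.log(nonevent_sampling_rates[stratum])
        for stratum, intercept in sample_intercepts.items()
    }


def probability(intercept, linear_predictor=0.0):
    """Inverse logit of intercept plus any covariate contribution."""
    return 1.0 / (1.0 + math.exp(-(intercept + linear_predictor)))


# Hypothetical population counts per stratum: (events, non-events).
population = {"north": (50, 100_000), "south": (200, 50_000)}
# Hypothetical fraction of non-events retained in each stratum.
rates = {"north": 0.01, "south": 0.05}

# Intercept-only fit on the expected sample: log(events / sampled non-events).
sample_intercepts = {
    stratum: math.log(events / (nonevents * rates[stratum]))
    for stratum, (events, nonevents) in population.items()
}

adjusted = corrected_intercepts(sample_intercepts, rates)

for stratum, (events, nonevents) in population.items():
    recovered = probability(adjusted[stratum])
    true_rate = events / (events + nonevents)
    print(stratum, recovered, true_rate)
```

In this toy setting the adjusted intercept of each stratum reproduces that stratum's population event rate. With covariates, the same shift would normally be supplied as an offset term when fitting the logistic regression rather than applied afterwards.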
About the journal:
The Canadian Journal of Statistics is the official journal of the Statistical Society of Canada. It has an international reputation as an excellent journal. The editorial board comprises statistical scientists with applied, computational, methodological, theoretical and probabilistic interests. Their role is to ensure that the journal continues to provide an international forum for the discipline of Statistics.
The journal seeks papers making broad points of interest to many readers, whereas papers making important points of more specific interest are better placed in more specialized journals. The levels of innovation and impact are key in the evaluation of submitted manuscripts.