Statistical Analysis and Data Mining: Latest Articles

Bayesian Posterior Interval Calibration to Improve the Interpretability of Observational Studies

IF 3.6, Zone 4 (Mathematics)

Statistical Analysis and Data Mining Pub Date : 2024-12-01 Epub Date: 2024-12-04 DOI: 10.1002/sam.11715

Jami J Mulgrave, David Madigan, George Hripcsak

Abstract: Observational healthcare data offer the potential to estimate causal effects of medical products on a large scale. However, the confidence intervals and p-values produced by observational studies only account for random error and fail to account for systematic error. As a consequence, operating characteristics such as confidence interval coverage and Type I error rates often deviate sharply from their nominal values and render interpretation impossible. While there is a longstanding awareness of systematic error in observational studies, analytic approaches to empirically account for systematic error are relatively new. Several authors have proposed approaches using negative controls (also known as "falsification hypotheses") and positive controls. The basic idea is to adjust confidence intervals and p-values in light of the bias (if any) detected in the analyses of the negative and positive controls. In this work, we propose a Bayesian statistical procedure for posterior interval calibration that uses negative and positive controls. We show that the posterior interval calibration procedure restores nominal characteristics, such as 95% coverage of the true effect size by the 95% posterior interval.

Volume 17, Issue 6. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12332498/pdf/

Citations: 0

Quantifying Epistemic Uncertainty in Binary Classification via Accuracy Gain

IF 1.3, Zone 4 (Mathematics)

Statistical Analysis and Data Mining Pub Date : 2024-09-18 DOI: 10.1002/sam.11709

Christopher Qian, Tyler Ganter, Joshua Michalenko, Feng Liang, Jason Adams

Citations: 0

A new logarithmic multiplicative distortion for correlation analysis

IF 1.3, Zone 4 (Mathematics)

Statistical Analysis and Data Mining Pub Date : 2024-08-23 DOI: 10.1002/sam.11708

Siming Deng, Jun Zhang

Citations: 0

Revisiting Winnow: A modified online feature selection algorithm for efficient binary classification

IF 1.3, Zone 4 (Mathematics)

Statistical Analysis and Data Mining Pub Date : 2024-07-30 DOI: 10.1002/sam.11707

Y. Narasimhulu, Pralhad Kolambkar, Venkaiah V. China

Abstract: Winnow is an efficient binary classification algorithm that effectively learns from data even in the presence of a large number of irrelevant attributes. It is specifically designed for online learning scenarios. Unlike the Perceptron algorithm, Winnow employs a multiplicative weight update function, which leads to fewer mistakes and faster convergence. However, the original Winnow algorithm has several limitations: it only works on binary data, and its weight updates are constant and do not depend on the input features. In this article, we propose a modified version of the Winnow algorithm that addresses these limitations. The proposed algorithm handles real‐valued data and updates the learning function based on the input feature vector. To evaluate the performance of our proposed algorithm, we compare it with seven existing variants of the Winnow algorithm on datasets of varying sizes. We employ various evaluation metrics and parameters to assess and compare the performance of the algorithms. The experimental results demonstrate that our proposed algorithm outperforms all the other algorithms used for comparison, highlighting its effectiveness in classification tasks.
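For readers unfamiliar with the multiplicative update mentioned in the abstract above, the following is a minimal sketch of the classic Winnow rule on binary features (the Winnow2-style promotion/demotion variant). It is not the modified algorithm proposed in the article, whose details are not given here. The function names, the threshold choice of `n_features / 2`, the factor `alpha = 2`, and the toy data set are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# One training example: (binary feature vector, label in {0, 1}).
Example = Tuple[Sequence[int], int]


def winnow_predict(weights: Sequence[float], x: Sequence[int], threshold: float) -> int:
    """Predict 1 when the weighted sum of active features reaches the threshold."""
    score = sum(w * xi for w, xi in zip(weights, x))
    return 1 if score >= threshold else 0


def winnow_train(
    examples: Sequence[Example], n_features: int, threshold: float, alpha: float = 2.0
) -> Tuple[List[float], int]:
    """Single online pass of classic Winnow; returns (weights, mistake count)."""
    weights = [1.0] * n_features  # all weights start at 1
    mistakes = 0
    for x, y in examples:
        if winnow_predict(weights, x, threshold) == y:
            continue  # weights change only on mistakes
        mistakes += 1
        # Promote active features on a false negative, demote on a false positive.
        factor = alpha if y == 1 else 1.0 / alpha
        weights = [w * factor if xi == 1 else w for w, xi in zip(weights, x)]
    return weights, mistakes


# Hypothetical toy data: the label is x0 OR x1; features 2 and 3 are irrelevant.
DATA: List[Example] = [
    ((1, 0, 0, 0), 1),
    ((0, 1, 0, 0), 1),
    ((0, 0, 1, 1), 0),
    ((1, 0, 1, 0), 1),
    ((0, 0, 0, 1), 0),
    ((0, 0, 1, 0), 0),
]
N_FEATURES = 4
THRESHOLD = N_FEATURES / 2  # assumed threshold choice

weights, mistakes = winnow_train(DATA, N_FEATURES, THRESHOLD)
predictions = [winnow_predict(weights, x, THRESHOLD) for x, _ in DATA]
print(weights, mistakes, predictions)
```

On this toy data the weights of the two relevant features grow while those of the irrelevant ones shrink. Note that the update factor is a fixed constant and the inputs must be binary, which are the two limitations of the original algorithm that the abstract names.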

Citations: 0

A random forest approach for interval selection in functional regression

IF 1.3, Zone 4 (Mathematics)

Statistical Analysis and Data Mining Pub Date : 2024-07-24 DOI: 10.1002/sam.11705

Rémi Servien, Nathalie Vialaneix

Abstract: In this article, we focus on the problem of variable selection in a functional regression framework. This question is motivated by practical applications in the field of agronomy, where identifying the temporal periods during which weather measurements have the greatest impact on yield is critical for guiding agricultural practices in a changing environment. From a methodological point of view, our goal is to identify consecutive measurement points in the definition domain of the functional predictors, which correspond to the most important intervals for the prediction of a numeric output from the functional variables. We propose an approach based on the versatile random forest method that benefits from its good performance for variable selection and prediction. Our method proceeds in three steps (interval creation, summary, and selection). Different variants for each of the steps are proposed and compared on both simulated and real‐life datasets. The performance of our method compared to alternative approaches highlights its usefulness for selecting relevant intervals while maintaining good prediction capabilities. All variants of our method are available in the R package SISIR.

Citations: 0

Characterizing climate pathways using feature importance on echo state networks

IF 1.3, Zone 4 (Mathematics)

Statistical Analysis and Data Mining Pub Date : 2024-07-23 DOI: 10.1002/sam.11706

Katherine Goode, Daniel Ries, Kellie McClernon

Abstract: The 2022 National Defense Strategy of the United States listed climate change as a serious threat to national security. Climate intervention methods, such as stratospheric aerosol injection, have been proposed as mitigation strategies, but the downstream effects of such actions on a complex climate system are not well understood. The development of algorithmic techniques for quantifying relationships between source and impact variables related to a climate event (i.e., a climate pathway) would help inform policy decisions. Data‐driven deep learning models have become powerful tools for modeling highly nonlinear relationships and may provide a route to characterize climate variable relationships. In this paper, we explore the use of an echo state network (ESN) for characterizing climate pathways. ESNs are a computationally efficient neural network variation designed for temporal data, and recent work proposes ESNs as a useful tool for forecasting spatiotemporal climate data. However, like other neural networks, ESNs are noninterpretable black‐box models. The lack of model transparency poses a hurdle for understanding variable relationships. We address this issue by developing feature importance methods for ESNs in the context of spatiotemporal data to quantify variable relationships captured by the model. We conduct a simulation study to assess and compare the feature importance techniques, and we demonstrate the approach on reanalysis climate data. In the climate application, we consider a time period that includes the 1991 volcanic eruption of Mount Pinatubo. This event was a significant stratospheric aerosol injection, which acts as a proxy for an anthropogenic stratospheric aerosol injection. We are able to use the proposed approach to characterize relationships between pathway variables associated with this event that agree with relationships previously identified by climate scientists.

Citations: 0

Two‐sample testing for random graphs

IF 1.3, Zone 4 (Mathematics)

Statistical Analysis and Data Mining Pub Date : 2024-06-27 DOI: 10.1002/sam.11703

Xiaoyi Wen

Citations: 0

Cost‐sensitive classification with time constraint on incomplete data

IF 1.3, Zone 4 (Mathematics)

Statistical Analysis and Data Mining Pub Date : 2024-06-25 DOI: 10.1002/sam.11702

Yong‐Shiuan Lee, Chia‐Chi Wu

Citations: 0

Sequential metamodel‐based approaches to level‐set estimation under heteroscedasticity

IF 1.3, Zone 4 (Mathematics)

Statistical Analysis and Data Mining Pub Date : 2024-05-29 DOI: 10.1002/sam.11697

Yutong Zhang, Xi Chen

Citations: 0

Towards accelerating particle‐resolved direct numerical simulation with neural operators

IF 1.3, Zone 4 (Mathematics)

Statistical Analysis and Data Mining Pub Date : 2024-05-29 DOI: 10.1002/sam.11690

Mohammad Atif, Vanessa López‐Marrero, Tao Zhang, Abdullah Al Muti Sharfuddin, Kwangmin Yu, Jiaqi Yang, Fan Yang, Foluso Ladeinde, Yangang Liu, Meifeng Lin, Lingda Li

Citations: 0