Statistical Analysis and Data Mining: Latest Articles

Quantifying Epistemic Uncertainty in Binary Classification via Accuracy Gain
IF 1.3, Zone 4, Mathematics
Statistical Analysis and Data Mining Pub Date: 2024-09-18 DOI: 10.1002/sam.11709
Christopher Qian, Tyler Ganter, Joshua Michalenko, Feng Liang, Jason Adams
Abstract: Recently, a surge of interest has been given to quantifying epistemic uncertainty (EU), the reducible portion of uncertainty due to lack of data. We propose a novel EU estimator in the binary classification setting, as the posterior expected value of the empirical gain in accuracy between the current prediction and the optimal prediction. In order to validate the performance of our EU estimator, we introduce an experimental procedure where we take an existing dataset, remove a set of points, and compare the estimated EU with the observed change in accuracy. Through real and simulated data experiments, we demonstrate the effectiveness of our proposed EU estimator.
Citations: 0
A new logarithmic multiplicative distortion for correlation analysis
IF 1.3, Zone 4, Mathematics
Statistical Analysis and Data Mining Pub Date: 2024-08-23 DOI: 10.1002/sam.11708
Siming Deng, Jun Zhang
Abstract: We study the Pearson correlation coefficient in a logarithmic manner under the presence of multiplicative distortion measurement errors. In this context, the observed variables with logarithmic transformation are distorted in multiplicative fashions by an observed confounding variable. The proposed multiplicative distortion model in this paper is applied to analyze positive variables. We utilize the conditional mean calibration and the conditional absolute mean calibration methods to obtain the calibrated variables. Furthermore, we propose confidence intervals based on asymptotic normality, empirical likelihood, and jackknife empirical likelihood. Simulation studies demonstrate the effectiveness of the proposed estimation procedure, and a real‐world example is analyzed to illustrate its practical application.
Citations: 0
Revisiting Winnow: A modified online feature selection algorithm for efficient binary classification
IF 1.3, Zone 4, Mathematics
Statistical Analysis and Data Mining Pub Date: 2024-07-30 DOI: 10.1002/sam.11707
Y. Narasimhulu, Pralhad Kolambkar, Venkaiah V. China
Abstract: Winnow is an efficient binary classification algorithm that effectively learns from data even in the presence of a large number of irrelevant attributes. It is specifically designed for online learning scenarios. Unlike the Perceptron algorithm, Winnow employs a multiplicative weight update function, which leads to fewer mistakes and faster convergence. However, the original Winnow algorithm has several limitations. They include, it only works on binary data, and the weight updates are constant and do not depend on the input features. In this article, we propose a modified version of the Winnow algorithm that addresses these limitations. The proposed algorithm is capable of handling real‐valued data, updates the learning function based on the input feature vector. To evaluate the performance of our proposed algorithm, we compare it with seven existing variants of the Winnow algorithm on datasets of varying sizes. We employ various evaluation metrics and parameters to assess and compare the performance of the algorithms. The experimental results demonstrate that our proposed algorithm outperforms all the other algorithms used for comparison, highlighting its effectiveness in classification tasks.
Citations: 0
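The Winnow abstract above contrasts multiplicative weight updates with the Perceptron's additive ones and notes that the original algorithm is limited to binary inputs with a constant update factor. As a rough illustration of that baseline only (not the modified algorithm the authors propose), here is a minimal sketch of the classical promotion/demotion form of Winnow. The function names, the factor alpha = 2, and the threshold n/2 are assumptions chosen for the example.

```python
import itertools
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[int], int]  # (binary feature vector, label in {0, 1})


def all_binary_vectors(n: int) -> List[Tuple[int, ...]]:
    """Enumerate every 0/1 vector of length n (helper for small demos)."""
    return list(itertools.product((0, 1), repeat=n))


def winnow_predict(weights: Sequence[float], threshold: float, x: Sequence[int]) -> int:
    """Predict 1 when the summed weight of the active features reaches the threshold."""
    score = sum(w for w, xi in zip(weights, x) if xi)
    return 1 if score >= threshold else 0


def winnow_train(samples: Sequence[Sample], n_features: int,
                 alpha: float = 2.0, epochs: int = 50):
    """Classical Winnow with promotion and demotion on binary features.

    Assumed settings for this sketch: all weights start at 1 and the
    threshold is n_features / 2. Returns (weights, threshold, mistakes).
    """
    weights = [1.0] * n_features
    threshold = n_features / 2
    mistakes = 0
    for _ in range(epochs):
        errors = 0
        for x, y in samples:
            if winnow_predict(weights, threshold, x) == y:
                continue  # weights change only on mistakes
            errors += 1
            # Multiplicative update: promote active weights after a missed
            # positive, demote them after a false positive. The factor is
            # constant and ignores feature magnitudes.
            factor = alpha if y == 1 else 1.0 / alpha
            weights = [w * factor if xi else w for w, xi in zip(weights, x)]
        mistakes += errors
        if errors == 0:
            break  # a full pass without mistakes: stop early
    return weights, threshold, mistakes


# Usage: learn the disjunction x0 OR x2 among 8 features, 6 of them irrelevant.
data = [(x, 1 if (x[0] or x[2]) else 0) for x in all_binary_vectors(8)]
weights, threshold, mistakes = winnow_train(data, n_features=8)
```

In this noise-free disjunction setting only the weights of active features are rescaled, so the irrelevant features are pushed down over time while the two relevant ones grow past the threshold.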
A random forest approach for interval selection in functional regression
IF 1.3, Zone 4, Mathematics
Statistical Analysis and Data Mining Pub Date: 2024-07-24 DOI: 10.1002/sam.11705
Rémi Servien, Nathalie Vialaneix
Abstract: In this article, we focus on the problem of variable selection in a functional regression framework. This question is motivated by practical applications in the field of agronomy: In this field, identifying the temporal periods during which weather measurements have the greatest impact on yield is critical for guiding agriculture practices in a changing environment. From a methodological point of view, our goal is to identify consecutive measurement points in the definition domain of the functional predictors, which correspond to the most important intervals for the prediction of a numeric output from the functional variables. We propose an approach based on the versatile random forest method that benefits from its good performances for variable selection and prediction. Our method builds in three steps (interval creation, summary, and selection). Different variants for each of the steps are proposed and compared on both simulated and real‐life datasets. The performances of our method compared to alternative approaches highlight its usefulness to select relevant intervals while maintaining good prediction capabilities. All variants of our method are available in the R package SISIR.
Citations: 0
Characterizing climate pathways using feature importance on echo state networks
IF 1.3, Zone 4, Mathematics
Statistical Analysis and Data Mining Pub Date: 2024-07-23 DOI: 10.1002/sam.11706
Katherine Goode, Daniel Ries, Kellie McClernon
Abstract: The 2022 National Defense Strategy of the United States listed climate change as a serious threat to national security. Climate intervention methods, such as stratospheric aerosol injection, have been proposed as mitigation strategies, but the downstream effects of such actions on a complex climate system are not well understood. The development of algorithmic techniques for quantifying relationships between source and impact variables related to a climate event (i.e., a climate pathway) would help inform policy decisions. Data‐driven deep learning models have become powerful tools for modeling highly nonlinear relationships and may provide a route to characterize climate variable relationships. In this paper, we explore the use of an echo state network (ESN) for characterizing climate pathways. ESNs are a computationally efficient neural network variation designed for temporal data, and recent work proposes ESNs as a useful tool for forecasting spatiotemporal climate data. However, ESNs are noninterpretable black‐box models along with other neural networks. The lack of model transparency poses a hurdle for understanding variable relationships. We address this issue by developing feature importance methods for ESNs in the context of spatiotemporal data to quantify variable relationships captured by the model. We conduct a simulation study to assess and compare the feature importance techniques, and we demonstrate the approach on reanalysis climate data. In the climate application, we consider a time period that includes the 1991 volcanic eruption of Mount Pinatubo. This event was a significant stratospheric aerosol injection, which acts as a proxy for an anthropogenic stratospheric aerosol injection. We are able to use the proposed approach to characterize relationships between pathway variables associated with this event that agree with relationships previously identified by climate scientists.
Citations: 0
Two‐sample testing for random graphs
IF 1.3, Zone 4, Mathematics
Statistical Analysis and Data Mining Pub Date: 2024-06-27 DOI: 10.1002/sam.11703
Xiaoyi Wen
Abstract: The employment of two‐sample hypothesis testing in examining random graphs has been a prevalent approach in diverse fields such as social sciences, neuroscience, and genetics. We advance a spectral‐based two‐sample hypothesis testing methodology to test the latent position random graphs. We propose two distinct asymptotic normal statistics, each optimally designed for two different models—the elementary Erdős–Rényi model and the more complex latent position random graph model. For the latter, the spectral embedding of the adjacency matrix was utilized to estimate the test statistic. The proposed method exhibited superior efficacy as it accomplished higher power than the conventional method of mean estimation. To validate our hypothesis testing procedure, we applied it to empirical biological data to discern structural variances in gene co‐expression networks between COVID‐19 patients and individuals who remained unaffected by the disease.
Citations: 0
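For readers unfamiliar with the Erdős–Rényi setting mentioned in the abstract above, a textbook way to compare two such graphs is a pooled two-proportion z-test on their edge densities. The sketch below is that generic test only; it is not the statistic proposed in the paper, and the function names are made up for the illustration. It assumes undirected simple graphs given as symmetric 0/1 adjacency matrices with at least two nodes each.

```python
import math
from typing import Sequence

Adjacency = Sequence[Sequence[int]]  # symmetric 0/1 matrix, zero diagonal


def edge_count(adj: Adjacency) -> int:
    """Count undirected edges by scanning the upper triangle."""
    n = len(adj)
    return sum(1 for i in range(n) for j in range(i + 1, n) if adj[i][j])


def density_z_statistic(adj_a: Adjacency, adj_b: Adjacency) -> float:
    """Pooled two-proportion z statistic for the edge densities of two graphs.

    Under an Erdős–Rényi null with a common edge probability, each node
    pair is an independent Bernoulli trial, so the statistic is
    approximately standard normal for large graphs.
    """
    pairs_a = len(adj_a) * (len(adj_a) - 1) // 2
    pairs_b = len(adj_b) * (len(adj_b) - 1) // 2
    edges_a, edges_b = edge_count(adj_a), edge_count(adj_b)
    pooled = (edges_a + edges_b) / (pairs_a + pairs_b)
    variance = pooled * (1 - pooled) * (1 / pairs_a + 1 / pairs_b)
    if variance == 0:
        return 0.0  # both graphs empty or both complete: no evidence of a difference
    return (edges_a / pairs_a - edges_b / pairs_b) / math.sqrt(variance)


def two_sided_p_value(z: float) -> float:
    """Two-sided normal p-value, 2 * (1 - Phi(|z|)), via the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2))
```

A density test of this kind only detects differences in the overall edge probability; it says nothing about differences in latent structure, which is the harder case the abstract addresses with spectral embeddings.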
Cost‐sensitive classification with time constraint on incomplete data
IF 1.3, Zone 4, Mathematics
Statistical Analysis and Data Mining Pub Date: 2024-06-25 DOI: 10.1002/sam.11702
Yong‐Shiuan Lee, Chia‐Chi Wu
Abstract: Missing values are common, but dealing with them by inappropriate method may lead to large classification errors. Empirical evidences show that the tree‐based classification algorithms such as classification and regression tree (CART) can benefit from imputation, especially multiple imputation. Nevertheless, less attention has been paid to incorporating multiple imputation into cost‐sensitive decision tree induction. This study focuses on the treatment of missing data based on a time‐constrained minimal‐cost tree algorithm. We introduce various approaches to handle incomplete data into the algorithm including complete‐case analysis, missing‐value branch, single imputation, feature acquisition, and multiple imputation. A simulation study under different scenarios examines the predictive performances of the proposed strategies. The simulation results show that the combination of the algorithm with multiple imputation can assure classification accuracy under the budget. A real medical data example provides insights into the problem of missing values in cost‐sensitive learning and the advantages of the proposed methods.
Citations: 0
Sequential metamodel‐based approaches to level‐set estimation under heteroscedasticity
IF 1.3, Zone 4, Mathematics
Statistical Analysis and Data Mining Pub Date: 2024-05-29 DOI: 10.1002/sam.11697
Yutong Zhang, Xi Chen
Abstract: This paper proposes two sequential metamodel‐based methods for level‐set estimation (LSE) that leverage the uniform bound built on stochastic kriging: predictive variance reduction (PVR) and expected classification improvement (ECI). We show that PVR and ECI possess desirable theoretical performance guarantees and provide closed‐form expressions for their respective sequential sampling criteria to seek the next design point for performing simulation runs, allowing computationally efficient one‐iteration look‐ahead updates. To enhance understanding, we reveal the connection between PVR and ECI's sequential sampling criteria. Additionally, we propose integrating a budget allocation feature with PVR and ECI, which improves computational efficiency and potentially enhances robustness to the impacts of heteroscedasticity. Numerical studies demonstrate the superior performance of the proposed methods compared to state‐of‐the‐art benchmarking approaches when given a fixed simulation budget, highlighting their effectiveness in addressing LSE problems.
Citations: 0
Towards accelerating particle‐resolved direct numerical simulation with neural operators
IF 1.3, Zone 4, Mathematics
Statistical Analysis and Data Mining Pub Date: 2024-05-29 DOI: 10.1002/sam.11690
Mohammad Atif, Vanessa López‐Marrero, Tao Zhang, Abdullah Al Muti Sharfuddin, Kwangmin Yu, Jiaqi Yang, Fan Yang, Foluso Ladeinde, Yangang Liu, Meifeng Lin, Lingda Li
Abstract: We present our ongoing work aimed at accelerating a particle‐resolved direct numerical simulation model designed to study aerosol–cloud–turbulence interactions. The dynamical model consists of two main components—a set of fluid dynamics equations for air velocity, temperature, and humidity, coupled with a set of equations for particle (i.e., cloud droplet) tracing. Rather than attempting to replace the original numerical solution method in its entirety with a machine learning (ML) method, we consider developing a hybrid approach. We exploit the potential of neural operator learning to yield fast and accurate surrogate models and, in this study, develop such surrogates for the velocity and vorticity fields. We discuss results from numerical experiments designed to assess the performance of ML architectures under consideration as well as their suitability for capturing the behavior of relevant dynamical systems.
Citations: 0
Nonparametric mean and variance adaptive classification rule for high‐dimensional data with heteroscedastic variances
IF 1.3, Zone 4, Mathematics
Statistical Analysis and Data Mining Pub Date: 2024-05-20 DOI: 10.1002/sam.11689
Seungyeon Oh, Hoyoung Park
Abstract: In this study, we introduce an innovative methodology aimed at enhancing Fisher's Linear Discriminant Analysis (LDA) in the context of high‐dimensional data classification scenarios, specifically addressing situations where each feature exhibits distinct variances. Our approach leverages Nonparametric Maximum Likelihood Estimation (NPMLE) techniques to estimate both the mean and variance parameters. By accommodating varying variances among features, our proposed method leads to notable improvements in classification performance. In particular, unlike numerous prior studies that assume the distribution of heterogeneous variances follows a right‐skewed inverse gamma distribution, our proposed method demonstrates excellent performance even when the distribution of heterogeneous variances takes on left‐skewed, symmetric, or right‐skewed forms. We conducted a series of rigorous experiments to empirically validate the effectiveness of our approach. The results of these experiments demonstrate that our proposed methodology excels in accurately classifying high‐dimensional data characterized by heterogeneous variances.
Citations: 0