arXiv - STAT - Methodology: Latest Papers

Causal Analysis of Shapley Values: Conditional vs. Marginal
arXiv - STAT - Methodology Pub Date : 2024-09-10 DOI: arxiv-2409.06157
Ilya Rozenfeld
Abstract: Shapley values, a game-theoretic concept, have been one of the most popular tools for explaining Machine Learning (ML) models in recent years. Unfortunately, the two most common approaches to calculating Shapley values, conditional and marginal, can lead to different results, along with some undesirable side effects, when features are correlated. This in turn has led to a situation in the literature where different authors give contradictory recommendations regarding the choice of approach. In this paper we aim to resolve this controversy through the use of causal arguments. We show that the differences arise from the implicit assumptions that are made within each method to deal with missing causal information. We also demonstrate that the conditional approach is fundamentally unsound from a causal perspective. This, together with previous work in [1], leads to the conclusion that the marginal approach should be preferred over the conditional one.
Citations: 0
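To make the conditional-versus-marginal contrast in the abstract above concrete, here is a minimal sketch on a toy problem that is not taken from the paper. It assumes a hypothetical model f(x1, x2) = x1 and two standard-normal features with correlation rho, so that E[X1 | X2 = x2] = rho * x2. The function names are illustrative only.

```python
def shapley_two_features(v_empty, v1, v2, v12):
    """Exact Shapley values for a two-player game given its four coalition values."""
    phi1 = 0.5 * ((v1 - v_empty) + (v12 - v2))
    phi2 = 0.5 * ((v2 - v_empty) + (v12 - v1))
    return phi1, phi2


def toy_shapley(x1, x2, rho):
    """Marginal and conditional Shapley values for the assumed model f(x1, x2) = x1.

    Assumption: (X1, X2) are standard normal with correlation rho, so
    E[f] = 0 and E[X1 | X2 = x2] = rho * x2.
    """
    # Marginal: features outside the coalition are averaged over their marginal law,
    # so knowing only x2 tells us nothing about f.
    marginal = shapley_two_features(0.0, x1, 0.0, x1)
    # Conditional: features outside the coalition are averaged given the known ones,
    # so knowing only x2 shifts the expected value of f to rho * x2.
    conditional = shapley_two_features(0.0, x1, rho * x2, x1)
    return marginal, conditional


marginal, conditional = toy_shapley(x1=1.0, x2=2.0, rho=0.5)
print("marginal:", marginal)
print("conditional:", conditional)
```

In this toy setting the marginal approach assigns zero attribution to x2, which the model never uses, while the conditional approach assigns it rho * x2 / 2 because of the correlation. Both attributions still sum to f(x) - E[f].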
Efficient nonparametric estimators of discrimination measures with censored survival data
arXiv - STAT - Methodology Pub Date : 2024-09-09 DOI: arxiv-2409.05632
Marie S. Breum, Torben Martinussen
Abstract: Discrimination measures such as concordance statistics (e.g. the c-index or the concordance probability) and the cumulative-dynamic time-dependent area under the ROC-curve (AUC) are widely used in the medical literature for evaluating the predictive accuracy of a scoring rule which relates a set of prognostic markers to the risk of experiencing a particular event. Often the scoring rule being evaluated in terms of discriminatory ability is the linear predictor of a survival regression model such as the Cox proportional hazards model. This has the undesirable feature that the scoring rule depends on the censoring distribution when the model is misspecified. In this work we focus on linear scoring rules where the coefficient vector is a nonparametric estimand defined in the setting where there is no censoring. We propose so-called debiased estimators of the aforementioned discrimination measures for this class of scoring rules. The proposed estimators make efficient use of the data and minimize bias by allowing for the use of data-adaptive methods for model fitting. Moreover, the estimators do not rely on correct specification of the censoring model to produce consistent estimation. We compare the estimators to existing methods in a simulation study, and we illustrate the method by an application to a brain cancer study.
Citations: 0
An Eigengap Ratio Test for Determining the Number of Communities in Network Data
arXiv - STAT - Methodology Pub Date : 2024-09-09 DOI: arxiv-2409.05276
Yujia Wu, Jingfei Zhang, Wei Lan, Chih-Ling Tsai
Abstract: To characterize the community structure in network data, researchers have introduced various block-type models, including the stochastic block model, degree-corrected stochastic block model, mixed membership block model, degree-corrected mixed membership block model, and others. A critical step in applying these models effectively is determining the number of communities in the network. However, to our knowledge, existing methods for estimating the number of network communities often require model estimations or are unable to simultaneously account for network sparsity and a divergent number of communities. In this paper, we propose an eigengap-ratio based test that addresses these challenges. The test is straightforward to compute, requires no parameter tuning, and can be applied to a wide range of block models without the need to estimate network distribution parameters. Furthermore, it is effective for both dense and sparse networks with a divergent number of communities. We show that the proposed test statistic converges to a function of the type-I Tracy-Widom distributions under the null hypothesis, and that the test is asymptotically powerful under alternatives. Simulation studies on both dense and sparse networks demonstrate the efficacy of the proposed method. Three real-world examples are presented to illustrate the usefulness of the proposed test.
Citations: 0
Priors from Envisioned Posterior Judgments: A Novel Elicitation Approach With Application to Bayesian Clinical Trials
arXiv - STAT - Methodology Pub Date : 2024-09-09 DOI: arxiv-2409.05271
Yongdong Ouyang, Janice J Eng, Denghuang Zhan, Hubert Wong
Abstract: The uptake of formalized prior elicitation from experts in Bayesian clinical trials has been limited, largely due to the challenges associated with complex statistical modeling, the lack of practical tools, and the cognitive burden on experts required to quantify their uncertainty using probabilistic language. Additionally, existing methods do not address prior-posterior coherence, i.e., does the posterior distribution, obtained mathematically from combining the estimated prior with the trial data, reflect the expert's actual posterior beliefs? We propose a new elicitation approach that seeks to ensure prior-posterior coherence and reduce the expert's cognitive burden. This is achieved by eliciting responses about the expert's envisioned posterior judgments under various potential data outcomes and inferring the prior distribution by minimizing the discrepancies between these responses and the expected responses obtained from the posterior distribution. The feasibility and potential value of the new approach are illustrated through an application to a real trial currently underway.
Citations: 0
Recursive Nested Filtering for Efficient Amortized Bayesian Experimental Design
arXiv - STAT - Methodology Pub Date : 2024-09-09 DOI: arxiv-2409.05354
Sahel Iqbal, Hany Abdulsamad, Sara Pérez-Vieites, Simo Särkkä, Adrien Corenflos
Abstract: This paper introduces the Inside-Out Nested Particle Filter (IO-NPF), a novel, fully recursive algorithm for amortized sequential Bayesian experimental design in the non-exchangeable setting. We frame policy optimization as maximum likelihood estimation in a non-Markovian state-space model, achieving (at most) $\mathcal{O}(T^2)$ computational complexity in the number of experiments. We provide theoretical convergence guarantees and introduce a backward sampling algorithm to reduce trajectory degeneracy. IO-NPF offers a practical, extensible, and provably consistent approach to sequential Bayesian experimental design, demonstrating improved efficiency over existing methods.
Citations: 0
Multilevel testing of constraints induced by structural equation modeling in fMRI effective connectivity analysis: A proof of concept
arXiv - STAT - Methodology Pub Date : 2024-09-09 DOI: arxiv-2409.05630
G. Marrelec, A. Giron
Abstract: In functional MRI (fMRI), effective connectivity analysis aims at inferring the causal influences that brain regions exert on one another. A common method for this type of analysis is structural equation modeling (SEM). We here propose a novel method to test the validity of a given structural equation model. Given a structural model in the form of a directed graph, the method extracts the set of all constraints of conditional independence induced by the absence of links between pairs of regions in the model and tests for their validity in a Bayesian framework, either individually (constraint by constraint), jointly (e.g., by gathering all constraints associated with a given missing link), or globally (i.e., all constraints associated with the structural model). This approach has two main advantages. First, it only tests what is testable from observational data and does not allow for false causal interpretation. Second, it makes it possible to test each constraint (or group of constraints) separately and, therefore, quantify to what extent each constraint (or, e.g., missing link) is respected in the data. We validate our approach using a simulation study and illustrate its potential benefits through the reanalysis of published data.
Citations: 0
Predicting Electricity Consumption with Random Walks on Gaussian Processes
arXiv - STAT - Methodology Pub Date : 2024-09-09 DOI: arxiv-2409.05934
Chloé Hashimoto-Cullen, Benjamin Guedj
Abstract: We consider time-series forecasting problems where data is scarce, difficult to gather, or induces a prohibitive computational cost. As a first attempt, we focus on short-term electricity consumption in France, which is of strategic importance for energy suppliers and public stakeholders. The complexity of this problem and the many levels of geospatial granularity motivate the use of an ensemble of Gaussian Processes (GPs). Whilst GPs are remarkable predictors, they are computationally expensive to train, which calls for a frugal few-shot learning approach. By taking into account performance on GPs trained on a dataset and designing a random walk on these, we mitigate the training cost of our entire Bayesian decision-making procedure. We introduce our algorithm called Domino (ranDOM walk on gaussIaN prOcesses) and present numerical experiments to support its merits.
Citations: 0
An unbiased rank-based estimator of the Mann-Whitney variance including the case of ties
arXiv - STAT - Methodology Pub Date : 2024-09-08 DOI: arxiv-2409.05038
Edgar Brunner, Frank Konietschke
Abstract: Many estimators of the variance of the well-known unbiased and uniform most powerful estimator $\hat\theta$ of the Mann-Whitney effect, $\theta = P(X < Y) + \frac12 P(X=Y)$, are considered in the literature. Some of these estimators are only valid in case of no ties or are biased in case of small sample sizes where the amount of the bias is not discussed. Here we derive an unbiased estimator that is based on different rankings, the so-called 'placements' (Orban and Wolfe, 1980), and is therefore easy to compute. This estimator does not require the assumption of continuous distribution functions and is also valid in the case of ties. Moreover, it is shown that this estimator is non-negative and has a sharp upper bound which may be considered an empirical version of the well-known Birnbaum-Klose inequality. The derivation of this estimator provides an option to compute the biases of some commonly used estimators in the literature. Simulations demonstrate that, for small sample sizes, the biases of these estimators depend on the underlying distribution functions and thus are not under control. This means that in the case of a biased estimator, simulation results for the type-I error of a test or the coverage probability of a confidence interval depend not only on the quality of the approximation of $\hat\theta$ by a normal distribution but also on an additional unknown bias caused by the variance estimator. Finally, it is shown that this estimator is $L_2$-consistent.
Citations: 0
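The Mann-Whitney effect named in the abstract above, $\theta = P(X < Y) + \frac12 P(X=Y)$, has a simple plug-in point estimate obtained by comparing every pair of observations. The sketch below shows only that point estimate, with ties counted as one half. It is not the variance estimator proposed in the paper, and the function name is illustrative.

```python
import numpy as np


def mann_whitney_effect(x, y):
    """Plug-in estimate of theta = P(X < Y) + 0.5 * P(X = Y) from two samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Compare every x_i with every y_j via broadcasting (shape: len(x) by len(y)).
    less = (x[:, None] < y[None, :]).mean()
    ties = (x[:, None] == y[None, :]).mean()
    return less + 0.5 * ties


# Small example with ties: 6 of 9 pairs have x < y and 2 of 9 are tied.
print(mann_whitney_effect([1, 2, 3], [2, 3, 4]))
```

A value of 0.5 indicates no tendency for either sample to be larger, and counting ties as one half keeps the estimate meaningful for discrete or rounded data.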
Estimating velocities of infectious disease spread through spatio-temporal log-Gaussian Cox point processes
arXiv - STAT - Methodology Pub Date : 2024-09-08 DOI: arxiv-2409.05036
Fernando Rodriguez Avellaneda, Jorge Mateu, Paula Moraga
Abstract: Understanding the spread of infectious diseases such as COVID-19 is crucial for informed decision-making and resource allocation. A critical component of disease behavior is the velocity with which disease spreads, defined as the rate of change between time and space. In this paper, we propose a spatio-temporal modeling approach to determine the velocities of infectious disease spread. Our approach assumes that the locations and times of people infected can be considered as a spatio-temporal point pattern that arises as a realization of a spatio-temporal log-Gaussian Cox process. The intensity of this process is estimated using fast Bayesian inference by employing the integrated nested Laplace approximation (INLA) and the Stochastic Partial Differential Equations (SPDE) approaches. The velocity is then calculated using finite differences that approximate the derivatives of the intensity function. Finally, the directions and magnitudes of the velocities can be mapped at specific times to examine better the spread of the disease throughout the region. We demonstrate our method by analyzing COVID-19 spread in Cali, Colombia, during the 2020-2021 pandemic.
Citations: 0
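The abstract above says the velocity is obtained from finite-difference approximations of the derivatives of the intensity function, but it does not give the formula. The sketch below is therefore only an illustration under stated assumptions: the intensity is available on a regular grid of shape (time, y, x), and the velocity is taken to be the level-set form $v = -(\partial\lambda/\partial t)\,\nabla\lambda / \lVert\nabla\lambda\rVert^2$, which is an assumed definition and may differ from the one used in the paper.

```python
import numpy as np


def level_set_velocity(lam, dt, dx, dy, eps=1e-12):
    """Finite-difference velocity field from a gridded intensity lam[t, y, x].

    Assumed definition (not taken from the paper):
        v = -(d lam / dt) * grad(lam) / |grad(lam)|^2
    """
    # np.gradient returns one array per axis, in axis order (t, y, x).
    dlam_dt, dlam_dy, dlam_dx = np.gradient(lam, dt, dy, dx)
    grad_sq = dlam_dx**2 + dlam_dy**2
    # Where the spatial gradient vanishes the velocity is undefined; return 0 there.
    scale = np.where(grad_sq > eps, -dlam_dt / np.maximum(grad_sq, eps), 0.0)
    return scale * dlam_dx, scale * dlam_dy


# Synthetic check: a front lam = x - 2 t travels in the +x direction at speed 2.
t = np.arange(4) * 0.5
y = np.arange(3) * 1.0
x = np.arange(5) * 1.0
lam = x[None, None, :] - 2.0 * t[:, None, None] + 0.0 * y[None, :, None]
vx, vy = level_set_velocity(lam, dt=0.5, dx=1.0, dy=1.0)
print(vx[0, 0, 0], vy[0, 0, 0])
```

The two returned arrays give the x and y components at every grid point and time, so directions and magnitudes can be mapped at a chosen time slice.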
Inference for Large Scale Regression Models with Dependent Errors
arXiv - STAT - Methodology Pub Date : 2024-09-08 DOI: arxiv-2409.05160
Lionel Voirol, Haotian Xu, Yuming Zhang, Luca Insolia, Roberto Molinari, Stéphane Guerrier
Abstract: The exponential growth in data sizes and storage costs has brought considerable challenges to the data science community, requiring solutions to run learning methods on such data. While machine learning has scaled to achieve predictive accuracy in big data settings, statistical inference and uncertainty quantification tools are still lagging. Priority scientific fields collect vast data to understand phenomena typically studied with statistical methods like regression. In this setting, regression parameter estimation can benefit from efficient computational procedures, but the main challenge lies in computing error process parameters with complex covariance structures. Identifying and estimating these structures is essential for inference and often used for uncertainty quantification in machine learning with Gaussian Processes. However, estimating these structures becomes burdensome as data scales, requiring approximations that compromise the reliability of outputs. These approximations are even more unreliable when complexities like long-range dependencies or missing data are present. This work defines and proves the statistical properties of the Generalized Method of Wavelet Moments with Exogenous variables (GMWMX), a highly scalable, stable, and statistically valid method for estimating and delivering inference for linear models using stochastic processes in the presence of data complexities like latent dependence structures and missing data. Applied examples from Earth Sciences and extensive simulations highlight the advantages of the GMWMX.
Citations: 0