arXiv - STAT - Methodology: latest papers, page 8

This is not normal! (Re-) Evaluating the lower $n$ guildelines for regression analysis

arXiv - STAT - Methodology Pub Date : 2024-09-10 DOI: arxiv-2409.06413

David Randahl

The commonly cited rule of thumb for regression analysis, which suggests that a sample size of $n \geq 30$ is sufficient to ensure valid inferences, is frequently referenced but rarely scrutinized. This research note evaluates the lower bound for the number of observations required for regression analysis by exploring how different distributional characteristics, such as skewness and kurtosis, influence the convergence of t-values to the t-distribution in linear regression models. Through an extensive simulation study involving over 22 billion regression models, this paper examines a range of symmetric, platykurtic, and skewed distributions, testing sample sizes from 4 to 10,000. The results reveal that it is sufficient that either the dependent or independent variable follow a symmetric distribution for the t-values to converge to the t-distribution at much smaller sample sizes than $n=30$. This is contrary to previous guidance which suggests that the error term needs to be normally distributed for this convergence to happen at low $n$. On the other hand, if both dependent and independent variables are highly skewed the required sample size is substantially higher. In cases of extreme skewness, even sample sizes of 10,000 do not ensure convergence. These findings suggest that the $n \geq 30$ rule is too permissive in certain cases but overly conservative in others, depending on the underlying distributional characteristics. This study offers revised guidelines for determining the minimum sample size necessary for valid regression analysis.

Citations: 0
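The Randahl entry above describes a null simulation of regression t-values. Below is a minimal sketch of that kind of simulation in Python with NumPy. It draws $x$ and $y$ independently, so the true slope is zero, and fits OLS in each replicate. It then reports how often $|t|$ exceeds the nominal two-sided $t_{n-2}$ critical value. Several choices here are illustrative assumptions, not the paper's design:

- the helper names `normal` and `skewed`;
- using a lognormal with $\sigma = 2$ as the "skewed" distribution;
- `reps`;
- the Monte Carlo critical value, which avoids a SciPy dependency.

```python
import numpy as np


def normal(rng, size):
    """Symmetric draws (hypothetical helper)."""
    return rng.standard_normal(size)


def skewed(rng, size):
    """Heavily right-skewed draws: lognormal with sigma = 2 (an illustrative choice)."""
    return rng.lognormal(mean=0.0, sigma=2.0, size=size)


def rejection_rate(draw_x, draw_y, n, reps=100_000, alpha=0.05, seed=0):
    """Share of null regressions whose slope |t| exceeds the two-sided t_{n-2} critical value."""
    rng = np.random.default_rng(seed)
    x = draw_x(rng, (reps, n))
    y = draw_y(rng, (reps, n))  # independent of x, so the true slope is 0
    xc = x - x.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    sxx = (xc ** 2).sum(axis=1)
    slope = (xc * yc).sum(axis=1) / sxx          # OLS slope in each replicate
    resid = yc - slope[:, None] * xc
    s2 = (resid ** 2).sum(axis=1) / (n - 2)      # residual variance with n - 2 df
    t = slope / np.sqrt(s2 / sxx)
    # Monte Carlo approximation of the (1 - alpha/2) quantile of t_{n-2}
    crit = np.quantile(np.abs(rng.standard_t(n - 2, size=2_000_000)), 1 - alpha)
    return float(np.mean(np.abs(t) > crit))


if __name__ == "__main__":
    for label, dx, dy in [("both normal", normal, normal),
                          ("x normal, y skewed", normal, skewed),
                          ("both skewed", skewed, skewed)]:
        print(label, rejection_rate(dx, dy, n=10))
```

A rate far from 0.05 suggests the t-values have not yet converged to the t-distribution at that $n$. The paper runs this comparison at a vastly larger scale, and no particular output is claimed for the sketch.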

Efficient nonparametric estimators of discrimination measures with censored survival data

arXiv - STAT - Methodology Pub Date : 2024-09-09 DOI: arxiv-2409.05632

Marie S. Breum, Torben Martinussen

Discrimination measures such as concordance statistics (e.g. the c-index or the concordance probability) and the cumulative-dynamic time-dependent area under the ROC-curve (AUC) are widely used in the medical literature for evaluating the predictive accuracy of a scoring rule which relates a set of prognostic markers to the risk of experiencing a particular event. Often the scoring rule being evaluated in terms of discriminatory ability is the linear predictor of a survival regression model such as the Cox proportional hazards model. This has the undesirable feature that the scoring rule depends on the censoring distribution when the model is misspecified. In this work we focus on linear scoring rules where the coefficient vector is a nonparametric estimand defined in the setting where there is no censoring. We propose so-called debiased estimators of the aforementioned discrimination measures for this class of scoring rules. The proposed estimators make efficient use of the data and minimize bias by allowing for the use of data-adaptive methods for model fitting. Moreover, the estimators do not rely on correct specification of the censoring model to produce consistent estimation. We compare the estimators to existing methods in a simulation study, and we illustrate the method by an application to a brain cancer study.

Citations: 0
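For orientation on the Breum and Martinussen entry above: a widely used existing concordance estimator for censored data is Harrell's C. The Python sketch below (NumPy assumed) computes Harrell's C. It is the classical estimator, not the debiased estimators proposed in the paper. A pair counts as comparable only when the subject with the shorter follow-up time had an observed event. Higher scores are taken to mean higher risk.

```python
import numpy as np


def harrell_c(time, event, score):
    """Harrell's concordance index for right-censored data.

    time  : observed follow-up times
    event : 1 if the event was observed, 0 if censored
    score : risk scores (higher score = higher predicted risk)
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    score = np.asarray(score, dtype=float)
    concordant = 0.0
    comparable = 0
    for i in np.flatnonzero(event):      # only an observed event can anchor a pair
        later = time > time[i]           # subjects still under observation after time[i]
        comparable += int(later.sum())
        concordant += float((score[i] > score[later]).sum())
        concordant += 0.5 * float((score[i] == score[later]).sum())  # tied scores count half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Subject 2 is censored, so it can only serve as the later member of a pair
    print(harrell_c(time=[1, 2, 3], event=[1, 0, 1], score=[3, 1, 2]))
```

Pairs with tied follow-up times are excluded here, which is one of several conventions in use.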

An Eigengap Ratio Test for Determining the Number of Communities in Network Data

arXiv - STAT - Methodology Pub Date : 2024-09-09 DOI: arxiv-2409.05276

Yujia Wu, Jingfei Zhang, Wei Lan, Chih-Ling Tsai

To characterize the community structure in network data, researchers have introduced various block-type models, including the stochastic block model, degree-corrected stochastic block model, mixed membership block model, degree-corrected mixed membership block model, and others. A critical step in applying these models effectively is determining the number of communities in the network. However, to our knowledge, existing methods for estimating the number of network communities often require model estimations or are unable to simultaneously account for network sparsity and a divergent number of communities. In this paper, we propose an eigengap-ratio based test that addresses these challenges. The test is straightforward to compute, requires no parameter tuning, and can be applied to a wide range of block models without the need to estimate network distribution parameters. Furthermore, it is effective for both dense and sparse networks with a divergent number of communities. We show that the proposed test statistic converges to a function of the type-I Tracy-Widom distributions under the null hypothesis, and that the test is asymptotically powerful under alternatives. Simulation studies on both dense and sparse networks demonstrate the efficacy of the proposed method. Three real-world examples are presented to illustrate the usefulness of the proposed test.

Citations: 0
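The eigengap-ratio test in the entry above works with gaps in the adjacency spectrum. The sketch below (Python with NumPy) simulates a stochastic block model with three planted communities and prints the leading eigenvalues and their consecutive gaps, which is the raw material such a statistic is built from. The helper names, block sizes, and edge probabilities are illustrative assumptions. The exact form of the paper's statistic and its Tracy-Widom calibration are not reproduced here.

```python
import numpy as np


def simulate_sbm(sizes, p_in, p_out, seed=0):
    """Undirected stochastic block model without self-loops (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(len(sizes)), sizes)
    probs = np.where(labels[:, None] == labels[None, :], p_in, p_out)
    upper = np.triu(rng.random(probs.shape) < probs, k=1)  # sample each node pair once
    return (upper | upper.T).astype(float)


def leading_eigenvalues(adj, k):
    """Largest k eigenvalues of a symmetric matrix, in decreasing order."""
    return np.linalg.eigvalsh(adj)[::-1][:k]


if __name__ == "__main__":
    adj = simulate_sbm([100, 100, 100], p_in=0.5, p_out=0.1)
    eigs = leading_eigenvalues(adj, 6)
    print("leading eigenvalues:", np.round(eigs, 2))
    print("consecutive gaps:   ", np.round(-np.diff(eigs), 2))
```

With three communities, three eigenvalues should stand clearly apart from the bulk. Deciding how large a gap must be before it counts as a separation is what the paper's test calibrates.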

Priors from Envisioned Posterior Judgments: A Novel Elicitation Approach With Application to Bayesian Clinical Trials

arXiv - STAT - Methodology Pub Date : 2024-09-09 DOI: arxiv-2409.05271

Yongdong Ouyang, Janice J Eng, Denghuang Zhan, Hubert Wong

The uptake of formalized prior elicitation from experts in Bayesian clinical trials has been limited, largely due to the challenges associated with complex statistical modeling, the lack of practical tools, and the cognitive burden on experts required to quantify their uncertainty using probabilistic language. Additionally, existing methods do not address prior-posterior coherence, i.e., does the posterior distribution, obtained mathematically from combining the estimated prior with the trial data, reflect the expert's actual posterior beliefs? We propose a new elicitation approach that seeks to ensure prior-posterior coherence and reduce the expert's cognitive burden. This is achieved by eliciting responses about the expert's envisioned posterior judgments under various potential data outcomes and inferring the prior distribution by minimizing the discrepancies between these responses and the expected responses obtained from the posterior distribution. The feasibility and potential value of the new approach are illustrated through an application to a real trial currently underway.

Citations: 0
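The elicitation entry above infers a prior from envisioned posterior judgments. Here is a toy version of that inference step, under assumptions that are ours rather than the paper's:

- the endpoint is binary, with a conjugate Beta$(a, b)$ prior;
- for each of several hypothetical trial outcomes, the expert reports an envisioned posterior mean response rate.

The prior is recovered by grid search over $(a, b)$. The search minimizes the squared discrepancy between the reported values and the posterior means $(a + y)/(a + b + m)$ that each candidate prior implies. The function name and grid are hypothetical.

```python
import numpy as np


def fit_beta_prior(successes, trials, envisioned_means, grid=None):
    """Grid-search the Beta(a, b) prior whose implied posterior means best match the answers."""
    if grid is None:
        grid = np.linspace(0.1, 20.0, 200)   # candidate values for both a and b
    y = np.asarray(successes, dtype=float)
    m = np.asarray(trials, dtype=float)
    r = np.asarray(envisioned_means, dtype=float)
    a = grid[:, None, None]
    b = grid[None, :, None]
    implied = (a + y) / (a + b + m)           # Beta-binomial posterior mean per scenario
    loss = ((implied - r) ** 2).sum(axis=-1)  # discrepancy summed over scenarios
    i, j = np.unravel_index(np.argmin(loss), loss.shape)
    return float(grid[i]), float(grid[j])


if __name__ == "__main__":
    # Hypothetical scenarios: "if the trial saw y responders out of m patients..."
    y = [2, 5, 8, 10, 30]
    m = [10, 10, 10, 40, 40]
    # Answers from an expert whose beliefs happen to match a Beta(4, 6) prior exactly
    answers = [(4 + yi) / (4 + 6 + mi) for yi, mi in zip(y, m)]
    print(fit_beta_prior(y, m, answers))
```

Scenarios with different trial sizes $m$ matter here. Without them, the prior mean could be recovered but its strength $a + b$ would be poorly identified.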

Recursive Nested Filtering for Efficient Amortized Bayesian Experimental Design

arXiv - STAT - Methodology Pub Date : 2024-09-09 DOI: arxiv-2409.05354

Sahel Iqbal, Hany Abdulsamad, Sara Pérez-Vieites, Simo Särkkä, Adrien Corenflos

Citations: 0

Multilevel testing of constraints induced by structural equation modeling in fMRI effective connectivity analysis: A proof of concept

arXiv - STAT - Methodology Pub Date : 2024-09-09 DOI: arxiv-2409.05630

G. Marrelec, A. Giron

In functional MRI (fMRI), effective connectivity analysis aims at inferring the causal influences that brain regions exert on one another. A common method for this type of analysis is structural equation modeling (SEM). We here propose a novel method to test the validity of a given structural equation model. Given a structural model in the form of a directed graph, the method extracts the set of all constraints of conditional independence induced by the absence of links between pairs of regions in the model and tests for their validity in a Bayesian framework, either individually (constraint by constraint), jointly (e.g., by gathering all constraints associated with a given missing link), or globally (i.e., all constraints associated with the structural model). This approach has two main advantages. First, it only tests what is testable from observational data and does not allow for false causal interpretation. Second, it makes it possible to test each constraint (or group of constraints) separately and, therefore, quantify to what extent each constraint (or, e.g., missing link) is respected in the data. We validate our approach using a simulation study and illustrate its potential benefits through the reanalysis of published data.

Citations: 0
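The method in the entry above first lists the conditional-independence constraints implied by missing links. That step can be sketched for the special case of an acyclic graph. This is an assumption: SEMs used in fMRI may contain cycles, and the paper's construction may differ. Take a DAG whose nodes are listed in topological order, and two non-adjacent nodes $u$ (earlier) and $v$ (later). The local Markov property then implies that $u$ and $v$ are independent given the parents of $v$. The Python below only enumerates these constraints; the Bayesian tests are not shown. The function name and region labels are hypothetical.

```python
from itertools import combinations


def missing_link_constraints(parents):
    """Conditional independences implied by absent edges in a DAG.

    parents: dict mapping each node to the set of its parents, with nodes
             inserted in a topological order (every parent listed before its child).
    Returns (u, v, conditioning_set) triples: u is independent of v given the set.
    """
    order = list(parents)
    seen = set()
    for node in order:
        if not set(parents[node]) <= seen:
            raise ValueError(f"{node!r} appears before one of its parents")
        seen.add(node)
    constraints = []
    for u, v in combinations(order, 2):   # u precedes v in the topological order
        if u in parents[v] or v in parents[u]:
            continue                      # linked pair: no constraint
        constraints.append((u, v, frozenset(parents[v])))
    return constraints


if __name__ == "__main__":
    # Hypothetical three-region chain: V1 -> V5 -> PPC
    model = {"V1": set(), "V5": {"V1"}, "PPC": {"V5"}}
    for u, v, cond in missing_link_constraints(model):
        print(f"{u} independent of {v} given {sorted(cond)}")
```

Each missing link yields exactly one constraint here, which matches the abstract's "constraint by constraint" and "by missing link" groupings for this simple acyclic case.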

Predicting Electricity Consumption with Random Walks on Gaussian Processes

arXiv - STAT - Methodology Pub Date : 2024-09-09 DOI: arxiv-2409.05934

Chloé Hashimoto-Cullen, Benjamin Guedj

Citations: 0

An unbiased rank-based estimator of the Mann-Whitney variance including the case of ties

arXiv - STAT - Methodology Pub Date : 2024-09-08 DOI: arxiv-2409.05038

Edgar Brunner, Frank Konietschke

Many estimators of the variance of the well-known unbiased and uniform most powerful estimator $\hat\theta$ of the Mann-Whitney effect, $\theta = P(X < Y) + \frac{1}{2} P(X = Y)$, are considered in the literature. Some of these estimators are only valid in the case of no ties, or are biased for small sample sizes without the amount of the bias being discussed. Here we derive an unbiased estimator that is based on different rankings, the so-called 'placements' (Orban and Wolfe, 1980), and is therefore easy to compute. This estimator does not require the assumption of continuous distribution functions (dfs) and is also valid in the case of ties. Moreover, it is shown that this estimator is non-negative and has a sharp upper bound which may be considered an empirical version of the well-known Birnbaum-Klose inequality. The derivation of this estimator provides an option to compute the biases of some commonly used estimators in the literature. Simulations demonstrate that, for small sample sizes, the biases of these estimators depend on the underlying dfs and thus are not under control. This means that in the case of a biased estimator, simulation results for the type-I error of a test or the coverage probability of a confidence interval depend not only on the quality of the approximation of $\hat\theta$ by a normal distribution but also on an additional unknown bias caused by the variance estimator. Finally, it is shown that this estimator is $L_2$-consistent.

Citations: 0
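For the Mann-Whitney entry above, the point estimate $\hat\theta$ and the placements the variance estimator is built on are easy to compute. The Python sketch below (NumPy assumed) obtains $\hat\theta$ in two ways:

- directly, averaging over all $(X_i, Y_k)$ pairs and scoring ties as $\tfrac12$;
- from placements, where each $Y_k$'s placement is its midrank in the pooled sample minus its midrank within the $Y$ sample. This equals $\#\{X_i < Y_k\} + \tfrac12 \#\{X_i = Y_k\}$.

The function names are hypothetical. The paper's unbiased variance estimator is not reproduced here.

```python
import numpy as np


def midranks(values):
    """Ranks with ties replaced by the average of the tied positions."""
    v = np.asarray(values, dtype=float)
    less = (v[None, :] < v[:, None]).sum(axis=1)
    equal = (v[None, :] == v[:, None]).sum(axis=1)   # includes the value itself
    return less + (equal + 1) / 2.0


def theta_pairwise(x, y):
    """theta_hat = mean over all pairs of 1{X < Y} + 0.5 * 1{X = Y}."""
    x = np.asarray(x, dtype=float)[:, None]
    y = np.asarray(y, dtype=float)[None, :]
    return float(np.mean((x < y) + 0.5 * (x == y)))


def theta_from_placements(x, y):
    """Same estimate via placements: pooled midrank minus within-sample midrank of each Y."""
    n1 = len(x)
    pooled = midranks(np.concatenate([x, y]))
    placements = pooled[n1:] - midranks(y)   # = #(X < Y_k) + 0.5 * #(X = Y_k)
    return float(placements.mean() / n1)


if __name__ == "__main__":
    x, y = [1, 2, 2, 3], [2, 3, 3, 5]
    print(theta_pairwise(x, y))          # 0.8125
    print(theta_from_placements(x, y))   # 0.8125
```

The rank route needs only midranks, which is why placement-based formulas are convenient with ties.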

Estimating velocities of infectious disease spread through spatio-temporal log-Gaussian Cox point processes

arXiv - STAT - Methodology Pub Date : 2024-09-08 DOI: arxiv-2409.05036

Fernando Rodriguez Avellaneda, Jorge Mateu, Paula Moraga

Understanding the spread of infectious diseases such as COVID-19 is crucial for informed decision-making and resource allocation. A critical component of disease behavior is the velocity with which disease spreads, defined as the rate of change between time and space. In this paper, we propose a spatio-temporal modeling approach to determine the velocities of infectious disease spread. Our approach assumes that the locations and times of infected people can be considered as a spatio-temporal point pattern that arises as a realization of a spatio-temporal log-Gaussian Cox process. The intensity of this process is estimated using fast Bayesian inference by employing the integrated nested Laplace approximation (INLA) and the Stochastic Partial Differential Equations (SPDE) approaches. The velocity is then calculated using finite differences that approximate the derivatives of the intensity function. Finally, the directions and magnitudes of the velocities can be mapped at specific times to better examine the spread of the disease throughout the region. We demonstrate our method by analyzing COVID-19 spread in Cali, Colombia, during the 2020-2021 pandemic.

Citations: 0
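The entry above turns an estimated intensity surface into velocities by finite differences. That last step can be sketched as follows, with one assumption on our part: velocity is defined as the speed of a contour of constant intensity. Solving $\partial_t \lambda + v \cdot \nabla_s \lambda = 0$ with $v$ parallel to $\nabla_s \lambda$ gives $v = -(\partial_t \lambda)\, \nabla_s \lambda / \lVert \nabla_s \lambda \rVert^2$. The paper's intensity comes from an INLA-SPDE fit. Here a synthetic ring expanding at a known speed $c$ stands in for it, so the output can be checked. Function names and grid sizes are hypothetical.

```python
import numpy as np


def spread_velocity(intensity, dt, dx, dy):
    """Level-set velocity of an intensity surface indexed as (time, x, y).

    Solves d(lambda)/dt + v . grad(lambda) = 0 with v parallel to grad(lambda),
    using the central finite differences returned by np.gradient.
    """
    d_t, d_x, d_y = np.gradient(intensity, dt, dx, dy)
    grad_sq = d_x ** 2 + d_y ** 2
    with np.errstate(divide="ignore", invalid="ignore"):
        factor = np.where(grad_sq > 0, -d_t / grad_sq, 0.0)
    return factor * d_x, factor * d_y


def expanding_front(c=2.0, r0=1.0, width=0.5):
    """Synthetic intensity: a ring of radius r0 + c*t moving outward at speed c."""
    t = np.linspace(0.0, 1.0, 26)
    x = np.linspace(-5.0, 5.0, 101)
    T, X, Y = np.meshgrid(t, x, x, indexing="ij")
    R = np.hypot(X, Y)
    lam = np.exp(-((R - r0 - c * T) ** 2) / (2 * width ** 2))
    return lam, t[1] - t[0], x[1] - x[0], X, Y


if __name__ == "__main__":
    lam, dt, dx, X, Y = expanding_front()
    vx, vy = spread_velocity(lam, dt, dx, dx)
    grad = np.hypot(*np.gradient(lam, dt, dx, dx)[1:])
    front = grad > 0.5 * grad.max()   # keep cells where the spatial gradient is strong
    front[[0, -1]] = False            # drop one-sided differences at the first/last time step
    print("median speed on the front:", np.median(np.hypot(vx, vy)[front]))
```

Velocities are only meaningful where the spatial gradient is not close to zero, hence the mask on the front.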

Inference for Large Scale Regression Models with Dependent Errors

arXiv - STAT - Methodology Pub Date : 2024-09-08 DOI: arxiv-2409.05160

Lionel Voirol, Haotian Xu, Yuming Zhang, Luca Insolia, Roberto Molinari, Stéphane Guerrier

The exponential growth in data sizes and storage costs has brought considerable challenges to the data science community, requiring solutions to run learning methods on such data. While machine learning has scaled to achieve predictive accuracy in big data settings, statistical inference and uncertainty quantification tools are still lagging. Priority scientific fields collect vast data to understand phenomena typically studied with statistical methods like regression. In this setting, regression parameter estimation can benefit from efficient computational procedures, but the main challenge lies in computing error process parameters with complex covariance structures. Identifying and estimating these structures is essential for inference and often used for uncertainty quantification in machine learning with Gaussian Processes. However, estimating these structures becomes burdensome as data scales, requiring approximations that compromise the reliability of outputs. These approximations are even more unreliable when complexities like long-range dependencies or missing data are present. This work defines and proves the statistical properties of the Generalized Method of Wavelet Moments with Exogenous variables (GMWMX), a highly scalable, stable, and statistically valid method for estimating and delivering inference for linear models using stochastic processes in the presence of data complexities like latent dependence structures and missing data. Applied examples from Earth Sciences and extensive simulations highlight the advantages of the GMWMX.

Citations: 0
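The GMWMX in the entry above extends the Generalized Method of Wavelet Moments. The GMWM matches empirical wavelet variances to those implied by a candidate error model. Below is a minimal sketch of that moment-matching idea in Python with NumPy. It assumes the simplest possible case:

- pure white noise;
- the Haar wavelet;
- unweighted least squares;
- no exogenous regressors and no missing data.

It is therefore not the GMWMX estimator. For white noise with variance $\sigma^2$, the Haar (MODWT-style) wavelet variance at level $j$ is $\sigma^2 / 2^j$. Function names are hypothetical.

```python
import numpy as np


def haar_wavelet_variance(x, levels):
    """Haar wavelet variance at levels 1..levels (non-boundary, MODWT-style filters)."""
    x = np.asarray(x, dtype=float)
    csum = np.concatenate(([0.0], np.cumsum(x)))
    out = []
    for j in range(1, levels + 1):
        width = 2 ** j
        half = width // 2
        start = np.arange(len(x) - width + 1)
        first = csum[start + half] - csum[start]            # sum over first half-window
        second = csum[start + width] - csum[start + half]   # sum over second half-window
        coef = (first - second) / width                     # Haar wavelet coefficient
        out.append(np.mean(coef ** 2))
    return np.array(out)


def fit_white_noise(nu):
    """Least-squares match of nu_j to sigma^2 / 2^j; returns the estimate of sigma^2."""
    weights = 0.5 ** np.arange(1, len(nu) + 1)   # model-implied nu_j per unit sigma^2
    return float(np.dot(nu, weights) / np.dot(weights, weights))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(0.0, 2.0, size=2 ** 16)       # white noise with sigma^2 = 4
    nu = haar_wavelet_variance(x, levels=5)
    print("wavelet variances:", np.round(nu, 4))
    print("fitted sigma^2:", fit_white_noise(nu))
```

In practice the GMWM weights the moment conditions with an estimated weighting matrix and fits richer error models. The unweighted white-noise case only shows the shape of the computation.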