arXiv - STAT - Methodology: Latest Papers

Improve Sensitivity Analysis Synthesizing Randomized Clinical Trials With Limited Overlap
arXiv - STAT - Methodology Pub Date : 2024-09-11 DOI: arxiv-2409.07391
Kuan Jiang, Wenjie Hu, Shu Yang, Xinxing Lai, Xiaohua Zhou
Abstract: To estimate the average treatment effect in real-world populations, observational studies are typically designed around real-world cohorts. However, even when study samples from these designs represent the population, unmeasured confounders can introduce bias. Sensitivity analysis is often used to estimate bounds for the average treatment effect without relying on the strict mathematical assumptions of other existing methods. This article introduces a new approach that improves sensitivity analysis in observational studies by incorporating randomized clinical trial data, even with limited overlap due to inclusion/exclusion criteria. Theoretical proof and simulations show that this method provides a tighter bound width than existing approaches. We also apply this method to both a trial dataset and a real-world drug effectiveness comparison dataset for practical analysis.
Citations: 0
Extended-support beta regression for $[0, 1]$ responses
arXiv - STAT - Methodology Pub Date : 2024-09-11 DOI: arxiv-2409.07233
Ioannis Kosmidis, Achim Zeileis
Abstract: We introduce the XBX regression model, a continuous mixture of extended-support beta regressions for modeling bounded responses with or without boundary observations. The core building block of the new model is the extended-support beta distribution, which is a censored version of a four-parameter beta distribution with the same exceedance on the left and right of $(0, 1)$. Hence, XBX regression is a direct extension of beta regression. We prove that both beta regression with dispersion effects and heteroscedastic normal regression with censoring at both $0$ and $1$ -- known as the heteroscedastic two-limit tobit model in the econometrics literature -- are special cases of the extended-support beta regression model, depending on whether a single extra parameter is zero or infinity, respectively. To overcome identifiability issues that may arise in estimating the extra parameter due to the similarity of the beta and normal distribution for certain parameter settings, we assume that the additional parameter has an exponential distribution with an unknown mean. The associated marginal likelihood can be conveniently and accurately approximated using a Gauss-Laguerre quadrature rule, resulting in efficient estimation and inference procedures. The new model is used to analyze investment decisions in a behavioral economics experiment, where the occurrence and extent of loss aversion is of interest. In contrast to standard approaches, XBX regression can simultaneously capture the probability of rational behavior as well as the mean amount of loss aversion. Moreover, the effectiveness of the new model is illustrated through extensive numerical comparisons with alternative models.
Citations: 0
Multi-source Stable Variable Importance Measure via Adversarial Machine Learning
arXiv - STAT - Methodology Pub Date : 2024-09-11 DOI: arxiv-2409.07380
Zitao Wang, Nian Si, Zijian Guo, Molei Liu
Abstract: As part of enhancing the interpretability of machine learning, it is of renewed interest to quantify and infer the predictive importance of certain exposure covariates. Modern scientific studies often collect data from multiple sources with distributional heterogeneity. Thus, measuring and inferring stable associations across multiple environments is crucial in reliable and generalizable decision-making. In this paper, we propose MIMAL, a novel statistical framework for Multi-source stable Importance Measure via Adversarial Learning. MIMAL measures the importance of some exposure variables by maximizing the worst-case predictive reward over the source mixture. Our framework allows various machine learning methods for confounding adjustment and exposure effect characterization. For inferential analysis, the asymptotic normality of our introduced statistic is established under a general machine learning framework that requires no stronger learning accuracy conditions than those for single source variable importance. Numerical studies with various types of data generation setups and machine learning implementation are conducted to justify the finite-sample performance of MIMAL. We also illustrate our method through a real-world study of Beijing air pollution in multiple locations.
Citations: 0
Integrating Multiple Data Sources with Interactions in Multi-Omics Using Cooperative Learning
arXiv - STAT - Methodology Pub Date : 2024-09-11 DOI: arxiv-2409.07125
Matteo D'Alessandro, Theophilus Quachie Asenso, Manuela Zucknick
Abstract: Modeling with multi-omics data presents multiple challenges such as the high-dimensionality of the problem ($p \gg n$), the presence of interactions between features, and the need for integration between multiple data sources. We establish an interaction model that allows for the inclusion of multiple sources of data from the integration of two existing methods, pliable lasso and cooperative learning. The integrated model is tested both on simulation studies and on real multi-omics datasets for predicting labor onset and cancer treatment response. The results show that the model is effective in modeling multi-source data in various scenarios where interactions are present, both in terms of prediction performance and selection of relevant variables.
Citations: 0
Sequential stratified inference for the mean
arXiv - STAT - Methodology Pub Date : 2024-09-10 DOI: arxiv-2409.06680
Jacob V. Spertus, Mayuri Sridhar, Philip B. Stark
Abstract: We develop conservative tests for the mean of a bounded population using data from a stratified sample. The sample may be drawn sequentially, with or without replacement. The tests are "anytime valid," allowing optional stopping and continuation in each stratum. We call this combination of properties sequential, finite-sample, nonparametric validity. The methods express a hypothesis about the population mean as a union of intersection hypotheses describing within-stratum means. They test each intersection hypothesis using independent test supermartingales (TSMs) combined across strata by multiplication. The $P$-value of the global null hypothesis is then the maximum $P$-value of any intersection hypothesis in the union. This approach has three primary moving parts: (i) the rule for deciding which stratum to draw from next to test each intersection null, given the sample so far; (ii) the form of the TSM for each null in each stratum; and (iii) the method of combining evidence across strata. These choices interact. We examine the performance of a variety of rules with differing computational complexity. Approximately optimal methods have a prohibitive computational cost, while naive rules may be inconsistent -- they will never reject for some alternative populations, no matter how large the sample. We present a method that is statistically comparable to optimal methods in examples where optimal methods are computable, but computationally tractable for arbitrarily many strata. In numerical examples its expected sample size is substantially smaller than that of previous methods.
Citations: 0
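The combination rule stated in the abstract of "Sequential stratified inference for the mean" (multiply independent per-stratum TSMs, then take the maximum P-value over the intersection hypotheses) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the TSM values are hypothetical, the union is assumed to be a finite list of intersection hypotheses, and the conversion from a TSM value to a P-value uses the standard bound min(1, 1/value) from Ville's inequality.

```python
from math import prod


def intersection_p_value(stratum_tsms):
    """P-value for one intersection hypothesis.

    stratum_tsms holds the current values of independent test
    supermartingales, one per stratum (hypothetical inputs). Their product
    is the combined evidence, and min(1, 1/product) is the P-value.
    """
    combined = prod(stratum_tsms)
    return 1.0 if combined <= 1.0 else 1.0 / combined


def global_p_value(intersections):
    """P-value for the global null: the maximum over all intersection hypotheses."""
    return max(intersection_p_value(tsms) for tsms in intersections)


# Hypothetical TSM values for two strata under three intersection nulls.
example = [[4.0, 10.0], [2.0, 10.0], [8.0, 5.0]]
print(global_p_value(example))
```

Taking the maximum means the global null is rejected only when every intersection hypothesis in the list is rejected, which is why the least-favourable intersection drives the result.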
Ensemble Doubly Robust Bayesian Inference via Regression Synthesis
arXiv - STAT - Methodology Pub Date : 2024-09-10 DOI: arxiv-2409.06288
Kaoru Babasaki, Shonosuke Sugasawa, Kosaku Takanashi, Kenichiro McAlinn
Abstract: The doubly robust estimator, which models both the propensity score and outcomes, is a popular approach to estimate the average treatment effect in the potential outcome setting. The primary appeal of this estimator is its theoretical property, wherein the estimator achieves consistency as long as either the propensity score or outcomes is correctly specified. In most applications, however, both are misspecified, leading to considerable bias that cannot be checked. In this paper, we propose a Bayesian ensemble approach that synthesizes multiple models for both the propensity score and outcomes, which we call doubly robust Bayesian regression synthesis. Our approach applies Bayesian updating to the ensemble model weights that adapt at the unit level, incorporating data heterogeneity, to significantly mitigate misspecification bias. Theoretically, we show that our proposed approach is consistent regarding the estimation of both the propensity score and outcomes, ensuring that the doubly robust estimator is consistent, even if no single model is correctly specified. An efficient algorithm for posterior computation facilitates the characterization of uncertainty regarding the treatment effect. Our proposed approach is compared against standard and state-of-the-art methods through two comprehensive simulation studies, where we find that our approach is superior in all cases. An empirical study on the impact of maternal smoking on birth weight highlights the practical applicability of our proposed method.
Citations: 0
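The doubly robust property mentioned in the abstract of "Ensemble Doubly Robust Bayesian Inference via Regression Synthesis" can be illustrated with the textbook augmented inverse-probability-weighted (AIPW) estimator. This is a sketch of the classical estimator only, not the Bayesian regression synthesis the paper proposes; the function name, argument names, and toy data are assumptions made for illustration.

```python
def aipw_ate(y, t, e_hat, m1_hat, m0_hat):
    """Classical AIPW estimate of the average treatment effect.

    y: observed outcomes; t: treatment indicators (0 or 1);
    e_hat: fitted propensity scores P(T=1 | X);
    m1_hat, m0_hat: fitted outcome predictions under treatment and control.
    All inputs are equal-length sequences supplied by the caller.
    """
    total = 0.0
    for yi, ti, ei, m1, m0 in zip(y, t, e_hat, m1_hat, m0_hat):
        total += (m1 - m0
                  + ti * (yi - m1) / ei
                  - (1 - ti) * (yi - m0) / (1 - ei))
    return total / len(y)


# Toy data (hypothetical): treated units have outcome 3, controls have outcome 1.
y = [3.0, 1.0, 3.0, 1.0]
t = [1, 0, 1, 0]
e = [0.5, 0.5, 0.5, 0.5]

# Correct outcome models with the propensity scores above.
print(aipw_ate(y, t, e, [3.0] * 4, [1.0] * 4))
# Badly wrong outcome models (all zeros) with the same propensity scores.
print(aipw_ate(y, t, e, [0.0] * 4, [0.0] * 4))
```

On this toy data both calls give the same estimate, because the inverse-probability-weighted residual term corrects for the wrong outcome models when the propensity scores are right.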
Nonparametric Inference for Balance in Signed Networks
arXiv - STAT - Methodology Pub Date : 2024-09-10 DOI: arxiv-2409.06172
Xuyang Chen, Yinjie Wang, Weijing Tang
Abstract: In many real-world networks, relationships often go beyond simple dyadic presence or absence; they can be positive, like friendship, alliance, and mutualism, or negative, characterized by enmity, disputes, and competition. To understand the formation mechanism of such signed networks, the social balance theory sheds light on the dynamics of positive and negative connections. In particular, it characterizes the proverbs, "a friend of my friend is my friend" and "an enemy of my enemy is my friend". In this work, we propose a nonparametric inference approach for assessing empirical evidence for the balance theory in real-world signed networks. We first characterize the generating process of signed networks with node exchangeability and propose a nonparametric sparse signed graphon model. Under this model, we construct confidence intervals for the population parameters associated with balance theory and establish their theoretical validity. Our inference procedure is as computationally efficient as a simple normal approximation but offers higher-order accuracy. By applying our method, we find strong real-world evidence for balance theory in signed networks across various domains, extending its applicability beyond social psychology.
Citations: 0
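The two proverbs quoted in the abstract of "Nonparametric Inference for Balance in Signed Networks" correspond to the usual triangle rule of balance theory: a triangle is balanced when the product of its three edge signs is positive. The sketch below computes the share of balanced triangles in a small signed graph. It is a descriptive statistic only, not the paper's graphon-based inference procedure, and the function name, edge representation, and example network are assumptions.

```python
from itertools import combinations


def balance_ratio(signs):
    """Fraction of closed triangles whose edge-sign product is positive.

    signs maps frozenset({u, v}) to +1 or -1 for each observed edge.
    Returns NaN when the graph contains no closed triangle.
    """
    nodes = sorted({v for edge in signs for v in edge})
    balanced = total = 0
    for a, b, c in combinations(nodes, 3):
        edges = [frozenset(pair) for pair in ((a, b), (b, c), (a, c))]
        if all(edge in signs for edge in edges):
            total += 1
            if signs[edges[0]] * signs[edges[1]] * signs[edges[2]] > 0:
                balanced += 1
    return balanced / total if total else float("nan")


# Hypothetical four-node network: +1 is a friendly tie, -1 a hostile one.
network = {
    frozenset({1, 2}): +1, frozenset({2, 3}): +1, frozenset({1, 3}): +1,
    frozenset({1, 4}): -1, frozenset({2, 4}): -1, frozenset({3, 4}): +1,
}
print(balance_ratio(network))
```

In the example, triangles {1, 2, 3} (all friends) and {1, 2, 4} (two friends with a common enemy) are balanced, while {1, 3, 4} and {2, 3, 4} are not.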
Optimizing Sample Size for Supervised Machine Learning with Bulk Transcriptomic Sequencing: A Learning Curve Approach
arXiv - STAT - Methodology Pub Date : 2024-09-10 DOI: arxiv-2409.06180
Yunhui Qi, Xinyi Wang, Li-Xuan Qin
Abstract: Accurate sample classification using transcriptomics data is crucial for advancing personalized medicine. Achieving this goal necessitates determining a suitable sample size that ensures adequate statistical power without undue resource allocation. Current sample size calculation methods rely on assumptions and algorithms that may not align with supervised machine learning techniques for sample classification. Addressing this critical methodological gap, we present a novel computational approach that establishes the power-versus-sample-size relationship by employing a data augmentation strategy followed by fitting a learning curve. We comprehensively evaluated its performance for microRNA and RNA sequencing data, considering diverse data characteristics and algorithm configurations, based on a spectrum of evaluation metrics. To foster accessibility and reproducibility, the Python and R code for implementing our approach is available on GitHub. Its deployment will significantly facilitate the adoption of machine learning in transcriptomics studies and accelerate their translation into clinically useful classifiers for personalized treatment.
Citations: 0
A new paradigm for global sensitivity analysis
arXiv - STAT - Methodology Pub Date : 2024-09-10 DOI: arxiv-2409.06271
Gildas Mazo (MaIAGE)
Abstract: Current theory of global sensitivity analysis, based on a nonlinear functional ANOVA decomposition of the random output, is limited in scope (for instance, the analysis is limited to the output's variance and the inputs have to be mutually independent) and leads to sensitivity indices the interpretation of which is not fully clear, especially interaction effects. Alternatively, sensitivity indices built for arbitrary user-defined importance measures have been proposed but a theory to define interactions in a systematic fashion and/or establish a decomposition of the total importance measure is still missing. It is shown that these important problems are solved all at once by adopting a new paradigm. By partitioning the inputs into those causing the change in the output and those which do not, arbitrary user-defined variability measures are identified with the outcomes of a factorial experiment at two levels, leading to all factorial effects without assuming any functional decomposition. To link various well-known sensitivity indices of the literature (Sobol indices and Shapley effects), weighted factorial effects are studied and utilized.
Citations: 0
This is not normal! (Re-) Evaluating the lower $n$ guidelines for regression analysis
arXiv - STAT - Methodology Pub Date : 2024-09-10 DOI: arxiv-2409.06413
David Randahl
Abstract: The commonly cited rule of thumb for regression analysis, which suggests that a sample size of $n \geq 30$ is sufficient to ensure valid inferences, is frequently referenced but rarely scrutinized. This research note evaluates the lower bound for the number of observations required for regression analysis by exploring how different distributional characteristics, such as skewness and kurtosis, influence the convergence of t-values to the t-distribution in linear regression models. Through an extensive simulation study involving over 22 billion regression models, this paper examines a range of symmetric, platykurtic, and skewed distributions, testing sample sizes from 4 to 10,000. The results reveal that it is sufficient that either the dependent or independent variable follow a symmetric distribution for the t-values to converge to the t-distribution at much smaller sample sizes than $n = 30$. This is contrary to previous guidance which suggests that the error term needs to be normally distributed for this convergence to happen at low $n$. On the other hand, if both dependent and independent variables are highly skewed the required sample size is substantially higher. In cases of extreme skewness, even sample sizes of 10,000 do not ensure convergence. These findings suggest that the $n \geq 30$ rule is too permissive in certain cases but overly conservative in others, depending on the underlying distributional characteristics. This study offers revised guidelines for determining the minimum sample size necessary for valid regression analysis.
Citations: 0