arXiv: Methodology: Latest Papers

Revisiting Empirical Bayes Methods and Applications to Special Types of Data
arXiv: Methodology Pub Date: 2021-06-29 DOI: 10.20381/RUOR-26562
XiuWen Duan
Abstract: Empirical Bayes methods have been around for a long time and have a wide range of applications. These methods provide a way in which historical data can be aggregated to provide estimates of the posterior mean. This thesis revisits some of the empirical Bayesian methods and develops new applications. We first look at a linear empirical Bayes estimator and apply it to ranking and symbolic data. Next, we consider Tweedie's formula and show how it can be applied to analyze a microarray dataset. The application of the formula is simplified with the Pearson system of distributions. Saddlepoint approximations enable us to generalize several results in this direction. The results show that the proposed methods perform well in applications to real data sets.
Citations: 0
Flexible Bayesian modelling of concomitant covariate effects in mixture models
arXiv: Methodology Pub Date: 2021-05-26 DOI: 10.48676/UNIBO/AMSDOTTORATO/9861
Marco Berrettini, G. Galimberti, Saverio Ranciati, T. B. Murphy
Abstract: Mixture models provide a useful tool to account for unobserved heterogeneity, and are the basis of many model-based clustering methods. In order to gain additional flexibility, some model parameters can be expressed as functions of concomitant covariates. In particular, prior probabilities of latent group membership can be linked to concomitant covariates through a multinomial logistic regression model, where each of these so-called component weights is associated with a linear predictor involving one or more of these variables. In this Thesis, this approach is extended by replacing the linear predictors with additive ones, where the contributions of some/all concomitant covariates can be represented by smooth functions. An estimation procedure within the Bayesian paradigm is proposed. In particular, a data augmentation scheme based on difference random utility models is exploited, and smoothness of the covariate effects is controlled by suitable choices for the prior distributions of the spline coefficients. This methodology is then extended to include flexible covariate effects also on the component densities. The performance of the proposed methodologies is investigated via simulation experiments and applications to real data. The content of the Thesis is organized as follows. In Chapter 1, a literature review about mixture models and mixture models with covariate effects is provided. After a brief introduction on Bayesian additive models with P-splines, the general specification for the proposed method is presented in Chapter 2, together with the associated Bayesian inference procedure. This approach is adapted to the specific case of categorical and continuous manifest variables in Chapter 3 and Chapter 4, respectively. In Chapter 5, the proposed methodology is extended to include flexible covariate effects also in the component densities. Finally, conclusions and remarks on the Thesis are collected in Chapter 6.
Citations: 3
A Critique of Differential Abundance Analysis, and Advocacy for an Alternative
arXiv: Methodology Pub Date: 2021-04-14 DOI: 10.5281/ZENODO.4692004
Thomas P. Quinn, E. Gordon-Rodríguez, Ionas Erb
Abstract: It is largely taken for granted that differential abundance analysis is, by default, the best first step when analyzing genomic data. We argue that this is not necessarily the case. In this article, we identify key limitations that are intrinsic to differential abundance analysis: it is (a) dependent on unverifiable assumptions, (b) an unreliable construct, and (c) overly reductionist. We formulate an alternative framework called ratio-based biomarker analysis which does not suffer from the identified limitations. Moreover, ratio-based biomarkers are highly flexible. Beyond replacing DAA, they can also be used for many other bespoke analyses, including dimension reduction and multi-omics data integration.
Citations: 7
Post-Processing of MCMC
arXiv: Methodology Pub Date: 2021-03-30 DOI: 10.1146/ANNUREVSTATISTICS-040220-091727
Leah F. South, M. Riabiz, Onur Teymur, C. Oates
Abstract: Markov chain Monte Carlo is the engine of modern Bayesian statistics, being used to approximate the posterior and derived quantities of interest. Despite this, the issue of how the output from a Markov chain is post-processed and reported is often overlooked. Convergence diagnostics can be used to control bias via burn-in removal, but these do not account for (common) situations where a limited computational budget engenders a bias-variance trade-off. The aim of this article is to review state-of-the-art techniques for post-processing Markov chain output. Our review covers methods based on discrepancy minimisation, which directly address the bias-variance trade-off, as well as general-purpose control variate methods for approximating expected quantities of interest.
Citations: 0
Conditional variance estimator for sufficient dimension reduction
arXiv: Methodology Pub Date: 2021-02-17 DOI: 10.3150/21-bej1402
L. Fertl, E. Bura
Abstract: Conditional Variance Estimation (CVE) is a novel sufficient dimension reduction (SDR) method for additive error regressions with continuous predictors and link function. It operates under the assumption that the predictors can be replaced by a lower dimensional projection without loss of information. In contrast to the majority of moment based sufficient dimension reduction methods, Conditional Variance Estimation is fully data driven, does not require the restrictive linearity and constant variance conditions, and is not based on inverse regression. CVE is shown to be consistent and its objective function to be uniformly convergent. CVE outperforms minimum average variance estimation (MAVE), its main competitor, in several simulation settings and remains on par in others, while it always outperforms the usual inverse regression based linear SDR methods, such as Sliced Inverse Regression.
Citations: 5
The Violating Assumptions Series: Simulated demonstrations to illustrate how assumptions can affect statistical estimates
arXiv: Methodology Pub Date: 2021-01-18 DOI: 10.13140/RG.2.2.13339.69921
Ian A. Silver
Abstract: When teaching and discussing statistical assumptions, our focus is oftentimes placed on how to test and address potential violations rather than the effects of violating assumptions on the estimates produced by our statistical models. The latter represents a potential avenue to help us better understand the impact of researcher degrees of freedom on the statistical estimates we produce. The Violating Assumptions Series is an endeavor I have undertaken to demonstrate the effects of violating assumptions on the estimates produced across various statistical models. The series will review assumptions associated with estimating causal associations, as well as more complicated statistical models including, but not limited to, multilevel models, path models, structural equation models, and Bayesian models. In addition to the primary goal, the series of posts is designed to illustrate how simulations can be used to develop a comprehensive understanding of applied statistics.
Citations: 0
A Feature Weighted Mixed Naive Bayes Model for Monitoring Anomalies in the Fan System of a Thermal Power Plant
arXiv: Methodology Pub Date: 2020-12-14 DOI: 10.1109/JAS.2020.000000
Min Wang, Li Sheng, Donghua Zhou, Maoyin Chen
Abstract: With increasing intelligence and integration, a great number of two-valued variables (generally stored in the form of 0 or 1) often exist in large-scale industrial processes. However, these variables cannot be effectively handled by traditional monitoring methods such as LDA, PCA and PLS. Recently, a mixed hidden naive Bayesian model (MHNBM) was developed for the first time to utilize both two-valued and continuous variables for abnormality monitoring. Although MHNBM is effective, it still has some shortcomings that need to be addressed. For MHNBM, the variables with greater correlation to other variables have greater weights, which cannot guarantee that greater weights are assigned to the more discriminating variables. In addition, the conditional probability must be computed based on the historical data. When the training data is scarce, the conditional probability between continuous variables tends to be uniformly distributed, which affects the performance of MHNBM. Here a novel feature weighted mixed naive Bayes model (FWMNBM) is developed to overcome the above shortcomings. For FWMNBM, the variables that are more correlated to the class have greater weights, which makes the more discriminating variables contribute more to the model. At the same time, FWMNBM does not have to calculate the conditional probability between variables, thus it is less restricted by the number of training data samples. Compared with MHNBM, FWMNBM has better performance, and its effectiveness is validated through the numerical cases of a simulation example and a practical case of Zhoushan thermal power plant (ZTPP), China.
Citations: 7
A Generalized Heckman Model With Varying Sample Selection Bias and Dispersion Parameters
arXiv: Methodology Pub Date: 2020-12-03 DOI: 10.5705/SS.202021.0068
F. D. S. Bastos, W. Barreto‐Souza, M. Genton
Abstract: Many proposals have emerged as alternatives to the Heckman selection model, mainly to address the non-robustness of its normal assumption. The 2001 Medical Expenditure Panel Survey data is often used to illustrate this non-robustness of the Heckman model. In this paper, we propose a generalization of the Heckman sample selection model by allowing the sample selection bias and dispersion parameters to depend on covariates. We show that the non-robustness of the Heckman model may be due to the assumption of the constant sample selection bias parameter rather than the normality assumption. Our proposed methodology allows us to understand which covariates are important to explain the sample selection bias phenomenon rather than to only form conclusions about its presence. We explore the inferential aspects of the maximum likelihood estimators (MLEs) for our proposed generalized Heckman model. More specifically, we show that this model satisfies some regularity conditions such that it ensures consistency and asymptotic normality of the MLEs. Proper score residuals for sample selection models are provided, and model adequacy is addressed. Simulated results are presented to check the finite-sample behavior of the estimators and to verify the consequences of not considering varying sample selection bias and dispersion parameters. We show that the normal assumption for analyzing medical expenditure data is suitable and that the conclusions drawn using our approach are coherent with findings from prior literature. Moreover, we identify which covariates are relevant to explain the presence of sample selection bias in this important dataset.
Citations: 2
Double/debiased machine learning for logistic partially linear model
arXiv: Methodology Pub Date: 2020-09-30 DOI: 10.1093/ECTJ/UTAB019
Molei Liu, Yi Zhang, D. Zhou
Abstract: We propose double/debiased machine learning approaches to infer (at the parametric rate) the parametric component of a logistic partially linear model with the binary response following a conditional logistic model of a low dimensional linear parametric function of some key (exposure) covariates and a nonparametric function adjusting for the confounding effect of other covariates. We consider a Neyman orthogonal (doubly robust) score equation consisting of two nuisance functions: nonparametric component in the logistic model and conditional mean of the exposure on the other covariates and with the response fixed. To estimate the nuisance models, we separately consider the use of high dimensional (HD) sparse parametric models and more general (typically nonparametric) machine learning (ML) methods. In the HD case, we derive certain moment equations to calibrate the first-order bias of the nuisance models and grant our method a model double robustness property in the sense that our estimator achieves the desirable rate when at least one of the nuisance models is correctly specified and both of them are ultra-sparse. In the ML case, the non-linearity of the logit link makes it substantially harder than the partially linear setting to use an arbitrary conditional mean learning algorithm to estimate the nuisance component of the logistic model. We handle this obstacle through a novel full model refitting procedure that is easy-to-implement and facilitates the use of nonparametric ML algorithms in our framework. Our ML estimator is rate doubly robust in the same sense as Chernozhukov et al. (2018a). We evaluate our methods through simulation studies and apply them in assessing the effect of emergency contraceptive (EC) pill on early gestation foetal with a policy reform in Chile in 2008 (Bentancor and Clarke, 2017).
Citations: 12
On Mendelian Randomization Mixed-Scale Treatment Effect Robust Identification (MR MiSTERI) and Estimation for Causal Inference
arXiv: Methodology Pub Date: 2020-09-30 DOI: 10.1101/2020.09.29.20204420
Z. Liu, T. Ye, B. Sun, M. Schooling, E. T. Tchetgen Tchetgen
Abstract: Standard Mendelian randomization analysis can produce biased results if the genetic variant defining the instrumental variable (IV) is confounded and/or has a horizontal pleiotropic effect on the outcome of interest not mediated by the treatment. We provide novel identification conditions for the causal effect of a treatment in presence of unmeasured confounding, by leveraging an invalid IV for which both the IV independence and exclusion restriction assumptions may be violated. The proposed Mendelian randomization Mixed-Scale Treatment Effect Robust Identification (MR MiSTERI) approach relies on (i) an assumption that the treatment effect does not vary with the invalid IV on the additive scale; and (ii) that the selection bias due to confounding does not vary with the invalid IV on the odds ratio scale; and (iii) that the residual variance for the outcome is heteroscedastic and thus varies with the invalid IV. Although assumptions (i) and (ii) have, respectively, appeared in the IV literature, assumption (iii) has not; we formally establish that their conjunction can identify a causal effect even with an invalid IV subject to pleiotropy. MiSTERI is shown to be particularly advantageous in presence of pervasive heterogeneity of pleiotropic effects on additive scale, a setting in which two recently proposed robust estimation methods MR GxE and MR GENIUS can be severely biased. For estimation, we propose a simple and consistent three-stage estimator that can be used as preliminary estimator to a carefully constructed one-step-update estimator, which is guaranteed to be more efficient under the assumed model. In order to incorporate multiple, possibly correlated and weak IVs, a common challenge in MR studies, we develop a MAny Weak Invalid Instruments (MR MaWII MiSTERI) approach for strengthened identification and improved accuracy. We have developed an R package MR-MiSTERI for public use of all proposed methods. We illustrate MR MiSTERI in an application using UK Biobank data to evaluate the causal relationship between body mass index and glucose, thus obtaining inferences that are robust to unmeasured confounding, leveraging many weak and potentially invalid candidate genetic IVs. MaWII MiSTERI is shown to be robust to horizontal pleiotropy, violation of IV independence assumption and weak IV bias. Both simulation studies and real data analysis results demonstrate the robustness of the proposed MR MiSTERI methods.
Citations: 8