arXiv: Methodology, latest literature

Revisiting Empirical Bayes Methods and Applications to Special Types of Data

arXiv: Methodology Pub Date : 2021-06-29 DOI: 10.20381/RUOR-26562

XiuWen Duan

Citations: 0

Flexible Bayesian modelling of concomitant covariate effects in mixture models

arXiv: Methodology Pub Date : 2021-05-26 DOI: 10.48676/UNIBO/AMSDOTTORATO/9861

Marco Berrettini, G. Galimberti, Saverio Ranciati, T. B. Murphy

Abstract: Mixture models provide a useful tool to account for unobserved heterogeneity, and are the basis of many model-based clustering methods. In order to gain additional flexibility, some model parameters can be expressed as functions of concomitant covariates. In particular, prior probabilities of latent group membership can be linked to concomitant covariates through a multinomial logistic regression model, where each of these so-called component weights is associated with a linear predictor involving one or more of these variables. In this thesis, this approach is extended by replacing the linear predictors with additive ones, where the contributions of some or all concomitant covariates can be represented by smooth functions. An estimation procedure within the Bayesian paradigm is proposed. In particular, a data augmentation scheme based on difference random utility models is exploited, and smoothness of the covariate effects is controlled by suitable choices for the prior distributions of the spline coefficients. This methodology is then extended to include flexible covariate effects also on the component densities. The performance of the proposed methodologies is investigated via simulation experiments and applications to real data. The content of the thesis is organized as follows. In Chapter 1, a literature review about mixture models and mixture models with covariate effects is provided. After a brief introduction to Bayesian additive models with P-splines, the general specification for the proposed method is presented in Chapter 2, together with the associated Bayesian inference procedure. This approach is adapted to the specific case of categorical and continuous manifest variables in Chapter 3 and Chapter 4, respectively. In Chapter 5, the proposed methodology is extended to include flexible covariate effects also in the component densities. Finally, conclusions and remarks on the thesis are collected in Chapter 6.

Citations: 3
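
The abstract above links the component weights to concomitant covariates through a multinomial logistic regression. As a minimal sketch of that baseline only (not of the thesis's additive, spline-based extension or its Bayesian estimation), the weights for one observation are a softmax of per-component linear predictors. The function name, the reference-component convention, and all coefficient values below are assumptions made for illustration.

```python
import math


def component_weights(x, betas):
    """Prior probabilities of latent group membership for one observation.

    x      -- concomitant covariate values (hypothetical input, no intercept)
    betas  -- one coefficient list per component: [intercept, slope_1, ...];
              by convention here the first component is the reference,
              with all coefficients set to zero
    """
    # Linear predictor of each component.
    etas = [b[0] + sum(bj * xj for bj, xj in zip(b[1:], x)) for b in betas]
    # Softmax, shifted by the maximum for numerical stability.
    top = max(etas)
    exps = [math.exp(eta - top) for eta in etas]
    total = sum(exps)
    return [e / total for e in exps]


# Three components, one covariate; the coefficients are made up.
betas = [[0.0, 0.0], [0.2, 1.0], [-0.3, -2.0]]
weights = component_weights([0.5], betas)
print(weights)
```

Replacing each linear predictor with a sum of smooth functions of the covariates gives the additive form the abstract describes; the softmax step stays the same.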

A Critique of Differential Abundance Analysis, and Advocacy for an Alternative

arXiv: Methodology Pub Date : 2021-04-14 DOI: 10.5281/ZENODO.4692004

Thomas P. Quinn, E. Gordon-Rodríguez, Ionas Erb

Citations: 7

Post-Processing of MCMC

arXiv: Methodology Pub Date : 2021-03-30 DOI: 10.1146/ANNUREVSTATISTICS-040220-091727

Leah F. South, M. Riabiz, Onur Teymur, C. Oates

Citations: 0

Conditional variance estimator for sufficient dimension reduction

arXiv: Methodology Pub Date : 2021-02-17 DOI: 10.3150/21-bej1402

L. Fertl, E. Bura

Citations: 5

The Violating Assumptions Series: Simulated demonstrations to illustrate how assumptions can affect statistical estimates

arXiv: Methodology Pub Date : 2021-01-18 DOI: 10.13140/RG.2.2.13339.69921

Ian A. Silver

Citations: 0

A Feature Weighted Mixed Naive Bayes Model for Monitoring Anomalies in the Fan System of a Thermal Power Plant

arXiv: Methodology Pub Date : 2020-12-14 DOI: 10.1109/JAS.2020.000000

Min Wang, Li Sheng, Donghua Zhou, Maoyin Chen

Abstract: With increasing intelligence and integration, a great number of two-valued variables (generally stored as 0 or 1) often exist in large-scale industrial processes. However, these variables cannot be effectively handled by traditional monitoring methods such as LDA, PCA and PLS. Recently, a mixed hidden naive Bayesian model (MHNBM) was developed for the first time to utilize both two-valued and continuous variables for abnormality monitoring. Although MHNBM is effective, it still has some shortcomings. In MHNBM, the variables with greater correlation to other variables have greater weights, which cannot guarantee that greater weights are assigned to the more discriminating variables. In addition, the conditional probability must be computed from the historical data. When the training data are scarce, the conditional probability between continuous variables tends to be uniformly distributed, which affects the performance of MHNBM. Here a novel feature weighted mixed naive Bayes model (FWMNBM) is developed to overcome these shortcomings. In FWMNBM, the variables that are more correlated with the class have greater weights, which makes the more discriminating variables contribute more to the model. At the same time, FWMNBM does not have to calculate the conditional probability between variables, so it is less restricted by the number of training samples. Compared with MHNBM, FWMNBM has better performance, and its effectiveness is validated through the numerical cases of a simulation example and a practical case of the Zhoushan thermal power plant (ZTPP), China.

Citations: 7
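
The abstract above describes a naive Bayes classifier over two-valued and continuous variables in which more discriminating variables receive larger weights. A rough sketch of one common way to write such a classifier follows: each variable's log-likelihood is multiplied by its weight before summing. This exponent-style weighting, the Bernoulli and Gaussian likelihoods, the function names, and every number below are assumptions for illustration; the paper's own weighting scheme is not reproduced here, and the weights are simply supplied by hand.

```python
import math


def log_bernoulli(x, p):
    """Log-probability of a two-valued observation x in {0, 1}."""
    return math.log(p if x == 1 else 1.0 - p)


def log_gaussian(x, mean, std):
    """Log-density of a continuous observation under a normal model."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def weighted_log_posterior(binary, continuous, model, weights):
    """Unnormalised log-posterior of each class with per-feature weights.

    model[c] = (prior, [p_j per two-valued feature],
                [(mean, std) per continuous feature])
    weights  = (weights of two-valued features, weights of continuous features)
    """
    w_bin, w_con = weights
    scores = {}
    for label, (prior, probs, gaussians) in model.items():
        score = math.log(prior)
        for x, p, w in zip(binary, probs, w_bin):
            score += w * log_bernoulli(x, p)
        for x, (mean, std), w in zip(continuous, gaussians, w_con):
            score += w * log_gaussian(x, mean, std)
        scores[label] = score
    return scores


# Made-up parameters: one two-valued and one continuous variable.
model = {
    "normal": (0.9, [0.1], [(50.0, 2.0)]),
    "fault": (0.1, [0.8], [(60.0, 3.0)]),
}
weights = ([1.5], [0.5])
scores = weighted_log_posterior([1], [58.0], model, weights)
predicted = max(scores, key=scores.get)
print(predicted, scores)
```

With all weights equal to 1 the score reduces to ordinary naive Bayes, which makes the effect of the weights easy to isolate when experimenting.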

A Generalized Heckman Model With Varying Sample Selection Bias and Dispersion Parameters

arXiv: Methodology Pub Date : 2020-12-03 DOI: 10.5705/SS.202021.0068

F. D. S. Bastos, W. Barreto‐Souza, M. Genton

Abstract: Many proposals have emerged as alternatives to the Heckman selection model, mainly to address the non-robustness of its normal assumption. The 2001 Medical Expenditure Panel Survey data is often used to illustrate this non-robustness of the Heckman model. In this paper, we propose a generalization of the Heckman sample selection model by allowing the sample selection bias and dispersion parameters to depend on covariates. We show that the non-robustness of the Heckman model may be due to the assumption of the constant sample selection bias parameter rather than the normality assumption. Our proposed methodology allows us to understand which covariates are important to explain the sample selection bias phenomenon rather than to only form conclusions about its presence. We explore the inferential aspects of the maximum likelihood estimators (MLEs) for our proposed generalized Heckman model. More specifically, we show that this model satisfies some regularity conditions such that it ensures consistency and asymptotic normality of the MLEs. Proper score residuals for sample selection models are provided, and model adequacy is addressed. Simulated results are presented to check the finite-sample behavior of the estimators and to verify the consequences of not considering varying sample selection bias and dispersion parameters. We show that the normal assumption for analyzing medical expenditure data is suitable and that the conclusions drawn using our approach are coherent with findings from prior literature. Moreover, we identify which covariates are relevant to explain the presence of sample selection bias in this important dataset.

Citations: 2
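
For orientation, the model in the abstract above can be written out as a sketch. The outcome and selection equations and the bivariate normal errors are the standard Heckman setup; the log and arctanh link functions, and the symbols for the covariate vectors and coefficients, are assumptions chosen here because they keep the dispersion positive and the correlation inside (-1, 1), and may differ from the authors' notation.

```latex
\begin{aligned}
Y_i^{*} &= \mathbf{x}_i^{\top}\boldsymbol{\beta} + \varepsilon_{1i}, \\
U_i^{*} &= \mathbf{w}_i^{\top}\boldsymbol{\gamma} + \varepsilon_{2i}, \\
Y_i &\text{ is observed only if } U_i^{*} > 0, \\
\begin{pmatrix}\varepsilon_{1i}\\ \varepsilon_{2i}\end{pmatrix}
&\sim N\!\left(\begin{pmatrix}0\\0\end{pmatrix},
\begin{pmatrix}\sigma_i^{2} & \rho_i\sigma_i\\ \rho_i\sigma_i & 1\end{pmatrix}\right), \\
\log \sigma_i &= \mathbf{z}_i^{\top}\boldsymbol{\lambda}, \qquad
\operatorname{arctanh}\rho_i = \mathbf{v}_i^{\top}\boldsymbol{\kappa}.
\end{aligned}
```

The classical Heckman model is the special case in which the dispersion and the selection correlation are the same for every observation, so that only intercepts appear in the last line.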

Double/debiased machine learning for logistic partially linear model

arXiv: Methodology Pub Date : 2020-09-30 DOI: 10.1093/ECTJ/UTAB019

Molei Liu, Yi Zhang, D. Zhou

Abstract: We propose double/debiased machine learning approaches to infer (at the parametric rate) the parametric component of a logistic partially linear model with the binary response following a conditional logistic model of a low dimensional linear parametric function of some key (exposure) covariates and a nonparametric function adjusting for the confounding effect of other covariates. We consider a Neyman orthogonal (doubly robust) score equation consisting of two nuisance functions: the nonparametric component in the logistic model and the conditional mean of the exposure on the other covariates and with the response fixed. To estimate the nuisance models, we separately consider the use of high dimensional (HD) sparse parametric models and more general (typically nonparametric) machine learning (ML) methods. In the HD case, we derive certain moment equations to calibrate the first-order bias of the nuisance models and grant our method a model double robustness property in the sense that our estimator achieves the desirable rate when at least one of the nuisance models is correctly specified and both of them are ultra-sparse. In the ML case, the non-linearity of the logit link makes it substantially harder than the partially linear setting to use an arbitrary conditional mean learning algorithm to estimate the nuisance component of the logistic model. We handle this obstacle through a novel full model refitting procedure that is easy to implement and facilitates the use of nonparametric ML algorithms in our framework. Our ML estimator is rate doubly robust in the same sense as Chernozhukov et al. (2018a). We evaluate our methods through simulation studies and apply them in assessing the effect of emergency contraceptive (EC) pill on early gestation foetal with a policy reform in Chile in 2008 (Bentancor and Clarke, 2017).

Citations: 12
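
The abstract above can be made concrete with a short sketch. The first line is the logistic partially linear model it describes, with exposure A, other covariates X, parameter of interest beta and nonparametric component r. The second line is one score of the doubly robust type, in which m(X) is taken to be the conditional mean of A given X among units with Y = 0; that reading of "with the response fixed", the symbols, and the exact form of the score are assumptions here and may differ in detail from the authors' equations. The third line checks that the first factor has conditional mean zero when r is correct, which is why the score remains unbiased for any choice of m.

```latex
\begin{aligned}
\operatorname{logit} P(Y = 1 \mid A, X) &= \beta^{\top} A + r(X), \\
\psi(\beta; r, m) &= \bigl\{ Y e^{-\beta^{\top} A - r(X)} - (1 - Y) \bigr\}\,\bigl\{ A - m(X) \bigr\}, \\
E\bigl[ Y e^{-\beta^{\top} A - r(X)} - (1 - Y) \,\big|\, A, X \bigr]
&= \frac{e^{\eta}}{1 + e^{\eta}}\, e^{-\eta} - \frac{1}{1 + e^{\eta}} = 0,
\qquad \eta = \beta^{\top} A + r(X).
\end{aligned}
```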

On Mendelian Randomization Mixed-Scale Treatment Effect Robust Identification (MR MiSTERI) and Estimation for Causal Inference

arXiv: Methodology Pub Date : 2020-09-30 DOI: 10.1101/2020.09.29.20204420

Z. Liu, T. Ye, B. Sun, M. Schooling, E. T. Tchetgen Tchetgen

Abstract: Standard Mendelian randomization analysis can produce biased results if the genetic variant defining the instrumental variable (IV) is confounded and/or has a horizontal pleiotropic effect on the outcome of interest not mediated by the treatment. We provide novel identification conditions for the causal effect of a treatment in the presence of unmeasured confounding, by leveraging an invalid IV for which both the IV independence and exclusion restriction assumptions may be violated. The proposed Mendelian randomization Mixed-Scale Treatment Effect Robust Identification (MR MiSTERI) approach relies on (i) an assumption that the treatment effect does not vary with the invalid IV on the additive scale; (ii) that the selection bias due to confounding does not vary with the invalid IV on the odds ratio scale; and (iii) that the residual variance for the outcome is heteroscedastic and thus varies with the invalid IV. Although assumptions (i) and (ii) have each appeared in the IV literature, assumption (iii) has not; we formally establish that their conjunction can identify a causal effect even with an invalid IV subject to pleiotropy. MiSTERI is shown to be particularly advantageous in the presence of pervasive heterogeneity of pleiotropic effects on the additive scale, a setting in which two recently proposed robust estimation methods, MR GxE and MR GENIUS, can be severely biased. For estimation, we propose a simple and consistent three-stage estimator that can be used as a preliminary estimator to a carefully constructed one-step-update estimator, which is guaranteed to be more efficient under the assumed model. In order to incorporate multiple, possibly correlated and weak IVs, a common challenge in MR studies, we develop a MAny Weak Invalid Instruments (MR MaWII MiSTERI) approach for strengthened identification and improved accuracy. We have developed an R package MR-MiSTERI for public use of all proposed methods. We illustrate MR MiSTERI in an application using UK Biobank data to evaluate the causal relationship between body mass index and glucose, thus obtaining inferences that are robust to unmeasured confounding, leveraging many weak and potentially invalid candidate genetic IVs. MaWII MiSTERI is shown to be robust to horizontal pleiotropy, violation of the IV independence assumption and weak IV bias. Both simulation studies and real data analysis results demonstrate the robustness of the proposed MR MiSTERI methods.

Citations: 8