Enhancing Power of Score Tests for Regression Models via Fisher Transformation

Journal of the Japanese Society of Computational Statistics Pub Date : 2018-04-20 DOI:10.5183/JJSCS.1702001_234

Masao Ueki


Citations: 1



ENHANCING POWER OF SCORE TESTS FOR REGRESSION MODELS VIA FISHER TRANSFORMATION

A simple method is presented to enhance the statistical power of score tests for regression models via the Fisher transformation (or Fisher's z-transformation) by exploiting a relationship with the partial correlation coefficient. Simulation studies mimicking marginal association and gene-environment interaction analyses for genome-wide association studies (GWASs) under a case-control design demonstrate that the Fisher transformation enhances the power of the score tests while maintaining the type I error asymptotically. The smaller the sample size, the more pronounced the enhancement, at the expense of an inflated type I error because the asymptotic approximation becomes invalid. Accordingly, the proposed method may be applied when the sample size is large enough for the asymptotic approximation to be valid. An illustration with real GWAS data is also presented.

1. Fisher-transformation of score tests for regression models

Suppose that $n$ response variables $\mathbf{y} = (y_1, \ldots, y_n)^T$ and an $n \times p$ design matrix $X = (\mathbf{x}_1, \ldots, \mathbf{x}_n)^T$ are observed, where $\mathbf{x}_i$ is a $p$-dimensional column vector of explanatory variables for subject $i \in \{1, \ldots, n\}$. Let $f(y_i \mid \mathbf{x}_i)$ denote the probability distribution of $y_i$ conditional on $\mathbf{x}_i$ for each $i$. Here, the probability density function of a continuous random variable or the probability mass function of a discrete random variable is referred to as a probability distribution (Dobson, 2002).

Assume that the conditional expectation of $y_i$, transformed through some differentiable monotone function (i.e. the link function), is written as $\mathbf{x}_i^T \boldsymbol{\beta}$, in which $\boldsymbol{\beta}$ is the vector of the corresponding $p$ regression coefficients. Then, denote the log-likelihood for the $i$th sample by $l(\mathbf{x}_i^T \boldsymbol{\beta}) = \log f(y_i \mid \mathbf{x}_i)$. Throughout, it is assumed that each $y_i$ is independently distributed given $\mathbf{x}_i$. The above regression framework includes the generalized linear models (McCullagh and Nelder, 1989; Dobson, 2002) and regression with heavy-tailed error distribution (Lange and Sinsheimer, 1993).

Suppose that $X$ is partitioned into two parts as $(X_1, X_2)$, where $X_1$ is a collection of $q$ ($q < p$) explanatory variables to be tested for association with $\mathbf{y}$ and $X_2$ is a set of $p - q$ covariates to be adjusted for. Correspondingly, let $\boldsymbol{\beta} = (\boldsymbol{\beta}_1^T, \boldsymbol{\beta}_2^T)^T$ and $\mathbf{x}_i = (\mathbf{x}_{1,i}^T, \mathbf{x}_{2,i}^T)^T$. In this article, $X$ is assumed to be of full column rank.

1.1. Fisher-transformed score test: single parameter case

This subsection considers the case of $q = 1$, and hence the corresponding regression coefficient is written as $\beta_1$ with a non-bold letter. In genome-wide association study (GWAS)

* Biostatistics Center, Kurume University, 67 Asahi-machi, Kurume, Fukuoka 830-0011, Japan. Present affiliation: Statistical Genetics Team, RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan. E-mail: uekimrsd@nifty.com
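
The link between the score test and the partial correlation coefficient mentioned in the abstract can be illustrated for the special case of a normal linear model with q = 1. In that case, with the error variance estimated under the null hypothesis, the score statistic for β1 = 0 equals n r², where r is the partial correlation between y and the tested variable given the adjustment covariates. The sketch below is an illustration under stated assumptions, not the paper's procedure: the function names are hypothetical, the data are simulated, and the Fisher-transformed statistic is taken to be n·atanh(r)², a scaling that is assumed here because the excerpt does not give the exact form. Since |atanh(r)| ≥ |r| for every |r| < 1, the transformed statistic is never smaller than the original one, which is consistent with both the gain in power and the type I error inflation at small n described in the abstract.

```python
import numpy as np


def partial_correlation(y, x1, X2):
    """Correlation between y and x1 after projecting out the columns of X2."""
    # Residuals of y and x1 from least-squares fits on the adjustment covariates.
    ry = y - X2 @ np.linalg.lstsq(X2, y, rcond=None)[0]
    rx = x1 - X2 @ np.linalg.lstsq(X2, x1, rcond=None)[0]
    return float(ry @ rx / np.sqrt((ry @ ry) * (rx @ rx)))


def score_and_fisher_statistics(y, x1, X2):
    """Return (score statistic, Fisher-transformed statistic) for H0: beta_1 = 0."""
    n = len(y)
    r = partial_correlation(y, x1, X2)
    score = n * r ** 2                # score statistic of the normal linear model
    fisher = n * np.arctanh(r) ** 2   # ASSUMPTION: same scaling by n as the score statistic
    return score, fisher


# Simulated data (hypothetical): intercept plus one covariate in X2, one tested variable x1.
rng = np.random.default_rng(0)
n = 200
X2 = np.column_stack([np.ones(n), rng.normal(size=n)])
x1 = rng.normal(size=n)
y = 0.2 * x1 + X2 @ np.array([1.0, 0.5]) + rng.normal(size=n)

score, fisher = score_and_fisher_statistics(y, x1, X2)
print(f"score = {score:.3f}, Fisher-transformed = {fisher:.3f}")
```

Both statistics would be compared with the same chi-squared reference distribution with one degree of freedom, so any difference in rejection rates comes only from the transformation of r.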
