A $C_p$ type criterion for model selection in the GEE method when both scale and correlation parameters are unknown

Pub Date: 2020-03-01 DOI: 10.32917/hmj/1583805651
Tomoharu Sato, Yu Inatsu
Citations: 1

Abstract

In recent real-data analysis, correlated data arise in many fields, such as medical science and economics. In particular, data measured repeatedly over time on the same subjects, called longitudinal data, are widely used in those fields. In general, observations from the same subject are correlated, whereas observations from different subjects are independent. Liang and Zeger (1986) introduced an extension of the generalized linear model (Nelder and Wedderburn, 1972) called the generalized estimating equation (GEE). The GEE method is one way to analyze correlated data. Its defining feature is that it uses a working correlation matrix that the analyst can choose freely, and it yields good parameter estimates whether or not the working correlation matrix is correct. Importantly, it does not require a full specification of the joint distribution. For these reasons, the GEE method is widely used in many fields.

Model selection is also an important problem, so we apply model selection to the GEE. In general, model selection measures goodness of fit by a risk function and chooses the model with the smallest risk. A model selection criterion is then obtained as an asymptotically unbiased estimator of the risk function. For example, the risk based on the expected Kullback-Leibler information (Kullback and Leibler, 1951) leads to the best-known criterion, Akaike's information criterion (AIC) (Akaike, 1973, 1974), which is calculated as AIC = −2 × (maximum log-likelihood) + 2 × (number of parameters). The GIC, an extension of the AIC proposed by Nishii (1984) and Rao (1988), is also applied in many fields. However, likelihood-based criteria such as the AIC and GIC cannot be used in the GEE method, because the joint distribution is not specified.

Some model selection criteria analogous to the AIC and GIC have already been proposed for the GEE method. For example, Pan (2001) proposed the QIC, based on the quasi-likelihood (defined by Wedderburn, 1974). The $GC_p$ proposed by Cantoni et al. (2005) is a generalization of Mallows' $C_p$ (Mallows, 1973). The CIC proposed by Hin and Wang (2009) and Gosho et al. (2011) is a criterion for selecting the correlation structure. Unfortunately, the above criteria are derived without considering the correlation structure, so we regard them as not reflecting the correlation. Against this background, Inatsu and Imori (2013) proposed a new model selection criterion, PMSEG (the prediction mean squared error in the GEE), using a risk function based on the prediction mean squared error (PMSE) normalized by the covariance matrix. They proposed this criterion for the case where both the correlation and scale parameters are known; since these parameters are generally unknown, we consider the criterion when both are unknown.

The main aim of this paper is to propose a model selection criterion that takes the correlation structure into account when both the correlation and scale parameters are unknown. To this end, we evaluate the asymptotic bias of the estimator of the risk function and consider the influence of estimating the correlation and scale parameters. We focus on variable selection, that is, selecting the optimal combination of variables.

The paper is organized as follows. In Section 2, we introduce the GEE framework and propose an estimation method for the parameters, and then perform a stochastic expansion of the GEE estimator. In Section 3, we define the estimator of the risk function, evaluate its asymptotic bias, and propose the new model selection criterion. In Section 4, we perform a numerical study. In Section 5, we conclude our discussion. The appendix provides the calculation of the bias.
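
As a rough illustration of the "choose the model with the smallest criterion value" idea described in the abstract, the sketch below evaluates the AIC formula quoted above and the classical Mallows' $C_p$ for linear regression on three candidate variable sets. It is not the criterion proposed in the paper. The candidate names, log-likelihoods, residual sums of squares, and sample size are hypothetical numbers chosen only for the example, and the helper names `aic` and `mallows_cp` are introduced here for illustration.

```python
def aic(max_log_likelihood: float, n_params: int) -> float:
    """AIC = -2 * (maximum log-likelihood) + 2 * (number of parameters)."""
    return -2.0 * max_log_likelihood + 2.0 * n_params


def mallows_cp(rss: float, sigma2_full: float, n_obs: int, n_params: int) -> float:
    """Classical Mallows' C_p for a linear model: RSS_p / sigma^2 - n + 2p."""
    return rss / sigma2_full - n_obs + 2.0 * n_params


# Hypothetical candidates: name -> (maximum log-likelihood, RSS, number of parameters).
candidates = {
    "x1": (-120.0, 90.0, 2),
    "x1+x2": (-112.5, 60.0, 3),
    "x1+x2+x3": (-112.0, 58.8, 4),
}
n_obs = 64  # hypothetical sample size

# Error variance estimated from the largest candidate: RSS_full / (n - p_full).
_, rss_full, p_full = candidates["x1+x2+x3"]
sigma2_full = rss_full / (n_obs - p_full)

aic_values = {name: aic(ll, p) for name, (ll, _, p) in candidates.items()}
cp_values = {
    name: mallows_cp(rss, sigma2_full, n_obs, p)
    for name, (_, rss, p) in candidates.items()
}

# Variable selection: pick the candidate that minimizes each criterion.
best_by_aic = min(aic_values, key=aic_values.get)
best_by_cp = min(cp_values, key=cp_values.get)

print("AIC:", aic_values, "->", best_by_aic)
print("Cp: ", cp_values, "->", best_by_cp)
```

Both criteria trade fit against the number of parameters, so the largest candidate is not chosen automatically. By construction, $C_p$ for the largest candidate equals its number of parameters when the variance is estimated from that same model.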