Kernel-Smoothed Conditional Quantiles of Correlated Bivariate Discrete Data

J. De Gooijer, A. Yuan
Journal: ERN: Semiparametric & Nonparametric Methods (Topic)
Publication date: 2011-01-17
Publication type: Journal Article
DOI: 10.2139/ssrn.1742230 (https://doi.org/10.2139/ssrn.1742230)
Citations: 0

Abstract

Socio-economic variables are often measured on a discrete scale or rounded to protect confidentiality. Nevertheless, when exploring the effect of a relevant covariate on the whole outcome distribution of a discrete response variable, virtually all common quantile regression methods require the distribution of the covariate to be continuous. This paper departs from this basic requirement by presenting an algorithm for nonparametric estimation of conditional quantiles when both the response variable and the covariate are discretely distributed. Moreover, we allow the variables of interest to be pairwise correlated. For computational efficiency, we aggregate the data into smaller subsets by a binning operation and make inference on the resulting prebinned data. Specifically, we propose two kernel-based binned conditional quantile estimators, one for untransformed discrete response data and one for rank-transformed response data. We establish asymptotic properties of both estimators. A practical procedure for jointly selecting the bandwidth and binwidth parameters is also presented. Simulation results show excellent estimation accuracy in terms of bias, mean squared error, and confidence interval coverage. Typically, prebinning the data leads to considerable computational savings when large datasets are under study, as compared to direct (un)conditional quantile kernel estimation of multivariate data. With this in mind, we illustrate the proposed methodology with an application to a large real dataset concerning US hospital patients with congestive heart failure.
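
The general idea of prebinning followed by kernel smoothing can be illustrated with a minimal sketch. This is not the estimator proposed in the paper, whose exact form the abstract does not give. It assumes a Gaussian kernel, equal-width covariate bins, and the simple rule "smallest response value whose kernel-weighted conditional CDF reaches tau". The function names `prebin` and `binned_conditional_quantile` and all parameters are hypothetical.

```python
import math
from collections import defaultdict


def prebin(x, y, binwidth):
    """Aggregate (x, y) pairs into counts keyed by covariate-bin center.

    Assumption: equal-width bins starting at min(x); this is an
    illustrative choice, not the paper's binning rule.
    """
    x_min = min(x)
    counts = defaultdict(lambda: defaultdict(int))
    for xi, yi in zip(x, y):
        k = math.floor((xi - x_min) / binwidth)  # index of the covariate bin
        center = x_min + (k + 0.5) * binwidth    # representative point of the bin
        counts[center][yi] += 1
    return counts


def binned_conditional_quantile(counts, x0, tau, bandwidth):
    """Smallest y whose kernel-weighted conditional CDF at x0 reaches tau.

    Assumption: Gaussian kernel applied to bin centers; the cost depends on
    the number of occupied bins rather than the number of observations.
    """
    weighted = defaultdict(float)
    for center, y_counts in counts.items():
        w = math.exp(-0.5 * ((center - x0) / bandwidth) ** 2)
        for y_value, c in y_counts.items():
            weighted[y_value] += w * c
    total = sum(weighted.values())
    if total == 0.0:
        raise ValueError("no kernel mass near x0; try a larger bandwidth")
    cumulative = 0.0
    for y_value in sorted(weighted):
        cumulative += weighted[y_value]
        if cumulative / total >= tau:
            return y_value
    return max(weighted)  # guard against floating-point shortfall at tau = 1
```

A call such as `binned_conditional_quantile(prebin(x, y, 1.0), x0, 0.5, 0.5)` would return an estimated conditional median of the response at `x0`. Because the kernel sum runs over bins instead of raw observations, the savings grow with the ratio of sample size to the number of occupied bins, which is the kind of computational gain the abstract refers to.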