Optimal estimators of the population mean of a skewed distribution using auxiliary variables in median ranked-set sampling

IF 1.3, Zone 3 (Sociology), Q3 DEMOGRAPHY

Mathematical Population Studies Pub Date : 2023-10-05 DOI:10.1080/08898480.2023.2251852

Mohammad Hossein Zarinkolah, Hadi Jabbari, Mohammad Mehdi Saber


Citations: 0

Abstract

In an asymmetric population, individuals are concentrated toward one tail of the distribution. An estimator of the population mean for this asymmetric case is constructed on the basis of median ranked-set sampling: the population is divided into subsets of equal size, and the intersections of these sets depend on the chosen order of ranking according to a known auxiliary variable. Ranking individuals according to this auxiliary variable should approximate their ranking with respect to the unknown variable of interest. This procedure is a cost-effective way of selecting the sample when the variable of interest is unknown. For it to work, the auxiliary variable must be at least weakly correlated with the variable of interest. The proposed estimator extends the one constructed with extreme ranked-set sampling, whose principle is to divide the population into subsets whose intersections depend on the extreme values of the auxiliary variable. The mean square error of the estimator is expressed analytically. A simulation compares the proposed estimator with estimators based on simple random sampling and with those based on extreme ranked-set sampling. It shows that when the response variable is correlated with both auxiliary variables, even if these correlations are weak (around 0.5 in absolute value), the mean square error of the proposed estimator is at least 175% lower than the mean square error of estimators based on either simple random or extreme ranked-set sampling.

A first application focuses on household incomes in the Iranian provinces of Fars and Khuzestan in 2022, first with gross income alone (the total income that an individual or household earns before tax) as the auxiliary variable, and then with two auxiliary variables: total gross household income and wages paid year-round to heads of households through the banking network. In this application, the mean square error of the proposed estimator with median ranked-set sampling is at least 60% lower than that obtained with simple random and extreme ranked-set sampling. A second application concerns the physical preparation score of 160 Iranian athletes in 2022, with runners' track records as the auxiliary variable and sample sizes of 6, 8, 10, 25, and 30; here the mean square error of the proposed estimator with median ranked-set sampling is at least 50% lower than that obtained with simple random and extreme ranked-set sampling. A third application concerns the mean COVID-19 mortality rate in 2022 in the USA, Iran, Turkey, and Germany, with sample sizes of 6, 8, 10, 25, and 30; estimates of the mean mortality rate are based on new cases. In each of the four countries, the mean square error of the proposed estimator under median ranked-set sampling is at least 60% lower than that obtained with simple random and extreme ranked-set sampling.

KEYWORDS: Median ranked-set sampling; population mean; ranked-set sampling; ratio estimation; sampling surveys

JEL CLASSIFICATION: 62D05; 62D99

Acknowledgements: We thank two reviewers for their constructive comments.

Disclosure statement: No potential conflict of interest was reported by the author(s).

Notes: 1. Iran's national portal of statistics: https://www.amar.org.ir.

Funding: We received no fund or grant for this article.
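The selection step described in the abstract can be illustrated with a minimal Python sketch. It assumes the textbook median ranked-set sampling rule (draw m sets of m units, rank each set by the auxiliary variable, keep the median-ranked unit), which may differ in detail from the design used in the article. The synthetic lognormal population, the function names, and the use of the plain sample mean are all hypothetical choices made for illustration; the article's optimal estimators are not reproduced here.

```python
import random
import statistics


def median_ranked_set_sample(population, m, rng):
    """Draw one cycle of m units from (x, y) pairs, ranking by auxiliary x.

    Textbook rule (assumed here): draw m random sets of m units each and
    rank every set by x. Odd m: keep the unit of rank (m + 1) / 2 from each
    set. Even m: keep rank m / 2 from the first m / 2 sets and rank
    m / 2 + 1 from the remaining sets.
    """
    units = rng.sample(population, m * m)  # m disjoint sets of size m
    sample = []
    for i in range(m):
        ranked = sorted(units[i * m:(i + 1) * m], key=lambda unit: unit[0])
        if m % 2 == 1:
            pos = (m + 1) // 2
        else:
            pos = m // 2 if i < m // 2 else m // 2 + 1
        sample.append(ranked[pos - 1])  # ranks are 1-based
    return sample


def make_population(size, rng):
    """Hypothetical right-skewed population: y lognormal, x a noisy copy."""
    population = []
    for _ in range(size):
        y = rng.lognormvariate(0.0, 0.75)
        x = y + rng.gauss(0.0, 0.5)
        population.append((x, y))
    return population


def mse_of_mean(draw, true_mean, reps):
    """Monte Carlo mean square error of the plain sample mean of y."""
    errors = []
    for _ in range(reps):
        ys = [y for _, y in draw()]
        errors.append((statistics.fmean(ys) - true_mean) ** 2)
    return statistics.fmean(errors)


if __name__ == "__main__":
    rng = random.Random(2023)
    pop = make_population(5000, rng)
    mu = statistics.fmean([y for _, y in pop])
    m = 6  # smallest sample size mentioned in the abstract
    mse_srs = mse_of_mean(lambda: rng.sample(pop, m), mu, 2000)
    mse_mrss = mse_of_mean(
        lambda: median_ranked_set_sample(pop, m, rng), mu, 2000)
    print(f"SRS  MSE of the sample mean: {mse_srs:.4f}")
    print(f"MRSS MSE of the sample mean: {mse_mrss:.4f}")
```

Only the auxiliary values are used for ranking, so the variable of interest needs to be measured on just the m selected units. In a skewed population the plain mean of a median ranked-set sample is generally biased, so the comparison uses the mean square error rather than the variance, and the figures printed by this toy simulation should not be read as a reproduction of the article's results.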


Source journal

Mathematical Population Studies (Mathematics: Interdisciplinary Applications)

CiteScore: 3.20

Self-citation rate: 11.10%

Articles published: not listed

Review time: >12 weeks

About the journal: Mathematical Population Studies publishes carefully selected research papers in the mathematical and statistical study of populations. The journal is strongly interdisciplinary and invites contributions by mathematicians, demographers, (bio)statisticians, sociologists, economists, biologists, epidemiologists, actuaries, geographers, and others who are interested in the mathematical formulation of population-related questions. The scope covers both theoretical and empirical work. Manuscripts should be sent to Manuscript Central for review. The editor-in-chief has final say on the suitability for publication.