Application of Imputation Method for Compositional Data with Missing Values based on Adaptive LASSO Model: the Composition of Employment Industry in Taiyuan, China

Journal impact factor: 0.8; JCR quartile: Q3 (Multidisciplinary Sciences)
Ying Tian, Majid Khan Majahar Ali, Fam Pei Shan, Lili Wu, Siti Zulaikha Mohd Jamaludin
Malaysian Journal of Fundamental and Applied Sciences, journal article, published 2023-12-04
DOI: 10.11113/mjfas.v20n1.3034 (https://doi.org/10.11113/mjfas.v20n1.3034)
Citations: 0

Abstract

The tripartite industry classification, which divides all economic activities into three sectors, reflects the dynamic process of economic development and the historical trend in how the structure of resource allocation changes. Evidence shows that the share of each industry has become an important indicator of the level of national economic development. These industry shares are compositional data, a kind of complex multidimensional data used in many fields. All components of compositional data are non-negative and carry only relative information. In practice, compositional data may contain missing values, and general statistical analysis methods cannot be applied directly to compositional data with missing values. Because missing values in compositional data are complex, traditional imputation methods are no longer suitable. How to carry out effective statistical inference for compositional data with missing values has therefore recently attracted the attention of many scholars. In this paper, we focus on the imputation problem for compositional data containing missing values and propose an Adaptive Least Absolute Shrinkage and Selection Operator (ALASSO) imputation method that obtains a complete dataset through variable selection and parameter estimation. The new method is then evaluated through simulation and an empirical analysis of the composition of employment by industry in Taiyuan, China, and it is compared with mean imputation, k-nearest neighbor imputation, and iterative regression imputation. The results show that the ALASSO imputation method achieves the highest accuracy across different missing rates, dimensions, and correlation coefficients.
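The abstract does not spell out the algorithm, so the following is only a minimal sketch of how adaptive LASSO could drive compositional imputation, not the authors' implementation. It assumes an iterative regression scheme in isometric log-ratio (pivot) coordinates: for each part with missing values, the pivot coordinate of that part is regressed on the pivot coordinates of the remaining parts with an adaptive LASSO (OLS-based weights, solved by coordinate descent), and the prediction is back-transformed while the observed parts stay fixed. The function names (`pivot_coords`, `lasso_cd`, `adaptive_lasso`, `alasso_impute`), the geometric-mean initialization, and the fixed values of `lam`, `gamma`, and `n_iter` are all hypothetical choices made for illustration.

```python
import numpy as np


def pivot_coords(Y):
    """Pivot (ilr) coordinates of an n x m positive matrix -> n x (m - 1)."""
    L = np.log(Y)
    n, m = L.shape
    out = np.empty((n, m - 1))
    for i in range(m - 1):
        rest = L[:, i + 1:].mean(axis=1)  # log geometric mean of the later parts
        out[:, i] = np.sqrt((m - i - 1) / (m - i)) * (L[:, i] - rest)
    return out


def lasso_cd(X, y, lam, n_sweeps=500, tol=1e-10):
    """Plain LASSO, (1/2n)||y - Xb||^2 + lam*||b||_1, by coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()  # residual y - Xb, with b starting at zero
    for _ in range(n_sweeps):
        max_step = 0.0
        for k in range(p):
            if col_sq[k] == 0.0:
                continue
            rho = X[:, k] @ r / n + col_sq[k] * b[k]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[k]  # soft-threshold
            delta = new - b[k]
            if delta != 0.0:
                r -= X[:, k] * delta
                b[k] = new
                max_step = max(max_step, abs(delta))
        if max_step < tol:
            break
    return b


def adaptive_lasso(X, y, lam, gamma=1.0):
    """Adaptive LASSO with weights 1/|b_ols|^gamma; returns (coef, intercept)."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    sd[sd == 0] = 1.0
    Xs = (X - mu) / sd
    yc = y - y.mean()
    b0 = np.linalg.lstsq(Xs, yc, rcond=None)[0]  # initial OLS estimate
    w = 1.0 / (np.abs(b0) ** gamma + 1e-8)       # adaptive penalty weights
    # Reweighting trick: penalising w_k*|b_k| equals a plain LASSO on columns X_k / w_k.
    b_std = lasso_cd(Xs / w, yc, lam) / w
    coef = b_std / sd
    return coef, y.mean() - mu @ coef


def alasso_impute(X, lam=0.01, gamma=1.0, n_iter=10):
    """Impute NaNs in an n x D composition (D >= 3); returns closed rows."""
    Z = np.array(X, dtype=float)
    miss = np.isnan(Z)
    n, D = Z.shape
    if D < 3:
        raise ValueError("need at least 3 parts")
    # Assumption: start from each part's geometric mean (each part observed at least once).
    for j in range(D):
        Z[miss[:, j], j] = np.exp(np.log(Z[~miss[:, j], j]).mean())
    order = [int(j) for j in np.argsort(miss.sum(axis=0)) if miss[:, j].any()]
    c = np.sqrt((D - 1) / D)
    for _ in range(n_iter):
        for j in order:
            others = np.delete(Z, j, axis=1)
            F = pivot_coords(others)                    # predictors: ilr of the other parts
            log_gm = np.log(others).mean(axis=1)
            target = c * (np.log(Z[:, j]) - log_gm)     # pivot coordinate of part j
            obs = ~miss[:, j]
            coef, intercept = adaptive_lasso(F[obs], target[obs], lam, gamma)
            pred = intercept + F[~obs] @ coef
            Z[~obs, j] = np.exp(log_gm[~obs] + pred / c)  # back-transform; others unchanged
    return Z / Z.sum(axis=1, keepdims=True)  # closure: rows sum to 1
```

Because only the missing entries are rewritten and closure rescales whole rows, the ratios between observed parts are preserved, which matches the point that compositional components carry only relative information. In a real study `lam` would normally be tuned (for example by cross-validation) rather than fixed, and accuracy would be compared against mean, k-nearest neighbor, and iterative regression imputation as the abstract describes.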
Source journal metrics: CiteScore 1.40; self-citation rate 0.00%; articles published: 45.