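How much missing data is too much to impute for longitudinal health indicators? A preliminary guideline for the choice of the extent of missing proportion to impute with multiple imputation by chained equations.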

IF 3.2 · Zone 2 (Medicine) · Q2 · PUBLIC, ENVIRONMENTAL & OCCUPATIONAL HEALTH
K P Junaid, Tanvi Kiran, Madhu Gupta, Kamal Kishore, Sujata Siwatch
Population Health Metrics, 23(1), 2. Published 2025-02-01. DOI: https://doi.org/10.1186/s12963-025-00364-2. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11787761/pdf/
Citations: 0

Abstract

How much missing data is too much to impute for longitudinal health indicators? A preliminary guideline for the choice of the extent of missing proportion to impute with multiple imputation by chained equations.

Background: Multiple imputation by chained equations (MICE) is a widely used approach for handling missing data. However, its robustness, especially at high missing proportions in health indicators, is under-researched. This study aimed to provide a preliminary guideline on how large a missing proportion can be when imputing longitudinal health-related data with MICE.

Methods: The study obtained complete data on five mortality-related health indicators for 100 countries (2015-2019) from the Global Health Observatory. Nine incomplete datasets with missing rates from 10% to 90% were generated and imputed using MICE. The robustness of MICE was assessed in three ways: comparison of means using repeated-measures analysis of variance (ANOVA), estimation of evaluation metrics (root mean square error, mean absolute deviation, bias, and proportionate variance), and visual inspection of box plots of imputed and non-imputed data.
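The evaluation loop described in the Methods can be sketched roughly as follows. This is a minimal illustration under several assumptions that are not taken from the paper: the data are synthetic rather than Global Health Observatory indicators, values are deleted completely at random, the metric definitions are common textbook forms (the paper's exact formulas may differ), and `impute_mean` is a deliberately simple stand-in that is not MICE. All function names are hypothetical.

```python
import math
import random
import statistics


def ampute_mcar(values, missing_rate, rng):
    """Return a copy of `values` with a fraction `missing_rate` set to None.

    Assumption: values go missing completely at random (MCAR).
    """
    n_missing = round(len(values) * missing_rate)
    missing_idx = set(rng.sample(range(len(values)), n_missing))
    return [None if i in missing_idx else v for i, v in enumerate(values)]


def impute_mean(incomplete):
    """Stand-in imputer (NOT MICE): fill every gap with the observed mean."""
    observed = [v for v in incomplete if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in incomplete]


def evaluation_metrics(complete, imputed):
    """Compare an imputed series with the complete series it was derived from.

    Assumed definitions, computed over all entries:
      rmse  = sqrt(mean((imputed - complete)^2))
      mad   = mean(|imputed - complete|)
      bias  = mean(imputed - complete)
      proportionate_variance = var(imputed) / var(complete)
    """
    errors = [i - c for c, i in zip(complete, imputed)]
    return {
        "rmse": math.sqrt(statistics.fmean([e * e for e in errors])),
        "mad": statistics.fmean([abs(e) for e in errors]),
        "bias": statistics.fmean(errors),
        "proportionate_variance": (
            statistics.pvariance(imputed) / statistics.pvariance(complete)
        ),
    }


rng = random.Random(0)
# Synthetic "complete" indicator values; placeholders, not real data.
complete = [rng.gauss(50.0, 10.0) for _ in range(500)]

results = {}
for pct in range(10, 100, 10):  # nine missing rates: 10%, 20%, ..., 90%
    incomplete = ampute_mcar(complete, pct / 100, rng)
    results[pct] = evaluation_metrics(complete, impute_mean(incomplete))
    print(pct, results[pct])
```

To approximate the study's setup, `impute_mean` would be swapped for a real MICE implementation. A proportionate variance well below 1 corresponds to the variance shrinkage that the box plots are used to check.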

Results: The repeated-measures analysis of variance revealed significant differences between complete and imputed data, primarily in imputed data with missing proportions over 50%. Evaluation metrics exhibited 'high performance' for the dataset with a 50% missing proportion for various health indicators. However, with missing proportions exceeding 70%, the majority of indicators demonstrated a 'low' performance level on most evaluation metrics. Visual inspection of the box plots revealed severe variance shrinkage in imputed datasets with missing proportions beyond 70%, corroborating the findings from the evaluation metrics.

Conclusion: MICE demonstrates high robustness up to 50% missing values, with marginal deviations from complete datasets. Caution is warranted for missing proportions between 50% and 70%, as moderate alterations are observed. Proportions beyond 70% lead to significant variance shrinkage and compromised data reliability, emphasizing the importance of acknowledging imputation limitations for practical decision-making.
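The thresholds in the Conclusion could be encoded as a simple pre-imputation check, sketched below. The function names and tier labels are hypothetical shorthand, not terms from the paper, and the treatment of the exact 50% and 70% boundaries is an assumption based on the wording "up to 50%" and "beyond 70%".

```python
def missing_proportion(values):
    """Fraction of entries in `values` that are missing (represented as None)."""
    if not values:
        raise ValueError("values must not be empty")
    return sum(v is None for v in values) / len(values)


def mice_guideline(proportion):
    """Map a missing proportion in [0, 1] to a tier from the preliminary guideline.

    Assumed boundaries: <= 50% robust, > 50% to 70% caution, > 70% unreliable.
    """
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must be between 0 and 1")
    if proportion <= 0.50:
        return "high robustness"
    if proportion <= 0.70:
        return "caution"
    return "unreliable"


# Hypothetical indicator series with 3 of 5 values missing (60%).
series = [12.4, None, None, 11.8, None]
print(mice_guideline(missing_proportion(series)))
```

Because the guideline is described as preliminary and was derived from five mortality-related indicators, a check like this is best treated as a prompt for closer inspection rather than a hard rule.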

Source journal
Population Health Metrics (PUBLIC, ENVIRONMENTAL & OCCUPATIONAL HEALTH)
CiteScore: 6.50
Self-citation rate: 0.00%
Articles published: 21
Review time: 29 weeks
About the journal: Population Health Metrics aims to advance the science of population health assessment, and welcomes papers relating to concepts, methods, ethics, applications, and summary measures of population health. The journal provides a unique platform for population health researchers to share their findings with the global community. We seek research that addresses the communication of population health measures and policy implications to stakeholders; this includes papers related to burden estimation and risk assessment, and research addressing population health across the full range of development. Population Health Metrics covers a broad range of topics encompassing health state measurement and valuation, summary measures of population health, descriptive epidemiology at the population level, burden of disease and injury analysis, disease and risk factor modeling for populations, and comparative assessment of risks to health at the population level. The journal is also interested in how to use and communicate indicators of population health to reduce disease burden, and the approaches for translating from indicators of population health to health-advancing actions. As a cross-cutting topic of importance, we are particularly interested in inequalities in population health and their measurement.