A computationally efficient sequential regression imputation algorithm for multilevel data

Impact factor: 1.2 · CAS Zone 4 (Mathematics) · JCR Q2 (Statistics & Probability)

Journal of Applied Statistics Pub Date : 2023-11-06 DOI:10.1080/02664763.2023.2277669

Tugba Akkaya Hocagil, Recai M. Yucel


Citations: 0

Abstract

Due to the computational burden, especially in high-dimensional settings, sequential imputation may not be practical. In this paper, we adopt computationally advantageous methods by sampling the missing data from their respective predictive distributions, which leads to significantly improved computation time in the class of variable-by-variable imputation algorithms. We assess the computational performance in a comprehensive simulation study. We then compare the performance of our algorithm with commonly used alternatives. The results show that our method has a significant advantage over the commonly used alternatives with respect to computational efficiency and inferential quality. Finally, we demonstrate our methods in a substantive problem aimed at investigating the effects of area-level behavioral, socioeconomic, and demographic characteristics on poor birth outcomes in New York State among singleton births.

Keywords: sequential regression imputation; multilevel data; computational efficiency; fast variable by variable imputation; multiple imputation by chained equations

Acknowledgments: We thank Dr. Tabassum Insaf for providing assistance in accessing the New York State Vital Records Registry data.

Disclosure statement: No potential conflict of interest was reported by the author(s).
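For readers unfamiliar with variable-by-variable (sequential regression) imputation, the general idea named in the abstract can be sketched as follows. This is only an illustrative, single-level sketch and not the authors' algorithm: it ignores the multilevel (clustered) structure the paper addresses, assumes all variables are continuous, and uses a plain Bayesian linear regression with a noninformative prior for each conditional model. The function names `draw_imputations` and `sequential_imputation` and the simulated data are hypothetical.

```python
import numpy as np

def draw_imputations(y, X, rng):
    """Replace NaNs in y with one draw from the posterior predictive
    distribution of a normal linear regression of y on X
    (assumption: noninformative prior on the coefficients and variance)."""
    obs = ~np.isnan(y)
    Xo, yo = X[obs], y[obs]
    n, p = Xo.shape
    XtX_inv = np.linalg.inv(Xo.T @ Xo)
    beta_hat = XtX_inv @ Xo.T @ yo
    resid = yo - Xo @ beta_hat
    # sigma^2 | data: residual sum of squares divided by a chi-square(n - p) draw
    sigma2 = resid @ resid / rng.chisquare(n - p)
    # beta | sigma^2, data ~ N(beta_hat, sigma^2 (X'X)^{-1})
    beta = rng.multivariate_normal(beta_hat, sigma2 * XtX_inv)
    y_new = y.copy()
    n_mis = int((~obs).sum())
    y_new[~obs] = X[~obs] @ beta + rng.normal(0.0, np.sqrt(sigma2), n_mis)
    return y_new

def sequential_imputation(data, n_iter=10, seed=0):
    """Cycle through the columns, imputing each incomplete variable
    conditional on the current completed values of all the others."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    missing = np.isnan(data)
    filled = data.copy()
    # Starting values: observed column means
    col_means = np.nanmean(data, axis=0)
    filled[missing] = np.take(col_means, np.where(missing)[1])
    n = data.shape[0]
    for _ in range(n_iter):
        for j in range(data.shape[1]):
            if not missing[:, j].any():
                continue
            others = np.delete(filled, j, axis=1)
            X = np.column_stack([np.ones(n), others])  # intercept + predictors
            filled[:, j] = draw_imputations(data[:, j], X, rng)
    return filled

# Usage on simulated data (hypothetical example, 20% of values set to missing)
rng = np.random.default_rng(1)
z = rng.normal(size=(200, 3))
z[:, 2] = z[:, 0] + 0.5 * z[:, 1] + rng.normal(scale=0.3, size=200)
z_mis = z.copy()
z_mis[rng.random(z.shape) < 0.2] = np.nan
completed = sequential_imputation(z_mis, n_iter=5, seed=2)
```

Running `sequential_imputation` several times with different seeds would give multiple completed data sets, which is the usual way such draws are used in multiple imputation. Every pass refits one regression per incomplete variable, which illustrates why the cost of sequential imputation grows with the number of variables.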


Source journal

Journal of Applied Statistics (Mathematics: Statistics & Probability)

CiteScore: 3.40

Self-citation rate: 0.00%

Articles published: 126

Review time: 6 months

About the journal: Journal of Applied Statistics provides a forum for communication between applied statisticians and users of applied statistical techniques across a wide range of disciplines. These areas include business, computing, economics, ecology, education, management, medicine, operational research and sociology, but papers from other areas are also considered. The editorial policy is to publish rigorous but clear and accessible papers on applied techniques. Purely theoretical papers are avoided, but those on theoretical developments which clearly demonstrate significant applied potential are welcomed. Each paper is submitted to at least two independent referees.