在Python和R: MIDASpy和rMIDAS中实现多种数据的高效多重输入

IF 8.1 2区计算机科学 Q1 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Journal of Statistical Software Pub Date : 2023-01-01 DOI:10.18637/jss.v107.i09

Ranjit Lall, Thomas Robinson

{"title":"在Python和R: MIDASpy和rMIDAS中实现多种数据的高效多重输入","authors":"Ranjit Lall, Thomas Robinson","doi":"10.18637/jss.v107.i09","DOIUrl":null,"url":null,"abstract":"This paper introduces software packages for efficiently imputing missing data using deep learning methods in Python (MIDASpy) and R (rMIDAS). The packages implement a recently developed approach to multiple imputation known as MIDAS, which involves introducing additional missing values into the dataset, attempting to reconstruct these values with a type of unsupervised neural network known as a denoising autoencoder, and using the resulting model to draw imputations of originally missing data. These steps are executed by a fast and flexible algorithm that expands both the quantity and the range of data that can be analyzed with multiple imputation. To help users optimize the algorithm for their particular application, MIDASpy and rMIDAS offer a host of user-friendly tools for calibrating and validating the imputation model. We provide a detailed guide to these functionalities and demonstrate their usage on a large real dataset.","PeriodicalId":17237,"journal":{"name":"Journal of Statistical Software","volume":"241 1","pages":"0"},"PeriodicalIF":8.1000,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Efficient Multiple Imputation for Diverse Data in Python and R: MIDASpy and rMIDAS\",\"authors\":\"Ranjit Lall, Thomas Robinson\",\"doi\":\"10.18637/jss.v107.i09\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper introduces software packages for efficiently imputing missing data using deep learning methods in Python (MIDASpy) and R (rMIDAS). The packages implement a recently developed approach to multiple imputation known as MIDAS, which involves introducing additional missing values into the dataset, attempting to reconstruct these values with a type of unsupervised neural network known as a denoising autoencoder, and using the resulting model to draw imputations of originally missing data. These steps are executed by a fast and flexible algorithm that expands both the quantity and the range of data that can be analyzed with multiple imputation. To help users optimize the algorithm for their particular application, MIDASpy and rMIDAS offer a host of user-friendly tools for calibrating and validating the imputation model. We provide a detailed guide to these functionalities and demonstrate their usage on a large real dataset.\",\"PeriodicalId\":17237,\"journal\":{\"name\":\"Journal of Statistical Software\",\"volume\":\"241 1\",\"pages\":\"0\"},\"PeriodicalIF\":8.1000,\"publicationDate\":\"2023-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Statistical Software\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.18637/jss.v107.i09\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Statistical Software","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18637/jss.v107.i09","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}

引用次数: 2

摘要

本文介绍了在Python (MIDASpy)和R (rMIDAS)中使用深度学习方法有效地输入缺失数据的软件包。这些软件包实现了最近开发的一种称为MIDAS的多重输入方法，该方法包括在数据集中引入额外的缺失值，尝试使用一种称为去噪自动编码器的无监督神经网络重建这些值，并使用生成的模型绘制原始缺失数据的输入。这些步骤是由一个快速和灵活的算法来执行的，它扩大了数据的数量和范围，可以用多次插值来分析。为了帮助用户优化其特定应用的算法，MIDASpy和rMIDAS提供了大量用户友好的工具来校准和验证插补模型。我们提供了这些功能的详细指南，并演示了它们在大型真实数据集上的使用。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Efficient Multiple Imputation for Diverse Data in Python and R: MIDASpy and rMIDAS

This paper introduces software packages for efficiently imputing missing data using deep learning methods in Python (MIDASpy) and R (rMIDAS). The packages implement a recently developed approach to multiple imputation known as MIDAS, which involves introducing additional missing values into the dataset, attempting to reconstruct these values with a type of unsupervised neural network known as a denoising autoencoder, and using the resulting model to draw imputations of originally missing data. These steps are executed by a fast and flexible algorithm that expands both the quantity and the range of data that can be analyzed with multiple imputation. To help users optimize the algorithm for their particular application, MIDASpy and rMIDAS offer a host of user-friendly tools for calibrating and validating the imputation model. We provide a detailed guide to these functionalities and demonstrate their usage on a large real dataset.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Journal of Statistical Software 工程技术-计算机：跨学科应用

CiteScore

10.70

自引率

1.70%

发文量

审稿时长

6-12 weeks

期刊介绍： The Journal of Statistical Software (JSS) publishes open-source software and corresponding reproducible articles discussing all aspects of the design, implementation, documentation, application, evaluation, comparison, maintainance and distribution of software dedicated to improvement of state-of-the-art in statistical computing in all areas of empirical research. Open-source code and articles are jointly reviewed and published in this journal and should be accessible to a broad community of practitioners, teachers, and researchers in the field of statistics.