RIFLE: Imputation and Robust Inference from Low Order Marginals.

Transactions on Machine Learning Research. Published: 2023-09-01

Sina Baharlouei, Kelechi Ogudu, Sze-Chuan Suen, Meisam Razaviyayn

Abstract

The ubiquity of missing values in real-world datasets poses a challenge for statistical inference and can prevent similar datasets from being analyzed in the same study, precluding many existing datasets from being used for new analyses. While an extensive collection of packages and algorithms has been developed for data imputation, the overwhelming majority perform poorly when there are many missing values and small sample sizes, which are unfortunately common characteristics of empirical data. Such low-accuracy estimates adversely affect the performance of downstream statistical models. We develop a statistical inference framework for regression and classification in the presence of missing data that does not require imputation. Our framework, RIFLE (Robust InFerence via Low-order moment Estimations), estimates low-order moments of the underlying data distribution, together with corresponding confidence intervals, to learn a distributionally robust model. We specialize our framework to linear regression and normal discriminant analysis, and we provide convergence and performance guarantees. This framework can also be adapted to impute missing data. In numerical experiments, we compare RIFLE with several state-of-the-art approaches (including MICE, Amelia, MissForest, KNN-imputer, MIDA, and Mean Imputer) for imputation and inference in the presence of missing values. Our experiments demonstrate that RIFLE outperforms the other benchmark algorithms when the percentage of missing values is high and/or when the number of data points is relatively small. RIFLE is publicly available at https://github.com/optimization-for-data-driven-science/RIFLE.
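The core idea in the abstract is to estimate low-order moments from whatever entries are observed, attach confidence intervals to them, and fit a model that performs well for every moment matrix inside those intervals. The sketch below illustrates that idea for linear regression under simplifying assumptions; it is not the authors' implementation (that lives in the linked repository). Assumptions: second moments E[z z^T] of z = (x, y) are estimated by available-case averaging, intervals come from a normal-approximation bootstrap, the uncertainty set is a box around the estimate with the positive semidefinite constraint dropped, and the inner maximization is replaced by a convex upper bound that reduces to a per-feature ridge penalty. All function names are hypothetical.

```python
import numpy as np


def pairwise_second_moments(z):
    """Available-case estimate of E[z z^T] for an (n, d) array containing NaNs.

    Entry (i, j) averages z_i * z_j over the rows where both are observed."""
    observed = ~np.isnan(z)
    filled = np.where(observed, z, 0.0)
    counts = observed.T.astype(float) @ observed.astype(float)
    return (filled.T @ filled) / np.maximum(counts, 1.0)


def bootstrap_radius(z, n_boot=200, z_score=1.96, seed=0):
    """Per-entry half-width of a normal-approximation bootstrap interval."""
    rng = np.random.default_rng(seed)
    n = z.shape[0]
    draws = np.stack([
        pairwise_second_moments(z[rng.integers(0, n, size=n)])
        for _ in range(n_boot)
    ])
    return z_score * draws.std(axis=0)


def nearest_psd(c):
    """Clip negative eigenvalues: pairwise estimates need not be PSD."""
    vals, vecs = np.linalg.eigh((c + c.T) / 2.0)
    return (vecs * np.clip(vals, 0.0, None)) @ vecs.T


def robust_regression(x, y, n_boot=200, seed=0):
    """Minimise an upper bound on max_C w^T C w, with w = (beta, -1), over the
    box |C_ij - C_hat_ij| <= delta_ij. The PSD constraint on C is dropped,
    which only enlarges the set, so the bound still holds for PSD C."""
    z = np.column_stack([x, y])
    p = x.shape[1]
    c_hat = nearest_psd(pairwise_second_moments(z))
    delta = bootstrap_radius(z, n_boot=n_boot, seed=seed)
    # Worst case over the box: w^T C_hat w + sum_ij delta_ij |w_i| |w_j|.
    # Since |w_i||w_j| <= (w_i^2 + w_j^2) / 2 and delta is symmetric, this is
    # at most w^T (C_hat + diag(row sums of delta)) w: a per-feature ridge.
    ridge = delta.sum(axis=1)[:p]
    a = c_hat[:p, :p] + np.diag(ridge)
    return np.linalg.solve(a, c_hat[:p, p])


# Usage: y = 2*x0 - x1 + noise, with 30% of feature entries missing at random.
rng = np.random.default_rng(1)
x = rng.normal(size=(2000, 2))
y = 2.0 * x[:, 0] - x[:, 1] + 0.1 * rng.normal(size=2000)
x[rng.random(x.shape) < 0.3] = np.nan
beta = robust_regression(x, y)
print(beta)  # shrunk toward zero relative to (2, -1)
```

Because the bound adds the interval widths to the diagonal, wider intervals (more missingness or fewer rows) produce stronger shrinkage. An intercept can be handled by centring the data or by appending an always-observed column of ones to `x`.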
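The abstract notes that the framework can also be adapted to impute missing data. As a hedged illustration of how moment estimates alone can drive imputation, the sketch below fills each row's missing entries with the Gaussian conditional mean computed from available-case estimates of the mean and covariance. This is a standard technique swapped in for illustration; it omits RIFLE's confidence intervals and robust step, and the function names and the `ridge` parameter are hypothetical.

```python
import numpy as np


def pairwise_mean_cov(x):
    """Mean and available-case covariance of an (n, d) array containing NaNs."""
    observed = ~np.isnan(x)
    mean = np.nanmean(x, axis=0)
    centred = np.where(observed, x - mean, 0.0)
    counts = observed.T.astype(float) @ observed.astype(float)
    cov = (centred.T @ centred) / np.maximum(counts - 1.0, 1.0)
    return mean, cov


def impute_conditional_mean(x, ridge=1e-3):
    """Fill each row's NaNs with E[x_miss | x_obs] under a Gaussian model
    built from the estimated moments; `ridge` stabilises the linear solve."""
    mean, cov = pairwise_mean_cov(x)
    out = x.copy()
    for row in out:  # each row is a view, so assignments write into `out`
        miss = np.isnan(row)
        if not miss.any():
            continue
        obs = ~miss
        if not obs.any():
            row[miss] = mean[miss]  # nothing observed: fall back to the mean
            continue
        s_oo = cov[np.ix_(obs, obs)] + ridge * np.eye(int(obs.sum()))
        s_mo = cov[np.ix_(miss, obs)]
        row[miss] = mean[miss] + s_mo @ np.linalg.solve(s_oo, row[obs] - mean[obs])
    return out


# Usage: two features with correlation 0.9; 20% of the second one is missing.
rng = np.random.default_rng(0)
full = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.9], [0.9, 1.0]], size=1000)
x = full.copy()
mask = rng.random(1000) < 0.2
x[mask, 1] = np.nan
imputed = impute_conditional_mean(x)
rmse_cond = np.sqrt(np.mean((imputed[mask, 1] - full[mask, 1]) ** 2))
rmse_mean = np.sqrt(np.mean((np.nanmean(x[:, 1]) - full[mask, 1]) ** 2))
print(rmse_cond, rmse_mean)
```

On this synthetic example the conditional-mean error should be well below that of mean imputation, because the observed feature explains most of the variance of the missing one.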
