Still More Shades of Null: A Benchmark for Responsible Missing Value Imputation

arXiv - CS - Computers and Society. Pub Date: 2024-09-11. DOI: arxiv-2409.07510

Falaah Arif Khan, Denys Herasymuk, Nazar Protsiv, Julia Stoyanovich

Abstract

We present Shades-of-NULL, a benchmark for responsible missing value imputation. Our benchmark includes state-of-the-art imputation techniques, and embeds them into the machine learning development lifecycle. We model realistic missingness scenarios that go beyond Rubin's classic Missing Completely at Random (MCAR), Missing At Random (MAR) and Missing Not At Random (MNAR), to include multi-mechanism missingness (when different missingness patterns co-exist in the data) and missingness shift (when the missingness mechanism changes between training and test). Another key novelty of our work is that we evaluate imputers holistically, based on the predictive performance, fairness and stability of the models that are trained and tested on the data they produce. We use Shades-of-NULL to conduct a large-scale empirical study involving 20,952 experimental pipelines, and find that, while there is no single best-performing imputation approach for all missingness types, interesting performance patterns do emerge when comparing imputer performance in simpler vs. more complex missingness scenarios. Further, while predictive performance, fairness and stability can be seen as orthogonal, we identify trade-offs among them that arise due to the combination of missingness scenario, the choice of an imputer, and the architecture of the model trained on the data post-imputation. We make Shades-of-NULL publicly available, and hope to enable researchers to comprehensively and rigorously evaluate new missing value imputation methods on a wide range of evaluation metrics, in plausible and socially meaningful missingness scenarios.
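The missingness scenarios named in the abstract can be illustrated with a small, standard-library-only sketch. This is not the Shades-of-NULL API: the function names (`inject_nulls`, `mcar`, `mar`, `mnar`, `make_rows`), the toy columns `age` and `income`, and all probabilities and thresholds are assumptions chosen for illustration. Each mechanism is a function giving a row's probability of losing a value: constant for MCAR, dependent on another observed column for MAR, and dependent on the value being removed for MNAR.

```python
import random

def inject_nulls(rows, plan, seed=0):
    """Return a copy of `rows` with values replaced by None.

    `plan` maps a column name to a function row -> probability of missingness.
    Probabilities are always computed from the original, fully observed row.
    """
    rng = random.Random(seed)
    out = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows stay untouched
        for column, prob_missing in plan.items():
            if rng.random() < prob_missing(row):
                new_row[column] = None
        out.append(new_row)
    return out

# Hypothetical mechanisms; the thresholds and rates are illustrative only.
def mcar(row):
    # MCAR: the same probability for every row
    return 0.3

def mar(row):
    # MAR: depends on a different, observed column (age)
    return 0.6 if row["age"] >= 50 else 0.1

def mnar(row):
    # MNAR: depends on the very value that goes missing (income)
    return 0.6 if row["income"] >= 60_000 else 0.1

def make_rows(n, seed):
    """Generate toy records with two numeric columns."""
    rng = random.Random(seed)
    return [
        {"age": rng.randint(18, 80), "income": rng.randint(20_000, 120_000)}
        for _ in range(n)
    ]

train_rows = make_rows(200, seed=1)
test_rows = make_rows(100, seed=2)

# Single-mechanism scenario: income is missing at random, driven by age.
mar_train = inject_nulls(train_rows, {"income": mar})

# Multi-mechanism missingness: different mechanisms co-exist in one dataset.
multi_mechanism = inject_nulls(train_rows, {"income": mnar, "age": mcar})

# Missingness shift: the mechanism differs between training and test data.
shifted_train = inject_nulls(train_rows, {"income": mcar}, seed=3)
shifted_test = inject_nulls(test_rows, {"income": mnar}, seed=4)
```

In a benchmark of the kind described, datasets produced this way would then be passed to an imputer and a downstream model, and the model would be scored on predictive performance, fairness and stability. Those steps are outside the scope of this sketch.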
