Embedding for Informative Missingness: Deep Learning With Incomplete Data

Amirata Ghorbani, James Y. Zou
DOI: 10.1109/ALLERTON.2018.8636008
Published in: 2018 56th Annual Allerton Conference on Communication, Control, and Computing (Allerton)
Publication date: 2018-10-01
Citations: 9 (Semantic Scholar)

Abstract

Deep learning is increasingly used to make predictions on biomedical and social science data. A ubiquitous challenge in such applications is that the training data is often incomplete: certain attributes of samples may be missing. Moreover, the pattern of which attributes are missing can have complex structure; for example, whether a participant's glucose level is measured may depend on his/her other attributes (e.g., age) as well as on the prediction target (say, diabetes status). We propose a general embedding approach to learn representations for missingness. The embedding can be a modular layer of any neural network architecture, and it is learned at the same time as the network learns to make predictions. This approach removes the need to first impute the missing attributes. That need is a key limitation, because standard imputation methods require the missingness to be random. Our systematic experimental evaluations demonstrate that missingness embedding significantly improves prediction accuracy, especially when the data missingness has structure, which is typical in practice. We show that the embedding is robust to changes in the missingness of test data (domain adaptation) and discuss how the embedding provides insight into the underlying missingness mechanism.
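The abstract does not specify how the embedding layer is built. As one plausible reading, the sketch below (NumPy, with the hypothetical class name `MissingnessEmbedding`) maps each observed value to a scaled per-feature vector and each missing value (encoded as NaN) to a separate per-feature "missing" vector, so no imputation step is needed and downstream layers can see which attributes were missing. This is an illustrative assumption, not the authors' implementation, and it shows only the forward pass; in practice all parameters would be trained jointly with the predictor in a deep learning framework.

```python
import numpy as np


class MissingnessEmbedding:
    """Hypothetical per-feature missingness embedding (illustrative sketch).

    Each of the n_features inputs is mapped to a dim-dimensional vector:
      observed value x_j  -> x_j * value_emb[j] + bias[j]
      missing value (NaN) -> missing_emb[j]  (a separate parameter vector)
    The per-feature vectors are concatenated, so downstream layers can
    tell *which* features were missing, not just that something was.
    """

    def __init__(self, n_features, dim, seed=0):
        rng = np.random.default_rng(seed)
        # Assumption: these arrays would be learned by backpropagation
        # together with the downstream network; here they are only
        # randomly initialised.
        self.value_emb = rng.normal(scale=0.1, size=(n_features, dim))
        self.bias = np.zeros((n_features, dim))
        self.missing_emb = rng.normal(scale=0.1, size=(n_features, dim))

    def __call__(self, x):
        x = np.asarray(x, dtype=float)                  # (batch, n_features)
        missing = np.isnan(x)[..., None]                # (batch, n_features, 1)
        filled = np.nan_to_num(x, nan=0.0)[..., None]   # NaN -> 0 placeholder
        observed_repr = filled * self.value_emb + self.bias
        # Use the per-feature "missing" vector wherever the input is NaN.
        out = np.where(missing, self.missing_emb, observed_repr)
        return out.reshape(x.shape[0], -1)              # (batch, n_features * dim)


# Usage: 3 attributes, 2-dimensional embedding per attribute.
emb = MissingnessEmbedding(n_features=3, dim=2)
batch = np.array([[1.0, np.nan, 0.5],
                  [np.nan, 2.0, np.nan]])
h = emb(batch)
print(h.shape)  # (2, 6)
```

Because the layer only consumes raw values with NaNs and emits a fixed-size vector, its output `h` can be fed to any downstream architecture (for example a multilayer perceptron), which matches the abstract's description of the embedding as a modular layer.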