Uncertainty management in model-based imputation for missing data

M. Azarkhail, P. Woytowitz
{"title":"Uncertainty management in model-based imputation for missing data","authors":"M. Azarkhail, P. Woytowitz","doi":"10.1109/RAMS.2013.6517697","DOIUrl":null,"url":null,"abstract":"In semiconductor industry like many other applications, the failure data is rarely available in complete form and is often flawed by missing records. When the missing process is random, the missing data can be safely ignored without major conceptual impact on the statistics of the experiment. The potential flaw with ignoring the missing data, however, is that the remaining complete observations may not carry enough statistical power, due to small sample size of the remaining population of complete failures. In some cases, the modeler may be able to describe the missing records as a function of other independent information available. Imputation of missing records from such empirical model is a typical way by which the lateral information about missing records can be leveraged. These models often carry considerable uncertainty that needs to be effectively incorporated into the data analysis process, in order to avoid false overconfidence in estimated reliability measures. In this article the uncertainty management during the model-based imputation process for missing data is discussed. The case study consists of Weibull analysis for a reliability critical component when a simple linear model is available for the missing records. Ignoring the missing records will result in relatively large uncertainty over the calculated reliability measures. The single imputation from correlation model will mark the other end of the spectrum due to an artificial boost in the statistical significance of the results as expected. The Multiple imputations and Bayesian likelihood averaging methods seem to be the most viable options when it comes to the uncertainty management in this problem. 
There seems to be some differences, however, that will be explained in detail.","PeriodicalId":189714,"journal":{"name":"2013 Proceedings Annual Reliability and Maintainability Symposium (RAMS)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 Proceedings Annual Reliability and Maintainability Symposium (RAMS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/RAMS.2013.6517697","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 4

Abstract

In the semiconductor industry, as in many other applications, failure data are rarely available in complete form and are often flawed by missing records. When the missingness is random, the missing data can be safely ignored without major conceptual impact on the statistics of the experiment. The potential flaw with ignoring missing data, however, is that the remaining complete observations may not carry enough statistical power, owing to the small sample size of the remaining population of complete failures. In some cases, the modeler can describe the missing records as a function of other independent information that is available. Imputing the missing records from such an empirical model is a typical way to leverage this lateral information. These models often carry considerable uncertainty that must be effectively incorporated into the data analysis in order to avoid false overconfidence in the estimated reliability measures. This article discusses uncertainty management during model-based imputation of missing data. The case study consists of a Weibull analysis of a reliability-critical component for which a simple linear model is available for the missing records. Ignoring the missing records results in relatively large uncertainty in the calculated reliability measures. Single imputation from the correlation model marks the other end of the spectrum, owing to an artificial boost in the statistical significance of the results. Multiple imputation and Bayesian likelihood averaging appear to be the most viable options for managing uncertainty in this problem, although they differ in some respects that are explained in detail.
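The contrast the abstract draws between single and multiple imputation can be sketched with a small simulation. The snippet below is a minimal illustration, not the paper's actual method or data: the sample sizes, the linear-model coefficients `a`, `b`, the residual spread `sigma`, and the use of median-rank regression for the Weibull fit are all assumptions made for the demo. Single imputation plugs in the regression mean and so discards the model's own noise; multiple imputation redraws that noise, and the spread of the resulting scale estimates makes the extra uncertainty visible.

```python
import math
import random

def weibull_fit(times):
    """Median-rank regression (Bernard's approximation) on a complete
    sample; returns (beta, eta) = (shape, scale)."""
    ts = sorted(times)
    n = len(ts)
    xs = [math.log(t) for t in ts]
    ys = [math.log(-math.log(1.0 - (i - 0.3) / (n + 0.4)))
          for i in range(1, n + 1)]
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    beta = slope
    eta = math.exp(-intercept / beta)
    return beta, eta

random.seed(1)

# Observed complete failure times (hours), here drawn from Weibull(shape=2, scale=1000)
complete = [1000.0 * random.weibullvariate(1.0, 2.0) for _ in range(15)]

# Hypothetical linear model for 10 missing records: t = a + b*x + eps,
# with covariate x and residual sigma assumed known from the regression fit.
a, b, sigma = 200.0, 80.0, 150.0
x_missing = [random.uniform(5.0, 15.0) for _ in range(10)]

# --- Single imputation: plug in the regression mean, ignoring sigma ---
single = complete + [a + b * x for x in x_missing]
beta_si, eta_si = weibull_fit(single)

# --- Multiple imputation: m completed datasets carrying the model's noise ---
m = 50
etas = []
for _ in range(m):
    imputed = complete + [a + b * x + random.gauss(0.0, sigma) for x in x_missing]
    etas.append(weibull_fit(imputed)[1])

eta_mi = sum(etas) / m
between = sum((e - eta_mi) ** 2 for e in etas) / (m - 1)  # between-imputation variance

print(f"single imputation:   eta = {eta_si:.0f} h (no imputation uncertainty)")
print(f"multiple imputation: eta = {eta_mi:.0f} h, "
      f"between-imputation spread = {math.sqrt(between):.0f} h")
```

The nonzero between-imputation variance is exactly the component that single imputation suppresses; in a full analysis it would be pooled with the within-imputation variance (Rubin's rules) to widen the confidence bounds on the reliability measures.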