Incomplete Data Analysis

Bo-Wei Chen, Jia-Ching Wang
{"title":"Incomplete Data Analysis","authors":"Bo-Wei Chen, Jia-Ching Wang","doi":"10.5772/INTECHOPEN.94068","DOIUrl":null,"url":null,"abstract":"This chapter discusses missing-value problems from the perspective of machine learning. Missing values frequently occur during data acquisition. When a dataset contains missing values, nonvectorial data are generated. This subsequently causes a serious problem in pattern recognition models because nonvectorial data need further data wrangling before models are built. In view of such, this chapter reviews the methodologies of related works and examines their empirical effectiveness. At present, a great deal of effort has been devoted in this field, and those works can be roughly divided into two types — Multiple imputation and single imputation, where the latter can be further classified into subcategories. They include deletion, fixed-value replacement, K-Nearest Neighbors, regression, tree-based algorithms, and latent component-based approaches. In this chapter, those approaches are introduced and commented. Finally, numerical examples are provided along with recommendations on future development.","PeriodicalId":169871,"journal":{"name":"Applications of Pattern Recognition","volume":"51 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applications of Pattern Recognition","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5772/INTECHOPEN.94068","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

This chapter discusses missing-value problems from the perspective of machine learning. Missing values frequently occur during data acquisition. When a dataset contains missing values, nonvectorial data are generated. This subsequently causes a serious problem in pattern recognition models because nonvectorial data need further data wrangling before models are built. In view of such, this chapter reviews the methodologies of related works and examines their empirical effectiveness. At present, a great deal of effort has been devoted in this field, and those works can be roughly divided into two types — Multiple imputation and single imputation, where the latter can be further classified into subcategories. They include deletion, fixed-value replacement, K-Nearest Neighbors, regression, tree-based algorithms, and latent component-based approaches. In this chapter, those approaches are introduced and commented. Finally, numerical examples are provided along with recommendations on future development.
不完整数据分析
本章从机器学习的角度讨论了缺失值问题。在数据采集过程中经常会出现丢失值的情况。当数据集包含缺失值时,将生成非矢量数据。这在模式识别模型中导致了一个严重的问题,因为在模型建立之前,非向量数据需要进一步的数据整理。鉴于此,本章回顾了相关工作的方法,并检验了它们的实证有效性。目前,这方面的工作已经做了大量的工作,大致可以分为两类,即多次归算和单次归算,单次归算又可以进一步细分。它们包括删除、固定值替换、k近邻、回归、基于树的算法和基于潜在组件的方法。本章对这些方法进行了介绍和评述。最后给出了数值算例,并对今后的发展提出了建议。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信