Target-Focused Feature Selection Using Uncertainty Measurements in Healthcare Data

Orpaz Goldstein, Mohammad Kachuee, Kimmo Kärkkäinen, M. Sarrafzadeh
{"title":"Target-Focused Feature Selection Using Uncertainty Measurements in Healthcare Data","authors":"Orpaz Goldstein, Mohammad Kachuee, Kimmo Kärkkäinen, M. Sarrafzadeh","doi":"10.1145/3383685","DOIUrl":null,"url":null,"abstract":"Healthcare big data remains under-utilized due to various incompatibility issues between the domains of data analytics and healthcare. The lack of generalizable iterative feature acquisition methods under budget and machine learning models that allow reasoning with a model’s uncertainty are two examples. Meanwhile, a boost to the available data is currently under way with the rapid growth in the Internet of Things applications and personalized healthcare. For the healthcare domain to be able to adopt models that take advantage of this big data, machine learning models should be coupled with more informative, germane feature acquisition methods, consequently adding robustness to the model’s results. We introduce an approach to feature selection that is based on Bayesian learning, allowing us to report the level of uncertainty in the model, combined with false-positive and false-negative rates. In addition, measuring target-specific uncertainty lifts the restriction on feature selection being target agnostic, allowing for feature acquisition based on a target of focus. We show that acquiring features for a specific target is at least as good as deep learning feature selection methods and common linear feature selection approaches for small non-sparse datasets, and surpasses these when faced with real-world data that is larger in scale and sparseness.","PeriodicalId":72043,"journal":{"name":"ACM transactions on computing for healthcare","volume":" ","pages":"1 - 17"},"PeriodicalIF":0.0000,"publicationDate":"2020-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/3383685","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM transactions on computing for healthcare","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3383685","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Healthcare big data remains under-utilized due to various incompatibility issues between the domains of data analytics and healthcare. The lack of generalizable iterative feature acquisition methods under budget and machine learning models that allow reasoning with a model’s uncertainty are two examples. Meanwhile, a boost to the available data is currently under way with the rapid growth in the Internet of Things applications and personalized healthcare. For the healthcare domain to be able to adopt models that take advantage of this big data, machine learning models should be coupled with more informative, germane feature acquisition methods, consequently adding robustness to the model’s results. We introduce an approach to feature selection that is based on Bayesian learning, allowing us to report the level of uncertainty in the model, combined with false-positive and false-negative rates. In addition, measuring target-specific uncertainty lifts the restriction on feature selection being target agnostic, allowing for feature acquisition based on a target of focus. We show that acquiring features for a specific target is at least as good as deep learning feature selection methods and common linear feature selection approaches for small non-sparse datasets, and surpasses these when faced with real-world data that is larger in scale and sparseness.
在医疗保健数据中使用不确定性测量的以目标为中心的特征选择
由于数据分析和医疗保健领域之间存在各种不兼容问题,医疗保健大数据仍未得到充分利用。在预算和机器学习模型下缺乏可推广的迭代特征获取方法,允许对模型的不确定性进行推理就是两个例子。与此同时,随着物联网应用和个性化医疗的快速增长,可用数据正在不断增加。为了使医疗保健领域能够采用利用这些大数据的模型,机器学习模型应该与更多信息、相关的特征获取方法相结合,从而增加模型结果的鲁棒性。我们引入了一种基于贝叶斯学习的特征选择方法,允许我们报告模型中的不确定性水平,并结合假阳性和假阴性率。此外,测量目标特定的不确定性解除了对目标不可知的特征选择的限制,允许基于焦点目标的特征获取。我们表明,对于小型非稀疏数据集,获取特定目标的特征至少与深度学习特征选择方法和常见线性特征选择方法一样好,并且在面对规模和稀疏度更大的现实世界数据时优于这些方法。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
CiteScore
10.30
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信