Addressing label noise for electronic health records: insights from computer vision for tabular data.

IF 4.3 3区 材料科学 Q1 ENGINEERING, ELECTRICAL & ELECTRONIC
Jenny Yang, Hagen Triendl, Andrew A S Soltan, Mangal Prakash, David A Clifton
{"title":"Addressing label noise for electronic health records: insights from computer vision for tabular data.","authors":"Jenny Yang, Hagen Triendl, Andrew A S Soltan, Mangal Prakash, David A Clifton","doi":"10.1186/s12911-024-02581-5","DOIUrl":null,"url":null,"abstract":"<p><p>The analysis of extensive electronic health records (EHR) datasets often calls for automated solutions, with machine learning (ML) techniques, including deep learning (DL), taking a lead role. One common task involves categorizing EHR data into predefined groups. However, the vulnerability of EHRs to noise and errors stemming from data collection processes, as well as potential human labeling errors, poses a significant risk. This risk is particularly prominent during the training of DL models, where the possibility of overfitting to noisy labels can have serious repercussions in healthcare. Despite the well-documented existence of label noise in EHR data, few studies have tackled this challenge within the EHR domain. Our work addresses this gap by adapting computer vision (CV) algorithms to mitigate the impact of label noise in DL models trained on EHR data. Notably, it remains uncertain whether CV methods, when applied to the EHR domain, will prove effective, given the substantial divergence between the two domains. We present empirical evidence demonstrating that these methods, whether used individually or in combination, can substantially enhance model performance when applied to EHR data, especially in the presence of noisy/incorrect labels. We validate our methods and underscore their practical utility in real-world EHR data, specifically in the context of COVID-19 diagnosis. Our study highlights the effectiveness of CV methods in the EHR domain, making a valuable contribution to the advancement of healthcare analytics and research.</p>","PeriodicalId":3,"journal":{"name":"ACS Applied Electronic Materials","volume":null,"pages":null},"PeriodicalIF":4.3000,"publicationDate":"2024-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11212446/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACS Applied Electronic Materials","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1186/s12911-024-02581-5","RegionNum":3,"RegionCategory":"材料科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
引用次数: 0

Abstract

The analysis of extensive electronic health records (EHR) datasets often calls for automated solutions, with machine learning (ML) techniques, including deep learning (DL), taking a lead role. One common task involves categorizing EHR data into predefined groups. However, the vulnerability of EHRs to noise and errors stemming from data collection processes, as well as potential human labeling errors, poses a significant risk. This risk is particularly prominent during the training of DL models, where the possibility of overfitting to noisy labels can have serious repercussions in healthcare. Despite the well-documented existence of label noise in EHR data, few studies have tackled this challenge within the EHR domain. Our work addresses this gap by adapting computer vision (CV) algorithms to mitigate the impact of label noise in DL models trained on EHR data. Notably, it remains uncertain whether CV methods, when applied to the EHR domain, will prove effective, given the substantial divergence between the two domains. We present empirical evidence demonstrating that these methods, whether used individually or in combination, can substantially enhance model performance when applied to EHR data, especially in the presence of noisy/incorrect labels. We validate our methods and underscore their practical utility in real-world EHR data, specifically in the context of COVID-19 diagnosis. Our study highlights the effectiveness of CV methods in the EHR domain, making a valuable contribution to the advancement of healthcare analytics and research.

解决电子健康记录的标签噪声问题:计算机视觉对表格数据的启示。
对大量电子健康记录(EHR)数据集进行分析通常需要自动化解决方案,而机器学习(ML)技术,包括深度学习(DL),在其中发挥着主导作用。一项常见的任务是将电子病历数据归类到预定义的组中。然而,电子病历容易受到数据收集过程中产生的噪音和错误以及潜在的人为标记错误的影响,这带来了巨大的风险。这种风险在 DL 模型的训练过程中尤为突出,因为过度拟合噪声标签可能会对医疗保健造成严重影响。尽管电子病历数据中存在标签噪声,但在电子病历领域应对这一挑战的研究却寥寥无几。我们的研究通过调整计算机视觉(CV)算法来减轻在 EHR 数据上训练的 DL 模型中标签噪声的影响,从而填补了这一空白。值得注意的是,由于电子病历和计算机视觉在这两个领域之间存在很大差异,因此计算机视觉方法应用于电子病历领域是否有效仍不确定。我们提出的经验证据表明,这些方法无论是单独使用还是组合使用,在应用于电子病历数据时都能大幅提高模型性能,尤其是在存在噪声/不正确标签的情况下。我们验证了我们的方法,并强调了这些方法在真实 EHR 数据中的实用性,特别是在 COVID-19 诊断中。我们的研究凸显了 CV 方法在电子病历领域的有效性,为医疗分析和研究的进步做出了宝贵的贡献。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
CiteScore
7.20
自引率
4.30%
发文量
567
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信