用于检测医疗保健分析中误标的无监督错误检测方法。

IF 3.8 3区医学 Q2 ENGINEERING, BIOMEDICAL

Bioengineering Pub Date : 2024-07-31 DOI:10.3390/bioengineering11080770

Pei-Yuan Zhou, Faith Lum, Tony Jiecao Wang, Anubhav Bhatti, Surajsinh Parmar, Chen Dan, Andrew K C Wong

{"title":"用于检测医疗保健分析中误标的无监督错误检测方法。","authors":"Pei-Yuan Zhou, Faith Lum, Tony Jiecao Wang, Anubhav Bhatti, Surajsinh Parmar, Chen Dan, Andrew K C Wong","doi":"10.3390/bioengineering11080770","DOIUrl":null,"url":null,"abstract":"Medical datasets may be imbalanced and contain errors due to subjective test results and clinical variability. The poor quality of original data affects classification accuracy and reliability. Hence, detecting abnormal samples in the dataset can help clinicians make better decisions. In this study, we propose an unsupervised error detection method using patterns discovered by the Pattern Discovery and Disentanglement (PDD) model, developed in our earlier work. Applied to the large data, the eICU Collaborative Research Database for sepsis risk assessment, the proposed algorithm can effectively discover statistically significant association patterns, generate an interpretable knowledge base for interpretability, cluster samples in an unsupervised learning manner, and detect abnormal samples from the dataset. As shown in the experimental result, our method outperformed K-Means by 38% on the full dataset and 47% on the reduced dataset for unsupervised clustering. Multiple supervised classifiers improve accuracy by an average of 4% after removing abnormal samples by the proposed error detection approach. Therefore, the proposed algorithm provides a robust and practical solution for unsupervised clustering and error detection in healthcare data.","PeriodicalId":8874,"journal":{"name":"Bioengineering","volume":null,"pages":null},"PeriodicalIF":3.8000,"publicationDate":"2024-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11351123/pdf/","citationCount":"0","resultStr":"{\"title\":\"An Unsupervised Error Detection Methodology for Detecting Mislabels in Healthcare Analytics.\",\"authors\":\"Pei-Yuan Zhou, Faith Lum, Tony Jiecao Wang, Anubhav Bhatti, Surajsinh Parmar, Chen Dan, Andrew K C Wong\",\"doi\":\"10.3390/bioengineering11080770\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Medical datasets may be imbalanced and contain errors due to subjective test results and clinical variability. The poor quality of original data affects classification accuracy and reliability. Hence, detecting abnormal samples in the dataset can help clinicians make better decisions. In this study, we propose an unsupervised error detection method using patterns discovered by the Pattern Discovery and Disentanglement (PDD) model, developed in our earlier work. Applied to the large data, the eICU Collaborative Research Database for sepsis risk assessment, the proposed algorithm can effectively discover statistically significant association patterns, generate an interpretable knowledge base for interpretability, cluster samples in an unsupervised learning manner, and detect abnormal samples from the dataset. As shown in the experimental result, our method outperformed K-Means by 38% on the full dataset and 47% on the reduced dataset for unsupervised clustering. Multiple supervised classifiers improve accuracy by an average of 4% after removing abnormal samples by the proposed error detection approach. Therefore, the proposed algorithm provides a robust and practical solution for unsupervised clustering and error detection in healthcare data.\",\"PeriodicalId\":8874,\"journal\":{\"name\":\"Bioengineering\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":3.8000,\"publicationDate\":\"2024-07-31\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11351123/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Bioengineering\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://doi.org/10.3390/bioengineering11080770\",\"RegionNum\":3,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"ENGINEERING, BIOMEDICAL\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Bioengineering","FirstCategoryId":"5","ListUrlMain":"https://doi.org/10.3390/bioengineering11080770","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, BIOMEDICAL","Score":null,"Total":0}

引用次数: 0

摘要

由于主观测试结果和临床变异，医疗数据集可能不平衡并包含误差。原始数据质量差会影响分类的准确性和可靠性。因此，检测数据集中的异常样本可以帮助临床医生做出更好的决策。在本研究中，我们提出了一种无监督错误检测方法，该方法使用的模式是我们早期工作中开发的模式发现和纠错（PDD）模型所发现的模式。将该算法应用于用于脓毒症风险评估的 eICU 协作研究数据库这一大型数据，可以有效地发现具有统计意义的关联模式，生成可解释性知识库，以无监督学习的方式对样本进行聚类，并从数据集中检测出异常样本。实验结果表明，在无监督聚类方面，我们的方法在完整数据集上比 K-Means 高出 38%，在缩小数据集上比 K-Means 高出 47%。在使用所提出的错误检测方法剔除异常样本后，多重监督分类器的准确率平均提高了 4%。因此，所提出的算法为医疗数据的无监督聚类和错误检测提供了一种稳健实用的解决方案。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

An Unsupervised Error Detection Methodology for Detecting Mislabels in Healthcare Analytics.

Medical datasets may be imbalanced and contain errors due to subjective test results and clinical variability. The poor quality of original data affects classification accuracy and reliability. Hence, detecting abnormal samples in the dataset can help clinicians make better decisions. In this study, we propose an unsupervised error detection method using patterns discovered by the Pattern Discovery and Disentanglement (PDD) model, developed in our earlier work. Applied to the large data, the eICU Collaborative Research Database for sepsis risk assessment, the proposed algorithm can effectively discover statistically significant association patterns, generate an interpretable knowledge base for interpretability, cluster samples in an unsupervised learning manner, and detect abnormal samples from the dataset. As shown in the experimental result, our method outperformed K-Means by 38% on the full dataset and 47% on the reduced dataset for unsupervised clustering. Multiple supervised classifiers improve accuracy by an average of 4% after removing abnormal samples by the proposed error detection approach. Therefore, the proposed algorithm provides a robust and practical solution for unsupervised clustering and error detection in healthcare data.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Bioengineering Chemical Engineering-Bioengineering

CiteScore

4.00

自引率

8.70%

发文量

661

期刊介绍： Aims Bioengineering (ISSN 2306-5354) provides an advanced forum for the science and technology of bioengineering. It publishes original research papers, comprehensive reviews, communications and case reports. Our aim is to encourage scientists to publish their experimental and theoretical results in as much detail as possible. All aspects of bioengineering are welcomed from theoretical concepts to education and applications. There is no restriction on the length of the papers. The full experimental details must be provided so that the results can be reproduced. There are, in addition, four key features of this Journal: ● We are introducing a new concept in scientific and technical publications “The Translational Case Report in Bioengineering”. It is a descriptive explanatory analysis of a transformative or translational event. Understanding that the goal of bioengineering scholarship is to advance towards a transformative or clinical solution to an identified transformative/clinical need, the translational case report is used to explore causation in order to find underlying principles that may guide other similar transformative/translational undertakings. ● Manuscripts regarding research proposals and research ideas will be particularly welcomed. ● Electronic files and software regarding the full details of the calculation and experimental procedure, if unable to be published in a normal way, can be deposited as supplementary material. ● We also accept manuscripts communicating to a broader audience with regard to research projects financed with public funds. Scope ● Bionics and biological cybernetics: implantology; bio–abio interfaces ● Bioelectronics: wearable electronics; implantable electronics; “more than Moore” electronics; bioelectronics devices ● Bioprocess and biosystems engineering and applications: bioprocess design; biocatalysis; bioseparation and bioreactors; bioinformatics; bioenergy; etc. ● Biomolecular, cellular and tissue engineering and applications: tissue engineering; chromosome engineering; embryo engineering; cellular, molecular and synthetic biology; metabolic engineering; bio-nanotechnology; micro/nano technologies; genetic engineering; transgenic technology ● Biomedical engineering and applications: biomechatronics; biomedical electronics; biomechanics; biomaterials; biomimetics; biomedical diagnostics; biomedical therapy; biomedical devices; sensors and circuits; biomedical imaging and medical information systems; implants and regenerative medicine; neurotechnology; clinical engineering; rehabilitation engineering ● Biochemical engineering and applications: metabolic pathway engineering; modeling and simulation ● Translational bioengineering