Classifying Unstructured Clinical Notes via Automatic Weak Supervision

Machine Learning in Health Care | Pub Date: 2022-06-24 | DOI: 10.48550/arXiv.2206.12088

Chufan Gao, Mononito Goswami, Jieshi Chen, A. Dubrawski


Citations: 5

Abstract


Healthcare providers usually record detailed notes of the clinical care delivered to each patient for clinical, research, and billing purposes. Due to the unstructured nature of these narratives, providers employ dedicated staff to assign diagnostic codes to patients' diagnoses using the International Classification of Diseases (ICD) coding system. This manual process is not only time-consuming but also costly and error-prone. Prior work has demonstrated the potential utility of machine learning (ML) methods for automating this process, but it has relied on large quantities of manually labeled data to train the models. Additionally, diagnostic coding systems evolve over time, which prevents models trained with traditional supervised learning strategies from generalizing beyond local applications. In this work, we introduce a general weakly supervised text classification framework that learns from class-label descriptions only, without needing any human-labeled documents. It combines the linguistic domain knowledge stored in pre-trained language models with the data programming framework to assign code labels to individual texts. We demonstrate the efficacy and flexibility of our method by comparing it with state-of-the-art weakly supervised text classifiers on four real-world text classification datasets, and by assigning ICD codes to medical notes in the publicly available MIMIC-III database.
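The abstract describes the pipeline only at a high level. As a toy illustration of the general data-programming idea it refers to (not the authors' implementation), the sketch below turns hypothetical class-label descriptions into keyword labeling functions and combines their votes by simple majority vote, a stand-in for the learned label model that data programming normally fits. The class descriptions, function names, and example notes are made up for illustration, and no pre-trained language model is involved.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional

ABSTAIN = -1  # a labeling function returns this when it has no opinion

# Hypothetical class-label descriptions; illustrative text, not real ICD code descriptions.
CLASS_DESCRIPTIONS: Dict[int, str] = {
    0: "heart failure cardiac",
    1: "pneumonia lung infection",
}

LabelingFunction = Callable[[str], int]


def make_keyword_lf(label: int, keyword: str) -> LabelingFunction:
    """Return a labeling function that votes `label` when `keyword` is a token of the text."""
    def lf(text: str) -> int:
        return label if keyword in text.lower().split() else ABSTAIN
    return lf


def build_lfs(descriptions: Dict[int, str]) -> List[LabelingFunction]:
    """Create one keyword labeling function per word of each class description.

    Assumption: raw description words serve as keywords; no pre-trained
    language model is used here to choose or expand them.
    """
    return [
        make_keyword_lf(label, word)
        for label, desc in descriptions.items()
        for word in desc.lower().split()
    ]


def majority_vote(text: str, lfs: List[LabelingFunction]) -> Optional[int]:
    """Aggregate non-abstaining votes; return None if every LF abstains or the top vote ties."""
    votes = [v for v in (lf(text) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return None
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie between classes: leave the note unlabeled
    return ranked[0][0]


# Toy usage on three made-up notes.
lfs = build_lfs(CLASS_DESCRIPTIONS)
notes = [
    "patient admitted with acute heart failure and edema",
    "chest x-ray consistent with pneumonia in the left lung",
    "routine follow-up visit",
]
weak_labels = [majority_vote(note, lfs) for note in notes]
print(weak_labels)  # [0, 1, None]
```

In a typical data-programming setup, the weakly labeled notes would then be used to train an ordinary downstream classifier, while notes on which every labeling function abstains (or the votes tie) stay unlabeled.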


Source journal: Machine Learning in Health Care
