Automatic assignment of diagnosis codes to free-form text medical note

J. Univers. Comput. Sci. | Pub Date: 2023-04-28 | DOI: 10.3897/jucs.89923

Stefan Strydom, Andrei Michael Dreyer, Brink van der Merwe

Pages: 349-373

Citations: 0

Abstract



International Classification of Diseases (ICD) coding plays a significant role in classifying morbidity and mortality rates. Currently, ICD codes are assigned to a patient’s medical record by hand by medical practitioners or specialist clinical coders. This practice is prone to errors, and training skilled clinical coders requires time and human resources. Automatic prediction of ICD codes can help alleviate this burden. In this paper, we propose a transformer-based architecture with label-wise attention for predicting ICD codes on a medical dataset. The transformer model is first pre-trained from scratch on a medical dataset. Once this is done, the pre-trained model is used to generate representations of the tokens in the clinical documents, which are fed into the label-wise attention layer. Finally, the outputs from the label-wise attention layer are fed into a feed-forward neural network to predict appropriate ICD codes for the input document. We evaluate our model using hospital discharge summaries and their corresponding ICD-9 codes from the MIMIC-III dataset. Our experimental results show that our transformer model outperforms all previous models in terms of micro-F1 for the full label set from the MIMIC-III dataset. This is also the first successful application of a pre-trained transformer architecture to the auto-coding problem on the full MIMIC-III dataset.
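
The pipeline described in the abstract (token representations, then label-wise attention, then a per-label prediction) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the exact attention formulation or the shape of the feed-forward network, so the sketch assumes a common dot-product form of label-wise attention and a single linear layer with a sigmoid per label. The names `label_wise_attention`, `predict_probs`, `H`, `U`, `W` and `b` are hypothetical, and random numbers stand in for the pre-trained transformer's outputs and for learned weights.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def label_wise_attention(H, U):
    """H: n_tokens x d token representations; U: n_labels x d label queries.

    Returns (A, V): A[l] is label l's attention distribution over tokens,
    V[l] is the attention-weighted sum of token vectors for label l.
    (Assumed dot-product formulation, not taken from the paper.)
    """
    d = len(H[0])
    A, V = [], []
    for u in U:
        weights = softmax([dot(u, h) for h in H])
        A.append(weights)
        V.append([sum(w * h[j] for w, h in zip(weights, H)) for j in range(d)])
    return A, V


def predict_probs(V, W, b):
    """One sigmoid output per label: p_l = sigmoid(W[l] . V[l] + b[l])."""
    return [
        1.0 / (1.0 + math.exp(-(dot(w, v) + bias)))
        for w, v, bias in zip(W, V, b)
    ]


# Toy example: random values stand in for encoder outputs and learned weights.
rng = random.Random(0)
n_tokens, d, n_labels = 6, 4, 3
H = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n_tokens)]
U = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n_labels)]
W = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n_labels)]
b = [0.0] * n_labels

A, V = label_wise_attention(H, U)
probs = predict_probs(V, W, b)
# Multi-label decision: every code whose probability clears the threshold.
predicted = [label for label, p in enumerate(probs) if p >= 0.5]
```

Each label gets its own attention distribution over the document's tokens, so different ICD codes can draw on different parts of a discharge summary. The 0.5 threshold is an illustrative choice, not a value reported in the abstract.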
