Multi-label Few-shot ICD Coding as Autoregressive Generation with Prompt.

Zhichao Yang, Sunjae Kwon, Zonghai Yao, Hong Yu
Proceedings of the AAAI Conference on Artificial Intelligence, 37(4), pp. 5366-5374
DOI: 10.1609/aaai.v37i4.25668
Published: 2023-06-26
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10457101/pdf/nihms-1875188.pdf

Abstract

Automatic International Classification of Diseases (ICD) coding aims to assign multiple ICD codes to a medical note, which averages 3,000+ tokens. The task is challenging due to the high-dimensional label space of multi-label assignment (155,000+ candidate ICD codes) and the long-tail distribution: many ICD codes are assigned infrequently, yet these infrequent codes are clinically important. This study addresses the long-tail challenge by transforming the multi-label classification task into an autoregressive generation task. Specifically, we first introduce a novel pretraining objective that generates free-text diagnoses and procedures following the SOAP structure, the medical logic physicians use to document notes. Second, instead of predicting directly in the high-dimensional space of ICD codes, our model generates lower-dimensional text descriptions, from which the ICD codes are then inferred. Third, we design a novel prompt template for multi-label classification. We evaluate our Generation with Prompt (GPsoap) model on the full code assignment benchmark (MIMIC-III-full) and the few-shot ICD code assignment benchmark (MIMIC-III-few). Experiments on MIMIC-III-few show that our model achieves a macro F1 of 30.2, substantially outperforming the previous MIMIC-III-full SOTA model (macro F1 4.3) and a model specifically designed for the few/zero-shot setting (macro F1 18.7). Finally, we design a novel ensemble learner, a cross-attention reranker with prompts, to integrate the previous SOTA predictions with our best few-shot coding predictions. Experiments on MIMIC-III-full show that the ensemble learner substantially improves both macro and micro F1, from 10.4 to 14.6 and from 58.2 to 59.1, respectively.
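The "generate descriptions, then infer codes" step can be pictured with a toy sketch. Everything here is illustrative, not the paper's method: the three-entry code table is hypothetical (real ICD tables have 155,000+ entries), and plain string similarity stands in for the learned matching between generated text and official code descriptions.

```python
from difflib import SequenceMatcher

# Toy ICD code table: code -> official description (illustrative only).
ICD_DESCRIPTIONS = {
    "I10": "essential (primary) hypertension",
    "E11.9": "type 2 diabetes mellitus without complications",
    "J45.909": "unspecified asthma, uncomplicated",
}

def infer_code(generated_text: str) -> str:
    """Map one generated free-text description to the ICD code whose
    official description it most closely matches."""
    def score(code: str) -> float:
        return SequenceMatcher(
            None, generated_text.lower(), ICD_DESCRIPTIONS[code]
        ).ratio()
    return max(ICD_DESCRIPTIONS, key=score)

print(infer_code("primary hypertension"))  # matches code I10
```

The point of the indirection is that the model only ever has to produce text in the (comparatively small) space of clinical language; the code assignment falls out of the lookup rather than a 155,000-way classifier head.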
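The macro/micro F1 gap quoted above is central to the long-tail argument: macro F1 averages per-label F1 so rare codes count as much as frequent ones, while micro F1 pools counts over all labels and is dominated by frequent codes. A minimal stdlib sketch of both metrics (the label sets are made-up examples):

```python
def micro_macro_f1(y_true, y_pred):
    """Micro and macro F1 for multi-label predictions.
    y_true, y_pred: lists of sets of label ids, one set per note."""
    labels = set().union(*y_true, *y_pred)
    tp = {l: 0 for l in labels}
    fp = dict(tp)
    fn = dict(tp)
    for t, p in zip(y_true, y_pred):
        for l in p & t:
            tp[l] += 1  # correctly assigned
        for l in p - t:
            fp[l] += 1  # assigned but wrong
        for l in t - p:
            fn[l] += 1  # missed
    def f1(tp_, fp_, fn_):
        return 2 * tp_ / (2 * tp_ + fp_ + fn_) if tp_ + fp_ + fn_ else 0.0
    # Macro: unweighted mean of per-label F1, so rare codes weigh equally.
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    # Micro: pool counts first, so frequent codes dominate.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return micro, macro

# A model that always predicts the frequent code and misses the rare one
# still scores well on micro F1 (0.8) but poorly on macro F1 (0.5).
print(micro_macro_f1([{"I10"}, {"I10", "E11.9"}], [{"I10"}, {"I10"}]))
```

This is why a micro F1 in the high 50s can coexist with a macro F1 near 10 on MIMIC-III-full, and why few-shot gains show up most clearly in macro F1.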
