Secondary Use of Clinical Problem List Descriptions for Bi-Encoder Based ICD-10 Classification.

AMIA ... Annual Symposium proceedings. AMIA Symposium. Pub Date: 2025-05-22. eCollection Date: 2024-01-01.

Markus Kreuzthaler, Bastian Pfeifer, Stefan Schulz

Abstract

Annotated language resources are essential for supervised machine learning methods. In the clinical domain, such data sets can boost use-case-specific natural language processing services. In this work, we analyzed a clinical problem list table consisting of millions of ICD-10 codes assigned to short problem list descriptions in German. We investigated whether these data form a valuable resource in a secondary-use scenario for coding support. Our proposed methodology uses an embedding-based k-NN classifier, which we evaluated on its coding performance. We compared the multilingual BERT-based language model SapBERT-UMLS with medBERT.de, which is specifically tailored to medical and clinical language resources in German. Our approach reached a weighted F1-measure of 0.87 using SapBERT-UMLS and an F1-measure of 0.86 using medBERT.de. These results are promising for coding support that reuses annotated language resources from routine clinical documentation.
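The embedding-based k-NN idea can be illustrated with a minimal sketch. This is not the authors' implementation: `embed` is a toy stand-in (hashed character trigrams) for a real bi-encoder such as SapBERT-UMLS or medBERT.de, and the problem descriptions, the code assignments in `INDEX`, and the choice of k = 3 are hypothetical examples chosen for illustration only.

```python
import math
import zlib
from collections import Counter
from typing import List, Tuple

DIM = 256  # toy embedding size (assumption; a real encoder defines its own)


def embed(text: str) -> List[float]:
    """Toy stand-in for a bi-encoder: hashed character trigrams, L2-normalised."""
    vec = [0.0] * DIM
    padded = "  " + text.lower() + "  "
    for i in range(len(padded) - 2):
        trigram = padded[i:i + 3]
        vec[zlib.crc32(trigram.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; inputs are already unit length, so a dot product suffices."""
    return sum(x * y for x, y in zip(a, b))


def build_index(pairs: List[Tuple[str, str]]) -> List[Tuple[List[float], str]]:
    """Embed every (description, code) pair once, as a lookup table would be."""
    return [(embed(description), code) for description, code in pairs]


def knn_predict(query: str, index: List[Tuple[List[float], str]], k: int = 3) -> str:
    """Return the majority code among the k most similar indexed descriptions."""
    q = embed(query)
    nearest = sorted(index, key=lambda item: cosine(q, item[0]), reverse=True)[:k]
    votes = Counter(code for _, code in nearest)
    # Ties fall to the code seen first, i.e. the one with the closest neighbour.
    return votes.most_common(1)[0][0]


# Hypothetical problem list entries with illustrative ICD-10 codes.
INDEX = build_index([
    ("Diabetes mellitus Typ 2", "E11.9"),
    ("Typ-2-Diabetes", "E11.9"),
    ("DM Typ II", "E11.9"),
    ("Arterielle Hypertonie", "I10"),
    ("art. Hypertonie", "I10"),
    ("Hypertonie, essentiell", "I10"),
    ("Pneumonie", "J18.9"),
    ("Lungenentzündung", "J18.9"),
])

if __name__ == "__main__":
    for text in ("Diabetes Typ 2", "Hypertonie"):
        print(text, "->", knn_predict(text, INDEX))
```

In a realistic setting, `embed` would be replaced by the sentence representation of a pretrained language model, and the sorted scan by an approximate nearest-neighbour index, since a table with millions of entries makes exhaustive comparison expensive.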
