Language model-based labeling of German thoracic radiology reports.

IF 1.3 · CAS zone 4 (Medicine) · Q3 RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING

Rofo-fortschritte Auf Dem Gebiet Der Rontgenstrahlen Und Der Bildgebenden Verfahren · Pub Date: 2025-01-01 · Epub Date: 2024-04-25 · DOI: 10.1055/a-2287-5054

Alessandro Wollek, Philip Haitzer, Thomas Sedlmeyr, Sardi Hyska, Johannes Rueckel, Bastian O Sabel, Michael Ingrisch, Tobias Lasser

Journal Article · pp. 55-64

Citations: 0

Abstract

The aim of this study was to explore the potential of weak supervision for a deep learning-based label prediction model. The model is intended to extract labels from German free-text thoracic radiology reports of chest X-rays and to provide these labels for training chest X-ray classification models.

The proposed label extraction model for German thoracic radiology reports uses a German BERT encoder as its backbone and classifies each report according to the CheXpert labels. To investigate how efficiently manually annotated data can be used, the model was trained on manual annotations, on weak rule-based labels, and on both. Rule-based labels were extracted from 66,071 retrospectively collected radiology reports from 2017-2021 (DS 0), and 1,091 reports from 2020-2021 (DS 1) were manually labeled according to the CheXpert classes. Label extraction performance was evaluated with F1 scores for mention extraction, negation detection, and uncertainty detection. The influence of the label extraction method on chest X-ray classification was evaluated on a pneumothorax data set (DS 2) containing 6,434 chest radiographs with associated reports and expert diagnoses of pneumothorax. For this purpose, DenseNet-121 models trained on manual annotations, on rule-based and deep learning-based label predictions, and on publicly available data were compared.

On DS 1, the proposed deep learning-based labeler (DL) performed considerably better on average than the rule-based labeler (RB) in all three tasks, with F1 scores of 0.938 vs. 0.844 for mention extraction, 0.891 vs. 0.821 for negation detection, and 0.624 vs. 0.518 for uncertainty detection. Pre-training on DS 0 and fine-tuning on DS 1 outperformed training on DS 0 or DS 1 alone. On DS 2, pneumothorax classification performance was highest when the classifier was trained with DL labels, with an area under the receiver operating characteristic curve (AUC) of 0.939, compared with an AUC of 0.858 for RB labels. Training with manual labels performed slightly worse than training with DL labels, with an AUC of 0.934. In contrast, training with a public data set resulted in an AUC of 0.720.

Our results show that leveraging a rule-based report labeler for weak supervision improves labeling performance. The pneumothorax classification results demonstrate that the proposed deep learning-based labeler can substitute for manual labeling while requiring only about 1000 manually annotated reports for training.

· The proposed deep learning-based label extraction model for German thoracic radiology reports performs better than the rule-based model.
· Training with limited supervision outperformed training with a small manually labeled data set.
· Using predicted labels for pneumothorax classification from chest radiographs performed on par with using manual annotations.

Wollek A, Haitzer P, Sedlmeyr T et al. Language model-based labeling of German thoracic radiology reports. Fortschr Röntgenstr 2024; DOI 10.1055/a-2287-5054.
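The abstract reports F1 scores for three label-extraction tasks and AUC values for pneumothorax classification, but it does not spell out how these metrics are computed. The Python sketch below shows one common way to compute them. It assumes the CheXpert label convention (1 = positive, 0 = negative, -1 = uncertain, blank = not mentioned) and the task definitions used in the original CheXpert labeler evaluation: mention extraction counts any mention as the positive class, negation detection counts only negative mentions, and uncertainty detection counts only uncertain mentions. The paper may define or average these scores differently, and the helper names `task_f1`, `macro_task_f1`, and `roc_auc` are hypothetical.

```python
"""Sketch of the evaluation metrics named in the abstract (hypothetical helpers).

Assumptions, not taken from the paper: labels use the CheXpert convention
1 = positive, 0 = negative, -1 = uncertain, None = not mentioned, and the
three extraction tasks are scored as in the CheXpert labeler evaluation.
"""
from __future__ import annotations

from typing import Optional, Sequence

Label = Optional[int]  # 1, 0, -1, or None (blank)

# Label values treated as the positive class for each extraction task.
TASK_POSITIVE = {
    "mention": {1, 0, -1},  # any mention, regardless of polarity
    "negation": {0},        # negated mentions only
    "uncertainty": {-1},    # uncertain mentions only
}


def task_f1(y_true: Sequence[Label], y_pred: Sequence[Label], task: str) -> float:
    """Binary F1 for one CheXpert class and one task across all reports."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    positive = TASK_POSITIVE[task]
    tp = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        t_pos, p_pos = t in positive, p in positive
        tp += int(t_pos and p_pos)
        fp += int(p_pos and not t_pos)
        fn += int(t_pos and not p_pos)
    denom = 2 * tp + fp + fn
    # Convention chosen here: F1 is 0.0 when the task never occurs in truth or prediction.
    return 2 * tp / denom if denom else 0.0


def macro_task_f1(per_class: dict[str, tuple[Sequence[Label], Sequence[Label]]], task: str) -> float:
    """Unweighted mean of task_f1 over classes (the averaging scheme is an assumption)."""
    scores = [task_f1(t, p, task) for t, p in per_class.values()]
    return sum(scores) / len(scores)


def roc_auc(scores: Sequence[float], targets: Sequence[int]) -> float:
    """AUC as the probability that a random positive case outscores a random negative one.

    Ties count as half a win (Mann-Whitney formulation).
    """
    pos = [s for s, y in zip(scores, targets) if y == 1]
    neg = [s for s, y in zip(scores, targets) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            wins += 1.0 if sp > sn else 0.5 if sp == sn else 0.0
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    truth = [1, 0, None, -1, 1, None]  # toy reference labels for one class
    pred = [1, 0, 0, 1, None, None]    # toy labeler output for the same reports
    for task in TASK_POSITIVE:
        print(task, round(task_f1(truth, pred, task), 3))
    # mention 0.75 / negation 0.667 / uncertainty 0.0
    print("AUC", roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # AUC 0.75
```

The pairwise AUC loop is quadratic in the number of cases, which keeps the definition visible but is slow for a data set the size of DS 2; in practice a rank-based computation or a library routine such as scikit-learn's `roc_auc_score` would be the usual choice.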

Source journal: Rofo-fortschritte Auf Dem Gebiet Der Rontgenstrahlen Und Der Bildgebenden Verfahren (category: Medicine / Nuclear Medicine)

CiteScore: 1.20 · Self-citation rate: 5.60% · Articles published: 340

Journal description: RöFo publishes original articles, review articles, and case reports from radiology and the other imaging procedures in medicine. Only manuscripts that have not yet been published and are not simultaneously under consideration by another journal may be submitted. All submitted contributions undergo careful peer review. Founded in 1896, barely one year after C.W. Röntgen discovered X-rays, RöFo looks back on more than 100 years of experience as the most important publication medium in German-language radiology. This makes it the oldest radiological journal, and it successfully combines long continuity with the aspiration of scientific publishing at an international level. Thanks to its central place in the publishing program, RöFo formed the basis of the comprehensive and successful range of radiology media that Georg Thieme Verlag offers today. RöFo is particularly closely linked to the history of the radiological societies in Germany and Austria. It is the official journal of the DRG and the ÖRG, and members of these societies receive the journal as part of their membership. With its core scientific section and the societies' own news section, RöFo offers a monthly forum for exchanging content and messages within the radiology community in the German-speaking countries.