Diagnosis extraction from unstructured Dutch echocardiogram reports using span- and document-level characteristic classification.

IF 3.3 | Zone 3, Medicine | JCR Q2, Medical Informatics

BMC Medical Informatics and Decision Making Pub Date : 2025-03-07 DOI:10.1186/s12911-025-02897-w

Bauke Arends, Melle Vessies, Dirk van Osch, Arco Teske, Pim van der Harst, René van Es, Bram van Es

BMC Medical Informatics and Decision Making, volume 25 (1), page 115. Publication type: Journal Article. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11887187/pdf/


Abstract

Background: Clinical machine learning research and artificial-intelligence-driven clinical decision support models rely on clinically accurate labels. Manually extracting these labels with the help of clinical specialists is often time-consuming and expensive. This study tests the feasibility of automatic span- and document-level diagnosis extraction from unstructured Dutch echocardiogram reports.

Methods: We included 115,692 unstructured echocardiogram reports from the University Medical Center Utrecht, a large university hospital in the Netherlands. A randomly selected subset was manually annotated for the occurrence and severity of eleven commonly described cardiac characteristics. We developed and tested several automatic labelling techniques at both span and document levels, using weighted and macro F1-score, precision, and recall for performance evaluation. We compared the performance of span labelling against document labelling methods, which included both direct document classifiers and indirect document classifiers that rely on span classification results.
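To make the two averaging schemes named in the Methods concrete, the following is a minimal sketch of macro and weighted F1 in plain Python. It is not the authors' evaluation code; the function names and the severity labels in the usage note are assumptions chosen for illustration. Macro F1 averages per-class F1 equally, whereas weighted F1 weights each class by its number of true instances, so frequent classes dominate.

```python
from collections import Counter


def f1_per_class(y_true, y_pred):
    """Return {label: (f1, support)} for every label in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true instances per label
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the denominator is 0
        f1 = 2 * tp / denom if denom else 0.0
        scores[label] = (f1, support[label])
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = f1_per_class(y_true, y_pred)
    return sum(f1 for f1, _ in scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 weighted by each class's true-instance count."""
    scores = f1_per_class(y_true, y_pred)
    total = sum(n for _, n in scores.values())
    return sum(f1 * n for f1, n in scores.values()) / total
```

For example, with hypothetical labels `y_true = ["normal", "normal", "normal", "mild", "severe"]` and `y_pred = ["normal", "normal", "mild", "mild", "severe"]`, the per-class F1 values are 0.8, 2/3, and 1.0, so the macro average is about 0.822 and the weighted average about 0.813. A gap between the two usually signals that rare classes perform differently from common ones.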

Results: The SpanCategorizer and MedRoBERTa.nl models outperformed all other span and document classifiers, respectively. The weighted F1-score varied between characteristics, ranging from 0.60 to 0.93 in SpanCategorizer and 0.96 to 0.98 in MedRoBERTa.nl. Direct document classification was superior to indirect document classification using span classifiers. SetFit achieved competitive document classification performance using only 10% of the training data. Utilizing a reduced label set yielded near-perfect document classification results.
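The idea of indirect document classification, where a document label is derived from span-level predictions, can be sketched as below. The abstract does not state which aggregation rule the authors used, so the rule here (take the most severe span label, and fall back to a "no_label" class when no span was found) is an assumption, as are the label names and the function name.

```python
# Hypothetical severity ordering, lowest to highest; the abstract does not
# list the actual label set used in the study.
SEVERITY_ORDER = ["no_label", "normal", "mild", "moderate", "severe"]


def document_label_from_spans(span_labels):
    """Collapse the span labels found for one characteristic in one report
    into a single document-level label.

    Assumed rule: the most severe span label wins; a report with no
    predicted spans gets 'no_label'. Unknown labels raise ValueError.
    """
    if not span_labels:
        return "no_label"
    return max(span_labels, key=SEVERITY_ORDER.index)
```

Under this assumed rule, `document_label_from_spans(["normal", "moderate", "mild"])` returns `"moderate"`. Any error made by the span classifier propagates directly into the document label, which is one plausible reason a direct document classifier could score higher, as the Results report.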

Conclusion: We recommend using our published SpanCategorizer and MedRoBERTa.nl models for span- and document-level diagnosis extraction from Dutch echocardiography reports. For settings with limited training data, SetFit may be a promising alternative for document classification. Future research should be aimed at training a RoBERTa-based span classifier and applying English-based models to translated echocardiogram reports.


Source journal

BMC Medical Informatics and Decision Making (Medicine: Medical Informatics)

CiteScore: 7.20
Self-citation rate: 5.70%
Articles published: 297
Review time: 1 month

About the journal: BMC Medical Informatics and Decision Making is an open access journal publishing original peer-reviewed research articles in relation to the design, development, implementation, use, and evaluation of health information technologies and decision-making for human health.