Extraction of Radiological Characteristics From Free-Text Imaging Reports Using Natural Language Processing Among Patients With Ischemic and Hemorrhagic Stroke: Algorithm Development and Validation.

IF 1.5 · Zone 4 · Agricultural and Forestry Sciences · Q3 FISHERIES
Enshuo Hsu, Abdulaziz T Bako, Thomas Potter, Alan P Pan, Gavin W Britz, Jonika Tannous, Farhaan S Vahidy
DOI: 10.2196/42884 (https://doi.org/10.2196/42884)
Journal: Aquatic Living Resources, e42884
Published: 2023-06-06
Publication type: Journal Article
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11041442/pdf/
Citations: 0

Abstract

Background: Neuroimaging is the gold-standard diagnostic modality for all patients with suspected stroke. However, the unstructured nature of imaging reports remains a major challenge to extracting useful information from electronic health record systems. Despite the increasing adoption of natural language processing (NLP) for radiology reports, information extraction for many stroke imaging features has not been systematically evaluated.

Objective: In this study, we propose an NLP pipeline that adopts the state-of-the-art ClinicalBERT model, with domain-specific pretraining and task-oriented fine-tuning, to extract 13 stroke features from head computed tomography imaging notes.

Methods: We used the model to generate structured data sets recording the presence or absence of common stroke features for 24,924 patients with stroke. We compared the survival characteristics of patients with and without features of severe stroke (eg, midline shift, perihematomal edema, or mass effect) using Kaplan-Meier curves and log-rank tests.
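
The survival comparison described in the Methods can be sketched in a few lines. This is a minimal, self-contained illustration, not the authors' code: it assumes each admission has already been reduced to a record of binary feature flags plus a follow-up time and a death indicator, and the field names (`midline_shift`, `perihematomal_edema`, `mass_effect`) and the helper `any_severe` are hypothetical. It implements the product-limit (Kaplan-Meier) estimator and the standard two-group log-rank test, taking the p-value from a chi-square distribution with 1 degree of freedom.

```python
import math
from collections import Counter

# Hypothetical field names for the severe-stroke features named in the abstract.
SEVERE_FEATURES = ("midline_shift", "perihematomal_edema", "mass_effect")


def any_severe(record):
    """True if any severe-stroke feature is flagged present in an extracted record."""
    return any(record.get(name, False) for name in SEVERE_FEATURES)


def kaplan_meier(times, events):
    """Product-limit estimate as [(t, S(t))] at each distinct event time.

    events[i] is 1 if subject i died at times[i], 0 if censored there.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # every subject leaves the risk set at its own time
    at_risk, surv, curve = len(times), 1.0, []
    for t in sorted(leaving):
        d = deaths[t]
        if d:
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        at_risk -= leaving[t]  # subjects censored at t still count as at risk at t
    return curve


def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value with 1 df)."""
    deaths_a = Counter(t for t, e in zip(times_a, events_a) if e)
    deaths_b = Counter(t for t, e in zip(times_b, events_b) if e)
    leave_a, leave_b = Counter(times_a), Counter(times_b)
    n_a, n_b = len(times_a), len(times_b)
    obs_minus_exp = var = 0.0
    for t in sorted(set(leave_a) | set(leave_b)):
        n, d_a = n_a + n_b, deaths_a[t]
        d = d_a + deaths_b[t]
        if d and n > 1:
            obs_minus_exp += d_a - d * n_a / n  # observed minus expected deaths in A
            var += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)  # hypergeometric variance
        n_a -= leave_a[t]
        n_b -= leave_b[t]
    if var == 0:
        raise ValueError("zero variance: the groups cannot be compared")
    chi2 = obs_minus_exp ** 2 / var
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square(1)


if __name__ == "__main__":
    # Invented toy cohort: (extracted features, follow-up days, died?)
    cohort = [
        ({"midline_shift": True}, 2, 1),
        ({"mass_effect": True}, 3, 1),
        ({"perihematomal_edema": True}, 5, 0),
        ({}, 4, 1),
        ({}, 7, 0),
        ({}, 9, 1),
    ]
    severe = [(t, e) for f, t, e in cohort if any_severe(f)]
    other = [(t, e) for f, t, e in cohort if not any_severe(f)]
    print(kaplan_meier(*zip(*severe)))
    print(log_rank(*zip(*severe), *zip(*other)))
```

Splitting admissions with `any_severe` and passing each group's times and events to `kaplan_meier` and `log_rank` mirrors the shape of the comparison; for real cohorts of this size an established survival-analysis package would normally be preferred over a hand-rolled version.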

Results: Pretrained on 82,073 head computed tomography notes containing 13.7 million words and fine-tuned on 200 annotated notes, our HeadCT_BERT model achieved an average area under the receiver operating characteristic curve of 0.9831, an F1-score of 0.8683, and an accuracy of 97%. Among patients with acute ischemic stroke, admissions with any severe stroke feature in the initial imaging notes were associated with a lower probability of survival (P<.001).
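
The reported figures are averages over the 13 extracted features. The abstract does not say how the average is formed, so the sketch below assumes an unweighted (macro) mean of per-feature AUROC, F1-score, and accuracy, and assumes binary predictions come from thresholding a per-feature probability at 0.5. Both choices, the helper names, and the feature names in the example are illustrative assumptions, not the authors' evaluation protocol. AUROC is computed in its rank form: the probability that a randomly chosen positive note scores higher than a randomly chosen negative one, with ties counted as one half.

```python
def auroc(labels, scores):
    """Rank-based AUROC: P(random positive scores above random negative), ties = 1/2."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_and_accuracy(labels, preds):
    """F1-score of the positive class and overall accuracy for binary predictions."""
    pairs = list(zip(map(bool, labels), map(bool, preds)))
    tp = sum(y and p for y, p in pairs)
    fp = sum(p and not y for y, p in pairs)
    fn = sum(y and not p for y, p in pairs)
    correct = sum(y == p for y, p in pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0  # convention: 0.0 when F1 is undefined
    return f1, correct / len(pairs)


def macro_average(per_feature, threshold=0.5):
    """per_feature maps a feature name to (gold labels, predicted probabilities).

    Returns per-feature (AUROC, F1, accuracy) and their unweighted means.
    """
    rows = {}
    for name, (labels, scores) in per_feature.items():
        preds = [s >= threshold for s in scores]
        f1, acc = f1_and_accuracy(labels, preds)
        rows[name] = (auroc(labels, scores), f1, acc)
    k = len(rows)
    means = tuple(sum(r[i] for r in rows.values()) / k for i in range(3))
    return rows, means


if __name__ == "__main__":
    # Invented toy annotations for two hypothetical features.
    _, means = macro_average({
        "midline_shift": ([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]),
        "mass_effect": ([0, 1, 0, 0], [0.2, 0.8, 0.3, 0.1]),
    })
    print(means)  # (0.875, 0.75, 0.75)
```

When a feature is rare, accuracy is dominated by the many notes where it is correctly marked absent, so a high average accuracy can coexist with a noticeably lower average F1-score.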

Conclusions: Our proposed NLP pipeline achieved high performance and has the potential to improve medical research and patient safety.

Source journal: Aquatic Living Resources (Agricultural and Forestry Sciences: Marine and Freshwater Biology)
CiteScore: 2.30
Self-citation rate: 0.00%
Articles published: 10
Review time: >24 weeks
Journal description: Aquatic Living Resources publishes original research papers, review articles and prospective notes dealing with all exploited (i.e. fished or farmed) living resources in marine, brackish and freshwater environments. Priority is given to ecosystem-based approaches to the study of fishery and aquaculture social-ecological systems, including biological, ecological, economic and social dimensions. Research on the development of interdisciplinary methods and tools that can usefully support the design, implementation and evaluation of alternative management strategies for fisheries and/or aquaculture systems at different scales is particularly welcome by the journal. This includes the exploration of scenarios and strategies for the conservation of aquatic biodiversity and research relating to the development of integrated assessment approaches aimed at ensuring sustainable and high quality uses of aquatic living resources.