Natural language processing pipeline to extract prostate cancer-related information from clinical notes.

IF 4.7 · Zone 2 (Medicine) · Q1 · Radiology, Nuclear Medicine & Medical Imaging

European Radiology Pub Date : 2024-12-01 Epub Date: 2024-06-06 DOI:10.1007/s00330-024-10812-6

Hirotsugu Nakai, Garima Suman, Daniel A Adamo, Patrick J Navin, Candice A Bookwalter, Jordan D LeGout, Frank K Chen, Clinton V Wellnitz, Alvin C Silva, John V Thomas, Akira Kawashima, Jungwei W Fan, Adam T Froemming, Derek J Lomas, Mitchell R Humphreys, Chandler Dora, Panagiotis Korfiatis, Naoki Takahashi

European Radiology, pages 7878-7891. Publication type: Journal Article.


Abstract

Objectives: To develop an automated pipeline for extracting prostate cancer-related information from clinical notes.

Materials and methods: This retrospective study included 23,225 patients who underwent prostate MRI between 2017 and 2022. Cancer risk factors (family history of cancer and digital rectal exam findings), pre-MRI prostate pathology, and treatment history of prostate cancer were extracted from free-text clinical notes in English as binary or multi-class classification tasks. Any sentence containing pre-defined keywords was extracted from clinical notes within one year before the MRI. After manually creating sentence-level datasets with ground truth, Bidirectional Encoder Representations from Transformers (BERT)-based sentence-level models were fine-tuned using the extracted sentence as input and the category as output. The patient-level output was determined by compilation of multiple sentence-level outputs using tree-based models. Sentence-level classification performance was evaluated using the area under the receiver operating characteristic curve (AUC) on 15% of the sentence-level dataset (sentence-level test set). The patient-level classification performance was evaluated on the patient-level test set created by radiologists by reviewing the clinical notes of 603 patients. Accuracy and sensitivity were compared between the pipeline and radiologists.
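The sentence extraction and patient-level compilation steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the keyword list, the sentence splitter, the inclusive 365-day window, and the function names `extract_candidate_sentences` and `patient_features` are all assumptions made for illustration, since the abstract does not specify them. The summary features are only one plausible input to a tree-based patient-level model.

```python
import re
from datetime import date, timedelta

# Hypothetical keyword list; the study's actual keywords are not given in the abstract.
KEYWORDS = ("family history", "digital rectal", "dre", "biopsy", "prostatectomy")
KEYWORD_RE = re.compile(
    r"\b(" + "|".join(map(re.escape, KEYWORDS)) + r")\b", re.IGNORECASE
)
# Naive splitter (assumption): break after ., !, or ? followed by whitespace.
SENTENCE_SPLIT_RE = re.compile(r"(?<=[.!?])\s+")


def extract_candidate_sentences(notes, mri_date, window_days=365):
    """Return keyword-bearing sentences from notes dated within the window before the MRI.

    `notes` is an iterable of (note_date, text) pairs.
    """
    start = mri_date - timedelta(days=window_days)
    hits = []
    for note_date, text in notes:
        if not (start <= note_date <= mri_date):
            continue  # note falls outside the look-back window
        for sentence in SENTENCE_SPLIT_RE.split(text.strip()):
            if KEYWORD_RE.search(sentence):
                hits.append(sentence)
    return hits


def patient_features(sentence_probs):
    """Summarize sentence-level positive-class probabilities for one patient.

    The returned features could be fed to a tree-based classifier (assumption).
    """
    if not sentence_probs:
        return {"n_sentences": 0, "max_prob": 0.0, "mean_prob": 0.0}
    return {
        "n_sentences": len(sentence_probs),
        "max_prob": max(sentence_probs),
        "mean_prob": sum(sentence_probs) / len(sentence_probs),
    }
```

In this sketch, each extracted sentence would be scored by a fine-tuned sentence-level classifier, and the per-patient list of scores would then be passed to `patient_features` before the patient-level model.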

Results: Sentence-level AUCs were ≥ 0.94. Compared with radiologists, the pipeline showed higher patient-level sensitivity for extracting cancer risk factors (e.g., family history of prostate cancer, 96.5% vs. 77.9%, p < 0.001), but lower accuracy in classifying pre-MRI prostate pathology (92.5% vs. 95.9%, p = 0.002) and treatment history of prostate cancer (95.5% vs. 97.7%, p = 0.03).
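For readers unfamiliar with the reported metrics, the following Python sketch shows how AUC, sensitivity, and accuracy are conventionally computed for a binary task. The function names and toy inputs are illustrative assumptions; the abstract does not describe the authors' evaluation code or the statistical test behind the p-values, so neither is reproduced here.

```python
def auc(y_true, scores):
    """Area under the ROC curve as the fraction of positive/negative pairs
    in which the positive case receives the higher score (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        return float("nan")  # undefined without both classes
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity(y_true, y_pred):
    """True positives divided by all actual positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else float("nan")


def accuracy(y_true, y_pred):
    """Fraction of cases where the prediction equals the reference label."""
    if not y_true:
        return float("nan")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)
```

Sensitivity suits the risk-factor tasks, where missing a documented finding is the main concern, whereas accuracy summarizes agreement across all classes in the multi-class pathology and treatment-history tasks.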

Conclusion: The proposed pipeline showed promising performance, especially for extracting cancer risk factors from patients' clinical notes.

Clinical relevance statement: The natural language processing pipeline showed a higher sensitivity for extracting prostate cancer risk factors than radiologists and may help efficiently gather relevant text information when interpreting prostate MRI.

Key points: When interpreting prostate MRI, it is necessary to extract prostate cancer-related information from clinical notes. This pipeline extracted the presence of prostate cancer risk factors with higher sensitivity than radiologists. Natural language processing may help radiologists efficiently gather relevant prostate cancer-related text information.




Source journal

European Radiology (Medicine: Nuclear Medicine)

CiteScore: 11.60

Self-citation rate: 8.50%

Articles published: 874

Review time: 2-4 weeks

About the journal: European Radiology (ER) continuously updates scientific knowledge in radiology by publishing strong original articles and state-of-the-art reviews written by leading radiologists. A well-balanced combination of review articles, original papers, short communications from European radiological congresses, and information on society matters makes ER an indispensable source of current information in this field. It is the journal of the European Society of Radiology and the official journal of a number of societies. From 2004 to 2008, supplements to European Radiology were published under its companion title, European Radiology Supplements, ISSN 1613-3749.