Extraction and classification of structured data from unstructured hepatobiliary pathology reports using large language models: a feasibility study compared with rules-based natural language processing.

IF 2.5; Zone 4, Medicine; JCR Q2, Pathology
Ruben Geevarghese, Carlie Sigel, John Cadley, Subrata Chatterjee, Pulkit Jain, Alex Hollingsworth, Avijit Chatterjee, Nathaniel Swinburne, Khawaja Hasan Bilal, Brett Marinelli
Journal of Clinical Pathology. Journal Article. Published 2024-09-20. DOI: 10.1136/jcp-2024-209669 (https://doi.org/10.1136/jcp-2024-209669)
Citations: 0

Abstract

Aims: Structured reporting in pathology is not universally adopted, and extracting elements essential to research often requires expensive and time-intensive manual curation. This study examines the accuracy and feasibility of using large language models (LLMs) to extract essential pathology elements for cancer research.

Methods: Retrospective study of patients who underwent pathology sampling for suspected hepatocellular carcinoma and underwent yttrium-90 embolisation. Five pathology report elements of interest were included for evaluation. LLMs (Generative Pre-trained Transformer (GPT)-3.5 Turbo and GPT-4) were used to extract the elements of interest. For comparison, a rules-based regular expressions (REGEX) approach was devised for extraction. Accuracy was calculated for each approach.

Results: 88 pathology reports were identified. LLMs and REGEX were both able to extract research elements with high accuracy (average 84.1%-94.8%).

Conclusions: LLMs have significant potential to simplify the extraction of research elements from pathology reports and therefore to accelerate the pace of cancer research.
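The rules-based arm and the accuracy calculation described in the Methods can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not name the five elements, so the element names (`diagnosis`, `grade`), the patterns, the sample report text, and the curated labels below are all hypothetical, chosen only to show the shape of a REGEX extractor scored per element against manual curation.

```python
import re

# Hypothetical elements and patterns; the abstract does not list the five
# elements that were actually evaluated.
PATTERNS = {
    "diagnosis": re.compile(r"\b(hepatocellular carcinoma|cholangiocarcinoma)\b", re.IGNORECASE),
    "grade": re.compile(r"\b(well|moderately|poorly)[ -]differentiated\b", re.IGNORECASE),
}


def extract(report):
    """Return the first match for each element, lower-cased, or None if absent."""
    result = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(report)
        result[name] = match.group(1).lower() if match else None
    return result


def accuracy(predictions, references):
    """Per-element fraction of reports whose extracted value equals the curated label."""
    return {
        name: sum(p[name] == r[name] for p, r in zip(predictions, references)) / len(references)
        for name in references[0]
    }


# Invented report snippets and manually curated labels, for illustration only.
reports = [
    "Liver, biopsy: Hepatocellular carcinoma, moderately differentiated.",
    "Liver, core biopsy: poorly-differentiated hepatocellular carcinoma.",
    "Liver, biopsy: HCC, well differentiated.",
]
labels = [
    {"diagnosis": "hepatocellular carcinoma", "grade": "moderately"},
    {"diagnosis": "hepatocellular carcinoma", "grade": "poorly"},
    {"diagnosis": "hepatocellular carcinoma", "grade": "well"},
]

predictions = [extract(r) for r in reports]
scores = accuracy(predictions, labels)
print(scores)
```

In this toy data the third report uses the abbreviation "HCC", which the hypothetical diagnosis pattern does not cover, so the extractor misses it. That kind of brittleness to phrasing is the usual motivation for comparing a rules-based approach with an LLM, whose output could be scored with the same `accuracy` function.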

Source journal
CiteScore: 7.80
Self-citation rate: 2.90%
Articles published: 113
Review time: 3-8 weeks
Journal description: Journal of Clinical Pathology is a leading international journal covering all aspects of pathology. Diagnostic and research areas covered include histopathology, virology, haematology, microbiology, cytopathology, chemical pathology, molecular pathology, forensic pathology, dermatopathology, neuropathology and immunopathology. Each issue contains Reviews, Original articles, Short reports, Correspondence and more.