Extraction and classification of structured data from unstructured hepatobiliary pathology reports using large language models: a feasibility study compared with rules-based natural language processing.
Ruben Geevarghese, Carlie Sigel, John Cadley, Subrata Chatterjee, Pulkit Jain, Alex Hollingsworth, Avijit Chatterjee, Nathaniel Swinburne, Khawaja Hasan Bilal, Brett Marinelli
Abstract
Aims: Structured reporting in pathology is not universally adopted, and extracting elements essential to research often requires expensive and time-intensive manual curation. This study examines the accuracy and feasibility of using large language models (LLMs) to extract essential pathology elements for cancer research.
Methods: This retrospective study included patients who underwent pathology sampling for suspected hepatocellular carcinoma and yttrium-90 embolisation. Five pathology report elements of interest were evaluated. Two LLMs, Generative Pre-trained Transformer (GPT) 3.5 Turbo and GPT-4, were used to extract these elements. For comparison, a rules-based regular expressions (REGEX) approach was devised for the same extraction task. Accuracy was calculated for each approach.
Results: Eighty-eight pathology reports were identified. LLMs and REGEX were both able to extract research elements with high accuracy (average 84.1% to 94.8%).
Conclusions: LLMs have significant potential to simplify the extraction of research elements from pathology reports and thereby accelerate the pace of cancer research.
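To make the rules-based comparison and the accuracy calculation in the Methods more concrete, the following is a minimal sketch in Python. It is not the authors' implementation: the abstract does not name the five report elements or give the REGEX rules, so the element used here (tumour differentiation), the pattern, the function names and the three sample report strings are all hypothetical and invented for illustration. Accuracy is taken as the fraction of reports where the extracted value matches a manually curated label.

```python
import re
from typing import List, Optional

# Hypothetical rule: the study's real patterns and report elements are not
# given in the abstract. This one looks for a tumour differentiation grade.
DIFFERENTIATION = re.compile(
    r"\b(well|moderately|poorly)[\s-]+differentiated\b", re.IGNORECASE
)


def extract_differentiation(report: str) -> Optional[str]:
    """Return the differentiation grade found in the report, or None."""
    match = DIFFERENTIATION.search(report)
    return match.group(1).lower() if match else None


def accuracy(predicted: List[Optional[str]], reference: List[Optional[str]]) -> float:
    """Fraction of reports where the extracted value equals the manual label."""
    if not reference or len(predicted) != len(reference):
        raise ValueError("predicted and reference must be non-empty and equal in length")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


# Invented toy reports and manual labels (not data from the study).
reports = [
    "Liver, biopsy: Hepatocellular carcinoma, moderately differentiated.",
    "Liver, core biopsy: Poorly-differentiated hepatocellular carcinoma.",
    "Liver, biopsy: Cirrhosis. No carcinoma identified.",
]
manual_labels = ["moderately", "poorly", None]

predictions = [extract_differentiation(r) for r in reports]
print(predictions)
print(f"accuracy = {accuracy(predictions, manual_labels):.1%}")
```

The same `accuracy` function could score an LLM-based extractor, provided its output is normalised to the same label set before comparison, so that both approaches are judged against identical manual labels.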
About the journal:
Journal of Clinical Pathology is a leading international journal covering all aspects of pathology. Diagnostic and research areas covered include histopathology, virology, haematology, microbiology, cytopathology, chemical pathology, molecular pathology, forensic pathology, dermatopathology, neuropathology and immunopathology. Each issue contains Reviews, Original articles, Short reports, Correspondence and more.