Using Large Language Models for Efficient Cancer Registry Coding in the Real Hospital Setting: A Feasibility Study.


Pacific Symposium on Biocomputing, vol. 30, pp. 121-137. Pub Date: 2025-01-01. DOI: 10.1142/9789819807024_0010

Chen-Kai Wang, Cheng-Rong Ke, Ming-Siang Huang, Inn-Wen Chong, Yi-Hsin Yang, Vincent S Tseng, Hong-Jie Dai



Abstract

The primary challenge in reporting cancer cases lies in the labor-intensive and time-consuming process of manually reviewing numerous reports. Current methods predominantly rely on rule-based approaches or custom supervised learning models, which predict diagnostic codes based on a single pathology report per patient. Although these methods show promising evaluation results, results obtained in controlled settings may be biased, which may hinder adaptation to real-world reporting workflows. In this feasibility study, we focused on lung cancer as a test case and developed an agentic retrieval-augmented generation (RAG) system to evaluate the potential of publicly available large language models (LLMs) for cancer registry coding. Our findings demonstrate that: (1) directly applying publicly available LLMs without fine-tuning is feasible for cancer registry coding; and (2) prompt engineering can significantly enhance the capability of pre-trained LLMs in cancer registry coding. The off-the-shelf LLM, combined with our proposed system architecture and basic prompts, achieved a macro-averaged F-score of 0.637 when evaluated on testing data consisting of patients' medical reports spanning the 1.5 years following their first visit. By employing chain-of-thought (CoT) reasoning and our proposed coding item grouping, the system outperformed the baseline by 0.187 in terms of the macro-averaged F-score. These findings demonstrate the great potential of leveraging LLMs with prompt engineering for cancer registry coding. Our system could offer cancer registrars a promising reference tool to enhance their daily workflow, improving efficiency and accuracy in cancer case reporting.
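For readers unfamiliar with the metric reported above, the following is a minimal sketch of how a macro-averaged F-score can be computed for single-label predictions. It is not the authors' evaluation code: the abstract does not say whether the average is taken over code values, over coding items, or both, so averaging over label values here is an assumption, and the function names `f1_per_label` and `macro_f1` are hypothetical.

```python
from collections import defaultdict


def f1_per_label(gold, pred):
    """Return {label: F1} for paired gold/predicted labels (one label per case)."""
    tp = defaultdict(int)  # true positives per label
    fp = defaultdict(int)  # false positives per label
    fn = defaultdict(int)  # false negatives per label
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label was wrong
            fn[g] += 1  # gold label was missed
    scores = {}
    for label in set(gold) | set(pred):
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1, so rare labels count as much as common ones."""
    scores = f1_per_label(gold, pred)
    return sum(scores.values()) / len(scores)
```

Because every label contributes equally to the mean, a system that handles frequent codes well but misses rare ones is penalized more under macro-averaging than under micro-averaging, which is one reason the metric suits registry items with skewed value distributions.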

