Application of NotebookLM, a large language model with retrieval-augmented generation, for lung cancer staging.

IF 2.1, Zone 4 (Medicine)

Japanese Journal of Radiology, Pub Date: 2025-04-01, Epub Date: 2024-11-25, DOI: 10.1007/s11604-024-01705-1

Ryota Tozuka, Hisashi Johno, Akitomo Amakawa, Junichi Sato, Mizuki Muto, Shoichiro Seki, Atsushi Komaba, Hiroshi Onishi

Pages: 706-712, Publication Type: Journal Article

Citations: 0

Abstract

Purpose: In radiology, large language models (LLMs), including ChatGPT, have recently gained attention, and their utility is being rapidly evaluated. However, concerns have emerged regarding their reliability in clinical applications due to limitations such as hallucinations and insufficient referencing. To address these issues, we focus on the latest technology, retrieval-augmented generation (RAG), which enables LLMs to reference reliable external knowledge (REK). Specifically, this study examines the utility and reliability of a recently released RAG-equipped LLM (RAG-LLM), NotebookLM, for staging lung cancer.

Materials and methods: We summarized the current lung cancer staging guideline in Japan and provided this as REK to NotebookLM. We then tasked NotebookLM with staging 100 fictional lung cancer cases based on CT findings and evaluated its accuracy. For comparison, we performed the same task using a gold-standard LLM, GPT-4 Omni (GPT-4o), both with and without the REK. For GPT-4o, the REK was provided directly within the prompt rather than through RAG.
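The two ways of supplying the REK described above (retrieval versus pasting the whole reference into the prompt) can be sketched roughly as follows. This is a toy illustration only, not NotebookLM's or GPT-4o's actual pipeline: the word-overlap retriever, the helper names (`tokens`, `retrieve`, `build_rag_prompt`, `build_full_prompt`), the prompt wording, and the placeholder REK passages are all assumptions made for this example.

```python
import re

# Placeholder passages standing in for a summarized staging guideline.
# These are NOT real guideline text.
REK = [
    "T factor: criteria based on primary tumor size and local invasion (placeholder text).",
    "N factor: criteria based on regional lymph node involvement (placeholder text).",
    "M factor: criteria based on distant metastasis (placeholder text).",
]

# A made-up CT finding used only to exercise the functions.
CASE = "CT shows a primary tumor in the right upper lobe with no lymph node enlargement"


def tokens(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query, passages, top_k=2):
    """Rank passages by word overlap with the query (toy retriever)."""
    q = tokens(query)
    ranked = sorted(passages, key=lambda p: len(q & tokens(p)), reverse=True)
    return ranked[:top_k]


def build_rag_prompt(case, passages, top_k=2):
    """RAG-style prompt: only the retrieved passages are included."""
    context = "\n".join(retrieve(case, passages, top_k))
    return f"Reference:\n{context}\n\nCase:\n{case}\n\nAssign the stage."


def build_full_prompt(case, passages):
    """In-prompt style: the entire reference is included verbatim."""
    context = "\n".join(passages)
    return f"Reference:\n{context}\n\nCase:\n{case}\n\nAssign the stage."


rag_prompt = build_rag_prompt(CASE, REK)
full_prompt = build_full_prompt(CASE, REK)
print(rag_prompt)
```

In this sketch the retrieved passages double as the "reference locations" a reader could check against the model's answer, which is the property the study evaluates separately from staging accuracy.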

Results: NotebookLM achieved 86% diagnostic accuracy in the lung cancer staging experiment, outperforming GPT-4o, which recorded 39% accuracy with the REK and 25% without it. Moreover, NotebookLM demonstrated 95% accuracy in searching reference locations within the REK.
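An accuracy figure of this kind is the fraction of cases in which the model's stage matches the reference stage. A minimal sketch, assuming exact string matching between predicted and reference labels (the abstract does not state the scoring rule); `accuracy` is a hypothetical helper and the stage labels are made-up toy values, not the study's data.

```python
def accuracy(predicted, reference):
    """Fraction of cases where the predicted stage equals the reference stage."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("at least one case is required")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


# Toy labels for illustration only (not taken from the paper).
reference = ["IA1", "IIB", "IIIA", "IVA"]
predicted = ["IA1", "IIB", "IIIB", "IVA"]

print(f"{accuracy(predicted, reference):.0%}")  # 3 of 4 correct -> 75%
```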

Conclusion: NotebookLM, a RAG-LLM, successfully performed lung cancer staging by utilizing the REK, demonstrating superior performance compared to GPT-4o (without RAG). Additionally, it provided highly accurate reference locations within the REK, allowing radiologists to efficiently evaluate the reliability of NotebookLM's responses and detect possible hallucinations. Overall, this study highlights the potential of NotebookLM, a RAG-LLM, in image diagnosis.

Source journal

Japanese Journal of Radiology (Medicine: Radiology, Nuclear Medicine and Imaging)

Self-citation rate

4.80%

Articles published

133

About the journal: Japanese Journal of Radiology is a peer-reviewed journal, officially published by the Japan Radiological Society. The main purpose of the journal is to provide a forum for the publication of papers documenting recent advances and new developments in the field of radiology in medicine and biology. The scope of Japanese Journal of Radiology encompasses, but is not restricted to, diagnostic radiology, interventional radiology, radiation oncology, nuclear medicine, radiation physics, and radiation biology. Additionally, the journal covers technical and industrial innovations. The journal welcomes original articles, technical notes, review articles, pictorial essays, and letters to the editor. The journal also provides announcements from the boards and the committees of the society. Membership in the Japan Radiological Society is not a prerequisite for submission. Contributions are welcomed from all parts of the world.