Authors' reply: Re: Koga et al. Retrieval-augmented generation versus document-grounded generation: a key distinction in large language models

IF 3.4 · CAS Zone 2 (Medicine) · Q1 (PATHOLOGY)
Katherine J Hewitt, Isabella C Wiest, Jakob N Kather
{"title":"作者回复:Re: Koga et al。检索增强生成与基于文档的生成:大型语言模型中的关键区别。","authors":"Katherine J Hewitt,&nbsp;Isabella C Wiest,&nbsp;Jakob N Kather","doi":"10.1002/2056-4538.70013","DOIUrl":null,"url":null,"abstract":"<p>We thank Koga <i>et al</i> for their knowledgeable comments on our work. Their letter highlights a valid question that requires clarification [<span>1</span>].</p><p>Our study assessed the ability of three large language models (LLMs) to diagnose neuropathology cases from free-text descriptions of adult-type diffuse gliomas, for which we compared two methodologies. The first method provided each model with the free-text tumor descriptions alone, while the second approach additionally provided the models with a Word document of the WHO CNS5. We termed these approaches zero-shot and retrieval-augmented generation (RAG), respectively [<span>2</span>]. Koga <i>et al</i> point out that the methodology we describe in our paper as RAG, may be better described as document-grounded generation, or in-context learning.</p><p>While we agree with the definition of RAG provided in the letter as it was initially defined [<span>3</span>], the field has evolved significantly since the approach was first proposed by Lewis <i>et al</i> in 2020. Three paradigms of RAG are now increasingly recognized: naive RAG, advanced RAG, and modular RAG [<span>4</span>]. Naive RAG is an approach where the data for indexing are generally obtained offline and converted into a format such as PDF or Word, and uploaded with the query via the context window. Advanced RAG and modular RAG offer specific improvements to address the limitations of naive RAG; however, to achieve this, they utilize more technical approaches.</p><p>The intention for our paper was to use naive RAG. We chose this approach as it leverages the easiest possible way for improving an LLM response that would be reproducible by doctors, considering that most doctors would be unable to utilize the application programming interface and programmatically build a RAG pipeline. As discussed by Koga <i>et al</i>, the key difference between naive RAG and document-grounding lies in how the document is utilized when the model retrieves its response [<span>5</span>]. Document-grounding submits the document with the user query and is equivalent to inserting the entire document text into the context window [<span>5</span>]. Whereas with naive RAG, relevant parts of the document are identified by the model and used with the query to dynamically search its database [<span>4</span>]. Both approaches are examples of in-context learning as they acquire additional knowledge from the prompt without requiring parameter updates [<span>6</span>].</p><p>Bereft of transparency from the LLM providers regarding how they process the document once it has been submitted via the graphical user interface, it is difficult to know whether naive RAG or document-grounding was used to formulate a response. To our knowledge, details regarding how appended documents are utilized during a query are not freely available online by ChatGPT, Llama, or Claude. Furthermore, due to the speed of development in the field, technical aspects of how documents are utilized may have changed since our experiments were conducted earlier this year. Nonetheless, we contacted ChatGPT, Anthropic, and Poe for assistance in clarifying this point. All three providers confirmed that documents uploaded with a query are used for RAG. 
However, the responses from ChatGPT and Anthropic were both generated by bots, demonstrating the need for more reliable and greater transparency about the actual technical methods used.</p><p>We are grateful for your endorsement of our conclusions and appreciate the opportunity to address this important distinction. Further clarification and transparency are needed to definitively distinguish the mechanisms employed by specific LLM platforms, particularly regarding how appended documents are processed. We proffer that our work uses in-context learning, as both RAG and document-grounding are methods of this broader paradigm. Nevertheless, we remain committed to clarifying this matter and thank Koga <i>et al</i> for their engagement and valuable input.</p><p>KJH wrote the first draft; review and critique; and final approval. ICW review and critique; and final approval. JNK review and critique; and final approval.</p>","PeriodicalId":48612,"journal":{"name":"Journal of Pathology Clinical Research","volume":"11 1","pages":""},"PeriodicalIF":3.4000,"publicationDate":"2025-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11747990/pdf/","citationCount":"0","resultStr":"{\"title\":\"Authors' reply: Re: Koga et al. Retrieval-augmented generation versus document-grounded generation: a key distinction in large language models\",\"authors\":\"Katherine J Hewitt,&nbsp;Isabella C Wiest,&nbsp;Jakob N Kather\",\"doi\":\"10.1002/2056-4538.70013\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>We thank Koga <i>et al</i> for their knowledgeable comments on our work. Their letter highlights a valid question that requires clarification [<span>1</span>].</p><p>Our study assessed the ability of three large language models (LLMs) to diagnose neuropathology cases from free-text descriptions of adult-type diffuse gliomas, for which we compared two methodologies. The first method provided each model with the free-text tumor descriptions alone, while the second approach additionally provided the models with a Word document of the WHO CNS5. We termed these approaches zero-shot and retrieval-augmented generation (RAG), respectively [<span>2</span>]. Koga <i>et al</i> point out that the methodology we describe in our paper as RAG, may be better described as document-grounded generation, or in-context learning.</p><p>While we agree with the definition of RAG provided in the letter as it was initially defined [<span>3</span>], the field has evolved significantly since the approach was first proposed by Lewis <i>et al</i> in 2020. Three paradigms of RAG are now increasingly recognized: naive RAG, advanced RAG, and modular RAG [<span>4</span>]. Naive RAG is an approach where the data for indexing are generally obtained offline and converted into a format such as PDF or Word, and uploaded with the query via the context window. Advanced RAG and modular RAG offer specific improvements to address the limitations of naive RAG; however, to achieve this, they utilize more technical approaches.</p><p>The intention for our paper was to use naive RAG. We chose this approach as it leverages the easiest possible way for improving an LLM response that would be reproducible by doctors, considering that most doctors would be unable to utilize the application programming interface and programmatically build a RAG pipeline. 
As discussed by Koga <i>et al</i>, the key difference between naive RAG and document-grounding lies in how the document is utilized when the model retrieves its response [<span>5</span>]. Document-grounding submits the document with the user query and is equivalent to inserting the entire document text into the context window [<span>5</span>]. Whereas with naive RAG, relevant parts of the document are identified by the model and used with the query to dynamically search its database [<span>4</span>]. Both approaches are examples of in-context learning as they acquire additional knowledge from the prompt without requiring parameter updates [<span>6</span>].</p><p>Bereft of transparency from the LLM providers regarding how they process the document once it has been submitted via the graphical user interface, it is difficult to know whether naive RAG or document-grounding was used to formulate a response. To our knowledge, details regarding how appended documents are utilized during a query are not freely available online by ChatGPT, Llama, or Claude. Furthermore, due to the speed of development in the field, technical aspects of how documents are utilized may have changed since our experiments were conducted earlier this year. Nonetheless, we contacted ChatGPT, Anthropic, and Poe for assistance in clarifying this point. All three providers confirmed that documents uploaded with a query are used for RAG. However, the responses from ChatGPT and Anthropic were both generated by bots, demonstrating the need for more reliable and greater transparency about the actual technical methods used.</p><p>We are grateful for your endorsement of our conclusions and appreciate the opportunity to address this important distinction. Further clarification and transparency are needed to definitively distinguish the mechanisms employed by specific LLM platforms, particularly regarding how appended documents are processed. We proffer that our work uses in-context learning, as both RAG and document-grounding are methods of this broader paradigm. Nevertheless, we remain committed to clarifying this matter and thank Koga <i>et al</i> for their engagement and valuable input.</p><p>KJH wrote the first draft; review and critique; and final approval. ICW review and critique; and final approval. 
JNK review and critique; and final approval.</p>\",\"PeriodicalId\":48612,\"journal\":{\"name\":\"Journal of Pathology Clinical Research\",\"volume\":\"11 1\",\"pages\":\"\"},\"PeriodicalIF\":3.4000,\"publicationDate\":\"2025-01-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11747990/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Pathology Clinical Research\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1002/2056-4538.70013\",\"RegionNum\":2,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"PATHOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Pathology Clinical Research","FirstCategoryId":"3","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/2056-4538.70013","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"PATHOLOGY","Score":null,"Total":0}
引用次数: 0


We thank Koga et al for their knowledgeable comments on our work. Their letter highlights a valid question that requires clarification [1].

Our study assessed the ability of three large language models (LLMs) to diagnose neuropathology cases from free-text descriptions of adult-type diffuse gliomas, for which we compared two methodologies. The first method provided each model with the free-text tumor descriptions alone, while the second additionally provided the models with a Word document of the WHO CNS5. We termed these approaches zero-shot and retrieval-augmented generation (RAG), respectively [2]. Koga et al point out that the methodology we describe in our paper as RAG may be better described as document-grounded generation or in-context learning.
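
To make the two methodologies concrete, the following is a minimal sketch assuming a generic chat-completion API; the `call_llm` stub and the prompt wording are hypothetical stand-ins, not the exact prompts used in our study.

```python
# Sketch of the two study arms; all names and prompts are illustrative.

def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; substitute a real API client here."""
    return f"<model response to a {len(prompt)}-character prompt>"

def zero_shot(case_description: str) -> str:
    # Arm 1: the free-text tumor description alone.
    return call_llm(f"Provide the most likely diagnosis:\n{case_description}")

def document_augmented(case_description: str, who_cns5_text: str) -> str:
    # Arm 2: the WHO CNS5 text is additionally supplied via the
    # context window, as when a document is uploaded with a query.
    return call_llm(
        f"Reference document:\n{who_cns5_text}\n\n"
        f"Provide the most likely diagnosis:\n{case_description}"
    )
```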

While we agree with the definition of RAG provided in the letter, which reflects how the approach was initially defined [3], the field has evolved significantly since Lewis et al first proposed it in 2020. Three paradigms of RAG are now increasingly recognized: naive RAG, advanced RAG, and modular RAG [4]. In naive RAG, the data for indexing are generally obtained offline, converted into a format such as PDF or Word, and uploaded with the query via the context window. Advanced RAG and modular RAG offer specific improvements that address the limitations of naive RAG; however, they do so through more technical approaches.
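
As an illustration of the offline indexing step of naive RAG, the sketch below splits an already-extracted reference text into chunks and stores a simple term-count vector with each chunk; the term counts and fixed chunk size are toy stand-ins for a real embedding model and tokenizer-aware chunking.

```python
from collections import Counter

def build_index(document_text: str, chunk_size: int = 500) -> list[tuple[Counter, str]]:
    """Offline step of a toy naive RAG pipeline: split the extracted
    text into fixed-size chunks and keep a term-count vector with
    each chunk for later retrieval."""
    chunks = [
        document_text[i : i + chunk_size]
        for i in range(0, len(document_text), chunk_size)
    ]
    return [(Counter(chunk.lower().split()), chunk) for chunk in chunks]
```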

Our intention in the paper was to use naive RAG. We chose this approach because it is the simplest way of improving an LLM response that doctors could reproduce, considering that most doctors would be unable to use the application programming interface to programmatically build a RAG pipeline. As discussed by Koga et al, the key difference between naive RAG and document-grounding lies in how the document is utilized when the model retrieves its response [5]. Document-grounding submits the document with the user query and is equivalent to inserting the entire document text into the context window [5]. With naive RAG, by contrast, relevant parts of the document are identified by the model and used together with the query to dynamically search its database [4]. Both approaches are examples of in-context learning, as they acquire additional knowledge from the prompt without requiring parameter updates [6].
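
The distinction is easiest to see in how the prompt is assembled at query time. The sketch below is a hypothetical illustration, reusing the toy index format and `call_llm` stub from the sketches above, that contrasts inserting the whole document with retrieving only the top-k most relevant chunks.

```python
from collections import Counter

def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; substitute a real API client here."""
    return f"<model response to a {len(prompt)}-character prompt>"

def document_grounded(query: str, document: str) -> str:
    # Document-grounding: the entire document text is inserted into
    # the context window alongside the user query.
    return call_llm(f"{document}\n\nQuestion: {query}")

def naive_rag(query: str, index: list[tuple[Counter, str]], k: int = 3) -> str:
    # Naive RAG: only the k chunks most relevant to the query are
    # retrieved from the index and placed in the context window.
    q_terms = Counter(query.lower().split())
    ranked = sorted(
        index,
        key=lambda item: sum((item[0] & q_terms).values()),
        reverse=True,
    )
    context = "\n\n".join(chunk for _, chunk in ranked[:k])
    return call_llm(f"{context}\n\nQuestion: {query}")
```

In both functions the model's parameters are untouched; only the prompt changes, which is why both fall under in-context learning.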

Absent transparency from the LLM providers about how they process a document once it has been submitted via the graphical user interface, it is difficult to know whether naive RAG or document-grounding was used to formulate a response. To our knowledge, details of how appended documents are utilized during a query are not made freely available online for ChatGPT, Llama, or Claude. Furthermore, given the speed of development in the field, the technical handling of documents may have changed since our experiments were conducted earlier this year. Nonetheless, we contacted ChatGPT, Anthropic, and Poe for assistance in clarifying this point. All three providers confirmed that documents uploaded with a query are used for RAG. However, the responses from ChatGPT and Anthropic were both generated by bots, underscoring the need for more reliable support and greater transparency about the actual technical methods used.

We are grateful for your endorsement of our conclusions and appreciate the opportunity to address this important distinction. Further clarification and transparency are needed to definitively distinguish the mechanisms employed by specific LLM platforms, particularly regarding how appended documents are processed. We suggest that our work is best described as in-context learning, since both RAG and document-grounding are methods within this broader paradigm. Nevertheless, we remain committed to clarifying this matter and thank Koga et al for their engagement and valuable input.

KJH: wrote the first draft; review and critique; final approval. ICW: review and critique; final approval. JNK: review and critique; final approval.
