Language Models and Retrieval Augmented Generation for Automated Structured Data Extraction from Diagnostic Reports

arXiv - CS - Information Retrieval, Pub Date: 2024-09-15, arXiv ID: 2409.10576

Mohamed Sobhi Jabal, Pranav Warman, Jikai Zhang, Kartikeye Gupta, Ayush Jain, Maciej Mazurowski, Walter Wiggins, Kirti Magudia, Evan Calabrese

Citations: 0

Abstract

Purpose: To develop and evaluate an automated system for extracting structured clinical information from unstructured radiology and pathology reports using open-weights large language models (LMs) and retrieval-augmented generation (RAG), and to assess the effects of model configuration variables on extraction performance.

Methods and Materials: The study used two datasets: 7,294 radiology reports annotated with Brain Tumor Reporting and Data System (BT-RADS) scores and 2,154 pathology reports annotated with isocitrate dehydrogenase (IDH) mutation status. An automated pipeline was developed to benchmark the performance of various LMs and RAG configurations. The impact of model size, quantization, prompting strategies, output formatting, and inference parameters was systematically evaluated.

Results: The best-performing models achieved over 98% accuracy in extracting BT-RADS scores from radiology reports and over 90% accuracy in extracting IDH mutation status from pathology reports, with a medically fine-tuned Llama 3 model performing best. Larger, newer, and domain-fine-tuned models consistently outperformed older and smaller models. Model quantization had minimal impact on performance. Few-shot prompting significantly improved accuracy. RAG improved performance on the complex pathology reports but not on the shorter radiology reports.

Conclusions: Open LMs show significant potential for automated extraction of structured clinical data from unstructured clinical reports, and they can be run locally to preserve privacy. Careful model selection, prompt engineering, and semi-automated optimization using annotated data are critical for optimal performance. These approaches could be reliable enough for practical use in research workflows, highlighting the potential for human-machine collaboration in healthcare data extraction.
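The abstract does not describe how the pipeline is implemented. The following is a minimal Python sketch, under stated assumptions, of three pieces it names for the pathology task: retrieving relevant report chunks (RAG), building a few-shot prompt that requests a fixed JSON output format, and scoring parsed answers by accuracy. Everything here is hypothetical and illustrative, not the authors' method: the function names, the `IDH_LABELS` set, the `{"idh": ...}` schema and the chunk size are invented for the example. Retrieval uses simple word overlap in place of the embedding-based retrieval a real RAG setup would typically use. The language-model call is omitted entirely, so canned strings stand in for model replies.

```python
import json
import re
from typing import Optional

# Hypothetical label set and output schema; not taken from the paper.
IDH_LABELS = {"mutant", "wildtype", "unknown"}


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric tokens used for the toy lexical retrieval."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(report: str, query: str, k: int = 2, size: int = 40) -> list[str]:
    """Split a report into `size`-word chunks and return the k chunks sharing
    the most tokens with the query (a stand-in for embedding-based RAG)."""
    words = report.split()
    chunks = [" ".join(words[i:i + size]) for i in range(0, len(words), size)]
    q = tokens(query)
    # sorted() is stable, so ties keep their original report order.
    ranked = sorted(chunks, key=lambda c: len(q & tokens(c)), reverse=True)
    return ranked[:k]


def build_prompt(context: list[str], examples: list[tuple[str, str]]) -> str:
    """Few-shot prompt that asks for a JSON object with a single 'idh' field."""
    parts = [
        "Extract the IDH mutation status from the pathology report excerpt.",
        'Answer only with JSON like {"idh": "mutant"}; allowed values: '
        + ", ".join(sorted(IDH_LABELS)) + ".",
    ]
    for text, label in examples:  # worked examples shown before the target report
        parts.append(f"Report: {text}\nAnswer: " + json.dumps({"idh": label}))
    parts.append("Report: " + " ... ".join(context) + "\nAnswer:")
    return "\n\n".join(parts)


def parse_answer(raw: str) -> Optional[str]:
    """Extract the first flat JSON object from a model reply; return None if it
    is missing, malformed, or holds a value outside IDH_LABELS."""
    match = re.search(r"\{.*?\}", raw, re.DOTALL)
    if not match:
        return None
    try:
        value = json.loads(match.group(0)).get("idh")
    except (json.JSONDecodeError, AttributeError):
        return None
    return value if isinstance(value, str) and value in IDH_LABELS else None


def accuracy(preds: list[Optional[str]], golds: list[str]) -> float:
    """Fraction of exact matches; unparseable replies (None) count as wrong."""
    if len(preds) != len(golds) or not golds:
        raise ValueError("need equal-length, non-empty lists")
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)


if __name__ == "__main__":
    # Made-up report snippet for illustration only.
    report = ("Sections show a diffuse glioma. Immunohistochemistry: ATRX retained. "
              "IDH1 R132H immunostain is positive, consistent with IDH mutant status.")
    ctx = retrieve(report, "IDH mutation status immunostain", k=1, size=12)
    prompt = build_prompt(ctx, [("IDH1 R132H negative; sequencing wildtype.", "wildtype")])
    fake_replies = ['{"idh": "mutant"}', "The status is mutant."]  # stand-ins for LM output
    preds = [parse_answer(r) for r in fake_replies]
    print(preds, accuracy(preds, ["mutant", "mutant"]))
```

Treating unparseable replies as errors (returning `None`) keeps output-format failures visible in the accuracy figure instead of silently dropping them, which matters when comparing output-formatting strategies such as the ones the study evaluated.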
