Mohamed Sobhi Jabal, Pranav Warman, Jikai Zhang, Kartikeye Gupta, Ayush Jain, Maciej Mazurowski, Walter Wiggins, Kirti Magudia, Evan Calabrese
arXiv:2409.10576 (CS - Information Retrieval), published 2024-09-15
Language Models and Retrieval Augmented Generation for Automated Structured Data Extraction from Diagnostic Reports
Purpose: To develop and evaluate an automated system for extracting
structured clinical information from unstructured radiology and pathology
reports using open-weights large language models (LMs) and retrieval augmented
generation (RAG), and to assess the effects of model configuration variables on
extraction performance. Methods and Materials: The study used two datasets:
7,294 radiology reports annotated for Brain Tumor Reporting and Data System
(BT-RADS) scores and 2,154 pathology reports annotated for isocitrate
dehydrogenase (IDH) mutation status. An automated pipeline was developed to
benchmark the performance of various LMs and RAG configurations. The impact of
model size, quantization, prompting strategies, output formatting, and
inference parameters was systematically evaluated. Results: The best-performing
models achieved over 98% accuracy in extracting BT-RADS scores from radiology
reports and over 90% accuracy in extracting IDH mutation status from pathology
reports; the top model was a medical fine-tuned Llama 3. Larger, newer, and
domain-fine-tuned models consistently outperformed older and smaller models. Model
quantization had minimal impact on performance. Few-shot prompting
significantly improved accuracy. RAG improved performance for complex pathology
reports but not for shorter radiology reports. Conclusions: Open LMs
demonstrate significant potential for automated extraction of structured
clinical data from unstructured clinical reports, and they can run locally to
preserve privacy. Careful model selection, prompt engineering, and semi-automated
optimization using annotated data are critical for optimal performance. These
approaches could be reliable enough for practical use in research workflows,
highlighting the potential for human-machine collaboration in healthcare data
extraction.
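The few-shot prompting, output parsing, and accuracy scoring summarized above can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the example reports, the prompt wording, the assumed BT-RADS label set (0, 1a, 1b, 2, 3a, 3b, 3c, 4), and the `generate` callable standing in for a locally hosted open-weights model are all hypothetical.

```python
import re
from typing import Callable, List, Optional

# Hypothetical few-shot examples (invented for illustration, not study data).
FEW_SHOT_EXAMPLES = [
    ("Stable enhancing lesion without new disease. BT-RADS 2.", "2"),
    ("Increased enhancement, favor treatment effect. BT-RADS 3a.", "3a"),
]

# Assumed BT-RADS label set; outputs are lowercased before matching.
SCORE_PATTERN = re.compile(r"\b(0|1a|1b|2|3a|3b|3c|4)\b")


def build_prompt(report: str) -> str:
    """Assemble an instruction, the few-shot examples, and the target report."""
    parts = ["Extract the BT-RADS score from the radiology report. "
             "Answer with the score only."]
    for example_report, score in FEW_SHOT_EXAMPLES:
        parts.append(f"Report: {example_report}\nBT-RADS: {score}")
    parts.append(f"Report: {report}\nBT-RADS:")
    return "\n\n".join(parts)


def parse_score(output: str) -> Optional[str]:
    """Return the first valid BT-RADS score in the model output, or None."""
    match = SCORE_PATTERN.search(output.lower())
    return match.group(1) if match else None


def accuracy(reports: List[str], labels: List[str],
             generate: Callable[[str], str]) -> float:
    """Fraction of reports whose parsed prediction matches the annotation.

    `generate` is a hypothetical stand-in for a call to a local LM.
    """
    correct = 0
    for report, label in zip(reports, labels):
        if parse_score(generate(build_prompt(report))) == label.lower():
            correct += 1
    return correct / len(labels)
```

With a stub model that always answers "3a", `accuracy(["r1", "r2"], ["3a", "2"], lambda prompt: "3a")` returns 0.5. Restricting the answer to a closed label set and parsing it with a regular expression is one way to handle the output-formatting variable; asking the model for JSON is another.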
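RAG helped on the longer, more complex pathology reports. The retrieval step can be sketched as below, assuming a simple word-window chunker and keyword-overlap ranking as a stand-in for the embedding-based retriever a RAG system would more typically use; the abstract does not specify the retriever, so the chunk size, overlap, query string, prompt text, and all function names here are hypothetical.

```python
import re
from typing import List, Set


def chunk_words(text: str, size: int = 50, overlap: int = 10) -> List[str]:
    """Split text into overlapping windows of `size` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    words = text.split()
    step = size - overlap
    return [" ".join(words[start:start + size])
            for start in range(0, max(len(words) - overlap, 1), step)]


def tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(report: str, query: str, k: int = 2) -> List[str]:
    """Return the k chunks sharing the most tokens with the query.

    Keyword overlap is a simple stand-in for embedding similarity.
    """
    query_tokens = tokens(query)
    chunks = chunk_words(report)
    # sorted() is stable, so tied chunks keep their document order.
    ranked = sorted(chunks, key=lambda c: len(query_tokens & tokens(c)),
                    reverse=True)
    return ranked[:k]


def build_rag_prompt(report: str, k: int = 2) -> str:
    """Prompt the model with only the retrieved excerpts of a long report."""
    query = "IDH IDH1 IDH2 R132H mutation mutant wildtype"
    context = "\n---\n".join(retrieve(report, query, k))
    return ("Using only the pathology report excerpts below, answer "
            "'IDH-mutant', 'IDH-wildtype', or 'unknown'.\n\n"
            f"Excerpts:\n{context}\n\nAnswer:")
```

Only the top-`k` excerpts reach the model, which shortens the context passed to it for long reports while keeping the passages that mention the target finding.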