Automating pharmacovigilance evidence generation: using large language models to produce context-aware structured query language.

Impact factor: 3.4 (JCR Q2, Health Care Sciences & Services)
JAMIA Open. Pub Date: 2025-02-08. eCollection Date: 2025-02-01. DOI: 10.1093/jamiaopen/ooaf003
Jeffery L Painter, Venkateswara Rao Chalamalasetti, Raymond Kassekert, Andrew Bate
JAMIA Open, volume 8, issue 1, article ooaf003. Publication type: Journal Article. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11806702/pdf/
Citations: 0

Abstract

Objective: To improve the accuracy of information retrieval from pharmacovigilance (PV) databases by using large language models (LLMs), supported by a business context document, to convert natural language queries (NLQs) into Structured Query Language (SQL) queries.

Materials and methods: We used OpenAI's GPT-4 model within a retrieval-augmented generation (RAG) framework, enriched with a business context document, to transform NLQs into executable SQL queries. Each NLQ was presented to the LLM in random order and independently of the others to prevent memorization. The study was conducted in 3 phases that varied query complexity and assessed the LLM's performance both with and without the business context document.
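The setup described above can be sketched roughly as follows. This is an illustration only, not the authors' pipeline: the abstract does not describe the retriever, the prompt wording, or the model call, so the word-overlap `retrieve` function, the prompt layout in `build_prompt`, and the `generate` callable (a stand-in for whatever function sends a prompt to the LLM) are all assumptions, and every name here is hypothetical.

```python
import random
import re
from typing import Callable, Dict, List, Set


def tokenize(text: str) -> Set[str]:
    """Lower-case the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def retrieve(nlq: str, passages: List[str], k: int = 2) -> List[str]:
    """Return the k passages sharing the most words with the question.

    A toy stand-in for a real retriever (assumption, not from the paper).
    """
    q = tokenize(nlq)
    ranked = sorted(passages, key=lambda p: len(q & tokenize(p)), reverse=True)
    return ranked[:k]


def build_prompt(nlq: str, schema: str, passages: List[str],
                 use_context: bool = True) -> str:
    """Assemble one self-contained prompt: schema, optional context, question."""
    parts = ["Translate the question into one SQL query.", "Schema:\n" + schema]
    if use_context:
        parts.append("Business context:\n" + "\n".join(retrieve(nlq, passages)))
    parts.append("Question: " + nlq)
    return "\n\n".join(parts)


def run_study(nlqs: List[str], schema: str, passages: List[str],
              generate: Callable[[str], str], use_context: bool,
              seed: int = 0) -> Dict[str, str]:
    """Present each NLQ in shuffled order, each in a fresh prompt with no history."""
    order = list(nlqs)
    random.Random(seed).shuffle(order)
    return {nlq: generate(build_prompt(nlq, schema, passages, use_context))
            for nlq in order}
```

Calling `run_study` once with `use_context=False` and once with `use_context=True` would mirror the schema-only versus business-context comparison; because each prompt is built from scratch, no earlier question or answer leaks into a later one.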

Results: Our approach significantly improved NLQ-to-SQL accuracy, which rose from 8.3% with the database schema alone to 78.3% with the business context document. This improvement was consistent across low-, medium-, and high-complexity queries, indicating the critical role of contextual knowledge in query generation.

Discussion: The integration of a business context document markedly improved the LLM's ability to generate accurate SQL queries (ie, both executable and returning semantically appropriate results). Performance reached a maximum of 85% when high-complexity queries were excluded, suggesting promise for routine deployment.
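The two-part accuracy criterion (executable, and returning semantically appropriate results) can be illustrated with a small scoring sketch. It rests on assumptions that are not in the abstract: an SQLite database stands in for the PV database, and "semantically appropriate" is approximated by comparing the result rows of the generated query with those of a hand-written reference query. The study may have judged appropriateness differently, and the function names are hypothetical.

```python
import sqlite3
from typing import List, Optional, Tuple


def run_query(conn: sqlite3.Connection, sql: str) -> Optional[list]:
    """Return the rows in a canonical order, or None if the query fails to run."""
    try:
        return sorted(conn.execute(sql).fetchall(), key=repr)
    except (sqlite3.Error, sqlite3.Warning):
        return None


def is_accurate(conn: sqlite3.Connection, generated_sql: str,
                reference_sql: str) -> bool:
    """Accurate = the generated query runs AND its rows match the reference rows."""
    expected = run_query(conn, reference_sql)
    actual = run_query(conn, generated_sql)
    return actual is not None and actual == expected


def accuracy(conn: sqlite3.Connection, pairs: List[Tuple[str, str]]) -> float:
    """Percentage of (generated, reference) pairs judged accurate."""
    hits = sum(is_accurate(conn, gen, ref) for gen, ref in pairs)
    return 100.0 * hits / len(pairs)
```

Under this sketch a query that references a nonexistent table fails the first condition, while a query that runs but drops a required filter fails the second; both count as inaccurate.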

Conclusion: This study presents a novel approach to employing LLMs for safety data retrieval and analysis, demonstrating significant advancements in query generation accuracy. The methodology offers a framework applicable to various data-intensive domains, enhancing the accessibility of information retrieval for non-technical users.

Source journal: JAMIA Open (Medicine, Health Informatics)
CiteScore: 4.10
Self-citation rate: 4.80%
Articles published: 102
Review time: 16 weeks