Automating pharmacovigilance evidence generation: using large language models to produce context-aware structured query language.

Impact factor: 3.4. JCR quartile: Q2 (Health Care Sciences & Services).

JAMIA Open. Pub Date: 2025-02-08. eCollection Date: 2025-02-01. DOI: 10.1093/jamiaopen/ooaf003

Jeffery L Painter, Venkateswara Rao Chalamalasetti, Raymond Kassekert, Andrew Bate

JAMIA Open, volume 8, issue 1, article ooaf003. Publication type: Journal Article. PMC full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11806702/pdf/


Abstract

Objective: To enhance the accuracy of information retrieval from pharmacovigilance (PV) databases by employing Large Language Models (LLMs) to convert natural language queries (NLQs) into Structured Query Language (SQL) queries, leveraging a business context document.

Materials and methods: We utilized OpenAI's GPT-4 model within a retrieval-augmented generation (RAG) framework, enriched with a business context document, to transform NLQs into executable SQL queries. Each NLQ was presented to the LLM randomly and independently to prevent memorization. The study was conducted in 3 phases, varying query complexity, and assessing the LLM's performance both with and without the business context document.
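The two study arms described above (schema alone versus schema plus business context document) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `build_prompt` and `run_study_arm`, the prompt wording, and the `llm` callable are all hypothetical, and the retrieval step of the RAG framework is omitted by placing the whole context document in the prompt, which is a simplifying assumption.

```python
import random
from typing import Callable, Dict, List, Optional


def build_prompt(nlq: str, schema: str, business_context: Optional[str] = None) -> str:
    """Assemble one self-contained prompt for a single natural language query.

    Hypothetical prompt layout; the paper does not publish its exact wording.
    """
    parts = [
        "You translate questions into SQL for the database described below.",
        "Database schema:\n" + schema,
    ]
    # The business context document is only included in the "with context" arm.
    if business_context is not None:
        parts.append("Business context:\n" + business_context)
    parts.append("Question:\n" + nlq)
    parts.append("Return a single SQL query.")
    return "\n\n".join(parts)


def run_study_arm(
    nlqs: List[str],
    schema: str,
    llm: Callable[[str], str],
    business_context: Optional[str] = None,
    seed: int = 0,
) -> Dict[str, str]:
    """Present each NLQ in random order, one independent call per query.

    `llm` is an assumed callable that maps a prompt string to SQL text;
    no conversation history is carried between calls.
    """
    order = list(nlqs)
    random.Random(seed).shuffle(order)  # randomized presentation order
    results: Dict[str, str] = {}
    for nlq in order:
        results[nlq] = llm(build_prompt(nlq, schema, business_context))
    return results
```

Calling `run_study_arm` once with `business_context=None` and once with the document text gives the two conditions to compare; any real model client would be wrapped to fit the assumed `llm` signature.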

Results: Our approach significantly improved NLQ-to-SQL accuracy, increasing from 8.3% with the database schema alone to 78.3% with the business context document. This enhancement was consistent across low, medium, and high complexity queries, indicating the critical role of contextual knowledge in query generation.

Discussion: The integration of a business context document markedly improved the LLM's ability to generate accurate SQL queries (ie, both executable and returning semantically appropriate results). Accuracy reached a maximum of 85% when high complexity queries were excluded, suggesting promise for routine deployment.
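The two-part accuracy criterion (executable, and returning semantically appropriate results) could be checked along the following lines. This is a sketch under stated assumptions: the abstract does not say how semantic appropriateness was judged, so the code substitutes a comparison against a hand-written reference query, and the `cases` table, its rows, and the helper names `is_accurate` and `accuracy` are invented for illustration.

```python
import sqlite3
from collections import Counter
from typing import Iterable, Tuple


def is_accurate(conn: sqlite3.Connection, generated_sql: str, reference_sql: str) -> bool:
    """True only if the generated query runs AND returns the reference rows."""
    try:
        got = conn.execute(generated_sql).fetchall()
    except (sqlite3.Error, sqlite3.Warning):
        return False  # not executable
    expected = conn.execute(reference_sql).fetchall()
    # Compare as multisets so that row order does not matter (an assumption).
    return Counter(got) == Counter(expected)


def accuracy(conn: sqlite3.Connection, pairs: Iterable[Tuple[str, str]]) -> float:
    """Fraction of (generated, reference) pairs judged accurate."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(is_accurate(conn, g, r) for g, r in pairs) / len(pairs)


# Toy data, purely illustrative; not taken from any pharmacovigilance database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cases (case_id INTEGER, product TEXT, serious INTEGER)")
conn.executemany(
    "INSERT INTO cases VALUES (?, ?, ?)",
    [(1, "DrugA", 1), (2, "DrugA", 0), (3, "DrugB", 1)],
)

reference = "SELECT SUM(serious) FROM cases WHERE product = 'DrugA'"
pairs = [
    # Executable and equivalent to the reference, although written differently.
    ("SELECT COUNT(*) FROM cases WHERE product = 'DrugA' AND serious = 1", reference),
    # Executable but semantically wrong: counts non-serious cases too.
    ("SELECT COUNT(*) FROM cases WHERE product = 'DrugA'", reference),
    # Not executable: the table name does not exist.
    ("SELECT COUNT(*) FROM case_table WHERE product = 'DrugA'", reference),
]
demo_accuracy = accuracy(conn, pairs)
print(f"accuracy = {demo_accuracy:.3f}")
```

Comparing result sets rather than SQL text lets differently written but equivalent queries count as correct, at the cost of needing a trusted reference query for every question.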

Conclusion: This study presents a novel approach to employing LLMs for safety data retrieval and analysis, demonstrating significant advancements in query generation accuracy. The methodology offers a framework applicable to various data-intensive domains, enhancing the accessibility of information retrieval for non-technical users.


Source journal