A knowledge-graph-based pharmaceutical engineering chatbot for drug discovery

IF 3.9 | Zone 2 (Engineering & Technology) | JCR Q2: Computer Science, Interdisciplinary Applications
Naz Pinar Taskiran, Chia-En Jacklyn Tsai, Shuxin Huang, Arijit Chakraborty, Venkat Venkatasubramanian
Computers & Chemical Engineering, volume 203, Article 109318. Published 2025-08-19.
DOI: 10.1016/j.compchemeng.2025.109318
https://www.sciencedirect.com/science/article/pii/S0098135425003205
Citation count: 0

Abstract

Despite their success in day-to-day applications, ChatGPT and other large language models (LLMs) have not covered as much ground in scientific and engineering domains. One key challenge is the abundance of domain-specific terminology, which an LLM is not trained to extract in accordance with the underlying physical laws. Such black-box models can also lead to unreliable results or hallucinations. Hybrid AI, which combines data-driven and symbolic methods, leverages domain knowledge to add explainability and reliability to answers. Our group has previously developed a domain-informed ontology-based information extraction tool called SUSIE, which extracts key terms and their context to present them to the user as knowledge graphs (KGs). Although KGs are used to visualize relationships between different entities, they are not easily accessible for user questions. However, they serve as a structured input for LLMs. Thus, KGs can efficiently query a corpus of pharmaceutical documents, streamlining drug discovery and manufacturing processes. In this work, we propose methods to improve the information extraction capabilities of SUSIE by expanding its knowledge base and improving its ability to understand scientific material through a sentence-restructuring module. Additionally, we present a customized question-and-answer module that enables the user to query from generated KGs and get an answer in natural language. Unlike black-box models such as those purely powered by OpenAI’s models and the LangChain GraphQA packages, combining our KGs with Neo4j limits hallucinations and provides reliable and traceable answers in a user-friendly chatbot interface.
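
The idea of answering a user question from a knowledge graph, with each answer traceable to its source, can be illustrated with a minimal sketch. This is not the paper's SUSIE pipeline or its Neo4j-based module: an in-memory list of triples stands in for the graph database, simple keyword matching stands in for question understanding, and every name and value below (`Triple`, `KG`, `RELATION_KEYWORDS`, `answer`, `compound_a`, `solvent_x`, the source labels) is a hypothetical placeholder chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    subject: str
    relation: str
    obj: str
    source: str  # where the fact was extracted from, kept for traceability


# Hypothetical toy graph; placeholder names, not real pharmaceutical data.
KG = [
    Triple("compound_a", "has_solvent", "solvent_x", "doc_1, sentence 4"),
    Triple("compound_a", "has_excipient", "excipient_y", "doc_1, sentence 9"),
    Triple("compound_b", "has_solvent", "solvent_z", "doc_2, sentence 2"),
]

# Assumed mapping from question keywords to graph relations.
RELATION_KEYWORDS = {
    "solvent": "has_solvent",
    "excipient": "has_excipient",
}


def answer(question, kg=KG):
    """Answer only from facts stored in the graph, citing each source."""
    q = question.lower()
    # Pick the first relation whose keyword appears in the question.
    relation = next(
        (rel for kw, rel in RELATION_KEYWORDS.items() if kw in q), None
    )
    # Find graph entities mentioned verbatim in the question.
    subjects = {t.subject for t in kg if t.subject in q}
    hits = [t for t in kg if t.relation == relation and t.subject in subjects]
    if not hits:
        # Refuse rather than guess when the graph holds no matching fact.
        return "No matching fact in the knowledge graph."
    return "; ".join(
        f"{t.subject} {t.relation.replace('_', ' ')} {t.obj} "
        f"[source: {t.source}]"
        for t in hits
    )


if __name__ == "__main__":
    print(answer("What solvent is used for compound_a?"))
    print(answer("What is the color of compound_a?"))
```

Because the function returns only stored triples and falls back to an explicit refusal, it cannot produce a fact that is absent from the graph, which is the property the abstract contrasts with purely LLM-driven answers. A real system would replace the keyword lookup with a proper query language and entity linking.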
Source journal
Computers & Chemical Engineering (Engineering & Technology: Chemical Engineering)
CiteScore: 8.70
Self-citation rate: 14.00%
Articles published: 374
Review time: 70 days
Journal description: Computers & Chemical Engineering is primarily a journal of record for new developments in the application of computing and systems technology to chemical engineering problems.