Osteosarcoma knowledge graph question answering system: deep learning-based knowledge graph and large language model fusion

Impact factor 6.9; JCR Q1 (Computer Science, Interdisciplinary Applications)
Lulu Zhang, Weisong Zhao, Zhiwei Cheng, Yafei Jiang, Kai Tian, Jia Shi, Zhenyu Jiang, Yingqi Hua
Intelligent Medicine, vol. 5, no. 2, pp. 99-110 (published 1 May 2025). DOI: 10.1016/j.imed.2024.12.001. https://www.sciencedirect.com/science/article/pii/S2667102625000269
Citations: 0

Abstract

Objective

Osteosarcoma is the most common primary malignant bone tumor in children and adolescents, accounting for approximately 5% of childhood malignancies. Because the disease is rare and biologically complex, there have been few breakthroughs in its treatment. To advance research in this field, we aimed to construct the first comprehensive osteosarcoma knowledge graph (OSKG) from the PubMed database.

Methods

A systematic search of PubMed (2003–2023) for the keyword “osteosarcoma” yielded 25,415 abstracts. Using BioBERT, a model pretrained on biomedical corpora, fine-tuned on manually annotated osteosarcoma-specific data, we identified 16 entity types and 17 types of biological relationships. The extracted entities and relationships were integrated into the OSKG, a deep learning-derived knowledge base for exploring the pathogenesis and molecular mechanisms of osteosarcoma. We then developed a specialized knowledge graph question answering (KGQA) system powered by ChatGLM3, which combines the language model's natural language processing with the OSKG to improve the quality and accuracy of its responses.
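The Methods describe a pipeline in which BioBERT-extracted entities and relationships are merged into the OSKG, which ChatGLM3 then draws on to answer questions. The abstract gives no implementation details, so the following is only a minimal Python sketch of the general knowledge-graph-grounded QA pattern, under stated assumptions: extraction output is assumed to arrive as typed (head, relation, tail) triples; the names `Triple`, `KnowledgeGraph` and `build_prompt`, the relation labels, and the example facts are hypothetical; retrieval is naive substring matching on entity names; and the actual call to ChatGLM3 is omitted.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One extracted fact (hypothetical schema): head -[relation]-> tail."""
    head: str
    head_type: str
    relation: str
    tail: str
    tail_type: str


class KnowledgeGraph:
    """A minimal in-memory graph indexed by lower-cased entity name."""

    def __init__(self) -> None:
        self.entity_types: dict[str, str] = {}
        self.triples: set[Triple] = set()
        self._by_entity: dict[str, set[Triple]] = defaultdict(set)

    def add(self, t: Triple) -> None:
        # The same fact extracted from several abstracts is stored only once.
        if t in self.triples:
            return
        self.triples.add(t)
        self.entity_types[t.head.lower()] = t.head_type
        self.entity_types[t.tail.lower()] = t.tail_type
        self._by_entity[t.head.lower()].add(t)
        self._by_entity[t.tail.lower()].add(t)

    def facts_about(self, entity: str) -> list[Triple]:
        """All stored facts in which the entity appears as head or tail."""
        return sorted(self._by_entity.get(entity.lower(), set()),
                      key=lambda t: (t.head, t.relation, t.tail))

    def retrieve(self, question: str) -> list[Triple]:
        """Facts for every known entity whose name occurs in the question text."""
        q = question.lower()
        hits: set[Triple] = set()
        for name in self.entity_types:
            if name in q:  # naive substring match (assumption)
                hits |= self._by_entity[name]
        return sorted(hits, key=lambda t: (t.head, t.relation, t.tail))


def build_prompt(question: str, facts: list[Triple]) -> str:
    """Prepend retrieved facts as context for an LLM such as ChatGLM3."""
    context = "\n".join(f"- {t.head} {t.relation} {t.tail}" for t in facts)
    return (
        "Answer using only the knowledge-graph facts below.\n"
        f"Facts:\n{context or '- (none found)'}\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    kg = KnowledgeGraph()
    # Illustrative triples only; they are not taken from the OSKG.
    kg.add(Triple("TP53", "Gene", "associated_with", "osteosarcoma", "Disease"))
    kg.add(Triple("cisplatin", "Drug", "treats", "osteosarcoma", "Disease"))
    question = "Which drugs treat osteosarcoma?"
    print(build_prompt(question, kg.retrieve(question)))
```

Grounding the prompt in retrieved triples is what distinguishes a KGQA system from querying the language model alone; a production system would more likely use entity linking and a graph database than substring matching over an in-memory index.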

Results

BioBERT achieved an average accuracy above 92% in entity and relationship training. Evaluation on 100 gold-standard question-answer pairs showed that the final KGQA system outperformed other large language models in accuracy and robustness.
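The abstract reports an averaged extraction accuracy and a comparison on 100 gold-standard question-answer pairs, but not the scoring protocol behind either. As a hedged illustration only, this Python sketch shows one common way such figures are computed: per-task accuracy, an unweighted mean across tasks (one possible reading of "average"), and ranking several systems on a shared gold set. Exact match after normalization is an assumption, since free-text medical answers may instead have been graded manually or with another metric; all function names and example data are hypothetical.

```python
import re
import string


def normalize(text: str) -> str:
    """Lower-case, drop punctuation and collapse whitespace before comparing."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()


def accuracy(predictions: list[str], gold: list[str]) -> float:
    """Fraction of predictions that exactly match the gold answer after normalization."""
    if len(predictions) != len(gold) or not gold:
        raise ValueError("need equal-length, non-empty prediction and gold lists")
    correct = sum(normalize(p) == normalize(g) for p, g in zip(predictions, gold))
    return correct / len(gold)


def mean_accuracy(per_task: dict[str, float]) -> float:
    """Unweighted (macro) mean over tasks, e.g. entity and relation extraction."""
    return sum(per_task.values()) / len(per_task)


def rank_systems(answers: dict[str, list[str]], gold: list[str]) -> list[tuple[str, float]]:
    """Score every system on the same gold QA set and sort best-first."""
    scores = [(name, accuracy(preds, gold)) for name, preds in answers.items()]
    return sorted(scores, key=lambda s: s[1], reverse=True)


if __name__ == "__main__":
    # Toy data for illustration; not the paper's evaluation set or results.
    gold = ["cisplatin", "TP53"]
    answers = {"kgqa": ["Cisplatin.", "tp53"], "baseline": ["cisplatin", "RB1"]}
    print(rank_systems(answers, gold))
```

Scoring every system against the same gold set is what makes the ranking comparable; a macro mean weights the entity and relation tasks equally regardless of how many examples each has.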

Conclusion

The system is designed to answer disease-related queries accurately, facilitating knowledge acquisition and reasoning in medical research and clinical practice. This project offers a robust tool for osteosarcoma research and promotes the deep integration of knowledge graphs and artificial intelligence technologies in the medical field.
Source journal: Intelligent Medicine (Surgery; Radiology and Imaging; Artificial Intelligence; Biomedical Engineering). CiteScore 5.20; self-citation rate 0.00%; articles published: 19.