Large language models for intelligent RDF knowledge graph construction: results from medical ontology mapping.

IF 3 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Frontiers in Artificial Intelligence Pub Date : 2025-04-25 eCollection Date: 2025-01-01 DOI:10.3389/frai.2025.1546179
Apostolos Mavridis, Stergios Tegos, Christos Anastasiou, Maria Papoutsoglou, Georgios Meditskos
{"title":"Large language models for intelligent RDF knowledge graph construction: results from medical ontology mapping.","authors":"Apostolos Mavridis, Stergios Tegos, Christos Anastasiou, Maria Papoutsoglou, Georgios Meditskos","doi":"10.3389/frai.2025.1546179","DOIUrl":null,"url":null,"abstract":"<p><p>The exponential growth of digital data, particularly in specialized domains like healthcare, necessitates advanced knowledge representation and integration techniques. RDF knowledge graphs offer a powerful solution, yet their creation and maintenance, especially for complex medical ontologies like Systematized Nomenclature of Medicine - Clinical Terms (SNOMED CT), remain challenging. Traditional methods often struggle with the scale, heterogeneity, and semantic complexity of medical data. This paper introduces a methodology leveraging the contextual understanding and reasoning capabilities of Large Language Models (LLMs) to automate and enhance medical ontology mapping for Resource Description Framework (RDF) knowledge graph construction. We conduct a comprehensive comparative analysis of six systems-GPT-4o, Claude 3.5 Sonnet v2, Gemini 1.5 Pro, Llama 3.3 70B, DeepSeek R1, and BERTMap-using a novel evaluation framework that combines quantitative metrics (precision, recall, and F1-score) with qualitative assessments of semantic accuracy. Our approach integrates a data preprocessing pipeline with an LLM-powered semantic mapping engine, utilizing BioBERT embeddings and ChromaDB vector database for efficient concept retrieval. Experimental results on a dataset of 108 medical terms demonstrate the superior performance of modern LLMs, particularly GPT-4o, achieving a precision of 93.75% and an F1-score of 96.26%. These findings highlight the potential of LLMs in bridging the gap between structured medical data and semantic knowledge representation, toward more accurate and interoperable medical knowledge graphs.</p>","PeriodicalId":33315,"journal":{"name":"Frontiers in Artificial Intelligence","volume":"8 ","pages":"1546179"},"PeriodicalIF":3.0000,"publicationDate":"2025-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12061982/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in Artificial Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3389/frai.2025.1546179","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

The exponential growth of digital data, particularly in specialized domains like healthcare, necessitates advanced knowledge representation and integration techniques. RDF knowledge graphs offer a powerful solution, yet their creation and maintenance, especially for complex medical ontologies like Systematized Nomenclature of Medicine - Clinical Terms (SNOMED CT), remain challenging. Traditional methods often struggle with the scale, heterogeneity, and semantic complexity of medical data. This paper introduces a methodology leveraging the contextual understanding and reasoning capabilities of Large Language Models (LLMs) to automate and enhance medical ontology mapping for Resource Description Framework (RDF) knowledge graph construction. We conduct a comprehensive comparative analysis of six systems-GPT-4o, Claude 3.5 Sonnet v2, Gemini 1.5 Pro, Llama 3.3 70B, DeepSeek R1, and BERTMap-using a novel evaluation framework that combines quantitative metrics (precision, recall, and F1-score) with qualitative assessments of semantic accuracy. Our approach integrates a data preprocessing pipeline with an LLM-powered semantic mapping engine, utilizing BioBERT embeddings and ChromaDB vector database for efficient concept retrieval. Experimental results on a dataset of 108 medical terms demonstrate the superior performance of modern LLMs, particularly GPT-4o, achieving a precision of 93.75% and an F1-score of 96.26%. These findings highlight the potential of LLMs in bridging the gap between structured medical data and semantic knowledge representation, toward more accurate and interoperable medical knowledge graphs.

用于智能RDF知识图构建的大型语言模型:医学本体映射的结果。
数字数据的指数级增长,特别是在医疗保健等专业领域,需要先进的知识表示和集成技术。RDF知识图提供了一个强大的解决方案,但是它们的创建和维护仍然具有挑战性,特别是对于复杂的医学本体,如系统化医学术语-临床术语(SNOMED CT)。传统方法经常与医疗数据的规模、异构性和语义复杂性作斗争。本文介绍了一种利用大型语言模型(llm)的上下文理解和推理能力来自动化和增强医学本体映射的方法,用于资源描述框架(RDF)知识图的构建。我们对gpt - 40、Claude 3.5 Sonnet v2、Gemini 1.5 Pro、Llama 3.3 70B、DeepSeek R1和bertmap这六个系统进行了全面的比较分析,使用了一种新的评估框架,该框架结合了定量指标(精度、召回率和f1分数)和语义准确性的定性评估。我们的方法将数据预处理管道与llm支持的语义映射引擎集成在一起,利用BioBERT嵌入和ChromaDB矢量数据库进行高效的概念检索。在108个医学术语数据集上的实验结果表明,现代llm,特别是gpt - 40,具有优异的性能,达到了93.75%的精度和96.26%的f1分数。这些发现突出了llm在弥合结构化医学数据和语义知识表示之间的差距,朝着更准确和可互操作的医学知识图发展的潜力。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
CiteScore
6.10
自引率
2.50%
发文量
272
审稿时长
13 weeks
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信