A RAG Approach for Generating Competency Questions in Ontology Engineering

arXiv - CS - Artificial Intelligence  Pub Date: 2024-09-13  DOI: arxiv-2409.08820

Xueli Pan, Jacco van Ossenbruggen, Victor de Boer, Zhisheng Huang


Citations: 0

Abstract

Competency question (CQ) formulation is central to several ontology development and evaluation methodologies. Traditionally, crafting these competency questions relies heavily on the effort of domain experts and knowledge engineers, which is often time-consuming and labor-intensive. With the emergence of Large Language Models (LLMs), it becomes possible to automate and enhance this process. Unlike similar works that use existing ontologies or knowledge graphs as input to LLMs, we present a retrieval-augmented generation (RAG) approach that uses LLMs to automatically generate CQs from a set of scientific papers treated as a domain knowledge base. We investigate its performance and, specifically, study the impact of the number of papers supplied to the RAG and of the temperature setting of the LLM. We conduct experiments using GPT-4 on two domain ontology engineering tasks and compare the results against ground-truth CQs constructed by domain experts. Empirical assessment of the results, using two evaluation metrics (precision and consistency), shows that, compared to zero-shot prompting, adding relevant domain knowledge to the RAG improves the performance of LLMs in generating CQs for concrete ontology engineering tasks.
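The pipeline outlined in the abstract (retrieve text from a set of papers, add it to the prompt, then score the generated CQs against expert CQs) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the paper: the token-overlap retriever, the prompt wording, the function names `retrieve`, `build_prompt`, and `precision`, and the pluggable `match` function. The abstract does not say how retrieval or CQ matching was implemented. The LLM call itself is omitted; the number of retrieved passages `k` and the model temperature would be the two settings varied in such an experiment.

```python
import re
from collections import Counter
from typing import Callable


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(query: str, passage: str) -> int:
    """Count tokens shared by query and passage (toy relevance score)."""
    shared = Counter(tokenize(query)) & Counter(tokenize(passage))
    return sum(shared.values())


def retrieve(query: str, passages: list[str], k: int) -> list[str]:
    """Return the k passages with the highest overlap score.

    Hypothetical stand-in for a real retriever (for example, one based
    on embeddings); ties keep their original order.
    """
    ranked = sorted(passages, key=lambda p: overlap_score(query, p), reverse=True)
    return ranked[:k]


def build_prompt(task: str, passages: list[str]) -> str:
    """Assemble a prompt that places retrieved excerpts before the task."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Use the excerpts below as domain knowledge.\n"
        f"{context}\n"
        f"Task: generate competency questions for {task}."
    )


def precision(
    generated: list[str],
    ground_truth: list[str],
    match: Callable[[str, str], bool],
) -> float:
    """Fraction of generated CQs that match at least one ground-truth CQ.

    `match` is an assumption: the abstract does not define how a
    generated CQ is judged equivalent to an expert CQ.
    """
    if not generated:
        return 0.0
    hits = sum(1 for g in generated if any(match(g, t) for t in ground_truth))
    return hits / len(generated)
```

A zero-shot baseline corresponds to calling `build_prompt` with an empty passage list, which makes the comparison in the abstract (with versus without retrieved domain knowledge) easy to set up in this sketch.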

