A RAG Approach for Generating Competency Questions in Ontology Engineering
Xueli Pan, Jacco van Ossenbruggen, Victor de Boer, Zhisheng Huang
arXiv - CS - Artificial Intelligence, 2024-09-13. https://doi.org/arxiv-2409.08820
Abstract
Competency question (CQ) formulation is central to several ontology development and evaluation methodologies. Traditionally, crafting these competency questions relies heavily on the effort of domain experts and knowledge engineers, which is often time-consuming and labor-intensive. With the emergence of Large Language Models (LLMs), it becomes possible to automate and enhance this process. Unlike similar works that use existing ontologies or knowledge graphs as input to LLMs, we present a retrieval-augmented generation (RAG) approach that uses LLMs for the automatic generation of CQs given a set of scientific papers considered to be a domain knowledge base. We investigate its performance and, specifically, study the impact of the number of papers supplied to the RAG and of the temperature setting of the LLM. We conduct experiments using GPT-4 on two domain ontology engineering tasks and compare the results against ground-truth CQs constructed by domain experts. Empirical assessment of the results, using precision and consistency as evaluation metrics, reveals that, compared to zero-shot prompting, adding relevant domain knowledge through RAG improves the performance of LLMs in generating CQs for concrete ontology engineering tasks.
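To make the described pipeline concrete, the following is a minimal sketch of what such a RAG setup for CQ generation could look like. It assumes a simple TF-IDF retriever over paper texts and the OpenAI chat API; the prompt wording, function names, and retrieval method are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a RAG pipeline for competency question (CQ) generation.
# Retrieval step, prompt wording, and parameter names are illustrative assumptions.
from openai import OpenAI
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def retrieve_papers(query: str, papers: list[str], k: int) -> list[str]:
    """Rank domain papers by TF-IDF similarity to the query and keep the top k."""
    vectorizer = TfidfVectorizer(stop_words="english")
    matrix = vectorizer.fit_transform([query] + papers)
    scores = cosine_similarity(matrix[0:1], matrix[1:]).ravel()
    top = scores.argsort()[::-1][:k]
    return [papers[i] for i in top]


def generate_cqs(domain: str, papers: list[str], k: int = 5, temperature: float = 0.7) -> str:
    """Prompt GPT-4 with the retrieved papers as context and ask for CQs."""
    context = "\n\n".join(retrieve_papers(domain, papers, k))
    prompt = (
        f"You are an ontology engineer working on the {domain} domain.\n"
        "Based on the following papers, formulate competency questions that an "
        f"ontology for this domain should be able to answer.\n\n{context}"
    )
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=temperature,  # one of the settings varied in the experiments
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

In this sketch, `k` (the number of retrieved papers) and `temperature` correspond to the two factors the abstract reports varying; the zero-shot baseline would simply omit the retrieved context from the prompt.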