Evaluating the effectiveness of prompt engineering for knowledge graph question answering.

IF 3 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Frontiers in Artificial Intelligence Pub Date : 2025-01-13 eCollection Date: 2024-01-01 DOI:10.3389/frai.2024.1454258
Catherine Kosten, Farhad Nooralahzadeh, Kurt Stockinger
{"title":"Evaluating the effectiveness of prompt engineering for knowledge graph question answering.","authors":"Catherine Kosten, Farhad Nooralahzadeh, Kurt Stockinger","doi":"10.3389/frai.2024.1454258","DOIUrl":null,"url":null,"abstract":"<p><p>Many different methods for prompting large language models have been developed since the emergence of OpenAI's ChatGPT in November 2022. In this work, we evaluate six different few-shot prompting methods. The first set of experiments evaluates three frameworks that focus on the quantity or type of shots in a prompt: a baseline method with a simple prompt and a small number of shots, random few-shot prompting with 10, 20, and 30 shots, and similarity-based few-shot prompting. The second set of experiments target optimizing the prompt or enhancing shots through Large Language Model (LLM)-generated explanations, using three prompting frameworks: Explain then Translate, Question Decomposition Meaning Representation, and Optimization by Prompting. We evaluate these six prompting methods on the newly created Spider4SPARQL benchmark, as it is the most complex SPARQL-based Knowledge Graph Question Answering (KGQA) benchmark to date. Across the various prompting frameworks used, the commercial model is unable to achieve a score over 51%, indicating that KGQA, especially for complex queries, with multiple hops, set operations and filters remains a challenging task for LLMs. Our experiments find that the most successful prompting framework for KGQA is a simple prompt combined with an ontology and five random shots.</p>","PeriodicalId":33315,"journal":{"name":"Frontiers in Artificial Intelligence","volume":"7 ","pages":"1454258"},"PeriodicalIF":3.0000,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11770024/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in Artificial Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3389/frai.2024.1454258","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/1/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

Many different methods for prompting large language models have been developed since the emergence of OpenAI's ChatGPT in November 2022. In this work, we evaluate six different few-shot prompting methods. The first set of experiments evaluates three frameworks that focus on the quantity or type of shots in a prompt: a baseline method with a simple prompt and a small number of shots, random few-shot prompting with 10, 20, and 30 shots, and similarity-based few-shot prompting. The second set of experiments target optimizing the prompt or enhancing shots through Large Language Model (LLM)-generated explanations, using three prompting frameworks: Explain then Translate, Question Decomposition Meaning Representation, and Optimization by Prompting. We evaluate these six prompting methods on the newly created Spider4SPARQL benchmark, as it is the most complex SPARQL-based Knowledge Graph Question Answering (KGQA) benchmark to date. Across the various prompting frameworks used, the commercial model is unable to achieve a score over 51%, indicating that KGQA, especially for complex queries, with multiple hops, set operations and filters remains a challenging task for LLMs. Our experiments find that the most successful prompting framework for KGQA is a simple prompt combined with an ontology and five random shots.

自 2022 年 11 月 OpenAI 的 ChatGPT 出现以来,已经开发出许多不同的大型语言模型提示方法。在这项工作中,我们评估了六种不同的少量提示方法。第一组实验评估了三种侧重于提示中镜头数量或类型的框架:具有简单提示和少量镜头的基线方法、具有 10、20 和 30 个镜头的随机少量镜头提示,以及基于相似性的少量镜头提示。第二组实验使用三种提示框架,通过大语言模型(LLM)生成的解释来优化提示或增强镜头:先解释后翻译、问题分解意义表示和提示优化。我们在新创建的 Spider4SPARQL 基准上评估了这六种提示方法,因为它是迄今为止最复杂的基于 SPARQL 的知识图谱问题解答(KGQA)基准。在所使用的各种提示框架中,商业模型无法获得 51% 以上的分数,这表明 KGQA,尤其是复杂查询、多跳、集合操作和过滤器,对于 LLM 来说仍然是一项具有挑战性的任务。我们的实验发现,KGQA 最成功的提示框架是将简单提示与本体和五个随机镜头相结合。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
CiteScore
6.10
自引率
2.50%
发文量
272
审稿时长
13 weeks
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信