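SPARQL Query Candidate Filtering for Improving the Quality of Multilingual Question Answering Over Knowledge Graphs Using Language Models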

Impact factor: 1.0; Zone 4 (Computer Science); JCR Q4 (Computer Science, Software Engineering)

Journal of Web Engineering, Pub Date: 2025-06-01, DOI: 10.13052/jwe1540-9589.2444

Aleksandr Perevalov;Aleksandr Gashkov;Maria Eltsova;Andreas Both

Volume 24, Issue 4, pp. 563-592. Full text: https://ieeexplore.ieee.org/document/11112782/

Citations: 0

SPARQL Query Candidate Filtering for Improving the Quality of Multilingual Question Answering Over Knowledge Graphs Using Language Models

Question answering retrieves information from a knowledge base using natural language. Question answering systems that work over knowledge graphs (KGQA) typically compute a ranked list of SPARQL query candidates for a given natural-language input, where the top-ranked query should reflect the intention and semantics of the user's question. This article follows our long-term research agenda of providing trustworthy KGQA systems by presenting an approach for filtering out incorrect queries. We employ (large) language models (LMs/LLMs) to distinguish between correct and incorrect queries. The main difference from previous work is that we address multilingual questions in major languages (English, German, French, Spanish, and Russian) and confirm the generalizability of the approach by also evaluating it on several low-resource languages (Ukrainian, Armenian, Lithuanian, Belarusian, and Bashkir). The considered LMs (BERT, DistilBERT, Mistral, Zephyr, GPT-3.5, and GPT-4) were applied as SPARQL query filters to two KGQA systems: QAnswer (a real-world system) and MemQA (an idealized system). The approach was evaluated on the multilingual dataset QALD-9-plus, which is based on the Wikidata knowledge graph. The experimental results indicate that the considered KGQA systems achieve quality improvements for all languages when using our query-filtering approach.
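
The filtering idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Candidate`, `filter_candidates`, `top_answer`, and `toy_judge` are hypothetical, and the judge is a toy keyword rule standing in for a language model that decides whether a query matches the question.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Candidate:
    sparql: str  # candidate SPARQL query text
    rank: int    # position in the KGQA system's ranked list (0 = top)


# Hypothetical judge signature: (question, sparql) -> True if the query is
# believed to be correct. In the article this role is played by a language
# model; here it is only a stand-in.
Judge = Callable[[str, str], bool]


def filter_candidates(question: str, candidates: List[Candidate],
                      judge: Judge) -> List[Candidate]:
    """Keep only the candidates the judge accepts, in ranking order."""
    ordered = sorted(candidates, key=lambda c: c.rank)
    return [c for c in ordered if judge(question, c.sparql)]


def top_answer(question: str, candidates: List[Candidate],
               judge: Judge) -> Optional[Candidate]:
    """Return the best surviving candidate, or None if all were filtered out."""
    kept = filter_candidates(question, candidates, judge)
    return kept[0] if kept else None


def toy_judge(question: str, sparql: str) -> bool:
    # Toy rule for demonstration only: accept a query that uses the Wikidata
    # "capital" property (P36) when the question asks about a capital.
    return "capital" in question.lower() and "wdt:P36" in sparql


candidates = [
    Candidate("SELECT ?o WHERE { wd:Q183 wdt:P1082 ?o }", rank=0),  # population
    Candidate("SELECT ?o WHERE { wd:Q183 wdt:P36 ?o }", rank=1),    # capital
]
best = top_answer("What is the capital of Germany?", candidates, toy_judge)
print(best.sparql if best else "no trustworthy query")
```

Returning `None` when every candidate is rejected lets a system decline to answer instead of presenting a query it considers incorrect, which is one way to read the trustworthiness goal stated in the abstract.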

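Source journal

Journal of Web Engineering (Engineering & Technology; Computer Science, Theory & Methods)

CiteScore: 1.80

Self-citation rate: 12.50%

Number of publications: (not listed)

Review time: 9 months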

About the journal: The World Wide Web and its associated technologies have become a major implementation and delivery platform for a large variety of applications, ranging from simple institutional information websites to sophisticated supply-chain management systems, financial applications, e-government, distance learning, and entertainment, among others. Such applications, in addition to their intrinsic functionality, also exhibit the more complex behavior of distributed applications.