MLPQ: A Dataset for Path Question Answering over Multilingual Knowledge Graphs

Yiming Tan, Yongrui Chen, Guilin Qi, Weizhuo Li, Meng Wang

Big Data Research, published 2023-05-28. DOI: 10.1016/j.bdr.2023.100381. Citations: 2.

Abstract

Knowledge Graph-based Multilingual Question Answering (KG-MLQA), one of the essential subtasks of Knowledge Graph-based Question Answering (KGQA), allows questions to be expressed in different languages, bridging the lexical gap between questions and knowledge graph(s). However, existing KG-MLQA work focuses mainly on the semantic parsing of multilingual questions and ignores questions that require integrating information from cross-lingual knowledge graphs (CLKG). This paper extends KG-MLQA to Cross-lingual KG-based multilingual Question Answering (CLKGQA) and constructs the first CLKGQA dataset over multilingual DBpedia, named MLPQ, which contains 300K questions in English, Chinese, and French. We further propose a novel KG sampling algorithm for KG construction, enabling MLPQ to support research on different types of methods. To evaluate the dataset, we put forward a general question answering workflow whose core idea is to transform CLKGQA into KG-MLQA: an Entity Alignment (EA) model first merges the CLKG into a single KG, and a multi-hop QA model combined with a multilingual pre-trained model then derives the answer. By instantiating this workflow, we establish two baseline models for MLPQ, one using Google Translate to obtain aligned entities and the other adopting a recent EA model. Experiments show that the baseline models fall short of ideal performance on CLKGQA. Moreover, the availability of our benchmark contributes to the question answering and entity alignment communities.
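The workflow above reduces CLKGQA to KG-MLQA in two steps: merge the cross-lingual KGs into a single KG using entity-alignment output, then answer the path question over the merged graph. A minimal sketch of that reduction, with toy triples and a hand-written alignment standing in for the EA model's output (none of the entity or relation names below come from MLPQ itself):

```python
# Sketch of the CLKGQA-to-KG-MLQA reduction: (1) merge two KGs via an
# entity-alignment mapping, (2) answer a multi-hop path question by
# relation traversal. All names are illustrative.

from collections import defaultdict

def merge_kgs(kg_a, kg_b, alignment):
    """Merge two KGs (sets/lists of (head, relation, tail) triples).

    `alignment` maps entity IDs in kg_b to their aligned IDs in kg_a,
    standing in for the output of an entity-alignment (EA) model.
    """
    canon = lambda e: alignment.get(e, e)
    merged = set(kg_a)
    merged.update((canon(h), r, canon(t)) for h, r, t in kg_b)
    return merged

def answer_path_question(kg, start, relations):
    """Answer a path question by following the relation chain from `start`."""
    index = defaultdict(list)
    for h, r, t in kg:
        index[(h, r)].append(t)
    frontier = {start}
    for rel in relations:
        frontier = {t for e in frontier for t in index[(e, rel)]}
    return frontier

# Toy cross-lingual fragments: one "English" KG, one "French" KG.
kg_en = [("Paris", "capitalOf", "France")]
kg_fr = [("fr:France", "continent", "Europe")]
alignment = {"fr:France": "France"}  # hypothetical EA output

merged = merge_kgs(kg_en, kg_fr, alignment)
# 2-hop question: "On which continent is the country Paris is the capital of?"
print(answer_path_question(merged, "Paris", ["capitalOf", "continent"]))
# → {'Europe'}
```

The answer requires one hop from each source KG, which is exactly the cross-lingual integration that single-KG multilingual QA cannot perform; the EA mapping is what makes the second hop reachable.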
