CEDG-GeoQA: Knowledge base question answering for the geoscience domain via Chinese entity description graph

IF 2.7 · Zone 4, Earth Sciences · JCR Q2: COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS
Lai Wei, Qinghua Lu, Yilin Duan, Hong Yao, Xiaojun Kang
Journal: Earth Science Informatics. Published: 2024-04-09. Publication type: Journal Article. DOI: 10.1007/s12145-024-01304-8 (https://doi.org/10.1007/s12145-024-01304-8)
Citations: 0

Abstract

Acquiring geoscience knowledge is crucial for advancing earth science research. Currently, geoscience knowledge can be obtained through search engines or specialized databases. However, the quality of search engine results varies, and geoscience databases do not support natural language queries. To address these challenges, Geoscience Question Answering (GeoQA) systems have been developed to provide answers to natural language queries. Much of the existing research in geoscience QA primarily focuses on geography, with other domains remaining relatively unexplored. To bridge this gap, our study introduces a Chinese geoscience QA dataset that covers a wide range of topics, including geography, climate, and culture. Additionally, we propose the CEDG-GeoQA framework for Chinese geoscience QA. The model begins by utilizing syntactic parsing to convert unstructured queries into an entity description graph (EDG). Subsequently, it aligns the EDG with a comprehensive geoscience knowledge base, extracting a subgraph centered around the subject entity. This subgraph is used to assess candidate answers and determine the most likely response. Our comprehensive experiments, conducted using a Chinese geo-knowledge base, demonstrate the superior performance of our method, achieving a 5% improvement in the F1 measure compared to existing baselines, including WDAqua, gAnswer, and NSQA.
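The pipeline described in the abstract (question to EDG, EDG aligned with the knowledge base, subgraph extracted around the subject entity, candidates scored) can be illustrated with a toy sketch. This is not the authors' implementation: the triples, the dictionary-based EDG layout, the one-hop subgraph, and the additive scoring rule are all assumptions made for illustration, and the syntactic parsing step is skipped by writing the EDG by hand.

```python
from collections import defaultdict

# Hypothetical toy knowledge base of (subject, predicate, object) triples.
TRIPLES = [
    ("Yangtze River", "type", "river"),
    ("Yangtze River", "flows_into", "East China Sea"),
    ("Yangtze River", "length_km", "6300"),
    ("Yellow River", "type", "river"),
    ("Yellow River", "flows_into", "Bohai Sea"),
    ("East China Sea", "type", "sea"),
    ("Bohai Sea", "type", "sea"),
]


def build_index(triples):
    """Map each subject to its outgoing (predicate, object) edges."""
    index = defaultdict(list)
    for subj, pred, obj in triples:
        index[subj].append((pred, obj))
    return index


def extract_subgraph(index, subject):
    """Return the one-hop edges around the subject entity (assumed radius)."""
    return list(index.get(subject, []))


def score_candidate(index, predicate, candidate, edg):
    """Assumed scoring: +1 for a matching relation, +1 for a matching answer type."""
    score = 0
    if predicate == edg["relation"]:
        score += 1
    if ("type", edg["answer_type"]) in index.get(candidate, []):
        score += 1
    return score


def answer(index, edg):
    """Pick the highest-scoring object in the subject's subgraph, or None."""
    subgraph = extract_subgraph(index, edg["subject"])
    if not subgraph:
        return None
    best = max(subgraph, key=lambda e: score_candidate(index, e[0], e[1], edg))
    return best[1]


# Hand-written EDG for "Which sea does the Yangtze River flow into?"
# A real system would derive this from a syntactic parse of the question.
EDG = {"subject": "Yangtze River", "relation": "flows_into", "answer_type": "sea"}

if __name__ == "__main__":
    print(answer(build_index(TRIPLES), EDG))
```

Keeping the EDG as a small structure separate from the knowledge base lookup mirrors the two-stage description in the abstract: the question is first given a structured form, and only then matched against the graph.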

Source journal
Earth Science Informatics (COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS; GEOSCIENCES, MULTIDISCIPLINARY)
CiteScore: 4.60
Self-citation rate: 3.60%
Articles published: 157
Review time: 4.3 months
About the journal: The Earth Science Informatics [ESIN] journal aims at rapid publication of high-quality, current, cutting-edge, and provocative scientific work in the area of Earth Science Informatics as it relates to Earth systems science and space science. This includes articles on the application of formal and computational methods, computational Earth science, spatial and temporal analyses, and all aspects of computer applications to the acquisition, storage, processing, interchange, and visualization of data and information about the materials, properties, processes, features, and phenomena that occur at all scales and locations in the Earth system’s five components (atmosphere, hydrosphere, geosphere, biosphere, cryosphere) and in space (see "About this journal" for more detail). The quarterly journal publishes research, methodology, and software articles, as well as editorials, comments, and book and software reviews. Review articles of relevant findings, topics, and methodologies are also considered.