Multi-head attention based candidate segment selection in QA over hybrid data

IF 0.8, Zone 4 (Computer Science), JCR Q4: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Intelligent Data Analysis Pub Date : 2023-11-20 DOI:10.3233/ida-227032

Qian Chen, Xiaoying Gao, Xin Guo, Suge Wang


Citations: 0

Abstract

Question answering based on tabular and textual data is a novel task proposed in recent years in the field of QA. At present, most QA systems return answers from a single data form, such as knowledge graphs, tables, or texts. However, hybrid data that combines structured and unstructured content is far more pervasive in real life than any single form. Recent research on TAT-QA suffers mainly from a high error rate when extracting supporting evidence from both tabular and textual content. This paper aims to address the problem of failed evidence extraction from more complex and realistic hybrid data. First, we propose two metrics to evaluate the performance of evidence extraction on hybrid data: wrong evidence ratio (WER) and missing evidence ratio (MER). Second, we use a candidate extractor to obtain supporting evidence related to the question. Third, we design an origin selector to determine where the question's answer comes from. Finally, the loss of the origin selector is fused into the final loss function, which improves evidence extraction performance. Experimental results on the TAT-QA dataset show that our proposed model outperforms the best baseline in terms of F1, WER, and MER, which demonstrates the effectiveness of our model.
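The abstract names WER and MER but does not define them, so the following is only a minimal sketch under assumed definitions: WER is taken here as the share of extracted evidence items that are not in the gold evidence set, and MER as the share of gold evidence items that were not extracted. The paper's actual formulas may differ. The function names, the evidence identifiers, and the weighted-sum form of the fused loss (including the `weight` parameter) are likewise hypothetical illustrations, not the authors' implementation.

```python
from typing import Hashable, Set


def wrong_evidence_ratio(predicted: Set[Hashable], gold: Set[Hashable]) -> float:
    """Assumed definition: fraction of predicted evidence items that are not gold."""
    if not predicted:
        return 0.0  # nothing was extracted, so nothing extracted is wrong
    return len(predicted - gold) / len(predicted)


def missing_evidence_ratio(predicted: Set[Hashable], gold: Set[Hashable]) -> float:
    """Assumed definition: fraction of gold evidence items that were not predicted."""
    if not gold:
        return 0.0  # no gold evidence, so nothing can be missing
    return len(gold - predicted) / len(gold)


def fused_loss(answer_loss: float, origin_loss: float, weight: float = 1.0) -> float:
    """Assumed fusion: add the origin-selector loss to the main loss as a weighted term."""
    return answer_loss + weight * origin_loss


# Hypothetical evidence identifiers: table cells and text spans.
predicted = {"table:r2c3", "table:r4c1", "text:p1s2"}
gold = {"table:r2c3", "text:p1s2", "text:p3s1"}

wer = wrong_evidence_ratio(predicted, gold)   # one of three predictions is wrong
mer = missing_evidence_ratio(predicted, gold)  # one of three gold items is missing
total = fused_loss(answer_loss=0.5, origin_loss=0.25, weight=2.0)
```

Under these assumed definitions, lower values are better for both ratios, and the two capture different failure modes: WER penalizes spurious evidence, MER penalizes overlooked evidence.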




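Source journal

Intelligent Data Analysis (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore

2.20

Self-citation rate

5.90%

Articles published

Review time

3.3 months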

About the journal: Intelligent Data Analysis provides a forum for the examination of issues related to the research and applications of Artificial Intelligence techniques in data analysis across a variety of disciplines. These techniques include (but are not limited to): all areas of data visualization, data pre-processing (fusion, editing, transformation, filtering, sampling), data engineering, database mining techniques, tools and applications, use of domain knowledge in data analysis, big data applications, evolutionary algorithms, machine learning, neural nets, fuzzy logic, statistical pattern recognition, knowledge filtering, and post-processing. In particular, papers are preferred that discuss development of new AI related data analysis architectures, methodologies, and techniques and their applications to various domains.