They are not all alike: answering different spatial questions requires different grounding strategies

Proceedings of the Third International Workshop on Spatial Language Understanding. Pub Date: 2020-11-01. DOI: 10.18653/v1/2020.splu-1.4

Alberto Testoni, Claudio Greco, Tobias Bianchi, Mauricio Mazuecos, Agata Marcante, Luciana Benotti, R. Bernardi


Citations: 7

Abstract



In this paper, we study the grounding skills required to answer spatial questions asked by humans while playing the GuessWhat?! game. We propose a classification of spatial questions that divides them into absolute, relational, and group questions. We build a new answerer model based on the LXMERT multimodal transformer and we compare a baseline with and without visual features of the scene. We are interested in how the attention mechanisms of LXMERT are used to answer spatial questions, since such questions require attending to more than one region simultaneously and identifying the relation that holds among them. We show that our proposed model outperforms the baseline by a large margin (9.70% on spatial questions and 6.27% overall). By analyzing LXMERT's errors and its attention mechanisms, we find that our classification helps to gain a better understanding of the skills required to answer different spatial questions.
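To make the three-way split concrete, here is a minimal keyword-based sketch in Python. It is not the authors' annotation procedure, which the abstract does not describe. It assumes that absolute questions locate the target within the image frame, relational questions locate it with respect to another object, and group questions locate it within a set of similar objects. The cue lists and the function name `classify_spatial_question` are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical cue lists, for illustration only. The abstract does not give
# the criteria used to label questions, so these are assumptions.
GROUP_CUES = re.compile(r"\b(first|second|third|fourth|last)\b")
RELATIONAL_CUES = re.compile(
    r"\b(next to|behind|in front of|near|beside|under|above|on top of)\b"
)
ABSOLUTE_CUES = re.compile(
    r"\b(left|right|top|bottom|center|middle|foreground|background)\b"
)


def classify_spatial_question(question: str) -> str:
    """Return 'group', 'relational', 'absolute', or 'non-spatial'.

    Cues are checked in that order, so a question such as
    'the second one from the left' is labeled 'group' rather than 'absolute'.
    """
    q = question.lower()
    if GROUP_CUES.search(q):
        return "group"
    if RELATIONAL_CUES.search(q):
        return "relational"
    if ABSOLUTE_CUES.search(q):
        return "absolute"
    return "non-spatial"


if __name__ == "__main__":
    for q in [
        "Is it on the left side of the picture?",
        "Is it next to the chair?",
        "Is it the second one from the left?",
        "Is it a person?",
    ]:
        print(f"{q!r}: {classify_spatial_question(q)}")
```

A heuristic like this only sorts questions by surface wording. Answering them still requires grounding in the image, which is the part the LXMERT-based answerer addresses.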
