Title: Multi-Aspect Controlled Response Generation in a Multimodal Dialogue System using Hierarchical Transformer Network
Authors: Mauajama Firdaus, Nidhi Thakur, Asif Ekbal
Published in: 2021 International Joint Conference on Neural Networks (IJCNN), 2021-07-18
DOI: 10.1109/IJCNN52387.2021.9533886
Citations: 0
Abstract
Multimodality in dialogue has become crucial for a thorough understanding of user intent, enabling better responses that fulfill the user's demands. Existing dialogue systems suffer from inconsistent and dull responses. Many goal-oriented conversational systems lack aspect information about the products or services under discussion, preventing them from presenting informative and engaging responses. Aspects such as price, color, pattern, and rating are essential for deciding whether to purchase or order. To alleviate these issues, we propose the novel task of multi-aspect guided dialogue generation, which aims to make responses focused and consistent with the different aspects mentioned in the ongoing dialogue. In the present work, we design a hierarchical transformer network that captures the dialogue context for response generation. To create responses with multiple aspects, we explicitly feed the aspect vectors to the decoder during generation; the aspect information supplied to the decoder controls the overall generation process. We evaluate the proposed hierarchical framework on the newly created multi-domain multi-modal dialogue (MDMMD) dataset, which contains both text and images. Experimental results show that the proposed system outperforms the existing and baseline approaches. The aspect-controlled responses are highly consistent with the ongoing dialogue and highly diverse, resolving the issues of current systems.
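The two ideas the abstract names, hierarchical context encoding (utterance level, then dialogue level) and aspect vectors conditioning the decoder, can be sketched in a minimal way. This is a hypothetical illustration, not the paper's implementation: mean-pooling stands in for the transformer encoders, the aspect embeddings are random placeholders, and all dimensions are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 16  # illustrative hidden size (not from the paper)

def encode_utterance(token_vecs):
    # Utterance-level encoder stand-in: pool token vectors into one summary.
    # In the paper this role is played by a transformer layer.
    return token_vecs.mean(axis=0)

def encode_dialogue(utt_summaries):
    # Dialogue-level encoder stand-in: pool utterance summaries into a
    # single dialogue-context vector.
    return np.stack(utt_summaries).mean(axis=0)

def decoder_conditioning(context_vec, aspect_vecs):
    # Aspect control: pool the aspect embeddings and concatenate them to
    # the dialogue context, so every decoding step is explicitly
    # conditioned on the requested aspects (price, color, etc.).
    aspect_vec = np.stack(aspect_vecs).mean(axis=0)
    return np.concatenate([context_vec, aspect_vec])

# Toy dialogue: three utterances of varying lengths, plus two
# hypothetical aspect embeddings (e.g. price and color).
utterances = [rng.normal(size=(n, D)) for n in (5, 7, 4)]
aspects = [rng.normal(size=D) for _ in range(2)]

context = encode_dialogue([encode_utterance(u) for u in utterances])
dec_in = decoder_conditioning(context, aspects)
print(dec_in.shape)  # (32,): context half plus aspect half
```

The design choice this mirrors is that the aspect signal is injected at decoding time rather than mixed into the encoder, so the same encoded dialogue context can be decoded under different aspect combinations.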