Title: Multi-Aspect Controlled Response Generation in a Multimodal Dialogue System using Hierarchical Transformer Network
Authors: Mauajama Firdaus, Nidhi Thakur, Asif Ekbal
Published in: 2021 International Joint Conference on Neural Networks (IJCNN), 2021-07-18
DOI: 10.1109/IJCNN52387.2021.9533886
Citations: 0
Abstract
Multimodality in dialogue has become crucial for a thorough understanding of user intent, enabling better responses that fulfill the user's demands. Existing dialogue systems suffer from inconsistent and dull responses. Many goal-oriented conversational systems lack aspect information about the products or services under discussion, preventing them from presenting informative and engaging responses. Aspects such as price, color, pattern, and rating are essential for deciding whether to purchase or order. To alleviate these issues, we propose the novel task of multi-aspect guided dialogue generation, which aims to make responses focused and consistent with the different aspects mentioned in the ongoing dialogue. In the present work, we design a hierarchical transformer network that captures the dialogue context for response generation. To create responses with multiple aspects, we explicitly feed the aspect vectors to the decoder during generation; the aspect information supplied to the decoder controls the overall generation process. We evaluate the proposed hierarchical framework on the newly created multi-domain multi-modal dialogue (MDMMD) dataset, which contains both text and images. Experimental results show that the proposed system outperforms the existing and baseline approaches. The aspect-controlled responses are highly consistent with the ongoing dialogue and highly diverse, resolving the issues of current systems.
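The two ideas the abstract names, hierarchical context encoding (utterance level, then dialogue level) and aspect vectors conditioning the decoder, can be sketched in a minimal way. This is a hypothetical illustration, not the paper's implementation: mean-pooling stands in for the transformer encoders, the aspect embeddings are random placeholders, and all dimensions are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 16  # illustrative hidden size (not from the paper)

def encode_utterance(token_vecs):
    # Utterance-level encoder stand-in: pool token vectors into one summary.
    # In the paper this role is played by a transformer layer.
    return token_vecs.mean(axis=0)

def encode_dialogue(utt_summaries):
    # Dialogue-level encoder stand-in: pool utterance summaries into a
    # single dialogue-context vector.
    return np.stack(utt_summaries).mean(axis=0)

def decoder_conditioning(context_vec, aspect_vecs):
    # Aspect control: pool the aspect embeddings and concatenate them to
    # the dialogue context, so every decoding step is explicitly
    # conditioned on the requested aspects (price, color, etc.).
    aspect_vec = np.stack(aspect_vecs).mean(axis=0)
    return np.concatenate([context_vec, aspect_vec])

# Toy dialogue: three utterances of varying lengths, plus two
# hypothetical aspect embeddings (e.g. price and color).
utterances = [rng.normal(size=(n, D)) for n in (5, 7, 4)]
aspects = [rng.normal(size=D) for _ in range(2)]

context = encode_dialogue([encode_utterance(u) for u in utterances])
dec_in = decoder_conditioning(context, aspects)
print(dec_in.shape)  # (32,): context half plus aspect half
```

The design choice this mirrors is that the aspect signal is injected at decoding time rather than mixed into the encoder, so the same encoded dialogue context can be decoded under different aspect combinations.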