Disentangling User Conversations with Voice Assistants for Online Shopping
Nikhita Vedula, M. Collins, Oleg Rokhlenko
Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023-07-18
DOI: 10.1145/3539618.3591974
Citation count: 1
Abstract
Conversation disentanglement aims to identify and group utterances from a conversation into separate threads. Existing methods primarily focus on disentangling multi-party conversations with three or more speakers, explicitly or implicitly using speaker-related features as disentanglement signals. Most existing models require a large amount of human-annotated training data and often focus on pairwise relations between utterances, paying little attention to the conversational context. In this work, we propose DiSC, a multi-task learning approach with a contrastive learning objective, to disentangle conversations between two speakers, a user and a virtual speech assistant, in the novel domain of e-commerce. We analyze multiple ways, at different granularities, of defining conversation "threads". DiSC jointly learns the relations between pairs of utterances, as well as between utterances and their respective thread context. We train and evaluate our models on multiple multi-threaded conversation datasets that were created automatically, without any human labeling effort. Experimental results on public datasets as well as on real-world shopping conversations from a commercial speech assistant show that DiSC outperforms state-of-the-art baselines by at least 3% on both automatic and human evaluation metrics. We also demonstrate how DiSC improves downstream dialog response generation in the shopping domain.
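The abstract names the ingredients of DiSC (a pairwise utterance-utterance relation, an utterance-to-thread-context relation, and a contrastive objective) but does not specify their form. The Python below is a minimal, hypothetical sketch of how such a two-task contrastive objective could be written; it is not the authors' implementation. It assumes an InfoNCE-style loss over cosine similarities, a thread context embedding computed as the mean of the thread's utterance embeddings, an illustrative task weight `alpha`, and a greedy inference step that attaches an utterance to the most similar thread or signals a new thread when no similarity exceeds `new_thread_threshold`. All function names, parameters, and the toy 2-D embeddings are assumptions; in practice the vectors would come from a trained text encoder.

```python
# Hypothetical sketch only: not DiSC's actual architecture, loss, or inference procedure.
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean_vector(vectors: List[Vector]) -> List[float]:
    """Assumed thread context embedding: the mean of the thread's utterance embeddings."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def info_nce(anchor: Vector, positive: Vector, negatives: List[Vector],
             temperature: float = 0.1) -> float:
    """InfoNCE loss: negative log-softmax score of the positive among positive + negatives."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    m = max(logits)  # subtract the max logit for numerical stability
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]


def joint_loss(utterance: Vector, same_thread_utt: Vector, other_thread_utts: List[Vector],
               own_thread: List[Vector], other_threads: List[List[Vector]],
               alpha: float = 0.5, temperature: float = 0.1) -> float:
    """Assumed multi-task objective: weighted sum of a pairwise term and a thread-context term."""
    # Task 1: pull the utterance toward an utterance from its own thread,
    # away from utterances belonging to other threads.
    pair_term = info_nce(utterance, same_thread_utt, other_thread_utts, temperature)
    # Task 2: pull the utterance toward its own thread context,
    # away from the contexts of the other threads.
    own_ctx = mean_vector(own_thread)
    other_ctxs = [mean_vector(t) for t in other_threads]
    context_term = info_nce(utterance, own_ctx, other_ctxs, temperature)
    return alpha * pair_term + (1 - alpha) * context_term


def assign_thread(utterance: Vector, threads: List[List[Vector]],
                  new_thread_threshold: float = 0.5) -> int:
    """Greedy inference: index of the most similar thread context, or -1 to open a new thread."""
    best_idx, best_sim = -1, new_thread_threshold
    for idx, thread in enumerate(threads):
        sim = cosine(utterance, mean_vector(thread))
        if sim > best_sim:
            best_idx, best_sim = idx, sim
    return best_idx
```

For example, with `threads = [[[1.0, 0.0], [0.9, 0.1]], [[0.0, 1.0], [0.1, 0.9]]]`, `assign_thread([0.95, 0.05], threads)` returns `0`, while `assign_thread([-1.0, -1.0], threads)` returns `-1`, meaning the utterance would start a new thread. Combining the two terms with a single weight `alpha` is simply the most basic multi-task choice; the context term is what lets a model use the surrounding thread rather than only isolated utterance pairs.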