User Feedback-based Online Learning for Intent Classification

Companion Publication of the 2020 International Conference on Multimodal Interaction. Pub Date: 2023-10-09. DOI: 10.1145/3577190.3614137

Kaan Gönç, Baturay Sağlam, Onat Dalmaz, Tolga Çukur, Serdar Kozat, Hamdi Dibeklioglu



Abstract

Intent classification is a key task in natural language processing (NLP) that aims to infer the goal or intention behind a user's query. Most existing intent classification methods rely on supervised deep models trained on large annotated datasets of text-intent pairs. However, obtaining such datasets is often expensive and impractical in real-world settings. Furthermore, supervised models may overfit, or may face distributional shift when new intents, utterances, or data distributions emerge over time, and therefore require frequent retraining. Online learning methods based on user feedback can overcome these limitations, because they collect data and adapt the model continuously without needing access to the true intents. In this paper, we propose a novel multi-armed contextual bandit framework that leverages a text encoder based on a large language model (LLM) to extract the latent features of a given utterance and to jointly learn multimodal representations of the encoded text features and the intents. Our framework consists of two stages: offline pretraining and online fine-tuning. In the offline stage, we train the policy on a small labeled dataset using a contextual bandit approach. In the online stage, we fine-tune the policy parameters using the REINFORCE algorithm with a user feedback-based objective, without relying on the true intents. We further introduce a sliding window strategy for simulating the retrieval of data samples during online training. This two-phase approach enables our method to adapt efficiently to dynamic user preferences and data distributions with improved performance. An extensive set of empirical studies indicates that our method significantly outperforms policies that omit either offline pretraining or online fine-tuning, while achieving performance competitive with a supervised benchmark trained on a labeled dataset an order of magnitude larger.
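The two-stage procedure described in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation, and several pieces are assumptions made only for illustration: a linear softmax policy stands in for the LLM-based encoder and learned representations, synthetic Gaussian feature vectors stand in for encoded utterances, user feedback is simulated as a binary accept/reject signal, the fixed baseline of 0.5 is an arbitrary choice, and the overlapping-window loop is only one plausible reading of the sliding window strategy. All function and parameter names (`reinforce_step`, `simulate`, `window`, `stride`) are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(W, x, feedback, rng, lr=0.1, baseline=0.5):
    """One REINFORCE update of a linear softmax policy (assumed policy form).

    W holds one weight vector per intent and is updated in place.
    feedback(action) returns 1.0 (accepted) or 0.0 (rejected).
    """
    probs = softmax([sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in W])
    action = rng.choices(range(len(W)), weights=probs)[0]
    reward = feedback(action)
    advantage = reward - baseline  # fixed baseline: an illustrative choice
    for k, w in enumerate(W):
        # d log pi(a|x) / d w_k = (1[k == a] - pi(k|x)) * x
        coeff = lr * advantage * ((1.0 if k == action else 0.0) - probs[k])
        for j in range(len(x)):
            w[j] += coeff * x[j]
    return action, reward


def simulate(n_intents=4, dim=8, n_offline=40, n_online=400,
             window=50, stride=25, seed=0):
    rng = random.Random(seed)
    # Synthetic stand-in for encoded utterances: one prototype per intent.
    prototypes = [[rng.gauss(0, 1) for _ in range(dim)]
                  for _ in range(n_intents)]

    def sample():
        y = rng.randrange(n_intents)
        x = [p + 0.5 * rng.gauss(0, 1) for p in prototypes[y]]
        return x, y

    W = [[0.0] * dim for _ in range(n_intents)]

    # Offline stage: a small labeled set; the reward comes from the label.
    for _ in range(n_offline):
        x, y = sample()
        reinforce_step(W, x, lambda a: float(a == y), rng)

    # Online stage: the learner only sees accept/reject feedback. The label y
    # is used solely inside the simulated user, never by the update itself.
    stream = [sample() for _ in range(n_online)]
    rewards = []
    for start in range(0, n_online - window + 1, stride):
        for x, y in stream[start:start + window]:
            _, r = reinforce_step(W, x, lambda a: float(a == y), rng)
            rewards.append(r)
    return W, rewards


if __name__ == "__main__":
    _, rewards = simulate()
    print("mean feedback, first window:", sum(rewards[:50]) / 50)
    print("mean feedback, last window:", sum(rewards[-50:]) / 50)
```

Comparing the mean feedback of the first and last windows gives a rough view of whether the policy keeps improving from feedback alone. Because the windows overlap, each utterance in the stream may be visited more than once, which is a property of this sketch rather than a claim about the paper's setup.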
