LLM-based Weak Supervision Framework for Query Intent Classification in Video Search

Farnoosh Javadi, Phanideep Gampa, Alyssa Woo, Xingxing Geng, Hang Zhang, Jose Sepulveda, Belhassen Bayar, Fei Wang
{"title":"基于 LLM 的视频搜索查询意图分类弱监督框架","authors":"Farnoosh Javadi, Phanideep Gampa, Alyssa Woo, Xingxing Geng, Hang Zhang, Jose Sepulveda, Belhassen Bayar, Fei Wang","doi":"arxiv-2409.08931","DOIUrl":null,"url":null,"abstract":"Streaming services have reshaped how we discover and engage with digital\nentertainment. Despite these advancements, effectively understanding the wide\nspectrum of user search queries continues to pose a significant challenge. An\naccurate query understanding system that can handle a variety of entities that\nrepresent different user intents is essential for delivering an enhanced user\nexperience. We can build such a system by training a natural language\nunderstanding (NLU) model; however, obtaining high-quality labeled training\ndata in this specialized domain is a substantial obstacle. Manual annotation is\ncostly and impractical for capturing users' vast vocabulary variations. To\naddress this, we introduce a novel approach that leverages large language\nmodels (LLMs) through weak supervision to automatically annotate a vast\ncollection of user search queries. Using prompt engineering and a diverse set\nof LLM personas, we generate training data that matches human annotator\nexpectations. By incorporating domain knowledge via Chain of Thought and\nIn-Context Learning, our approach leverages the labeled data to train\nlow-latency models optimized for real-time inference. Extensive evaluations\ndemonstrated that our approach outperformed the baseline with an average\nrelative gain of 113% in recall. Furthermore, our novel prompt engineering\nframework yields higher quality LLM-generated data to be used for weak\nsupervision; we observed 47.60% improvement over baseline in agreement rate\nbetween LLM predictions and human annotations with respect to F1 score,\nweighted according to the distribution of occurrences of the search queries.\nOur persona selection routing mechanism further adds an additional 3.67%\nincrease in weighted F1 score on top of our novel prompt engineering framework.","PeriodicalId":501281,"journal":{"name":"arXiv - CS - Information Retrieval","volume":"67 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"LLM-based Weak Supervision Framework for Query Intent Classification in Video Search\",\"authors\":\"Farnoosh Javadi, Phanideep Gampa, Alyssa Woo, Xingxing Geng, Hang Zhang, Jose Sepulveda, Belhassen Bayar, Fei Wang\",\"doi\":\"arxiv-2409.08931\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Streaming services have reshaped how we discover and engage with digital\\nentertainment. Despite these advancements, effectively understanding the wide\\nspectrum of user search queries continues to pose a significant challenge. An\\naccurate query understanding system that can handle a variety of entities that\\nrepresent different user intents is essential for delivering an enhanced user\\nexperience. We can build such a system by training a natural language\\nunderstanding (NLU) model; however, obtaining high-quality labeled training\\ndata in this specialized domain is a substantial obstacle. Manual annotation is\\ncostly and impractical for capturing users' vast vocabulary variations. To\\naddress this, we introduce a novel approach that leverages large language\\nmodels (LLMs) through weak supervision to automatically annotate a vast\\ncollection of user search queries. 
Using prompt engineering and a diverse set\\nof LLM personas, we generate training data that matches human annotator\\nexpectations. By incorporating domain knowledge via Chain of Thought and\\nIn-Context Learning, our approach leverages the labeled data to train\\nlow-latency models optimized for real-time inference. Extensive evaluations\\ndemonstrated that our approach outperformed the baseline with an average\\nrelative gain of 113% in recall. Furthermore, our novel prompt engineering\\nframework yields higher quality LLM-generated data to be used for weak\\nsupervision; we observed 47.60% improvement over baseline in agreement rate\\nbetween LLM predictions and human annotations with respect to F1 score,\\nweighted according to the distribution of occurrences of the search queries.\\nOur persona selection routing mechanism further adds an additional 3.67%\\nincrease in weighted F1 score on top of our novel prompt engineering framework.\",\"PeriodicalId\":501281,\"journal\":{\"name\":\"arXiv - CS - Information Retrieval\",\"volume\":\"67 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Information Retrieval\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.08931\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Information Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.08931","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Streaming services have reshaped how we discover and engage with digital entertainment. Despite these advancements, effectively understanding the wide spectrum of user search queries continues to pose a significant challenge. An accurate query understanding system that can handle the variety of entities representing different user intents is essential for delivering an enhanced user experience. We can build such a system by training a natural language understanding (NLU) model; however, obtaining high-quality labeled training data in this specialized domain is a substantial obstacle. Manual annotation is costly and impractical for capturing users' vast vocabulary variations. To address this, we introduce a novel approach that leverages large language models (LLMs) through weak supervision to automatically annotate a vast collection of user search queries.
Using prompt engineering and a diverse set of LLM personas, we generate training data that matches human annotator expectations. By incorporating domain knowledge via Chain of Thought and In-Context Learning, our approach leverages the labeled data to train low-latency models optimized for real-time inference.
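To make the weak-supervision step concrete, here is a minimal sketch of persona-conditioned prompting with In-Context Learning examples and a Chain-of-Thought instruction. This is not the authors' released code: the label set, personas, and the call_llm placeholder are all hypothetical stand-ins.

```python
# Minimal sketch of persona-conditioned weak labeling with In-Context
# Learning (few-shot examples) and a Chain-of-Thought instruction.
# The label set, personas, and call_llm stub are hypothetical, not the
# paper's actual taxonomy or prompts.

INTENT_LABELS = ["movie_title", "actor_name", "genre", "sports_event"]

PERSONAS = {
    "film_buff": "You are a film enthusiast with deep knowledge of movie and TV titles.",
    "sports_fan": "You are a sports fan familiar with teams, leagues, and live events.",
}

# In-Context Learning: worked examples with short reasoning chains.
FEW_SHOT = [
    ("dune part two", "The query names a specific film.", "movie_title"),
    ("timothee chalamet", "The query names a performer.", "actor_name"),
]

def build_prompt(persona: str, query: str) -> str:
    examples = "\n\n".join(
        f"Query: {q}\nReasoning: {r}\nLabel: {l}" for q, r, l in FEW_SHOT
    )
    return (
        f"{PERSONAS[persona]}\n"
        f"Classify the search query into one of: {', '.join(INTENT_LABELS)}.\n"
        "Think step by step, then give the label on the last line as 'Label: <label>'.\n\n"
        f"{examples}\n\nQuery: {query}"
    )

def call_llm(prompt: str) -> str:
    # Placeholder for a real LLM API call; returns a canned response so
    # the sketch runs end to end. Swap in your provider's client here.
    return "Reasoning: The query names a specific film.\nLabel: movie_title"

def weak_label(query: str, persona: str = "film_buff") -> str:
    response = call_llm(build_prompt(persona, query))
    last_line = response.strip().splitlines()[-1]
    label = last_line.removeprefix("Label:").strip()
    return label if label in INTENT_LABELS else "unknown"  # guard against label drift

print(weak_label("dune part two"))  # -> movie_title
```

In the paper's setting, LLM-assigned labels of this kind then serve as weak supervision for training the low-latency classifier used at inference time.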
Extensive evaluations demonstrated that our approach outperformed the baseline with an average relative gain of 113% in recall. Furthermore, our novel prompt engineering framework yields higher-quality LLM-generated data for weak supervision: we observed a 47.60% improvement over the baseline in the agreement rate between LLM predictions and human annotations, measured by F1 score weighted according to the distribution of occurrences of the search queries. Our persona selection routing mechanism adds a further 3.67% increase in weighted F1 score on top of the prompt engineering framework.
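For the agreement metric, the following sketch shows one way to compute an occurrence-weighted F1 between LLM predictions and human annotations, where each query counts in proportion to how often users actually issued it. The data and label set here are illustrative, not the paper's.

```python
# Sketch of occurrence-weighted F1 agreement between LLM labels and
# human annotations. Rows and labels are illustrative only.
from sklearn.metrics import f1_score

# Hypothetical evaluation set: (human label, LLM label, query frequency).
rows = [
    ("movie_title",  "movie_title",  1200),
    ("actor_name",   "movie_title",   300),
    ("genre",        "genre",         450),
    ("sports_event", "sports_event",   90),
]

human = [h for h, _, _ in rows]
llm   = [l for _, l, _ in rows]
freq  = [f for _, _, f in rows]

# average="weighted" averages per-class F1 by class support; sample_weight
# makes that support reflect query occurrence counts rather than raw rows.
agreement = f1_score(human, llm, average="weighted", sample_weight=freq)
print(f"occurrence-weighted F1 agreement: {agreement:.4f}")
```

The 47.60% and 3.67% gains reported above are improvements on this style of frequency-weighted agreement measure.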