"Just play something awesome": the personalization powering voice interactions at Pandora

Proceedings of the 13th ACM Conference on Recommender Systems. Published 2019-09-10. DOI: 10.1145/3298689.3347064

V. Ostuni


Citations: 2

Abstract



The adoption of voice-enabled devices has grown explosively in the last few years, and music consumption is among the most popular use cases. At Pandora, music personalization and recommendation play a major role in providing a delightful listening experience to millions of users every day. Providing the same perfectly tailored listening experience through these novel voice interfaces brings interesting new challenges and exciting opportunities. In this talk we will describe how we apply personalization and recommendation techniques in three common voice scenarios, defined by request type: known-item, thematic, and broad open-ended.

Known-item search requests are the most common scenario: the user has a clear, well-defined intent, namely finding a specific item in the catalog or in their personal collection. A voice interface makes this task natural and easy, since the user does not have to type on a small keyboard. Solving it requires performing an entity search against a large music catalog and the user's personal collection. This can be very challenging because of imperfect transcriptions of voice utterances, unconventional entity names, and the many ways a user can phrase a request for a music entity. We employ personalization algorithms for entity disambiguation, which is needed because the catalog contains homonyms, homographs, and homophones.

Another common voice use case is asking for music around a specific theme or context, such as a genre, an activity, a mood, an occasion, or any combination of these. This scenario differs sharply from the known-item case: depending on the user's context, multiple results may be relevant rather than a single, clearly relevant one. For example, a rap fan asking for "music for working out" would not enjoy a country workout playlist but may like a hip-hop one.
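The workout example can be made concrete with a minimal re-ranking sketch. It assumes, hypothetically, that every candidate playlist already has a theme-relevance score (for instance from content tagging) and a single genre label, and that the user profile is a map from genre to affinity in [0, 1]. The `Candidate` type, the `rerank` function, and the blend weight `alpha` are illustrative names, and a linear blend is just one generic way to trade query relevance against personalization, not the method used at Pandora.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    """A hypothetical candidate playlist returned for a thematic query."""
    name: str
    genre: str
    theme_relevance: float  # assumed in [0, 1], e.g. from content tagging


def rerank(candidates: List[Candidate], genre_affinity: Dict[str, float],
           alpha: float = 0.5) -> List[Candidate]:
    """Rank candidates by a linear blend of query relevance and user affinity.

    alpha = 1.0 ranks by theme relevance only (no personalization);
    alpha = 0.0 ranks by the user's genre affinity only (ignoring the query).
    Genres absent from the profile get an affinity of 0.
    """
    def score(c: Candidate) -> float:
        affinity = genre_affinity.get(c.genre, 0.0)
        return alpha * c.theme_relevance + (1.0 - alpha) * affinity

    return sorted(candidates, key=score, reverse=True)


# "music for working out": every candidate matches the workout theme.
workout = [
    Candidate("Country Workout", "country", 0.90),
    Candidate("Hip Hop Workout", "hip hop", 0.85),
    Candidate("Rock Workout", "rock", 0.80),
]
rap_fan = {"hip hop": 0.9, "rap": 0.95, "country": 0.05}  # hypothetical profile

print([c.name for c in rerank(workout, rap_fan)])
# → ['Hip Hop Workout', 'Country Workout', 'Rock Workout']
```

With `alpha=1.0` the profile is ignored and the country playlist comes first because it matches the theme best; at the default `alpha=0.5`, this user's affinity lifts the hip-hop playlist to the top. Choosing `alpha` is a simple form of the trade-off between query relevance and personalization, and a similar blended score could also break ties between homophonous catalog entities in known-item search.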
Solving thematic requests can be quite complex, as it spans several areas: spoken language understanding, content tagging, and personalization. We will describe how we use deep-learning slot-filling techniques and query classification to interpret the user's intent and identify the main concepts in the query. We will then discuss some of our content-tagging work to classify music according to these voice-specific themes. Lastly, we will touch on how we use recommendation techniques to deliver personalized, unique results to each individual, and describe the challenge of balancing the delicate trade-off between query relevance and personalization.

The third category of voice queries we will describe is broad, open-ended requests. Voice users often skip the hard work of deciding what they actually want to hear and simply say: "just play something awesome". A music service should still meet these expectations rather than interpret such commands literally. We discuss the exploit/explore trade-offs made when generating the pool of candidate items: the exploit pool contains items aimed at re-consumption, while the explore pool contains new items that match the specific context.

Finally, we will discuss differences and challenges in evaluating voice-powered recommendation systems. The first key difference is that, in standard recommendation settings, evaluation relies on UI signals such as impressions and clicks, or on explicit forms of feedback. Since pure voice interfaces have no visual UI elements, relevance labels must be inferred from implicit actions such as play time, query reformulations, or other session-level information. Another difference is that, while the typical recommendation task is to recommend a ranked list of items, a voice play request results in a single item being played.
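As an illustration of inferring relevance labels without clicks, the following heuristic sketch labels each played item from its play time and from whether the user quickly reformulated the request. The log schema (`VoiceRequest`), the 30-second play threshold, the 20-second reformulation window, and the token-overlap test for detecting a reformulation are all hypothetical assumptions chosen for readability, not values from the talk.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical thresholds; real values would have to be tuned on logged data.
MIN_PLAY_SECONDS = 30.0              # shorter plays are treated as rejections
REFORMULATION_WINDOW_SECONDS = 20.0  # max gap between a request and its retry
MIN_TOKEN_OVERLAP = 0.5              # Jaccard overlap that marks a retry


@dataclass
class VoiceRequest:
    """One voice request in a listening session (hypothetical log schema)."""
    timestamp: float     # seconds since the session started
    query: str           # transcribed utterance
    item_id: str         # the single item that was played in response
    play_seconds: float  # how long it played before a stop, skip or new request


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap between the lowercase token sets of two queries."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def infer_labels(session: List[VoiceRequest]) -> List[int]:
    """Label each played item 1 (relevant) or 0 (not relevant) from implicit signals."""
    labels = []
    for i, req in enumerate(session):
        nxt: Optional[VoiceRequest] = session[i + 1] if i + 1 < len(session) else None
        reformulated = (
            nxt is not None
            and nxt.timestamp - req.timestamp <= REFORMULATION_WINDOW_SECONDS
            and token_overlap(req.query, nxt.query) >= MIN_TOKEN_OVERLAP
        )
        short_play = req.play_seconds < MIN_PLAY_SECONDS
        labels.append(0 if reformulated or short_play else 1)
    return labels


# The user stops the first result quickly and retries with a more specific query.
session = [
    VoiceRequest(0.0, "play hello", "song_a", 5.0),
    VoiceRequest(6.0, "play hello by adele", "song_b", 240.0),
]
print(infer_labels(session))  # → [0, 1]
```

In the example the first item is labeled irrelevant (short play followed by a quick, overlapping retry) and the second relevant. In practice the thresholds would need tuning, and other session-level signals could be added to the rule.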
Because a voice request exposes the user to just one item, closed feedback loops require careful consideration.

In summary, improving the quality of voice interactions in music services is a relatively new challenge, and many exciting opportunities for breakthroughs remain. Many new aspects of recommendation system interfaces must be addressed to bring a delightful, effortless experience to voice users. We will share a few open challenges for the future.
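For open-ended requests, the exploit/explore split and the single-item feedback-loop concern can be illustrated together. The sketch below is a generic epsilon-greedy policy over two candidate pools that plays exactly one item and logs the probability (propensity) with which that item was chosen; logging propensities is what makes an inverse propensity scoring (IPS) estimate of a different policy possible from data the current policy collected. Pool contents, `epsilon`, uniform sampling within a pool, and all function names are hypothetical assumptions, not a description of Pandora's system.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Choice:
    item_id: str
    pool: str          # "exploit" or "explore"
    propensity: float  # probability that the logging policy played this item


def choose_one(exploit_pool: List[str], explore_pool: List[str],
               epsilon: float, rng: random.Random) -> Choice:
    """Play a single item: explore with probability epsilon, otherwise exploit.

    Items are sampled uniformly within a pool, so an item's propensity is the
    pool's probability divided by the pool size. Assumes the pools are disjoint.
    """
    if not exploit_pool and not explore_pool:
        raise ValueError("both pools are empty")
    if not explore_pool:
        epsilon = 0.0  # nothing to explore
    elif not exploit_pool:
        epsilon = 1.0  # nothing to re-consume (e.g. a brand-new user)
    if rng.random() < epsilon:
        return Choice(rng.choice(explore_pool), "explore",
                      epsilon / len(explore_pool))
    return Choice(rng.choice(exploit_pool), "exploit",
                  (1.0 - epsilon) / len(exploit_pool))


def ips_estimate(logged: List[Choice], rewards: List[float],
                 target_prob: Callable[[str], float]) -> float:
    """IPS estimate of the mean reward a target policy would have obtained."""
    weighted = [r * target_prob(c.item_id) / c.propensity
                for c, r in zip(logged, rewards)]
    return sum(weighted) / len(weighted)


# Usage with hypothetical pools for "just play something awesome".
rng = random.Random(7)
exploit = ["fav_1", "fav_2", "fav_3"]  # items aimed at re-consumption
explore = ["new_1", "new_2"]           # new items matching the context
logged = [choose_one(exploit, explore, epsilon=0.2, rng=rng) for _ in range(1000)]


def same_policy(item_id: str) -> float:
    """Target policy identical to the logging policy (epsilon = 0.2)."""
    if item_id in explore:
        return 0.2 / len(explore)
    return (1.0 - 0.2) / len(exploit)


rewards = [1.0] * len(logged)  # stand-in for implicit relevance labels
print(ips_estimate(logged, rewards, same_policy))  # sanity check: 1.0
```

Because the target policy here equals the logging policy and every reward is 1, the estimate is 1.0; evaluating a different `epsilon` only requires a different `target_prob`. IPS is unbiased only when every item the target policy might play had a nonzero logging propensity, which is one reason to keep some exploration even when a single item is played.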

