{"title":"\"Just play something awesome\": the personalization powering voice interactions at Pandora","authors":"V. Ostuni","doi":"10.1145/3298689.3347064","DOIUrl":null,"url":null,"abstract":"The adoption of voice-enabled devices has seen an explosive growth in the last few years and music consumption is among the most popular use cases. Music personalization and recommendation plays a major role at Pandora in providing a delightful listening experience for millions of users daily. In turn, providing the same perfectly tailored listening experience through these novel voice interfaces brings new interesting challenges and exciting opportunities. In this talk we will describe how we apply personalization and recommendation techniques in three common voice scenarios which can be defined in terms of request types: known-item, thematic and broad open-ended. Known-item search requests are the most common scenario where users have a well defined and clear intent which is looking for a specific item in the catalog or their personal collection. A voice interface makes the task natural and easy to accomplish since the user is not required to type on a small keyboard. Solving for this specific task involves performing an entity search against a large music catalog and personal user collection. This can be very challenging due to imperfect voice utterance transcriptions, unconventional entity names and the numerous combinations of ways a user can ask for music entities. We employ personalization algorithms for entity disambiguation which can be caused by the presence of homonyms, homographs and homophones terms in the catalog. Another common voice use case is to ask for music regarding a specific theme or context such as a genre, an activity, a mood, an occasion or any combination of those. This scenario differs sharply from the known-item case in that multiple results might, based on user varying contexts, be relevant rather than a single clearly relevant one. For example, a rap music fan would not enjoy a country workout playlist when asking for \"music for working out\" but may like a hip hop one. This problem can be quite complex to solve as it involves different areas such as voice spoken language understanding, content tagging and personalization. We will describe how we use deep learning slot filling techniques and query classification to interpret the user intent and identify the main concepts in the query. After that, we will discuss some of the content tagging work we have done to classify music according to these voice specific themes. Lastly, we will touch upon how we use recommendation techniques to deliver personalized and unique results to each individual and describe the challenge of balancing the delicate trade-off between query relevance and personalization. The third category of voice queries we will describe are broad or open-ended requests. Voice users often skip the hard work of thinking about what they actually want to hear and command: \"just play something awesome\". A music service should still meet these expectations instead of interpreting those commands as literal requests. We discuss exploit and explore trade-offs made in the recommendation item pool generation process. Here the exploit pool contains items aimed at re-consumption, while the explore pool contains new items with specific context match. Finally, we will discuss differences and challenges regarding evaluation of voice powered recommendation systems. 
The first key difference is that in the standard recommendation system settings evaluations are based on UI signals such as impressions and clicks or other explicit forms of feedback. Since pure voice interfaces do not contain visual UI elements, relevance labels need to be inferred through implicit actions such as play time, query reformulations or other types of session level information. Another difference is that while the typical recommendation task corresponds to recommending a ranked list of items, a voice play request translates into a single item play action. Thus, some considerations about closed feedback loops need to be made. In summary, improving the quality of voice interactions in music services is a relatively new challenge and many exciting opportunities for breakthroughs still remain. There are many new aspects of recommendation system interfaces to address to bring a delightful and effortless experience for voice users. We will share a few open challenges to solve for the future.","PeriodicalId":215384,"journal":{"name":"Proceedings of the 13th ACM Conference on Recommender Systems","volume":"84 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 13th ACM Conference on Recommender Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3298689.3347064","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 2
Abstract
The adoption of voice-enabled devices has seen explosive growth in the last few years, and music consumption is among the most popular use cases. Music personalization and recommendation play a major role at Pandora in providing a delightful listening experience for millions of users daily. In turn, providing the same perfectly tailored listening experience through these novel voice interfaces brings interesting new challenges and exciting opportunities. In this talk we will describe how we apply personalization and recommendation techniques in three common voice scenarios, defined in terms of request type: known-item, thematic and broad open-ended.

Known-item search requests are the most common scenario: users have a well-defined, clear intent, namely finding a specific item in the catalog or in their personal collection. A voice interface makes this task natural and easy to accomplish since the user is not required to type on a small keyboard. Solving it involves performing an entity search against a large music catalog and the user's personal collection. This can be very challenging due to imperfect voice utterance transcriptions, unconventional entity names and the numerous ways a user can ask for music entities. We employ personalization algorithms for entity disambiguation, which is needed because the catalog contains homonyms, homographs and homophones.

Another common voice use case is to ask for music around a specific theme or context, such as a genre, an activity, a mood, an occasion or any combination of those. This scenario differs sharply from the known-item case in that multiple results may be relevant, depending on the user's context, rather than a single clearly relevant one. For example, a rap music fan asking for "music for working out" would not enjoy a country workout playlist but may like a hip hop one. This problem can be quite complex to solve, as it spans several areas: spoken language understanding, content tagging and personalization. We will describe how we use deep learning slot-filling techniques and query classification to interpret the user's intent and identify the main concepts in the query. After that, we will discuss some of the content tagging work we have done to classify music according to these voice-specific themes. Lastly, we will touch upon how we use recommendation techniques to deliver personalized, unique results to each individual, and describe the challenge of balancing the delicate trade-off between query relevance and personalization.

The third category of voice queries we will describe are broad or open-ended requests. Voice users often skip the hard work of thinking about what they actually want to hear and simply command: "just play something awesome". A music service should still meet these expectations rather than interpreting such commands as literal requests. We discuss the exploit/explore trade-offs made when generating the recommendation item pool: the exploit pool contains items aimed at re-consumption, while the explore pool contains new items matching the specific context.
Finally, we will discuss differences and challenges in evaluating voice-powered recommendation systems. The first key difference is that in standard recommendation settings, evaluation is based on UI signals such as impressions and clicks, or on other explicit forms of feedback. Since pure voice interfaces have no visual UI elements, relevance labels need to be inferred from implicit signals such as play time, query reformulations or other session-level information. Another difference is that while the typical recommendation task is to recommend a ranked list of items, a voice play request translates into a single item play action, so some care is needed around closed feedback loops.

In summary, improving the quality of voice interactions in music services is a relatively new challenge, and many exciting opportunities for breakthroughs remain. There are many new aspects of recommendation system interfaces to address in order to bring a delightful and effortless experience to voice users. We will share a few open challenges for the future.
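To make the implicit-labeling idea above concrete, here is a minimal, purely illustrative sketch of turning session signals into relevance labels for offline evaluation. The field names and the 30-second play threshold are assumptions for the example, not values from the talk.

```python
def infer_relevance_label(session, min_play_seconds=30):
    """Heuristically infer a binary relevance label for a single voice play action
    from implicit session signals, since there are no impressions or clicks.

    session: dict with keys such as
        "play_seconds"  - how long the returned item was actually played
        "skipped"       - whether the user skipped the item almost immediately
        "reformulated"  - whether the user re-issued a changed query right after
    """
    if session.get("reformulated") or session.get("skipped"):
        return 0      # the user corrected us: treat the result as non-relevant
    if session.get("play_seconds", 0) >= min_play_seconds:
        return 1      # sustained listening: treat the result as relevant
    return None       # too ambiguous to label; exclude from evaluation


# Toy usage over a few logged voice sessions.
sessions = [
    {"play_seconds": 210, "skipped": False, "reformulated": False},
    {"play_seconds": 4,   "skipped": True,  "reformulated": False},
    {"play_seconds": 12,  "skipped": False, "reformulated": True},
]
print([infer_relevance_label(s) for s in sessions])  # [1, 0, 0]
```

Because each voice request yields a single item play rather than a ranked list, labels like these feed back into training data only for the one item that was played, which is exactly the closed-feedback-loop concern the abstract raises.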