The New TREC Track on Podcast Search and Summarization
R. Jones. Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. Published 2020-07-25. DOI: 10.1145/3397271.3402431 (https://doi.org/10.1145/3397271.3402431)
Podcasts are exploding in popularity. As this medium grows, it becomes increasingly important to understand the content of podcasts (e.g. what exactly is being covered, by whom, and how?), and how we can use this to connect users to shows that align with their interests. Given the explosion of new material, how do listeners find the needle in the haystack, and connect to those shows or episodes that speak to them? Furthermore, once they are presented with potential podcasts to listen to, how can they decide if this is what they want? To move the needle forward more rapidly toward this goal, we've introduced the Spotify Podcasts Dataset [1] and TREC shared task [2]. This dataset represents the first large-scale set of podcasts, with transcripts, released to the research community. The accompanying shared task is part of the TREC 2020 Conference, run by the US National Institute of Standards and Technology. The challenge is planned to run for several years, with progressively more demanding tasks: this first year, the challenge involves a search-related task and a task to automatically generate summaries, both based on transcripts of the audio. In this talk I will describe the task and dataset, outlining how the dataset is orders of magnitude larger than previous spoken document datasets, and how the tasks take us beyond previous shared tasks both in spoken document retrieval and NLP.