The New TREC Track on Podcast Search and Summarization

Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval Pub Date : 2020-07-25 DOI:10.1145/3397271.3402431

R. Jones

{"title":"The New TREC Track on Podcast Search and Summarization","authors":"R. Jones","doi":"10.1145/3397271.3402431","DOIUrl":null,"url":null,"abstract":"Podcasts are exploding in popularity. As this medium grows, it becomes increasingly important to understand the content of podcasts (e.g. what exactly is being covered, by whom, and how?), and how we can use this to connect users to shows that align with their interests. Given the explosion of new material, how do listeners find the needle in the haystack, and connect to those shows or episodes that speak to them? Furthermore, once they are presented with potential podcasts to listen to, how can they decide if this is what they want? To move the needle forward more rapidly toward this goal, we've introduced the Spotify Podcasts Dataset [1] and TREC shared task [2]. This dataset represents the first large-scale set of podcasts, with transcripts, released to the research community. The accompanying shared task is part of the TREC 2020 Conference, run by the US National Institute of Standards and Technology. The challenge is planned to run for several years, with progressively more demanding tasks: this first year, the challenge involves a search-related task and a task to automatically generate summaries, both based on transcripts of the audio. In this talk I will describe the task and dataset, outlining how the dataset is orders of magnitude larger than previous spoken document datasets, and how the tasks take us beyond previous shared tasks both in spoken document retrieval and NLP.","PeriodicalId":252050,"journal":{"name":"Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"24 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3397271.3402431","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 2

Abstract

Podcasts are exploding in popularity. As this medium grows, it becomes increasingly important to understand the content of podcasts (e.g. what exactly is being covered, by whom, and how?), and how we can use this to connect users to shows that align with their interests. Given the explosion of new material, how do listeners find the needle in the haystack, and connect to those shows or episodes that speak to them? Furthermore, once they are presented with potential podcasts to listen to, how can they decide if this is what they want? To move the needle forward more rapidly toward this goal, we've introduced the Spotify Podcasts Dataset [1] and TREC shared task [2]. This dataset represents the first large-scale set of podcasts, with transcripts, released to the research community. The accompanying shared task is part of the TREC 2020 Conference, run by the US National Institute of Standards and Technology. The challenge is planned to run for several years, with progressively more demanding tasks: this first year, the challenge involves a search-related task and a task to automatically generate summaries, both based on transcripts of the audio. In this talk I will describe the task and dataset, outlining how the dataset is orders of magnitude larger than previous spoken document datasets, and how the tasks take us beyond previous shared tasks both in spoken document retrieval and NLP.

查看原文本刊更多论文

播客搜索和总结的新TREC轨道

播客正迅速流行起来。随着这种媒体的发展，理解播客的内容变得越来越重要(例如，到底是什么内容，由谁来覆盖，以及如何覆盖?)，以及我们如何利用这些内容将用户与符合他们兴趣的节目联系起来。考虑到新材料的爆炸式增长，听众如何在大海捞针中找到针，并与那些与他们说话的节目或剧集联系起来?此外，一旦有潜在的播客供他们收听，他们如何决定这是否是他们想要的?为了更快地实现这一目标，我们引入了Spotify Podcasts Dataset[1]和TREC共享任务[2]。这个数据集代表了第一个大规模的播客集，并有转录本，发布给研究社区。伴随的共同任务是TREC 2020会议的一部分，由美国国家标准与技术研究所主办。这项挑战计划持续数年，任务要求越来越高:第一年的挑战包括搜索相关任务和自动生成摘要的任务，两者都是基于音频的文本。在这次演讲中，我将描述任务和数据集，概述数据集如何比以前的语音文档数据集大几个数量级，以及任务如何使我们超越以前在语音文档检索和NLP中的共享任务。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval

自引率

0.00%

发文量