The New TREC Track on Podcast Search and Summarization

R. Jones
Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. Published 2020-07-25. DOI: 10.1145/3397271.3402431 (https://doi.org/10.1145/3397271.3402431)
Citations: 2

Abstract

Podcasts are exploding in popularity. As this medium grows, it becomes increasingly important to understand the content of podcasts (e.g., what exactly is being covered, by whom, and how), and how we can use this understanding to connect users to shows that align with their interests. Given the explosion of new material, how do listeners find the needle in the haystack and connect to the shows or episodes that speak to them? Furthermore, once they are presented with potential podcasts to listen to, how can they decide whether a given podcast is what they want? To move more rapidly toward this goal, we have introduced the Spotify Podcasts Dataset [1] and a TREC shared task [2]. This dataset is the first large-scale set of podcasts, with transcripts, released to the research community. The accompanying shared task is part of the TREC 2020 Conference, run by the US National Institute of Standards and Technology. The challenge is planned to run for several years, with progressively more demanding tasks. In this first year, it involves a search-related task and an automatic summarization task, both based on transcripts of the audio. In this talk I will describe the tasks and dataset, outlining how the dataset is orders of magnitude larger than previous spoken document datasets, and how the tasks take us beyond previous shared tasks in both spoken document retrieval and NLP.