Overview of RCD-2020, the FIRE-2020 track on Retrieval from Conversational Dialogues

Debasis Ganguly, Dipasree Pal, Manisha Verma, Procheta Sen
DOI: 10.1145/3441501.3441518
Published in: Proceedings of the 12th Annual Meeting of the Forum for Information Retrieval Evaluation (FIRE 2020), 2020-12-16
Citations: 6

Abstract

This paper presents an overview of the 'Retrieval from Conversational Dialogues' (RCD) track organized as part of the Forum for Information Retrieval Evaluation (FIRE) 2020. The motivation of the track is to develop a dataset supporting a controlled, reproducible, laboratory-based experimental setup for investigating the effectiveness of conversational assistance systems. Specifically, the form of conversational assistance this track addresses is the contextualization of concepts that appear in content written (e.g., in a chat system) or uttered (e.g., in an audio or video communication) by one user but with which the other participants in the communication are not well versed. To study the problem in a laboratory-based reproducible setting, we took a collection of four movie scripts and manually annotated spans of text that may require contextualization. The RCD track comprises two tasks: a) Task-1, in which participants estimate, from a given sequence of dialogue-based interactions in a script, the annotated text spans likely to benefit from contextualization; and b) Task-2, which involves retrieving a ranked list of documents corresponding to the concepts requiring contextualization. To evaluate effectiveness on Task-1, we used i) a character n-gram based variant of the BLEU score, and ii) a bag-of-words based Jaccard coefficient, measuring the overlap between the manually annotated ground truth and the automatically extracted text spans at the character and word granularity levels, respectively. To evaluate the effectiveness of the documents retrieved for Task-2, we employed two standard precision-oriented information retrieval (IR) metrics, namely precision at top-5 ranks (P@5) and mean reciprocal rank (MRR), along with a metric oriented towards both precision and recall, namely mean average precision (MAP).
We received a total of 5 submissions from a single participating team across both tasks. A general trend from the submitted runs is that statistics-based unsupervised approaches to term extraction and summarization from movie scripts proved more effective for both tasks (i.e., query identification and retrieval) than supervised approaches, such as those based on pre-trained transformers (BERT).
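The Task-1 span-overlap measures can be sketched in a few lines of Python. This is an illustrative simplification, not the track's exact scoring code: the precision-style character n-gram function below stands in for the character n-gram BLEU variant, whose precise formulation is not given in this abstract, while the word-level Jaccard coefficient is the standard set-based definition.

```python
def char_ngrams(text, n=3):
    """Return the set of character n-grams of a string."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def char_ngram_overlap(candidate, reference, n=3):
    """Precision-style character n-gram overlap between an extracted
    span and a ground-truth span (a stand-in for the BLEU variant)."""
    cand, ref = char_ngrams(candidate, n), char_ngrams(reference, n)
    if not cand:
        return 0.0
    return len(cand & ref) / len(cand)

def word_jaccard(candidate, reference):
    """Bag-of-words Jaccard coefficient between two text spans."""
    a, b = set(candidate.split()), set(reference.split())
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

The two functions capture the two granularities mentioned above: character-level matching is forgiving of tokenization and morphological differences, while word-level Jaccard rewards exact term overlap.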
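The Task-2 retrieval metrics named above (P@5, MRR, MAP) are standard and can be sketched as follows; this is a minimal per-query illustration, with MRR and MAP obtained in practice by averaging the reciprocal-rank and average-precision values over all queries (as done by standard tools such as trec_eval).

```python
def precision_at_k(ranked, relevant, k=5):
    """P@k: fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k

def reciprocal_rank(ranked, relevant):
    """1/rank of the first relevant document, or 0 if none is retrieved."""
    for i, d in enumerate(ranked, start=1):
        if d in relevant:
            return 1.0 / i
    return 0.0

def average_precision(ranked, relevant):
    """AP: mean of the precision values at the ranks where relevant
    documents occur, normalized by the number of relevant documents."""
    hits, score = 0, 0.0
    for i, d in enumerate(ranked, start=1):
        if d in relevant:
            hits += 1
            score += hits / i
    return score / len(relevant) if relevant else 0.0
```

For example, for the ranking ["d1", "d2", "d3", "d4", "d5"] with relevant set {"d2", "d4"}, P@5 is 0.4, the reciprocal rank is 0.5, and the average precision is (1/2 + 2/4) / 2 = 0.5.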