The "Naturalistic Free Recall" dataset: four stories, hundreds of participants, and high-fidelity transcriptions.

IF 6.9, Zone 2 (Multidisciplinary Journals), JCR Q1 MULTIDISCIPLINARY SCIENCES

Scientific Data, Pub Date: 2024-12-03, DOI: 10.1038/s41597-024-04082-6

Omri Raccah, Phoebe Chen, Todd M Gureckis, David Poeppel, Vy A Vo

Scientific Data, volume 11, issue 1, article 1317. Publication type: Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11615391/pdf/

Citations: 0

Abstract

The "Naturalistic Free Recall" dataset provides transcribed verbal recollections of four spoken narratives collected from 229 participants. Each participant listened to two stories, varying in duration from approximately 8 to 13 minutes, recorded by different speakers. Subsequently, participants were tasked with verbally recalling the narrative content in as much detail as possible and in the correct order. The dataset includes high-fidelity, time-stamped text transcripts of both the original narratives and participants' recollections. To validate the dataset, we apply a previously published automated method to score memory performance for narrative content. Using this approach, we extend effects traditionally observed in classic list-learning paradigms. The analysis of narrative content and its verbal recollection presents unique challenges compared to controlled list-learning experiments. To facilitate the use of these rich data by the community, we offer an overview of recent computational methods that can be used to annotate and evaluate key properties of narratives and their recollections. Using advancements in machine learning and natural language processing, these methods can help the community understand the role of event structure, discourse properties, prediction error, high-level semantic features (e.g., idioms, humor), and more. All experimental materials, code, and data are publicly available to facilitate new advances in understanding human memory.


Source journal

Scientific Data Social Sciences-Education

CiteScore: 11.20

Self-citation rate: 4.10%

Publication volume: 689

Review time: 16 weeks

About the journal: Scientific Data is an open-access journal focused on data, publishing descriptions of research datasets and articles on data sharing across the natural sciences, medicine, engineering, and social sciences. Its goal is to enhance the sharing and reuse of scientific data, encourage broader data sharing, and acknowledge those who share their data. The journal primarily publishes Data Descriptors, which offer detailed descriptions of research datasets, including data collection methods and technical analyses validating data quality. These descriptors aim to facilitate data reuse rather than to test hypotheses or present new interpretations, methods, or in-depth analyses.