Modeling memories, predicting prospections: Automated scoring of autobiographical detail narration using large language models.

IF 3.9 · JCR Q1 (Psychology, Experimental) · CAS Region 2 (Psychology)
Jonas Klus, Daniel E Cohen, Alexis N Garcia, Sarah Hennessy, Matthias R Mehl, Jessica R Andrews-Hanna, Matthew D Grilli
Behavior Research Methods, 57(9), 245. Published 2025-08-01.
DOI: 10.3758/s13428-025-02767-3
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12321244/pdf/
Citations: 0

Abstract

The autobiographical interview is a widely used tool for examining memory and related cognitive functions. It provides a standardized framework to differentiate between internal details, representing the episodic features of specific events, and external details, including semantic knowledge and other non-episodic information. This study introduces an automated scoring model for autobiographical memory and future thinking tasks, using large language models (LLMs) that can analyze personal event narratives without preprocessing. Building on the traditional autobiographical interview protocol, we fine-tuned a LLaMA-3 model to identify internal and external details at a narrative level. The model was trained and tested on narratives from 284 participants across three studies, spanning past and future thinking tasks and multiple age groups, and collected in both in-lab and virtual interviews. Results demonstrate strong correlations with human scores of up to r = 0.87 on internal and up to r = 0.84 on external details, indicating the model aligns as closely with human raters as they do with each other. Additionally, as evidence of the algorithm's construct validity, the model replicated known age-related trends wherein cognitively normal older adults generate fewer internal and more external details than younger adults across three datasets, finding this age group difference even in one dataset where human raters did not. This automated approach offers a scalable alternative to manual scoring, making large-scale studies of human autobiographical memory more feasible. To facilitate access for researchers, we created a Jupyter Notebook with the automated model and instructions for applying it to new narratives.
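The validation metric reported above — Pearson correlation between the model's detail counts and human raters' counts — can be sketched in plain Python. This is an illustrative sketch only: the score lists below are hypothetical placeholders, not data from the study, and the paper does not specify its exact computation pipeline.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical internal-detail counts for five narratives:
model_scores = [10, 12, 8, 15, 9]   # counts from an automated scorer
human_scores = [11, 13, 7, 14, 10]  # counts from a trained human rater

r = pearson_r(model_scores, human_scores)
```

In the study, agreement of this kind reached up to r = 0.87 for internal details, comparable to agreement between pairs of human raters.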

Journal metrics: CiteScore 10.30 · Self-citation rate 9.30% · Articles published: 266
About the journal: Behavior Research Methods publishes articles concerned with the methods, techniques, and instrumentation of research in experimental psychology. The journal focuses particularly on the use of computer technology in psychological research. An annual special issue is devoted to this field.