Can Language Models Trained on Written Monologue Learn to Predict Spoken Dialogue?
Muhammad Umair, Julia B. Mertens, Lena Warnke, Jan P. de Ruiter
Cognitive Science, 48(11), 2024. DOI: 10.1111/cogs.70013
https://onlinelibrary.wiley.com/doi/10.1111/cogs.70013
Citations: 0
Abstract
Transformer-based Large Language Models (LLMs) have recently increased in popularity, in part due to their impressive performance on a number of language tasks. While LLMs can produce human-like writing, the extent to which these models can learn to predict spoken language in natural interaction remains unclear. This is a nontrivial question, as spoken and written language differ in syntax, pragmatics, and norms that interlocutors follow. Previous work suggests that while LLMs may develop an understanding of linguistic rules based on statistical regularities, they fail to acquire the knowledge required for language use. This implies that LLMs may not learn the normative structure underlying interactive spoken language, but may instead only model superficial regularities in speech. In this paper, we aim to evaluate LLMs as models of spoken dialogue. Specifically, we investigate whether LLMs can learn that the identity of a speaker in spoken dialogue influences what is likely to be said. To answer this question, we first fine-tuned two variants of a specific LLM (GPT-2) on transcripts of natural spoken dialogue in English. Then, we used these models to compute surprisal values for two-turn sequences with the same first-turn but different second-turn speakers and compared the output to human behavioral data. While the predictability of words in all fine-tuned models was influenced by speaker identity information, the models did not replicate humans' use of this information. Our findings suggest that although LLMs may learn to generate text conforming to normative linguistic structure, they do not (yet) faithfully replicate human behavior in natural conversation.
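To make the surprisal-based method concrete: the surprisal of a word is its negative log probability given the preceding context, surprisal(w_t) = -log2 P(w_t | w_1, ..., w_{t-1}). Below is a minimal, illustrative sketch (not the authors' code) of how such values can be obtained from a GPT-2 model with Hugging Face Transformers. The checkpoint name, the speaker labels ("A:", "B:"), and the example turns are all hypothetical placeholders; the paper's actual fine-tuning data, speaker encoding, and stimuli differ.

```python
# Hedged sketch: per-token surprisal of a second turn given a first turn,
# using a (possibly fine-tuned) GPT-2 checkpoint. All specifics are assumptions.
import math

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")  # or a dialogue fine-tuned checkpoint
model.eval()


def mean_surprisal(context: str, continuation: str) -> float:
    """Mean surprisal (in bits) of `continuation` given `context`."""
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    cont_ids = tokenizer(continuation, return_tensors="pt").input_ids
    input_ids = torch.cat([ctx_ids, cont_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    log_probs = torch.log_softmax(logits, dim=-1)
    # Logits at position i predict the token at position i + 1, so the
    # continuation tokens (positions C .. T-1) are scored by logits C-1 .. T-2.
    target_positions = range(ctx_ids.shape[1] - 1, input_ids.shape[1] - 1)
    token_log_probs = [
        log_probs[0, pos, input_ids[0, pos + 1]].item() for pos in target_positions
    ]
    # Convert natural-log probabilities to bits and negate to get surprisal.
    return -sum(token_log_probs) / (cont_ids.shape[1] * math.log(2))


# Same first turn, different (hypothetical) second-turn speakers.
first_turn = "A: Are you coming to the party tonight?"
print(mean_surprisal(first_turn, " B: Yes, I'll be there."))  # other-speaker reply
print(mean_surprisal(first_turn, " A: Yes, I'll be there."))  # same-speaker continuation
```

Comparing the two calls illustrates the paper's manipulation: the second turn is held constant while the speaker label changes, so any difference in surprisal reflects the model's sensitivity to speaker identity.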
Journal introduction:
Cognitive Science publishes articles in all areas of cognitive science, covering such topics as knowledge representation, inference, memory processes, learning, problem solving, planning, perception, natural language understanding, connectionism, brain theory, motor control, intentional systems, and other areas of interdisciplinary concern. Highest priority is given to research reports that are specifically written for a multidisciplinary audience. The audience is primarily researchers in cognitive science and its associated fields, including anthropologists, education researchers, psychologists, philosophers, linguists, computer scientists, neuroscientists, and roboticists.