Can Large Language Models Simulate Spoken Human Conversations?

IF 2.4 | Zone 2, Psychology | JCR Q2: PSYCHOLOGY, EXPERIMENTAL

Cognitive Science, Volume 49, Issue 9. Pub Date: 2025-09-01. DOI: 10.1111/cogs.70106

Eric Mayor, Lucas M. Bietti, Adrian Bangerter

Citations: 0

Abstract

Large language models (LLMs) can emulate many aspects of human cognition and have been heralded as a potential paradigm shift. They are proficient in chat-based conversation, but little is known about their ability to simulate spoken conversation. We investigated whether LLMs can simulate spoken human conversation. In Study 1, we compared transcripts of human telephone conversations from the Switchboard (SB) corpus to six corpora of transcripts generated by two powerful LLMs, GPT-4 and Claude Sonnet 3.5, and two open-source LLMs, Vicuna and Wayfarer, using different prompts designed to mimic SB participants’ instructions. We compared LLM and SB conversations in terms of alignment (conceptual, syntactic, and lexical), coordination markers, and coordination of openings and closings. We also documented qualitative features by which LLM conversations differ from SB conversations. In Study 2, we assessed whether humans can distinguish transcripts produced by LLMs from those of SB conversations. LLM conversations exhibited exaggerated alignment (and an increase in alignment as conversation unfolded) relative to human conversations, different and often inappropriate use of coordination markers, and were dissimilar to human conversations in openings and closings. LLM conversations did not consistently pass for SB conversations. Spoken conversations generated by LLMs are both qualitatively and quantitatively different from those of humans. This issue may evolve with better LLMs and more training on spoken conversation, but may also result from key differences between spoken conversation and chat.

Source journal

Cognitive Science (PSYCHOLOGY, EXPERIMENTAL)

CiteScore: 4.10

Self-citation rate: 8.00%

Articles published: 139

About the journal: Cognitive Science publishes articles in all areas of cognitive science, covering such topics as knowledge representation, inference, memory processes, learning, problem solving, planning, perception, natural language understanding, connectionism, brain theory, motor control, intentional systems, and other areas of interdisciplinary concern. Highest priority is given to research reports that are specifically written for a multidisciplinary audience. The audience is primarily researchers in cognitive science and its associated fields, including anthropologists, education researchers, psychologists, philosophers, linguists, computer scientists, neuroscientists, and roboticists.