{"title":"大型语言模型能模拟人类口语对话吗?","authors":"Eric Mayor, Lucas M. Bietti, Adrian Bangerter","doi":"10.1111/cogs.70106","DOIUrl":null,"url":null,"abstract":"<p>Large language models (LLMs) can emulate many aspects of human cognition and have been heralded as a potential paradigm shift. They are proficient in chat-based conversation, but little is known about their ability to simulate spoken conversation. We investigated whether LLMs can simulate spoken human conversation. In Study 1, we compared transcripts of human telephone conversations from the Switchboard (SB) corpus to six corpora of transcripts generated by two powerful LLMs, GPT-4 and Claude Sonnet 3.5, and two open-source LLMs, Vicuna and Wayfarer, using different prompts designed to mimic SB participants’ instructions. We compared LLM and SB conversations in terms of alignment (conceptual, syntactic, and lexical), coordination markers, and coordination of openings and closings. We also documented qualitative features by which LLM conversations differ from SB conversations. In Study 2, we assessed whether humans can distinguish transcripts produced by LLMs from those of SB conversations. LLM conversations exhibited exaggerated alignment (and an increase in alignment as conversation unfolded) relative to human conversations, different and often inappropriate use of coordination markers, and were dissimilar to human conversations in openings and closings. LLM conversations did not consistently pass for SB conversations. Spoken conversations generated by LLMs are both qualitatively and quantitatively different from those of humans. This issue may evolve with better LLMs and more training on spoken conversation, but may also result from key differences between spoken conversation and chat.</p>","PeriodicalId":48349,"journal":{"name":"Cognitive Science","volume":"49 9","pages":""},"PeriodicalIF":2.4000,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/cogs.70106","citationCount":"0","resultStr":"{\"title\":\"Can Large Language Models Simulate Spoken Human Conversations?\",\"authors\":\"Eric Mayor, Lucas M. Bietti, Adrian Bangerter\",\"doi\":\"10.1111/cogs.70106\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Large language models (LLMs) can emulate many aspects of human cognition and have been heralded as a potential paradigm shift. They are proficient in chat-based conversation, but little is known about their ability to simulate spoken conversation. We investigated whether LLMs can simulate spoken human conversation. In Study 1, we compared transcripts of human telephone conversations from the Switchboard (SB) corpus to six corpora of transcripts generated by two powerful LLMs, GPT-4 and Claude Sonnet 3.5, and two open-source LLMs, Vicuna and Wayfarer, using different prompts designed to mimic SB participants’ instructions. We compared LLM and SB conversations in terms of alignment (conceptual, syntactic, and lexical), coordination markers, and coordination of openings and closings. We also documented qualitative features by which LLM conversations differ from SB conversations. In Study 2, we assessed whether humans can distinguish transcripts produced by LLMs from those of SB conversations. 
LLM conversations exhibited exaggerated alignment (and an increase in alignment as conversation unfolded) relative to human conversations, different and often inappropriate use of coordination markers, and were dissimilar to human conversations in openings and closings. LLM conversations did not consistently pass for SB conversations. Spoken conversations generated by LLMs are both qualitatively and quantitatively different from those of humans. This issue may evolve with better LLMs and more training on spoken conversation, but may also result from key differences between spoken conversation and chat.</p>\",\"PeriodicalId\":48349,\"journal\":{\"name\":\"Cognitive Science\",\"volume\":\"49 9\",\"pages\":\"\"},\"PeriodicalIF\":2.4000,\"publicationDate\":\"2025-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://onlinelibrary.wiley.com/doi/epdf/10.1111/cogs.70106\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Cognitive Science\",\"FirstCategoryId\":\"102\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1111/cogs.70106\",\"RegionNum\":2,\"RegionCategory\":\"心理学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"PSYCHOLOGY, EXPERIMENTAL\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Cognitive Science","FirstCategoryId":"102","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1111/cogs.70106","RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"PSYCHOLOGY, EXPERIMENTAL","Score":null,"Total":0}
Can Large Language Models Simulate Spoken Human Conversations?
Abstract:
Large language models (LLMs) can emulate many aspects of human cognition and have been heralded as a potential paradigm shift. They are proficient in chat-based conversation, but little is known about their ability to simulate spoken conversation. We investigated whether LLMs can simulate spoken human conversation. In Study 1, we compared transcripts of human telephone conversations from the Switchboard (SB) corpus to six corpora of transcripts generated by two powerful LLMs, GPT-4 and Claude Sonnet 3.5, and two open-source LLMs, Vicuna and Wayfarer, using different prompts designed to mimic SB participants’ instructions. We compared LLM and SB conversations in terms of alignment (conceptual, syntactic, and lexical), coordination markers, and coordination of openings and closings. We also documented qualitative features by which LLM conversations differ from SB conversations. In Study 2, we assessed whether humans can distinguish transcripts produced by LLMs from those of SB conversations. Relative to human conversations, LLM conversations exhibited exaggerated alignment (which increased as the conversation unfolded), used coordination markers differently and often inappropriately, and opened and closed in dissimilar ways. LLM conversations did not consistently pass for SB conversations. Spoken conversations generated by LLMs are thus both qualitatively and quantitatively different from those of humans. This gap may narrow with better LLMs and more training on spoken conversation, but it may also reflect key differences between spoken conversation and chat.
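For illustration only, the sketch below shows one simple way turn-by-turn lexical alignment could be quantified, as token overlap between adjacent turns. The overlap measure, the function names, and the example turns are assumptions made for this sketch; they are not the alignment measures used in the study, which also assessed alignment at the conceptual and syntactic levels.

```python
# A minimal sketch of turn-by-turn lexical alignment as token overlap.
# The overlap measure and the example turns are illustrative assumptions,
# not the alignment measures used by Mayor, Bietti, and Bangerter.

import re


def tokens(turn: str) -> set[str]:
    """Return the set of lowercase word tokens in a conversational turn."""
    return set(re.findall(r"[a-z']+", turn.lower()))


def lexical_alignment(prev_turn: str, turn: str) -> float:
    """Fraction of the current turn's tokens that repeat the previous turn's."""
    prev, cur = tokens(prev_turn), tokens(turn)
    return len(prev & cur) / len(cur) if cur else 0.0


def mean_alignment(transcript: list[str]) -> float:
    """Average lexical alignment across all adjacent turn pairs."""
    scores = [lexical_alignment(a, b) for a, b in zip(transcript, transcript[1:])]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Hypothetical two-turn exchanges: in the second, the reply echoes its
    # partner heavily, mimicking the exaggerated alignment reported for LLMs.
    human_like = ["I grew up near Dallas.", "Oh yeah? I have never been to Texas."]
    llm_like = ["I grew up near Dallas.", "Dallas! I grew up near Dallas too."]
    print(f"human-like overlap: {mean_alignment(human_like):.2f}")  # 0.12 (shares only 'i')
    print(f"LLM-like overlap:   {mean_alignment(llm_like):.2f}")    # 0.83
```

A higher mean overlap in the LLM-generated transcript would, under this toy measure, correspond to the exaggerated alignment described in the abstract.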
Journal description:
Cognitive Science publishes articles in all areas of cognitive science, covering such topics as knowledge representation, inference, memory processes, learning, problem solving, planning, perception, natural language understanding, connectionism, brain theory, motor control, intentional systems, and other areas of interdisciplinary concern. Highest priority is given to research reports that are specifically written for a multidisciplinary audience. The audience is primarily researchers in cognitive science and its associated fields, including anthropologists, education researchers, psychologists, philosophers, linguists, computer scientists, neuroscientists, and roboticists.