Wei Liu, Ming Xiang, Nai Ding
Nature Human Behaviour, published 2025-09-10. DOI: 10.1038/s41562-025-02297-0
Active use of latent tree-structured sentence representation in humans and large language models
Understanding how sentences are represented in the human brain, as well as in large language models (LLMs), poses a substantial challenge for cognitive science. Here we develop a one-shot learning task to investigate whether humans and LLMs encode tree-structured constituents within sentences. Participants (total N = 372, native Chinese or English speakers, and bilingual in Chinese and English) and LLMs (for example, ChatGPT) were asked to infer which words should be deleted from a sentence. Both groups tend to delete constituents, instead of non-constituent word strings, following rules specific to Chinese and English, respectively. The results cannot be explained by models that rely only on word properties and word positions. Crucially, based on word strings deleted by either humans or LLMs, the underlying constituency tree structure can be successfully reconstructed. Altogether, these results demonstrate that latent tree-structured sentence representations emerge in both humans and LLMs.
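The reconstruction claim rests on a simple structural fact: if every deleted word string is a constituent, then the spans deleted across trials are pairwise nested or disjoint (a laminar family), and such a family uniquely determines a tree. The sketch below illustrates that idea only; it is not the authors' analysis pipeline, and the span indices and bracketed output format are assumptions made for the example.

```python
def build_tree(words, spans):
    """Reconstruct a nested bracketing from a laminar family of word spans.

    `spans` are (start, end) pairs with `end` exclusive; valid constituent
    spans are pairwise nested or disjoint, so they form a tree. The whole
    sentence is added as the root span. Returns a bracketed string.
    This is an illustrative sketch, not the method used in the paper.
    """
    spans = set(spans) | {(0, len(words))}
    # Outer spans first: sort by start position, then by decreasing length.
    ordered = sorted(spans, key=lambda s: (s[0], s[0] - s[1]))

    root = {"span": ordered[0], "children": []}
    stack = [root]
    for span in ordered[1:]:
        # Unwind the stack to the innermost span that still contains this one.
        while not (stack[-1]["span"][0] <= span[0]
                   and span[1] <= stack[-1]["span"][1]):
            stack.pop()
        node = {"span": span, "children": []}
        stack[-1]["children"].append(node)
        stack.append(node)

    def render(node):
        start, end = node["span"]
        parts, i = [], start
        for child in node["children"]:
            cs, ce = child["span"]
            parts.extend(words[i:cs])    # words not covered by any child span
            parts.append(render(child))  # recurse into the child constituent
            i = ce
        parts.extend(words[i:end])
        return "[" + " ".join(parts) + "]"

    return render(root)


# For example, spans marking "the cat", "sat on the mat" and "on the mat"
# as constituents recover the familiar subject/predicate bracketing:
# build_tree("the cat sat on the mat".split(), {(0, 2), (2, 6), (3, 6)})
# → "[[the cat] [sat [on the mat]]]"
```

Non-constituent deletions would produce overlapping spans, which break the laminar property; that is precisely why constituent-biased deletions carry enough signal to recover the tree.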
Journal introduction:
Nature Human Behaviour publishes research of outstanding significance into any aspect of human behaviour, spanning its psychological, biological and social bases, as well as the origins, development and disorders of behaviour. The journal aims to increase the visibility of research in the field and to enhance its societal reach and impact.