Chipola: A Chinese Podcast Lexical Database for capturing spoken language nuances and predicting behavioral data.
Authors: Ning Zhao, Lei Lei
Journal: Behavior Research Methods, 57(6), 166 (JCR Q1, Psychology, Experimental; impact factor 4.6)
Publication date: 2025-05-08
DOI: https://doi.org/10.3758/s13428-025-02697-0
This study introduces Chipola, a Chinese Podcast Lexical Database derived from a large-scale collection of Chinese podcast transcripts. Because podcasts are spoken, a lexical database built from them can capture the nuances of spoken Chinese. Chipola is based on a corpus of 31.2 million word tokens and 41.7 million character tokens, with a vocabulary of 88,085 unique words and 4,613 unique characters. It also includes lexical variables such as frequency, context diversity, and part-of-speech information. The main findings are as follows. First, Chipola captures features of spoken Chinese, such as the core spoken vocabulary. Second, it outperforms other lexical databases in predicting third-party behavioral data. Third, its rich text-level information enables educators to simulate the Chinese lexical input obtained from daily podcast listening, which provides pedagogical insights into the overall effects of language exposure. In summary, Chipola is an innovative and valuable resource with significant implications and applications in areas such as psychology and language education.
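To make the lexical variables named above more concrete, the following is a minimal sketch of how word frequency and context diversity are commonly computed from a segmented corpus. It is not Chipola's actual pipeline, which the abstract does not describe. The function name `lexical_stats`, the toy episodes, and the choice to define context diversity as the proportion of episodes containing a word are all assumptions made for illustration.

```python
import math
from collections import Counter


def lexical_stats(episodes):
    """Compute simple lexical variables from a list of tokenized episodes.

    `episodes` is a list of token lists, one per transcript, assumed to be
    word-segmented already (hypothetical input format).
    """
    total_tokens = sum(len(tokens) for tokens in episodes)
    freq = Counter()       # raw token counts across the whole corpus
    doc_count = Counter()  # number of episodes in which each word occurs

    for tokens in episodes:
        freq.update(tokens)
        doc_count.update(set(tokens))  # count each word once per episode

    stats = {}
    for word, count in freq.items():
        stats[word] = {
            "count": count,
            # frequency normalized to occurrences per million tokens
            "per_million": count / total_tokens * 1_000_000,
            # log-transformed count, a common predictor in behavioral studies
            "log10_count": math.log10(count + 1),
            # context diversity: share of episodes containing the word
            "cd": doc_count[word] / len(episodes),
        }
    return stats


# Toy corpus of three already-segmented "episodes" (illustrative only).
toy_episodes = [
    ["我们", "今天", "聊", "播客"],
    ["今天", "天气", "很", "好"],
    ["我们", "喜欢", "播客"],
]
stats = lexical_stats(toy_episodes)
print(stats["今天"])
```

In this sketch a word that appears many times in a single episode gets a high frequency but a low context diversity, which is why the two variables are usually reported separately.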
About the journal:
Behavior Research Methods publishes articles concerned with the methods, techniques, and instrumentation of research in experimental psychology. The journal focuses particularly on the use of computer technology in psychological research. An annual special issue is devoted to this field.