Chipola: A Chinese Podcast Lexical Database for capturing spoken language nuances and predicting behavioral data
Authors: Ning Zhao, Lei Lei
Journal: Behavior Research Methods, volume 57, issue 6, article 166 (Journal Article)
Published: 2025-05-08
DOI: 10.3758/s13428-025-02697-0 (https://doi.org/10.3758/s13428-025-02697-0)
Impact factor: 4.6; JCR: Q1 (Psychology, Experimental)
Citation count: 0
Abstract
This study introduces Chipola, a Chinese Podcast Lexical Database derived from a large-scale collection of Chinese podcast transcripts. Because podcasts are spoken, a lexical database built from them can accurately capture the nuances of spoken Chinese. Chipola was developed from a corpus of 31.2 million word tokens and 41.7 million character tokens, with a vocabulary of 88,085 unique words and 4,613 unique characters. It also includes lexical variables such as frequency, context diversity, and part-of-speech information. The main findings are as follows. First, Chipola captures features of spoken Chinese, such as the core spoken vocabulary. Second, it outperforms other lexical databases in predicting third-party behavioral data. Third, its rich text-level information enables educators to simulate the Chinese lexical input received from daily podcast listening, which offers pedagogical insights into the overall effects of language exposure. In summary, Chipola is an innovative and valuable resource with implications and applications in areas such as psychology and language education.
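To illustrate two of the lexical variables named in the abstract, the sketch below computes word frequency (raw and per million tokens) and context diversity, taken here to mean the number of transcripts in which a word occurs. The function name `lexical_stats`, the toy episodes, and the assumption that transcripts are already segmented into words are all illustrative and not drawn from Chipola itself; the database's actual definitions and processing pipeline may differ.

```python
from collections import Counter


def lexical_stats(documents):
    """Compute frequency and context diversity for pre-tokenized transcripts.

    `documents` is a list of transcripts, each a list of word strings.
    Word segmentation is assumed to have been done beforehand.
    """
    token_counts = Counter()  # total occurrences of each word
    doc_counts = Counter()    # number of transcripts containing each word

    for tokens in documents:
        token_counts.update(tokens)
        doc_counts.update(set(tokens))  # count each word once per transcript

    total_tokens = sum(token_counts.values())
    n_docs = len(documents)

    stats = {}
    for word, count in token_counts.items():
        stats[word] = {
            "count": count,
            "per_million": count / total_tokens * 1_000_000,
            "cd": doc_counts[word],
            "cd_percent": doc_counts[word] / n_docs * 100,
        }
    return stats


# Hypothetical toy data: three tiny "episodes", already segmented.
episodes = [
    ["我们", "今天", "聊", "播客"],
    ["今天", "天气", "很", "好"],
    ["我们", "喜欢", "播客", "播客"],
]
stats = lexical_stats(episodes)
print(stats["播客"])
```

In this toy example a word that is repeated within one episode raises its frequency but not its context diversity, which is why the two measures can diverge and are reported separately.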
About the journal:
Behavior Research Methods publishes articles concerned with the methods, techniques, and instrumentation of research in experimental psychology. The journal focuses particularly on the use of computer technology in psychological research. An annual special issue is devoted to this field.