Nan Xue PhD (student), Jimin Wang PhD (Professor)
Applied Corpus Linguistics, vol. 5 (3), Article 100143. Published 2025-08-09. DOI: 10.1016/j.acorp.2025.100143
https://www.sciencedirect.com/science/article/pii/S2666799125000267
A Comparative Study of Graded Vocabulary Features in HSK Level 6 Listening Materials and Media Audio, and an Analysis of the Graded Word List
Vocabulary familiarity plays a critical role in Chinese language learners’ listening comprehension. This study compares HSK Level 6 listening materials (∼50,000 tokens) and transcribed media audio texts (∼100,000 tokens), using the graded word lists from the Standards for Chinese Language Proficiency in International Chinese Education. Applying Python and the Language Technology Platform (LTP) for segmentation and automated processing, the study calculates the proportions of vocabulary across levels. Results reveal no significant differences in graded word coverage between the two corpora, but both contain a substantial proportion of unclassified words, indicating limited coverage by current word lists. Frequency analysis also shows underuse of many listed words. These findings highlight the need to enhance graded word lists through corpus-based NLP techniques and suggest that topic type may influence vocabulary distribution in listening texts.
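The level-proportion calculation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: it assumes the text has already been segmented into tokens (the study uses LTP for that step), and the names `graded_words`, `level_coverage`, and `unused_listed_words`, along with the toy word-to-level mapping, are hypothetical and chosen only for illustration.

```python
from collections import Counter


def level_coverage(tokens, graded_words):
    """Return the share of tokens at each level, plus 'unclassified'.

    tokens: list of already-segmented words.
    graded_words: dict mapping a word to its level label (hypothetical data).
    """
    counts = Counter(graded_words.get(tok, "unclassified") for tok in tokens)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {level: n / total for level, n in counts.items()}


def unused_listed_words(tokens, graded_words):
    """Return listed words that never occur in the token sequence."""
    seen = set(tokens)
    return sorted(word for word in graded_words if word not in seen)


# Toy data: the level assignments below are illustrative assumptions,
# not taken from the actual graded word lists.
graded_words = {"我": 1, "喜欢": 1, "音乐": 2, "研究": 4, "词汇": 6}
tokens = ["我", "喜欢", "音乐", "我", "播客", "研究"]

coverage = level_coverage(tokens, graded_words)
unused = unused_listed_words(tokens, graded_words)

for level, share in coverage.items():
    print(f"{level}: {share:.1%}")
print("listed but unused:", unused)
```

Tokens absent from the word list are counted under `unclassified`, which corresponds to the unclassified proportion the abstract reports; the second helper mirrors the frequency observation that many listed words go unused. Running the same two functions on each corpus would give figures that can then be compared side by side.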