Information-theoretic measures for mapping regularities between orthography and phonology: A comprehensive quantification and validation in the Chinese writing system.
Zhe Xiao, Huimin Xiao, Caihua Xu
Behavior Research Methods, 57(9), 232. Published 2025-07-25. DOI: 10.3758/s13428-025-02721-3 (https://doi.org/10.3758/s13428-025-02721-3)
Abstract
Information theory has been widely applied to quantify mapping regularities between orthography and phonology in alphabetic writing systems. However, the applicability of information-theoretic measures to the Chinese writing system, which is marked by distinct mapping characteristics, remains underexplored. This study presents a comprehensive quantification of mapping regularities in the Chinese writing system using information-theoretic measures and validates their effectiveness. First, we compute three core measures (entropy, surprisal, and information gain) across multiple dimensions: mapping direction (orthography-to-phonology vs. phonology-to-orthography), frequency type (type vs. token frequency), and grain size (tonal vs. base syllable; direct vs. fundamental phonetic radical). We then assess overall system uncertainty using entropy. Second, we demonstrate the ability of information-theoretic measures to predict Chinese reading performance using a large-scale Chinese naming dataset. Third, we show that these measures capture unique aspects of character variability in naming performance not accounted for by traditional measures (i.e., regularity and consistency). Finally, by comparing information-theoretic measures across different mapping directions, frequency data types, and grain sizes, we highlight their flexibility in capturing unique aspects of character variability in naming performance. Critically, these findings hold across both a self-compiled database and an external corpus featuring a substantially larger token pool, underscoring the robustness and generalizability of these measures. In sum, we emphasize the effectiveness and adaptability of information-theoretic measures in capturing mapping regularities in a writing system that is notably distinct from alphabetic systems, and we discuss their promising applications for advancing psychological and linguistic research.
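As a rough illustration of two of the measures named in the abstract, the sketch below computes entropy and surprisal in the orthography-to-phonology direction for a single phonetic radical, using the textbook Shannon definitions. The function names, the example syllables, and all counts are hypothetical toy values chosen for illustration; they are not taken from the paper, and the authors' exact formulations (including how information gain is defined) may differ.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c > 0)


def surprisal(counts, outcome):
    """Surprisal in bits of one outcome: -log2 p(outcome)."""
    total = sum(counts.values())
    return -math.log2(counts[outcome] / total)


# Hypothetical toy data (an assumption, not from the paper): one phonetic
# radical occurring in four characters with three distinct base syllables.
# Type counts: every character containing the radical counts once.
type_counts = Counter({"qing": 2, "jing": 1, "cai": 1})

# Token counts: the same characters weighted by made-up corpus frequencies.
token_counts = Counter({"qing": 6, "jing": 1, "cai": 1})

h_type = entropy(type_counts)             # uncertainty over pronunciations (type-based)
h_token = entropy(token_counts)           # the same, weighted by token frequency
s_common = surprisal(type_counts, "qing") # the radical's most common pronunciation
s_rare = surprisal(type_counts, "cai")    # a rarer pronunciation is more surprising

print(f"type entropy:  {h_type:.3f} bits")
print(f"token entropy: {h_token:.3f} bits")
print(f"surprisal qing: {s_common:.3f} bits, cai: {s_rare:.3f} bits")
```

In this toy case the token-based entropy is lower than the type-based entropy because a single pronunciation dominates once frequency is taken into account, which is the kind of contrast the type vs. token dimension is meant to capture.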
About the journal:
Behavior Research Methods publishes articles concerned with the methods, techniques, and instrumentation of research in experimental psychology. The journal focuses particularly on the use of computer technology in psychological research. An annual special issue is devoted to this field.