Improving Chinese character representation with formation tree

Yang Hong, Xiaojun Qiao, Yinfei Li, Rui Li, Junsong Zhang

Neurocomputing, Volume 638, Article 130098 (2025). DOI: 10.1016/j.neucom.2025.130098
Citations: 0
Abstract
Learning effective representations for Chinese characters presents unique challenges, primarily due to the vast number of characters and their continuous growth, necessitating models that can handle an expanding category space. Additionally, the inherent sparsity of character usage complicates the generalization of learned representations. Prior research has explored radical-based sequences to overcome these issues, achieving progress in recognizing unseen characters. However, these approaches fail to fully exploit the inherent tree structure of such sequences. To address these limitations and leverage established data properties, we propose Formation Tree-CLIP (FT-CLIP). FT-CLIP utilizes formation trees to represent characters and incorporates a dedicated tree encoder, significantly improving performance in both seen and unseen character recognition tasks. We further introduce masking for both character images and tree nodes, enabling efficient and effective training. This approach accelerates training significantly (by a factor of two or more) while enhancing accuracy. Extensive experiments show that processing characters through formation trees aligns better with their inherent properties than direct sequential methods, significantly enhancing the generality and usability of the representations.
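The abstract only outlines the formation-tree idea, so the sketch below illustrates what such a tree might look like when a character is decomposed into radicals via an IDS-style (Ideographic Description Sequence) prefix sequence, with structural operators as internal nodes and radicals as leaves. The node class, operator table, and parser here are illustrative assumptions, not the paper's actual implementation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical formation-tree node: internal nodes hold a structural
# operator (e.g. left-right "⿰", top-bottom "⿱"), leaves hold radicals.
@dataclass
class FormationNode:
    symbol: str
    children: List["FormationNode"] = field(default_factory=list)

    @property
    def is_leaf(self) -> bool:
        return not self.children

# Arity of common ideographic description operators (simplified subset).
OPERATOR_ARITY = {"⿰": 2, "⿱": 2, "⿸": 2, "⿺": 2, "⿻": 2, "⿲": 3, "⿳": 3}

def build_formation_tree(ids_tokens: List[str]) -> FormationNode:
    """Parse a prefix-order IDS token list into a formation tree."""
    def parse(pos: int) -> Tuple[FormationNode, int]:
        node = FormationNode(ids_tokens[pos])
        pos += 1
        for _ in range(OPERATOR_ARITY.get(node.symbol, 0)):
            child, pos = parse(pos)
            node.children.append(child)
        return node, pos

    root, _ = parse(0)
    return root

# Example: 好 decomposes as ⿰(女, 子); tokens are given in prefix order.
tree = build_formation_tree(["⿰", "女", "子"])
print(tree.symbol, [c.symbol for c in tree.children])  # ⿰ ['女', '子']
```

Under this reading, a sequential radical model would consume the flat token list ["⿰", "女", "子"] directly, whereas a tree encoder of the kind the abstract describes would operate on the recovered parent-child structure; node-level masking could then drop subtrees or individual nodes during training.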
Journal overview:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. The journal covers neurocomputing theory, practice, and applications.