Count, decompose and correct: A new approach to handwritten Chinese character error correction
Pengfei Hu, Jiefeng Ma, Zhenrong Zhang, Jun Du, Jianshu Zhang
Pattern Recognition, Volume 160, Article 111110. Published 2024-11-19. DOI: 10.1016/j.patcog.2024.111110
https://www.sciencedirect.com/science/article/pii/S0031320324008616
Impact factor: 7.5. JCR: Q1 (Computer Science, Artificial Intelligence).
Citations: 0
Abstract
Recently, handwritten Chinese character error correction has been greatly improved by encoder–decoder methods that decompose a Chinese character into an ideographic description sequence (IDS). However, existing methods implicitly capture and encode the linguistic information inherent in IDS sequences, which biases them toward generating IDS sequences that match seen characters. This poses a challenge for an unseen misspelled character, since the decoder may instead generate the IDS sequence of a seen character. Therefore, we introduce Count, Decompose and Correct (CDC), a novel approach that generalizes better to unseen misspelled characters. CDC is composed of three main parts: the Counter, the Decomposer, and the Corrector. In the first stage, the Counter predicts the number of occurrences of each radical class without symbol-level position annotations. In the second stage, the Decomposer uses this counting information to generate the IDS sequence step by step. Moreover, by updating the counting information at each time step, the Decomposer stays aware of which radicals are still present. With the decomposed IDS sequence, we can determine whether the given character is misspelled. If it is, the Corrector, under a transductive transfer learning strategy, predicts the ideal character that the user originally intended to write. We integrate our method into existing encoder–decoder models and significantly enhance their performance.
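The count-then-decompose idea can be illustrated with a small toy sketch. It is not the authors' model: the per-step scores, the hard masking rule, the tiny `KNOWN_IDS` dictionary, and the helper names (`count_radicals`, `decode_with_counts`, `is_misspelled`, `suggest_correction`) are all assumptions made for illustration. In particular, the last helper is a simple count-matching stand-in, not the transductive transfer learning Corrector described in the abstract.

```python
from collections import Counter

# Ideographic description characters (U+2FF0..U+2FFB) describe layout, not radicals.
STRUCTURE_SYMBOLS = {chr(c) for c in range(0x2FF0, 0x2FFC)}

# Hypothetical toy dictionary of seen characters: IDS string -> character.
KNOWN_IDS = {"⿰木木": "林", "⿰木目": "相", "⿰日月": "明", "⿱日十": "早"}


def count_radicals(ids):
    """Count how often each radical occurs in an IDS string."""
    return Counter(tok for tok in ids if tok not in STRUCTURE_SYMBOLS)


def decode_with_counts(step_scores, predicted_counts):
    """Greedy decoding that skips radicals whose remaining count is zero.

    Assumption: hard masking is a simplification chosen for this sketch.
    """
    remaining = Counter(predicted_counts)
    output = []
    for scores in step_scores:  # one dict of token -> score per time step
        allowed = {
            tok: score
            for tok, score in scores.items()
            if tok in STRUCTURE_SYMBOLS or remaining[tok] > 0
        }
        if not allowed:
            break
        token = max(allowed, key=allowed.get)
        output.append(token)
        if token not in STRUCTURE_SYMBOLS:
            remaining[token] -= 1  # update the counting information
    return "".join(output)


def is_misspelled(ids):
    """A character is treated as misspelled if its IDS matches no known one."""
    return ids not in KNOWN_IDS


def suggest_correction(ids):
    """Stand-in corrector: first known character with the same radical counts."""
    target = count_radicals(ids)
    for known_ids, char in KNOWN_IDS.items():
        if count_radicals(known_ids) == target:
            return char
    return None


# Made-up decoder scores for a handwritten, misspelled "⿰目木".
# At the last step the unconstrained decoder slightly prefers a second "目".
scores = [
    {"⿰": 0.9, "⿱": 0.1},
    {"目": 0.6, "木": 0.4},
    {"目": 0.55, "木": 0.45},
]
unconstrained = "".join(max(s, key=s.get) for s in scores)
constrained = decode_with_counts(scores, {"目": 1, "木": 1})
print(unconstrained, constrained, is_misspelled(constrained),
      suggest_correction(constrained))
```

In this toy setting the counts act as a budget: once the single predicted "目" has been emitted, it can no longer be chosen, so the decoder cannot drift to a sequence that repeats it.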
About the journal:
The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.