Dual-branch Multitask Fusion Network for Offline Chinese Writer Identification

IF 2 4区计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

ACM Transactions on Asian and Low-Resource Language Information Processing Pub Date : 2023-12-26 DOI:10.1145/3638554

Haixia Wang, Qingran Miao, Qun Xiao, Yilong Zhang, Yingyu Mao

{"title":"Dual-branch Multitask Fusion Network for Offline Chinese Writer Identification","authors":"Haixia Wang, Qingran Miao, Qun Xiao, Yilong Zhang, Yingyu Mao","doi":"10.1145/3638554","DOIUrl":null,"url":null,"abstract":"<p>Chinese characters are complex and contain discriminative information, meaning that their writers have the potential to be recognized using less text. In this study, offline Chinese writer identification based on a single character was investigated. To extract comprehensive features to model Chinese characters, explicit and implicit information as well as global and local features are of interest. A dual-branch multitask fusion network is proposed which contains two branches for global and local feature extraction simultaneously, and introduces auxiliary tasks to help the main task. Content recognition, stroke number estimation, and stroke recognition are considered as three auxiliary tasks for explicit information. The main task extracts implicit information of writer identity. The experimental results validated the positive influences of auxiliary tasks on the writer identification task, with the stroke number estimation task being most helpful. In-depth research was conducted to investigate the influencing factors in Chinese writer identification, with respect to character complexity, stroke importance, and character number, which provides a systematic reference for the actual application of neural networks in Chinese writer identification.</p>","PeriodicalId":54312,"journal":{"name":"ACM Transactions on Asian and Low-Resource Language Information Processing","volume":"18 1","pages":""},"PeriodicalIF":2.0000,"publicationDate":"2023-12-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Asian and Low-Resource Language Information Processing","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3638554","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

Abstract

Chinese characters are complex and contain discriminative information, meaning that their writers have the potential to be recognized using less text. In this study, offline Chinese writer identification based on a single character was investigated. To extract comprehensive features to model Chinese characters, explicit and implicit information as well as global and local features are of interest. A dual-branch multitask fusion network is proposed which contains two branches for global and local feature extraction simultaneously, and introduces auxiliary tasks to help the main task. Content recognition, stroke number estimation, and stroke recognition are considered as three auxiliary tasks for explicit information. The main task extracts implicit information of writer identity. The experimental results validated the positive influences of auxiliary tasks on the writer identification task, with the stroke number estimation task being most helpful. In-depth research was conducted to investigate the influencing factors in Chinese writer identification, with respect to character complexity, stroke importance, and character number, which provides a systematic reference for the actual application of neural networks in Chinese writer identification.

查看原文本刊更多论文

用于离线中文作家识别的双分支多任务融合网络

汉字结构复杂，且包含辨别信息，这意味着可以使用较少的文本识别汉字作家。本研究调查了基于单个汉字的离线中文作家识别。为了提取全面的特征来建立汉字模型，显性和隐性信息以及全局和局部特征都很重要。本研究提出了一种双分支多任务融合网络，它包含两个分支，可同时进行全局和局部特征提取，并引入辅助任务来帮助主任务。内容识别、笔画数估计和笔画识别被视为显性信息的三个辅助任务。主任务则提取作者身份的隐含信息。实验结果验证了辅助任务对作家身份识别任务的积极影响，其中笔画数估计任务的帮助最大。通过深入研究汉字复杂度、笔画重要性和字数对汉字作家识别的影响因素，为神经网络在汉字作家识别中的实际应用提供了系统的参考。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

ACM Transactions on Asian and Low-Resource Language Information Processing Computer Science-General Computer Science

CiteScore

3.60

自引率

15.00%

发文量

241

期刊介绍： The ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP) publishes high quality original archival papers and technical notes in the areas of computation and processing of information in Asian languages, low-resource languages of Africa, Australasia, Oceania and the Americas, as well as related disciplines. The subject areas covered by TALLIP include, but are not limited to: -Computational Linguistics: including computational phonology, computational morphology, computational syntax (e.g. parsing), computational semantics, computational pragmatics, etc. -Linguistic Resources: including computational lexicography, terminology, electronic dictionaries, cross-lingual dictionaries, electronic thesauri, etc. -Hardware and software algorithms and tools for Asian or low-resource language processing, e.g., handwritten character recognition. -Information Understanding: including text understanding, speech understanding, character recognition, discourse processing, dialogue systems, etc. -Machine Translation involving Asian or low-resource languages. -Information Retrieval: including natural language processing (NLP) for concept-based indexing, natural language query interfaces, semantic relevance judgments, etc. -Information Extraction and Filtering: including automatic abstraction, user profiling, etc. -Speech processing: including text-to-speech synthesis and automatic speech recognition. -Multimedia Asian Information Processing: including speech, image, video, image/text translation, etc. -Cross-lingual information processing involving Asian or low-resource languages. -Papers that deal in theory, systems design, evaluation and applications in the aforesaid subjects are appropriate for TALLIP. Emphasis will be placed on the originality and the practical significance of the reported research.