Multi-low resource languages in palm leaf manuscript recognition: Syllable-based augmentation and error analysis

IF 3.3 | Zone 3, Computer Science | JCR Q2: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition Letters | Pub Date: 2025-05-13 | DOI: 10.1016/j.patrec.2025.04.031

Nimol Thuon, Jun Du, Panhapin Theang, Ranysakol Thuon

Pattern Recognition Letters, Volume 195, Pages 8-15. Article page: https://www.sciencedirect.com/science/article/pii/S0167865525001734

Citations: 0

Abstract



Recognizing text in palm leaf manuscripts written in low-resource, non-Latin scripts such as Balinese, Khmer, and Sundanese is challenging because annotated data are scarce and the scripts are structurally complex. Unlike modern languages, these ancient scripts exhibit linguistic complexities that hinder recognition and digital preservation. Building on the success of syllable analysis augmentation for the Khmer script, we propose PALM-SADA, a framework for multi-script recognition that integrates visual and linguistic processing in a hybrid CNN-Transformer architecture. The framework introduces syllable analysis augmentation, which has two main components: (1) monosyllabic synthesis, which generates single-syllable words by combining glyphs from isolated glyph datasets according to predefined grammar forms; and (2) polysyllabic synthesis, which creates longer, grammatically correct text sequences by combining monosyllabic words and isolated glyphs. To preserve the linguistic characteristics of the augmented data, the grammar forms and the vocabulary lists of complete words were carefully designed and validated. For recognition, PALM-SADA employs a hybrid CNN-Transformer network intended to improve both feature extraction and transcription accuracy: CNN layers capture local features, Transformer layers model global dependencies, and a Transformer-based decoder further refines transcriptions by leveraging contextual relationships within the text. Experiments on the ICFHR 2018 contest datasets show that PALM-SADA significantly outperforms existing methods.
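The two synthesis steps named in the abstract can be illustrated at the label level with a minimal sketch. Everything concrete here is an assumption made for illustration: the glyph classes (`C`, `S`, `V`, `D`), the placeholder glyph labels, the entries in `GRAMMAR_FORMS`, and all function names are hypothetical and are not taken from the paper. A real pipeline would presumably also compose glyph images, which this sketch does not attempt.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical glyph inventory: class name -> placeholder glyph labels.
# These are NOT the paper's glyph sets; they only stand in for an
# isolated-glyph dataset grouped by role.
GLYPHS: Dict[str, List[str]] = {
    "C": ["ka", "ga", "ta"],       # base consonants
    "S": ["sub_ra", "sub_ya"],     # subscript consonants
    "V": ["aa", "i", "u"],         # dependent vowels
    "D": ["mark_1"],               # diacritics
}

# Hypothetical grammar forms: each tuple is an allowed order of glyph classes
# inside one syllable. The form ("C",) doubles as an isolated glyph.
GRAMMAR_FORMS: List[Tuple[str, ...]] = [
    ("C",),
    ("C", "V"),
    ("C", "S"),
    ("C", "S", "V"),
    ("C", "V", "D"),
]


def glyph_class(label: str) -> str:
    """Return the class name that a glyph label belongs to."""
    for cls, labels in GLYPHS.items():
        if label in labels:
            return cls
    raise KeyError(f"unknown glyph label: {label}")


def is_valid_syllable(syllable: Sequence[str]) -> bool:
    """Check that the class sequence of a syllable matches a grammar form."""
    return tuple(glyph_class(g) for g in syllable) in GRAMMAR_FORMS


def synthesize_monosyllable(rng: random.Random) -> List[str]:
    """Pick a grammar form, then fill each slot with a random glyph label."""
    form = rng.choice(GRAMMAR_FORMS)
    return [rng.choice(GLYPHS[slot]) for slot in form]


def synthesize_polysyllable(rng: random.Random, n_syllables: int) -> List[List[str]]:
    """Chain several synthesized syllables into a longer label sequence."""
    if n_syllables < 2:
        raise ValueError("a polysyllabic sequence needs at least two syllables")
    return [synthesize_monosyllable(rng) for _ in range(n_syllables)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    print(synthesize_monosyllable(rng))
    print(synthesize_polysyllable(rng, 3))
```

Because every syllable is built from a grammar form, `is_valid_syllable` can serve as a cheap sanity check on the generated labels. The abstract also mentions validated vocabulary lists of complete words; filtering the output of `synthesize_polysyllable` against such a list would be a natural extension, but it is left out here because the lists themselves are not given.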


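Source journal

Pattern Recognition Letters (Engineering Technology: Computer Science, Artificial Intelligence)

CiteScore: 12.40
Self-citation rate: 5.90%
Articles published: 287
Review time: 9.1 months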

Journal description: Pattern Recognition Letters aims at rapid publication of concise articles of broad interest in pattern recognition. Subject areas include all the current fields of interest represented by the Technical Committees of the International Association for Pattern Recognition, and other developing themes involving learning and recognition.