Nimol Thuon, Jun Du, Panhapin Theang, Ranysakol Thuon
Pattern Recognition Letters, Volume 195, Pages 8-15. Published 2025-05-13. DOI: 10.1016/j.patrec.2025.04.031. https://www.sciencedirect.com/science/article/pii/S0167865525001734
Multi-low resource languages in palm leaf manuscript recognition: Syllable-based augmentation and error analysis
Recognizing text from palm leaf manuscripts in low-resource, non-Latin languages such as Balinese, Khmer, and Sundanese poses significant challenges due to limited annotated data and complex structures. Unlike modern languages, these ancient scripts exhibit unique linguistic complexities that hinder effective recognition and digital preservation. Building on the success of syllable analysis augmentation for the Khmer script, we propose a framework, PALM-SADA, for multi-script recognition. PALM-SADA integrates visual and linguistic processing using a hybrid CNN-Transformer architecture. The framework introduces syllable analysis augmentation techniques with two main components: (1) monosyllabic synthesis, which generates single-syllable words by combining glyphs from isolated glyph datasets according to predefined grammar forms; and (2) polysyllabic synthesis, which creates longer, grammatically correct text sequences by combining monosyllabic words and isolated glyphs. To ensure linguistic integrity, grammar forms and vocabulary lists of complete words were designed and validated so that the augmented data preserves the linguistic characteristics of the scripts. For recognition, PALM-SADA employs a hybrid CNN-Transformer network that enhances both feature extraction and transcription accuracy. CNN layers capture local features, while Transformer layers model global dependencies. A Transformer-based decoder further refines transcriptions by leveraging contextual relationships within the text. Experiments conducted on the ICFHR 2018 contest datasets demonstrate that PALM-SADA significantly outperforms existing methods.
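The two synthesis steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the glyph inventory `GLYPHS`, the class names `C`/`S`/`V`, the forms in `GRAMMAR_FORMS`, and all function names are hypothetical placeholders, and glyphs are represented by text labels where a real pipeline would presumably compose isolated glyph images. The abstract's vocabulary lists and the mixing of isolated glyphs into polysyllabic sequences are left out for brevity.

```python
import random

# Hypothetical glyph inventory: glyph class -> glyph labels.
# A real dataset would pair each label with isolated glyph images.
GLYPHS = {
    "C": ["ka", "ta", "na"],  # base consonants (placeholder labels)
    "S": ["_ra", "_la"],      # subscript consonants (placeholder labels)
    "V": ["aa", "i", "u"],    # dependent vowels (placeholder labels)
}

# Hypothetical grammar forms: each form is an ordered tuple of glyph classes.
GRAMMAR_FORMS = [("C",), ("C", "V"), ("C", "S", "V")]


def synthesize_monosyllable(rng):
    """Pick one grammar form and fill each slot with a glyph of that class."""
    form = rng.choice(GRAMMAR_FORMS)
    return [rng.choice(GLYPHS[cls]) for cls in form]


def synthesize_polysyllable(rng, n_syllables):
    """Build a longer sequence as a list of synthesized monosyllables."""
    return [synthesize_monosyllable(rng) for _ in range(n_syllables)]


def matches_some_form(syllable):
    """Check that the glyph classes of a syllable follow a known grammar form."""
    classes = tuple(
        cls
        for glyph in syllable
        for cls, members in GLYPHS.items()
        if glyph in members
    )
    return classes in GRAMMAR_FORMS


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    print(synthesize_monosyllable(rng))
    print(synthesize_polysyllable(rng, 3))
```

Constraining every syllable to a predefined form is what keeps the synthetic text plausible for the script; `matches_some_form` shows how such a constraint could be checked after generation.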
About the journal:
Pattern Recognition Letters aims at the rapid publication of concise articles of broad interest in pattern recognition.
Subject areas include all the current fields of interest represented by the Technical Committees of the International Association for Pattern Recognition, and other developing themes involving learning and recognition.