Automatic Handwritten Character Segmentation for Paleographical Character Shape Analysis
Théodore Bluche, D. Stutzmann, Christopher Kermorvant
2016 12th IAPR Workshop on Document Analysis Systems (DAS), published 2016-04-11
DOI: 10.1109/DAS.2016.74 (https://doi.org/10.1109/DAS.2016.74)
Citations: 1
Abstract
Written texts are both physical objects (signs, shapes and graphical systems) and abstract objects (ideas), whose meanings and social connotations evolve through time. To study this dual nature of texts, palaeographers need to analyse large-scale corpora at the finest granularity, such as character shape. This goal can only be reached through an automatic segmentation process. In this paper, we present a method, based on Handwritten Text Recognition, to automatically align images of digitized manuscripts with texts from scholarly editions, at the levels of page, column, line, word, and character. It has been successfully applied to two datasets of medieval manuscripts, which are now almost fully segmented at character level. The quality of the word and character segmentations is evaluated and further palaeographical analyses are presented.