Automatic Scribe Attribution for Medieval Manuscripts

Digital Medievalist. Pub Date: 2018-12-24. DOI: 10.16995/DM.67
Mats Dahllöf
Citations: 3

Abstract

We propose an automatic method for attributing manuscript pages to scribes. The system uses digital images as published by libraries. The attribution process extracts components of approximately letter size from each query page. This is done by binarization (ink-background separation), connected component labelling, and further segmentation guided by the estimated typical stroke width. Components are extracted in the same way from pages of known scribal origin, which allows us to assign a scribe to each query component by nearest-neighbour classification. Distance (dissimilarity) between components is modelled by simple features that capture the distribution of ink in the component's bounding box, compared using Euclidean distance. The set of component-level scribe attributions, which typically includes hundreds of components for a page, is then used to predict the page scribe by a voting procedure: the scribe who receives the largest number of votes from the 120 strongest component attributions is proposed as the scribe of the page. The attribution process allows the argument behind an attribution to be visualized for a human reader: the writing components of the query page are exhibited along with the matching components of the known pages. This report is thus open to inspection and analysis using the methods and intuitions of traditional palaeography. The system was evaluated on a data set covering 46 medieval scribes writing in Carolingian minuscule, Bastarda, and a few other scripts. It achieved a mean top-1 accuracy of 98.3% (the first scribe proposed for each page being correct) when the labelled data comprised one randomly selected page from each scribe and nine unseen pages per scribe were to be attributed in the validation procedure. The experiment was repeated 50 times to even out random variation.
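The component-level classification and page-level voting described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation. The function names, the grid-based ink feature, the grid size, and the reading of "strongest" as "smallest nearest-neighbour distance" are all assumptions made for illustration; binarization and component extraction are assumed to have been done already, so each component arrives as a boolean array cropped to its bounding box.

```python
from collections import Counter

import numpy as np


def ink_features(component, grid=4):
    """Ink fraction in each cell of a grid x grid split of the bounding box.

    `component` is a 2-D boolean array (True = ink) cropped to the
    component's bounding box. The grid layout is an assumed feature
    design, not the one used in the paper.
    """
    comp = np.asarray(component, dtype=float)
    rows = np.array_split(np.arange(comp.shape[0]), grid)
    cols = np.array_split(np.arange(comp.shape[1]), grid)
    feats = [
        comp[np.ix_(r, c)].mean() if r.size and c.size else 0.0
        for r in rows
        for c in cols
    ]
    return np.array(feats)


def attribute_components(query_feats, known_feats, known_scribes):
    """1-nearest-neighbour: one (scribe, distance) pair per query component."""
    query = np.asarray(query_feats, dtype=float)
    known = np.asarray(known_feats, dtype=float)
    # Pairwise Euclidean distances, shape (n_query, n_known).
    dists = np.linalg.norm(query[:, None, :] - known[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)
    return [(known_scribes[j], dists[i, j]) for i, j in enumerate(nearest)]


def vote_page_scribe(attributions, top_k=120):
    """Majority vote among the top_k attributions with the smallest distance.

    Treating "strongest" as "smallest distance" is an assumption.
    """
    strongest = sorted(attributions, key=lambda a: a[1])[:top_k]
    votes = Counter(scribe for scribe, _ in strongest)
    return votes.most_common(1)[0][0]
```

In use, `ink_features` would be applied to every component of the labelled pages and of the query page, `attribute_components` would label each query component with its nearest labelled neighbour, and `vote_page_scribe` would return the proposed scribe. Keeping the (scribe, distance) pairs rather than only the winner is what makes it possible to show a reader which known components supported the attribution.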