Dataset of Pages from Early Printed Books with Multiple Font Groups

Proceedings of the 5th International Workshop on Historical Document Imaging and Processing. Pub Date: 2019-09-20. DOI: 10.1145/3352631.3352640

Mathias Seuret, Saskia Limbach, Nikolaus Weichselbaumer, A. Maier, V. Christlein


Citations: 9

Abstract

Based on contemporary scripts, early printers developed a large variety of fonts. While fonts may differ slightly from one printer to another, they can be divided into font groups, such as Textura, Antiqua, or Fraktur. Recognizing font groups is important for computer scientists, who need it to select adequate OCR models, and is of high interest to humanities scholars studying early printed books and the history of fonts. In this paper, we introduce a new, public dataset for the recognition of font groups in early printed books, and evaluate several state-of-the-art CNNs on the font group recognition task. The dataset consists of more than 35 600 page images; each page shows up to five different font groups, drawn from the ten font groups considered in this dataset.
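Because a single page can show several font groups at once, recognition on such a dataset is naturally a multi-label problem. The following is a minimal sketch of how per-page labels might be encoded as a multi-hot vector. The function name `encode_page_labels`, the label ordering, and the restriction to the three font groups named in the abstract are assumptions made for illustration; this is not the dataset's actual annotation format.

```python
from typing import Iterable, List

# Only the three font groups named in the abstract; the dataset considers ten.
# This ordering is a hypothetical choice, not the dataset's own label scheme.
FONT_GROUPS: List[str] = ["Textura", "Antiqua", "Fraktur"]

# The abstract states that a page shows up to five different font groups.
MAX_GROUPS_PER_PAGE = 5


def encode_page_labels(groups: Iterable[str]) -> List[int]:
    """Return a multi-hot vector with a 1 for each font group present on a page."""
    present = set(groups)
    unknown = present - set(FONT_GROUPS)
    if unknown:
        raise ValueError(f"unknown font group(s): {sorted(unknown)}")
    if len(present) > MAX_GROUPS_PER_PAGE:
        raise ValueError("a page shows at most five font groups")
    return [1 if name in present else 0 for name in FONT_GROUPS]


if __name__ == "__main__":
    # A page that mixes Fraktur body text with Antiqua passages.
    print(encode_page_labels(["Fraktur", "Antiqua"]))  # [0, 1, 1]
```

A vector of this shape could serve as the target for a CNN trained with a per-class sigmoid output, which is one common way to handle pages carrying more than one label.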
