Breaking the Code on Broken Tablets: The Learning Challenge for Annotated Cuneiform Script in Normalized 2D and 3D Datasets
H. Mara, B. Bogacz
2019 International Conference on Document Analysis and Recognition (ICDAR), published 2019-09-01
DOI: https://doi.org/10.1109/ICDAR.2019.00032
Citations: 7
Abstract
The number of known cuneiform tablets is assumed to be in the hundreds of thousands. The Hilprecht Archive Online contains 1977 high-resolution 3D scans of tablets, and the online cuneiform database CDLI catalogs metadata for more than 100,000 tablets. While both are publicly accessible, large-scale machine learning and pattern recognition on cuneiform tablets remain elusive: the data is accessible only by searching web pages, tablet identifiers are inconsistent between collections, and the 3D data is unprepared and challenging for automated processing. We pave the way for large-scale analyses of cuneiform tablets by assembling a cross-referenced benchmark dataset of processed cuneiform tablets: (i) frontally aligned 3D tablets with pre-computed high-dimensional surface features, (ii) six-view raster images for off-the-shelf image processing, and (iii) metadata, transcriptions, and transliterations for a subset of 707 tablets, for learning the alignment between 3D data, images, and linguistic expression. This is the first dataset of its kind and of its size in cuneiform research. The benchmark dataset is prepared for ease of use and immediate availability for computational research, lowering the barrier to experimenting with and applying standard methods of analysis, and is available at https://doi.org/10.11588/data/IE8CCN.
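The cross-referencing problem named in the abstract (inconsistent tablet identifiers between collections) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the identifier spellings, the record fields (`id`, `file`, `transliteration`), and the function names are all hypothetical, chosen only to show how variant spellings might be normalized before joining scan records with catalog records.

```python
import re


def normalize_id(raw: str) -> str:
    """Collapse case, separators, and leading zeros so variant spellings compare equal.

    Assumes a hypothetical "letters + number" identifier shape; real collection
    identifiers may need different rules.
    """
    match = re.fullmatch(r"\s*([A-Za-z]+)[\s_.\-]*(\d+)\s*", raw)
    if match is None:
        raise ValueError(f"unrecognized identifier: {raw!r}")
    prefix, number = match.groups()
    return f"{prefix.upper()}{int(number)}"


def cross_reference(scans, catalog):
    """Join scan records with catalog records on the normalized identifier.

    Scans without a catalog match are skipped rather than guessed at.
    """
    by_id = {normalize_id(record["id"]): record for record in catalog}
    matched = []
    for scan in scans:
        key = normalize_id(scan["id"])
        if key in by_id:
            matched.append(
                {
                    "key": key,
                    "scan": scan["file"],
                    "text": by_id[key]["transliteration"],
                }
            )
    return matched
```

Joining on a normalized key rather than the raw string keeps the original identifiers untouched in each record, so a mismatch can still be traced back to the spelling used by each collection.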