An AI-assisted workflow for object detection and data collection from archaeological catalogues
Kevin Klein, Antoine Muller, Alyssa Wohde, Alexander V. Gorelik, Volker Heyd, Ralf Lämmel, Yoan Diekmann, Maxime Brami
Journal of Archaeological Science, volume 179, Article 106244. Published 2025-06-02. DOI: 10.1016/j.jas.2025.106244
https://www.sciencedirect.com/science/article/pii/S0305440325000937
Abstract
Reconciling the ever-increasing volume of new archaeological data with the abundant corpus of legacy data is fundamental to making robust archaeological interpretations. Yet, combining new and existing results is hampered by inconsistent standards in the recording and illustration of archaeological features and artefacts. Attempts at collating data from images in existing publications first involve scouring the substantial body of existing literature, followed by extracting images that require onerous manual preprocessing steps, like re-scaling, re-orienting, and re-formatting. While the sample sizes of such manual analyses are curtailed by these problems, recent developments in AI and big data methods are poised to accelerate and automate large syntheses of existing data.
This paper introduces an AI-assisted workflow capable of creating uniform archaeological datasets from heterogeneous published resources. The associated software (AutArch) takes large and unsorted PDF files as input, and uses neural networks to conduct image processing, object detection, and classification. Objects commonly found in archaeological catalogues – like graves, skeletons, ceramics, ornaments, stone tools, and maps – are reliably detected. Accompanying elements of the illustrations, like North arrows and scales, are automatically used for orientation and scaling. Outlines are then extracted with contour detection, allowing whole-outline morphometrics. Detected objects, contours, and other automatically retrieved data can be manually validated and adjusted via AutArch's graphical user interface.
While we test this workflow on third millennium BCE Central European graves and Final Neolithic/Early Bronze Age arrowheads from Northwest Europe, this method can be applied to the vast number of artefacts and archaeological features for which shape, size, and orientation hold technological, functional, cultural, and/or temporal significance. This AI-assisted workflow has the potential to speed up, automate, and standardise data collection throughout the discipline, allowing more objective interpretations and freeing sample sizes from budget and time constraints.
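The orientation and scaling step described in the abstract can be illustrated with a minimal sketch. This is not AutArch's implementation: the function names, the centimetre unit, the clockwise angle convention for the North arrow, and the example numbers are all assumptions made for illustration. It assumes that a detector has already supplied an outline in pixel coordinates, the pixel length of the scale bar, and the angle of the North arrow.

```python
import math


def normalise_outline(points_px, scale_bar_px, scale_bar_cm, north_deg):
    """Return the outline in centimetres, centred and rotated north-up.

    points_px    -- [(x, y), ...] in image pixels (x right, y down)
    scale_bar_px -- measured length of the drawn scale bar, in pixels
    scale_bar_cm -- real-world length that the scale bar represents (assumed cm)
    north_deg    -- clockwise angle of the north arrow from page-up (assumed convention)
    """
    cm_per_px = scale_bar_cm / scale_bar_px
    cx = sum(x for x, _ in points_px) / len(points_px)
    cy = sum(y for _, y in points_px) / len(points_px)
    theta = math.radians(north_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    out = []
    for x, y in points_px:
        # Centre on the vertex mean, flip y so that "up" is positive, scale to cm.
        u = (x - cx) * cm_per_px
        v = (cy - y) * cm_per_px
        # Rotate counter-clockwise by theta so the north arrow ends up pointing up.
        out.append((u * cos_t - v * sin_t, u * sin_t + v * cos_t))
    return out


def polygon_area(points):
    """Shoelace formula; area in the squared unit of the input coordinates."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:] + points[:1]):
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


# Hypothetical example: a 100 x 200 px rectangle, drawn with a 50 px scale bar
# standing for 10 cm, and a north arrow pointing to the right of the page.
grave_px = [(0, 0), (100, 0), (100, 200), (0, 200)]
grave_cm = normalise_outline(grave_px, scale_bar_px=50, scale_bar_cm=10, north_deg=90)
print(round(polygon_area(grave_cm), 6))
```

With these made-up inputs the rectangle measures 20 × 40 cm, so the printed area should be about 800 cm², and its long axis ends up horizontal once north points up. Outlines normalised this way share a common unit and orientation, which is the precondition for comparing shapes drawn from different publications.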
About the journal:
The Journal of Archaeological Science is aimed at archaeologists and scientists with particular interests in advancing the development and application of scientific techniques and methodologies to all areas of archaeology. This established monthly journal publishes focus articles, original research papers and major review articles, of wide archaeological significance. The journal provides an international forum for archaeologists and scientists from widely different scientific backgrounds who share a common interest in developing and applying scientific methods to inform major debates through improving the quality and reliability of scientific information derived from archaeological research.