{"title":"Assessing advanced handwritten text recognition engines for digitizing historical documents.","authors":"C A Romein, A Rabus, G Leifert, P B Ströbel","doi":"10.1007/s42803-025-00100-0","DOIUrl":"https://doi.org/10.1007/s42803-025-00100-0","url":null,"abstract":"<p><p>This study provides critical insights and evaluates the performance of state-of-the-art Handwritten Text Recognition (HTR) engines (PyLaia, HTR+, IDA, TrOCR-f, and Transkribus' proprietary Transformer-based \"supermodel\" Titan) for digitizing historical documents. Using a diverse range of datasets that include different scripts, this research assesses each engine's accuracy and efficiency in handling multilingual content, complex styles, abbreviations, and historical orthography. Results indicate that, while all engines can be trained or fine-tuned to improve performance, Titan and TrOCR-f exhibit superior out-of-the-box capabilities for Latin-script documents. PyLaia, IDA, and HTR+ excel in specific non-Latin scripts when specifically trained or fine-tuned. This study underscores the importance of training, fine-tuning, and integrating language models, providing critical insights for future advancements in HTR technology and its application in the digital humanities.</p>","PeriodicalId":91018,"journal":{"name":"International journal of digital humanities","volume":"7 1","pages":"115-134"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12202554/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144531536","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Online cultural heritage as a social machine: a socio-technical approach to digital infrastructure and ecosystems.","authors":"Javier Pereda, Pip Willcox, Gustavo Candela, Alexander Sanchez, Patricia A Murrieta-Flores","doi":"10.1007/s42803-025-00097-6","DOIUrl":"10.1007/s42803-025-00097-6","url":null,"abstract":"<p><p>The advent of digital technologies has profoundly transformed cultural and heritage sectors, providing new avenues for broader access and interactions with digital collections. This shift has enabled Online Cultural Heritage (OCH) to evolve into an extensive ecosystem. Given the complexity that emerges from these networks and stakeholders, it is crucial to develop a clearer understanding of the extensive terminology used in the sector and establish pathways to deconstruct this complexity. Therefore, this article's aim is threefold: 1) it examines how OCH ecosystems foster the ongoing reinterpretation and recontextualisation of cultural heritage collections through technological innovations and the Web. In doing so, it highlights the relevance of policy development and the establishment of ethical frameworks that address both human and technical complexities of Cultural Heritage (CH) knowledge; 2) using the Open Archival Information System (OAIS) as a framework and its terminology, the article maps the workflows and socio-technical actors of the OCH ecosystem; and 3) the article applies Callon's Process of Translation, a methodology for understanding how socio-technical networks evolve, and uses it to critically deconstruct digital infrastructures in OCH. This methodology enables the contextualisation and reinterpretation of cultural narratives across digital platforms, both online and offline, underscoring the dynamic interplay between technology, human agency, and cultural context. 
We explore how OCH ecosystems and other infrastructural ecosystems aid in preserving and facilitating engagement with open knowledge and research, and function as complex networks of cultural institutions interconnected through knowledge infrastructures. Whilst the paper places the primary approach within UK infrastructures, it provides alternative perspectives from the Global South, particularly Latin America, to contrast and further illustrate a reflection on the current and future challenges behind a sustainable OCH ecosystem, its implications for further networks, and its potential as a model beyond the CH sector. Furthermore, this framework can become paramount to identifying obstacles and opportunities for digital infrastructures, establishing a nuanced understanding of OCH as a core infrastructural element in the generation of knowledge from digital collections or digital infrastructures around the world. Finally, we provide a glossary of terms to establish a common ground between the wide range of parties involved in OCH. CCS CONCEPTS • Digital libraries and archives • Information Integration • Cultural characteristics.</p>","PeriodicalId":91018,"journal":{"name":"International journal of digital humanities","volume":"7 1","pages":"39-69"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12202677/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144531537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RelChronVis: an interactive web application for visualizing the relative chronology of language changes","authors":"Florian Wandl, Thilo H. K. Thelitz","doi":"10.1007/s42803-024-00086-1","DOIUrl":"https://doi.org/10.1007/s42803-024-00086-1","url":null,"abstract":"","PeriodicalId":91018,"journal":{"name":"International journal of digital humanities","volume":"44 2","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141807749","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DOLD: a digital platform for conducting online language experiments and surveys","authors":"Yik-Po Lai, Hin Tat Cheung","doi":"10.1007/s42803-024-00085-2","DOIUrl":"https://doi.org/10.1007/s42803-024-00085-2","url":null,"abstract":"","PeriodicalId":91018,"journal":{"name":"International journal of digital humanities","volume":"70 3","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141683123","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Correction: Committing to reproducibility and explainability: using Git as a research journal","authors":"Samuel J. Huskey","doi":"10.1007/s42803-024-00084-3","DOIUrl":"https://doi.org/10.1007/s42803-024-00084-3","url":null,"abstract":"","PeriodicalId":91018,"journal":{"name":"International journal of digital humanities","volume":"78 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139802443","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Correction: Committing to reproducibility and explainability: using Git as a research journal","authors":"Samuel J. Huskey","doi":"10.1007/s42803-024-00084-3","DOIUrl":"https://doi.org/10.1007/s42803-024-00084-3","url":null,"abstract":"","PeriodicalId":91018,"journal":{"name":"International journal of digital humanities","volume":"1 4","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139861914","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Open Times: The future of critique in the age of (un)replicability","authors":"Nathalie Cooke, Ronny Litvack-Katzman","doi":"10.1007/s42803-023-00081-y","DOIUrl":"https://doi.org/10.1007/s42803-023-00081-y","url":null,"abstract":"","PeriodicalId":91018,"journal":{"name":"International journal of digital humanities","volume":"12 3","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139445792","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Digital humanities in the era of digital reproducibility: towards a fairest and post-computational framework","authors":"Béatrice Joyeux-Prunel","doi":"10.1007/s42803-023-00079-6","DOIUrl":"https://doi.org/10.1007/s42803-023-00079-6","url":null,"abstract":"","PeriodicalId":91018,"journal":{"name":"International journal of digital humanities","volume":"20 6","pages":"1-21"},"PeriodicalIF":0.0,"publicationDate":"2024-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139389208","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"\"I have always found the whole area a minefield\": Wikidata, historical lives, and knowledge infrastructure.","authors":"James Baker, Ammandeep K Mahal","doi":"10.1007/s42803-024-00090-5","DOIUrl":"10.1007/s42803-024-00090-5","url":null,"abstract":"<p><p>The rise of Wikidata represents a quiet revolution in knowledge infrastructure. This paper enquires into this knowledge base as an infrastructure and considers the implications of its centrality within our contemporary knowledge ecosystem. Rather than read Wikidata at scale, we employ a narrow frame through which to explore the ideologies Wikidata has adopted and reproduces. This frame is Beyond Notability, a knowledge base that seeks to document women's work in archaeology, history, and heritage between 1870 and 1950 through original archival research. Beyond Notability draws on and responds to the Wikidata data model, and this paper emerges from our experiences interacting with Wikidata to produce linked data biography. In foregrounding the tensions between historically specific phenomena and classificatory logics, our work stresses the value of using practice-based ontology development to investigate large-scale knowledge infrastructures at a time when the fabric of knowledge is at stake.</p>","PeriodicalId":91018,"journal":{"name":"International journal of digital humanities","volume":"6 2","pages":"217-236"},"PeriodicalIF":0.0,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12084240/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144096199","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Leveraging OCR and HTR cloud services towards data mobilisation of historical plant names.","authors":"Jawad Sadek, Andreas Vlachidis, Victoria Pickering, Marco Humbel, Daniele Metilli, Mark Carine, Julianne Nyhan","doi":"10.1007/s42803-024-00091-4","DOIUrl":"10.1007/s42803-024-00091-4","url":null,"abstract":"<p><p>We present our solution to the problem of how to mobilise (that is, extract and enrich) digital data from the analogue, printed book version of Sir Hans Sloane's copy of John Ray's Historia Plantarum, to create the first searchable facility of its kind for the plants contained in the Sloane Herbarium, housed in the Natural History Museum, UK. The data mobilisation workflow presented here enables the automatic detection of printed and handwritten marginalia text and annotations in Sir Hans Sloane's personal copy of John Ray's Historia Plantarum. The rationale for adopting the Amazon Web Services (AWS) Textract service and the development of a specialised information extraction workflow for mobilising printed text and handwritten annotations are discussed. Testing of our workflow demonstrates the need for human checking of outputs to ensure the accuracy of a large set of structured data comprising 7600 plant names and 4540 handwritten marginalia annotations. 
The links we have created serve as the first digital index to Sloane's Herbarium, a unique development in the longer analogue and digital format-history of these resources.</p>","PeriodicalId":91018,"journal":{"name":"International journal of digital humanities","volume":"6 3","pages":"237-261"},"PeriodicalIF":0.0,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12106164/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144176215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}