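Searchable Digital Manuscript Collections: A Possibility for Deciphering Serbian Cyrillic (original title: Претраживе дигиталне рукописне колекције: могућност за рашчитавање српске ћирилице)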

JCR quartile: Q4 (Computer Science)

Journal of Computing and Information Technology, pp. 35-46. Published: 2020-12-28. DOI: 10.19090/cit.2020.37.35-46

Јелена С. Андоновски, Наташа Д. Дакић, Александра С. Тртовац

Citations: 0

Abstract

The READ (Recognition and Enrichment of Archival Documents) project has the potential to revolutionise access to historical collections held by cultural institutions all over Europe. The project ran from 2016 to 2019, was funded by the European Commission, and involved 13 partners from the European Union. Its overall objective was to build a virtual research environment in which archivists, humanities scholars, IT specialists and volunteers would collaborate to advance the research, innovation, development and use of cutting-edge technology for the automated recognition, transcription, indexing and enrichment of handwritten archival documents.

Since its launch in 2016, and in line with its aim of creating a virtual research environment, the READ project developed advanced text recognition technology based on artificial neural networks. Research in pattern recognition, computer vision, document image analysis and language modelling, but also in digital humanities, archival research and related fields, has made unprecedented progress in recent years, and European research groups are at the forefront of this field. The newly developed technologies and tools are integrated into a publicly available infrastructure: the Transkribus platform.

The primary goal of Transkribus is to support users who transcribe printed or handwritten documents. Only a few years ago, the idea that computers could read historical scripts and automatically recognise and transcribe handwritten documents from past centuries still belonged to the realm of fantasy. Today, users of Transkribus can extract data from handwritten and printed texts with HTR (Handwritten Text Recognition) technology and search digitized text without retyping it, using a technique known as KWS (Keyword Spotting), while their work simultaneously helps improve the same technology through machine learning. The automated recognition of a wide variety of historical texts has significant implications for the accessibility of the written records of global cultural heritage.
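
To make the keyword-spotting idea concrete, here is a minimal Python sketch of probabilistic keyword search over HTR output. It is a toy approximation, not the Transkribus implementation. It assumes that the recognizer returns, for each text line, an N-best list of transcriptions with posterior probabilities. The `LineHypotheses` class, the `keyword_relevance` and `spot` functions, the 0.3 threshold and the sample Serbian Cyrillic lines are all hypothetical. A line is scored by summing the probabilities of those hypotheses that contain the query word.

```python
from dataclasses import dataclass


@dataclass
class LineHypotheses:
    """N-best HTR transcriptions for one text line (hypothetical data model)."""
    line_id: str
    # (transcription, posterior probability) pairs; assumed to sum to at most 1
    nbest: list[tuple[str, float]]


def keyword_relevance(line: LineHypotheses, keyword: str) -> float:
    """Approximate P(keyword occurs in line) by summing the probabilities of
    the hypotheses whose whitespace-separated tokens include the keyword
    (case-insensitive comparison via str.casefold)."""
    kw = keyword.casefold()
    return sum(p for text, p in line.nbest if kw in text.casefold().split())


def spot(lines: list[LineHypotheses], keyword: str,
         threshold: float = 0.3) -> list[tuple[str, float]]:
    """Return (line_id, score) pairs scoring at least `threshold`, best first."""
    hits = [(ln.line_id, keyword_relevance(ln, keyword)) for ln in lines]
    hits = [h for h in hits if h[1] >= threshold]
    return sorted(hits, key=lambda h: h[1], reverse=True)


# Hypothetical HTR output for two lines of a Serbian Cyrillic manuscript page.
page = [
    LineHypotheses("l1", [("у манастиру Хиландару", 0.6),
                          ("у манастиру Хилендару", 0.3)]),
    LineHypotheses("l2", [("писмо кнезу Милошу", 0.5),
                          ("писма кнезу Милошу", 0.4)]),
]

print(spot(page, "Хиландару"))  # [('l1', 0.6)]
```

The score aggregates over alternative readings. A line can therefore still be found when its single best transcription is wrong, and this is the practical advantage of keyword spotting over searching one fixed HTR transcript.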

Source journal: Journal of Computing and Information Technology
Subject area: Computer Science - Computer Science (all)

CiteScore: 0.60
Self-citation rate: 0.00%
Review time: 26 weeks

Journal description: CIT. Journal of Computing and Information Technology is an international peer-reviewed journal covering the area of computing and information technology, i.e. computer science, computer engineering, software engineering, information systems, and information technology. CIT endeavors to publish stimulating accounts of original scientific work, primarily research papers on both theoretical and practical issues, as well as case studies describing the application and critical evaluation of theory. Surveys and state-of-the-art reports will be considered only exceptionally; proposals for such submissions should be sent to the Editorial Board for scrutiny. Specific areas of interest include, but are not restricted to, the following topics: theory of computing, design and analysis of algorithms, numerical and symbolic computing, scientific computing, artificial intelligence, image processing, pattern recognition, computer vision, embedded and real-time systems, operating systems, computer networking, Web technologies, distributed systems, human-computer interaction, technology enhanced learning, multimedia, database systems, data mining, machine learning, knowledge engineering, soft computing systems and network security, computational statistics, computational linguistics, and natural language processing. Special attention is paid to educational, social, legal and managerial aspects of computing and information technology. In this respect, CIT fosters the exchange of ideas, experience and knowledge between regions with different technological and cultural backgrounds, in particular between developed and developing regions.