Opening Digitized Newspapers Corpora: Europeana's Full-Text Data Interoperability Case

International Conference on Language, Data, and Knowledge. Pub Date: 2019-05-01. DOI: 10.4230/OASIcs.LDK.2019.22

Nuno Freire, Antoine Isaac, Twan Goosen, D. Broeder, Hugo Manguinhas, V. Charles



Abstract



Cultural heritage institutions hold collections of printed newspapers that are valuable resources for the study of history, linguistics and other Digital Humanities domains. Effective retrieval of newspaper content based on metadata alone is nearly impossible, which makes retrieval based on (digitized) full text particularly relevant. Europeana, Europe’s Digital Library, is in a position to provide access to large newspaper collections with full-text resources. Full-text corpora are also relevant to Europeana’s objective of promoting the use of cultural heritage resources within research infrastructures. We have derived requirements for aggregating and publishing Europeana’s newspaper full-text corpus in an interoperable way, based on investigations into the specific characteristics of cultural data, the needs of two research infrastructures (CLARIN and EUDAT) and the practices being promoted in the International Image Interoperability Framework (IIIF) community. We have then defined a “full-text profile” for the Europeana Data Model, which is being applied to Europeana’s newspaper corpus.
