开放数字化报纸语料库:欧洲全文数据互操作性案例

Nuno Freire, Antoine Isaac, Twan Goosen, D. Broeder, Hugo Manguinhas, V. Charles
{"title":"开放数字化报纸语料库:欧洲全文数据互操作性案例","authors":"Nuno Freire, Antoine Isaac, Twan Goosen, D. Broeder, Hugo Manguinhas, V. Charles","doi":"10.4230/OASIcs.LDK.2019.22","DOIUrl":null,"url":null,"abstract":"Cultural heritage institutions hold collections of printed newspapers that are valuable resources for the study of history, linguistics and other Digital Humanities scientific domains. Effective retrieval of newspapers content based on metadata only is a task nearly impossible, making the retrieval based on (digitized) full-text particularly relevant. Europeana, Europe’s Digital Library, is in the position to provide access to large newspapers collections with full-text resources. Full-text corpora are also relevant for Europeana’s objective of promoting the usage of cultural heritage resources for use within research infrastructures. We have derived requirements for aggregating and publishing Europeana’s newspapers full-text corpus in an interoperable way, based on investigations into the specific characteristics of cultural data, the needs of two research infrastructures (CLARIN and EUDAT) and the practices being promoted in the International Image Interoperability Framework (IIIF) community. We have then defined a “full-text profile” for the Europeana Data Model, which is being applied to Europeana’s newspaper corpus.","PeriodicalId":377119,"journal":{"name":"International Conference on Language, Data, and Knowledge","volume":"70 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Opening Digitized Newspapers Corpora: Europeana's Full-Text Data Interoperability Case\",\"authors\":\"Nuno Freire, Antoine Isaac, Twan Goosen, D. Broeder, Hugo Manguinhas, V. Charles\",\"doi\":\"10.4230/OASIcs.LDK.2019.22\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Cultural heritage institutions hold collections of printed newspapers that are valuable resources for the study of history, linguistics and other Digital Humanities scientific domains. Effective retrieval of newspapers content based on metadata only is a task nearly impossible, making the retrieval based on (digitized) full-text particularly relevant. Europeana, Europe’s Digital Library, is in the position to provide access to large newspapers collections with full-text resources. Full-text corpora are also relevant for Europeana’s objective of promoting the usage of cultural heritage resources for use within research infrastructures. We have derived requirements for aggregating and publishing Europeana’s newspapers full-text corpus in an interoperable way, based on investigations into the specific characteristics of cultural data, the needs of two research infrastructures (CLARIN and EUDAT) and the practices being promoted in the International Image Interoperability Framework (IIIF) community. We have then defined a “full-text profile” for the Europeana Data Model, which is being applied to Europeana’s newspaper corpus.\",\"PeriodicalId\":377119,\"journal\":{\"name\":\"International Conference on Language, Data, and Knowledge\",\"volume\":\"70 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Conference on Language, Data, and Knowledge\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.4230/OASIcs.LDK.2019.22\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Conference on Language, Data, and Knowledge","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.4230/OASIcs.LDK.2019.22","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

文化遗产机构收藏的印刷报纸是研究历史、语言学和其他数字人文科学领域的宝贵资源。仅基于元数据的报纸内容的有效检索几乎是不可能完成的任务,这使得基于(数字化)全文的检索尤为重要。欧洲数字图书馆(Europeana)能够提供大型报纸馆藏的全文资源。全文语料库也与Europeana促进文化遗产资源在研究基础设施中使用的目标相关。基于对文化数据的具体特征、两个研究基础设施(CLARIN和EUDAT)的需求以及国际图像互操作性框架(IIIF)社区正在推广的实践的调查,我们得出了以可互操作的方式汇总和发布欧洲报纸全文语料库的要求。然后,我们为Europeana数据模型定义了一个“全文配置文件”,该模型将应用于Europeana的报纸语料库。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Opening Digitized Newspapers Corpora: Europeana's Full-Text Data Interoperability Case
Cultural heritage institutions hold collections of printed newspapers that are valuable resources for the study of history, linguistics and other Digital Humanities scientific domains. Effective retrieval of newspapers content based on metadata only is a task nearly impossible, making the retrieval based on (digitized) full-text particularly relevant. Europeana, Europe’s Digital Library, is in the position to provide access to large newspapers collections with full-text resources. Full-text corpora are also relevant for Europeana’s objective of promoting the usage of cultural heritage resources for use within research infrastructures. We have derived requirements for aggregating and publishing Europeana’s newspapers full-text corpus in an interoperable way, based on investigations into the specific characteristics of cultural data, the needs of two research infrastructures (CLARIN and EUDAT) and the practices being promoted in the International Image Interoperability Framework (IIIF) community. We have then defined a “full-text profile” for the Europeana Data Model, which is being applied to Europeana’s newspaper corpus.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信