MAchine Readable Cataloging to MAchine Understandable Data with Distributed Big Data Management

Q2 Social Sciences
K. Sharma, U. Marjit, U. Biswas
Journal Article. Journal of Library Metadata, pp. 13-29. Published 2018-01-02.
DOI: https://doi.org/10.1080/19386389.2018.1461177
Citations: 2

Abstract

In recent years, the library domain has adopted semantic web technologies to publish data-centric information that machines can process directly. Several efforts have addressed the transition of data from MAchine-Readable Cataloging (MARC) formats to the Resource Description Framework (RDF). Storing library data in RDF makes resources on the web easier to interlink and reuse. Moreover, because RDF carries rich semantics, machines can interpret library resources meaningfully. Existing approaches rely on a single-node environment and fail when faced with large volumes of input data. Some sets of bibliographic records in MARC 21 format are so large that traditional data-management tools cannot process them, and they require more storage. Such data call for systems that can perform tasks in parallel. In this article, we propose a distributed approach to converting legacy library data into RDF using Apache Spark and Hadoop. We describe the conversion of data from the MARC 21 format for bibliographic data into RDF and report preliminary results on processing speed and storage. The approach improves the conversion process in terms of both processing time and storage size.
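To make the record-to-RDF step more concrete, here is a minimal sketch in plain Python of how one simplified bibliographic record might be turned into N-Triples lines. It is not the authors' implementation: the record layout, the `FIELD_TO_PREDICATE` mapping, the `BASE_URI` namespace, and the function names are all assumptions made for illustration, and real MARC 21 parsing is left out. The sketch also assumes record identifiers are already safe to embed in a URI.

```python
from typing import Dict, Iterable, List

# Assumed mapping from MARC 21 (tag, subfield) pairs to Dublin Core Terms
# properties. The article does not specify its mapping; this one is illustrative.
FIELD_TO_PREDICATE = {
    ("245", "a"): "http://purl.org/dc/terms/title",      # title statement
    ("100", "a"): "http://purl.org/dc/terms/creator",    # main entry, personal name
    ("260", "b"): "http://purl.org/dc/terms/publisher",  # name of publisher
}

BASE_URI = "http://example.org/record/"  # placeholder namespace (assumption)


def escape_literal(value: str) -> str:
    """Escape characters that may not appear raw inside an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def record_to_ntriples(record: Dict) -> List[str]:
    """Convert one simplified record to N-Triples lines.

    `record` is assumed to look like
    {"id": "r1", "fields": [("245", "a", "Some title"), ...]}.
    """
    subject = f"<{BASE_URI}{record['id']}>"
    triples = []
    for tag, subfield, value in record["fields"]:
        predicate = FIELD_TO_PREDICATE.get((tag, subfield))
        if predicate is None:
            continue  # unmapped fields are skipped in this sketch
        triples.append(f'{subject} <{predicate}> "{escape_literal(value)}" .')
    return triples


def convert_all(records: Iterable[Dict]) -> List[str]:
    """Flatten per-record output; each record is converted independently."""
    return [line for record in records for line in record_to_ntriples(record)]
```

Because each record is converted without reference to any other, a function like `record_to_ntriples` is the kind of unit that a parallel framework could apply per record, for example through a map-style operation such as Spark's `flatMap`. That wiring, and the storage layer, are omitted here.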
Source journal: Journal of Library Metadata (Social Sciences: Library and Information Sciences)
CiteScore: 2.00
Self-citation rate: 0.00%
Articles published: 13