Towards Semantic ETL for integration of textual scientific documents in a Big Data environment: a theoretical approach

Chaimae Boulahia, Hicham Behja, Mohammed Reda Chbihi Louhdi
Published in: 2020 6th IEEE Congress on Information Science and Technology (CiSt)
Publication date: 2020-06-05
DOI: 10.1109/CiSt49399.2021.9357280 (https://doi.org/10.1109/CiSt49399.2021.9357280)
Citations: 2

Abstract

Every day, new scientific documents (textual data) and new domain ontologies are published. Exploiting these massive amounts of textual data and ontologies requires information systems that can evolve with them. Data integration is therefore needed to collect and combine these sources into a unified view that can be analyzed and visualized to meet the different analytical needs of scientific researchers. Extract, Transform, Load (ETL) is one of the most popular approaches to data integration (DI). However, it does not take semantic data into consideration. This limitation gave rise to Semantic ETL, or ontology-based ETL, which extends traditional ETL. Semantic ETL interests many researchers, but they have found it difficult to apply because of the complexity of processing Big Data, which is characterized by the 5 Vs (Volume, Variety, Velocity, Veracity, Value). This paper discusses two questions: (1) What capacity do existing Semantic ETL approaches have to manage Big Data, and are they still relevant today? (2) What Semantic ETL for Big Data do we propose to overcome the limitations of previous work?
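To make the idea of ontology-based ETL concrete, the following is a minimal sketch and not the authors' proposal. It assumes a toy in-memory ontology (`ONTOLOGY`), an illustrative `ex:mentions` predicate, and hypothetical function names (`extract`, `transform`, `load`, `run_pipeline`), none of which come from the paper. It shows the three ETL stages with a transform step that annotates documents with ontology concepts instead of only reshaping them.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical toy ontology: surface term -> concept identifier.
# These entries are illustrative assumptions, not taken from the paper.
ONTOLOGY: Dict[str, str] = {
    "ontology": "ex:Ontology",
    "data integration": "ex:DataIntegration",
    "big data": "ex:BigData",
}

Triple = Tuple[str, str, str]


def extract(raw_docs: Dict[str, str]) -> List[Tuple[str, str]]:
    """Extract: collapse whitespace and lowercase each document's text."""
    return [
        (doc_id, re.sub(r"\s+", " ", text).strip().lower())
        for doc_id, text in raw_docs.items()
    ]


def transform(docs: List[Tuple[str, str]], ontology: Dict[str, str]) -> List[Triple]:
    """Transform: emit (document, ex:mentions, concept) for every
    ontology term that occurs in the text as a whole word or phrase."""
    triples: List[Triple] = []
    for doc_id, text in docs:
        for term, concept in ontology.items():
            if re.search(r"\b" + re.escape(term) + r"\b", text):
                triples.append((doc_id, "ex:mentions", concept))
    return triples


def load(triples: List[Triple], store: Set[Triple]) -> Set[Triple]:
    """Load: add the triples to a set that stands in for a triple store."""
    store.update(triples)
    return store


def run_pipeline(raw_docs: Dict[str, str],
                 ontology: Dict[str, str] = ONTOLOGY) -> Set[Triple]:
    """Run extract, transform, and load in sequence on a fresh store."""
    return load(transform(extract(raw_docs), ontology), set())
```

Calling `run_pipeline({"doc1": "Semantic ETL supports data integration for Big Data."})` yields a set of subject-predicate-object triples linking `doc1` to the matched concepts. A Big Data setting would replace the in-memory set with a distributed store and the exact phrase matching with a proper semantic annotator; both are outside the scope of this sketch.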