Towards Semantic ETL for integration of textual scientific documents in a Big Data environment: a theoretical approach

2020 6th IEEE Congress on Information Science and Technology (CiSt) | Pub Date: 2020-06-05 | DOI: 10.1109/CiSt49399.2021.9357280

Chaimae Boulahia, Hicham Behja, Mohammed Reda Chbihi Louhdi

Citations: 2

Abstract

Every day, new scientific documents (textual data) and new domain ontologies are published. Exploiting these massive amounts of textual data and ontologies requires evolving information systems. Hence the need for data integration: collecting and combining these sources to provide a unified view in which they can be analyzed and visualized according to the different analytical needs of scientific researchers. Extract Transform Load (ETL) is one of the most popular approaches to data integration (DI). However, it does not take semantic data into consideration. This requirement gave rise to Semantic ETL, or ontology-based ETL, which extends traditional ETL. Semantic ETL is of interest to most researchers, but they have found it difficult to build because of the complexity of processing Big Data, which is characterized by the 5Vs (Volume, Variety, Velocity, Veracity, Value). In short, this paper attempts to answer two questions: (1) How capable are existing Semantic ETL approaches of managing Big Data, and are they still relevant today? (2) What Semantic ETL for Big Data do we propose to overcome the limitations of previous work?
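The abstract describes Semantic ETL as traditional Extract-Transform-Load extended with ontologies. As a rough illustration only (not the authors' proposal, which this page does not detail), the minimal Python sketch below walks through the three stages on scientific text. It extracts records. It transforms each one by annotating mentions of ontology concepts and emitting RDF-style (subject, predicate, object) triples. It then loads the triples into an in-memory store that can be queried by concept. The toy `ONTOLOGY` dictionary, the `example.org` IRIs, the `mentions` predicate and all function names are hypothetical assumptions. Only the Dublin Core `title` term is a real vocabulary IRI.

```python
import re
from dataclasses import dataclass

# Hypothetical toy ontology: surface labels (including synonyms) -> concept IRIs.
# A real Semantic ETL would derive this mapping from published domain ontologies.
ONTOLOGY = {
    "etl": "http://example.org/onto#ETL",
    "extract transform load": "http://example.org/onto#ETL",
    "ontology": "http://example.org/onto#Ontology",
    "ontologies": "http://example.org/onto#Ontology",
    "big data": "http://example.org/onto#BigData",
    "data integration": "http://example.org/onto#DataIntegration",
}

MENTIONS = "http://example.org/onto#mentions"   # hypothetical predicate
TITLE = "http://purl.org/dc/terms/title"        # Dublin Core term


@dataclass(frozen=True)
class Document:
    doc_id: str
    title: str
    text: str


def extract(raw_records):
    """Extract: turn raw source records (here, plain dicts) into Document objects."""
    return [Document(r["id"], r["title"], r["text"]) for r in raw_records]


def transform(doc):
    """Transform: annotate a document with ontology concepts as (s, p, o) triples."""
    subject = f"http://example.org/doc/{doc.doc_id}"
    triples = {(subject, TITLE, doc.title)}
    # Lowercase and collapse every non-alphanumeric run to one space,
    # so "Extract-Transform-Load" and "extract transform load" compare equal.
    normalised = " " + re.sub(r"[^a-z0-9]+", " ", doc.text.lower()) + " "
    for label, concept in ONTOLOGY.items():
        # Space padding restricts matches to whole words or phrases.
        if f" {label} " in normalised:
            triples.add((subject, MENTIONS, concept))
    return triples


def load(store, triples):
    """Load: add triples to an in-memory triple store (a Python set), in place."""
    store |= triples
    return store


def documents_about(store, concept):
    """Query the loaded store: which documents mention a given concept?"""
    return sorted(s for (s, p, o) in store if p == MENTIONS and o == concept)


if __name__ == "__main__":
    records = [
        {"id": "d1", "title": "Semantic ETL",
         "text": "Extract-Transform-Load extended with ontologies."},
        {"id": "d2", "title": "Scaling DI",
         "text": "Data integration for Big Data."},
    ]
    store = set()
    for doc in extract(records):
        load(store, transform(doc))
    print(documents_about(store, "http://example.org/onto#ETL"))
    # prints ['http://example.org/doc/d1']
```

Exact phrase matching against a small label dictionary stands in here for the entity linking a real Semantic ETL would need, and nothing in this sketch addresses the distribution and scaling concerns that the 5Vs of Big Data raise.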

Source journal

2020 6th IEEE Congress on Information Science and Technology (CiSt)

Self-citation rate

0.00%

Publication count