Chaimae Boulahia, Hicham Behja, Mohammed Reda Chbihi Louhdi
2020 6th IEEE Congress on Information Science and Technology (CiSt), published 2020-06-05. DOI: 10.1109/CiSt49399.2021.9357280
Towards Semantic ETL for integration of textual scientific documents in a Big Data environment: a theoretical approach
Every day, new scientific documents (textual data) and new domain ontologies are published. Exploiting these massive amounts of textual data and ontologies requires information systems that can evolve with them. Hence the need for data integration: collecting and combining these sources into a unified view that can be analyzed and visualized according to the different analytical needs of scientific researchers. Extract, Transform, Load (ETL) is one of the most popular approaches to data integration (DI). However, it does not take semantic data into account. This limitation gave rise to Semantic ETL, or ontology-based ETL, which extends traditional ETL. Semantic ETL has attracted the interest of many researchers, who have nonetheless found it difficult to apply because of the complexity of processing Big Data, which is characterized by the 5Vs (Volume, Variety, Velocity, Veracity, Value). In short, this paper addresses two questions: (1) How capable are existing Semantic ETL approaches of managing Big Data, and are they still relevant today? (2) What Semantic ETL for Big Data do we propose to overcome the limitations of previous work?
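To make the difference between traditional and Semantic ETL concrete, here is a minimal Python sketch. It is not the authors' proposal (the paper is a theoretical approach); it assumes a toy "ontology" that maps surface terms to concept URIs under a hypothetical `http://example.org/` namespace, plus a hypothetical `mentions` predicate. Extract and load play the same roles as in traditional ETL. The semantic part lives in the transform step, which links each document to ontology concepts and emits subject-predicate-object triples instead of flat rows.

```python
import re
from dataclasses import dataclass

# Hypothetical namespace and toy "ontology" (surface term -> concept URI).
# A real Semantic ETL pipeline would load a domain ontology (e.g. OWL/RDF) instead.
EX = "http://example.org/"
ONTOLOGY = {
    "etl": EX + "onto#ETL",
    "ontology": EX + "onto#Ontology",
    "big data": EX + "onto#BigData",
}
MENTIONS = EX + "onto#mentions"  # hypothetical predicate


@dataclass(frozen=True)
class Triple:
    """A subject-predicate-object statement, as in RDF."""
    subject: str
    predicate: str
    obj: str


def extract(raw_docs):
    """Extract: read raw documents and normalize case and whitespace."""
    for doc_id, text in raw_docs.items():
        yield doc_id, " ".join(text.lower().split())


def transform(docs, ontology):
    """Transform: link each document to the ontology concepts it mentions.

    Uses whole-word matching on lowercase terms; real systems would use
    proper entity linking rather than plain string matching.
    """
    for doc_id, text in docs:
        subject = f"{EX}doc/{doc_id}"
        for term, concept in ontology.items():
            if re.search(rf"\b{re.escape(term)}\b", text):
                yield Triple(subject, MENTIONS, concept)


def load(triples, store):
    """Load: add triples to the target store (an in-memory set here)."""
    for triple in triples:
        store.add(triple)
    return store


# Two hypothetical input documents.
raw = {
    "d1": "A Semantic ETL pipeline for Big   Data.",
    "d2": "Aligning a domain Ontology with textual data.",
}
store = load(transform(extract(raw), ONTOLOGY), set())
for t in sorted(store, key=lambda t: (t.subject, t.obj)):
    print(t.subject, t.predicate, t.obj)
```

A production pipeline would more likely load a real ontology with an RDF library such as rdflib, write to a triple store, and distribute each stage to cope with the 5Vs; those choices are deliberately left out of this sketch.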