{"title":"Towards the Development of Talend Open Studio Components for the Support of Semantic Sources","authors":"Morad Hajji, Mohammed Qbadou, K. Mansouri","doi":"10.1109/ICSSD47982.2019.9002820","journal":{"name":"2019 1st International Conference on Smart Systems and Data Science (ICSSD)","volume":"19 1","pages":"0"},"publicationDate":"2019-10-01","publicationTypes":"Journal Article","isOpenAccess":false,"citationCount":"1","platform":"Semanticscholar","ListUrlMain":"https://doi.org/10.1109/ICSSD47982.2019.9002820"}
Citations: 1
Abstract
The Extract-Transform-Load (ETL) process is the most widely used mechanism for keeping a data warehouse loaded with data extracted from a variety of sources. Tools that offer graphical interfaces for building and manipulating ETL processes have become very popular and have reached an advanced level of maturity. Talend Open Studio for Data Integration is one of the most popular and comprehensive of these tools in terms of functionality and performance, and it already provides a large number of components for different data sources. However, the advent of the Semantic Web introduces the ontology as a new kind of data source, one whose structure is complex because of the expressiveness of knowledge representation languages. This poses a new challenge: to our knowledge, Talend Open Studio for Data Integration does not have any components intended to support ontological sources. In this contribution, we present our approach to developing Talend Open Studio for Data Integration components that make Semantic Web data usable in ETL processes created with this tool. Because it relies on a strategy that abstracts over ontological sources, the approach can be adapted to different knowledge representation languages such as RDF and OWL. To assess the usefulness of our approach, we evaluated it on a hypothetical example based on a simple ontology.
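The idea of abstracting over ontological sources can be illustrated with a minimal sketch. It is not the authors' component and does not use any Talend API: the names `OntologySource`, `NTriplesSource`, and `to_rows` are hypothetical, and the parser assumes a deliberately simplified N-Triples subset (IRIs and plain literals only, with no escapes, language tags, datatypes, or blank nodes). The point is only that an ETL input step can depend on a single triple-producing interface while each representation language gets its own implementation behind it.

```python
import re
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List, Tuple

Triple = Tuple[str, str, str]

# One simplified N-Triples statement: <s> <p> <o> .  or  <s> <p> "literal" .
_STATEMENT = re.compile(
    r'^<([^>]*)>\s+<([^>]*)>\s+(?:<([^>]*)>|"([^"]*)")\s*\.$'
)


class OntologySource(ABC):
    """Hypothetical abstraction: any ontological source yields triples."""

    @abstractmethod
    def triples(self) -> Iterator[Triple]:
        ...


class NTriplesSource(OntologySource):
    """Illustrative implementation for a small N-Triples subset (assumption)."""

    def __init__(self, text: str) -> None:
        self.text = text

    def triples(self) -> Iterator[Triple]:
        for raw in self.text.splitlines():
            line = raw.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            match = _STATEMENT.match(line)
            if match is None:
                raise ValueError(f"unsupported statement: {line!r}")
            subj, pred, obj_iri, obj_literal = match.groups()
            yield (subj, pred, obj_iri if obj_iri is not None else obj_literal)


def to_rows(source: OntologySource) -> List[Dict[str, str]]:
    """Flatten triples into rows, the shape a row-oriented ETL step consumes."""
    return [
        {"subject": s, "predicate": p, "object": o}
        for s, p, o in source.triples()
    ]


SAMPLE = """
<http://example.org/Employee> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .
<http://example.org/alice> <http://example.org/name> "Alice" .
"""

rows = to_rows(NTriplesSource(SAMPLE))
for row in rows:
    print(row)
```

Under this design, supporting another serialization would mean adding another `OntologySource` subclass while `to_rows` stays unchanged. A real component would more plausibly delegate parsing to an established RDF library than to a hand-written regular expression.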