Conceptual modeling of Big Data extraction phase

Hana Mallek, Faïza Ghozzi, F. Gargouri
Journal: International Journal of Hybrid Intelligent Systems
Publication date: 2023-05-25
Publication type: Journal Article
DOI: 10.3233/his-230008 (https://doi.org/10.3233/his-230008)
Citations: 1

Abstract

As the volume of information exceeds the management and storage capacity of traditional data management systems, several domains must account for this growth, in particular the decision-making domain known as Business Intelligence (BI). Because the accumulation and reuse of such massive data is a gold mine for businesses, BI systems must deliver insights that are useful and essential for effective decision making. However, BI systems face several problems and challenges, especially at the level of ETL (Extraction-Transformation-Loading), which serves as the integration system. ETL processes are responsible for selecting, filtering and restructuring data sources in order to support relevant decisions. In this paper, we focus on adapting the extraction phase, inspired by the first step of the MapReduce paradigm, in order to prepare massive data for the transformation phase. We then provide a conceptual model of the extraction phase, which is composed of a conversion operation that guarantees a NoSQL structure suitable for Big Data storage, and a vertical partitioning operation that presents the storage mode before data is submitted to the second ETL phase. Finally, we implement our new component in Talend for Big Data; it helps the designer extract data from semi-structured data.
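
The two operations named in the abstract can be illustrated with a minimal sketch. It is not the authors' model or their Talend component: the function names (`convert`, `vertical_partition`, `extract`), the use of JSON lines as the semi-structured input, the `id` field as row key, and the one-partition-per-column layout are all assumptions made for illustration. The conversion step emits key-value triples in the spirit of a map step, and the partitioning step groups them by column.

```python
import json
from collections import defaultdict


def convert(row_key, record, prefix=""):
    """Conversion (assumed form): flatten one nested record into
    (row_key, column_path, value) triples, a simple key-value layout."""
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, extending the column path.
            yield from convert(row_key, value, path)
        else:
            yield (row_key, path, value)


def vertical_partition(triples):
    """Vertical partitioning (assumed form): one partition per column,
    each mapping row keys to that column's values."""
    partitions = defaultdict(dict)
    for row_key, column, value in triples:
        partitions[column][row_key] = value
    return dict(partitions)


def extract(json_lines, key_field="id"):
    """Hypothetical extraction phase: convert every input line, then
    partition the result before handing it to a transformation phase."""
    triples = []
    for line in json_lines:
        doc = json.loads(line)
        row_key = doc.pop(key_field)  # assumed: every record has this field
        triples.extend(convert(row_key, doc))
    return vertical_partition(triples)


if __name__ == "__main__":
    sample = [
        '{"id": "r1", "name": "Ann", "address": {"city": "Sfax"}}',
        '{"id": "r2", "name": "Bob"}',
    ]
    for column, values in extract(sample).items():
        print(column, values)
```

Records that lack a column simply do not appear in that column's partition, which is one way a column-oriented layout can absorb the irregular shape of semi-structured input.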