Still Open Problems in Data Warehouse and Data Lake Research: extended abstract
R. Wrembel
2021 Eighth International Conference on Social Network Analysis, Management and Security (SNAMS), 2021-12-06
DOI: 10.1109/SNAMS53716.2021.9732098 (https://doi.org/10.1109/SNAMS53716.2021.9732098)
Citations: 2
Abstract
In recent years, we have observed a proliferation of new data sources, especially social media of all types and IoT devices, which produce huge volumes of data whose content ranges from fully structured to totally unstructured. All these types of data are commonly referred to as big data. They are typically described by three main characteristics, called the 3Vs [1]: an extremely large volume, a variety of data models and structures (data representations), and a high velocity at which data are generated. We argue that, of these three Vs, variety is the most challenging [2]. Such data need to be integrated and transformed into a common representation that is suitable for analysis, in a manner similar to traditional (mainly table-like) data.