{"title":"Comparing HiveQL and MapReduce methods to process fact data in a data warehouse","authors":"Haince Denis Pen, Prajyoti Dsilva, Sweedle Mascarnes","doi":"10.1109/CSCITA.2017.8066553","DOIUrl":null,"url":null,"abstract":"Today Big data is one of the most widely spoken about technology that is being explored throughout the world by technology enthusiasts and academic researchers. The reason for this is the enormous data generated every second of each day. Every webpage visited, every text message sent, every post on social networking websites, check-in information, mouse clicks etc. is logged. This data needs to be stored and retrieved efficiently, moreover the data is unstructured therefore the traditional methods of strong data fail. This data needs to be stored and retrieved efficiently There is a need of an efficient, scalable and robust architecture that needs stores enormous amounts of unstructured data, which can be queried as and when required. In this paper, we come up with a novel methodology to build a data warehouse over big data technologies while specifically addressing the issues of scalability and user performance. Our emphasis is on building a data pipeline which can be used as a reference for future research on the methodologies to build a data warehouse over big data technologies for either structured or unstructured data sources. We have demonstrated the processing of data for retrieving the facts from data warehouse using two techniques, namely HiveQL and MapReduce.","PeriodicalId":299147,"journal":{"name":"2017 2nd International Conference on Communication Systems, Computing and IT Applications (CSCITA)","volume":"310 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 2nd International Conference on Communication Systems, Computing and IT Applications (CSCITA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CSCITA.2017.8066553","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 3
Abstract
Today, Big Data is one of the most widely discussed technologies, explored throughout the world by technology enthusiasts and academic researchers. The reason for this is the enormous amount of data generated every second of every day: every webpage visited, every text message sent, every post on social networking websites, check-in information, mouse clicks, etc. is logged. This data needs to be stored and retrieved efficiently; moreover, much of it is unstructured, so traditional methods of storing data fail. There is therefore a need for an efficient, scalable and robust architecture that can store enormous amounts of unstructured data and be queried as and when required. In this paper, we propose a novel methodology to build a data warehouse over big data technologies while specifically addressing the issues of scalability and user performance. Our emphasis is on building a data pipeline that can serve as a reference for future research on methodologies for building a data warehouse over big data technologies for either structured or unstructured data sources. We demonstrate the processing of data to retrieve facts from the data warehouse using two techniques, namely HiveQL and MapReduce.
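For illustration, the fact-retrieval step described above can be sketched in HiveQL; the fact table below (fact_sales and its columns) is a hypothetical example and is not taken from the paper. Hive typically compiles such a query into MapReduce jobs, which is the trade-off the paper compares against hand-written MapReduce code.

-- Hypothetical fact table; schema, names and location are illustrative only.
CREATE EXTERNAL TABLE IF NOT EXISTS fact_sales (
  product_id INT,
  store_id   INT,
  sale_date  STRING,
  quantity   INT,
  revenue    DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/warehouse/fact_sales';

-- Retrieve an aggregated fact per store; Hive translates this into MapReduce jobs.
SELECT store_id, SUM(revenue) AS total_revenue
FROM fact_sales
GROUP BY store_id;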