{"title":"Filtering the big data based on volume, variety and velocity by using Kalman filter recursive approach","authors":"Fatima Riaz, Muhammad Alam, Attra Ali","doi":"10.1109/ICETSS.2017.8324195","DOIUrl":null,"url":null,"abstract":"For the past seven decades the term Big Data is known, but due to the emerging technology shift of this era, it is captivating a lot of attention from the researchers of mathematics, computing, telecommunication, information technology, data warehousing, and mining. As this generation is living in the age of technology where data is playing a vital role and especially the Big Data has lots of success stories, but at the same time it is becoming the biggest threat to network service provider, telecom industry, and homeland security. Every device such as smart phones, laptop, desktop, etc. connected with the network is contributing to add data to a Big Data pool by using different applications. Social media such as Instagram, Facebook, WhatsApp, Apple, Google, Google+, Twitter, Flickr, etc. are few famous tools which are used to add redundant data. The question appears, is it mandatory to store and especially process all the data either useful or redundant? This research paper is focusing for filtering useful data from redundant data by using their parameters which are velocity, variety, and volume. In proposed architecture, Memcache DB (for velocity), Voldemort layers (for variety) and MapReduce (for volume) are linked with Hadoop to achieve filtered data. Kalman filter recursive approach is used to inject the data back into Hadoop Distributed File System to reduce processing cost of next iterations.","PeriodicalId":228333,"journal":{"name":"2017 IEEE 3rd International Conference on Engineering Technologies and Social Sciences (ICETSS)","volume":"82 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE 3rd International Conference on Engineering Technologies and Social Sciences (ICETSS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICETSS.2017.8324195","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 9
Abstract
The term Big Data has been known for the past seven decades, but owing to the technology shift of this era it is attracting considerable attention from researchers in mathematics, computing, telecommunications, information technology, data warehousing, and data mining. This generation lives in an age of technology in which data plays a vital role; Big Data in particular has many success stories, but at the same time it is becoming a major threat to network service providers, the telecom industry, and homeland security. Every device connected to the network, such as smartphones, laptops, and desktops, contributes data to the Big Data pool through different applications. Social media platforms such as Instagram, Facebook, WhatsApp, Apple, Google, Google+, Twitter, and Flickr are a few well-known tools that add redundant data. The question arises: is it mandatory to store, and especially to process, all of this data, whether useful or redundant? This paper focuses on filtering useful data from redundant data based on three parameters: velocity, variety, and volume. In the proposed architecture, Memcache DB (for velocity), Voldemort layers (for variety), and MapReduce (for volume) are linked with Hadoop to obtain filtered data. A recursive Kalman filter approach is used to inject the filtered data back into the Hadoop Distributed File System, reducing the processing cost of subsequent iterations.
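The recursive nature of the Kalman filter mentioned above is what makes it attractive for streaming data: each new observation refines the current estimate without reprocessing the full history. As a minimal illustration only (not the paper's actual pipeline; all names and noise parameters here are hypothetical), a one-dimensional Kalman filter recursion can be sketched as:

```python
# Illustrative 1-D Kalman filter recursion. This is NOT the architecture
# described in the paper; it is a generic sketch showing why a recursive
# filter suits streaming data: only the current estimate and its variance
# are carried between steps, not the full measurement history.

def kalman_1d(measurements, process_var=1e-3, measurement_var=0.1):
    """Return smoothed estimates for a stream of noisy scalar readings."""
    estimate, error = 0.0, 1.0   # initial state estimate and its variance
    estimates = []
    for z in measurements:
        # Predict: propagate uncertainty forward by one time step.
        error += process_var
        # Update: blend the prediction with the new measurement z,
        # weighted by the Kalman gain.
        gain = error / (error + measurement_var)
        estimate += gain * (z - estimate)
        error *= (1.0 - gain)
        estimates.append(estimate)
    return estimates
```

Fed a stream of noisy readings around a true value, the recursion converges toward that value while touching each measurement exactly once, which is the property the paper leverages to avoid reprocessing data on later iterations.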