R2Time: A Framework to Analyse Open TSDB Time-Series Data in HBase
B. Agrawal, Antorweep Chakravorty, Chunming Rong, T. Wlodarczyk
2014 IEEE 6th International Conference on Cloud Computing Technology and Science, 2014-12-15
DOI: 10.1109/CloudCom.2014.84 (https://doi.org/10.1109/CloudCom.2014.84)
Citations: 7
Abstract
In recent years, the amount of time-series data generated in different domains has grown consistently. Analyzing large time-series datasets from sensor networks, power grids, stock exchanges, social networks and cloud monitoring logs at a massive scale is one of the biggest challenges that data scientists face. Big data storage and processing frameworks provide an environment to handle the volume, velocity and frequency attributes associated with time-series data. We propose R2Time, an efficient and distributed computing framework for processing such data in the Hadoop environment. It integrates R with a distributed time-series database (Open TSDB) using a MapReduce programming framework (RHIPE). R2Time allows analysts to work on huge datasets from within a popular, well-supported, and powerful analysis environment.