Soft Real-Time Hadoop Scheduler for Big Data Processing in Smart Cities

2016 IEEE 30th International Conference on Advanced Information Networking and Applications (AINA). Published: 2016-03-23. DOI: 10.1109/AINA.2016.122

Ciprian Barbieru, Florin Pop


Citations: 16

Abstract

We live in a world where every electronic device generates data, and does so in a variety of ways that follow a multitude of patterns particular to each device and user. Some users use their phone to browse the Internet on their daily commute, some check it for updates every hour, and some may use it constantly throughout the day to accomplish different tasks. Even the same device can be used in a variety of ways, let alone different devices. Besides user-generated data, there is also machine-generated data, which can follow a more predictable pattern, such as nightly backups or scheduled tasks, but usually involves more CPU- or I/O-intensive tasks than the sporadic ones generated by human users. As the size of the analyzed data keeps increasing and Big Data becomes part of more and more daily tasks, we need a way to handle all these diverse tasks that serve a variety of purposes. Some of this data must be analyzed as fast as possible, while in other cases the analysis can be done at the end of the day as part of a batch process. To handle all this diversity, we design a real-time job scheduler in Hadoop for Big Data processing that addresses the problem of small tasks that must be executed in real time while, at the same time, accommodating long-running jobs whose completion time is not strictly defined. The case study supports Smart City applications whose data is gathered, routed, and stored via mobile devices and processed and diffused via more standard Clouds.
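The abstract describes a scheduler that must serve two kinds of work at once: small tasks with tight deadlines and long-running jobs with loose completion times. The sketch below illustrates that general idea under a simple assumed policy. Deadline-bound tasks are ordered earliest-deadline-first (EDF) and always dispatched before batch jobs, which are served first-in, first-out. The names `Job`, `TwoClassScheduler`, `submit`, and `next_job` are hypothetical. This is not the paper's algorithm and not Hadoop's or YARN's scheduler API.

```python
import heapq
import itertools
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    name: str
    deadline: Optional[float] = None  # None marks a batch job with no strict deadline


class TwoClassScheduler:
    """Hypothetical dispatcher: EDF for deadline-bound tasks, FIFO for batch jobs."""

    def __init__(self) -> None:
        self._realtime: list = []      # min-heap of (deadline, seq, job)
        self._batch: deque = deque()   # FIFO queue of batch jobs
        self._seq = itertools.count()  # tie-breaker: equal deadlines keep arrival order

    def submit(self, job: Job) -> None:
        if job.deadline is None:
            self._batch.append(job)
        else:
            heapq.heappush(self._realtime, (job.deadline, next(self._seq), job))

    def next_job(self) -> Optional[Job]:
        # In this sketch, deadline-bound tasks always take precedence over batch jobs.
        if self._realtime:
            return heapq.heappop(self._realtime)[2]
        if self._batch:
            return self._batch.popleft()
        return None


if __name__ == "__main__":
    # Hypothetical Smart City workload: two batch jobs and two deadline-bound tasks.
    sched = TwoClassScheduler()
    sched.submit(Job("nightly-backup"))
    sched.submit(Job("traffic-alert", deadline=5.0))
    sched.submit(Job("sensor-aggregate", deadline=2.0))
    sched.submit(Job("daily-report"))
    order = []
    while (job := sched.next_job()) is not None:
        order.append(job.name)
    print(order)  # ['sensor-aggregate', 'traffic-alert', 'nightly-backup', 'daily-report']
```

Strict priority keeps the example short, but it can starve batch jobs if deadline-bound tasks arrive continuously. A practical scheduler would need some safeguard, such as aging waiting batch jobs or reserving a share of cluster capacity for them.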

