A predictive approach to task scheduling for Big Data in cloud environments using classification algorithms

2017 7th International Conference on Cloud Computing, Data Science & Engineering - Confluence. Pub Date: 2017-01-01. DOI: 10.1109/CONFLUENCE.2017.7943147

Vidushi Vashishth, Anshuman Chhabra, A. Sood

Pages: 188-192

Citations: 12

Abstract

Recent work has increasingly integrated the Cloud with the Internet of Things (IoT), which encompasses emerging technologies such as Smart Cities and smart devices. Because IoT devices built on these technologies generate a continuous stream of sensor data, this integration has directed research towards combining Big Data more closely with the Cloud. In this paper, we present a predictive approach to task scheduling that aims to reduce the overhead incurred when Big Data is processed on the Cloud, and thereby to increase both the efficiency and the reliability of the Cloud network while handling Big Data. We use classification in Machine Learning as a tool for scheduling tasks and assigning them to Virtual Machines (VMs) in the Cloud environment. A comparative study is undertaken to observe which family of classifiers performs best in this scenario. Particle Swarm Optimization (PSO) is used to generate the dataset on which the classifiers are trained. Several classification algorithms, including Naive Bayes, Random Forest and K Nearest Neighbor, are then used to predict the VM best suited to each task in the test dataset.
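
The pipeline described in the abstract (label tasks with a target VM, train a classifier, then predict the VM for unseen tasks) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the two task features (length and deadline), the three VM speeds in `VM_MIPS`, and above all the labelling rule in `label_task`, which is a simple deterministic stand-in for the PSO-generated assignments. The classifier is a plain k-nearest-neighbor vote written with the standard library only.

```python
import math
import random
from collections import Counter

# Hypothetical VM pool: index -> processing speed in MIPS (assumed values).
VM_MIPS = [500, 1000, 2000]


def label_task(length_mi, deadline_s):
    """Stand-in for a PSO-generated label (NOT the paper's method):
    pick the slowest VM that still meets the deadline, else the fastest."""
    for vm, mips in enumerate(VM_MIPS):
        if length_mi / mips <= deadline_s:
            return vm
    return len(VM_MIPS) - 1


def make_dataset(n, rng):
    """Generate n synthetic (features, vm_label) pairs."""
    data = []
    for _ in range(n):
        length = rng.uniform(1_000, 40_000)  # task length, million instructions
        deadline = rng.uniform(5, 40)        # deadline, seconds
        data.append(((length, deadline), label_task(length, deadline)))
    return data


def scale(features):
    """Divide each feature by its maximum so neither dominates the distance."""
    length, deadline = features
    return (length / 40_000, deadline / 40)


def knn_predict(train, features, k=5):
    """Predict a VM index by majority vote among the k nearest training tasks."""
    query = scale(features)
    nearest = sorted(train, key=lambda row: math.dist(scale(row[0]), query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=5):
    """Fraction of held-out tasks whose predicted VM matches the label."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


rng = random.Random(0)           # fixed seed so runs are repeatable
train = make_dataset(400, rng)   # plays the role of the generated training set
test = make_dataset(100, rng)    # held-out tasks to schedule
print(f"k-NN accuracy on held-out tasks: {accuracy(train, test):.2f}")
```

Once the training set exists, scheduling a new task is a single prediction, which is the source of the overhead reduction the abstract aims for: the optimizer runs once to label the data, not once per incoming task. Replacing `knn_predict` with Naive Bayes or Random Forest would give the kind of classifier comparison the abstract mentions.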
