Mitigation of Straggler in Virtual Machine Stack Using Supervised Learning Methodology

Reshma S. Gaykar, V. Khanaa, S. Joshi
Published in: 2023 International Conference on Emerging Smart Computing and Informatics (ESCI)
Publication date: 2023-03-01
DOI: 10.1109/ESCI56872.2023.10099658 (https://doi.org/10.1109/ESCI56872.2023.10099658)
Citations: 0

Abstract

Hadoop is an inexpensive analytics tool compared with other distributed storage systems on the market: it needs no dedicated standalone machines and runs on a group of commodity hardware. It is a distributed storage system that also parallelizes the processing of large data collections. Together with MapReduce, HDFS (Hadoop Distributed File System) provides a solution for systems that must process huge datasets. The main causes of stragglers in heterogeneous Hadoop clusters include load imbalance during storage, resource contention during task scheduling, hardware degradation from excessive usage, and software configuration issues in cluster management. Hadoop's performance degrades in a heterogeneous network because of this technical heterogeneity. In this article, we use a supervised machine learning (ML) technique to identify straggler nodes in a highly distributed network. The proposed technique identifies the slow-running job (straggler) in the network and reassigns it to another node in the stack so that the operation completes quickly. The features used for recognition include the virtual machine (VM) identifier, network bandwidth consumption, the number of processors and their load, memory load, and other parameters in the full dataset. Various feature extraction methods were used to build the training system, and the whole dataset was processed for heterogeneous features. After comprehensive empirical work, we evaluated our approach with the proposed classifier. As a result, the system outperforms typical machine learning models in classification performance.
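
The abstract does not name the classifier or the exact feature encoding, so the following is only a minimal sketch of the general idea: train a supervised model on labelled per-VM measurements, flag VMs predicted to be stragglers, and move their tasks to a healthy node. A nearest-centroid classifier stands in for the authors' model, and every name here (`VMSample`, `train_centroids`, `predict`, `reassign`), the feature scaling, and the training rows are hypothetical illustrations, not data or code from the paper.

```python
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class VMSample:
    """One measurement of a VM (hypothetical schema based on the abstract)."""
    vm_id: str             # VM identifier, used here only as a key
    bandwidth_mbps: float  # network bandwidth consumption
    cpu_load: float        # average processor load in [0, 1]
    mem_load: float        # memory load in [0, 1]

    def features(self):
        # Assumed scaling: bring bandwidth into roughly the same range as the loads.
        return (self.bandwidth_mbps / 1000.0, self.cpu_load, self.mem_load)


def train_centroids(samples, labels):
    """Supervised training step: average the feature vectors of each class."""
    sums, counts = {}, {}
    for sample, label in zip(samples, labels):
        feats = sample.features()
        acc = sums.setdefault(label, [0.0] * len(feats))
        for i, value in enumerate(feats):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(v / counts[label] for v in acc) for label, acc in sums.items()}


def predict(centroids, sample):
    """Return the label whose centroid is closest to the sample."""
    feats = sample.features()
    return min(centroids, key=lambda label: dist(feats, centroids[label]))


def reassign(centroids, vms, tasks):
    """Move tasks off predicted stragglers onto the least-loaded healthy VM."""
    healthy = [vm for vm in vms if predict(centroids, vm) == "normal"]
    if not healthy:
        return dict(tasks)  # nowhere to move work, keep the current placement
    target = min(healthy, key=lambda vm: vm.cpu_load)
    by_id = {vm.vm_id: vm for vm in vms}
    return {
        task: target.vm_id if predict(centroids, by_id[vm_id]) == "straggler" else vm_id
        for task, vm_id in tasks.items()
    }


# Hypothetical labelled training rows (invented for illustration only).
TRAIN = [
    VMSample("vm-a", 900, 0.95, 0.90),
    VMSample("vm-b", 850, 0.90, 0.85),
    VMSample("vm-c", 200, 0.30, 0.40),
    VMSample("vm-d", 300, 0.20, 0.35),
]
LABELS = ["straggler", "straggler", "normal", "normal"]

# Hypothetical live cluster state and task placement.
CLUSTER = [
    VMSample("vm-1", 880, 0.92, 0.88),
    VMSample("vm-2", 250, 0.35, 0.40),
    VMSample("vm-3", 150, 0.15, 0.30),
]
TASKS = {"map-0": "vm-1", "map-1": "vm-2"}

centroids = train_centroids(TRAIN, LABELS)
plan = reassign(centroids, CLUSTER, TASKS)
print(plan)
```

In this sketch the VM identifier only keys the lookup table rather than serving as a model input, and the reassignment policy (lowest CPU load among VMs predicted normal) is one simple choice among many; a real scheduler would also weigh data locality and remaining task progress.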