Mitigation of Straggler in Virtual Machine Stack Using Supervised Learning Methodology
Reshma S. Gaykar, V. Khanaa, S. Joshi
2023 International Conference on Emerging Smart Computing and Informatics (ESCI), 2023-03-01
DOI: 10.1109/ESCI56872.2023.10099658 (https://doi.org/10.1109/ESCI56872.2023.10099658)
Hadoop is an inexpensive analytical tool compared with other distributed storage systems on the market, because it needs no dedicated standalone machines and runs on a cluster of commodity hardware. It is a distributed storage system that also parallelizes the processing of large data collections. Together with MapReduce, HDFS (Hadoop Distributed File System) provides a solution for systems that must process huge datasets. Some of the main causes of stragglers in heterogeneous Hadoop clusters are load imbalance during storage, resource contention during task scheduling, hardware degradation due to excessive usage, and software configuration issues in cluster management. Hadoop's performance degrades in a heterogeneous network because of this technical heterogeneity. In this article, we use a supervised machine learning (ML) technique to identify straggler nodes in a highly distributed network. The proposed technique identifies the slow-running job (straggler) in the network and assigns it to another node in the stack so that the operation completes quickly. Recognition uses the virtual machine (VM) identifier, network bandwidth consumption, the number of processors and their load, memory load, and other parameters included in the full dataset. Various feature extraction methods were used to build the training system, and the entire dataset was processed for heterogeneous features. After comprehensive empirical work, we evaluated our approach using the proposed classifier. As a result, the system outperforms typical machine learning models in classification performance.
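The detect-then-reassign idea described above can be illustrated with a minimal sketch. It is not the authors' classifier, which the abstract does not specify: a simple nearest-centroid classifier stands in for it, and the three features (bandwidth use, CPU load, memory load, each scaled to 0..1), the training values, and the node names are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical per-VM feature vector: (bandwidth use, CPU load, memory load),
# each scaled to [0, 1]. The paper's actual feature set is richer than this.
Features = Tuple[float, float, float]

NORMAL, STRAGGLER = 0, 1


def centroid(rows: Sequence[Features]) -> Features:
    """Column-wise mean of a group of feature vectors."""
    n = len(rows)
    return tuple(sum(col) / n for col in zip(*rows))


def train(samples: Sequence[Features], labels: Sequence[int]) -> Dict[int, Features]:
    """Fit a nearest-centroid classifier: one mean vector per class label."""
    by_label: Dict[int, List[Features]] = {}
    for x, y in zip(samples, labels):
        by_label.setdefault(y, []).append(x)
    return {y: centroid(rows) for y, rows in by_label.items()}


def predict(model: Dict[int, Features], x: Features) -> int:
    """Return the label whose centroid is closest to x (squared Euclidean distance)."""
    def sq_dist(a: Features, b: Features) -> float:
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(model, key=lambda y: sq_dist(model[y], x))


def reassign(model: Dict[int, Features], nodes: Dict[str, Features], job_node: str) -> str:
    """Pick the node that should run the job.

    The job stays put if its node is classified as normal; otherwise it moves
    to the normal node with the lowest CPU load (an assumed placement rule).
    """
    if predict(model, nodes[job_node]) == NORMAL:
        return job_node
    healthy = [name for name, f in nodes.items() if predict(model, f) == NORMAL]
    if not healthy:
        return job_node  # nowhere better to go
    return min(healthy, key=lambda name: nodes[name][1])


# Made-up labeled training data: three normal VMs, three stragglers.
train_x = [(0.20, 0.30, 0.40), (0.30, 0.20, 0.30), (0.25, 0.35, 0.30),
           (0.90, 0.95, 0.90), (0.85, 0.90, 0.80), (0.95, 0.85, 0.90)]
train_y = [NORMAL, NORMAL, NORMAL, STRAGGLER, STRAGGLER, STRAGGLER]
model = train(train_x, train_y)

# Made-up current cluster state.
nodes = {
    "vm-1": (0.90, 0.90, 0.85),
    "vm-2": (0.30, 0.40, 0.35),
    "vm-3": (0.20, 0.25, 0.30),
}

print(reassign(model, nodes, "vm-1"))  # vm-3
```

Here the job on `vm-1` is flagged as a straggler and moved to `vm-3`, the normal node with the lowest CPU load. In practice the classifier would be trained on measured cluster data and compared against other models, as the abstract describes.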