Improvement in Performance of Image Classification based on Apache Spark
Sunil K, Sivagamasundari G
2022 2nd Asian Conference on Innovation in Technology (ASIANCON), published 2022-08-26
DOI: 10.1109/ASIANCON55314.2022.9909293 (https://doi.org/10.1109/ASIANCON55314.2022.9909293)
Citation count: 0
Abstract
Apache Spark is a widely used distributed computing framework for large-scale data processing and analytics in the field of Big Data. Organizations increasingly want to apply deep learning within their existing big data analysis pipelines, which would reduce the cost of maintaining additional computational resources. Classifying large-scale image data is an active research topic, and the convolutional neural network (CNN) is widely used as a standard deep learning model for image classification. This paper focuses on implementation: it demonstrates a combination of Apache Spark and a CNN intended to provide a significant improvement in performance for the image classification model. It aims to reduce the overheads involved in this approach by using novel open-source frameworks and to bring the components together in a unified pipeline. The improvements in various performance metrics obtained from our experimental setup are presented.