R. Roopa, V. Ryali, Tejasvi Shrivastava, SyedMuhammad Anwar
Proceedings of the 3rd International Conference on Integrated Intelligent Computing Communication & Security (ICIIC 2021). DOI: 10.2991/ahis.k.210913.036
Aadhaar Data Analysis Comparison in MapReduce, Hive and Spark
Aadhaar, a 12-digit unique identification number issued to every Indian, provides access to demographic and biometric information and is mandatory for purposes such as direct benefit transfer and healthcare. Aadhaar records must be stored for approximately 1.3 billion Indians, which makes their analysis a big-data problem. In this paper, the proposed hybrid model analyses the Aadhaar dataset with respect to several research questions, such as the count of applicants by gender, the number of approved applicants per state, and the count of applicants by age type. In existing systems, Aadhaar data is analysed either manually or on primitive SQL platforms, which may take days to complete. This paper instead focuses on analysing Aadhaar data with distributed computing frameworks running on top of Hadoop, namely MapReduce, Hive, and Apache Spark, whose results could support better decision-making by government agencies. Our comparison leads to the conclusion that Apache Spark is the most efficient of these frameworks in terms of performance.
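The three queries named above (applicants by gender, approved applicants per state, applicants by age type) share the same map, shuffle, and reduce pattern that MapReduce, Hive, and Spark each execute in distributed form. The following is a minimal single-process Python sketch of that pattern, not the authors' implementation. The `Enrolment` record fields, the `age_bucket` thresholds, and all function names are hypothetical assumptions, because the abstract does not describe the dataset schema or define the age types.

```python
"""Single-process sketch of map -> shuffle -> reduce for the three Aadhaar queries.

Hypothetical: the Enrolment schema and the age buckets are illustrative
assumptions, not taken from the paper.
"""
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, NamedTuple, Tuple


class Enrolment(NamedTuple):
    # Hypothetical schema: one record per applicant.
    state: str
    gender: str      # e.g. "M", "F", "T"
    age: int
    approved: bool   # whether an Aadhaar number was generated


def age_bucket(age: int) -> str:
    # Hypothetical "age type" grouping; the paper's categories are not given.
    if age < 18:
        return "minor"
    if age < 60:
        return "adult"
    return "senior"


Mapper = Callable[[Enrolment], Iterator[Tuple[str, int]]]


def map_reduce(records: Iterable[Enrolment], mapper: Mapper) -> Dict[str, int]:
    # Map + shuffle: every record emits (key, value) pairs, which are grouped
    # by key, as Hadoop does between the map and reduce stages.
    groups: Dict[str, List[int]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Reduce: sum the grouped values for each key.
    return {key: sum(values) for key, values in groups.items()}


def by_gender(r: Enrolment) -> Iterator[Tuple[str, int]]:
    # Count every applicant once under their gender.
    yield r.gender, 1


def approved_by_state(r: Enrolment) -> Iterator[Tuple[str, int]]:
    # Emit 1 for approved and 0 otherwise, so states with no approvals still appear.
    yield r.state, int(r.approved)


def by_age_type(r: Enrolment) -> Iterator[Tuple[str, int]]:
    # Count every applicant once under their (hypothetical) age bucket.
    yield age_bucket(r.age), 1


if __name__ == "__main__":
    sample = [
        Enrolment("Karnataka", "F", 34, True),
        Enrolment("Karnataka", "M", 12, False),
        Enrolment("Kerala", "F", 67, True),
        Enrolment("Kerala", "M", 45, True),
    ]
    print(map_reduce(sample, by_gender))          # {'F': 2, 'M': 2}
    print(map_reduce(sample, approved_by_state))  # {'Karnataka': 1, 'Kerala': 2}
    print(map_reduce(sample, by_age_type))        # {'adult': 2, 'minor': 1, 'senior': 1}
```

On a cluster, the same mapper and summing reducer would run as distributed Hadoop tasks. Hive would express each query as a `GROUP BY` with `COUNT` or `SUM`, and Spark as a `reduceByKey` over an RDD of key-value pairs. Which of these the paper used for each query is not stated in the abstract.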