Authors: Fugui Li, Ashutosh Sharma
Journal: Int. J. e Collab., pages 1-11
Published: 2022-03-01
DOI: https://doi.org/10.4018/ijec.304036
Missing Data Filling Algorithm for Big Data-Based Map-Reduce Technology
In big data, a large number of missing values makes it difficult to reach correct decisions. Missing values degrade the quality of information queries, distort data mining and analysis, and lead to misguided decisions. To handle missing values in real databases, we pre-populate the missing data and fill in the classification attributes using probabilistic reasoning. The reasoning is carried out in a Bayesian network, and the processing is parallelized for big data. The proposed algorithm is implemented in the Map-Reduce framework. The experimental results show that the Bayesian network construction method and probabilistic inference are effective for processing classification data, and demonstrate the parallelism of the algorithm on Hadoop.
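The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the simplest possible Bayesian network (a naive Bayes structure in which the attribute to be filled is the sole parent of every other attribute), simulates the map and reduce steps in plain Python rather than on Hadoop, and uses hypothetical names throughout (`map_counts`, `reduce_counts`, `fill_missing`, the `weather`/`traffic` columns, and `None` as the missing-value marker).

```python
from collections import Counter, defaultdict
from functools import reduce

MISSING = None  # assumption: missing values are represented as None


def map_counts(partition, target):
    """Map step: count value co-occurrences within one data partition."""
    counts = Counter()
    for row in partition:
        t = row[target]
        if t is MISSING:
            continue  # only rows with a known target contribute to the counts
        counts[("prior", t)] += 1
        for col, val in row.items():
            if col != target and val is not MISSING:
                counts[("cond", col, val, t)] += 1
    return counts


def reduce_counts(a, b):
    """Reduce step: merge partial counts from two partitions by key."""
    return a + b


def fill_missing(row, target, counts, alpha=1.0):
    """Fill the target with its most probable value given the observed attributes."""
    if row[target] is not MISSING:
        return row
    priors = {k[1]: n for k, n in counts.items() if k[0] == "prior"}
    total = sum(priors.values())
    # Number of distinct observed values per column, used for smoothing.
    domain = defaultdict(set)
    for k in counts:
        if k[0] == "cond":
            domain[k[1]].add(k[2])
    best, best_score = None, -1.0
    for t, n_t in priors.items():
        score = n_t / total  # P(target = t)
        for col, val in row.items():
            if col == target or val is MISSING:
                continue
            n = counts.get(("cond", col, val, t), 0)
            # Additive smoothing of P(col = val | target = t).
            score *= (n + alpha) / (n_t + alpha * len(domain[col]))
        if score > best_score:
            best, best_score = t, score
    return {**row, target: best}


# Toy data split into two partitions, standing in for distributed input splits.
partitions = [
    [
        {"weather": "rain", "traffic": "heavy"},
        {"weather": "rain", "traffic": "heavy"},
        {"weather": "sun", "traffic": "light"},
    ],
    [
        {"weather": "sun", "traffic": "light"},
        {"weather": "rain", "traffic": "light"},
        {"weather": "sun", "traffic": MISSING},
    ],
]

counts = reduce(reduce_counts, (map_counts(p, "traffic") for p in partitions))
filled = [fill_missing(r, "traffic", counts) for p in partitions for r in p]
```

Because the counts are plain sums, each partition can be counted independently and merged in any order, which is what makes this step a natural fit for Map-Reduce. A full Bayesian network would need counts for each node given its actual parent set rather than the single-parent structure assumed here.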