Maad M. Mijwil, Alaa Wagih Abdulqader, Sura Mazin Ali, A. Sadiq
Journal: IAES International Journal of Artificial Intelligence
Published: 2023-03-01
DOI: https://doi.org/10.11591/ijai.v12.i1.pp374-383
Null-values imputation using different modification random forest algorithm
The world now lives in an era of information and data, so it has become vital to collect data and store them in databases, where they can be processed to extract essential details. During this processing the null-value problem arises: missing values significantly affect operations such as analysis and prediction and lead to inaccurate outcomes. To address this, the authors modify the random forest technique to impute null values in datasets obtained from the University of California Irvine (UCI) machine learning repository. The datasets used are connectionist bench, phishing websites, breast cancer, ionosphere, and COVID-19. The modified random forest algorithm rests on three modifications and considers one, two, or three null values per row. Samples are drawn using the proposed less redundancy bootstrap. Each tree receives a distinct set of features chosen by hybrid feature selection. The final result is determined by ranked voting for classification. The modified random forest algorithm achieved better accuracy than the traditional algorithm, as it relied on four parameters, and imputed null values with sufficient accuracy: accuracy increased by 9.5%, 6.5%, and 5.25% for one, two, and three null values in the same row of the datasets, respectively.
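The abstract names its three ingredients (a less redundancy bootstrap, per-tree feature selection, and ranked voting) without defining them, so the following is only a minimal sketch of one possible reading and not the authors' implementation. It assumes that "less redundancy" means capping how often a row may repeat in a bootstrap sample, that a single randomly chosen feature can stand in for hybrid feature selection, and that ranked voting weights each learner by its rank on held-out rows. One-feature lookup tables replace full decision trees to keep the example short. All function names and parameters (`capped_bootstrap`, `max_repeats`, `rank_weighted_vote`, `fit_stump`, `impute`) are hypothetical.

```python
import random
from collections import Counter, defaultdict

def capped_bootstrap(n_rows, max_repeats=2, seed=0):
    """Sample n_rows indices with replacement, but allow each index at most
    max_repeats times (an ASSUMED reading of 'less redundancy bootstrap')."""
    rng = random.Random(seed)
    counts, sample = Counter(), []
    while len(sample) < n_rows:
        i = rng.randrange(n_rows)
        if counts[i] < max_repeats:
            counts[i] += 1
            sample.append(i)
    return sample

def rank_weighted_vote(predictions, scores):
    """ASSUMED form of ranked voting: the worst-scoring learner gets
    weight 1, the next 2, and so on; the heaviest class wins."""
    order = sorted(range(len(scores)), key=lambda j: scores[j])  # worst first
    totals = defaultdict(float)
    for rank, j in enumerate(order, start=1):
        totals[predictions[j]] += rank
    return max(totals, key=totals.get)

def fit_stump(rows, feature, target):
    """Stand-in for a tree: most common target value per feature value."""
    table = defaultdict(Counter)
    for r in rows:
        table[r[feature]][r[target]] += 1
    fallback = Counter(r[target] for r in rows).most_common(1)[0][0]
    def predict(row):
        if row[feature] in table:
            return table[row[feature]].most_common(1)[0][0]
        return fallback
    return predict

def impute(rows, target, query, n_learners=5, seed=0):
    """Predict the null in query[target] from rows where it is known."""
    complete = [r for r in rows if r[target] is not None]
    features = [c for c in range(len(query)) if c != target]
    rng = random.Random(seed)
    preds, scores = [], []
    for k in range(n_learners):
        chosen = capped_bootstrap(len(complete), seed=seed + k)
        in_bag = set(chosen)
        train = [complete[i] for i in chosen]
        # Score on rows left out of the sample; fall back to all rows if none.
        held_out = [r for i, r in enumerate(complete) if i not in in_bag] or complete
        stump = fit_stump(train, rng.choice(features), target)
        scores.append(sum(stump(r) == r[target] for r in held_out) / len(held_out))
        preds.append(stump(query))
    return rank_weighted_vote(preds, scores)

# Three weak learners say "b", two strong ones say "a": rank weights favour "a".
print(rank_weighted_vote(["a", "a", "b", "b", "b"], [0.9, 0.8, 0.3, 0.2, 0.1]))  # a
```

In this sketch the column containing the null is treated as the classification target, and a plain majority vote would have picked "b" in the final line. A real experiment would mask known values, impute them, and compare against the ground truth to measure accuracy.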