Application of ensemble methods for classification of water quality
M. Sakizadeh. International Journal of Water, Vol. 11, No. 1, pp. 114-131. Published 2017-04-25. DOI: https://doi.org/10.1504/IJW.2017.10004524
Groundwater pollution in the Shoosh Aquifer, located in Khuzestan Province, Iran, was studied using an eight-year data set collected from 30 sampling wells. Cluster analysis produced a dendrogram in which the 30 sampling wells were grouped into three statistically significant clusters. Two classification methods, k-nearest neighbour and classification tree, were used to classify the sampling stations by level of pollution. The optimum tree depth and number of neighbours were determined by 4-fold misclassification error; at their optimum settings, both classifiers had an error of 0.167. An ensemble was created from each of these base classifiers. In addition, given the small sample size in this study, the random subspace method, a feature selection technique, was combined with the k-nearest neighbour ensemble. The misclassification errors of the classification tree and k-nearest neighbour ensembles were 0.13 and 0.10, respectively. These results confirmed the high accuracy of ensemble methods for data classification.
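The random subspace idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the data are synthetic (three artificial classes of ten "wells" each), and the function names, the number of ensemble members, the subspace size, and k = 3 are all assumptions chosen for illustration. Each ensemble member is a k-nearest neighbour classifier that sees only a random subset of the features, the members vote, and the error is estimated by 4-fold misclassification rate.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k, features):
    """Majority vote among the k nearest training points, using only `features`."""
    dists = sorted(
        (sum((row[f] - x[f]) ** 2 for f in features), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def subspace_ensemble_predict(train_X, train_y, x, k, subspaces):
    """Random subspace ensemble: one kNN vote per feature subset, then a majority vote."""
    votes = Counter(knn_predict(train_X, train_y, x, k, fs) for fs in subspaces)
    return votes.most_common(1)[0][0]


def kfold_error(X, y, predict, n_folds=4, seed=0):
    """Fraction of samples misclassified when each fold is held out in turn."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    wrong = 0
    for fold in folds:
        held_out = set(fold)
        train_X = [X[i] for i in idx if i not in held_out]
        train_y = [y[i] for i in idx if i not in held_out]
        wrong += sum(predict(train_X, train_y, X[i]) != y[i] for i in fold)
    return wrong / len(X)


def make_synthetic_wells(n_per_class=10, n_features=6, seed=1):
    """Synthetic stand-in data (NOT the Shoosh Aquifer measurements)."""
    rng = random.Random(seed)
    X, y = [], []
    for label in range(3):
        for _ in range(n_per_class):
            X.append([rng.gauss(label * 2.0, 1.0) for _ in range(n_features)])
            y.append(label)
    return X, y


def run_demo(k=3, n_members=15, subspace_size=3, seed=2):
    """Return (single kNN error, random subspace kNN ensemble error)."""
    X, y = make_synthetic_wells()
    n_features = len(X[0])
    rng = random.Random(seed)
    # Each ensemble member gets its own random subset of feature indices.
    subspaces = [rng.sample(range(n_features), subspace_size) for _ in range(n_members)]
    single = kfold_error(
        X, y, lambda tX, ty, x: knn_predict(tX, ty, x, k, range(n_features))
    )
    ensemble = kfold_error(
        X, y, lambda tX, ty, x: subspace_ensemble_predict(tX, ty, x, k, subspaces)
    )
    return single, ensemble


if __name__ == "__main__":
    single_err, ensemble_err = run_demo()
    print(f"single kNN 4-fold error:       {single_err:.3f}")
    print(f"subspace ensemble 4-fold error: {ensemble_err:.3f}")
```

The errors printed for the synthetic data bear no relation to the 0.167, 0.13, and 0.10 reported in the abstract; the sketch only shows how such figures could be computed. With 30 samples, every error it reports is a multiple of 1/30.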
About the journal:
The IJW is a fully refereed journal, providing a high-profile international outlet for analyses and discussions of all aspects of water, environment and society.