Ming Jiang, Du Lian, Jianping Wu, Min Zhang, Gong Zexin
International Journal of Internet Manufacturing and Services. Published 2019-11-28. DOI: https://doi.org/10.1504/ijims.2019.103861
A classification algorithm based on weighted ML-kNN for multi-label data
The ML-kNN algorithm adapts the traditional kNN algorithm to multi-label classification by combining it with naive Bayesian classification. However, ML-kNN tends to assign wrong or incomplete label sets to unseen instances in two special cases: when the label frequencies in the training set are imbalanced, and when the training instances are unevenly distributed in the feature space. This paper therefore proposes a weighted ML-kNN algorithm (wML-kNN). The main idea is to assign a different weight to each label according to the proportion of labels and the mutual information of the spatial distribution of unseen instances relative to training instances. This weighting can reduce the probability of misjudging the unseen instance's label set. A comparative study was conducted on four multi-label datasets: a review classification dataset and three published benchmark datasets (yeast gene function analysis, natural scene classification, and musical sentiment classification). The results show that wML-kNN outperforms four other multi-label learning algorithms, including ML-kNN.
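To make the baseline concrete, the following is a minimal sketch of standard (unweighted) ML-kNN in Python with NumPy, assuming Euclidean distance and Laplace smoothing. It is not the paper's wML-kNN: the abstract does not give the weighting formula, so no weights are applied here. The function names (`mlknn_fit`, `mlknn_predict`), the smoothing parameter `s`, and the toy data are all illustrative assumptions.

```python
import numpy as np

def _neighbor_label_counts(X_query, X_train, Y_train, k, exclude_self=False):
    """For each query row, count how many of its k nearest training
    neighbours carry each label. Returns an int array (n_query, n_labels)."""
    # Pairwise squared Euclidean distances (assumed metric).
    d = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    if exclude_self:
        # When querying the training set against itself, skip the point itself.
        np.fill_diagonal(d, np.inf)
    idx = np.argsort(d, axis=1)[:, :k]
    return Y_train[idx].sum(axis=1)

def mlknn_fit(X, Y, k=3, s=1.0):
    """Estimate label priors and neighbour-count likelihoods (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=int)
    n, q = Y.shape
    # Smoothed prior probability that an instance has each label.
    prior1 = (s + Y.sum(axis=0)) / (2 * s + n)
    counts = _neighbor_label_counts(X, X, Y, k, exclude_self=True)
    c1 = np.zeros((q, k + 1))  # instances WITH label l whose neighbour count is j
    c0 = np.zeros((q, k + 1))  # instances WITHOUT label l whose neighbour count is j
    for l in range(q):
        for j in range(k + 1):
            hit = counts[:, l] == j
            c1[l, j] = np.sum(hit & (Y[:, l] == 1))
            c0[l, j] = np.sum(hit & (Y[:, l] == 0))
    # Smoothed likelihoods P(count = j | label present / absent).
    like1 = (s + c1) / (s * (k + 1) + c1.sum(axis=1, keepdims=True))
    like0 = (s + c0) / (s * (k + 1) + c0.sum(axis=1, keepdims=True))
    return {"X": X, "Y": Y, "k": k, "prior1": prior1,
            "like1": like1, "like0": like0}

def mlknn_predict(model, X_new):
    """Assign label l when prior * likelihood favours 'present' over 'absent'."""
    X_new = np.asarray(X_new, dtype=float)
    counts = _neighbor_label_counts(X_new, model["X"], model["Y"], model["k"])
    labels = np.arange(model["Y"].shape[1])[None, :]
    post1 = model["prior1"] * model["like1"][labels, counts]
    post0 = (1 - model["prior1"]) * model["like0"][labels, counts]
    return (post1 > post0).astype(int)

# Toy data (made up for illustration): two well-separated clusters,
# the first carrying only label 0 and the second only label 1.
X_toy = [[0, 0], [0, 1], [1, 0], [1, 1],
         [10, 10], [10, 11], [11, 10], [11, 11]]
Y_toy = [[1, 0]] * 4 + [[0, 1]] * 4
model = mlknn_fit(X_toy, Y_toy, k=3)
pred = mlknn_predict(model, [[0.5, 0.5], [10.5, 10.5]])
print(pred)
```

Because the prior for each label is estimated from its raw frequency, a rare label starts with a low prior and is easily outvoted, which is one way to picture the imbalance problem the abstract describes. A weighted variant would rescale these per-label scores, but the exact form of those weights is left to the paper.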