Imbalanced Networked Multi-label Classification with Active Learning
Ruilong Zhang, Lei Li, Yuhong Zhang, Chenyang Bu
2018 IEEE International Conference on Big Knowledge (ICBK), published 2018-11-01
DOI: 10.1109/ICBK.2018.00046
Citations: 3
Abstract
With the rapid development of social networks, networked multi-label classification algorithms have attracted wide attention. Most existing networked multi-label classification algorithms consider only the homogeneity or heterogeneity of the network and do not take its imbalance into account, even though imbalance is common in real network environments and deserves more attention. Moreover, the strategy used to select the training set is critical for a multi-label classification algorithm, because it directly affects both the parameter updates inside the classifier and the classifier's precision. Applying active learning to training-set selection can effectively improve the classifier's precision. Similarly, applying imbalanced-data processing strategies to training-set selection makes classifiers better suited to networks with imbalanced data. We therefore propose BSHD (Block Sampling with selecting the Highest Degree nodes), an active-learning-based algorithm for imbalanced networked multi-label classification. BSHD divides the network into blocks according to edge density and applies oversampling and undersampling to each block. It then selects the highest-degree nodes from each block to form the training set. Experimental results show that BSHD outperforms other state-of-the-art approaches.
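The abstract gives only the outline of BSHD (partition by edge density, resample each block, pick highest-degree nodes). The Python sketch below illustrates that outline under explicit assumptions and is not the authors' implementation. Assumed, not stated in the abstract: (1) an edge stays inside a block when the Jaccard overlap of its endpoints' closed neighbourhoods reaches a hypothetical `threshold`, and blocks are the connected components of the kept edges; (2) "oversampling and undersampling" means resizing every block to a hypothetical common quota `per_block`, repeating nodes of small blocks and truncating large ones; (3) ties in degree are broken by node id. The function names `edge_density`, `partition_blocks` and `select_training_set` are illustrative only.

```python
from collections import deque

# Hypothetical sketch of the BSHD outline; not the authors' implementation.


def edge_density(adj, u, v):
    """Assumed local edge-density score of edge (u, v): Jaccard overlap
    of the closed neighbourhoods N[u] and N[v]."""
    nu, nv = adj[u] | {u}, adj[v] | {v}
    return len(nu & nv) / len(nu | nv)


def partition_blocks(adj, threshold=0.5):
    """Blocks = connected components after ignoring edges whose local
    density is below the hypothetical `threshold`."""
    seen, blocks = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        seen.add(start)
        block, queue = [], deque([start])
        while queue:  # breadth-first search that follows dense edges only
            u = queue.popleft()
            block.append(u)
            for v in adj[u]:
                if v not in seen and edge_density(adj, u, v) >= threshold:
                    seen.add(v)
                    queue.append(v)
        blocks.append(sorted(block))
    return blocks


def select_training_set(adj, per_block, threshold=0.5):
    """Resize every block to `per_block` entries (assumed quota), taking
    nodes in descending order of degree."""
    training = []
    for block in partition_blocks(adj, threshold):
        ranked = sorted(block, key=lambda u: (-len(adj[u]), u))
        if len(ranked) >= per_block:
            # Undersample a large block: keep only its highest-degree nodes.
            training.extend(ranked[:per_block])
        else:
            # Oversample a small block: cycle through its nodes, highest degree first.
            training.extend(ranked[i % len(ranked)] for i in range(per_block))
    return training


# A four-node clique joined to the pair (4, 5) by the sparse bridge (3, 4).
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
adj = {u: set() for edge in edges for u in edge}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

print(partition_blocks(adj))                  # [[0, 1, 2, 3], [4, 5]]
print(select_training_set(adj, per_block=3))  # [3, 0, 1, 4, 5, 4]
```

In this toy graph the bridge (3, 4) scores 2/6 and is cut, so the clique and the pair become separate blocks. With `per_block=3` the larger block is cut down to its three highest-degree nodes and the smaller one is padded by repetition, so neither region dominates the training set. Because this variant uses only graph structure, it can run before any labels are queried, which fits an active-learning setting; the paper itself may define blocks and resampling differently.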