{"title":"A hidden treasure? Evaluating and extending latent methods for link-based classification","authors":"Aaron Fleming, Luke K. McDowell, Zane Markel","doi":"10.1109/IRI.2014.7051954","DOIUrl":null,"url":null,"abstract":"Many information tasks involve objects that are explicitly or implicitly connected in a network, such as webpages connected by hyperlinks or people linked by \"friendships\" in a social network. Research on link-based classification (LBC) has studied how to leverage these connections to improve classification accuracy. This research broadly falls into two groups. First, there are methods that use the original attributes and/or links of the network, via a link-aware supervised classifier or via a non-learning method based on label propagation or random walks. Second, there are recent methods that first compute a set of latent features or links that summarize the network, then use a (hopefully simpler) supervised classifier or label propagation method. Some work has claimed that the latent methods can improve accuracy, but has not adequately compared with the best non-latent methods. In response, this paper provides the first substantial comparison between these two groups. We find that certain non-latent methods typically provide the best overall accuracy, but that latent methods can be competitive when a network is densely-labeled or when the attributes are not very informative. Moreover, we introduce two novel combinations of these methods that in some cases substantially increase accuracy.","PeriodicalId":360013,"journal":{"name":"Proceedings of the 2014 IEEE 15th International Conference on Information Reuse and Integration (IEEE IRI 2014)","volume":"54 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2014 IEEE 15th International Conference on Information Reuse and Integration (IEEE IRI 2014)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IRI.2014.7051954","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4
Abstract
Many information tasks involve objects that are explicitly or implicitly connected in a network, such as webpages connected by hyperlinks or people linked by "friendships" in a social network. Research on link-based classification (LBC) has studied how to leverage these connections to improve classification accuracy. This research broadly falls into two groups. First, there are methods that use the original attributes and/or links of the network, via a link-aware supervised classifier or via a non-learning method based on label propagation or random walks. Second, there are recent methods that first compute a set of latent features or links that summarize the network, then use a (hopefully simpler) supervised classifier or label propagation method. Some work has claimed that the latent methods can improve accuracy, but has not adequately compared with the best non-latent methods. In response, this paper provides the first substantial comparison between these two groups. We find that certain non-latent methods typically provide the best overall accuracy, but that latent methods can be competitive when a network is densely-labeled or when the attributes are not very informative. Moreover, we introduce two novel combinations of these methods that in some cases substantially increase accuracy.