A hidden treasure? Evaluating and extending latent methods for link-based classification

Proceedings of the 2014 IEEE 15th International Conference on Information Reuse and Integration (IEEE IRI 2014)
Pub Date: 2014-08-01
DOI: 10.1109/IRI.2014.7051954

Aaron Fleming, Luke K. McDowell, Zane Markel


Citations: 4

Abstract

Many information tasks involve objects that are explicitly or implicitly connected in a network, such as webpages connected by hyperlinks or people linked by "friendships" in a social network. Research on link-based classification (LBC) has studied how to leverage these connections to improve classification accuracy. This research broadly falls into two groups. First, there are methods that use the original attributes and/or links of the network, via a link-aware supervised classifier or via a non-learning method based on label propagation or random walks. Second, there are recent methods that first compute a set of latent features or links that summarize the network, then use a (hopefully simpler) supervised classifier or label propagation method. Some work has claimed that the latent methods can improve accuracy, but has not adequately compared them with the best non-latent methods. In response, this paper provides the first substantial comparison between these two groups. We find that certain non-latent methods typically provide the best overall accuracy, but that latent methods can be competitive when a network is densely labeled or when the attributes are not very informative. Moreover, we introduce two novel combinations of these methods that in some cases substantially increase accuracy.
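To make the "non-learning method based on label propagation" mentioned in the abstract concrete, the following is a minimal sketch of generic iterative label propagation on a toy graph. It is not the paper's implementation: the function name `propagate_labels`, the toy graph, the seed labels, and the fixed iteration count are all assumptions chosen for illustration.

```python
def propagate_labels(adjacency, known_labels, classes, iterations=20):
    """Hypothetical helper: generic label propagation with clamped seed nodes.

    adjacency    -- dict mapping each node to a list of its neighbors
    known_labels -- dict mapping the labeled (seed) nodes to their class
    classes      -- list of all class names
    """
    # Every node starts with a uniform score over the classes.
    uniform = 1.0 / len(classes)
    scores = {node: {c: uniform for c in classes} for node in adjacency}

    # Seed nodes are clamped to their known label.
    for node, label in known_labels.items():
        scores[node] = {c: 1.0 if c == label else 0.0 for c in classes}

    for _ in range(iterations):
        updated = {}
        for node, neighbors in adjacency.items():
            # Seeds (and isolated nodes) keep their current scores.
            if node in known_labels or not neighbors:
                updated[node] = scores[node]
                continue
            # Other nodes take the average of their neighbors' scores.
            updated[node] = {
                c: sum(scores[n][c] for n in neighbors) / len(neighbors)
                for c in classes
            }
        scores = updated

    # Predict the highest-scoring class for every node.
    return {node: max(classes, key=lambda c: scores[node][c]) for node in adjacency}


# Toy graph (an assumption for illustration): two triangles joined by the edge c-d.
adjacency = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c", "e", "f"],
    "e": ["d", "f"],
    "f": ["d", "e"],
}
known_labels = {"a": "x", "f": "y"}  # only two nodes are labeled

predictions = propagate_labels(adjacency, known_labels, classes=["x", "y"])
```

In this toy graph the unlabeled nodes in each triangle end up with the label of the seed in their own triangle, which is the behavior such methods rely on when linked nodes tend to share labels. Note that the sketch uses only links and seed labels; it ignores node attributes, which the link-aware supervised classifiers described in the abstract would also use.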



