Selecting Nodes with Inhomogeneous Profile for Labeling for Network-Based Semi-supervised Learning

2013 BRICS Congress on Computational Intelligence and 11th Brazilian Congress on Computational Intelligence. Pub Date: 2013-09-08. DOI: 10.1109/BRICS-CCI-CBIC.2013.77

Bilzã Araújo, Liang Zhao


Citations: 1

Abstract

Network-based Semi-Supervised Learning (NbSSL) propagates labels through affinity networks by exploiting the network topology, much as information spreads in trust networks. In NbSSL, not only the unlabeled data instances but also the labeled ones can bias the classification performance. Herein, we present results and a discussion of this phenomenon. Even the suitability of the free parameters of NbSSL algorithms varies with the available labeled data. We therefore propose a method for selecting representative data instances to be labeled for NbSSL. In our sense, the representativeness of a node is related to how inhomogeneous its profile is with respect to the whole network. The proposed method uses Complex Networks centrality measures to identify which nodes present an inhomogeneous profile. We perform this study by applying three NbSSL algorithms to Girvan-Newman and Lancichinetti-Fortunato-Radicchi modular networks. In the former, nodes with a high clustering coefficient are good representatives of the data; in the latter, nodes with high betweenness are the good representatives. A high clustering coefficient means that the node lies in a densely connected motif (clique), whereas a high betweenness means that the node lies between the modular structures, interconnecting them. These results show that NbSSL performance can be improved by selecting representative data instances for manual labeling.
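The two centrality measures named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses the standard definitions of the local clustering coefficient and of shortest-path betweenness (computed with Brandes' algorithm), and the top-k selection rule, the function names, and the toy graph `adj` are assumptions made purely for illustration, since the abstract does not state how nodes are ranked or how many are chosen.

```python
from collections import deque

# Hypothetical toy network: two triangles (0,1,2) and (3,4,5) joined by the bridge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}

def clustering_coefficient(adj, v):
    """Fraction of pairs of v's neighbours that are themselves connected."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u in nbrs for w in nbrs if u < w and w in adj[u])
    return 2.0 * links / (k * (k - 1))

def betweenness(adj):
    """Unnormalised shortest-path betweenness (Brandes) for an undirected, unweighted graph."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                  # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2.0 for v, b in bc.items()}

def select_for_labeling(adj, k, measure):
    """Assumed selection rule: the k highest-scoring nodes, ties broken by node id."""
    if measure == "clustering":
        scores = {v: clustering_coefficient(adj, v) for v in adj}
    elif measure == "betweenness":
        scores = betweenness(adj)
    else:
        raise ValueError("measure must be 'clustering' or 'betweenness'")
    return sorted(adj, key=lambda v: (-scores[v], v))[:k]

print(select_for_labeling(adj, 2, "betweenness"))  # [2, 3]
print(select_for_labeling(adj, 2, "clustering"))   # [0, 1]
```

On this toy graph the betweenness ranking picks the two bridge nodes that interconnect the triangles, while the clustering ranking picks nodes sitting inside a clique, which matches the interpretation of the two measures given in the abstract. A plain top-k rule can draw all selected nodes from the same module, so a practical selection scheme would likely need an additional step to spread labels across modules.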
