Selecting Nodes with Inhomogeneous Profile for Labeling for Network-Based Semi-supervised Learning

Bilzã Araújo, Liang Zhao
DOI: 10.1109/BRICS-CCI-CBIC.2013.77 (https://doi.org/10.1109/BRICS-CCI-CBIC.2013.77)
Published in: 2013 BRICS Congress on Computational Intelligence and 11th Brazilian Congress on Computational Intelligence
Publication date: 2013-09-08
Citations: 1

Abstract

Selecting Nodes with Inhomogeneous Profile for Labeling for Network-Based Semi-supervised Learning
Network-based Semi-Supervised Learning (NbSSL) propagates labels in affinity networks by taking advantage of the network topology, much as information spreads in trust networks. In NbSSL, not only the unlabeled data instances but also the labeled ones can bias the classification performance. Herein, we present some results and a discussion of this phenomenon. Even the suitability of the free parameters of the NbSSL algorithms varies with the available labeled data. We therefore propose a method for selecting representative data instances to be labeled for NbSSL. In our view, the representativeness of a node is related to how inhomogeneous its profile is with respect to the whole network. The proposed method uses complex-network centrality measures to identify which nodes have an inhomogeneous profile. We carry out this study by applying three NbSSL algorithms to Girvan-Newman and Lancichinetti-Fortunato-Radicchi modular networks. In the former, nodes with a high clustering coefficient are good representatives of the data; in the latter, nodes with high betweenness are the good representatives. A high clustering coefficient means that the node lies in a densely connected motif (clique), whereas a high betweenness means that the node lies between modular structures, interconnecting them. These results show that NbSSL performance can be improved by selecting representative data instances for manual labeling.
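
The two measures named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how the measures are computed or how many nodes are selected, so the toy graph, the function names, and the simple "take the k highest-scoring nodes" rule below are all assumptions made for illustration. The sketch computes the local clustering coefficient and the shortest-path betweenness (using Brandes' algorithm) on two triangles joined by a single bridge edge, then ranks nodes as labeling candidates.

```python
from collections import deque


def clustering_coefficient(adj, v):
    """Fraction of pairs of v's neighbours that are themselves connected."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in adj[nbrs[i]]
    )
    return 2.0 * links / (k * (k - 1))


def betweenness(adj):
    """Unnormalised shortest-path betweenness (Brandes, unweighted, undirected)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}   # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}    # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s] = 1
        dist[s] = 0
        queue = deque([s])
        while queue:                   # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                   # accumulate dependencies, farthest first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # every undirected pair was counted once from each endpoint
    return {v: b / 2.0 for v, b in bc.items()}


def select_for_labeling(scores, k):
    """Hypothetical selection rule: the k highest-scoring nodes (ties by node id)."""
    return sorted(scores, key=lambda v: (-scores[v], v))[:k]


# Toy modular graph (an assumption): triangles {0,1,2} and {3,4,5}, bridge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = {v: set() for v in range(6)}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

cc = {v: clustering_coefficient(adj, v) for v in adj}
bc = betweenness(adj)
by_clustering = select_for_labeling(cc, 2)
by_betweenness = select_for_labeling(bc, 2)
```

In this toy graph the bridge endpoints 2 and 3 have the highest betweenness, while the remaining nodes sit inside fully connected triangles and have the highest clustering coefficient. That matches the abstract's description of the two kinds of inhomogeneous profile.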