Transductive Classification on Heterogeneous Information Networks with Edge Betweenness-based Normalization

P. Bangcharoensap, T. Murata, Hayato Kobayashi, N. Shimizu
{"title":"Transductive Classification on Heterogeneous Information Networks with Edge Betweenness-based Normalization","authors":"P. Bangcharoensap, T. Murata, Hayato Kobayashi, N. Shimizu","doi":"10.1145/2835776.2835799","DOIUrl":null,"url":null,"abstract":"This paper proposes a novel method for transductive classification on heterogeneous information networks composed of multiple types of vertices. Such networks naturally represent many real-world Web data such as DBLP data (author, paper, and conference). Given a network where some vertices are labeled, the classifier aims to predict labels for the remaining vertices by propagating the labels to the entire network. In the label propagation process, many studies reduce the importance of edges connecting to a high-degree vertex. The assumption is unsatisfactory when reliability of a label of a vertex cannot be implied from its degree. On the basis of our intuition that edges bridging across communities are less trustworthy, we adapt edge betweenness to imply the importance of edges. Since directly applying the conventional edge betweenness is inefficient on heterogeneous networks, we propose two additional refinements. First, the centrality utilizes the fact that networks contain multiple types of vertices. Second, the centrality ignores flows originating from endpoints of considering edges. The experimental results on real-world datasets show our proposed method is more effective than a state-of-the-art method, GNetMine. On average, our method yields 92.79 ± 1.25% accuracy on a DBLP network even if only 1.92% of vertices are labeled. Our simple weighting scheme results in more than 5 percentage points increase in accuracy compared with GNetMine.","PeriodicalId":20567,"journal":{"name":"Proceedings of the Ninth ACM International Conference on Web Search and Data Mining","volume":"37 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2016-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"16","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Ninth ACM International Conference on Web Search and Data Mining","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2835776.2835799","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 16

Abstract

This paper proposes a novel method for transductive classification on heterogeneous information networks composed of multiple types of vertices. Such networks naturally represent many real-world Web data such as DBLP data (author, paper, and conference). Given a network where some vertices are labeled, the classifier aims to predict labels for the remaining vertices by propagating the labels to the entire network. In the label propagation process, many studies reduce the importance of edges connecting to a high-degree vertex. The assumption is unsatisfactory when reliability of a label of a vertex cannot be implied from its degree. On the basis of our intuition that edges bridging across communities are less trustworthy, we adapt edge betweenness to imply the importance of edges. Since directly applying the conventional edge betweenness is inefficient on heterogeneous networks, we propose two additional refinements. First, the centrality utilizes the fact that networks contain multiple types of vertices. Second, the centrality ignores flows originating from endpoints of considering edges. The experimental results on real-world datasets show our proposed method is more effective than a state-of-the-art method, GNetMine. On average, our method yields 92.79 ± 1.25% accuracy on a DBLP network even if only 1.92% of vertices are labeled. Our simple weighting scheme results in more than 5 percentage points increase in accuracy compared with GNetMine.
基于边缘间距归一化的异构信息网络转换分类
提出了一种针对由多类型顶点组成的异构信息网络进行换向分类的新方法。这样的网络自然表示许多现实世界的Web数据,如DBLP数据(作者、论文和会议)。给定一个网络,其中一些顶点被标记,分类器的目标是通过将标签传播到整个网络来预测剩余顶点的标签。在标签传播过程中,许多研究降低了连接到高度顶点的边的重要性。当一个顶点的标签的可靠性不能从它的度中暗示出来时,这个假设是不令人满意的。基于我们的直觉,即跨社区的边界不太值得信赖,我们调整了边界之间的距离来暗示边界的重要性。由于直接应用传统的边缘间性在异构网络上是低效的,我们提出了两个额外的改进。首先,中心性利用了网络包含多种类型顶点的事实。其次,中心性忽略了从考虑边缘的端点产生的流。在真实数据集上的实验结果表明,我们提出的方法比最先进的方法GNetMine更有效。即使只有1.92%的顶点被标记,我们的方法在DBLP网络上的平均准确率为92.79±1.25%。与GNetMine相比,我们的简单加权方案使准确率提高了5个百分点以上。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信