Multiple Features Driven Author Name Disambiguation

2021 IEEE International Conference on Web Services (ICWS)
Pub Date: 2021-09-01
DOI: 10.1109/ICWS53863.2021.00071

Qiang Zhou, Wei Chen, Weiqing Wang, Jiajie Xu, Lei Zhao


Citations: 1

Abstract

Author Name Disambiguation (AND) has received increasing attention as the number of academic publications grows. Existing studies have proposed many AND approaches built on different types of information: raw document features (e.g., co-authors, title, and keywords), fusion features (e.g., a hybrid publication embedding derived from the raw document features), local structural information (e.g., a publication's neighborhood on a graph), and global structural information (e.g., the interactions between a node and other nodes on a graph). However, no prior work has taken all four types of information into account for AND. To fill this gap, we propose a novel framework named MFAND (Multiple Features Driven Author Name Disambiguation). Specifically, we first use the raw document features and fusion features to construct six similarity graphs for each author name to be disambiguated. Several pruning strategies are applied before information extraction to remove noise. Next, the global and local structural information extracted from these graphs is fed into a novel encoder called R3JG, which integrates and reconstructs the four types of information associated with an author; the goal is to learn latent information that improves the generalization ability of MFAND. The integrated and reconstructed information is then fed into a binary classification model for disambiguation. Finally, we evaluate MFAND on two real-world datasets, and the experimental results show that it outperforms all the state-of-the-art methods it is compared with.
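The abstract describes the pipeline only at a high level: it does not name the six graphs, their similarity measures, the pruning strategies, or the internals of R3JG. The Python sketch below is therefore not MFAND. It is a hypothetical illustration, under stated assumptions, of the first and last stages only: building per-name similarity graphs from raw document features (co-authors, title words, keywords), pruning weak edges, and turning a publication pair's edge weights into input for a binary same-author decision. Jaccard similarity, the `min_weight` threshold, the toy records, and every name in the code (`Publication`, `build_similarity_graph`, `prune`, `pair_features`, `same_author`) are assumptions. Fusion-feature graphs and the R3JG encoder are omitted, and `same_author` is a simple placeholder for a trained classifier.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Publication:
    """Hypothetical record holding raw document features for one publication."""
    pid: str
    coauthors: frozenset
    title_words: frozenset
    keywords: frozenset


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity |a & b| / |a | b|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def edge_key(pid_a: str, pid_b: str) -> tuple:
    """Order-independent key for an undirected edge between two publications."""
    return tuple(sorted((pid_a, pid_b)))


def build_similarity_graph(pubs, feature: str) -> dict:
    """Weighted undirected graph {edge_key: weight} over one raw feature (Jaccard is an assumption)."""
    graph = {}
    for p, q in combinations(pubs, 2):
        weight = jaccard(getattr(p, feature), getattr(q, feature))
        if weight > 0:
            graph[edge_key(p.pid, q.pid)] = weight
    return graph


def prune(graph: dict, min_weight: float) -> dict:
    """Keep only edges with weight >= min_weight (hypothetical stand-in for noise pruning)."""
    return {edge: w for edge, w in graph.items() if w >= min_weight}


def pair_features(graphs: list, pid_a: str, pid_b: str) -> list:
    """One feature per graph: the pair's edge weight, or 0.0 if the edge is absent or pruned."""
    key = edge_key(pid_a, pid_b)
    return [g.get(key, 0.0) for g in graphs]


def same_author(features: list, threshold: float = 0.3) -> bool:
    """Placeholder for a trained binary classifier: thresholds the mean similarity."""
    return sum(features) / len(features) >= threshold


if __name__ == "__main__":
    # Made-up toy records sharing one ambiguous author name.
    pubs = [
        Publication("p1", frozenset({"coauthor_a", "coauthor_b"}),
                    frozenset({"graph", "embedding"}), frozenset({"AND"})),
        Publication("p2", frozenset({"coauthor_a", "coauthor_c"}),
                    frozenset({"graph", "clustering"}), frozenset({"AND"})),
        Publication("p3", frozenset({"coauthor_d"}),
                    frozenset({"protein", "folding"}), frozenset({"biology"})),
    ]
    # One pruned graph per raw feature (three here; the paper builds six, including fusion features).
    graphs = [prune(build_similarity_graph(pubs, f), min_weight=0.2)
              for f in ("coauthors", "title_words", "keywords")]
    for a, b in [("p1", "p2"), ("p1", "p3")]:
        feats = pair_features(graphs, a, b)
        print(a, b, [round(x, 3) for x in feats], same_author(feats))
```

With these toy records, p1 and p2 share a co-author, a title word, and a keyword, so their pair features are [0.333, 0.333, 1.0] and the placeholder rule groups them; p3 shares nothing with p1 and is kept apart. In a real system, the hand-set threshold in `same_author` would be replaced by a model trained on labeled publication pairs.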

