Author Name Disambiguation via Heterogeneous Network Embedding from Structural and Semantic Perspectives

2022 IEEE 34th International Conference on Tools with Artificial Intelligence (ICTAI). Pub Date: 2022-10-01. DOI: 10.1109/ICTAI56018.2022.00043

Wenjin Xie, Siyuan Liu, Xiaomeng Wang, Tao Jia


Citations: 1

Abstract


Author Name Disambiguation via Heterogeneous Network Embedding from Structural and Semantic Perspectives

Name ambiguity, such as multiple authors sharing the same name, is common in academic digital libraries. It complicates academic data management and analysis, which makes name disambiguation necessary. Name disambiguation divides the publications that carry the same name into groups, each belonging to a unique author. Publications carry a large amount of attribute information, and traditional methods get bogged down in feature selection: they select attributes manually and treat them equally, which usually hurts accuracy. The proposed method is based on representation learning for heterogeneous networks and on clustering, and it uses self-attention to address this problem. The representation of a publication combines a structural representation and a semantic representation. The structural representation is obtained by meta-path-based sampling and a skip-gram-based embedding method, with meta-path-level attention introduced to learn the weight of each feature automatically. The semantic representation is generated using NLP tools. Our proposal achieves better name disambiguation accuracy than the baselines, and the ablation experiments demonstrate the improvement contributed by feature selection and by meta-path-level attention. The experimental results show that the method is better at capturing the most attributes from publications while reducing the impact of redundant information.
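The abstract does not spell out the sampling or attention details, so the following is only a minimal sketch of the two structural ideas it names: meta-path-guided random walks over a heterogeneous network, and a softmax-weighted combination of per-meta-path vectors. It assumes papers are linked through co-author and venue attributes; every identifier, the toy data, and the fixed scores are hypothetical. In a real model the vectors would come from a skip-gram model trained on the walks and the scores would be learned, and the clustering step is omitted here.

```python
import math
import random

# Hypothetical toy data: papers that share one ambiguous author name,
# linked to their co-authors and venues. All identifiers are made up.
paper_coauthors = {
    "p1": ["a_li", "a_chen"],
    "p2": ["a_li"],
    "p3": ["a_wu"],
    "p4": ["a_wu", "a_chen"],
}
paper_venue = {
    "p1": ["v_kdd"],
    "p2": ["v_kdd"],
    "p3": ["v_icml"],
    "p4": ["v_icml"],
}


def invert(paper_to_attrs):
    """Build the attribute -> papers index needed to walk back to papers."""
    index = {}
    for paper, attrs in paper_to_attrs.items():
        for attr in attrs:
            index.setdefault(attr, []).append(paper)
    return index


def metapath_walk(start, paper_to_attrs, num_hops, rng):
    """Random walk along one paper-attribute-paper meta-path.

    Only the visited paper nodes are returned, so each walk is a
    'sentence' of papers that a skip-gram model could be trained on.
    """
    attr_to_papers = invert(paper_to_attrs)
    walk = [start]
    current = start
    for _ in range(num_hops):
        attrs = paper_to_attrs.get(current, [])
        if not attrs:
            break  # dead end: this paper has no attribute of this type
        attr = rng.choice(attrs)
        current = rng.choice(attr_to_papers[attr])
        walk.append(current)
    return walk


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(per_path_vectors, scores):
    """Weighted sum of one vector per meta-path, using softmax weights."""
    weights = softmax(scores)
    dim = len(per_path_vectors[0])
    return [
        sum(w * vec[i] for w, vec in zip(weights, per_path_vectors))
        for i in range(dim)
    ]


rng = random.Random(0)
pap_walk = metapath_walk("p1", paper_coauthors, num_hops=4, rng=rng)
pvp_walk = metapath_walk("p1", paper_venue, num_hops=4, rng=rng)

# Placeholder vectors and scores; a real model would learn both.
fused = fuse([[1.0, 0.0], [0.0, 1.0]], scores=[0.0, 0.0])
```

With equal scores the two meta-paths contribute equally to the fused vector. Raising one score shifts weight toward that meta-path, which is the role the abstract assigns to meta-path-level attention.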

