Mining Author Identifiers for PubMed by Linking to Open Bibliographic Databases

Li Zhang, Yong Huang, Qikai Cheng, Wei Lu
Published in: 2020 IEEE 20th International Conference on Software Quality, Reliability and Security Companion (QRS-C), 2020-12-01
DOI: https://doi.org/10.1109/QRS-C51114.2020.00043
Citation count: 5

Abstract

Author identifiers (IDs) are essential for many downstream tasks, such as co-author network analysis and scientist mobility analysis. Although PubMed is a widely used database, the National Institutes of Health (NIH) does not officially provide author IDs for it, which restricts some identifier-based research and systems. This study exploits three open bibliographic databases, Aminer, Microsoft Academic Graph (MAG), and Semantic Scholar (S2), to associate author IDs with PubMed records. To this end, paper linking and author linking were performed to mine paper and author links between PubMed and these databases. The performance of author name disambiguation (AND) was evaluated on two datasets. Our findings suggest that S2 covers the full volume of PubMed in terms of link completeness. With respect to the correctness of author IDs, S2 and MAG performed better than Aminer. The best F1 score among the available identifiers is below 90%, indicating that AND for large-scale databases remains a difficult task and that further improvement is needed. We have made the final dataset publicly available to facilitate future research.
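To make the F1 figure concrete, here is a minimal sketch of one common way to score author name disambiguation: pairwise F1, where two papers count as a positive pair if they are assigned the same author ID. The abstract does not say which F1 variant the authors used, so the pairwise formulation, the function names (`same_author_pairs`, `pairwise_f1`), and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def same_author_pairs(assignment):
    """Return the set of paper pairs that share an author ID.

    `assignment` maps a paper key (hypothetical, e.g. PMID plus author
    position) to an author ID from one database.
    """
    groups = defaultdict(list)
    for paper, author_id in assignment.items():
        groups[author_id].append(paper)
    pairs = set()
    for papers in groups.values():
        # Sort so each unordered pair has a single canonical form.
        for a, b in combinations(sorted(papers), 2):
            pairs.add((a, b))
    return pairs


def pairwise_f1(predicted, gold):
    """Pairwise precision, recall, and F1 over papers found in both mappings."""
    shared = predicted.keys() & gold.keys()
    pred_pairs = same_author_pairs({k: predicted[k] for k in shared})
    gold_pairs = same_author_pairs({k: gold[k] for k in shared})
    true_pos = len(pred_pairs & gold_pairs)
    precision = true_pos / len(pred_pairs) if pred_pairs else 0.0
    recall = true_pos / len(gold_pairs) if gold_pairs else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy example (invented data): the gold standard says p1, p2, p3 share one
# author; the database under test splits p3 off and merges it with p4.
gold = {"p1": "A", "p2": "A", "p3": "A", "p4": "B"}
predicted = {"p1": "x", "p2": "x", "p3": "y", "p4": "y"}
precision, recall, f1 = pairwise_f1(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy case the predicted IDs both split one author (lowering recall) and merge two authors (lowering precision), the two error types that any author ID scheme has to balance.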