PubMed Computed Authors in 2024: an open resource of disambiguated author names in biomedical literature.

Shubo Tian, Qingyu Chen, Donald C Comeau, W John Wilbur, Zhiyong Lu
Bioinformatics (Oxford, England). Journal Article. Published 2024-11-09.
DOI: 10.1093/bioinformatics/btae672 (https://doi.org/10.1093/bioinformatics/btae672)

Abstract

Summary: Over 55% of author names in PubMed are ambiguous: the same name is shared by different individual researchers. This makes precise literature retrieval difficult for author name queries, a common type of search in the biomedical literature. In response, we present a comprehensive dataset of disambiguated authors. Specifically, we complement the automatic PubMed Computed Authors algorithm with the latest ORCID data to improve accuracy. The enhanced algorithm achieves high performance in author name disambiguation, and the resulting dataset contains more than 21 million disambiguated authors for over 35 million PubMed articles and is updated incrementally every week. More importantly, we make the dataset publicly available so that the community can use it in a wide variety of applications beyond assisting PubMed's author name queries. Finally, we propose a set of best-practice guidelines for authors on the use of their names.

Availability and implementation: The PubMed Computed Authors dataset is publicly available for bulk download at: https://ftp.ncbi.nlm.nih.gov/pub/lu/ComputedAuthors/. It can also be queried through a web API at: https://www.ncbi.nlm.nih.gov/research/bionlp/APIs/authors/.

Supplementary information: Supplementary data are available at Bioinformatics online.
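
To make the notion of an "ambiguous" author name in the summary concrete, the following is a minimal sketch that counts the share of distinct names attached to more than one individual. It is an illustration only: the function name `ambiguous_name_fraction`, the `(name, author_id)` record shape, and the toy data are assumptions made for this example, not the format of the PubMed Computed Authors dataset, and the paper's own counting method may differ.

```python
from collections import defaultdict


def ambiguous_name_fraction(records):
    """Return the fraction of distinct names shared by 2+ author IDs.

    `records` is an iterable of (name, author_id) pairs. This record
    shape is hypothetical and chosen only for illustration.
    """
    ids_by_name = defaultdict(set)
    for name, author_id in records:
        ids_by_name[name].add(author_id)

    if not ids_by_name:
        return 0.0

    # A name is ambiguous when it maps to more than one individual.
    ambiguous = sum(1 for ids in ids_by_name.values() if len(ids) > 1)
    return ambiguous / len(ids_by_name)


# Toy data (invented): two of the four names are shared by two people.
toy_records = [
    ("Smith J", "A1"),
    ("Smith J", "A2"),
    ("Wang L", "A3"),
    ("Wang L", "A3"),  # same person on a second article
    ("Garcia M", "A4"),
    ("Garcia M", "A5"),
    ("Lee K", "A6"),
]

fraction = ambiguous_name_fraction(toy_records)
print(f"{fraction:.0%} of distinct names are ambiguous")
```

Grouping IDs into a set per name means repeated appearances of the same individual do not inflate the count; only names that resolve to different individuals are treated as ambiguous.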
