Leandro Rodrigues da Silva Souza, Daniel Hilário da Silva, Caio Tonus Ribeiro, Daiane Alves da Silva, Slawomir J. Nasuto, Catherine M. Sweeney-Reed, Adriano de Oliveira Andrade, Adriano Alves Pereira
Title: PubMedMetaTool: Automated metadata extraction from PubMed using Python for bibliometric analysis
Journal: Software Impacts, volume 24, Article 100766
Published: 2025-04-29
Publication type: Journal Article
DOI: 10.1016/j.simpa.2025.100766
URL: https://www.sciencedirect.com/science/article/pii/S2665963825000260
Citations: 0
Abstract
Bibliometric analyses often depend on extracting metadata from large scientific databases, a process that is still largely manual, repetitive, and error-prone. This paper presents PubMedMetaTool, an open-source Python-based solution that automates the retrieval and transformation of bibliographic metadata from PubMed, using either article titles or Digital Object Identifiers (DOIs) as input. The tool implements a modular pipeline that extracts metadata using NCBI's Entrez Programming Utilities (E-utilities) and transforms it into formats compatible with tools such as Bibliometrix, VOSviewer, and pyBibX. Designed to be transparent and configurable, the tool improves the efficiency, accuracy, and interoperability of bibliometric workflows.
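
The kind of pipeline the abstract describes (look up a record by title or DOI, fetch its metadata, flatten it for downstream tools) can be sketched as below. This is a minimal illustration under stated assumptions, not PubMedMetaTool's actual code: the function names, the `FIELDS` column list, and the CSV layout are hypothetical, and the sketch only builds E-utilities request URLs and parses the returned XML rather than sending requests. It relies on the public `esearch.fcgi` and `efetch.fcgi` endpoints and on the PubMed search field tags `[Title]` and `[AID]` (article identifier, which covers DOIs).

```python
import csv
import io
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"
# Hypothetical column layout; not the schema used by PubMedMetaTool.
FIELDS = ["pmid", "title", "journal", "year", "doi", "authors"]


def build_esearch_url(query, by="title"):
    """Build the ESearch URL that resolves a title or DOI to PMIDs."""
    field = {"title": "Title", "doi": "AID"}[by]  # PubMed search field tags
    params = {"db": "pubmed", "term": f"{query}[{field}]", "retmode": "json"}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def build_efetch_url(pmids):
    """Build the EFetch URL that returns full records as XML."""
    params = {"db": "pubmed", "id": ",".join(pmids), "retmode": "xml"}
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"


def _text(node):
    # Titles may contain inline markup such as <i>, so join all text parts.
    return "".join(node.itertext()).strip() if node is not None else ""


def parse_pubmed_xml(xml_text):
    """Flatten each PubmedArticle element into a dict keyed by FIELDS."""
    records = []
    for art in ET.fromstring(xml_text).iter("PubmedArticle"):
        article = "MedlineCitation/Article"
        authors = [
            f"{_text(a.find('LastName'))}, {_text(a.find('ForeName'))}"
            for a in art.findall(f"{article}/AuthorList/Author")
            if a.find("LastName") is not None  # skip collective (group) authors
        ]
        records.append({
            "pmid": _text(art.find("MedlineCitation/PMID")),
            "title": _text(art.find(f"{article}/ArticleTitle")),
            "journal": _text(art.find(f"{article}/Journal/Title")),
            "year": _text(art.find(f"{article}/Journal/JournalIssue/PubDate/Year")),
            "doi": _text(art.find("PubmedData/ArticleIdList/ArticleId[@IdType='doi']")),
            "authors": "; ".join(authors),
        })
    return records


def to_csv(records):
    """Serialize flat records to CSV text for import into other tools."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()
```

In practice, the URL from `build_esearch_url("10.1016/j.simpa.2025.100766", by="doi")` would be requested first, the PMIDs in its JSON response passed to `build_efetch_url`, and the resulting XML handed to `parse_pubmed_xml`. The CSV produced here is generic; Bibliometrix, VOSviewer, and pyBibX each expect their own import formats, which this sketch does not reproduce.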