Assessing the coverage of PubMed, Embase, OpenAlex, and Semantic Scholar for automated single-database searches in living guideline evidence surveillance: a case study of the international polycystic ovary syndrome guidelines 2023
Darren Rajit , Steve McDonald , Chau Thien Tay , Lan Du , Joanne Enticott , Helena Teede
Journal of Clinical Epidemiology, volume 183, Article 111789. Published 2025-04-17. DOI: 10.1016/j.jclinepi.2025.111789
https://www.sciencedirect.com/science/article/pii/S0895435625001222
Abstract
Objectives
Living guideline maintenance is underpinned by manual approaches to evidence retrieval, which limits long-term sustainability. Our study aimed to evaluate the feasibility of using only PubMed, Embase, OpenAlex, or Semantic Scholar to automatically retrieve articles that were included in a high-quality international guideline: the 2023 international polycystic ovary syndrome (PCOS) guidelines.
Methods
The digital object identifiers (DOIs) and PubMed IDs (PMIDs) of articles included after full-text screening in the 2023 international PCOS guidelines were extracted. These IDs were used to automatically retrieve article metadata from all tested databases. A title-only search was then conducted for articles that were not initially retrievable. The extent of coverage, and the overlap of coverage, was determined for each database. An exploratory analysis of the risk of bias (RoB) of unretrievable articles was then conducted for each database.
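The two-stage lookup and the coverage and overlap calculations described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the in-memory "databases", the helper names (`normalise_title`, `is_retrievable`, `coverage_sets`), and the toy records are all hypothetical, and a real workflow would query each database's API instead of Python sets.

```python
from typing import Dict, List, Optional, Set

Article = Dict[str, Optional[str]]   # keys: "doi", "pmid", "title"
Database = Dict[str, Set[str]]       # keys: "dois", "pmids", "titles"


def normalise_title(title: str) -> str:
    """Lowercase and keep only alphanumerics so formatting differences do not block a match."""
    return "".join(ch for ch in title.lower() if ch.isalnum())


def is_retrievable(article: Article, db: Database) -> bool:
    """Stage 1: match on DOI or PMID. Stage 2: fall back to a title-only match."""
    doi, pmid = article.get("doi"), article.get("pmid")
    if doi and doi.lower() in db["dois"]:
        return True
    if pmid and pmid in db["pmids"]:
        return True
    title = article.get("title") or ""
    return bool(title) and normalise_title(title) in db["titles"]


def coverage_sets(articles: List[Article], dbs: Dict[str, Database]) -> Dict[str, Set[int]]:
    """For each database, the indices of the articles it can retrieve."""
    return {
        name: {i for i, a in enumerate(articles) if is_retrievable(a, db)}
        for name, db in dbs.items()
    }


# Hypothetical toy data, NOT taken from the study.
articles: List[Article] = [
    {"doi": "10.1000/a1", "pmid": "111", "title": "Metformin in PCOS: a trial"},
    {"doi": None, "pmid": "222", "title": "Lifestyle intervention outcomes"},
    {"doi": "10.1000/a3", "pmid": None, "title": "Inositol and Ovulation"},
    {"doi": "10.1000/a4", "pmid": "444", "title": "Letrozole versus clomiphene"},
]
dbs: Dict[str, Database] = {
    "db_a": {
        "dois": {"10.1000/a1", "10.1000/a4"},
        "pmids": {"222"},
        "titles": {normalise_title("inositol and ovulation")},  # found only via title fallback
    },
    "db_b": {"dois": {"10.1000/a1"}, "pmids": {"222", "444"}, "titles": set()},
}

found = coverage_sets(articles, dbs)
n = len(articles)

# Coverage per database, as a percentage of all included articles.
coverage_pct = {name: 100 * len(ids) / n for name, ids in found.items()}

# Overlap: share of articles retrievable from every database.
in_all_pct = 100 * len(set.intersection(*found.values())) / n

# Articles missed by each database; these would feed the exploratory RoB analysis.
missed = {name: sorted(set(range(n)) - ids) for name, ids in found.items()}

print(coverage_pct, in_all_pct, missed)
```

In this toy setup the third article is retrieved from `db_a` only through the title fallback, and it is the single article `db_b` misses; a per-database list such as `missed` is where an exploratory RoB analysis would begin.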
Results
OpenAlex had the best coverage (98.6%), followed by Semantic Scholar (98.3%), Embase (96.8%), and PubMed (93.0%). However, 90.5% of all articles were retrievable from all four databases. All articles that were not retrievable from OpenAlex and Semantic Scholar were assessed as either medium or high RoB. In contrast, both Embase and PubMed missed articles that were of high quality (low RoB).
Conclusion
OpenAlex should be considered as a single source for automated evidence retrieval in living guideline development, owing to its high coverage and low risk of missing high-quality articles. These insights are being leveraged as part of transitioning the 2023 international PCOS guidelines toward a living format.
About the journal:
The Journal of Clinical Epidemiology strives to enhance the quality of clinical and patient-oriented healthcare research by advancing and applying innovative methods in conducting, presenting, synthesizing, disseminating, and translating research results into optimal clinical practice. Special emphasis is placed on training new generations of scientists and clinical practice leaders.