Distribution of trial registry numbers within full-text of PubMed Central articles: implications for linking trials to publications and indexing trial publication types.
Arthur M Holt, Ang Michael Troy, Neil R Smalheiser
Trials 26(1):34. Published 2025-01-31. DOI: 10.1186/s13063-025-08741-w
Abstract
Background: Linking registered clinical trials with their published results continues to be a challenge. A variety of natural language processing (NLP)-based and machine learning-based models have been developed to assist users in identifying these connections. To date, however, no system has attempted to detect mentions of registry numbers within the full text of articles.
Methods: Articles from the PubMed Central full-text Open Access dataset were scanned for mentions of ClinicalTrials.gov and international clinical trial registry identifiers. We analyzed the distribution of trial registry numbers within sections of the articles and characterized their publication type indexing and other metrics.
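The scanning step can be illustrated with a minimal Python sketch. It assumes articles are available as JATS XML without a default namespace, which is typical of PubMed Central Open Access files. It uses illustrative identifier patterns for three registries: ClinicalTrials.gov NCT numbers, ISRCTN, and ANZCTR ACTRN numbers. The patterns, the section labels ("metadata", "methods", "tables", "other"), and all function names are hypothetical and are not taken from the study. Table contents are collected separately so that a number inside a table is not also counted under the section containing that table.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Illustrative identifier patterns (an assumption, not the study's exact rules):
# ClinicalTrials.gov "NCT" + 8 digits, ISRCTN + 8 digits, ANZCTR "ACTRN" + 14 digits.
REGISTRY_PATTERNS = {
    "ClinicalTrials.gov": re.compile(r"\bNCT\d{8}\b"),
    "ISRCTN": re.compile(r"\bISRCTN\d{8}\b"),
    "ANZCTR": re.compile(r"\bACTRN\d{14}\b"),
}


def find_registry_ids(text):
    """Return the set of (registry, identifier) pairs found in a string."""
    return {
        (registry, match)
        for registry, pattern in REGISTRY_PATTERNS.items()
        for match in pattern.findall(text)
    }


def text_excluding_tables(elem):
    """Concatenate the text under elem, skipping <table-wrap> subtrees."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag != "table-wrap":
            parts.append(text_excluding_tables(child))
        # A child's tail belongs to the parent, so keep it even for tables.
        parts.append(child.tail or "")
    return " ".join(parts)


def section_label(title):
    """Hypothetical mapping from a top-level <sec> title to a coarse label."""
    return "methods" if "method" in (title or "").lower() else "other"


def registry_ids_by_section(jats_xml):
    """Group registry identifiers by where they occur in a JATS article."""
    root = ET.fromstring(jats_xml)
    by_section = defaultdict(set)

    # Article metadata: title, abstract, and other <front> content.
    front = root.find("front")
    if front is not None:
        by_section["metadata"] |= find_registry_ids(" ".join(front.itertext()))

    body = root.find("body")
    if body is not None:
        # Tables anywhere in the body are pooled under one label.
        for table in body.iter("table-wrap"):
            by_section["tables"] |= find_registry_ids(" ".join(table.itertext()))
        # Top-level sections, with table text excluded to avoid double counting.
        for sec in body.findall("sec"):
            label = section_label(sec.findtext("title"))
            by_section[label] |= find_registry_ids(text_excluding_tables(sec))

    return by_section
```

Matching section titles on the substring "method" is a deliberate simplification, because real articles also use headings such as "Study design". Body text outside top-level sections and the back matter are ignored in this sketch.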
Results: Registry numbers mentioned in article metadata (e.g., the abstract) or in the Methods section of the full text are highly predictive of clinical trial articles. In every case examined where a clinical trial article mentioned ClinicalTrials.gov identifier numbers (NCT) only in the Methods section, the article was reporting clinical outcomes from that registered trial. Such mentions can therefore reliably be used to link the trial to the publication. Conversely, registry numbers mentioned in Tables arise almost entirely from reviews (including systematic reviews and meta-analyses). Registry numbers mentioned in other full-text sections have relatively little predictive value for linking trials to their publications. Clinical trial articles that mention the CONSORT or SPIRIT guidelines mention registry numbers in article metadata at a higher rate than articles overall, and hence are more easily linked to their underlying trials.
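The location-based findings above can be paraphrased as simple decision rules. The following Python sketch is a hypothetical illustration, not the authors' method. It takes the set of section labels in which one registry number appears, with labels assumed to come from some section-aware extractor, and returns a coarse interpretation. It also compares how often registry numbers appear in metadata among articles that mention CONSORT or SPIRIT versus all articles. The rule order, the labels, and the toy data are assumptions.

```python
import re

# Assumed section labels: "metadata", "methods", "tables", "other".
# The rules paraphrase the reported findings; they are an illustrative sketch.


def interpret_mention(sections, is_trial_article):
    """Interpret where one registry number appears in one article."""
    sections = set(sections)
    if not sections:
        return "not mentioned"
    if is_trial_article and sections == {"methods"}:
        # Methods-only NCT mentions in trial articles were found to report
        # outcomes of that registered trial.
        return "link trial to publication"
    if sections & {"metadata", "methods"}:
        return "likely clinical trial article"
    if "tables" in sections:
        return "likely review"
    return "weak evidence"


# Case-sensitive match so the ordinary word "spirit" is not counted.
GUIDELINE = re.compile(r"\b(?:CONSORT|SPIRIT)\b")


def metadata_rate(articles):
    """Fraction of articles with at least one registry number in metadata."""
    if not articles:
        return 0.0
    hits = sum(1 for a in articles if "metadata" in a["sections"])
    return hits / len(articles)


def compare_guideline_rates(articles):
    """Return (rate among CONSORT/SPIRIT-mentioning articles, rate overall)."""
    guideline = [a for a in articles if GUIDELINE.search(a["text"])]
    return metadata_rate(guideline), metadata_rate(articles)
```

Each article in the comparison is a dict with a "text" string and a "sections" set. Keeping the rules as plain functions makes it easy to swap in a learned model later without changing the callers.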
Conclusions: The appearance and location of trial registry numbers within the full text of biomedical articles provide valuable features for connecting clinical trials to their publications. They also potentially provide information to assist automated tools in assigning publication types to articles.
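For automated publication-type assignment, the same location information could be encoded as classifier features. The following Python sketch assumes a mapping from hypothetical section labels to sets of identifiers. Its feature set is an illustrative choice, not one reported in the study:

* per-section presence flags
* the number of distinct identifiers
* a methods-only flag

```python
# Hypothetical section labels; the order fixes the feature positions.
SECTIONS = ("metadata", "methods", "tables", "other")

FEATURE_NAMES = [f"in_{s}" for s in SECTIONS] + ["n_distinct_ids", "methods_only"]


def location_features(ids_by_section):
    """Encode registry-number locations as a fixed-length feature vector.

    ids_by_section maps a section label to a set of identifiers. The result
    holds one binary presence flag per section, then the number of distinct
    identifiers across all sections, then a flag that is 1 only when
    identifiers appear in Methods and nowhere else.
    """
    flags = [int(bool(ids_by_section.get(s))) for s in SECTIONS]
    all_ids = set().union(*(ids_by_section.get(s, set()) for s in SECTIONS))
    methods_only = int(flags == [0, 1, 0, 0])
    return flags + [len(all_ids), methods_only]
```

The resulting vectors could be combined with text features in any standard classifier. Presence flags are used instead of raw counts because the findings concern where a number appears rather than how often.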