Does reusing scientific datasets reduce the impact of the papers?
Bo Yang, Hong Jiao, Qingqing Fan, Jiawen Chen, Jiaxue Liu
Journal of Informetrics, 19(4), Article 101737. Published 2025-10-19. DOI: 10.1016/j.joi.2025.101737
Impact factor: 3.5 (JCR Q2, Computer Science, Interdisciplinary Applications)
https://www.sciencedirect.com/science/article/pii/S1751157725000999
Citations: 0
Abstract
Data reuse is increasingly advocated as a strategy to enhance research reproducibility, accelerate project progress, and reduce research costs. Although few dispute the principle of data reuse, its effect on citation performance in experiment-based or data-intensive studies remains uncertain. To dispel concerns about the impact of data reuse on research, researchers require clear evidence of its benefits. This study employs informetric analysis, analysis of variance, and multiple linear regression to conduct a large-scale investigation of scientists’ dataset (re)use behavior, providing direct evidence of the citation performance of their research. The results show that: (i) The volume of released data in biomedical and life sciences continues to grow steadily; however, tracking the (re)use of Gene Expression Omnibus datasets over time shows that actual utilization and reuse have not kept pace with this growth; (ii) Papers that declare the reuse of released datasets, especially those reusing their own data (self-reuse), garner more citations, indicating that dataset reuse does not negatively impact citation performance and may even enhance it; (iii) Our co-citation model predicts that, owing to the “sheep flock effect,” data reuse could increase the exposure of reusers’ related works and subsequently enhance the citation performance of their other publications.
About the journal:
Journal of Informetrics (JOI) publishes rigorous high-quality research on quantitative aspects of information science. The main focus of the journal is on topics in bibliometrics, scientometrics, webometrics, patentometrics, altmetrics and research evaluation. Contributions studying informetric problems using methods from other quantitative fields, such as mathematics, statistics, computer science, economics and econometrics, and network science, are especially encouraged. JOI publishes both theoretical and empirical work. In general, case studies, for instance a bibliometric analysis focusing on a specific research field or a specific country, are not considered suitable for publication in JOI, unless they contain innovative methodological elements.