Does reusing scientific datasets reduce the impact of the papers?

IF 3.5 Zone 2 (Management) Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Journal of Informetrics Pub Date : 2025-10-19 DOI:10.1016/j.joi.2025.101737

Bo Yang, Hong Jiao, Qingqing Fan, Jiawen Chen, Jiaxue Liu

Volume 19, Issue 4, Article 101737. Source: https://www.sciencedirect.com/science/article/pii/S1751157725000999

Citations: 0

Abstract

Does reusing scientific datasets reduce the impact of the papers?

Data reuse is increasingly advocated as a strategy to enhance research reproducibility, accelerate project progress, and reduce research costs. Although few dispute the principle of data reuse, its effect on citation performance in experiment-based or data-intensive studies remains uncertain. To dispel concerns about the impact of data reuse on research, researchers require clear evidence of its benefits. This study employs informetric analysis, analysis of variance, and multiple linear regression to conduct a large-scale investigation of scientists’ dataset (re)use behavior, providing direct evidence of the citation performance of their research. The results show that: (i) the volume of released data in biomedical and life sciences continues to grow steadily; however, tracking the (re)use of Gene Expression Omnibus datasets over time shows that actual utilization and reuse have not kept pace with this growth; (ii) papers that declare the reuse of released datasets, especially those reusing their own data (self-reuse), garner more citations, indicating that dataset reuse does not negatively impact citation performance and may even enhance it; (iii) the authors' co-citation model predicts that, owing to the “sheep flock effect,” data reuse could increase the exposure of reusers’ related works and subsequently enhance the citation performance of their other publications.
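
The abstract names analysis of variance as one of the methods but does not describe the computation. As a rough illustration only, a one-way ANOVA comparing citation counts across reuse groups might look like the sketch below. The group names, the citation counts, and the helper `one_way_anova_f` are all hypothetical and invented for this example; they are not the authors' data, variables, or code.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists of numbers, one inner list per group.
    """
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)       # number of groups
    n = len(all_values)   # total number of observations

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical citation counts, invented for illustration (not from the study).
citations = {
    "no_reuse": [1, 2, 3],
    "reuse_others": [2, 3, 4],
    "self_reuse": [6, 7, 8],
}

f_stat, df_b, df_w = one_way_anova_f(list(citations.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F value relative to the F distribution with the given degrees of freedom would suggest that mean citation counts differ between groups. It says nothing about which group differs or why, which is presumably where a regression with control variables comes in.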

Source journal

Journal of Informetrics Social Sciences-Library and Information Sciences

CiteScore

6.40

Self-citation rate

16.20%

Articles published

About the journal: Journal of Informetrics (JOI) publishes rigorous high-quality research on quantitative aspects of information science. The main focus of the journal is on topics in bibliometrics, scientometrics, webometrics, patentometrics, altmetrics and research evaluation. Contributions studying informetric problems using methods from other quantitative fields, such as mathematics, statistics, computer science, economics and econometrics, and network science, are especially encouraged. JOI publishes both theoretical and empirical work. In general, case studies, for instance a bibliometric analysis focusing on a specific research field or a specific country, are not considered suitable for publication in JOI, unless they contain innovative methodological elements.