Sneaked references: Fabricated reference metadata distort citation counts
Lonni Besançon, Guillaume Cabanac, Cyril Labbé, Alexander Magazinov
Journal of the Association for Information Science and Technology, 75(12), 1368-1379. Published 2024-05-06. DOI: 10.1002/asi.24896
https://onlinelibrary.wiley.com/doi/10.1002/asi.24896
Abstract
We report evidence of an undocumented method to manipulate citation counts involving “sneaked” references. Sneaked references are registered as metadata for published scientific articles in which they do not appear. This manipulation exploits trusted relationships between various actors: publishers, the Crossref metadata registration agency, digital libraries, and bibliometric platforms. By collecting metadata from various sources, we show that extra undue references are actually sneaked in at Digital Object Identifier (DOI) registration time, resulting in artificially inflated citation counts. As a case study, focusing on three journals from a given publisher, we identified at least 9% sneaked references (5,978/65,836) mainly benefiting two authors. Despite not being present in the published articles, these sneaked references exist in metadata registries and inappropriately propagate to bibliometric dashboards. Furthermore, we discovered “lost” references: the studied bibliometric platform failed to index at least 56% (36,939/65,836) of the references present in the HTML version of the publications. This research led to an investigation by Crossref (confirming our findings) and to subsequent corrective actions. The extent of the distortion in the global literature, due to sneaked and lost references, remains unknown and requires further investigation. Bibliometric platforms producing citation counts should identify, quantify, and correct these flaws to provide accurate data to their patrons and prevent further citation gaming.
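The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes each reference list has already been reduced to a set of DOIs, and the function names (`normalize_doi`, `compare_reference_lists`, `share`) and the `10.1234/...` DOIs are hypothetical. References recorded in a metadata source but absent from the published article correspond to "sneaked" references; references in the published article but absent from the recorded list correspond to "lost" ones. The last two lines only recompute the percentages from the counts reported in the abstract.

```python
from typing import Iterable


def normalize_doi(doi: str) -> str:
    """Lowercase a DOI and strip a resolver prefix so equal DOIs compare equal."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def compare_reference_lists(published: Iterable[str],
                            recorded: Iterable[str]) -> dict[str, set[str]]:
    """Compare the references in an article with those recorded for it elsewhere.

    'extra'   = recorded but not in the article (candidate sneaked references)
    'missing' = in the article but not recorded (candidate lost references)
    """
    pub = {normalize_doi(d) for d in published}
    rec = {normalize_doi(d) for d in recorded}
    return {"extra": rec - pub, "missing": pub - rec, "shared": pub & rec}


def share(part: int, total: int) -> float:
    """Return part/total as a percentage rounded to one decimal place."""
    return round(100 * part / total, 1)


# Toy data (hypothetical DOIs): the article cites a, b, c;
# the metadata record lists a, b, and an extra reference x.
published_refs = ["10.1234/a", "10.1234/b", "10.1234/c"]
recorded_refs = ["https://doi.org/10.1234/A", "10.1234/b", "10.1234/x"]

result = compare_reference_lists(published_refs, recorded_refs)
print(sorted(result["extra"]))    # ['10.1234/x']
print(sorted(result["missing"]))  # ['10.1234/c']

# Percentages recomputed from the counts given in the abstract.
print(share(5978, 65836))   # 9.1
print(share(36939, 65836))  # 56.1
```

Matching on normalized DOIs is a simplifying assumption; references without a DOI would need some form of fuzzy matching on titles or other bibliographic fields, which this sketch does not attempt.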
About the journal:
The Journal of the Association for Information Science and Technology (JASIST) is a leading international forum for peer-reviewed research in information science. For more than half a century, JASIST has provided intellectual leadership by publishing original research that focuses on the production, discovery, recording, storage, representation, retrieval, presentation, manipulation, dissemination, use, and evaluation of information and on the tools and techniques associated with these processes.
The Journal welcomes rigorous work of an empirical, experimental, ethnographic, conceptual, historical, socio-technical, policy-analytic, or critical-theoretical nature. JASIST also commissions in-depth review articles (“Advances in Information Science”) and reviews of print and other media.