Can We Mathematically Spot the Possible Manipulation of Results in Research Manuscripts Using Benford’s Law?

Impact Factor: 2.0 | JCR: Q3 | Category: Computer Science, Information Systems

Journal: Data | Pub Date: 2023-10-31 | DOI: 10.3390/data8110165

Teddy Lazebnik, Dan Gorlitsky


Citations: 0

Abstract

The reproducibility of academic research has been a persistent issue, contradicting one of the fundamental principles of science. Recently, an increasing number of false claims have been found in academic manuscripts, casting doubt on the validity of reported results. In this paper, we utilize an adapted version of Benford’s law, a statistical phenomenon that describes the distribution of leading digits in naturally occurring datasets, to identify the potential manipulation of results in research manuscripts, using solely the aggregated data presented in those manuscripts rather than the commonly unavailable raw datasets. Our methodology applies the principles of Benford’s law to commonly employed analyses in academic manuscripts, thus reducing the need for the raw data itself. To validate our approach, we employed 100 open-source datasets and accurately predicted 79% of them using our rules. Moreover, we tested the proposed method on known retracted manuscripts and found that around half (48.6%) can be detected. Additionally, we analyzed 100 manuscripts published in the last two years across ten prominent economics journals, with 10 manuscripts randomly sampled from each journal. Our analysis predicted a 3% occurrence of results manipulation with a 96% confidence level. Our findings show that Benford’s law, adapted for aggregated data, can be an initial tool for identifying data manipulation; however, it is not a silver bullet, and each flagged manuscript requires further investigation due to the relatively low prediction accuracy.
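For readers unfamiliar with the underlying phenomenon, the classic form of Benford’s law states that the leading digit d (1 to 9) occurs with probability log10(1 + 1/d), so about 30.1% of values start with 1 and only about 4.6% start with 9. The sketch below illustrates only that classic first-digit check with a chi-square goodness-of-fit statistic. It is not the adapted method for aggregated data described in the abstract, whose rules are not given here; the function names and the use of the 5% critical value are assumptions made for illustration.

```python
import math
from collections import Counter

# 95th percentile of the chi-square distribution with 8 degrees of freedom
# (9 digit categories minus 1). Using the 5% level is an illustrative choice.
CHI2_CRITICAL_8DOF_5PCT = 15.507


def benford_expected():
    """Classic Benford first-digit probabilities: P(d) = log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(x):
    """Return the first significant digit (1-9) of a nonzero finite number."""
    x = abs(float(x))
    if x == 0 or not math.isfinite(x):
        raise ValueError("leading digit is undefined for zero or non-finite values")
    # Scientific notation places the first significant digit first;
    # 17 significant digits avoid rounding 9.99... up to 1.0e+01.
    return int(f"{x:.16e}"[0])


def chi_square_vs_benford(values):
    """Chi-square statistic of observed leading digits against Benford's law."""
    digits = [leading_digit(v) for v in values if v != 0]
    n = len(digits)
    if n == 0:
        raise ValueError("need at least one nonzero value")
    counts = Counter(digits)
    return sum(
        (counts.get(d, 0) - n * p) ** 2 / (n * p)
        for d, p in benford_expected().items()
    )


if __name__ == "__main__":
    # Powers of two are a standard example of a Benford-like sequence.
    benford_like = [2 ** k for k in range(200)]
    # Three-digit integers have uniformly distributed leading digits.
    uniform_like = list(range(100, 1000))
    for name, data in [("powers of 2", benford_like), ("100..999", uniform_like)]:
        stat = chi_square_vs_benford(data)
        verdict = "deviates from" if stat > CHI2_CRITICAL_8DOF_5PCT else "is consistent with"
        print(f"{name}: chi-square = {stat:.2f}, {verdict} Benford's law at the 5% level")
```

A large statistic only says that the leading digits are unusual under Benford’s law. As the abstract stresses, such a flag is a starting point for further investigation, not evidence of manipulation by itself.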

Source journal