Predicting retracted research: a dataset and machine learning approaches.

Impact factor: 10.7 · JCR Q1 (Ethics)
Aaron H A Fletcher, Mark Stevenson
{"title":"预测撤回研究:数据集和机器学习方法。","authors":"Aaron H A Fletcher, Mark Stevenson","doi":"10.1186/s41073-025-00168-w","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Retractions undermine the scientific record's reliability and can lead to the continued propagation of flawed research. This study aimed to (1) create a dataset aggregating retraction information with bibliographic metadata, (2) train and evaluate various machine learning approaches to predict article retractions, and (3) assess each feature's contribution to feature-based classifier performance using ablation studies.</p><p><strong>Methods: </strong>An open-access dataset was developed by combining information from the Retraction Watch database and the OpenAlex API. Using a case-controlled design, retracted research articles were paired with non-retracted articles published in the same period. Traditional feature-based classifiers and models leveraging contextual language representations were then trained and evaluated. Model performance was assessed using accuracy, precision, recall, and the F1-score.</p><p><strong>Results: </strong>The Llama 3.2 base model achieved the highest overall accuracy. The Random Forest classifier achieved a precision of 0.687 for identifying non-retracted articles, while the Llama 3.2 base model reached a precision of 0.683 for identifying retracted articles. Traditional feature-based classifiers generally outperformed most contextual language models, except for the Llama 3.2 base model, which showed competitive performance across several metrics.</p><p><strong>Conclusions: </strong>Although no single model excelled across all metrics, our findings indicate that machine learning techniques can effectively support the identification of retracted research. These results provide a foundation for developing automated tools to assist publishers and reviewers in detecting potentially problematic publications. Further research should focus on refining these models and investigating additional features to improve predictive performance.</p><p><strong>Trial registration: </strong>Not applicable.</p>","PeriodicalId":74682,"journal":{"name":"Research integrity and peer review","volume":"10 1","pages":"9"},"PeriodicalIF":10.7000,"publicationDate":"2025-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12153192/pdf/","citationCount":"0","resultStr":"{\"title\":\"Predicting retracted research: a dataset and machine learning approaches.\",\"authors\":\"Aaron H A Fletcher, Mark Stevenson\",\"doi\":\"10.1186/s41073-025-00168-w\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>Retractions undermine the scientific record's reliability and can lead to the continued propagation of flawed research. This study aimed to (1) create a dataset aggregating retraction information with bibliographic metadata, (2) train and evaluate various machine learning approaches to predict article retractions, and (3) assess each feature's contribution to feature-based classifier performance using ablation studies.</p><p><strong>Methods: </strong>An open-access dataset was developed by combining information from the Retraction Watch database and the OpenAlex API. Using a case-controlled design, retracted research articles were paired with non-retracted articles published in the same period. 
Traditional feature-based classifiers and models leveraging contextual language representations were then trained and evaluated. Model performance was assessed using accuracy, precision, recall, and the F1-score.</p><p><strong>Results: </strong>The Llama 3.2 base model achieved the highest overall accuracy. The Random Forest classifier achieved a precision of 0.687 for identifying non-retracted articles, while the Llama 3.2 base model reached a precision of 0.683 for identifying retracted articles. Traditional feature-based classifiers generally outperformed most contextual language models, except for the Llama 3.2 base model, which showed competitive performance across several metrics.</p><p><strong>Conclusions: </strong>Although no single model excelled across all metrics, our findings indicate that machine learning techniques can effectively support the identification of retracted research. These results provide a foundation for developing automated tools to assist publishers and reviewers in detecting potentially problematic publications. Further research should focus on refining these models and investigating additional features to improve predictive performance.</p><p><strong>Trial registration: </strong>Not applicable.</p>\",\"PeriodicalId\":74682,\"journal\":{\"name\":\"Research integrity and peer review\",\"volume\":\"10 1\",\"pages\":\"9\"},\"PeriodicalIF\":10.7000,\"publicationDate\":\"2025-06-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12153192/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Research integrity and peer review\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1186/s41073-025-00168-w\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ETHICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Research integrity and peer review","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1186/s41073-025-00168-w","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ETHICS","Score":null,"Total":0}
Citations: 0

Abstract


Background: Retractions undermine the scientific record's reliability and can lead to the continued propagation of flawed research. This study aimed to (1) create a dataset aggregating retraction information with bibliographic metadata, (2) train and evaluate various machine learning approaches to predict article retractions, and (3) assess each feature's contribution to feature-based classifier performance using ablation studies.
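
A minimal sketch of the kind of leave-one-feature-out ablation study that aim (3) describes, using scikit-learn; the synthetic data and the feature names are illustrative assumptions, not the authors' actual feature set:

```python
# Leave-one-feature-out ablation: retrain without each feature and
# compare cross-validated F1 against the full-feature baseline.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical stand-in for the paper's feature matrix.
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
feature_names = ["cited_by_count", "author_count", "institution_count",
                 "reference_count", "is_open_access"]  # hypothetical

def mean_f1(features: np.ndarray, labels: np.ndarray) -> float:
    """5-fold cross-validated F1 for a Random Forest on the given matrix."""
    clf = RandomForestClassifier(random_state=0)
    return cross_val_score(clf, features, labels, scoring="f1", cv=5).mean()

baseline = mean_f1(X, y)
print(f"baseline F1: {baseline:.3f}")
for i, name in enumerate(feature_names):
    score = mean_f1(np.delete(X, i, axis=1), y)  # drop feature i
    print(f"without {name}: F1 {score:.3f} ({score - baseline:+.3f})")
```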

Methods: An open-access dataset was developed by combining information from the Retraction Watch database and the OpenAlex API. Using a case-control design, retracted research articles were paired with non-retracted articles published in the same period. Traditional feature-based classifiers and models leveraging contextual language representations were then trained and evaluated. Model performance was assessed using accuracy, precision, recall, and the F1-score.
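
A minimal sketch of how a retraction record might be enriched with OpenAlex metadata and paired with a same-year, non-retracted control, relying on OpenAlex's documented `doi:` lookup, `is_retracted` filter, and `sample` parameter; the helper names and pairing logic are illustrative assumptions, not the paper's pipeline:

```python
# Enrich one DOI with OpenAlex metadata, then sample a non-retracted
# control published in the same year (case-control pairing).
import requests

OPENALEX_WORKS = "https://api.openalex.org/works"

def fetch_work(doi: str) -> dict:
    """Look up a single work by DOI via OpenAlex's doi: route."""
    resp = requests.get(f"{OPENALEX_WORKS}/doi:{doi}", timeout=30)
    resp.raise_for_status()
    return resp.json()

def sample_control(year: int) -> dict:
    """Randomly sample one non-retracted work published in `year`."""
    resp = requests.get(OPENALEX_WORKS, timeout=30, params={
        "filter": f"publication_year:{year},is_retracted:false",
        "sample": 1,
        "seed": 42,  # fixed seed so the pairing is reproducible
    })
    resp.raise_for_status()
    return resp.json()["results"][0]

# This paper's own DOI stands in for a Retraction Watch record here.
case = fetch_work("10.1186/s41073-025-00168-w")
control = sample_control(case["publication_year"])
print({"case": case["id"], "control": control["id"],
       "case_retracted": case["is_retracted"]})
```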

Results: The Llama 3.2 base model achieved the highest overall accuracy. The Random Forest classifier achieved a precision of 0.687 for identifying non-retracted articles, while the Llama 3.2 base model reached a precision of 0.683 for identifying retracted articles. Traditional feature-based classifiers generally outperformed most contextual language models, except for the Llama 3.2 base model, which showed competitive performance across several metrics.
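
A minimal sketch of the evaluation metrics named above (accuracy plus per-class precision, recall, and F1), computed with scikit-learn; the label vectors are made up for illustration:

```python
# Accuracy plus per-class precision/recall/F1, the metrics used to
# compare the classifiers above.
from sklearn.metrics import accuracy_score, classification_report

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # 1 = retracted, 0 = non-retracted
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(f"accuracy: {accuracy_score(y_true, y_pred):.3f}")
# classification_report gives precision/recall/F1 per class, matching
# the per-class precisions reported for Random Forest and Llama 3.2.
print(classification_report(y_true, y_pred,
                            target_names=["non-retracted", "retracted"]))
```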

Conclusions: Although no single model excelled across all metrics, our findings indicate that machine learning techniques can effectively support the identification of retracted research. These results provide a foundation for developing automated tools to assist publishers and reviewers in detecting potentially problematic publications. Further research should focus on refining these models and investigating additional features to improve predictive performance.

Trial registration: Not applicable.
