Benchmarking AI chatbots: assessing their accuracy in identifying hijacked medical journals.

IF: 2 · JCR: Q2 · Category: Medicine, General & Internal

Diagnosis · Pub Date: 2025-05-22 · DOI: 10.1515/dx-2025-0043

Mihály Hegedűs, Mehdi Dadkhah, Lóránt Dénes Dávid


Citations: 0

Abstract

Objectives: The challenges posed by questionable journals to academia are very real, and being able to detect hijacked journals would be valuable to the research community. Using an artificial intelligence (AI) chatbot may be a promising approach to early detection. The purpose of this research is to analyze and benchmark the performance of different AI chatbots in identifying hijacked medical journals.

Methods: This study utilized a dataset comprising 21 previously identified hijacked journals and 10 newly detected hijacked journals, alongside their respective legitimate versions. ChatGPT, Gemini, Copilot, DeepSeek, Qwen, Perplexity, and Claude were selected for benchmarking. Three question types were developed to assess AI chatbots' performance in providing information about hijacked journals, identifying hijacked websites, and verifying legitimate ones.
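As an illustration only, the benchmarking setup described in the Methods could be tallied along the following lines. This is a minimal sketch, not the authors' scoring procedure: the `Judgement` record, the `accuracy_by_chatbot` helper, and the question-type labels are hypothetical names introduced here, and the abstract does not state how responses were graded.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Judgement:
    """One graded chatbot answer (hypothetical record layout)."""
    chatbot: str        # e.g. "Copilot"
    question_type: str  # assumed labels: "general_info", "identify_hijacked", "verify_legitimate"
    correct: bool       # whether a human grader judged the answer correct


def accuracy_by_chatbot(judgements):
    """Return {(chatbot, question_type): fraction of answers judged correct}."""
    # Each cell holds [number correct, number asked].
    totals = defaultdict(lambda: [0, 0])
    for j in judgements:
        cell = totals[(j.chatbot, j.question_type)]
        cell[0] += int(j.correct)
        cell[1] += 1
    return {key: correct / asked for key, (correct, asked) in totals.items()}
```

Feeding the function one `Judgement` per chatbot, journal, and question type would give a per-chatbot accuracy table for each of the three question types. The grades themselves would have to come from manual review against the known hijacked and legitimate journal lists.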

Results: The results show that current AI chatbots can provide general information about hijacked journals, but cannot reliably identify either real or hijacked journal titles. While Copilot performed better than others, it was not error-free.

Conclusions: Current AI chatbots are not yet reliable for detecting hijacked journals and may inadvertently promote them.




Source journal

Diagnosis (Medicine, General & Internal)

CiteScore: 7.20

Self-citation rate: 5.70%

Journal description: Diagnosis focuses on how diagnosis can be advanced, how it is taught, and how and why it can fail, leading to diagnostic errors. The journal welcomes both fundamental and applied works, improvement initiatives, opinions, and debates to encourage new thinking on improving this critical aspect of healthcare quality.

Topics:
- Factors that promote diagnostic quality and safety
- Clinical reasoning
- Diagnostic errors in medicine
- The factors that contribute to diagnostic error: human factors, cognitive issues, and system-related breakdowns
- Improving the value of diagnosis: eliminating waste and unnecessary testing
- How culture and removing blame promote awareness of diagnostic errors
- Training and education related to clinical reasoning and diagnostic skills
- Advances in laboratory testing and imaging that improve diagnostic capability
- Local, national and international initiatives to reduce diagnostic error