Harnessing AI for Comprehensive Reporting of Medical AI Research

International Journal of Imaging Systems and Technology · IF 3.0 · JCR Q2 (Engineering, Electrical & Electronic) · CAS Q4 (Computer Science)
Mohamed L. Seghier
{"title":"利用人工智能全面报告医疗人工智能研究","authors":"Mohamed L. Seghier","doi":"10.1002/ima.70047","DOIUrl":null,"url":null,"abstract":"<p>In this editorial, I would like to succinctly discuss the potential of using AI to improve reporting medical AI research. There are already several published guidelines and checklists in the current literature but how they are interpreted and implemented varies with publishers, editors, reviewers and authors. Here, I discuss the possibility of harnessing generative AI tools in order to assist authors to comprehensively report their AI work and meet current guidelines, with the ultimate aim to improve transparency and replicability in medical AI research. The succinct discussion below reckons two key issues: (1) AI has a seductive allure that might affect how AI-generated evidence is scrutinized and disseminated, hence the need for comprehensive and transparent reporting, and (2) authors sometimes feel uncertain about what to report in the light of so many existing guidelines about reporting AI research and the lack of consensus in the field.</p><p>It has been argued that extraneous or irrelevant information with a seductive allure can improve the ratings of scientific explanations [<span>1</span>]. AI, with its overhyped knowledgeability, can convey biases and false information that readers might judge believable [<span>2</span>]. AI can write highly convincing text that can impress or deceive readers, even in the presence of errors and false information [<span>3, 4</span>]. Likewise, merely mentioning “AI” in the title of a research paper seems to increase its citation potential [<span>5</span>]. The latter might incentivise scientists to use AI purely to boost their work citability, regardless of whether AI improved their work quality. In this context, one might speculate that some publications that used AI but with flawed methodologies or wrong conclusions might have slipped through the cracks of peer review, with many already being indexed and citable [<span>6</span>]. Overall, emerging evidence suggests that AI has an intrinsic seductive allure that is shaping the medical research landscape and impacting how readers appraise research articles that employ AI. This is why improving the reporting and evaluation of AI work is of paramount importance, and in this editorial, I underscore the potential role of generative AI for that purpose.</p><p>Consider this: readers might find a paper entitled “<i>Association between condition X and biomarker Y demonstrated with deep learning</i>” novel and worth reading. Now, imagine if the same finding was evidenced with a traditional analysis method and entitled “<i>Association between condition X and biomarker Y demonstrated with a correlation analysis</i>”, though it is unlikely that the authors of the latter will consider correlation analysis worth mentioning in the article title. Although both pieces of work report the same finding, they may not enjoy the same buzz and high citability in the field. This is because AI-based methods and traditional analysis methods operate at different maturity levels. Readers (and reviewers) are quite familiar with the scope and limitations of a correlation analysis, but the same cannot be said about AI. 
Having clear guidelines on how to comprehensively report and rigorously evaluate medical AI research is thus extremely important.</p><p>No one denies AI's huge potential in medical research, such as automating the analysis of complex medical data and accelerating the discovery of useful markers. However, AI may discover new data-driven features and disease-markers relationships that do not always align with prior medical knowledge, raising the question of how to reconcile common medical knowledge with AI-generated evidence. Likewise, there is a risk that AI's seductive allure might diminish the critical analysis and scrutiny of AI-generated evidence, thus weakening the rigour of the peer review process in evaluating AI papers. Therefore, when AI is used to enhance the process of scientific discovery, the core principles of scientific methodology, including falsifiability, must be upheld. However, when it comes to falsifiability, independently testing and disproving AI-generated evidence remains difficult. For example, does a 2% reduction in accuracy or another performance metric disprove a particular AI method?</p><p>Indeed, there is no consensus about the conceptual and methodological frameworks by which AI-generated evidence can be securitized and falsified. This is because deploying AI to study a particular question involves several aspects that create multiple sources of error or bias that are not always easy to gauge. This includes how data is curated, cleaned, imputed, augmented, divided or aggregated, how relevant features are identified, reduced or combined, and how AI architecture is built, trained or validated. As AI can generate fabricated articles including articles with empirical results (see discussion in [<span>4</span>]), frameworks that uphold falsification are paramount in AI research [<span>7</span>]. The recent example of the AI-Generated Science (AIGS) system [<span>8</span>], with AI agents that can independently and autonomously create knowledge, poses significant questions to AI research at many ethical, legal and scientific levels. This is why the authors of AIGS identified falsification as a core agent of that system to verify and scrutinise AI-generated scientific discoveries.</p><p>To minimise the risk of proliferation of flawed or fabricated AI research that could harm clinical practice, many guidelines and checklists for improving the reporting of medical AI research have been proposed. Such AI reporting guidelines are very useful to support authors to comprehensively present their AI work and to enhance the rigorous evaluation of their work during the peer review process. 
Some of the existing guidelines include MAIC-10 (Must AI Criteria-10), CLAIM (Checklist for Artificial Intelligence in Medical Imaging), STARD-AI (Standards for Reporting of Diagnostic Accuracy Study-AI), MI-CLAIM (Minimum Information about Clinical Artificial Intelligence Modeling), MINIMAR (Minimum Information for Medical AI Reporting), RQS (Radiomics Quality Score), QAMAI (Quality Analysis of Medical Artificial Intelligence), TRIPOD+AI (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis), CONSORT-AI (Consolidated Standards of Reporting Trials–AI), SPIRIT-AI (Standard Protocol Items: Recommendations for Interventional Trials-AI), FUTURE-AI (Fairness Universality Traceability Usability Robustness Explainability-AI), CAIR (Clinical AI Research), DECIDE-AI (Developmental and Exploratory Clinical Investigations of DEcision support systems driven by Artificial Intelligence), CLEAR (CheckList for EvaluAtion of Radiomics research), DOME (Data, Optimization, Model and Evaluation); see discussion in [<span>9-11</span>]. The relevance of each checklist depends on the specific topic and scope of the AI research.</p><p>However, AI researchers feel overwhelmed (and sometimes confused) by so many guidelines and checklists that are not implemented or interpreted in the same way by reviewers, editors and publishers. Hence, to maximise their impact and usefulness, publishers should consider offering easy-to-follow article templates that explicitly specify what one must report in each section of a manuscript in order to meet their guidelines and checklists about AI research. Likewise, similar to existing AI-powered tools for plagiarism detection, image manipulation, and language editing, publishers should join force with AI developers to create AI-powered tools that can automatically flag up submissions that do not conform to specific guidelines and provide constructive feedback to authors on how to improve the reporting of their AI research. Such AI tools can be made accessible to authors before submission to guide them through the process of improving their manuscripts. These tools should be fine-tuned and updated regularly to meet the ever-changing challenges and trends of AI research, thereby ensuring comprehensive and accurate reporting of medical AI research and ultimately improving transparency and replicability in the field.</p><p>The author declares no conflicts of interest.</p>","PeriodicalId":14027,"journal":{"name":"International Journal of Imaging Systems and Technology","volume":"35 2","pages":""},"PeriodicalIF":3.0000,"publicationDate":"2025-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/ima.70047","citationCount":"0","resultStr":"{\"title\":\"Harnessing AI for Comprehensive Reporting of Medical AI Research\",\"authors\":\"Mohamed L. Seghier\",\"doi\":\"10.1002/ima.70047\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>In this editorial, I would like to succinctly discuss the potential of using AI to improve reporting medical AI research. There are already several published guidelines and checklists in the current literature but how they are interpreted and implemented varies with publishers, editors, reviewers and authors. 
Here, I discuss the possibility of harnessing generative AI tools in order to assist authors to comprehensively report their AI work and meet current guidelines, with the ultimate aim to improve transparency and replicability in medical AI research. The succinct discussion below reckons two key issues: (1) AI has a seductive allure that might affect how AI-generated evidence is scrutinized and disseminated, hence the need for comprehensive and transparent reporting, and (2) authors sometimes feel uncertain about what to report in the light of so many existing guidelines about reporting AI research and the lack of consensus in the field.</p><p>It has been argued that extraneous or irrelevant information with a seductive allure can improve the ratings of scientific explanations [<span>1</span>]. AI, with its overhyped knowledgeability, can convey biases and false information that readers might judge believable [<span>2</span>]. AI can write highly convincing text that can impress or deceive readers, even in the presence of errors and false information [<span>3, 4</span>]. Likewise, merely mentioning “AI” in the title of a research paper seems to increase its citation potential [<span>5</span>]. The latter might incentivise scientists to use AI purely to boost their work citability, regardless of whether AI improved their work quality. In this context, one might speculate that some publications that used AI but with flawed methodologies or wrong conclusions might have slipped through the cracks of peer review, with many already being indexed and citable [<span>6</span>]. Overall, emerging evidence suggests that AI has an intrinsic seductive allure that is shaping the medical research landscape and impacting how readers appraise research articles that employ AI. This is why improving the reporting and evaluation of AI work is of paramount importance, and in this editorial, I underscore the potential role of generative AI for that purpose.</p><p>Consider this: readers might find a paper entitled “<i>Association between condition X and biomarker Y demonstrated with deep learning</i>” novel and worth reading. Now, imagine if the same finding was evidenced with a traditional analysis method and entitled “<i>Association between condition X and biomarker Y demonstrated with a correlation analysis</i>”, though it is unlikely that the authors of the latter will consider correlation analysis worth mentioning in the article title. Although both pieces of work report the same finding, they may not enjoy the same buzz and high citability in the field. This is because AI-based methods and traditional analysis methods operate at different maturity levels. Readers (and reviewers) are quite familiar with the scope and limitations of a correlation analysis, but the same cannot be said about AI. Having clear guidelines on how to comprehensively report and rigorously evaluate medical AI research is thus extremely important.</p><p>No one denies AI's huge potential in medical research, such as automating the analysis of complex medical data and accelerating the discovery of useful markers. However, AI may discover new data-driven features and disease-markers relationships that do not always align with prior medical knowledge, raising the question of how to reconcile common medical knowledge with AI-generated evidence. 
Likewise, there is a risk that AI's seductive allure might diminish the critical analysis and scrutiny of AI-generated evidence, thus weakening the rigour of the peer review process in evaluating AI papers. Therefore, when AI is used to enhance the process of scientific discovery, the core principles of scientific methodology, including falsifiability, must be upheld. However, when it comes to falsifiability, independently testing and disproving AI-generated evidence remains difficult. For example, does a 2% reduction in accuracy or another performance metric disprove a particular AI method?</p><p>Indeed, there is no consensus about the conceptual and methodological frameworks by which AI-generated evidence can be securitized and falsified. This is because deploying AI to study a particular question involves several aspects that create multiple sources of error or bias that are not always easy to gauge. This includes how data is curated, cleaned, imputed, augmented, divided or aggregated, how relevant features are identified, reduced or combined, and how AI architecture is built, trained or validated. As AI can generate fabricated articles including articles with empirical results (see discussion in [<span>4</span>]), frameworks that uphold falsification are paramount in AI research [<span>7</span>]. The recent example of the AI-Generated Science (AIGS) system [<span>8</span>], with AI agents that can independently and autonomously create knowledge, poses significant questions to AI research at many ethical, legal and scientific levels. This is why the authors of AIGS identified falsification as a core agent of that system to verify and scrutinise AI-generated scientific discoveries.</p><p>To minimise the risk of proliferation of flawed or fabricated AI research that could harm clinical practice, many guidelines and checklists for improving the reporting of medical AI research have been proposed. Such AI reporting guidelines are very useful to support authors to comprehensively present their AI work and to enhance the rigorous evaluation of their work during the peer review process. Some of the existing guidelines include MAIC-10 (Must AI Criteria-10), CLAIM (Checklist for Artificial Intelligence in Medical Imaging), STARD-AI (Standards for Reporting of Diagnostic Accuracy Study-AI), MI-CLAIM (Minimum Information about Clinical Artificial Intelligence Modeling), MINIMAR (Minimum Information for Medical AI Reporting), RQS (Radiomics Quality Score), QAMAI (Quality Analysis of Medical Artificial Intelligence), TRIPOD+AI (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis), CONSORT-AI (Consolidated Standards of Reporting Trials–AI), SPIRIT-AI (Standard Protocol Items: Recommendations for Interventional Trials-AI), FUTURE-AI (Fairness Universality Traceability Usability Robustness Explainability-AI), CAIR (Clinical AI Research), DECIDE-AI (Developmental and Exploratory Clinical Investigations of DEcision support systems driven by Artificial Intelligence), CLEAR (CheckList for EvaluAtion of Radiomics research), DOME (Data, Optimization, Model and Evaluation); see discussion in [<span>9-11</span>]. The relevance of each checklist depends on the specific topic and scope of the AI research.</p><p>However, AI researchers feel overwhelmed (and sometimes confused) by so many guidelines and checklists that are not implemented or interpreted in the same way by reviewers, editors and publishers. 
Hence, to maximise their impact and usefulness, publishers should consider offering easy-to-follow article templates that explicitly specify what one must report in each section of a manuscript in order to meet their guidelines and checklists about AI research. Likewise, similar to existing AI-powered tools for plagiarism detection, image manipulation, and language editing, publishers should join force with AI developers to create AI-powered tools that can automatically flag up submissions that do not conform to specific guidelines and provide constructive feedback to authors on how to improve the reporting of their AI research. Such AI tools can be made accessible to authors before submission to guide them through the process of improving their manuscripts. These tools should be fine-tuned and updated regularly to meet the ever-changing challenges and trends of AI research, thereby ensuring comprehensive and accurate reporting of medical AI research and ultimately improving transparency and replicability in the field.</p><p>The author declares no conflicts of interest.</p>\",\"PeriodicalId\":14027,\"journal\":{\"name\":\"International Journal of Imaging Systems and Technology\",\"volume\":\"35 2\",\"pages\":\"\"},\"PeriodicalIF\":3.0000,\"publicationDate\":\"2025-02-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://onlinelibrary.wiley.com/doi/epdf/10.1002/ima.70047\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Imaging Systems and Technology\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1002/ima.70047\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Imaging Systems and Technology","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/ima.70047","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
引用次数: 0

Abstract


In this editorial, I would like to succinctly discuss the potential of using AI to improve the reporting of medical AI research. Several guidelines and checklists have already been published, but how they are interpreted and implemented varies across publishers, editors, reviewers and authors. Here, I discuss the possibility of harnessing generative AI tools to help authors report their AI work comprehensively and meet current guidelines, with the ultimate aim of improving transparency and replicability in medical AI research. The brief discussion below addresses two key issues: (1) AI has a seductive allure that might affect how AI-generated evidence is scrutinised and disseminated, hence the need for comprehensive and transparent reporting; and (2) authors sometimes feel uncertain about what to report, given the many existing guidelines on reporting AI research and the lack of consensus in the field.

It has been argued that extraneous or irrelevant information with a seductive allure can improve the ratings of scientific explanations [1]. AI, with its overhyped knowledgeability, can convey biases and false information that readers might judge believable [2]. AI can write highly convincing text that can impress or deceive readers, even in the presence of errors and false information [3, 4]. Likewise, merely mentioning “AI” in the title of a research paper seems to increase its citation potential [5]. That effect might incentivise scientists to use AI purely to boost the citability of their work, regardless of whether AI improved its quality. In this context, one might speculate that some publications that used AI but relied on flawed methodologies or drew wrong conclusions have slipped through the cracks of peer review, with many already indexed and citable [6]. Overall, emerging evidence suggests that AI has an intrinsic seductive allure that is shaping the medical research landscape and influencing how readers appraise research articles that employ AI. This is why improving the reporting and evaluation of AI work is of paramount importance, and in this editorial I underscore the potential role of generative AI for that purpose.

Consider this: readers might find a paper titled “Association between condition X and biomarker Y demonstrated with deep learning” novel and worth reading. Now imagine the same finding evidenced with a traditional analysis method and titled “Association between condition X and biomarker Y demonstrated with a correlation analysis”, though in practice the authors of the latter would be unlikely to consider correlation analysis worth mentioning in the title at all. Although both pieces of work report the same finding, they may not enjoy the same buzz and citability in the field. This is because AI-based methods and traditional analysis methods operate at different maturity levels: readers (and reviewers) are quite familiar with the scope and limitations of a correlation analysis, but the same cannot be said about AI. Having clear guidelines on how to comprehensively report and rigorously evaluate medical AI research is thus extremely important.

No one denies AI's huge potential in medical research, such as automating the analysis of complex medical data and accelerating the discovery of useful markers. However, AI may discover new data-driven features and disease–marker relationships that do not always align with prior medical knowledge, raising the question of how to reconcile established medical knowledge with AI-generated evidence. Likewise, there is a risk that AI's seductive allure might diminish the critical analysis and scrutiny of AI-generated evidence, thus weakening the rigour of the peer review process in evaluating AI papers. Therefore, when AI is used to enhance the process of scientific discovery, the core principles of scientific methodology, including falsifiability, must be upheld. Yet when it comes to falsifiability, independently testing and disproving AI-generated evidence remains difficult. For example, does a 2% reduction in accuracy, or in another performance metric, disprove a particular AI method?

Indeed, there is no consensus about the conceptual and methodological frameworks by which AI-generated evidence can be scrutinised and falsified. This is because deploying AI to study a particular question involves several stages that create multiple sources of error or bias which are not always easy to gauge: how data are curated, cleaned, imputed, augmented, divided or aggregated; how relevant features are identified, reduced or combined; and how the AI architecture is built, trained or validated. As AI can generate fabricated articles, including articles with empirical results (see discussion in [4]), frameworks that uphold falsification are paramount in AI research [7]. The recent example of the AI-Generated Science (AIGS) system [8], with AI agents that can independently and autonomously create knowledge, raises significant questions for AI research at many ethical, legal and scientific levels. This is why the authors of AIGS identified falsification as a core agent of that system, tasked with verifying and scrutinising AI-generated scientific discoveries.

To minimise the risk of proliferation of flawed or fabricated AI research that could harm clinical practice, many guidelines and checklists for improving the reporting of medical AI research have been proposed. Such AI reporting guidelines are very useful in supporting authors to present their AI work comprehensively and in enabling rigorous evaluation of that work during the peer review process. Existing guidelines include (see discussion in [9-11]):

- MAIC-10 (Must AI Criteria-10)
- CLAIM (Checklist for Artificial Intelligence in Medical Imaging)
- STARD-AI (Standards for Reporting of Diagnostic Accuracy Study-AI)
- MI-CLAIM (Minimum Information about Clinical Artificial Intelligence Modeling)
- MINIMAR (Minimum Information for Medical AI Reporting)
- RQS (Radiomics Quality Score)
- QAMAI (Quality Analysis of Medical Artificial Intelligence)
- TRIPOD+AI (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis)
- CONSORT-AI (Consolidated Standards of Reporting Trials-AI)
- SPIRIT-AI (Standard Protocol Items: Recommendations for Interventional Trials-AI)
- FUTURE-AI (Fairness, Universality, Traceability, Usability, Robustness, Explainability-AI)
- CAIR (Clinical AI Research)
- DECIDE-AI (Developmental and Exploratory Clinical Investigations of DEcision support systems driven by Artificial Intelligence)
- CLEAR (CheckList for EvaluAtion of Radiomics research)
- DOME (Data, Optimization, Model and Evaluation)

The relevance of each checklist depends on the specific topic and scope of the AI research.

However, AI researchers can feel overwhelmed (and sometimes confused) by so many guidelines and checklists that are not implemented or interpreted in the same way by reviewers, editors and publishers. Hence, to maximise their impact and usefulness, publishers should consider offering easy-to-follow article templates that explicitly specify what must be reported in each section of a manuscript in order to meet their guidelines and checklists for AI research. Likewise, similar to existing AI-powered tools for plagiarism detection, image-manipulation detection and language editing, publishers should join forces with AI developers to create AI-powered tools that can automatically flag submissions that do not conform to specific guidelines and provide constructive feedback to authors on how to improve the reporting of their AI research. Such AI tools could be made accessible to authors before submission, as sketched below, to guide them through the process of improving their manuscripts. These tools should be fine-tuned and updated regularly to meet the ever-changing challenges and trends of AI research, thereby ensuring comprehensive and accurate reporting of medical AI research and ultimately improving transparency and replicability in the field.

The author declares no conflicts of interest.
