Developing artificial intelligence tools for institutional review board pre-review: A pilot study on ChatGPT's accuracy and reproducibility.

PLOS Digital Health (IF 7.7)
Pub Date: 2025-06-30 · eCollection Date: 2025-06-01 · DOI: 10.1371/journal.pdig.0000695
Yasuko Fukataki, Wakako Hayashi, Naoki Nishimoto, Yoichi M Ito

Abstract

This pilot study is the first phase of a broader project aimed at developing an explainable artificial intelligence (AI) tool to support the ethical evaluation of Japanese-language clinical research documents. The tool is explicitly not intended to assist document drafting. We assessed the baseline performance of two generative AI models, Generative Pre-trained Transformer (GPT)-4 and GPT-4o, in analyzing clinical research protocols and informed consent forms (ICFs). The goal was to determine whether these models could accurately and consistently extract ethically relevant information, including the research objectives and background, the research design, and participant-related risks and benefits. First, we compared the performance of GPT-4 and GPT-4o using custom agents developed via OpenAI's Custom GPT functionality (hereafter "GPTs"). Then, using GPT-4o alone, we compared outputs generated by GPTs optimized with customized Japanese prompts against outputs generated with standard prompts. GPT-4o achieved 80% agreement in extracting research objectives and background and 100% agreement in extracting research design, and both models demonstrated high reproducibility across ten trials. GPTs with customized prompts produced more accurate and consistent outputs than standard prompts. This study suggests the potential utility of generative AI in pre-institutional review board (IRB) review tasks and provides foundational data for future validation and standardization efforts involving retrieval-augmented generation and fine-tuning. Importantly, the tool is intended not to automate ethical review but to support IRB decision-making. Limitations include the absence of gold-standard reference data, reliance on a single evaluator, the lack of convergence and inter-rater reliability analyses, and the inability of AI to substitute for in-person elements such as site visits.
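The abstract reports percent agreement (80% for objectives and background, 100% for research design) and reproducibility across ten repeated trials. A minimal sketch of how such metrics might be computed is shown below; the binary match/no-match scoring per trial and the modal-output consistency measure are assumptions for illustration, since the paper's exact scoring rubric is not given in the abstract.

```python
from collections import Counter

def percent_agreement(judgments: list[bool]) -> float:
    """Fraction of trials whose extracted output was judged to match
    the expected content (True = agreement with the evaluator)."""
    return sum(judgments) / len(judgments)

def reproducibility(outputs: list[str]) -> float:
    """Share of trials producing the modal (most frequent) output --
    a simple consistency measure across repeated runs of one prompt."""
    modal_count = Counter(outputs).most_common(1)[0][1]
    return modal_count / len(outputs)

# Hypothetical judgments over ten trials of one extraction task:
objectives_judgments = [True] * 8 + [False] * 2   # 80% agreement
design_judgments = [True] * 10                    # 100% agreement

print(percent_agreement(objectives_judgments))  # 0.8
print(percent_agreement(design_judgments))      # 1.0
```

Under this reading, "high reproducibility" would correspond to the same (or equivalent) output appearing in most or all of the ten trials for a given prompt.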
